From Genome Analysis Wiki
1,227 bytes added, 17:44, 3 February 2010
Line 1: |
Line 1: |
| + | == Validation Criteria == |
| + | === Sequence Identifier Line === |
| + | *Every entry in the file should have a unique identifier. |
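The uniqueness check above can be sketched in a few lines of Python. This is only an illustration, not the fastQValidator implementation: the function name `find_duplicate_ids` is hypothetical, and it assumes the identifier is the text between the leading `@` and the first whitespace on the sequence identifier line.

```python
def find_duplicate_ids(identifier_lines):
    """Return identifiers that occur more than once, in order of first repeat."""
    seen = set()
    duplicates = []
    for line in identifier_lines:
        # Assumption: identifier = text after '@' up to the first whitespace.
        parts = line[1:].split()
        seq_id = parts[0] if parts else ""
        if seq_id in seen and seq_id not in duplicates:
            duplicates.append(seq_id)
        seen.add(seq_id)
    return duplicates
```

Calling `find_duplicate_ids(["@r1 desc", "@r2", "@r1"])` reports `r1` as the only repeated identifier.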
| + | |
| + | === Raw Sequence Line === |
| + | *A base sequence should have non-zero length. |
| + | *Each base sequence is validated against the set of characters allowed by the configuration: |
| + | ** Base Only: A C T G N a c t g n |
| + | ** Color Space Only: 0 1 2 3 .(period) |
| + | ** Base or Color Space: A C T G N a c t g n 0 1 2 3 .(period) |
| + | *Reads should meet a minimum length; many mappers have trouble with very short reads. |
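As a rough sketch of the raw sequence checks listed above, the following Python uses the three character sets exactly as given. The mode names (`"base"`, `"color"`, `"both"`), the function name, and the `min_length` parameter are assumptions made for the example; they are not the validator's actual options.

```python
# Character sets taken from the list above.
BASE_CHARS = set("ACTGNactgn")
COLOR_CHARS = set("0123.")
ALLOWED = {
    "base": BASE_CHARS,                 # Base Only
    "color": COLOR_CHARS,               # Color Space Only
    "both": BASE_CHARS | COLOR_CHARS,   # Base or Color Space
}

def check_raw_sequence(seq, mode="both", min_length=1):
    """Return a list of error messages for one raw sequence line."""
    errors = []
    if len(seq) == 0:
        errors.append("zero-length sequence")
    elif len(seq) < min_length:
        errors.append(f"sequence shorter than minimum length {min_length}")
    # Collect each disallowed character once, in sorted order.
    bad = sorted({c for c in seq if c not in ALLOWED[mode]})
    if bad:
        errors.append("invalid characters: " + "".join(bad))
    return errors
```

For example, `check_raw_sequence("AC01", "base")` flags `0` and `1`, while the same string passes in `"both"` mode.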
| + | |
| + | === Plus Line === |
| + | |
| + | === Quality String Line === |
| + | *A quality string should be present for every base sequence. |
| + | *Paired quality and base sequences should be of the same length. |
| + | *Valid quality values should all have ASCII codes > 32. |
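The three quality string rules above could be checked along these lines. This is a minimal sketch with a hypothetical function name and message wording; it treats an empty string as a missing quality line.

```python
def check_quality(seq, qual):
    """Return a list of error messages for a base sequence and its quality string."""
    errors = []
    if not qual:
        errors.append("missing quality string")
        return errors
    if len(qual) != len(seq):
        errors.append(
            f"length mismatch: {len(seq)} bases, {len(qual)} quality values"
        )
    # Valid quality characters must have ASCII codes strictly greater than 32.
    bad_positions = [i for i, ch in enumerate(qual) if ord(ch) <= 32]
    if bad_positions:
        errors.append(
            f"quality characters with ASCII code <= 32 at positions {bad_positions}"
        )
    return errors
```

Note that a space (ASCII 32) is rejected, since the rule requires codes greater than 32.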
| + | |
| + | == Additional Features == |
| + | *Base composition is reported and tracked by position. |
| + | *Consumes gzipped and uncompressed text files transparently (see libcsg/InputFile.h). |
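To illustrate what tracking base composition by position might look like, here is a small Python sketch. It is not the validator's code: the function name is hypothetical, and folding lowercase bases into uppercase is an assumption made to keep the counts compact.

```python
from collections import Counter

def base_composition_by_position(sequences):
    """Return a list of Counters; entry i counts the bases seen at position i."""
    counts = []
    for seq in sequences:
        # Assumption: 'a' and 'A' are counted together.
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    return counts
```

Reads of different lengths are handled naturally: later positions simply accumulate counts from fewer reads.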
| + | |
| + | == Additional Wishlist - Not Implemented == |
| + | *To reduce memory usage, implement a two-pass algorithm that keeps only a key for each sequence name in memory, rather than the complete name. Suggested pair of options: -1 for one pass with high memory use, -2 for two passes with lower memory use; the default would be -1. |
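Since this item is a wishlist entry, the following is only one possible reading of the two-pass idea, not a description of existing behaviour. The sketch assumes the "key" is a fixed-size hash of the sequence name and that the input can be scanned twice; `name_key` and `find_duplicates_two_pass` are hypothetical names.

```python
import hashlib

def name_key(name):
    """Compact 8-byte key standing in for a full sequence name (assumed design)."""
    return hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()

def find_duplicates_two_pass(read_names):
    """read_names: callable returning a fresh iterator over sequence names."""
    # Pass 1: store only keys; a repeated key marks a possible duplicate.
    seen_keys = set()
    suspect_keys = set()
    for name in read_names():
        key = name_key(name)
        if key in seen_keys:
            suspect_keys.add(key)
        else:
            seen_keys.add(key)

    # Pass 2: keep full names only for suspect keys, which rules out
    # false positives caused by two different names sharing a key.
    seen_names = set()
    duplicates = set()
    for name in read_names():
        if name_key(name) in suspect_keys:
            if name in seen_names:
                duplicates.add(name)
            seen_names.add(name)
    return duplicates
```

The trade-off matches the one described above: the first pass holds eight bytes per name instead of the whole name, at the cost of reading the input a second time.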
| + | |
| + | |
| + | |
| + | == Assumptions == |
| + | |
| == How to Use the fastQValidator Executable == | | == How to Use the fastQValidator Executable == |
| '''Required Parameters:''' | | '''Required Parameters:''' |
Line 22: |
Line 52: |
| | | |
| == FastQ Validator Output == | | == FastQ Validator Output == |
− | The FastQ Validator
| + | '''Coming Soon''' |