From Genome Analysis Wiki
493 bytes added, 10:55, 13 January 2010
Line 15: |
Line 15: |
| * Valid quality values should all have ASCII codes > 32. | | * Valid quality values should all have ASCII codes > 32. |
| | | |
− | * Valid bases should be ACTG or N, unless ambiguous bases are explicitly allowed by the application consuming the file. | + | * Valid bases should be ACTG or N, unless ambiguous bases are explicitly allowed by the application consuming the file. Lower case characters are allowed. |
| | | |
| * Every entry in the file should have a unique identifier. | | * Every entry in the file should have a unique identifier. |
Line 22: |
Line 22: |
| | | |
| * Base composition should be reported and tracked by position. | | * Base composition should be reported and tracked by position. |
| + | |
| + | == Additional Wishlist == |
| + | |
| + | There is a series of optional capabilities that a FastQ Validator should implement, among them:
| + | |
| + | * Consume gzipped and uncompressed text files transparently. |
| + | |
| + | * To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name (rather than complete sequence names) in memory. |
| + | |
| + | * Support color space files, where valid base sequences include the characters 0, 1, 2, 3, 4 instead of A, C, T, G and N. |
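
The checks and wishlist items above can be sketched in Python. This is a minimal illustration, not the validator's actual implementation. It assumes four-line FASTQ records, ASCII input, and gzip detection by magic bytes. All function names (`open_maybe_gzip`, `name_key`, `duplicate_names`, `sequence_ok`, `quality_ok`) are hypothetical, and the 8-byte BLAKE2 digest is just one possible choice of "key" for a sequence name.

```python
import gzip
import hashlib

BASE_CHARS = set("ACGTNacgtn")   # lower case is allowed
COLOR_CHARS = set("01234")       # color space alphabet from the wishlist


def open_maybe_gzip(path):
    """Open a plain or gzipped text file, detected by the gzip magic bytes."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt", encoding="ascii")
    return open(path, "rt", encoding="ascii")


def read_names(path):
    """Yield sequence names, assuming every record spans exactly four lines."""
    with open_maybe_gzip(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:
                yield line[1:].strip()  # drop the leading '@'


def name_key(name):
    """Fixed-size 8-byte key stored in place of the full sequence name."""
    return hashlib.blake2b(name.encode("ascii"), digest_size=8).digest()


def duplicate_names(path):
    """Two-pass duplicate detection that keeps only short keys in memory."""
    # Pass 1: remember keys only; note keys that occur more than once.
    seen, suspects = set(), set()
    for name in read_names(path):
        key = name_key(name)
        if key in seen:
            suspects.add(key)
        else:
            seen.add(key)
    # Pass 2: count full names for suspect keys only, so that a hash
    # collision between two different names is not reported as a duplicate.
    counts = {}
    for name in read_names(path):
        if name_key(name) in suspects:
            counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, count in counts.items() if count > 1)


def sequence_ok(seq, color_space=False):
    """Check that a non-empty sequence uses only the allowed alphabet."""
    allowed = COLOR_CHARS if color_space else BASE_CHARS
    return len(seq) > 0 and all(ch in allowed for ch in seq)


def quality_ok(qual):
    """Check that every quality character has an ASCII code above 32."""
    return all(ord(ch) > 32 for ch in qual)
```

The second pass re-reads the file, so this approach trades extra I/O for lower memory use. It also assumes the input is a regular file that can be opened twice, not a stream.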