Changes

From Genome Analysis Wiki
493 bytes added, 10:55, 13 January 2010
no edit summary
Line 15: Line 15:  
* Valid quality values should all have ASCII codes > 32.
 
* Valid quality values should all have ASCII codes > 32.
   −
* Valid bases should be ACTG or N, unless ambiguous bases are explicitly allowed by the application consuming the file.  
+
* Valid bases should be ACTG or N, unless ambiguous bases are explicitly allowed by the application consuming the file. Lower case characters are allowed.
    
* Every entry in the file should have a unique identifier.
 
* Every entry in the file should have a unique identifier.
Line 22: Line 22:     
* Base composition should be reported and tracked by position.
 
* Base composition should be reported and tracked by position.
 +
 +
== Additional Wishlist ==
 +
 +
A FastQ Validator should also implement a series of optional capabilities. Among those:
 +
 +
* Consume gzipped and uncompressed text files transparently.
 +
 +
* To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name (rather than complete sequence names) in memory.
 +
 +
* Support color space files, where valid base sequences include the characters 0, 1, 2, 3, 4 instead of A, C, T, G and N.
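The first two wishlist items can be sketched together in Python. This is only one possible reading, not the validator's actual implementation: the page does not say how the two passes are divided, so the split below (pass one keeps an 8-byte hash key per name, pass two keeps full names only for keys seen more than once) is an assumption. The helper names `open_fastq`, `name_key`, `read_names` and `find_duplicate_names` are hypothetical, and the sketch assumes strict four-line records with ASCII content.

```python
import gzip
import hashlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_fastq(path):
    """Open a FASTQ file as text, whether or not it is gzip-compressed."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="ascii")
    return open(path, "rt", encoding="ascii")


def name_key(name):
    """Return a fixed-size 8-byte key that stands in for a full sequence name."""
    return hashlib.blake2b(name.encode("ascii"), digest_size=8).digest()


def read_names(path):
    """Yield the identifier of each record (assumes strict 4-line records)."""
    with open_fastq(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:
                # Identifier = text after '@' up to the first whitespace.
                fields = line[1:].split()
                yield fields[0] if fields else ""


def find_duplicate_names(path):
    """Two-pass duplicate check that never holds every full name in memory."""
    # Pass 1: remember only keys; a repeated key marks a *possible* duplicate
    # (it could also be a hash collision between two different names).
    seen = set()
    suspects = set()
    for name in read_names(path):
        key = name_key(name)
        if key in seen:
            suspects.add(key)
        else:
            seen.add(key)
    del seen  # the full key set is no longer needed

    # Pass 2: keep full names only for suspect keys and confirm real repeats.
    counts = {}
    for name in read_names(path):
        if name_key(name) in suspects:
            counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, count in counts.items() if count > 1)
```

Calling `find_duplicate_names("reads.fastq.gz")` or `find_duplicate_names("reads.fastq")` (file names are placeholders) takes the same code path, because the compression check looks at the file's first two bytes rather than its extension. A production implementation would likely use a more compact key container than a Python set of byte strings.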
