Changes

LibStatGen: FASTQ

1,227 bytes added, 16:44, 3 February 2010
== Validation Criteria ==
=== Sequence Identifier Line ===
*Every entry in the file should have a unique identifier.
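The uniqueness check can be illustrated with a minimal Python sketch. This is not the LibStatGen implementation: the function name <code>find_duplicate_ids</code> is hypothetical, and it assumes the identifier is the text after the leading <code>@</code> up to the first whitespace.

```python
def find_duplicate_ids(identifier_lines):
    """Return identifiers seen more than once (one entry per repeated occurrence)."""
    seen = set()
    duplicates = []
    for line in identifier_lines:
        # Assumption: the identifier is the first whitespace-delimited field
        # after the leading '@' of the sequence identifier line.
        text = line[1:] if line.startswith("@") else line
        fields = text.split()
        seq_id = fields[0] if fields else ""
        if seq_id in seen:
            duplicates.append(seq_id)
        else:
            seen.add(seq_id)
    return duplicates
```

Note that this approach keeps every identifier in memory, which is the cost the two-pass wishlist item further down aims to reduce.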
 
=== Raw Sequence Line ===
*A base sequence should have non-zero length.
*Each base sequence is validated against the set of characters allowed by the configuration:
** Base Only: A C T G N a c t g n
** Color Space Only: 0 1 2 3 .(period)
** Base or Color Space: A C T G N a c t g n 0 1 2 3 .(period)
*Reads should meet a minimum length; many mappers have trouble with very short reads.
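The raw sequence checks above could be sketched as follows in Python. The mode names (<code>"base"</code>, <code>"color"</code>, <code>"either"</code>), the function name and the error strings are all hypothetical, and the sketch assumes that "Base or Color Space" means each character may come from either set.

```python
BASE_CHARS = set("ACTGNactgn")
COLOR_CHARS = set("0123.")
# Hypothetical mode names for the three configurations listed above.
ALLOWED = {
    "base": BASE_CHARS,
    "color": COLOR_CHARS,
    "either": BASE_CHARS | COLOR_CHARS,
}

def check_raw_sequence(seq, mode="either", min_length=1):
    """Return a list of problems found in one raw sequence line."""
    errors = []
    if len(seq) == 0:
        errors.append("empty sequence")
    elif len(seq) < min_length:
        errors.append("shorter than minimum length %d" % min_length)
    # Report every distinct character outside the configured set.
    bad = sorted(set(seq) - ALLOWED[mode])
    if bad:
        errors.append("invalid characters: " + "".join(bad))
    return errors
```

For example, <code>check_raw_sequence("ACGX", "base")</code> reports the single invalid character <code>X</code>.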
 
=== Plus Line ===
 
=== Quality String Line ===
*A quality string should be present for every base sequence.
*Paired quality and base sequences should be of the same length.
*Valid quality values should all have ASCII codes > 32.
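A minimal Python sketch of the three quality string checks, for illustration only; the function name and error strings are hypothetical and not taken from the library.

```python
def check_quality(seq, qual):
    """Return a list of problems with a quality string paired with seq."""
    errors = []
    if not qual:
        errors.append("missing quality string")
    elif len(qual) != len(seq):
        errors.append(
            "quality length %d does not match sequence length %d"
            % (len(qual), len(seq))
        )
    # Valid quality characters must have ASCII codes strictly greater than 32,
    # so a space (code 32) and control characters are rejected.
    if any(ord(ch) <= 32 for ch in qual):
        errors.append("quality character with ASCII code <= 32")
    return errors
```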
 
== Additional Features ==
*Base composition is tracked and reported by position.
*Consumes gzipped and uncompressed text files transparently (see libcsg/InputFile.h).
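Both features can be sketched in Python. This does not reflect how libcsg/InputFile.h works internally: the helper names are hypothetical, gzip detection by the two-byte magic number is one possible approach, and folding lower-case bases into upper case is an assumption.

```python
import gzip
from collections import Counter

def open_text(path):
    """Open a plain or gzip-compressed text file, detected by the gzip magic bytes."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")

def base_composition(sequences):
    """Return one Counter per read position, counting the bases seen there."""
    counts = []
    for seq in sequences:
        # Grow the per-position list to fit the longest read seen so far.
        while len(counts) < len(seq):
            counts.append(Counter())
        for pos, base in enumerate(seq):
            # Assumption: lower-case and upper-case bases are counted together.
            counts[pos][base.upper()] += 1
    return counts
```

With the reads <code>ACG</code>, <code>AC</code> and <code>TCGA</code>, position 0 holds two <code>A</code> and one <code>T</code>, and position 3 holds a single <code>A</code>.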
 
== Additional Wishlist - Not Implemented ==
*To reduce memory usage, implement a two-pass algorithm that keeps only a key for each sequence name in memory, rather than the complete name. Suggested options: -1 for one pass with high memory use (the default), -2 for two passes with lower memory use.
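One possible shape for the two-pass idea, sketched in Python. This is not implemented in the validator, and every detail here is an assumption: the key is an 8-byte BLAKE2b digest of the name, the first pass stores only keys, and the second pass stores full names only for keys that were seen more than once, so hash collisions cannot produce false duplicates.

```python
import hashlib
from collections import Counter

def name_key(name):
    """Fixed-size key for a sequence name (assumption: 8-byte BLAKE2b digest)."""
    return hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()

def duplicates_two_pass(read_names):
    """Find duplicated names while holding mostly fixed-size keys in memory.

    read_names is a zero-argument callable that returns a fresh iterator
    over the sequence names, for example by re-reading the input file.
    """
    # Pass 1: remember only keys, and note which keys occur more than once.
    seen = set()
    suspect = set()
    for name in read_names():
        key = name_key(name)
        if key in seen:
            suspect.add(key)
        else:
            seen.add(key)
    del seen  # the full key set is no longer needed
    # Pass 2: keep complete names only for suspect keys, then confirm
    # real duplicates by comparing the names themselves.
    counts = Counter(n for n in read_names() if name_key(n) in suspect)
    return sorted(n for n, c in counts.items() if c > 1)
```

The trade-off is that the input must be readable twice, which rules out streaming from a pipe in two-pass mode.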
 
 
 
== Assumptions ==
 
== How to Use the fastQValidator Executable ==
'''Required Parameters:'''
== FastQ Validator Output ==
'''Coming Soon'''
