== Status ==
The [http://en.wikipedia.org/wiki/FASTQ_format FastQ] Validator is on our [[Todo List]].
An initial version of [[FastQFile]], which includes validation methods, has been completed.
== Valid FastQ File Requirements ==
A valid FastQ file should meet the following requirements:
*A base sequence should have non-zero length.
*A quality string should be present for every base sequence.
*Paired quality and base sequences should be of the same length.
*Every character in a quality string should have an ASCII code > 32 (that is, above the space character).
*Valid bases should be ACTG or N, unless ambiguous bases are explicitly allowed by the application consuming the file. Lower case characters are allowed.
*Every entry in the file should have a unique identifier.
*Reads should be of a minimum length; many mappers will get into trouble with very short reads.
*Base composition should be reported and tracked by position.
== Additional Wishlist ==
There is a series of optional capabilities a FastQ Validator should implement. Among those:
*Consume gzipped and uncompressed text files transparently (see libcsg/InputFile.h).
*To reduce memory usage, implement a two-pass algorithm that keeps only a key for each sequence name in memory, rather than the complete name. Suggested options: -1 for one pass (high memory use) and -2 for two passes (lower memory use), with -1 as the default.
*Support color space files, where valid base sequences include the characters 0, 1, 2, 3, '.' (period) in addition to A, C, T, G and N (some csfastq sequence lines start with a primer base).
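The two-pass idea in the wishlist above could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the planned implementation: the names <code>name_key</code> and <code>find_duplicate_names</code> are hypothetical, the 8-byte BLAKE2b digest is an arbitrary choice of key, and the caller is assumed to be able to scan the sequence names twice (for example by reopening the file).

```python
import hashlib
from collections import Counter


def name_key(name: str) -> bytes:
    """Return a fixed-size 8-byte key for a sequence name.

    The hash function and key size are assumptions made for this sketch.
    """
    return hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()


def find_duplicate_names(read_names):
    """Return the set of sequence names that occur more than once.

    read_names is a zero-argument callable that returns a fresh iterator
    over the sequence names, so the input can be scanned twice.
    """
    # Pass 1: keep only the short key of every name and count the keys.
    key_counts = Counter(name_key(name) for name in read_names())
    suspect_keys = {key for key, count in key_counts.items() if count > 1}

    # Pass 2: store full names only for keys seen more than once. This
    # separates true duplicates from distinct names whose keys collide.
    seen = set()
    duplicates = set()
    for name in read_names():
        if name_key(name) in suspect_keys:
            if name in seen:
                duplicates.add(name)
            else:
                seen.add(name)
    return duplicates
```

A one-pass variant (the suggested -1 default) would simply keep every full name in a set; the sketch above trades a second scan of the input for holding full names only when their keys repeat.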
== Discussion ==
* It may be useful to report 2 types of information to the user: ERROR (critical failure) and WARNING (tolerable errors).
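As a rough illustration of ERROR versus WARNING reporting, the following Python sketch checks most of the requirements listed under "Valid FastQ File Requirements". It is not the [[FastQFile]] API: the names <code>Issue</code> and <code>validate_fastq</code> are hypothetical, each record is assumed to occupy exactly four lines (identifier, bases, "+" line, qualities), and the split between ERROR and WARNING is an assumption (only short reads are treated as tolerable here). Base composition tracking and color space support are left out.

```python
from typing import Iterable, List, NamedTuple

# Plain bases only; ambiguous bases are not allowed in this sketch.
VALID_BASES = frozenset("ACGTNacgtn")


class Issue(NamedTuple):
    severity: str  # "ERROR" or "WARNING"
    record: int    # 1-based record number
    message: str


def validate_fastq(lines: Iterable[str], min_length: int = 1) -> List[Issue]:
    """Validate four-line FastQ records and return the issues found."""
    rows = [line.rstrip("\r\n") for line in lines]
    issues: List[Issue] = []
    seen_ids = set()

    # Only whole four-line records are checked; leftovers are an error.
    complete = len(rows) - len(rows) % 4
    if complete != len(rows):
        issues.append(Issue("ERROR", complete // 4 + 1,
                            "incomplete record at end of file"))

    for start in range(0, complete, 4):
        record = start // 4 + 1
        header, seq, _plus, qual = rows[start:start + 4]

        # The identifier is the first word after the leading '@'.
        fields = header[1:].split()
        read_id = fields[0] if fields else ""
        if not header.startswith("@") or not read_id:
            issues.append(Issue("ERROR", record, "missing or malformed identifier"))
        elif read_id in seen_ids:
            issues.append(Issue("ERROR", record, "duplicate identifier " + read_id))
        seen_ids.add(read_id)

        if not seq:
            issues.append(Issue("ERROR", record, "zero-length base sequence"))
        elif len(seq) < min_length:
            issues.append(Issue("WARNING", record, "read shorter than minimum length"))

        if len(qual) != len(seq):
            issues.append(Issue("ERROR", record, "quality and base lengths differ"))
        if any(ord(ch) <= 32 for ch in qual):
            issues.append(Issue("ERROR", record, "quality value with ASCII code <= 32"))
        if any(base not in VALID_BASES for base in seq):
            issues.append(Issue("ERROR", record, "invalid base character"))

    return issues
```

A caller could, for example, print every issue but exit with a failure status only when at least one has severity ERROR.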