Genome Analysis Wiki β

LibStatGen: FASTQ

171 bytes added, 13:54, 4 February 2010
== Validation Criteria ==
=== Sequence Identifier Line ===
{| class="wikitable" style="width:100%" border="1"
|-
! width="50%"|Validation Criteria!! width="50%"|Error Message
|-
| Line is at least 2 characters long ('@' and at least 1 for the sequence identifier)
=== Raw Sequence Line ===
{| class="wikitable" style="width:100%" border="1"
|-
! width="50%"|Validation Criteria!! width="50%"|Error Message
|-
| A base sequence should have non-zero length.
=== Plus Line ===
{| class="wikitable" style="width:100%" border="1"
|-
! width="50%"|Validation Criteria!! width="50%"|Error Message
|-
| Must exist for every sequence.
=== Quality String Line ===
{| class="wikitable" style="width:100%" border="1"
|-
! width="50%"|Validation Criteria!! width="50%"|Error Message
|-
| A quality string should be present for every base sequence.
*Base composition is tracked and reported by position.
*Consumes gzipped and uncompressed text files transparently (see libcsg/InputFile.h).
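The two features above can be illustrated with a minimal Python sketch. It is not the fastQValidator implementation (which is C++ and uses libcsg/InputFile.h); the helper names <code>open_text</code> and <code>base_composition_by_position</code> are hypothetical, and detecting gzip by its two magic bytes is an assumption about how "transparent" reading might be done.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed file in text mode.

    Assumption: gzip input is recognized by its magic bytes (0x1f 0x8b),
    not by the file extension.
    """
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def base_composition_by_position(sequences):
    """Return a list of Counters; entry i counts the bases seen at position i."""
    counts = []
    for seq in sequences:
        for i, base in enumerate(seq):
            if i == len(counts):
                # First sequence long enough to reach this position.
                counts.append(Counter())
            counts[i][base] += 1
    return counts
```

Reads of different lengths are handled by growing the list on demand, so position <code>i</code> only counts the reads that are at least <code>i + 1</code> bases long.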
 
== Additional Wishlist - Not Implemented ==
*To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name in memory, rather than the complete sequence name. Suggested options: -1 for one pass with high memory use (the default), -2 for two passes with lower memory use.
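Since this item is not implemented, the following Python sketch is only one possible reading of the idea. It assumes the goal is to detect repeated sequence names, that the "key" is a fixed-size hash of the name, and that the input can be read twice; <code>name_key</code>, <code>duplicate_names_two_pass</code> and the <code>read_names</code> callable are all hypothetical names.

```python
import hashlib
from collections import Counter


def name_key(name):
    """Map a sequence name to an 8-byte key (assumed key scheme)."""
    return hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()


def duplicate_names_two_pass(read_names):
    """Return the set of names that occur more than once.

    read_names is a zero-argument callable that returns a fresh iterator
    over all sequence names, so the input can be scanned twice.
    """
    # Pass 1: keep only fixed-size keys and note which keys repeat.
    seen, repeated = set(), set()
    for name in read_names():
        key = name_key(name)
        if key in seen:
            repeated.add(key)
        else:
            seen.add(key)

    # Pass 2: store full names only for keys that repeated, which also
    # filters out hash collisions between different names.
    counts = Counter()
    for name in read_names():
        if name_key(name) in repeated:
            counts[name] += 1
    return {name for name, n in counts.items() if n > 1}
```

For example, <code>duplicate_names_two_pass(lambda: iter(names))</code> scans an in-memory list twice; for a file, the callable would reopen it on each call.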
 
== Assumptions ==
*All lines after a Sequence Identifier Line are treated as part of the Raw Sequence Line until a line that starts with a '+' is found.
*All lines after the Plus Line are treated as part of the quality string until it is at least as long as the associated raw sequence (or the end of the file is reached). This is because '@' is a valid quality character, so a line starting with '@' does not necessarily indicate the start of a Sequence Identifier Line.
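The assumptions above can be sketched as a small record reader. This is an illustrative Python sketch, not the library's parser: <code>read_fastq_records</code> is a hypothetical name, blank lines between records are skipped as an added assumption, and no validation beyond the leading '@' and the presence of a plus line is attempted.

```python
import io


def read_fastq_records(handle):
    """Yield (identifier, sequence, quality) tuples from a text handle."""
    line = handle.readline()
    while line:
        header = line.rstrip("\r\n")
        if not header:
            # Assumption: blank lines between records are ignored.
            line = handle.readline()
            continue
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at start of record, got {header!r}")

        # Every line up to the first one starting with '+' is raw sequence.
        seq_parts = []
        line = handle.readline()
        while line and not line.startswith("+"):
            seq_parts.append(line.rstrip("\r\n"))
            line = handle.readline()
        if not line:
            raise ValueError(f"missing plus line for {header!r}")
        sequence = "".join(seq_parts)

        # Quality lines are consumed until the quality string is at least
        # as long as the sequence (or EOF), so a quality line that starts
        # with '@' is not mistaken for a new record.
        qual_parts = []
        qual_len = 0
        line = handle.readline()
        while line and qual_len < len(sequence):
            text = line.rstrip("\r\n")
            qual_parts.append(text)
            qual_len += len(text)
            line = handle.readline()

        yield header[1:], sequence, "".join(qual_parts)


if __name__ == "__main__":
    demo = "@r1\nACGT\nAC\n+\n@III\nII\n"
    for record in read_fastq_records(io.StringIO(demo)):
        print(record)
```

In the demo input the first quality line begins with '@', yet it is read as quality data because only 0 of the 6 expected quality characters have been consumed at that point.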
 
== How to Use the fastQValidator Executable ==