Changes

LibStatGen: FASTQ (view source)

Revision as of 14:54, 4 February 2010

171 bytes added, 14:54, 4 February 2010

no edit summary

Line 1: Line 1:

== Validation Criteria ==

=== Sequence Identifier Line ===

−

{| class="wikitable" border="1"

+

{| class="wikitable" style="width:100%" border="1"

|-

−

! Validation Criteria

+

! width="50%"|Validation Criteria

−

! Error Message

+

! width="50%"|Error Message

|-

| Line is at least 2 characters long ('@' and at least 1 for the sequence identifier)

Line 20: Line 20:

=== Raw Sequence Line ===

−

{| class="wikitable" border="1"

+

{| class="wikitable" style="width:100%" border="1"

|-

−

! Validation Criteria

+

! width="50%"|Validation Criteria

−

! Error Message

+

! width="50%"|Error Message

|-

| A base sequence should have non-zero length.

Line 43: Line 43:

=== Plus Line ===

−

{| class="wikitable" border="1"

+

{| class="wikitable" style="width:100%" border="1"

|-

−

! Validation Criteria

+

! width="50%"|Validation Criteria

−

! Error Message

+

! width="50%"|Error Message

|-

| Must exist for every sequence.

Line 56: Line 56:

=== Quality String Line ===

−

{| class="wikitable" border="1"

+

{| class="wikitable" style="width:100%" border="1"

|-

−

! Validation Criteria

+

! width="50%"|Validation Criteria

−

! Error Message

+

! width="50%"|Error Message

|-

| A quality string should be present for every base sequence.

Line 74: Line 74:

*Base composition is reported and tracked by position.

*Consumes gzipped and uncompressed text files transparently (see libcsg/InputFile.h).
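
The two features above can be illustrated with a minimal Python sketch. It is not the libcsg implementation: the function names <code>open_text</code> and <code>base_composition</code> are hypothetical, and detecting gzip by its two magic bytes is an assumption about one reasonable way to read compressed and uncompressed files transparently.

```python
import gzip
from collections import Counter

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_text(path):
    """Open a plain or gzipped file as text (hypothetical helper).

    The file is probed for the gzip magic bytes, so the caller does not
    need to know in advance whether it is compressed.
    """
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="ascii")
    return open(path, "r", encoding="ascii")


def base_composition(sequences):
    """Count bases by read position (hypothetical helper).

    Returns a list with one Counter per position; the list grows to the
    length of the longest sequence seen.
    """
    counts = []
    for seq in sequences:
        while len(counts) < len(seq):
            counts.append(Counter())
        for position, base in enumerate(seq):
            counts[position][base] += 1
    return counts
```

For example, <code>base_composition(["ACGT", "AC"])</code> returns four counters, and the first one records two occurrences of 'A'.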

−

== Additional Wishlist - Not Implemented ==

−

*To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name (rather than complete sequence names) in memory (suggest a pair of options -1 -> one pass, high memory use, -2 -> two pass lower memory use, default is -1).

−

== Assumptions ==

Line 86: Line 82:

*All lines are part of the Raw Sequence Line until a line that starts with a '+' is discovered.

*All lines are considered part of the quality string until their combined length reaches at least the length of the associated raw sequence (or the end of the file is reached). This is because '@' is a valid quality character, so a line starting with '@' does not necessarily indicate the start of a Sequence Identifier Line.
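
The two parsing assumptions above can be sketched in Python as follows. This is an illustration only, not the fastQValidator code: the function name <code>read_records</code> and the error messages are hypothetical, and skipping blank lines between records is an extra assumption made to keep the sketch short.

```python
def read_records(lines):
    """Yield (identifier, sequence, quality) tuples from FASTQ lines.

    Hypothetical sketch: sequence lines are collected until a line that
    starts with '+', then quality lines are collected until they are at
    least as long as the sequence (or the input ends).
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # assumption: tolerate blank lines between records
        if not header.startswith("@") or len(header) < 2:
            raise ValueError("bad sequence identifier line: %r" % header)

        # Raw sequence: everything up to the plus line.
        seq_parts = []
        plus_seen = False
        for line in it:
            if line.startswith("+"):
                plus_seen = True
                break
            seq_parts.append(line)
        if not plus_seen:
            raise ValueError("missing plus line for %r" % header)
        sequence = "".join(seq_parts)

        # Quality: consume lines by length, never by looking for '@',
        # because '@' is itself a valid quality character.
        qual_parts = []
        qual_len = 0
        while qual_len < len(sequence):
            line = next(it, None)
            if line is None:
                break  # end of input reached
            qual_parts.append(line)
            qual_len += len(line)

        yield header[1:], sequence, "".join(qual_parts)
```

Counting quality characters instead of scanning for the next '@' is what lets a quality line such as <code>@III</code> be read correctly as part of the current record.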

+

== Additional Wishlist - Not Implemented ==

+

*To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name (rather than complete sequence names) in memory (suggest a pair of options -1 -> one pass, high memory use, -2 -> two pass lower memory use, default is -1).
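
One possible shape for the two-pass idea is sketched below in Python. It assumes that sequence names are held in memory in order to detect repeated names, which this section does not state. The helper names, the 8-byte BLAKE2b key, and the callable that re-reads the names are all hypothetical choices.

```python
import hashlib


def name_key(name):
    """Return a fixed-size 8-byte key for a sequence name (hypothetical)."""
    return hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()


def duplicate_names_two_pass(read_names):
    """Find repeated names while storing only keys in the first pass.

    read_names is a zero-argument callable that returns a fresh iterator
    over the sequence names, so the input can be scanned twice.
    """
    # Pass 1: remember keys only, and note keys seen more than once.
    seen = set()
    suspect = set()
    for name in read_names():
        key = name_key(name)
        if key in seen:
            suspect.add(key)
        else:
            seen.add(key)

    # Pass 2: store full names only for suspect keys, which also
    # separates true duplicates from key collisions.
    counts = {}
    for name in read_names():
        if name_key(name) in suspect:
            counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, count in counts.items() if count > 1)
```

The second pass is needed because two different names can share a key; only names that actually occur more than once are reported.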

== How to Use the fastQValidator Executable ==

Mktrost (Administrators, 3,045 edits)
