Changes

LibStatGen: FASTQ (view source)

Revision as of 14:59, 4 February 2010

984 bytes added , 14:59, 4 February 2010

no edit summary

Line 70: Line 70:

| ERROR on Line <current line #>: Invalid character ('<invalid char>') in quality string.

|}

+

== Additional Features ==

*Base composition is tracked and reported by position.

*Consumes gzipped and uncompressed text files transparently (see libcsg/InputFile.h).

+

*Prints a message for each error, up to a configurable maximum number of reportable errors. A summary of the total number of errors is also printed.

+

*Prints the total number of lines processed as well as the total number of sequences processed.
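Two of the features above, per-position base composition and capped error reporting, can be sketched in a few lines. This is a minimal Python illustration, not the library's C++ code: the names <code>ErrorReporter</code> and <code>add_to_composition</code> and the default limit of 20 are assumptions made for the example. Only the "ERROR on Line" message prefix is taken from the error table above.

```python
from collections import Counter


class ErrorReporter:
    """Keep at most max_reported messages but count every error."""

    def __init__(self, max_reported=20):  # default limit is an assumption
        self.max_reported = max_reported
        self.total = 0
        self.printed = []

    def report(self, line_number, message):
        self.total += 1
        # Errors past the limit are still counted for the final summary.
        if self.total <= self.max_reported:
            text = f"ERROR on Line {line_number}: {message}"
            self.printed.append(text)
            print(text)


def add_to_composition(counts, raw_sequence):
    """Add one raw sequence to a list of per-position base counters."""
    for position, base in enumerate(raw_sequence):
        if position == len(counts):
            counts.append(Counter())  # first read that reaches this position
        counts[position][base] += 1
    return counts


counts = []
for sequence in ("ACGT", "ACCT", "TC"):
    add_to_composition(counts, sequence)

reporter = ErrorReporter(max_reported=2)
for line_number in (4, 8, 12):
    reporter.report(line_number, "example error")
print(f"{reporter.total} errors in total")
```

Here <code>counts[0]</code> holds the base counts for the first position across all reads, and the third error is counted but not printed.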

+

== Assumptions ==

Line 82: Line 86:

*All lines are part of the Raw Sequence Line until a line that starts with a '+' is discovered.

*All lines are considered part of the quality string until its length reaches at least the length of the associated raw sequence (or the end of the file is reached). This is because '@' is a valid quality character, so a line starting with '@' does not necessarily indicate the start of a Sequence Identifier Line.
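The two line-wrapping assumptions above can be sketched as a small record reader. This is a hedged Python illustration rather than the validator's actual implementation: the function name <code>read_record</code> and the list-of-lines input are assumptions, and no validation is performed.

```python
def read_record(lines, start):
    """Parse one FASTQ record beginning at lines[start].

    Returns ((identifier, raw_sequence, quality), index_of_next_record).
    """
    identifier = lines[start][1:]  # drop the leading '@'
    i = start + 1

    # Every line is raw sequence until a line starting with '+' appears.
    raw_parts = []
    while i < len(lines) and not lines[i].startswith("+"):
        raw_parts.append(lines[i])
        i += 1
    raw = "".join(raw_parts)
    i += 1  # skip the '+' line

    # Every line is quality until the raw sequence length is reached,
    # even if a line starts with '@' (a valid quality character).
    quality_parts = []
    quality_length = 0
    while i < len(lines) and quality_length < len(raw):
        quality_parts.append(lines[i])
        quality_length += len(lines[i])
        i += 1
    return (identifier, raw, "".join(quality_parts)), i


lines = ["@seq1", "ACGT", "ACGT", "+", "IIII", "@III",
         "@seq2", "AC", "+", "II"]
first, next_index = read_record(lines, 0)
second, end_index = read_record(lines, next_index)
```

In this input the line <code>@III</code> is consumed as quality for <code>seq1</code> because only 4 of the 8 expected quality characters had been read when it was reached.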

+

== Additional Wishlist - Not Implemented ==

*To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name (rather than the complete sequence name) in memory. Suggested options: -1 for one pass with high memory use (the default), -2 for two passes with lower memory use.

+

*Add an option that would reject raw sequence and quality strings that wrap over multiple lines. It would only allow 1 line per raw sequence/quality string.

+

*Maybe report 2 types of information to the user: ERROR (critical failure) and WARNING (tolerable errors).
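One way the two-pass idea in the first wishlist item could work is sketched below in Python. Everything here is an assumption for illustration: the wishlist does not say how keys would be computed, so the 8-byte BLAKE2b digest, the function names, and the use of the second pass to rule out hash collisions are all hypothetical.

```python
import hashlib


def name_key(name):
    """Fixed-size 8-byte key for a sequence name (hypothetical hash choice)."""
    return hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()


def find_duplicate_names(read_names):
    """Return the set of repeated names.

    read_names is a zero-argument callable that returns a fresh iterator
    over the sequence names, so the input can be scanned twice.
    """
    # Pass 1: store only keys and note which keys occur more than once.
    seen = set()
    suspect = set()
    for name in read_names():
        key = name_key(name)
        if key in seen:
            suspect.add(key)
        else:
            seen.add(key)

    # Pass 2: keep full names only for suspect keys, which separates
    # true duplicates from distinct names that share a key.
    full_names = set()
    duplicates = set()
    for name in read_names():
        if name_key(name) in suspect:
            if name in full_names:
                duplicates.add(name)
            full_names.add(name)
    return duplicates


names = ["r1", "r2", "r1", "r3"]
duplicates = find_duplicate_names(lambda: iter(names))
```

Memory in the first pass grows with the number of names but not with their length. Full names are held only for the suspect keys.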

+

== Possible Issues ==

+

* For color space, there is no specification that settles the following:

+

# The read and quality string may be the same length or differ in length by 1 (depending on whether the primer base has a corresponding quality value).

+

# Missing values are usually represented by "." or sometimes left as a blank " ".

+

# Tag names for paired-end reads may be identical (e.g. MAQ enforces this), and paired reads may be in the same file (e.g. BFAST requires paired reads in the same file).
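The length ambiguity in the first item could be tolerated with a check like the one below. This is a sketch under an assumption: when the lengths differ by 1, it is the read that is longer, because it carries a primer base with no quality value. The function name and the <code>color_space</code> flag are hypothetical.

```python
def lengths_consistent(read, quality, color_space=False):
    """Check whether read and quality lengths are acceptable.

    Base space: lengths must match.
    Color space (assumption): the read may also be exactly 1 longer,
    for a primer base that has no quality value.
    """
    if len(read) == len(quality):
        return True
    return color_space and len(read) - len(quality) == 1


ok_color = lengths_consistent("T0123", "IIII", color_space=True)
ok_base = lengths_consistent("T0123", "IIII")
```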

+

== How to Use the fastQValidator Executable ==

Mktrost

