Difference between revisions of "FastQ Validation Criteria"
From Genome Analysis Wiki
Latest revision as of 14:01, 7 September 2011
FastQ Sequence Validation Criteria
The following validation criteria are used by the FastQFile class and the FastQ Validator program when reading a FastQ sequence.
Sequence Identifier Line

Validation Criteria | Error Message |
---|---|
Line is at least 2 characters long ('@' and at least 1 for the sequence identifier) | ERROR on Line <current line #>: The sequence identifier line was too short. |
Line starts with an '@' | ERROR on Line <current line #>: First line of a sequence does not begin with @ |
Line does not contain a space between the '@' and the first sequence identifier (which must be at least 1 character). | ERROR on Line <current line #>: No Sequence Identifier specified before the comment. |
Every entry in the file should have a unique identifier. | ERROR on Line <current line #>: Repeated Sequence Identifier: <identifier> at Lines <previous line #> <current line #> |
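
The identifier checks above can be sketched in a few lines of Python. This is only an illustration, not the FastQFile implementation: the function name `check_identifier_line`, the `seen` dictionary, and the choice to stop at the first failed check are all assumptions made for the example.

```python
def check_identifier_line(line, line_no, seen):
    """Return the error messages for one sequence identifier line.

    `seen` (hypothetical helper state) maps each identifier to the line
    number where it first appeared, so repeats can be reported.
    """
    prefix = f"ERROR on Line {line_no}: "
    # At least '@' plus one identifier character.
    if len(line) < 2:
        return [prefix + "The sequence identifier line was too short."]
    if not line.startswith("@"):
        return [prefix + "First line of a sequence does not begin with @"]
    # A space right after '@' means the identifier itself is missing.
    if line[1] == " ":
        return [prefix + "No Sequence Identifier specified before the comment."]
    # Assumption: the identifier runs up to the first space; the rest is a comment.
    identifier = line[1:].split(" ", 1)[0]
    if identifier in seen:
        return [prefix + f"Repeated Sequence Identifier: {identifier} "
                         f"at Lines {seen[identifier]} {line_no}"]
    seen[identifier] = line_no
    return []
```

A caller would keep one `seen` dictionary per file and pass it to every call, so that uniqueness is checked across the whole file.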

Raw Sequence Line

Validation Criteria | Error Message |
---|---|
A base sequence should have non-zero length. | ERROR on Line <current line #>: Raw Sequence is shorter than the min read length: 0 < <config min read length> |
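All characters in the base sequence must be in the allowable set specified via configuration. Base Only: A C T G N a c t g n. Color Space Only: 0 1 2 3 . (period); Color Space files must start with a 1 character primer base. | ERROR on Line <current line #>: Invalid character ('<invalid char>') in base sequence. |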
Reads should be of a configurable minimum length since many mappers will get into trouble with very short reads. | ERROR on Line <current line #>: Raw Sequence is shorter than the min read length: <read length> < <config min read length> |
Each Line of a Raw Sequence should have at least 1 character (not be blank). | ERROR on Line <current line #>: Looking for continuation of Raw Sequence or '+' instead found a blank line, assuming it was part of Raw Sequence. |
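
As a rough sketch of the character and length checks above, assuming the raw sequence has already been joined into a single string: the name `check_raw_sequence`, its parameters, and the decision to report every invalid character are illustrative assumptions, and the default set covers base space only (color space and its primer base are not handled here).

```python
# Allowable characters for base-space reads, as listed in the criteria.
BASE_CHARS = frozenset("ACTGNactgn")

def check_raw_sequence(sequence, line_no, min_read_length, allowed=BASE_CHARS):
    """Return the error messages for one raw (base) sequence."""
    prefix = f"ERROR on Line {line_no}: "
    errors = []
    # Every character must belong to the configured set.
    for ch in sequence:
        if ch not in allowed:
            errors.append(prefix + f"Invalid character ('{ch}') in base sequence.")
    # Empty and too-short reads share the same message.
    if len(sequence) < min_read_length:
        errors.append(prefix + "Raw Sequence is shorter than the min read length: "
                               f"{len(sequence)} < {min_read_length}")
    return errors
```

For example, `check_raw_sequence("AC", 6, 10)` yields a single message ending in `2 < 10`, while an empty sequence yields one ending in `0 < 10`.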

Plus Line

Validation Criteria | Error Message |
---|---|
Must exist for every sequence. | ERROR on Line <current line #>: Reached the end of the file without a '+' line. |
If the optional sequence identifier is specified, it must equal the one on the Sequence Identifier Line. | ERROR on Line <current line #>: Sequence Identifier on '+' line does not equal the one on the '@' line. |
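
The two '+' line rules might be expressed as follows. This is a minimal sketch under stated assumptions: the function `check_plus_line` is hypothetical, the caller is assumed to pass `None` when the file ended before a '+' line was found, and the optional identifier is assumed to run from just after the '+' to the first space.

```python
def check_plus_line(plus_line, identifier, line_no):
    """Return the error messages for the '+' line of one sequence.

    `plus_line` is None when the end of the file was reached first
    (an assumption about how the caller signals that case).
    """
    prefix = f"ERROR on Line {line_no}: "
    if plus_line is None:
        return [prefix + "Reached the end of the file without a '+' line."]
    # Anything after '+' (up to the first space) is the optional identifier.
    optional_id = plus_line[1:].split(" ", 1)[0]
    if optional_id and optional_id != identifier:
        return [prefix + "Sequence Identifier on '+' line does not equal "
                         "the one on the '@' line."]
    return []
```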

Quality Line

Validation Criteria | Error Message |
---|---|
A quality string should be present for every base sequence. | ERROR on Line <current line #>: Quality string length (<quality length>) does not equal raw sequence length (<raw sequence length>) |
Paired quality and base sequences should be of the same length. | ERROR on Line <current line #>: Quality string length (<quality length>) does not equal raw sequence length (<raw sequence length>) |
Valid quality values should all have ASCII codes > 32. | ERROR on Line <current line #>: Invalid character ('<invalid char>') in quality string. |
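
The quality checks reduce to a length comparison and a per-character ASCII test. The sketch below assumes a hypothetical helper `check_quality_string` that receives the quality string and the length of the matching raw sequence; a missing quality string is treated as an empty one, which then fails the length comparison.

```python
def check_quality_string(quality, sequence_length, line_no):
    """Return the error messages for one quality string."""
    prefix = f"ERROR on Line {line_no}: "
    errors = []
    # Quality values must have ASCII codes strictly greater than 32 (space).
    for ch in quality:
        if ord(ch) <= 32:
            errors.append(prefix + f"Invalid character ('{ch}') in quality string.")
    # The quality string must pair one-to-one with the raw sequence.
    if len(quality) != sequence_length:
        errors.append(prefix + f"Quality string length ({len(quality)}) does not "
                               f"equal raw sequence length ({sequence_length})")
    return errors
```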