FastQ Validation Criteria
From Genome Analysis Wiki
FastQ Sequence Validation Criteria
The following validation criteria are used by the FastQFile class and the FastQ Validator program when reading a FastQ sequence.
Sequence Identifier Line

Validation Criteria | Error Message |
---|---|
Line is at least 2 characters long ('@' and at least 1 for the sequence identifier) | ERROR on Line <current line #>: The sequence identifier line was too short. |
Line starts with an '@' | ERROR on Line <current line #>: First line of a sequence does not begin with @ |
Line does not contain a space between the '@' and the first sequence identifier (which must be at least 1 character). | ERROR on Line <current line #>: No Sequence Identifier specified before the comment. |
Every entry in the file should have a unique identifier. | ERROR on Line <current line #>: Repeated Sequence Identifier: <identifier> at Lines <previous line #> <current line #> |
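
The four identifier-line checks above can be sketched in Python as follows. This is only an illustration, not the FastQFile implementation: the function name `check_identifier_line`, the `seen` dictionary, and the choice to stop at the first failing check on a line are all assumptions made for the example. The message strings follow the table.

```python
def check_identifier_line(line, line_no, seen):
    """Check one sequence identifier line; return a list of error strings.

    `seen` (hypothetical helper state) maps each identifier already read to
    the line where it first appeared. It is updated in place when a new
    identifier is accepted.
    """
    prefix = f"ERROR on Line {line_no}: "
    # Needs '@' plus at least one identifier character.
    if len(line) < 2:
        return [prefix + "The sequence identifier line was too short."]
    if line[0] != "@":
        return [prefix + "First line of a sequence does not begin with @"]
    # A space right after '@' means there is no identifier before the comment.
    if line[1] == " ":
        return [prefix + "No Sequence Identifier specified before the comment."]
    # The identifier runs from just after '@' to the first space, if any.
    identifier = line[1:].split(" ", 1)[0]
    if identifier in seen:
        return [prefix + f"Repeated Sequence Identifier: {identifier} "
                f"at Lines {seen[identifier]} {line_no}"]
    seen[identifier] = line_no
    return []
```

For example, calling the function on `"@read1 comment"` at line 1 and then on `"@read1"` at line 5 with the same `seen` dictionary reports a repeated identifier on the second call.
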
Raw Sequence Line

Validation Criteria | Error Message |
---|---|
A base sequence should have non-zero length. | ERROR on Line <current line #>: Raw Sequence is shorter than the min read length: 0 < <config min read length> |
All characters in the base sequence must be in the allowable set specified via configuration. | ERROR on Line <current line #>: Invalid character ('<invalid char>') in base sequence. |
Reads should be of a configurable minimum length since many mappers will get into trouble with very short reads. | ERROR on Line <current line #>: Raw Sequence is shorter than the min read length: <read length> < <config min read length> |
Each Line of a Raw Sequence should have at least 1 character (not be blank). | ERROR on Line <current line #>: Looking for continuation of Raw Sequence or '+' instead found a blank line, assuming it was part of Raw Sequence. |
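
As a rough illustration of the raw sequence checks above, the Python sketch below validates the lines of one raw sequence. It is not the FastQFile code: the name `check_raw_sequence`, the default allowable set `"ACGTN"`, and the default minimum length of 10 are assumptions chosen for the example, standing in for whatever the configuration supplies. A zero-length sequence falls out of the same length check, which matches the shared error message in the table.

```python
def check_raw_sequence(lines, first_line_no, valid_bases="ACGTN", min_read_length=10):
    """Check the raw sequence lines of one read; return a list of error strings.

    `lines` holds the raw sequence lines without newlines; `first_line_no`
    is the file line number of the first one. Defaults are placeholders.
    """
    errors = []
    total_length = 0
    for offset, line in enumerate(lines):
        line_no = first_line_no + offset
        if line == "":
            # A blank line is reported, then treated as part of the sequence.
            errors.append(
                f"ERROR on Line {line_no}: Looking for continuation of Raw "
                "Sequence or '+' instead found a blank line, assuming it was "
                "part of Raw Sequence."
            )
            continue
        for ch in line:
            if ch not in valid_bases:
                errors.append(
                    f"ERROR on Line {line_no}: Invalid character ('{ch}') "
                    "in base sequence."
                )
        total_length += len(line)
    if total_length < min_read_length:
        # Report the length problem on the last line of the raw sequence.
        last_line_no = first_line_no + max(len(lines) - 1, 0)
        errors.append(
            f"ERROR on Line {last_line_no}: Raw Sequence is shorter than the "
            f"min read length: {total_length} < {min_read_length}"
        )
    return errors
```

Passing the allowable set and minimum length as parameters mirrors the table's point that both are configurable rather than fixed.
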
Plus Line

Validation Criteria | Error Message |
---|---|
A '+' line must exist for every sequence. | ERROR on Line <current line #>: Reached the end of the file without a '+' line. |
If the optional sequence identifier is specified, it must equal the one on the Sequence Identifier Line. | ERROR on Line <current line #>: Sequence Identifier on '+' line does not equal the one on the '@' line. |
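
The two '+' line rules above could be checked along these lines. This is a hedged Python sketch rather than the validator's actual logic: `check_plus_line` is a hypothetical name, `None` is used here to represent reaching the end of the file, and comparing only the identifier (the text up to the first space) rather than the whole line is an assumption.

```python
def check_plus_line(plus_line, line_no, identifier_line):
    """Check the '+' line of one read; return a list of error strings.

    `plus_line` is the line starting with '+', or None if the file ended
    before one was found. `identifier_line` is the read's '@' line.
    """
    prefix = f"ERROR on Line {line_no}: "
    if plus_line is None:
        return [prefix + "Reached the end of the file without a '+' line."]
    # Text after '+' up to the first space; empty when no identifier is given.
    plus_id = plus_line[1:].split(" ", 1)[0]
    seq_id = identifier_line[1:].split(" ", 1)[0]
    # The identifier on the '+' line is optional, so only compare if present.
    if plus_id and plus_id != seq_id:
        return [prefix + "Sequence Identifier on '+' line does not equal "
                "the one on the '@' line."]
    return []
```

A bare `"+"` passes, `"+read1"` passes when the '@' line is `"@read1 desc"`, and `"+read2"` against the same '@' line is reported as a mismatch.
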
Quality Line

Validation Criteria | Error Message |
---|---|
A quality string should be present for every base sequence. | ERROR on Line <current line #>: Quality string length (<quality length>) does not equal raw sequence length (<raw sequence length>) |
Paired quality and base sequences should be of the same length. | ERROR on Line <current line #>: Quality string length (<quality length>) does not equal raw sequence length (<raw sequence length>) |
Valid quality values should all have ASCII codes > 32. | ERROR on Line <current line #>: Invalid character ('<invalid char>') in quality string. |
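
The quality string rules above reduce to a character check and a length comparison. The Python sketch below shows one way to express them; `check_quality` is a hypothetical name, and reporting invalid characters before the length mismatch is an arbitrary ordering chosen for the example. Since the space character has ASCII code 32, the "> 32" rule rejects spaces and control characters.

```python
def check_quality(quality, raw_sequence, line_no):
    """Check one quality string against its raw sequence; return error strings."""
    errors = []
    for ch in quality:
        # Valid quality characters have ASCII codes strictly greater than 32.
        if ord(ch) <= 32:
            errors.append(
                f"ERROR on Line {line_no}: Invalid character ('{ch}') "
                "in quality string."
            )
    # A missing quality string has length 0, so the same comparison covers
    # both "present for every base sequence" and "same length".
    if len(quality) != len(raw_sequence):
        errors.append(
            f"ERROR on Line {line_no}: Quality string length ({len(quality)}) "
            f"does not equal raw sequence length ({len(raw_sequence)})"
        )
    return errors
```

This also shows why the first two rows of the table share one error message: an absent quality string is simply a length mismatch with a quality length of 0.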