Line 142: |
Line 142: |
| | | |
| == Validation Criteria Used For Reading a Sequence == | | == Validation Criteria Used For Reading a Sequence == |
− | {| class="wikitable" style="width:100%"
| + | [[FastQ Validation Criteria]] |
− | |+ style="font-size:150%" |'''Sequence Identifier Line'''
| |
− | ! width="50%"|Validation Criteria
| |
− | ! width="50%"|Error Message
| |
− | |-
| |
− | | Line is at least 2 characters long ('@' and at least 1 for the sequence identifier)
| |
− | | ERROR on Line <current line #>: The sequence identifier line was too short.
| |
− | |-
| |
− | | Line starts with an '@'
| |
− | | ERROR on Line <current line #>: First line of a sequence does not begin with @
| |
− | |-
| |
− | | Line does not contain a space between the '@' and the first sequence identifier (which must be at least 1 character).
| |
− | | ERROR on Line <current line #>: No Sequence Identifier specified before the comment.
| |
− | |-
| |
− | | Every entry in the file should have a unique identifier.
| |
− | | ERROR on Line <current line #>: Repeated Sequence Identifier: <identifier> at Lines <previous line #> <current line #>
| |
− | |}
| |
− | | |
− | | |
− | {| class="wikitable" style="width:100%" border="1"
| |
− | |+ style="font-size:150%"|'''Raw Sequence Line'''
| |
− | ! width="50%"|Validation Criteria
| |
− | ! width="50%"|Error Message
| |
− | |-
| |
− | | A base sequence should have non-zero length.
| |
− | | ERROR on Line <current line #>: Raw Sequence is shorter than the min read length: 0 < <config min read length>
| |
− | |-
| |
− | | All characters in the base sequence must be in the allowable set specified via configuration.
| |
− | * Base Only: A C T G N a c t g n
| |
− | * Color Space Only: 0 1 2 3 .(period)
| |
− | | ERROR on Line <current line #>: Invalid character ('<invalid char>') in base sequence.
| |
− | |-
| |
− | | Reads should be of a configurable minimum length since many mappers will get into trouble with very short reads.
| |
− | * If the raw sequence spans lines, the sum of the lengths of all lines is validated, not each individual line.
| |
− | | ERROR on Line <current line #>: Raw Sequence is shorter than the min read length: <read length> < <config min read length>
| |
− | |-
| |
− | | Each Line of a Raw Sequence should have at least 1 character (not be blank).
| |
− | | ERROR on Line <current line #>: Looking for continuation of Raw Sequence or '+' instead found a blank line, assuming it was part of Raw Sequence.
| |
− | |}
| |
− | | |
− | | |
− | {| class="wikitable" style="width:100%" border="1"
| |
− | |+ style="font-size:150%"|'''Plus Line'''
| |
− | ! width="50%"|Validation Criteria
| |
− | ! width="50%"|Error Message
| |
− | |-
| |
− | | Must exist for every sequence.
| |
− | | ERROR on Line <current line #>: Reached the end of the file without a '+' line.
| |
− | |-
| |
− | | If the optional sequence identifier is specified, it must equal the one on the Sequence Identifier Line.
| |
− | | ERROR on Line <current line #>: Sequence Identifier on '+' line does not equal the one on the '@' line.
| |
− | |}
| |
− | | |
− | | |
− | {| class="wikitable" style="width:100%" border="1"
| |
− | |+ style="font-size:150%"|'''Quality String Line'''
| |
− | ! width="50%"|Validation Criteria
| |
− | ! width="50%"|Error Message
| |
− | |-
| |
− | | A quality string should be present for every base sequence.
| |
− | | ERROR on Line <current line #>: Quality string length (<quality length>) does not equal raw sequence length (<raw sequence length>)
| |
− | |-
| |
− | | Paired quality and base sequences should be of the same length.
| |
− | | ERROR on Line <current line #>: Quality string length (<quality length>) does not equal raw sequence length (<raw sequence length>)
| |
− | |-
| |
− | | Valid quality values should all have ASCII codes > 32.
| |
− | | ERROR on Line <current line #>: Invalid character ('<invalid char>') in quality string.
| |
− | |}
| |
− | | |
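The criteria in the four tables above can be illustrated with a minimal Python sketch. It is not the validator's actual implementation: the function name `validate_record`, its parameters, and the `seen_ids` dictionary are hypothetical, it assumes each record occupies exactly four lines (no raw sequence spanning lines), and it checks only the "Base Only" character set. The error strings follow the tables.

```python
BASE_CHARS = set("ACTGNactgn")  # "Base Only" set from the Raw Sequence table


def validate_record(lines, start_line=1, min_read_length=1, seen_ids=None):
    """Check one four-line FASTQ record; return a list of error strings.

    Hypothetical helper: `seen_ids` maps identifier -> line number where it
    was first seen, and is updated in place when provided.
    """
    id_line, raw, plus, qual = lines
    errors = []

    def err(offset, message):
        errors.append(f"ERROR on Line {start_line + offset}: {message}")

    # Sequence Identifier Line
    seq_id = None
    if len(id_line) < 2:
        err(0, "The sequence identifier line was too short.")
    elif id_line[0] != "@":
        err(0, "First line of a sequence does not begin with @")
    elif id_line[1] == " ":
        err(0, "No Sequence Identifier specified before the comment.")
    else:
        seq_id = id_line[1:].split(" ", 1)[0]
        if seen_ids is not None:
            if seq_id in seen_ids:
                err(0, f"Repeated Sequence Identifier: {seq_id} "
                       f"at Lines {seen_ids[seq_id]} {start_line}")
            else:
                seen_ids[seq_id] = start_line

    # Raw Sequence Line (single line assumed in this sketch)
    if len(raw) < min_read_length:
        err(1, "Raw Sequence is shorter than the min read length: "
               f"{len(raw)} < {min_read_length}")
    for ch in raw:
        if ch not in BASE_CHARS:
            err(1, f"Invalid character ('{ch}') in base sequence.")

    # Plus Line: the identifier after '+' is optional, but must match if given
    plus_id = plus[1:].split(" ", 1)[0]
    if plus_id and seq_id is not None and plus_id != seq_id:
        err(2, "Sequence Identifier on '+' line does not equal "
               "the one on the '@' line.")

    # Quality String Line
    if len(qual) != len(raw):
        err(3, f"Quality string length ({len(qual)}) does not equal "
               f"raw sequence length ({len(raw)})")
    for ch in qual:
        if ord(ch) <= 32:
            err(3, f"Invalid character ('{ch}') in quality string.")

    return errors
```

Calling `validate_record(["@r1", "ACGT", "+", "IIII"])` returns an empty list. Passing the same dictionary as `seen_ids` across successive records lets the sketch report repeated identifiers together with the line on which each was first seen.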
| | | |
| == Reading Sequence Assumptions == | | == Reading Sequence Assumptions == |