Difference between revisions of "SAM Validation Criteria"
From Genome Analysis Wiki
Revision as of 14:52, 17 March 2010
SAM Header Validation Rules
TODO
SAM Alignment Validation
Validation Criteria | Implemented (SAM) | Implemented (BAM) | Tested (SAM) | Tested (BAM)
---|---|---|---|---
QNAME.Length() > 0 and <= 254 | | | |
QNAME does not contain [ \t\n\r] | | | |
FLAG is an integer [0-9]+ | | | |
FLAG is < 2048 (I think) or in [0, (2^16)-1] | | | |
RNAME does not contain [ \t\n\r@=] | | | |
POS is an integer [0-9]+ | | | |
POS is in [0, (2^29)-1] | | | |
MAPQ is an integer [0-9]+ | | | |
MAPQ is in [0, (2^8)-1] | | | |
CIGAR is ([0-9]+[MIDNSHP])+ or \* | | | |
MRNM does not contain [ \t\n\r@] ('=' means it is the same as RNAME) | | | |
If SQ is in the header, RNAME and MRNM (if not “=”) must be in SQ. | | | |
MPOS is an integer [0-9]+ | | | |
MPOS is in [0, (2^29)-1] | | | |
ISIZE is an integer -?[0-9]+ | | | |
ISIZE is in [-(2^29), 2^29] | | | |
SEQ is [acgtnACGTN.=]+ or \* | | | |
If SEQ is * then QUAL is * | | | |
QUAL is [!-~]+ or * (ASCII decimal 33-126; * is decimal 42, which lies within 33-126). For BAM, each quality value is in [0, 93]. | | | |
If QUAL is not “*”, it is the same length as SEQ. | | | |
TAG is [A-Z][A-Z0-9] | | | |
A TAG appears only once per alignment | | | |
VTYPE is [AifZH] for SAM and [AcCsSiIfZH] for BAM | | | |
VALUE does NOT contain [\t\n\r] | | | |
For VTYPE = “A”, VALUE is a printable character. | | | |
For VTYPE = “i”, VALUE is a signed 32-bit integer. | | | |
For VTYPE = “f”, VALUE is a single-precision float. | | | |
For VTYPE = “Z”, VALUE is a printable string. | | | |
For VTYPE = “H”, VALUE is a Hex string. | | | |
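As a rough illustration of how the mandatory-field criteria in the table could be checked for a single line of SAM text, here is a minimal Python sketch. It is not the validator this page is tracking: the function name `validate_alignment`, the helper `in_range`, and the error labels are all hypothetical. It also assumes the wider FLAG range [0, (2^16)-1], since the table leaves that choice open, and it does not cover the SQ header check or the BAM-specific rules.

```python
import re

# Patterns transcribed from the criteria table; the names are illustrative only.
QNAME_BAD = re.compile(r"[ \t\n\r]")
RNAME_BAD = re.compile(r"[ \t\n\r@=]")
MRNM_BAD = re.compile(r"[ \t\n\r@]")
CIGAR_RE = re.compile(r"([0-9]+[MIDNSHP])+|\*")
SEQ_RE = re.compile(r"[acgtnACGTN.=]+|\*")
QUAL_RE = re.compile(r"[!-~]+|\*")
UINT_RE = re.compile(r"[0-9]+")
INT_RE = re.compile(r"-?[0-9]+")


def in_range(text, pattern, low, high):
    """True if text fully matches pattern and its integer value is in [low, high]."""
    return pattern.fullmatch(text) is not None and low <= int(text) <= high


def validate_alignment(line):
    """Return the labels of the criteria that one SAM alignment line fails."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than 11 mandatory fields"]
    qname, flag, rname, pos, mapq, cigar, mrnm, mpos, isize, seq, qual = fields[:11]
    checks = [
        ("QNAME length", 0 < len(qname) <= 254),
        ("QNAME characters", QNAME_BAD.search(qname) is None),
        # Assumption: the table leaves the FLAG range open; 16 bits is used here.
        ("FLAG", in_range(flag, UINT_RE, 0, 2**16 - 1)),
        ("RNAME characters", RNAME_BAD.search(rname) is None),
        ("POS", in_range(pos, UINT_RE, 0, 2**29 - 1)),
        ("MAPQ", in_range(mapq, UINT_RE, 0, 2**8 - 1)),
        ("CIGAR", CIGAR_RE.fullmatch(cigar) is not None),
        ("MRNM characters", MRNM_BAD.search(mrnm) is None),
        ("MPOS", in_range(mpos, UINT_RE, 0, 2**29 - 1)),
        ("ISIZE", in_range(isize, INT_RE, -(2**29), 2**29)),
        ("SEQ", SEQ_RE.fullmatch(seq) is not None),
        ("QUAL", QUAL_RE.fullmatch(qual) is not None),
        ("QUAL is * when SEQ is *", seq != "*" or qual == "*"),
        ("QUAL length equals SEQ length", qual == "*" or len(qual) == len(seq)),
    ]
    return [label for label, ok in checks if not ok]
```

Calling `validate_alignment` on a tab-separated alignment line returns an empty list when every check passes, so the result can be used directly as a pass/fail signal while still naming each failed criterion.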
NOTE: There are other TAG Validations that can be done. They will come later.
NOTE: There may be other BAM Validations that can be done. They will come later.
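The TAG, VTYPE and VALUE rows of the table can be sketched in the same way for SAM text. The Python below is only an illustration under stated assumptions: the function name `validate_optional_fields` is hypothetical, "printable" is taken to mean ASCII 33-126 for type A and ASCII 32-126 for type Z, the float pattern is a guess at an acceptable syntax, and only the SAM types [AifZH] are handled. It does not check that a float fits in single precision.

```python
import re

TAG_RE = re.compile(r"[A-Z][A-Z0-9]")

# One pattern per SAM VTYPE. The exact character sets are assumptions.
VALUE_RULES = {
    "A": re.compile(r"[!-~]"),
    "i": re.compile(r"-?[0-9]+"),
    "f": re.compile(r"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?"),
    "Z": re.compile(r"[ -~]+"),
    "H": re.compile(r"[0-9A-Fa-f]+"),
}


def validate_optional_fields(fields):
    """Return failure messages for a list of 'TAG:VTYPE:VALUE' strings."""
    errors = []
    seen = set()
    for field in fields:
        # Split at most twice so that a ':' inside VALUE is preserved.
        parts = field.split(":", 2)
        if len(parts) != 3:
            errors.append(f"{field}: not TAG:VTYPE:VALUE")
            continue
        tag, vtype, value = parts
        if TAG_RE.fullmatch(tag) is None:
            errors.append(f"{tag}: bad TAG")
        if tag in seen:
            errors.append(f"{tag}: duplicate TAG")
        seen.add(tag)
        rule = VALUE_RULES.get(vtype)
        if rule is None:
            errors.append(f"{tag}: bad VTYPE {vtype}")
        elif rule.fullmatch(value) is None:
            # None of the patterns admit tab, newline or carriage return.
            errors.append(f"{tag}: bad VALUE for VTYPE {vtype}")
        elif vtype == "i" and not -(2**31) <= int(value) <= 2**31 - 1:
            errors.append(f"{tag}: VALUE outside signed 32-bit range")
    return errors
```

The input is expected to be the fields that follow the eleven mandatory columns of an alignment line, already split on tabs.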
SAM Questions
- Comment says: “If the mapping position of the query is not available, RNAME and CIGAR are set as “*”, and POS and MAPQ as 0.” Is it all or nothing? Can some be set to “*”/0 but not all?
- Same question for MRNM = “*” and MPOS & ISIZE = 0
- Comment says: “The name of a pair/read is required to be unique in the SAM file, but one pair/read may appear multiple times in different alignment records, representing multiple or split hits.” Is there anything here that needs to be validated?
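Until the "all or nothing" questions above are settled, a validator could at least report records where only some fields of a group carry their "not available" value. The following Python sketch does that and nothing more. The names `QUERY_MARKERS`, `MATE_MARKERS` and `classify` are hypothetical, and the record is assumed to be a dictionary with the integer fields already parsed.

```python
# "Not available" values quoted in the questions above.
QUERY_MARKERS = {"RNAME": "*", "CIGAR": "*", "POS": 0, "MAPQ": 0}
MATE_MARKERS = {"MRNM": "*", "MPOS": 0, "ISIZE": 0}


def classify(record, markers):
    """Return ('all' | 'none' | 'partial', names of fields set to their marker)."""
    unset = [name for name, marker in markers.items() if record[name] == marker]
    if not unset:
        return "none", unset
    if len(unset) == len(markers):
        return "all", unset
    return "partial", unset


record = {
    "RNAME": "*", "CIGAR": "*", "POS": 0, "MAPQ": 0,
    "MRNM": "chr1", "MPOS": 100, "ISIZE": 0,
}
query_state = classify(record, QUERY_MARKERS)  # every query field is unavailable
mate_state = classify(record, MATE_MARKERS)    # only ISIZE is 0
```

A "partial" result is not necessarily an error. A value of 0 can be legitimate on its own, for example a MAPQ of 0 on a read that did map, so this check is better treated as a warning to be reviewed than as a hard validation failure.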