Difference between revisions of "SAM Validation Criteria"
From Genome Analysis Wiki
Revision as of 14:44, 17 March 2010
SAM Header Validation Rules
TODO
SAM Alignment Validation
Validation Criteria | Implemented | Tested |
---|---|---|
QNAME.Length() is > 0 and <= 254 | | |
QNAME does not contain [ \t\n\r] | | |
FLAG is an integer [0-9]+ | | |
FLAG is < 2048 (unconfirmed) or in [0, (2^16)-1] | | |
RNAME does not contain [ \t\n\r@=] | | |
POS is an integer [0-9]+ | | |
POS is in [0, (2^29)-1] | | |
MAPQ is an integer [0-9]+ | | |
MAPQ is in [0, (2^8)-1] | | |
CIGAR is ([0-9]+[MIDNSHP])+ or * | | |
MRNM does not contain [ \t\n\r@] ('=' means it is the same as RNAME) | | |
If SQ is in the header, RNAME and MRNM (if not “=”) must be in SQ. | | |
MPOS is an integer [0-9]+ | | |
MPOS is in [0, (2^29)-1] | | |
ISIZE is an integer -?[0-9]+ | | |
ISIZE is in [-(2^29), 2^29] | | |
SEQ is [acgtnACGTN.=]+ or * | | |
If SEQ is * then QUAL is * | | |
QUAL is [!-~]+ or *, that is, decimal 33 to 126 (* is decimal 42, which falls within that range). For BAM, it is in [0, 93]. | | |
If QUAL is not “*”, it is the same length as SEQ. | | |
TAG is [A-Z][A-Z0-9] | | |
A TAG appears only once per alignment | | |
VTYPE is [AifZH] for SAM and [AcCsSiIfZH] for BAM | | |
VALUE does not contain [\t\n\r] | | |
For VTYPE = “A”, VALUE is a printable character. | | |
For VTYPE = “i”, VALUE is a signed 32-bit integer. | | |
For VTYPE = “f”, VALUE is a single-precision float. | | |
For VTYPE = “Z”, VALUE is a printable string. | | |
For VTYPE = “H”, VALUE is a hex string. | | |
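The rows for the eleven mandatory fields could be checked roughly as in the sketch below. This is an illustration only, not the wiki's implementation: the function names (`validate_alignment`, `check_int`, `check_name`) and the `sq_names` parameter are hypothetical, the FLAG range uses [0, (2^16)-1] because the table is unsure about 2048, and exempting "*" from the SQ lookup is an assumption the table does not state.

```python
import re

INT_RE = re.compile(r"[0-9]+")
SIGNED_INT_RE = re.compile(r"-?[0-9]+")
CIGAR_RE = re.compile(r"([0-9]+[MIDNSHP])+|\*")
SEQ_RE = re.compile(r"[acgtnACGTN.=]+|\*")
QUAL_RE = re.compile(r"[!-~]+")  # "*" is decimal 42, so it matches too


def check_int(name, text, low, high, errors, pattern=INT_RE):
    """Append an error if text is not an integer in [low, high]."""
    if not pattern.fullmatch(text):
        errors.append(f"{name} is not an integer")
    elif not low <= int(text) <= high:
        errors.append(f"{name} is outside [{low}, {high}]")


def check_name(name, text, forbidden, errors):
    """Append an error if text contains any forbidden character."""
    if any(ch in forbidden for ch in text):
        errors.append(f"{name} contains a forbidden character")


def validate_alignment(line, sq_names=None):
    """Return a list of rule violations for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than 11 mandatory fields"]
    (qname, flag, rname, pos, mapq, cigar,
     mrnm, mpos, isize, seq, qual) = fields[:11]
    errors = []

    if not 0 < len(qname) <= 254:
        errors.append("QNAME length is outside [1, 254]")
    check_name("QNAME", qname, " \t\n\r", errors)
    # Assumption: the wider of the two FLAG ranges in the table is used.
    check_int("FLAG", flag, 0, 2**16 - 1, errors)
    check_name("RNAME", rname, " \t\n\r@=", errors)
    check_int("POS", pos, 0, 2**29 - 1, errors)
    check_int("MAPQ", mapq, 0, 2**8 - 1, errors)
    if not CIGAR_RE.fullmatch(cigar):
        errors.append("CIGAR is malformed")
    check_name("MRNM", mrnm, " \t\n\r@", errors)
    if sq_names is not None:
        for name, value in (("RNAME", rname), ("MRNM", mrnm)):
            # Assumption: "*" (no reference) is exempt, like "=".
            if value not in ("=", "*") and value not in sq_names:
                errors.append(f"{name} is not listed in SQ")
    check_int("MPOS", mpos, 0, 2**29 - 1, errors)
    check_int("ISIZE", isize, -(2**29), 2**29, errors, SIGNED_INT_RE)
    if not SEQ_RE.fullmatch(seq):
        errors.append("SEQ is malformed")
    if not QUAL_RE.fullmatch(qual):
        errors.append("QUAL is malformed")
    if seq == "*" and qual != "*":
        errors.append("SEQ is * but QUAL is not")
    elif qual != "*" and len(qual) != len(seq):
        errors.append("QUAL and SEQ lengths differ")
    return errors
```

Collecting every violation in a list, rather than stopping at the first, makes it easier to fill in the Implemented and Tested columns one rule at a time.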
NOTE: There are other TAG Validations that can be done. They will come later.
NOTE: There are other BAM Validations that can be done. They will come later.
SAM Questions
- Comment says: “If the mapping position of the query is not available, RNAME and CIGAR are set as “*”, and POS and MAPQ as 0.” Is it all or nothing? Can some of these fields be set to “*”/0 while the others are not?
- The same question applies to MRNM = “*” with MPOS and ISIZE = 0.
- Comment says: “The name of a pair/read is required to be unique in the SAM file, but one pair/read may appear multiple times in different alignment records, representing multiple or split hits.” Is there anything here that needs to be validated?
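Until the first question is answered, one way to see how real files behave is to classify each record instead of rejecting it. The sketch below is exploratory and does not encode a confirmed rule: the function name `unmapped_state` and the three labels are hypothetical, fields are compared as the raw strings from the SAM line, and MAPQ = 0 alone is not treated as an unmapped signal, on the assumption that a mapped read may also have MAPQ 0.

```python
def unmapped_state(rname, pos, mapq, cigar):
    """Classify a record as 'mapped', 'unmapped', or 'mixed'."""
    flags = [rname == "*", cigar == "*", pos == "0", mapq == "0"]
    if all(flags):
        return "unmapped"
    # Assumption: MAPQ 0 is legal for a mapped read, so only the first
    # three flags count as evidence of an unmapped record.
    if any(flags[:3]):
        return "mixed"
    return "mapped"
```

Counting the 'mixed' records in existing SAM files would show whether partial settings occur in practice. The same approach would apply to MRNM, MPOS, and ISIZE.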