SAM Validation Criteria

From Genome Analysis Wiki

Revision as of 14:44, 17 March 2010

SAM Header Validation Rules

TODO

SAM Alignment Validation

SAM Alignment Record
Validation Criteria Implemented Tested
QNAME.Length() > 0 and <= 254
QNAME does not contain [ \t\n\r]
FLAG is an integer [0-9]+
FLAG is in [0, (2^16)-1] (probably < 2048 in practice, since the defined flag bits stop at 0x0400; to be confirmed)
RNAME does not contain [ \t\n\r@=]
POS is an integer [0-9]+
POS is [0, (2^29)-1]
MAPQ is an integer [0-9]+
MAPQ is [0, (2^8)-1]
CIGAR is ([0-9]+[MIDNSHP])+|\*
MRNM does not contain [ \t\n\r@] ('=' means it is the same as RNAME)
If SQ is in the header, RNAME and MRNM (if not “=”) must be in SQ.
MPOS is an integer [0-9]+
MPOS is [0, (2^29)-1]
ISIZE is an integer -?[0-9]+
ISIZE is [-(2^29), 2^29]
SEQ is [acgtnACGTN.=]+|\*
If SEQ is * then QUAL is *
QUAL is [!-~]+|\* → ASCII decimal 33 to 126; “*” is decimal 42, which is itself within 33 to 126 (for BAM, each quality value is in [0, 93])
If QUAL is not “*” it is the same length as SEQ.
TAG is [A-Z][A-Z0-9]
A TAG only appears once per alignment
VTYPE is [AifZH] for SAM and [AcCsSiIfZH] for BAM
VALUE does NOT contain [\t\n\r]
For VTYPE = “A”, VALUE is a printable character
For VTYPE = “i”, VALUE is a signed 32-bit integer.
For VTYPE = “f”, VALUE is a single-precision float.
For VTYPE = “Z”, VALUE is a printable string.
For VTYPE = “H”, VALUE is a Hex string.
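
As an illustration only, the checks on the 11 mandatory fields could be sketched in Python as below. The patterns and ranges are copied from the table; the function names, the error strings, and the sq_names parameter are hypothetical and are not part of any existing validator. Treating “*” as exempt from the SQ lookup is an assumption, since the table only exempts “=”.

```python
import re

# Patterns copied from the criteria table.
UINT_RE = re.compile(r"[0-9]+")
INT_RE = re.compile(r"-?[0-9]+")
CIGAR_RE = re.compile(r"(?:[0-9]+[MIDNSHP])+|\*")
SEQ_RE = re.compile(r"[acgtnACGTN.=]+|\*")
QUAL_RE = re.compile(r"[!-~]+|\*")


def check_int(name, text, pattern, low, high, errors):
    """Append an error unless text matches pattern and lies in [low, high]."""
    if not pattern.fullmatch(text):
        errors.append(f"{name} is not an integer")
    elif not low <= int(text) <= high:
        errors.append(f"{name} is out of range")


def validate_record(line, sq_names=None):
    """Return a list of error strings for one tab-separated alignment line.

    sq_names (hypothetical parameter) is the set of SQ names from the
    header, or None when the header has no SQ entries.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than 11 mandatory fields"]
    (qname, flag, rname, pos, mapq, cigar,
     mrnm, mpos, isize, seq, qual) = fields[:11]
    errors = []

    if not 0 < len(qname) <= 254:
        errors.append("QNAME length is not in 1..254")
    if re.search(r"[ \t\n\r]", qname):
        errors.append("QNAME contains whitespace")
    check_int("FLAG", flag, UINT_RE, 0, 2**16 - 1, errors)
    if re.search(r"[ \t\n\r@=]", rname):
        errors.append("RNAME contains a forbidden character")
    check_int("POS", pos, UINT_RE, 0, 2**29 - 1, errors)
    check_int("MAPQ", mapq, UINT_RE, 0, 2**8 - 1, errors)
    if not CIGAR_RE.fullmatch(cigar):
        errors.append("CIGAR is malformed")
    if re.search(r"[ \t\n\r@]", mrnm):
        errors.append("MRNM contains a forbidden character")
    check_int("MPOS", mpos, UINT_RE, 0, 2**29 - 1, errors)
    check_int("ISIZE", isize, INT_RE, -(2**29), 2**29, errors)
    if not SEQ_RE.fullmatch(seq):
        errors.append("SEQ is malformed")
    if not QUAL_RE.fullmatch(qual):
        errors.append("QUAL is malformed")

    # Cross-field checks between SEQ and QUAL.
    if seq == "*" and qual != "*":
        errors.append("SEQ is * but QUAL is not *")
    elif qual != "*" and len(qual) != len(seq):
        errors.append("QUAL and SEQ differ in length")

    # SQ lookup; exempting "*" is an assumption, "=" comes from the table.
    if sq_names is not None:
        for name, value in (("RNAME", rname), ("MRNM", mrnm)):
            if value not in ("*", "=") and value not in sq_names:
                errors.append(f"{name} is not in the SQ header")
    return errors
```

An empty list means the record passed every check in this sketch; collecting all errors rather than stopping at the first makes it easier to fill in the Implemented and Tested columns one criterion at a time.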

NOTE: There are other TAG Validations that can be done. They will come later.
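
The TAG, VTYPE, and VALUE rows that are already in the table could be checked along the following lines. This is a sketch under stated assumptions: optional fields are taken to be written as TAG:VTYPE:VALUE in SAM text, "printable" is read as ASCII "!" through "~" for type A and space through "~" for type Z, and the float and hex patterns are illustrative choices rather than rules from the table. All names are hypothetical.

```python
import re
import struct

TAG_RE = re.compile(r"[A-Z][A-Z0-9]")
INT_RE = re.compile(r"-?[0-9]+")
# Assumed text syntax for floats and hex strings; not stated in the table.
FLOAT_RE = re.compile(r"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?")
HEX_RE = re.compile(r"[0-9A-Fa-f]+")
SAM_VTYPES = set("AifZH")  # text SAM only; BAM adds cCsSI


def value_ok(vtype, value):
    """Return True if value is acceptable for the given SAM VTYPE."""
    if vtype == "A":
        return len(value) == 1 and "!" <= value <= "~"
    if vtype == "i":
        return bool(INT_RE.fullmatch(value)) and -2**31 <= int(value) <= 2**31 - 1
    if vtype == "f":
        if not FLOAT_RE.fullmatch(value):
            return False
        try:
            # Packing as a 4-byte float fails when the value is too large.
            struct.pack("<f", float(value))
        except (OverflowError, struct.error):
            return False
        return True
    if vtype == "Z":
        return all(" " <= ch <= "~" for ch in value)
    if vtype == "H":
        return bool(HEX_RE.fullmatch(value))
    return False


def validate_tags(optional_fields):
    """Return a list of error strings for the optional fields of one record."""
    errors = []
    seen = set()
    for field in optional_fields:
        parts = field.split(":", 2)
        if len(parts) != 3:
            errors.append(f"{field!r} is not TAG:VTYPE:VALUE")
            continue
        tag, vtype, value = parts
        if not TAG_RE.fullmatch(tag):
            errors.append(f"{tag!r} is not a valid TAG")
        if tag in seen:
            errors.append(f"TAG {tag} appears more than once")
        seen.add(tag)
        if vtype not in SAM_VTYPES:
            errors.append(f"{tag}: VTYPE {vtype!r} is not valid in SAM text")
        elif re.search(r"[\t\n\r]", value):
            errors.append(f"{tag}: VALUE contains a tab or newline")
        elif not value_ok(vtype, value):
            errors.append(f"{tag}: VALUE is not valid for VTYPE {vtype}")
    return errors
```

For example, validate_tags(fields[11:]) would cover everything after the 11 mandatory columns of a split alignment line.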

NOTE: There are other BAM Validations that can be done. They will come later.

SAM Questions

  • Comment says: “If the mapping position of the query is not available, RNAME and CIGAR are set as ‘*’, and POS and MAPQ as 0.” Is it all or nothing? Can some be set to “*”/0 but not all?
    • Same question for MRNM = “*” and MPOS & ISIZE = 0
  • Comment says: “The name of a pair/read is required to be unique in the SAM file, but one pair/read may appear multiple times in different alignment records, representing multiple or split hits.” Is there anything here that needs to be validated?
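
Until the all-or-nothing question is answered, one option is to report partially set placeholders as warnings rather than errors. The sketch below assumes one possible reading: when RNAME (or MRNM) is “*”, the companion fields named in the comment are expected to hold their placeholder values too. It deliberately does not check the reverse direction, because a mapped read can legitimately have MAPQ 0 or ISIZE 0. The function names are hypothetical, and the field values are the raw text from the alignment line.

```python
def partial_placeholders(anchor_name, anchor_value, companions):
    """List companion fields that are set although the anchor field is "*".

    companions maps a field name to a (value, placeholder) pair.
    """
    if anchor_value != "*":
        return []
    return [
        f"{anchor_name} is '*' but {name} is {value!r}, not {placeholder!r}"
        for name, (value, placeholder) in companions.items()
        if value != placeholder
    ]


def check_unmapped(rname, pos, mapq, cigar, mrnm, mpos, isize):
    """Return warnings for records that only partly follow the comment."""
    warnings = partial_placeholders(
        "RNAME", rname,
        {"CIGAR": (cigar, "*"), "POS": (pos, "0"), "MAPQ": (mapq, "0")},
    )
    warnings += partial_placeholders(
        "MRNM", mrnm,
        {"MPOS": (mpos, "0"), "ISIZE": (isize, "0")},
    )
    return warnings
```

If the answer turns out to be that partial settings are allowed, these warnings can simply be dropped; if not, they can be promoted to errors.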