From Genome Analysis Wiki
=== SAM Header Validation Rules ===
TODO

=== SAM Alignment Validation ===
{| class="wikitable" style="width:100%" border="1"
|+ style="font-size:150%"|'''SAM Alignment Record'''
! width="70%"|Validation Criteria
! width="15%"|Implemented
! width="15%"|Tested
|-
| QNAME length is > 0 and <= 254
|
|
|-
| QNAME does not contain [ \t\n\r]
|
|
|-
| FLAG is an integer [0-9]+
|
|
|-
| FLAG is in [0, (2^16)-1]; only bits 0x1 through 0x400 are defined, so a FLAG using only defined bits is < 2048
|
|
|-
| RNAME does not contain [ \t\n\r@=]
|
|
|-
| POS is an integer [0-9]+
|
|
|-
| POS is [0, (2^29)-1]
|
|
|-
| MAPQ is an integer [0-9]+
|
|
|-
| MAPQ is [0, (2^8)-1]
|
|
|-
| <nowiki>CIGAR ([0-9]+[MIDNSHP])+|\*</nowiki>
|
|
|-
| MRNM does not contain [ \t\n\r@] ('=' means it is the same as RNAME)
|
|
|-
| If @SQ lines are present in the header, RNAME and MRNM (unless “=” or “*”) must match the SN value of an @SQ line.
|
|
|-
| MPOS is an integer [0-9]+
|
|
|-
| MPOS is [0, (2^29)-1]
|
|
|-
| ISIZE is an integer -?[0-9]+
|
|
|-
| ISIZE is [-(2^29), 2^29]
|
|
|-
| <nowiki>SEQ is [acgtnACGTN.=]+|\*</nowiki>
|
|
|-
| If SEQ is * then QUAL is *
|
|
|-
| <nowiki>QUAL is [!-~]+|\*</nowiki>: each character is ASCII 33 to 126, and “*” (ASCII 42) also falls in that range. For BAM, each quality value (ASCII value minus 33) is in [0, 93].
|
|
|-
| If QUAL is not “*” it is the same length as SEQ.
|
|
|-
| TAG is [A-Z][A-Z0-9]
|
|
|-
| Each TAG appears at most once per alignment record
|
|
|-
| VTYPE is [AifZH] for SAM and [AcCsSiIfZH] for BAM
|
|
|-
| VALUE does NOT contain [\t\n\r]
|
|
|-
| For VTYPE = “A”, VALUE is a printable character
|
|
|-
| For VTYPE = “i”, VALUE is a signed 32-bit integer.
|
|
|-
| For VTYPE = “f”, VALUE is a single-precision float.
|
|
|-
| For VTYPE = “Z”, VALUE is a printable string.
|
|
|-
| For VTYPE = “H”, VALUE is a hex string ([0-9A-F]+).
|
|
|}
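The mandatory-field rows of the table above can be sketched in Python as a minimal, hypothetical validator. It assumes the input is one tab-delimited SAM alignment line and that the caller has already collected the SN values of any @SQ header lines into a set; the names validate_mandatory_fields and check_int are illustrative only, not part of an existing library. The regular expressions and ranges are copied from the table, with FLAG checked against the full 16-bit range rather than the stricter < 2048 bound.

```python
import re

# Patterns copied from the validation table (pre-1.4 SAM field names).
QNAME_RE = re.compile(r"[^ \t\n\r]{1,254}")
INT_RE = re.compile(r"[0-9]+")
SIGNED_INT_RE = re.compile(r"-?[0-9]+")
RNAME_RE = re.compile(r"[^ \t\n\r@=]+")
MRNM_RE = re.compile(r"[^ \t\n\r@]+")
CIGAR_RE = re.compile(r"([0-9]+[MIDNSHP])+|\*")


def check_int(name, text, low, high, pattern, errors):
    """Append an error unless text matches pattern and lies in [low, high]."""
    if not pattern.fullmatch(text):
        errors.append(f"{name} is not an integer: {text!r}")
    elif not low <= int(text) <= high:
        errors.append(f"{name} out of range [{low}, {high}]: {text}")


def validate_mandatory_fields(line, sq_names=None):
    """Return error messages for the first nine fields of one SAM record.

    sq_names is a hypothetical, caller-supplied set of @SQ SN values.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return [f"expected at least 11 tab-separated fields, got {len(fields)}"]
    qname, flag, rname, pos, mapq, cigar, mrnm, mpos, isize = fields[:9]
    errors = []

    if not QNAME_RE.fullmatch(qname):
        errors.append(f"bad QNAME: {qname!r}")
    # Assumption: use the 16-bit range, not the stricter "< 2048" bound.
    check_int("FLAG", flag, 0, 2**16 - 1, INT_RE, errors)
    if not RNAME_RE.fullmatch(rname):
        errors.append(f"bad RNAME: {rname!r}")
    check_int("POS", pos, 0, 2**29 - 1, INT_RE, errors)
    check_int("MAPQ", mapq, 0, 2**8 - 1, INT_RE, errors)
    if not CIGAR_RE.fullmatch(cigar):
        errors.append(f"bad CIGAR: {cigar!r}")
    if not MRNM_RE.fullmatch(mrnm):
        errors.append(f"bad MRNM: {mrnm!r}")
    check_int("MPOS", mpos, 0, 2**29 - 1, INT_RE, errors)
    check_int("ISIZE", isize, -(2**29), 2**29, SIGNED_INT_RE, errors)

    # Only enforce @SQ membership when the header declared @SQ lines.
    if sq_names:
        if rname != "*" and rname not in sq_names:
            errors.append(f"RNAME not in @SQ: {rname}")
        if mrnm not in ("=", "*") and mrnm not in sq_names:
            errors.append(f"MRNM not in @SQ: {mrnm}")
    return errors
```

For example, a record with FLAG 70000, MAPQ 300, CIGAR "8X" and an undeclared RNAME yields four separate messages, so one pass reports every problem in the record instead of stopping at the first.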

NOTE: There are other TAG Validations that can be done. They will come later.
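The TAG, VTYPE and VALUE rows of the table above could be checked along these lines. This is a sketch for SAM text only (VTYPEs A, i, f, Z, H); the BAM-only types c, C, s, S, I are not handled. The float syntax, the hex alphabet [0-9A-F], and the reading of "printable" as ASCII 33 to 126 for A and 32 to 126 for Z are assumptions, as is the function name validate_tags.

```python
import re
import struct

TAG_RE = re.compile(r"[A-Z][A-Z0-9]")
SAM_VTYPES = set("AifZH")
INT_RE = re.compile(r"-?[0-9]+")
# Assumption: decimal or exponent notation, e.g. "1.5", "-.2", "3e-4".
FLOAT_RE = re.compile(r"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?")
# Assumption: "printable" means ASCII 32 (space) through 126 ("~").
PRINTABLE_RE = re.compile(r"[ -~]+")
HEX_RE = re.compile(r"[0-9A-F]+")


def fits_float32(text):
    """True if the number fits in a single-precision float without overflow."""
    try:
        struct.pack("<f", float(text))
        return True
    except OverflowError:
        return False


def validate_tags(optional_fields):
    """Check the TAG:VTYPE:VALUE fields (SAM columns 12 onward) of one record."""
    errors = []
    seen = set()
    for field in optional_fields:
        parts = field.split(":", 2)
        if len(parts) != 3:
            errors.append(f"malformed optional field: {field!r}")
            continue
        tag, vtype, value = parts
        if not TAG_RE.fullmatch(tag):
            errors.append(f"bad TAG: {tag!r}")
        if tag in seen:
            errors.append(f"duplicate TAG: {tag}")
        seen.add(tag)
        if any(c in value for c in "\t\n\r"):
            errors.append(f"{tag}: VALUE contains a tab, CR or LF")
            continue
        if vtype not in SAM_VTYPES:
            errors.append(f"{tag}: VTYPE {vtype!r} is not one of AifZH")
            continue
        if vtype == "A":
            # Assumption: exactly one non-space printable character.
            ok = len(value) == 1 and "!" <= value <= "~"
        elif vtype == "i":
            # Signed 32-bit range: [-2^31, 2^31 - 1].
            ok = bool(INT_RE.fullmatch(value)) and -(2**31) <= int(value) < 2**31
        elif vtype == "f":
            ok = bool(FLOAT_RE.fullmatch(value)) and fits_float32(value)
        elif vtype == "Z":
            ok = bool(PRINTABLE_RE.fullmatch(value))
        else:  # vtype == "H"
            ok = bool(HEX_RE.fullmatch(value))
        if not ok:
            errors.append(f"{tag}: {value!r} is not a valid {vtype} value")
    return errors
```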

NOTE: There are other BAM Validations that can be done. They will come later.
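The SEQ and QUAL rows of the table, together with the BAM quality range given in the QUAL row, might be sketched as below. The function names are hypothetical. A lone "*" is always treated as "no qualities", even though "*" (ASCII 42) is also a legal quality character; for a one-base read the two readings cannot be told apart from the text alone. Representing missing qualities as None is likewise an illustrative choice, not the BAM encoding.

```python
import re

SEQ_RE = re.compile(r"[acgtnACGTN.=]+|\*")
QUAL_RE = re.compile(r"[!-~]+|\*")


def validate_seq_qual(seq, qual):
    """Apply the SEQ/QUAL rows of the validation table to one record."""
    errors = []
    if not SEQ_RE.fullmatch(seq):
        errors.append(f"bad SEQ: {seq!r}")
    if not QUAL_RE.fullmatch(qual):
        errors.append(f"bad QUAL: {qual!r}")
    if seq == "*":
        # No sequence means there can be no qualities either.
        if qual != "*":
            errors.append("SEQ is '*' but QUAL is not '*'")
    elif qual != "*" and len(qual) != len(seq):
        errors.append(f"QUAL length {len(qual)} differs from SEQ length {len(seq)}")
    return errors


def sam_qual_to_bam(qual):
    """Convert a SAM QUAL string to BAM quality values (ASCII value minus 33)."""
    if qual == "*":
        return None  # hypothetical convention: None means "no qualities"
    values = [ord(c) - 33 for c in qual]
    # Characters '!'..'~' (33..126) map exactly onto [0, 93].
    if any(not 0 <= v <= 93 for v in values):
        raise ValueError(f"QUAL has characters outside '!'..'~': {qual!r}")
    return values
```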

=== SAM Questions ===
*Comment says: “If the mapping position of the query is not available, RNAME and CIGAR are set as “*”, and POS and MAPQ as 0.” Is it all or nothing? Can some be set to “*”/0 but not all?
**Same question for MRNM = “*” and MPOS & ISIZE = 0
*Comment says: “The name of a pair/read is required to be unique in the SAM file, but one pair/read may appear multiple times in different alignment records, representing multiple or split hits.” Is there anything here that needs to be validated?
