Line 1: |
Line 1: |
| + | '''NOTE: Not all validation criteria are listed here, and not all of those listed have been implemented (implemented checks are marked green).''' |
| + | |
| === SAM Header Validation Rules === | | === SAM Header Validation Rules === |
| TODO | | TODO |
− |
| |
− | === SAM Alignment Validation ===
| |
| {| class="wikitable" style="width:100%" border="1" | | {| class="wikitable" style="width:100%" border="1" |
− | |+ style="font-size:150%"|'''SAM Alignment Record''' | + | |+ style="font-size:150%"|'''SAM Header''' |
| ! rowspan='2' width="60%"|Validation Criteria | | ! rowspan='2' width="60%"|Validation Criteria |
| ! colspan="2" width="20%"|Implemented | | ! colspan="2" width="20%"|Implemented |
Line 14: |
Line 14: |
| ! width="10%"|BAM | | ! width="10%"|BAM |
| |- | | |- |
− | | QNAME.Length() > 0 and <= 254 | + | | All Required Fields are set |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |- |
| + | | If HD line is there, VN is also there. |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |- |
| + | | HD/VN is not in valid format /^[0-9]+\.[0-9]+$/ |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |- |
| + | | HD/SO is a valid value (unsorted, queryname, coordinate) |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |- |
| + | | SQ/SN all SQ lines have a unique SN field |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
Line 20: |
Line 44: |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
− | | QNAME does not contain [ \t\n\r] | + | | SQ/LN is in the range [1, (2^29) -1] |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
Line 26: |
Line 50: |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
− | | FLAG is an integer [0-9]+ | + | | SQ/LN is not a number |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
Line 32: |
Line 56: |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
− | | FLAG < 2048 (I think) or [0, (2^16)-1] | + | | RG/ID all RG lines have a unique ID field |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
Line 38: |
Line 62: |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
− | | RNAME does not contain [ \t\n\r@=] | + | | RG/PL is a valid value (ILLUMINA, SOLID, LS454, HELICOS, PACBIO) |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
Line 44: |
Line 68: |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
| − | POS is an integer [0-9]+ | + | Header has X lines or fewer, or at most a maximum number of SQ lines (a file with an extremely large number of header lines caused a problem once) |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |} |
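
The header criteria above can be sketched in a few lines of Python. This is only an illustration of the listed rules, not the library's implementation: the function name `validate_header`, the error strings, and the choice to report a missing SN are assumptions made for this example. The RG/PL and maximum-line-count checks are left out.

```python
import re

VALID_SORT_ORDERS = {"unsorted", "queryname", "coordinate"}
MAX_SQ_LENGTH = 2**29 - 1  # upper bound for SQ/LN from the table above


def validate_header(lines):
    """Return a list of error strings for SAM header lines (hypothetical helper)."""
    errors = []
    seen_sn = set()
    seen_rg_id = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        # Remaining fields look like TAG:VALUE; split on the first ':' only.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if record_type == "@HD":
            vn = tags.get("VN")
            if vn is None:
                errors.append("HD line is missing VN")
            elif not re.fullmatch(r"[0-9]+\.[0-9]+", vn):
                errors.append("HD/VN has invalid format")
            so = tags.get("SO")
            if so is not None and so not in VALID_SORT_ORDERS:
                errors.append("HD/SO is not a valid value")
        elif record_type == "@SQ":
            sn = tags.get("SN")
            if sn is None:
                errors.append("SQ line is missing SN")
            elif sn in seen_sn:
                errors.append("SQ/SN is not unique")
            else:
                seen_sn.add(sn)
            ln = tags.get("LN", "")
            if not re.fullmatch(r"[0-9]+", ln):
                errors.append("SQ/LN is not a number")
            elif not 1 <= int(ln) <= MAX_SQ_LENGTH:
                errors.append("SQ/LN is out of range")
        elif record_type == "@RG":
            rg_id = tags.get("ID")
            if rg_id in seen_rg_id:
                errors.append("RG/ID is not unique")
            seen_rg_id.add(rg_id)
    return errors
```

Returning a list of messages rather than stopping at the first failure makes it possible to report every header problem in one pass.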
| + | |
| + | === SAM Alignment Validation === |
| + | {| class="wikitable" style="width:100%" border="1" |
| + | |+ style="font-size:150%"|'''SAM Alignment Record''' |
| + | ! rowspan='2' width="60%"|Validation Criteria |
| + | ! colspan="2" width="20%"|Implemented |
| + | ! colspan="2" width="20%"|Tested |
| |- | | |- |
− | | POS is [0, (2^29)-1] | + | ! width="10%"|SAM |
| + | ! width="10%"|BAM |
| + | ! width="10%"|SAM |
| + | ! width="10%"|BAM |
| + | |- |
| + | | QNAME.Length() > 0 and <= 254 |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| + | |style="background-color:red;"| |
| + | |- |
| + | | QNAME is valid: [!-?A-~] (printable characters minus space and '@') '''This is a new regular expression''' |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
Line 56: |
Line 99: |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
− | | MAPQ is an integer [0-9]+ | + | | FLAG is an integer [0-9]+ |
| + | |style="background-color:green;"| |
| + | |style="background-color:grey;"| N/A: just interpret the bits as an int. |
| + | |style="background-color:green;"| |
| + | |style="background-color:grey;"| N/A: just interpret the bits as an int. |
| + | |- |
| + | | FLAG is [0, (2^16)-1] |
| + | |style="background-color:green;"| Parse Error since it will be written into a 16 bit field. |
| + | |style="background-color:grey;"| N/A: only a 16 bit field |
| + | |style="background-color:green;"| |
| + | |style="background-color:grey;"| N/A: only a 16 bit field |
| + | |- |
| + | | RNAME does not contain [ \t\n\r@=] |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |- |
| + | | RNAME is found in an SQ header record if there are any SQs in the header. |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |- |
| + | | Reference Name length does not match specified length. |
| + | |style="background-color:grey;"| N/A: reference name length is in BAM format only |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |style="background-color:grey;"| N/A: reference name length is in BAM format only |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
− | | MAPQ is [0, (2^8)-1] | + | | Reference ID is in range of the number of references |
| + | |style="background-color:grey;"| N/A: rID is in BAM format only |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |style="background-color:grey;"| N/A: rID is in BAM format only |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |- |
| + | | POS is an integer [0-9]+ |
| + | |style="background-color:green;"| |
| + | |style="background-color:grey;"| N/A: just interpret the bits as an int. |
| + | |style="background-color:green;"| |
| + | |style="background-color:grey;"| N/A: just interpret the bits as an int. |
| + | |- |
| + | | POS is [0, (2^29)-1] |
| + | |style="background-color:green;"| Parse Error if it can't fit in the 32 bit field, other out of range is a validation error. |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |style="background-color:green;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |- |
| + | | MAPQ is an integer [0-9]+ |
| + | |style="background-color:green;"| |
| + | |style="background-color:grey;"| N/A: just interpret the bits as an int. |
| + | |style="background-color:green;"| |
| + | |style="background-color:grey;"| N/A: just interpret the bits as an int. |
| + | |- |
| + | | MAPQ is [0, (2^8)-1] |
| + | |style="background-color:green;"| Parse Error since it will be written into an 8 bit field. |
| + | |style="background-color:grey;"| N/A: only an 8 bit field |
| + | |style="background-color:green;"| |
| + | |style="background-color:grey;"| N/A: only an 8 bit field |
| |- | | |- |
| | <nowiki>CIGAR ([0-9]+[MIDNSHP])+|\*</nowiki> | | | <nowiki>CIGAR ([0-9]+[MIDNSHP])+|\*</nowiki> |
Line 72: |
Line 163: |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |- |
| + | | CIGAR string matches the length of SEQ if both are not "*" |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
Line 87: |
Line 184: |
| |- | | |- |
| | MPOS is an integer [0-9]+ | | | MPOS is an integer [0-9]+ |
− | |style="background-color:red;"| | + | |style="background-color:green;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
Line 99: |
Line 196: |
| |- | | |- |
| | ISIZE is an integer -?[0-9]+ | | | ISIZE is an integer -?[0-9]+ |
− | |style="background-color:red;"| | + | |style="background-color:green;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
Line 128: |
Line 225: |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
− | | If QUAL is not “*” it is the same length as SEQ. | + | | If QUAL and SEQ are not “*” they are the same length. |
− | |style="background-color:red;"| | + | |style="background-color:green;"| |
− | |style="background-color:red;"|
| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |style="background-color:green;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
Line 146: |
Line 243: |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |- | | |- |
− | | VTYPE is [AifZH] for SAM and [AcCsSiIfZH] | + | | VTYPE is [AifZH] for SAM and [AcCsSiIfZH] for BAM |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
Line 187: |
Line 284: |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| |style="background-color:red;"| | | |style="background-color:red;"| |
| + | |- |
| + | | For TAG = E2, length should be the same as the Read Length |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |- |
| + | | For TAG = E2, each base should be different from the read base (unless 'N') |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |- |
| + | | For TAG = U2, length should be the same as the Read Length |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |style="background-color:red;"| |
| + | |- |
| |} | | |} |
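
As a rough illustration of the text-format (SAM) checks in the table above, the following Python sketch applies the QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, and QUAL/SEQ rules to one record. The names `validate_alignment` and `_check_uint` are hypothetical, and the function simply returns the names of the failing fields. It does not distinguish parse errors from validation errors, and it skips the MRNM, MPOS, ISIZE, and optional-tag rows.

```python
import re

QNAME_RE = re.compile(r"[!-?A-~]{1,254}")      # printable, no space or '@', length 1..254
RNAME_BAD_RE = re.compile(r"[ \t\n\r@=]")      # characters RNAME must not contain
CIGAR_RE = re.compile(r"([0-9]+[MIDNSHP])+|\*")
UINT_RE = re.compile(r"[0-9]+")


def _check_uint(name, text, maximum, errors):
    """Record `name` as failing if text is not an integer in [0, maximum]."""
    if not UINT_RE.fullmatch(text) or int(text) > maximum:
        errors.append(name)


def validate_alignment(fields):
    """fields: the 11 mandatory SAM columns as strings. Returns failing field names."""
    qname, flag, rname, pos, mapq, cigar = fields[:6]
    seq, qual = fields[9], fields[10]
    errors = []
    if not QNAME_RE.fullmatch(qname):
        errors.append("QNAME")
    _check_uint("FLAG", flag, 2**16 - 1, errors)
    if RNAME_BAD_RE.search(rname):
        errors.append("RNAME")
    _check_uint("POS", pos, 2**29 - 1, errors)
    _check_uint("MAPQ", mapq, 2**8 - 1, errors)
    if not CIGAR_RE.fullmatch(cigar):
        errors.append("CIGAR")
    # QUAL and SEQ must have the same length unless either is "*".
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        errors.append("QUAL")
    return errors
```

For BAM input the integer-format checks would not apply, as the table notes, because the values are read directly from fixed-width binary fields.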
| | | |
Line 192: |
Line 308: |
| | | |
| NOTE: There may be other BAM Validations that can be done. They will come later. | | NOTE: There may be other BAM Validations that can be done. They will come later. |
| + | |
| + | Consider validating the CIGAR string against the read length. |
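
One possible way to compare a CIGAR string with the read length is sketched below. Among the operations listed on this page (MIDNSHP), only M, I, and S consume bases of the read, so their counts are summed and compared with the length of SEQ. The function names are hypothetical, and the CIGAR string is assumed to have already passed the format check.

```python
import re

CIGAR_OP_RE = re.compile(r"([0-9]+)([MIDNSHP])")
QUERY_CONSUMING_OPS = {"M", "I", "S"}  # operations that use up bases of SEQ


def cigar_query_length(cigar):
    """Sum the lengths of the CIGAR operations that consume read bases."""
    return sum(
        int(count)
        for count, op in CIGAR_OP_RE.findall(cigar)
        if op in QUERY_CONSUMING_OPS
    )


def cigar_matches_seq(cigar, seq):
    """True if CIGAR and SEQ agree on read length, or if either is '*'."""
    if cigar == "*" or seq == "*":
        return True
    return cigar_query_length(cigar) == len(seq)
```

Deletions (D), skipped regions (N), hard clips (H), and padding (P) are ignored because they do not correspond to bases stored in SEQ.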
| + | |
| + | == Other Read Validation == |
| + | |
| + | {| class="wikitable" style="width:100%" border="1" |
| + | |+ style="font-size:150%"|'''SAM Alignment Record''' |
| + | ! rowspan='2' width="60%"|Validation Criteria |
| + | ! colspan="2" width="20%"|Implemented |
| + | ! colspan="2" width="20%"|Tested |
| + | |- |
| + | ! width="10%"|SAM |
| + | ! width="10%"|BAM |
| + | ! width="10%"|SAM |
| + | ! width="10%"|BAM |
| + | |- |
| + | | Records follow the expected sort order, when sort-order checking is requested (based either on the header SO flag or on a user-specified coordinate or queryname order). |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| + | |style="background-color:green;"| |
| + | |} |
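
A sort-order check along these lines could look like the sketch below. It is an assumption-laden illustration: the function name `check_sort_order`, the `(qname, rname, pos)` record tuples, and the use of plain string comparison for queryname order are all choices made for this example. For coordinate order, references are ranked by their position in the header's SQ list, and reads whose RNAME is "*" or not in that list are placed after all known references.

```python
def check_sort_order(records, order, ref_names=()):
    """Return the index of the first out-of-order record, or None if sorted.

    records   -- iterable of (qname, rname, pos) tuples (hypothetical layout)
    order     -- "unsorted", "queryname", or "coordinate"
    ref_names -- reference names in header SQ order (used for "coordinate")
    """
    if order not in ("unsorted", "queryname", "coordinate"):
        raise ValueError("unknown sort order: " + order)
    if order == "unsorted":
        return None
    ref_index = {name: i for i, name in enumerate(ref_names)}

    def key(record):
        qname, rname, pos = record
        if order == "queryname":
            return (qname,)
        # Unmapped or unknown references sort after every known reference.
        return (ref_index.get(rname, len(ref_index)), pos)

    previous = None
    for index, record in enumerate(records):
        current = key(record)
        if previous is not None and current < previous:
            return index
        previous = current
    return None
```

Only the previous record's key is kept, so the check runs in a single pass over the file without holding the records in memory.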
| + | |
| | | |
| ===SAM Questions=== | | ===SAM Questions=== |
Line 197: |
Line 336: |
| **Same question for MRNM = “*” and MPOS & ISIZE = 0 | | **Same question for MRNM = “*” and MPOS & ISIZE = 0 |
| *Comment says: “The name of a pair/read is required to be unique in the SAM file, but one pair/read may appear multiple times in different alignment records, representing multiple or split hits.” - Is there anything here that needs to be validated??? | | *Comment says: “The name of a pair/read is required to be unique in the SAM file, but one pair/read may appear multiple times in different alignment records, representing multiple or split hits.” - Is there anything here that needs to be validated??? |
| + | |
| + | == BamFile Classes == |
| + | [[C++ Library: libbam|BamFile]] |