From Genome Analysis Wiki
118 bytes added
, 13:12, 30 July 2010
Line 7: |
Line 7: |
| | | |
| The current definition of the format is at [http://samtools.sourceforge.net/SAM1.pdf BAM/SAM Specification]. | | The current definition of the format is at [http://samtools.sourceforge.net/SAM1.pdf BAM/SAM Specification]. |
| + | |
| | | |
| == What Information is in SAM & BAM == | | == What Information is in SAM & BAM == |
Line 15: |
Line 16: |
| | | |
| The alignment section contains the information for each sequence about where/how it aligns to the reference genome. | | The alignment section contains the information for each sequence about where/how it aligns to the reference genome. |
| + | |
| | | |
| === What Information Does SAM/BAM Have for an Alignment === | | === What Information Does SAM/BAM Have for an Alignment === |
Line 43: |
Line 45: |
| * the query quality for this alignment, [[SAM#What is QUAL?|QUAL]], one for each base in the query sequence. | | * the query quality for this alignment, [[SAM#What is QUAL?|QUAL]], one for each base in the query sequence. |
| * Additional optional information is stored with the alignment as [[SAM#What are TAGs?|TAGs]]. Many kinds of information can be stored here, each as a key/value pair. See the spec for a detailed list of commonly used tags and what they mean. | | * Additional optional information is stored with the alignment as [[SAM#What are TAGs?|TAGs]]. Many kinds of information can be stored here, each as a key/value pair. See the spec for a detailed list of commonly used tags and what they mean. |
| + | |
| | | |
| ==== What is a CIGAR? ==== | | ==== What is a CIGAR? ==== |
Line 63: |
Line 66: |
| The POS indicates that the read aligns starting at position 5 on the reference. | | The POS indicates that the read aligns starting at position 5 on the reference. |
| The CIGAR says that the first 3 bases in the read sequence align with the reference. The next base in the read does not exist in the reference. Then 3 bases align with the reference. The next reference base does not exist in the read sequence, then 5 more bases align with the reference. Note that at position 14, the base in the read is different than the reference, but it still counts as an M since it aligns to that position. | | The CIGAR says that the first 3 bases in the read sequence align with the reference. The next base in the read does not exist in the reference. Then 3 bases align with the reference. The next reference base does not exist in the read sequence, then 5 more bases align with the reference. Note that at position 14, the base in the read is different than the reference, but it still counts as an M since it aligns to that position. |
| + | |
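The CIGAR walk described above can be sketched in Python. This is a minimal illustration, not part of any SAM library: the helper names `parse_cigar` and `walk_alignment` are hypothetical, and the string `3M1I3M1D5M` is assumed to be the CIGAR that the description corresponds to (3 matches, 1 insertion, 3 matches, 1 deletion, 5 matches) with POS = 5.

```python
import re

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

def walk_alignment(pos, cigar):
    """Yield (op, read_index, ref_position) for each step of the alignment.

    read_index is 0-based; ref_position uses the same 1-based coordinates
    as POS. A value of None means that side has no base at this step.
    """
    read_idx = 0
    ref_pos = pos
    for length, op in parse_cigar(cigar):
        for _ in range(length):
            if op in "M=X":      # aligned base: consumes read and reference
                yield op, read_idx, ref_pos
                read_idx += 1
                ref_pos += 1
            elif op in "IS":     # base only in the read
                yield op, read_idx, None
                read_idx += 1
            elif op in "DN":     # base only in the reference
                yield op, None, ref_pos
                ref_pos += 1
            # H and P consume neither the read nor the reference

steps = list(walk_alignment(5, "3M1I3M1D5M"))
for op, read_idx, ref_pos in steps:
    print(op, read_idx, ref_pos)
```

Under these assumptions, the read base that lands on reference position 14 is reported with operation M regardless of whether it matches the reference base, which is the point made in the note above.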
| | | |
| ==== What is QUAL? ==== | | ==== What is QUAL? ==== |
Line 75: |
Line 79: |
| So, for SAM, the QUAL field is: | | So, for SAM, the QUAL field is: |
| <math>QUAL = (-10 \log_{10}p) + 33</math> | | <math>QUAL = (-10 \log_{10}p) + 33</math> |
| + | |
| + | Phred Quality is also found in a FASTQ file, described here: http://en.wikipedia.org/wiki/FASTQ_format#Quality |
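As a rough illustration of the QUAL formula above, the following Python sketch converts between an error probability p and the character stored in the QUAL field. The function names are hypothetical, and rounding the Phred score to the nearest integer is an assumption made here so that it maps to a single ASCII character.

```python
import math

def error_prob_to_qual_char(p):
    """Encode error probability p as one QUAL character (Phred + 33)."""
    phred = round(-10 * math.log10(p))  # assumption: round to nearest integer
    return chr(phred + 33)

def qual_char_to_error_prob(c):
    """Decode one QUAL character back to an error probability."""
    phred = ord(c) - 33
    return 10 ** (-phred / 10)

print(error_prob_to_qual_char(0.001))   # Phred 30, ASCII 63
print(qual_char_to_error_prob("+"))     # ASCII 43, Phred 10
```

Decoding a whole QUAL string is then just a matter of applying `qual_char_to_error_prob` to each character, one per base in the query sequence.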
| | | |
| ==== What are TAGs? ==== | | ==== What are TAGs? ==== |
| + | |
| + | |
| | | |
| == Example SAM == | | == Example SAM == |