From Genome Analysis Wiki
1,394 bytes added
, 14:32, 29 July 2010
Line 45: |
Line 45: |
| ==== What is a CIGAR? ==== | | ==== What is a CIGAR? ==== |
| You may have heard the term CIGAR, but wondered what it means. Hopefully this section will help clarify it. | | You may have heard the term CIGAR, but wondered what it means. Hopefully this section will help clarify it. |
| + | |
| + | The sequence being aligned to a reference may have additional bases that are not in the reference, or may be missing bases that are in the reference. The CIGAR string is a sequence of base lengths, each paired with an operation. It indicates which bases align with the reference (as either a match or a mismatch), which reference bases are deleted from the read, and which read bases are insertions that are not in the reference. |
| + | |
| + | For example: |
| + | RefPos: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
| + | Reference: C C A T A C T G A A C T G A C T A A C |
| + | Read: ACTAGAATGGCT |
| + | Aligning these two: |
| + | RefPos:      1  2  3  4  5  6  7     8  9 10 11 12 13 14 15 16 17 18 19 |
| + | Reference:   C  C  A  T  A  C  T     G  A  A  C  T  G  A  C  T  A  A  C |
| + | Read:                    A  C  T  A  G  A  A     T  G  G  C  T |
| + | If the two align as above, you get: |
| + | POS: 5 |
| + | CIGAR: 3M1I3M1D5M |
| + | |
| + | The POS indicates that the read aligns starting at position 5 on the reference. |
| + | The CIGAR says that the first 3 bases in the read sequence align with the reference (3M). The next base in the read does not exist in the reference (1I). Then 3 bases align with the reference (3M). The next reference base does not exist in the read sequence (1D), and then 5 more bases align with the reference (5M). Note that at position 14, the base in the read differs from the reference, but it still counts as an M because it aligns to that position. |
| + | |
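The walk-through above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of any SAM library: the names `parse_cigar` and `walk_alignment` are hypothetical, and only the three operations used in this example (M, I, D) are handled. It steps through the read and the reference together and reports which reference positions are covered, where the gaps fall, and where an M is actually a mismatch.

```python
import re

# Example data from this section (reference positions are 1-based).
REFERENCE = "CCATACTGAACTGACTAAC"
READ = "ACTAGAATGGCT"

# Assumption: only M, I and D are handled, since those are the only
# operations in the example. M and D advance along the reference;
# M and I advance along the read.
CONSUMES_REF = {"M", "D"}
CONSUMES_READ = {"M", "I"}


def parse_cigar(cigar):
    """Split a CIGAR string such as '3M1I3M1D5M' into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MID])", cigar)]


def walk_alignment(pos, cigar, read):
    """Return (ref_pos, read_base) pairs; None marks a gap on that side."""
    pairs = []
    ref_pos = pos  # 1-based reference position, starting at POS
    i = 0          # 0-based index into the read
    for length, op in parse_cigar(cigar):
        for _ in range(length):
            r = ref_pos if op in CONSUMES_REF else None
            b = read[i] if op in CONSUMES_READ else None
            pairs.append((r, b))
            if op in CONSUMES_REF:
                ref_pos += 1
            if op in CONSUMES_READ:
                i += 1
    return pairs


pairs = walk_alignment(5, "3M1I3M1D5M", READ)

# Reference positions where the read base is aligned (M) but differs.
mismatches = [r for r, b in pairs
              if r is not None and b is not None and REFERENCE[r - 1] != b]

print(pairs)
print("mismatches at reference positions:", mismatches)
```

In the resulting list, the pair `(None, 'A')` is the inserted read base, `(11, None)` is the deleted reference base, and the last pair shows the read ending at reference position 16.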
| | | |
| == Example SAM == | | == Example SAM == |