Changes

From Genome Analysis Wiki
1,394 bytes added ,  14:32, 29 July 2010
no edit summary
Line 45: Line 45:  
==== What is a CIGAR? ====
 
You may have heard the term CIGAR, but wondered what it means.  Hopefully this section will help clarify it.
 
 +
 +
A sequence being aligned to a reference may contain additional bases that are not in the reference, or may be missing bases that are in the reference.  The CIGAR string is a sequence of base lengths, each followed by the associated operation.  These length/operation pairs indicate which bases align with the reference (as either a match or a mismatch), which are deleted from the reference, and which are insertions that are not in the reference.
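As a rough illustration of the length/operation structure described above, the following Python sketch splits a CIGAR string into its pairs.  The name parse_cigar is hypothetical and not part of any library; the set of operation characters is the one defined by the SAM format, of which only M, I and D are discussed in this section.

```python
import re

# Operation characters defined by the SAM format. This section only
# discusses M (match/mismatch), I (insertion) and D (deletion).
_CIGAR_OPS = "MIDNSHP=X"
_CIGAR_RE = re.compile(r"(\d+)([%s])" % _CIGAR_OPS)


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs.

    parse_cigar is a hypothetical helper for illustration only.
    """
    pairs = [(int(n), op) for n, op in _CIGAR_RE.findall(cigar)]
    # Rebuild the string to make sure nothing was silently skipped.
    if "".join("%d%s" % p for p in pairs) != cigar:
        raise ValueError("malformed CIGAR: %r" % cigar)
    return pairs


print(parse_cigar("3M1I3M1D5M"))
```

Rebuilding the string from the parsed pairs is one simple way to reject input containing characters the pattern does not recognize; a stricter validator could be substituted.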
 +
 +
For example:
 +
RefPos:     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Reference:  C  C  A  T  A  C  T  G  A  A  C  T  G  A  C  T  A  A  C
 +
Read: ACTAGAATGGCT
 +
Aligning these two:
 +
RefPos:     1  2  3  4  5  6  7     8  9 10 11 12 13 14 15 16 17 18 19
Reference:  C  C  A  T  A  C  T     G  A  A  C  T  G  A  C  T  A  A  C
Read:                   A  C  T  A  G  A  A     T  G  G  C  T
 +
If the two align as above, you get:
 +
POS: 5
 +
CIGAR: 3M1I3M1D5M
 +
 +
The POS indicates that the read aligns starting at position 5 on the reference.
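Given POS and the CIGAR, it is also possible to work out how long the read is and where on the reference the alignment ends.  The sketch below assumes a 1-based POS and handles only the M, I and D operations covered in this section; alignment_span is a hypothetical helper name, not an existing API.

```python
import re

# Which operations advance along the read and along the reference.
# Only M, I and D are covered in this section.
CONSUMES_READ = {"M", "I"}
CONSUMES_REF = {"M", "D"}


def alignment_span(pos, cigar):
    """Return (read_length, last_reference_position) for an alignment.

    alignment_span is a hypothetical helper; pos is 1-based.
    """
    read_len = 0
    ref_len = 0
    for length, op in re.findall(r"(\d+)([MID])", cigar):
        if op in CONSUMES_READ:
            read_len += int(length)
        if op in CONSUMES_REF:
            ref_len += int(length)
    return read_len, pos + ref_len - 1


read_len, last_pos = alignment_span(5, "3M1I3M1D5M")
print(read_len, last_pos)
```

For the example above this gives a read length of 12, which matches the 12 bases in ACTAGAATGGCT, and a last aligned reference position of 16.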
 +
The CIGAR says that the first 3 bases of the read (ACT) align with reference positions 5-7.  The next base in the read (A) is an insertion: it does not exist in the reference.  The next 3 bases (GAA) align with reference positions 8-10.  The reference base at position 11 (C) is a deletion: it does not exist in the read.  The final 5 bases (TGGCT) align with reference positions 12-16.  Note that at position 14 the base in the read (G) differs from the reference base (A), but it still counts as an M since it aligns to that position.
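The walk-through above can be sketched in Python by stepping through the CIGAR while tracking one index into the read and one into the reference.  This is only an illustration under stated assumptions: describe_alignment is a hypothetical helper, the reference and read are plain strings taken from the example, POS is 1-based, and only M, I and D are handled.

```python
import re


def describe_alignment(reference, read, pos, cigar):
    """Yield one human-readable line per CIGAR operation.

    Hypothetical helper: reference and read are plain strings,
    pos is the 1-based POS, and only M, I and D are handled.
    """
    ref_i = pos - 1   # 0-based index into the reference
    read_i = 0        # 0-based index into the read
    for length, op in re.findall(r"(\d+)([MID])", cigar):
        n = int(length)
        if op == "M":
            # Aligned bases: compare them to find mismatches.
            mismatches = [ref_i + k + 1 for k in range(n)
                          if reference[ref_i + k] != read[read_i + k]]
            yield "%dM: %s at %d-%d, mismatches at %s" % (
                n, read[read_i:read_i + n], ref_i + 1, ref_i + n, mismatches)
            ref_i += n
            read_i += n
        elif op == "I":
            # Inserted bases advance the read only.
            yield "%dI: %s inserted after %d" % (
                n, read[read_i:read_i + n], ref_i)
            read_i += n
        else:  # "D"
            # Deleted bases advance the reference only.
            yield "%dD: %s deleted at %d-%d" % (
                n, reference[ref_i:ref_i + n], ref_i + 1, ref_i + n)
            ref_i += n


reference = "CCATACTGAACTGACTAAC"
read = "ACTAGAATGGCT"
for line in describe_alignment(reference, read, 5, "3M1I3M1D5M"):
    print(line)
```

The last line printed reports the mismatch at position 14, which shows that an M operation records that bases are aligned, not that they are identical.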
 +
    
== Example SAM ==
 