Changes

From Genome Analysis Wiki
118 bytes added ,  13:12, 30 July 2010
no edit summary
Line 7: Line 7:     
The current definition of the format is at [[http://samtools.sourceforge.net/SAM1.pdf BAM/SAM Specification]].
 
The current definition of the format is at [[http://samtools.sourceforge.net/SAM1.pdf BAM/SAM Specification]].
 +
    
== What Information is in SAM & BAM ==
 
== What Information is in SAM & BAM ==
Line 15: Line 16:     
The alignment section contains the information for each sequence about where/how it aligns to the reference genome.
 
The alignment section contains the information for each sequence about where/how it aligns to the reference genome.
 +
    
=== What Information Does SAM/BAM Have for an Alignment ===
 
=== What Information Does SAM/BAM Have for an Alignment ===
Line 43: Line 45:  
* the query quality for this alignment, [[SAM#What is QUAL?|QUAL]], one for each base in the query sequence.
 
* the query quality for this alignment, [[SAM#What is QUAL?|QUAL]], one for each base in the query sequence.
 
* Additional optional information is also contained within the alignment, [[SAM#What are TAGs?|TAGs]].  Many kinds of information can be stored here, each as a key/value pair.  See the spec for a detailed list of commonly used tags and what they mean.
 
* Additional optional information is also contained within the alignment, [[SAM#What are TAGs?|TAGs]].  Many kinds of information can be stored here, each as a key/value pair.  See the spec for a detailed list of commonly used tags and what they mean.
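
As a rough illustration of the key/value idea, the sketch below splits one optional field as it appears in a SAM text line, where each field is written as <code>TAG:TYPE:VALUE</code>. The function name <code>parse_tag</code> and the sample values are hypothetical, and only the integer, float, and string-like types are handled.

```python
def parse_tag(field):
    """Split one SAM optional field (TAG:TYPE:VALUE) into a (key, value) pair.

    Hypothetical helper: only 'i' (integer) and 'f' (float) are converted;
    every other type is returned as the raw string.
    """
    # Split at most twice so that values containing ':' stay intact.
    tag, type_code, value = field.split(":", 2)
    if type_code == "i":
        value = int(value)
    elif type_code == "f":
        value = float(value)
    return tag, value


if __name__ == "__main__":
    # Illustrative fields, not taken from a real file.
    for field in ["NM:i:1", "RG:Z:group1"]:
        print(parse_tag(field))
```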
 +
    
==== What is a CIGAR? ====
 
==== What is a CIGAR? ====
Line 63: Line 66:  
The POS indicates that the read aligns starting at position 5 on the reference.
 
The POS indicates that the read aligns starting at position 5 on the reference.
 
The CIGAR says that the first 3 bases in the read sequence align with the reference.  The next base in the read does not exist in the reference.  Then 3 bases align with the reference.  The next reference base does not exist in the read sequence, then 5 more bases align with the reference.  Note that at position 14, the base in the read differs from the reference, but it still counts as an M since it aligns to that position.
 
The CIGAR says that the first 3 bases in the read sequence align with the reference.  The next base in the read does not exist in the reference.  Then 3 bases align with the reference.  The next reference base does not exist in the read sequence, then 5 more bases align with the reference.  Note that at position 14, the base in the read differs from the reference, but it still counts as an M since it aligns to that position.
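
The walk-through above can be sketched in code. The CIGAR string <code>3M1I3M1D5M</code> is an assumption reconstructed from the description (3 aligned, 1 inserted, 3 aligned, 1 deleted, 5 aligned), and the function names are hypothetical. The sketch tracks which reference positions each operation covers, starting from POS 5.

```python
import re

# Which CIGAR operations advance along the read and along the reference.
CONSUMES_READ = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def describe(pos, cigar):
    """Return (blocks, read_length, last_reference_position).

    Each block is (op, length, ref_start, ref_end); the reference
    coordinates are None for operations that do not consume the reference.
    """
    ref = pos
    read_len = 0
    blocks = []
    for length, op in parse_cigar(cigar):
        if op in CONSUMES_REF:
            blocks.append((op, length, ref, ref + length - 1))
            ref += length
        else:
            blocks.append((op, length, None, None))
        if op in CONSUMES_READ:
            read_len += length
    return blocks, read_len, ref - 1


if __name__ == "__main__":
    # Assumed CIGAR, inferred from the prose; POS 5 is from the example.
    blocks, read_len, ref_end = describe(5, "3M1I3M1D5M")
    for block in blocks:
        print(block)
    print("read length:", read_len, "last reference position:", ref_end)
```

Under these assumptions the final 5M block covers reference positions 12 through 16, which is why the mismatch at position 14 is still reported as an M.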
 +
    
==== What is QUAL? ====
 
==== What is QUAL? ====
Line 75: Line 79:  
So, for SAM, the QUAL field is:
 
So, for SAM, the QUAL field is:
 
  <math>QUAL = (-10 \log_{10}p) + 33</math>
 
  <math>QUAL = (-10 \log_{10}p) + 33</math>
 +
 +
Phred Quality is also found in a FASTQ file, described here: http://en.wikipedia.org/wiki/FASTQ_format#Quality
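
The QUAL formula above can be tried out with a short sketch. The function names are hypothetical, and rounding the Phred value to the nearest integer before adding 33 is an assumption made so that the result maps to a single ASCII character.

```python
import math


def error_prob_to_qual_char(p):
    """Encode an error probability p as one SAM QUAL character."""
    # Phred quality, rounded to an integer (an assumption of this sketch).
    phred = round(-10 * math.log10(p))
    # SAM stores the quality as the ASCII character with code phred + 33.
    return chr(phred + 33)


def qual_char_to_error_prob(c):
    """Decode one SAM QUAL character back to an error probability."""
    phred = ord(c) - 33
    return 10 ** (-phred / 10)


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        c = error_prob_to_qual_char(p)
        print(p, "->", c, "->", qual_char_to_error_prob(c))
```

For instance, an error probability of 0.001 gives a Phred quality of 30, which is stored as the character with ASCII code 63.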
    
==== What are TAGs? ====
 
==== What are TAGs? ====
 +
 +
    
== Example SAM ==
 
== Example SAM ==
