Changes

From Genome Analysis Wiki
202 bytes added, 16:10, 11 September 2015
m
Line 8: Line 8:  
The current definition of the format is at [[http://samtools.sourceforge.net/SAM1.pdf BAM/SAM Specification]].
 
The current definition of the format is at [[http://samtools.sourceforge.net/SAM1.pdf BAM/SAM Specification]].
   −
If you are writing software to read SAM or BAM data, our C++ [[C++ Library: libbam|BamFile]] is a good resource to use.
+
If you are writing software to read SAM or BAM data, our C++ [[C++ Library: libStatGen|libStatGen]] is a good resource to use.
      Line 39: Line 39:  
* reference sequence name, RNAME, often contains the Chromosome name.   
 
* reference sequence name, RNAME, often contains the Chromosome name.   
 
* leftmost position of where this alignment maps to the reference, POS.  For SAM, the reference starts at 1, so this value is 1-based, while for BAM the reference starts at 0, so this value is 0-based.  Always use the correct base when referencing positions.
 
* leftmost position of where this alignment maps to the reference, POS.  For SAM, the reference starts at 1, so this value is 1-based, while for BAM the reference starts at 0, so this value is 0-based.  Always use the correct base when referencing positions.
* mapping quality, MAPQ, which contains the "phred-scaled posterior probability that the mapping position" is wrong. (from SAM-1.pdf)
+
* mapping quality, MAPQ, which contains the "phred-scaled posterior probability that the mapping position" is wrong. (see [[http://samtools.sourceforge.net/SAM1.pdf]])
 
* string indicating alignment information that allows the storing of clipped, [[SAM#What is a CIGAR?|CIGAR]]
 
* string indicating alignment information that allows the storing of clipped, [[SAM#What is a CIGAR?|CIGAR]]
 
* the reference sequence name of the next alignment in this group, MRNM or RNEXT.  In paired alignments, it is the mate's reference sequence name. (A group is alignments with the same query name.)
 
* the reference sequence name of the next alignment in this group, MRNM or RNEXT.  In paired alignments, it is the mate's reference sequence name. (A group is alignments with the same query name.)
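
The 1-based versus 0-based difference described for POS in the list above can be sketched as a pair of small conversion helpers. This is only an illustration in Python: the function names are hypothetical and are not part of libStatGen or any other library. It assumes the usual convention that an unavailable position is written as 0 in SAM and stored as -1 in BAM.

```python
def sam_pos_to_bam_pos(sam_pos: int) -> int:
    """Convert a 1-based SAM POS to the 0-based value stored in BAM."""
    if sam_pos < 0:
        raise ValueError("SAM POS cannot be negative")
    # SAM writes 0 when no position is available; that becomes -1 in BAM.
    return sam_pos - 1


def bam_pos_to_sam_pos(bam_pos: int) -> int:
    """Convert a 0-based BAM position to the 1-based SAM POS."""
    if bam_pos < -1:
        raise ValueError("BAM position cannot be less than -1")
    return bam_pos + 1


if __name__ == "__main__":
    print(sam_pos_to_bam_pos(100))  # 99
    print(bam_pos_to_sam_pos(99))   # 100
```

Keeping the conversion in one place, rather than scattering `+ 1` and `- 1` through the code, makes it easier to check that every position uses the correct base.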
Line 124: Line 124:  
  XT:A:U  - user defined tag called XT.  It holds a character.  The value associated with this tag is 'U'.
 
  XT:A:U  - user defined tag called XT.  It holds a character.  The value associated with this tag is 'U'.
 
  NM:i:2  - predefined tag NM means: Edit distance to the reference (number of changes necessary to make this equal the reference, excluding clipping)
 
  NM:i:2  - predefined tag NM means: Edit distance to the reference (number of changes necessary to make this equal the reference, excluding clipping)
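
The TAG:TYPE:VALUE layout shown in the examples above can be split apart with a few lines of code. The following is a minimal Python sketch, not the libStatGen API; the function name `parse_sam_tag` is hypothetical, and only the integer (`i`) and float (`f`) types are converted, with every other type left as a string.

```python
def parse_sam_tag(field: str):
    """Split a SAM optional field such as 'NM:i:2' into (tag, type, value)."""
    # Split on the first two colons only, so string values may contain ':'.
    tag, type_code, value = field.split(":", 2)
    if type_code == "i":
        value = int(value)      # integer tag, e.g. NM:i:2
    elif type_code == "f":
        value = float(value)    # floating point tag
    # Other types (A = character, Z = string, ...) are kept as text.
    return tag, type_code, value


if __name__ == "__main__":
    print(parse_sam_tag("XT:A:U"))  # ('XT', 'A', 'U')
    print(parse_sam_tag("NM:i:2"))  # ('NM', 'i', 2)
```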
      
=== What Information is in the SAM/BAM Header ===
 
=== What Information is in the SAM/BAM Header ===
Line 239: Line 238:  
|XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:5 X1:i:0 XM:i:0 XO:i:1 XG:i:2 MD:Z:35
 
|XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:5 X1:i:0 XM:i:0 XO:i:1 XG:i:2 MD:Z:35
 
|}
 
|}
 +
 +
== Tips/Tricks ==
 +
*Calculating BAM Block Size
 +
** Block Size = 8*4 + ReadNameLength(including null) + CigarLength*4 + (ReadLength+1)/2 + ReadLength + TagLength
 +
 +
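
The block size formula above can be checked with a short calculation. This is a sketch in Python under stated assumptions: the function and parameter names are hypothetical, the read name is plain ASCII, `cigar_op_count` is the number of CIGAR operations, `tag_length` is the total number of bytes used by the encoded tags, and the division in (ReadLength+1)/2 is integer division because two bases are packed into each byte.

```python
def bam_block_size(read_name: str, cigar_op_count: int,
                   read_length: int, tag_length: int) -> int:
    """Compute the BAM block size of one record from the formula above."""
    fixed_fields = 8 * 4                                 # eight 4-byte fixed fields
    read_name_bytes = len(read_name.encode("ascii")) + 1  # +1 for the null terminator
    cigar_bytes = cigar_op_count * 4                     # each CIGAR operation uses 4 bytes
    sequence_bytes = (read_length + 1) // 2              # two bases packed per byte
    quality_bytes = read_length                          # one quality byte per base
    return (fixed_fields + read_name_bytes + cigar_bytes
            + sequence_bytes + quality_bytes + tag_length)


if __name__ == "__main__":
    # A 35-base read named "read1" with one CIGAR operation and no tags:
    # 32 + 6 + 4 + 18 + 35 + 0
    print(bam_block_size("read1", 1, 35, 0))  # 95
```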
    
You should now be a SAM expert :-)
 
You should now be a SAM expert :-)