Difference between revisions of "SAM"

From Genome Analysis Wiki
Revision as of 10:51, 29 July 2010

What is SAM

The SAM Format is a text format for storing aligned reads in a series of tab-delimited ASCII columns.

Most often it is generated as a human-readable projection of its sister BAM format, which can store data in a compact, indexed, binary representation.

The current definition of the format is at [BAM/SAM Specification].


What Information is in SAM & BAM

SAM files and BAM files contain the same information, but in a different format. Refer to the specs to see a format description.

Both SAM & BAM files contain a header section and an alignment section. The header section may contain information about the entire file and additional information for alignments. The alignments then associate themselves with specific header information.
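As a rough sketch of that two-part layout: in the SAM text format, header lines begin with `@`, and the remaining lines are alignments. The helper name `split_sam_sections` and the two header lines below are made up for illustration (the `LN` value is not taken from a real reference); the alignment line is the first one from the example on this page.

```python
def split_sam_sections(lines):
    """Split SAM text lines into (header_lines, alignment_lines).

    Assumes header lines start with '@', as in the SAM text format.
    """
    header, alignments = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            header.append(line)
        else:
            alignments.append(line)
    return header, alignments


# The two header lines are hypothetical and only illustrate the '@' prefix.
sam_lines = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:1\tLN:1000000",
    "1:497:R:-272+13M17D24M\t113\t1\t497\t37\t37M\t15\t100338662\t0\t"
    "CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG\t"
    "0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>\t"
    "XT:A:U\tNM:i:0\tSM:i:37\tAM:i:0\tX0:i:1\tX1:i:0\tXM:i:0\tXO:i:0\tXG:i:0\tMD:Z:37",
]

header, alignments = split_sam_sections(sam_lines)
print(len(header), "header lines,", len(alignments), "alignment line(s)")
```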

What Information Does SAM/BAM Have for an Alignment

Each Alignment has:

  • query name, QNAME (SAM)/read_name (BAM). It is used to group/identify alignments that are together, like paired alignments or a read that appears in multiple alignments.
  • a bitwise set of information describing the alignment, FLAG:
    • are there multiple fragments

Not all alignments contain all of this information. The rest of the alignment fields may be set to default values if the information is unknown.

  • reference sequence name, RNAME, often contains the Chromosome name.
  • leftmost position of where this alignment maps to the reference, POS. For SAM, the reference starts at 1, so this value is 1-based, while for BAM the reference starts at 0, so this value is 0-based. Take care to always use the correct base when referencing positions.
  • mapping quality, MAPQ, which contains the "phred-scaled posterior probability that the mapping position" is wrong. (from SAM-1.pdf)
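  • the alignment description, CIGAR, which records how the query sequence lines up against the reference (for example, matches, insertions and deletions)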
  • the reference sequence name of the next alignment in this group, MRNM or RNEXT. In paired alignments, it is the mate's reference sequence name. (A group is alignments with the same query name.)
  • leftmost position of where the next alignment in this group maps to the reference, MPOS or PNEXT. For SAM, the reference starts at 1, so this value is 1-based, while for BAM the reference starts at 0, so this value is 0-based. Take care to always use the correct base when referencing positions.
  • length of this group from the leftmost position to the rightmost position, ISIZE or TLEN
  • the query sequence for this alignment, SEQ
  • the query quality for this alignment, QUAL, one for each base in the query sequence.
  • Additional optional information is also contained within the alignment, TAGS. Many kinds of information can be stored here, each as a key/value pair. See the spec for a detailed list of commonly used tags and what they mean.
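Two of the fields above involve a small amount of arithmetic: the 1-based (SAM) versus 0-based (BAM) positions, and the phred-scaled MAPQ. The sketch below assumes the usual phred relation, MAPQ = -10 * log10(probability the mapping position is wrong); the function names are made up for illustration and are not part of any library.

```python
def sam_pos_to_bam_pos(pos_1based):
    """Convert a 1-based SAM position to the 0-based BAM convention."""
    return pos_1based - 1


def bam_pos_to_sam_pos(pos_0based):
    """Convert a 0-based BAM position to the 1-based SAM convention."""
    return pos_0based + 1


def mapq_to_error_probability(mapq):
    """Invert the phred scaling (assumed: MAPQ = -10 * log10(P(wrong)))."""
    return 10 ** (-mapq / 10)


print(sam_pos_to_bam_pos(497))          # SAM POS 497 from the example
print(mapq_to_error_probability(37))    # MAPQ 37 from the example
print(mapq_to_error_probability(0))     # MAPQ 0 from the example
```

Under that assumption, a MAPQ of 37 corresponds to an error probability of roughly 0.0002, while a MAPQ of 0 corresponds to a probability of 1, meaning the position carries no confidence.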

What is a CIGAR?

You may have heard the term CIGAR, but wondered what it means. Hopefully this section will help clarify it.
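As a starting point, a CIGAR string such as 18M2D19M can be read as a sequence of length/operation pairs. The sketch below assumes the operation letters of the SAM specification (M for an alignment match, I for an insertion relative to the reference, D for a deletion from the reference, and so on); the function names are illustrative only. It uses the CIGAR values from the example alignments on this page.

```python
import re

# One CIGAR element: a run length followed by an operation letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations assumed to advance along the query and the reference.
QUERY_OPS = frozenset("MIS=X")
REFERENCE_OPS = frozenset("MDN=X")


def parse_cigar(cigar):
    """Return a list of (length, operation) pairs; '*' means no CIGAR."""
    if cigar == "*":
        return []
    pairs = _CIGAR_OP.findall(cigar)
    # Re-joining the matches must reproduce the input, otherwise it is malformed.
    if "".join(length + op for length, op in pairs) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in pairs]


def query_length(cigar):
    """Number of query bases covered by the CIGAR."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_OPS)


def reference_span(cigar):
    """Number of reference bases covered by the CIGAR."""
    return sum(n for n, op in parse_cigar(cigar) if op in REFERENCE_OPS)


for cigar in ("37M", "18M2D19M", "8M2I27M"):
    print(cigar, parse_cigar(cigar), query_length(cigar), reference_span(cigar))
```

All three example CIGARs describe 37 query bases, matching the 37-base SEQ values. The deletion in 18M2D19M stretches the alignment over 39 reference bases, while the insertion in 8M2I27M covers only 35.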

Example SAM

Example Alignments

This is what the alignment section of a SAM file looks like:

1:497:R:-272+13M17D24M	113	1	497	37	37M	15	100338662	0	CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG	0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>	XT:A:U	NM:i:0	SM:i:37	AM:i:0	X0:i:1	X1:i:0	XM:i:0	XO:i:0	XG:i:0	MD:Z:37
19:20389:F:275+18M2D19M	99	1	17644	0	37M	=	17919	314	TATGACTGCTAATAATACCTACACATGTTAGAACCAT	>>>>>>>>>>>>>>>>>>>><<>>><<>>4::>>:<9	XT:A:R	NM:i:0	SM:i:0	AM:i:0	X0:i:4	X1:i:0	XM:i:0	XO:i:0	XG:i:0	MD:Z:37
19:20389:F:275+18M2D19M	147	1	17919	0	18M2D19M	=	17644	-314	GTAGTACCAACTGTAAGTCCTTATCTTCATACTTTGT	;44999;499<8<8<<<8<<><<<<><7<;<<<>><<	XT:A:R	NM:i:2	SM:i:0	AM:i:0	X0:i:4	X1:i:0	XM:i:0	XO:i:1	XG:i:2	MD:Z:18^CA19
9:21597+10M2I25M:R:-209	83	1	21678	0	8M2I27M	=	21469	-244	CACCACATCACATATACCAAGCCTGGCTGTGTCTTCT	<;9<<5><<<<><<<>><<><>><9>><>>>9>>><>	XT:A:R	NM:i:2	SM:i:0	AM:i:0	X0:i:5	X1:i:0	XM:i:0	XO:i:1	XG:i:2	MD:Z:35
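Each of these lines is a set of tab-delimited columns: eleven fixed fields followed by any number of optional tags written as TAG:TYPE:VALUE. The following is a minimal sketch of splitting one line into named fields. The names `parse_alignment` and `parse_tag` are made up for illustration, and only the `i` (integer) and `f` (float) tag types are converted; everything else is kept as text.

```python
MANDATORY_FIELDS = (
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
)
INTEGER_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_tag(text):
    """Split 'TAG:TYPE:VALUE' and convert integer and float values."""
    key, value_type, value = text.split(":", 2)
    if value_type == "i":
        return key, int(value)
    if value_type == "f":
        return key, float(value)
    return key, value  # other types are left as strings in this sketch


def parse_alignment(line):
    """Parse one SAM alignment line into a dict of fields plus a TAGS dict."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(MANDATORY_FIELDS):
        raise ValueError("expected at least 11 tab-delimited columns")
    record = {}
    for name, value in zip(MANDATORY_FIELDS, columns):
        record[name] = int(value) if name in INTEGER_FIELDS else value
    record["TAGS"] = dict(parse_tag(t) for t in columns[len(MANDATORY_FIELDS):])
    return record


# First alignment line from the example above.
line = (
    "1:497:R:-272+13M17D24M\t113\t1\t497\t37\t37M\t15\t100338662\t0\t"
    "CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG\t"
    "0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>\t"
    "XT:A:U\tNM:i:0\tSM:i:37\tAM:i:0\tX0:i:1\tX1:i:0\tXM:i:0\tXO:i:0\tXG:i:0\tMD:Z:37"
)
record = parse_alignment(line)
print(record["QNAME"], record["POS"], record["CIGAR"], record["TAGS"]["MD"])
```

Note that the MD tag has type Z (string), so its value stays the text "37" even though it looks numeric.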

In this example, the fields are:

| Field | Alignment 1 | Alignment 2 | Alignment 3 | Alignment 4 |
| --- | --- | --- | --- | --- |
| QNAME | 1:497:R:-272+13M17D24M | 19:20389:F:275+18M2D19M | 19:20389:F:275+18M2D19M | 9:21597+10M2I25M:R:-209 |
| FLAG | 113 | 99 | 147 | 83 |
| RNAME | 1 | 1 | 1 | 1 |
| POS | 497 | 17644 | 17919 | 21678 |
| MAPQ | 37 | 0 | 0 | 0 |
| CIGAR | 37M | 37M | 18M2D19M | 8M2I27M |
| MRNM/RNEXT | 15 | = | = | = |
| MPOS/PNEXT | 100338662 | 17919 | 17644 | 21469 |
| ISIZE/TLEN | 0 | 314 | -314 | -244 |
| SEQ | CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG | TATGACTGCTAATAATACCTACACATGTTAGAACCAT | GTAGTACCAACTGTAAGTCCTTATCTTCATACTTTGT | CACCACATCACATATACCAAGCCTGGCTGTGTCTTCT |
| QUAL | 0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>> | >>>>>>>>>>>>>>>>>>>><<>>><<>>4::>>:<9 | ;44999;499<8<8<<<8<<><<<<><7<;<<<>><< | <;9<<5><<<<><<<>><<><>><9>><>>>9>>><> |
| TAGs | XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37 | XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:4 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37 | XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:4 X1:i:0 XM:i:0 XO:i:1 XG:i:2 MD:Z:18^CA19 | XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:5 X1:i:0 XM:i:0 XO:i:1 XG:i:2 MD:Z:35 |
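The FLAG values in this example (113, 99, 147 and 83) are bitwise sets, so each can be broken down into individual bits. The sketch below assumes the bit assignments of the SAM specification; the short labels are paraphrased for illustration rather than quoted from the spec, and `decode_flag` is a made-up helper name.

```python
# (bit, label) pairs; bit values assumed from the SAM specification,
# labels are informal paraphrases.
FLAG_BITS = [
    (0x1, "paired (multiple fragments)"),
    (0x2, "properly paired"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first fragment"),
    (0x80, "last fragment"),
    (0x100, "secondary alignment"),
    (0x200, "failed quality checks"),
    (0x400, "duplicate"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a FLAG value."""
    return [label for bit, label in FLAG_BITS if flag & bit]


for flag in (113, 99, 147, 83):
    print(flag, decode_flag(flag))
```

Read this way, alignments 2 and 3 (FLAG 99 and 147), which share a query name, decode as the first and last fragment of the same properly paired read.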