Changes

From Genome Analysis Wiki
3,034 bytes added, 11:22, 13 August 2013
Line 17:
== File Format ==
 
   −
The GLF-file format is defined in an Appendix to the [http://samtools.sourceforge.net/SAM1.pdf SAM-file format specification].
+
The GLF-file format was defined in an Appendix to the [http://samtools.sourceforge.net/SAM1.pdf SAM-file format specification].  It has since been removed from that document.
    
The current specification (GLF version 3) follows. All integers are stored in little-endian byte order. Most GLF files are compressed in a GZIP-compatible format; SAMTOOLS will only read GLF files that are compressed with the BGZF library.
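As a minimal sketch of what "little-endian integers inside a GZIP-compatible stream" means in practice, the Python below opens a GLF stream with the standard `gzip` module and reads the fixed header fields (`magic`, `l_text`, `text`) listed in the specification table. The helper names `read_le` and `read_header` are hypothetical, and the interpretation of the magic value `GLF\3` as the bytes `G`, `L`, `F` followed by a byte with value 3 is an assumption.

```python
import gzip
import io
import struct

# Assumption: "GLF\3" in the table means 'G', 'L', 'F' and a byte of value 3.
GLF3_MAGIC = b"GLF\x03"

def read_le(stream, fmt):
    """Read one little-endian value; fmt is a struct code such as 'i' or 'I'."""
    size = struct.calcsize("<" + fmt)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of GLF stream")
    return struct.unpack("<" + fmt, data)[0]

def read_header(raw):
    """Return (stream, header_text) for a gzip-compatible GLF file object."""
    # gzip.GzipFile reads concatenated gzip members transparently.
    stream = gzip.GzipFile(fileobj=raw, mode="rb")
    if stream.read(4) != GLF3_MAGIC:
        raise ValueError("not a GLFv3 stream")
    l_text = read_le(stream, "i")  # int32_t, includes any zero padding
    text = stream.read(l_text).rstrip(b"\0").decode("ascii")
    return stream, text
```

The returned `stream` is positioned at the first reference entry, so later fields can be read with further `read_le` calls, for example `read_le(stream, "i")` for `l_name`.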
Line 73:
Records with recordType = 0 are empty.
 
 +
 +
 +
=== SAM Format Specification 0.1.2-draft (20090820) A. Genotype Likelihood Format version 3 (GLFv3) ===
 +
GLFs store the probability of a genotype given data.
 +
 +
All integers are little-endian.
 +
 +
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 +
|-style="background: #f2f2f2; text-align: center;"
 +
! colspan="3"|Field !! Description !!  Type !! Value
 +
|-
 +
| colspan="3"|magic || GLFv3 magic number || char[4]||GLF\3
 +
|-
 +
|colspan="3"|l_text||Length of the header text, including any zero padding||int32_t||
 +
|-
 +
|colspan="3"|text||Text||char[l_text]||
 +
|-
 +
|colspan="6" align="center"|''List of reference information until the end of the file''
 +
|-
 +
| rowspan="20" style="width: 20px"| || colspan="2"|l_name||Length of the reference sequence name plus 1 (including NULL) || int32_t||
 +
|-
 +
| colspan="2"|name||Name; NULL terminated || char[l_name]||
 +
|-
 +
| colspan="2"|ref_len||length of the reference sequence || uint32_t||
 +
|-
 +
| colspan="5" align="center"|''List of sites until a record with rtype==0''
 +
|-
 +
| colspan="2"|rtype_ref||<nowiki>record_type<<4 | ref_base; 0..15=>XACMGRSVTWYHKDBN</nowiki> || uint8_t||
 +
|-
 +
| rowspan="4"|if rtype==1||offset||offset from the preceding record<sup>1</sup>||uint32_t||
 +
|-
 +
| min_depth||<nowiki>min_lk<<24 | read_depth (min_lk capped at 255)</nowiki>||uint32_t||
 +
|-
 +
| rmsMapQ|| RMS of mapping qualities of reads covering the site ||uint8_t||
 +
|-
 +
| lk||likelihood of each genotype in the order of AA AC AG AT CC CG CT GG GT TT||uint8_t[10]||
 +
|-style="border-top: 1px #aaa dashed;"
 +
| rowspan="10"|if rtype==2||offset||offset from the preceding record<sup>1,2</sup>||uint32_t||
 +
|-
 +
| min_depth||<nowiki>min_lk<<24 | read_depth</nowiki>||uint32_t||
 +
|-
 +
| rmsMapQ|| RMS of mapping qualities of reads covering the site ||uint8_t||
 +
|-
 +
| lkHom1||likelihood of the first homozygous indel allele (capped at 255)||uint8_t||
 +
|-
 +
| lkHom2||likelihood of the second homozygous indel allele (capped at 255)||uint8_t||
 +
|-
 +
| lkHet|| likelihood of a heterozygote (capped at 255)||uint8_t||
 +
|-
 +
| indelLen1|| length of the first indel allele (positive=ins; negative=del; zero=no-indel)||int16_t||
 +
|-
 +
| indelLen2|| length of the second indel allele ||int16_t||
 +
|-
 +
| indelSeq1|| sequence of the first indel allele||char[indelLen1]||
 +
|-
 +
| indelSeq2|| sequence of the second indel allele||char[indelLen2]||
 +
|-style="border-top: 1px #aaa dashed;"
 +
| if rtype==0||endMarker||end of this chromosome; no data in this record||(null)||
 +
|}
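
The site records in the table pack two values into `rtype_ref` and two into `min_depth`. The following Python is a rough sketch of how one reference sequence's site list could be decoded from an already decompressed stream. The function names are hypothetical. Two choices here are assumptions rather than statements of the specification: indel lengths are read as signed 16-bit integers so that negative (deletion) lengths are representable, and each indel sequence is taken to occupy as many bytes as the absolute value of its length.

```python
import io
import struct

REF_BASES = "XACMGRSVTWYHKDBN"  # code table from the rtype_ref row

def read_exact(stream, n):
    """Read exactly n bytes or fail loudly on a truncated stream."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated GLF record")
    return data

def read_sites(stream):
    """Yield one dict per site until the rtype==0 end marker is reached."""
    while True:
        (rtype_ref,) = struct.unpack("<B", read_exact(stream, 1))
        rtype = rtype_ref >> 4               # high nibble: record type
        ref_base = REF_BASES[rtype_ref & 0xF]  # low nibble: reference base
        if rtype == 0:
            return  # end of this reference sequence; no further fields
        offset, min_depth, rms_mapq = struct.unpack(
            "<IIB", read_exact(stream, 9))
        site = {
            "rtype": rtype,
            "ref_base": ref_base,
            "offset": offset,
            "min_lk": min_depth >> 24,           # top 8 bits
            "read_depth": min_depth & 0xFFFFFF,  # low 24 bits
            "rms_mapq": rms_mapq,
        }
        if rtype == 1:
            # Ten genotype likelihoods in the order given in the table.
            site["lk"] = list(read_exact(stream, 10))
        elif rtype == 2:
            lk_hom1, lk_hom2, lk_het, len1, len2 = struct.unpack(
                "<BBBhh", read_exact(stream, 7))  # 'h' = signed: assumption
            seq1 = read_exact(stream, abs(len1)).decode("ascii")
            seq2 = read_exact(stream, abs(len2)).decode("ascii")
            site.update(lk_hom1=lk_hom1, lk_hom2=lk_hom2, lk_het=lk_het,
                        indel_len=(len1, len2), indel_seq=(seq1, seq2))
        else:
            raise ValueError(f"unknown record type {rtype}")
        yield site
```

Calling `list(read_sites(stream))` once per reference entry, after reading `l_name`, `name` and `ref_len`, would walk the whole file under these assumptions.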
 +
 +
'''Notes:'''
 +
 +
1. Field offset equals the zero-based coordinate of the current record minus the coordinate of the preceding record. For the first record in a reference sequence, the coordinate of the preceding record is assumed to be zero. Offset is non-negative.
 +
 +
2. If a sequence is inserted between positions [x,x+1] on the reference sequence, the coordinate of this record is x; if the sequence between [x,y] on the reference is deleted, the coordinate of this record is x.
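
Note 1 implies that absolute coordinates are the running sum of the offsets within one reference sequence. A small Python illustration follows; the function name `site_coordinates` is hypothetical.

```python
from itertools import accumulate

def site_coordinates(offsets):
    """Turn per-record offsets into zero-based coordinates on one reference.

    The coordinate before the first record is taken to be zero, so the
    first coordinate equals the first offset.
    """
    return list(accumulate(offsets))

print(site_coordinates([100, 1, 0, 25]))  # prints [100, 101, 101, 126]
```

An offset of zero, as in the third record above, leaves the coordinate unchanged, which is how two records (for example a SNP record and an indel record) could share the same position.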
    
== Tools That Use GLF Files ==
 
