Line 17: |
Line 17: |
| == File Format == | | == File Format == |
| | | |
− | The GLF-file format is defined in an Appendix to the [http://samtools.sourceforge.net/SAM1.pdf SAM-file format specification]. | + | The GLF-file format was defined in an Appendix to the [http://samtools.sourceforge.net/SAM1.pdf SAM-file format specification]. It has since been removed from that document. |
| | | |
| The current specification (GLF version 3) follows. All integers are stored in little-endian byte order. Most GLF files are compressed in a GZIP-compatible format; SAMTOOLS will only read GLF files that are compressed with the BGZF library. | | The current specification (GLF version 3) follows. All integers are stored in little-endian byte order. Most GLF files are compressed in a GZIP-compatible format; SAMTOOLS will only read GLF files that are compressed with the BGZF library. |
Line 73: |
Line 73: |
| | | |
| Records with recordType = 0 are empty. | | Records with recordType = 0 are empty. |
| + | |
| + | |
| + | === SAM Format Specification 0.1.2-draft (20090820) A. Genotype Likelihood Format version 3 (GLFv3)=== |
| + | GLFs store the probability of a genotype given data. |
| + | |
| + | All integers are stored in little-endian byte order. |
| + | |
| + | {| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | |-style="background: #f2f2f2; text-align: center;" |
| + | ! colspan="3"|Field !! Description !! Type !! Value |
| + | |- |
| + | | colspan="3"|magic || GLFv3 magic number || char[4]||GLF\3 |
| + | |- |
| + | |colspan="3"|l_text||Length of the header text, including any zero padding||int32_t|| |
| + | |- |
| + | |colspan="3"|text||Text||char[l_text]|| |
| + | |- |
| + | |colspan="6" align="center"|''List of reference information until the end of the file'' |
| + | |- |
| + | | rowspan="20" style="width: 20px"| || colspan="2"|l_name||Length of the reference sequence name plus 1 (including NULL) || int32_t|| |
| + | |- |
| + | | colspan="2"|name||Name; NULL terminated || char[l_name]|| |
| + | |- |
| + | | colspan="2"|ref_len||length of the reference sequence || uint32_t|| |
| + | |- |
| + | | colspan="5" align="center"|''List of sites until a record with rtype==0'' |
| + | |- |
| + | | colspan="2"|rtype_ref||<nowiki>record_type<<4 | ref_base; 0..15=>XACMGRSVTWYHKDBN</nowiki> || uint8_t|| |
| + | |- |
| + | | rowspan="4"|if rtype==1||offset||offset from the preceding record<sup>1</sup>||uint32_t|| |
| + | |- |
| + | | min_depth||<nowiki>min_lk<<24 | read_depth (min_lk capped at 255)</nowiki>||uint32_t|| |
| + | |- |
| + | | rmsMapQ|| RMS of mapping qualities of reads covering the site ||uint8_t|| |
| + | |- |
| + | | lk||likelihood of each genotype in the order of AA AC AG AT CC CG CT GG GT TT||uint8_t[10]|| |
| + | |-style="border-top: 1px #aaa dashed;" |
| + | | rowspan="10"|if rtype==2||offset||offset from the preceding record<sup>1,2</sup>||uint32_t|| |
| + | |- |
| + | | min_depth||<nowiki>min_lk<<24 | read_depth</nowiki>||uint32_t|| |
| + | |- |
| + | | rmsMapQ|| RMS of mapping qualities of reads covering the site ||uint8_t|| |
| + | |- |
| + | | lkHom1||likelihood of the first homozygous indel allele (capped at 255)||uint8_t|| |
| + | |- |
| + | | lkHom2||likelihood of the second homozygous indel allele (capped at 255)||uint8_t|| |
| + | |- |
| + | | lkHet|| likelihood of the heterozygote (capped at 255)||uint8_t|| |
| + | |- |
| + | | indelLen1|| length of the first indel allele (positive=ins; negative=del; zero=no-indel)||int16_t|| |
| + | |- |
| + | | indelLen2|| length of the second indel allele (same sign convention as indelLen1)||int16_t|| |
| + | |- |
| + | | indelSeq1|| sequence of the first indel allele||char[indelLen1]|| |
| + | |- |
| + | | indelSeq2|| sequence of the second indel allele||char[indelLen2]|| |
| + | |-style="border-top: 1px #aaa dashed;" |
| + | | if rtype==0||endMarker||end of this chromosome; no data in this record||(null)|| |
| + | |} |
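The packed fields in the table above can be illustrated with a short Python sketch. It is not taken from SAMTOOLS; the function name <code>parse_record</code> and the dictionary keys are hypothetical, and it assumes the record bytes have already been decompressed into memory. Only rtype 0 and rtype 1 records are handled.

```python
import struct

# Reference-base codes 0..15, as listed in the rtype_ref row of the table.
REF_BASES = "XACMGRSVTWYHKDBN"
# The ten unordered diploid genotypes, in the order used by the lk field.
GENOTYPES = ["AA", "AC", "AG", "AT", "CC", "CG", "CT", "GG", "GT", "TT"]


def parse_record(buf, pos=0):
    """Decode one rtype 0 or rtype 1 record starting at buf[pos].

    Hypothetical helper: returns (record dict, position of the next record).
    """
    (rtype_ref,) = struct.unpack_from("<B", buf, pos)
    pos += 1
    rtype = rtype_ref >> 4                 # upper 4 bits: record type
    ref_base = REF_BASES[rtype_ref & 0xF]  # lower 4 bits: reference base code
    if rtype == 0:
        # End marker for this reference sequence; no further data.
        return {"rtype": 0, "ref_base": ref_base}, pos
    if rtype != 1:
        raise ValueError("this sketch only handles rtype 0 and rtype 1")

    # "<" selects little-endian with no padding: uint32, uint32, uint8.
    offset, min_depth, rms_mapq = struct.unpack_from("<IIB", buf, pos)
    pos += 9
    lk = struct.unpack_from("<10B", buf, pos)
    pos += 10

    record = {
        "rtype": rtype,
        "ref_base": ref_base,
        "offset": offset,
        "min_lk": min_depth >> 24,             # top 8 bits
        "read_depth": min_depth & 0xFFFFFF,    # low 24 bits
        "rms_mapq": rms_mapq,
        "lk": dict(zip(GENOTYPES, lk)),
    }
    return record, pos
```

Calling <code>parse_record</code> repeatedly, feeding each returned position back in until a record with rtype 0 appears, walks the sites of one reference sequence. Indel records (rtype 2) would need their own branch.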
| + | |
| + | '''Notes:''' |
| + | |
| + | 1. Field offset equals the zero-based coordinate of the current record minus the coordinate of the preceding record. For the first record in a reference sequence, the coordinate of the preceding record is assumed to be zero. Offset is non-negative. |
| + | |
| + | 2. If a sequence is inserted between positions [x,x+1] on the reference sequence, the coordinate of this record is x; if the sequence between [x,y] on the reference is deleted, the coordinate of this record is x. |
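Note 1 amounts to a running sum: within one reference sequence, the zero-based coordinate of each record is the sum of all offsets up to and including its own. A minimal Python sketch of that conversion and its inverse follows; the function names are hypothetical and the input is assumed to be the list of offset values already read from the records of a single reference sequence.

```python
from itertools import accumulate


def offsets_to_coordinates(offsets):
    """Turn per-record offsets into zero-based coordinates (running sum).

    The record before the first one is treated as sitting at coordinate 0.
    """
    return list(accumulate(offsets))


def coordinates_to_offsets(coords):
    """Inverse conversion, for writing records; coords must be non-decreasing."""
    offsets = []
    previous = 0  # coordinate assumed for the record before the first one
    for coord in coords:
        if coord < previous:
            raise ValueError("coordinates must be non-decreasing")
        offsets.append(coord - previous)
        previous = coord
    return offsets
```

An offset of zero is allowed, so two consecutive records can share a coordinate.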
| | | |
| == Tools That Use GLF Files == | | == Tools That Use GLF Files == |