LibStatGen: ASP
Revision as of 18:05, 25 January 2012
Asymmetric Pileup (ASP)
Asymmetric Pileup (ASP) is a new pileup file format that we created to replace GLF.
ASP File Format
ASP files are binary files consisting of 4 types of records. Every record starts with a 1-byte field that indicates the type.
Type Value | Record Type | Description |
---|---|---|
0 | Empty | indicates there are no bases for this position. |
1 | Position Only | specifies the chromosome ID and 0-based position of the next record (other records do not contain a position). |
2 | Reference Only | indicates that all bases at this position match the reference and provides the number of bases, GLH, and GLA. |
3 | Detailed | indicates that not all bases at this position match the reference and provides the number of bases, the bases, the qualities, the cycles, the strands, and the mapping qualities (MQs). |
Position Only records are the only records that contain a chromosome ID and position. The record that follows a Position Only record is at the chromosome ID and position it specifies; every other record is assumed to be one position past the previous record.
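The position rule can be sketched as follows. This is a minimal illustration, not libStatGen's reader: `AspRecordType`, `PositionTracker`, and its methods are hypothetical names, record payloads are ignored so only the position bookkeeping is shown, and the sketch assumes a Position Only record appears before the first data record.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for the four record types (values from the table above).
enum AspRecordType : uint8_t {
    ASP_EMPTY = 0,
    ASP_POSITION_ONLY = 1,
    ASP_REFERENCE_ONLY = 2,
    ASP_DETAILED = 3
};

// Tracks the chromosome ID and 0-based position of the next data record.
// ASSUMPTION: a Position Only record appears before the first data record.
struct PositionTracker {
    int32_t chromID = -1;  // unknown until a Position Only record is seen
    int32_t nextPos = -1;

    // Position Only record: the next record is at (chrom, pos).
    void setPosition(int32_t chrom, int32_t pos) {
        chromID = chrom;
        nextPos = pos;
    }

    // Empty, Reference Only, or Detailed record: return the position this
    // record describes, then advance by one for the record after it.
    int32_t consumeDataRecord() {
        assert(nextPos >= 0 && "data record before any Position Only record");
        return nextPos++;
    }
};
```

For example, after a Position Only record for chromosome 0, position 100, an Empty record and a following Detailed record describe positions 100 and 101 of chromosome 0.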
Record Type: Empty
An Empty record is only 1 byte and contains just the type field. It is a placeholder indicating that there were no bases at this position, so that a Position Only record is not needed every time a position has no bases.
Field | Description | Type | Value |
---|---|---|---|
type | Empty Record Type | uint8_t | 0 |
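The trade-off behind the Empty record is simple byte arithmetic: an Empty record costs 1 byte, while a Position Only record costs 9 bytes (a 1-byte type plus two 4-byte int32_t fields). The sketch below compares the two ways a writer could move past a run of positions with no bases. The helper names are hypothetical, and whether libStatGen's writer ever switches to a Position Only record for long gaps is not stated on this page; treat that policy as an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Record sizes in bytes, taken from the record tables.
constexpr std::size_t kEmptyRecordBytes = 1;                 // type
constexpr std::size_t kPositionOnlyRecordBytes = 1 + 4 + 4;  // type + chromID + pos

// Bytes spent skipping `gap` consecutive positions with no bases
// using one Empty record per position.
constexpr std::size_t emptyRunBytes(std::size_t gap) {
    return gap * kEmptyRecordBytes;
}

// Hypothetical writer policy: true when a single Position Only record
// would be strictly smaller than a run of Empty records.
constexpr bool positionOnlyIsSmaller(std::size_t gap) {
    return kPositionOnlyRecordBytes < emptyRunBytes(gap);
}
```

With these sizes, a run of up to 9 empty positions never costs more as Empty records than as one Position Only record.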
Record Type: Position Only
Field | Description | Type | Value |
---|---|---|---|
type | Position Only Record Type | uint8_t | 1 |
chromID | Chromosome ID of the next record | int32_t | |
pos | 0-based position of the next record | int32_t | |
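A Position Only record is therefore 9 bytes long. The sketch below encodes and decodes one. This page does not state the byte order of the int32_t fields, so the sketch assumes little-endian (what a native write produces on x86); `encodePositionOnly`, `decodePositionOnly`, and the other names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Serialized Position Only record: 1-byte type + int32_t chromID + int32_t pos.
// ASSUMPTION: the int32_t fields are little-endian (not specified on this page).
using PositionOnlyBytes = std::array<uint8_t, 9>;

static void putInt32LE(uint8_t* out, int32_t value) {
    uint32_t u = static_cast<uint32_t>(value);
    for (int i = 0; i < 4; ++i) out[i] = static_cast<uint8_t>(u >> (8 * i));
}

static int32_t getInt32LE(const uint8_t* in) {
    uint32_t u = 0;
    for (int i = 0; i < 4; ++i) u |= static_cast<uint32_t>(in[i]) << (8 * i);
    return static_cast<int32_t>(u);
}

PositionOnlyBytes encodePositionOnly(int32_t chromID, int32_t pos) {
    PositionOnlyBytes rec{};
    rec[0] = 1;                    // Position Only record type
    putInt32LE(&rec[1], chromID);  // bytes 1-4
    putInt32LE(&rec[5], pos);      // bytes 5-8
    return rec;
}

void decodePositionOnly(const PositionOnlyBytes& rec, int32_t& chromID, int32_t& pos) {
    assert(rec[0] == 1);
    chromID = getInt32LE(&rec[1]);
    pos = getInt32LE(&rec[5]);
}
```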
Record Type: Reference Only
Field | Description | Type | Value/Range |
---|---|---|---|
type | Reference Only Record Type | uint8_t | 2 |
numBases | Number of bases at this position | uint8_t | 1-255 |
GLH | Genotype Likelihood H | uint8_t | 0-255 |
GLA | Genotype Likelihood Alternate | uint8_t | 0-255 |
If a position has more than 255 bases, only the first 255 are used to calculate GLH and GLA.
Phred qualities that are unknown or less than 13 are not used in the genotype likelihood calculation, but the corresponding bases are still counted in numBases.
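The two rules above can be sketched as follows. This is not libStatGen's genotype likelihood code: the page does not say how GLH and GLA are computed, so the sketch only selects which bases would feed that calculation and parses the 4-byte record. It assumes an unknown quality is represented as 255 (as in the Detailed record's allQuals field) and that numBases saturates at 255; `RefOnlyRecord`, `storedNumBases`, `basesUsedForGL`, and `parseRefOnly` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference Only record: type (2), numBases, GLH, GLA; all uint8_t.
struct RefOnlyRecord {
    uint8_t numBases;
    uint8_t glh;
    uint8_t gla;
};

constexpr uint8_t kUnknownQual = 255;  // ASSUMPTION: same "unknown" code as allQuals
constexpr uint8_t kMinGLQual = 13;     // qualities below this are skipped

// numBases counts every base, low quality or not.
// ASSUMPTION: the stored value saturates at 255.
uint8_t storedNumBases(std::size_t basesAtPos) {
    return static_cast<uint8_t>(std::min<std::size_t>(basesAtPos, 255));
}

// Indices of the bases whose qualities would feed the GLH/GLA calculation:
// only the first 255 bases, skipping unknown qualities and qualities below 13.
std::vector<std::size_t> basesUsedForGL(const std::vector<uint8_t>& quals) {
    std::vector<std::size_t> used;
    std::size_t limit = std::min<std::size_t>(quals.size(), 255);
    for (std::size_t i = 0; i < limit; ++i) {
        if (quals[i] != kUnknownQual && quals[i] >= kMinGLQual) used.push_back(i);
    }
    return used;
}

// Reads the 4-byte record starting at rec[0].
RefOnlyRecord parseRefOnly(const uint8_t* rec) {
    assert(rec[0] == 2);  // Reference Only record type
    return RefOnlyRecord{rec[1], rec[2], rec[3]};
}
```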
Record Type: Detailed
Field | Description | Type | Value/Range |
---|---|---|---|
type | Detailed Record Type | uint8_t | 3 |
numBases | Number of bases at this position | uint8_t | 1-255 |
allBases | 4-bit encoded bases/deletions for this position. The 1st base is in the upper 4 bits of the 1st byte; if there is an odd number of bases, the lower 4 bits of the last byte are 0. | uint8_t[(numBases+1)/2] | 0=A, 1=C, 2=G, 3=T, 4=N, 5=D (deletion) |
allQuals | Phred qualities for all bases at this position | uint8_t[numBases] | 0-254; 255 = unknown quality |
allCycles | 0-based position in the read for each base at this position | uint8_t[numBases] | 0-255 |
allStrands | Strand for each base at this position. The strand of the 1st base is in the uppermost bit of the 1st byte; if numBases is not a multiple of 8, the extra lower bits are set to 0. | uint8_t[(numBases+7)/8] | 0 = forward, 1 = reverse |
allMQs | Mapping Qualities for all bases at this position | uint8_t[numBases] | 0-255 |
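Because every array length is derived from numBases, a reader can compute the size of a Detailed record from its second byte. The sketch below, using hypothetical function names, shows that size calculation and the two packed layouts: 4-bit base codes with the first base in the upper nibble, and 1-bit strands with the first strand in the most significant bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Total size in bytes of a Detailed record holding n bases:
// type + numBases + allBases + allQuals + allCycles + allStrands + allMQs.
std::size_t detailedRecordSize(std::size_t n) {
    return 1 + 1 + (n + 1) / 2 + n + n + (n + 7) / 8 + n;
}

// Packs base codes (0=A, 1=C, 2=G, 3=T, 4=N, 5=D) two per byte, first base
// in the upper nibble; with an odd count the last lower nibble stays 0.
std::vector<uint8_t> packBases(const std::vector<uint8_t>& codes) {
    std::vector<uint8_t> out((codes.size() + 1) / 2, 0);
    for (std::size_t i = 0; i < codes.size(); ++i) {
        assert(codes[i] <= 5);
        out[i / 2] |= (i % 2 == 0) ? static_cast<uint8_t>(codes[i] << 4) : codes[i];
    }
    return out;
}

// Packs strands (false = forward, true = reverse) eight per byte, first
// strand in the most significant bit; unused low bits stay 0.
std::vector<uint8_t> packStrands(const std::vector<bool>& reverse) {
    std::vector<uint8_t> out((reverse.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < reverse.size(); ++i) {
        if (reverse[i]) out[i / 8] |= static_cast<uint8_t>(0x80u >> (i % 8));
    }
    return out;
}
```

For a 2-base record, `detailedRecordSize(2)` is 1 + 1 + 1 + 2 + 2 + 1 + 2 = 10 bytes, which matches the length of the hex dump in the example.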
Example
Hex dump of a record: 0x0302231d1d0201402c22
Byte | Bits | Hex | Binary | Meaning |
---|---|---|---|---|
0 | 0-7 | 03 | 0000 0011 | Type = DETAILED |
1 | 8-15 | 02 | 0000 0010 | NumBases = 2 |
2 | 16-23 | 23 | 0010 0011 | Base1 = 2 = G, Base2 = 3 = T |
3 | 24-31 | 1D | 0001 1101 | Qual1 = 0x1D = 29 = '>' (Phred+33) |
4 | 32-39 | 1D | 0001 1101 | Qual2 = 0x1D = 29 = '>' (Phred+33) |
5 | 40-47 | 02 | 0000 0010 | Cycle1 = 2 |
6 | 48-55 | 01 | 0000 0001 | Cycle2 = 1 |
7 | 56-63 | 40 | 0100 0000 | Strand1 (bit 56) = 0 = forward, Strand2 (bit 57) = 1 = reverse; the remaining bits are padding = 0 |
8 | 64-71 | 2C | 0010 1100 | MapQual1 = 0x2C = 44 |
9 | 72-79 | 22 | 0010 0010 | MapQual2 = 0x22 = 34 |
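Decoding the example in code makes the field order concrete. The sketch below is a hypothetical reader (`DetailedRecord` and `decodeDetailed` are not libStatGen's API) that parses one Detailed record from the start of a byte buffer and rejects a wrong type byte, an invalid base code, or a truncated buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DetailedRecord {
    std::vector<char> bases;     // 'A', 'C', 'G', 'T', 'N', or 'D'
    std::vector<uint8_t> quals;  // 255 = unknown
    std::vector<uint8_t> cycles;
    std::vector<bool> reverse;   // false = forward, true = reverse
    std::vector<uint8_t> mqs;
};

// Decodes one Detailed record starting at buf[0]. Returns false if the type
// byte is not 3, a base code is out of range, or the buffer is too short.
bool decodeDetailed(const std::vector<uint8_t>& buf, DetailedRecord& rec) {
    if (buf.size() < 2 || buf[0] != 3) return false;
    const std::size_t n = buf[1];
    const std::size_t need = 2 + (n + 1) / 2 + 3 * n + (n + 7) / 8;
    if (buf.size() < need) return false;

    static const char kCodes[] = {'A', 'C', 'G', 'T', 'N', 'D'};
    rec = DetailedRecord{};
    std::size_t off = 2;
    for (std::size_t i = 0; i < n; ++i) {  // allBases: upper nibble first
        uint8_t byte = buf[off + i / 2];
        uint8_t code = (i % 2 == 0) ? (byte >> 4) : (byte & 0x0F);
        if (code > 5) return false;
        rec.bases.push_back(kCodes[code]);
    }
    off += (n + 1) / 2;
    rec.quals.assign(buf.begin() + off, buf.begin() + off + n);   // allQuals
    off += n;
    rec.cycles.assign(buf.begin() + off, buf.begin() + off + n);  // allCycles
    off += n;
    for (std::size_t i = 0; i < n; ++i)  // allStrands: most significant bit first
        rec.reverse.push_back((buf[off + i / 8] >> (7 - i % 8)) & 1);
    off += (n + 7) / 8;
    rec.mqs.assign(buf.begin() + off, buf.begin() + off + n);     // allMQs
    return true;
}
```

Applied to the bytes 03 02 23 1D 1D 02 01 40 2C 22, it yields bases G and T, qualities 29 and 29, cycles 2 and 1, strands forward and reverse, and mapping qualities 44 and 34.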