Difference between revisions of "LibStatGen: ASP"
Revision as of 18:07, 26 January 2012
Asymmetric Pileup (ASP)
Asymmetric Pileup (ASP) is a new pileup file format that we created to replace GLF.
ASP File Format
ASP files are binary files consisting of 4 types of records. Every record starts with a 1-byte field that indicates the type.
Type Value | Record Type | Description |
---|---|---|
0 | Empty | indicates there are no bases for this position. |
1 | Position Only | specifies the chromosome id/0-based position of the next record (other records do not contain a position). |
2 | Reference Only | indicates that all bases at this position match the reference and provides the number of bases, GLH, and GLA. |
3 | Detailed | indicates that not all bases at this position match the reference and provides the number of bases, the bases, the qualities, the cycles, the strands, and the MQs. |
Only position records contain a chromosome id/position. The record after a position record has the chromosome id & position specified in the position record. All other records are assumed to just increment one position from the previous record.
The first record in a file must be a Position Only Record.
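The position rule above can be sketched as a small state machine. This is only an illustration: the names `AspRecordType`, `Locus`, `SketchRecord`, and `assignPositions` are hypothetical and are not part of libStatGen; only the type values 0-3 come from the table above.

```cpp
#include <cstdint>
#include <vector>

// Type values from the record type table; the enum itself is illustrative.
enum class AspRecordType : uint8_t {
    Empty = 0,
    PositionOnly = 1,
    ReferenceOnly = 2,
    Detailed = 3
};

// A chromosome id / 0-based position pair.
struct Locus {
    int32_t chromID;
    int32_t pos;
};

// A record as seen by this sketch: its type, plus the locus payload that
// only Position Only records carry.
struct SketchRecord {
    AspRecordType type;
    Locus next;  // meaningful only when type == PositionOnly
};

// Returns the locus of every non-position record, in file order.
std::vector<Locus> assignPositions(const std::vector<SketchRecord>& records) {
    std::vector<Locus> loci;
    Locus current{-1, -1};
    bool havePosition = false;    // true once a Position Only record was seen
    bool justPositioned = false;  // true if the previous record was Position Only
    for (const SketchRecord& rec : records) {
        if (rec.type == AspRecordType::PositionOnly) {
            current = rec.next;
            havePosition = true;
            justPositioned = true;
            continue;
        }
        if (!havePosition) {
            break;  // malformed input: the first record must be Position Only
        }
        if (!justPositioned) {
            ++current.pos;  // all other records advance by one position
        }
        justPositioned = false;
        loci.push_back(current);
    }
    return loci;
}
```

A record that directly follows a Position Only record takes that record's locus unchanged; every later record, including an Empty one, moves the position forward by one.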
Record Type: Empty
An empty record is only 1 byte and just contains the type field.
Since non-position only records do not have a position associated with them, the position is determined by adding one to the position of the previous record.
When positions have no bases, there are two ways to deal with them.
- Write a new position record for the next position that has bases
- Write empty records to indicate those positions have no bases.
If positions that have bases are not far apart, it is preferable to write empty records rather than a new position record since Empty records should compress well and are only 1 byte.
Field | Description | Type | Value |
---|---|---|---|
type | Empty Record Type | uint8_t | 0 |
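One way to reason about the choice between the two options is to compare uncompressed sizes. The sketch below is a heuristic, not something this page prescribes: `preferEmptyRecords` and its default threshold are hypothetical. The threshold of 9 bytes is the size of a Position Only record (1-byte type plus two `int32_t` fields). Because Empty records should compress well, a writer may reasonably choose a larger threshold.

```cpp
#include <cstdint>

// Sizes taken from the record layouts described on this page.
constexpr int64_t kEmptyRecordBytes = 1;             // type only
constexpr int64_t kPositionRecordBytes = 1 + 4 + 4;  // type + chromID + pos

// gap: number of consecutive positions without bases between two
// positions that do have bases.
// breakEvenBytes: ASSUMPTION, defaults to the uncompressed size of one
// Position Only record; compression is ignored here.
bool preferEmptyRecords(int64_t gap,
                        int64_t breakEvenBytes = kPositionRecordBytes) {
    return gap * kEmptyRecordBytes <= breakEvenBytes;
}
```

With the default threshold, a gap of up to 9 positions is filled with Empty records, and a longer gap is bridged with a new Position Only record.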
Record Type: Position Only
Position Only records are used to specify the chromosome ID & 0-based position of the following record. The first record in the file must be a Position Only record.
Field | Description | Type | Value |
---|---|---|---|
type | Position Only Record Type | uint8_t | 1 |
chromID | Chromosome ID of the next record | int32_t | |
pos | 0-based position of the next record | int32_t |
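As a rough illustration of this layout, the sketch below decodes a Position Only record from a byte buffer. The names `PositionOnlyRecord`, `readInt32LE`, and `parsePositionOnly` are hypothetical. The byte order of the `int32_t` fields is not stated on this page; little-endian is assumed here purely for the example.

```cpp
#include <cstddef>
#include <cstdint>

struct PositionOnlyRecord {
    int32_t chromID;
    int32_t pos;  // 0-based
};

// Reads a 32-bit signed integer stored least-significant byte first.
// ASSUMPTION: little-endian; the page does not specify the byte order.
int32_t readInt32LE(const uint8_t* p) {
    const uint32_t v = static_cast<uint32_t>(p[0]) |
                       (static_cast<uint32_t>(p[1]) << 8) |
                       (static_cast<uint32_t>(p[2]) << 16) |
                       (static_cast<uint32_t>(p[3]) << 24);
    return static_cast<int32_t>(v);
}

// Returns false if the buffer is too short or the type byte is not 1.
bool parsePositionOnly(const uint8_t* buf, size_t len, PositionOnlyRecord& out) {
    const size_t kRecordSize = 1 + 4 + 4;  // type + chromID + pos
    if (buf == nullptr || len < kRecordSize || buf[0] != 1) {
        return false;
    }
    out.chromID = readInt32LE(buf + 1);
    out.pos = readInt32LE(buf + 5);
    return true;
}
```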
Record Type: Reference Only
The position associated with a Reference Only record is 1 greater than the position of the previous record unless the previous record is a position record.
Field | Description | Type | Value/Range |
---|---|---|---|
type | Reference Only Record Type | uint8_t | 2 |
numBases | Number of bases at this position | uint8_t | 1-255 |
GLH | Genotype Likelihood H | uint8_t | 0-255 |
GLA | Genotype Likelihood Alternate | uint8_t | 0-255 |
If a position has more than 255 bases, only the first 255 are used for calculating the GLH and GLA.
Phred Qualities that are unknown or less than 13 are not used in the Genotype Likelihood calculation, but are counted in the numBases if there is a base at the position.
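The following sketch decodes the 4-byte Reference Only layout and expresses the quality rule as a predicate. The names `ReferenceOnlyRecord`, `parseReferenceOnly`, and `usedInLikelihood` are hypothetical. Treating 255 as the "unknown quality" marker is taken from the Detailed record table; applying it here is an assumption. How GLH and GLA themselves are computed is not described on this page, so that calculation is not shown.

```cpp
#include <cstddef>
#include <cstdint>

struct ReferenceOnlyRecord {
    uint8_t numBases;  // 1-255
    uint8_t glh;       // Genotype Likelihood H
    uint8_t gla;       // Genotype Likelihood Alternate
};

// Returns false if the buffer is too short, the type byte is not 2,
// or numBases is outside the documented 1-255 range.
bool parseReferenceOnly(const uint8_t* buf, size_t len, ReferenceOnlyRecord& out) {
    if (buf == nullptr || len < 4 || buf[0] != 2 || buf[1] == 0) {
        return false;
    }
    out.numBases = buf[1];
    out.glh = buf[2];
    out.gla = buf[3];
    return true;
}

// True if a base with this Phred quality contributes to the likelihoods.
// The base is still counted in numBases either way.
bool usedInLikelihood(uint8_t phredQual) {
    const uint8_t kUnknownQuality = 255;  // ASSUMPTION: same marker as allQuals
    const uint8_t kMinQuality = 13;
    return phredQual != kUnknownQuality && phredQual >= kMinQuality;
}
```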
Record Type: Detailed
The position associated with a Detailed record is 1 greater than the position of the previous record unless the previous record is a position record.
Field | Description | Type | Value/Range |
---|---|---|---|
type | Detailed Record Type | uint8_t | 3 |
numBases | Number of bases at this position | uint8_t | 1-255 |
all Bases | 4-bit encoded bases/deletions for this position. The 1st base is in the upper bits of the 1st byte. If there is an odd number of bases, the lower bits of the last byte are 0. | uint8_t[(numBases+1)/2] | 0=A, 1=C, 2=G, 3=T, 4=N, 5=D (deletion) |
allQuals | All Phred Quals for this position | uint8_t[numBases] | 0-254; 255 = unknown quality |
allCycles | 0-based position in the reads for all bases at this position | uint8_t[numBases] | 0-255 |
allStrands | Strand for all bases at this position. The strand of the 1st base is in the uppermost bit of the first byte. If numBases is not a multiple of 8, the extra lower bits are set to 0. | uint8_t[(numBases+7)/8] | 0 = forward, 1 = reverse |
allMQs | Mapping Qualities for all bases at this position | uint8_t[numBases] | 0-255 |
If a position has more than 255 bases, only the first 255 are used in this record.
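The array sizes in the table determine the total record length, and the two packed fields can be built with simple bit operations. The helpers below (`detailedRecordSize`, `packBases`, `packStrands`) are hypothetical names for a minimal sketch of that packing; they are not taken from libStatGen.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Total size in bytes of a Detailed record holding numBases bases:
// type + numBases + packed bases + quals + cycles + packed strands + MQs.
size_t detailedRecordSize(uint8_t numBases) {
    const size_t n = numBases;
    return 1 + 1 + (n + 1) / 2 + n + n + (n + 7) / 8 + n;
}

// Packs 4-bit base codes (0=A, 1=C, 2=G, 3=T, 4=N, 5=D) two per byte,
// first base in the upper bits; an unused lower nibble stays 0.
std::vector<uint8_t> packBases(const std::vector<uint8_t>& codes) {
    std::vector<uint8_t> out((codes.size() + 1) / 2, 0);
    for (size_t i = 0; i < codes.size(); ++i) {
        const uint8_t nibble = codes[i] & 0x0F;
        const uint8_t shifted =
            (i % 2 == 0) ? static_cast<uint8_t>(nibble << 4) : nibble;
        out[i / 2] = static_cast<uint8_t>(out[i / 2] | shifted);
    }
    return out;
}

// Packs strands one bit per base (true = reverse), first base in the
// uppermost bit of the first byte; unused lower bits stay 0.
std::vector<uint8_t> packStrands(const std::vector<bool>& reverse) {
    std::vector<uint8_t> out((reverse.size() + 7) / 8, 0);
    for (size_t i = 0; i < reverse.size(); ++i) {
        if (reverse[i]) {
            out[i / 8] = static_cast<uint8_t>(out[i / 8] | (0x80u >> (i % 8)));
        }
    }
    return out;
}
```

For two bases the record is 10 bytes long, and the maximum of 255 bases gives 927 bytes.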
Example
Hex dump of a record: 0x0302231d1d0201402c22
Byte | Hex | Binary | Field | Meaning |
---|---|---|---|---|
0 | 03 | 00000011 | type | DETAILED |
1 | 02 | 00000010 | numBases | NumBases = 2 |
2 | 23 | 00100011 | all Bases | Base1 = 2 = G, Base2 = 3 = T |
3 | 1D | 00011101 | allQuals | Qual1 = 0x1D = 29 = '>' |
4 | 1D | 00011101 | allQuals | Qual2 = 0x1D = 29 = '>' |
5 | 02 | 00000010 | allCycles | Cycle1 = 2 |
6 | 01 | 00000001 | allCycles | Cycle2 = 1 |
7 | 40 | 01000000 | allStrands | Strand1 (bit 56) = 0 = forward, Strand2 (bit 57) = 1 = reverse, extra bits are dummy bits = 0 |
8 | 2C | 00101100 | allMQs | MapQual1 = 0x2C = 44 |
9 | 22 | 00100010 | allMQs | MapQual2 = 0x22 = 34 |
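The example record can also be decoded programmatically. The sketch below is a minimal stand-alone decoder written from the field table, not libStatGen code; `DetailedRecord` and `parseDetailed` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct DetailedRecord {
    std::string bases;              // one of A, C, G, T, N, D per base
    std::vector<uint8_t> quals;     // Phred qualities, 255 = unknown
    std::vector<uint8_t> cycles;    // 0-based position in the read
    std::vector<bool> reverse;      // true = reverse strand
    std::vector<uint8_t> mapQuals;  // mapping qualities
};

// Decodes one Detailed record from buf. Returns false on a wrong type
// byte, a truncated buffer, numBases == 0, or an unknown base code.
bool parseDetailed(const std::vector<uint8_t>& buf, DetailedRecord& out) {
    static const char kBaseChars[] = {'A', 'C', 'G', 'T', 'N', 'D'};
    if (buf.size() < 2 || buf[0] != 3) {
        return false;
    }
    const size_t n = buf[1];
    const size_t basesLen = (n + 1) / 2;
    const size_t strandsLen = (n + 7) / 8;
    if (n == 0 || buf.size() < 2 + basesLen + 3 * n + strandsLen) {
        return false;
    }
    // Field start offsets, in the order given by the field table.
    const uint8_t* bases = buf.data() + 2;
    const uint8_t* quals = bases + basesLen;
    const uint8_t* cycles = quals + n;
    const uint8_t* strands = cycles + n;
    const uint8_t* mqs = strands + strandsLen;

    out = DetailedRecord();
    for (size_t i = 0; i < n; ++i) {
        // Even-indexed bases sit in the upper nibble, odd ones in the lower.
        const uint8_t code = static_cast<uint8_t>(
            (i % 2 == 0) ? (bases[i / 2] >> 4) : (bases[i / 2] & 0x0F));
        if (code > 5) {
            return false;
        }
        out.bases.push_back(kBaseChars[code]);
        out.quals.push_back(quals[i]);
        out.cycles.push_back(cycles[i]);
        // The strand of base i is bit (7 - i % 8) of byte i / 8.
        out.reverse.push_back(((strands[i / 8] >> (7 - i % 8)) & 1) != 0);
        out.mapQuals.push_back(mqs[i]);
    }
    return true;
}
```

Feeding the ten bytes of the hex dump to `parseDetailed` yields the bases "GT", qualities 29 and 29, cycles 2 and 1, strands forward and reverse, and mapping qualities 44 and 34, matching the breakdown in the example.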
API for Reading ASP Files
You will use both an AspFileReader and an AspRecord for reading ASP files.
AspFileReader
An instance of the AspFileReader class is used to read ASP files.
Include File
AspFileReader is declared in AspFile.h, so be sure to include that file.
#include "AspFile.h"
Opening the ASP File
open opens the specified file and throws an exception if the file could not be opened.
// Open the asp file for reading.
AspFileReader asp;
asp.open("aspFileName.asp");
Reading the ASP Record
Once the file is open, there are two methods to get the next record: getNextRecord and getNextDataRecord.
Both methods return true if a record was successfully found and false on EOF or an error.
Both methods take a reference to an AspRecord as a parameter. When true is returned, the AspRecord is updated with the next record.
The AspRecord set by getNextRecord can be any type of record: Reference Only, Detailed, Empty, or Position Only.
The AspRecord set by getNextDataRecord will only be a Reference Only Record or a Detailed Record. It consumes any Empty Records and Position Only Records it finds until a Reference Only or Detailed Record is found.
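The difference between the two methods can be modelled without the library. The sketch below is a stand-in that mimics the behaviour described above over an in-memory list of records; `ModelType`, `ModelRecord`, and `ModelReader` are hypothetical and are not the real `AspFileReader` or `AspRecord` classes, whose internals are not shown on this page.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

enum class ModelType { Empty = 0, PositionOnly = 1, ReferenceOnly = 2, Detailed = 3 };

// Stand-in for a record; the real AspRecord carries more than a type.
struct ModelRecord {
    ModelType type;
};

// Stand-in reader over an in-memory record list instead of a file.
class ModelReader {
public:
    explicit ModelReader(std::vector<ModelRecord> records)
        : myRecords(std::move(records)), myNext(0) {}

    // Returns the next record of any type; false when none are left.
    bool getNextRecord(ModelRecord& rec) {
        if (myNext >= myRecords.size()) {
            return false;
        }
        rec = myRecords[myNext++];
        return true;
    }

    // Skips Empty and Position Only records; returns the next
    // Reference Only or Detailed record, or false when none are left.
    bool getNextDataRecord(ModelRecord& rec) {
        ModelRecord candidate{ModelType::Empty};
        while (getNextRecord(candidate)) {
            if (candidate.type == ModelType::ReferenceOnly ||
                candidate.type == ModelType::Detailed) {
                rec = candidate;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<ModelRecord> myRecords;
    size_t myNext;
};
```

A typical read loop has the shape `while (reader.getNextDataRecord(rec)) { ... }`. With the real classes the same loop shape applies to the two documented methods, but positions skipped by getNextDataRecord are consumed and are not returned by a later call.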
Details Coming Soon