C++ Class: SamFile

From Genome Analysis Wiki


Reading/Writing SAM/BAM Files In Your Program

The SamFile class allows a user to easily read/write a SAM/BAM file.

The SamFile class also lets a user read specific sections of sorted and indexed BAM files. To use this capability, the index file must be read before the read section is set. Because BGZF files support seeking, reading a section this way avoids reading through the entire file.

Future Enhancements: Add the ability to read alignments that match a given start and end position for a specific reference sequence.

This class is part of C++ Library: libStatGen.

Class Documentation

See: http://csg.sph.umich.edu//mktrost/doxygen/current/classSamFile.html

Child Classes

SamFileReader

http://csg.sph.umich.edu//mktrost/doxygen/current/classSamFileReader.html

SamFileWriter

http://www.sph.umich.edu/csg/mktrost/doxygen/current/classSamFileWriter.html

Statistics

Statistic Generation

The following statistics can optionally be recorded when reading a SamFile by calling SamFile::GenerateStatistics(), and can be displayed with SamFile::PrintStatistics().

The statistics reflect only alignments that were successfully read from the BAM file. Alignments that failed to parse are not counted, but alignments that are invalid for other reasons may still be counted.

Read Counts

Statistic	Description
TotalReads	Total number of alignments that were successfully read from the file.
MappedReads	Total number of alignments that were successfully read from the file with FLAG bit 0x004 set to 0 (not unmapped).
PairedReads	Total number of alignments that were successfully read from the file with FLAG bit 0x001 set to 1 (paired).
ProperPair	Total number of alignments that were successfully read from the file with FLAG bits 0x001 (paired) and 0x002 (proper pair) both set to 1.
DuplicateReads	Total number of alignments that were successfully read from the file with FLAG bit 0x400 set to 1 (PCR or optical duplicate).
QCFailureReads	Total number of alignments that were successfully read from the file with FLAG bit 0x200 set to 1 (failed quality checks).
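The FLAG tests in the table above can be sketched in standalone C++. This is a minimal illustration of the counting rules only, assuming that each successfully parsed alignment's FLAG is available as a 16-bit integer. The names ReadCounts, countAlignment and countFlags are hypothetical; they are not part of libStatGen, and this is not SamFile's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container for the read-count statistics listed above.
struct ReadCounts {
    uint64_t totalReads = 0;
    uint64_t mappedReads = 0;
    uint64_t pairedReads = 0;
    uint64_t properPair = 0;
    uint64_t duplicateReads = 0;
    uint64_t qcFailureReads = 0;
};

// SAM FLAG bits referenced by the table.
constexpr uint16_t kPaired     = 0x001;
constexpr uint16_t kProperPair = 0x002;
constexpr uint16_t kUnmapped   = 0x004;
constexpr uint16_t kQcFail     = 0x200;
constexpr uint16_t kDuplicate  = 0x400;

// Update the counts for one successfully parsed alignment, given its FLAG.
void countAlignment(ReadCounts& c, uint16_t flag) {
    ++c.totalReads;
    if ((flag & kUnmapped) == 0) ++c.mappedReads;  // bit 0x004 clear: not unmapped
    if (flag & kPaired) {
        ++c.pairedReads;
        // ProperPair requires both 0x001 and 0x002 to be set.
        if (flag & kProperPair) ++c.properPair;
    }
    if (flag & kDuplicate) ++c.duplicateReads;
    if (flag & kQcFail) ++c.qcFailureReads;
}

// Count a batch of FLAG values (hypothetical convenience wrapper).
ReadCounts countFlags(const std::vector<uint16_t>& flags) {
    ReadCounts c;
    for (uint16_t f : flags) countAlignment(c, f);
    return c;
}
```

Calling countAlignment once per parsed record mirrors the rule that alignments which failed to parse never reach the statistics.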
Rates

Statistic	Description
MappingRate(%)	100 * MappedReads / TotalReads
PairedReads(%)	100 * PairedReads / TotalReads
ProperPair(%)	100 * ProperPair / TotalReads
DupRate(%)	100 * DuplicateReads / TotalReads
QCFailRate(%)	100 * QCFailureReads / TotalReads
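Each rate is a plain percentage of TotalReads. The sketch below uses hypothetical names (percentOf, Rates, computeRates) and returns 0.0 when TotalReads is 0; that empty-file behavior is an assumption, since this page does not say what SamFile reports in that case.

```cpp
#include <cassert>
#include <cstdint>

// 100 * count / total, the form shared by every rate in the table.
// Returns 0.0 when total is 0 (assumed behavior, avoids division by zero).
double percentOf(uint64_t count, uint64_t total) {
    if (total == 0) return 0.0;
    return 100.0 * static_cast<double>(count) / static_cast<double>(total);
}

// Hypothetical bundle of the five rates, each relative to TotalReads.
struct Rates {
    double mappingRate;
    double pairedReads;
    double properPair;
    double dupRate;
    double qcFailRate;
};

Rates computeRates(uint64_t totalReads, uint64_t mappedReads,
                   uint64_t pairedReads, uint64_t properPair,
                   uint64_t duplicateReads, uint64_t qcFailureReads) {
    return Rates{
        percentOf(mappedReads, totalReads),
        percentOf(pairedReads, totalReads),
        percentOf(properPair, totalReads),
        percentOf(duplicateReads, totalReads),
        percentOf(qcFailureReads, totalReads),
    };
}
```

As a rough check against the example output, 100 * 11.28 / 18.90 is about 59.68, matching ProperPair(%). Recomputing MappingRate(%) from the (e6) values gives about 78.15 rather than 78.17, presumably because those values are rounded to two decimals.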
Base Counts

Statistic	Description
TotalBases	Sum of the SEQ lengths for all alignments that were successfully read from the file. NOTE: Includes bases that are 'N'.
BasesInMappedReads	Sum of the SEQ lengths for all alignments that were successfully read from the file with FLAG bit 0x004 set to 0 (not unmapped). NOTE: Includes bases that are 'N'.
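The two base counts differ only in the unmapped test on FLAG bit 0x004. This sketch assumes a hypothetical Alignment struct holding just FLAG and SEQ, with an absent SEQ ('*' in SAM) stored as an empty string. It counts every base, 'N' included, as the notes above require. It illustrates the definitions, not libStatGen's code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical minimal alignment: only the fields the base counts need.
struct Alignment {
    uint16_t flag;     // SAM FLAG
    std::string seq;   // SEQ; an absent SEQ ('*') is assumed stored as ""
};

struct BaseCounts {
    uint64_t totalBases = 0;
    uint64_t basesInMappedReads = 0;
};

constexpr uint16_t kUnmapped = 0x004;

BaseCounts countBases(const std::vector<Alignment>& alignments) {
    BaseCounts b;
    for (const Alignment& a : alignments) {
        // Every base in SEQ counts, including 'N'.
        b.totalBases += a.seq.size();
        // Only alignments with bit 0x004 clear (not unmapped) add to this total.
        if ((a.flag & kUnmapped) == 0) {
            b.basesInMappedReads += a.seq.size();
        }
    }
    return b;
}
```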

NOTE: If TotalReads is greater than 10^6, the Read Counts and Base Counts are reported as the total counts divided by 10^6. This is indicated in the output by (e6) appended to the field name.
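The (e6) scaling rule can be sketched as a formatter for one count line. Assumptions: the hypothetical function formatCount receives TotalReads separately, because the scaling decision depends on TotalReads even for base counts; the scaled form uses two decimals and a tab, as in the example output; and the unscaled form is printed as a plain integer, which is a guess, since this page only shows scaled output.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Format one count line. Counts are scaled by 10^6 (with "(e6)" appended to
// the name) only when TotalReads is strictly greater than 10^6.
std::string formatCount(const std::string& name, uint64_t value,
                        uint64_t totalReads) {
    char num[32];
    if (totalReads > 1000000) {
        // Scaled: two decimals, as in the example output (e.g. "18.90").
        std::snprintf(num, sizeof num, "%.2f",
                      static_cast<double>(value) / 1e6);
        return name + "(e6)\t" + num;
    }
    // Unscaled: plain integer (assumed format).
    std::snprintf(num, sizeof num, "%llu",
                  static_cast<unsigned long long>(value));
    return name + "\t" + num;
}
```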

Example Statistics Output

TotalReads(e6)	18.90
MappedReads(e6)	14.77
PairedReads(e6)	18.90
ProperPair(e6)	11.28
DuplicateReads(e6)	0.00
QCFailureReads(e6)	0.00

MappingRate(%)	78.17
PairedReads(%)	100.00
ProperPair(%)	59.68
DupRate(%)	0.00
QCFailRate(%)	0.00

TotalBases(e6)	699.30
BasesInMappedReads(e6)	546.67

Usage Examples

Sam Library Usage Examples