BamUtil: stats
Overview of the stats function of bamUtil
The stats option on the bamUtil executable generates the specified statistics on a SAM/BAM file.
Parameters
Required Parameters:
    --in : the SAM/BAM file to calculate stats for
Types of Statistics that can be generated:
    --basic : Turn on basic statistic generation
    --qual : Generate a count for each quality (displayed as non-phred quality)
    --phred : Generate a count for each quality (displayed as phred quality)
    --baseQC : Write per base statistics to the specified file.
Optional Parameters:
    --maxNumReads : Maximum number of reads to process. Defaults to -1 to indicate all reads.
    --unmapped : Only process unmapped reads (requires a bamIndex file)
    --bamIndex : The path/name of the bam index file (if required and not specified, uses the --in value + ".bai")
    --regionList : File containing the region list chr<tab>start_pos<tab>end_pos. Positions are 0 based and the end_pos is not included in the region. Uses bamIndex.
    --minMapQual : The minimum mapping quality for filtering reads in the baseQC stats.
    --dbsnp : The dbSnp file of positions to exclude from baseQC analysis.
    --noeof : Do not expect an EOF block on a bam file.
    --params : Print the parameter settings
For all types of statistics, the SAM/BAM file used is specified by --in.
The optional parameters are also used for all types of statistics.
Usage:
./bam stats --in <inputFile> [--basic] [--qual] [--phred] [--baseQC <outputFileName>] [--maxNumReads <maxNum>] [--unmapped] [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--minMapQual <minMapQ>] [--dbsnp <dbsnpFile>] [--noeof] [--params]
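The --regionList format described above (tab-separated chr, start_pos, end_pos, with 0-based positions and an excluded end_pos) can be illustrated with a small Python sketch. The function names `parse_region_line` and `in_region` are hypothetical helpers for illustration only and are not part of bamUtil.

```python
def parse_region_line(line):
    """Parse one 'chr<tab>start_pos<tab>end_pos' line into a tuple."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)


def in_region(region, chrom, pos0):
    """Return True if the 0-based position pos0 lies inside the region.

    The start is included and the end is excluded (half-open interval).
    """
    region_chrom, start, end = region
    return chrom == region_chrom and start <= pos0 < end


region = parse_region_line("1\t100\t113\n")
print(in_region(region, "1", 112))  # last position inside the region
print(in_region(region, "1", 113))  # end_pos itself is outside
```

Under this convention a region of `100` to `113` covers 13 positions, 100 through 112.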
Types of Statistics
Basic
Prints summary statistics for the file:
- TotalReads - # of reads that are in the file
- MappedReads - # of reads marked mapped in the flag
- PairedReads - # of reads marked paired in the flag
- ProperPair - # of reads marked paired AND proper paired in the flag
- DuplicateReads - # of reads marked duplicate in the flag
- QCFailureReads - # of reads marked QC failure in the flag
- MappingRate(%) - # of reads marked mapped in the flag / TotalReads
- PairedReads(%) - # of reads marked paired in the flag / TotalReads
- ProperPair(%) - # of reads marked paired AND proper paired in the flag / TotalReads
- DupRate(%) - # of reads marked duplicate in the flag / TotalReads
- QCFailRate(%) - # of reads marked QC failure in the flag / TotalReads
- TotalBases - # of bases in all reads
- BasesInMappedReads - # of bases in reads marked mapped in the flag
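As a rough illustration of how the flag-based counts above relate to each other, the following Python sketch tallies them from a list of (flag, read length) pairs. The flag bit values come from the SAM specification; the function name `basic_stats`, the input shape, and the choice to express rates as percentages (multiplied by 100) are assumptions made for this example, not bamUtil's actual implementation.

```python
# SAM flag bits, as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
QC_FAIL = 0x200
DUPLICATE = 0x400


def basic_stats(reads):
    """Summarize (flag, read_length) tuples; the input shape is hypothetical."""
    stats = {
        "TotalReads": 0, "MappedReads": 0, "PairedReads": 0, "ProperPair": 0,
        "DuplicateReads": 0, "QCFailureReads": 0,
        "TotalBases": 0, "BasesInMappedReads": 0,
    }
    for flag, length in reads:
        stats["TotalReads"] += 1
        stats["TotalBases"] += length
        if not flag & UNMAPPED:
            stats["MappedReads"] += 1
            stats["BasesInMappedReads"] += length
        if flag & PAIRED:
            stats["PairedReads"] += 1
            # ProperPair requires both the paired and proper-pair bits.
            if flag & PROPER_PAIR:
                stats["ProperPair"] += 1
        if flag & DUPLICATE:
            stats["DuplicateReads"] += 1
        if flag & QC_FAIL:
            stats["QCFailureReads"] += 1

    total = stats["TotalReads"]

    def rate(count):
        # Assumption: rates are percentages of TotalReads; 0 for an empty file.
        return 100.0 * count / total if total else 0.0

    stats["MappingRate(%)"] = rate(stats["MappedReads"])
    stats["PairedReads(%)"] = rate(stats["PairedReads"])
    stats["ProperPair(%)"] = rate(stats["ProperPair"])
    stats["DupRate(%)"] = rate(stats["DuplicateReads"])
    stats["QCFailRate(%)"] = rate(stats["QCFailureReads"])
    return stats


example = basic_stats([
    (PAIRED | PROPER_PAIR, 100),
    (UNMAPPED, 50),
    (DUPLICATE, 100),
    (PAIRED, 100),
])
print(example["TotalReads"], example["MappedReads"], example["MappingRate(%)"])
```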
Qual/Phred
Prints a count of the number of times each quality value appears in the file.
- --phred - Displays quality as phred integers [0-93]
- --qual - Displays quality as non-phred integers (phred + 33) [33-126]
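The relationship between the two displays (non-phred = phred + 33) can be sketched in Python as follows. This assumes quality strings in the usual SAM text encoding, where each character's ASCII code is the phred value plus 33; the function name `quality_counts` is hypothetical.

```python
from collections import Counter


def quality_counts(qual_strings, as_phred=True):
    """Count how often each quality value occurs across all quality strings.

    as_phred=True reports phred integers (ASCII code minus 33);
    as_phred=False reports the non-phred integers (the ASCII code itself).
    """
    counts = Counter()
    for qual in qual_strings:
        for ch in qual:
            value = ord(ch)
            counts[value - 33 if as_phred else value] += 1
    return counts


quals = ["II!", "I5"]
print(sorted(quality_counts(quals, as_phred=True).items()))
print(sorted(quality_counts(quals, as_phred=False).items()))
```

Both calls count the same bases; only the key used for each quality differs by the fixed offset of 33.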
BaseQC
The baseQC option generates the following statistics.
For each position, the following counts are incremented if a read spans the reference position (starts before or at this reference position and ends at or after this position) and the CIGAR operation for this position is M/X/=/D/N (any CIGAR operation other than clip or insert). These counts are incremented regardless of duplicate/QC failure/unmapped/mapping quality:
- TotalReads - # of reads that span this position.
- Dups - # of reads marked duplicate in the flag
- QCFail - # of reads marked QC failure in the flag
No further stats are incremented if the read is a duplicate, QC failure, or unmapped.
Additional counts incremented ONLY for mapped, non-duplicate, non-QC failure reads:
- Mapped - # of reads marked mapped in the flag
- Paired - # of reads marked paired in the flag
- ProperPaired - # of reads marked paired AND proper paired in the flag
- ZeroMapQual - # of reads that have a Mapping Quality of 0
- MapQual<10 - # of reads that have a Mapping Quality < 10
- MapQual255 - # of reads that have a Mapping Quality = 255
- PassMapQual - # of reads that have a Mapping Quality >= the minimum Mapping Quality (in version 1.0, this includes mapping quality 255 reads).
Additional values ONLY for mapped, mapping quality != 255, non-duplicate, non-QC failure reads:
- AverageMapQuality - average calculated by summing all mapping qualities that are included (as defined above) and dividing by the number of mapping qualities added.
- AverageMapQualCount - # of mapping qualities used to calculate AverageMapQuality.
Additional values ONLY incremented for mapped, mapping quality >= min mapping quality, non-duplicate, non-QC failure reads (in version 1.0, this includes mapping quality 255 reads):
- Depth - # of reads.
- Q20Bases - # of bases at this position with a base quality (from the read) of Q20 or higher.
Currently there is no special logic to exclude positions where the reference is 'N'.
Currently there is no special logic to exclude reads from the counts when the base is 'N'.
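The tiered counting rules above can be summarized in a short Python sketch that updates the counters for one position, one read at a time. This is only an illustration of the rules as written here, not bamUtil's code: the names `new_counts`, `add_read`, and `average_map_quality` are hypothetical, the caller is assumed to have already decided that the read spans the position, and dbSNP exclusion and positions without a read base (D/N CIGAR operations) are not modeled.

```python
# SAM flag bits, as defined in the SAM specification.
PAIRED, PROPER_PAIR, UNMAPPED = 0x1, 0x2, 0x4
QC_FAIL, DUPLICATE = 0x200, 0x400

FIELDS = ["TotalReads", "Dups", "QCFail", "Mapped", "Paired", "ProperPaired",
          "ZeroMapQual", "MapQual<10", "MapQual255", "PassMapQual",
          "MapQualSum", "AverageMapQualCount", "Depth", "Q20Bases"]


def new_counts():
    return dict.fromkeys(FIELDS, 0)


def add_read(counts, flag, map_qual, base_qual, min_map_qual):
    """Update the counters of one position for one read that spans it."""
    # Counted regardless of duplicate/QC failure/unmapped/mapping quality.
    counts["TotalReads"] += 1
    if flag & DUPLICATE:
        counts["Dups"] += 1
    if flag & QC_FAIL:
        counts["QCFail"] += 1
    # No further stats for duplicate, QC failure, or unmapped reads.
    if flag & (DUPLICATE | QC_FAIL | UNMAPPED):
        return
    counts["Mapped"] += 1
    if flag & PAIRED:
        counts["Paired"] += 1
        if flag & PROPER_PAIR:
            counts["ProperPaired"] += 1
    if map_qual == 0:
        counts["ZeroMapQual"] += 1
    if map_qual < 10:
        counts["MapQual<10"] += 1
    if map_qual == 255:
        counts["MapQual255"] += 1
    else:
        # The average only uses mapping qualities other than 255.
        counts["MapQualSum"] += map_qual
        counts["AverageMapQualCount"] += 1
    if map_qual >= min_map_qual:
        counts["PassMapQual"] += 1
        counts["Depth"] += 1
        if base_qual >= 20:
            counts["Q20Bases"] += 1


def average_map_quality(counts):
    n = counts["AverageMapQualCount"]
    return counts["MapQualSum"] / n if n else 0.0


c = new_counts()
add_read(c, PAIRED, 0, 30, min_map_qual=1)
add_read(c, PAIRED, 3, 30, min_map_qual=1)
add_read(c, PAIRED, 30, 10, min_map_qual=1)
add_read(c, DUPLICATE, 60, 30, min_map_qual=1)
print(c["TotalReads"], c["Mapped"], c["Depth"], average_map_quality(c))
```

The early return mirrors the rule that duplicate, QC failure, and unmapped reads contribute only to the first three counters.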
BaseQC Output
There are two output options for BaseQC.
- Percentages
- Straight Counts
Percentage-Based Output Format
Order (with calculations based on the values described above):
- chrom - Chromosome/reference name string from the SAM/BAM
- chromStart - 0-based start position
- chromEnd - 0-based end position (always 1 greater than start and not included in this region)
- Depth - Depth
- Q20Bases - Q20Bases
- Q20BasesPct(%) - Q20Bases / Depth
- TotalReads - TotalReads
- MappedBases - Mapped
- MappingRate(%) - Mapped / TotalReads
- MapRate_MQPass(%) - PassMapQual / TotalReads
- ZeroMapQual(%) - ZeroMapQual / TotalReads
- MapQual<10(%) - MapQual<10 / TotalReads
- PairedReads(%) - Paired / TotalReads
- ProperPaired(%) - ProperPaired / TotalReads
- DupRate(%) - Dups / TotalReads
- QCFailRate(%) - QCFail / TotalReads
- AverageMapQuality - AverageMapQuality
- AverageMapQualCount - AverageMapQualCount
This output does not include a MapQual255 count in version 1.0.
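The percentage columns listed above are simple ratios of the per-position counts. A minimal Python sketch of that arithmetic follows; the helper names `pct` and `percentage_columns` are hypothetical, and reporting 0 when the denominator is 0 is an assumption chosen to match the sample rows in which Depth is 0 and Q20BasesPct(%) is 0.000. The Depth, Q20Bases, TotalReads, and Mapped values in the example are taken from one sample output row; the remaining counts are made up for illustration.

```python
def pct(numerator, denominator):
    """Percentage with three decimals; 0.000 when the denominator is 0."""
    value = 100.0 * numerator / denominator if denominator else 0.0
    return format(value, ".3f")


def percentage_columns(c):
    """Derive the percentage columns from a dict of per-position counts."""
    total = c["TotalReads"]
    return {
        "Q20BasesPct(%)": pct(c["Q20Bases"], c["Depth"]),
        "MappingRate(%)": pct(c["Mapped"], total),
        "MapRate_MQPass(%)": pct(c["PassMapQual"], total),
        "ZeroMapQual(%)": pct(c["ZeroMapQual"], total),
        "MapQual<10(%)": pct(c["MapQual<10"], total),
        "PairedReads(%)": pct(c["Paired"], total),
        "ProperPaired(%)": pct(c["ProperPaired"], total),
        "DupRate(%)": pct(c["Dups"], total),
        "QCFailRate(%)": pct(c["QCFail"], total),
    }


counts = {
    # From a sample output row: Depth 14, Q20Bases 10, TotalReads 39, Mapped 30.
    "Depth": 14, "Q20Bases": 10, "TotalReads": 39, "Mapped": 30,
    # Hypothetical values for the remaining counts.
    "PassMapQual": 20, "ZeroMapQual": 10, "MapQual<10": 20,
    "Paired": 33, "ProperPaired": 15, "Dups": 6, "QCFail": 6,
}
row = percentage_columns(counts)
print(row["Q20BasesPct(%)"], row["MappingRate(%)"])
```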
Count-Based Output Format
Order (of values described above):
- chrom - Chromosome/reference name string from the SAM/BAM
- chromStart - 0-based start position
- chromEnd - 0-based end position (always 1 greater than start and not included in this region)
- TotalReads
- Dups
- QCFail
- Mapped
- Paired
- ProperPaired
- ZeroMapQual
- MapQual<10
- MapQual255
- PassMapQual
- AverageMapQuality
- AverageMapQualCount
- Depth
- Q20Bases
Sample Output
chrom chromStart chromEnd Depth Q20Bases Q20BasesPct(%) TotalReads MappedBases MappingRate(%) MapRate_MQPass(%) ZeroMapQual(%) MapQual<10(%) PairedReads(%) ProperPaired(%) DupRate(%) QCFailRate(%) AverageMapQuality AverageMapQualCount
1 100 101 2 2 100.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3
1 101 102 2 0 0.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3
1 102 103 0 0 0.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 0.000 0
1 103 104 0 0 0.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 0.000 0
1 104 105 2 0 0.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3
1 105 106 2 2 100.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3
1 110 111 0 0 0.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 0.000 0
1 111 112 2 2 100.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3
1 112 113 2 2 100.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3
1 10012 10013 14 0 0.000 42 33 78.571 52.381 26.190 52.381 85.714 35.714 14.286 14.286 11.000 21
1 10013 10014 14 10 71.429 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 11.000 21
1 10023 10024 0 0 0.000 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 0.000 0
1 10024 10025 14 12 85.714 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 11.000 21