Line 1: |
Line 1: |
− | [[Category:BamUtil|stats]]
| + | <br> |
− | [[Category:BAM Software]]
| |
− | [[Category:Software]]
| |
| | | |
− | = Overview of the <code>stats</code> function of <code>bamUtil</code> = | + | = Overview of the <code>stats</code> function of <code>bamUtil</code> = |
− | The <code>stats</code> option on the [[bamUtil]] executable generates the specified statistics on a SAM/BAM file.
| |
| | | |
− | = Parameters = | + | The <code>stats</code> option on the [[BamUtil]] executable generates the specified statistics on a SAM/BAM file. |
− | <pre> | + | |
− | Required Parameters:
| + | = Parameters = |
− | --in : the SAM/BAM file to calculate stats for | + | <pre> Required Parameters: |
| + | --in : the SAM/BAM file to calculate stats for |
| Types of Statistics that can be generated: | | Types of Statistics that can be generated: |
− | --basic : Turn on basic statistic generation | + | --basic : Turn on basic statistic generation |
− | --qual : Generate a count for each quality (displayed as non-phred quality) | + | --qual : Generate a count for each quality (displayed as non-phred quality) |
− | --phred : Generate a count for each quality (displayed as phred quality) | + | --phred : Generate a count for each quality (displayed as phred quality) |
− | --baseQC : Write per base statistics to the specified file. | + | --baseQC : Write per base statistics to the specified file. |
| Optional Parameters: | | Optional Parameters: |
− | --maxNumReads : Maximum number of reads to process | + | --maxNumReads : Maximum number of reads to process |
| Defaults to -1 to indicate all reads. | | Defaults to -1 to indicate all reads. |
− | --unmapped : Only process unmapped reads (requires a bamIndex file) | + | --unmapped : Only process unmapped reads (requires a bamIndex file) |
− | --bamIndex : The path/name of the bam index file | + | --bamIndex : The path/name of the bam index file |
| (if required and not specified, uses the --in value + ".bai") | | (if required and not specified, uses the --in value + ".bai") |
− | --regionList : File containing the region list chr<tab>start_pos<tab>end_pos. | + | --regionList : File containing the region list chr<tab>start_pos<tab>end_pos. |
| Positions are 0 based and the end_pos is not included in the region. | | Positions are 0 based and the end_pos is not included in the region. |
| Uses bamIndex. | | Uses bamIndex. |
− | --minMapQual : The minimum mapping quality for filtering reads in the baseQC stats. | + | --minMapQual : The minimum mapping quality for filtering reads in the baseQC stats. |
− | --dbsnp : The dbSnp file of positions to exclude from baseQC analysis. | + | --dbsnp : The dbSnp file of positions to exclude from baseQC analysis. |
− | --noeof : Do not expect an EOF block on a bam file. | + | --noeof : Do not expect an EOF block on a bam file. |
− | --params : Print the parameter settings | + | --params : Print the parameter settings |
− | </pre> | + | </pre> |
− | | + | For all types of statistics, the bam file used is specified by <code>--in</code>. |
− | For all types of statistics, the bam file used is specified by <code>--in</code>. | |
| | | |
| The optional parameters are also used for all types of statistics. | | The optional parameters are also used for all types of statistics. |
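The <code>--regionList</code> description above says positions are 0-based and the end position is not included. As a minimal sketch (the helper name <code>to_region_line</code> is hypothetical and not part of bamUtil), a 1-based inclusive interval could be converted into a tab-separated region line like this:

```python
def to_region_line(chrom: str, first_1based: int, last_1based: int) -> str:
    """Convert a 1-based inclusive interval to a chr<tab>start_pos<tab>end_pos line.

    The region list uses 0-based starts and an excluded end, so the start
    shifts down by one while the end keeps its numeric value.
    """
    if first_1based < 1 or last_1based < first_1based:
        raise ValueError("expected 1 <= first_1based <= last_1based")
    start_pos = first_1based - 1   # 0-based start
    end_pos = last_1based          # 0-based end, not included in the region
    return f"{chrom}\t{start_pos}\t{end_pos}"


if __name__ == "__main__":
    # Bases 101 through 200 (1-based, inclusive) on chromosome 1.
    print(to_region_line("1", 101, 200))
```

Writing one such line per region to a text file should give a file in the shape <code>--regionList</code> describes.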
| | | |
− | Usage: | + | Usage: |
− | <pre> | + | <pre> ./bam stats --in <inputFile> [--basic] [--qual] [--phred] [--baseQC <outputFileName>] [--maxNumReads <maxNum>] [--unmapped] [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--minMapQual <minMapQ>] [--dbsnp <dbsnpFile>] [--noeof] [--params] |
− | ./bam stats --in <inputFile> [--basic] [--qual] [--phred] [--baseQC <outputFileName>] [--maxNumReads <maxNum>] [--unmapped] [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--minMapQual <minMapQ>] [--dbsnp <dbsnpFile>] [--noeof] [--params]
| + | </pre> |
− | </pre> | + | <br> |
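The usage line above can also be assembled programmatically. The following is only a sketch: the function <code>build_stats_command</code> and its keyword names are hypothetical, and only flags listed in the Parameters section are used.

```python
from typing import List, Optional


def build_stats_command(in_file: str,
                        basic: bool = False,
                        qual: bool = False,
                        phred: bool = False,
                        base_qc: Optional[str] = None,
                        max_num_reads: Optional[int] = None,
                        region_list: Optional[str] = None,
                        min_map_qual: Optional[int] = None,
                        bam_path: str = "./bam") -> List[str]:
    """Build an argument list for `bam stats` from the documented flags."""
    cmd = [bam_path, "stats", "--in", in_file]
    if basic:
        cmd.append("--basic")
    if qual:
        cmd.append("--qual")
    if phred:
        cmd.append("--phred")
    if base_qc is not None:
        cmd += ["--baseQC", base_qc]
    if max_num_reads is not None:
        cmd += ["--maxNumReads", str(max_num_reads)]
    if region_list is not None:
        cmd += ["--regionList", region_list]
    if min_map_qual is not None:
        cmd += ["--minMapQual", str(min_map_qual)]
    return cmd


if __name__ == "__main__":
    # File names here are placeholders.
    print(" ".join(build_stats_command("sample.bam", basic=True,
                                       base_qc="baseqc.txt")))
```

The returned list can be passed to <code>subprocess.run</code> when the <code>bam</code> executable is available at the given path.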
| | | |
| + | = Types of Statistics = |
| | | |
| + | == Basic == |
| | | |
− | = Types of Statistics =
| + | Prints summary statistics for the file: |
| | | |
− | == Basic ==
| + | *TotalReads - # of reads that are in the file |
− | Prints summary statistics for the file:
| + | *MappedReads - # of reads marked mapped in the flag |
− | *TotalReads - # of reads that are in the file | + | *PairedReads - # of reads marked paired in the flag |
− | *MappedReads - # of reads marked mapped in the flag | + | *ProperPair - # of reads marked paired AND proper paired in the flag |
− | *PairedReads - # of reads marked paired in the flag | + | *DuplicateReads - # of reads marked duplicate in the flag |
− | *ProperPair - # of reads marked paired AND proper paired in the flag | + | *QCFailureReads - # of reads marked QC failure in the flag |
− | *DuplicateReads - # of reads marked duplicate in the flag | + | *MappingRate(%) - # of reads marked mapped in the flag / TotalReads |
− | *QCFailureReads - # of reads marked QC failure in the flag | + | *PairedReads(%) - # of reads marked paired in the flag / TotalReads |
− | *MappingRate(%) - # of reads marked mapped in the flag / TotalReads | + | *ProperPair(%) - # of reads marked paired AND proper paired in the flag / TotalReads |
− | *PairedReads(%) - # of reads marked paired in the flag / TotalReads | + | *DupRate(%) - # of reads marked duplicate in the flag / TotalReads |
− | *ProperPair(%) - # of reads marked paired AND proper paired in the flag / TotalReads | + | *QCFailRate(%) - # of reads marked QC failure in the flag / TotalReads |
− | *DupRate(%) - # of reads marked duplicate in the flag / TotalReads | + | *TotalBases - # of bases in all reads |
− | *QCFailRate(%) - # of reads marked QC failure in the flag / TotalReads | |
− | *TotalBases - # of bases in all reads | |
| *BasesInMappedReads - # of bases in reads marked mapped in the flag | | *BasesInMappedReads - # of bases in reads marked mapped in the flag |
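The basic statistics listed above are all derived from SAM flag bits and read lengths. A rough sketch of the same bookkeeping, assuming each read is given as a hypothetical <code>(flag, read_length)</code> tuple and using the flag bit values from the SAM specification (this is not bamUtil's own code):

```python
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400


def basic_stats(reads):
    """Summarize an iterable of (flag, read_length) tuples."""
    stats = {"TotalReads": 0, "MappedReads": 0, "PairedReads": 0,
             "ProperPair": 0, "DuplicateReads": 0, "QCFailureReads": 0,
             "TotalBases": 0, "BasesInMappedReads": 0}
    for flag, length in reads:
        stats["TotalReads"] += 1
        stats["TotalBases"] += length
        if not flag & FLAG_UNMAPPED:
            stats["MappedReads"] += 1
            stats["BasesInMappedReads"] += length
        if flag & FLAG_PAIRED:
            stats["PairedReads"] += 1
            # ProperPair requires both the paired and proper-pair bits.
            if flag & FLAG_PROPER_PAIR:
                stats["ProperPair"] += 1
        if flag & FLAG_DUPLICATE:
            stats["DuplicateReads"] += 1
        if flag & FLAG_QC_FAIL:
            stats["QCFailureReads"] += 1

    total = stats["TotalReads"]
    # Rates are percentages of TotalReads; an empty input yields 0.0 here.
    for name, key in (("MappingRate(%)", "MappedReads"),
                      ("PairedReads(%)", "PairedReads"),
                      ("ProperPair(%)", "ProperPair"),
                      ("DupRate(%)", "DuplicateReads"),
                      ("QCFailRate(%)", "QCFailureReads")):
        stats[name] = 100.0 * stats[key] / total if total else 0.0
    return stats


if __name__ == "__main__":
    example = [(0x1 | 0x2, 100), (0x4, 100), (0x400, 50), (0x1, 50)]
    for key, value in basic_stats(example).items():
        print(key, value)
```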
| | | |
− | == Qual/Phred == | + | == Qual/Phred == |
− | Prints a count of the number of times each quality value appears in the file.
| |
− | *<code>phred</code> Displays Quality as phred integers [0-93]
| |
− | *<code>qual</code> Displays Quality as non-phred integers (phred + 33) [33-126]
| |
| | | |
| + | Prints a count of the number of times each quality value appears in the file. |
| | | |
− | == BaseQC ==
| + | *<code>phred</code> Displays Quality as phred integers [0-93] |
| + | *<code>qual</code> Displays Quality as non-phred integers (phred + 33) [33-126] |
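The two displays above differ only by the offset of 33. A small sketch of the counting, assuming the quality string is the ASCII-encoded QUAL field of a SAM record (the function name <code>quality_counts</code> is hypothetical):

```python
from collections import Counter

PHRED_OFFSET = 33  # non-phred value = phred + 33


def quality_counts(quality_string: str, as_phred: bool) -> dict:
    """Count how often each quality value occurs in a quality string.

    With as_phred=True the keys are phred integers (0-93); otherwise they
    are the non-phred integers (33-126), i.e. the raw character codes.
    """
    counts = Counter()
    for ch in quality_string:
        value = ord(ch)
        if as_phred:
            value -= PHRED_OFFSET
        counts[value] += 1
    return dict(counts)


if __name__ == "__main__":
    print(quality_counts("II5!", as_phred=True))
    print(quality_counts("II5!", as_phred=False))
```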
| | | |
− | The <code>baseQC</code> option generates the following statistics:
| + | <br> |
| | | |
− | For each position, the following counts are incremented if:
| + | == BaseQC == |
− | # a read spans the reference position (starts before or at this reference position and ends at or after this position)
| |
− | # regardless of duplicate/qc failure/unmapped/mapping quality
| |
− | # if CIGAR for this position is M/X/=/D/N (any cigar other than clip or insert)
| |
− | *TotalReads - # of reads that span this position.
| |
− | *Dups - # of reads marked duplicate in the flag
| |
− | *QCFail - # of reads marked QC failure in the flag
| |
| | | |
− | No further stats are incremented if the read is a duplicate, QC failure, or unmapped.
| + | The <code>baseQC</code> option generates the following statistics: |
| | | |
− | Additional counts incremented ONLY for mapped, non-duplicate, non-QC failure reads:
| + | A read spans a position if the read starts at or before the position, ends at or after the position and the position is not a clip. CIGAR operations allowed for the position are M/X/=/D/N. |
− | *Mapped - # of reads marked mapped in the flag
| |
− | *Paired - # of reads marked paired in the flag
| |
− | *ProperPaired - # of reads marked paired AND proper paired in the flag
| |
− | *ZeroMapQual - # of reads that have a Mapping Quality of 0
| |
− | *MapQual<10 - # of reads that have a Mapping Quality < 10
| |
− | *MapQual255 - # of reads that have a Mapping Quality = 255
| |
− | *PassMapQual - # of reads that have a Mapping Quality >= a minimum Mapping Quality (in version 1.0, this includes reads with mapping quality 255).
| |
| | | |
− | Additional values ONLY for mapped, mapping quality != 255, non-duplicate, non-QC failure reads:
| + | Currently there is no special logic to exclude positions/reads where the reference base is 'N' or the read base is 'N'. |
− | *AverageMapQuality - average calculated by summing all mapping qualities that are included (as defined above) and dividing by the number of mapping qualities added.
| |
− | *AverageMapQualCount - # of mapping qualities used to calculate AverageMapQuality.
| |
| | | |
− | Additional values ONLY incremented for mapped, mapping quality >= min mapping quality, non-duplicate, non-QC failure reads (version 1.0, this includes mapping quality 255 reads):
| + | <br> |
− | *Depth - # of reads.
| |
− | *Q20Bases - # of bases at this position with a base quality (from the read) of Q20 or higher.
| |
| | | |
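The spanning rule described for BaseQC (CIGAR operations M/X/=/D/N count, clips and inserts do not) can be sketched as follows. This is an illustration, not bamUtil's implementation; it assumes <code>start</code> is the 0-based reference position of the first aligned base, and the function name is hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that cover reference positions; clips (S/H), inserts (I) and
# padding (P) do not advance along the reference.
SPANNING_OPS = set("MX=DN")


def read_spans_position(start: int, cigar: str, pos: int) -> bool:
    """Return True if the read covers reference position `pos` (0-based)."""
    ref = start
    for length_text, op in CIGAR_RE.findall(cigar):
        length = int(length_text)
        if op in SPANNING_OPS:
            if ref <= pos < ref + length:
                return True
            ref += length
    return False


if __name__ == "__main__":
    # 5S10M2I5M3D4M starting at 100 covers reference positions 100..121.
    for p in (99, 100, 116, 121, 122):
        print(p, read_spans_position(100, "5S10M2I5M3D4M", p))
```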
− | Currently there is no special logic to exclude positions where the reference is 'N'.
| + | === BaseQC Output === |
| | | |
− | Currently there is no special logic to exclude reads from the counts when the base is 'N'.
| + | There are two output options for BaseQC. |
| | | |
| + | #Percentages |
| + | #Straight Counts |
| | | |
− | === BaseQC Output === | + | ==== Percentage-Based Output Format ==== |
− | There are two output options for BaseQC.
| |
− | # Percentages
| |
− | # Straight Counts
| |
| | | |
− | ==== Percentage-Based Output Format ====
| + | Order/Descriptions: |
− | Order (with calculations based on the values described above): | |
− | *chrom - Chromosome/reference name string from the SAM/BAM
| |
− | *chromStart - 0-based start position
| |
− | *chromEnd - 0-based end position (always 1 greater than start and not included in this region)
| |
− | *Depth - Depth
| |
− | *Q20Bases - Q20Bases
| |
− | *Q20BasesPct(%) - Q20Bases / Depth
| |
− | *TotalReads - TotalReads
| |
− | *MappedBases - Mapped
| |
− | *MappingRate(%) - Mapped / TotalReads
| |
− | *MapRate_MQPass(%) - PassMapQual / TotalReads
| |
− | *ZeroMapQual(%) - ZeroMapQual / TotalReads
| |
− | *MapQual<10(%) - MapQual<10 / TotalReads
| |
− | *PairedReads(%) - Paired / TotalReads
| |
− | *ProperPaired(%) - ProperPaired / TotalReads
| |
− | *DupRate(%) - Dups / TotalReads
| |
− | *QCFailRate(%) - QCFail / TotalReads
| |
− | *AverageMapQuality - AverageMapQuality
| |
− | *AverageMapQualCount - AverageMapQualCount
| |
| | | |
− | This output does not include a MapQual255 count in version 1.0.
| + | {|border=1 |
| + | ! Field !! Description !!style="width: 80px"| Excludes Duplicates, QC Failures !!style="width: 80px"| Excludes Unmapped !!style="width: 80px"| Excludes MapQual = 255 !!style="width: 80px"| Excludes Below Min MapQual |
| + | |- |
| + | | chrom || Chromosome/reference name string from the SAM/BAM |
| + | |- |
| + | | chromStart || 0-based start position |
| + | |- |
| + | | chromEnd || 0-based end position (always 1 greater than start and not included in this region) |
| + | |- |
| + | | Depth || # of reads that are mapped with acceptable Mapping Quality, and are not duplicates or QC failures || align="center"|X || align="center"|X || || align="center"|X |
| + | |- |
| + | | Q20Bases || # of bases at this position with a base quality (from the read) of Q20 or higher || align="center"|X || align="center"|X || || align="center"|X |
| + | |- |
| + | | Q20BasesPct(%) || Q20Bases / Depth || align="center"|X || align="center"|X || || align="center"|X |
| + | |- |
| + | | TotalReads || # of reads that span this position || || || || |
| + | |- |
| + | | MappedBases || # of reads marked mapped in the flag || align="center"|X || align="center"|X || || |
| + | |- |
| + | | MappingRate(%) || MappedBases / TotalReads || align="center"|X || align="center"|X || || |
| + | |- |
| + | | MapRate_MQPass(%) || # of reads that have a Mapping Quality >= a minimum Mapping Quality / TotalReads || align="center"|X || align="center"|X || || |
| + | |- |
| + | | ZeroMapQual(%) || # of reads that have a Mapping Quality of 0 / TotalReads || align="center"|X || align="center"|X || || |
| + | |- |
| + | | MapQual<10(%) || # of reads that have a Mapping Quality < 10 / TotalReads || align="center"|X || align="center"|X || || |
| + | |- |
| + | | PairedReads(%) || # of reads marked paired in the flag / TotalReads || align="center"|X || align="center"|X || || |
| + | |- |
| + | | ProperPaired(%) || # of reads marked paired AND proper paired in the flag / TotalReads || align="center"|X || align="center"|X || || |
| + | |- |
| + | | DupRate(%) || # of reads marked duplicate in the flag / TotalReads || || || || |
| + | |- |
| + | | QCFailRate(%) || # of reads marked QC failure in the flag / TotalReads || || || || |
| + | |- |
| + | | AverageMapQuality || sum of included mapping qualities / AverageMapQualCount || align="center"|X || align="center"|X || align="center"|X || |
| + | |- |
| + | | AverageMapQualCount || # of mapping qualities in AverageMapQuality || align="center"|X || align="center"|X || align="center"|X || |
| + | |- |
| + | |} |
| | | |
| + | This output does not include a MapQual255 count in version 1.0. |
| | | |
− | ==== Count-Based Output Format ==== | + | ===== Sample Output ===== |
− | Order (of values described above):
| + | <pre>chrom chromStart chromEnd Depth Q20Bases Q20BasesPct(%) TotalReads MappedBases MappingRate(%) MapRate_MQPass(%) ZeroMapQual(%) MapQual<10(%) PairedReads(%) ProperPaired(%) DupRate(%) QCFailRate(%) AverageMapQuality AverageMapQualCount |
− | *chrom - Chromosome/reference name string from the SAM/BAM
| |
− | *chromStart - 0-based start position
| |
− | *chromEnd - 0-based end position (always 1 greater than start and not included in this region)
| |
− | *TotalReads
| |
− | *Dups
| |
− | *QCFail
| |
− | *Mapped
| |
− | *Paired
| |
− | *ProperPaired
| |
− | *ZeroMapQual
| |
− | *MapQual<10
| |
− | *MapQual255
| |
− | *PassMapQual
| |
− | *AverageMapQuality
| |
− | *AverageMapQualCount
| |
− | *Depth
| |
− | *Q20Bases
| |
− | | |
− | | |
− | === Sample Output === | |
− | | |
− | <pre> | |
− | chrom chromStart chromEnd Depth Q20Bases Q20BasesPct(%) TotalReads MappedBases MappingRate(%) MapRate_MQPass(%) ZeroMapQual(%) MapQual<10(%) PairedReads(%) ProperPaired(%) DupRate(%) QCFailRate(%) AverageMapQuality AverageMapQualCount | |
| 1 100 101 2 2 100.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3 | | 1 100 101 2 2 100.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3 |
| 1 101 102 2 0 0.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3 | | 1 101 102 2 0 0.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3 |
Line 168: |
Line 144: |
| 1 10023 10024 0 0 0.000 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 0.000 0 | | 1 10023 10024 0 0 0.000 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 0.000 0 |
| 1 10024 10025 14 12 85.714 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 11.000 21 | | 1 10024 10025 14 12 85.714 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 11.000 21 |
− | </pre> | + | </pre> |
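The percentage columns in the sample output can be re-derived from the count columns in the same row. A minimal sketch, assuming the columns are whitespace-separated and that a zero denominator is reported as 0.000 (as the Depth 0 row in the sample suggests); the helper names are hypothetical:

```python
HEADER = ("chrom chromStart chromEnd Depth Q20Bases Q20BasesPct(%) TotalReads "
          "MappedBases MappingRate(%) MapRate_MQPass(%) ZeroMapQual(%) "
          "MapQual<10(%) PairedReads(%) ProperPaired(%) DupRate(%) "
          "QCFailRate(%) AverageMapQuality AverageMapQualCount").split()


def pct(numerator: float, denominator: float) -> float:
    """Percentage with a zero denominator mapped to 0.0."""
    return 0.0 if denominator == 0 else 100.0 * numerator / denominator


def parse_row(line: str) -> dict:
    """Pair each value in an output row with its column name."""
    values = line.split()
    if len(values) != len(HEADER):
        raise ValueError("unexpected number of columns")
    return dict(zip(HEADER, values))


def recompute(row: dict) -> dict:
    """Recompute the two percentages that depend only on reported counts."""
    return {
        "Q20BasesPct(%)": pct(int(row["Q20Bases"]), int(row["Depth"])),
        "MappingRate(%)": pct(int(row["MappedBases"]), int(row["TotalReads"])),
    }


if __name__ == "__main__":
    row = parse_row("1 10024 10025 14 12 85.714 39 30 76.923 51.282 25.641 "
                    "51.282 84.615 38.462 15.385 15.385 11.000 21")
    for name, value in recompute(row).items():
        print(f"{name}: reported {row[name]}, recomputed {value:.3f}")
```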
| + | ==== Count-Based Output Format ==== |
| + | |
| + | {|border=1 |
| + | ! Field !! Description !!style="width: 80px"| Excludes Duplicates, QC Failures !!style="width: 80px"| Excludes Unmapped !!style="width: 80px"| Excludes MapQual = 255 !!style="width: 80px"| Excludes Below Min MapQual |
| + | |- |
| + | | chrom || Chromosome/reference name string from the SAM/BAM |
| + | |- |
| + | | chromStart || 0-based start position |
| + | |- |
| + | | chromEnd || 0-based end position (always 1 greater than start and not included in this region) |
| + | |- |
| + | | TotalReads || # of reads that span this position || || || || |
| + | |- |
| + | | Dups || # of reads marked duplicate in the flag || || || || |
| + | |- |
| + | | QCFail || # of reads marked QC failure in the flag || || || || |
| + | |- |
| + | | Mapped || # of reads marked mapped in the flag || align="center"|X || align="center"|X || || |
| + | |- |
| + | | Paired || # of reads marked paired in the flag || align="center"|X || align="center"|X || || |
| + | |- |
| + | | ProperPaired || # of reads marked paired AND proper paired in the flag || align="center"|X || align="center"|X || || |
| + | |- |
| + | | ZeroMapQual || # of reads that have a Mapping Quality of 0 || align="center"|X || align="center"|X || || |
| + | |- |
| + | | MapQual<10 || # of reads that have a Mapping Quality < 10 || align="center"|X || align="center"|X || ||
| + | |- |
| + | | MapQual255 || # of reads that have a Mapping Quality = 255 || align="center"|X || align="center"|X || || |
| + | |- |
| + | | PassMapQual || # of reads that have a Mapping Quality >= a minimum Mapping Quality || align="center"|X || align="center"|X || || |
| + | |- |
| + | | AverageMapQuality || sum of included mapping qualities / AverageMapQualCount || align="center"|X || align="center"|X || align="center"|X || |
| + | |- |
| + | | AverageMapQualCount || # of mapping qualities in AverageMapQuality || align="center"|X || align="center"|X || align="center"|X || |
| + | |- |
| + | | Depth || # of reads that are mapped with acceptable Mapping Quality, and are not duplicates or QC failures || align="center"|X || align="center"|X || || align="center"|X |
| + | |- |
| + | | Q20Bases || # of bases at this position with a base quality (from the read) of Q20 or higher || align="center"|X || align="center"|X || || align="center"|X |
| + | |- |
| + | |} |
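The exclusion rules for the count-based fields can be read as a short cascade of checks per read that spans a position. The sketch below illustrates that cascade under assumptions: it is not bamUtil's code, <code>update_counts</code> and the <code>MapQualSum</code> key are hypothetical, <code>min_map_qual</code> stands for the <code>--minMapQual</code> value, and a missing base quality (for example at a deletion) is assumed not to count toward Q20Bases.

```python
from collections import Counter
from typing import Optional


def update_counts(counts: Counter, flag: int, map_qual: int,
                  base_qual: Optional[int], min_map_qual: int) -> None:
    """Update per-position counters for one read spanning the position."""
    counts["TotalReads"] += 1
    is_dup = bool(flag & 0x400)
    is_qc_fail = bool(flag & 0x200)
    is_unmapped = bool(flag & 0x4)
    if is_dup:
        counts["Dups"] += 1
    if is_qc_fail:
        counts["QCFail"] += 1
    # No further stats for duplicate, QC failure, or unmapped reads.
    if is_dup or is_qc_fail or is_unmapped:
        return

    counts["Mapped"] += 1
    if flag & 0x1:
        counts["Paired"] += 1
        if flag & 0x2:
            counts["ProperPaired"] += 1
    if map_qual == 0:
        counts["ZeroMapQual"] += 1
    if map_qual < 10:
        counts["MapQual<10"] += 1
    if map_qual == 255:
        counts["MapQual255"] += 1

    # The average excludes mapping quality 255.
    if map_qual != 255:
        counts["MapQualSum"] += map_qual
        counts["AverageMapQualCount"] += 1

    # PassMapQual, Depth and Q20Bases require the minimum mapping quality
    # (mapping quality 255 passes this check, as noted for version 1.0).
    if map_qual >= min_map_qual:
        counts["PassMapQual"] += 1
        counts["Depth"] += 1
        if base_qual is not None and base_qual >= 20:
            counts["Q20Bases"] += 1


if __name__ == "__main__":
    counts = Counter()
    reads = [(0x1 | 0x2, 30, 25), (0x400, 30, 30), (0x0, 0, 10),
             (0x4, 0, 30), (0x0, 255, 40)]
    for flag, mq, bq in reads:
        update_counts(counts, flag, mq, bq, min_map_qual=20)
    print(dict(counts))
```

AverageMapQuality would then be <code>MapQualSum / AverageMapQualCount</code> whenever the count is non-zero.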
| + | |
| + | |
| + | |
| + | [[Category:BamUtil|stats]] [[Category:BAM_Software]] [[Category:Software]] |