Changes

From Genome Analysis Wiki
1,729 bytes added, 14:47, 7 October 2011
Update Stats
Line 1: Line 1: −
[[Category:BamUtil|stats]]
+
<br>
[[Category:BAM Software]]
  −
[[Category:Software]]
     −
= Overview of the <code>stats</code> function of <code>bamUtil</code> =
+
= Overview of the <code>stats</code> function of <code>bamUtil</code> =
The <code>stats</code> option on the [[bamUtil]] executable generates the specified statistics on a SAM/BAM file.
     −
= Parameters =
+
The <code>stats</code> option on the [[BamUtil]] executable generates the specified statistics on a SAM/BAM file.
<pre>
+
 
Required Parameters:
+
= Parameters =
--in : the SAM/BAM file to calculate stats for
+
<pre> Required Parameters:
 +
--in&nbsp;: the SAM/BAM file to calculate stats for
 
Types of Statistics that can be generated:
 
Types of Statistics that can be generated:
--basic       : Turn on basic statistic generation
+
--basic     &nbsp;: Turn on basic statistic generation
--qual       : Generate a count for each quality (displayed as non-phred quality)
+
--qual       &nbsp;: Generate a count for each quality (displayed as non-phred quality)
--phred       : Generate a count for each quality (displayed as phred quality)
+
--phred     &nbsp;: Generate a count for each quality (displayed as phred quality)
--baseQC     : Write per base statistics to the specified file.
+
--baseQC     &nbsp;: Write per base statistics to the specified file.
 
Optional Parameters:
 
Optional Parameters:
--maxNumReads : Maximum number of reads to process
+
--maxNumReads&nbsp;: Maximum number of reads to process
 
                Defaults to -1 to indicate all reads.
 
                Defaults to -1 to indicate all reads.
--unmapped   : Only process unmapped reads (requires a bamIndex file)
+
--unmapped   &nbsp;: Only process unmapped reads (requires a bamIndex file)
--bamIndex   : The path/name of the bam index file
+
--bamIndex   &nbsp;: The path/name of the bam index file
 
                (if required and not specified, uses the --in value + ".bai")
 
                (if required and not specified, uses the --in value + ".bai")
--regionList : File containing the region list chr<tab>start_pos<tab>end_pos.
+
--regionList &nbsp;: File containing the region list chr&lt;tab&gt;start_pos&lt;tab&gt;end_pos.
 
                Positions are 0 based and the end_pos is not included in the region.
 
                Positions are 0 based and the end_pos is not included in the region.
 
                Uses bamIndex.
 
                Uses bamIndex.
--minMapQual : The minimum mapping quality for filtering reads in the baseQC stats.
+
--minMapQual &nbsp;: The minimum mapping quality for filtering reads in the baseQC stats.
--dbsnp       : The dbSnp file of positions to exclude from baseQC analysis.
+
--dbsnp     &nbsp;: The dbSnp file of positions to exclude from baseQC analysis.
--noeof       : Do not expect an EOF block on a bam file.
+
--noeof     &nbsp;: Do not expect an EOF block on a bam file.
--params     : Print the parameter settings
+
--params     &nbsp;: Print the parameter settings
</pre>
+
</pre>  
 
+
For all types of statistics, the bam file used is specified by <code>--in</code>.  
For all types of statistics, the bam file used is specified by <code>--in</code>.
      
The optional parameters are also used for all types of statistics.  
 
The optional parameters are also used for all types of statistics.  
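The <code>--regionList</code> layout described above (tab-separated chr, start_pos, end_pos, 0-based with end_pos excluded) can be sketched in Python as follows. This is only an illustration: the function name <code>to_region_line</code> and the sample coordinates are hypothetical and not part of bamUtil.

```python
def to_region_line(chrom, start_1based, end_1based_inclusive):
    """Format one region as chr<tab>start_pos<tab>end_pos (0-based, end excluded)."""
    start0 = start_1based - 1        # shift the start to 0-based
    end0 = end_1based_inclusive      # 1-based inclusive end equals 0-based exclusive end
    return f"{chrom}\t{start0}\t{end0}"


# Hypothetical 1-based inclusive intervals, converted to region list lines.
regions = [("1", 101, 102), ("1", 10024, 10025)]
region_lines = [to_region_line(*r) for r in regions]
print("\n".join(region_lines))
```

Writing these lines to a file, one per line, should give something usable as a region list, assuming the input coordinates really are 1-based and inclusive.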
   −
Usage:
+
Usage:  
<pre>
+
<pre> ./bam stats --in &lt;inputFile&gt; [--basic] [--qual] [--phred] [--baseQC &lt;outputFileName&gt;] [--maxNumReads &lt;maxNum&gt;] [--unmapped] [--bamIndex &lt;bamIndexFile&gt;] [--regionList &lt;regFileName&gt;] [--minMapQual &lt;minMapQ&gt;] [--dbsnp &lt;dbsnpFile&gt;] [--noeof] [--params]
./bam stats --in <inputFile> [--basic] [--qual] [--phred] [--baseQC <outputFileName>] [--maxNumReads <maxNum>] [--unmapped] [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--minMapQual <minMapQ>] [--dbsnp <dbsnpFile>] [--noeof] [--params]
+
</pre>
</pre>
+
<br>  
    +
= Types of Statistics  =
    +
== Basic  ==
   −
= Types of Statistics =
+
Prints summary statistics for the file:
   −
== Basic ==
+
*TotalReads - # of reads that are in the file  
Prints summary statistics for the file:
+
*MappedReads - # of reads marked mapped in the flag  
*TotalReads - # of reads that are in the file
+
*PairedReads - # of reads marked paired in the flag  
*MappedReads - # of reads marked mapped in the flag
+
*ProperPair - # of reads marked paired AND proper paired in the flag  
*PairedReads - # of reads marked paired in the flag
+
*DuplicateReads - # of reads marked duplicate in the flag  
*ProperPair - # of reads marked paired AND proper paired in the flag
+
*QCFailureReads - # of reads marked QC failure in the flag  
*DuplicateReads - # of reads marked duplicate in the flag
+
*MappingRate(%) - # of reads marked mapped in the flag / TotalReads  
*QCFailureReads - # of reads marked QC failure in the flag
+
*PairedReads(%) - # of reads marked paired in the flag / TotalReads  
*MappingRate(%) - # of reads marked mapped in the flag / TotalReads
+
*ProperPair(%) - # of reads marked paired AND proper paired in the flag / TotalReads  
*PairedReads(%) - # of reads marked paired in the flag / TotalReads
+
*DupRate(%) - # of reads marked duplicate in the flag / TotalReads  
*ProperPair(%) - # of reads marked paired AND proper paired in the flag / TotalReads
+
*QCFailRate(%) - # of reads marked QC failure in the flag / TotalReads  
*DupRate(%) - # of reads marked duplicate in the flag / TotalReads
+
*TotalBases - # of bases in all reads  
*QCFailRate(%) - # of reads marked QC failure in the flag / TotalReads
  −
*TotalBases - # of bases in all reads
   
*BasesInMappedReads - # of bases in reads marked mapped in the flag
 
*BasesInMappedReads - # of bases in reads marked mapped in the flag
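The flag-based counts listed above can be sketched in Python as shown here. The bit values are the standard SAM flag bits; the function name <code>basic_stats</code>, the sample flags, and the scaling of rates by 100 (to match the "(%)" suffix) are assumptions for illustration, not bamUtil's actual code.

```python
# Standard SAM flag bits.
PAIRED, PROPER_PAIR, UNMAPPED, QC_FAIL, DUPLICATE = 0x1, 0x2, 0x4, 0x200, 0x400


def basic_stats(flags):
    """Count reads per flag category and derive rates relative to TotalReads."""
    total = len(flags)
    stats = {
        "TotalReads": total,
        "MappedReads": sum(1 for f in flags if not f & UNMAPPED),
        "PairedReads": sum(1 for f in flags if f & PAIRED),
        # ProperPair requires both the paired bit and the proper-pair bit.
        "ProperPair": sum(1 for f in flags if f & PAIRED and f & PROPER_PAIR),
        "DuplicateReads": sum(1 for f in flags if f & DUPLICATE),
        "QCFailureReads": sum(1 for f in flags if f & QC_FAIL),
    }

    def pct(n):
        # Assumed scaling: rate expressed as a percentage of TotalReads.
        return 100.0 * n / total if total else 0.0

    stats["MappingRate(%)"] = pct(stats["MappedReads"])
    stats["PairedReads(%)"] = pct(stats["PairedReads"])
    stats["ProperPair(%)"] = pct(stats["ProperPair"])
    stats["DupRate(%)"] = pct(stats["DuplicateReads"])
    stats["QCFailRate(%)"] = pct(stats["QCFailureReads"])
    return stats


# Hypothetical flags: two proper-paired reads, one unmapped read, one duplicate.
example = basic_stats([99, 147, 4, 1040])
print(example)
```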
   −
== Qual/Phred ==
+
== Qual/Phred ==
Prints a count of the number of times each quality value appears in the file.
  −
*<code>phred</code> Displays Quality as phred integers [0-93]
  −
*<code>qual</code>  Displays Quality as non-phred integers (phred + 33) [33-126]
      +
Prints a count of the number of times each quality value appears in the file.
   −
== BaseQC ==
+
*<code>phred</code> Displays Quality as phred integers [0-93]
 +
*<code>qual</code> Displays Quality as non-phred integers (phred + 33) [33-126]
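The relationship between the two displays (non-phred = phred + 33) can be illustrated with a short Python sketch. The helper names below are hypothetical, and the counting function only mimics the idea of a per-quality tally for a single quality string.

```python
from collections import Counter


def phred_to_qual(phred):
    """Convert a phred integer [0-93] to the non-phred integer [33-126]."""
    if not 0 <= phred <= 93:
        raise ValueError(f"phred value out of range: {phred}")
    return phred + 33


def qual_to_phred(qual):
    """Convert a non-phred integer [33-126] back to a phred integer [0-93]."""
    if not 33 <= qual <= 126:
        raise ValueError(f"quality value out of range: {qual}")
    return qual - 33


def count_phred(quality_string):
    """Tally phred values for one SAM quality string (ASCII characters)."""
    return Counter(qual_to_phred(ord(ch)) for ch in quality_string)


counts = count_phred("II5")  # 'I' is ASCII 73, '5' is ASCII 53
print(sorted(counts.items()))
```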
   −
The <code>baseQC</code> option generates the following statistics:
+
<br>  
   −
For each position, the following counts are incremented if:
+
== BaseQC  ==
# a read spans the reference position (starts before or at this reference position and ends at or after this position)
  −
# regardless of duplicate/qc failure/unmapped/mapping quality
  −
# if CIGAR for this position is M/X/=/D/N (any cigar other than clip or insert)
  −
*TotalReads - # of reads that span this position.
  −
*Dups - # of reads marked duplicate in the flag
  −
*QCFail - # of reads marked QC failure in the flag
     −
No further stats are incremented if the read is a duplicate, QC failure, or unmapped.
+
The <code>baseQC</code> option generates the following statistics:
   −
Additional counts incremented ONLY for mapped, non-duplicate, non-QC failure reads:
+
A read spans a position if the read starts at or before the position, ends at or after the position and the position is not a clip.  CIGAR operations allowed for the position are M/X/=/D/N.
*Mapped - # of reads marked mapped in the flag
  −
*Paired - # of reads marked paired in the flag
  −
*ProperPaired - # of reads marked paired AND proper paired in the flag
  −
*ZeroMapQual - # of reads that have a Mapping Quality of 0
  −
*MapQual<10 - # of reads that have a Mapping Quality < 10
  −
*MapQual255 - # of reads that have a Mapping Quality = 255
  −
*PassMapQual - # of reads that have a Mapping Quality >= a minimum Mapping Quality (version 1.0, this includes mapping quality 255 reads).
     −
Additional values ONLY for mapped, mapping quality != 255, non-duplicate, non-QC failure reads:
+
Currently there is no special logic to exclude positions/reads where the reference base is 'N' or the read base is 'N'.  
*AverageMapQuality - average calculated by summing all mapping qualities that are included (as defined above) and dividing by the number of mapping qualities added.
  −
*AverageMapQualCount - # of mapping qualities used to calculate AverageMapQuality.
     −
Additional values ONLY incremented for mapped, mapping quality >= min mapping quality, non-duplicate, non-QC failure reads (version 1.0, this includes mapping quality 255 reads):
+
<br>  
*Depth - # of reads. 
  −
*Q20Bases - # of bases at this position with a base quality (from the read) of Q20 or higher.
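The spanning rule (a position counts only where the CIGAR operation is M/X/=/D/N, so clips and insertions are skipped) can be sketched in Python. This is a minimal illustration under the assumption of a 0-based alignment start; the function name <code>spanned_positions</code> is hypothetical and this is not bamUtil's implementation.

```python
import re

SPANNING_OPS = set("MX=DN")  # operations that count toward spanning a position
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def spanned_positions(start0, cigar):
    """Return the 0-based reference positions a read spans, given its CIGAR."""
    positions = []
    ref_pos = start0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in SPANNING_OPS:
            # These operations consume reference bases and count as spanned.
            positions.extend(range(ref_pos, ref_pos + n))
            ref_pos += n
        # I, S, H and P consume no reference bases, so ref_pos is unchanged.
    return positions


# Hypothetical read: 2 soft-clipped bases, 3 matches, 1 insertion, 2 deletions, 1 match.
covered = spanned_positions(100, "2S3M1I2D1M")
print(covered)
```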
     −
Currently there is no special logic to exclude positions where the reference is 'N'.
+
=== BaseQC Output  ===
   −
Currently there is no special logic to exclude reads from the counts when the base is 'N'.
+
There are two output options for BaseQC.  
    +
#Percentages
 +
#Straight Counts
   −
=== BaseQC Output ===
+
==== Percentage-Based Output Format  ====
There are two output options for BaseQC.
  −
# Percentages
  −
# Straight Counts
     −
==== Percentage-Based Output Format ====
+
Order/Descriptions:  
Order (with calculations based on the values described above):
  −
*chrom - Chromosome/reference name string from the SAM/BAM
  −
*chromStart - 0-based start position
  −
*chromEnd  - 0-based end position (always 1 greater than start and not included in this region)
  −
*Depth - Depth
  −
*Q20Bases - Q20Bases
  −
*Q20BasesPct(%) - Q20Bases / Depth
  −
*TotalReads - TotalReads
  −
*MappedBases - Mapped
  −
*MappingRate(%) - Mapped / TotalReads
  −
*MapRate_MQPass(%) - PassMapQual / TotalReads
  −
*ZeroMapQual(%) - ZeroMapQual / TotalReads
  −
*MapQual<10(%) - MapQual<10 / TotalReads
  −
*PairedReads(%) - Paired / TotalReads
  −
*ProperPaired(%) - ProperPaired / TotalReads
  −
*DupRate(%) - Dups / TotalReads
  −
*QCFailRate(%) - QCFail / TotalReads
  −
*AverageMapQuality - AverageMapQuality
  −
*AverageMapQualCount - AverageMapQualCount
     −
This output does not include a MapQual255 count in version 1.0.
+
{|border=1
 +
! Field !! Description !!style="width: 80px"| Excludes Duplicates, QC Failures !!style="width: 80px"| Excludes Unmapped !!style="width: 80px"|  Excludes MapQual = 255 !!style="width: 80px"| Excludes Below Min MapQual
 +
|-
 +
| chrom || Chromosome/reference name string from the SAM/BAM
 +
|-
 +
| chromStart || 0-based start position
 +
|-
 +
| chromEnd || 0-based end position (always 1 greater than start and not included in this region)
 +
|-
 +
| Depth || # of reads that are mapped with acceptable Mapping Quality, and are not duplicates or QC failures || align="center"|X || align="center"|X || || align="center"|X
 +
|-
 +
| Q20Bases || # of bases at this position with a base quality (from the read) of Q20 or higher || align="center"|X || align="center"|X || || align="center"|X
 +
|-
 +
| Q20BasesPct(%) || Q20Bases / Depth || align="center"|X || align="center"|X || || align="center"|X
 +
|-
 +
| TotalReads || # of reads that span this position || || || ||
 +
|-
 +
| MappedBases || # of reads marked mapped in the flag || align="center"|X || align="center"|X || ||
 +
|-
 +
| MappingRate(%) || MappedBases / TotalReads || align="center"|X || align="center"|X || ||
 +
|-
 +
| MapRate_MQPass(%) || # of reads that have a Mapping Quality &gt;= a minimum Mapping Quality / TotalReads || align="center"|X || align="center"|X || ||
 +
|-
 +
| ZeroMapQual(%) || # of reads that have a Mapping Quality of 0 / TotalReads || align="center"|X || align="center"|X || ||
 +
|-
 +
| MapQual&lt;10(%) || # of reads that have a Mapping Quality &lt; 10 / TotalReads || align="center"|X || align="center"|X || ||
 +
|-
 +
| PairedReads(%) || # of reads marked paired in the flag / TotalReads || align="center"|X || align="center"|X || ||
 +
|-
 +
| ProperPaired(%) || # of reads marked paired AND proper paired in the flag / TotalReads || align="center"|X || align="center"|X || ||
 +
|-
 +
| DupRate(%) || # of reads marked duplicate in the flag / TotalReads || || || ||
 +
|-
 +
| QCFailRate(%) || # of reads marked QC failure in the flag / TotalReads || || || ||
 +
|-
 +
| AverageMapQuality || sum of included mapping qualities / AverageMapQualCount || align="center"|X || align="center"|X || align="center"|X ||
 +
|-
 +
| AverageMapQualCount || # of mapping qualities in AverageMapQuality || align="center"|X || align="center"|X || align="center"|X ||
 +
|-
 +
|}
    +
This output does not include a MapQual255 count in version 1.0.
   −
==== Count-Based Output Format ====
+
===== Sample Output  =====
Order (of values described above):
+
<pre>chrom chromStart chromEnd Depth Q20Bases Q20BasesPct(%) TotalReads MappedBases MappingRate(%) MapRate_MQPass(%) ZeroMapQual(%) MapQual&lt;10(%) PairedReads(%) ProperPaired(%) DupRate(%) QCFailRate(%) AverageMapQuality AverageMapQualCount
*chrom - Chromosome/reference name string from the SAM/BAM
  −
*chromStart - 0-based start position
  −
*chromEnd - 0-based end position (always 1 greater than start and not included in this region)
  −
*TotalReads
  −
*Dups
  −
*QCFail
  −
*Mapped
  −
*Paired
  −
*ProperPaired
  −
*ZeroMapQual
  −
*MapQual<10
  −
*MapQual255
  −
*PassMapQual
  −
*AverageMapQuality
  −
*AverageMapQualCount
  −
*Depth
  −
*Q20Bases
  −
 
  −
 
  −
=== Sample Output ===
  −
 
  −
<pre>
  −
chrom chromStart chromEnd Depth Q20Bases Q20BasesPct(%) TotalReads MappedBases MappingRate(%) MapRate_MQPass(%) ZeroMapQual(%) MapQual<10(%) PairedReads(%) ProperPaired(%) DupRate(%) QCFailRate(%) AverageMapQuality AverageMapQualCount
   
1 100 101 2 2 100.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3
 
1 100 101 2 2 100.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3
 
1 101 102 2 0 0.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3
 
1 101 102 2 0 0.000 3 3 100.000 66.667 33.333 66.667 100.000 0.000 0.000 0.000 11.000 3
Line 168: Line 144:  
1 10023 10024 0 0 0.000 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 0.000 0
 
1 10023 10024 0 0 0.000 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 0.000 0
 
1 10024 10025 14 12 85.714 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 11.000 21
 
1 10024 10025 14 12 85.714 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 11.000 21
</pre>
+
</pre>  
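A row of the percentage-based output can be read back into named fields with a short Python sketch such as the one below. The column names come from the header line in the sample; the function name <code>parse_row</code> is hypothetical, and splitting on arbitrary whitespace is an assumption, since the exact delimiter is not stated here.

```python
HEADER = (
    "chrom chromStart chromEnd Depth Q20Bases Q20BasesPct(%) TotalReads "
    "MappedBases MappingRate(%) MapRate_MQPass(%) ZeroMapQual(%) MapQual<10(%) "
    "PairedReads(%) ProperPaired(%) DupRate(%) QCFailRate(%) "
    "AverageMapQuality AverageMapQualCount"
).split()


def parse_row(line):
    """Map one whitespace-separated output row onto the header names."""
    values = line.split()
    if len(values) != len(HEADER):
        raise ValueError(f"expected {len(HEADER)} fields, got {len(values)}")
    row = dict(zip(HEADER, values))
    # Everything except chrom is numeric; keep integers and decimals distinct.
    for name in HEADER[1:]:
        text = row[name]
        row[name] = float(text) if "." in text else int(text)
    return row


row = parse_row(
    "1 10024 10025 14 12 85.714 39 30 76.923 51.282 25.641 51.282 "
    "84.615 38.462 15.385 15.385 11.000 21"
)
print(row["chrom"], row["chromStart"], row["Depth"], row["Q20BasesPct(%)"])
```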
 +
==== Count-Based Output Format  ====
 +
 
 +
{|border=1
 +
! Field !! Description !!style="width: 80px"| Excludes Duplicates, QC Failures !!style="width: 80px"| Excludes Unmapped !!style="width: 80px"|  Excludes MapQual = 255 !!style="width: 80px"| Excludes Below Min MapQual
 +
|-
 +
| chrom || Chromosome/reference name string from the SAM/BAM
 +
|-
 +
| chromStart || 0-based start position
 +
|-
 +
| chromEnd || 0-based end position (always 1 greater than start and not included in this region)
 +
|-
 +
| TotalReads || # of reads that span this position || || || ||
 +
|-
 +
| Dups || # of reads marked duplicate in the flag || || || ||
 +
|-
 +
| QCFail || # of reads marked QC failure in the flag || || || ||
 +
|-
 +
| Mapped || # of reads marked mapped in the flag || align="center"|X || align="center"|X || ||
 +
|-
 +
| Paired || # of reads marked paired in the flag || align="center"|X || align="center"|X || ||
 +
|-
 +
| ProperPaired || # of reads marked paired AND proper paired in the flag || align="center"|X || align="center"|X || ||
 +
|-
 +
| ZeroMapQual || # of reads that have a Mapping Quality of 0 || align="center"|X || align="center"|X || ||
 +
|-
 +
| MapQual&lt;10 || # of reads that have a Mapping Quality &lt; 10 || align="center"|X || align="center"|X || ||
 +
|-
 +
| MapQual255 || # of reads that have a Mapping Quality = 255 || align="center"|X || align="center"|X || ||
 +
|-
 +
| PassMapQual || # of reads that have a Mapping Quality &gt;= a minimum Mapping Quality || align="center"|X || align="center"|X || ||
 +
|-
 +
| AverageMapQuality || sum of included mapping qualities / AverageMapQualCount || align="center"|X || align="center"|X || align="center"|X ||
 +
|-
 +
| AverageMapQualCount || # of mapping qualities in AverageMapQuality || align="center"|X || align="center"|X || align="center"|X ||
 +
|-
 +
| Depth || # of reads that are mapped with acceptable Mapping Quality, and are not duplicates or QC failures || align="center"|X || align="center"|X || || align="center"|X
 +
|-
 +
| Q20Bases || # of bases at this position with a base quality (from the read) of Q20 or higher || align="center"|X || align="center"|X || || align="center"|X
 +
|-
 +
|}
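The percentage fields can be derived from the count fields using the ratios given in the tables (for example, Mapped / TotalReads). The Python sketch below illustrates this. The counts are hypothetical values back-calculated to be consistent with the last sample row; the three-decimal rounding and the 0.000 result for a zero denominator are assumptions based on that sample, and the helper names are not part of bamUtil.

```python
def pct(numerator, denominator):
    """Percentage rounded to three decimals; 0.0 when the denominator is zero."""
    return round(100.0 * numerator / denominator, 3) if denominator else 0.0


def to_percentages(c):
    """Derive the percentage-based fields from a dict of count-based fields."""
    total = c["TotalReads"]
    return {
        "Q20BasesPct(%)": pct(c["Q20Bases"], c["Depth"]),
        "MappingRate(%)": pct(c["Mapped"], total),
        "MapRate_MQPass(%)": pct(c["PassMapQual"], total),
        "ZeroMapQual(%)": pct(c["ZeroMapQual"], total),
        "MapQual<10(%)": pct(c["MapQual<10"], total),
        "PairedReads(%)": pct(c["Paired"], total),
        "ProperPaired(%)": pct(c["ProperPaired"], total),
        "DupRate(%)": pct(c["Dups"], total),
        "QCFailRate(%)": pct(c["QCFail"], total),
    }


# Hypothetical counts for a single position.
counts_row = {
    "TotalReads": 39, "Dups": 6, "QCFail": 6, "Mapped": 30, "Paired": 33,
    "ProperPaired": 15, "ZeroMapQual": 10, "MapQual<10": 20,
    "PassMapQual": 20, "Depth": 14, "Q20Bases": 12,
}
percentages = to_percentages(counts_row)
print(percentages)
```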
 +
 
 +
 
 +
 
 +
[[Category:BamUtil|stats]] [[Category:BAM_Software]] [[Category:Software]]
