Changes

From Genome Analysis Wiki
4,065 bytes added, 19:43, 19 January 2012
no edit summary
Line 7: Line 7:  
= Usage =
 
= Usage =
 
<pre>
 
<pre>
./bam stats --in <inputFile> [--basic] [--qual] [--phred] [--baseQC <outputFileName>] [--maxNumReads <maxNum>][--unmapped] [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--minMapQual <minMapQ>] [--dbsnp <dbsnpFile>] [--sumStats] [--noeof] [--params]
+
./bam stats --in <inputFile> [--basic] [--qual] [--phred] [--pBaseQC <outputFileName>] [--cBaseQC <outputFileName>] [--baseSum] [--maxNumReads <maxNum>] [--unmapped] [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--minMapQual <minMapQ>] [--dbsnp <dbsnpFile>] [--noeof] [--params]
 
</pre>  
 
</pre>  
   Line 19: Line 19:  
--qual        : Generate a count for each quality (displayed as non-phred quality)
 
--qual        : Generate a count for each quality (displayed as non-phred quality)
 
--phred      : Generate a count for each quality (displayed as phred quality)
 
--phred      : Generate a count for each quality (displayed as phred quality)
--baseQC      : Write per base statistics to the specified file.
+
--pBaseQC    : Write per base statistics as Percentages to the specified file.
 +
                pBaseQC & cBaseQC cannot both be specified.
 +
--cBaseQC    : Write per base statistics as Counts to the specified file.
 +
                pBaseQC & cBaseQC cannot both be specified.
 
Optional Parameters:
 
Optional Parameters:
 
--maxNumReads : Maximum number of reads to process
 
--maxNumReads : Maximum number of reads to process
Line 32: Line 35:  
--dbsnp      : The dbSnp file of positions to exclude from baseQC analysis.
 
--dbsnp      : The dbSnp file of positions to exclude from baseQC analysis.
 
--noeof      : Do not expect an EOF block on a bam file.
 
--noeof      : Do not expect an EOF block on a bam file.
--params      : Print the parameter settings
+
--params      : Print the parameter settings.
Optional Base QC Only Parameters:
+
Optional BaseQC Only Parameters:
--sumStats    : Alternate summary output.
+
--baseSum    : Print an overall summary of the baseQC for the file to stderr.
 
</pre>  
 
</pre>  
 
For all types of statistics, the bam file used is specified by <code>--in</code>.  
 
For all types of statistics, the bam file used is specified by <code>--in</code>.  
   −
The optional parameters are also used for all types of statistics.  
+
The optional parameters are used for all types of statistics.  
    +
{{inBAMInputFile}}
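The option rules above can be sketched as a small command builder. This is only an illustration, not part of bamUtil: the function name <code>build_stats_command</code> and its arguments are hypothetical, and only the flags listed in the usage text (<code>--in</code>, <code>--pBaseQC</code>, <code>--cBaseQC</code>, <code>--baseSum</code>) are used.

```python
def build_stats_command(in_file, p_base_qc=None, c_base_qc=None,
                        base_sum=False, bam_exe="./bam"):
    """Build an argument list for 'bam stats' (hypothetical helper).

    Enforces the documented rule that pBaseQC and cBaseQC
    cannot both be specified.
    """
    if p_base_qc and c_base_qc:
        raise ValueError("pBaseQC and cBaseQC cannot both be specified")

    cmd = [bam_exe, "stats", "--in", in_file]
    if p_base_qc:
        cmd += ["--pBaseQC", p_base_qc]   # per base statistics as percentages
    if c_base_qc:
        cmd += ["--cBaseQC", c_base_qc]   # per base statistics as counts
    if base_sum:
        cmd.append("--baseSum")           # overall summary written to stderr
    return cmd


# Example: count-based output plus the summary (file names are placeholders).
example_cmd = build_stats_command("sample.bam", c_base_qc="out.txt", base_sum=True)
```

The returned list could be passed to <code>subprocess.run</code>; building a list rather than a single string avoids shell quoting problems with file names.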
   −
= Types of Statistics  =
     −
== Basic ==
+
== Types of Statistics ==
 +
 
 +
=== Basic (<code>--basic</code>) ===
    
Prints summary statistics for the file:  
 
Prints summary statistics for the file:  
Line 61: Line 66:  
*BasesInMappedReads - # of bases in reads marked mapped in the flag
 
*BasesInMappedReads - # of bases in reads marked mapped in the flag
   −
== Qual/Phred ==
+
=== Qual/Phred (<code>--phred</code> and <code>--qual</code>) ===
    
Prints a count of the number of times each quality value appears in the file.  
 
Prints a count of the number of times each quality value appears in the file.  
Line 70: Line 75:  
<br>  
 
<br>  
   −
== BaseQC ==
+
=== BaseQC (<code>--pBaseQC</code> and <code>--cBaseQC</code> and <code>--baseSum</code>) ===
   −
The <code>baseQC</code> option generates the following statistics:
+
The <code>pBaseQC</code> and <code>cBaseQC</code> options generate per base statistics.  Only one of these two options can be specified.  They write statistics generated for each position to the file specified after the option.  They use the same logic for calculating statistics, but <code>pBaseQC</code> writes the statistics as percentages, and <code>cBaseQC</code> writes them as counts.  The order of the statistics is also different.
   −
A read spans a position if the read starts at or before the position, ends at or after the position and the position is not a clip.  CIGAR operations allowed for the position are M/X/=/D/N.  If the CIGAR is '*', only numbers for the specified reference position are incremented.
+
The <code>baseSum</code> option can be used with either <code>pBaseQC</code> or <code>cBaseQC</code> or on its own.  <code>baseSum</code> generates a summary of the per position statistics and writes it to stderr.  It calculates the per position base statistics even if they will not be written anywhere (that is, when neither <code>pBaseQC</code> nor <code>cBaseQC</code> is specified).
   −
Currently there is no special logic to exclude positions/reads where the reference base is 'N' or the read base is 'N'.  
+
 
 +
All three options use the same logic for calculating the statistics:
 +
* A read spans a position if the read starts at or before the position, ends at or after the position and the position is not a clip.  CIGAR operations allowed for the position are M/X/=/D/N.  If the CIGAR is '*', only numbers for the specified reference position are incremented.
 +
*Currently there is no special logic to exclude positions/reads where the reference base is 'N' or the read base is 'N'.  
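
The spanning rule above can be sketched in Python as follows. This is a minimal illustration, not bamUtil's implementation: the function name <code>read_spans_position</code> is hypothetical, <code>start</code> and <code>position</code> are assumed to use the same reference coordinate system, and the handling of a <code>'*'</code> CIGAR (counting the read only at its own start position) is one reading of the text.

```python
import re

# CIGAR operations the text lists as allowed at a spanned position.
SPANNING_OPS = set("MX=DN")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def read_spans_position(start, cigar, position):
    """Return True if the read covers `position` with an M/X/=/D/N operation.

    `start` is the reference position of the first aligned base.
    """
    if cigar == "*":
        # Assumption: with no CIGAR, only the read's own start position counts.
        return position == start

    ref_pos = start
    for length, op in CIGAR_TOKEN.findall(cigar):
        length = int(length)
        if op in SPANNING_OPS:
            # These operations advance along the reference.
            if ref_pos <= position < ref_pos + length:
                return True
            ref_pos += length
        # I, S, H and P do not advance along the reference, so clipped
        # or inserted bases never cover a reference position.
    return False
```

For example, a read starting at 100 with CIGAR <code>5S10M</code> covers positions 100 through 109; the five soft-clipped bases do not cover any reference position.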
    
<br>  
 
<br>  
   −
=== BaseQC Output  ===
+
==== Percentage-Based Output Format (<code>--pBaseQC</code>) ====
 
  −
There are two output options for BaseQC.
  −
 
  −
#[[#Percentage-Based Output Format|Percentage-Based Output Format]]
  −
#[[#Count-Based Output Format|Count-Based Output Format]]
  −
 
  −
==== Percentage-Based Output Format  ====
      
Order/Descriptions:  
 
Order/Descriptions:  
Line 150: Line 151:  
1 10024 10025 14 12 85.714 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 11.000 21
 
1 10024 10025 14 12 85.714 39 30 76.923 51.282 25.641 51.282 84.615 38.462 15.385 15.385 11.000 21
 
</pre>  
 
</pre>  
==== Count-Based Output Format ====
+
==== Count-Based Output Format (<code>--cBaseQC</code>) ====
 
Order/Descriptions:  
 
Order/Descriptions:  
 
{|border=1  
 
{|border=1  
Line 191: Line 192:  
|}
 
|}
    +
==== Summary of per Position Statistics (<code>--baseSum</code>) ====
 +
Use <code>--baseSum</code> to print an overall summary of the baseQC for the file to stderr.
 +
 +
This option can be used with or without <code>--pBaseQC</code> and <code>--cBaseQC</code>.
    +
The values are tab delimited.  First there is a header line describing the summary.  Then there is a line with the Mean values, then a line with the Standard Deviations.
 +
 +
{|border=1
 +
! Field !! Description !!style="width: 80px"| Excludes Duplicates, QC Failures !!style="width: 80px"| Excludes Unmapped !!style="width: 80px"|  Excludes MapQual = 255 !!style="width: 80px"| Excludes Below Min MapQual !!style="width: 80px"| Excludes CIGAR Deletions, Skips
 +
|-
 +
| TotalReads || # of reads that span this position || || || || ||
 +
|-
 +
| Dups || # of reads marked duplicate in the flag || || || || ||
 +
|-
 +
| QCFail || # of reads marked QC failure in the flag || || || || ||
 +
|-
 +
| Mapped || # of reads marked mapped in the flag || align="center"|X || align="center"|X || || ||
 +
|-
 +
| Paired || # of reads marked paired in the flag || align="center"|X || align="center"|X || || ||
 +
|-
 +
| ProperPaired || # of reads marked paired AND proper paired in the flag || align="center"|X || align="center"|X || || ||
 +
|-
 +
| ZeroMapQual || # of reads that have a Mapping Quality of 0 || align="center"|X || align="center"|X || || ||
 +
|-
 +
| MapQual&lt;10 || # of reads that have a Mapping Quality &lt; 10 || align="center"|X || align="center"|X || || ||
 +
|-
 +
| MapQual255 || # of reads that have a Mapping Quality = 255 || align="center"|X || align="center"|X || || ||
 +
|-
 +
| PassMapQual || # of reads that have a Mapping Quality &gt;= a minimum Mapping Quality || align="center"|X || align="center"|X || || ||
 +
|-
 +
| AverageMapQuality || sum of included mapping qualities / AverageMapQualCount || align="center"|X || align="center"|X || align="center"|X || ||
 +
|-
 +
| AverageMapQualCount || # of mapping qualities in AverageMapQuality || align="center"|X || align="center"|X || align="center"|X || ||
 +
|-
 +
| Depth || # of reads that are mapped with acceptable Mapping Quality, and are not duplicates or QC failures || align="center"|X || align="center"|X || align="center"|X || align="center"|X || align="center"|X
 +
|-
 +
| Q20Bases || # of bases at this position with a base quality (from the read) of Q20 or higher || align="center"|X || align="center"|X || align="center"|X || align="center"|X || align="center"|X
 +
|-
 +
|}
 +
 +
===== Sample Output =====
 +
<pre>
 +
Summary of Pileup Stats (1st Mean, 2nd Standard Deviation)
 +
TotalReads Dups QCFail Mapped Paired ProperPaired ZeroMapQual MapQual<10 MapQual255 PassMapQual AverageMapQuality AverageMapQualCount
 +
Depth Q20Bases
 +
14.307692 1.846154 1.846154 8.769231 7.846154 0.923077 2.923077 5.846154 0.000000 2.923077 11.000000
 +
8.769231 2.076923 1.153846
 +
17.670053 2.882307 2.882307 9.038380 7.603137 1.441153 3.012793 6.025586 0.000000 3.012793 0.000000
 +
9.038380 2.841993 1.993579
 +
</pre>
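
A summary in this layout can be read back with a few lines of Python. The sketch below is an illustration only: the function name <code>parse_base_sum</code> is hypothetical, it assumes the title, header, mean and standard deviation each occupy one line (the wrapped lines of the sample above are joined here), and it splits on any whitespace so that it works whether or not the tabs survived copying.

```python
def parse_base_sum(text):
    """Parse --baseSum output into {field: {"mean": x, "stdev": y}}.

    Expects four non-empty lines: title, header, means, standard deviations.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 4:
        raise ValueError("expected title, header, mean and stdev lines")

    fields = lines[1].split()
    means = [float(v) for v in lines[2].split()]
    stdevs = [float(v) for v in lines[3].split()]
    if not (len(fields) == len(means) == len(stdevs)):
        raise ValueError("header and value lines have different lengths")

    return {name: {"mean": m, "stdev": s}
            for name, m, s in zip(fields, means, stdevs)}


# Values copied from the sample output, with each wrapped line joined.
SAMPLE = """Summary of Pileup Stats (1st Mean, 2nd Standard Deviation)
TotalReads Dups QCFail Mapped Paired ProperPaired ZeroMapQual MapQual<10 MapQual255 PassMapQual AverageMapQuality AverageMapQualCount Depth Q20Bases
14.307692 1.846154 1.846154 8.769231 7.846154 0.923077 2.923077 5.846154 0.000000 2.923077 11.000000 8.769231 2.076923 1.153846
17.670053 2.882307 2.882307 9.038380 7.603137 1.441153 3.012793 6.025586 0.000000 3.012793 0.000000 9.038380 2.841993 1.993579
"""

summary = parse_base_sum(SAMPLE)
```

Because the summary goes to stderr, a caller would need to capture that stream (for example by redirecting it to a file) before passing the text to the parser.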
    
[[Category:BamUtil|stats]] [[Category:BAM_Software]] [[Category:Software]]
 
[[Category:BamUtil|stats]] [[Category:BAM_Software]] [[Category:Software]]
