Changes

From Genome Analysis Wiki
3,195 bytes added ,  15:59, 24 August 2017
Line 3: Line 3:  
= Overview of the <code>stats</code> function of <code>bamUtil</code> =
The <code>stats</code> option on the [[BamUtil]] executable generates the specified statistics on a SAM/BAM file.

== Troubleshooting ==
See [[BamUtil:_FAQ#BamUtil:_stats|BamUtil: FAQ -> BamUtil: stats]] for troubleshooting help.
    
= Usage =
<pre>
./bam stats --in <inputFile> [--basic] [--qual] [--phred] [--pBaseQC <outputFileName>] [--cBaseQC <outputFileName>] [--maxNumReads <maxNum>] [--unmapped] [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--requiredFlags <integerRequiredFlags>] [--excludeFlags <integerExcludeFlags>] [--noeof] [--params] [--withinRegion] [--baseSum] [--bufferSize <buffSize>] [--minMapQual <minMapQ>] [--dbsnp <dbsnpFile>]
</pre>
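As a rough illustration of how the options in the usage line combine, the following Python sketch assembles an argument list for <code>./bam stats</code>. The function <code>build_stats_command</code> and its keyword arguments are hypothetical names made up for this example and are not part of bamUtil; only the command-line flags come from the usage text. The sketch covers a subset of the flags and rejects a request for both <code>--pBaseQC</code> and <code>--cBaseQC</code>, which this page says cannot both be specified.

```python
from typing import List, Optional


def build_stats_command(
    in_file: str,
    basic: bool = False,
    qual: bool = False,
    phred: bool = False,
    p_base_qc: Optional[str] = None,
    c_base_qc: Optional[str] = None,
    max_num_reads: Optional[int] = None,
    bam_index: Optional[str] = None,
    region_list: Optional[str] = None,
    executable: str = "./bam",
) -> List[str]:
    """Return an argv list for `bam stats` (hypothetical helper, not part of bamUtil)."""
    if p_base_qc is not None and c_base_qc is not None:
        raise ValueError("--pBaseQC and --cBaseQC cannot both be specified")

    argv = [executable, "stats", "--in", in_file]
    # Boolean switches are appended only when requested.
    if basic:
        argv.append("--basic")
    if qual:
        argv.append("--qual")
    if phred:
        argv.append("--phred")
    # Options that take a value are appended as flag/value pairs.
    if p_base_qc is not None:
        argv += ["--pBaseQC", p_base_qc]
    if c_base_qc is not None:
        argv += ["--cBaseQC", c_base_qc]
    if max_num_reads is not None:
        argv += ["--maxNumReads", str(max_num_reads)]
    if bam_index is not None:
        argv += ["--bamIndex", bam_index]
    if region_list is not None:
        argv += ["--regionList", region_list]
    return argv


if __name__ == "__main__":
    # File names here are placeholders.
    print(" ".join(build_stats_command("sample.bam", basic=True, phred=True)))
```

Building a list instead of a single string avoids shell quoting problems when file names contain spaces.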
    
= Parameters =
<pre>
        Required Parameters:
                --in : the SAM/BAM file to calculate stats for
        Types of Statistics that can be generated:
                --basic         : Turn on basic statistic generation
                --qual          : Generate a count for each quality (displayed as non-phred quality)
                --phred         : Generate a count for each quality (displayed as phred quality)
                --pBaseQC       : Write per base statistics as Percentages to the specified file. (use - for stdout)
                                  pBaseQC & cBaseQC cannot both be specified.
                --cBaseQC       : Write per base statistics as Counts to the specified file. (use - for stdout)
                                  pBaseQC & cBaseQC cannot both be specified.
        Optional Parameters:
                --maxNumReads   : Maximum number of reads to process
                                  Defaults to -1 to indicate all reads.
                --unmapped      : Only process unmapped reads (requires a bamIndex file)
                --bamIndex      : The path/name of the bam index file
                                  (if required and not specified, uses the --in value + ".bai")
                --regionList    : File containing the regions to be processed chr<tab>start_pos<tab>end_pos.
                                  Positions are 0 based and the end_pos is not included in the region.
                                  Uses bamIndex.
                --excludeFlags  : Skip any records with any of the specified flags set
                                  (specify an integer representation of the flags)
                --requiredFlags : Only process records with all of the specified flags set
                                  (specify an integer representation of the flags)
                --noeof         : Do not expect an EOF block on a bam file.
                --params        : Print the parameter settings.
        Optional phred/qual Only Parameters:
                --withinRegion  : Only count qualities if they fall within regions specified.
                                  Only applicable if regionList is also specified.
        Optional BaseQC Only Parameters:
                --baseSum       : Print an overall summary of the baseQC for the file to stderr.
                --bufferSize    : Size of the pileup buffer for calculating the BaseQC parameters.
                                  Default: 1024
                --minMapQual    : The minimum mapping quality for filtering reads in the baseQC stats.
                --dbsnp         : The dbSnp file of positions to exclude from baseQC analysis.
</pre>
{{PhoneHomeParamDesc}}

== Required Parameters ==
{{inBAMInputFile}}
== Optional Parameters ==
=== Maximum number of reads to process (<code>--maxNumReads</code>) ===
Use <code>--maxNumReads</code> followed by a number to indicate the maximum number of reads to process before exiting. By default, it is set to -1, which means all reads are processed.

=== Only Process Unmapped Reads (<code>--unmapped</code>) ===
Use <code>--unmapped</code> to process only unmapped reads.

This parameter requires [[#Bam Index File (--bamIndex)|<code>--bamIndex</code>]].

{{BamIndex}}

=== Only Process Certain Regions (<code>--regionList</code>) ===
Use <code>--regionList</code> followed by the filename to process only the regions specified in the file.

The regions in the file are specified one per line with the following format: <nowiki>chr<tab>start_pos<tab>end_pos</nowiki>.

Positions are 0 based and the end_pos is not included in the region.

This parameter requires [[#Bam Index File (--bamIndex)|<code>--bamIndex</code>]].
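The 0-based, end-exclusive convention is easy to get wrong when coordinates come from a 1-based source. The Python sketch below converts a 1-based inclusive interval to the region list form, renders and parses the tab-separated lines, and tests whether a position falls inside a region. All function names are hypothetical and belong to this example only; skipping blank lines is a choice made by the sketch and is not a documented bamUtil behavior.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, 0-based start, exclusive end)


def from_one_based_inclusive(chrom: str, first: int, last: int) -> Region:
    """Convert a 1-based, inclusive interval to the 0-based, end-exclusive form."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return (chrom, first - 1, last)


def format_region_list(regions: Iterable[Region]) -> str:
    """Render regions one per line as chr<tab>start_pos<tab>end_pos."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions)


def parse_region_list(text: str) -> List[Region]:
    """Parse tab-separated region lines, skipping blank lines."""
    regions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def contains(region: Region, chrom: str, pos0: int) -> bool:
    """True if the 0-based position lies in the region (end_pos is excluded)."""
    region_chrom, start, end = region
    return chrom == region_chrom and start <= pos0 < end
```

With these definitions, bases 101 through 200 of chromosome 1 (counted from 1) become the line <code>1<tab>100<tab>200</code>, and 0-based position 200 is outside the region.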

=== Exclude Flags (<code>--excludeFlags</code>) ===
Use <code>--excludeFlags</code> followed by an integer representation of the flags to skip any records that have any of the specified flags set.

=== Required Flags (<code>--requiredFlags</code>) ===
Use <code>--requiredFlags</code> followed by an integer representation of the flags to only process records with all of the specified flags set.
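To make the "integer representation of the flags" concrete, here is a small Python sketch. The bit values are the FLAG bits defined in the SAM format specification; the dictionary keys and the functions <code>flags_to_int</code> and <code>passes_filters</code> are hypothetical names for this example. The filter function only mirrors the behavior described on this page (all required flags set, none of the excluded flags set) and is not bamUtil's actual code.

```python
# FLAG bit values as defined in the SAM format specification.
SAM_FLAGS = {
    "paired": 0x1,
    "proper_pair": 0x2,
    "unmapped": 0x4,
    "mate_unmapped": 0x8,
    "reverse": 0x10,
    "mate_reverse": 0x20,
    "first_in_pair": 0x40,
    "second_in_pair": 0x80,
    "secondary": 0x100,
    "qc_fail": 0x200,
    "duplicate": 0x400,
    "supplementary": 0x800,
}


def flags_to_int(*names: str) -> int:
    """Combine named flags into the single integer the options expect."""
    value = 0
    for name in names:
        value |= SAM_FLAGS[name]
    return value


def passes_filters(record_flag: int, required_flags: int = 0, exclude_flags: int = 0) -> bool:
    """Keep a record only if every required bit is set and no excluded bit is set."""
    has_all_required = (record_flag & required_flags) == required_flags
    has_any_excluded = (record_flag & exclude_flags) != 0
    return has_all_required and not has_any_excluded
```

Under these assumptions, <code>flags_to_int("qc_fail", "duplicate")</code> gives 1536, the value to pass to <code>--excludeFlags</code> to skip QC-failed and duplicate records.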
    
== Types of Statistics ==
Line 76: Line 104:  
=== Qual/Phred (<code>--phred</code> and <code>--qual</code>) ===
Prints to stderr a count of the number of times each quality value appears in the file.

*<code>phred</code> Displays Quality as phred integers [0-93]
Line 87: Line 115:  
To only include records that overlap a set of regions, use <code>--regionList</code> and specify a bed file with the regions. If a read overlaps the region, all qualities will be counted even if those bases do not fall in the region. If you only want to count qualities that fall within the region, also specify <code>--withinRegion</code>. Without excluding unmapped reads, it will include soft clips that overlap the region.

==== Optional Phred/Qual Only Parameters ====
===== Within Region (<code>--withinRegion</code>) =====
Use <code>--withinRegion</code> with the [[#Qual/Phred (--phred and --qual)|<code>--phred</code> or <code>--qual</code>]] options to count only the qualities that fall within the regions specified using [[#Only Process Certain Regions (--regionList)|<code>--regionList</code>]]. This option only applies if <code>--regionList</code> is also specified.
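The difference between the default region behavior and <code>--withinRegion</code> can be shown with a toy model. The Python sketch below is not bamUtil's implementation: it assumes a read aligned without gaps or clipping, 0-based positions, an end-exclusive region, and a hypothetical function name <code>count_qualities</code>.

```python
from collections import Counter
from typing import Tuple


def count_qualities(
    read_start: int,
    qualities: str,
    region: Tuple[int, int],
    within_region: bool = False,
) -> Counter:
    """Count quality characters for one read against one region.

    Simplifying assumptions (not bamUtil's actual logic): the read is aligned
    without gaps or clips, positions are 0-based, and the region end is exclusive.
    """
    region_start, region_end = region
    read_end = read_start + len(qualities)  # exclusive end of the read
    counts: Counter = Counter()
    if read_start >= region_end or read_end <= region_start:
        return counts  # the read does not overlap the region at all
    for offset, qual in enumerate(qualities):
        pos = read_start + offset
        if within_region and not (region_start <= pos < region_end):
            continue  # base lies outside the region, so it is not counted
        counts[qual] += 1
    return counts
```

For a five-base read starting at position 98 and a region covering positions 100 and 101, the default mode counts all five qualities because the read overlaps the region, while <code>within_region=True</code> counts only the two qualities at positions 100 and 101.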
    
=== BaseQC (<code>--pBaseQC</code> and <code>--cBaseQC</code> and <code>--baseSum</code>) ===
The <code>pBaseQC</code> and <code>cBaseQC</code> options generate per base statistics. Only one of these two options can be specified. They write statistics generated for each position to the file specified after the option (use <code>-</code> to write to STDOUT). They use the same logic for calculating statistics, but <code>pBaseQC</code> writes the statistics as percentages, and <code>cBaseQC</code> writes them as counts. The order of the statistics is also different.

The <code>baseSum</code> option can be used with either <code>pBaseQC</code> or <code>cBaseQC</code> or on its own. <code>baseSum</code> generates a summary of the per position statistics and writes it to stderr. It calculates the per position base statistics even if they will not be written anywhere (neither <code>pBaseQC</code> nor <code>cBaseQC</code> is specified).
Line 258: Line 288:  
17.670053 2.882307 2.882307 9.038380 7.603137 1.441153 3.012793 6.025586 0.000000 3.012793 0.000000 9.038380 2.841993 1.993579
</pre>

==== Optional BaseQC Only Parameters ====
===== Pileup Buffer Size (<code>--bufferSize</code>) =====
Use the <code>--bufferSize</code> option followed by the size of the pileup buffer to use for [[#BaseQC (--pBaseQC and --cBaseQC and --baseSum)|baseQC]] stats. The default is 1024.

===== Minimum Mapping Quality (<code>--minMapQual</code>) =====
Use the <code>--minMapQual</code> option followed by the minimum mapping quality for filtering reads in the [[#BaseQC (--pBaseQC and --cBaseQC and --baseSum)|baseQC]] stats.

===== DBSNP File (<code>--dbsnp</code>) =====
Use the <code>--dbsnp</code> option followed by the name of the dbsnp file to specify the positions to exclude from [[#BaseQC (--pBaseQC and --cBaseQC and --baseSum)|baseQC]] analysis.

{{PhoneHomeParameters}}

= Return Value =
0 on success, non-0 on failure.
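
A calling script can rely on this convention to stop when the command fails. The following Python sketch wraps the standard library's <code>subprocess.run</code>; the helper name <code>run_and_check</code> is hypothetical, and the commented invocation uses placeholder paths.

```python
import subprocess
from typing import List


def run_and_check(argv: List[str]) -> subprocess.CompletedProcess:
    """Run a command and raise RuntimeError if it exits with a non-zero status."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with status {result.returncode}: {result.stderr.strip()}"
        )
    return result


# Example call (placeholder paths; requires a built bam executable):
# completed = run_and_check(["./bam", "stats", "--in", "sample.bam", "--basic"])
```

Because several statistics are written to stderr, the sketch captures both output streams and leaves it to the caller to decide which one to read.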

[[Category:BamUtil|stats]] [[Category:BAM_Software]] [[Category:Software]]
