Changes

From Genome Analysis Wiki
2,136 bytes added ,  16:53, 29 August 2011
no edit summary
Line 3: Line 3:  
[[Category:BAM Software]]
 
[[Category:BAM Software]]
   −
= bamUtil =
+
= bamUtil Overview =
    
bamUtil is a repository that contains several programs that perform operations on SAM/BAM files.  All of these programs are built into a single executable, <code>bam</code>.
Line 52: Line 52:       −
== Programs ==
+
= Programs =
    
The software reads the beginning of an input file to determine if it is SAM/BAM.  To determine the format (SAM/BAM) of the output file, the software checks the output file's extension.  If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file.
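The format rules above can be illustrated with a minimal Python sketch. It is not bamUtil's code: the function names are hypothetical, the extension match is assumed to be exact and case-sensitive, and the input check assumes that testing for the gzip magic bytes (which begin every BGZF-compressed BAM file) is an acceptable stand-in for whatever the software actually inspects. A gzip-compressed SAM file would also pass that check.

```python
GZIP_MAGIC = b"\x1f\x8b"  # BAM is BGZF-compressed; BGZF blocks start with the gzip magic


def output_format(path: str) -> str:
    """Pick the output format from the file extension.

    Assumption: only an exact, case-sensitive ".bam" suffix selects BAM;
    every other name falls back to SAM, as the text describes.
    """
    return "BAM" if path.endswith(".bam") else "SAM"


def sniff_input_format(first_bytes: bytes) -> str:
    """Guess the input format from the first bytes of the file.

    Assumption: a gzip magic number means BAM, anything else means SAM.
    """
    return "BAM" if first_bytes.startswith(GZIP_MAGIC) else "SAM"


if __name__ == "__main__":
    for name in ("out.bam", "out.sam", "out.txt"):
        print(name, "->", output_format(name))
```

Under these assumptions, any output name that does not end in ".bam", including one with no extension at all, is written as SAM.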
Line 473: Line 473:     
</pre>
 
</pre>
 +
 +
== stats ==
 +
The <code>stats</code> option of the <code>bam</code> executable generates the specified statistics for a SAM/BAM file.
 +
 +
=== Parameters ===
 +
 +
=== Notes ===
 +
==== BaseQC ====
 +
'''This capability is not yet complete, so these notes may change before it is finished.'''
 +
 +
Open question: should stats be printed for positions where the reference base is 'N'? (Do those positions need any special note? Qplot would not count them in the depth.)
 +
 +
The <code>baseQC</code> option generates the following statistics:
 +
 +
For each position, the following counts are incremented for each read that meets these criteria:
 +
# the read spans the reference position (starts at or before this position and ends at or after it)
 +
# any duplicate, QC-failure, or unmapped status and any mapping quality is accepted
 +
# any CIGAR operation at this position is accepted, including deletions and skips; clips at the beginning/end of the read are not counted as part of the read's span
 +
*TotalReads(e6) - # of reads that span this position.
 +
*DupRate(%) - # of reads marked duplicate in the flag / TotalReads
 +
*QCFailRate(%) - # of reads marked QC failure in the flag / TotalReads
 +
*PairedReads(%) - # of reads marked paired in the flag / TotalReads
 +
*ProperPaired(%) - # of reads marked paired AND proper paired in the flag / TotalReads
 +
*MappedBases(e9) - # of reads marked mapped in the flag
 +
*MappingRate(%) - # of reads marked mapped in the flag / TotalReads
 +
*ZeroMapQual(%) - # of reads marked mapped in the flag AND with a Mapping Quality of 0 / TotalReads
 +
*MapQual<10(%) - # of reads marked mapped in the flag AND with a Mapping Quality < 10 / TotalReads
 +
*MapRate_MQpass(%) - # of reads marked mapped in the flag AND with a Mapping Quality >= the minimum Mapping Quality / TotalReads
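The per-position counts listed above could be computed along the following lines. This is a hedged sketch, not the bamUtil implementation: the <code>Read</code> record, the <code>position_stats</code> function, and the default minimum mapping quality of 20 are all hypothetical. The flag bits are the ones defined by the SAM specification. Raw counts are returned without the e6/e9 scaling implied by the column names, and the mapped figure is reported as a read count, following the description of MappedBases above.

```python
from dataclasses import dataclass

# Flag bits defined by the SAM specification.
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400


@dataclass
class Read:
    """Hypothetical minimal read record used only for this illustration."""
    start: int  # first reference position covered, clips excluded
    end: int    # last reference position covered, inclusive
    flag: int   # SAM flag
    mapq: int   # mapping quality


def position_stats(reads, ref_pos, min_mapq=20):
    """Compute the first group of per-position statistics as percentages.

    Assumption: min_mapq=20 is an arbitrary illustrative threshold.
    """
    # A read spans the position if it starts at or before it and ends at or after it.
    spanning = [r for r in reads if r.start <= ref_pos <= r.end]
    total = len(spanning)
    if total == 0:
        return {"TotalReads": 0}

    def pct(count):
        return 100.0 * count / total

    mapped = [r for r in spanning if not r.flag & FLAG_UNMAPPED]
    return {
        "TotalReads": total,
        "DupRate": pct(sum(1 for r in spanning if r.flag & FLAG_DUPLICATE)),
        "QCFailRate": pct(sum(1 for r in spanning if r.flag & FLAG_QC_FAIL)),
        "PairedReads": pct(sum(1 for r in spanning if r.flag & FLAG_PAIRED)),
        "ProperPaired": pct(sum(
            1 for r in spanning
            if r.flag & FLAG_PAIRED and r.flag & FLAG_PROPER_PAIR)),
        "MappedReads": len(mapped),
        "MappingRate": pct(len(mapped)),
        "ZeroMapQual": pct(sum(1 for r in mapped if r.mapq == 0)),
        "MapQualBelow10": pct(sum(1 for r in mapped if r.mapq < 10)),
        "MapRate_MQpass": pct(sum(1 for r in mapped if r.mapq >= min_mapq)),
    }


if __name__ == "__main__":
    example = [
        Read(1, 10, FLAG_PAIRED | FLAG_PROPER_PAIR, 30),
        Read(5, 15, FLAG_DUPLICATE, 0),
        Read(8, 20, FLAG_UNMAPPED, 0),
        Read(30, 40, 0, 60),  # does not span position 9
    ]
    for name, value in position_stats(example, 9).items():
        print(name, value)
```

Every rate uses TotalReads as its denominator, so duplicates, QC failures, and unmapped reads still contribute to the total even though they are excluded from the second group of counts.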
 +
 +
 +
For each position, the following counts are incremented for each read that meets these criteria:
 +
# the read spans the reference position (starts at or before this position and ends at or after it)
 +
# the read is NOT a duplicate, a QC failure, unmapped, or mapped with a mapping quality below the minimum
 +
# the CIGAR operation at this position is M, =, or X (match/mismatch)
 +
TBD - should a read be counted if its base at this position is 'N'?
 +
*Depth - # of reads. 
 +
*Q20Bases(e9) - TBD
 +
*Q20BasesPct(%) - TBD
 +
*EPS_MSE - TBD
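As an illustration of the Depth criteria above, the following sketch walks a CIGAR string to find the operation covering a reference position and counts only reads that pass the filters. The function names, the tuple layout <code>(pos, cigar, flag, mapq)</code>, and the minimum mapping quality of 20 are assumptions made for this example, not bamUtil's actual interface. The handling of reads whose base is 'N' is still TBD above and is not modelled here.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference (SAM spec)
MATCH_OPS = ("M", "=", "X")

# Unmapped (0x4), QC failure (0x200), and duplicate (0x400) flag bits from the SAM spec.
EXCLUDED_FLAGS = 0x4 | 0x200 | 0x400


def op_at(pos, cigar, ref_pos):
    """Return the CIGAR operation covering ref_pos, or None if the read does not reach it.

    pos is the leftmost aligned reference position of the read. Clips and
    insertions do not consume reference bases, so they never cover a position.
    """
    cur = pos
    for length, op in CIGAR_RE.findall(cigar):
        if op in REF_CONSUMING:
            n = int(length)
            if cur <= ref_pos < cur + n:
                return op
            cur += n
    return None


def depth(reads, ref_pos, min_mapq=20):
    """Count reads that pass the filters and have M/=/X at ref_pos.

    Assumption: min_mapq=20 is an arbitrary illustrative threshold.
    """
    count = 0
    for pos, cigar, flag, mapq in reads:
        if (flag & EXCLUDED_FLAGS) or mapq < min_mapq:
            continue  # duplicate, QC failure, unmapped, or low mapping quality
        if op_at(pos, cigar, ref_pos) in MATCH_OPS:
            count += 1
    return count


if __name__ == "__main__":
    example = [
        (100, "5S10M2D10M", 0, 30),
        (100, "22M", 0x400, 30),  # duplicate, excluded
        (105, "10M", 0, 5),       # mapping quality below the minimum, excluded
        (108, "5M", 0, 60),
    ]
    print(depth(example, 109), depth(example, 110))
```

In the example, the first read covers position 110 with a deletion, so it contributes to the first group's TotalReads at that position but not to Depth.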
