From Genome Analysis Wiki
2,136 bytes added
, 16:53, 29 August 2011
Line 3: |
Line 3: |
| [[Category:BAM Software]] | | [[Category:BAM Software]] |
| | | |
− | = bamUtil = | + | = bamUtil Overview = |
| | | |
| bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, <code>bam</code>. | | bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, <code>bam</code>. |
Line 52: |
Line 52: |
| | | |
| | | |
− | == Programs == | + | = Programs = |
| | | |
| The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file. | | The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file. |
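The output-format rule described above can be sketched in a few lines of Python. This is a minimal illustration, not bamUtil's actual code: the function name <code>output_format</code> is hypothetical, and the sketch assumes the extension check is an exact, case-sensitive match on ".bam".

```python
def output_format(path: str) -> str:
    """Return "BAM" when the output path ends in ".bam", otherwise "SAM".

    Assumption (not confirmed by the source): the comparison is an exact,
    case-sensitive suffix match.
    """
    return "BAM" if path.endswith(".bam") else "SAM"


if __name__ == "__main__":
    for name in ("out.bam", "out.sam", "out.txt"):
        print(name, "->", output_format(name))
```

Under this rule, any output name that does not end in ".bam" falls through to SAM.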
Line 473: |
Line 473: |
| | | |
| </pre> | | </pre> |
| + | |
| + | == stats == |
| + | The <code>stats</code> option on the bam executable generates the specified statistics on a SAM/BAM file. |
| + | |
| + | === Parameters === |
| + | |
| + | === Notes === |
| + | ==== BaseQC ==== |
| + | '''This capability is coming soon, so these notes may be updated prior to it being completed...''' |
| + | |
| + | Open question: should stats be printed for positions where the reference base is 'N'? (Is any special note needed for those? Qplot would not count them in the depth.) |
| + | |
| + | The <code>baseQC</code> option generates the following statistics: |
| + | |
| + | For each position, the following counts are incremented for every read that spans the position, under these rules: |
| + | # the read spans the reference position (it starts at or before this position and ends at or after it) |
| + | # the read is counted regardless of its duplicate, QC failure, unmapped, or mapping quality status |
| + | # the read is counted regardless of the CIGAR operation at this position (deletions and skips are counted; clips at the beginning/end of the read are not) |
| + | *TotalReads(e6) - # of reads that span this position. |
| + | *DupRate(%) - # of reads marked duplicate in the flag / TotalReads |
| + | *QCFailRate(%) - # of reads marked QC failure in the flag / TotalReads |
| + | *PairedReads(%) - # of reads marked paired in the flag / TotalReads |
| + | *ProperPaired(%) - # of reads marked paired AND proper paired in the flag / TotalReads |
| + | *MappedBases(e9) - # of reads marked mapped in the flag |
| + | *MappingRate(%) - # of reads marked mapped in the flag / TotalReads |
| + | *ZeroMapQual(%) - # of reads marked mapped in the flag AND have a Mapping Quality of 0 / TotalReads |
| + | *MapQual<10(%) - # of reads marked mapped in the flag AND have a Mapping Quality < 10 / TotalReads |
| + | *MapRate_MQpass(%) - # of reads marked mapped in the flag AND have a Mapping Quality >= a minimum Mapping Quality / TotalReads |
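The flag-based statistics listed above can be sketched as follows. This is an illustrative sketch under stated assumptions, not bamUtil's implementation: the <code>Read</code> class and <code>position_stats</code> function are hypothetical, <code>start</code>/<code>end</code> are assumed to be inclusive reference coordinates with clips already excluded, the minimum mapping quality is passed in by the caller, and the e6/e9 scaling suggested by the column names is not applied. The flag bit values come from the SAM specification.

```python
from dataclasses import dataclass

# Flag bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
QC_FAIL = 0x200
DUPLICATE = 0x400


@dataclass
class Read:
    """Hypothetical minimal read record used only for this sketch."""
    start: int  # first reference position covered (clips excluded)
    end: int    # last reference position covered, inclusive
    flag: int   # SAM flag
    mapq: int   # mapping quality


def position_stats(reads, pos, min_mapq):
    """Compute the flag-based statistics for one reference position."""
    # Every read that spans the position counts, whatever its flags or CIGAR.
    spanning = [r for r in reads if r.start <= pos <= r.end]
    total = len(spanning)
    if total == 0:
        return {"TotalReads": 0}

    def count(pred):
        return sum(1 for r in spanning if pred(r))

    def pct(pred):
        return 100.0 * count(pred) / total

    def mapped(r):
        return not (r.flag & UNMAPPED)

    return {
        "TotalReads": total,
        "DupRate": pct(lambda r: r.flag & DUPLICATE),
        "QCFailRate": pct(lambda r: r.flag & QC_FAIL),
        "PairedReads": pct(lambda r: r.flag & PAIRED),
        "ProperPaired": pct(lambda r: r.flag & PAIRED and r.flag & PROPER_PAIR),
        # Computed as described in the list: reads marked mapped in the flag.
        "MappedBases": count(mapped),
        "MappingRate": pct(mapped),
        "ZeroMapQual": pct(lambda r: mapped(r) and r.mapq == 0),
        "MapQual<10": pct(lambda r: mapped(r) and r.mapq < 10),
        "MapRate_MQpass": pct(lambda r: mapped(r) and r.mapq >= min_mapq),
    }
```

Because every rate shares the same denominator (TotalReads at that position), the percentages for a position are directly comparable with each other.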
| + | |
| + | |
| + | For each position, the following counts are incremented if: |
| + | # a read spans the reference position (starts before or at this reference position and ends at or after this position) |
| + | # the read is NOT a duplicate, QC failure, unmapped, or mapped with a mapping quality less than the minimum mapping quality |
| + | # the CIGAR operation for this position is M, =, or X (match/mismatch) |
| + | TBD - should it count if the read has a base of 'N'? |
| + | *Depth - # of reads. |
| + | *Q20Bases(e9) - TBD |
| + | *Q20BasesPct(%) - TBD |
| + | *EPS_MSE - TBD |
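The Depth rule above (filtered reads, counted only where the CIGAR operation at the position is M, =, or X) can be sketched as follows. This is a hedged illustration, not bamUtil's code: <code>cigar_op_at</code> and <code>depth_at</code> are hypothetical names, reads are represented as plain <code>(start, cigar, flag, mapq)</code> tuples, <code>start</code> is assumed to be the leftmost aligned reference position, and the open question about read bases of 'N' is not handled. Which operations consume the reference (M, D, N, =, X) follows the SAM specification.

```python
import re

# Flag bits as defined in the SAM specification.
UNMAPPED = 0x4
QC_FAIL = 0x200
DUPLICATE = 0x400

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference
MATCH_OPS = set("M=X")        # operations that count toward Depth


def cigar_op_at(start, cigar, pos):
    """Return the CIGAR operation covering reference position pos, or None."""
    ref = start
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in REF_CONSUMING:
            if ref <= pos < ref + length:
                return op
            ref += length
    return None


def depth_at(reads, pos, min_mapq):
    """Count reads that pass the filters and match/mismatch at pos."""
    depth = 0
    for start, cigar, flag, mapq in reads:
        # Skip duplicates, QC failures, and unmapped reads.
        if flag & (UNMAPPED | QC_FAIL | DUPLICATE):
            continue
        # Skip reads below the minimum mapping quality.
        if mapq < min_mapq:
            continue
        if cigar_op_at(start, cigar, pos) in MATCH_OPS:
            depth += 1
    return depth
```

Unlike TotalReads, a read whose CIGAR has a deletion or skip at the position contributes nothing to Depth here, which is why the two counts can differ for the same position.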