Changes

From Genome Analysis Wiki

BamUtil

3,625 bytes added, 17:14, 11 September 2021
== Getting Help ==
If you have questions or want to recommend improvements to bamUtil, please raise an issue on the [https://github.com/statgen/bamUtil bamUtil GitHub page]. Alternatively, you can e-mail me, Mary Kate Wing, at mktrost@umich.edu.
See [[BamUtil: FAQ]] to see if your question has already been answered.
== Where to Find It ==
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all.
 
For version 1.0.14 and later, please download libStatGen and bamUtil separately:
'''Version 1.0.14 - Released 7/8/2015'''
*[[LibStatGen Download#Official Releases|libStatGen version 1.0.14]]
*[[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.14]]
'''Older Releases'''
* [[Media:BamUtilLibStatGen.1.0.13.tgz|BamUtilLibStatGen.1.0.13.tgz]] - Released 2/20/2015
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.13]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.13]] - see link for version updates
* [[Media:BamUtilLibStatGen.1.0.12.tar.gz|BamUtilLibStatGen.1.0.12.tgz]] - Released 5/14/2014
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.12]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.12]] - see link for version updates
** Adds regions to [[BamUtil: mergeBam|mergeBam]]
** Accept ',' delimiters for the tags string input in [[BamUtil: squeeze|squeeze]], [[BamUtil: revert|revert]], & [[BamUtil: diff|diff]]
* [[Media:BamUtilLibStatGen.1.0.11.tar.gz|BamUtilLibStatGen.1.0.11.tar.gz]] - Released 2/28/2014
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.11]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.11]] - see link for version updates
** Now properly supports 'B' & 'f' tags
** Cleanup - compile issues
* [[Media:BamUtilLibStatGen.1.0.10.tar.gz|BamUtilLibStatGen.1.0.10.tar.gz]] - Released 1/2/2014
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.10]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.10]] - see link for version updates
** Adds PhoneHome/Version checking.
* [[Media:BamUtilLibStatGen.1.0.9.tgz|BamUtilLibStatGen.1.0.9.tgz]] - Released 7/7/2013
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.9]]
** Update to [[BamUtil: mergeBam|mergeBam]]
*** Update to ignore PG lines with duplicate IDs
*** Update to accept merges of matching RG lines
*** Update to log to stderr if no log/out file is specified
* There is no version 1.0.8. It was skipped to stay in line with libStatGen versions (libStatGen 1.0.8 added vcf support)
*[[Media:BamUtilLibStatGen.1.0.7.tgz|BamUtilLibStatGen.1.0.7.tgz‎]] - Released 1/29/2013
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]]
**Initial release of bamUtil that includes libStatGen version 1.0.0. It started from the tool found in the deprecated StatGen repository.
**Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.0]], [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]]
 
=== Release of just BamUtil (does not include libStatGen) ===
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all.
'''BamUtil.1.0.14 Release Notes'''
* BamUtil Version 1.0.14 - Released 7/8/2015
** https://github.com/statgen/bamUtil/archive/v1.0.14.tar.gz
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.14]]
** Update [[BamUtil: trimBam|trimBam]]
*** Add option to soft clip (-c) instead of trimming
** Update [[BamUtil: clipOverlap|clipOverlap]]
*** Add option to mark reads as unmapped if they are entirely clipped
** Update to [[BamUtil: bam2FastQ|bam2FastQ]]
*** Add option to gzip the output files
*** Add option to split Read Groups into separate fastq files
*** Add option to get the quality from a tag
** Update [[BamUtil: recab|recab]]
*** Update to ignore ref 'N' when building the recalibration table
*** Add ability to bin
** Add Dedup_LowMem tool

'''Older Releases'''
* BamUtil Version 1.0.13 - Released 2/20/2015
** https://github.com/statgen/bamUtil/archive/v1.0.13.tar.gz
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.13]]
** Makefile Updates
*** Improve logic to determine actual path for the library
*** Update to append to USER_COMPILE_VARS even if specified on the command line
** Update [[BamUtil: writeRegion|writeRegion]]
*** Add option to specify readnames to keep in a file
*** Fixed bug that if a read overlapped 2 BED positions, it was printed twice
** Update to [[BamUtil: bam2FastQ|bam2FastQ]]
*** Update to skip non-primary reads
** Update to [[BamUtil: polishBam|polishBam]]
*** Update to handle '\t' string inputs and to add CO option
*** Fix MD5sum calculation to convert fasta to uppercase prior to calculating
* [[Media:BamUtil.1.0.12.tgz|BamUtil.1.0.12.tgz]] - Released 5/14/2014
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.12]]
** Update [[BamUtil: mergeBam|mergeBam]]
*** Add a regions option
** Update to [[BamUtil: squeeze|squeeze]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]]
*** Also accept ',' instead of just ';' as the delimiter in the input tags string.
* [[Media:BamUtil.1.0.11.tgz|BamUtil.1.0.11.tgz]] - Released 2/28/2014
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.11]]
*** Adds support for 'B' & 'f' tags that did not work properly before.
** Update [[BamUtil: splitBam|splitBam]] & [[BamUtil: polishBam|polishBam]]
*** Update to work properly if log & output file are not specified (no longer creates '.log')
** Update Main dummy/example tool to indicate the correct tool
** Update to [[BamUtil: bam2FastQ|bam2FastQ]], [[BamUtil: clipOverlap|clipOverlap]], [[BamUtil: filter|filter]], [[BamUtil: mergeBam|mergeBam]], [[BamUtil: splitBam|splitBam]], [[BamUtil: squeeze|squeeze]], [[BamUtil: stats|stats]]
*** Cleanup usage/parameter descriptions
** Update to [[BamUtil: revert|revert]]
*** Update compatibility with libStatGen due to 'B' & 'f' tag handling updates
** Add tests for 'B' & 'f' tags
* [[Media:BamUtil.1.0.10.tar.gz|BamUtil.1.0.10.tar.gz‎]] - Released 1/2/2014
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.10]]
** All
*** Add PhoneHome/version checking
*** Make sub-program names case independent
*** Fix Logger.cpp compiler warning
** Adds: [[BamUtil: explainFlags|explainFlags]] - describes the SAM/BAM flags based on the flag value
** Update to [[BamUtil: stats|stats]]
*** Fix stats to not try to process a record after it is out of the loop (it would already have been processed or is invalid)
** Update to [[BamUtil: splitBam|splitBam]]
*** fix description of --noeof option
** Update to [[BamUtil: writeRegion|writeRegion]]
*** add exclude/required flags
** Update to [[BamUtil: dedup|dedup]] & [[BamUtil: recab|recab]]
*** Ignore secondary reads for dedup and making the recalibration table.
*** skip QC Failures
*** add excludeFlags parameters
** Update to [[BamUtil: clipOverlap|clipOverlap]]
*** add exclude flags
*** fix bug for readName sorted when a read is filtered due to flags
*** add sorting validation
** Update to [[BamUtil: bam2FastQ|bam2FastQ]]
*** add --merge option to generate interleaved files.
*** update to open the input file before opening the output files, so if there is an error, the outputs aren't opened
** Update to [[BamUtil: mergeBam|mergeBam]]
*** add option to ignore the RG PI field when checking headers
*** add more informative header merge error messages
 
* [[Media:BamUtil.1.0.9.tgz|BamUtil.1.0.9.tgz‎]] - Released 7/7/2013
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]] (version 1.0.7 should also work)
** Update to [[BamUtil: mergeBam|mergeBam]]
*** Update to ignore PG lines with duplicate IDs
*** Update to accept merges of matching RG lines
*** Update to log to stderr if no log/out file is specified
*[[Media:BamUtil.1.0.7.tgz|BamUtil.1.0.7.tgz‎]] - Released 1/29/2013
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]] or above
**Initial release of just bamUtil. It started from the tool found in the deprecated StatGen repository.
**Contains: [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]]
 
== Citation ==
If you use BamUtil, please cite our publication on GotCloud which includes BamUtil:
[http://genome.cshlp.org/content/early/2015/04/14/gr.176552.114.abstract Jun, Goo, et al. "An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data." Genome research (2015): gr-176552.]
 
= Programs =
The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file.
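The format rules above can be sketched as follows. This is a minimal illustration in Python, not bamUtil's actual C++ implementation. The function names are hypothetical. The input check assumes that a file starting with the gzip magic bytes (0x1f 0x8b, which every BGZF-compressed BAM file begins with) is BAM and that anything else is SAM. The extension check is assumed to be case-sensitive.

```python
import gzip
import os
import tempfile

# BAM is BGZF-compressed, and every BGZF block starts with the gzip magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def sniff_input_format(path):
    """Guess SAM vs BAM by reading the first two bytes of the input file.

    Assumption for this sketch: gzip magic means BAM, anything else means SAM.
    """
    with open(path, "rb") as handle:
        return "BAM" if handle.read(2) == GZIP_MAGIC else "SAM"


def output_format_for(path):
    """Pick the output format from the file extension.

    A ".bam" extension selects BAM; every other name selects SAM.
    """
    return "BAM" if path.endswith(".bam") else "SAM"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A plain-text file standing in for a SAM input.
        sam_path = os.path.join(tmp, "reads.sam")
        with open(sam_path, "w") as handle:
            handle.write("@HD\tVN:1.0\tSO:unsorted\n")

        # A gzip file standing in for a BAM input (not real BGZF, but it
        # starts with the same magic bytes, which is all the sniffer reads).
        bam_like_path = os.path.join(tmp, "reads.bam")
        with gzip.open(bam_like_path, "wb") as handle:
            handle.write(b"BAM\x01")

        print(sniff_input_format(sam_path))
        print(sniff_input_format(bam_like_path))
        print(output_format_for("out.bam"))
        print(output_format_for("out.txt"))
```

Note that the input format depends only on the file contents while the output format depends only on the file name, so an output file named without a ".bam" extension is written as SAM even when the input was BAM.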
The bam executable has the following functions.

* Rewrite SAM/BAM Files
** [[BamUtil: convert|'''convert''' - Read a SAM/BAM file and write as a SAM/BAM file (optionally converts between '=' & bases in the sequence)]]
** [[BamUtil: writeRegion|'''writeRegion''' - Write the alignments in the indexed BAM file that fall into the specified region and/or have the specified read name]]
** [[BamUtil: splitChromosome|'''splitChromosome''' - Split BAM by Chromosome]]
** [[BamUtil: splitBam|'''splitBam''' - Split SAM/BAM file by Read Group]]
** [[BamUtil: findCigars|'''findCigars''' - Output just the reads that contain any of the specified CIGAR operations]]
* Modify & write SAM/BAM Files
** [[BamUtil: clipOverlap|'''clipOverlap''' - Clip overlapping read pairs so they do not overlap]]
** [[BamUtil: filter|'''filter''' - Filter reads by clipping ends with too high of a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high]]
** [[BamUtil: revert|'''revert''' - Revert SAM/BAM replacing the specified fields with their previous values (if known) and removes specified tags]]
** [[BamUtil: squeeze|'''squeeze''' - Reduce file size by dropping OQ fields, duplicates, specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers]]
** [[BamUtil: trimBam|'''trimBam''' - Trim end of reads, changing read ends to ‘N’ & quality to ‘!’]]
** [[BamUtil: polishBam|'''polishBam''' - Add/Update header lines & add RG tag to each record]]
** [[BamUtil: rgMergeBam|'''rgMergeBam''' - Merge sorted BAM files adding Read Groups]]
** [[BamUtil: dedup|'''dedup''' - Mark or remove duplicates, can also perform recalibration]]
** [[BamUtil: recab|'''recab''' - Recalibrate base qualities]]
* Informational Tools
** [[BamUtil: validate|'''validate''' - Read and Validate a SAM/BAM file]]
** [[BamUtil: diff|'''diff''' - Print the diffs between 2 bams]]
** [[BamUtil: stats|'''stats''' - Print some basic statistics on a SAM/BAM file]]
** [[BamUtil: gapInfo|'''gapInfo''' - Print information on the gap between read pairs in a SAM/BAM file]]
* Print Information in Readable Form:
** [[BamUtil: dumpHeader|'''dumpHeader''' - Print SAM/BAM header]]
** [[BamUtil: dumpRefInfo|'''dumpRefInfo''' - Print SAM/BAM Reference Information]]
** [[BamUtil: dumpIndex|'''dumpIndex''' - Dump a BAM index file into an easy to read text version]]
** [[BamUtil: readReference|'''readReference''' - Print the reference string for the specified region]]
* Additional Tools
** [[BamUtil: bam2FastQ|'''bam2FastQ''' - Convert the specified BAM file to fastQs]]
* Dummy/Example Tools:
** [[BamUtil: readIndexedBam|'''readIndexedBam''' - Read an indexed BAM file reference by reference id -1 to the max reference id and write it out as a SAM/BAM file]]

This executable is built using [[C++ Library: libStatGen]]. Just running ./bam will print the Usage information for the bam executable.
{{BamUtilPrograms}}
