Difference between revisions of "BamUtil"
(20 intermediate revisions by one other user not shown) | |||
Line 7: | Line 7: | ||
bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, <code>bam</code>. | bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, <code>bam</code>. | ||
+ | |||
+ | == Getting Help == | ||
+ | |||
+ | If you have any questions please use the [https://github.com/statgen/bamUtil bamUtil GitHub page] to raise an issue. | ||
+ | |||
+ | See [[BamUtil: FAQ]] to see if your question has already been answered. | ||
== Where to Find It == | == Where to Find It == | ||
Line 21: | Line 27: | ||
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all. | To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all. | ||
+ | For version 1.0.14 and later, please download libStatGen and bamUtil separately: | ||
− | |||
− | ''' | + | '''Version 1.0.14 - Released 7/8/2015''' |
− | * | + | *[[LibStatGen Download#Official Releases|libStatGen version 1.0.14]] |
− | * | + | *[[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.14]] |
− | |||
− | |||
'''Older Releases''' | '''Older Releases''' | ||
+ | * [[Media:BamUtilLibStatGen.1.0.13.tgz|BamUtilLibStatGen.1.0.13.tgz]] - Released 2/20/2015 | ||
+ | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.13]] - see link for version updates | ||
+ | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.13]] - see link for version updates | ||
+ | |||
+ | |||
+ | * [[Media:BamUtilLibStatGen.1.0.12.tar.gz|BamUtilLibStatGen.1.0.12.tgz]] - Released 5/14/2014 | ||
+ | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.12]] - see link for version updates | ||
+ | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.12]] - see link for version updates | ||
+ | ** Adds regions to [[BamUtil: mergeBam|mergeBam]] | ||
+ | ** Accept ',' delimiters for the tags string input in [[BamUtil: squeeze|squeeze]], [[BamUtil: revert|revert]], & [[BamUtil: diff|diff]] | ||
+ | |||
+ | *[[Media:BamUtilLibStatGen.1.0.11.tar.gz|BamUtilLibStatGen.1.0.11.tar.gz]] - Released 2/28/2014 | ||
+ | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.11]] - see link for version updates | ||
+ | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.11]] - see link for version updates | ||
+ | ** Now properly supports 'B' & 'f' tags | ||
+ | ** Cleanup - compile issues | ||
+ | |||
+ | *[[Media:BamUtilLibStatGen.1.0.10.tar.gz|BamUtilLibStatGen.1.0.10.tar.gz]] - Released 1/2/2014 | ||
+ | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.10]] - see link for version updates | ||
+ | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.10]] - see link for version updates | ||
+ | ** Adds PhoneHome/Version checking. | ||
+ | |||
+ | *[[Media:BamUtilLibStatGen.1.0.9.tgz|BamUtilLibStatGen.1.0.9.tgz]] - Released 7/7/2013 | ||
+ | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]] | ||
+ | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.9]] | ||
+ | ** Update to [[BamUtil: mergeBam|mergeBam]] | ||
+ | *** Update to ignore PG lines with duplicate IDs | ||
+ | *** Update to accept merges of matching RG lines | ||
+ | *** Update to log to stderr if no log/out file is specified | ||
+ | * There is no version 1.0.8. It was skipped to stay in line with libStatGen versions (libStatGen 1.0.8 added vcf support) | ||
+ | *[[Media:BamUtilLibStatGen.1.0.7.tgz|BamUtilLibStatGen.1.0.7.tgz]] - Released 1/29/2013 | ||
+ | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]] | ||
+ | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.7]] | ||
+ | ** Update to fix some compile issues on ubuntu 12.10 | ||
+ | ** Update use of SamRecord::getStringTag to expect the return of a const string pointer due to libStatGen v1.0.7 updates | ||
+ | ** Update SamReferenceInfo usage due to libStatGen v1.0.7 updates | ||
+ | ** Update to [[BamUtil: diff|diff]] | ||
+ | *** Fix DIFF to test and properly handle running out of available records. Previously no message was printed when this happened and there was a bug for which file it freed | ||
+ | ** Update to [[BamUtil: clipOverlap|clipOverlap]] | ||
+ | *** Update to facilitate adding other overlap handling functions | ||
+ | ** Update to [[BamUtil: mergeBam|mergeBam]] (formerly RGMergeBam) | ||
+ | *** Rename RGMergeBam to MergeBam | ||
+ | *** Update to handle files that already have an RG | ||
+ | |||
+ | *[[Media:BamUtilLibStatGen.1.0.6.tgz|BamUtilLibStatGen.1.0.6.tgz]] - Released 11/14/2012 | ||
+ | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.6]] | ||
+ | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.6]] | ||
+ | ** Update to [[BamUtil: trimBam|trimBam]] | ||
+ | *** Update to allow trimming a different number of bases from each end of the read | ||
+ | *[[Media:BamUtilLibStatGen.1.0.5.tgz|BamUtilLibStatGen.1.0.5.tgz]] - Released 10/24/2012 | ||
+ | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.5]] | ||
+ | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.5]] | ||
+ | ** Updates to: [[BamUtil: dedup|dedup]], [[BamUtil: polishBam|polishBam]], [[BamUtil: recab|recab]] | ||
+ | ** Update to add compile option to compile without C++0x/C++11 | ||
+ | ** See [[#Release of just BamUtil (does not include libStatGen)|below]] for additional details on updates | ||
+ | *BamUtilLibStatGen.1.0.4.tgz - Release skipped | ||
+ | *[[Media:BamUtilLibStatGen.1.0.3.tgz|BamUtilLibStatGen.1.0.3.tgz]] - Released 09/19/2012 | ||
+ | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.3]] | ||
+ | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.3]] | ||
+ | ** Adds: [[BamUtil: dedup|dedup]] [[BamUtil: recab|recab]] | ||
*[[Media:BamUtilLibStatGen.1.0.2.tgz|BamUtilLibStatGen.1.0.2.tgz]] - Released 05/16/2012 | *[[Media:BamUtilLibStatGen.1.0.2.tgz|BamUtilLibStatGen.1.0.2.tgz]] - Released 05/16/2012 | ||
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.2]] | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.2]] | ||
Line 41: | Line 105: | ||
** Adds leftShifting to [[BamUtil: writeRegion|writeRegion]] and [[BamUtil: convert|convert]] | ** Adds leftShifting to [[BamUtil: writeRegion|writeRegion]] and [[BamUtil: convert|convert]] | ||
** Adds more diff fields to [[BamUtil: diff|diff]] | ** Adds more diff fields to [[BamUtil: diff|diff]] | ||
− | |||
* [[Media:BamUtilLibStatGen.1.0.0.tgz|BamUtilLibStatGen.1.0.0.tgz]] - Released 10/10/2011 | * [[Media:BamUtilLibStatGen.1.0.0.tgz|BamUtilLibStatGen.1.0.0.tgz]] - Released 10/10/2011 | ||
**Initial release of bamUtil that includes libStatGen version 1.0.0. It started from the tool found in the deprecated StatGen repository. | **Initial release of bamUtil that includes libStatGen version 1.0.0. It started from the tool found in the deprecated StatGen repository. | ||
**Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.0]] [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]] | **Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.0]] [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]] | ||
− | |||
=== Release of just BamUtil (does not include libStatGen) === | === Release of just BamUtil (does not include libStatGen) === | ||
Line 51: | Line 113: | ||
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all. | To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all. | ||
− | [[Media:BamUtil.1.0. | + | '''BamUtil.1.0.14 Release Notes''' |
+ | * BamUtil Version 1.0.14 - Released 7/8/2015 | ||
+ | ** https://github.com/statgen/bamUtil/archive/v1.0.14.tar.gz | ||
+ | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.14]] | ||
+ | ** Update [[BamUtil: trimBam|trimBam]] | ||
+ | *** Add option to soft clip (-c) instead of trimming | ||
+ | ** Update [[BamUtil: clipOverlap|clipOverlap]] | ||
+ | *** Add option to mark reads as unmapped if they are entirely clipped | ||
+ | ** Update to [[BamUtil: bam2FastQ|bam2FastQ]] | ||
+ | *** Add option to gzip the output files | ||
+ | *** Add option to split Read Groups into separate fastq files | ||
+ | *** Add option to get the quality from a tag | ||
+ | ** Update [[BamUtil: recab|recab]] | ||
+ | *** Update to ignore ref 'N' when building the recalibration table | ||
+ | *** Add ability to bin | ||
+ | ** Add Dedup_LowMem tool | ||
+ | |||
+ | '''Older Releases''' | ||
+ | * BamUtil Version 1.0.13 - Released 2/20/2015 | ||
+ | ** https://github.com/statgen/bamUtil/archive/v1.0.13.tar.gz | ||
+ | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.13]] | ||
+ | ** Makefile Updates | ||
+ | *** Improve logic to determine actual path for the library | ||
+ | *** Update to append to USER_COMPILE_VARS even if specified on the command line | ||
+ | ** Update [[BamUtil: writeRegion|writeRegion]] | ||
+ | *** Add option to specify readnames to keep in a file | ||
+ | *** Fixed bug that if a read overlapped 2 BED positions, it was printed twice | ||
+ | ** Update to [[BamUtil: bam2FastQ|bam2FastQ]] | ||
+ | *** Update to skip non-primary reads | ||
+ | ** Update to [[BamUtil: polishBam|polishBam]] | ||
+ | *** Update to handle '\t' string inputs and to add CO option | ||
+ | *** Fix MD5sum calculation to convert fasta to uppercase prior to calculating | ||
+ | |||
+ | * [[Media:BamUtil.1.0.12.tgz|BamUtil.1.0.12.tgz]] - Released 5/14/2014 | ||
+ | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.12]] | ||
+ | ** Update [[BamUtil: mergeBam|mergeBam]] | ||
+ | *** Add a regions option | ||
+ | ** Update to [[BamUtil: squeeze|squeeze]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]] | ||
+ | *** Also accept ',' instead of just ';' as the delimiter in the input tags string. | ||
+ | |||
+ | * [[Media:BamUtil.1.0.11.tgz|BamUtil.1.0.11.tgz]] - Released 2/28/2014 | ||
+ | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.11]] | ||
+ | *** Adds support for 'B' & 'f' tags that did not work properly before. | ||
+ | ** Update [[BamUtil: splitBam|splitBam]] & [[BamUtil: polishBam|polishBam]] | ||
+ | *** Update to work properly if log & output file are not specified (no longer creates '.log') | ||
+ | ** Update Main dummy/example tool to indicate the correct tool | ||
+ | ** Update to [[BamUtil: bam2FastQ|bam2FastQ]], [[BamUtil: clipOverlap|clipOverlap]], [[BamUtil: filter|filter]], [[BamUtil: mergeBam|mergeBam]], [[BamUtil: splitBam|splitBam]], [[BamUtil: squeeze|squeeze]], [[BamUtil: stats|stats]] | ||
+ | *** Cleanup usage/parameter descriptions | ||
+ | ** Update [[BamUtil: revert|revert]] | ||
+ | *** Update compatibility with libStatGen due to 'B' & 'f' tag handling updates | ||
+ | ** Add tests for 'B' & 'f' tags | ||
− | + | * [[Media:BamUtil.1.0.10.tar.gz|BamUtil.1.0.10.tar.gz]] - Released 1/2/2014 | |
− | * Adds: [[BamUtil: | + | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.10]] |
− | * | + | ** All |
− | ** | + | *** Add PhoneHome/version checking |
− | * Update to [[BamUtil: | + | *** Make sub-program names case independent |
− | ** | + | *** Fix Logger.cpp compiler warning |
− | ** | + | ** Adds: [[BamUtil: explainFlags|explainFlags]] - describes the SAM/BAM flags based on the flag value |
− | ** | + | ** Update to [[BamUtil: stats|stats]] |
− | * | + | *** Fix Stats to not try to process a record after it is out of the loop (it would already have been processed or is invalid) | ||
− | * Update to [[BamUtil: | + | ** Update to [[BamUtil: splitBam|splitBam]] |
− | ** | + | *** fix description of --noeof option |
− | * Update to [[BamUtil: | + | ** Update to [[BamUtil: writeRegion|writeRegion]] |
− | ** | + | *** add exclude/required flags |
− | * Update to [[BamUtil: | + | ** Update to [[BamUtil: dedup|dedup]] & [[BamUtil: recab|recab]] |
− | ** | + | *** Ignore secondary reads for dedup and making the recalibration table. |
− | * Update to [[BamUtil: | + | *** skip QC Failures |
− | ** | + | *** add excludeFlags parameters |
− | * | + | ** Update to [[BamUtil: clipOverlap|clipOverlap]] |
+ | *** add exclude flags | ||
+ | *** fix bug for readName sorted when a read is filtered due to flags | ||
+ | *** add sorting validation | ||
+ | ** Update to [[BamUtil: bam2FastQ|bam2FastQ]] | ||
+ | *** add --merge option to generate interleaved files. | ||
+ | *** update to open the input file before opening the output files, so if there is an error, the outputs aren't opened | ||
+ | ** Update to [[BamUtil: mergeBam|mergeBam]] | ||
+ | *** add option to ignore the RG PI field when checking headers | ||
+ | *** add more informative header merge error messages | ||
+ | * [[Media:BamUtil.1.0.9.tgz|BamUtil.1.0.9.tgz]] - Released 7/7/2013 | ||
+ | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]] (version 1.0.7 should also work) | ||
+ | ** Update to [[BamUtil: mergeBam|mergeBam]] | ||
+ | *** Update to ignore PG lines with duplicate IDs | ||
+ | *** Update to accept merges of matching RG lines | ||
+ | *** Update to log to stderr if no log/out file is specified | ||
− | + | *[[Media:BamUtil.1.0.7.tgz|BamUtil.1.0.7.tgz]] - Released 1/29/2013 | |
+ | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]] or above | ||
+ | ** Update to fix some compile issues on ubuntu 12.10 | ||
+ | ** Update use of SamRecord::getStringTag to expect the return of a const string pointer due to libStatGen v1.0.7 updates | ||
+ | ** Update SamReferenceInfo usage due to libStatGen v1.0.7 updates | ||
+ | ** Update to [[BamUtil: diff|diff]] | ||
+ | *** Fix DIFF to test and properly handle running out of available records. Previously no message was printed when this happened and there was a bug for which file it freed | ||
+ | ** Update to [[BamUtil: clipOverlap|clipOverlap]] | ||
+ | *** Update to facilitate adding other overlap handling functions | ||
+ | ** Update to [[BamUtil: mergeBam|mergeBam]] (formerly RGMergeBam) | ||
+ | *** Rename RGMergeBam to MergeBam | ||
+ | *** Update to handle files that already have an RG | ||
+ | *[[Media:BamUtil.1.0.6.tgz|BamUtil.1.0.6.tgz]] - Released 11/14/2012 | ||
+ | ** Update to [[BamUtil: trimBam|trimBam]] | ||
+ | *** Update to allow trimming a different number of bases from each end of the read | ||
+ | *[[Media:BamUtil.1.0.5.tgz|BamUtil.1.0.5.tgz]] - Released 10/24/2012 | ||
+ | ** Update to [[BamUtil: dedup|dedup]] | ||
+ | *** Update logic for which pair to keep if they have the same quality | ||
+ | ** Update to [[BamUtil: polishBam|polishBam]] | ||
+ | *** Update to print the number of successful header additions | ||
+ | ** Update to [[BamUtil: recab|recab]] | ||
+ | *** Update to print the number of bases skipped due to the base quality | ||
+ | ** General Updates | ||
+ | *** Update to add compile option to compile without C++0x/C++11 | ||
+ | *BamUtil.1.0.4.tgz - Release skipped | ||
+ | *[[Media:BamUtil.1.0.3.tgz|BamUtil.1.0.3.tgz]] - Released 09/19/2012 | ||
+ | ** Adds: [[BamUtil: dedup|dedup]] [[BamUtil: recab|recab]] | ||
+ | ** General Updates | ||
+ | *** Update Logger to write to stderr if output is stdout | ||
+ | ** Update to [[BamUtil: stats|stats]] | ||
+ | *** Add required/exclude flags | ||
+ | *** Exclude Clips if excluding unmapped | ||
+ | *** Add --withinRegion flag | ||
+ | *** Update phred/qual counts to be uint64_t instead of int to avoid overflow | ||
+ | ** Update to [[BamUtil: validate|validate]] | ||
+ | *** Detect header failures | ||
+ | ** Update to [[BamUtil: diff|diff]] | ||
+ | *** Update to specify chromosome/pos in ZP as a string rather than int so both can be shown | ||
+ | ** Update to [[BamUtil: readReference|readReference]] | ||
+ | *** Output error message if the reference name is not found | ||
+ | ** Update to [[BamUtil: splitChromosome|splitChromosome]] | ||
+ | *** Update to actually split the chromosomes and not just hard coded to output chromosomes ids 0-22 | ||
+ | ** Update Makefile to have cloneLib for cloning libStatGen | ||
*[[Media:BamUtil.1.0.2.tgz|BamUtil.1.0.2.tgz]] - Released 05/16/2012 | *[[Media:BamUtil.1.0.2.tgz|BamUtil.1.0.2.tgz]] - Released 05/16/2012 | ||
** Adds: [[BamUtil: bam2FastQ|bam2FastQ]] | ** Adds: [[BamUtil: bam2FastQ|bam2FastQ]] | ||
Line 84: | Line 253: | ||
**Initial release of just bamUtil. It started from the tool found in the deprecated StatGen repository. | **Initial release of just bamUtil. It started from the tool found in the deprecated StatGen repository. | ||
**Contains: [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]] | **Contains: [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]] | ||
+ | |||
+ | == Citation == | ||
+ | If you use BamUtil, please cite our publication on GotCloud which includes BamUtil: | ||
+ | [http://genome.cshlp.org/content/early/2015/04/14/gr.176552.114.abstract Jun, Goo, et al. "An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data." Genome research (2015): gr-176552.] | ||
+ | |||
= Programs = | = Programs = | ||
Line 89: | Line 263: | ||
The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file. | The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file. | ||
− | + | {{BamUtilPrograms}} | |
Latest revision as of 16:14, 11 September 2021
bamUtil Overview
bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.
Getting Help
If you have any questions please use the bamUtil GitHub page to raise an issue.
See BamUtil: FAQ to see if your question has already been answered.
Where to Find It
The bamUtil repository is available both via release downloads and via github.
On github, https://github.com/statgen/bamUtil, you can both browse and download the bamUtil source code as well as explore the history of changes.
You can obtain the source either with or without git.
The releases may be available both with and without libStatGen included.
If you do not use the release version that already contains libStatGen, you need to download the library: libStatGen.
If you try to compile bamUtil and it cannot find libStatGen, it will fail and provide instructions on what to do next:
- if libStatGen is in a different location than expected
  - follow the directions to set the path to libStatGen
- if libStatGen is not downloaded and you have git
  - make libStatGen will download via git and build libStatGen
- if libStatGen is not downloaded and you don't have git
  - See libStatGen
Using Git To Track the Current Development Version
Clone (get your own copy)
You can create your own git clone (copy) using:
git clone https://github.com/statgen/bamUtil.git
or
git clone git://github.com/statgen/bamUtil.git
Either of these commands creates a directory called bamUtil in the current directory.
Then just cd bamUtil and compile.
Get the latest Updates (update your copy)
To update your copy to the latest version (a major advantage of using git):
cd pathToYourCopy/bamUtil
make clean
git pull
make all
Git Refresher
If you decide to use git, but need a refresher, see How To Use Git or Notes on how to use git (if you have access)
Downloading From GitHub Without Git
If you download the latest code/version, make sure you periodically update it by downloading a newer version.
From github you can download:
- Latest Code (master branch)
  - via Website
    - Go to: https://github.com/statgen/bamUtil
    - Click on the Download ZIP button on the right side panel.
  - via Command Line
- Specific Release (via a tag)
  - via Website
    - Go to: https://github.com/statgen/bamUtil/releases to see the available releases
    - Click zip or tar.gz for the desired version.
  - via Command Line
    - wget https://github.com/statgen/bamUtil/archive/<tagName>.tar.gz
    - or wget https://github.com/statgen/bamUtil/archive/<tagName>.zip
After downloading the file, uncompress (unzip/untar) it. The directory created will be named bamUtil-<name of version you downloaded>.
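The command-line download of a tagged release can be sketched as below. This is only an illustration under stated assumptions: the helper names `release_url` and `archive_top_dir` are hypothetical and not part of bamUtil, and the tag `v1.0.14` is simply the release listed on this page. Rather than guessing the name of the unpacked directory, the sketch reads it from the archive itself.

```shell
# Hypothetical helper: build the tar.gz archive URL for a given tag,
# following the URL pattern shown in the download instructions.
release_url() {
    echo "https://github.com/statgen/bamUtil/archive/$1.tar.gz"
}

# Hypothetical helper: print the top-level directory stored in a tar.gz
# archive, so the directory name does not have to be guessed.
archive_top_dir() {
    tar -tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}

# Usage (needs network access and wget):
#   tag=v1.0.14
#   wget -O "bamUtil-$tag.tar.gz" "$(release_url "$tag")"
#   tar -xzf "bamUtil-$tag.tar.gz"
#   cd "$(archive_top_dir "bamUtil-$tag.tar.gz")"
```

Keeping the download itself in the commented usage lines means the helpers can be sourced and tried out without touching the network.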
Building
After obtaining the bamUtil repository (either by download or from github), compile the code using:
make all
Object (.o) files are compiled into the obj directory, with debug and profile subdirectories for the debugging and profiling objects.
This creates the executable(s) in the bamUtil/bin/ directory, the debug executable(s) in the bamUtil/bin/debug/ directory, and the profiling executable(s) in the bamUtil/bin/profile/ directory.
make install installs the opt binary if you have permission.
make test compiles for opt, debug, and profile and runs the tests (found in the test subdirectory).
To see all make options, type make help.
If compilation fails due to warnings being treated as errors, please contact us so we can fix the warnings. As a work-around to get it to compile, you can disable the treatment of warnings as errors by editing libStatGen/general/Makefile to remove -Werror.
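One way to apply that work-around without editing the file by hand is sketched below. It assumes the flag appears in the Makefile as a plain, whitespace-separated `-Werror` token; the function name `strip_werror` is hypothetical and not part of bamUtil or libStatGen. A backup copy is kept so the change can be undone.

```shell
# Hypothetical helper: remove stand-alone "-Werror" tokens from a Makefile.
# The original file is preserved next to it with a .bak suffix.
strip_werror() {
    makefile="$1"
    cp "$makefile" "$makefile.bak"
    # Match -Werror only when it is delimited by whitespace or line ends,
    # so flags such as -Werror=some-warning are left untouched.
    sed -E 's/(^|[[:space:]])-Werror([[:space:]]|$)/\1\2/g' "$makefile.bak" > "$makefile"
}

# Usage, run from the directory that contains libStatGen:
#   strip_werror libStatGen/general/Makefile
#   make all
```

To restore the original behaviour, copy the `.bak` file back over the Makefile.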
Releases
If you prefer to run the last official release rather than the latest development version, you can download that here.
There are two versions of the release, one that includes libStatGen and one that does not. If you already have libStatGen installed and want to use your own copy, use the version that does not include libStatGen.
Full Release (includes libStatGen)
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all.
For version 1.0.14 and later, please download libStatGen and bamUtil separately:
Version 1.0.14 - Released 7/8/2015
- libStatGen version 1.0.14
- bamUtil version 1.0.14
Older Releases
- BamUtilLibStatGen.1.0.13.tgz - Released 2/20/2015
- Contains: libStatGen version 1.0.13 - see link for version updates
- Contains: bamUtil version 1.0.13 - see link for version updates
- BamUtilLibStatGen.1.0.12.tgz - Released 5/14/2014
- Contains: libStatGen version 1.0.12 - see link for version updates
- Contains: bamUtil version 1.0.12 - see link for version updates
- Adds regions to mergeBam
- Accept ',' delimiters for the tags string input in squeeze, revert, & diff
- BamUtilLibStatGen.1.0.11.tar.gz - Released 2/28/2014
- Contains: libStatGen version 1.0.11 - see link for version updates
- Contains: bamUtil version 1.0.11 - see link for version updates
- Now properly supports 'B' & 'f' tags
- Cleanup - compile issues
- BamUtilLibStatGen.1.0.10.tar.gz - Released 1/2/2014
- Contains: libStatGen version 1.0.10 - see link for version updates
- Contains: bamUtil version 1.0.10 - see link for version updates
- Adds PhoneHome/Version checking.
- BamUtilLibStatGen.1.0.9.tgz - Released 7/7/2013
- Contains: libStatGen version 1.0.9
- Contains: bamUtil version 1.0.9
- Update to mergeBam
- Update to ignore PG lines with duplicate IDs
- Update to accept merges of matching RG lines
- Update to log to stderr if no log/out file is specified
- There is no version 1.0.8. It was skipped to stay in line with libStatGen versions (libStatGen 1.0.8 added vcf support)
- BamUtilLibStatGen.1.0.7.tgz - Released 1/29/2013
- Contains: libStatGen version 1.0.7
- Contains: bamUtil version 1.0.7
- Update to fix some compile issues on ubuntu 12.10
- Update use of SamRecord::getStringTag to expect the return of a const string pointer due to libStatGen v1.0.7 updates
- Update SamReferenceInfo usage due to libStatGen v1.0.7 updates
- Update to diff
- Fix DIFF to test and properly handle running out of available records. Previously no message was printed when this happened and there was a bug for which file it freed
- Update to clipOverlap
- Update to facilitate adding other overlap handling functions
- Update to mergeBam (formerly RGMergeBam)
- Rename RGMergeBam to MergeBam
- Update to handle files that already have an RG
- BamUtilLibStatGen.1.0.6.tgz - Released 11/14/2012
- Contains: libStatGen version 1.0.6
- Contains: bamUtil version 1.0.6
- Update to trimBam
- Update to allow trimming a different number of bases from each end of the read
- BamUtilLibStatGen.1.0.5.tgz - Released 10/24/2012
- Contains: libStatGen version 1.0.5
- Contains: bamUtil version 1.0.5
- Updates to: dedup, polishBam, recab
- Update to add compile option to compile without C++0x/C++11
- See below for additional details on updates
- BamUtilLibStatGen.1.0.4.tgz - Release skipped
- BamUtilLibStatGen.1.0.3.tgz - Released 09/19/2012
- Contains: libStatGen version 1.0.3
- Contains: bamUtil version 1.0.3
- Adds: dedup recab
- BamUtilLibStatGen.1.0.2.tgz - Released 05/16/2012
- Contains: libStatGen version 1.0.2
- Adds: bam2FastQ
- BamUtilLibStatGen.1.0.1.tgz - Released 05/04/2012
- Contains: libStatGen version 1.0.1
- Adds: splitBam, clipOverlap, trimBam, polishBam, rgMergeBam, gapInfo
- Adds additional functionality to stats
- Adds leftShifting to writeRegion and convert
- Adds more diff fields to diff
- BamUtilLibStatGen.1.0.0.tgz - Released 10/10/2011
- Initial release of bamUtil that includes libStatGen version 1.0.0. It started from the tool found in the deprecated StatGen repository.
- Contains: libStatGen version 1.0.0 validate, convert, dumpHeader, splitChromosome, writeRegion, dumpRefInfo, dumpIndex, readIndexedBam, filter, readReference, revert, diff, squeeze, findCigars, stats
Release of just BamUtil (does not include libStatGen)
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all.
BamUtil.1.0.14 Release Notes
- BamUtil Version 1.0.14 - Released 7/8/2015
- https://github.com/statgen/bamUtil/archive/v1.0.14.tar.gz
- Requires, but does not include: libStatGen version 1.0.14
- Update trimBam
- Add option to soft clip (-c) instead of trimming
- Update clipOverlap
- Add option to mark reads as unmapped if they are entirely clipped
- Update to bam2FastQ
- Add option to gzip the output files
- Add option to split Read Groups into separate fastq files
- Add option to get the quality from a tag
- Update recab
- Update to ignore ref 'N' when building the recalibration table
- Add ability to bin
- Add Dedup_LowMem tool
Older Releases
- BamUtil Version 1.0.13 - Released 2/20/2015
- https://github.com/statgen/bamUtil/archive/v1.0.13.tar.gz
- Requires, but does not include: libStatGen version 1.0.13
- Makefile Updates
- Improve logic to determine actual path for the library
- Update to append to USER_COMPILE_VARS even if specified on the command line
- Update writeRegion
- Add option to specify readnames to keep in a file
- Fixed bug that if a read overlapped 2 BED positions, it was printed twice
- Update to bam2FastQ
- Update to skip non-primary reads
- Update to polishBam
- Update to handle '\t' string inputs and to add CO option
- Fix MD5sum calculation to convert fasta to uppercase prior to calculating
- BamUtil.1.0.12.tgz - Released 5/14/2014
- Requires, but does not include: libStatGen version 1.0.12
- Update mergeBam
- Add a regions option
- Update to squeeze, revert, diff
- Also accept ',' instead of just ';' as the delimiter in the input tags string.
- BamUtil.1.0.11.tgz - Released 2/28/2014
- Requires, but does not include: libStatGen version 1.0.11
- Adds support for 'B' & 'f' tags that did not work properly before.
- Update splitBam & polishBam
- Update to work properly if log & output file are not specified (no longer creates '.log')
- Update Main dummy/example tool to indicate the correct tool
- Update to bam2FastQ, clipOverlap, filter, mergeBam, splitBam, squeeze, stats
- Cleanup usage/parameter descriptions
- Update revert
- Update compatibility with libStatGen due to 'B' & 'f' tag handling updates
- Add tests for 'B' & 'f' tags
- BamUtil.1.0.10.tar.gz - Released 1/2/2014
- Requires, but does not include: libStatGen version 1.0.10
- All
- Add PhoneHome/version checking
- Make sub-program names case independent
- Fix Logger.cpp compiler warning
- Adds: explainFlags - describes the SAM/BAM flags based on the flag value
- Update to stats
- Fix Stats to not try to process a record after it is out of the loop (it would already have been processed or is invalid)
- Update to splitBam
- fix description of --noeof option
- Update to writeRegion
- add exclude/required flags
- Update to dedup & recab
- Ignore secondary reads for dedup and making the recalibration table.
- skip QC Failures
- add excludeFlags parameters
- Update to clipOverlap
- add exclude flags
- fix bug for readName sorted when a read is filtered due to flags
- add sorting validation
- Update to bam2FastQ
- add --merge option to generate interleaved files.
- update to open the input file before opening the output files, so if there is an error, the outputs aren't opened
- Update to mergeBam
- add option to ignore the RG PI field when checking headers
- add more informative header merge error messages
- BamUtil.1.0.9.tgz - Released 7/7/2013
- Requires, but does not include: libStatGen version 1.0.9 (version 1.0.7 should also work)
- Update to mergeBam
- Update to ignore PG lines with duplicate IDs
- Update to accept merges of matching RG lines
- Update to log to stderr if no log/out file is specified
- BamUtil.1.0.7.tgz - Released 1/29/2013
- Requires, but does not include: libStatGen version 1.0.7 or above
- Update to fix some compile issues on ubuntu 12.10
- Update use of SamRecord::getStringTag to expect the return of a const string pointer due to libStatGen v1.0.7 updates
- Update SamReferenceInfo usage due to libStatGen v1.0.7 updates
- Update to diff
- Fix DIFF to test for and properly handle running out of available records. Previously no message was printed when this happened, and there was a bug in which file it freed
- Update to clipOverlap
- Update to facilitate adding other overlap handling functions
- Update to mergeBam (formerly RGMergeBam)
- Rename RGMergeBam to MergeBam
- Update to handle files that already have an RG
- BamUtil.1.0.6.tgz - Released 11/14/2012
- Update to trimBam
- Update to allow trimming a different number of bases from each end of the read
- BamUtil.1.0.5.tgz - Released 10/24/2012
- Update to dedup
- Update logic for which pair to keep if they have the same quality
- Update to polishBam
- Update to print the number of successful header additions
- Update to recab
- Update to print the number of bases skipped due to the base quality
- General Updates
- Update to add compile option to compile without C++0x/C++11
- BamUtil.1.0.4.tgz - Release skipped
- BamUtil.1.0.3.tgz - Released 09/19/2012
- Adds: dedup recab
- General Updates
- Update Logger to write to stderr if output is stdout
- Update to stats
- Add required/exclude flags
- Exclude Clips if excluding unmapped
- Add --withinRegion flag
- Update phred/qual counts to be uint64_t instead of int to avoid overflow
- Update to validate
- Detect header failures
- Update to diff
- Update to specify chromosome/pos in ZP as a string rather than int so both can be shown
- Update to readReference
- Output error message if the reference name is not found
- Update to splitChromosome
- Update to actually split the chromosomes rather than being hard coded to output chromosome ids 0-22
- Update Makefile to have cloneLib for cloning libStatGen
- BamUtil.1.0.2.tgz - Released 05/16/2012
- Adds: bam2FastQ
- BamUtil.1.0.1.tgz - Released 05/04/2012
- Adds: splitBam, clipOverlap, trimBam, polishBam, rgMergeBam, gapInfo
- Adds additional functionality to stats
- Adds leftShifting to writeRegion and convert
- Adds more diff fields to diff
- BamUtil.1.0.0.tgz - Released 10/10/2011
- Initial release of just bamUtil. It started from the tool found in the deprecated StatGen repository.
- Contains: validate, convert, dumpHeader, splitChromosome, writeRegion, dumpRefInfo, dumpIndex, readIndexedBam, filter, readReference, revert, diff, squeeze, findCigars, stats
Citation
If you use BamUtil, please cite our publication on GotCloud which includes BamUtil: Jun, Goo, et al. "An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data." Genome research (2015): gr-176552.
Programs
The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file.
BamUtil is built using libStatGen. Running bin/bam with no parameters will print the usage information for the bam executable. Running bin/bam subProgram will print the usage information for the BamUtil sub-program.
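The format rule described above can be illustrated with a short Python sketch. This is not bamUtil's code: the function names are hypothetical, the extension match is assumed to be an exact, case-sensitive ".bam", and checking the gzip magic bytes is only one plausible way to "read the beginning" of an input file (BAM files are BGZF-compressed, which is gzip-compatible).

```python
def output_format(path):
    """Return "BAM" when the output name ends in ".bam", otherwise "SAM".

    Assumption: the match is exact and case-sensitive.
    """
    return "BAM" if path.endswith(".bam") else "SAM"


def looks_like_bam(first_bytes):
    """Guess whether an input is BAM from its first bytes.

    Assumption: a gzip/BGZF magic number (0x1f 0x8b) means BAM;
    anything else is treated as SAM text.
    """
    return first_bytes[:2] == b"\x1f\x8b"


if __name__ == "__main__":
    for name in ("out.bam", "out.sam", "out.txt"):
        print(name, "->", output_format(name))
```

Under this rule an output name such as out.txt is written as SAM, because only the ".bam" extension selects BAM.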
Tools to Rewrite SAM/BAM Files:
- convert - Convert SAM/BAM to SAM/BAM (optionally converts between '=' & bases in the sequence)
- writeRegion - Write a file with reads that are in the specified region and/or have the specified read name
- splitChromosome - Split BAM into 1 file per Chromosome
- splitBam - Split BAM into 1 file per Read Group
- findCigars - Output just the reads that contain any of the specified CIGAR operations.
- BAM Recovery - Recover corrupted BAM files
Tools to Modify & write SAM/BAM Files:
- clipOverlap - Clip overlapping read pairs in a SAM/BAM File already sorted by Coordinate or ReadName so they do not overlap
- filter - Filter reads by soft clipping ends whose mismatch percentage is too high and by marking reads unmapped if the quality of mismatches is too high
- revert - Revert SAM/BAM replacing the specified fields with their previous values (if known) and removes specified tags
- squeeze - Reduce file size by dropping OQ fields, duplicates, & specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers
- trimBam - Trim the ends of reads in a SAM/BAM file changing read ends to 'N' and quality to '!' or by doing soft clips
- mergeBam - Merge multiple BAMs and headers appending ReadGroupIDs if necessary
- polishBam - Add/update header lines & add the RG tag to each record
- dedup - Mark or remove duplicates, can also perform recalibration
- recab - Recalibrate base qualities
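As an illustration of the masking mode that trimBam's description mentions (read ends changed to 'N' and quality to '!'), here is a minimal Python sketch. It is not trimBam's implementation: the function name and parameters are hypothetical, it works on plain strings rather than SAM/BAM records, and it ignores strand orientation and the soft-clip mode.

```python
def trim_read(seq, qual, left, right):
    """Mask `left` bases at the start and `right` bases at the end of a read.

    Masked bases become 'N' and their qualities become '!'.
    Trim counts larger than the read are capped at the read length.
    """
    n = len(seq)
    left = min(left, n)            # cannot mask more than the read holds
    right = min(right, n - left)   # the two masked ends never overlap
    keep_end = n - right
    new_seq = "N" * left + seq[left:keep_end] + "N" * right
    new_qual = "!" * left + qual[left:keep_end] + "!" * right
    return new_seq, new_qual


if __name__ == "__main__":
    print(trim_read("ACGTACGT", "IIIIIIII", 2, 3))
```

The read length is unchanged, so downstream fields such as the CIGAR string would not need to be adjusted in this mode.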
Informational Tools:
- validate - Validate a SAM/BAM File, checking file format & printing statistics
- diff - Diff 2 coordinate sorted SAM/BAM files.
- stats - Generate some basic statistics for a SAM/BAM file
- gapInfo - Print information on the gap between read pairs in a SAM/BAM File.
Helper Tools to Print Information In Readable Format:
- dumpHeader - Print the SAM/BAM Header to the screen
- dumpRefInfo - Print SAM/BAM Reference Name Information from the header
- dumpIndex - Print BAM Index File to the screen in a readable format
- readReference - Print the reference string for the specified region to the screen
- explainFlags - Describe SAM/BAM flags
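To show what describing a flag value involves, the following Python sketch decodes the bits of a SAM FLAG field as defined in the SAM specification. It is only an illustration: the function name and the wording of each description are hypothetical, and the output does not reproduce what explainFlags actually prints.

```python
# Bit values come from the SAM specification; the descriptions are paraphrased.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "properly paired"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails QC"),
    (0x400, "duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flags(value):
    """Return the description of every flag bit that is set in `value`."""
    return [name for bit, name in SAM_FLAGS if value & bit]


if __name__ == "__main__":
    for flag in (99, 147, 1024):
        print(flag, hex(flag), ", ".join(explain_flags(flag)))
```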
Additional Tools:
- bam2FastQ - Convert the specified BAM file to fastQs.
Dummy/Example Tools:
- readIndexedBam - Read an indexed BAM file reference by reference, from reference id -1 to the max reference id, and write it out as a SAM/BAM file
ASP programs: ASP is a new format that is currently in production, so this tool is not yet available for public release.