BamUtil
bamUtil Overview
bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.
Where to Find It
The bamUtil repository is available both via release downloads and via github.
On github, https://github.com/statgen/bamUtil, you can both browse and download the bamUtil source code as well as explore the history of changes.
You can obtain the source either with or without git.
The releases may be available both with and without libStatGen included.
If you do not use the release version that already contains libStatGen, you need to download the library: libStatGen.
If you try to compile bamUtil and it cannot find libStatGen, the build fails and prints instructions on what to do next:
- if libStatGen is in a different location than expected
- follow the directions to set the path to libStatGen
- if libStatGen is not downloaded and you have git
- run make libStatGen, which downloads libStatGen via git and builds it
- if libStatGen is not downloaded and you don't have git
- see libStatGen
Using Git To Track the Current Development Version
Clone (get your own copy)
You can create your own git clone (copy) using:
git clone https://github.com/statgen/bamUtil.git
or
git clone git://github.com/statgen/bamUtil.git
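Either of these commands creates a directory called bamUtil in the current directory. (GitHub no longer serves the unauthenticated git:// protocol, so use the https URL if the second form fails.)
Then just cd bamUtil and compile.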
Get the latest Updates (update your copy)
To update your copy to the latest version (a major advantage of using git):
cd pathToYourCopy/bamUtil
make clean
git pull
make all
Git Refresher
If you decide to use git but need a refresher, see How To Use Git or Notes on how to use git (if you have access).
Downloading From GitHub Without Git
If you download the latest code/version, make sure you periodically update it by downloading a newer version.
From github you can download:
- Latest Code (master branch)
- via Website
- Go to: https://github.com/statgen/bamUtil
- Click on the Download ZIP button on the right side panel.
- via Command Line
- Specific Release (via a tag)
- via Website
- Go to: https://github.com/statgen/bamUtil/releases to see the available releases
- Click zip or tar.gz for the desired version.
- via Command Line
wget https://github.com/statgen/bamUtil/archive/<tagName>.tar.gz
- or
wget https://github.com/statgen/bamUtil/archive/<tagName>.zip
After downloading the file, uncompress (unzip/untar) it. The directory created will be named bamUtil-<name of version you downloaded>.
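The command-line route for a specific release can be sketched as two small shell helpers. The function names are illustrative assumptions and not part of bamUtil. The tag is left as a placeholder, so substitute one that actually appears on the releases page.

```shell
#!/bin/sh
# Hypothetical helpers (not part of bamUtil) for fetching a tagged release.

# Print the GitHub archive URL for a tag; the extension defaults to tar.gz.
bamutil_archive_url() {
    tag=$1
    ext=${2:-tar.gz}
    printf 'https://github.com/statgen/bamUtil/archive/%s.%s\n' "$tag" "$ext"
}

# Download the tar.gz archive for a tag and unpack it in the current directory.
fetch_bamutil_release() {
    tag=$1
    wget -O "bamUtil-$tag.tar.gz" "$(bamutil_archive_url "$tag")" &&
        tar -xzf "bamUtil-$tag.tar.gz"
}

# Example (the tag name is a placeholder; pick one from the releases page):
#   fetch_bamutil_release "<tagName>"
#   ls -d bamUtil-*
```

Keeping the URL construction in its own function makes it easy to check which address will be requested before anything is downloaded.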
Building
After obtaining the bamUtil repository (either by download or from github), compile the code using:
make all
Object (.o) files are compiled into the obj directory, with debug and profile subdirectories for the debugging and profiling objects.
This creates the executable(s) in the bamUtil/bin/ directory, the debug executable(s) in the bamUtil/bin/debug/ directory, and the profiling executable(s) in the bamUtil/bin/profile/ directory.
make install installs the opt binary if you have permission.
make test compiles for opt, debug, and profile and runs the tests (found in the test subdirectory).
To see all make options, type make help.
If compilation fails due to warnings being treated as errors, please contact us so we can fix the warnings. As a work-around to get it to compile, you can disable the treatment of warnings as errors by editing libStatGen/general/Makefile to remove -Werror.
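The same work-around can be applied from the shell instead of an editor. This is only a sketch: it assumes -Werror appears as a stand-alone flag in the Makefile, and the helper name strip_werror is invented for illustration. It keeps a .bak copy so the change is easy to undo.

```shell
#!/bin/sh
# Hypothetical helper: remove stand-alone -Werror flags from a Makefile,
# keeping the original as <file>.bak. Flags such as -Werror=foo are untouched.
strip_werror() {
    makefile=$1
    cp "$makefile" "$makefile.bak" &&
        sed -E 's/[[:space:]]*-Werror([[:space:]]|$)/\1/g' \
            "$makefile.bak" > "$makefile"
}

# Example, run from the directory that contains libStatGen:
#   strip_werror libStatGen/general/Makefile
#   make all
```

To restore the original behavior, copy the .bak file back over the Makefile.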
Releases
If you prefer to run the last official release rather than the latest development version, you can download that here.
There are two versions of the release: one that includes libStatGen and one that does not. If you already have libStatGen installed and want to use your own copy, use the version that does not include libStatGen.
Full Release (includes libStatGen)
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all.
BamUtilLibStatGen.1.0.5.tgz - Released 10/24/2012
BamUtilLibStatGen.1.0.5 Release Notes
- Contains: libStatGen version 1.0.5
- Contains: bamUtil version 1.0.5
- Updates to: dedup, polishBam, recab
- Update to add compile option to compile without C++0x/C++11
- See below for additional details on updates
Older Releases
- BamUtilLibStatGen.1.0.4.tgz - Release skipped
- BamUtilLibStatGen.1.0.3.tgz - Released 09/19/2012
- Contains: libStatGen version 1.0.3
- Contains: bamUtil version 1.0.3
- Adds: dedup, recab
- BamUtilLibStatGen.1.0.2.tgz - Released 05/16/2012
- Contains: libStatGen version 1.0.2
- Adds: bam2FastQ
- BamUtilLibStatGen.1.0.1.tgz - Released 05/04/2012
- Contains: libStatGen version 1.0.1
- Adds: splitBam, clipOverlap, trimBam, polishBam, rgMergeBam, gapInfo
- Adds additional functionality to stats
- Adds leftShifting to writeRegion and convert
- Adds more diff fields to diff
- BamUtilLibStatGen.1.0.0.tgz - Released 10/10/2011
- Initial release of bamUtil that includes libStatGen version 1.0.0. It started from the tool found in the deprecated StatGen repository.
- Contains: libStatGen version 1.0.0
- Contains: validate, convert, dumpHeader, splitChromosome, writeRegion, dumpRefInfo, dumpIndex, readIndexedBam, filter, readReference, revert, diff, squeeze, findCigars, stats
Release of just BamUtil (does not include libStatGen)
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all.
BamUtil.1.0.5.tgz - Released 10/24/2012
BamUtil.1.0.5 Release Notes
- Update to dedup
- Update logic for which pair to keep if they have the same quality
- Update to polishBam
- Update to print the number of successful header additions
- Update to recab
- Update to print the number of bases skipped due to the base quality
- General Updates
- Update to add compile option to compile without C++0x/C++11
Older Releases
- BamUtil.1.0.4.tgz - Release skipped
- BamUtil.1.0.3.tgz - Released 09/19/2012
- Adds: dedup, recab
- General Updates
- Update Logger to write to stderr if output is stdout
- Update to stats
- Add required/exclude flags
- Exclude Clips if excluding unmapped
- Add --withinRegion flag
- Update phred/qual counts to be uint64_t instead of int to avoid overflow
- Update to validate
- Detect header failures
- Update to diff
- Update to specify chromosome/pos in ZP as a string rather than int so both can be shown
- Update to readReference
- Output error message if the reference name is not found
- Update to splitChromosome
- Update to actually split by chromosome instead of being hard-coded to output chromosome ids 0-22
- Update Makefile to have cloneLib for cloning libStatGen
- BamUtil.1.0.2.tgz - Released 05/16/2012
- Adds: bam2FastQ
- BamUtil.1.0.1.tgz - Released 05/04/2012
- Adds: splitBam, clipOverlap, trimBam, polishBam, rgMergeBam, gapInfo
- Adds additional functionality to stats
- Adds leftShifting to writeRegion and convert
- Adds more diff fields to diff
- BamUtil.1.0.0.tgz - Released 10/10/2011
- Initial release of just bamUtil. It started from the tool found in the deprecated StatGen repository.
- Contains: validate, convert, dumpHeader, splitChromosome, writeRegion, dumpRefInfo, dumpIndex, readIndexedBam, filter, readReference, revert, diff, squeeze, findCigars, stats
Programs
The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file.
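The output-format rule can be restated as a small shell function. This is an illustration of the rule as described above, not bamUtil's own code: the function name is made up, and the sketch assumes the extension match is case-sensitive.

```shell
#!/bin/sh
# Illustrative restatement of the output-format rule; not bamUtil code.
# Assumption: only a lowercase ".bam" extension selects BAM output.
output_format() {
    case $1 in
        *.bam) echo BAM ;;
        *)     echo SAM ;;
    esac
}

# Show which format each example output name would select.
for f in sample.bam sample.sam sample.txt; do
    printf '%s -> %s\n' "$f" "$(output_format "$f")"
done
```

Note that under this rule any name without a ".bam" extension, including one with no extension at all, falls through to SAM.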
The bam executable has the following functions.
- Rewrite SAM/BAM Files
- convert - Read a SAM/BAM file and write as a SAM/BAM file (optionally converts between '=' & bases in the sequence)
- writeRegion - Write the alignments in the indexed BAM file that fall into the specified region and/or have the specified read name
- splitChromosome - Split BAM by Chromosome
- splitBam - Split SAM/BAM file by Read Group
- findCigars - Output just the reads that contain any of the specified CIGAR operations
- Modify & write SAM/BAM Files
- clipOverlap - Clip overlapping read pairs so they do not overlap
- filter - Filter reads by clipping ends with too high a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high
- revert - Revert SAM/BAM, replacing the specified fields with their previous values (if known) and removing specified tags
- squeeze - Reduce file size by dropping OQ fields, duplicates, specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers
- trimBam - Trim end of reads, changing read ends to 'N' & quality to '!'
- polishBam - Add/Update header lines & add RG tag to each record
- rgMergeBam - Merge sorted BAM files adding Read Groups
- dedup - Mark or remove duplicates, can also perform recalibration
- recab - Recalibrate base qualities
- Informational Tools
- Print Information in Readable Form:
- Additional Tools
- Dummy/Example Tools:
This executable is built using the C++ library libStatGen.
Running ./bam with no arguments prints the usage information for the bam executable.