BamUtil
bamUtil Overview
bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.
Where to Find It
The bamUtil repository is available both via release downloads (coming soon) and via github.
On github, you can both browse and download the latest version of the repository as well as explore the history of changes.
You can access the latest version with or without git.
If you download from github or use git to keep up to date, you also need to download our library: libStatGen.
The releases will be available both with and without libStatGen included. If you download the version without libStatGen included, you will also need to download libStatGen separately. (The version without libStatGen is provided in case you already have a downloaded copy of libStatGen that you want to use.)
Releases
Release downloads are Coming Soon.
Using github
Using Git To Track the Current Development Version
Clone (get your own copy)
You can create your own git clone (copy) using:
git clone https://github.com/statgen/bamUtil.git
or
git clone git://github.com/statgen/bamUtil.git
Either of these commands creates a directory called bamUtil in the current directory.
Then just cd bamUtil and compile.
Get the latest Updates (update your copy)
To update your copy to the latest version (a major advantage of using git):
cd pathToYourCopy/bamUtil
make clean
git pull
make all
Git Refresher
If you decide to use git but need a refresher, see How To Use Git or Notes on how to use git (if you have access).
Downloading From GitHub Without Git
Periodically download the latest copy from github using the "Downloads" link on the webpage: https://github.com/statgen/bamUtil/archives/master.
The downloaded tar file is named "statgen-bamUtil-someHexNumber.tar.gz". The directory created when it is untarred shares the same base name. I recommend that you do not change the name of the directory. If you want one called bamUtil, create a link to this directory. The hex number in the directory name identifies the version of the repository that you downloaded and is needed to troubleshoot any issues you encounter. If you must rename the directory, be sure to record the hex number from the download for future reference.
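The linking advice above can be sketched as a small shell helper. This is only an illustration: the function name link_bamutil and the file bamUtil-version.txt are hypothetical, and the helper assumes the directory name ends in "-" followed by the hex number, as in the naming pattern described above.

```shell
# Hypothetical helper, not part of bamUtil.
# Usage: link_bamutil statgen-bamUtil-someHexNumber
link_bamutil() {
    src_dir=$1
    # Create a stable name, bamUtil, that points at the versioned directory.
    ln -s "$src_dir" bamUtil &&
    # Keep the trailing identifier (text after the last '-') for troubleshooting.
    printf '%s\n' "${src_dir##*-}" > bamUtil-version.txt
}
```

Run it from the directory that contains the untarred download, passing the actual directory name. The original directory keeps its name, so the hex number stays visible in both places.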
Building
After obtaining the bamUtil repository (either by download or from github), compile the code using make all. This creates the executable, bam, in the bamUtil/bin/ directory, the debug executable in the bamUtil/bin/debug/ directory, and the profiling executable in the bamUtil/bin/profile/ directory.
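After a build, a quick check of the locations named above can confirm that make all finished. The following is a minimal sketch: check_build is a hypothetical helper, not something shipped with bamUtil, and it only tests that bin/bam is executable and that the debug and profile directories exist.

```shell
# Illustrative helper (not part of bamUtil): report whether the build
# outputs named in this section are present under a bamUtil checkout.
check_build() {
    root=${1:-bamUtil}   # path to the repository; defaults to ./bamUtil
    status=0
    if [ -x "$root/bin/bam" ]; then
        echo "found: $root/bin/bam"
    else
        echo "missing: $root/bin/bam"
        status=1
    fi
    for variant in debug profile; do
        if [ -d "$root/bin/$variant" ]; then
            echo "found: $root/bin/$variant/"
        else
            echo "missing: $root/bin/$variant/"
            status=1
        fi
    done
    return "$status"
}
```

Calling check_build pathToYourCopy/bamUtil prints one line per location and returns a non-zero status if anything is missing.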
Programs
The software reads the beginning of an input file to determine whether it is SAM or BAM. To determine the format of the output file, the software checks the output file's extension: if the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file.
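The output-format rule can be mirrored in a few lines of shell. This is a sketch of the rule as stated here, not bamUtil's actual implementation, and output_format is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: mirrors the documented extension rule,
# not the code inside the bam executable.
output_format() {
    case "$1" in
        *.bam) echo "BAM" ;;   # extension is ".bam"
        *)     echo "SAM" ;;   # anything else
    esac
}

output_format aligned.bam   # prints BAM
output_format aligned.sam   # prints SAM
output_format aligned.txt   # prints SAM
```

In practice this means the output file name alone selects the format. Assuming the convert program accepts --in and --out options, a call such as bam convert --in in.sam --out out.bam would be expected to write BAM.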
The bam executable has the following functions.
- Rewrite SAM/BAM Files
- convert - Read a SAM/BAM file and write as a SAM/BAM file (optionally converts between '=' & bases in the sequence)
- splitChromosome - Split BAM by Chromosome
- writeRegion - Write the alignments in the indexed BAM file that fall into the specified region and/or have the specified read name
- filter - Filter reads by clipping ends with too high of a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high
- revert - Revert SAM/BAM, replacing the specified fields with their previous values (if known) and removing specified tags
- squeeze - Reduce file size by dropping OQ fields, duplicates, and specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers
- findCigars - Output just the reads that contain any of the specified CIGAR operations
- readIndexedBam - Read an indexed BAM file one reference id at a time, from -1 to the max reference id, and write it out as a SAM/BAM file
- Informational Tools
- Print Information in Readable Form:
This executable is built using C++ Library: libStatGen.
Just running ./bam will print the Usage information for the bam executable.