Changes

From Genome Analysis Wiki

BamUtil

11,605 bytes added, 17:14, 11 September 2021
bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, <code>bam</code>.
 
== Getting Help ==
 
If you have any questions, please use the [https://github.com/statgen/bamUtil bamUtil GitHub page] to raise an issue.
 
See [[BamUtil: FAQ]] to see if your question has already been answered.
== Where to Find It ==
{{ToolGitRepo|repoName=bamUtil}}
The bamUtil repository is available both via release downloads and via GitHub.
On GitHub, you can both browse and download the latest version of the repository as well as explore the history of changes.
You can access the latest version with or without git.
If you download from GitHub or use git to keep up to date, you also need to download our library: [[C++ Library: libStatGen|libStatGen]].
=== Using github ===
==== Using Git To Track the Current Development Version ====
===== Clone (get your own copy) =====
You can create your own git clone (copy) using:
 git clone https://github.com/statgen/bamUtil.git
or
 git clone git://github.com/statgen/bamUtil.git
Either of these commands creates a directory called <code>bamUtil</code> in the current directory.

Then just <code>cd bamUtil</code> and [[BamUtil#Building|compile]].
===== Get the latest Updates (update your copy) =====
To update your copy to the latest version (a major advantage of using git):
# <code>cd pathToYourCopy/bamUtil</code>
# <code>make clean</code>
# <code>git pull</code>
# <code>make all</code>
===== Git Refresher =====
If you decide to use git, but need a refresher, see [[How To Use Git]] or [https://statgen.sph.umich.edu/wiki/How_To_Use_Git Notes on how to use git] (if you have access).
==== Downloading From GitHub Without Git ====
Periodically download the latest copy from GitHub from the "Downloads" link on the webpage: https://github.com/statgen/bamUtil/archives/master

The downloaded tar file is named "statgen-bamUtil-someHexNumber.tar.gz". The directory created when it is untarred shares the same base name. I recommend that you do not change the name of the directory. If you want one called bamUtil, create a link to this directory.

The hex number in the directory name identifies the version of the repository that you downloaded and is necessary to easily troubleshoot any issues you encounter. If you must rename the directory, be sure to record the hex number that was on the download for future reference.
== Building ==
After obtaining the bamUtil repository (either by download or from GitHub), compile the code using <code>make all</code>. This creates the executable, <code>bam</code>, in the <code>bamUtil/bin/</code> directory, the debug executable in the <code>bamUtil/bin/debug/</code> directory, and the profiling executable in the <code>bamUtil/bin/profile/</code> directory.
== Releases ==
If you prefer to run the last official release rather than the latest development version, you can download that here.

There are two versions of the release, one that includes libStatGen and one that does not. If you already have libStatGen installed and want to use your own copy, use the version that does not include libStatGen. If you download the version without libStatGen included, you will also need to download libStatGen separately.
=== Full Release (includes libStatGen) ===
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all.

For version 1.0.14 and later, please download libStatGen and bamUtil separately:

'''Version 1.0.14 - Released 7/8/2015'''
*[[LibStatGen Download#Official Releases|libStatGen version 1.0.14]]
*[[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.14]]

'''Older Releases'''
* [[Media:BamUtilLibStatGen.1.0.13.tgz|BamUtilLibStatGen.1.0.13.tgz]] - Released 2/20/2015
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.13]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.13]] - see link for version updates
* [[Media:BamUtilLibStatGen.1.0.12.tar.gz|BamUtilLibStatGen.1.0.12.tgz]] - Released 5/14/2014
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.12]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.12]] - see link for version updates
** Adds regions to [[BamUtil: mergeBam|mergeBam]]
** Accept ',' delimiters for the tags string input in [[BamUtil: squeeze|squeeze]], [[BamUtil: revert|revert]], & [[BamUtil: diff|diff]]
*[[Media:BamUtilLibStatGen.1.0.11.tar.gz|BamUtilLibStatGen.1.0.11.tar.gz]] - Released 2/28/2014
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.11]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.11]] - see link for version updates
** Now properly supports 'B' & 'f' tags
** Cleanup - compile issues
*[[Media:BamUtilLibStatGen.1.0.10.tar.gz|BamUtilLibStatGen.1.0.10.tar.gz]] - Released 1/2/2014
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.10]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.10]] - see link for version updates
** Adds PhoneHome/Version checking.
*[[Media:BamUtilLibStatGen.1.0.9.tgz|BamUtilLibStatGen.1.0.9.tgz]] - Released 7/7/2013
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.9]]
** Update to [[BamUtil: mergeBam|mergeBam]]
*** Update to ignore PG lines with duplicate IDs
*** Update to accept merges of matching RG lines
*** Update to log to stderr if no log/out file is specified
* There is no version 1.0.8. It was skipped to stay in line with libStatGen versions (libStatGen 1.0.8 added vcf support)
*[[Media:BamUtilLibStatGen.1.0.7.tgz|BamUtilLibStatGen.1.0.7.tgz]] - Released 1/29/2013
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.7]]
** Update to fix some compile issues on ubuntu 12.10
** Update use of SamRecord::getStringTag to expect the return of a const string pointer due to libStatGen v1.0.7 updates
** Update SamReferenceInfo usage due to libStatGen v1.0.7 updates
** Update to [[BamUtil: diff|diff]]
*** Fix DIFF to test and properly handle running out of available records. Previously no message was printed when this happened and there was a bug for which file it freed
** Update to [[BamUtil: clipOverlap|clipOverlap]]
*** Update to facilitate adding other overlap handling functions
** Update to [[BamUtil: mergeBam|mergeBam]] (formerly RGMergeBam)
*** Rename RGMergeBam to MergeBam
*** Update to handle files that already have an RG
*[[Media:BamUtilLibStatGen.1.0.6.tgz|BamUtilLibStatGen.1.0.6.tgz]] - Released 11/14/2012
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.6]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.6]]
** Update to [[BamUtil: trimBam|trimBam]]
*** Update to allow trimming a different number of bases from each end of the read
*[[Media:BamUtilLibStatGen.1.0.5.tgz|BamUtilLibStatGen.1.0.5.tgz]] - Released 10/24/2012
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.5]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.5]]
** Updates to: [[BamUtil: dedup|dedup]], [[BamUtil: polishBam|polishBam]], [[BamUtil: recab|recab]]
** Update to add compile option to compile without C++0x/C++11
** See [[#Release of just BamUtil (does not include libStatGen)|below]] for additional details on updates
*BamUtilLibStatGen.1.0.4.tgz - Release skipped
*[[Media:BamUtilLibStatGen.1.0.3.tgz|BamUtilLibStatGen.1.0.3.tgz]] - Released 09/19/2012
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.3]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.3]]
** Adds: [[BamUtil: dedup|dedup]] [[BamUtil: recab|recab]]
*[[Media:BamUtilLibStatGen.1.0.2.tgz|BamUtilLibStatGen.1.0.2.tgz]] - Released 05/16/2012
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.2]]
** Adds: [[BamUtil: bam2FastQ|bam2FastQ]]
*[[Media:BamUtilLibStatGen.1.0.1.tgz|BamUtilLibStatGen.1.0.1.tgz]] - Released 05/04/2012
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.1]]
** Adds: [[BamUtil: splitBam|splitBam]], [[BamUtil: clipOverlap|clipOverlap]], [[BamUtil: trimBam|trimBam]], [[BamUtil: polishBam|polishBam]], [[BamUtil: rgMergeBam|rgMergeBam]], [[BamUtil: gapInfo|gapInfo]]
** Adds additional functionality to [[BamUtil: stats|stats]]
** Adds leftShifting to [[BamUtil: writeRegion|writeRegion]] and [[BamUtil: convert|convert]]
** Adds more diff fields to [[BamUtil: diff|diff]]
* [[Media:BamUtilLibStatGen.1.0.0.tgz|BamUtilLibStatGen.1.0.0.tgz]] - Released 10/10/2011
** Initial release of bamUtil that includes libStatGen version 1.0.0. It started from the tool found in the deprecated StatGen repository.
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.0]]
** Contains: [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]]
=== Release of just BamUtil (does not include libStatGen) ===
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all.
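The unpack-and-build steps above can be sketched as a small shell helper. This is an illustration only: the function names <code>release_url</code> and <code>install_release</code> are hypothetical, the URL pattern is inferred from the v1.0.13 and v1.0.14 links in the release notes, and the sketch assumes that <code>curl</code>, <code>tar</code>, and <code>make</code> are installed and that libStatGen is already built where bamUtil's Makefile expects it.

```shell
#!/bin/sh
# Hypothetical helpers; nothing here ships with bamUtil.

# Print the archive URL for a tagged release. The pattern is inferred from
# the v1.0.13 and v1.0.14 links and may not hold for other versions.
release_url() {
    echo "https://github.com/statgen/bamUtil/archive/v$1.tar.gz"
}

# Download, unpack, and build one release.
# Assumes libStatGen is already available, since these releases require it
# but do not include it.
install_release() {
    version=$1
    dest="bamUtil_$version"
    mkdir -p "$dest" &&
    # --strip-components=1 drops the archive's top-level directory so the
    # sources land in $dest whatever that directory is called.
    curl -L "$(release_url "$version")" | tar xzf - -C "$dest" --strip-components=1 &&
    (cd "$dest" && make all)
}
```

Calling <code>install_release 1.0.14</code> would fetch and build that release in <code>bamUtil_1.0.14</code>; the functions do nothing until they are called.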
'''BamUtil.1.0.14 Release Notes'''
* BamUtil Version 1.0.14 - Released 7/8/2015
** https://github.com/statgen/bamUtil/archive/v1.0.14.tar.gz
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.14]]
** Update [[BamUtil: trimBam|trimBam]]
*** Add option to soft clip (-c) instead of trimming
** Update [[BamUtil: clipOverlap|clipOverlap]]
*** Add option to mark reads as unmapped if they are entirely clipped
** Update to [[BamUtil: bam2FastQ|bam2FastQ]]
*** Add option to gzip the output files
*** Add option to split Read Groups into separate fastq files
*** Add option to get the quality from a tag
** Update [[BamUtil: recab|recab]]
*** Update to ignore ref 'N' when building the recalibration table
*** Add ability to bin
** Add Dedup_LowMem tool
'''Older Releases'''
* BamUtil Version 1.0.13 - Released 2/20/2015
** https://github.com/statgen/bamUtil/archive/v1.0.13.tar.gz
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.13]]
** Makefile Updates
*** Improve logic to determine actual path for the library
*** Update to append to USER_COMPILE_VARS even if specified on the command line
** Update [[BamUtil: writeRegion|writeRegion]]
*** Add option to specify readnames to keep in a file
*** Fixed bug that if a read overlapped 2 BED positions, it was printed twice
** Update to [[BamUtil: bam2FastQ|bam2FastQ]]
*** Update to skip non-primary reads
** Update to [[BamUtil: polishBam|polishBam]]
*** Update to handle '\t' string inputs and to add CO option
*** Fix MD5sum calculation to convert fasta to uppercase prior to calculating
* [[Media:BamUtil.1.0.12.tgz|BamUtil.1.0.12.tgz]] - Released 5/14/2014
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.12]]
** Update [[BamUtil: mergeBam|mergeBam]]
*** Add a regions option
** Update to [[BamUtil: squeeze|squeeze]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]]
*** Also accept ',' instead of just ';' as the delimiter in the input tags string.
* [[Media:BamUtil.1.0.11.tgz|BamUtil.1.0.11.tgz]] - Released 2/28/2014
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.11]]
*** Adds support for 'B' & 'f' tags that did not work properly before.
** Update [[BamUtil: splitBam|splitBam]] & [[BamUtil: polishBam|polishBam]]
*** Update to work properly if log & output file are not specified (no longer creates '.log')
** Update Main dummy/example tool to indicate the correct tool
** Update to [[BamUtil: bam2FastQ|bam2FastQ]], [[BamUtil: clipOverlap|clipOverlap]], [[BamUtil: filter|filter]], [[BamUtil: mergeBam|mergeBam]], [[BamUtil: splitBam|splitBam]], [[BamUtil: squeeze|squeeze]], [[BamUtil: stats|stats]]
*** Cleanup usage/parameter descriptions
** Update [[BamUtil: revert|revert]]
*** Update compatibility with libStatGen due to 'B' & 'f' tag handling updates
** Add tests for 'B' & 'f' tags
* [[Media:BamUtil.1.0.10.tar.gz|BamUtil.1.0.10.tar.gz]] - Released 1/2/2014
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.10]]
** All
*** Add PhoneHome/version checking
*** Make sub-program names case independent
*** Fix Logger.cpp compiler warning
** Adds: [[BamUtil: explainFlags|explainFlags]] - describes the SAM/BAM flags based on the flag value
** Update to [[BamUtil: stats|stats]]
*** Fix Stats to not try to process a record after it is out of the loop (it would already have been processed or is invalid)
** Update to [[BamUtil: splitBam|splitBam]]
*** fix description of --noeof option
** Update to [[BamUtil: writeRegion|writeRegion]]
*** add exclude/required flags
** Update to [[BamUtil: dedup|dedup]] & [[BamUtil: recab|recab]]
*** Ignore secondary reads for dedup and making the recalibration table.
*** skip QC Failures
*** add excludeFlags parameters
** Update to [[BamUtil: clipOverlap|clipOverlap]]
*** add exclude flags
*** fix bug for readName sorted when a read is filtered due to flags
*** add sorting validation
** Update to [[BamUtil: bam2FastQ|bam2FastQ]]
*** add --merge option to generate interleaved files.
*** update to open the input file before opening the output files, so if there is an error, the outputs aren't opened
** Update to [[BamUtil: mergeBam|mergeBam]]
*** add option to ignore the RG PI field when checking headers
*** add more informative header merge error messages
* [[Media:BamUtil.1.0.9.tgz|BamUtil.1.0.9.tgz]] - Released 7/7/2013
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]] (version 1.0.7 should also work)
** Update to [[BamUtil: mergeBam|mergeBam]]
*** Update to ignore PG lines with duplicate IDs
*** Update to accept merges of matching RG lines
*** Update to log to stderr if no log/out is specified
* [[Media:BamUtil.1.0.7.tgz|BamUtil.1.0.7.tgz]] - Released 1/29/2013
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]] or above
** Update to fix some compile issues on ubuntu 12.10
** Update use of SamRecord::getStringTag to expect the return of a const string pointer due to libStatGen v1.0.7 updates
** Update SamReferenceInfo usage due to libStatGen v1.0.7 updates
** Update to [[BamUtil: diff|diff]]
*** Fix DIFF to test and properly handle running out of available records. Previously no message was printed when this happened and there was a bug for which file it freed
** Update to [[BamUtil: clipOverlap|clipOverlap]]
*** Update to facilitate adding other overlap handling functions
** Update to [[BamUtil: mergeBam|mergeBam]] (formerly RGMergeBam)
*** Rename RGMergeBam to MergeBam
*** Update to handle files that already have an RG
*[[Media:BamUtil.1.0.6.tgz|BamUtil.1.0.6.tgz]] - Released 11/14/2012
** Update to [[BamUtil: trimBam|trimBam]]
*** Update to allow trimming a different number of bases from each end of the read
*[[Media:BamUtil.1.0.5.tgz|BamUtil.1.0.5.tgz]] - Released 10/24/2012
** Update to [[BamUtil: dedup|dedup]]
*** Update logic for which pair to keep if they have the same quality
** Update to [[BamUtil: polishBam|polishBam]]
*** Update to print the number of successful header additions
** Update to [[BamUtil: recab|recab]]
*** Update to print the number of bases skipped due to the base quality
** General Updates
*** Update to add compile option to compile without C++0x/C++11
*BamUtil.1.0.4.tgz - Release skipped
*[[Media:BamUtil.1.0.3.tgz|BamUtil.1.0.3.tgz]] - Released 09/19/2012
** Adds: [[BamUtil: dedup|dedup]] [[BamUtil: recab|recab]]
** General Updates
*** Update Logger to write to stderr if output is stdout
** Update to [[BamUtil: stats|stats]]
*** Add required/exclude flags
*** Exclude Clips if excluding unmapped
*** Add --withinRegion flag
*** Update phred/qual counts to be uint64_t instead of int to avoid overflow
** Update to [[BamUtil: validate|validate]]
*** Detect header failures
** Update to [[BamUtil: diff|diff]]
*** Update to specify chromosome/pos in ZP as a string rather than int so both can be shown
** Update to [[BamUtil: readReference|readReference]]
*** Output error message if the specified reference name is not found
** Update to [[BamUtil: splitChromosome|splitChromosome]]
*** Update to actually split the chromosomes and not just hard coded to output chromosomes ids 0-22
** Update Makefile to have cloneLib for cloning libStatGen
*[[Media:BamUtil.1.0.2.tgz|BamUtil.1.0.2.tgz]] - Released 05/16/2012
** Adds: [[BamUtil: bam2FastQ|bam2FastQ]]
*[[Media:BamUtil.1.0.1.tgz|BamUtil.1.0.1.tgz]] - Released 05/04/2012
** Adds: [[BamUtil: splitBam|splitBam]], [[BamUtil: clipOverlap|clipOverlap]], [[BamUtil: trimBam|trimBam]], [[BamUtil: polishBam|polishBam]], [[BamUtil: rgMergeBam|rgMergeBam]], [[BamUtil: gapInfo|gapInfo]]
** Adds additional functionality to [[BamUtil: stats|stats]]
** Adds leftShifting to [[BamUtil: writeRegion|writeRegion]] and [[BamUtil: convert|convert]]
** Adds more diff fields to [[BamUtil: diff|diff]]
*[[Media:BamUtil.1.0.0.tgz|BamUtil.1.0.0.tgz]] - Released 10/10/2011
** Initial release of just bamUtil. It started from the tool found in the deprecated StatGen repository.
** Contains: [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]]
== Citation ==
If you use BamUtil, please cite our publication on GotCloud which includes BamUtil: [http://genome.cshlp.org/content/early/2015/04/14/gr.176552.114.abstract Jun, Goo, et al. "An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data." Genome research (2015): gr-176552.]
= Programs =
The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file.

This executable is built using [[C++ Library: libStatGen]]. Just running ./bam will print the Usage information for the bam executable.

The bam executable has the following functions.
* Rewrite SAM/BAM Files
** [[BamUtil: convert|'''convert''' - Read a SAM/BAM file and write as a SAM/BAM file (optionally converts between '=' & bases in the sequence)]]
** [[BamUtil: splitChromosome|'''splitChromosome''' - Split BAM by Chromosome]]
** [[BamUtil: writeRegion|'''writeRegion''' - Write the alignments in the indexed BAM file that fall into the specified region and/or have the specified read name]]
** [[BamUtil: findCigars|'''findCigars''' - Output just the reads that contain any of the specified CIGAR operations]]
** [[BamUtil: readIndexedBam|'''readIndexedBam''' - Read an indexed BAM file reference by reference id -1 to the max reference id and write it out as a SAM/BAM file]]
* Modify & write SAM/BAM Files
** [[BamUtil: filter|'''filter''' - Filter reads by clipping ends with too high of a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high]]
** [[BamUtil: revert|'''revert''' - Revert SAM/BAM replacing the specified fields with their previous values (if known) and removes specified tags]]
** [[BamUtil: squeeze|'''squeeze''' - reduces files size by dropping OQ fields, duplicates, specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers]]
* Informational Tools
** [[BamUtil: validate|'''validate''' - Read and Validate a SAM/BAM file]]
** [[BamUtil: diff|'''diff''' - Print the diffs between 2 bams]]
** [[BamUtil: stats|'''stats''' - Generate statistics on a SAM/BAM file]]
* Print Information in Readable Form:
** [[BamUtil: dumpHeader|'''dumpHeader''' - Print SAM/BAM header]]
** [[BamUtil: dumpRefInfo|'''dumpRefInfo''' - Print SAM/BAM Reference Information]]
** [[BamUtil: dumpIndex|'''dumpIndex''' - Dump a BAM index file into an easy to read text version]]
** [[BamUtil: readReference|'''readReference''' - Print the reference string for the specified region]]
{{BamUtilPrograms}}
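The format rules described under Programs (input format read from the start of the file, output format chosen by file extension) can be illustrated with a short shell sketch. It is not bamUtil's own code: <code>output_format</code> and <code>input_format</code> are hypothetical names, and the input check assumes that a BAM file can be recognized by the two gzip magic bytes (0x1f 0x8b) that begin a BGZF-compressed file, which the page itself does not specify.

```shell
#!/bin/sh
# Hypothetical helpers that mimic the rules described above; not part of bamUtil.

# Output format follows the extension: ".bam" means BAM, anything else SAM.
output_format() {
    case "$1" in
        *.bam) echo "BAM" ;;
        *)     echo "SAM" ;;
    esac
}

# Input format is guessed from the first two bytes of the file.
# Assumption: BAM files start with the gzip magic bytes 1f 8b. A gzipped
# SAM file would also match, so this is only a rough check.
input_format() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "BAM"
    else
        echo "SAM"
    fi
}
```

For example, <code>output_format sample.bam</code> prints <code>BAM</code>, while any other file name, such as <code>sample.txt</code>, prints <code>SAM</code>.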
