Line 10: |
Line 10: |
| == Getting Help == | | == Getting Help == |
| | | |
− | If you have any questions please use the [http://groups.google.com/group/bamUtils bamUtil Google Group] to ask questions or recommend improvements to bamUtil. | + | If you have any questions please use the [https://github.com/statgen/bamUtil bamUtil GitHub page] to raise an issue. |
− | | |
− | Alternatively, you can e-mail me, Mary Kate Wing, at mktrost@umich.edu.
| |
| | | |
| + | Check the [[BamUtil: FAQ]] to see whether your question has already been answered. |
| | | |
| == Where to Find It == | | == Where to Find It == |
Line 27: |
Line 26: |
| | | |
| To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all. | | To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all. |
| + | |
| + | For version 1.0.14 and later, please download libStatGen and bamUtil separately: |
| | | |
| | | |
− | [[Media:BamUtilLibStatGen.1.0.9.tgz|BamUtilLibStatGen.1.0.9.tgz]] - Released 7/7/2013 | + | '''Version 1.0.14 - Released 7/8/2015''' |
| + | *[[LibStatGen Download#Official Releases|libStatGen version 1.0.14]] |
| + | *[[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.14]] |
| | | |
− | '''BamUtilLibStatGen.1.0.9 Release Notes'''
| |
− | * Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]]
| |
− | * Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.9]]
| |
− | * Update to [[BamUtil: mergeBam|mergeBam]]
| |
− | ** Update to ignore PG lines with duplicate IDs
| |
− | ** Update to accept merges of matching RG lines
| |
− | ** Update to log to stderr if no log/out file is specified
| |
| | | |
| '''Older Releases''' | | '''Older Releases''' |
− | * There is no version 1.0.8. It was skipped to stya in line with libStatGen versions (libStatGen 1.0.8 added vcf support) | + | * [[Media:BamUtilLibStatGen.1.0.13.tgz|BamUtilLibStatGen.1.0.13.tgz]] - Released 2/20/2015 |
| + | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.13]] - see link for version updates |
| + | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.13]] - see link for version updates |
| + | |
| + | |
| + | * [[Media:BamUtilLibStatGen.1.0.12.tar.gz|BamUtilLibStatGen.1.0.12.tgz]] - Released 5/14/2014 |
| + | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.12]] - see link for version updates |
| + | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.12]] - see link for version updates |
| + | ** Adds regions to [[BamUtil: mergeBam|mergeBam]] |
| + | ** Accept ',' delimiters for the tags string input in [[BamUtil: squeeze|squeeze]], [[BamUtil: revert|revert]], & [[BamUtil: diff|diff]] |
| + | |
| + | *[[Media:BamUtilLibStatGen.1.0.11.tar.gz|BamUtilLibStatGen.1.0.11.tar.gz]] - Released 2/28/2014 |
| + | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.11]] - see link for version updates |
| + | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.11]] - see link for version updates |
| + | ** Now properly supports 'B' & 'f' tags |
| + | ** Cleanup - compile issues |
| + | |
| + | *[[Media:BamUtilLibStatGen.1.0.10.tar.gz|BamUtilLibStatGen.1.0.10.tar.gz]] - Released 1/2/2014 |
| + | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.10]] - see link for version updates |
| + | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.10]] - see link for version updates |
| + | ** Adds PhoneHome/Version checking. |
| + | |
| + | *[[Media:BamUtilLibStatGen.1.0.9.tgz|BamUtilLibStatGen.1.0.9.tgz]] - Released 7/7/2013 |
| + | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]] |
| + | ** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.9]] |
| + | ** Update to [[BamUtil: mergeBam|mergeBam]] |
| + | *** Update to ignore PG lines with duplicate IDs |
| + | *** Update to accept merges of matching RG lines |
| + | *** Update to log to stderr if no log/out file is specified |
| + | * There is no version 1.0.8. It was skipped to stay in line with libStatGen versions (libStatGen 1.0.8 added vcf support) |
| *[[Media:BamUtilLibStatGen.1.0.7.tgz|BamUtilLibStatGen.1.0.7.tgz]] - Released 1/29/2013 | | *[[Media:BamUtilLibStatGen.1.0.7.tgz|BamUtilLibStatGen.1.0.7.tgz]] - Released 1/29/2013 |
| ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]] | | ** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]] |
Line 83: |
Line 108: |
| **Initial release of bamUtil that includes libStatGen version 1.0.0. It started from the tool found in the deprecated StatGen repository. | | **Initial release of bamUtil that includes libStatGen version 1.0.0. It started from the tool found in the deprecated StatGen repository. |
| **Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.0]] [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]] | | **Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.0]] [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]] |
− |
| |
| | | |
| === Release of just BamUtil (does not include libStatGen) === | | === Release of just BamUtil (does not include libStatGen) === |
Line 89: |
Line 113: |
| To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all. | | To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all. |
| | | |
− | * [[Media:BamUtil.1.0.9.tgz|BamUtil.1.0.9.tgz]] - Released 7/7/2013 | + | '''BamUtil.1.0.14 Release Notes''' |
| + | * BamUtil Version 1.0.14 - Released 7/8/2015 |
| + | ** https://github.com/statgen/bamUtil/archive/v1.0.14.tar.gz |
| + | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.14]] |
| + | ** Update [[BamUtil: trimBam|trimBam]] |
| + | *** Add option to soft clip (-c) instead of trimming |
| + | ** Update [[BamUtil: clipOverlap|clipOverlap]] |
| + | *** Add option to mark reads as unmapped if they are entirely clipped |
| + | ** Update to [[BamUtil: bam2FastQ|bam2FastQ]] |
| + | *** Add option to gzip the output files |
| + | *** Add option to split Read Groups into separate fastq files |
| + | *** Add option to get the quality from a tag |
| + | ** Update [[BamUtil: recab|recab]] |
| + | *** Update to ignore ref 'N' when building the recalibration table |
| + | *** Add ability to bin |
| + | ** Add Dedup_LowMem tool |
| + | |
| + | '''Older Releases''' |
| + | * BamUtil Version 1.0.13 - Released 2/20/2015 |
| + | ** https://github.com/statgen/bamUtil/archive/v1.0.13.tar.gz |
| + | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.13]] |
| + | ** Makefile Updates |
| + | *** Improve logic to determine actual path for the library |
| + | *** Update to append to USER_COMPILE_VARS even if specified on the command line |
| + | ** Update [[BamUtil: writeRegion|writeRegion]] |
| + | *** Add option to specify readnames to keep in a file |
| + | *** Fixed bug that if a read overlapped 2 BED positions, it was printed twice |
| + | ** Update to [[BamUtil: bam2FastQ|bam2FastQ]] |
| + | *** Update to skip non-primary reads |
| + | ** Update to [[BamUtil: polishBam|polishBam]] |
| + | *** Update to handle '\t' string inputs and to add CO option |
| + | *** Fix MD5sum calculation to convert fasta to uppercase prior to calculating |
| + | |
| + | * [[Media:BamUtil.1.0.12.tgz|BamUtil.1.0.12.tgz]] - Released 5/14/2014 |
| + | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.12]] |
| + | ** Update [[BamUtil: mergeBam|mergeBam]] |
| + | *** Add a regions option |
| + | ** Update to [[BamUtil: squeeze|squeeze]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]] |
| + | *** Also accept ',' instead of just ';' as the delimiter in the input tags string. |
| | | |
− | '''BamUtil.1.0.9 Release Notes'''
| + | * [[Media:BamUtil.1.0.11.tgz|BamUtil.1.0.11.tgz]] - Released 2/28/2014 |
− | * Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]] (version 1.0.7 should also work) | + | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.11]] |
− | * Update to [[BamUtil: mergeBam|mergeBam]] | + | *** Adds support for 'B' & 'f' tags that did not work properly before. |
− | ** Update to ignore PG lines with duplicate IDs | + | ** Update [[BamUtil: splitBam|splitBam]] & [[BamUtil: polishBam|polishBam]] |
− | ** Update to accept merges of matching RG lines | + | *** Update to work properly if log & output file are not specified (no longer creates '.log') |
− | ** Update to log to stderr if no log/out file is specified | + | ** Update Main dummy/example tool to indicate the correct tool |
| + | ** Update to [[BamUtil: bam2FastQ|bam2FastQ]], [[BamUtil: clipOverlap|clipOverlap]], [[BamUtil: filter|filter]], [[BamUtil: mergeBam|mergeBam]], [[BamUtil: splitBam|splitBam]], [[BamUtil: squeeze|squeeze]], [[BamUtil: stats|stats]] |
| + | *** Cleanup usage/parameter descriptions |
| + | ** Update [[BamUtil: revert|revert]] |
| + | *** Update compatibility with libStatGen due to 'B' & 'f' tag handling updates |
| + | ** Add tests for 'B' & 'f' tags |
| | | |
| + | * [[Media:BamUtil.1.0.10.tar.gz|BamUtil.1.0.10.tar.gz]] - Released 1/2/2014 |
| + | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.10]] |
| + | ** All |
| + | *** Add PhoneHome/version checking |
| + | *** Make sub-program names case independent |
| + | *** Fix Logger.cpp compiler warning |
| + | ** Adds: [[BamUtil: explainFlags|explainFlags]] - describes the SAM/BAM flags based on the flag value |
| + | ** Update to [[BamUtil: stats|stats]] |
| + | *** Fix Stats to not try to process a record after it is out of the loop (it would already have been processed or is invalid) |
| + | ** Update to [[BamUtil: splitBam|splitBam]] |
| + | *** fix description of --noeof option |
| + | ** Update to [[BamUtil: writeRegion|writeRegion]] |
| + | *** add exclude/required flags |
| + | ** Update to [[BamUtil: dedup|dedup]] & [[BamUtil: recab|recab]] |
| + | *** Ignore secondary reads for dedup and making the recalibration table. |
| + | *** skip QC Failures |
| + | *** add excludeFlags parameters |
| + | ** Update to [[BamUtil: clipOverlap|clipOverlap]] |
| + | *** add exclude flags |
| + | *** fix bug for readName sorted when a read is filtered due to flags |
| + | *** add sorting validation |
| + | ** Update to [[BamUtil: bam2FastQ|bam2FastQ]] |
| + | *** add --merge option to generate interleaved files. |
| + | *** update to open the input file before opening the output files, so if there is an error, the outputs aren't opened |
| + | ** Update to [[BamUtil: mergeBam|mergeBam]] |
| + | *** add option to ignore the RG PI field when checking headers |
| + | *** add more informative header merge error messages |
| + | |
| + | * [[Media:BamUtil.1.0.9.tgz|BamUtil.1.0.9.tgz]] - Released 7/7/2013 |
| + | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]] (version 1.0.7 should also work) |
| + | ** Update to [[BamUtil: mergeBam|mergeBam]] |
| + | *** Update to ignore PG lines with duplicate IDs |
| + | *** Update to accept merges of matching RG lines |
| + | *** Update to log to stderr if no log/out file is specified |
| | | |
− | '''Older Releases'''
| |
| *[[Media:BamUtil.1.0.7.tgz|BamUtil.1.0.7.tgz]] - Released 1/29/2013 | | *[[Media:BamUtil.1.0.7.tgz|BamUtil.1.0.7.tgz]] - Released 1/29/2013 |
| ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]] or above | | ** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]] or above |
Line 153: |
Line 253: |
| **Initial release of just bamUtil. It started from the tool found in the deprecated StatGen repository. | | **Initial release of just bamUtil. It started from the tool found in the deprecated StatGen repository. |
| **Contains: [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]] | | **Contains: [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]] |
| + | |
| + | == Citation == |
| + | If you use BamUtil, please cite our publication on GotCloud which includes BamUtil: |
| + | [http://genome.cshlp.org/content/early/2015/04/14/gr.176552.114.abstract Jun, Goo, et al. "An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data." Genome research (2015): gr-176552.] |
| + | |
| | | |
| = Programs = | | = Programs = |
Line 158: |
Line 263: |
| The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file. | | The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file. |
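The output-format rule described above can be illustrated with a minimal sketch. This is not bamUtil's own code: the function name <code>output_format</code> is hypothetical, and the sketch assumes the extension comparison is case-sensitive, which the text does not state.

```python
def output_format(output_path: str) -> str:
    """Pick the output format from the file name, per the rule above.

    Hypothetical helper for illustration only; bamUtil itself is a C++ tool.
    Assumption: the ".bam" comparison is case-sensitive.
    """
    # A ".bam" extension selects BAM; every other name falls back to SAM.
    return "BAM" if output_path.endswith(".bam") else "SAM"


if __name__ == "__main__":
    for name in ("aligned.bam", "aligned.sam", "aligned.txt"):
        print(name, "->", output_format(name))
```

Under this rule any name that does not end in ".bam", including one with no extension at all, is written as SAM.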
| | | |
− | The bam executable has the following functions.
| + | {{BamUtilPrograms}} |
− | | |
− | | |
− | * Rewrite SAM/BAM Files
| |
− | ** [[BamUtil: convert|'''convert''' - Read a SAM/BAM file and write as a SAM/BAM file (optionally converts between '=' & bases in the sequence)]]
| |
− | ** [[BamUtil: writeRegion|'''writeRegion''' - Write the alignments in the indexed BAM file that fall into the specified region and/or have the specified read name]]
| |
− | ** [[BamUtil: splitChromosome|'''splitChromosome''' - Split BAM by Chromosome]]
| |
− | ** [[BamUtil: splitBam|'''splitBam''' - Split SAM/BAM file by Read Group]]
| |
− | ** [[BamUtil: findCigars|'''findCigars''' - Output just the reads that contain any of the specified CIGAR operations]]
| |
− | | |
− | * Modify & write SAM/BAM Files
| |
− | ** [[BamUtil: clipOverlap|'''clipOverlap''' - Clip overlapping read pairs so they do not overlap]]
| |
− | ** [[BamUtil: filter|'''filter''' - Filter reads by clipping ends with too high of a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high]]
| |
− | ** [[BamUtil: revert|'''revert''' - Revert SAM/BAM replacing the specified fields with their previous values (if known) and removes specified tags]]
| |
− | ** [[BamUtil: squeeze|'''squeeze''' - Reduce file size by dropping OQ fields, duplicates, specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers]]
| |
− | ** [[BamUtil: trimBam|'''trimBam''' - Trim end of reads, changing read ends to ‘N’ & quality to ‘!’]]
| |
− | **[[BamUtil: polishBam|'''polishBam''' – Add/Update header lines & add RG tag to each record]]
| |
− | **[[BamUtil: rgMergeBam|'''rgMergeBam''' – Merge sorted BAM files adding Read Groups]]
| |
− | **[[BamUtil: dedup|'''dedup''' – Mark or remove duplicates, can also perform recalibration]]
| |
− | **[[BamUtil: recab|'''recab''' - Recalibrate base qualities]]
| |
− | | |
− | * Informational Tools
| |
− | ** [[BamUtil: validate|'''validate''' - Read and Validate a SAM/BAM file]]
| |
− | ** [[BamUtil: diff|'''diff''' - Print the diffs between 2 bams]]
| |
− | ** [[BamUtil: stats|'''stats''' - Print some basic statistics on a SAM/BAM file]]
| |
− | ** [[BamUtil: gapInfo|'''gapInfo''' - Print information on the gap between read pairs in a SAM/BAM file]]
| |
− | | |
− | * Print Information in Readable Form:
| |
− | ** [[BamUtil: dumpHeader|'''dumpHeader''' - Print SAM/BAM header]]
| |
− | ** [[BamUtil: dumpRefInfo|'''dumpRefInfo''' - Print SAM/BAM Reference Information]]
| |
− | ** [[BamUtil: dumpIndex|'''dumpIndex''' - Dump a BAM index file into an easy to read text version]]
| |
− | ** [[BamUtil: readReference|'''readReference''' - Print the reference string for the specified region]]
| |
− | | |
− | *Additional Tools
| |
− | ** [[BamUtil: bam2FastQ|'''bam2FastQ''' - Convert the specified BAM file to fastQs]]
| |
− | | |
− | * Dummy/Example Tools:
| |
− | ** [[BamUtil: readIndexedBam|'''readIndexedBam''' - Read an indexed BAM file reference by reference id -1 to the max reference id and write it out as a SAM/BAM file]]
| |
− | | |
− | This executable is built using [[C++ Library: libStatGen]].
| |
− | | |
− | Just running ./bam will print the Usage information for the bam executable.
| |