Changes

From Genome Analysis Wiki

BamUtil

307 bytes removed, 17:14, 11 September 2021
no edit summary
[[Category:bamUtil]]
[[Category:C++]]
[[Category:Software]]
[[Category:StatGen Download]]
[[Category:BAM Software]]
= bamUtil Overview =
bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, <code>bam</code>.
= bam Executable =
When statgen is compiled, the SAM/BAM executable, "bam", is generated in the statgen/src/bin/ directory.
The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file.
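The output-format rule described above (".bam" extension means BAM, anything else means SAM) can be sketched in a few lines of Python. This is only an illustration of the stated rule, not bamUtil's own code; the function name <code>output_format</code> is hypothetical, and the exact, case-sensitive match on ".bam" is an assumption.

```python
from pathlib import Path


def output_format(out_path: str) -> str:
    """Pick the output format from the file extension.

    Hypothetical helper: mirrors the rule in the text, where only a
    ".bam" extension selects BAM and everything else falls back to SAM.
    The case-sensitive comparison is an assumption.
    """
    return "BAM" if Path(out_path).suffix == ".bam" else "SAM"


if __name__ == "__main__":
    for name in ("result.bam", "result.sam", "result"):
        print(name, "->", output_format(name))
```

Only the final extension is inspected in this sketch, so a name such as <code>result.bam.gz</code> would be treated as SAM.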
The bam executable has the following functions.
* [[C++ Executable: bam#validate|validate - Read and Validate a SAM/BAM file]]
* [[C++ Executable: bam#convert|convert - Read a SAM/BAM file and write as a SAM/BAM file]]
* [[C++ Executable: bam#dumpHeader|dumpHeader - Print SAM/BAM header]]
* [[C++ Executable: bam#splitChromosome|splitChromosome - Split BAM by Chromosome]]
* [[C++ Executable: bam#writeRegion|writeRegion - Write the alignments in the indexed BAM file that fall into the specified region]]
* [[C++ Executable: bam#dumpRefInfo|dumpRefInfo - Print SAM/BAM Reference Information]]
* [[C++ Executable: bam#dumpIndex|dumpIndex - Dump a BAM index file into an easy to read text version]]
* [[C++ Executable: bam#readIndexedBam|readIndexedBam - Read an indexed BAM file reference by reference id -1 to the max reference id and write it out as a SAM/BAM file]]
* [[C++ Executable: bam#filter|filter - Filter reads by clipping ends with too high of a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high]]
* [[C++ Executable: bam#readReference|readReference - Print the reference string for the specified region]]
* [[C++ Executable: bam#diff|diff - Print the diffs between 2 bams]]
This executable is built using [[StatGenLibrary: BAM]].
Just running ./bam will print the Usage information for the bam executable.
== Getting Help ==
If you have any questions, please use the [https://github.com/statgen/bamUtil bamUtil GitHub page] to raise an issue.
See [[BamUtil: FAQ]] to see if your question has already been answered.
== Where to Find It ==
{{ToolGitRepo|repoName=bamUtil}}
== Releases ==
If you prefer to run the last official release rather than the latest development version, you can download that here.
There are two versions of the release, one that includes libStatGen and one that does not. If you already have libStatGen installed and want to use your own copy, use the version that does not include libStatGen.
=== Full Release (includes libStatGen) ===
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all.
For version 1.0.14 and later, please download libStatGen and bamUtil separately:
'''Version 1.0.14 - Released 7/8/2015'''
*[[LibStatGen Download#Official Releases|libStatGen version 1.0.14]]
*[[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.14]]
== validate ==
The <code>validate</code> option on the bam executable reads and validates a SAM/BAM file. This option is documented at: [[BamValidator]]
== convert ==
The <code>convert</code> option on the bam executable reads a SAM/BAM file and writes it as a SAM/BAM file.
The executable converts the input file into the format of the output file. So if you want to convert a BAM file to a SAM file, from the pipeline/bam/ directory you just call:
 ./bam convert --in <bamFile>.bam --out <newSamFile>.sam
Don't forget to put in the paths to the executable and your test files.
=== Sequence Representation ===
The sequence parameter options specify how to represent the sequence if the reference is specified (refFile option). If the reference is not specified or seqOrig is specified, no modifications are made to the sequence. If the reference and seqBases are specified, any matches between the sequence and the reference are represented in the sequence as the appropriate base. If the reference and seqEquals are specified, any matches between the sequence and the reference are represented in the sequence as '='.
==== Examples ====
 ExtendedCigar: SSMMMDDMMMIMNNNMPMSSS
 Sequence:      AATAA  CTAGA   T AGGG
 Reference:       TAACCCTA ACCCT A
 Sequence with Orig:   AATAACTAGATAGGG
 Sequence with Bases:  AATAACTAGATAGGG
 Sequence with Equals: AA======G===GGG

 ExtendedCigar: SSMMMDDMMMIMNNNMPMSSS
 Sequence:      AATGA  CTGGA   T AGGG
 Reference:       TAACCCTA ACCCT A
 Sequence with Orig:   AATGACTGGATAGGG
 Sequence with Bases:  AATGACTGGATAGGG
 Sequence with Equals: AA=G===GG===GGG

 ExtendedCigar: SSMMMDDMMMIMNNNMPMSSS
 Sequence:      AAT=A  CT=GA   T AGGG
 Reference:       TAACCCTA ACCCT A
 Sequence with Orig:   AAT=ACT=GATAGGG
 Sequence with Bases:  AATAACTAGATAGGG
 Sequence with Equals: AA======G===GGG

 ExtendedCigar: SSMMMDDMMMIMNNNMPMSSS
 Sequence:      AA===  ===G=   = =GGG
 Reference:       TAACCCTA ACCCT A
 Sequence with Orig:   AA======G===GGG
 Sequence with Bases:  AATAACTAGATAGGG
 Sequence with Equals: AA======G===GGG
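The sequence representations in the examples above can be reproduced with a short Python sketch. It illustrates the rule as described in the text and is not the bamUtil implementation: the function name <code>convert_sequence</code>, the <code>mode</code> strings, and the use of an already expanded CIGAR (one operation character per position, as in the examples) are assumptions made for the illustration. Only 'M' positions are compared with the reference; 'S' and 'I' bases are copied unchanged, and 'D', 'N' and 'P' contribute no read bases.

```python
def convert_sequence(ext_cigar: str, seq: str, ref: str, mode: str = "orig") -> str:
    """Rewrite a read sequence as "orig", "bases" or "equals".

    Hypothetical helper for illustration only.
    ext_cigar: expanded CIGAR, one operation character per position.
    seq:       read bases as stored (may already contain '=').
    ref:       reference bases starting at the read's first aligned position.
    """
    if mode not in ("orig", "bases", "equals"):
        raise ValueError(f"unknown mode: {mode}")
    out = []
    q = 0  # next position in seq
    r = 0  # next position in ref
    for op in ext_cigar:
        if op == "M":
            base, ref_base = seq[q], ref[r]
            if mode == "bases" and base == "=":
                base = ref_base          # expand '=' to the reference base
            elif mode == "equals" and base.upper() == ref_base.upper():
                base = "="               # collapse a match to '='
            out.append(base)
            q += 1
            r += 1
        elif op in ("I", "S"):
            out.append(seq[q])           # read-only bases are copied as is
            q += 1
        elif op in ("D", "N"):
            r += 1                       # reference-only positions
        # 'H' and 'P' consume neither the read nor the reference
    return "".join(out)


if __name__ == "__main__":
    cigar = "SSMMMDDMMMIMNNNMPMSSS"
    reference = "TAACCCTAACCCTA"
    print(convert_sequence(cigar, "AATGACTGGATAGGG", reference, "equals"))
    print(convert_sequence(cigar, "AA======G===GGG", reference, "bases"))
```

The two calls in the <code>__main__</code> block use the read and reference from the second and fourth examples above.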
=== Parameters ===
<pre>
Required Parameters:
--in : the SAM/BAM file to be read
--out : the SAM/BAM file to be written
Optional Parameters:
--refFile : reference file name
--noeof : do not expect an EOF block on a bam file.
--params : print the parameter settings
Optional Sequence Parameters (only specify one):
--seqOrig : Leave the sequence as is (default & used if reference is not specified).
--seqBases : Convert any '=' in the sequence to the appropriate base using the reference (requires --ref).
--seqEquals : Convert any bases that match the reference to '=' (requires --ref).
</pre>
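When <code>convert</code> is driven from a script, the options listed above can be assembled into an argument list. The Python sketch below is a hypothetical wrapper, not part of bamUtil: the function name <code>build_convert_command</code> and its keyword arguments are invented for illustration, and only flags that appear in the parameter list above are emitted. It assumes the executable is reachable as <code>./bam</code>.

```python
def build_convert_command(in_path, out_path, ref_file=None,
                          seq_mode="seqOrig", noeof=False, exe="./bam"):
    """Build an argument list for the convert option.

    Hypothetical helper: it only arranges the documented flags and does
    not run anything.
    """
    if seq_mode not in ("seqOrig", "seqBases", "seqEquals"):
        raise ValueError(f"unknown sequence option: {seq_mode}")
    if seq_mode != "seqOrig" and ref_file is None:
        # seqBases and seqEquals are documented as requiring a reference
        raise ValueError(f"--{seq_mode} requires a reference file")
    cmd = [exe, "convert", "--in", in_path, "--out", out_path]
    if ref_file is not None:
        cmd += ["--refFile", ref_file]
    if seq_mode != "seqOrig":
        cmd.append("--" + seq_mode)
    if noeof:
        cmd.append("--noeof")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_convert_command("in.bam", "out.sam",
                                         ref_file="ref.fa",
                                         seq_mode="seqEquals")))
```

The returned list can be handed to <code>subprocess.run</code>; building a list rather than a single string avoids shell quoting problems with file names.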
=== Usage ===
 ./bam convert --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--refFile <reference filename>] [--seqBases|--seqEquals|--seqOrig] [--noeof] [--params]
=== Return Value ===
Returns the SamStatus for the reads/writes.
=== Example Output ===
<pre>
Number of records read = 10
Number of records written = 10
</pre>
== dumpHeader ==
The <code>dumpHeader</code> option on the bam executable prints the header of the specified SAM/BAM file to cout.
=== Parameters ===
<pre>
Required Parameters:
filename : the sam/bam filename whose header should be printed.
</pre>
=== Usage ===
 ./bam dumpHeader <inputFile>
=== Return Value ===
* 0: the header was successfully read and printed.
* non-0: the header was not successfully read or was not printed. (Returns the SamStatus.)
=== Example Output ===
<pre>
@SQ	SN:1	LN:247249719
@SQ	SN:2	LN:242951149
@SQ	SN:3	LN:199501827
</pre>
'''Older Releases'''
* [[Media:BamUtilLibStatGen.1.0.13.tgz|BamUtilLibStatGen.1.0.13.tgz]] - Released 2/20/2015
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.13]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.13]] - see link for version updates
* [[Media:BamUtilLibStatGen.1.0.12.tar.gz|BamUtilLibStatGen.1.0.12.tgz]] - Released 5/14/2014
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.12]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.12]] - see link for version updates
** Adds regions to [[BamUtil: mergeBam|mergeBam]]
** Accept ',' delimiters for the tags string input in [[BamUtil: squeeze|squeeze]], [[BamUtil: revert|revert]], & [[BamUtil: diff|diff]]
*[[Media:BamUtilLibStatGen.1.0.11.tar.gz|BamUtilLibStatGen.1.0.11.tar.gz]] - Released 2/28/2014
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.11]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.11]] - see link for version updates
** Now properly supports 'B' & 'f' tags
** Cleanup - compile issues
*[[Media:BamUtilLibStatGen.1.0.10.tar.gz|BamUtilLibStatGen.1.0.10.tar.gz]] - Released 1/2/2014
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.10]] - see link for version updates
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.10]] - see link for version updates
** Adds PhoneHome/Version checking.
*[[Media:BamUtilLibStatGen.1.0.9.tgz|BamUtilLibStatGen.1.0.9.tgz]] - Released 7/7/2013
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.9]]
** Update to [[BamUtil: mergeBam|mergeBam]]
*** Update to ignore PG lines with duplicate IDs
*** Update to accept merges of matching RG lines
*** Update to log to stderr if no log/out file is specified
* There is no version 1.0.8. It was skipped to stay in line with libStatGen versions (libStatGen 1.0.8 added vcf support)
*[[Media:BamUtilLibStatGen.1.0.7.tgz|BamUtilLibStatGen.1.0.7.tgz]] - Released 1/29/2013
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.7]]
** Update to fix some compile issues on ubuntu 12.10
** Update use of SamRecord::getStringTag to expect the return of a const string pointer due to libStatGen v1.0.7 updates
** Update SamReferenceInfo usage due to libStatGen v1.0.7 updates
** Update to [[BamUtil: diff|diff]]
*** Fix DIFF to test and properly handle running out of available records. Previously no message was printed when this happened and there was a bug for which file it freed
** Update to [[BamUtil: clipOverlap|clipOverlap]]
*** Update to facilitate adding other overlap handling functions
** Update to [[BamUtil: mergeBam|mergeBam]] (formerly RGMergeBam)
*** Rename RGMergeBam to MergeBam
*** Update to handle files that already have an RG
*[[Media:BamUtilLibStatGen.1.0.6.tgz|BamUtilLibStatGen.1.0.6.tgz]] - Released 11/14/2012
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.6]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.6]]
** Update to [[BamUtil: trimBam|trimBam]]
*** Update to allow trimming a different number of bases from each end of the read
*[[Media:BamUtilLibStatGen.1.0.5.tgz|BamUtilLibStatGen.1.0.5.tgz]] - Released 10/24/2012
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.5]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.5]]
** Updates to: [[BamUtil: dedup|dedup]], [[BamUtil: polishBam|polishBam]], [[BamUtil: recab|recab]]
** Update to add compile option to compile without C++0x/C++11
** See [[#Release of just BamUtil (does not include libStatGen)|below]] for additional details on updates
*BamUtilLibStatGen.1.0.4.tgz - Released skipped
*[[Media:BamUtilLibStatGen.1.0.3.tgz|BamUtilLibStatGen.1.0.3.tgz]] - Released 09/19/2012
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.3]]
** Contains: [[#Release of just BamUtil (does not include libStatGen)|bamUtil version 1.0.3]]
** Adds: [[BamUtil: dedup|dedup]] [[BamUtil: recab|recab]]
*[[Media:BamUtilLibStatGen.1.0.2.tgz|BamUtilLibStatGen.1.0.2.tgz]] - Released 05/16/2012
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.2]]
** Adds: [[BamUtil: bam2FastQ|bam2FastQ]]
*[[Media:BamUtilLibStatGen.1.0.1.tgz|BamUtilLibStatGen.1.0.1.tgz]] - Released 05/04/2012
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.1]]
** Adds: [[BamUtil: splitBam|splitBam]], [[BamUtil: clipOverlap|clipOverlap]], [[BamUtil: trimBam|trimBam]], [[BamUtil: polishBam|polishBam]], [[BamUtil: rgMergeBam|rgMergeBam]], [[BamUtil: gapInfo|gapInfo]]
** Adds additional functionality to [[BamUtil: stats|stats]]
** Adds leftShifting to [[BamUtil: writeRegion|writeRegion]] and [[BamUtil: convert|convert]]
** Adds more diff fields to [[BamUtil: diff|diff]]
* [[Media:BamUtilLibStatGen.1.0.0.tgz|BamUtilLibStatGen.1.0.0.tgz]] - Released 10/10/2011
**Initial release of bamUtil that includes libStatGen version 1.0.0. It started from the tool found in the deprecated StatGen repository.
**Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.0]] [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]]
=== Release of just BamUtil (does not include libStatGen) ===
To install an official release, unpack the downloaded file (tar xvf), cd into the bamUtil_x.x.x directory and type make all.
'''BamUtil.1.0.14 Release Notes'''
* BamUtil Version 1.0.14 - Released 7/8/2015
** https://github.com/statgen/bamUtil/archive/v1.0.14.tar.gz
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.14]]
** Update [[BamUtil: trimBam|trimBam]]
*** Add option to soft clip (-c) instead of trimming
** Update [[BamUtil: clipOverlap|clipOverlap]]
*** Add option to mark reads as unmapped if they are entirely clipped
** Update to [[BamUtil: bam2FastQ|bam2FastQ]]
*** Add option to gzip the output files
*** Add option to split Read Groups into separate fastq files
*** Add option to get the quality from a tag
** Update [[BamUtil: recab|recab]]
*** Update to ignore ref 'N' when building the recalibration table
*** Add ability to bin
** Add Dedup_LowMem tool
'''Older Releases'''
* BamUtil Version 1.0.13 - Released 2/20/2015
** https://github.com/statgen/bamUtil/archive/v1.0.13.tar.gz
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.13]]
** Makefile Updates
*** Improve logic to determine actual path for the library
*** Update to append to USER_COMPILE_VARS even if specified on the command line
** Update [[BamUtil: writeRegion|writeRegion]]
*** Add option to specify readnames to keep in a file
*** Fixed bug that if a read overlapped 2 BED positions, it was printed twice
** Update to [[BamUtil: bam2FastQ|bam2FastQ]]
*** Update to skip non-primary reads
** Update to [[BamUtil: polishBam|polishBam]]
*** Update to handle '\t' string inputs and to add CO option
*** Fix MD5sum calculation to convert fasta to uppercase prior to calculating
* [[Media:BamUtil.1.0.12.tgz|BamUtil.1.0.12.tgz]] - Released 5/14/2014
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.12]]
** Update [[BamUtil: mergeBam|mergeBam]]
*** Add a regions option
** Update to [[BamUtil: squeeze|squeeze]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]]
*** Also accept ',' instead of just ';' as the delimiter in the input tags string.
* [[Media:BamUtil.1.0.11.tgz|BamUtil.1.0.11.tgz]] - Released 2/28/2014
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.11]]
*** Adds support for 'B' & 'f' tags that did not work properly before.
** Update [[BamUtil: splitBam|splitBam]] & [[BamUtil: polishBam|polishBam]]
*** Update to work properly if log & output file are not specified (no longer creates '.log')
** Update Main dummy/example tool to indicate the correct tool
** Update to [[BamUtil: bam2FastQ|bam2FastQ]], [[BamUtil: clipOverlap|clipOverlap]], [[BamUtil: filter|filter]], [[BamUtil: mergeBam|mergeBam]], [[BamUtil: splitBam|splitBam]], [[BamUtil: squeeze|squeeze]], [[BamUtil: stats|stats]]
*** Cleanup usage/parameter descriptions
** Update [[BamUtil: revert|revert]]
*** Update compatibility with libStatGen due to 'B' & 'f' tag handling updates
** Add tests for 'B' & 'f' tags
* [[Media:BamUtil.1.0.10.tar.gz|BamUtil.1.0.10.tar.gz]] - Released 1/2/2014
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.10]]
** All
*** Add PhoneHome/version checking
*** Make sub-program names case independent
*** Fix Logger.cpp compiler warning
** Adds: [[BamUtil: explainFlags|explainFlags]] - describes the flags based on the flag value
** Update to [[BamUtil: stats|stats]]
*** Fix Stats to not try to process a record after it is out of the loop (it would already have been processed or is invalid)
** Update to [[BamUtil: splitBam|splitBam]]
*** fix description of --noeof option
** Update to [[BamUtil: writeRegion|writeRegion]]
*** add exclude/required flags
** Update to [[BamUtil: dedup|dedup]] & [[BamUtil: recab|recab]]
*** Ignore secondary reads for dedup and making the recalibration table.
*** skip QC Failures
*** add excludeFlags parameters
** Update to [[BamUtil: clipOverlap|clipOverlap]]
*** add exclude flags
*** fix bug for readName sorted when a read is filtered due to flags
*** add sorting validation
** Update to [[BamUtil: bam2FastQ|bam2FastQ]]
*** add --merge option to generate interleaved files.
*** update to open the input file before opening the output files, so if there is an error, the outputs aren't opened
** Update to [[BamUtil: mergeBam|mergeBam]]
*** add option to ignore the RG PI field when checking headers
*** add more informative header merge error messages
* [[Media:BamUtil.1.0.9.tgz|BamUtil.1.0.9.tgz]] - Released 7/7/2013
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.9]] (version 1.0.7 should also work)
** Update to [[BamUtil: mergeBam|mergeBam]]
*** Update to ignore PG lines with duplicate IDs
*** Update to accept merges of matching RG lines
*** Update to log to stderr if no log/out file is specified
*[[Media:BamUtil.1.0.7.tgz|BamUtil.1.0.7.tgz]] - Released 1/29/2013
** Requires, but does not include: [[LibStatGen Download#Official Releases|libStatGen version 1.0.7]] or above
** Update to fix some compile issues on ubuntu 12.10
** Update use of SamRecord::getStringTag to expect the return of a const string pointer due to libStatGen v1.0.7 updates
** Update SamReferenceInfo usage due to libStatGen v1.0.7 updates
** Update to [[BamUtil: diff|diff]]
*** Fix DIFF to test and properly handle running out of available records. Previously no message was printed when this happened and there was a bug for which file it freed
** Update to [[BamUtil: clipOverlap|clipOverlap]]
*** Update to facilitate adding other overlap handling functions
** Update to [[BamUtil: mergeBam|mergeBam]] (formerly RGMergeBam)
*** Rename RGMergeBam to MergeBam
*** Update to handle files that already have an RG
*[[Media:BamUtil.1.0.6.tgz|BamUtil.1.0.6.tgz]] - Released 11/14/2012
** Update to [[BamUtil: trimBam|trimBam]]
*** Update to allow trimming a different number of bases from each end of the read
*[[Media:BamUtil.1.0.5.tgz|BamUtil.1.0.5.tgz]] - Released 10/24/2012
** Update to [[BamUtil: dedup|dedup]]
*** Update logic for which pair to keep if they have the same quality
** Update to [[BamUtil: polishBam|polishBam]]
*** Update to print the number of successful header additions
** Update to [[BamUtil: recab|recab]]
*** Update to print the number of bases skipped due to base quality
** General Updates
*** Update to add compile option to compile without C++0x/C++11
*BamUtil.1.0.4.tgz - Released skipped
*[[Media:BamUtil.1.0.3.tgz|BamUtil.1.0.3.tgz]] - Released 09/19/2012
** Adds: [[BamUtil: dedup|dedup]] [[BamUtil: recab|recab]]
** General Updates
*** Update Logger to write to stderr if output is stdout
** Update to [[BamUtil: stats|stats]]
*** Add required/exclude flags
*** Exclude Clips if excluding unmapped
*** Add withinRegion flag
*** Update phred/qual counts to be uint64_t instead of int to avoid overflow
** Update to [[BamUtil: validate|validate]]
*** Detect header failures
** Update to [[BamUtil: diff|diff]]
*** Update to specify chromosome/pos in ZP as a string rather than int so both can be shown
** Update to [[BamUtil: readReference|readReference]]
*** Output error message if the reference name is not found
** Update to [[BamUtil: splitChromosome|splitChromosome]]
*** Update to actually split the chromosomes and not just hard coded to chromosomes ids 0-22
** Update Makefile to have cloneLib for cloning libStatGen
*[[Media:BamUtil.1.0.2.tgz|BamUtil.1.0.2.tgz]] - Released 05/16/2012
** Adds: [[BamUtil: bam2FastQ|bam2FastQ]]
*[[Media:BamUtil.1.0.1.tgz|BamUtil.1.0.1.tgz]] - Released 05/04/2012
** Adds: [[BamUtil: splitBam|splitBam]], [[BamUtil: clipOverlap|clipOverlap]], [[BamUtil: trimBam|trimBam]], [[BamUtil: polishBam|polishBam]], [[BamUtil: rgMergeBam|rgMergeBam]], [[BamUtil: gapInfo|gapInfo]]
** Adds additional functionality to [[BamUtil: stats|stats]]
** Adds leftShifting to [[BamUtil: writeRegion|writeRegion]] and [[BamUtil: convert|convert]]
** Adds more diff fields to [[BamUtil: diff|diff]]
*[[Media:BamUtil.1.0.0.tgz|BamUtil.1.0.0.tgz]] - Released 10/10/2011
**Initial release of just bamUtil. It started from the tool found in the deprecated StatGen repository.
**Contains: [[BamUtil: validate|validate]], [[BamUtil: convert|convert]], [[BamUtil: dumpHeader|dumpHeader]], [[BamUtil: splitChromosome|splitChromosome]], [[BamUtil: writeRegion|writeRegion]], [[BamUtil: dumpRefInfo|dumpRefInfo]], [[BamUtil: dumpIndex|dumpIndex]], [[BamUtil: readIndexedBam|readIndexedBam]], [[BamUtil: filter|filter]], [[BamUtil: readReference|readReference]], [[BamUtil: revert|revert]], [[BamUtil: diff|diff]], [[BamUtil: squeeze|squeeze]], [[BamUtil: findCigars|findCigars]], [[BamUtil: stats|stats]]
== Citation ==
If you use BamUtil, please cite our publication on GotCloud which includes BamUtil: [http://genome.cshlp.org/content/early/2015/04/14/gr.176552.114.abstract Jun, Goo, et al. "An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data." Genome research (2015): gr-176552.]
= Programs =
The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file.
{{BamUtilPrograms}}
== splitChromosome ==
The <code>splitChromosome</code> option on the bam executable splits an indexed SAM/BAM file into multiple files based on the Chromosome (Reference Name).
The files all have the same base name, but with an _# where # corresponds with the associated reference id from the BAM file.
=== Parameters ===
<pre>
Required Parameters:
--in : the BAM file to be split
--out : the base filename for the SAM/BAM files to write into. Does not include the extension.
_N will be appended to the basename where N indicates the Chromosome.
Optional Parameters:
--noeof : do not expect an EOF block on a bam file.
--bamIndex : the path/name of the bam index file
(if not specified, uses the --in value + ".bai")
--bamout : write the output files in BAM format (default).
--samout : write the output files in SAM format.
--params : print the parameter settings
</pre>
=== Usage ===
 ./bam splitChromosome --in <inputFilename> --out <outputFileBaseName> [--bamIndex <bamIndexFile>] [--noeof] [--bamout|--samout] [--params]
=== Return Value ===
* 0: all records are successfully read and written.
* non-0: at least one record was not successfully read or written.
=== Example Output ===
<pre>
Reference ID -1 has 2 records
Reference ID 0 has 5 records
Reference ID 1 has 2 records
Reference ID 2 has 1 records
Reference ID 3 has 0 records
Reference ID 4 has 0 records
Reference ID 5 has 0 records
Reference ID 6 has 0 records
Reference ID 7 has 0 records
Reference ID 8 has 0 records
Reference ID 9 has 0 records
Reference ID 10 has 0 records
Reference ID 11 has 0 records
Reference ID 12 has 0 records
Reference ID 13 has 0 records
Reference ID 14 has 0 records
Reference ID 15 has 0 records
Reference ID 16 has 0 records
Reference ID 17 has 0 records
Reference ID 18 has 0 records
Reference ID 19 has 0 records
Reference ID 20 has 0 records
Reference ID 21 has 0 records
Reference ID 22 has 0 records
Number of records = 10
Returning: 0 (SUCCESS)
</pre>
== writeRegion ==
The <code>writeRegion</code> option on the bam executable writes the alignments in the indexed BAM file that fall into the specified region (reference id and start/end position).
=== Parameters ===
<pre>
Required Parameters:
--in : the BAM file to be read
--out : the SAM/BAM file to write to
Optional Parameters:
--noeof : do not expect an EOF block on a bam file.
--bamIndex : the path/name of the bam index file
(if not specified, uses the --in value + ".bai")
--refName : the BAM reference Name to read (either this or refID can be specified)
--refID : the BAM reference ID to read (defaults to -1: unmapped)
--start : inclusive 0-based start position (defaults to -1)
--end : exclusive 0-based end position (defaults to -1: meaning til the end of the reference)
--params : print the parameter settings
</pre>
=== Usage ===
 ./bam writeRegion --in <inputFilename> --out <outputFilename> [--bamIndex <bamIndexFile>] [--noeof] [--refName <reference Name> | --refID <reference ID>] [--start <0-based start pos>] [--end <0-based end position>] [--params]
=== Return Value ===
* 0: all records are successfully read and written.
* non-0: at least one record was not successfully read or written.
=== Example Output ===
<pre>
Wrote t.sam with 2 records.
</pre>
== dumpRefInfo ==
The <code>dumpRefInfo</code> option on the bam executable prints the SAM/BAM file's reference information.
=== Parameters ===
<pre>
Required Parameters:
--in : the SAM/BAM file to be read
Optional Parameters:
--noeof : do not expect an EOF block on a bam file.
--printRecordRefs : print the reference information for the records in the file (grouped by reference).
--params : print the parameter settings
</pre>
=== Usage ===
 ./bam dumpRefInfo --in <inputFilename> [--noeof] [--printRecordRefs] [--params]
=== Return Value ===
* 0: the file was processed successfully.
* non-0: the file was not processed successfully.
== dumpIndex ==
The <code>dumpIndex</code> option on the bam executable prints a BAM index file in an easy to read format.
=== Parameters ===
<pre>
Required Parameters:
--bamIndex : the path/name of the bam index file to display
Optional Parameters:
--refID : the reference ID to read, defaults to print all
--summary : only print a summary - 1 line per reference.
--params : print the parameter settings
</pre>
=== Usage ===
 ./bam dumpIndex --bamIndex <bamIndexFile> [--refID <ref#>] [--summary] [--params]
=== Return Value ===
* 0: the BAM index file was processed successfully.
* non-0: the BAM index file was not processed successfully.
== readIndexedBam ==
The <code>readIndexedBam</code> option on the bam executable reads an indexed BAM file reference id by reference id, from -1 to the max reference id, and writes it out as a SAM/BAM file.
=== Parameters ===
<pre>
Required Parameters:
inputFilename - path/name of the input BAM file
outputFile.sam/bam - path/name of the output file
bamIndexFile - path/name of the BAM index file
</pre>
=== Usage ===
 ./bam readIndexedBam <inputFilename> <outputFile.sam/bam> <bamIndexFile>
=== Return Value ===
* 0
== filter ==
The <code>filter</code> option on the bam executable filters the reads in a SAM/BAM file. This option is documented at: [[Bam Executable: Filter]]
== diff ==
<span style="color:#D2691E">'''***Coming Soon***'''</span>
The <code>diff</code> option on the bam executable prints the difference between two coordinate sorted SAM/BAM files. This can be used to compare the outputs of running a SAM/BAM through different tools/versions of tools.
The <code>diff</code> tool compares records that have the same Read Name and Fragment (from the flag). If a matching ReadName & Fragment is not found, the record is considered to be different.
<code>diff</code> assumes the files are coordinate sorted and uses this assumption for determining how long to store a record before determining that the other file does not contain a matching ReadName/Fragment. If the files are not coordinate sorted, this logic does not work.
By default, just the chromosome/position and cigar are compared for each record. Options are available to compare:
* sequence
* base quality
* specified tags
* turn off position comparison
* turn off cigar comparison
=== Parameters ===
<pre>
Required Parameters:
--in1 : first coordinate sorted SAM/BAM file to be diffed
--in2 : second coordinate sorted SAM/BAM file to be diffed
Optional Parameters:
--out : output filename, use .bam extension to output in SAM/BAM format instead of diff format.
In SAM/BAM format there will be 3 output files:
1) the specified name with record diffs
2) specified name with _only_<in1>.sam/bam with records only in the in1 file
3) specified name with _only_<in2>.sam/bam with records only in the in2 file
--seq : diff the sequence bases.
--baseQual : diff the base qualities.
--tags : diff the specified Tags formatted as Tag:Type;Tag:Type;Tag:Type...
--noCigar : do not diff the cigars.
--noPos : do not diff the positions.
--onlyDiffs : only print the fields that are different, otherwise for any diff all the fields that are compared are printed.
--recPoolSize : number of records to allow to be stored at a time, default value: 1000000
--posDiff : max base pair difference between possibly matching records, default value: 100000
--noeof : do not expect an EOF block on a bam file.
--params : print the parameter settings
</pre>
=== Usage ===
 ./bam diff --in1 <inputFile> --in2 <inputFile> [--out <outputFile>] [--baseQual] [--tags <Tag:Type[;Tag:Type]*>] [--noCigar] [--noPos] [--onlyDiffs] [--recPoolSize <int>] [--posDiff <int>] [--noeof] [--params]
=== Return Value ===
* 0: all records are successfully read and written.
* non-0: an error occurred processing the parameters or reading one of the files.
=== Output Format ===
2 Output Formats:
# Diff Format
# BAM Format
==== Diff Format ====
There are 2 types of differences.
* ReadName/Fragment combo is in one file, but not in the other file within the window set by recPoolSize & posDiff
* ReadName/Fragment combo is in both files, but at least one of the specified fields to diff is different
Each difference output consists of 2 or 3 lines. If the record only appears in one of the files, the diff is 2 lines; if it appears in both files, the diff is 3 lines.
The first line of the difference output is just the read name.
The 2nd and 3rd line (if present) begin with either a '<' or a '>'. If the record is from the first file (--in1), it begins with a '<'. If the record is from the 2nd file (--in2), it begins with a '>'.
The 2nd line is the flag followed by the diff'd fields from one of the records.
The 3rd line (if a matching record was found) is the flag followed by the diff'd fields from the matching record.
The diff'd record lines are tab separated, and are in the following order if --onlyDiffs is not specified:
* '<' or '>'
* flag
* chrom:pos (chromosome name ':' 1 based position) - if --noPos is not specified
* cigar - if --noCigar is not specified
* sequence - if --seq is specified
* base quality - if --baseQual is specified
* tag:type:value - for each tag:type specified in --tags
* ...
* tag:type:value
If <code>onlyDiffs</code> is specified, only the fields that are specified and are different get printed in lines 2 & 3.
===== Example Output =====
Command:
 ../bin/bam diff --in1 testFiles/testDiff1.sam --in2 testFiles/testDiff2.sam --seq --baseQual --tags "OP:i;MD:Z" --onlyDiffs --out results/diffOrderSam.log
Output:
<pre>
18:462+29M5I3M:F:295
<	a1	1:78
>	a1	1:74
1
>	a1	1:70	3S1M1S	ACGTN	;46>>	OP:i:75	MD:Z:30A0C5
2
>	a1	1:72	3S1M1S	ACGTN	;47>>	OP:i:75	MD:Z:30A0C5
ABC
>	cd	*:0	*	*	*
DEF
>	cd	*:0	*	*	*
</pre>
==== SAM/Bam Format ====
Use a .sam/.bam extension to output in SAM/BAM format instead of diff format.
In SAM/BAM format there will be 3 output files:
# the specified name with record diffs
# specified name with _only_<in1>.sam/bam with records only in the in1 file
# specified name with _only_<in2>.sam/bam with records only in the in2 file
When a record is found in both input files, but a difference is found, the record from the first file is written with additional tags to indicate the values from the second file, using the following tags:
* ZF - Flag
* ZP - Pos
* ZC - Cigar
* ZS - Sequence
* ZQ - Base Quality
* ZT - Tags
== readReference ==
The <code>readReference</code> option on the bam executable prints the specified region of the reference sequence in an easy to read format.
=== Parameters ===
<pre>
Required Parameters:
--refFile : the reference
--refName : the SAM/BAM reference Name to read
--start : inclusive 0-based start position (defaults to -1)
Required Length Parameter (one but not both needs to be specified):
--end : exclusive 0-based end position (defaults to -1: meaning til the end of the reference)
--numBases : number of bases from start to display
--params : print the parameter settings
</pre>
=== Usage ===
 ./bam readReference --refFile <referenceFilename> --refName <reference Name> --start <0 based start> --end <0 based end>|--numBases <number of bases> [--params]
=== Return Value ===
* 0: the reference file was successfully read.
* non-0: the reference file was not successfully read
=== Example Output ===
<pre>
</pre>
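The region convention used by <code>readReference</code> (and by <code>writeRegion</code>) is an inclusive 0-based start with an exclusive 0-based end, which matches Python slicing. The sketch below only illustrates that convention on an in-memory string; the function name <code>reference_region</code> is hypothetical, it does not read a reference file, and it does not model the -1 defaults listed in the parameters.

```python
def reference_region(reference: str, start: int, end=None, num_bases=None) -> str:
    """Return reference[start:end] using a 0-based, half-open region.

    Hypothetical helper: exactly one of end (exclusive) or num_bases
    must be given, mirroring the "one but not both" length parameter.
    """
    if (end is None) == (num_bases is None):
        raise ValueError("specify exactly one of end or num_bases")
    if start < 0:
        raise ValueError("start must be a 0-based, non-negative position")
    if num_bases is not None:
        end = start + num_bases
    return reference[start:end]


if __name__ == "__main__":
    ref = "TAACCCTAACCCTA"
    print(reference_region(ref, 2, end=5))
    print(reference_region(ref, 2, num_bases=3))
```

Both calls select the same three bases, since <code>end = start + numBases</code> under this convention.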
