BamUtil
= bam Executable =
When statgen is compiled, the SAM/BAM executable, "bam", is generated in the statgen/src/bin/ directory.
The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file.
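The format rules above can be sketched in a few lines of Python. This is an illustration only, not the bam source code: the function names are hypothetical, and the input check assumes that a BAM file, being BGZF (gzip-compatible) compressed, starts with the gzip magic bytes. The `ubam` extension mentioned in the convert usage is not covered by the extension rule as stated here, so the sketch ignores it.

```python
from pathlib import Path

# Assumption: BAM is BGZF-compressed, so a BAM file begins with the gzip magic number.
GZIP_MAGIC = b"\x1f\x8b"


def output_format(path: str) -> str:
    """Apply the documented rule: a ".bam" extension means BAM, anything else SAM."""
    return "BAM" if Path(path).suffix == ".bam" else "SAM"


def looks_compressed(first_bytes: bytes) -> bool:
    """Hypothetical input check: True if the data starts with the gzip magic bytes."""
    return first_bytes[:2] == GZIP_MAGIC
```

Note that a gzip-compressed SAM file would also pass `looks_compressed`, so a real check would need to look further into the file.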
The bam executable has the following functions.
- validate - Read and Validate a SAM/BAM file
- convert - Read a SAM/BAM file and write as a SAM/BAM file
- dumpHeader - Print SAM/BAM header
- splitChromosome - Split BAM by Chromosome
- writeRegion - Write the alignments in the indexed BAM file that fall into the specified region
- dumpRefInfo - Print SAM/BAM Reference Information
- dumpIndex - Dump a BAM index file into an easy to read text version
- readIndexedBam - Read an indexed BAM file one reference id at a time, from -1 to the max reference id, and write it out as a SAM/BAM file
- filter - Filter reads by clipping ends with too high a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high
- readReference - Print the reference string for the specified region
This executable is built using StatGenLibrary: BAM.
Running ./bam with no arguments prints the usage information for the bam executable.
validate
The <code>validate</code> option on the bam executable reads and validates a SAM/BAM file. This option is documented at: BamValidator
convert
The <code>convert</code> option on the bam executable reads a SAM/BAM file and writes it as a SAM/BAM file.
The executable converts the input file into the format of the output file. For example, to convert a BAM file to a SAM file, from the pipeline/bam/ directory call:
./bam convert --in <bamFile>.bam --out <newSamFile>.sam
Remember to include the paths to the executable and to your test files.
Parameters
<pre>
Required Parameters:
    --in     : the SAM/BAM file to be read
    --out    : the SAM/BAM file to be written
Optional Parameters:
    --noeof  : do not expect an EOF block on a bam file.
    --params : print the parameter settings
</pre>
Usage
./bam convert --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--noeof] [--params]
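When calling convert from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, assuming the executable is reachable as `./bam`; the helper name `convert_command` is hypothetical, and only the flags listed in the usage line above are used.

```python
from typing import List


def convert_command(in_file: str, out_file: str, noeof: bool = False,
                    params: bool = False, bam: str = "./bam") -> List[str]:
    """Build the argument list for './bam convert' from the documented flags."""
    cmd = [bam, "convert", "--in", in_file, "--out", out_file]
    if noeof:
        cmd.append("--noeof")   # do not expect an EOF block on a bam file
    if params:
        cmd.append("--params")  # print the parameter settings
    return cmd
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual file names.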
Return Value
Returns the SamStatus for the reads/writes.
Example Output
<pre>
Number of records read = 10
Number of records written = 10
</pre>
dumpHeader
The <code>dumpHeader</code> option on the bam executable prints the header of the specified SAM/BAM file to cout.
Parameters
<pre>
Required Parameters:
filename : the sam/bam filename whose header should be printed. </pre>
Usage
./bam dumpHeader <inputFile>
Return Value
- 0: the header was successfully read and printed.
- non-0: the header was not successfully read or was not printed. (Returns the SamStatus.)
Example Output
<pre>
@SQ SN:1 LN:247249719
@SQ SN:2 LN:242951149
@SQ SN:3 LN:199501827
</pre>
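Header output of this kind is easy to post-process. As a rough sketch, assuming one `@SQ` record per line with whitespace- or tab-separated `TAG:value` fields, the reference names and lengths can be collected into a dictionary. The function name `parse_sq_lines` is made up for this example.

```python
from typing import Dict


def parse_sq_lines(header_text: str) -> Dict[str, int]:
    """Map each reference name (SN) to its length (LN) from @SQ header lines."""
    lengths: Dict[str, int] = {}
    for line in header_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "@SQ":
            continue  # skip blank lines and other header record types
        # Split each "TAG:value" field once, so values containing ':' survive.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths
```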
splitChromosome
The <code>splitChromosome</code> option on the bam executable splits an indexed BAM file into multiple files based on the Chromosome (Reference Name).
The files all have the same base name, but with an _# where # corresponds with the associated reference id from the BAM file.
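The naming scheme can be illustrated with a small helper. This is a sketch under assumptions: the text only states that `_#` is appended to the base name, so the `.bam` and `.sam` extensions used here are a guess based on the output format options, and the helper name is hypothetical. How the unmapped reference id (-1) appears in a real file name has not been verified.

```python
def split_output_name(base: str, ref_id: int, sam_out: bool = False) -> str:
    """Return the assumed output file name for one reference id."""
    ext = ".sam" if sam_out else ".bam"  # assumed extensions; BAM is the documented default
    return f"{base}_{ref_id}{ext}"
```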
Parameters
<pre>
Required Parameters:
    --in       : the BAM file to be split
    --out      : the base filename for the SAM/BAM files to write into. Does not include the extension.
                 _N will be appended to the basename where N indicates the Chromosome.
Optional Parameters:
    --noeof    : do not expect an EOF block on a bam file.
    --bamIndex : the path/name of the bam index file (if not specified, uses the --in value + ".bai")
    --bamout   : write the output files in BAM format (default).
    --samout   : write the output files in SAM format.
    --params   : print the parameter settings
</pre>
Usage
./bam splitChromosome --in <inputFilename> --out <outputFileBaseName> [--bamIndex <bamIndexFile>] [--noeof] [--bamout|--samout] [--params]
Return Value
- 0: all records are successfully read and written.
- non-0: at least one record was not successfully read or written.
Example Output
<pre>
Reference ID -1 has 2 records
Reference ID 0 has 5 records
Reference ID 1 has 2 records
Reference ID 2 has 1 records
Reference ID 3 has 0 records
Reference ID 4 has 0 records
Reference ID 5 has 0 records
Reference ID 6 has 0 records
Reference ID 7 has 0 records
Reference ID 8 has 0 records
Reference ID 9 has 0 records
Reference ID 10 has 0 records
Reference ID 11 has 0 records
Reference ID 12 has 0 records
Reference ID 13 has 0 records
Reference ID 14 has 0 records
Reference ID 15 has 0 records
Reference ID 16 has 0 records
Reference ID 17 has 0 records
Reference ID 18 has 0 records
Reference ID 19 has 0 records
Reference ID 20 has 0 records
Reference ID 21 has 0 records
Reference ID 22 has 0 records
Number of records = 10
Returning: 0 (SUCCESS)
</pre>
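A report like the one above can be sanity-checked by summing the per-reference counts and comparing the sum with the reported total. The sketch below assumes the exact wording shown in the example output; the function name `parse_split_report` is hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

REF_LINE = re.compile(r"Reference ID (-?\d+) has (\d+) records")
TOTAL_LINE = re.compile(r"Number of records = (\d+)")


def parse_split_report(text: str) -> Tuple[Dict[int, int], Optional[int]]:
    """Return ({reference id: record count}, reported total or None if absent)."""
    counts = {int(ref): int(n) for ref, n in REF_LINE.findall(text)}
    total = TOTAL_LINE.search(text)
    return counts, int(total.group(1)) if total else None
```

For the example output, the counts for reference ids -1, 0, 1, and 2 are 2, 5, 2, and 1, which add up to the reported 10 records.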
writeRegion
The <code>writeRegion</code> option on the bam executable writes the alignments in the indexed BAM file that fall into the specified region (reference id and start/end position).
Parameters
<pre>
Required Parameters:
    --in       : the BAM file to be read
    --out      : the SAM/BAM file to write to
Optional Parameters:
    --noeof    : do not expect an EOF block on a bam file.
    --bamIndex : the path/name of the bam index file (if not specified, uses the --in value + ".bai")
    --refName  : the BAM reference Name to read (either this or refID can be specified)
    --refID    : the BAM reference ID to read (defaults to -1: unmapped)
    --start    : inclusive 0-based start position (defaults to -1)
    --end      : exclusive 0-based end position (defaults to -1: meaning til the end of the reference)
    --params   : print the parameter settings
</pre>
Usage
./bam writeRegion --in <inputFilename> --out <outputFilename> [--bamIndex <bamIndexFile>] [--noeof] [--refName <reference Name> | --refID <reference ID>] [--start <0-based start pos>] [--end <0-based end position>] [--params]
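Because --start is 0-based inclusive and --end is 0-based exclusive, a region given in 1-based inclusive coordinates has to be shifted before it is passed in. The sketch below shows that conversion; the helper name and its parameters are hypothetical, and only flags from the usage line above are emitted.

```python
from typing import List


def write_region_command(in_file: str, out_file: str, ref_name: str,
                         start_1based: int, end_1based: int,
                         bam: str = "./bam") -> List[str]:
    """Build a writeRegion call from a 1-based inclusive region [start, end]."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    start0 = start_1based - 1  # 1-based inclusive start -> 0-based inclusive start
    end0 = end_1based          # 1-based inclusive end equals 0-based exclusive end
    return [bam, "writeRegion", "--in", in_file, "--out", out_file,
            "--refName", ref_name, "--start", str(start0), "--end", str(end0)]
```

For example, positions 101 through 200 on reference "1" become `--start 100 --end 200`, a span of 100 bases.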
Return Value
- 0: all records are successfully read and written.
- non-0: at least one record was not successfully read or written.
Example Output
<pre>
Wrote t.sam with 2 records. </pre>
dumpRefInfo
The <code>dumpRefInfo</code> option on the bam executable prints the SAM/BAM file's reference information.
Parameters
<pre>
Required Parameters:
    --in              : the SAM/BAM file to be read
Optional Parameters:
    --noeof           : do not expect an EOF block on a bam file.
    --printRecordRefs : print the reference information for the records in the file (grouped by reference).
    --params          : print the parameter settings
</pre>
Usage
./bam dumpRefInfo --in <inputFilename> [--noeof] [--printRecordRefs] [--params]
Return Value
- 0: the file was processed successfully.
- non-0: the file was not processed successfully.
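The 0 / non-0 return convention used by these options makes it straightforward to detect failures from a script. The following is a minimal sketch using Python's `subprocess` module; the function name `run_checked` is hypothetical, and the command list is whatever bam invocation is being run.

```python
import subprocess
from typing import List


def run_checked(cmd: List[str]) -> str:
    """Run a command and return its stdout; raise if the exit status is non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

A call such as `run_checked(["./bam", "dumpRefInfo", "--in", "input.bam"])` would then either return the printed reference information or raise an error, assuming the executable and the (illustrative) input file exist at those paths.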
dumpIndex
The <code>dumpIndex</code> option on the bam executable prints a BAM index file in an easy-to-read format.
Parameters
<pre>
Required Parameters:
    --bamIndex : the path/name of the bam index file to display
Optional Parameters:
    --refID    : the reference ID to read, defaults to print all
    --summary  : only print a summary - 1 line per reference.
    --params   : print the parameter settings
</pre>
Usage
./bam dumpIndex --bamIndex <bamIndexFile> [--refID <ref#>] [--summary] [--params]
Return Value
- 0: the BAM index file was processed successfully.
- non-0: the BAM index file was not processed successfully.
readIndexedBam
The <code>readIndexedBam</code> option on the bam executable reads an indexed BAM file one reference id at a time, from -1 to the max reference id, and writes it out as a SAM/BAM file.
Parameters
<pre>
Required Parameters:
    inputFilename      - path/name of the input BAM file
    outputFile.sam/bam - path/name of the output file
    bamIndexFile       - path/name of the BAM index file
</pre>
Usage
./bam readIndexedBam <inputFilename> <outputFile.sam/bam> <bamIndexFile>
Return Value
- 0
filter
The <code>filter</code> option on the bam executable filters the reads in a SAM/BAM file. This option is documented at: Bam Executable: Filter
readReference
The <code>readReference</code> option on the bam executable prints the specified region of the reference sequence in an easy-to-read format.
Parameters
<pre>
Required Parameters:
    --refFile  : the reference
    --refName  : the SAM/BAM reference Name to read
    --start    : inclusive 0-based start position (defaults to -1)
Required Length Parameter (one but not both needs to be specified):
    --end      : exclusive 0-based end position (defaults to -1: meaning til the end of the reference)
    --numBases : number of bases from start to display
    --params   : print the parameter settings
</pre>
Usage
./bam readReference --refFile <referenceFilename> --refName <reference Name> --start <0 based start> --end <0 based end>|--numBases <number of bases> [--params]
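The "one but not both" rule for --end and --numBases can be enforced before the executable is called. The sketch below is illustrative only: the helper name `read_reference_command` is hypothetical, and it builds the argument list without running anything.

```python
from typing import List, Optional


def read_reference_command(ref_file: str, ref_name: str, start: int,
                           end: Optional[int] = None,
                           num_bases: Optional[int] = None,
                           bam: str = "./bam") -> List[str]:
    """Build a readReference call, requiring exactly one of end or num_bases."""
    if (end is None) == (num_bases is None):
        raise ValueError("specify exactly one of end or num_bases")
    cmd = [bam, "readReference", "--refFile", ref_file,
           "--refName", ref_name, "--start", str(start)]
    if end is not None:
        cmd += ["--end", str(end)]            # exclusive 0-based end position
    else:
        cmd += ["--numBases", str(num_bases)]  # number of bases from start
    return cmd
```

Since the end position is exclusive, `--start 100 --end 171` and `--start 100 --numBases 71` would be expected to select the same 71 bases, although that equivalence is inferred from the parameter descriptions rather than checked against the program.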
Return Value
- 0: the reference file was successfully read.
- non-0: the reference file was not successfully read.
Example Output
<pre>
open and prefetch reference genome /home/mktrost/data/human.g1k.v37.fa: done.
GGCAAAATGTATATAATTATGGCATGAGGTATGCAACTTTAGGCAAGGAAGCAAAAGCAGAAACCATGAAA
</pre>