Difference between revisions of "BamUtil"
Revision as of 20:33, 17 November 2010
bam Executable
When statgen is compiled, the SAM/BAM executable, "bam", is generated in the statgen/src/bin/ directory.
The software reads the beginning of an input file to determine whether it is SAM or BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam", it writes a BAM file; otherwise it writes a SAM file.
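The extension rule above can be sketched in a few lines of shell. This is only an illustration of the rule as stated on this page, not the code bam itself uses; the function name <code>output_format</code> is hypothetical.

```shell
# Hypothetical helper (not part of bam): report the output format implied by
# a file name, following the rule described above:
# a ".bam" extension means BAM, anything else means SAM.
output_format() {
    case "$1" in
        *.bam) echo "BAM" ;;
        *)     echo "SAM" ;;
    esac
}

output_format reads.bam   # prints BAM
output_format reads.sam   # prints SAM
output_format reads.txt   # prints SAM
```

The sketch covers only the two cases named above; other extensions that bam may treat specially are not modeled.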
The bam executable has the following functions.
- validate - Read and Validate a SAM/BAM file
- convert - Read a SAM/BAM file and write as a SAM/BAM file
- dumpHeader - Print SAM/BAM header
- splitChromosome - Split BAM by Chromosome
- writeRegion - Write the alignments in the indexed BAM file that fall into the specified region
- dumpRefInfo - Print SAM/BAM Reference Information
- dumpIndex - Dump a BAM index file into an easy to read text version
- readIndexedBam - Read an indexed BAM file one reference id at a time, from -1 to the max reference id, and write it out as a SAM/BAM file
- filter - Filter reads by clipping ends with too high a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high
- readReference - Print the reference string for the specified region
This executable is built using StatGenLibrary: BAM.
Running ./bam with no arguments prints the usage information for the bam executable.
validate
The <code>validate</code> option on the bam executable reads and validates a SAM/BAM file. This option is documented at: BamValidator
convert
The <code>convert</code> option on the bam executable reads a SAM/BAM file and writes it as a SAM/BAM file.
The executable converts the input file into the format of the output file. For example, to convert a BAM file to a SAM file, call the following from the pipeline/bam/ directory:
./bam convert --in <bamFile>.bam --out <newSamFile>.sam
Remember to include the paths to the executable and to your test files.
Parameters
<pre>
    Required Parameters:
        --in     : the SAM/BAM file to be read
        --out    : the SAM/BAM file to be written
    Optional Parameters:
        --noeof  : do not expect an EOF block on a bam file.
        --params : print the parameter settings
</pre>
Usage
./bam convert --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--noeof] [--params]
Return Value
Returns the SamStatus for the reads/writes.
Example Output
<pre>
Number of records read = 10
Number of records written = 10
</pre>
dumpHeader
The <code>dumpHeader</code> option on the bam executable prints the header of the specified SAM/BAM file to standard output (cout).
Parameters
<pre>
    Required Parameters:
        filename : the sam/bam filename whose header should be printed.
</pre>
Usage
./bam dumpHeader <inputFile>
Return Value
- 0: the header was successfully read and printed.
- non-0: the header was not successfully read or was not printed. (Returns the SamStatus.)
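Because the functions on this page report success as 0 and failure as non-0, a calling script can branch on the exit status. The wrapper below is a minimal sketch of that pattern; <code>run_checked</code> is a hypothetical name, and the messages go to standard error so they do not mix with output such as a dumped header.

```shell
# Hypothetical wrapper (not part of bam): run a command, report on its
# exit status on standard error, and pass the status back to the caller.
run_checked() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok: $*" >&2
    else
        echo "failed with status $status: $*" >&2
    fi
    return "$status"
}

# Example use (commented out because it needs a real bam build and input file):
# run_checked ./bam dumpHeader input.bam > header.txt || exit 1
```

The non-zero value is passed through unchanged, so a caller that wants the SamStatus can still read it from <code>$?</code>.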
Example Output
<pre>
@SQ SN:1 LN:247249719
@SQ SN:2 LN:242951149
@SQ SN:3 LN:199501827
</pre>
splitChromosome
The <code>splitChromosome</code> option on the bam executable splits an indexed BAM file into multiple files based on the Chromosome (Reference Name).
The files all have the same base name, but with an _# where # corresponds with the associated reference id from the BAM file.
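As an illustration of this naming scheme, the sketch below lists the file names one might expect for reference ids 0 through count-1. It assumes that the extension ("bam" or "sam") follows the _# suffix and it does not cover reference id -1; neither detail is confirmed on this page. The function name <code>split_output_names</code> is hypothetical.

```shell
# Hypothetical helper (not part of bam): print the expected output names
# <base>_<id>.<ext> for reference ids 0..count-1.
# Assumption: the extension is appended after the _<id> suffix.
split_output_names() {
    base=$1
    count=$2
    ext=${3:-bam}   # default mirrors the default BAM output format
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "${base}_${i}.${ext}"
        i=$((i + 1))
    done
}

split_output_names chr 3
# prints:
# chr_0.bam
# chr_1.bam
# chr_2.bam
```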
Parameters
<pre>
    Required Parameters:
        --in       : the BAM file to be split
        --out      : the base filename for the SAM/BAM files to write into. Does not include the extension.
                     _N will be appended to the basename where N indicates the Chromosome.
    Optional Parameters:
        --noeof    : do not expect an EOF block on a bam file.
        --bamIndex : the path/name of the bam index file
                     (if not specified, uses the --in value + ".bai")
        --bamout   : write the output files in BAM format (default).
        --samout   : write the output files in SAM format.
        --params   : print the parameter settings
</pre>
Usage
./bam splitChromosome --in <inputFilename> --out <outputFileBaseName> [--bamIndex <bamIndexFile>] [--noeof] [--bamout|--samout] [--params]
Return Value
- 0: all records are successfully read and written.
- non-0: at least one record was not successfully read or written.
Example Output
<pre>
Reference ID -1 has 2 records
Reference ID 0 has 5 records
Reference ID 1 has 2 records
Reference ID 2 has 1 records
Reference ID 3 has 0 records
Reference ID 4 has 0 records
Reference ID 5 has 0 records
Reference ID 6 has 0 records
Reference ID 7 has 0 records
Reference ID 8 has 0 records
Reference ID 9 has 0 records
Reference ID 10 has 0 records
Reference ID 11 has 0 records
Reference ID 12 has 0 records
Reference ID 13 has 0 records
Reference ID 14 has 0 records
Reference ID 15 has 0 records
Reference ID 16 has 0 records
Reference ID 17 has 0 records
Reference ID 18 has 0 records
Reference ID 19 has 0 records
Reference ID 20 has 0 records
Reference ID 21 has 0 records
Reference ID 22 has 0 records
Number of records = 10
Returning: 0 (SUCCESS)
</pre>
writeRegion
The <code>writeRegion</code> option on the bam executable writes the alignments in the indexed BAM file that fall into the specified region (reference id and start/end position).
Parameters
<pre>
    Required Parameters:
        --in       : the BAM file to be read
        --out      : the SAM/BAM file to write to
    Optional Parameters:
        --noeof    : do not expect an EOF block on a bam file.
        --bamIndex : the path/name of the bam index file
                     (if not specified, uses the --in value + ".bai")
        --refName  : the BAM reference Name to read (either this or refID can be specified)
        --refID    : the BAM reference ID to read (defaults to -1: unmapped)
        --start    : inclusive 0-based start position (defaults to -1)
        --end      : exclusive 0-based end position (defaults to -1: meaning til the end of the reference)
        --params   : print the parameter settings
</pre>
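Since --start is inclusive and 0-based while --end is exclusive and 0-based, a region given in the common 1-based, inclusive form needs its start shifted down by one and its end left unchanged. The sketch below shows that arithmetic; <code>to_zero_based</code> is a hypothetical helper, not something bam provides.

```shell
# Hypothetical helper (not part of bam): convert a 1-based, inclusive
# interval [start1, end1] into the 0-based, end-exclusive values that
# --start and --end are described as taking.
to_zero_based() {
    start1=$1
    end1=$2
    echo "--start $((start1 - 1)) --end $end1"
}

to_zero_based 100 200   # prints: --start 99 --end 200
```

The two forms cover the same bases: positions 100 through 200 counted from 1 are positions 99 through 199 counted from 0, and an exclusive end of 200 stops right after 199.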
Usage
./bam writeRegion --in <inputFilename> --out <outputFilename> [--bamIndex <bamIndexFile>] [--noeof] [--refName <reference Name> | --refID <reference ID>] [--start <0-based start pos>] [--end <0-based end position>] [--params]
Return Value
- 0: all records are successfully read and written.
- non-0: at least one record was not successfully read or written.
Example Output
<pre>
Wrote t.sam with 2 records.
</pre>
dumpRefInfo
The <code>dumpRefInfo</code> option on the bam executable prints the SAM/BAM file's reference information.
Parameters
<pre>
    Required Parameters:
        --in              : the SAM/BAM file to be read
    Optional Parameters:
        --noeof           : do not expect an EOF block on a bam file.
        --printRecordRefs : print the reference information for the records in the file (grouped by reference).
        --params          : print the parameter settings
</pre>
Usage
./bam dumpRefInfo --in <inputFilename> [--noeof] [--printRecordRefs] [--params]
Return Value
- 0: the file was processed successfully.
- non-0: the file was not processed successfully.
dumpIndex
The <code>dumpIndex</code> option on the bam executable prints a BAM index file in an easy-to-read format.
Parameters
<pre>
    Required Parameters:
        --bamIndex : the path/name of the bam index file to display
    Optional Parameters:
        --refID    : the reference ID to read, defaults to print all
        --summary  : only print a summary - 1 line per reference.
        --params   : print the parameter settings
</pre>
Usage
./bam dumpIndex --bamIndex <bamIndexFile> [--refID <ref#>] [--summary] [--params]
Return Value
- 0: the BAM index file was processed successfully.
- non-0: the BAM index file was not processed successfully.
readIndexedBam
The <code>readIndexedBam</code> option on the bam executable reads an indexed BAM file one reference id at a time, from -1 to the max reference id, and writes it out as a SAM/BAM file.
Parameters
<pre>
    Required Parameters:
        inputFilename      - path/name of the input BAM file
        outputFile.sam/bam - path/name of the output file
        bamIndexFile       - path/name of the BAM index file
</pre>
Usage
./bam readIndexedBam <inputFilename> <outputFile.sam/bam> <bamIndexFile>
Return Value
- 0
filter
The <code>filter</code> option on the bam executable filters the reads in a SAM/BAM file. This option is documented at: Bam Executable: Filter
readReference
The <code>readReference</code> option on the bam executable prints the specified region of the reference sequence in an easy-to-read format.
Parameters
<pre>
    Required Parameters:
        --refFile  : the reference
        --refName  : the SAM/BAM reference Name to read
        --start    : inclusive 0-based start position (defaults to -1)
    Required Length Parameter (one but not both needs to be specified):
        --end      : exclusive 0-based end position (defaults to -1: meaning til the end of the reference)
        --numBases : number of bases from start to display
        --params   : print the parameter settings
</pre>
Usage
./bam readReference --refFile <referenceFilename> --refName <reference Name> --start <0 based start> --end <0 based end>|--numBases <number of bases> [--params]
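Because --end is exclusive and --numBases counts bases from --start, the two ways of giving the length should describe the same interval when end = start + numBases. That equivalence is inferred from the parameter descriptions above and has not been checked against the tool; <code>end_from_num_bases</code> is a hypothetical helper.

```shell
# Hypothetical helper (not part of bam): compute the exclusive 0-based end
# position that matches a given 0-based start and a number of bases.
# Assumption: --numBases N selects the same bases as --end (start + N).
end_from_num_bases() {
    echo $(( $1 + $2 ))
}

end_from_num_bases 1000 70   # prints 1070
```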
Return Value
- 0: the reference file was successfully read.
- non-0: the reference file was not successfully read.
Example Output
<pre>
open and prefetch reference genome /home/mktrost/data/human.g1k.v37.fa: done.
GGCAAAATGTATATAATTATGGCATGAGGTATGCAACTTTAGGCAAGGAAGCAAAAGCAGAAACCATGAAA
</pre>