BamUtil

From Genome Analysis Wiki

Revision as of 22:15, 17 November 2010


bam Executable

When statgen is compiled, the SAM/BAM executable "bam" is generated in the statgen/src/bin/ directory.

The software reads the beginning of an input file to determine whether it is SAM or BAM. The output format is determined by the output file's extension: if the extension is ".bam", it writes a BAM file; otherwise, it writes a SAM file.
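As a rough illustration of these rules, here is a minimal Python sketch. It assumes that "reading the beginning of an input file" amounts to checking magic bytes: a compressed BAM file is BGZF (a series of gzip blocks) and so starts with the gzip magic bytes 0x1f 0x8b, while uncompressed BAM data starts with "BAM\1". The helper names are hypothetical and this is not the bam executable's actual code.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # BGZF-compressed BAM is a series of gzip blocks
BAM_MAGIC = b"BAM\x01"    # magic string at the start of uncompressed BAM data


def sniff_format(head: bytes) -> str:
    """Classify the first bytes of an input file (hypothetical helper)."""
    if head.startswith(GZIP_MAGIC) or head.startswith(BAM_MAGIC):
        return "BAM"
    return "SAM"  # anything else is treated as SAM text


def detect_input_format(path: str) -> str:
    """Read the beginning of the file and classify it."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(4))


def output_format(path: str) -> str:
    """Choose the output format from the output file's extension."""
    suffix = Path(path).suffix
    if suffix == ".bam":
        return "BAM"
    if suffix == ".ubam":  # listed in the convert usage as uncompressed BAM
        return "uncompressed BAM"
    return "SAM"  # every other extension produces SAM
```

The ".ubam" branch reflects the convert usage line on this page; whether other functions honor it is not stated here.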

The bam executable has the following functions.

This executable is built using StatGenLibrary: BAM.

Running ./bam without any arguments prints the usage information for the bam executable.


validate

The validate option on the bam executable reads and validates a SAM/BAM file. This option is documented at: BamValidator

convert

The convert option on the bam executable reads a SAM/BAM file and writes it as a SAM/BAM file.

The executable converts the input file into the format of the output file. For example, to convert a BAM file to a SAM file, run the following from the pipeline/bam/ directory:

./bam convert --in <bamFile>.bam --out <newSamFile>.sam

Remember to include the paths to the executable and to your files.
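The same conversion can be scripted. The following Python sketch only assembles the documented convert arguments and runs them with subprocess; the "./bam" default path and the helper names are assumptions. The exit code is treated as the SamStatus value described under Return Value, where 0 means success.

```python
import subprocess


def convert_command(in_path, out_path, exe="./bam", noeof=False, params=False):
    """Build the argument list for `bam convert` (flags from the usage line)."""
    cmd = [exe, "convert", "--in", in_path, "--out", out_path]
    if noeof:
        cmd.append("--noeof")   # do not expect an EOF block on a BAM file
    if params:
        cmd.append("--params")  # print the parameter settings
    return cmd


def run_convert(in_path, out_path, **options):
    """Run the conversion and return the process exit code (0 on success)."""
    completed = subprocess.run(convert_command(in_path, out_path, **options))
    return completed.returncode
```

For example, run_convert("input.bam", "output.sam") would write a SAM file, because the output extension is ".sam".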

Parameters

    Required Parameters:
        --in       : the SAM/BAM file to be read
        --out      : the SAM/BAM file to be written
    Optional Parameters:
        --noeof    : do not expect an EOF block on a bam file.
        --params   : print the parameter settings

Usage

./bam convert --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--noeof] [--params]


Return Value

Returns the SamStatus for the reads/writes.

Example Output

Number of records read = 10
Number of records written = 10


dumpHeader

The dumpHeader option on the bam executable prints the header of the specified SAM/BAM file to cout.

Parameters

    Required Parameters:
        filename : the SAM/BAM file whose header should be printed.

Usage

./bam dumpHeader <inputFile>

Return Value

  • 0: the header was successfully read and printed.
  • non-0: the header was not successfully read or was not printed. (Returns the SamStatus.)


Example Output

@SQ	SN:1	LN:247249719
@SQ	SN:2	LN:242951149
@SQ	SN:3	LN:199501827
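In the example output above, each @SQ line is tab-separated, with the SN field giving the reference name and LN giving its length (tags defined by the SAM specification). A short Python sketch, using a hypothetical helper and only the standard library, that turns dumpHeader output into a name-to-length mapping:

```python
def parse_sq_lines(header_text):
    """Map reference name (SN) to length (LN) for every @SQ header line."""
    lengths = {}
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue  # skip @HD, @RG, @PG, blank lines, and other header lines
        tags = dict(field.split(":", 1) for field in fields[1:])
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths
```

Fed the three lines shown above, it returns the lengths of references 1, 2 and 3 keyed by name.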


splitChromosome

The splitChromosome option on the bam executable splits an indexed BAM file into multiple files based on chromosome (reference name).

The output files all share the same base name, with _# appended, where # is the corresponding reference ID from the BAM file.

Parameters

    Required Parameters:
        --in       : the BAM file to be split
        --out      : the base filename for the SAM/BAM files to write into.  Does not include the extension.
                     _N will be appended to the basename where N indicates the Chromosome.
    Optional Parameters:
        --noeof    : do not expect an EOF block on a bam file.
        --bamIndex : the path/name of the bam index file
                     (if not specified, uses the --in value + ".bai")
        --bamout   : write the output files in BAM format (default).
        --samout   : write the output files in SAM format.
        --params   : print the parameter settings
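A Python sketch of the naming scheme, assuming the tool joins the base name, an underscore, the reference ID, and an extension chosen by --bamout/--samout. The exact extension handling and the name used for unmapped reads (reference ID -1) are assumptions; check the files the tool actually writes.

```python
def split_output_names(base, ref_ids, sam_out=False):
    """Hypothetical splitChromosome output names, one per reference ID."""
    extension = ".sam" if sam_out else ".bam"  # --bamout is the default
    return [f"{base}_{ref_id}{extension}" for ref_id in ref_ids]
```

For example, split_output_names("out", [0, 1]) gives ["out_0.bam", "out_1.bam"].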

Usage

./bam splitChromosome --in <inputFilename> --out <outputFileBaseName> [--bamIndex <bamIndexFile>] [--noeof] [--bamout|--samout] [--params]


Return Value

  • 0: all records are successfully read and written.
  • non-0: at least one record was not successfully read or written.

Example Output

Reference ID -1 has 2 records
Reference ID 0 has 5 records
Reference ID 1 has 2 records
Reference ID 2 has 1 records
Reference ID 3 has 0 records
Reference ID 4 has 0 records
Reference ID 5 has 0 records
Reference ID 6 has 0 records
Reference ID 7 has 0 records
Reference ID 8 has 0 records
Reference ID 9 has 0 records
Reference ID 10 has 0 records
Reference ID 11 has 0 records
Reference ID 12 has 0 records
Reference ID 13 has 0 records
Reference ID 14 has 0 records
Reference ID 15 has 0 records
Reference ID 16 has 0 records
Reference ID 17 has 0 records
Reference ID 18 has 0 records
Reference ID 19 has 0 records
Reference ID 20 has 0 records
Reference ID 21 has 0 records
Reference ID 22 has 0 records
Number of records = 10
Returning: 0 (SUCCESS)
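The per-reference counts in this output should add up to the final total (2 + 5 + 2 + 1 = 10 here). A small Python sketch, not part of bamUtil, that parses this log format and performs that consistency check:

```python
import re

REF_LINE = re.compile(r"^Reference ID (-?\d+) has (\d+) records$")
TOTAL_LINE = re.compile(r"^Number of records = (\d+)$")


def check_split_log(log_text):
    """Return (per-reference counts, total) and verify that they agree."""
    counts, total = {}, None
    for line in log_text.splitlines():
        line = line.strip()
        if match := REF_LINE.match(line):
            counts[int(match.group(1))] = int(match.group(2))
        elif match := TOTAL_LINE.match(line):
            total = int(match.group(1))
    if total is None or sum(counts.values()) != total:
        raise ValueError("per-reference counts do not match the total")
    return counts, total
```

Lines that match neither pattern, such as the final "Returning:" line, are ignored.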


writeRegion

The writeRegion option on the bam executable writes the alignments in the indexed BAM file that fall into the specified region (reference id and start/end position).

Parameters

    Required Parameters:
        --in       : the BAM file to be read
        --out      : the SAM/BAM file to write to
    Optional Parameters:
        --noeof    : do not expect an EOF block on a bam file.
        --bamIndex : the path/name of the bam index file
                     (if not specified, uses the --in value + ".bai")
        --refName  : the BAM reference Name to read (either this or refID can be specified)
        --refID    : the BAM reference ID to read (defaults to -1: unmapped)
        --start    : inclusive 0-based start position (defaults to -1)
        --end      : exclusive 0-based end position (defaults to -1: meaning until the end of the reference)
        --params   : print the parameter settings
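Region coordinates here are 0-based and half-open: --start is included and --end is excluded. The SAM format's POS field and many region strings use 1-based inclusive coordinates instead, so 1-based positions 1000 through 2000 become --start 999 --end 2000. A Python sketch of that conversion and of the resulting command line; the helper names, the "./bam" path and the file names are hypothetical:

```python
def one_based_to_half_open(first, last):
    """Convert a 1-based inclusive range [first, last] to 0-based [start, end)."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return first - 1, last


def write_region_command(in_path, out_path, ref_name, first, last, exe="./bam"):
    """Build a `bam writeRegion` call for a 1-based inclusive range."""
    start, end = one_based_to_half_open(first, last)
    return [exe, "writeRegion", "--in", in_path, "--out", out_path,
            "--refName", ref_name, "--start", str(start), "--end", str(end)]
```

Because the end is exclusive, end - start equals the number of positions covered (1001 in the example above).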

Usage

./bam writeRegion --in <inputFilename> --out <outputFilename> [--bamIndex <bamIndexFile>] [--noeof] [--refName <reference Name> | --refID <reference ID>] [--start <0-based start pos>] [--end <0-based end position>] [--params]

Return Value

  • 0: all records are successfully read and written.
  • non-0: at least one record was not successfully read or written.

Example Output


Wrote t.sam with 2 records.


dumpRefInfo

The dumpRefInfo option on the bam executable prints the SAM/BAM file's reference information.

Parameters

    Required Parameters:
        --in               : the SAM/BAM file to be read
    Optional Parameters:
        --noeof            : do not expect an EOF block on a bam file.
        --printRecordRefs  : print the reference information for the records in the file (grouped by reference).
        --params           : print the parameter settings

Usage

./bam dumpRefInfo --in <inputFilename> [--noeof] [--printRecordRefs] [--params]

Return Value

  • 0: the file was processed successfully.
  • non-0: the file was not processed successfully.


dumpIndex

The dumpIndex option on the bam executable prints a BAM index file in an easy-to-read format.

Parameters

    Required Parameters:
        --bamIndex : the path/name of the bam index file to display
    Optional Parameters:
        --refID    : the reference ID to read, defaults to print all
        --summary  : only print a summary - 1 line per reference.
        --params   : print the parameter settings

Usage

./bam dumpIndex --bamIndex <bamIndexFile> [--refID <ref#>] [--summary] [--params]

Return Value

  • 0: the BAM index file was processed successfully.
  • non-0: the BAM index file was not processed successfully.


readIndexedBam

The readIndexedBam option on the bam executable reads an indexed BAM file one reference ID at a time, from reference ID -1 (unmapped reads) up to the maximum reference ID, and writes the records out as a SAM/BAM file.
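The resulting output order can be illustrated with a small in-memory Python sketch. It assumes records are given as hypothetical (reference ID, read name) pairs and that the number of references is known from the header; the real tool uses the BAM index to jump to each reference rather than grouping records like this.

```python
def reorder_by_reference(records, num_references):
    """Group records by reference ID in the order -1, 0, ..., max.

    records: iterable of (ref_id, read_name) pairs (hypothetical layout).
    Within one reference, the original record order is kept.
    """
    by_ref = {ref_id: [] for ref_id in range(-1, num_references)}
    for ref_id, name in records:
        by_ref[ref_id].append(name)
    ordered = []
    for ref_id in range(-1, num_references):
        ordered.extend(by_ref[ref_id])
    return ordered
```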

Parameters

    Required Parameters:
        inputFilename      - path/name of the input BAM file
        outputFile.sam/bam - path/name of the output file
        bamIndexFile       - path/name of the BAM index file

Usage

./bam readIndexedBam <inputFilename> <outputFile.sam/bam> <bamIndexFile>

Return Value

  • 0

filter

The filter option on the bam executable filters the reads in a SAM/BAM file. This option is documented at: Bam Executable: Filter

readReference

The readReference option on the bam executable prints the specified region of the reference sequence in an easy-to-read format.

Parameters

    Required Parameters:
        --refFile  : the reference file
        --refName  : the SAM/BAM reference Name to read
        --start    : inclusive 0-based start position (defaults to -1)
    Required Length Parameter (specify one but not both):
        --end      : exclusive 0-based end position (defaults to -1: meaning until the end of the reference)
        --numBases : number of bases from start to display
    Optional Parameters:
        --params   : print the parameter settings
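Exactly one of --end and --numBases sets the length of the region, and --numBases N covers the same bases as --end start+N. A Python sketch of that rule applied to an in-memory sequence string; the function name is hypothetical, and treating an end of -1 as "until the end of the reference" follows the parameter description above.

```python
def reference_slice(sequence, start, end=None, num_bases=None):
    """Return the half-open slice [start, end) using readReference-style arguments."""
    if (end is None) == (num_bases is None):
        raise ValueError("specify exactly one of end or num_bases")
    if num_bases is not None:
        end = start + num_bases
    elif end == -1:
        end = len(sequence)  # -1 means "until the end of the reference"
    return sequence[start:end]
```

For example, reference_slice("GGCAAAATG", 0, num_bases=3) and reference_slice("GGCAAAATG", 0, end=3) both return "GGC".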

Usage

./bam readReference --refFile <referenceFilename> --refName <reference Name> --start <0 based start> --end <0 based end>|--numBases <number of bases> [--params]

Return Value

  • 0: the reference file was successfully read.
  • non-0: the reference file was not successfully read.

Example Output


open and prefetch reference genome /home/mktrost/data/human.g1k.v37.fa: done.
GGCAAAATGTATATAATTATGGCATGAGGTATGCAACTTTAGGCAAGGAAGCAAAAGCAGAAACCATGAAA