BamUtil

From Genome Analysis Wiki

Revision as of 20:33, 17 November 2010


bam Executable

When statgen is compiled, the SAM/BAM executable "bam" is generated in the statgen/src/bin/ directory.

The software reads the beginning of an input file to determine whether it is SAM or BAM. To choose the format of the output file, it checks the output file's extension: if the extension is ".bam", it writes a BAM file; otherwise it writes a SAM file.
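The two format rules above can be sketched in Python. This is an illustration under stated assumptions, not the bam executable's actual code. It assumes a BAM file is BGZF (gzip-compatible) compressed and that its decompressed stream begins with the magic bytes <code>BAM\1</code> from the SAM/BAM format specification. The function names <code>detect_input_format</code> and <code>output_format_for</code> are hypothetical. The ".ubam" (uncompressed BAM) output accepted by convert is not modeled.

```python
import gzip

def detect_input_format(path):
    """Guess SAM vs BAM by peeking at the start of the file.

    Assumption: BAM is BGZF-compressed (gzip-compatible) and its
    decompressed stream starts with b"BAM\\x01"; anything else is
    treated as SAM text.
    """
    with open(path, "rb") as f:
        head = f.read(2)
    if head != b"\x1f\x8b":  # no gzip/BGZF header: plain-text SAM
        return "SAM"
    with gzip.open(path, "rb") as g:
        magic = g.read(4)
    return "BAM" if magic == b"BAM\x01" else "SAM"

def output_format_for(path):
    """Mirror the documented rule: '.bam' -> BAM, anything else -> SAM."""
    return "BAM" if path.endswith(".bam") else "SAM"
```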

The bam executable has the following functions.

This executable is built using StatGenLibrary: BAM.

Running ./bam with no arguments prints the usage information for the bam executable.


validate

The <code>validate</code> option on the bam executable reads and validates a SAM/BAM file. This option is documented at: BamValidator

convert

The <code>convert</code> option on the bam executable reads a SAM/BAM file and writes it as a SAM/BAM file.

The executable converts the input file into the format of the output file. For example, to convert a BAM file to a SAM file, run the following from the pipeline/bam/ directory:

./bam convert --in <bamFile>.bam --out <newSamFile>.sam

Remember to substitute the paths to the executable and to your own input and output files.

Parameters

<pre>

   Required Parameters:
       --in       : the SAM/BAM file to be read
       --out      : the SAM/BAM file to be written
   Optional Parameters:
       --noeof    : do not expect an EOF block on a bam file.
       --params   : print the parameter settings

</pre>

Usage

./bam convert --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--noeof] [--params]


Return Value

Returns the SamStatus for the reads/writes.
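Because convert reports its outcome through its return value, a script can check the exit status after each call. The Python sketch below uses only the flags documented above, and its helper names are hypothetical. It assumes a zero exit status means success, which is consistent with the "Returning: 0 (SUCCESS)" line in the splitChromosome example output.

```python
import subprocess

def convert_args(bam_path, in_file, out_file, noeof=False, params=False):
    """Build the argument list for 'bam convert' from the documented flags."""
    args = [bam_path, "convert", "--in", in_file, "--out", out_file]
    if noeof:
        args.append("--noeof")   # do not expect an EOF block on a BAM file
    if params:
        args.append("--params")  # print the parameter settings
    return args

def run_convert(bam_path, in_file, out_file, **flags):
    """Run the conversion; raise if the exit status is non-zero.

    Assumption: a non-zero exit status (the SamStatus) signals failure.
    """
    result = subprocess.run(convert_args(bam_path, in_file, out_file, **flags))
    if result.returncode != 0:
        raise RuntimeError(f"bam convert failed with status {result.returncode}")
    return result.returncode
```

For example, run_convert("./bam", "in.bam", "out.sam") would run the same command as the usage line above.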

Example Output

<pre>
Number of records read = 10
Number of records written = 10
</pre>


dumpHeader

The <code>dumpHeader</code> option on the bam executable prints the header of the specified SAM/BAM file to cout.

Parameters

<pre>

   Required Parameters:
       filename : the sam/bam filename whose header should be printed.

</pre>

Usage

./bam dumpHeader <inputFile>

Return Value

  • 0: the header was successfully read and printed.
  • non-0: the header was not successfully read or was not printed. (Returns the SamStatus.)


Example Output

<pre>
@SQ SN:1 LN:247249719
@SQ SN:2 LN:242951149
@SQ SN:3 LN:199501827
</pre>
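If the dumpHeader output needs further processing, the @SQ lines can be parsed into a map from reference name (SN) to length (LN). The helper below is hypothetical. It assumes one header record per line, with TAG:VALUE fields separated by tabs or spaces as in the example output above.

```python
def parse_sq_lines(header_text):
    """Map each @SQ reference name (SN) to its length (LN)."""
    refs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @RG, @PG and other header records
        # Split 'TAG:VALUE' fields on the first colon only, so values
        # that themselves contain ':' stay intact.
        tags = dict(field.split(":", 1) for field in line.split()[1:])
        refs[tags["SN"]] = int(tags["LN"])
    return refs
```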


splitChromosome

The <code>splitChromosome</code> option on the bam executable splits an indexed BAM file into multiple files based on the Chromosome (Reference Name).

The output files all share the same base name, with _# appended, where # is the associated reference ID from the BAM file.

Parameters

<pre>

   Required Parameters:
       --in       : the BAM file to be split
       --out      : the base filename for the SAM/BAM files to write into.  Does not include the extension.
                    _N will be appended to the basename where N indicates the Chromosome.
   Optional Parameters:
       --noeof  : do not expect an EOF block on a bam file.
       --bamIndex : the path/name of the bam index file
                    (if not specified, uses the --in value + ".bai")
       --bamout : write the output files in BAM format (default).
       --samout : write the output files in SAM format.
       --params : print the parameter settings

</pre>

Usage

./bam splitChromosome --in <inputFilename> --out <outputFileBaseName> [--bamIndex <bamIndexFile>] [--noeof] [--bamout|--samout] [--params]


Return Value

  • 0: all records are successfully read and written.
  • non-0: at least one record was not successfully read or written.

Example Output

<pre>
Reference ID -1 has 2 records
Reference ID 0 has 5 records
Reference ID 1 has 2 records
Reference ID 2 has 1 records
Reference ID 3 has 0 records
Reference ID 4 has 0 records
Reference ID 5 has 0 records
Reference ID 6 has 0 records
Reference ID 7 has 0 records
Reference ID 8 has 0 records
Reference ID 9 has 0 records
Reference ID 10 has 0 records
Reference ID 11 has 0 records
Reference ID 12 has 0 records
Reference ID 13 has 0 records
Reference ID 14 has 0 records
Reference ID 15 has 0 records
Reference ID 16 has 0 records
Reference ID 17 has 0 records
Reference ID 18 has 0 records
Reference ID 19 has 0 records
Reference ID 20 has 0 records
Reference ID 21 has 0 records
Reference ID 22 has 0 records
Number of records = 10
Returning: 0 (SUCCESS)
</pre>
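The per-reference counts in this output add up to the reported total (2 + 5 + 2 + 1 = 10), which gives a quick sanity check when scripting splitChromosome. The sketch below assumes the log lines keep exactly the wording shown above, and the function name is hypothetical.

```python
import re

def check_split_counts(log_text):
    """Return the per-reference counts and whether they sum to the total.

    Assumes the wording 'Reference ID <id> has <n> records' and
    'Number of records = <total>', as in the example output.
    """
    per_ref = {
        int(ref_id): int(count)
        for ref_id, count in re.findall(r"Reference ID (-?\d+) has (\d+) records", log_text)
    }
    match = re.search(r"Number of records = (\d+)", log_text)
    if match is None:
        raise ValueError("no 'Number of records' line found")
    total = int(match.group(1))
    return per_ref, sum(per_ref.values()) == total
```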


writeRegion

The <code>writeRegion</code> option on the bam executable writes the alignments in the indexed BAM file that fall into the specified region (reference id and start/end position).

Parameters

<pre>

   Required Parameters:
       --in       : the BAM file to be read
       --out      : the SAM/BAM file to write to
   Optional Parameters:
       --noeof  : do not expect an EOF block on a bam file.
       --bamIndex : the path/name of the bam index file
                    (if not specified, uses the --in value + ".bai")
       --refName  : the BAM reference Name to read (either this or refID can be specified)
       --refID    : the BAM reference ID to read (defaults to -1: unmapped)
       --start    : inclusive 0-based start position (defaults to -1)
       --end      : exclusive 0-based end position (defaults to -1: meaning til the end of the reference)
       --params   : print the parameter settings

</pre>
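The --start and --end values use 0-based, half-open coordinates: --start is included and --end is not. Tools that take 1-based, fully closed regions in name:start-end form (samtools, for example) need start + 1 and the same end value. The Python helper below is hypothetical and illustrates only that conversion. It does not handle the -1 defaults.

```python
def to_one_based_region(ref_name, start0, end0):
    """Convert 0-based, half-open [start0, end0) to 1-based closed 'name:start-end'.

    The first base included is start0 (1-based position start0 + 1);
    the last base included is end0 - 1 (1-based position end0).
    """
    if start0 < 0 or end0 <= start0:
        raise ValueError("expects explicit coordinates with 0 <= start0 < end0")
    return f"{ref_name}:{start0 + 1}-{end0}"
```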

Usage

./bam writeRegion --in <inputFilename> --out <outputFilename> [--bamIndex <bamIndexFile>] [--noeof] [--refName <reference Name> | --refID <reference ID>] [--start <0-based start pos>] [--end <0-based end position>] [--params]

Return Value

  • 0: all records are successfully read and written.
  • non-0: at least one record was not successfully read or written.

Example Output

<pre>
Wrote t.sam with 2 records.
</pre>


dumpRefInfo

The <code>dumpRefInfo</code> option on the bam executable prints the SAM/BAM file's reference information.

Parameters

<pre>

   Required Parameters:
       --in               : the SAM/BAM file to be read
   Optional Parameters:
       --noeof            : do not expect an EOF block on a bam file.
       --printRecordRefs  : print the reference information for the records in the file (grouped by reference).
       --params           : print the parameter settings

</pre>

Usage

./bam dumpRefInfo --in <inputFilename> [--noeof] [--printRecordRefs] [--params]

Return Value

  • 0: the file was processed successfully.
  • non-0: the file was not processed successfully.


dumpIndex

The <code>dumpIndex</code> option on the bam executable prints a BAM index file in an easy-to-read format.

Parameters

<pre>

   Required Parameters:
       --bamIndex : the path/name of the bam index file to display
   Optional Parameters:
       --refID    : the reference ID to read, defaults to print all
       --summary  : only print a summary - 1 line per reference.
       --params   : print the parameter settings

</pre>
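For background on what a BAM index contains: the SAM/BAM format specification assigns each alignment to a bin computed from its 0-based, half-open span, and the BAI index is organized by reference and bin. The Python function below is a port of the specification's reg2bin routine. It is included only to illustrate the binning scheme; this page does not state which fields dumpIndex prints.

```python
def reg2bin(beg, end):
    """Bin for the 0-based, half-open interval [beg, end).

    Port of reg2bin from the SAM/BAM specification: bins at levels of
    2^14, 2^17, 2^20, 2^23 and 2^26 bases, with bin 0 covering the full
    2^29-base range.
    """
    end -= 1  # make the end coordinate inclusive
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0
```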

Usage

./bam dumpIndex --bamIndex <bamIndexFile> [--refID <ref#>] [--summary] [--params]

Return Value

  • 0: the BAM index file was processed successfully.
  • non-0: the BAM index file was not processed successfully.


readIndexedBam

The <code>readIndexedBam</code> option on the bam executable reads an indexed BAM file one reference ID at a time, from reference ID -1 (unmapped) up to the maximum reference ID, and writes the records out as a SAM/BAM file.

Parameters

<pre>

   Required Parameters:
       inputFilename      - path/name of the input BAM file
       outputFile.sam/bam - path/name of the output file
       bamIndexFile       - path/name of the BAM index file

</pre>

Usage

./bam readIndexedBam <inputFilename> <outputFile.sam/bam> <bamIndexFile>

Return Value

  • 0

filter

The <code>filter</code> option on the bam executable filters the reads in a SAM/BAM file. This option is documented at: Bam Executable: Filter

readReference

The <code>readReference</code> option on the bam executable prints the specified region of the reference sequence in an easy-to-read format.

Parameters

<pre>

   Required Parameters:
       --refFile  : the reference
       --refName  : the SAM/BAM reference Name to read
       --start    : inclusive 0-based start position (defaults to -1)
   Required Length Parameter (one but not both needs to be specified):
       --end      : exclusive 0-based end position (defaults to -1: meaning til the end of the reference)
       --numBases : number of bases from start to display
       --params   : print the parameter settings

</pre>

Usage

./bam readReference --refFile <referenceFilename> --refName <reference Name> --start <0 based start> --end <0 based end>|--numBases <number of bases> [--params]
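The --end and --numBases options are two ways of describing the same half-open range: --numBases N covers the same bases as --end start + N. The minimal Python sketch below shows that selection over an in-memory sequence string. The function is hypothetical; it does not read a FASTA file or model the --start default.

```python
def reference_slice(sequence, start, end=None, num_bases=None):
    """Return bases [start, end), or num_bases bases beginning at start.

    Exactly one of end / num_bases must be given, mirroring the
    documented "one but not both" rule; end=-1 means "to the end of
    the reference".
    """
    if (end is None) == (num_bases is None):
        raise ValueError("specify exactly one of end or num_bases")
    if num_bases is not None:
        end = start + num_bases  # --numBases N is equivalent to --end start+N
    elif end == -1:
        end = len(sequence)      # -1: until the end of the reference
    return sequence[start:end]
```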

Return Value

  • 0: the reference file was successfully read.
  • non-0: the reference file was not successfully read.

Example Output

<pre>
open and prefetch reference genome /home/mktrost/data/human.g1k.v37.fa: done.
GGCAAAATGTATATAATTATGGCATGAGGTATGCAACTTTAGGCAAGGAAGCAAAAGCAGAAACCATGAAA
</pre>