Changes

From Genome Analysis Wiki
1,958 bytes added ,  17:38, 6 October 2011
Add --in/--out sections
Line 1: Line 1: −
[[Category:BamUtil|convert]]
+
<br>
[[Category:BAM Software]]
  −
[[Category:Software]]
     −
= Overview of the <code>convert</code> function of <code>bamUtil</code> =
+
= Overview of the <code>convert</code> function of <code>bamUtil</code> =
The <code>convert</code> option on the [[bamUtil]] executable reads a SAM/BAM file and writes it as a SAM/BAM file.
     −
The executable converts the input file into the format of the output file.
+
The <code>convert</code> option on the [[BamUtil]] executable reads a SAM/BAM file and writes it as a SAM/BAM file.  
   −
It has options to convert the sequence between '=' and the actual bases by using the reference sequence.
+
The executable converts the input file into the format of the output file.  
   −
If you want to convert a BAM file to a SAM file, just call:
+
It has options to convert the sequence between '=' and the actual bases by using the reference sequence.  
<pathToExe>/bam --in <bamFile>.bam --out <newSamFile>.sam
  −
Don't forget to put in the paths to the executable and your test files.
     −
= Parameters =
+
If you want to convert a BAM file to a SAM file, just call:
<pre>
+
 
    Required Parameters:
+
&lt;pathToExe&gt;/bam --in &lt;bamFile&gt;.bam --out &lt;newSamFile&gt;.sam
         --in       : the SAM/BAM file to be read
+
 
         --out       : the SAM/BAM file to be written
+
Don't forget to put in the paths to the executable and your test files.
 +
 
 +
= Parameters =
 +
<pre>   Required Parameters:
 +
         --in       &nbsp;: the SAM/BAM file to be read
 +
         --out     &nbsp;: the SAM/BAM file to be written
 
     Optional Parameters:
 
     Optional Parameters:
--refFile   : reference file name
+
--refFile &nbsp;: reference file name
         --noeof     : do not expect an EOF block on a bam file.
+
         --noeof   &nbsp;: do not expect an EOF block on a bam file.
         --params   : print the parameter settings
+
         --params   &nbsp;: print the parameter settings
         --recover   : attempt to recover the input bam file.
+
         --recover &nbsp;: attempt to recover the input bam file.
 
     Optional Sequence Parameters (only specify one):
 
     Optional Sequence Parameters (only specify one):
--seqOrig   : Leave the sequence as is (default & used if reference is not specified).
+
--seqOrig &nbsp;: Leave the sequence as is (default &amp; used if reference is not specified).
--seqBases : Convert any '=' in the sequence to the appropriate base using the reference (requires --ref).
+
--seqBases &nbsp;: Convert any '=' in the sequence to the appropriate base using the reference (requires --ref).
--seqEquals : Convert any bases that match the reference to '=' (requires --ref).
+
--seqEquals&nbsp;: Convert any bases that match the reference to '=' (requires --ref).
</pre>
+
</pre>  
 +
== Input File (<code>--in</code>) ==
 +
 
 +
Use <code>--in</code> followed by your file name to specify the SAM/BAM input file.
 +
 
 +
The program automatically determines whether your input file is SAM, BAM, or uncompressed BAM; you only need to supply the file name, unless the input is stdin.
 +
 
 +
Use <code>-</code> as the file name to read from stdin; an extension appended to the <code>-</code> determines the file type (no extension indicates SAM).
 +
 
 +
{|border="1" cellspacing="0" cellpadding="2"
 +
|SAM/BAM/Uncompressed BAM from file
 +
| <code>--in yourFileName</code>
 +
|-
 +
|SAM from stdin
 +
| <code>--in -</code>
 +
|-
 +
|BAM from stdin
 +
| <code>--in -.bam</code>
 +
|-
 +
|Uncompressed BAM from stdin
 +
| <code>--in -.ubam</code>
 +
|}
 +
 
 +
 
 +
Note: Uncompressed BAM is written using compression level 0 (so it is not an entirely uncompressed file).  This matches the <code>samtools</code> implementation, so pipes between our tools and <code>samtools</code> are supported.
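The stdin forms in the table above, plus one plausible way to tell SAM from BAM for a regular file, can be sketched as follows. This is only an illustration: how <code>bam</code> itself inspects a file is not described on this page. The sketch assumes a check for the gzip magic bytes that begin every BGZF block, and the function name <code>describe_input</code> is hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # every BGZF block (and so every BAM file) starts with these bytes

# The three stdin forms listed in the table for --in.
STDIN_FORMS = {"-": "SAM", "-.bam": "BAM", "-.ubam": "uncompressed BAM"}


def describe_input(arg, head=b""):
    """Return (source, file_type) for an --in value.

    `head` holds the first bytes of a regular file; it is ignored for stdin,
    where the type comes from the extension after the dash.
    """
    if arg in STDIN_FORMS:
        return ("stdin", STDIN_FORMS[arg])
    # Assumption of this sketch: sniff the content instead of trusting the name.
    # It does not try to separate level-0 BAM from normally compressed BAM.
    kind = "BAM (BGZF container)" if head.startswith(GZIP_MAGIC) else "SAM"
    return (arg, kind)


print(describe_input("-.ubam"))
print(describe_input("reads.dat", b"\x1f\x8b\x08\x04"))
print(describe_input("reads.txt", b"@HD\tVN:1.0"))
```

The file names in the usage lines are made up; only the three stdin spellings come from the table.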
 +
 
 +
== Output File (<code>--out</code>) ==
 +
 
 +
Use <code>--out</code> followed by your file name to specify the SAM/BAM output file.
 +
 
 +
The file extension determines whether SAM, BAM, or uncompressed BAM is written.  Use <code>-</code> as the file name to write to stdout; an extension appended to the <code>-</code> selects the file type (no extension means SAM).
 +
 
 +
{|border="1" cellspacing="0" cellpadding="2"
 +
|SAM to file
 +
| <code>--out yourFileName.sam</code>
 +
|-
 +
|BAM to file
 +
| <code>--out yourFileName.bam</code>
 +
|-
 +
|Uncompressed BAM to file
 +
| <code>--out yourFileName.ubam</code>
 +
|-
 +
|SAM to stdout
 +
| <code>--out -</code>
 +
|-
 +
|BAM to stdout
 +
| <code>--out -.bam</code>
 +
|-
 +
|Uncompressed BAM to stdout
 +
| <code>--out -.ubam</code>
 +
|}
 +
 
 +
 
 +
Note: Uncompressed BAM is written using compression level 0 (so it is not an entirely uncompressed file).  This matches the <code>samtools</code> implementation, so pipes between our tools and <code>samtools</code> are supported.
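The mapping in the <code>--out</code> table above can be written out as a small lookup. This is a minimal sketch of the documented rows only, not the tool's own code; the function name <code>parse_out_arg</code> is hypothetical, and any spelling the table does not list is rejected rather than guessed at.

```python
# Extensions listed in the table for files written to disk.
FILE_TYPES = {".sam": "SAM", ".bam": "BAM", ".ubam": "uncompressed BAM"}

# The three stdout forms listed in the table.
STDOUT_FORMS = {"-": "SAM", "-.bam": "BAM", "-.ubam": "uncompressed BAM"}


def parse_out_arg(arg):
    """Return (destination, file_type) for an --out value."""
    if arg in STDOUT_FORMS:
        return ("stdout", STDOUT_FORMS[arg])
    if arg.startswith("-"):
        # Not one of the documented stdout forms; this sketch refuses to guess.
        raise ValueError("undocumented stdout form: " + arg)
    for ext, file_type in FILE_TYPES.items():
        if arg.endswith(ext):
            return (arg, file_type)
    # The table does not say what happens for other extensions.
    raise ValueError("extension not covered by the table: " + arg)


print(parse_out_arg("yourFileName.ubam"))
print(parse_out_arg("-.bam"))
print(parse_out_arg("-"))
```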
 +
 
   −
== Sequence Representation Parameters ==
+
== Sequence Representation Parameters ==
The sequence parameter options specify how to represent the sequence when a reference is specified (<code>--refFile</code> option). If no reference is specified, or <code>--seqOrig</code> is specified, the sequence is left unmodified. If a reference and <code>--seqBases</code> are specified, any '=' in the sequence is replaced with the corresponding reference base. If a reference and <code>--seqEquals</code> are specified, any base that matches the reference is written as '='.
+
 
 +
The sequence parameter options specify how to represent the sequence when a reference is specified (<code>--refFile</code> option). If no reference is specified, or <code>--seqOrig</code> is specified, the sequence is left unmodified. If a reference and <code>--seqBases</code> are specified, any '=' in the sequence is replaced with the corresponding reference base. If a reference and <code>--seqEquals</code> are specified, any base that matches the reference is written as '='.  
 +
 
 +
=== Examples  ===
   −
=== Examples ===
   
  ExtendedCigar: SSMMMDDMMMIMNNNMPMSSS
 
  ExtendedCigar: SSMMMDDMMMIMNNNMPMSSS
 
  Sequence:      AATAA  CTAGA  T AGGG
 
  Sequence:      AATAA  CTAGA  T AGGG
Line 62: Line 118:  
  Sequence with Equals: AA======G===GGG
 
  Sequence with Equals: AA======G===GGG
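The conversion in the example can be reproduced with a short CIGAR walk. This is a sketch under stated assumptions, not bamUtil's implementation: the CIGAR is written in standard run-length form (<code>2S3M2D3M1I1M3N1M1P1M3S</code>, equivalent to the expanded string above), the reference string is made up so that every aligned base matches, and the name <code>convert_sequence</code> is hypothetical. Soft-clipped and inserted bases have no reference position, so they are copied unchanged, which agrees with the "Sequence with Equals" line.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def convert_sequence(seq, cigar, ref, ref_start, to_equals):
    """Rewrite aligned bases as '=' (to_equals=True) or '=' as bases (False).

    ref_start is the 0-based index in `ref` of the first aligned base.
    """
    out = []
    read_i, ref_i = 0, ref_start
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":  # consumes both read and reference
            for k in range(n):
                base, ref_base = seq[read_i + k], ref[ref_i + k]
                if to_equals:
                    out.append("=" if base.upper() == ref_base.upper() else base)
                else:
                    out.append(ref_base if base == "=" else base)
            read_i += n
            ref_i += n
        elif op in "IS":  # read only: copied through untouched
            out.append(seq[read_i:read_i + n])
            read_i += n
        elif op in "DN":  # reference only
            ref_i += n
        # H and P consume neither the read nor the reference
    return "".join(out)


cigar = "2S3M2D3M1I1M3N1M1P1M3S"
ref = "TAAGGCTAACCCTA"  # hypothetical reference chosen for this illustration
with_equals = convert_sequence("AATAACTAGATAGGG", cigar, ref, 0, to_equals=True)
print(with_equals)  # AA======G===GGG
print(convert_sequence(with_equals, cigar, ref, 0, to_equals=False))  # AATAACTAGATAGGG
```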
   −
= BAM File Recovery =
+
= BAM File Recovery =
   −
A BAM file that has been corrupted or truncated due to a copy or disk problem can often be partially recovered.
+
A BAM file that has been corrupted or truncated due to a copy or disk problem can often be partially recovered.  
   −
Both the BGZF format and binary BAM format have enough information to scan forward and resynchronize the input data. While some data will be lost, substantial recovery can often be done.
+
Both the BGZF format and binary BAM format have enough information to scan forward and resynchronize the input data. While some data will be lost, substantial recovery can often be done.  
   −
When a file has bad blocks in it, normal copy commands (cp) will truncate the file at the point of disk read failure. To recover the maximum amount of data possible, use the dd command with the conv=noerror option.
+
When a file has bad blocks in it, normal copy commands (cp) will truncate the file at the point of disk read failure. To recover the maximum amount of data possible, use the dd command with the conv=noerror option.  
   −
So a normal use case for recovery would look like this:
+
So a normal use case for recovery would look like this:  
 +
<pre># dd if=brokenbamfile.bam of=/tmp/brokenbamfile1.bam conv=noerror bs=4k
 +
# bam convert --recover --in /tmp/brokenbamfile1.bam --out /tmp/brokenbamfilerecovered.bam
 +
</pre>
 +
Note, you will of course need to output the result file to a known good filesystem.
   −
<pre>
+
Currently, no statistics are printed on how many BAM records are recovered, but subsequent tests can readily be done on the resulting file to determine the quality of recovery.  
# dd if=brokenbamfile.bam of=/tmp/brokenbamfile1.bam conv=noerror bs=4k
  −
# bam convert --recover --in /tmp/brokenbamfile1.bam --out /tmp/brokenbamfilerecovered.bam
  −
</pre>
     −
Note, you will of course need to output the result file to a known good filesystem.
+
In real cases, we have recovered better than 94% of reads from a set of severely damaged files (numerous 64K chunks of a RAID were lost), and better than 99.9% recovery from a moderately damaged file (3 disk pages were corrupt).  
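To illustrate the "scan forward and resynchronize" idea at the BGZF level, the sketch below searches for the next block header after a damaged region and keeps only blocks whose CRC32 and length check out. It follows the BGZF block layout from the SAM/BAM specification and is not bamUtil's recovery code. It assumes the common case where the <code>BC</code> subfield is the only extra field (XLEN = 6), and it does not attempt the second step of resynchronizing BAM records inside the decompressed data. All function names are hypothetical, and <code>make_bgzf_block</code> exists only to build demo data.

```python
import struct
import zlib

BGZF_MAGIC = b"\x1f\x8b\x08\x04"  # gzip ID1, ID2, CM=deflate, FLG=FEXTRA
HEADER_LEN = 18                    # fixed header when "BC" is the only extra subfield
TRAILER_LEN = 8                    # CRC32 + ISIZE


def make_bgzf_block(payload):
    """Build one BGZF block; used here only to create demo data."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    cdata = comp.compress(payload) + comp.flush()
    bsize = HEADER_LEN + len(cdata) + TRAILER_LEN - 1  # total block length minus 1
    header = (BGZF_MAGIC + struct.pack("<IBBH", 0, 0, 0xFF, 6)
              + b"BC" + struct.pack("<HH", 2, bsize))
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF, len(payload))
    return header + cdata + trailer


def try_block(data, pos):
    """Return (payload, end_offset) if a valid block starts at pos, else None."""
    if pos + HEADER_LEN > len(data):
        return None
    xlen, = struct.unpack_from("<H", data, pos + 10)
    slen, bsize = struct.unpack_from("<HH", data, pos + 14)
    if xlen != 6 or data[pos + 12:pos + 14] != b"BC" or slen != 2:
        return None
    end = pos + bsize + 1
    if end > len(data) or end - pos < HEADER_LEN + TRAILER_LEN:
        return None
    crc, isize = struct.unpack_from("<II", data, end - TRAILER_LEN)
    try:
        payload = zlib.decompress(data[pos + HEADER_LEN:end - TRAILER_LEN], -15)
    except zlib.error:
        return None
    if len(payload) != isize or zlib.crc32(payload) & 0xFFFFFFFF != crc:
        return None
    return payload, end


def recover_payloads(data):
    """Scan forward, yielding the payload of every block that validates."""
    pos = data.find(BGZF_MAGIC)
    while pos >= 0:
        hit = try_block(data, pos)
        if hit is None:
            pos = data.find(BGZF_MAGIC, pos + 1)  # resynchronize on the next magic
        else:
            payload, end = hit
            yield payload
            pos = data.find(BGZF_MAGIC, end)


a, b, c = (make_bgzf_block(p) for p in (b"first", b"second", b"third"))
damaged = a + b[:10] + c  # the middle block is cut off after 10 bytes
print(list(recover_payloads(damaged)))  # [b'first', b'third']
```

The damaged middle block is skipped and the block after it is still found, which is the behaviour the paragraph on BGZF resynchronization describes.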
   −
Currently, no statistics are printed on how many BAM records are recovered, but subsequent tests can readily be done on the resulting file to determine the quality of recovery.
+
= Usage  =
   −
In real cases, we have recovered better than 94% of reads from a set of severely damaged files (numerous 64K chunks of a RAID were lost), and better than 99.9% recovery from a moderately damaged file (3 disk pages were corrupt).
+
./bam convert --in &lt;inputFile&gt; --out &lt;outputFile.sam/bam/ubam (ubam is uncompressed bam)&gt; [--refFile &lt;reference filename&gt;] [--seqBases|--seqEquals|--seqOrig] [--recover] [--noeof] [--params]
   −
= Usage =
+
= Return Value  =
./bam convert --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--refFile <reference filename>] [--seqBases|--seqEquals|--seqOrig] [--recover] [--noeof] [--params]
     −
= Return Value =
+
Returns the SamStatus for the reads/writes.  
Returns the SamStatus for the reads/writes.
     −
= Example Output =
+
= Example Output =
<pre>
+
<pre>Number of records read = 10
Number of records read = 10
   
Number of records written = 10
 
Number of records written = 10
</pre>
+
</pre>  
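If the two counts need to be checked in a script, the example output above can be parsed with a short helper. This is a minimal sketch that assumes the output has exactly the wording shown; the function name <code>parse_record_counts</code> is hypothetical, and whether the tool prints these lines to stdout or stderr is not stated here, so the helper simply takes the captured text.

```python
import re

COUNT_RE = re.compile(r"Number of records (read|written) = (\d+)")


def parse_record_counts(text):
    """Return a dict such as {'read': 10, 'written': 10} from the printed summary."""
    return {kind: int(number) for kind, number in COUNT_RE.findall(text)}


sample = "Number of records read = 10\nNumber of records written = 10\n"
counts = parse_record_counts(sample)
print(counts)
print(counts["read"] == counts["written"])
```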
 +
[[Category:BamUtil|convert]] [[Category:BAM_Software]] [[Category:Software]]
