Changes

From Genome Analysis Wiki
1,421 bytes removed, 18:27, 3 January 2014
Line 8: Line 8:     
It has options to convert bases in the sequence that match the reference to '=', or to convert '=' back to the actual bases, using the reference sequence.  
 
 +
 +
It also has an option to left shift indels in the CIGARs before writing the output file.
    
If you want to convert a BAM file to a SAM file, just call:  
 
Line 15: Line 17:  
Don't forget to put in the paths to the executable and your test files.  
 
   −
= Parameters  =
  −
<pre>    Required Parameters:
  −
        --in      &nbsp;: the SAM/BAM file to be read
  −
        --out      &nbsp;: the SAM/BAM file to be written
  −
    Optional Parameters:
  −
--refFile  &nbsp;: reference file name
  −
        --noeof    &nbsp;: do not expect an EOF block on a bam file.
  −
        --params  &nbsp;: print the parameter settings
  −
        --recover  &nbsp;: attempt to recover the input bam file.
  −
    Optional Sequence Parameters (only specify one):
  −
--seqOrig  &nbsp;: Leave the sequence as is (default &amp; used if reference is not specified).
  −
--seqBases &nbsp;: Convert any '=' in the sequence to the appropriate base using the reference (requires --ref).
  −
--seqEquals&nbsp;: Convert any bases that match the reference to '=' (requires --ref).
  −
</pre>
  −
== input File (<code>--in</code>)  ==
  −
  −
Use <code>--in</code> followed by your file name to specify the SAM/BAM input file.
     −
The program automatically determines if your input file is SAM/BAM/uncompressed BAM without any input other than a filename from the user, unless your input file is stdin.
+
= Usage  =
   −
A <code>-</code> is used to indicate to read from stdin and the extension is used to determine the file type (no extension indicates SAM).
+
./bam convert --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--refFile <reference filename>] [--useBases|--useEquals|--useOrigSeq] [--lshift] [--recover] [--noeof] [--params]
   −
{|border="1" cellspacing="0" cellpadding="2"
  −
|SAM/BAM/Uncompressed BAM from file
  −
| <code>--in yourFileName</code>
  −
|-
  −
|SAM from stdin
  −
| <code>--in -</code>
  −
|-
  −
|BAM from stdin
  −
| <code>--in -.bam</code>
  −
|-
  −
|Uncompressed BAM from stdin
  −
| <code>--in -.ubam</code>
  −
|}
      +
= Parameters  =
 +
<pre> Required Parameters:
 +
--in        : the SAM/BAM file to be read
 +
--out        : the SAM/BAM file to be written
 +
Optional Parameters:
 +
--refFile    : reference file name
 +
--lshift    : left shift indels when writing records
 +
--noeof      : do not expect an EOF block on a bam file
 +
--params    : print the parameter settings
 +
--recover    : attempt error recovery while reading a bam file
 +
Optional Sequence Parameters (only specify one):
 +
--useOrigSeq : Leave the sequence as is (default & used if reference is not specified)
 +
--useBases  : Convert any '=' in the sequence to the appropriate base using the reference (requires --refFile)
 +
--useEquals  : Convert any bases that match the reference to '=' (requires --refFile)
 +
</pre>
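The parameter list carries two rules that are easy to miss: only one sequence option may be given, and <code>--useBases</code>/<code>--useEquals</code> require <code>--refFile</code>. The following is a minimal sketch of a wrapper that enforces those rules before launching the tool. The helper name <code>build_convert_args</code> and the executable path <code>./bam</code> are hypothetical. Only flags from the list above are used.

```python
from typing import List, Optional

# The mutually exclusive sequence options from the parameter list.
SEQ_OPTIONS = {"--useOrigSeq", "--useBases", "--useEquals"}


def build_convert_args(in_file: str, out_file: str,
                       ref_file: Optional[str] = None,
                       seq_option: Optional[str] = None,
                       lshift: bool = False) -> List[str]:
    """Hypothetical helper: assemble a `bam convert` command line.

    Taking a single `seq_option` enforces "only specify one"; the path
    "./bam" is an assumption about where the executable lives.
    """
    args = ["./bam", "convert", "--in", in_file, "--out", out_file]
    if ref_file is not None:
        args += ["--refFile", ref_file]
    if seq_option is not None:
        if seq_option not in SEQ_OPTIONS:
            raise ValueError(f"unknown sequence option: {seq_option}")
        # --useBases and --useEquals both need the reference to compare against.
        if seq_option != "--useOrigSeq" and ref_file is None:
            raise ValueError(f"{seq_option} requires --refFile")
        args.append(seq_option)
    if lshift:
        args.append("--lshift")
    return args


print(build_convert_args("in.bam", "out.sam", ref_file="ref.fa",
                         seq_option="--useEquals", lshift=True))
```

The resulting list can be passed to <code>subprocess.run(args, check=True)</code>. With <code>check=True</code>, a non-zero exit status raises an exception.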
 +
{{PhoneHomeParamDesc}}
   −
Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file).  This matches the <code>samtools</code> implementation so pipes between our tools and <code>samtools</code> are supported.
+
== Required Parameters ==
 
+
{{InBAMInputFile}}
== output File (<code>--out</code>) ==
+
{{OutBAMOutputFile}}
 
  −
Use <code>--out</code> followed by your file name to specify the SAM/BAM output file.
  −
 
  −
The file extension is used to determine whether to write SAM/BAM/uncompressed BAM.  A <code>-</code> is used to indicate stdout and the extension for file type (no extension is SAM).
     −
{|border="1" cellspacing="0" cellpadding="2"
+
== Optional Parameters ==
|SAM to file
+
{{refFile}}
| <code>--out yourFileName.sam</code>
  −
|-
  −
|BAM to file
  −
| <code>--out yourFileName.bam</code>
  −
|-
  −
|Uncompressed BAM to file
  −
| <code>--out yourFileName.ubam</code>
  −
|-
  −
|SAM to stdout
  −
| <code>--out -</code>
  −
|-
  −
|BAM to stdout
  −
| <code>--out -.bam</code>
  −
|-
  −
|Uncompressed BAM to stdout
  −
| <code>--out -.ubam</code>
  −
|}
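The file-naming rules in the tables above can be summarized as follows. <code>-</code> means stdin/stdout. For <code>--out</code> (and for <code>--in -</code>), the extension selects SAM, BAM (<code>.bam</code>) or uncompressed BAM (<code>.ubam</code>), and no extension means SAM. A regular <code>--in</code> file is auto-detected from its contents instead. This is a minimal sketch with hypothetical function names. Treating an unrecognized extension as SAM is an assumption.

```python
from typing import Optional, Tuple


def format_from_name(name: str) -> str:
    """Pick the format from the extension; no extension means SAM."""
    if name.endswith(".ubam"):
        return "UBAM"  # BAM written with compression level 0
    if name.endswith(".bam"):
        return "BAM"
    return "SAM"  # assumption: any other extension is also treated as SAM


def is_std_stream(name: str) -> bool:
    """'-' alone, or '-' followed by an extension, means stdin/stdout."""
    return name == "-" or name.startswith("-.")


def resolve_in(name: str) -> Tuple[str, Optional[str]]:
    """For --in, the extension only matters when reading stdin."""
    if is_std_stream(name):
        return "stdin", format_from_name(name)
    return name, None  # None: the format is detected from the file contents


def resolve_out(name: str) -> Tuple[str, str]:
    """For --out, the extension always selects the output format."""
    target = "stdout" if is_std_stream(name) else name
    return target, format_from_name(name)


for spec in ["-", "-.bam", "-.ubam", "out.sam", "out.bam"]:
    print(spec, resolve_out(spec))
```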
      +
=== Left Shift Indels in the CIGAR (<code>--lshift</code>) ===
   −
Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file).  This matches the <code>samtools</code> implementation so pipes between our tools and <code>samtools</code> are supported.
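The note says "uncompressed" BAM is still compressed at level 0. The sketch below illustrates why such a file is not raw bytes: level 0 stores the data verbatim but still wraps it in gzip framing (header, stored deflate block, CRC32, size). It uses Python's standard <code>gzip</code> module, which writes plain gzip. Real BGZF, as used by BAM, additionally puts a block-size field in the gzip header, which this sketch does not reproduce. The payload is a stand-in, not real BAM content.

```python
import gzip

# Stand-in payload; a real BAM block holds binary records, not text.
payload = b"read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n" * 10

# Level 0 stores the bytes without compressing them, but the output is
# still a gzip member that a reader must unwrap.
stored = gzip.compress(payload, compresslevel=0)

print(stored[:2] == b"\x1f\x8b")           # gzip magic number is present
print(payload in stored)                   # payload bytes appear verbatim
print(len(stored) > len(payload))          # framing adds a few bytes
print(gzip.decompress(stored) == payload)  # round-trips like any gzip data
```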
+
Left shift indels as far as they can go in the read, that is, to the leftmost position that describes an equivalent alignment.  
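An indel can move one base to the left whenever the base immediately to its left equals the indel's last base. For an insertion, the bases are compared in the read. For a deletion, they are compared in the reference. The following is a minimal sketch of that idea on a CIGAR given as a list of (operation, length) pairs. It is not the libStatGen implementation. It assumes the CIGAR uses only M, I and D, that <code>ref</code> holds the reference bases starting at the alignment position, and that each indel keeps at least one aligned base on its left.

```python
from typing import List, Tuple

Cigar = List[Tuple[str, int]]


def left_shift_indels(cigar: Cigar, read: str, ref: str) -> Cigar:
    """Move each I/D that follows an M op as far left as it can go.

    Sketch assumptions: only M, I and D ops; `ref` starts at the
    alignment position; the preceding M keeps at least one base.
    """
    ops = [[op, length] for op, length in cigar]
    read_pos = ref_pos = 0  # read/reference offsets at the start of ops[i]
    i = 0
    while i < len(ops):
        op, length = ops[i]
        if op in ("I", "D") and i > 0 and ops[i - 1][0] == "M":
            # Inserted bases live in the read; deleted bases in the reference.
            seq, start = (read, read_pos) if op == "I" else (ref, ref_pos)
            shift = 0
            # A one-base shift is equivalent when the base left of the
            # indel equals the indel's last base.
            while (shift < ops[i - 1][1] - 1
                   and seq[start - shift - 1] == seq[start - shift + length - 1]):
                shift += 1
            if shift:
                ops[i - 1][1] -= shift      # preceding match gets shorter
                read_pos -= shift
                ref_pos -= shift
                if i + 1 < len(ops) and ops[i + 1][0] == "M":
                    ops[i + 1][1] += shift  # following match gets longer
                else:
                    ops.insert(i + 1, ["M", shift])
        if op in ("M", "I"):
            read_pos += length
        if op in ("M", "D"):
            ref_pos += length
        i += 1
    return [(op, length) for op, length in ops]


# Deletion of "AC" placed at reference offset 4 of GCACACT.
print(left_shift_indels([("M", 4), ("D", 2), ("M", 1)], "GCACT", "GCACACT"))
# -> [('M', 1), ('D', 2), ('M', 4)]
```

Because the base that moves across the indel is identical on both sides, each one-base shift preserves every match and mismatch in the alignment.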
    +
{{noeofBGZFParameter}}
 +
{{paramsParameter}}
   −
== Recover a corrupted BAM file (<code>--recover</code>) ==
+
=== Recover a corrupted BAM file (<code>--recover</code>) ===
    
See [[#BAM File Recovery |BAM File Recovery]].
 
   −
 
+
== Sequence Representation Parameters (<code>--useOrigSeq</code>, <code>--useBases</code>, <code>--useEquals</code>, <code>--refFile</code>) ==
== Sequence Representation Parameters (<code>--seqOrig</code>, <code>--seqBases</code>, <code>--seqEquals</code>, <code>--refFile</code>) ==
      
The sequence parameters specify how to represent the sequence when a reference is specified (<code>--refFile</code> option).  
 
   −
If the reference is not specified or seqOrig is specified, no modifications are made to the sequence.  
+
If the reference is not specified or useOrigSeq is specified, no modifications are made to the sequence.  
   −
If the reference and seqBases is specified, any matches between the sequence and the reference are represented in the sequence as the appropriate base.  
+
If both the reference and useBases are specified, any matches between the sequence and the reference are represented in the sequence as the appropriate base.  
   −
If the reference and seqEquals is specified, any matches between the sequence and the reference are represented in the sequence as '='.  
+
If both the reference and useEquals are specified, any matches between the sequence and the reference are represented in the sequence as '='.  
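The three modes above can be sketched as a single walk along the CIGAR that compares each aligned read base with the reference base it is aligned to. This is a minimal illustration with a hypothetical function name, not the tool's code. It assumes <code>ref</code> holds the reference bases starting at the alignment position and uses exact, case-sensitive comparison. Only bases under M/=/X operations are compared, so inserted and soft-clipped bases are left unchanged. Leaving the sequence as is (useOrigSeq) corresponds to not calling the function.

```python
from typing import List, Tuple


def convert_sequence(read: str, cigar: List[Tuple[str, int]], ref: str,
                     use_equals: bool) -> str:
    """Sketch of --useEquals (use_equals=True) and --useBases (False)."""
    out = list(read)
    read_pos = ref_pos = 0
    for op, length in cigar:
        if op in ("M", "=", "X"):  # read bases aligned to reference bases
            for j in range(length):
                base, ref_base = out[read_pos + j], ref[ref_pos + j]
                if use_equals and base == ref_base:
                    out[read_pos + j] = "="       # matching base -> '='
                elif not use_equals and base == "=":
                    out[read_pos + j] = ref_base  # '=' -> reference base
        if op in ("M", "I", "S", "=", "X"):  # ops that consume read bases
            read_pos += length
        if op in ("M", "D", "N", "=", "X"):  # ops that consume reference bases
            ref_pos += length
    return "".join(out)


with_equals = convert_sequence("ACCTAC", [("M", 6)], "ACGTACGT", use_equals=True)
print(with_equals)
print(convert_sequence(with_equals, [("M", 6)], "ACGTACGT", use_equals=False))
```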
    
=== Examples  ===
 
Line 129: Line 98:  
  Sequence with Equals: AA======G===GGG
 
    +
{{PhoneHomeParameters}}
    
= BAM File Recovery  =
 
Line 148: Line 118:  
In real cases, we have recovered better than 94% of reads from a set of severely damaged files (numerous 64K chunks of a RAID were lost) and better than 99.9% of reads from a moderately damaged file (3 disk pages were corrupt).  
 
   −
= Usage  =
  −
  −
./bam convert --in &lt;inputFile&gt; --out &lt;outputFile.sam/bam/ubam (ubam is uncompressed bam)&gt; [--refFile &lt;reference filename&gt;] [--seqBases|--seqEquals|--seqOrig] [--recover] [--noeof] [--params]
      
= Return Value  =
 
   −
Returns the SamStatus for the reads/writes.  
+
Returns the SamStatus for the reads/writes (0 for success, non-0 for failure).
    
= Example Output  =
 
