Changes

From Genome Analysis Wiki
1,827 bytes added, 17:25, 17 May 2012
no edit summary
Line 1: Line 1: −
=This functionality will be released on 5/17/2012=
  −
   
= Overview of the <code>bam2FastQ</code> function of <code>[[bamUtil]]</code> =
 
 
The <code>bam2FastQ</code> option of [[bamUtil]] converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired: once the BAM file has been converted, the resulting FastQ files can be used as input for the new alignment.
Line 16: Line 14:     
When processing files sorted by read name, the only requirement is that matching read names are next to each other.  The file does not need to be in strict alphabetical order.
 +
 +
Read names in paired-end FASTQ files are appended with "/1" for the first in the pair, and "/2" for the second in the pair.  Override these defaults using [[#First in Pair FastQ ReadName Extension (--firstRNExt)|--firstRNExt]] and [[#Second in Pair FastQ ReadName Extension (--secondRNExt)|--secondRNExt]].
 +
 +
Sequences marked as reverse strand in the SAM/BAM file are reverse complemented prior to writing to the FASTQ files.  To skip this step, specify [[#Do Not Reverse Complement Reverse Strands (--noReverseComp)|--noReverseComp]].
    
Any errors and a summary of how many pairs and unpaired reads were processed are written to stderr.
 
Line 83: Line 85:     
{{inBAMInputFile}}
 
 +
 +
== BAM File Is Sorted By Read Name (<code>--readName</code>) ==
 +
 +
By default, the bam2FastQ program checks the sort order in the SAM/BAM header when converting to FASTQ. If the header does not specify a sort order, the program assumes the file is sorted by coordinate.
 +
 +
To override the default and force the program to treat the file as sorted by read name, specify the <code>--readName</code> option.
 +
 +
The file does not need to be strictly sorted by read name.  The only requirement is that matching read names are next to each other.
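The adjacency requirement can be illustrated with a minimal Python sketch. This is not bamUtil's implementation: the <code>(read_name, payload)</code> record shape and the function name are hypothetical and chosen only for illustration. It shows why adjacent matching names are sufficient, and why a mate that is separated from its partner cannot be matched in a single pass.

```python
from itertools import groupby


def group_adjacent_by_name(records):
    """Yield lists of consecutive records that share a read name.

    `records` is any iterable of (read_name, payload) tuples. This record
    shape is hypothetical and is not bamUtil's internal representation.
    """
    for _name, group in groupby(records, key=lambda rec: rec[0]):
        yield list(group)


# Matching names are adjacent but the names are not in alphabetical order.
adjacent = [
    ("readB", "end1"), ("readB", "end2"),
    ("readA", "end1"),
    ("readC", "end1"), ("readC", "end2"),
]
adjacent_sizes = [len(g) for g in group_adjacent_by_name(adjacent)]

# The two "readX" records are separated, so a single pass sees them apart.
separated = [("readX", "end1"), ("readY", "end1"), ("readX", "end2")]
separated_sizes = [len(g) for g in group_adjacent_by_name(separated)]
```

In this sketch a group of two corresponds to a pair and a group of one to a read with no adjacent mate.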
 +
 +
== Reference File for Converting '=' in the Sequence to Bases (<code>--refFile</code>) ==
 +
If the SAM/BAM file contains '=' in the sequence instead of the actual bases, the bam2FastQ program needs to convert the '=' back to the bases.  To do that it needs the reference.  Specify the reference by using <code>--refFile</code> followed by the reference filename.
 +
 +
For example:
 +
  ./bam bam2FastQ --in myFile.bam --refFile myPath/myRefFile.fa
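To show why the reference is needed, the following Python sketch restores '=' characters from a reference string. It is a simplified illustration, not bamUtil's code: the function name and arguments are hypothetical, and it assumes an ungapped alignment in which read position i lines up with reference position <code>ref_start + i</code>. Alignments with insertions, deletions, or clipping would require the CIGAR string, which this sketch ignores.

```python
def restore_bases(read_seq, ref_seq, ref_start):
    """Replace each '=' in read_seq with the reference base at that offset.

    Assumes an ungapped alignment starting at the 0-based position
    ref_start, so read position i maps to ref_seq[ref_start + i].
    """
    return "".join(
        ref_seq[ref_start + i] if base == "=" else base
        for i, base in enumerate(read_seq)
    )


reference = "TTACCGTGAA"

# '=' marks bases identical to the reference; other bases are kept as-is.
all_match = restore_bases("A=C=TG", reference, 2)
with_mismatch = restore_bases("A=T=TG", reference, 2)
```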
    
== Output FastQ File Base Name (<code>--outBase</code>) ==
 
Line 93: Line 109:     
The value specified by this parameter is overridden by <code>--firstOut</code>, <code>--secondOut</code>, and <code>--unpairedOut</code>, but is used for whichever output files are not specified.
 
      
== Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) ==
 
Line 105: Line 120:  
For example:
 
 
  ./bam bam2FastQ --in myFile.bam --firstOut myFileEnd1.fastq
 
      
== Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) ==
 
Line 117: Line 131:  
For example:
 
 
  ./bam bam2FastQ --in myFile.bam --secondOut myFileEnd2.fastq
 
      
== Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) ==
 
Line 130: Line 143:  
  ./bam bam2FastQ --in myFile.bam --unpairedOut myFileUnpaired.fastq
 
    +
== First in Pair FastQ ReadName Extension (<code>--firstRNExt</code>) ==
+
<code>--firstRNExt</code> replaces the default "/1" extension, which is appended to the read name of the first end of a read pair, with the specified value.
 +
 
 +
== Second in Pair FastQ ReadName Extension (<code>--secondRNExt</code>) ==
 +
 
 +
<code>--secondRNExt</code> replaces the default "/2" extension, which is appended to the read name of the second end of a read pair, with the specified value.
 +
 
 +
== Include the Read Name on the "+" line of the FASTQ (<code>--rnPlus</code>) ==
 +
 
 +
By default the read name is not included on the "+" line of the FASTQ files.  To include the read name and the extension for paired-end reads, specify <code>--rnPlus</code>.
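The effect of the read name extensions and of <code>--rnPlus</code> on a FASTQ record can be sketched in Python as follows. This is an illustration of the four-line FASTQ layout, not bamUtil's code: the function and its parameters are hypothetical, and the empty default extension is a choice made for this sketch only.

```python
def fastq_record(name, seq, qual, ext="", rn_plus=False):
    """Format one four-line FASTQ record.

    ext is the read name extension (for example "/1" or "/2" for the two
    ends of a pair). rn_plus mimics the behaviour described for --rnPlus
    by repeating the read name and extension on the '+' line.
    """
    full_name = name + ext
    plus_line = "+" + full_name if rn_plus else "+"
    return f"@{full_name}\n{seq}\n{plus_line}\n{qual}\n"


# Default layout: the '+' line carries no read name.
plain = fastq_record("read1", "ACGT", "IIII", ext="/1")

# With rn_plus, the name and extension are repeated after '+'.
with_name = fastq_record("read1", "ACGT", "IIII", ext="/1", rn_plus=True)

# A custom extension, as --secondRNExt would supply for the second end.
custom = fastq_record("read1", "ACGT", "IIII", ext="_2")
```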
 +
 
 +
== Do Not Reverse Complement Reverse Strands (<code>--noReverseComp</code>) ==
 +
 
 +
By default, reads marked as reverse in the BAM file are reverse complemented prior to writing to the FASTQ files.  <code>--noReverseComp</code> skips the reverse complement step, so sequences are written exactly as they are stored in the BAM file.
 +
 
 +
For example, if a sequence stored as ACCGTG is marked as reverse, the default FASTQ record will contain the sequence CACGGT.
+
Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG.
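The ACCGTG example can be reproduced with a short Python sketch. The function names and the <code>no_reverse_comp</code> flag are hypothetical stand-ins for the behaviour described here, not bamUtil's implementation, and the sketch handles only the sequence.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


def fastq_sequence(seq, is_reverse, no_reverse_comp=False):
    """Return the sequence as it would appear in the FASTQ file.

    Reverse-strand reads are reverse complemented unless no_reverse_comp
    is set, mirroring the effect described for --noReverseComp.
    """
    if is_reverse and not no_reverse_comp:
        return reverse_complement(seq)
    return seq


default_output = fastq_sequence("ACCGTG", is_reverse=True)
skipped_output = fastq_sequence("ACCGTG", is_reverse=True, no_reverse_comp=True)
forward_output = fastq_sequence("ACCGTG", is_reverse=False)
```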
+
{{noeofBGZFParameter}}
 +
{{paramsParameter}}
      +
= Return Value =
+
Returns -1 if input parameters are invalid.
+
Returns the SamStatus for the reads/writes (0 on success).
 
