Changes

From Genome Analysis Wiki
1,825 bytes added, 18:47, 6 January 2014
no edit summary
Line 26: Line 26:
=== Output Files ===
− This program produces 3 output fastq files.
+ By default, this program produces 3 output fastq files.
# unpaired reads
# first end of paired reads
# second end of paired reads
+
+ If the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option is specified, the program produces 2 output fastq files.
+ # unpaired reads
+ # interleaved paired-end reads

The default fastq file names are determined by taking the base name of the input file and adding an extension for each filetype.
{|border="1" cellspacing="0" cellpadding="2"
− ! Output File Contents !! Extension
+ ! colspan="2"|Default !!colspan="2"|[[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]]
+ |-
+ ! Output File Contents !! Extension !! Output File Contents !! Extension
|-
+ |unpaired reads
+ | .fastq
|unpaired reads
| .fastq
Line 40: Line 48:
|first end of paired reads
| _1.fastq
+ | rowspan="2"|interleaved paired-end reads
+ (both first & second end)
+ | rowspan="2"|_interleaved.fastq
|-
|second end of paired reads
Line 45: Line 56:
|}

− If the inputFile was "myPath/myFile.bam", the resulting fastq's would be:
+ If the inputFile was "myPath/myFile.bam", the resulting fastqs would be:
#myPath/myFile.fastq
#myPath/myFile_1.fastq
#myPath/myFile_2.fastq
+
+ With the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option, the resulting fastqs would be:
+ #myPath/myFile.fastq
+ #myPath/myFile_interleaved.fastq
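The naming rule above can be sketched as a small POSIX shell function. This is an illustration only and is not taken from bam2FastQ's source. The helper name `default_fastq_names` is hypothetical. The sketch also assumes that the "base name" is the input path with its final extension removed, which is what the myPath/myFile.bam example suggests.

```sh
# Hypothetical helper (not part of bamUtil): print the default output names
# described above, unpaired file first, then the paired-end file(s).
# Assumption: the base name is the input path minus its last ".ext".
default_fastq_names() {
  input=$1
  mode=${2-}             # pass "merge" to model --merge
  file=${input##*/}      # final path component only
  case $file in
    *.*) base=${input%.*} ;;  # strip the last extension
    *)   base=$input ;;       # nothing to strip
  esac
  printf '%s\n' "$base.fastq"
  if [ "$mode" = merge ]; then
    printf '%s\n' "${base}_interleaved.fastq"
  else
    printf '%s\n' "${base}_1.fastq" "${base}_2.fastq"
  fi
}
```

For example, `default_fastq_names myPath/myFile.bam merge` prints the two <code>--merge</code> names listed above. The function only inspects the last path component for a dot, so a dotted directory name such as `my.dir/file` is left intact.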
    
Instead of using the inputFile base name as the output file base, you can specify a different base name by using the [[#Output FastQ File Base Name (--outBase)|--outBase]] option.

You can optionally directly specify the output fastq filenames using:
− * --firstOut firstReadInAPair.fastq
+ * --firstOut firstReadInAPair.fastq (also used for the interleaved filename with [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]]).
* --secondOut secondReadInAPair.fastq
* --unpairedOut unpairedReads.fastq
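The precedence between <code>--outBase</code> and the explicit filename options can be modelled in the same way. This is another hypothetical sketch. `pick` and `resolve_fastq_names` are illustrative names, and their positional arguments are not bam2FastQ's interface. Treating <code>--secondOut</code> as unused under <code>--merge</code> is an assumption based on the usage line, which lists the two options as alternatives.

```sh
# Hypothetical helper: print VALUE if it is non-empty, otherwise DEFAULT.
pick() {
  if [ -n "$1" ]; then printf '%s\n' "$1"; else printf '%s\n' "$2"; fi
}

# resolve_fastq_names BASE MODE FIRST SECOND UNPAIRED
#   BASE:  the --outBase value (or the input base name)
#   MODE:  "merge" to model --merge, anything else for the default
#   FIRST/SECOND/UNPAIRED: --firstOut/--secondOut/--unpairedOut, "" if unset
# Prints the unpaired file name, then the paired-end file name(s).
resolve_fastq_names() {
  base=$1; mode=$2; first=$3; second=$4; unpaired=$5
  pick "$unpaired" "$base.fastq"
  if [ "$mode" = merge ]; then
    # --firstOut also names the interleaved file; --secondOut is not used
    pick "$first" "${base}_interleaved.fastq"
  else
    pick "$first" "${base}_1.fastq"
    pick "$second" "${base}_2.fastq"
  fi
}
```

For example, `resolve_fastq_names myFile "" myFileEnd1.fastq "" ""` keeps the defaults for the unpaired and second-end files and uses myFileEnd1.fastq only for the first end.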
Line 59: Line 74:
= Usage =
−   ./bam bam2FastQ --in <inputFile> [--readName] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--noeof] [--params]
+   ./bam bam2FastQ --in <inputFile> [--readName] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--merge|--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--noeof] [--params]

= Parameters =
Line 68: Line 83:
--readname      : Process the BAM as readName sorted instead
                  of coordinate if the header does not indicate a sort order.
+ --merge        : Generate 1 interleaved (merged) FASTQ for paired-ends (unpaired in a separate file)
+                   use firstOut to override the filename of the interleaved file.
--refFile      : Reference file for converting '=' in the sequence to the actual base
                  if '=' are found and the refFile is not specified, 'N' is written to the FASTQ
− --outBase      : Base output name for generated output files
− --firstOut      : Output name for the first in pair file
−                   over-rides setting of outBase
− --secondOut    : Output name for the second in pair file
−                   over-rides setting of outBase
− --unpairedOut  : Output name for unpaired reads
−                   over-rides setting of outBase
--firstRNExt    : read name extension to use for first read in a pair
                  default is "/1"
Line 85: Line 95:
--noeof        : Do not expect an EOF block on a bam file.
--params        : Print the parameter settings to stderr
+ Optional OutputFile Names:
+ --outBase      : Base output name for generated output files
+ --firstOut      : Output name for the first in pair file
+                   over-rides setting of outBase
+ --secondOut    : Output name for the second in pair file
+                   over-rides setting of outBase
+ --unpairedOut  : Output name for unpaired reads
+                   over-rides setting of outBase
</pre>

+ == Required Parameters ==
{{inBAMInputFile}}

− == BAM File Is Sorted By Read Name (<code>--readname</code>) ==
+ == Optional Parameters ==
+ === BAM File Is Sorted By Read Name (<code>--readname</code>) ===

The bam2FastQ program by default checks the sort order in the SAM/BAM header when converting to FASTQ, and if that is not specified, assumes it is sorted by coordinate.
Line 97: Line 117:
The file does not need to be strictly sorted by read name.  The only requirement is that matching read names are next to each other.
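The adjacency requirement can be checked before running with <code>--readname</code>. The following is a minimal sketch. It assumes one read name per line on standard input, for example the QNAME column printed by `samtools view myFile.bam | cut -f1`. The function name `names_grouped` is hypothetical.

```sh
# Hypothetical check: exit 0 if every read name's lines are contiguous,
# exit 1 as soon as a name reappears after a different name.
names_grouped() {
  awk '
    $0 != prev {
      if ($0 in seen) { bad = 1; exit }  # seen earlier: not grouped
      seen[$0] = 1
      prev = $0
    }
    END { exit (bad ? 1 : 0) }'
}
```

The names do not have to be in sorted order: `r2 r2 r1 r1` passes, while `r1 r2 r1` fails.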

− == Reference File for Converting '=' in the Sequence to Bases <code>--refFile</code>==
+ === Generate 1 Paired-End Output File (<code>--merge</code>) ===
+
+ Use the <code>--merge</code> option to generate 1 interleaved (merged) FASTQ for paired-ends instead of 2 files.  Unpaired reads are still written to a separate file.
+
+ The default extension for the output file is "_interleaved".
+
+ Use [[#Output FastQ File Name For the First End of Paired End (--firstOut)|<code>--firstOut</code>]] to override the filename of the interleaved file.
+
+ This parameter was added in version 1.0.10.
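The layout of an interleaved file can be illustrated by interleaving two existing end files. This is not how bam2FastQ builds its output. The sketch assumes the usual interleaved convention, in which each first-end record is immediately followed by its second-end record. It also assumes that both inputs use 4-line records and list the pairs in the same order. `interleave_fastq` is a hypothetical name.

```sh
# Hypothetical helper: interleave_fastq FIRST.fastq SECOND.fastq
# Writes record 1 of FIRST, record 1 of SECOND, record 2 of FIRST, ...
interleave_fastq() {
  awk -v second="$2" '
    { rec = rec $0 "\n" }          # collect the 4 lines of a first-end record
    FNR % 4 == 0 {
      printf "%s", rec             # the first end ...
      rec = ""
      for (i = 0; i < 4; i++)      # ... then the matching second-end record
        if ((getline line < second) > 0) print line
    }' "$1"
}
```

The helper does not check that the read names match. With files written by bam2FastQ, the "/1" and "/2" extensions on each header make the pairing easy to verify by eye.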
+
+ === Reference File for Converting '=' in the Sequence to Bases (<code>--refFile</code>) ===
If the SAM/BAM file contains '=' in the sequence instead of the actual bases, the bam2FastQ program needs to convert the '=' back to the bases.  To do that it needs the reference.  Specify the reference by using <code>--refFile</code> followed by the reference filename.
Line 103: Line 133:
  ./bam bam2FastQ --in myFile.bam --refFile myPath/myRefFile.fa
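The substitution itself is simple once the read is lined up with the reference. The following is a sketch only. It assumes the reference segment is already aligned base-for-base with the read, with no insertions, deletions, or clipping; the real conversion has to work these out from the alignment. `fill_equals` is a hypothetical name. When no reference is given, the sketch writes 'N', which matches the documented fallback.

```sh
# Hypothetical helper: fill_equals READ [REF]
# Replaces each '=' in READ with the base at the same offset in REF,
# or with 'N' when no REF is given.
fill_equals() {
  printf '%s\n' "$1" | awk -v ref="${2-}" '{
    out = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "=") c = (ref == "" ? "N" : substr(ref, i, 1))
      out = out c
    }
    print out
  }'
}
```

For example, `fill_equals 'AC=T=' ACGTA` prints ACGTA and `fill_equals 'A==T'` prints ANNT.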

− == Output FastQ File Base Name (<code>--outBase</code>) ==
+ === First in Pair FastQ ReadName Extension (<code>--firstRNExt</code>) ===
+
+ <code>--firstRNExt</code> overrides the default "/1" that is appended to the Read Name of the first-end of a read pair with the specified value.
+
+ === Second in Pair FastQ ReadName Extension (<code>--secondRNExt</code>) ===
+
+ <code>--secondRNExt</code> overrides the default "/2" that is appended to the Read Name of the second-end of a read pair with the specified value.
+
+ === Include the Read Name on the "+" line of the FASTQ (<code>--rnPlus</code>) ===
+
+ By default the read name is not included on the "+" line of the FASTQ files.  To include the read name and the extension for paired-end reads, specify <code>--rnPlus</code>.
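The sketch below shows how the read name extension and <code>--rnPlus</code> could shape each record. The function name `fastq_record` is hypothetical. The record layout is inferred from the descriptions above, not taken from the bam2FastQ source: the extension is appended directly to the read name, and the name plus extension is repeated after the "+" only with <code>--rnPlus</code>.

```sh
# Hypothetical helper: fastq_record NAME EXT SEQ QUAL [rnplus]
# EXT is "/1" or "/2" by default (--firstRNExt/--secondRNExt change it);
# pass "" for unpaired reads.
fastq_record() {
  printf '@%s%s\n%s\n' "$1" "$2" "$3"
  if [ "${5-}" = rnplus ]; then
    printf '+%s%s\n' "$1" "$2"   # --rnPlus: repeat name and extension
  else
    printf '+\n'                 # default: bare "+" line
  fi
  printf '%s\n' "$4"
}
```

For example, `fastq_record read7 /1 ACGT IIII` writes a record whose header is `@read7/1` and whose third line is a bare `+`.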
+
+ === Do Not Reverse Complement Reverse Strands (<code>--noReverseComp</code>) ===
+
+ By default, reads marked as reverse in the BAM file are reverse complemented prior to writing to the FASTQ files.  <code>--noReverseComp</code> disables this feature, and skips the reverse complement step.
+
+ For example, if a sequence is ACCGTG marked as reverse, the default FASTQ record will be written as: CACGGT
+
+ Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG
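The ACCGTG example can be reproduced with standard tools. The following is a sketch with two stated assumptions. First, it follows the usual FASTQ convention that the quality string is reversed (but not complemented) along with the sequence. Second, it handles only A, C, G, T and N. `reverse_str` and `revcomp` are hypothetical names.

```sh
# Hypothetical helpers illustrating the default (no --noReverseComp) output.
# Reverse a string character by character (also usable for quality strings).
reverse_str() {
  printf '%s\n' "$1" | awk '{
    s = ""
    for (i = length($0); i >= 1; i--) s = s substr($0, i, 1)
    print s
  }'
}

# Reverse complement a sequence: complement each base, then reverse.
revcomp() {
  reverse_str "$(printf '%s\n' "$1" | tr 'ACGTNacgtn' 'TGCANtgcan')"
}
```

`revcomp ACCGTG` complements the bases to TGGCAC and then reverses them, giving CACGGT as in the example above.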
+
+ {{noeofBGZFParameter}}
+ {{paramsParameter}}
+
+ == Optional Output Filenames ==
+
+ === Output FastQ File Base Name (<code>--outBase</code>) ===

You can replace the default output base name by using the <code>--outBase</code> option.
Line 113: Line 168:
The value specified by this parameter is overridden by <code>--firstOut</code>, <code>--secondOut</code>, and <code>--unpairedOut</code>, but is used for whichever output files are not specified.

− == Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) ==
+ With the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option, the resulting fastqs would instead be:
+ #myNewPath/myFastQBase.fastq
+ #myNewPath/myFastQBase_interleaved.fastq
+
+ === Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) ===

This setting overrides the default and <code>--outBase</code> file name.
Line 119: Line 178:
The entire filename and extension must be specified.

− Does not affect the filenames for the first end or for unpaired reads.
+ Does not affect the filenames for the second end or for unpaired reads.

For example:
  ./bam bam2FastQ --in myFile.bam --firstOut myFileEnd1.fastq

− == Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) ==
+ === Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) ===

This setting overrides the default and <code>--outBase</code> file name.
Line 135: Line 194:
  ./bam bam2FastQ --in myFile.bam --secondOut myFileEnd2.fastq

− == Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) ==
+ === Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) ===

This setting overrides the default and <code>--outBase</code> file names.
Line 141: Line 200:
The entire filename and extension must be specified.

− Does not affect the filenames for the two paired end fastq files.
+ Does not affect the filenames for the paired-end fastq files.

For example:
  ./bam bam2FastQ --in myFile.bam --unpairedOut myFileUnpaired.fastq

− == First in Pair FastQ ReadName Extension (<code>--firstRNExt</code>) ==
+ {{PhoneHomeParameters}}

− <code>--firstRNExt</code> overrides the default "/1" that is appended to the Read Name of the first-end of a read pair with the specified value.
−
− == Second in Pair FastQ ReadName Extension (<code>--secondRNExt</code>) ==
−
− <code>--secondRNExt</code> overrides the default "/2" that is appended to the Read Name of the second-end of a read pair with the specified value.
−
− == Include the Read Name on the "+" line of the FASTQ (<code>--rnPlus</code>) ==
−
− By default the read name is not included on the "+" line of the FASTQ files.  To include the read name and the extension for paired-end reads, specify <code>--rnPlus</code>.
−
− == Do Not Reverse Complement Reverse Strands (<code>--noReverseComp</code>) ==
−
− By default, reads marked as reverse in the BAM file are reverse complemented prior to writing to the FASTQ files.  <code>--noReverseComp</code> disables this feature, and skips the reverse complement step.
−
− For example, if a sequence is ACCGTG marked as reverse, the default FASTQ record will be written as: CACGGT
−
− Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG
−
− {{noeofBGZFParameter}}
− {{paramsParameter}}
−

= Return Value =
Line 174: Line 211:
Returns -1 if input parameters are invalid.

− Returns the SamStatus for the reads/writes (0 on success).
+ Returns the SamStatus for the reads/writes (0 on success, non-0 on failure).
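A calling script sees the return value as the process exit status. The following is a minimal sketch, and `report_status` is a hypothetical wrapper. On POSIX systems only the low 8 bits of a program's return value reach the shell, so a return of -1 is typically reported as 255.

```sh
# Hypothetical wrapper: run a command, report its exit status, and pass it on.
report_status() {
  if "$@"; then
    status=0
  else
    status=$?              # exit status of the failed command
  fi
  if [ "$status" -eq 0 ]; then
    echo "success"
  else
    echo "failed with exit status $status"
  fi
  return "$status"
}

# Example (requires the bam binary):
#   report_status ./bam bam2FastQ --in myFile.bam
```

The command runs inside an `if` condition, so the wrapper still reports a failure when the calling script uses `set -e`.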
