Changes

From Genome Analysis Wiki
6,392 bytes added, 23:53, 5 March 2016
Line 1: Line 1: −
=This functionality will be released on 5/17/2012=
  −
   
= Overview of the <code>bam2FastQ</code> function of <code>[[bamUtil]]</code> =
 
= Overview of the <code>bam2FastQ</code> function of <code>[[bamUtil]]</code> =
 
The <code>bam2FastQ</code> option of [[bamUtil]] converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired. Converting the BAM to FastQ files allows the new alignment to be run from those FastQ files.
 
The <code>bam2FastQ</code> option of [[bamUtil]] converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired. Converting the BAM to FastQ files allows the new alignment to be run from those FastQ files.
    +
'''NOTE: Secondary and Supplementary reads are skipped when converting to FastQ.  It assumes that there will only be 2 reads (the 2 primary mates) with the same read name that are not secondary or supplementary.'''
 +
 +
'''NOTE: Use the --splitRG option to split reads into read group specific FASTQs.'''
    
== How to use it ==
 
== How to use it ==
Line 16: Line 17:     
When processing files sorted by read name, the only requirement is that matching read names are next to each other.  It does not need to be in strict alphabetical order.
 
When processing files sorted by read name, the only requirement is that matching read names are next to each other.  It does not need to be in strict alphabetical order.
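The adjacency requirement can be illustrated with a minimal Python sketch. It is not bamUtil's implementation: the function name <code>group_adjacent</code> and the sample records are hypothetical, and it only shows why grouping neighbouring records by read name works without strict alphabetical order.

```python
from itertools import groupby


def group_adjacent(records):
    """Group (read_name, sequence) records whose read names are adjacent.

    groupby only merges neighbouring items with the same key, which mirrors
    the requirement that matching read names sit next to each other.
    """
    return [(name, [seq for _, seq in grp])
            for name, grp in groupby(records, key=lambda rec: rec[0])]


# Hypothetical input: not alphabetical, but mates are next to each other.
records = [
    ("readB", "ACGT"), ("readB", "TTGA"),
    ("readA", "GGCC"),
    ("readC", "AAAA"), ("readC", "CCCC"),
]
groups = group_adjacent(records)
pairs = [g for g in groups if len(g[1]) == 2]      # two mates found together
unpaired = [g for g in groups if len(g[1]) == 1]   # no adjacent mate
```

If the two mates of a pair were separated by another read name, this sketch would treat each of them as unpaired, which is why adjacency matters.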
 +
 +
Read names in paired-end FASTQ files are appended with "/1" for the first in the pair, and "/2" for the second in the pair.  Override these defaults using [[#First in Pair FastQ ReadName Extension (--firstRNExt)|--firstRNExt]] and [[#Second in Pair FastQ ReadName Extension (--secondRNExt)|--secondRNExt]].
 +
 +
Sequences marked as reverse strand in the SAM/BAM file are reverse complemented prior to writing to the FASTQ files.  To skip this step, specify [[#Do Not Reverse Complement Reverse Strands (--noReverseComp)|--noReverseComp]].
    
Any errors and a summary of how many pairs and unpaired reads were processed are written to stderr.
 
Any errors and a summary of how many pairs and unpaired reads were processed are written to stderr.
 +
 +
'''NOTE: This tool does not work on templates that have more than 2 segments.  It does not properly match reads when more than 2 reads have the same read name.'''
 +
 +
'''NOTE: This tool does not split reads into read group specific FASTQs.  If you want Read Group specific FASTQ files, first run [[BamUtil: splitBam]] to first split the BAM into 1 BAM per Read Group.  Then run bam2FastQ on each bam.'''
    
=== Output Files ===
 
=== Output Files ===
This program produces 3 output fastq files.
+
By default, this program produces 3 output fastq files.
 
# unpaired reads
 
# unpaired reads
 
# first end of paired reads
 
# first end of paired reads
 
# second end of paired reads
 
# second end of paired reads
 +
 +
If the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option is specified, the program produces 2 output fastq files.
 +
# unpaired reads
 +
# interleaved paired-end reads
    
The default fastq file names are determined by taking the base name of the input file and adding an extension for each filetype.   
 
The default fastq file names are determined by taking the base name of the input file and adding an extension for each filetype.   
 
{|border="1" cellspacing="0" cellpadding="2"
 
{|border="1" cellspacing="0" cellpadding="2"
! Output File Contents !! Extension
+
! colspan="2"|Default !!colspan="2"|[[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]]
 +
|-
 +
! Output File Contents !! Extension !! Output File Contents !! Extension
 
|-
 
|-
 +
|unpaired reads
 +
| .fastq
 
|unpaired reads
 
|unpaired reads
 
| .fastq
 
| .fastq
Line 34: Line 51:  
|first end of paired reads
 
|first end of paired reads
 
| _1.fastq
 
| _1.fastq
 +
| rowspan="2"|interleaved paired-end reads
 +
(both first & second end)
 +
| rowspan="2"|_interleaved.fastq
 
|-
 
|-
 
|second end of paired reads
 
|second end of paired reads
Line 39: Line 59:  
|}
 
|}
   −
If the inputFile was "myPath/myFile.bam", the resulting fastq's would be:
+
If the inputFile was "myPath/myFile.bam", the resulting fastqs would be:
 
#myPath/myFile.fastq
 
#myPath/myFile.fastq
 
#myPath/myFile_1.fastq
 
#myPath/myFile_1.fastq
 
#myPath/myFile_2.fastq
 
#myPath/myFile_2.fastq
 +
 +
With the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option, the resulting fastqs would be:
 +
#myPath/myFile.fastq
 +
#myPath/myFile_interleaved.fastq
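
The naming rule above can be sketched in Python. This is an illustration only: the function name <code>default_fastq_names</code> is hypothetical, and stripping the extension with <code>os.path.splitext</code> is an assumption about how the base name is derived, chosen because it reproduces the documented example.

```python
import os


def default_fastq_names(input_file, merge=False):
    """Return the default output FASTQ names for an input SAM/BAM path."""
    # Assumption: the base name is the input path without its final extension.
    base, _ext = os.path.splitext(input_file)
    names = {"unpaired": base + ".fastq"}
    if merge:
        # --merge: one interleaved file replaces the two paired-end files.
        names["interleaved"] = base + "_interleaved.fastq"
    else:
        names["first"] = base + "_1.fastq"
        names["second"] = base + "_2.fastq"
    return names
```

Calling <code>default_fastq_names("myPath/myFile.bam")</code> yields the three default names listed above, and passing <code>merge=True</code> yields the two names listed for <code>--merge</code>.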
    
Instead of using the inputFile base name as the output file base, you can specify a different base name by using the [[#Output FastQ File Base Name (--outBase)|--outBase]] option.
 
Instead of using the inputFile base name as the output file base, you can specify a different base name by using the [[#Output FastQ File Base Name (--outBase)|--outBase]] option.
    
You can optionally directly specify the output fastq filenames using:
 
You can optionally directly specify the output fastq filenames using:
* --firstOut firstReadInAPair.fastq
+
* --firstOut firstReadInAPair.fastq (also used for the interleaved filename with [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]])
 
* --secondOut secondReadInAPair.fastq
 
* --secondOut secondReadInAPair.fastq
 
* --unpairedOut unpairedReads.fastq
 
* --unpairedOut unpairedReads.fastq
Line 53: Line 77:     
= Usage =
 
= Usage =
./bam bam2FastQ --in <inputFile> [--readName] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--noeof] [--params]
+
./bam bam2FastQ --in <inputFile> [--readName] [--splitRG] [--qualField <tag>] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--merge|--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--region <chr>[:<pos>[:<base>]]] [--gzip] [--noeof] [--params]
 
      
= Parameters =
 
= Parameters =
 
<pre>
 
<pre>
Required Parameters:
+
        Required Parameters:
--in      : the SAM/BAM file to convert to FastQ
+
                --in      : the SAM/BAM file to convert to FastQ
Optional Parameters:
+
        Optional Parameters:
--readname      : Process the BAM as readName sorted instead
+
                --readname      : Process the BAM as readName sorted instead
                  of coordinate if the header does not indicate a sort order.
+
                                  of coordinate if the header does not indicate a sort order.
--refFile      : Reference file for converting '=' in the sequence to the actual base
+
                --splitRG      : Split into RG specific fastqs.
                  if '=' are found and the refFile is not specified, 'N' is written to the FASTQ
+
                --qualField    : Use the base quality from the specified tag
--outBase      : Base output name for generated output files
+
                                  rather than from the Quality field (default)
--firstOut      : Output name for the first in pair file
+
                --merge        : Generate 1 interleaved (merged) FASTQ for paired-ends (unpaired in a separate file)
                  over-rides setting of outBase
+
                                  use firstOut to override the filename of the interleaved file.
--secondOut    : Output name for the second in pair file
+
                --refFile      : Reference file for converting '=' in the sequence to the actual base
                  over-rides setting of outBase
+
                                  if '=' are found and the refFile is not specified, 'N' is written to the FASTQ
--unpairedOut  : Output name for unpaired reads
+
                --firstRNExt    : read name extension to use for first read in a pair
                  over-rides setting of outBase
+
                                  default is "/1"
--firstRNExt    : read name extension to use for first read in a pair
+
                --secondRNExt  : read name extension to use for second read in a pair
                  default is "/1"
+
                                  default is "/2"
--secondRNExt  : read name extension to use for second read in a pair
+
                --rnPlus        : Add the Read Name/extension to the '+' line of the fastq records
                  default is "/2"
+
                --noReverseComp : Do not reverse complement reads marked as reverse
--rnPlus        : Add the Read Name/extension to the '+' line of the fastq records
+
                --region        : Only convert reads containing the specified region/nucleotide.
--noReverseComp : Do not reverse complement reads marked as reverse
+
                                  Position formatted as: chr:pos:base
--noeof        : Do not expect an EOF block on a bam file.
+
                                  pos (0-based) & base are optional.
--params        : Print the parameter settings to stderr
+
                --gzip          : Compress the output FASTQ files using gzip
 +
                --noeof        : Do not expect an EOF block on a bam file.
 +
                --params        : Print the parameter settings to stderr
 +
        Optional OutputFile Names:
 +
                --outBase      : Base output name for generated output files
 +
                --firstOut      : Output name for the first in pair file
 +
                                  over-rides setting of outBase
 +
                --secondOut    : Output name for the second in pair file
 +
                                  over-rides setting of outBase
 +
                --unpairedOut  : Output name for unpaired reads
 +
                                  over-rides setting of outBase
 
</pre>
 
</pre>
    +
== Required Parameters ==
 
{{inBAMInputFile}}
 
{{inBAMInputFile}}
   −
== Output FastQ File Base Name (<code>--outBase</code>) ==
+
== Optional Parameters ==
 +
=== BAM File Is Sorted By Read Name (<code>--readname</code>) ===
 +
 
 +
The bam2FastQ program by default checks the sort order in the SAM/BAM header when converting to FASTQ, and if that is not specified, assumes it is sorted by coordinate.
 +
 
 +
To override the default and force it to assume the file is sorted by read name, specify the <code>--readName</code> option.
 +
 
 +
The file does not need to be strictly sorted by read name.  The only requirement is that matching read names are next to each other.
 +
 
 +
=== Split into RG Specific FASTQs (<code>--splitRG</code>) ===
 +
 
 +
Create RG specific FASTQ files.
 +
 
 +
Cannot be specified with firstOut/secondOut/unpairedOut since there will be a different filename for each RG.
 +
 
 +
Cannot write to stdout when <code>--splitRG</code> is specified.
 +
 
 +
Output filenames will be <outBase>.<RG>_1.fastq, <outBase>.<RG>_2.fastq, and <outBase>.<RG>.fastq.  A fastq list file <outBase>.list will be created containing MERGE_NAME (the RG tag's SM value or outBase if the value is empty), fastq 1, fastq 2 (or . if it is a single ended fastq), and the RG tag string.
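
The per-read-group naming pattern can be sketched as follows. The function name <code>split_rg_fastq_names</code> and the read group ID used in the usage note are hypothetical; only the filename patterns come from the description above.

```python
def split_rg_fastq_names(out_base, read_group):
    """Return the per-read-group FASTQ names and the list file name."""
    prefix = f"{out_base}.{read_group}"
    return {
        "first": prefix + "_1.fastq",     # <outBase>.<RG>_1.fastq
        "second": prefix + "_2.fastq",    # <outBase>.<RG>_2.fastq
        "unpaired": prefix + ".fastq",    # <outBase>.<RG>.fastq
        "list": out_base + ".list",       # one list file for all read groups
    }
```

For example, with an output base of <code>myPath/myFile</code> and a hypothetical read group <code>RG1</code>, the first-in-pair file would be <code>myPath/myFile.RG1_1.fastq</code>.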
 +
 
 +
=== Use the Base Quality from the Specified Tag (<code>--qualField</code>) ===
 +
 
 +
By default, the quality field is used for the Base Qualities in the FASTQ file.  Specify <code>--qualField <tagName></code> to use the base qualities from the specified tag instead of the quality field.
 +
 
 +
 
 +
=== Generate 1 Paired-End Output File (<code>--merge</code>) ===
 +
 
 +
Use the <code>--merge</code> option to generate 1 interleaved (merged) FASTQ for paired-ends instead of 2 files.  Unpaired reads are still written to a separate file.
 +
 
 +
The default extension for the output file is "_interleaved".
 +
 
 +
Use [[#Output FastQ File Name For the First End of Paired End (--firstOut)|<code>--firstOut</code>]] to override the filename of the interleaved file.
 +
 
 +
This parameter was added in version 1.0.10.
 +
 
 +
=== Reference File for Converting '=' in the Sequence to Bases (<code>--refFile</code>) ===
 +
If the SAM/BAM file contains '=' in the sequence instead of the actual bases, the bam2FastQ program needs to convert the '=' back to the bases.  To do that it needs the reference.  Specify the reference by using <code>--refFile</code> followed by the reference filename.
 +
 
 +
For example:
 +
./bam bam2FastQ --in myFile.bam --refFile myPath/myRefFile.fa
 +
 
 +
=== First in Pair FastQ ReadName Extension (<code>--firstRNExt</code>) ===
 +
 
 +
<code>--firstRNExt</code> replaces the default "/1" appended to the read name of the first end of a read pair with the specified value.
 +
 
 +
=== Second in Pair FastQ ReadName Extension (<code>--secondRNExt</code>) ===
 +
 
 +
<code>--secondRNExt</code> replaces the default "/2" appended to the read name of the second end of a read pair with the specified value.
 +
 
 +
=== Include the Read Name on the "+" line of the FASTQ (<code>--rnPlus</code>) ===
 +
 
 +
By default the read name is not included on the "+" line of the FASTQ files.  To include the read name and the extension for paired-end reads, specify <code>--rnPlus</code>.
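
The effect of the read name extension and <code>--rnPlus</code> on a FASTQ record can be sketched in Python. The function <code>fastq_record</code> and its arguments are hypothetical and only show the standard four-line record layout, not bamUtil's own code.

```python
def fastq_record(read_name, sequence, qualities, ext="", rn_plus=False):
    """Format one four-line FASTQ record.

    ext     -- read name extension, e.g. "/1" or "/2" for paired-end reads
    rn_plus -- when True, repeat the read name and extension on the '+' line
    """
    name = read_name + ext
    plus_line = "+" + name if rn_plus else "+"
    return f"@{name}\n{sequence}\n{plus_line}\n{qualities}\n"
```

With <code>ext="/1"</code> the header line becomes <code>@read1/1</code>; the "+" line stays bare unless <code>rn_plus=True</code> is passed.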
 +
 
 +
=== Do Not Reverse Complement Reverse Strands (<code>--noReverseComp</code>) ===
 +
 
 +
By default, reads marked as reverse in the BAM file are reverse complemented prior to writing to the FASTQ files.  <code>--noReverseComp</code> disables this feature, and skips the reverse complement step.
 +
 
 +
For example, if a sequence is ACCGTG marked as reverse, the default FASTQ record will be written as: CACGGT
 +
 
 +
Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG
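
The example above can be reproduced with a short Python sketch. The function names and the <code>no_reverse_comp</code> argument are hypothetical stand-ins for the behaviour described here, not bamUtil's implementation.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fastq_sequence(seq, is_reverse, no_reverse_comp=False):
    """Return the sequence as it would be written to the FASTQ file."""
    if is_reverse and not no_reverse_comp:
        return reverse_complement(seq)
    return seq
```

Here <code>fastq_sequence("ACCGTG", is_reverse=True)</code> returns CACGGT, while adding <code>no_reverse_comp=True</code> leaves the sequence as ACCGTG.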
 +
 
 +
=== Only Convert the Specified Region (<code>--region</code>) ===
 +
 
 +
Only convert reads containing the specified region/nucleotide.
 +
 
 +
Position formatted as: chr:pos:base
 +
 
 +
pos (0-based) & base are optional.
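
The region format can be illustrated with a small Python parser. This is a sketch under the assumption that the three parts are separated by ":" exactly as shown above; the function name <code>parse_region</code> is hypothetical.

```python
def parse_region(region):
    """Split 'chr[:pos[:base]]' into (chrom, pos, base).

    pos is a 0-based integer; pos and base are None when omitted.
    """
    parts = region.split(":")
    if not 1 <= len(parts) <= 3 or not parts[0]:
        raise ValueError(f"expected chr[:pos[:base]], got {region!r}")
    chrom = parts[0]
    pos = int(parts[1]) if len(parts) >= 2 else None
    base = parts[2] if len(parts) == 3 else None
    return chrom, pos, base
```

For example, <code>parse_region("chr1:1000:A")</code> gives the chromosome, the 0-based position 1000, and the base A, while <code>parse_region("chr1")</code> leaves the position and base unset.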
 +
 
 +
{{noeofBGZFParameter}}
 +
{{paramsParameter}}
 +
 
 +
== Optional Output Filenames ==
 +
 
 +
=== Output FastQ File Base Name (<code>--outBase</code>) ===
    
You can replace the default output base name by using the <code>--outBase</code> option.
 
You can replace the default output base name by using the <code>--outBase</code> option.
Line 94: Line 201:  
The value specified by this parameter is overridden by <code>--firstOut</code>, <code>--secondOut</code>, and <code>--unpairedOut</code>, but is used for whichever output files are not specified.
 
The value specified by this parameter is overridden by <code>--firstOut</code>, <code>--secondOut</code>, and <code>--unpairedOut</code>, but is used for whichever output files are not specified.
    +
With the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option, the resulting fastqs would instead be:
 +
#myNewPath/myFastQBase.fastq
 +
#myNewPath/myFastQBase_interleaved.fastq
   −
== Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) ==
+
=== Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) ===
    
This setting overrides the default and <code>--outBase</code> file name.
 
This setting overrides the default and <code>--outBase</code> file name.
Line 101: Line 211:  
The entire filename and extension must be specified.
 
The entire filename and extension must be specified.
   −
Does not affect the filenames for the first end or for unpaired reads.
+
Does not affect the filenames for the second end or for unpaired reads.
    
For example:
 
For example:
 
  ./bam bam2FastQ --in myFile.bam --firstOut myFileEnd1.fastq
 
  ./bam bam2FastQ --in myFile.bam --firstOut myFileEnd1.fastq
   −
 
+
=== Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) ===
== Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) ==
      
This setting overrides the default and <code>--outBase</code> file name.
 
This setting overrides the default and <code>--outBase</code> file name.
Line 118: Line 227:  
  ./bam bam2FastQ --in myFile.bam --secondOut myFileEnd2.fastq
 
  ./bam bam2FastQ --in myFile.bam --secondOut myFileEnd2.fastq
   −
 
+
=== Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) ===
== Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) ==
      
This setting overrides the default and <code>--outBase</code> file names.
 
This setting overrides the default and <code>--outBase</code> file names.
Line 125: Line 233:  
The entire filename and extension must be specified.
 
The entire filename and extension must be specified.
   −
Does not affect the filenames for the two paired end fastq files.
+
Does not affect the filenames for the paired-end fastq files.
    
For example:
 
For example:
 
  ./bam bam2FastQ --in myFile.bam --unpairedOut myFileUnpaired.fastq
 
  ./bam bam2FastQ --in myFile.bam --unpairedOut myFileUnpaired.fastq
    +
{{PhoneHomeParameters}}
   −
== BAM File Is Sorted By Read Name (<code>--readname</code>) ==
+
= Return Value =
   −
The bam2FastQ program by default checks the sort order in the SAM/BAM header when converting to FASTQ, and if that is not specified, assumes it is sorted by coordinate.
+
Returns -1 if input parameters are invalid.
   −
To override the default and force it to assume the file is sorted by readname, specify the <code>--readName</code> option
+
Returns the SamStatus for the reads/writes (0 on success, non-0 on failure).
 
  −
The file does not need to be strictly sorted by read name.  The only requirement is that matching read names are next to each other.
  −
 
  −
 
  −
== Reference File for Converting '=' in the Sequence to Bases <code>--refFile</code>==
  −
If the SAM/BAM file contains '=' in the sequence instead of the actual bases, the bam2FastQ program needs to convert the '=' back to the bases.  To do that it needs the reference.  Specify the reference by using <code>--refFile</code> followed by the reference filename.
  −
 
  −
For example:
  −
./bam bam2FastQ --in myFile.bam --refFile myPath/myRefFile.fa
 
