Changes

From Genome Analysis Wiki
4,981 bytes added, 17:48, 16 May 2012
no edit summary
Line 1: Line 1: −
=== Purpose ===
+
= Overview of the <code>bam2FastQ</code> function of <code>[[bamUtil]]</code> =
This converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired. By converting BAM to FastQ files new alignments can be done using FastQ files
+
The <code>bam2FastQ</code> option of [[bamUtil]] converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired: once the reads are converted back to FastQ files, new alignments can be run on them.
    
== How to use it ==
 
   −
When bam2FastQ is invoked without any arguments the following information is displayed
+
When bam2FastQ is invoked without any arguments, the usage information described under [[#Usage|Usage]] is displayed.
The following parameters are in effect:
  −
            Input BAM/SAM File :                (-iname)
  −
  −
Output FastQ Files
  −
  Output : --first [], --second [], --unpaired []
     −
Required parameter
+
The input SAM/BAM file is required; see [[#input File (--in)|input File (--in)]].
-i InputBAM/SAM
     −
Optional parameters for output (however either --unpaired is present or both --first and --second are provided)
+
It works on both read/query name sorted and coordinate sorted SAM/BAM files.
  --first firstReadInAPair_FastQ
  −
--second secondReadInAPair_FastQ
  −
--unpaired unpairedReads_FastQ
     −
In order to extract paired end reads, the BAM file has to be sorted by name, e.g. using samtools. Suppose the BAM file is myinput.bam
+
To convert a SAM/BAM file that is sorted by read/query name but whose header SO field does not specify "queryname", use the [[#BAM File Is Sorted By Read Name (<code>--readname</code>)|--readName]] option.
   −
  samtools sort -n myinput.bam myinput.sortByName
+
When processing files sorted by read name, the only requirement is that records with matching read names are next to each other. The read names do not need to be in strict alphabetical order.
   −
Using sorted bam file to extract paired end fastq files
+
Any errors and a summary of how many pairs and unpaired reads were processed are written to stderr.
   −
  bam2FastQ -i myinput.sortByName.BAM --first myread1.fastQ --second myread2.fastQ
+
=== Output Files ===
 +
This program produces 3 output fastq files.
 +
# unpaired reads
 +
# first end of paired reads
 +
# second end of paired reads
   −
Or to extract both paired end and single end fastq files (if the bam file contains both single and paired end reads)
+
The default fastq file names are determined by taking the base name of the input file and adding an extension for each filetype. 
+
{|border="1" cellspacing="0" cellpadding="2"
  bam2FastQ -i myinput.sortByName.BAM --first myread1.fastQ --second myread2.fastQ --unpaired myreadSingle.fastQ
+
! Output File Contents !! Extension
 +
|-
 +
|unpaired reads
 +
| .fastq
 +
|-
 +
|first end of paired reads
 +
| _1.fastq
 +
|-
 +
|second end of paired reads
 +
| _2.fastq
 +
|}
   −
Or using bam (sorted or not) file to extract single end fastq files
+
If the inputFile was "myPath/myFile.bam", the resulting fastq files would be:
   
+
#myPath/myFile.fastq
  bam2FastQ -i myinput.sortByName.BAM --unpaired myreadSingle.fastQ
+
#myPath/myFile_1.fastq
 +
#myPath/myFile_2.fastq
 +
 
 +
Instead of using the inputFile base name as the output file base, you can specify a different base name by using the [[#Output FastQ File Base Name (--outBase)|--outBase]] option.
 +
 
 +
You can optionally directly specify the output fastq filenames using:
 +
* --firstOut firstReadInAPair.fastq
 +
* --secondOut secondReadInAPair.fastq
 +
* --unpairedOut unpairedReads.fastq
 +
If any of these is not specified, the name derived from <code>--outBase</code> (or, without it, from the default base name) is used for that file.
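The naming rules above can be sketched as a small shell function. This is only an illustration of the precedence described here (an explicitly named file first, then <code>--outBase</code>, then the input base name), not bamUtil's own code. The function name <code>resolve_fastq_names</code> and its positional arguments are hypothetical, and the sketch assumes the base name is the input path with its final extension removed.

```shell
# Hypothetical helper: prints the unpaired, first-end, and second-end fastq names.
# Args: input [outBase] [firstOut] [secondOut] [unpairedOut]; pass "" to skip one.
resolve_fastq_names() {
    input=$1; out_base=$2; first_out=$3; second_out=$4; unpaired_out=$5
    # Default base: the input path with its last extension removed (assumption).
    base=${out_base:-${input%.*}}
    # An explicitly given name wins; otherwise append the extension to the base.
    echo "${unpaired_out:-${base}.fastq}"
    echo "${first_out:-${base}_1.fastq}"
    echo "${second_out:-${base}_2.fastq}"
}

# Defaults only: myPath/myFile.fastq, myPath/myFile_1.fastq, myPath/myFile_2.fastq
resolve_fastq_names myPath/myFile.bam

# outBase plus an explicit unpaired name:
# unpaired.fastq, myNewPath/myFastQBase_1.fastq, myNewPath/myFastQBase_2.fastq
resolve_fastq_names myPath/myFile.bam myNewPath/myFastQBase "" "" unpaired.fastq
```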
 +
 
 +
= Usage =
 +
./bam bam2FastQ --in <inputFile> [--readName] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--noeof] [--params]
 +
 
 +
 
 +
= Parameters =
 +
<pre>
 +
Required Parameters:
 +
--in      : the SAM/BAM file to convert to FastQ
 +
Optional Parameters:
 +
--readname      : Process the BAM as readName sorted instead
 +
                  of coordinate if the header does not indicate a sort order.
 +
--refFile      : Reference file for converting '=' in the sequence to the actual base
 +
                  if '=' are found and the refFile is not specified, 'N' is written to the FASTQ
 +
--outBase      : Base output name for generated output files
 +
--firstOut      : Output name for the first in pair file
 +
                  over-rides setting of outBase
 +
--secondOut    : Output name for the second in pair file
 +
                  over-rides setting of outBase
 +
--unpairedOut  : Output name for unpaired reads
 +
                  over-rides setting of outBase
 +
--firstRNExt    : read name extension to use for first read in a pair
 +
                  default is "/1"
 +
--secondRNExt  : read name extension to use for second read in a pair
 +
                  default is "/2"
 +
--rnPlus        : Add the Read Name/extension to the '+' line of the fastq records
 +
--noReverseComp : Do not reverse complement reads marked as reverse
 +
--noeof        : Do not expect an EOF block on a bam file.
 +
--params        : Print the parameter settings to stderr
 +
</pre>
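The description of <code>--noReverseComp</code> implies that, by default, reads marked as reverse are reverse complemented before they are written. As a minimal sketch of what reverse complementing a sequence means (the helper name <code>revcomp</code> is hypothetical, and this is not how bamUtil implements it):

```shell
# Hypothetical helper: reverse complement of a base sequence given as $1.
revcomp() {
    # Reverse the string, then swap each base for its complement; N stays N.
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp ACCGTN   # prints NACGGT
```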
 +
 
 +
{{inBAMInputFile}}
 +
 
 +
== Output FastQ File Base Name (<code>--outBase</code>) ==
 +
 
 +
You can replace the default output base name by using the <code>--outBase</code> option.
 +
If the outBase was "myNewPath/myFastQBase", the resulting fastq's would be:
 +
#myNewPath/myFastQBase.fastq
 +
#myNewPath/myFastQBase_1.fastq
 +
#myNewPath/myFastQBase_2.fastq
 +
 
 +
The value specified by this parameter is overridden by <code>--firstOut</code>, <code>--secondOut</code>, and <code>--unpairedOut</code>, but is used for whichever output files are not specified.
 +
 
 +
 
 +
== Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) ==
 +
 
 +
This setting overrides the default and <code>--outBase</code> file name.
 +
 
 +
The entire filename and extension must be specified.
 +
 
 +
Does not affect the filenames for the second end or for unpaired reads.
 +
 
 +
For example:
 +
./bam bam2FastQ --in myFile.bam --firstOut myFileEnd1.fastq
 +
 
 +
 
 +
== Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) ==
 +
 
 +
This setting overrides the default and <code>--outBase</code> file name.
 +
 
 +
The entire filename and extension must be specified.
 +
 
 +
Does not affect the filenames for the first end or for unpaired reads.
 +
 
 +
For example:
 +
./bam bam2FastQ --in myFile.bam --secondOut myFileEnd2.fastq
 +
 
 +
 
 +
== Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) ==
 +
 
 +
This setting overrides the default and <code>--outBase</code> file names.
 +
 
 +
The entire filename and extension must be specified.
 +
 
 +
Does not affect the filenames for the two paired end fastq files.
 +
 
 +
For example:
 +
  ./bam bam2FastQ --in myFile.bam --unpairedOut myFileUnpaired.fastq
 +
 
 +
 
 +
== BAM File Is Sorted By Read Name (<code>--readname</code>) ==
 +
 
 +
The bam2FastQ program by default checks the sort order in the SAM/BAM header when converting to FASTQ, and if that is not specified, assumes it is sorted by coordinate.
 +
 
 +
To override the default and force it to assume the file is sorted by read name, specify the <code>--readName</code> option.
 +
 
 +
The file does not need to be strictly sorted by read name. The only requirement is that matching read names are next to each other.
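One way to check the adjacency requirement before converting is sketched below. The helper name <code>split_read_names</code> is hypothetical and is not part of bamUtil. It reads one read name per line and prints any name whose records are separated by records with other names.

```shell
# Hypothetical helper: read names on stdin, one per line.
# Prints names that occur in more than one non-adjacent run;
# no output means all matching read names are next to each other.
split_read_names() {
    # uniq collapses each adjacent run to one line, so a name that still
    # appears twice after sorting must have been split across runs.
    uniq | sort | uniq -d
}

printf '%s\n' r2 r2 r1 r1 | split_read_names   # prints nothing
printf '%s\n' r1 r2 r1 | split_read_names      # prints r1
```

With samtools installed, the read names of a BAM file could be fed in with <code>samtools view myFile.bam | cut -f1 | split_read_names</code>.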
 +
 
 +
 
 +
== Reference File for Converting '=' in the Sequence to Bases (<code>--refFile</code>) ==
 +
If the SAM/BAM file contains '=' in the sequence instead of the actual bases, the bam2FastQ program needs to convert the '=' back to the bases.  To do that it needs the reference.  Specify the reference by using <code>--refFile</code> followed by the reference filename.
 +
 
 +
For example:
 +
./bam bam2FastQ --in myFile.bam --refFile myPath/myRefFile.fa