Changes

BamUtil: bam2FastQ

Revision as of 18:47, 6 January 2014

1,825 bytes added, 18:47, 6 January 2014

no edit summary

Line 26: Line 26:

=== Output Files ===

−

~~This~~ program produces 3 output fastq files.

+

By default, this program produces 3 output fastq files.

# unpaired reads

# first end of paired reads

# second end of paired reads

+

If the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option is specified, the program produces 2 output fastq files.

+

# unpaired reads

+

# interleaved paired-end reads

The default fastq file names are determined by taking the base name of the input file and adding an extension for each filetype.

{|border="1" cellspacing="0" cellpadding="2"

−

! Output File Contents !! Extension

+

! colspan="2"|Default !!colspan="2"|[[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]]

+

|-

+

! Output File Contents !! Extension !! Output File Contents !! Extension

|-

+

|unpaired reads

+

| .fastq

|unpaired reads

| .fastq

Line 40: Line 48:

|first end of paired reads

| _1.fastq

+

| rowspan="2"|interleaved paired-end reads

+

(both first & second end)

+

| rowspan="2"|_interleaved.fastq

|-

|second end of paired reads

Line 45: Line 56:

|}

−

If the inputFile was "myPath/myFile.bam", the resulting ~~fastq's~~ would be:

+

If the inputFile was "myPath/myFile.bam", the resulting fastqs would be:

#myPath/myFile.fastq

#myPath/myFile_1.fastq

#myPath/myFile_2.fastq

+

With the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option, the resulting fastqs would be:

+

#myPath/myFile.fastq

+

#myPath/myFile_interleaved.fastq

Instead of using the inputFile base name as the output file base, you can specify a different base name by using the [[#Output FastQ File Base Name (--outBase)|--outBase]] option.

You can optionally directly specify the output fastq filenames using:

−

* --firstOut firstReadInAPair.fastq

+

* --firstOut firstReadInAPair.fastq (also used for the interleaved filename with [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]])

* --secondOut secondReadInAPair.fastq

* --unpairedOut unpairedReads.fastq
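
The naming rules above can be sketched in a few lines of Python. This is an illustration only, not bamUtil code: the function <code>output_names</code> and its argument names are hypothetical, and it assumes that "base name" means the input path with its final extension removed, as the myPath/myFile.bam example suggests.

```python
import os


def output_names(in_file, merge=False, out_base=None,
                 first_out=None, second_out=None, unpaired_out=None):
    """Return the FASTQ filenames implied by the rules described above."""
    # Assumption: the base name is the input path minus its final extension.
    base = out_base if out_base is not None else os.path.splitext(in_file)[0]
    # An explicit filename always wins over the base-derived default.
    names = {"unpaired": unpaired_out or base + ".fastq"}
    if merge:
        # With --merge, --firstOut names the interleaved file.
        names["interleaved"] = first_out or base + "_interleaved.fastq"
    else:
        names["first"] = first_out or base + "_1.fastq"
        names["second"] = second_out or base + "_2.fastq"
    return names


print(output_names("myPath/myFile.bam"))
print(output_names("myPath/myFile.bam", merge=True))
```

Explicit filenames take precedence over the base name, and any file that is not named explicitly falls back to the base name plus its default extension.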

Line 59: Line 74:

= Usage =

−

./bam bam2FastQ --in <inputFile> [--readName] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--noeof] [--params]

+

./bam bam2FastQ --in <inputFile> [--readName] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--merge|--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--noeof] [--params]

= Parameters =

Line 68: Line 83:

--readname : Process the BAM as readName sorted instead

of coordinate if the header does not indicate a sort order.

+

--merge : Generate 1 interleaved (merged) FASTQ for paired-ends (unpaired in a separate file)

+

use firstOut to override the filename of the interleaved file.

--refFile : Reference file for converting '=' in the sequence to the actual base

if '=' are found and the refFile is not specified, 'N' is written to the FASTQ

−

~~--outBase : Base output name for generated output files~~

−

~~--firstOut : Output name for the first in pair file~~

−

~~over-rides setting of outBase~~

−

~~--secondOut : Output name for the second in pair file~~

−

~~over-rides setting of outBase~~

−

~~--unpairedOut : Output name for unpaired reads~~

−

~~over-rides setting of outBase~~

--firstRNExt : read name extension to use for first read in a pair

default is "/1"

Line 85: Line 95:

--noeof : Do not expect an EOF block on a bam file.

--params : Print the parameter settings to stderr

+

Optional OutputFile Names:

+

--outBase : Base output name for generated output files

+

--firstOut : Output name for the first in pair file

+

over-rides setting of outBase

+

--secondOut : Output name for the second in pair file

+

over-rides setting of outBase

+

--unpairedOut : Output name for unpaired reads

+

over-rides setting of outBase

</pre>

+

== Required Parameters ==

−

== BAM File Is Sorted By Read Name (<code>--readname</code>) ==

+

== Optional Parameters ==

+

=== BAM File Is Sorted By Read Name (<code>--readname</code>) ===

The bam2FastQ program by default checks the sort order in the SAM/BAM header when converting to FASTQ, and if that is not specified, assumes it is sorted by coordinate.

Line 97: Line 117:

The file does not need to be strictly sorted by read name. The only requirement is that matching read names are next to each other.

−

== Reference File for Converting '=' in the Sequence to Bases <code>--refFile</code>==

+

=== Generate 1 Paired-End Output File (<code>--merge</code>) ===

+

Use the <code>--merge</code> option to generate 1 interleaved (merged) FASTQ for paired-ends instead of 2 files. Unpaired reads are still written to a separate file.

+

The default extension for the interleaved output file is "_interleaved.fastq".

+

Use [[#Output FastQ File Name For the First End of Paired End (--firstOut)|<code>--firstOut</code>]] to override the filename of the interleaved file.

+

This parameter was added in version 1.0.10.
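
As a rough picture of what "interleaved" means here, the Python sketch below alternates the records of two paired-end lists. It is not bamUtil code: the function name <code>interleave</code> is hypothetical, and the record order (first end immediately followed by its second end) is an assumption based on the usual meaning of an interleaved FASTQ, which this page does not spell out.

```python
def interleave(first_ends, second_ends):
    """Alternate records: the first end of each pair, then its second end."""
    merged = []
    # zip pairs the records up position by position.
    for first, second in zip(first_ends, second_ends):
        merged.append(first)
        merged.append(second)
    return merged


print(interleave(["readA/1", "readB/1"], ["readA/2", "readB/2"]))
```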

+

=== Reference File for Converting '=' in the Sequence to Bases (<code>--refFile</code>) ===

If the SAM/BAM file contains '=' in the sequence instead of the actual bases, the bam2FastQ program needs to convert the '=' back to the bases. To do that it needs the reference. Specify the reference by using <code>--refFile</code> followed by the reference filename.

Line 103: Line 133:

./bam bam2FastQ --in myFile.bam --refFile myPath/myRefFile.fa
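
The '=' substitution can be pictured with the simplified Python sketch below. It is an illustration under strong assumptions, not the program's implementation: the function <code>fill_equals</code> is hypothetical, <code>start</code> is taken to be a 0-based offset into the reference, and the read is assumed to align without insertions, deletions, or clipping. Without a reference, 'N' is written, as described in the parameter list.

```python
def fill_equals(read_seq, ref_seq=None, start=0):
    """Replace '=' with the reference base, or 'N' when no reference is given."""
    out = []
    for i, base in enumerate(read_seq):
        if base != "=":
            out.append(base)  # real bases pass through unchanged
        elif ref_seq is None:
            out.append("N")  # no reference available
        else:
            # Assumption: gap-free alignment, so read offset i maps to start + i.
            out.append(ref_seq[start + i])
    return "".join(out)


print(fill_equals("A==T", "ACGT"))
print(fill_equals("A==T"))
```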

−

== Output FastQ File Base Name (<code>--outBase</code>) ==

+

=== First in Pair FastQ ReadName Extension (<code>--firstRNExt</code>) ===

+

<code>--firstRNExt</code> overrides the default "/1" that is appended to the Read Name of the first-end of a read pair with the specified value.

+

=== Second in Pair FastQ ReadName Extension (<code>--secondRNExt</code>) ===

+

<code>--secondRNExt</code> overrides the default "/2" that is appended to the Read Name of the second-end of a read pair with the specified value.

+

=== Include the Read Name on the "+" line of the FASTQ (<code>--rnPlus</code>) ===

+

By default the read name is not included on the "+" line of the FASTQ files. To include the read name and the extension for paired-end reads, specify <code>--rnPlus</code>.
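
The effect of the read name extensions and of <code>--rnPlus</code> on a single FASTQ record can be sketched as follows. The helper <code>fastq_record</code> and its arguments are hypothetical and only illustrate the four-line record layout; they are not part of bamUtil.

```python
def fastq_record(name, seq, qual, ext="", rn_plus=False):
    """Format one four-line FASTQ record."""
    full_name = name + ext  # e.g. "read1" + "/1"
    # With rn_plus, the read name and extension are repeated on the "+" line.
    plus_line = "+" + full_name if rn_plus else "+"
    return "@{}\n{}\n{}\n{}\n".format(full_name, seq, plus_line, qual)


print(fastq_record("read1", "ACGT", "IIII", ext="/1"), end="")
print(fastq_record("read1", "ACGT", "IIII", ext="/1", rn_plus=True), end="")
```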

+

=== Do Not Reverse Complement Reverse Strands (<code>--noReverseComp</code>) ===

+

By default, reads marked as reverse in the BAM file are reverse complemented prior to writing to the FASTQ files. <code>--noReverseComp</code> disables this feature, and skips the reverse complement step.

+

For example, if a sequence is ACCGTG marked as reverse, the default FASTQ record will be written as: CACGGT

+

Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG
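
The example above can be checked with a short Python sketch of reverse complementing. The function name <code>reverse_complement</code> is illustrative and is not taken from the bamUtil source.

```python
# Map each base to its complement; 'N' stays 'N'.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ACCGTG"))  # CACGGT
```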

+

== Optional Output Filenames ==

+

=== Output FastQ File Base Name (<code>--outBase</code>) ===

You can replace the default output base name by using the <code>--outBase</code> option.

Line 113: Line 168:

The value specified by this parameter is overridden by <code>--firstOut</code>, <code>--secondOut</code>, and <code>--unpairedOut</code>, but is used for whichever output files are not specified.

−

== Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) ==

+

With the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option, the resulting fastqs would instead be:

+

#myNewPath/myFastQBase.fastq

+

#myNewPath/myFastQBase_interleaved.fastq

+

=== Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) ===

This setting overrides the default and <code>--outBase</code> file name.

Line 119: Line 178:

The entire filename and extension must be specified.

−

Does not affect the filenames for the ~~first~~ end or for unpaired reads.

+

Does not affect the filenames for the second end or for unpaired reads.

For example:

./bam bam2FastQ --in myFile.bam --firstOut myFileEnd1.fastq

−

== Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) ==

+

=== Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) ===

This setting overrides the default and <code>--outBase</code> file name.

Line 135: Line 194:

./bam bam2FastQ --in myFile.bam --secondOut myFileEnd2.fastq

−

== Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) ==

+

=== Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) ===

This setting overrides the default and <code>--outBase</code> file names.

Line 141: Line 200:

The entire filename and extension must be specified.

−

Does not affect the filenames for the ~~two~~ paired end fastq files.

+

Does not affect the filenames for the paired-end fastq files.

For example:

./bam bam2FastQ --in myFile.bam --unpairedOut myFileUnpaired.fastq

−

~~== First in Pair FastQ ReadName Extension (<code>--firstRNExt</code>) ==~~

+

−

~~<code>--firstRNExt</code> overrides the default "/1" that is appended to the Read Name of the first-end of a read pair with the specified value.~~

−

~~== Second in Pair FastQ ReadName Extension (<code>--secondRNExt</code>) ==~~

−

~~<code>--secondRNExt</code> overrides the default "/2" that is appended to the Read Name of the second-end of a read pair with the specified value.~~

−

~~== Include the Read Name on the "+" line of the FASTQ (<code>--rnPlus</code>) ==~~

−

~~By default the read name is not included on the "+" line of the FASTQ files. To include the read name and the extension for paired-end reads, specify <code>--rnPlus</code>.~~

−

~~== Do Not Reverse Complement Reverse Strands (<code>--noReverseComp</code>) ==~~

−

By default, reads marked as reverse in the BAM file are reverse complemented prior to writing to the FASTQ files. <code>--noReverseComp</code> disables this feature, and skips the reverse complement step.

−

~~For example, if a sequence is ACCGTG marked as reverse, the default FASTQ record will be written as: CACGGT~~

−

~~Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG~~

−

~~{{paramsParameter}}~~

−

= Return Value =

Line 174: Line 211:

Returns -1 if input parameters are invalid.

−

Returns the SamStatus for the reads/writes (0 on success).

+

Returns the SamStatus for the reads/writes (0 on success, non-0 on failure).

Mktrost (Administrators, 3,045 edits)
