Changes

11,423 bytes added , 23:53, 5 March 2016

Line 1: Line 1: −

=~~== Purpose ==~~=

+

= Overview of the <code>bam2FastQ</code> function of <code>[[bamUtil]]</code> =

−

~~This~~ converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired. By converting BAM to FastQ files new alignments can be done using FastQ files

+

The <code>bam2FastQ</code> option of [[bamUtil]] converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired: once the BAM has been converted, the resulting FastQ files can be used as input for the new alignment.

+

'''NOTE: Secondary and Supplementary reads are skipped when converting to FastQ. It assumes that there will only be 2 reads (the 2 primary mates) with the same read name that are not secondary or supplementary.'''

+

'''NOTE: Use the --splitRG option to split reads into read group specific FASTQs.'''
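The skipping rule in the first note can be sketched in Python. This is only an illustration, not the bamUtil implementation: the bit values 0x100 (secondary) and 0x800 (supplementary) come from the SAM specification, while the function name and the sample FLAG values are hypothetical.

```python
# SAM FLAG bits as defined in the SAM specification.
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_primary(flag: int) -> bool:
    """Return True if the record is neither secondary nor supplementary."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


# Hypothetical FLAG values: 99 and 147 form a typical primary pair,
# 355 has the secondary bit set, 2147 has the supplementary bit set.
flags = [99, 147, 355, 2147]
kept = [f for f in flags if is_primary(f)]
print(kept)
```

Only records that pass this kind of check would be expected to reach the FASTQ output.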

== How to use it ==

−

When bam2FastQ is invoked without any arguments the ~~following~~ information is displayed

+

When bam2FastQ is invoked without any arguments, the usage information is displayed as described below under [[#Usage|Usage]].

−

~~The following parameters~~ are in ~~effect~~:

+

−

~~Input~~ BAM/~~SAM~~ File : (-~~iname~~)

+

The input BAM file is required; see [[#input File (--in)|input File (--in)]].

−

+

−

Output FastQ ~~Files~~

+

It works on both read/query name and coordinate sorted SAM/BAM files.

−

Output : --~~first~~ [], --second [], --~~single~~ []

+

If you want to convert a SAM/BAM that is read/query name sorted but the SO field of the header does not specify "queryname", then use the [[#BAM File Is Sorted By Read Name (--readname)|--readName]] option.

+

When processing files sorted by read name, the only requirement is that matching read names are next to each other. It does not need to be in strict alphabetical order.

+

Read names in paired-end FASTQ files are appended with "/1" for the first in the pair, and "/2" for the second in the pair. Override these defaults using [[#First in Pair FastQ ReadName Extension (--firstRNExt)|--firstRNExt]] and [[#Second in Pair FastQ ReadName Extension (--secondRNExt)|--secondRNExt]].

+

Sequences marked as reverse strand in the SAM/BAM file are reverse complemented prior to writing to the FASTQ files. To skip this step, specify [[#Do Not Reverse Complement Reverse Strands (--noReverseComp)|--noReverseComp]].

+

Any errors and a summary of how many pairs and unpaired reads were processed are written to stderr.

+

'''NOTE: This tool does not work on templates that have more than 2 segments. It does not properly match reads when more than 2 reads have the same read name.'''

+

'''NOTE: Without <code>--splitRG</code>, this tool does not split reads into read group specific FASTQs. Alternatively, run [[BamUtil: splitBam]] to split the BAM into 1 BAM per Read Group, then run bam2FastQ on each BAM.'''

+

=== Output Files ===

+

By default, this program produces 3 output fastq files.

+

# unpaired reads

+

# first end of paired reads

+

# second end of paired reads

+

If the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option is specified, the program produces 2 output fastq files.

+

# unpaired reads

+

# interleaved paired-end reads

+

The default fastq file names are determined by taking the base name of the input file and adding an extension for each filetype.

+

{|border="1" cellspacing="0" cellpadding="2"

+

! colspan="2"|Default !!colspan="2"|[[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]]

+

|-

+

! Output File Contents !! Extension !! Output File Contents !! Extension

+

|-

+

|unpaired reads

+

| .fastq

+

|unpaired reads

+

| .fastq

+

|-

+

|first end of paired reads

+

| _1.fastq

+

| rowspan="2"|interleaved paired-end reads

+

(both first & second end)

+

| rowspan="2"|_interleaved.fastq

+

|-

+

|second end of paired reads

+

| _2.fastq

+

|}

+

If the inputFile was "myPath/myFile.bam", the resulting fastqs would be:

+

#myPath/myFile.fastq

+

#myPath/myFile_1.fastq

+

#myPath/myFile_2.fastq

+

With the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option, the resulting fastqs would be:

+

#myPath/myFile.fastq

+

#myPath/myFile_interleaved.fastq

+

Instead of using the inputFile base name as the output file base, you can specify a different base name by using the [[#Output FastQ File Base Name (--outBase)|--outBase]] option.

+

You can optionally directly specify the output fastq filenames using:

+

* --firstOut firstReadInAPair.fastq (also used for the interleaved filename with [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]])

+

* --secondOut secondReadInAPair.fastq

+

* --unpairedOut unpairedReads.fastq

+

If any of these are not specified, the <code>--outBase</code> or default is used for that file.
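The naming rules above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not bamUtil's code: the function name and its keyword arguments are hypothetical, and it assumes the default base is the input path with its final extension removed, which matches the "myPath/myFile.bam" example.

```python
import os


def fastq_names(in_file, out_base=None, merge=False,
                first_out=None, second_out=None, unpaired_out=None):
    """Return the output FASTQ names implied by the rules described above."""
    # Assumption: the default base is the input path minus its last extension.
    base = out_base if out_base is not None else os.path.splitext(in_file)[0]
    names = {"unpaired": unpaired_out or base + ".fastq"}
    if merge:
        # With --merge, --firstOut names the interleaved file.
        names["interleaved"] = first_out or base + "_interleaved.fastq"
    else:
        names["first"] = first_out or base + "_1.fastq"
        names["second"] = second_out or base + "_2.fastq"
    return names


print(fastq_names("myPath/myFile.bam"))
print(fastq_names("myPath/myFile.bam", merge=True))
print(fastq_names("myPath/myFile.bam", out_base="myNewPath/myFastQBase",
                  first_out="firstReadInAPair.fastq"))
```

An explicitly given filename wins for that file only; the remaining files fall back to the base name.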

+

= Usage =

+

./bam bam2FastQ --in <inputFile> [--readName] [--splitRG] [--qualField <tag>] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--merge|--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--region <chr>[:<pos>[:<base>]]] [--gzip] [--noeof] [--params]

+

= Parameters =

+

<pre>

+

Required Parameters:

+

--in : the SAM/BAM file to convert to FastQ

+

Optional Parameters:

+

--readname : Process the BAM as readName sorted instead

+

of coordinate if the header does not indicate a sort order.

+

--splitRG : Split into RG specific fastqs.

+

--qualField : Use the base quality from the specified tag

+

rather than from the Quality field (default)

+

--merge : Generate 1 interleaved (merged) FASTQ for paired-ends (unpaired in a separate file)

+

use firstOut to override the filename of the interleaved file.

+

--refFile : Reference file for converting '=' in the sequence to the actual base

+

if '=' are found and the refFile is not specified, 'N' is written to the FASTQ

+

--firstRNExt : read name extension to use for first read in a pair

+

default is "/1"

+

--secondRNExt : read name extension to use for second read in a pair

+

default is "/2"

+

--rnPlus : Add the Read Name/extension to the '+' line of the fastq records

+

--noReverseComp : Do not reverse complement reads marked as reverse

+

--region : Only convert reads containing the specified region/nucleotide.

+

Position formatted as: chr:pos:base

+

pos (0-based) & base are optional.

+

--gzip : Compress the output FASTQ files using gzip

+

--noeof : Do not expect an EOF block on a bam file.

+

--params : Print the parameter settings to stderr

+

Optional OutputFile Names:

+

--outBase : Base output name for generated output files

+

--firstOut : Output name for the first in pair file

+

over-rides setting of outBase

+

--secondOut : Output name for the second in pair file

+

over-rides setting of outBase

+

--unpairedOut : Output name for unpaired reads

+

over-rides setting of outBase

+

</pre>

+

== Required Parameters ==

+

== Optional Parameters ==

+

=== BAM File Is Sorted By Read Name (<code>--readname</code>) ===

+

The bam2FastQ program by default checks the sort order in the SAM/BAM header when converting to FASTQ, and if that is not specified, assumes it is sorted by coordinate.

+

To override the default and force it to assume the file is sorted by read name, specify the <code>--readName</code> option.

+

The file does not need to be strictly sorted by read name. The only requirement is that matching read names are next to each other.
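Why adjacency is enough can be illustrated with a small Python sketch. It is not how bam2FastQ is implemented; it only shows that grouping consecutive identical names finds mates regardless of alphabetical order, and misses them when they are separated. The function name and read names are hypothetical.

```python
from itertools import groupby


def group_adjacent(read_names):
    """Group consecutive identical read names and count each run."""
    return [(name, len(list(run))) for name, run in groupby(read_names)]


# Not alphabetical, but mates are adjacent: both pairs are found.
adjacent = group_adjacent(["readB", "readB", "readA", "readC", "readC"])
print(adjacent)

# Mates of readB are separated, so a single pass sees them as two singletons.
separated = group_adjacent(["readB", "readA", "readB"])
print(separated)
```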

+

=== Split into RG Specific FASTQs (<code>--splitRG</code>) ===

+

Create RG specific FASTQ files.

+

Cannot be specified with firstOut/secondOut/unpairedOut since there will be a different filename for each RG.

+

Cannot write to stdout when <code>--splitRG</code> is specified.

+

Output filenames will be <outBase>.<RG>_1.fastq, <outBase>.<RG>_2.fastq, and <outBase>.<RG>.fastq. A fastq list file <outBase>.list will be created containing MERGE_NAME (the RG tag's SM value or outBase if the value is empty), fastq 1, fastq 2 (or . if it is a single ended fastq), and the RG tag string.
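As a rough sketch of the filename pattern just described, the following Python helper builds the three names for one read group. The function name is hypothetical, and treating the <RG> part as a plain read group label string is an assumption; the list file is not modeled here.

```python
def rg_fastq_names(out_base, rg):
    """Return (first, second, unpaired) FASTQ names for one read group."""
    prefix = f"{out_base}.{rg}"
    return (prefix + "_1.fastq", prefix + "_2.fastq", prefix + ".fastq")


# Hypothetical base name and read group labels.
for rg in ["RG1", "RG2"]:
    print(rg_fastq_names("myPath/myFile", rg))
```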

+

=== Use the Base Quality from the Specified Tag (<code>--qualField</code>) ===

+

By default, the quality field is used for the Base Qualities in the FASTQ file. Specify <code>--qualField <tagName></code> to use the base qualities from the specified tag instead of the quality field.

+

=== Generate 1 Paired-End Output File (<code>--merge</code>) ===

+

Use the <code>--merge</code> option to generate 1 interleaved (merged) FASTQ for paired-ends instead of 2 files. Unpaired reads are still written to a separate file.

+

The default extension for the interleaved output file is "_interleaved.fastq".

+

Use [[#Output FastQ File Name For the First End of Paired End (--firstOut)|<code>--firstOut</code>]] to override the filename of the interleaved file.

+

This parameter was added in version 1.0.10.

+

=== Reference File for Converting '=' in the Sequence to Bases (<code>--refFile</code>) ===

+

If the SAM/BAM file contains '=' in the sequence instead of the actual bases, the bam2FastQ program needs to convert the '=' back to the bases. To do that it needs the reference. Specify the reference by using <code>--refFile</code> followed by the reference filename.

+

For example:

+

./bam bam2FastQ --in myFile.bam --refFile myPath/myRefFile.fa

+

=== First in Pair FastQ ReadName Extension (<code>--firstRNExt</code>) ===

+

<code>--firstRNExt</code> overrides the default "/1" that is appended to the Read Name of the first-end of a read pair with the specified value.

+

=== Second in Pair FastQ ReadName Extension (<code>--secondRNExt</code>) ===

+

<code>--secondRNExt</code> overrides the default "/2" that is appended to the Read Name of the second-end of a read pair with the specified value.

+

=== Include the Read Name on the "+" line of the FASTQ (<code>--rnPlus</code>) ===

+

By default the read name is not included on the "+" line of the FASTQ files. To include the read name and the extension for paired-end reads, specify <code>--rnPlus</code>.
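The effect of the read name extensions and of <code>--rnPlus</code> on a FASTQ record can be sketched in Python. This is an illustration only: the function and argument names are hypothetical, and the four-line record layout is the standard FASTQ format rather than anything specific to bamUtil.

```python
def fastq_record(name, seq, qual, ext="", rn_plus=False):
    """Format one four-line FASTQ record as a string."""
    header = name + ext
    # With rn_plus the read name (and extension) is repeated on the '+' line.
    plus = "+" + header if rn_plus else "+"
    return f"@{header}\n{seq}\n{plus}\n{qual}\n"


# Hypothetical read with the default first-in-pair extension.
print(fastq_record("read1", "ACGT", "IIII", ext="/1"), end="")
print(fastq_record("read1", "ACGT", "IIII", ext="/1", rn_plus=True), end="")
```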

+

=== Do Not Reverse Complement Reverse Strands (<code>--noReverseComp</code>) ===

+

By default, reads marked as reverse in the BAM file are reverse complemented prior to writing to the FASTQ files. <code>--noReverseComp</code> disables this feature, and skips the reverse complement step.

+

For example, if a sequence is ACCGTG marked as reverse, the default FASTQ record will be written as: CACGGT

+

Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG
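The reverse complement step can be reproduced with a few lines of Python. This is a minimal sketch of the operation itself, not bamUtil's code; the function name is hypothetical and only the bases A, C, G, T and N are handled.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ACCGTG"))  # prints CACGGT
```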

+

=== Only Convert the Specified Region (<code>--region</code>) ===

+

Only convert reads containing the specified region/nucleotide.

+

Position formatted as: chr:pos:base

+

pos (0-based) & base are optional.
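The region format can be sketched as a small Python parser. It is an assumption-laden illustration, not the parser bamUtil uses: the function name is hypothetical, and it assumes the chromosome name itself contains no ":" characters.

```python
def parse_region(region):
    """Split 'chr[:pos[:base]]' into (chrom, pos, base); pos is 0-based."""
    parts = region.split(":")
    if not 1 <= len(parts) <= 3 or not parts[0]:
        raise ValueError(f"expected chr[:pos[:base]], got {region!r}")
    chrom = parts[0]
    pos = int(parts[1]) if len(parts) > 1 else None
    base = parts[2] if len(parts) > 2 else None
    return chrom, pos, base


print(parse_region("chr1"))
print(parse_region("chr1:1000"))
print(parse_region("chr1:1000:A"))
```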

+

== Optional Output Filenames ==

+

=== Output FastQ File Base Name (<code>--outBase</code>) ===

+

You can replace the default output base name by using the <code>--outBase</code> option.

+

If the outBase was "myNewPath/myFastQBase", the resulting fastqs would be:

+

#myNewPath/myFastQBase.fastq

+

#myNewPath/myFastQBase_1.fastq

+

#myNewPath/myFastQBase_2.fastq

+

The value specified by this parameter is overridden by <code>--firstOut</code>, <code>--secondOut</code>, and <code>--unpairedOut</code>, but is used for whichever output files are not specified.

+

With the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option, the resulting fastqs would instead be:

+

#myNewPath/myFastQBase.fastq

+

#myNewPath/myFastQBase_interleaved.fastq

+

=== Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) ===

+

This setting overrides the default and <code>--outBase</code> file name.

+

The entire filename and extension must be specified.

+

Does not affect the filenames for the second end or for unpaired reads.

+

For example:

+

./bam bam2FastQ --in myFile.bam --firstOut myFileEnd1.fastq

+

=== Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) ===

+

This setting overrides the default and <code>--outBase</code> file name.

+

The entire filename and extension must be specified.

+

Does not affect the filenames for the first end or for unpaired reads.

+

For example:

+

./bam bam2FastQ --in myFile.bam --secondOut myFileEnd2.fastq

+

=== Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) ===

−

~~Required parameter~~

+

This setting overrides the default and <code>--outBase</code> file names.

−

-~~i InputBAM~~/~~SAM~~

−

~~Optional parameters for output (however either --single or both --first~~ and ~~--second have to~~ be ~~provided)~~

+

The entire filename and extension must be specified.

−

~~--first firstReadInAPair_FastQ~~

−

~~--second secondReadInAPair_FastQ~~

−

~~--single unpairedReads_FastQ~~

−

~~In order to extract~~ paired end ~~reads, the BAM file has to be sorted by name, e.g. using samtools. Suppose the BAM file is myinput~~.~~bam~~

+

Does not affect the filenames for the paired-end fastq files.

−

~~samtools sort~~ -~~n myinput~~.bam ~~myinput~~.~~sortByName.bam~~

+

For example:

+

./bam bam2FastQ --in myFile.bam --unpairedOut myFileUnpaired.fastq

−

~~Using sorted bam file to extract paired end fastq files~~

+

−

~~bam2FastQ -i myinput.sortByName.BAM --first myread1.fastQ --second myread2.fastQ~~

+

= Return Value =

−

~~Or to extract both paired end and single end fastq files (~~if ~~the bam file contains both single and paired end reads)~~

+

Returns -1 if input parameters are invalid.

−

~~bam2FastQ -i myinput.sortByName.BAM --first myread1.fastQ --second myread2.fastQ --single myreadSingle~~.~~fastQ~~

−

~~Or using bam~~ (~~sorted or not~~) ~~file to extract single end fastq files~~

+

Returns the SamStatus for the reads/writes (0 on success, non-0 on failure).

−

~~bam2FastQ -i myinput.sortByName.BAM --single myreadSingle~~.~~fastQ~~

Mktrost



BamUtil: bam2FastQ (view source)

Revision as of 23:53, 5 March 2016
