Changes

1,938 bytes added , 23:53, 5 March 2016

Line 2: Line 2:

The <code>bam2FastQ</code> option of [[bamUtil]] converts a BAM file into FastQ files. This is needed when only BAM files were delivered but a new alignment is desired: once the reads are back in FastQ format, they can be realigned from the FastQ files.

−

'''NOTE: ~~This tool does not work on templates that have more than 2 segments~~. It ~~does not properly match~~ reads ~~when more than~~ 2 ~~reads have~~ the same read name.'''

+

'''NOTE: Secondary and supplementary reads are skipped when converting to FastQ. The tool assumes that, for each read name, only 2 reads (the 2 primary mates) are neither secondary nor supplementary.'''

−

'''NOTE: ~~This tool does not~~ split reads into read group specific FASTQs~~. If you want Read Group specific FASTQ files, first run [[BamUtil: splitBam]] to first split the BAM into 1 BAM per Read Group. Then run bam2FastQ on each bam~~.'''

+

'''NOTE: Use the --splitRG option to split reads into read group specific FASTQs.'''

== How to use it ==

Line 77: Line 77:

= Usage =

−

./bam bam2FastQ --in <inputFile> [--readName] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--merge|--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <~~firstInPairReadNameExt~~>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--noeof] [--params]

+

./bam bam2FastQ --in <inputFile> [--readName] [--splitRG] [--qualField <tag>] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--merge|--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--region <chr>[:<pos>[:<base>]]] [--gzip] [--noeof] [--params]

= Parameters =

<pre>

−

Required Parameters:

+

Required Parameters:

−

--in : the SAM/BAM file to convert to FastQ

+

--in : the SAM/BAM file to convert to FastQ

−

Optional Parameters:

+

Optional Parameters:

−

--readname : Process the BAM as readName sorted instead

+

--readname : Process the BAM as readName sorted instead

−

of coordinate if the header does not indicate a sort order.

+

of coordinate if the header does not indicate a sort order.

−

--merge : Generate 1 interleaved (merged) FASTQ for paired-ends (unpaired in a separate file)

+

--splitRG : Split into RG specific fastqs.

−

use firstOut to override the filename of the interleaved file.

+

--qualField : Use the base quality from the specified tag

−

--refFile : Reference file for converting '=' in the sequence to the actual base

+

rather than from the Quality field (default)

−

if '=' are found and the refFile is not specified, 'N' is written to the FASTQ

+

--merge : Generate 1 interleaved (merged) FASTQ for paired-ends (unpaired in a separate file)

−

--firstRNExt : read name extension to use for first read in a pair

+

use firstOut to override the filename of the interleaved file.

−

default is "/1"

+

--refFile : Reference file for converting '=' in the sequence to the actual base

−

--secondRNExt : read name extension to use for second read in a pair

+

if '=' are found and the refFile is not specified, 'N' is written to the FASTQ

−

default is "/2"

+

--firstRNExt : read name extension to use for first read in a pair

−

--rnPlus : Add the Read Name/extension to the '+' line of the fastq records

+

default is "/1"

−

--noReverseComp : Do not reverse complement reads marked as reverse

+

--secondRNExt : read name extension to use for second read in a pair

−

--noeof : Do not expect an EOF block on a bam file.

+

default is "/2"

−

--params : Print the parameter settings to stderr

+

--rnPlus : Add the Read Name/extension to the '+' line of the fastq records

−

Optional OutputFile Names:

+

--noReverseComp : Do not reverse complement reads marked as reverse

−

--outBase : Base output name for generated output files

+

--region : Only convert reads containing the specified region/nucleotide.

−

--firstOut : Output name for the first in pair file

+

Position formatted as: chr:pos:base

−

over-rides setting of outBase

+

pos (0-based) & base are optional.

−

--secondOut : Output name for the second in pair file

+

--gzip : Compress the output FASTQ files using gzip

−

over-rides setting of outBase

+

--noeof : Do not expect an EOF block on a bam file.

−

--unpairedOut : Output name for unpaired reads

+

--params : Print the parameter settings to stderr

−

over-rides setting of outBase

+

Optional OutputFile Names:

+

--outBase : Base output name for generated output files

+

--firstOut : Output name for the first in pair file

+

over-rides setting of outBase

+

--secondOut : Output name for the second in pair file

+

over-rides setting of outBase

+

--unpairedOut : Output name for unpaired reads

+

over-rides setting of outBase

</pre>

Line 119: Line 126:

The file does not need to be strictly sorted by read name. The only requirement is that matching read names are next to each other.
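The adjacency requirement can be checked before conversion. The following is a minimal sketch, assuming the read names have already been extracted one per line; the function name <code>names_grouped</code> is hypothetical and not part of bamUtil.

```shell
# Hypothetical helper, not part of bamUtil: reads one read name per line on
# stdin and succeeds only if every name appears in a single contiguous run.
names_grouped() {
    awk '
        NR == 1 || $0 != prev {
            if ($0 in seen) { bad = 1; exit }   # name reappeared after a gap
            seen[$0] = 1
            prev = $0
        }
        END { exit bad + 0 }
    '
}

# Grouped but not strictly sorted: acceptable.
printf 'r2\nr2\nr1\nr1\n' | names_grouped && echo "grouped"

# r1 is split by r2: not acceptable.
printf 'r1\nr2\nr1\n' | names_grouped || echo "not grouped"
```

If samtools happens to be installed, the names could be taken from the first column of <code>samtools view</code> output; that pairing is only a suggestion and is not something this page describes.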

+

=== Split into RG Specific FASTQs (<code>--splitRG</code>) ===

+

Create RG specific FASTQ files.

+

This option cannot be combined with <code>--firstOut</code>, <code>--secondOut</code>, or <code>--unpairedOut</code>, since there will be a different filename for each RG.

+

Cannot write to stdout when <code>--splitRG</code> is specified.

+

Output filenames will be <outBase>.<RG>_1.fastq, <outBase>.<RG>_2.fastq, and <outBase>.<RG>.fastq. A FASTQ list file, <outBase>.list, is also created. It contains MERGE_NAME (the RG tag's SM value, or outBase if that value is empty), fastq 1, fastq 2 (or . if it is a single-ended FASTQ), and the RG tag string.
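To make the naming pattern concrete, the sketch below prints the three names for one read group. The function name, <code>sample</code>, and <code>RG1</code> are placeholders. Mapping <code>_1</code>, <code>_2</code>, and the unsuffixed file to first, second, and unpaired reads is inferred from the pattern, not stated by the tool.

```shell
# Illustrative only: prints the FASTQ names that follow the
# <outBase>.<RG>_1.fastq / _2.fastq / .fastq pattern for one read group.
# The function name and its arguments are hypothetical.
splitrg_fastq_names() {
    out_base=$1
    rg=$2
    printf '%s.%s_1.fastq\n' "$out_base" "$rg"   # presumably first-in-pair reads
    printf '%s.%s_2.fastq\n' "$out_base" "$rg"   # presumably second-in-pair reads
    printf '%s.%s.fastq\n'   "$out_base" "$rg"   # presumably unpaired reads
}

splitrg_fastq_names sample RG1
```

A helper like this can be handy for checking that every expected file exists after a <code>--splitRG</code> run.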

+

=== Use the Base Quality from the Specified Tag (<code>--qualField</code>) ===

+

By default, the quality field is used for the Base Qualities in the FASTQ file. Specify <code>--qualField <tagName></code> to use the base qualities from the specified tag instead of the quality field.
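As a dry-run sketch, the helper below only prints a command line that combines <code>--qualField</code> with <code>--in</code> and <code>--outBase</code>; it does not run the tool. The helper name and the file names are placeholders. <code>OQ</code> is used purely as an example tag (the SAM optional-fields specification reserves it for original base qualities); whether a given BAM carries it is an assumption.

```shell
# Dry-run sketch: prints a bam2FastQ command line instead of executing it.
# Arguments: input BAM, quality tag, output base (all placeholder values).
qualfield_cmd() {
    printf './bam bam2FastQ --in %s --qualField %s --outBase %s\n' "$1" "$2" "$3"
}

qualfield_cmd in.bam OQ out
```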

+

=== Generate 1 Paired-End Output File (<code>--merge</code>) ===

Line 155: Line 177:

Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG
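For comparison, the sketch below computes a reverse complement with standard tools. It assumes ACCGTG is the sequence as stored in the BAM record for a read marked as reverse; the function name is hypothetical and this is not the tool's own implementation.

```shell
# Reverse-complement a DNA sequence: reverse the string with awk, then swap
# A<->T and C<->G with tr. Hypothetical helper, not bamUtil code.
revcomp() {
    printf '%s\n' "$1" |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
        tr 'ACGTacgt' 'TGCAtgca'
}

revcomp ACCGTG   # prints CACGGT
```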

+

=== Only Convert the Specified Region (<code>--region</code>) ===

+

Only convert reads containing the specified region/nucleotide.

+

Position formatted as: chr:pos:base

+

pos (0-based) & base are optional.
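The sketch below assembles a value for <code>--region</code> from its parts, leaving out the optional fields when they are empty. The function name and the values <code>chr1</code>, <code>1001</code>, and <code>A</code> are placeholders. Because pos is 0-based, a 1-based coordinate is reduced by one first.

```shell
# Hypothetical helper: builds chr[:pos[:base]] for the --region option.
# pos is expected to be 0-based already; base is only used when pos is given.
region_arg() {
    chr=$1
    pos=${2:-}
    base=${3:-}
    arg=$chr
    if [ -n "$pos" ]; then
        arg="${arg}:${pos}"
        if [ -n "$base" ]; then
            arg="${arg}:${base}"
        fi
    fi
    printf '%s\n' "$arg"
}

region_arg chr1                      # whole chromosome
one_based=1001
region_arg chr1 $((one_based - 1)) A # 1-based 1001 becomes 0-based 1000
```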

Mktrost

BamUtil: bam2FastQ

Revision as of 23:53, 5 March 2016
