Line 1: |
Line 1: |
− | === Purpose === | + | = Overview of the <code>bam2FastQ</code> function of <code>[[bamUtil]]</code> = |
− | This converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired. By converting BAM to FastQ files new alignments can be done using FastQ files
| + | The <code>bam2FastQ</code> option of [[bamUtil]] converts a BAM file into FastQ files. This is needed when only BAM files were delivered but a new alignment is desired: converting the BAM back to FastQ lets the reads be realigned from the FastQ files. |
| + | |
| + | '''NOTE: Secondary and Supplementary reads are skipped when converting to FastQ. It assumes that there will only be 2 reads (the 2 primary mates) with the same read name that are not secondary or supplementary.''' |
| + | |
| + | '''NOTE: Use the --splitRG option to split reads into read group specific FASTQs.''' |
| | | |
| == How to use it == | | == How to use it == |
| | | |
− | When bam2FastQ is invoked without any arguments the following information is displayed | + | When bam2FastQ is invoked without any arguments the usage information is displayed as described below under [[#Usage|Usage]]. |
− | The following parameters are in effect: | + | |
− | Input BAM/SAM File : (-iname)
| + | The input BAM file is required; see [[#input File (--in)|input File (--in)]]. |
− |
| + | |
− | Output FastQ Files
| + | It works on both read/query name and coordinate sorted SAM/BAM files. |
− | Output : --first [], --second [], --single []
| + | |
| + | If you want to convert a SAM/BAM that is read/query name sorted but the SO field of the header does not specify "queryname", then use the [[#BAM File Is Sorted By Read Name (--readname)|--readName]] option. |
| + | |
| + | When processing files sorted by read name, the only requirement is that matching read names are next to each other. It does not need to be in strict alphabetical order. |
| + | |
| + | Read names in paired-end FASTQ files are appended with "/1" for the first in the pair, and "/2" for the second in the pair. Override these defaults using [[#First in Pair FastQ ReadName Extension (--firstRNExt)|--firstRNExt]] and [[#Second in Pair FastQ ReadName Extension (--secondRNExt)|--secondRNExt]]. |
| + | |
| + | Sequences marked as reverse strand in the SAM/BAM file are reverse complemented prior to writing to the FASTQ files. To skip this step, specify [[#Do Not Reverse Complement Reverse Strands (--noReverseComp)|--noReverseComp]]. |
| + | |
| + | Any errors and a summary of how many pairs and unpaired reads were processed are written to stderr. |
| + | |
| + | '''NOTE: This tool does not work on templates that have more than 2 segments. It does not properly match reads when more than 2 reads have the same read name.''' |
| + | |
| + | '''NOTE: To get Read Group specific FASTQ files, use the --splitRG option. Alternatively, run [[BamUtil: splitBam]] to split the BAM into 1 BAM per Read Group, then run bam2FastQ on each BAM.''' |
| + | |
| + | === Output Files === |
| + | By default, this program produces 3 output fastq files. |
| + | # unpaired reads |
| + | # first end of paired reads |
| + | # second end of paired reads |
| + | |
| + | If the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option is specified, the program produces 2 output fastq files. |
| + | # unpaired reads |
| + | # interleaved paired-end reads |
| + | |
| + | The default fastq file names are determined by taking the base name of the input file and adding an extension for each filetype. |
| + | {|border="1" cellspacing="0" cellpadding="2" |
| + | ! colspan="2"|Default !!colspan="2"|[[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] |
| + | |- |
| + | ! Output File Contents !! Extension !! Output File Contents !! Extension |
| + | |- |
| + | |unpaired reads |
| + | | .fastq |
| + | |unpaired reads |
| + | | .fastq |
| + | |- |
| + | |first end of paired reads |
| + | | _1.fastq |
| + | | rowspan="2"|interleaved paired-end reads |
| + | (both first & second end) |
| + | | rowspan="2"|_interleaved.fastq |
| + | |- |
| + | |second end of paired reads |
| + | | _2.fastq |
| + | |} |
| + | |
| + | If the inputFile was "myPath/myFile.bam", the resulting fastqs would be: |
| + | #myPath/myFile.fastq |
| + | #myPath/myFile_1.fastq |
| + | #myPath/myFile_2.fastq |
| + | |
| + | With the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option, the resulting fastqs would be: |
| + | #myPath/myFile.fastq |
| + | #myPath/myFile_interleaved.fastq |
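The default naming rule above can be sketched in shell. This is only an illustration, not the tool's implementation: <code>default_fastq_names</code> is a hypothetical helper, and it assumes the base name is the input path with its final extension removed.

```shell
# Hypothetical helper (not part of bamUtil): print the default output
# names for an input path, following the table above.
# Assumption: the base name is the input path minus its last ".ext".
default_fastq_names() {
    input=$1
    mode=${2:-}              # pass "merge" to mimic --merge
    base=${input%.*}         # strip the final extension, e.g. ".bam"
    printf '%s\n' "${base}.fastq"                # unpaired reads
    if [ "$mode" = "merge" ]; then
        printf '%s\n' "${base}_interleaved.fastq"  # both ends, interleaved
    else
        printf '%s\n' "${base}_1.fastq"            # first end of pairs
        printf '%s\n' "${base}_2.fastq"            # second end of pairs
    fi
}

default_fastq_names myPath/myFile.bam
default_fastq_names myPath/myFile.bam merge
```

The two calls print the same file lists as the examples above.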
| + | |
| + | Instead of using the inputFile base name as the output file base, you can specify a different base name by using the [[#Output FastQ File Base Name (--outBase)|--outBase]] option. |
| + | |
| + | You can optionally directly specify the output fastq filenames using: |
| + | * --firstOut firstReadInAPair.fastq (also used for the interleaved filename with [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]]) |
| + | * --secondOut secondReadInAPair.fastq |
| + | * --unpairedOut unpairedReads.fastq |
| + | If any of these are not specified, the <code>--outBase</code> or default is used for that file. |
| + | |
| + | = Usage = |
| + | ./bam bam2FastQ --in <inputFile> [--readName] [--splitRG] [--qualField <tag>] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--merge|--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--region <chr>[:<pos>[:<base>]]] [--gzip] [--noeof] [--params] |
| + | |
| + | = Parameters = |
| + | <pre> |
| + | Required Parameters: |
| + | --in : the SAM/BAM file to convert to FastQ |
| + | Optional Parameters: |
| + | --readname : Process the BAM as readName sorted instead |
| + | of coordinate if the header does not indicate a sort order. |
| + | --splitRG : Split into RG specific fastqs. |
| + | --qualField : Use the base quality from the specified tag |
| + | rather than from the Quality field (default) |
| + | --merge : Generate 1 interleaved (merged) FASTQ for paired-ends (unpaired in a separate file) |
| + | use firstOut to override the filename of the interleaved file. |
| + | --refFile : Reference file for converting '=' in the sequence to the actual base |
| + | if '=' are found and the refFile is not specified, 'N' is written to the FASTQ |
| + | --firstRNExt : read name extension to use for first read in a pair |
| + | default is "/1" |
| + | --secondRNExt : read name extension to use for second read in a pair |
| + | default is "/2" |
| + | --rnPlus : Add the Read Name/extension to the '+' line of the fastq records |
| + | --noReverseComp : Do not reverse complement reads marked as reverse |
| + | --region : Only convert reads containing the specified region/nucleotide. |
| + | Position formatted as: chr:pos:base |
| + | pos (0-based) & base are optional. |
| + | --gzip : Compress the output FASTQ files using gzip |
| + | --noeof : Do not expect an EOF block on a bam file. |
| + | --params : Print the parameter settings to stderr |
| + | Optional OutputFile Names: |
| + | --outBase : Base output name for generated output files |
| + | --firstOut : Output name for the first in pair file |
| + | over-rides setting of outBase |
| + | --secondOut : Output name for the second in pair file |
| + | over-rides setting of outBase |
| + | --unpairedOut : Output name for unpaired reads |
| + | over-rides setting of outBase |
| + | </pre> |
| + | |
| + | == Required Parameters == |
| + | {{inBAMInputFile}} |
| + | |
| + | == Optional Parameters == |
| + | === BAM File Is Sorted By Read Name (<code>--readname</code>) === |
| + | |
| + | The bam2FastQ program by default checks the sort order in the SAM/BAM header when converting to FASTQ, and if that is not specified, assumes it is sorted by coordinate. |
| + | |
| + | To override the default and force it to assume the file is sorted by read name, specify the <code>--readName</code> option. |
| + | |
| + | The file does not need to be strictly sorted by read name. The only requirement is that matching read names are next to each other. |
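The "matching read names are next to each other" requirement can be checked before conversion. Below is a minimal sketch, assuming the read names arrive one per line on stdin; <code>names_are_grouped</code> is a hypothetical helper, not a bamUtil command.

```shell
# Hypothetical helper: succeeds (exit 0) if every read name on stdin
# appears in a single contiguous run, fails (exit 1) otherwise.
# Alphabetical order is deliberately not required.
names_are_grouped() {
    awk '
        $0 != prev {
            # A name that starts a new run must not have been seen before.
            if ($0 in seen) { bad = 1; exit }
            seen[$0] = 1
            prev = $0
        }
        END { exit bad }
    '
}

# Grouped but not alphabetical: accepted.
printf 'readB\nreadB\nreadA\nreadA\n' | names_are_grouped && echo "grouped"

# With samtools installed (an assumption), names could be fed in like this:
#   samtools view myFile.bam | cut -f1 | names_are_grouped
```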
| + | |
| + | === Split into RG Specific FASTQs (<code>--splitRG</code>) === |
| + | |
| + | Create RG specific FASTQ files. |
| + | |
| + | <code>--splitRG</code> cannot be combined with <code>--firstOut</code>, <code>--secondOut</code>, or <code>--unpairedOut</code>, since there will be a different filename for each RG. |
| + | |
| + | Cannot write to stdout when <code>--splitRG</code> is specified. |
| + | |
| + | Output filenames will be <outBase>.<RG>_1.fastq, <outBase>.<RG>_2.fastq, and <outBase>.<RG>.fastq. A fastq list file <outBase>.list will be created containing MERGE_NAME (the RG tag's SM value or outBase if the value is empty), fastq 1, fastq 2 (or . if it is a single ended fastq), and the RG tag string. |
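As an illustration of the naming pattern just described, the sketch below lists the files expected for a given output base and set of read groups. <code>split_rg_names</code> and the read group IDs <code>RG1</code> and <code>RG2</code> are hypothetical; the pattern itself is taken from the text above.

```shell
# Hypothetical helper: print the file names --splitRG is described as
# producing for an output base ($1) and one or more read group IDs.
split_rg_names() {
    out_base=$1
    shift
    for rg in "$@"; do
        printf '%s\n' "${out_base}.${rg}_1.fastq"   # first end of pairs
        printf '%s\n' "${out_base}.${rg}_2.fastq"   # second end of pairs
        printf '%s\n' "${out_base}.${rg}.fastq"     # unpaired reads
    done
    printf '%s\n' "${out_base}.list"                # fastq list file
}

split_rg_names myFile RG1 RG2
```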
| + | |
| + | === Use the Base Quality from the Specified Tag (<code>--qualField</code>) === |
| + | |
| + | By default, the quality field is used for the Base Qualities in the FASTQ file. Specify <code>--qualField <tagName></code> to use the base qualities from the specified tag instead of the quality field. |
| + | |
| + | |
| + | === Generate 1 Paired-End Output File (<code>--merge</code>) === |
| + | |
| + | Use the <code>--merge</code> option to generate 1 interleaved (merged) FASTQ for paired-ends instead of 2 files. Unpaired reads are still written to a separate file. |
| + | |
| + | The default extension for the output file is "_interleaved". |
| + | |
| + | Use [[#Output FastQ File Name For the First End of Paired End (--firstOut)|<code>--firstOut</code>]] to override the filename of the interleaved file. |
| + | |
| + | This parameter was added in version 1.0.10. |
| + | |
| + | === Reference File for Converting '=' in the Sequence to Bases (<code>--refFile</code>) === |
| + | If the SAM/BAM file contains '=' in the sequence instead of the actual bases, the bam2FastQ program needs to convert the '=' back to the bases. To do that it needs the reference. Specify the reference by using <code>--refFile</code> followed by the reference filename. |
| + | |
| + | For example: |
| + | ./bam bam2FastQ --in myFile.bam --refFile myPath/myRefFile.fa |
| + | |
| + | === First in Pair FastQ ReadName Extension (<code>--firstRNExt</code>) === |
| + | |
| + | <code>--firstRNExt</code> overrides the default "/1" that is appended to the Read Name of the first-end of a read pair with the specified value. |
| + | |
| + | === Second in Pair FastQ ReadName Extension (<code>--secondRNExt</code>) === |
| + | |
| + | <code>--secondRNExt</code> overrides the default "/2" that is appended to the Read Name of the second-end of a read pair with the specified value. |
| + | |
| + | === Include the Read Name on the "+" line of the FASTQ (<code>--rnPlus</code>) === |
| + | |
| + | By default the read name is not included on the "+" line of the FASTQ files. To include the read name and the extension for paired-end reads, specify <code>--rnPlus</code>. |
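The effect on a single record can be sketched as follows. This only shows the expected four-line FASTQ layout under the stated assumption that <code>--rnPlus</code> repeats the read name plus extension after the "+"; <code>fastq_record</code> and the sample values are hypothetical.

```shell
# Hypothetical helper: print one FASTQ record.
# Arguments: read name, extension, sequence, qualities, "yes" to mimic --rnPlus.
fastq_record() {
    name=$1
    ext=$2
    seq=$3
    qual=$4
    rn_plus=${5:-no}
    printf '%s\n' "@${name}${ext}"
    printf '%s\n' "$seq"
    if [ "$rn_plus" = "yes" ]; then
        printf '%s\n' "+${name}${ext}"   # read name repeated on the "+" line
    else
        printf '%s\n' "+"                # default: bare "+" line
    fi
    printf '%s\n' "$qual"
}

fastq_record read1 /1 ACGT IIII
fastq_record read1 /1 ACGT IIII yes
```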
| + | |
| + | === Do Not Reverse Complement Reverse Strands (<code>--noReverseComp</code>) === |
| + | |
| + | By default, reads marked as reverse in the BAM file are reverse complemented prior to writing to the FASTQ files. <code>--noReverseComp</code> disables this feature, and skips the reverse complement step. |
| + | |
| + | For example, if a sequence is ACCGTG marked as reverse, the default FASTQ record will be written as: CACGGT |
| + | |
| + | Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG |
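The reverse complement in this example can be reproduced in the shell. This is a minimal sketch of the operation itself, not of the tool's code; <code>revcomp</code> is a hypothetical helper and it assumes the <code>rev</code> utility is available.

```shell
# Hypothetical helper: reverse the sequence, then complement each base.
# Characters other than A, C, G, T (for example N) pass through unchanged.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp ACCGTG   # prints CACGGT
```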
| + | |
| + | === Only Convert the Specified Region (<code>--region</code>) === |
| + | |
| + | Only convert reads containing the specified region/nucleotide. |
| + | |
| + | Position formatted as: chr:pos:base |
| + | |
| + | pos (0-based) & base are optional. |
| + | |
| + | {{noeofBGZFParameter}} |
| + | {{paramsParameter}} |
| + | |
| + | == Optional Output Filenames == |
| + | |
| + | === Output FastQ File Base Name (<code>--outBase</code>) === |
| + | |
| + | You can replace the default output base name by using the <code>--outBase</code> option. |
| + | If the outBase was "myNewPath/myFastQBase", the resulting fastqs would be: |
| + | #myNewPath/myFastQBase.fastq |
| + | #myNewPath/myFastQBase_1.fastq |
| + | #myNewPath/myFastQBase_2.fastq |
| + | |
| + | The value specified by this parameter is overridden by <code>--firstOut</code>, <code>--secondOut</code>, and <code>--unpairedOut</code>, but is used for whichever output files are not specified. |
| + | |
| + | With the [[#Generate 1 Paired-End Output File (--merge)|<code>--merge</code>]] option, the resulting fastqs would instead be: |
| + | #myNewPath/myFastQBase.fastq |
| + | #myNewPath/myFastQBase_interleaved.fastq |
| + | |
| + | === Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) === |
| + | |
| + | This setting overrides the default and <code>--outBase</code> file name. |
| + | |
| + | The entire filename and extension must be specified. |
| + | |
| + | Does not affect the filenames for the second end or for unpaired reads. |
| + | |
| + | For example: |
| + | ./bam bam2FastQ --in myFile.bam --firstOut myFileEnd1.fastq |
| + | |
| + | === Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) === |
| + | |
| + | This setting overrides the default and <code>--outBase</code> file name. |
| + | |
| + | The entire filename and extension must be specified. |
| + | |
| + | Does not affect the filenames for the first end or for unpaired reads. |
| + | |
| + | For example: |
| + | ./bam bam2FastQ --in myFile.bam --secondOut myFileEnd2.fastq |
| + | |
| + | === Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) === |
| | | |
− | Required parameter
| + | This setting overrides the default and <code>--outBase</code> file names. |
− | -i InputBAM/SAM
| |
| | | |
− | Optional parameters for output (however either --single or both --first and --second have to be provided)
| + | The entire filename and extension must be specified. |
− | --first firstReadInAPair_FastQ
| |
− | --second secondReadInAPair_FastQ
| |
− | --single unpairedReads_FastQ
| |
| | | |
− | In order to extract paired end reads, the BAM file has to be sorted by name, e.g. using samtools. Suppose the BAM file is myinput.bam
| + | Does not affect the filenames for the paired-end fastq files. |
| | | |
− | samtools sort -n myinput.bam myinput.sortByName.bam | + | For example: |
| + | ./bam bam2FastQ --in myFile.bam --unpairedOut myFileUnpaired.fastq |
| | | |
− | Using sorted bam file to extract paired end fastq files
| + | {{PhoneHomeParameters}} |
| | | |
− | bam2FastQ -i myinput.sortByName.BAM --first myread1.fastQ --second myread2.fastQ
| + | = Return Value = |
| | | |
− | Or to extract both paired end and single end fastq files (if the bam file contains both single and paired end reads)
| + | Returns -1 if input parameters are invalid. |
− |
| |
− | bam2FastQ -i myinput.sortByName.BAM --first myread1.fastQ --second myread2.fastQ --single myreadSingle.fastQ
| |
| | | |
− | Or using bam (sorted or not) file to extract single end fastq files
| + | Returns the SamStatus for the reads/writes (0 on success, non-0 on failure). |
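A calling script can act on the return value. The sketch below assumes the return value is exposed as the process exit status; in that case a shell would report -1 as 255, since exit statuses are limited to 0-255. <code>run_and_report</code> is a hypothetical wrapper.

```shell
# Hypothetical wrapper: run a command and report whether it succeeded.
run_and_report() {
    if "$@"; then
        echo "success"
    else
        status=$?
        echo "failed with exit status $status"
        return "$status"
    fi
}

# Demonstration with a command that always succeeds:
run_and_report true

# Intended use (requires the bam executable, so shown as a comment):
#   run_and_report ./bam bam2FastQ --in myFile.bam
```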
− |
| |
− | bam2FastQ -i myinput.sortByName.BAM --single myreadSingle.fastQ
| |