Line 1: |
Line 1: |
− | === Purpose === | + | = Overview of the <code>bam2FastQ</code> function of <code>[[bamUtil]]</code> = |
− | This converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired. By converting BAM to FastQ files new alignments can be done using FastQ files
| + | The <code>bam2FastQ</code> option of [[bamUtil]] converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired: the reads are converted back to FastQ, and the new alignment is then run on the FastQ files. |
| | | |
| == How to use it == | | == How to use it == |
| | | |
− | When bam2FastQ is invoked without any arguments the following information is displayed | + | When bam2FastQ is invoked without any arguments the usage information is displayed as described below under [[#Usage|Usage]]. |
− | The following parameters are in effect:
| |
− | Input BAM/SAM File : (-iname)
| |
− |
| |
− | Output FastQ Files
| |
− | Output : --first [], --second [], --unpaired []
| |
| | | |
− | Required parameter
| + | The input SAM/BAM file is required; see [[#input File (--in)|input File (--in)]]. |
− | -i InputBAM/SAM
| |
| | | |
− | Optional parameters for output (however either --unpaired is present or both --first and --second are provided)
| + | It works on both read/query name and coordinate sorted SAM/BAM files. |
− | --first firstReadInAPair_FastQ | |
− | --second secondReadInAPair_FastQ
| |
− | --unpaired unpairedReads_FastQ
| |
| | | |
− | In order to extract paired end reads, the BAM file has to be sorted by name, e.g. using samtools. Suppose the BAM file is myinput.bam
| + | If you want to convert a SAM/BAM that is read/query name sorted but the SO field of the header does not specify "queryname", then use the [[#BAM File Is Sorted By Read Name (<code>--readname</code>)|--readName]] option. |
| | | |
− | samtools sort -n myinput.bam myinput.sortByName | + | When processing files sorted by read name, the only requirement is that matching read names are next to each other. The reads do not need to be in strict alphabetical order. |
| | | |
− | Using sorted bam file to extract paired end fastq files
| + | Any errors and a summary of how many pairs and unpaired reads were processed are written to stderr. |
| | | |
− | bam2FastQ -i myinput.sortByName.BAM --first myread1.fastQ --second myread2.fastQ
| + | === Output Files === |
| + | This program produces three output fastq files: |
| + | # unpaired reads |
| + | # first end of paired reads |
| + | # second end of paired reads |
| | | |
− | Or to extract both paired end and single end fastq files (if the bam file contains both single and paired end reads)
| + | The default fastq file names are determined by taking the base name of the input file and adding an extension for each filetype. |
− |
| + | {|border="1" cellspacing="0" cellpadding="2" |
− | bam2FastQ -i myinput.sortByName.BAM --first myread1.fastQ --second myread2.fastQ --unpaired myreadSingle.fastQ
| + | ! Output File Contents !! Extension |
| + | |- |
| + | |unpaired reads |
| + | | .fastq |
| + | |- |
| + | |first end of paired reads |
| + | | _1.fastq |
| + | |- |
| + | |second end of paired reads |
| + | | _2.fastq |
| + | |} |
| | | |
− | Or using bam (sorted or not) file to extract single end fastq files
| + | If the input file is "myPath/myFile.bam", the resulting fastq files are: |
− | | + | #myPath/myFile.fastq |
− | bam2FastQ -i myinput.sortByName.BAM --unpaired myreadSingle.fastQ
| + | #myPath/myFile_1.fastq |
| + | #myPath/myFile_2.fastq |
| + | |
| + | Instead of using the inputFile base name as the output file base, you can specify a different base name by using the [[#Output FastQ File Base Name (--outBase)|--outBase]] option. |
| + | |
| + | You can also specify the output fastq filenames directly using: |
| + | * --firstOut firstReadInAPair.fastq |
| + | * --secondOut secondReadInAPair.fastq |
| + | * --unpairedOut unpairedReads.fastq |
| + | If any of these are not specified, the <code>--outBase</code> or default is used for that file. |
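
The naming rules above can be sketched as a small shell function. This is only an illustration, not bamUtil code: <code>resolve_fastq_names</code> is a hypothetical helper, and it assumes the default base name is the input path with its final extension removed, as the "myPath/myFile.bam" example suggests.

```shell
# Hypothetical helper (not part of bamUtil): print the unpaired, first-end and
# second-end fastq names for an input file and optional overrides.
# Usage: resolve_fastq_names <inputFile> [outBase] [firstOut] [secondOut] [unpairedOut]
resolve_fastq_names() (
    in_file=$1; out_base=${2:-}; first=${3:-}; second=${4:-}; unpaired=${5:-}
    # Assumption: the default base is the input path minus its final extension.
    base=${out_base:-${in_file%.*}}
    # An explicit file name wins; otherwise fall back to base + extension.
    printf '%s\n' "${unpaired:-${base}.fastq}" \
                  "${first:-${base}_1.fastq}" \
                  "${second:-${base}_2.fastq}"
)

# Defaults only
resolve_fastq_names myPath/myFile.bam
# --outBase plus an explicit --firstOut
resolve_fastq_names myPath/myFile.bam myNewPath/myFastQBase myFileEnd1.fastq
```

In the second call only the first-end name is given explicitly, so the other two names are still built from the base name.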
| + | |
| + | = Usage = |
| + | ./bam bam2FastQ --in <inputFile> [--readName] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--noeof] [--params] |
| + | |
| + | |
| + | = Parameters = |
| + | <pre> |
| + | Required Parameters: |
| + | --in : the SAM/BAM file to convert to FastQ |
| + | Optional Parameters: |
| + | --readname : Process the BAM as readName sorted instead |
| + | of coordinate if the header does not indicate a sort order. |
| + | --refFile : Reference file for converting '=' in the sequence to the actual base |
| + | if '=' are found and the refFile is not specified, 'N' is written to the FASTQ |
| + | --outBase : Base output name for generated output files |
| + | --firstOut : Output name for the first in pair file |
| + | over-rides setting of outBase |
| + | --secondOut : Output name for the second in pair file |
| + | over-rides setting of outBase |
| + | --unpairedOut : Output name for unpaired reads |
| + | over-rides setting of outBase |
| + | --firstRNExt : read name extension to use for first read in a pair |
| + | default is "/1" |
| + | --secondRNExt : read name extension to use for second read in a pair |
| + | default is "/2" |
| + | --rnPlus : Add the Read Name/extension to the '+' line of the fastq records |
| + | --noReverseComp : Do not reverse complement reads marked as reverse |
| + | --noeof : Do not expect an EOF block on a bam file. |
| + | --params : Print the parameter settings to stderr |
| + | </pre> |
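
To make the effect of <code>--firstRNExt</code>, <code>--secondRNExt</code> and <code>--rnPlus</code> concrete, the sketch below prints one four-line fastq record. It is an assumption-based illustration of the record layout only: <code>fastq_record</code> is a hypothetical helper, not a bamUtil function, and it does not reproduce the program's handling of reverse complementing or '=' bases.

```shell
# Hypothetical helper: format one fastq record as the options describe.
# Usage: fastq_record <readName> <ext> <sequence> <qualities> [rnPlus]
fastq_record() (
    name=$1; ext=$2; seq=$3; qual=$4; rn_plus=${5:-}
    plus='+'
    # With --rnPlus the read name and extension are repeated on the '+' line.
    if [ -n "$rn_plus" ]; then
        plus="+${name}${ext}"
    fi
    printf '@%s%s\n%s\n%s\n%s\n' "$name" "$ext" "$seq" "$plus" "$qual"
)

# First read of a pair with the default "/1" extension
fastq_record read1 /1 ACGT IIII
# Second read of a pair with the name repeated on the '+' line
fastq_record read1 /2 ACGT IIII yes
```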
| + | |
| + | {{inBAMInputFile}} |
| + | |
| + | == Output FastQ File Base Name (<code>--outBase</code>) == |
| + | |
| + | You can replace the default output base name by using the <code>--outBase</code> option. |
| + | If the outBase is "myNewPath/myFastQBase", the resulting fastq files are: |
| + | #myNewPath/myFastQBase.fastq |
| + | #myNewPath/myFastQBase_1.fastq |
| + | #myNewPath/myFastQBase_2.fastq |
| + | |
| + | The value specified by this parameter is overridden by <code>--firstOut</code>, <code>--secondOut</code>, and <code>--unpairedOut</code>, but is used for whichever output files are not specified. |
| + | |
| + | |
| + | == Output FastQ File Name For the First End of Paired End (<code>--firstOut</code>) == |
| + | |
| + | This setting overrides the default and <code>--outBase</code> file name. |
| + | |
| + | The entire filename and extension must be specified. |
| + | |
| + | Does not affect the filenames for the second end or for unpaired reads. |
| + | |
| + | For example: |
| + | ./bam bam2FastQ --in myFile.bam --firstOut myFileEnd1.fastq |
| + | |
| + | |
| + | == Output FastQ File Name For the Second End of Paired End (<code>--secondOut</code>) == |
| + | |
| + | This setting overrides the default and <code>--outBase</code> file name. |
| + | |
| + | The entire filename and extension must be specified. |
| + | |
| + | Does not affect the filenames for the first end or for unpaired reads. |
| + | |
| + | For example: |
| + | ./bam bam2FastQ --in myFile.bam --secondOut myFileEnd2.fastq |
| + | |
| + | |
| + | == Output FastQ File Name For Unpaired Reads (<code>--unpairedOut</code>) == |
| + | |
| + | This setting overrides the default and <code>--outBase</code> file names. |
| + | |
| + | The entire filename and extension must be specified. |
| + | |
| + | Does not affect the filenames for the two paired end fastq files. |
| + | |
| + | For example: |
| + | ./bam bam2FastQ --in myFile.bam --unpairedOut myFileUnpaired.fastq |
| + | |
| + | |
| + | == BAM File Is Sorted By Read Name (<code>--readname</code>) == |
| + | |
| + | The bam2FastQ program by default checks the sort order in the SAM/BAM header when converting to FASTQ, and if that is not specified, assumes it is sorted by coordinate. |
| + | |
| + | To override the default and force it to assume the file is sorted by read name, specify the <code>--readName</code> option. |
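
Before deciding whether the option is needed, it can help to look at what the header actually declares. The sketch below reads SAM header text and prints the SO value of the @HD line. <code>sam_sort_order</code> is a hypothetical helper, and piping from <code>samtools view -H</code> assumes samtools is installed; neither is part of bam2FastQ.

```shell
# Hypothetical helper: read SAM header text on stdin and print the SO value
# from the @HD line, or "unspecified" when no SO tag is present.
sam_sort_order() {
    awk -F '\t' '
        /^@HD/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
        }
        END { if (!found) print "unspecified" }
    '
}

# Example with an inline header; with a real file one could instead run:
#   samtools view -H myFile.bam | sam_sort_order
printf '@HD\tVN:1.6\tSO:queryname\n@SQ\tSN:chr1\tLN:100\n' | sam_sort_order
```

If this prints something other than "queryname" for a file known to be grouped by read name, that is the situation the option is meant for.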
| + | |
| + | The file does not need to be strictly sorted by read name. The only requirement is that matching read names are next to each other. |
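
The adjacency requirement can be checked on SAM text with a short awk filter. This is a sketch under stated assumptions rather than anything bam2FastQ does itself: <code>names_are_grouped</code> is a hypothetical helper, and it keeps every read name in memory, so it is only practical for modest inputs.

```shell
# Hypothetical helper: read SAM text on stdin and succeed (exit 0) only if
# every read name occurs in one contiguous run. Header lines ('@') are skipped.
names_are_grouped() {
    awk -F '\t' '
        /^@/ { next }
        $1 != prev {
            # A name that reappears after a different name is not grouped.
            if ($1 in seen) { bad = 1; exit }
            seen[$1] = 1
            prev = $1
        }
        END { exit bad + 0 }
    '
}

# Grouped but not alphabetical (r2 before r1): still acceptable
if printf 'r2\t77\nr2\t141\nr1\t77\nr1\t141\n' | names_are_grouped; then
    echo "grouped by read name"
fi
```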
| + | |
| + | |
| + | == Reference File for Converting '=' in the Sequence to Bases (<code>--refFile</code>) == |
| + | If the SAM/BAM file contains '=' in the sequence instead of the actual bases, the bam2FastQ program needs to convert the '=' back to the bases. To do that it needs the reference. Specify the reference by using <code>--refFile</code> followed by the reference filename. |
| + | |
| + | For example: |
| + | ./bam bam2FastQ --in myFile.bam --refFile myPath/myRefFile.fa |