Line 2: |
Line 2: |
| The <code>bam2FastQ</code> option of [[bamUtil]] converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired. Once the BAM is converted, the new alignment can be run on the resulting FastQ files. | | The <code>bam2FastQ</code> option of [[bamUtil]] converts a BAM file into FastQ files. This is necessary when only BAM files are delivered but a new alignment is desired. Once the BAM is converted, the new alignment can be run on the resulting FastQ files. |
| | | |
− | '''NOTE: This tool does not work on templates that have more than 2 segments. It does not properly match reads when more than 2 reads have the same read name.''' | + | '''NOTE: Secondary and Supplementary reads are skipped when converting to FastQ. The tool assumes that only 2 reads with the same read name (the 2 primary mates) are neither secondary nor supplementary.''' |
| | | |
− | '''NOTE: This tool does not split reads into read group specific FASTQs. If you want Read Group specific FASTQ files, first run [[BamUtil: splitBam]] to first split the BAM into 1 BAM per Read Group. Then run bam2FastQ on each bam.''' | + | '''NOTE: Use the --splitRG option to split reads into read group specific FASTQs.''' |
| | | |
| == How to use it == | | == How to use it == |
Line 77: |
Line 77: |
| | | |
| = Usage = | | = Usage = |
− | ./bam bam2FastQ --in <inputFile> [--readName] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--merge|--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--noeof] [--params] | + | ./bam bam2FastQ --in <inputFile> [--readName] [--splitRG] [--qualField <tag>] [--refFile <referenceFile>] [--outBase <outputFileBase>] [--firstOut <1stReadInPairOutFile>] [--merge|--secondOut <2ndReadInPairOutFile>] [--unpairedOut <unpairedOutFile>] [--firstRNExt <firstInPairReadNameExt>] [--secondRNExt <secondInPairReadNameExt>] [--rnPlus] [--noReverseComp] [--region <chr>[:<pos>[:<base>]]] [--gzip] [--noeof] [--params] |
| | | |
| = Parameters = | | = Parameters = |
| <pre> | | <pre> |
− | Required Parameters:
| + | Required Parameters: |
− | --in : the SAM/BAM file to convert to FastQ
| + | --in : the SAM/BAM file to convert to FastQ |
− | Optional Parameters:
| + | Optional Parameters: |
− | --readname : Process the BAM as readName sorted instead
| + | --readname : Process the BAM as readName sorted instead |
− | of coordinate if the header does not indicate a sort order.
| + | of coordinate if the header does not indicate a sort order. |
− | --merge : Generate 1 interleaved (merged) FASTQ for paired-ends (unpaired in a separate file)
| + | --splitRG : Split into RG specific fastqs. |
− | use firstOut to override the filename of the interleaved file.
| + | --qualField : Use the base quality from the specified tag |
− | --refFile : Reference file for converting '=' in the sequence to the actual base
| + | rather than from the Quality field (default) |
− | if '=' are found and the refFile is not specified, 'N' is written to the FASTQ
| + | --merge : Generate 1 interleaved (merged) FASTQ for paired-ends (unpaired in a separate file) |
− | --firstRNExt : read name extension to use for first read in a pair
| + | use firstOut to override the filename of the interleaved file. |
− | default is "/1"
| + | --refFile : Reference file for converting '=' in the sequence to the actual base |
− | --secondRNExt : read name extension to use for second read in a pair
| + | if '=' are found and the refFile is not specified, 'N' is written to the FASTQ |
− | default is "/2"
| + | --firstRNExt : read name extension to use for first read in a pair |
− | --rnPlus : Add the Read Name/extension to the '+' line of the fastq records
| + | default is "/1" |
− | --noReverseComp : Do not reverse complement reads marked as reverse
| + | --secondRNExt : read name extension to use for second read in a pair |
− | --noeof : Do not expect an EOF block on a bam file.
| + | default is "/2" |
− | --params : Print the parameter settings to stderr
| + | --rnPlus : Add the Read Name/extension to the '+' line of the fastq records |
− | Optional OutputFile Names:
| + | --noReverseComp : Do not reverse complement reads marked as reverse |
− | --outBase : Base output name for generated output files
| + | --region : Only convert reads containing the specified region/nucleotide. |
− | --firstOut : Output name for the first in pair file
| + | Position formatted as: chr:pos:base |
− | over-rides setting of outBase
| + | pos (0-based) & base are optional. |
− | --secondOut : Output name for the second in pair file
| + | --gzip : Compress the output FASTQ files using gzip |
− | over-rides setting of outBase
| + | --noeof : Do not expect an EOF block on a bam file. |
− | --unpairedOut : Output name for unpaired reads
| + | --params : Print the parameter settings to stderr |
− | over-rides setting of outBase
| + | Optional OutputFile Names: |
| + | --outBase : Base output name for generated output files |
| + | --firstOut : Output name for the first in pair file |
| + | over-rides setting of outBase |
| + | --secondOut : Output name for the second in pair file |
| + | over-rides setting of outBase |
| + | --unpairedOut : Output name for unpaired reads |
| + | over-rides setting of outBase |
| </pre> | | </pre> |
| | | |
Line 119: |
Line 126: |
| | | |
| The file does not need to be strictly sorted by read name. The only requirement is that matching read names are next to each other. | | The file does not need to be strictly sorted by read name. The only requirement is that matching read names are next to each other. |
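The adjacency requirement can be checked before conversion. The following is a minimal Python sketch, not part of bamUtil; the helper name <code>names_are_grouped</code> is hypothetical, and it assumes the read names have already been extracted in file order.

```python
from itertools import groupby


def names_are_grouped(read_names):
    """Return True if every read name occurs in one contiguous run.

    Strict sorting is not required: ["b", "b", "a", "a"] is acceptable,
    while ["a", "b", "a"] is not, because the two "a" records are separated.
    """
    seen = set()
    # groupby collapses each run of identical adjacent names into one group.
    for name, _run in groupby(read_names):
        if name in seen:
            # The same name started a second run, so its reads are not adjacent.
            return False
        seen.add(name)
    return True


print(names_are_grouped(["b", "b", "a", "a"]))
print(names_are_grouped(["a", "b", "a"]))
```

A file that fails this check would need to be grouped by read name first.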
| + | |
| + | === Split into RG Specific FASTQs (<code>--splitRG</code>) === |
| + | |
| + | Create RG specific FASTQ files. |
| + | |
| + | Cannot be specified with firstOut/secondOut/unpairedOut since there will be a different filename for each RG. |
| + | |
| + | Cannot write to stdout when <code>--splitRG</code> is specified. |
| + | |
| + | Output filenames will be <outBase>.<RG>_1.fastq, <outBase>.<RG>_2.fastq, and <outBase>.<RG>.fastq. A fastq list file <outBase>.list will be created containing MERGE_NAME (the RG tag's SM value or outBase if the value is empty), fastq 1, fastq 2 (or . if it is a single ended fastq), and the RG tag string. |
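The naming scheme above can be illustrated with a short Python sketch. It only reproduces the filename patterns stated in the text; the function name <code>split_rg_outputs</code> and the sample values <code>sample</code> and <code>RG1</code> are hypothetical, and the effect of other options (such as <code>--gzip</code>) on the extensions is not covered here.

```python
def split_rg_outputs(out_base, rg_id):
    """Build the per-read-group output names described for --splitRG.

    out_base and rg_id are illustrative inputs; the patterns follow
    <outBase>.<RG>_1.fastq, <outBase>.<RG>_2.fastq, <outBase>.<RG>.fastq
    and the list file <outBase>.list.
    """
    prefix = f"{out_base}.{rg_id}"
    return {
        "first": f"{prefix}_1.fastq",    # first read in each pair
        "second": f"{prefix}_2.fastq",   # second read in each pair
        "unpaired": f"{prefix}.fastq",   # reads without a mate
        "list": f"{out_base}.list",      # one list file shared by all RGs
    }


for kind, filename in split_rg_outputs("sample", "RG1").items():
    print(kind, filename)
```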
| + | |
| + | === Use the Base Quality from the Specified Tag (<code>--qualField</code>) === |
| + | |
| + | By default, the quality field is used for the Base Qualities in the FASTQ file. Specify <code>--qualField <tagName></code> to use the base qualities from the specified tag instead of the quality field. |
| + | |
| | | |
| === Generate 1 Paired-End Output File (<code>--merge</code>) === | | === Generate 1 Paired-End Output File (<code>--merge</code>) === |
Line 155: |
Line 177: |
| | | |
| Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG | | Specifying <code>--noReverseComp</code> would result in a FASTQ sequence of ACCGTG |
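For comparison, the default behaviour (reverse complementing reads marked as reverse) can be sketched in Python. This is an illustration only, not the bamUtil implementation; it assumes the record stores the sequence ACCGTG and is flagged as reverse, and the helper name <code>reverse_complement</code> is hypothetical.

```python
# Translation table for the four bases plus N, in upper and lower case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the order of the bases."""
    return seq.translate(COMPLEMENT)[::-1]


stored = "ACCGTG"                  # sequence as stored for a reverse-strand read
print(reverse_complement(stored))  # sequence written when the read is reverse complemented
print(stored)                      # sequence written with --noReverseComp
```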
| + | |
| + | === Only Convert the Specified Region (<code>--region</code>) === |
| + | |
| + | Only convert reads containing the specified region/nucleotide. |
| + | |
| + | Position formatted as: chr:pos:base |
| + | |
| + | pos (0-based) & base are optional. |
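The <code>chr:pos:base</code> format can be illustrated with a small Python parser. This is a sketch under stated assumptions, not the parsing code used by bamUtil: the function name <code>parse_region</code> is hypothetical, and chromosome names are assumed not to contain a colon.

```python
def parse_region(region):
    """Split a 'chr[:pos[:base]]' string into (chrom, pos, base).

    pos is returned as an int (0-based, as stated above) or None;
    base is returned as a string or None.
    """
    parts = region.split(":")
    if not 1 <= len(parts) <= 3 or not parts[0]:
        raise ValueError(f"expected chr[:pos[:base]], got {region!r}")
    chrom = parts[0]
    pos = int(parts[1]) if len(parts) > 1 else None
    base = parts[2] if len(parts) > 2 else None
    return chrom, pos, base


print(parse_region("chr1"))
print(parse_region("chr1:100"))
print(parse_region("chr1:100:A"))
```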
| | | |
| {{noeofBGZFParameter}} | | {{noeofBGZFParameter}} |