Line 1: |
Line 1: |
− | == Polish BAM == | + | = Overview of the <code>polishBam</code> function of <code>bamUtil</code> = |
− | The <code>polishBam</code> program is released as part of the StatGen Library & Tools download. | + | The <code>polishBam</code> option on the [[bamUtil]] executable adds/updates header lines & adds the RG tag to each record. |
| | | |
− | <code>polishBam</code> trims the end of reads in a SAM/BAM file, changing read ends to ‘N’ and quality to ‘!’. | + | = Usage = |
| + | ./bam polishBam (options) --in <inBamFile> --out <outBamFile> |
| | | |
− | | + | = Parameters = |
− | === Parameters ===
| |
| <pre> | | <pre> |
| Required parameters: | | Required parameters: |
Line 22: |
Line 22: |
| --checkSQ : check the consistency of SQ tags (SN and LN) with existing header lines. Must be used with --fasta option | | --checkSQ : check the consistency of SQ tags (SN and LN) with existing header lines. Must be used with --fasta option |
| </pre> | | </pre> |
| + | {{PhoneHomeParamDesc}} |
| + | |
| + | == Required Parameters == |
| + | {{InBAMInputFile}} |
| + | {{OutBAMOutputFile}} |
| + | |
| + | == Optional Parameters == |
| + | === Verbose (<code>--verbose</code>) === |
| + | Use <code>--verbose</code> to turn on verbose mode. |
| + | |
| + | === Specify Log Filename (<code>--log</code>) === |
| + | Use <code>--log</code> followed by a filename to set the log file. The default is the output file basename with a <code>.log</code> extension. |
| + | |
| + | === Add the HD Header (<code>--HD</code>) === |
| + | Use <code>--HD</code> followed by the HD header line to add an HD header. Be sure to include "@HD" in the line you specify. |
| + | |
| + | === Add the RG Header (<code>--RG</code>) === |
| + | Use <code>--RG</code> followed by the RG header line to add an RG header. Be sure to include "@RG" in the line you specify. |
| + | |
| + | === Add the PG Header (<code>--PG</code>) === |
| + | Use <code>--PG</code> followed by the PG header line to add a PG header. Be sure to include "@PG" in the line you specify. |
| + | |
| + | === Add MD5 and UR tags to SQ Headers (<code>--fasta</code>) === |
| + | Use <code>--fasta</code> followed by the fasta reference file name to compute MD5sums and update SQ tags with the M5 & UR values. Use the [[#Add the UR tag to SQ Headers (--UR)|<code>--UR</code>]] option to specify a different UR value. |
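To make the M5 value concrete, here is a minimal Python sketch of how such a checksum can be computed. It follows the SAM specification's definition of M5 (MD5 of the sequence in uppercase with whitespace removed); that <code>polishBam</code> computes it in exactly this way is an assumption, and the function names <code>read_fasta</code>, <code>m5_checksum</code>, and <code>sq_tags</code> are hypothetical helpers, not part of bamUtil.

```python
import hashlib


def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word after '>'."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def m5_checksum(sequence):
    """MD5 hex digest of the uppercased sequence with all whitespace removed."""
    cleaned = "".join(sequence.split()).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def sq_tags(fasta_text, ur):
    """Build the M5 and UR values for each reference sequence (illustrative only)."""
    return {
        name: {"M5": m5_checksum(seq), "UR": ur}
        for name, seq in read_fasta(fasta_text).items()
    }
```

Calling <code>sq_tags(open("testFiles/testFasta.fa").read(), "testFasta.fa")</code> would give one M5/UR pair per reference name, which can be compared against the <code>@SQ</code> lines that the tool writes.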
| + | |
| + | === Add the AS tag to SQ Headers (<code>--AS</code>) === |
| + | Use <code>--AS</code> followed by the genome assembly identifier to add the AS tag to the SQ Headers. |
| + | |
| + | === Add the UR tag to SQ Headers (<code>--UR</code>) === |
| + | Use <code>--UR</code> followed by the URI of the sequence to add the UR tag to the SQ Headers. |
| | | |
− | === Usage ===
| + | The UR tag is added automatically by the [[#Add MD5 and UR tags to SQ Headers (--fasta)|<code>--fasta</code>]] option, so when [[#Add MD5 and UR tags to SQ Headers (--fasta)|<code>--fasta</code>]] is used, <code>--UR</code> only needs to be specified if the UR value should differ from the one set by [[#Add MD5 and UR tags to SQ Headers (--fasta)|<code>--fasta</code>]]. |
− | trimBam [inFile] [outFile] [num-bases-to-trim-on-each-side]
| |
| | | |
| + | === Add the SP tag to SQ Headers (<code>--SP</code>) === |
| + | Use <code>--SP</code> followed by the species to add the SP tag to the SQ Headers. |
| | | |
− | === Return Value ===
| + | {{PhoneHomeParameters}} |
− | Returns the SamStatus for the reads/writes. 0 on success.
| |
| | | |
− | === Example Output === | + | = Return Value = |
| + | Returns 0 on success, non-0 on failure. |
| + | |
| + | = Example = |
| + | Command: |
| <pre> | | <pre> |
− | polishBAM (options) --in=<inBamFile> --out=<outBamFile>
| + | ./bam polishBam --in testFiles/sortedSam.sam --out results/updatedSam.sam --log results/updated.log --checkSQ --fasta testFiles/testFasta.fa --AS my37 --UR testFasta.fa --RG "@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA" --PG "@PG ID:polish VN:0.0.1" --SP new --HD "@HD VN:1.0 SO:coordinate GO:none" |
| </pre> | | </pre> |
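When the command is launched from a script, the quoted header lines are easy to get wrong. The following Python sketch assembles the argument list so that each header line stays a single argument. It uses only options that appear on this page; the helper name <code>build_polish_bam_command</code> is hypothetical, and the header strings are copied as they appear in the example. SAM delimits header fields with tabs, and whether the tool expects literal tabs inside these arguments is not stated here.

```python
import shlex


def build_polish_bam_command(in_file, out_file, options=None, switches=()):
    """Return an argv list for ./bam polishBam (hypothetical helper).

    options:  mapping of option name -> value, e.g. {"AS": "my37"}
    switches: option names that take no value, e.g. ["checkSQ"]
    """
    argv = ["./bam", "polishBam", "--in", in_file, "--out", out_file]
    for name, value in (options or {}).items():
        argv += ["--" + name, value]
    for name in switches:
        argv.append("--" + name)
    return argv


argv = build_polish_bam_command(
    "testFiles/sortedSam.sam",
    "results/updatedSam.sam",
    options={
        "log": "results/updated.log",
        "fasta": "testFiles/testFasta.fa",
        "AS": "my37",
        "UR": "testFasta.fa",
        "SP": "new",
        "HD": "@HD VN:1.0 SO:coordinate GO:none",
        "PG": "@PG ID:polish VN:0.0.1",
    },
    switches=["checkSQ"],
)

# shlex.join quotes arguments containing spaces, so the printed line can be pasted into a shell.
print(shlex.join(argv))
```

Passing the list to <code>subprocess.run(argv)</code> avoids shell quoting altogether, since each header line is already one element of the list.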
| | | |
| + | Input File: |
| + | <pre> |
| + | @SQ SN:1 LN:2004 |
| + | @SQ SN:2 LN:2000 |
| + | @SQ SN:3 LN:2005 |
| + | @SQ SN:4 LN:2040 |
| + | @SQ SN:5 LN:2006 |
| + | @RG ID:myID LB:library SM:sample |
| + | @RG ID:myID2 SM:sample2 LB:library2 |
| + | @CO Comment 1 |
| + | @CO Comment 2 |
| + | 18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R |
| + | 18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0 |
| + | 1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 XT:A:R |
| + | 1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9> |
| + | 18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>> |
| + | 18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R |
| + | 18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5 |
| + | 18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>> |
| + | Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;; |
| + | Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * * |
| + | </pre> |
| + | |
| + | |
| + | Output File: |
| + | <pre> |
| + | @SQ SN:1 LN:2004 AS:my37 M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804 UR:testFasta.fa SP:new |
| + | @SQ SN:2 LN:2000 AS:my37 M5:7c342606b54aa211a50f5f63ac1cb2eb UR:testFasta.fa SP:new |
| + | @SQ SN:3 LN:2005 AS:my37 M5:c30e547093f33de240b164a4a2ebe3b5 UR:testFasta.fa SP:new |
| + | @SQ SN:4 LN:2040 AS:my37 M5:fc4c559e9da51e93e7875031ddf65f2a UR:testFasta.fa SP:new |
| + | @SQ SN:5 LN:2006 AS:my37 M5:c876194283debb8b507ebd0f82309ec4 UR:testFasta.fa SP:new |
| + | @RG ID:myID LB:library SM:sample |
| + | @RG ID:myID2 SM:sample2 LB:library2 |
| + | @HD VN:1.0 SO:coordinate GO:none |
| + | @RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA |
| + | @PG ID:polish VN:0.0.1 |
| + | @CO Comment 1 |
| + | @CO Comment 2 |
| + | 18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R |
| + | 18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0 RG:Z:UM0037:1 |
| + | 1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 RG:Z:UM0037:1 XT:A:R |
| + | 1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9> RG:Z:UM0037:1 |
| + | 18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1 |
| + | 18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R |
| + | 18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5 RG:Z:UM0037:1 |
| + | 18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1 |
| + | Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;; RG:Z:UM0037:1 |
| + | Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * * RG:Z:UM0037:1 |
| + | </pre> |
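The per-record change visible above (an <code>RG:Z:</code> tag carrying the ID from the <code>--RG</code> header line) can be mimicked with a short Python sketch. This is not the tool's implementation. It assumes tab-delimited SAM records, keeps the optional tags in alphabetical order because that is how they appear in the sample output, and replaces any RG tag already present, which is an assumption about behaviour this page does not describe. Both function names are hypothetical.

```python
def rg_id_from_header(rg_line):
    """Extract the ID value from an @RG header line (fields split on whitespace)."""
    for field in rg_line.split()[1:]:
        key, _, value = field.partition(":")
        if key == "ID":
            return value
    raise ValueError("no ID field in @RG line")


def add_rg_tag(record, rg_id):
    """Return the SAM record with an RG:Z tag set to rg_id.

    The first 11 tab-separated fields are the mandatory SAM columns;
    everything after them is an optional TAG:TYPE:VALUE field.
    """
    fields = record.rstrip("\n").split("\t")
    mandatory, tags = fields[:11], fields[11:]
    # Assumption: an existing RG tag is replaced rather than duplicated.
    tags = [t for t in tags if not t.startswith("RG:")]
    tags.append("RG:Z:" + rg_id)
    # Sort by the two-character tag name, matching the order seen in the sample output.
    tags.sort(key=lambda t: t[:2])
    return "\t".join(mandatory + tags)
```

With the ID taken from the example's <code>@RG</code> line (<code>UM0037:1</code>), applying <code>add_rg_tag</code> to the first input record yields the tag order AM, MD, NM, RG, XT shown in the first output record.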
| + | |
| + | Output: |
| + | <pre> |
| + | in testFiles/sortedSam.sam |
| + | out results/updatedSam.sam |
| + | log results/updated.log |
| + | checkSQ |
| + | </pre> |
| + | |
| + | Log File: |
| + | <pre> |
| + | Arguments in effect: |
| + | --in [testFiles/sortedSam.sam] |
| + | --out [results/updatedSam.sam] |
| + | --log [results/updated.log] |
| + | --fasta [testFiles/testFasta.fa] |
| + | --AS [my37] |
| + | --UR [testFasta.fa] |
| + | --SP [new] |
| + | --checkSQ [ON] |
| + | --HD [@HD VN:1.0 SO:coordinate GO:none] |
| + | --RG [@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA] |
| + | --PG [@PG ID:polish VN:0.0.1] |
| + | Reading the reference file testFiles/testFasta.fa |
| + | Finished reading the reference file testFiles/testFasta.fa |
| + | Finished checking the consistency of SQ tags |
| + | Creating the header of new output file |
| + | Adding 1 HD, 1 RG, and 1 PG headers |
| + | Finished writing output headers |
| + | Writing output BAM file |
| + | Successfully written 10 records |
| + | </pre> |
| + | |
| + | |
| + | [[Category:BamUtil|polishBam]] |
| + | [[Category:BAM Software]] |
| [[Category:Software]] | | [[Category:Software]] |
− | [[Category:StatGen Download]]
| |
− | [[Category:BAM Software]]
| |