Latest revision as of 13:06, 6 January 2014
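= Overview of the <code>polishBam</code> function of <code>bamUtil</code> =
The <code>polishBam</code> option on the [[bamUtil]] executable adds/updates header lines and adds the RG tag to each record (in the example below, the tag value is the ID of the <code>--RG</code> header line).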
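As a rough illustration of what adding the RG tag means for a single record, the Python sketch below sets <code>RG:Z:</code> plus the read-group ID on one tab-separated SAM line. It is not bamUtil's implementation. The function name <code>add_rg_tag</code> is hypothetical, replacing an existing RG tag is an assumption, and the alphabetical tag ordering is only inferred from the example output, where <code>RG</code> lands between <code>NM</code> and <code>XT</code>.

```python
from typing import List

SAM_MANDATORY_FIELDS = 11  # QNAME through QUAL


def add_rg_tag(record: str, rg_id: str) -> str:
    """Return the SAM record with its RG tag set to RG:Z:<rg_id> (hypothetical helper)."""
    fields = record.rstrip("\n").split("\t")
    if len(fields) < SAM_MANDATORY_FIELDS:
        raise ValueError("a SAM record needs at least 11 tab-separated fields")
    mandatory = fields[:SAM_MANDATORY_FIELDS]
    # Drop any existing RG tag so it is replaced, not duplicated (assumed behavior).
    tags: List[str] = [t for t in fields[SAM_MANDATORY_FIELDS:] if not t.startswith("RG:")]
    tags.append("RG:Z:" + rg_id)
    # Order tags by their two-letter name, matching the example output (AM, MD, NM, RG, XT).
    tags.sort(key=lambda tag: tag[:2])
    return "\t".join(mandatory + tags)
```

Sorting is stable, so tags that were already in alphabetical order keep their relative positions.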
= Usage =
<pre>
./bam polishBam (options) --in <inBamFile> --out <outBamFile>
</pre>
= Parameters =
<pre>
Required parameters:
  -i/--in    : input BAM file
  -o/--out   : output BAM file
Optional parameters:
  -v         : turn on verbose mode
  -l/--log   : writes logfile. <outBamFile>.log will be used if value is unspecified
  --HD       : add @HD header line
  --RG       : add @RG header line
  --PG       : add @PG header line
  -f/--fasta : fasta reference file to compute MD5sums and update SQ tags
  --AS       : AS tag for genome assembly identifier
  --UR       : UR tag for @SQ tag (if different from --fasta)
  --SP       : SP tag for @SQ tag
  --checkSQ  : check the consistency of SQ tags (SN and LN) with existing header lines. Must be used with --fasta option
</pre>
<pre>
PhoneHome:
  --noPhoneHome       : disable PhoneHome (default enabled)
  --phoneHomeThinning : adjust the PhoneHome thinning parameter (default 50)
</pre>
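The <code>--checkSQ</code> description above compares the existing @SQ SN/LN values with the reference. The sketch below shows one way such a check could work, assuming the header lengths and reference lengths have already been collected into dictionaries keyed by sequence name. <code>validate_options</code> and <code>check_sq</code> are hypothetical names, and the exact checks bamUtil performs (for example, whether reference sequences missing from the header are reported) may differ.

```python
from typing import Dict, List, Optional


def validate_options(check_sq_enabled: bool, fasta: Optional[str]) -> None:
    """--checkSQ is documented as requiring --fasta."""
    if check_sq_enabled and not fasta:
        raise ValueError("--checkSQ must be used with --fasta")


def check_sq(header_lengths: Dict[str, int], reference_lengths: Dict[str, int]) -> List[str]:
    """Compare @SQ SN/LN pairs from the header with lengths from the reference."""
    problems: List[str] = []
    for name, length in header_lengths.items():
        if name not in reference_lengths:
            problems.append(f"SN:{name} not found in reference")
        elif reference_lengths[name] != length:
            problems.append(
                f"SN:{name} LN:{length} differs from reference length {reference_lengths[name]}"
            )
    return problems
```

An empty list means every SN in the header was found in the reference with a matching LN.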
== Required Parameters ==
=== Input File (<code>--in</code>) ===
Use <code>--in</code> followed by your file name to specify the SAM/BAM input file.

The program automatically determines whether your input file is SAM, BAM, or uncompressed BAM; you only need to supply the filename. The exception is stdin: use <code>-</code> to read from stdin, and the extension after it determines the file type (no extension indicates SAM).
{| class="wikitable"
| SAM/BAM/Uncompressed BAM from file || <code>--in yourFileName</code>
|-
| SAM from stdin || <code>--in -</code>
|-
| BAM from stdin || <code>--in -.bam</code>
|-
| Uncompressed BAM from stdin || <code>--in -.ubam</code>
|}
Note: Uncompressed BAM is written with compression level 0, so it is not an entirely uncompressed file. This matches the <code>samtools</code> implementation, so pipes between our tools and <code>samtools</code> are supported.
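A minimal sketch of the input-type rules above, under stated assumptions: for stdin the extension after <code>-</code> decides, and for a regular file the type is guessed from its first bytes. The magic-byte check is an assumption about how detection could work, not bamUtil's code: BAM is BGZF-compressed and starts with the gzip magic bytes <code>1f 8b</code>, and level-0 "uncompressed" BAM still carries them, so this sketch reports both as BAM. <code>input_type</code> and <code>sniff_file</code> are hypothetical names.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # BGZF (BAM) files are gzip-compatible, even at level 0


def input_type(name: str, first_bytes: bytes = b"") -> str:
    """Guess SAM, BAM or UBAM for an --in value (illustrative only)."""
    if name == "-" or name.startswith("-."):
        # stdin: the extension after '-' decides; no extension means SAM
        ext = os.path.splitext(name)[1]
        return {".bam": "BAM", ".ubam": "UBAM"}.get(ext, "SAM")
    # regular file: sniff the content instead of trusting the name
    return "BAM" if first_bytes.startswith(GZIP_MAGIC) else "SAM"


def sniff_file(path: str) -> str:
    """Read the first two bytes of a file on disk and classify it."""
    with open(path, "rb") as handle:
        return input_type(path, handle.read(2))
```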
=== Output File (<code>--out</code>) ===
Use <code>--out</code> followed by your file name to specify the SAM/BAM output file.

The file extension determines whether SAM, BAM, or uncompressed BAM is written. Use <code>-</code> to write to stdout, followed by the extension for the file type (no extension means SAM).
{| class="wikitable"
| SAM to file || <code>--out yourFileName.sam</code>
|-
| BAM to file || <code>--out yourFileName.bam</code>
|-
| Uncompressed BAM to file || <code>--out yourFileName.ubam</code>
|-
| SAM to stdout || <code>--out -</code>
|-
| BAM to stdout || <code>--out -.bam</code>
|-
| Uncompressed BAM to stdout || <code>--out -.ubam</code>
|}
Note: Uncompressed BAM is written with compression level 0, so it is not an entirely uncompressed file. This matches the <code>samtools</code> implementation, so pipes between our tools and <code>samtools</code> are supported.
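The output table can be summarized as a lookup on the file extension. In this sketch, <code>output_type</code> and <code>is_stdout</code> are hypothetical helpers, and treating every extension other than <code>.bam</code> and <code>.ubam</code> as SAM is an assumption drawn from the SAM rows of the table.

```python
import os

OUTPUT_TYPES = {".bam": "BAM", ".ubam": "UBAM"}  # everything else is written as SAM (assumed)


def output_type(name: str) -> str:
    """Map an --out value (file name, or '-' for stdout) to the format written."""
    return OUTPUT_TYPES.get(os.path.splitext(name)[1], "SAM")


def is_stdout(name: str) -> bool:
    """'-' alone or '-' plus an extension means stdout."""
    return name == "-" or name.startswith("-.")
```

<code>os.path.splitext</code> only inspects the last path component, so a directory name containing a dot does not affect the result.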
== Optional Parameters ==
=== Verbose (<code>--verbose</code>) ===
Use <code>--verbose</code> to turn on verbose mode.

=== Specify Log Filename (<code>--log</code>) ===
Use <code>--log</code> followed by the log filename to specify the log filename. The default is the output file basename with a <code>.log</code> extension.
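=== Add the HD Header (<code>--HD</code>) ===
Use <code>--HD</code> followed by the HD header line to add an HD header. Be sure to include "@HD" in the line you specify.

=== Add the RG Header (<code>--RG</code>) ===
Use <code>--RG</code> followed by the RG header line to add an RG header. Be sure to include "@RG" in the line you specify.

=== Add the PG Header (<code>--PG</code>) ===
Use <code>--PG</code> followed by the PG header line to add a PG header. Be sure to include "@PG" in the line you specify.

For all three options, header lines contain spaces, so quote each value so that the shell passes it as a single argument, as in the [[#Example|example]] below.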
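The sketch below checks that a value passed to <code>--HD</code>, <code>--RG</code>, or <code>--PG</code> starts with the matching record type and splits it into tags. It is hypothetical (not bamUtil's parser) and assumes fields are separated by whitespace, as in the quoted values of the example. It splits each field at the first colon only, so an ID such as <code>UM0037:1</code> stays intact; values that themselves contain spaces would need different handling.

```python
from typing import Dict


def parse_header_line(line: str, record_type: str) -> Dict[str, str]:
    """Split a header value like '@RG ID:x SM:y' into {'ID': 'x', 'SM': 'y'}."""
    fields = line.split()  # whitespace-separated, as in the quoted example values
    if not fields or fields[0] != "@" + record_type:
        raise ValueError(f"expected a line starting with @{record_type}: {line!r}")
    tags: Dict[str, str] = {}
    for field in fields[1:]:
        tag, sep, value = field.partition(":")  # split at the first colon only
        if not sep or len(tag) != 2:  # SAM header tags are two characters
            raise ValueError(f"malformed TAG:VALUE field: {field!r}")
        tags[tag] = value
    return tags
```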
=== Add MD5 and UR tags to SQ Headers (<code>--fasta</code>) ===
Use <code>--fasta</code> followed by the fasta reference file name to compute MD5sums and update SQ tags with the M5 and UR values. Use the [[#Add the UR tag to SQ Headers (--UR)|<code>--UR</code>]] option to specify a different UR value.
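The SAM specification defines the M5 value as the MD5 digest of the reference sequence in uppercase, with all characters outside the range <code>!</code> to <code>~</code> (ASCII 33 to 126) removed. The sketch below computes that digest, plus the sequence length, for each record of a FASTA file held in memory. Whether <code>polishBam</code> normalizes sequences identically is an assumption; taking the sequence name from the first word of the <code>></code> line is also an assumption, and <code>sequence_digests</code> is a hypothetical name.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def sequence_digests(fasta_text: str) -> Dict[str, Tuple[int, str]]:
    """Map each FASTA sequence name to (length, M5 digest)."""
    sequences: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # sequence name = first word after '>' (assumed)
            name = (line[1:].split() or [""])[0]
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line)
    digests: Dict[str, Tuple[int, str]] = {}
    for seq_name, chunks in sequences.items():
        # SAM spec: uppercase, keep only characters '!'..'~' (ASCII 33-126)
        seq = "".join(c for c in "".join(chunks).upper() if "!" <= c <= "~")
        digests[seq_name] = (len(seq), hashlib.md5(seq.encode("ascii")).hexdigest())
    return digests
```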
=== Add the AS tag to SQ Headers (<code>--AS</code>) ===
Use <code>--AS</code> followed by the genome assembly identifier to add the AS tag to the SQ headers.

=== Add the UR tag to SQ Headers (<code>--UR</code>) ===
Use <code>--UR</code> followed by the URI of the sequence to add the UR tag to the SQ headers.

The UR tag is added automatically with the [[#Add MD5 and UR tags to SQ Headers (--fasta)|<code>--fasta</code>]] option, so if <code>--fasta</code> is used, <code>--UR</code> only needs to be specified if its value differs from the <code>--fasta</code> file name.

=== Add the SP tag to SQ Headers (<code>--SP</code>) ===
Use <code>--SP</code> followed by the species to add the SP tag to the SQ headers.
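Taken together, <code>--AS</code>, <code>--fasta</code>/<code>--UR</code>, and <code>--SP</code> update each @SQ line. This hypothetical sketch appends new tags in the order seen in the example output (AS, M5, UR, SP) and overwrites tags that already exist. The overwrite behavior and using the <code>--fasta</code> path verbatim as the default UR are assumptions; <code>resolve_uri</code> and <code>update_sq_line</code> are illustrative names.

```python
from typing import List, Optional


def resolve_uri(fasta_path: Optional[str], ur: Optional[str]) -> Optional[str]:
    """--UR wins; otherwise fall back to the --fasta path, used verbatim (assumed)."""
    return ur if ur is not None else fasta_path


def update_sq_line(line: str, assembly: Optional[str] = None, m5: Optional[str] = None,
                   uri: Optional[str] = None, species: Optional[str] = None) -> str:
    """Return a tab-separated @SQ line with AS/M5/UR/SP set where a value is given."""
    fields = line.split()
    if not fields or fields[0] != "@SQ":
        raise ValueError(f"not an @SQ line: {line!r}")
    tags: List[List[str]] = []
    for field in fields[1:]:
        tag, _, value = field.partition(":")  # keeps colons inside values, e.g. URIs
        tags.append([tag, value])
    for tag, value in (("AS", assembly), ("M5", m5), ("UR", uri), ("SP", species)):
        if value is None:
            continue
        for pair in tags:
            if pair[0] == tag:  # existing tag: overwrite in place (assumed)
                pair[1] = value
                break
        else:  # new tag: append in the order seen in the example output
            tags.append([tag, value])
    return "\t".join(["@SQ"] + [f"{tag}:{value}" for tag, value in tags])
```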
== PhoneHome Parameters ==
See [[PhoneHome]] for more information on how PhoneHome works and what it does.

=== Turn off PhoneHome (<code>--noPhoneHome</code>) ===
Use the <code>--noPhoneHome</code> option to completely disable PhoneHome. PhoneHome is enabled by default based on the thinning parameter.

=== Adjust the Frequency of PhoneHome (<code>--phoneHomeThinning</code>) ===
Use <code>--phoneHomeThinning</code> to modify the percentage of the time that PhoneHome will run (0-100).
* By default, <code>--phoneHomeThinning</code> is set to 50, so PhoneHome runs 50% of the time.
* PhoneHome only runs if the run's random number modulo 100 is less than the <code>--phoneHomeThinning</code> value.
* This setting has no effect if <code>--noPhoneHome</code> is set.
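The thinning rule above can be written as a small predicate. This is a sketch, not PhoneHome's code: <code>should_phone_home</code> and <code>decide_for_this_run</code> are hypothetical, the source of the run's random number is assumed, and rejecting thinning values outside 0 to 100 is an assumption.

```python
import random


def should_phone_home(random_number: int, thinning: int = 50, no_phone_home: bool = False) -> bool:
    """Apply the documented rule: run only if random_number % 100 < thinning."""
    if no_phone_home:  # --noPhoneHome disables it regardless of thinning
        return False
    if not 0 <= thinning <= 100:
        raise ValueError("--phoneHomeThinning expects a value from 0 to 100")
    return random_number % 100 < thinning


def decide_for_this_run(thinning: int = 50, no_phone_home: bool = False) -> bool:
    """Draw a fresh random number for this run (the source of randomness is assumed)."""
    return should_phone_home(random.randrange(1 << 31), thinning, no_phone_home)
```

With the default of 50, random numbers whose last two decimal digits are 00 to 49 trigger PhoneHome, which is half of all values.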
= Return Value =
Returns 0 on success and a non-zero value on failure.
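When scripting <code>polishBam</code>, the exit status is the success signal. The hypothetical Python wrapper below builds the argument list from a few of the documented options and treats exit status 0 as success. Passing a list to <code>subprocess.run</code> bypasses the shell, so header lines containing spaces need no extra quoting. The default executable path <code>./bam</code> is taken from the usage line and may differ on your system; <code>build_polish_bam_command</code> and <code>run_polish_bam</code> are illustrative names.

```python
import subprocess
from typing import List, Optional


def build_polish_bam_command(in_file: str, out_file: str, bam: str = "./bam",
                             hd: Optional[str] = None, rg: Optional[str] = None,
                             pg: Optional[str] = None, fasta: Optional[str] = None,
                             check_sq: bool = False) -> List[str]:
    """Assemble a polishBam argument list from a subset of the documented options."""
    cmd = [bam, "polishBam", "--in", in_file, "--out", out_file]
    for flag, value in (("--HD", hd), ("--RG", rg), ("--PG", pg), ("--fasta", fasta)):
        if value is not None:
            cmd += [flag, value]
    if check_sq:
        cmd.append("--checkSQ")
    return cmd


def run_polish_bam(cmd: List[str]) -> bool:
    """Run the command; exit status 0 means success, anything else failure."""
    return subprocess.run(cmd).returncode == 0
```

For example, <code>run_polish_bam(build_polish_bam_command("in.sam", "out.sam", rg="@RG ID:x SM:y"))</code> returns <code>True</code> only if the tool exits with status 0.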
= Example =
Command:
<pre>
./bam polishBam --in testFiles/sortedSam.sam --out results/updatedSam.sam --log results/updated.log --checkSQ --fasta testFiles/testFasta.fa --AS my37 --UR testFasta.fa --RG "@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA" --PG "@PG ID:polish VN:0.0.1" --SP new --HD "@HD VN:1.0 SO:coordinate GO:none"
</pre>
Input File:
<pre>
@SQ SN:1 LN:2004
@SQ SN:2 LN:2000
@SQ SN:3 LN:2005
@SQ SN:4 LN:2040
@SQ SN:5 LN:2006
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9>
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>>
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>>
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;;
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * *
</pre>
Output File:
<pre>
@SQ SN:1 LN:2004 AS:my37 M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804 UR:testFasta.fa SP:new
@SQ SN:2 LN:2000 AS:my37 M5:7c342606b54aa211a50f5f63ac1cb2eb UR:testFasta.fa SP:new
@SQ SN:3 LN:2005 AS:my37 M5:c30e547093f33de240b164a4a2ebe3b5 UR:testFasta.fa SP:new
@SQ SN:4 LN:2040 AS:my37 M5:fc4c559e9da51e93e7875031ddf65f2a UR:testFasta.fa SP:new
@SQ SN:5 LN:2006 AS:my37 M5:c876194283debb8b507ebd0f82309ec4 UR:testFasta.fa SP:new
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@HD VN:1.0 SO:coordinate GO:none
@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA
@PG ID:polish VN:0.0.1
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0 RG:Z:UM0037:1
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 RG:Z:UM0037:1 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9> RG:Z:UM0037:1
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5 RG:Z:UM0037:1
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;; RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * * RG:Z:UM0037:1
</pre>
Output:
<pre>
in testFiles/sortedSam.sam
out results/updatedSam.sam
log results/updated.log
checkSQ
</pre>
Log File:
<pre>
Arguments in effect:
        --in [testFiles/sortedSam.sam]
        --out [results/updatedSam.sam]
        --log [results/updated.log]
        --fasta [testFiles/testFasta.fa]
        --AS [my37]
        --UR [testFasta.fa]
        --SP [new]
        --checkSQ [ON]
        --HD [@HD VN:1.0 SO:coordinate GO:none]
        --RG [@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA]
        --PG [@PG ID:polish VN:0.0.1]
Reading the reference file testFiles/testFasta.fa
Finished reading the reference file testFiles/testFasta.fa
Finished checking the consistency of SQ tags
Creating the header of new output file
Adding 1 HD, 1 RG, and 1 PG headers
Finished writing output headers
Writing output BAM file
Successfully written 10 records
</pre>