Difference between revisions of "BamUtil: polishBam"
From Genome Analysis Wiki
(moved PolishBam to BamUtil: polishBam)
Revision as of 13:49, 3 April 2012
Overview of the polishBam function of bamUtil
The polishBam option of the bamUtil executable adds or updates header lines and adds the RG tag to each record.
Usage
./bam polishBam (options) --in <inBamFile> --out <outBamFile>
Parameters
Required parameters:
    -i/--in : input BAM file
    -o/--out : output BAM file
Optional parameters:
    -v : turn on verbose mode
    -l/--log : writes logfile. <outBamFile>.log will be used if value is unspecified
    --HD : add @HD header line
    --RG : add @RG header line
    --PG : add @PG header line
    -f/--fasta : fasta reference file to compute MD5sums and update SQ tags
    --AS : AS tag for genome assembly identifier
    --UR : UR tag for @SQ tag (if different from --fasta)
    --SP : SP tag for @SQ tag
    --checkSQ : check the consistency of SQ tags (SN and LN) with existing header lines. Must be used with --fasta option
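To illustrate what --fasta and --checkSQ involve, here is a minimal Python sketch. It is not the bamUtil implementation. It assumes the M5 value follows the SAM specification (MD5 of the sequence with characters outside the printable range 33-126 removed and letters uppercased) and that the consistency check compares each @SQ SN and LN against the FASTA sequence names and lengths. The function names read_fasta, m5_digest, and check_sq are hypothetical.

```python
import hashlib


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token as the sequence name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def m5_digest(sequence):
    """MD5 of a sequence as described for the @SQ M5 tag in the SAM spec.

    Assumption: characters outside 33..126 are dropped and the rest is
    uppercased before hashing; polishBam may differ in details.
    """
    cleaned = "".join(c for c in sequence if 33 <= ord(c) <= 126).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def check_sq(header_sq, fasta_lines):
    """Compare (SN, LN) pairs from the header with the FASTA contents.

    Returns a list of human-readable problems; an empty list means the
    header and the reference agree on names and lengths.
    """
    reference = dict(read_fasta(fasta_lines))
    problems = []
    for sn, ln in header_sq:
        if sn not in reference:
            problems.append(f"{sn}: not found in FASTA")
        elif len(reference[sn]) != ln:
            problems.append(
                f"{sn}: header LN={ln} but FASTA length is {len(reference[sn])}"
            )
    return problems
```

For example, check_sq([("1", 2004)], open("testFiles/testFasta.fa")) would return an empty list if sequence 1 in the reference is 2004 bases long.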
Return Value
Returns 0.
Example
Command:
./bam polishBam --in testFiles/sortedSam.sam --out results/updatedSam.sam --log results/updated.log --checkSQ --fasta testFiles/testFasta.fa --AS my37 --UR testFasta.fa --RG "@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA" --PG "@PG ID:polish VN:0.0.1" --SP new --HD "@HD VN:1.0 SO:coordinate GO:none"
Input File:
@SQ SN:1 LN:2004
@SQ SN:2 LN:2000
@SQ SN:3 LN:2005
@SQ SN:4 LN:2040
@SQ SN:5 LN:2006
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9>
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>>
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>>
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;;
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * *
Output File:
@SQ SN:1 LN:2004 AS:my37 M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804 UR:testFasta.fa SP:new
@SQ SN:2 LN:2000 AS:my37 M5:7c342606b54aa211a50f5f63ac1cb2eb UR:testFasta.fa SP:new
@SQ SN:3 LN:2005 AS:my37 M5:c30e547093f33de240b164a4a2ebe3b5 UR:testFasta.fa SP:new
@SQ SN:4 LN:2040 AS:my37 M5:fc4c559e9da51e93e7875031ddf65f2a UR:testFasta.fa SP:new
@SQ SN:5 LN:2006 AS:my37 M5:c876194283debb8b507ebd0f82309ec4 UR:testFasta.fa SP:new
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@HD VN:1.0 SO:coordinate GO:none
@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA
@PG ID:polish VN:0.0.1
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0 RG:Z:UM0037:1
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 RG:Z:UM0037:1 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9> RG:Z:UM0037:1
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5 RG:Z:UM0037:1
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;; RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * * RG:Z:UM0037:1
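The per-record change visible in the output file, an RG:Z tag carrying the ID from the --RG header line, can be sketched in Python on SAM text lines. This is only an illustration and not how bamUtil does it internally. The helper names rg_id_from_header and add_rg_tag are hypothetical. Two behaviors are assumptions: an existing RG tag on a record is replaced, and optional tags are kept sorted by tag name, which mimics the ordering seen in the example output (AM, MD, NM, RG, XT). The sketch expects tab-separated fields, as in a real SAM file.

```python
def rg_id_from_header(rg_line):
    """Extract the ID value from an '@RG ...' header line.

    Fields may be separated by tabs (as in a SAM file) or spaces (as on
    the command line); splitting on any whitespace handles both.
    """
    for field in rg_line.split():
        if field.startswith("ID:"):
            return field[3:]  # keeps colons inside the ID, e.g. UM0037:1
    raise ValueError("@RG line has no ID field")


def add_rg_tag(sam_line, rg_id):
    """Return the SAM line with an RG:Z:<rg_id> optional tag.

    Header lines (starting with '@') are returned unchanged.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    mandatory, tags = fields[:11], fields[11:]
    # Assumption: drop any existing RG tag so the record has exactly one.
    tags = [t for t in tags if not t.startswith("RG:")]
    tags.append(f"RG:Z:{rg_id}")
    # Assumption: order tags by their two-letter name, as in the example.
    tags.sort(key=lambda t: t[:2])
    return "\t".join(mandatory + tags)
```

Applied to every line of the input file with the ID taken from the --RG argument, this yields records tagged the same way as in the output above; the header changes (@SQ updates and the added @HD, @RG, @PG lines) are not covered by the sketch.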
Output:
in testFiles/sortedSam.sam
out results/updatedSam.sam
log results/updated.log
checkSQ
Log File:
Arguments in effect:
    --in [testFiles/sortedSam.sam]
    --out [results/updatedSam.sam]
    --log [results/updated.log]
    --fasta [testFiles/testFasta.fa]
    --AS [my37]
    --UR [testFasta.fa]
    --SP [new]
    --checkSQ [ON]
    --HD [@HD VN:1.0 SO:coordinate GO:none]
    --RG [@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA]
    --PG [@PG ID:polish VN:0.0.1]
Reading the reference file testFiles/testFasta.fa
Finished reading the reference file testFiles/testFasta.fa
Finished checking the consistency of SQ tags
Creating the header of new output file
Adding 1 HD, 1 RG, and 1 PG headers
Finished writing output headers
Writing output BAM file
Successfully written 10 records