BamUtil: polishBam
From Genome Analysis Wiki
Overview of the polishBam function of bamUtil
The polishBam option on the bamUtil executable adds or updates header lines and adds the RG tag to each record.
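To illustrate the per-record step, the Python sketch below adds an RG tag to a single tab-separated SAM alignment line. The function name add_rg_tag is hypothetical. Two rules in it are assumptions rather than documented bamUtil behavior: an existing RG tag is replaced, and the new tag goes before the first optional tag whose name sorts after "RG". The second rule was chosen because it reproduces the tag order in the example output below.

```python
def add_rg_tag(sam_line: str, rg_id: str) -> str:
    """Return a SAM alignment line carrying RG:Z:<rg_id>.

    Header lines (starting with '@') are returned unchanged. Any existing RG
    tag is dropped (assumption), and the new tag is inserted before the first
    optional tag whose two-letter name sorts after "RG" (assumption chosen to
    match the tag order in the example output).
    """
    if sam_line.startswith("@"):
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    mandatory = fields[:11]  # QNAME..QUAL, the 11 mandatory SAM columns
    tags = [t for t in fields[11:] if not t.startswith("RG:")]
    pos = next((i for i, t in enumerate(tags) if t[:2] > "RG"), len(tags))
    tags.insert(pos, f"RG:Z:{rg_id}")
    return "\t".join(mandatory + tags)


record = "\t".join(
    "18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> "
    "AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R".split()
)
# Prints the record with RG:Z:UM0037:1 placed between NM:i:2 and XT:A:R.
print(add_rg_tag(record, "UM0037:1"))
```

Header lines pass through untouched, so the function can be applied to every line of a SAM text file.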
Usage
./bam polishBam (options) --in <inBamFile> --out <outBamFile>
Parameters
Required parameters:
    -i/--in     : input BAM file
    -o/--out    : output BAM file
Optional parameters:
    -v          : turn on verbose mode
    -l/--log    : write a log file; <outBamFile>.log is used if no value is given
    --HD        : add an @HD header line
    --RG        : add an @RG header line
    --PG        : add a @PG header line
    -f/--fasta  : FASTA reference file used to compute MD5 checksums and update the @SQ tags
    --AS        : AS tag (genome assembly identifier) for the @SQ lines
    --UR        : UR tag for the @SQ lines (if different from --fasta)
    --SP        : SP tag for the @SQ lines
    --checkSQ   : check the consistency of the @SQ tags (SN and LN) with the existing header lines; must be used with --fasta
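The --fasta and --checkSQ options depend on reading the reference and comparing it with the @SQ lines. The sketch below illustrates that logic in Python. It assumes polishBam computes M5 the way the SAM specification defines it: uppercase the sequence and keep only characters in the range 33 to 126. It also assumes --checkSQ compares each SN and LN against the reference sequence names and lengths. Neither assumption is confirmed by this page. The function names parse_fasta, sequence_md5 and check_sq are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each sequence name (first word after '>') to its sequence."""
    seqs: Dict[str, str] = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def sequence_md5(seq: str) -> str:
    """M5 value per the SAM spec: uppercase, keep only characters 33..126."""
    cleaned = "".join(c for c in seq.upper() if 33 <= ord(c) <= 126)
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def check_sq(sq_lines: Iterable[str], ref: Dict[str, str]) -> List[str]:
    """Report @SQ lines whose SN is missing from, or whose LN disagrees with, the reference."""
    problems = []
    for line in sq_lines:
        # "@SQ\tSN:1\tLN:2004" -> {"SN": "1", "LN": "2004"}
        tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        sn, ln = tags.get("SN"), tags.get("LN")
        if sn not in ref:
            problems.append(f"{sn}: not in reference")
        elif ln is None or int(ln) != len(ref[sn]):
            problems.append(f"{sn}: LN {ln} != reference length {len(ref[sn])}")
    return problems


# Hypothetical usage with the example files:
# with open("testFiles/testFasta.fa") as fh:
#     ref = parse_fasta(fh)
# print({name: sequence_md5(seq) for name, seq in ref.items()})
```

Because check_sq returns a list instead of raising, a caller can decide whether a mismatch should stop processing.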
Return Value
Returns 0.
Example
Command:
./bam polishBam --in testFiles/sortedSam.sam --out results/updatedSam.sam --log results/updated.log --checkSQ --fasta testFiles/testFasta.fa --AS my37 --UR testFasta.fa --RG "@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA" --PG "@PG ID:polish VN:0.0.1" --SP new --HD "@HD VN:1.0 SO:coordinate GO:none"
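The command above supplies --AS my37, --UR testFasta.fa and --SP new. In the output file, these values and a computed M5 checksum are appended to every @SQ line in the order AS, M5, UR, SP. The Python sketch below reproduces that update for one line. The function name polish_sq_line is hypothetical. Overwriting a tag that is already present is an assumption. The checksum is passed in, and the value in the usage line is copied from the example output, not computed here.

```python
from typing import Optional


def polish_sq_line(sq_line: str, md5: str, assembly: Optional[str] = None,
                   uri: Optional[str] = None, species: Optional[str] = None) -> str:
    """Set AS, M5, UR and SP on a tab-separated @SQ line.

    A tag that is already present is overwritten in place (assumption);
    missing tags are appended in the order AS, M5, UR, SP seen in the example.
    """
    fields = sq_line.rstrip("\n").split("\t")
    for key, value in (("AS", assembly), ("M5", md5), ("UR", uri), ("SP", species)):
        if value is None:
            continue  # option not given: leave this tag alone
        tag = f"{key}:{value}"
        for i in range(1, len(fields)):
            if fields[i].startswith(key + ":"):
                fields[i] = tag
                break
        else:  # no existing tag with this key
            fields.append(tag)
    return "\t".join(fields)


line = polish_sq_line("@SQ\tSN:1\tLN:2004", "a9cfe5b8c11aa0cc2c0d2bf3602c9804",
                      assembly="my37", uri="testFasta.fa", species="new")
# Prints: @SQ SN:1 LN:2004 AS:my37 M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804 UR:testFasta.fa SP:new
print(line.replace("\t", " "))
```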
Input File:
@SQ SN:1 LN:2004
@SQ SN:2 LN:2000
@SQ SN:3 LN:2005
@SQ SN:4 LN:2040
@SQ SN:5 LN:2006
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9>
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>>
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>>
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;;
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * *
Output File:
@SQ SN:1 LN:2004 AS:my37 M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804 UR:testFasta.fa SP:new
@SQ SN:2 LN:2000 AS:my37 M5:7c342606b54aa211a50f5f63ac1cb2eb UR:testFasta.fa SP:new
@SQ SN:3 LN:2005 AS:my37 M5:c30e547093f33de240b164a4a2ebe3b5 UR:testFasta.fa SP:new
@SQ SN:4 LN:2040 AS:my37 M5:fc4c559e9da51e93e7875031ddf65f2a UR:testFasta.fa SP:new
@SQ SN:5 LN:2006 AS:my37 M5:c876194283debb8b507ebd0f82309ec4 UR:testFasta.fa SP:new
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@HD VN:1.0 SO:coordinate GO:none
@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA
@PG ID:polish VN:0.0.1
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0 RG:Z:UM0037:1
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 RG:Z:UM0037:1 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9> RG:Z:UM0037:1
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5 RG:Z:UM0037:1
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;; RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * * RG:Z:UM0037:1
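The output header follows a specific order. It begins with the existing @SQ and @RG lines, followed by the new @HD, @RG and @PG lines, and ends with the existing @CO comments. The Python sketch below reproduces that observed order. It also turns the space-separated --HD/--RG/--PG values into tab-separated header lines and extracts the read-group ID that ends up in each record's RG tag. All function names are hypothetical, and the ordering rule is inferred from this one example only. Note also that the SAM specification expects @HD to be the first header line when present, so a strict downstream tool may object to this order.

```python
from typing import List, Optional


def to_header_line(text: str) -> str:
    """Turn a value such as "@PG ID:polish VN:0.0.1" into a tab-separated line.

    Assumption: fields are separated by single spaces; a field value that
    itself contains spaces would be split incorrectly.
    """
    return "\t".join(text.split())


def read_group_id(rg_text: str) -> Optional[str]:
    """Return the ID value of an @RG line, e.g. "UM0037:1", or None."""
    for field in rg_text.split():
        if field.startswith("ID:"):
            return field[3:]
    return None


def assemble_header(existing: List[str], hd: Optional[str] = None,
                    rg: Optional[str] = None, pg: Optional[str] = None) -> List[str]:
    """Existing non-comment lines, then the new @HD/@RG/@PG lines, then @CO lines."""
    kept = [l for l in existing if not l.startswith("@CO")]
    comments = [l for l in existing if l.startswith("@CO")]
    added = [to_header_line(v) for v in (hd, rg, pg) if v]
    return kept + added + comments
```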
Output:
in testFiles/sortedSam.sam out results/updatedSam.sam log results/updated.log checkSQ
Log File:
Arguments in effect:
    --in [testFiles/sortedSam.sam]
    --out [results/updatedSam.sam]
    --log [results/updated.log]
    --fasta [testFiles/testFasta.fa]
    --AS [my37]
    --UR [testFasta.fa]
    --SP [new]
    --checkSQ [ON]
    --HD [@HD VN:1.0 SO:coordinate GO:none]
    --RG [@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA]
    --PG [@PG ID:polish VN:0.0.1]
Reading the reference file testFiles/testFasta.fa
Finished reading the reference file testFiles/testFasta.fa
Finished checking the consistency of SQ tags
Creating the header of new output file
Adding 1 HD, 1 RG, and 1 PG headers
Finished writing output headers
Writing output BAM file
Successfully written 10 records
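The log reports that 10 records were written. As a rough cross-check, the Python sketch below counts the alignment records in SAM text output and lists any record that lacks the expected RG tag. The function name summarize_output is hypothetical. The sketch handles SAM text only, as in the example's .sam files, and does not read binary BAM.

```python
from typing import Iterable, List, Tuple


def summarize_output(lines: Iterable[str], rg_id: str) -> Tuple[int, List[str]]:
    """Count alignment records and list the QNAMEs of records lacking RG:Z:<rg_id>."""
    expected = f"RG:Z:{rg_id}"
    count, missing = 0, []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        count += 1
        if expected not in fields[11:]:  # optional tags follow the 11 mandatory columns
            missing.append(fields[0])
    return count, missing


# Hypothetical usage against the example output:
# with open("results/updatedSam.sam") as fh:
#     print(summarize_output(fh, "UM0037:1"))
```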