BamUtil: polishBam

5,479 bytes added, 13:06, 6 January 2014
= Overview of the <code>polishBam</code> function of <code>bamUtil</code> =
The <code>polishBam</code> option on the [[bamUtil]] executable adds or updates header lines and adds the RG tag to each record.

= Usage =
 ./bam polishBam (options) --in <inBamFile> --out <outBamFile>

= Parameters =
<pre>
Required parameters:
--checkSQ : check the consistency of SQ tags (SN and LN) with existing header lines. Must be used with --fasta option
</pre>
{{PhoneHomeParamDesc}}
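
The <code>--checkSQ</code> description above says only that SN and LN are checked for consistency and that <code>--fasta</code> is required. As a rough illustration of that kind of check (not bamUtil's actual implementation), the Python sketch below compares @SQ header lines against sequence lengths computed from FASTA text. The helper names <code>fasta_lengths</code> and <code>check_sq</code> are hypothetical.

```python
def fasta_lengths(fasta_text):
    """Return {sequence name: length} for FASTA-formatted text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_sq(header_lines, lengths):
    """Return a list of mismatch messages for @SQ lines vs. reference lengths."""
    problems = []
    for line in header_lines:
        # SAM header fields are tab-delimited; split() also accepts spaces.
        fields = line.split()
        if not fields or fields[0] != "@SQ":
            continue
        tags = dict(f.split(":", 1) for f in fields[1:])
        sn, ln = tags.get("SN"), tags.get("LN")
        if sn not in lengths:
            problems.append(f"SN:{sn} not found in reference")
        elif str(lengths[sn]) != ln:
            problems.append(f"SN:{sn} has LN:{ln}, reference length is {lengths[sn]}")
    return problems
```

An empty list from <code>check_sq</code> means every @SQ line matched a reference sequence by name and length.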
 
== Required Parameters ==
{{InBAMInputFile}}
{{OutBAMOutputFile}}
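
When <code>polishBam</code> is driven from a script, it can help to assemble the argument list in one place. The sketch below is one way to do that in Python. The helper <code>polish_bam_argv</code> is hypothetical and does not validate flag names; it only emits <code>--in</code>, <code>--out</code>, and whatever options the caller supplies.

```python
import shlex


def polish_bam_argv(in_file, out_file, bam="./bam", **options):
    """Build an argument list for `bam polishBam`.

    Keyword options map directly to flags, e.g. fasta="ref.fa" becomes
    `--fasta ref.fa`; a value of True emits the bare flag (checkSQ=True
    becomes `--checkSQ`); None or False omits the flag.
    """
    argv = [bam, "polishBam", "--in", in_file, "--out", out_file]
    for flag, value in options.items():
        if value is True:
            argv.append(f"--{flag}")
        elif value is not None and value is not False:
            argv.extend([f"--{flag}", str(value)])
    return argv


argv = polish_bam_argv("testFiles/sortedSam.sam", "results/updatedSam.sam",
                       log="results/updated.log", verbose=True)
# Show the command as it would be typed in a shell.
print(shlex.join(argv))
```

The resulting list can be passed to <code>subprocess.run</code> without a shell, which avoids quoting problems with header lines that contain spaces or tabs.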
 
== Optional Parameters ==
=== Verbose (<code>--verbose</code>) ===
Use <code>--verbose</code> to turn on verbose mode.
 
=== Specify Log Filename (<code>--log</code>) ===
Use <code>--log</code> followed by the log filename to specify the log filename. The default is the output file basename with a <code>.log</code> extension.
 
=== Add the HD Header (<code>--HD</code>) ===
Use <code>--HD</code> followed by the HD header line to add an HD header. Be sure to include "@HD" in the line you specify.
 
=== Add the RG Header (<code>--RG</code>) ===
Use <code>--RG</code> followed by the RG header line to add an RG header. Be sure to include "@RG" in the line you specify.
 
=== Add the PG Header (<code>--PG</code>) ===
Use <code>--PG</code> followed by the PG header line to add a PG header. Be sure to include "@PG" in the line you specify.
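
Because the <code>--HD</code>, <code>--RG</code>, and <code>--PG</code> values must each include their record type, a wrapper script may want to check this before launching the tool. The following Python sketch shows such a pre-flight check. <code>check_header_arg</code> is a hypothetical helper, and how <code>polishBam</code> itself reacts to a missing prefix is not covered here.

```python
def check_header_arg(kind, line):
    """Return `line` unchanged, or raise ValueError if it does not start
    with the expected record type ("@HD", "@RG", or "@PG")."""
    prefix = "@" + kind
    fields = line.split()
    if not fields or fields[0] != prefix:
        raise ValueError(f'--{kind} value must begin with "{prefix}": {line!r}')
    return line


# Example: passes for a well-formed @RG line.
check_header_arg("RG", "@RG\tID:UM0037:1\tSM:Sample2")
```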
 
=== Add MD5 and UR tags to SQ Headers (<code>--fasta</code>) ===
Use <code>--fasta</code> followed by the FASTA reference file name to compute MD5 sums and update the SQ headers with the M5 and UR tags. Use the [[#Add the UR tag to SQ Headers (--UR)|<code>--UR</code>]] option to specify a different UR value.
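
For reference, the SAM specification defines the M5 value as the MD5 digest of the sequence after whitespace is removed and letters are converted to uppercase. The Python sketch below computes such digests from FASTA text. Whether <code>polishBam</code> normalizes sequences in exactly this way is an assumption, and <code>fasta_md5</code> is a hypothetical helper.

```python
import hashlib


def fasta_md5(fasta_text):
    """Return {sequence name: MD5 hex digest} for FASTA-formatted text."""
    sequences = {}
    name = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            sequences[name] = []
        elif name is not None:
            # Drop all whitespace and uppercase the bases before hashing.
            sequences[name].append("".join(line.split()).upper())
    return {n: hashlib.md5("".join(chunks).encode("ascii")).hexdigest()
            for n, chunks in sequences.items()}
```

Line wrapping and letter case in the FASTA file do not affect the digest, so two differently formatted copies of the same reference yield the same M5 value.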
 
=== Add the AS tag to SQ Headers (<code>--AS</code>) ===
Use <code>--AS</code> followed by the genome assembly identifier to add the AS tag to the SQ Headers.
 
=== Add the UR tag to SQ Headers (<code>--UR</code>) ===
Use <code>--UR</code> followed by the URI of the sequence to add the UR tag to the SQ Headers.
The UR tag is added automatically by the [[#Add MD5 and UR tags to SQ Headers (--fasta)|<code>--fasta</code>]] option, so if [[#Add MD5 and UR tags to SQ Headers (--fasta)|<code>--fasta</code>]] is used, <code>--UR</code> only needs to be specified if it is different from [[#Add MD5 and UR tags to SQ Headers (--fasta)|<code>--fasta</code>]].

=== Add the SP tag to SQ Headers (<code>--SP</code>) ===
Use <code>--SP</code> followed by the species to add the SP tag to the SQ Headers.

{{PhoneHomeParameters}}

= Return Value =
Returns 0 on success, non-0 on failure.

= Example =
Command:
<pre>
./bam polishBam --in testFiles/sortedSam.sam --out results/updatedSam.sam --log results/updated.log --checkSQ --fasta testFiles/testFasta.fa --AS my37 --UR testFasta.fa --RG "@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA" --PG "@PG ID:polish VN:0.0.1" --SP new --HD "@HD VN:1.0 SO:coordinate GO:none"
</pre>
Input File:
<pre>
@SQ SN:1 LN:2004
@SQ SN:2 LN:2000
@SQ SN:3 LN:2005
@SQ SN:4 LN:2040
@SQ SN:5 LN:2006
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9>
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>>
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>>
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;;
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * *
</pre>
 
 
Output File:
<pre>
@SQ SN:1 LN:2004 AS:my37 M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804 UR:testFasta.fa SP:new
@SQ SN:2 LN:2000 AS:my37 M5:7c342606b54aa211a50f5f63ac1cb2eb UR:testFasta.fa SP:new
@SQ SN:3 LN:2005 AS:my37 M5:c30e547093f33de240b164a4a2ebe3b5 UR:testFasta.fa SP:new
@SQ SN:4 LN:2040 AS:my37 M5:fc4c559e9da51e93e7875031ddf65f2a UR:testFasta.fa SP:new
@SQ SN:5 LN:2006 AS:my37 M5:c876194283debb8b507ebd0f82309ec4 UR:testFasta.fa SP:new
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@HD VN:1.0 SO:coordinate GO:none
@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA
@PG ID:polish VN:0.0.1
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0 RG:Z:UM0037:1
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 RG:Z:UM0037:1 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9> RG:Z:UM0037:1
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5 RG:Z:UM0037:1
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;; RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * * RG:Z:UM0037:1
</pre>
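
In the output above, every record has gained an <code>RG:Z:UM0037:1</code> tag, where <code>UM0037:1</code> is the ID from the <code>--RG</code> header line. The Python sketch below mimics that per-record change on a single SAM line. It is only an illustration with hypothetical helper names, and it appends the tag at the end, whereas the real output above places RG before XT.

```python
def rg_id(rg_header_line):
    """Extract the ID value from an @RG header line."""
    # SAM header fields are tab-delimited; split() also accepts spaces.
    for field in rg_header_line.split()[1:]:
        if field.startswith("ID:"):
            return field[3:]
    raise ValueError("@RG line has no ID field")


def with_rg_tag(sam_record, read_group_id):
    """Return the SAM record with its RG:Z tag set to read_group_id."""
    fields = sam_record.rstrip("\n").split("\t")
    # The first 11 fields are mandatory; optional tags follow.
    tags = [t for t in fields[11:] if not t.startswith("RG:")]
    tags.append(f"RG:Z:{read_group_id}")
    return "\t".join(fields[:11] + tags)
```

Any existing RG tag on the record is replaced rather than duplicated, which is a design choice of this sketch and not a documented behavior of the tool.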
 
Output:
<pre>
in testFiles/sortedSam.sam
out results/updatedSam.sam
log results/updated.log
checkSQ
</pre>
 
Log File:
<pre>
Arguments in effect:
--in [testFiles/sortedSam.sam]
--out [results/updatedSam.sam]
--log [results/updated.log]
--fasta [testFiles/testFasta.fa]
--AS [my37]
--UR [testFasta.fa]
--SP [new]
--checkSQ [ON]
--HD [@HD VN:1.0 SO:coordinate GO:none]
--RG [@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA]
--PG [@PG ID:polish VN:0.0.1]
Reading the reference file testFiles/testFasta.fa
Finished reading the reference file testFiles/testFasta.fa
Finished checking the consistency of SQ tags
Creating the header of new output file
Adding 1 HD, 1 RG, and 1 PG headers
Finished writing output headers
Writing output BAM file
Successfully written 10 records
</pre>
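
A pipeline that calls <code>polishBam</code> can combine the exit status (0 on success) with the record count reported in the log. The Python sketch below extracts that count. It assumes the log wording shown in the example above, which may differ between versions, and <code>records_written</code> is a hypothetical helper.

```python
import re


def records_written(log_text):
    """Return the record count from a polishBam log, or None if absent."""
    match = re.search(r"Successfully written (\d+) records", log_text)
    return int(match.group(1)) if match else None


# Example with the last two lines of the log shown above.
count = records_written("Writing output BAM file\nSuccessfully written 10 records\n")
```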
 
 
[[Category:BamUtil|polishBam]]
[[Category:BAM Software]]
[[Category:Software]]
[[Category:StatGen Download]]