From Genome Analysis Wiki
BamUtil: polishBam

5,858 bytes added, 13:06, 6 January 2014
= Overview of the <code>polishBam</code> function of <code>bamUtil</code> =
The <code>polishBam</code> option on the [[bamUtil]] executable adds/updates header lines and adds the RG tag to each record.

= Usage =
 ./bam polishBam (options) --in <inBamFile> --out <outBamFile>
= Parameters =
<pre>
Required parameters:
-i/--in : input BAM file
-o/--out : output BAM file
Optional parameters:
-v : turn on verbose mode
-l/--log : writes logfile. <outBamFile>.log will be used if value is unspecified
--HD : add @HD header line
--RG : add @RG header line
--PG : add @PG header line
-f/--fasta : fasta reference file to compute MD5sums and update SQ tags
--AS : AS tag for genome assembly identifier
--UR : UR tag for @SQ tag (if different from --fasta)
--SP : SP tag for @SQ tag
--checkSQ : check the consistency of SQ tags (SN and LN) with existing header lines. Must be used with --fasta option
</pre>
{{PhoneHomeParamDesc}}
 
== Required Parameters ==
{{InBAMInputFile}}
{{OutBAMOutputFile}}
 
== Optional Parameters ==
=== Verbose (<code>--verbose</code>) ===
Use <code>--verbose</code> to turn on verbose mode.
 
=== Specify Log Filename (<code>--log</code>) ===
Use <code>--log</code> followed by the log filename to specify the log filename. The default is the output file basename with a <code>.log</code> extension.
 
=== Add the HD Header (<code>--HD</code>) ===
Use <code>--HD</code> followed by the HD header line to add an HD header. Be sure to include "@HD" in the line you specify.
 
=== Add the RG Header (<code>--RG</code>) ===
Use <code>--RG</code> followed by the RG header line to add an RG header. Be sure to include "@RG" in the line you specify.
 
=== Add the PG Header (<code>--PG</code>) ===
Use <code>--PG</code> followed by the PG header line to add a PG header. Be sure to include "@PG" in the line you specify.
 
=== Add MD5 and UR tags to SQ Headers (<code>--fasta</code>) ===
Use <code>--fasta</code> followed by the fasta reference file name to compute MD5sums and update SQ tags with the M5 & UR values. Use the [[#Add the UR tag to SQ Headers (--UR)|<code>--UR</code>]] option to specify a different UR value.
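The SAM specification defines the M5 value as the MD5 digest of the sequence after removing every character outside the printable range '!' to '~' and converting to uppercase. The following is a minimal Python sketch of that computation for a plain FASTA file. It is not polishBam's own code, and the function name <code>fasta_md5sums</code> is hypothetical.

```python
import hashlib


def fasta_md5sums(lines):
    """Return {sequence_name: md5_hex} for an iterable of FASTA lines.

    Follows the SAM specification's M5 rule: strip characters outside
    '!'..'~', uppercase the rest, then take the MD5 digest.
    """
    sums = {}
    name = None
    digest = None
    for line in lines:
        if line.startswith(">"):
            # A new sequence starts: store the digest of the previous one.
            if name is not None:
                sums[name] = digest.hexdigest()
            parts = line[1:].split()
            # The sequence name is the first word after '>'.
            name = parts[0] if parts else ""
            digest = hashlib.md5()
        elif name is not None:
            # Keep printable, non-space ASCII only and convert to uppercase.
            cleaned = "".join(c for c in line if 33 <= ord(c) <= 126).upper()
            digest.update(cleaned.encode("ascii"))
    if name is not None:
        sums[name] = digest.hexdigest()
    return sums
```

Passing an open file handle for the reference (for example <code>testFiles/testFasta.fa</code>) should give one entry per sequence, which can then be compared against the M5 values written to the SQ headers.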
 
=== Add the AS tag to SQ Headers (<code>--AS</code>) ===
Use <code>--AS</code> followed by the genome assembly identifier to add the AS tag to the SQ Headers.
 
=== Add the UR tag to SQ Headers (<code>--UR</code>) ===
Use <code>--UR</code> followed by the URI of the sequence to add the UR tag to the SQ Headers.
 
The UR tag is added automatically by the [[#Add MD5 and UR tags to SQ Headers (--fasta)|<code>--fasta</code>]] option, so when <code>--fasta</code> is used, <code>--UR</code> only needs to be specified if the UR value should differ from the <code>--fasta</code> value.
 
=== Add the SP tag to SQ Headers (<code>--SP</code>) ===
Use <code>--SP</code> followed by the species to add the SP tag to the SQ Headers.
 
{{PhoneHomeParameters}}
 
= Return Value =
Returns 0 on success, non-0 on failure.
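In a script, the return value can be checked so that a pipeline stops when polishBam fails. The following is a minimal Python sketch, assuming the <code>./bam</code> executable is in the current directory as in the usage line. The helper names <code>polish_bam_command</code> and <code>run_polish_bam</code> are hypothetical, and only options documented above are used.

```python
import subprocess


def polish_bam_command(in_file, out_file, rg_line=None, bam_exe="./bam"):
    """Build the argument list for a polishBam call without running it."""
    cmd = [bam_exe, "polishBam", "--in", in_file, "--out", out_file]
    if rg_line is not None:
        # The header line must include the leading "@RG".
        cmd += ["--RG", rg_line]
    return cmd


def run_polish_bam(cmd):
    """Run the command and raise if the exit status is non-zero."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(
            f"polishBam failed with exit status {result.returncode}"
        )
```

Building the argument list separately from running it keeps the command easy to log or inspect before execution.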
= Example =
Command:
<pre>
./bam polishBam --in testFiles/sortedSam.sam --out results/updatedSam.sam --log results/updated.log --checkSQ --fasta testFiles/testFasta.fa --AS my37 --UR testFasta.fa --RG "@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA" --PG "@PG ID:polish VN:0.0.1" --SP new --HD "@HD VN:1.0 SO:coordinate GO:none"
</pre>
Input File:
<pre>
@SQ SN:1 LN:2004
@SQ SN:2 LN:2000
@SQ SN:3 LN:2005
@SQ SN:4 LN:2040
@SQ SN:5 LN:2006
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9>
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>>
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>>
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;;
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * *
</pre>
Output File:
<pre>
@SQ SN:1 LN:2004 AS:my37 M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804 UR:testFasta.fa SP:new
@SQ SN:2 LN:2000 AS:my37 M5:7c342606b54aa211a50f5f63ac1cb2eb UR:testFasta.fa SP:new
@SQ SN:3 LN:2005 AS:my37 M5:c30e547093f33de240b164a4a2ebe3b5 UR:testFasta.fa SP:new
@SQ SN:4 LN:2040 AS:my37 M5:fc4c559e9da51e93e7875031ddf65f2a UR:testFasta.fa SP:new
@SQ SN:5 LN:2006 AS:my37 M5:c876194283debb8b507ebd0f82309ec4 UR:testFasta.fa SP:new
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@HD VN:1.0 SO:coordinate GO:none
@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA
@PG ID:polish VN:0.0.1
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0 RG:Z:UM0037:1
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 RG:Z:UM0037:1 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9> RG:Z:UM0037:1
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5 RG:Z:UM0037:1
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;; RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * * RG:Z:UM0037:1
</pre>
Log File:
<pre>
Arguments in effect:
    --in [testFiles/sortedSam.sam]
    --out [results/updatedSam.sam]
    --log [results/updated.log]
    --fasta [testFiles/testFasta.fa]
    --AS [my37]
    --UR [testFasta.fa]
    --SP [new]
    --checkSQ [ON]
    --HD [@HD VN:1.0 SO:coordinate GO:none]
    --RG [@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA]
    --PG [@PG ID:polish VN:0.0.1]
Reading the reference file testFiles/testFasta.fa
Finished reading the reference file testFiles/testFasta.fa
Finished checking the consistency of SQ tags
Creating the header of new output file
Adding 1 HD, 1 RG, and 1 PG headers
Finished writing output headers
Writing output BAM file
Successfully written 10 records
</pre>
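In the example, every output record carries an <code>RG:Z:UM0037:1</code> tag that matches the ID of the added @RG header line. The following is a minimal Python sketch of a check for that property on a SAM text file, assuming tab-separated fields as required by the SAM format. The function name <code>check_rg_tags</code> is hypothetical and is not part of bamUtil.

```python
def check_rg_tags(sam_lines):
    """Return (record_count, problems) for an iterable of SAM text lines.

    A record is reported when it has no RG:Z tag or when its RG value
    does not match the ID of any @RG header line seen so far.
    """
    rg_ids = set()
    problems = []
    records = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: collect read group IDs from @RG lines only.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        rg_ids.add(field[3:])
            continue
        records += 1
        # Optional tags start after the 11 mandatory SAM fields.
        rg_values = [f[5:] for f in fields[11:] if f.startswith("RG:Z:")]
        if not rg_values:
            problems.append(f"{fields[0]}: no RG tag")
        elif rg_values[0] not in rg_ids:
            problems.append(f"{fields[0]}: unknown read group {rg_values[0]}")
    return records, problems
```

An empty <code>problems</code> list means every record has an RG tag that refers to a read group declared in the header.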
 
[[Category:BamUtil|polishBam]]
[[Category:BAM Software]]
[[Category:Software]]
[[Category:StatGen Download]]