
From Genome Analysis Wiki
5,858 bytes added, 14:06, 6 January 2014
= Overview of the <code>polishBam</code> function of <code>bamUtil</code> =
The <code>polishBam</code> option of the [[bamUtil]] executable adds or updates header lines and adds the RG tag to each record.

= Usage =
<pre>
./bam polishBam (options) --in <inBamFile> --out <outBamFile>
</pre>

= Parameters =
<pre>
  Required parameters:
        -i/--in : input BAM file
        -o/--out : output BAM file
  Optional parameters:
        -v : turn on verbose mode
        -l/--log : writes logfile. <outBamFile>.log will be used if value is unspecified
        --HD : add @HD header line
        --RG : add @RG header line
        --PG : add @PG header line
        -f/--fasta : fasta reference file to compute MD5sums and update SQ tags
        --AS : AS tag for genome assembly identifier
        --UR : UR tag for @SQ tag (if different from --fasta)
        --SP : SP tag for @SQ tag
        --checkSQ : check the consistency of SQ tags (SN and LN) with existing header lines. Must be used with --fasta option
</pre>
{{PhoneHomeParamDesc}}
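
The <code>--checkSQ</code> option is listed above but has no section of its own. As a rough illustration of the kind of check it describes, the Python sketch below compares the SN and LN values of @SQ header lines with the sequences in a FASTA reference. It is a minimal sketch, not the bamUtil implementation: it assumes the header lines are available as text and the FASTA file is uncompressed, and <code>read_fasta</code>, <code>parse_sq</code> and <code>check_sq</code> are hypothetical helper names.

```python
# Hypothetical sketch of an SN/LN consistency check; not bamUtil's code.


def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word after '>'."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def parse_sq(header_line):
    """Return the tags of an @SQ line as a dict, e.g. {'SN': '1', 'LN': '2004'}."""
    fields = header_line.split()
    if not fields or fields[0] != "@SQ":
        raise ValueError("not an @SQ line: " + header_line)
    return dict(field.split(":", 1) for field in fields[1:])


def check_sq(header_lines, reference):
    """List SN/LN mismatches between @SQ lines and a {name: sequence} reference."""
    problems = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue  # only @SQ lines carry SN/LN
        tags = parse_sq(line)
        name, length = tags.get("SN"), tags.get("LN")
        if name not in reference:
            problems.append(f"SN:{name} not found in reference")
        elif length is None or int(length) != len(reference[name]):
            problems.append(
                f"SN:{name} has LN:{length} but the reference length is {len(reference[name])}"
            )
    return problems


# Example: sequence "1" is 6 bases and matches; sequence "2" is 3 bases, not 4.
ref = read_fasta(">1 test\nACGT\nAC\n>2\nGGG\n")
print(check_sq(["@SQ SN:1 LN:6", "@SQ SN:2 LN:4", "@RG ID:x"], ref))
# ['SN:2 has LN:4 but the reference length is 3']
```

An empty list means every @SQ line agrees with the reference; a non-empty list names the sequences that do not.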

== Required Parameters ==
{{InBAMInputFile}}
{{OutBAMOutputFile}}

== Optional Parameters ==
=== Verbose (<code>--verbose</code>) ===
Use <code>--verbose</code> to turn on verbose mode.

=== Specify Log Filename (<code>--log</code>) ===
Use <code>--log</code> followed by a filename to write the log to that file.  If no filename is given, the default is the output file basename with a <code>.log</code> extension.

=== Add the HD Header (<code>--HD</code>) ===
Use <code>--HD</code> followed by the HD header line to add an HD header.  Be sure to include "@HD" in the line you specify.

=== Add the RG Header (<code>--RG</code>) ===
Use <code>--RG</code> followed by the RG header line to add an RG header.  Be sure to include "@RG" in the line you specify.
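
In the example below, every output record also gains an <code>RG:Z:UM0037:1</code> tag, which is the ID of the <code>--RG</code> line. The Python sketch below illustrates that per-record step on SAM text. SAM fields are tab-separated (the example on this page shows them separated by spaces). The sketch assumes the tag value is the ID field of the <code>--RG</code> line, keeps optional tags sorted by name because that is how they appear in the example output, and replaces any RG tag already present; these are assumptions, and <code>rg_id</code> and <code>add_rg_tag</code> are hypothetical helpers rather than bamUtil functions.

```python
# Hypothetical sketch of adding an RG tag to SAM records; not bamUtil's code.


def rg_id(rg_line):
    """Return the ID value of an @RG header line, e.g. 'UM0037:1'."""
    fields = rg_line.split()
    if not fields or fields[0] != "@RG":
        raise ValueError("expected a line starting with @RG")
    for field in fields[1:]:
        if field.startswith("ID:"):
            return field[3:]
    raise ValueError("@RG line has no ID field")


def add_rg_tag(record, read_group):
    """Add RG:Z:<read_group> to one tab-delimited SAM record line."""
    fields = record.rstrip("\n").split("\t")
    core, tags = fields[:11], fields[11:]  # 11 mandatory fields, then optional tags
    # Assumption: an existing RG tag is replaced, so each record ends up with one.
    tags = [tag for tag in tags if not tag.startswith("RG:")]
    tags.append("RG:Z:" + read_group)
    tags.sort(key=lambda tag: tag[:2])  # sorted by tag name, as in the example output
    return "\t".join(core + tags)


read_group = rg_id("@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA")
record = "\t".join(["1:1011:F:255+17M15D20M", "133", "1", "1012", "0", "*", "=", "1011", "0", "CTGT", ">>9>"])
print(add_rg_tag(record, read_group).split("\t")[-1])  # RG:Z:UM0037:1
```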

=== Add the PG Header (<code>--PG</code>) ===
Use <code>--PG</code> followed by the PG header line to add a PG header.  Be sure to include "@PG" in the line you specify.
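
Each of <code>--HD</code>, <code>--RG</code>, and <code>--PG</code> takes a complete header line, including its @HD, @RG, or @PG prefix. The Python sketch below is one way to sanity-check such a value before putting it on the command line: it checks the prefix, checks that every other field has the TAG:VALUE form, checks for the tag the SAM specification requires on that line type (VN for @HD, ID for @RG and @PG), and returns the line tab-delimited. It is a hypothetical helper, not part of bamUtil, and because it splits on whitespace it does not handle values that contain spaces, such as a @PG CL command line.

```python
import re

# Tag that the SAM specification requires on each of these header line types.
REQUIRED_TAG = {"@HD": "VN", "@RG": "ID", "@PG": "ID"}
# A header field is a two-character tag, a colon, then a non-empty value.
TAG_FIELD = re.compile(r"[A-Za-z][A-Za-z0-9]:\S+")


def normalize_header_line(value, record_type):
    """Validate a --HD/--RG/--PG value and return it as a tab-delimited header line."""
    fields = value.split()
    if not fields or fields[0] != record_type:
        raise ValueError(f"value must start with {record_type}")
    for field in fields[1:]:
        if not TAG_FIELD.fullmatch(field):
            raise ValueError(f"malformed field {field!r}; expected TAG:VALUE")
    required = REQUIRED_TAG[record_type]
    if required not in (field[:2] for field in fields[1:]):
        raise ValueError(f"{record_type} line is missing the required {required} tag")
    return "\t".join(fields)


print(repr(normalize_header_line("@PG ID:polish VN:0.0.1", "@PG")))
# '@PG\tID:polish\tVN:0.0.1'
```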

=== Add MD5 and UR tags to SQ Headers (<code>--fasta</code>) ===
Use <code>--fasta</code> followed by the FASTA reference file name to compute the MD5 checksum of each reference sequence and update the SQ headers with the M5 and UR tags.  Use the [[#Add the UR tag to SQ Headers (--UR)|<code>--UR</code>]] option to specify a different UR value.
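
The SAM specification defines the M5 value as the MD5 digest of the reference sequence after removing every character outside the printable range '!' to '~' (for example, line breaks) and converting lowercase letters to uppercase. Assuming polishBam follows that definition, the values written by <code>--fasta</code> could be cross-checked with a sketch like the following; <code>sq_md5</code> and <code>fasta_md5s</code> are hypothetical helper names.

```python
import hashlib


def sq_md5(sequence):
    """MD5 digest (hex) of a sequence as defined for the SAM M5 tag."""
    # Keep only ASCII 33-126 ('!' to '~'), then uppercase before hashing.
    cleaned = "".join(ch for ch in sequence if 33 <= ord(ch) <= 126).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def fasta_md5s(fasta_text):
    """Map each FASTA sequence name (first word after '>') to its M5 value."""
    parts, name = {}, None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: sq_md5("".join(chunks)) for n, chunks in parts.items()}


print(sq_md5(""))  # d41d8cd98f00b204e9800998ecf8427e (MD5 of empty input)
```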

=== Add the AS tag to SQ Headers (<code>--AS</code>) ===
Use <code>--AS</code> followed by the genome assembly identifier to add the AS tag to the SQ headers.

=== Add the UR tag to SQ Headers (<code>--UR</code>) ===
Use <code>--UR</code> followed by the URI of the sequence to add the UR tag to the SQ headers.

The UR tag is added automatically by the [[#Add MD5 and UR tags to SQ Headers (--fasta)|<code>--fasta</code>]] option, so when <code>--fasta</code> is used, <code>--UR</code> only needs to be specified if the UR value should differ from the <code>--fasta</code> file name.

=== Add the SP tag to SQ Headers (<code>--SP</code>) ===
Use <code>--SP</code> followed by the species to add the SP tag to the SQ headers.
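
In the example below, each updated @SQ line keeps its SN and LN tags and gains AS, M5, UR, and SP tags, in that order. The Python sketch below assembles such a line. The tag order is copied from the example output; using the <code>--fasta</code> path as the UR value when no <code>--UR</code> value is given follows the description above; replacing a tag that is already present, rather than duplicating it, is an assumption. <code>polish_sq_line</code> is a hypothetical helper, not bamUtil code.

```python
def polish_sq_line(sq_line, md5=None, assembly=None, fasta=None, uri=None, species=None):
    """Return a tab-delimited @SQ line with AS, M5, UR and SP added or replaced."""
    fields = sq_line.split()
    if not fields or fields[0] != "@SQ":
        raise ValueError("expected an @SQ line")
    ur = uri if uri is not None else fasta  # --UR overrides the --fasta path
    # Tags to set, in the order they appear in the example output.
    updates = {"AS": assembly, "M5": md5, "UR": ur, "SP": species}
    updates = {tag: value for tag, value in updates.items() if value is not None}
    # Assumption: an existing tag with the same name is replaced, not duplicated.
    kept = [field for field in fields[1:] if field[:2] not in updates]
    added = [f"{tag}:{value}" for tag, value in updates.items()]
    return "\t".join(["@SQ"] + kept + added)


line = polish_sq_line(
    "@SQ SN:1 LN:2004",
    md5="a9cfe5b8c11aa0cc2c0d2bf3602c9804",
    assembly="my37",
    fasta="testFiles/testFasta.fa",
    uri="testFasta.fa",
    species="new",
)
print(line.replace("\t", " "))
# @SQ SN:1 LN:2004 AS:my37 M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804 UR:testFasta.fa SP:new
```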

{{PhoneHomeParameters}}

= Return Value =
Returns 0 on success and a non-zero value on failure.
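
Because polishBam reports failure through its exit status, a script that calls it can check that status directly. The Python sketch below builds the command line from the options documented on this page and raises an error on a non-zero exit status. The <code>./bam</code> path comes from the usage line above; <code>polish_bam_args</code> and <code>run_polish_bam</code> are hypothetical helpers, not part of bamUtil.

```python
import subprocess


def polish_bam_args(in_file, out_file, bam="./bam", **options):
    """Build the argument list for 'bam polishBam'.

    options maps option names without the leading '--' (for example fasta, AS, RG)
    to values; pass True for switches that take no value, such as checkSQ.
    """
    args = [bam, "polishBam", "--in", in_file, "--out", out_file]
    for name, value in options.items():
        if value is True:
            args.append("--" + name)
        elif value is not None and value is not False:
            args.extend(["--" + name, str(value)])
    return args


def run_polish_bam(in_file, out_file, **options):
    """Run polishBam; raise RuntimeError if it exits with a non-zero status."""
    result = subprocess.run(polish_bam_args(in_file, out_file, **options))
    if result.returncode != 0:
        raise RuntimeError(f"polishBam failed with exit status {result.returncode}")


# Builds the command without running it:
print(polish_bam_args("testFiles/sortedSam.sam", "results/updatedSam.sam",
                      checkSQ=True, fasta="testFiles/testFasta.fa", AS="my37"))
```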

= Example =
Command:
<pre>
./bam polishBam --in testFiles/sortedSam.sam --out results/updatedSam.sam --log results/updated.log --checkSQ --fasta testFiles/testFasta.fa --AS my37 --UR testFasta.fa --RG "@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA" --PG "@PG ID:polish VN:0.0.1" --SP new --HD "@HD VN:1.0 SO:coordinate GO:none"
</pre>

Input File:
<pre>
@SQ SN:1 LN:2004
@SQ SN:2 LN:2000
@SQ SN:3 LN:2005
@SQ SN:4 LN:2040
@SQ SN:5 LN:2006
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9>
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>>
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>>
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;;
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * *
</pre>

Output File:
<pre>
@SQ SN:1 LN:2004 AS:my37 M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804 UR:testFasta.fa SP:new
@SQ SN:2 LN:2000 AS:my37 M5:7c342606b54aa211a50f5f63ac1cb2eb UR:testFasta.fa SP:new
@SQ SN:3 LN:2005 AS:my37 M5:c30e547093f33de240b164a4a2ebe3b5 UR:testFasta.fa SP:new
@SQ SN:4 LN:2040 AS:my37 M5:fc4c559e9da51e93e7875031ddf65f2a UR:testFasta.fa SP:new
@SQ SN:5 LN:2006 AS:my37 M5:c876194283debb8b507ebd0f82309ec4 UR:testFasta.fa SP:new
@RG ID:myID LB:library SM:sample
@RG ID:myID2 SM:sample2 LB:library2
@HD VN:1.0 SO:coordinate GO:none
@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA
@PG ID:polish VN:0.0.1
@CO Comment 1
@CO Comment 2
18:462+29M5I3M:F:295 97 1 75 0 5M 18 757 0 ACGTN ;>>>> AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:295 97 1 75 0 * 18 757 0 * * AM:i:0 RG:Z:UM0037:1
1:1011:F:255+17M15D20M 73 1 1011 0 5M2D = 1011 0 CCGAA 6>6+4 AM:i:0 MD:Z:37 NM:i:0 RG:Z:UM0037:1 XT:A:R
1:1011:F:255+17M15D20M 133 1 1012 0 * = 1011 0 CTGT >>9> RG:Z:UM0037:1
18:462+29M5I3M:F:296 97 1 1751 0 3S2H5M 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
18:462+29M5I3M:F:295 97 2 75 0 5M 18 757 0 ACGTN * AM:i:0 MD:Z:30A0C5 NM:i:2 RG:Z:UM0037:1 XT:A:R
18:462+29M5I3M:F:297 97 2 1751 0 3S5M1S3H 18 757 0 TGCACGTNG 453;>>>>5 RG:Z:UM0037:1
18:462+29M5I3M:F:298 97 3 75 0 3S5M4H 18 757 0 TGCACGTN 453;>>>> RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 AACT ==;; RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181 141 * 0 0 * * 0 0 * * RG:Z:UM0037:1
</pre>

Output:
<pre>
in testFiles/sortedSam.sam
out results/updatedSam.sam
log results/updated.log
checkSQ
</pre>

Log File:
<pre>
Arguments in effect:
--in [testFiles/sortedSam.sam]
--out [results/updatedSam.sam]
--log [results/updated.log]
--fasta [testFiles/testFasta.fa]
--AS [my37]
--UR [testFasta.fa]
--SP [new]
--checkSQ [ON]
--HD [@HD VN:1.0 SO:coordinate GO:none]
--RG [@RG ID:UM0037:1 SM:Sample2 LB:lb2 PU:mypu CN:UMCORE DT:2010-11-01 PL:ILLUMINA]
--PG [@PG ID:polish VN:0.0.1]
Reading the reference file testFiles/testFasta.fa
Finished reading the reference file testFiles/testFasta.fa
Finished checking the consistency of SQ tags
Creating the header of new output file
Adding 1 HD, 1 RG, and 1 PG headers
Finished writing output headers
Writing output BAM file
Successfully written 10 records
</pre>

[[Category:BamUtil|polishBam]]
[[Category:BAM Software]]
[[Category:Software]]
[[Category:StatGen Download]]