BamUtil: polishBam

From Genome Analysis Wiki
Revision as of 15:21, 1 November 2010 by Mktrost (talk | contribs)

Polish BAM

The polishBam program is released as part of the StatGen Library & Tools download.

polishBam adds or updates header lines (@HD, @RG, @PG, @SQ) in a SAM/BAM file. When --RG is given, it also adds the corresponding RG tag to each record, as the example below shows.


Parameters

   Required parameters: 
        -i/--in : input SAM/BAM file
        -o/--out : output SAM/BAM file
   Optional parameters:
        -v : turn on verbose mode
        -l/--log : write a log file; <outBamFile>.log is used if no value is given
        --HD : add @HD header line
        --RG : add @RG header line
        --PG : add @PG header line
        -f/--fasta : fasta reference file to compute MD5sums and update SQ tags
        --AS : AS tag for genome assembly identifier
        --UR : UR tag for @SQ tag (if different from --fasta)
        --SP : SP tag for @SQ tag
        --checkSQ : check the consistency of SQ tags (SN and LN) with existing header lines. Must be used with --fasta option
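
The --fasta option above computes an MD5 sum per reference sequence for the M5 tag. The following is a minimal Python sketch of that kind of computation, assuming the rule from the SAM specification (strip every character outside the ASCII range 33 to 126, uppercase the rest, then hash). Whether polishBam applies exactly this rule is an assumption, and the function name fasta_md5sums is hypothetical.

```python
import hashlib


def fasta_md5sums(fasta_text):
    """Return {sequence_name: md5_hex} for FASTA text.

    Assumption: uses the SAM specification's M5 rule, which may or may
    not match what polishBam does internally.
    """
    sums = {}
    name = None
    digest = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Close out the previous sequence, if any
            if name is not None:
                sums[name] = digest.hexdigest()
            # Sequence name is the first whitespace-delimited token after '>'
            parts = line[1:].split()
            name = parts[0] if parts else ""
            digest = hashlib.md5()
        elif name is not None:
            # Keep only printable, non-space ASCII (33..126), uppercased
            cleaned = "".join(c for c in line if 33 <= ord(c) <= 126).upper()
            digest.update(cleaned.encode("ascii"))
    if name is not None:
        sums[name] = digest.hexdigest()
    return sums
```

Calling fasta_md5sums(open("testFiles/testFasta.fa").read()) would give one hex digest per sequence name, which can be compared with the M5 values in an output header.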

Usage

polishBam (options) --in <inBamFile> --out <outBamFile>
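
When polishBam is driven from a script, the usage line can be assembled programmatically. The sketch below builds an argument list using only the flags listed under Parameters. The helper name build_polishbam_command and its keyword arguments are hypothetical, and the default program name assumes polishBam is on the PATH.

```python
def build_polishbam_command(in_file, out_file, fasta=None, check_sq=False,
                            header_lines=None, sq_tags=None, log=None,
                            program="polishBam"):
    """Build an argv list for polishBam (hypothetical helper).

    header_lines: optional dict with keys "HD", "RG", "PG" mapping to the
                  full header line text to add.
    sq_tags:      optional dict with keys "AS", "UR", "SP" for @SQ tags.
    """
    # The parameter list states that --checkSQ must be used with --fasta
    if check_sq and fasta is None:
        raise ValueError("--checkSQ must be used with --fasta")

    argv = [program, "--in", in_file, "--out", out_file]
    if log is not None:
        argv += ["--log", log]
    if fasta is not None:
        argv += ["--fasta", fasta]
    if check_sq:
        argv.append("--checkSQ")
    for tag in ("AS", "UR", "SP"):
        if sq_tags and tag in sq_tags:
            argv += ["--" + tag, sq_tags[tag]]
    for kind in ("HD", "RG", "PG"):
        if header_lines and kind in header_lines:
            argv += ["--" + kind, header_lines[kind]]
    return argv
```

The returned list can be passed to subprocess.run without shell quoting, which keeps the tab characters inside header lines intact.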


Return Value

Returns 0.

Example

Command:

../../bin/polishBam  --in testFiles/sortedSam.sam --out results/updatedSam.sam --log results/updated.log --checkSQ --fasta testFiles/testFasta.fa --AS my37 --UR testFasta.fa --RG "@RG	ID:UM0037:1	SM:Sample2	LB:lb2	PU:mypu	CN:UMCORE	DT:2010-11-01	PL:ILLUMINA" --PG "@PG	ID:polish	VN:0.0.1" --SP new --HD "@HD	VN:1.0	SO:coordinate	GO:none"

Input File:

@SQ	SN:1	LN:2004
@SQ	SN:2	LN:2000
@SQ	SN:3	LN:2005
@SQ	SN:4	LN:2040
@SQ	SN:5	LN:2006
@RG	ID:myID	LB:library	SM:sample
@RG	ID:myID2	SM:sample2	LB:library2
@CO	Comment 1
@CO	Comment 2
18:462+29M5I3M:F:295	97	1	75	0	5M	18	757	0	ACGTN	;>>>>	AM:i:0	MD:Z:30A0C5	NM:i:2	XT:A:R
18:462+29M5I3M:F:295	97	1	75	0	*	18	757	0	*	*	AM:i:0
1:1011:F:255+17M15D20M	73	1	1011	0	5M2D	=	1011	0	CCGAA	6>6+4	AM:i:0	MD:Z:37	NM:i:0	XT:A:R
1:1011:F:255+17M15D20M	133	1	1012	0	*	=	1011	0	CTGT	>>9>
18:462+29M5I3M:F:296	97	1	1751	0	3S2H5M	18	757	0	TGCACGTN	453;>>>>
18:462+29M5I3M:F:295	97	2	75	0	5M	18	757	0	ACGTN	*	AM:i:0	MD:Z:30A0C5	NM:i:2	XT:A:R
18:462+29M5I3M:F:297	97	2	1751	0	3S5M1S3H	18	757	0	TGCACGTNG	453;>>>>5
18:462+29M5I3M:F:298	97	3	75	0	3S5M4H	18	757	0	TGCACGTN	453;>>>>
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	AACT	==;;
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	*	*


Output File:

@SQ	SN:1	LN:2004	AS:my37	M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804	UR:testFasta.fa	SP:new
@SQ	SN:2	LN:2000	AS:my37	M5:7c342606b54aa211a50f5f63ac1cb2eb	UR:testFasta.fa	SP:new
@SQ	SN:3	LN:2005	AS:my37	M5:c30e547093f33de240b164a4a2ebe3b5	UR:testFasta.fa	SP:new
@SQ	SN:4	LN:2040	AS:my37	M5:fc4c559e9da51e93e7875031ddf65f2a	UR:testFasta.fa	SP:new
@SQ	SN:5	LN:2006	AS:my37	M5:c876194283debb8b507ebd0f82309ec4	UR:testFasta.fa	SP:new
@RG	ID:myID	LB:library	SM:sample
@RG	ID:myID2	SM:sample2	LB:library2
@HD	VN:1.0	SO:coordinate	GO:none
@RG	ID:UM0037:1	SM:Sample2	LB:lb2	PU:mypu	CN:UMCORE	DT:2010-11-01	PL:ILLUMINA
@PG	ID:polish	VN:0.0.1
@CO	Comment 1
@CO	Comment 2
18:462+29M5I3M:F:295	97	1	75	0	5M	18	757	0	ACGTN	;>>>>	AM:i:0	MD:Z:30A0C5	NM:i:2	RG:Z:UM0037:1	XT:A:R
18:462+29M5I3M:F:295	97	1	75	0	*	18	757	0	*	*	AM:i:0	RG:Z:UM0037:1
1:1011:F:255+17M15D20M	73	1	1011	0	5M2D	=	1011	0	CCGAA	6>6+4	AM:i:0	MD:Z:37	NM:i:0	RG:Z:UM0037:1	XT:A:R
1:1011:F:255+17M15D20M	133	1	1012	0	*	=	1011	0	CTGT	>>9>	RG:Z:UM0037:1
18:462+29M5I3M:F:296	97	1	1751	0	3S2H5M	18	757	0	TGCACGTN	453;>>>>	RG:Z:UM0037:1
18:462+29M5I3M:F:295	97	2	75	0	5M	18	757	0	ACGTN	*	AM:i:0	MD:Z:30A0C5	NM:i:2	RG:Z:UM0037:1	XT:A:R
18:462+29M5I3M:F:297	97	2	1751	0	3S5M1S3H	18	757	0	TGCACGTNG	453;>>>>5	RG:Z:UM0037:1
18:462+29M5I3M:F:298	97	3	75	0	3S5M4H	18	757	0	TGCACGTN	453;>>>>	RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	AACT	==;;	RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	*	*	RG:Z:UM0037:1
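
In the output above, every record carries RG:Z:UM0037:1, the ID from the --RG header line. A quick way to confirm this on a SAM text file is sketched below. The function name records_missing_rg is hypothetical, and the sketch handles uncompressed SAM text only, not BAM.

```python
def records_missing_rg(sam_text, rg_id):
    """Return the read names of records that lack the tag RG:Z:<rg_id>."""
    missing = []
    for line in sam_text.splitlines():
        # Skip blank lines and header lines (header lines start with '@')
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        # The first 11 columns are mandatory; optional tags follow them
        tags = fields[11:]
        if "RG:Z:" + rg_id not in tags:
            missing.append(fields[0])
    return missing
```

For the output file shown here, records_missing_rg(text, "UM0037:1") should return an empty list, while the same call on the input file would list every read name.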

Output:

in	testFiles/sortedSam.sam
out	results/updatedSam.sam
log	results/updated.log
checkSQ

Log File:

Arguments in effect:
	--in [testFiles/sortedSam.sam]
	--out [results/updatedSam.sam]
	--log [results/updated.log]
	--fasta [testFiles/testFasta.fa]
	--AS [my37]
	--UR [testFasta.fa]
	--SP [new]
	--checkSQ [ON]
	--HD [@HD	VN:1.0	SO:coordinate	GO:none]
	--RG [@RG	ID:UM0037:1	SM:Sample2	LB:lb2	PU:mypu	CN:UMCORE	DT:2010-11-01	PL:ILLUMINA]
	--PG [@PG	ID:polish	VN:0.0.1]
Reading the reference file testFiles/testFasta.fa
Finished reading the reference file testFiles/testFasta.fa
Finished checking the consistency of SQ tags
Creating the header of new output file
Adding 1 HD, 1 RG, and 1 PG headers
Finished writing output headers
Writing output BAM file
Successfully written 10 records
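
The log reports a consistency check of the SQ tags, which the parameter list describes as comparing SN and LN against existing header lines when --fasta is given. The sketch below shows one way such a check could work, by comparing each @SQ line with the sequence names and lengths found in a FASTA file. This is an illustration under that reading, not polishBam's actual implementation, and the names fasta_lengths and sq_mismatches are hypothetical.

```python
def fasta_lengths(fasta_text):
    """Return {sequence_name: length} for FASTA text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line.strip())
    return lengths


def sq_mismatches(sam_header_text, fasta_text):
    """List (SN, reason) pairs for @SQ lines that disagree with the FASTA."""
    ref = fasta_lengths(fasta_text)
    problems = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Split "TAG:value" fields; values may themselves contain ':'
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        sn, ln = tags.get("SN"), tags.get("LN")
        if sn not in ref:
            problems.append((sn, "not in FASTA"))
        elif str(ref[sn]) != ln:
            problems.append((sn, "LN %s != FASTA length %d" % (ln, ref[sn])))
    return problems
```

An empty result means every @SQ line agrees with the reference on both name and length.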