BamUtil: polishBam

From Genome Analysis Wiki

Overview of the polishBam function of bamUtil

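The polishBam option of the bamUtil executable adds or updates header lines and adds an RG tag to each record. In the example below, every record receives RG:Z:UM0037:1, the ID of the @RG line passed with --RG.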
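As a rough illustration of the per-record part of that description, the Python sketch below sets an RG:Z tag on every alignment line of SAM text. It is not bamUtil's implementation: the function name add_rg_tag is hypothetical, the new tag is simply appended (so tag order may differ from what polishBam writes), and replacing an RG tag that is already present is an assumption the source does not confirm.

```python
def add_rg_tag(sam_lines, rg_id):
    """Yield SAM lines with RG:Z:<rg_id> set on every alignment record."""
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@") or not line:
            # Header lines and blank lines pass through unchanged.
            yield line
            continue
        fields = line.split("\t")
        # A SAM record has 11 mandatory fields; optional tags follow.
        mandatory, tags = fields[:11], fields[11:]
        # Assumption: an existing RG tag is replaced rather than duplicated.
        tags = [t for t in tags if not t.startswith("RG:")]
        tags.append(f"RG:Z:{rg_id}")
        yield "\t".join(mandatory + tags)


# Usage: tag one unmapped record taken from the example input.
record = "Y:16597235+13M13I11M:F:181\t141\t*\t0\t0\t*\t*\t0\t0\tAACT\t==;;"
print(next(add_rg_tag([record], "UM0037:1")))
# The printed record ends in RG:Z:UM0037:1
```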

Usage

./bam polishBam (options) --in <inBamFile> --out <outBamFile>

Parameters

   Required parameters: 
        -i/--in : input BAM file
        -o/--out : output BAM file
   Optional parameters:
        -v : turn on verbose mode
        -l/--log : write a log file; <outBamFile>.log is used if no value is given
        --HD : add @HD header line
        --RG : add @RG header line
        --PG : add @PG header line
        -f/--fasta : FASTA reference file used to compute MD5 sums and update @SQ tags
        --AS : AS tag (genome assembly identifier) for @SQ lines
        --UR : UR tag for @SQ lines (if different from --fasta)
        --SP : SP tag for @SQ lines
        --checkSQ : check that the SN and LN tags of existing @SQ header lines are consistent with the reference; must be used with the --fasta option
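
The --fasta and --checkSQ options rely on two per-sequence values from the reference: its length (compared against LN) and an MD5 digest (written as M5). The SAM specification defines M5 as the MD5 of the sequence after removing characters outside the printable range '!' to '~' and converting letters to upper case. The Python sketch below computes both values from FASTA text and lists @SQ lines that disagree with the reference. The function names are hypothetical, and this only approximates the kind of check polishBam performs; it is not the tool's code.

```python
import hashlib


def read_fasta(text):
    """Return {name: sequence}; the name is the first word after '>'."""
    parts, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            parts[name] = []
        elif line and name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def sq_values(sequence):
    """Return (length, MD5 hex digest) using the SAM M5 normalization."""
    normalized = "".join(c for c in sequence if 33 <= ord(c) <= 126).upper()
    return len(normalized), hashlib.md5(normalized.encode("ascii")).hexdigest()


def check_sq(header_lines, reference):
    """List (SN, problem) pairs for @SQ lines that disagree with the reference."""
    problems = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        name = tags.get("SN")
        if name not in reference:
            problems.append((name, "SN not found in reference"))
        elif tags.get("LN") != str(sq_values(reference[name])[0]):
            problems.append((name, "LN differs from reference length"))
    return problems


# Usage with a tiny made-up reference (not the testFasta.fa from the example).
ref = read_fasta(">1 toy sequence\nacgt\nNN\n")
print(check_sq(["@SQ\tSN:1\tLN:6", "@SQ\tSN:2\tLN:10"], ref))
# prints [('2', 'SN not found in reference')]
```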


Return Value

Returns 0.

Example

Command:

./bam polishBam --in testFiles/sortedSam.sam --out results/updatedSam.sam \
    --log results/updated.log --checkSQ --fasta testFiles/testFasta.fa \
    --AS my37 --UR testFasta.fa \
    --RG "@RG	ID:UM0037:1	SM:Sample2	LB:lb2	PU:mypu	CN:UMCORE	DT:2010-11-01	PL:ILLUMINA" \
    --PG "@PG	ID:polish	VN:0.0.1" --SP new \
    --HD "@HD	VN:1.0	SO:coordinate	GO:none"
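
The --HD, --RG and --PG values in this command are complete header lines whose fields are separated by tab characters, and those tabs are easy to lose when a command is copied or retyped. One way to keep them intact is to assemble the argument list in a script. The Python sketch below does this for a subset of the options. The helper names are hypothetical, the ./bam path is taken from the Usage line, and only flags listed under Parameters are used.

```python
import subprocess


def header_line(record_type, **tags):
    """Join '@<record_type>' and its TAG:value fields with tab characters."""
    return "\t".join([f"@{record_type}"] + [f"{k}:{v}" for k, v in tags.items()])


def polish_bam_args(in_file, out_file, hd=None, rg=None, pg=None, exe="./bam"):
    """Build an argument list for polishBam; header options are added only if given."""
    args = [exe, "polishBam", "--in", in_file, "--out", out_file]
    for flag, value in (("--HD", hd), ("--RG", rg), ("--PG", pg)):
        if value is not None:
            args += [flag, value]
    return args


def run(args):
    """Run the command without a shell, so tabs inside arguments need no quoting."""
    return subprocess.run(args, check=True).returncode


args = polish_bam_args(
    "testFiles/sortedSam.sam",
    "results/updatedSam.sam",
    rg=header_line("RG", ID="UM0037:1", SM="Sample2", LB="lb2", PL="ILLUMINA"),
    pg=header_line("PG", ID="polish", VN="0.0.1"),
)
# run(args) would execute it; this requires the bam executable to be present.
```

Passing the arguments as a list rather than a single shell string means each header line reaches the program as one argument with its tabs unchanged.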

Input File:

@SQ	SN:1	LN:2004
@SQ	SN:2	LN:2000
@SQ	SN:3	LN:2005
@SQ	SN:4	LN:2040
@SQ	SN:5	LN:2006
@RG	ID:myID	LB:library	SM:sample
@RG	ID:myID2	SM:sample2	LB:library2
@CO	Comment 1
@CO	Comment 2
18:462+29M5I3M:F:295	97	1	75	0	5M	18	757	0	ACGTN	;>>>>	AM:i:0	MD:Z:30A0C5	NM:i:2	XT:A:R
18:462+29M5I3M:F:295	97	1	75	0	*	18	757	0	*	*	AM:i:0
1:1011:F:255+17M15D20M	73	1	1011	0	5M2D	=	1011	0	CCGAA	6>6+4	AM:i:0	MD:Z:37	NM:i:0	XT:A:R
1:1011:F:255+17M15D20M	133	1	1012	0	*	=	1011	0	CTGT	>>9>
18:462+29M5I3M:F:296	97	1	1751	0	3S2H5M	18	757	0	TGCACGTN	453;>>>>
18:462+29M5I3M:F:295	97	2	75	0	5M	18	757	0	ACGTN	*	AM:i:0	MD:Z:30A0C5	NM:i:2	XT:A:R
18:462+29M5I3M:F:297	97	2	1751	0	3S5M1S3H	18	757	0	TGCACGTNG	453;>>>>5
18:462+29M5I3M:F:298	97	3	75	0	3S5M4H	18	757	0	TGCACGTN	453;>>>>
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	AACT	==;;
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	*	*


Output File:

@SQ	SN:1	LN:2004	AS:my37	M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804	UR:testFasta.fa	SP:new
@SQ	SN:2	LN:2000	AS:my37	M5:7c342606b54aa211a50f5f63ac1cb2eb	UR:testFasta.fa	SP:new
@SQ	SN:3	LN:2005	AS:my37	M5:c30e547093f33de240b164a4a2ebe3b5	UR:testFasta.fa	SP:new
@SQ	SN:4	LN:2040	AS:my37	M5:fc4c559e9da51e93e7875031ddf65f2a	UR:testFasta.fa	SP:new
@SQ	SN:5	LN:2006	AS:my37	M5:c876194283debb8b507ebd0f82309ec4	UR:testFasta.fa	SP:new
@RG	ID:myID	LB:library	SM:sample
@RG	ID:myID2	SM:sample2	LB:library2
@HD	VN:1.0	SO:coordinate	GO:none
@RG	ID:UM0037:1	SM:Sample2	LB:lb2	PU:mypu	CN:UMCORE	DT:2010-11-01	PL:ILLUMINA
@PG	ID:polish	VN:0.0.1
@CO	Comment 1
@CO	Comment 2
18:462+29M5I3M:F:295	97	1	75	0	5M	18	757	0	ACGTN	;>>>>	AM:i:0	MD:Z:30A0C5	NM:i:2	RG:Z:UM0037:1	XT:A:R
18:462+29M5I3M:F:295	97	1	75	0	*	18	757	0	*	*	AM:i:0	RG:Z:UM0037:1
1:1011:F:255+17M15D20M	73	1	1011	0	5M2D	=	1011	0	CCGAA	6>6+4	AM:i:0	MD:Z:37	NM:i:0	RG:Z:UM0037:1	XT:A:R
1:1011:F:255+17M15D20M	133	1	1012	0	*	=	1011	0	CTGT	>>9>	RG:Z:UM0037:1
18:462+29M5I3M:F:296	97	1	1751	0	3S2H5M	18	757	0	TGCACGTN	453;>>>>	RG:Z:UM0037:1
18:462+29M5I3M:F:295	97	2	75	0	5M	18	757	0	ACGTN	*	AM:i:0	MD:Z:30A0C5	NM:i:2	RG:Z:UM0037:1	XT:A:R
18:462+29M5I3M:F:297	97	2	1751	0	3S5M1S3H	18	757	0	TGCACGTNG	453;>>>>5	RG:Z:UM0037:1
18:462+29M5I3M:F:298	97	3	75	0	3S5M4H	18	757	0	TGCACGTN	453;>>>>	RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	AACT	==;;	RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	*	*	RG:Z:UM0037:1

Output:

in	testFiles/sortedSam.sam
out	results/updatedSam.sam
log	results/updated.log
checkSQ

Log File:

Arguments in effect:
	--in [testFiles/sortedSam.sam]
	--out [results/updatedSam.sam]
	--log [results/updated.log]
	--fasta [testFiles/testFasta.fa]
	--AS [my37]
	--UR [testFasta.fa]
	--SP [new]
	--checkSQ [ON]
	--HD [@HD	VN:1.0	SO:coordinate	GO:none]
	--RG [@RG	ID:UM0037:1	SM:Sample2	LB:lb2	PU:mypu	CN:UMCORE	DT:2010-11-01	PL:ILLUMINA]
	--PG [@PG	ID:polish	VN:0.0.1]
Reading the reference file testFiles/testFasta.fa
Finished reading the reference file testFiles/testFasta.fa
Finished checking the consistency of SQ tags
Creating the header of new output file
Adding 1 HD, 1 RG, and 1 PG headers
Finished writing output headers
Writing output BAM file
Successfully written 10 records
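
A script that wraps polishBam may want to confirm how many records were written. The Python sketch below extracts that count from log text. It assumes the final log line always has the form shown in this single example, which the source does not guarantee, and the function name records_written is hypothetical.

```python
import re


def records_written(log_text):
    """Return N from a 'Successfully written N records' line, or None if absent."""
    match = re.search(
        r"^Successfully written (\d+) records?\s*$", log_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


# Usage with the last two lines of the example log.
log = "Writing output BAM file\nSuccessfully written 10 records\n"
print(records_written(log))  # prints 10
```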