BamUtil: polishBam

From Genome Analysis Wiki

Overview of the polishBam function of bamUtil

The polishBam option of the bamUtil executable adds or updates header lines and adds the RG tag to each record.

Usage

./bam polishBam (options) --in <inBamFile> --out <outBamFile>

Parameters

   Required parameters: 
        -i/--in : input SAM/BAM file
        -o/--out : output SAM/BAM file
   Optional parameters:
        -v/--verbose : turn on verbose mode
        -l/--log : write a log file; <outBamFile>.log is used if no value is given
        --HD : add @HD header line
        --RG : add @RG header line
        --PG : add @PG header line
        -f/--fasta : fasta reference file to compute MD5sums and update SQ tags
        --AS : AS tag for genome assembly identifier
        --UR : UR tag for @SQ tag (if different from --fasta)
        --SP : SP tag for @SQ tag
        --checkSQ : check the consistency of SQ tags (SN and LN) with existing header lines. Must be used with --fasta option
   PhoneHome:
        --noPhoneHome       : disable PhoneHome (default enabled)
        --phoneHomeThinning : adjust the PhoneHome thinning parameter (default 50)
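
The --checkSQ option listed above compares the SN and LN values of the existing @SQ header lines with the reference given by --fasta. The following Python sketch shows one way such a check could look. It is not polishBam's code: the function name check_sq and the rule of reporting missing names and length mismatches are assumptions made for illustration.

```python
def check_sq(header_sq, fasta_sq):
    """Compare (SN, LN) pairs from @SQ lines with (name, length) pairs from a FASTA.

    Returns a list of problem descriptions; an empty list means consistent.
    Hypothetical helper, not part of bamUtil.
    """
    problems = []
    fasta_lengths = dict(fasta_sq)
    for name, length in header_sq:
        if name not in fasta_lengths:
            problems.append(f"{name}: not found in FASTA")
        elif fasta_lengths[name] != length:
            problems.append(
                f"{name}: header LN {length} != FASTA length {fasta_lengths[name]}"
            )
    return problems


# SN/LN values taken from the first two @SQ lines of the example input below.
header = [("1", 2004), ("2", 2000)]
consistent = check_sq(header, [("1", 2004), ("2", 2000)])
mismatch = check_sq(header, [("1", 2004), ("2", 1999)])
```

An empty result corresponds to the "Finished checking the consistency of SQ tags" message in the example log; how polishBam reports a mismatch is not described on this page.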

Required Parameters

Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

The program determines automatically whether the input file is SAM, BAM, or uncompressed BAM. Only the file name is needed, unless the input is stdin.

Use - to read from stdin. In that case the extension determines the file type, and no extension means SAM.

SAM/BAM/Uncompressed BAM from file    --in yourFileName
SAM from stdin                        --in -
BAM from stdin                        --in -.bam
Uncompressed BAM from stdin           --in -.ubam


Note: Uncompressed BAM is written with compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.
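
To illustrate why a level-0 stream is not an entirely uncompressed file, here is a small Python sketch using the standard gzip module. It only demonstrates compression level 0 in general. BAM uses BGZF, a blocked gzip variant, and nothing here calls bamUtil or samtools; the payload is made-up sample data.

```python
import gzip


def level0_roundtrip(payload: bytes):
    """Wrap payload in a gzip stream at compression level 0, then unpack it."""
    packed = gzip.compress(payload, compresslevel=0)
    return packed, gzip.decompress(packed)


payload = b"@SQ\tSN:1\tLN:2004\n" * 100
packed, restored = level0_roundtrip(payload)

# The bytes are stored rather than deflated, but the stream still carries
# gzip framing (header, block markers, trailer), so it is slightly larger
# than the payload and must still be read with a gzip-aware reader.
starts_with_gzip_magic = packed[:2] == b"\x1f\x8b"
is_larger = len(packed) > len(payload)
```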

Output File (--out)

Use --out followed by your file name to specify the SAM/BAM output file.

The file extension determines whether SAM, BAM, or uncompressed BAM is written. Use - to write to stdout; the extension again selects the file type, and no extension means SAM.

SAM to file                   --out yourFileName.sam
BAM to file                   --out yourFileName.bam
Uncompressed BAM to file      --out yourFileName.ubam
SAM to stdout                 --out -
BAM to stdout                 --out -.bam
Uncompressed BAM to stdout    --out -.ubam


Note: Uncompressed BAM is written with compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.
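
The extension rules for --out can be summarized in a short Python sketch. The function describe_out is hypothetical and covers only the six documented cases. Raising an error for any other extension is a choice made for this sketch, since this page does not say how bamUtil treats other extensions.

```python
def describe_out(value: str):
    """Map an --out value to (destination, format) using the documented rules."""
    to_stdout = value == "-" or value.startswith("-.")
    destination = "stdout" if to_stdout else "file"
    if value.endswith(".ubam"):
        fmt = "uncompressed BAM"
    elif value.endswith(".bam"):
        fmt = "BAM"
    elif value.endswith(".sam") or value == "-":
        fmt = "SAM"
    else:
        # Not covered by the documentation; the sketch refuses to guess.
        raise ValueError(f"undocumented extension in {value!r}")
    return destination, fmt


examples = {v: describe_out(v) for v in ("-", "-.bam", "yourFileName.ubam")}
```

The same three stdin spellings (-, -.bam, -.ubam) apply to --in, while input files are detected automatically regardless of extension.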

Optional Parameters

Verbose (--verbose)

Use --verbose to turn on verbose mode.

Specify Log Filename (--log)

Use --log followed by a file name to specify the log file. The default is the output file basename with a .log extension.

Add the HD Header (--HD)

Use --HD followed by the HD header line to add an HD header. Be sure to include "@HD" in the line you specify.

Add the RG Header (--RG)

Use --RG followed by the RG header line to add an RG header. Be sure to include "@RG" in the line you specify.

Add the PG Header (--PG)

Use --PG followed by the PG header line to add a PG header. Be sure to include "@PG" in the line you specify.
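
SAM header fields are tab-separated, as in the example command further down, and literal tabs can be awkward to type in a shell. The following Python sketch builds such lines programmatically. The helper header_line is hypothetical, and the tag values are taken from the example on this page.

```python
def header_line(record_type: str, **tags: str) -> str:
    """Join a header record type and its TAG:VALUE pairs with tabs."""
    fields = ["@" + record_type]
    fields += [f"{tag}:{value}" for tag, value in tags.items()]
    return "\t".join(fields)


rg = header_line("RG", ID="UM0037:1", SM="Sample2", LB="lb2")
pg = header_line("PG", ID="polish", VN="0.0.1")

# Passing the lines as separate list items (for example to subprocess.run)
# keeps the embedded tabs intact without any shell quoting.
extra_args = ["--RG", rg, "--PG", pg]
```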

Add MD5 and UR tags to SQ Headers (--fasta)

Use --fasta followed by the fasta reference file name to compute MD5sums and update the SQ tags with the M5 and UR values. Use the --UR option to specify a different UR value.
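
As a rough illustration of where M5 values come from, the Python sketch below computes an MD5 digest per FASTA sequence. It follows the SAM specification's convention of hashing the sequence in uppercase with whitespace removed. Whether polishBam normalizes sequences in exactly this way is an assumption, and the helper names are hypothetical.

```python
import hashlib


def _digest(chunks):
    """MD5 of the concatenated sequence lines, uppercased, spaces removed."""
    sequence = "".join(chunks).replace(" ", "").upper()
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def fasta_md5s(fasta_text: str) -> dict:
    """Return {sequence name: MD5 hex digest} for FASTA-formatted text."""
    sums = {}
    name, chunks = None, []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                sums[name] = _digest(chunks)
            # The name is the first word after '>' (empty if none is given).
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        sums[name] = _digest(chunks)
    return sums


sums = fasta_md5s(">1 toy sequence\nacgt\nNN\n>2\nGGCC\n")
```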

Add the AS tag to SQ Headers (--AS)

Use --AS followed by the genome assembly identifier to add the AS tag to the SQ Headers.

Add the UR tag to SQ Headers (--UR)

Use --UR followed by the URI of the sequence to add the UR tag to the SQ Headers.

The UR tag is added automatically by the --fasta option, so if --fasta is used, --UR only needs to be specified when the UR value should differ from the --fasta value.

Add the SP tag to SQ Headers (--SP)

Use --SP followed by the species to add the SP tag to the SQ Headers.

PhoneHome Parameters

See PhoneHome for more information on how PhoneHome works and what it does.

Turn off PhoneHome (--noPhoneHome)

Use the --noPhoneHome option to completely disable PhoneHome. PhoneHome is enabled by default based on the thinning parameter.

Adjust the Frequency of PhoneHome (--phoneHomeThinning)

Use --phoneHomeThinning to modify the percentage of the time that PhoneHome will run (0-100).

  • By default, --phoneHomeThinning is set to 50, running 50% of the time.
  • PhoneHome will only occur if the run's random number modulo 100 is less than the --phoneHomeThinning value.
  • Has no effect if --noPhoneHome is set.
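
The thinning rule above can be written out as a short Python sketch. The function name and the way the random number is drawn are assumptions. Only the modulo-100 comparison and the --noPhoneHome override come from the description above.

```python
import random


def should_phone_home(thinning=50, no_phone_home=False, draw=None):
    """Return True if PhoneHome would run for this invocation."""
    if no_phone_home:
        return False
    if draw is None:
        # Stand-in for "the run's random number"; the real source is not
        # described on this page.
        draw = random.randrange(2**31)
    return draw % 100 < thinning


runs_at_49 = should_phone_home(50, draw=149)   # 149 % 100 = 49, and 49 < 50
runs_at_50 = should_phone_home(50, draw=150)   # 150 % 100 = 50, not < 50
```

With this rule a thinning value of 0 never triggers PhoneHome and a value of 100 always does.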

Return Value

Returns 0 on success, non-0 on failure.
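
A calling script can use this return value to stop on failure. The Python sketch below wraps a command with the standard subprocess module. The helper run_checked is hypothetical, and a stand-in command is used so that the sketch runs without bamUtil installed.

```python
import subprocess
import sys


def run_checked(argv):
    """Run a command and raise RuntimeError if its exit status is non-zero."""
    result = subprocess.run(argv)
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} failed with exit status {result.returncode}"
        )
    return result.returncode


# Stand-in for a real invocation: a Python child process that exits with 0.
status = run_checked([sys.executable, "-c", "raise SystemExit(0)"])
```

For polishBam, the argument list would start with "./bam", "polishBam", followed by the options described above.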

Example

Command:

./bam polishBam  --in testFiles/sortedSam.sam --out results/updatedSam.sam --log results/updated.log --checkSQ --fasta testFiles/testFasta.fa --AS my37 --UR testFasta.fa --RG "@RG	ID:UM0037:1	SM:Sample2	LB:lb2	PU:mypu	CN:UMCORE	DT:2010-11-01	PL:ILLUMINA" --PG "@PG	ID:polish	VN:0.0.1" --SP new --HD "@HD	VN:1.0	SO:coordinate	GO:none"

Input File:

@SQ	SN:1	LN:2004
@SQ	SN:2	LN:2000
@SQ	SN:3	LN:2005
@SQ	SN:4	LN:2040
@SQ	SN:5	LN:2006
@RG	ID:myID	LB:library	SM:sample
@RG	ID:myID2	SM:sample2	LB:library2
@CO	Comment 1
@CO	Comment 2
18:462+29M5I3M:F:295	97	1	75	0	5M	18	757	0	ACGTN	;>>>>	AM:i:0	MD:Z:30A0C5	NM:i:2	XT:A:R
18:462+29M5I3M:F:295	97	1	75	0	*	18	757	0	*	*	AM:i:0
1:1011:F:255+17M15D20M	73	1	1011	0	5M2D	=	1011	0	CCGAA	6>6+4	AM:i:0	MD:Z:37	NM:i:0	XT:A:R
1:1011:F:255+17M15D20M	133	1	1012	0	*	=	1011	0	CTGT	>>9>
18:462+29M5I3M:F:296	97	1	1751	0	3S2H5M	18	757	0	TGCACGTN	453;>>>>
18:462+29M5I3M:F:295	97	2	75	0	5M	18	757	0	ACGTN	*	AM:i:0	MD:Z:30A0C5	NM:i:2	XT:A:R
18:462+29M5I3M:F:297	97	2	1751	0	3S5M1S3H	18	757	0	TGCACGTNG	453;>>>>5
18:462+29M5I3M:F:298	97	3	75	0	3S5M4H	18	757	0	TGCACGTN	453;>>>>
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	AACT	==;;
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	*	*


Output File:

@SQ	SN:1	LN:2004	AS:my37	M5:a9cfe5b8c11aa0cc2c0d2bf3602c9804	UR:testFasta.fa	SP:new
@SQ	SN:2	LN:2000	AS:my37	M5:7c342606b54aa211a50f5f63ac1cb2eb	UR:testFasta.fa	SP:new
@SQ	SN:3	LN:2005	AS:my37	M5:c30e547093f33de240b164a4a2ebe3b5	UR:testFasta.fa	SP:new
@SQ	SN:4	LN:2040	AS:my37	M5:fc4c559e9da51e93e7875031ddf65f2a	UR:testFasta.fa	SP:new
@SQ	SN:5	LN:2006	AS:my37	M5:c876194283debb8b507ebd0f82309ec4	UR:testFasta.fa	SP:new
@RG	ID:myID	LB:library	SM:sample
@RG	ID:myID2	SM:sample2	LB:library2
@HD	VN:1.0	SO:coordinate	GO:none
@RG	ID:UM0037:1	SM:Sample2	LB:lb2	PU:mypu	CN:UMCORE	DT:2010-11-01	PL:ILLUMINA
@PG	ID:polish	VN:0.0.1
@CO	Comment 1
@CO	Comment 2
18:462+29M5I3M:F:295	97	1	75	0	5M	18	757	0	ACGTN	;>>>>	AM:i:0	MD:Z:30A0C5	NM:i:2	RG:Z:UM0037:1	XT:A:R
18:462+29M5I3M:F:295	97	1	75	0	*	18	757	0	*	*	AM:i:0	RG:Z:UM0037:1
1:1011:F:255+17M15D20M	73	1	1011	0	5M2D	=	1011	0	CCGAA	6>6+4	AM:i:0	MD:Z:37	NM:i:0	RG:Z:UM0037:1	XT:A:R
1:1011:F:255+17M15D20M	133	1	1012	0	*	=	1011	0	CTGT	>>9>	RG:Z:UM0037:1
18:462+29M5I3M:F:296	97	1	1751	0	3S2H5M	18	757	0	TGCACGTN	453;>>>>	RG:Z:UM0037:1
18:462+29M5I3M:F:295	97	2	75	0	5M	18	757	0	ACGTN	*	AM:i:0	MD:Z:30A0C5	NM:i:2	RG:Z:UM0037:1	XT:A:R
18:462+29M5I3M:F:297	97	2	1751	0	3S5M1S3H	18	757	0	TGCACGTNG	453;>>>>5	RG:Z:UM0037:1
18:462+29M5I3M:F:298	97	3	75	0	3S5M4H	18	757	0	TGCACGTN	453;>>>>	RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	AACT	==;;	RG:Z:UM0037:1
Y:16597235+13M13I11M:F:181	141	*	0	0	*	*	0	0	*	*	RG:Z:UM0037:1
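
The per-record change visible above, an RG:Z tag carrying the ID of the added @RG line, can be reproduced with a small Python sketch. This is not polishBam's implementation. Sorting the optional tags by name matches the tag order in this example output but is an assumption about general behavior, as is dropping any RG tag already present.

```python
def rg_id(rg_header_line: str) -> str:
    """Extract the ID value from a tab-separated @RG header line."""
    for field in rg_header_line.split("\t")[1:]:
        if field.startswith("ID:"):
            return field[3:]
    raise ValueError("@RG line has no ID field")


def add_rg_tag(sam_record: str, read_group_id: str) -> str:
    """Return the SAM record with an RG:Z tag for the given read group."""
    fields = sam_record.rstrip("\n").split("\t")
    mandatory, tags = fields[:11], fields[11:]   # 11 mandatory SAM columns
    # Assumption: an existing RG tag is replaced rather than duplicated.
    tags = [t for t in tags if not t.startswith("RG:")]
    tags.append(f"RG:Z:{read_group_id}")
    # Assumption: tags are emitted in name order, as in the example output.
    return "\t".join(mandatory + sorted(tags))


group = rg_id("@RG\tID:UM0037:1\tSM:Sample2\tLB:lb2")
record = ("18:462+29M5I3M:F:295\t97\t1\t75\t0\t5M\t18\t757\t0\tACGTN\t;>>>>"
          "\tAM:i:0\tMD:Z:30A0C5\tNM:i:2\tXT:A:R")
polished = add_rg_tag(record, group)
```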

Output:

in	testFiles/sortedSam.sam
out	results/updatedSam.sam
log	results/updated.log
checkSQ

Log File:

Arguments in effect:
	--in [testFiles/sortedSam.sam]
	--out [results/updatedSam.sam]
	--log [results/updated.log]
	--fasta [testFiles/testFasta.fa]
	--AS [my37]
	--UR [testFasta.fa]
	--SP [new]
	--checkSQ [ON]
	--HD [@HD	VN:1.0	SO:coordinate	GO:none]
	--RG [@RG	ID:UM0037:1	SM:Sample2	LB:lb2	PU:mypu	CN:UMCORE	DT:2010-11-01	PL:ILLUMINA]
	--PG [@PG	ID:polish	VN:0.0.1]
Reading the reference file testFiles/testFasta.fa
Finished reading the reference file testFiles/testFasta.fa
Finished checking the consistency of SQ tags
Creating the header of new output file
Adding 1 HD, 1 RG, and 1 PG headers
Finished writing output headers
Writing output BAM file
Successfully written 10 records