BamUtil: trimBam

From Genome Analysis Wiki

Overview of the trimBam function of bamUtil

The trimBam option of the bamUtil executable trims the ends of reads in a SAM/BAM file, either by changing the trimmed bases to 'N' and their qualities to '!' (the default) or by soft clipping them (if the --clip command-line option is specified).

Usage

./bam trimBam [inFile] [outFile] [num-bases-to-trim-on-each-side]

Version 1.0.6 and later:

Alternatively, the number of bases to trim from each side can be specified separately (either or both of -L/-R (--left/--right) can be specified):

./bam trimBam [inFile] [outFile] -L [num-bases-to-trim-from-left] -R [num-bases-to-trim-from-right]

By default, reads on the reverse strand are reversed before the left and right ends are trimmed.

This means that --left actually trims from the right of the read in the SAM/BAM for reverse reads.

Optionally --ignoreStrand/-i can be specified to ignore the strand information and treat forward/reverse the same.

trimBam will modify the sequences to 'N', and the quality string to '!' unless the optional parameter --clip/-c is specified. If --clip/-c is specified, the ends will be soft clipped instead of modified.
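The default (non-clipping) behavior can be illustrated with a minimal Python sketch. This is not bamUtil's code: the function name mask_ends is hypothetical, and capping the counts for reads shorter than the requested trim is an assumption made only for illustration.

```python
def mask_ends(seq, qual, left, right):
    """Set `left` bases at the start and `right` bases at the end to 'N',
    and the matching quality characters to '!'."""
    n = len(seq)
    if len(qual) != n:
        raise ValueError("sequence and quality must have the same length")
    # Assumption: cap the counts so a short read is masked entirely
    # instead of indexing past its ends.
    left = min(left, n)
    right = min(right, n - left)
    keep_end = n - right
    new_seq = "N" * left + seq[left:keep_end] + "N" * right
    new_qual = "!" * left + qual[left:keep_end] + "!" * right
    return new_seq, new_qual


print(mask_ends("ACGTACGTAC", "IIIIIIIIII", 2, 2))
# ('NNGTACGTNN', '!!IIIIII!!')
```

The read keeps its original length, so alignment positions and the CIGAR string are unaffected in this mode.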


Soft Clipping Notes (--clip/-c)

Available in version 1.0.14 and later.

When soft clipping:

  • if the entire read would be soft clipped, no clipping is done, and instead the read is marked as unmapped
  • mate information is not updated (start positions/mapping may change after soft clipping)
    • run samtools fixmate to fix mate information (will first need to sort by read name)
  • output is not sorted (start positions/mapping may change after soft clipping)
    • run samtools sort to resort by coordinate (after fixmate)
  • soft clips already in the read are maintained or added to
    • if 3 bases were clipped and 2 are specified to be clipped, no change is made to that end
    • if 3 bases were clipped and 5 are specified to be clipped, 2 additional bases are clipped from that end
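The arithmetic behind the last two bullets can be sketched in Python. The function names are hypothetical, hard clips are not modeled, and the test used for "the entire read would be soft clipped" (clipped bases covering the whole read length) is an assumption, not taken from the bamUtil source.

```python
def additional_clip(existing, requested):
    """Number of extra bases to soft clip at one end, given the bases
    already soft clipped there and the number requested."""
    return max(0, requested - existing)


def clip_plan(read_len, existing_left, existing_right, left, right):
    """Return the resulting (left, right) soft-clip lengths, or None when
    the whole read would be clipped (the read is then marked unmapped)."""
    new_left = existing_left + additional_clip(existing_left, left)
    new_right = existing_right + additional_clip(existing_right, right)
    # Assumption: the read counts as entirely clipped when the clipped
    # bases cover its full length.
    if new_left + new_right >= read_len:
        return None
    return new_left, new_right


print(additional_clip(3, 2))  # 0: 3 already clipped, 2 requested, no change
print(additional_clip(3, 5))  # 2: 3 already clipped, 5 requested, 2 more
```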

Fixing the mate/resorting

In order to update the mate, samtools fixmate must be run.

In order to reorder the file, samtools sort must be run.

Notes about the samtools programs:

  • samtools fixmate requires the file to be sorted by query name.
  • samtools sort cannot write to pipes.

Steps

  1. Run this program and pipe it into samtools sort by query name
    • ./bam trimBam <your InputFile> - [#basesToTrim] [any other options] -c | samtools sort -n - tempQuerySort
  2. Run samtools fixmate and pipe it into samtools sort by position
    •  samtools fixmate tempQuerySort.bam - | samtools sort - finalResult
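For scripted use, the two steps can be assembled as command strings. The Python sketch below only builds the strings shown above and does not run them. The helper name build_pipeline and its parameter names are hypothetical. The samtools arguments are copied from this page, which uses the output-prefix form of samtools sort, so they may need adapting to the installed samtools version.

```python
import shlex


def build_pipeline(in_file, n_bases, tmp_prefix="tempQuerySort",
                   final_prefix="finalResult", bam="./bam"):
    """Return the two shell command lines from the steps above."""
    # Step 1: soft clip, then sort by query name so fixmate can be run.
    step1 = (f"{bam} trimBam {shlex.quote(in_file)} - {int(n_bases)} -c"
             f" | samtools sort -n - {shlex.quote(tmp_prefix)}")
    # Step 2: fix the mate information, then re-sort by position.
    step2 = (f"samtools fixmate {shlex.quote(tmp_prefix + '.bam')} -"
             f" | samtools sort - {shlex.quote(final_prefix)}")
    return [step1, step2]


for command in build_pipeline("in.bam", 2):
    print(command)
# ./bam trimBam in.bam - 2 -c | samtools sort -n - tempQuerySort
# samtools fixmate tempQuerySort.bam - | samtools sort - finalResult
```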

Parameters

    Required Parameters:
        inFile  : the SAM/BAM file to be read
        outFile : the SAM/BAM file to be written
        num-bases-to-trim-on-each-side : the number of bases/qualities to trim from each side
    Instead of num-bases-to-trim-on-each-side, -L/-R (or --left/--right) can be specified to indicate the number of bases to trim from the left/right (left/right are reversed for reverse strands)
    Optional Parameters:
        --ignoreStrand : ignore strand information - do not reverse left/right for reverse reads
        --clip         : soft clip the ends rather than setting to N/!
	PhoneHome:
		--noPhoneHome       : disable PhoneHome (default enabled)
		--phoneHomeThinning : adjust the PhoneHome thinning parameter (default 50)

Required Parameters

Input File (1st argument)

The 1st argument is the name of the input SAM/BAM file.

The program automatically determines if your input file is SAM/BAM/uncompressed BAM without any input other than a filename from the user, unless your input file is stdin.

Use - to read from stdin; the extension appended to - determines the file type (no extension indicates SAM).

SAM/BAM/Uncompressed BAM from file : yourFileName
SAM from stdin                     : -
BAM from stdin                     : -.bam
Uncompressed BAM from stdin        : -.ubam


Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file). This matches the samtools implementation so pipes between our tools and samtools are supported.

Output File (2nd argument)

The 2nd argument is the name of the output SAM/BAM file.

The file extension is used to determine whether to write SAM/BAM/uncompressed BAM. Use - to write to stdout; the extension appended to - determines the file type (no extension indicates SAM).

SAM to file                : yourFileName.sam
BAM to file                : yourFileName.bam
Uncompressed BAM to file   : yourFileName.ubam
SAM to stdout              : -
BAM to stdout              : -.bam
Uncompressed BAM to stdout : -.ubam


Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file). This matches the samtools implementation so pipes between our tools and samtools are supported.
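The naming convention for the output argument can be summarized with a small Python sketch. The function name output_format is hypothetical, and treating any extension other than .bam/.ubam as SAM is an assumption; this page only states that no extension means SAM.

```python
def output_format(name):
    """Map an output argument to (format, destination) following the
    naming convention listed above."""
    destination = "stdout" if name == "-" or name.startswith("-.") else "file"
    if name.endswith(".ubam"):
        fmt = "uncompressed BAM"
    elif name.endswith(".bam"):
        fmt = "BAM"
    else:
        # Assumption: anything else, including no extension, is SAM.
        fmt = "SAM"
    return fmt, destination


for name in ["yourFileName.sam", "yourFileName.bam", "-", "-.bam", "-.ubam"]:
    print(name, output_format(name))
```

The same -, -.bam and -.ubam names apply when reading from stdin.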

Optional Parameters

Number of Bases to Trim from Each End (3rd argument)

If the 3rd argument is a number (with no flag/option), it is the number of bases to trim from each end of the reads.

Trim Bases from the Left (--left or -L)

Use --left or -L followed by the number of bases to be trimmed from the left.

By default reverse strands are reversed and then the left is trimmed, meaning that --left actually trims from the right of the read in the SAM/BAM for reverse reads.

Use --ignoreStrand/-i to ignore the strand information and treat forward/reverse the same.

Trim Bases from the Right (--right or -R)

Use --right or -R followed by the number of bases to be trimmed from the right.

By default reverse strands are reversed and then the right is trimmed, meaning that --right actually trims from the left of the read in the SAM/BAM for reverse reads.

Use --ignoreStrand/-i to ignore the strand information and treat forward/reverse the same.

Ignore the Strand when Trimming (--ignoreStrand or -i)

Use --ignoreStrand or -i to ignore the strand information and treat forward/reverse the same. When this option is set, reverse-strand reads are not reversed before the left/right ends are trimmed.
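The effect of strand on --left/--right can be sketched in Python. The function name stored_trim_counts is hypothetical; the 0x10 bit is the reverse-strand bit of the SAM FLAG field.

```python
REVERSE_FLAG = 0x10  # SAM FLAG bit: read is mapped to the reverse strand


def stored_trim_counts(flag, left, right, ignore_strand=False):
    """Return (from_start, from_end): the number of bases trimmed from the
    start and end of the sequence as it is stored in the SAM/BAM record."""
    is_reverse = bool(flag & REVERSE_FLAG)
    if is_reverse and not ignore_strand:
        # Left/right are swapped for reverse-strand reads by default.
        return right, left
    return left, right


print(stored_trim_counts(0, 1, 2))                       # (1, 2) forward read
print(stored_trim_counts(16, 1, 2))                      # (2, 1) reverse read
print(stored_trim_counts(16, 1, 2, ignore_strand=True))  # (1, 2)
```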

SoftClip the Ends (--clip or -c)

Use --clip or -c to soft clip the ends instead of setting them to N/!. If the entire read would be soft clipped, the read is marked as unmapped instead.

See Soft Clipping Notes for more information about clipping and post processing that will need to be done.

Do not require BGZF EOF block (--noeof)

Use --noeof if you do not expect a trailing EOF block in your BGZF file.

By default, the trailing empty block is expected and checked for.

PhoneHome Parameters

See PhoneHome for more information on how PhoneHome works and what it does.

Turn off PhoneHome (--noPhoneHome)

Use the --noPhoneHome option to completely disable PhoneHome. PhoneHome is enabled by default based on the thinning parameter.

Adjust the Frequency of PhoneHome (--phoneHomeThinning)

Use --phoneHomeThinning to modify the percentage of the time that PhoneHome will run (0-100).

  • By default, --phoneHomeThinning is set to 50, running 50% of the time.
  • PhoneHome will only occur if the run's random number modulo 100 is less than the --phoneHomeThinning value.
  • N/A if --noPhoneHome is set.
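The thinning rule in the bullets above amounts to a single comparison. A Python sketch, with the hypothetical function name phone_home_runs and the random number passed in explicitly so that the behavior is deterministic:

```python
def phone_home_runs(random_number, thinning=50, no_phone_home=False):
    """Return True if PhoneHome would run for this invocation."""
    if no_phone_home:
        # --noPhoneHome disables PhoneHome regardless of thinning.
        return False
    return random_number % 100 < thinning


print(phone_home_runs(149))               # True:  149 % 100 = 49 < 50
print(phone_home_runs(150))               # False: 150 % 100 = 50
print(phone_home_runs(7, thinning=0))     # False: never runs at 0
print(phone_home_runs(99, thinning=100))  # True:  always runs at 100
```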

Return Value

Returns the SamStatus for the reads/writes: 0 on success, non-zero on failure.
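A calling script can act on the return value. The Python sketch below uses a hypothetical helper named run_checked, and it runs the Python interpreter as a stand-in command so that it works without bamUtil installed.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run a command and raise if its exit status is not 0."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with status {result.returncode}: {cmd}")
    return result.returncode


# Stand-in command that exits with status 0 (success).
status = run_checked([sys.executable, "-c", "raise SystemExit(0)"])
print(status)  # 0
```

With bamUtil available, the argument list would instead be something like ["./bam", "trimBam", "in.bam", "out.bam", "2"], where the file names are placeholders.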

Examples

Trim the same number of bases from each side

Example Input, trimming 2 bases:

./bam trimBam testFiles/testSam.sam results/trimSam.sam 2

Example Output:

Arguments in effect: 
	Input file : testFiles/testSam.sam
	Output file : results/trimSam.sam
	#Bases to trim from each side : 2

Number of records read = 10
Number of records written = 10


Trim a different number of bases from each side, swapping left/right for reverse strands

Example Input, trimming 1 base from the left and 2 bases from the right for forward strands, and the opposite for reverse strands:

./bam trimBam testFiles/testSam.sam results/trimSam.sam -L 1 -R 2

Example Output:

Arguments in effect: 
	Input file : testFiles/testSam.sam
	Output file : results/trimSam.sam
	#Bases to trim from the left of forward strands : 1
	#Bases to trim from the right of forward strands: 2
	#Bases to trim from the left of reverse strands : 2
	#Bases to trim from the right of reverse strands : 1

Number of records read = 10
Number of records written = 10


Trim a different number of bases from each side, treating forward & reverse the same

Example Input, trimming 1 base from the left and 2 bases from the right, ignoring strand information:

./bam trimBam testFiles/testSam.sam results/trimSam.sam -L 1 -R 2 --ignoreStrand

Example Output:

Arguments in effect: 
	Input file : testFiles/testSam.sam
	Output file : results/trimSam.sam
	#Bases to trim from the left of forward strands : 1
	#Bases to trim from the right of forward strands: 2
	#Bases to trim from the left of reverse strands : 1
	#Bases to trim from the right of reverse strands : 2

Number of records read = 10
Number of records written = 10