BamUtil: clipOverlap

From Genome Analysis Wiki
Revision as of 16:20, 28 October 2011 by Mktrost (talk | contribs)


Overview of the clipOverlap function of bamUtil

The clipOverlap option on the bamUtil executable clips overlapping read pairs.

RESTRICTIONS

  • Assumes the file is sorted by ReadName
  • Assumes only 2 reads have matching ReadNames
    • It matches in pairs, so if there are 3, the first 2 will be matched and compared, but the 3rd won't. If there are 4, the first 2 will be matched and the last 2 will be matched and compared.
  • Only mapped reads will be clipped
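The pairing restriction above can be illustrated with a small sketch. This is not bamUtil's code: the function name `pair_by_name` and the use of `(read_name, payload)` tuples are assumptions made for the example. It only shows how grouping adjacent records in twos behaves when 3 or 4 records share a ReadName.

```python
from itertools import groupby


def pair_by_name(records):
    """Yield (first, second) pairs of adjacent records sharing a read name.

    `records` is an iterable of (read_name, payload) tuples that is assumed
    to be sorted (or at least grouped) by read name, as the tool requires.
    """
    for _, group in groupby(records, key=lambda rec: rec[0]):
        reads = list(group)
        # Step through the group two at a time; an odd record at the end
        # is left unpaired and therefore never compared.
        for i in range(0, len(reads) - 1, 2):
            yield reads[i], reads[i + 1]


records = [
    ("r1", "a"), ("r1", "b"), ("r1", "c"),  # 3 records: only a/b are paired
    ("r2", "d"), ("r2", "e"),               # a normal pair
    ("r3", "f"),                            # a single record: never paired
]
pairs = list(pair_by_name(records))
```

With three records named r1, only the first two form a pair; the third is passed over, which matches the behavior described in the restriction.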

Rules for Clipping

Clipping from the front

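After clipping from the front, the first operation following the soft clip is always a Match/Mismatch. Any pads, deletions, insertions, or skips that would otherwise sit between the soft clip and that Match/Mismatch are soft clipped or removed, as listed below.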

Clip Location | How it is handled
If the clip position falls in a skip/deletion | Removes the entire skip/deletion
If the position immediately after the clip is a skip/deletion | Also removes the skip/deletion
If the position immediately after the clip is an Insert | Softclips the insert
If the position immediately after the clip is a Pad | Removes the pad
Clip occurs at the last match/mismatch position of the read (the entire read is clipped) | Entire read is soft clipped, 0-based position is left as the original (not modified)
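The front-clipping rules can be sketched on a CIGAR string as follows. This is an illustrative reimplementation under stated assumptions, not the bamUtil source: the function `clip_from_front`, its parameters, and the choice to express the clip as "the first `n_ref` reference bases of the alignment" are all hypothetical. It also assumes that the start position advances past the clipped and removed reference bases unless the entire read is clipped.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_from_front(cigar, pos, n_ref):
    """Soft clip the first n_ref reference bases; return (new_cigar, new_pos)."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips at either end are carried through untouched.
    lead_h = [ops.pop(0)] if ops and ops[0][1] == "H" else []
    trail_h = [ops.pop()] if ops and ops[-1][1] == "H" else []

    soft = 0      # read bases that end up soft clipped
    ref_used = 0  # reference bases consumed by clipped/removed operations
    rest = []     # operations kept after the clip
    for i, (length, op) in enumerate(ops):
        if op in "M=X" and ref_used + length > n_ref:
            # First match that extends past the clip: split it here.
            cut = max(0, n_ref - ref_used)
            soft += cut
            ref_used += cut
            rest = [(length - cut, op)] + ops[i + 1:]
            break
        if op in "MIS=X":   # read bases (incl. inserts) become soft clip
            soft += length
        if op in "MDN=X":   # deletions/skips are removed, position advances
            ref_used += length
        # Pads consume nothing and are simply dropped.

    new_ops = lead_h + ([(soft, "S")] if soft else []) + rest + trail_h
    # If nothing is left aligned, the position is left as the original.
    new_pos = pos + ref_used if rest else pos
    return "".join(f"{n}{op}" for n, op in new_ops), new_pos
```

For example, under these assumptions `clip_from_front("30M5I65M", 1000, 30)` soft clips the insert that follows the clip and gives `("35S65M", 1030)`, while clipping all 100 reference bases of `100M` gives `("100S", 1000)` with the position unchanged.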

Clipping from the back

Clip Location | How it is handled
If the clip position falls in a skip/deletion | Removes the entire skip/deletion
If the position immediately before the clip is a deletion/skip/pad | Removes the deletion/skip/pad
If the position immediately before the clip is an insertion | Leaves the insertion, even if it results in a CIGAR such as 70M3I27S
Clip occurs at the first position of the read (the entire read is clipped) | Entire read is soft clipped, 0-based position is left as the original (not modified)
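The back-clipping rules can be sketched in the same way. Again this is only an illustration under assumptions, not bamUtil's implementation: `clip_from_back` and its `keep_ref` parameter (the number of reference bases, counted from the alignment start, that remain aligned) are hypothetical names chosen for the example.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume read bases
REF_OPS = set("MDN=X")    # operations that consume reference bases


def clip_from_back(cigar, keep_ref):
    """Soft clip everything past the first keep_ref reference bases."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips at either end are carried through untouched.
    lead_h = [ops.pop(0)] if ops and ops[0][1] == "H" else []
    trail_h = [ops.pop()] if ops and ops[-1][1] == "H" else []

    kept, soft, ref_pos, clipping = [], 0, 0, False
    for length, op in ops:
        if clipping:
            if op in QUERY_OPS:
                soft += length
            continue
        if op in REF_OPS and ref_pos + length > keep_ref:
            clipping = True
            if op in "M=X":
                keep_len = keep_ref - ref_pos
                if keep_len:
                    kept.append((keep_len, op))
                soft += length - keep_len
            # A deletion/skip containing the clip position is dropped whole.
            continue
        kept.append((length, op))
        if op in REF_OPS:
            ref_pos += length

    if clipping:
        # A deletion/skip/pad right before the clip is removed;
        # an insertion right before the clip is left in place.
        while kept and kept[-1][1] in "DNP":
            kept.pop()
    if soft:
        if kept and kept[-1][1] == "S":  # entire read clipped: merge clips
            soft += kept.pop()[0]
        kept.append((soft, "S"))
    return "".join(f"{n}{op}" for n, op in lead_h + kept + trail_h)
```

Under these assumptions `clip_from_back("70M3I27M", 70)` yields `70M3I27S`, the case called out in the table, and `clip_from_back("50M10D50M", 55)` yields `50M50S` because the deletion containing the clip position is removed.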


Usage

Parameters

	Required Parameters:
		--in         : the SAM/BAM file to be read
		--out        : the SAM/BAM file to be written
	Optional Parameters:
		--noeof      : do not expect an EOF block on a bam file.
		--params     : print the parameter settings
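As a rough illustration of how these parameters combine, the sketch below assembles an argument list for the tool. The helper `build_clip_overlap_command` is hypothetical, and the executable name `bam` is an assumption about how bamUtil is installed; only the `clipOverlap` option and the four flags come from the parameter list above.

```python
def build_clip_overlap_command(in_file, out_file, noeof=False, params=False,
                               exe="bam"):
    """Return an argv list for a clipOverlap run (nothing is executed)."""
    # `exe` is an assumed executable name; adjust it to your installation.
    cmd = [exe, "clipOverlap", "--in", in_file, "--out", out_file]
    if noeof:
        cmd.append("--noeof")   # do not expect an EOF block on a BAM file
    if params:
        cmd.append("--params")  # print the parameter settings
    return cmd


cmd = build_clip_overlap_command("sortedByName.bam", "clipped.bam", params=True)
```

The resulting list can be passed to `subprocess.run(cmd)` if the executable is available on the system.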


Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

The program automatically detects whether an input file is SAM, BAM, or uncompressed BAM; the user only needs to supply the filename. The exception is input read from stdin.

Use - to read from stdin. In that case the extension following the - determines the file type (no extension indicates SAM).

SAM/BAM/Uncompressed BAM from file | --in yourFileName
SAM from stdin | --in -
BAM from stdin | --in -.bam
Uncompressed BAM from stdin | --in -.ubam


Note: Uncompressed BAM is written using compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.

Output File (--out)

Use --out followed by your file name to specify the SAM/BAM output file.

The file extension determines whether SAM, BAM, or uncompressed BAM is written. Use - to write to stdout; the extension following the - determines the file type (no extension indicates SAM).

SAM to file | --out yourFileName.sam
BAM to file | --out yourFileName.bam
Uncompressed BAM to file | --out yourFileName.ubam
SAM to stdout | --out -
BAM to stdout | --out -.bam
Uncompressed BAM to stdout | --out -.ubam


Note: Uncompressed BAM is written using compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.
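The naming convention for --out can be summarized in a short sketch. The function `describe_out` is hypothetical and only mirrors the table above; treating any name that does not end in .bam or .ubam as SAM is an assumption, since the page only lists .sam and the bare - explicitly.

```python
def describe_out(name):
    """Map an --out value to (destination, format) per the table above."""
    # A bare "-" or a "-" followed by an extension means stdout.
    dest = "stdout" if name == "-" or name.startswith("-.") else "file"
    if name.endswith(".ubam"):
        fmt = "uncompressed BAM"
    elif name.endswith(".bam"):
        fmt = "BAM"
    else:
        # Assumption: anything else, including no extension, is SAM.
        fmt = "SAM"
    return dest, fmt
```

For instance, `describe_out("-.ubam")` gives `("stdout", "uncompressed BAM")` and `describe_out("yourFileName.sam")` gives `("file", "SAM")`.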



Return Value

Returns the SamStatus for the reads/writes.


Example Output