BamUtil: writeRegion

From Genome Analysis Wiki


Overview of the writeRegion function of bamUtil

The writeRegion option of the bamUtil executable uses an indexed BAM file to write only the alignments that:

  • fall within the specified region
    • a region defined by --refID or --refName, optionally narrowed by --start and/or --end
    • the regions defined in a BED file (--bed)
    • by default a read qualifies if it overlaps the region; with --withinReg it must be fully within it
  • have a specific read name (if --readName is specified)


Usage

./bam writeRegion --in <inputFilename>  --out <outputFilename> [--bamIndex <bamIndexFile>] [--refName <reference Name> | --refID <reference ID>] [--start <0-based start pos>] [--end <0-based end position>] [--bed <bed filename>] [--withinReg] [--readName <readName>] [--lshift] [--excludeFlags <flags>] [--requiredFlags <flags>] [--params] [--noeof]


Parameters

	Required Parameters:
		--in        : the BAM file to be read
		--out       : the SAM/BAM file to write to
	Optional Parameters for Specifying a Region:
		--bamIndex  : the path/name of the bam index file
		              (if not specified, uses the --in value + ".bai")
		--refName   : the BAM reference Name to read
		              Either this or refID can be specified.
		              Defaults to all references.
		--refID     : the BAM reference ID to read.
		              Either this or refName can be specified.
		              Defaults to all references.
		              Specify -1 for unmapped
		--start     : inclusive 0-based start position.
		              Defaults to -1: meaning from the start of the reference.
		              Only applicable if refName/refID is set.
		--end       : exclusive 0-based end position.
		              Defaults to -1: meaning til the end of the reference.
		              Only applicable if refName/refID is set.
		--bed       : use the specified bed file for regions.
		--withinReg : only print reads fully enclosed within the region.
		--readName  : only print reads with this read name.
	Optional Parameters For Other Operations:
		--lshift        : left shift indels when writing records
		--excludeFlags  : Skip any records with any of the specified flags set
		                  (specify an integer representation of the flags)
		--requiredFlags : Only process records with all of the specified flags set
		                  (specify an integer representation of the flags)
		--params        : print the parameter settings
		--noeof         : do not expect an EOF block on a bam file.
	PhoneHome:
		--noPhoneHome       : disable PhoneHome (default enabled)
		--phoneHomeThinning : adjust the PhoneHome thinning parameter (default 50)

Required Parameters

Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

The program automatically detects whether your input file is SAM, BAM, or uncompressed BAM, so a filename is all you need to supply. The exception is reading from stdin.

To read from stdin, use -. In that case the extension appended to the - determines the file type (a bare - indicates SAM).

Input                                Option
SAM/BAM/uncompressed BAM from file   --in yourFileName
SAM from stdin                       --in -
BAM from stdin                       --in -.bam
Uncompressed BAM from stdin          --in -.ubam


Note: Uncompressed BAM is compressed using compression level 0 (so it is not an entirely uncompressed file). This matches the samtools implementation, so pipes between our tools and samtools are supported.
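The stdin naming rule above can be summarized in a small Python sketch. The function `describe_input` is hypothetical and only restates the documented rules; for regular files it just reports that the format is detected automatically, because how bamUtil inspects the file is not described here. Treating any other stdin suffix as SAM is an assumption.

```python
def describe_input(in_arg: str) -> str:
    """Summarize how an --in value selects the input format (illustrative sketch)."""
    if in_arg == "-" or in_arg.startswith("-."):
        # stdin: the suffix after "-" picks the format; a bare "-" means SAM.
        suffix = in_arg[1:]
        if suffix == ".bam":
            return "BAM"
        if suffix == ".ubam":
            return "uncompressed BAM"
        return "SAM"  # assumption: any other suffix is treated as SAM
    # Regular file: bamUtil detects SAM/BAM/uncompressed BAM by itself.
    return "auto-detected from the file"
```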

Output File (--out)

Use --out followed by your file name to specify the SAM/BAM output file.

The file extension determines whether SAM, BAM, or uncompressed BAM is written. Use - to write to stdout; the extension appended to the - again selects the file type (no extension means SAM).

Output                        Option
SAM to file                   --out yourFileName.sam
BAM to file                   --out yourFileName.bam
Uncompressed BAM to file      --out yourFileName.ubam
SAM to stdout                 --out -
BAM to stdout                 --out -.bam
Uncompressed BAM to stdout    --out -.ubam


Note: Uncompressed BAM is compressed using compression level 0 (so it is not an entirely uncompressed file). This matches the samtools implementation, so pipes between our tools and samtools are supported.
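The output rules can be expressed as a mapping from the --out value to a destination and a format. The following Python sketch restates the table above. The function `output_target` is hypothetical, and falling back to SAM for extensions other than .bam and .ubam is an assumption beyond what is documented.

```python
import os


def output_target(out_arg: str) -> tuple[str, str]:
    """Return (destination, format) for an --out value (illustrative sketch)."""
    root, ext = os.path.splitext(out_arg)  # "-.bam" -> ("-", ".bam")
    destination = "stdout" if root == "-" else out_arg
    # .bam -> BAM, .ubam -> level-0 "uncompressed" BAM; assumption: anything else -> SAM.
    fmt = {".bam": "BAM", ".ubam": "uncompressed BAM"}.get(ext, "SAM")
    return destination, fmt
```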

Optional Region Specifying Parameters

Bam Index File (--bamIndex)

Use --bamIndex followed by your file name to specify the BAM index file to use for reading the BAM file.

If an index is needed but --bamIndex is not specified, the input file name with ".bai" appended is used (for example, --in sample.bam looks for sample.bam.bai).
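The default index lookup amounts to a single rule, shown here as a Python sketch. The function name `resolve_index` is hypothetical. Note that the rule appends ".bai" rather than replacing ".bam", so an index named sample.bai is not found unless you pass it with --bamIndex.

```python
from typing import Optional


def resolve_index(in_file: str, bam_index: Optional[str] = None) -> str:
    """Return the --bamIndex value if given, else the --in value plus ".bai"."""
    return bam_index if bam_index is not None else in_file + ".bai"
```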

Read only a Specific Reference/Chromosome (--refName or --refID)

If you only want to read a specific reference (chromosome), specify either --refName followed by the reference name or --refID followed by the reference id.

If you want to read all references, don't specify either --refName or --refID.

The reference name is the name specified in the RNAME field of the records in the SAM file or in the name fields of the reference information section of the BAM file.

The reference ID is the value specified in the refID field of the records in the BAM file.

If you want to read only unmapped reads, use --refID -1.
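In the BAM format, refID is the 0-based position of a reference in the header's reference list, with -1 used for records that have no reference. The Python sketch below shows how --refName and --refID relate under that rule. The function `resolve_ref_id` is hypothetical, and raising an error when both options are given is an assumption; the documentation only says that either one can be specified.

```python
from typing import List, Optional


def resolve_ref_id(ref_names: List[str], ref_name: Optional[str] = None,
                   ref_id: Optional[int] = None) -> Optional[int]:
    """Return the refID to read, -1 for unmapped reads, or None for all references."""
    if ref_name is not None and ref_id is not None:
        # Assumption: treat supplying both options as an error.
        raise ValueError("specify either --refName or --refID, not both")
    if ref_name is not None:
        # refID is the 0-based index of the name in the header's reference list
        # (list.index raises ValueError if the name is not in the header).
        return ref_names.index(ref_name)
    if ref_id is not None and not -1 <= ref_id < len(ref_names):
        raise ValueError(f"refID {ref_id} is out of range")
    return ref_id  # None means "all references"
```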

Read only a Specific Region of a Chromosome (--start and --end)

You can only specify a region within a chromosome if you also specify the reference/chromosome using --refName or --refID.

Use --start to specify the inclusive 0-based start position of the region you want to read. Specify --start -1 (the default) to start at the beginning of the specified chromosome.

Use --end to specify the exclusive 0-based end position of the region you want to read. Specify --end -1 (the default) to read until the end of the specified chromosome.
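Because --start is inclusive and 0-based while --end is exclusive, the pair describes a half-open interval of end - start bases. Many tools instead use 1-based, inclusive "ref:beg-end" strings, and this Python sketch converts between the two conventions. The function `to_one_based` is hypothetical and, for simplicity, rejects the -1 "whole reference" defaults.

```python
def to_one_based(ref: str, start: int, end: int) -> str:
    """Convert a 0-based, end-exclusive --start/--end pair to a 1-based, inclusive region."""
    if start < 0 or end < 0:
        raise ValueError("-1 (start/end of reference) has no fixed coordinate")
    if end <= start:
        raise ValueError("end must be greater than start")
    # 0-based inclusive start s is 1-based position s + 1;
    # 0-based exclusive end e is 1-based inclusive position e.
    return f"{ref}:{start + 1}-{end}"
```

For example, --start 999 --end 2000 covers the 1001 bases written in 1-based form as chr1:1000-2000.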

Bed File with Regions to Write (--bed)

If --bed is followed by a filename, the alignments in the regions listed in that BED file are written.

It is assumed that the regions in the bed file are sorted.
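BED lines hold a chromosome, a 0-based start, and an exclusive end, which matches the --start/--end convention. The Python sketch below parses BED lines and checks a conservative notion of "sorted": each chromosome appears as one contiguous block, with non-decreasing starts. The function names are hypothetical. The exact ordering bamUtil expects (for example, chromosomes in BAM header order) is not stated here, so this check is an assumption.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, 0-based start, exclusive end)


def read_bed(lines: Iterable[str]) -> List[Region]:
    """Parse the first three BED columns, skipping blank, comment, and header lines."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def is_sorted(regions: List[Region]) -> bool:
    """True if each chromosome is one contiguous block with non-decreasing starts."""
    seen = set()
    prev_chrom, prev_start = None, -1
    for chrom, start, _ in regions:
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # chromosome reappears after a different one
            seen.add(chrom)
            prev_chrom, prev_start = chrom, -1
        if start < prev_start:
            return False
        prev_start = start
    return True
```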

Only Write Reads Fully within the Specified Region (--withinReg)

By default, reads that overlap the specified region are written. To write only reads that are fully within the specified regions, use the --withinReg option.
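With half-open coordinates, the difference between the default overlap test and --withinReg can be written as two comparisons. In this Python sketch, a read's span is assumed to be its aligned reference bases [read_start, read_end). How bamUtil treats clipped bases at the boundaries is not described here. The function names are hypothetical.

```python
def overlaps(read_start: int, read_end: int, reg_start: int, reg_end: int) -> bool:
    """Default: the read shares at least one reference base with the region."""
    return read_start < reg_end and read_end > reg_start


def within(read_start: int, read_end: int, reg_start: int, reg_end: int) -> bool:
    """--withinReg: every aligned reference base of the read lies inside the region."""
    return reg_start <= read_start and read_end <= reg_end


# Region [1000, 2000), as given by --start 1000 --end 2000.
region = (1000, 2000)
for read in [(990, 1090), (1000, 1100), (1950, 2050), (2000, 2100)]:
    print(read, overlaps(*read, *region), within(*read, *region))
```

A read that starts exactly at the exclusive end (2000) neither overlaps nor lies within the region.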

Only Print Reads with a Specified Read Name (--readName)

If you only want to print reads with a specific read name, use the --readName option followed by the read name.

Optional Parameters For Other Operations

Left Shift Indels in the CIGAR (--lshift)

Use --lshift to left shift indels as far as they can go in the read when writing records.
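Left shifting matters because an indel in a repeat can be placed at several equivalent positions. The Python sketch below illustrates the idea for a single deletion, using reference coordinates only. It is not bamUtil's implementation, which operates on each record's CIGAR. The function `left_shift_deletion` is hypothetical.

```python
def left_shift_deletion(ref: str, pos: int, length: int) -> int:
    """Return the leftmost 0-based start at which deleting `length` bases of `ref`
    yields the same sequence as deleting ref[pos:pos + length]."""
    # Shifting the deletion one base left gives an identical result exactly
    # when the base just before it equals the last deleted base.
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos
```

For "GAAAAT", deleting the A at position 3 is equivalent to deleting the A at position 1, which is the left-shifted placement.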

Skip Records with any of the Specified Flags (--excludeFlags)

Use --excludeFlags followed by the flags (as one integer) to skip any record that has any of the specified flags set.

This parameter was added in version 1.0.10.
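The integer is the bitwise OR of SAM FLAG bits. This Python sketch builds the value that excludes unmapped, secondary, and duplicate records (4 + 256 + 1024 = 1284, so --excludeFlags 1284) and shows the "any bit set" test. The constant and function names are hypothetical; the bit values come from the SAM specification.

```python
# SAM FLAG bits (values from the SAM specification).
READ_UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400

# Value for --excludeFlags that drops unmapped, secondary, and duplicate records.
EXCLUDE = READ_UNMAPPED | SECONDARY | DUPLICATE  # 1284


def is_excluded(flag: int, exclude_flags: int) -> bool:
    """A record is skipped if it has ANY of the excluded bits set."""
    return (flag & exclude_flags) != 0
```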

Only Process Records with All of the Specified Flags (--requiredFlags)

Use --requiredFlags followed by the flags (as one integer) to only process records with all of the specified flags set.

This parameter was added in version 1.0.10.
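Unlike --excludeFlags, --requiredFlags is an "all bits set" test. For example, --requiredFlags 3 keeps only records that are paired (0x1) and in a proper pair (0x2). The Python sketch below combines both filters. Requiring both conditions to hold when the two options are given together is an assumption, and the names are hypothetical.

```python
PAIRED = 0x1
PROPER_PAIR = 0x2

REQUIRED = PAIRED | PROPER_PAIR  # 3


def passes_flag_filters(flag: int, exclude_flags: int = 0, required_flags: int = 0) -> bool:
    """Keep a record only if it has ALL required bits and NONE of the excluded bits."""
    has_all_required = (flag & required_flags) == required_flags
    has_no_excluded = (flag & exclude_flags) == 0
    return has_all_required and has_no_excluded
```

With both values left at 0, every record passes, which matches not specifying either option.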

Print the Program Parameters (--params)

Use --params to print the parameters for your program to stderr.

Do not require BGZF EOF block (--noeof)

Use --noeof if you do not expect a trailing EOF block in your BGZF file.

By default, the trailing empty block is expected and checked for.
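The SAM/BAM specification defines the BGZF EOF marker as a fixed 28-byte empty block. This Python sketch checks whether a file ends with that marker, which is the condition --noeof relaxes. The function names are hypothetical, and the sketch is not how bamUtil performs the check internally.

```python
# The 28-byte empty BGZF block that the SAM/BAM specification defines as the EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def ends_with_bgzf_eof(data: bytes) -> bool:
    """True if the byte string ends with the BGZF EOF block."""
    return data.endswith(BGZF_EOF)


def has_bgzf_eof(path: str) -> bool:
    """Read only the last 28 bytes of a file and compare them with the EOF block."""
    with open(path, "rb") as fh:
        size = fh.seek(0, 2)  # seek to the end; returns the file size
        if size < len(BGZF_EOF):
            return False
        fh.seek(-len(BGZF_EOF), 2)
        return ends_with_bgzf_eof(fh.read())
```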

PhoneHome Parameters

See PhoneHome for more information on how PhoneHome works and what it does.

Turn off PhoneHome (--noPhoneHome)

Use the --noPhoneHome option to completely disable PhoneHome. By default, PhoneHome is enabled and runs according to the --phoneHomeThinning parameter.

Adjust the Frequency of PhoneHome (--phoneHomeThinning)

Use --phoneHomeThinning to modify the percentage of the time that PhoneHome will run (0-100).

  • By default, --phoneHomeThinning is set to 50, running 50% of the time.
  • PhoneHome will only occur if the run's random number modulo 100 is less than the --phoneHomeThinning value.
  • N/A if --noPhoneHome is set.
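The thinning rule above can be sketched in Python as follows. The source of the per-run random number in bamUtil is not described here, so `random.randrange` is a stand-in, and the function name `should_phone_home` is hypothetical.

```python
import random
from typing import Optional


def should_phone_home(thinning: int = 50, no_phone_home: bool = False,
                      rng: Optional[random.Random] = None) -> bool:
    """Decide whether a run would contact PhoneHome (illustrative sketch)."""
    if no_phone_home:
        return False  # --noPhoneHome disables PhoneHome; thinning does not apply
    if rng is None:
        rng = random.Random()
    # Stand-in for the run's random number: keep the run if (number % 100) < thinning.
    return rng.randrange(2**31) % 100 < thinning
```

A thinning of 0 never runs PhoneHome and 100 always does; the default of 50 runs it about half the time.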

Return Value

  • 0: all records are successfully read and written.
  • non-0: at least one record was not successfully read or written.

Example Output


Wrote t.sam with 2 records.