Difference between revisions of "BamUtil: writeRegion"

From Genome Analysis Wiki
(Remove redundant description.)
Revision as of 13:12, 6 March 2012


Overview of the writeRegion function of bamUtil

The writeRegion option on the bamUtil executable uses an indexed BAM file to write only the alignments that:

  • fall within the region specified
    • region as defined by refID or refName and start and/or end
    • region as defined in the bed file
    • overlapping the region by default, or fully within it if --withinReg is specified
  • have a specific read name (if specified)


Usage

./bam writeRegion --in <inputFilename>  --out <outputFilename> [--bamIndex <bamIndexFile>] [--refName <reference Name> | --refID <reference ID>] [--start <0-based start pos>] [--end <0-based end position>] [--bed <bed filename>] [--withinReg] [--readName <readName>] [--lshift] [--params] [--noeof]


Parameters

	Required Parameters:
		--in        : the BAM file to be read
		--out       : the SAM/BAM file to write to
	Optional Parameters for Specifying a Region:
		--bamIndex  : the path/name of the bam index file
		              (if not specified, uses the --in value + ".bai")
		--refName   : the BAM reference Name to read
		              Either this or refID can be specified.
		              Defaults to all references.
		--refID     : the BAM reference ID to read.
		              Either this or refName can be specified.
		              Defaults to all references.
		              Specify -1 for unmapped
		--start     : inclusive 0-based start position.
		              Defaults to -1: meaning from the start of the reference.
		              Only applicable if refName/refID is set.
		--end       : exclusive 0-based end position.
		              Defaults to -1: meaning until the end of the reference.
		              Only applicable if refName/refID is set.
		--bed       : use the specified bed file for regions.
		--withinReg : only print reads fully enclosed within the region.
		--readName  : only print reads with this read name.
	Optional Parameters For Other Operations:
		--lshift    : left shift indels when writing records
		--params    : print the parameter settings
		--noeof     : do not expect an EOF block on a bam file.
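
The constraints in the parameter list above (either --refName or --refID, and --start/--end only together with a reference) can be sketched as a small command builder. This is an illustrative helper, not part of bamUtil: the function name build_write_region_cmd and its keyword arguments are hypothetical, and only the flags documented above are emitted.

```python
def build_write_region_cmd(in_file, out_file, ref_name=None, ref_id=None,
                           start=None, end=None, bed=None,
                           within_reg=False, read_name=None, lshift=False,
                           exe="./bam"):
    """Assemble an argument list for writeRegion from the documented options.

    Hypothetical helper: it only builds the list, it does not run the tool.
    """
    # The documentation says either refName or refID can be specified.
    if ref_name is not None and ref_id is not None:
        raise ValueError("specify --refName or --refID, not both")
    # --start/--end are only applicable if refName/refID is set.
    if (start is not None or end is not None) and ref_name is None and ref_id is None:
        raise ValueError("--start/--end require --refName or --refID")

    cmd = [exe, "writeRegion", "--in", in_file, "--out", out_file]
    if ref_name is not None:
        cmd += ["--refName", str(ref_name)]
    if ref_id is not None:
        cmd += ["--refID", str(ref_id)]
    if start is not None:
        cmd += ["--start", str(start)]
    if end is not None:
        cmd += ["--end", str(end)]
    if bed is not None:
        cmd += ["--bed", bed]
    if within_reg:
        cmd.append("--withinReg")
    if read_name is not None:
        cmd += ["--readName", read_name]
    if lshift:
        cmd.append("--lshift")
    return cmd


print(" ".join(build_write_region_cmd("in.bam", "out.sam",
                                      ref_name="1", start=100, end=200)))
```

The returned list could be passed to subprocess.run, whose returncode would then carry the value described under Return Value.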

Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

The program automatically determines if your input file is SAM/BAM/uncompressed BAM without any input other than a filename from the user, unless your input file is stdin.

Use - to read from stdin. In that case the extension appended to - determines the file type (no extension indicates SAM).

SAM/BAM/Uncompressed BAM from file    --in yourFileName
SAM from stdin                        --in -
BAM from stdin                        --in -.bam
Uncompressed BAM from stdin           --in -.ubam


Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file). This matches the samtools implementation so pipes between our tools and samtools are supported.

Output File (--out)

Use --out followed by your file name to specify the SAM/BAM output file.

The file extension determines whether SAM, BAM, or uncompressed BAM is written. Use - to write to stdout. In that case the extension appended to - determines the file type (no extension indicates SAM).

SAM to file                   --out yourFileName.sam
BAM to file                   --out yourFileName.bam
Uncompressed BAM to file      --out yourFileName.ubam
SAM to stdout                 --out -
BAM to stdout                 --out -.bam
Uncompressed BAM to stdout    --out -.ubam


Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file). This matches the samtools implementation so pipes between our tools and samtools are supported.
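
The mapping from an --out value to a destination and format can be sketched as follows. This only mirrors the cases listed above. The function name describe_output is hypothetical, and treating every other extension as SAM is an assumption rather than documented behavior.

```python
def describe_output(name):
    """Map an --out value to (destination, format) following the cases above."""
    # "-" alone or "-" followed by an extension means stdout.
    destination = "stdout" if name == "-" or name.startswith("-.") else "file"
    if name.endswith(".ubam"):
        fmt = "uncompressed BAM"
    elif name.endswith(".bam"):
        fmt = "BAM"
    else:
        # Assumption: anything else, including no extension, is treated as SAM.
        fmt = "SAM"
    return destination, fmt


for value in ["yourFileName.sam", "yourFileName.bam", "-", "-.bam", "-.ubam"]:
    print(value, describe_output(value))
```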

Bam Index File (--bamIndex)

Use --bamIndex followed by your file name to specify the BAM index file to use for reading the BAM file.

If the index file is required but not specified, the program uses the input file name + ".bai".

Region Specifying Parameters

Read only a Specific Reference/Chromosome (--refName or --refID)

If you only want to read a specific reference (chromosome), specify either the reference name following --refName or the reference ID following --refID.

If you want to read all references, don't specify either --refName or --refID.

The reference name is the name specified in the RNAME field of the records in the SAM file or in the name fields of the reference information section of the BAM file.

The reference ID is the value specified in the refID field of the records in the BAM file.

If you want to read only unmapped reads, use --refID -1.

Read only a Specific Region of a Chromosome (--start and --end)

You can only specify a specific region if you also specify a specific reference/chromosome using --refName or --refID.

Use --start to specify the inclusive 0-based start position of the region you want to read. Specify --start -1 to start at the beginning of the specified chromosome.

Use --end to specify the exclusive 0-based end position of the region you want to read. Specify --end -1 to read through the end of the specified chromosome.
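
Because --start is inclusive and 0-based while --end is exclusive, coordinates taken from 1-based inclusive sources (such as the POS field of a SAM record) need converting. A minimal sketch, where the helper name region_args is hypothetical:

```python
def region_args(first_1based=None, last_1based=None):
    """Convert a 1-based inclusive range to --start/--end values.

    --start is 0-based inclusive, so subtract 1 from the first position.
    --end is 0-based exclusive, which equals the 1-based inclusive last position.
    None maps to -1, the documented value for "start/end of the reference".
    """
    start = -1 if first_1based is None else first_1based - 1
    end = -1 if last_1based is None else last_1based
    return ["--start", str(start), "--end", str(end)]


# 1-based positions 101 through 200 inclusive (100 bases).
print(region_args(101, 200))
# From position 5000 to the end of the chromosome.
print(region_args(5000, None))
```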

Bed File with Regions to Write (--bed)

If --bed is specified followed by a filename, reads in the regions listed in that bed file are written.

It is assumed that the regions in the bed file are sorted.
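
Since the regions are assumed to be sorted, it can help to check a bed file before passing it to --bed. The sketch below assumes that "sorted" means records are grouped by chromosome with non-decreasing start positions. The exact ordering writeRegion expects is not stated here, so treat that definition and the function name bed_is_sorted as assumptions.

```python
def bed_is_sorted(lines):
    """Return True if BED records are grouped by chromosome with non-decreasing starts."""
    seen = set()
    prev_chrom, prev_start = None, -1
    for line in lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start = fields[0], int(fields[1])
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # chromosome reappears after a different one
            seen.add(chrom)
            prev_chrom, prev_start = chrom, -1
        if start < prev_start:
            return False  # start position went backwards within a chromosome
        prev_start = start
    return True


print(bed_is_sorted(["1\t100\t200", "1\t150\t300", "2\t0\t50"]))
print(bed_is_sorted(["1\t300\t400", "1\t100\t200"]))
```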

Only Write Reads Fully within the Specified Region (--withinReg)

By default, reads that overlap the specified region are written. If instead you only want to write reads that are fully within the specified region, use the --withinReg option.
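
The difference between the default overlap test and --withinReg can be illustrated with half-open intervals. This is a sketch of the general idea, not bamUtil's code: keep_read is a hypothetical name, and read positions are assumed to be 0-based with an exclusive end, matching the --start/--end convention.

```python
def keep_read(read_start, read_end, region_start, region_end, within_reg=False):
    """Decide whether a read passes the region filter.

    All coordinates are 0-based; start is inclusive and end is exclusive.
    """
    if within_reg:
        # Fully enclosed: the read may not extend past either region boundary.
        return region_start <= read_start and read_end <= region_end
    # Default: any shared position counts as an overlap.
    return read_start < region_end and region_start < read_end


region = (100, 200)
print(keep_read(90, 110, *region))                   # overlaps the region start
print(keep_read(90, 110, *region, within_reg=True))  # not fully enclosed
print(keep_read(120, 180, *region, within_reg=True)) # fully enclosed
```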

Only Print Reads with a Specified Read Name (--readName)

If you only want to print reads with a specific read name, use the --readName option followed by the read name.


Left Shift Indels in the CIGAR (--lshift)

Left shift indels as far as they can go in the read.
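
The general idea of left shifting can be sketched for a single deletion in a repetitive sequence: the deletion can move one base to the left whenever the base before it equals the last deleted base, because both placements produce the same resulting sequence. This illustrates the common left-alignment technique only. It is not bamUtil's implementation, the function names are hypothetical, and using a reference string here is an assumption made for the example.

```python
def left_shift_deletion(ref, pos, length):
    """Return the leftmost start of a deletion equivalent to deleting ref[pos:pos+length]."""
    # Shifting left by one is valid while the base entering the deletion on the
    # left matches the base leaving it on the right.
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos


def apply_deletion(ref, pos, length):
    """Remove `length` bases starting at 0-based position `pos`."""
    return ref[:pos] + ref[pos + length:]


ref = "GCACACAT"
shifted = left_shift_deletion(ref, 4, 2)
# Both placements of the 2-base deletion give the same sequence.
print(shifted, apply_deletion(ref, 4, 2) == apply_deletion(ref, shifted, 2))
```

In this example the deletion moves from position 4 to position 1 within the CA repeat, and the resulting sequence is unchanged.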

Do not require BGZF EOF block (--noeof)

Use --noeof if you do not expect a trailing EOF block in your BGZF file.

By default, the trailing empty block is expected and checked for.

Print the Program Parameters (--params)

Use --params to print the parameters for your program to stderr.


Return Value

  • 0: all records are successfully read and written.
  • non-0: at least one record was not successfully read or written.

Example Output


Wrote t.sam with 2 records.