Bam read count
From Genome Analysis Wiki
Jump to navigationJump to search
Introduction
- The tools calculates the read count for each region in the input list of regions from a BAM file, and also outputs the normalized read count as Read Per Million Mapped Reads per Kilobases (RPKM). To correct for the bias of the read count due to GC bias, it will also output the GC contact of each region along with the total reads mapped to the corresponding GC content bins. Bu providing this statistics, the extent of GC bias can be investigated and if excessive bias is observed this statistic can be used to correct the global GC bias.
- This tool was initially developed for RNA-seq data to quantify gene/exon expression but can also be used as other sequences as well.
Usage
A command without any input will display the basic usage
readcount
The following parameters are available. Ones with "[]" are in effect: Input Files : --reference [/data/local/ref/karma.ref/human.g1k.v37], --regions [/home/bingshan/data/db/refFlat.tbl/refFlat.exons.v37.tbl] Mapping filters : --min_overlap [1], --min_map_quality Normalization : --norm_by_mapped2target Output file : --exon_out [], --gene_out [], --uniq
An example command for quantify RNAseq gene/exon expression looks like the following:
./readCount --reference hg19.fa --regions refFalt.exon.hg19 --min_overlap 5 --uniq --exon_out exon.readcount --gene_out gene.readcount input.bam
Input files
Required input files are --reference, --regions and input.bam
- An example --regions file is a BEF file and looks like the following:
1 1000 2000 GENE1 1 5000 6000 GENE1 1 8000 9000 GENE1 2 1000 2000 GENE2 2 5000 6000 GENE2 2 8000 9000 GENE2 ...
Other options
Some of command line options are explained below and others are self-explanatory.
--min_overlap : minimum number of bases overlapping an input region to consider that the read is in the region. --uniq : If a read is mapped to multiple regions (e.g. exons) this read will be counted only once toward the read count in the gene --exon_cout : read count for each exon. A read may be counted to multiple exons if this read is mapped to multiple exons. --gene_cout : read count for each gene.
Output files
Contact
For questions please contact the authors (Bingshan Li: bingshan@umich.edu