Changes

385 bytes added , 15:29, 14 August 2012

Line 1: Line 1:

== Introduction ==

−

* The ~~tools calculates the read count~~ for ~~each region in~~ the input ~~list of regions from a BAM~~ file, ~~and also outputs~~ the ~~normalized read count as Read Per Million Mapped Reads per Kilobases (RPKM). To correct for the bias of the read count due to GC bias, it~~ will ~~also output the GC contact of each region along with the total reads mapped to the corresponding GC content bins. Bu providing this statistics, the extent of GC bias can~~ be ~~investigated and if excessive bias is observed this statistic can be used to correct the global GC bias.~~

+

'''NOTE: The current version works only for single-end reads. If the input BAM file contains paired-end sequences, reads from the same fragment will be counted independently.'''

−

* This tool was initially developed for RNA-seq data to quantify gene/exon expression but can also be used for other ~~sequences~~ as well.

+

* The tool calculates the read count for each region in the input list of regions from a BAM file, and also outputs the normalized read count as Reads Per Kilobase per Million mapped reads (RPKM). To help correct for GC bias in the read counts, it also outputs the GC content of each region along with the total number of reads mapped to the corresponding GC content bin. With this statistic, the extent of GC bias can be investigated and, if excessive bias is observed, the same statistic can be used to correct the global GC bias.

+

* This tool was initially developed for RNA-seq data to quantify gene/exon expression, but it can also be used for other purposes.
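
The RPM and RPKM normalization mentioned above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function names and the example numbers are hypothetical, and the sketch assumes the usual definitions (RPM is the read count per million mapped reads, and RPKM is RPM per kilobase of region length).

```python
def rpm(read_count, total_mapped_reads):
    """Reads per million mapped reads."""
    return read_count * 1e6 / total_mapped_reads


def rpkm(read_count, region_length_bp, total_mapped_reads):
    """RPM divided by the region length expressed in kilobases."""
    return rpm(read_count, total_mapped_reads) / (region_length_bp / 1000.0)


# Hypothetical example: 200 reads in a 2,000 bp region,
# with 10 million mapped reads in the whole BAM file.
example_rpm = rpm(200, 10_000_000)
example_rpkm = rpkm(200, 2000, 10_000_000)
```

Dividing by the region length makes long and short regions comparable, while dividing by the total number of mapped reads makes samples of different sequencing depth comparable.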

== Usage ==

Line 45: Line 47:

--min_overlap : minimum number of bases overlapping an input region to consider that the read is in the region.

+

--min_map_quality : reads with mapping quality below this value will be ignored.

--uniq : If a read is mapped to multiple regions (e.g. exons), it will be counted only once toward the read count of the gene.

−

--norm_by_mapped2target : ~~If specified, the~~ RPM and RPKM will be normalized by ~~the total number of~~ reads mapped to ~~all~~ target regions. ~~The default~~ is to use all mapped reads ~~(including off-target reads) for normalization~~

+

--norm_by_mapped2target : RPM and RPKM will be normalized by the number of reads mapped to target regions only. The default is to use all mapped reads in the BAM file.

--exon_cout : read count for each exon. A read may be counted toward multiple exons if it is mapped to more than one exon.

--gene_cout : read count for each gene.
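
The filtering and normalization options above can be illustrated with a short Python sketch. It is not the tool's code: the function names, the half-open coordinate convention, and the default values are assumptions made for the example, and reads are represented as plain (start, end, mapq) tuples rather than BAM records.

```python
def overlap_bases(read_start, read_end, region_start, region_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(read_end, region_end) - max(read_start, region_start))


def count_reads(reads, region, min_overlap=1, min_map_quality=0):
    """Count reads that pass the quality filter and overlap the region enough.

    reads  : iterable of (start, end, mapq) tuples (hypothetical representation)
    region : (start, end) tuple
    """
    n = 0
    for start, end, mapq in reads:
        if mapq < min_map_quality:
            continue  # analogous to --min_map_quality
        if overlap_bases(start, end, region[0], region[1]) >= min_overlap:
            n += 1    # analogous to --min_overlap
    return n


def normalization_total(all_mapped, mapped_to_targets, norm_by_mapped2target=False):
    """Pick the denominator used for RPM/RPKM (analogous to --norm_by_mapped2target)."""
    return mapped_to_targets if norm_by_mapped2target else all_mapped


# Hypothetical reads around a region spanning positions 100..200.
reads = [(90, 140, 60), (195, 245, 60), (120, 170, 5)]
region = (100, 200)
strict = count_reads(reads, region, min_overlap=10, min_map_quality=20)
loose = count_reads(reads, region)
```

In this example the second read overlaps the region by only 5 bases and the third has low mapping quality, so the strict settings keep one read while the permissive settings keep all three.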

Line 69: Line 72:

AACSL:5 2171 15 0.780 0.359 49.65 443653

AADAC:3 1607 172 8.949 5.569 38.39 468603
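
The column headers are not visible in this excerpt. Assuming the second to fifth columns are region length (bp), read count, RPM, and RPKM, the two rows above are consistent with RPKM = RPM / (length in kb), as the following Python check suggests. The column interpretation is an assumption, not something stated by the tool's documentation here.

```python
# (name, length_bp, read_count, rpm, rpkm) copied from the two sample rows;
# the meaning assigned to each column is an assumption.
rows = [
    ("AACSL:5", 2171, 15, 0.780, 0.359),
    ("AADAC:3", 1607, 172, 8.949, 5.569),
]


def rpkm_from_rpm(rpm, length_bp):
    """RPKM recomputed from RPM and the region length in base pairs."""
    return rpm / (length_bp / 1000.0)


# Recompute RPKM for every row and compare with the reported value.
recomputed = {name: rpkm_from_rpm(rpm, length) for name, length, _, rpm, _ in rows}
consistent = all(abs(recomputed[name] - rpkm) < 1e-3 for name, _, _, _, rpkm in rows)
```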

+

== Download ==

+

The latest version of the source code (v0.01) can be [[Media:readcount.0.01.tar.gz | downloaded]] here. To compile, cd into readcount.0.01/ and type make. The executable is placed in the bin/ directory.

== Contact ==

Bingshan


Bam read count (view source)

Revision as of 15:29, 14 August 2012
