Changes

From Genome Analysis Wiki
385 bytes added ,  15:29, 14 August 2012
Line 1: Line 1:  
== Introduction ==
   −
* The tools calculates the read count for each region in the input list of regions from a BAM file, and also outputs the normalized read count as Read Per Million Mapped Reads per Kilobases (RPKM). To correct for the bias of the read count due to GC bias, it will also output the GC contact of each region along with the total reads mapped to the corresponding GC content bins. Bu providing this statistics, the extent of GC bias can be investigated and if excessive bias is observed this statistic can be used to correct the global GC bias.
+
'''NOTE: The current version works only for single-end reads. If the input BAM file contains paired-end sequences, reads from the same fragment will be counted independently.'''
   −
* This tool was initially developed for RNA-seq data to quantify gene/exon expression but can also be used for other sequences as well.
+
* The tool calculates the read count for each region in the input list of regions from a BAM file, and also outputs the normalized read count as Reads Per Kilobase per Million mapped reads (RPKM). To help correct read counts for GC bias, it also outputs the GC content of each region along with the total number of reads mapped to the corresponding GC content bin. With these statistics, the extent of GC bias can be investigated, and if excessive bias is observed, they can be used to correct the global GC bias.
 +
 
 +
* This tool was initially developed for RNA-seq data to quantify gene/exon expression but can also be used for other purposes.
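
The GC statistics described above can be illustrated with a minimal Python sketch. It is not the tool's own code: the function names, the 5% bin width, and the choice to ignore N bases are all assumptions made for illustration.

```python
from collections import defaultdict


def gc_percent(sequence):
    """Percentage of G/C among the A/C/G/T bases of a region (N is ignored)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def gc_bin(percent, bin_width=5.0):
    """Lower edge of the GC bin holding `percent` (bin_width is hypothetical)."""
    n_bins = int(100 // bin_width)
    # 100% GC is folded into the last bin instead of opening a new one.
    index = min(int(percent // bin_width), n_bins - 1)
    return index * bin_width


def reads_per_gc_bin(regions, bin_width=5.0):
    """Sum read counts over regions that fall into the same GC bin.

    `regions` is an iterable of (name, sequence, read_count) tuples.
    """
    totals = defaultdict(int)
    for _name, sequence, read_count in regions:
        totals[gc_bin(gc_percent(sequence), bin_width)] += read_count
    return dict(totals)
```

Comparing the per-bin totals across bins is one way to see whether read counts rise or fall with GC content.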
    
== Usage ==
Line 45: Line 47:     
  --min_overlap : minimum number of bases overlapping an input region to consider that the read is in the region.
 +
  --min_map_quality : reads with mapping quality below this number will be ignored
 
  --uniq : If a read is mapped to multiple regions (e.g. exons) this read will be counted only once toward the read count in the gene
  --norm_by_mapped2target : If specified, the RPM and RPKM will be normalized by the total number of reads mapped to all target regions. The default is to use all mapped reads (including off-target reads) for normalization
+
  --norm_by_mapped2target : If specified, the RPM and RPKM will be normalized by the number of reads mapped to the target regions only. The default is to use all mapped reads in the BAM file.
 
  --exon_cout : read count for each exon. A read may be counted to multiple exons if this read is mapped to multiple exons.
 
  --gene_cout : read count for each gene.
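
As a rough illustration of how --min_overlap and --min_map_quality might act on a single read, here is a minimal Python sketch. It is not taken from the tool's source: the function names, the default values, and the 0-based half-open coordinates are assumptions.

```python
def overlap_bases(read_start, read_end, region_start, region_end):
    """Number of bases shared by a read and a region.

    Coordinates are assumed to be 0-based and half-open: [start, end).
    """
    return max(0, min(read_end, region_end) - max(read_start, region_start))


def read_is_counted(read_start, read_end, map_quality,
                    region_start, region_end,
                    min_overlap=1, min_map_quality=0):
    """True if the read passes both filters for this region.

    The defaults are hypothetical; the tool's real defaults are not
    stated in the text above.
    """
    if map_quality < min_map_quality:
        return False  # the read is ignored regardless of where it maps
    shared = overlap_bases(read_start, read_end, region_start, region_end)
    return shared >= min_overlap
```

For example, a read spanning [100, 150) shares 10 bases with a region spanning [140, 200), so it is counted with min_overlap=10 but not with min_overlap=11.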
Line 69: Line 72:  
  AACSL:5 2171    15      0.780  0.359  49.65  443653
 
  AADAC:3 1607    172    8.949  5.569  38.39  468603
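
The RPM and RPKM normalization can be checked against the two example rows with a short Python sketch. Two things here are assumptions, not statements from the page: the columns are read as region, length, read count, RPM, RPKM, GC %, and reads in the GC bin; and the total of 19,220,000 reads is a hypothetical value chosen because it reproduces the printed numbers.

```python
def rpm(read_count, total_reads):
    """Reads per million: read count scaled by the normalization total."""
    return read_count / total_reads * 1e6


def rpkm(read_count, region_length, total_reads):
    """Reads per kilobase per million: RPM divided by region length in kb."""
    return rpm(read_count, total_reads) / (region_length / 1000.0)


# Hypothetical normalization total (not given on the page).
TOTAL_READS = 19_220_000

# (region, length in bases, read count) taken from the example rows.
rows = [("AACSL:5", 2171, 15), ("AADAC:3", 1607, 172)]

for name, length, count in rows:
    print(name,
          round(rpm(count, TOTAL_READS), 3),
          round(rpkm(count, length, TOTAL_READS), 3))
```

With --norm_by_mapped2target, total_reads would be the number of reads mapped to the target regions; by default it would be all mapped reads in the BAM file.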
 +
 +
== Download ==
 +
The latest version of the source code, v0.01, can be [[Media:readcount.0.01.tar.gz | downloaded]] here. To compile, cd into readcount.0.01/ and type make. The executable is placed in the bin/ directory.
    
== Contact ==