Changes

From Genome Analysis Wiki
890 bytes added ,  15:29, 14 August 2012
Line 1: Line 1:  
== Introduction ==
 
   −
* The tools calculates the read count for each region in the input list of regions from a BAM file, and also outputs the normalized read count as Read Per Million Mapped Reads per Kilobases (RPKM). To correct for the bias of the read count due to GC bias, it will also output the GC contact of each region along with the total reads mapped to the corresponding GC content bins. Bu providing this statistics, the extent of GC bias can be investigated and if excessive bias is observed this statistic can be used to correct the global GC bias.
+
'''NOTE: The current version works only for single-end reads. If the input BAM file contains paired-end sequences, reads from the same fragment will be counted independently.'''
   −
* This tool was initially developed for RNA-seq data to quantify gene/exon expression but can also be used for other sequences as well.
+
* The tool calculates the read count for each region in the input list of regions from a BAM file, and also outputs the normalized read count as Reads Per Kilobase per Million mapped reads (RPKM). To help correct read-count bias due to GC content, it also outputs the GC content of each region along with the total reads mapped to the corresponding GC content bin. With this statistic, the extent of GC bias can be investigated, and if excessive bias is observed, the statistic can be used to correct the global GC bias.
 +
 
 +
* This tool was initially developed for RNA-seq data to quantify gene/exon expression but can also be used for other purposes.
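
The normalization described above can be sketched in a few lines of Python. This is a minimal illustration of the standard RPM and RPKM definitions, not the tool's own code; the function names and the example numbers (15 reads, a 2,171 bp region, 19.23 million mapped reads) are assumptions chosen for illustration.

```python
def rpm(read_count, total_mapped_reads):
    """Reads per million mapped reads."""
    return read_count / (total_mapped_reads / 1e6)


def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return rpm(read_count, total_mapped_reads) / (region_length_bp / 1e3)


# Hypothetical input: 15 reads on a 2,171 bp region, 19.23 million mapped reads.
print(round(rpm(15, 19_230_000), 3))
print(round(rpkm(15, 2171, 19_230_000), 3))
```

The denominator passed as total_mapped_reads is what changes when normalizing by reads mapped to targets rather than by all mapped reads.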
    
== Usage ==
 
Line 22: Line 24:     
  ./readCount --reference hg19.fa --regions refFalt.exon.hg19 --min_overlap 5 --uniq --exon_out exon.readcount --gene_out gene.readcount input.bam
 
 +
 +
The following example generates normalized read counts using only the reads that are mapped to targets:
 +
 +
  ./readCount --reference hg19.fa --regions refFalt.exon.hg19 --min_overlap 5 --uniq --norm_by_mapped2target --exon_out exon.readcount --gene_out gene.readcount input.bam
    
== Input files ==
 
Line 27: Line 33:  
Required input files are --reference, --regions and input.bam
 
   −
* An example --regions file is a BEF file and looks like the following:
+
* An example --regions file is a BED file and looks like the following:
 
  1  1000  2000  GENE1
 
 
  1  5000  6000  GENE1
 
Line 41: Line 47:     
  --min_overlap : minimum number of bases overlapping an input region to consider that the read is in the region.
 
 +
  --min_map_quality : reads with mapping quality below this number will be ignored.
 
  --uniq : If a read is mapped to multiple regions (e.g. exons) this read will be counted only once toward the read count in the gene
 
 +
  --norm_by_mapped2target : RPM and RPKM will be normalized by the number of reads mapped to target regions only. The default is to use all mapped reads in the BAM file.
 
  --exon_out : read count for each exon. A read may be counted toward multiple exons if it is mapped to multiple exons.
 
  --gene_out : read count for each gene.
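
The counting rules described by the options above can be illustrated with a short Python sketch. It is not the tool's implementation: the data layout (reads as lists of aligned blocks, regions as BED-style 0-based half-open intervals), the function names and the example reads are all assumptions. It models a minimum overlap of 5 bases, per-exon counts where a read may hit several exons, and per-gene counts where each read is counted once.

```python
from collections import defaultdict


def overlap_bp(blocks, region_start, region_end):
    """Total bases of a read's aligned blocks that fall inside a half-open region."""
    return sum(
        max(0, min(b_end, region_end) - max(b_start, region_start))
        for b_start, b_end in blocks
    )


def count_reads(reads, exons, min_overlap=5):
    """Count reads per exon and per gene (each read counted once per gene).

    reads: list of (chrom, [(block_start, block_end), ...])  # hypothetical layout
    exons: list of (chrom, start, end, gene), as in the BED example
    """
    exon_counts = defaultdict(int)
    gene_counts = defaultdict(int)
    for chrom, blocks in reads:
        genes_hit = set()
        for exon in exons:
            e_chrom, e_start, e_end, gene = exon
            if chrom == e_chrom and overlap_bp(blocks, e_start, e_end) >= min_overlap:
                exon_counts[exon] += 1  # a read may count toward several exons
                genes_hit.add(gene)
        for gene in genes_hit:
            gene_counts[gene] += 1  # but only once toward the gene
    return dict(exon_counts), dict(gene_counts)


exons = [("1", 1000, 2000, "GENE1"), ("1", 5000, 6000, "GENE1")]
reads = [
    ("1", [(1950, 2050)]),                # 50 bases inside the first exon
    ("1", [(1998, 2098)]),                # only 2 bases inside: below min_overlap
    ("1", [(1960, 2000), (5000, 5060)]),  # spliced read touching both exons
]
exon_counts, gene_counts = count_reads(reads, exons)
print(exon_counts)
print(gene_counts)
```

In this sketch the spliced read adds one to each exon but only one to GENE1, which is the behavior the --uniq description implies.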
      
== Output files ==
 
Line 65: Line 72:  
  AACSL:5 2171    15      0.780  0.359  49.65  443653
 
 
 
  AADAC:3 1607    172    8.949  5.569  38.39  468603
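
As a rough sanity check, the two rows above can be parsed and the RPKM column recomputed from the other columns. The column order used here (region name, length in bp, read count, RPM, RPKM, GC percent, reads in the GC bin) is an assumption inferred from the numbers, not something stated in the lines shown, and the field names are hypothetical.

```python
rows = """\
AACSL:5 2171    15      0.780  0.359  49.65  443653
AADAC:3 1607    172    8.949  5.569  38.39  468603
"""


def parse_row(line):
    """Split one whitespace-delimited output row into named fields (assumed order)."""
    name, length, count, rpm, rpkm, gc, gc_bin_reads = line.split()
    return {
        "name": name,
        "length": int(length),
        "count": int(count),
        "rpm": float(rpm),
        "rpkm": float(rpkm),
        "gc": float(gc),
        "gc_bin_reads": int(gc_bin_reads),
    }


records = [parse_row(line) for line in rows.splitlines() if line.strip()]
for rec in records:
    # RPKM should equal RPM divided by the region length in kilobases.
    derived = rec["rpm"] / (rec["length"] / 1000)
    print(rec["name"], rec["rpkm"], round(derived, 3))
```

Under the assumed column order, the recomputed values agree with the reported RPKM column to three decimal places.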
 +
 +
== Download ==
 +
The latest version of the source code (v0.01) can be [[Media:readcount.0.01.tar.gz | downloaded]] here. To compile, cd into readcount.0.01/ and type make. The executable is placed under the bin/ directory.
    
== Contact ==
 