Difference between revisions of "Bam read count"

From Genome Analysis Wiki
Jump to navigationJump to search
(Created page with '== Introduction == * The tools calculates the read count for each region in the input list of regions from a BAM file, and also outputs the normalized read count as Read Per Mil…')
 
Line 47: Line 47:
  
 
== Output files ==
 
== Output files ==
 +
 +
* --exon_out
 +
 +
##Mapped2targets reads: 19220055
 +
#Chr    Start  End    Len    ReadCnt RPM    RPKM    %GC    nReadGCBin
 +
12      53701240        53701497        AAAS    257    55      2.862  11.135  56.42  581629
 +
12      53701628        53701713        AAAS    85      16      0.832  9.794  60.00  324929
 +
12      53701835        53701917        AAAS    82      22      1.145  13.959  51.22  611596
 +
...
 +
 +
* --gene_out
 +
##Mapped2targets reads: 19220055
 +
#Gene  Len    ReadCnt RPM    RPKM    %GC    nReadGCBin
 +
AAAS:12 1836    360    18.730  10.202  57.52  389925
 +
AAA1:7  6184    0      0.000  0.000  37.18  289310
 +
AACS:12 3322    700    36.420  10.963  55.72  835010
 +
AACSL:5 2171    15      0.780  0.359  49.65  443653
 +
AADAC:3 1607    172    8.949  5.569  38.39  468603
  
 
== Contact ==
 
== Contact ==

Revision as of 18:13, 21 February 2012

Introduction

  • The tools calculates the read count for each region in the input list of regions from a BAM file, and also outputs the normalized read count as Read Per Million Mapped Reads per Kilobases (RPKM). To correct for the bias of the read count due to GC bias, it will also output the GC contact of each region along with the total reads mapped to the corresponding GC content bins. Bu providing this statistics, the extent of GC bias can be investigated and if excessive bias is observed this statistic can be used to correct the global GC bias.
  • This tool was initially developed for RNA-seq data to quantify gene/exon expression but can also be used as other sequences as well.

Usage

A command without any input will display the basic usage

readcount
The following parameters are available.  Ones with "[]" are in effect:
      Input Files : --reference [/data/local/ref/karma.ref/human.g1k.v37],
                    --regions [/home/bingshan/data/db/refFlat.tbl/refFlat.exons.v37.tbl]
  Mapping filters : --min_overlap [1], --min_map_quality
    Normalization : --norm_by_mapped2target
      Output file : --exon_out [], --gene_out [], --uniq


An example command for quantify RNAseq gene/exon expression looks like the following:

./readCount --reference hg19.fa --regions refFalt.exon.hg19 --min_overlap 5 --uniq --exon_out exon.readcount --gene_out gene.readcount input.bam

Input files

Required input files are --reference, --regions and input.bam

  • An example --regions file is a BEF file and looks like the following:
1  1000  2000  GENE1
1  5000  6000  GENE1
1  8000  9000  GENE1
2  1000  2000  GENE2
2  5000  6000  GENE2
2  8000  9000  GENE2
...

Other options

Some of command line options are explained below and others are self-explanatory.

--min_overlap : minimum number of bases overlapping an input region to consider that the read is in the region.
--uniq : If a read is mapped to multiple regions (e.g. exons) this read will be counted only once toward the read count in the gene
--exon_cout : read count for each exon. A read may be counted to multiple exons if this read is mapped to multiple exons.
--gene_cout : read count for each gene.

Output files

  • --exon_out
##Mapped2targets reads: 19220055
#Chr    Start   End     Len     ReadCnt RPM     RPKM    %GC     nReadGCBin
12      53701240        53701497        AAAS    257     55      2.862   11.135  56.42   581629
12      53701628        53701713        AAAS    85      16      0.832   9.794   60.00   324929
12      53701835        53701917        AAAS    82      22      1.145   13.959  51.22   611596
...
  • --gene_out
##Mapped2targets reads: 19220055
#Gene   Len     ReadCnt RPM     RPKM    %GC     nReadGCBin
AAAS:12 1836    360     18.730  10.202  57.52   389925
AAA1:7  6184    0       0.000   0.000   37.18   289310
AACS:12 3322    700     36.420  10.963  55.72   835010
AACSL:5 2171    15      0.780   0.359   49.65   443653
AADAC:3 1607    172     8.949   5.569   38.39   468603

Contact

For questions please contact the authors (Bingshan Li: bingshan@umich.edu