Changes

QPLOT (view source)

Revision as of 11:38, 14 February 2012

15 bytes added, 11:38, 14 February 2012

no edit summary

Line 1: Line 1:

−

== Introduction ==

+

= Introduction =

The qplot program calculates various summary statistics, some of which are plotted in a PDF file, that can be used to assess the quality of Illumina sequencing after reads have been mapped to the reference genome. The main statistics are empirical Phred scores, which are calculated from the background mismatch rate. The background mismatch rate is the rate at which sequenced bases differ from the reference genome, EXCLUDING dbSNP positions. Other statistics include GC biases, insert size distribution, depth distribution, genome coverage, empirical Q20 count and so on. An example plot and summary text follow at the end.
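As an illustration of how a background mismatch rate relates to an empirical Phred score, here is a minimal Python sketch. It uses the standard Phred definition, Q = -10 * log10(error rate). The function names and the input layout (tuples of position, read base, reference base) are assumptions made for this example and do not reflect qplot's internal implementation.

```python
import math

def background_mismatches(observations, dbsnp_positions):
    """Count mismatches and total bases, skipping known dbSNP positions.

    observations: iterable of (position, read_base, ref_base) tuples
                  (a hypothetical layout chosen for this sketch).
    dbsnp_positions: set of positions to exclude.
    """
    mismatches = 0
    total = 0
    for position, read_base, ref_base in observations:
        if position in dbsnp_positions:
            continue  # known variant sites do not count as background
        total += 1
        if read_base != ref_base:
            mismatches += 1
    return mismatches, total

def empirical_phred(mismatches, total):
    """Phred-scaled quality from an observed mismatch rate."""
    if total <= 0:
        raise ValueError("total must be positive")
    if mismatches == 0:
        return float("inf")  # no observed errors; the score is unbounded
    return -10.0 * math.log10(mismatches / total)

if __name__ == "__main__":
    # 1,000 mismatches in 1,000,000 bases is a rate of 0.001, about Q30.
    print(round(empirical_phred(1000, 1_000_000), 1))
```

In practice the counts would be accumulated per reported quality value or per cycle, so that the empirical score can be compared against the score the sequencer reported.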

−

== Where to Find It ==

+

= Where to Find It =

−

~~{{ToolGitRepo|repoName=qplot|noDownload=}}~~

−

=== Binary Download ===

+

== Binary Download ==

A pre-compiled qplot is available for download: [http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot.20120213.tar.gz qplot.20120213.tar.gz (File Size: 1.7G)]

Line 16: Line 15:

An example BAM input file is provided at qplot/example/chrom20.9M.10M.bam. It is taken from the 1000 Genomes Project and contains sequencing reads aligned to chromosome 20, positions 8M to 9M.

−

== Usage ==

+

== Source Code Distribution ==

+

= Usage =

−

=== Command line ===

+

== Command line ==

After you obtain the qplot executable (either by compiling the source code or by downloading the pre-compiled binary), you will find it at qplot/bin/qplot. Here is the qplot help page:

Line 38: Line 41:

Plot labels : --label [], --bamLabel []

−

=== Input files ===

+

== Input files ==

Three (3) precomputed files are required. Multiple bam/sam files should be appended after all other parameters.

Line 57: Line 60:

''Note'': Before running qplot, it is critical to check how the chromosome names are coded. Some bam files use just numbers, others use chr + numbers. '''You need to make sure that the chromosome names in the reference and dbSNP files are consistent with the bam file.'''
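The consistency check described in the note can be sketched in Python as follows. This is only an illustration: it assumes the chromosome names have already been extracted from each file (for a bam file, for example, from the @SQ lines printed by samtools view -H), and the function names are hypothetical.

```python
def naming_style(names):
    """Classify names as 'chr' (all prefixed), 'plain' (none prefixed) or 'mixed'."""
    names = list(names)
    if not names:
        raise ValueError("no chromosome names given")
    prefixed = sum(1 for name in names if name.startswith("chr"))
    if prefixed == len(names):
        return "chr"
    if prefixed == 0:
        return "plain"
    return "mixed"

def check_consistent(bam_names, reference_names, dbsnp_names):
    """Return (ok, styles); ok is True only when all three sources share one style."""
    styles = {
        "bam": naming_style(bam_names),
        "reference": naming_style(reference_names),
        "dbsnp": naming_style(dbsnp_names),
    }
    unique = set(styles.values())
    ok = len(unique) == 1 and "mixed" not in unique
    return ok, styles

if __name__ == "__main__":
    ok, styles = check_consistent(["chr20"], ["20"], ["20"])
    print(ok, styles)  # the bam file uses the 'chr' prefix, the others do not
```

A check like this only compares the naming convention; it does not verify that the same set of chromosomes is present in every file.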

−

=== Parameters ===

+

== Parameters ==

Most command-line parameters are self-explanatory; some of them are described here.

Line 105: Line 108:

Number of concurrent threads used to process the input bam files. Each bam file is processed by one thread, so a thread count that evenly divides the number of input bam files is the most efficient. Each extra thread requires about 375 MB of memory on top of the roughly 4 GB used to hold the reference and GC content file.
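The memory and scheduling trade-off can be estimated with a short Python sketch. It rests on two assumptions that are not stated in the documentation: every thread (including the first) costs about 375 MB, and all bam files take roughly the same time to process. The function names are hypothetical.

```python
import math

BASE_MEMORY_GB = 4.0          # reference and GC content file (approximate)
PER_THREAD_MEMORY_MB = 375.0  # additional memory per thread (approximate)

def estimate_memory_gb(threads):
    """Rough total memory for a given thread count, in GB."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return BASE_MEMORY_GB + threads * PER_THREAD_MEMORY_MB / 1024.0

def rounds_needed(bam_files, threads):
    """Number of sequential rounds when each thread handles one bam file at a time."""
    if bam_files < 1 or threads < 1:
        raise ValueError("bam_files and threads must be at least 1")
    return math.ceil(bam_files / threads)

if __name__ == "__main__":
    for threads in (2, 3, 4, 6):
        print(threads, rounds_needed(6, threads), round(estimate_memory_gb(threads), 2))
```

With six input files, three and four threads both need two rounds under these assumptions, so the fourth thread adds memory without shortening the run.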

−

=== Output files ===

+

== Output files ==

There are three (optional) output files.

Line 235: Line 238:

In the "Empirical phred score by cycle" plot (top right graph on the first page), the empirical qualities in the first several cycles are abnormally low. This led us to hypothesize that the first several bases have different properties. Further investigation revealed that the sequencing was done using bar-coded DNA samples, but the analysis did not properly de-multiplex the reads by sample.
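The same diagnosis can be approximated programmatically. The following Python sketch flags leading cycles whose empirical quality falls well below the run's median; the 10-point threshold, the function name, and the example values are all assumptions made for illustration, not values produced by qplot.

```python
from statistics import median

def low_leading_cycles(phred_by_cycle, drop=10.0):
    """Return the 1-based leading cycles whose score is more than `drop` below the median."""
    baseline = median(phred_by_cycle)  # raises StatisticsError on empty input
    flagged = []
    for cycle, score in enumerate(phred_by_cycle, start=1):
        if score < baseline - drop:
            flagged.append(cycle)
        else:
            break  # only the contiguous run at the start of the read is of interest
    return flagged

if __name__ == "__main__":
    # Made-up per-cycle scores with a depressed start, as a barcode might cause.
    scores = [12, 14, 15, 30, 31, 32, 30, 29, 31, 30]
    print(low_leading_cycles(scores))
```

A non-empty result suggests inspecting the library design (for example, barcodes or adapters at the start of the reads) before drawing conclusions about overall run quality.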

−

== Contact ==

+

= Contact =

Questions and requests should be sent to Bingshan Li ([mailto:bingshan@umich.edu bingshan@umich.edu])

Mktrost

Administrators

3,045

edits
