Changes

From Genome Analysis Wiki
2,551 bytes added ,  23:46, 13 February 2012
no edit summary
Line 8: Line 8:  
=== Binary Download ===
 
=== Binary Download ===
   −
We have prepared pre-compile qplot and you can download from: [http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot.20120213.tar.gz qplot.20120213.tar.gz (File Size: 1.7G)]  
+
We have prepared pre-compiled qplot and you can download from: [http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot.20120213.tar.gz qplot.20120213.tar.gz (File Size: 1.7G)]  
    
The executable file is under qplot/bin/qplot.  
 
The executable file is under qplot/bin/qplot.  
Line 17: Line 17:     
== Usage ==
 
== Usage ==
Go to qplot and make will create an executable qplot in the qplot sub-directory. Here is the qplot help page
     −
   linux_host > ./qplot
+
=== Command line ===
 +
After you have obtained the qplot executable (either by compiling the source code or by downloading the pre-compiled binary), you will find it under qplot/bin/qplot. Here is the qplot help page:
 +
 
 +
   some_linux_host > qplot/bin/qplot
 +
 
 
   The following parameters are available.  Ones with "[]" are in effect:
 
   The following parameters are available.  Ones with "[]" are in effect:
 +
 
               References : --reference [/data/local/ref/karma.ref/human.g1k.v37.umfa],
 
               References : --reference [/data/local/ref/karma.ref/human.g1k.v37.umfa],
 
                           --dbsnp [/home/bingshan/data/db/dbSNP/dbSNP130.UCSC.coordinates.tbl],
 
                           --dbsnp [/home/bingshan/data/db/dbSNP/dbSNP130.UCSC.coordinates.tbl],
Line 34: Line 38:  
             Plot labels : --label [], --bamLabel []
 
             Plot labels : --label [], --bamLabel []
   −
== Input files ==
+
=== Input files ===
    
Three (3) precomputed files are required. Multiple bam/sam files should be appended after all other parameters.
 
Three (3) precomputed files are required. Multiple bam/sam files should be appended after all other parameters.
Line 53: Line 57:  
''Note'': Before running qplot, it is critical to check how the chromosome names are coded. Some BAM files use plain numbers, others use "chr" + number. '''You need to make sure that the chromosome names in the reference and dbSNP files are consistent with those in the BAM file.'''

''Note'': Before running qplot, it is critical to check how the chromosome names are coded. Some BAM files use plain numbers, others use "chr" + number. '''You need to make sure that the chromosome names in the reference and dbSNP files are consistent with those in the BAM file.'''
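The consistency check described in the note can be sketched in Python. This is only an illustration and is not part of qplot. The function names are hypothetical, and the code assumes you have already collected the chromosome names from the BAM header, the reference and the dbSNP file by some other means.

```python
def naming_style(names):
    """Classify chromosome names as 'chr' (chr1, chr2, ...), 'plain' (1, 2, ...) or 'mixed'."""
    if not names:
        raise ValueError("no chromosome names given")
    with_prefix = sum(1 for name in names if name.startswith("chr"))
    if with_prefix == len(names):
        return "chr"
    if with_prefix == 0:
        return "plain"
    return "mixed"


def names_are_consistent(bam_names, reference_names, dbsnp_names):
    """True when all three sources use the same, unmixed naming style."""
    styles = {
        naming_style(bam_names),
        naming_style(reference_names),
        naming_style(dbsnp_names),
    }
    return len(styles) == 1 and "mixed" not in styles


# Hypothetical name lists, for illustration only.
print(names_are_consistent(["1", "2", "20"], ["1", "2", "20"], ["20"]))
print(names_are_consistent(["chr1", "chr20"], ["1", "20"], ["20"]))
```

The second call reports an inconsistency, because the BAM names carry the "chr" prefix while the other two sources do not.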
   −
== Parameters ==
+
=== Parameters ===
    
Most command line parameters are self-explanatory; some of them are described here.

Most command line parameters are self-explanatory; some of them are described here.
Line 77: Line 81:  
'''NOTE''' For this to work, the lane information has to be encoded in the read name such that the lane number is the second field when the name is split on the delimiter ":".

'''NOTE''' For this to work, the lane information has to be encoded in the read name such that the lane number is the second field when the name is split on the delimiter ":".
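As a rough illustration of the read-name convention in the note, the lane can be recovered by splitting on ":" and taking the second field. The function name and the sample read name below are hypothetical, and this is not qplot's own code.

```python
def lane_from_read_name(read_name, delimiter=":"):
    """Return the second delimiter-separated field of a read name as the lane."""
    fields = read_name.split(delimiter)
    if len(fields) < 2:
        raise ValueError(f"read name {read_name!r} has no second field")
    return fields[1]


# Hypothetical read name: machine, lane, tile, x, y.
print(lane_from_read_name("HWUSI-EAS100R:6:73:941:1973"))
```

A read name without a ":" raises an error instead of silently returning a wrong lane.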
   −
* Region list
+
*Mapping filters
 +
 
 +
Qplot will exclude reads whose mapping quality is lower than the user-specified threshold. By default, all reads are included in the analysis.
 +
 
 +
*Region list
    
If only a list of regions (e.g. exons) is of interest, provide a region list file. Each line of the file should have the form "chr start end label". For this option to work, the regions have to be sorted by start position within each chromosome (contig), and the input BAM files have to be sorted as well.

If only a list of regions (e.g. exons) is of interest, provide a region list file. Each line of the file should have the form "chr start end label". For this option to work, the regions have to be sorted by start position within each chromosome (contig), and the input BAM files have to be sorted as well.
Line 84: Line 92:  
  2 100 300 region_C
 
  2 100 300 region_C
 
  ...
 
  ...
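A region list can be checked against the two stated requirements (four fields per line, start positions sorted within each chromosome) before it is handed to qplot. The following is a minimal sketch, not part of qplot. The function name is hypothetical, and whitespace-separated fields are an assumption.

```python
def check_region_lines(lines):
    """Return a list of problems found in "chr start end label" lines (empty if none)."""
    problems = []
    last_start = {}  # last start position seen on each chromosome
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            problems.append(f"line {lineno}: expected 4 fields, got {len(fields)}")
            continue
        chrom, start, end, _label = fields
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {lineno}: start and end must be integers")
            continue
        start = int(start)
        if chrom in last_start and start < last_start[chrom]:
            problems.append(f"line {lineno}: regions on {chrom} are not sorted by start")
        last_start[chrom] = start
    return problems


# Hypothetical region list in the documented format.
regions = ["1 100 500 region_A", "1 600 800 region_B", "2 100 300 region_C"]
print(check_region_lines(regions))
```

An empty list means that no problem was detected; the check does not verify that the input BAM files are sorted.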
 +
 +
Qplot also provides the --invertRegion option. When it is enabled, qplot computes statistics on the sequence bases that fall outside the given regions.
 +
    
* Plot labels
 
* Plot labels
Line 94: Line 105:  
Number of concurrent threads used for the input BAM files. Each BAM file is processed by one thread, so using a number that is divisible by the number of input BAM files makes the run efficient. Each extra thread requires about 375 MB of memory on top of the roughly 4 GB used to hold the reference and the GC content file.

Number of concurrent threads used for the input BAM files. Each BAM file is processed by one thread, so using a number that is divisible by the number of input BAM files makes the run efficient. Each extra thread requires about 375 MB of memory on top of the roughly 4 GB used to hold the reference and the GC content file.
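The memory figures above allow a quick back-of-the-envelope estimate. The sketch below assumes that every thread, including the first, adds about 375 MB to a shared base of about 4 GB. That reading of the text is an assumption, and actual usage depends on the reference and the data.

```python
BASE_MB = 4 * 1024    # roughly 4 GB for the reference and GC content file
PER_THREAD_MB = 375   # approximate extra memory per thread


def estimated_memory_mb(threads):
    """Rough memory estimate in MB for a given thread count (assumption-based)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return BASE_MB + PER_THREAD_MB * threads


for n in (1, 4, 8):
    print(n, "threads:", estimated_memory_mb(n), "MB")
```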
   −
== Output files ==
+
=== Output files ===
    
There are three (optional) output files.
 
There are three (optional) output files.
Line 109: Line 120:  
Qplot will generate ''qa.R'', the R code used to plot the figures in the ''qa.pdf'' file. If Rscript is not installed on the system, you can use qa.R to generate the figures on another machine, or extract the plotting data from each run and combine multiple runs to generate more comprehensive plots (see [[Example]]).

Qplot will generate ''qa.R'', the R code used to plot the figures in the ''qa.pdf'' file. If Rscript is not installed on the system, you can use qa.R to generate the figures on another machine, or extract the plotting data from each run and combine multiple runs to generate more comprehensive plots (see [[Example]]).
   −
== Example output ==
+
== Example ==
 +
 
 +
Qplot can generate diagnostic graphs, the related R code and summary statistics for each SAM/BAM file.
 +
 
 +
=== Built-in example ===
 +
 
 +
In the pre-compiled binary package, you will find a subdirectory named examples. We provide a sample file from the 1000 Genomes Project; it contains aligned reads on chromosome 20 from position 9 Mbp to 10 Mbp. You can run qplot with the following command line:
 +
 
 +
../bin/qplot --reference ../data/human.g1k.v37.umfa --dbsnp ../data/dbSNP130.UCSC.coordinates.tbl --gccontent ../data/human.g1k.w100.gc --plot qplot.pdf --stats qplot.stats --Rcode qplot.R --label "chr20:9M-10M" chrom20.9M.10M.bam
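When the same invocation has to be repeated for many BAM files, the argument list can be assembled programmatically. The sketch below only builds and prints the command shown above; it does not run qplot. The helper name and its parameters are hypothetical, while the flags are the ones used in the example command.

```python
import shlex


def build_qplot_command(qplot, reference, dbsnp, gccontent, prefix, label, bam_files):
    """Assemble a qplot argument list; BAM/SAM files go after all other parameters."""
    cmd = [
        qplot,
        "--reference", reference,
        "--dbsnp", dbsnp,
        "--gccontent", gccontent,
        "--plot", prefix + ".pdf",
        "--stats", prefix + ".stats",
        "--Rcode", prefix + ".R",
        "--label", label,
    ]
    cmd.extend(bam_files)
    return cmd


cmd = build_qplot_command(
    "../bin/qplot",
    "../data/human.g1k.v37.umfa",
    "../data/dbSNP130.UCSC.coordinates.tbl",
    "../data/human.g1k.w100.gc",
    "qplot",
    "chr20:9M-10M",
    ["chrom20.9M.10M.bam"],
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the examples directory.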
 +
 
 +
Sample outputs are listed below:
 +
 
 +
Figure: [[Media:qplot.pdf | qplot.pdf]]
 +
 
 +
Summary statistics:
 +
Stats\BAM      chrom20.9M.10M.bam
 +
TotalReads(e6)  1.11
 +
MappingRate(%)  97.24
 +
MapRate_MQpass(%)      97.24
 +
TargetMapping(%)        0.00
 +
ZeroMapQual(%)  2.39
 +
MapQual<10(%)  2.86
 +
PairedReads(%)  83.76
 +
ProperPaired(%) 71.34
 +
MappedBases(e9) 0.04
 +
Q20Bases(e9)    0.04
 +
Q20BasesPct(%)  88.63
 +
MeanDepth      42.22
 +
GenomeCover(%)  0.03
 +
EPS_MSE 1.81
 +
EPS_Cycle_Mean  18.71
 +
GCBiasMSE      0.01
 +
ISize_mode      137
 +
ISize_medium    184
 +
DupRate(%)      5.90
 +
QCFailRate(%)  0.00
 +
BaseComp_A(%)  29.9
 +
BaseComp_C(%)  20.1
 +
BaseComp_G(%)  20.2
 +
BaseComp_T(%)  29.8
 +
BaseComp_O(%)  0.1
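A summary statistics file like the one above can be loaded for further processing, for example to combine several runs. The following is a minimal sketch under two assumptions: columns are separated by whitespace, and the first line lists the BAM names after the "Stats\BAM" header cell. The function name is hypothetical.

```python
def parse_stats(text):
    """Parse qplot-style summary statistics into {bam_name: {stat_name: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    bam_names = lines[0].split()[1:]  # skip the "Stats\BAM" header cell
    table = {name: {} for name in bam_names}
    for ln in lines[1:]:
        fields = ln.split()
        stat, values = fields[0], fields[1:]
        for name, value in zip(bam_names, values):
            table[name][stat] = float(value)
    return table


# A few rows copied from the sample output above.
sample = r"""Stats\BAM      chrom20.9M.10M.bam
TotalReads(e6)  1.11
MappingRate(%)  97.24
MeanDepth      42.22
DupRate(%)      5.90
"""
stats = parse_stats(sample)
print(stats["chrom20.9M.10M.bam"]["MeanDepth"])
```

The same function handles files with several BAM columns, since each row is matched to the header by position.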
 +
 
 +
 +
=== Gallery of examples ===
 +
 
 +
Here we show how qplot can be applied in different sequencing scenarios. Users can also customize the statistics generated by qplot and present them in various formats.
 +
 
 +
* Whole genome sequencing with more than one lane
 +
 
 +
 
 +
Figures
   −
* Figures
   
  https://statgen.sph.umich.edu/w/images/5/53/Sardinia_Run_84_QA.pdf
 
  https://statgen.sph.umich.edu/w/images/5/53/Sardinia_Run_84_QA.pdf
   −
* Summary statistics text file
+
Summary statistics text file
 
  TotalReads(e6)  72.94  64.52  74.87  62.25  67.21
 
  TotalReads(e6)  72.94  64.52  74.87  62.25  67.21
 
  MappingRate(%)  97.62  97.51  97.75  97.52  97.35
 
  MappingRate(%)  97.62  97.51  97.75  97.52  97.35
Line 140: Line 200:  
  BaseComp_T(%)  26.8    27.1    26.8    27.3    26.9
 
  BaseComp_T(%)  26.8    27.1    26.8    27.3    26.9
 
  BaseComp_O(%)  0.0    0.0    0.0    0.0    0.0
 
  BaseComp_O(%)  0.0    0.0    0.0    0.0    0.0
 +
 +
* Whole genome sequencing with 24-multiplexing
 +
 +
With a customized script, we aggregated 24 bar-coded samples into a single graph.
 +
The graph will help compare sequencing quality between samples.
 +
 +
[[Media: qplot.Pool.9847.pdf | QPlot of 24 samples(PDF) ]]
 +
 +
* Interactive qplot
 +
 +
Qplot output can be made interactive. In the following example, you can use the mouse scroll wheel to zoom in and out of each graph, and pan to a particular part of a graph.
 +
By presenting qplot data in a web page, users can easily identify problematic sequencing samples.
 +
 +
[http://www-personal.umich.edu/~zhanxw/qplot.Pool.9847.html  QPlot of 24 samples(HTML) ]
    
== Contact ==
 
== Contact ==
    
Questions and requests should be sent to Bingshan Li ([mailto:bingshan@umich.edu bingshan@umich.edu])
 
Questions and requests should be sent to Bingshan Li ([mailto:bingshan@umich.edu bingshan@umich.edu])