Changes

From Genome Analysis Wiki
[[File:Pval-qq-sample.png|center]]

= Credit =

This page is based on a tutorial originally written by [mailto:mflick@umich.edu Matthew Flickinger]

== Credits ==
 
);
</syntaxhighlight>
== A Fancier QQ Plot by Matthew Flickinger ==
    
Unfortunately, the simple approach leaves out several things that are nice to have on the plot, such as a reference line and a confidence interval. In addition, if your data set is large, it plots many uninteresting points in the lower left. Here is a more complex example that adds a few more niceties and thins the data so that only meaningful points are plotted.
ylab=expression(paste("Observed (",-log[10], " p-value)")),
draw.conf=TRUE, conf.points=1000, conf.col="lightgray", conf.alpha=.05,
already.transformed=FALSE, pch=20, aspect="iso", prepanel=prepanel.qqunif,
par.settings=list(superpose.symbol=list(pch=pch)), ...) {
# ...
stop("pvalue vector is not numeric, can't draw plot")
if (any(is.na(unlist(pvalues)))) stop("pvalue vector contains NA values, can't draw plot")
if (already.transformed==FALSE) {
if (any(unlist(pvalues)==0)) stop("pvalue vector contains zeros, can't draw plot")
} else {
if (any(unlist(pvalues)<0)) stop("-log10 pvalue vector contains negative values, can't draw plot")
}
# ...
gc()

prepanel.qqunif= function(x,y,...) {
A = list()
A$xlim = range(x, y)*1.02
# ...
#draw the plot
xyplot(pvalues~exp.x, groups=grp, xlab=xlab, ylab=ylab, aspect=aspect,
prepanel=prepanel, scales=list(axs="i"), pch=pch,
panel = function(x, y, ...) {
if (draw.conf) {
# ...
}
</syntaxhighlight>
=== Sample Usage ===

A sample call to this function would be:

<syntaxhighlight lang="rsplus">
qqunif.plot(my.pvalues) # these are the raw p-values, not log-transformed
</syntaxhighlight>
=== Under the Hood: Multiple P-value Lists ===

If you are comparing two tests, or want to show data before and after correcting for genomic control, you can pass multiple sets of p-values to the function as a list.
   
<syntaxhighlight lang="rsplus">
my.pvalue.list <- list("Study 1"=runif(10000), "Study 2"=runif(10000,0,.90))
# ...
</syntaxhighlight>
Internally, the different groups are drawn using the lattice superpose settings, so if you want more control over colors and shapes, you can use the <tt>par.settings=list(superpose.symbol=)</tt> setting. You can also use any of the lattice methods for adding a legend to your plot; the legend labels are the names of the elements in the list you pass in.
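
The mapping from a named list to lattice groups can be sketched in a self-contained way. This is not the page's <tt>qqunif.plot()</tt> code: the simulated data, the object names (<tt>pvalue.list</tt>, <tt>grp</tt>, <tt>qq</tt>) and the use of <tt>i/(n+1)</tt> for expected values are illustrative assumptions. It shows how the list names become factor levels, which <tt>auto.key</tt> uses as legend labels, and how <tt>par.settings=list(superpose.symbol=...)</tt> sets per-group symbols and colors.

<syntaxhighlight lang="rsplus">
library(lattice)  # recommended package shipped with R

# Hypothetical example data: two named vectors of p-values
set.seed(42)
pvalue.list <- list("Study 1" = runif(1000), "Study 2" = runif(1000, 0, 0.9))

# Observed and expected -log10 p-values, computed separately for each group.
# Expected values use i/(n+1), the mean of the i-th uniform order statistic (an assumption).
obs <- unlist(lapply(pvalue.list, function(p) -log10(sort(p))), use.names = FALSE)
expected <- unlist(lapply(pvalue.list, function(p) -log10(seq_along(p) / (length(p) + 1))),
                   use.names = FALSE)

# The list names become the factor levels, which auto.key uses as legend labels
grp <- factor(rep(names(pvalue.list), lengths(pvalue.list)), levels = names(pvalue.list))

qq <- xyplot(obs ~ expected, groups = grp,
             par.settings = list(superpose.symbol = list(pch = c(20, 1),
                                                         col = c("black", "red"))),
             auto.key = list(corner = c(0, 1)),  # legend inside the top-left corner
             panel = function(x, y, ...) {
               panel.abline(0, 1, col = "gray")  # y = x reference line
               panel.xyplot(x, y, ...)           # groups are drawn via panel.superpose
             })
print(qq)
</syntaxhighlight>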

=== Under the Hood: Confidence Intervals ===

The confidence intervals are calculated using the fact that the order statistics of a standard uniform sample follow beta distributions. By default, confidence intervals are drawn around the 1000 most significant points. You can change that number with the <tt>conf.points=</tt> parameter, and you can change the alpha level from its default of .05 with the <tt>conf.alpha=</tt> parameter. To disable the confidence interval, use <tt>draw.conf=F</tt> in your call to <tt>qqunif.plot()</tt>.
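
As a minimal sketch of the beta-distribution argument, assuming independent p-values that are uniform under the null: for n p-values, the i-th smallest follows Beta(i, n - i + 1), so pointwise bands can be read off <tt>qbeta()</tt>. The helper name <tt>qq.conf.band()</tt> is hypothetical, and placing each point at the beta mean <tt>i/(n+1)</tt> is an assumption; the page's own function may position points differently.

<syntaxhighlight lang="rsplus">
# Hypothetical helper: pointwise (1 - conf.alpha) band for the most significant points
# of a uniform QQ plot on the -log10 scale. The i-th smallest of n Uniform(0,1)
# values follows Beta(i, n - i + 1).
qq.conf.band <- function(n, conf.points = 1000, conf.alpha = 0.05) {
  i <- seq_len(min(n, conf.points))  # ranks 1 (smallest p) .. conf.points
  data.frame(
    rank     = i,
    # expected -log10 p-value, using the beta mean i / (n + 1)
    expected = -log10(i / (n + 1)),
    # a large p-value quantile maps to a small -log10 value, so the
    # upper beta quantile gives the lower edge of the band and vice versa
    lower    = -log10(qbeta(1 - conf.alpha / 2, i, n - i + 1)),
    upper    = -log10(qbeta(conf.alpha / 2, i, n - i + 1))
  )
}

band <- qq.conf.band(n = 10000)
head(band, 3)
</syntaxhighlight>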
 
Note that the confidence interval drawn depends on the total number of p-values given. When you pass in a list, the number of tests used for the confidence interval is determined by the vector with the '''fewest p-values'''; this gives the widest, most conservative confidence bands.
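
To illustrate why the shortest vector gives the most conservative bands, the sketch below compares the 95% band width at the median rank for 2000 and for 10000 p-values. The helper <tt>band.width()</tt> and the simulated list are hypothetical; they only illustrate the beta-quantile reasoning, not the function's internals.

<syntaxhighlight lang="rsplus">
# Hypothetical helper: width of the (1 - conf.alpha) band, in -log10 units,
# at the median rank of n uniform p-values (i-th smallest ~ Beta(i, n - i + 1))
band.width <- function(n, conf.alpha = 0.05) {
  i <- ceiling(n / 2)                          # median rank
  lo <- qbeta(conf.alpha / 2, i, n - i + 1)
  hi <- qbeta(1 - conf.alpha / 2, i, n - i + 1)
  log10(hi) - log10(lo)                        # band width on the -log10 scale
}

pvalue.list <- list("Study 1" = runif(10000), "Study 2" = runif(2000))
n.conf <- min(lengths(pvalue.list))            # the smallest group sets the band
c(n.conf = n.conf, width.small = band.width(n.conf), width.large = band.width(10000))
</syntaxhighlight>

Fewer p-values means more sampling variability in each order statistic, so the band computed from the smallest group is wider and therefore covers the larger groups conservatively.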

=== Under the Hood: Thinning the Data ===

By default, the function thins the data by rounding the observed and expected -log10 p-values to two decimal places. You can control the thinning with the <tt>should.thin=</tt>, <tt>thin.obs.places=</tt>, and <tt>thin.exp.places=</tt> parameters.
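
The rounding idea can be sketched as follows. <tt>thin.qq()</tt> is a hypothetical helper, not the function's actual code; only the parameter names <tt>thin.obs.places</tt> and <tt>thin.exp.places</tt> are borrowed from the text above, and the expected values use <tt>i/(n+1)</tt> as an assumption. A point is dropped when its rounded (observed, expected) pair has already been seen.

<syntaxhighlight lang="rsplus">
# Hypothetical helper illustrating thinning by rounding (not the page's implementation)
thin.qq <- function(pvalues, thin.obs.places = 2, thin.exp.places = 2) {
  n <- length(pvalues)
  pts <- data.frame(
    obs      = round(-log10(sort(pvalues)), thin.obs.places),     # most significant first
    expected = round(-log10(seq_len(n) / (n + 1)), thin.exp.places)
  )
  pts[!duplicated(pts), ]  # keep one row per distinct rounded (obs, expected) pair
}

set.seed(1)
pvalues <- runif(10000)
thinned <- thin.qq(pvalues)
nrow(thinned)  # far fewer rows than the 10000 input points
</syntaxhighlight>

Most of the dropped points come from the crowded lower-left corner, where many p-values round to the same pair; the sparse, significant tail is kept nearly intact.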

=== Under the Hood: Customizing Graphics ===

The function should also accept any other lattice graphing parameters, should you want to change the plot title (<tt>main=</tt>), plotting character (<tt>pch=</tt>), or plot colors (<tt>col=</tt> for points, <tt>conf.col=</tt> for the confidence interval). By default, the <tt>aspect="iso"</tt> parameter is set, which ensures that the reference line lies at a 45-degree angle. If you have very significant results, this may make your plot taller than you would like. You can set <tt>aspect="fill"</tt> to use the standard layout, which stretches each axis to take up as much room as possible.
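
The effect of the <tt>aspect=</tt> setting can be seen with plain lattice calls. This is a self-contained sketch on simulated null p-values, not a call to <tt>qqunif.plot()</tt>; the data and object names are illustrative. The same plot is drawn with <tt>aspect="iso"</tt> and, via <tt>update()</tt>, with <tt>aspect="fill"</tt>, side by side.

<syntaxhighlight lang="rsplus">
library(lattice)

# Hypothetical data: observed vs. expected -log10 p-values under the null
set.seed(2)
pvalues <- runif(5000)
obs <- -log10(sort(pvalues))
expected <- -log10(seq_along(pvalues) / (length(pvalues) + 1))

iso <- xyplot(obs ~ expected, aspect = "iso", pch = 20, col = "navy",
              main = 'aspect = "iso"',
              panel = function(x, y, ...) {
                panel.abline(0, 1, col = "gray")  # y = x; drawn at 45 degrees only with "iso"
                panel.xyplot(x, y, ...)
              })
fill <- update(iso, aspect = "fill", main = 'aspect = "fill"')

# Draw the two layouts side by side on one page
print(iso, split = c(1, 1, 2, 1), more = TRUE)
print(fill, split = c(2, 1, 2, 1))
</syntaxhighlight>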
    
== R Base Graphics ==
 