Difference between revisions of "LocusZoom"

From Genome Analysis Wiki
Revision as of 09:11, 15 April 2010


LocusZoom is designed to facilitate viewing of local association results together with useful information about a locus, such as the location and orientation of the genes it includes, linkage disequilibrium coefficients and local estimates of recombination rates. It was developed by popular demand, as a result of many questions we have had about "How did you make the figures in your talk?" or "How did you make the figures for your GWAS paper?" (And for better or for worse, we have quite a few GWAS papers!!).

LocusZoom can be used in three ways:

Plot Summaries of Your Genomewide Scan Interactively
You can upload summary results of your own genomewide scan or genomewide meta-analysis and request plots of regions of interest using a web-based form.
Generate Many Plots in Batch Mode
You can upload summary results for your genomewide scan or genomewide meta-analysis and request several plots in one go by uploading a batch file. You will receive results via e-mail. A snail-mail option is not available.
Plot Summaries of Publicly Available Datasets
Currently, this includes the results of our genome-wide scan for variants associated with HDL-cholesterol, LDL-cholesterol and triglyceride levels in ~20,000 individuals.

We are also developing a distributable code package that you can install on your own system to generate plots locally. This is not yet available, but is expected in April 2010.

Upload your own meta-analysis file and generate single plots using a web-based form

Uploading Your Association Study Results

Association results can be uploaded to our web server using the plot your data webpage. Result files are limited to 20 MB in size, which allows for a gzipped text table including key columns (marker name, p-value and sample size) for up to ~3 million SNPs. In our tests, a typical GWAS results file is ~17 MB in size after imputation of HapMap SNPs. Once a file is uploaded, LocusZoom will remember it for the duration of your web session, allowing you to generate multiple plots. If you have a slow connection or would like to save time, you can upload results for a region or chromosome of interest only. Your results are entirely confidential and won't be viewed by us or anyone else (except those with whom you share them!).

To specify the region to be plotted, you will have to specify the name of a key marker in the region (typically, as an rs-number), name a gene of interest or provide appropriate genome coordinates.

If you include a sample size column in the result file, it will be used to control the size of each plotted marker.
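As a rough illustration of preparing an upload, the sketch below writes a gzipped, tab-delimited results table and checks it against the size limit. The column names (MarkerName, P-value, N), the tab delimiter, the helper name write_results and the reading of "20 MB" as 20 MiB are all assumptions made for this example; on the web form you tell LocusZoom which delimiter and column names your file uses.

```python
import gzip
import os

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumption: the 20 MB limit read as 20 MiB


def write_results(path, rows):
    """Write (marker, p_value, n) tuples as a gzipped, tab-delimited table.

    Returns True if the compressed file fits under the assumed upload limit.
    """
    with gzip.open(path, "wt", newline="\n") as handle:
        handle.write("MarkerName\tP-value\tN\n")  # hypothetical column names
        for marker, p_value, n in rows:
            handle.write(f"{marker}\t{p_value:.4g}\t{n}\n")
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES
```

If the function returns False, restricting the rows to a chromosome or region of interest before writing is one way to get under the limit.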

Customizing the Display of Your Results

All options listed in the Main Table above are available, as well as the options listed below.

Setting | Default Value | Details
Column Delimiter | none | Users must specify the type of column delimiter in the results file
Pvalue Column Name | none | Users must specify the name of the column that contains the p-values
Marker Column Name | none | Users must specify the heading of the column that contains marker names
Human Genome Build | none | Plots can be generated based on hg18 (default) or hg17 positions
HapMap Population for LD | none | This option allows the user to specify which HapMap population was used to obtain LD estimates. The default is CEU but users may select YRI or JPT+CHB

Using Batch Mode

To start batch mode, first upload your results file just as you would in interactive mode. The same file size restrictions apply.

Generating a Hit Spec File

Batch mode allows you to conveniently specify a set of plots to be generated in a "Hit Spec" file. This is handy if you need to generate large numbers of plots or if you want to plot the same set of regions after updating a genomewide analysis (for example).

The "Hit Spec" file is a whitespace-delimited text file. The file has six mandatory columns, which can be followed by a seventh column holding a series of optional key=value pairs to allow for detailed customization of each plot. The first line in the file is assumed to be a header and is ignored. Each subsequent line describes a single plot. There are three ways to select a region to plot:

Plotting a window flanking an interesting SNP
This option allows you to plot results for all markers within a specific distance (e.g. 500kb) of an index SNP. To use this option, set column 1 to have the name of the index SNP (e.g. rs1 below) and set column 5 to specify the width of the region of interest (e.g. 500kb below). Here is an example:

Feature  chr  start  end  flank  plot  arguments
rs1      na   na     na   500kb  yes   rfrows=3 weightCol="N" snpset="HapMap" metalRug="Our SNPs"

Plotting a region flanking an interesting SNP
This option is similar to the previous option, but allows you to specify an asymmetric region of interest. For example, perhaps you are interested in a plot that extends a bit further to the right of the SNP of interest. In this case, specify the chromosome, start and end coordinates of the region to be plotted in columns 2, 3, and 4. Here is an example:

Feature  chr  start   end     flank  plot  arguments
rs2      1    540000  580000  na     yes   rfrows=4 legend="right" showAnnot=T

Plotting a region flanking a gene of interest
This option allows you to focus on a particular gene, rather than a specific SNP. It is similar to the first option. You should set column 1 to be the name of the gene of interest and column 5 to be the desired window width. When you use this option, LocusZoom will automatically select an index SNP for each region; the SNP will be the site with the smallest p-value. Here is an example:

Feature  chr  start  end  flank  plot  arguments
CETP     na   na     na   200kb  yes   rfrows=6 showAnnot=T annotPch="1,24,24,25,22,21,8,7"
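To make the flank column concrete, here is a minimal sketch of turning a value such as 500kb into base pairs and widening an interval by that amount. The function names, the accepted unit suffixes (bp, kb, Mb) and the choice to apply the flank on each side of the feature are assumptions for illustration. Looking up the position of a SNP or the boundaries of a gene is left to the caller.

```python
import re

# Assumed unit suffixes; matching is case-insensitive.
_UNITS = {"": 1, "bp": 1, "kb": 1_000, "mb": 1_000_000}


def parse_distance(text):
    """Convert a string such as '500kb' or '1.5Mb' to a number of base pairs."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if match is None or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognized distance: {text!r}")
    return int(round(float(match.group(1)) * _UNITS[match.group(2).lower()]))


def flanking_window(start, end, flank):
    """Extend the interval [start, end] by `flank` on each side, clamped at 1."""
    size = parse_distance(flank)
    return max(1, start - size), end + size
```

For a SNP, pass the same position as both start and end; for a gene, pass its start and end coordinates.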

The sixth column in the "Hit Spec" file can be used to enable (with the value yes) or disable (with the value no) an individual plot. For example, if you run a "Hit Spec" file with 15 plots and 14 of them turn out very nicely, you may wish to re-run the "Hit Spec" file with some tweaks to the problem plot. In this case (if you dislike waiting for your results as much as we do!), you could disable generation of the plots that seem nice by changing the 6th column to “no” and leave the plot that you tweaked as a “yes”.
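Editing the sixth column by hand gets tedious for long files, so the following sketch rewrites a Hit Spec so that only the named features stay enabled. The function name keep_only is hypothetical, and the output is re-joined with tabs, which is assumed to be acceptable because the file is whitespace delimited.

```python
def keep_only(hitspec_text, features):
    """Set column 6 to 'yes' for the named features and 'no' for all others."""
    wanted = set(features)
    lines = hitspec_text.splitlines()
    out = lines[:1]  # the header line is passed through unchanged
    for line in lines[1:]:
        # Split into at most seven fields: six columns plus the argument string.
        fields = line.split(None, 6)
        if len(fields) < 6:
            out.append(line)  # leave blank or incomplete lines untouched
            continue
        fields[5] = "yes" if fields[0] in wanted else "no"
        out.append("\t".join(fields))
    return "\n".join(out)
```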

The 7th and final column contains additional LocusZoom arguments as key=value pairs. Any number of key=value pair arguments can be included. For details of available options, see the section entitled LocusZoom options below.
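The column layout described above can be checked before uploading with a small parser such as the one below. This is only a sketch: the PlotSpec class and parse_hitspec function are hypothetical names, and the handling of quotes (shell-style quoting via shlex, with typographic quotes converted to straight ones) is an assumption about how values containing spaces should be read, not a description of how LocusZoom itself parses the file.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class PlotSpec:
    feature: str
    chrom: str
    start: str
    end: str
    flank: str
    enabled: bool
    options: dict = field(default_factory=dict)


def parse_hitspec(text):
    """Parse Hit Spec text into PlotSpec records, skipping the header line."""
    specs = []
    for line in text.splitlines()[1:]:
        fields = line.split(None, 6)  # six columns plus the argument string
        if len(fields) < 6:
            continue  # blank or incomplete line
        raw_args = fields[6] if len(fields) == 7 else ""
        # Assumption: convert typographic quotes to straight quotes first.
        raw_args = raw_args.replace("\u201c", '"').replace("\u201d", '"')
        options = {}
        for token in shlex.split(raw_args):  # keeps quoted values intact
            key, sep, value = token.partition("=")
            if sep:
                options[key] = value
        specs.append(PlotSpec(*fields[:5], fields[5].lower() == "yes", options))
    return specs
```

All option values are kept as strings; converting rfrows to an integer or showAnnot to a boolean is left to the caller.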

Generate single plots using our publicly-available lipids GWAS data

  1. Selecting regions to display using our lipids data
    The plots were designed to examine ~1 megabase windows of the genome, although for regions with several association signals or long-range linkage disequilibrium patterns, plots spanning up to a few Mb can be drawn. The user can specify the region to display in the LocusZoom plot in one of three ways: 1) an index SNP and a flanking region, 2) the chromosome together with start and stop positions (in basepairs), 3) gene name and a flanking region.
  2. Displaying LD information
    In the main plot window, data points are colored according to their level of linkage disequilibrium (LD) with the index SNP. If users specify the region to display using an index SNP and flanking region, LD of all data points will be relative to the user-specified index SNP. If users specify the region to display using option 2 or 3 above, LocusZoom will select the most significant SNP in the region as the index SNP. For all other SNPs in the plot, the color of the data point will reflect the pair-wise LD with this index SNP. By default, the LD displayed is r² from the HapMap CEU population (release 22), but users can select either r² or D′ from HapMap CEU, HapMap YRI, HapMap CHB+JPT, or 1000 Genomes CEU. Because we have pre-computed LD for all SNPs in HapMap CEU, plots will generate very quickly if using the default LD information, provided the region to display is less than 500kb either side of the index SNP. SNPs with missing LD information are shown in grey.
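The index SNP selection and LD coloring described above can be sketched as follows. The tuple layout of the results and both function names are assumptions for this example. The r² thresholds and color names are placeholders chosen for illustration and are not LocusZoom's actual palette; only the grey color for missing LD comes from the description above.

```python
def pick_index_snp(results, chrom, start, end):
    """Return the marker with the smallest p-value inside [start, end] on chrom.

    `results` is an iterable of (marker, chrom, position, p_value) tuples.
    Returns None when no marker falls inside the region.
    """
    in_region = [r for r in results if r[1] == chrom and start <= r[2] <= end]
    if not in_region:
        return None
    return min(in_region, key=lambda r: r[3])[0]


def ld_color(r2, bins=((0.8, "red"), (0.6, "orange"),
                       (0.4, "green"), (0.2, "lightblue"))):
    """Map an r-squared value to a color; placeholder thresholds and colors."""
    if r2 is None:
        return "grey"  # missing LD information is shown in grey
    for threshold, color in bins:
        if r2 >= threshold:
            return color
    return "navy"  # placeholder color for the lowest LD bin
```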

Table 1.3 Additional options available from the web form

Web Form Option | Batch Mode Command File Option | Description
Title on Plot | title="My Favorite Locus" | The title above the plot can be specified.
Human Genome Build | n/a (must be selected from web form) | Plots can be generated based on hg18 (default) or hg17 positions.
Legend Location | legend="left" | This specifies the location of the legend within the plot (auto, left, right, none). The default is auto, which tries to select the preferable location (either left or right) depending on the location of data points.
Show RUG | snpset="HapMap"; to display a rug for SNPs in the analysis file: metalRug="Rug SNPs" | Show a "rug" at the top of the plot: a series of vertical tick marks highlighting the positions of SNPs from HapMap CEU (here given as "HapMap") or the markers shown in the plot (use metalRug). Remove the rug in batch mode using snpset=NULL. Other options include "Affy500" or "Illu318", or use "Affy500,Illu318,HapMap" to see all 3.
Maximum Rows of Gene Names | rfrows=4 | LocusZoom will automatically determine the optimal number of rows to display genes and gene names so they are not overlapping. However, if the user wishes to keep all plots the same size, the maximum number of gene rows can be specified. Additional genes may be left off the figure to accommodate this feature so please use with caution. If genes are missing from the plot, this will be indicated on the plot.
Point Size Proportional to Sample Size | weightCol="SampleSize" | This specifies that the "dot size" of the data points will reflect the square root of the sample size (to reflect the s.e.). The default is to have all dot sizes remain the same size.
LD Measure | ldCol="dprime" (or "rsquare") | The color of the data points reflects the LD (r²) with the index SNP. The default is "rsquare".
HapMap Population for LD | n/a (must be selected from web form) | This option allows the user to specify which population is used to obtain LD estimates. The default is CEU from HapMap Phase II but users may select YRI or JPT+CHB from HapMap Phase II, or CEU from 1000 Genomes (August 2009 release).
Highlight Region of Interest | hiStart=425Mb; hiEnd=425.1Mb | A grey box can be used to highlight important regions of the genome. This can reflect the region of an association signal or a region being sequenced, etc.
Theme | theme="publication" | We have created a theme that has larger text and is more easily readable for publication.
Show Annotation | showAnnot=T; showRefsnpAnnot=T; annotPch="21,24,24,25,22,22,8,7" | SNP annotation is available for all 1000G SNPs (Aug 2009 release) and can be displayed on the plot using this option. On the website, various annotation options can be turned on or off. Certain annotation fields can be turned on or off using the annotPch command. To show several categories of SNPs as the same symbol, simply give the same R symbol code for those categories (e.g. annotPch="21,24,24,25,22,22,8,7"). The category listings, together with their default symbol settings, are: Framestop (24, triangle); Splice (24, triangle); NonSynonymous (25, inverted triangle); Synonymous (22, square); UTR (22, square); TFBScons (8, star); MCS44 Placental (7, square with diagonal lines); None-of-the-above (21, filled circle). For more information about the annotation categories used, please see http://research.nhgri.nih.gov/tools/unisnp/?rm=ohelp
Recombination Rate Overlay | showRecomb=T | The estimated recombination rate from HapMap samples can be shown on the plot, or left off. The data plotted are from HapMap: http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/2008-03_rel22_B36/rates/
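When many batch lines share the same settings, it can help to build the argument string from a dictionary. The sketch below follows the conventions visible in the table (quoted strings, bare numbers, T for true, NULL to switch an option off). The function name format_arguments and the use of F for false are assumptions, and the only value check performed is for legend, whose allowed values are listed above.

```python
LEGEND_VALUES = {"auto", "left", "right", "none"}


def format_arguments(options):
    """Render a dict of options as a space-separated key=value string."""
    parts = []
    for key, value in options.items():
        if key == "legend" and value not in LEGEND_VALUES:
            raise ValueError(f"legend must be one of {sorted(LEGEND_VALUES)}")
        if value is None:
            rendered = "NULL"  # e.g. snpset=NULL removes the rug
        elif isinstance(value, bool):
            rendered = "T" if value else "F"  # F for false is an assumption
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            rendered = f'"{value}"'  # strings are wrapped in straight quotes
        parts.append(f"{key}={rendered}")
    return " ".join(parts)
```

For example, a dictionary holding a title, rfrows=4 and showAnnot=True produces a string that can be pasted into the final column of a batch file line.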