Changes

From Genome Analysis Wiki
2,291 bytes added ,  13:29, 28 October 2020
m
Make deprecation warning more visible.
Line 3: Line 3:  
'''LocusZoom''' is designed to facilitate viewing of local association results together with useful information about a locus, such as the location and orientation of the genes it includes, linkage disequilibrium coefficients and local estimates of recombination rates. It was developed by popular demand, as a result of many questions we have had about "How did you make the figures in your talk?" or "How did you make the figures for your GWAS paper?" (And for better or for worse, we have quite a few GWAS papers!!).
   −
LocusZoom can be used in three ways:
+
LocusZoom can be used in four ways:
   −
; Plot Summaries of Your Genomewide Scan Interactively
+
; 1. Plot Summaries of Your Genomewide Scan Interactively
 
: You can upload summary results of your own genomewide scan or genomewide meta-analysis and request plots of regions of interest using a web-based form.
   −
; Generate Many Plots in Batch Mode
+
; 2. Generate Many Plots in Batch Mode
 
: You can upload summary results for your genomewide scan or genomewide meta-analysis and request several plots in one go by uploading a batch file. You will receive results via e-mail. A snail-mail option is not available.
   −
; Plot Summaries of Publicly Available Datasets
+
; 3. Plot Summaries of Publicly Available Datasets
 
: Currently, this includes the results of [http://www.sph.umich.edu/csg/abecasis/public/lipids2008/ our genome-wide scan] for variants associated with HDL-cholesterol, LDL-cholesterol and triglyceride levels in ~20,000 individuals.
   −
We are also developing a distributable code package that you can install on your own system to generate plots locally. This is not yet available, but is expected in April 2010.
+
; 4. Download LocusZoom and Run It on Your Local Unix Machine
 +
: [http://genome.sph.umich.edu/wiki/LocusZoom_Standalone Download LocusZoom] and [http://genome.sph.umich.edu/wiki/LocusZoom_Standalone#Sources_of_SQLite_database_tables associated databases]
 +
 
    
== Upload your own meta-analysis file and generate single plots using a web-based form ==
Line 20: Line 22:  
=== Uploading Your Association Study Results ===
   −
Association results can be uploaded to our web server using the [http://csg.sph.umich.edu/locuszoom/ plot your data webpage]. Result files are limited to 20Mb in size, which allows for a [[gzip|gzipped]] text table including key columns (marker name, p-value and sample size) for up to ~3 million SNPs. In our tests, a typical GWAS results file is ~17 Mb in size after imputation of HapMap SNPs. Once a file is uploaded, LocusZoom will remember the file for the duration of your web session allowing you to generate multiple plots. If you have a slow connection or would like to save time, you can upload results for a region or chromosome of interest only. Your results are entirely confidential and won't be viewed by us or anyone else (except those with whom you share them!)
     −
To specify the region to be plotted, you will have to specify the name of a key marker in the region (typically, as an rs-number), name a gene of interest or provide appropriate genome coordinates.  
+
<span style="color:red">'''The instructions below refer to a "legacy" service that is not actively maintained. For modern datasets, consider using our new [https://my.locuszoom.org my.locuszoom.org] service for the latest features, including Manhattan plots and support for build GRCh38.'''</span>
 +
 
 +
 
 +
'''Please note: You CAN plot SNPs that lack an rsID by naming them in chr:pos format (for example, chr6:20122013).'''
 +
 
 +
Association results can be uploaded to our web server using the [http://locuszoom.org/ plot your data webpage]. Result files are limited to 20 MB in size, which allows for a [[gzip|gzipped]] text table including key columns (marker name, p-value and sample size) for up to ~3 million SNPs. In our tests, a typical GWAS results file is ~17 MB in size after imputation of HapMap SNPs. Once a file is uploaded, LocusZoom will remember the file for the duration of your web session, allowing you to generate multiple plots. If you have a slow connection or would like to save time, you can upload results for a region or chromosome of interest only. Your results are entirely confidential and won't be viewed by us or anyone else (except those with whom you share them!).
 +
 
 +
To specify the region to be plotted, you will have to give the name of a key marker in the region (typically an rs-number, although chr:pos format is also accepted), name a gene of interest or provide appropriate genome coordinates. When displaying linkage disequilibrium, plotting will be very fast for small windows when HapMap CEU linkage disequilibrium is requested (because pairwise coefficients have been precomputed) and a bit slower for larger windows (because linkage disequilibrium coefficients must be computed on the fly).
    
If you include a sample size column in the result file, it will be used to control the size of each plotted marker.
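The upload preparation described above can be sketched in Python. This is only an illustration under stated assumptions: the column headers (<code>MarkerName</code>, <code>P-value</code>, <code>N</code>), the input record layout and the helper names are hypothetical, and the size check assumes the limit means 20 × 1024 × 1024 bytes. The sketch keeps only one chromosome or region, names markers without an rsID in chr:pos format, and gzips the table.

```python
import csv
import gzip
import io

# Assumed reading of the upload limit; the exact byte count is not documented here.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def marker_name(rsid, chrom, pos):
    """Use the rsID when there is one, otherwise fall back to chr:pos naming."""
    return rsid if rsid else f"chr{chrom}:{pos}"


def build_upload(records, chrom=None, start=None, end=None):
    """Return gzipped, tab-delimited bytes for the records inside the region.

    Each record is a dict with hypothetical keys: rsid, chrom, pos, p, n.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Header names are an assumption; match them to whatever your file uses.
    writer.writerow(["MarkerName", "P-value", "N"])
    for rec in records:
        if chrom is not None and rec["chrom"] != chrom:
            continue
        if start is not None and rec["pos"] < start:
            continue
        if end is not None and rec["pos"] > end:
            continue
        name = marker_name(rec.get("rsid"), rec["chrom"], rec["pos"])
        writer.writerow([name, rec["p"], rec["n"]])
    payload = gzip.compress(out.getvalue().encode("utf-8"))
    if len(payload) > MAX_UPLOAD_BYTES:
        raise ValueError("compressed file exceeds the assumed 20 MB limit")
    return payload
```

Writing the returned bytes to a file such as <code>results.txt.gz</code> gives something that can be uploaded; restricting <code>chrom</code>, <code>start</code> and <code>end</code> is what keeps the upload small on a slow connection.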
 +
 +
=== Custom Annotation ===
 +
 +
You may choose to have SNPs displayed using different plotting symbols to distinguish them from each other. To do this, enter the name of a column in your meta-analysis file in the "Column Name" box of the "Custom Annotation" section. This column should list a category of your own choosing for each SNP (e.g. "nonsynonymous", "splice", "intronic", or "Genotyped", "Imputed"); category names may not include spaces. To control the order in which the categories appear in the legend, and to match that order to the pre-selected R plotting symbols (set as pch = 21, 22, 23, 24, 25, 4, 7, 8, 10, 11, 12, 13, 14, 3), list the category names in the desired order in the "Category Order" box of the "Custom Annotation" section. Entries do not need quotes and may not contain spaces, and they should be separated by commas.
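As a rough illustration of the rules above, the following Python sketch builds a "Category Order" string and shows which plotting symbol each category would receive if symbols are handed out in the listed pch order. The function names are hypothetical, and replacing spaces with underscores is simply one way to satisfy the no-spaces rule, not something the web form does for you.

```python
# pch codes in the order listed in the text above.
DEFAULT_PCH = [21, 22, 23, 24, 25, 4, 7, 8, 10, 11, 12, 13, 14, 3]


def clean_category(label):
    """Replace runs of whitespace with underscores so the name has no spaces."""
    return "_".join(label.split())


def category_order(labels):
    """Return (order_string, {category: pch}) for the distinct labels.

    Categories keep the order in which they are first seen; the pairing with
    pch codes assumes symbols are assigned in that same order.
    """
    seen = []
    for label in labels:
        cat = clean_category(label)
        if cat not in seen:
            seen.append(cat)
    if len(seen) > len(DEFAULT_PCH):
        raise ValueError("more categories than available plotting symbols")
    return ",".join(seen), dict(zip(seen, DEFAULT_PCH))
```

For example, <code>category_order(["Genotyped", "Imputed", "Genotyped"])</code> yields the string <code>Genotyped,Imputed</code>, which has the comma-separated, quote-free shape the "Category Order" box expects.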
 +
 +
Alternatively, we have provided functional annotation of all 1000 Genomes (Aug 2009) and HapMap r22 SNPs according to the following categories: Framestop (24, triangle), Splice (24, triangle), NonSynonymous (25, inverted triangle), Synonymous (22, square), UTR (22, square), TFBScons (8, star), MCS44 Placental (7, square with diagonal lines) and None-of-the-above (21, filled circle). To enable this, use the "Show Annotation" section and tick the box beside each annotation category that you would like distinguished. SNPs that are not in any selected category will still be displayed as having no annotation.
 +
 +
=== Plotting of Pairwise Linkage Disequilibrium ===
 +
 +
In the main plot window, data points are colored according to the level of linkage disequilibrium (LD) between each SNP and the index SNP. If users specify the region to display using an index SNP and flanking region, LD of all data points will be relative to the user-specified index SNP. If users specify the region to display using genome coordinates or a gene name, LocusZoom will automatically select the most significant SNP in the region as the index SNP. For all other SNPs in the plot, the color of the data point will reflect the pairwise LD with this index SNP. The default LD measure is r<sup>2</sup> calculated from the HapMap CEU population (release 22), but users have the option of replacing this with D’ and of selecting the HapMap YRI, HapMap CHB+JPT or 1000 Genomes CEU reference panels. To display LD from 1000G CEU, please replace rsIDs with the 1000G naming convention (chrxx:xxxx) whenever possible. Because we have pre-computed LD for all SNPs in HapMap CEU, plots will often generate more quickly if using the default LD information. SNPs with missing LD information are shown in grey.
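To make the two ideas in this section concrete, here is a small Python sketch of (a) choosing the most significant SNP in a region as the index SNP and (b) the textbook definitions of r<sup>2</sup> and D’ computed from phased haplotypes. It is not LocusZoom's own code, which draws on reference-panel data; the record keys and function names are hypothetical.

```python
def pick_index_snp(results, start, end):
    """Return the marker with the smallest p-value inside [start, end], or None."""
    in_region = [r for r in results if start <= r["pos"] <= end]
    if not in_region:
        return None
    return min(in_region, key=lambda r: r["p"])["marker"]


def ld_from_haplotypes(haps):
    """Compute (r2, dprime) from phased haplotypes.

    haps is a list of (a, b) pairs with alleles coded 0/1 at the two SNPs.
    Returns None when either SNP is monomorphic, since LD is then undefined
    (the situation a plot would show as missing LD).
    """
    n = len(haps)
    p_a = sum(a for a, _ in haps) / n
    p_b = sum(b for _, b in haps) / n
    p_ab = sum(1 for a, b in haps if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    r2 = d * d / denom
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max
```

Two SNPs whose alleles always travel together give r<sup>2</sup> = 1 and D’ = 1, while two independent SNPs give values of 0 for both measures.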
    
=== Customizing the Display of Your Results ===
Line 72: Line 90:
<source lang="text">
 
Feature    chr    start    end      flank        plot    arguments
rs1     na     na       na      500kb        yes      rfrows=3 weightCol="N" snpset="HapMap" metalRug="Our SNPs"
+
rs1   na   na     na      500kb        yes      rfrows=3 weightCol="N" snpset="HapMap" metalRug="Our SNPs"
 
</source>
Line 80: Line 98:
<source lang="text">
 
Feature    chr    start    end      flank        plot    arguments
rs2     1     540000  580000  na     yes      rfrows=4 legend="right" showAnnot=T
+
rs2   1   540000  580000  na           yes      rfrows=4 legend="right" showAnnot=T
 
</source>
Line 88: Line 106:
<source lang="text">
 
Feature    chr    start    end      flank        plot    arguments
CETP     na     na       na      200kb        yes      rfrows=6 showAnnot=T annotPch="1,24,24,25,22,21,8,7"
+
CETP   na      na     na      200kb        yes      rfrows=6 showAnnot=T annotPch="1,24,24,25,22,21,8,7"
 
</source>
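Batch rows like the ones above can also be generated programmatically. The Python sketch below is a hypothetical helper, not part of LocusZoom: it assumes tab-separated columns, writes <code>na</code> for unspecified fields, and renders option values the way the examples show them (quoted strings, <code>T</code> for true, <code>NULL</code> for an explicitly empty option). The <code>F</code> and <code>no</code> spellings are assumptions, since the examples only show <code>T</code> and <code>yes</code>.

```python
HEADER = ["Feature", "chr", "start", "end", "flank", "plot", "arguments"]


def format_args(options):
    """Render plotting options as space-separated key=value pairs."""
    parts = []
    for key, value in options.items():
        if value is None:
            parts.append(f"{key}=NULL")          # e.g. snpset=NULL removes the rug
        elif isinstance(value, bool):
            parts.append(f"{key}={'T' if value else 'F'}")  # 'F' is an assumption
        elif isinstance(value, str):
            parts.append(f'{key}="{value}"')     # straight quotes around strings
        else:
            parts.append(f"{key}={value}")
    return " ".join(parts)


def hitspec_row(feature, chrom=None, start=None, end=None, flank=None,
                plot=True, **options):
    """Build one tab-separated row; unspecified region fields become 'na'."""
    def cell(value):
        return "na" if value is None else str(value)

    return "\t".join([
        feature, cell(chrom), cell(start), cell(end), cell(flank),
        "yes" if plot else "no",                 # 'no' is an assumption
        format_args(options),
    ])


def hitspec_file(rows):
    """Join the header and the rows into the text of a batch file."""
    return "\n".join(["\t".join(HEADER)] + list(rows)) + "\n"
```

For instance, <code>hitspec_row("CETP", flank="200kb", rfrows=6, showAnnot=True)</code> reproduces the shape of the third example: a gene name, three <code>na</code> fields, a flank, <code>yes</code>, and the arguments.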
   Line 97: Line 115:  
== Generate single plots using our publicly-available lipids GWAS data ==
   −
#Selecting regions to display using our lipids data<br>The plots were designed to examine ~ 1 Megabase windows of the genome, although for regions with several association signals or long-range linkage disequilibrium patterns, plots extending as large as a few Mb can be drawn. The user can specify the region to display in the LocusZoom plot in one of three ways; 1) an index SNP and a flanking region, 2) the chromosome together with start and stop positions (in basepairs), 3) gene name and a flanking region.
+
In addition to plotting your own results, you can plot the results of some publicly available GWAS. Currently, the only publicly available set of results is our GWAS for loci determining blood lipid levels (Kathiresan et al., Nature Genetics 2009). Just as when plotting your own data, you can specify the region to display using 1) an index SNP and a flanking region, 2) the chromosome together with start and stop positions (in basepairs), or 3) a gene name and a flanking region.
#Displaying LD information<br>In the main plot window, data points are colored according to their level of linkage disequilibrium (LD) with the index SNP. If users specify the region to display using an index SNP and flanking region, LD of all data points will be relative to the user-specified index SNP. If users specify the region to display using options 2 and 3 above, LocusZoom will select the most significant SNP in the region. For all other SNPs in the plot, the color of the data point will reflect the pair-wise LD patterns with this index SNP. The default LD which will be displayed is r2 from the HapMap CEU population (release 22), but users have the option to select either r2 or D’ from; HapMap CEU, HapMap YRI, Hapmap CHB+JPT, 1000 Genomes CEU. Because we have pre-computed LD for all SNPs in HapMap CEU, plots will generate very quickly if using the default LD information, provided the region to display is less than 500kb either side of the index SNP. SNPs with missing LD information are shown in grey.
     −
Table 1.3 Additional options available from the web form
+
== Commonly Used LocusZoom Options ==
   −
{| border="1" width="80%" align="center"
+
 
 +
{| border="1" width="100%" align="center"
 
|- bgcolor="lightgray"
! Web Form Option
+
! Web Form  
! Batch Mode Command File Option
+
! "Hit Spec" File Key-Value Pair
 
! Description
 
|-
 
| Title on Plot
 
| title="My Favorite Locus"
| The title above the plot can be specified
+
| Specifies large text displayed above the plot
 
|-
 
| Human Genome Build
| n/a (must be selected from web form)
+
| n/a  
| Plots can be generated based on hg 18 (default) or hg17 positions
+
| Plots can be generated based on hg18 (the default) or hg17 positions
 
|-
 
| Legend Location
 
| legend="left"
| This specifies the location of the legend within the plot, the default is auto. Auto tries to select the preferential location (either left or right) depending on the location of data points.  
+
| This specifies the location of the legend within the plot. The default is auto, which tries to select a location that overlaps a minimal number of data points. Allowed values: auto, left, right, none.
(auto, left, right, none)  
  −
 
   
|-
| Show RUG
+
| SNP Position Rug
| snpset=”HapMap”<br>To display rug for SNPs in analysis file;<br>metalRug=”Rug SNPs”  
+
| snpset="HapMap" metalRug="Rug SNPs"
| Show a “rug” at the top of the plot – a series of vertical tick marks highlighting the positions of SNPs from HapMap CEU (here given as “HapMap”) or the markers shown in the plot (use metalRug). Remove the rug in batch mode using snpset=NULL. Other options include "Affy500",or "Illu318", or use "Affy500,Illu318,HapMap" to see all 3.
+
| These options control the display of tick marks indicating SNP positions at the top of the plot. Setting snpset="HapMap", snpset="Illu318" or snpset="Affy500" displays a fixed set of SNPs (you can also try snpset="Affy500,Illu318,HapMap" to see all three). The metalRug option displays a rug that includes only the SNPs that are actually plotted. To remove the rug in batch mode, set snpset=NULL.
 
|-
| Maximum Rows of Gene Names  
+
| Number of Rows for Gene Names  
 
| rfrows=4
| LocusZoom will automatically determine the optimal number of rows to display genes and gene names so they are not overlapping. However, if the user wishes to keep all plots the same size, the maximum number of gene rows can be specified. Additional genes may be left off the figure to accommodate this feature so please use with caution. If genes are missing from the plot, this will be indicated on the plot.
+
| LocusZoom automatically tries to determine the number of display rows to use for genes and gene names so that they do not overlap. This can make each plot prettier, but is not ideal when you want to compare many plots side by side. To ensure a fixed amount of space is used for gene names, use this option to set the maximum number of display rows. If LocusZoom runs out of plotting space and some genes are left out, a warning will be added to the plot.
 
|-
| Point Size Proportional to Sample Size
+
| Point Size
 
| weightCol="SampleSize"
| This specifies that the “dot size” of the data points will reflect the square-root of the sample size (to reflect the s.e.). The default is to have all dot sizes remain the same size.
+
| This specifies that the “dot size” of each data point will reflect the square root of the sample size. The default is for all dots to be the same size.
 
|-
 
| LD Measure
 
| ldCol="dprime" ("rsquare")
| The color of the data points reflects the LD (r2) with the index SNP. The default is "rsquare".
+
| Colors data points according to the selected LD measure. The default is "rsquare".
 
|-
| HapMap Population for LD  
+
| Reference Population for LD  
| n/a (must be selected from web form)
+
| n/a  
| This option allows the user to specify which population is used to obtain LD estimates. The default is CEU from HapMap Phase II but users may select YRI or JPT+CHB from HapMap Phase II, or CEU from 1000 Genomes (August 2009 release).
+
| This option allows the user to specify which reference panel is used to obtain LD estimates. The default is CEU from HapMap Phase II but users may select YRI or JPT+CHB from HapMap Phase II, or CEU from 1000 Genomes (August 2009 or June 2010 release).
 
|-
 
| Highlight Region of Interest
| hiStart=425Mb<br>hiEnd=425.1Mb  
+
| hiStart=425Mb hiEnd=425.1Mb  
| A grey box can be used to highlight important regions of the genome – this can reflect the region of an association signal or a region being sequenced, etc.
+
| A grey box can be used to highlight important regions of the genome – this can reflect where an association signal peaks or a region selected for sequencing, for example.
 
|-
 
| Theme
Line 151: Line 167:  
|-
 
| Show Annotation
| showAnnot=T<br>showRefsnpAnnot=T<br>annotPch=”21,24,24,25,22,22,8,7”  
+
| showAnnot=T showRefsnpAnnot=T annotPch="21,24,24,25,22,22,8,7"
| SNP annotation is available for all 1000G SNPs (Aug 2009 release) and can be displayed on the plot using this option. On the website, various annotation options can be turned on or off.<br>Certain annotation fields can be turned on or off using the annotPch command. To show several categories of SNPs as the same symbol, simply give the same R symbol code for those categories (e.g. annotPch=”21,24,24,25,22,22,8,7”). The category listings, together with their default symbol setting are;<br>Framestop (24, triangle)<br>Splice (24, triangle)<br>NonSynonymous (25, inverted triangle)<br>Synonymous (22, square)<br>UTR (22, square)<br>TFBScons (8, star)<br>MCS44 Placental (7, square with diagonal lines)<br>None-of-the-above (21, filled circle). <br>For more information about the annotation categories used, please see http://research.nhgri.nih.gov/tools/unisnp/?rm=ohelp
+
| SNP annotation is available for all 1000G SNPs (Aug 2009 release) and can be enabled with the showAnnot=T option. The annotPch option allows you to customize the R plotting symbol used for each kind of SNP; it is okay to use the same symbol for more than one category. The annotation categories, together with their default symbol settings, are: Framestop (24, triangle), Splice (24, triangle), NonSynonymous (25, inverted triangle), Synonymous (22, square), UTR (22, square), TFBScons (8, star), MCS44 Placental (7, square with diagonal lines) and None-of-the-above (21, filled circle). For more information about the annotation categories used, please see http://research.nhgri.nih.gov/tools/unisnp/?rm=ohelp
 
|-
 
| Recombination Rate Overlay
 
| showRecomb=T
| The estimated recombination rate from HapMap samples can be shown on the plot, or left off. The data plotted are from Hapmap; http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/2008-03_rel22_B36/rates/
+
| The estimated recombination rate from HapMap samples can be shown on the plot or left off. The data plotted are from HapMap: http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/2008-03_rel22_B36/rates/
 
|}
 
|}
 +
 +
For a full list of options that can be used in Batch Mode with a hitspec file, please see [http://genome.sph.umich.edu/wiki/LocusZoom_Standalone#Plotting_options this list].
    
[[Category:Software]]