
From Genome Analysis Wiki

LocusZoom

3,965 bytes added, 13:29, 28 October 2020
Make deprecation warning more visible.
= LocusZoom =
[[Image:LocusZoomSmall.png]]

'''LocusZoom''' is designed to facilitate viewing of local association results together with useful information about a locus, such as the location and orientation of the genes it includes, linkage disequilibrium coefficients and local estimates of recombination rates. Plots are designed to show windows of ~1 Mb of the genome, although plots spanning a few Mb can be drawn for regions with several association signals or long-range linkage disequilibrium patterns. LocusZoom was developed by popular demand, as a result of many questions we have had about "How did you make the figures in your talk?" or "How did you make the figures for your GWAS paper?" (And for better or for worse, quite a few GWAS papers!!)

LocusZoom can be used in four ways:
; 1. Plot Summaries of Your Genomewide Scan Interactively: You can upload summary results of your own genomewide scan or genomewide meta-analysis and request plots of regions of interest using a web-based form.
; 2. Generate Many Plots in Batch Mode: You can upload summary results for your genomewide scan or genomewide meta-analysis and request several plots in one go by uploading a batch file. You will receive results via e-mail. A snail-mail option is not available.
; 3. Plot Summaries of Publicly Available Datasets: Currently, this includes the results of [http://www.sph.umich.edu/csg/abecasis/public/lipids2008/ our genome-wide scan] for variants associated with HDL-cholesterol, LDL-cholesterol and triglyceride levels in ~20,000 individuals.
; 4. Download LocusZoom and run it on your local Unix machine: [http://genome.sph.umich.edu/wiki/LocusZoom_Standalone Download LocusZoom] and the [http://genome.sph.umich.edu/wiki/LocusZoom_Standalone#Sources_of_SQLite_database_tables associated databases].

== Upload your own meta-analysis file and generate single plots using a web-based form ==

=== Uploading Your Association Study Results ===

<span style="color:red">'''The instructions below refer to a "legacy" service that is not actively maintained. For modern datasets, consider using our new [https://my.locuszoom.org my.locuszoom.org] service for the latest features, including Manhattan plots and support for build GRCh38.'''</span>

'''Please note: You CAN plot SNPs without an rsid by naming them in chr:position format (e.g. chr6:20122013), with positions on the same genome build as the plot.'''

Association results can be uploaded to our web server using the [http://locuszoom.org/ plot your data webpage]. Result files are limited to 20 Mb in size, which allows for a [[gzip|gzipped]] text table including key columns (marker name, p-value and sample size) for up to ~3 million SNPs. In our tests, a typical GWAS results file is ~17 Mb in size after imputation of HapMap SNPs.
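One way to stay under the 20 Mb limit is to keep only the marker name, p-value and (optionally) sample size columns and gzip the result before uploading. The Python sketch below does that; it assumes your results file is a delimited text table with a header line, and the function name <code>trim_results</code> and the column names in the usage example are hypothetical placeholders, not part of LocusZoom.

<source lang="python">
import csv
import gzip


def trim_results(in_path, out_path, keep_columns, delimiter="\t"):
    """Write only `keep_columns` of a delimited results file to a gzipped,
    tab-delimited copy and return the number of data rows written.

    The column names are whatever your own file uses (hypothetical here).
    """
    with open(in_path, newline="") as src, \
            gzip.open(out_path, "wt", newline="") as dst:
        reader = csv.DictReader(src, delimiter=delimiter)
        header = reader.fieldnames or []
        # Fail early if a requested column is not in the header line.
        missing = [c for c in keep_columns if c not in header]
        if missing:
            raise ValueError(f"columns not found in header: {missing}")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(keep_columns)  # new, smaller header line
        rows = 0
        for row in reader:
            writer.writerow([row[c] for c in keep_columns])
            rows += 1
    return rows
</source>

For example, <code>trim_results("meta.tbl", "meta.small.txt.gz", ["MarkerName", "P-value", "N"])</code> would keep three columns of a hypothetical METAL-style table. Uploading only one region or chromosome, as described below, can be done the same way by filtering rows before writing.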
Once a file is uploaded, LocusZoom will remember it for the duration of your web session, allowing you to generate multiple plots. If you have a slow connection or would like to save time, you can upload results for a region or chromosome of interest only. Your results are entirely confidential and won't be viewed by us or anyone else (except those with whom you share them!).

To specify the region to be plotted, give the name of a key marker in the region (typically as an rs-number, although chr:pos format also works), name a gene of interest, or provide appropriate genome coordinates. When displaying linkage disequilibrium, plotting will be very fast for small windows when HapMap CEU linkage disequilibrium is requested (because pairwise coefficients have been precomputed) and a bit slower for larger windows (because linkage disequilibrium coefficients must be computed on the fly). If you include a sample size column in the results file, it will be used to control the size of each plotted marker.

=== Custom Annotation ===

You may choose to have SNPs displayed using different plotting symbols to distinguish them from each other. To do this, enter the name of a column in your meta-analysis file in the "Column Name" box of the "Custom Annotation" section. This column should list a category of your own choosing for each SNP (e.g. "nonsynonymous", "splice", "intronic", or "Genotyped", "Imputed"); category names may not include any spaces. To set the order in which categories appear in the legend, and hence which of the pre-selected R plotting symbols (pch = 21, 22, 23, 24, 25, 4, 7, 8, 10, 11, 12, 13, 14, 3) each category receives, list the category names in that order in the "Category Order" box of "Custom Annotation". Entries (which may not contain spaces) do not need quotes but must be separated by commas.
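To illustrate how the "Category Order" entry relates to the plotting symbols, the sketch below checks a list of category names for spaces and pairs each name with the pre-selected pch codes in order. It assumes that symbols are assigned strictly in the listed order; the function and constant names are hypothetical and this is not LocusZoom's own code.

<source lang="python">
# Pre-selected R plotting symbols, in the order quoted in the text above.
PRESET_PCH = [21, 22, 23, 24, 25, 4, 7, 8, 10, 11, 12, 13, 14, 3]


def category_order_field(categories):
    """Validate custom annotation categories and return the comma-separated
    "Category Order" string plus the pch symbol each category would get,
    assuming symbols are handed out in list order (an assumption)."""
    seen = set()
    for name in categories:
        # Category names may not contain spaces (or be empty).
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"category names may not contain spaces: {name!r}")
        if name in seen:
            raise ValueError(f"duplicate category: {name!r}")
        seen.add(name)
    if len(categories) > len(PRESET_PCH):
        raise ValueError("more categories than pre-selected plotting symbols")
    symbols = dict(zip(categories, PRESET_PCH))
    return ",".join(categories), symbols
</source>

For instance, <code>category_order_field(["Genotyped", "Imputed"])</code> returns <code>("Genotyped,Imputed", {"Genotyped": 21, "Imputed": 22})</code>; the first value is what would be typed into the "Category Order" box.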
Alternatively, we have provided functional annotation of all 1000 Genomes (Aug 2009) and HapMap r22 SNPs according to the following categories: Framestop (24, triangle), Splice (24, triangle), NonSynonymous (25, inverted triangle), Synonymous (22, square), UTR (22, square), TFBScons (8, star), MCS44 Placental (7, square with diagonal lines) and None-of-the-above (21, filled circle). To use it, go to the "Show Annotation" section and tick the box beside each annotation category that you would like distinguished. SNPs that are not in any selected category will still be displayed, as having no annotation.

=== Plotting of Pairwise Linkage Disequilibrium ===

In the main plot window, data points are colored according to the linkage disequilibrium (LD) of each SNP with the index SNP. If users specify the region to display using an index SNP and flanking region, LD for all data points will be relative to the user-specified index SNP. If users specify the region to display using genome coordinates or a gene name, LocusZoom will automatically select the most significant SNP in the region as the index SNP. For all other SNPs in the plot, the color of the data point will reflect the pairwise LD with this index SNP. The default LD measure is r<sup>2</sup> calculated from the HapMap CEU population (release 22), but users have the option to replace this with D’ and to select the HapMap YRI, HapMap CHB+JPT or 1000 Genomes CEU reference panels. To display LD from 1000G CEU, please replace rsids with the 1000G naming convention (chrxx:xxxx) whenever possible. Because we have pre-computed LD for all SNPs in HapMap CEU, plots will often generate more quickly when using the default LD information. SNPs with missing LD information are shown in grey.
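The coloring rule described above can be sketched as follows: when no index SNP is given, pick the SNP with the smallest p-value, then place every other SNP into a color bin according to its r<sup>2</sup> with the index SNP, using grey when LD is missing. The bin boundaries (steps of 0.2) and color names are assumptions chosen for illustration, and the function names are hypothetical; this is not LocusZoom's own implementation.

<source lang="python">
import math

# Illustrative r^2 bins as (lower bound, color label). The real LocusZoom
# palette and cut points are an assumption here, not taken from this page.
LD_BINS = [(0.8, "red"), (0.6, "orange"), (0.4, "green"), (0.2, "lightblue"), (0.0, "navy")]
MISSING_LD = "grey"  # SNPs with missing LD information are shown in grey


def pick_index_snp(pvalues):
    """Most significant SNP (smallest p-value), used when no index SNP is given."""
    return min(pvalues, key=pvalues.get)


def ld_color(r2):
    """Map a pairwise LD value with the index SNP to a color label."""
    if r2 is None or math.isnan(r2):
        return MISSING_LD
    for lower, color in LD_BINS:
        if r2 >= lower:
            return color
    return MISSING_LD  # only reached for negative (invalid) values


def color_points(pvalues, ld_with_index, index_snp=None):
    """Return (index SNP, {marker: color label}) for one plotted region.

    pvalues:       {marker: p-value} for every SNP in the region
    ld_with_index: {marker: r^2 with the index SNP}; absent markers are missing
    """
    if index_snp is None:
        index_snp = pick_index_snp(pvalues)
    colors = {}
    for marker in pvalues:
        if marker == index_snp:
            colors[marker] = "index"  # the index SNP gets its own marker
        else:
            colors[marker] = ld_color(ld_with_index.get(marker))
    return index_snp, colors
</source>

Passing an explicit <code>index_snp</code> corresponds to specifying the region by an index SNP and flanking region; leaving it out corresponds to specifying coordinates or a gene name.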
=== Customizing the Display of Your Results ===

All options listed in the table of commonly used LocusZoom options below are available, as well as the options listed here:

{| class="wikitable" border="10" cellpadding="3"
|- bgcolor="lightgray"
! Setting
! Default Value
! Details
|-
| Column Delimiter
| none
| Users must specify the type of column delimiter in the results file.
|-
| Pvalue Column Name
| none
| Users must specify the name of the column that contains the p-values.
|-
| Marker Column Name
| none
| Users must specify the heading of the column that contains marker names.
|-
| Human Genome Build
| hg18
| Plots can be generated based on hg18 (default) or hg17 positions.
|-
| HapMap Population for LD
| CEU
| This option allows the user to specify which HapMap population is used to obtain LD estimates. The default is CEU but users may select YRI or JPT+CHB.
|}

== Using Batch Mode ==

To start batch mode, first upload your results file just as you would in interactive mode. The same file size restrictions apply.

=== Generating a Hit Spec File ===

Batch mode allows you to conveniently specify a set of plots to be generated in a "Hit Spec" file. This is handy if you need to generate large numbers of plots, or if you want to plot the same set of regions after updating a genomewide analysis, for example. The "Hit Spec" file is a whitespace-delimited text file. It has six mandatory columns, which can be followed by a series of optional '''key'''=''value'' pairs to allow for detailed customization of each plot. The first line in the file is assumed to be a header and is ignored. Each subsequent line describes a single plot. There are three ways to select a region to plot:

;Plotting a window flanking an interesting SNP: This option allows you to plot results for all markers within a specific distance (e.g. 500kb) of an index SNP. To use this option, set column 1 to the name of the index SNP (e.g. ''rs1'' below) and set column 5 to the flanking distance (e.g. 500kb below). Here is an example:
<source lang="text">
Feature chr start end flank plot arguments
rs1 na na na 500kb yes rfrows=3 weightCol="N" snpset="HapMap" metalRug="Our SNPs"
</source>
; Plotting a region flanking an interesting SNP: This option is similar to the previous option, but allows you to specify an asymmetric region of interest. For example, perhaps you are interested in a plot that extends a bit further to the right of the SNP of interest. In this case, specify the chromosome, start and end coordinates of the region to be plotted in columns 2, 3, and 4.
Here is an example:
<source lang="text">
Feature chr start end flank plot arguments
rs2 1 540000 580000 na yes rfrows=4 legend="right" showAnnot=T
</source>
; Plotting a region flanking a gene of interest: This option allows you to focus on a particular gene, rather than a specific SNP. It is similar to the first option: set column 1 to the name of the gene of interest and column 5 to the desired flanking distance. When you use this option, LocusZoom will automatically select an index SNP for each region; the SNP will be the site with the smallest p-value. Here is an example:
<source lang="text">
Feature chr start end flank plot arguments
CETP na na na 200kb yes rfrows=6 showAnnot=T annotPch="1,24,24,25,22,21,8,7"
</source>

The sixth column in the "Hit Spec" file can be used to enable (with the value ''yes'') or disable (with the value ''no'') an individual plot. For example, if you run a "Hit Spec" file with 15 plots and 14 of them turn out very nicely, you may wish to re-run the "Hit Spec" file with some tweaks to the problem plot. In this case (and if you dislike waiting for your results as much as we do!), you could disable generation of the plots that already look fine by changing their 6th column to "no", and leave the plot that you tweaked as "yes".

The 7th and final column contains additional LocusZoom arguments as '''key'''=''value'' pairs separated by spaces. Any number of '''key'''=''value'' pair arguments can be included. For details of available options, see the section entitled Commonly Used LocusZoom Options below.

== Generate single plots using our publicly-available lipids GWAS data ==

In addition to plotting your own results, you can plot the results of some publicly available GWAS. Currently, the only publicly available set of results is our GWAS for loci determining blood lipid levels (Kathiresan et al, Nature Genetics 2009).
Just like when you are plotting your own data, you can specify 1) an index SNP and a flanking region, 2) the chromosome together with start and stop positions (in base pairs), or 3) a gene name and a flanking region.

== Commonly Used LocusZoom Options ==

{| border="1" width="100%" align="center"
|- bgcolor="lightgray"
! Web Form
! "Hit Spec" File Key-Value Pair
! Description
|-
| Title on Plot
| title="My Favorite Locus"
| Specifies large text displayed above the plot.
|-
| Human Genome Build
| n/a
| Plots can be generated based on hg18 (default) or hg17 positions.
|-
| Legend Location
| legend="left"
| This specifies the location of the legend within the plot; the default is auto. Auto tries to select a location that overlaps a minimal number of data points. Possible values are auto, left, right and none.
|-
| SNP Position Rug
| snpset="HapMap" metalRug="Rug SNPs"
| These options control the display of tick marks indicating SNP positions at the top of the plot. Setting snpset="HapMap", snpset="Illu318" or snpset="Affy500" displays a fixed set of SNPs (you can also try snpset="Affy500,Illu318,HapMap" to see all three). The metalRug option displays a rug that only includes the SNPs that are actually plotted. To remove the rug in batch mode, set snpset=NULL.
|-
| Number of Rows for Gene Names
| rfrows=4
| LocusZoom automatically tries to determine the number of display rows to use for genes and gene names so that they do not overlap. This makes each plot prettier, but is not ideal when you want to compare many plots side by side. To ensure a fixed amount of space is used for gene names, use this option to set the maximum number of display rows. If LocusZoom runs out of plotting space and some genes are left out, a warning will be added to the plot.
|-
| Point Size
| weightCol="SampleSize"
| This specifies that the "dot size" of each data point will reflect the square root of the sample size. The default is to have all dot sizes equal.
|-
| LD Measure
| ldCol="dprime" (or "rsquare")
| Colors data points according to the selected LD measure. The default is "rsquare".
|-
| Reference Population for LD
| n/a
| This option allows the user to specify which reference panel is used to obtain LD estimates. The default is CEU from HapMap Phase II but users may select YRI or JPT+CHB from HapMap Phase II, or CEU from 1000 Genomes (August 2009 or June 2010 release).
|-
| Highlight Region of Interest
| hiStart=425Mb hiEnd=425.1Mb
| A grey box can be used to highlight important regions of the genome; this can reflect where an association signal peaks or a region selected for sequencing, for example.
|-
| Theme
| theme="publication"
| We have created a theme that has larger text and is more easily readable for publication.
|-
| Show Annotation
| showAnnot=T showRefsnpAnnot=T annotPch="21,24,24,25,22,22,8,7"
| SNP annotation is available for all 1000G SNPs (Aug 2009 release) and can be enabled with the showAnnot=T option. The annotPch option allows you to customize the R plotting symbol used for each kind of SNP; it is fine to use the same symbol for more than one category. The annotation categories, together with their default symbols, are: Framestop (24, triangle), Splice (24, triangle), NonSynonymous (25, inverted triangle), Synonymous (22, square), UTR (22, square), TFBScons (8, star), MCS44 Placental (7, square with diagonal lines) and None-of-the-above (21, filled circle). For more information about these annotation categories, please see http://research.nhgri.nih.gov/tools/unisnp/?rm=ohelp
|-
| Recombination Rate Overlay
| showRecomb=T
| The estimated recombination rate from HapMap samples can be shown on the plot or left off. The data plotted are from HapMap: http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/2008-03_rel22_B36/rates/
|}
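Sizes in the "flank" column of a Hit Spec file and in options such as hiStart/hiEnd are written with unit suffixes (500kb, 425.1Mb). A small Python sketch that converts such strings to base pairs and computes the window around an index SNP might look like this. It assumes the flank is applied on each side of the index SNP and that plain numbers are base pairs; the helper names are hypothetical and this is not LocusZoom's own parser.

<source lang="python">
import re

# Unit multipliers for size strings (a local sketch, not LocusZoom code).
_UNITS = {"": 1, "b": 1, "bp": 1, "kb": 1_000, "mb": 1_000_000}


def parse_bp(text):
    """Convert '500kb', '425.1Mb', '540000' etc. to an integer number of base pairs."""
    m = re.fullmatch(r"\s*(\d*\.?\d+)\s*([A-Za-z]*)\s*", text)
    if m is None or m.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognised size: {text!r}")
    # round() absorbs floating-point noise such as 425.1 * 1e6.
    return round(float(m.group(1)) * _UNITS[m.group(2).lower()])


def flank_window(position, flank):
    """(start, end) covering `flank` on each side of an index SNP position,
    assuming a symmetric flank; start is clipped at position 1."""
    half = parse_bp(flank)
    return max(1, position - half), position + half
</source>

With these assumptions, an index SNP at position 1,000,000 with a 500kb flank gives the window (500000, 1500000).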
For a full list of options that can be used in a Batch Mode "Hit Spec" file, please see [http://genome.sph.umich.edu/wiki/LocusZoom_Standalone#Plotting_options this list].
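Before uploading a batch job, it can help to check a "Hit Spec" file locally. The sketch below skips the header line, splits each row into the six mandatory columns plus '''key'''=''value'' arguments (Python's <code>shlex</code> handles quoted values such as metalRug="Our SNPs"), and keeps only rows whose sixth column is ''yes''. It is a hypothetical helper, not part of LocusZoom; because it strips the quotes from argument values, it is meant for inspecting a file rather than rewriting it.

<source lang="python">
import shlex

# Names for the six mandatory columns (labels chosen here for readability).
COLUMNS = ["feature", "chr", "start", "end", "flank", "plot"]


def parse_hitspec(lines):
    """Parse Hit Spec lines and return the rows that will be plotted.

    The first line is treated as a header and ignored; blank lines are skipped.
    Each returned row is a dict of the six columns plus an 'args' dict.
    """
    rows = []
    for lineno, line in enumerate(lines):
        if lineno == 0 or not line.strip():
            continue
        tokens = shlex.split(line)  # whitespace-delimited, quote-aware
        if len(tokens) < len(COLUMNS):
            raise ValueError(f"line {lineno + 1}: expected at least 6 columns")
        row = dict(zip(COLUMNS, tokens[:6]))
        args = {}
        for tok in tokens[6:]:
            key, sep, value = tok.partition("=")
            if not sep:
                raise ValueError(f"line {lineno + 1}: not a key=value pair: {tok!r}")
            args[key] = value
        row["args"] = args
        # Sixth column: "yes" enables the plot, "no" disables it.
        if row["plot"].lower() == "yes":
            rows.append(row)
    return rows
</source>

Passing an open file (e.g. <code>parse_hitspec(open("specfile.txt"))</code>, file name hypothetical) lists which plots a run would produce and with which arguments.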
[[Category:Software]]