LocusZoom

From Genome Analysis Wiki
Revision as of 22:25, 2 February 2010 by Dharknes

There are three main ways to generate plots:

  1. Generate single plots using our publicly available data from a genome-wide scan of ~20,000 individuals for HDL-cholesterol, LDL-cholesterol and triglyceride levels.
  2. Upload your own meta-analysis file and generate single plots using a web-based form.
  3. Upload your own meta-analysis file and generate many plots at once by uploading a specification file. Results will be emailed back to the user.

1 Generate single plots using our publicly available lipids GWAS data

1.1 Selecting regions to display using our lipids data

The plots were designed to examine windows of ~1 Mb of the genome, although for regions with several association signals or long-range linkage disequilibrium patterns, plots spanning up to a few Mb can be drawn. The user can specify the region to display in one of three ways:

  1. an index SNP and a flanking region;
  2. the chromosome together with start and stop positions (in base pairs);
  3. a gene name and a flanking region.

1.2 Displaying LD information

In the main plot window, data points are colored according to their level of linkage disequilibrium (LD) with the index SNP. If the region is specified using an index SNP and flanking region (option 1 above), LD for all data points is relative to that user-specified index SNP. If the region is specified using option 2 or 3, LocusZoom selects the most significant SNP in the region as the index SNP, and the color of every other data point reflects its pairwise LD with that SNP.

The default LD measure is r² from the HapMap CEU population (release 22), but users can instead select either r² or D′ from HapMap CEU, HapMap YRI, HapMap CHB+JPT, or 1000 Genomes CEU. Because LD has been pre-computed for all SNPs in HapMap CEU, plots are generated very quickly with the default LD information, provided the region extends less than 500 kb on either side of the index SNP. SNPs with missing LD information are shown in grey.
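The region and index-SNP rules in 1.1 and 1.2 can be summarised in a short Python sketch. It is not LocusZoom's own code: every function name here is hypothetical, flank strings are assumed to use the bp/kb/Mb suffixes seen on this page, and the fast-path check assumes the 500 kb limit is inclusive, as section 3.2 describes it.

```python
import re
from typing import Dict, Iterable, Optional, Tuple, Union

# Hypothetical helpers illustrating the rules above; not LocusZoom's own code.
# Multipliers for the flank suffixes used on this page ("200kb", "500kb", "1Mb").
_UNITS = {"bp": 1, "kb": 1_000, "mb": 1_000_000}


def parse_flank(flank: str) -> int:
    """Convert a flank such as '500kb' or '1Mb' into a number of base pairs."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(bp|kb|mb)\s*", flank, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised flank: {flank!r}")
    value, unit = m.groups()
    return round(float(value) * _UNITS[unit.lower()])


def window_around(position: int, flank: str) -> Tuple[int, int]:
    """Start and stop of a window extending `flank` on either side of `position`."""
    half = parse_flank(flank)
    return max(0, position - half), position + half


def choose_index_snp(pvalues: Dict[str, float], index_snp: Optional[str] = None) -> str:
    """Keep a user-specified index SNP; otherwise use the most significant SNP."""
    if index_snp is not None:
        return index_snp
    return min(pvalues, key=pvalues.get)


def ld_or_grey(snps: Iterable[str], index_snp: str,
               r2_with_index: Dict[str, float]) -> Dict[str, Union[float, str]]:
    """r^2 of each SNP with the index SNP, or 'grey' where LD is missing."""
    colours: Dict[str, Union[float, str]] = {}
    for snp in snps:
        if snp == index_snp:
            colours[snp] = 1.0  # a SNP is in perfect LD with itself
        else:
            colours[snp] = r2_with_index.get(snp, "grey")
    return colours


def uses_precomputed_ld(flank: str, population: str = "CEU", measure: str = "r2") -> bool:
    """Assumed fast path: default HapMap CEU r^2, at most 500kb either side."""
    return population == "CEU" and measure == "r2" and parse_flank(flank) <= 500_000
```

For example, choose_index_snp({"rs1": 3e-8, "rs2": 1e-12}) returns "rs2", and window_around(540_000, "500kb") returns (40000, 1040000).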
Table 1.3 Additional options available from the web form

Additional options that can be used when generating any type of plot. Each entry gives the option's name on the web form, then the argument to include in a specification file when using batch mode (the table's second column), then a description.

Title on Plot
    Specification file: title="My Favorite Locus"
    Sets the title shown above the plot.

Human Genome Build
    Specification file: n/a (must be selected from the web form)
    Plots can be generated using hg18 (default) or hg17 positions.

Legend Location
    Specification file: legend="right"
    Sets the location of the legend within the plot: left (default), right, or none.

Show RUG
    Specification file: snpset="HapMap" for HapMap CEU SNPs; metalRug="Rug SNPs" for the SNPs in the analysis file
    Shows a "rug" at the top of the plot: a series of vertical tick marks highlighting the positions of SNPs from HapMap CEU (snpset="HapMap") or of the markers shown in the plot (metalRug).

Maximum Rows of Gene Names
    Specification file: rfrows=3
    LocusZoom automatically determines the optimal number of rows for displaying genes and gene names so that they do not overlap. If you wish to keep all plots the same size, you can instead specify the maximum number of gene rows. Genes may then be left off the figure, so use this option with caution; if genes are missing, this is indicated on the plot.

Point Size Proportional to Sample Size
    Specification file: weightCol="SampleSize"
    Makes the size of each data point reflect the square root of the sample size, and hence the standard error, which scales with 1/√N. By default, all points are the same size.

LD Measure
    Specification file: ldCol="dprime" or ldCol="rsquare"
    Selects the LD measure with the index SNP used to color the data points: D′ (dprime) or r² (rsquare, the default).

HapMap Population for LD
    Specification file: n/a (must be selected from the web form)
    Selects the HapMap population used to obtain LD estimates. The default is CEU; YRI or JPT+CHB may be selected instead.

Highlight Region of Interest
    Specification file: hiStart=425Mb hiEnd=425.1Mb
    Draws a grey box highlighting an important region of the genome, such as the region of an association signal or a region being sequenced.

Theme
    Specification file: theme="pub"
    A theme with larger text that is more easily readable in publications.

Format of Output File
    Specification file: format="pdf", format="png", or format="pdf,png"
    Output is available as PDF, PNG, or both. The default is PDF.

Show Annotation
    Specification file: showAnnot=T showRefsnpAnnot=T annotPch="1,24,24,25,22,21,8,7"
    SNP annotation is available for all 1000 Genomes SNPs (August 2009 release) and can be displayed on the plot using this option. On the website, the various annotation options can be turned on or off. Individual annotation categories can be controlled using annotPch, which gives the R plotting symbol (pch) code for each category. To show several categories with the same symbol, give them the same code (e.g. annotPch="1,24,24,24,21,21,21,21"). The categories and their default symbols are:
        Framestop: 24 (triangle)
        Splice: 24 (triangle)
        NonSynonymous: 25 (inverted triangle)
        Coding: 22 (square)
        UTR: 21 (filled circle)
        TFBScons: 8 (star)
        MCS44 Placental: 7 (square with diagonal lines)
        None of the above: 1 (open circle)

Recombination Rate Overlay
    Specification file: showRecomb=T
    The estimated recombination rate from HapMap samples can be shown on the plot or left off. The data plotted are from:
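Because annotPch is a single comma-separated list of R plotting-symbol (pch) codes, it is easy to put a code in the wrong position. The sketch below builds the argument from named categories instead. The category order is an assumption, inferred by lining up the default annotPch value with the default symbol listed for each category; annot_pch, ANNOT_ORDER and DEFAULT_PCH are hypothetical names.

```python
from typing import Dict, Optional

# Assumed category order, inferred by lining up the default
# annotPch="1,24,24,25,22,21,8,7" with each category's listed default symbol.
ANNOT_ORDER = [
    "None-of-the-above",  # 1, open circle
    "Framestop",          # 24, triangle
    "Splice",             # 24, triangle
    "NonSynonymous",      # 25, inverted triangle
    "Coding",             # 22, square
    "UTR",                # 21, filled circle
    "TFBScons",           # 8, star
    "MCS44 Placental",    # 7, square with diagonal lines
]
DEFAULT_PCH = dict(zip(ANNOT_ORDER, [1, 24, 24, 25, 22, 21, 8, 7]))


def annot_pch(overrides: Optional[Dict[str, int]] = None) -> str:
    """Build an annotPch argument from the defaults plus per-category overrides."""
    symbols = dict(DEFAULT_PCH)
    for category, pch in (overrides or {}).items():
        if category not in symbols:
            raise KeyError(f"unknown annotation category: {category!r}")
        symbols[category] = pch
    codes = ",".join(str(symbols[c]) for c in ANNOT_ORDER)
    return f'annotPch="{codes}"'
```

Called with no overrides, annot_pch() reproduces the default value; overriding NonSynonymous to 24 and Coding, TFBScons and MCS44 Placental to 21 reproduces the grouped example annotPch="1,24,24,24,21,21,21,21".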


2 Upload your own meta-analysis file and generate single plots using a web-based form

2.1 Uploading your own association results

Association results can be uploaded to our web server from the LocusZoom webpage. The server will accept a typical meta-analysis file for ~2.5 million SNPs provided the user keeps only the required columns (SNP name, p-value and, optionally, N) and gzips the file before uploading; in our tests this produced a file of ~17 MB, below the 20 MB file size limit. Users can then draw multiple plots from the LocusZoom website while uploading the meta-analysis results file only once. Alternatively, for faster viewing of a single region, users can upload a file containing only the rows for SNPs in the region of interest or on a particular chromosome.

Users need to specify the name of the column containing SNP identifiers (rs numbers, or genome-based names such as chr1:400000, where the position is from the same build as that being plotted, typically hg18) and the name of the column containing p-values. Data points can optionally be sized according to the square root of user-specified weights such as sample size; providing the name of the weight column turns this feature on.
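A minimal sketch of the preparation step described in 2.1: keep only the marker, p-value and optional weight columns, gzip the result, and compare its size with the upload limit. The function name, the reading of "20 MB" as 20,000,000 bytes, and the column names in the usage note are assumptions; the delimiter must be a single character (tab by default).

```python
import csv
import gzip
import os
from typing import Optional, Tuple

# Assumption: read "20 MB" as 20,000,000 bytes, the stricter of the two common readings.
SIZE_LIMIT_BYTES = 20_000_000


def trim_and_gzip(src: str, dest: str, marker_col: str, pval_col: str,
                  weight_col: Optional[str] = None, delimiter: str = "\t") -> Tuple[int, bool]:
    """Keep only the marker, p-value and optional weight columns, gzip the result,
    and return the compressed size and whether it is under the upload limit."""
    keep = [marker_col, pval_col] + ([weight_col] if weight_col else [])
    with open(src, newline="") as fin, gzip.open(dest, "wt", newline="") as fout:
        reader = csv.DictReader(fin, delimiter=delimiter)
        missing = [c for c in keep if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"columns not found in {src}: {missing}")
        writer = csv.writer(fout, delimiter=delimiter, lineterminator="\n")
        writer.writerow(keep)  # header with the retained column names
        for row in reader:
            writer.writerow([row[c] for c in keep])
    size = os.path.getsize(dest)
    return size, size < SIZE_LIMIT_BYTES
```

A call such as trim_and_gzip("meta.tbl", "meta.tbl.gz", "MarkerName", "P-value", weight_col="N") (hypothetical file and column names) writes the trimmed, compressed file; the same column names would then be entered in the web form.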
2.2 Options specific to uploading your own results

All options listed in Table 1.3 above are available, as well as the options listed below. None of these has a specification-file argument; each must be selected from the web form.

Column Delimiter
    Users must specify the type of column delimiter used in the results file.

Pvalue Column Name
    Users must specify the name of the column that contains the p-values.

Marker Column Name
    Users must specify the heading of the column that contains marker names.

Human Genome Build
    Plots can be generated using hg18 (default) or hg17 positions.

HapMap Population for LD
    Selects the HapMap population used to obtain LD estimates. The default is CEU; YRI or JPT+CHB may be selected instead.
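Because the web form needs the delimiter and the marker and p-value column names, it can help to check a file against those choices before uploading. This sketch is hypothetical and not part of LocusZoom; the accepted marker patterns (rs numbers, or chr<chromosome>:<position>) follow 2.1, but the exact chromosome spellings allowed are an assumption. Passing delimiter=None splits on any run of whitespace.

```python
import re
from typing import List, Optional, Sequence

# Assumed marker-name patterns: rs numbers, or chr<chromosome>:<position>.
MARKER_RE = re.compile(r"(rs\d+|chr(\d+|X|Y|XY|MT?):\d+)")


def check_results(lines: Sequence[str], delimiter: Optional[str],
                  marker_col: str, pval_col: str) -> List[str]:
    """List problems in a results file, given the delimiter and the names of the
    marker and p-value columns."""
    header = lines[0].rstrip("\r\n").split(delimiter)
    for name in (marker_col, pval_col):
        if name not in header:
            return [f"missing column {name!r}"]
    m_idx, p_idx = header.index(marker_col), header.index(pval_col)
    problems: List[str] = []
    for lineno, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != len(header):
            problems.append(f"line {lineno}: expected {len(header)} fields, found {len(fields)}")
            continue
        if not MARKER_RE.fullmatch(fields[m_idx]):
            problems.append(f"line {lineno}: unrecognised marker name {fields[m_idx]!r}")
        try:
            p = float(fields[p_idx])
        except ValueError:
            problems.append(f"line {lineno}: p-value {fields[p_idx]!r} is not a number")
            continue
        # p = 0 is flagged too, because -log10(0) cannot be plotted.
        if not 0.0 < p <= 1.0:
            problems.append(f"line {lineno}: p-value {p} is outside (0, 1]")
    return problems
```

An empty list means the file matched the chosen delimiter and column names; each problem string carries the 1-based line number in the file.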

3 Uploading your own file and using the batch mode

3.1 Uploading a file

Association results can be uploaded to our web server from the LocusZoom webpage. The server will accept a typical meta-analysis file for ~2.5 million SNPs provided the user keeps only the required columns (SNP name, p-value and, optionally, N) and gzips the file before uploading; in our tests this produced a file of ~17 MB, below the 20 MB file size limit. Users can then draw multiple plots from the LocusZoom website while uploading the meta-analysis results file only once. Alternatively, for faster viewing of a single region, users can upload a file containing only the rows for SNPs in the region of interest or on a particular chromosome.

3.2 Uploading the specification file

Users can upload a specification file, which makes it easy to generate dozens of plots and allows each plot to be customized with more features than are available on the LocusZoom web interface. The file must have 7 whitespace-delimited columns, of which the last may be blank. The content of the header line is not important, but LocusZoom expects a header to be present.

To define a region to plot, users may specify either:

  i) a SNP name in the first column and the flanking region (e.g. 200kb, 500kb, 1Mb) in the fifth column;
  ii) a gene name in the first column and the flanking region in the fifth column; or
  iii) a chromosome number, start position and stop position in the 2nd, 3rd and 4th columns, respectively.

If option ii) is used, LocusZoom selects the most significant SNP in the region as the index SNP. If option iii) is used, the index SNP for the plot must be given in the first column. For flanking distances of 500kb or less where the lead SNP is a HapMap SNP and LD from CEU is requested, plots are generated very quickly because LD has been pre-computed for all HapMap SNPs in the CEU sample.

The sixth ("run") column selects which regions are plotted. For example, if you run a specification file with 15 plots and 14 of them turn out well, you may wish to modify the row for the remaining plot and re-run only that one. To do so, set the run column to "no" for the 14 plots that don't need re-running and leave it as "yes" for the 15th.

The 7th and final column contains optional LocusZoom arguments (see the second column of Table 1.3 above). Any number of options can be given in this column, separated by spaces.
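The re-run workflow described above, setting the run column to "no" for every plot except the one being tweaked, is easy to script. A minimal sketch, assuming rows are identified by their 1-based position below the header; set_run_column is a hypothetical name. Columns are re-joined with single spaces, which is harmless because the file is whitespace-delimited.

```python
from typing import Set


def set_run_column(spec_text: str, rerun_rows: Set[int]) -> str:
    """Return the specification file with the 6th ("run") column set to "yes" for
    the 1-based data rows in `rerun_rows` and to "no" for every other row."""
    lines = spec_text.splitlines()
    out = [lines[0]]  # LocusZoom expects a header line; keep it unchanged
    row = 0
    for line in lines[1:]:
        if not line.strip():
            continue  # drop blank lines
        row += 1
        # Split off the first six columns; everything after them is the optional
        # 7th column of arguments, which may itself contain spaces.
        fields = line.split(None, 6)
        if len(fields) < 6:
            raise ValueError(f"row {row} has fewer than 6 columns: {line!r}")
        fields[5] = "yes" if row in rerun_rows else "no"
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"
```

For the 15-plot example above, set_run_column(text, {15}) would leave only the 15th row marked "yes".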

Example of a specification file (a header line must be included), specfile.txt:

    snp   chr  start   end     flank  run  m2zargs
    rs1   NA   NA      NA      500kb  yes  rfrows=3 weightCol="N" snpset="HapMap" metalRug="Our SNPs"
    rs2   1    540000  580000  NA     yes  rfrows=4 legend="right" showAnnot=T
    CETP  NA   NA      NA      200kb  yes  rfrows=6 showAnnot=T annotPch="1,24,24,25,22,21,8,7"
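To check a specification file such as the example above before uploading, it can be parsed row by row. This is a sketch, not LocusZoom's parser: it uses Python's shlex so that quoted values like metalRug="Our SNPs" stay together, and it assumes that a first-column name that looks like an rs number or chr:pos is a SNP and anything else is a gene. parse_spec and the dictionary keys are hypothetical.

```python
import re
import shlex
from typing import Dict, List

# Assumption: first-column names that look like rs numbers or chr:pos are SNPs;
# anything else is treated as a gene name.
_SNP_RE = re.compile(r"rs\d+|chr\w+:\d+")


def parse_spec(text: str) -> List[Dict]:
    """Parse a specification file into one dict per data row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = []
    for line in lines[1:]:  # a header must exist, but its content is ignored
        fields = line.split(None, 6)
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns: {line!r}")
        name, chrom, start, end, flank, run = fields[:6]
        # shlex honours the double quotes, so metalRug="Our SNPs" stays one argument.
        args: Dict[str, str] = {}
        for token in shlex.split(fields[6] if len(fields) == 7 else ""):
            key, sep, value = token.partition("=")
            if not sep:
                raise ValueError(f"argument without '=': {token!r}")
            args[key] = value
        if "NA" not in (chrom, start, end):
            mode = "range"  # option iii): the first column names the index SNP
        elif flank != "NA":
            mode = "snp" if _SNP_RE.fullmatch(name) else "gene"  # options i) and ii)
        else:
            raise ValueError(f"no region defined: {line!r}")
        rows.append({
            "name": name,
            "mode": mode,
            "chr": None if chrom == "NA" else chrom,
            "start": None if start == "NA" else int(start),
            "end": None if end == "NA" else int(end),
            "flank": None if flank == "NA" else flank,
            "run": run.lower() == "yes",
            "args": args,
        })
    return rows
```

On the example file, the three rows are classified as "snp", "range" and "gene", matching options i), iii) and ii).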