LocusZoom Standalone

Revision as of 16:29, 9 May 2013 by Welchr


*[http://www.python.org/download/ Python 2.6] (do '''not''' download the 3.0 branch!)

*[http://www.r-project.org/ R 2.10+]

The following software is optional but recommended:

*[[New Fugue|new_fugue]], a program for computing LD, written by Goncalo Abecasis.

*[http://pngu.mgh.harvard.edu/~purcell/plink/ PLINK], written by Shaun Purcell.

The following R packages are optional but recommended:

*[http://cran.r-project.org/web/packages/gridExtra/index.html gridExtra] (used to add summary tables of GWAS catalog hits and fine-mapping SNPs as additional pages in the PDF)

For the latest stable LocusZoom package, see our [https://statgen.sph.umich.edu/locuszoom/download/ download] page. The current version is '''1.2''', released on May 10th, 2013.

Currently only '''Unix/Linux''' is supported, though Mac OS X should be supported in a future release. Support for Windows may come at a much later date.


See our [https://statgen.sph.umich.edu/locuszoom/download/ download] page for links to the latest as well as previous releases.

== Changes in Version 1.2 ==

Version 1.2 adds a number of new features. See the following sections for details:

* [[#EPACTS formatted file|Loading EPACTS results]]

* [[#Plotting LD with additional reference SNPs|Plotting LD with additional reference SNPs]]

* [[#Labeling multiple SNPs|Labeling multiple SNPs]]

* [[#Fine-mapping credible sets|Fine-mapping credible sets]]

* [[#GWAS catalog variants|GWAS catalog variants]]

* [[#Supply VCF files for calculating LD|Supply VCF files for calculating LD]]

The full changelog is available on the [https://statgen.sph.umich.edu/locuszoom/download/ download] site.

== Installation ==


R is also required for generating the plots. You can download R at [http://www.r-project.org/ www.r-project.org]. Version 2.10 or greater is required.

=== Step 3: Install LD calculation software (optional) ===

* To calculate LD from hg18 sources (HapMap, earlier releases of 1000G), install '''new_fugue''' (see below).

* To calculate LD from hg19 sources (latest 1000G), install '''PLINK''' (see below).

* If you plan to supply your own LD file for each region (<code>--ld</code>), or to calculate LD directly from a VCF file (<code>--ld-vcf</code>), you do not need to install anything.

==== new_fugue ====

New_fugue is a program that calculates linkage disequilibrium (LD) measures from genotype files. While installing new_fugue is optional, we highly recommend it if you plan to calculate LD from hg18 sources, as it makes generating plots much easier. If you skip installing new_fugue (and do not use PLINK or <code>--ld-vcf</code>), you will need to provide your own pre-computed LD file for each region that you want to plot.


You may need administrator rights to install this program.


==== PLINK ====

PLINK is now used to calculate LD for newer sources and for any LD sources or populations we add in the future. The program new_fugue (above) is used to calculate LD from older sources (such as HapMap) and older builds (such as hg18), where the LD files are sufficiently small.


You can download PLINK and find instructions for installing it [http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml here].

=== Step 4: Install LocusZoom ===


For annotation:

*We use various sources including RefSeq Genes (refFlat), TFBS Conserved (tfbsConsSites), and Conservation (phastConsElements44wayPlacental), all available from the [http://genome.ucsc.edu UCSC Genome Browser].

*[ftp://ftp.hapmap.org/hapmap/recombination/2008-03_rel22_B36/rates/ Recombination rates from HapMap].

For GWAS hits:

*We use the NHGRI GWAS catalog, available at [http://www.genome.gov/gwastudies/ genome.gov].

== Input ==

=== Association results file ===

LocusZoom requires an association results file formatted similarly to the output of METAL or EPACTS.

==== METAL formatted file ====

The file must have 2 columns: markers (SNPs) and p-values. The file should look something like this:

<br>


P-values of any magnitude are supported in scientific notation (we use an arbitrary-precision library built into Python, and transform p-values to the log scale). If you've already transformed your p-values to the log scale, use <code>--no-transform</code> and LocusZoom will not transform them.
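To see why arbitrary precision matters: a p-value such as 1e-500 is smaller than the smallest positive double-precision number, so reading it as an ordinary float gives 0.0 and its logarithm is undefined. The sketch below uses Python's standard <code>decimal</code> module to compute -log10(p) without underflow. The page does not say which library LocusZoom uses internally, so treat the module choice and the helper name <code>neg_log10_pvalue</code> as assumptions for illustration.

<syntaxhighlight lang="python">
from decimal import Decimal, InvalidOperation


def neg_log10_pvalue(p_text):
    """Return -log10(p) for a p-value written as text (e.g. "3.2e-812").

    Decimal keeps the full exponent, so values far below the smallest
    positive double (about 4.9e-324) do not underflow to zero.
    """
    try:
        p = Decimal(p_text)
    except InvalidOperation:
        raise ValueError("not a number: %r" % (p_text,))
    if p.is_nan() or not (0 < p <= 1):
        raise ValueError("p-value must be in (0, 1]: %r" % (p_text,))
    # log10() runs in decimal arithmetic; only the final result, a modest
    # number such as 500, is converted to a float.
    return float(-p.log10())


if __name__ == "__main__":
    print(float("1e-500"))             # 0.0: an ordinary float underflows
    print(neg_log10_pvalue("1e-500"))  # 500.0
    print(neg_log10_pvalue("5e-08"))   # roughly 7.301
</syntaxhighlight>

Converting only the final -log10 value to a float keeps plotting code simple while avoiding underflow in the input.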

==== EPACTS formatted file ====

The file can come directly from [[EPACTS]], or be any file formatted similarly to the following:

{|

|-

! scope="col" | #CHROM

! scope="col" | BEGIN

! scope="col" | END

! scope="col" | MARKER_ID

! scope="col" | NS

! scope="col" | AC

! scope="col" | CALLRATE

! scope="col" | MAF

! scope="col" | PVALUE

! scope="col" | SCORE

! scope="col" | N.CASE

! scope="col" | N.CTRL

! scope="col" | AF.CASE

! scope="col" | AF.CTRL

|-

| 1 || 15903 || 15903 || 1:15903_G/GC || 2657 || 3892.2 || 1 || 0.26757 || 0.36771 || 0.90077 || 1326 || 1331 || 1.4688 || 1.4609

|-

| 1 || 19190 || 19191 || 1:19190_GC/G || 2657 || 823.65 || 1 || 0.155 || 0.67173 || 0.42378 || 1326 || 1331 || 0.3115 || 0.30849

|-

| 1 || 20316 || 20317 || 1:20316_GA/G || 2657 || 1005.3 || 1 || 0.18917 || 0.50804 || 0.66189 || 1326 || 1331 || 0.38062 || 0.37607

|-

| 1 || 30967 || 30970 || 1:30967_CCCA/C || 2657 || 435.35 || 1 || 0.081925 || 0.08848 || -1.7035 || 1326 || 1331 || 0.16007 || 0.16762

|-

| 1 || 51972 || 51975 || 1:51972_GGAC/G || 2657 || 207.8 || 1 || 0.039104 || 0.51638 || -0.64893 || 1326 || 1331 || 0.077187 || 0.079226

|-

| 1 || 53138 || 53140 || 1:53138_TAA/T || 2657 || 216.2 || 1 || 0.040685 || 0.55679 || 0.58762 || 1326 || 1331 || 0.083145 || 0.079602

|-

| 1 || 54421 || 54421 || 1:54421_A/G || 2657 || 179.45 || 1 || 0.033769 || 0.73592 || 0.33726 || 1326 || 1331 || 0.068213 || 0.066867

|-

| 1 || 66221 || 66221 || 1:66221_A/AT || 2657 || 664.45 || 1 || 0.12504 || 0.48676 || 0.69547 || 1326 || 1331 || 0.25366 || 0.24651

|-

| 1 || 66222 || 66223 || 1:66222_TA/T || 2657 || 470.3 || 1 || 0.088502 || 0.64258 || 0.4641 || 1326 || 1331 || 0.17941 || 0.17461

|}

The chromosome, start, end, marker ID, and p-value columns (#CHROM, BEGIN, END, MARKER_ID, and PVALUE above) must all be present. The file must be tab-delimited.

To load this file, use <code>--epacts</code> instead of <code>--metal</code>.
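Before running <code>--epacts</code> on a hand-made file, it can help to check that it really is tab-delimited and has the required columns. The minimal sketch below assumes the required columns carry the names shown in the example table (<code>#CHROM</code>, <code>BEGIN</code>, <code>END</code>, <code>MARKER_ID</code>, <code>PVALUE</code>); the helper <code>check_epacts_file</code> is hypothetical and is not part of LocusZoom.

<syntaxhighlight lang="python">
import io

# Column names assumed from the example table above.
REQUIRED_COLUMNS = ("#CHROM", "BEGIN", "END", "MARKER_ID", "PVALUE")


def check_epacts_file(handle):
    """Return a list of problems found in an EPACTS-style results file.

    An empty list means the header has the required columns and every
    data line has the same number of tab-separated fields as the header.
    """
    problems = []
    header = handle.readline().rstrip("\r\n").split("\t")
    if len(header) == 1:
        problems.append("header is not tab-delimited")
    for column in REQUIRED_COLUMNS:
        if column not in header:
            problems.append("missing column: " + column)
    for line_no, line in enumerate(handle, start=2):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        n_fields = len(line.split("\t"))
        if n_fields != len(header):
            problems.append("line %d has %d fields, expected %d"
                            % (line_no, n_fields, len(header)))
    return problems


if __name__ == "__main__":
    sample = io.StringIO(
        "#CHROM\tBEGIN\tEND\tMARKER_ID\tPVALUE\n"
        "1\t15903\t15903\t1:15903_G/GC\t0.36771\n"
    )
    print(check_epacts_file(sample))  # prints []
</syntaxhighlight>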

=== Region ===


! align="left" scope="col" | Population

! align="left" scope="col" | LocusZoom Arguments

|-

| March 2012

| hg19

| ASN

| --pop ASN --build hg19 --source 1000G_March2012

|-

| March 2012

| hg19

| AFR

| --pop AFR --build hg19 --source 1000G_March2012

|-

| March 2012

| hg19

| EUR

| --pop EUR --build hg19 --source 1000G_March2012

|-

| March 2012

| hg19

| AMR

| --pop AMR --build hg19 --source 1000G_March2012

|-

| Nov 2010


| --pop JPT+CHB --build hg18 --source hapmap

|}


=== Batch mode ===


The file should be whitespace-delimited, and the header row (with the column names shown above) must be present.

=== Supply VCF files for calculating LD ===

You can give LocusZoom a VCF file directly to use for calculating LD:

<syntaxhighlight lang="bash">
locuszoom --ld-vcf my_genotypes.vcf.gz ...
</syntaxhighlight>

This option replaces supplying pre-calculated LD for each region (<code>--ld</code>), or specifying <code>--pop</code> and <code>--source</code> to calculate LD from the genotype files supplied with LocusZoom.

<span style="color:#FF6600">'''Warning: '''</span> The VCF file must also have a [http://samtools.sourceforge.net/tabix.shtml tabix] index located in the same directory. For the example above, the tabix index "my_genotypes.vcf.gz.tbi" must exist.
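A missing index is easy to overlook, so a quick pre-flight check before a long batch run can help. The minimal sketch below only tests that a file named after the VCF plus ".tbi" exists next to it, following the naming in the example above; it does not check that the index is valid or current. The function name is hypothetical, and the suggested <code>tabix -p vcf</code> command assumes the VCF is bgzip-compressed, which tabix requires.

<syntaxhighlight lang="python">
import os


def check_ld_vcf(vcf_path):
    """Check that a VCF for --ld-vcf and its tabix index both exist.

    Returns the index path; raises IOError if either file is missing.
    Only existence is checked, not whether the index is valid or current.
    """
    if not os.path.isfile(vcf_path):
        raise IOError("VCF file not found: %s" % vcf_path)
    index_path = vcf_path + ".tbi"  # e.g. my_genotypes.vcf.gz.tbi
    if not os.path.isfile(index_path):
        raise IOError("tabix index not found: %s (create it with: tabix -p vcf %s)"
                      % (index_path, vcf_path))
    return index_path
</syntaxhighlight>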

You can also calculate D' from phased VCF files:

<syntaxhighlight lang="bash">
locuszoom --ld-vcf my_genotypes.vcf.gz --ld-measure dprime ...
</syntaxhighlight>

The default measure is "rsquared".
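For intuition on the two measures, and on why D' needs phased data: both are defined from haplotype frequencies, which can only be counted directly when you know which alleles sit on the same chromosome. The sketch below applies the standard textbook definitions to a list of phased haplotypes coded 0/1. It is illustrative only and is not necessarily how LocusZoom or PLINK compute LD (for example, they may handle missing genotypes or the sign of D' differently); the function name is hypothetical.

<syntaxhighlight lang="python">
def ld_from_haplotypes(haplotypes):
    """Compute (r^2, |D'|) between two biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) pairs,
    one pair per phased chromosome, with alleles coded 0 or 1.
    """
    n = float(len(haplotypes))
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP 2
    # frequency of the haplotype carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")

    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' scales D by its largest possible value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


if __name__ == "__main__":
    # Allele 1 at SNP 1 only ever occurs with allele 1 at SNP 2,
    # but not the other way around: D' is 1 while r^2 is only 1/3.
    print(ld_from_haplotypes([(1, 1), (0, 0), (0, 0), (0, 1)]))
</syntaxhighlight>

The example shows why the two measures can disagree: D' reaches 1 whenever one haplotype is absent, while r<sup>2</sup> is 1 only when the two SNPs are perfect proxies for each other.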

== Optional Input ==

=== Plotting LD with additional reference SNPs ===

LocusZoom can now show LD with multiple SNPs in a region (for example, you might want to show LD with several SNPs from a conditional analysis).

You give LocusZoom the usual reference SNP (used to center the plot and define the region), plus an additional set of lead/reference SNPs.

For each SNP that is not in the lead SNP set {reference SNP, additional reference SNPs}, LocusZoom finds the lead SNP with which it is in highest LD and colors it to match that lead SNP. The strength of LD with that lead SNP is shown by a color gradient.

As an example:

<syntaxhighlight lang="bash">
locuszoom --metal <DIAGRAM T2D results> --refsnp "rs231362" --add-refsnps "rs163184"
</syntaxhighlight>

This generates the following plot:

[[File:New lz cond only.png|700px]]

The following options are available for changing the style of these types of plots:

{| width="85%" cellspacing="0" cellpadding="5" border="1"

|-

! scope="col" | Option (with default value)

! scope="col" | Description

|-

| condLdColors="gray60,#E41A1C,#377EB8,#4DAF4A,#984EA3,#FF7F00,#A65628,#F781BF"

| The first color is used for SNPs with missing LD; the rest are used, as needed, for each additional lead SNP.

|-

| drawMarkerNames=T

| Whether to display marker names above lead SNPs.

|-

| condLdLow=NULL

| Sets all SNPs with LD in the lowest bin to the same color, for example condLdLow="gray70".

|-

| condRefsnpPch=23

| Plotting symbol for each lead SNP (defaults to a diamond).

|-

| condPch='4,16,17,15,25,8,7,13,12,9,10'

| Plotting symbols for groups of SNPs in LD with each additional reference SNP; make sure they do not overlap with condRefsnpPch above.

|-

| ldCuts="0,.2,.4,.6,.8,1"

| Bin boundaries for LD values.

|}
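The coloring rule for plots with additional reference SNPs can be sketched as follows. This is a minimal illustration assuming LD values have already been computed for each SNP against every lead SNP; the function name, and the choice of placing a value that falls exactly on a cut point in the lower bin, are assumptions rather than details taken from LocusZoom's code.

<syntaxhighlight lang="python">
from bisect import bisect_left

# Default ldCuts from the table above: bins (0, .2], (.2, .4], ..., (.8, 1].
LD_CUTS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def assign_lead_snp(ld_with_leads, cuts=LD_CUTS):
    """Pick the lead SNP a variant is in highest LD with, and its LD bin.

    `ld_with_leads` maps each lead SNP name to the variant's LD with it,
    or None when LD is unavailable. Returns (lead_snp, bin_index), or
    (None, None) when no LD value is available (such SNPs would get the
    "missing LD" color, the first entry of condLdColors).
    """
    known = dict((lead, ld) for lead, ld in ld_with_leads.items() if ld is not None)
    if not known:
        return None, None
    lead = max(known, key=known.get)
    ld = known[lead]
    if not cuts[0] <= ld <= cuts[-1]:
        raise ValueError("LD value %r outside %r..%r" % (ld, cuts[0], cuts[-1]))
    # bisect_left puts a value equal to a cut point in the lower bin;
    # an LD of exactly 0 is placed in the first bin.
    return lead, max(bisect_left(cuts, ld) - 1, 0)
</syntaxhighlight>

For example, a SNP with LD 0.1 to rs231362 and 0.9 to rs163184 would be assigned to rs163184 in the top bin (index 4), and drawn in that lead SNP's color at the strongest shade.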

=== GWAS catalog variants ===

You can add known GWAS variants to your plots. For example:

<syntaxhighlight lang="bash">
locuszoom ... --gwas-cat whole-cat_significant-only --build hg19
</syntaxhighlight>

[[File:New lz gwas cat.png|900px]]

Currently the only catalog is the NHGRI GWAS catalog from [http://www.genome.gov/gwastudies/ genome.gov].

<pre>
Available GWAS catalogs for build hg19:
+----------------------------+----------------------------------------------------------------+
| Option                     | Description                                                    |
+----------------------------+----------------------------------------------------------------+
| whole-cat_significant-only | The entire GWAS catalog, filtered to SNPs with p-value < 5E-08 |
+----------------------------+----------------------------------------------------------------+
</pre>
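The catalog option above is the whole catalog filtered to SNPs with p-value &lt; 5E-08, and only catalog variants inside the plotted region can be drawn. A minimal sketch of that selection on in-memory records follows; the record keys (<code>chr</code>, <code>pos</code>, <code>pvalue</code>, <code>trait</code>) and the function name are hypothetical and do not describe the actual catalog file format LocusZoom ships.

<syntaxhighlight lang="python">
GWAS_THRESHOLD = 5e-08  # significance cutoff named in the catalog description above


def catalog_hits_in_region(records, chrom, start, end, threshold=GWAS_THRESHOLD):
    """Return catalog records that pass the threshold and lie in chrom:start-end.

    Each record is a dict with the hypothetical keys "chr", "pos",
    "pvalue" and "trait". The region is treated as inclusive.
    Results are sorted by position.
    """
    hits = [
        r for r in records
        if r["chr"] == chrom and start <= r["pos"] <= end and r["pvalue"] < threshold
    ]
    return sorted(hits, key=lambda r: r["pos"])
</syntaxhighlight>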

If the R package '''gridExtra''' is installed, a summary of each GWAS catalog variant in your region is listed later in the PDF:

[[File:New lz gwas summary.png|500px]]

=== Fine-mapping credible sets ===

LocusZoom can add an additional track to the plot showing results from a fine-mapping analysis. These are typically SNPs within the 95% credible set (see [http://www.nature.com/ng/journal/v44/n12/full/ng.2435.html this paper] for an example).

To add the fine-mapping track, supply the set of credible SNPs as a file through the fineMap plotting option:

<syntaxhighlight lang="bash">
locuszoom ... fineMap="my_finemapping_results.txt"
</syntaxhighlight>

The fine-mapping results file should be tab-delimited, with one row per fine-mapping SNP (for example, each SNP in the 95% credible set) giving the SNP name, chromosome, position, posterior probability (pp), a descriptive group label (EUR/AMR/AFR/etc.), and a color:

{| class="wikitable sortable"

|-

! scope="col" | snp

! scope="col" | chr

! scope="col" | pos

! scope="col" | pp

! scope="col" | group

! scope="col" | color

|-

| rs1 || 18 || 55931115 || 0.88 || AMR || red

|-

| rs1 || 18 || 55920115 || 0.88 || AMR || red

|-

| rs1 || 18 || 55940115 || 0.88 || AMR || red

|-

| rs1 || 18 || 55930115 || 0.88 || EUR || blue

|-

| rs2 || 18 || 55940115 || 0.02 || EUR || blue

|-

| rs3 || 18 || 56000000 || 0.03 || AFR || green

|-

| rs4 || 18 || 56022000 || 0.03 || AFR || green

|-

| rs3 || 18 || 56100000 || 0.03 || ASN || purple

|-

| rs3 || 18 || 56150000 || 0.03 || ASN || purple

|-

| rs4 || 18 || 56160000 || 0.03 || ASN || purple

|-

| rs4 || 18 || 56180000 || 0.03 || ASN || purple

|}

LocusZoom extracts from the file only those SNPs falling within the region to be plotted, so you can provide all of your fine-mapping results in a single file.
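The region filtering described above can be sketched as follows. This is a minimal illustration, not LocusZoom's own parser: it assumes the tab-delimited layout of the example table (columns <code>snp</code>, <code>chr</code>, <code>pos</code>, <code>pp</code>, <code>group</code>, <code>color</code>) and an inclusive region, and the function name is hypothetical.

<syntaxhighlight lang="python">
import csv
import io


def finemap_rows_in_region(handle, chrom, start, end):
    """Read a tab-delimited fine-mapping file; keep rows inside chrom:start-end."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        pos = int(row["pos"])
        if row["chr"] == str(chrom) and start <= pos <= end:
            row["pos"] = pos
            row["pp"] = float(row["pp"])
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Two rows taken from the example table above.
    example = (
        "snp\tchr\tpos\tpp\tgroup\tcolor\n"
        "rs1\t18\t55931115\t0.88\tAMR\tred\n"
        "rs3\t18\t56100000\t0.03\tASN\tpurple\n"
    )
    for row in finemap_rows_in_region(io.StringIO(example), 18, 55900000, 56000000):
        print(row["snp"], row["group"], row["color"])  # prints: rs1 AMR red
</syntaxhighlight>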

The generated plot will have a track showing the fine-mapping SNPs:

[[File:New lz finemap.png|900px]]

If the R package '''gridExtra''' is installed, the PDF will also have a summary of each fine-mapping SNP:

[[File:New lz finemap summary.png|400px]]

=== Labeling multiple SNPs ===

You can specify a file controlling the labels for the reference SNP or any other SNP within the region. For example:

[[File:New lz denote markers.png|700px]]

Use the <code>--denote-markers-file</code> argument to do this:

<syntaxhighlight lang="bash">
locuszoom ... --denote-markers-file <your file>
</syntaxhighlight>

The file looks like this:

{|

|-

! scope="col" align="left" | snp

! scope="col" align="left" | string

! scope="col" align="left" | color

|-

| rs231362 || GWAS || blue

|-

| rs163184 || Conditional || purple

|}

The file must be tab-delimited, with a header row naming the columns exactly as shown above.
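If you generate this file from a script, writing it with an explicit tab delimiter and the exact header avoids formatting mistakes. A minimal sketch using Python's standard <code>csv</code> module; the function name is hypothetical, and the example rows are the ones from the table above.

<syntaxhighlight lang="python">
import csv


def write_denote_markers(path, markers):
    """Write a tab-delimited file with the header columns snp, string, color.

    `markers` is an iterable of (snp, label, color) tuples.
    """
    with open(path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["snp", "string", "color"])
        for snp, label, color in markers:
            writer.writerow([snp, label, color])
</syntaxhighlight>

For example, <code>write_denote_markers("denote_markers.txt", [("rs231362", "GWAS", "blue"), ("rs163184", "Conditional", "purple")])</code> produces a file that can then be passed as <code>--denote-markers-file denote_markers.txt</code>.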

== Output ==


| --markercol

| Name of the SNP column in the --metal file.

|-

| --epacts

| Provide a results file generated by [[EPACTS]] instead of a --metal file.

|-

| --refsnp


| --ld

| Provide a file specifying LD between your reference SNP and all SNPs within the region you wish to plot. You only need to supply this file if you have computed LD yourself (for example, for a different population or genome build). Otherwise, LD is computed automatically for you.

|-

| --ld-vcf

| Use a VCF file to calculate LD between SNPs. The VCF file can contain an entire genome of SNPs; it does not have to be subset to your region. It must also have a tabix index file. For calculating D', the VCF must be phased.

|-

| --source


| PLINK_PATH

| Path to the PLINK binary. Defaults to "plink", which searches for PLINK on your path. If it is not on your path, specify the full path here.

|-

| RSCRIPT_PATH

| Path to the Rscript binary. Defaults to "Rscript", which searches for Rscript on your path. If it is not on your path, specify the full path here.

|-

| SQLITE_DB


| LD_DB

| Contains a "tree" which maps a tuple of (genotype source, genotype population, genome build) to genotype files.

|-

| GWAS_CATS

| Contains a "tree" which maps genome build and the name of a GWAS catalog to the actual file containing the GWAS hits.

|}
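The PLINK_PATH and RSCRIPT_PATH defaults above rely on a search of your PATH. To check ahead of time what such a search would find, here is a sketch using Python's <code>shutil.which</code>. It mirrors the behaviour described in the table but is not LocusZoom's code; the function name is hypothetical, and the name <code>new_fugue</code> assumes the binary is installed under the program's name.

<syntaxhighlight lang="python">
import os
import shutil


def resolve_binary(setting):
    """Resolve a setting such as "plink" or "/opt/plink/plink" to an executable.

    A bare name is looked up on PATH; a value containing a directory
    separator is used as given. Returns None if no executable is found.
    """
    if os.sep in setting:
        if os.path.isfile(setting) and os.access(setting, os.X_OK):
            return setting
        return None
    return shutil.which(setting)


if __name__ == "__main__":
    for name in ("plink", "Rscript", "new_fugue"):
        print(name, "->", resolve_binary(name) or "not found")
</syntaxhighlight>

If a tool is reported as not found, set the corresponding path option to its full location.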
