Changes

From Genome Analysis Wiki
14,760 bytes added ,  14:28, 31 May 2017
no edit summary
Line 14: Line 14:  
The following software is required:  
 
The following software is required:  
   −
*[http://www.python.org/download/ Python 2.6] (do '''not''' download the 3.0 branch!)  
+
*[http://www.python.org/download/ Python 2.7+] (do '''not''' download the 3.0 branch!)  
*[http://www.r-project.org/ R 2.10+]  
+
*[http://www.r-project.org/ R 3.0+]. Note that if using R 3.1, you must install LocusZoom 1.3 (previous versions will fail.)
 +
 
 +
 
 +
The following software is optional but recommended:
 +
 
 
*[[New Fugue|new_fugue]], a program for computing LD, written by Goncalo Abecasis.
 
*[[New Fugue|new_fugue]], a program for computing LD, written by Goncalo Abecasis.
*[http://pngu.mgh.harvard.edu/~purcell/plink/ PLINK], written by Shaun Purcell.
+
*[http://pngu.mgh.harvard.edu/~purcell/plink/ PLINK]
 +
*[http://samtools.sourceforge.net/ tabix], downloaded with samtools
 +
 
   −
For the latest stable LocusZoom package (including [http://genome.sph.umich.edu/wiki/LocusZoom_Standalone#Sources_of_SQLite_database_tables database tables]), see our [https://statgen.sph.umich.edu/locuszoom/download/ download] page.
+
The following R packages are optional but recommended:
 +
*[http://cran.r-project.org/web/packages/gridExtra/index.html gridExtra] (used for creating summary tables of GWAS hits / fine-mapping SNPs as additional pages in the PDF)
   −
Currently only '''Unix/Linux''' is supported, though Mac OS X should be supported in a future release.
     −
Support for Windows may come at a much later date.
+
For the latest stable LocusZoom package, see our [https://github.com/statgen/locuszoom-standalone download] page. The current version is '''1.3''', released on June 20th, 2014. 
 +
 
 +
Currently only '''Unix/Linux''' is supported, though Mac OS X should be supported in a future release. Support for Windows may come at a much later date.
    
== Synopsis  ==
 
== Synopsis  ==
Line 37: Line 45:  
== Download ==  
 
== Download ==  
   −
See our [https://statgen.sph.umich.edu/locuszoom/download/ download] page for links to the latest as well as previous releases.
+
See our [https://github.com/statgen/locuszoom-standalone download] page for links to the latest as well as previous releases.
    
== Installation  ==
 
== Installation  ==
Line 49: Line 57:  
R is also required for generating the plots. You can download R at [http://www.r-project.org/ www.r-project.org]. Version 2.10 or greater is required.  
 
R is also required for generating the plots. You can download R at [http://www.r-project.org/ www.r-project.org]. Version 2.10 or greater is required.  
   −
=== Step 3: Install new_fugue ===
+
=== Step 3: Install LD calculation software (optional) ===
 +
 
 +
* If you wish to calculate from hg18 sources (hapmap, earlier releases of 1000G): install '''new_fugue''' (see below.)
 +
* If you wish to calculate from hg19 sources (latest 1000G): install '''PLINK''' (see below.)
 +
* If you plan to supply your own LD files per region, or calculate LD directly from VCF files: install nothing! See options for --ld and --ld-vcf.
 +
 
 +
==== new_fugue ====
    
New_fugue is a program that calculates linkage disequilibrium measures from genotype files. While installing new_fugue is optional, we highly recommend it as it makes the process of generating plots much easier. If you opt to skip installing new_fugue, you will need to provide your own computed LD files for each region that you want to plot.  
 
New_fugue is a program that calculates linkage disequilibrium measures from genotype files. While installing new_fugue is optional, we highly recommend it as it makes the process of generating plots much easier. If you opt to skip installing new_fugue, you will need to provide your own computed LD files for each region that you want to plot.  
Line 62: Line 76:  
You may need administrator rights to install this program.
 
You may need administrator rights to install this program.
   −
=== Step 4: Install PLINK ===  
+
==== PLINK ====
    
PLINK is now used to calculate LD for all future LD sources / populations that we may add. The program new_fugue (above) is used to calculate LD from older sources (such as hapmap) and older builds (such as hg18) where LD files are sufficiently small.  
 
PLINK is now used to calculate LD for all future LD sources / populations that we may add. The program new_fugue (above) is used to calculate LD from older sources (such as hapmap) and older builds (such as hg18) where LD files are sufficiently small.  
   −
You can download PLINK and find instructions for installing it [http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml here].  
+
You can download PLINK and find instructions for installing it [http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml here].
 +
 
 +
=== Step 5: Install tabix ===
 +
 
 +
Tabix is used to quickly extract regions from bgzipped and tabix-indexed files. It is used in LocusZoom to extract regions from VCF files when calculating LD, and for extracting from EPACTS result files.
 +
 
 +
It can be downloaded from the SourceForge project site [http://samtools.sourceforge.net/ here] or directly from the file download page [http://sourceforge.net/projects/samtools/files/ here].
   −
=== Step 5: Install LocusZoom  ===
+
=== Step 6: Install LocusZoom  ===
    
LocusZoom is provided as a tar archive which contains the following:  
 
LocusZoom is provided as a tar archive which contains the following:  
   −
*the LocusZoom python application  
+
*The LocusZoom python application  
*the R script used for generating plots  
+
*The R script used for generating plots  
 
*Human genome '''build hg18 and hg19''' data, including:  
 
*Human genome '''build hg18 and hg19''' data, including:  
 
**genotype files (used for computing LD) from HapMap and 1000G  
 
**genotype files (used for computing LD) from HapMap and 1000G  
Line 88: Line 108:  
***locuszoom (this is the locuszoom "executable")  
 
***locuszoom (this is the locuszoom "executable")  
 
***locuszoom.R (the R script which is used by locuszoom for creating the plots)  
 
***locuszoom.R (the R script which is used by locuszoom for creating the plots)  
 +
***dbmeister.py (script for creating custom user databases)
 +
***lzupdate.py (script for creating an updated copy of the provided locuszoom database)
 
**conf/ (configuration file located here)  
 
**conf/ (configuration file located here)  
 
**data/  
 
**data/  
Line 111: Line 133:     
For annotation:  
 
For annotation:  
*We used various sources including RefSeq Genes (refFlat), TFBS Conserved (tfbsConsSites), and Conservation (phaseConsElements44wayPlacental), all available from the [http://genome.usc.edu UCSC Genome Browser].  
+
*We use various sources including RefSeq Genes (refFlat), TFBS Conserved (tfbsConsSites), and Conservation (phastConsElements44wayPlacental), all available from the [http://genome.ucsc.edu UCSC Genome Browser]. 
 
*[ftp://ftp.hapmap.org/hapmap/recombination/2008-03_rel22_B36/rates/ Recombination rates from HapMap].
 
*[ftp://ftp.hapmap.org/hapmap/recombination/2008-03_rel22_B36/rates/ Recombination rates from HapMap].
 +
 +
For GWAS hits:
 +
*We use the NHGRI GWAS catalog, available at [http://www.genome.gov/gwastudies/ genome.gov]
    
== Input  ==
 
== Input  ==
   −
=== Association results file ("metal" file)  ===
+
=== Association results file ===
   −
The main input to LocusZoom is a file containing results from an association scan or meta-analysis. The file must have 2 columns: markers (SNPs), and p-values. The file should look something like this:  
+
LocusZoom requires an association results file similar in formatting to what METAL or EPACTS provides.  
 +
 
 +
==== METAL formatted file ====
 +
 
 +
The file must have 2 columns: markers (SNPs), and p-values. The file should look something like this:  
    
<br>  
 
<br>  
Line 146: Line 175:     
P-values of any magnitude are supported in scientific notation (we use an arbitrary precision library built-in to python, and transform p-values to the log scale.) If you've already transformed your p-values to the log scale, simply use <code>--no-transform</code> and LocusZoom will not transform them.
 
P-values of any magnitude are supported in scientific notation (we use an arbitrary precision library built-in to python, and transform p-values to the log scale.) If you've already transformed your p-values to the log scale, simply use <code>--no-transform</code> and LocusZoom will not transform them.
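To illustrate why arbitrary precision matters here, the following is a minimal sketch using Python's built-in <code>decimal</code> module. It is not LocusZoom's own code; the function name <code>neg_log10</code> is hypothetical, and we only assume that p-values arrive as text in scientific notation.

```python
from decimal import Decimal

def neg_log10(pvalue_text):
    """Return -log10(p) for a p-value given as text, without float underflow."""
    p = Decimal(pvalue_text)
    if p <= 0:
        raise ValueError("p-value must be positive: %r" % pvalue_text)
    return -p.log10()

# A p-value this small underflows to 0.0 as a float, but not as a Decimal.
print(float("1e-400"))      # 0.0
print(neg_log10("1e-400"))  # 400
```

A plain float would make the log transform fail for such values, which is why the text is parsed as a <code>Decimal</code> first.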
 +
 +
==== EPACTS formatted file ====
 +
 +
The file can come directly from [[EPACTS]], or simply be formatted similarly to the following:
 +
 +
{|
 +
|-
 +
! scope="col" | #CHROM
 +
! scope="col" | BEGIN
 +
! scope="col" | END
 +
! scope="col" | MARKER_ID
 +
! scope="col" | NS
 +
! scope="col" | AC
 +
! scope="col" | CALLRATE
 +
! scope="col" | MAF
 +
! scope="col" | PVALUE
 +
! scope="col" | SCORE
 +
! scope="col" | N.CASE
 +
! scope="col" | N.CTRL
 +
! scope="col" | AF.CASE
 +
! scope="col" | AF.CTRL
 +
|-
 +
| 1 || 15903 || 15903 || 1:15903_G/GC || 2657 || 3892.2 || 1 || 0.26757 || 0.36771 || 0.90077 || 1326 || 1331 || 1.4688 || 1.4609
 +
|-
 +
| 1 || 19190 || 19191 || 1:19190_GC/G || 2657 || 823.65 || 1 || 0.155 || 0.67173 || 0.42378 || 1326 || 1331 || 0.3115 || 0.30849
 +
|-
 +
| 1 || 20316 || 20317 || 1:20316_GA/G || 2657 || 1005.3 || 1 || 0.18917 || 0.50804 || 0.66189 || 1326 || 1331 || 0.38062 || 0.37607
 +
|-
 +
| 1 || 30967 || 30970 || 1:30967_CCCA/C || 2657 || 435.35 || 1 || 0.081925 || 0.08848 || -1.7035 || 1326 || 1331 || 0.16007 || 0.16762
 +
|-
 +
| 1 || 51972 || 51975 || 1:51972_GGAC/G || 2657 || 207.8 || 1 || 0.039104 || 0.51638 || -0.64893 || 1326 || 1331 || 0.077187 || 0.079226
 +
|-
 +
| 1 || 53138 || 53140 || 1:53138_TAA/T || 2657 || 216.2 || 1 || 0.040685 || 0.55679 || 0.58762 || 1326 || 1331 || 0.083145 || 0.079602
 +
|-
 +
| 1 || 54421 || 54421 || 1:54421_A/G || 2657 || 179.45 || 1 || 0.033769 || 0.73592 || 0.33726 || 1326 || 1331 || 0.068213 || 0.066867
 +
|-
 +
| 1 || 66221 || 66221 || 1:66221_A/AT || 2657 || 664.45 || 1 || 0.12504 || 0.48676 || 0.69547 || 1326 || 1331 || 0.25366 || 0.24651
 +
|-
 +
| 1 || 66222 || 66223 || 1:66222_TA/T || 2657 || 470.3 || 1 || 0.088502 || 0.64258 || 0.4641 || 1326 || 1331 || 0.17941 || 0.17461
 +
|-
 +
|}
 +
 +
The chrom, start, end, marker ID, and p-value columns must all be present. The file must be tab-delimited.
 +
 +
To load this file, use --epacts.
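Before passing a hand-built file to --epacts, it can help to check the header. The sketch below is only an illustration: it assumes the default column names shown in the table above (#CHROM, BEGIN, END, MARKER_ID, PVALUE), and the function name <code>missing_epacts_columns</code> is hypothetical.

```python
import csv
import io

# Assumed default names for the required columns, taken from the table above.
REQUIRED = ["#CHROM", "BEGIN", "END", "MARKER_ID", "PVALUE"]

def missing_epacts_columns(handle):
    """Return the required column names absent from a tab-delimited header."""
    header = next(csv.reader(handle, delimiter="\t"))
    return [name for name in REQUIRED if name not in header]

example = io.StringIO(
    "#CHROM\tBEGIN\tEND\tMARKER_ID\tNS\tPVALUE\n"
    "1\t15903\t15903\t1:15903_G/GC\t2657\t0.36771\n"
)
print(missing_epacts_columns(example))  # []
```

An empty list means all required columns were found; a file delimited by spaces instead of tabs would report every column as missing.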
 +
 +
<span style="color:#00CC33">'''Note'''</span>: LocusZoom (as of 1.3) will now use the tabix index for the EPACTS file if it exists and if tabix is installed on your system. This results in much faster loading of EPACTS files and should be used whenever possible.
 +
 +
<span style="color:#ff0000">'''Warning'''</span>: The "test" version of EPACTS changed the format of the output. To make LZ work, you'll also need to add <code>--epacts-beg-col BEG</code> to your command line.
 +
 +
==== Reading from STDIN ====
 +
 +
If you have a quick way of pulling out regions from your association results to plot (such as with tabix), you can pass the data directly to locuszoom on STDIN by specifying the file as "-". For example:
 +
 +
<pre>
 +
tabix -h my_file.gz 1:1-10000 | locuszoom --metal - --refgene TCF7L2
 +
</pre>
    
=== Region  ===
 
=== Region  ===
Line 186: Line 272:  
! align="left" scope="col" | Population  
 
! align="left" scope="col" | Population  
 
! align="left" scope="col" | LocusZoom Arguments
 
! align="left" scope="col" | LocusZoom Arguments
 +
|-
 +
| March 2012
 +
| hg19
 +
| ASN
 +
| --pop ASN --build hg19 --source 1000G_March2012
 +
|-
 +
| March 2012
 +
| hg19
 +
| AFR
 +
| --pop AFR --build hg19 --source 1000G_March2012
 +
|-
 +
| March 2012
 +
| hg19
 +
| EUR
 +
| --pop EUR --build hg19 --source 1000G_March2012
 +
|-
 +
| March 2012
 +
| hg19
 +
| AMR
 +
| --pop AMR --build hg19 --source 1000G_March2012
 
|-
 
|-
 
| Nov 2010
 
| Nov 2010
Line 253: Line 359:  
| --pop JPT+CHB --build hg18 --source hapmap
 
| --pop JPT+CHB --build hg18 --source hapmap
 
|}
 
|}
 +
    
=== Batch mode  ===
 
=== Batch mode  ===
Line 365: Line 472:     
The file should be whitespace delimited, and the header (column names shown above) must exist.
 
The file should be whitespace delimited, and the header (column names shown above) must exist.
 +
 +
=== Supply VCF files for calculating LD ===
 +
 +
You can give LocusZoom a VCF file directly to use for calculating LD:
 +
 +
<pre>
 +
locuszoom --ld-vcf my_genotypes.vcf.gz ...
 +
</pre>
 +
 +
This option takes the place of having to supply per-region pre-calculated LD (--ld) or having to specify --pop and --source for calculating LD from genotype files supplied by LZ.
 +
 +
<span style="color:#FF6600">'''Warning: '''</span> The VCF file must also have a [http://samtools.sourceforge.net/tabix.shtml tabix] index located in the same directory. For the above example, the tabix index "my_genotypes.vcf.gz.tbi" must exist.
 +
 +
 +
You can also calculate D' from phased VCF files:
 +
 +
<pre>
 +
locuszoom --ld-vcf my_genotypes.vcf.gz --ld-measure dprime ...
 +
</pre>
 +
 +
The default measure is "rsquared".
 +
 +
In version 1.3, if you have VCF files separated out by chromosome, you can create a JSON file mapping chromosome name --> VCF file, and provide the JSON file to --ld-vcf. For example, the JSON file could look like:
 +
 +
<pre>
 +
{
 +
  "X": "/path/to/X.vcf.gz",
 +
  "Y": "/path/to/Y.vcf.gz",
 +
  "MT": "/path/to/MT.vcf.gz"
 +
}
 +
</pre>
 +
 +
And then pass it directly using <code>locuszoom --ld-vcf my_vcfs.json</code>.
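If you have many per-chromosome files, the mapping can be generated rather than typed by hand. This is a minimal sketch; the helper name <code>build_vcf_map</code> and the path pattern are assumptions for illustration, so substitute your own file layout.

```python
import json

def build_vcf_map(chromosomes, pattern):
    """Map each chromosome name to a VCF path built from a pattern."""
    return {chrom: pattern.format(chrom=chrom) for chrom in chromosomes}

# Placeholder paths; replace the pattern with the location of your own VCFs.
vcf_map = build_vcf_map(["X", "Y", "MT"], "/path/to/{chrom}.vcf.gz")
text = json.dumps(vcf_map, indent=2, sort_keys=True)
print(text)
```

Writing <code>text</code> to a file such as my_vcfs.json gives a mapping in the format shown above. Using <code>json.dumps</code> also avoids hand-editing mistakes such as trailing commas, which strict JSON parsers reject.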
 +
 +
== Optional Input ==
 +
 +
=== Plotting LD with additional reference SNPs ===
 +
 +
LocusZoom can now show LD with multiple SNPs in a region (for example, you might want to show LD with a number of SNPs from a conditional analysis.)
 +
 +
You give LocusZoom the usual reference SNP (used for centering the plot and calculating the region) but an additional set of lead/reference SNPs as well.
 +
 +
For every other SNP not in the "lead SNP set" of { reference SNP, additional reference SNPs }, LZ finds the lead SNP with which it is in highest LD, and colors it to match that lead SNP. The extent of LD with the lead SNP is shown by a gradient of color.
 +
 +
 +
As an example:
 +
 +
<syntaxhighlight lang="bash">
 +
locuszoom --metal <DIAGRAM T2D results> --refsnp "rs231362" --add-refsnps "rs163184"
 +
</syntaxhighlight>
 +
 +
 +
This will generate the following plot:
 +
 +
[[File:New lz cond only.png|700px]]
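The assignment rule described above (each SNP is colored by the lead SNP it is in highest LD with) can be sketched as follows. This is an illustration of the idea, not LocusZoom's internal code: the function name, the data layout, and the LD values for the placeholder SNPs rsA and rsB are all hypothetical.

```python
def assign_lead_snps(ld, lead_snps):
    """For each non-lead SNP, pick the lead SNP with which it has highest LD.

    ld maps (snp, lead_snp) pairs to an LD value such as r-squared.
    SNPs with no LD value for any lead SNP map to None.
    """
    others = sorted({snp for snp, _ in ld} - set(lead_snps))
    assignment = {}
    for snp in others:
        known = [(ld[(snp, lead)], lead) for lead in lead_snps if (snp, lead) in ld]
        assignment[snp] = max(known)[1] if known else None
    return assignment

# Made-up LD values; rs231362 and rs163184 are the lead SNPs from the example.
ld = {
    ("rsA", "rs231362"): 0.9, ("rsA", "rs163184"): 0.1,
    ("rsB", "rs231362"): 0.2, ("rsB", "rs163184"): 0.7,
}
print(assign_lead_snps(ld, ["rs231362", "rs163184"]))
```

Here rsA would take the color of rs231362 and rsB the color of rs163184, with the shade then chosen by the LD bin.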
 +
 +
 +
The following options are available for changing the style of these types of plots:
 +
 +
{| width="85%" cellspacing="0" cellpadding="5" border="1"
 +
|-
 +
! scope="col" | Option (with default value)
 +
! scope="col" | Description
 +
|-
 +
| condLdColors="gray60,#E41A1C,#377EB8,#4DAF4A,#984EA3,#FF7F00,#A65628,#F781BF"
 +
| First color is missing LD color, the rest are used as needed for each additional lead SNP
 +
|-
 +
| drawMarkerNames = T
 +
| Display marker names (or not) above lead SNPs
 +
|-
 +
| condLdLow=NULL
 +
| Used to set all SNPs with LD in the lowest bin to the same color, for example condLdLow="gray70"
 +
|-
 +
| condRefsnpPch=23
 +
| Symbol for each lead SNP, defaults to diamond
 +
|-
 +
| condPch='4,16,17,15,25,8,7,13,12,9,10'
 +
| Plotting symbols for groups of SNPs in LD with additional refsnps, make sure they don't overlap with condRefsnpPch above
 +
|-
 +
| ldCuts = "0,.2,.4,.6,.8,1"
 +
| Bins for LD
 +
|}
 +
 +
=== GWAS catalog variants ===
 +
 +
You can add known GWAS variants to your plots. For example:
 +
 +
<syntaxhighlight lang="bash">
 +
locuszoom ... --gwas-cat whole-cat_significant-only --build hg19
 +
</syntaxhighlight>
 +
 +
[[File:New lz gwas cat.png|900px]]
 +
 +
 +
Currently the only catalog is the NHGRI GWAS catalog from [http://www.genome.gov/gwastudies/ genome.gov].
 +
 +
<pre>
 +
Available GWAS catalogs for build hg19:
 +
 +
+----------------------------+----------------------------------------------------------------+
 +
|          Option          |                          Description                          |
 +
+----------------------------+----------------------------------------------------------------+
 +
| whole-cat_significant-only | The entire GWAS catalog, filtered to SNPs with p-value < 5E-08 |
 +
+----------------------------+----------------------------------------------------------------+
 +
</pre>
 +
 +
 +
If the R package '''gridExtra''' is installed, a summary of each GWAS catalog variant in your region is listed later in the PDF:
 +
 +
[[File:New lz gwas summary.png|500px]]
 +
 +
=== Fine-mapping credible sets ===
 +
 +
LocusZoom can add an additional track to the plot showing results from a fine-mapping analysis. These are typically SNPs within the 95% credible set (see [http://www.nature.com/ng/journal/v44/n12/full/ng.2435.html this paper] for an example of a method generating such a set of SNPs.)
 +
 +
To add this fine-mapping track, you supply (as a plotting option) the fine-mapping set of credible SNPs as a file:
 +
 +
<syntaxhighlight lang="bash">
 +
locuszoom ... fineMap="my_finemapping_results.txt"
 +
</syntaxhighlight>
 +
 +
 +
The fine-mapping results file should be a tab-delimited file with one row per fine-mapping SNP (for example, all of the SNPs in the 95% credible set). Each row gives the SNP name, its chromosome and position, the pp value, a descriptive group label (EUR/AMR/AFR/etc.), and a color:
 +
 +
{| class="wikitable sortable"
 +
|-
 +
! scope="col" | snp
 +
! scope="col" | chr
 +
! scope="col" | pos
 +
! scope="col" | pp
 +
! scope="col" | group
 +
! scope="col" | color
 +
|-
 +
| rs1 || 18 || 55931115 || 0.88 || AMR || red
 +
|-
 +
| rs1 || 18 || 55920115 || 0.88 || AMR || red
 +
|-
 +
| rs1 || 18 || 55940115 || 0.88 || AMR || red
 +
|-
 +
| rs1 || 18 || 55930115 || 0.88 || EUR || blue
 +
|-
 +
| rs2 || 18 || 55940115 || 0.02 || EUR || blue
 +
|-
 +
| rs3 || 18 || 56000000 || 0.03 || AFR || green
 +
|-
 +
| rs4 || 18 || 56022000 || 0.03 || AFR || green
 +
|-
 +
| rs3 || 18 || 56100000 || 0.03 || ASN || purple
 +
|-
 +
| rs3 || 18 || 56150000 || 0.03 || ASN || purple
 +
|-
 +
| rs4 || 18 || 56160000 || 0.03 || ASN || purple
 +
|-
 +
| rs4 || 18 || 56180000 || 0.03 || ASN || purple
 +
|-
 +
|}
 +
 +
LocusZoom will extract from the file only those SNPs falling within the region to be plotted, so you can provide all of your fine-mapping results in a single file.
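The region filtering described here can be sketched in a few lines. This is only an illustration under the assumption that the file uses the column names from the table above; <code>rows_in_region</code> is a hypothetical helper, not part of LocusZoom.

```python
import csv
import io

def rows_in_region(handle, chrom, start, end):
    """Yield fine-mapping rows whose chr/pos fall inside [start, end]."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["chr"] == str(chrom) and start <= int(row["pos"]) <= end:
            yield row

finemap = io.StringIO(
    "snp\tchr\tpos\tpp\tgroup\tcolor\n"
    "rs1\t18\t55931115\t0.88\tAMR\tred\n"
    "rs3\t18\t56000000\t0.03\tAFR\tgreen\n"
    "rs4\t18\t56180000\t0.03\tASN\tpurple\n"
)
kept = list(rows_in_region(finemap, 18, 55900000, 56050000))
print([row["snp"] for row in kept])  # ['rs1', 'rs3']
```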
 +
 +
 +
The generated plot will have a track showing the fine-mapping SNPs:
 +
 +
[[File:New lz finemap.png|900px]]
 +
 +
 +
If the R package '''gridExtra''' is installed, the PDF will also have a summary of each fine-mapping SNP:
 +
 +
[[File:New lz finemap summary.png|400px]]
 +
 +
=== Labeling multiple SNPs ===
 +
 +
You can specify a file controlling the labels for either the reference SNP, or any other arbitrary SNP within the region. For example:
 +
 +
[[File:New lz denote markers.png|700px]]
 +
 +
Use the --denote-markers-file <file> argument to do this:
 +
 +
<syntaxhighlight lang="bash">
 +
locuszoom ... --denote-markers-file <your file>
 +
</syntaxhighlight>
 +
 +
The file looks like:
 +
 +
{|
 +
|-
 +
! scope="col" align="left" | snp
 +
! scope="col" align="left" | string
 +
! scope="col" align="left" | color
 +
|-
 +
| rs231362 || GWAS || blue
 +
|-
 +
| rs163184 || Conditional || purple
 +
|-
 +
|}
 +
 +
The file must be tab-delimited and must include a header row with the column names shown above.
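One way to produce such a file without delimiter mistakes is Python's <code>csv</code> module. A minimal sketch follows; <code>write_denote_markers</code> is a hypothetical helper name, and the rows are the ones from the table above.

```python
import csv
import io

def write_denote_markers(handle, markers):
    """Write (snp, label, color) tuples as a tab-delimited file with a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["snp", "string", "color"])
    writer.writerows(markers)

buffer = io.StringIO()
write_denote_markers(buffer, [
    ("rs231362", "GWAS", "blue"),
    ("rs163184", "Conditional", "purple"),
])
print(buffer.getvalue())
```

In practice you would pass an open file handle instead of the in-memory buffer, then give that file to --denote-markers-file.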
 +
 +
=== Plotting BED tracks ===
 +
 +
You can supply locuszoom with a BED file, and the tracks within it will be added to the plot. For example:
 +
 +
[[File:Bed_tracks.png]]
 +
 +
Use the --bed-tracks option, for example:
 +
 +
<pre>
 +
locuszoom ... --bed-tracks <your bed file>
 +
</pre>
 +
 +
The BED file should have at least 4 columns: the first 3 for chr/start/end, and the 4th column for the label of the track. It must be '''tab-delimited''', not white-space delimited.
 +
 +
Color can also be specified, but the BED file then needs to follow the full [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 BED format].
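Because a space-delimited BED file is an easy mistake to make, a quick check before plotting can save time. The sketch below is an assumption-laden illustration: <code>check_bed_line</code> is a hypothetical helper and the example track is made up.

```python
def check_bed_line(line):
    """Return (chrom, start, end, label) for a tab-delimited BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError(
            "expected at least 4 tab-delimited columns, got %d" % len(fields)
        )
    chrom, start, end, label = fields[:4]
    return chrom, int(start), int(end), label

# Made-up track for illustration only.
print(check_bed_line("chr10\t114700000\t114900000\tmy_track\n"))
```

A line separated by spaces is seen as a single column and raises <code>ValueError</code>.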
 +
 +
=== Specify gene table (refFlat, GENCODE, etc.) ===
 +
 +
You can now specify a different gene information table to use. LocusZoom provides both refFlat and GENCODE. refFlat is the default. For example:
 +
 +
<pre>
 +
locuszoom --gene-table gencode
 +
</pre>
    
== Output  ==
 
== Output  ==
Line 398: Line 725:  
| --markercol  
 
| --markercol  
 
| Name of the SNP column in the --metal file.
 
| Name of the SNP column in the --metal file.
 +
|-
 +
| --epacts
 +
| Provide a results file generated by [[EPACTS]] instead of a --metal file.
 
|-
 
|-
 
| --refsnp  
 
| --refsnp  
Line 414: Line 744:  
|-
 
|-
 
| --build  
 
| --build  
| Human genome build. This defaults to "hg18", and is the only build we provide data for currently. You can supply your own build-specific data by modifying the conf file, and creating your own SQLite database (see *LINK HERE*).
+
| Human genome build. This defaults to "hg18". You can supply your own build-specific data by modifying the conf file, and creating your own SQLite database (see *LINK HERE*).
 
|-
 
|-
 
| --ld  
 
| --ld  
 
| Provide a file specifying LD between your reference SNP and all SNPs within the region you wish to plot. You only need to supply this file if you have created LD specifically for your purposes (perhaps a different population or genome build.) Otherwise, LD is computed automatically for you.
 
| Provide a file specifying LD between your reference SNP and all SNPs within the region you wish to plot. You only need to supply this file if you have created LD specifically for your purposes (perhaps a different population or genome build.) Otherwise, LD is computed automatically for you.
 +
|-
 +
| --ld-vcf
 +
| Use a VCF file to calculate LD between SNPs. This can be a VCF file with an entire genome of SNPs and does not have to be subsetted to your region. The VCF file must also have a tabix index file. For calculating D', the VCF must be phased.
 
|-
 
|-
 
| --source  
 
| --source  
Line 432: Line 765:  
|-
 
|-
 
| --no-transform  
 
| --no-transform  
| LocusZoom supports arbitrary precision p-values. However, if your p-values have already been transformed to the log scale, you can use this option to stop LocusZoom from automatically transforming them.
+
| LocusZoom supports arbitrary precision p-values. However, if your p-values have already been transformed to the -log10 scale, you can use this option to stop LocusZoom from automatically transforming them.
 
|-
 
|-
 
| --prefix  
 
| --prefix  
Line 448: Line 781:     
In addition to the options above, there are options that control the plotting engine inside Locuszoom.  These are used with a different syntax: arg=value (no spaces allowed).
 
In addition to the options above, there are options that control the plotting engine inside Locuszoom.  These are used with a different syntax: arg=value (no spaces allowed).
 +
 +
New/fixed options in 1.3:
    +
{| width="85%" cellspacing="0" cellpadding="5" border="1"
 +
|-
 +
! scope="col" | Option (with default value)
 +
! scope="col" | Description
 +
|-
 +
| colorCol=NULL
 +
| Specify the name of a column in the association results file giving the color for each marker. This disables coloring by LD. The column values should be color names, for example "red", "olivedrab", etc.
 +
|-
 +
| signifLine=NULL
 +
| Specify (in -log10 p-value scale) where to place a horizontal significance line. Can have multiple lines, e.g. signifLine="7.3,9"
 +
|-
 +
| signifLineColor=NULL
 +
| Specify color of each significance line, e.g. signifLineColor="red,blue"
 +
|-
 +
| signifLineWidth=NULL
 +
| Specify the line width for each significance line, e.g. signifLineWidth="2,3"
 +
|-
 +
| showIso=F
 +
| Show genes as isoforms, rather than collapsed into one canonical transcript. To enable use showIso=T
 +
|}
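Since signifLine expects values on the -log10 scale, p-value thresholds need converting first. The following sketch shows the arithmetic; <code>signif_line_arg</code> is a hypothetical helper, and the two thresholds are just examples.

```python
import math

def signif_line_arg(pvalues):
    """Format p-value thresholds as a signifLine=... plotting option."""
    levels = ["%.2f" % -math.log10(p) for p in pvalues]
    return 'signifLine="%s"' % ",".join(levels)

print(signif_line_arg([5e-8, 1e-9]))  # signifLine="7.30,9.00"
```

The common genome-wide threshold of 5E-08 therefore corresponds to a line at roughly 7.3, matching the example value in the table above.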
 +
 +
<br>
 +
 +
Other options:
 
{| width="85%" cellspacing="0" cellpadding="5" border="1"
 
{| width="85%" cellspacing="0" cellpadding="5" border="1"
 
|-
 
|-
Line 643: Line 1,002:  
<code></code>
 
<code></code>
 
<pre>--metal your_data --refsnp rs7983146 --flank 500kb
 
<pre>--metal your_data --refsnp rs7983146 --flank 500kb
</pre>
  −
=== Create a plot for each SNP in a file  ===
  −
  −
<code></code>
  −
<pre>--metal your_data --hits file_with_snps
   
</pre>  
 
</pre>  
 
=== Use 1000 genomes, CEU for LD instead of the default (HapMap r22 CEU)  ===
 
=== Use 1000 genomes, CEU for LD instead of the default (HapMap r22 CEU)  ===
Line 658: Line 1,012:     
<code></code>
 
<code></code>
<pre>--metal your_data --refsnp rs11899863 --pop YRI
+
<pre>--metal your_data --refsnp rs11899863 --pop YRI --build hg18 --source hapmap
 
</pre>
 
</pre>
   Line 677: Line 1,031:  
== Advanced configuration  ==
 
== Advanced configuration  ==
   −
=== Creating a SQLite database  ===
+
=== Creating a custom SQLite database  ===
   −
As a starting point, we provide a SQLite database based on UCSC human genome '''build hg18''', which includes the following tables:  
+
As a starting point, we provide SQLite databases based on UCSC human genome '''build hg18 and hg19''', which include the following tables: 
    
*snp_pos: SNP positions  
 
*snp_pos: SNP positions  
Line 903: Line 1,257:     
If you wish for your database to become the default, change the <code>LATEST_BUILD</code> variable in the m2zfast.conf file to whatever you have chosen above (in our example, our new database became mapped to 'hg19'.)
 
If you wish for your database to become the default, change the <code>LATEST_BUILD</code> variable in the m2zfast.conf file to whatever you have chosen above (in our example, our new database became mapped to 'hg19'.)
 +
 +
=== Updating the existing locuszoom database(s) ===
 +
 +
LocusZoom now comes with a database updating script <code>bin/lzupdate.py</code>. This script can download the necessary data from UCSC, NCBI, NHGRI, and GENCODE to create an up-to-date database file. The script performs the following actions:
 +
 +
# Download latest SNP table from UCSC for the given build
 +
# Reformat SNP table for insertion into sqlite database
 +
# Download latest refFlat from UCSC for the given build
 +
# Reformat refFlat for insertion into sqlite database
 +
# (optional) Download GENCODE annotation file from GENCODE FTP site
 +
# Download RsMergeArch from NCBI
 +
# Write formatted translation table for old rsIDs to latest (from RsMergeArch)
 +
# Create a SNP set file (for indicating rug of markers at top of plot for different genotyping arrays)
 +
# Download the latest NHGRI GWAS catalog
 +
# Format catalog for use with locuszoom
 +
# Call <code>bin/dbmeister.py</code> to insert everything above (except the GWAS catalog file, which remains a separate file)
 +
 +
An example of running the script:
 +
 +
<pre>
 +
bin/lzupdate.py --build hg19 --gencode 19 --gwas-cat
 +
</pre>
 +
 +
The script will NOT overwrite the existing locuszoom database, since you should likely back it up first (under data/database/*.db). After running the script you should have both a new locuszoom.db file, and a gwas catalog file. You can then either overwrite the locuszoom database after backing it up, or you could place them in a different location and modify the conf file accordingly. The script will provide instructions after running for how to do this.
    
=== Changing m2zfast.conf settings  ===
 
=== Changing m2zfast.conf settings  ===
Line 927: Line 1,305:  
| PLINK_PATH
 
| PLINK_PATH
 
| Path to the PLINK binary. Defaults to "plink", which searches for PLINK&nbsp;on your path. If it is not on your path, specify the full path here.  
 
| Path to the PLINK binary. Defaults to "plink", which searches for PLINK&nbsp;on your path. If it is not on your path, specify the full path here.  
 +
|-
 +
| RSCRIPT_PATH
 +
| Path to the Rscript binary. Defaults to "Rscript", which searches for Rscript&nbsp;on your path. If it is not on your path, specify the full path here.
 
|-
 
|-
 
| SQLITE_DB  
 
| SQLITE_DB  
Line 933: Line 1,314:  
| LD_DB  
 
| LD_DB  
 
| Contains a "tree" which maps a tuple of (genotype source, genotype population, genome build) to genotype files.
 
| Contains a "tree" which maps a tuple of (genotype source, genotype population, genome build) to genotype files.
 +
|-
 +
| GWAS_CATS
 +
| Contains a "tree" which maps genome build and the name of a GWAS catalog to the actual file containing the GWAS hits.
 
|}
 
|}
  