Changes

From Genome Analysis Wiki
5,442 bytes added ,  17:30, 23 June 2014
no edit summary
Line 14: Line 14:  
The following software is required:  
 
   −
*[http://www.python.org/download/ Python 2.6] (do '''not''' download the 3.0 branch!)  
+
*[http://www.python.org/download/ Python 2.7+] (do '''not''' download the 3.0 branch!)  
*[http://www.r-project.org/ R 2.10+]  
+
*[http://www.r-project.org/ R 3.0+]. Note that if using R 3.1, you must install LocusZoom 1.3 (previous versions will fail.)
      Line 21: Line 21:     
*[[New Fugue|new_fugue]], a program for computing LD, written by Goncalo Abecasis.
 
*[http://pngu.mgh.harvard.edu/~purcell/plink/ PLINK], written by Shaun Purcell.
+
*[http://pngu.mgh.harvard.edu/~purcell/plink/ PLINK]
 +
*[http://samtools.sourceforge.net/ tabix], downloaded with samtools
      Line 28: Line 29:       −
For the latest stable LocusZoom package, see our [https://statgen.sph.umich.edu/locuszoom/download/ download] page. The current version is '''1.2''', released on May 9th, 2013.  
+
For the latest stable LocusZoom package, see our [https://statgen.sph.umich.edu/locuszoom/download/ download] page. The current version is '''1.3''', released on June 20th, 2014.
    
Currently only '''Unix/Linux''' is supported, though Mac OS X should be supported in a future release. Support for Windows may come at a much later date.
 
Line 45: Line 46:     
See our [https://statgen.sph.umich.edu/locuszoom/download/ download] page for links to the latest as well as previous releases.
 
 +
 +
== Changes in Version 1.3 ==
 +
 +
New features:
 +
 +
* Database and GWAS catalog files updated for hg19
 +
* [[#Plotting BED tracks| Adding BED tracks]]
 +
* [[#Updating the existing locuszoom database(s)| Update locuszoom's database without waiting for a release]]
 +
* [[#Specify gene table (refFlat, GENCODE, etc.) | Use different gene information tables + GENCODE support]]
 +
* [[#EPACTS formatted file|Support for tabix indexed EPACTS files]]
 +
* [[#Reading from STDIN| Read data from STDIN]]
 +
* [[#Plotting options| New plotting options for color, significance lines]]
 +
* [[#Supply VCF files for calculating LD|Provide multiple chromosome separated VCF files for calculating LD]]
 +
 +
The full changelog is available on the [https://statgen.sph.umich.edu/locuszoom/download/ download] site.
    
== Changes in Version 1.2 ==
 
Line 56: Line 72:  
* [[#GWAS catalog variants|GWAS catalog variants]]
 
 
* [[#Supply VCF files for calculating LD|Supply VCF files for calculating LD]]
 
      
The full changelog is available on the [https://statgen.sph.umich.edu/locuszoom/download/ download] site.
 
Line 95: Line 110:  
You can download PLINK and find instructions for installing it [http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml here].
 
   −
=== Step 5: Install LocusZoom  ===
+
=== Step 5: Install tabix ===
 +
 
 +
Tabix is used to quickly extract regions from bgzipped and tabix-indexed files. It is used in LocusZoom to extract regions from VCF files when calculating LD, and for extracting from EPACTS result files.
 +
 
 +
It can be downloaded from the SourceForge site [http://samtools.sourceforge.net/ here], or directly from the download page [http://sourceforge.net/projects/samtools/files/ here].
 +
 
 +
=== Step 6: Install LocusZoom  ===
    
LocusZoom is provided as a tar archive which contains the following:  
 
   −
*the LocusZoom python application  
+
*The LocusZoom python application  
*the R script used for generating plots  
+
*The R script used for generating plots  
 
*Human genome '''build hg18 and hg19''' data, including:  
 
 
**genotype files (used for computing LD) from HapMap and 1000G  
 
Line 115: Line 136:  
***locuszoom (this is the locuszoom "executable")  
 
 
***locuszoom.R (the R script which is used by locuszoom for creating the plots)  
 
 +
***dbmeister.py (script for creating custom user databases)
 +
***lzupdate.py (script for creating an updated copy of the provided locuszoom database)
 
**conf/ (configuration file located here)  
 
 
**data/  
 
Line 225: Line 248:     
To load this file, use --epacts.
 
 +
 +
<span style="color:#00CC33">'''Note'''</span>: As of version 1.3, LocusZoom uses the tabix index for the EPACTS file if the index exists and tabix is installed on your system. This makes loading EPACTS files much faster and should be used whenever possible.
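
Before running LocusZoom, you may want to confirm that both conditions in the note hold. The following is a minimal sketch, not part of LocusZoom: the helper name <code>can_use_tabix</code> is hypothetical, it assumes the index sits next to the EPACTS file with a <code>.tbi</code> suffix (where tabix writes it by default), and it is written for Python 3 as a standalone check, separate from LocusZoom's own Python requirement.

```python
import os
import shutil


def can_use_tabix(epacts_path, tabix_name="tabix"):
    """Return True if a .tbi index exists next to the file and tabix is on PATH.

    Hypothetical helper, not part of LocusZoom: it only mirrors the two
    conditions named in the note (index present, tabix installed).
    """
    has_index = os.path.isfile(epacts_path + ".tbi")
    has_tabix = shutil.which(tabix_name) is not None
    return has_index and has_tabix
```

If the function returns False, checking the two conditions separately shows whether the index or the tabix binary is the missing piece.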
    
<span style="color:#ff0000">'''Warning'''</span>: The "test" version of EPACTS changed the format of the output. To make LZ work, you'll also need to add <code>--epacts-beg-col BEG</code> to your command line.
 
 +
 +
==== Reading from STDIN ====
 +
 +
If you have a quick way of pulling out regions from your association results to plot (such as with tabix), you can pass the data directly to locuszoom on STDIN by specifying the file as "-". For example:
 +
 +
<pre>
 +
tabix -h my_file.gz 1:1-10000 | locuszoom --metal - --refgene TCF7L2
 +
</pre>
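
The same pipe can be driven from a script. The sketch below assumes <code>tabix</code> and <code>locuszoom</code> are both on your PATH; the function names <code>build_commands</code> and <code>run_pipeline</code> are hypothetical, and only the flags shown in the example above are used.

```python
import subprocess


def build_commands(assoc_file, region, refgene):
    """Build argv lists for the tabix and locuszoom halves of the pipe."""
    tabix_cmd = ["tabix", "-h", assoc_file, region]
    # "-" tells locuszoom to read the association results from STDIN.
    locuszoom_cmd = ["locuszoom", "--metal", "-", "--refgene", refgene]
    return tabix_cmd, locuszoom_cmd


def run_pipeline(assoc_file, region, refgene):
    """Stream tabix output into locuszoom's STDIN and return locuszoom's exit code."""
    tabix_cmd, locuszoom_cmd = build_commands(assoc_file, region, refgene)
    tabix = subprocess.Popen(tabix_cmd, stdout=subprocess.PIPE)
    locuszoom = subprocess.Popen(locuszoom_cmd, stdin=tabix.stdout)
    # Close our copy of the pipe so tabix sees SIGPIPE if locuszoom exits early.
    tabix.stdout.close()
    exit_code = locuszoom.wait()
    tabix.wait()
    return exit_code
```

Passing the arguments as lists avoids shell quoting issues with region strings and file names.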
    
=== Region  ===
 
Line 472: Line 505:  
You can give LocusZoom a VCF file directly to use for calculating LD:  
 
   −
<syntaxhighlight lang="bash">
+
<pre>
 
locuszoom --ld-vcf my_genotypes.vcf.gz ...
 
</syntaxhighlight>
+
</pre>
    
This option takes the place of having to supply per-region pre-calculated LD (--ld) or having to specify --pop and --source for calculating LD from genotype files supplied by LZ.  
 
Line 483: Line 516:  
You can also calculate D' from phased VCF files:  
 
   −
<syntaxhighlight lang="bash">
+
<pre>
 
locuszoom --ld-vcf my_genotypes.vcf.gz --ld-measure dprime ...
 
</syntaxhighlight>
+
</pre>
    
The default measure is "rsquared".
 
 +
 +
In version 1.3, if you have VCF files separated out by chromosome, you can create a JSON file mapping chromosome name --> VCF file, and provide the JSON file to --ld-vcf. For example, the JSON file could look like:
 +
 +
<pre>
 +
{
 +
  "X": "/path/to/X.vcf.gz",
 +
  "Y": "/path/to/Y.vcf.gz",
 +
  "MT": "/path/to/MT.vcf.gz"
 +
}
 +
</pre>
 +
 +
And then pass it directly using <code>locuszoom --ld-vcf my_vcfs.json</code>.
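
With many chromosomes it may be easier to generate the mapping than to type it. The sketch below is one way to do that under stated assumptions: the function name <code>write_ld_vcf_map</code> is hypothetical, the <code>/path/to/</code> layout is a placeholder, and the chromosome keys ("1" through "22" plus "X") are assumed to match the names used in your VCF files.

```python
import json


def write_ld_vcf_map(chrom_to_vcf, json_path):
    """Write a chromosome -> VCF path mapping as a JSON file."""
    with open(json_path, "w") as handle:
        json.dump(chrom_to_vcf, handle, indent=2)
        handle.write("\n")


# Hypothetical layout: one bgzipped VCF per chromosome under /path/to/.
chroms = [str(n) for n in range(1, 23)] + ["X"]
vcf_map = {chrom: "/path/to/{}.vcf.gz".format(chrom) for chrom in chroms}
```

Calling <code>write_ld_vcf_map(vcf_map, "my_vcfs.json")</code> produces a file that can be given to --ld-vcf. Writing the file with a JSON library also avoids syntax slips such as a trailing comma after the last entry, which strict JSON parsers reject.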
    
== Optional Input ==
 
Line 652: Line 697:     
It must be tab-delimited and the columns must have a header and be named as such.
 
 +
 +
=== Plotting BED tracks ===
 +
 +
You can supply locuszoom with a BED file, and the tracks it contains will be added to the plot. For example:
 +
 +
[[File:Bed_tracks.png]]
 +
 +
Use the --bed-tracks option, for example:
 +
 +
<syntaxhighlight lang="bash">
 +
locuszoom ... --bed-tracks <your file>
 +
</syntaxhighlight>
 +
 +
The BED file should have 4 columns: the first 3 for chr/start/end, and the 4th column for the label of the track.
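
A file in this 4-column layout can be produced with a few lines of code. The sketch below is illustrative only: the function name <code>write_bed_tracks</code>, the interval coordinates, and the track labels are made up, and it assumes tab-delimited output with "chr"-prefixed chromosome names.

```python
import csv


def write_bed_tracks(intervals, bed_path):
    """Write (chrom, start, end, label) tuples as a 4-column tab-delimited BED file."""
    with open(bed_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for chrom, start, end, label in intervals:
            if start >= end:
                raise ValueError("start must be less than end: %r" % ((chrom, start, end, label),))
            writer.writerow([chrom, start, end, label])


# Made-up intervals; rows sharing a label (4th column) belong to the same track.
example_intervals = [
    ("chr10", 114700000, 114750000, "track_A"),
    ("chr10", 114800000, 114820000, "track_A"),
    ("chr10", 114760000, 114790000, "track_B"),
]
```

Calling <code>write_bed_tracks(example_intervals, "tracks.bed")</code> writes three rows that can then be passed to the BED track option.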
 +
 +
=== Specify gene table (refFlat, GENCODE, etc.) ===
 +
 +
You can now specify a different gene information table to use. LocusZoom provides both refFlat and GENCODE. refFlat is the default. For example:
 +
 +
<pre>
 +
locuszoom --gene-table gencode
 +
</pre>
    
== Output  ==
 
Line 741: Line 808:     
In addition to the options above, there are options that control the plotting engine inside Locuszoom.  These are used with a different syntax: arg=value (no spaces allowed).
 
 +
 +
New/fixed options in 1.3:
    +
{| width="85%" cellspacing="0" cellpadding="5" border="1"
 +
|-
 +
! scope="col" | Option (with default value)
 +
! scope="col" | Description
 +
|-
 +
| colorCol=NULL
 +
| Specify the name of a column in association results file denoting the color each marker should be. This disables coloring by LD.
 +
|-
 +
| signifLine=NULL
 +
| Specify (in p-value scale) where to place a horizontal significance line. Can have multiple lines, e.g. signifLine="5e-08,1e-10"
 +
|-
 +
| signifLineColor=NULL
 +
| Specify color of each significance line, e.g. signifLineColor="red,blue"
 +
|-
 +
| signifLineWidth=NULL
 +
| Specify the line width for each significance line, e.g. signifLineWidth="2,3"
 +
|-
 +
| showIso=F
 +
| Show genes as isoforms, rather than collapsed into one canonical transcript. To enable use showIso=T
 +
|}
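
When these options are assembled by a script, a small formatter keeps the arg=value syntax consistent. The sketch below is an illustration, not part of LocusZoom: the helper name <code>plot_option</code> is hypothetical, and it assumes the options are passed as separate command-line arguments (so the shell quotes shown in the table are not needed).

```python
def plot_option(name, values):
    """Format one plotting option as arg=value, joining multiple values with commas."""
    if not isinstance(values, (list, tuple)):
        values = [values]
    text = "{}={}".format(name, ",".join(str(v) for v in values))
    # The arg=value syntax does not allow spaces.
    if " " in text:
        raise ValueError("plotting options may not contain spaces: %r" % text)
    return text


# Two significance lines with matching colors and widths, plus isoform display.
args = [
    plot_option("signifLine", ["5e-08", "1e-10"]),
    plot_option("signifLineColor", ["red", "blue"]),
    plot_option("signifLineWidth", [2, 3]),
    plot_option("showIso", "T"),
]
```

The resulting strings can be appended to the end of a locuszoom argument list.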
 +
 +
<br>
 +
 +
Other options:
 
{| width="85%" cellspacing="0" cellpadding="5" border="1"
 
{| width="85%" cellspacing="0" cellpadding="5" border="1"
 
|-
 
|-
Line 965: Line 1,058:  
== Advanced configuration  ==
 
   −
=== Creating a SQLite database  ===
+
=== Creating a custom SQLite database  ===
   −
As a starting point, we provide a SQLite database based on UCSC human genome '''build hg18''', which includes the following tables:  
+
As a starting point, we provide SQLite databases based on UCSC human genome '''build hg18 and hg19''', which include the following tables:  
    
*snp_pos: SNP positions  
 
Line 1,191: Line 1,284:     
If you wish for your database to become the default, change the <code>LATEST_BUILD</code> variable in the m2zfast.conf file to whatever you have chosen above (in our example, our new database became mapped to 'hg19'.)
 
 +
 +
=== Updating the existing locuszoom database(s) ===
 +
 +
LocusZoom now comes with a database updating script <code>bin/lzupdate.py</code>. This script can download the necessary data from UCSC, NCBI, NHGRI, and GENCODE to create an up-to-date database file. The script performs the following actions:
 +
 +
# Download latest SNP table from UCSC for the given build
 +
# Reformat SNP table for insertion into sqlite database
 +
# Download latest refFlat from UCSC for the given build
 +
# Reformat refFlat for insertion into sqlite database
 +
# (optional) Download GENCODE annotation file from GENCODE FTP site
 +
# Download RsMergeArch from NCBI
 +
# Write formatted translation table for old rsIDs to latest (from RsMergeArch)
 +
# Create a SNP set file (for indicating rug of markers at top of plot for different genotyping arrays)
 +
# Download the latest NHGRI GWAS catalog
 +
# Format catalog for use with locuszoom
 +
# Call <code>bin/dbmeister.py</code> to insert everything above (except the GWAS catalog file, which remains a separate file)
 +
 +
An example of running the script:
 +
 +
<pre>
 +
bin/lzupdate.py --build hg19 --gencode 19 --gwas-cat
 +
</pre>
 +
 +
The script will NOT overwrite the existing locuszoom database (under data/database/*.db), since you should back it up first. After the script finishes you should have both a new locuszoom.db file and a GWAS catalog file. You can then either overwrite the locuszoom database after backing it up, or place the new files in a different location and modify the conf file accordingly. The script prints instructions for how to do this once it has finished.
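
The backup step can be as simple as a timestamped copy. The sketch below is one possible approach, not something shipped with LocusZoom: the function name <code>backup_database</code> and the <code>.bak</code> naming scheme are hypothetical, and it is written for Python 3.

```python
import os
import shutil
import time


def backup_database(db_path, backup_dir):
    """Copy db_path into backup_dir with a timestamp suffix and return the new path."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(backup_dir, "{}.{}.bak".format(os.path.basename(db_path), stamp))
    # copy2 preserves the file's modification time along with its contents.
    shutil.copy2(db_path, target)
    return target
```

The original file is left in place, so the copy can be made before replacing the database with the newly generated one.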
    
=== Changing m2zfast.conf settings  ===
 