Genezoom

GeneZoom is a visualization tool that plots the frequency of variants in a predefined region for groups of individuals. It takes an annotated VCF file as input and generates a text file containing the variant information extracted from the VCF file. An R script then draws the GeneZoom plot from that text file.

All reference data are based on hg19, so please make sure that your VCF file also uses hg19 coordinates.
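Before running geneZoom it can help to look at what the VCF header declares. The following is a minimal sketch, not part of geneZoom: the function names vcf_version and vcf_samples are made up for this page, and it assumes the first line of the file is the standard ##fileformat line and that gzip is installed. Note that the header alone cannot prove that the coordinates are hg19; this only inspects the format version and the sample IDs.

```shell
# Print the VCF version declared on the first header line.
# "gzip -cdf" decompresses gzipped input and passes plain text through unchanged.
vcf_version() {
    gzip -cdf "$1" | head -n 1 | sed -n 's/^##fileformat=VCFv//p'
}

# Print the sample IDs, one per line. In a VCF the #CHROM header line
# lists the sample IDs starting at column 10 (after the FORMAT column).
vcf_samples() {
    gzip -cdf "$1" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}
```

For example, "vcf_version input.vcf.gz" prints the declared version, and "vcf_samples input.vcf.gz" lists the IDs that should also appear in the phenotype file (input.vcf.gz is a placeholder name).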

Get geneZoom Source Code

Download from GitHub with Git

You can create your own Git clone (copy) using:

 git clone https://github.com/jinchen-umich/geneZoom.git

or

 git clone git://github.com/jinchen-umich/geneZoom.git

Either of these two commands creates a directory called geneZoom in the current directory.

Update your copy

If you have already downloaded your copy, use the following commands to update:

 1. cd pathToYourCopy/geneZoom
 2. git pull

Download From GitHub without Git

If Git is not installed on your system, you can still download the code from GitHub:

  1. Latest Code (master branch)
    via Website
    1. Go to: https://github.com/jinchen-umich/geneZoom
    2. Click the Download ZIP button on the right side panel.
    via Command Line
    wget https://github.com/jinchen-umich/geneZoom/archive/master.zip

After downloading the file, uncompress (unzip) it. GitHub ZIP archives of the master branch normally extract to a directory named geneZoom-master, which you can rename to geneZoom.

Build geneZoom

To build geneZoom, copy the geneZoom package to the directory of your choice, and then run the following command there:

 tar xzvf geneZoom.tar.gz

After unpacking, you will find 4 directories in "geneZoom" (./example ./lib ./scripts ./ref).

Basic Usage Example

When you run

 perl geneZoom.pl

you will get some information about geneZoom:


GeneZoom.pl :


This tool is a visualization tool that shows the frequency of variants in a predefined region for groups of individuals.

Note: The SNPs and VCF should be hg19 version. The VCF file must have a header with a version greater than 4.0. This tool will run ANNOVAR to annotate the VCF. The annotation values should be in the value list of ANNOVAR (http://www.openbioinformatics.org/annovar/annovar_gene.html).

Version : 1.0.1

Report Bug(s) : jich[at]umich[dot]edu


Usage : perl GeneZoom.pl --vcf vcf --phenotypeFile phenotypeFile --sampleFieldName sample --phenotypeFieldName phenotype --phenotypeDelim tab/comma/blank --snpList snpList --flag "splicing:0.01:0.02,nonsense:blue,missense" --format pdf --outDIR outDIR

perl GeneZoom.pl --vcf vcf --phenotypeFile phenotypeFile --sampleFieldName sample --phenotypeFieldName phenotype --phenotypeDelim tab/comma/blank --snpList snpList --snpChrFieldName chr --snpPosFieldName pos --snpDelim tab/comma/blank --flag "splicing:green,nonsense:0.02:0.03,missense" --lables "chr1:123,chr2:234" --outDIR outDIR
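The second usage line can be made more concrete with small input files whose header rows match the field-name options. This is only a sketch under stated assumptions: the file names, sample IDs, phenotype values and the SNP position are invented placeholders, the chromosome notation in the SNP list is a guess, and only options that appear in the usage text above are used. The command is printed rather than executed, so the sketch can be tried without geneZoom installed.

```shell
# Hypothetical working directory and file names; replace with your own data.
workdir=$(mktemp -d)

# Phenotype file: the header row names the columns passed to
# --sampleFieldName and --phenotypeFieldName (values are placeholders).
printf 'sample\tphenotype\nS1\t1.8\nS2\t2.4\n' > "$workdir/pheno.txt"

# SNP list: the header row names the columns passed to
# --snpChrFieldName and --snpPosFieldName (chromosome notation is an assumption).
printf 'chr\tpos\nchr1\t123\n' > "$workdir/snps.txt"

# Assemble the command from options shown in the usage text.
# It is only printed here; run it yourself once the paths point at real data.
cmd="perl geneZoom.pl --vcf input.vcf.gz \
--phenotypeFile $workdir/pheno.txt --sampleFieldName sample \
--phenotypeFieldName phenotype --phenotypeDelim tab \
--snpList $workdir/snps.txt --snpChrFieldName chr --snpPosFieldName pos \
--snpDelim tab --flag \"splicing:green,nonsense:0.02:0.03,missense\" \
--outDIR $workdir/out"
echo "$cmd"
```

The point of the sketch is the pairing: whatever names appear in the header rows of the phenotype file and SNP list are the values given to the corresponding field-name options, and the delimiter options describe how those files are separated.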


Get Help

When you run any of the following commands:

 perl geneZoom.pl --man
 perl geneZoom.pl --help
 perldoc geneZoom.pl

you will get the help documentation.

Parameters

vcf: The VCF file containing the SNP information. This VCF must have a header with sample IDs. It can be a gzipped file or a plain-text (ASCII) file.

gene: The gene region to plot. For example, "PCSK9".

phenotypeFile: The phenotype file, which contains phenotype values and sample IDs. This file must have a header that specifies which columns hold the phenotype value and the sample ID.

sampleFieldName: The field name of sample ID in the phenotype file.

phenotypeFieldName: The field name of the phenotype value in the phenotype file.

phenotypeDelim: The delimiter used in the phenotype file. It can be tab, blank or comma.

snpList: The list of SNPs to show in the plot. If the gene region contains many SNPs, you can use this list to restrict the plot to specific SNPs. Can be NULL.

snpChrFieldName: The field name of CHR in the SNP list. If you don't define snpList, this can be NULL.

snpPosFieldName: The field name of POS in the SNP list. If you don't define snpList, this can be NULL.

snpChrPosFieldName: The field name of CHR:POS in the SNP list. If you don't define snpList, this can be NULL.

snpDelim: The delimiter used in the SNP list. It can be tab, blank or comma. If you don't define snpList, this can be NULL.

lableSNPs: The SNPs to label in the plot. Can be NULL.

flags: The annotation values, MAF ranges and colors. For example, splicing:0:0.01:red,readthrough:blue. The annotation value is required; the MAF range and the color can be omitted, in which case the tool uses the default MAF range (0,0.5) and selects a color at random.

defaultIntron: The default intron length in the plot. When drawing exon regions, the tool redefines the intron regions with this value. Can be NULL.

title: The title of the plot. Can be NULL.

xlab: The xlab of the plot. Can be NULL.

ylab: The ylab of the plot. Can be NULL.

titleCex: The cex of title. This value can change the size of title. Can be NULL.

xlabCex: The cex of xlab. This value can change the size of xlab. Can be NULL.

ylabCex: The cex of ylab. This value can change the size of ylab. Can be NULL.

scatterYAxisCex: The cex of y axis. This value can change the size of y axis. Can be NULL.

phenotyeMeanLineColor: The color of the line showing the mean value of all SNPs in the gene region. Can be NULL.

phenotyeMeanLineType: The line type of the line showing the mean value of all SNPs in the gene region. Can be NULL.

phenotyeSDLineColor: The color of the line showing the standard deviation of all SNPs in the gene region. Can be NULL.

phenotyeSDLineType: The line type of the line showing the standard deviation of all SNPs in the gene region. Can be NULL.

exonRegionColor: The color of exon region. Can be NULL.

beanPlotXAxisLabelAngle: The label angle of the bean plot x axis. Can be NULL.

beanPlotXAxisLableCex: The cex of the bean plot x axis labels. Can be NULL.

beanPlotXAxisLablePos1: The position 1 of the bean plot x axis label. Can be NULL.

beanPlotXAxisLablePos2: The position 2 of the bean plot x axis label. Can be NULL.

width: The width of the plot. Default is 14.1. Can be NULL.

height: The height of the plot. Default is 10. Can be NULL.

format: The format of the plot. It can be pdf, tiff or png. Can be NULL.

outDIR: The result directory. All intermediate files and the result plot file are written to this folder.
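The flags syntax is easier to check when each entry is expanded into its parts. The helper below is a rough sketch of how the examples on this page can be read, not geneZoom's own parser: the function name explain_flags is made up, and it assumes an entry has one of four shapes (annotation, annotation:color, annotation:min:max, annotation:min:max:color). Missing MAF ranges are shown with the default range 0-0.5 described above, and a missing color is shown as "random".

```shell
# Expand a flags string into one line per annotation:
#   annotation <TAB> MAF min-max <TAB> color
explain_flags() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F: '
        {
            # Defaults described in the parameter list above.
            min = "0"; max = "0.5"; color = "random"
            if (NF == 2)      { color = $2 }
            else if (NF == 3) { min = $2; max = $3 }
            else if (NF == 4) { min = $2; max = $3; color = $4 }
            printf "%s\tMAF %s-%s\tcolor %s\n", $1, min, max, color
        }'
}

explain_flags "splicing:0:0.01:red,readthrough:blue"
```

Running the helper on a flags string before passing it to geneZoom makes it easy to spot an entry where a color or a MAF bound ended up in the wrong position.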

Testing geneZoom

There is an example directory in ~/geneZoom. It contains a VCF file, a phenotype file and one sh file. Run the example script example.sh:

 sh ~/geneZoom/example/example.sh

After running for about 2 minutes (+/- 1 minute), you will get the result file "PCSK9.pdf" in your defined output directory.

Acknowledgements

geneZoom is the result of collaborative efforts by Cristen Willer, Jin Chen, He Zhang, Ellen Schmidt, Wei Zhou, and Goncalo Abecasis. Please email Cristen Willer [cristen@umich.edu] with any questions.