RAREMETALWORKER

From Genome Analysis Wiki
Revision as of 19:48, 18 November 2012 by Shuang Feng

Rare-Metal-Worker is a tool for generating summary-level statistics for rare-variant and gene-level meta-analyses with Rare-Metal. It handles both related and unrelated individuals.

Change Log

  • Version 0.0.1 was released on 11/13/2012.
  • Modified Rare-Metal-Worker to output the LD matrix using a sliding window. (11/14/2012)
  • Uploaded to the public wiki. (11/16/2012)
  • Log files are now written automatically. (11/18/2012)

Key Features

Rare-Metal-Worker has the following features:

  • Takes genotypes from either a PED file or a VCF file.
  • Generates summary level statistics for both related and unrelated individuals.
  • Generates linkage disequilibrium matrices summarizing covariance between single marker statistics using an adjustable sliding window.
  • Optionally handles related individuals using a kinship matrix derived from either pedigree or genotype data.
  • Optionally fits a shared environment variance component.
  • Can handle variants on chromosome X.

Software Download and Installation

Where to Download

  • The source package for Linux and Mac can be downloaded here:
 http://genome.sph.umich.edu/wiki/File:RareMetalWorker.0.0.1.tgz
  • Save it to a local directory and decompress it with the following command:
 tar xvzf RareMetalWorker.0.0.1.tgz

How to Compile

  • Go to the RareMetalWorker_0.0.1/RareMetalWorker/src directory of the extracted package and run the following command:
 make all

How to Execute

  • To run the program, go to the RareMetalWorker_0.0.1/RareMetalWorker/bin directory and execute ./Rare-Metal-Worker.

Software Specifications

Input Files

Rare-Metal-Worker requires PED and DAT files in Merlin format and, optionally, a VCF file. When genotypes are stored in the PED and DAT files, no VCF file is needed. When genotypes are stored in a VCF file, the PED and DAT files are still required because they carry the covariate and trait information.

PED and DAT Files

  • When genotypes are stored in the PED file, no VCF file is needed as input.
  • Rare-Metal-Worker takes PED and DAT files in Merlin format. Please refer to the following link for the specifications:
 http://www.sph.umich.edu/csg/abecasis/merlin/tour/input_files.html
  • Marker names in the DAT file must follow the format "M chr:pos". For example:
 M 1:123456
 M 1:234567
 M 2:111111
 M 2:222222
 M X:12345
 M X:111111
  • Markers in the PED and DAT files must be sorted by chromosome and position.
  • Covariate and trait values are stored in the PED file; covariate and trait descriptions are stored in the DAT file.
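Before a run, it can help to check that the DAT marker names and their order meet the two requirements above. The sketch below is a hypothetical pre-flight check, not part of Rare-Metal-Worker (the names `check_dat_markers` and `chrom_sort_key` are illustrative). It only inspects marker ("M") lines, and it assumes chromosomes are ordered 1 to 22 numerically followed by non-numeric labels such as X; that order matches the example above but is not stated as a rule on this page.

```python
import re
from typing import Iterable, List, Tuple

# Marker lines look like "M 1:123456" or "M X:12345".
MARKER_RE = re.compile(r"^M\s+([^:\s]+):(\d+)$")


def chrom_sort_key(chrom: str) -> Tuple[int, int, str]:
    """Assumed order: numeric chromosomes ascending, then non-numeric labels such as X."""
    if chrom.isdigit():
        return (0, int(chrom), "")
    return (1, 0, chrom)


def check_dat_markers(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (chrom, pos) for each marker; raise ValueError on a bad name or bad order."""
    markers: List[Tuple[str, int]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        fields = line.split()
        if not fields or fields[0] != "M":
            continue  # trait (T), covariate (C) and other lines are not checked here
        match = MARKER_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: marker name is not 'chr:pos': {line!r}")
        markers.append((match.group(1), int(match.group(2))))
    # Each marker must not sort before the one preceding it.
    for prev, cur in zip(markers, markers[1:]):
        if (chrom_sort_key(cur[0]), cur[1]) < (chrom_sort_key(prev[0]), prev[1]):
            raise ValueError(f"marker {cur[0]}:{cur[1]} comes after {prev[0]}:{prev[1]}")
    return markers


dat_lines = ["T LDL", "M 1:123456", "M 1:234567", "M 2:111111",
             "M 2:222222", "M X:12345", "M X:111111"]
print(check_dat_markers(dat_lines))
```

A failing check raises `ValueError` naming the offending line, so it can be dropped into a script that prepares inputs before calling Rare-Metal-Worker.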

VCF File

  • Alternatively, genotypes can be supplied in a VCF file. Please refer to the following link for the VCF file specification:
 http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
  • The VCF file should be compressed with bgzip and indexed with tabix, using the following commands:
 bgzip input.vcf     ## this command will produce input.vcf.gz
 tabix -p vcf -f input.vcf.gz  ## this command will produce input.vcf.gz.tbi

Software Options

The following options are currently available in Rare-Metal-Worker:

   Options:
      Input Files : --ped [Exomechip.pheno.ped], --dat [Exomechip.pheno.dat], --vcf []
     Output Files : --prefix [], --LDwindow [1000000]
       VC Options : --vcShared, --vcX, --useCovariates [ON]
    Trait Options : --makeResiduals, --inverseNormal [ON], --traitName []
   Kinship Source : --kinPedigree [ON], --kinGeno, --kinFile [], --kinSave
  Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
     Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044]


Input Files

  • When genotypes are stored in a VCF file, the PED and DAT files specify the pedigree structure, covariates, and traits. An example command line might look like this:
 --ped input.ped --dat input.dat --vcf input.vcf.gz
  • When genotypes are saved in the PED file, the VCF file is not needed. An example command line might look like this:
 --ped input.ped --dat input.dat

Output Files

  • --prefix is optional.
  • If --prefix is not specified, the output file names will be:
 traitname.singlevar.score.txt
 traitname.singlevar.cov.txt
  • If --prefix is specified (here with the value prefix), the output file names are:
 prefix.traitname.singlevar.score.txt
 prefix.traitname.singlevar.cov.txt
  • --LDwindow sets the size of the window, in base pairs, over which the LD matrix is generated for each variant. The default is 1000000 (1 Mb).
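The naming rule above can be captured in a small helper, for example to locate results in a downstream script. This is a hypothetical sketch (the function name `output_file_names` is illustrative); it assumes one pair of files is written per analysed trait when several traits are requested, which this page implies through the trait name in each file name but does not state outright.

```python
from typing import Optional, Tuple


def output_file_names(trait_name: str, prefix: Optional[str] = None) -> Tuple[str, str]:
    """Return the (score, cov) file names: [prefix.]traitname.singlevar.{score,cov}.txt."""
    stem = f"{prefix}.{trait_name}" if prefix else trait_name
    return f"{stem}.singlevar.score.txt", f"{stem}.singlevar.cov.txt"


# Assumption: one pair of output files per analysed trait.
for trait in ["LDL", "HDL"]:
    print(output_file_names(trait, prefix="study1"))
```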

VC Options

  • --vcShared adds a shared environment variance component and --vcX adds a chromosome X variance component. Either or both can be fitted together with the genetic and non-shared environment components.
  • --useCovariates tells Rare-Metal-Worker to read covariates from the PED file. Covariates are modeled as fixed effects.

Trait Options

  • --makeResiduals can be combined with --useCovariates to generate residuals from a simple linear regression on the covariates before analysis. If --inverseNormal is also used, the residuals are quantile normalized before the variance component model is fitted.
    • An example command line requesting pre-adjustment for covariates before fitting a variance component model:
  --useCovariates --makeResiduals --inverseNormal
    • An example command line requesting joint modeling of fixed effects and variance components follows:
  --useCovariates --inverseNormal 
  • If --inverseNormal is used WITHOUT --makeResiduals, then trait values are inverse normalized before any model fitting.
  • --traitName is useful when your PED and DAT files contain many traits but you are interested in only one or a few of them. It accepts a single trait name, several trait names separated by "/", or a file ending in .txt that lists one trait of interest per line. For example:
  --traitName LDL
  --traitName LDL/HDL/TG
  --traitName traitsOfInterest.txt
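The two orderings described for --makeResiduals and --inverseNormal can be illustrated with a minimal sketch. It is not Rare-Metal-Worker's code: it assumes a single covariate fitted by ordinary least squares, and a rank-based inverse normal transform using (rank - 0.5) / n with average ranks for ties. The exact quantile offset and tie handling used by the tool are not documented on this page, and the function names are hypothetical.

```python
from statistics import NormalDist, mean
from typing import List, Sequence


def residuals(trait: Sequence[float], covariate: Sequence[float]) -> List[float]:
    """Residuals from an ordinary least-squares fit of trait ~ intercept + one covariate."""
    mx, my = mean(covariate), mean(trait)
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, trait))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(covariate, trait)]


def inverse_normal(values: Sequence[float]) -> List[float]:
    """Assumed rank-based transform Phi^-1((rank - 0.5) / n); ties share the average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Trait and age values from the PED example in "Handling Unrelated Individuals".
trait = [-0.3, 0.0, 0.4, 0.008, 2.35]
age = [10.0, 56.0, 31.0, 23.0, 34.0]

# --useCovariates --makeResiduals --inverseNormal: adjust for covariates first,
# then quantile-normalize the residuals before the variance component model.
pre_adjusted = inverse_normal(residuals(trait, age))

# --useCovariates --inverseNormal (no --makeResiduals): normalize the trait itself;
# the covariates are then modeled jointly as fixed effects (not shown here).
normalized_trait = inverse_normal(trait)
```

The point of the contrast is where the covariate adjustment happens: before normalization in the first case, jointly with the variance components in the second.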

Kinship Source

  • --kinPedigree generates the kinship matrix from the pedigree when pedigree information is available. This option is on by default.
  • --kinGeno tells Rare-Metal-Worker to estimate the kinship matrix from all available variants that pass the filters set by --kinMaf and --kinMiss. By default, variants with MAF > 0.05 and genotype missing rate < 0.05 are used.
  • --kinFile reads a kinship matrix from a file. The kinship matrix must have been generated previously by Rare-Metal-Worker from the same VCF file or PED/DAT files.
  • --kinSave saves the kinship matrix for future use.

Kinship Options

  • --kinMiss and --kinMaf are used together with --kinGeno.
  • --kinMiss specifies the maximum genotype missing rate when calculating kinship from genotypes. The default is 0.05.
  • --kinMaf specifies the minimum minor allele frequency used when calculating kinship from genotypes. The default is 0.05.
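To make the role of --kinMaf and --kinMiss concrete, here is a sketch of variant filtering followed by a genotype-based kinship estimate. The estimator is an assumption: it uses half of a standardized genetic relationship matrix with mean imputation of missing genotypes, a common empirical kinship, and this page does not say which estimator Rare-Metal-Worker implements. The strict inequalities follow the "MAF > 0.05 and missing rate < 0.05" wording above; the function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Genotype coded as copies of the alternate allele (0, 1, 2); None means missing.
Genotype = Optional[int]


def passes_filters(genos: Sequence[Genotype], kin_maf: float = 0.05, kin_miss: float = 0.05) -> bool:
    """Keep a variant if MAF > kin_maf and missing rate < kin_miss (boundary handling assumed)."""
    if not genos:
        return False
    observed = [g for g in genos if g is not None]
    missing_rate = 1 - len(observed) / len(genos)
    if not observed or missing_rate >= kin_miss:
        return False
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq) > kin_maf


def empirical_kinship(
    variants: Sequence[Sequence[Genotype]], kin_maf: float = 0.05, kin_miss: float = 0.05
) -> List[List[float]]:
    """Kinship as half of a standardized genetic relationship matrix (assumed estimator)."""
    used = [v for v in variants if passes_filters(v, kin_maf, kin_miss)]
    if not used:
        raise ValueError("no variants pass the MAF / missing-rate filters")
    n = len(used[0])
    kin = [[0.0] * n for _ in range(n)]
    for genos in used:
        observed = [g for g in genos if g is not None]
        p = sum(observed) / (2 * len(observed))
        scale = (2 * p * (1 - p)) ** 0.5
        # Standardize each genotype; missing genotypes are mean-imputed, so they contribute 0.
        z = [0.0 if g is None else (g - 2 * p) / scale for g in genos]
        for j in range(n):
            for k in range(n):
                kin[j][k] += z[j] * z[k]
    return [[value / (2 * len(used)) for value in row] for row in kin]


variants = [
    [0, 1, 2, 1],     # alt frequency 0.5: kept
    [0, 0, 0, 0],     # monomorphic (MAF 0): dropped by the MAF filter
    [None, 1, 1, 1],  # 25% missing: dropped by the missing-rate filter
]
kinship = empirical_kinship(variants)
```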

Chromosome X

  • --xLabel specifies the chromosome label used for variants on chromosome X. The default is "X".
  • --xStart and --xEnd specify the start and end positions of the non-pseudo-autosomal region of chromosome X. These options should be specified when --fitX is used.
  • The default for --xStart is 2699520 and default for --xEnd is 154931044, according to NCBI genome build 37.
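The --xLabel, --xStart and --xEnd settings together decide whether a variant falls in the non-pseudo-autosomal part of chromosome X. A minimal sketch of that test follows; treating both bounds as inclusive is an assumption taken from the wording "start and end", and the function name is hypothetical. Under the defaults, the DAT example markers X:12345 and X:111111 fall below --xStart and so lie outside the non-pseudo-autosomal region.

```python
# Defaults from this page (NCBI genome build 37).
X_START_DEFAULT = 2699520
X_END_DEFAULT = 154931044


def in_x_non_par(
    chrom: str,
    pos: int,
    x_label: str = "X",
    x_start: int = X_START_DEFAULT,
    x_end: int = X_END_DEFAULT,
) -> bool:
    """True if the variant is on chromosome X inside [x_start, x_end] (inclusive bounds assumed)."""
    return chrom == x_label and x_start <= pos <= x_end


for chrom, pos in [("X", 12345), ("X", 111111), ("X", 50000000), ("1", 50000000)]:
    print(f"{chrom}:{pos}", "non-PAR X" if in_x_non_par(chrom, pos) else "not non-PAR X")
```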

Handling Unrelated Individuals

  • To analyze unrelated individuals, code them as unrelated in the PED file, so that each individual belongs to a unique family. Rare-Metal-Worker takes care of the rest.
  • However, when --kinGeno is also used, Rare-Metal-Worker treats the individuals as related and estimates the kinship matrix from genotypes.
  • An example is shown below (the header is included for illustration only and does not appear in a real PED file):
  famid pid fid mid sex age trait
  1     1.1   0   0   1  10  -0.3
  2     2.1   0   0   1  56  0.0
  3     3.1   0   0   2  31  0.4
  4     4.1   0   0   2  23  0.008
  5     5.1   0   0   2  34  2.35
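A quick way to confirm that a PED file codes everyone as unrelated is to check that every family ID is distinct and that all father and mother IDs are 0, as described for unrelated samples on this page. The sketch below is a hypothetical checker (`check_unrelated` is not part of Rare-Metal-Worker); it assumes whitespace-separated Merlin PED columns starting with famid, pid, fid, mid.

```python
from typing import Dict, Iterable, List


def check_unrelated(ped_lines: Iterable[str]) -> List[str]:
    """List problems that prevent individuals from being coded as unrelated.

    Assumes Merlin PED columns: famid, pid, fid, mid, sex, then trait/covariate columns.
    """
    problems: List[str] = []
    first_pid: Dict[str, str] = {}  # famid -> first individual seen in that family
    for lineno, raw in enumerate(ped_lines, start=1):
        fields = raw.split()
        if not fields:
            continue
        if len(fields) < 4:
            problems.append(f"line {lineno}: fewer than 4 columns")
            continue
        famid, pid, fid, mid = fields[:4]
        if famid in first_pid:
            problems.append(f"line {lineno}: family {famid} already used by {first_pid[famid]}")
        else:
            first_pid[famid] = pid
        if fid != "0" or mid != "0":
            problems.append(f"line {lineno}: {pid} has parent IDs {fid}/{mid}, expected 0/0")
    return problems


# The example rows above, without the illustrative header line.
example = ["1 1.1 0 0 1 10 -0.3", "2 2.1 0 0 1 56 0.0", "3 3.1 0 0 2 31 0.4",
           "4 4.1 0 0 2 23 0.008", "5 5.1 0 0 2 34 2.35"]
print(check_unrelated(example) or "all individuals coded as unrelated")
```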

Output Formats

  • By default, two output files are generated:
 prefix.traitName.singlevar.score.txt
 prefix.traitName.singlevar.cov.txt
  • prefix.traitName.singlevar.score.txt contains the summary-level statistics needed by Rare-Metal. An example is shown below:
 LDL mean= -0.00, variance=  1.00, heritability= 34.30
 CHR   POS     REF_ALLELE  ALT_ALLELE  INFORMATIVE_N  FOUNDER_AF   ALL_AF       INFORMATIVE_AC  STAT       ALT_ALLELE_EFFSIZE  PVALUE
 chr1  762320  G           A           6103           0.0038225    0.0038225    45              4.73428    0.0949001           0.502675
 chr1  865628  G           A           6103           0.00556756   0.00556756   67              18.3894    0.249333            0.0322511
 chr1  865665  G           A           6103           8.30979e-05  8.30979e-05  1               -0.740215  -0.599493           0.505316
 chr1  878744  C           G           6103           0.00667334   0.00667334   80              -19.5138   -0.220432           0.0380796
 chr1  879381  G           A           6103           8.31117e-05  8.31117e-05  1               -0.831691  -0.61887            0.473108
  • The p-values in this output are from the family-based single-variant score test.
  • prefix.traitName.singlevar.cov.txt contains the LD matrix between each variant and the adjacent markers within a fixed-size window (set with --LDwindow; the default is 1 Mb). It has the following format:
 CHR   POS     VAR_POS_IN_WINDOW                            LD_MATRIX
 chr1  762320  762320,865628,865665,878744,879381,1560000   0.0359084,-0.000242112,-0.00125797,-0.000993422,-0.000344509,-0.00017077,
 chr1  865628  865628,865665,878744,879381,1560000,1867659  0.419804,-0.0103663,-0.00635265,0.0594056,0.0534505,-0.00462183,
 chr1  878744  878744,879381,1560000,1867659,1877659        0.000404537,-0.000235215,-1.4455e-05,-8.69137e-06,-3.1027e-05,
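Both output files can be read back for quality control before they are passed to Rare-Metal. The sketch below is a hypothetical reader (the function names are illustrative, not part of Rare-Metal-Worker or Rare-Metal). It assumes fields are separated by whitespace and contain no spaces, that the trait summary line precedes the score header, and that each LD_MATRIX entry lines up with the position at the same index in VAR_POS_IN_WINDOW, as in the examples above; entries are stored under both position orders because a covariance matrix is symmetric.

```python
from typing import Dict, Iterable, List, Tuple, Union

Value = Union[str, int, float]
TEXT_COLS = {"CHR", "REF_ALLELE", "ALT_ALLELE"}
INT_COLS = {"POS", "INFORMATIVE_N"}


def read_score_file(lines: Iterable[str]) -> List[Dict[str, Value]]:
    """Parse *.singlevar.score.txt rows into dicts keyed by column name."""
    header: List[str] = []
    rows: List[Dict[str, Value]] = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "CHR":
            header = fields
            continue
        if not header:
            continue  # trait summary line (mean, variance, heritability) precedes the header
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}: {raw!r}")
        row: Dict[str, Value] = {}
        for name, value in zip(header, fields):
            if name in TEXT_COLS:
                row[name] = value
            elif name in INT_COLS:
                row[name] = int(value)
            else:
                row[name] = float(value)
        rows.append(row)
    return rows


def read_cov_file(lines: Iterable[str]) -> Dict[Tuple[str, int, int], float]:
    """Map (chrom, pos_a, pos_b) -> LD matrix entry from *.singlevar.cov.txt rows."""
    cov: Dict[Tuple[str, int, int], float] = {}
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0] == "CHR":
            continue
        chrom, pos_text, window, values = fields
        pos = int(pos_text)
        positions = [int(p) for p in window.split(",") if p]
        entries = [float(v) for v in values.split(",") if v]  # rows end with a trailing comma
        if len(positions) != len(entries):
            raise ValueError(f"window and LD_MATRIX lengths differ at {chrom}:{pos}")
        for other, value in zip(positions, entries):
            cov[(chrom, pos, other)] = value
            cov[(chrom, other, pos)] = value  # covariance is symmetric
    return cov
```

For example, `read_score_file(open("LDL.singlevar.score.txt"))` returns one dict per variant, so filtering such as `[r["POS"] for r in rows if r["PVALUE"] < 0.05]` becomes a one-liner.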

Examples

Related individuals

  • When genotypes are stored in the PED and DAT files and you want to use pedigree kinship and inverse-normalize trait values before adjusting for covariates and running the analysis:
 ./Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --traitName LDL --inverseNormal --useCovariates
  • When genotypes are stored in the PED and DAT files and you want to use pedigree kinship and adjust for covariates before inverse-normalizing the residuals and running the analysis:
 ./Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --traitName LDL --useCovariates --makeResiduals --inverseNormal
  • When genotypes are stored in the PED and DAT files and you want to use kinship estimated from genotypes (--kinSave is optional; it saves the kinship matrix for future use):
 ./Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --kinGeno --kinSave --traitName LDL
  • When genotypes are stored in a VCF file and you want to use pedigree kinship:
 ./Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz
  • When genotypes are stored in a VCF file and you want to use kinship estimated from genotypes (--kinSave is optional; it saves the kinship matrix for future use):
 ./Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --kinGeno --kinSave
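When many traits or cohorts are processed, assembling these command lines in a script avoids typos. The sketch below is a hypothetical helper (`rmw_command` is illustrative, not part of the package) that uses only options documented on this page and enforces two of the stated pairings: --makeResiduals with --useCovariates, and --kinMaf/--kinMiss with --kinGeno. It assumes option order does not matter to Rare-Metal-Worker, that --kinMaf and --kinMiss take their value as the next argument, and that the program is run as ./Rare-Metal-Worker from its bin directory.

```python
import shlex
from typing import List, Optional, Sequence


def rmw_command(
    ped: str,
    dat: str,
    vcf: Optional[str] = None,
    trait_names: Optional[Sequence[str]] = None,
    prefix: Optional[str] = None,
    use_covariates: bool = False,
    make_residuals: bool = False,
    inverse_normal: bool = False,
    kin_geno: bool = False,
    kin_maf: Optional[float] = None,
    kin_miss: Optional[float] = None,
    kin_save: bool = False,
    executable: str = "./Rare-Metal-Worker",
) -> List[str]:
    """Assemble a Rare-Metal-Worker argument list from the options documented on this page."""
    if make_residuals and not use_covariates:
        raise ValueError("--makeResiduals is combined with --useCovariates")
    if (kin_maf is not None or kin_miss is not None) and not kin_geno:
        raise ValueError("--kinMaf/--kinMiss are used together with --kinGeno")
    cmd = [executable, "--ped", ped, "--dat", dat]
    if vcf:
        cmd += ["--vcf", vcf]
    if prefix:
        cmd += ["--prefix", prefix]
    if trait_names:
        cmd += ["--traitName", "/".join(trait_names)]  # several traits separated by "/"
    if use_covariates:
        cmd.append("--useCovariates")
    if make_residuals:
        cmd.append("--makeResiduals")
    if inverse_normal:
        cmd.append("--inverseNormal")
    if kin_geno:
        cmd.append("--kinGeno")
    if kin_maf is not None:
        cmd += ["--kinMaf", str(kin_maf)]
    if kin_miss is not None:
        cmd += ["--kinMiss", str(kin_miss)]
    if kin_save:
        cmd.append("--kinSave")
    return cmd


cmd = rmw_command("yourInput.ped", "yourInput.dat", trait_names=["LDL"],
                  use_covariates=True, make_residuals=True, inverse_normal=True)
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets `subprocess.run` pass each argument through unchanged, without shell quoting issues.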

Unrelated individuals

  • The commands are the same as in the examples above, except that each individual must have a distinct family ID in the PED file, and father and mother IDs must be "0".
  • When genotypes are in the PED file, marker information is in the DAT file, and there is no relatedness in the sample:
 ./Rare-Metal-Worker --ped yours.ped --dat yours.dat
  • When genotypes are in a VCF file, covariates and traits are in the PED and DAT files, and there is no relatedness in the sample, use the following:
 ./Rare-Metal-Worker --ped yours.ped --dat yours.dat --vcf yours.vcf.gz
  • When genotypes are in a VCF file, covariates and traits are in the PED and DAT files, and there is cryptic relatedness in the sample, add --kinGeno. Individuals are then handled as related, and the kinship matrix is estimated from genotypes:
 ./Rare-Metal-Worker --ped yours.ped --dat yours.dat --vcf yours.vcf.gz --kinGeno

Q & A