RAREMETALWORKER

From Genome Analysis Wiki

Rare-Metal-Worker is a tool for generating summary-level statistics for rare-variant and gene-level meta-analyses using Rare-Metal. It handles both related and unrelated individuals.

Change Log

  • Version 0.0.1 was released on 11/13/2012.
  • Modified Rare-Metal-Worker to output the LD matrix using a sliding window. (11/14/2012)
  • Uploaded to public wiki. (11/16/2012)
  • Enabled automatic writing of a log file. (11/18/2012)

Key Features

Rare-Metal-Worker has the following features:

  • Takes genotypes from either a PED file or a VCF file.
  • Generates summary-level statistics for both related and unrelated individuals.
  • Generates linkage disequilibrium matrices summarizing covariance between single marker statistics using an adjustable sliding window.
  • Optionally handles related individuals using a kinship matrix derived from either pedigree or genotype data.
  • Optionally fits a shared environment variance component.
  • Can handle variants on chromosome X.

Software Download and Installation

Where to Download

  • The source package for Linux and Mac can be downloaded here:
 http://genome.sph.umich.edu/wiki/File:RareMetalWorker.0.0.1.tgz
  • Save it to a local directory and decompress it using the following command:
 tar xvzf RareMetalWorker.0.0.1.tgz

How to Compile

  • Go to RareMetalWorker_0.0.1/RareMetalWorker/src inside the extracted package and run the following command:
 make all
  • To run the program, go to RareMetalWorker_0.0.1/RareMetalWorker/bin and execute ./Rare-Metal-Worker.

Software Specifications

Input Files

Rare-Metal-Worker takes as input PED and DAT files in Merlin format, optionally together with a VCF file. When genotypes are stored in the PED and DAT files, no VCF file is needed. However, even when genotypes are stored in a VCF file, the PED and DAT files are still required to carry covariate and trait information.

PED and DAT Files

  • When the PED file contains genotypes, no VCF file is needed as input.
  • Rare-Metal-Worker takes PED and DAT files in Merlin format. See the following link for the format specification:
 http://www.sph.umich.edu/csg/abecasis/merlin/tour/input_files.html
  • Variant names in the DAT file must follow the format "M chr:pos". For example:
 M 1:123456
 M 1:234567
 M 2:111111
 M 2:222222
 M X:12345
 M X:111111
  • Markers in the PED and DAT files must be sorted by chromosome and position.
  • Covariate and trait values are stored in the PED file; covariate and trait descriptions are stored in the DAT file.

VCF File

  • Alternatively, genotypes can be provided in a VCF file. See the following link for the VCF specification:
  http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41  
  • The VCF file should be compressed with bgzip and indexed with tabix, using the following commands:
 bgzip input.vcf     ## this command will produce input.vcf.gz
 tabix -pvcf -f input.vcf.gz  ## this command will produce input.vcf.gz.tbi
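A common mistake is compressing the VCF with plain gzip instead of bgzip; tabix only indexes BGZF files, in which each block's gzip header carries an extra subfield with the identifier "BC". The Python sketch below is an illustrative pre-flight check, not part of Rare-Metal-Worker; the helper names is_bgzf and check_vcf_input are hypothetical. The header test is a heuristic that inspects only the first block.

```python
import os


def is_bgzf(path):
    """Heuristic: does the file start with a BGZF (bgzip) block header?"""
    with open(path, "rb") as handle:
        header = handle.read(18)
    return (
        len(header) == 18
        and header[0:4] == b"\x1f\x8b\x08\x04"  # gzip magic, deflate, FEXTRA flag set
        and header[12:14] == b"BC"               # BGZF extra subfield identifier
    )


def check_vcf_input(path):
    """Return a list of problems that would stop a bgzip/tabix-based reader."""
    problems = []
    if not is_bgzf(path):
        problems.append(f"{path} is not bgzip-compressed (plain gzip is not enough)")
    if not os.path.exists(path + ".tbi"):
        problems.append(f"{path}.tbi not found; run: tabix -p vcf {path}")
    return problems
```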

Software Options

The following options are currently available in Rare-Metal-Worker:

   Options:
      Input Files : --ped [Exomechip.pheno.ped], --dat [Exomechip.pheno.dat], --vcf []
     Output Files : --prefix [], --LDwindow [1000000]
       VC Options : --vcShared, --vcX, --useCovariates [ON]
    Trait Options : --makeResiduals, --inverseNormal [ON], --traitName []
   Kinship Source : --kinPedigree [ON], --kinGeno, --kinFile [], --kinSave
  Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
     Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044]


Input Files

  • When genotypes are stored in a VCF file, the PED and DAT files specify the pedigree structure, covariates and traits. An example command line might look like this:
 --ped input.ped --dat input.dat --vcf input.vcf.gz
  • When genotypes are saved in the PED file, the VCF file is not needed. An example command line might look like this:
 --ped input.ped --dat input.dat

Output Files

  • --prefix is optional.
  • If --prefix is not specified, the output file names will be:
 traitname.singlevar.score.txt
 traitname.singlevar.cov.txt
  • If --prefix is specified (for example, --prefix prefix), the output file names are:
 prefix.traitname.singlevar.score.txt
 prefix.traitname.singlevar.cov.txt
  • --LDwindow specifies the length, in base pairs, of the sliding window used to generate the LD matrix for each variant. The default is 1,000,000 bp (1 Mb).
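To illustrate what a sliding LD window selects, here is a minimal Python sketch. It assumes a forward-looking window [pos, pos + window] on a single chromosome, which is what the example *.singlevar.cov.txt rows suggest (each row lists the variant itself followed by downstream positions); the exact boundary rule Rare-Metal-Worker uses is not documented, and the function name ld_windows is hypothetical.

```python
from bisect import bisect_right


def ld_windows(positions, window=1_000_000):
    """For each variant, list the positions (starting with itself) that fall
    within `window` bp downstream. `positions` must be sorted ascending and
    come from a single chromosome. Assumes an inclusive forward window."""
    windows = []
    for i, pos in enumerate(positions):
        end = bisect_right(positions, pos + window)  # first index beyond pos + window
        windows.append(positions[i:end])
    return windows
```

With sorted input, each window is found by binary search, so the cost is dominated by copying the window contents rather than by scanning all variants.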

VC Options

  • --vcShared and --vcX request a shared environment variance component and a chromosome X variance component, respectively, to be fitted together with the genetic component and the non-shared environment component. Either or both can be specified.
  • When --useCovariates is specified, Rare-Metal-Worker reads covariates from the PED file and models them as fixed effects.

Trait Options

  • --makeResiduals can be combined with --useCovariates to compute residuals from a simple linear regression on the covariates before analysis. If --inverseNormal is also used, the residuals are quantile-normalized before the variance component model is fitted.
    • An example command line requesting pre-adjustment for covariates before fitting a variance component model follows:
  --useCovariates --makeResiduals --inverseNormal
    • An example command line requesting joint modeling of fixed effects and variance components follows:
  --useCovariates --inverseNormal 
  • If --inverseNormal is used WITHOUT --makeResiduals, trait values are inverse-normalized before any model fitting.
  • --traitName is useful when your PED and DAT files contain many traits but you are interested in only one or a few of them. It accepts either trait names separated by "/" or a file ending in .txt that lists one trait of interest per line. For example:
  --traitName LDL
  --traitName LDL/HDL/TG
  --traitName traitsOfInterest.txt
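The wiki does not give the exact formulas behind --makeResiduals and --inverseNormal. The Python sketch below illustrates the usual versions under stated assumptions: ordinary least squares on an intercept plus the covariates, and a rank-based inverse normal transform using the offset (rank - 0.5)/n, with tied values given their average rank. Rare-Metal-Worker may use a different rank offset or tie rule, and the function names are hypothetical.

```python
from statistics import NormalDist


def ols_residuals(y, covariates):
    """Residuals of y after least-squares regression on an intercept plus covariates.
    `covariates` is a list of columns, each the same length as y."""
    X = [[1.0] + [col[i] for col in covariates] for i in range(len(y))]
    p = len(X[0])
    # Normal equations (X'X) beta = X'y, solved by Gauss-Jordan elimination.
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)] for r in range(p)]
    b = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * ac for a, ac in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    beta = [b[r] / A[r][r] for r in range(p)]
    return [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]


def inverse_normal(values):
    """Rank-based inverse normal transform: z_i = Phi^-1((rank_i - 0.5) / n).
    Ties receive the average of their 1-based ranks (an assumption)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

Under these assumptions, --useCovariates --makeResiduals --inverseNormal corresponds roughly to `inverse_normal(ols_residuals(trait, [age, sex]))`, whereas --inverseNormal without --makeResiduals transforms the raw trait and leaves the covariates to be fitted jointly with the variance components.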

Kinship Source

  • --kinPedigree tells Rare-Metal-Worker to derive the kinship matrix from the pedigree, when pedigree information is available. This option is on by default.
  • --kinGeno tells Rare-Metal-Worker to estimate the kinship matrix from all available variants that pass the criteria set by the --kinMaf and --kinMiss options. By default, variants with MAF > 0.05 and genotype missing rate < 0.05 are used.
  • --kinFile lets Rare-Metal-Worker read a kinship matrix from a file. However, the kinship matrix must have been generated previously by Rare-Metal-Worker from the same VCF file or PED/DAT files.
  • --kinSave saves the kinship matrix to a file.

Kinship Options

  • --kinMiss and --kinMaf should be used together with --kinGeno.
  • --kinMiss specifies the maximum genotype missing rate when calculating kinship from genotypes. The default is 0.05.
  • --kinMaf specifies the minimum minor allele frequency used when calculating kinship from genotypes. The default is 0.05.
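The wiki does not document which estimator --kinGeno uses. As an illustration only, the following Python sketch applies filters analogous to --kinMaf and --kinMiss (keep markers with MAF > 0.05 and missing rate < 0.05) and then computes one common genotype-based estimate: the standardized genetic relationship matrix divided by two. Missing genotypes are mean-imputed (set to 0 after centering), which is also an assumption. Function names are hypothetical.

```python
def filter_markers(genotypes, min_maf=0.05, max_missing=0.05):
    """Keep markers with MAF > min_maf and missing rate < max_missing.
    `genotypes` is a list of markers; each marker is a list of allele
    counts (0/1/2) per individual, with None for a missing genotype."""
    kept = []
    for marker in genotypes:
        observed = [g for g in marker if g is not None]
        missing_rate = 1 - len(observed) / len(marker)
        if not observed or missing_rate >= max_missing:
            continue
        p = sum(observed) / (2 * len(observed))
        if min(p, 1 - p) <= min_maf:
            continue
        kept.append((marker, p))
    return kept


def empirical_kinship(genotypes, min_maf=0.05, max_missing=0.05):
    """Kinship estimated as GRM / 2, where GRM = (1/M) * sum of z z' over
    standardized markers z = (g - 2p) / sqrt(2p(1 - p))."""
    kept = filter_markers(genotypes, min_maf, max_missing)
    if not kept:
        raise ValueError("no markers pass the MAF/missingness filters")
    n = len(genotypes[0])
    kin = [[0.0] * n for _ in range(n)]
    for marker, p in kept:
        scale = (2 * p * (1 - p)) ** 0.5
        z = [0.0 if g is None else (g - 2 * p) / scale for g in marker]
        for i in range(n):
            for j in range(n):
                kin[i][j] += z[i] * z[j]
    m = len(kept)
    # GRM entries estimate twice the kinship coefficient, so divide by 2M.
    return [[value / (2 * m) for value in row] for row in kin]
```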

Chromosome X

  • --xLabel takes a string specifying how chromosome X is labeled in the input. The default is "X".
  • --xStart and --xEnd specify the start and end of the non-pseudo-autosomal region on chromosome X. These options should be specified when --vcX is used.
  • The default for --xStart is 2699520 and the default for --xEnd is 154931044, according to NCBI genome build 37.
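As a small illustration of how --xLabel, --xStart and --xEnd partition chromosome X, the Python sketch below classifies a variant position. It follows the description above in treating the defaults as the start and end of the non-pseudo-autosomal region and assumes both boundaries are inclusive; how Rare-Metal-Worker handles the boundary positions themselves is not documented. The function name x_region is hypothetical.

```python
def x_region(chrom, pos, x_label="X", x_start=2699520, x_end=154931044):
    """Classify a variant as 'other' (not chromosome X), 'PAR1', 'nonPAR' or 'PAR2'.
    Assumes x_start and x_end are inclusive bounds of the non-PAR region."""
    if chrom != x_label:
        return "other"
    if pos < x_start:
        return "PAR1"
    if pos > x_end:
        return "PAR2"
    return "nonPAR"
```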

Handling Unrelated Individuals

  • To analyze unrelated individuals, code them as unrelated in the PED file, for example by placing each individual in a unique family. Rare-Metal-Worker takes care of the rest.
  • However, when --kinGeno is also used, Rare-Metal-Worker treats them as related and estimates the kinship matrix from genotypes.
  • An example follows (the header line is for illustration only and does not appear in a real PED file):
  famid pid fid mid sex age trait
  1     1.1   0   0   1  10  -0.3
  2     2.1   0   0   1  56  0.0
  3     3.1   0   0   2  31  0.4
  4     4.1   0   0   2  23  0.008
  5     5.1   0   0   2  34  2.35
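Before an unrelated-sample run, it can help to confirm that the PED file follows this coding: a distinct family ID per individual and father and mother IDs of "0". The following Python sketch is a hypothetical helper, not part of Rare-Metal-Worker; it reads whitespace-separated PED lines without a header.

```python
def check_unrelated_ped(lines):
    """Check that every individual is coded as an unrelated founder:
    a family ID used only once, and father and mother IDs both '0'."""
    problems = []
    seen_families = set()
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        famid, pid, fid, mid = fields[:4]
        if famid in seen_families:
            problems.append(f"line {lineno}: family {famid} is shared with an earlier individual")
        seen_families.add(famid)
        if fid != "0" or mid != "0":
            problems.append(f"line {lineno}: {pid} has a non-zero father or mother ID")
    return problems
```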

Output Formats

  • Two files are generated automatically by default:
 prefix.traitName.singlevar.score.txt
 prefix.traitName.singlevar.cov.txt
  • prefix.traitName.singlevar.score.txt contains the summary-level statistics needed by Rare-Metal. For example:
 LDL mean= -0.00, variance=  1.00, heritability= 34.30
 CHR   POS     REF_ALLELE  ALT_ALLELE  INFORMATIVE_N  FOUNDER_AF   ALL_AF       INFORMATIVE_AC  STAT       ALT_ALLELE_EFFSIZE  PVALUE
 chr1  762320  G           A           6103           0.0038225    0.0038225    45              4.73428    0.0949001           0.502675
 chr1  865628  G           A           6103           0.00556756   0.00556756   67              18.3894    0.249333            0.0322511
 chr1  865665  G           A           6103           8.30979e-05  8.30979e-05  1               -0.740215  -0.599493           0.505316
 chr1  878744  C           G           6103           0.00667334   0.00667334   80              -19.5138   -0.220432           0.0380796
 chr1  879381  G           A           6103           8.31117e-05  8.31117e-05  1               -0.831691  -0.61887            0.473108
  • The p-values in the output above come from the family-based single-variant score test.
  • prefix.traitName.singlevar.cov.txt contains the LD matrix between each variant and the neighboring markers within a fixed-size window (set by --LDwindow). The default window size is 1 Mb. It has the following format:
 CHR   POS     VAR_POS_IN_WINDOW                            LD_MATRIX
 chr1  762320  762320,865628,865665,878744,879381,1560000   0.0359084,-0.000242112,-0.00125797,-0.000993422,-0.000344509,-0.00017077,
 chr1  865628  865628,865665,878744,879381,1560000,1867659  0.419804,-0.0103663,-0.00635265,0.0594056,0.0534505,-0.00462183,
 chr1  878744  878744,879381,1560000,1867659,1877659        0.000404537,-0.000235215,-1.4455e-05,-8.69137e-06,-3.1027e-05,
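To load the two output files for inspection, a minimal Python sketch follows. It is based only on the examples above: the score file's column header is assumed to start with CHR and may be preceded by a trait summary line, and each LD_MATRIX entry is paired with the position at the same index in VAR_POS_IN_WINDOW (the counts match in every example row, and the trailing comma is ignored). The function names are hypothetical.

```python
def parse_score_file(lines):
    """Parse a *.singlevar.score.txt file into a list of dicts keyed by column name."""
    records = []
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            if fields[0] == "CHR":
                header = fields  # lines before the column header (trait summary) are skipped
            continue
        record = dict(zip(header, fields))
        for key in ("POS", "INFORMATIVE_N", "INFORMATIVE_AC"):
            record[key] = int(record[key])
        for key in ("FOUNDER_AF", "ALL_AF", "STAT", "ALT_ALLELE_EFFSIZE", "PVALUE"):
            record[key] = float(record[key])
        records.append(record)
    return records


def parse_cov_file(lines):
    """Parse a *.singlevar.cov.txt file into {(chrom, pos): [(other_pos, value), ...]}."""
    rows = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHR":
            continue
        chrom, pos = fields[0], int(fields[1])
        positions = [int(p) for p in fields[2].split(",") if p]
        values = [float(v) for v in fields[3].split(",") if v]  # drops the trailing empty entry
        if len(positions) != len(values):
            raise ValueError(f"{chrom}:{pos}: {len(positions)} positions but {len(values)} values")
        rows[(chrom, pos)] = list(zip(positions, values))
    return rows
```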

Examples

Related individuals

  • When genotypes are stored in the PED and DAT files, and you want to use pedigree kinship and to inverse-normalize trait values before adjusting for covariates and running the analysis:
 ./Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --traitName LDL --inverseNormal --useCovariates
  • When genotypes are stored in the PED and DAT files, and you want to use pedigree kinship and to adjust for covariates before inverse-normalizing the residuals and running the analysis:
 ./Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --traitName LDL --useCovariates --makeResiduals --inverseNormal
  • When genotypes are stored in the PED and DAT files, and you want to use kinship estimated from genotypes (--kinSave is optional and saves the kinship matrix for future use):
 ./Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --kinGeno --kinSave --traitName LDL
  • When genotypes are stored in a VCF file and you want to use pedigree kinship:
 ./Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz
  • When genotypes are stored in a VCF file and you want to use kinship estimated from genotypes (--kinSave is optional and saves the kinship matrix for future use):
 ./Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --kinGeno --kinSave

Unrelated individuals

  • The commands are the same as in the examples above, except that each individual must have a distinct family ID in the PED file, and the father and mother IDs must be "0".
  • When genotypes are stored in the PED file and marker information in the DAT file, assuming no relatedness in the sample:
 ./Rare-Metal-Worker --ped yours.ped --dat yours.dat
  • When genotypes are stored in a VCF file and covariate and trait information in the PED and DAT files, assuming no relatedness in the sample:
 ./Rare-Metal-Worker --ped yours.ped --dat yours.dat --vcf yours.vcf.gz
  • When genotypes are stored in a VCF file and covariate and trait information in the PED and DAT files, and cryptic relatedness is suspected, add --kinGeno so that individuals are treated as related and the kinship matrix is estimated from genotypes:
 ./Rare-Metal-Worker --ped yours.ped --dat yours.dat --vcf yours.vcf.gz --kinGeno

Q & A