RAREMETALWORKER

From Genome Analysis Wiki

Rare-Metal-Worker is a software tool that generates summary-level statistics for meta-analyses with Rare-Metals. It handles both related and unrelated individuals.


Change Log

  • Version 0.1 was released on 11/13/2012.
  • Rare-Metal-Worker was modified to output the LD matrix using a sliding window. (11/14/2012)

Key Features

Rare-Metal-Worker has the following features:

  • Takes genotypes from either a Merlin-format PED file or a VCF file.
  • Generates summary-level statistics for both related and unrelated individuals.
  • Generates a variant LD matrix using a sliding window of user-specified size.
  • Handles related individuals using a kinship matrix derived from the pedigree or from genotype data.
  • Has the option of fitting a shared environment component.
  • Can handle variants on chromosome X.

Software Specifications

Input Files

Rare-Metal-Worker needs the following input files: PED and DAT files in Merlin format, and optionally a VCF file. When genotypes are stored in the PED and DAT files, a VCF file is not needed. When genotypes are stored in a VCF file, PED and DAT files are still required to carry covariate and trait information.

PED and DAT Files

  • When the PED file contains genotypes, no VCF file is needed as input.
  • Rare-Metal-Worker takes PED and DAT files in Merlin format. Please refer to the following link for specifications:
 http://www.sph.umich.edu/csg/abecasis/merlin/tour/input_files.html
  • The DAT file must give variant names in the format "M chr:pos". Here is an example of the variant name format in a DAT file:
 M 1:123456
 M 1:234567
 M 2:111111
 M 2:222222
 M X:12345
 M X:111111
  • Markers in the PED and DAT files must be sorted by chromosome and position.
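
The two DAT file requirements above (the "M chr:pos" naming and the sort order) can be checked before running an analysis. The following is a minimal Python sketch, not part of Rare-Metal-Worker. The function names parse_markers and is_sorted_by_position are hypothetical, and "sorted by chromosome" is assumed to mean that each chromosome forms one contiguous block, since the required chromosome order is not stated here.

```python
import re

# Assumed marker name pattern, taken from the "M chr:pos" examples above.
NAME_RE = re.compile(r"^([^:\s]+):(\d+)$")


def parse_markers(dat_lines):
    """Return (chromosome, position) for every marker line of a DAT file.

    Lines whose first field is not "M" (traits, covariates, blank lines)
    are skipped. A marker name that is not chr:pos raises ValueError.
    """
    markers = []
    for line in dat_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != "M":
            continue
        match = NAME_RE.match(fields[1])
        if match is None:
            raise ValueError(f"marker name not in chr:pos format: {fields[1]!r}")
        markers.append((match.group(1), int(match.group(2))))
    return markers


def is_sorted_by_position(markers):
    """True if each chromosome is one contiguous block (an assumption)
    and positions never decrease within a chromosome."""
    seen = set()
    prev_chrom, prev_pos = None, None
    for chrom, pos in markers:
        if chrom != prev_chrom:
            if chrom in seen:  # chromosome reappears after another one
                return False
            seen.add(chrom)
        elif pos < prev_pos:
            return False
        prev_chrom, prev_pos = chrom, pos
    return True
```

A typical use would be to read the DAT file with open(...).readlines(), pass the lines to parse_markers, and stop if is_sorted_by_position returns False.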

VCF File

  • Another option is to use a VCF file as input. Please refer to the following link for the VCF file specification:
  http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41  
  • The VCF file should be compressed with bgzip and indexed with tabix, using the following commands:
 bgzip input.vcf     ## this command will produce input.vcf.gz
 tabix -pvcf -f input.vcf.gz  ## this command will produce input.vcf.gz.tbi

Covariates & Traits

  • Covariate and trait values are stored in the PED file. Covariate and trait descriptions are stored in the DAT file.
  • Both files follow the Merlin input format described at the link above.

Software Options

The following options are currently available in Rare-Metal-Worker:

   Options:
      Input Files : --ped [Exomechip.pheno.ped], --dat [Exomechip.pheno.dat],
                    --vcf []
     Output Files : --prefix [], --LDwindow [1000000]
       VC Options : --vcShared, --vcX, --useCovariates [ON]
    Trait Options : --makeResiduals, --inverseNormal [ON], --traitName []
   Kinship Source : --kinPedigree [ON], --kinGeno, --kinFile [], --kinSave
  Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
     Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044]

The usage of these options is explained below, one by one, with examples:

Input Files

  • When genotypes are stored in a VCF file, PED and DAT files are still required to provide pedigree structure, covariate and trait information. For example:
 --ped input.ped --dat input.dat --vcf input.vcf.gz
  • When genotypes are stored in the PED file, a VCF file is not needed. For example:
 --ped input.ped --dat input.dat

Output Files

--prefix is optional.

  • If --prefix is not specified, the output file names will be:
 traitname.singlevar.score.txt
 traitname.singlevar.cov.txt
  • If --prefix prefix is specified, then the output file names are:
 prefix.traitname.singlevar.score.txt
 prefix.traitname.singlevar.cov.txt
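
The naming rule above can be written as a small Python helper, for instance to locate the output files in a downstream script. This is only a sketch: output_file_names is a hypothetical function, and it assumes that "traitname" in the file names stands for the name of the analyzed trait.

```python
def output_file_names(trait_name, prefix=None):
    """Return the expected score and covariance file names for one trait.

    Assumes "traitname" in the documented file names is replaced by the
    trait's name, and that the prefix (if any) is joined with a dot.
    """
    stem = f"{prefix}.{trait_name}" if prefix else trait_name
    return [f"{stem}.singlevar.score.txt", f"{stem}.singlevar.cov.txt"]
```

For a trait named LDL and --prefix study1 (both illustrative), this gives study1.LDL.singlevar.score.txt and study1.LDL.singlevar.cov.txt.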

VC Options

  • When --vcShared and/or --vcX are specified, Rare-Metal-Worker fits a shared environment and/or chromosome X variance component in addition to the genetic and non-shared environment components.
  • When --useCovariates is specified, Rare-Metal-Worker reads covariates from the PED file. Covariates are fit as fixed effects.

Trait Options

  • --makeResiduals is intended for use together with --useCovariates. Residuals are first generated from a linear regression on the covariates that ignores variance components, and the variance components are then estimated from those residuals. If --inverseNormal is also used, the residuals are inverse normalized before the variance component model is fit.
    • To adjust for covariates, inverse normalize the residuals, and then fit the variance component model:
  --useCovariates --makeResiduals --inverseNormal
    • To fit fixed effects together with variance components:
  --useCovariates --inverseNormal 
  • If --inverseNormal is used WITHOUT --makeResiduals, trait values are inverse normalized before any model fitting.
  • --traitName is for situations where the PED and DAT files contain many traits but only one or a few are of interest. It accepts a single trait name, several trait names separated by "/", or a file ending in .txt that lists one trait of interest per line. For example:
  --traitName LDL
  --traitName LDL/HDL/TG
  --traitName traitsOfInterest.txt
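
To make the inverse normalization used by --inverseNormal concrete, here is a minimal Python sketch of a rank-based inverse normal transformation. It is an illustration only, not Rare-Metal-Worker's code: the function name inverse_normal is hypothetical, and the rank offset (rank - 0.5) / n and the averaging of tied ranks are assumptions, since the exact formula used by the program is not given here.

```python
from statistics import NormalDist


def inverse_normal(values):
    """Map values to standard normal quantiles of their ranks.

    Assumed formula: z = Phi^-1((rank - 0.5) / n), with tied values
    sharing their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend the group while the next sorted value is tied.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    normal = NormalDist()
    return [normal.inv_cdf((rank - 0.5) / n) for rank in ranks]
```

With --makeResiduals, the input to such a transformation would be the regression residuals; without it, the raw trait values.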

Kinship Source

  • --kinPedigree tells Rare-Metal-Worker to generate the kinship matrix from the pedigree, when pedigree information is available. This option is on by default.
  • --kinGeno tells Rare-Metal-Worker to generate the kinship matrix from all available variants that pass the criteria set by the --kinMaf and --kinMiss options. By default, variants with MAF > 0.05 and genotype missing rate < 0.05 are used.
  • --kinFile lets Rare-Metal-Worker read a kinship matrix from a file. The kinship matrix must have been generated previously by Rare-Metal-Worker.
  • --kinSave saves the kinship matrix for future use.
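
The variant filter applied for genotype-based kinship can be illustrated with a short Python sketch. It is not taken from Rare-Metal-Worker: passes_kinship_filter is a hypothetical function, genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) on an autosome with None for a missing call, and MAF is assumed to be computed from called genotypes only.

```python
def passes_kinship_filter(genotypes, min_maf=0.05, max_missing=0.05):
    """Apply the default MAF > 0.05 and missing rate < 0.05 criteria.

    genotypes: one alternate-allele count (0, 1 or 2) per individual,
    or None where the genotype is missing (assumed coding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf > min_maf and missing_rate < max_missing
```

The two thresholds correspond to the values given to --kinMaf and --kinMiss.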

Chromosome X

  • --xStart and --xEnd specify the start and end of the non-pseudo-autosomal region on chromosome X. These options should be specified when --vcX is used.

Other Options

  • --LDwindow specifies the size, in base pairs, of the sliding window used to calculate the variant LD matrix. The default is 1,000,000 (1 Mb).
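
As a rough illustration of what a sliding window implies, the Python sketch below lists the variant pairs on one chromosome that lie within a given distance of each other. It is an assumption-laden sketch rather than the program's algorithm: pairs_within_window is a hypothetical function, and the window is assumed to extend downstream from each variant and to include variants exactly one window apart.

```python
def pairs_within_window(positions, window=1_000_000):
    """Return index pairs (i, j), i < j, of variants at most `window`
    base pairs apart. `positions` must be sorted in ascending order
    and belong to a single chromosome.
    """
    pairs = []
    for i, start in enumerate(positions):
        for j in range(i + 1, len(positions)):
            if positions[j] - start > window:
                break  # later variants are even farther away
            pairs.append((i, j))
    return pairs
```

A smaller window means fewer pairs and therefore a smaller LD matrix.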

Handling Unrelated Individuals

  • To have Rare-Metal-Worker treat individuals as unrelated, code them as unrelated in the PED file, for example by giving each individual a unique family ID. Rare-Metal-Worker will take care of the rest.
  • However, when --kinGeno is also used, Rare-Metal-Worker treats the individuals as related and generates the kinship matrix from genotypes.
  • An example follows (the header is for illustration only and does not appear in a real PED file):
  famid pid fid mid sex age trait
  1     1.1   0   0   1  10  -0.3
  2     2.1   0   0   1  56  0.0
  3     3.1   0   0   2  31  0.4
  4     4.1   0   0   2  23  0.008
  5     5.1   0   0   2  34  2.35
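
Whether a PED file follows this one-individual-per-family coding can be checked with a few lines of Python. This is a sketch with a hypothetical function name, shared_family_ids, and it assumes the family ID is the first whitespace-separated column, as in the example above.

```python
from collections import Counter


def shared_family_ids(ped_lines):
    """Return the family IDs (first PED column) used on more than one line.

    An empty result means every individual has a distinct family ID.
    """
    counts = Counter(line.split()[0] for line in ped_lines if line.strip())
    return sorted(family for family, n in counts.items() if n > 1)
```

For the five example rows above the result is an empty list.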

Examples

Related individuals

  • When genotypes are stored in the PED and DAT files and you want to use pedigree kinship:
 /bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --traitName LDL
  • When genotypes are stored in the PED and DAT files and you want to use kinship generated from genotypes (--kinSave saves the kinship matrix for future use):
 /bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --kinGeno --kinSave --traitName LDL
  • When genotypes are stored in a VCF file and you want to use pedigree kinship:
 /bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz
  • When genotypes are stored in a VCF file and you want to use kinship generated from genotypes (--kinSave saves the kinship matrix for future use):
 /bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --kinGeno --kinSave

Unrelated individuals

  • The commands are the same as in the examples above, except that each individual must have a distinct family ID in the PED file.

Q & A