Rare-Metal-Worker is a tool for generating summary-level statistics for meta-analyses with Rare-Metal. It handles both related and unrelated individuals.
- Version 0.1 was released on 11/13/2012.
- Rare-Metal-Worker was modified to output the LD matrix over a sliding window. (11/14/2012)
Rare-Metal-Worker has the following features:
- Takes genotypes from either a Merlin-format PED file or a VCF file.
- Generates summary-level statistics for both related and unrelated individuals.
- Generates an LD matrix among variants within a sliding window of user-specified size.
- Handles related individuals using a kinship matrix derived from the pedigree or from genotype data.
- Has the option of fitting shared environment.
- Can handle variants on Chromosome X.
Rare-Metal-Worker needs the following input files: PED and DAT files in Merlin format, and/or a VCF file. When genotypes are stored in the PED and DAT files, a VCF file is not needed. However, even when genotypes are stored in a VCF file, PED and DAT files are still needed to carry covariate and trait information.
PED and DAT Files
- When the PED file contains genotypes, no VCF file is needed as input.
- Rare-Metal-Worker takes PED and DAT files in Merlin format. Please refer to the following link for specifications.
- The DAT file must give variant names in the format "M chr:pos". Here is an example of the variant name format in a DAT file:
M 1:123456
M 1:234567
M 2:111111
M 2:222222
M X:12345
M X:111111
- Markers in PED and DAT file must be sorted by chromosome and position.
- Another option is to use VCF as input. Please refer to the following link for VCF file specification:
- VCF file should be compressed by bgzip and indexed by tabix, using the following command:
bgzip input.vcf ## this command will produce input.vcf.gz
tabix -pvcf -f input.vcf.gz ## this command will produce input.vcf.gz.tbi
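The DAT marker requirements above (names of the form "M chr:pos", sorted by chromosome and position) can be checked before running the program. The following is a minimal sketch, not part of Rare-Metal-Worker: the function name check_dat_markers is hypothetical, and "sorted" is interpreted here as an assumption, namely that each chromosome forms one contiguous block and positions never decrease within a block.

```python
import re

# Assumed pattern for marker names: "chr:pos", where chr contains no ':' or
# whitespace and pos is an integer (based on the example names above).
MARKER_NAME = re.compile(r"^([^:\s]+):(\d+)$")


def check_dat_markers(dat_lines):
    """Return a list of problems found among the 'M' lines of a DAT file."""
    problems = []
    finished = set()   # chromosomes whose block has already ended
    current = None     # chromosome of the block being read
    last_pos = None    # last position seen in the current block
    for lineno, line in enumerate(dat_lines, start=1):
        fields = line.split()
        if len(fields) < 2 or fields[0] != "M":
            continue  # trait, covariate and other line types are not checked
        match = MARKER_NAME.match(fields[1])
        if match is None:
            problems.append(f"line {lineno}: '{fields[1]}' is not in chr:pos form")
            continue
        chrom, pos = match.group(1), int(match.group(2))
        if chrom != current:
            if chrom in finished:
                problems.append(f"line {lineno}: chromosome {chrom} is not contiguous")
            if current is not None:
                finished.add(current)
            current = chrom
        elif pos < last_pos:
            problems.append(f"line {lineno}: position {pos} is out of order on {chrom}")
        last_pos = pos
    return problems


example = ["T LDL", "C age", "M 1:123456", "M 1:234567", "M 2:111111", "M X:12345"]
print(check_dat_markers(example))
```

An empty list means no problem was detected under these assumptions; it does not guarantee that Rare-Metal-Worker will accept the file.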
Covariates & Traits
- Covariate and trait values are saved in PED file. Covariate and trait descriptions are saved in DAT file.
- It follows Merlin input format as in the above link.
The following options are currently available in Rare-Metal-Worker:
Options:
   Input Files     : --ped [Exomechip.pheno.ped], --dat [Exomechip.pheno.dat], --vcf
   Output Files    : --prefix, --LDwindow
   VC Options      : --vcShared, --vcX, --useCovariates [ON]
   Trait Options   : --makeResiduals, --inverseNormal [ON], --traitName
   Kinship Source  : --kinPedigree [ON], --kinGeno, --kinFile, --kinSave
   Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
   Chromosome X    : --xLabel [X], --xStart, --xEnd
The usage of these options is explained one by one with examples below:
- When genotypes are stored in a VCF file, PED and DAT files are still required to provide pedigree structure, covariate and trait information. For example:
--ped input.ped --dat input.dat --vcf input.vcf.gz
- When genotypes are stored in the PED file, a VCF file is not needed. For example:
--ped input.ped --dat input.dat
- --prefix is optional.
- If --prefix is not specified, the output file names will be:
- If --prefix prefix is specified, then the output file names are:
- --LDwindow specifies the size of the window over which the LD matrix is generated for each variant. The default is 1MB.
- When --vcShared and/or --vcX are specified, Rare-Metal-Worker fits a shared-environment and/or chromosome X variance component together with the genetic and non-shared environment components.
- When --useCovariates is specified, Rare-Metal-Worker reads covariates from the PED file and fits them as fixed effects.
- --makeResiduals is meant to be used together with --useCovariates. Residuals are first generated from a linear regression that ignores variance components, and the variance components are then estimated from those residuals. If --inverseNormal is also used, the residuals from the linear regression are inverse normalized before the variance component model is fitted.
- An example that adjusts for covariates, inverse normalizes the residuals, and then fits the variance component model:
--useCovariates --makeResiduals --inverseNormal
- An example to fit fixed effects together with variance components is in the following:
- If --inverseNormal is used WITHOUT --makeResiduals, then trait values are inverse normalized before any model fitting.
- --traitName is intended for the situation where many traits are stored in your PED and DAT files but you are interested in only one or a few of them. It accepts either a file ending in .txt that lists each trait of interest on a separate line, or trait names separated by "/". Examples for a single trait, multiple traits, and a trait list file:
--traitName LDL
--traitName LDL/HDL/TG
--traitName traitsOfInterest.txt
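The sequence described above for --useCovariates, --makeResiduals and --inverseNormal (regress the trait on covariates, then inverse normalize the residuals) can be illustrated as follows. This is only a sketch under assumptions: it uses a single covariate, a rank-based transform with the offset (rank - 0.5) / n, and average ranks for ties. The exact transform used by Rare-Metal-Worker is not stated on this page, and the function names are hypothetical. The age and trait values are those of the example PED rows on this page.

```python
from statistics import NormalDist, mean


def residuals_one_covariate(trait, covariate):
    """Residuals from least squares of trait on one covariate plus an intercept.

    Assumes the covariate is not constant (otherwise the slope is undefined).
    """
    x_bar, y_bar = mean(covariate), mean(trait)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, trait))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(covariate, trait)]


def inverse_normalize(values):
    """Rank-based inverse normal transform; tied values get their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    normal = NormalDist()
    # Assumed offset: (rank - 0.5) / n keeps every probability strictly inside (0, 1).
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]


age = [10, 56, 31, 23, 34]
trait = [-0.3, 0.0, 0.4, 0.008, 2.35]
residuals = residuals_one_covariate(trait, age)
transformed = inverse_normalize(residuals)
print(transformed)
```

The transformed values would then be the input to the variance component model; that fitting step is not sketched here.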
- --kinPedigree tells Rare-Metal-Worker to generate the kinship matrix from the pedigree, when pedigree information is available. This option is on by default.
- --kinGeno tells Rare-Metal-Worker to generate the kinship matrix from all available variants that pass the criteria specified by the --kinMaf and --kinMiss options. By default, variants with MAF > 0.05 and genotype missing rate < 0.05 are used.
- --kinFile lets Rare-Metal-Worker read a kinship matrix from a file. The kinship matrix must have been generated previously by Rare-Metal-Worker.
- --kinSave saves the kinship matrix.
- --xStart and --xEnd specify the start and end of the non-pseudo-autosomal region on chromosome X. These options should be specified when --vcX is used.
- --LDwindow specifies the size of the window used to calculate the LD matrix among variants. The default is 1MB.
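The variant filter described for --kinMaf and --kinMiss (keep variants with MAF > 0.05 and genotype missing rate < 0.05) can be sketched as below. The genotype coding (alternative-allele counts 0, 1, 2 with None for a missing call) and the function name are assumptions for illustration, and the sketch applies to autosomal variants only. It is not the program's actual implementation.

```python
def passes_kinship_filter(genotypes, min_maf=0.05, max_missing=0.05):
    """Decide whether one variant would be kept for kinship estimation.

    genotypes: alternative-allele counts (0, 1 or 2) per individual,
    with None marking a missing call (assumed coding).
    """
    total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if total == 0 or not called:
        return False  # nothing to estimate a frequency from
    missing_rate = (total - len(called)) / total
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)  # minor allele may be either allele
    # Strict inequalities, following "MAF > 0.05" and "missing rate < 0.05".
    return maf > min_maf and missing_rate < max_missing


common = [0] * 80 + [1] * 20               # MAF 0.10, no missing calls
rare = [0] * 98 + [1] * 2                  # MAF 0.01
sparse = [0] * 70 + [1] * 20 + [None] * 10  # 10% of calls missing
print(passes_kinship_filter(common), passes_kinship_filter(rare), passes_kinship_filter(sparse))
```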
- To have Rare-Metal-Worker handle unrelated individuals, code the individuals as unrelated in the PED file, that is, assign each individual to a unique family. Rare-Metal-Worker will take care of the rest.
- However, when --kinGeno is also used, Rare-Metal-Worker will treat the individuals as related and generate the kinship matrix from genotypes.
- An example is shown below (the header is for illustration only and does not appear in a real PED file):
famid pid fid mid sex age trait
1     1.1 0   0   1   10  -0.3
2     2.1 0   0   1   56  0.0
3     3.1 0   0   2   31  0.4
4     4.1 0   0   2   23  0.008
5     5.1 0   0   2   34  2.35
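Whether a PED file codes every individual as a separate family, as in the example above, can be checked with a few lines. This is a minimal sketch with a hypothetical function name. It assumes whitespace-separated PED rows with the family ID in the first column and no header line.

```python
from collections import Counter


def shared_family_ids(ped_lines):
    """Return the family IDs that appear on more than one PED row."""
    counts = Counter(line.split()[0] for line in ped_lines if line.strip())
    return sorted(fam for fam, n in counts.items() if n > 1)


ped_rows = [
    "1 1.1 0 0 1 10 -0.3",
    "2 2.1 0 0 1 56 0.0",
    "3 3.1 0 0 2 31 0.4",
    "4 4.1 0 0 2 23 0.008",
    "5 5.1 0 0 2 34 2.35",
]
print(shared_family_ids(ped_rows))
```

An empty result means every row has its own family ID, which is the coding described above for unrelated samples.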
- There are two files generated automatically by default:
- prefix.traitName.singlevar.score.txt contains the summary-level statistics needed by Rare-Metal. An example is shown below:
LDL mean= -0.00, variance= 1.00, heritability= 34.30
CHR POS REF_ALLELE ALT_ALLELE INFORMATIVE_N FOUNDER_AF ALL_AF INFORMATIVE_AC STAT ALT_ALLELE_EFFSIZE PVALUE
chr1 762320 G A 6103 0.0038225 0.0038225 45 4.73428 0.0949001 0.502675
chr1 865628 G A 6103 0.00556756 0.00556756 67 18.3894 0.249333 0.0322511
chr1 865665 G A 6103 8.30979e-05 8.30979e-05 1 -0.740215 -0.599493 0.505316
chr1 878744 C G 6103 0.00667334 0.00667334 80 -19.5138 -0.220432 0.0380796
chr1 879381 G A 6103 8.31117e-05 8.31117e-05 1 -0.831691 -0.61887 0.473108
- The P-values in the above output are from the family-based single variant score test.
- prefix.traitName.singlevar.cov.txt contains the LD matrix between a variant and the adjacent markers within a window of pre-specified size. The default window size is 1MB. It has the following format:
CHR POS VAR_POS_IN_WINDOW LD_MATRIX
chr1 762320 762320,865628,865665,878744,879381,1560000 0.0359084,-0.000242112,-0.00125797,-0.000993422,-0.000344509,-0.00017077,
chr1 865628 865628,865665,878744,879381,1560000,1867659 0.419804,-0.0103663,-0.00635265,0.0594056,0.0534505,-0.00462183,
chr1 878744 878744,879381,1560000,1867659,1877659 0.000404537,-0.000235215,-1.4455e-05,-8.69137e-06,-3.1027e-05,
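The two output formats shown above can be read with a short script. This is a sketch under assumptions, not a reference parser: it assumes whitespace-separated columns, a score file whose column header line starts with CHR and may be preceded by a trait summary line, and LD rows whose comma-separated lists may end with a trailing comma, as in the examples. The function names are hypothetical, and parse_cov_line is meant for data rows only, not for the header line.

```python
def parse_score_lines(lines):
    """Return one dict per variant row, keyed by the names on the CHR header line."""
    header = None
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CHR":
            header = fields  # column names
        elif header is not None and len(fields) == len(header):
            records.append(dict(zip(header, fields)))  # values are kept as strings
        # lines before the header (the trait summary line) are skipped
    return records


def parse_cov_line(line):
    """Split one LD matrix data row into (chrom, pos, window positions, values)."""
    chrom, pos, positions, values = line.split()
    window = [int(p) for p in positions.split(",") if p]
    ld = [float(v) for v in values.split(",") if v]  # tolerates the trailing comma
    if len(window) != len(ld):
        raise ValueError(f"{chrom}:{pos} has {len(window)} positions but {len(ld)} values")
    return chrom, int(pos), window, ld


score_example = [
    "LDL mean= -0.00, variance= 1.00, heritability= 34.30",
    "CHR POS REF_ALLELE ALT_ALLELE INFORMATIVE_N FOUNDER_AF ALL_AF "
    "INFORMATIVE_AC STAT ALT_ALLELE_EFFSIZE PVALUE",
    "chr1 762320 G A 6103 0.0038225 0.0038225 45 4.73428 0.0949001 0.502675",
    "chr1 865628 G A 6103 0.00556756 0.00556756 67 18.3894 0.249333 0.0322511",
]
cov_example = (
    "chr1 878744 878744,879381,1560000,1867659,1877659 "
    "0.000404537,-0.000235215,-1.4455e-05,-8.69137e-06,-3.1027e-05,"
)

records = parse_score_lines(score_example)
chrom, pos, window, ld = parse_cov_line(cov_example)
print(records[0]["POS"], records[0]["PVALUE"], chrom, pos, len(window), len(ld))
```

In each LD row of the examples, the first entry of VAR_POS_IN_WINDOW is the variant's own position, so the first LD value pairs the variant with itself.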
- When genotypes are stored in the PED and DAT files and you want to use pedigree kinship:
/bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --traitName LDL
- When genotypes are stored in the PED and DAT files and you want to use kinship generated from genotypes:
/bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --kinGeno --kinSave --traitName LDL
(--kinSave saves the kinship matrix for future use.)
- When genotypes are stored in a VCF file and you want to use pedigree kinship:
/bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz
- When genotypes are stored in a VCF file and you want to use kinship generated from genotypes:
/bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --kinGeno --kinSave
(--kinSave saves the kinship matrix for future use.)
- The commands are the same as in the examples above, except that each individual must have a distinct family ID in the PED file.
- When genotypes come from the PED file and marker information from the DAT file, and you assume no relatedness in the sample:
./Rare-Metal-Worker --ped yours.ped --dat yours.dat
- When genotypes come from a VCF file and covariate and trait information is stored in the PED and DAT files, and you assume no relatedness in the sample, use the following:
./Rare-Metal-Worker --ped yours.ped --dat yours.dat --vcf yours.vcf.gz