Changes

From Genome Analysis Wiki
8,116 bytes added ,  18:38, 16 November 2012
'''Rare-Metal-Worker''' is a software tool for generating summary-level statistics for meta-analyses with Rare-Metals. It handles both related and unrelated individuals.



== Change Log ==
* Version 0.1 was released on 11/13/2012.
* Modified Rare-Metal-Worker to output an LD matrix computed within a sliding window. (11/14/2012)

== Key Features ==
Rare-Metal-Worker has the following features:
* Takes genotypes from either a Merlin-format PED file or a VCF file.
* Generates summary-level statistics for both related and unrelated individuals.
* Generates an LD matrix for variants within a sliding window of user-specified size.
* Handles related individuals using a kinship matrix derived from the pedigree or from genotype data.
* Has the option of fitting a shared environment component.
* Can handle variants on Chromosome X.

== Software Specifications ==

=== Input Files ===
Rare-Metal-Worker needs the following files as input: PED and DAT files in Merlin format, '''AND/OR''' a VCF file. When genotypes are stored in the PED and DAT files, a VCF file is not needed. However, even when genotypes are stored in a VCF file, PED and DAT files are still needed to carry covariate and trait information.

==== PED and DAT Files ====
* When the PED file contains genotypes, no VCF file is needed as input.
* Rare-Metal-Worker takes PED and DAT files in Merlin format. Please refer to the following link for specifications.
http://www.sph.umich.edu/csg/abecasis/merlin/tour/input_files.html
* The DAT file must list variant names in the format "M chr:pos". Here is an example of the variant name format in a DAT file:
M 1:123456
M 1:234567
M 2:111111
M 2:222222
M X:12345
M X:111111
* '''Markers in the PED and DAT files must be sorted by chromosome and position.'''
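The two requirements above (names of the form "M chr:pos" and markers sorted by chromosome and position) can be checked before running the tool. The following is a minimal sketch and not part of Rare-Metal-Worker. The function names are hypothetical, and the chromosome ordering (numeric chromosomes first, then labels such as X) is an assumption based on the example above.

```python
import re

# Assumed form of a variant name: chromosome label, colon, integer position.
NAME_RE = re.compile(r"([^\s:]+):(\d+)")


def chrom_rank(chrom):
    """Sort key for a chromosome label: numeric labels first, then others (assumed order)."""
    return (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)


def check_dat_markers(lines):
    """Return a list of problems found in the marker ("M") lines of a DAT file."""
    problems = []
    previous = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0] != "M":
            continue  # traits, covariates, and blank lines are not checked here
        match = NAME_RE.fullmatch(fields[1]) if len(fields) == 2 else None
        if match is None:
            problems.append(f"line {number}: marker name is not in chr:pos form")
            continue
        key = (chrom_rank(match.group(1)), int(match.group(2)))
        if previous is not None and key < previous:
            problems.append(f"line {number}: marker is out of order")
        previous = key
    return problems
```

An empty list means that every marker line matched the expected form and appeared in non-decreasing order.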

==== VCF File ====
* Another option is to use a VCF file as input. Please refer to the following link for the VCF file specification:
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
* The VCF file should be compressed with bgzip and indexed with tabix, using the following commands:
bgzip input.vcf ## this command will produce input.vcf.gz
tabix -pvcf -f input.vcf.gz ## this command will produce input.vcf.gz.tbi

==== Covariates & Traits ====
* Covariate and trait values are stored in the PED file. Covariate and trait descriptions are stored in the DAT file.
* Both files follow the Merlin input format described at the link above.

=== Software Options ===
The following options are currently available in Rare-Metal-Worker:

Options:
Input Files : --ped [Exomechip.pheno.ped], --dat [Exomechip.pheno.dat],
--vcf []
Output Files : --prefix [], --LDwindow [1000000]
VC Options : --vcShared, --vcX, --useCovariates [ON]
Trait Options : --makeResiduals, --inverseNormal [ON], --traitName []
Kinship Source : --kinPedigree [ON], --kinGeno, --kinFile [], --kinSave
Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044]

The usage of these options is explained one by one, with examples, in the following sections:
==== Input Files ====
* When genotypes are stored in a VCF file, PED and DAT files are still required to provide pedigree structure, covariate, and trait information. For example:
--ped input.ped --dat input.dat --vcf input.vcf.gz
* When genotypes are stored in the PED file, a VCF file is not needed. For example:
--ped input.ped --dat input.dat

==== Output Files ====
--prefix is optional.
* If --prefix is not specified, the output file names will be:
traitname.singlevar.score.txt
traitname.singlevar.cov.txt
* If --prefix prefix is specified, then the output file names are:
prefix.traitname.singlevar.score.txt
prefix.traitname.singlevar.cov.txt
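The naming rule above can be written as a small helper, which is useful when a downstream script needs to locate the output files. This is a sketch of the rule as described on this page, not code from Rare-Metal-Worker. The function name is hypothetical, and it assumes that "traitname" stands for the name of the analyzed trait.

```python
def output_file_names(trait_name, prefix=None):
    """Return the expected score and covariance file names for one trait."""
    # Without --prefix the names start with the trait name; with it, the prefix comes first.
    stem = f"{prefix}.{trait_name}" if prefix else trait_name
    return [f"{stem}.singlevar.score.txt", f"{stem}.singlevar.cov.txt"]
```

For example, `output_file_names("LDL", prefix="study1")` gives `study1.LDL.singlevar.score.txt` and `study1.LDL.singlevar.cov.txt`.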
==== VC Options ====
* When --vcShared and/or --vcX are specified, Rare-Metal-Worker fits a shared environment and/or a chromosome X variance component together with the genetic component and the non-shared environment.
* When --useCovariates is specified, Rare-Metal-Worker reads covariates from the PED file. Covariates are fitted as fixed effects.
==== Trait Options ====
* --makeResiduals is meant to be used together with --useCovariates. Residuals are first generated from a linear regression on the covariates, ignoring variance components, and the variance components are then estimated from those residuals. If --inverseNormal is also used, the residuals from the linear regression are inverse normalized before the variance component model is fitted.
** An example that adjusts for covariates, inverse normalizes the residuals, and then fits the variance component model:
--useCovariates --makeResiduals --inverseNormal
** An example that fits fixed effects together with variance components:
--useCovariates --inverseNormal
* If --inverseNormal is used WITHOUT --makeResiduals, trait values are inverse normalized before any model fitting.
* --traitName is for situations where the PED and DAT files contain many traits but only one or a few are of interest. It accepts either a file whose name ends in .txt, listing one trait of interest per line, or trait names separated by "/". Examples for one trait, multiple traits, and a trait list file:
--traitName LDL
--traitName LDL/HDL/TG
--traitName traitsOfInterest.txt
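The three forms of the --traitName value shown above can be illustrated with a short parser. This is a sketch of the behavior as described on this page, not the parsing code of Rare-Metal-Worker. The function name is hypothetical, and skipping blank lines in the trait list file is an assumption.

```python
from pathlib import Path


def parse_trait_names(value):
    """Expand a --traitName value into a list of trait names."""
    if value.endswith(".txt"):
        # A file name ending in .txt: one trait of interest per line.
        text = Path(value).read_text()
        return [line.strip() for line in text.splitlines() if line.strip()]
    # Otherwise: a single trait, or several traits separated by "/".
    return [name for name in value.split("/") if name]
```

With this sketch, `LDL` yields one trait and `LDL/HDL/TG` yields three traits in the order given.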
==== Kinship Source ====
* --kinPedigree tells Rare-Metal-Worker to generate the kinship matrix from the pedigree, when pedigree information is available. This option is on by default.
* --kinGeno tells Rare-Metal-Worker to generate the kinship matrix from all available variants that pass the criteria specified with the --kinMaf and --kinMiss options. By default, variants with MAF > 0.05 and genotype missing rate < 0.05 are used.
* --kinFile lets Rare-Metal-Worker read a kinship matrix from a file. The kinship matrix must have been generated previously by Rare-Metal-Worker.
* --kinSave saves the kinship matrix for future use.
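The default variant filter used with --kinGeno (MAF > 0.05 and genotype missing rate < 0.05) can be illustrated as follows. This is a sketch of the criteria only, not the kinship code of Rare-Metal-Worker. The function name and the genotype coding (0, 1, or 2 copies of the alternate allele, with None for a missing genotype) are assumptions made for the example.

```python
def passes_kinship_filter(genotypes, min_maf=0.05, max_missing=0.05):
    """Return True if a variant meets the MAF and missing-rate thresholds."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False  # no usable genotypes at this variant
    missing_rate = 1 - len(called) / len(genotypes)
    # Each called genotype contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf > min_maf and missing_rate < max_missing
```

The thresholds correspond to the defaults of --kinMaf and --kinMiss and can be changed through the keyword arguments.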

==== Chromosome X ====
* --xStart and --xEnd specify the start and end of the non-pseudo-autosomal region on chromosome X. These options should be specified when --vcX is used.
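To make the role of these boundaries concrete, the sketch below tests whether a variant falls in the non-pseudo-autosomal region, using the defaults of --xLabel, --xStart, and --xEnd from the option list. The function name is hypothetical, and treating both boundaries as inclusive is an assumption; this page does not state how Rare-Metal-Worker handles the endpoints.

```python
def in_non_par_region(chrom, pos, x_label="X", x_start=2699520, x_end=154931044):
    """Return True if (chrom, pos) lies in the non-pseudo-autosomal part of chromosome X."""
    # Boundaries are treated as inclusive here (an assumption).
    return chrom == x_label and x_start <= pos <= x_end
```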
==== Other Options ====
* --LDwindow specifies the size of the window used to calculate the LD matrix between variants. The default is 1,000,000 bp (1 Mb).
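One way to picture a sliding window is as the set of variant pairs whose positions are no more than the window size apart. The sketch below lists those pairs for variants on one chromosome. It illustrates the general idea only. The function name is hypothetical, and the exact window definition used by Rare-Metal-Worker (for example, whether the boundary is inclusive) is an assumption.

```python
def pairs_within_window(positions, window=1_000_000):
    """Return index pairs (i, j), i < j, whose positions differ by at most `window`.

    `positions` must be sorted in ascending order and come from one chromosome.
    """
    pairs = []
    for i, start in enumerate(positions):
        for j in range(i + 1, len(positions)):
            if positions[j] - start > window:
                break  # positions are sorted, so later variants are farther away
            pairs.append((i, j))
    return pairs
```

Because markers are required to be sorted by position, the inner loop can stop at the first variant outside the window.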

=== Handling Unrelated Individuals ===
* To have Rare-Metal-Worker handle unrelated individuals, either code the individuals as unrelated in the PED file or assign each individual to a unique family. Rare-Metal-Worker will take care of the rest.
* However, when --kinGeno is also used, Rare-Metal-Worker treats the individuals as related and generates the kinship matrix from genotypes.
* An example is shown below (the header is for illustration only and does not appear in a real PED file):

famid pid fid mid sex trait
1 1.1 0 0 1 -0.3
2 2.1 0 0 1 0.0
3 3.1 0 0 2 0.4
4 4.1 0 0 2 0.008
5 5.1 0 0 2 2.35
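Before an unrelated-sample analysis, it can help to confirm that every individual really has a distinct family ID. The following is a minimal sketch for that check, not part of Rare-Metal-Worker. The function name is hypothetical, and it assumes whitespace-separated PED lines with the family ID in the first column and no header line.

```python
from collections import Counter


def repeated_family_ids(ped_lines):
    """Return the family IDs that appear on more than one PED line."""
    counts = Counter(line.split()[0] for line in ped_lines if line.strip())
    return sorted(fam for fam, n in counts.items() if n > 1)
```

An empty result means each individual belongs to a unique family, as in the example above.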

== Examples ==

=== Related individuals ===
* When you have genotypes stored in the PED and DAT files and want to use pedigree kinship:

/bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --traitName LDL

* When you have genotypes stored in the PED and DAT files and want to use kinship generated from genotypes (--kinSave saves the kinship matrix for future use):

/bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --kinGeno --kinSave --traitName LDL

* When you have genotypes stored in a VCF file and want to use pedigree kinship:

/bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz

* When you have genotypes stored in a VCF file and want to use kinship generated from genotypes (--kinSave saves the kinship matrix for future use):

/bin/Rare-Metal-Worker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --kinGeno --kinSave

=== Unrelated individuals ===

* The commands are the same as in the examples above, except that each individual must have a distinct family ID in the PED file.

== Q & A ==