'''Rare-Metal-Worker''' is a tool for generating summary statistics for single-variant and gene-level rare-variant meta-analyses with Rare-Metal. It handles both related and unrelated individuals.

If you have any questions, please contact: sfengsph at umich dot edu

The '''Rare Metal''' wiki page will be up by 2/14/2013. Thanks for your patience!

== Change Log ==

* Version 0.0.1 was released on 11/13/2012.
* Modified Rare-Metal-Worker to output the LD matrix using a sliding window. (11/14/2012)
* Uploaded to the public wiki. (11/16/2012)
* Enabled writing a log file by default. (11/18/2012)
* Sample IDs must now match when kinship is read from a file. A sanity check is performed before the kinship file is read in: if a sample of interest is not included in the kinship file, a fatal error occurs. (11/19/2012)
* Added HWE p-value and call rate to the summary statistics output. (11/27/2012)
* Fixed bugs that caused compilation errors on some machines (Thank you Mary Kate!). Version 0.0.2 released. (11/30/2012)
* Updated output format. Version 0.0.3 released. (12/3/2012)
* More messages written to the log file. (12/4/2012)
* Version 0.0.4 released. (12/5/2012)
* Fixed a bug in the empirical kinship calculation when genotypes are read from a VCF file. Version 0.0.5 released. (12/6/2012)
* Version 0.0.6 released. (12/6/2012)
* Updated output format for monomorphic sites. (12/7/2012)
* Changed the executable name to bin/raremetalworker. Version 0.0.7 released. (12/10/2012)
* Fixed a bug when reading genotypes from a VCF file. (2/5/2013)

== Key Features ==

Rare-Metal-Worker has the following features:
* Takes genotypes from either a PED file or a VCF file.
* Generates summary statistics for both related and unrelated individuals.
* Generates linkage disequilibrium (LD) matrices summarizing the covariance between single-marker statistics within an adjustable sliding window.
* Optionally handles related individuals using a kinship matrix derived from either pedigree or genotype data.
* Optionally fits a shared-environment variance component.
* Handles variants on chromosome X.

== Software Download and Installation ==

=== Where to Download ===

* The source package for Linux and Mac can be downloaded here: [[Media:RareMetalWorker.0.0.8.tgz|software package download]]
* Save it to a local directory and decompress it with the following command:
<pre>
tar xvzf RareMetalWorker.0.0.8.tgz
</pre>
* For UM CSG cluster users, no installation is needed. The executable is available at /net/fantasia/home/sfengsph/code/Rare-Metal/RareMetalWorker/bin/raremetalworker

=== How to Compile ===

* Go to RareMetalWorker_0.0.8/RareMetalWorker/src and run the following command:
<pre>
make all
</pre>

=== How to Execute ===

* To execute the program, go to RareMetalWorker_0.0.8/RareMetalWorker/bin; the program can then be run as ./raremetalworker.
* Relatedness is taken from the pedigree structure coded in the PED file (unless --kinGeno is used), so the command lines for related and unrelated samples are identical.
* An example command line for a related sample with genotypes saved in a VCF file:
<pre>
./raremetalworker --ped your.pheno.ped --dat your.pheno.dat --vcf your.geno.vcf.gz --useCovariates --inverseNormal --prefix your.study
</pre>
* An example command line for a related sample with genotypes saved in PED/DAT files:
<pre>
./raremetalworker --ped your.ped --dat your.dat --useCovariates --inverseNormal --prefix your.study
</pre>
* An example command line for an unrelated sample with genotypes saved in PED/DAT files:
<pre>
./raremetalworker --ped your.ped --dat your.dat --useCovariates --inverseNormal --prefix your.study
</pre>
* An example command line for an unrelated sample with genotypes saved in a VCF file:
<pre>
./raremetalworker --ped your.pheno.ped --dat your.pheno.dat --vcf your.geno.vcf.gz --useCovariates --inverseNormal --prefix your.study
</pre>
* An example command line for genotypes saved in a VCF file, when you want to adjust for covariates first and then inverse normalize the residuals:
<pre>
./raremetalworker --ped your.pheno.ped --dat your.pheno.dat --vcf your.geno.vcf.gz --makeResiduals --useCovariates --inverseNormal --prefix your.study
</pre>
* For more examples, please go to [[Examples]].

== Software Specifications ==

=== Input Files ===

Rare-Metal-Worker takes the following input files: PED and DAT files in Merlin format, '''AND/OR''' a VCF file. When genotypes are stored in the PED and DAT files, no VCF file is needed. However, even when genotypes are stored in a VCF file, PED and DAT files are still required to supply covariate and trait information.

==== PED and DAT Files ====

* When the PED file contains genotypes, no VCF file is needed as input.
* Rare-Metal-Worker takes PED/DAT files in Merlin format. Please refer to the [http://www.sph.umich.edu/csg/abecasis/merlin/tour/input_files.html PED/DAT format description] for details.
* An example PED file:
<pre>
1 1 0 0 1 1.5 1 23 A A A A A A A A A A
2 1 0 0 1 1.0 1 34 A C A C A C A C A C
3 1 0 0 2 0.4 1 43 A A A A A A A A A A
4 1 0 0 2 0.9 1 13 A C A C A C A C A C
</pre>
* The matching DAT file:
<pre>
T YourTraitName
C SEX
C AGE
M 1:123456
M 1:234567
M 2:111111
M 2:222222
M X:12345
</pre>
* Variant names in the DAT file must be in the format "M chr:pos".
* The order of labels in the DAT file must match the order of the fields that follow the five pedigree columns (family ID, individual ID, father ID, mother ID, sex) in the PED file.
* '''Markers in PED and DAT files must be sorted by chromosome and position.'''
* Covariate and trait values are saved in the PED file; covariate and trait descriptions are saved in the DAT file.

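The DAT requirements above (every M line named chr:pos, markers sorted by chromosome and position) can be checked before running Rare-Metal-Worker. Below is a minimal Python sketch that is not part of the Rare-Metal-Worker package. `check_dat_markers` and `chrom_rank` are hypothetical helpers, and the chromosome order they assume (1 to 22, then X, Y, MT) is an assumption rather than a documented rule.

```python
import re

# Assumed chromosome order (1-22, then X, Y, MT); not taken from Rare-Metal-Worker.
_EXTRA_CHROMOSOMES = {"X": 23, "Y": 24, "MT": 25}
_MARKER_NAME = re.compile(r"^([0-9A-Za-z]+):([0-9]+)$")


def chrom_rank(chrom):
    """Map a chromosome label to an integer used for sorting."""
    if chrom.isdigit():
        return int(chrom)
    if chrom in _EXTRA_CHROMOSOMES:
        return _EXTRA_CHROMOSOMES[chrom]
    raise ValueError(f"unrecognized chromosome label {chrom!r}")


def check_dat_markers(dat_lines):
    """Return a list of problems found in the marker (M) lines of a DAT file."""
    problems = []
    previous = None
    for lineno, line in enumerate(dat_lines, start=1):
        fields = line.split()
        if len(fields) < 2 or fields[0] != "M":
            continue  # trait (T) and covariate (C) lines are not checked here
        match = _MARKER_NAME.match(fields[1])
        if match is None:
            problems.append(f"line {lineno}: {fields[1]!r} is not in chr:pos format")
            continue
        try:
            key = (chrom_rank(match.group(1)), int(match.group(2)))
        except ValueError as err:
            problems.append(f"line {lineno}: {err}")
            continue
        if previous is not None and key < previous:
            problems.append(f"line {lineno}: {fields[1]} is out of order")
        previous = key
    return problems


example_dat = ["T YourTraitName", "C SEX", "C AGE", "M 1:123456", "M 1:234567",
               "M 2:111111", "M 2:222222", "M X:12345"]
print(check_dat_markers(example_dat))  # []
```

For a real file, pass `open("your.dat").read().splitlines()`. An empty list means no naming or ordering problem was found under the assumed chromosome order.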
==== VCF File ====

* Alternatively, genotypes can be read from a VCF file. Please refer to the [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 1000 Genomes wiki VCF specification] for the file format.
* The VCF file should be compressed with bgzip and indexed with tabix, using the following commands:
<pre>
bgzip input.vcf                  # produces input.vcf.gz
tabix -p vcf -f input.vcf.gz     # produces input.vcf.gz.tbi
</pre>
* Even when a VCF file is provided, PED/DAT files are still needed for covariates and phenotypes.

=== Software Options ===

The following options are currently available in Rare-Metal-Worker:
<pre>
Options:
  Input Files     : --ped [Exomechip.pheno.ped], --dat [Exomechip.pheno.dat], --vcf []
  Output Files    : --prefix [], --LDwindow [1000000]
  VC Options      : --vcShared, --vcX, --useCovariates [ON]
  Trait Options   : --makeResiduals, --inverseNormal [ON], --traitName []
  Kinship Source  : --kinPedigree [ON], --kinGeno, --kinFile [], --kinSave
  Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
  Chromosome X    : --xLabel [X], --xStart [2699520], --xEnd [154931044]
</pre>

==== Input Files ====

* When genotypes are saved in a VCF file, the PED and DAT files specify the pedigree structure, covariates, and traits. An example command line might look like this:
<pre>
--ped input.ped --dat input.dat --vcf input.vcf.gz
</pre>
* When genotypes are saved in the PED file, no VCF file is needed. An example command line might look like this:
<pre>
--ped input.ped --dat input.dat
</pre>

==== Output Files ====

* --prefix is optional.
* If --prefix is not specified, the output files are named:
<pre>
traitname.singlevar.score.txt
traitname.singlevar.cov.txt
</pre>
* If --prefix is specified (for example, --prefix prefix), the output files are named:
<pre>
prefix.traitname.singlevar.score.txt
prefix.traitname.singlevar.cov.txt
</pre>
* --LDwindow specifies the size of the window, in base pairs, within which the LD matrix is generated for each variant. The default is 1 Mb (1000000).

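The example .cov output under Output Formats is consistent with a forward-looking window. Each variant is listed together with the later variants whose positions are at most --LDwindow base pairs away. The Python sketch below reproduces that selection. The forward-only direction and the inclusive boundary are inferred from the example rows, not documented, and `ld_windows` is a hypothetical helper.

```python
from bisect import bisect_right


def ld_windows(positions, window=1_000_000):
    """For each variant, return it plus the later variants within `window` bp.

    Assumes `positions` are sorted and lie on a single chromosome. Treating
    the window as forward-only with an inclusive end is an assumption inferred
    from the example output, not from Rare-Metal-Worker's source.
    """
    windows = []
    for i, start in enumerate(positions):
        end = bisect_right(positions, start + window)  # index just past the window
        windows.append(positions[i:end])
    return windows


positions = [762320, 865628, 865665, 878744, 879381, 1560000, 1864659, 1877659]
for pos, members in zip(positions, ld_windows(positions)):
    print(pos, ",".join(map(str, members)))
```

For 762320, 865628, and 878744, this prints the same VAR_POS_IN_WINDOW lists as the example .cov rows.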
==== VC Options ====

* --vcShared fits a shared-environment variance component, and --vcX fits a chromosome X variance component. Either or both can be fitted together with the genetic component and the non-shared environment component.
* When --useCovariates is specified, covariates are read from the PED file and modeled as fixed effects.
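For orientation only, the components named above can be written as a conventional variance-component model. This is a sketch under assumptions, not a formula taken from Rare-Metal-Worker. The factor 2 on the kinship matrix and the definitions of the shared-environment matrix S and the X-chromosome kinship matrix Φ_X are our choices.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Var}(\boldsymbol{\varepsilon}) = \boldsymbol{\Omega}
  = 2\sigma_g^2\,\boldsymbol{\Phi}
  + \sigma_s^2\,\mathbf{S}
  + \sigma_x^2\,\boldsymbol{\Phi}_X
  + \sigma_e^2\,\mathbf{I}
```

In this sketch:
* X holds the covariates used with --useCovariates (fixed effects β).
* Φ is the pedigree or empirical kinship matrix.
* The σ_s² term is included only with --vcShared, and the σ_x² term only with --vcX.
* σ_e² I is the non-shared environment.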
==== Trait Options ====

* --makeResiduals can be combined with --useCovariates to generate residuals from a simple linear regression on the covariates before analysis. If --inverseNormal is also used, the residuals are quantile normalized before the variance component model is fitted.
** An example command line requesting pre-adjustment for covariates before fitting a variance component model:
<pre>
--useCovariates --makeResiduals --inverseNormal
</pre>
** An example command line requesting joint modeling of fixed effects and variance components:
<pre>
--useCovariates --inverseNormal
</pre>
* If --inverseNormal is used WITHOUT --makeResiduals, trait values are inverse normalized before any model fitting.
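The two orders of operations described above can be sketched in Python. This is an illustration under stated assumptions, not Rare-Metal-Worker's code:
* `inverse_normal` maps ranks with the offset (rank - 0.5) / n and gives tied values their average rank.
* `residuals` regresses on a single covariate plus an intercept, whereas the program accepts any number of covariates.

Both function names are hypothetical.

```python
from statistics import NormalDist


def inverse_normal(values):
    """Rank-based inverse normal transform (assumed offset: (rank - 0.5) / n)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def residuals(y, x):
    """Residuals of y after ordinary least squares on one covariate x plus an intercept."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Trait and age columns from the unrelated-sample PED example on this page.
trait = [-0.3, 0.0, 0.4, 0.008, 2.35]
age = [10, 56, 31, 23, 34]
pre_adjusted = inverse_normal(residuals(trait, age))  # like --makeResiduals --inverseNormal
raw_normalized = inverse_normal(trait)  # like --inverseNormal without --makeResiduals
```

With --makeResiduals, the covariates are removed before normalizing. Without it, the normalized trait is passed on, and any covariates are fitted jointly with the variance components.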
* --traitName is intended for cases where the PED and DAT files contain many traits but only one or a few are of interest. It accepts either trait names separated by "/" or the name of a file ending in .txt that lists one trait of interest per line. Examples for one trait, several traits, and a file:
<pre>
--traitName LDL
--traitName LDL/HDL/TG
--traitName traitsOfInterest.txt
</pre>
* If --traitName is not used, all traits in the PED/DAT files are analyzed.

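One way to interpret a --traitName value as described above is sketched below in Python. The function names are hypothetical. The sketch mirrors only the documented behavior: a .txt file with one trait per line, names separated by "/", or all traits when the option is absent. How the program treats blank lines or unknown trait names is not documented here.

```python
from pathlib import Path


def parse_trait_name(arg):
    """Turn a --traitName value into a list of trait names."""
    if arg.endswith(".txt"):
        # File form: one trait of interest per line; blank lines are skipped.
        lines = Path(arg).read_text().splitlines()
        return [line.strip() for line in lines if line.strip()]
    # Inline form: one name, or several names separated by "/".
    return [name for name in arg.split("/") if name]


def traits_to_analyze(dat_traits, trait_name_arg=None):
    """Without --traitName, every trait (T line) in the DAT file is analyzed."""
    if not trait_name_arg:
        return list(dat_traits)
    return parse_trait_name(trait_name_arg)


print(traits_to_analyze(["LDL", "HDL", "TG", "BMI"], "LDL/HDL/TG"))  # ['LDL', 'HDL', 'TG']
```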
==== Kinship Source ====

* --kinPedigree tells Rare-Metal-Worker to generate the kinship matrix from the pedigree, when pedigree information is available. This option is on by default.
* --kinGeno tells Rare-Metal-Worker to generate the kinship matrix from all available variants that pass the criteria set by the --kinMaf and --kinMiss options. By default, variants with MAF > 0.05 and genotype missing rate < 0.05 are used.
* --kinFile lets Rare-Metal-Worker read a kinship matrix from a file. The first row of the kinship file must contain the sample IDs included in the file. If a sample of interest is not included in the kinship file, a fatal error occurs and the program terminates. A sample of interest is a sample that is phenotyped and, when --useCovariates is specified, has all covariates measured.
* --kinSave saves the kinship matrix so it can be reused later.

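The --kinFile sanity check described above can be illustrated with a short Python sketch. It makes two assumptions: the sample IDs in the first row of the kinship file are separated by whitespace, and a missing trait or covariate value is represented as None. The helper names are hypothetical. Per the description above, Rare-Metal-Worker itself stops with a fatal error when the returned list is non-empty.

```python
def samples_of_interest(samples, use_covariates):
    """Select samples that are phenotyped and, with --useCovariates, fully covariate-measured.

    `samples` is an iterable of (sample_id, trait_value, covariate_values),
    where None marks a missing value (an assumption of this sketch).
    """
    selected = []
    for sample_id, trait, covariates in samples:
        if trait is None:
            continue  # not phenotyped
        if use_covariates and any(value is None for value in covariates):
            continue  # at least one covariate is missing
        selected.append(sample_id)
    return selected


def missing_from_kinship(kinship_first_row, sample_ids):
    """Return the samples of interest that do not appear in the kinship file's first row."""
    available = set(kinship_first_row.split())
    return [sample_id for sample_id in sample_ids if sample_id not in available]


samples = [("1.1", -0.3, [1, 10]), ("2.1", 0.0, [1, None]), ("3.1", 0.4, [2, 31])]
wanted = samples_of_interest(samples, use_covariates=True)  # ['1.1', '3.1']
print(missing_from_kinship("1.1 2.1", wanted))  # ['3.1']
```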
==== Kinship Options ====

* --kinMiss and --kinMaf should be used together with --kinGeno.
* --kinMiss specifies the maximum genotype missing rate of variants used to calculate kinship from genotypes. The default is 0.05.
* --kinMaf specifies the minimum minor allele frequency of variants used to calculate kinship from genotypes. The default is 0.05.

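To make the --kinMaf and --kinMiss filters concrete, here is a Python sketch that builds a kinship matrix from filtered variants. It keeps only variants with MAF > kin_maf and missing rate < kin_miss, matching the default rule described above. The estimator is a common choice substituted for illustration: a standardized genotype covariance (genetic relationship matrix) divided by 2, with missing genotypes mean-imputed. This page does not say which estimator Rare-Metal-Worker uses, and all names are hypothetical.

```python
from math import sqrt


def allele_frequency(genotypes):
    """Alternate-allele frequency from 0/1/2 genotype counts; None marks missing."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called)) if called else None


def passes_kinship_filters(genotypes, kin_maf=0.05, kin_miss=0.05):
    """Keep a variant only if MAF > kin_maf and missing rate < kin_miss."""
    missing_rate = sum(g is None for g in genotypes) / len(genotypes)
    p = allele_frequency(genotypes)
    if p is None or missing_rate >= kin_miss:
        return False
    return min(p, 1 - p) > kin_maf


def empirical_kinship(variants, kin_maf=0.05, kin_miss=0.05):
    """Kinship matrix (GRM / 2) from the variants that pass the filters.

    `variants` is a list of genotype lists, one per variant, with samples in
    the same order. Missing genotypes are mean-imputed (standardized value 0).
    """
    kept = [v for v in variants if passes_kinship_filters(v, kin_maf, kin_miss)]
    if not kept:
        raise ValueError("no variants pass the kinship filters")
    n = len(kept[0])
    grm = [[0.0] * n for _ in range(n)]
    for genotypes in kept:
        p = allele_frequency(genotypes)
        scale = sqrt(2 * p * (1 - p))  # nonzero because MAF > kin_maf >= 0
        z = [0.0 if g is None else (g - 2 * p) / scale for g in genotypes]
        for i in range(n):
            for j in range(n):
                grm[i][j] += z[i] * z[j]
    return [[value / (2 * len(kept)) for value in row] for row in grm]
```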
==== Chromosome X ====

* --xLabel takes a string that specifies how chromosome X is labeled for variants in the input files. The default is "X".
* --xStart and --xEnd specify the start and end of the non-pseudo-autosomal region of chromosome X. These options should be specified when --vcX is used.
* The default for --xStart is 2699520 and the default for --xEnd is 154931044, according to NCBI genome build 37.

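The region test implied by these options can be written as a one-line Python check. It is a hypothetical helper, and treating --xStart and --xEnd as inclusive bounds is an assumption. With the default bounds, the example DAT marker X:12345 falls before --xStart and therefore outside the non-pseudo-autosomal region.

```python
def in_non_par_x(chrom, pos, x_label="X", x_start=2699520, x_end=154931044):
    """True if (chrom, pos) is on chromosome X inside [x_start, x_end].

    Defaults mirror --xLabel, --xStart and --xEnd; inclusive bounds are assumed.
    """
    return chrom == x_label and x_start <= pos <= x_end


print(in_non_par_x("X", 12345))     # False: before --xStart
print(in_non_par_x("X", 50000000))  # True
```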
=== Handling Unrelated Individuals ===

* To have Rare-Metal-Worker handle unrelated individuals, code them as unrelated in the PED file: each individual belongs to a unique family, with father and mother IDs set to 0. Rare-Metal-Worker takes care of the rest.
* However, when --kinGeno is also used, Rare-Metal-Worker treats the individuals as related and generates the kinship matrix from genotypes.
* An example is shown below (the header is included for illustration only; it does not appear in a real PED file):
<pre>
famid pid fid mid sex age trait
1 1.1 0 0 1 10 -0.3
2 2.1 0 0 1 56 0.0
3 3.1 0 0 2 31 0.4
4 4.1 0 0 2 23 0.008
5 5.1 0 0 2 34 2.35
</pre>

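A quick Python check that a PED file follows the unrelated coding described here might look like the sketch below. It tests for a distinct family ID per individual and father and mother IDs of 0. It is a hypothetical helper and reads only the first four columns.

```python
def coded_as_unrelated(ped_lines):
    """True if every individual has its own family ID and both parent IDs are "0"."""
    family_ids = []
    for line in ped_lines:
        if not line.strip():
            continue  # ignore blank lines
        famid, _pid, father, mother = line.split()[:4]
        if father != "0" or mother != "0":
            return False
        family_ids.append(famid)
    return len(family_ids) == len(set(family_ids))


ped = ["1 1.1 0 0 1 10 -0.3", "2 2.1 0 0 1 56 0.0", "3 3.1 0 0 2 31 0.4",
       "4 4.1 0 0 2 23 0.008", "5 5.1 0 0 2 34 2.35"]
print(coded_as_unrelated(ped))  # True
```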
== Output Formats ==

* Three files are generated by default:
<pre>
prefix.traitName.singlevar.score.txt
prefix.traitName.singlevar.cov.txt
prefix.singlevar.log
</pre>
* prefix.traitName.singlevar.score.txt contains the summary statistics needed by Rare-Metal. An example is shown below:
<pre>
LDL mean= -0.00, variance= 1.00, heritability= 34.30
CHR POS REF_ALLELE ALT_ALLELE INFORMATIVE_N FOUNDER_AF ALL_AF INFORMATIVE_AC HWE_PVALUE STAT ALT_ALLELE_EFFSIZE PVALUE
10 45410002 G A 6103 0.0341589 0.0341589 410 0.165893 126.205 0.309798 4.03074e-10
19 45412079 G A 6103 0.0368124 0.0368124 434 0.714645 -265.84 -0.587356 7.87851e-36
19 45414451 G A 6103 0.444989 0.444989 5312 0.0759271 -26.1212 -0.00837122 0.640058
</pre>
* The p-values in the above output are from the family-based single-variant score test.

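The score file can be read in a few lines of Python. The sketch below also recomputes p-values under an assumption about the columns: STAT is the score statistic U and ALT_ALLELE_EFFSIZE is U/V, where V is its variance. Then STAT × ALT_ALLELE_EFFSIZE equals U²/V, a 1-df chi-square statistic. Recomputing the p-values this way agrees closely with the three example rows, but this reading of the columns is inferred from the example, not documented. The helper names are hypothetical, and rows with non-numeric values (for example, monomorphic sites) are not handled.

```python
from math import erfc, sqrt


def parse_score_file(lines):
    """Split a *.singlevar.score.txt file into its trait summary line and row dicts."""
    lines = [line for line in lines if line.strip()]
    summary = lines[0].strip()  # e.g. "LDL mean= -0.00, variance= 1.00, heritability= 34.30"
    header = lines[1].split()
    rows = [dict(zip(header, line.split())) for line in lines[2:]]
    return summary, rows


def score_test_pvalue(stat, effect_size):
    """1-df chi-square p-value, assuming STAT = U and ALT_ALLELE_EFFSIZE = U / V."""
    chi_square = stat * effect_size  # U * (U / V) = U^2 / V
    return erfc(sqrt(chi_square / 2))  # upper tail of a chi-square with 1 df


example = [
    "LDL mean= -0.00, variance= 1.00, heritability= 34.30",
    "CHR POS REF_ALLELE ALT_ALLELE INFORMATIVE_N FOUNDER_AF ALL_AF INFORMATIVE_AC "
    "HWE_PVALUE STAT ALT_ALLELE_EFFSIZE PVALUE",
    "10 45410002 G A 6103 0.0341589 0.0341589 410 0.165893 126.205 0.309798 4.03074e-10",
    "19 45412079 G A 6103 0.0368124 0.0368124 434 0.714645 -265.84 -0.587356 7.87851e-36",
    "19 45414451 G A 6103 0.444989 0.444989 5312 0.0759271 -26.1212 -0.00837122 0.640058",
]
summary, rows = parse_score_file(example)
for row in rows:
    recomputed = score_test_pvalue(float(row["STAT"]), float(row["ALT_ALLELE_EFFSIZE"]))
    print(row["POS"], row["PVALUE"], f"{recomputed:.5g}")
```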
* prefix.traitName.singlevar.cov.txt contains the LD matrix between each variant and the adjacent markers within a fixed-size window (1 Mb by default). It has the following format:
<pre>
CHR POS VAR_POS_IN_WINDOW LD_MATRIX
1 762320 762320,865628,865665,878744,879381,1560000 0.0359084,-0.000242112,-0.00125797,-0.000993422,-0.000344509,-0.00017077,
1 865628 865628,865665,878744,879381,1560000,1864659 0.419804,-0.0103663,-0.00635265,0.0594056,0.0534505,-0.00462183,
1 878744 878744,879381,1560000,1864659,1877659 0.000404537,-0.000235215,-1.4455e-05,-8.69137e-06,-3.1027e-05,
</pre>

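Each line pairs the comma-separated positions in VAR_POS_IN_WINDOW with the same number of comma-separated values in LD_MATRIX, which end with a trailing comma. A minimal Python sketch for reading one line follows. `parse_cov_line` is a hypothetical helper, and the dictionary keys are our own choice.

```python
def parse_cov_line(line):
    """Map (CHR, POS, other_pos) -> LD value for one line of a *.singlevar.cov.txt file."""
    chrom, pos, positions_field, values_field = line.split()
    # Both comma-separated lists may end with a trailing comma, so drop empty items.
    positions = [int(p) for p in positions_field.split(",") if p]
    values = [float(v) for v in values_field.split(",") if v]
    if len(positions) != len(values):
        raise ValueError(f"{len(positions)} positions but {len(values)} values at {chrom}:{pos}")
    return {(chrom, int(pos), other): value for other, value in zip(positions, values)}


row = ("1 762320 762320,865628,865665,878744,879381,1560000 "
       "0.0359084,-0.000242112,-0.00125797,-0.000993422,-0.000344509,-0.00017077,")
entries = parse_cov_line(row)
print(entries[("1", 762320, 1560000)])  # -0.00017077
```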
* An example log file:
<pre>
Summary statistics for trait LDL have been saved in LDL.singlevar.score.txt.
LD matrices for trait LDL have been saved in LDL.singlevar.cov.txt.

Rare-Metal-Worker handled all individuals as related.

The following parameters are in effect:

Input Files:
============================
--ped [APOE.ped]
--dat [APOE.dat]
--vcf []

Output Files:
============================
--prefix []
--LDwindow [1000000]

VC Options:
============================
--vcShared [false]
--vcX [false]
--useCovariates [false]

Trait Options:
============================
--makeResiduals [true]
--inverseNormal [true]
--traitName [LDL]

Kinship Source:
============================
--kinPedigree [true]
--kinGeno [false]
--kinFile []
--kinSave [false]

Kinship Options:
============================
--kinMaf [0.05]
--kinMiss [0.05]

Chromosome X:
============================
xLabel [X]
xStart [2699520]
xEnd [154931044]
</pre>

== Examples ==

=== Related individuals ===

* When genotypes are stored in the PED and DAT files, and you want to use pedigree kinship and inverse normalize trait values before adjusting for covariates and running the analysis:
<pre>
bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --traitName LDL --inverseNormal --useCovariates
</pre>
* When genotypes are stored in the PED and DAT files, and you want to use pedigree kinship and adjust for covariates before inverse normalizing the residuals and running the analysis:
<pre>
bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --traitName LDL --useCovariates --makeResiduals --inverseNormal
</pre>
* When genotypes are stored in the PED and DAT files, and you want to use kinship generated from genotypes (--kinSave is optional; it saves the kinship matrix for future use):
<pre>
bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --kinGeno --kinSave --traitName LDL
</pre>
* When genotypes are stored in a VCF file and you want to use pedigree kinship:
<pre>
bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz
</pre>
* When genotypes are stored in a VCF file and you want to use kinship generated from genotypes (--kinSave saves the kinship matrix for future use):
<pre>
bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --kinGeno --kinSave
</pre>

=== Unrelated individuals ===

* The commands are the same as in the examples above, except that each individual must have a distinct family ID in the PED file, and their father and mother IDs should be "0".
* When genotypes come from the PED file and marker information from the DAT file, and there is no relatedness in the sample:
<pre>
./raremetalworker --ped yours.ped --dat yours.dat
</pre>
* When genotypes come from a VCF file, covariate and trait information is saved in the PED and DAT files, and there is no relatedness in the sample, use the following:
<pre>
./raremetalworker --ped yours.ped --dat yours.dat --vcf yours.vcf.gz
</pre>
* When genotypes come from a VCF file, covariate and trait information is saved in the PED and DAT files, and there is cryptic relatedness in the sample, use the following command. It treats individuals as related and generates the kinship matrix from genotypes:
<pre>
./raremetalworker --ped yours.ped --dat yours.dat --vcf yours.vcf.gz --kinGeno
</pre>

== Q & A ==