Difference between revisions of "RAREMETALWORKER"

From Genome Analysis Wiki
 
(303 intermediate revisions by 6 users not shown)
Line 1: Line 1:
'''Rare-Metal-Worker''' is a tool for generating summary statistics for rare variants and gene level meta analyses using Rare-Metal. It handles both related individuals and unrelated individuals.
+
[[Category:RAREMETALWORKER]]
  
If you have any questions, please contact: sfengsph at umich dot edu
+
'''RAREMETALWORKER''' is a tool for single variant analysis, generating summary statistics for gene level meta analyses in [http://genome.sph.umich.edu/wiki/RAREMETAL '''RAREMETAL'''].
==Authors==
 
* Shuang Feng worked on method development and software implementation in C++.
 
* Dajiang Liu contributed in method development.  
 
* Mary Kate Wing wrote and is in charge of maintaining the VCF library which is used in RareMetalWorker. She also offers precious help in debugging.
 
* Gonçalo Abecasis supervises the entire task.
 
  
== Change Log ==
+
If you feel this program is useful, please tell us your name and contact in this [https://docs.google.com/spreadsheet/ccc?key=0AuYjznTeEDYudFpqUk9sQ2pkN3d3endjYldqMEp6ZUE&usp=sharing '''registration'''].
* Version 0.0.1 was released on 11/13/2012.
+
 
* Modified Rare-Metal-Worker to let it output LD matrix by a sliding window. (11/14/2012)
+
If you have any questions, please contact Sai Chen (saichen at umich dot edu) or [[Goncalo_Abecasis | '''Goncalo Abecasis''']] (goncalo at umich dot edu).
* Uploaded to public wiki. (11/16/2012)
+
 
* Enabled writing log file by default. (11/18/2012)
+
 
* Forced sample IDs to be matched when reading in kinship from a file. A sanity check is performed before reading in the kinship file: if a sample of interest is not included in the kinship file, a fatal error occurs. (11/19/2012)
+
== Useful Wiki Pages ==
* Added HWE pvalue and call rate in summary statistics output. (11/27/2012)
+
 
* Bugs fixed to solve compiling errors on some machines (Thank you Mary Kate!). Version 0.0.2 released. (11/30/2012)
+
There are several pages in this Wiki that may be useful to RAREMETALWORKER users. Here are links to key pages:
* Updated output format. Version 0.0.3 released. (12/3/2012)
+
* The [[RAREMETALWORKER_command_reference | '''RAREMETALWORKER command reference''']]
* More messages coded into log file. (12/4/2012)
+
* The [[RAREMETALWORKER_method | '''RAREMETALWORKER method''']]
* Version 0.0.4 released. (12/5/2012)
+
* The [[Tutorial:_RAREMETAL| '''RAREMETALWORKER quick start tutorial''']]
* Bug fixed for empirical kinship calculation when genotypes are read from VCF file. Version 0.0.5 released. (12/6/2012)
+
* The [[RAREMETALWORKER_SPECIAL_TOPICS | '''RAREMETALWORKER special topics''']]
* Version 0.0.6 released. (12/6/2012)
+
* The [[RAREMETAL_Documentation | '''RAREMETAL documentation''']]
* Updated output format for monomorphic sites. (12/7/2012)
+
* The [[RAREMETAL FAQ | '''FAQ''']]
* Changed executable name into bin/raremetalworker. Version 0.0.7 released. (12/10/2012)
+
* The [[RAREMETAL_Change_Log | '''Change Log''']]
* Fixed a bug when reading a VCF file in which the ref or alt allele is missing. (2/5/2013)
 
* Fixed a bug when there is missing genotype from VCF file. (2/2013)
 
* Fixed a bug when handling chromosome X. Added sex labels option. (3/2/2013)
 
* Optimized code to speed up the process of calculating empirical kinship. (3/3/2013)
 
* Updated code to report allele frequencies calculated only from selected samples. (3/3/2013)
 
  
 
== Key Features ==
 
== Key Features ==
Rare-Metal-Worker has the following features:
+
RAREMETALWORKER has the following features:
 
* Takes genotypes from either PED file or VCF file.
 
* Takes genotypes from either PED file or VCF file.
 
* Generates summary statistics for both related and unrelated individuals.
 
* Generates summary statistics for both related and unrelated individuals.
Line 37: Line 27:
 
* Has the option of fitting shared environment.
 
* Has the option of fitting shared environment.
 
* Can handle variants on Chromosome X.
 
* Can handle variants on Chromosome X.
 +
* Calculates QC statistics such as HWE p-value, call rate, and genomic control.
 +
* Automatically generates QQ and Manhattan plots.
  
 
== Software Download and Installation ==
 
== Software Download and Installation ==
  
=== Where to Download ===
+
=== DOWNLOAD ===
  
* The source package for Linux and Mac can be downloaded here: [[Media:RareMetalWorker.0.0.9.tgz|click to download]]
+
We have tested compilation on several platforms including Linux, Mac OS X, and Windows. 
* Save it to your local path and decompress using the following command:
 
  tar xvzf RareMetalWorker.0.0.9.tgz
 
* For UM CSG cluster users, no installation is needed. It is available at /net/fantasia/home/sfengsph/code/Rare-Metal/RareMetalWorker/bin/raremetalworker
 
  
=== How to Compile ===
+
For source code and executables, together with instructions for building from source, please go to [[RAREMETAL_DOWNLOAD_%26_BUILD |'''DOWNLOAD source and executables''']].
  
* Go to /RareMetalWorker_0.0.8/RareMetalWorker/src and use the following command:
+
For questions about building and compilation, please go to [[RAREMETAL_FAQ | '''FAQ''']].
 
  make
 
  
 
=== How to Execute ===
 
=== How to Execute ===
  
* To execute the program, go to /RareMetalWorker_0.0.6/RareMetalWorker/bin, then the program can be executed by ./Rare-Metal-Worker.
+
 
* An example command line for a related sample when genotypes are saved in a VCF file is as follows:
+
* To execute the program, go to /RareMetalWorker_0.4.8/RareMetalWorker/bin and run ./raremetalworker.
  ./raremetalworker --ped your.pheno.ped --dat your.pheno.dat --vcf your.geno.vcf.gz --useCovariates --inverseNormal --prefix your.study
+
* For example command lines, please refer to [[RAREMETALWORKER#Example_Command_Lines | '''RAREMETALWORKER EXAMPLES''']].
* An example command line for a related sample when genotypes are saved in PED/DAT files is as follows:
+
 
  ./raremetalworker --ped your.ped --dat your.dat --useCovariates --inverseNormal --prefix your.study
+
==Method==
* An example command line for an unrelated sample when genotypes are saved in PED/DAT files is as follows:
+
 
  ./raremetalworker --ped your.ped --dat your.dat --useCovariates --inverseNormal --prefix your.study
+
Method description and key formulae can be found in [http://genome.sph.umich.edu/wiki/RAREMETALWORKER_method '''RAREMETALWORKER METHOD'''].
* An example command line for an unrelated sample when genotypes are saved in a VCF file is as follows:
+
 
  ./raremetalworker --ped your.pheno.ped --dat your.pheno.dat --vcf your.geno.vcf.gz --useCovariates --inverseNormal --prefix your.study
+
==For Binary Traits==
* An example command line for when genotypes are saved in a VCF file and you want to adjust for covariates first and then inverse normalize the residuals is as follows:
+
 
  ./raremetalworker --ped your.pheno.ped --dat your.pheno.dat --vcf your.geno.vcf.gz --makeResiduals --useCovariates --inverseNormal --prefix your.study
+
RAREMETALWORKER currently treats all traits as quantitative. If your trait is binary, the odds ratio can be approximated from the effect size estimates generated by RAREMETALWORKER. The installation/source package includes a script that appends the odds ratio estimates as the last column of the RAREMETALWORKER output. For details, please refer to [[RAREMETAL_DOWNLOAD_%26_BUILD#Calculating_Odds_Ratio_from_RAREMETALWORKER_output | '''Calculate Odds Ratio from RAREMETALWORKER output''']].
* For more examples, please go to [[http://genome.sph.umich.edu/wiki/Rare-Metal-Worker#Examples Examples]].
 
  
 
== Software Specifications ==
 
== Software Specifications ==
  
=== Input Files ===
+
===INTERFACE===
Rare-Metal-Worker needs the following files as input: PED and DAT file in Merlin format, '''AND/OR''' a VCF file. When genotypes are stored in PED and DAT file, the VCF file is not needed. However, even if genotypes are saved in a VCF file, PED and DAT files are still needed for carrying covariate and trait information.  
+
 
 +
RAREMETALWORKER is a command line tool. When you run it, the full list of options is printed on the screen.
 +
 
 +
For detailed description of command options, please go to [[RAREMETALWORKER_command_reference | '''command reference''']].
 +
 
 +
Options:
 +
      Input Files : --ped [], --dat [], --vcf [], --dosage, --noeof
 +
      Output Files : --prefix [], --LDwindow [1000000], --zip, --thin,
 +
                    --labelHits
 +
        VC Options : --vcX, --separateX
 +
    Trait Options : --makeResiduals, --inverseNormal, --traitName []
 +
    Model Options : --recessive, --dominant
 +
    Kinship Source : --kinPedigree, --kinGeno, --kinFile [], --kinxFile [],
 +
                    --kinSave
 +
  Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
 +
      Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044],
 +
                    --maleLabel [1], --femaleLabel [2]
 +
            others : --cpu [1], --kinOnly,
 +
                    --geneMap [../data/refFlat_hg19.txt]
 +
        PhoneHome : --noPhoneHome, --phoneHomeThinning [100]
 +
 
 +
===INPUT FILE FORMAT===
 +
 
 +
RMW needs the following files as input: PED and DAT file in Merlin format, '''AND/OR''' a VCF file. When genotypes are stored in PED and DAT file, the VCF file is not needed. However, even if genotypes are saved in a VCF file, PED and DAT files are still needed for carrying covariate and trait information.  
  
 
==== PED and DAT Files ====
 
==== PED and DAT Files ====
 
* When PED file has genotypes saved, there is no need for a VCF file as input.
 
* When PED file has genotypes saved, there is no need for a VCF file as input.
* Rare-Metal-Worker takes PED/DAT file in Merlin format. Please refer to [[http://www.sph.umich.edu/csg/abecasis/merlin/tour/input_files.html PED/DAT format description]] for details.
+
* RMW takes PED/DAT file in Merlin format. Please refer to [http://www.sph.umich.edu/csg/abecasis/merlin/tour/input_files.html PED/DAT format description] for details.
 +
* PED file requires "dummy" parents to be included in the pedigree file. To check the integrity of your PED/DAT file, please use [http://www.sph.umich.edu/csg/abecasis/PedStats '''pedstats''']. To add dummy parents into the pedigree, please use the [[Media:Script.tgz | '''perl script''']].
 
* An example PED file is in the following:
 
* An example PED file is in the following:
 
     1 1 0 0 1 1.5 1 23 A A A A A A A A A A
 
     1 1 0 0 1 1.5 1 23 A A A A A A A A A A
Line 94: Line 104:
 
* '''Markers in PED and DAT file must be sorted by chromosome and position.'''
 
* '''Markers in PED and DAT file must be sorted by chromosome and position.'''
  
* Covariate and trait values are saved in PED file. Covariate and trait descriptions are saved in DAT file.
+
* Covariate and trait values are saved in PED file. Covariate and trait descriptions are saved in DAT file. Note that you must specify <code>--makeResiduals</code> in order to adjust the covariates out of the phenotype. See [[RAREMETALWORKER#Example_Command_Lines | Example Command Lines]] for examples and [[RAREMETALWORKER_command_reference#Trait_Options | Trait Options]] for more information.
  
 
==== VCF File ====
 
==== VCF File ====
* Another option is to use VCF as input. Please refer to the following link for VCF file specification: [[http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 1000 genome wiki VCF specs]]  
+
=====GENOTYPES=====
 +
* Another option is to use VCF as input. Please refer to the following link for VCF file specification: [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 1000 genome wiki VCF specs]
 
* VCF file should be compressed by bgzip and indexed by tabix, using the following command:
 
* VCF file should be compressed by bgzip and indexed by tabix, using the following command:
 
   bgzip input.vcf    ## this command will produce input.vcf.gz
 
   bgzip input.vcf    ## this command will produce input.vcf.gz
   tabix -pvcf -f input.vcf.gz  ## this command will produce input.vcf.gz.tbi
+
   tabix -p vcf -f input.vcf.gz  ## this command will produce input.vcf.gz.tbi
 
* Even with the presence of VCF file, PED/DAT files are still needed for covariates and phenotypes.
 
* Even with the presence of VCF file, PED/DAT files are still needed for covariates and phenotypes.
 +
* Are you using PLINK file formats? Converting to VCF is easy: use WDIST (very similar to PLINK) for the conversion. Documentation and downloads are available at [https://www.cog-genomics.org/wdist/ WDIST].
  
=== Software Options ===
 
The following options are currently available in Rare-Metal-Worker:
 
  
    Options:
 
      Input Files : --ped [Exomechip.pheno.ped], --dat [Exomechip.pheno.dat], --vcf []
 
      Output Files : --prefix [], --LDwindow [1000000]
 
        VC Options : --vcShared, --vcX, --useCovariates [ON]
 
    Trait Options : --makeResiduals, --inverseNormal [ON], --traitName []
 
    Kinship Source : --kinPedigree [ON], --kinGeno, --kinFile [], --kinSave
 
  Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
 
      Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044]
 
 
 
==== Input Files ====
 
 
* When genotypes are saved in a VCF file, PED and DAT files are used for specifying pedigree structure, covariate and trait information. An example command line might look like this:
 
* When genotypes are saved in a VCF file, PED and DAT files are used for specifying pedigree structure, covariate and trait information. An example command line might look like this:
 
   --ped input.ped --dat input.dat --vcf input.vcf.gz
 
   --ped input.ped --dat input.dat --vcf input.vcf.gz
Line 122: Line 121:
 
   --ped input.ped --dat input.dat
 
   --ped input.ped --dat input.dat
  
==== Output Files ====
+
=====DOSAGE=====
* --prefix is optional.  
+
* If you want to analyze dosage data from a VCF file, the --dosage option has to be specified, and the keyword "DS" has to be included in the FORMAT field of the VCF file accordingly. An example follows:
* If --prefix is not specified, the output file names will be:
+
 
   traitname.singlevar.score.txt
+
   #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT IDx ID1 ID2 ID3
  traitname.singlevar.cov.txt
+
  22 16050408 37239779 T C . PASS AC=2;AN=496 GT:DS:GP ./.:.:0,0,0 ./.:.:0,0,0 ./.:.:0,0,0
* If --prefix prefix is specified, then the output file names are:
+
   22 16050933 37239784 G A . PASS AC=141;AN=904 GT:DS:GP 0/0:0.0:1,0,0 0/0:0.0:1,0,0 0/0:0.0:1,0,0
  prefix.traitname.singlevar.score.txt
+
 
   prefix.traitname.singlevar.cov.txt
+
* --noeof allows using VCF file without BGZF EOF markers. This is a very rare option to use. If your run is terminated with error message: "", then you might want to check out this option.
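The role of the "DS" keyword in the dosage example above can be illustrated with a short Python sketch. It is not part of RAREMETALWORKER; the function name <code>dosages_from_vcf_line</code> is hypothetical, and the sketch assumes tab-delimited VCF records (the wiki example is displayed with spaces) and treats "." as a missing dosage.

```python
def dosages_from_vcf_line(line):
    """Return one dosage per sample (None when missing) from a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")          # column 9 is FORMAT, e.g. GT:DS:GP
    if "DS" not in format_keys:
        raise ValueError("FORMAT column has no DS key")
    ds_index = format_keys.index("DS")
    dosages = []
    for sample in fields[9:]:                   # sample columns start at column 10
        parts = sample.split(":")
        value = parts[ds_index] if ds_index < len(parts) else "."
        dosages.append(None if value == "." else float(value))
    return dosages


# The two records from the example above, joined with tabs.
missing = "\t".join("22 16050408 37239779 T C . PASS AC=2;AN=496 GT:DS:GP "
                    "./.:.:0,0,0 ./.:.:0,0,0 ./.:.:0,0,0".split())
called = "\t".join("22 16050933 37239784 G A . PASS AC=141;AN=904 GT:DS:GP "
                   "0/0:0.0:1,0,0 0/0:0.0:1,0,0 0/0:0.0:1,0,0".split())
print(dosages_from_vcf_line(missing))
print(dosages_from_vcf_line(called))
```

A check like this can be run on a few records before starting an analysis with --dosage, to confirm that the DS field is present.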
* --LDwindow specifies the size of the window around each variant within which the LD matrix is generated. The default is 1 Mb.
+
 
 +
=== OUTPUT===
  
==== VC Options ====
+
====OUTPUT FILE NAMES====
* When --vcShared and --vcX are specified, Rare-Metal-Worker knows that you want to fit shared environment and/or chromosome X variance component together with genetic component and non-shared environment.
 
* When --useCovariates is specified, Rare-Metal-Worker understands covariates should be read from PED file. Covariates are modeled as fixed effects.
 
==== Trait Options ====
 
* --makeResiduals can be combined with --useCovariates to generate residuals from a simple linear regression before analysis. If the --inverseNormal option is also used, then the residuals will be quantile normalized before fitting the variance component model.
 
** An example Command line requesting pre-adjustment for covariates before fitting a variance component follows:
 
  --useCovariates --makeResiduals --inverseNormal
 
** An example command line requesting joint modeling of fixed effects and variance components follows:
 
  --useCovariates --inverseNormal
 
* If --inverseNormal is used WITHOUT --makeResiduals, then trait values are inverse normalized before any model fitting.
 
* --traitName is created for situations when you have many traits saved in your PED and DAT file, but you are interested in one or a few of them. It can read a file ending with .txt with each trait of interest in a separate line, or trait names separated with "/". An example to handle one trait or multiple traits is in the following:
 
  --traitName LDL
 
  --traitName LDL/HDL/TG
 
  --traitName traitsOfInterest.txt
 
* If --traitName is not used, all traits in PED/DAT file will be analyzed.
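The three accepted forms of the --traitName argument described above can be sketched in Python as follows. This only illustrates the documented behaviour and is not RAREMETALWORKER's own code; the function name <code>resolve_traits</code> is hypothetical.

```python
import os
import tempfile


def resolve_traits(spec):
    """Turn a --traitName style argument into a list of trait names."""
    if spec.endswith(".txt"):
        # A file ending with .txt lists one trait of interest per line.
        with open(spec) as handle:
            return [line.strip() for line in handle if line.strip()]
    # Otherwise trait names are separated with "/"; a single name has no slash.
    return [name for name in spec.split("/") if name]


single = resolve_traits("LDL")
several = resolve_traits("LDL/HDL/TG")

# Demonstrate the file form with a temporary traitsOfInterest.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traitsOfInterest.txt")
    with open(path, "w") as handle:
        handle.write("LDL\nHDL\n")
    from_file = resolve_traits(path)

print(single, several, from_file)
```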
 
  
==== Kinship Source ====
+
* Three files are generated automatically by default:
* --kinPedigree allows Rare-Metal-Worker to generate kinship matrix from pedigree, when pedigree information is available. This option is on by default.
+
  prefix.traitName.singlevar.score.txt (single variant summary statistics and QC statistics)
* --kinGeno informs Rare-Metal-Worker to generate kinship matrix from all available variants that pass the criteria, specified in --kinMaf and --kinMiss options. The default will take variants with MAF>0.05 and genotype missing rate <0.05.
+
  prefix.traitName.singlevar.cov.txt (covariance matrices of single variant score statistics)
* --kinFile lets Rare-Metal-Worker read in a kinship matrix from a file. The first row of the kinship file has to be the sample IDs included in the kinship file. If a sample of interest is not included in the kinship file, a fatal error will occur and the program will be terminated. A sample of interest is a sample that is phenotyped and has all covariates measured when --useCovariates is specified.
+
  prefix.singlevar.log (log file)
* --kinSave allows you to save the kinship matrix.
 
  
==== Kinship Options ====
+
* If --zip option is used, then the following will be generated automatically:
* --kinMiss and --kinMaf should be used with --kinGeno together.  
+
  prefix.traitName.singlevar.score.txt.gz
* --kinMiss specifies the maximum genotype missing rate when calculating kinship from genotypes. The default is 0.05.
+
  prefix.traitName.singlevar.score.txt.gz.tbi
* --kinMaf specifies the minimum minor allele frequency used when calculating kinship from genotypes. The default is 0.05.
+
  prefix.traitName.singlevar.cov.txt.gz
 +
  prefix.traitName.singlevar.cov.txt.gz.tbi
 +
  prefix.singlevar.log
  
==== Chromosome X ====
+
* If --recessive and/or --dominant options are used, then the following files are also generated '''in addition''' to the above files
* --xLabel should have a value of a string which specifies how variants on chromosome X are coded. The default is "X".
+
  prefix.traitName.recessive.singlevar.score.txt.gz
* --xStart and --xEnd specify the start and end of the non-pseudo-autosomal region on chromosome X. These options should be specified when --vcX is used.
+
  prefix.traitName.recessive.singlevar.cov.txt.gz
* The default for --xStart is 2699520 and default for --xEnd is 154931044, according to NCBI genome build 37.
+
  prefix.traitName.dominant.singlevar.score.txt.gz
 +
  prefix.traitName.dominant.singlevar.cov.txt.gz
  
=== Handling Unrelated Individuals ===
+
* If --kinGeno --kinSave is used, then the genomic relationship matrix is stored in
* To let Rare-Metal-Worker handle unrelated individuals, code the individuals as unrelated in the PED file, that is, assign each individual to a unique family. Rare-Metal-Worker will then take care of the rest.
+
  prefix.Empirical.Kinship.gz
* However, when --kinGeno is also used, Rare-Metal-Worker will consider them as related and generate the kinship matrix from genotypes.
 
* An example follows (the header is included for illustration only and is not part of a real PED file):
 
  
  famid pid fid mid sex age trait
+
* If --vcX option is used, then the genomic relationship matrix from chromosome X is stored in
  1    1.1  0  0  1  10  -0.3
+
   prefix.Empirical.KinshipX.gz
  2    2.1  0  0  1  56  0.0
 
  3    3.1  0  0   2  31  0.4
 
  4    4.1  0  0  2  23  0.008
 
  5    5.1  0  0  2  34  2.35
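The default and --zip output file naming rules listed in this section can be summarized in a small Python sketch. It is a convenience for scripts that collect results, not part of RAREMETALWORKER; the function name <code>expected_outputs</code> is hypothetical, and the sketch covers only the score, covariance, and log files (not the recessive/dominant or kinship files).

```python
def expected_outputs(prefix, trait, zipped=False):
    """List the output files documented for one trait, with or without --zip."""
    base = f"{prefix}.{trait}.singlevar"
    files = []
    for kind in ("score", "cov"):
        name = f"{base}.{kind}.txt"
        if zipped:
            # With --zip each table is compressed and accompanied by a tabix index.
            files.extend([name + ".gz", name + ".gz.tbi"])
        else:
            files.append(name)
    # The log file name carries the prefix but not the trait name.
    files.append(f"{prefix}.singlevar.log")
    return files


plain = expected_outputs("rmw.test", "LDL")
compressed = expected_outputs("rmw.test", "LDL", zipped=True)
print(plain)
print(compressed)
```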
 
  
== Output Formats ==
+
====OUTPUT FILE FORMATS====
  
* There are three files generated automatically by default:
+
=====Summary Statistics=====
 +
* The summary statistics file, named prefix.traitName.singlevar.score.txt, contains the summary statistics that are needed by Rare-Metal. An example is shown below:
  
   prefix.traitName.singlevar.score.txt
+
LDL mean= -0.00, variance=  1.00, heritability= 34.30
   prefix.traitName.singlevar.cov.txt
+
CHR      POS REF_ALLELE ALT_ALLELE  INFORMATIVE_N  FOUNDER_AF    ALL_AF  INFORMATIVE_AC  HWE_PVALUE      STAT  ALT_ALLELE_EFFSIZE        PVALUE
  prefix.singlevar.log
+
   10  45410002          G          A          6103    0.034159  0.034159            410    0.165893  126.2050            0.309798  4.030740e-10
   
+
   19  45412079          G          A          6103    0.036812  0.036812            434    0.714645 -265.8400          -0.587356  7.878510e-36
* prefix.traitName.singlevar.score.txt contains the summary statistics that are needed by Rare-Metal. An example is shown below:
+
  19 45414451          G          A          6103    0.444989  0.444989            5312    0.075927  -26.1212          -0.008371  6.400580e-01
  
  LDL mean= -0.00, variance=  1.00, heritability= 34.30
 
  CHR    POS    REF_ALLELE      ALT_ALLELE      INFORMATIVE_N  FOUNDER_AF      ALL_AF  INFORMATIVE_AC  HWE_PVALUE      STAT    ALT_ALLELE_EFFSIZE      PVALUE
 
  10  45410002        G      A      6103    0.0341589      0.0341589      410    0.165893        126.205 0.309798        4.03074e-10
 
  19  45412079        G      A      6103    0.0368124      0.0368124      434    0.714645        -265.84 -0.587356      7.87851e-36
 
  19  45414451        G      A      6103    0.444989        0.444989        5312    0.0759271      -26.1212        -0.00837122    0.640058
 
  
 
* pvalues from the above output are from the family-based single variant score test.
 
* pvalues from the above output are from the family-based single variant score test.
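For downstream checks, the summary statistics file shown above can be read with a few lines of Python. This is a minimal sketch, assuming whitespace-separated columns and a header row that starts with <code>CHR</code> as in the example; the function name <code>read_score_lines</code> is hypothetical.

```python
def read_score_lines(lines):
    """Parse lines of a *.singlevar.score.txt file into dicts keyed by column name."""
    header = None
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # Skip leading summary lines (trait mean, variance, ...) until the header.
            if fields[0] == "CHR":
                header = fields
            continue
        if len(fields) != len(header):
            continue  # ignore rows that do not match the header
        records.append(dict(zip(header, fields)))
    return records


example = [
    "LDL mean= -0.00, variance=  1.00, heritability= 34.30",
    "CHR POS REF_ALLELE ALT_ALLELE INFORMATIVE_N FOUNDER_AF ALL_AF "
    "INFORMATIVE_AC HWE_PVALUE STAT ALT_ALLELE_EFFSIZE PVALUE",
    "10 45410002 G A 6103 0.034159 0.034159 410 0.165893 126.2050 0.309798 4.030740e-10",
    "19 45412079 G A 6103 0.036812 0.036812 434 0.714645 -265.8400 -0.587356 7.878510e-36",
    "19 45414451 G A 6103 0.444989 0.444989 5312 0.075927 -26.1212 -0.008371 6.400580e-01",
]
records = read_score_lines(example)
# Keep variants below an illustrative significance threshold.
hits = [(r["CHR"], r["POS"]) for r in records if float(r["PVALUE"]) < 5e-8]
print(hits)
```

For a real file, pass an open file handle instead of the list, for example <code>read_score_lines(open("prefix.traitName.singlevar.score.txt"))</code>.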
  
 +
=====LD Matrices=====
 
* prefix.traitName.singlevar.cov.txt contains the LD matrix between a variant and the adjacent markers within a fixed-size window. The default window size is 1 Mb. It has the following format:
 
* prefix.traitName.singlevar.cov.txt contains the LD matrix between a variant and the adjacent markers within a fixed-size window. The default window size is 1 Mb. It has the following format:
 
    
 
    
  CHR   POS       VAR_POS_IN_WINDOW                             LD_MATRIX
+
CHR     POS                           VAR_POS_IN_WINDOW                                                                 LD_MATRIX
  762320    762320,865628,865665,878744,879381,1560000   0.0359084,-0.000242112,-0.00125797,-0.000993422,-0.000344509,-0.00017077,
+
  1 762320   762320,865628,865665,878744,879381,1560000 0.0359084,-0.000242112,-0.00125797,-0.000993422,-0.000344509,-0.00017077,
  1   865628     865628,865665,878744,879381,1560000,1864659   0.419804,-0.0103663,-0.00635265,0.0594056,0.0534505,-0.00462183,
+
  1 865628 865628,865665,878744,879381,1560000,1864659           0.419804,-0.0103663,-0.00635265,0.0594056,0.0534505,-0.00462183,
  1   878744     878744,879381,1560000,1864659,1877659         0.000404537,-0.000235215,-1.4455e-05,-8.69137e-06,-3.1027e-05,
+
  1 878744       878744,879381,1560000,1864659,1877659             0.000404537,-0.000235215,-1.4455e-05,-8.69137e-06,-3.1027e-05,
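Each row of the LD matrix file shown above pairs a list of positions with a list of values of the same length. A minimal Python sketch for reading one row follows; it assumes four whitespace-separated columns and a trailing comma after the last value, as in the example, and the function name <code>parse_cov_line</code> is hypothetical.

```python
def parse_cov_line(line):
    """Split one row into chromosome, position, and a {position: value} mapping."""
    chrom, pos, window, matrix = line.split()
    positions = [int(p) for p in window.split(",") if p]
    values = [float(v) for v in matrix.split(",") if v]   # drops the trailing comma
    if len(positions) != len(values):
        raise ValueError("window positions and matrix values differ in length")
    return chrom, int(pos), dict(zip(positions, values))


row = ("1 762320 762320,865628,865665,878744,879381,1560000 "
       "0.0359084,-0.000242112,-0.00125797,-0.000993422,-0.000344509,-0.00017077,")
chrom, pos, by_position = parse_cov_line(row)
print(chrom, pos, by_position[865628])
```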
 +
 
 +
=====Genomic Relationship Matrix (GRM)=====
 +
 
 +
* Once --kinGeno --kinSave --prefix options are requested, you would expect to see a GRM generated (compressed by gzip) with name yourprefix.Empirical.Kinship.gz. If --prefix option is not used, then the file name is Empirical.Kinship.gz.
 +
* If --vcX --kinGeno --kinSave --prefix options are requested, besides the autosomal GRM, you would also expect to see a separate GRM for chromosome X saved (compressed by gzip also) under the name yourprefix.Empirical.KinshipX.gz.
 +
* The GRMs are generated based on all genotyped individuals included in the PED file; samples with missing phenotype or missing covariates are not excluded from GRMs. This feature makes GRMs reusable if you have multiple traits to analyze in separate runs. You can simply use the --kinFile option (or the --kinxFile option, together with --vcX, if you have an X chromosome GRM) to reuse the pre-saved GRMs.
 +
* The format of the autosomal and chromosome X GRMs is the same. The first row lists all sample IDs (sample size=N). The rest of the file is a symmetric matrix with dimension ''NxN'', and element ''ij'' of this matrix represents the kinship between the <math>i^{th}</math> and the <math>j^{th}</math> samples, whose IDs can be found in the first row.
 +
* For details about GRM calculation, please refer to [[RAREMETALWORKER_method | '''method''']].
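The GRM layout described above (one row of sample IDs followed by an ''NxN'' matrix) can be read with a short Python sketch. The delimiter is not stated on this page, so whitespace-separated fields are an assumption; the function names and the toy values below are made up for illustration.

```python
def read_grm(lines):
    """Return (sample_ids, matrix) from the lines of a saved GRM."""
    rows = [line.split() for line in lines if line.strip()]
    ids = rows[0]                                   # first row: sample IDs
    matrix = [[float(x) for x in row] for row in rows[1:]]
    n = len(ids)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("expected an N x N matrix after the ID row")
    return ids, matrix


def kinship(ids, matrix, a, b):
    """Look up the kinship between samples a and b by their IDs."""
    return matrix[ids.index(a)][ids.index(b)]


# Toy three-sample GRM (values invented for this example).
toy = [
    "ID1 ID2 ID3",
    "0.50 0.25 0.00",
    "0.25 0.50 0.01",
    "0.00 0.01 0.50",
]
ids, grm = read_grm(toy)
print(kinship(ids, grm, "ID1", "ID2"))
```

Because the saved file is gzip-compressed, a real GRM would be opened with <code>gzip.open(path, "rt")</code> from the Python standard library and the handle passed to <code>read_grm</code>.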
 +
 
 +
=====Log File=====
 +
* RMW automatically generates a log file named "yourprefix.singlevar.log".
 +
* The first part of the log file records the options used for your analysis.
  
* An example log file is in the following:
+
The following parameters are in effect:
  Summary statistics for trait LDL have been saved in LDL.singlevar.score.txt.
+
  LD matrices for trait LDL have been saved in LDL.singlevar.cov.txt.
+
Input Files:
 
+
============================
  Rare-Metal-Worker handled all individuals as related.
+
--ped [pheno.ped]
 
+
--dat [pheno.dat]  
  The following parameters are in effect:
+
--vcf [allvars.vcf.gz]
 
+
--dosage [false]
  Input Files:
+
--noeof [false]
  ============================
+
  --ped [APOE.ped]
+
Output Files:
  --dat [APOE.dat]
+
============================
  --vcf []
+
--prefix [rmw.test]
 
+
--LDwindow [1000000]
  Output Files:
+
--zip [false]
  ============================
+
--thin [false]
  --prefix []
+
--labelHits [false]
  --LDwindow [1000000]
+
 
+
VC Options:
  VC Options:
+
============================
  ============================
+
--vcX [true]
  --vcShared [false]
+
--separateX [true]
  --vcX [false]
+
  --useCovariates [false]
+
Trait Options:
 
+
============================
  Trait Options:
+
--makeResiduals [false]
  ============================
+
--inverseNormal [false]
  --makeResiduals [true]
+
--traitName []
  --inverseNormal [true]
+
  --traitName [LDL]
+
Model Options:
 
+
============================
  Kinship Source:
+
--recessive [false]
  ============================
+
--dominant [false]
  --kinPedigree [true]
+
  --kinGeno [false]
+
Kinship Source:
  --kinFile []
+
============================
  --kinSave [false]
+
--kinPedigree [true]
 
+
--kinGeno [false]
  Kinship Options:
+
--kinFile []
  ============================
+
--kinxFile []
  --kinMaf [0.05]
+
--kinSave [false]
  --kinMiss [0.05]
+
 
+
Kinship Options:
  Chromosome X:
+
============================
  ============================
+
--kinMaf [0.05]
  xLabel [X]
+
--kinMiss [0.05]
  xStart [2699520]
+
  xEnd [154931044]
+
Chromosome X:
 +
============================
 +
xLabel [X]
 +
xStart [2699520]
 +
xEnd [154931044]
 +
maleLabel [1]
 +
femaleLabel [2]
 +
 
 +
* The second part of the log file records all warnings and running messages.
 +
 
 +
=====Plots=====
 +
* RAREMETALWORKER generates QQ and Manhattan plots automatically, unless only a trivial number of variants is analyzed.
 +
* RAREMETALWORKER stores plots of each trait in separate files named ''yourprefix.traitname.plots.pdf''.
 +
* RAREMETALWORKER stores plots for recessive and dominant results in separate files named ''yourprefix.traitname.recessive.plots.pdf'' and ''yourprefix.traitname.dominant.plots.pdf''.
 +
* RAREMETALWORKER automatically generates three stratified QQ plots: one with all variants, one with variants of MAF<0.05, and one with variants of MAF<0.01.
 +
* Genomic controls are automatically calculated and labeled in QQ plots.
 +
* By using --labelHits option, users can choose to label the hits.
 +
* Here are an example QQ plot and Manhattan plot generated by RAREMETALWORKER.
 +
 
 +
{| border="1" cellpadding="5" cellspacing="0" align="center"
 +
|-
 +
| align="center" width="100" | [[File:QQ.png]]
 +
|-
 +
| align="center" width="200" | [[File:Single_var_manhattan.png]]
 +
|}
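The genomic control value labeled in the QQ plots can be approximated from the PVALUE column of the summary statistics. The sketch below uses the conventional definition (median observed 1-df chi-square statistic divided by its expected median); this page does not state exactly how RAREMETALWORKER computes it, so treat the formula and the function name <code>genomic_control_lambda</code> as assumptions.

```python
from statistics import NormalDist, median


def genomic_control_lambda(pvalues):
    """Median 1-df chi-square implied by the p-values over its expected median."""
    normal = NormalDist()
    # A two-sided p-value p corresponds to a chi-square(1) statistic of z**2,
    # where z is the normal quantile at p/2 (the lower tail avoids rounding to 1).
    chi2 = [normal.inv_cdf(p / 2.0) ** 2 for p in pvalues if 0.0 < p <= 1.0]
    expected_median = normal.inv_cdf(0.75) ** 2     # about 0.455
    return median(chi2) / expected_median


# Evenly spread p-values mimic a well-calibrated test, so lambda is close to 1.
uniform = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_control_lambda(uniform)
print(round(lam, 3))
```

To reproduce the stratified view, the same function can be applied to subsets of variants selected by allele frequency.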
 +
 
 +
===SPECIAL TOPICS===
 +
* For special topics such as how RAREMETALWORKER handles missing data, unrelated individuals, and markers on chromosome X, please go to [[RAREMETALWORKER_SPECIAL_TOPICS | '''SPECIAL TOPICS''']].
 +
 
 +
== Example Command Lines ==
 +
 
 +
The following lists a few popular combinations of options used for analyses. For an itemized description of options, please go to [[RAREMETALWORKER_command_reference | '''COMMAND REFERENCE''']].
 +
 
 +
===General Usage===
 +
 
 +
* If your PED file has many traits but you only want one of them to be analyzed, then the following command does the trick:
 +
 
 +
  prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --traitName BMI --prefix yourFavoritePrefix
 +
 
 +
* If you want to inverse normalize (quantile normalize) your trait before doing associations, this can be done by adding --inverseNormal to your command line:
 +
 
 +
  prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --traitName BMI --prefix yourFavoritePrefix --inverseNormal
 +
 
 +
* The following command will adjust for covariates first and then use the residuals for association analysis:
 +
 
 +
  prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --traitName BMI --prefix yourFavoritePrefix --makeResiduals
 +
 
 +
* The following command will adjust for covariates first and then use the inverse normalized residuals for association analysis:
  
== Examples ==
+
  prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --traitName BMI --prefix yourFavoritePrefix --makeResiduals --inverseNormal
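What --inverseNormal does to a trait (or to residuals, when combined with --makeResiduals) can be illustrated with a rank-based inverse normal transformation in Python. This is a sketch of the general technique; the rank offset used here, (rank - 0.5) / n, and the averaging of tied ranks are assumptions, and RAREMETALWORKER's exact choices may differ.

```python
from statistics import NormalDist


def inverse_normal(values):
    """Replace each value by the normal quantile of its (average) rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values so they share one average rank.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1              # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]


transformed = inverse_normal([3.0, 1.0, 2.0])
print(transformed)
```

The transformed values keep the ordering of the input but follow an approximately standard normal distribution, which is why the trait summary line in the score file reports a mean near 0 and a variance near 1.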
  
 
=== Related individuals ===
 
=== Related individuals ===
* When you have genotype stored in ped file and dat file, and want to use pedigree kinship and inverse normalize trait values before adjusting any covariates and doing analysis:
 
  
   /bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --traitName LDL --inverseNormal --useCovariates
+
* When the pedigree is known and you want to use it to account for relatedness, the following command can be used:
 +
 
 +
   prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --kinPedigree --prefix yourFavoritePrefix
 +
 
 +
* When you want to use an estimated genomic relationship matrix to account for relatedness, the following command can be used:
 +
 
 +
  prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --prefix yourFavoritePrefix --kinGeno --kinSave (this will save the genomic relationship matrix for future use)
  
* When you have genotype stored in ped file and dat file, and want to use pedigree kinship and adjust covariates before inverse normalizing the residuals and doing further analysis:
+
* If the genomic relationship matrix has been saved previously and you want to use it to account for relatedness, the following command can be used:
  
   /bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --traitName LDL --useCovariates --makeResiduals --inverseNormal
+
   prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --prefix yourFavoritePrefix --kinFile yourPreviouslySavedKinship
  
* When you have genotype stored in ped file and dat file, and want to use kinship generated from genotypes:
+
=== Unrelated individuals ===
 
 
  /bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --kinGeno --kinSave --traitName LDL (--kinSave allows you to save kinship matrix for future use; it is optional.)
 
  
* When you have genotype stored in vcf file and want to use pedigree kinship:  
+
* To analyze individuals as unrelated, even if pedigree is known, you just have to use the following command:
  
   /bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz
+
   prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --prefix yourFavoritePrefix
  
* When you have genotype stored in vcf file and want to use kinship generated from genotype:
+
===Analyzing Chromosome X===
  
  /bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --kinGeno --kinSave (--kinSave allows you to save kinship matrix for future use.)
+
* To analyze markers on chromosome X, if relatedness is not considered, no special options need to be issued.
  
=== Unrelated individuals ===
+
* When relatedness is modeled using a linear mixed model and the pedigree is known, the following command uses both autosomal kinship and chromosome X kinship to fit a variance component model:
  
* Commands are the same as in above example, except each individual has to have a distinct family ID in PED file, and their father and mother ids should be "0".
+
  prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --kinPedigree --vcX --vcf yourInput.vcf.gz --prefix yourFavoritePrefix
* When you have genotypes from ped and marker information from dat file, and assuming no relatedness in the sample:
 
  
  ./raremetalworker --ped yours.ped --dat yours.dat
+
* Adding --separateX to the above command line will only use chromosome X kinship to fit the variance component model:
  
* When you have genotypes from vcf and covariates and trait information saved in ped and dat file, assuming there is no relatedness in the sample, you should use the following:
+
  prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --kinPedigree --vcX --separateX --vcf yourInput.vcf.gz --prefix yourFavoritePrefix
  
  ./raremetalworker --ped yours.ped --dat yours.dat --vcf yours.vcf.gz
+
* Please refer to [[RAREMETALWORKER_method | '''METHODS''']] for methods and [[RAREMETALWORKER_SPECIAL_TOPICS#Analyzing_Chromosome_X | '''SPECIAL TOPICS''']] for technical details.
  
* When you have genotypes from vcf and covariates and trait information saved in ped and dat file, assuming there is cryptic relatedness in the sample, you should use the following:
+
===Using MERLIN format PED/DAT INPUT FILES===
 +
* When genotypes are stored in MERLIN format PED/DAT files, the commands for the above analyses are the same, except that the --vcf option should be excluded.
 +
* Please refer to [[RAREMETALWORKER#PED_and_DAT_Files | '''PED/DAT format''']] for format requirements.
  
  ./raremetalworker --ped yours.ped --dat yours.dat --vcf yours.vcf.gz --kinGeno (# this will handle individuals as related, and generate kinship matrix from genotype.)
+
== Tutorial ==
 +
* For a comprehensive tutorial of RMW and RAREMETAL using example data sets, please go to the following:
  
== Q & A ==
+
  [http://genome.sph.umich.edu/wiki/Tutorial:_RareMETAL '''RAREMETAL and RAREMETALWORKER Tutorial''']

Latest revision as of 13:34, 20 November 2019


RAREMETALWORKER is a tool for single variant analysis, generating summary statistics for gene level meta analyses in RAREMETAL.

If you feel this program is useful, please tell us your name and contact in this registration.

If you have any questions, please contact Sai Chen (saichen at umich dot edu) or Goncalo Abecasis (goncalo at umich dot edu).


Useful Wiki Pages

There are several pages in this Wiki that may be useful to RAREMETALWORKER users. Here are links to key pages:

Key Features

RAREMETALWORKER has the following features:

  • Takes genotypes from either PED file or VCF file.
  • Generates summary statistics for both related and unrelated individuals.
  • Generates linkage disequilibrium matrices summarizing covariance between single marker statistics using an adjustable sliding window.
  • Optionally handles related individuals using a kinship matrix derived from either pedigree or genotype data.
  • Has the option of fitting shared environment.
  • Can handle variants on Chromosome X.
  • Calculates QC statistics such as HWE p-value, call rate and genomic control.
  • Automatically generates QQ and Manhattan plots.

Software Download and Installation

DOWNLOAD

We have tested compilation on several platforms including Linux, MAC OS X, and Windows.

For source code and executables, together with instructions for building from source, please go to DOWNLOAD source and executables.

For questions about building and compilation, please go to FAQ.

How to Execute

  • To execute the program, go to /RareMetalWorker_0.4.8/RareMetalWorker/bin and issue ./raremetalworker.
  • For example command lines, please refer to RAREMETALWORKER EXAMPLES.

Method

Method description and key formulae can be found in RAREMETALWORKER METHOD.

For Binary Traits

RAREMETALWORKER currently treats all traits as quantitative. If your trait is binary, the odds ratio can be approximated from the effect size estimates generated by RAREMETALWORKER. The installation/source package includes a script that appends the odds ratio estimates as the last column of the RAREMETALWORKER output. For details, please refer to Calculate Odds Ratio from RAREMETALWORKER output.

Software Specifications

INTERFACE

RAREMETALWORKER is a command line tool. Once you execute it, you will see a full list of options printed on the screen.

For detailed description of command options, please go to command reference.

Options:
      Input Files : --ped [], --dat [], --vcf [], --dosage, --noeof
     Output Files : --prefix [], --LDwindow [1000000], --zip, --thin,
                    --labelHits
       VC Options : --vcX, --separateX
    Trait Options : --makeResiduals, --inverseNormal, --traitName []
    Model Options : --recessive, --dominant
   Kinship Source : --kinPedigree, --kinGeno, --kinFile [], --kinxFile [],
                    --kinSave
  Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
     Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044],
                    --maleLabel [1], --femaleLabel [2]
           others : --cpu [1], --kinOnly,
                    --geneMap [../data/refFlat_hg19.txt]
        PhoneHome : --noPhoneHome, --phoneHomeThinning [100]

INPUT FILE FORMAT

RMW needs the following files as input: PED and DAT files in Merlin format, AND/OR a VCF file. When genotypes are stored in the PED and DAT files, the VCF file is not needed. However, even if genotypes are saved in a VCF file, PED and DAT files are still needed to carry covariate and trait information.

PED and DAT Files

  • When PED file has genotypes saved, there is no need for a VCF file as input.
  • RMW takes PED/DAT file in Merlin format. Please refer to PED/DAT format description for details.
  • PED file requires "dummy" parents to be included in the pedigree file. To check the integrity of your PED/DAT file, please use pedstats. To add dummy parents into the pedigree, please use the perl script.
  • An example PED file is shown below:
    1 1 0 0 1 1.5 1 23 A A A A A A A A A A
    2 1 0 0 1 1.0 1 34 A C A C A C A C A C
    3 1 0 0 2 0.4 1 43 A A A A A A A A A A
    4 1 0 0 2 0.9 1 13 A C A C A C A C A C
  • The matching DAT file is shown below:
 T YourTraitName
 C SEX
 C AGE
 M 1:123456
 M 1:234567
 M 2:111111
 M 2:222222
 M X:12345
  • The DAT file must have variant names in the following format: "M chr:pos".
  • The order of labels in the DAT file has to match the order of fields in the PED file.
  • Markers in the PED and DAT files must be sorted by chromosome and position.
  • Covariate and trait values are saved in PED file. Covariate and trait descriptions are saved in DAT file. Note that you must specify --makeResiduals in order to adjust the covariates out of the phenotype. See Example Command Lines for examples and Trait Options for more information.
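
The PED/DAT requirements above can be checked before a run. The following is a minimal sketch, not part of RMW: the function names are hypothetical, and it assumes the DAT file contains only T (trait), C (covariate) and M (marker) entries, with the five standard Merlin leading columns (family, individual, father, mother, sex) in the PED file.

```python
def expected_ped_fields(dat_lines):
    """Number of whitespace-separated fields each PED row should have."""
    n_fields = 5  # family ID, individual ID, father, mother, sex
    for line in dat_lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] in ("T", "C"):
            n_fields += 1  # one value per trait or covariate
        elif parts[0] == "M":
            n_fields += 2  # two alleles per marker
    return n_fields


def marker_names_ok(dat_lines):
    """True if every marker label has the form chr:pos with an integer position."""
    for line in dat_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "M":
            chrom, sep, pos = parts[1].partition(":")
            if not (chrom and sep and pos.isdigit()):
                return False
    return True


# The example DAT file and the first row of the example PED file from this page.
dat = ["T YourTraitName", "C SEX", "C AGE", "M 1:123456", "M 1:234567",
       "M 2:111111", "M 2:222222", "M X:12345"]
ped_row = "1 1 0 0 1 1.5 1 23 A A A A A A A A A A"

print(expected_ped_fields(dat), len(ped_row.split()), marker_names_ok(dat))
```

For a full integrity check of the pedigree itself, pedstats remains the tool to use; this sketch only compares column counts and marker labels.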

VCF File

GENOTYPES
  • Another option is to use VCF as input. Please refer to the following link for VCF file specification: 1000 genome wiki VCF specs
  • The VCF file should be compressed by bgzip and indexed by tabix, using the following commands:
 bgzip input.vcf     ## this command will produce input.vcf.gz
 tabix -p vcf -f input.vcf.gz  ## this command will produce input.vcf.gz.tbi
  • Even with the presence of VCF file, PED/DAT files are still needed for covariates and phenotypes.
  • Are you using PLINK file formats? Converting to VCF is easy. Use WDIST (very similar to PLINK) to make the conversion. Visit the WDIST page to find documentation and downloads for WDIST.


  • When genotypes are saved in a VCF file, PED and DAT files are used for specifying pedigree structure, covariate and trait information. An example command line might look like this:
 --ped input.ped --dat input.dat --vcf input.vcf.gz
  • When genotypes are saved in the PED file, the VCF file is not needed. An example command line might look like this:
 --ped input.ped --dat input.dat
DOSAGE
  • If you want to analyze dosage data from a VCF file, the --dosage option has to be specified. The keyword "DS" has to be included in the FORMAT field of the VCF file accordingly. An example is shown below:
 #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	IDx	ID1	ID2	ID3
 22	16050408	37239779	T	C	.	PASS	AC=2;AN=496	GT:DS:GP	./.:.:0,0,0	./.:.:0,0,0	./.:.:0,0,0	
 22	16050933	37239784	G	A	.	PASS	AC=141;AN=904	GT:DS:GP	0/0:0.0:1,0,0	0/0:0.0:1,0,0	0/0:0.0:1,0,0
  • --noeof allows using a VCF file without BGZF EOF markers. This option is rarely needed. If your run is terminated with error message: "", then you might want to check out this option.
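
Both VCF requirements above (a "DS" key in FORMAT for --dosage, and the BGZF EOF marker that --noeof works around) can be inspected with a few lines of Python. This is a sketch with hypothetical function names, not something RMW provides. The 28-byte EOF block is the empty BGZF block defined in the SAM/BAM specification; whether RMW checks for it in exactly this way is an assumption.

```python
import os
import tempfile

# Empty BGZF block that bgzip appends as an end-of-file marker (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def ends_with_bgzf_eof(path):
    """True if the file ends with the BGZF end-of-file marker."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        if size < len(BGZF_EOF):
            return False
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF


def format_has_dosage(vcf_record):
    """True if the FORMAT column (9th field) of a VCF data line lists DS."""
    fields = vcf_record.rstrip("\n").split("\t")
    return len(fields) > 8 and "DS" in fields[8].split(":")


# Small demonstration on a throwaway file and on a record from the example above.
fd, demo_path = tempfile.mkstemp(suffix=".vcf.gz")
os.write(fd, b"placeholder bytes" + BGZF_EOF)
os.close(fd)
print(ends_with_bgzf_eof(demo_path))
os.remove(demo_path)

record = "22\t16050933\t37239784\tG\tA\t.\tPASS\tAC=141;AN=904\tGT:DS:GP\t0/0:0.0:1,0,0"
print(format_has_dosage(record))
```

If the first check fails on a file that otherwise reads correctly, --noeof may be the option to try.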

OUTPUT

OUTPUT FILE NAMES

  • Three files are generated automatically by default:
 prefix.traitName.singlevar.score.txt (single variant summary statistics and QC statistics)
 prefix.traitName.singlevar.cov.txt (covariance matrices of single variant score statistics)
 prefix.singlevar.log (log file)
  • If --zip option is used, then the following will be generated automatically:
 prefix.traitName.singlevar.score.txt.gz
 prefix.traitName.singlevar.score.txt.gz.tbi
 prefix.traitName.singlevar.cov.txt.gz
 prefix.traitName.singlevar.cov.txt.gz.tbi
 prefix.singlevar.log
  • If --recessive and/or --dominant options are used, then the following files are also generated in addition to the above files:
 prefix.traitName.recessive.singlevar.score.txt.gz
 prefix.traitName.recessive.singlevar.cov.txt.gz
 prefix.traitName.dominant.singlevar.score.txt.gz
 prefix.traitName.dominant.singlevar.cov.txt.gz
  • If --kinGeno --kinSave is used, then the genomic relationship matrix is stored in
 prefix.Empirical.Kinship.gz
  • If --vcX option is used, then the genomic relationship matrix from chromosome X is stored in
 prefix.Empirical.KinshipX.gz
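
When RMW runs inside a pipeline, it can help to compute the expected output names up front. The sketch below follows the default and --zip naming listed above; the function name is hypothetical, and the recessive, dominant and kinship files are deliberately left out.

```python
def expected_outputs(prefix, trait, zipped=False):
    """List the default output files for one trait, with or without --zip."""
    stem = f"{prefix}.{trait}.singlevar"
    if zipped:
        # Each compressed table comes with a tabix index (.tbi).
        files = [f"{stem}.{kind}.txt.gz{ext}"
                 for kind in ("score", "cov")
                 for ext in ("", ".tbi")]
    else:
        files = [f"{stem}.{kind}.txt" for kind in ("score", "cov")]
    files.append(f"{prefix}.singlevar.log")  # the log name has no trait part
    return files


for name in expected_outputs("yourFavoritePrefix", "BMI", zipped=True):
    print(name)
```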

OUTPUT FILE FORMATS

Summary Statistics
  • The file named prefix.traitName.singlevar.score.txt contains the summary statistics that are needed by RAREMETAL. An example is shown below:
LDL mean= -0.00, variance=  1.00, heritability= 34.30 
CHR       POS REF_ALLELE ALT_ALLELE  INFORMATIVE_N  FOUNDER_AF    ALL_AF  INFORMATIVE_AC  HWE_PVALUE      STAT  ALT_ALLELE_EFFSIZE        PVALUE
 10  45410002          G          A           6103    0.034159  0.034159             410    0.165893  126.2050            0.309798  4.030740e-10
 19  45412079          G          A           6103    0.036812  0.036812             434    0.714645 -265.8400           -0.587356  7.878510e-36
 19  45414451          G          A           6103    0.444989  0.444989            5312    0.075927  -26.1212           -0.008371  6.400580e-01


  • The p-values in the above output are from the family-based single variant score test.
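
For downstream checks, the score file can be read with a short script. This is a sketch based only on the layout of the example above: it assumes whitespace-separated columns, a header line whose first field is CHR (a leading "#" is tolerated), and that any line before the header is a trait summary. The function name is hypothetical, and real output files may carry additional header lines.

```python
def parse_score_lines(lines):
    """Return (header, rows); each row is a dict of column name -> string."""
    header = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # Skip summary lines until the column header is found.
            if fields[0].lstrip("#") == "CHR":
                header = [fields[0].lstrip("#")] + fields[1:]
            continue
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return header, rows


example = """LDL mean= -0.00, variance=  1.00, heritability= 34.30
CHR       POS REF_ALLELE ALT_ALLELE  INFORMATIVE_N  FOUNDER_AF    ALL_AF  INFORMATIVE_AC  HWE_PVALUE      STAT  ALT_ALLELE_EFFSIZE        PVALUE
 10  45410002          G          A           6103    0.034159  0.034159             410    0.165893  126.2050            0.309798  4.030740e-10
 19  45412079          G          A           6103    0.036812  0.036812             434    0.714645 -265.8400           -0.587356  7.878510e-36
 19  45414451          G          A           6103    0.444989  0.444989            5312    0.075927  -26.1212           -0.008371  6.400580e-01
""".splitlines()

header, rows = parse_score_lines(example)
# Keep variants below a conventional genome-wide threshold (the threshold is an example).
hits = [(r["CHR"], r["POS"]) for r in rows if float(r["PVALUE"]) < 5e-8]
print(hits)
```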
LD Matrices
  • prefix.traitName.singlevar.cov.txt contains the LD matrix between a variant and the adjacent markers within a window of pre-specified size. The default window size is 1 Mb. It has the following format:
CHR     POS                            VAR_POS_IN_WINDOW                                                                  LD_MATRIX
  1  762320   762320,865628,865665,878744,879381,1560000  0.0359084,-0.000242112,-0.00125797,-0.000993422,-0.000344509,-0.00017077,
  1  865628  865628,865665,878744,879381,1560000,1864659           0.419804,-0.0103663,-0.00635265,0.0594056,0.0534505,-0.00462183,
  1  878744        878744,879381,1560000,1864659,1877659             0.000404537,-0.000235215,-1.4455e-05,-8.69137e-06,-3.1027e-05,
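
Each row of this file pairs a list of positions with a list of values of the same length. The sketch below splits one row into a position-to-value mapping; it assumes four whitespace-separated columns and tolerates the trailing comma seen in the example. The function name is hypothetical.

```python
def parse_cov_line(line):
    """Return (chrom, pos, {window_position: value}) for one data row."""
    chrom, pos, positions, values = line.split()
    window = [int(p) for p in positions.split(",") if p]
    cov = [float(v) for v in values.split(",") if v]  # drops the empty trailing item
    if len(window) != len(cov):
        raise ValueError("position and value lists differ in length")
    return chrom, int(pos), dict(zip(window, cov))


row = ("  1  762320   762320,865628,865665,878744,879381,1560000  "
       "0.0359084,-0.000242112,-0.00125797,-0.000993422,-0.000344509,-0.00017077,")
chrom, pos, by_position = parse_cov_line(row)
print(chrom, pos, by_position[pos])
```

In the example rows the first position in the window is the variant itself, so `by_position[pos]` is the value for the variant paired with itself.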
Genomic Relationship Matrix (GRM)
  • When the --kinGeno --kinSave --prefix options are used, a GRM is generated (compressed by gzip) with the name yourprefix.Empirical.Kinship.gz. If the --prefix option is not used, the file name is Empirical.Kinship.gz.
  • If the --vcX --kinGeno --kinSave --prefix options are used, then besides the autosomal GRM, a separate GRM for chromosome X is also saved (also compressed by gzip) under the name yourprefix.Empirical.KinshipX.gz.
  • The GRMs are generated based on all genotyped individuals included in the PED file; samples with missing phenotypes or missing covariates are not excluded from the GRMs. This feature makes GRMs reusable if you have multiple traits to analyze in separate runs. You can simply use the --kinFile option (plus the --kinxFile option if you have an X chromosome GRM and issue --vcX) to reuse the pre-saved GRMs.
  • The format of the autosomal and chromosome X GRMs is the same. The first row lists all sample IDs (sample size=N). The rest of the file is a symmetric matrix of dimension NxN, and element ij of this matrix represents the kinship between the i-th and the j-th samples, whose IDs can be found in the first row.
  • For details about GRM calculation, please refer to method.
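
A saved GRM can be sanity-checked before it is reused with --kinFile. The sketch below reads the layout described above (one row of IDs followed by an NxN matrix). It assumes whitespace-separated fields, which this page does not state explicitly; the function names and the toy matrix are made up for illustration.

```python
import gzip


def parse_grm(lines):
    """Return (ids, matrix) from an ID row followed by an NxN matrix."""
    rows = [line.split() for line in lines if line.strip()]
    ids = rows[0]
    matrix = [[float(x) for x in row] for row in rows[1:]]
    n = len(ids)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix is not NxN for the N listed sample IDs")
    return ids, matrix


def read_grm(path):
    """Read a gzip-compressed GRM such as yourprefix.Empirical.Kinship.gz."""
    with gzip.open(path, "rt") as handle:
        return parse_grm(handle)


def is_symmetric(matrix, tol=1e-8):
    n = len(matrix)
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


# Toy input with three made-up samples.
toy = ["ID1 ID2 ID3", "0.50 0.25 0.00", "0.25 0.50 0.01", "0.00 0.01 0.50"]
ids, grm = parse_grm(toy)
print(ids, is_symmetric(grm))
```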
Log File
  • RMW automatically generates a log file named "yourprefix.singlevar.log".
  • The first part of the log file has options used for your analysis saved.
The following parameters are in effect:

Input Files:
============================
--ped [pheno.ped]
--dat [pheno.dat] 
--vcf [allvars.vcf.gz]
--dosage [false]
--noeof [false]

Output Files:
============================
--prefix [rmw.test]
--LDwindow [1000000]
--zip [false]
--thin [false]
--labelHits [false]

VC Options:
============================
--vcX [true]
--separateX [true]

Trait Options:
============================
--makeResiduals [false]
--inverseNormal [false]
--traitName []

Model Options:
============================
--recessive [false]
--dominant [false]

Kinship Source:
============================
--kinPedigree [true]
--kinGeno [false]
--kinFile []
--kinxFile []
--kinSave [false]

Kinship Options:
============================
--kinMaf [0.05]
--kinMiss [0.05]

Chromosome X:
============================
xLabel [X]
xStart [2699520]
xEnd [154931044]
maleLabel [1]
femaleLabel [2]
  • The second part of the log file has all warnings and running messages saved.
Plots
  • RAREMETALWORKER generates QQ and Manhattan plots automatically, unless only a trivial number of variants are analyzed.
  • RAREMETALWORKER stores the plots for each trait in separate files named yourprefix.traitname.plots.pdf.
  • RAREMETALWORKER stores plots for recessive and dominant results in separate files named yourprefix.traitname.recessive.plots.pdf and yourprefix.traitname.dominant.plots.pdf.
  • RAREMETALWORKER automatically generates three stratified QQ plots: one with all variants, one with variants of MAF<0.05, and one with variants of MAF<0.01.
  • Genomic controls are automatically calculated and labeled in QQ plots.
  • By using --labelHits option, users can choose to label the hits.
  • Here are an example QQ plot and Manhattan plot generated by RAREMETALWORKER:
 [Figure: QQ.png (example QQ plot)]
 [Figure: Single var manhattan.png (example Manhattan plot)]
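
The genomic control value shown on the QQ plots can be approximated from the p-values in the score file. The sketch below uses the common definition (median observed 1-df chi-square divided by its expected median, about 0.455) and the three MAF strata listed above. This page does not spell out how RMW computes the value, so treat this as an illustration rather than a reimplementation; the function names are hypothetical.

```python
from statistics import NormalDist, median

_NORM = NormalDist()
# Median of a 1-df chi-square distribution, about 0.4549.
MEDIAN_CHI2_1DF = _NORM.inv_cdf(0.75) ** 2


def genomic_control(pvalues):
    """Median chi-square implied by two-sided p-values over its expected median."""
    chi2 = [_NORM.inv_cdf(p / 2.0) ** 2 for p in pvalues if 0.0 < p <= 1.0]
    return median(chi2) / MEDIAN_CHI2_1DF


def stratified_lambdas(records):
    """records: iterable of (allele_frequency, pvalue) pairs."""
    strata = {"all": [], "maf<0.05": [], "maf<0.01": []}
    for af, p in records:
        maf = min(af, 1.0 - af)
        strata["all"].append(p)
        if maf < 0.05:
            strata["maf<0.05"].append(p)
        if maf < 0.01:
            strata["maf<0.01"].append(p)
    return {name: genomic_control(ps) for name, ps in strata.items() if ps}


# Evenly spaced p-values mimic a null distribution, so lambda should be near 1.
null_pvalues = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_control(null_pvalues), 3))
```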

SPECIAL TOPICS

  • For special topics such as how RAREMETALWORKER handles missing data, unrelated individuals, and markers on chromosome X, please go to SPECIAL TOPICS.

Example Command Lines

The following lists a few popular combinations of options used for analyses. For an itemized description of options, please go to COMMAND REFERENCE.

General Usage

  • If your PED file has many traits but you only want one of them to be analyzed, then the following command does the trick:
 prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --traitName BMI --prefix yourFavoritePrefix
  • If you want to inverse normalize (quantile normalize) your trait before doing associations, this can be done by adding --inverseNormal to your command line:
 prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --traitName BMI --prefix yourFavoritePrefix --inverseNormal
  • The following command will adjust for covariates first and then use the residuals for association analysis:
 prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --traitName BMI --prefix yourFavoritePrefix --makeResiduals
  • The following command will adjust for covariates first and then use the inverse normalized residuals for association analysis:
 prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --traitName BMI --prefix yourFavoritePrefix --makeResiduals --inverseNormal
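
To make the effect of --inverseNormal concrete, here is a sketch of a rank-based inverse normal transformation. The ranking convention used here (average ranks for ties and the quantile (rank - 0.5)/n) is an assumption for illustration; this page does not state which convention RMW uses. The function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normal(values, offset=0.5):
    """Map values to standard normal quantiles based on their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    norm = NormalDist()
    return [norm.inv_cdf((rank - offset) / n) for rank in ranks]


trait = [27.4, 19.8, 23.1, 35.0, 23.1]  # made-up BMI values
print([round(z, 3) for z in inverse_normal(trait)])
```

With --makeResiduals --inverseNormal, the same kind of transformation is applied to the covariate-adjusted residuals instead of the raw trait values.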

Related individuals

  • When the pedigree is known and you want to use it to account for relatedness, the following command can be used:
 prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --kinPedigree --prefix yourFavoritePrefix
  • When you want to use an estimated genomic relationship matrix to account for relatedness, the following command can be used:
 prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --prefix yourFavoritePrefix --kinGeno --kinSave (this will save the genomic relationship matrix for future use)
  • If the genomic relationship matrix has been saved previously and you want to use it to account for relatedness, the following command can be used:
 prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --prefix yourFavoritePrefix --kinFile yourPreviouslySavedKinship

Unrelated individuals

  • To analyze individuals as unrelated, even if pedigree is known, you just have to use the following command:
 prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --vcf yourInput.vcf.gz --prefix yourFavoritePrefix

Analyzing Chromosome X

  • To analyze markers on chromosome X, if relatedness is not considered, no special options need to be issued.
  • When relatedness is modeled using a linear mixed model and the pedigree is known, the following command uses both autosomal kinship and chromosome X kinship to fit a variance component model:
 prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --kinPedigree --vcX --vcf yourInput.vcf.gz --prefix yourFavoritePrefix 
  • Adding --separateX to the above command line will only use chromosome X kinship to fit the variance component model:
 prompt> $PATH/bin/raremetalworker --ped yourInput.ped --dat yourInput.dat --kinPedigree --vcX --separateX --vcf yourInput.vcf.gz --prefix yourFavoritePrefix

Using MERLIN format PED/DAT INPUT FILES

  • When genotypes are stored in MERLIN format PED/DAT files, the commands for the above analyses are the same, except that the --vcf option should be excluded.
  • Please refer to PED/DAT format for format requirements.
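
The command lines in this section differ only in a few options, so scripted runs can assemble them from one helper. The sketch below is a hypothetical wrapper, not part of RMW: the function and parameter names are made up, the binary path is a placeholder, and only options that appear in the option list on this page are emitted.

```python
import shlex

KINSHIP_FLAGS = {"pedigree": "--kinPedigree", "geno": "--kinGeno"}


def build_rmw_command(ped, dat, vcf=None, prefix=None, trait=None,
                      kinship=None, kin_file=None, extra_flags=(),
                      binary="raremetalworker"):
    """Return the argument list for one RMW run."""
    cmd = [binary, "--ped", ped, "--dat", dat]
    if vcf:  # omit --vcf when genotypes are stored in the PED/DAT files
        cmd += ["--vcf", vcf]
    if trait:
        cmd += ["--traitName", trait]
    if prefix:
        cmd += ["--prefix", prefix]
    if kinship:
        cmd.append(KINSHIP_FLAGS[kinship])
    if kin_file:
        cmd += ["--kinFile", kin_file]
    cmd += list(extra_flags)  # e.g. --makeResiduals, --inverseNormal, --vcX
    return cmd


cmd = build_rmw_command("yourInput.ped", "yourInput.dat",
                        vcf="yourInput.vcf.gz", prefix="yourFavoritePrefix",
                        kinship="pedigree", extra_flags=("--vcX",))
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual file names.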

Tutorial

  • For a comprehensive tutorial of RMW and RAREMETAL using example data sets, please go to the following:
 RAREMETAL and RAREMETALWORKER Tutorial