RAREMETALWORKER command reference

From Genome Analysis Wiki
Revision as of 13:16, 14 April 2014 by Shuang Feng

List of Options

 Options:
       Input Files : --ped [], --dat [], --vcf [], --dosage, --noeof
      Output Files : --prefix [], --LDwindow [1000000], --zip, --thin,
                     --labelHits
        VC Options : --vcX, --separateX
     Trait Options : --makeResiduals, --inverseNormal, --traitName []
     Model Options : --recessive, --dominant
    Kinship Source : --kinPedigree, --kinGeno, --kinFile [], --kinxFile [],
                     --kinSave
   Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
      Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044],
                     --maleLabel [1], --femaleLabel [2]
         PhoneHome : --noPhoneHome, --phoneHomeThinning [100]

Input Files

--ped

--dat

--vcf

  • --vcf takes the name of your VCF file.

--dosage

  • When --dosage is issued on the command line, RAREMETALWORKER reads dosages from your VCF file.
  • --dosage must be used with the --vcf option.
  • A description of the dosage format in a VCF file can be found in dosage.

--noeof

  • If your VCF file does not have the BGZF EOF marker, use the --noeof option so that RAREMETALWORKER skips checking for the BGZF EOF marker at the end of the file.
  • Please see BGZF EOF for more details.

Output Files

--prefix

  • --prefix takes a string that is used as the prefix of your output files.
  • For a full list of output files generated by RAREMETALWORKER, please refer to output.

--LDwindow

  • --LDwindow takes an integer value, in bases, as the size of the moving window.
  • RAREMETALWORKER generates LD matrices between the marker it is currently working on and all markers within this window.
  • The default size is 1 million bases.
  • For more information about the LD matrix, please refer to LD matrix.
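To make the window concrete, here is a minimal Python sketch that lists, for each marker, the markers whose positions fall within the window. It assumes sorted positions on a single chromosome and a window that extends downstream of the current marker; whether RAREMETALWORKER uses a one-sided or a centered window is not stated on this page, and `markers_in_window` is a hypothetical helper.

```python
from bisect import bisect_right


def markers_in_window(positions, window=1_000_000):
    """For each marker, return the indices of the markers (itself included)
    whose positions lie within `window` bases downstream of it.

    Assumptions: `positions` is sorted ascending, all markers are on one
    chromosome, and the window is one-sided (downstream only).
    """
    result = []
    for i, pos in enumerate(positions):
        # index of the first marker lying beyond pos + window
        end = bisect_right(positions, pos + window)
        result.append(list(range(i, end)))
    return result


print(markers_in_window([100, 500_000, 1_000_100, 2_500_000]))
# prints [[0, 1, 2], [1, 2], [2], [3]]
```

A larger --LDwindow therefore means more markers per row of the LD output and a larger output file.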

--zip

--thin

  • If --thin is issued, RAREMETALWORKER generates QQ plots and Manhattan plots with fewer points (lower resolution), which makes the PDF files smaller.

--labelHits

  • If --labelHits is issued, RAREMETALWORKER automatically labels the loci that pass a significance threshold.
  • The threshold is calculated using Bonferroni correction (0.05/N, where N is the total number of polymorphic markers).
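The Bonferroni rule can be sketched in Python as below. `bonferroni_hits` is a hypothetical helper, and treating a p-value exactly equal to the threshold as not significant is an assumption.

```python
def bonferroni_hits(pvalues, alpha=0.05):
    """Return the Bonferroni threshold alpha/N and the markers passing it.

    `pvalues` maps marker name -> p-value for the N polymorphic markers tested.
    Assumption: a p-value equal to the threshold does not count as a hit.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("no markers tested")
    threshold = alpha / n
    hits = sorted(name for name, p in pvalues.items() if p < threshold)
    return threshold, hits


threshold, hits = bonferroni_hits({"rs1": 1e-8, "rs2": 0.3, "rs3": 2e-7, "rs4": 0.5})
print(threshold, hits)
```

With four markers the threshold is 0.05/4 = 0.0125, so only rs1 and rs3 would be labeled.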

VC Options

--vcX

  • --vcX must be used with --kinPedigree (when pedigree kinship is used), --kinGeno (when the genomic relationship matrix, GRM, is estimated), or --kinFile (when the GRM is read from a file).
  • Using --vcX lets RAREMETALWORKER fit a linear mixed model for chromosome X that uses both the autosomal kinship and the chromosome X kinship.

--separateX

  • --separateX option must be used with --vcX option.
  • Using --separateX option requests RAREMETALWORKER to fit a linear mixed model using only chromosome X kinship for analyses of chromosome X markers.

Please refer to method and technical details for more explanation.
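As a rough illustration of what the two options change, the trait covariance under a generic variance-component model can be written as follows. This is a sketch only: the symbols (covariate matrix $W$, autosomal kinship $K$, chromosome X kinship $K_X$, variance components $\sigma^2$) are introduced here, and RAREMETALWORKER's exact parameterization is given in method and technical details.

```latex
\begin{aligned}
y &= W\beta + \varepsilon, \\
\text{with --vcX:}\quad
  \operatorname{Var}(y) &= \sigma_g^2 K + \sigma_x^2 K_X + \sigma_e^2 I, \\
\text{with --vcX and --separateX (chromosome X markers):}\quad
  \operatorname{Var}(y) &= \sigma_x^2 K_X + \sigma_e^2 I.
\end{aligned}
```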

Trait Options

--makeResiduals

  • If --makeResiduals is used, covariates are adjusted for first, and the resulting residuals are used to fit the linear models.

--inverseNormal

  • If --inverseNormal is used, but not with --makeResiduals, then trait values are inverse normalized before fitting linear models.
  • If --inverseNormal and --makeResiduals are used together, then covariates are adjusted and inverse normalized residuals are used to fit linear models.
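A rank-based inverse normal transformation, of the kind --inverseNormal applies to trait values or residuals, can be sketched in Python as follows. The offset (rank - 0.5)/n and the use of average ranks for ties are assumptions; RAREMETALWORKER may use a different offset. `inverse_normal` is a hypothetical helper.

```python
from statistics import NormalDist


def inverse_normal(values):
    """Rank-based inverse normal transform.

    Assumptions: ranks are mapped through (rank - 0.5) / n and tied values
    receive their average rank; RAREMETALWORKER's exact offset may differ.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    dist = NormalDist()  # standard normal: mean 0, sd 1
    return [dist.inv_cdf((r - 0.5) / n) for r in ranks]


print(inverse_normal([3.0, 1.0, 2.0]))
```

The median value maps to 0, and the transformed values are symmetric around it, whatever the shape of the original distribution.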

--traitName

  • --traitName takes the name of the trait you want to analyze.
  • If this option is not used, then all traits included in PED/DAT files are analyzed.

Model Options

--recessive

  • If --recessive is used, then RAREMETALWORKER generates recessive results in addition to the additive results.
  • A description of the recessive association results can be found in recessive output.
  • A separate PDF file with QQ and Manhattan plots based on the recessive results is generated, named yourprefix.traitname.recessive.plots.pdf.

--dominant

  • If --dominant is used, RAREMETALWORKER generates dominant results in addition to the additive results.
  • A description of the dominant association results can be found in dominant output.
  • A separate PDF file with QQ and Manhattan plots based on the dominant results is generated, named yourprefix.traitname.dominant.plots.pdf.

Kinship Source

--kinPedigree

  • If --kinPedigree is used, the pedigree structure coded in the PED file is used to generate a kinship matrix, which is then used to fit a linear mixed model before association testing.

--kinGeno

  • If --kinGeno is used, a genomic relationship matrix (GRM) is estimated from the genotypes.
  • If --vcX is also used, a separate GRM for chromosome X is estimated as well.
  • For details about how the GRM is estimated, please refer to methods.

--kinFile

  • --kinFile takes the name of a file containing a previously saved GRM, in the format described in format.
  • This option reads the GRM from the file and then extracts the sub-matrix for the samples to be analyzed according to your specifications, such as the traits to be analyzed and missing covariates and genotypes (please refer to missing data for more details).
  • --kinFile cannot be used together with --kinGeno.

--kinxFile

  • --kinxFile must be used with --kinFile and --vcX.
  • --kinxFile takes the name of the previously saved GRM file for chromosome X.
  • If --kinxFile is not used, but --kinFile your.autosomal.Empirical.Kinship.gz and --vcX are issued on the command line, RAREMETALWORKER looks for a chromosome X kinship file named your.autosomal.Empirical.KinshipX.gz. If this file is not found either, a FATAL ERROR occurs.
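The default file-name lookup can be sketched in Python. `default_kinx_file` is a hypothetical helper that follows the single example given in the text; how names without a .gz suffix are handled is an assumption.

```python
def default_kinx_file(kin_file):
    """Guess the chromosome X kinship file name from the --kinFile name.

    Follows the example in the text (...Kinship.gz -> ...KinshipX.gz);
    the handling of names without a .gz suffix is an assumption.
    """
    suffix = ".gz"
    if kin_file.endswith(suffix):
        return kin_file[: -len(suffix)] + "X" + suffix
    return kin_file + "X"


print(default_kinx_file("your.autosomal.Empirical.Kinship.gz"))
# prints your.autosomal.Empirical.KinshipX.gz
```

Keeping the files produced by --kinSave together, with their original names, is the simplest way to make this lookup succeed.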

--kinSave

  • This option must be used with --kinGeno.
  • Issuing --kinSave makes RAREMETALWORKER store the estimated GRM in a file named yourprefix.Empirical.Kinship.gz.
  • If --vcX is also issued in the command line, then a separate file named yourprefix.Empirical.KinshipX.gz will be generated where the GRM of chromosome X is saved.
  • For formats of the saved genomic relationship matrix, please refer to format.

Kinship Options

--kinMaf

--kinMiss

Chromosome X

--xLabel

--xStart

  • The default is 2699520.

--xEnd

  • The default is 154931044.

PhoneHome

--noPhoneHome

--phoneHomeThinning

Output Files

  • --prefix is optional.
  • If --prefix is not specified, the output file names will be:
 traitname.singlevar.score.txt
 traitname.singlevar.cov.txt
  • Otherwise, the output file names are:
 prefix.traitname.singlevar.score.txt
 prefix.traitname.singlevar.cov.txt
  • --LDwindow specifies the length of the window within which the LD matrix is generated for each variant. The default is 1 Mb.
  • --zip writes bgzip-compressed output files automatically, for convenient sharing.
  • --thin tells RMW (RAREMETALWORKER) to thin points when generating QQ plots and Manhattan plots, so that the files are smaller.
  • --labelHits tells RMW to label the hits that pass the p-value threshold 0.05/(# of variants tested) with gene names, based on human genome build 19.
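The naming rule for the single-variant output files can be sketched in Python. `output_names` is a hypothetical helper; any extra suffix that --zip may add is not modeled here.

```python
def output_names(trait, prefix=None):
    """Single-variant output file names for one trait.

    Without a prefix the names start with the trait name; with a prefix
    they start with "prefix.traitname". Suffixes added by --zip are not
    modeled (assumption: unknown here).
    """
    stem = f"{prefix}.{trait}" if prefix else trait
    return [f"{stem}.singlevar.score.txt", f"{stem}.singlevar.cov.txt"]


print(output_names("LDL"))
# prints ['LDL.singlevar.score.txt', 'LDL.singlevar.cov.txt']
print(output_names("LDL", prefix="study1"))
# prints ['study1.LDL.singlevar.score.txt', 'study1.LDL.singlevar.cov.txt']
```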

VC Options

  • When --vcShared and/or --vcX are specified, RMW fits a shared-environment and/or chromosome X variance component, respectively, together with the genetic component and the non-shared environment.
  • When --makeResiduals is specified, RMW reads covariates from the PED/DAT files. Covariates are modeled as fixed effects.

Trait Options

  • --makeResiduals tells RMW to adjust for the covariates and analyze the residuals instead of the original phenotypes. If either --kinGeno or --kinPedigree is used, a variance component model is fitted to the residuals. If --inverseNormal is also used, the residuals are quantile normalized before the variance component model is fitted.
  • --traitName is intended for situations where your PED and DAT files contain many traits but you are interested in only one or a few of them. It accepts either a file ending in .txt with each trait of interest on a separate line, or trait names separated by "/". Examples for one trait and for multiple traits:
  --traitName LDL
  --traitName LDL/HDL/TG
  --traitName traitsOfInterest.txt
  • If --traitName is not used, all traits in PED/DAT file will be analyzed.
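How a --traitName value is interpreted can be sketched in Python. `parse_trait_names` is a hypothetical helper; skipping blank lines and empty names is an assumption.

```python
def parse_trait_names(value):
    """Interpret a --traitName value as described above.

    A value ending in .txt is read as a file with one trait per line;
    anything else is split on "/". Skipping blank lines and empty names
    is an assumption.
    """
    if value.endswith(".txt"):
        with open(value) as handle:
            return [line.strip() for line in handle if line.strip()]
    return [name for name in value.split("/") if name]


print(parse_trait_names("LDL"))         # prints ['LDL']
print(parse_trait_names("LDL/HDL/TG"))  # prints ['LDL', 'HDL', 'TG']
```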

Model Options

  • The additive model is used by default.
  • --recessive generates additional association results (p-value, effect size, and standard error) under a recessive model. If a VCF file is used, the non-reference allele is considered the recessive allele. If PED/DAT files are used for genotypes, the minor allele is considered the recessive allele.
  • --dominant generates additional association results (p-value, effect size, and standard error) under a dominant model. If a VCF file is used, the non-reference allele is considered the dominant allele. If PED/DAT files are used for genotypes, the minor allele is considered the dominant allele.
  • --recessive and --dominant options can be used together.
  • Recessive and dominant results are stored in separate files.
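The usual genotype coding behind the three models can be sketched in Python for hard-call genotypes. The 0/1 indicator coding is the textbook convention and an assumption here; how RAREMETALWORKER recodes dosages is not covered. `recode` is a hypothetical helper.

```python
def recode(allele_count, model):
    """Recode a hard-call genotype for the chosen genetic model.

    allele_count: copies (0, 1 or 2) of the tested allele, i.e. the
    non-reference allele for VCF input or the minor allele for PED input.
    The 0/1 indicator coding below is an assumption, not RMW's source.
    """
    if model == "additive":
        return allele_count
    if model == "recessive":
        return 1 if allele_count == 2 else 0  # two copies needed
    if model == "dominant":
        return 1 if allele_count >= 1 else 0  # one copy is enough
    raise ValueError(f"unknown model: {model}")


for model in ("additive", "recessive", "dominant"):
    print(model, [recode(g, model) for g in (0, 1, 2)])
```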

Kinship Source

  • --kinPedigree lets RMW generate a kinship matrix from the pedigree, when pedigree information is available.
  • --kinGeno tells RMW to generate a kinship matrix from all available variants that pass the criteria specified by the --kinMaf and --kinMiss options. By default, variants with MAF > 0.05 and genotype missing rate < 0.05 are used.
  • --kinGeno can NOT be used with --kinPedigree or --kinFile. At most one of these three options can be used in a run.
  • --kinFile lets RMW read a kinship matrix from a file. The first row of the kinship file must contain the sample IDs included in the file. If a sample of interest is not included in the kinship file, a fatal error occurs and the program terminates. A sample of interest is a sample that is phenotyped and, when --makeResiduals is specified, has all covariates measured.
  • --kinSave allows you to save the kinship matrix.
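Extracting the kinship entries for the samples of interest can be sketched in Python. The sketch assumes the matrix has already been read into a list of rows ordered like the header's sample IDs; the on-disk file format is not modeled, and `subset_kinship` is a hypothetical helper.

```python
def subset_kinship(ids, matrix, samples_of_interest):
    """Extract the kinship sub-matrix for the samples to be analyzed.

    `ids` is the header row of the kinship file (sample IDs, in matrix
    order) and `matrix` the full square matrix as a list of rows.
    Assumption: the file has already been parsed into these two objects.
    """
    index = {sample_id: i for i, sample_id in enumerate(ids)}
    missing = [s for s in samples_of_interest if s not in index]
    if missing:
        # mirrors the documented behavior: a missing sample is fatal
        raise ValueError(f"FATAL ERROR: samples not in kinship file: {missing}")
    rows = [index[s] for s in samples_of_interest]
    return [[matrix[r][c] for c in rows] for r in rows]


ids = ["A", "B", "C"]
kin = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.25], [0.0, 0.25, 1.0]]
print(subset_kinship(ids, kin, ["B", "C"]))
# prints [[1.0, 0.25], [0.25, 1.0]]
```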

Kinship Options

  • --kinMiss and --kinMaf should be used together with --kinGeno.
  • --kinMiss specifies the maximum genotype missing rate when calculating kinship from genotypes. The default is 0.05.
  • --kinMaf specifies the minimum minor allele frequency used when calculating kinship from genotypes. The default is 0.05.
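The variant filter implied by --kinMaf and --kinMiss can be sketched in Python. Only the filtering step is shown; the GRM estimate built from the passing variants is described in methods. Genotypes are assumed to be allele counts with None for missing calls, and `passes_kinship_filters` is a hypothetical helper.

```python
def passes_kinship_filters(genotypes, kin_maf=0.05, kin_miss=0.05):
    """Decide whether a variant is used for the genomic kinship estimate.

    `genotypes` holds allele counts (0, 1, 2) per sample, with None for a
    missing call (assumed input representation). Keeps the variant if
    MAF > kin_maf and missing rate < kin_miss, as stated above.
    """
    n = len(genotypes)
    observed = [g for g in genotypes if g is not None]
    if not observed:
        return False
    missing_rate = 1 - len(observed) / n
    freq = sum(observed) / (2 * len(observed))
    maf = min(freq, 1 - freq)  # fold to the minor allele
    return maf > kin_maf and missing_rate < kin_miss


print(passes_kinship_filters([0] * 10 + [1] * 8 + [2] * 2))  # prints True
print(passes_kinship_filters([0] * 19 + [1]))                # prints False
```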

Chromosome X

  • --xLabel takes a string that specifies how variants on chromosome X are coded. The default is "X".
  • --xStart and --xEnd specify the start and end of the non-pseudo-autosomal region on chromosome X. These options should be specified when --vcX is used.
  • The defaults are 2699520 for --xStart and 154931044 for --xEnd, according to NCBI genome build 37.
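Classifying a variant as lying in the non-pseudo-autosomal region can be sketched in Python. This sketch treats both bounds as exclusive, on the assumption that they mark the pseudo-autosomal boundaries (in build 37, PAR1 ends at 2,699,520 and PAR2 begins at 154,931,044); RAREMETALWORKER's own boundary handling may differ. `is_non_par_x` is a hypothetical helper.

```python
X_START = 2699520   # default --xStart (build 37)
X_END = 154931044   # default --xEnd (build 37)


def is_non_par_x(chrom, pos, x_label="X", x_start=X_START, x_end=X_END):
    """True if a variant lies in the non-pseudo-autosomal region of X.

    Assumption: x_start and x_end are the PAR boundary positions
    themselves, so both bounds are treated as exclusive.
    """
    return chrom == x_label and x_start < pos < x_end


print(is_non_par_x("X", 100_000_000))  # prints True
print(is_non_par_x("X", 1_000_000))    # prints False (inside PAR1)
print(is_non_par_x("1", 100_000_000))  # prints False (autosome)
```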

For the analysis of X-linked variants, please refer to ANALYZING CHROMOSOME X.

PhoneHome Parameters

See PhoneHome for more information on how PhoneHome works and what it does.

  • --noPhoneHome disables PhoneHome. PhoneHome is enabled by default based on the thinning parameter.
  • --phoneHomeThinning (0-100) adjusts the frequency of PhoneHome.
    • By default, --phoneHomeThinning is set to 50, running 50% of the time.
    • PhoneHome will only occur if the run's random number modulo 100 is less than the --phoneHomeThinning value.
    • N/A if --noPhoneHome is set.
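The thinning rule can be sketched in Python. How the run's random number is drawn is an assumption (here `random.getrandbits(31)`), and `should_phone_home` is a hypothetical helper; the rule itself follows the description above.

```python
import random


def should_phone_home(thinning, random_number, no_phone_home=False):
    """Return True if this run would phone home.

    thinning: the --phoneHomeThinning value (0-100).
    random_number: the run's random number (drawing method is assumed).
    """
    if no_phone_home:  # --noPhoneHome overrides the thinning value
        return False
    return random_number % 100 < thinning


# Example: draw a per-run random number and apply a thinning value of 50.
run_number = random.getrandbits(31)
print(should_phone_home(50, run_number))
```

Because the random number modulo 100 is spread roughly evenly over 0-99, a thinning value of t makes about t% of runs phone home.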