Difference between revisions of "RAREMETALWORKER command reference"

From Genome Analysis Wiki
 
(19 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
[[Category:RAREMETALWORKER]]
 
==Useful Links==
 
==Useful Links==
  
 
Here are some useful links to key pages:
 
Here are some useful links to key pages:
* [[RAREMETALWORKER | '''RAREMETALWORKER documentation''']]
+
* The [[RAREMETALWORKER | '''RAREMETALWORKER documentation''']]
* [[RAREMETALWORKER_METHOD | '''RAREMETALWORKER method''']]
+
* The [[RAREMETALWORKER_METHOD | '''RAREMETALWORKER method''']]
* [[RAREMETALWORKER_SPECIAL_TOPICS | '''RAREMETALWORKER special topics''']]
+
* The [[RAREMETALWORKER_SPECIAL_TOPICS | '''RAREMETALWORKER special topics''']]
* [[RAREMETAL_FAQ | '''FAQ''']]
+
* The [[Tutorial:_RAREMETAL | '''RAREMETALWORKER quick start tutorial''']]
 +
* The [[RAREMETAL_FAQ | '''FAQ''']]
  
==List of Options ==
+
Options:
 
+
      Input Files : --ped [], --dat [], --vcf [], --dosage, --flagDosage [DS],
  Options:
+
                    --noeof
        Input Files : --ped [], --dat [], --vcf [], --dosage, --noeof
+
      Output Files : --prefix [], --LDwindow [1000000], --zip, --thin,
      Output Files : --prefix [], --LDwindow [1000000], --zip, --thin,
+
                    --labelHits
                      --labelHits
+
        VC Options : --vcX, --separateX
        VC Options : --vcX, --separateX
+
    Trait Options : --makeResiduals, --inverseNormal, --traitName []
      Trait Options : --makeResiduals, --inverseNormal, --traitName []
+
    Model Options : --recessive, --dominant
      Model Options : --recessive, --dominant
+
    Kinship Source : --kinPedigree, --kinGeno, --kinFile [], --kinxFile [],
    Kinship Source : --kinPedigree, --kinGeno, --kinFile [], --kinxFile [],
+
                    --kinSave
                      --kinSave
+
  Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
    Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
+
      Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044],
      Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044],
+
                    --maleLabel [1], --femaleLabel [2]
                      --maleLabel [1], --femaleLabel [2]
+
            others : --cpu [1], --kinOnly,
          PhoneHome : --noPhoneHome, --phoneHomeThinning [100]
+
                    --geneMap [../data/refFlat_hg19.txt], --mergedVCFID
 +
        PhoneHome : --noPhoneHome, --phoneHomeThinning [100]
  
 
==Input Files==
 
==Input Files==
Line 34: Line 37:
 
* --dosage must be used with --vcf option.
 
* --dosage must be used with --vcf option.
 
* Description of dosage format in a VCF file can be found in [[RAREMETALWORKER#DOSAGE | '''dosage''']].
 
* Description of dosage format in a VCF file can be found in [[RAREMETALWORKER#DOSAGE | '''dosage''']].
 +
 +
===--flagDosage===
 +
* This option let user customize the name of field in VCF file that labels dosage data.
 +
* The default is "DS".
 +
 
===--noeof===
 
===--noeof===
 
* If you VCF file does not have the BGZF EOF markers, you should use --noeof option to let RAREMETALWORKER skip checking the BGZF EOF markers at the end of the file.  
 
* If you VCF file does not have the BGZF EOF markers, you should use --noeof option to let RAREMETALWORKER skip checking the BGZF EOF markers at the end of the file.  
Line 51: Line 59:
  
 
===--zip===
 
===--zip===
* By issuing --zip, RAREMETALWORKER compress the [[ RAREMETALWORKER#Summary_Statistics| '''summary statistics''']] and [[RAREMETALWORKER#LD_Matrices | '''LD matrices''']] generated automatically, using gzip.  
+
* By issuing --zip, RAREMETALWORKER compress the [[ RAREMETALWORKER#Summary_Statistics| '''summary statistics''']] and [[RAREMETALWORKER#LD_Matrices | '''LD matrices''']] generated automatically, using gzip. And the output zip files will be indexed using tabix.
  
 
=== --thin ===
 
=== --thin ===
Line 73: Line 81:
  
 
==Trait Options==
 
==Trait Options==
 +
===--makeResiduals===
 +
* If --makeResiduals is used, then covariates are adjusted before fitting linear models using residuals.
 +
 +
===--inverseNormal===
 +
* If --inverseNormal is used, but not with --makeResiduals, then trait values are inverse normalized before fitting linear models.
 +
* If --inverseNormal and --makeResiduals are used together, then covariates are adjusted and inverse normalized residuals are used to fit linear models.
 +
 +
===--traitName===
 +
* --traitName takes a string of the trait name that you want to analyze.
 +
* If this option is not used, then all traits included in [[RAREMETALWORKER#PED_and_DAT_Files | '''PED/DAT''']] files are analyzed.
  
 
==Model Options==
 
==Model Options==
 +
===--recessive===
 +
* If --recessive is used, then RAREMETALWORKER generates recessive results in addition to the additive results.
 +
* The set of association results generated by default can be found in [[RAREMETALWORKER#OUTPUT_FILE_NAMES | '''recessive output''']].
 +
* A separate pdf file with QQ and Manhattan plots based on recessive results is generated with name ''yourprefix.traitname.recessive.plots.pdf''.
 +
===--dominant===
 +
* If --dominant is used, then RAREMETALWORKER generates dominant results in addition to the additive results.
 +
* The set of association results generated by default can be found in [[RAREMETALWORKER#OUTPUT_FILE_NAMES | '''dominant output''']].
 +
* A separate pdf file with QQ and Manhattan plots based on dominant results is generated with name ''yourprefix.traitname.dominant.plots.pdf''.
  
 
==Kinship Source==
 
==Kinship Source==
 +
===--kinPedigree===
 +
* If --kinPedigree is used, pedigree structure coded in [[RAREMETALWORKER#PED_and_DAT_Files | '''PED''']] file is used to generate a kinship matrix for later fitting linear mixed model before associations.
 +
 +
===--kinGeno===
 +
* If --kinGeno is used, then a genomic relationship matrix is estimated from genotype.
 +
* If --vcX option is used, then a separate genomic relationship matrix for chromosome X is also estimated.
 +
* For details about how to estimate GRM, please refer to [[RAREMETALWORKER_method#MODELING_RELATEDNESS | ''methods''']].
 +
 +
===--kinFile===
 +
* --kinFile takes a string of the file name of previously saved GRM with format described in [[RAREMETALWORKER#Genomic_Relationship_Matrix_.28GRM.29 | '''format''']].
 +
* This option reads GRM from the file and then extract the correct GRM based on samples to be analyzed according to your specifications, such as traits to be analyzed, missing covariates and genotypes (please refer to [[RAREMETALWORKER_SPECIAL_TOPICS#Missing_Data | '''missing data''']] for more details).
 +
* --kinFile can not be used together with --kinGeno.
 +
 +
===--kinxFile===
 +
* --kinxFile must be used with --kinFile and --vcX.
 +
* --kinxFile takes a string of file name of the previously saved GRM for chromosome X.
 +
* If --kinxFile is not used, but --kinFile your.autosomal.Empirical.Kinship.gz  --vcX are issued in a command line, then RAREMETALWORKER will look for a kinship X file named your.autosomal.Empirical.KinshipX.gz. If this file is still not found, a FATAL ERROR will occur.
 +
===--kinSave===
 +
* This option must be used with --kinGeno.
 +
* Issuing --kinSave will request [[RAREMETALWORKER]] to store the estimated GRM in a file named yourprefix.Empirical.Kinship.gz.
 +
* If --vcX is also issued in the command line, then a separate file named yourprefix.Empirical.KinshipX.gz will be generated where the GRM of chromosome X is saved.
 +
*For formats of the saved genomic relationship matrix, please refer to [[RAREMETALWORKER#Genomic_Relationship_Matrix_.28GRM.29 | '''format''']].
  
 
==Kinship Options==
 
==Kinship Options==
 +
===--kinMaf===
 +
* --kinMaf takes a value that specifies the MAF cutoff for variants to be used to estimate GRMs.
 +
* The default is 0.05, which means variants with MAF<0.05 are not used for estimating GRMs.
 +
===--kinMiss===
 +
* --kinMiss takes a value that specifies the missing genotype cutoff for variants to be used to estimate GRMs.
 +
* The default is 0.05, which means variants with genotype call rate <0.95 are not used for estimating GRMs.
  
 
==Chromosome X==
 
==Chromosome X==
 +
===--xLabel===
 +
* --xLabel takes a string that used as label for chromosome X in your file.
 +
* The default is "X".
  
==PhoneHome==
+
===--xStart===
 +
* --xStart takes an integer that described the start position of nonPAR region on chromosome X.
 +
* The default is 2699520 based on Human Genome build 19.
  
 +
===--xEnd===
 +
* --xEnd takes an integer that described the end position of nonPAR region on chromosome X.
 +
* The default is 154931044 based on Human Genome build 19.
  
 +
==Others==
 +
===--cpu[1]===
 +
*--cpu takes an integer that specifies the number of cpus to use for estimating kinship matrix from genotypes.
  
* --prefix is optional.
+
===--kinOnly===
* If --prefix is not specified, the output file names will be:
+
*--kinOnly allows users to estimate kinship matrix without any association analysis of any traits included in the data set.
  traitname.singlevar.score.txt
+
*To also estimate chromosome X kinship, --vcX option should be added in command line.
  traitname.singlevar.cov.txt
 
* Otherwise, the output file names are:
 
  prefix.traitname.singlevar.score.txt
 
  prefix.traitname.singlevar.cov.txt
 
* --LDwindow specifies the length of the window that LD Matrix should be generated upon each variant. The default is 1MB.
 
* --zip gives users the option of writing compressed files (bgzip compressed) automatically for convenient sharing.
 
* --thin tells RMW to thin points when generating QQ plot and Manhattan plots, so the file size is smaller.
 
* --labelHits tells RMW to to label the hits using pvalue threshold 0.05/(#of variants tested) with gene name, based on human genome build 19.
 
  
==== VC Options ====
+
===--geneMap===
* When --vcShared and --vcX are specified, RMW knows that you want to fit shared environment and/or chromosome X variance component together with genetic component and non-shared environment.
+
* --geneMap takes a string describing the path to find mapping file for manhattan plot annotation.  
* When --makeResiduals is specified, RMW understands covariates should be read from PED/DAT file. Covariates are modeled as fixed effects.
+
* The default is human genome build 19, saved in raremetal/data/refFlat_hg19.txt.
  
==== Trait Options ====
+
===--mergedVCFID===
* --makeResiduals tells RMW to adjust the covariates and analyze residuals instead of the original phenotypes. If either --kinGeno or --kinPedigree option is used, then a variance component model will be fit based on residuals. If the --inverseNormal option is also used, then the residuals will be quantile normalized before fitting variance component model.
+
* This options allows RAREMETALWORKER to recognize VCF samples IDs in "FAMID_PID" format.  
* --traitName is created for situations when you have many traits saved in your PED and DAT file, but you are interested in one or a few of them. It can read a file ending with .txt with each trait of interest in a separate line, or trait names separated with "/". An example to handle one trait or multiple traits is in the following:
+
* The default value is OFF, which means VCF sample IDs are consistent with PID field in PED file.
  --traitName LDL
 
  --traitName LDL/HDL/TG
 
  --traitName traitsOfInterest.txt
 
* If --traitName is not used, all traits in PED/DAT file will be analyzed.
 
  
==== Model Options ====
+
==PhoneHome==
* additive model is used in RMW as default.
+
* See [[PhoneHome]] for more information on how PhoneHome works and what it does.
* --recessive allows additional association results (pvalue, effect size, and standard error) generated using recessive model. If VCF file is used, then non-reference allele is considered the recessive allele. If PED/DAT files are used for genotype, then minor allele is considered the recessive allele.
+
===--noPhoneHome===
* --dominant allows additional association results (pvalue, effect size, and standard error) generated using dominant model. If VCF file is used, then non-reference allele is considered the dominant allele. If PED/DAT files are used for genotype, then minor allele is considered the dominant allele.
+
* --noPhoneHome disables PhoneHome.  
* --recessive and --dominant options can be used together.
+
* PhoneHome is enabled by default based on the thinning parameter.
* Recessive and dominant results are stored in separate files.
+
===--phoneHomeThinning===
 
+
* --phoneHomeThinning (0-100) adjusts the frequency of PhoneHome.
==== Kinship Source ====
+
* The default is 100, running 100% of the time.
* --kinPedigree allows RMW to generate kinship matrix from pedigree, when pedigree information is available.  
 
* --kinGeno informs RMW to generate kinship matrix from all available variants that pass the criteria, specified in --kinMaf and --kinMiss options. The default will take variants with MAF>0.05 and genotype missing rate <0.05.
 
* --kinGeno option can NOT be used with --kinPedigree or --kinFile option. Only one of three options or none of them can be used in the same run.
 
* --kinFile let RMW read in a kinship matrix from a file. The first row of the kinship file has to be the sample IDs included in the kinship file. If a sample of interest is not included in the kinship file, fatal error will occur and the program will be terminated. A sample of interest is a sample that is phenotyped and has all covariates measured when --makeResiduals is specified.
 
* --kinSave allows you to save the kinship matrix.
 
 
 
==== Kinship Options ====
 
* --kinMiss and --kinMaf should be used with --kinGeno together.
 
* --kinMiss specifies the maximum genotype missing rate when calculating kinship from genotypes. The default is 0.05.
 
* --kinMaf specifies the minimum minor allele frequency used when calculating kinship from genotypes. The default is 0.05.
 
 
 
==== Chromosome X ====
 
* --xLabel should have a value of a string which specifies how variants on chromosome X are coded. The default is "X".
 
* --xStart and --xEnd specifies the start and end of non-pseudo-autosomal regions on chromosome X. These options should be specified when --vcX is used.
 
* The default for --xStart is 2699520 and default for --xEnd is 154931044, according to NCBI genome build 37.
 
 
 
Please refer to the following for the analysis of X-linked variants [[RAREMETALWORKER_X|'''ANALYZING CHROMOSOME X''']].
 
 
 
{{PhoneHomeParameters|hdr=====|bullet=1}}
 

Latest revision as of 17:47, 16 March 2018

Useful Links

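Here are some useful links to key pages:

  • The RAREMETALWORKER documentation
  • The RAREMETALWORKER method
  • The RAREMETALWORKER special topics
  • The RAREMETALWORKER quick start tutorial
  • The FAQ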

Options:

      Input Files : --ped [], --dat [], --vcf [], --dosage, --flagDosage [DS],
                    --noeof
     Output Files : --prefix [], --LDwindow [1000000], --zip, --thin,
                    --labelHits
       VC Options : --vcX, --separateX
    Trait Options : --makeResiduals, --inverseNormal, --traitName []
    Model Options : --recessive, --dominant
   Kinship Source : --kinPedigree, --kinGeno, --kinFile [], --kinxFile [],
                    --kinSave
  Kinship Options : --kinMaf [0.05], --kinMiss [0.05]
     Chromosome X : --xLabel [X], --xStart [2699520], --xEnd [154931044],
                    --maleLabel [1], --femaleLabel [2]
           others : --cpu [1], --kinOnly,
                    --geneMap [../data/refFlat_hg19.txt], --mergedVCFID
        PhoneHome : --noPhoneHome, --phoneHomeThinning [100]

Input Files

--ped

--dat

--vcf

  • --vcf takes a string of your VCF file name.

--dosage

  • When --dosage is issued on the command line, RAREMETALWORKER reads dosages from your VCF file.
  • --dosage must be used with the --vcf option.
  • A description of the dosage format in a VCF file can be found in dosage.

--flagDosage

  • This option lets the user customize the name of the field in the VCF file that labels dosage data.
  • The default is "DS".

--noeof

  • If your VCF file does not have the BGZF EOF marker, use the --noeof option so that RAREMETALWORKER skips the check for the marker at the end of the file.
  • Please see BGZF EOF for more details.

Output Files

--prefix

  • --prefix takes a string that is used as the prefix of your output files.
  • For a full list of output files generated by RAREMETALWORKER, please refer to output.

--LDwindow

  • --LDwindow takes an integer value as the size of the moving window, in bases.
  • RAREMETALWORKER generates LD matrices between a current marker that it is working on and all markers within this window.
  • The default size is 1 million bases.
  • For more information about the LD matrix, please refer to LD matrix.
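
The window selection described above can be sketched in Python as follows. This is an illustration only, not RAREMETALWORKER's implementation: the function name `markers_in_window` is hypothetical, and the sketch assumes that the window extends downstream from the current marker and that its far boundary is inclusive, neither of which is stated on this page.

```python
from bisect import bisect_right


def markers_in_window(positions, index, ld_window=1_000_000):
    """Return indices of markers after `index` within ld_window bases of it.

    `positions` must hold sorted base-pair positions on a single chromosome.
    Assumption: the window is one-sided (downstream) and inclusive at its end.
    """
    current = positions[index]
    # First index whose position lies beyond the end of the window.
    end = bisect_right(positions, current + ld_window)
    return list(range(index + 1, end))


# Hypothetical marker positions on one chromosome.
positions = [100, 500_000, 1_000_100, 1_000_101, 2_500_000]
print(markers_in_window(positions, 0))
# prints: [1, 2]
```

A larger --LDwindow value pairs each marker with more neighbors, so the LD matrix output grows accordingly.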

--zip

  • When --zip is issued, RAREMETALWORKER automatically compresses the summary statistics and LD matrices it generates, using gzip. The compressed output files are then indexed using tabix.

--thin

  • If --thin is issued, then RAREMETALWORKER generates QQ plots and Manhattan plots with fewer points, to make the pdf files smaller in size.

--labelHits

  • If --labelHits is issued, then RAREMETALWORKER automatically labels the loci that pass a significance threshold.
  • The threshold is calculated using Bonferroni correction (0.05/N, where N is the total number of polymorphic markers).
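
As a small worked example of the Bonferroni correction above, the following Python sketch computes 0.05/N. The function name `bonferroni_threshold` and the marker count are hypothetical and chosen only for illustration.

```python
def bonferroni_threshold(n_polymorphic_markers, alpha=0.05):
    """Return the per-marker p-value threshold alpha / N."""
    if n_polymorphic_markers <= 0:
        raise ValueError("need at least one polymorphic marker")
    return alpha / n_polymorphic_markers


# Hypothetical run in which one million polymorphic markers were tested.
threshold = bonferroni_threshold(1_000_000)
print(threshold)
```

With one million markers the threshold is about 5e-8, so only loci with p-values below that value would be labeled.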

VC Options

--vcX

  • The --vcX option has to be used with --kinPedigree (when pedigree kinship is used), --kinGeno (when the genomic relationship matrix is estimated), or --kinFile (when the GRM is read from a file).
  • Using the --vcX option lets RAREMETALWORKER fit a linear mixed model to analyze chromosome X, using both autosomal kinship and chromosome X kinship.

--separateX

  • --separateX option must be used with --vcX option.
  • Using --separateX option requests RAREMETALWORKER to fit a linear mixed model using only chromosome X kinship for analyses of chromosome X markers.

Please refer to method and technical details for more explanation.

Trait Options

--makeResiduals

  • If --makeResiduals is used, then the trait is first adjusted for covariates, and linear models are fit using the residuals.

--inverseNormal

  • If --inverseNormal is used, but not with --makeResiduals, then trait values are inverse normalized before fitting linear models.
  • If --inverseNormal and --makeResiduals are used together, then covariates are adjusted and inverse normalized residuals are used to fit linear models.
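
The two transformations above can be sketched in Python as follows. This is a hedged illustration, not RAREMETALWORKER's code: the function names, the single covariate, the example data, and the (rank - 0.5)/n rank offset are all assumptions, since this page does not specify the exact formula.

```python
from statistics import NormalDist, mean


def residuals(trait, covariate):
    """Residuals from a least-squares fit of trait on one covariate plus intercept."""
    x_bar, y_bar = mean(covariate), mean(trait)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, trait))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(covariate, trait)]


def inverse_normal(values):
    """Rank-based inverse normal transform (ties broken by input order).

    Assumption: ranks are mapped to quantiles with (rank - 0.5) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    for rank, i in enumerate(order, start=1):
        transformed[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return transformed


trait = [5.1, 6.3, 4.8, 7.9, 6.0]  # hypothetical trait values
age = [30, 45, 28, 60, 41]         # hypothetical covariate

# --inverseNormal alone: transform the trait values directly.
z_trait = inverse_normal(trait)
# --makeResiduals with --inverseNormal: adjust first, then transform the residuals.
z_resid = inverse_normal(residuals(trait, age))
```

The order of operations is the point of the sketch: with both options, the covariate adjustment happens before the normalization, so the values entering the model are normalized residuals rather than residuals of a normalized trait.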

--traitName

  • --traitName takes a string of the trait name that you want to analyze.
  • If this option is not used, then all traits included in PED/DAT files are analyzed.

Model Options

--recessive

  • If --recessive is used, then RAREMETALWORKER generates recessive results in addition to the additive results.
  • The set of association results generated by default can be found in recessive output.
  • A separate pdf file with QQ and Manhattan plots based on recessive results is generated with name yourprefix.traitname.recessive.plots.pdf.

--dominant

  • If --dominant is used, then RAREMETALWORKER generates dominant results in addition to the additive results.
  • The set of association results generated by default can be found in dominant output.
  • A separate pdf file with QQ and Manhattan plots based on dominant results is generated with name yourprefix.traitname.dominant.plots.pdf.

Kinship Source

--kinPedigree

  • If --kinPedigree is used, the pedigree structure coded in the PED file is used to generate a kinship matrix, which is then used to fit the linear mixed model before association analysis.

--kinGeno

  • If --kinGeno is used, then a genomic relationship matrix (GRM) is estimated from genotypes.
  • If the --vcX option is used, then a separate genomic relationship matrix for chromosome X is also estimated.
  • For details about how the GRM is estimated, please refer to methods.

--kinFile

  • --kinFile takes the file name of a previously saved GRM, in the format described in format.
  • This option reads the GRM from the file and then extracts the sub-matrix for the samples to be analyzed according to your specifications, such as traits to be analyzed, missing covariates and genotypes (please refer to missing data for more details).
  • --kinFile cannot be used together with --kinGeno.

--kinxFile

  • --kinxFile must be used with --kinFile and --vcX.
  • --kinxFile takes a string of file name of the previously saved GRM for chromosome X.
  • If --kinxFile is not used, but --kinFile your.autosomal.Empirical.Kinship.gz --vcX are issued in a command line, then RAREMETALWORKER will look for a kinship X file named your.autosomal.Empirical.KinshipX.gz. If this file is still not found, a FATAL ERROR will occur.
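
The fallback file name in the last bullet can be sketched in Python as follows. The function name `default_kinx_file` is hypothetical, and the sketch only covers file names ending in "Kinship.gz", as in the example above; how RAREMETALWORKER treats other file names is not described here.

```python
def default_kinx_file(kin_file):
    """Derive the chromosome X kinship file name from the autosomal one."""
    suffix = "Kinship.gz"
    if not kin_file.endswith(suffix):
        # Assumption: names outside the documented pattern are rejected.
        raise ValueError("unexpected kinship file name: " + kin_file)
    return kin_file[: -len(suffix)] + "KinshipX.gz"


print(default_kinx_file("your.autosomal.Empirical.Kinship.gz"))
# prints: your.autosomal.Empirical.KinshipX.gz
```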

--kinSave

  • This option must be used with --kinGeno.
  • Issuing --kinSave will request RAREMETALWORKER to store the estimated GRM in a file named yourprefix.Empirical.Kinship.gz.
  • If --vcX is also issued in the command line, then a separate file named yourprefix.Empirical.KinshipX.gz will be generated where the GRM of chromosome X is saved.
  • For formats of the saved genomic relationship matrix, please refer to format.
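
The compatibility rules stated in the VC Options and Kinship Source sections can be collected into a small pre-flight check. The Python sketch below is hypothetical and is not part of RAREMETALWORKER; it only encodes the rules written on this page, and the function name `check_kinship_options` is an assumption.

```python
def check_kinship_options(opts):
    """Return a list of rule violations for a collection of option names."""
    opts = set(opts)
    problems = []
    if "--kinFile" in opts and "--kinGeno" in opts:
        problems.append("--kinFile cannot be used together with --kinGeno")
    if "--kinxFile" in opts and not {"--kinFile", "--vcX"} <= opts:
        problems.append("--kinxFile must be used with --kinFile and --vcX")
    if "--kinSave" in opts and "--kinGeno" not in opts:
        problems.append("--kinSave must be used with --kinGeno")
    if "--vcX" in opts and not opts & {"--kinPedigree", "--kinGeno", "--kinFile"}:
        problems.append("--vcX needs --kinPedigree, --kinGeno, or --kinFile")
    if "--separateX" in opts and "--vcX" not in opts:
        problems.append("--separateX must be used with --vcX")
    return problems


# A consistent combination: estimate the GRM, save it, and model chromosome X.
print(check_kinship_options(["--kinGeno", "--kinSave", "--vcX"]))
# prints: []
```

Running such a check before launching a long job can catch an invalid combination early, although the program itself remains the authority on which combinations it accepts.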

Kinship Options

--kinMaf

  • --kinMaf takes a value that specifies the MAF cutoff for variants to be used to estimate GRMs.
  • The default is 0.05, which means variants with MAF<0.05 are not used for estimating GRMs.

--kinMiss

  • --kinMiss takes a value that specifies the missing genotype cutoff for variants to be used to estimate GRMs.
  • The default is 0.05, which means variants with genotype call rate <0.95 are not used for estimating GRMs.
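
Taken together, the two cutoffs act as a per-variant filter. The Python sketch below follows the wording above (MAF below the cutoff is excluded, call rate below 0.95 is excluded); the function name `use_for_grm` is hypothetical, and how RAREMETALWORKER treats values exactly at a cutoff is an assumption.

```python
def use_for_grm(maf, missing_rate, kin_maf=0.05, kin_miss=0.05):
    """Decide whether a variant contributes to the GRM estimate.

    Assumption: values exactly at a cutoff are kept.
    """
    return maf >= kin_maf and missing_rate <= kin_miss


print(use_for_grm(maf=0.10, missing_rate=0.01))  # common, well-genotyped variant
print(use_for_grm(maf=0.01, missing_rate=0.01))  # too rare for the default --kinMaf
print(use_for_grm(maf=0.10, missing_rate=0.20))  # too many missing genotypes
```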

Chromosome X

--xLabel

  • --xLabel takes a string that is used as the label for chromosome X in your file.
  • The default is "X".

--xStart

  • --xStart takes an integer that describes the start position of the non-pseudo-autosomal (nonPAR) region on chromosome X.
  • The default is 2699520 based on Human Genome build 19.

--xEnd

  • --xEnd takes an integer that describes the end position of the non-pseudo-autosomal (nonPAR) region on chromosome X.
  • The default is 154931044 based on Human Genome build 19.
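
The way --xLabel, --xStart and --xEnd together delimit the nonPAR region can be sketched in Python as follows. The function name `in_non_par` is hypothetical, and treating both boundaries as inclusive is an assumption that this page does not confirm.

```python
def in_non_par(chrom, pos, x_label="X", x_start=2699520, x_end=154931044):
    """Return True if a variant lies in the nonPAR region of chromosome X.

    Assumption: both boundary positions belong to the region.
    """
    return chrom == x_label and x_start <= pos <= x_end


print(in_non_par("X", 50_000_000))   # inside the nonPAR region
print(in_non_par("X", 1_000_000))    # before --xStart
print(in_non_par("23", 50_000_000))  # label does not match the default --xLabel
```

If chromosome X is coded as "23" in your files, passing x_label="23" in this sketch corresponds to running with --xLabel 23.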

Others

--cpu[1]

  • --cpu takes an integer that specifies the number of CPUs to use for estimating the kinship matrix from genotypes.

--kinOnly

  • --kinOnly allows users to estimate kinship matrix without any association analysis of any traits included in the data set.
  • To also estimate chromosome X kinship, --vcX option should be added in command line.

--geneMap

  • --geneMap takes a string giving the path to the gene mapping file used for Manhattan plot annotation.
  • The default is human genome build 19, saved in raremetal/data/refFlat_hg19.txt.

--mergedVCFID

  • This option allows RAREMETALWORKER to recognize VCF sample IDs in "FAMID_PID" format.
  • The default value is OFF, which means VCF sample IDs are consistent with PID field in PED file.
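
The two sample ID conventions can be illustrated with a short Python sketch. The function name `vcf_sample_id` and the example IDs are hypothetical, and the sketch assumes that "FAMID_PID" means the family ID and person ID from the PED file joined by an underscore.

```python
def vcf_sample_id(fam_id, person_id, merged_vcf_id=False):
    """Return the sample ID expected in the VCF header for one PED record."""
    if merged_vcf_id:
        # With --mergedVCFID: family ID and person ID joined by an underscore.
        return f"{fam_id}_{person_id}"
    # Default: the VCF sample ID matches the PID field alone.
    return person_id


print(vcf_sample_id("FAM01", "P003"))
# prints: P003
print(vcf_sample_id("FAM01", "P003", merged_vcf_id=True))
# prints: FAM01_P003
```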

PhoneHome

  • See PhoneHome for more information on how PhoneHome works and what it does.

--noPhoneHome

  • --noPhoneHome disables PhoneHome.
  • PhoneHome is enabled by default based on the thinning parameter.

--phoneHomeThinning

  • --phoneHomeThinning (0-100) adjusts the frequency of PhoneHome.
  • The default is 100, running 100% of the time.