Difference between revisions of "Famrvtest"

From Genome Analysis Wiki

Revision as of 14:09, 10 September 2013

Brief Description

famRvTest is a computationally efficient tool for family-based association analyses of rare variants using sequencing or genotyping array data. famRvTest supports both single variant and gene-level associations.

For any questions, please contact Shuang Feng (sfengsph at umich.edu) or Gonçalo Abecasis (goncalo at umich.edu).

Download and Installation

  • University of Michigan CSG users can go to the following:
 /net/fantasia/home/sfengsph/code/famRV/bin/famRvTest

Where to Download

How to Compile

  • Save it to your local path and decompress using the following command:
 tar xvzf FamRV.0.0.5.tar.gz
  • Go to famRV_0.0.5/famRV/src and type the following command to compile:
 make

How to Execute

  • Go to FamRV_0.0.1/famRvTest/bin and use the following:
 ./famRvTest

Approach

famRvTest uses a linear mixed model approach, together with an efficient optimization algorithm, to account for familial relationships. Kinship is either derived from pedigree structure or estimated from genome-wide marker genotypes. Single-marker association tests (score, likelihood ratio and Wald tests) and gene-level association methods (weighted and unweighted burden, SKAT and variable threshold tests) have been implemented. A manuscript is in preparation.
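
For orientation, a linear mixed model of this kind is commonly written as below. This is the generic textbook form, offered only as a sketch: the symbols are chosen here for illustration, and the exact parameterization used by famRvTest is not stated on this page.

```latex
y = X\beta + G\gamma + g + \varepsilon, \qquad
g \sim N(0,\, 2\Phi\,\sigma_g^2), \qquad
\varepsilon \sim N(0,\, I\,\sigma_e^2)
```

Here y is the trait vector, X holds the covariates, G is the genotype (or gene-level burden score) being tested with effect \gamma, \Phi is the kinship matrix (pedigree-based or estimated from genotypes), and \sigma_g^2 and \sigma_e^2 are the additive genetic and residual variance components.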

Command References

                    Data File :                 (-dname)
                Pedigree File :                 (-pname)

Options:
           Kinship Options : --kinGeno, --empMaf [0.05], --empMiss [0.05],
                             --outputX, --outputKin, --kinFile [],
                             --kinPrefix []
      Input/Output Options : --vcf [], --groupFile [], --freqFile [],
                             --prefix []
                VC Options : --inverseNormal, --fitSharedEnv, --fitX,
                             --useCovariates, --traitName []
           SingleVar Tests : --SingleVarLRT, --SingleVarScore,
                             --SingleVarWald
              Burden Tests : --SKAT, --MB, --CMC_binary, --CMC_counts
  Variable Threshold Tests : --VTasymptotic, --VTpermute, --permuteMin [1000],
                             --permuteMax [3000000]
             Other Options : --function [], --mafMin [0.00], --mafMax [0.50],
                             --mac [0.00], --noStop, --xLabel [X],
                             --Xstart [2699520], --Xend [154931044], --dosage,
                             --founderFreq, --h2Only, --fullResult [ON]

Crucial Input Files:

famRvTest takes Merlin format pedigree and data file as input. These two files are crucial for the program to run. Please refer to Merlin documentation for details.
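
As a rough illustration of what these two files look like, the sketch below writes a toy Merlin-style data file and pedigree file for one nuclear family. The file names, the trait LDL, the covariate AGE and all values are placeholders made up for this example; the Merlin documentation remains the authoritative reference for the format.

```shell
# Toy data file: one quantitative trait (T) and one covariate (C).
cat > your.dat <<'EOF'
T LDL
C AGE
EOF

# Toy pedigree file: family ID, individual ID, father ID, mother ID, sex
# (1 = male, 2 = female), followed by one column per line of the data file.
# Founders have father and mother set to 0; "x" marks a missing value.
cat > your.ped <<'EOF'
fam1 1 0 0 1 x 52
fam1 2 0 0 2 131.0 50
fam1 3 1 2 1 118.5 24
fam1 4 1 2 2 142.3 21
EOF

# Sanity check: every pedigree row should have 5 + (number of data items) columns.
n_items=$(awk 'END { print NR }' your.dat)
awk -v n="$n_items" 'NF != 5 + n { bad = 1 } END { exit bad }' your.ped \
  && echo "pedigree columns match data file"
```

The column check at the end is a quick way to catch a pedigree file whose columns are out of step with the data file before starting an analysis.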

Kinship Options:

--kinGeno estimates the relationship matrix from genotypes; otherwise, a kinship matrix based on pedigree structure is used.
--empMaf and --empMiss specify the minor allele frequency and genotype missing rate cutoffs used to filter SNPs when estimating the empirical kinship matrix.
--outputX saves the kinship matrix for chromosome X.
--outputKin saves the kinship matrix estimated from autosomes if --outputX is also specified.
--kinFile reads the kinship matrix from a previously saved file.
--kinPrefix specifies the file prefix for saved kinship matrices.

Input/Output Options:

--vcf specifies the name of the input VCF file.
--groupFile specifies the name of the group file to use for gene-level association tests.
--freqFile reads allele frequencies from a file instead of estimating them from the data.
--prefix specifies the file prefix for all saved results.

SingleVar Tests:

--SingleVarWald, --SingleVarScore and --SingleVarLRT perform the Wald, score and likelihood ratio tests, respectively.

Burden Tests:

--SKAT, --MB, --CMC_binary and --CMC_counts perform, respectively, SKAT, the weighted burden test (Madsen-Browning weights), the collapsing burden test, and the unweighted burden test based on rare allele counts.

VT Tests:

--VTasymptotic performs the variable threshold test and calculates an asymptotic p-value.
--VTpermute performs the variable threshold test and calculates a p-value by permutation.
--permuteMin [1000] and --permuteMax [3000000] specify the minimum and maximum number of permutations.

Other Options:

--function allows grouping by functional annotation when an annotated VCF file is used for gene-level association tests.
--mafMin [0.00] and --mafMax [0.50] specify the minimum and maximum allele frequency of variants to group.
--mac [0.00] specifies the minimum rare allele count a variant must have to be grouped.
--noStop disables the stopping rule in the VT permutation test.
--xLabel [X] specifies the label used for chromosome X.
--Xstart [2699520] and --Xend [154931044] are the start and end positions of the non-pseudo-autosomal region of chromosome X.
--founderFreq uses founder allele frequencies in the analysis.
--h2Only provides a shortcut for calculating heritability only.
--fullResult [ON] reports gene-level association results in long format, including results for the single markers included in the analysis.
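
To make the frequency and count filters concrete, the sketch below applies thresholds in the style of --mafMin, --mafMax and --mac to a made-up table of allele counts. The table layout, the file name toy_counts.txt and the function filter_variants are hypothetical, and the filtering logic is one plausible reading of the option descriptions, not code taken from famRvTest.

```shell
# Hypothetical table: variant ID, alternate-allele count, total alleles genotyped.
cat > toy_counts.txt <<'EOF'
var1 1 2000
var2 3 2000
var3 40 2000
var4 150 2000
var5 1997 2000
EOF

# Keep variants with mafMin <= MAF <= mafMax and minor allele count >= mac.
# Arguments: mafMin mafMax mac
filter_variants() {
  awk -v mafMin="$1" -v mafMax="$2" -v mac="$3" '{
    minor = ($2 < $3 - $2) ? $2 : $3 - $2   # count of the rarer allele
    maf = minor / $3                        # minor allele frequency
    if (maf >= mafMin && maf <= mafMax && minor >= mac) print $1, minor, maf
  }' toy_counts.txt
}

# Thresholds corresponding to --mafMin 0.00 --mafMax 0.05 --mac 3.
filter_variants 0.00 0.05 3
```

With these thresholds, var1 is dropped because its minor allele count is below 3 and var4 is dropped because its frequency exceeds 0.05. The other three variants are kept, including var5, whose alternate allele is the common one.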

Example Command Line

Single Variant Analysis

The following command runs a single variant association analysis of the trait "LDL" using the score test, after inverse normal transformation of the quantitative trait and adjustment for covariates. --traitName specifies the trait or traits to analyze in this batch; if it is omitted, all traits coded in the data file are analyzed. --SingleVarLRT provides essentially the same test as the Merlin --fastAssoc option.

./famRvTest -p your.ped -d your.dat --SingleVarScore --inverseNormal --useCovariates --traitName LDL

Furthermore, to run the likelihood ratio test and the Wald test at the same time, use the following command:

./famRvTest -p your.ped -d your.dat --SingleVarScore --SingleVarLRT --SingleVarWald --inverseNormal --useCovariates --traitName LDL

All of the above commands perform family-based association analysis using kinship matrices derived from the pedigree structure coded in the pedigree file. The following command instead estimates an empirical relationship matrix from genotypes:

./famRvTest -p your.ped -d your.dat --SingleVarScore --SingleVarLRT --SingleVarWald --inverseNormal --useCovariates --traitName LDL --kinGeno
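
When several traits are analyzed separately, the single variant command can be wrapped in a small loop. The sketch below is only an illustration: the trait names HDL and TG, the FAMRV variable and the build_cmd helper are assumptions made for this example, and giving each run its own --prefix is a suggestion rather than a requirement of the program. If the executable is not found, the script prints the commands instead of running them.

```shell
# Path to the executable; assumed here to be in the current directory.
FAMRV=${FAMRV:-./famRvTest}

# Build the single variant command line for one trait.
# Results are written under a per-trait prefix so that runs do not collide.
build_cmd() {
  echo "$FAMRV -p your.ped -d your.dat --SingleVarScore --inverseNormal --useCovariates --traitName $1 --prefix $1"
}

# Hypothetical trait names; replace them with traits coded in your data file.
for trait in LDL HDL TG; do
  if [ -x "$FAMRV" ]; then
    $(build_cmd "$trait")        # run the analysis when the binary is present
  else
    build_cmd "$trait"           # otherwise print the command as a dry run
  fi
done
```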

Gene-level Association

The following command runs a gene-level association analysis of the genes listed in "your.genes.groupfile" for the trait "LDL" using SKAT, the Madsen-Browning weighted burden test, the unweighted burden test based on rare allele counts, the collapsing burden test and the variable threshold test, after inverse normal transformation of the quantitative trait and adjustment for covariates. Only rare variants with a minor allele frequency less than or equal to 0.05 and a minor allele count greater than or equal to 3 are grouped.

./famRvTest -p your.ped -d your.dat --SKAT --MB --CMC_counts --CMC_binary --VTasymptotic --inverseNormal --useCovariates --traitName LDL --groupFile your.genes.groupfile --mafMax 0.05 --mac 3