Difference between revisions of "Famrvtest"

From Genome Analysis Wiki
 
(89 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
== Useful Wiki Pages ==
 +
 +
There are a few pages in this Wiki that may be useful to famRvTest users. Here are links to key pages:
 +
 +
* The [[FamRvTest_command|'''famrvtest''' Command Reference]]
 +
* The [[FamRvTest_tutorial|'''famrvtest''' Tutorial]]
 +
 
== Brief Description ==
 
== Brief Description ==
  
'''famRvTest''' is a computationally efficient tool for family-based association analyses of rare variants using sequencing or genotyping array data. '''famRvTest''' supports both single variant and gene-level associations.  
+
'''famrvtest''' is a computationally efficient tool for family-based rare variant association analyses using genotyping array or sequencing data. '''famrvtest''' supports both single variant and gene-level associations.  
  
For any questions, please contact Shuang Feng (sfengsph at umich.edu) or Gonçalo Abecasis (goncalo at umich.edu).
+
For any questions, please contact [[Shuang_Feng |Shuang Feng]] (sfengsph at umich.edu) or [[Goncalo_Abecasis|Gonçalo Abecasis]] (goncalo at umich.edu).
  
 
== Download and Installation ==
 
== Download and Installation ==
 
* University of Michigan CSG users can go to the following:
 
* University of Michigan CSG users can go to the following:
   /net/fantasia/home/sfengsph/code/famRV/bin/famRvTesst
+
   /net/fantasia/home/sfengsph/code/famrvtest/bin/famrvtest
  
 
=== Where to Download ===
 
=== Where to Download ===
* The software package for Linux and Mac (source code included) can be downloaded here: [[Media:FamRV.0.0.5.tar.gz|'''software package download''']]
+
* Source code can be downloaded in the following
 +
 +
  [[Media:LINUX_famrvtest.2.4.tgz|Source for '''LINUX''']]
 +
  [[Media:MAC_famrvtest.2.4.tgz|Source for '''MAC''']]
 +
  [[Media:MINGW_famrvtest.2.4.tgz|Source for '''MINGW''']]
 +
  [[Media:CYGWIN64_famrvtest.2.4.tgz|Source for '''CYGWIN64''']]
 +
 
 +
* Executable can be downloaded in the following:
 +
 
 +
  [[Media:Famrvtest.2.4.linux.executable.tgz |Executable for '''LINUX''']]
  
 
=== How to Compile ===
 
=== How to Compile ===
 
* Save it to your local path and decompress using the following command:
 
* Save it to your local path and decompress using the following command:
   tar xvzf FamRV.0.0.5.tar.gz
+
   tar xvzf LINUX_famrvtest.2.4.tgz
* Go to FamRV_0.0.5/famRV/src and type the following command to compile:
+
* Go to promp>famrvtest and type the following command to compile:
 
   make
 
   make
  
 
=== How to Execute ===
 
=== How to Execute ===
* Go to FamRV_0.0.1/famRvTest/bin and use the following:
+
* Go to famrvtest/bin and use the following:
   ./famRvTest
+
   ./famrvtest
 +
 
 +
==Command Reference==
 +
Please go to [[FamRvTest_command|Command Reference Page]] for details.
  
 
==Approach==
 
==Approach==
'''famRvTest''' uses linear mixed model approach, incorporating efficient optimization algorithm, to account for familial relationship, where kinship is either quantified based upon pedigree structures or estimated from genotypes of markers from genome-wide. Single marker associations including score, likelihood ratio and ward tests and gene-level associations methods (weighted and un-weighted burden, SKAT and variable threshold tests) have been implemented. Manuscript is under preparation.
+
'''famrvtest''' uses linear mixed model approach, incorporating efficient optimization algorithm, to account for familial relationship, where kinship is either quantified based upon pedigree structures or estimated from genotypes of markers from genome-wide. Single marker associations including score, likelihood ratio and ward tests and gene-level associations methods (weighted and un-weighted burden, SKAT and variable threshold tests) have been implemented. Manuscript is under preparation.
  
== Command References ==
+
== Input Files ==
 +
famrvtest needs the following files as input: PED and DAT file in Merlin format, '''AND/OR''' a VCF file. When genotypes are stored in PED and DAT file, the VCF file is not needed. However, even if genotypes are saved in a VCF file, PED and DAT files are still needed for carrying covariate and trait information.
  
                    Data File :                (-dname)
+
=== PED and DAT Files ===
                Pedigree File :                (-pname)
+
* When PED file has genotypes saved, there is no need for a VCF file as input.
+
* '''famrvtest''' takes PED/DAT file in [http://www.sph.umich.edu/csg/abecasis/Merlin/index.html '''Merlin'''] format. Please refer to [http://sph.umich.edu/csg/abecasis/merlin/tour/input_files.html PED/DAT format description] for details.
Options:
+
* An example PED file is in the following:
            Kinship Options : --kinGeno, --empMaf [0.05], --empMiss [0.05],
+
    1 1 0 0 1 1.5 1 23 A A A A A A A A A A
                              --outputX, --outputKin, --kinFile [],
+
    2 1 0 0 1 1.0 1 34 A C A C A C A C A C
                              --kinPrefix []
+
    3 1 0 0 2 0.4 1 43 A A A A A A A A A A
      Input/Output Options : --vcf [], --groupFile [], --freqFile [],
+
    4 1 0 0 2 0.9 1 13 A C A C A C A C A C
                              --prefix []
+
* The matching DAT file is in the following:
                VC Options : --inverseNormal, --fitSharedEnv, --fitX,
+
  T YourTraitName
                              --useCovariates, --traitName []
+
  C SEX
            SingleVar Tests : --SingleVarLRT, --SingleVarScore,
+
  C AGE
                              --SingleVarWald
+
  M 1:123456a
              Burden Tests : --SKAT, --MB, --CMC_binary, --CMC_counts
+
  M 1:234567
  Variable Threshold Tests : --VTasymptotic, --VTpermute, --permuteMin [1000],
+
  M 2:111111
                              --permuteMax [3000000]
+
  M 2:222222
              Other Options : --function [], --mafMin [0.00], --mafMax [0.50],
+
  M X:12345
                              --mac [0.00], --noStop, --xLabel [X],
+
* DAT file must have variant names in the following format "M chr:pos".  
                              --Xstart [2699520], --Xend [154931044], --dosage,
+
* Orders of labels in DAT file have to match the order of fields in PED file.  
                              --founderFreq, --h2Only, --fullResult [ON]
+
* '''Markers in PED and DAT file must be sorted by chromosome and position.'''
 
 
Crucial Input Files:
 
'''famRvTest''' takes Merlin format pedigree and data file as input. These two files are crucial for the program to run. Please refer to [http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html|'''Merlin'''] documentation for details.
 
Kinship Options:
 
--kinGeno allows you to estimate relationship matrix using genotypes; otherwise, kinship matrix based on pedigree structure will be used.
 
--empMaf and --empMiss specifies the cutoff of minor allele frequency and genotype missing rate to filter SNPs for estimating empirical kinship matrix.
 
--outputX allows you to save kinship matrix from chromosome X.  
 
--outputKin allows you to save the kinship matrix from autosomal matrix if --outputX is also specified.
 
--kinFile allows you to read kinship matrix from a previously saved file.
 
--kinPrefix specifies the file prefix for kinship matrices saved.
 
 
 
Input/Output Options:
 
--vcf specified the name of input vcf file.
 
--groupFile should be followed by a the name of the groupfile you want to use for gene-level associations.
 
--freqFile allows users to read allele frequencies from a file instead of estimating based on data.
 
--prefix specifies the name of file prefix for all results saved.
 
 
 
SingleVar Tests:
 
--SingleVarWald, --SingleVarScore and --SingleVarLRT are wald, score and likelihood ratio tests.
 
  
Burden Tests:
+
* Covariate and trait values are saved in PED file. Covariate and trait descriptions are saved in DAT file.
--SKAT --MB --CMC_binary --CMC_counts are SKAT, weighted-burden test (Madsen-Browning weight), collapsing burden test and unweighted burden test based on rare allele count.
 
  
VT Tests:
+
=== VCF File ===
--VTasymptotic performs variable threshold test and calculate asymptotic p-value.
+
* Another option is to use VCF as input. Please refer to the following link for VCF file specification: [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 1000 genome wiki VCF specs]  
--VTpermute performs variable threshold test and calculate p-value based on permutation.
+
* VCF file should be compressed by bgzip and indexed by tabix, using the following command:
--permuteMin [1000] and --permuteMax [3000000] specify the min and max number of permutation.
+
  bgzip input.vcf    ## this command will generate input.vcf.gz
 +
  tabix -p vcf -f input.vcf.gz  ## this command will generate input.vcf.gz.tbi
 +
* Even with the presence of VCF file, PED/DAT files are still needed for covariates and phenotypes.
  
Other Options:
+
=== Group File for Gene-level Tests===
--function allows grouping by functional annotation when annotated vcf file is used for gene-level association tests.
+
* Grouping methods are only necessary for gene-level tests.
--mafMin [0.00] and --mafMax [0.50] specify the minimum and maximum allele frequency for variants to group.  
+
* With --groupFile option, you can specify particular set of variants to be grouped for burden tests.
--mac [0.0] specify the minimum rare allele count as one of the filters to rare variants to group.
+
* The group file must be a tab or space delimited file in the following format:
--noStop indicating no stopping rule to be used in VT permutation test.
+
  GROUP_ID MARKER1_ID MARKER2_ID MARKER3_ID ...  
--xLabel [X] specifies labels for chromosome X.
+
* MARKER_ID must be in the following format:
--Xstart [2699520] and --Xend [154931044] are start and end position of non-pseudo-autosomal region.
+
  CHR:POS:ALLELE1:ALLELE2
--founderFreq considers founder allele frequencies in analysis.
+
* An example group file is:
--h2Only provides a shortcut of calculating heritability only.
+
  PLEKHN1 1:901922:G:A    1:901923:C:A    1:902088:G:A    1:902128:C:T    1:902133:C:G    1:902176:C:T    1:905669:C:G       
--fullResult [on] provides results in long format in gene-level association testing, including results from single markers included in analysis.
+
  HES4    1:934735:A:C    1:934770:G:A    1:934801:C:T    1:935085:G:A    1:935089:C:G
 +
* '''Version 2.4 and later allow variants from different chromosomes to be grouped for testing. This might be useful for pathway analysis.'''
 +
* '''Note: any variants that have different alleles from listed in group file will be excluded from gene-level tests.'''
  
 
== Example Command Line ==
 
== Example Command Line ==
 
===Single Variant Analysis===
 
===Single Variant Analysis===
 
The following command lines let you run single variant association analysis of trait "LDL" using score test, after inverse normalization of the quantitative trait and adjusting covariates. --traitName specifies the single trait or traits you want to analyze in this batch. If this option is not used, then all traits coded in data file will be analyzed accordingly. --SingleVarLRT provides essentially the same test as in merlin --fastAssoc option.  
 
The following command lines let you run single variant association analysis of trait "LDL" using score test, after inverse normalization of the quantitative trait and adjusting covariates. --traitName specifies the single trait or traits you want to analyze in this batch. If this option is not used, then all traits coded in data file will be analyzed accordingly. --SingleVarLRT provides essentially the same test as in merlin --fastAssoc option.  
  ./famRvTest -p your.ped -d your.dat --SingleVarScore --inverseNormal --useCovariates --traitName LDL
+
  ./famrvtest --ped your.ped --dat your.dat --vcf your.vcf.gz --SingleVarScore --inverseNormal --useCovariates --traitName LDL
Futhermore, if you want to run likelihood ratio test and wald test at the same time, the following command should do the work:
 
./famRvTest -p your.ped -d your.dat --SingleVarScore --SingleVarLRT --SingleVarWald --inverseNormal --useCovariates --traitName LDL
 
  
 
All the above commands will let you do family-based association analysis using kinship matrices generated using pedigree structure coded in pedigree file. The following command lines show examples of using genotype to estimate empirical relationship matrix to do the work.  
 
All the above commands will let you do family-based association analysis using kinship matrices generated using pedigree structure coded in pedigree file. The following command lines show examples of using genotype to estimate empirical relationship matrix to do the work.  
   ./famRvTest -p your.ped -d your.dat --SingleVarScore --SingleVarLRT --SingleVarWald --inverseNormal --useCovariates --traitName LDL --empKin
+
   ./famrvtest --ped  your.ped --dat your.dat --SingleVarScore --inverseNormal --useCovariates --traitName LDL --kinPedigree
  
 
===Gene-level Association===
 
===Gene-level Association===
  
 
The following command lines let you run gene-level association analysis of genes listed in "your.genes.groupfile" for trait "LDL" using SKAT, Madsen-Browning weighted burden, rare allele counts un-weighted burden and collapsing burden and variable threshold tests, after inverse normalization of the quantitative trait and adjusting covariates. Only rare variants with maf less than or equal to 0.05 and minor allele count greater than or equal to 3 are grouped.
 
The following command lines let you run gene-level association analysis of genes listed in "your.genes.groupfile" for trait "LDL" using SKAT, Madsen-Browning weighted burden, rare allele counts un-weighted burden and collapsing burden and variable threshold tests, after inverse normalization of the quantitative trait and adjusting covariates. Only rare variants with maf less than or equal to 0.05 and minor allele count greater than or equal to 3 are grouped.
  ./famRvTest -p your.ped -d your.dat --SKAT --MB --CMC_counts --CMC_binary --VTasymptotic --inverseNormal --useCovariates --traitName LDL --groupFile your.genes.groupfile --maxMaf 0.05 --mac 3
+
  ./famrvtest -ped your.ped -dat your.dat --SKAT_BETA --MB --burden --VT --inverseNormal --useCovariates --traitName LDL --groupFile your.genes.groupfile --maf 0.05
 +
 
 +
== Change Log ==
 +
 
 +
* Released version 0.0.9 with a bug fixed for potential compiling error. (10/10/2013)
 +
* Released version 2.0, a faster version and added family-based single variant permutation test. (7/14/2014)
 +
* Released version 2.2, a bug fixed which causes single variant test can not be run alone. (7/15/2014)
 +
* Uploaded new source code package for version2.2, with updated makefiles. (8/4/14)
 +
* Released version 2.3. Fixed a bug which causes compiling error (not finding the correct makefile). (8/20/14)
 +
* Released version 2.4. Enable analyzing pathways where variants from different chromosomes can be grouped. (9/27/2014)

Latest revision as of 10:34, 21 February 2017

Useful Wiki Pages

A few pages in this wiki may be useful to famrvtest users. Here are links to key pages:

  • The famrvtest Command Reference
  • The famrvtest Tutorial

Brief Description

famrvtest is a computationally efficient tool for family-based rare variant association analyses using genotyping array or sequencing data. famrvtest supports both single variant and gene-level associations.

For any questions, please contact Shuang Feng (sfengsph at umich.edu) or Gonçalo Abecasis (goncalo at umich.edu).

Download and Installation

  • University of Michigan CSG users can go to the following:
 /net/fantasia/home/sfengsph/code/famrvtest/bin/famrvtest

Where to Download

  • Source code can be downloaded from the following links:
  Source for LINUX
  Source for MAC
  Source for MINGW
  Source for CYGWIN64
  • An executable can be downloaded from the following link:
  Executable for LINUX

How to Compile

  • Save the package to a local path and decompress it with the following command:
 tar xvzf LINUX_famrvtest.2.4.tgz
  • Go to the famrvtest directory and type the following command to compile:
 make

How to Execute

  • Go to famrvtest/bin and use the following:
 ./famrvtest

Command Reference

Please go to Command Reference Page for details.

Approach

famrvtest uses a linear mixed model, fitted with an efficient optimization algorithm, to account for familial relationships. Kinship is either derived from the pedigree structure or estimated from genome-wide marker genotypes. Single marker association tests (score, likelihood ratio and Wald tests) and gene-level association methods (weighted and unweighted burden, SKAT and variable threshold tests) have been implemented. A manuscript is in preparation.
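
The page does not spell out the model itself. As rough orientation only, a generic linear mixed model of the kind described above can be written as follows; the symbols are chosen here for illustration and are not taken from the famrvtest source or manuscript:

```latex
y = X\beta + g\,\gamma + u + \varepsilon, \qquad
u \sim N\!\left(0,\; 2\Phi\,\sigma_g^2\right), \qquad
\varepsilon \sim N\!\left(0,\; I\,\sigma_e^2\right)
```

Here y is the trait vector, X holds the covariates, g is the genotype vector of the variant being tested with effect γ, Φ is the kinship matrix (from the pedigree or estimated from genotypes), and σ_g² and σ_e² are the polygenic and residual variance components. Under this assumed form, the score, likelihood ratio and Wald tests are all tests of γ = 0.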

Input Files

famrvtest needs the following files as input: PED and DAT files in Merlin format, AND/OR a VCF file. When genotypes are stored in the PED and DAT files, the VCF file is not needed. However, even if genotypes are stored in a VCF file, PED and DAT files are still needed to carry covariate and trait information.

PED and DAT Files

  • When the PED file contains genotypes, no VCF file is needed as input.
  • famrvtest takes PED/DAT files in Merlin format. Please refer to the PED/DAT format description for details.
  • An example PED file is shown below:
    1 1 0 0 1 1.5 1 23 A A A A A A A A A A
    2 1 0 0 1 1.0 1 34 A C A C A C A C A C
    3 1 0 0 2 0.4 1 43 A A A A A A A A A A
    4 1 0 0 2 0.9 1 13 A C A C A C A C A C
  • The matching DAT file is shown below:
 T YourTraitName
 C SEX
 C AGE
 M 1:123456
 M 1:234567
 M 2:111111
 M 2:222222
 M X:12345
  • The DAT file must give variant names in the format "M chr:pos".
  • The order of labels in the DAT file must match the order of fields in the PED file.
  • Markers in the PED and DAT files must be sorted by chromosome and position.
  • Covariate and trait values are stored in the PED file. Covariate and trait descriptions are stored in the DAT file.
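
The rules above can be checked before running an analysis. The following is a minimal sketch in Python, not part of famrvtest: the function names are hypothetical, it assumes the DAT file uses only T, C and M lines as in the example, and it assumes "sorted by chromosome" means each chromosome forms one contiguous block with non-decreasing positions.

```python
import re

# Marker labels must look like "chr:pos" (rule stated in the list above).
MARKER_RE = re.compile(r"^([0-9A-Za-z]+):([0-9]+)$")

# Example data copied from this page (marker labels in plain chr:pos form).
EXAMPLE_DAT = """T YourTraitName
C SEX
C AGE
M 1:123456
M 1:234567
M 2:111111
M 2:222222
M X:12345
"""
EXAMPLE_PED_ROW = "1 1 0 0 1 1.5 1 23 A A A A A A A A A A"


def parse_dat(text):
    """Return a list of (type, name) pairs from DAT-file text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0].upper(), fields[1]))
    return entries


def check_dat(entries):
    """Return a list of human-readable problems found in the marker lines."""
    problems = []
    last_pos = {}          # chromosome -> last position seen
    current_chrom = None   # chromosome of the previous valid marker
    for kind, name in entries:
        if kind != "M":
            continue
        match = MARKER_RE.match(name)
        if match is None:
            problems.append(f"marker '{name}' is not in chr:pos format")
            continue
        chrom, pos = match.group(1), int(match.group(2))
        if chrom != current_chrom and chrom in last_pos:
            problems.append(f"chromosome {chrom} appears in more than one block")
        if chrom == current_chrom and pos < last_pos[chrom]:
            problems.append(f"marker '{name}' is out of position order")
        last_pos[chrom] = pos
        current_chrom = chrom
    return problems


def expected_ped_columns(entries):
    """Five pedigree columns, one per trait/covariate, two alleles per marker."""
    return 5 + sum(2 if kind == "M" else 1 for kind, _ in entries)


if __name__ == "__main__":
    dat_entries = parse_dat(EXAMPLE_DAT)
    print(check_dat(dat_entries))
    print(expected_ped_columns(dat_entries), len(EXAMPLE_PED_ROW.split()))
```

Comparing expected_ped_columns with the field count of each PED row is a quick way to catch a DAT file whose labels do not line up with the PED columns.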

VCF File

  • Another option is to use a VCF file as input. Please refer to the following link for the VCF file specification: 1000 genome wiki VCF specs
  • The VCF file should be compressed with bgzip and indexed with tabix, using the following commands:
 bgzip input.vcf     ## this command will generate input.vcf.gz
 tabix -p vcf -f input.vcf.gz  ## this command will generate input.vcf.gz.tbi
  • Even when a VCF file is provided, PED/DAT files are still needed for covariates and phenotypes.
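
A common mistake is to compress the VCF with plain gzip instead of bgzip, which tabix cannot index. The following Python sketch, not part of famrvtest and using hypothetical function names, checks that a file starts with a BGZF block header and that a .tbi index sits next to it. It assumes the "BC" subfield comes first in the gzip extra field, which is how bgzip writes it.

```python
import os
import struct


def looks_like_bgzf(path):
    """Return True if the file starts with a BGZF (bgzip) block header."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    if len(header) < 16:
        return False
    # gzip magic bytes, deflate method, and the FEXTRA flag (bit 2) set.
    if header[0:3] != b"\x1f\x8b\x08" or not header[3] & 4:
        return False
    # XLEN is a little-endian 16-bit length of the extra field.
    xlen = struct.unpack("<H", header[10:12])[0]
    # BGZF extra subfield: identifier "BC" with a 2-byte payload.
    return xlen >= 6 and header[12:14] == b"BC" and header[14:16] == b"\x02\x00"


def vcf_ready(path):
    """Return True if path is bgzip-compressed and has a tabix index beside it."""
    return (
        path.endswith(".gz")
        and os.path.exists(path)
        and looks_like_bgzf(path)
        and os.path.exists(path + ".tbi")
    )


if __name__ == "__main__":
    # "input.vcf.gz" is the file name used in the commands above.
    print(vcf_ready("input.vcf.gz"))
```

The check only inspects the first block header and the presence of the index file; it does not validate the VCF contents or the index itself.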

Group File for Gene-level Tests

  • Grouping is only necessary for gene-level tests.
  • With the --groupFile option, you can specify a particular set of variants to be grouped for burden tests.
  • The group file must be a tab- or space-delimited file in the following format:
 GROUP_ID MARKER1_ID MARKER2_ID MARKER3_ID ... 
  • MARKER_ID must be in the following format:
 CHR:POS:ALLELE1:ALLELE2
  • An example group file is:
 PLEKHN1 1:901922:G:A    1:901923:C:A    1:902088:G:A    1:902128:C:T    1:902133:C:G    1:902176:C:T    1:905669:C:G        
 HES4    1:934735:A:C    1:934770:G:A    1:934801:C:T    1:935085:G:A    1:935089:C:G
  • Version 2.4 and later allow variants from different chromosomes to be grouped for testing. This might be useful for pathway analysis.
  • Note: any variant whose alleles differ from those listed in the group file will be excluded from gene-level tests.
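
The group file format above is simple enough to validate ahead of time. The following Python sketch, not part of famrvtest and using hypothetical function names, reads group file text, checks that every marker follows CHR:POS:ALLELE1:ALLELE2, and reports which chromosomes each group spans. Merging repeated group IDs into one group is an assumption made here, not documented behavior.

```python
import re

# CHR:POS:ALLELE1:ALLELE2, with a numeric position.
MARKER_ID_RE = re.compile(r"^[^:\s]+:[0-9]+:[^:\s]+:[^:\s]+$")

# Example lines copied from this page.
EXAMPLE_GROUP_FILE = (
    "PLEKHN1 1:901922:G:A\t1:901923:C:A\t1:902088:G:A\t1:902128:C:T\t"
    "1:902133:C:G\t1:902176:C:T\t1:905669:C:G\n"
    "HES4\t1:934735:A:C\t1:934770:G:A\t1:934801:C:T\t1:935085:G:A\t1:935089:C:G\n"
)


def parse_group_file(text):
    """Return {group_id: [marker_id, ...]} from tab- or space-delimited text."""
    groups = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on any run of tabs or spaces
        if not fields:
            continue
        group_id, markers = fields[0], fields[1:]
        for marker in markers:
            if MARKER_ID_RE.match(marker) is None:
                raise ValueError(
                    f"line {line_number}: '{marker}' is not CHR:POS:ALLELE1:ALLELE2"
                )
        groups.setdefault(group_id, []).extend(markers)
    return groups


def chromosomes_per_group(groups):
    """Return {group_id: sorted list of chromosomes that the group touches}."""
    return {
        group_id: sorted({marker.split(":")[0] for marker in markers})
        for group_id, markers in groups.items()
    }


if __name__ == "__main__":
    parsed = parse_group_file(EXAMPLE_GROUP_FILE)
    for name, chroms in chromosomes_per_group(parsed).items():
        print(name, len(parsed[name]), chroms)
```

A group that spans more than one chromosome needs version 2.4 or later, so the chromosome report is a cheap way to spot pathway-style groups before choosing a release.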

Example Command Line

Single Variant Analysis

The following command runs single variant association analysis of the trait "LDL" using the score test, after inverse normalization of the quantitative trait and adjustment for covariates. --traitName specifies the trait or traits you want to analyze in this batch. If this option is not used, all traits coded in the DAT file are analyzed. --SingleVarLRT provides essentially the same test as the merlin --fastAssoc option.

./famrvtest --ped your.ped --dat your.dat --vcf your.vcf.gz --SingleVarScore --inverseNormal --useCovariates --traitName LDL

The command above runs family-based association analysis using a kinship matrix derived from the pedigree structure coded in the pedigree file. The following command shows an example of using genotypes to estimate an empirical relationship matrix instead.

 ./famrvtest --ped your.ped --dat your.dat --SingleVarScore --inverseNormal --useCovariates --traitName LDL --kinPedigree

Gene-level Association

The following command runs gene-level association analysis of the genes listed in "your.genes.groupfile" for the trait "LDL" using SKAT, Madsen-Browning weighted burden, rare allele count unweighted burden, collapsing burden and variable threshold tests, after inverse normalization of the quantitative trait and adjustment for covariates. Only rare variants with MAF less than or equal to 0.05 are grouped.

./famrvtest --ped your.ped --dat your.dat --SKAT_BETA --MB --burden --VT --inverseNormal --useCovariates --traitName LDL --groupFile your.genes.groupfile --maf 0.05

Change Log

  • Released version 0.0.9, which fixes a potential compile error. (10/10/2013)
  • Released version 2.0, a faster version that adds a family-based single variant permutation test. (7/14/2014)
  • Released version 2.2, which fixes a bug that prevented the single variant test from being run alone. (7/15/2014)
  • Uploaded a new source code package for version 2.2, with updated makefiles. (8/4/14)
  • Released version 2.3, which fixes a bug that caused a compile error (the correct makefile was not found). (8/20/14)
  • Released version 2.4, which enables pathway analysis by allowing variants from different chromosomes to be grouped. (9/27/2014)