Difference between revisions of "Rare variant tests"

From Genome Analysis Wiki
Latest revision as of 11:35, 2 February 2017

Summary of discussion from ESP rare variant working group

The rare variant working group within ESP has discussed rare variant tests on several conference calls. We recommend selecting one test from each of these three categories:

1. Aggregate tests (typically with 1% threshold, nonsynonymous SNPs only, with meta-analysis across different ethnic groups)

2. Tests that allow for risk and/or protective variants (again, probably 1% threshold, nonsynonymous SNPs only, with meta-analysis across different ethnic groups)

3. Weighted tests that allow incorporation of more common variants (possibly with a 5% threshold, nonsynonymous only, etc.)

A brief summary of the rare variant (RV) discussion:

- Permutations (permuting phenotypes within each ethnic group) will likely be required to obtain empirical p-values. These RV tests typically, but not always, give conservative p-values (deflated QQ plot). Because a large number of permutations (at least 1000) is needed, a computationally intensive test will not be practical.

- Using too many tests will decrease overall power because of the correction for family-wise error.

- Although we'd like to evaluate the power and type I error rates of these tests under a variety of genetic models, we have so few known positive examples that it would be difficult to assess them all fairly at this time. Instead, we expect to re-convene this discussion group at a later date, once some true positive associations are identified.

- Shamil Sunyaev is performing a bake-off with some of these tests, and we look forward to seeing his results in the future.

- PLINKSeq is on its way, but is likely a month away from release (end of Feb 2011).
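
The within-group permutation scheme mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the statistic (difference in mean rare-allele burden between cases and controls), the one-sided comparison, and all function names are hypothetical choices for the example and are not taken from any of the tests listed on this page.

```python
import random


def burden_statistic(phenotypes, burdens):
    """Mean burden in cases (1) minus mean burden in controls (0).

    Illustrative statistic only; any per-gene test statistic could be used.
    """
    cases = [b for p, b in zip(phenotypes, burdens) if p == 1]
    controls = [b for p, b in zip(phenotypes, burdens) if p == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permute_within_groups(phenotypes, groups, rng):
    """Shuffle phenotype labels separately inside each group.

    Case/control counts per group are preserved, so group membership
    cannot create a spurious association in the permuted data.
    """
    permuted = list(phenotypes)
    by_group = {}
    for idx, group in enumerate(groups):
        by_group.setdefault(group, []).append(idx)
    for indices in by_group.values():
        labels = [phenotypes[i] for i in indices]
        rng.shuffle(labels)
        for i, label in zip(indices, labels):
            permuted[i] = label
    return permuted


def empirical_p_value(phenotypes, burdens, groups, n_perm=1000, seed=0):
    """One-sided empirical p-value from n_perm within-group permutations."""
    rng = random.Random(seed)
    observed = burden_statistic(phenotypes, burdens)
    hits = 0
    for _ in range(n_perm):
        permuted = permute_within_groups(phenotypes, groups, rng)
        if burden_statistic(permuted, burdens) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)
```

Each permutation recomputes the test statistic, so the total cost is roughly n_perm times the cost of one test, which is why computationally intensive tests become impractical here.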


Summary of rare variant tests for sequence data

Compiled by Cristen Willer and Suzanne Leal for the ESP, Feb 1, 2011

* indicates applicability to quantitative data



1) Aggregate tests using a cutoff (e.g. 1%), analyzing nonsynonymous variants to detect detrimental variants

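{|
! scope="col" align="left" | Test Name
! scope="col" align="left" | Reference
! scope="col" align="left" | Software
! scope="col" align="left" | Notes
|-
| CMC/T1 test* || [http://www.ncbi.nlm.nih.gov/pubmed/18691683 Li & Leal, 2008] || || [http://atgu.mgh.harvard.edu/plinkseq/ Will be implemented in PlinkSeq]
|-
| KBAC || [http://www.ncbi.nlm.nih.gov/pubmed/20976247 Liu & Leal, 2010] || || [http://atgu.mgh.harvard.edu/plinkseq/ Will be implemented in PlinkSeq]
|-
| VT* || [http://www.ncbi.nlm.nih.gov/pubmed/20471002 Price et al., 2010] || http://genetics.bwh.harvard.edu/rare_variants/ || Incorporating functional weights but not VT, [http://atgu.mgh.harvard.edu/plinkseq/ Will be implemented in PlinkSeq]
|-
| WSS || [http://www.ncbi.nlm.nih.gov/pubmed/19214210 Madsen & Browning, 2009] || || with 1% cutoff, [http://atgu.mgh.harvard.edu/plinkseq/ Will be implemented in PlinkSeq]
|-
| CMAT || [http://www.ncbi.nlm.nih.gov/pubmed/21070896 Zawistowski et al. 2010] || ||
|-
| ANRV/GRANVIL* || [http://www.ncbi.nlm.nih.gov/pubmed/19810025 Morris & Zeggini] || ||
|-
| RARECOVER || [http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000954 Bhatia et al. 2010] || ||
|-
| CCRaVAT and QuTie* || [http://www.ncbi.nlm.nih.gov/pubmed/20964851 Lawrence et al. 2010] || http://www.sanger.ac.uk/resources/software/rarevariant/ ||
|-
| RVE (rare variant exclusive) || Cohen & Hobbs || || underpowered, [http://atgu.mgh.harvard.edu/plinkseq/ Will be implemented in PlinkSeq]
|}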
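
To make the idea of a frequency cutoff concrete, here is a minimal sketch of collapsing rare variants into one carrier indicator per individual, in the spirit of threshold-based aggregate tests such as T1. It is not the published statistic of any test in the table. The function names, the strict `<` comparison against the threshold, and the use of the whole sample to estimate allele frequencies are assumptions made for illustration.

```python
def allele_frequencies(genotypes):
    """Per-variant allele frequency from genotypes coded as allele counts.

    genotypes is a list of rows (one per individual); each row holds
    0, 1 or 2 copies of the counted allele at each variant site.
    Missing genotypes are assumed to be absent.
    """
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0])
    return [
        sum(row[j] for row in genotypes) / (2 * n_individuals)
        for j in range(n_variants)
    ]


def collapse_rare(genotypes, threshold=0.01):
    """Return 1 for individuals carrying any variant below the threshold.

    Variants that are monomorphic (frequency 0) or at or above the
    threshold are ignored.
    """
    freqs = allele_frequencies(genotypes)
    rare_sites = [j for j, f in enumerate(freqs) if 0 < f < threshold]
    return [
        1 if any(row[j] > 0 for j in rare_sites) else 0
        for row in genotypes
    ]
```

The resulting 0/1 vector can then be compared between cases and controls, or used as a predictor in a regression; passing `threshold=0.05` gives the analogous T5-style collapsing.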


2) Aggregate tests for protective and detrimental variants (recommend 1% cutoff)

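{|
! scope="col" align="left" | Test Name
! scope="col" align="left" | Reference
! scope="col" align="left" | Software
! scope="col" align="left" | Notes
|-
| C-alpha || [Neale et al., submitted] || || [http://atgu.mgh.harvard.edu/plinkseq/ Will be implemented in PlinkSeq]
|-
| Ionita-Laza & Lange || [http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1001289 Ionita-Laza & Lange, 2011] || ||
|-
| DASH* || [http://www.ncbi.nlm.nih.gov/pubmed/20413981 Han & Pan] || || Computational burden
|-
| SKAT* || [http://www.ncbi.nlm.nih.gov/pubmed/20560208 Wu et al., 2010] || http://www.hsph.harvard.edu/~xlin/software.html || For some kernel choices, genotypes need to be coded 0=major homozygote, 1=het, 2=minor homozygote
|-
| WHaIT || [http://www.ncbi.nlm.nih.gov/pubmed/21055717 Li et al. 2010] || http://csg.sph.umich.edu//yli/whait/ ||
|-
| EMMPAT* || [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2978703/pdf/pgen.1001202.pdf King et al. 2010] || http://home.uchicago.edu/~crk8e/papersup.html ||
|}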
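
The genotype coding noted for SKAT (0 = major homozygote, 1 = heterozygote, 2 = minor homozygote) can be enforced with a small recoding step before analysis. The sketch below is a hypothetical helper, not part of the SKAT software. It assumes genotypes arrive as counts of some arbitrary reference allele, with no missing values, and flips the coding when that allele turns out to be the major one.

```python
def recode_to_minor_allele_counts(allele_counts):
    """Recode one variant so that values count copies of the minor allele.

    allele_counts holds 0, 1 or 2 copies of an arbitrary counted allele
    per individual. If that allele has frequency above 0.5 it is the
    major allele, so the coding is flipped (0 <-> 2, 1 stays 1).
    """
    n_individuals = len(allele_counts)
    frequency = sum(allele_counts) / (2 * n_individuals)
    if frequency > 0.5:
        return [2 - g for g in allele_counts]
    return list(allele_counts)
```

For example, a variant recorded as `[2, 2, 2, 1, 2]` counts the major allele, so it would be flipped, while `[0, 1, 0]` already counts the minor allele and is returned unchanged.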


3) Analyzing common and rare variants together (could down-weight or threshold common variants)

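{|
! scope="col" align="left" | Test Name
! scope="col" align="left" | Reference
! scope="col" align="left" | Software
! scope="col" align="left" | Notes
|-
| WSS || [http://www.ncbi.nlm.nih.gov/pubmed/19214210 Madsen & Browning, 2009] || || with 1% or 5% cutoff, [http://atgu.mgh.harvard.edu/plinkseq/ Will be implemented in PlinkSeq]
|-
| RARECOVER || [http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000954 Bhatia et al. 2010] || ||
|-
| Step-Up Collapsing* || [http://www.plosone.org/article/info:doi/10.1371/journal.pone.0013584 Hoffman et al. 2010] || || [http://atgu.mgh.harvard.edu/plinkseq/ Will be implemented in PlinkSeq]
|-
| CMC/T5 test* || [http://www.ncbi.nlm.nih.gov/pubmed/18691683 Li & Leal, 2008] || || [http://atgu.mgh.harvard.edu/plinkseq/ Will be implemented in PlinkSeq]
|-
| MENDEL* || [http://www.ncbi.nlm.nih.gov/pubmed/21121038 Zhou et al. 2011] || http://www.genetics.ucla.edu/software/download?package=1 ||
|}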


4) Analyze higher-frequency rare variants (>1%) individually

- Use the same regression framework that has been used for common variants*
- Use meta-analysis to combine results from sequence data and imputed genotypes to increase power*
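
One common way to combine per-variant results from two sources, such as a sequenced subset and an imputed subset, is fixed-effect inverse-variance meta-analysis. The sketch below assumes each source supplies an effect estimate and its standard error from the same regression model. The function name and input layout are hypothetical, and nothing here is specific to the ESP analysis plan.

```python
import math


def inverse_variance_meta(estimates):
    """Fixed-effect inverse-variance meta-analysis.

    estimates is a list of (beta, standard_error) pairs, one per data
    source. Returns the pooled beta, its standard error, the z-score,
    and a two-sided p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided tail probability of a standard normal at |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p_value
```

The pooled standard error is never larger than the smallest input standard error, which is where the gain in power comes from. The same calculation could in principle be applied across ethnic groups, as suggested for the aggregate tests above.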

Additional tests

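{|
! scope="col" align="left" | Test Name
! scope="col" align="left" | Reference
! scope="col" align="left" | Software
! scope="col" align="left" | Notes
|-
| Logic regression* || [http://kooperberg.fhcrc.org/papers/2001gaw.pdf Kooperberg et al. 2001] || ||
|-
| Sequence diversity || Anderson et al. 2006 || ||
|-
| Sequence dissimilarity* || Schork et al. 2008, Wessel et al. 2006 || ||
|-
| Ridge regression* || [http://www.cell.com/AJHG/abstract/S0002-9297(08)00091-8 Malo et al. 2008] || ||
|}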