RAREMETAL METHOD

From Genome Analysis Wiki
Latest revision as of 13:28, 20 May 2019

INTRODUCTION

The key idea behind meta-analysis with RAREMETAL is that various gene-level test statistics can be reconstructed from single variant score statistics and that, when the linkage disequilibrium relationships between variants are known, the distribution of these gene-level statistics can be derived and used to evaluate significance. Single variant statistics are calculated using the Cochran-Mantel-Haenszel method. Our method has been published in Liu et al. (http://www.ncbi.nlm.nih.gov/pubmed/24336170). The main formulae are given below:

KEY FORMULAE

NOTATIONS

We denote the following to describe our methods:

U_{i,k} is the score statistic for the i^{th} variant from the  k^{th} study

V_{ij,k} is the covariance of the score statistics between the i^{th} and the j^{th} variant from the  k^{th} study

U_{i,k} and V_{ij,k} are described in detail in RAREMETALWORKER method.

\mathbf{U_k} is the vector of score statistics of rare variants in a gene from the  k^{th} study.

\mathbf{V_k} is the variance-covariance matrix of score statistics of rare variants in a gene from the  k^{th} study, or \mathbf{V_k} = cov(\mathbf{U_k})

 S is the number of studies

 f_{i} is the pooled allele frequency of i^{th} variant

 f_{i,k} is the allele frequency of i^{th} variant in k^{th} study

 {\delta_{k}} is the deviation of trait value of k^{th} study

 \mathbf{w} = (w_1,w_2,...,w_m)^T is the vector of weights for m rare variants in a gene.

SINGLE VARIANT META ANALYSIS

The single variant meta-analysis score statistic can be reconstructed from the score statistics and their variances generated by each study, assuming that samples are unrelated across studies. Define the meta-analysis score statistic as

U_{meta_i}=\sum_{k=1}^S {U_{i,k}}

and its variance

V_{meta_i}=\sum_{k=1}^S{V_{ii,k}}.

Then the score test statistic for the i^{th} variant, T_{meta_i}, asymptotically follows a standard normal distribution

T_{meta_i}=U_{meta_i}\bigg/\sqrt{V_{meta_i}}=\sum_{k=1}^S {U_{i,k}}\bigg/\sqrt{\sum_{k=1}^S{V_{ii,k}}} \sim\mathbf{N}(0,1).
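
The three formulae above can be sketched in a few lines of Python. This is only an illustration and not RAREMETAL's own code: the function name `single_variant_meta` and the input numbers are hypothetical, and the two-sided p-value is an assumption about how the N(0,1) reference distribution would be used.

```python
import math

def single_variant_meta(scores, variances):
    """Combine per-study score statistics U_{i,k} and variances V_{ii,k}
    for one variant i across S studies (hypothetical helper)."""
    u_meta = sum(scores)                 # U_{meta_i} = sum_k U_{i,k}
    v_meta = sum(variances)              # V_{meta_i} = sum_k V_{ii,k}
    t_meta = u_meta / math.sqrt(v_meta)  # T_{meta_i} ~ N(0,1) asymptotically
    # Assumed two-sided p-value: P(|Z| > |t|) = erfc(|t| / sqrt(2))
    p_value = math.erfc(abs(t_meta) / math.sqrt(2.0))
    return u_meta, v_meta, t_meta, p_value

# Made-up inputs from three studies for a single variant
u_meta, v_meta, t_meta, p_value = single_variant_meta([1.5, 2.0, -0.5],
                                                      [4.0, 3.0, 2.0])
print(u_meta, v_meta, t_meta, p_value)
```

Because only sums of per-study scores and variances are needed, each study can share summary statistics without sharing individual-level data.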


Optimized method for unbalanced studies (--useExact):

U_{meta_i}=\sum_{k=1}^S {U_{i,k}/\hat{\Omega_{k}}}-\sum_{k=1}^S{2n_{k}{\delta_{k}^{2}(f_{i}-f_{i,k})}}

V_{meta_i}={\sigma^{2}}\sum_{k=1}^S{(V_{ii,k}{\Omega_{k}}-4n_{k}(ff'-f_{k}f_{k}'))}

{\sigma^{2}}=\sum_{k=1}^S{((n_{k}-1){\Omega_{k}}+n_{k}{\delta_{k}^{2}})}/(n-1)

BURDEN META ANALYSIS

Burden tests have been shown to be powerful for detecting a group of rare variants whose effects are in the same direction. Once the single variant meta-analysis statistics are constructed, the burden test score statistic for a gene can be easily reconstructed as

T_{meta_{burden}}=\mathbf{w^TU_{meta}}\bigg/\sqrt{\mathbf{w^TV_{meta}w}} \sim\mathbf{N}(0,1),

where \mathbf{U_{meta}} = (U_{meta_1},U_{meta_2},...,U_{meta_m})^T is the vector of single variant meta-analysis scores for the m variants in a gene, and \mathbf{V_{meta}}=cov(\mathbf{U_{meta}}) is the covariance matrix of these scores across the m variants.
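
As a rough sketch of the burden statistic, the snippet below evaluates \mathbf{w^TU_{meta}} and \mathbf{w^TV_{meta}w} for a two-variant gene. The helper name `burden_meta` and all numbers are hypothetical; \mathbf{V_{meta}} is simply supplied as a full covariance matrix, on the assumption that it has already been assembled from the per-study matrices.

```python
import math

def burden_meta(weights, u_meta, v_meta):
    """Burden statistic w^T U_meta / sqrt(w^T V_meta w) (hypothetical helper)."""
    m = len(weights)
    numerator = sum(weights[i] * u_meta[i] for i in range(m))
    variance = sum(weights[i] * v_meta[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)

# Made-up meta-analysis scores and covariance matrix for two variants
u = [3.0, 1.0]
v = [[9.0, 1.0],
     [1.0, 4.0]]
w = [1.0, 1.0]  # equal weights, i.e. an unweighted burden test

t_burden = burden_meta(w, u, v)
print(t_burden)
```

The off-diagonal entries of \mathbf{V_{meta}} carry the linkage disequilibrium between variants, which is why they enter the denominator.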

VT META ANALYSIS

Including variants that are not associated with the phenotype can reduce power. The variable threshold (VT) test is designed to choose the optimal allele frequency threshold among the rare variants in a gene in order to gain power. The test statistic is defined as the maximum burden score statistic calculated over every possible frequency threshold


T_{meta_{VT}}=\max(T_{b\left(f_1\right)},T_{b\left(f_2\right)},\dots,T_{b\left(f_m\right)}),

where T_{b\left(f_j\right)} is the burden test statistic under allele frequency threshold f_j, and can be constructed from single variant meta-analysis statistics using


T_{b\left(f_j\right)}=\boldsymbol{\phi}_{f_j}^\mathbf{T}\mathbf{U_{meta}}\bigg/\sqrt{\boldsymbol{\phi}_{f_j}^\mathbf{T}\mathbf{V_{meta}}\boldsymbol{\phi}_{f_j}} ,


where j indexes the allele frequency thresholds in a group of rare variants, and \boldsymbol{\phi}_{f_j} is a vector of 0s and 1s indicating whether each variant is included in the analysis under frequency threshold f_j.


As described by Lin et al. (http://www.ncbi.nlm.nih.gov/pubmed/21885029), the p-value of this test can be calculated analytically using the fact that the burden test statistics together follow a multivariate normal distribution with mean \mathbf{0} and covariance \boldsymbol{\Omega}, written as


 \left(T_{b\left(f_1\right)},T_{b\left(f_2\right)},\dots,T_{b\left(f_m\right)}\right)\sim\mathbf{MVN}\left(\mathbf{0},\boldsymbol{\Omega}\right) ,


where \boldsymbol{\Omega_{ij}}=\frac{\boldsymbol{\phi}_{f_i}^T\mathbf{V_{meta}}\boldsymbol{\phi}_{f_j}}{\sqrt{\boldsymbol{\phi}_{f_i}^T\mathbf{V_{meta}}\boldsymbol{\phi}_{f_i}}\sqrt{\boldsymbol{\phi}_{f_j}^T\mathbf{V_{meta}}\boldsymbol{\phi}_{f_j}}}.
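
The per-threshold burden statistics and the matrix \boldsymbol{\Omega} can be sketched as follows. This is an illustrative sketch under stated assumptions, not RAREMETAL's implementation: the helper names are hypothetical, the thresholds are taken to be the distinct observed allele frequencies, and a variant is assumed to be included when its frequency is at or below the threshold. The analytic multivariate normal p-value itself is not computed here.

```python
import math

def quad_form(a, mat, b):
    """Compute a^T mat b for plain Python lists."""
    return sum(a[i] * mat[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def vt_statistics(freqs, u_meta, v_meta):
    """Burden statistic per frequency threshold, plus their correlation Omega."""
    thresholds = sorted(set(freqs))
    # phi_{f_j}: 1 if the variant frequency is <= threshold f_j (assumed rule)
    phis = [[1.0 if f <= t else 0.0 for f in freqs] for t in thresholds]
    stats = [sum(p[i] * u_meta[i] for i in range(len(freqs)))
             / math.sqrt(quad_form(p, v_meta, p)) for p in phis]
    omega = [[quad_form(pi, v_meta, pj)
              / math.sqrt(quad_form(pi, v_meta, pi) * quad_form(pj, v_meta, pj))
              for pj in phis] for pi in phis]
    return thresholds, stats, omega

# Made-up pooled allele frequencies, scores, and covariance for two variants
freqs = [0.001, 0.005]
u = [3.0, 1.0]
v = [[9.0, 1.0],
     [1.0, 4.0]]

thresholds, stats, omega = vt_statistics(freqs, u, v)
t_vt = max(stats)  # T_{meta_VT} as defined above
print(thresholds, stats, t_vt, omega)
```

Because nested thresholds share variants, the statistics are strongly correlated, which is what \boldsymbol{\Omega} captures.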

SKAT META ANALYSIS

SKAT is most powerful for detecting genes in which rare variants have effects in opposite directions. The meta-analysis statistic can also be reconstructed from the single variant meta-analysis scores and their covariances

\mathbf{Q}=\mathbf{{U_{meta}}^T}\mathbf{W}\mathbf{U_{meta}},

where \mathbf{W} is a diagonal matrix of weights of rare variants included in a gene.

As shown in Wu et al. (http://www.ncbi.nlm.nih.gov/pubmed/21737059), under the null hypothesis the \mathbf{Q} statistic follows a mixture of chi-squared distributions, described as

\mathbf{Q}\sim\sum_{i=1}^m{\lambda_i\chi_{1,i}^2}, where \left(\lambda_1,\lambda_2,\dots,\lambda_m\right) are the eigenvalues of \mathbf{V_{meta}^\frac{1}{2}}\mathbf{W}\mathbf{V_{meta}^\frac{1}{2}}.
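
The \mathbf{Q} statistic and the mixture weights \lambda_i can be sketched with NumPy as below. The function name `skat_meta` and the inputs are hypothetical, and the matrix square root is taken through an eigendecomposition on the assumption that \mathbf{V_{meta}} is symmetric and positive semi-definite. Turning the \lambda_i into a p-value needs a numerical method for mixtures of chi-squared variables, which is not shown.

```python
import numpy as np

def skat_meta(u_meta, v_meta, weights):
    """Return Q = U^T W U and the eigenvalues of V^{1/2} W V^{1/2}."""
    u = np.asarray(u_meta, dtype=float)
    v = np.asarray(v_meta, dtype=float)
    w = np.diag(np.asarray(weights, dtype=float))  # diagonal weight matrix W
    q = float(u @ w @ u)
    # Symmetric square root of V via its eigendecomposition (assumes V is PSD)
    evals, evecs = np.linalg.eigh(v)
    v_half = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    lambdas = np.linalg.eigvalsh(v_half @ w @ v_half)
    return q, lambdas

# Made-up meta-analysis scores, covariance matrix, and weights for two variants
u = [3.0, 1.0]
v = [[9.0, 1.0],
     [1.0, 4.0]]
weights = [1.0, 2.0]

q, lambdas = skat_meta(u, v, weights)
print(q, lambdas)
```

A quick sanity check is that the \lambda_i sum to the trace of \mathbf{V_{meta}W}, which is the mean of \mathbf{Q} under the null.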