
Genome Analysis Wiki β

RAREMETAL METHOD

Latest revision as of 13:28, 20 May 2019

INTRODUCTION

The key idea behind meta-analysis with RAREMETAL is that various gene-level test statistics can be reconstructed from single variant score statistics. When the linkage disequilibrium relationships between variants are known, the distribution of these gene-level statistics can be derived and used to evaluate significance. Single variant statistics are calculated using the Cochran-Mantel-Haenszel method. Our method has been published in Liu et al. The main formulae are summarized below.

KEY FORMULAE

NOTATIONS

We use the following notation to describe our methods:

U_{i,k} is the score statistic for the i^{th} variant from the k^{th} study.

V_{ij,k} is the covariance of the score statistics between the i^{th} and the j^{th} variants from the k^{th} study.

U_{i,k} and V_{ij,k} are described in detail on the RAREMETALWORKER method page.

\mathbf{U_k} is the vector of score statistics of the rare variants in a gene from the k^{th} study.

\mathbf{V_k} is the variance-covariance matrix of the score statistics of the rare variants in a gene from the k^{th} study, that is, \mathbf{V_k} = cov(\mathbf{U_k}).

S is the number of studies.

f_{i} is the pooled allele frequency of the i^{th} variant.

f_{i,k} is the allele frequency of the i^{th} variant in the k^{th} study.

\delta_{k} is the deviation of the trait value of the k^{th} study.

\mathbf{w} = (w_1,w_2,\dots,w_m)^T is the vector of weights for the m rare variants in a gene.

SINGLE VARIANT META ANALYSIS

The single variant meta-analysis score statistic can be reconstructed from the score statistics and their variances generated by each study, assuming that samples are unrelated across studies. Define the meta-analysis score statistic as

U_{meta_i}=\sum_{k=1}^S {U_{i,k}}

and its variance

V_{meta_i}=\sum_{k=1}^S{V_{ii,k}}.

Then the score test statistic for the i^{th} variant, T_{meta_i}, asymptotically follows a standard normal distribution:

T_{meta_i}=U_{meta_i}\bigg/\sqrt{V_{meta_i}}=\sum_{k=1}^S {U_{i,k}}\bigg/\sqrt{\sum_{k=1}^S{V_{ii,k}}} \sim\mathbf{N}(0,1).
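The single variant combination above amounts to summing scores and variances across studies. The following is a minimal Python sketch under that simple (not --useExact) formulation. The function names `single_variant_meta` and `two_sided_p` are hypothetical, the inputs are made-up numbers, and it assumes a two-sided p-value is wanted from the standard normal approximation.

```python
import math
from typing import Sequence


def single_variant_meta(u: Sequence[float], v: Sequence[float]) -> tuple[float, float, float]:
    """Combine per-study scores U_{i,k} and variances V_{ii,k} for one variant.

    Returns (U_meta_i, V_meta_i, T_meta_i) with T_meta_i = U_meta_i / sqrt(V_meta_i).
    Assumes samples are unrelated across studies, so variances simply add.
    """
    if len(u) != len(v) or not u:
        raise ValueError("need one (U, V) pair per study")
    u_meta = sum(u)                      # U_{meta_i} = sum_k U_{i,k}
    v_meta = sum(v)                      # V_{meta_i} = sum_k V_{ii,k}
    if v_meta <= 0:
        raise ValueError("pooled variance must be positive")
    t_meta = u_meta / math.sqrt(v_meta)  # asymptotically N(0, 1) under the null
    return u_meta, v_meta, t_meta


def two_sided_p(t: float) -> float:
    """Two-sided p-value P(|Z| >= |t|) for Z ~ N(0, 1)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Hypothetical inputs for a single variant from S = 3 studies.
u_meta, v_meta, t_meta = single_variant_meta([2.0, 1.5, 0.5], [1.0, 2.0, 1.0])
print(u_meta, v_meta, t_meta, two_sided_p(t_meta))
```

Here the pooled score and variance are both 4, so T_{meta_i} = 2 and the two-sided p-value is about 0.046.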


Optimized method for unbalanced studies (--useExact):

U_{meta_i}=\sum_{k=1}^S {U_{i,k}/\hat{\Omega_{k}}}-\sum_{k=1}^S{2n_{k}{\delta_{k}^{2}(f_{i}-f_{i,k})}}

V_{meta_i}={\sigma^{2}}\sum_{k=1}^S{(V_{ii,k}{\Omega_{k}}-4n_{k}(ff'-f_{k}f_{k}'))}

{\sigma^{2}}=\sum_{k=1}^S{((n_{k}-1){\Omega_{k}}+n_{k}{\delta_{k}^{2}})}/(n-1)

BURDEN META ANALYSIS

Burden tests have been shown to be powerful for detecting groups of rare variants whose effects are in the same direction. Once the single variant meta-analysis statistics are constructed, the burden test statistic for a gene can be reconstructed as

T_{meta_{burden}}=\mathbf{w^TU_{meta}}\bigg/\sqrt{\mathbf{w^TV_{meta}w}} \sim\mathbf{N}(0,1),

where \mathbf{U_{meta}} = (U_{meta_1},U_{meta_2},\dots,U_{meta_m})^T is the vector of single variant meta-analysis scores of the m variants in a gene, and \mathbf{V_{meta}}=cov(\mathbf{U_{meta}}) is the covariance matrix of these scores.
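The burden statistic needs only the pooled score vector and covariance matrix. The sketch below assumes that every study reports U_k and V_k for the same m variants in the same order. It also assumes that, because samples are unrelated across studies, the pooled covariance is the sum of the per-study matrices, extending the single variant rule to the whole matrix. The helper names (`sum_vectors`, `sum_matrices`, `quad_form`, `burden_meta`) and the data are hypothetical, and this is an illustration, not RAREMETAL's own code.

```python
import math
from typing import Sequence

Vector = list[float]
Matrix = list[list[float]]


def sum_vectors(vectors: Sequence[Vector]) -> Vector:
    """Element-wise sum of per-study score vectors: U_meta = sum_k U_k."""
    return [sum(col) for col in zip(*vectors)]


def sum_matrices(matrices: Sequence[Matrix]) -> Matrix:
    """Element-wise sum of per-study covariance matrices: V_meta = sum_k V_k."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


def quad_form(a: Vector, m: Matrix, b: Vector) -> float:
    """Return a^T M b."""
    return sum(a[i] * m[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def burden_meta(u_studies: Sequence[Vector], v_studies: Sequence[Matrix], w: Vector) -> float:
    """T_burden = w^T U_meta / sqrt(w^T V_meta w)."""
    u_meta = sum_vectors(u_studies)
    v_meta = sum_matrices(v_studies)                       # assumes independent studies
    numerator = sum(wi * ui for wi, ui in zip(w, u_meta))  # w^T U_meta
    variance = quad_form(w, v_meta, w)                     # w^T V_meta w
    if variance <= 0:
        raise ValueError("w^T V_meta w must be positive")
    return numerator / math.sqrt(variance)


# Hypothetical summary statistics for m = 2 variants in S = 2 studies.
u_studies = [[1.0, 0.5], [0.5, 1.0]]
v_studies = [
    [[1.0, 0.2], [0.2, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
print(burden_meta(u_studies, v_studies, [1.0, 1.0]))  # 3 / sqrt(4.4)
```

With weights (1, 0) the burden statistic reduces to the single variant statistic T_{meta_1}, which is a quick consistency check.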

VT META ANALYSIS

Including variants that are not associated with the phenotype can reduce power. The variable threshold (VT) test is designed to gain power by choosing the optimal allele frequency threshold among the rare variants in a gene. The test statistic is defined as the maximum burden test statistic over every possible frequency threshold:


T_{meta_{VT}}=\max(T_{b\left(f_1\right)},T_{b\left(f_2\right)},\dots,T_{b\left(f_m\right)}),

where T_{b\left(f_i\right)} is the burden test statistic under the allele frequency threshold f_i, and can be constructed from the single variant meta-analysis statistics as


T_{b\left(f_j\right)}=\boldsymbol{\phi}_{f_j}^\mathbf{T}\mathbf{U_{meta}}\bigg/\sqrt{\boldsymbol{\phi}_{f_j}^\mathbf{T}\mathbf{V_{meta}}\boldsymbol{\phi}_{f_j}} ,


where f_j is any of the allele frequencies in the group of rare variants, and \boldsymbol{\phi}_{f_j} is a vector of 0s and 1s indicating whether each variant is included in the analysis under frequency threshold f_j.
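Computing the VT statistic is a loop over candidate thresholds, each of which reuses the burden formula with 0/1 weights. This is a minimal sketch that assumes a variant is included under threshold f_j when its pooled frequency is at most f_j, and that each distinct observed frequency is a candidate threshold. The function `vt_meta` and the example data are hypothetical.

```python
import math
from typing import Sequence


def vt_meta(freqs: Sequence[float], u_meta: Sequence[float],
            v_meta: Sequence[Sequence[float]]) -> tuple[list[float], list[float], float]:
    """Return (thresholds, burden statistics T_b(f_j), VT statistic)."""
    m = len(freqs)
    thresholds = sorted(set(freqs))  # each distinct observed frequency is a candidate f_j
    stats = []
    for f in thresholds:
        # phi_{f_j}: 1 if the variant is included (frequency <= f_j), else 0
        phi = [1.0 if fi <= f else 0.0 for fi in freqs]
        numerator = sum(p * u for p, u in zip(phi, u_meta))  # phi^T U_meta
        variance = sum(phi[i] * v_meta[i][j] * phi[j]        # phi^T V_meta phi
                       for i in range(m) for j in range(m))
        stats.append(numerator / math.sqrt(variance))
    return thresholds, stats, max(stats)


# Hypothetical pooled statistics for three rare variants.
freqs = [0.001, 0.005, 0.01]
u_meta = [2.0, 1.0, -1.0]
v_meta = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
thresholds, stats, t_vt = vt_meta(freqs, u_meta, v_meta)
print(thresholds, stats, t_vt)
```

In this example the maximum, 3/\sqrt{2}, is reached at threshold 0.005. Adding the most common variant, whose score has the opposite sign, lowers the statistic to 2/\sqrt{3}. This illustrates why restricting the frequency threshold can gain power.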


As described by Lin et al., the p-value of this test can be calculated analytically using the fact that the burden test statistics jointly follow a multivariate normal distribution with mean \mathbf{0} and covariance \boldsymbol{\Omega}:


 \left(T_{b\left(f_1\right)},T_{b\left(f_2\right)},\dots,T_{b\left(f_m\right)}\right)\sim\mathbf{MVN}\left(\mathbf{0},\boldsymbol{\Omega}\right) ,


where \boldsymbol{\Omega_{ij}}=\frac{\boldsymbol{\phi}_{f_i}^T\mathbf{V_{meta}}\boldsymbol{\phi}_{f_j}}{\sqrt{\boldsymbol{\phi}_{f_i}^T\mathbf{V_{meta}}\boldsymbol{\phi}_{f_i}}\sqrt{\boldsymbol{\phi}_{f_j}^T\mathbf{V_{meta}}\boldsymbol{\phi}_{f_j}}}.
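The correlation matrix \boldsymbol{\Omega} follows directly from the indicator vectors and \mathbf{V_{meta}}. The sketch below computes it with NumPy. In place of the analytic multivariate normal integration described above, it substitutes a plain Monte Carlo estimate of the upper-tail probability P(\max_j T_{b(f_j)} \ge t_{obs}). This treats large positive values as extreme, following the maximum as written, and the simulation is only meant to make the distributional claim concrete. The function names, seed and draw count are hypothetical choices.

```python
import numpy as np


def vt_correlation(phis: np.ndarray, v_meta: np.ndarray) -> np.ndarray:
    """Omega_ij = phi_i^T V phi_j / sqrt(phi_i^T V phi_i * phi_j^T V phi_j).

    `phis` holds one 0/1 indicator vector phi_{f_j} per row.
    """
    cov = phis @ v_meta @ phis.T     # entry (i, j) is phi_i^T V_meta phi_j
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


def vt_pvalue_mc(t_obs: float, omega: np.ndarray,
                 n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(max_j T_b(f_j) >= t_obs) under MVN(0, Omega)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(omega.shape[0]), omega, size=n_draws)
    return float(np.mean(draws.max(axis=1) >= t_obs))


# Hypothetical example: three nested thresholds over three independent variants.
phis = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1]], dtype=float)
v_meta = np.eye(3)
omega = vt_correlation(phis, v_meta)
p_vt = vt_pvalue_mc(3.0 / np.sqrt(2.0), omega)
print(np.round(omega, 3), p_vt)
```

Because the thresholds are nested, the burden statistics are strongly correlated. As a result the VT p-value is noticeably smaller than a Bonferroni correction over the three thresholds would give.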

SKAT META ANALYSIS

SKAT is most powerful for detecting genes whose rare variants have effects in opposite directions. The SKAT meta-analysis statistic can also be reconstructed from the single variant meta-analysis scores and their covariances:

\mathbf{Q}=\mathbf{{U_{meta}}^T}\mathbf{W}\mathbf{U_{meta}},

where \mathbf{W} is a diagonal matrix of weights for the rare variants included in the gene.
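Because \mathbf{W} is diagonal, \mathbf{Q} reduces to a weighted sum of squared scores, so scores of opposite sign cannot cancel as they do in the burden numerator. The minimal sketch below contrasts the two on made-up scores. The function name `skat_q` is hypothetical.

```python
from typing import Sequence


def skat_q(u_meta: Sequence[float], weights: Sequence[float]) -> float:
    """Q = U_meta^T W U_meta with W = diag(weights), i.e. sum_i w_i * U_i^2."""
    if len(u_meta) != len(weights):
        raise ValueError("expected one weight per variant")
    return sum(w * u * u for w, u in zip(weights, u_meta))


# Hypothetical scores for two variants with effects in opposite directions.
u_meta = [2.0, -2.0]
weights = [1.0, 1.0]
burden_numerator = sum(w * u for w, u in zip(weights, u_meta))  # w^T U_meta
print(burden_numerator, skat_q(u_meta, weights))
```

Here the burden numerator is 0 while Q is 8, which is the situation in which SKAT is expected to outperform a burden test.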

As shown by Wu et al., under the null hypothesis the \mathbf{Q} statistic follows a mixture of chi-squared distributions:

\mathbf{Q}\sim\sum_{i=1}^m{\lambda_i\chi_{1,i}^2}, where the \chi_{1,i}^2 are independent chi-squared variables with one degree of freedom and \left(\lambda_1,\lambda_2,\dots,\lambda_m\right) are the eigenvalues of \mathbf{V_{meta}^\frac{1}{2}}\mathbf{W}\mathbf{V_{meta}^\frac{1}{2}}.
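The eigenvalues can be obtained from a symmetric square root of \mathbf{V_{meta}}. The tail probability of the mixture is estimated here by simple Monte Carlo simulation, a swapped-in technique chosen for readability. Dedicated implementations usually evaluate this probability with an exact or approximate numerical method such as Davies' method. The function names, seed and draw count are hypothetical. A useful check is that the eigenvalues sum to trace(\mathbf{W}\mathbf{V_{meta}}).

```python
import numpy as np


def skat_eigenvalues(v_meta: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Eigenvalues of V_meta^{1/2} W V_meta^{1/2} with W = diag(weights)."""
    evals, evecs = np.linalg.eigh(v_meta)
    evals = np.clip(evals, 0.0, None)            # remove tiny negative round-off
    v_half = (evecs * np.sqrt(evals)) @ evecs.T  # symmetric square root of V_meta
    return np.linalg.eigvalsh(v_half @ np.diag(weights) @ v_half)


def skat_pvalue_mc(q_obs: float, lambdas: np.ndarray,
                   n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(sum_i lambda_i * chi2_{1,i} >= q_obs)."""
    rng = np.random.default_rng(seed)
    chi2 = rng.chisquare(df=1, size=(n_draws, len(lambdas)))
    return float(np.mean(chi2 @ lambdas >= q_obs))


# Hypothetical pooled covariance for two variants, equal weights, observed Q = 8.
v_meta = np.array([[2.0, 0.2], [0.2, 2.0]])
weights = np.array([1.0, 1.0])
lambdas = skat_eigenvalues(v_meta, weights)
print(lambdas, skat_pvalue_mc(8.0, lambdas))
```

With equal weights the matrix is \mathbf{V_{meta}} itself, so the eigenvalues are 1.8 and 2.2.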