From Genome Analysis Wiki
Revision as of 14:23, 27 November 2016 by Saichen (SINGLE VARIANT META ANALYSIS)


The key idea behind meta-analysis with RAREMETAL is that various gene-level test statistics can be reconstructed from single variant score statistics. When the linkage disequilibrium relationships between variants are known, the distribution of these gene-level statistics can be derived and used to evaluate significance. Single variant statistics are calculated using the Cochran-Mantel-Haenszel method. Our method has been published in Liu et al. The main formulae are summarized below.



We use the following notation to describe our methods:

U_{i,k} is the score statistic for the i^{th} variant from the k^{th} study.

V_{ij,k} is the covariance of the score statistics between the i^{th} and the j^{th} variants from the k^{th} study.

U_{i,k} and V_{ij,k} are described in detail on the RAREMETALWORKER method page.

\mathbf{U_k} is the vector of score statistics of the rare variants in a gene from the k^{th} study.

\mathbf{V_k} is the variance-covariance matrix of the score statistics of the rare variants in a gene from the k^{th} study, i.e. \mathbf{V_k} = cov(\mathbf{U_k}).

S is the number of studies.

\mathbf{w} = (w_1,w_2,\dots,w_m)^T is the vector of weights for the m rare variants in a gene.


The single variant meta-analysis score statistic can be reconstructed from the score statistics and variances generated by each study, assuming that samples are unrelated across studies. Define the meta-analysis score statistic for the i^{th} variant as

U_{meta_i}=\sum_{k=1}^S {U_{i,k}}

and its variance as

V_{meta_i}=\sum_{k=1}^S {V_{ii,k}}.

Under the null hypothesis, the score test statistic for the i^{th} variant, T_{meta_i}, asymptotically follows a standard normal distribution:

T_{meta_i}=U_{meta_i}\bigg/\sqrt{V_{meta_i}}=\sum_{k=1}^S {U_{i,k}}\bigg/\sqrt{\sum_{k=1}^S{V_{ii,k}}} \sim\mathbf{N}(0,1).
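The single variant meta-analysis above amounts to summing scores and variances across studies before standardizing. A minimal Python sketch, assuming each study supplies U_{i,k} and V_{ii,k} as plain numbers; the function name `single_variant_meta` is hypothetical (not part of RAREMETAL), and reporting a two-sided p-value is our assumption:

```python
import math

def single_variant_meta(u_by_study, v_by_study):
    """Sketch: combine one variant's score statistics across S studies.

    u_by_study[k] is U_{i,k} and v_by_study[k] is V_{ii,k} for study k.
    Assumes samples are unrelated across studies, so scores and variances add.
    """
    if len(u_by_study) != len(v_by_study):
        raise ValueError("need one score and one variance per study")
    u_meta = sum(u_by_study)  # U_{meta_i} = sum_k U_{i,k}
    v_meta = sum(v_by_study)  # V_{meta_i} = sum_k V_{ii,k}
    if v_meta <= 0:
        raise ValueError("meta-analysis variance must be positive")
    t_meta = u_meta / math.sqrt(v_meta)
    # Assumed convention: two-sided p-value from N(0, 1),
    # P(|Z| >= |t|) = erfc(|t| / sqrt(2))
    p_value = math.erfc(abs(t_meta) / math.sqrt(2.0))
    return u_meta, v_meta, t_meta, p_value

print(single_variant_meta([3.0, 1.0, 2.0], [4.0, 2.0, 3.0])[:3])  # (6.0, 9.0, 2.0)
```

With three studies contributing scores (3, 1, 2) and variances (4, 2, 3), the combined statistic is 6 / sqrt(9) = 2.0.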

For unbalanced studies, an optimized form of the meta-analysis score statistic is used:

U_{meta_i}=\sum_{k=1}^S {U_{i,k}}-\sum_{k=1}^S{2n_{k}{\delta_{k}}}


Burden tests have been shown to be powerful for detecting a group of rare variants whose effects are in the same direction. Once the single variant meta-analysis statistics are constructed, the burden test score statistic for a gene can easily be reconstructed as

T_{meta_{burden}}=\mathbf{w^TU_{meta}}\bigg/\sqrt{\mathbf{w^TV_{meta}w}} \sim\mathbf{N}(0,1),

where \mathbf{U_{meta}} = (U_{meta_1},U_{meta_2},\dots,U_{meta_m})^T is the vector of single variant meta-analysis scores of the m variants in a gene and \mathbf{V_{meta}}=cov(\mathbf{U_{meta}}) is the covariance matrix of these scores. Because samples are assumed to be unrelated across studies, \mathbf{V_{meta}}=\sum_{k=1}^S\mathbf{V_k}.
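The burden statistic needs only the summed score vector and the summed covariance matrix. The sketch below is illustrative, not RAREMETAL's code: it assumes plain Python lists, independent studies (so \mathbf{V_{meta}} is the sum of the per-study \mathbf{V_k}), a two-sided p-value, and uses the hypothetical name `burden_meta`:

```python
import math

def burden_meta(u_by_study, v_by_study, weights):
    """Sketch of T_burden = w^T U_meta / sqrt(w^T V_meta w).

    Assumes studies are independent, so U_meta and V_meta are plain sums
    of the per-study score vectors U_k and covariance matrices V_k.
    """
    m = len(weights)
    # U_meta: element-wise sum of the per-study score vectors
    u_meta = [sum(u[j] for u in u_by_study) for j in range(m)]
    # V_meta: element-wise sum of the per-study covariance matrices
    v_meta = [[sum(v[r][c] for v in v_by_study) for c in range(m)] for r in range(m)]
    numerator = sum(w * u for w, u in zip(weights, u_meta))
    variance = sum(weights[r] * v_meta[r][c] * weights[c]
                   for r in range(m) for c in range(m))
    t_burden = numerator / math.sqrt(variance)
    # Assumed convention: two-sided p-value from the N(0, 1) reference
    p_value = math.erfc(abs(t_burden) / math.sqrt(2.0))
    return t_burden, p_value

t, p = burden_meta(
    u_by_study=[[1.0, 2.0], [1.0, 0.0]],
    v_by_study=[[[1.0, 0.5], [0.5, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
    weights=[1.0, 1.0],
)
print(round(t, 4))  # 1.7889
```

Here \mathbf{U_{meta}} = (2, 2)^T and \mathbf{V_{meta}} has 2 on the diagonal and 0.5 off it, so with unit weights the statistic is 4 / sqrt(5).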


Including variants that are not associated with the phenotype can reduce power. The variable threshold test is designed to gain power by choosing the optimal allele frequency threshold among the rare variants in a gene. The test statistic is defined as the maximum burden score statistic over every possible frequency threshold:

T_{max}=\max_{i=1,\dots,m}T_{b\left(f_i\right)},

where T_{b\left(f_i\right)} is the burden test statistic under allele frequency threshold f_i, and can be constructed from single variant meta-analysis statistics using

T_{b\left(f_j\right)}=\boldsymbol{\phi}_{f_j}^\mathbf{T}\mathbf{U_{meta}}\bigg/\sqrt{\boldsymbol{\phi}_{f_j}^\mathbf{T}\mathbf{V_{meta}}\boldsymbol{\phi}_{f_j}} ,

where j indexes the distinct allele frequencies among the rare variants in the gene, and \boldsymbol{\phi}_{f_j} is a vector of 0s and 1s indicating whether each variant is included in the analysis under frequency threshold f_j.

As described by Lin et al., the p-value of this test can be calculated analytically using the fact that the burden test statistics jointly follow a multivariate normal distribution with mean \mathbf{0} and covariance \boldsymbol{\Omega}:

 \left(T_{b\left(f_1\right)},T_{b\left(f_2\right)},\dots,T_{b\left(f_m\right)}\right)\sim\mathbf{MVN}\left(\mathbf{0},\boldsymbol{\Omega}\right) ,

where \boldsymbol{\Omega_{ij}}=\frac{\boldsymbol{\phi}_{f_i}^T\mathbf{V_{meta}}\boldsymbol{\phi}_{f_j}}{\sqrt{\boldsymbol{\phi}_{f_i}^T\mathbf{V_{meta}}\boldsymbol{\phi}_{f_i}}\sqrt{\boldsymbol{\phi}_{f_j}^T\mathbf{V_{meta}}\boldsymbol{\phi}_{f_j}}}.
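The variable threshold calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not RAREMETAL's implementation: the thresholds are taken to be the distinct allele frequencies, a variant enters \boldsymbol{\phi}_{f_j} when its frequency is at or below f_j, and the p-value P(\max_j T_{b(f_j)} \ge T_{max}) is estimated by Monte Carlo sampling from \mathbf{MVN}(\mathbf{0},\boldsymbol{\Omega}) in place of the analytical calculation described by Lin et al. The names `vt_meta`, `quad_form` and `cholesky_psd` are hypothetical.

```python
import math
import random

def quad_form(a, mat, b):
    """Return a^T * mat * b for vectors a, b and a square matrix mat."""
    n = len(a)
    return sum(a[r] * mat[r][c] * b[c] for r in range(n) for c in range(n))

def cholesky_psd(a, eps=1e-12):
    """Lower-triangular L with L L^T = a, tolerating positive semidefinite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(max(s, 0.0))
            elif low[j][j] > eps:
                low[i][j] = s / low[j][j]
            # otherwise leave 0.0: this direction carries no extra variance
    return low

def vt_meta(u_meta, v_meta, freqs, n_sim=20000, seed=1):
    """Sketch of the variable threshold test from meta-analysis scores.

    Assumptions (not from RAREMETAL): thresholds are the distinct allele
    frequencies, and the p-value P(max_j T_b(f_j) >= T_max) is estimated by
    Monte Carlo sampling from MVN(0, Omega) instead of analytically.
    """
    thresholds = sorted(set(freqs))
    # phi_{f_j}[v] = 1 if variant v has frequency <= f_j, else 0
    phis = [[1.0 if f <= t else 0.0 for f in freqs] for t in thresholds]
    var = [quad_form(phi, v_meta, phi) for phi in phis]
    # T_b(f_j) = phi^T U_meta / sqrt(phi^T V_meta phi)
    t_stats = [sum(p * u for p, u in zip(phi, u_meta)) / math.sqrt(vv)
               for phi, vv in zip(phis, var)]
    t_max = max(t_stats)
    n = len(phis)
    # Omega_ij = phi_i^T V phi_j / sqrt(phi_i^T V phi_i * phi_j^T V phi_j)
    omega = [[quad_form(phis[i], v_meta, phis[j]) / math.sqrt(var[i] * var[j])
              for j in range(n)] for i in range(n)]
    low = cholesky_psd(omega)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # One draw from MVN(0, Omega) is L z; keep its largest component
        draw = max(sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n))
        if draw >= t_max:
            exceed += 1
    p_value = (exceed + 1) / (n_sim + 1)
    return t_max, t_stats, omega, p_value

t_max, t_stats, omega, p = vt_meta([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [0.01, 0.02])
print(round(t_max, 4), round(omega[0][1], 4))  # 1.4142 0.7071
```

With two independent variants, the 0.01 threshold keeps one variant (T_b = 1) and the 0.02 threshold keeps both (T_b = sqrt(2)). The two statistics share one variant, which gives a correlation of 1/sqrt(2) in \boldsymbol{\Omega}. The Monte Carlo estimate is only as precise as `n_sim` allows, so very small p-values would need an analytical or importance-sampling approach instead.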


SKAT is most powerful for detecting genes whose rare variants have effects in opposite directions. The SKAT meta-analysis statistic can also be reconstructed from the single variant meta-analysis scores and their covariances:

\mathbf{Q}=\mathbf{U_{meta}^T}\mathbf{W}\mathbf{U_{meta}},

where \mathbf{W} is a diagonal matrix of weights of rare variants included in a gene.

As shown by Wu et al., under the null hypothesis the \mathbf{Q} statistic follows a mixture of chi-squared distributions:

\mathbf{Q}\sim\sum_{i=1}^m{\lambda_i\chi_{1,i}^2}, where the \chi_{1,i}^2 are independent chi-squared random variables with one degree of freedom and \left(\lambda_1,\lambda_2,\dots,\lambda_m\right) are the eigenvalues of \mathbf{V_{meta}^\frac{1}{2}}\mathbf{W}\mathbf{V_{meta}^\frac{1}{2}}.
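The SKAT statistic and its null mixture can likewise be sketched in pure Python. This rests on several assumptions. \mathbf{W} = diag(w_1,\dots,w_m) has non-negative weights. The eigenvalues are computed for \mathbf{W}^{1/2}\mathbf{V_{meta}}\mathbf{W}^{1/2}, which has the same eigenvalues as \mathbf{V_{meta}^{1/2}}\mathbf{W}\mathbf{V_{meta}^{1/2}} and avoids a matrix square root. The p-value is approximated by Monte Carlo simulation of \sum\lambda_i\chi_{1,i}^2 rather than by an exact method. The helper names `jacobi_eigenvalues` and `skat_meta` are hypothetical.

```python
import math
import random

def jacobi_eigenvalues(a, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (sorted ascending)."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def skat_meta(u_meta, v_meta, weights, n_sim=20000, seed=1):
    """Sketch: Q = U_meta^T W U_meta with W = diag(weights), plus a Monte Carlo p-value."""
    m = len(weights)
    q_stat = sum(weights[j] * u_meta[j] ** 2 for j in range(m))
    # W^{1/2} V_meta W^{1/2} has the same eigenvalues as V_meta^{1/2} W V_meta^{1/2}
    root_w = [math.sqrt(w) for w in weights]  # assumes non-negative weights
    k_mat = [[root_w[r] * v_meta[r][c] * root_w[c] for c in range(m)] for r in range(m)]
    lambdas = jacobi_eigenvalues(k_mat)
    # Monte Carlo approximation of P(sum_i lambda_i * chi2_1 >= Q), not an exact method
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        draw = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lambdas)
        if draw >= q_stat:
            exceed += 1
    p_value = (exceed + 1) / (n_sim + 1)
    return q_stat, lambdas, p_value

q, lambdas, p = skat_meta([1.0, 2.0], [[2.0, 1.0], [1.0, 2.0]], [1.0, 1.0])
print(q, [round(x, 6) for x in lambdas])  # 5.0 [1.0, 3.0]
```

With unit weights the eigenvalues are simply those of \mathbf{V_{meta}}, here 1 and 3. Using the symmetric form \mathbf{W}^{1/2}\mathbf{V_{meta}}\mathbf{W}^{1/2} keeps the matrix symmetric, so a simple Jacobi iteration suffices.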