# RAREMETAL METHOD

## INTRODUCTION

The key idea behind meta-analysis with RAREMETAL is that various gene-level test statistics can be reconstructed from single variant score statistics and that, when the linkage disequilibrium relationships between variants are known, the distribution of these gene-level statistics can be derived and used to evaluate significance. Single variant statistics are calculated using the Cochran-Mantel-Haenszel method. Our method has been published in Liu et al. The main formulae are summarized below.

## KEY FORMULAE

### NOTATIONS

We denote the following to describe our methods:

${\displaystyle U_{i,k}}$  is the score statistic for the ${\displaystyle i^{th}}$  variant from the ${\displaystyle k^{th}}$  study

${\displaystyle V_{ij,k}}$  is the covariance between the score statistics of the ${\displaystyle i^{th}}$  and the ${\displaystyle j^{th}}$  variants from the ${\displaystyle k^{th}}$  study

${\displaystyle U_{i,k}}$  and ${\displaystyle V_{ij,k}}$  are described in detail in RAREMETALWORKER method.

${\displaystyle \mathbf {U_{k}} }$  is the vector of score statistics of rare variants in a gene from the ${\displaystyle k^{th}}$  study.

${\displaystyle \mathbf {V_{k}} }$  is the variance-covariance matrix of score statistics of rare variants in a gene from the ${\displaystyle k^{th}}$  study, or ${\displaystyle \mathbf {V_{k}} =cov(\mathbf {U_{k}} )}$

${\displaystyle S}$  is the number of studies

${\displaystyle f_{i}}$  is the pooled allele frequency of the ${\displaystyle i^{th}}$  variant

${\displaystyle f_{i,k}}$  is the allele frequency of the ${\displaystyle i^{th}}$  variant in the ${\displaystyle k^{th}}$  study

${\displaystyle {\delta _{k}}}$  is the deviation of the trait value for the ${\displaystyle k^{th}}$  study

${\displaystyle \mathbf {w} =(w_{1},w_{2},...,w_{m})^{T}}$  is the vector of weights for ${\displaystyle m}$  rare variants in a gene.

### SINGLE VARIANT META ANALYSIS

The single variant meta-analysis score statistic can be reconstructed from the score statistics and their variances generated by each study, assuming that samples are unrelated across studies. Define the meta-analysis score statistic as

${\displaystyle U_{meta_{i}}=\sum _{k=1}^{S}{U_{i,k}}}$

and its variance

${\displaystyle V_{meta_{i}}=\sum _{k=1}^{S}{V_{ii,k}}}$ .

Then the score test statistic for the ${\displaystyle i^{th}}$  variant, ${\displaystyle T_{meta_{i}}}$ , asymptotically follows a standard normal distribution

${\displaystyle T_{meta_{i}}=U_{meta_{i}}{\bigg /}{\sqrt {V_{meta_{i}}}}=\sum _{k=1}^{S}{U_{i,k}}{\bigg /}{\sqrt {\sum _{k=1}^{S}{V_{ii,k}}}}\sim \mathbf {N} (0,1)}$ .
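The pooling step above can be sketched in a few lines of Python. This is a minimal illustration, not RAREMETAL's implementation: the function name `single_variant_meta` and the input numbers are hypothetical, and the two-sided p-value is an assumption (the text only states that the statistic is asymptotically standard normal).

```python
import math

def single_variant_meta(scores, variances):
    """Pool per-study scores U_{i,k} and variances V_{ii,k} for one variant."""
    u_meta = sum(scores)        # U_meta_i = sum over studies of U_{i,k}
    v_meta = sum(variances)     # V_meta_i = sum over studies of V_{ii,k}
    t_meta = u_meta / math.sqrt(v_meta)
    # Assumed two-sided p-value under N(0, 1): P(|Z| > |t|) = erfc(|t| / sqrt(2))
    p_value = math.erfc(abs(t_meta) / math.sqrt(2.0))
    return u_meta, v_meta, t_meta, p_value

# Hypothetical score statistics and variances from S = 3 studies.
u, v, t, p = single_variant_meta([2.0, 1.0, 3.0], [4.0, 5.0, 7.0])
print(u, v, t, p)
```

Because only sums of per-study quantities are needed, each study can share its scores and variances without sharing individual-level data.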

Optimized method for unbalanced studies (--useExact):

${\displaystyle U_{meta_{i}}=\sum _{k=1}^{S}{U_{i,k}/{\hat {\Omega _{k}}}}-\sum _{k=1}^{S}{2n_{k}{\delta _{k}^{2}(f_{i}-f_{i,k})}}}$

${\displaystyle V_{meta_{i}}={\sigma ^{2}}\sum _{k=1}^{S}{(V_{ii,k}{\Omega _{k}}-4n_{k}(ff'-f_{k}f_{k}'))}}$

${\displaystyle {\sigma ^{2}}=\sum _{k=1}^{S}{((n_{k}-1){\Omega _{k}}+n_{k}{\delta _{k}^{2}})}/(n-1)}$

### BURDEN META ANALYSIS

The burden test has been shown to be powerful for detecting a group of rare variants whose effects are in the same direction. Once single variant meta-analysis statistics are constructed, the burden test score statistic for a gene can be easily reconstructed as

${\displaystyle T_{meta_{burden}}=\mathbf {w^{T}U_{meta}} {\bigg /}{\sqrt {\mathbf {w^{T}V_{meta}w} }}\sim \mathbf {N} (0,1)}$ ,

where ${\displaystyle \mathbf {U_{meta}} =(U_{meta_{1}},U_{meta_{2}},...,U_{meta_{m}})^{T}}$  and ${\displaystyle \mathbf {V_{meta}} =cov(\mathbf {U_{meta}} )}$ , representing a vector of single variant meta-analysis scores of ${\displaystyle m}$  variants in a gene and the covariance matrix of the scores across ${\displaystyle m}$  variants.
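As a rough illustration of the burden statistic, the sketch below evaluates the weighted score and its variance from a pooled score vector and covariance matrix. The function name `burden_meta`, the equal weights, and the numbers are all hypothetical; this is not RAREMETAL's code.

```python
import math

def burden_meta(w, u_meta, v_meta):
    """Return w^T U_meta / sqrt(w^T V_meta w) for one gene."""
    m = len(w)
    numerator = sum(w[i] * u_meta[i] for i in range(m))
    # Quadratic form w^T V_meta w, the variance of the weighted score.
    variance = sum(w[i] * v_meta[i][j] * w[j] for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)

# Hypothetical pooled statistics for m = 3 rare variants in a gene.
w = [1.0, 1.0, 1.0]
u_meta = [1.0, 2.0, 3.0]
v_meta = [[4.0, 1.0, 0.0],
          [1.0, 4.0, 1.0],
          [0.0, 1.0, 4.0]]
t_burden = burden_meta(w, u_meta, v_meta)
print(t_burden)
```

The off-diagonal entries of `v_meta` matter here: ignoring the covariance between variants would change the denominator and therefore the test statistic.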

### VT META ANALYSIS

Including variants that are not associated with the phenotype can hurt power. The variable threshold (VT) test is designed to choose the optimal allele frequency threshold among rare variants in a gene to gain power. The test statistic is defined as the maximum burden score statistic calculated over every possible frequency threshold

${\displaystyle T_{meta_{VT}}=\max(T_{b\left(f_{1}\right)},T_{b\left(f_{2}\right)},\dots ,T_{b\left(f_{m}\right)})}$ ,

where ${\displaystyle T_{b\left(f_{i}\right)}}$  is the burden test statistic under allele frequency threshold ${\displaystyle f_{i}}$ , and can be constructed from single variant meta-analysis statistics using

${\displaystyle T_{b\left(f_{j}\right)}={\boldsymbol {\phi }}_{f_{j}}^{\mathbf {T} }\mathbf {U_{meta}} {\bigg /}{\sqrt {{\boldsymbol {\phi }}_{f_{j}}^{\mathbf {T} }\mathbf {V_{meta}} {\boldsymbol {\phi }}_{f_{j}}}}}$ ,

where ${\displaystyle j}$  indexes the allele frequencies observed in a group of rare variants, and ${\displaystyle {\boldsymbol {\phi }}_{f_{j}}}$  is a vector of 0s and 1s indicating whether each variant is included in the analysis under frequency threshold ${\displaystyle f_{j}}$ .

As described by Lin et al., the p-value of this test can be calculated analytically using the fact that the burden test statistics together follow a multivariate normal distribution with mean ${\displaystyle \mathbf {0} }$  and covariance ${\displaystyle {\boldsymbol {\Omega }}}$ , written as

${\displaystyle \left(T_{b\left(f_{1}\right)},T_{b\left(f_{2}\right)},\dots ,T_{b\left(f_{m}\right)}\right)}$ ${\displaystyle \sim \mathbf {MVN} \left(\mathbf {0} ,{\boldsymbol {\Omega }}\right)}$ ,

where ${\displaystyle {\boldsymbol {\Omega _{ij}}}={\frac {{\boldsymbol {\phi }}_{f_{i}}^{T}\mathbf {V_{meta}} {\boldsymbol {\phi }}_{f_{j}}}{{\sqrt {{\boldsymbol {\phi }}_{f_{i}}^{T}\mathbf {V_{meta}} {\boldsymbol {\phi }}_{f_{i}}}}{\sqrt {{\boldsymbol {\phi }}_{f_{j}}^{T}\mathbf {V_{meta}} {\boldsymbol {\phi }}_{f_{j}}}}}}}$ .
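The construction of the per-threshold burden statistics and their correlation matrix can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names `quad_form` and `vt_meta` are hypothetical, a variant is assumed to be included when its pooled allele frequency is at or below the threshold, and the final multivariate normal p-value is not computed here.

```python
import math

def quad_form(a, mat, b):
    """Return a^T mat b for plain Python lists."""
    return sum(a[i] * mat[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

def vt_meta(freqs, u_meta, v_meta):
    """Burden statistic at each frequency threshold, their maximum, and Omega."""
    thresholds = sorted(set(freqs))
    # phi_{f_j}: 1 if the variant's frequency is <= threshold f_j (assumed rule), else 0.
    phis = [[1.0 if f <= t else 0.0 for f in freqs] for t in thresholds]
    variances = [quad_form(phi, v_meta, phi) for phi in phis]
    stats = [sum(p * u for p, u in zip(phi, u_meta)) / math.sqrt(var)
             for phi, var in zip(phis, variances)]
    # Omega_ij = phi_i^T V phi_j / sqrt(phi_i^T V phi_i * phi_j^T V phi_j)
    omega = [[quad_form(phis[i], v_meta, phis[j]) / math.sqrt(variances[i] * variances[j])
              for j in range(len(phis))] for i in range(len(phis))]
    return max(stats), stats, omega

# Hypothetical pooled allele frequencies and statistics for m = 3 variants.
freqs = [0.001, 0.01, 0.005]
u_meta = [3.0, -1.0, 1.0]
v_meta = [[4.0, 1.0, 0.0],
          [1.0, 4.0, 1.0],
          [0.0, 1.0, 4.0]]
t_vt, stats, omega = vt_meta(freqs, u_meta, v_meta)
print(t_vt, stats)
```

In this toy example the strictest threshold gives the largest statistic, because adding the second variant (with a negative score) dilutes the signal. The matrix `omega` has a unit diagonal and is what a multivariate normal probability routine would take as input.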

### SKAT META ANALYSIS

SKAT is most powerful when detecting genes whose rare variants have effects in opposite directions. The meta-analysis statistic can also be reconstructed from single variant meta-analysis scores and their covariances

${\displaystyle \mathbf {Q} =\mathbf {{U_{meta}}^{T}} \mathbf {W} \mathbf {U_{meta}} }$ ,

where ${\displaystyle \mathbf {W} }$  is a diagonal matrix of weights of rare variants included in a gene.

As shown in Wu et al., the null distribution of the ${\displaystyle \mathbf {Q} }$  statistic is a mixture of chi-squared distributions, described as

${\displaystyle \mathbf {Q} \sim \sum _{i=1}^{m}{\lambda _{i}\chi _{1,i}^{2}},}$  where ${\displaystyle \left(\lambda _{1},\lambda _{2},\dots ,\lambda _{m}\right)}$  are the eigenvalues of ${\displaystyle \mathbf {V_{meta}^{\frac {1}{2}}} \mathbf {W} \mathbf {V_{meta}^{\frac {1}{2}}} }$ .
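A small sketch can make the SKAT statistic and its null distribution more concrete. It computes Q and the first two null moments of the mixture without an eigendecomposition, using the fact that the eigenvalues of the matrix above coincide with those of the product of V_meta and W, so their sum is the trace of that product and their sum of squares is the trace of its square. The function name `skat_q_and_moments` and the inputs are hypothetical, the chi-squared terms are assumed independent, and computing the actual p-value would need a mixture-of-chi-squares tail routine that is not shown here.

```python
def skat_q_and_moments(u_meta, v_meta, weights):
    """Return Q = U^T W U and the null mean and variance of the chi-square mixture."""
    m = len(u_meta)
    # W is diagonal, so Q reduces to a weighted sum of squared scores.
    q = sum(weights[i] * u_meta[i] ** 2 for i in range(m))
    # V W: scale column j of V_meta by weight w_j.
    vw = [[v_meta[i][j] * weights[j] for j in range(m)] for i in range(m)]
    # E[Q] = sum(lambda_i) = tr(V W)
    mean_q = sum(vw[i][i] for i in range(m))
    # Var[Q] = 2 * sum(lambda_i^2) = 2 * tr((V W)^2)
    var_q = 2.0 * sum(vw[i][j] * vw[j][i] for i in range(m) for j in range(m))
    return q, mean_q, var_q

# Hypothetical pooled statistics and diagonal weights for m = 3 variants.
u_meta = [1.0, 2.0, 3.0]
v_meta = [[4.0, 1.0, 0.0],
          [1.0, 4.0, 1.0],
          [0.0, 1.0, 4.0]]
weights = [1.0, 2.0, 1.0]
q, mean_q, var_q = skat_q_and_moments(u_meta, v_meta, weights)
print(q, mean_q, var_q)
```

Comparing `q` with `mean_q` and `var_q` gives a quick sanity check on whether an observed statistic is unusually large before running a full p-value calculation.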