RAREMETALWORKER METHOD


Brief Introduction

RAREMETALWORKER generates single-variant association test statistics for a single study prior to meta-analysis. This page briefly describes the statistics that RAREMETALWORKER calculates, together with the key formulae.

Key Statistics for Analysis of a Single Study

NOTATIONS

We use the following notation to describe our methods:

${\displaystyle \mathbf {y} }$ is the observed phenotype vector

${\displaystyle {\hat {\boldsymbol {\Omega }}}}$ is the estimated covariance matrix of ${\displaystyle \mathbf {y} }$

${\displaystyle \mathbf {X} }$ is the design matrix of covariates

${\displaystyle \mathbf {G_{i}} }$ is the genotype vector of the ${\displaystyle i^{th}}$ variant

${\displaystyle {\bar {\mathbf {G_{i}} }}}$ is the vector whose elements all equal the average genotype of the ${\displaystyle i^{th}}$ variant

${\displaystyle {\boldsymbol {\beta _{c}}}}$ is the vector of covariate effects

${\displaystyle \beta _{i}}$ is the scalar fixed genetic effect of the ${\displaystyle i^{th}}$ variant

${\displaystyle \mathbf {g} }$ is the vector of random genetic effects

${\displaystyle {\boldsymbol {\varepsilon }}}$ is the vector of non-shared environmental effects

SINGLE VARIANT SCORE TEST

We use the following model for the trait:

${\displaystyle \mathbf {y} =\mathbf {X} {\boldsymbol {\beta _{c}}}+\beta _{i}(\mathbf {G_{i}} -{\bar {\mathbf {G_{i}} }})+\mathbf {g} +{\boldsymbol {\varepsilon }}}$.

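Here, ${\displaystyle \mathbf {X} {\boldsymbol {\beta _{c}}}}$ captures the covariate effects, ${\displaystyle \beta _{i}(\mathbf {G_{i}} -{\bar {\mathbf {G_{i}} }})}$ is the fixed additive effect of the mean-centered genotype at the ${\displaystyle i^{th}}$ variant, ${\displaystyle \mathbf {g} }$ is the random genetic effect shared among related individuals, and ${\displaystyle {\boldsymbol {\varepsilon }}}$ is the non-shared environmental effect.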
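To make the roles of the terms concrete, the following is a minimal simulation sketch of this trait model, assuming NumPy and the normal distributions for ${\displaystyle \mathbf {g} }$ and ${\displaystyle {\boldsymbol {\varepsilon }}}$ described under Modeling Relatedness. The function name `simulate_trait` and its arguments are hypothetical and are not part of RAREMETALWORKER.

```python
import numpy as np

def simulate_trait(X, beta_c, G_i, beta_i, K, sigma_g2, sigma_e2, rng):
    """Draw y = X beta_c + beta_i (G_i - mean(G_i)) + g + eps (illustrative sketch).

    Assumes g ~ N(0, K * sigma_g2) and eps ~ N(0, I * sigma_e2).
    """
    n = X.shape[0]
    G_centered = G_i - G_i.mean()                       # G_i - \bar{G}_i
    g = rng.multivariate_normal(np.zeros(n), sigma_g2 * K)
    eps = rng.normal(0.0, np.sqrt(sigma_e2), size=n)
    return X @ beta_c + beta_i * G_centered + g + eps

# Usage: four unrelated individuals (kinship 0.5 on the diagonal), intercept + one covariate.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(4), [0.1, 0.2, 0.3, 0.4]])
G_i = np.array([0.0, 1.0, 1.0, 2.0])
y = simulate_trait(X, np.array([1.0, 2.0]), G_i, 0.5, 0.5 * np.eye(4), 0.3, 1.0, rng)
```

Centering the genotype means ${\displaystyle \beta _{i}}$ does not shift the intercept, so the covariate effects keep the same interpretation with or without the variant in the model.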

In this model, ${\displaystyle \beta _{i}}$ measures the additive genetic effect of the ${\displaystyle i^{th}}$ variant. The score statistic for testing ${\displaystyle H_{0}:\beta _{i}=0}$ is

${\displaystyle U_{i}=(\mathbf {G_{i}} -{\bar {\mathbf {G_{i}} }})^{T}{\hat {\boldsymbol {\Omega }}}^{-1}(\mathbf {y} -\mathbf {X} {\hat {\boldsymbol {\beta }}})}$,

where ${\displaystyle {\hat {\boldsymbol {\beta }}}}$ is the estimate of the covariate effects under the null model.
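A minimal NumPy sketch of the score statistic for every variant at once, assuming ${\displaystyle {\hat {\boldsymbol {\Omega }}}}$ is already available. Covariate effects are fit by generalized least squares, which is the maximum likelihood estimate of ${\displaystyle {\boldsymbol {\beta }}}$ for fixed variance components. The helper name `score_statistics` is hypothetical, and an explicit inverse is used only for readability.

```python
import numpy as np

def score_statistics(y, X, G, Omega):
    """Return U_i for each column (variant) of the n x m genotype matrix G.

    beta_hat is the GLS estimate of the covariate effects under H0: beta_i = 0.
    """
    Omega_inv = np.linalg.inv(Omega)
    XtOi = X.T @ Omega_inv
    beta_hat = np.linalg.solve(XtOi @ X, XtOi @ y)   # null-model covariate fit
    residual = y - X @ beta_hat
    G_centered = G - G.mean(axis=0)                  # subtract each variant's mean genotype
    return G_centered.T @ Omega_inv @ residual       # one score per variant

# Usage: with Omega = I and an intercept-only design, U_i reduces to
# sum_j (G_ij - mean G_i) * (y_j - mean y).
y = np.array([1.0, 2.0, 3.0, 4.0])
U = score_statistics(y, np.ones((4, 1)), np.array([[0.0], [1.0], [1.0], [2.0]]), np.eye(4))
```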

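We further derive the variance-covariance matrix of these statistics as

${\displaystyle \mathbf {V} =(\mathbf {G} -{\bar {\mathbf {G} }})^{T}({\hat {\boldsymbol {\Omega }}}^{-1}-{\hat {\boldsymbol {\Omega }}}^{-1}\mathbf {X} (\mathbf {X^{T}} {\hat {\boldsymbol {\Omega }}}^{-1}\mathbf {X} )^{-1}\mathbf {X^{T}} {\hat {\boldsymbol {\Omega }}}^{-1})(\mathbf {G} -{\bar {\mathbf {G} }})}$,

where ${\displaystyle \mathbf {G} }$ is the genotype matrix whose ${\displaystyle i^{th}}$ column is ${\displaystyle \mathbf {G_{i}} }$ and ${\displaystyle {\bar {\mathbf {G} }}}$ is the matrix of the corresponding column averages.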
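The middle factor of this formula is the projection-like matrix ${\displaystyle \mathbf {P} ={\hat {\boldsymbol {\Omega }}}^{-1}-{\hat {\boldsymbol {\Omega }}}^{-1}\mathbf {X} (\mathbf {X^{T}} {\hat {\boldsymbol {\Omega }}}^{-1}\mathbf {X} )^{-1}\mathbf {X^{T}} {\hat {\boldsymbol {\Omega }}}^{-1}}$. A minimal NumPy sketch follows; `score_covariance` is a hypothetical name, and the code assumes ${\displaystyle {\hat {\boldsymbol {\Omega }}}}$ is symmetric so that ${\displaystyle ({\hat {\boldsymbol {\Omega }}}^{-1}\mathbf {X} )^{T}=\mathbf {X^{T}} {\hat {\boldsymbol {\Omega }}}^{-1}}$.

```python
import numpy as np

def score_covariance(X, G, Omega):
    """V = (G - Gbar)^T P (G - Gbar) for an n x m genotype matrix G."""
    Omega_inv = np.linalg.inv(Omega)
    OiX = Omega_inv @ X                                   # Omega^-1 X
    P = Omega_inv - OiX @ np.linalg.solve(X.T @ OiX, OiX.T)
    G_centered = G - G.mean(axis=0)
    return G_centered.T @ P @ G_centered                  # m x m covariance of the scores

G = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
V = score_covariance(np.ones((4, 1)), G, np.eye(4))
```

Because ${\displaystyle \mathbf {P} \mathbf {y} ={\hat {\boldsymbol {\Omega }}}^{-1}(\mathbf {y} -\mathbf {X} {\hat {\boldsymbol {\beta }}})}$ when ${\displaystyle {\hat {\boldsymbol {\beta }}}}$ is the generalized least squares estimate, the scores can equivalently be written as ${\displaystyle (\mathbf {G} -{\bar {\mathbf {G} }})^{T}\mathbf {P} \mathbf {y} }$, which makes it easy to see why ${\displaystyle \mathbf {P} }$ (rather than ${\displaystyle {\hat {\boldsymbol {\Omega }}}^{-1}}$ alone) appears in their covariance.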

Under ${\displaystyle H_{0}}$, the score test statistic ${\displaystyle T_{i}=U_{i}^{2}/V_{ii}}$ is asymptotically chi-squared distributed with one degree of freedom. RAREMETALWORKER reports the corresponding score test p-value.
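For one degree of freedom the chi-squared tail probability has a closed form, ${\displaystyle P(\chi _{1}^{2}>t)=\operatorname {erfc} ({\sqrt {t/2}})}$, so the p-value needs only the standard library. A minimal sketch, with the hypothetical helper `score_test`:

```python
import math
import numpy as np

def score_test(U, V):
    """Return (T, p) with T_i = U_i^2 / V_ii and p_i = P(chi2_1 > T_i)."""
    U = np.asarray(U, dtype=float)
    var = np.diag(np.asarray(V, dtype=float))          # V_ii for each variant
    T = U ** 2 / var
    p = np.array([math.erfc(math.sqrt(t / 2.0)) for t in T])
    return T, p

T, p = score_test([3.0], [[2.0]])   # T = 4.5
```

Monomorphic variants have ${\displaystyle V_{ii}=0}$ and should be excluded before calling this function.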

SUMMARY STATISTICS AND COVARIANCE MATRICES

RAREMETALWORKER automatically stores the score statistic of each marker (${\displaystyle U_{i}}$) together with quality information for that marker, including the Hardy-Weinberg equilibrium (HWE) p-value, call rate, and allele counts.
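The per-marker quality fields can be sketched as follows. `marker_qc` is a hypothetical helper, missing calls are assumed to be `np.nan`, and the HWE p-value uses a Pearson chi-squared approximation with one degree of freedom. That is an illustrative stand-in and may differ from the HWE test RAREMETALWORKER actually applies.

```python
import math
import numpy as np

def marker_qc(genotypes):
    """Call rate, allele counts and an approximate HWE p-value for one marker.

    genotypes: counts of the coded allele (0, 1, 2), with np.nan for missing calls.
    """
    g = np.asarray(genotypes, dtype=float)
    called = g[~np.isnan(g)]
    n = called.size
    if n == 0:
        raise ValueError("no called genotypes")
    n_aa, n_ab, n_bb = (np.sum(called == k) for k in (0, 1, 2))
    coded = n_ab + 2 * n_bb                       # copies of the coded allele
    other = 2 * n - coded
    f = coded / (2 * n)
    expected = np.array([n * (1 - f) ** 2, 2 * n * f * (1 - f), n * f ** 2])
    observed = np.array([n_aa, n_ab, n_bb], dtype=float)
    if np.any(expected == 0):
        hwe_p = 1.0                               # monomorphic marker: nothing to test
    else:
        chi2 = float(np.sum((observed - expected) ** 2 / expected))
        hwe_p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-squared(1) tail probability
    return {"call_rate": n / g.size, "coded_count": int(coded),
            "other_count": int(other), "hwe_p": hwe_p}

qc = marker_qc([0, 0, 1, 1, 2, np.nan])
```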

RAREMETALWORKER also stores the covariance matrix (${\displaystyle \mathbf {V} }$) of the score statistics for markers within a window whose size can be specified on the command line.
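Storing only nearby pairs keeps the stored covariance sparse: entries of ${\displaystyle \mathbf {V} }$ for distant markers are never written. The sketch below illustrates the idea with a hypothetical helper, `windowed_covariance`, under two assumptions: the window is measured in the same units as the marker positions (for example base pairs), and the result is kept as a Python dictionary. RAREMETALWORKER's actual window definition and file format are not described here.

```python
import numpy as np

def windowed_covariance(positions, V, window):
    """Return {(i, j): V_ij} for i <= j whose positions differ by at most `window`."""
    pos = np.asarray(positions)
    order = np.argsort(pos, kind="stable")
    kept = {}
    for a, i in enumerate(order):
        for j in order[a:]:
            if pos[j] - pos[i] > window:
                break            # positions are sorted, so every later marker is farther away
            kept[(int(min(i, j)), int(max(i, j)))] = float(V[i, j])
    return kept

V = np.arange(9, dtype=float).reshape(3, 3)
pairs = windowed_covariance([100, 150, 1000], V, window=100)
```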

MODELING RELATEDNESS

We use a variance component model to handle familial relationships. We estimate the variance components under the null model:

${\displaystyle \mathbf {y} =\mathbf {X} {\boldsymbol {\beta }}+\mathbf {g} +{\boldsymbol {\varepsilon }}}$

We assume that genetic effects are normally distributed, with mean ${\displaystyle \mathbf {0} }$ and covariance ${\displaystyle \mathbf {K} \sigma _{g}^{2}}$ where the matrix ${\displaystyle \mathbf {K} }$ summarizes kinship coefficients between sampled individuals and ${\displaystyle \sigma _{g}^{2}}$ is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean ${\displaystyle \mathbf {0} }$ and covariance ${\displaystyle \mathbf {I} \sigma _{e}^{2}}$, where ${\displaystyle \mathbf {I} }$ is the identity matrix.
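Under these assumptions, ${\displaystyle \mathbf {y} \sim N(\mathbf {X} {\boldsymbol {\beta }},\sigma _{g}^{2}\mathbf {K} +\sigma _{e}^{2}\mathbf {I} )}$. The sketch below evaluates that Gaussian log-likelihood for given variance components, profiling out ${\displaystyle {\boldsymbol {\beta }}}$ by generalized least squares after Cholesky whitening. It is a plain illustration, not the efficient algorithm RAREMETALWORKER uses, and `null_log_likelihood` is a hypothetical name.

```python
import numpy as np

def null_log_likelihood(y, X, K, sigma_g2, sigma_e2):
    """Log-likelihood of y ~ N(X beta_hat, sigma_g2 * K + sigma_e2 * I) and beta_hat."""
    n = y.shape[0]
    cov = sigma_g2 * K + sigma_e2 * np.eye(n)
    L = np.linalg.cholesky(cov)                        # cov = L L^T
    Xw = np.linalg.solve(L, X)                         # whitened design
    yw = np.linalg.solve(L, y)                         # whitened phenotype
    beta_hat, *_ = np.linalg.lstsq(Xw, yw, rcond=None) # GLS = OLS on whitened data
    r = yw - Xw @ beta_hat                             # r @ r = (y - X b)^T cov^-1 (y - X b)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ r), beta_hat

ll, beta_hat = null_log_likelihood(np.array([1.0, 2.0, 3.0, 4.0]), np.ones((4, 1)),
                                   np.eye(4), 0.0, 1.0)
```

Maximizing this function over ${\displaystyle (\sigma _{g}^{2},\sigma _{e}^{2})}$ with a generic optimizer gives maximum likelihood estimates, although each evaluation costs a full Cholesky factorization.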

To estimate ${\displaystyle \mathbf {K} }$, we either use known pedigree structure to define ${\displaystyle \mathbf {K} }$ or else use the empirical estimator

${\displaystyle \mathbf {K} ={\frac {1}{l}}\sum _{i=1}^{l}{(G_{i}-2f_{i}\mathbf {1} )(G_{i}-2f_{i}\mathbf {1} )^{T} \over 4f_{i}(1-f_{i})}}$,

where ${\displaystyle l}$ is the number of variants, and ${\displaystyle G_{i}}$ and ${\displaystyle f_{i}}$ are the genotype vector and estimated allele frequency of the ${\displaystyle i^{th}}$ variant, respectively. Each element of ${\displaystyle G_{i}}$ encodes the minor allele count for one individual. The model parameters ${\displaystyle {\hat {\boldsymbol {\beta }}}}$, ${\displaystyle {\hat {\sigma _{g}^{2}}}}$ and ${\displaystyle {\hat {\sigma _{e}^{2}}}}$ are estimated by maximum likelihood using the efficient algorithm described in Lippert et al. For convenience, let the estimated covariance matrix of ${\displaystyle \mathbf {y} }$ be ${\displaystyle {\hat {\boldsymbol {\Omega }}}={\hat {\sigma _{g}^{2}}}\mathbf {K} +{\hat {\sigma _{e}^{2}}}\mathbf {I} }$.
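The empirical kinship estimator is an average of outer products of standardized genotype vectors, so it can be written as one matrix product. In the minimal NumPy sketch below, `empirical_kinship` is a hypothetical name, ${\displaystyle f_{i}}$ is taken to be the sample allele frequency (mean genotype divided by 2), and monomorphic variants are dropped because their denominator is zero, so ${\displaystyle l}$ counts only the retained variants. That filtering rule is an assumption of the sketch.

```python
import numpy as np

def empirical_kinship(G):
    """K = (1/l) sum_i (G_i - 2 f_i 1)(G_i - 2 f_i 1)^T / (4 f_i (1 - f_i)).

    G: n individuals x l variants matrix of minor-allele counts.
    """
    G = np.asarray(G, dtype=float)
    f = G.mean(axis=0) / 2.0                          # estimated allele frequencies
    keep = (f > 0) & (f < 1)                          # drop monomorphic variants
    G, f = G[:, keep], f[keep]
    Z = (G - 2.0 * f) / np.sqrt(4.0 * f * (1.0 - f))  # standardized genotypes
    return Z @ Z.T / Z.shape[1]                       # average of outer products

G = np.array([[0, 1], [1, 1], [2, 0], [1, 2]])
K = empirical_kinship(G)
Omega = 0.4 * K + 0.6 * np.eye(4)                     # sigma_g^2 K + sigma_e^2 I, example values
```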

Chromosome X

To analyze markers on chromosome X, we fit an extra variance component, ${\displaystyle {{\sigma _{g}}_{X}}^{2}}$, to model the variance explained by chromosome X. A kinship matrix for chromosome X, ${\displaystyle {\boldsymbol {K_{X}}}}$, can be estimated either from a pedigree or from the genotypes of markers on chromosome X. The estimated covariance matrix can then be written as ${\displaystyle {\hat {\boldsymbol {\Omega }}}={\hat {\sigma _{g}^{2}}}\mathbf {K} +{\hat {{\sigma _{g}}_{X}^{2}}}\mathbf {K_{X}} +{\hat {\sigma _{e}^{2}}}\mathbf {I} }$.
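Assembling this three-component covariance is a direct sum; the resulting ${\displaystyle {\hat {\boldsymbol {\Omega }}}}$ then enters the same score statistic and covariance formulas as in the autosomal case. A minimal sketch, with the hypothetical helper `omega_with_chrx`, which also checks positive definiteness via a Cholesky factorization:

```python
import numpy as np

def omega_with_chrx(K, K_X, sigma_g2, sigma_gX2, sigma_e2):
    """Omega = sigma_g2 * K + sigma_gX2 * K_X + sigma_e2 * I."""
    n = K.shape[0]
    Omega = sigma_g2 * K + sigma_gX2 * K_X + sigma_e2 * np.eye(n)
    np.linalg.cholesky(Omega)   # raises LinAlgError if Omega is not positive definite
    return Omega

K = 0.5 * np.eye(2)
K_X = np.array([[0.5, 0.25], [0.25, 0.5]])
Omega = omega_with_chrx(K, K_X, 1.0, 2.0, 3.0)
```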