RAREMETALWORKER METHOD

Brief Introduction

RAREMETALWORKER generates single variant association test statistics for a single study prior to meta-analysis. This page provides a brief description of the statistics that RAREMETALWORKER calculates, together with key formulae.

Key Statistics for Analysis of Single Study

NOTATIONS

We use the following notation to describe our methods:

$\mathbf {y}$ is the observed phenotype vector

$\mathbf {X}$ is the design matrix

$\mathbf {G_{i}}$ is the genotype vector of the $i^{th}$ variant

${\bar {\mathbf {G_{i}} }}$ is the vector in which every element is the average genotype of the $i^{th}$ variant

${\boldsymbol {\beta _{c}}}$ is the vector of covariate effects

$\beta _{i}$ is the scalar of fixed genetic effect of the $i^{th}$ variant

$\mathbf {g}$ is the vector of random genetic effects

${\boldsymbol {\varepsilon }}$ is the vector of non-shared environmental effects

SUMMARY STATISTICS AND COVARIANCE MATRICES

We used the following model for the trait:

$\mathbf {y} =\mathbf {X} {\boldsymbol {\beta _{c}}}+\beta _{i}(\mathbf {G_{i}} -{\bar {\mathbf {G_{i}} }})+\mathbf {g} +{\boldsymbol {\varepsilon }}$ .

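Here, the trait is modeled as the sum of covariate effects, the fixed effect of the centered genotype of the $i^{th}$ variant, random genetic effects, and non-shared environmental effects.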

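In this model, $\beta _{i}$ measures the additive genetic effect of the $i^{th}$ variant. The score statistic for testing $H_{0}:\beta _{i}=0$ is:

$U_{i}=(\mathbf {G_{i}} -\mathbf {\bar {G_{i}}} )^{T}{\hat {\boldsymbol {\Omega }}}^{-1}(\mathbf {y} -\mathbf {X} {\hat {\boldsymbol {\beta }}})$ ,

where ${\hat {\boldsymbol {\beta }}}$ is the vector of covariate effects estimated under the null model and ${\hat {\boldsymbol {\Omega }}}$ is the estimated covariance matrix of $\mathbf {y}$ (both are described under Modeling Relatedness).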

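Let $\mathbf {G}$ be the genotype matrix whose $i^{th}$ column is $\mathbf {G_{i}}$ , and let ${\bar {\mathbf {G} }}$ be the matching matrix of average genotypes. The variance-covariance matrix of the score statistics is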

$\mathbf {V} =(\mathbf {G} -{\bar {\mathbf {G} }})^{T}({\hat {\boldsymbol {\Omega }}}^{-1}-{\hat {\boldsymbol {\Omega }}}^{-1}\mathbf {X} (\mathbf {X^{T}} {\hat {\boldsymbol {\Omega }}}^{-1}\mathbf {X} )^{-1}\mathbf {X^{T}} {\hat {\boldsymbol {\Omega }}}^{-1})(\mathbf {G} -{\bar {\mathbf {G} }})$ .

The score test statistic, $T_{i}=U_{i}^{2}/V_{ii}$ , is asymptotically distributed as chi-squared with one degree of freedom under the null hypothesis. RAREMETALWORKER reports the corresponding score test p-value.
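The score test formulae can be illustrated with a short NumPy sketch. This is not RAREMETALWORKER's implementation. It assumes that ${\hat {\boldsymbol {\Omega }}}$ is already available, that covariate effects are estimated by generalized least squares for that fixed ${\hat {\boldsymbol {\Omega }}}$ , and that the average genotype is the sample mean. The function name `score_tests` and the simulated inputs are hypothetical.

```python
import math
import numpy as np

def score_tests(y, X, G, omega):
    """Single-variant score tests (illustrative sketch).

    y: (n,) phenotypes; X: (n, p) design matrix;
    G: (n, m) genotypes; omega: (n, n) estimated covariance of y.
    Returns U (m,), V (m, m), T (m,), p-values (m,).
    """
    omega_inv = np.linalg.inv(omega)
    xt_oinv = X.T @ omega_inv                      # X^T Omega^{-1}
    # Assumption: covariate effects estimated by GLS given omega.
    beta_hat = np.linalg.solve(xt_oinv @ X, xt_oinv @ y)
    resid = y - X @ beta_hat
    Gc = G - G.mean(axis=0)                        # G - G_bar, column by column
    U = Gc.T @ omega_inv @ resid
    # Omega^{-1} - Omega^{-1} X (X^T Omega^{-1} X)^{-1} X^T Omega^{-1}
    P = omega_inv - xt_oinv.T @ np.linalg.solve(xt_oinv @ X, xt_oinv)
    V = Gc.T @ P @ Gc
    T = U**2 / np.diag(V)
    # Upper tail of chi-squared with 1 df: P(Z^2 > t) = erfc(sqrt(t / 2)).
    p = np.array([math.erfc(math.sqrt(t / 2.0)) for t in T])
    return U, V, T, p

# Simulated example with unrelated individuals (omega = identity).
rng = np.random.default_rng(0)
n, m = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
G = rng.binomial(2, 0.2, size=(n, m)).astype(float)
y = rng.normal(size=n)
U, V, T, p = score_tests(y, X, G, np.eye(n))
```

The diagonal of `V` gives the variances used in each single-variant test, while the off-diagonal elements are the covariances between score statistics of different variants.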

Summary Statistics and Covariance Matrices

RAREMETALWORKER automatically stores the score statistic for each marker ( $U_{i}$ ) together with quality information for that marker, including the HWE p-value, call rate, and allele counts.

RAREMETALWORKER also stores the covariance matrices ( $\mathbf {V}$ ) of the score statistics of markers within a window.
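As a rough illustration of what storing covariances within a window can mean, the sketch below keeps, for each marker, only the covariances with markers at most a given distance downstream. The window definition (base-pair distance on sorted positions), the function name `windowed_covariances`, and the dictionary layout are assumptions made for this example; they do not describe RAREMETALWORKER's actual output format.

```python
import numpy as np

def windowed_covariances(V, positions, window):
    """Keep V[i, j] only for markers j >= i within `window` bp of marker i.

    Assumes `positions` is sorted in ascending order.
    Returns {i: [(j, V[i, j]), ...]}.
    """
    kept = {}
    for i, pos_i in enumerate(positions):
        kept[i] = [
            (j, float(V[i, j]))
            for j in range(i, len(positions))
            if positions[j] - pos_i <= window
        ]
    return kept

# Toy example: the third marker is too far from the first two.
V_toy = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.3],
                  [0.1, 0.3, 3.0]])
kept = windowed_covariances(V_toy, positions=[100, 150, 1000], window=100)
```

Because $\mathbf {V}$ is symmetric, storing only the pairs with $j\geq i$ is enough to recover every covariance within the window.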

Modeling Relatedness

We use a variance component model to handle familial relationships. In a sample of $n$ individuals, we model the observed phenotype vector ( $\mathbf {y}$ ) as a sum of covariate effects (specified by a design matrix $\mathbf {X}$ and a vector of covariate effects ${\boldsymbol {\beta }}$ ), additive genetic effects (modeled in vector $\mathbf {g}$ ) and non-shared environmental effects (modeled in vector ${\boldsymbol {\varepsilon }}$ ). Thus the null model is:

$\mathbf {y} =\mathbf {X} {\boldsymbol {\beta }}+\mathbf {g} +{\boldsymbol {\varepsilon }}$

We assume that genetic effects are normally distributed, with mean $\mathbf {0}$ and covariance $2\mathbf {K} \sigma _{g}^{2}$ , where the matrix $\mathbf {K}$ summarizes kinship coefficients between sampled individuals and $\sigma _{g}^{2}$ is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean $\mathbf {0}$ and covariance $\mathbf {I} \sigma _{e}^{2}$ , where $\mathbf {I}$ is the identity matrix.

To obtain $\mathbf {K}$ , we either use known pedigree structure or the empirical estimator $\mathbf {K} ={\frac {1}{l}}\sum _{i=1}^{l}{\frac {(\mathbf {G_{i}} -2f_{i}\mathbf {1} )(\mathbf {G_{i}} -2f_{i}\mathbf {1} )^{T}}{4f_{i}(1-f_{i})}}$ , where $l$ is the number of variants, and $\mathbf {G_{i}}$ and $f_{i}$ are the genotype vector and estimated allele frequency of the $i^{th}$ variant, respectively. Each element of $\mathbf {G_{i}}$ encodes the minor allele count for one individual.

Model parameters ${\hat {\boldsymbol {\beta }}}$ , ${\hat {\sigma _{g}^{2}}}$ and ${\hat {\sigma _{e}^{2}}}$ are estimated by maximum likelihood using the efficient algorithm described in Lippert et al. For convenience, let the estimated covariance matrix of $\mathbf {y}$ be ${\hat {\boldsymbol {\Omega }}}=2{\hat {\sigma _{g}^{2}}}\mathbf {K} +{\hat {\sigma _{e}^{2}}}\mathbf {I}$ .
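The empirical estimator and the resulting covariance matrix can be sketched in a few lines of NumPy. The sketch assumes every variant is polymorphic (so that $4f_{i}(1-f_{i})>0$ ) and estimates each allele frequency as half the mean genotype. The variance components are passed in as given numbers rather than estimated; the function names `empirical_kinship` and `omega_hat` are hypothetical.

```python
import numpy as np

def empirical_kinship(G):
    """Empirical kinship matrix from an (n, l) matrix of minor allele counts."""
    n, l = G.shape
    f = G.mean(axis=0) / 2.0        # assumed allele frequency estimate per variant
    Z = G - 2.0 * f                 # G_i - 2 f_i 1, column by column
    w = 4.0 * f * (1.0 - f)         # per-variant scaling, must be nonzero
    return (Z / w) @ Z.T / l        # average of the scaled outer products

def omega_hat(K, sigma_g2, sigma_e2):
    """Estimated covariance of y: 2 * sigma_g^2 * K + sigma_e^2 * I."""
    return 2.0 * sigma_g2 * K + sigma_e2 * np.eye(K.shape[0])

# Four individuals, two variants, both with allele frequency 0.5.
G_toy = np.array([[0.0, 1.0],
                  [1.0, 1.0],
                  [2.0, 0.0],
                  [1.0, 2.0]])
K_toy = empirical_kinship(G_toy)
omega_toy = omega_hat(K_toy, sigma_g2=1.0, sigma_e2=1.0)
```

With this scaling the diagonal of $\mathbf {K}$ is close to 1/2 for non-inbred individuals in large samples, which is why the genetic variance enters ${\hat {\boldsymbol {\Omega }}}$ multiplied by 2.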

Chromosome X

To analyze markers on chromosome X, we fit an extra variance component, ${{\sigma _{g}}_{X}}^{2}$ , to model the variance explained by chromosome X. A kinship matrix for chromosome X, ${\boldsymbol {K_{X}}}$ , can be estimated either from a pedigree or from genotypes of markers on chromosome X. The estimated covariance matrix can then be written as ${\hat {\boldsymbol {\Omega }}}=2{\hat {\sigma _{g}^{2}}}\mathbf {K} +2{\hat {{\sigma _{g}}_{X}^{2}}}\mathbf {K_{X}} +{\hat {\sigma _{e}^{2}}}\mathbf {I}$ .
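A minimal sketch of the chromosome X covariance matrix follows, assuming both kinship matrices and all three variance components are already available. The function name `omega_hat_with_x` and the toy values are hypothetical.

```python
import numpy as np

def omega_hat_with_x(K, K_x, sigma_g2, sigma_gx2, sigma_e2):
    """2 * sigma_g^2 * K + 2 * sigma_gX^2 * K_X + sigma_e^2 * I."""
    n = K.shape[0]
    return 2.0 * sigma_g2 * K + 2.0 * sigma_gx2 * K_x + sigma_e2 * np.eye(n)

# Toy kinship matrices for three unrelated individuals (self-kinship 1/2).
K_auto = 0.5 * np.eye(3)
K_chrx = 0.5 * np.eye(3)
omega_x = omega_hat_with_x(K_auto, K_chrx, sigma_g2=1.0, sigma_gx2=2.0, sigma_e2=3.0)
```

Setting `sigma_gx2` to zero recovers the autosomal form of ${\hat {\boldsymbol {\Omega }}}$ .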
