From Genome Analysis Wiki
Revision as of 12:08, 1 April 2014 by Shuang Feng (talk | contribs) (Single Variant Score Tests)

Brief Introduction

RAREMETALWORKER generates single variant association test statistics for a single study prior to meta-analysis. This page provides a brief description of the statistics that RAREMETALWORKER calculates, together with key formulae.

Key Statistics for Analysis of Single Study

We use the following notation to describe our methods:

\mathbf{y} is the observed phenotype vector

\mathbf{X} is the design matrix

\boldsymbol{\beta} is the vector of covariate effects

\beta_i is the scalar fixed genetic effect of the i^{th} variant

\mathbf{g} is the vector of random genetic effects

\boldsymbol{\varepsilon} is the vector of non-shared environmental effects

Single Variant Score Tests

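We use the following model for the trait:

 \mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\beta_i(\mathbf{G_i}-\bar{\mathbf{G_i}})+\mathbf{g}+\boldsymbol{\varepsilon} .

Here, \mathbf{G_i} is the genotype vector of the i^{th} variant, with each element encoding the minor allele count for one individual, and \bar{\mathbf{G_i}} is the vector in which every element equals the mean of \mathbf{G_i}, so that the genotypes are centered. The remaining terms are as defined in the notation above.

In this model, \beta_i measures the additive genetic effect of the i^{th} variant. As usual, the score statistic for testing H_0:\beta_i=0 is: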

 U_i=(\mathbf{G_i}-\bar{\mathbf{G_i}})^T \hat{\boldsymbol{\Omega}}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}) ,

where \hat{\boldsymbol{\beta}} and \hat{\boldsymbol{\Omega}} are estimated under the null model described in the Modeling Relatedness section. Writing \mathbf{G} for the matrix whose columns are the genotype vectors \mathbf{G_i}, we further derive the variance-covariance matrix of these statistics as

 \mathbf{V}=(\mathbf{G}-\bar{\mathbf{G}})^T (\hat{\boldsymbol{\Omega}}^{-1}-\hat{\boldsymbol{\Omega}}^{-1} \mathbf{X}(\mathbf{X}^T\hat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1} \mathbf{X}^T \hat{\boldsymbol{\Omega}}^{-1})(\mathbf{G}-\bar{\mathbf{G}}) .

Under the null, the test statistic T_i=U_i^2/V_{ii} is asymptotically distributed as chi-squared with one degree of freedom.
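The formulae for U_i, \mathbf{V} and T_i can be sketched in a few lines of NumPy. This is only an illustration and not RAREMETALWORKER's implementation: the function name and the toy data are hypothetical, and the covariate effects are assumed to be the generalized least squares estimate for a given \hat{\boldsymbol{\Omega}}.

```python
import math
import numpy as np

def single_variant_score_tests(y, X, G, omega):
    """Return score statistics U, their covariance V, and statistics T.

    y     : (n,) phenotype vector
    X     : (n, p) design matrix (include a column of ones for the intercept)
    G     : (n, m) genotype matrix, one column per variant
    omega : (n, n) estimated phenotype covariance matrix
    """
    omega_inv = np.linalg.inv(omega)
    xt_oi = X.T @ omega_inv                      # X^T Omega^-1
    xt_oi_x = xt_oi @ X                          # X^T Omega^-1 X
    # Assumption: covariate effects are the GLS estimate given omega.
    beta_hat = np.linalg.solve(xt_oi_x, xt_oi @ y)
    resid = y - X @ beta_hat
    Gc = G - G.mean(axis=0)                      # G - G_bar, column by column
    U = Gc.T @ omega_inv @ resid
    # Omega^-1 - Omega^-1 X (X^T Omega^-1 X)^-1 X^T Omega^-1
    P = omega_inv - xt_oi.T @ np.linalg.solve(xt_oi_x, xt_oi)
    V = Gc.T @ P @ Gc
    T = U ** 2 / np.diag(V)
    return U, V, T

# Toy data: 60 unrelated individuals, so omega is taken to be the identity.
rng = np.random.default_rng(0)
n, m = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = rng.normal(size=n)
U, V, T = single_variant_score_tests(y, X, G, np.eye(n))
# Upper-tail probability of a chi-squared variable with one degree of freedom.
p_values = [math.erfc(math.sqrt(t / 2.0)) for t in T]
```

Computing \mathbf{V} for all variants at once gives both the per-variant variances V_{ii} used in T_i and the off-diagonal covariances between variants.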

Summary Statistics and Covariance Matrices

RAREMETALWORKER automatically stores the score statistic for each marker ( U_i ) together with quality information for that marker, including HWE p-value, call rate, and allele counts.

RAREMETALWORKER also stores the covariance matrices ( \mathbf{V} ) of the score statistics of markers within a window.
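As a rough illustration of what storing covariances within a window can mean, the sketch below keeps, for each marker, only its covariances with markers at the same or a later position that lie within a fixed distance. The function name, the window size and the list-of-pairs layout are assumptions made for this example and do not describe RAREMETALWORKER's actual output format.

```python
import numpy as np

def covariances_within_window(positions, V, window):
    """For each marker, list (marker index, covariance) pairs for markers
    at or after its position and at most `window` base pairs away."""
    positions = np.asarray(positions)
    out = []
    for i, pos in enumerate(positions):
        near = np.flatnonzero((positions >= pos) & (positions - pos <= window))
        out.append([(int(j), float(V[i, j])) for j in near])
    return out

# Hypothetical positions (base pairs) and a made-up symmetric matrix V.
positions = [100, 150, 900, 5000]
V = np.arange(16, dtype=float).reshape(4, 4)
V = (V + V.T) / 2.0
rows = covariances_within_window(positions, V, window=1000)
```

Because \mathbf{V} is symmetric, keeping only pairs in one direction avoids storing each covariance twice.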

Modeling Relatedness

We use a variance component model to handle familial relationships. In a sample of n individuals, we model the observed phenotype vector (\mathbf{y}) as a sum of covariate effects (specified by a design matrix \mathbf{X} and a vector of covariate effects \boldsymbol{\beta}), additive genetic effects (modeled in vector \mathbf{g}) and non-shared environmental effects (modeled in vector \boldsymbol{\varepsilon}). Thus the null model is:

\mathbf{y}=\mathbf{X}\boldsymbol{\beta} +\mathbf{g}+ \boldsymbol{\varepsilon}

We assume that genetic effects are normally distributed, with mean \mathbf{0} and covariance 2\mathbf{K}\sigma_g^2, where the matrix \mathbf{K} summarizes kinship coefficients between sampled individuals and \sigma_g^2 is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean \mathbf{0} and covariance \mathbf{I}\sigma_e^2, where \mathbf{I} is the identity matrix.
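Under these assumptions \mathbf{y} is multivariate normal with mean \mathbf{X}\boldsymbol{\beta} and covariance 2\sigma_g^2\mathbf{K}+\sigma_e^2\mathbf{I}, so the likelihood that is maximized can be written down directly. The sketch below evaluates that log-likelihood by brute force. The function name and toy values are hypothetical, and this direct evaluation is not the efficient algorithm used for fitting.

```python
import numpy as np

def null_model_loglik(y, X, K, beta, sigma_g2, sigma_e2):
    """Log-likelihood of y ~ N(X beta, 2 sigma_g2 K + sigma_e2 I)."""
    n = y.shape[0]
    omega = 2.0 * sigma_g2 * K + sigma_e2 * np.eye(n)
    resid = y - X @ beta
    _, logdet = np.linalg.slogdet(omega)          # log |omega|
    quad = resid @ np.linalg.solve(omega, resid)  # resid^T omega^-1 resid
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Toy check: with K = 0.5 * I the covariance is (sigma_g2 + sigma_e2) * I.
n = 5
y = np.zeros(n)
X = np.ones((n, 1))
ll = null_model_loglik(y, X, 0.5 * np.eye(n), np.array([0.0]), 0.4, 0.6)
```

In the toy check the covariance is the identity and the residuals are zero, so the value reduces to -(n/2)\log(2\pi).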

To estimate \mathbf{K}, we either use known pedigree structure to define \mathbf{K} or else use the empirical estimator \mathbf{K}=\frac{1}{l}\sum_{i=1}^l{(G_i-2f_i\mathbf{1})(G_i-2f_i\mathbf{1})^T\over 4f_i(1-f_i)} , where l is the count of variants, and G_i and f_i are the genotype vector and estimated allele frequency for the i^{th} variant, respectively. Each element in G_i encodes the minor allele count for one individual. Model parameters \hat{\boldsymbol{\beta}}, \hat{\sigma_g^2} and \hat{\sigma_e^2} are estimated using maximum likelihood and the efficient algorithm described in Lippert et al. For convenience, let the estimated covariance matrix of \mathbf{y} be \hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}.
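The empirical estimator and the resulting \hat{\boldsymbol{\Omega}} can be sketched as follows. The function names and toy genotypes are hypothetical, allele frequencies are assumed to be estimated from the same sample, and monomorphic variants (for which 4f_i(1-f_i)=0) are assumed to have been removed beforehand.

```python
import numpy as np

def empirical_kinship(G):
    """Empirical kinship from G, an (n, l) matrix of minor allele counts
    with one column per variant."""
    n, l = G.shape
    f = G.mean(axis=0) / 2.0          # estimated allele frequency per variant
    Z = G - 2.0 * f                   # G_i - 2 f_i 1, column by column
    w = 4.0 * f * (1.0 - f)           # per-variant scaling 4 f_i (1 - f_i)
    return (Z / w) @ Z.T / l          # average of scaled outer products

def phenotype_covariance(K, sigma_g2, sigma_e2):
    """Omega = 2 sigma_g2 K + sigma_e2 I."""
    return 2.0 * sigma_g2 * K + sigma_e2 * np.eye(K.shape[0])

# Toy genotypes for 20 individuals; drop any monomorphic columns first.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(20, 200)).astype(float)
G = G[:, G.std(axis=0) > 0]
K = empirical_kinship(G)
omega = phenotype_covariance(K, sigma_g2=0.4, sigma_e2=0.6)
```

With this scaling the diagonal of \mathbf{K} is close to 0.5 for unrelated individuals, which is why \mathbf{K} is multiplied by 2 in \hat{\boldsymbol{\Omega}}.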

Chromosome X

To analyze markers on chromosome X, we fit an extra variance component, {{\sigma_g}_X}^2 , to model the variance explained by chromosome X. A kinship matrix for chromosome X, \mathbf{K_X} , can be estimated either from a pedigree or from genotypes of markers on chromosome X. The estimated covariance matrix can then be written as \hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+2\hat{{\sigma_g}_X^2}\mathbf{K_X}+\hat{\sigma_e^2}\mathbf{I}.
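The chromosome X covariance matrix simply adds one more term to the autosomal one. The sketch below assembles it. The function name is hypothetical, and the toy kinship matrices are placeholders chosen to make the arithmetic easy to check, not realistic chromosome X kinship values.

```python
import numpy as np

def phenotype_covariance_chrx(K, K_x, sigma_g2, sigma_gx2, sigma_e2):
    """Omega = 2 sigma_g2 K + 2 sigma_gx2 K_X + sigma_e2 I."""
    n = K.shape[0]
    return 2.0 * sigma_g2 * K + 2.0 * sigma_gx2 * K_x + sigma_e2 * np.eye(n)

# Placeholder kinship matrices for 4 individuals.
n = 4
K = 0.5 * np.eye(n)
K_x = 0.5 * np.eye(n)
omega_x = phenotype_covariance_chrx(K, K_x, sigma_g2=0.3, sigma_gx2=0.1, sigma_e2=0.6)
```

The resulting matrix takes the place of \hat{\boldsymbol{\Omega}} in the score statistic and its variance when testing chromosome X markers.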