Difference between revisions of "RAREMETALWORKER METHOD"

From Genome Analysis Wiki
Revision as of 09:56, 27 March 2014

Brief Introduction

RAREMETALWORKER generates single variant association test statistics for a single study prior to meta-analysis. This page provides a brief description of the statistics that RAREMETALWORKER calculates, together with key formulae.

Single Variant Score Tests

Our single variant association test is a score test based on a linear mixed model in which each variant is treated as a fixed effect. The alternative model is:

<math>\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\gamma_i(\mathbf{G_i}-\bar{\mathbf{G_i}})+\mathbf{g}+\boldsymbol{\varepsilon}</math>.

In this model, the scalar parameter <math>\gamma_i</math> measures the additive genetic effect of the <math>i^{th}</math> variant. As usual, the score statistic for testing <math>H_0:\gamma_i=0</math> is:

<math>U_i=(\mathbf{G_i}-\bar{\mathbf{G_i}})^T \hat{\boldsymbol{\Omega}}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})</math>

We further derive the variance-covariance matrix of these statistics as

<math>\mathbf{V}=(\mathbf{G}-\bar{\mathbf{G}})^T (\hat{\boldsymbol{\Omega}}^{-1}-\hat{\boldsymbol{\Omega}}^{-1} \mathbf{X}(\mathbf{X}^T\hat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1} \mathbf{X}^T \hat{\boldsymbol{\Omega}}^{-1})(\mathbf{G}-\bar{\mathbf{G}})</math>.

Under the null hypothesis, the test statistic <math>T_i=U_i^2/V_{ii}</math> is asymptotically chi-squared distributed with one degree of freedom.
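As an illustration of the formulae above, the following NumPy sketch computes the score statistics, their covariance matrix, and the test statistics for a small simulated data set. It is not RAREMETALWORKER's implementation: the function name `score_test`, the simulated inputs, and the choice to take the estimated covariance matrix as given and to obtain the covariate effects by generalized least squares are all assumptions made for this example. An explicit matrix inverse is used only because the example is small.

```python
import numpy as np

def score_test(G, y, X, omega):
    """Return score statistics U, their covariance V, and T = U^2 / diag(V).

    G     : n x m matrix of genotypes (one column per variant)
    y     : length-n phenotype vector
    X     : n x k covariate design matrix (include a column of ones)
    omega : n x n estimated phenotype covariance matrix (assumed given)
    """
    omega_inv = np.linalg.inv(omega)
    XtOi = X.T @ omega_inv                       # k x n
    XtOiX = XtOi @ X                             # k x k
    # Covariate effects under the null, by generalized least squares
    beta_hat = np.linalg.solve(XtOiX, XtOi @ y)
    # Center each variant (column) at its sample mean
    Gc = G - G.mean(axis=0)
    # U_i = (G_i - mean)^T Omega^{-1} (y - X beta_hat), for all variants at once
    U = Gc.T @ omega_inv @ (y - X @ beta_hat)
    # V = Gc^T (Omega^{-1} - Omega^{-1} X (X^T Omega^{-1} X)^{-1} X^T Omega^{-1}) Gc
    P = omega_inv - omega_inv @ X @ np.linalg.solve(XtOiX, XtOi)
    V = Gc.T @ P @ Gc
    T = U ** 2 / np.diag(V)
    return U, V, T

# Simulated inputs, purely illustrative
rng = np.random.default_rng(0)
n, m = 50, 4
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)
A = rng.normal(size=(n, n))
omega = A @ A.T / n + np.eye(n)   # arbitrary symmetric positive definite matrix

U, V, T = score_test(G, y, X, omega)
```

Each entry of `T` would then be compared with a chi-squared distribution with one degree of freedom. A monomorphic variant gives a zero diagonal entry in `V`, so such variants would need to be excluded before dividing.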

Summary Statistics and Covariance Matrices

RAREMETALWORKER automatically stores the score statistics for each marker (<math>U_i</math>) together with quality information of that marker, including HWE p-value, call rate, and allele counts.

RAREMETALWORKER also stores the covariance matrices (<math>\mathbf{V}</math>) of the score statistics of markers within a window.

Modeling Relatedness

We use a variance component model to handle familial relationships. In a sample of <math>n</math> individuals, we model the observed phenotype vector (<math>\mathbf{y}</math>) as a sum of covariate effects (specified by a design matrix <math>\mathbf{X}</math> and a vector of covariate effects <math>\boldsymbol{\beta}</math>), additive genetic effects (modeled in vector <math>\mathbf{g}</math>) and non-shared environmental effects (modeled in vector <math>\boldsymbol{\varepsilon}</math>). Thus the null model is:

<math>\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{g}+\boldsymbol{\varepsilon}</math>


We assume that genetic effects are normally distributed, with mean <math>\mathbf{0}</math> and covariance <math>2\mathbf{K}\sigma_g^2</math>, where the matrix <math>\mathbf{K}</math> summarizes kinship coefficients between sampled individuals and <math>\sigma_g^2</math> is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean <math>\mathbf{0}</math> and covariance <math>\mathbf{I}\sigma_e^2</math>, where <math>\mathbf{I}</math> is the identity matrix.

To estimate <math>\mathbf{K}</math>, we either use known pedigree structure to define <math>\mathbf{K}</math> or else use the empirical estimator <math>\mathbf{K}=\frac{1}{l}\sum_{i=1}^l{(G_i-2f_i\mathbf{1})(G_i-2f_i\mathbf{1})^T\over 4f_i(1-f_i)}</math>, where <math>l</math> is the number of variants, and <math>G_i</math> and <math>f_i</math> are the genotype vector and estimated allele frequency for the <math>i^{th}</math> variant, respectively. Each element in <math>G_i</math> encodes the minor allele count for one individual. Model parameters <math>\hat{\boldsymbol{\beta}}</math>, <math>\hat{\sigma_g^2}</math> and <math>\hat{\sigma_e^2}</math> are estimated using maximum likelihood and the efficient algorithm described in Lippert et al. For convenience, let the estimated covariance matrix of <math>\mathbf{y}</math> be <math>\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}</math>.
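The empirical kinship estimator and the resulting covariance matrix can be sketched in NumPy as follows. This is an illustrative sketch rather than the program's own code: the function names, the simulated genotypes, and the variance component values are hypothetical. It also assumes that the allele frequency of each variant is estimated from the sample itself and that monomorphic variants are dropped, with the average then taken over the retained variants, since the denominator would otherwise be zero. The product of the two centered genotype vectors is taken as an outer product, so that the result is an individual-by-individual matrix.

```python
import numpy as np

def empirical_kinship(G):
    """Empirical kinship matrix from an n x l matrix of minor allele counts."""
    G = np.asarray(G, dtype=float)
    f = G.mean(axis=0) / 2.0                 # sample allele frequency per variant
    keep = (f > 0.0) & (f < 1.0)             # assumption: drop monomorphic variants
    G, f = G[:, keep], f[keep]
    D = G - 2.0 * f                          # G_i - 2 f_i 1, for every variant i
    W = D / np.sqrt(4.0 * f * (1.0 - f))     # scale column i by sqrt(4 f_i (1 - f_i))
    # W @ W.T sums the outer products over variants; divide by the variant count
    return (W @ W.T) / W.shape[1]

def phenotype_covariance(K, sigma_g2, sigma_e2):
    """Omega = 2 * sigma_g^2 * K + sigma_e^2 * I."""
    return 2.0 * sigma_g2 * K + sigma_e2 * np.eye(K.shape[0])

# Simulated genotypes for unrelated individuals, purely illustrative
rng = np.random.default_rng(1)
n, l = 20, 500
freqs = rng.uniform(0.05, 0.5, size=l)
G = rng.binomial(2, freqs, size=(n, l))

K = empirical_kinship(G)
omega = phenotype_covariance(K, sigma_g2=0.4, sigma_e2=0.6)  # hypothetical values
```

In practice the variance components would come from the maximum likelihood fit described above rather than being fixed by hand.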

Chromosome X

To analyze markers on chromosome X, we fit an extra variance component, <math>{{\sigma_g}_X}^2</math>, to model the variance explained by chromosome X. A kinship matrix for chromosome X, <math>\mathbf{K_X}</math>, can be estimated either from a pedigree or from genotypes of markers on chromosome X. The estimated covariance matrix can then be written as <math>\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+2\hat{{\sigma_g}_X^2}\mathbf{K_X}+\hat{\sigma_e^2}\mathbf{I}</math>.
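A minimal sketch of how the chromosome X covariance matrix is assembled from its three components is given below. The function name and all numerical values are hypothetical and chosen only to make the arithmetic easy to follow. They are not estimates from any data set, and the kinship entries are not meant to represent a particular type of relative pair.

```python
import numpy as np

def phenotype_covariance_x(K, K_x, sigma_g2, sigma_gx2, sigma_e2):
    """Omega = 2*sigma_g^2*K + 2*sigma_gX^2*K_X + sigma_e^2*I."""
    n = K.shape[0]
    return 2.0 * sigma_g2 * K + 2.0 * sigma_gx2 * K_x + sigma_e2 * np.eye(n)

# Two individuals with made-up kinship matrices (illustrative values only)
K = np.array([[0.50, 0.25],
              [0.25, 0.50]])     # autosomal kinship
K_x = np.array([[0.50, 0.00],
                [0.00, 0.50]])   # chromosome X kinship

omega_x = phenotype_covariance_x(K, K_x, sigma_g2=0.4, sigma_gx2=0.1, sigma_e2=0.5)
```

The resulting matrix can be used in place of the autosomal covariance matrix when computing score statistics for chromosome X markers.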