Changes

RAREMETALWORKER METHOD (view source)

Revision as of 09:56, 27 March 2014

No change in size, 09:56, 27 March 2014

no edit summary

Line 3: Line 3:

[[RAREMETALWORKER]] generates single variant association test statistics for a single study prior to meta-analysis. This page provides a brief description of the statistics that RAREMETALWORKER calculates, together with key formulae.

−

~~== Modeling Relatedness ==~~

−

We use a variance component model to handle familial relationships. In a sample of <math>n</math> individuals, we model the observed phenotype vector (<math>\mathbf{y}</math>) as a sum of covariate effects (specified by a design matrix <math>\mathbf{X}</math> and a vector of covariate effects <math>\boldsymbol{\beta}</math>), additive genetic effects (modeled in vector <math>\mathbf{g}</math>) and non-shared environmental effects (modeled in vector <math>\boldsymbol{\varepsilon}</math>). Thus the null model is:

−

~~<math>\mathbf{y}=\mathbf{X}\boldsymbol{\beta} +\mathbf{g}+ \boldsymbol{\varepsilon}</math>~~

−

We assume that genetic effects are normally distributed, with mean <math>\mathbf{0}</math> and covariance <math>2\mathbf{K}\sigma_g^2</math>, where the matrix <math>\mathbf{K}</math> summarizes kinship coefficients between sampled individuals and <math>\sigma_g^2</math> is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean <math>\mathbf{0}</math> and covariance <math>\mathbf{I}\sigma_e^2</math>, where <math>\mathbf{I}</math> is the identity matrix.

−

To obtain <math>\mathbf{K}</math>, we either define it from known pedigree structure or use the empirical estimator <math>\mathbf{K}=\frac{1}{l}\sum_{i=1}^l{(G_i-2f_i\mathbf{1})(G_i-2f_i\mathbf{1})^T\over 4f_i(1-f_i)} </math>,

−

where <math>l</math> is the number of variants, and <math>G_i</math> and <math>f_i</math> are the genotype vector and estimated allele frequency for the <math>i^{th}</math> variant, respectively. Each element in <math>G_i</math> encodes the minor allele count for one individual. Model parameters <math>\hat{\boldsymbol{\beta}}</math>, <math>\hat{\sigma_g^2}</math> and <math>\hat{\sigma_e^2}</math> are estimated using maximum likelihood and the efficient algorithm described in Lippert et al. For convenience, let the estimated covariance matrix of <math>\mathbf{y}</math> be <math>\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}</math>.

== Single Variant Score Tests ==

Line 35: Line 24:

RAREMETALWORKER also stores the covariance matrices (<math> \mathbf{V} </math>) of the score statistics of markers within a window.

+

== Modeling Relatedness ==

+

We use a variance component model to handle familial relationships. In a sample of <math>n</math> individuals, we model the observed phenotype vector (<math>\mathbf{y}</math>) as a sum of covariate effects (specified by a design matrix <math>\mathbf{X}</math> and a vector of covariate effects <math>\boldsymbol{\beta}</math>), additive genetic effects (modeled in vector <math>\mathbf{g}</math>) and non-shared environmental effects (modeled in vector <math>\boldsymbol{\varepsilon}</math>). Thus the null model is:

+

<math>\mathbf{y}=\mathbf{X}\boldsymbol{\beta} +\mathbf{g}+ \boldsymbol{\varepsilon}</math>

+

We assume that genetic effects are normally distributed, with mean <math>\mathbf{0}</math> and covariance <math>2\mathbf{K}\sigma_g^2</math>, where the matrix <math>\mathbf{K}</math> summarizes kinship coefficients between sampled individuals and <math>\sigma_g^2</math> is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean <math>\mathbf{0}</math> and covariance <math>\mathbf{I}\sigma_e^2</math>, where <math>\mathbf{I}</math> is the identity matrix.

+

To obtain <math>\mathbf{K}</math>, we either define it from known pedigree structure or use the empirical estimator <math>\mathbf{K}=\frac{1}{l}\sum_{i=1}^l{(G_i-2f_i\mathbf{1})(G_i-2f_i\mathbf{1})^T\over 4f_i(1-f_i)} </math>,

+

where <math>l</math> is the number of variants, and <math>G_i</math> and <math>f_i</math> are the genotype vector and estimated allele frequency for the <math>i^{th}</math> variant, respectively. Each element in <math>G_i</math> encodes the minor allele count for one individual. Model parameters <math>\hat{\boldsymbol{\beta}}</math>, <math>\hat{\sigma_g^2}</math> and <math>\hat{\sigma_e^2}</math> are estimated using maximum likelihood and the efficient algorithm described in Lippert et al. For convenience, let the estimated covariance matrix of <math>\mathbf{y}</math> be <math>\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}</math>.
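The empirical kinship estimator and the covariance matrix <math>\hat{\boldsymbol{\Omega}}</math> can be illustrated with a minimal pure-Python sketch. This is not RAREMETALWORKER's implementation: the function names are hypothetical, the allele frequency is assumed to be estimated as half the mean genotype, and monomorphic variants are assumed to be skipped (and excluded from the variant count) because their denominator <math>4f_i(1-f_i)</math> is zero.

```python
def empirical_kinship(genotypes):
    """Empirical kinship matrix from per-variant genotype vectors.

    genotypes[i][j] is the minor allele count (0, 1 or 2) of individual j
    at variant i. Assumption: f_i is estimated as mean(G_i) / 2.
    Assumption: monomorphic variants are skipped and not counted in l.
    """
    n = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    used = 0
    for G in genotypes:
        f = sum(G) / (2.0 * n)
        denom = 4.0 * f * (1.0 - f)
        if denom == 0.0:
            continue  # monomorphic variant carries no information here
        centered = [g - 2.0 * f for g in G]
        # Accumulate the outer product (G_i - 2 f_i 1)(G_i - 2 f_i 1)^T / denom.
        for a in range(n):
            for b in range(n):
                K[a][b] += centered[a] * centered[b] / denom
        used += 1
    if used == 0:
        raise ValueError("no polymorphic variants available")
    return [[value / used for value in row] for row in K]


def phenotype_covariance(K, sigma_g2, sigma_e2):
    """Omega = 2 * sigma_g2 * K + sigma_e2 * I for given variance components."""
    n = len(K)
    return [
        [2.0 * sigma_g2 * K[a][b] + (sigma_e2 if a == b else 0.0)
         for b in range(n)]
        for a in range(n)
    ]


# Toy data: 2 variants, 4 individuals (illustrative values only).
toy_genotypes = [[0, 1, 2, 1], [2, 0, 0, 2]]
toy_K = empirical_kinship(toy_genotypes)
toy_omega = phenotype_covariance(toy_K, sigma_g2=0.5, sigma_e2=1.0)
```

In practice the variance components passed to `phenotype_covariance` would be the maximum likelihood estimates; here they are arbitrary placeholders.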

==Chromosome X==

To analyze markers on chromosome X, we fit an extra variance component, <math> {{\sigma_g}_X}^2 </math>, to model the variance explained by chromosome X. A kinship matrix for chromosome X, <math> \mathbf{K_X} </math>, can be estimated either from a pedigree or from genotypes of markers on chromosome X. The estimated covariance matrix can then be written as <math>\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+2\hat{{\sigma_g}_X^2}\mathbf{K_X}+\hat{\sigma_e^2}\mathbf{I}</math>.
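As a rough illustration of the chromosome X covariance formula, the sketch below assembles <math>\hat{\boldsymbol{\Omega}}</math> from two kinship matrices and three variance components. The function name and the numeric values are hypothetical and chosen only to make the arithmetic easy to follow. They are not taken from RAREMETALWORKER.

```python
def covariance_with_chr_x(K, K_x, sigma_g2, sigma_gx2, sigma_e2):
    """Omega = 2*sigma_g2*K + 2*sigma_gx2*K_x + sigma_e2*I.

    K and K_x are n x n kinship matrices (autosomal and chromosome X),
    given as lists of lists. All names here are illustrative.
    """
    n = len(K)
    if len(K_x) != n:
        raise ValueError("K and K_x must have the same dimension")
    return [
        [2.0 * sigma_g2 * K[a][b]
         + 2.0 * sigma_gx2 * K_x[a][b]
         + (sigma_e2 if a == b else 0.0)
         for b in range(n)]
        for a in range(n)
    ]


# Two individuals with made-up kinship values.
K_auto = [[0.5, 0.25], [0.25, 0.5]]
K_chr_x = [[0.5, 0.0], [0.0, 1.0]]
omega_x = covariance_with_chr_x(K_auto, K_chr_x,
                                sigma_g2=1.0, sigma_gx2=0.5, sigma_e2=2.0)
```

Setting `sigma_gx2` to zero recovers the autosomal-only covariance <math>2\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}</math>.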

Goncalo
