RAREMETALWORKER METHOD

From Genome Analysis Wiki

Useful Links

Here are some useful links to key pages:

* The [[RAREMETALWORKER | '''RAREMETALWORKER documentation''']]
* The [[RAREMETALWORKER_command_reference | '''RAREMETALWORKER command reference''']]
* The [[RAREMETALWORKER_SPECIAL_TOPICS | '''RAREMETALWORKER special topics''']]
* The [[Tutorial:_RAREMETAL | '''RAREMETALWORKER quick start tutorial''']]
* The [[RAREMETAL_method | '''RAREMETAL method''']]
* The [[RAREMETAL_FAQ | '''FAQ''']]

Latest revision as of 17:49, 16 March 2018

Brief Introduction

RAREMETALWORKER generates single variant association test statistics for a single study prior to meta-analysis. This page provides a brief description of the statistics that RAREMETALWORKER calculates, together with key formulae.

Key Statistics for Analysis of a Single Study

NOTATION

We use the following notation to describe our methods:

\mathbf{y} is the vector of observed quantitative trait values

\mathbf{X} is the design matrix

\mathbf{G_i} is the genotype vector of the i^{th} variant

\bar{\mathbf{G_i}} is the vector whose elements all equal the average genotype of the i^{th} variant

\boldsymbol{\beta_c} is the vector of covariate effects

\beta_i is the scalar fixed genetic effect of the i^{th} variant

\mathbf{g} is the vector of random genetic effects

\boldsymbol{\varepsilon} is the vector of non-shared environmental effects

\hat{\boldsymbol{\Omega}} is the estimated covariance matrix of \mathbf{y}

\mathbf{K} is the kinship matrix

\mathbf{K_X} is the kinship matrix for chromosome X

\sigma_g^2 is the genetic variance component

{{\sigma_g}_X}^2 is the genetic variance component for markers on chromosome X

\sigma_e^2 is the non-shared environmental variance component

SINGLE VARIANT SCORE TEST

We use the following model for the trait:

\mathbf{y}=\mathbf{X}\boldsymbol{\beta_c}+\beta_i(\mathbf{G_i}-\bar{\mathbf{G_i}})+\mathbf{g}+\boldsymbol{\varepsilon}.

Here, the quantitative trait for an individual is the sum of the covariate effects, the additive genetic effect of the i^{th} variant, the polygenic background effect, and the non-shared environmental effect.

In this model, \beta_i measures the additive genetic effect of the i^{th} variant. As usual, the score statistic for testing H_0:\beta_i=0 is:

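U_i=(\mathbf{G_i}-\bar{\mathbf{G_i}})^T \hat{\boldsymbol{\Omega}}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}),

where \hat{\boldsymbol{\beta}} and \hat{\boldsymbol{\Omega}} are estimated under the null model (\beta_i=0).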

We further derive the variance-covariance matrix of these statistics as

\mathbf{V}=(\mathbf{G}-\bar{\mathbf{G}})^T (\hat{\boldsymbol{\Omega}}^{-1}-\hat{\boldsymbol{\Omega}}^{-1} \mathbf{X}(\mathbf{X^T}\hat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1} \mathbf{X^T} \hat{\boldsymbol{\Omega}}^{-1})(\mathbf{G}-\bar{\mathbf{G}}),

where \mathbf{G} is the genotype matrix whose i^{th} column is \mathbf{G_i}, and \bar{\mathbf{G}} holds the corresponding average genotypes.

The score test statistic, T_i=U_i^2/V_{ii}, is asymptotically distributed as chi-squared with one degree of freedom under H_0. RAREMETALWORKER reports the score test p-value.
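
The formulas for U_i, \mathbf{V} and T_i can be sketched in a few lines of NumPy. This is only an illustration of the algebra, not RAREMETALWORKER's implementation: the function name score_test, the toy data, and the use of a dense matrix inverse are all assumptions made for the example, and the covariate effects are taken to be the generalized least squares estimate under the null.

```python
import math
import numpy as np

def score_test(y, X, G, omega):
    """Score statistics U, their covariance V, test statistics T and p-values.

    y: (n,) trait values; X: (n, p) design matrix (include an intercept column);
    G: (n, m) genotype matrix, one column per variant;
    omega: (n, n) estimated covariance matrix of y.
    """
    omega_inv = np.linalg.inv(omega)
    Gc = G - G.mean(axis=0)                          # G minus average genotype
    XtOi = X.T @ omega_inv
    beta_hat = np.linalg.solve(XtOi @ X, XtOi @ y)   # covariate effects under H0
    U = Gc.T @ omega_inv @ (y - X @ beta_hat)
    # Omega^-1 - Omega^-1 X (X' Omega^-1 X)^-1 X' Omega^-1
    P = omega_inv - omega_inv @ X @ np.linalg.solve(XtOi @ X, XtOi)
    V = Gc.T @ P @ Gc
    T = U**2 / np.diag(V)
    # Upper tail of chi-squared with 1 df: P(Z^2 > T) = erfc(sqrt(T / 2))
    p = np.array([math.erfc(math.sqrt(t / 2.0)) for t in T])
    return U, V, T, p

# Toy data (made up): 6 individuals in 3 correlated pairs, 2 variants.
y = np.array([1.2, 0.7, -0.3, 0.1, 2.0, 1.5])
X = np.ones((6, 1))                                  # intercept only
G = np.array([[0, 1], [1, 0], [2, 1], [0, 2], [1, 1], [0, 0]], dtype=float)
omega = np.eye(6)
for a, b in [(0, 1), (2, 3), (4, 5)]:
    omega[a, b] = omega[b, a] = 0.3

U, V, T, p = score_test(y, X, G, omega)
print(U, np.diag(V), T, p)
```

With \hat{\boldsymbol{\Omega}} equal to the identity and an intercept-only design, U_i reduces to the centered cross product of genotype and trait, which is a convenient sanity check.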

SUMMARY STATISTICS AND COVARIANCE MATRICES

RAREMETALWORKER automatically stores the score statistic for each marker (U_i) together with quality information for that marker, including HWE p-value, call rate, and allele counts.

RAREMETALWORKER also stores the covariance matrices (\mathbf{V}) of the score statistics of markers within a window whose size can be specified on the command line.
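
To illustrate what restricting the covariances to a window means, here is a minimal sketch. It assumes markers are sorted by base-pair position and that, because \mathbf{V} is symmetric, each marker only needs the covariances with itself and with downstream markers inside the window. The function name, the data layout and the numbers are hypothetical and do not describe the actual RAREMETALWORKER output format.

```python
def covariances_in_window(positions, V, window):
    """For each marker, keep covariances with markers at most `window` bp downstream.

    positions: sorted base-pair positions; V: square covariance matrix (nested lists);
    returns a list of (position, [(other_position, covariance), ...]) tuples.
    """
    stored = []
    for i, pos_i in enumerate(positions):
        row = []
        for j in range(i, len(positions)):
            if positions[j] - pos_i > window:
                break  # positions are sorted, so later markers are even farther away
            row.append((positions[j], V[i][j]))
        stored.append((pos_i, row))
    return stored

# Made-up example: three markers, window of 1000 bp.
positions = [100, 150, 1200]
V = [[4.0, 1.5, 0.2],
     [1.5, 3.0, 0.4],
     [0.2, 0.4, 5.0]]
stored = covariances_in_window(positions, V, window=1000)
for pos, row in stored:
    print(pos, row)
```

In this example the third marker lies more than 1000 bp from the other two, so its covariances with them are not kept.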

MODELING RELATEDNESS

We use a variance component model to handle familial relationships. We estimate the variance components under the null model:

\mathbf{y}=\mathbf{X}\boldsymbol{\beta} +\mathbf{g}+ \boldsymbol{\varepsilon}


We assume that genetic effects are normally distributed, with mean \mathbf{0} and covariance \mathbf{K}\sigma_g^2 where the matrix \mathbf{K} summarizes kinship coefficients between sampled individuals and \sigma_g^2 is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean \mathbf{0} and covariance \mathbf{I}\sigma_e^2, where \mathbf{I} is the identity matrix.
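
Under these assumptions \mathbf{y} is multivariate normal with mean \mathbf{X}\boldsymbol{\beta} and covariance \sigma_g^2\mathbf{K}+\sigma_e^2\mathbf{I}, so the variance components can be estimated by maximizing that likelihood. The sketch below does this with an eigendecomposition of \mathbf{K} and a simple grid search over the ratio \delta=\sigma_e^2/\sigma_g^2. It is a simplified illustration of the spectral approach, not the exact optimizer used by RAREMETALWORKER. The function name fit_null_model, the grid and the simulated data are assumptions made for the example.

```python
import numpy as np

def fit_null_model(y, X, K, log_delta_grid=np.linspace(-5, 5, 101)):
    """Maximum likelihood fit of y = X beta + g + e by grid search over delta.

    Returns (log-likelihood, beta_hat, sigma_g2_hat, sigma_e2_hat).
    """
    n = len(y)
    s, Q = np.linalg.eigh(K)           # K = Q diag(s) Q'
    s = np.clip(s, 0.0, None)          # guard against tiny negative eigenvalues
    yr, Xr = Q.T @ y, Q.T @ X          # rotated data have a diagonal covariance
    best = None
    for log_delta in log_delta_grid:
        delta = np.exp(log_delta)
        w = 1.0 / (s + delta)          # inverse variances, up to the factor sigma_g2
        XtW = Xr.T * w
        beta = np.linalg.solve(XtW @ Xr, XtW @ yr)
        resid = yr - Xr @ beta
        sigma_g2 = np.sum(w * resid**2) / n
        loglik = -0.5 * (n * np.log(2 * np.pi) + np.sum(np.log(s + delta))
                         + n + n * np.log(sigma_g2))
        if best is None or loglik > best[0]:
            best = (loglik, beta, sigma_g2, delta * sigma_g2)
    return best

# Simulated toy data: 20 pairs of relatives, kinship values are made up.
rng = np.random.default_rng(0)
n_pairs = 20
K = np.kron(np.eye(n_pairs), np.array([[0.5, 0.25], [0.25, 0.5]]))
n = 2 * n_pairs
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

loglik, beta_hat, sigma_g2_hat, sigma_e2_hat = fit_null_model(y, X, K)
print(loglik, beta_hat, sigma_g2_hat, sigma_e2_hat)
```

The eigendecomposition is computed once, after which each candidate \delta costs only operations on diagonal weights. A finer grid or a one-dimensional optimizer could replace the grid search.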

To estimate \mathbf{K}, we either use the known pedigree structure to define \mathbf{K} or else use the empirical estimator

\mathbf{K}=\frac{1}{l}\sum_{i=1}^l{(G_i-2f_i\mathbf{1})(G_i-2f_i\mathbf{1})^T\over 4f_i(1-f_i)},

where l is the count of variants, and G_i and f_i are the genotype vector and estimated allele frequency for the i^{th} variant, respectively. Each element in G_i encodes the minor allele count for one individual. Model parameters \hat{\boldsymbol{\beta}}, \hat{\sigma_g^2} and \hat{\sigma_e^2} are estimated using maximum likelihood and the efficient algorithm described in Lippert et al. (http://www.nature.com/nmeth/journal/v8/n10/full/nmeth.1681.html). For convenience, let the estimated covariance matrix of \mathbf{y} be \hat{\boldsymbol{\Omega}}=\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}.
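
The empirical kinship estimator can be written compactly by standardizing each genotype column and taking an outer product. The sketch below assumes a complete genotype matrix with no missing calls and only polymorphic variants (so that 4f_i(1-f_i) is never zero). The function name and the toy genotypes are made up for illustration.

```python
import numpy as np

def empirical_kinship(G):
    """Empirical kinship from an (n, l) matrix of minor allele counts (0, 1, 2)."""
    n, l = G.shape
    f = G.mean(axis=0) / 2.0                          # estimated allele frequencies
    Z = (G - 2.0 * f) / np.sqrt(4.0 * f * (1.0 - f))  # centered and scaled genotypes
    return Z @ Z.T / l                                # average of per-variant outer products

# Toy genotypes (made up): 4 individuals in rows, 3 variants in columns.
G = np.array([[0, 1, 2],
              [1, 0, 0],
              [0, 0, 1],
              [1, 1, 1]], dtype=float)
K = empirical_kinship(G)
print(K)
```

Because the allele frequencies are estimated from the same sample, every row of the resulting matrix sums to zero, which is a quick check on the centering.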

ANALYZING MARKERS ON CHROMOSOME X

To analyze markers on chromosome X, we fit an extra variance component, {{\sigma_g}_X}^2, to model the variance explained by chromosome X. A kinship matrix for chromosome X, \mathbf{K_X}, can be estimated either from a pedigree or from genotypes of markers on chromosome X. The estimated covariance matrix can then be written as \hat{\boldsymbol{\Omega}}=\hat{\sigma_g^2}\mathbf{K}+\hat{{\sigma_g}_X^2}\mathbf{K_X}+\hat{\sigma_e^2}\mathbf{I}.
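
As a small illustration of how the covariance matrix is assembled with and without the chromosome X component, consider the following sketch. The function name omega_hat and all numerical values are assumptions chosen for the example.

```python
import numpy as np

def omega_hat(K, sigma_g2, sigma_e2, K_X=None, sigma_gX2=0.0):
    """Estimated covariance of y, optionally with a chromosome X variance component."""
    n = K.shape[0]
    omega = sigma_g2 * K + sigma_e2 * np.eye(n)
    if K_X is not None:
        omega = omega + sigma_gX2 * K_X   # extra term used for chromosome X markers
    return omega

# Made-up kinship matrices and variance components for two individuals.
K = np.array([[0.5, 0.25], [0.25, 0.5]])
K_X = np.array([[1.0, 0.5], [0.5, 1.0]])
omega_autosomal = omega_hat(K, sigma_g2=2.0, sigma_e2=1.0)
omega_x = omega_hat(K, sigma_g2=2.0, sigma_e2=1.0, K_X=K_X, sigma_gX2=0.4)
print(omega_autosomal)
print(omega_x)
```

The resulting matrix takes the place of \hat{\boldsymbol{\Omega}} in the score statistic and its variance when chromosome X markers are tested.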