Changes

930 bytes added , 17:49, 16 March 2018

no edit summary

Line 1: Line 1:

+

[[Category:RAREMETALWORKER]]

+

==Useful Links==

+

Here are some useful links to key pages:

+

* The [[RAREMETALWORKER | '''RAREMETALWORKER documentation''']]

+

* The [[RAREMETALWORKER_command_reference | '''RAREMETALWORKER command reference''']]

+

* The [[RAREMETALWORKER_SPECIAL_TOPICS | '''RAREMETALWORKER special topics''']]

+

* The [[Tutorial:_RAREMETAL | '''RAREMETALWORKER quick start tutorial''']]

+

* The [[RAREMETAL_method | '''RAREMETAL method''']]

+

* The [[RAREMETAL_FAQ | '''FAQ''']]

+

== Brief Introduction==

Line 10: Line 21:

We use the following notations to describe our methods:

−

<math>\mathbf{y}</math> is the observed ~~phenotype vector~~

+

<math>\mathbf{y}</math> is the vector of observed quantitative trait values

<math>\mathbf{X}</math> is the design matrix

Line 26: Line 37:

<math>\boldsymbol{\varepsilon}</math> is the vector of non-shared environmental effects

−

===~~Single Variant Score Tests~~ ===

+

<math> \hat{\boldsymbol{\Omega}} </math> is the estimated covariance matrix of <math>\mathbf{y}</math>

+

<math>\mathbf{K}</math> is the kinship matrix

+

<math>\mathbf{K_X}</math> is the kinship matrix of Chromosome X

+

<math> \sigma_g^2 </math> is the genetic variance component

+

<math> {{\sigma_g}_X}^2 </math> is the genetic variance component for markers on chromosome X

+

<math>\sigma_e^2 </math> is the non-shared environmental variance component.

+

===SINGLE VARIANT SCORE TEST===

We used the following model for the trait:

Line 32: Line 55:

<math> \mathbf{y}=\mathbf{X}\boldsymbol{\beta_c}+\beta_i(\mathbf{G_i}-\bar{\mathbf{G_i}})+\mathbf{g}+\boldsymbol{\varepsilon} </math>.

−

Here, ~~[explain~~ the ~~formula]~~.

+

Here, the quantitative trait of an individual is modeled as the sum of covariate effects, the additive genetic effect of the <math> i^{th} </math> variant, polygenic background effects, and non-shared environmental effects.

In this model, <math>\beta_i</math> measures the additive genetic effect of the <math>i^{th}</math> variant. As usual, the score statistic for testing <math>H_0:\beta_i=0</math> is:

Line 42: Line 65:

<math> \mathbf{V}=(\mathbf{G}-\bar{\mathbf{G}})^T (\hat{\boldsymbol{\Omega}}^{-1}-\hat{\boldsymbol{\Omega}}^{-1} \mathbf{X}(\mathbf{X^T}\hat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1} \mathbf{X^T} \hat{\boldsymbol{\Omega}}^{-1})(\mathbf{G}-\bar{\mathbf{G}}) </math>.

−

The score test statistic, <math>T_i=(U_i^2)/V_{ii}</math>, is asymptotically distributed as chi-squared with one degree of freedom.

+

The score test statistic, <math>T_i=(U_i^2)/V_{ii}</math>, is asymptotically distributed as chi-squared with one degree of freedom. The score test p-value is reported in RAREMETALWORKER.
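The score test described above can be sketched in a few lines of NumPy. This is an illustration only, not the RAREMETALWORKER implementation: the function names are hypothetical, and because the formula for <math>U_i</math> falls outside this excerpt, the sketch assumes the usual form <math>U_i=(\mathbf{G_i}-\bar{\mathbf{G_i}})^T\hat{\boldsymbol{\Omega}}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta_c}})</math> with <math>\hat{\boldsymbol{\beta_c}}</math> estimated by generalized least squares under the null model.

```python
import math
import numpy as np

def score_test(y, X, G, omega):
    """Single-variant score statistics (illustrative sketch).

    y     : (n,) trait vector
    X     : (n, p) design matrix of covariates (including an intercept)
    G     : (n, m) genotype matrix, one column per variant
    omega : (n, n) estimated covariance matrix of y
    Returns U (m,), V (m, m) and T (m,), with T_i = U_i^2 / V_ii.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    G = np.asarray(G, dtype=float)
    omega_inv = np.linalg.inv(np.asarray(omega, dtype=float))

    # Generalized least squares fit of the covariates under H0 (assumed form).
    Xt_oi = X.T @ omega_inv
    beta_c = np.linalg.solve(Xt_oi @ X, Xt_oi @ y)
    resid = y - X @ beta_c

    # Centre each genotype column: G - G_bar.
    Gc = G - G.mean(axis=0)

    # Score statistics and their covariance matrix V.
    U = Gc.T @ omega_inv @ resid
    P = omega_inv - omega_inv @ X @ np.linalg.solve(Xt_oi @ X, Xt_oi)
    V = Gc.T @ P @ Gc

    T = U ** 2 / np.diag(V)
    return U, V, T

def chi2_1df_pvalue(t):
    """Upper-tail probability of a chi-squared variable with 1 d.f."""
    return math.erfc(math.sqrt(t / 2.0))
```

With unrelated individuals and no polygenic component, <code>omega</code> reduces to a multiple of the identity matrix and the test becomes an ordinary linear-regression score test. A variant with no genotype variation gives <math>V_{ii}=0</math> and would need to be filtered out before calling the function.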

−

== ~~Summary Statistics and Covariance Matrices~~==

+

===SUMMARY STATISTICS AND COVARIANCE MATRICES===

RAREMETALWORKER automatically stores the score statistics for each marker ( <math> U_i </math>) together with quality information of that marker, including HWE p-value, call rate, and allele counts.

−

RAREMETALWORKER also stores the covariance matrices (<math> \mathbf{V} </math>) of the score statistics of markers within a window.

+

RAREMETALWORKER also stores the covariance matrices (<math> \mathbf{V} </math>) of the score statistics of markers within a window, whose size can be specified on the command line.

−

== ~~Modeling Relatedness~~ ==

+

=== MODELING RELATEDNESS ===

−

we use a variance component model to handle familial relationships. ~~In a sample of n individuals, we model~~ the observed phenotype vector (<math>\mathbf{y}</math>) as a sum of covariate effects (specified by a design matrix <math>\mathbf{X}</math> and a vector of covariate effects <math>\boldsymbol{\beta}</math>), additive genetic effects (modeled in vector <math>\mathbf{g}</math>) and non-shared environmental effects (modeled in vector <math>\boldsymbol{\varepsilon}</math>). Thus the null model is:

+

We use a variance component model to handle familial relationships. We estimate the variance components under the null model:

<math>\mathbf{y}=\mathbf{X}\boldsymbol{\beta} +\mathbf{g}+ \boldsymbol{\varepsilon}</math>

Line 58: Line 81:

We assume that genetic effects are normally distributed, with mean <math>\mathbf{0}</math> and covariance <math>\mathbf{K}\sigma_g^2</math> where the matrix <math>\mathbf{K}</math> summarizes kinship coefficients between sampled individuals and <math>\sigma_g^2</math> is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean <math>\mathbf{0}</math> and covariance <math>\mathbf{I}\sigma_e^2</math>, where <math>\mathbf{I}</math> is the identity matrix.

−

To estimate <math>\mathbf{K}</math>, we either use known pedigree structure to define <math>\mathbf{K}</math> or else use the empirical estimator <math>\mathbf{K}=\frac{1}{l}\sum_{i=1}^l{(G_i-2f_i\mathbf{1})(G_i-2f_i\mathbf{1})\over 4f_i(1-f_i)} </math>,

+

To estimate <math>\mathbf{K}</math>, we either use known pedigree structure to define <math>\mathbf{K}</math> or else use the empirical estimator

−

where <math>l</math> is the count of variants, <math>G_i</math> and <math>f_i</math> are the genotype vector and estimated allele frequency for the <math>i^{th}</math> variant, respectively. Each element in <math>G_i</math> encodes the minor allele count for one individual. Model parameters <math>\hat{\boldsymbol{\beta}}</math>, <math>\hat{\sigma_g^2}</math> and <math>\hat{\sigma_e^2}</math>, are estimated using maximum likelihood and the efficient algorithm described in Lippert et. al. For convenience, let the estimated covariance matrix of <math>\mathbf{y}</math> be <math>\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}</math>.

+

<math>\mathbf{K}=\frac{1}{l}\sum_{i=1}^l{(G_i-2f_i\mathbf{1})(G_i-2f_i\mathbf{1})^T\over 4f_i(1-f_i)} </math>,

+

where <math>l</math> is the number of variants, and <math>G_i</math> and <math>f_i</math> are the genotype vector and estimated allele frequency for the <math>i^{th}</math> variant, respectively. Each element of <math>G_i</math> encodes the minor allele count for one individual. The model parameters <math>\hat{\boldsymbol{\beta}}</math>, <math>\hat{\sigma_g^2}</math> and <math>\hat{\sigma_e^2}</math> are estimated by maximum likelihood using the efficient algorithm described in [http://www.nature.com/nmeth/journal/v8/n10/full/nmeth.1681.html Lippert et al.]. For convenience, let the estimated covariance matrix of <math>\mathbf{y}</math> be <math>\hat{\boldsymbol{\Omega}}=\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}</math>.
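As an illustration of the empirical estimator, the following NumPy sketch averages the scaled outer products of the centred genotype vectors. The function name and the array layout (one row per variant) are assumptions made for this example, and monomorphic variants (<math>f_i=0</math> or <math>f_i=1</math>) are assumed to have been removed beforehand, since they would make the denominator zero.

```python
import numpy as np

def empirical_kinship(genotypes, freqs):
    """Empirical kinship matrix (illustrative sketch).

    genotypes : (l, n) array, row i holds the minor allele counts G_i
    freqs     : (l,) array of estimated allele frequencies f_i
    Returns the (n, n) matrix K.
    """
    G = np.asarray(genotypes, dtype=float)
    f = np.asarray(freqs, dtype=float)

    centred = G - 2.0 * f[:, None]      # G_i - 2 f_i 1, one row per variant
    scale = 4.0 * f * (1.0 - f)         # 4 f_i (1 - f_i)

    # Sum over variants of (outer product / scale), divided by the variant count l.
    return (centred / scale[:, None]).T @ centred / G.shape[0]
```

The result is symmetric by construction, so it can be passed directly to the variance component model as <math>\mathbf{K}</math>.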

−

==~~Chromosome~~ X==

+

===ANALYZING MARKERS ON CHROMOSOME X===

−

To analyze markers on chromosome X, we fit an extra variance components <math> {{\sigma_g}_X}^2 </math>, to model the variance explained by chromosome X. A kinship for chromosome X, <math> \boldsymbol{K_X} </math>, can be estimated either from a pedigree, or from genotypes of marker from chromosome X. Then the estimated covariance matrix can be written as <math>\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+2\hat{{\sigma_g}_X^2}\mathbf{K_X}+\hat{\sigma_e^2}\mathbf{I}</math>.

+

To analyze markers on chromosome X, we fit an extra variance component, <math> {{\sigma_g}_X}^2 </math>, to model the variance explained by chromosome X. A kinship matrix for chromosome X, <math> \boldsymbol{K_X} </math>, can be estimated either from a pedigree or from the genotypes of markers on chromosome X. The estimated covariance matrix can then be written as <math>\hat{\boldsymbol{\Omega}}=\hat{\sigma_g^2}\mathbf{K}+\hat{{\sigma_g}_X^2}\mathbf{K_X}+\hat{\sigma_e^2}\mathbf{I}</math>.
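A minimal sketch of how the estimated covariance matrix could be assembled, with and without the chromosome X term, is given below. The function and argument names are hypothetical, and the variance component estimates are taken as inputs rather than fitted here.

```python
import numpy as np

def estimated_covariance(K, sigma_g2, sigma_e2, K_X=None, sigma_gX2=0.0):
    """Assemble Omega-hat from kinship matrices and variance components.

    Autosomal model:    sigma_g2 * K + sigma_e2 * I
    Chromosome X model: adds sigma_gX2 * K_X when K_X is supplied.
    """
    K = np.asarray(K, dtype=float)
    omega = sigma_g2 * K + sigma_e2 * np.eye(K.shape[0])
    if K_X is not None:
        omega = omega + sigma_gX2 * np.asarray(K_X, dtype=float)
    return omega
```

Leaving <code>K_X</code> unset reproduces the autosomal covariance matrix, so the same helper covers both analyses.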

Abought


RAREMETALWORKER METHOD (view source)

Revision as of 17:49, 16 March 2018
