# RAREMETALWORKER METHOD

Latest edit by Shuang Feng.


*Latest revision as of 17:49, 16 March 2018.*

## Useful Links

Here are some useful links to key pages:

- The **RAREMETALWORKER documentation**
- The **RAREMETALWORKER command reference**
- The **RAREMETALWORKER special topics**
- The **RAREMETALWORKER quick start tutorial**
- The **RAREMETAL method**
- The **FAQ**

## Brief Introduction

RAREMETALWORKER generates single variant association test statistics for a single study prior to meta-analysis. This page provides a brief description of the statistics that RAREMETALWORKER calculates, together with key formulae.

## Key Statistics for Analysis of Single Study

### NOTATIONS

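We use the following notation:

- $\mathbf{y}$ is the vector of observed quantitative trait values
- $\mathbf{X}$ is the design matrix
- $\mathbf{G_i}$ is the genotype vector of the $i^{th}$ variant
- $\bar{\mathbf{G_i}}$ is the vector of average genotype of the $i^{th}$ variant
- $\boldsymbol{\beta_c}$ is the vector of covariate effects
- $\beta_i$ is the scalar fixed genetic effect of the $i^{th}$ variant
- $\mathbf{g}$ is the vector of random genetic effects
- $\boldsymbol{\varepsilon}$ is the vector of non-shared environmental effects
- $\hat{\boldsymbol{\Omega}}$ is the estimated covariance matrix of $\mathbf{y}$
- $\mathbf{K}$ is the kinship matrix
- $\mathbf{K_X}$ is the kinship matrix for chromosome X
- $\sigma_g^2$ is the genetic variance component
- $\sigma_{g_X}^2$ is the genetic variance component for markers on chromosome X
- $\sigma_e^2$ is the non-shared environmental variance component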

### SINGLE VARIANT SCORE TEST

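We use the following model for the trait:

$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta_c}+\beta_i(\mathbf{G_i}-\bar{\mathbf{G_i}})+\mathbf{g}+\boldsymbol{\varepsilon}.$$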

Here, the quantitative trait for an individual is the sum of the covariate effects, the additive genetic effect of the $i^{th}$ variant, the polygenic background effect, and the non-shared environmental effect.

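In this model, $\beta_i$ measures the additive genetic effect of the $i^{th}$ variant. As usual, the score statistic for testing $H_0:\beta_i=0$ is

$$U_i=(\mathbf{G_i}-\bar{\mathbf{G_i}})^T\hat{\boldsymbol{\Omega}}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}),$$

where $\hat{\boldsymbol{\beta}}$ is the maximum likelihood estimate of the covariate effects under the null model. We further derive the variance-covariance matrix of these statistics as

$$\mathbf{V}=(\mathbf{G}-\bar{\mathbf{G}})^T\left(\hat{\boldsymbol{\Omega}}^{-1}-\hat{\boldsymbol{\Omega}}^{-1}\mathbf{X}(\mathbf{X}^T\hat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\hat{\boldsymbol{\Omega}}^{-1}\right)(\mathbf{G}-\bar{\mathbf{G}}),$$

where $\mathbf{G}$ is the matrix whose columns are the genotype vectors $\mathbf{G_i}$ and $\bar{\mathbf{G}}$ holds the corresponding average genotypes.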
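To make the two formulas concrete, here is a minimal pure-Python sketch (not RAREMETALWORKER's implementation) that evaluates $U_i$ for every marker and the full matrix $\mathbf{V}$, assuming $\hat{\boldsymbol{\Omega}}$ has already been estimated. It takes $\hat{\boldsymbol{\beta}}$ as the generalized least squares estimate given $\hat{\boldsymbol{\Omega}}$, which is the maximum likelihood estimate of $\boldsymbol{\beta}$ once the variance components are fixed. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Return Z with a @ Z = b (Gauss-Jordan elimination, partial pivoting)."""
    n = len(a)
    aug = [list(map(float, a[r])) + list(map(float, b[r])) for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def score_statistics(y: List[float], x: Matrix, g: Matrix, omega: Matrix):
    """Return (U, V) for all markers (columns of g).

    y: n trait values; x: n x p design matrix; g: n x m genotypes;
    omega: n x n estimated covariance matrix of y (assumed symmetric).
    """
    n, m = len(y), len(g[0])
    # Centre each marker: G_i - Gbar_i, with Gbar_i the sample mean genotype.
    means = [sum(g[r][j] for r in range(n)) / n for j in range(m)]
    gc = [[g[r][j] - means[j] for j in range(m)] for r in range(n)]
    oi_x = solve(omega, x)                   # Omega^-1 X
    oi_g = solve(omega, gc)                  # Omega^-1 (G - Gbar)
    oi_y = solve(omega, [[v] for v in y])    # Omega^-1 y
    xt_oi_x = matmul(transpose(x), oi_x)     # X^T Omega^-1 X
    # beta-hat = (X^T Omega^-1 X)^-1 X^T Omega^-1 y
    beta = solve(xt_oi_x, matmul(transpose(x), oi_y))
    resid = [[y[r] - sum(x[r][k] * beta[k][0] for k in range(len(beta)))]
             for r in range(n)]
    # U_i = (G_i - Gbar_i)^T Omega^-1 (y - X beta-hat)
    u = [row[0] for row in matmul(transpose(oi_g), resid)]
    # V = Gc^T Omega^-1 Gc - Gc^T Omega^-1 X (X^T Omega^-1 X)^-1 X^T Omega^-1 Gc
    a = matmul(transpose(oi_x), gc)          # X^T Omega^-1 Gc (p x m)
    corr = matmul(transpose(a), solve(xt_oi_x, a))
    first = matmul(transpose(oi_g), gc)
    v = [[first[i][j] - corr[i][j] for j in range(m)] for i in range(m)]
    return u, v


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
U, V = score_statistics([1, 2, 3, 4], [[1], [1], [1], [1]],
                        [[0, 2], [1, 0], [1, 1], [2, 1]], identity)
# U ≈ [3.0, -1.0], V ≈ [[2.0, -1.0], [-1.0, 2.0]]
```

With $\hat{\boldsymbol{\Omega}}=\mathbf{I}$ and an intercept-only design, the example reduces to ordinary covariances of centred genotypes with the centred trait, which makes the output easy to check by hand.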

The score test statistic, $T_i=U_i^2/V_{ii}$, is asymptotically distributed as chi-squared with one degree of freedom under $H_0$. RAREMETALWORKER reports the corresponding score test p-value.
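Given $U_i$ and the diagonal of $\mathbf{V}$, the p-value follows directly. The sketch below (hypothetical function name) uses the identity that, for a chi-squared variable with one degree of freedom, $P(\chi^2_1>t)=\operatorname{erfc}(\sqrt{t/2})$, so only the Python standard library is needed. Returning NaN for markers with $V_{ii}=0$ (for example monomorphic markers) is an assumption made here, not documented RAREMETALWORKER behaviour.

```python
import math
from typing import List, Tuple


def single_variant_tests(u: List[float],
                         v: List[List[float]]) -> List[Tuple[float, float]]:
    """Return (T_i, p-value) per marker, with T_i = U_i^2 / V_ii."""
    results = []
    for i, ui in enumerate(u):
        if v[i][i] <= 0.0:
            # No variance (e.g. monomorphic marker): statistic undefined.
            results.append((math.nan, math.nan))
            continue
        t = ui * ui / v[i][i]
        # Upper tail of chi-squared with 1 df: P(chi2_1 > t) = erfc(sqrt(t / 2)).
        p = math.erfc(math.sqrt(t / 2.0))
        results.append((t, p))
    return results


print(single_variant_tests([3.0], [[2.0]]))  # T = 4.5, p = erfc(1.5) ≈ 0.034
```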

### SUMMARY STATISTICS AND COVARIANCE MATRICES

RAREMETALWORKER automatically stores the score statistic for each marker ($U_i$) together with quality information for that marker, including the HWE p-value, call rate, and allele counts.

RAREMETALWORKER also stores the covariance matrices ($\mathbf{V}$) of the score statistics of markers within a window whose size can be specified on the command line.
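Storing covariances only within a window means that just the entries $V_{ij}$ for nearby marker pairs are kept. The sketch below lists those index pairs, assuming the window is measured in base pairs and that marker positions are sorted along one chromosome; both the unit and the function name are assumptions for illustration, not a description of RAREMETALWORKER's file format or option names.

```python
from typing import List, Tuple


def pairs_in_window(positions: List[int], window_bp: int) -> List[Tuple[int, int]]:
    """Index pairs (i, j), i <= j, whose positions differ by at most window_bp.

    Only these entries of V would need to be stored. Assumes positions are
    sorted and lie on a single chromosome.
    """
    pairs = []
    start = 0
    for j, pos in enumerate(positions):
        # Move the left edge forward until marker `start` is within the window.
        while pos - positions[start] > window_bp:
            start += 1
        for i in range(start, j + 1):
            pairs.append((i, j))
    return pairs


print(pairs_in_window([100, 150, 400, 420], 100))
# [(0, 0), (0, 1), (1, 1), (2, 2), (2, 3), (3, 3)]
```

Because $\mathbf{V}$ is symmetric, keeping only pairs with $i\le j$ is enough to reconstruct every stored entry.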

### MODELING RELATEDNESS

We use a variance component model to handle familial relationships. We estimate the variance components under the null model:

$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{g}+\boldsymbol{\varepsilon}$$

We assume that the genetic effects are normally distributed with mean $\mathbf{0}$ and covariance $\mathbf{K}\sigma_g^2$, where the matrix $\mathbf{K}$ summarizes kinship coefficients between sampled individuals and $\sigma_g^2$ is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean $\mathbf{0}$ and covariance $\mathbf{I}\sigma_e^2$, where $\mathbf{I}$ is the identity matrix.
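Taking $\mathbf{g}$ and $\boldsymbol{\varepsilon}$ to be independent (as the additive form of $\hat{\boldsymbol{\Omega}}$ implies), the null model says $\mathbf{y}\sim N(\mathbf{X}\boldsymbol{\beta},\sigma_g^2\mathbf{K}+\sigma_e^2\mathbf{I})$. The sketch below only evaluates that log-likelihood for given parameter values via a Cholesky factorization; maximizing it over $\boldsymbol{\beta}$, $\sigma_g^2$ and $\sigma_e^2$ yields the maximum likelihood estimates. It is not the efficient Lippert et al. algorithm, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def null_log_likelihood(y: List[float], x: Matrix, beta: List[float],
                        k: Matrix, sigma_g2: float, sigma_e2: float) -> float:
    """Log-density of y under N(X beta, sigma_g^2 K + sigma_e^2 I)."""
    n = len(y)
    omega = [[sigma_g2 * k[i][j] + (sigma_e2 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    low = cholesky(omega)
    r = [y[i] - sum(x[i][c] * beta[c] for c in range(len(beta)))
         for i in range(n)]
    # Forward substitution L z = r, so that r^T Omega^-1 r = z^T z.
    z: List[float] = []
    for i in range(n):
        z.append((r[i] - sum(low[i][m] * z[m] for m in range(i))) / low[i][i])
    # log det(Omega) = 2 * sum(log L_ii)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))
```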

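To estimate $\mathbf{K}$, we either use the known pedigree structure or the empirical estimator

$$\mathbf{K}=\frac{1}{l}\sum_{i=1}^{l}\frac{(G_i-2f_i\mathbf{1})(G_i-2f_i\mathbf{1})^T}{4f_i(1-f_i)},$$

where $l$ is the number of variants, and $G_i$ and $f_i$ are the genotype vector and estimated allele frequency of the $i^{th}$ variant, respectively. Each element of $G_i$ encodes the minor allele count for one individual. The model parameters $\hat{\boldsymbol{\beta}}$, $\hat{\sigma}_g^2$ and $\hat{\sigma}_e^2$ are estimated by maximum likelihood using the efficient algorithm described in [Lippert et al.](http://www.nature.com/nmeth/journal/v8/n10/full/nmeth.1681.html). For convenience, let the estimated covariance matrix of $\mathbf{y}$ be $\hat{\boldsymbol{\Omega}}=\hat{\sigma}_g^2\mathbf{K}+\hat{\sigma}_e^2\mathbf{I}$.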
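The empirical estimator and the resulting $\hat{\boldsymbol{\Omega}}$ can be sketched in a few lines. This assumes genotypes are stored variant by variant as minor allele counts and estimates $f_i$ as the sample allele frequency; skipping monomorphic variants (where $4f_i(1-f_i)=0$) and letting $l$ count only the variants used are assumptions of this sketch, and the function names are hypothetical. With the $4f_i(1-f_i)$ denominator, the diagonal averages about $1/2$ for outbred individuals under Hardy-Weinberg equilibrium, matching the kinship coefficient of an individual with itself.

```python
from typing import List

Matrix = List[List[float]]


def empirical_kinship(genotypes: Matrix) -> Matrix:
    """K = (1/l) sum_i (G_i - 2 f_i 1)(G_i - 2 f_i 1)^T / (4 f_i (1 - f_i)).

    genotypes[i][j] is the minor allele count (0, 1 or 2) of individual j
    at variant i. Monomorphic variants are skipped (assumption).
    """
    n = len(genotypes[0])
    k = [[0.0] * n for _ in range(n)]
    used = 0
    for g in genotypes:
        f = sum(g) / (2.0 * n)  # estimated allele frequency f_i
        if f <= 0.0 or f >= 1.0:
            continue  # 4 f (1 - f) would be zero
        d = [gj - 2.0 * f for gj in g]  # G_i - 2 f_i 1
        denom = 4.0 * f * (1.0 - f)
        for a in range(n):
            for b in range(n):
                k[a][b] += d[a] * d[b] / denom
        used += 1
    if used == 0:
        raise ValueError("no polymorphic variants")
    return [[v / used for v in row] for row in k]


def null_covariance(k: Matrix, sigma_g2: float, sigma_e2: float) -> Matrix:
    """Omega-hat = sigma_g^2 K + sigma_e^2 I."""
    n = len(k)
    return [[sigma_g2 * k[a][b] + (sigma_e2 if a == b else 0.0)
             for b in range(n)] for a in range(n)]


K = empirical_kinship([[0, 1, 2], [2, 1, 0]])
# K == [[1.0, 0.0, -1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 1.0]]
omega = null_covariance(K, 0.5, 1.0)
```

Real data would use many thousands of variants; the two-variant example only checks the arithmetic.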

### ANALYZING MARKERS ON CHROMOSOME X

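To analyze markers on chromosome X, we fit an extra variance component, $\sigma_{g_X}^2$, to model the variance explained by chromosome X. A kinship matrix for chromosome X, $\mathbf{K_X}$, can be estimated either from a pedigree or from the genotypes of markers on chromosome X. The estimated covariance matrix can then be written as $\hat{\boldsymbol{\Omega}}=\hat{\sigma}_g^2\mathbf{K}+\hat{\sigma}_{g_X}^2\mathbf{K_X}+\hat{\sigma}_e^2\mathbf{I}$.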
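For chromosome X the only change to $\hat{\boldsymbol{\Omega}}$ is the extra term $\hat{\sigma}_{g_X}^2\mathbf{K_X}$. The short sketch below assembles it from already-estimated components (hypothetical function name; how $\mathbf{K_X}$ itself is built is not shown). Setting the chromosome X component to zero recovers the autosomal form $\hat{\sigma}_g^2\mathbf{K}+\hat{\sigma}_e^2\mathbf{I}$.

```python
from typing import List

Matrix = List[List[float]]


def covariance_with_chr_x(k: Matrix, k_x: Matrix, sigma_g2: float,
                          sigma_gx2: float, sigma_e2: float) -> Matrix:
    """Omega-hat = sigma_g^2 K + sigma_gX^2 K_X + sigma_e^2 I."""
    n = len(k)
    return [[sigma_g2 * k[a][b] + sigma_gx2 * k_x[a][b]
             + (sigma_e2 if a == b else 0.0)
             for b in range(n)] for a in range(n)]


omega_x = covariance_with_chr_x([[0.5, 0.0], [0.0, 0.5]],
                                [[0.5, 0.25], [0.25, 0.5]], 1.0, 2.0, 1.0)
# omega_x == [[2.5, 0.5], [0.5, 2.5]]
```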