# Changes

## RAREMETALWORKER METHOD

, 17:49, 16 March 2018
[[Category:RAREMETALWORKER]]

Here are some useful links to key pages:
* The [[RAREMETALWORKER | '''RAREMETALWORKER documentation''']]
* The [[RAREMETALWORKER_command_reference | '''RAREMETALWORKER command reference''']]
* The [[RAREMETALWORKER_SPECIAL_TOPICS | '''RAREMETALWORKER special topics''']]
* The [[Tutorial:_RAREMETAL | '''RAREMETALWORKER quick start tutorial''']]
* The [[RAREMETAL_method | '''RAREMETAL method''']]
* The [[RAREMETAL_FAQ | '''FAQ''']]

== Brief Introduction==
== Key Statistics for Analysis of Single Study ==

===NOTATIONS===
We use the following notations to describe our methods:
$\mathbf{y}$ is the vector of observed quantitative trait values
$\mathbf{X}$ is the design matrix
$\mathbf{G_i}$ is the genotype vector of the $i^{th}$ variant
$\bar{\mathbf{G_i}}$ is the vector of average genotype of the $i^{th}$ variant
$\boldsymbol{\beta_c}$ is the vector of covariate effects
$\beta_i$ is the scalar fixed genetic effect of the $i^{th}$ variant
$\mathbf{g}$ is the vector of additive random genetic effects
$\boldsymbol{\varepsilon}$ is the vector of non-shared environmental effects
$\hat{\boldsymbol{\Omega}}$ is the estimated covariance matrix of $\mathbf{y}$
$\mathbf{K}$ is the kinship matrix
$\mathbf{K_X}$ is the kinship matrix of chromosome X
$\sigma_g^2$ is the genetic variance component
${{\sigma_g}_X}^2$ is the genetic variance component for markers on chromosome X
$\sigma_e^2$ is the non-shared-environment variance component

===SINGLE VARIANT SCORE TEST===
We use the following model for the trait:
$\mathbf{y}=\mathbf{X}\boldsymbol{\beta_c}+\beta_i(\mathbf{G_i}-\bar{\mathbf{G_i}})+\mathbf{g}+\boldsymbol{\varepsilon}$.
Here, the quantitative trait of an individual is the sum of the covariate effects, the additive genetic effect of the $i^{th}$ variant, the polygenic background effect and the non-shared environmental effect.
In this model, $\beta_i$ measures the additive genetic effect of the $i^{th}$ variant. As usual, the score statistic for testing $H_0:\beta_i=0$ is:
$U_i=(\mathbf{G_i}-\bar{\mathbf{G_i}})^T \hat{\boldsymbol{\Omega}}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta_c}})$,
and the covariance matrix of the score statistics is:
$\mathbf{V}=(\mathbf{G}-\bar{\mathbf{G}})^T (\hat{\boldsymbol{\Omega}}^{-1}-\hat{\boldsymbol{\Omega}}^{-1} \mathbf{X}(\mathbf{X^T}\hat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1} \mathbf{X^T} \hat{\boldsymbol{\Omega}}^{-1})(\mathbf{G}-\bar{\mathbf{G}})$,
where $\mathbf{G}$ is the matrix whose columns are the genotype vectors $\mathbf{G_i}$ and $\hat{\boldsymbol{\beta_c}}$ is the estimate of the covariate effects under the null.
Under the null, the score test statistic $T_i=U_i^2/V_{ii}$ is asymptotically distributed as chi-squared with one degree of freedom. RAREMETALWORKER reports the score test p-value.
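The score test described above can be sketched in a few lines of NumPy. This is only an illustration, not RAREMETALWORKER's implementation: the function name and argument layout are hypothetical, the covariate effects are estimated by generalized least squares for a given $\hat{\boldsymbol{\Omega}}$, and $\hat{\boldsymbol{\Omega}}$ is assumed to be symmetric and invertible.

```python
import math
import numpy as np

def single_variant_score_tests(y, X, G, omega):
    """Hypothetical helper: score statistics U, covariance V, T and p-values.

    y: (n,) trait vector; X: (n, p) design matrix;
    G: (n, m) genotype matrix, one column per variant;
    omega: (n, n) estimated covariance matrix of y (assumed symmetric).
    """
    omega_inv = np.linalg.inv(omega)
    XtOi = X.T @ omega_inv
    # Covariate effects under the null (generalized least squares, an assumption)
    beta_hat = np.linalg.solve(XtOi @ X, XtOi @ y)
    resid = y - X @ beta_hat
    # Center each variant at its average genotype
    Gc = G - G.mean(axis=0)
    U = Gc.T @ omega_inv @ resid
    # Omega^{-1} - Omega^{-1} X (X^T Omega^{-1} X)^{-1} X^T Omega^{-1}
    P = omega_inv - XtOi.T @ np.linalg.solve(XtOi @ X, XtOi)
    V = Gc.T @ P @ Gc
    T = U**2 / np.diag(V)
    # Upper tail of a chi-squared with 1 df: P(Z^2 > T) = erfc(sqrt(T / 2))
    pvalues = np.array([math.erfc(math.sqrt(t / 2.0)) for t in T])
    return U, V, T, pvalues
```

With unrelated individuals ($\hat{\boldsymbol{\Omega}}$ proportional to the identity) and an intercept-only design matrix, $\mathbf{V}$ reduces to the cross-product of the centered genotypes, which is a convenient sanity check.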
===SUMMARY STATISTICS AND COVARIANCE MATRICES===
RAREMETALWORKER automatically stores the score statistic of each marker ($U_i$) together with quality information for that marker, including HWE p-value, call rate, and allele counts.
RAREMETALWORKER also stores the covariance matrices ($\mathbf{V}$) of the score statistics of markers within a window, whose size can be specified on the command line.
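As a rough illustration of window-based storage, the sketch below keeps, for each marker, only the covariances with markers that lie within a given distance downstream. The function name, the dictionary layout, and the use of base-pair positions are all assumptions made for this example; they do not describe RAREMETALWORKER's actual output format.

```python
import numpy as np

def covariances_within_window(V, positions, window):
    """Hypothetical helper: keep V[i, j] only for nearby marker pairs.

    V: (m, m) covariance matrix of the score statistics;
    positions: marker positions sorted in increasing order;
    window: maximum distance between two markers whose covariance is kept.
    """
    positions = np.asarray(positions)
    stored = {}
    for i, pos in enumerate(positions):
        # One past the last marker whose position is <= pos + window
        end = np.searchsorted(positions, pos + window, side="right")
        stored[i] = [(j, float(V[i, j])) for j in range(i, end)]
    return stored
```

Because $\mathbf{V}$ is symmetric, storing only pairs with $j \ge i$ loses no information.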
===MODELING RELATEDNESS===
We use a variance component model to handle familial relationships. In a sample of $n$ individuals, we model the observed phenotype vector ($\mathbf{y}$) as a sum of covariate effects (specified by a design matrix $\mathbf{X}$ and a vector of covariate effects $\boldsymbol{\beta}$), additive genetic effects (modeled in vector $\mathbf{g}$) and non-shared environmental effects (modeled in vector $\boldsymbol{\varepsilon}$). We estimate the variance components under the null model:
$\mathbf{y}=\mathbf{X}\boldsymbol{\beta} +\mathbf{g}+ \boldsymbol{\varepsilon}$
We assume that genetic effects are normally distributed, with mean $\mathbf{0}$ and covariance $2\mathbf{K}\sigma_g^2$, where the matrix $\mathbf{K}$ summarizes kinship coefficients between sampled individuals and $\sigma_g^2$ is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean $\mathbf{0}$ and covariance $\mathbf{I}\sigma_e^2$, where $\mathbf{I}$ is the identity matrix.
To obtain $\mathbf{K}$, we either use the known pedigree structure or else use the empirical estimator $\mathbf{K}=\frac{1}{l}\sum_{i=1}^l{(G_i-2f_i\mathbf{1})(G_i-2f_i\mathbf{1})^T\over 4f_i(1-f_i)}$, where $l$ is the number of variants, and $G_i$ and $f_i$ are the genotype vector and estimated allele frequency of the $i^{th}$ variant, respectively. Each element of $G_i$ encodes the minor allele count of one individual. The model parameters $\hat{\boldsymbol{\beta}}$, $\hat{\sigma_g^2}$ and $\hat{\sigma_e^2}$ are estimated by maximum likelihood, using the efficient algorithm described in [http://www.nature.com/nmeth/journal/v8/n10/full/nmeth.1681.html Lippert et al.]. For convenience, let the estimated covariance matrix of $\mathbf{y}$ be $\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}$.
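The empirical kinship estimator and the resulting covariance matrix can be sketched as follows. This is a minimal example under stated assumptions: the product in the estimator is read as an outer product, genotypes are complete minor allele counts, allele frequencies are estimated as half the mean genotype, and monomorphic variants are dropped to avoid division by zero (so the average is taken over the remaining variants). The function names are hypothetical, and the variance components are taken as given rather than estimated.

```python
import numpy as np

def empirical_kinship(G):
    """Hypothetical helper: empirical kinship matrix from genotypes.

    G: (n, l) matrix of minor allele counts (0, 1 or 2), no missing values.
    """
    G = np.asarray(G, dtype=float)
    f = G.mean(axis=0) / 2.0          # estimated allele frequency per variant
    keep = (f > 0) & (f < 1)          # assumption: skip monomorphic variants
    fk = f[keep]
    # Center by 2f and scale by sqrt(4 f (1 - f)), variant by variant
    Z = (G[:, keep] - 2.0 * fk) / np.sqrt(4.0 * fk * (1.0 - fk))
    return Z @ Z.T / keep.sum()

def estimated_covariance(K, sigma_g2, sigma_e2):
    """Omega-hat = 2 * sigma_g^2 * K + sigma_e^2 * I."""
    return 2.0 * sigma_g2 * K + sigma_e2 * np.eye(K.shape[0])
```

Scaling by $4f_i(1-f_i)$ rather than $2f_i(1-f_i)$ puts $\mathbf{K}$ on the kinship-coefficient scale, which is why $\mathbf{K}$ enters $\hat{\boldsymbol{\Omega}}$ with a factor of 2.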
===ANALYZING MARKERS ON CHROMOSOME X===
To analyze markers on chromosome X, we fit an extra variance component, ${{\sigma_g}_X}^2$, to model the variance explained by chromosome X. A kinship matrix for chromosome X, $\mathbf{K_X}$, can be estimated either from a pedigree or from the genotypes of markers on chromosome X. The estimated covariance matrix can then be written as $\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+2\hat{{\sigma_g}_X^2}\mathbf{K_X}+\hat{\sigma_e^2}\mathbf{I}$.
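For illustration, the chromosome X covariance matrix can be assembled as below. The function name and argument order are hypothetical, and the variance component estimates are assumed to be available already.

```python
import numpy as np

def estimated_covariance_chr_x(K, K_X, sigma_g2, sigma_gx2, sigma_e2):
    """Hypothetical helper for the chromosome X covariance matrix.

    Returns 2 * sigma_g^2 * K + 2 * sigma_gX^2 * K_X + sigma_e^2 * I.
    """
    n = K.shape[0]
    return 2.0 * sigma_g2 * K + 2.0 * sigma_gx2 * K_X + sigma_e2 * np.eye(n)
```

Setting the chromosome X component to zero recovers the autosomal covariance matrix $2\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}$.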