
## RAREMETALWORKER METHOD

09:56, 27 March 2014
[[RAREMETALWORKER]] generates single-variant association test statistics for a single study prior to meta-analysis. This page briefly describes the statistics that RAREMETALWORKER calculates, together with key formulae.

== Modeling Relatedness ==
We use a variance component model to handle familial relationships. In a sample of $n$ individuals, we model the observed phenotype vector ($\mathbf{y}$) as a sum of covariate effects (specified by a design matrix $\mathbf{X}$ and a vector of covariate effects $\boldsymbol{\beta}$), additive genetic effects (modeled in vector $\mathbf{g}$) and non-shared environmental effects (modeled in vector $\boldsymbol{\varepsilon}$). Thus the null model is:

$\mathbf{y}=\mathbf{X}\boldsymbol{\beta} +\mathbf{g}+ \boldsymbol{\varepsilon}$

We assume that genetic effects are normally distributed, with mean $\mathbf{0}$ and covariance $2\mathbf{K}\sigma_g^2$, where the matrix $\mathbf{K}$ summarizes kinship coefficients between sampled individuals and $\sigma_g^2$ is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean $\mathbf{0}$ and covariance $\mathbf{I}\sigma_e^2$, where $\mathbf{I}$ is the identity matrix.
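
As a rough illustration of these distributional assumptions, the following sketch draws one phenotype vector from the null model. It is not part of RAREMETALWORKER: the function name, the toy kinship matrix, the covariate and the parameter values are all hypothetical. The sketch assumes the genetic effects have covariance $2\sigma_g^2\mathbf{K}$, which is the form in which the genetic term enters $\hat{\boldsymbol{\Omega}}$ on this page.

```python
import numpy as np

def simulate_null_phenotype(X, beta, K, sigma_g2, sigma_e2, rng):
    """Draw y = X beta + g + e under the null model.

    Assumption: g ~ N(0, 2 * sigma_g2 * K) and e ~ N(0, sigma_e2 * I).
    """
    n = X.shape[0]
    # Additive genetic effects, correlated between relatives through K.
    g = rng.multivariate_normal(np.zeros(n), 2.0 * sigma_g2 * K)
    # Non-shared environmental effects, independent across individuals.
    e = rng.normal(0.0, np.sqrt(sigma_e2), size=n)
    return X @ beta + g + e

# Hypothetical toy data: two full siblings (kinship 0.25) and one unrelated person.
K = np.array([[0.50, 0.25, 0.00],
              [0.25, 0.50, 0.00],
              [0.00, 0.00, 0.50]])
X = np.column_stack([np.ones(3), [30.0, 35.0, 50.0]])  # intercept and one covariate
beta = np.array([1.0, 0.02])
rng = np.random.default_rng(0)
y = simulate_null_phenotype(X, beta, K, sigma_g2=0.6, sigma_e2=0.4, rng=rng)
```

With these choices each individual has total phenotypic variance $2 \cdot 0.6 \cdot 0.5 + 0.4 = 1.0$, and the two siblings share covariance $2 \cdot 0.6 \cdot 0.25 = 0.3$.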

To obtain $\mathbf{K}$, we either use known pedigree structure or else use the empirical estimator $\mathbf{K}=\frac{1}{l}\sum_{i=1}^l{(G_i-2f_i\mathbf{1})(G_i-2f_i\mathbf{1})^T\over 4f_i(1-f_i)}$,
where $l$ is the number of variants, and $G_i$ and $f_i$ are the genotype vector and estimated allele frequency for the $i^{th}$ variant, respectively. Each element of $G_i$ encodes the minor allele count for one individual. The parameter estimates $\hat{\boldsymbol{\beta}}$, $\hat{\sigma_g^2}$ and $\hat{\sigma_e^2}$ are obtained by maximum likelihood, using the efficient algorithm described in Lippert et al. For convenience, let the estimated covariance matrix of $\mathbf{y}$ be $\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}$.
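
The empirical estimator and the resulting covariance matrix can be sketched in a few lines of NumPy. This is a minimal illustration, not RAREMETALWORKER's implementation. The function names are hypothetical, the allele frequency $f_i$ is assumed to be estimated as half the mean genotype, and monomorphic variants are assumed to be skipped (so $l$ counts only the retained variants) because they would give a zero denominator.

```python
import numpy as np

def empirical_kinship(G):
    """Empirical kinship from an (n individuals) x (l variants) matrix of
    minor allele counts, computing
    K = (1/l) * sum_i (G_i - 2 f_i 1)(G_i - 2 f_i 1)^T / (4 f_i (1 - f_i)).
    """
    G = np.asarray(G, dtype=float)
    f = G.mean(axis=0) / 2.0            # assumed frequency estimate per variant
    keep = (f > 0.0) & (f < 1.0)        # assumption: drop monomorphic variants
    centered = G[:, keep] - 2.0 * f[keep]
    Z = centered / np.sqrt(4.0 * f[keep] * (1.0 - f[keep]))
    # Z @ Z.T sums the scaled outer products over the retained variants.
    return Z @ Z.T / keep.sum()

def phenotype_covariance(K, sigma_g2, sigma_e2):
    """Omega = 2 * sigma_g2 * K + sigma_e2 * I."""
    return 2.0 * sigma_g2 * K + sigma_e2 * np.eye(K.shape[0])

# Hypothetical genotypes: 4 individuals, 3 variants.
G = np.array([[0, 1, 2],
              [1, 1, 0],
              [2, 0, 1],
              [1, 2, 1]])
K_hat = empirical_kinship(G)
Omega_hat = phenotype_covariance(K_hat, sigma_g2=0.6, sigma_e2=0.4)
```

The variance components passed to `phenotype_covariance` are placeholder values here; in practice they would come from the maximum likelihood fit described above.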
== Single Variant Score Tests ==
RAREMETALWORKER also stores the covariance matrices ($\mathbf{V}$) of the score statistics of markers within a window.

==Chromosome X==
To analyze markers on chromosome X, we fit an extra variance component, $\sigma_{g_X}^2$, to model the variance explained by chromosome X. A kinship matrix for chromosome X, $\mathbf{K_X}$, can be estimated either from a pedigree or from genotypes of markers on chromosome X. The estimated covariance matrix can then be written as $\hat{\boldsymbol{\Omega}}=2\hat{\sigma_g^2}\mathbf{K}+2\hat{\sigma_{g_X}^2}\mathbf{K_X}+\hat{\sigma_e^2}\mathbf{I}$.
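
The chromosome X covariance can be assembled the same way as the autosomal one, with one additional term. The sketch below is illustrative only: the function name is hypothetical, and the kinship matrices and variance components are made-up numbers rather than estimates from real data.

```python
import numpy as np

def phenotype_covariance_with_x(K, K_X, sigma_g2, sigma_gx2, sigma_e2):
    """Omega = 2*sigma_g2*K + 2*sigma_gx2*K_X + sigma_e2*I."""
    K = np.asarray(K, dtype=float)
    K_X = np.asarray(K_X, dtype=float)
    if K.shape != K_X.shape or K.shape[0] != K.shape[1]:
        raise ValueError("K and K_X must be square matrices of the same size")
    n = K.shape[0]
    return 2.0 * sigma_g2 * K + 2.0 * sigma_gx2 * K_X + sigma_e2 * np.eye(n)

# Made-up inputs for two individuals.
K = np.array([[0.50, 0.25],
              [0.25, 0.50]])
K_X = np.array([[0.500, 0.375],
                [0.375, 0.500]])
Omega_x = phenotype_covariance_with_x(K, K_X, sigma_g2=0.5, sigma_gx2=0.1, sigma_e2=0.4)
```

Setting `sigma_gx2` to zero recovers the autosomal-only covariance $2\hat{\sigma_g^2}\mathbf{K}+\hat{\sigma_e^2}\mathbf{I}$.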