From Genome Analysis Wiki
Revision as of 13:32, 14 April 2014 by Shuang Feng (talk | contribs)


Brief Introduction

RAREMETALWORKER generates single variant association test statistics for a single study prior to meta-analysis. This page provides a brief description of the statistics that RAREMETALWORKER calculates, together with key formulae.

Key Statistics for Analysis of Single Study


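We use the following notation to describe our methods:

$\mathbf{y}$ is the vector of observed quantitative trait values

$\mathbf{X}$ is the design matrix

$\mathbf{G}_i$ is the genotype vector of the $i^{th}$ variant

$\bar{\mathbf{G}}_i$ is the vector of average genotype of the $i^{th}$ variant

$\boldsymbol{\beta}_c$ is the vector of covariate effects

$\beta_i$ is the scalar fixed genetic effect of the $i^{th}$ variant

$\mathbf{g}$ is the vector of random genetic effects

$\boldsymbol{\varepsilon}$ is the vector of non-shared environmental effects

$\hat{\boldsymbol{\Omega}}$ is the estimated covariance matrix of $\mathbf{y}$

$\mathbf{K}$ is the kinship matrix

$\mathbf{K}_X$ is the kinship matrix of chromosome X

$\sigma_g^2$ is the genetic variance component

$\sigma_{g_X}^2$ is the genetic variance component for markers on chromosome X

$\sigma_e^2$ is the non-shared environmental variance component.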


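We use the following model for the trait:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta}_c + \beta_i(\mathbf{G}_i - \bar{\mathbf{G}}_i) + \mathbf{g} + \boldsymbol{\varepsilon}$$

Here, the quantitative trait for an individual is the sum of the covariate effects, the additive genetic effect of the $i^{th}$ variant, the polygenic background effect, and the non-shared environmental effect.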

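In this model, $\beta_i$ measures the additive genetic effect of the $i^{th}$ variant. As usual, the score statistic for testing $H_0: \beta_i = 0$ is:

$$U_i = (\mathbf{G}_i - \bar{\mathbf{G}}_i)^T \hat{\boldsymbol{\Omega}}^{-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_c)$$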

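We further derive the variance-covariance matrix of these statistics as

$$\mathbf{V} = (\mathbf{G} - \bar{\mathbf{G}})^T \left( \hat{\boldsymbol{\Omega}}^{-1} - \hat{\boldsymbol{\Omega}}^{-1}\mathbf{X}(\mathbf{X}^T\hat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\hat{\boldsymbol{\Omega}}^{-1} \right) (\mathbf{G} - \bar{\mathbf{G}})$$

where $\mathbf{G}$ and $\bar{\mathbf{G}}$ are the matrices whose columns are $\mathbf{G}_i$ and $\bar{\mathbf{G}}_i$. The score test statistic, $T_i = U_i^2 / V_{ii}$, is asymptotically distributed as chi-squared with one degree of freedom. RAREMETALWORKER reports the score test p-value.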
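The score test above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not RAREMETALWORKER's implementation: the function name `single_variant_score_tests` and the toy data are hypothetical, the covariate effects are estimated by generalized least squares given the estimated covariance matrix, and the covariance matrix is inverted directly, which is reasonable only for small samples.

```python
import math
import numpy as np


def single_variant_score_tests(y, X, G, omega):
    """Score statistics U, their covariance V, test statistics T and p-values.

    y: (n,) trait vector; X: (n, p) design matrix (with an intercept column);
    G: (n, m) genotype matrix, one column per variant;
    omega: (n, n) estimated covariance matrix of y.
    """
    omega_inv = np.linalg.inv(omega)
    xt_oi = X.T @ omega_inv                      # X^T Omega^-1, shape (p, n)
    # Generalized least squares estimate of the covariate effects (assumption).
    beta_c = np.linalg.solve(xt_oi @ X, xt_oi @ y)
    resid = y - X @ beta_c
    # Centre each genotype column at its average genotype.
    Gc = G - G.mean(axis=0)
    U = Gc.T @ omega_inv @ resid
    # Omega^-1 - Omega^-1 X (X^T Omega^-1 X)^-1 X^T Omega^-1
    P = omega_inv - xt_oi.T @ np.linalg.solve(xt_oi @ X, xt_oi)
    V = Gc.T @ P @ Gc
    T = U ** 2 / np.diag(V)
    # Chi-squared (1 df) upper tail: P(chi2_1 > t) = erfc(sqrt(t / 2)).
    p = np.array([math.erfc(math.sqrt(t / 2.0)) for t in T])
    return U, V, T, p


# Hypothetical toy data: 8 unrelated individuals, 3 variants, one covariate.
y = np.array([0.3, -1.2, 0.8, 1.5, -0.4, 0.1, -0.9, 0.6])
X = np.column_stack([np.ones(8), [34, 51, 47, 29, 62, 40, 55, 38]])
G = np.array([[0, 1, 2], [1, 0, 0], [2, 1, 0], [0, 0, 1],
              [1, 2, 0], [0, 1, 1], [1, 0, 2], [0, 1, 0]], dtype=float)
# With unrelated individuals the covariance of y is proportional to I.
U, V, T, p = single_variant_score_tests(y, X, G, np.eye(len(y)))
```

With an identity covariance matrix and an intercept-only design, `U` reduces to the cross-product of the centred genotypes and the centred trait, which is a convenient sanity check.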


RAREMETALWORKER automatically stores the score statistic for each marker ($U_i$) together with quality information for that marker, including HWE p-value, call rate, and allele counts.

RAREMETALWORKER also stores the covariance matrices ($\mathbf{V}$) of the score statistics of markers within a window, the size of which can be specified on the command line.
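To make the windowing idea concrete, here is a minimal sketch of keeping only the covariances between markers that lie within a given distance of each other. The function name `covariances_within_window`, the list-of-pairs layout, and the toy numbers are hypothetical; RAREMETALWORKER's actual output format and command-line option are not reproduced here.

```python
import numpy as np


def covariances_within_window(positions, V, window):
    """For each marker, keep covariances with itself and later markers
    located at most `window` base pairs downstream.

    positions: marker positions sorted in ascending order;
    V: (m, m) covariance matrix of the score statistics.
    Returns one list of (marker_index, covariance) pairs per marker.
    """
    stored = []
    for i, pos_i in enumerate(positions):
        row = [(j, float(V[i, j]))
               for j in range(i, len(positions))
               if positions[j] - pos_i <= window]
        stored.append(row)
    return stored


# Hypothetical example: three markers, the third far from the first two.
positions = [100, 150, 900]
V = np.array([[4.0, 1.0, 0.2],
              [1.0, 3.0, 0.1],
              [0.2, 0.1, 5.0]])
stored = covariances_within_window(positions, V, window=100)
```

Storing only the upper triangle within the window keeps the output small while retaining the covariances between nearby markers.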


We use a variance component model to handle familial relationships. We estimate the variance components under the null model:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta}_c + \mathbf{g} + \boldsymbol{\varepsilon}$$

We assume that the genetic effects are normally distributed with mean $\mathbf{0}$ and covariance $2\sigma_g^2\mathbf{K}$, where the matrix $\mathbf{K}$ summarizes kinship coefficients between sampled individuals and $\sigma_g^2$ is a positive scalar describing the genetic contribution to the overall variance. We assume that the non-shared environmental effects are normally distributed with mean $\mathbf{0}$ and covariance $\sigma_e^2\mathbf{I}$, where $\mathbf{I}$ is the identity matrix.

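To estimate $\mathbf{K}$, we either use the known pedigree structure or else use the empirical estimator

$$\mathbf{K} = \frac{1}{l}\sum_{i=1}^{l}\frac{(\mathbf{G}_i - 2f_i\mathbf{1})(\mathbf{G}_i - 2f_i\mathbf{1})^T}{4f_i(1-f_i)}$$

where $l$ is the count of variants, and $\mathbf{G}_i$ and $f_i$ are the genotype vector and estimated allele frequency for the $i^{th}$ variant, respectively. Each element in $\mathbf{G}_i$ encodes the minor allele count for one individual. The model parameters $\hat{\boldsymbol{\beta}}_c$, $\hat{\sigma}_g^2$ and $\hat{\sigma}_e^2$ are estimated by maximum likelihood using the efficient algorithm described in Lippert et al. For convenience, let the estimated covariance matrix of $\mathbf{y}$ be $\hat{\boldsymbol{\Omega}} = 2\hat{\sigma}_g^2\mathbf{K} + \hat{\sigma}_e^2\mathbf{I}$.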
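The empirical estimator and the resulting covariance matrix can be sketched as follows. This is an illustration only: the function names `empirical_kinship` and `trait_covariance` and the toy genotypes are hypothetical, allele frequencies default to half the mean genotype, and monomorphic variants are assumed to have been removed beforehand because they would cause a division by zero. The variance components are passed in as given values; their maximum likelihood estimation is not shown.

```python
import numpy as np


def empirical_kinship(G, freqs=None):
    """Empirical kinship matrix from an (n, l) matrix of minor allele counts.

    Implements K = (1/l) * sum_i (G_i - 2 f_i)(G_i - 2 f_i)^T / (4 f_i (1 - f_i)).
    Monomorphic variants (f_i equal to 0 or 1) must be excluded beforehand.
    """
    G = np.asarray(G, dtype=float)
    if freqs is None:
        # Assumption: estimate each allele frequency as half the mean genotype.
        freqs = G.mean(axis=0) / 2.0
    # Scale each centred column so that Z @ Z.T sums the per-variant terms.
    Z = (G - 2.0 * freqs) / np.sqrt(4.0 * freqs * (1.0 - freqs))
    return Z @ Z.T / G.shape[1]


def trait_covariance(K, sigma_g2, sigma_e2):
    """Covariance matrix of y: 2 * sigma_g^2 * K + sigma_e^2 * I."""
    return 2.0 * sigma_g2 * K + sigma_e2 * np.eye(K.shape[0])


# Hypothetical toy genotypes: 8 individuals, 3 polymorphic variants.
G = np.array([[0, 1, 2], [1, 0, 0], [2, 1, 0], [0, 0, 1],
              [1, 2, 0], [0, 1, 1], [1, 0, 2], [0, 1, 0]], dtype=float)
K = empirical_kinship(G)
omega = trait_covariance(K, sigma_g2=0.5, sigma_e2=1.0)
```

In practice many more variants than individuals are needed for a stable estimate; three variants are used here only to keep the example readable.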


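To analyze markers on chromosome X, we fit an extra variance component, $\sigma_{g_X}^2$, to model the variance explained by chromosome X. A kinship matrix for chromosome X, $\mathbf{K}_X$, can be estimated either from a pedigree or from the genotypes of markers on chromosome X. The estimated covariance matrix can then be written as $\hat{\boldsymbol{\Omega}} = 2\hat{\sigma}_g^2\mathbf{K} + 2\hat{\sigma}_{g_X}^2\mathbf{K}_X + \hat{\sigma}_e^2\mathbf{I}$.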