RAREMETALWORKER METHOD

From Genome Analysis Wiki

Revision as of 11:04, 11 March 2014

Brief Introduction

RAREMETALWORKER (RMW) generates single-variant association results from score tests. It also outputs summary statistics and covariance matrices of the score statistics.

The following sections describe the methods behind RMW: the statistical model, the handling of sample relatedness, and the definitions of the statistics in the output.

Modeling Relatedness

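We use a variance component model to handle familial relationships. In a sample of n individuals, we model the observed phenotype vector <math>\mathbf{y}</math> as the sum of three parts: covariate effects (specified by a design matrix <math>\mathbf{X}</math> and a vector of covariate effects <math>\boldsymbol{\beta}</math>), additive genetic effects (modeled in the vector <math>\mathbf{g}</math>), and non-shared environmental effects (modeled in the vector <math>\boldsymbol{\varepsilon}</math>). Thus the null model is:

<math>\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon}.</math>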

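We assume that genetic effects are normally distributed, with mean <math>\mathbf{0}</math> and covariance <math>2\mathbf{K}\sigma_g^2</math>. Here the matrix <math>\mathbf{K}</math> summarizes kinship coefficients between sampled individuals, and <math>\sigma_g^2</math> is a positive scalar describing the genetic contribution to the overall variance. We assume that non-shared environmental effects are normally distributed with mean <math>\mathbf{0}</math> and covariance <math>\mathbf{I}\sigma_e^2</math>, where <math>\mathbf{I}</math> is the identity matrix.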
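Simulating from the model is one way to see what these assumptions imply. The sketch below uses Python with NumPy, and the function names are hypothetical rather than part of RMW. It draws phenotypes for a made-up pedigree of two full siblings (kinship 0.25) and one unrelated individual. It assumes that <math>\mathbf{K}</math> holds kinship coefficients, which are 0.5 on the diagonal for non-inbred individuals. Under that assumption the genetic covariance is <math>2\sigma_g^2\mathbf{K}</math>, matching the covariance matrix of <math>\mathbf{y}</math> used in this section. The empirical covariance of the simulated phenotypes should approach <math>2\sigma_g^2\mathbf{K} + \sigma_e^2\mathbf{I}</math>.

```python
import numpy as np


def null_covariance(K, sigma_g2, sigma_e2):
    """Covariance of y under the null model: 2*sigma_g^2*K + sigma_e^2*I."""
    return 2.0 * sigma_g2 * K + sigma_e2 * np.eye(K.shape[0])


def simulate_null_phenotypes(X, beta, K, sigma_g2, sigma_e2, n_rep, rng):
    """Draw n_rep phenotype vectors y = X beta + g + eps (hypothetical helper)."""
    n = X.shape[0]
    # Additive genetic effects g ~ N(0, 2 sigma_g^2 K): relatives share them.
    g = rng.multivariate_normal(np.zeros(n), 2.0 * sigma_g2 * K, size=n_rep)
    # Non-shared environmental effects eps ~ N(0, sigma_e^2 I): independent.
    eps = rng.normal(0.0, np.sqrt(sigma_e2), size=(n_rep, n))
    return X @ beta + g + eps  # shape (n_rep, n); X @ beta broadcasts over rows


# Hypothetical pedigree: two full siblings and one unrelated individual.
K = np.array([[0.5, 0.25, 0.0],
              [0.25, 0.5, 0.0],
              [0.0, 0.0, 0.5]])
X = np.column_stack([np.ones(3), np.array([0.0, 1.0, 0.0])])  # intercept + covariate
beta = np.array([1.0, 0.5])
rng = np.random.default_rng(0)
Y = simulate_null_phenotypes(X, beta, K, sigma_g2=1.0, sigma_e2=0.5,
                             n_rep=200_000, rng=rng)

# The sample covariance across replicates should be close to the model covariance.
print(np.round(np.cov(Y, rowvar=False), 2))
print(null_covariance(K, 1.0, 0.5))
```

The siblings' phenotypes are correlated only through <math>\mathbf{g}</math>. The unrelated individual's phenotype is uncorrelated with theirs, because that person's kinship coefficients are zero.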

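To estimate <math>\mathbf{K}</math>, we either use the known pedigree structure to define <math>\mathbf{K}</math> or use the empirical estimator

<math>\hat{\mathbf{K}} = \frac{1}{l}\sum_{i=1}^{l}\frac{(\mathbf{G}_i - 2f_i\mathbf{1})(\mathbf{G}_i - 2f_i\mathbf{1})^T}{4f_i(1-f_i)},</math>

where <math>l</math> is the number of variants, and <math>\mathbf{G}_i</math> and <math>f_i</math> are the genotype vector and estimated allele frequency for the <math>i</math>-th variant, respectively. Each element of <math>\mathbf{G}_i</math> encodes the minor allele count for one individual. The model parameters <math>\hat{\boldsymbol{\beta}}</math>, <math>\hat{\sigma}_g^2</math> and <math>\hat{\sigma}_e^2</math> are estimated by maximum likelihood, using the efficient algorithm described in Lippert et al.<sup>34</sup> For convenience, let the estimated covariance matrix of <math>\mathbf{y}</math> be <math>\hat{\boldsymbol{\Omega}} = 2\hat{\sigma}_g^2\hat{\mathbf{K}} + \hat{\sigma}_e^2\mathbf{I}</math>.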
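Below is a minimal NumPy sketch of the two estimation steps. It is not RMW's implementation, and all function names and data are hypothetical.

- `empirical_kinship` implements the estimator for <math>\hat{\mathbf{K}}</math> term by term. It takes <math>f_i</math> to be the sample allele frequency and skips monomorphic variants, for which the denominator is zero.
- `fit_null_model` is an assumed way to organize the maximum-likelihood step, in the spirit of Lippert et al. It eigendecomposes <math>2\hat{\mathbf{K}}</math> once and writes the covariance as <math>\sigma_g^2(2\hat{\mathbf{K}} + \delta\mathbf{I})</math>, with <math>\delta = \sigma_e^2/\sigma_g^2</math>. It then profiles out <math>\boldsymbol{\beta}</math> and <math>\sigma_g^2</math> in closed form for each <math>\delta</math>.
- A plain grid over <math>\log\delta</math> stands in for a proper one-dimensional optimizer.

```python
import numpy as np


def empirical_kinship(G):
    """(1/l) * sum_i (G_i - 2 f_i 1)(G_i - 2 f_i 1)^T / (4 f_i (1 - f_i)).

    G: (n individuals x l variants) array of minor-allele counts (0, 1, 2).
    f_i is estimated as the sample allele frequency of variant i (assumption).
    """
    f = G.mean(axis=0) / 2.0
    keep = (f > 0.0) & (f < 1.0)  # monomorphic variants would give 0/0
    # Scale each centered column by 1/sqrt(4 f (1 - f)) so that Z @ Z.T equals
    # the sum of scaled outer products in the formula.
    Z = (G[:, keep] - 2.0 * f[keep]) / np.sqrt(4.0 * f[keep] * (1.0 - f[keep]))
    return Z @ Z.T / keep.sum()  # l = number of polymorphic variants used


def fit_null_model(y, X, K, log_delta_grid=np.linspace(-5.0, 5.0, 201)):
    """ML fit of y ~ N(X beta, 2 sg2 K + se2 I), profiling over delta = se2 / sg2."""
    n = y.shape[0]
    s, U = np.linalg.eigh(2.0 * K)  # 2K = U diag(s) U^T, computed once
    yr, Xr = U.T @ y, U.T @ X        # rotated data: covariance becomes diagonal
    best = None
    for log_delta in log_delta_grid:
        delta = np.exp(log_delta)
        d = s + delta                # covariance = sg2 * U diag(d) U^T
        w = 1.0 / d
        # Generalized least squares estimate of beta for this delta.
        beta = np.linalg.solve(Xr.T @ (w[:, None] * Xr), Xr.T @ (w * yr))
        r = yr - Xr @ beta
        sg2 = np.sum(w * r ** 2) / n  # closed-form ML estimate of sigma_g^2
        loglik = -0.5 * (n * np.log(2.0 * np.pi * sg2) + np.sum(np.log(d)) + n)
        if best is None or loglik > best[0]:
            best = (loglik, beta, sg2, delta * sg2)
    loglik, beta, sg2, se2 = best
    return beta, sg2, se2, loglik


# Hypothetical data: 200 unrelated individuals genotyped at 1,000 variants.
rng = np.random.default_rng(1)
n, l = 200, 1000
G = rng.binomial(2, rng.uniform(0.05, 0.5, size=l), size=(n, l)).astype(float)
K_hat = empirical_kinship(G)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 0.2]) + rng.multivariate_normal(
    np.zeros(n), 2.0 * K_hat + np.eye(n))  # true sigma_g^2 = sigma_e^2 = 1
beta_hat, sg2_hat, se2_hat, loglik = fit_null_model(y, X, K_hat)
Omega_hat = 2.0 * sg2_hat * K_hat + se2_hat * np.eye(n)  # estimated Cov(y)
```

Rotating by the eigenvectors of <math>2\hat{\mathbf{K}}</math> makes the covariance diagonal. As a result, after the one-time eigendecomposition, each candidate <math>\delta</math> costs work that is only linear in the number of individuals.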

Single Variant Score Tests

Summary Statistics

Covariance Matrices

Chromosome X