Genotype Likelihood based Hardy-Weinberg Test

Introduction

This page describes a Hardy-Weinberg equilibrium test based on genotype likelihoods computed from next-generation sequencing (NGS) data.

Formulation

Hardy-Weinberg equilibrium (HWE) is expected in a panmictic population. The following formulation is a likelihood ratio test statistic that incorporates genotype uncertainty via genotype likelihoods. $k$ indexes the individuals from $1$ to $N$. $P(R_{k}|\textbf{p})$ is the probability of observing the reads for individual $k$ assuming that the locus is in HWE. $P(R_{k}|\textbf{g})$ is the probability of observing the reads for individual $k$ assuming that the locus is not in HWE. $G_{i,j}$ denotes the genotype composed of alleles $i$ and $j$. $P(R_{k}|G_{i,j})$ is the genotype likelihood. $P(G_{i,j}|\textbf{p})$ and $P(G_{i,j}|\textbf{g})$ are the genotype frequencies estimated with and without the HWE assumption, respectively.

\begin{align}
L(R|g) & = \frac{\prod_{k}{P(R_{k}|\textbf{p})}} {\prod_{k}{P(R_{k}|\textbf{g})}} \\
& = \frac{\prod_{k}{\sum_{i,j}{P(R_{k}, G_{i,j}|\textbf{p})}}} {\prod_{k}{\sum_{i,j}{P(R_{k}, G_{i,j}|\textbf{g})}}} \\
& = \frac{\prod_{k}{\sum_{i,j}{P(R_{k}|G_{i,j})P(G_{i,j}|\textbf{p})}}} {\prod_{k}{\sum_{i,j}{P(R_{k}|G_{i,j})P(G_{i,j}|\textbf{g})}}}
\end{align}
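
For reference, the two sets of genotype frequencies can be written out explicitly. The notation here is an assumption, since the text above does not spell out the components of $\textbf{p}$ and $\textbf{g}$: take $\textbf{p} = (p_1, \ldots, p_n)$ to be the allele frequencies and $g_{i,j}$ to be the frequency of the unordered genotype $G_{i,j}$ with $i \le j$. The HWE frequencies are then the standard Hardy-Weinberg proportions, while the unconstrained frequencies only need to sum to one.

\begin{align}
P(G_{i,i}|\textbf{p}) & = p_i^2 \\
P(G_{i,j}|\textbf{p}) & = 2 p_i p_j, \quad i < j \\
P(G_{i,j}|\textbf{g}) & = g_{i,j}, \quad \sum_{i \le j}{g_{i,j}} = 1
\end{align}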

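Under HWE, the likelihood ratio test statistic below is asymptotically $\chi^2$ distributed with $v$ degrees of freedom, where $n$ is the number of alleles. $v$ is the difference between the $\frac{n(n+1)}{2} - 1$ free genotype frequencies of the unconstrained model and the $n - 1$ free allele frequencies of the HWE model.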

\begin{align} -2 \log L(R|g) \sim \chi^2_v, \quad v = \frac{n(n-1)}{2} \end{align}
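
The test can be sketched in Python for the biallelic case ($n = 2$, so $v = 1$). This is a minimal illustration and not the vt implementation. The function names are hypothetical, each individual's genotype likelihoods are assumed to be a strictly positive triple ordered (ref/ref, ref/alt, alt/alt), and both sets of frequencies are estimated with a simple EM loop, which is an assumption because the page does not say how they are estimated.

```python
import math

def hwe_genotype_freqs(q):
    """Genotype frequencies (RR, RA, AA) under HWE for alt-allele frequency q."""
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]

def log_likelihood(gls, freqs):
    """Sum over individuals k of log sum_G P(R_k | G) P(G)."""
    return sum(math.log(sum(l * f for l, f in zip(gl, freqs))) for gl in gls)

def posteriors(gl, freqs):
    """Posterior genotype probabilities for one individual."""
    weights = [l * f for l, f in zip(gl, freqs)]
    total = sum(weights)
    return [w / total for w in weights]

def fit_hwe(gls, iters=500, tol=1e-10):
    """EM estimate of the alt-allele frequency with HWE imposed (assumed method)."""
    q = 0.5
    for _ in range(iters):
        freqs = hwe_genotype_freqs(q)
        # Expected number of alt alleles: 1 per het, 2 per hom-alt.
        alt = 0.0
        for gl in gls:
            post = posteriors(gl, freqs)
            alt += post[1] + 2.0 * post[2]
        new_q = alt / (2.0 * len(gls))
        done = abs(new_q - q) < tol
        q = new_q
        if done:
            break
    return q

def fit_unconstrained(gls, iters=500, tol=1e-10):
    """EM estimate of the three genotype frequencies without HWE (assumed method)."""
    g = [1.0 / 3.0] * 3
    for _ in range(iters):
        totals = [0.0, 0.0, 0.0]
        for gl in gls:
            for j, x in enumerate(posteriors(gl, g)):
                totals[j] += x
        new_g = [t / len(gls) for t in totals]
        done = max(abs(a - b) for a, b in zip(new_g, g)) < tol
        g = new_g
        if done:
            break
    return g

def hwe_lrt(gls):
    """Return (-2 log L, p-value) for a biallelic locus, using chi-square with 1 df."""
    q = fit_hwe(gls)
    g = fit_unconstrained(gls)
    log_ratio = log_likelihood(gls, hwe_genotype_freqs(q)) - log_likelihood(gls, g)
    # Clamp tiny negative values that can arise from finite EM convergence.
    stat = max(0.0, -2.0 * log_ratio)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Example: 100 individuals whose reads all strongly support the heterozygote.
example = [(0.01, 0.98, 0.01)] * 100
stat, p_value = hwe_lrt(example)
print(stat, p_value)
```

The closed-form p-value only covers $v = 1$. For multiallelic loci the same structure applies, with $n(n+1)/2$ genotype classes and a general chi-square survival function with $v = n(n-1)/2$ degrees of freedom.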

Hyun.

Implementation

This is implemented in vt.