# Genotype Likelihood based Hardy-Weinberg Test


### Introduction

This page details a Hardy-Weinberg Equilibrium test based on genotype likelihoods in NGS data.

### Formulation

Hardy-Weinberg equilibrium (HWE) is expected in a panmictic population. The following formulation is a likelihood ratio test statistic that incorporates genotype uncertainty via genotype likelihoods. $P(R_{k}|{\textbf {p}})$ is the probability of observing the reads for individual $k$ assuming that the locus is in HWE. $P(R_{k}|{\textbf {g}})$ is the probability of observing the reads for individual $k$ assuming that the locus is not in HWE. $G_{i,j}$ denotes the genotype composed of alleles $i$ and $j$. $k$ indexes the individuals from $1$ to $N$. $P(R_{k}|G_{i,j})$ is the genotype likelihood. $P(G_{i,j}|{\textbf {p}})$ and $P(G_{i,j}|{\textbf {g}})$ are the genotype frequencies estimated with and without the HWE assumption, respectively.

$$\begin{aligned}L(R|g)&=\frac{\prod_{k}P(R_{k}|{\textbf {p}})}{\prod_{k}P(R_{k}|{\textbf {g}})}\\&=\frac{\prod_{k}\sum_{i,j}P(R_{k},G_{i,j}|{\textbf {p}})}{\prod_{k}\sum_{i,j}P(R_{k},G_{i,j}|{\textbf {g}})}\\&=\frac{\prod_{k}\sum_{i,j}P(R_{k}|G_{i,j})P(G_{i,j}|{\textbf {p}})}{\prod_{k}\sum_{i,j}P(R_{k}|G_{i,j})P(G_{i,j}|{\textbf {g}})}\end{aligned}$$

The likelihood ratio test statistic is as follows, with $v$ degrees of freedom, where $n$ is the number of alleles.

$$-2\log L(R|g)\sim \chi_{v}^{2},\quad v=\frac{n(n-1)}{2}$$

Hyun.
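The statistic above can be sketched in Python for the simplest case of a biallelic locus ($n=2$, so $v=1$). This is only an illustrative sketch, not the vt implementation. It assumes that ${\textbf {p}}$ is summarized by a single allele frequency with HWE genotype frequencies $p^{2}$, $2p(1-p)$, $(1-p)^{2}$, that ${\textbf {g}}$ holds three unconstrained genotype frequencies, and that both are estimated by a plain EM algorithm with a fixed number of iterations. The function names and the input layout (one row of three genotype likelihoods per individual, on a linear scale) are hypothetical.

```python
import math


def hwe_genotype_freqs(p):
    """Genotype frequencies (G_11, G_12, G_22) under HWE for allele-1 frequency p."""
    q = 1.0 - p
    return [p * p, 2.0 * p * q, q * q]


def log_likelihood(gls, freqs):
    """sum_k log( sum_ij P(R_k | G_ij) * P(G_ij | freqs) )."""
    return sum(math.log(sum(gl * f for gl, f in zip(row, freqs))) for row in gls)


def em_genotype_freqs(gls, iters=200):
    """EM estimate of the unconstrained genotype frequencies g (assumed estimator)."""
    g = [1.0 / 3.0] * 3
    for _ in range(iters):
        counts = [0.0, 0.0, 0.0]
        for row in gls:
            post = [gl * f for gl, f in zip(row, g)]
            total = sum(post)
            for j in range(3):
                counts[j] += post[j] / total  # posterior genotype probability
        g = [c / len(gls) for c in counts]
    return g


def em_allele_freq(gls, iters=200):
    """EM estimate of the allele-1 frequency p under HWE (assumed estimator)."""
    p = 0.5
    for _ in range(iters):
        expected = 0.0
        for row in gls:
            post = [gl * f for gl, f in zip(row, hwe_genotype_freqs(p))]
            total = sum(post)
            # expected number of copies of allele 1 carried by this individual
            expected += (2.0 * post[0] + post[1]) / total
        p = expected / (2.0 * len(gls))
    return p


def hwe_lrt(gls):
    """Return (-2 log L, p-value) for a biallelic locus, using 1 degree of freedom."""
    p = em_allele_freq(gls)
    g = em_genotype_freqs(gls)
    log_ratio = log_likelihood(gls, hwe_genotype_freqs(p)) - log_likelihood(gls, g)
    stat = max(0.0, -2.0 * log_ratio)  # clamp tiny negative values from rounding
    # survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Calling `hwe_lrt` on a list such as `[[0.001, 0.998, 0.001]] * 20` (every individual confidently heterozygous) should give a large statistic and a small p-value, whereas genotype likelihoods consistent with HWE proportions should give a statistic near zero. A production implementation would more likely work with log-scaled likelihoods and a convergence criterion rather than a fixed iteration count.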

### Implementation

This is implemented in vt.