From Genome Analysis Wiki
=== Introduction ===

Allele frequencies are an important statistic in the study of genetic variants. This page describes expectation-maximization (EM) algorithms for estimating genotype and allele frequencies from genotype likelihoods in next-generation sequencing (NGS) data.

=== Estimation of Genotype Frequencies without assuming HWE ===

This section describes an EM algorithm that estimates genotype frequencies without assuming Hardy-Weinberg equilibrium (HWE). The posterior probability of a genotype given the reads <math>R_k</math> of individual <math>k</math> at the <math>l</math>th iteration is:

<math>
\begin{align}
P(G_{i,j}|R_{k})^{(l)}=\frac{P(R_{k}|G_{i,j})P(G_{i,j})^{(l-1)}}{\sum_{(i',j')}{P(R_{k}|G_{i',j'})P(G_{i',j'})^{(l-1)}}}
\end{align}
</math>

where <math>G_{i,j}</math> denotes the genotype composed of alleles <math>i</math> and <math>j</math>, and <math>k</math> indexes the individuals from <math>1</math> to <math>N</math>.
The initial genotype frequencies are uniform over the <math>n(n+1)/2</math> possible unordered genotypes, where <math>n</math> is the number of alleles:

<math>
\begin{align}
P(G_{i,j})^{(0)} = f_{i,j}^{(0)} = \frac{2}{n(n+1)}
\end{align}
</math>
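
The uniform starting point above can be sketched in a few lines of Python. This is only an illustration: the function name <code>initial_genotype_frequencies</code> and the representation of a genotype as a tuple <code>(i, j)</code> with <code>i <= j</code> are assumptions made for this example, not part of the original method description.

```python
from itertools import combinations_with_replacement


def initial_genotype_frequencies(n_alleles):
    """Return a dict mapping each unordered genotype (i, j), i <= j,
    to the uniform starting frequency 2 / (n(n+1))."""
    # Hypothetical helper: alleles are labelled 0 .. n_alleles - 1.
    genotypes = list(combinations_with_replacement(range(n_alleles), 2))
    uniform = 2.0 / (n_alleles * (n_alleles + 1))
    return {genotype: uniform for genotype in genotypes}


# Two alleles give three genotypes, each starting at frequency 1/3.
print(initial_genotype_frequencies(2))
```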

The E step takes the expected count of genotype <math>G_{i,j}</math> in individual <math>k</math> to be its posterior probability:

<math>
\begin{align}
E[G_{i,j}|R_{k}]^{(l)}=P(G_{i,j}|R_{k})^{(l)}
\end{align}
</math>

The M step re-estimates each genotype frequency as the average of the expected genotype counts over the <math>N</math> individuals:

<math>
\begin{align}
P(G_{i,j})^{(l)} = f_{i,j}^{(l)} = \frac{1}{N}\sum_{k}{E[G_{i,j}|R_{k}]}^{(l)}
\end{align}
</math>

The E and M steps are repeated until an appropriate convergence criterion is met.
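
The iteration described in this section might be sketched as follows. It is a minimal illustration under stated assumptions, not a reference implementation: the function name, the list-of-lists input layout (<code>likelihoods[k][g]</code> holding <math>P(R_k|G_g)</math> for genotype index <code>g</code>), and the stopping rule (largest absolute change in any frequency below <code>tol</code>) are all choices made for this example, since the page does not specify a convergence criterion.

```python
def em_genotype_frequencies(likelihoods, n_genotypes, tol=1e-8, max_iter=1000):
    """Estimate genotype frequencies without assuming HWE.

    likelihoods[k][g] is P(R_k | G_g) for individual k and genotype index g.
    Assumes every individual has at least one non-zero likelihood.
    """
    n_individuals = len(likelihoods)
    # Uniform start: with n alleles, n_genotypes = n(n+1)/2, so this is 2/(n(n+1)).
    freqs = [1.0 / n_genotypes] * n_genotypes
    for _ in range(max_iter):
        expected = [0.0] * n_genotypes
        for lik in likelihoods:
            # E step: posterior P(G | R_k), using the current frequencies as prior.
            joint = [l * f for l, f in zip(lik, freqs)]
            total = sum(joint)
            for g in range(n_genotypes):
                expected[g] += joint[g] / total
        # M step: average the expected genotype counts over individuals.
        new_freqs = [e / n_individuals for e in expected]
        # Assumed stopping rule: largest absolute change below tol.
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Biallelic example with genotype order (0,0), (0,1), (1,1) and
# made-up likelihoods for four individuals.
example = [
    [0.90, 0.09, 0.01],
    [0.80, 0.15, 0.05],
    [0.10, 0.80, 0.10],
    [0.01, 0.09, 0.90],
]
print(em_genotype_frequencies(example, 3))
```

The returned frequencies always sum to one, because each individual's posterior is normalized before it is averaged.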

=== Estimation of Genotype Frequencies assuming HWE ===

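To estimate allele frequencies under the HWE assumption, the genotype posteriors are computed as in the previous section, and the E step then computes, for each individual, the expected fraction of that individual's two allele copies that are allele <math>i</math> (denoted <math>I</math>). The heterozygote term is summed over every allele <math>j \ne i</math>; with two alleles it is a single term.

<math>
\begin{align}
E[I|R_{k}]^{(l)}=P(G_{i,i}|R_{k})^{(l)} + 0.5\sum_{j \ne i}P(G_{i,j}|R_{k})^{(l)}
\end{align}
</math>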
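
As a small worked illustration of this E step, the sketch below turns one individual's genotype posteriors into expected allele fractions. The function name and the dictionary layout (unordered genotypes <code>(i, j)</code> with <code>i <= j</code> as keys) are assumptions for this example. Each heterozygous genotype contributes half of its posterior to each of its two alleles, which matches the formula above when there are only two alleles.

```python
def expected_allele_fractions(posterior, n_alleles):
    """Return E[I | R_k] for every allele of one individual.

    posterior maps unordered genotypes (i, j), i <= j, to P(G_ij | R_k).
    """
    expected = [0.0] * n_alleles
    for (i, j), prob in posterior.items():
        if i == j:
            # Homozygote: both copies are allele i.
            expected[i] += prob
        else:
            # Heterozygote: one copy each of alleles i and j.
            expected[i] += 0.5 * prob
            expected[j] += 0.5 * prob
    return expected


posterior = {(0, 0): 0.5, (0, 1): 0.5, (1, 1): 0.0}
print(expected_allele_fractions(posterior, 2))  # [0.75, 0.25]
```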

In the M step, the allele frequencies are estimated by averaging these expectations over the <math>N</math> individuals, and the genotype frequencies used in the next iteration are then derived from the allele frequencies assuming HWE.

<math>
\begin{align}
P(I)^{(l)} = \frac{1}{N}\sum_{k}{E[I|R_{k}]}^{(l)}
\end{align}
</math>

<math>
P(G_{i,j})^{(l)} = \begin{cases}
(P(I)^{(l)})^2, & \text{if }i=j \\
2P(I)^{(l)}P(J)^{(l)}, & \text{if }i \ne j
\end{cases}
</math>

The E and M steps are repeated until an appropriate convergence criterion is met.
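
The HWE variant might be sketched in the same way. Again, this is an illustration under assumptions rather than the implementation behind this page: the function name, the input layout (one dictionary per individual mapping genotype <code>(i, j)</code>, <code>i <= j</code>, to <math>P(R_k|G_{i,j})</math>), the uniform starting allele frequencies, and the stopping rule are all hypothetical choices, since the page specifies neither the initialization for this variant nor a convergence criterion.

```python
from itertools import combinations_with_replacement


def em_allele_frequencies_hwe(likelihoods, n_alleles, tol=1e-8, max_iter=1000):
    """Estimate allele frequencies assuming HWE.

    likelihoods[k][(i, j)] is P(R_k | G_ij) for i <= j.
    Assumes every individual has at least one non-zero likelihood.
    """
    genotypes = list(combinations_with_replacement(range(n_alleles), 2))
    n_individuals = len(likelihoods)
    allele_freqs = [1.0 / n_alleles] * n_alleles  # assumed uniform start
    for _ in range(max_iter):
        # Genotype priors under HWE from the current allele frequencies.
        prior = {
            (i, j): allele_freqs[i] ** 2 if i == j
            else 2.0 * allele_freqs[i] * allele_freqs[j]
            for i, j in genotypes
        }
        totals = [0.0] * n_alleles
        for lik in likelihoods:
            # E step: genotype posterior, then expected allele fractions.
            joint = {g: lik[g] * prior[g] for g in genotypes}
            norm = sum(joint.values())
            for (i, j), value in joint.items():
                post = value / norm
                # Half the posterior per allele copy; a homozygote (i == j)
                # therefore adds its full posterior to allele i.
                totals[i] += 0.5 * post
                totals[j] += 0.5 * post
        # M step: average over individuals.
        new_freqs = [t / n_individuals for t in totals]
        # Assumed stopping rule: largest absolute change below tol.
        converged = max(abs(a - b) for a, b in zip(new_freqs, allele_freqs)) < tol
        allele_freqs = new_freqs
        if converged:
            break
    return allele_freqs


# Made-up biallelic likelihoods for three individuals.
example = [
    {(0, 0): 0.90, (0, 1): 0.09, (1, 1): 0.01},
    {(0, 0): 0.10, (0, 1): 0.80, (1, 1): 0.10},
    {(0, 0): 0.01, (0, 1): 0.09, (1, 1): 0.90},
]
print(em_allele_frequencies_hwe(example, 2))
```

The genotype frequencies under HWE can be recovered from the returned allele frequencies with the case formula above.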

=== Used in ===

[[HWEP|Hardy-Weinberg Likelihood Test statistic]] and [[FIC|Inbreeding Coefficient]]

=== Derivation ===

Adrian with much help from Hyun.

=== Maintained by ===

This page is maintained by [mailto:atks@umich.edu Adrian].