|
|
(6 intermediate revisions by the same user not shown) |
Line 1: |
Line 1: |
− | === Estimation of Genotype Frequencies without assuming HWE ===
| + | #REDIRECT [[Genotype_Likelihood_based_Allele_Frequency]] |
− | | |
− | We propose an EM algorithm to estimate genotype frequencies without assuming Hardy-Weinberg equilibrium (HWE). At the $l$th iteration, the posterior probability of genotype $G_{i,j}$ given the reads $R_k$ of individual $k$ is:
| |
− | | |
− | <math>
| |
− | \begin{align}
| |
− | P(G_{i,j}|R_{k})^{(l)}=\frac{P(R_{k}|G_{i,j})P(G_{i,j})^{(l-1)}}{\sum_{(i,j)}{P(R_{k}|G_{i,j})P(G_{i,j})^{(l-1)}}}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | where $G_{i,j}$ denotes the genotype composed of alleles $i$ and $j$, and $k$ indexes the individuals from $1$ to $N$.
| |
− | The initial genotype probabilities are uniform over the $n(n+1)/2$ possible unordered genotypes, where $n$ is the number of alleles:
| |
− | | |
− | <math>
| |
− | \begin{align}
| |
− | P(G_{i,j})^{(0)} = f_{i,j}^{(0)} = \frac{2}{n(n+1)}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | The E step sets the expected count of genotype $G_{i,j}$ for individual $k$ equal to its posterior probability:
| |
− |
| |
− | <math>
| |
− | \begin{align}
| |
− | E[G_{i,j}|R_{k}]^{(l)}=P(G_{i,j}|R_{k})^{(l)}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | The M step estimates each genotype frequency as the average of the expected genotype counts over all $N$ individuals:
| |
− |
| |
− | <math>
| |
− | \begin{align}
| |
− | P(G_{i,j})^{(l)} = f_{i,j}^{(l)} = \frac{1}{N}\sum_{k}{E[G_{i,j}|R_{k}]}^{(l)}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | The E and M steps are repeated until a suitable convergence criterion is met.
| |
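The iteration above can be sketched in a few lines of Python. This is only an illustrative sketch, not the page author's implementation: the function name <code>em_genotype_freqs</code>, the parameters <code>max_iter</code> and <code>tol</code>, and the convergence test (largest change in any frequency below <code>tol</code>) are assumptions, as is the input layout, where <code>likelihoods[k][g]</code> holds $P(R_k|G_g)$ for individual $k$ and genotype $g$.

```python
def em_genotype_freqs(likelihoods, max_iter=100, tol=1e-8):
    """Estimate genotype frequencies without assuming HWE.

    likelihoods[k][g] = P(R_k | G_g); assumed layout, one row per individual.
    Each row is assumed to contain at least one positive likelihood.
    """
    n_ind = len(likelihoods)
    n_geno = len(likelihoods[0])
    # Uniform start: 1 / (number of genotypes), i.e. 2 / (n(n+1)) for n alleles.
    freqs = [1.0 / n_geno] * n_geno
    for _ in range(max_iter):
        expected = [0.0] * n_geno
        for lik in likelihoods:
            # E step: posterior P(G_g | R_k) under the current frequencies.
            joint = [l * f for l, f in zip(lik, freqs)]
            total = sum(joint)
            for g in range(n_geno):
                expected[g] += joint[g] / total
        # M step: average the expected genotype counts over individuals.
        new_freqs = [e / n_ind for e in expected]
        # Assumed convergence criterion: largest absolute change below tol.
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Hypothetical data: 3 genotypes (AA, AB, BB), 4 individuals with
# unambiguous reads, so the estimate equals the observed genotype proportions.
example = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
estimated = em_genotype_freqs(example)
```

With real data the likelihoods are rarely this decisive, and the frequencies then shift gradually over several iterations before the tolerance is reached.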
− | | |
− | === Estimation of Genotype Frequencies assuming HWE ===
| |
− | | |
− | To estimate allele frequencies under the HWE assumption, the E step computes, for each individual $k$, the expected posterior proportion of allele $i$ (written $I$):
| |
− | | |
− | <math>
| |
− | \begin{align}
| |
− | E[I|R_{k}]^{(l)}=P(G_{i,i}|R_{k})^{(l)} + 0.5\sum_{j \ne i}P(G_{i,j}|R_{k})^{(l)}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | In the M step, each allele frequency is estimated by averaging the expected allele proportions from the E step over all individuals, and the genotype frequencies are then derived from the allele frequencies under HWE:
| |
− | | |
− | <math>
| |
− | \begin{align}
| |
− | P(I)^{(l)} = \frac{1}{N}\sum_{k}{E[I|R_{k}]}^{(l)}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | <math>
| |
− | P(G_{i,j})^{(l)} = \begin{cases}
| |
− | (P(I)^{(l)})^2, & \text{if }i=j \\
| |
− | 2P(I)^{(l)}P(J)^{(l)}, & \text{if }i \ne j
| |
− | \end{cases}
| |
− | </math>
| |
− |
| |
− | The E and M steps are repeated until a suitable convergence criterion is met.
| |
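The HWE-constrained iteration can be sketched in the same way. Again this is a hedged illustration rather than the page author's code: the names <code>hwe_genotype_freqs</code> and <code>em_allele_freqs_hwe</code>, the uniform starting allele frequencies, and the tolerance-based stopping rule are assumptions. Each individual's likelihoods are assumed to be a dictionary keyed by the unordered allele pair $(i,j)$ with $i \le j$.

```python
from itertools import combinations_with_replacement


def hwe_genotype_freqs(allele_freqs):
    """Genotype frequencies under HWE: p_i^2 if i == j, else 2 p_i p_j."""
    n = len(allele_freqs)
    return {
        (i, j): allele_freqs[i] ** 2 if i == j
        else 2 * allele_freqs[i] * allele_freqs[j]
        for i, j in combinations_with_replacement(range(n), 2)
    }


def em_allele_freqs_hwe(likelihoods, n_alleles, max_iter=100, tol=1e-8):
    """likelihoods[k][(i, j)] = P(R_k | G_{i,j}) with i <= j (assumed layout)."""
    n_ind = len(likelihoods)
    # Assumed starting point: all alleles equally frequent.
    allele_freqs = [1.0 / n_alleles] * n_alleles
    for _ in range(max_iter):
        geno_freqs = hwe_genotype_freqs(allele_freqs)
        expected = [0.0] * n_alleles
        for lik in likelihoods:
            # Posterior genotype probabilities under the HWE prior.
            joint = {g: lik[g] * geno_freqs[g] for g in geno_freqs}
            total = sum(joint.values())
            for (i, j), p in joint.items():
                post = p / total
                # E step: each of the two alleles contributes half, so a
                # homozygote (i == j) adds the full posterior to allele i.
                expected[i] += 0.5 * post
                expected[j] += 0.5 * post
        # M step: average the expected allele proportions over individuals.
        new_freqs = [e / n_ind for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_freqs, allele_freqs)) < tol
        allele_freqs = new_freqs
        if converged:
            break
    return allele_freqs, hwe_genotype_freqs(allele_freqs)


# Hypothetical biallelic data: genotypes AA, AA, AB, BB with unambiguous reads.
example = [
    {(0, 0): 1.0, (0, 1): 0.0, (1, 1): 0.0},
    {(0, 0): 1.0, (0, 1): 0.0, (1, 1): 0.0},
    {(0, 0): 0.0, (0, 1): 1.0, (1, 1): 0.0},
    {(0, 0): 0.0, (0, 1): 0.0, (1, 1): 1.0},
]
alleles, genotypes = em_allele_freqs_hwe(example, n_alleles=2)
```

Unlike the unconstrained estimate, the genotype frequencies returned here are always a function of the allele frequencies, so they need not match the observed genotype proportions even when the reads are unambiguous.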
− | | |
− | | |
− | === Maintained by ===
| |
− | | |
− | This page is maintained by [mailto:atks@umich.edu Adrian], with much help from Hyun.
| |