|
|
(2 intermediate revisions by the same user not shown) |
Line 1: |
Line 1: |
− | === Estimation of Genotype Frequencies without assuming HWE ===
| + | #REDIRECT [[Genotype_Likelihood_based_Allele_Frequency]] |
− | | |
− | We propose an EM algorithm to estimate genotype frequencies without assuming Hardy-Weinberg equilibrium (HWE). At the <math>l</math>th iteration, the posterior probability of genotype <math>G_{i,j}</math> given the reads <math>R_k</math> of individual <math>k</math> is:
| |
− | | |
− | <math>
| |
− | \begin{align}
| |
− | P(G_{i,j}|R_{k})^{(l)}=\frac{P(R_{k}|G_{i,j})P(G_{i,j})^{(l-1)}}{\sum_{(i',j')}{P(R_{k}|G_{i',j'})P(G_{i',j'})^{(l-1)}}}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | where <math>G_{i,j}</math> denotes the genotype composed of alleles <math>i</math> and <math>j</math>, and <math>k</math> indexes the individuals from <math>1</math> to <math>N</math>.
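The posterior above is Bayes' rule with normalization over all genotypes. A minimal sketch in Python with NumPy follows; the function name `genotype_posteriors` and the array layout (one row per individual, one column per genotype) are assumptions made for illustration, not part of the original derivation.

```python
import numpy as np

def genotype_posteriors(likelihoods, prior):
    """Return P(G | R_k) for every individual k.

    likelihoods: (N, G) array, entry [k, g] is P(R_k | G_g)  (assumed layout)
    prior:       (G,) array, entry [g] is P(G_g) from the previous iteration
    """
    joint = likelihoods * prior                      # numerator of Bayes' rule
    return joint / joint.sum(axis=1, keepdims=True)  # normalize over genotypes
```

The sketch assumes every individual has at least one genotype with nonzero likelihood and nonzero prior; otherwise the normalization divides by zero.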
| |
− | The initial genotype probability is uniform over the <math>n(n+1)/2</math> possible genotypes formed from <math>n</math> alleles:
| |
− | | |
− | <math>
| |
− | \begin{align}
| |
− | P(G_{i,j})^{(0)} = f_{i,j}^{(0)} = \frac{2}{n(n+1)}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | The E step sets the expected count of genotype <math>G_{i,j}</math> for individual <math>k</math> equal to its posterior probability:
| |
− |
| |
− | <math>
| |
− | \begin{align}
| |
− | E[G_{i,j}|R_{k}]^{(l)}=P(G_{i,j}|R_{k})^{(l)}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | The M step estimates the genotype frequency using the individual expected genotype counts:
| |
− |
| |
− | <math>
| |
− | \begin{align}
| |
− | P(G_{i,j})^{(l)} = f_{i,j}^{(l)} = \frac{1}{N}\sum_{k}{E[G_{i,j}|R_{k}]}^{(l)}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | The E and M steps are repeated until the chosen convergence criterion is met.
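The E and M steps without HWE can be sketched as follows. This is an illustrative Python/NumPy version, not the authors' implementation: the function name `em_genotype_freqs`, the tolerance `tol`, the iteration cap `max_iter`, and the maximum-absolute-change stopping rule are all assumptions, since the text does not specify a convergence criterion.

```python
import numpy as np

def em_genotype_freqs(likelihoods, tol=1e-8, max_iter=1000):
    """Estimate genotype frequencies without assuming HWE.

    likelihoods: (N, G) array, entry [k, g] is P(R_k | G_g)  (assumed layout)
    tol, max_iter: hypothetical stopping parameters, not from the source
    """
    n_ind, n_geno = likelihoods.shape
    # Uniform start: with n alleles there are G = n(n+1)/2 genotypes,
    # so 1/G equals 2/(n(n+1)).
    freqs = np.full(n_geno, 1.0 / n_geno)
    for _ in range(max_iter):
        # E step: posterior genotype probabilities per individual
        joint = likelihoods * freqs
        post = joint / joint.sum(axis=1, keepdims=True)
        # M step: average the expected genotype counts over individuals
        new_freqs = post.mean(axis=0)
        if np.max(np.abs(new_freqs - freqs)) < tol:
            return new_freqs
        freqs = new_freqs
    return freqs
```

When every individual's reads identify its genotype with certainty, the estimate reduces to the observed genotype proportions, which is a convenient sanity check.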
| |
− | | |
− | === Estimation of Genotype Frequencies assuming HWE ===
| |
− | | |
− | To estimate allele frequencies under the HWE assumption, the E step computes, for each individual, the expected posterior proportion of allele <math>i</math> (denoted <math>I</math>) among the individual's two alleles.
| |
− | | |
− | <math>
| |
− | \begin{align}
| |
− | E[I|R_{k}]^{(l)}=P(G_{i,i}|R_{k})^{(l)} + 0.5\sum_{j \ne i}P(G_{i,j}|R_{k})^{(l)}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | In the M step, the allele frequencies are estimated by averaging the expected values from the E step over individuals, and the genotype frequencies are then derived from these allele frequencies assuming HWE.
| |
− | | |
− | <math>
| |
− | \begin{align}
| |
− | P(I)^{(l)} = \frac{1}{N}\sum_{k}{E[I|R_{k}]}^{(l)}
| |
− | \end{align}
| |
− | </math>
| |
− | | |
− | <math>
| |
− | P(G_{i,j})^{(l)} = \begin{cases}
| |
− | (P(I)^{(l)})^2, & \text{if }i=j \\
| |
− | 2P(I)^{(l)}P(J)^{(l)}, & \text{if }i \ne j
| |
− | \end{cases}
| |
− | </math>
| |
− |
| |
− | The E and M steps are repeated until the chosen convergence criterion is met.
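The HWE variant can be sketched in the same way. The Python/NumPy code below is a hedged illustration under several assumptions: genotypes are ordered as unordered allele pairs <math>(i,j)</math> with <math>i \le j</math>, the allele frequencies (rather than the genotype frequencies) are initialized uniformly, and the name `em_allele_freqs_hwe` and the stopping parameters `tol` and `max_iter` are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def em_allele_freqs_hwe(likelihoods, n_alleles, tol=1e-8, max_iter=1000):
    """Estimate allele frequencies assuming HWE.

    likelihoods: (N, G) array of P(R_k | G_g), with genotypes ordered as
                 pairs (i, j), i <= j  (assumed ordering)
    tol, max_iter: hypothetical stopping parameters, not from the source
    """
    genotypes = list(combinations_with_replacement(range(n_alleles), 2))
    # weight[g, a]: 1 if genotype g is homozygous for allele a,
    # 0.5 if heterozygous and carrying a, 0 otherwise
    weight = np.zeros((len(genotypes), n_alleles))
    for g, (i, j) in enumerate(genotypes):
        weight[g, i] += 0.5
        weight[g, j] += 0.5
    p = np.full(n_alleles, 1.0 / n_alleles)  # assumed uniform allele start
    for _ in range(max_iter):
        # Genotype frequencies implied by HWE: p_i^2 or 2 p_i p_j
        geno = np.array([p[i] ** 2 if i == j else 2 * p[i] * p[j]
                         for i, j in genotypes])
        # E step: genotype posteriors, then expected allele proportion
        joint = likelihoods * geno
        post = joint / joint.sum(axis=1, keepdims=True)
        expected = post @ weight
        # M step: average over individuals
        new_p = expected.mean(axis=0)
        if np.max(np.abs(new_p - p)) < tol:
            return new_p
        p = new_p
    return p
```

For a biallelic site the genotype columns are ordered (0,0), (0,1), (1,1), so `likelihoods` has three columns and `n_alleles=2`.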
| |
− | | |
− | === Derivation by ===
| |
− | | |
− | Adrian with much help from Hyun.
| |
− | | |
− | === Maintained by ===
| |
− | | |
− | This page is maintained by [mailto:atks@umich.edu Adrian].
| |