Difference between revisions of "AF"

From Genome Analysis Wiki
 
=== Estimation of Genotype Frequencies without assuming HWE ===
+
#REDIRECT [[Genotype_Likelihood_based_Allele_Frequency]]
 
 
We propose an EM algorithm to estimate genotype frequencies without assuming Hardy-Weinberg equilibrium (HWE). At the <math>l</math>th iteration, the posterior probability of genotype <math>G_{i,j}</math> given the reads <math>R_k</math> of individual <math>k</math> is:
 
 
 
<math>
 
  \begin{align}
 
P(G_{i,j}|R_{k})^{(l)}=\frac{P(R_{k}|G_{i,j})P(G_{i,j})^{(l-1)}}{\sum_{(i,j)}{P(R_{k}|G_{i,j})P(G_{i,j})^{(l-1)}}}
 
  \end{align}
 
</math>
 
 
 
where <math>G_{i,j}</math> denotes the genotype composed of alleles <math>i</math> and <math>j</math>, and <math>k</math> indexes the individuals from <math>1</math> to <math>N</math>. The sum in the denominator runs over all possible genotypes.
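As a minimal sketch of the posterior computation for one individual, assuming the genotype likelihoods <math>P(R_k|G_{i,j})</math> are already available as a dictionary keyed by unordered genotype. The function name <code>genotype_posterior</code> and the data layout are illustrative assumptions, not part of the original method description.

```python
def genotype_posterior(likelihood, prior):
    """Posterior P(G_ij | R_k) from P(R_k | G_ij) and the current P(G_ij).

    Both arguments are dicts keyed by an unordered genotype (i, j), i <= j.
    Assumes at least one genotype has non-zero likelihood times prior.
    """
    # Numerator of Bayes' rule for every genotype
    weights = {g: likelihood[g] * prior[g] for g in prior}
    # Denominator: sum over all possible genotypes
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Illustrative (made-up) likelihoods for a biallelic site
likelihood = {(0, 0): 0.9, (0, 1): 0.09, (1, 1): 0.01}
prior = {g: 1.0 / 3.0 for g in likelihood}
posterior = genotype_posterior(likelihood, prior)
```

With a uniform prior the posterior is simply the normalized likelihood, so the prior only starts to matter once the frequencies have been updated.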
 
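The initial genotype probability is uniform over the <math>n(n+1)/2</math> possible unordered genotypes, where <math>n</math> is the number of alleles. For a biallelic site (<math>n=2</math>), each of the three genotypes starts at <math>1/3</math>: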
 
 
 
<math>
 
  \begin{align}
 
P(G_{i,j})^{(0)} = f_{i,j}^{(0)} = \frac{2}{n(n+1)}
 
  \end{align}
 
</math>
 
 
 
The E step takes the expected count of genotype <math>G_{i,j}</math> for individual <math>k</math> to be its posterior probability:
 
 
<math>
 
  \begin{align}
 
E[G_{i,j}|R_{k}]^{(l)}=P(G_{i,j}|R_{k})^{(l)}
 
  \end{align}
 
</math>
 
 
 
The M step estimates the genotype frequency using the individual expected genotype counts:
 
 
<math>
 
  \begin{align}
 
P(G_{i,j})^{(l)} = f_{i,j}^{(l)} = \frac{1}{N}\sum_{k}{E[G_{i,j}|R_{k}]}^{(l)}
 
  \end{align}
 
</math>
 
 
 
The E and M steps are repeated until a suitable convergence criterion is met.
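The iteration in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the reference implementation: the function name <code>em_genotype_freqs</code>, the dictionary layout of the likelihoods, and the stopping rule (largest absolute change in any genotype frequency below <code>tol</code>) are all choices made for this example, since the page does not specify a convergence criterion.

```python
from itertools import combinations_with_replacement


def em_genotype_freqs(likelihoods, n_alleles, tol=1e-8, max_iter=1000):
    """Estimate genotype frequencies without assuming HWE.

    likelihoods: one dict per individual, mapping an unordered genotype
    (i, j) with i <= j to P(R_k | G_ij). Assumes every individual has a
    non-zero likelihood for at least one genotype.
    """
    genotypes = list(combinations_with_replacement(range(n_alleles), 2))
    # Uniform start: 2 / (n (n + 1)) for each of the n (n + 1) / 2 genotypes
    freqs = {g: 1.0 / len(genotypes) for g in genotypes}
    n_ind = len(likelihoods)
    for _ in range(max_iter):
        expected = {g: 0.0 for g in genotypes}
        for lk in likelihoods:
            # E step: posterior genotype probabilities for this individual
            weights = {g: lk[g] * freqs[g] for g in genotypes}
            total = sum(weights.values())
            for g in genotypes:
                expected[g] += weights[g] / total
        # M step: average the expected genotype counts over individuals
        new_freqs = {g: expected[g] / n_ind for g in genotypes}
        # Stopping rule chosen for this sketch only
        delta = max(abs(new_freqs[g] - freqs[g]) for g in genotypes)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Four individuals with fully informative (made-up) likelihoods:
# two homozygous for allele 0, one heterozygote, one homozygous for allele 1
hom0 = {(0, 0): 1.0, (0, 1): 0.0, (1, 1): 0.0}
het = {(0, 0): 0.0, (0, 1): 1.0, (1, 1): 0.0}
hom1 = {(0, 0): 0.0, (0, 1): 0.0, (1, 1): 1.0}
freqs = em_genotype_freqs([hom0, hom0, het, hom1], n_alleles=2)
```

When the likelihoods identify each genotype with certainty, as in this toy input, the estimate reduces to the observed genotype proportions.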
 
 
 
=== Estimation of Genotype Frequencies assuming HWE ===
 
 
 
To estimate allele frequencies under the HWE assumption, the E step computes, for each individual, the expected posterior frequency of allele <math>i</math> (written <math>I</math>), that is, the expected allele count divided by 2:
 
 
 
<math>
 
  \begin{align}
 
E[I|R_{k}]^{(l)}=P(G_{i,i}|R_{k})^{(l)} + 0.5\sum_{j \ne i}P(G_{i,j}|R_{k})^{(l)}
 
  \end{align}
 
</math>
 
 
 
In the M step, the allele frequencies are estimated by averaging the expected allele frequencies from the E step over individuals, and the genotype frequencies are then derived from these allele frequencies assuming HWE.
 
 
 
<math>
 
  \begin{align}
 
P(I)^{(l)} =  \frac{1}{N}\sum_{k}{E[I|R_{k}]}^{(l)}
 
  \end{align}
 
</math>
 
 
 
<math>
 
  P(G_{i,j})^{(l)}  = \begin{cases}
 
                      (P(I)^{(l)})^2, &  \text{if }i=j \\
 
          2P(I)^{(l)}P(J)^{(l)},  & \text{if }i \ne j
 
                                    \end{cases}
 
</math>
 
 
 
The E and M steps are repeated until a suitable convergence criterion is met.
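A comparable Python sketch for the HWE case follows. It is an illustration only: the function name <code>em_allele_freqs_hwe</code>, the dictionary layout of the likelihoods, the stopping rule, and the use of the uniform genotype prior from the previous section as the starting point are assumptions made for this example, since the page does not state them for this variant.

```python
from itertools import combinations_with_replacement


def em_allele_freqs_hwe(likelihoods, n_alleles, tol=1e-8, max_iter=1000):
    """Estimate allele frequencies, and HWE genotype frequencies, by EM.

    likelihoods: one dict per individual, mapping an unordered genotype
    (i, j) with i <= j to P(R_k | G_ij). Assumes every individual has a
    non-zero likelihood for at least one genotype.
    """
    genotypes = list(combinations_with_replacement(range(n_alleles), 2))
    # Assumed starting point: uniform genotype probabilities
    geno_freqs = {g: 1.0 / len(genotypes) for g in genotypes}
    allele_freqs = None
    n_ind = len(likelihoods)
    for _ in range(max_iter):
        expected = [0.0] * n_alleles
        for lk in likelihoods:
            # Posterior genotype probabilities for this individual
            weights = {g: lk[g] * geno_freqs[g] for g in genotypes}
            total = sum(weights.values())
            for (i, j), w in weights.items():
                post = w / total
                # E step: each of the two alleles contributes half, so a
                # homozygote (i, i) adds its full posterior to allele i
                expected[i] += 0.5 * post
                expected[j] += 0.5 * post
        # M step: average over individuals, then apply HWE
        new_freqs = [e / n_ind for e in expected]
        geno_freqs = {
            (i, j): new_freqs[i] ** 2 if i == j
            else 2.0 * new_freqs[i] * new_freqs[j]
            for (i, j) in genotypes
        }
        # Stopping rule chosen for this sketch; skipped on the first pass
        converged = allele_freqs is not None and max(
            abs(a - b) for a, b in zip(new_freqs, allele_freqs)
        ) < tol
        allele_freqs = new_freqs
        if converged:
            break
    return allele_freqs, geno_freqs


# Same made-up, fully informative likelihoods as a toy input:
# two homozygous for allele 0, one heterozygote, one homozygous for allele 1
hom0 = {(0, 0): 1.0, (0, 1): 0.0, (1, 1): 0.0}
het = {(0, 0): 0.0, (0, 1): 1.0, (1, 1): 0.0}
hom1 = {(0, 0): 0.0, (0, 1): 0.0, (1, 1): 1.0}
allele_freqs, geno_freqs = em_allele_freqs_hwe([hom0, hom0, het, hom1], 2)
```

For this toy input the allele frequencies equal the observed allele proportions (5 of 8 alleles are allele 0), while the genotype frequencies are forced onto the HWE proportions rather than matching the observed genotype counts.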
 
 
 
=== Used in ===
 
 
 
[[HWEP|Hardy-Weinberg Likelihood Test statistic]] and [[FIC| Inbreeding Coefficient]]
 
 
 
=== Derivation ===
 
 
 
Adrian with much help from Hyun.
 
 
 
=== Maintained by  ===
 
 
 
This page is maintained by  [mailto:atks@umich.edu Adrian].
 

Latest revision as of 13:38, 4 June 2013