Genotype Likelihood based Inbreeding Coefficient

From Genome Analysis Wiki



The inbreeding coefficient is an important statistic in the study of genetic variants. This page details a method for estimating it from genotype likelihoods in next-generation sequencing (NGS) data.


The inbreeding coefficient F_{IC} measures deviation from Hardy-Weinberg equilibrium (HWE) as the deficit of observed heterozygotes relative to the number expected. A value of 0 implies no deviation, a negative value implies an excess of heterozygotes, and a positive value implies an excess of homozygotes. F_{IC} ranges from -1 to 1.
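
As a small worked illustration (the counts are made up for this example, not taken from real data), consider a biallelic site with allele frequencies \textbf{p} = (0.5, 0.5) in 100 individuals, using F_{IC} = 1 - O[Het]/E[Het|\textbf{p}] as defined in the equations on this page. HWE predicts 2 \times 0.5 \times 0.5 \times 100 = 50 heterozygotes, so:

\begin{align}
O[Het] = 50: \quad F_{IC} & = 1 - \frac{50}{50} = 0 \\
O[Het] = 25: \quad F_{IC} & = 1 - \frac{25}{50} = 0.5 \\
O[Het] = 75: \quad F_{IC} & = 1 - \frac{75}{50} = -0.5
\end{align}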

The following equation gives the estimate of F_{IC} when observed genotypes are available. g_{i,j,k} indicates whether the kth individual carries the genotype composed of alleles i and j (1 if it does, 0 otherwise). P(G_{i,j}|\textbf{p}) is the estimated frequency of genotype G_{i,j} under the HWE assumption, given the allele frequencies \textbf{p}. I[i \ne j] is an indicator function for heterozygous genotypes. Both sums run over the individuals k, so the numerator is the observed number of heterozygotes and the denominator is the number expected under HWE.

\begin{align}
F_{IC} & = 1 - \frac{O[Het]}{E[Het|\textbf{p}]} \\
       & = 1 - \frac{\sum_{i,j,k}{g_{i,j,k}I[i \ne j]}}{\sum_{i,j,k}{P(G_{i,j}|\textbf{p})I[i \ne j]}}
\end{align}

The following equation gives the estimate of F_{IC} when only genotype likelihoods are available. P(R_{k}|G_{i,j}) is the genotype likelihood for individual k given genotype G_{i,j}, that is, the probability of observing the reads R_k in individual k if G_{i,j} is the true underlying genotype at that locus. The observed heterozygote count is replaced by the sum of each individual's posterior probability of being heterozygous, obtained from the likelihoods by Bayes' rule with the HWE genotype frequencies as the prior.

\begin{align}
F_{IC} & = 1 - \frac{O[Het]}{E[Het|\textbf{p}]} \\
       & = 1 - \frac{\sum_{i,j,k}{P(G_{i,j}|R_k, \textbf{p})I[i \ne j]}}{\sum_{i,j,k}{P(G_{i,j}|\textbf{p})I[i \ne j]}} \\
       & = 1 - \frac{\sum_{i,j,k}{\frac{P(R_k|G_{i,j})P(G_{i,j}|\textbf{p})}{\sum_{i',j'}{P(R_k|G_{i',j'})P(G_{i',j'}|\textbf{p})}}I[i \ne j]}}{\sum_{i,j,k}{P(G_{i,j}|\textbf{p})I[i \ne j]}}
\end{align}
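
The two estimators above can be sketched in a few lines of Python. This is a minimal illustration, not the vt implementation: the function names and the dictionary-based input layout are hypothetical. The sketch assumes that the allele frequencies \textbf{p} have already been estimated, that genotypes are unordered (i <= j), and that the expected heterozygote count is taken over all individuals (the number of individuals times the HWE heterozygote frequency). Hard-called genotypes are treated as the special case where the likelihood is 1 for the called genotype and 0 otherwise.

```python
from itertools import combinations_with_replacement


def hwe_genotype_freqs(p):
    """Return {(i, j): P(G_ij | p)} for unordered genotypes (i <= j) under HWE."""
    return {
        (i, j): p[i] * p[j] if i == j else 2.0 * p[i] * p[j]
        for i, j in combinations_with_replacement(range(len(p)), 2)
    }


def fic_from_likelihoods(likelihoods, p):
    """Estimate F_IC from per-individual genotype likelihoods.

    likelihoods: one dict per individual, {(i, j): P(R_k | G_ij)} with i <= j
                 (hypothetical input layout chosen for this sketch).
    p: allele frequencies, assumed already estimated and summing to 1.
    """
    prior = hwe_genotype_freqs(p)
    het_freq = sum(f for (i, j), f in prior.items() if i != j)
    expected_het = len(likelihoods) * het_freq
    if expected_het == 0:
        return float("nan")  # monomorphic site or no individuals: undefined

    observed_het = 0.0
    for gl in likelihoods:
        # Bayes' rule: posterior is proportional to likelihood times HWE prior.
        # Assumes at least one genotype has nonzero likelihood and prior.
        joint = {g: gl[g] * prior[g] for g in prior}
        total = sum(joint.values())
        observed_het += sum(v for (i, j), v in joint.items() if i != j) / total
    return 1.0 - observed_het / expected_het


def fic_from_genotypes(genotypes, p):
    """Hard calls as a special case: likelihood 1 for the called genotype, else 0."""
    prior = hwe_genotype_freqs(p)
    one_hot = [
        {g: 1.0 if g == tuple(sorted(call)) else 0.0 for g in prior}
        for call in genotypes
    ]
    return fic_from_likelihoods(one_hot, p)


if __name__ == "__main__":
    p = [0.5, 0.5]
    # 4 heterozygotes and 6 homozygotes; HWE expects 5 heterozygotes.
    calls = [(0, 1)] * 4 + [(0, 0)] * 3 + [(1, 1)] * 3
    print(fic_from_genotypes(calls, p))
    # Flat likelihoods carry no information about the genotype.
    flat = [{(0, 0): 1.0, (0, 1): 1.0, (1, 1): 1.0}] * 4
    print(fic_from_likelihoods(flat, p))
```

In the first call, 4 heterozygotes are observed where 5 are expected, so the estimate is roughly 0.2. In the second, completely uninformative likelihoods make each posterior equal to the HWE prior, so the estimate is 0. This shows that reads carrying little information pull the estimate toward 0.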


Adrian with much help from Hyun.


This is implemented in vt.

Maintained by

This page is maintained by Adrian.