# Genotype Likelihood based Inbreeding Coefficient

### Introduction

The inbreeding coefficient is a widely used statistic in the study of genetic variants. This page describes a method for estimating inbreeding coefficients from genotype likelihoods in next-generation sequencing (NGS) data.

### Formulation

The inbreeding coefficient ${\displaystyle F_{IC}}$ measures deviation from Hardy-Weinberg Equilibrium (HWE) by comparing the number of heterozygotes observed with the number expected under HWE. A value of 0 implies no deviation, a negative value implies an excess of heterozygotes, and a positive value implies an excess of homozygotes. ${\displaystyle F_{IC}}$ ranges from -1 to 1.

The following equation gives the estimate of ${\displaystyle F_{IC}}$ when the observed genotypes are available. ${\displaystyle g_{i,j,k}}$ indicates the genotype composed of alleles ${\displaystyle i}$ and ${\displaystyle j}$ for the ${\displaystyle k}$th individual (1 if that genotype is observed, 0 otherwise). ${\displaystyle P(G_{i,j}|{\textbf {p}})}$ is the estimated genotype frequency for genotype ${\displaystyle G_{i,j}}$ under the HWE assumption, where ${\displaystyle {\textbf {p}}}$ denotes the allele frequencies. ${\displaystyle I[i\neq j]}$ is an indicator function for heterozygous genotypes.

$${\displaystyle {\begin{aligned}F_{IC}&=1-{\frac {O[Het]}{E[Het|{\textbf {p}}]}}\\&=1-{\frac {\sum _{i,j,k}{g_{i,j,k}I[i\neq j]}}{\sum _{i,j}{P(G_{i,j}|{\textbf {p}})I[i\neq j]}}}\\\end{aligned}}}$$
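As a rough illustration of the observed-genotype estimator, the Python sketch below counts heterozygotes and compares them with the HWE expectation. The function name `fic_from_genotypes` and the input layout (one `(i, j)` allele-index pair per individual) are hypothetical and are not taken from vt. The sketch also assumes that the expected heterozygote count is the HWE heterozygote frequency multiplied by the number of individuals, so that the numerator and denominator are on the same scale; that scaling is an interpretation of the formula rather than something the text states.

```python
def fic_from_genotypes(genotypes, allele_freqs):
    """Estimate F_IC from hard genotype calls (illustrative sketch).

    genotypes    -- list of (i, j) allele-index pairs, one per individual
    allele_freqs -- list of allele frequencies p, summing to 1
    """
    n = len(genotypes)

    # O[Het]: number of individuals whose two alleles differ.
    observed_het = sum(1 for i, j in genotypes if i != j)

    # Under HWE the heterozygote frequency is 1 - sum(p_i^2).
    expected_het_freq = 1.0 - sum(p * p for p in allele_freqs)

    # Assumption: scale the frequency by n to get an expected count.
    expected_het = n * expected_het_freq
    if expected_het == 0:
        raise ValueError("expected heterozygosity is zero; F_IC is undefined")

    return 1.0 - observed_het / expected_het
```

With allele frequencies `[0.5, 0.5]`, a sample in which half the individuals are heterozygous gives a value of 0, a sample of only heterozygotes gives -1, and a sample of only homozygotes gives 1.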

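The following equation gives the estimate of ${\displaystyle F_{IC}}$ when only genotype likelihoods are available. ${\displaystyle P(R_{k}|G_{i,j})}$ is the genotype likelihood for individual ${\displaystyle k}$ given genotype ${\displaystyle G_{i,j}}$, that is, the probability of observing the reads ${\displaystyle R_{k}}$ in individual ${\displaystyle k}$ assuming ${\displaystyle G_{i,j}}$ is the true underlying genotype at that locus. The observed genotype is replaced by the posterior genotype probability ${\displaystyle P(G_{i,j}|R_{k},{\textbf {p}})}$, obtained by Bayes' rule with the HWE genotype frequencies as the prior.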

$${\displaystyle {\begin{aligned}F_{IC}&=1-{\frac {O[Het]}{E[Het|{\textbf {p}}]}}\\&=1-{\frac {\sum _{i,j,k}{P(G_{i,j}|R_{k},{\textbf {p}})I[i\neq j]}}{\sum _{i,j}{P(G_{i,j}|{\textbf {p}})I[i\neq j]}}}\\&=1-{\frac {\sum _{i,j,k}{\frac {P(R_{k}|G_{i,j})P(G_{i,j}|{\textbf {p}})}{\sum _{i',j'}{P(R_{k}|G_{i',j'})P(G_{i',j'}|{\textbf {p}})}}}I[i\neq j]}{\sum _{i,j}{P(G_{i,j}|{\textbf {p}})I[i\neq j]}}}\\\end{aligned}}}$$
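The genotype-likelihood version can be sketched in Python in the same spirit. This is not the vt implementation: the names `hwe_genotype_freqs` and `fic_from_likelihoods` are hypothetical, likelihoods are assumed to be on a linear scale (not log or PHRED scaled), and each individual is assumed to be represented by a dict keyed by unordered genotype `(i, j)` with `i <= j`. As an interpretation of the formula, the expected heterozygote frequency is multiplied by the number of individuals so that it is comparable with the summed posterior probabilities.

```python
from itertools import combinations_with_replacement


def hwe_genotype_freqs(allele_freqs):
    """Return P(G_ij | p) under HWE for every unordered genotype (i, j), i <= j."""
    freqs = {}
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        if i == j:
            freqs[(i, j)] = allele_freqs[i] ** 2
        else:
            freqs[(i, j)] = 2.0 * allele_freqs[i] * allele_freqs[j]
    return freqs


def fic_from_likelihoods(likelihoods, allele_freqs):
    """Estimate F_IC from genotype likelihoods (illustrative sketch).

    likelihoods  -- list of dicts, one per individual k, mapping (i, j)
                    to P(R_k | G_ij) on a linear scale
    allele_freqs -- list of allele frequencies p, summing to 1
    """
    prior = hwe_genotype_freqs(allele_freqs)
    expected_het_freq = sum(f for (i, j), f in prior.items() if i != j)

    observed_het = 0.0
    for gl in likelihoods:
        # Bayes' rule: likelihood times HWE prior, normalised over genotypes.
        joint = {g: gl[g] * prior[g] for g in prior}
        total = sum(joint.values())
        if total == 0:
            raise ValueError("likelihoods give zero total probability")
        # Posterior probability that individual k is heterozygous.
        observed_het += sum(v for (i, j), v in joint.items() if i != j) / total

    # Assumption: scale the HWE heterozygote frequency by the sample size.
    expected_het = len(likelihoods) * expected_het_freq
    if expected_het == 0:
        raise ValueError("expected heterozygosity is zero; F_IC is undefined")

    return 1.0 - observed_het / expected_het
```

When every individual has a likelihood of 1 for a single genotype and 0 for the others, the posterior collapses to a hard call and the result matches the observed-genotype estimator. When the likelihoods are flat, the posterior equals the HWE prior and the estimate is 0.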

### Derivation

Adrian with much help from Hyun.

### Implementation

This is implemented in vt.