# Genotype Likelihood based Inbreeding Coefficient


### Introduction

The inbreeding coefficient is an important statistic in the study of genetic variants. This page details a method for estimating inbreeding coefficients from genotype likelihoods in next-generation sequencing (NGS) data.

### Formulation

The inbreeding coefficient $F_{IC}$ measures deviation from Hardy-Weinberg equilibrium (HWE) as the deficit of observed heterozygotes relative to the number expected. A value of 0 implies no deviation, a negative value implies an excess of heterozygotes, and a positive value implies an excess of homozygotes. $F_{IC}$ ranges from -1 to 1.

The following equation gives the estimate of $F_{IC}$ when the observed genotypes are available. $g_{i,j,k}$ is an indicator that equals 1 when the $k$th individual has the genotype composed of alleles $i$ and $j$, and 0 otherwise. $P(G_{i,j}|{\textbf {p}})$ is the frequency of genotype $G_{i,j}$ expected under the HWE assumption given the allele frequencies ${\textbf {p}}$. $I[i\neq j]$ is an indicator function for heterozygous genotypes.

$$
\begin{aligned}
F_{IC} &= 1-\frac{O[Het]}{E[Het|{\textbf {p}}]}\\
&= 1-\frac{\sum_{i,j,k} g_{i,j,k}\, I[i\neq j]}{\sum_{i,j} P(G_{i,j}|{\textbf {p}})\, I[i\neq j]}
\end{aligned}
$$

The following equation gives the estimate of F where genotype likelihoods are available. $P(R_{k}|G_{i,j})$ is the genotype likelihood for individual $k$ given genotype $G_{i,j}$. This is basically the probability of observing the reads in individual $k$ assuming $G_{i,j}$ is the underlying true genotype for that particular locus.

$$
\begin{aligned}
F_{IC} &= 1-\frac{O[Het]}{E[Het|{\textbf {p}}]}\\
&= 1-\frac{\sum_{i,j,k} P(G_{i,j}|R_{k},{\textbf {p}})\, I[i\neq j]}{\sum_{i,j} P(G_{i,j}|{\textbf {p}})\, I[i\neq j]}\\
&= 1-\frac{\sum_{i,j,k} \frac{P(R_{k}|G_{i,j})\, P(G_{i,j}|{\textbf {p}})}{\sum_{i',j'} P(R_{k}|G_{i',j'})\, P(G_{i',j'}|{\textbf {p}})}\, I[i\neq j]}{\sum_{i,j} P(G_{i,j}|{\textbf {p}})\, I[i\neq j]}
\end{aligned}
$$

### Derivation

Adrian with much help from Hyun.

### Implementation

This is implemented in vt.
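For illustration only, the genotype-likelihood estimator from the Formulation section can be sketched in a few lines of Python. This is not vt's code and does not use vt's API. The function names (`hwe_genotype_priors`, `inbreeding_coefficient`) and the input layout are hypothetical. The sketch assumes that the allele frequencies ${\textbf {p}}$ are already known, that likelihoods are on a linear scale (not log or PHRED), that genotypes are unordered pairs with $i \le j$, and that the expected heterozygote count is the number of individuals times the per-individual HWE heterozygote probability, so that numerator and denominator are both counts.

```python
from itertools import combinations_with_replacement


def hwe_genotype_priors(p):
    """Return {(i, j): P(G_ij | p)} for unordered genotypes (i <= j) under HWE."""
    priors = {}
    for i, j in combinations_with_replacement(range(len(p)), 2):
        # Homozygotes have frequency p_i^2, heterozygotes 2 * p_i * p_j.
        priors[(i, j)] = p[i] * p[i] if i == j else 2.0 * p[i] * p[j]
    return priors


def inbreeding_coefficient(genotype_likelihoods, p):
    """Estimate F_IC = 1 - O[Het] / E[Het | p] from genotype likelihoods.

    genotype_likelihoods: one dict per individual k, mapping (i, j) with
        i <= j to P(R_k | G_ij) on a linear scale (hypothetical layout).
    p: allele frequencies, assumed known and summing to 1.
    """
    priors = hwe_genotype_priors(p)
    het_genotypes = [g for g in priors if g[0] != g[1]]

    # Observed heterozygosity: sum over individuals of the posterior
    # probability P(G_ij | R_k, p) of the heterozygous genotypes.
    observed_het = 0.0
    for gl in genotype_likelihoods:
        norm = sum(gl[g] * priors[g] for g in priors)
        if norm == 0.0:
            raise ValueError("likelihoods give zero probability to every genotype")
        observed_het += sum(gl[g] * priors[g] for g in het_genotypes) / norm

    # Expected heterozygosity under HWE, scaled by the number of
    # individuals (an assumption of this sketch, see the text above).
    expected_het = len(genotype_likelihoods) * sum(priors[g] for g in het_genotypes)
    if expected_het == 0.0:
        raise ValueError("expected heterozygosity is zero (monomorphic site)")

    return 1.0 - observed_het / expected_het


if __name__ == "__main__":
    # Four individuals at a biallelic site with confidently called genotypes.
    hom_ref = {(0, 0): 1.0, (0, 1): 0.0, (1, 1): 0.0}
    het = {(0, 0): 0.0, (0, 1): 1.0, (1, 1): 0.0}
    hom_alt = {(0, 0): 0.0, (0, 1): 0.0, (1, 1): 1.0}
    print(inbreeding_coefficient([hom_ref, het, het, hom_alt], [0.5, 0.5]))
```

When every likelihood vector puts all its weight on a single genotype, the posterior reduces to the observed genotype indicator, so the same function also covers the observed-genotype form of the estimator.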