# Genotype Likelihood based Inbreeding Coefficient

### Introduction

The inbreeding coefficient is an important statistic in the study of genetic variants. This page details a method for estimating inbreeding coefficients from genotype likelihoods in next-generation sequencing (NGS) data.

### Formulation

The inbreeding coefficient $F_{IC}$ measures deviation from Hardy-Weinberg equilibrium (HWE) by comparing the number of heterozygotes observed with the number expected. A value of 0 implies no deviation, a negative value implies an excess of heterozygotes, and a positive value implies an excess of homozygotes. $F_{IC}$ ranges from -1 to 1.

The following equation gives the estimate of $F_{IC}$ when observed genotypes are available. $g_{i,j,k}$ is 1 if the $k$th individual carries the genotype composed of alleles $i$ and $j$, and 0 otherwise. $P(G_{i,j}|\textbf{p})$ is the estimated frequency of genotype $G_{i,j}$ under the HWE assumption, given the allele frequencies $\textbf{p}$. $I[i \ne j]$ is an indicator function for heterozygous genotypes.

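\begin{align}
F_{IC} & = 1 - \frac{O[Het]}{E[Het|\textbf{p}]} \\
& = 1 - \frac{\sum_{i,j,k}{g_{i,j,k}I[i \ne j]}}{\sum_{i,j,k}{P(G_{i,j}|\textbf{p})I[i \ne j]}}
\end{align}

Both sums run over all individuals $k$, so the observed and expected heterozygote counts are on the same scale. Because $P(G_{i,j}|\textbf{p})$ does not depend on $k$, the denominator is simply the number of individuals times the expected heterozygote frequency.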
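The genotype-based estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the vt implementation: the site is taken to be biallelic, the allele frequency is estimated by simple allele counting, and the expected heterozygote count is taken as the number of individuals times the HWE heterozygote frequency so that it is on the same scale as the observed count. The function name `inbreeding_from_genotypes` is hypothetical.

```python
from typing import Sequence


def inbreeding_from_genotypes(alt_counts: Sequence[int]) -> float:
    """Estimate F = 1 - O[Het] / E[Het | p] at one biallelic site.

    alt_counts holds 0, 1 or 2 (copies of the alternate allele) for each
    individual, so a value of 1 marks a heterozygote.
    """
    n = len(alt_counts)
    if n == 0:
        raise ValueError("need at least one genotype")

    # Assumption: the allele frequency is estimated by allele counting.
    alt_freq = sum(alt_counts) / (2 * n)

    # Observed number of heterozygotes.
    observed_het = sum(1 for g in alt_counts if g == 1)

    # Expected number of heterozygotes under HWE: n * 2pq.
    expected_het = n * 2 * alt_freq * (1 - alt_freq)
    if expected_het == 0:
        return float("nan")  # monomorphic site: F is undefined

    return 1 - observed_het / expected_het
```

With an allele frequency of 0.5 in four individuals, two heterozygotes match the HWE expectation and give 0, no heterozygotes give 1, and four heterozygotes give -1, which matches the interpretation of the sign of $F_{IC}$ given earlier.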

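The following equation gives the estimate of $F_{IC}$ when only genotype likelihoods are available. $P(R_{k}|G_{i,j})$ is the genotype likelihood for individual $k$ given genotype $G_{i,j}$, that is, the probability of observing the reads $R_k$ in individual $k$ assuming $G_{i,j}$ is the true underlying genotype at that locus. The observed heterozygote count is replaced by the sum of the posterior heterozygote probabilities $P(G_{i,j}|R_k, \textbf{p})$, which are obtained from the genotype likelihoods and the HWE genotype frequencies by Bayes' rule.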

\begin{align}
F_{IC} & = 1 - \frac{O[Het]}{E[Het|\textbf{p}]} \\
& = 1 - \frac{\sum_{i,j,k}{P(G_{i,j}|R_k , \textbf{p})I[i \ne j]}}{\sum_{i,j,k}{P(G_{i,j}|\textbf{p})I[i \ne j]}} \\
& = 1 - \frac{\sum_{i,j,k}{\frac{P(R_k|G_{i,j})P(G_{i,j}|\textbf{p})}{\sum_{i',j'}{P(R_k|G_{i',j'})P(G_{i',j'}|\textbf{p})}}}I[i \ne j]}{\sum_{i,j,k}{P(G_{i,j}|\textbf{p})I[i \ne j]}}
\end{align}
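The likelihood-based estimate can be sketched the same way. This is a minimal illustration under stated assumptions rather than the vt implementation: the site is biallelic, the alternate allele frequency is supplied by the caller (how it is estimated is not covered here), the likelihoods are on a linear scale (phred-scaled values would first need converting with $10^{-PL/10}$), and the expected heterozygote count is the number of individuals times the HWE heterozygote frequency. The function name `inbreeding_from_likelihoods` is hypothetical.

```python
from typing import Sequence, Tuple


def inbreeding_from_likelihoods(
    likelihoods: Sequence[Tuple[float, float, float]],
    alt_freq: float,
) -> float:
    """Estimate F from genotype likelihoods at one biallelic site.

    likelihoods[k] holds P(R_k | G) for the hom-ref, het and hom-alt
    genotypes of individual k, on a linear (not phred or log) scale.
    """
    ref_freq = 1 - alt_freq
    # HWE genotype frequencies P(G | p) for hom-ref, het, hom-alt.
    priors = (ref_freq * ref_freq, 2 * ref_freq * alt_freq, alt_freq * alt_freq)

    # Expected number of heterozygotes under HWE.
    expected_het = len(likelihoods) * priors[1]
    if expected_het == 0:
        return float("nan")  # no individuals or monomorphic site

    observed_het = 0.0
    for gl in likelihoods:
        # Bayes' rule: posterior is likelihood * prior, normalized.
        joint = [lik * prior for lik, prior in zip(gl, priors)]
        total = sum(joint)
        if total == 0:
            raise ValueError("likelihoods are incompatible with the priors")
        # Add this individual's posterior probability of being heterozygous.
        observed_het += joint[1] / total

    return 1 - observed_het / expected_het
```

One property worth noting: when the reads carry no information, so that all three likelihoods are equal, each posterior equals the HWE prior and the estimate is 0. Low-coverage individuals therefore pull the estimate towards 0 rather than towards either extreme.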

### Derivation

Adrian with much help from Hyun.

### Implementation

This is implemented in vt.