Relationship between Ploidy, Alleles and Genotypes

From Genome Analysis Wiki

Introduction

The VCF format encodes a genotype by its index in the enumeration of all genotypes possible for a given ploidy and number of alleles. Ploidy and alleles are independent of one another, while the genotypes are a function of both.

Motivation

While explicit functions for handling the haploid and diploid cases can easily be googled, closed forms for the general case seem difficult to find. This wiki fills that need. Such extensions are required when pooled samples are studied, or when working with plant species that exhibit a diverse range of ploidy.

The number of genotypes given a ploidy and alleles
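
If a genotype is treated as an unordered selection of P alleles, with repetition allowed, out of A distinct alleles, then the number of genotypes is the multiset coefficient C(P + A - 1, P). The following is a minimal Python sketch under that assumption; the function name `num_genotypes` is chosen here for illustration and does not come from any particular tool.

```python
from math import comb


def num_genotypes(ploidy: int, alleles: int) -> int:
    """Count unordered genotypes: multisets of size `ploidy` drawn from `alleles` alleles.

    Assumes unphased genotypes, so 0/1 and 1/0 are counted once.
    """
    if ploidy < 0 or alleles < 1:
        raise ValueError("ploidy must be >= 0 and alleles must be >= 1")
    return comb(ploidy + alleles - 1, ploidy)


if __name__ == "__main__":
    # Haploid: one genotype per allele.
    print(num_genotypes(1, 4))
    # Diploid: reduces to A(A + 1) / 2.
    print(num_genotypes(2, 3))
    # Triploid, biallelic: 0/0/0, 0/0/1, 0/1/1, 1/1/1.
    print(num_genotypes(3, 2))
```

For the haploid case this reduces to A, and for the diploid case to A(A + 1)/2, which matches the commonly quoted special cases.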

The indexing of genotypes given a ploidy and alleles
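
One way to compute the index, assuming the genotype ordering that the VCF specification describes for genotype-indexed fields such as GL and PL: sort the allele indices so that k_1 ≤ k_2 ≤ ... ≤ k_P, then sum C(k_m + m - 1, m) over m = 1..P. For the diploid genotype j/k with j ≤ k this reduces to k(k + 1)/2 + j. The sketch below is illustrative only; the names `genotype_index` and `enumerate_genotypes` are hypothetical helpers, not part of any library.

```python
from itertools import combinations_with_replacement
from math import comb


def genotype_index(genotype):
    """Index of an unphased genotype given as allele indices, e.g. (0, 2) for 0/2.

    Assumes the ordering index = sum over m = 1..P of C(k_m + m - 1, m),
    where k_1 <= ... <= k_P are the sorted allele indices.
    """
    sorted_alleles = sorted(genotype)
    # enumerate() is 0-based, so position m + 1 contributes C(k + m, m + 1).
    return sum(comb(k + m, m + 1) for m, k in enumerate(sorted_alleles))


def enumerate_genotypes(ploidy, alleles):
    """List all genotypes in index order (last allele varies slowest)."""
    genotypes = combinations_with_replacement(range(alleles), ploidy)
    # Sorting on the reversed tuple orders by the last allele first.
    return sorted(genotypes, key=lambda g: g[::-1])


if __name__ == "__main__":
    # Diploid with 3 alleles: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2.
    for g in enumerate_genotypes(2, 3):
        print("/".join(map(str, g)), genotype_index(g))
```

Enumerating the genotypes and comparing each position with `genotype_index` is a cheap consistency check for any ploidy and allele count.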

Case Alleles Genotypes Index comments
ploidy ≤ alleles ploidy alleles

18794 18849 bcftools's normalization is buggy, variants were truncated despite having differing prefix.
ploidy > alleles - - - -
#normalized after gatk - 0 57 57 variants from GATK's normalization were left aligned by vt. 6 were biallelic and 51 were multiallelic. Note that 2 variants were changed by GATK but were not completely normalized.
#normalized after vt - 0 0 no variants processed by vt were further normalized.