Relationship between Ploidy, Alleles and Genotypes

From Genome Analysis Wiki
Revision as of 10:36, 31 January 2015 by Atks

Introduction

The VCF format encodes a genotype by its index in the enumeration of all genotypes possible for a given ploidy and set of alleles. Ploidy and alleles are independent of one another, while the genotypes are a function of both.

Motivation

Explicit functions for the haploid and diploid cases are easy to find online, but closed forms for the general case seem difficult to find. This page fills that need. Such extensions are required when pooled samples are studied, or when working with plant species, which exhibit a wide range of ploidy.

The number of genotypes given a ploidy and alleles
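
A genotype is an unordered selection of P alleles, with repetition allowed, from A alleles, so the count is the standard multiset count C(P + A - 1, P) = C(P + A - 1, A - 1), where C(n, k) is the binomial coefficient. A minimal Python sketch of that count follows; the function name `num_genotypes` is illustrative and not taken from any VCF library.

```python
from math import comb  # available in Python 3.8+


def num_genotypes(ploidy: int, alleles: int) -> int:
    """Number of unordered genotypes for `ploidy` copies drawn from `alleles` alleles."""
    if ploidy < 0 or alleles < 1:
        raise ValueError("ploidy must be >= 0 and alleles must be >= 1")
    # Multisets of size `ploidy` over `alleles` distinct items.
    return comb(ploidy + alleles - 1, ploidy)
```

For example, `num_genotypes(2, 3)` gives 6 for a diploid site with 3 alleles, and `num_genotypes(3, 3)` gives 10 for a triploid site with 3 alleles.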

The indexing of genotypes given a ploidy and alleles


Here a_1, a_2, ..., a_P are the alleles of the genotype in numeric encoding (0 to A-1), sorted in non-decreasing order. For example, AB and ABCCCC are ordered, while ACB is not.
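
One closed form consistent with the genotype ordering used by VCF is index = sum over k = 1..P of C(a_k + k - 1, k), with a_1 <= a_2 <= ... <= a_P. The sketch below assumes that formula rather than quoting it from this page; `genotype_index` is an illustrative name, not a library function.

```python
from math import comb  # available in Python 3.8+


def genotype_index(alleles):
    """Index of a genotype given as 0-based allele numbers, in any order."""
    # The assumed formula requires a_1 <= a_2 <= ... <= a_P, so sort first.
    ordered = sorted(alleles)
    # k is the 1-based position of each allele in the ordered genotype.
    return sum(comb(a + k - 1, k) for k, a in enumerate(ordered, start=1))
```

With alleles A=0, B=1, C=2, the diploid genotype BC gives `genotype_index([1, 2])` = 4, and the triploid genotype ABC gives `genotype_index([0, 1, 2])` = 5. Sorting inside the function means an unordered input such as CB maps to the same index as BC.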

Simple cases

Ploidy | Alleles | Genotypes | Index | Notes
1 | A | A | a_1 | Simple haploid case
2 | A | A(A+1)/2 | a_1 + a_2(a_2+1)/2 | Diploid case
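
The simple cases can be checked by brute force. The sketch below enumerates every genotype for a given ploidy and allele count and sorts them by index, assuming the index of an ordered genotype a_1 <= ... <= a_P is the sum over k of C(a_k + k - 1, k). That formula is an assumption of this example, and `enumerate_genotypes` is a hypothetical helper name.

```python
from itertools import combinations_with_replacement
from math import comb  # available in Python 3.8+


def enumerate_genotypes(ploidy, alleles):
    """All genotypes as non-decreasing tuples of 0-based alleles, in index order."""

    def index(genotype):
        # Assumed closed form; k is the 1-based position in the ordered genotype.
        return sum(comb(a + k - 1, k) for k, a in enumerate(genotype, start=1))

    # combinations_with_replacement over a sorted range yields ordered genotypes.
    genotypes = combinations_with_replacement(range(alleles), ploidy)
    return sorted(genotypes, key=index)


# Diploid case with three alleles (0, 1, 2).
for position, genotype in enumerate(enumerate_genotypes(2, 3)):
    print(position, genotype)
```

For the haploid case the genotypes are simply the alleles themselves in order; for the diploid case with three alleles the loop lists (0, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 2) at positions 0 to 5.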