Relationship between Ploidy, Alleles and Genotypes

From Genome Analysis Wiki
Revision as of 10:12, 31 January 2015 by Atks (talk | contribs)

Introduction

The VCF format encodes genotypes by their index in the enumeration of genotypes given a ploidy number and a set of alleles. Ploidy and alleles are independent of one another, while genotypes are a function of both.
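As an illustration of genotypes being a function of ploidy and alleles, the following minimal sketch lists the unordered genotypes for a given ploidy and number of alleles. The function name `enumerate_genotypes` is hypothetical, and the ordering assumes the convention of the VCF specification for genotype-indexed fields, in which the last allele of a sorted genotype is the most significant.

```python
from itertools import combinations_with_replacement


def enumerate_genotypes(ploidy, n_alleles):
    """Return all unordered genotypes as tuples of allele indices.

    Alleles are numbered 0 (reference), 1, 2, ... as in VCF.
    Assumption: genotypes are ordered with the last allele of each
    sorted tuple as the most significant position.
    """
    genotypes = combinations_with_replacement(range(n_alleles), ploidy)
    # Sorting on the reversed tuple makes the last allele most significant.
    return sorted(genotypes, key=lambda g: g[::-1])


# Diploid sample, 3 alleles: prints 0/0, 0/1, 1/1, 0/2, 1/2, 2/2 with indices 0..5.
for index, genotype in enumerate(enumerate_genotypes(2, 3)):
    print(index, "/".join(map(str, genotype)))
```

Changing only the ploidy or only the number of alleles changes the list, which is the sense in which the two inputs are independent while the genotypes depend on both.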

Motivation

While explicit formulas for the haploid and diploid cases are easy to find online, closed forms for the general case are harder to come by. This wiki page fills that need. Such extensions are required when pooled samples are studied, or when working with plant species, which exhibit a diverse range of ploidy.

The number of genotypes given a ploidy and alleles


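  \begin{align}
F(P,A) = \binom{P+A-1}{P} = \frac{(P+A-1)!}{P!\,(A-1)!}
   \end{align}

Here P is the ploidy and A is the number of alleles. A genotype is an unordered selection of P alleles with repetition allowed, so F(P,A) counts the multisets of size P drawn from A alleles.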
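The count can be checked numerically. The sketch below assumes the standard combinatorial result that the number of unordered genotypes equals the number of multisets of size P drawn from A alleles, that is, the binomial coefficient C(P+A-1, P). The function name `num_genotypes` is hypothetical.

```python
from math import comb


def num_genotypes(ploidy, n_alleles):
    """Number of unordered genotypes for a given ploidy and allele count.

    Assumption: a genotype is a multiset of `ploidy` alleles chosen from
    `n_alleles` distinct alleles, so the count is C(ploidy + n_alleles - 1, ploidy).
    """
    if ploidy < 0 or n_alleles < 1:
        raise ValueError("ploidy must be >= 0 and n_alleles must be >= 1")
    return comb(ploidy + n_alleles - 1, ploidy)


# Small table of counts: rows are ploidy 1..4, columns are 2..4 alleles.
for ploidy in range(1, 5):
    print(ploidy, [num_genotypes(ploidy, a) for a in range(2, 5)])
```

The familiar special cases fall out directly: a haploid sample has A genotypes and a diploid sample has A(A+1)/2. The counts also satisfy F(P,A) = F(P-1,A) + F(P,A-1), which splits the genotypes into those that contain the last allele at least once and those that do not.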

The indexing of genotypes given a ploidy and alleles

Case Alleles Genotypes Index comments
ploidy \le alleles ploidy alleles


  \begin{align}
P(G_{i,j}|R_{k})^{(l)}=\frac{P(R_{k}|G_{i,j})P(G_{i,j})^{(l-1)}}{\sum_{(i,j)}{P(R_{k}|G_{i,j})P(G_{i,j})^{(l-1)}}}
   \end{align}

18794 18849 bcftools's normalization is buggy, variants were truncated despite having differing prefix.
ploidy gt alleles - - - -
#normalized after gatk - 0 57 57 variants from GATK's normalization were left aligned by vt. 6 were biallelic and 51 were multiallelic. Note that 2 variants were changed by GATK but were not completely normalized.
#normalized after vt - 0 0 no variants processed by vt were further normalized.
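For the indexing of genotypes given a ploidy and alleles, a closed form can be sketched as follows. It assumes the ordering of the VCF specification, under which a genotype with sorted allele indices a_1 <= a_2 <= ... <= a_P has index equal to the sum over m = 1..P of C(a_m + m - 1, m). For the diploid genotype j/k with j <= k this reduces to k(k+1)/2 + j. The function names `genotype_index` and `genotype_from_index` are hypothetical.

```python
from math import comb


def genotype_index(alleles):
    """Index of a genotype given its allele indices (0 = reference).

    Assumption: VCF ordering, where the index is the sum over the sorted
    alleles a_1 <= ... <= a_P of C(a_m + m - 1, m) for m = 1..P.
    """
    # enumerate() is 0-based, so position m here corresponds to m + 1 above.
    return sum(comb(a + m, m + 1) for m, a in enumerate(sorted(alleles)))


def genotype_from_index(index, ploidy):
    """Inverse of genotype_index: recover the sorted alleles from an index."""
    alleles = []
    for m in range(ploidy, 0, -1):
        # Find the largest allele a with C(a + m - 1, m) <= remaining index.
        a = 0
        while comb(a + m, m) <= index:
            a += 1
        alleles.append(a)
        index -= comb(a + m - 1, m)
    # Alleles were recovered from the last position to the first.
    return tuple(reversed(alleles))


print(genotype_index((1, 2)))       # diploid 1/2
print(genotype_from_index(4, 2))    # back to (1, 2)
print(genotype_index((0, 1, 1)))    # triploid 0/1/1
```

The inverse uses a greedy search from the last allele down, which works because the terms of the sum form a combinatorial number system. The index does not depend on the total number of alleles, so adding a new alternate allele leaves the indices of existing genotypes unchanged.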