Relationship between Ploidy, Alleles and Genotypes
The VCF format encodes genotypes by their index in the enumeration of genotypes given a ploidy and a number of alleles. Ploidy and alleles are independent of one another, while genotypes are a function of both.
While explicit functions for the haploid and diploid cases can be found online, it seems to be difficult to find closed forms for the general case. This wiki fills that need. Such extensions are required when pooled samples are studied or when working with plant species that exhibit a diverse range of ploidy.
The number of genotypes given a ploidy and alleles
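As a minimal sketch, assume P denotes the ploidy and A the number of alleles (these symbols and the function names below are illustrative, not taken from this page). An unphased genotype is a multiset of P alleles drawn from A, so the count is the multiset coefficient C(P + A - 1, P). The brute-force enumeration is included only as a cross-check.

```python
from itertools import combinations_with_replacement
from math import comb  # requires Python 3.8+


def num_genotypes(ploidy: int, alleles: int) -> int:
    """Closed form: number of multisets of size `ploidy` over `alleles` symbols."""
    return comb(ploidy + alleles - 1, ploidy)


def num_genotypes_brute_force(ploidy: int, alleles: int) -> int:
    """Cross-check by explicitly enumerating every unordered genotype."""
    return sum(1 for _ in combinations_with_replacement(range(alleles), ploidy))


if __name__ == "__main__":
    print(num_genotypes(1, 4))  # haploid, 4 alleles -> 4
    print(num_genotypes(2, 2))  # diploid, biallelic -> 3
    print(num_genotypes(2, 3))  # diploid, triallelic -> 6
    print(num_genotypes(4, 3))  # tetraploid, 3 alleles -> 15
```

For the haploid case this reduces to A, and for the diploid case to A(A + 1)/2, which matches the familiar special cases.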
The indexing of genotypes given a ploidy and alleles
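As a minimal sketch, assume a genotype is written as 0-based allele indices sorted so that k_1 <= k_2 <= ... <= k_P, where P is the ploidy (the notation and function names here are illustrative, not taken from this page). Under the genotype ordering used by the VCF specification, the index is the sum over m = 1..P of C(k_m + m - 1, m). The enumeration helper below reproduces that ordering by brute force so the closed form can be checked against it.

```python
from itertools import combinations_with_replacement
from math import comb  # requires Python 3.8+


def genotype_index(genotype) -> int:
    """Index of an unphased genotype given as 0-based allele indices."""
    ks = sorted(genotype)  # enforce k_1 <= k_2 <= ... <= k_P
    # comb(n, m) returns 0 when m > n, which handles leading reference alleles.
    return sum(comb(k + m - 1, m) for m, k in enumerate(ks, start=1))


def enumerate_genotypes(ploidy: int, alleles: int):
    """List genotypes in index order: the highest allele varies slowest."""
    gts = combinations_with_replacement(range(alleles), ploidy)
    return sorted(gts, key=lambda g: g[::-1])


if __name__ == "__main__":
    # Diploid, 3 alleles: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2
    for g in enumerate_genotypes(2, 3):
        print("/".join(map(str, g)), genotype_index(g))
```

For the diploid genotype j/k with j <= k, the sum reduces to k(k + 1)/2 + j, the usual diploid formula.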
|ploidy ≤ alleles||ploidy > alleles
|18794||18849||bcftools's normalization is buggy: variants were truncated despite having differing prefixes.|
|ploidy > alleles||-||-||-||-|
|#normalized after GATK||-||0||57||57 variants from GATK's normalization were left-aligned by vt. 6 were biallelic and 51 were multiallelic. Note that 2 variants were changed by GATK but were not completely normalized.|
|#normalized after vt||-||0||0||No variants processed by vt were further normalized.|