Relationship between Ploidy, Alleles and Genotypes
Revision as of 14:13, 31 January 2015
Introduction
The VCF format encodes genotypes by their index in the enumeration of genotypes given a ploidy number and a set of alleles. This allows direct access to a genotype's value within an array, which is particularly useful when working with genotype likelihoods.
Motivation
Plant species exhibit diverse ploidy levels; for example, the strawberry is an octoploid and the pear is a triploid.
While explicit functions for the haploid and diploid cases are easy to find online, closed forms for the general case seem difficult to find. This wiki fills that need.
The number of genotypes given a ploidy and alleles
<math>F(P,A) = \binom{P+A-1}{A-1} = \binom{P+A-1}{P}</math>
where P is the ploidy number and A is the number of alleles.
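As an illustration, the count can be sketched in C++, assuming the closed form F(P,A) = C(P+A-1, A-1) that follows from the derivation further down this page. The names `choose` and `count_genotypes` are hypothetical helpers introduced here, not part of any library, and 64-bit arithmetic is assumed to be large enough for the ploidies and allele counts of interest.

```cpp
#include <cassert>
#include <cstdint>

// Binomial coefficient C(n, k). The product is built so that after step i
// the running value equals C(n-k+i, i), which keeps every division exact.
uint64_t choose(uint64_t n, uint64_t k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;   // symmetry: C(n, k) = C(n, n-k)
    uint64_t result = 1;
    for (uint64_t i = 1; i <= k; ++i)
    {
        result = result * (n - k + i) / i;
    }
    return result;
}

// F(P, A): number of unordered genotypes for ploidy P and A alleles,
// assuming F(P, A) = C(P+A-1, A-1).
uint64_t count_genotypes(uint32_t P, uint32_t A)
{
    assert(P >= 1);             // a genotype has at least one allele copy
    if (A == 0) return 0;       // no alleles, no genotypes
    return choose(static_cast<uint64_t>(P) + A - 1, A - 1);
}
```

For example, `count_genotypes(2, 3)` gives the 6 diploid genotypes over three alleles, and `count_genotypes(3, 4)` gives 20.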
Getting the index of a genotype in an enumerated list given a ploidy and alleles
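<math>G(a_1, \ldots, a_P) = \sum_{k=1}^{P} \binom{a_k + k - 1}{k}</math>
where <math>a_1, \ldots, a_P</math> are the alleles in numeric encoding (0 to A-1) and are ordered (for example AB or ABCCCC). ACB, for example, is not ordered.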
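A minimal C++ sketch of the index calculation follows. It assumes the index is the sum of C(a_k + k - 1, k) over the ordered alleles a_1 <= ... <= a_P, with alleles encoded 0 to A-1 and the resulting index starting at 0. `choose` and `genotype_index` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Binomial coefficient C(n, k); every intermediate value is itself a
// binomial coefficient, so the integer divisions are exact.
uint64_t choose(uint64_t n, uint64_t k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;
    uint64_t result = 1;
    for (uint64_t i = 1; i <= k; ++i)
    {
        result = result * (n - k + i) / i;
    }
    return result;
}

// Zero-based index of an ordered genotype a_1 <= a_2 <= ... <= a_P,
// where alleles[k-1] holds a_k encoded as 0 to A-1.
uint64_t genotype_index(const std::vector<uint32_t>& alleles)
{
    uint64_t index = 0;
    for (std::size_t k = 1; k <= alleles.size(); ++k)
    {
        // the formula is only valid for ordered genotypes
        assert(k == 1 || alleles[k - 2] <= alleles[k - 1]);
        index += choose(static_cast<uint64_t>(alleles[k - 1]) + k - 1, k);
    }
    return index;
}
```

With A=0, B=1, C=2 and D=3, the genotype CCD is passed as `{2, 2, 3}` and gets index 2 + 3 + 10 = 15. Note that the number of alleles A does not appear in the calculation, only the encoded alleles themselves.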
Simple cases
Ploidy | Alleles | Genotypes | Index |
---|---|---|---|
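1 | A | <math>F(1,A) = A</math> | <math>G(a_1) = a_1</math> |
2 | A | <math>F(2,A) = A + \binom{A}{2} = \binom{A+1}{2}</math> | <math>G(a_1,a_2) = a_1 + \binom{a_2+1}{2}</math> |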
Derivation for counting the number of genotypes
For the case where A < P, there are always exactly P observed allele copies, drawn from at most A distinct alleles. This can be modeled by P+A-1 points, of which A-1 are chosen as dividers that separate the alleles. Thus the number of ways this can be observed is <math>\binom{P+A-1}{A-1}</math>.
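As a small worked example of the divider argument, take P = 3 and A = 2 (alleles A and B). There are P+A-1 = 4 points and one of them is chosen as the divider, so the count is C(4,1) = 4. The star-and-bar layout below is only an illustration of that argument.

```latex
% Stars are the P = 3 allele copies; the single bar is the A - 1 = 1 divider.
% Copies left of the bar are allele A, copies right of it are allele B.
\begin{array}{lcl}
{\star}{\star}{\star}\,|    & \mapsto & AAA \\
{\star}{\star}\,|\,{\star}  & \mapsto & AAB \\
{\star}\,|\,{\star}{\star}  & \mapsto & ABB \\
|\,{\star}{\star}{\star}    & \mapsto & BBB
\end{array}
\qquad
\binom{P+A-1}{A-1} = \binom{4}{1} = 4
```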
For the case where A >= P,
Derivation for getting the index of a genotype in an enumerated list
An important observation here is that, for a given ploidy P, the enumeration for A-1 alleles is a leading subsequence of the enumeration for A alleles.
Index | A=4,P=3 | A=3,P=3 | A=2,P=3 | A=1,P=3 |
---|---|---|---|---|
1 | AAA | AAA | AAA | AAA |
2 | AAB | AAB | AAB | |
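The alleles <math>a_1, \ldots, a_P</math> are ordered and encoded 0 to A-1. Clearly, when P = 1, the enumeration of the genotypes is trivially the same as that of the alleles:
<math>G(a_1) = a_1</math>
When P > 1, every genotype that uses only the alleles 0 to <math>a_P - 1</math> precedes the genotype in question, and there are <math>F(P, a_P)</math> of them, so
<math>G(a_1, \ldots, a_P) = F(P, a_P) + G(a_1, \ldots, a_{P-1})</math>
This gives a recursive relationship that is a chain of P-1 calculations.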
Index | Iteration 0 | Iteration 0 | Iteration 1 | Iteration 1 | Iteration 2 |
---|---|---|---|---|---|
1 | AAA | AAA | AA | AA | A |
2 | AAB | AAB | | | |
Function call | G(CCD) | F(3,3) | G(CC) | F(2,2) | G(C) |
Value returned | | 10 | | 3 | 3 |
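The recursive chain can be sketched in C++ as below, assuming G(a_1..a_P) = F(P, a_P) + G(a_1..a_{P-1}) with G(a_1) = a_1 and F(P,A) = C(P+A-1, A-1). The function names are hypothetical. The sketch returns a zero-based index, whereas the Index column in the tables on this page starts at 1, so its results are one less than the corresponding table row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Binomial coefficient C(n, k) with exact integer arithmetic.
uint64_t choose(uint64_t n, uint64_t k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;
    uint64_t result = 1;
    for (uint64_t i = 1; i <= k; ++i)
    {
        result = result * (n - k + i) / i;
    }
    return result;
}

// F(P, A) = C(P+A-1, A-1), taken to be 0 when there are no alleles.
uint64_t count_genotypes(uint64_t P, uint64_t A)
{
    return A == 0 ? 0 : choose(P + A - 1, A - 1);
}

// Zero-based G(a_1..a_P) for the first P entries of an ordered allele list.
// Each call strips the last allele, giving a chain of P-1 recursive steps.
uint64_t genotype_index_recursive(const std::vector<uint32_t>& alleles, std::size_t P)
{
    assert(P >= 1 && P <= alleles.size());
    const uint32_t last = alleles[P - 1];
    if (P == 1) return last;    // base case: G(a_1) = a_1
    return count_genotypes(P, last) + genotype_index_recursive(alleles, P - 1);
}
```

For CCD, encoded as `{2, 2, 3}`, the calls unfold as F(3,3) + F(2,2) + G(C) = 10 + 3 + 2 = 15.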
Algorithm for enumerating the genotypes given ploidy and alleles
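The code below enumerates the genotypes and can be used to test the above equations.
#include <cstdint>
#include <iostream>
#include <string>

uint32_t no = 0; // global running index of the next genotype printed

// Prints every genotype of ploidy P over the first A alleles (A, B, C, ...).
// Start the recursion with an empty string, e.g. print_genotypes(4, 3, "").
void print_genotypes(uint32_t A, uint32_t P, std::string genotype)
{
    if (genotype.size()==P)
    {
        std::cerr << no << ") " << genotype << "\n";
        ++no;
    }
    else
    {
        for (uint32_t a=0; a<A; ++a)
        {
            // prepend allele a; positions to its left may only use alleles 0 to a
            std::string s(1,(char)(a+65));
            s.append(genotype);
            print_genotypes(a+1, P, s);
        }
    }
}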
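One possible way to run such a test is sketched below. It collects the genotypes into a vector instead of printing them, then checks the list against two assumed closed forms: the count C(P+A-1, P) and the zero-based index given by the sum of C(a_k + k - 1, k) over the ordered alleles. `collect_genotypes` and `enumeration_matches_formulas` are hypothetical names, and the sketch assumes A is at most 26 so that alleles map to the letters A to Z.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Binomial coefficient C(n, k) with exact integer arithmetic.
uint64_t choose(uint64_t n, uint64_t k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;
    uint64_t result = 1;
    for (uint64_t i = 1; i <= k; ++i)
    {
        result = result * (n - k + i) / i;
    }
    return result;
}

// Same recursion as print_genotypes, but appends each finished genotype
// to `out` in enumeration order instead of printing it.
void collect_genotypes(uint32_t A, uint32_t P, const std::string& genotype,
                       std::vector<std::string>& out)
{
    if (genotype.size() == P)
    {
        out.push_back(genotype);
        return;
    }
    for (uint32_t a = 0; a < A; ++a)
    {
        std::string s(1, static_cast<char>('A' + a));
        s.append(genotype);
        collect_genotypes(a + 1, P, s, out);
    }
}

// Returns true if the enumeration for A alleles and ploidy P has the
// expected size and every genotype sits at its formula-derived index.
bool enumeration_matches_formulas(uint32_t A, uint32_t P)
{
    assert(A >= 1 && A <= 26 && P >= 1);
    std::vector<std::string> genotypes;
    collect_genotypes(A, P, "", genotypes);
    if (genotypes.size() != choose(static_cast<uint64_t>(P) + A - 1, P)) return false;
    for (std::size_t i = 0; i < genotypes.size(); ++i)
    {
        uint64_t index = 0;
        for (std::size_t k = 1; k <= P; ++k)
        {
            const uint64_t allele = static_cast<uint64_t>(genotypes[i][k - 1] - 'A');
            index += choose(allele + k - 1, k);
        }
        if (index != i) return false;
    }
    return true;
}
```

Calling `enumeration_matches_formulas(4, 3)` checks all 20 triploid genotypes over four alleles in one go.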
Maintained by
This page is maintained by Adrian.