Biostatistics 666: Introduction to the Coalescent

From Genome Analysis Wiki


This lecture introduces the coalescent as a practical way to model the properties of alleles in a population.

The coalescent is an extremely convenient modeling tool for studying population genetic properties, such as the number of variants in a region, their allele frequencies, and the linkage disequilibrium between them. We'll discuss the advantages of the coalescent over alternatives such as forward simulation and introduce some of the properties of genetic variation that can be studied with it.
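To make that convenience concrete, here is a minimal sketch (not taken from the lecture slides) of how the coalescent generates data: it follows only the n sampled lineages backwards in time instead of simulating a whole population forwards. It assumes the standard neutral coalescent with infinite-sites mutation and time measured in units of 2N generations, so that k lineages coalesce at rate k(k-1)/2 and each lineage accumulates mutations at rate \theta/2. The function names are hypothetical. The simulated mean number of segregating sites is compared with the coalescent expectation E[S] = \theta(1 + 1/2 + ... + 1/(n-1)).

```python
import random


def simulate_segregating_sites(n: int, theta: float, rng: random.Random) -> int:
    """Simulate the number of segregating sites S in a sample of n sequences.

    Assumes the standard neutral coalescent with infinite sites: time is in
    units of 2N generations, k lineages coalesce at rate k(k-1)/2, and each
    lineage mutates at rate theta/2 (theta = 4*N*mu).
    """
    k = n
    s = 0
    while k > 1:
        coal_rate = k * (k - 1) / 2
        mut_rate = k * theta / 2
        # Competing events: the next event is a mutation on one of the k
        # branches with probability mut_rate / (coal_rate + mut_rate);
        # otherwise two lineages merge into their common ancestor.
        if rng.random() < mut_rate / (coal_rate + mut_rate):
            s += 1  # infinite sites: every mutation creates a new segregating site
        else:
            k -= 1
    return s


def expected_segregating_sites(n: int, theta: float) -> float:
    """Coalescent expectation E[S] = theta * (1 + 1/2 + ... + 1/(n-1))."""
    return theta * sum(1 / i for i in range(1, n))


rng = random.Random(666)
n, theta, reps = 10, 5.0, 20000
mean_s = sum(simulate_segregating_sites(n, theta, rng) for _ in range(reps)) / reps
print(f"simulated mean S = {mean_s:.2f}, "
      f"expected = {expected_segregating_sites(n, theta):.2f}")
```

Because each event is either a mutation on one of the current branches or a coalescence, this sketch never has to store branch lengths or the tree topology just to count S; studying allele frequencies or linkage disequilibrium would require recording the genealogy as well.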

Tajima's D

This lecture introduces two different estimators of the population genetics parameter \theta, which is expected to equal 4N\mu (N is the effective population size and \mu the mutation rate per generation for the region studied) in constant-size populations at equilibrium. One estimator is based on the number of segregating sites, S, in a sample of sequences; the other is based on the average number of pairwise differences between two sequences in the sample. In constant-size, equilibrium populations evolving neutrally, both estimators have the same expected value, \theta. But when the population is growing, is under natural selection, or otherwise deviates from the neutral model, the two estimates can differ systematically.
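A minimal sketch of both estimators on a toy alignment follows. It assumes aligned sequences with no missing data, each site coded as a single character, and the infinite-sites model; the sample itself is made up for illustration. The segregating-sites estimator divides S by a_n = 1 + 1/2 + ... + 1/(n-1) because the coalescent predicts E[S] = \theta a_n. The estimator names used in the comments (Watterson's \theta_W and \theta_\pi) are the conventional labels, not terms from the lecture, and the function names are hypothetical.

```python
from itertools import combinations


def segregating_sites(haplotypes):
    """Count alignment columns where the sampled sequences are not all identical."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)


def watterson_theta(haplotypes):
    """Segregating-sites estimator: theta_W = S / a_n, a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(haplotypes)
    a_n = sum(1 / i for i in range(1, n))
    return segregating_sites(haplotypes) / a_n


def pairwise_theta(haplotypes):
    """Pairwise estimator: theta_pi = mean number of differences per pair of sequences."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


# Hypothetical sample: 4 aligned sequences, 5 sites, alleles coded 0/1.
haps = ["00110", "00100", "10100", "00001"]
print(segregating_sites(haps))  # 4
print(watterson_theta(haps))    # about 2.18 (= 24/11)
print(pairwise_theta(haps))     # 2.0
```

In this toy sample three of the four variable sites are singletons, which inflates S relative to the pairwise differences, so the pairwise estimate comes out below the segregating-sites estimate.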

Tajima's D is a statistic that compares the two estimators: it is the difference between the estimate of \theta based on pairwise differences and the estimate based on the number of segregating sites, divided by an estimate of the standard deviation of that difference. It is positive when the pairwise-differences estimate is larger than the segregating-sites estimate, and negative when it is smaller.
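The standardization can be sketched as follows. This is an illustration, not the lecture's own code: it takes the sample size n, the number of segregating sites S, and the mean number of pairwise differences pi as inputs, and it uses the variance constants published by Tajima (1989), which the lecture summary does not spell out. The function name is hypothetical. With fewer than four sequences the variance term is zero, so the sketch rejects that case, along with samples that have no segregating sites.

```python
import math


def tajimas_d(n: int, S: int, pi: float) -> float:
    """Tajima's D from sample size n, segregating sites S, and mean pairwise
    differences pi, using the normalizing constants of Tajima (1989)."""
    if n < 4 or S == 0:
        raise ValueError("need at least 4 sequences and at least 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: pairwise-differences estimate minus segregating-sites estimate.
    diff = pi - S / a1
    # Denominator: estimated standard deviation of that difference.
    return diff / math.sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical summary numbers: 4 sequences, 4 segregating sites,
# and an average of 2.0 pairwise differences.
print(round(tajimas_d(4, 4, 2.0), 2))  # -0.78
```

The sign follows directly from the numerator: D is negative whenever pi < S / a1, and zero when the two estimates agree exactly.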

Do you have an intuition about whether purifying selection might lead to positive or negative values of Tajima's D? What about population growth and population bottlenecks?


Slides in PDF Format

Background Reading

Hudson, R. R. (1990). Gene genealogies and the coalescent process. In D. Futuyma and J. Antonovics (eds.), Oxford Surveys in Evolutionary Biology, vol. 7. Oxford University Press, New York.