Minimac Diagnostics

From Genome Analysis Wiki

Revision as of 09:59, 21 October 2010

minimac is a tool for imputation of missing genotypes into phased haplotypes. At the end of each run, minimac generates summaries of imputation quality and stores those in a .info file.

Basic Descriptors

Marker and Allele Labels

The first three columns in the .info file list the marker name and its two alleles. Typically, the most common allele is listed first, but this is not guaranteed.
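Because allele order is not guaranteed, downstream code should not assume that allele 1 is the major allele. The following is a minimal Python sketch, assuming a whitespace-delimited .info data row whose first four fields are the marker name, allele 1, allele 2 and the estimated frequency of allele 1, as described on this page. The names `InfoRecord`, `parse_info_line` and `minor_allele` are hypothetical, and header-line handling is left out.

```python
from dataclasses import dataclass


@dataclass
class InfoRecord:
    marker: str
    allele1: str
    allele2: str
    freq1: float  # estimated frequency of allele 1 (fourth column)


def parse_info_line(line: str) -> InfoRecord:
    """Parse the leading columns of one data row of a .info file.

    ASSUMPTION: fields are whitespace-separated and ordered as
    marker name, allele 1, allele 2, frequency of allele 1, ...
    Any further columns are ignored here.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}")
    return InfoRecord(fields[0], fields[1], fields[2], float(fields[3]))


def minor_allele(rec: InfoRecord) -> str:
    """Return the less common allele, using frequency rather than column order.

    At exactly 0.5 this returns allele 2.
    """
    return rec.allele2 if rec.freq1 >= 0.5 else rec.allele1
```

For a hypothetical row such as `rs123 A G 0.21 0.87`, `minor_allele(parse_info_line(row))` returns `"A"` even though allele 1 is listed first.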

Estimated Allele Frequency

The next column in the .info file lists the estimated frequency of allele 1: the average number of imputed copies of allele 1 per individual, divided by two.
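In other words, the frequency is the mean imputed dosage divided by two. A minimal sketch, assuming a hypothetical list of per-individual dosages, each being the expected number of copies of allele 1 carried by one individual (a value between 0 and 2):

```python
from typing import Sequence


def allele1_frequency(dosages: Sequence[float]) -> float:
    """Estimated frequency of allele 1 from per-individual imputed dosages.

    ASSUMPTION: each dosage is the expected count of allele 1 for one
    individual (0 to 2). Average over individuals, then divide by two.
    """
    if len(dosages) == 0:
        raise ValueError("need at least one individual")
    return sum(dosages) / len(dosages) / 2.0
```

For example, `allele1_frequency([0.0, 1.0, 2.0, 0.9])` gives 3.9 / 4 / 2 = 0.4875, up to floating-point rounding.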

Estimated Imputation Accuracy

Frequency information is followed by an estimate of the squared correlation between imputed genotypes and true, unobserved genotypes. Since true genotypes are not available, this calculation is based on the idea that poorly imputed genotype counts will shrink towards their expectations based on population allele frequencies alone, specifically <math>2p</math>, where <math>p</math> is the frequency of the allele being imputed.

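Currently, minimac uses the following definition:

<math>\hat{r}^2 = {{Var(\mbox{Estimated Counts})}\over{\hat{p}(1-\hat{p})}}</math>

where <math>\hat{p}</math> is the estimated allele frequency.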
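The ratio can be computed directly from imputed values. The sketch below assumes that "Estimated Counts" are per-haplotype imputed probabilities of allele 1 (each between 0 and 1) and that <math>\hat{p}</math> is their mean; this page does not spell that out. Under that assumption <math>\hat{p}(1-\hat{p})</math> is the variance a perfectly known 0/1 allele would have. If the counts were instead per-individual dosages (0 to 2), the matching denominator would be <math>2\hat{p}(1-\hat{p})</math>. Whether the variance divides by n or n − 1 is also not stated here; this sketch uses n. The function name `estimated_rsq` is hypothetical.

```python
from typing import Sequence


def estimated_rsq(hap_probs: Sequence[float]) -> float:
    """Estimate r^2 = Var(estimated counts) / (p_hat * (1 - p_hat)).

    ASSUMPTION: hap_probs holds, for one marker, the imputed probability
    of allele 1 on each haplotype (values in [0, 1]); p_hat is their mean.
    The variance uses an n denominator (population variance).
    """
    n = len(hap_probs)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_hat = sum(hap_probs) / n
    expected_var = p_hat * (1.0 - p_hat)
    if expected_var <= 0.0:
        # Monomorphic estimate: the ratio is undefined; report 0 by convention here.
        return 0.0
    observed_var = sum((x - p_hat) ** 2 for x in hap_probs) / n
    return observed_var / expected_var
```

The shrinkage idea shows up at the extremes: confident calls such as `[1.0, 0.0, 0.0, 1.0]` give 1.0, while imputing every haplotype at the same value (for example `[0.3] * 10`) gives 0, because nothing beyond the population frequency was learned. Intermediate values such as `[0.8, 0.2, 0.2, 0.8]` give 0.36. Since each probability lies between 0 and 1, its variance cannot exceed <math>\hat{p}(1-\hat{p})</math>, so with an n denominator the ratio stays between 0 and 1.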

Leave One Out Statistics

looRsq : Estimated R-squared in Leave-One-Out Analysis

empR : Correlation Between Imputed and True Genotypes

empRsq : Squared Correlation Between Imputed and True Genotypes
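This revision of the page gives only the names of these statistics, so the following is a sketch under stated assumptions rather than a description of minimac's procedure. It assumes that, for a genotyped marker, the marker's own genotypes are hidden and re-imputed, giving a per-haplotype probability of allele 1, and that the observed alleles are coded 1 (allele 1) or 0 (allele 2). looRsq is then taken to be the same variance-ratio estimate applied to those leave-one-out imputations, empR the Pearson correlation between imputed and observed values, and empRsq its square. The names `pearson_r` and `loo_statistics` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        # Correlation is undefined when either input is constant; 0 by convention here.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def loo_statistics(
    loo_probs: Sequence[float], true_alleles: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (looRsq, empR, empRsq) for one genotyped marker.

    ASSUMPTION: loo_probs[i] is the imputed probability of allele 1 on
    haplotype i after the marker's own genotypes were hidden, and
    true_alleles[i] is the observed allele on that haplotype coded 1 or 0.
    """
    if len(loo_probs) != len(true_alleles) or len(loo_probs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 haplotypes")
    n = len(loo_probs)
    p_hat = sum(loo_probs) / n
    expected_var = p_hat * (1.0 - p_hat)
    observed_var = sum((x - p_hat) ** 2 for x in loo_probs) / n
    # looRsq: the variance-ratio estimate, computed on leave-one-out imputations.
    loo_rsq = observed_var / expected_var if expected_var > 0.0 else 0.0
    # empR / empRsq: direct comparison against the hidden, observed alleles.
    emp_r = pearson_r(loo_probs, true_alleles)
    return loo_rsq, emp_r, emp_r ** 2
```

With the hypothetical inputs `[0.9, 0.1, 0.2, 0.8]` and `[1, 0, 0, 1]`, looRsq comes out at 0.5 while empRsq is about 0.98. In this example the estimated measure is more conservative than the empirical one, because the imputed probabilities are well ordered but not close to 0 or 1.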