From Genome Analysis Wiki
→How do I get imputation quality estimates?
== How do I get imputation quality estimates? ==
A simple approach is to use --mask option. For example, --mask 0.02 masks 2% of the genotypes at random, impute them and compare with the masked original to estimate genotypic and allelic error rates. Messages like the following will be generated to stdout:
Comparing 948352 masked genotypes with MLE estimates ...
A better approach is to mask a small proportion of SNPs (vs. genotypes in the above simple approach). One can generate a mask.dat from the original .dat file by simply changing the flag of a subset of markers from M to S2 without duplicating the .ped file. Post-imputation, one can use [http://www.sph.umich.edu/csg/ylwtx/CalcMatch.1.0.5.tgz CalcMatch ]and [http://www.sph.umich.edu/csg/ylwtx/doseR2.tgz doseR2.pl ]to estimate genotypic/allelic error rate and correlation respectively. Both programs can be downloaded from [http://www.sph.umich.edu/csg/ylwtx/software.html http://www.sph.umich.edu/csg/ylwtx/software.html].
'''Warning''': Imputation involving masked datasets should be performed separately for imputation quality estimation. For production, one should use all available information.
== Shall I apply QC before or after imputation? If so, how? ==