Changes

From Genome Analysis Wiki
18 bytes removed ,  11:22, 2 February 2017
Line 62: Line 62:  
   Estimated per allele error rate is 0.0293  
   −
A better approach is to mask a small proportion of SNPs (vs. genotypes in the above simple approach). One can generate a mask.dat from the original .dat file by simply changing the flag of a subset of markers from M to S2 without duplicating the .ped file. Post-imputation, one can use   [http://genome.sph.umich.edu/wiki/CalcMatch CalcMatch ]and [http://www.sph.umich.edu/csg/ylwtx/doseR2.tgz doseR2.pl ]to estimate genotypic/allelic error rate and correlation respectively. Both programs can be downloaded from [http://www.sph.umich.edu/csg/ylwtx/software.html http://www.sph.umich.edu/csg/ylwtx/software.html].  
+
A better approach is to mask a small proportion of SNPs (vs. genotypes in the above simple approach). One can generate a mask.dat from the original .dat file by simply changing the flag of a subset of markers from M to S2 without duplicating the .ped file. Post-imputation, one can use   [http://genome.sph.umich.edu/wiki/CalcMatch CalcMatch ]and [http://csg.sph.umich.edu//ylwtx/doseR2.tgz doseR2.pl ]to estimate genotypic/allelic error rate and correlation respectively. Both programs can be downloaded from [http://csg.sph.umich.edu//ylwtx/software.html http://csg.sph.umich.edu//ylwtx/software.html].  
    
'''Warning''': Imputation involving masked datasets should be performed separately for imputation quality estimation. For production, one should use all available information.
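
The marker-masking step described above can be sketched in a few lines of Python. This is a minimal illustration, not part of MaCH: the function name <code>mask_markers</code>, the 20% masking fraction, and the assumption that each marker row is a whitespace-separated flag followed by a marker name (for example <code>M rs123</code>) are all hypothetical choices made for the example. Rows that do not start with the flag M are passed through unchanged.

```python
import random


def mask_markers(dat_lines, fraction=0.05, seed=0):
    """Re-flag a random subset of 'M' rows as 'S2'.

    Assumes (hypothetically) that marker rows look like 'M <name>'.
    Returns (new_lines, masked_names); the input list is not modified.
    """
    rng = random.Random(seed)
    # Indices of rows that describe a marker (flag M plus a name).
    marker_rows = []
    for i, line in enumerate(dat_lines):
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "M":
            marker_rows.append(i)

    n_mask = round(len(marker_rows) * fraction)
    chosen = set(rng.sample(marker_rows, n_mask))

    new_lines, masked_names = [], []
    for i, line in enumerate(dat_lines):
        if i in chosen:
            fields = line.split()
            masked_names.append(fields[1])
            # Only the flag changes; the marker name is kept as-is.
            new_lines.append(" ".join(["S2"] + fields[1:]))
        else:
            new_lines.append(line.rstrip("\n"))
    return new_lines, masked_names


# Toy input: ten made-up marker names, not real data.
original = ["M rs%d" % k for k in range(1, 11)]
masked, names = mask_markers(original, fraction=0.2, seed=1)
print("\n".join(masked))
print("masked markers:", names)
```

Writing the returned lines to mask.dat and keeping the list of masked names makes it straightforward to restrict the post-imputation comparison to the masked SNPs. The .ped file is left untouched, as the text above notes.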
Line 69: Line 69:  
In the simple approach, you will only get concordance/error estimates. There are two aspects to check. (1) The ratio between the genotypic error and the allelic error. We expect only a small proportion of errors to be cases where one homozygote is imputed as the other homozygote, so most errors involve a single allele. Therefore, a ~2:1 ratio is expected. (2) The absolute error rate. Several factors influence imputation quality, including the population to be imputed, the reference population and the genotyping panel used. Typically, we expect an allelic error rate below 2% among Caucasians and East Asians, and 3-5% among Africans and African Americans. The figure below shows imputation quality from the Human Genome Diversity Project (HGDP) for 52 populations across the world, by HapMap reference panel.
   −
http://www.sph.umich.edu/csg/yli/figure3.gif
+
http://csg.sph.umich.edu//yli/figure3.gif
    
Table 3 in the MaCH 1.0 paper  tabulates imputation quality by commercial panel in CEU, YRI, and CHB+JPT.
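
To make the expected ~2:1 ratio between genotypic and allelic error concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the method used by MaCH or CalcMatch: genotypes are assumed to be coded as the number of copies of one allele (0, 1 or 2), the function name <code>error_rates</code> is hypothetical, and the toy genotypes are made up.

```python
def error_rates(true_genotypes, imputed_genotypes):
    """Return (genotypic_error, allelic_error).

    Assumes genotypes are allele counts (0, 1 or 2), so the number of
    mismatched alleles at a genotype is the absolute difference.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    n = len(true_genotypes)
    pairs = list(zip(true_genotypes, imputed_genotypes))
    genotype_errors = sum(1 for t, g in pairs if t != g)
    allele_errors = sum(abs(t - g) for t, g in pairs)
    # Each genotype carries two alleles, hence the 2 * n denominator.
    return genotype_errors / n, allele_errors / (2 * n)


# Toy data: two of ten genotypes are wrong, each by a single allele.
truth = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
imputed = [0, 1, 1, 1, 0, 2, 2, 1, 0, 2]
genotypic, allelic = error_rates(truth, imputed)
print(genotypic, allelic, genotypic / allelic)
```

When every error involves a single allele, as in the toy data, the genotypic error is exactly twice the allelic error. A homozygote imputed as the opposite homozygote counts as one genotype error but two allele errors, so a ratio well below 2:1 suggests an unusual number of such swaps.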
Line 79: Line 79:  
== How do I get reference files for a region of interest? ==
   −
Note that you do not need to extract regional pedigree files for your own samples because SNPs in pedigree but not in reference will be automatically discarded. <br> 1. For HapMapII format, download haplotypes from http://www.sph.umich.edu/csg/ylwtx/HapMapForMach.tgz <br> 2. For MACH format, you can do the following:  
+
Note that you do not need to extract regional pedigree files for your own samples because SNPs in pedigree but not in reference will be automatically discarded. <br> 1. For HapMapII format, download haplotypes from http://csg.sph.umich.edu//ylwtx/HapMapForMach.tgz <br> 2. For MACH format, you can do the following:  
    
*First, find the first and last SNP in the region you are interested in, say "rsFIRST" and "rsLAST", defined according to position.
Line 216: Line 216:     
== Install MaCH ==
   −
We have source codes available through the MaCH download page: http://www.sph.umich.edu/csg/yli/mach/download/ <br>
+
We have source codes available through the MaCH download page: http://csg.sph.umich.edu//yli/mach/download/ <br>
    
== More questions?  ==
    
Email [mailto:yunli@med.unc.edu Yun Li] or [mailto:goncalo@umich.edu Goncalo Abecasis].