Changes

From Genome Analysis Wiki
38 bytes added, 08:23, 17 November 2011
Line 73: Line 73:  
We strongly recommend QC both before and after imputation. Before imputation, we recommend the standard battery of QC filters, including HWE, MAF (recommended cutoff is 1% for genotyping-based GWAS), completeness, and Mendelian inconsistency. Post-imputation, we recommend an Rsq cutoff of 0.3 (which removes >70% of poorly-imputed SNPs at the cost of <0.5% of well-imputed SNPs) and a MAF cutoff of 1%.
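The post-imputation cutoffs above could be applied with a short awk filter. This is only a minimal sketch: the file names are hypothetical, the toy values are made up, and the column names (SNP, Al1, Al2, Freq1, MAF, Quality, Rsq) are assumed to match the header of your imputation info file. Whether SNPs sitting exactly on a cutoff are kept is also an assumption here (they are).

```shell
# Toy info file; file name and values are made up for illustration.
cat > toy.info <<'EOF'
SNP Al1 Al2 Freq1 MAF Quality Rsq
rs1 A G 0.80 0.20 0.99 0.95
rs2 C T 0.995 0.005 0.99 0.90
rs3 A C 0.60 0.40 0.70 0.10
rs4 G T 0.97 0.03 0.95 0.30
EOF

# Look up the MAF and Rsq columns by header name (assumed names),
# print the header, then keep SNPs with Rsq >= 0.3 and MAF >= 0.01.
awk -v rsq_min=0.3 -v maf_min=0.01 '
NR == 1 {
    for (i = 1; i <= NF; i++) col[$i] = i
    r = col["Rsq"]; m = col["MAF"]
    print
    next
}
$r >= rsq_min && $m >= maf_min
' toy.info > toy.filtered.info
```

Looking the columns up by name rather than by fixed position keeps the filter working if the column order differs, as long as the header uses the assumed names.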
   −
== How do I get reference files for a region of interest? ==
+
== How do I get reference files for a region of interest? ==
   −
Note that you do not need to extract regional pedigree files for your own samples because SNPs in pedigree but not in reference will be automatically discarded. <br>
+
Note that you do not need to extract regional pedigree files for your own samples because SNPs in pedigree but not in reference will be automatically discarded. <br> 1. For HapMapII format, download haplotypes from http://www.sph.umich.edu/csg/ylwtx/HapMapForMach.tgz <br> 2. For MACH format, you can do the following:  
1. For HapMapII format, download haplotypes from http://www.sph.umich.edu/csg/ylwtx/HapMapForMach.tgz <br>
  −
2. For MACH format, you can do the following:  
      
*First, find the first and last SNP in the region you are interested in, say "rsFIRST" and "rsLAST", defined according to position.
Line 83: Line 81:     
   @ first = `grep -nw rsFIRST orig.snps | cut -f1 -d ':'`
  @ last = `grep -nw rsLAST orig.snps | cut -f1 -d ':'`
+
@ last = `grep -nw rsLAST orig.snps | cut -f1 -d ':'`
   −
*Finally (assuming the third field contains the actual haplotypes, where alleles are separated by whitespace):
+
*Then find which field contains the actual haplotypes. Fields are separated by whitespace, so the word count of the first line gives the number of fields:
 +
  head -1 orig.hap | wc -w
   −
   awk '{print $3}' orig.hap | cut -c${first}-${last} &gt; region.hap
+
* Finally:
 +
 
 +
   awk '{print $'''3'''}' orig.hap | cut -c${first}-${last} &gt; region.hap
    
The created reference files are in MaCH format. You do NOT need to turn on the --hapmapFormat option.
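The extraction steps above use csh syntax (`@ first = ...`). As a rough sketch, the same steps might look like this in a POSIX shell, run end to end on a toy example. The file contents are made up, rs20 and rs40 stand in for rsFIRST and rsLAST, and the haplotype string is assumed to be the last whitespace-separated field, with one character per SNP.

```shell
# Toy MaCH-style reference files (contents made up for illustration).
printf '%s\n' rs10 rs20 rs30 rs40 rs50 > orig.snps
cat > orig.hap <<'EOF'
FAM1->ID1 HAPLO1 ACGTA
FAM1->ID1 HAPLO2 ACGTT
FAM2->ID2 HAPLO1 GCATA
EOF

# 1-based line numbers of the boundary SNPs in the SNP list
# (rs20 and rs40 are placeholders for rsFIRST and rsLAST).
first=$(grep -nw rs20 orig.snps | cut -f1 -d ':')
last=$(grep -nw rs40 orig.snps | cut -f1 -d ':')

# Number of whitespace-separated fields on the first line; the haplotype
# string is assumed to be the last one. $(( )) strips any padding from wc.
field=$(( $(head -1 orig.hap | wc -w) ))

# Print the haplotype field and keep only the characters inside the region.
awk -v f="$field" '{ print $f }' orig.hap | cut -c"${first}"-"${last}" > region.hap
```

Passing the field number to awk with `-v` avoids hard-coding `$3`, which matters if your haplotype file has a different number of leading ID fields.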