From Genome Analysis Wiki


38 bytes added, 08:23, 17 November 2011
We strongly recommend QC both before and after imputation. Before imputation, we recommend the standard battery of QC filters, including HWE, MAF (the recommended cutoff is 1% for genotyping-based GWAS), completeness, Mendelian inconsistency, etc. After imputation, we recommend an Rsq cutoff of 0.3 (which removes >70% of poorly imputed SNPs at the cost of <0.5% of well-imputed SNPs) and a MAF cutoff of 1%.
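As a rough illustration of the post-imputation filter, the sketch below applies the two cutoffs with awk to a toy per-SNP summary file. The file names (toy.info, toy.filtered.info), the data, and the column layout (SNP Al1 Al2 Freq1 MAF Quality Rsq) are assumptions made for illustration, as is treating both cutoffs as inclusive (>=). Check the header of your own output file before adapting it.

```shell
#!/bin/sh
# Sketch: post-imputation filter on Rsq and MAF (toy data; column layout is an assumption).
set -eu

# Toy per-SNP summary with a header row; assumed columns: SNP Al1 Al2 Freq1 MAF Quality Rsq.
cat > toy.info <<'EOF'
SNP Al1 Al2 Freq1 MAF Quality Rsq
rsA A G 0.80 0.20 0.99 0.95
rsB C T 0.995 0.005 0.99 0.90
rsC G T 0.70 0.30 0.75 0.10
rsD A C 0.60 0.40 0.90 0.45
EOF

# Locate the MAF and Rsq columns by header name, then keep the header
# plus every SNP with Rsq >= 0.3 and MAF >= 0.01.
awk -v rsq_min=0.3 -v maf_min=0.01 '
NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; print; next }
$col["Rsq"] >= rsq_min && $col["MAF"] >= maf_min
' toy.info > toy.filtered.info

cat toy.filtered.info
```

Looking the columns up by header name rather than by position keeps the filter working if the column order in your file differs from the one assumed here.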
== How do I get reference files for a region of interest? ==
Note that you do not need to extract regional pedigree files for your own samples, because SNPs that are in the pedigree file but not in the reference will be automatically discarded. <br>1. For HapMapII format, download haplotypes from <br>2. For MaCH format, you can do the following:
*First, find the first and last SNP in the region you are interested in, say "rsFIRST" and "rsLAST", defined according to position. Then look up their line numbers in the SNP list (csh syntax):
@ first = `grep -nw rsFIRST orig.snps | cut -f1 -d ':'`
@ last = `grep -nw rsLAST orig.snps | cut -f1 -d ':'`
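*Then find out which field contains the actual haplotypes. Fields are separated by whitespace, and the haplotype string is typically the last field, so the word count of the first line gives its field number: head -1 orig.hap | wc -w
*Finally (assuming the third field contains the haplotypes; otherwise replace '''3''' with the field number found in the previous step): awk '{print $'''3'''}' orig.hap | cut -c${first}-${last} > region.hap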
The created reference files are in MaCH format. You do NOT need to turn on the --hapmapFormat option.
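The steps above can be sketched end to end on toy data. This is a minimal POSIX sh version (not csh), and the file names (toy.snps, toy.hap, toy_region.snps, toy_region.hap) and their contents are hypothetical. The step that writes toy_region.snps is an addition not shown in the commands above, included on the assumption that the SNP list should match the trimmed haplotypes.

```shell
#!/bin/sh
# Sketch: extract a region from MaCH-style reference files (toy data, hypothetical names).
set -eu

# Toy SNP list: five SNPs, one per line, in position order.
printf '%s\n' rs1 rsFIRST rs3 rsLAST rs5 > toy.snps

# Toy haplotypes: two label fields, then one allele string (one character per SNP).
printf '%s\n' 'FAM1->IND1 HAPLO1 ACGTA' 'FAM1->IND1 HAPLO2 AGTAC' > toy.hap

# Line numbers of the boundary SNPs; -w avoids matching rsFIRST inside a longer ID.
first=$(grep -nw rsFIRST toy.snps | cut -f1 -d ':')
last=$(grep -nw rsLAST toy.snps | cut -f1 -d ':')

# Number of whitespace-separated fields on the first line.
# In this toy file the haplotype string is the last field.
field=$(head -1 toy.hap | wc -w | tr -d ' ')

# Assumption: keep the SNP names for the region as well (lines first..last).
sed -n "${first},${last}p" toy.snps > toy_region.snps

# Keep only the allele characters that correspond to the region.
awk -v f="$field" '{print $f}' toy.hap | cut -c"${first}-${last}" > toy_region.hap

cat toy_region.snps
cat toy_region.hap
```

Passing the field number to awk with -v avoids hard-coding the 3, so the same commands work when the haplotype string sits in a different field.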
