From Genome Analysis Wiki
38 bytes added, 08:23, 17 November 2011
Line 73: |
Line 73: |
| We strongly recommend QC both before and after imputation. Before imputation, we recommend the standard battery of QC filters including HWE, MAF (recommended cutoff is 1% for genotyping-based GWAS), completeness, Mendelian inconsistency etc. Post-imputation, we recommend Rsq 0.3 (which removes >70% of poorly-imputed SNPs at the cost of <0.5% well-imputed SNPs) and MAF of 1%. | | We strongly recommend QC both before and after imputation. Before imputation, we recommend the standard battery of QC filters including HWE, MAF (recommended cutoff is 1% for genotyping-based GWAS), completeness, Mendelian inconsistency etc. Post-imputation, we recommend Rsq 0.3 (which removes >70% of poorly-imputed SNPs at the cost of <0.5% well-imputed SNPs) and MAF of 1%. |
| | | |
− | == How do I get reference files for a region of interest? == | + | == How do I get reference files for a region of interest? == |
| | | |
− | Note that you do not need to extract regional pedigree files for your own samples because SNPs in pedigree but not in reference will be automatically discarded. <br> | + | Note that you do not need to extract regional pedigree files for your own samples because SNPs in pedigree but not in reference will be automatically discarded. <br> 1. For HapMapII format, download haplotypes from http://www.sph.umich.edu/csg/ylwtx/HapMapForMach.tgz <br> 2. For MACH format, you can do the following: |
− | 1. For HapMapII format, download haplotypes from http://www.sph.umich.edu/csg/ylwtx/HapMapForMach.tgz <br> | |
− | 2. For MACH format, you can do the following: | |
| | | |
| *First, find the first and last SNP in the region you are interested in. Say "rsFIRST" and "rsLAST", defined according to position. | | *First, find the first and last SNP in the region you are interested in. Say "rsFIRST" and "rsLAST", defined according to position. |
Line 83: |
Line 81: |
| | | |
| @ first = `grep -nw rsFIRST orig.snps | cut -f1 -d ':'` | | @ first = `grep -nw rsFIRST orig.snps | cut -f1 -d ':'` |
− | @ last = `grep -nw rsLAST orig.snps | cut -f1 -d ':'` | + | @ last = `grep -nw rsLAST orig.snps | cut -f1 -d ':'` |
| | | |
− | *Finally (assuming the third field contains the actual haplotypes, where alleles are separated by whitespace): | + | *Then find out which field contains the actual haplotypes, where alleles are separated by whitespace: |
| + | head -1 orig.hap | wc -w |
| | | |
− | awk '{print $3}' orig.hap | cut -c${first}-${last} > region.hap | + | * Finally: |
| + | |
| + | awk '{print $'''3'''}' orig.hap | cut -c${first}-${last} > region.hap |
| | | |
| The created reference files are in MaCH format. You do NOT need to turn on --hapmapFormat option. | | The created reference files are in MaCH format. You do NOT need to turn on --hapmapFormat option. |
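The steps above can be tried end to end on toy data. The following is a minimal sketch in POSIX sh rather than the csh syntax (`@ first = ...`) used above. It makes several assumptions that are not from the original text: the toy file contents are made up, `rs2` and `rs4` stand in for rsFIRST and rsLAST, the haplotype string is taken to be the last whitespace-separated field with one character per SNP, and the `region.snps` file is a guess that a matching regional SNP list is also wanted.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy inputs (hypothetical data, for illustration only).
# orig.snps: one SNP ID per line, ordered by position.
printf '%s\n' rs1 rs2 rs3 rs4 rs5 > orig.snps
# orig.hap: one haplotype per line; the last field is assumed to hold
# the alleles as a single string, one character per SNP.
printf '%s\n' 'FAM1->ID1 HAPLO1 ACGTA' 'FAM1->ID1 HAPLO2 CTGAT' > orig.hap

# Line numbers of the boundary SNPs (rs2/rs4 stand in for rsFIRST/rsLAST).
first=$(grep -nw rs2 orig.snps | cut -f1 -d ':')
last=$(grep -nw rs4 orig.snps | cut -f1 -d ':')

# Count the fields on the first line; the haplotype string is assumed
# to be the last one. tr strips padding that some wc versions emit.
field=$(head -1 orig.hap | wc -w | tr -d ' ')

# Assumption: a regional SNP list matching the haplotypes is also needed.
sed -n "${first},${last}p" orig.snps > region.snps

# Keep only the haplotype field, then cut the characters for the region.
awk -v f="$field" '{print $f}' orig.hap | cut -c"${first}-${last}" > region.hap

cat region.hap
```

With real data, replace the toy files with your own `orig.snps` and `orig.hap`, and check the field count before trusting the `awk` step, since the sketch only works when each SNP occupies exactly one character in the haplotype string.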