Changes

From Genome Analysis Wiki

MaCH FAQ

344 bytes added, 17:38, 15 January 2012
How do I get reference files for a region of interest?
*First, identify the first and last SNPs, by position, in the region you are interested in. Call them "rsFIRST" and "rsLAST".
*Then, look up the line numbers of these two SNPs in the SNP list. Under csh:
@ first = `grep -nw rsFIRST orig.snps | cut -f1 -d ':'`
@ last = `grep -nw rsLAST orig.snps | cut -f1 -d ':'`
Under bash:
first=`grep -nw rsFIRST orig.snps | cut -f1 -d ':'`
last=`grep -nw rsLAST orig.snps | cut -f1 -d ':'`
*Then find out which whitespace-separated field contains the actual haplotypes. The command below counts the fields on the first line; the haplotype string is the last one:
head -1 orig.hap | wc -w
Note: if the haplotypes are gzip-compressed, use:
zcat orig.hap.gz | head -1 | wc -w
*Finally, extract the region. The example assumes the wc -w command above returned 3; if you got a different number, replace the 3 in bold below with it:
awk '{print $'''3'''}' orig.hap | cut -c${first}-${last} > region.hap
 
Note: if the haplotypes are gzip-compressed, use:
zcat orig.hap.gz | awk '{print $'''3'''}' | cut -c${first}-${last} > region.hap
The resulting reference files are in MaCH format, so you do NOT need to turn on the --hapmapFormat option.
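The steps above can be tried end to end on a toy data set. The following is only a sketch: the SNP names, the two haplotype lines, and the three-field layout of orig.hap are made up for illustration, and the field number is passed to awk as a variable instead of being typed in by hand. The last command, which writes a matching region.snps, is not part of the recipe above; it assumes you also want a SNP list that covers the same lines as the extracted haplotypes.

```shell
#!/bin/sh
# Toy demonstration only: all file contents and rs IDs below are invented.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# orig.snps: one SNP ID per line, in position order (assumed layout)
printf '%s\n' rs1 rs2 rs3 rs4 rs5 > orig.snps

# orig.hap: two label fields, then one allele character per SNP (assumed layout)
printf '%s\n' 'ID1->ID1 HAPLO1 ACGTA' 'ID1->ID1 HAPLO2 AGGTC' > orig.hap

# Line numbers of the first and last SNP in the region (here rs2 and rs4)
first=$(grep -nw rs2 orig.snps | cut -f1 -d ':')
last=$(grep -nw rs4 orig.snps | cut -f1 -d ':')

# Number of whitespace-separated fields; the haplotype string is the last one
field=$(head -1 orig.hap | wc -w | tr -d ' ')

# Keep only the haplotype field, then only the characters for the region
awk -v f="$field" '{print $f}' orig.hap | cut -c"${first}-${last}" > region.hap

# Assumption, not in the recipe above: write the matching SNP list as well
sed -n "${first},${last}p" orig.snps > region.snps
```

With real data, replace rs2 and rs4 with your own rsFIRST and rsLAST and skip the two printf lines that create the toy files.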