MaCH (MArkov Chain Haplotyping), best known as software for genotype imputation, is a Hidden Markov Model (HMM) based haplotyper that reconstructs haplotypes from the genotypes of unrelated individuals. Its three primary uses are to (1) resolve haplotypes from diploid genotypes; (2) impute missing genotypes; and (3) perform disease mapping analysis.
MaCH takes unphased genotypes of unrelated individuals as input. Two input files are mandatory: a pedigree file and a marker information file. The pedigree file stores five key pieces of information plus the genotypes for each individual; missing genotypes are accepted and additional phenotype fields are allowed. The marker information file lists the marker names. Note that the list must follow the physical order of the markers along the chromosome. For more details, refer to
Each person occupies one line in the pedigree file. Required fields are (1) the first five fixed fields, corresponding to the five key pieces of information (family, person, father, mother, sex), and (2) the genotype fields. Phenotype fields are allowed in between but are not used by the program.
<sample.ped>
fam1 indiv1 0 0 1 0/0 2/3 ./.
fam2 indiv2 0 0 2 1/2 2/2 1/4
<EOF sample.ped>
This sample.ped contains two individuals. The first is from family fam1, has person ID indiv1, and has no parental information available (father = 0, mother = 0). This person is male (sex = 1). His genotypes are missing at the first and third markers (0/0 and ./.), and his genotype at the second marker is 2/3 (C/G). Similarly, the second individual is from family fam2, has person ID indiv2, and has no parental information available (father = 0, mother = 0). This person is female (sex = 2). Her genotypes are 1/2 (A/C) at the first marker, 2/2 (homozygous for C) at the second marker, and 1/4 (A/T) at the third marker.
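To make the field layout concrete, here is a minimal Python sketch that splits a pedigree line into the five fixed fields and the genotype fields. It is not part of MaCH; the names `Individual`, `parse_ped_line`, and `MISSING_ALLELES` are hypothetical, and the sketch assumes whitespace-separated fields, alleles separated by "/", and no phenotype fields between the fixed fields and the genotypes.

```python
from typing import NamedTuple

# Allele codes treated as missing in this sketch (taken from the sample file).
MISSING_ALLELES = {"0", "."}


class Individual(NamedTuple):
    family: str
    person: str
    father: str
    mother: str
    sex: int
    genotypes: list  # one (allele1, allele2) tuple per marker, or None if missing


def parse_ped_line(line):
    """Split one pedigree line into the five fixed fields plus genotypes."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least five fixed fields: " + line)
    family, person, father, mother, sex = fields[:5]
    genotypes = []
    # Assumption: every remaining field is a genotype (no phenotype columns).
    for field in fields[5:]:
        allele1, allele2 = field.split("/")
        if allele1 in MISSING_ALLELES or allele2 in MISSING_ALLELES:
            genotypes.append(None)
        else:
            genotypes.append((allele1, allele2))
    return Individual(family, person, father, mother, int(sex), genotypes)


SAMPLE_PED = """fam1 indiv1 0 0 1 0/0 2/3 ./.
fam2 indiv2 0 0 2 1/2 2/2 1/4"""

individuals = [parse_ped_line(l) for l in SAMPLE_PED.splitlines() if l.strip()]
for ind in individuals:
    print(ind.family, ind.person, ind.sex, ind.genotypes)
```

A pedigree file with phenotype columns would need those columns skipped before the genotype loop; which columns those are is described by the marker information file, not by the pedigree file itself.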
Marker Information File
<sample.dat>
M SNP1
M SNP2
M SNP3
<EOF sample.dat>
This file tells us that fields 6-8 of the pedigree file store the genotypes for SNP1-3, respectively. Note again that the SNPs must be listed in their physical order along the chromosome.
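The correspondence between marker lines and pedigree fields can be sketched in Python as follows. This is an illustrative check rather than anything MaCH ships; `read_marker_names` and `marker_columns` are hypothetical helpers, and the sketch assumes every marker line has the form "M name" and that genotypes start at field 6 (no phenotype fields).

```python
def read_marker_names(dat_text):
    """Return marker names from lines of the form 'M <name>', in file order."""
    names = []
    for line in dat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "M":
            names.append(parts[1])
    return names


def marker_columns(names, first_genotype_field=6):
    """Map each marker name to its 1-based field number in the pedigree file."""
    return {name: first_genotype_field + i for i, name in enumerate(names)}


SAMPLE_DAT = """M SNP1
M SNP2
M SNP3"""

names = read_marker_names(SAMPLE_DAT)
columns = marker_columns(names)

# Consistency check against one pedigree line: five fixed fields plus one
# genotype field per marker.
ped_fields = "fam1 indiv1 0 0 1 0/0 2/3 ./.".split()
counts_match = len(ped_fields) - 5 == len(names)

print(columns)
print(counts_match)
```

Running a check like this before phasing can catch a pedigree file whose genotype count does not match the marker list. It cannot verify that the markers are in physical order, since the marker information file carries no positions.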
Input Files:
  --datfile   Marker information file for subjects under study.
  --pedfile   Pedigree file for subjects under study.
Q: Where can I find combined HapMap reference files?
Q: Where can I find HapMap III reference files?
Q: Does --mle overwrite fed-in genotypes?
A: Yes, but rarely. --mle outputs the most likely genotype at each marker, obtained by integrating over the probabilities of all possible configurations based on the reference haplotypes. An input genotype is overwritten only when this most likely genotype differs from the experimentally observed one.
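The overwrite rule can be illustrated with a small Python sketch. It does not reproduce MaCH's internals or its output format: the genotype labels, the function names, and the probabilities below are all made up for illustration, and the sketch assumes the per-genotype probabilities have already been computed.

```python
# Hypothetical genotype labels for a biallelic marker.
GENOTYPES = ("1/1", "1/2", "2/2")


def most_likely_genotype(posteriors):
    """Return the genotype label with the highest probability."""
    best = max(range(len(GENOTYPES)), key=lambda i: posteriors[i])
    return GENOTYPES[best]


def would_overwrite(observed, posteriors):
    """True when an observed genotype differs from the most likely one."""
    return observed is not None and most_likely_genotype(posteriors) != observed


# Made-up probabilities, for illustration only.
agrees = would_overwrite("1/2", (0.05, 0.90, 0.05))   # most likely call matches
differs = would_overwrite("1/2", (0.70, 0.25, 0.05))  # most likely call is 1/1
print(agrees, differs)
```

In the first case the observed genotype is kept; in the second, the most likely genotype would replace it.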
mach1 -d sample.dat -p sample.ped -s hapmap.snps -h hapmap.hap -r 100 -o phase