An earlier version of this page is available at http://
MACH input files include information on experimental genotypes for a set of individuals and, optionally, on a set of known haplotypes. MACH can use these to estimate haplotypes for each sampled individual (conditional on the observed genotypes) or to fill in missing genotypes (conditional on observed genotypes at flanking markers and on the observed genotypes in other individuals). Since an essential first step in any analysis is to make sure the data are formatted correctly, it is worthwhile to go over the input files MACH expects and their formats.
The essential input for MACH is a set of observed genotypes for each individual being studied. Typically, MACH expects that all the markers being examined map to one chromosome and that they appear in map order in the input files. These requirements can be relaxed when using phased haplotypes as input (see below).
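The two requirements above (one chromosome, map order) can be checked before running MACH. The following is a minimal Python sketch, not part of MACH itself. It assumes you already have the marker names in the order they appear in your input files, plus a lookup table of chromosome and position for each marker. The function name <code>check_marker_order</code>, the <code>marker_map</code> structure, and the marker names in the usage note are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def check_marker_order(
    markers: List[str],
    marker_map: Dict[str, Tuple[str, int]],
) -> List[str]:
    """Return a list of problems found; an empty list means the markers
    map to a single chromosome and appear in map order.

    markers    -- marker names in the order they appear in the input files
    marker_map -- hypothetical lookup: marker name -> (chromosome, position)
    """
    problems: List[str] = []

    # Markers without a known map position cannot be checked.
    for name in markers:
        if name not in marker_map:
            problems.append(f"{name}: no map position available")

    known = [name for name in markers if name in marker_map]

    # Requirement 1: all markers map to one chromosome.
    chromosomes = sorted({marker_map[name][0] for name in known})
    if len(chromosomes) > 1:
        problems.append(
            "markers span multiple chromosomes: " + ", ".join(chromosomes)
        )

    # Requirement 2: markers appear in map order (non-decreasing position).
    for previous, current in zip(known, known[1:]):
        prev_chrom, prev_pos = marker_map[previous]
        cur_chrom, cur_pos = marker_map[current]
        if prev_chrom == cur_chrom and cur_pos < prev_pos:
            problems.append(
                f"{current} appears after {previous} but maps before it"
            )

    return problems
```

For example, with a made-up map in which <code>rs3</code> lies between <code>rs1</code> and <code>rs2</code>, calling <code>check_marker_order(["rs1", "rs2", "rs3"], marker_map)</code> reports that <code>rs3</code> is out of order, while <code>["rs1", "rs3", "rs2"]</code> returns an empty list.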
MACH expects observed genotype data to be stored in a set of matched pedigree and data files. The two files are intrinsically linked: the data file describes the contents of the pedigree file (every pedigree file is slightly different), and the pedigree file itself can only be decoded with its companion data file. The two files can use either the more modern [[Merlin]] / [[QTDT]] format or the classic [[LINKAGE]] format. Detailed descriptions of each format are available elsewhere (for example, see [http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html details of Merlin input formats]), and here we focus on providing an overview of the bare essentials required for using MACH.
Data files can describe a variety of fields, including disease status information, quantitative traits and covariates, and marker genotypes. A minimal MACH data file simply lists names for a series of genetic markers. Each marker name appears on its own line, prefaced by an "M" field code. Here is an example: