Changes

From Genome Analysis Wiki
178 bytes added, 11:31, 2 February 2017
Line 1: Line 1:
−
An earlier version of this page is available at http://www.sph.umich.edu/csg/abecasis/MaCH/tour/input_files.html.
+
An earlier version of this page is available at http://csg.sph.umich.edu//abecasis/MaCH/tour/input_files.html.
    
MACH input files include information on experimental genotypes for a set of individuals and, optionally, on a set of known haplotypes. MACH can use these to estimate haplotypes for each sampled individual (conditional on the observed genotypes) or to fill in missing genotypes (conditional on observed genotypes at flanking markers and on the observed genotypes in other individuals). Since an essential first step in any analysis is to make sure data is formatted correctly, it is worthwhile to go over the input files MACH expects and their formats.
Line 6: Line 6:  
The essential inputs for MACH are a set of observed genotypes for each individual being studied. Typically, MACH expects that all the markers being examined map to one chromosome and that they appear in map order in the input files. These requirements can be relaxed when using phased haplotypes as input (see below).
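The two expectations stated here (a single chromosome, markers in map order) can be checked before running an analysis. The following Python sketch is only an illustration: the helper name `check_map_order`, the tuple layout, and the marker names and positions are hypothetical and are not part of MACH.

```python
def check_map_order(markers):
    """Check a list of (name, chromosome, position) tuples given in file order.

    Raises ValueError if the markers span more than one chromosome or if
    any marker appears before a marker with a smaller map position.
    """
    chromosomes = {chrom for _, chrom, _ in markers}
    if len(chromosomes) > 1:
        raise ValueError(f"markers map to several chromosomes: {sorted(chromosomes)}")
    for (prev_name, _, prev_pos), (name, _, pos) in zip(markers, markers[1:]):
        if pos < prev_pos:
            raise ValueError(f"{name} ({pos}) is listed after {prev_name} ({prev_pos})")
    return True


# Hypothetical marker list, already in map order on one chromosome.
example_markers = [
    ("markerA", "20", 1000),
    ("markerB", "20", 2500),
    ("markerC", "20", 7200),
]
ordered = check_map_order(example_markers)
```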
   −
MACH expects observed genotype data to be stored in a set of matched pedigree and data files. The two files are intrinsically linked: the data file describes the contents of the pedigree file (every pedigree file is slightly different), and the pedigree file itself can only be decoded with its companion data file. The two files can use either the more modern [[Merlin]] / [[QTDT]] format or the classic [[LINKAGE]] format. Detailed descriptions of each format are available elsewhere (for example, see [http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html details of Merlin input formats]), and here we focus on providing an overview of the bare essentials required for using MACH.
+
MACH expects observed genotype data to be stored in a set of matched pedigree and data files. The two files are intrinsically linked: the data file describes the contents of the pedigree file (every pedigree file is slightly different), and the pedigree file itself can only be decoded with its companion data file. The two files can use either the more modern [[Merlin]] / [[QTDT]] format or the classic [[LINKAGE]] format. Detailed descriptions of each format are available elsewhere (for example, see [http://csg.sph.umich.edu//abecasis/Merlin/tour/input_files.html details of Merlin input formats]), and here we focus on providing an overview of the bare essentials required for using MACH.
    
Data files can describe a variety of fields, including disease status information, quantitative traits and covariates, and marker genotypes. A simple MACH data file simply lists names for a series of genetic markers. Each marker name appears on its own line, prefaced by an "M" field code. Here is an example:
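As a rough sketch of the layout just described, the following Python snippet builds the text of a marker-only data file, one "M" line per marker. The helper `format_dat_lines` and the marker names are hypothetical, and the check that a name contains no whitespace is an assumption about the format rather than something stated here.

```python
def format_dat_lines(marker_names):
    """Return one 'M <name>' line per marker, in the order given."""
    lines = []
    for name in marker_names:
        name = name.strip()
        # Assumption: a marker name is a single whitespace-free token.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"marker name must be a single non-empty token: {name!r}")
        lines.append(f"M {name}")
    return lines


# Hypothetical marker names, listed in map order.
markers = ["marker1", "marker2", "marker3"]
dat_text = "\n".join(format_dat_lines(markers)) + "\n"
print(dat_text, end="")
```

The resulting text could then be saved as the data file that accompanies the pedigree file.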
Line 59: Line 59:  
You can retrieve a current set of phased HapMap format haplotypes from http://hapmap.org/downloads/phasing/2007-08_rel22/phased/.  
   −
HapMap III phased haplotypes are in a different format, so you will need to use our converted haplotypes, available at http://www.sph.umich.edu/csg/yli/mach/download/HapMap3.r2.b36.html
+
HapMap III phased haplotypes are in a different format, so you will need to use our converted haplotypes, available at http://csg.sph.umich.edu//yli/mach/download/HapMap3.r2.b36.html
 +
 
 +
Additional reference files (e.g., those based on data from the 1000 Genomes Project; combined reference files) can be found through links at http://csg.sph.umich.edu//yli/mach/download/
    
Phased haplotype information is encoded in two files. The first file (which MACH calls the "snp file") lists the markers in the phased haplotype. The second file (which MACH calls the "haplotype file") lists one haplotype per line. If you retrieved these files from the HapMap website, simply combine the --hapmapFormat option with the --snp option to indicate the name of the HapMap legend file and the --haps option to indicate the name of the file with phased haplotypes. Here is an example:
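To illustrate how the three options named above fit together, the following Python sketch assembles an argument list and prints it as a shell command. Only --hapmapFormat, --snp and --haps come from the text; the executable name `mach1`, the helper `build_hapmap_args`, and the file names are assumptions, and the options that name the pedigree and data files are left out because they are not given here.

```python
import shlex


def build_hapmap_args(snp_file, haps_file, executable="mach1"):
    """Combine --hapmapFormat with --snp and --haps, as described in the text.

    The executable name is an assumption; a real run would also need the
    options that identify the pedigree and data files.
    """
    return [executable, "--hapmapFormat", "--snp", snp_file, "--haps", haps_file]


# Hypothetical file names for the HapMap legend and phased haplotype files.
args = build_hapmap_args("legend.txt", "phased.txt")
command_line = shlex.join(args)
print(command_line)
```

Building the command as a list keeps file names with spaces intact if the list is later passed to a process launcher instead of being printed.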