Difference between revisions of "MaCH"

From Genome Analysis Wiki
 
(8 intermediate revisions by 3 users not shown)
Line 7: Line 7:
 
* The MaCH tutorial at http://www.sph.umich.edu/csg/abecasis/MaCH/tour/
 
 
* [[MaCH FAQ|The MaCH FAQ]]
 
 +
* [[MaCH Options|MaCH Options]]
 +
* [[MaCH: Input Files|Information on MaCH input formats]]
 +
* [[MaCH: 1000 Genomes Imputation Cookbook|1000 Genomes Imputation Cookbook]]
 +
* [[MaCH: machX|Chromosome X Imputation]]
 +
* [[Mach2dat: Association with MACH output]]
  
 
Currently, it also includes random notes on input file formats, but these probably need to be cleaned up!
 
  
== Input Files  ==
+
[[Category:Software]]
 
 
MaCH requires a pedigree file and a data file as input. The pedigree file stores genotypes and phenotypes for each individual. The data file describes the contents of the pedigree file. Both files should be in [[Merlin]] format, which is described at http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html
 
 
 
Typically, MaCH expects markers to be ordered by position in the data and pedigree files.
 
 
 
=== Pedigree File (mandatory)  ===
 
 
 
Each person is listed on a separate line of the pedigree file. Each line should start with 5 canonical fields, which are:
 
 
 
* Family ID
 
* Individual ID
 
* Father ID (typically zero for unrelated individuals)
 
* Mother ID (typically zero for unrelated individuals)
 
* Sex (encoded as M or 1 for males and F or 2 for females)
 
 
 
These 5 canonical fields, which are present in all pedigree files, will typically be followed by a series of marker genotypes. Here is an example of a complete pedigree file:
 
 
 
<sample.ped>
 
  fam1 indiv1 0 0 1 ./. C/G ./.
 
  fam2 indiv2 0 0 2 A/C C/C A/T
 
 
 
This sample pedigree contains 2 individuals.
 
 
 
The first individual is a male from family ''fam1'' and is named ''indiv1''. Parental codes are set to zero; since there is no individual named ''0'' in ''fam1'', these simply indicate the individual is a founder. The sex code is 1, indicating the individual is a male. Two of the genotypes for this individual are missing and have been set to ''./.''. Missing genotypes can also be coded as ''0/0''.
 
 
 
The second individual is a female founder from family ''fam2'' with ID ''indiv2''. There are no missing genotypes for this individual.
 
 
 
In a file that includes only unrelated individuals, you could set family and individual ids to be identical for every individual. Alternatively, you could simply set the individual id or family id to be ''1'' for all individuals.
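The layout described above can be checked with a short script. The following is only a minimal sketch in Python, not part of MaCH: the names <code>Person</code> and <code>parse_pedigree_line</code> are hypothetical, and it assumes genotypes are written as slash-separated allele pairs exactly as in the sample pedigree. Sex codes other than M, 1, F and 2 are rejected, which is an assumption of this sketch rather than documented MaCH behaviour.

```python
from typing import List, NamedTuple, Optional, Tuple

# A genotype is a pair of alleles, or None when it is missing.
Genotype = Optional[Tuple[str, str]]


class Person(NamedTuple):
    """Hypothetical record for one line of a pedigree file."""
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str  # normalised to "M" or "F"
    genotypes: List[Genotype]


# Sex is encoded as M or 1 for males and F or 2 for females.
SEX_CODES = {"M": "M", "1": "M", "F": "F", "2": "F"}
# Missing genotypes are coded as ./. or 0/0.
MISSING = {"./.", "0/0"}


def parse_pedigree_line(line: str) -> Person:
    """Split one pedigree line into the 5 canonical fields plus genotypes."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("a pedigree line needs at least 5 fields")
    family_id, individual_id, father_id, mother_id, sex = fields[:5]
    if sex not in SEX_CODES:
        # Assumption of this sketch: any other sex code is treated as an error.
        raise ValueError(f"unexpected sex code: {sex!r}")
    genotypes: List[Genotype] = []
    for field in fields[5:]:
        if field in MISSING:
            genotypes.append(None)
        else:
            allele1, allele2 = field.split("/")
            genotypes.append((allele1, allele2))
    return Person(family_id, individual_id, father_id, mother_id,
                  SEX_CODES[sex], genotypes)


sample_ped = """\
fam1 indiv1 0 0 1 ./. C/G ./.
fam2 indiv2 0 0 2 A/C C/C A/T
"""

people = [parse_pedigree_line(line)
          for line in sample_ped.splitlines() if line.strip()]
for person in people:
    is_founder = person.father_id == "0" and person.mother_id == "0"
    print(person.individual_id, person.sex, "founder" if is_founder else "")
```

Run on the sample pedigree, the script reports both individuals as founders and records two of the three genotypes of ''indiv1'' as missing.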
 
 
 
=== Marker Information File (mandatory)  ===
 
 
 
An example data file that matches the pedigree above might be:
 
 
 
<sample.dat>
 
  M SNP1
 
  M SNP2
 
  M SNP3
 
 
 
This file tells us that the three fields that follow the sex code in the pedigree file store genotypes for 3 markers, named SNP1, SNP2 and SNP3. MaCH expects SNPs to be ordered by position.
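Because the data file describes the columns of the pedigree file, a quick consistency check is to count the marker lines and compare that count with the number of genotype fields on each pedigree line. The sketch below illustrates this in Python; the function names are hypothetical, and it only understands the <code>M</code> (marker) lines shown in the example above. Any other line type is rejected, which is a simplification made for this sketch.

```python
from typing import List


def read_marker_names(dat_text: str) -> List[str]:
    """Return marker names from data-file lines of the form 'M <name>'."""
    names = []
    for line in dat_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2 or fields[0] != "M":
            # Simplification: only marker lines are handled here.
            raise ValueError(f"unsupported data file line: {line!r}")
        names.append(fields[1])
    return names


def genotype_field_count(ped_line: str) -> int:
    """Count the fields that follow the 5 canonical pedigree fields."""
    return len(ped_line.split()) - 5


sample_dat = """\
M SNP1
M SNP2
M SNP3
"""

sample_ped = """\
fam1 indiv1 0 0 1 ./. C/G ./.
fam2 indiv2 0 0 2 A/C C/C A/T
"""

markers = read_marker_names(sample_dat)
mismatched = [line for line in sample_ped.splitlines()
              if line.strip() and genotype_field_count(line) != len(markers)]
print(markers, "mismatched lines:", len(mismatched))
```

An empty <code>mismatched</code> list means every pedigree line carries exactly one genotype per marker named in the data file.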
 
 
 
=== Optional Input files  ===
 
 
 
==== Reference Haplotypes  ====
 
 
 
Reference haplotypes (typically from HapMap or 1000 Genomes) are optional. MaCH 1.0 accepts two different formats: MACH format or HapMap format.
 
 
 
===== MACH format SNP File  =====
 
 
 
One line per SNP, each containing a single field: the marker name.
 
 
 
For example:
 
 
 
  marker1
 
  marker2
 
  ...
 
 
 
===== MACH format Haplotype File  =====
 
 
 
One line per haplotype. Each haplotype can be preceded by a series of annotation columns.
 
 
 
For example:
 
 
 
  H_0001->H_0001 HAPLO1 CGGCGCGCTTGGC
 
  H_0001->H_0001 HAPLO2 CGGCGCGTCCAGC
 
  H_0002->H_0002 HAPLO1 GGGCGCGCTTGGC
 
  H_0002->H_0002 HAPLO2 GGAAGCACTCGGC
 
  ...
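A MACH format haplotype file is only useful if every haplotype covers the same markers as the SNP file. The following Python sketch shows one way to check this. The function names are hypothetical, and the sketch assumes that the alleles form a single unbroken string in the last column, as in the example above, with all earlier columns treated as annotation.

```python
from typing import List, Tuple


def parse_haplotype_line(line: str) -> Tuple[List[str], str]:
    """Split a line into its annotation columns and the allele string."""
    fields = line.split()
    if not fields:
        raise ValueError("empty haplotype line")
    # Assumption: the last column holds the alleles, one character per SNP.
    return fields[:-1], fields[-1]


def check_haplotypes(snp_count: int, hap_lines: List[str]) -> List[str]:
    """Return the allele strings, checking each covers snp_count markers."""
    haplotypes = []
    for number, line in enumerate(hap_lines, start=1):
        _annotations, alleles = parse_haplotype_line(line)
        if len(alleles) != snp_count:
            raise ValueError(
                f"haplotype {number} has {len(alleles)} alleles, "
                f"expected {snp_count}")
        haplotypes.append(alleles)
    return haplotypes


sample_haps = [
    "H_0001->H_0001 HAPLO1 CGGCGCGCTTGGC",
    "H_0001->H_0001 HAPLO2 CGGCGCGTCCAGC",
    "H_0002->H_0002 HAPLO1 GGGCGCGCTTGGC",
    "H_0002->H_0002 HAPLO2 GGAAGCACTCGGC",
]

# The example haplotypes above are 13 alleles long, so a matching SNP file
# would list 13 marker names.
haplotypes = check_haplotypes(13, sample_haps)
print(len(haplotypes), "haplotypes of length", len(haplotypes[0]))
```

In practice the expected count would come from the number of lines in the SNP file rather than a constant.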
 
 
 
===== HapMap format reference files  =====
 
 
 
HapMap format SNP File: the legend file downloaded from the HapMap website
 
 
 
HapMap format Haplotype File: the phase file downloaded from the HapMap website
 
 
 
HapMap format files can be downloaded from http://hapmap.org/downloads/phasing/2006-07_phaseII/phased/ or http://hapmap.org/downloads/phasing/2007-08_rel22/phased/
 
 
 
HapMapIII phased haplotypes are in a different format, so you will need to use our converted haplotypes, available at http://www.sph.umich.edu/csg/yli/mach/download/HapMap3.r2.b36.html
 
 
 
When using HapMap format files, turn on the --hapmapFormat option. For example:
 
 
 
  mach1 -d sample.dat -p sample.ped -s genotypes_chr14_CEU_r21_nr_fwd_legend.txt -h genotypes_chr14_CEU_r21_nr_fwd_phased.gz --hapmapFormat ...
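When many chromosomes are processed from a script, it can help to assemble this command programmatically so that the --hapmapFormat option is never forgotten for HapMap format files. The Python sketch below builds the argument list using only the options shown in the example above; the helper name <code>build_mach_command</code> is hypothetical, and the further options elided in the example are deliberately not filled in.

```python
import shlex
from typing import List


def build_mach_command(dat: str, ped: str, snps: str, haps: str,
                       hapmap_format: bool = False) -> List[str]:
    """Assemble a mach1 argument list from the options shown on this page."""
    command = ["mach1", "-d", dat, "-p", ped, "-s", snps, "-h", haps]
    if hapmap_format:
        # Needed when the SNP and haplotype files are in HapMap format.
        command.append("--hapmapFormat")
    return command


command = build_mach_command(
    dat="sample.dat",
    ped="sample.ped",
    snps="genotypes_chr14_CEU_r21_nr_fwd_legend.txt",
    haps="genotypes_chr14_CEU_r21_nr_fwd_phased.gz",
    hapmap_format=True,
)

# Print the command for inspection; any further options would be appended
# to the list before passing it to subprocess.run.
print(shlex.join(command))
```

Keeping the command as a list avoids shell quoting problems when file names contain spaces.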
 

Latest revision as of 08:34, 26 November 2010

MaCH is a tool for haplotyping, genotype imputation and disease association analysis developed by Goncalo Abecasis and Yun Li. MaCH was first used to impute missing genotypes in our FUSION genomewide association study (Scott et al, Science, 2007) and has since been used in the analysis of many other GWAS.

This page includes links to several useful MaCH related resources.

Currently, it also includes random notes on input file formats, but these probably need to be cleaned up!