Changes

From Genome Analysis Wiki
54 bytes added, 06:00, 4 June 2010
== Input Files  ==
 
MaCH takes unphased genotypes as input and requires two files: a pedigree file and a data file. The pedigree file stores genotypes and phenotypes for each individual. The data file describes the contents of the pedigree file. Both files should be in [[Merlin]] format, which is described at http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html

MaCH expects markers in the data and pedigree files to be ordered by physical position along the chromosome.

=== Pedigree File (mandatory) ===

Each person is listed on a separate line of the pedigree file. Each line starts with the following 5 canonical fields:
* Family ID
* Individual ID
* Father ID (typically zero for unrelated individuals)
* Mother ID (typically zero for unrelated individuals)
* Sex (encoded as M or 1 for males and F or 2 for females)

These 5 canonical fields, which are present in all pedigree files, will typically be followed by a series of marker genotypes. Here is an example of a complete pedigree file:

<sample.ped>
  fam1 indiv1 0 0 1 ./. C/G ./.
  fam2 indiv2 0 0 2 A/C C/C A/T
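To make the column layout concrete, here is a minimal Python sketch that splits one line of ''sample.ped'' into the five canonical fields and a list of genotypes. It assumes whitespace-separated fields and that every column after the sex code is a genotype (no phenotype columns). The names PedRecord, parse_ped_line and the other helpers are hypothetical, not part of MaCH. Both ''./.'' and ''0/0'' are treated as missing genotypes.

```python
from typing import List, NamedTuple, Optional, Tuple

# Assumption: every column after the five canonical fields is a genotype
# (no phenotype columns), and fields are separated by whitespace.
MISSING_CODES = {"./.", "0/0"}  # both missing-genotype codes used on this page

Genotype = Optional[Tuple[str, str]]  # None means the genotype is missing


class PedRecord(NamedTuple):
    family: str
    individual: str
    father: str
    mother: str
    sex: str  # normalised to "M" or "F"
    genotypes: List[Genotype]

    def is_founder(self) -> bool:
        # Parental IDs of 0 mark a founder (no parents listed in the file).
        return self.father == "0" and self.mother == "0"


def parse_sex(code: str) -> str:
    # Sex is encoded as M or 1 for males and F or 2 for females.
    if code in ("M", "1"):
        return "M"
    if code in ("F", "2"):
        return "F"
    raise ValueError(f"unrecognised sex code: {code!r}")


def parse_genotype(token: str) -> Genotype:
    if token in MISSING_CODES:
        return None
    allele1, sep, allele2 = token.partition("/")
    if not sep or not allele1 or not allele2:
        raise ValueError(f"malformed genotype: {token!r}")
    return (allele1, allele2)


def parse_ped_line(line: str) -> PedRecord:
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("a pedigree line needs at least the 5 canonical fields")
    family, individual, father, mother, sex = fields[:5]
    genotypes = [parse_genotype(tok) for tok in fields[5:]]
    return PedRecord(family, individual, father, mother, parse_sex(sex), genotypes)


if __name__ == "__main__":
    for ped_line in ["fam1 indiv1 0 0 1 ./. C/G ./.", "fam2 indiv2 0 0 2 A/C C/C A/T"]:
        rec = parse_ped_line(ped_line)
        print(rec.individual, rec.sex, rec.is_founder(), rec.genotypes)
```

Calling parse_ped_line on the second example line yields a female founder with three complete genotypes.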
This sample pedigree contains 2 individuals.

The first individual is a male from family ''fam1'' with ID ''indiv1''. Parental codes are set to zero; since there is no individual named ''0'' in ''fam1'', these simply indicate that the individual is a founder. The sex code of 1 indicates a male. Two of the genotypes for this individual (the first and third markers) are missing and have been set to ''./.''. Missing genotypes can also be coded as ''0/0''.

The second individual is a female founder from family ''fam2'' with ID ''indiv2''. There are no missing genotypes for this individual.

In a file that includes only unrelated individuals, you could set the family and individual IDs to be identical for every individual. Alternatively, you could set either the family ID or the individual ID to ''1'' for all individuals, as long as the other ID is unique to each individual.
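For a file of unrelated individuals, the first convention (identical family and individual IDs, parents set to 0) can be generated mechanically. The Python sketch below is illustrative only: unrelated_ped_line and format_genotype are hypothetical helpers, missing genotypes are written as ''./.'', and sex is written as 1 or 2.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[str, str]]  # None means the genotype is missing


def format_genotype(genotype: Genotype) -> str:
    # Missing genotypes are written as "./.", one of the accepted missing codes.
    if genotype is None:
        return "./."
    return f"{genotype[0]}/{genotype[1]}"


def unrelated_ped_line(sample_id: str, sex: str, genotypes: Sequence[Genotype]) -> str:
    """Build one pedigree line for an unrelated individual.

    Family and individual IDs are both set to sample_id; father and mother are 0.
    """
    sex_code = {"M": "1", "F": "2"}[sex]  # raises KeyError for anything but "M"/"F"
    fields = [sample_id, sample_id, "0", "0", sex_code]
    fields.extend(format_genotype(g) for g in genotypes)
    return " ".join(fields)


if __name__ == "__main__":
    print(unrelated_ped_line("indiv1", "M", [None, ("C", "G"), None]))
    print(unrelated_ped_line("indiv2", "F", [("A", "C"), ("C", "C"), ("A", "T")]))
```

Because each person gets a distinct family ID, no two lines can collide even though every individual is a founder.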
=== Marker Information File (mandatory) ===

An example data file that matches the pedigree above might be:

<sample.dat>
   M SNP1
   M SNP2
   M SNP3
This file tells us that the three fields following the sex code in the pedigree file (fields 6 to 8) store genotypes for 3 markers, named SNP1, SNP2 and SNP3. Each ''M'' line declares one marker, in the same order as the genotype columns. MaCH expects SNPs to be ordered by position.
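The correspondence between the M lines of the data file and the genotype columns of the pedigree file can be checked with a short Python sketch. It assumes a data file containing only M lines and rejects anything else rather than guessing at phenotype columns; read_marker_names and genotypes_by_marker are hypothetical helper names.

```python
from typing import Dict, Iterable, List


def read_marker_names(dat_lines: Iterable[str]) -> List[str]:
    """Collect marker names from "M <name>" lines of a data file.

    Assumption: the file contains only marker (M) lines; any other record
    type is rejected rather than interpreted.
    """
    names = []
    for line in dat_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2 or fields[0] != "M":
            raise ValueError(f"unsupported data file line: {line.strip()!r}")
        names.append(fields[1])
    return names


def genotypes_by_marker(marker_names: List[str], ped_line: str) -> Dict[str, str]:
    # Genotype columns start right after the 5 canonical fields (field 6 onwards)
    # and follow the same order as the M lines in the data file.
    genotype_fields = ped_line.split()[5:]
    if len(genotype_fields) != len(marker_names):
        raise ValueError(
            f"{len(marker_names)} markers in data file but "
            f"{len(genotype_fields)} genotype columns in pedigree line"
        )
    return dict(zip(marker_names, genotype_fields))


if __name__ == "__main__":
    markers = read_marker_names(["M SNP1", "M SNP2", "M SNP3"])
    print(genotypes_by_marker(markers, "fam2 indiv2 0 0 2 A/C C/C A/T"))
```

A count mismatch between M lines and genotype columns is the most common way the two files drift apart, so the sketch fails loudly in that case.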
 
=== Optional Input files  ===
==== Reference Haplotypes ====

Reference haplotypes (typically from HapMap or the 1000 Genomes Project) are optional. They are supplied as a SNP file and a haplotype file, which MaCH 1.0 accepts in one of two formats: MACH format or HapMap format.
    
===== MACH format SNP File =====
      
One line per SNP, containing only one field: the marker name.
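A MACH format SNP file is simple enough to validate line by line. In this Python sketch, read_mach_snp_file is a hypothetical helper; it assumes blank lines may be skipped and rejects any line with more than one field.

```python
from typing import Iterable, List


def read_mach_snp_file(lines: Iterable[str]) -> List[str]:
    """Read a MACH format SNP file: one line per SNP, one field (the marker name)."""
    names = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are tolerated and skipped
        if len(fields) != 1:
            raise ValueError(f"line {lineno}: expected 1 field, found {len(fields)}")
        names.append(fields[0])
    return names


if __name__ == "__main__":
    print(read_mach_snp_file(["SNP1\n", "SNP2\n", "SNP3\n"]))
```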
 
===== MACH format Haplotype File =====

One line per haplotype. Each haplotype can be preceded by a series of annotation columns.
 
For example:
   H_0001->H_0001 HAPLO1 CGGCGCGCTTGGC
   H_0001->H_0001 HAPLO2 CGGCGCGTCCAGC
   H_0002->H_0002 HAPLO1 GGGCGCGCTTGGC
   H_0002->H_0002 HAPLO2 GGAAGCACTCGGC
   ...
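Because the annotation columns are optional, one way to read a haplotype line is to take the last field as the haplotype and treat everything before it as annotations. This Python sketch does that; parse_haplotype_line is hypothetical, and the optional length check assumes one allele per SNP listed in the SNP file (each example haplotype above has 13 alleles).

```python
from typing import List, NamedTuple, Optional


class Haplotype(NamedTuple):
    annotations: List[str]  # optional leading columns, e.g. "H_0001->H_0001", "HAPLO1"
    alleles: str            # the haplotype itself


def parse_haplotype_line(line: str, n_snps: Optional[int] = None) -> Haplotype:
    # Assumption: the haplotype is the last whitespace-separated field and
    # every earlier field is an annotation column.
    fields = line.split()
    if not fields:
        raise ValueError("empty haplotype line")
    *annotations, alleles = fields
    # Assumption: each haplotype carries one allele per SNP in the SNP file.
    if n_snps is not None and len(alleles) != n_snps:
        raise ValueError(f"expected {n_snps} alleles, found {len(alleles)}")
    return Haplotype(annotations, alleles)


if __name__ == "__main__":
    hap = parse_haplotype_line("H_0001->H_0001 HAPLO1 CGGCGCGCTTGGC", n_snps=13)
    print(hap.annotations, hap.alleles)
```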
    
===== HapMap format reference files  =====
 
HapMap format SNP File: legend file downloaded from the HapMap website

HapMap format Haplotype File: phase file downloaded from the HapMap website
    
HapMap format files can be downloaded from http://hapmap.org/downloads/phasing/2006-07_phaseII/phased/ or http://hapmap.org/downloads/phasing/2007-08_rel22/phased/  
 
When using HapMap format files, turn on the --hapmapFormat option. For example:
      
   mach1 -d sample.dat -p sample.ped -s genotypes_chr14_CEU_r21_nr_fwd_legend.txt -h genotypes_chr14_CEU_r21_nr_fwd_phased.gz --hapmapFormat ...
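When calling mach1 from a script, the only difference between the two reference formats described here is whether --hapmapFormat is added. This Python sketch assembles the argument list using only the flags shown on this page (-d, -p, -s, -h, --hapmapFormat); mach1_reference_args is a hypothetical helper, and it assumes MACH format SNP and haplotype files are passed through the same -s and -h options. The remaining options (the ... in the command above) are not filled in and must be appended by the caller.

```python
from typing import List


def mach1_reference_args(dat: str, ped: str, snp_file: str, hap_file: str,
                         hapmap_format: bool) -> List[str]:
    """Assemble mach1 arguments for genotype input plus reference haplotypes.

    Only the flags shown on this page are used; further analysis options
    still have to be appended by the caller.
    """
    args = ["mach1", "-d", dat, "-p", ped, "-s", snp_file, "-h", hap_file]
    if hapmap_format:
        # HapMap legend/phase files need --hapmapFormat; MACH format files do not.
        args.append("--hapmapFormat")
    return args


if __name__ == "__main__":
    print(" ".join(mach1_reference_args(
        "sample.dat", "sample.ped",
        "genotypes_chr14_CEU_r21_nr_fwd_legend.txt",
        "genotypes_chr14_CEU_r21_nr_fwd_phased.gz",
        hapmap_format=True,
    )))
```

Once the remaining options are appended, the list could be passed to subprocess.run(args, check=True).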
 
==== Parameter files  ====
 