Changes

From Genome Analysis Wiki
25 bytes added, 10:25, 4 June 2010
Line 8: Line 8:  
Data files can describe a variety of fields, including disease status information, quantitative traits and covariates, and marker genotypes. A simple MACH data file lists names for a series of genetic markers. Each marker name appears on its own line, prefaced by an "M" field code. Here is an example:
   −
'''<Example of a simple data file>'''
+
  '''<Example of a simple data file>'''
 
   M marker1
   M marker2
   ...
'''<End of simple data file>'''
+
  '''<End of simple data file>'''
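
As a rough illustration of the layout described above, the following Python sketch collects marker names from lines of the form `M <name>`. The function name `read_marker_names` is hypothetical and not part of MACH; the sketch assumes whitespace-separated fields and simply skips any line that does not begin with the `M` field code.

```python
def read_marker_names(lines):
    """Collect marker names from 'M <name>' lines of a simple data file.

    Hypothetical helper for illustration; not part of MACH itself.
    """
    names = []
    for line in lines:
        fields = line.split()
        # Keep only lines whose first field is the "M" field code;
        # blank lines and other field codes are skipped.
        if len(fields) >= 2 and fields[0] == "M":
            names.append(fields[1])
    return names


example = ["  M marker1", "  M marker2"]
print(read_marker_names(example))
```

The same function can be applied to an open file handle, since iterating over a text file yields its lines.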
    
The actual genotypes are stored in a pedigree file, which encodes one individual per row. Each row should start with a family ID and individual ID, followed by a father ID and mother ID (both typically set to 0, 'zero', for unrelated individuals), and sex. These initial columns are followed by a series of marker genotypes, each with two alleles. We recommend coding alleles as A, C, G, T. For compatibility with older analysis tools, it is also possible to encode alleles as 1 (for A), 2 (for C), 3 (for G) and 4 (for T). See below for an example:
   −
'''<Example of a pedigree file with base-pair coded alleles>'''
+
  '''<Example of a pedigree file with base-pair coded alleles>'''
FAM1001  ID1234  0  0  M  A A  A C  C C
+
  FAM1001  ID1234  0  0  M  A A  A C  C C
FAM1002  ID5678  0  0  F  A C  C C  G G
+
  FAM1002  ID5678  0  0  F  A C  C C  G G
...
+
  ...
'''<End of pedigree file>'''
+
  '''<End of pedigree file>'''
   −
Although we don't recommend it, it is possible to use a pedigree file with numerically coded alleles. For an example, follow [[MaCH: Pedigree with Integer Allele Codes|this link]]
+
Although we don't recommend it, it is possible to use a pedigree file with numerically coded alleles. For an example, see [[MaCH: Pedigree with Integer Allele Codes|obsolete input formats]].
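
To make the row layout concrete, here is a minimal Python sketch that splits one pedigree row into its five leading columns and a list of two-allele genotypes, translating the numeric codes 1 to 4 into A, C, G, T as described above. The names `PedigreeRow`, `NUMERIC_TO_BASE` and `parse_pedigree_row` are hypothetical, and the sketch assumes whitespace-separated fields; it does not handle missing-genotype codes or any other input conventions MACH may accept.

```python
from typing import NamedTuple

# Numeric allele codes and their base-pair equivalents, as listed in the text.
NUMERIC_TO_BASE = {"1": "A", "2": "C", "3": "G", "4": "T"}


class PedigreeRow(NamedTuple):
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str
    genotypes: list  # one (allele1, allele2) tuple per marker


def parse_pedigree_row(line):
    """Parse one pedigree row (hypothetical helper, not part of MACH)."""
    fields = line.split()
    # Five leading columns, then an even number of allele columns.
    if len(fields) < 5 or (len(fields) - 5) % 2 != 0:
        raise ValueError("expected 5 leading columns and two alleles per marker")
    # Translate numeric codes; letters pass through unchanged.
    alleles = [NUMERIC_TO_BASE.get(a, a) for a in fields[5:]]
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedigreeRow(*fields[:5], genotypes)


row = parse_pedigree_row("FAM1001  ID1234  0  0  M  A A  A C  C C")
print(row.individual_id, row.genotypes)
```

The number of genotypes in each row should match the number of `M` lines in the data file, so a length comparison between the two is a cheap consistency check.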
    
On the MACH command line, the names of the data and pedigree files are given with the -d and -p options (short form) or the --datfile and --pedfile options (long form), respectively.
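
The two option styles can be sketched as a small Python helper that builds the argument list for either form. Only the option names -d, -p, --datfile and --pedfile come from the text; the helper name `mach_arguments`, the executable name `mach1` and the file names `sample.dat` and `sample.ped` are assumptions made for illustration.

```python
def mach_arguments(datfile, pedfile, long_form=False):
    """Build the data/pedigree file options in short or long form.

    Hypothetical helper; only the option names are taken from the text.
    """
    if long_form:
        return ["--datfile", datfile, "--pedfile", pedfile]
    return ["-d", datfile, "-p", pedfile]


# "mach1" and the file names are assumed here; substitute the actual
# executable and input files used in your installation.
print(" ".join(["mach1"] + mach_arguments("sample.dat", "sample.ped")))
print(" ".join(["mach1"] + mach_arguments("sample.dat", "sample.ped", long_form=True)))
```

Building the arguments as a list rather than a single string keeps file names containing spaces intact if the list is later passed to `subprocess.run`.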
