Changes

17 bytes added, 10:31, 4 June 2010
Line 40:
Phased haplotype information is encoded in two files. The first file (which MACH calls the "snp file") lists the markers in the phased haplotypes. The second file (which MACH calls the "haplotype file") lists one haplotype per line. If you retrieved these files from the HapMap website, simply combine the --hapmapFormat option with the --snps option to indicate the name of the HapMap legend file and the --haps option to indicate the name of the file with phased haplotypes. Here is an example:
   −
prompt> mach1 --hapmapFormat --snps genotypes_chr1_CEU_r22_nr.b36_fwd_legend.txt.gz --haps genotypes_chr1_CEU_r22_nr.b36_fwd.phase.gz ...
+
  prompt> mach1 --hapmapFormat --snps genotypes_chr1_CEU_r22_nr.b36_fwd_legend.txt.gz --haps genotypes_chr1_CEU_r22_nr.b36_fwd.phase.gz ...
    
If you don't use the --hapmapFormat option, MACH expects the snp file (indicated with the --snps option) to simply list one marker name per line and the haplotype files (indicated with the --haps option) to list one haplotype per line. Haplotypes can be prefaced by one or two optional labels followed by a series of single character alleles, one for each marker. Within each haplotype, spaces are ignored. Here are two examples:
 
   −
<Example of a snp list file>
+
  '''<Example of a snp list file>'''
marker1
+
  marker1
marker2
+
  marker2
...
+
  ...
<End of snp list file>
+
  marker13
In the sample haplotype file below, note that the first two columns are automatically ignored (because, based on the legend file, MACH knows the phased haplotypes should include only 13 markers, corresponding to the last string of digits on each line). Also note that the alleles A, C, G, and T have been recoded as digits 1, 2, 3, and 4.
+
  '''<End of snp list file>'''
 +
 
 +
In the sample haplotype file below, note that the first two columns are automatically ignored (because, based on the snp list file, MACH knows the phased haplotypes should include only 13 markers, corresponding to the last string of characters on each line).  
 +
 
 +
  '''<Example of a phased haplotype file>'''
 +
  FAMILY1->PERSON1 HAPLO1 CGGCGCGCTTGGC
 +
  FAMILY1->PERSON1 HAPLO2 CGGCGCGTCCAGC
 +
  FAMILY2->PERSON1 HAPLO1 GGGCGCGCTTGGC
 +
  FAMILY2->PERSON1 HAPLO2 GGAAGCACTCGGC
 +
  ...
 +
  '''<End of phased haplotype file>'''
   −
<Example of a phased haplotype file>
  −
FAMILY1->PERSON1 HAPLO1 2332323244332
  −
FAMILY1->PERSON1 HAPLO2 2332323422132
  −
FAMILY2->PERSON1 HAPLO1 3332323244332
  −
FAMILY2->PERSON1 HAPLO2 3311321242332
  −
...
  −
<End of phased haplotype file>
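
The haplotype file layout described above can be illustrated with a short Python sketch. It is not MACH's own parser; it only assumes what the text states: each line may start with one or two labels, the alleles are single characters, spaces inside the haplotype are ignored, and the number of markers is known from the snp file. The function names `read_snp_list` and `parse_haplotype_line` are hypothetical.

```python
def read_snp_list(text):
    """Return the marker names from a snp file, one name per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_haplotype_line(line, n_markers):
    """Split one haplotype line into (labels, alleles).

    Tokens are consumed from the right until n_markers allele characters
    have been collected, so spaces inside the haplotype are ignored.
    Whatever remains on the left is treated as the optional labels.
    """
    tokens = line.split()
    alleles = ""
    while tokens and len(alleles) < n_markers:
        alleles = tokens.pop() + alleles
    if len(alleles) != n_markers:
        raise ValueError(
            f"expected {n_markers} alleles, found {len(alleles)}"
        )
    return tokens, alleles


if __name__ == "__main__":
    # First line of the sample haplotype file, which has 13 markers.
    line = "FAMILY1->PERSON1 HAPLO1 CGGCGCGCTTGGC"
    labels, alleles = parse_haplotype_line(line, n_markers=13)
    print(labels, alleles)
```

Reading from the right mirrors the note that only the last string of characters on each line holds the alleles; a line whose allele count does not match the snp list raises an error rather than being silently truncated.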
   
If you provide MACH with a set of reference haplotypes as input, the marker order in the phased haplotypes overrides any marker order that may be specified in the pedigree and data files that contain the genotype data. This means that one convenient way to re-order markers in your original pedigree and data files is to simply create an empty haplotype file and a companion snp file that lists markers in the desired order. When you provide these two files as input, they override the marker order specified in the data file.
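
As a rough sketch of the re-ordering trick, the two files could be generated with a few lines of Python. The helper name `write_reorder_inputs`, the file names, and the marker names are all hypothetical; the only assumptions taken from the text are that the snp file lists one marker per line and that the haplotype file may be empty.

```python
from pathlib import Path


def write_reorder_inputs(markers, snp_path, hap_path):
    """Write a snp file in the desired marker order and an empty haplotype file."""
    # One marker name per line, in the order MACH should use.
    Path(snp_path).write_text("".join(f"{name}\n" for name in markers))
    # The haplotype file is intentionally left empty.
    Path(hap_path).write_text("")


if __name__ == "__main__":
    # Hypothetical marker names and file names, for illustration only.
    desired_order = ["markerB", "markerA", "markerC"]
    write_reorder_inputs(desired_order, "reorder.snps", "reorder.haps")
```

The resulting files would then be passed with the --snps and --haps options alongside your usual pedigree and data files.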
    +
== Saving Disk Space ==
   −
Useful Tip: You can usually economize disk space by using gzip to compress your input files (the data and pedigree files and any files containing the reference haplotypes). MACH can automatically recognize gzipped files and decompress them on the fly.
+
'''Useful Tip:''' You can usually economize disk space by using gzip to compress your input files (the data and pedigree files and any files containing the reference haplotypes). MACH can automatically recognize gzipped files and decompress them on the fly.
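
If you prefer to compress input files from a script rather than with the gzip command itself, a minimal Python sketch using the standard library looks like this. The helper name `gzip_file` and the file name in the usage line are hypothetical. Unlike the command-line tool, this version leaves the uncompressed original in place.

```python
import gzip
import shutil


def gzip_file(src, dst=None):
    """Compress src into dst (default: src + '.gz') and return the new path."""
    dst = dst or f"{src}.gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # Stream the data in chunks so large files are never held in memory.
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    import sys

    # Usage: python gzip_inputs.py FILE [FILE ...]
    for path in sys.argv[1:]:
        print(gzip_file(path))
```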
    
That is all you should need to get started!
 
