Mach DAC

From Genome Analysis Wiki

Latest revision as of 11:24, 2 February 2017

This is the MaCH Divide and Conquer page, documenting how to break the genome into smaller pieces before imputation/phasing and how to ligate after imputation/phasing.
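To show what "breaking the genome into smaller pieces" means, here is a minimal sketch that splits a chromosome's markers into consecutive chunks. Adjacent chunks share some markers so that the pieces can later be ligated. This is an illustration of the general idea, not the algorithm used by splitPed or splitRef. The function name and the `chunk_size` and `overlap` parameters are hypothetical and are not options of those tools.

```python
def split_into_chunks(n_markers, chunk_size, overlap):
    """Return (start, end) marker-index ranges (end exclusive) covering
    markers 0..n_markers-1, where consecutive chunks share `overlap` markers.
    Hypothetical helper for illustration only."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, n_markers)
        chunks.append((start, end))
        if end == n_markers:
            break
        # The next chunk starts `overlap` markers before this one ends.
        start = end - overlap
    return chunks


print(split_into_chunks(25, 10, 3))
# [(0, 10), (7, 17), (14, 24), (21, 25)]
```

The last chunk can be short. A real tool may merge a short tail into the previous chunk, but this sketch does not.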

Phasing without External Reference

Your Data

To get started, you will need to store your data in Merlin-format pedigree and data files, one pair of files per chromosome. For details of the Merlin file format, see the Merlin tutorial [1].
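The following sketch writes one chromosome's data as a Merlin-style data file (one `M <marker>` line per marker) and pedigree file (family ID, individual ID, father, mother, sex, then two alleles per marker). It assumes unrelated individuals (parents coded `0`), sex coded 1/2, complete genotypes, and space-separated alleles. The function names and sample IDs are hypothetical. The Merlin tutorial remains the authoritative description of the format.

```python
def merlin_dat(marker_names):
    """Data file text: one 'M <name>' line per marker, in the same order
    as the genotype columns of the pedigree file."""
    return "".join(f"M {name}\n" for name in marker_names)


def merlin_ped(samples):
    """Pedigree file text.

    samples: list of (family_id, individual_id, sex, genotypes), where
    genotypes is a list of (allele1, allele2) tuples, one per marker.
    Assumes unrelated individuals, so father and mother are coded 0."""
    lines = []
    for fam, ind, sex, genotypes in samples:
        fields = [fam, ind, "0", "0", str(sex)]
        # Each genotype contributes two alleles, e.g. "A C".
        fields += [f"{a1} {a2}" for a1, a2 in genotypes]
        lines.append(" ".join(fields))
    return "".join(line + "\n" for line in lines)


markers = ["rs1", "rs2"]  # hypothetical marker names for one chromosome
samples = [("F1", "I1", 1, [("A", "C"), ("G", "G")])]
print(merlin_dat(markers), end="")
print(merlin_ped(samples), end="")
```

To get one pair of files per chromosome, call these once per chromosome and write the results to separately named files.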

Within each file, markers should be sorted by chromosomal position. Alleles should be reported on the forward strand and can be encoded as 'A', 'C', 'G' or 'T' (there is no need to use numeric identifiers for each allele).
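This hypothetical sanity check covers the two requirements above that can be checked mechanically. It confirms that markers appear in position order and that every allele is one of A, C, G or T. It cannot confirm forward-strand coding, which requires comparison against an external reference. The tuple layout `(name, position, allele1, allele2)` is an assumption for illustration.

```python
VALID_ALLELES = {"A", "C", "G", "T"}


def check_markers(markers):
    """markers: list of (name, position, allele1, allele2) for ONE chromosome,
    in file order. Returns a list of problem descriptions (empty if none)."""
    problems = []
    for i, (name, pos, a1, a2) in enumerate(markers):
        # Markers must be sorted by position within the chromosome.
        if i > 0 and pos < markers[i - 1][1]:
            problems.append(f"{name}: position {pos} is out of order")
        # Alleles should be letter-coded, not numeric.
        for allele in (a1, a2):
            if allele not in VALID_ALLELES:
                problems.append(f"{name}: unexpected allele {allele!r}")
    return problems


markers = [("rs1", 100, "A", "C"), ("rs2", 50, "G", "T"), ("rs3", 200, "A", "1")]
for problem in check_markers(markers):
    print(problem)
# Sorting by position fixes the ordering problem (but not the allele coding):
markers.sort(key=lambda m: m[1])
```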

Split Your Data

You can split your data using splitPed (http://csg.sph.umich.edu//yli/splitPed/). If you follow our recommendation of using MaCH+minimac for imputation, you only need splitPed in the MaCH step, which phases your study sample and does not involve an external reference. In the minimac step, each chromosome can be imputed as a whole: even for the largest chromosome, imputation of several thousand individuals finishes within a day. A good rule of thumb is that minimac takes about 1 hour to impute 1,000,000 markers in 1,000 individuals using a reference panel of 100 haplotypes; see the minimac wiki (http://genome.sph.umich.edu/wiki/Minimac#Imputation) for more details.
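As a back-of-the-envelope aid, this sketch scales the rule of thumb above to other study sizes. It ASSUMES that runtime grows linearly in the number of markers, individuals and reference haplotypes. That scaling is not stated on this page, so treat the result as a rough estimate and check the minimac wiki for guidance. The function name is hypothetical.

```python
def minimac_hours(n_markers, n_individuals, n_ref_haplotypes):
    """Rough minimac runtime estimate in hours, scaled from the rule of thumb
    of ~1 hour for 1,000,000 markers x 1,000 individuals x 100 haplotypes.
    ASSUMPTION: runtime is linear in each of the three factors."""
    return (n_markers / 1_000_000) * (n_individuals / 1_000) * (n_ref_haplotypes / 100)


# 500,000 markers, 4,000 individuals, 100 reference haplotypes:
print(minimac_hours(500_000, 4_000, 100))  # 2.0
```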

Phasing/Imputation with External Reference

When you phase or impute with an external reference panel, you only need to split the reference files into parts, each containing a subset of markers. SNPs that are in your own data (pedigree files) but not in the reference files are automatically ignored by MaCH and minimac, so the study files themselves do not need to be split.
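The reason only the reference needs splitting can be modelled as a set intersection. For each reference part, the study markers that take part are those also present in that part, and study-only SNPs are dropped. This is a hypothetical illustration of the behaviour described above, not MaCH or minimac code, and the marker names are made up.

```python
def markers_used_per_chunk(study_markers, reference_chunks):
    """For each reference part (a list of marker names), return the study
    markers that also appear in it, in reference order."""
    study = set(study_markers)
    return [[m for m in chunk if m in study] for chunk in reference_chunks]


def study_only_markers(study_markers, reference_chunks):
    """Study markers found in no reference part; these are ignored."""
    in_reference = {m for chunk in reference_chunks for m in chunk}
    return sorted(set(study_markers) - in_reference)


study = ["rs1", "rs2", "rs9"]
reference_parts = [["rs1", "rs2", "rs3"], ["rs3", "rs4"]]
print(markers_used_per_chunk(study, reference_parts))  # [['rs1', 'rs2'], []]
print(study_only_markers(study, reference_parts))      # ['rs9']
```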

You can split the reference data using splitRef (http://csg.sph.umich.edu//yli/splitRef/).

Post Phasing/Imputation Ligation

You can use LigateHaplotypes (http://csg.sph.umich.edu//yli/ligateHap.V004.tgz) to ligate the parts back together.
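Ligation must resolve one ambiguity. Each part is phased independently, so the "first" haplotype in one part may correspond to the "second" haplotype in the next part. This sketch shows a common way to resolve it for one individual. It compares the two possible pairings over the overlapping markers, keeps the pairing with more matching alleles, and then appends the non-overlapping remainder. It is an illustration of the general technique, not the LigateHaplotypes implementation, and it handles only haplotype strings, not imputed dosages. The function name and string representation are hypothetical.

```python
def ligate_pair(left, right, overlap):
    """Ligate one individual's phased haplotypes from two adjacent parts.

    left, right: (hap1, hap2) tuples of allele strings for each part.
    overlap: number of markers shared by the end of `left` and the start
    of `right`. Returns the ligated (hap1, hap2)."""
    l1, l2 = left
    r1, r2 = right

    def agreement(a, b):
        # Matching alleles between the tail of `a` and the head of `b`.
        return sum(x == y for x, y in zip(a[len(a) - overlap:], b[:overlap]))

    keep = agreement(l1, r1) + agreement(l2, r2)
    swap = agreement(l1, r2) + agreement(l2, r1)
    if swap > keep:
        # The right part's phase is flipped relative to the left part.
        r1, r2 = r2, r1
    # Append only markers after the overlap so shared markers appear once.
    return l1 + r1[overlap:], l2 + r2[overlap:]


print(ligate_pair(("AACG", "GTTA"), ("TAGG", "CGCC"), 2))  # ('AACGCC', 'GTTAGG')
```

Only heterozygous overlapping sites distinguish the two pairings, because homozygous sites match equally either way. A wider overlap therefore gives more information for choosing the orientation.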

Questions and Comments?

Email Yun Li (yunli@med.unc.edu).