Minimac3 Cookbook : Pre-Phasing
Introduction
Once a quality-controlled dataset is available, the data must be pre-phased before imputation. Pre-phasing can be done with either MaCH or SHAPEIT, the most commonly used tools. This wiki page gives detailed instructions for pre-phasing GWAS data of different sample sizes. The same instructions apply to phasing the non-pseudo-autosomal region (females only) and the pseudo-autosomal region (both males and females) of chromosome X.
Pre-phasing with MaCH
MaCH is a Markov Chain based haplotyper. It can resolve long haplotypes in samples of unrelated individuals. The source code is available for download here. Check out their home-page for further details.
A typical command line to phase using MaCH looks like this (Gwas.chr20.Unphased.dat and Gwas.chr20.Unphased.ped are the quality-controlled GWAS data set in Merlin format):
mach1 -d Gwas.chr20.Unphased.dat \
      -p Gwas.chr20.Unphased.ped \
      --rounds 20 \
      --states 200 \
      --phase \
      --interim 5 \
      --sample 5 \
      --prefix Gwas.Chr20.Phased.Output
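The command above handles a single chromosome. As a rough sketch, the same command can be generated for every autosome with a small loop. This assumes the input files follow the naming pattern of the chromosome 20 example (Gwas.chr<N>.Unphased.dat and .ped), which is an assumption and not something MaCH requires. The helper name mach_command and the DRY_RUN switch are illustrative only. By default the script only prints the commands so they can be inspected first.

```shell
#!/bin/sh
# Sketch: print (or run) one MaCH phasing command per autosome.
# Assumption: inputs are named Gwas.chr<N>.Unphased.dat/.ped, mirroring the
# chromosome 20 example above; adjust the pattern to match your own files.

# Build the mach1 command line for the chromosome given as $1.
# The options are exactly those used in the chromosome 20 example.
mach_command() {
    printf 'mach1 -d Gwas.chr%s.Unphased.dat -p Gwas.chr%s.Unphased.ped --rounds 20 --states 200 --phase --interim 5 --sample 5 --prefix Gwas.Chr%s.Phased.Output\n' "$1" "$1" "$1"
}

# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

chr=1
while [ "$chr" -le 22 ]; do
    cmd="$(mach_command "$chr")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        sh -c "$cmd"
    fi
    chr=$((chr + 1))
done
```

Run it once as is to review the 22 commands, then rerun with DRY_RUN=0 to execute them one after another.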
Pre-phasing with SHAPEIT
SHAPEIT is a fast and accurate method for estimation of haplotypes (phasing) from genotype or sequencing data. The source code is available for download here. Check out their home-page for further details. It can be used to phase a small number of samples (a reference panel is required) as well as a large number of samples (no reference panel is required). The reference panels and genetic map files required by SHAPEIT are available for download here.
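Which of the two cases applies depends on the number of samples in the GWAS data set, with 200 used as the cutoff in the examples on this page. A minimal sketch for counting the samples in an uncompressed VCF file is shown here. It relies only on the VCF layout, in which the #CHROM header line has nine fixed columns followed by one column per sample. The function name count_vcf_samples and the VCF variable are illustrative, and treating exactly 200 samples as "small" is an arbitrary choice, since the text does not cover that boundary.

```shell
#!/bin/sh
# Sketch: count the samples in an uncompressed VCF and report which
# SHAPEIT workflow (with or without a reference panel) would apply.

# The #CHROM header line has 9 fixed columns (CHROM .. FORMAT) followed
# by one column per sample, so the sample count is NF - 9.
count_vcf_samples() {
    awk -F'\t' '/^#CHROM/ { print NF - 9; exit }' "$1"
}

# Default file name taken from the examples on this page.
VCF="${VCF:-Gwas.chr20.Unphased.vcf}"

if [ -f "$VCF" ]; then
    n="$(count_vcf_samples "$VCF")"
    if [ "$n" -gt 200 ]; then
        echo "$VCF: $n samples, phase without a reference panel"
    else
        # Exactly 200 is grouped with the small case here (arbitrary choice).
        echo "$VCF: $n samples, phase with a reference panel"
    fi
else
    echo "$VCF not found; set VCF=/path/to/file.vcf"
fi
```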
- The following example shows a typical SHAPEIT command line to phase a LARGE number (>200) of GWAS samples (Gwas.chr20.Unphased.vcf is the quality-controlled GWAS data set in VCF format).
shapeit -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        -O Gwas.Chr20.Phased.Output
- The following example shows a typical SHAPEIT command line to phase a SMALL number (<200) of GWAS samples (Gwas.chr20.Unphased.vcf is the quality-controlled GWAS data set in VCF format).
## The following step identifies variants that are misaligned between the reference and GWAS panels
shapeit -check \
        -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        --input-ref reference.haplotypes.gz reference.legend.gz reference.sample \
        --output-log gwas.alignments

## The following step phases the GWAS panel using the reference panel, excluding the markers found in the step above
shapeit -V Gwas.chr20.Unphased.vcf \
        --input-ref reference.haplotypes.gz reference.legend.gz reference.sample \
        --exclude-snp gwas.alignments.strand.exclude \
        -O Gwas.Chr20.Phased.Output
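When the two steps are scripted, it can help to guard against a missing or empty exclusion list. The sketch below prints the phasing command and adds --exclude-snp only if gwas.alignments.strand.exclude exists and is non-empty. Whether SHAPEIT writes that file when no variants are misaligned is not stated on this page, so the guard is a precaution based on that assumption, not documented behaviour. The function name phase_command is illustrative, and the script prints the command without running it.

```shell
#!/bin/sh
# Sketch: build the reference-based phasing command, passing --exclude-snp
# only when the exclusion list from the check step is present and non-empty.
# Assumption: the check step was run with --output-log gwas.alignments, so
# the list (if any) is in gwas.alignments.strand.exclude.

EXCLUDE="${EXCLUDE:-gwas.alignments.strand.exclude}"

# Print the shapeit phasing command; $1 is the path to the exclusion list.
phase_command() {
    cmd="shapeit -V Gwas.chr20.Unphased.vcf"
    cmd="$cmd --input-ref reference.haplotypes.gz reference.legend.gz reference.sample"
    # [ -s FILE ] is true only if FILE exists and has a size greater than zero.
    if [ -s "$1" ]; then
        cmd="$cmd --exclude-snp $1"
    fi
    echo "$cmd -O Gwas.Chr20.Phased.Output"
}

phase_command "$EXCLUDE"
```

The printed line can be reviewed and then pasted into the terminal, or piped to sh once the file names have been checked.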
Useful Wiki Pages
A few pages in this Wiki may be useful for Minimac3 users. Here are links to a few:
- Minimac3 Imputation Cookbook (Recommended for New Users!!)
Contact
For questions or bug reports, please contact Sayantan Das.