Minimac3 Cookbook : Pre-Phasing

From Genome Analysis Wiki

Introduction

Once a quality-controlled dataset is available, the data must be pre-phased before imputation. Pre-phasing can be done with either MaCH or SHAPEIT, the two most commonly used tools. This wiki page gives detailed instructions for pre-phasing GWAS data of different sample sizes. The same instructions apply to phasing chromosome X: the non-pseudo-autosomal region (females only) and the pseudo-autosomal region (both males and females).

Pre-phasing with MaCH

MaCH is a Markov chain based haplotyper that can resolve long haplotypes in samples of unrelated individuals. The source code is available for download here. See the MaCH home page for further details.

A typical command line to phase with MaCH looks like this (Gwas.chr20.Unphased.dat and Gwas.chr20.Unphased.ped are the quality-controlled GWAS data set in Merlin format):

mach1 -d Gwas.chr20.Unphased.dat \
      -p Gwas.chr20.Unphased.ped \
      --rounds 20 \
      --states 200 \
      --phase \
      --interim 5 \
      --sample 5 \
      --prefix Gwas.Chr20.Phased.Output
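
To phase every autosome rather than chromosome 20 alone, the command above can be wrapped in a loop. The following is a minimal sketch, not part of MaCH itself. It assumes the per-chromosome input files follow the same naming pattern as the chromosome 20 example (Gwas.chrN.Unphased.dat and Gwas.chrN.Unphased.ped). The helper names run, phase_chr and phase_all, and the RUN switch, are hypothetical and introduced only for illustration. By default the script only prints the commands so they can be inspected before anything is executed.

```shell
# Print the command by default; execute it only when RUN=1 is set.
# (RUN is a hypothetical switch introduced for this sketch.)
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Phase one chromosome with the same MaCH options as the example above.
# Assumes inputs named Gwas.chr<N>.Unphased.dat and Gwas.chr<N>.Unphased.ped.
phase_chr() {
    run mach1 -d "Gwas.chr$1.Unphased.dat" \
              -p "Gwas.chr$1.Unphased.ped" \
              --rounds 20 --states 200 --phase \
              --interim 5 --sample 5 \
              --prefix "Gwas.Chr$1.Phased.Output"
}

# Loop over the 22 autosomes.
phase_all() {
    chr=1
    while [ "$chr" -le 22 ]; do
        phase_chr "$chr"
        chr=$((chr + 1))
    done
}

phase_all
```

Running the script as is lists the 22 commands; running it with RUN=1 in the environment executes them one after another.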

Pre-phasing with SHAPEIT

SHAPEIT is a fast and accurate method for estimating haplotypes (phasing) from genotype or sequencing data. The source code is available for download here. See the SHAPEIT home page for further details. It can phase a small number of samples (a reference panel is required) as well as a large number of samples (no reference panel is required). The reference panels and genetic map files required by SHAPEIT are available for download here.

  • The following example shows a typical SHAPEIT command line to phase a LARGE number (>200) of GWAS samples (Gwas.chr20.Unphased.vcf is the quality controlled GWAS data set in VCF format).
shapeit -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        -O Gwas.Chr20.Phased.Output
  • The following example shows a typical SHAPEIT command line to phase a SMALL number (<200) of GWAS samples (Gwas.chr20.Unphased.vcf is the quality controlled GWAS data set in VCF format).
## The following step lists the variants that are misaligned between the reference panel and the GWAS panel
shapeit -check \
        -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        --input-ref reference.haplotypes.gz reference.legend.gz reference.sample \
        --output-log gwas.alignments

## The following step phases the GWAS panel using the reference panel while excluding the markers found in the step above.
shapeit -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        --input-ref reference.haplotypes.gz reference.legend.gz reference.sample \
        --exclude-snp gwas.alignments.snp.strand.exclude \
        -O Gwas.Chr20.Phased.Output
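
Which of the two cases above applies depends on the number of samples in the GWAS file. The sketch below shows one way to count them and pick a mode. It is an illustration under stated assumptions, not part of SHAPEIT. It assumes an uncompressed, tab-delimited VCF with genotype columns. The function names count_vcf_samples and suggest_mode are hypothetical. The cut-off of 200 is the figure quoted above, and treating exactly 200 samples as "small" is an arbitrary choice made here.

```shell
# Count samples in a VCF: the #CHROM header line has 9 fixed columns
# (CHROM..FORMAT) followed by one column per sample.
# Assumes an uncompressed, tab-delimited VCF that contains genotype columns.
count_vcf_samples() {
    awk -F'\t' '/^#CHROM/ { print NF - 9; exit }' "$1"
}

# Suggest a phasing mode using the 200-sample cut-off quoted above.
# Exactly 200 samples is treated as "small" (an assumption of this sketch).
suggest_mode() {
    n=$(count_vcf_samples "$1")
    if [ -z "$n" ]; then
        echo "no #CHROM header line found in $1" >&2
        return 1
    fi
    if [ "$n" -gt 200 ]; then
        echo "no-reference"
    else
        echo "with-reference"
    fi
}
```

For example, suggest_mode Gwas.chr20.Unphased.vcf prints with-reference when the file holds 200 samples or fewer, which corresponds to the two-step check-and-phase procedure.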

Useful Wiki Pages

There are a few pages in this Wiki that may be useful for Minimac3 users. Here are links to a few:

Contact

For questions or bug reports, please contact Sayantan Das.