Minimac3 Cookbook : Pre-Phasing

From Genome Analysis Wiki

Introduction

Once a quality-controlled dataset is available, the data must be pre-phased before imputation. Pre-phasing can be done with either MaCH or SHAPEIT, the most commonly used tools. This wiki page gives detailed instructions for pre-phasing GWAS data of different sample sizes. The same instructions apply to phasing chromosome X: the non-pseudo-autosomal region (females only) and the pseudo-autosomal region (both males and females).

Pre-phasing with MaCH

MaCH is a Markov chain-based haplotyper. It can resolve long haplotypes in samples of unrelated individuals. The source code is available for download here. See the MaCH home page for further details.

A typical command line to phase using MaCH looks like this (Gwas.chr20.Unphased.dat and Gwas.chr20.Unphased.ped are the quality-controlled GWAS data set in Merlin format):

mach1 -d Gwas.chr20.Unphased.dat \
      -p Gwas.chr20.Unphased.ped \
      --rounds 20 \
      --states 200 \
      --phase \
      --interim 5 \
      --sample 5 \
      --prefix Gwas.Chr20.Phased.Output
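
To phase every autosome in turn, the command above can be wrapped in a loop. The following is a minimal sketch, not a prescribed workflow. It assumes the input files follow the naming pattern Gwas.chrN.Unphased.dat and Gwas.chrN.Unphased.ped (only the chr20 names appear above, so the pattern is an assumption), and the helper names run and phase_chr and the DRY_RUN variable are illustrative, not part of MaCH. By default it only prints the commands so they can be inspected first.

```shell
# Sketch: build one MaCH phasing command per autosome.
# Assumption: inputs are named Gwas.chr<N>.Unphased.{dat,ped}.

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Phase a single chromosome with the same options as the example above.
phase_chr() {
  chr="$1"
  run mach1 -d "Gwas.chr${chr}.Unphased.dat" \
            -p "Gwas.chr${chr}.Unphased.ped" \
            --rounds 20 --states 200 --phase \
            --interim 5 --sample 5 \
            --prefix "Gwas.Chr${chr}.Phased.Output"
}

# Loop over the 22 autosomes.
chr_i=1
while [ "$chr_i" -le 22 ]; do
  phase_chr "$chr_i"
  chr_i=$((chr_i + 1))
done
```

Setting DRY_RUN=0 in the environment runs the commands instead of printing them, which requires mach1 to be on the PATH.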

Pre-phasing with SHAPEIT

SHAPEIT is a fast and accurate method for estimating haplotypes (phasing) from genotype or sequencing data. The source code is available for download here. See the SHAPEIT home page for further details. SHAPEIT can phase a small number of samples (a reference panel is required) as well as a large number of samples (no reference panel is required). The reference panels and genetic map files required by SHAPEIT are available for download here.

  • The following example shows a typical SHAPEIT command line to phase a LARGE number (>200) of GWAS samples (Gwas.chr20.Unphased.vcf is the quality controlled GWAS data set in VCF format).
shapeit -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        -O Gwas.Chr20.Phased.Output
  • The following example shows a typical SHAPEIT command line to phase a SMALL number (<200) of GWAS samples (Gwas.chr20.Unphased.vcf is the quality controlled GWAS data set in VCF format).
## The following step identifies variants that are misaligned between the reference panel and the GWAS panel
shapeit -check \
        -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        --input-ref reference.haplotypes.gz reference.legend.gz reference.sample \
        --output-log gwas.alignments

## The following step phases the GWAS panel using the reference panel while excluding the markers found in the step above.
shapeit -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        --input-ref reference.haplotypes.gz reference.legend.gz reference.sample \
        --exclude-snp gwas.alignments.strand.exclude \
        -O Gwas.Chr20.Phased.Output
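
The choice between the two workflows above depends on the sample count. As a rough sketch, the number of samples in an uncompressed VCF can be read from its #CHROM header line, which has nine fixed columns followed by one column per sample. The helper names count_vcf_samples and choose_workflow are hypothetical, the threshold of 200 is taken from the text above, and a file with exactly 200 samples (a case the text does not cover) is treated here as small.

```shell
# Count samples in an uncompressed VCF: the #CHROM header line has
# 9 fixed columns (CHROM through FORMAT) followed by one column per sample.
count_vcf_samples() {
  awk -F'\t' '/^#CHROM/ { print NF - 9; exit }' "$1"
}

# Suggest a workflow using the 200-sample threshold from the text.
# Assumption: exactly 200 samples is treated as "small".
choose_workflow() {
  n="$(count_vcf_samples "$1")"
  if [ -z "$n" ]; then
    echo "error: no #CHROM header line found in $1" >&2
    return 1
  fi
  if [ "$n" -gt 200 ]; then
    echo "large: phase without a reference panel"
  else
    echo "small: phase with a reference panel"
  fi
}
```

For example, choose_workflow Gwas.chr20.Unphased.vcf prints which of the two SHAPEIT workflows applies. A gzip-compressed VCF would need to be decompressed first (for instance with zcat) before being passed to these helpers.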

Useful Wiki Pages

There are a few pages in this Wiki that may be useful for Minimac3 users. Here are links to a few:

Contact

For queries or bug reports, please contact Sayantan Das.