Difference between revisions of "Minimac3 Cookbook : Converting Files to VCF"

From Genome Analysis Wiki

Revision as of 21:54, 29 January 2015

The workflow consists of the following steps: converting the GWAS panel files into VCF, downloading a reference panel, and imputing the samples.

Convert GWAS Panel Files into VCF

If the pre-phased GWAS data is already available in VCF format, users can skip this step. Otherwise, the following commands show how to convert files in other formats to VCF.

  • PLINK: Use PLINK2 (available here) as follows:
plink --bfile Gwas.Chr20.Phased.Output \
      --recode vcf \
      --out Gwas.Chr20.Phased.Output.VCF.format
  • MaCH: Use Mach2VCF (coming soon) as follows:
mach2VCF --haps Gwas.Chr20.Phased.Output.hap \
         --snps Gwas.Chr20.Phased.Output.snps \
         --prefix Gwas.Chr20.Phased.Output.VCF.format
  • SHAPEIT: Use SHAPEIT (available here) as follows:
shapeit -convert \
        --input-haps Gwas.Chr20.Phased.Output \
        --output-vcf Gwas.Chr20.Phased.Output.VCF.format.vcf
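Before moving on, it can help to confirm that the converted file really is a phased VCF. The following is a minimal sketch, not part of any of the tools above: `check_phased_vcf` is a hypothetical helper name, and it assumes an uncompressed, tab-delimited VCF in which GT is the first FORMAT subfield. It reports the number of samples, the number of sites, and how many genotypes lack the `|` phase separator (haploid calls would also be counted as unphased by this simple check).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PLINK, SHAPEIT or Minimac3):
# basic sanity checks on a converted, uncompressed VCF file.
# Usage: check_phased_vcf FILE.vcf
check_phased_vcf() {
    vcf="$1"

    # The first line of a VCF file declares the format version.
    if ! head -n 1 "$vcf" | grep -q '^##fileformat=VCF'; then
        echo "missing ##fileformat header"
        return 1
    fi

    # Sample columns start at column 10 of the #CHROM header line.
    awk -F '\t' '
        /^#CHROM/ { nsamples = NF - 9; next }
        /^#/      { next }
        {
            nsites++
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")                  # GT is the first subfield
                if (index(f[1], "|") == 0) unphased++
            }
        }
        END {
            printf "samples=%d sites=%d unphased_genotypes=%d\n", nsamples, nsites, unphased
            exit (unphased > 0)                    # non-zero status if any genotype is unphased
        }
    ' "$vcf"
}
```

For example, `check_phased_vcf Gwas.Chr20.Phased.Output.VCF.format.vcf` prints a one-line summary and returns a non-zero status if any genotype is unphased.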

Download Reference Panel

Commonly used reference panels include 1000 Genomes Phase 3 (2,535 samples), 1000 Genomes Phase 1 (1,094 samples), HapMap2 (269 samples), and the Haplotype Reference Consortium (32,914 samples). Users are advised to use either 1000 Genomes Phase 3 (available for download in Reference Panels) or the Haplotype Reference Consortium panel. The latter cannot be shared publicly because of data privacy restrictions, but it can be used for imputation remotely through the imputation server hosted at the University of Michigan. Reference panels for different versions of 1000 Genomes, in both VCF and M3VCF format, are also available for download in Reference Panels.

Impute Samples

The final step is to run Minimac3 to perform the imputation. Now that we have the pre-phased GWAS panel (in VCF format) and an appropriate reference panel (in VCF or M3VCF format), we can run Minimac3 as follows. The first example uses a VCF file as the reference (obtained as explained above), and the second uses an M3VCF file (which might have been downloaded from the links below or created on a previous run of Minimac3).

../bin/Minimac3 --refHaps ReferencePanel.Chr20.1000Genomes.vcf \
                --haps Gwas.Chr20.Phased.Output.VCF.format.vcf \
                --prefix Gwas.Chr20.Imputed.Output
../bin/Minimac3 --refHaps ReferencePanel.Chr20.1000Genomes.m3vcf \
                --haps Gwas.Chr20.Phased.Output.VCF.format.vcf \
                --prefix Gwas.Chr20.Imputed.Output
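
When several chromosomes need to be imputed, the same command can be generated once per chromosome. The sketch below only prints the commands so they can be reviewed before being run. `print_minimac3_commands` is a hypothetical helper, and the per-chromosome file names are an assumption extrapolated from the chromosome 20 example above. Only the `--refHaps`, `--haps` and `--prefix` options shown on this page are used.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one Minimac3 command per chromosome.
# The file-name patterns are assumptions modelled on the chromosome 20 example.
# Usage: print_minimac3_commands FIRST_CHR LAST_CHR
print_minimac3_commands() {
    first="$1"
    last="$2"
    chr="$first"
    while [ "$chr" -le "$last" ]; do
        printf '%s\n' "../bin/Minimac3 --refHaps ReferencePanel.Chr${chr}.1000Genomes.m3vcf --haps Gwas.Chr${chr}.Phased.Output.VCF.format.vcf --prefix Gwas.Chr${chr}.Imputed.Output"
        chr=$((chr + 1))
    done
}

# Example: list the commands for chromosomes 20 to 22.
print_minimac3_commands 20 22
```

Because each chromosome gets its own `--prefix`, the outputs of separate runs do not overwrite one another. Once the printed commands look right, they can be redirected to a file and executed with `sh`.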