Minimac3 Cookbook : Pre-Phasing
Revision as of 22:03, 29 January 2015
Introduction
Once a quality-controlled dataset is available, the data must be pre-phased before imputation. Pre-phasing can be done with either MaCH (http://www.sph.umich.edu/csg/abecasis/MaCH/) or SHAPEIT (https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html), the two most commonly used tools. This wiki page gives detailed instructions for pre-phasing GWAS data of different sample sizes.
Pre-phasing with MaCH
MaCH is a Markov chain-based haplotyper that can resolve long haplotypes in samples of unrelated individuals. The source code is available for download at http://www.sph.umich.edu/csg/abecasis/MaCH/download/; see the MaCH homepage (http://www.sph.umich.edu/csg/abecasis/MaCH/) for further details.
A typical MaCH command line for phasing looks like this (Gwas.chr20.Unphased.dat and Gwas.chr20.Unphased.ped are the quality-controlled GWAS dataset in Merlin format):
mach1 -d Gwas.chr20.Unphased.dat \
      -p Gwas.chr20.Unphased.ped \
      --rounds 20 \
      --states 200 \
      --phase \
      --interim 5 \
      --sample 5 \
      --prefix Gwas.Chr20.Phased.Output
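The same command can be reused for every chromosome. The sketch below wraps it in a small POSIX shell function; the function name `mach_phase_chr`, the `DRY_RUN` switch, and the per-chromosome file naming (Gwas.chr<N>.Unphased.dat/.ped) are assumptions for illustration, not part of MaCH itself.

```shell
# Hypothetical helper: build and run the MaCH phasing command from this page
# for one chromosome. Assumes Merlin files named Gwas.chr<N>.Unphased.dat/.ped.
# With DRY_RUN=1 the command is printed instead of executed.
mach_phase_chr() {
  chr="$1"
  # Replace this function's positional parameters with the full command line.
  set -- mach1 -d "Gwas.chr${chr}.Unphased.dat" \
               -p "Gwas.chr${chr}.Unphased.ped" \
               --rounds 20 \
               --states 200 \
               --phase \
               --interim 5 \
               --sample 5 \
               --prefix "Gwas.Chr${chr}.Phased.Output"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

For example, `for chr in $(seq 1 22); do mach_phase_chr "$chr"; done` would phase chromosomes 1 to 22 in turn; setting `DRY_RUN=1` prints the commands so they can be checked (or submitted to a cluster) before anything runs.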
Pre-phasing with SHAPEIT
SHAPEIT is a fast and accurate method for estimating haplotypes (phasing) from genotype or sequencing data. The source code is available for download at https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#download; see the SHAPEIT homepage (https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html) for further details. SHAPEIT can phase a small number of samples (a reference panel is required) as well as a large number of samples (no reference panel is required). The reference panels and genetic map files required by SHAPEIT are available for download at https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#reference.
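Because the choice between the two SHAPEIT workflows below depends on sample size, it can help to count the samples in the VCF first. This minimal sketch assumes an uncompressed VCF and uses the page's rough 200-sample cutoff; the function names are hypothetical, and treating exactly 200 samples as "small" is an arbitrary choice, since the page only distinguishes >200 from <200.

```shell
# Hypothetical helpers: count samples in an uncompressed VCF and pick one of the
# two SHAPEIT workflows on this page using its rough 200-sample cutoff.

# Samples are the columns after the 9 fixed VCF columns
# (CHROM POS ID REF ALT QUAL FILTER INFO FORMAT) on the #CHROM header line.
vcf_sample_count() {
  awk -F'\t' '/^#CHROM/ { print NF - 9; exit }' "$1"
}

shapeit_mode() {
  n=$(vcf_sample_count "$1")
  if [ -z "$n" ]; then
    echo "no #CHROM header found in $1" >&2
    return 1
  fi
  if [ "$n" -gt 200 ]; then
    echo large   # phase without a reference panel
  else
    echo small   # phase against a reference panel (run -check first)
  fi
}
```

Running `shapeit_mode Gwas.chr20.Unphased.vcf` prints `large` or `small`; a bgzipped VCF would need to be decompressed (for example with `gzip -cd`) before counting.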
- The following example shows a typical SHAPEIT command line for phasing a large number (>200) of GWAS samples (Gwas.chr20.Unphased.vcf is the quality-controlled GWAS dataset in VCF format):
shapeit -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        -O Gwas.Chr20.Phased.Output
- The following example shows a typical SHAPEIT command line for phasing a small number (<200) of GWAS samples against a reference panel (Gwas.chr20.Unphased.vcf is the quality-controlled GWAS dataset in VCF format):
## Step 1: identify variants that are mis-aligned between the reference and GWAS panels
shapeit -check \
        -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        --input-ref reference.haplotypes.gz reference.legend.gz reference.sample \
        --output-log gwas.alignments

## Step 2: phase the GWAS panel using the reference panel, excluding the variants flagged in step 1
shapeit -V Gwas.chr20.Unphased.vcf \
        -M genetic_map_chr20.txt \
        --input-ref reference.haplotypes.gz reference.legend.gz reference.sample \
        --exclude-snp gwas.alignments.strand.exclude \
        -O Gwas.Chr20.Phased.Output
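The two workflows can also be wrapped in one hypothetical helper so that the same flags are used for every chromosome. This sketch assumes per-chromosome input and map files named like the chr20 examples, keeps the reference file names from the example unchanged, passes the genetic map in both small-sample steps, and writes a per-chromosome alignment log (gwas.chr<N>.alignments) so runs for different chromosomes do not overwrite each other. The `.strand.exclude` suffix follows the file name used in the example above.

```shell
# Hypothetical wrapper around the two SHAPEIT workflows on this page.
# Assumes per-chromosome files Gwas.chr<N>.Unphased.vcf and genetic_map_chr<N>.txt
# and the reference file names from the example. DRY_RUN=1 prints the commands.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi
}

shapeit_phase_chr() {
  chr="$1"
  mode="$2"
  vcf="Gwas.chr${chr}.Unphased.vcf"
  map="genetic_map_chr${chr}.txt"
  out="Gwas.Chr${chr}.Phased.Output"
  log="gwas.chr${chr}.alignments"
  # Left unquoted below on purpose so it splits into the three --input-ref files.
  ref="reference.haplotypes.gz reference.legend.gz reference.sample"
  case "$mode" in
    large)
      # No reference panel: phase the GWAS samples on their own.
      run shapeit -V "$vcf" -M "$map" -O "$out"
      ;;
    small)
      # Step 1: list variants mis-aligned between the GWAS and reference panels.
      # Its exit status is ignored (assumption: -check may exit non-zero when it
      # reports mis-aligned variants).
      run shapeit -check -V "$vcf" -M "$map" --input-ref $ref --output-log "$log" || true
      # Step 2: phase against the reference panel, excluding the flagged variants.
      run shapeit -V "$vcf" -M "$map" --input-ref $ref \
                  --exclude-snp "${log}.strand.exclude" -O "$out"
      ;;
    *)
      echo "usage: shapeit_phase_chr CHR large|small" >&2
      return 1
      ;;
  esac
}
```

For instance, `DRY_RUN=1 shapeit_phase_chr 20 small` prints the two chr20 commands without running SHAPEIT, which makes it easy to review the flags before a full run.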
Contact
For questions or bug reports, please contact Sayantan Das (sayantan@umich.edu).