Changes

From Genome Analysis Wiki
69 bytes removed ,  12:34, 25 January 2017
Line 1: Line 1: −
Before reading this tutorial, you might find it useful to spend a few minutes reading through the main [[Minimac]] documentation.  
+
Before reading this tutorial, you might find it useful to spend a few minutes reading through the main [[Minimac]] and [[Minimac2]] documentation.  
    
== Getting Started ==
 
== Getting Started ==
   −
=== Example Data ===
+
Download [http://csg.sph.umich.edu/abecasis/MaCH/download/ MaCH] and [http://genome.sph.umich.edu/wiki/Minimac#Download Minimac] or [http://genome.sph.umich.edu/wiki/Minimac2#Download Minimac2]. The example data used in this tutorial can be found [http://csg.sph.umich.edu/cfuchsb/minimac2_example.tgz here].
   −
# GWAS data
+
== Minimac and Minimac2 Imputation ==
   −
# Reference haplotypes
+
[[Minimac]] and [[Minimac2]] rely on a two-step approach. First, the samples that are to be analyzed must be phased into a series of estimated haplotypes. Second, imputation is carried out directly into these phased haplotypes. As newer reference panels become available, only the second step must be repeated.
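
The hand-off between the two steps can be sketched as a dry run that only prints the commands. This is a minimal illustration, not part of the original tutorial: the file names and options are taken from the example commands on this page, the <code>prefix</code> variable is a hypothetical convenience, and nothing is executed.

```shell
# Dry run: build and print the two commands instead of executing them.
# File names follow this tutorial's example data; "prefix" is a made-up helper variable.
prefix=sample

# Step 1 (run once per study sample): pre-phasing with MaCH, writing $prefix.pp.gz.
phase_cmd="./mach1 -d $prefix.dat -p $prefix.ped --rounds 20 --states 50 --phase --interim 5 --sample 5 --prefix $prefix.pp"

# Step 2 (repeat whenever the reference panel changes): imputation into the phased haplotypes.
impute_cmd="./minimac2 --refHaps hapmap.hap --refSnps hapmap.snps --haps $prefix.pp.gz --snps $prefix.snps --prefix ${prefix}2.imp"

echo "step 1: $phase_cmd"
echo "step 2: $impute_cmd"
```

The point of the sketch is that the <code>--prefix</code> of the phasing step determines the <code>--haps</code> input of the imputation step, so only the second command needs to change when a new reference panel is used.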
 
  −
 
  −
== Minimac Imputation ==
  −
 
  −
[[Minimac]] relies on a two step approach. First, the samples that are to be analyzed must be phased into a series of estimated haplotypes. Second, imputation is carried out directly into these phased haplotypes. As newer reference panels become available, only the second step must be repeated.
      
=== Pre-phasing - MaCH ===
 
=== Pre-phasing - MaCH ===
Line 18: Line 13:  
A convenient way to haplotype your sample is to use MaCH. A typical MaCH command line to estimate phased haplotypes might look like this:
 
A convenient way to haplotype your sample is to use MaCH. A typical MaCH command line to estimate phased haplotypes might look like this:
   −
  mach1 -d chr1.dat -p chr1.ped --rounds 20 --states 200 --phase --interim 5 --sample 5 --prefix chr$chr.haps
+
./mach1 -d sample.dat -p sample.ped --rounds 20 --states 50 --phase --interim 5 --sample 5 --prefix sample.pp | tee mach.log
   −
This will request that MaCH estimate haplotypes for your sample, using 20 iterations of its Markov sampler and conditioning each update on up to 200 haplotypes. A summary description of these parameters follows (but for a more complete description, you should go to the [http://www.sph.umich.edu/csg/abecasis/MaCH/ MaCH website]):
+
This will request that MaCH estimate haplotypes for your sample, using 20 iterations of its Markov sampler and conditioning each update on up to 50 haplotypes. A summary description of these parameters follows (but for a more complete description, you should go to the [http://csg.sph.umich.edu/abecasis/MaCH/ MaCH website]):
    
{| class="wikitable" border="1" cellpadding="2"
 
{| class="wikitable" border="1" cellpadding="2"
Line 28: Line 23:  
|-  
 
|-  
 
|style=white-space:nowrap|<code>-d sample.dat</code>
 
|style=white-space:nowrap|<code>-d sample.dat</code>
| Data file in [http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html Merlin format]. Markers should be listed according to their order along the chromosome.
+
| Data file in [http://csg.sph.umich.edu/abecasis/Merlin/tour/input_files.html Merlin format]. Markers should be listed according to their order along the chromosome.
 
|-  
 
|-  
 
| <code>-p sample.ped</code>
 
| <code>-p sample.ped</code>
| Pedigree file in [http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html Merlin format]. Alleles should be labeled on the forward strand.
+
| Pedigree file in [http://csg.sph.umich.edu/abecasis/Merlin/tour/input_files.html Merlin format]. Alleles should be labeled on the forward strand.
 
|-
 
|-
 
| <code>--states 200</code>
 
| <code>--states 200</code>
Line 52: Line 47:  
|}
 
|}
    +
=== Imputation into Phased Haplotypes - minimac(2)===
   −
You should be able to run this step in parallel and in our cluster we'd use:
+
Imputing genotypes using '''minimac(2)''' is a straightforward process. After you select a set of reference haplotypes, plug in the target haplotypes from the previous step, and set the number of rounds to use for estimating model parameters (which describe the length and conservation of haplotype stretches shared between the reference panel and your study samples), imputation should proceed rapidly.
   −
<source lang="text">
+
Minimac needs a file listing the variants in your sample. If your directory already includes a "sample.snps" file, you can skip this step. Otherwise, you can generate one from "sample.dat" with the following command:
  foreach chr (`seq 1 22`)
     −
    runon -m 4096 mach -d chr$chr.dat -p chr$chr.ped --rounds 20 --states 200 --phase --interim 5 --sample 5 --prefix chr$chr.haps
+
  cut -f 2 -d " " sample.dat > sample.snps
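
To see what this command produces, here is a small self-contained sketch on a toy data file. The marker names are invented for illustration, and the file is assumed to hold one space-separated <code>M &lt;marker&gt;</code> line per variant; the <code>awk</code> line is an optional alternative (not from the original tutorial) that also handles tab-separated files.

```shell
# Toy Merlin-style data file in a scratch directory (marker names are made up).
workdir=$(mktemp -d)
printf 'M rs0001\nM rs0002\nM rs0003\n' > "$workdir/sample.dat"

# Same command as in the tutorial: keep the second space-delimited field.
cut -f 2 -d " " "$workdir/sample.dat" > "$workdir/sample.snps"

# Alternative that splits on any run of spaces or tabs.
awk '{ print $2 }' "$workdir/sample.dat" > "$workdir/sample.awk.snps"

# Show the resulting list of marker names, one per line.
cat "$workdir/sample.snps"
```

If your own "sample.dat" is tab-delimited, <code>cut -d " "</code> will pass each line through unchanged, so it is worth checking the first few lines of the generated file.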
   −
  end
+
The minimac command line would look like this:
</source>
     −
=== Imputation into Phased Haplotypes - minimac ===
+
./minimac --refHaps hapmap.hap --refSnps hapmap.snps --haps sample.pp.gz --snps sample.snps --prefix sample.imp | tee minimac.log
   −
Imputing genotypes using '''minimac''' is a straightforward process: after selecting a set of reference haplotypes, plugging-in the target haplotypes from the previous step and setting the number of rounds to use for estimating model parameters (which describe the length and conservation of haplotype stretches shared between the reference panel and your study samples), imputation should proceed rapidly. Because marker names can change between dbSNP versions, it is usually a good idea to include ''aliases'' file that provides mappings between earlier marker names and the current preferred name for each polymorphism.
+
or
   −
A typical minimac command line, where the string $chr should be replaced with an appropriate chromosome number, might look like this:
+
./minimac2 --refHaps hapmap.hap --refSnps hapmap.snps --haps sample.pp.gz --snps sample.snps --prefix sample2.imp | tee minimac2.log
   −
== using a VCF reference panel ==
  −
minimac --vcfReference --refHaps ref.vcf.gz --haps target.hap.gz --snps target.snps.gz --rounds 5 --states 200 --prefix results
  −
Note: GWAS SNPs (file --snps target.snps.gz) are by default expected to be in the chr:pos format e.g. 1:1000 and on build37/hg19;
  −
          otherwise, please set the --rs flag and include an aliases file --snpAliase [http://www.sph.umich.edu/csg/abecasis/downloads/dbsnp134-merges.txt.gz dbsnp134-merges.txt.gz]
     −
+
A detailed description of all minimac(2) options is available [[Minimac Command Reference|elsewhere]]. Here is a brief description of the above parameters:
A detailed description of all minimac options is available [[Minimac Command Reference|elsewhere]]. Here is a brief description of the above parameters:
      
{| class="wikitable" border="1" cellpadding="2"
 
{| class="wikitable" border="1" cellpadding="2"
Line 82: Line 71:  
! Description
 
! Description
 
|-  
 
|-  
| <code>--refHaps ref.hap.gz </code>  
+
| <code>--refHaps hapmap.hap </code>  
| Reference haplotypes (e.g. from [http://www.sph.umich.edu/csg/abecasis/MACH/download/1000G-2010-06.html MaCH download page])
+
| Reference haplotypes (e.g. from HapMap or the 1000 Genomes Project).
 +
|-
 +
| <code>--refSnps hapmap.snps </code>
 +
| List of sites in the reference haplotypes; needed unless the reference haplotypes are in VCF format.
 
|-
 
|-
 
| <code>--vcfReference </code>  
 
| <code>--vcfReference </code>  
Line 106: Line 98:       −
You can speed-up things by running minimac in parallel by launching the [http://genome.sph.umich.edu/wiki/Minimac#Multiprocessor_Version minimac-omp] version. On our cluster 4 cpus per minimac is optimal (--cpus 4).
     −
<source lang="text">
+
You can speed things up by running minimac in parallel using the [http://genome.sph.umich.edu/wiki/Minimac2#Multiprocessor_Version minimac2-omp] version. On our cluster, 4 CPUs per minimac(2) run is optimal (--cpus 4).
  foreach chr (`seq 1 22`)
+
 
 +
./minimac-omp --cpus 4 --refHaps hapmap.hap --refSnps hapmap.snps --haps sample.pp.gz --snps sample.snps --prefix sample.imp | tee minimac-omp.log
   −
    runon -m 1024 minimac-omp --cpus 4 --refHaps ref.hap.$chr.gz  --vcfReference  \
+
or
                          --haps chr$chr.haps.gz --snps chr.$chr.snps --prefix chr$chr.imputed                   
  −
  end
  −
</source>
      +
./minimac2-omp --cpus 4 --refHaps hapmap.hap --refSnps hapmap.snps --haps sample.pp.gz --snps sample.snps --prefix sample2.imp | tee minimac2-omp.log
    
== Imputation quality evaluation ==
 
== Imputation quality evaluation ==
Line 129: Line 119:  
= Reference =
 
= Reference =
   −
If you use minimac, please cite:  
+
If you use minimac or minimac2, please cite:  
    
Howie B, Fuchsberger C, Stephens M, Marchini J, and Abecasis GR.
 
Howie B, Fuchsberger C, Stephens M, Marchini J, and Abecasis GR.
