
From Genome Analysis Wiki
1,912 bytes added ,  10:02, 2 February 2017
''ChunkChromosome'' is a helper utility for [[minimac]] and [[MaCH]]. It splits the markers on a chromosome into overlapping slices, which makes very large datasets easier to analyze piece by piece. For information on how to put the resulting chunks back together, see [[Ligate Minimac|this page]].

== Parameters ==

== Usage ==

Suppose you plan to run [[Minimac: 1000 Genomes Imputation Cookbook|1000 Genomes Imputation]] using [[MaCH]] and [[minimac]]. Typically, you'd accomplish this by running the following commands:

<source lang="bash">
# List the names of the markers in chr1.dat (lines of type "M"); minimac reads this list via --snps
awk '{ if ($1 == "M") print $2; }' < chr1.dat > chr1.snps
# Estimate haplotypes for the study samples with MaCH (haplotypes are written to chr1.haps.gz)
mach1 -d chr1.dat -p chr1.ped --rounds 20 --states 200 --phase --prefix chr1.haps
# Impute the 1000 Genomes reference markers into those haplotypes with minimac
minimac --refHaps 1000genomes.chr1.haps.gz --refSnps 1000genomes.chr1.snps --haps chr1.haps.gz --snps chr1.snps --rounds 5 --states 200 --prefix chr1.imputed
</source>
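
The ''awk'' command above builds the marker list that minimac reads through ''--snps''. As a minimal illustration, assuming a Merlin-format data file in which each line holds a type code (''A'' for an affection status, ''T'' for a quantitative trait, ''M'' for a marker) followed by a name, the toy file below (hypothetical contents) reduces to its three marker names:

<source lang="bash">
#!/usr/bin/env bash
# Work in a scratch directory so the toy files do not clutter the current one
cd "$(mktemp -d)" || exit 1

# Hypothetical Merlin-format data file: type code, then name
cat > toy.dat <<'EOF'
A disease
T bmi
M rs1000
M rs1001
M rs1002
EOF

# Same awk filter as above: print the name of every marker ("M") line
awk '{ if ($1 == "M") print $2; }' < toy.dat > toy.snps
cat toy.snps
</source>

Here ''toy.snps'' ends up holding rs1000, rs1001 and rs1002, one per line; the ''A'' and ''T'' lines are skipped.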

The ''mach1'' and ''minimac'' commands above estimate haplotypes (with MaCH) and then impute (with minimac) an entire chromosome at once. This works, but it can be slow for large chromosomes and large numbers of individuals. ChunkChromosome speeds up the process by splitting each chromosome into overlapping chunks that can be phased and imputed in parallel:

<source lang="bash">
#!/bin/tcsh

# Chunk length and overlap, in markers
@ length = 2500
@ overlap = 500

# Estimate haplotypes for all individuals, in 2500 marker chunks, with 500 marker overhang
foreach chr (`seq 1 22`)

  # Split chr$chr.dat into overlapping chunk*-chr$chr.dat files; also writes autoChunk-chr$chr.dat
  ChunkChromosome -d chr$chr.dat -n $length -o $overlap

  foreach chunk (chunk*-chr$chr.dat)

      # ${chunk:r} drops the .dat extension; MaCH writes the haplotypes to ${chunk:r}.gz
      # '&' runs every chunk as a background job
      mach1 -d $chunk -p chr$chr.ped --prefix ${chunk:r} \
          --rounds 20 --states 200 --phase --sample 5 >& ${chunk:r}-mach.log &

  end

end
# Block until all phasing jobs have finished
wait

# Impute into phased haplotypes
foreach chr (`seq 1 22`)

  # 1000 Genomes reference panel for this chromosome (adjust paths to your installation)
  set haps = /data/1000g/hap/all/20101123.chr$chr.hap.gz
  set snps = /data/1000g/snps/chr$chr.snps

  foreach chunk (chunk*-chr$chr.dat)

      # --autoClip reads the autoChunk file so that each marker is imputed in only one chunk
      minimac --refHaps $haps --refSnps $snps --rounds 5 --states 200 \
              --haps ${chunk:r}.gz --snps ${chunk}.snps --autoClip autoChunk-chr$chr.dat \
              --prefix ${chunk:r}.imputed >& ${chunk:r}-minimac.log &

  end

end
wait
</source>
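
The script above ties its steps together through file names. The sketch below uses a hypothetical chunk name, ''chunk1-chr1.dat'' (any name matched by the ''chunk*-chr$chr.dat'' glob behaves the same way), and prints the file each step would read or write; bash's ''${chunk%.*}'' plays the role of tcsh's ''${chunk:r}'' modifier, which removes the last extension.

<source lang="bash">
#!/usr/bin/env bash
# Hypothetical chunk file name of the form matched by chunk*-chr$chr.dat
chunk="chunk1-chr1.dat"

# Equivalent of tcsh ${chunk:r}: strip the final extension
root="${chunk%.*}"

echo "MaCH --prefix:     $root"
echo "MaCH log:          $root-mach.log"
echo "minimac --haps:    $root.gz"
echo "minimac --snps:    $chunk.snps"
echo "minimac --prefix:  $root.imputed"
echo "minimac log:       $root-minimac.log"
</source>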

The ''autoChunk'' file generated by ChunkChromosome tells minimac which markers are of interest in each chunk. This allows chunks to overlap (which improves accuracy near their edges) while still ensuring that each marker is imputed only once.
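
To make the overlap concrete, here is a minimal sketch of the bookkeeping under an assumption this page does not spell out: each chunk owns a core of ''length'' consecutive markers and also analyzes up to ''overlap'' extra markers on each side. The hypothetical function ''chunk_bounds'' prints, for each chunk, the markers it analyzes and the core markers it keeps. ChunkChromosome's actual boundary rules may differ.

<source lang="bash">
#!/usr/bin/env bash
# Hypothetical sketch of overlapping-chunk bookkeeping, NOT ChunkChromosome's code.
# Assumption: each chunk keeps a "core" of LENGTH markers and analyzes up to
# OVERLAP extra markers on each side; the cores tile the chromosome once.
chunk_bounds() {
  local total=$1 length=$2 overlap=$3
  local chunk=1 core_start core_end start end
  for (( core_start = 1; core_start <= total; core_start += length )); do
    # Core: the markers this chunk is responsible for (clipped at the last marker)
    core_end=$(( core_start + length - 1 ))
    (( core_end > total )) && core_end=$total
    # Analyzed window: core plus the overlap on each side, clipped to 1..total
    start=$(( core_start - overlap ))
    (( start < 1 )) && start=1
    end=$(( core_end + overlap ))
    (( end > total )) && end=$total
    echo "chunk$chunk $start-$end keep $core_start-$core_end"
    chunk=$(( chunk + 1 ))
  done
}

# Example: 6,000 markers, 2,500 marker cores, 500 marker overlap
chunk_bounds 6000 2500 500
# chunk1 1-3000 keep 1-2500
# chunk2 2001-5500 keep 2501-5000
# chunk3 4501-6000 keep 5001-6000
</source>

The ''keep'' ranges cover every marker exactly once, which is the property the ''autoChunk'' file provides; the flanking markers only give each chunk extra context near its edges.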
    
== Download ==

You can download the source code for ChunkChromosome as a tarball, [http://csg.sph.umich.edu//cfuchsb/generic-ChunkChromosome-2014-05-27.tar.gz generic-ChunkChromosome-2014-05-27.tar.gz]. After downloading it, unpack the archive and run ''make'' to compile the tool.
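
The unpack-and-compile step can be wrapped in a small shell function. The sketch below is hypothetical: rather than assuming the archive's internal layout, the function ''build_from_tarball'' looks for a ''Makefile'' either at the top of the extracted files or one directory down and runs ''make'' there. It assumes ''tar'', ''find'' and ''make'' are installed.

<source lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: unpack a source tarball into DEST and build it with make.
build_from_tarball() {
  local archive=$1 dest=${2:-ChunkChromosome-src} dir
  mkdir -p "$dest" || return 1
  tar -xzf "$archive" -C "$dest" || return 1

  # The Makefile may sit at the top level or inside one top-level directory
  if [ -f "$dest/Makefile" ]; then
    dir=$dest
  else
    dir=$(find "$dest" -mindepth 2 -maxdepth 2 -name Makefile -exec dirname {} \; | head -n 1)
  fi
  if [ -z "$dir" ]; then
    echo "no Makefile found in $archive" >&2
    return 1
  fi

  make -C "$dir"
}
</source>

For example, after downloading the archive, ''build_from_tarball generic-ChunkChromosome-2014-05-27.tar.gz'' would unpack it into ''ChunkChromosome-src'' and compile it there.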