Changes

704 bytes added, 10:02, 2 February 2017

Line 1: Line 1:

−

''ChunkChromosome'' is a helper utility for [[minimac]] and [[MaCH]]. It can be used to facilitate analyses of very large datasets in overlapping slices.

+

''ChunkChromosome'' is a helper utility for [[minimac]] and [[MaCH]]. It can be used to facilitate analyses of very large datasets in overlapping slices. For information on how to put the resulting chunks back together, see [[Ligate Minimac|this page]].

== Parameters ==

Line 16: Line 16:

awk '{ if ($1 == "M") print $2; }' < chr1.dat > chr1.snps

mach1 -d chr1.dat -p chr1.ped --rounds 20 --states 200 --phase --prefix chr1.haps

−

minimac --refHaps 1000genomes.chr1.haps.gz --refSnps 1000genomes.chr1.snps --haps chr1.haps.gz --snps chr1.snps --rounds 5 --states 200 --prefix ~~imputation-results~~

+

minimac --refHaps 1000genomes.chr1.haps.gz --refSnps 1000genomes.chr1.snps --haps chr1.haps.gz --snps chr1.snps --rounds 5 --states 200 --prefix chr1.imputed

</source>
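For readers less familiar with awk, the marker-extraction step in the example above can be sketched in Python. This is only an illustration of what the awk one-liner does (keep the second column of every row whose first column is "M"). The function name `marker_names` and the sample rows are hypothetical, and unlike awk the sketch skips "M" rows that have no second column.

```python
def marker_names(dat_lines):
    """Return the second field of every line whose first field is 'M'.

    Mirrors: awk '{ if ($1 == "M") print $2; }' < chr1.dat > chr1.snps
    Assumption: fields are whitespace-separated, as awk splits them by default.
    """
    names = []
    for line in dat_lines:
        fields = line.split()
        # Rows with an 'M' tag but no name are skipped here (awk would print an empty line).
        if len(fields) >= 2 and fields[0] == "M":
            names.append(fields[1])
    return names


# Hypothetical rows, invented for illustration only.
example = ["A affection", "M rs123", "M rs456", "T height"]
print(marker_names(example))  # ['rs123', 'rs456']
```

To apply it to a real file, pass an open file handle, for example `marker_names(open("chr1.dat"))`, and write the returned names one per line.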

Line 22: Line 22:

−

ChunkChromosome -d ~~chr1~~.dat

+

#!/bin/tcsh

+

@ length = 2500

+

@ overlap = 500

+

# Estimate haplotypes for all individuals, in 2500 marker chunks, with 500 marker overhang

+

foreach chr (`seq 1 22`)

+

ChunkChromosome -d chr$chr.dat -n $length -o $overlap

+

foreach chunk (chunk*-chr$chr.dat)

+

mach -d $chunk -p chr$chr.ped --prefix ${chunk:r} \

+

--rounds 20 --states 200 --phase --sample 5 >& ${chunk:r}-mach.log &

+

end

−

~~# Phase each chunk in parallel~~

−

~~foreach chunk (chunk*-chr1.dat)~~

−

~~mach1 -d $chunk -p chr1.ped --rounds 20 --states 200 --phase --prefix ${chunk:r}.haps >& ${chunk:r}-mach.log &~~

end

wait

−

# Impute ~~each chunk in parallel~~

+

# Impute into phased haplotypes

−

foreach chunk (chunk*-~~chr1~~.dat)

+

foreach chr (`seq 1 22`)

−

~~minimac --autoClip autoChunk-chr1~~.~~dat \~~

+

−

--refHaps ~~1000genomes.chr1.~~haps~~.gz~~ --refSnps ~~1000genomes.chr1.~~snps --rounds 5 --states 200 \

+

foreach chunk (chunk*-chr$chr.dat)

−

--haps ${chunk:r}~~.haps~~.gz --snps ${chunk}.snps --prefix ${chunk:r}~~-results~~ >& ${chunk:r}-minimac.log &

+

set haps = /data/1000g/hap/all/20101123.chr$chr.hap.gz

+

set snps = /data/1000g/snps/chr$chr.snps

+

minimac --refHaps $haps --refSnps $snps --rounds 5 --states 200 \

+

--haps ${chunk:r}.gz --snps ${chunk}.snps --autoClip autoChunk-chr$chr.dat \

+

--prefix ${chunk:r}.imputed >& ${chunk:r}-minimac.log &

+

end

+

end

wait

</source>

−

The ''autoChunk'', generated by the ChunkChromosome program, tells minimac what are the markers of interest for each chunk. This allows chunks to overlap (which improves accuracy near the edges) but still ensures that each marker is only imputed once.

+

The ''autoChunk'' file, generated by the ChunkChromosome program, tells minimac which markers are of interest in each chunk. This lets chunks overlap (which improves accuracy near the edges) while still ensuring that each marker is imputed only once.
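The idea of overlapping chunks with non-overlapping regions of interest can be sketched as follows. This is a minimal illustration, not the ChunkChromosome implementation or the ''autoChunk'' file format: the function `plan_chunks` is hypothetical, and it assumes that the chunk length counts the core markers and that the overlap is added on each side of the core.

```python
def plan_chunks(n_markers, length=2500, overlap=500):
    """Plan overlapping chunks over marker indices 0 .. n_markers - 1.

    Returns a list of (start, end, core_start, core_end) tuples using
    half-open ranges. [start, end) is what would be phased or imputed;
    [core_start, core_end) is the part kept, so every marker falls in
    exactly one core.
    Assumption: `length` is the core size and `overlap` pads both sides.
    """
    if length <= 0 or overlap < 0:
        raise ValueError("length must be positive and overlap non-negative")
    chunks = []
    for core_start in range(0, n_markers, length):
        core_end = min(core_start + length, n_markers)
        start = max(0, core_start - overlap)       # left flank, clipped at the first marker
        end = min(n_markers, core_end + overlap)   # right flank, clipped at the last marker
        chunks.append((start, end, core_start, core_end))
    return chunks


for chunk in plan_chunks(6000):
    print(chunk)
# (0, 3000, 0, 2500)
# (2000, 5500, 2500, 5000)
# (4500, 6000, 5000, 6000)
```

In this sketch the flanking markers give each chunk extra context near its edges, while only the core ranges would be carried forward, which is the role the text above attributes to the ''autoChunk'' file.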

== Download ==

+

You can download the source code for the ChunkChromosome program as a tarball: [http://csg.sph.umich.edu//cfuchsb/generic-ChunkChromosome-2014-05-27.tar.gz generic-ChunkChromosome-2014-05-27.tar.gz]. After downloading, unpack the archive and use Make to compile the tool.

Ppwhite (96 edits)

