Difference between revisions of "ChunkChromosome"

Revision as of 07:55, 5 August 2011

ChunkChromosome is a helper utility for minimac and MaCH. It splits a very large dataset into overlapping slices so that each slice can be analyzed separately.

Parameters

ChunkChromosome expects three parameters:

A data file (specified with the -d command line option), listing all the markers along one chromosome. The data file can optionally include phenotype and other information, which is safely ignored.
A desired core chunk size, in markers (specified with the -n command line option and defaulting to 5000 markers).
A desired overlap between chunks, also in markers (specified with the -o command line option and defaulting to 500 markers).
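To make the interaction between the core chunk size and the overlap concrete, here is a minimal bash sketch that prints marker ranges for a given number of markers. It is an illustration only: the function name `chunk_ranges` is hypothetical, and the rule it uses (fixed-size cores with the overlap added on both sides and clipped at the chromosome ends) is an assumption, not ChunkChromosome's documented algorithm, so the real tool may place boundaries differently.

```shell
# Illustrative only: NOT ChunkChromosome's actual algorithm.
# Prints 1-based marker ranges for each chunk, assuming fixed-size cores
# with the overlap added on both sides and clipped at the chromosome ends.
# Arguments: total markers, core size (default 5000), overlap (default 500).
chunk_ranges() {
    local total=$1 core=${2:-5000} overlap=${3:-500}
    local chunk=1 core_start=1 core_end start end
    while [ "$core_start" -le "$total" ]; do
        # Core region: the markers this chunk is responsible for
        core_end=$(( core_start + core - 1 ))
        if [ "$core_end" -gt "$total" ]; then core_end=$total; fi
        # Flanking overlap shared with the neighbouring chunks
        start=$(( core_start - overlap ))
        if [ "$start" -lt 1 ]; then start=1; fi
        end=$(( core_end + overlap ))
        if [ "$end" -gt "$total" ]; then end=$total; fi
        echo "chunk${chunk} core=${core_start}-${core_end} with_overlap=${start}-${end}"
        chunk=$(( chunk + 1 ))
        core_start=$(( core_end + 1 ))
    done
}

chunk_ranges 12000
```

With the default values of 5000 and 500, a hypothetical chromosome of 12000 markers yields three chunks under these assumptions: cores 1-5000, 5001-10000 and 10001-12000, extended to 1-5500, 4501-10500 and 9501-12000. The marker count for a real data file could be obtained with `awk '$1 == "M"' chr1.dat | wc -l`.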

Usage

Suppose you plan to run 1000 Genomes Imputation using MaCH and minimac. Typically, you'd accomplish this by running the following commands:

awk '{ if ($1 == "M") print $2; }' < chr1.dat > chr1.snps
mach1 -d chr1.dat -p chr1.ped --rounds 20 --states 200 --phase --prefix chr1.haps
minimac --refHaps 1000genomes.chr1.haps.gz  --refSnps 1000genomes.chr1.snps --haps chr1.haps.gz --snps chr1.snps  --rounds 5 --states 200 --prefix imputation-results

These commands would haplotype (with MaCH) and then impute (with minimac) an entire chromosome. While the process works, it can be rather time-consuming for large chromosomes and large numbers of individuals. ChunkChromosome streamlines the process by letting different portions of each chromosome run in parallel.

ChunkChromosome -d chr1.dat

# Phase each chunk in parallel
foreach chunk (chunk*-chr1.dat)
   mach1 -d $chunk -p chr1.ped --rounds 20 --states 200 --phase --prefix ${chunk:r}.haps &
end

# Impute each chunk in parallel
foreach chunk (chunk*-chr1.dat)
   minimac --refHaps 1000genomes.chr1.haps.gz  --refSnps 1000genomes.chr1.snps --haps ${chunk:r}.haps.gz --snps ${chunk:r}.snps  --rounds 5 --states 200 --prefix ${chunk:r}-results
end
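The loops above use csh/tcsh syntax (`foreach ... end` and the `:r` modifier that strips a file extension). For readers working in bash, the same workflow might be sketched as follows. The mach1 and minimac flags are copied unchanged from the commands above; the function names `phase_chunks` and `impute_chunks` are hypothetical helpers, and the sketch assumes that a `.snps` file already exists for every chunk. Because the phasing jobs run in the background, each function calls `wait` so that imputation does not start before the phased haplotypes have been written.

```shell
# Sketch of the chunked workflow in bash syntax.
# phase_chunks and impute_chunks are hypothetical helper names, not part
# of MaCH or minimac. Assumes each chunk already has a matching .snps file.

# Phase every chunk in the background, then block until all jobs finish.
phase_chunks() {
    local chunk base
    for chunk in chunk*-chr1.dat; do
        base=${chunk%.dat}   # strip the extension, like ${chunk:r} in csh
        mach1 -d "$chunk" -p chr1.ped --rounds 20 --states 200 --phase \
              --prefix "${base}.haps" &
    done
    wait
}

# Impute every chunk in the background, then block until all jobs finish.
impute_chunks() {
    local chunk base
    for chunk in chunk*-chr1.dat; do
        base=${chunk%.dat}
        minimac --refHaps 1000genomes.chr1.haps.gz --refSnps 1000genomes.chr1.snps \
                --haps "${base}.haps.gz" --snps "${base}.snps" \
                --rounds 5 --states 200 --prefix "${base}-results" &
    done
    wait
}
```

After running `ChunkChromosome -d chr1.dat`, calling `phase_chunks` followed by `impute_chunks` from the directory that holds the chunk files would run the two stages in order.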

Download

@@ Line 15: / Line 15: @@
 <source lang="bash">
 awk '{ if ($1 == "M") print $2; }' < chr1.dat > chr1.snps
-mach1 -d chr1.dat -p chr1.ped --rounds 20 --states 200 --phase --interim 5 --sample 5 --prefix chr1.haps
+mach1 -d chr1.dat -p chr1.ped --rounds 20 --states 200 --phase --prefix chr1.haps
-minimac --refHaps 1000genomes.chr1.haps.gz  --refSnps 1000genomes.chr1.snps --haps chr1.haps --snps chr1.snps  --rounds 5 --states 200 --prefix imputation-results
+minimac --refHaps 1000genomes.chr1.haps.gz  --refSnps 1000genomes.chr1.snps --haps chr1.haps.gz --snps chr1.snps  --rounds 5 --states 200 --prefix imputation-results
 </source>
 These commands would haplotype (with MaCH) and then impute (with Minimac) an entire chromosome. While the process works, it can be rather time consuming for large chromosomes and large numbers of individuals. ChunkChromosome allows the process to be streamlined by running different portions of each chromosome in parallel.
+<source lang="bash">
+ChunkChromosome -d chr1.dat
+# Phase each chunk in parallel
+foreach chunk (chunk*-chr1.dat)
+   mach1 -d $chunk -p chr1.ped --rounds 20 --states 200 --phase --prefix ${chunk:r}.haps &
+end
+# Impute each chunk in parallel
+foreach chunk (chunk*-chr1.dat)
+   minimac --refHaps 1000genomes.chr1.haps.gz  --refSnps 1000genomes.chr1.snps --haps ${chunk:r}.haps.gz --snps ${chunk:r}.snps  --rounds 5 --states 200 --prefix ${chunk:r}-results
+end
 == Download ==