Changes

From Genome Analysis Wiki
Line 48: Line 48:  
Centromere locations are available here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz
 
Centromere locations are available here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz
   −
== Pre-Phasing using IMPUTE2 ===
+
== Pre-Phasing using IMPUTE2 ==
    
You should now be ready to run the first analysis step, which is to estimate phased haplotypes for each sample. The script below, named ''prototype_phasing_job.sh'' (based on the [https://mathgen.stats.ox.ac.uk/impute/prephasing_and_imputation_with_impute2.tgz example script] on the IMPUTE2 website), illustrates how to do this. It requires three parameters (chromosome number, interval start and interval end) and assumes that input data files will be in a directory called <code>data_files</code>, with results in a <code>results</code> directory and estimated haplotypes in a <code>sampled_haps</code> directory. It has been modified to specify that the build 37 recombination map should be used and to include the <code>-allow_large_regions</code> parameter.

You should now be ready to run the first analysis step, which is to estimate phased haplotypes for each sample. The script below, named ''prototype_phasing_job.sh'' (based on the [https://mathgen.stats.ox.ac.uk/impute/prephasing_and_imputation_with_impute2.tgz example script] on the IMPUTE2 website), illustrates how to do this. It requires three parameters (chromosome number, interval start and interval end) and assumes that input data files will be in a directory called <code>data_files</code>, with results in a <code>results</code> directory and estimated haplotypes in a <code>sampled_haps</code> directory. It has been modified to specify that the build 37 recombination map should be used and to include the <code>-allow_large_regions</code> parameter.
Line 109: Line 109:  
</source>
 
</source>
   −
where <code>sge</code> should be replaced with the appropriate command for submitting jobs to your cluster (<code>sge</code> applies to sun grid engine, other common choices might be <code>qsub</code> and <code>mosrun</mosrun>. The three numbers correspond to chromosome and chunk start and end positions.
+
where <code>sge</code> should be replaced with the appropriate command for submitting jobs to your cluster (<code>sge</code> applies to Sun Grid Engine; other common choices might be <code>qsub</code> and <code>mosrun</code>). The three numbers correspond to chromosome and chunk start and end positions.
    
On a 30 node cluster, phasing should take approximately 5 hours per 1000 individuals.
 
On a 30 node cluster, phasing should take approximately 5 hours per 1000 individuals.
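
The three positional arguments described above (chromosome, chunk start, chunk end) can be generated for every chunk of a chromosome rather than typed by hand. The following is a minimal sketch, not part of the original page: the helper name <code>print_phasing_jobs</code>, the <code>./</code> path to the script, and the chromosome length and chunk size passed in the example call are all illustrative assumptions. It only prints the commands; nothing is submitted.

```shell
#!/bin/sh
# Hypothetical helper: prints one prototype_phasing_job.sh command per chunk.
# The chromosome length and chunk size are assumed inputs, not values from the page.
print_phasing_jobs() {
    chr=$1        # chromosome number
    chr_len=$2    # chromosome length in base pairs (assumed known)
    chunk=$3      # chunk size in base pairs (assumed)
    start=1
    while [ "$start" -le "$chr_len" ]; do
        end=$((start + chunk - 1))
        # Clip the final chunk to the end of the chromosome.
        if [ "$end" -gt "$chr_len" ]; then
            end=$chr_len
        fi
        echo "./prototype_phasing_job.sh $chr $start $end"
        start=$((end + 1))
    done
}

# Example call with made-up numbers: a 12 Mb region in 5 Mb chunks.
print_phasing_jobs 22 12000000 5000000
```

Each printed line can then be prefixed with whatever job submission command your cluster uses.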
