From Genome Analysis Wiki
3 bytes removed
, 06:36, 5 August 2011
Line 48: |
Line 48: |
| Centromere locations are available here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz | | Centromere locations are available here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz |
| | | |
− | == Pre-Phasing using IMPUTE2 === | + | == Pre-Phasing using IMPUTE2 == |
| | | |
| You should now be ready to run the first analysis step, which is to estimate phased haplotypes for each sample. The script below, named ''prototype_phasing_job.sh'' (based on the [https://mathgen.stats.ox.ac.uk/impute/prephasing_and_imputation_with_impute2.tgz example script] on the IMPUTE2 website), illustrates how to do this. It requires three parameters (chromosome number, interval start and interval end) and assumes that input data files will be in a directory called <code>data_files</code>, with results in a <code>results</code> directory and estimated haplotypes in a <code>sampled_haps</code> directory. It has been modified to specify that the build 37 recombination map should be used and to include the <code>-allow_large_regions</code> parameter. | | You should now be ready to run the first analysis step, which is to estimate phased haplotypes for each sample. The script below, named ''prototype_phasing_job.sh'' (based on the [https://mathgen.stats.ox.ac.uk/impute/prephasing_and_imputation_with_impute2.tgz example script] on the IMPUTE2 website), illustrates how to do this. It requires three parameters (chromosome number, interval start and interval end) and assumes that input data files will be in a directory called <code>data_files</code>, with results in a <code>results</code> directory and estimated haplotypes in a <code>sampled_haps</code> directory. It has been modified to specify that the build 37 recombination map should be used and to include the <code>-allow_large_regions</code> parameter. |
Line 109: |
Line 109: |
| </source> | | </source> |
| | | |
− | where <code>sge</code> should be replaced with the appropriate command for submitting jobs to your cluster (<code>sge</code> applies to sun grid engine, other common choices might be <code>qsub</code> and <code>mosrun</mosrun>. The three numbers correspond to chromosome and chunk start and end positions. | + | where <code>sge</code> should be replaced with the appropriate command for submitting jobs to your cluster (<code>sge</code> applies to sun grid engine, other common choices might be <code>qsub</code> and <code>mosrun</code>. The three numbers correspond to chromosome and chunk start and end positions. |
| | | |
| On a 30 node cluster, phasing should take approximately 5 hours per 1000 individuals. | | On a 30 node cluster, phasing should take approximately 5 hours per 1000 individuals. |
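
The three arguments described above (chromosome, chunk start, chunk end) lend themselves to a simple loop that emits one job per chunk. The following is only a minimal sketch: the helper name <code>print_chunk_jobs</code>, the 5 Mb chunk size and the 12 Mb chromosome length are illustrative assumptions, not values taken from this page, and the sketch does not account for centromere positions. Each printed line would still need to be prefixed with the job submission command for your cluster.

```shell
#!/bin/bash
# Sketch: print one phasing command per fixed-size chunk of a chromosome.
# Assumptions (not from the original page): the helper name
# print_chunk_jobs, the 5 Mb chunk size, and the example chromosome length.

print_chunk_jobs() {
    local chr=$1        # chromosome number
    local chr_len=$2    # chromosome length in base pairs (assumed known)
    local chunk=$3      # chunk size in base pairs
    local start=1
    local end
    while [ "$start" -le "$chr_len" ]; do
        end=$((start + chunk - 1))
        # Clip the final chunk to the end of the chromosome.
        if [ "$end" -gt "$chr_len" ]; then
            end=$chr_len
        fi
        # The three arguments are chromosome, chunk start and chunk end.
        echo "./prototype_phasing_job.sh $chr $start $end"
        start=$((end + 1))
    done
}

# Example: a hypothetical 12 Mb chromosome split into 5 Mb chunks.
print_chunk_jobs 20 12000000 5000000
```

Printing the commands rather than running them makes it easy to inspect the chunk boundaries before submitting anything to the cluster.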