From Genome Analysis Wiki
1,133 bytes added, 17:25, 4 October 2010
Line 25: |
Line 25: |
| The 1000 Genome pilot project genotypes use NCBI Build 36. | | The 1000 Genome pilot project genotypes use NCBI Build 36. |
| | | |
− | === Step 1: Phasing === | + | === Step 1: Pre-Phasing === |
| + | |
| + | For the pre-phasing step we recommend running [[MaCH]] with the --phase command line option. As input, [[MaCH]] needs a pedigree file and a data file in [[Merlin]] format. All markers must be ordered by physical position. |
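The ordering requirement can be checked before starting a long run. The following is a minimal sketch, not part of MaCH: it assumes a hypothetical whitespace-separated map file with three columns (chromosome, marker name, base-pair position), and the sample rows are made up for illustration.

```shell
# Hypothetical map file: chromosome, marker name, base-pair position.
# This layout and the rows below are assumptions for illustration only.
map_file=$(mktemp)
printf '%s\n' '1 rs1 1000' '1 rs2 2500' '1 rs3 2400' > "$map_file"

# Find the first marker whose position is lower than the previous
# marker's position on the same chromosome.
out_of_order=$(awk '$1 == chr && $3 < pos { print $2; exit }
                    { chr = $1; pos = $3 }' "$map_file")
rm -f "$map_file"

echo "first out-of-order marker: ${out_of_order:-none}"
```

If a marker is reported, the marker list should be re-sorted before phasing, with the genotype columns in the pedigree file reordered to match.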
| + | |
| + | ==== Usage ==== |
| + | mach1 -d sample.dat -p sample.ped --rounds 20 --states 200 --phase --interim 1 --sample 1 --compact |
| + | |
| + | ==== Parameters ==== |
| + | --rounds R |
| + | number of iterations of the Markov sampler to run. |
| + | |
| + | --states ST |
| + | use a random subset of ST haplotypes as reference. We recommend values between 200 and 500. More states result in more accurate haplotypes, but are computationally more expensive. |
| + | |
| + | --interim I |
| + | output a set of best-guess haplotypes every I rounds by building a consensus from all previous Markov iterations. These haplotypes can be used for imputation. |
| + | |
| + | --sample SA |
| + | output a set of haplotypes every SA rounds based on random sampling from the last Markov iteration. These intermediate results can be combined and used as input for the imputation process. |
| + | |
| + | --phase |
| + | enables [[MaCH]] phasing mode. |
| + | |
| + | --compact |
| + | reduces the amount of memory needed dramatically, but doubles execution time. |
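The parameters above can be collected in a small wrapper script so that the settings are easy to change between runs. This is a sketch only: the variable names are hypothetical, the file names are the placeholders from the usage line, and the script prints the command rather than executing it.

```shell
# Settings for the pre-phasing run (variable names are illustrative).
ROUNDS=20
STATES=200

# Warn when --states falls outside the recommended range of 200 to 500.
if [ "$STATES" -lt 200 ] || [ "$STATES" -gt 500 ]; then
    echo "warning: --states $STATES is outside the recommended 200-500 range" >&2
fi

# Assemble the command using only the options described above.
set -- mach1 -d sample.dat -p sample.ped --rounds "$ROUNDS" --states "$STATES" \
    --phase --interim 1 --sample 1 --compact
cmd_line="$*"

# Print for inspection; run it with "$@" once mach1 is on the PATH.
echo "$cmd_line"
```

Keeping --compact in the command trades run time for memory, which may be worth dropping on machines with enough memory for the chosen number of states.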
| + | |
| + | |
| | | |
| === Step 2: Imputation === | | === Step 2: Imputation === |