Michigan Pilot One SNP calling workflow:

(1) Create GLF files from BAM files

(2) Split GLF files by chromosome

(3) Build a list of individuals within each population

(4) Link files and tabulate # of files per population, per platform

(5) Check total depth for each population, each platform

(6) Filter sites with total depth at the extremes, within each population, each platform

(7) Obtain one merged GLF for each individual by merging GLFs across platforms for the same individual

(8) Promote a set of sites for each population

(9) Merge with genotype data

(10) Run thunder

(11) Ligate thunder results for larger chromosomes

(12) Extract QC+ sites

(13) Generate other information for VCF format

(14) Generate VCF

(15) Quality check

== (1) Create GLF files from BAM files ==

== (2) Split GLF files by chromosome ==

/home1/ylwtx/2009.08.GLF-split/

update-glf.csh<br>splitGLF.csh

key command: glfSplit<br>Source code: ~goncalo/code/glfSplit/

Input GLF format: gz or bgzf<br>Output GLF format: gz

Tom suggested combining the first two steps with the following samtools pipeline (here 22 is the chromosome to extract, and *.bam and *.glf stand for one input BAM file and its output GLF file):<br>samtools view -u *.bam 22 | samtools pileup -g - > *.glf
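A minimal Python sketch of running Tom's combined command once per chromosome. It only builds the shell pipelines; the file names (sample.bam and the .chrN.glf output pattern) are hypothetical, and it assumes an older samtools release that still provides pileup -g, plus an indexed BAM file (samtools view needs the index for region queries).

```python
import shlex


def glf_commands(bam_path, out_prefix, chromosomes):
    """Return one 'samtools view | samtools pileup -g' pipeline per chromosome.

    Assumes an old samtools release that still has 'pileup -g' (GLF output)
    and an indexed BAM, since 'samtools view' needs the index for a region.
    """
    commands = []
    for chrom in chromosomes:
        # Hypothetical output naming scheme: <prefix>.chr<N>.glf
        out_path = f"{out_prefix}.chr{chrom}.glf"
        commands.append(
            f"samtools view -u {shlex.quote(bam_path)} {shlex.quote(chrom)}"
            f" | samtools pileup -g - > {shlex.quote(out_path)}"
        )
    return commands


if __name__ == "__main__":
    # 'sample.bam' and 'sample' are placeholders, not files from this project.
    for cmd in glf_commands("sample.bam", "sample", [str(c) for c in range(1, 23)]):
        print(cmd)
```

Each string can be executed with subprocess.run(cmd, shell=True, check=True); shell=True is needed because of the pipe and the redirection.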

== (3) Build a list of individuals within each population ==

/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.11.all

STEP 0 in s1-5.csh

Note: Make sure that all individuals with GLF files are included in the list “NA.number.by.popn”.
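The check in the note can be sketched in Python as a set difference. Both file layouts here are assumptions: the population list is taken to have one "individual-ID population" pair per line, and GLF file names are taken to start with the individual's NA ID.

```python
import re

# ASSUMED: GLF file names begin with the individual's NA ID, e.g. "NA00001.chr22.glf.gz".
ID_PATTERN = re.compile(r"NA\d+")


def ids_in_list(lines):
    """IDs from a population list; ASSUMED format: '<individual ID> <population>' per line."""
    return {line.split()[0] for line in lines if line.strip()}


def ids_with_glf(filenames):
    """IDs parsed from the start of each GLF file name (non-matching names are ignored)."""
    ids = set()
    for name in filenames:
        match = ID_PATTERN.match(name)
        if match:
            ids.add(match.group(0))
    return ids


def missing_from_list(glf_filenames, list_lines):
    """Individuals that have GLF files but do not appear in the population list."""
    return sorted(ids_with_glf(glf_filenames) - ids_in_list(list_lines))


if __name__ == "__main__":
    # Hypothetical data; in practice read "NA.number.by.popn" and list the GLF directory.
    listed = ["NA00001 CEU\n", "NA00002 YRI\n"]
    glfs = ["NA00001.chr22.glf.gz", "NA00002.chr22.glf.gz", "NA00003.chr22.glf.gz"]
    print(missing_from_list(glfs, listed))  # ['NA00003']
```

An empty result means every individual with GLF data is in the list.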

== (4) Link files and tabulate # of files per population, per platform ==
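The tabulation in this step amounts to counting files per (population, platform) pair. A minimal Python sketch, assuming each linked file has already been mapped to an (individual, population, platform) record; how those fields are parsed from the file names is not specified here, and the records below are hypothetical.

```python
from collections import Counter


def tabulate(records):
    """Count files per (population, platform).

    records: iterable of (individual, population, platform) tuples. How these
    are derived from the linked file names is left open (an assumption).
    """
    return Counter((population, platform) for _individual, population, platform in records)


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    records = [
        ("NA00001", "CEU", "ILLUMINA"),
        ("NA00001", "CEU", "SOLID"),
        ("NA00002", "CEU", "ILLUMINA"),
        ("NA00003", "YRI", "LS454"),
    ]
    for (population, platform), n_files in sorted(tabulate(records).items()):
        print(f"{population}\t{platform}\t{n_files}")
```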

Parameters:<br>minMapQ = 30
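Step (6) removes sites whose total depth is at the extremes. The exact cutoffs are not recorded on this page, so the Python sketch below uses hypothetical percentile cutoffs on the total depth (summed over individuals), assuming the per-individual depths already count only reads passing minMapQ = 30. It would be applied separately to each population/platform combination.

```python
def percentile_cutoff(values, pct):
    """Value at integer percentile pct (0-100) of the sorted values (simple rank rule).

    Integer arithmetic is used for the index to avoid floating-point rounding.
    """
    ordered = sorted(values)
    return ordered[(pct * (len(ordered) - 1)) // 100]


def filter_extreme_depth(site_depths, low_pct=1, high_pct=99):
    """Keep sites whose total depth lies within the [low_pct, high_pct] percentile range.

    site_depths maps a position to per-individual read depths (ASSUMED to count
    only reads with mapping quality >= 30). The default cutoffs are hypothetical.
    """
    if not site_depths:
        return {}
    totals = {pos: sum(depths) for pos, depths in site_depths.items()}
    low = percentile_cutoff(totals.values(), low_pct)
    high = percentile_cutoff(totals.values(), high_pct)
    return {pos: total for pos, total in totals.items() if low <= total <= high}


if __name__ == "__main__":
    # Hypothetical depths for three individuals at five sites.
    sites = {100: [3, 4, 5], 200: [0, 1, 0], 300: [6, 5, 4], 400: [90, 80, 95], 500: [4, 4, 4]}
    print(filter_extreme_depth(sites, low_pct=10, high_pct=90))
```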

== (7) Obtain one merged GLF for each individual by merging GLFs across platforms for the same individual ==

/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.09.all/
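Because reads from different platforms are independent, one natural way to merge is to multiply the genotype likelihoods, i.e. add the phred-scaled values genotype by genotype, then rescale so the best genotype is 0. The Python sketch below assumes that approach and GLF-style records of 10 diploid genotype likelihoods stored as phred-scaled values capped at 255; it is not necessarily what the scripts in this directory do, and reading or writing binary GLF files is omitted.

```python
GENOTYPES = ("AA", "AC", "AG", "AT", "CC", "CG", "CT", "GG", "GT", "TT")


def merge_site(*phred_vectors, cap=255):
    """Combine phred-scaled genotype likelihoods from independent platforms.

    Adding phred values multiplies the likelihoods; subtracting the minimum
    makes the best genotype 0, and values are capped to fit an 8-bit field.
    """
    summed = [sum(values) for values in zip(*phred_vectors)]
    best = min(summed)
    return [min(cap, value - best) for value in summed]


def merge_glfs(per_platform):
    """per_platform: list of dicts mapping (chrom, pos) -> 10 phred values.

    Sites covered by only one platform are kept (rescaled) as they are.
    """
    merged = {}
    for key in sorted(set().union(*per_platform)):
        vectors = [glf[key] for glf in per_platform if key in glf]
        merged[key] = merge_site(*vectors)
    return merged


if __name__ == "__main__":
    # Hypothetical likelihoods for one site seen on two platforms.
    illumina = {("22", 100): [0, 30] + [255] * 8}
    solid = {("22", 100): [20, 0] + [255] * 8}
    print(dict(zip(GENOTYPES, merge_glfs([illumina, solid])[("22", 100)])))
```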

Notes:<br>Sites with more than two alleles (not counting REF_ALLELE) will be discarded.
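A minimal Python sketch of the allele check in the note, assuming genotype calls are strings such as "A/G" and that "0", "N" and "." mark missing alleles (both assumptions). Only alleles seen in the genotype calls are counted; the REF_ALLELE field is not.

```python
MISSING = {"0", "N", "."}  # ASSUMED missing-allele codes


def observed_alleles(genotypes):
    """Distinct alleles in genotype calls such as 'A/G' or 'A|G', skipping missing codes."""
    alleles = set()
    for call in genotypes:
        for allele in call.replace("|", "/").split("/"):
            if allele not in MISSING:
                alleles.add(allele)
    return alleles


def keep_site(genotypes):
    """Keep a site unless its genotype calls show more than two distinct alleles.

    The REF_ALLELE field is deliberately not counted, following the wiki note.
    """
    return len(observed_alleles(genotypes)) <= 2


if __name__ == "__main__":
    print(keep_site(["A/G", "G/G", "0/0"]))  # True
    print(keep_site(["A/G", "C/C"]))         # False
```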

To-Do:<br>Include genotypes only for individuals that have sequence data.

== (10) Run thunder (hidden Markov model) ==

/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.09.all/

Notes:<br>(1) Monomorphic sites were removed before feeding the data to thunder (no longer necessary, because thunder 005 handles AL1/-).<br>(2) All sites are bi-allelic, with one of the alleles being the reference allele. Sites with more than two alleles (including the reference allele) are discarded at the beginning of the thunder run. This was originally needed because the prior depended on the reference allele; in the current setting, where Freq1 is used for the prior, the reference allele information can be ignored.<br>a. Code changed on 2009-11-02.<br>(3) Split:

{| cellspacing="1" cellpadding="1" border="1" style="width: 634px; height: 251px;"
|-
! chromosome
! # parts
! length per part in Mb (last part)
! start (Mb)
! end (Mb)
|-
| 1-2
| 4
| 70 (63-67)
| 0, 60, 120, 180
| 70, 130, 190, 243-247
|-
| 3-4
| 4
| 60 (41-49)
| 0, 50, 100, 150
| 60, 110, 160, 191-200
|-
| 5-6
| 3
| 70 (51-61)
| 0, 60, 120
| 70, 130, 171-181
|-
| 7-8
| 3
| 60 (40-59)
| 0, 50, 100
| 60, 110, 140-159
|-
| 9*
| 3
| 75, 45, 40
| 0, '''65''', 100
| '''75''', 110, 140
|-
| 10-12
| 2
| 70 (72-75)
| 0, 60
| 70, 132-135
|-
| 13-15
| 2
| 60 (50-64)
| 0, 50
| 60, 100-114
|-
| 16-22
| 1
| 47-89 (whole chromosome)
|
|
|}

<nowiki>*</nowiki> Chromosome 9 has a gap between 47 and 65 Mb.

Total: 150 jobs (50 jobs for each of the three populations)
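The split can be checked with a small Python sketch. Except for chromosome 9, each table row follows one pattern: equal-length parts that overlap by 10 Mb, with the last part running to the chromosome end. The windows() helper reproduces that pattern (the 10 Mb overlap is read off the table, and chromosome lengths are taken from the upper end of the "end" ranges), and the part counts confirm 50 jobs per population. The helper names are illustrative, not from the original scripts.

```python
# Number of parts per chromosome, copied from the table.
# Chromosome 9 has 3 parts, but its boundaries are irregular because of the 47-65 Mb gap.
PARTS = {
    **{c: 4 for c in (1, 2, 3, 4)},
    **{c: 3 for c in (5, 6, 7, 8, 9)},
    **{c: 2 for c in range(10, 16)},
    **{c: 1 for c in range(16, 23)},
}


def windows(chrom_len_mb, part_len_mb, n_parts, overlap_mb=10):
    """Parts of part_len_mb that overlap by overlap_mb; the last part runs to the chromosome end."""
    step = part_len_mb - overlap_mb
    starts = [i * step for i in range(n_parts)]
    ends = [start + part_len_mb for start in starts[:-1]] + [chrom_len_mb]
    return list(zip(starts, ends))


def total_jobs(n_populations=3):
    """One thunder job per part per population."""
    return n_populations * sum(PARTS.values())


if __name__ == "__main__":
    print(windows(247, 70, 4))  # chromosome 1, using the 247 Mb end from the table
    print(total_jobs())
```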

== (11) Ligate thunder results for larger chromosomes ==