Changes

From Genome Analysis Wiki
no edit summary
Line 44: Line 44:  
(file size) BAM (binary SAM) file: up to 43Gb (average: tens of Gb) for each individual<br>(file size) GLF (binary) file: up to 19Gb (average: tens of Gb) for each individual
   −
<br>back
+
== (2) Split GLF files by chromosome ==
 
  −
<br>(2) Split GLF files by chromosome  
      
/home1/ylwtx/2009.08.GLF-split/  
 
Line 58: Line 56:  
Tom suggested combining the first two steps using the following samtools command:<br>samtools view -u *.bam 22 | samtools pileup -g - > *.glf
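A minimal sketch of how the combined command could be generated per sample and per chromosome. The function name and the file names are hypothetical placeholders; only `samtools view -u` and `samtools pileup -g -` come from the note above, and `pileup` is only available in older samtools releases.

```python
import shlex


def glf_command(bam_path, chrom, glf_path):
    """Build the one-step BAM -> per-chromosome GLF shell pipeline as a string."""
    # Step 1: extract one chromosome as uncompressed BAM on stdout.
    view = ["samtools", "view", "-u", bam_path, str(chrom)]
    # Step 2: read BAM from stdin ("-") and write GLF to stdout.
    pileup = ["samtools", "pileup", "-g", "-"]
    return "{} | {} > {}".format(
        " ".join(shlex.quote(arg) for arg in view),
        " ".join(shlex.quote(arg) for arg in pileup),
        shlex.quote(glf_path),
    )
```

For example, `glf_command("sample.bam", 22, "sample.chr22.glf")` returns a command string that can be written into a job script; nothing is executed by the function itself.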
   −
back<br> <br>(3) Build a list of individuals within each population  
+
== (3) Build a list of individuals within each population ==
    
/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.11.all  
 
Line 66: Line 64:  
Note: Check to make sure that all the individuals with GLF are included in the list “NA.number.by.popn”  
 
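The check in the note can be sketched as a set difference. This is only an illustration: the function name is hypothetical, and it assumes the individual IDs have already been read into Python lists (the format of “NA.number.by.popn” is not described here).

```python
def missing_from_list(glf_individuals, listed_individuals):
    """Return, sorted, the individuals that have a GLF but are absent from the list."""
    return sorted(set(glf_individuals) - set(listed_individuals))
```

An empty result means every individual with a GLF file is covered by the list.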
   −
<br>back
+
== (4) Link files and tabulate # of files per population, per platform ==
 
  −
<br>(4) Link files and tabulate # of files per population, per platform  
      
/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.09.all  
 
Line 76: Line 72:  
key command: ln -s  
 
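A rough Python equivalent of the `ln -s` step plus the tabulation, as a sketch only. The directory layout (`link_root/population/platform/`), the function name, and the `(path, population, platform)` record format are assumptions made for illustration, not the layout used in the scripts above.

```python
import os
from collections import Counter


def link_and_tabulate(records, link_root):
    """Symlink each GLF under link_root/population/platform and count records.

    records: iterable of (glf_path, population, platform) tuples.
    Returns a Counter keyed by (population, platform).
    """
    counts = Counter()
    for glf_path, population, platform in records:
        target_dir = os.path.join(link_root, population, platform)
        os.makedirs(target_dir, exist_ok=True)
        link = os.path.join(target_dir, os.path.basename(glf_path))
        # Same effect as `ln -s`, but skip links that already exist.
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(glf_path), link)
        counts[(population, platform)] += 1
    return counts
```

The returned counter gives the number of files per population and per platform directly, for example `counts[("CEU", "platformA")]`.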
   −
<br>back
+
== (5) Check total (aggregated over all individual samples) depth for each population, each platform ==
 
  −
<br>
  −
 
  −
<br> <br>(5) Check total (aggregated over all individual samples) depth for each population, each platform  
      
/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.09.all/  
 
Line 90: Line 82:  
Other: total depth for each site (for VCF) (run after filter)<br> IMPORTANT/total_depth_per_site.r3.csh<br> key command&nbsp;: DepthPerSite<br> Source code&nbsp;: /home/ylwtx/codes/cpp/ DepthPerSite/<br> Input&nbsp;: GLF files<br> Output&nbsp;: total depth for each site
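The aggregation itself is a per-site sum over individuals. The sketch below is not the DepthPerSite code; it assumes, hypothetically, that each individual's GLF has already been reduced to a mapping from site position to read depth.

```python
from collections import Counter


def total_depth_per_site(per_individual_depths):
    """Sum read depth per site over all individuals.

    per_individual_depths: iterable of {position: depth} mappings,
    one mapping per individual.
    """
    total = Counter()
    for depths in per_individual_depths:
        # Counter.update with a mapping adds the values site by site.
        total.update(depths)
    return dict(total)
```

Sites missing from an individual's mapping simply contribute zero depth for that individual.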
   −
back
+
== (6) Filter sites with total depth at the extremes, within each population, each platform ==
 
  −
<br> <br>(6) Filter sites with total depth at the extremes, within each population, each platform  
      
/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.09.all/  
 
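As an illustration of the filtering idea only: the function below keeps sites whose total depth lies inside a closed interval. The cutoffs are left as explicit parameters because the actual thresholds are not stated here, and the function name and input format (a position-to-total-depth mapping) are assumptions.

```python
def filter_extreme_depth(total_depth, min_depth, max_depth):
    """Keep sites whose total depth lies in [min_depth, max_depth]."""
    return {
        site: depth
        for site, depth in total_depth.items()
        if min_depth <= depth <= max_depth
    }
```

The filter would be applied separately for each population and platform, with cutoffs chosen from that group's own depth distribution.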
Line 110: Line 100:  
key command&nbsp;: glfMerge or glfMerge_noBGZF<br>Source code&nbsp;: /home/ylwtx/codes/cpp/glfMerge<br> Originally ~goncalo/code/glfMerge<br>Input&nbsp;: GLF files for one individual<br>Output&nbsp;: GLF file
   −
back<br> <br>(8) Promote a set of sites for each population  
+
== (8) Promote a set of sites for each population ==
    
/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.09.all/  
 
Line 120: Line 110:  
Parameters set:<br>(1) Posterior probability (for being a polymorphism) threshold: 0.999 (0.9 for genome-wide runs, but this needs testing)<br>(2) minMapQ = 30<br>(3) –allhet is ON by default<br>Input: GLF<br>Output: simplified GLF with three likelihoods
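The role of the posterior threshold can be sketched as follows. This is not the promotion program itself: it assumes, hypothetically, that a posterior probability of being polymorphic is already available per site, and that a site is promoted when its posterior is at least the threshold (whether the real comparison is strict is not stated above).

```python
def promote_sites(site_posteriors, threshold=0.999):
    """Return, sorted, the sites whose posterior of being polymorphic >= threshold.

    site_posteriors: {position: posterior probability} mapping.
    """
    return sorted(
        position
        for position, posterior in site_posteriors.items()
        if posterior >= threshold
    )
```

Lowering `threshold` to 0.9, the value mentioned for genome-wide runs, promotes more sites at the cost of more false positives.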
   −
back<br> <br>(9) Merge with genotype data  
+
== (9) Merge with genotype data ==
   −
(8.1) prepare genotype data in a unified format  
+
=== (9.1) prepare genotype data in a unified format ===
    
/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/genotypes_all_2  
 
Line 132: Line 122:  
*special thanks to Wei Chen for preparing the genotype files
 
   −
(8.2) merge genotype data with sequence data at promoted sites  
+
=== (9.2) merge genotype data with sequence data at promoted sites ===
    
/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.09.all/  
 
Line 142: Line 132:  
Notes: <br> Sites with more than two alleles (not including REF_ALLELE) will be discarded  
 
   −
<br>back<br> <br>[[|]](10) Run thunder (hidden Markov model)  
+
== (10) Run thunder (hidden Markov model) ==
    
/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.09.all/  
 
Line 151: Line 141:     
Notes:<br>(1) Cleaned monomorphic sites before feeding to thunder (no longer needed, because thunder 005 handles AL1/-)<br>(2) All sites are bi-allelic with one of the alleles being the reference allele. Sites with more than 2 alleles, including the reference allele, are discarded at the beginning of the thunder run; this was initially required because the prior depended on the reference allele. In the current setting, where Freq1 is used for the prior, we can choose to ignore the reference allele information.<br>a. Code changed on 2009-11-02<br>(3) Split:
  −
<br>
      
Total 150 jobs (50 jobs for each population)  
 
   −
back<br> <br>(11) Ligate thunder results for larger chromosomes  
+
== (11) Ligate thunder results for larger chromosomes ==
    
<br>/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.11.all  
 
Line 162: Line 150:  
ligate.all. 2009-10-27.csh  
 
   −
<br>back
+
== (12) Extract QC+ sites ==
 
  −
<br>(12) Extract QC+ sites  
      
/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.09.all/  
 
Line 172: Line 158:  
(1) Check for individuals who are genotyped only:<br>For example, before 2009-12, in CHB+JPT, the following 2 individuals are sequenced:<br>NA18631<br>NA18634<br>(2) Check for trios whose sequencing information was not used: e.g., daughter and father of the trio
   −
back
+
== (13) Generate other information for VCF format (no longer needed, already generated) ==
 
  −
<br>
  −
 
  −
<br>
  −
 
  −
<br> <br>(13) Generate other information for VCF format (no longer needed, already generated)  
      
/home/ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.09.all/<br>site.depth.csh (no longer needed, generated in GLF step)  
 
   −
<br>
+
== (14) Generate VCF ==
 
  −
back<br> <br>(14) Generate VCF  
      
/home/ylwtx/1000Genomes/UoM_2009_12<br>cmd.csh<br> $pop.sh<br> vcf.py  
 
   −
<br>
+
== (15) Quality check ==
 
  −
<br>back
  −
 
  −
<br>(15) Quality check  
      
~ylwtx/codes/cpp/mach-1.0.16/test_thunder/2009.11.all/chr20/tout/CEU  
 
   −
(A) Accuracy of genotype calls<br>eval.Rsq.csh<br>or eval.r2_hat.csh<br>(B) Accuracy of haplotype calls<br>comparehaplotypes.csh  
+
(A) Accuracy of genotype calls<br>eval.Rsq.csh<br>or eval.r2_hat.csh<br>(B) Accuracy of haplotype calls<br>comparehaplotypes.csh
 
  −
<br>
  −
 
  −
<br>back
  −
 
  −
<br><br>
 
