Changes

From Genome Analysis Wiki
796 bytes added ,  12:08, 21 February 2017
Line 1: Line 1: −
This page documents how to perform variant calling from low-coverage sequencing data using glfmultiples and thunder. The pipeline was originally developed by [mailto:yunli@med.unc.edu Yun Li] for the 1000 Genomes Low Coverage Pilot Project.  
+
This page documents how to perform variant calling from low-coverage sequencing data using glfmultiples and thunder. The pipeline was originally developed by [mailto:yunli@med.unc.edu Yun Li] and [mailto:goncalo@umich.edu Goncalo Abecasis] for the 1000 Genomes Low Coverage Pilot Project.
    
== Input Data  ==
 
== Input Data  ==
Line 9: Line 9:  
   samtools pileup -g -T 1 -f ref.fa my.bam > my.glf
 
   samtools pileup -g -T 1 -f ref.fa my.bam > my.glf
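Step 1 below takes a set of *.glf files, so the conversion above is typically repeated once per BAM file. A minimal sketch of that loop follows. The helper name glf_command and the one-BAM-per-individual layout are assumptions made for illustration, and the sketch only prints the commands instead of running them. It also assumes a samtools release that still provides the pileup subcommand used above.

```shell
# Hypothetical helper (not part of samtools): print the conversion command
# from this page for one BAM file, deriving the GLF name from the BAM name.
glf_command() {
    bam=$1
    ref=${2:-ref.fa}   # reference FASTA; defaults to the name used on this page
    printf 'samtools pileup -g -T 1 -f %s %s > %s\n' "$ref" "$bam" "${bam%.bam}.glf"
}

# Print one command per BAM file in the current directory.
for bam in *.bam; do
    [ -e "$bam" ] || continue   # skip the literal pattern when no BAM files exist
    glf_command "$bam"
done
```

Printing first makes it easy to review the generated commands; piping the output to sh would execute them.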
   −
Note: you will need the reference fasta file ref.fa to create a glf file from a bam file.
+
Note: you will need the reference fasta file ref.fa to create a glf file from a bam file.
    
== How to Run  ==
 
== How to Run  ==
Line 15: Line 15:  
This variant calling pipeline has two steps: (step 1) promotion of a set of potential polymorphisms; and (step 2) genotype/haplotype calling using LD information.

This variant calling pipeline has two steps: (step 1) promotion of a set of potential polymorphisms; and (step 2) genotype/haplotype calling using LD information.
   −
=== (step 1) Site promotion using software glfMultiples [https://www.sph.umich.edu/csg/yli/GPT_Freq.011.source.tgz GPT_Freq] ===
+
=== (step 1) Site promotion using software glfMultiples [https://csg.sph.umich.edu//yli/GPT_Freq.011.source.tgz GPT_Freq] ===
    
   GPT_Freq -b my.out -p 0.9 --minDepth 10 --maxDepth 1000 *.glf  
 
   GPT_Freq -b my.out -p 0.9 --minDepth 10 --maxDepth 1000 *.glf  
Line 21: Line 21:  
minDepth and maxDepth are the cutoffs on total depth (across all individuals). We have found it useful to exclude sites with extremely low and high total depth. Please see Important Filters below.
 
minDepth and maxDepth are the cutoffs on total depth (across all individuals). We have found it useful to exclude sites with extremely low and high total depth. Please see Important Filters below.
   −
=== (step 2) Genotype/haplotype calling using thunder [https://www.sph.umich.edu/csg/yli/thunder/thunder.V010.source.tgz thunder_glf_freq] ===
+
=== (step 2) Genotype/haplotype calling using thunder [https://csg.sph.umich.edu//yli/thunder/thunder.V011.source.tgz thunder_glf_freq] ===
   −
   thunder_glf_freq --shotgun my.out.$chr -r 100 --states 200 --dosage --phase --interim 25 -o my.final.out
+
   thunder_glf_freq --shotgun my.out.$chr --detailedInput -r 100 --states 200 --dosage --phase --interim 25 -o my.final.out
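The two steps can be chained per chromosome. The sketch below only prints the command lines shown on this page. The helper names step1_command and step2_command, the chromosome list, and the per-chromosome output suffix on -o are assumptions for illustration, not part of either program. It also assumes, as the my.out.$chr argument suggests, that step 1 writes one output file per chromosome.

```shell
# Hypothetical helpers that print the two commands from this page.
step1_command() {
    # $1: output prefix; site promotion runs once across all GLF files
    printf 'GPT_Freq -b %s -p 0.9 --minDepth 10 --maxDepth 1000 *.glf\n' "$1"
}

step2_command() {
    # $1: step 1 output prefix, $2: chromosome label, $3: final output prefix
    # Assumption: the chromosome label is appended to -o so runs do not overwrite each other.
    printf 'thunder_glf_freq --shotgun %s.%s --detailedInput -r 100 --states 200 --dosage --phase --interim 25 -o %s.%s\n' "$1" "$2" "$3" "$2"
}

# Print step 1 once, then step 2 for an example set of chromosomes.
step1_command my.out
for chr in 20 21 22; do
    step2_command my.out "$chr" my.final.out
done
```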
    
Notes:  
 
Notes:  
   −
(1) The program thunder used in step 2 is an extension of MaCH, the genotype imputation software we have previously developed. For details regarding the shared options, please check out [http://www.sph.umich.edu/csg/yli/mach/index.html MaCH website] and [http://genome.sph.umich.edu/wiki/Mach MaCH wiki].  
+
(1) The program thunder used in step 2 is an extension of MaCH, the genotype imputation software we have previously developed. For details regarding the shared options, please check out [http://sph.umich.edu/csg/abecasis/mach/index.html MaCH website] and [http://genome.sph.umich.edu/wiki/Mach MaCH wiki].  
   −
(2) Check out example files and command lines under examples/thunder/ in the thunder package [https://www.sph.umich.edu/csg/yli/thunder/thunder.V010.source.tgz thunder_glf_freq].  
+
(2) Check out example files and command lines under examples/thunder/ in the thunder package [https://sph.umich.edu/csg/abecasis/thunder/thunder.V011.source.tgz thunder_glf_freq].
 +
 
 +
== Example Showing the Whole Pipeline ==
 +
The thunder [https://csg.sph.umich.edu//yli/thunder/thunder.V011.source.tgz thunder_glf_freq] tarball contains, under the example/thunder/ folder, input files extracted from real data and a C-shell script that executes the whole analysis pipeline.
 +
 
 +
== Ligate Haplotypes ==
 +
Please use [http://csg.sph.umich.edu//yli/ligateHap.V004.tgz ligateHaplotypes].
    
== Important Filters ==
 
== Important Filters ==
Line 37: Line 43:  
=== allelic imbalance ===
 
=== allelic imbalance ===
 
A statistic developed by Dr. Tom Blackwell [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_Based_Allele_Balance allelic imbalance].  
 
A statistic developed by Dr. Tom Blackwell [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_Based_Allele_Balance allelic imbalance].  
 +
 +
=== indel filter ===
 +
We recommend distance to known indels >= 5bp. A catalog of known indels can be found at [ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/release/2010_07/low_coverage/indels/ indel catalog].
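One way to apply the distance rule is sketched below. It assumes both the indel catalog and the candidate sites have first been reduced to plain text files with "chrom pos" on each line, which is not necessarily the format of the catalog linked above. The helper name filter_near_indels and the MIN_DIST variable are hypothetical, and each indel is treated as a single position.

```shell
# Hypothetical helper: keep candidate sites at least MIN_DIST bp (default 5)
# away from every known indel position on the same chromosome.
# $1: known indels, $2: candidate sites; both assumed to be "chrom pos" text files.
filter_near_indels() {
    awk -v d="${MIN_DIST:-5}" '
        # First file: block every position closer than d bp to an indel.
        FILENAME == ARGV[1] {
            for (i = $2 - d + 1; i <= $2 + d - 1; i++) blocked[$1 ":" i] = 1
            next
        }
        # Second file: print only the sites that were not blocked.
        !(($1 ":" $2) in blocked)
    ' "$1" "$2"
}
```

For example, filter_near_indels indels.txt sites.txt > sites.filtered.txt would write the retained sites to a new file, where the file names are placeholders.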
 +
 +
=== site promotion filter ===
 +
We recommend setting parameter -p to at least 0.9 in step 1 (running glfMultiples).
 +
 +
=== strand bias filter ===
    
=== total depth filter ===
 
=== total depth filter ===
Line 47: Line 61:  
We recommend excluding sites with >0.1% flanking 10-mer frequency among candidate sites. samtools calmd -br performs this base quality re-calibration.
 
We recommend excluding sites with >0.1% flanking 10-mer frequency among candidate sites. samtools calmd -br performs this base quality re-calibration.
   −
=== indel filter ===
+
== Citation ==
We recommend distance to known indels >= 5bp. A catalog of known indels can be found at [ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/release/2010_07/low_coverage/indels/ indel catalog].
+
Li Y, Sidore C, Kang HM, Boehnke M, Abecasis GR. Low-coverage sequencing: Implications for design of complex trait association studies. <em>Genome Res.</em> 2011 Jun;21(6):940-51. <br>
 +
 
 +
== Inference with External Reference ==
   −
=== site promotion filter ===
+
Please refer to [http://genome.sph.umich.edu/wiki/UMAKE UMAKE]. <br>
We recommend setting parameter -p to at least 0.9 in step 1 (running glfMultiples).
      
== Questions and Comments?  ==
 
== Questions and Comments?  ==
    
Email [mailto:yunli@med.unc.edu Yun Li].
 
Email [mailto:yunli@med.unc.edu Yun Li].