From Genome Analysis Wiki
This page documents how to perform variant calling from low-coverage sequencing data using glfmultiples and thunder. The pipeline was originally developed by [mailto:email@example.com Yun Li] for the 1000 Genomes Low Coverage Pilot Project.
== Input Data ==
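The pipeline takes GLF (genotype likelihood format) files as input. A GLF file can be generated from each BAM file with samtools pileup:
<pre>
samtools pileup -g -T 1 -f ref.fa my.bam > my.glf
</pre>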
Note: you will need the reference FASTA file (ref.fa) to create a GLF file from a BAM file.
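Because step 1 reads many GLF files at once (*.glf), the command above is typically repeated for every BAM file. The sketch below only builds those command lines; the helper name glf_commands and the convention of writing <sample>.glf next to each BAM are hypothetical, and the samtools options are copied unchanged from the command above.

```python
import shlex
from pathlib import Path
from typing import Iterable, List


def glf_commands(bam_paths: Iterable[str], ref_fasta: str = "ref.fa") -> List[str]:
    """Build one `samtools pileup -g` shell command per BAM file.

    Hypothetical convention: the GLF output keeps the BAM's name with a .glf suffix.
    """
    commands = []
    for bam in bam_paths:
        bam_path = Path(bam)
        glf_path = bam_path.with_suffix(".glf")
        argv = ["samtools", "pileup", "-g", "-T", "1", "-f", ref_fasta, str(bam_path)]
        # samtools pileup writes to stdout, so a shell redirect sends it to the GLF file.
        commands.append(shlex.join(argv) + " > " + shlex.quote(str(glf_path)))
    return commands


for cmd in glf_commands(["sample1.bam", "sample2.bam"]):
    print(cmd)
```

The resulting lines can be pasted into a shell script or submitted to a job scheduler, one job per sample.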
== How to Run ==
This variant calling pipeline has two steps: (step 1) promotion of a set of potentially polymorphic sites, and (step 2) genotype/haplotype calling using linkage disequilibrium (LD) information.
=== (step 1) Site promotion using glfMultiples [https://www.sph.umich.edu/csg/yli/GPT_Freq.011.source.tgz GPT_Freq] ===
<pre>
GPT_Freq -b my.out -p 0.9 --minDepth 10 --maxDepth 1000 *.glf
</pre>
--minDepth and --maxDepth are cutoffs on total depth (summed across all individuals). We have found it useful to exclude sites with extremely low or extremely high total depth; see Important Filters below.
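The total-depth rule can be illustrated with a short sketch. It assumes per-site, per-individual read depths are already available as a Python dict keyed by (chromosome, position); that representation, the helper name total_depth_filter, and the inclusive bounds are assumptions for illustration, not a description of how GPT_Freq applies --minDepth and --maxDepth internally.

```python
from typing import Dict, List, Tuple

# Hypothetical site key: (chromosome, position).
Site = Tuple[str, int]


def total_depth_filter(depths: Dict[Site, List[int]], min_depth: int = 10,
                       max_depth: int = 1000) -> List[Site]:
    """Keep sites whose depth summed across all individuals lies in [min_depth, max_depth].

    Inclusive bounds are an assumption made for this sketch.
    """
    kept = []
    for site, per_individual in depths.items():
        total = sum(per_individual)  # total depth across all individuals
        if min_depth <= total <= max_depth:
            kept.append(site)
    return kept


example = {
    ("20", 100): [2, 3, 1, 0],     # total 6: below min_depth
    ("20", 200): [4, 5, 3, 6],     # total 18: kept
    ("20", 300): [400, 350, 300],  # total 1050: above max_depth
}
print(total_depth_filter(example))  # [('20', 200)]
```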
=== (step 2) Genotype/haplotype calling using thunder [https://www.sph.umich.edu/csg/yli/thunder/thunder.V010.source.tgz thunder_glf_freq] ===
<pre>
thunder_glf_freq --shotgun my.out.$chr -r 100 --states 200 --dosage --phase --interim 25 -o my.final.out
</pre>
(1) The program thunder used in step 2 is an extension of MaCH, the genotype imputation software we developed previously. For details on the options shared with MaCH, see the [http://www.sph.umich.edu/csg/yli/mach/index.html MaCH website] and the [http://genome.sph.umich.edu/wiki/Mach MaCH wiki].
(2) Example files and command lines are provided under examples/thunder/ in the thunder package [https://www.sph.umich.edu/csg/yli/thunder/thunder.V010.source.tgz thunder_glf_freq].
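The step 2 command takes my.out.$chr as input, which suggests it is run once per chromosome. The sketch below generates one command per chromosome; the autosome range 1 to 22 and the per-chromosome output name my.final.out.$chr (added so that runs do not overwrite each other) are assumptions, while every option value is copied unchanged from the command above.

```python
import shlex
from typing import Iterable, List


def thunder_commands(prefix: str = "my.out",
                     chromosomes: Iterable[int] = range(1, 23),
                     out_prefix: str = "my.final.out") -> List[str]:
    """Build one thunder_glf_freq command per chromosome (hypothetical helper)."""
    commands = []
    for chrom in chromosomes:
        argv = [
            "thunder_glf_freq",
            "--shotgun", f"{prefix}.{chrom}",  # step 1 output for this chromosome
            # The following option values are copied from the documented command.
            "-r", "100",
            "--states", "200",
            "--dosage", "--phase",
            "--interim", "25",
            "-o", f"{out_prefix}.{chrom}",  # assumed per-chromosome output name
        ]
        commands.append(shlex.join(argv))
    return commands


for cmd in thunder_commands(chromosomes=[20, 21]):
    print(cmd)
```

Keeping the option values in one place makes it easy to launch the per-chromosome runs in parallel with identical settings.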
== Important Filters ==
=== allelic imbalance ===
Sites can be filtered on [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_Based_Allele_Balance allelic imbalance], a genotype-likelihood-based statistic developed by Dr. Tom Blackwell.
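=== total depth filter ===
We recommend excluding sites with extremely low or extremely high total depth (summed across all individuals). In step 1, these cutoffs are set with the --minDepth and --maxDepth options of GPT_Freq.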
=== flanking sequence filter ===
We recommend excluding sites whose flanking 10-mer has a frequency above 0.1% among all candidate sites.
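A minimal sketch of this filter follows. The page does not define exactly which 10 bases form the "flanking 10-mer"; here it is assumed to be the 5 bases on each side of the site joined together (the site itself excluded). Coordinates are 0-based, a single reference sequence is used, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def flanking_kmer(seq: str, pos: int, flank: int = 5) -> str:
    """Join the `flank` bases left and right of 0-based `pos` (assumed 10-mer definition)."""
    if pos - flank < 0 or pos + flank >= len(seq):
        raise ValueError(f"site {pos} is too close to the end of the sequence")
    return (seq[pos - flank:pos] + seq[pos + 1:pos + 1 + flank]).upper()


def flanking_kmer_filter(seq: str, positions: List[int], max_freq: float = 0.001,
                         flank: int = 5) -> List[int]:
    """Keep candidates whose flanking k-mer frequency among all candidates is <= max_freq."""
    sites = sorted(set(positions))
    kmers: Dict[int, str] = {p: flanking_kmer(seq, p, flank) for p in sites}
    counts = Counter(kmers.values())  # how often each flanking k-mer occurs among candidates
    n = len(sites)
    return [p for p in sites if counts[kmers[p]] / n <= max_freq]


# Toy data: three candidates share the flank AAAAA|AAAAA, one has a unique flank.
unit = "AAAAACAAAAA"
other = "CGTACTTGCAG"
seq = unit * 3 + other
candidates = [5, 16, 27, 38]
print(flanking_kmer_filter(seq, candidates, max_freq=0.5))  # [38]
```

With fewer than 1,000 candidates every k-mer has frequency at least 1/n, which already exceeds 0.1%, so the default cutoff only makes sense on genome-scale candidate lists; the toy call therefore uses max_freq=0.5.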
=== indel filter ===
We recommend keeping only sites at least 5 bp away from any known indel. A catalog of known indels is available at [ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/release/2010_07/low_coverage/indels/ indel catalog].
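The distance rule can be sketched with a binary search over sorted indel coordinates. This assumes sites and indels are on the same chromosome and that each known indel is reduced to a single coordinate; for long deletions one would instead measure the distance to the nearest end of the deleted interval. The helper names are hypothetical.

```python
import bisect
from typing import List


def min_distance_to_indel(pos: int, indel_positions: List[int]) -> float:
    """Distance from `pos` to the nearest indel in a sorted list (inf if there are none)."""
    i = bisect.bisect_left(indel_positions, pos)
    best = float("inf")
    if i < len(indel_positions):       # nearest indel at or after pos
        best = indel_positions[i] - pos
    if i > 0:                          # nearest indel before pos
        best = min(best, pos - indel_positions[i - 1])
    return best


def indel_distance_filter(sites: List[int], indel_positions: List[int],
                          min_dist: int = 5) -> List[int]:
    """Keep sites at least `min_dist` bp from every known indel."""
    indels = sorted(indel_positions)
    return [s for s in sites if min_distance_to_indel(s, indels) >= min_dist]


print(indel_distance_filter([100, 104, 105, 96, 95, 200], [100]))  # [105, 95, 200]
```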
== Questions and Comments? ==
Email [mailto:firstname.lastname@example.org Yun Li].