This page documents how to perform variant calling from low-coverage sequencing data using glfMultiples and thunder. The pipeline was originally developed by Yun Li for the 1000 Genomes Low Coverage Pilot Project.
== Input Data ==
To get started, you will need glf files in the standard glf format. Sample glf files are available.
If you do not have glf files, you can generate them from bam files (the bam format is specified in the same documentation as the glf format) using the following command line:
samtools pileup -g -T 1 -f ref.fa my.bam > my.glf
Note: you will need the reference fasta file ref.fa to create a glf file from a bam file.
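When there are many samples, the conversion above can be wrapped in a loop. The following is a minimal sketch, not part of the original pipeline: the variable names BAM_DIR, GLF_DIR, REF and DRY_RUN and the helper functions are hypothetical and chosen for this example only. By default it only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: convert every bam file in a directory to glf using the pileup command above.
# BAM_DIR, GLF_DIR, REF and DRY_RUN are hypothetical names used only in this example.
BAM_DIR="${BAM_DIR:-.}"
GLF_DIR="${GLF_DIR:-.}"
REF="${REF:-ref.fa}"
DRY_RUN="${DRY_RUN:-1}"

# Map /path/sample.bam to $GLF_DIR/sample.glf
glf_path() {
    printf '%s/%s.glf\n' "$GLF_DIR" "$(basename "$1" .bam)"
}

# Convert one bam file, or print the command when DRY_RUN=1
bam_to_glf() {
    out=$(glf_path "$1")
    if [ "$DRY_RUN" = "1" ]; then
        echo "samtools pileup -g -T 1 -f $REF $1 > $out"
    else
        samtools pileup -g -T 1 -f "$REF" "$1" > "$out"
    fi
}

for bam in "$BAM_DIR"/*.bam; do
    [ -e "$bam" ] || continue   # skip when the glob matches nothing
    bam_to_glf "$bam"
done
```

For example, `BAM_DIR=bams GLF_DIR=glfs DRY_RUN=0 sh bam2glf.sh` would write one glf file per bam file into glfs/ (the script name bam2glf.sh is also only illustrative).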
== How to Run ==
This variant calling pipeline has two steps: (step 1) promotion of a set of potential polymorphisms, and (step 2) genotype/haplotype calling using LD information.
(step 1) Site promotion using glfMultiples (GPT_Freq).
GPT_Freq -b my.out -p 0.9 --minDepth 10 --maxDepth 1000 *.glf
(step 2) Genotype/haplotype calling using thunder (thunder_glf_freq).
thunder_glf_freq --shotgun my.out.$chr -r 100 --states 200 --dosage --phase --interim 25 -o
Notes:
(1) The program thunder used in step 2 is an extension of MaCH, the genotype imputation software we have previously developed. For details regarding the shared options, please see the MaCH website and the MaCH wiki.
(2) Example files and command lines are provided under examples/thunder/ in the thunder package (thunder_glf_freq).
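The step 2 command refers to a per-chromosome input, my.out.$chr, so it is natural to run it once per chromosome. The sketch below assumes that step 1 has produced one my.out.$chr file for each chromosome label in CHROMS. The page does not show the argument passed to -o, so OUT_PREFIX (default my.final) is a purely hypothetical placeholder, as are CHROMS and DRY_RUN. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: run step 2 (thunder_glf_freq) once per chromosome.
# CHROMS, OUT_PREFIX and DRY_RUN are hypothetical names used only in this example.
# The original page does not show the -o argument; OUT_PREFIX is a placeholder.
CHROMS="${CHROMS:-1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22}"
OUT_PREFIX="${OUT_PREFIX:-my.final}"
DRY_RUN="${DRY_RUN:-1}"

# Print the step 2 command line for chromosome $1
thunder_cmd() {
    printf 'thunder_glf_freq --shotgun my.out.%s -r 100 --states 200 --dosage --phase --interim 25 -o %s.%s\n' \
        "$1" "$OUT_PREFIX" "$1"
}

for chr in $CHROMS; do
    if [ "$DRY_RUN" = "1" ]; then
        thunder_cmd "$chr"
    else
        thunder_glf_freq --shotgun "my.out.$chr" -r 100 --states 200 \
            --dosage --phase --interim 25 -o "$OUT_PREFIX.$chr"
    fi
done
```

Running with `CHROMS="20 21" DRY_RUN=1` prints the two command lines without invoking thunder, which is a convenient way to check file names before starting a long run.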
== Important Filters ==

