Thunder
Latest revision as of 12:08, 21 February 2017
This page documents how to perform variant calling from low-coverage sequencing data using glfMultiples and thunder. The pipeline was originally developed by Yun Li and Goncalo Abecasis for the 1000 Genomes Low Coverage Pilot Project.
Input Data
To get started, you will need glf files in the standard glf format. Sample glf files are available.
If you do not have glf files, you can generate them from bam files (see the bam format specification) using the following command line:
samtools pileup -g -T 1 -f ref.fa my.bam > my.glf
Note: you will need the reference fasta file ref.fa to create a glf file from a bam file.
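With many samples, the conversion above is usually repeated once per bam file. The following is a minimal dry-run sketch that only prints the command for each file; the helper name glf_command and the sample file names are hypothetical, and the samtools options are copied unchanged from the command line above.

```shell
# glf_command BAM [REF]
# Prints (does not run) the pileup command for one bam file.
# The output name is the bam name with .bam replaced by .glf.
glf_command() {
    bam=$1
    ref=${2:-ref.fa}
    printf 'samtools pileup -g -T 1 -f %s %s > %s\n' "$ref" "$bam" "${bam%.bam}.glf"
}

# Hypothetical sample names, for illustration only.
for bam in sample1.bam sample2.bam; do
    glf_command "$bam" ref.fa
done
```

To actually run the conversions, the printed lines can be piped to sh once they look right.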
How to Run
This variant calling pipeline has two steps: (step 1) promotion of a set of potential polymorphisms, and (step 2) genotype/haplotype calling using LD information.
(step 1) Site promotion using glfMultiples (GPT_Freq)
GPT_Freq -b my.out -p 0.9 --minDepth 10 --maxDepth 1000 *.glf
minDepth and maxDepth are the cutoffs on total depth (across all individuals). We have found it useful to exclude sites with extremely low or extremely high total depth. Please see Important Filters below.
(step 2) Genotype/haplotype calling using thunder (thunder_glf_freq)
thunder_glf_freq --shotgun my.out.$chr --detailedInput -r 100 --states 200 --dosage --phase --interim 25 -o my.final.out
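The step 2 command takes its input per chromosome (my.out.$chr), so it is typically issued once for each chromosome. Below is a dry-run sketch that prints one command per chromosome. The helper name thunder_command and the chromosome list are hypothetical, and appending the chromosome to the -o prefix is an assumption made here so that runs do not overwrite each other; all other options are copied from the command above.

```shell
# thunder_command CHR
# Prints (does not run) the step 2 command for one chromosome.
# Assumption: the output prefix gets the chromosome appended.
thunder_command() {
    chr=$1
    printf 'thunder_glf_freq --shotgun my.out.%s --detailedInput -r 100 --states 200 --dosage --phase --interim 25 -o my.final.out.%s\n' "$chr" "$chr"
}

# Hypothetical chromosome list, for illustration only.
for chr in 20 21 22; do
    thunder_command "$chr"
done
```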
Notes:
(1) The program thunder used in step 2 is an extension of MaCH, the genotype imputation software we previously developed. For details on the shared options, see the MaCH website (http://sph.umich.edu/csg/abecasis/mach/index.html) and the MaCH wiki (http://genome.sph.umich.edu/wiki/Mach).
(2) Example files and command lines are under examples/thunder/ in the thunder package thunder_glf_freq (https://sph.umich.edu/csg/abecasis/thunder/thunder.V011.source.tgz).
Example Showing the Whole Pipeline
In the thunder (thunder_glf_freq) tarball, the example/thunder/ folder contains input files extracted from real data and a C-shell script that executes the whole analysis pipeline.
Ligate Haplotypes
Please use ligateHaplotypes.
Important Filters
We have found that the following filters are helpful.
allelic imbalance
A statistic developed by Dr. Tom Blackwell; see allelic imbalance.
indel filter
We recommend keeping only sites at least 5 bp away from known indels. A catalog of known indels is available; see indel catalog.
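As an illustration of the distance rule, the sketch below checks one candidate position against a list of known indel positions. The function name and the input layout (one indel position per line, all on the same chromosome as the site) are assumptions made for this example; the real indel catalog may be formatted differently.

```shell
# site_passes_indel_filter POS INDEL_FILE
# Succeeds (exit 0) if POS is at least 5 bp from every position in INDEL_FILE.
# INDEL_FILE layout is hypothetical: one position per line, same chromosome.
site_passes_indel_filter() {
    awk -v pos="$1" -v min=5 '
        { d = $1 - pos; if (d < 0) d = -d; if (d < min) bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$2"
}

# Usage with a small hypothetical list of indel positions.
indels=$(mktemp)
printf '100\n200\n' > "$indels"
for pos in 104 105 150; do
    if site_passes_indel_filter "$pos" "$indels"; then
        echo "$pos keep"
    else
        echo "$pos drop"
    fi
done
rm -f "$indels"
```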
site promotion filter
We recommend setting parameter -p to at least 0.9 in step 1 (running glfMultiples).
strand bias filter
total depth filter
For the 1000 Genomes Project (average depth per individual ~4X), we have found it useful to exclude sites with average total depth per individual < 0.5X or > 20X.
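Because --minDepth and --maxDepth in step 1 are cutoffs on total depth across all individuals, the per-individual averages above need to be scaled by the number of individuals. The sketch below shows that arithmetic, assuming the cutoffs are simply 0.5 and 20 times the sample count; the function name depth_cutoffs and the sample count of 60 are hypothetical.

```shell
# depth_cutoffs N
# Prints "minDepth maxDepth" for N individuals, assuming
# minDepth = 0.5 * N and maxDepth = 20 * N.
depth_cutoffs() {
    n=$1
    min=$(( n / 2 ))    # integer division: rounds down when N is odd
    max=$(( n * 20 ))
    printf '%s %s\n' "$min" "$max"
}

# Usage: build the step 1 command line for 60 individuals (printed, not run).
cutoffs=$(depth_cutoffs 60)
min_depth=${cutoffs% *}
max_depth=${cutoffs#* }
echo "GPT_Freq -b my.out -p 0.9 --minDepth $min_depth --maxDepth $max_depth *.glf"
```

These multipliers reflect a data set with about 4X average depth per individual; other depths would likely call for different values.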
coverage filter
We recommend keeping only sites where more than 50% of individuals have coverage.
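A minimal sketch of this check for a single site, given the number of individuals with coverage and the total number of individuals. The function name is hypothetical, and how the covered count is obtained from the data is left out.

```shell
# passes_coverage_filter COVERED TOTAL
# Succeeds (exit 0) if strictly more than 50% of individuals have coverage.
# Uses integer arithmetic: COVERED / TOTAL > 1/2  <=>  2 * COVERED > TOTAL.
passes_coverage_filter() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Usage with hypothetical counts out of 60 individuals.
for covered in 30 31; do
    if passes_coverage_filter "$covered" 60; then
        echo "$covered/60 keep"
    else
        echo "$covered/60 drop"
    fi
done
```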
flanking sequence filter
We recommend excluding sites with >0.1% flanking 10-mer frequency among candidate sites. samtools calmd -br performs this base quality re-calibration.
Citation
Li Y, Sidore C, Kang HM, Boehnke M, Abecasis GR. Low-coverage sequencing: Implications for design of complex trait association studies. Genome Res. 2011 Jun;21(6):940-51.
Inference with External Reference
Please refer to UMAKE.
Questions and Comments?
Email Yun Li.