GlfSingle
glfSingle is a GLF-based variant caller for next-generation sequencing data. It takes a GLF format genotype likelihood file as input and generates a VCF-format set of variant calls as output.
Basic Usage Example
Here is an example of how glfSingle works:
glfSingle -g NA19240.chrom20.SLX.glf -b NA19240.chrom20.SLX.vcf > NA19240.chrom20.SLX.log
Command Line Options
-g genotype likelihood file
    Specifies the name of the input GLF-format genotype likelihood file.
-b base call file
    Specifies the name of the output VCF-format base call file.
-s sample label
    Specifies a label for the sample being analyzed, which will be included in the output VCF file.
-p threshold
    The threshold for base calling. Base calls will be made when their posterior probability exceeds threshold.
--minMapQuality threshold
    Positions where the root-mean-square mapping quality falls below this threshold will be excluded.
--minDepth threshold
    Positions where the read depth falls below this threshold will be excluded.
--maxDepth threshold
    Positions where the read depth exceeds this threshold will be excluded.
--reference
    Positions called as homozygous reference will be included in the output.
--uniformTsTv
    Use a uniform prior for the transition-to-transversion ratio.
To learn about default values for these options, simply run the program with no arguments.
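The depth and mapping-quality filters described above can be pictured with a small Python sketch. This is only an illustration of the filtering rules as documented, not glfSingle's actual code: the function names and the threshold values in the usage lines are hypothetical, and it assumes the read depth and per-read mapping qualities are available for each position.

```python
import math

def rms_mapping_quality(qualities):
    """Root-mean-square of per-read mapping qualities (0.0 if there are no reads)."""
    if not qualities:
        return 0.0
    return math.sqrt(sum(q * q for q in qualities) / len(qualities))

def passes_filters(depth, rms_mapq, min_depth, max_depth, min_map_quality):
    """Return True if a position survives the three documented filters."""
    if rms_mapq < min_map_quality:   # --minMapQuality
        return False
    if depth < min_depth:            # --minDepth
        return False
    if depth > max_depth:            # --maxDepth
        return False
    return True

# Hypothetical thresholds chosen for illustration only; they are not
# glfSingle's defaults (run the program with no arguments to see those).
mapqs = [30, 40]
keep = passes_filters(depth=len(mapqs),
                      rms_mapq=rms_mapping_quality(mapqs),
                      min_depth=1, max_depth=100, min_map_quality=30)
```

Here a position is dropped as soon as any one of the three conditions fails, which matches the wording "will be excluded" for each option.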
Model for Variant Calling
glfSingle uses a likelihood-based model for variant calling. It starts from per-position genotype likelihoods Pr(reads | genotype), computed by an appropriate tool (e.g. Samtools BAQ), and combines them with an individual-based prior p(genotype) to generate posterior probabilities Pr(genotype | reads).
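The combination of likelihoods and prior is an application of Bayes' rule: Pr(genotype | reads) is proportional to Pr(reads | genotype) * p(genotype), normalized over all genotypes. A minimal Python sketch follows. The function names are hypothetical, the likelihoods are assumed to be plain probabilities rather than whatever scaled encoding the GLF file uses, and the numbers in the toy example are made up for illustration.

```python
def posterior(likelihoods, prior):
    """Bayes' rule over genotypes: normalize likelihood * prior to sum to 1."""
    joint = {g: likelihoods[g] * prior[g] for g in likelihoods}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("all genotypes have zero probability")
    return {g: value / total for g, value in joint.items()}

def call_genotype(likelihoods, prior, threshold):
    """Return (genotype, posterior) for the best genotype, or (None, posterior)
    when the best posterior does not exceed the threshold (compare the -p option)."""
    post = posterior(likelihoods, prior)
    best = max(post, key=post.get)
    if post[best] > threshold:
        return best, post[best]
    return None, post[best]

# Toy example with invented numbers: the reads strongly favour the heterozygote.
toy_likelihoods = {"AA": 1e-6, "AG": 1e-2, "GG": 1e-5}
toy_prior = {"AA": 0.998, "AG": 0.0015, "GG": 0.0005}
genotype, probability = call_genotype(toy_likelihoods, toy_prior, threshold=0.9)
```

Even though the prior strongly favours the homozygous reference genotype, a large enough likelihood ratio in favour of the heterozygote dominates the posterior.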
Ingredients that go into the prior:
- All sites have an equal probability of showing polymorphism:
- P(non-reference base) = 0.001
- When a site shows polymorphism, it is usually heterozygous:
- P(non-reference heterozygote) = 0.001 * 2/3
- P(non-reference homozygote) = 0.001 * 1/3
- Mutation model: Transitions (C <-> T or A <-> G) account for most variants, while transversions account for a minority of variants
- a transition has probability 2/3
- each of the two transversions has probability 1/6
- New implementation: an alternative mutation model with a uniform (uninformative) prior for the transition-to-transversion ratio
- each mutation has probability 1/3
- add --uniformTsTv to the command line to enable this alternative mutation model
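The ingredients listed above can be assembled into a per-site genotype prior. The Python sketch below is one plausible reading, not glfSingle's actual implementation. The function names are hypothetical, the polymorphism rate is passed in as `theta` (the usage line uses the 0.001 quoted for P(non-reference base)), and genotypes carrying two different non-reference alleles are left out because the description does not say how they are weighted.

```python
# Each base has exactly one transition partner; the other two changes are transversions.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}

def mutation_weights(ref, uniform_ts_tv=False):
    """Probability of each alternate base, given that the site is variant."""
    alts = [base for base in "ACGT" if base != ref]
    if uniform_ts_tv:
        # Alternative model (--uniformTsTv): each mutation has probability 1/3.
        return {alt: 1.0 / 3.0 for alt in alts}
    # Default model: transition 2/3, each transversion 1/6.
    return {alt: (2.0 / 3.0 if TRANSITION[ref] == alt else 1.0 / 6.0)
            for alt in alts}

def genotype_prior(ref, theta, uniform_ts_tv=False):
    """Prior over ref/ref, ref/alt and alt/alt genotypes (assumption: no alt1/alt2)."""
    prior = {ref + ref: 1.0 - theta}
    for alt, weight in mutation_weights(ref, uniform_ts_tv).items():
        het = "".join(sorted(ref + alt))               # e.g. "AG", stored in sorted order
        prior[het] = theta * (2.0 / 3.0) * weight       # variant sites are usually heterozygous
        prior[alt + alt] = theta * (1.0 / 3.0) * weight
    return prior

default_prior = genotype_prior("A", theta=0.001)
uniform_prior = genotype_prior("A", theta=0.001, uniform_ts_tv=True)
```

Under the default model the A/G heterozygote receives four times the prior mass of the A/C or A/T heterozygote; with `uniform_ts_tv=True` all three receive the same mass.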
Download
For the current version of glfSingle, please go to our GLF Tools Website.
TODO
Support for X chromosome variant calling.
Support for a two pass depth filter that uses the data to automatically work out appropriate filtering thresholds.