glfSingle
glfSingle is a GLF-based variant caller for next-generation sequencing data. It takes a GLF-format genotype likelihood file as input and writes the resulting variant calls in VCF format.
Basic Usage Example
The following example shows how to run glfSingle:
glfSingle -g NA19240.chrom20.SLX.glf -b NA19240.chrom20.SLX.vcf > NA19240.chrom20.SLX.log
Command Line Options
-g <genotype likelihood file>: name of the input GLF-format genotype likelihood file.
-b <base call file>: name of the output VCF-format base call file.
-s <sample label>: label for the sample being analyzed; it is written to the output VCF file.
-p <threshold>: threshold for base calling. A base call is made when its posterior probability exceeds this threshold.
--minMapQuality <threshold>: positions where the root-mean-square mapping quality falls below this threshold are excluded.
--minDepth <threshold>: positions where the read depth falls below this threshold are excluded.
--maxDepth <threshold>: positions where the read depth exceeds this threshold are excluded.
--reference: include positions called as homozygous reference in the output.
To learn about default values for these options, simply run the program with no arguments.
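The three site filters (--minMapQuality, --minDepth, --maxDepth) can be pictured as a per-position check. The sketch below is a minimal illustration, not glfSingle's code: it assumes the per-read mapping qualities at a position are available as a list, the helper names rms_mapping_quality and passes_site_filters are hypothetical, and treating a value exactly at a threshold as passing is an assumption.

```python
import math


def rms_mapping_quality(mapqs):
    """Root-mean-square of per-read mapping qualities (0.0 for an empty site)."""
    if not mapqs:
        return 0.0
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))


def passes_site_filters(mapqs, min_map_quality, min_depth, max_depth):
    """Hypothetical mirror of --minMapQuality, --minDepth and --maxDepth for one position.

    Assumption: values exactly equal to a threshold pass the filter.
    """
    depth = len(mapqs)  # read depth = number of reads covering the position
    if depth < min_depth or depth > max_depth:
        return False
    return rms_mapping_quality(mapqs) >= min_map_quality


if __name__ == "__main__":
    site = [60, 60, 37, 0]  # illustrative per-read mapping qualities
    print(rms_mapping_quality(site), passes_site_filters(site, 20, 3, 100))
```

Using the root-mean-square rather than the plain mean weights high-quality reads more heavily, so a few reads with mapping quality 0 lower the summary less than they would lower an arithmetic mean.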
Model for Variant Calling
glfSingle uses a likelihood-based model for variant calling. It starts from the per-position genotype likelihoods Pr(reads | genotype), computed by an appropriate upstream tool (e.g., Samtools BAQ), and combines them with an individual-based prior Pr(genotype) to obtain the posterior probabilities Pr(genotype | reads). By Bayes' rule, Pr(genotype | reads) is proportional to Pr(reads | genotype) × Pr(genotype), normalized over all possible genotypes at the position.
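The combination of likelihoods and prior, followed by the -p threshold, can be sketched in a few lines. This is a minimal sketch, not glfSingle's implementation: it assumes the likelihoods arrive phred-scaled (-10 log10), a common encoding for genotype likelihood files, it works in log space to avoid underflow, and the function names posterior_from_phred and call_genotype are hypothetical.

```python
import math


def posterior_from_phred(phred_likelihoods, priors):
    """Pr(genotype | reads) from phred-scaled Pr(reads | genotype) and a prior Pr(genotype).

    Assumption: each likelihood is encoded as -10 * log10(Pr(reads | genotype)).
    """
    log_joint = {}
    for genotype, q in phred_likelihoods.items():
        prior = priors.get(genotype, 0.0)
        if prior > 0.0:  # genotypes with zero prior can never be called
            log_joint[genotype] = -q / 10.0 * math.log(10.0) + math.log(prior)
    if not log_joint:
        raise ValueError("no genotype has a non-zero prior")
    # Subtract the maximum before exponentiating (log-sum-exp trick) to avoid underflow.
    top = max(log_joint.values())
    weights = {g: math.exp(v - top) for g, v in log_joint.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def call_genotype(phred_likelihoods, priors, threshold):
    """Return (genotype, posterior); genotype is None if the best posterior does not exceed threshold."""
    post = posterior_from_phred(phred_likelihoods, priors)
    best = max(post, key=post.get)
    return (best if post[best] > threshold else None), post[best]


if __name__ == "__main__":
    lk = {"AA": 60, "AG": 0, "GG": 60}  # illustrative: reads strongly favour AG
    prior = {"AA": 0.99, "AG": 0.0067, "GG": 0.0033}  # illustrative prior values
    print(call_genotype(lk, prior, threshold=0.99))
```

Here the data outweigh the prior: although AG starts with a prior of under 1%, a 60-point phred difference in likelihood pushes its posterior above 0.99.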
The prior is built from the following ingredients:
- All sites have an equal probability of showing polymorphism:
- P(non-reference base) = 0.001
- When a site shows polymorphism, it is usually heterozygous:
- P(non-reference heterozygote) = 0.01 * 2/3
- P(non-reference homozygote) = 0.01 * 1/3
- Mutation model: transitions (C <-> T or A <-> G) account for most variants, while transversions account for a minority:
- the transition has probability 2/3
- each of the two transversions has probability 1/6
- New implementation: an alternative mutation model with a uniform (uninformative) prior on the transition/transversion ratio:
- each of the three possible mutations has probability 1/3
- add --uniformTsTv to the command line to enable this alternative mutation model
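The prior ingredients above can be sketched as a table of genotype probabilities for a single site. This is a hypothetical sketch under stated assumptions, not glfSingle's code: poly_rate is a hypothetical parameter that defaults to the 0.01 multiplier used in the heterozygote and homozygote lines; the mutation-model weights are assumed to multiply the heterozygote/homozygote split; and genotypes carrying two different non-reference alleles get zero prior because the text does not describe them. Passing uniform_ts_tv=True mimics the --uniformTsTv model.

```python
BASES = "ACGT"
# Transition partner of each base (A <-> G, C <-> T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def alt_allele_weights(ref, uniform_ts_tv=False):
    """Probability of each non-reference allele, given that the site is polymorphic."""
    if ref not in BASES:
        raise ValueError(f"unknown reference base: {ref!r}")
    weights = {}
    for alt in BASES:
        if alt == ref:
            continue
        if uniform_ts_tv:
            weights[alt] = 1 / 3  # --uniformTsTv: every mutation equally likely
        elif alt == TRANSITION[ref]:
            weights[alt] = 2 / 3  # the single transition
        else:
            weights[alt] = 1 / 6  # each of the two transversions
    return weights


def genotype_prior(ref, poly_rate=0.01, uniform_ts_tv=False):
    """Hypothetical prior Pr(genotype) for one site; genotypes are sorted allele pairs like 'AG'.

    Assumption: the mutation weight multiplies the 2/3 (het) and 1/3 (hom) split.
    """
    priors = {ref + ref: 1.0 - poly_rate}  # homozygous reference
    for alt, w in alt_allele_weights(ref, uniform_ts_tv).items():
        het = "".join(sorted(ref + alt))
        priors[het] = poly_rate * (2 / 3) * w  # non-reference heterozygote
        priors[alt + alt] = poly_rate * (1 / 3) * w  # non-reference homozygote
    # Assumption: genotypes with two different non-reference alleles get prior 0 (omitted).
    return priors


if __name__ == "__main__":
    for genotype, p in sorted(genotype_prior("A").items()):
        print(f"{genotype}: {p:.6f}")
```

Under these assumptions the probabilities for a site sum to 1, and switching to the uniform model only redistributes the non-reference mass among the three alternative alleles.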
Download
For the current version of glfSingle, please go to our GLF Tools Website.
TODO
Support for X chromosome variant calling.
Support for a two-pass depth filter that uses the data to determine appropriate filtering thresholds automatically.