GlfSingle

From Genome Analysis Wiki

Revision as of 15:17, 23 September 2013

glfSingle is a GLF-based variant caller for next-generation sequencing data. It takes a GLF format genotype likelihood file as input and generates a VCF-format set of variant calls as output.

Basic Usage Example

Here is an example of how to run glfSingle:

  glfSingle -g NA19240.chrom20.SLX.glf -b NA19240.chrom20.SLX.vcf > NA19240.chrom20.SLX.log

Command Line Options

 -g genotype likelihood file    Specifies the name of the input GLF-format genotype likelihood file
 -b base call file              Specifies the name of the output VCF-format base call file
 -s sample label                Specifies a label for the sample being analyzed, which will be included in the output VCF file
 -p threshold                   The threshold for base calling. Base calls will be made when their posterior probability exceeds this threshold.
 --minMapQuality threshold      Positions where the root-mean-square mapping quality falls below this threshold will be excluded.
 --minDepth      threshold      Positions where the read depth falls below this threshold will be excluded.
 --maxDepth      threshold      Positions where the read depth exceeds this threshold will be excluded.
 --reference                    Positions called as homozygous reference will be included in the output.

To learn about default values for these options, simply run the program with no arguments.
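
The three filtering options can be read as a simple per-position test. The Python sketch below only illustrates that reading and is not glfSingle's own code: the function name and arguments are hypothetical, and treating values exactly equal to a threshold as passing is an assumption taken from the wording "falls below" and "exceeds".

```python
def passes_filters(depth, rms_map_quality, min_depth, max_depth, min_map_quality):
    """Return True if a position survives the depth and mapping quality filters.

    Hypothetical helper: values equal to a threshold are kept, which is an
    assumption based on the option descriptions, not on the program's source.
    """
    if rms_map_quality < min_map_quality:
        return False  # mapping quality falls below --minMapQuality
    if depth < min_depth:
        return False  # read depth falls below --minDepth
    if depth > max_depth:
        return False  # read depth exceeds --maxDepth
    return True
```

The threshold values themselves are deliberately left as arguments, since the defaults are reported by the program rather than listed here.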

Model for Variant Calling

glfSingle uses a likelihood-based model for variant calling. It starts from genotype likelihoods Pr(reads | genotype) at each genomic position, computed by an appropriate tool (e.g. Samtools with BAQ). These likelihoods are combined with an individual-based prior p(genotype) to generate posterior probabilities Pr(genotype | reads).

Ingredients that go into the prior:

  • All sites have an equal probability of showing polymorphism:
    • P(non-reference base) = 0.001
  • When a site shows polymorphism, it is usually heterozygous:
    • P(non-reference heterozygote) = 0.01 * 2/3
    • P(non-reference homozygote) = 0.01 * 1/3
  • Two options of mutation model:
    • Transitions (C <-> T or A <-> G) account for most variants, while transversions account for a minority of variants
    • Uniform prior for transition to transversion ratio (1:1)
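
The way a prior of this kind combines with genotype likelihoods can be sketched in Python. This is a minimal illustration under stated assumptions, not glfSingle's implementation: all function names are hypothetical, the non-reference site rate is passed in as a parameter rather than hard-coded, the uniform mutation model is assumed, and the heterozygous and homozygous shares are spread evenly across the matching genotypes for simplicity.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"
# The ten unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT
GENOTYPES = ["".join(g) for g in combinations_with_replacement(BASES, 2)]


def genotype_prior(ref, site_rate):
    """Illustrative prior: 2/3 of site_rate to heterozygotes, 1/3 to
    non-reference homozygotes, the remainder to homozygous reference.
    Splitting each share evenly across genotypes is an assumption."""
    hets = [g for g in GENOTYPES if g[0] != g[1]]
    homs = [g for g in GENOTYPES if g[0] == g[1] and g[0] != ref]
    prior = {g: site_rate * 2 / 3 / len(hets) for g in hets}
    prior.update({g: site_rate * 1 / 3 / len(homs) for g in homs})
    prior[ref + ref] = 1.0 - site_rate
    return prior


def posterior(likelihoods, prior):
    """Bayes' rule: Pr(genotype | reads) is proportional to
    Pr(reads | genotype) * p(genotype), normalized over all genotypes."""
    joint = {g: likelihoods[g] * prior[g] for g in GENOTYPES}
    total = sum(joint.values())
    return {g: joint[g] / total for g in GENOTYPES}


def call_genotype(likelihoods, ref, site_rate, threshold):
    """Return (genotype, probability); genotype is None when the best
    posterior does not exceed the threshold (compare the -p option)."""
    post = posterior(likelihoods, genotype_prior(ref, site_rate))
    best = max(post, key=post.get)
    return (best, post[best]) if post[best] > threshold else (None, post[best])


# Made-up likelihoods that strongly favor an A/G heterozygote at a reference-A site
example = {g: 1e-6 for g in GENOTYPES}
example["AG"] = 1.0
call, prob = call_genotype(example, ref="A", site_rate=0.001, threshold=0.9)
```

With likelihoods this lopsided, the evidence outweighs the strong prior toward homozygous reference; with uninformative (equal) likelihoods the posterior simply reproduces the prior.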

Download

For the current version of glfSingle, please go to our GLF Tools Website.

TODO

Support for X chromosome variant calling.

Support for a two pass depth filter that uses the data to automatically work out appropriate filtering thresholds.