Difference between revisions of "GlfSingle"

From Genome Analysis Wiki

Revision as of 16:14, 25 September 2013

glfSingle is a GLF-based variant caller for next-generation sequencing data. It takes a GLF format genotype likelihood file as input and generates a VCF-format set of variant calls as output.

Basic Usage Example

Here is an example of how glfSingle works:

  glfSingle -g NA19240.chrom20.SLX.glf -b NA19240.chrom20.SLX.vcf > NA19240.chrom20.SLX.log

Command Line Options

 -g genotype likelihood file    Specifies the name of the input GLF-format genotype likelihood file.
 -b base call file              Specifies the name of the output VCF-format base call file.
 -s sample label                Specifies a label for the sample being analyzed, which will be included in the output VCF file.
 -p threshold                   The threshold for base calling. Base calls will be made when their posterior probability exceeds this threshold.
 --minMapQuality threshold      Positions where the root-mean-square mapping quality falls below this threshold will be excluded.
 --minDepth      threshold      Positions where the read depth falls below this threshold will be excluded.
 --maxDepth      threshold      Positions where the read depth exceeds this threshold will be excluded.
 --reference                    Positions called as homozygous reference will be included in the output.
 --uniformTsTv                  Use a uniform prior for the transition-to-transversion ratio.

To learn about default values for these options, simply run the program with no arguments.
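
The three position filters above can be read as a simple keep-or-exclude rule. The following Python sketch only illustrates that rule and is not glfSingle's source code: the function names are hypothetical, and the thresholds are passed in explicitly because the program's default values are not listed on this page.

```python
import math


def rms_mapping_quality(mapping_qualities):
    """Root-mean-square of the mapping qualities of the reads at one position."""
    squares = [q * q for q in mapping_qualities]
    return math.sqrt(sum(squares) / len(squares))


def keep_position(depth, rms_mapq, min_depth, max_depth, min_map_quality):
    """Return True if a position survives all three filters (illustrative only)."""
    if depth < min_depth:            # --minDepth: depth falls below threshold
        return False
    if depth > max_depth:            # --maxDepth: depth exceeds threshold
        return False
    if rms_mapq < min_map_quality:   # --minMapQuality: RMS quality below threshold
        return False
    return True
```

A position exactly at a threshold is kept in this sketch, since the option descriptions exclude only positions that fall below or exceed the threshold.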

Model for Variant Calling

glfSingle uses a likelihood-based model for variant calling. It starts from genotype likelihoods Pr(reads | genotype) for each genomic position, computed by an appropriate tool (e.g. Samtools BAQ), and combines them with an individual-based prior Pr(genotype) to generate posterior probabilities Pr(genotype | reads).

Ingredients that go into the prior:

  • All sites have an equal probability of showing polymorphism:
    • P(non-reference base) = 0.001
  • When a site shows polymorphism, it is usually heterozygous:
    • P(non-reference heterozygote) = 0.01 * 2/3
    • P(non-reference homozygote) = 0.01 * 1/3
  • Mutation model: transitions (C <-> T or A <-> G) account for most variants, while transversions account for a minority of variants
    • the transition has 2/3 probability
    • each transversion has 1/6 probability
  • New implementation: alternative mutation model with a uniform (uninformative) prior for the transition-to-transversion ratio
    • each mutation has a 1/3 probability
    • add --uniformTsTv on the command line to enable this alternative mutation model
    • download here
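
As a rough illustration of how these ingredients could combine, here is a minimal Python sketch that builds a prior over the ten diploid genotypes at a site and multiplies it by the genotype likelihoods to get posteriors. It is not taken from the glfSingle source. Several parts are assumptions of this example: the function names, the parameter `theta` for the non-reference rate (the list above quotes both 0.001 and 0.01, so the value is left as a parameter defaulting to 0.001), and the choice to give zero prior to genotypes carrying two different non-reference alleles.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"
# Ordered (reference, alternate) pairs that are transitions: A <-> G and C <-> T.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def mutation_weight(ref, alt, uniform_tstv=False):
    """Probability that a variant at a site with reference base `ref` is `alt`."""
    if uniform_tstv:
        return 1.0 / 3.0  # alternative model: each mutation has 1/3 probability
    return 2.0 / 3.0 if (ref, alt) in TRANSITIONS else 1.0 / 6.0


def genotype_priors(ref, theta=0.001, uniform_tstv=False):
    """Prior over the 10 unordered genotypes (assumed construction, see lead-in)."""
    priors = {}
    for a, b in combinations_with_replacement(BASES, 2):
        genotype = a + b
        if a == ref and b == ref:
            priors[genotype] = 1.0 - theta  # homozygous reference
        elif a == b:
            # non-reference homozygote: 1/3 of the variant mass
            priors[genotype] = theta * (1.0 / 3.0) * mutation_weight(ref, a, uniform_tstv)
        elif ref in (a, b):
            # heterozygote with one reference allele: 2/3 of the variant mass
            alt = b if a == ref else a
            priors[genotype] = theta * (2.0 / 3.0) * mutation_weight(ref, alt, uniform_tstv)
        else:
            # two different non-reference alleles: assumed zero in this sketch
            priors[genotype] = 0.0
    return priors


def genotype_posteriors(likelihoods, priors):
    """Bayes rule: Pr(genotype | reads) is proportional to Pr(reads | genotype) * Pr(genotype)."""
    unnormalized = {g: likelihoods[g] * priors[g] for g in priors}
    total = sum(unnormalized.values())
    return {g: value / total for g, value in unnormalized.items()}
```

With this construction the priors sum to one for any reference base, because the three mutation weights for a given reference base sum to one under both mutation models. A call would then be made when the largest posterior exceeds the threshold given with -p.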

Download

For the current version of glfSingle, please go to our GLF Tools Website.

TODO

Support for X chromosome variant calling.

Support for a two pass depth filter that uses the data to automatically work out appropriate filtering thresholds.