Changes

From Genome Analysis Wiki
2,557 bytes added, 11:34, 26 September 2013
[[Category:Software]]
 
'''glfSingle''' is a [[GLF]]-based variant caller for next-generation sequencing data. It takes a [[GLF]]-format genotype likelihood file as input and generates a [[VCF]]-format set of variant calls as output.
== Command Line Options ==
  -g ''genotype likelihood file''    Specifies the name of the input [[GLF]]-format genotype likelihood file
  -b ''base call file''              Specifies the name of the output [[VCF]]-format base call file
  -s ''sample label''                Specifies a label for the sample being analyzed, which will be included in the output VCF file
  -p ''threshold''                   The threshold for base calling. Base calls will be made when their posterior probability exceeds ''threshold''

  --minMapQuality ''threshold''      Positions where the root-mean-square mapping quality falls below this threshold will be excluded.
  --minDepth      ''threshold''      Positions where the read depth falls below this threshold will be excluded.
  --maxDepth      ''threshold''      Positions where the read depth exceeds this threshold will be excluded.
  --reference                        Positions called as homozygous reference will be included in the output.

To see the default values for these options, run the program with no arguments.
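As an illustration of how the options above fit together, the following Python sketch assembles a glfSingle argument list. The helper name <code>build_glfsingle_command</code>, the file names, the sample label and the numeric values are all hypothetical, and the executable is assumed to be on the path under the name <code>glfSingle</code>. Options left unset are omitted, so the program's own defaults apply.

```python
import shlex


def build_glfsingle_command(glf_path, vcf_path, sample_label=None,
                            threshold=None, min_map_quality=None,
                            min_depth=None, max_depth=None,
                            include_reference=False,
                            executable="glfSingle"):
    """Return an argument list for one glfSingle run (hypothetical helper).

    Only the options documented on this page are used. Any option left
    as None is not passed, so glfSingle falls back to its own default.
    """
    cmd = [executable, "-g", glf_path, "-b", vcf_path]
    if sample_label is not None:
        cmd += ["-s", sample_label]
    if threshold is not None:
        cmd += ["-p", str(threshold)]
    if min_map_quality is not None:
        cmd += ["--minMapQuality", str(min_map_quality)]
    if min_depth is not None:
        cmd += ["--minDepth", str(min_depth)]
    if max_depth is not None:
        cmd += ["--maxDepth", str(max_depth)]
    if include_reference:
        cmd.append("--reference")
    return cmd


# Example values are placeholders, not recommended settings.
command = build_glfsingle_command(
    "sample.glf", "sample.vcf",
    sample_label="SAMPLE1", min_depth=10, max_depth=200,
    include_reference=True,
)
print(shlex.join(command))
```

The returned list can be passed to <code>subprocess.run</code> directly. Building a list rather than a single string avoids shell quoting problems with file names.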
    
== Model for Variant Calling ==
glfSingle uses a likelihood-based model for variant calling. Starting from per-position genotype likelihoods ''Pr(reads | genotype)'', computed by appropriate tools (e.g. Samtools BAQ), it combines the likelihoods with an individual-based prior ''p(genotype)'' to generate posterior probabilities ''Pr(genotype | reads)''.

Ingredients that go into the prior:
*All sites have an equal probability of showing polymorphism:
**P(non-reference base) = 0.001
*When a site shows polymorphism, it is usually heterozygous:
**P(non-reference heterozygote) = 0.01 * 2/3
**P(non-reference homozygote) = 0.01 * 1/3
*Mutation model: Transitions (C <-> T or A <-> G) account for most variants, while transversions account for a minority of variants
**transition has 2/3 probability
**each transversion has 1/6 probability

*'''New implementation''': Alternative mutation model with a uniform (uninformative) prior for the transition to transversion ratio
**updated by Yancy Lo, 9/24/2012
**each mutation has a 1/3 probability
**add --uniformTsTv to the command line to enable this alternative mutation model
**download glfSingle with this new implementation here: [[File:Generic-glfSingle-2013-09-25.tar.gz]]
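The prior and posterior described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the glfSingle source code. The function names are hypothetical. The page lists 0.001 for the site-level rate but 0.01 in the heterozygote and homozygote lines, so the sketch takes a single rate parameter <code>theta</code> (default 0.001, an assumption) and uses it for both. Genotypes carrying two different non-reference alleles are not covered by the description above and are given zero prior here, which is also an assumption.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"
# Transition partner of each base (A <-> G, C <-> T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def mutation_weights(ref, uniform_ts_tv=False):
    """Probability of each alternate base, given that the site is variant."""
    alts = [b for b in BASES if b != ref]
    if uniform_ts_tv:
        # Alternative model: each mutation has probability 1/3.
        return {b: 1.0 / 3.0 for b in alts}
    # Default model: transition 2/3, each transversion 1/6.
    return {b: (2.0 / 3.0 if b == TRANSITION[ref] else 1.0 / 6.0)
            for b in alts}


def genotype_prior(ref, theta=0.001, uniform_ts_tv=False):
    """Prior over the 10 unordered diploid genotypes at one site."""
    weights = mutation_weights(ref, uniform_ts_tv)
    prior = {}
    for a, b in combinations_with_replacement(BASES, 2):
        genotype = a + b
        if a == ref and b == ref:
            prior[genotype] = 1.0 - theta                      # homozygous reference
        elif a == b:
            prior[genotype] = theta * (1.0 / 3.0) * weights[a]  # non-reference homozygote
        elif ref in (a, b):
            alt = b if a == ref else a
            prior[genotype] = theta * (2.0 / 3.0) * weights[alt]  # reference/alternate heterozygote
        else:
            prior[genotype] = 0.0  # assumption: not described on this page
    return prior


def genotype_posterior(likelihoods, prior):
    """Bayes' rule: Pr(genotype | reads) is proportional to Pr(reads | genotype) * p(genotype)."""
    joint = {g: likelihoods[g] * prior[g] for g in prior}
    total = sum(joint.values())
    if total <= 0.0:
        raise ValueError("all genotypes have zero joint probability")
    return {g: p / total for g, p in joint.items()}


def call_genotype(posterior, threshold):
    """Return the most probable genotype if its posterior exceeds threshold, else None."""
    best = max(posterior, key=posterior.get)
    return best if posterior[best] > threshold else None
```

For example, <code>genotype_prior("C")</code> gives the default prior at a site whose reference base is C, and <code>genotype_prior("C", uniform_ts_tv=True)</code> gives the prior under the alternative mutation model. The <code>threshold</code> argument of <code>call_genotype</code> plays the role of the <code>-p</code> option and has no default in the sketch.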

== Download ==

For the current version of glfSingle, please go to [http://www.sph.umich.edu/csg/abecasis/glfTools/ our GLF Tools Website].

== TODO ==

Support for X chromosome variant calling.

Support for a two-pass depth filter that uses the data to automatically work out appropriate filtering thresholds.