glfMultiples is a GLF-based variant caller for next-generation sequencing data. It takes a set of GLF-format genotype likelihood files as input and generates a VCF-format set of variant calls as output.
Basic Usage Example
In a typical command line, a series of options controlling variant calling appear first and are followed by a trailing list of GLF-format likelihood files. Here is an example:
glfMultiples --minMapQuality 30 --minTotalDepth 60 --maxTotalDepth 240 -b YRI.SLX.vcf YRI/NA*.SLX.glf > YRI.SLX.log
Command Line Options
Basic Output Options
-b baseCallFile Specifies the name of the output VCF-format base call file.
-p threshold The threshold for base calling. Base calls will be made when their posterior likelihood exceeds threshold.
Filtering According to Depth and Map Quality
--minMapQuality threshold Positions where the root-mean-square mapping quality falls below this threshold will be excluded.
--strict When the map quality is interpreted strictly, all three trio individuals must exceed minMapQuality before a call is made. Without the --strict option, reads for individuals below the threshold are ignored.
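The root-mean-square statistic used by --minMapQuality can be illustrated with a short Python sketch. This is not glfMultiples source code: the function names are hypothetical, and the input here is assumed to be a plain list of per-read mapping qualities at one position.

```python
import math

def rms_mapping_quality(map_qualities):
    """Root-mean-square of a list of mapping qualities (hypothetical helper)."""
    if not map_qualities:
        return 0.0
    return math.sqrt(sum(q * q for q in map_qualities) / len(map_qualities))

def passes_min_map_quality(map_qualities, threshold):
    """A position is kept unless its RMS mapping quality falls below the threshold."""
    return rms_mapping_quality(map_qualities) >= threshold
```

Because the qualities are squared before averaging, a few high-quality reads raise the RMS more than they would raise a simple mean.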
--minDepth threshold Positions where the read depth falls below this threshold will be excluded.
--maxDepth threshold Positions where the read depth exceeds this threshold will be excluded.
--hardFilter Filtered positions will be completely absent from output. The default is to use a soft filter, where these positions are included in output but annotated as failing specific filters.
--glfAliases filename By default, GLF filenames are used to label each column in the VCF file. This option allows each filename to be matched to a more specific individual identifier. The aliases file should include two columns per row, the first specifying the GLF filename, the second specifying a sample name.
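As an illustration of the two-column aliases file, the following Python sketch reads such a file and derives column labels from it. It assumes the columns are separated by whitespace, which the description above does not specify, and the function names are hypothetical rather than part of glfMultiples.

```python
def read_glf_aliases(lines):
    """Map each filename (column 1) to a sample name (column 2).

    Assumes whitespace-separated columns; rows with fewer than two
    fields are skipped.
    """
    aliases = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        aliases[fields[0]] = fields[1]
    return aliases

def column_labels(glf_files, aliases):
    """Use the alias when one exists, otherwise fall back to the filename."""
    return [aliases.get(name, name) for name in glf_files]
```

For example, with the made-up row "sampleA.glf SAMPLE_A", the file sampleA.glf would be labelled SAMPLE_A, while any file without a row would keep its filename as the label.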
How It Works
For each possible position, glfMultiples considers a series of potential polymorphisms, including transitions and transversions from the reference base, but also bi-allelic polymorphisms where neither of the alleles present in the sample is the reference base. For each potential polymorphism type, the likelihood of the observed bases is maximized with respect to allele frequency. Decisions of which sites are polymorphic take into account the maximized likelihood but also an overall prior for each type of polymorphism (for example, transitions are assumed to account for ~2/3 of all variants).
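The maximization over allele frequency can be sketched in Python for a single bi-allelic candidate. This is a minimal illustration, not the glfMultiples implementation: it assumes Hardy-Weinberg genotype proportions and uses a simple EM update, neither of which is stated above, and it takes each individual's likelihoods as a hypothetical triple for 0, 1, and 2 copies of the second allele.

```python
import math

def hwe_priors(freq):
    """Genotype proportions for 0, 1, 2 copies of an allele (HWE is an assumption)."""
    return ((1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2)

def site_log_likelihood(genotype_likelihoods, freq):
    """Log-likelihood of the observed data at one site for a given allele frequency."""
    priors = hwe_priors(freq)
    total = 0.0
    for lk in genotype_likelihoods:
        total += math.log(sum(p * l for p, l in zip(priors, lk)))
    return total

def maximize_allele_frequency(genotype_likelihoods, iterations=50, start=0.5):
    """EM estimate of the allele frequency that maximizes the site likelihood.

    Inputs are assumed to give every genotype a non-zero likelihood.
    """
    freq = start
    n = len(genotype_likelihoods)
    for _ in range(iterations):
        priors = hwe_priors(freq)
        expected_copies = 0.0
        for lk in genotype_likelihoods:
            post = [p * l for p, l in zip(priors, lk)]
            norm = sum(post)
            # Expected number of copies of the allele carried by this individual.
            expected_copies += (post[1] + 2 * post[2]) / norm
        freq = expected_copies / (2 * n)
    return freq
```

In a sketch like this, the maximized log-likelihood for each candidate polymorphism type would then be combined with the log of that type's prior before deciding whether the site is polymorphic.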
glfMultiples works with log-likelihoods internally to avoid underflows in samples that may include hundreds or thousands of individuals.
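The underflow problem is easy to reproduce. The Python sketch below uses made-up per-individual likelihoods to compare a direct product with a sum of logarithms; it illustrates the general technique only and says nothing about how glfMultiples stores its values.

```python
import math

def product_of_likelihoods(likelihoods):
    """Multiply likelihoods directly; prone to underflow for large samples."""
    product = 1.0
    for value in likelihoods:
        product *= value
    return product

def sum_of_log_likelihoods(likelihoods):
    """Add log-likelihoods instead; stays finite for large samples."""
    return sum(math.log(value) for value in likelihoods)

# Illustrative data: 1000 individuals, each contributing a likelihood of 1e-5.
likelihoods = [1e-5] * 1000
direct = product_of_likelihoods(likelihoods)   # underflows to 0.0 in double precision
in_logs = sum_of_log_likelihoods(likelihoods)  # finite, roughly -11513
```

Once the product has underflowed to zero, all competing hypotheses look equally impossible, whereas their log-likelihoods can still be compared.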
The current version is available for download from http://www.sph.umich.edu/csg/abecasis/downloads/generic-glfMultiples-2010-06-16.tar.gz.
Support for X chromosome variant calling.
Support for two-pass depth filter that looks at the data to work out appropriate thresholds for shallow and deep coverage.