From Genome Analysis Wiki
551 bytes added, 11:20, 8 October 2012
== Introduction ==
 
* The program '''polymutt''' implements a likelihood-based framework for calling '''single nucleotide variants''' and detecting '''''de novo''''' '''point mutation''' events in families from next-generation sequencing data.
 
* The program takes as input genotype likelihood format (GLF) files, which can be generated by following the [[#Creation of GLF files | Creation of GLF files]] instructions, and outputs the results in [http://www.1000genomes.org/node/101 VCF] format. For variant calling, polymutt can alternatively take VCF input in which either the PL or the GL field is present; commonly used variant callers such as GATK and samtools write PL values to their VCF output by default. The current version works only on biallelic variants; non-biallelic variants in the VCF files are ignored.
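The PL and GL fields are both per-genotype likelihoods: in the VCF specification, PL is the phred-scaled likelihood (-10*log10) rounded to an integer, and GL is the log10-scaled likelihood, listed for a biallelic diploid site in the order REF/REF, REF/ALT, ALT/ALT. The Python sketch below shows how such values might be turned back into linear likelihoods while skipping non-biallelic records. It is an illustration under these assumptions, not polymutt's own parser; the function name `biallelic_likelihoods` is hypothetical, and it handles only plain tab-separated data lines (no header parsing, no compressed input).

```python
def biallelic_likelihoods(vcf_line):
    """Return per-sample [L(0/0), L(0/1), L(1/1)] for one VCF data line.

    Returns None for records whose ALT column lists more than one allele,
    mirroring the rule that non-biallelic variants are ignored. A sample
    without usable PL/GL values gets None (treated as missing data).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]
    if "," in alt:
        return None  # multi-allelic record: skipped
    fmt_keys = fields[8].split(":")
    result = []
    for sample in fields[9:]:
        # zip() stops at the shorter list, so truncated sample fields are tolerated
        values = dict(zip(fmt_keys, sample.split(":")))
        pl, gl = values.get("PL", "."), values.get("GL", ".")
        if pl not in (".", ""):
            # PL = -10*log10(likelihood), so likelihood = 10^(-PL/10)
            result.append([10 ** (-int(x) / 10.0) for x in pl.split(",")])
        elif gl not in (".", ""):
            # GL = log10(likelihood), so likelihood = 10^GL
            result.append([10 ** float(x) for x in gl.split(",")])
        else:
            result.append(None)
    return result


line = "20\t14370\t.\tG\tA\t50\tPASS\t.\tGT:PL\t0/1:30,0,40\t0/0:0,15,200"
print(biallelic_likelihoods(line))
```

Because callers normalize PL so that the most likely genotype has PL 0, the recovered likelihoods are relative: the best genotype gets likelihood 1, and integer rounding of PL limits their precision.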
 
* Variant calling and ''de novo'' mutation detection are modeled jointly within families, and both nuclear and extended pedigrees can be handled as long as they contain no consanguinity loops.
 
* Since unrelated individuals are a special case of families, samples of unrelated individuals, or a mixture of related and unrelated individuals, can also be handled. Relationships are specified in the input .ped file; each unrelated individual can simply be assigned its own unique family ID.
    
* The evidence for variants and ''de novo'' mutations is assessed probabilistically. For a variant, the QUAL value is calculated as -10*log10(1-posterior(Variant | Data)). For a ''de novo'' mutation event, a ''de novo'' quality (DQ) value is defined as log10(lk_denovo / lk_no_denovo), where lk_denovo and lk_no_denovo are the likelihoods of the data when ''de novo'' mutations are allowed and disallowed, respectively. Similarly, for each genotype, a genotype quality (GQ) value is defined as -10*log10(1-posterior(Genotype | Data)).
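To make the QUAL, GQ and DQ definitions concrete, the Python sketch below evaluates them for a single parent-offspring trio at one biallelic site. It is a toy model whose assumptions are ours, not polymutt's: founder genotypes follow Hardy-Weinberg proportions with a hypothetical ALT allele frequency `q`, each transmitted allele flips with a hypothetical per-allele mutation rate `mu`, and "disallowing" ''de novo'' mutations means setting `mu` to 0 (strict Mendelian inheritance). Genotype likelihoods are given on a linear scale in the order 0/0, 0/1, 1/1. polymutt's actual priors and mutation model may differ.

```python
import math
from itertools import product

GENOTYPES = (0, 1, 2)  # number of ALT alleles carried at a biallelic site


def phred_quality(posterior):
    """-10*log10(1 - posterior), the form used for QUAL and GQ."""
    err = max(1.0 - posterior, 1e-300)  # floor avoids log10(0) when posterior == 1
    return -10.0 * math.log10(err)


def hwe_prior(g, q):
    """Hardy-Weinberg prior of a founder genotype (assumed prior, ALT frequency q)."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)[g]


def transmission(child, father, mother, mu):
    """P(child | father, mother); each transmitted allele flips with probability mu."""
    def p_alt(parent):
        p = parent / 2.0  # Mendelian chance of transmitting the ALT allele
        return p * (1 - mu) + (1 - p) * mu

    pf, pm = p_alt(father), p_alt(mother)
    return ((1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm)[child]


def trio_joint(lik_f, lik_m, lik_c, q, mu):
    """Joint probability of each (father, mother, child) genotype triple and the data."""
    return {
        (gf, gm, gc): hwe_prior(gf, q) * hwe_prior(gm, q)
        * transmission(gc, gf, gm, mu) * lik_f[gf] * lik_m[gm] * lik_c[gc]
        for gf, gm, gc in product(GENOTYPES, repeat=3)
    }


def trio_qualities(lik_f, lik_m, lik_c, q=1e-3, mu=1e-8):
    allow = trio_joint(lik_f, lik_m, lik_c, q, mu)      # de novo allowed
    disallow = trio_joint(lik_f, lik_m, lik_c, q, 0.0)  # strict Mendelian
    lk_denovo, lk_no_denovo = sum(allow.values()), sum(disallow.values())

    # QUAL: "variant" means at least one trio member carries an ALT allele
    post_variant = 1.0 - allow[(0, 0, 0)] / lk_denovo
    qual = phred_quality(post_variant)

    # GQ for the child's most probable genotype (marginal posterior)
    child_post = [
        sum(v for k, v in allow.items() if k[2] == g) / lk_denovo for g in GENOTYPES
    ]
    best = max(GENOTYPES, key=lambda g: child_post[g])
    gq = phred_quality(child_post[best])

    dq = math.log10(lk_denovo / lk_no_denovo)
    return {"QUAL": qual, "child_GT": best, "GQ": gq, "DQ": dq}


# Parents look homozygous reference, the child looks heterozygous.
parent = [1.0, 1e-10, 1e-20]
child = [1e-12, 1.0, 1e-12]
print(trio_qualities(parent, parent, child))
```

A large positive DQ means the data are much better explained when a ''de novo'' event is allowed. For a trio that looks homozygous reference throughout, allowing mutations only removes probability from the Mendelian configurations, so DQ comes out slightly negative.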
 
* If some family members are not sequenced, they can still be included. With GLF input, set the GLF file indices of the unsequenced members to zero. With VCF input, all individuals who are in the .ped file but not in the VCF file are treated as missing data (not sequenced).
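Unrelated individuals entered as single-member families, and unsequenced members flagged by a zero GLF index, can be illustrated by generating pedigree rows programmatically. The Python sketch below writes the usual first five pedigree columns (family ID, individual ID, father ID, mother ID, sex, with 0 for an unknown parent) followed by a GLF index column. The position of the GLF index column and the `FAM_` prefix used for family IDs are assumptions made for illustration; consult the polymutt .ped/.dat documentation for the exact layout it expects.

```python
from dataclasses import dataclass


@dataclass
class Member:
    family: str
    individual: str
    father: str     # "0" = founder / unknown
    mother: str     # "0" = founder / unknown
    sex: int        # 1 = male, 2 = female
    glf_index: int  # hypothetical column position; 0 = not sequenced


def ped_line(m: Member) -> str:
    """Format one member as a tab-separated pedigree row."""
    return "\t".join(
        [m.family, m.individual, m.father, m.mother, str(m.sex), str(m.glf_index)]
    )


def unrelated_members(samples, first_index):
    """Give every unrelated sample its own family (hypothetical FAM_<id> naming)."""
    return [
        Member(f"FAM_{sid}", sid, "0", "0", sex, first_index + i)
        for i, (sid, sex) in enumerate(samples)
    ]


# A trio whose father was not sequenced (GLF index 0), plus two unrelated samples.
pedigree = [
    Member("F1", "dad", "0", "0", 1, 0),
    Member("F1", "mom", "0", "0", 2, 1),
    Member("F1", "kid", "dad", "mom", 2, 2),
] + unrelated_members([("u1", 1), ("u2", 2)], first_index=3)

print("\n".join(ped_line(m) for m in pedigree))
```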
 
* NOTE: Variant calling for chromosomes X, Y and MT has been only lightly tested. Comments and suggestions about polymutt, and about non-autosomal variant calling in particular, are appreciated.
    
* See below for more details.
 