Changes

From Genome Analysis Wiki
1,174 bytes added, 10:12, 27 April 2014
Line 1: Line 1:  +
== NOTE ==
 +
If you are interested in calling '''''de novo''''' mutations in '''trios''' based on '''VCF''' files, we recommend our new tool, triodenovo, which implements a nicer algorithm with a more natural interpretation of the '''''de novo''''' quality. Please check it out at the link below. Thanks for trying it out!
 +
 +
http://genome.sph.umich.edu/wiki/Triodenovo
 +
 
== Updates ==
 
The latest version of 0.13 is available for [[#Download | Download]].
+
The latest version, 0.18, is available for [[#Download | Download]].
 +
 
 +
v0.18 fixed a bug that caused inbreeding to be reported for some pedigrees that are not inbred.
 +
 
 +
v0.17 fixed a bug that occurred when some of the samples in the ped files are not in the input VCF file.
 +
 
 +
v0.16 fixed a bug that occurred when the input is a VCF file with multiple nuclear families and the ped file contains only a single nuclear family.
 +
 
 +
v0.15 added an option (--mixed_vcf_records) to handle input vcf files in which mixed records with different FORMAT fields are present.
 +
 
 +
v0.14 implemented both inherited variant calling and '''''de novo''''' mutation detection from VCF input files. If you have a VCF file with PL or GL fields, you can run polymutt on the VCF file to quickly and conveniently call variants and mutations.
 +
*NOTE: When data are missing for a trio or family in the VCF file, '''''de novo''''' mutation calling is unreliable and often not possible, so these sites should be ignored for '''''de novo''''' mutations after calling.
   −
v0.13 fixed the bug for generating genotypes when the input is a VCF file and the ped file contains only a single nuclear family. As for unrelated samples (e.g. GATK recommends at least 30 samples), it is also desirable to use more families or mixture of families and unrelated samples for polymutt.
+
v0.13 fixed a bug in generating genotypes when the input is a VCF file and the ped file contains only a single nuclear family. As with unrelated samples (e.g. [[http://gatkforums.broadinstitute.org/discussion/1186/best-practice-variant-detection-with-the-gatk-v4-for-release-2-0 GATK]] recommends at least 30 samples), it is desirable to run polymutt on more families or on a mixture of families and unrelated samples.
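
The v0.14 note above about missing data in trios can be applied as a post-calling filter. The following is a minimal Python sketch, not part of polymutt: the function name and the column indices are hypothetical, and it assumes a standard tab-separated VCF data line in which sample columns start at index 9 and missing values are written as ".".

```python
def trio_has_missing_data(vcf_line, trio_columns):
    """Return True if any trio member lacks likelihoods (or a called GT) at this site.

    vcf_line: one tab-separated VCF data line (not a header line).
    trio_columns: 0-based column indices of the trio samples (hypothetical input).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    # Use PL if present, otherwise GL; with neither, likelihoods are missing.
    lik_key = "PL" if "PL" in keys else ("GL" if "GL" in keys else None)
    if lik_key is None:
        return True
    for col in trio_columns:
        values = dict(zip(keys, fields[col].split(":")))
        lik = values.get(lik_key, ".")  # trailing fields may be dropped in VCF
        if lik == "." or "." in lik.split(","):
            return True
        gt = values.get("GT")
        if gt is not None and "." in gt.replace("|", "/").split("/"):
            return True
    return False
```

Sites for which this returns True for the trio of interest would then be dropped from the ''de novo'' candidate list.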
    
== Introduction ==
 
 
* The program '''polymutt''' implements a likelihood-based framework for calling '''single nucleotide variants''' and detecting '''''de novo''''' '''point mutation''' events in families from next-generation sequencing data.
   −
* The program takes as input genotype likelihood format (GLF) files, which can be generated by following the [[#Creation of GLF files | Creation of GLF files]] instructions, and outputs the result in the [[http://www.1000genomes.org/node/101 VCF]] format. For variant calling, polymutt can alternatively take VCF input in which either the PL or the GL field is present. Commonly used variant calling tools such as GATK and samtools generate PL values in their VCF files by default. The current version works only on biallelic variants; non-biallelic variants in the VCF file are ignored.
+
* The program takes as input genotype likelihood format (GLF) files, which can be generated by following the [[#Creation of GLF files | Creation of GLF files]] instructions, and outputs the result in the [[http://www.1000genomes.org/node/101 VCF]] format. Alternatively, polymutt can take VCF input in which either the PL or the GL field is present. Commonly used variant calling tools such as GATK and samtools generate PL values in their VCF files by default. The current version works only on biallelic variants; non-biallelic variants in the VCF file are ignored.
    
* The variant calling and ''de novo'' mutation detection are modeled jointly within families and can handle both nuclear and extended pedigrees without consanguinity loops.
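
The VCF input requirements listed above (a PL or GL field, biallelic variants only) can be checked per record before running polymutt. This is a rough Python sketch with a hypothetical function name, not polymutt's own logic. It assumes a standard tab-separated VCF data line, and it treats a site with ALT "." as non-biallelic, which is an assumption rather than documented polymutt behaviour.

```python
def usable_as_vcf_input(vcf_line):
    """Return True if a VCF data line is biallelic and carries PL or GL values."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt_alleles = fields[4].split(",")   # ALT column, comma-separated alleles
    format_keys = fields[8].split(":")   # FORMAT column, colon-separated keys
    # Assumption: exactly one ALT allele, and not the "." placeholder.
    is_biallelic = len(alt_alleles) == 1 and alt_alleles[0] != "."
    has_likelihoods = "PL" in format_keys or "GL" in format_keys
    return is_biallelic and has_likelihoods
```

Counting the records for which this returns False gives a quick estimate of how many sites would be skipped.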
 
Line 22: Line 38:     
== Additional Notes ==
 
 +
 
* All GLF files (and BAM files) in the input must have IDENTICAL chromosome orders. Polymutt processes the chromosomes in order until it reaches a chromosome at which one GLF file differs from the others. All results prior to that problematic chromosome are still valid.
    
** If you do have different orders or even different numbers of chromosomes in the BAM files, you can create GLF files for individual chromosomes and run polymutt on matched chromosomes.
 
   −
* For ''de novo'' mutations, the current version can only take GLF files and only detect single nucleotide mutations. It does not call ''de novo'' mutations on X, Y and MT chromosomes, and please ignore records in these non-autosomes. Indels are not handled either.
+
* The current version does NOT call ''de novo'' mutations on the X, Y and MT chromosomes; please ignore records on these non-autosomal chromosomes.
 +
 
 +
* For ''de novo'' mutations, it is usually helpful to explore several mutation rates in addition to the default (1.5e-08). At depths lower than 30X, for example, the support for a ''de novo'' mutation will be weak given the low default mutation rate. Trying higher mutation rates (e.g. 1e-06 or 1e-07) may pick up these low-depth sites.
    
* Some of the features will be implemented in future versions.
 
  −
* For ''de novo'' mutations, it is usually helpful to explore several mutation rates in addition to the default (1.5e-08). At depths lower than 30X, for example, the support for a ''de novo'' mutation will be weak given the low default mutation rate. Trying higher mutation rates (e.g. 1e-06 or 1e-07) may pick up these low-depth sites.
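
The requirement in the notes above that all input files share an identical chromosome order can be verified up front. The sketch below is a hypothetical helper, not part of polymutt. It assumes the header text of each BAM file has already been obtained (for example with <code>samtools view -H</code>) and that reference names appear in the SN tag of the @SQ lines, as in the SAM format.

```python
def chromosome_order(sam_header_text):
    """Return reference sequence names in the order of the @SQ header lines."""
    names = []
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for tag in line.split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.append(tag[3:])
    return names


def first_mismatch(orders):
    """Return the first position at which the orders disagree, or None if identical.

    orders: a list of chromosome-name lists, one per input file.
    """
    longest = max(len(order) for order in orders)
    for i in range(longest):
        # A file with fewer chromosomes contributes None at this position.
        names = {order[i] if i < len(order) else None for order in orders}
        if len(names) > 1:
            return i
    return None
```

If <code>first_mismatch</code> returns a position rather than None, creating GLF files per chromosome and running polymutt on matched chromosomes, as suggested above, avoids the problem.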
      
== Usage ==
 
Line 64: Line 81:  
  polymutt -p input.ped -d input.dat  --in_vcf input.vcf --out_vcf out.vcf --nthreads 4
 
   −
Examples for ''de novo'' mutation detection (works only for GLF files):
+
Examples for ''de novo'' mutation detection:
 
  polymutt -p input.ped -d input.dat -g input.gif --denovo --out_vcf out.denovo.vcf --nthreads 4
 
  polymutt -p input.ped -d input.dat -g input.gif --denovo --rate_denovo 1.2e-06 --out_vcf out.denovo.vcf --nthreads 4
+
  polymutt -p input.ped -d input.dat -g input.gif --out_vcf out.vcf --denovo
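
To explore several mutation rates, the ''de novo'' commands can be generated in a loop. The Python sketch below only builds and prints command lines. It uses the options that appear in the examples on this page (including <code>--rate_denovo</code>, taken from the earlier example); the function name and the output file naming scheme are hypothetical, and whether every option is accepted by a given polymutt version is an assumption.

```python
import shlex


def denovo_commands(ped, dat, gif, rates, threads=4):
    """Build one polymutt de novo command line per mutation rate."""
    commands = []
    for rate in rates:
        out_vcf = f"out.denovo.rate_{rate}.vcf"  # hypothetical naming scheme
        args = [
            "polymutt", "-p", ped, "-d", dat, "-g", gif,
            "--denovo", "--rate_denovo", rate,
            "--out_vcf", out_vcf, "--nthreads", str(threads),
        ]
        commands.append(shlex.join(args))
    return commands


# Default rate plus the two higher example rates, passed as strings.
for command in denovo_commands("input.ped", "input.dat", "input.gif",
                               ["1.5e-08", "1e-07", "1e-06"]):
    print(command)
```

Writing each run to its own VCF file makes it easy to compare which candidate sites appear only at the higher rates.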
    
Examples of calling X, Y and MT (works only for variants but not de novo mutations):
 
Line 104: Line 121:  
'''Option 2'''
 
 
Alternatively, if you want to refine the variant and genotype calling using family relatedness based on your existing VCF files, polymutt can take a VCF file as input. In this case, the VCF file has to have the PL or the GL field, which is usually available from commonly used tools (e.g. GATK and samtools).
 
  −
''NOTE'': this option does not work for de novo mutation detection in this version due to the lack of sequencing information in most VCF files.
      
In this option, you can specify --in_vcf input.vcf in place of -g input.gif for variant calling. If both the --in_vcf and -g options are specified, --in_vcf takes effect and -g is ignored. The .ped and .dat files are as in Option 1, but only the first 5 columns are used and the other columns are ignored. You can remove the GLF_Index column, but the .dat file must currently still be present even if it is empty (this will be made more flexible in future versions).
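
Preparing the .ped and .dat files for VCF input can be scripted. The following Python sketch uses hypothetical function and file names and assumes a whitespace-delimited ped file. It keeps only the first 5 columns of each line and creates an empty .dat file, matching the description above.

```python
from pathlib import Path


def first_five_columns(ped_text):
    """Keep only the first 5 whitespace-separated columns of each non-empty line."""
    lines = [" ".join(line.split()[:5])
             for line in ped_text.splitlines() if line.strip()]
    return "\n".join(lines) + "\n"


def write_vcf_mode_inputs(ped_in, ped_out, dat_out):
    """Write a 5-column ped file and an empty .dat file (file names are up to the caller)."""
    Path(ped_out).write_text(first_five_columns(Path(ped_in).read_text()))
    Path(dat_out).touch()  # the .dat file must exist even if it is empty
```

The resulting files would then be passed to polymutt with -p and -d, together with --in_vcf.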
Line 164: Line 179:     
== Download ==
 
The latest version of source code v0.13 with test files can be [[Media:Polymutt.0.13.tar.gz | downloaded]] here.
+
The latest version of source code v0.18 with test files can be [[Media:Polymutt.0.18.tar.gz | downloaded]] here.
A precompiled version on Ubuntu 10.04 (works on CentOS 6.3 as well) is available for [[Media:polymutt.0.13.binary.tar.gz | download]]
      
== Contact ==
 
 
For questions please contact the authors (Bingshan Li:  [mailto:bingshan@umich.edu bingshan@umich.edu])
 
 +
 +
== Citation ==
 +
Li B, Chen W, Zhan X, Busonero F, Sanna S, et al. (2012) A Likelihood-Based Framework for Variant Calling and De Novo Mutation Detection in Families. PLoS Genet 8(10): e1002944. doi:10.1371/journal.pgen.1002944
    
[[Category:Software]]
 