Changes

From Genome Analysis Wiki
106 bytes added ,  10:12, 27 April 2014
Line 1: Line 1:  +
== NOTE ==
 +
If you are interested in calling '''''de novo''''' mutations in '''trios''' from '''VCF''' files, we recommend our new tool, triodenovo, which implements a nicer algorithm with a more natural interpretation of the '''''de novo''''' quality. Please follow the link below to try it out.
 +
 +
http://genome.sph.umich.edu/wiki/Triodenovo
 +
 
== Updates ==
 
The latest version, 0.14, is available for [[#Download | Download]].
+
The latest version, 0.18, is available for [[#Download | Download]].
   −
v0.14 implemented both inherited variant calling and de novo mutation detection from VCF input files. If you have a VCF file with PL or GL fields, you can run polymutt on the VCF file to quickly and conveniently call variants and mutations.
+
v0.18 fixed a bug that caused inbreeding to be reported for some pedigrees that are not inbred.
*NOTE: When a trio or family in the VCF file has missing data, de novo calling is unreliable and often not possible. Such sites should be ignored for de novo mutations.
+
 
 +
v0.17 fixed a bug that occurred when some of the samples in the ped file are not in the input VCF file.
 +
 
 +
v0.16 fixed a bug when the input is a VCF file with multiple nuclear families and the ped file contains only a single nuclear family.
 +
 
 +
v0.15 added an option (--mixed_vcf_records) to handle input vcf files in which mixed records with different FORMAT fields are present.
 +
 
 +
v0.14 implemented both inherited variant calling and '''''de novo''''' mutation detection from VCF input files. If you have a VCF file with PL or GL fields, you can run polymutt on the VCF file to quickly and conveniently call variants and mutations.
 +
*NOTE: When a trio or family in the VCF file has missing data, '''''de novo''''' mutation calling is unreliable and often not possible. Such sites should be ignored for '''''de novo''''' mutations after calling.
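One way to act on this note is to flag sites where any family member lacks a likelihood value before trusting a call. The following is only a minimal sketch, not part of polymutt: the function name <code>has_complete_likelihoods</code> is hypothetical, and it assumes a tab-delimited VCF data line in which <code>.</code> marks a missing value and samples start at the tenth column (0-based index 9).

```python
def has_complete_likelihoods(vcf_line, sample_columns):
    """Return True if every listed sample has a non-missing PL (or GL) value.

    vcf_line       -- one tab-delimited VCF data line (not a header line)
    sample_columns -- 0-based column indices of the family members
                      (sample columns start at index 9 in a VCF line)
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no FORMAT/sample columns at all
    format_keys = fields[8].split(":")
    # Prefer PL and fall back to GL; without either there is nothing to check.
    key = next((k for k in ("PL", "GL") if k in format_keys), None)
    if key is None:
        return False
    idx = format_keys.index(key)
    for col in sample_columns:
        if col >= len(fields):
            return False
        values = fields[col].split(":")
        # Trailing FORMAT fields may be dropped for a sample with no data.
        if idx >= len(values):
            return False
        value = values[idx]
        # Either the whole field or any single entry may be ".".
        if value == "." or "." in value.split(","):
            return False
    return True
```

Sites for which this returns False for a trio or family are the ones the note above recommends ignoring for '''''de novo''''' mutations.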
    
v0.13 fixed the bug in generating genotypes when the input is a VCF file and the ped file contains only a single nuclear family. As with unrelated samples (e.g.  [[http://gatkforums.broadinstitute.org/discussion/1186/best-practice-variant-detection-with-the-gatk-v4-for-release-2-0 GATK]] recommends at least 30 samples), it is desirable to give polymutt more families, or a mixture of families and unrelated samples.
 
Line 10: Line 23:  
* The program '''polymutt''' implemented a likelihood-based framework for calling '''single nucleotide variants''' and detecting '''''de novo''''' '''point mutation''' events in families for next-generation sequencing data.  
 
   −
* The program takes as input genotype likelihood format (GLF) files which can be generated following the  [[#Creation of GLF files | Creation of GLF files]] instruction and outputs the result in the [[http://www.1000genomes.org/node/101 VCF]] format. For variant calling, polymutt can alternatively take VCF input in which either the PL or the GL field is present. Commonly used variant calling algorithms such as GATK and samtools by default generate PL values in the VCF files. The current version works only on biallelic variants; non-biallelic variants in the VCF files will be ignored.
+
* The program takes as input genotype likelihood format (GLF) files which can be generated following the  [[#Creation of GLF files | Creation of GLF files]] instruction and outputs the result in the [[http://www.1000genomes.org/node/101 VCF]] format. Alternatively, polymutt can take VCF input in which either the PL or the GL field is present. Commonly used variant calling algorithms such as GATK and samtools by default generate PL values in the VCF files. The current version works only on biallelic variants; non-biallelic variants in the VCF files will be ignored.
    
* The variant calling and ''de novo'' mutation detection are modeled jointly within families and can handle both nuclear and extended pedigrees without consanguinity loops.
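The VCF input requirements listed above (a PL or GL field, biallelic variants only) can be checked per record before running the program. The sketch below is an illustration under stated assumptions, not polymutt's own logic: <code>is_usable_record</code> is a hypothetical helper, and "biallelic" is taken here to mean exactly one allele in the ALT column.

```python
def is_usable_record(vcf_line):
    """Return True if a VCF data line is biallelic and declares PL or GL.

    Assumption: "biallelic" means the ALT column holds exactly one allele
    (no comma-separated list and not the missing value ".").
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # FORMAT and at least one sample column are required
    alt = fields[4]
    format_keys = fields[8].split(":")
    biallelic = alt != "." and "," not in alt
    has_likelihoods = "PL" in format_keys or "GL" in format_keys
    return biallelic and has_likelihoods
```

Records for which this returns False correspond to the ones that would be ignored or that lack the likelihood fields the program needs.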
 
Line 25: Line 38:     
== Additional Notes ==
 
  −
* If the input is a VCF file and any member of a family has missing PL values, the de novo calling is not reliable. Please ignore such de novo calls.
      
* All GLF files (and BAM files) in the input must have IDENTICAL chromosome orders. Polymutt processes the chromosomes in order until one GLF file has a different chromosome from the others. All results prior to that problematic chromosome are still valid.
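The chromosome order can be compared up front from the BAM headers. The following is a rough sketch rather than anything polymutt provides: both function names are hypothetical, and it assumes you already have each file's header as text (for example from <code>samtools view -H</code>), where <code>@SQ</code> lines carry the sequence name in an <code>SN:</code> tag.

```python
def sequence_names(sam_header_text):
    """Return the sequence names from the @SQ lines of a SAM/BAM header, in order."""
    names = []
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for tag in line.split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.append(tag[3:])
    return names


def first_order_mismatch(orders):
    """Return the first position at which the chromosome lists differ, or None.

    orders -- a non-empty list of chromosome-name lists, one per input file.
    A list that is shorter than the others counts as a mismatch at its end.
    """
    longest = max(len(order) for order in orders)
    for pos in range(longest):
        column = [order[pos] if pos < len(order) else None for order in orders]
        if any(name != column[0] for name in column):
            return pos
    return None
```

A return value of <code>None</code> means the orders are identical; any other value is the index of the first chromosome at which the files diverge, which per the note above is where valid results would stop.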
 
Line 70: Line 81:  
  polymutt -p input.ped -d input.dat  --in_vcf input.vcf --out_vcf out.vcf --nthreads 4
 
   −
Examples for ''de novo'' mutation detection (works only for GLF files):
+
Examples for ''de novo'' mutation detection:
 
  polymutt -p input.ped -d input.dat -g input.gif --denovo --out_vcf out.denovo.vcf --nthreads 4
 
  polymutt -p input.ped -d input.dat -g input.gif --denovo --rate_denovo 1.2e-06 --out_vcf out.denovo.vcf --nthreads 4
+
  polymutt -p input.ped -d input.dat -g input.gif --out_vcf out.vcf --denovo
    
Examples of calling X, Y and MT (works only for variants but not de novo mutations):
 
Line 110: Line 121:  
'''Option 2'''
 
 
Alternatively, if you want to refine the variant and genotype calling using family relatedness based on your existing VCF files, polymutt can take a VCF file as input. In this case, the VCF file has to have the PL or the GL field, which is usually available from commonly used tools (e.g. GATK and samtools).
 
  −
''NOTE'': this option does not work for de novo mutation detection in this version due to the lack of sequencing information in most VCF files.
      
In this option, you can specify --in_vcf input.vcf in place of -g input.gif for variant calling. If both the --in_vcf and -g options are specified, --in_vcf takes effect and -g is ignored. The .ped and .dat files are as in Option 1, but only the first 5 columns are used; other columns are ignored. You can remove the GLF_Index column, but a .dat file is currently still required even if it is empty (this will be made more flexible in future versions).
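If you launch polymutt from a script, the choice between the two input options can be made explicit when assembling the argument list. This is a hedged sketch: <code>build_polymutt_command</code> is a hypothetical helper, it uses only the flags that appear in the examples on this page, and it assumes a <code>polymutt</code> executable on the PATH. Mirroring the precedence described above, it passes only --in_vcf when both inputs are given.

```python
def build_polymutt_command(ped, dat, out_vcf, glf_index=None, in_vcf=None,
                           denovo=False, rate_denovo=None, nthreads=None):
    """Assemble a polymutt argument list from flags shown on this page."""
    if glf_index is None and in_vcf is None:
        raise ValueError("either glf_index (-g) or in_vcf (--in_vcf) is required")
    cmd = ["polymutt", "-p", ped, "-d", dat]
    if in_vcf is not None:
        # --in_vcf takes effect when both are given, so -g is simply left out.
        cmd += ["--in_vcf", in_vcf]
    else:
        cmd += ["-g", glf_index]
    if denovo:
        cmd.append("--denovo")
        if rate_denovo is not None:
            cmd += ["--rate_denovo", str(rate_denovo)]
    cmd += ["--out_vcf", out_vcf]
    if nthreads is not None:
        cmd += ["--nthreads", str(nthreads)]
    return cmd
```

The returned list can be handed to <code>subprocess.run(cmd, check=True)</code>; building a list rather than a single string avoids shell quoting problems with file names.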
 
Line 170: Line 179:     
== Download ==
 
The latest version of source code v0.14 with test files can be [[Media:Polymutt.0.14.tar.gz | downloaded]] here.
+
The latest version of source code v0.18 with test files can be [[Media:Polymutt.0.18.tar.gz | downloaded]] here.
A precompiled version on Ubuntu 10.04 (works on CentOS 6.3 as well) is available for [[Media:polymutt.0.14.precompiled.tar.gz | download]]
  −
 
  −
The previous version of source code v0.13 with test files can be [[Media:Polymutt.0.13.tar.gz | downloaded]] here.
  −
A precompiled version on Ubuntu 10.04 (works on CentOS 6.3 as well) is available for [[Media:polymutt.0.13.binary.tar.gz | download]]
      
== Contact ==
 