Difference between revisions of "Triodenovo"

From Genome Analysis Wiki
Jump to navigationJump to search
Line 67: Line 67:
 
1. Basic filtering
 
1. Basic filtering
  
2. Advanced filtering
+
less trio.vcf.out | egrep "DQ|#" | perl -lane 'print if /#/; next if length($F[3])>1 || length($F[4])>1 || $F[4]=~/,/; next if $F[5]<30; $F[9] =~ /([A-Z])\/([A-Z])/; next if $1 ne $2; next if $F[10] !~ /$1\/$1/; $F[11]=~/([A-Z])\/([A-Z])/; next if $1 eq $2; $F[11] =~ /\d+,(\d+),\d+/; next if $1 != 0; print' | less
 +
 
 +
2. Advanced filtering using a machine-learning approach (i.e. DNMFilter at the bottom of the following page)
 +
 
 +
http://humangenome.duke.edu/software
  
 
== Download ==
 
== Download ==
 
Source code of v0.04 [[Media:triodenovo.0.04.tar.gz | download]] here. Pre-compiled binary version of 0.02  [[Media:triodenovo.0.02.binary.tar.gz | download]] here.
 
Source code of v0.04 [[Media:triodenovo.0.04.tar.gz | download]] here. Pre-compiled binary version of 0.02  [[Media:triodenovo.0.02.binary.tar.gz | download]] here.

Revision as of 00:22, 4 April 2014

Introduction

  • The program triodenovo implemented a Bayesian framework for calling de novo mutations in trios for next-generation sequencing data.
  • It takes as input a standard VCF file with PL or GL fields (storing genotype likelihoods). Commonly used callers, e.g. GATK and samtools, generate VCF files with PL values.
  • It calculates the likelihood of the model with de novo mutations, denoted as L1, and the likelihood of Mendelian transmission, denoted as L0, and represent the de novo evidence using DQ=-10(L1/L0)
  • DQ is the major parameter to control the output, along with others. See the example output file below
  • We recommend some basic and also a more advanced filtering filtering

Usage

A command without any input will invoke triodenovo and display the following message

 The following parameters are in effect:
                   Input files : --ped [], --in_vcf []
                  Output files : --out_vcf []
          Denovo mutation rate : --mu [1.0e-07]
          Scaled mutation rate : --theta [1.0e-03], --indel_theta [1.0e-04]
  Prior of de novo ts/tv ratio : --denovo_tstv [2.00]
           Non-autosome labels : --chrX [X]
                       Filters : --minDQ [5.00], --minTotalDepth,
                                 --maxTotalDepth, --minDepth [5], --maxDepth,
                                 --mixed_vcf_records
  • Example 1: using default parameters
triodenovo --ped trio.ped --in_vcf input.vcf --out denovo.vcf
  • Example 2: using --minDQ 7 to output de novo calls which are a minimum DQ of 7.
triodenovo --ped trio.ped --in_vcf input.vcf --out denovo.vcf --minDQ 7
  • Example 1: using --minDepth 10 to output de novo calls for which all three individuals (father, mother and child) have depth >=10.
triodenovo --ped trio.ped --in_vcf input.vcf --out denovo.vcf --minDepth 10

Input files

trio1 p1  0  0   1
trio1 p2  0  0   2
trio1 p3  p1 p2  1
  • A VCF file [VCF specs]. It can contain variant information for more individuals than in the ped file.

Output

  • The output file is specified via --out_vcf

An example use is as follows

##fileformat=VCFv4.1
##triodenovo=../src/triodenovo --ped trio.denovo.ped --in_vcf trio.denovo.vcf --out_vcf trio.denovo.vcf.out 
##Note=VCF file modified by polymutt. Updated fileds include: QUAL, GT and GQ, AF and AC. NOTE: modification was applied only to biallelic variants
##FILTER=<ID=LOWDP,Description="Low Depth filter when the average depth per sample is lessn than 1">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternative Allele Frequency">
##INFO=<ID=AC,Number=1,Type=Integer,Description="Alternative Allele Count">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DQ,Number=1,Type=Denovo Quality,Description="Denovo Quality: log10(BF) where BF is Bayes Factor calculated as L(M1)/L(M0) for M1 and M0 representing models with and without de novo mutations">
##FORMAT=<ID=DGQ,Number=1,Type=Integer,Description="Denovo Genotype Quality: -10*log10(post) where post is the posterior probability of the called trio genotypes among all trios with de novo mutations">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=PL,Number=3,Type=Integer,Description="Phred-scaled Genotype Likelihoods">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Log10 Genotype Likelihoods">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  3       2       1
1       10      .       C       T       20.71   .       DP=47   GT:DQ:DGQ:DP:PL C/C:7.52:100:15:0,44,255        C/C:7.52:100:16:0,44,255        C/T:7.52:100:16:100,0,199
1       13      .       C       T       117.18  .       DP=34   GT:DQ:DGQ:DP:PL C/C:5.67:100:8:0,23,197 C/C:5.67:100:11:0,32,255        C/T:5.67:100:15:178,0,103
1       17      .       C       A       111.15  .       DP=36   GT:DQ:DGQ:DP:PL C/C:5.37:100:12:0,35,255        C/C:5.37:100:10:0,26,222        A/C:5.37:100:14:181,0,82
1       82      .       C       G       118.40  .       DP=43   GT:DQ:DGQ:DP:PL C/C:6.45:100:14:0,41,255        C/C:6.45:100:13:0,38,255        C/G:6.45:100:16:199,0,100
1       91      .       C       T       28.69   .       DP=36   GT:DQ:DGQ:DP:PL C/C:6.02:100:9:0,26,222 C/C:6.02:100:17:0,50,255        C/T:6.02:100:10:93,0,93

Filtering

We recommend two filtering strategies. The first is a simple filtering and the second one is more advance

1. Basic filtering

less trio.vcf.out | egrep "DQ|#" | perl -lane 'print if /#/; next if length($F[3])>1 || length($F[4])>1 || $F[4]=~/,/; next if $F[5]<30; $F[9] =~ /([A-Z])\/([A-Z])/; next if $1 ne $2; next if $F[10] !~ /$1\/$1/; $F[11]=~/([A-Z])\/([A-Z])/; next if $1 eq $2; $F[11] =~ /\d+,(\d+),\d+/; next if $1 != 0; print' | less

2. Advanced filtering using a machine-learning approach (i.e. DNMFilter at the bottom of the following page)

http://humangenome.duke.edu/software

Download

Source code of v0.04 download here. Pre-compiled binary version of 0.02 download here.