Difference between revisions of "Triodenovo"
From Genome Analysis Wiki
Jump to navigationJump to searchLine 75: | Line 75: | ||
== Download == | == Download == | ||
Source code of v0.04 [[Media:triodenovo.0.04.tar.gz | download]] here. Pre-compiled binary version of 0.02 [[Media:triodenovo.0.02.binary.tar.gz | download]] here. | Source code of v0.04 [[Media:triodenovo.0.04.tar.gz | download]] here. Pre-compiled binary version of 0.02 [[Media:triodenovo.0.02.binary.tar.gz | download]] here. | ||
+ | |||
+ | == Contact == | ||
+ | For questions please contact the authors (Bingshan Li: [mailto:bingshan@umich.edu bingshan@umich.edu]) |
Revision as of 00:32, 4 April 2014
Introduction
- The program triodenovo implemented a Bayesian framework for calling de novo mutations in trios for next-generation sequencing data.
- It takes as input a standard VCF file with PL or GL fields (storing genotype likelihoods). Commonly used callers, e.g. GATK and samtools, generate VCF files with PL values.
- It calculates the likelihood of the model with de novo mutations, denoted as L1, and the likelihood of Mendelian transmission, denoted as L0, and represent the de novo evidence using DQ=-10(L1/L0)
- DQ is the major parameter to control the output, along with others. See the example output file below
- We recommend some basic and also a more advanced filtering
Usage
A command without any input will invoke triodenovo and display the following message
The following parameters are in effect: Input files : --ped [], --in_vcf [] Output files : --out_vcf [] Denovo mutation rate : --mu [1.0e-07] Scaled mutation rate : --theta [1.0e-03], --indel_theta [1.0e-04] Prior of de novo ts/tv ratio : --denovo_tstv [2.00] Non-autosome labels : --chrX [X] Filters : --minDQ [5.00], --minTotalDepth, --maxTotalDepth, --minDepth [5], --maxDepth, --mixed_vcf_records
- Example 1: using default parameters
triodenovo --ped trio.ped --in_vcf input.vcf --out denovo.vcf
- Example 2: using --minDQ 7 to output de novo calls which are a minimum DQ of 7.
triodenovo --ped trio.ped --in_vcf input.vcf --out denovo.vcf --minDQ 7
- Example 1: using --minDepth 10 to output de novo calls for which all three individuals (father, mother and child) have depth >=10.
triodenovo --ped trio.ped --in_vcf input.vcf --out denovo.vcf --minDepth 10
Input files
- A ped file, with 5 colums [see merlin documentation]. An example ped file is as follows
trio1 p1 0 0 1 trio1 p2 0 0 2 trio1 p3 p1 p2 1
- A VCF file [VCF specs]. It can contain variant information for more individuals than in the ped file.
Output
- The output file is specified via --out_vcf
An example use is as follows
##fileformat=VCFv4.1 ##triodenovo=../src/triodenovo --ped trio.denovo.ped --in_vcf trio.denovo.vcf --out_vcf trio.denovo.vcf.out ##Note=VCF file modified by polymutt. Updated fileds include: QUAL, GT and GQ, AF and AC. NOTE: modification was applied only to biallelic variants ##FILTER=<ID=LOWDP,Description="Low Depth filter when the average depth per sample is lessn than 1"> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth"> ##INFO=<ID=AF,Number=A,Type=Float,Description="Alternative Allele Frequency"> ##INFO=<ID=AC,Number=1,Type=Integer,Description="Alternative Allele Count"> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##FORMAT=<ID=DQ,Number=1,Type=Denovo Quality,Description="Denovo Quality: log10(BF) where BF is Bayes Factor calculated as L(M1)/L(M0) for M1 and M0 representing models with and without de novo mutations"> ##FORMAT=<ID=DGQ,Number=1,Type=Integer,Description="Denovo Genotype Quality: -10*log10(post) where post is the posterior probability of the called trio genotypes among all trios with de novo mutations"> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth"> ##FORMAT=<ID=PL,Number=3,Type=Integer,Description="Phred-scaled Genotype Likelihoods"> ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Log10 Genotype Likelihoods"> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 3 2 1 1 10 . C T 20.71 . DP=47 GT:DQ:DGQ:DP:PL C/C:7.52:100:15:0,44,255 C/C:7.52:100:16:0,44,255 C/T:7.52:100:16:100,0,199 1 13 . C T 117.18 . DP=34 GT:DQ:DGQ:DP:PL C/C:5.67:100:8:0,23,197 C/C:5.67:100:11:0,32,255 C/T:5.67:100:15:178,0,103 1 17 . C A 111.15 . DP=36 GT:DQ:DGQ:DP:PL C/C:5.37:100:12:0,35,255 C/C:5.37:100:10:0,26,222 A/C:5.37:100:14:181,0,82 1 82 . C G 118.40 . DP=43 GT:DQ:DGQ:DP:PL C/C:6.45:100:14:0,41,255 C/C:6.45:100:13:0,38,255 C/G:6.45:100:16:199,0,100 1 91 . C T 28.69 . DP=36 GT:DQ:DGQ:DP:PL C/C:6.02:100:9:0,26,222 C/C:6.02:100:17:0,50,255 C/T:6.02:100:10:93,0,93
Filtering
We recommend two filtering strategies. The first is a simple filtering and the second one is more advance
1. Basic filtering for SNV
less trio.vcf.out | egrep "DQ|#" | perl -lane 'print if /#/; next if length($F[3])>1 || length($F[4])>1 || $F[4]=~/,/; next if $F[5]<30; $F[9] =~ /([A-Z])\/([A-Z])/; next if $1 ne $2; next if $F[10] !~ /$1\/$1/; $F[11]=~/([A-Z])\/([A-Z])/; next if $1 eq $2; $F[11] =~ /\d+,(\d+),\d+/; next if $1 != 0; print' | less
2. Advanced filtering using a machine-learning approach (i.e. DNMFilter in the following webpage)
http://humangenome.duke.edu/software
Download
Source code of v0.04 download here. Pre-compiled binary version of 0.02 download here.
Contact
For questions please contact the authors (Bingshan Li: bingshan@umich.edu)