Difference between revisions of "Bayesdenovo"
From Genome Analysis Wiki
Latest revision as of 10:22, 26 October 2016
Update
v0.01 is available for download
Compilation
- After downloading the source code, unzip and untar it, cd into bayesdenovo, and type make.
- If you encounter errors related to deprecated syntax, try commenting out the following line in core/Makefile:
CXXFLAGS += -Werror -Wno-unused-variable -Wno-unused-result
Introduction
- The program bayesdenovo implements a Bayesian framework for calling de novo mutations in nuclear families (including trios, quartets, and families with more siblings) from next-generation sequencing data. It infers Identity-by-Descent (IBD) allele sharing to increase the accuracy of de novo mutation calling. As a result, the IBD sharing for the called de novo mutations is also available in the output file.
- It takes as input a standard VCF file with PL or GL fields (storing genotype likelihoods). Commonly used callers, e.g. GATK and samtools, generate VCF files with PL values.
- It calculates the likelihood of the model with de novo mutations, denoted L1, and the likelihood of Mendelian transmission, denoted L0, and represents the de novo evidence as a Bayes factor BF = L1/L0. As in TrioDeNovo, the de novo quality is represented as DQ = log10(BF) = log10(L1/L0).
- DQ is the main parameter controlling which calls are written to the output, along with the other options listed under Usage. See the example output file in the Output section.
- We recommend a basic filtering strategy and a more advanced one; see the Filtering section.
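The quantities in the bullets above can be illustrated with a minimal Python sketch. This is not the bayesdenovo implementation: the function names are hypothetical, the values of L1 and L0 are made up for illustration, and the PL conversion follows the usual VCF convention that PL is a Phred-scaled likelihood normalized so the best genotype has PL 0. The threshold of 5.0 is the default shown for --minDQ in the usage message.

```python
import math

def pl_to_relative_likelihoods(pl):
    """Convert Phred-scaled genotype likelihoods (PL) to likelihoods
    relative to the most likely genotype (which has PL = 0)."""
    return [10 ** (-p / 10) for p in pl]

def de_novo_quality(l1, l0):
    """DQ = log10(BF) = log10(L1 / L0), computed as a difference of logs."""
    return math.log10(l1) - math.log10(l0)

# PL values for one sample, in the order hom-ref, het, hom-alt.
rel = pl_to_relative_likelihoods([100, 0, 255])

# Hypothetical model likelihoods, chosen only for illustration.
dq = de_novo_quality(l1=1e-3, l0=1e-10)

# Compare against the default --minDQ value of 5.00.
passes = dq >= 5.0
print(rel[1], dq, passes)
```

Working with the difference of logarithms rather than the raw ratio avoids dividing very small numbers when L0 is tiny.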
Usage
Running the command without any arguments invokes bayesdenovo and displays the following message:
*** This build (v.0.1) was compiled on Oct 25 2016, 11:16:22 ***
pedfile : (-pname)
datfile : (-dname)
mapfile : (-mname)

Additional Options
Input : --in_vcf [], --submap [1.00]
Denovo mutation parameters : --tstv_ratio [2.00], --minDQ [5.00]
Multi-threading : --nthreads [1]
Output : --out_prefix []
- Example 1: using default parameters
bayesdenovo -p sim.vcf.ped -d sim.vcf.dat -m sim.vcf.map --in_vcf sim.vcf --out_prefix sim.vcf.denovo
- Example 2: using --minDQ 7 to output only de novo calls with a DQ of at least 7
bayesdenovo -p sim.vcf.ped -d sim.vcf.dat -m sim.vcf.map --in_vcf sim.vcf --out_prefix sim.vcf.denovo --minDQ 7
Input files
- A ped file with 5 columns: family ID, individual ID, father ID, mother ID, and sex [see merlin documentation]. An example ped file is as follows (note that you can mix trios with other nuclear families in the same VCF file):
quartet1 p1 0 0 1
quartet1 p2 0 0 2
quartet1 p3 p1 p2 1
quartet1 p4 p1 p2 1
nuc1 p5 0 0 1
nuc1 p6 0 0 2
nuc1 p7 p5 p6 1
nuc1 p8 p5 p6 1
nuc1 p9 p5 p6 1
trio1 p10 0 0 1
trio1 p11 0 0 2
trio1 p12 p10 p11 1
trio2 p13 0 0 1
trio2 p14 0 0 2
trio2 p15 p13 p14 1
- A VCF file [VCF specs]. It can contain variant information for more individuals than in the ped file.
- Note: In the VCF file either PL or GL has to be provided, and only the PL (or GL) field is used in the calling.
- A map file in the PLINK format. See below for examples of how to generate a map file with common, high-quality variants.
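Because family IDs and parent IDs in the ped file are easy to mistype, it can help to check the file before running the caller. The following is a minimal Python sketch, not part of bayesdenovo; the function names are hypothetical, and it assumes the 5-column layout (family, individual, father, mother, sex) with 0 marking a founder's missing parent.

```python
from collections import defaultdict

def parse_ped(text):
    """Parse 5-column ped records into {family: {individual: (father, mother, sex)}}."""
    families = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
        fam, ind, father, mother, sex = fields
        families[fam][ind] = (father, mother, sex)
    return families

def check_parents(families):
    """Return (family, individual, parent) for each non-founder parent ID
    that is not listed as an individual in the same family."""
    problems = []
    for fam, members in families.items():
        for ind, (father, mother, _sex) in members.items():
            for parent in (father, mother):
                if parent != "0" and parent not in members:
                    problems.append((fam, ind, parent))
    return problems

PED = """\
trio1 p10 0 0 1
trio1 p11 0 0 2
trio1 p12 p10 p11 1
"""

families = parse_ped(PED)
problems = check_parents(families)
print(problems)  # an empty list means every parent was found in its family
```

A misspelled family ID shows up here as a child whose parents cannot be found, since the child ends up alone in a family of its own.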
Examples of generating the map file
- vcf2map: generate a sparse map file (see Download for files genetic_map_GRCh37_chr1.txt and 1000G.SNV.clean.MAF0.05.tbl.gz)
vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --out_map chr1.map
- User-defined r2 cutoff for LD pruning and minimum average depth for filtering
vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --max_r2 0.2 --min_avg_dp 2 --out_map chr1.r0.2.map
Output
- The output is one file per family; the prefix of the file names is specified via --out_prefix.
An example output file is as follows:
##fileformat=VCFv4.1
##ProgramStart=Tue Oct 25 11:20:39 2016
##BayesDeNovo=../bin/bayesdenovo -p sim.vcf.ped -d sim.vcf.dat -m sim.vcf.map --in_vcf sim.vcf --out_prefix sim.vcf.denovo
##Note=VCF file modified by polymutt2. Updated fileds include: QUAL, GT and GQ, and AC. NOTE: modification was applied only to biallelic variants
##FILTER=<ID=LOWDP,Description="Low Depth filter when the average depth per sample is lessn than 1">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternative Allele Frequency">
##INFO=<ID=AC,Number=1,Type=Integer,Description="Alternative Allele Count">
##INFO=<ID=FDQ,Number=1,Type=Integer,Description="Family-wise De Novo Mutatoin Quality in log10(BF) format">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DQ,Number=1,Type=Integer,Description="De Novo Mutation Quality in log10(BF) format">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=IV,Number=2,Type=Integer,Description="Best path inheritance vector. Founder alleles are arbiturally labeled (1 to 2*nFounders) and L1|L2 for non-founders indicated L1 and L2 from founders are transmitted">
##FORMAT=<ID=PL,Number=3,Type=Integer,Description="Phred-scaled Genotype Likelihoods">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Rep1_1_1 Rep1_1_2 Rep1_1_3 Rep1_1_4
1 118 . C A 100 . AF=0.333333;AC=7;DP=1629;FDQ=8.667414 GT:DQ:DP:PL 0/0:.:202:1|2:0,100,255 0/0:.:234:3|4:0,100,255 0/1:8.97:203:1|3:100,0,255 0/0:.:206:2|4:0,100,255
1 858 . C A 100 . AF=0.333333;AC=10;DP=1592;FDQ=8.688325 GT:DQ:DP:PL 0/0:.:184:1|2:0,100,255 0/0:.:208:3|4:0,100,255 0/0:.:220:1|3:0,100,255 0/1:8.99:197:2|4:100,0,255
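To pull the per-sample de novo calls out of an output file like the one above, a short Python sketch along the following lines may be useful. It is an illustration rather than a supported tool: the function name is hypothetical, it assumes a tab-delimited file in which DQ is "." for samples without a call, and it looks up only the position of DQ in the FORMAT column, since the sample fields in the example carry an inheritance-vector value that the FORMAT string does not list.

```python
def parse_denovo_calls(vcf_text, min_dq=5.0):
    """Return (chrom, pos, sample, dq) for every sample whose DQ >= min_dq."""
    calls = []
    samples = []
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.split()
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, fmt = fields[0], int(fields[1]), fields[8]
        dq_index = fmt.split(":").index("DQ")
        for sample, value in zip(samples, fields[9:]):
            dq = value.split(":")[dq_index]
            if dq != "." and float(dq) >= min_dq:
                calls.append((chrom, pos, sample, float(dq)))
    return calls

# First record of the example output, rebuilt with tab separators.
EXAMPLE = "\n".join([
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
               "FORMAT", "Rep1_1_1", "Rep1_1_2", "Rep1_1_3", "Rep1_1_4"]),
    "\t".join(["1", "118", ".", "C", "A", "100", ".",
               "AF=0.333333;AC=7;DP=1629;FDQ=8.667414", "GT:DQ:DP:PL",
               "0/0:.:202:1|2:0,100,255", "0/0:.:234:3|4:0,100,255",
               "0/1:8.97:203:1|3:100,0,255", "0/0:.:206:2|4:0,100,255"]),
])

calls = parse_denovo_calls(EXAMPLE)
print(calls)
```

Raising min_dq applies a stricter cutoff after the fact, without rerunning the caller.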
Filtering
We recommend two filtering strategies: a simple one and a more advanced one. Please see the triodenovo page below for more information:
http://genome.sph.umich.edu/wiki/Triodenovo
Download
The source code of v0.01 can be downloaded here.
Contact
For questions please contact the authors (Bingshan Li: bingshan.li@vanderbilt.edu)