Changes

From Genome Analysis Wiki

Vt

11,534 bytes added, 16:48, 4 December 2015
Annotate Indels
</div>
</div>
 
=== Annotate Indels ===
 
Annotates indels with VNTR information and adds VNTR records to the output. This facilitates the simultaneous calling of VNTRs together with indels and SNPs.
 
<div class=" mw-collapsible mw-collapsed">
#annotate the indels in a VCF file with VNTR information
vt annotate_indels in.vcf -r hs37d5.fa -o annotated.sites.vcf
 
<div style="height:20em; overflow:auto; border: 2px solid #FFF">
CHROM POS ID REF ALT QUAL FILTER INFO
20 82079 . G A 1255.98 . NSAMPLES=1;E=43;N=51;ESUM=43;NSUM=51;FLANKSEQ=GGAGCACGCC[G/A]CCATGCCCGG
20 82217 . G A 1632.77 . NSAMPLES=1;E=56;N=61;ESUM=56;NSUM=61;FLANKSEQ=GAGCCACCGC[G/A]CCCGGCCCAG
20 83250 . CTGTGTGTG C . . NSAMPLES=1;E=18;N=35;ESUM=18;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT
20 83250 . CTGTGTGTGTG C . . NSAMPLES=1;E=3;N=35;ESUM=3;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT
20 83251 . TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG <VNTR> . . MOTIF=GT;RU=TG;FZ_CONCORDANCE=1;FZ_RL=52;FZ_LL=0;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FZ_RU_COUNTS=26,26;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG]TTTAGTATTT
20 83252 . G C 359.204 . NSAMPLES=1;E=13;N=14;ESUM=13;NSUM=14;FLANKSEQ=CTCTCTCTCT[G/C]TGTGTGTGTG
20 83260 . G C 500.163 . NSAMPLES=1;E=18;N=34;ESUM=18;NSUM=34;FLANKSEQ=CTGTGTGTGT[G/C]TGTGTGTGTG
20 83267 . T C 247.043 . NSAMPLES=1;E=11;N=43;ESUM=11;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT
20 83275 . T C 609.669 . NSAMPLES=1;E=24;N=43;ESUM=24;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT
20 90008 . C A 1546.88 . NSAMPLES=1;E=52;N=60;ESUM=52;NSUM=60;FLANKSEQ=AACAGAAAAC[C/A]AAATACTGTA
20 91088 . C T 1766.04 . NSAMPLES=1;E=58;N=66;ESUM=58;NSUM=66;FLANKSEQ=CCCAGCATAC[C/T]ATGGTTGTGC
20 91508 . G A 1266.93 . NSAMPLES=1;E=44;N=53;ESUM=44;NSUM=53;FLANKSEQ=AATTAGTAAG[G/A]CTTACGTAAG
20 91707 . C T 888.134 . NSAMPLES=1;E=30;N=53;ESUM=30;NSUM=53;FLANKSEQ=TGATTTTCTA[C/T]AGCAGGACCT
20 92527 . A G 828.593 . NSAMPLES=1;E=34;N=40;ESUM=34;NSUM=40;FLANKSEQ=ATTAATTGCC[A/G]TTCTCTCTTT
20 93440 . A G 688.144 . NSAMPLES=1;E=24;N=58;ESUM=24;NSUM=58;FLANKSEQ=TTGGATGCAT[A/G]GTCTGTAAAT
20 93636 . TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT <VNTR> . . MOTIF=T;RU=T;FZ_CONCORDANCE=0.939394;FZ_RL=35;FZ_LL=0;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FZ_RU_COUNTS=31,33;FLANKSEQ=TCTAGGATTC[TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT
20 93646 . C CT . . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FLANKSEQ=TTTTTCTTTC[TTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT;GMOTIF=T;TR=20:93636:TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT:<VNTR>:T
20 93717 . A T 31.7622 . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKSEQ=CAGTGGCGTG[A/T]TCTTAGATCA
20 93931 . G A 628.149 . NSAMPLES=1;E=22;N=53;ESUM=22;NSUM=53;FLANKSEQ=GATTACAGGT[G/A]TGAGCCGCTG
20 100699 . C T 809.09 . NSAMPLES=1;E=28;N=61;ESUM=28;NSUM=61;FLANKSEQ=GGTGAAAAAT[C/T]ACCTGTCAGT
20 101362 . G A 1087.13 . NSAMPLES=1;E=36;N=67;ESUM=36;NSUM=67;FLANKSEQ=TAATACTGAA[G/A]TTTACTTCTC
 
</div>
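The annotated indels above carry a <code>TR</code> tag pointing at the overlapping VNTR record, along with <code>FLANKS</code> and <code>FZ_FLANKS</code>. The following is a minimal Python sketch of how these INFO fields might be read downstream. The helper names (<code>parse_info</code>, <code>parse_flanks</code>, <code>parse_tr</code>) are hypothetical and not part of vt, and the five-field <code>TR</code> layout (chromosome, position, reference allele, alternate allele, motif) is inferred from the example output rather than from a specification.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_flanks(value):
    """Turn a FLANKS / FZ_FLANKS value such as '93646,93671' into two ints."""
    left, right = value.split(",")
    return int(left), int(right)


def parse_tr(tr):
    """Split a TR tag into its parts.

    Assumed layout (inferred from the example output, not a spec):
    chrom:pos:ref:alt[:motif]. The motif field is treated as optional
    because the usage text shows a four-field example.
    """
    parts = tr.split(":")
    if len(parts) not in (4, 5):
        raise ValueError("unexpected TR value: " + tr)
    return {
        "chrom": parts[0],
        "pos": int(parts[1]),
        "ref": parts[2],
        "alt": parts[3],
        "motif": parts[4] if len(parts) == 5 else None,
    }


if __name__ == "__main__":
    # INFO column abridged from the 20:93646 C/CT record shown above.
    info = parse_info(
        "NSAMPLES=1;FLANKS=93646,93671;FZ_FLANKS=93635,93671;GMOTIF=T;"
        "TR=20:93636:TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT:<VNTR>:T"
    )
    tr = parse_tr(info["TR"])
    print(tr["chrom"], tr["pos"], tr["motif"], parse_flanks(info["FZ_FLANKS"]))
```

Matching <code>chrom</code> and <code>pos</code> from the parsed <code>TR</code> value against the <code>&lt;VNTR&gt;</code> records is one way to group indels by the repeat tract they fall in.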
 
The following trace shows how the algorithm annotates a single indel, 20:131948:C/CCA:
 
============================================
ANNOTATING INDEL FUZZILY
********************************************
EXTRACTIING REGION BY EXACT LEFT AND RIGHT ALIGNMENT
20:131948:C/CCA
EXACT REGION 131948-131965 (18)
CCACACACACACACACAA
FINAL EXACT REGION 131948-131965 (18)
CCACACACACACACACAA
********************************************
PICK CANDIDATE MOTIFS
Longest Allele : C[CA]CACACACACACACACAA
detecting motifs for an str
seq: CCACACACACACACACACAA
len : 20
cmax_len : 10
candidate motifs: 25
AC : 0.894737 2 0
AAC : 0.5 3 0.0555556
ACC : 0.5 3 0.0555556
AAAC : 0.0588235 4 0.125 (< 2 copies)
ACCC : 0.0588235 4 0.125 (< 2 copies)
AACAC : 0.5 5 0.02
ACACC : 0.5 5 0.02
AAACAC : 0.0666667 6 0.0555556 (< 2 copies)
ACACCC : 0.0666667 6 0.0555556 (< 2 copies)
AACACAC : 0.5 7 0.0102041
ACACACC : 0.5 7 0.0102041
AAACACAC : 0.0769231 8 0.03125 (< 2 copies)
ACACACCC : 0.0769231 8 0.03125 (< 2 copies)
AACACACAC : 0.5 9 0.00617284 (< 2 copies)
ACACACACC : 0.5 9 0.00617284 (< 2 copies)
AAACACACAC : 0.0909091 10 0.02 (< 2 copies)
ACACACACCC : 0.0909091 10 0.02 (< 2 copies)
********************************************
PICKING NEXT BEST MOTIF
selected: AC 0.89 0.00
********************************************
DETECTING REPEAT TRACT FUZZILY
++++++++++++++++++++++++++++++++++++++++++++
Exact left/right alignment
repeat_tract : CACACACACACACACA
position : [131949,131964]
motif_concordance : 1
repeat units : 8
exact repeat units : 8
total no. of repeat units : 8
++++++++++++++++++++++++++++++++++++++++++++
Fuzzy right alignment
repeat motif : CA
rflank : AACTC
mlen : 2
rflen : 5
plen : 111
read : AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACACCACACACACACACACAAACTC
rlen : 106
optimal score: 50.5073
optimal state: MR
optimal track: MR|r|0|5
optimal probe len: 25
optimal path length : 107
max j: 106
probe: (1~82) [1~10] (1~5)
read : (1~82) [83~101] (102~106)
motif # : 10 [83,101]
motif concordance : 95% (9/10)
motif discordance : 0|1|0|0|0|0|0|0|0|0
Model: ----------------------------------------------------------------------------------CACACACACACACACACACAAACTC
SYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYMMMDMMMMMMMMMMMMMMMMMMMMME
oo++oo++oo++oo++oo++RRRRR
Read: AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACAC-CACACACACACACACAAACTC
++++++++++++++++++++++++++++++++++++++++++++
Fuzzy left alignment
lflank : ATCTTA
repeat motif : CA
lflen : 6
mlen : 2
plen : 111
read : ATCTTACACCACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT
rlen : 105
optimal score: 50.5858
optimal state: Z
optimal track: Z|m|10|2
optimal probe len: 26
optimal path length : 106
max j: 105
mismatch penalty: 3
model: (1~6) [1~10]
read : (1~6) [7~25][26~106]
motif # : 10 [7,25]
motif concordance : 95% (9/10)
motif discordance : 0|1|0|0|0|0|0|0|0|0
Model: ATCTTACACACACACACACACACACA--------------------------------------------------------------------------------
SMMMMMMMMMDMMMMMMMMMMMMMMMMZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZE
LLLLLLoo++oo++oo++oo++oo++
Read: ATCTTACAC-CACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
VNTR Summary
rid : 19
motif : AC
ru : CA
Exact
repeat_tract : CACACACACACACACA
position : [131949,131964]
reference repeat unit length : 8
motif_concordance : 1
repeat units : 8
exact repeat units : 8
total no. of repeat units : 8
Fuzzy
repeat_tract : CACCACACACACACACACA
position : [131946,131964]
reference repeat unit length : 19
motif_concordance : 0.95
repeat units : 19
exact repeat units : 9
total no. of repeat units : 10
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
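Two quantities in the trace above can be reproduced with a short Python sketch: the canonical motif (<code>AC</code> for the repeat unit <code>CA</code>) and the motif concordance (0.95 for the fuzzy tract). Both functions are illustrative assumptions, not vt's implementation. The canonical form is taken to be the lexicographically smallest rotation of the repeat unit, which agrees with the examples shown (<code>CA</code> to <code>AC</code>, <code>TG</code> to <code>GT</code>). The concordance is taken to be the mean per-unit concordance, where a unit with <code>d</code> discordant bases out of <code>mlen</code> scores <code>1 - d/mlen</code>; this is inferred from the reported numbers (9 exact units out of 10 plus one unit with a single discordant base gives 0.95).

```python
def canonical_motif(ru):
    """Return the lexicographically smallest rotation of a repeat unit.

    Assumption: this mirrors how MOTIF relates to RU in the examples
    (CA -> AC, TG -> GT); it is not taken from the vt source code.
    """
    rotations = [ru[i:] + ru[:i] for i in range(len(ru))]
    return min(rotations)


def motif_concordance(discordance, mlen):
    """Mean per-unit concordance from a '0|1|0' style discordance string.

    Each entry is the number of discordant bases in one repeat unit.
    A unit scores 1 - d/mlen, floored at 0 (assumed formula).
    """
    counts = [int(x) for x in discordance.split("|")]
    per_unit = [1.0 - min(c, mlen) / mlen for c in counts]
    return sum(per_unit) / len(per_unit)


if __name__ == "__main__":
    print(canonical_motif("CA"))
    # Discordance string and motif length taken from the fuzzy alignment trace.
    print(motif_concordance("0|1|0|0|0|0|0|0|0|0", 2))
```

Under the same assumed formula, a homopolymer tract with 31 exact units out of 33 scores 31/33, which rounds to the <code>FZ_CONCORDANCE=0.939394</code> reported alongside <code>FZ_RU_COUNTS=31,33</code> in the example output.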
 
<div class="mw-collapsible-content">
 usage : vt annotate_indels [options] <in.vcf>
 
 options : -v  add vntr record [false]
           -x  override tags [false]
           -f  filter expression []
           -d  debug [false]
           -m  mode [f]
               e : by exact alignment
               f : by fuzzy alignment
           -c  classification schemas of tandem repeat [6]
               1 : lai2003
               2 : kelkar2008
               3 : fondon2012
               4 : ananda2013
               5 : willems2014
               6 : tan_kang2015
           -a  annotation type [v]
               v : a. output VNTR variant (defined by classification).
                      RU                   repeat unit on reference sequence (CA)
                      MOTIF                canonical representation (AC)
                      RL                   repeat tract length in bases (11)
                      FLANKS               flanking positions of repeat tract determined by exact alignment
                      RU_COUNTS            number of exact repeat units and total number of repeat units in
                                           repeat tract determined by exact alignment
                      FZ_RL                fuzzy repeat tract length in bases (11)
                      FZ_FLANKS            flanking positions of repeat tract determined by fuzzy alignment
                      FZ_RU_COUNTS         number of exact repeat units and total number of repeat units in
                                           repeat tract determined by fuzzy alignment
                      FLANKSEQ             flanking sequence of indel
                      LARGE_REPEAT_REGION  repeat region exceeding 2000bp
                   b. mark indels with overlapping VNTR.
                      FLANKS               flanking positions of repeat tract determined by exact alignment
                      FZ_FLANKS            flanking positions of repeat tract determined by fuzzy alignment
                      GMOTIF               generating motif used in fuzzy alignment
                      TR                   position and alleles of VNTR (20:23413:CACACACACAC:<VNTR>)
               a : annotate each indel with RU, RL, MOTIF, REF.
           -r  reference sequence fasta file []
           -o  output VCF file [-]
           -I  file containing list of intervals []
           -i  intervals
           -?  displays help
</div>
</div>
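When the command is run from a script, the options listed in the usage text can be assembled programmatically. The sketch below builds the argument list for <code>vt annotate_indels</code> using only the <code>-r</code>, <code>-o</code> and <code>-m</code> options documented above. The function name <code>build_annotate_indels_cmd</code> is hypothetical, and the sketch assumes that <code>vt</code> is on the <code>PATH</code> and that <code>-m</code> takes its value as a separate argument.

```python
import shlex


def build_annotate_indels_cmd(in_vcf, ref_fasta, out_vcf="-", mode="f"):
    """Build the argument list for a vt annotate_indels call.

    Defaults follow the usage text: output to '-' and fuzzy mode 'f'.
    This helper is illustrative and not part of vt.
    """
    if mode not in ("e", "f"):
        raise ValueError("mode must be 'e' (exact) or 'f' (fuzzy)")
    return [
        "vt", "annotate_indels", in_vcf,
        "-r", ref_fasta,
        "-o", out_vcf,
        "-m", mode,
    ]


if __name__ == "__main__":
    cmd = build_annotate_indels_cmd(
        "in.vcf", "hs37d5.fa", out_vcf="annotated.sites.vcf", mode="e"
    )
    # Print the command for inspection; it could then be passed to
    # subprocess.run(cmd, check=True) on a system where vt is installed.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.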
 
 
 