Line 1,152: |
| </div> | | </div> |
| </div> | | </div> |
| + | |
| + | === Annotate Indels === |
| + | |
| + | Annotates indels with VNTR information and adds a VNTR record. This allows VNTRs to be called together with indels and SNPs. |
| + | |
| + | <div class=" mw-collapsible mw-collapsed"> |
| + | #annotates indels from VCFs with VNTR information. |
| + | vt annotate_indels in.vcf -r hs37d5.fa -o annotated.sites.vcf |
| + | |
| + | <div style="height:20em; overflow:auto; border: 2px solid #FFF"> |
| + | CHROM POS ID REF ALT QUAL FILTER INFO |
| + | 20 82079 . G A 1255.98 . NSAMPLES=1;E=43;N=51;ESUM=43;NSUM=51;FLANKSEQ=GGAGCACGCC[G/A]CCATGCCCGG |
| + | 20 82217 . G A 1632.77 . NSAMPLES=1;E=56;N=61;ESUM=56;NSUM=61;FLANKSEQ=GAGCCACCGC[G/A]CCCGGCCCAG |
| + | 20 83250 . CTGTGTGTG C . . NSAMPLES=1;E=18;N=35;ESUM=18;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT |
| + | 20 83250 . CTGTGTGTGTG C . . NSAMPLES=1;E=3;N=35;ESUM=3;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT |
| + | 20 83251 . TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG <VNTR> . . MOTIF=GT;RU=TG;FZ_CONCORDANCE=1;FZ_RL=52;FZ_LL=0;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FZ_RU_COUNTS=26,26;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG]TTTAGTATTT |
| + | 20 83252 . G C 359.204 . NSAMPLES=1;E=13;N=14;ESUM=13;NSUM=14;FLANKSEQ=CTCTCTCTCT[G/C]TGTGTGTGTG |
| + | 20 83260 . G C 500.163 . NSAMPLES=1;E=18;N=34;ESUM=18;NSUM=34;FLANKSEQ=CTGTGTGTGT[G/C]TGTGTGTGTG |
| + | 20 83267 . T C 247.043 . NSAMPLES=1;E=11;N=43;ESUM=11;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT |
| + | 20 83275 . T C 609.669 . NSAMPLES=1;E=24;N=43;ESUM=24;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT |
| + | 20 90008 . C A 1546.88 . NSAMPLES=1;E=52;N=60;ESUM=52;NSUM=60;FLANKSEQ=AACAGAAAAC[C/A]AAATACTGTA |
| + | 20 91088 . C T 1766.04 . NSAMPLES=1;E=58;N=66;ESUM=58;NSUM=66;FLANKSEQ=CCCAGCATAC[C/T]ATGGTTGTGC |
| + | 20 91508 . G A 1266.93 . NSAMPLES=1;E=44;N=53;ESUM=44;NSUM=53;FLANKSEQ=AATTAGTAAG[G/A]CTTACGTAAG |
| + | 20 91707 . C T 888.134 . NSAMPLES=1;E=30;N=53;ESUM=30;NSUM=53;FLANKSEQ=TGATTTTCTA[C/T]AGCAGGACCT |
| + | 20 92527 . A G 828.593 . NSAMPLES=1;E=34;N=40;ESUM=34;NSUM=40;FLANKSEQ=ATTAATTGCC[A/G]TTCTCTCTTT |
| + | 20 93440 . A G 688.144 . NSAMPLES=1;E=24;N=58;ESUM=24;NSUM=58;FLANKSEQ=TTGGATGCAT[A/G]GTCTGTAAAT |
| + | 20 93636 . TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT <VNTR> . . MOTIF=T;RU=T;FZ_CONCORDANCE=0.939394;FZ_RL=35;FZ_LL=0;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FZ_RU_COUNTS=31,33;FLANKSEQ=TCTAGGATTC[TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT |
| + | 20 93646 . C CT . . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FLANKSEQ=TTTTTCTTTC[TTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT;GMOTIF=T;TR=20:93636:TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT:<VNTR>:T |
| + | 20 93717 . A T 31.7622 . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKSEQ=CAGTGGCGTG[A/T]TCTTAGATCA |
| + | 20 93931 . G A 628.149 . NSAMPLES=1;E=22;N=53;ESUM=22;NSUM=53;FLANKSEQ=GATTACAGGT[G/A]TGAGCCGCTG |
| + | 20 100699 . C T 809.09 . NSAMPLES=1;E=28;N=61;ESUM=28;NSUM=61;FLANKSEQ=GGTGAAAAAT[C/T]ACCTGTCAGT |
| + | 20 101362 . G A 1087.13 . NSAMPLES=1;E=36;N=67;ESUM=36;NSUM=67;FLANKSEQ=TAATACTGAA[G/A]TTTACTTCTC |
| + | |
| + | </div> |
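The annotated records above can be linked back to their VNTR record through the TR tag. The following is a minimal sketch, not part of vt: the helper names `parse_info` and `parse_tr` are hypothetical, and the five-field TR layout (chromosome, position, reference allele, alternate allele, motif) is inferred from the sample output rather than from a documented specification.

```python
from typing import Dict, NamedTuple, Optional


class TandemRepeatRef(NamedTuple):
    """Fields of a TR tag as they appear in the sample output (assumed layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    motif: Optional[str]


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value mapping (flags map to '')."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def parse_tr(tr: str) -> TandemRepeatRef:
    """Parse 'chrom:pos:ref:alt[:motif]'; the motif field is treated as optional."""
    parts = tr.split(":")
    chrom, pos, ref, alt = parts[:4]
    motif = parts[4] if len(parts) > 4 else None
    return TandemRepeatRef(chrom, int(pos), ref, alt, motif)


# INFO column of the indel at 20:93646 (shortened to the tags used here).
indel_info = parse_info(
    "NSAMPLES=1;FLANKS=93646,93671;FZ_FLANKS=93635,93671;GMOTIF=T;"
    "TR=20:93636:TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT:<VNTR>:T"
)
tr = parse_tr(indel_info["TR"])
print(tr.chrom, tr.pos, tr.alt, tr.motif)

# In the sample VNTR records, FZ_CONCORDANCE matches the ratio of the two
# FZ_RU_COUNTS values. This is an observation from the output above, not a
# documented definition.
vntr_info = parse_info("MOTIF=T;RU=T;FZ_CONCORDANCE=0.939394;FZ_RU_COUNTS=31,33")
exact_units, total_units = (int(x) for x in vntr_info["FZ_RU_COUNTS"].split(","))
print(round(exact_units / total_units, 6))
```

The `(tr.chrom, tr.pos)` pair can then be used as a key to look up the matching `<VNTR>` record in the same file.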
| + | |
| + | The following trace shows how the algorithm annotates the indel 20:131948:C/CCA: |
| + | |
| + | ============================================ |
| + | ANNOTATING INDEL FUZZILY |
| + | ******************************************** |
| + | EXTRACTIING REGION BY EXACT LEFT AND RIGHT ALIGNMENT |
| + | |
| + | 20:131948:C/CCA |
| + | EXACT REGION 131948-131965 (18) |
| + | CCACACACACACACACAA |
| + | FINAL EXACT REGION 131948-131965 (18) |
| + | CCACACACACACACACAA |
| + | ******************************************** |
| + | PICK CANDIDATE MOTIFS |
| + | |
| + | Longest Allele : C[CA]CACACACACACACACAA |
| + | detecting motifs for an str |
| + | seq: CCACACACACACACACACAA |
| + | len : 20 |
| + | cmax_len : 10 |
| + | candidate motifs: 25 |
| + | AC : 0.894737 2 0 |
| + | AAC : 0.5 3 0.0555556 |
| + | ACC : 0.5 3 0.0555556 |
| + | AAAC : 0.0588235 4 0.125 (< 2 copies) |
| + | ACCC : 0.0588235 4 0.125 (< 2 copies) |
| + | AACAC : 0.5 5 0.02 |
| + | ACACC : 0.5 5 0.02 |
| + | AAACAC : 0.0666667 6 0.0555556 (< 2 copies) |
| + | ACACCC : 0.0666667 6 0.0555556 (< 2 copies) |
| + | AACACAC : 0.5 7 0.0102041 |
| + | ACACACC : 0.5 7 0.0102041 |
| + | AAACACAC : 0.0769231 8 0.03125 (< 2 copies) |
| + | ACACACCC : 0.0769231 8 0.03125 (< 2 copies) |
| + | AACACACAC : 0.5 9 0.00617284 (< 2 copies) |
| + | ACACACACC : 0.5 9 0.00617284 (< 2 copies) |
| + | AAACACACAC : 0.0909091 10 0.02 (< 2 copies) |
| + | ACACACACCC : 0.0909091 10 0.02 (< 2 copies) |
| + | ******************************************** |
| + | PICKING NEXT BEST MOTIF |
| + | |
| + | selected: AC 0.89 0.00 |
| + | ******************************************** |
| + | DETECTING REPEAT TRACT FUZZILY |
| + | ++++++++++++++++++++++++++++++++++++++++++++ |
| + | Exact left/right alignment |
| + | |
| + | repeat_tract : CACACACACACACACA |
| + | position : [131949,131964] |
| + | motif_concordance : 1 |
| + | repeat units : 8 |
| + | exact repeat units : 8 |
| + | total no. of repeat units : 8 |
| + | |
| + | ++++++++++++++++++++++++++++++++++++++++++++ |
| + | Fuzzy right alignment |
| + | |
| + | repeat motif : CA |
| + | rflank : AACTC |
| + | mlen : 2 |
| + | rflen : 5 |
| + | plen : 111 |
| + | |
| + | read : AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACACCACACACACACACACAAACTC |
| + | rlen : 106 |
| + | |
| + | optimal score: 50.5073 |
| + | optimal state: MR |
| + | optimal track: MR|r|0|5 |
| + | optimal probe len: 25 |
| + | optimal path length : 107 |
| + | max j: 106 |
| + | probe: (1~82) [1~10] (1~5) |
| + | read : (1~82) [83~101] (102~106) |
| + | |
| + | motif # : 10 [83,101] |
| + | motif concordance : 95% (9/10) |
| + | motif discordance : 0|1|0|0|0|0|0|0|0|0 |
| + | |
| + | Model: ----------------------------------------------------------------------------------CACACACACACACACACACAAACTC |
| + | SYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYMMMDMMMMMMMMMMMMMMMMMMMMME |
| + | oo++oo++oo++oo++oo++RRRRR |
| + | Read: AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACAC-CACACACACACACACAAACTC |
| + | |
| + | ++++++++++++++++++++++++++++++++++++++++++++ |
| + | Fuzzy left alignment |
| + | |
| + | lflank : ATCTTA |
| + | repeat motif : CA |
| + | lflen : 6 |
| + | mlen : 2 |
| + | plen : 111 |
| + | |
| + | read : ATCTTACACCACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT |
| + | rlen : 105 |
| + | |
| + | optimal score: 50.5858 |
| + | optimal state: Z |
| + | optimal track: Z|m|10|2 |
| + | optimal probe len: 26 |
| + | optimal path length : 106 |
| + | max j: 105 |
| + | mismatch penalty: 3 |
| + | |
| + | model: (1~6) [1~10] |
| + | read : (1~6) [7~25][26~106] |
| + | |
| + | motif # : 10 [7,25] |
| + | motif concordance : 95% (9/10) |
| + | motif discordance : 0|1|0|0|0|0|0|0|0|0 |
| + | |
| + | Model: ATCTTACACACACACACACACACACA-------------------------------------------------------------------------------- |
| + | SMMMMMMMMMDMMMMMMMMMMMMMMMMZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZE |
| + | LLLLLLoo++oo++oo++oo++oo++ |
| + | Read: ATCTTACAC-CACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT |
| + | |
| + | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
| + | VNTR Summary |
| + | rid : 19 |
| + | motif : AC |
| + | ru : CA |
| + | |
| + | Exact |
| + | repeat_tract : CACACACACACACACA |
| + | position : [131949,131964] |
| + | reference repeat unit length : 8 |
| + | motif_concordance : 1 |
| + | repeat units : 8 |
| + | exact repeat units : 8 |
| + | total no. of repeat units : 8 |
| + | |
| + | Fuzzy |
| + | repeat_tract : CACCACACACACACACACA |
| + | position : [131946,131964] |
| + | reference repeat unit length : 19 |
| + | motif_concordance : 0.95 |
| + | repeat units : 19 |
| + | exact repeat units : 9 |
| + | total no. of repeat units : 10 |
| + | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
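The exact-alignment part of the trace can be reproduced with a short sketch. This is an illustration under stated assumptions, not vt's implementation: the function names are hypothetical, the reference window is rebuilt from the flanks and fuzzy tract printed in the trace and is assumed to start at 20:131940, and the canonical motif is taken to be the lexicographically smallest rotation of the repeat unit, which agrees with the RU/MOTIF pairs shown on this page (CA/AC, TG/GT). The fuzzy alignment step is not covered.

```python
from typing import Tuple


def shiftable_span(ref: str, p: int, ins: str) -> Tuple[int, int]:
    """Return the leftmost and rightmost 0-based insertion points for `ins`.

    An insertion point `p` means the inserted bases sit immediately before
    ref[p]. Shifting by one base rotates the inserted sequence by one base.
    """
    left, seq = p, ins
    while left > 0 and ref[left - 1] == seq[-1]:
        seq = seq[-1] + seq[:-1]  # rotate right while moving left
        left -= 1
    right, seq = p, ins
    while right < len(ref) and ref[right] == seq[0]:
        seq = seq[1:] + seq[0]  # rotate left while moving right
        right += 1
    return left, right


def canonical_motif(unit: str) -> str:
    """Smallest rotation of a repeat unit (assumed canonical form)."""
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def count_exact_units(tract: str, unit: str) -> int:
    """Count non-overlapping, in-frame copies of `unit` in `tract`."""
    n = len(unit)
    return sum(tract[i:i + n] == unit for i in range(0, len(tract) - n + 1, n))


# Left flank + fuzzy repeat tract + right flank, as printed in the trace.
window_start = 131940  # assumed 1-based position of the first base
ref = "ATCTTA" + "CAC" + "CA" * 8 + "AACTC"

# 20:131948:C/CCA inserts CA after position 131948, i.e. before index 9.
left, right = shiftable_span(ref, 131948 - window_start + 1, "CA")
tract = ref[left:right]
print(tract)                                           # CACACACACACACACA
print(window_start + left, window_start + right - 1)   # 131949 131964
print(canonical_motif("CA"))                           # AC
print(count_exact_units(tract, "CA"))                  # 8
```

The span over which the insertion can be shifted coincides with the exact repeat tract reported in the VNTR summary; the exact region in the trace additionally includes one anchor base on each side.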
| + | |
| + | <div class="mw-collapsible-content"> |
| + | usage : vt annotate_indels [options] <in.vcf> |
| + | |
| + | options : -v add vntr record [false] |
| + | -x override tags [false] |
| + | -f filter expression [] |
| + | -d debug [false] |
| + | -m mode [f] |
| + | e : by exact alignment |
| + | f : by fuzzy alignment |
| + | -c classification schemas of tandem repeat [6] |
| + | 1 : lai2003 |
| + | 2 : kelkar2008 |
| + | 3 : fondon2012 |
| + | 4 : ananda2013 |
| + | 5 : willems2014 |
| + | 6 : tan_kang2015 |
| + | -a annotation type [v] |
| + | v : a. output VNTR variant (defined by classification). |
| + | RU repeat unit on reference sequence (CA) |
| + | MOTIF canonical representation (AC) |
| + | RL repeat tract length in bases (11) |
| + | FLANKS flanking positions of repeat tract determined by exact alignment |
| + | RU_COUNTS number of exact repeat units and total number of repeat units in |
| + | repeat tract determined by exact alignment |
| + | FZ_RL fuzzy repeat tract length in bases (11) |
| + | FZ_FLANKS flanking positions of repeat tract determined by fuzzy alignment |
| + | FZ_RU_COUNTS number of exact repeat units and total number of repeat units in |
| + | repeat tract determined by fuzzy alignment |
| + | FLANKSEQ flanking sequence of indel |
| + | LARGE_REPEAT_REGION repeat region exceeding 2000bp |
| + | b. mark indels with overlapping VNTR. |
| + | FLANKS flanking positions of repeat tract determined by exact alignment |
| + | FZ_FLANKS flanking positions of repeat tract determined by fuzzy alignment |
| + | GMOTIF generating motif used in fuzzy alignment |
| + | TR position and alleles of VNTR (20:23413:CACACACACAC:<VNTR>) |
| + | a : annotate each indel with RU, RL, MOTIF, REF. |
| + | -r reference sequence fasta file [] |
| + | -o output VCF file [-] |
| + | -I file containing list of intervals [] |
| + | -i intervals |
| + | -? displays help |
| + | </div> |
| + | </div> |
| + | |
| + | |
| + | |
| | | |
| === Annotate Indels === | | === Annotate Indels === |