Line 3: |
Line 3: |
| Tandem repeats are a common polymorphism in the genome. | | Tandem repeats are a common polymorphism in the genome. |
| | | |
− | This wiki talks about its representation - and compiles from previous work -especially by Gary Benson's tandem repeat finder's output. | + | This wiki attempts to consolidate past work by many people. |
| + | |
| + | We discuss the representation of tandem repeats, their major characteristics, and a set of useful definitions and algorithms for working with them. |
| + | |
| + | * Gary Benson's Tandem Repeats Finder |
| + | * Gymrek et al. |
| + | * Highnam et al. |
| | | |
| = Definition = | | = Definition = |
Line 129: |
Line 135: |
| GAAAGAAGGAAAAGAGAGAAAAGAAGAAGAA | | GAAAGAAGGAAAAGAGAGAAAAGAAGAAGAA |
| | | |
− | = Classification = | + | = Characteristics = |
− | | + | |
| * motif length | | * motif length |
| * motif basis | | * motif basis |
− | * repeat tract lengfth | + | * repeat tract length |
| * purity | | * purity |
| + | |
| + | Open questions: |
| + | |
| + | Is AAAAC really different from AAAAAC? |
| + | |
| + | |
| | | |
| | | |
Line 145: |
Line 157: |
| == Detection of a motif in a sequence == | | == Detection of a motif in a sequence == |
| | | |
| + | The following trace shows how the algorithm works on an example insertion (20:131948:C/CCA). |
| | | |
| + | ============================================ |
| + | ANNOTATING INDEL FUZZILY |
| + | ******************************************** |
| + | EXTRACTIING REGION BY EXACT LEFT AND RIGHT ALIGNMENT |
| + | |
| + | 20:131948:C/CCA |
| + | EXACT REGION 131948-131965 (18) |
| + | CCACACACACACACACAA |
| + | FINAL EXACT REGION 131948-131965 (18) |
| + | CCACACACACACACACAA |
| + | ******************************************** |
| + | PICK CANDIDATE MOTIFS |
| + | |
| + | Longest Allele : C[CA]CACACACACACACACAA |
| + | detecting motifs for an str |
| + | seq: CCACACACACACACACACAA |
| + | len : 20 |
| + | cmax_len : 10 |
| + | candidate motifs: 25 |
| + | AC : 0.894737 2 0 |
| + | AAC : 0.5 3 0.0555556 |
| + | ACC : 0.5 3 0.0555556 |
| + | AAAC : 0.0588235 4 0.125 (< 2 copies) |
| + | ACCC : 0.0588235 4 0.125 (< 2 copies) |
| + | AACAC : 0.5 5 0.02 |
| + | ACACC : 0.5 5 0.02 |
| + | AAACAC : 0.0666667 6 0.0555556 (< 2 copies) |
| + | ACACCC : 0.0666667 6 0.0555556 (< 2 copies) |
| + | AACACAC : 0.5 7 0.0102041 |
| + | ACACACC : 0.5 7 0.0102041 |
| + | AAACACAC : 0.0769231 8 0.03125 (< 2 copies) |
| + | ACACACCC : 0.0769231 8 0.03125 (< 2 copies) |
| + | AACACACAC : 0.5 9 0.00617284 (< 2 copies) |
| + | ACACACACC : 0.5 9 0.00617284 (< 2 copies) |
| + | AAACACACAC : 0.0909091 10 0.02 (< 2 copies) |
| + | ACACACACCC : 0.0909091 10 0.02 (< 2 copies) |
| + | ******************************************** |
| + | PICKING NEXT BEST MOTIF |
| + | |
| + | selected: AC 0.89 0.00 |
| + | ******************************************** |
| + | DETECTING REPEAT TRACT FUZZILY |
| + | ++++++++++++++++++++++++++++++++++++++++++++ |
| + | Exact left/right alignment |
| + | |
| + | repeat_tract : CACACACACACACACA |
| + | position : [131949,131964] |
| + | motif_concordance : 1 |
| + | repeat units : 8 |
| + | exact repeat units : 8 |
| + | total no. of repeat units : 8 |
| + | |
| + | ++++++++++++++++++++++++++++++++++++++++++++ |
| + | Fuzzy right alignment |
| + | |
| + | repeat motif : CA |
| + | rflank : AACTC |
| + | mlen : 2 |
| + | rflen : 5 |
| + | plen : 111 |
| + | |
| + | read : AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACACCACACACACACACACAAACTC |
| + | rlen : 106 |
| + | |
| + | optimal score: 50.5073 |
| + | optimal state: MR |
| + | optimal track: MR|r|0|5 |
| + | optimal probe len: 25 |
| + | optimal path length : 107 |
| + | max j: 106 |
| + | probe: (1~82) [1~10] (1~5) |
| + | read : (1~82) [83~101] (102~106) |
| + | |
| + | motif # : 10 [83,101] |
| + | motif concordance : 95% (9/10) |
| + | motif discordance : 0|1|0|0|0|0|0|0|0|0 |
| + | |
| + | Model: ----------------------------------------------------------------------------------CACACACACACACACACACAAACTC |
| + | SYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYMMMDMMMMMMMMMMMMMMMMMMMMME |
| + | oo++oo++oo++oo++oo++RRRRR |
| + | Read: AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACAC-CACACACACACACACAAACTC |
| + | |
| + | ++++++++++++++++++++++++++++++++++++++++++++ |
| + | Fuzzy left alignment |
| + | |
| + | lflank : ATCTTA |
| + | repeat motif : CA |
| + | lflen : 6 |
| + | mlen : 2 |
| + | plen : 111 |
| + | |
| + | read : ATCTTACACCACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT |
| + | rlen : 105 |
| + | |
| + | optimal score: 50.5858 |
| + | optimal state: Z |
| + | optimal track: Z|m|10|2 |
| + | optimal probe len: 26 |
| + | optimal path length : 106 |
| + | max j: 105 |
| + | mismatch penalty: 3 |
| + | |
| + | model: (1~6) [1~10] |
| + | read : (1~6) [7~25][26~106] |
| + | |
| + | motif # : 10 [7,25] |
| + | motif concordance : 95% (9/10) |
| + | motif discordance : 0|1|0|0|0|0|0|0|0|0 |
| + | |
| + | Model: ATCTTACACACACACACACACACACA-------------------------------------------------------------------------------- |
| + | SMMMMMMMMMDMMMMMMMMMMMMMMMMZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZE |
| + | LLLLLLoo++oo++oo++oo++oo++ |
| + | Read: ATCTTACAC-CACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT |
| + | |
| + | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
| + | VNTR Summary |
| + | rid : 19 |
| + | motif : AC |
| + | ru : CA |
| + | |
| + | Exact |
| + | repeat_tract : CACACACACACACACA |
| + | position : [131949,131964] |
| + | reference repeat unit length : 8 |
| + | motif_concordance : 1 |
| + | repeat units : 8 |
| + | exact repeat units : 8 |
| + | total no. of repeat units : 8 |
| + | |
| + | Fuzzy |
| + | repeat_tract : CACCACACACACACACACA |
| + | position : [131946,131964] |
| + | reference repeat unit length : 19 |
| + | motif_concordance : 0.95 |
| + | repeat units : 19 |
| + | exact repeat units : 9 |
| + | total no. of repeat units : 10 |
| + | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
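The first number printed next to each candidate motif in the trace above appears to be the fraction of k-mers in the sequence that are a rotation of that motif. This is inferred from the printed values (for example 17/19 = 0.894737 for AC) and is not a documented definition. A minimal sketch under that assumption, with hypothetical function names:

```python
def rotations(motif):
    """Return the set of all cyclic rotations of a motif, e.g. AC -> {AC, CA}."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def rotation_kmer_fraction(seq, motif):
    """Fraction of k-mers in seq (k = len(motif)) that are a rotation of motif.

    Assumption: this mirrors the first score column of the trace; the real
    tool may compute its score differently.
    """
    k = len(motif)
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return 0.0
    rots = rotations(motif)
    hits = sum(seq[i:i + k] in rots for i in range(n_kmers))
    return hits / n_kmers


if __name__ == "__main__":
    # Longest allele sequence taken from the trace.
    seq = "CCACACACACACACACACAA"
    for motif in ("AC", "AAC", "ACC", "AAAC"):
        print(motif, round(rotation_kmer_fraction(seq, motif), 6))
```

Counting every shift of the window, rather than only in-frame copies, makes the score independent of where the repeat tract starts. That would explain why AC scores far higher than any longer candidate here.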
| | | |
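The summary above reports the motif as AC while the repeat unit (ru) observed in the tract is CA, and the exact tract contains 8 units with a concordance of 1. One way to reproduce these values is to take the lexicographically smallest rotation as the canonical motif and to count in-frame copies of the repeat unit. The sketch below assumes exactly that. The function names are hypothetical, and the actual tool may canonicalize differently (for instance by also considering the reverse complement).

```python
def canonical_motif(unit):
    """Lexicographically smallest cyclic rotation of a repeat unit."""
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def count_exact_units(tract, unit):
    """Split tract into in-frame chunks of len(unit) and compare each to unit.

    Returns (total units, exactly matching units, concordance), where
    concordance is the fraction of units that match exactly.
    """
    k = len(unit)
    chunks = [tract[i:i + k] for i in range(0, len(tract) - k + 1, k)]
    exact = sum(chunk == unit for chunk in chunks)
    concordance = exact / len(chunks) if chunks else 0.0
    return len(chunks), exact, concordance


if __name__ == "__main__":
    # Exact repeat tract and repeat unit taken from the summary.
    tract = "CACACACACACACACA"
    print(canonical_motif("CA"))
    print(count_exact_units(tract, "CA"))
```

This only covers the exact case. The fuzzy tract, which tolerates the extra C, needs the alignment model shown in the trace and is not reproduced here.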
| == Model free left alignment and right alignment == | | == Model free left alignment and right alignment == |