Tandem Repeat Concepts

From Genome Analysis Wiki
Revision as of 18:57, 25 February 2016 by Atks (Shifting)

Introduction

This page describes concepts related to tandem repeats.

Definition

A tandem repeat is a series of contiguous copies of a motif.


Concepts

Motif Canonical Class

Shifting

Consider the motifs ACTT, TACT and TTAC. The following stretches can be observed in the genome:

 GGGGGGACTTACTTACTTACTTACTTAGGGGG
 GGGGGGTACTTACTTACTTACTTACTTGGGGG
 GGGGGGTTACTTACTTACTTACTTACTGGGGG

However, reading from the right flank, the motifs could just as easily be CTTA, ACTT and TACT respectively.
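The flank dependence can be checked directly. The following is a minimal sketch, assuming the motif length (4) is already known and that the repeat tract can be isolated by stripping the G flanks, which only works here because the motif contains no G. The helper names motif_from_left and motif_from_right are hypothetical.

```python
def motif_from_left(tract: str, motif_len: int) -> str:
    """Motif read off the start of the repeat tract (left flank view)."""
    return tract[:motif_len]


def motif_from_right(tract: str, motif_len: int) -> str:
    """Motif read off the end of the repeat tract (right flank view)."""
    return tract[-motif_len:]


# The three stretches from the text, flanks included.
stretches = [
    "GGGGGGACTTACTTACTTACTTACTTAGGGGG",
    "GGGGGGTACTTACTTACTTACTTACTTGGGGG",
    "GGGGGGTTACTTACTTACTTACTTACTGGGGG",
]

# Assumption: flanks are pure G and the motif has no G, so strip("G") isolates the tract.
tracts = [s.strip("G") for s in stretches]

for tract in tracts:
    print(motif_from_left(tract, 4), motif_from_right(tract, 4))
```

Each line of output pairs the motif seen from the left flank with the one seen from the right flank for the same stretch.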

Shifting the sequence is useful for grouping such motifs together.

We define a shift as follows:

 A shift of a sequence is a rotation of the sequence: the sequence slides by one or more positions, and the bases that slide off one end wrap around to the other end.
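A shift can be sketched in a few lines. The sketch below assumes a left rotation (bases leaving the front wrap to the end), which matches the shift table in the Acyclicity section. The function names are hypothetical, and using the lexicographically smallest shift as a grouping key is an assumption made for illustration, not a definition taken from this page.

```python
def shift(seq: str, k: int = 1) -> str:
    """Rotate seq left by k positions; bases leaving the front wrap to the end."""
    if not seq:
        return seq
    k %= len(seq)
    return seq[k:] + seq[:k]


def all_shifts(seq: str) -> list[str]:
    """Every shift of seq, from shift 0 (seq itself) to shift len(seq) - 1."""
    return [shift(seq, k) for k in range(len(seq))]


def shift_representative(seq: str) -> str:
    """Assumed grouping key: the lexicographically smallest shift of seq."""
    return min(all_shifts(seq))


print(all_shifts("ACTT"))  # ['ACTT', 'CTTA', 'TTAC', 'TACT']

# Motifs that are shifts of one another share the same representative.
for motif in ["ACTT", "TACT", "TTAC", "CTTA"]:
    print(motif, shift_representative(motif))
```

Under this assumed key, all four motifs from the example above map to ACTT and so fall into one group.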

Reverse Complement

Acyclicity

Motifs are required to be acyclic. For example, the motif ACACAC should be represented simply as AC, since it is 3 copies of AC.

 A sequence is cyclic if and only if it consists of two or more whole copies of a shorter subsequence.

The definition can be made more explicit as follows:

 A sequence is cyclic if and only if there exists a nontrivial shift of the sequence (a shift by at least one position and by fewer positions than its length) that is identical to the sequence.

Take, for example, the sequence ACACA. Is this a bona fide motif? After all, it looks like 2.5 copies of AC, so AC might seem more appropriate.

 shift 0: ACACA
 shift 1: CACAA
 shift 2: ACAAC
 shift 3: CAACA
 shift 4: AACAC

None of the nontrivial shifts (1 to 4) reproduces ACACA, so the sequence is acyclic and ACACA is a bona fide motif.
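Both forms of the cyclicity definition can be checked in code. The following is a minimal sketch with hypothetical function names: is_cyclic_by_shift tests for a nontrivial shift that reproduces the sequence, and smallest_unit finds the shortest subsequence whose whole copies spell the sequence. It is not necessarily how vt implements the check.

```python
def shift(seq: str, k: int) -> str:
    """Rotate seq left by k positions; bases leaving the front wrap to the end."""
    k %= len(seq)
    return seq[k:] + seq[:k]


def is_cyclic_by_shift(seq: str) -> bool:
    """True if some nontrivial shift (1 .. len - 1) reproduces seq."""
    return any(shift(seq, k) == seq for k in range(1, len(seq)))


def smallest_unit(seq: str) -> str:
    """Shortest prefix whose whole-number repetition spells seq."""
    n = len(seq)
    for size in range(1, n + 1):
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size]
    return seq  # only reached for the empty string


def is_cyclic_by_copies(seq: str) -> bool:
    """True if seq is two or more whole copies of a shorter subsequence."""
    return len(smallest_unit(seq)) < len(seq)


for motif in ["ACACAC", "ACACA"]:
    print(motif, is_cyclic_by_shift(motif), is_cyclic_by_copies(motif), smallest_unit(motif))
```

For ACACAC both checks report cyclic and the smallest unit is AC. For ACACA both report acyclic, in agreement with the shift table above.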

Fractional counts

Scoring

TRF Scoring

Normalized scoring

Classification

  • motif length
  • motif basis
  • repeat tract length
  • purity


Algorithm for Detection

Detection of a motif in a sequence

Model free left alignment and right alignment

Model based fuzzy left alignment and right alignment

Model free fuzzy left alignment and right alignment

Implementation

This is implemented in vt.

Citation

Maintained by

This page is maintained by Adrian.