MappabilityScores

From Genome Analysis Wiki
Revision as of 07:17, 18 March 2010 by Goncalo (talk | contribs) (moved MapabilityScores to MappabilityScores)

(Pasted from http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeMapability)

Broad alignability score

The Broad alignability track displays whether a region is made up of mostly unique or mostly non-unique sequence. To generate the track, every 36-mer in the genome was marked as "unique" if the most similar 36-mer elsewhere in the genome has at most 2 mismatches, and as "non-unique" otherwise. Position X in the alignable track is marked 1 if more than 50% of the bases in [X-200,X+200] are "unique" and 0 otherwise. Every point in the alignable track has a corresponding position in each of the ChIP signal tracks. The Broad alignability track was generated for the ENCODE project as a tool for the development of the Broad Histone tracks.
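The windowing step described above can be sketched as follows. This is only an illustration, not the Broad pipeline: the function name alignable_track is hypothetical, the input is assumed to be a per-base list of "unique" flags that has already been computed, and windows are assumed to be truncated at the ends of the sequence (the source does not say how edges are handled).

```python
def alignable_track(unique, half_window=200):
    """Return a 0/1 value per position from per-base "unique" flags.

    Position x gets 1 if more than 50% of the bases in
    [x - half_window, x + half_window] are flagged unique, else 0.
    Assumption: windows are clipped at the sequence boundaries.
    """
    n = len(unique)

    # Prefix sums let us count unique bases in any window in O(1).
    prefix = [0]
    for flag in unique:
        prefix.append(prefix[-1] + (1 if flag else 0))

    track = []
    for x in range(n):
        lo = max(0, x - half_window)
        hi = min(n - 1, x + half_window)
        window_size = hi - lo + 1
        n_unique = prefix[hi + 1] - prefix[lo]
        # Strictly more than half of the window must be unique.
        track.append(1 if 2 * n_unique > window_size else 0)
    return track
```

With the default half_window of 200 each interior window covers 401 bases, so a position is marked 1 when at least 201 of them are flagged unique.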

Duke uniqueness score

The Duke uniqueness tracks display how unique each sequence on the positive strand is, starting at a particular base and of a particular length. Thus, the 20 bp track reflects the uniqueness of all 20-base sequences, with the score assigned to the first base of the sequence. Scores are normalized to between 0 and 1, with 1 representing a completely unique sequence and 0 indicating that the sequence occurs more than 4 times in the genome (excluding chrN_random and alternative haplotypes). A score of 0.5 indicates that the sequence occurs exactly twice; likewise, 0.33 indicates three occurrences and 0.25 indicates four. The Duke uniqueness tracks were generated for the ENCODE project as tools in the development of the Open Chromatin tracks.
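The mapping from occurrence count to score can be written out as a small function. This is a sketch of the scoring rule as described above, not Duke's code; the name duke_uniqueness is hypothetical, and the function returns the exact fraction rather than the two-decimal values (0.33, 0.25) quoted in the text.

```python
def duke_uniqueness(occurrences):
    """Map a genome-wide occurrence count to a uniqueness score.

    1 occurrence -> 1.0, 2 -> 0.5, 3 -> 1/3, 4 -> 0.25,
    more than 4 occurrences -> 0.0.
    """
    if occurrences < 1:
        # A sequence taken from the genome occurs at least once.
        raise ValueError("occurrences must be at least 1")
    if occurrences > 4:
        return 0.0
    return 1.0 / occurrences
```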

The Duke excluded regions track displays genomic regions for which mapped sequence tags were filtered out before signal generation and peak calling for Duke/UNC/UTA's Open Chromatin tracks. This track contains problematic regions for short sequence tag signal detection (such as satellites and rRNA genes). The Duke excluded regions track was generated for the ENCODE project.

Rosetta uniqueness score

The Rosetta uniqueness track uses sequence 'tiles' of 35 bp. Each tile was aligned to the genome using the BWA aligner. Tiles that align uniquely and perfectly in hg18 receive a p-value of 1e-37, while those that align perfectly in multiple locations receive a p-value of 0. For each tile, the oligo midpoint coordinate was recorded along with the -log_10 p-value: 37 (unambiguous) to 0 (ambiguous). The Rosetta uniqueness track was generated independently of the ENCODE project.

UMass uniqueness score

The UMass uniqueness track displays a uniqueness signal for each base, based on the combined number of plus and minus strand occurrences throughout the genome of the 15-mer that starts at that base, read 5'->3' on the plus strand. Scores are normalized to between 0 and 1 by calculating 1/N, where N is the number of genome-wide occurrences of the 15-mer starting at position X. A score of 1 represents a single genome-wide occurrence of that 15-mer. A score of 0.5 represents either 2 plus strand occurrences or 1 plus and 1 minus strand occurrence, and so on. Ratios are rounded to three decimal places, so a score of 0.000 represents more than 2000 occurrences. A score of 0 is reserved for a given 15-mer that is either not assembled or contains at least one N at position X. The UMass uniqueness track was generated for the ENCODE project.
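A toy version of this calculation is sketched below for a single in-memory sequence. It is not the UMass pipeline: the function names are hypothetical, minus strand occurrences are assumed to be countable as plus strand occurrences of the reverse complement, rounding uses three decimal places, and any k-mer containing an N is assumed to score 0.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def umass_uniqueness(genome, k=15):
    """Return one score per k-mer start position in `genome`.

    Score is 1/N rounded to three decimals, where N counts occurrences
    of the k-mer on both strands. K-mers containing N score 0.0
    (an assumption about how unassembled bases are handled).
    """
    genome = genome.upper()
    n_kmers = len(genome) - k + 1

    # Count every plus strand k-mer once.
    plus_counts = Counter(genome[i:i + k] for i in range(n_kmers))

    scores = []
    for i in range(n_kmers):
        kmer = genome[i:i + k]
        if "N" in kmer:
            scores.append(0.0)
            continue
        # A minus strand occurrence of kmer is a plus strand
        # occurrence of its reverse complement.
        n = plus_counts[kmer] + plus_counts[reverse_complement(kmer)]
        scores.append(round(1.0 / n, 3))
    return scores
```

For odd k such as 15, no k-mer can equal its own reverse complement, so the two counts never overlap. With an even k, a palindromic k-mer would be counted on both strands for the same site, which this sketch does not correct for.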