Changes

Evaluating a Read Mapper on Simulated Data (view source)

Revision as of 15:49, 11 February 2010

2,758 bytes added , 15:49, 11 February 2010

Line 50: Line 50:

*Format

−

~~both~~ base space and color space ~~both~~ single end and paired end, and paired end reads are given insert size 1500.

+

; Both base space and color space

+

; Both single end and paired end, and paired end reads are given insert size 1500.

+

; The forward or reverse strand is assigned at random, each with probability 1/2
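The random strand assignment above can be sketched as follows. This is a minimal illustration, not the generator's actual code: the function names `reverse_complement` and `assign_strand` are hypothetical, and the sketch assumes a reverse-strand read is produced by reverse-complementing the reference fragment.

```python
import random

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a base-space sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assign_strand(ref_fragment, rng):
    """Pick 'F' or 'R' with probability 1/2 each (hypothetical helper).

    Returns (strand, read_sequence). A forward read is the reference
    fragment itself; a reverse read is its reverse complement.
    """
    if rng.random() < 0.5:
        return "F", ref_fragment
    return "R", reverse_complement(ref_fragment)


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so a run is reproducible
    for _ in range(3):
        print(assign_strand("AACGTT", rng))
```

Passing the random generator in explicitly keeps a simulated data set reproducible from its seed.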

+

* Tag

+

@2:12345:F:SE:Exact

+

@2:12345:F:SE:SNP:2,12345,A,G;2,12346,T,C

+

@2:12345:F:PE+offset:SNP:2,12345,A,G (ref is A, read is G)

+

@2:12345:F:PE+offset:Indel:25M30D5M
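A reader for the tag layout listed above might look like the sketch below. The names `ReadTag` and `parse_tag` are hypothetical. The sketch assumes the colon-separated fields are chromosome, position, strand, layout, and variant type, followed by an optional detail field. The layout field (for example the literal `PE+offset`) is kept as raw text because the page does not spell out how the offset is written.

```python
from typing import List, NamedTuple, Optional, Tuple


class ReadTag(NamedTuple):
    chrom: str
    pos: int
    strand: str            # "F" or "R"
    layout: str            # raw layout field, e.g. "SE" or "PE+offset"
    kind: str              # "Exact", "SNP" or "Indel"
    snps: List[Tuple[str, int, str, str]]  # (chrom, pos, ref base, read base)
    cigar: Optional[str]   # e.g. "25M30D5M" for an Indel tag


def parse_tag(tag: str) -> ReadTag:
    """Split a tag such as '@2:12345:F:SE:SNP:2,12345,A,G' into its fields."""
    fields = tag.strip().lstrip("@").split(":")
    if len(fields) < 5:
        raise ValueError("expected at least 5 colon-separated fields: %r" % tag)
    chrom, pos, strand, layout, kind = fields[:5]
    detail = ":".join(fields[5:])
    snps = []
    cigar = None
    if kind == "SNP":
        # Several SNPs are separated by ';', each one is chrom,pos,ref,read.
        for item in detail.split(";"):
            snp_chrom, snp_pos, ref_base, read_base = item.split(",")
            snps.append((snp_chrom, int(snp_pos), ref_base, read_base))
    elif kind == "Indel":
        cigar = detail
    return ReadTag(chrom, int(pos), strand, layout, kind, snps, cigar)


if __name__ == "__main__":
    print(parse_tag("@2:12345:F:SE:SNP:2,12345,A,G;2,12346,T,C"))
```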

+

* File Naming

+

BS_SE_EXACT_1M_50

+

BS_SE_SNP1_1M_50

+

CS_SE_INDEL1_1M

+

CS_SE_INDEL30_1M

+

CS_SE_INDEL200_1M

+

CS_SE_DEL1_1M

+

For PE, append "_1" and "_2" to the file name, e.g.:

+

PE_EXACT_1M_1

+

PE_EXACT_1M_2

*Program (generator)

Line 67: Line 87:

Simulation files are named like BS_SE_EXACT_1000000_35, meaning base space, single end, exact (no polymorphism), 1M reads, 35 bp per read. Each read's tag is named in a way similar to Sanger's.
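The naming scheme can be decoded mechanically. The sketch below is only an illustration: `parse_sim_name` is a hypothetical helper, and it handles only the five-field form shown in the sentence above (space, end type, variant, read count, read length). It assumes that CS stands for color space and PE for paired end.

```python
def parse_sim_name(name):
    """Decode a name such as 'BS_SE_EXACT_1000000_35' (hypothetical helper)."""
    space, layout, variant, n_reads, read_len = name.split("_")
    spaces = {"BS": "base space", "CS": "color space"}    # CS is assumed
    layouts = {"SE": "single end", "PE": "paired end"}    # PE is assumed
    return {
        "space": spaces[space],
        "layout": layouts[layout],
        "variant": variant,            # e.g. "EXACT" means no polymorphism
        "n_reads": int(n_reads),
        "read_length": int(read_len),  # bp per read
    }


if __name__ == "__main__":
    print(parse_sim_name("BS_SE_EXACT_1000000_35"))
```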

−

<br>

+

* Example

+

For Illumina (from Sanger, 108-mer hap1 test file):

+

Example:

+

_1 file:

+

@20:14812275:F:217;None;None/1

+

AGTTGTTTACTTTCCTTTCCTACCTGGCTGCATCTGTCACATGCATATAGTGTCCCCTGACATGAAGCTCTGATATTGATCTGGAGCCCTATTGGTCTGCAAGTGACT

+

%27::2:::<70<<::95<<6/8<.)3;::9-,3:6/67731/.+)66;;53'31;9<815.%%%+%4-%%%90-)./26<831))(.%%%%%%%)%0%2%%%%%+%%

+

@15:59364621:R:-118;None;None/1

+

TGTTCAACCCACTATTAAGCCAGTATTAAATTGTTAATATCAGTTATTATACTTTTATTTCTAAAATTTCTATTTGATCCCTTTTTTTATAAACTCCAATGCATTCTC

+

%%2=;28>>>>=><>>>>>=>>=>>>;>=>9<1%+,//0+)<<91<4=;;<.%)2::8;;/9<;;;;8647<<;8;;066:<:4628;;;;5:9<<0/25752:3482

+

_2 file:

+

@20:14812275:F:217;None;None/2

+

CACTGGAGGGAATCCAATCCCAAATTAATATAACAAAACCAGAAGCTTGCTTAAAAAATATTTTATCAGATTCCAAAGTTGAGCTTGTGTTAGGGTGTACTGGAACTC

+

%%0;+250::-863486::599<9679/2%%))%+80%--7<;9/1%33,-%%)28/),3,67-8;56<1%)0/%%8;<;59/%%,())%%1%%+%).%099'4;+%-

+

@15:59364621:R:-118;None;None/2

+

AGAAATAAGACCACATGACAATGTTAAAAATAAAACAGGCAATAGCAATAGTCCCAGAGGTGGTTACAATATGATTTCATGCTCCAGAAAGTATAGGAGAAGACAAAG

+

%3===;==;7<<;7<5;==<<4<;9=8==<====:<<<<<;<==:=<58;===;:8'8:<===:.9:38908:=;;7;57)%.+%)967%%-%%'6:-%)7);<;0+%

+

Conclusion:

+

If the first read is forward, it is identical to the reference sequence and the second read is the reverse complement of the reference sequence.

+

If the first read is reverse, it is the reverse complement of the reference sequence and the second read is identical to the reference sequence.

+

The position of the first read can always be obtained from the first two fields of the tag (separated by colons).

+

The position of the second read is the position of the first read plus the offset.
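The Illumina conclusions can be checked against the example tags with a short sketch. `mate_info` is a hypothetical helper. It assumes the fourth colon-separated field (217 and -118 in the examples) is the offset, and it ignores the two fields after the first semicolon.

```python
def mate_info(tag):
    """Derive the second read's position and strand from an Illumina-style tag.

    Returns (chrom, first_pos, first_strand, second_pos, second_strand).
    """
    name = tag.strip().lstrip("@")
    if "/" in name:
        name = name.rsplit("/", 1)[0]   # drop the trailing /1 or /2
    location = name.split(";")[0]       # e.g. "20:14812275:F:217"
    chrom, pos, strand, offset = location.split(":")
    first_pos = int(pos)
    second_pos = first_pos + int(offset)
    # Illumina pairs: the two reads lie on opposite strands.
    second_strand = "R" if strand == "F" else "F"
    return chrom, first_pos, strand, second_pos, second_strand


if __name__ == "__main__":
    print(mate_info("@20:14812275:F:217;None;None/1"))
    print(mate_info("@15:59364621:R:-118;None;None/1"))
```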

+

For SOLiD (from Sanger, 50-mer hap1 test file):

+

e.g.

+

_1 file:

+

>2:67043752:F:1445;2,67043761,A,G;None

+

T12221203021201200302123102221322000012301300211213

+

22212031230012003021211022213220000123013022112123 (ref)

+

>4:125830377:R:-1541;None;None

+

T30002222300330113020203010322111010300030003230320

+

_2 file:

+

>2:67043752:F:1445;2,67043761,A,G;None

+

G13031223023023012201210020003310110111111203310211

+

30312230230230122012100200033121201111113033112112 (ref)

+

>4:125830377:R:-1541;None;None

+

G13311131230200010201210032223330120312000301230032

+

Conclusion:

+

The first and second reads have the same direction (both are either identical to the reference genome or the reverse complement of it),

+

and their positions are determined in the same way as for Illumina reads.
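For readers unfamiliar with the digit strings in the SOLiD examples, the sketch below shows the commonly described di-base color encoding, in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). It is not taken from the simulator's code. `encode_colorspace` is a hypothetical helper, and the default primer base T is an assumption based on the _1 file reads shown above.

```python
# 2-bit codes for the bases; the color of a base pair is the XOR of its codes.
_BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_colorspace(bases, primer="T"):
    """Encode a base-space sequence as primer base plus color digits."""
    colors = []
    prev = primer
    for base in bases.upper():
        colors.append(str(_BASE_CODE[prev] ^ _BASE_CODE[base]))
        prev = base
    return primer + "".join(colors)


if __name__ == "__main__":
    print(encode_colorspace("ACGT"))  # reference-like sequence
    print(encode_colorspace("ATGT"))  # same sequence with one base changed
```

Under this encoding an isolated base change alters two adjacent colors, which is worth keeping in mind when comparing a color-space read against the line marked (ref).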

+

<br>

= Bulk statistics result =

Zhanxw
