Line 50: |
Line 50: |
| *Format | | *Format |
| | | |
− | both base space and color space both single end and paired end, and paired end reads are given insert size 1500.
| + | ; Both base space and color space |
| + | ; Both single end and paired end, and paired end reads are given insert size 1500. |
| + | ; Forward and reverse strands are randomly assigned, each with probability 1/2 |
| + | |
| + | * Tag |
| + | @2:12345:F:SE:Exact |
| + | @2:12345:F:SE:SNP:2,12345,A,G;2,12346,T,C |
| + | @2:12345:F:PE+offset:SNP:2,12345,A,G (ref is A, read is G) |
| + | @2:12345:F:PE+offset:Indel:25M30D5M |
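The variant part of the tags above can be split mechanically. The following is a minimal sketch, not the generator's own code: the helper names `parse_snps` and `parse_indel` are hypothetical, and reading `M` and `D` (and `I`) as CIGAR-style match, deletion and insertion operations is an assumption based on the `25M30D5M` example.

```python
import re

def parse_snps(field):
    """Split an SNP field such as '2,12345,A,G;2,12346,T,C'.

    Each entry is chromosome, position, reference base, read base
    (per the note 'ref is A, read is G' in the tag examples).
    """
    snps = []
    for item in field.split(";"):
        chrom, pos, ref_base, read_base = item.split(",")
        snps.append((chrom, int(pos), ref_base, read_base))
    return snps

def parse_indel(field):
    """Split an indel field such as '25M30D5M' into (length, operation) pairs.

    Assumption: the letters follow CIGAR conventions (M, I, D).
    """
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MID])", field)]

if __name__ == "__main__":
    print(parse_snps("2,12345,A,G;2,12346,T,C"))
    print(parse_indel("25M30D5M"))
```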
| + | |
| + | * File Naming |
| + | BS_SE_EXACT_1M_50 |
| + | BS_SE_SNP1_1M_50 |
| + | CS_SE_INDEL1_1M |
| + | CS_SE_INDEL30_1M |
| + | CS_SE_INDEL200_1M |
| + | CS_SE_DEL1_1M |
| + | |
| + | For PE, append "_1" and "_2", e.g.: |
| + | PE_EXACT_1M_1 |
| + | PE_EXACT_1M_2 |
| | | |
| *Program (generator) | | *Program (generator) |
Line 67: |
Line 87: |
| Simulation files are named like: BS_SE_EXACT_1000000_35, meaning base space, single end, exact (no polymorphism), 1M reads, 35 bp per read. For each read, the tag is named in a similar way to Sanger's. | | Simulation files are named like: BS_SE_EXACT_1000000_35, meaning base space, single end, exact (no polymorphism), 1M reads, 35 bp per read. For each read, the tag is named in a similar way to Sanger's. |
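As an illustration of the naming scheme, a five-field name can be decoded as in the sketch below. The function name `parse_sim_name` and the dictionary keys are hypothetical, and treating a trailing `M` in the read count (as in `1M`) as "million" is an assumption. Names with fewer fields, or with a `_1`/`_2` mate suffix, are not handled here.

```python
def parse_sim_name(name):
    """Decode a name like 'BS_SE_EXACT_1000000_35' into its five fields."""
    space, ends, variant, n_reads, read_len = name.split("_")
    spaces = {"BS": "base space", "CS": "color space"}
    end_types = {"SE": "single end", "PE": "paired end"}
    # Assumption: '1M' is shorthand for 1,000,000 reads.
    if n_reads.endswith("M"):
        reads = int(n_reads[:-1]) * 1_000_000
    else:
        reads = int(n_reads)
    return {
        "space": spaces[space],
        "ends": end_types[ends],
        "variant": variant,
        "reads": reads,
        "read_length": int(read_len),
    }

if __name__ == "__main__":
    print(parse_sim_name("BS_SE_EXACT_1000000_35"))
```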
| | | |
− | <br> | + | * Example |
| + | For Illumina (from Sanger, 108mer hap1 test file): |
| + | Example: |
| + | _1 file: |
| + | @20:14812275:F:217;None;None/1 |
| + | AGTTGTTTACTTTCCTTTCCTACCTGGCTGCATCTGTCACATGCATATAGTGTCCCCTGACATGAAGCTCTGATATTGATCTGGAGCCCTATTGGTCTGCAAGTGACT |
| + | + |
| + | %27::2:::<70<<::95<<6/8<.)3;::9-,3:6/67731/.+)66;;53'31;9<815.%%%+%4-%%%90-)./26<831))(.%%%%%%%)%0%2%%%%%+%% |
| + | |
| + | @15:59364621:R:-118;None;None/1 |
| + | TGTTCAACCCACTATTAAGCCAGTATTAAATTGTTAATATCAGTTATTATACTTTTATTTCTAAAATTTCTATTTGATCCCTTTTTTTATAAACTCCAATGCATTCTC |
| + | + |
| + | %%2=;28>>>>=><>>>>>=>>=>>>;>=>9<1%+,//0+)<<91<4=;;<.%)2::8;;/9<;;;;8647<<;8;;066:<:4628;;;;5:9<<0/25752:3482 |
| + | |
| + | _2 file: |
| + | @20:14812275:F:217;None;None/2 |
| + | CACTGGAGGGAATCCAATCCCAAATTAATATAACAAAACCAGAAGCTTGCTTAAAAAATATTTTATCAGATTCCAAAGTTGAGCTTGTGTTAGGGTGTACTGGAACTC |
| + | + |
| + | %%0;+250::-863486::599<9679/2%%))%+80%--7<;9/1%33,-%%)28/),3,67-8;56<1%)0/%%8;<;59/%%,())%%1%%+%).%099'4;+%- |
| + | |
| + | @15:59364621:R:-118;None;None/2 |
| + | AGAAATAAGACCACATGACAATGTTAAAAATAAAACAGGCAATAGCAATAGTCCCAGAGGTGGTTACAATATGATTTCATGCTCCAGAAAGTATAGGAGAAGACAAAG |
| + | + |
| + | %3===;==;7<<;7<5;==<<4<;9=8==<====:<<<<<;<==:=<58;===;:8'8:<===:.9:38908:=;;7;57)%.+%)967%%-%%'6:-%)7);<;0+% |
| + | |
| + | Conclusion: |
| + | If the first read is forward (F), it matches the reference sequence and the second read is the reverse complement of the reference sequence. |
| + | If the first read is reverse (R), it is the reverse complement of the reference genome and the second read matches the reference sequence. |
| + | The position of the first read can always be obtained from the first two fields of the tag (separated by colons). |
| + | The position of the second read is the position of the first read plus the offset. |
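The rules in this conclusion can be sketched in a few lines. This is an illustration under stated assumptions, not the simulator's code: the names `ReadTag`, `parse_tag`, `mate_position` and `read_orientations` are hypothetical, the fourth colon-separated field is assumed to be the offset, and the fields after the first semicolon are ignored.

```python
from typing import NamedTuple, Optional

class ReadTag(NamedTuple):
    chrom: str
    pos: int             # position of the first read
    strand: str          # 'F' or 'R'
    offset: int          # assumed: signed offset to the second read
    mate: Optional[int]  # 1 or 2 if the tag ends in /1 or /2, else None

def parse_tag(tag):
    """Parse a tag such as '@20:14812275:F:217;None;None/1'."""
    body = tag.lstrip("@>")
    mate = None
    if "/" in body:
        body, mate_str = body.rsplit("/", 1)
        mate = int(mate_str)
    # Only the part before the first ';' is needed here.
    chrom, pos, strand, offset = body.split(";")[0].split(":")
    return ReadTag(chrom, int(pos), strand, int(offset), mate)

def mate_position(tag):
    """Second read position = first read position + offset."""
    return tag.pos + tag.offset

def read_orientations(tag):
    """Orientation of (first read, second read) relative to the reference."""
    if tag.strand == "F":
        return ("reference", "reverse complement")
    return ("reverse complement", "reference")

if __name__ == "__main__":
    t = parse_tag("@20:14812275:F:217;None;None/1")
    print(t, mate_position(t), read_orientations(t))
```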
| + | |
| + | For SOLiD (from Sanger, 50 mer hap1 test file) |
| + | e.g. |
| + | _1 file: |
| + | >2:67043752:F:1445;2,67043761,A,G;None |
| + | T12221203021201200302123102221322000012301300211213 |
| + | 22212031230012003021211022213220000123013022112123 (ref) |
| + | >4:125830377:R:-1541;None;None |
| + | T30002222300330113020203010322111010300030003230320 |
| + | |
| + | _2 file: |
| + | >2:67043752:F:1445;2,67043761,A,G;None |
| + | G13031223023023012201210020003310110111111203310211 |
| + | 30312230230230122012100200033121201111113033112112 (ref) |
| + | |
| + | >4:125830377:R:-1541;None;None |
| + | G13311131230200010201210032223330120312000301230032 |
| + | |
| + | Conclusion: |
| + | The first and second reads have the same direction (both either the same as the reference genome, or both its reverse complement), |
| + | and their positions are given in the same way as for Illumina reads. |
| + | |
| + | <br> |
| | | |
| = Bulk statistics result = | | = Bulk statistics result = |