
From Genome Analysis Wiki

Evaluating a Read Mapper on Simulated Data

14 bytes added, 20:12, 16 February 2010
Available Test Datasets
* Format
; Both base space and color space
; Both single end and paired end; paired end reads are given an insert size of 1500
; Forward strand and reverse strand are randomly assigned with probability 1/2
* Tag
 
@2:12345:F:SE:Exact
 
@2:12345:F:SE:SNP:2,12345,A,G;2,12346,T,C
 
@2:12345:F:PE+offset:SNP:2,12345,A,G (ref is A, read is G)
 
@2:12345:F:PE+offset:Indel:25M30D5M
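The tag layout above can be split into its fields with a short parser. This is only a sketch: the names `ReadTag` and `parse_tag` are hypothetical, and it assumes that the `offset` in `PE+offset` is written as a plain integer (for example `PE+1500`) and that everything after the fifth colon is the variant description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadTag:
    chrom: str              # first field, e.g. "2"
    pos: int                # second field, position of the first strand
    strand: str             # third field, e.g. "F"
    layout: str             # "SE" or "PE"
    offset: Optional[int]   # numeric part of "PE+offset", if present
    kind: str               # "Exact", "SNP", "Indel", ...
    detail: str             # raw variant description, "" for exact reads


def parse_tag(tag: str) -> ReadTag:
    # Split on at most five colons so the variant description stays intact.
    fields = tag.lstrip("@").split(":", 5)
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 colon-separated fields: {tag!r}")
    chrom, pos, strand, layout, kind = fields[:5]
    detail = fields[5] if len(fields) > 5 else ""

    # Assumption: paired-end tags carry the offset as "PE+<integer>".
    layout, _, raw_offset = layout.partition("+")
    offset = int(raw_offset) if raw_offset.isdigit() else None

    return ReadTag(chrom, int(pos), strand, layout, offset, kind, detail)
```

For example, `parse_tag("@2:12345:F:SE:SNP:2,12345,A,G;2,12346,T,C")` yields chromosome `2`, position `12345`, strand `F`, and leaves the two SNP records in `detail` for further splitting on `;` and `,`.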
* File Naming
 
BS_SE_EXACT_1M_50
 
BS_SE_SNP1_1M_50
 
CS_SE_INDEL1_1M
 
CS_SE_INDEL30_1M
 
CS_SE_INDEL200_1M
 
CS_SE_DEL1_1M
For PE, append "_1" and "_2" to the file name, e.g.:
 
PE_EXACT_1M_1
 
PE_EXACT_1M_2
* Example
 
For Illumina (from Sanger, 108mer hap1 test file):
 
Example:
<pre>
Conclusion:
 
If the first read is forward, it is identical to the reference sequence and the second read is the reverse complement of the reference sequence.
 
If the first read is backward, it is the reverse complement of the reference sequence and the second read is identical to the reference sequence.
 
The first strand position can always be obtained from the first two fields of the tag (separated by colons).
 
The second strand position is the first strand position plus the offset.
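
The rules in this conclusion can be sketched in a few lines of Python. The function names below are hypothetical, the offset is assumed to be supplied as an integer, and the reverse complement shown applies to base space reads only (color space reads would need a different treatment).

```python
# Complement table for base space reads (A<->T, C<->G, N unchanged).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a base space sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def second_position(tag: str, offset: int):
    """Chromosome and position of the second strand.

    The first two colon-separated fields of the tag give the first
    strand position; the second strand sits `offset` bases after it.
    """
    chrom, pos = tag.lstrip("@").split(":")[:2]
    return chrom, int(pos) + offset


def expected_pair(ref_first: str, ref_second: str, first_is_forward: bool):
    """Expected read sequences given the reference at each strand position."""
    if first_is_forward:
        # First read matches the reference, second is reverse complemented.
        return ref_first, reverse_complement(ref_second)
    # First read is reverse complemented, second matches the reference.
    return reverse_complement(ref_first), ref_second
```

With a tag such as `@2:12345:F:PE+1500:Exact` and an offset of 1500, `second_position` places the second strand on chromosome 2 at position 13845.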