Line 35: |
Line 35: |
| == Mapping Qualities == | | == Mapping Qualities == |
| | | |
− | We should evaluate mapping qualities by counting how many reads are assigned each mapping quality (or greater) and among those how many map correctly or incorrectly. This gives a Heng Li graph, where one plots number of correctly mapped reads vs. number of mismapped reads. | + | We should evaluate mapping qualities by counting how many reads are assigned each mapping quality (or greater) and among those how many map correctly or incorrectly. This gives a Heng Li graph, where one plots number of correctly mapped reads vs. number of mismapped reads. |
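The counting described above can be sketched as follows. This is a minimal illustration, not the evaluation code used for this page: the function name `mapq_curve` and the input format (pairs of mapping quality and a correct/incorrect flag) are assumptions made for the example.

```python
from collections import Counter


def mapq_curve(reads):
    """Cumulative counts of correct/incorrect reads per mapping-quality threshold.

    `reads` is an iterable of (mapq, is_correct) pairs (hypothetical input format).
    Returns a list of (threshold, n_correct, n_incorrect), from the highest
    observed mapping quality down to the lowest, where the counts cover all
    reads with mapping quality >= threshold.
    """
    correct = Counter()
    incorrect = Counter()
    for mapq, is_correct in reads:
        if is_correct:
            correct[mapq] += 1
        else:
            incorrect[mapq] += 1

    curve = []
    n_correct = 0
    n_incorrect = 0
    # Walk thresholds from high to low so the counts accumulate.
    for q in sorted(set(correct) | set(incorrect), reverse=True):
        n_correct += correct[q]
        n_incorrect += incorrect[q]
        curve.append((q, n_correct, n_incorrect))
    return curve


example = [(60, True), (60, True), (30, True), (30, False), (0, False)]
for threshold, n_ok, n_bad in mapq_curve(example):
    print(threshold, n_ok, n_bad)
```

Plotting `n_correct` against `n_incorrect` for each threshold gives the graph described above.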
− | | |
− | == Available Test Datasets ==
| |
− | | |
− | *Location: wonderland:~zhanxw/BigSimulation
| |
− | *Scenarios:
| |
− | | |
− | no polymorphism; 1, 2, or 3 SNPs; deletions of length 5, 30, or 200; insertions of length 5 or 30
| |
− | | |
− | *Quality String
| |
− | | |
− | Picked the 75th percentile of the Sanger Illumina 108-mer test data set
| |
− | | |
− | *Format
| |
− | | |
− | Both base space and color space, both single end and paired end; paired-end reads are given an insert size of 1500.
| |
− | | |
− | *Program (generator)
| |
− | | |
− | Usage:
| |
− | | |
− | generator [bs|cs] [se|pe] [exact|snpXX|indelXX|delXX] -n numbers -l readLength -i insertSize
| |
− | exact: Sample reads exactly from the reference genome
| |
− | snpXX: Introduce a total of XX SNPs into a single read or a pair of reads
| |
− | indelXX: Insert a random XX-length piece into a single read, or at the same position in a pair of reads
| |
− | delXX: Delete a random XX-length piece from a single read, or at the same position in a pair of reads
| |
− | e.g. ./generator bs se exact -n 100 -l 35
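To script many scenarios, the usage line above can be turned into an argument list. The sketch below only assembles the command; the helper name `generator_args` is hypothetical, and whether `-i` is required for paired-end runs is not stated here, so it is left optional.

```python
import re

# Scenario names as listed in the usage line: exact, snpXX, indelXX, delXX.
SCENARIO_RE = re.compile(r"^(exact|snp\d+|indel\d+|del\d+)$")


def generator_args(space, ends, scenario, n_reads, read_length,
                   insert_size=None, exe="./generator"):
    """Build the argument list for one generator run (hypothetical helper)."""
    if space not in ("bs", "cs"):
        raise ValueError("space must be 'bs' or 'cs'")
    if ends not in ("se", "pe"):
        raise ValueError("ends must be 'se' or 'pe'")
    if not SCENARIO_RE.match(scenario):
        raise ValueError("scenario must be exact, snpXX, indelXX or delXX")

    args = [exe, space, ends, scenario,
            "-n", str(n_reads), "-l", str(read_length)]
    # Assumption: -i is only passed when an insert size is given.
    if insert_size is not None:
        args += ["-i", str(insert_size)]
    return args


print(" ".join(generator_args("bs", "se", "exact", 100, 35)))
```

The resulting list can be passed to `subprocess.run` once the generator binary is available.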
| |
− | | |
− | *Output
| |
− | | |
− | Simulation files are named like BS_SE_EXACT_1000000_35, meaning base space, single end, exact (no polymorphism), 1M reads, 35 bp per read. Each read's tag is named in a similar way to Sanger's.
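The naming scheme can be decoded mechanically. The following is a minimal sketch; the names `SimName` and `parse_sim_name` are made up for the example, and it assumes that any file extension or trailing field (such as a mate index on paired-end files) can be ignored.

```python
from typing import NamedTuple


class SimName(NamedTuple):
    space: str        # "BS" (base space) or "CS" (color space)
    ends: str         # "SE" (single end) or "PE" (paired end)
    scenario: str     # e.g. "EXACT", "SNP1", "DEL30", "INDEL5"
    n_reads: int
    read_length: int


def parse_sim_name(name):
    """Split a simulation file name into its five documented fields."""
    base = name.split(".", 1)[0]          # drop an extension such as .fastq
    parts = base.split("_")
    if len(parts) < 5:
        raise ValueError("expected at least 5 '_'-separated fields: " + name)
    space, ends, scenario, n_reads, read_length = parts[:5]
    if space not in ("BS", "CS") or ends not in ("SE", "PE"):
        raise ValueError("unrecognised space/end fields: " + name)
    return SimName(space, ends, scenario, int(n_reads), int(read_length))


print(parse_sim_name("BS_SE_EXACT_1000000_35"))
```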
| |
− | | |
− | <br>
| |
− | | |
− | = Bulk statistics result =
| |
− | Running time (all jobs submitted to the MOSIX client node)
| |
− | <br>
| |
− | | |
− | BWA (seconds) Karma (seconds) Scenario
| |
− | 2594 7182 BS_SE_DEL200_1000000_50.fastq
| |
− | 2641 -1 BS_SE_DEL30_1000000_50.fastq
| |
− | 2355 -1 BS_SE_DEL5_1000000_50.fastq
| |
− | 441 7941 BS_SE_EXACT_1000000_50.fastq
| |
− | 809 282 BS_SE_INDEL30_1000000_50.fastq
| |
− | 2217 -1 BS_SE_INDEL5_1000000_50.fastq
| |
− | 645 7206 BS_SE_SNP1_1000000_50.fastq
| |
− | 1102 -1 BS_SE_SNP2_1000000_50.fastq
| |
− | 1142 -1 BS_SE_SNP3_1000000_50.fastq
| |
− | 6536 8874 BS_PE_DEL200_1000000_50_?.fastq
| |
− | 6699 9017 BS_PE_DEL30_1000000_50_?.fastq
| |
− | 6468 9033 BS_PE_DEL5_1000000_50_?.fastq
| |
− | 1743 10112 BS_PE_EXACT_1000000_50_?.fastq
| |
− | 2305 231 BS_PE_INDEL30_1000000_50_?.fastq
| |
− | 5703 2989 BS_PE_INDEL5_1000000_50_?.fastq
| |
− | 1974 3718 BS_PE_SNP1_1000000_50_?.fastq
| |
− | 2396 3339 BS_PE_SNP2_1000000_50_?.fastq
| |
− | 2817 3131 BS_PE_SNP3_1000000_50_?.fastq
| |
− | 4362 16074 CS_PE_DEL200_1000000_50_?.fastq
| |
− | 4385 -1 CS_PE_DEL30_1000000_50_?.fastq
| |
− | 4373 9287 CS_PE_DEL5_1000000_50_?.fastq
| |
− | 773 -1 CS_PE_EXACT_1000000_50_?.fastq
| |
− | 1735 3142 CS_PE_INDEL30_1000000_50_?.fastq
| |
− | 4023 8591 CS_PE_INDEL5_1000000_50_?.fastq
| |
− | 1034 10528 CS_PE_SNP1_1000000_50_?.fastq
| |
− | 2236 -1 CS_PE_SNP2_1000000_50_?.fastq
| |
− | 3810 6617 CS_PE_SNP3_1000000_50_?.fastq
| |
− | 7129 1493 CS_SE_DEL200_1000000_50.fastq
| |
− | 7115 1513 CS_SE_DEL30_1000000_50.fastq
| |
− | 7065 1542 CS_SE_DEL5_1000000_50.fastq
| |
− | 1544 1666 CS_SE_EXACT_1000000_50.fastq
| |
− | 2954 289 CS_SE_INDEL30_1000000_50.fastq
| |
− | 6547 1390 CS_SE_INDEL5_1000000_50.fastq
| |
− | 1690 1661 CS_SE_SNP1_1000000_50.fastq
| |
− | 2853 1449 CS_SE_SNP2_1000000_50.fastq
| |
− | 4039 1237 CS_SE_SNP3_1000000_50.fastq
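For comparing the two columns, rows of the table above can be parsed and a Karma/BWA time ratio computed. This is only a sketch: the table does not say what a value of -1 means, so such rows are simply skipped here, and the function name `timing_ratios` is hypothetical. The three sample rows are copied from the table.

```python
def timing_ratios(rows):
    """Map scenario -> Karma time / BWA time, skipping rows with a -1 entry.

    Each row is a whitespace-separated string: BWA seconds, Karma seconds,
    scenario file name.
    """
    ratios = {}
    for row in rows:
        bwa, karma, scenario = row.split()
        bwa, karma = int(bwa), int(karma)
        # Assumption: a negative value means no usable timing; skip the row.
        if bwa < 0 or karma < 0:
            continue
        ratios[scenario] = karma / bwa
    return ratios


sample_rows = [
    "2594 7182 BS_SE_DEL200_1000000_50.fastq",
    "2641 -1 BS_SE_DEL30_1000000_50.fastq",
    "809 282 BS_SE_INDEL30_1000000_50.fastq",
]
for scenario, ratio in timing_ratios(sample_rows).items():
    print(scenario, round(ratio, 2))
```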
| |