Changes

From Genome Analysis Wiki
Jump to navigationJump to search
no edit summary
Line 35: Line 35:  
== Mapping Qualities  ==
 
== Mapping Qualities  ==
   −
We should evaluate mapping qualities by counting how many reads are assigned each mapping quality (or greater) and among those how many map correctly or incorrectly. This gives a Heng Li graph, where one plots number of correctly mapped reads vs. number of mismapped reads.  
+
We should evaluate mapping qualities by counting how many reads are assigned each mapping quality (or greater) and among those how many map correctly or incorrectly. This gives a Heng Li graph, where one plots number of correctly mapped reads vs. number of mismapped reads.
 
  −
== Available Test Datasets  ==
  −
 
  −
*Location: wonderland:~zhanxw/BigSimulation
  −
*Scenarios:
  −
 
  −
no polymorphism ; 1, 2, 3 SNP ; Deletion 5, 30, 200; Insertion 5, 30
  −
 
  −
*Quality String
  −
 
  −
Picked the 75 percentile of Sanger Iluumina 108 mer test data set
  −
 
  −
*Format
  −
 
  −
both base space and color space both single end and paired end, and paired end reads are given insert size 1500.
  −
 
  −
*Program (generator)
  −
 
  −
Usage:
  −
 
  −
        generator [bs|cs] [se|pe] [exact|snpXX|indelXX|delXX] -n numbers -l readLength -i insertSize
  −
        exact: Accurate sample from reference genome
  −
        snpXX: Bring total XXX SNP for a single read or a pair of reads
  −
        indelXX: Insert a random XX-length piece for a single read, or at the same position for a paired reads
  −
        delXX: Delete a random XX-length piece for a single read, or at the same position for a paired reads
  −
        e.g. ./generator bs se exact -n 100 -l 35
  −
 
  −
*Output
  −
 
  −
Simulation file are named like: BS_SE_EXACT_1000000_35, meaning base space, single end, exact (no polymorphism), 1M reads, 35 bp per read. For each read, the tag was named in a similar way to Sanger's.
  −
 
  −
<br>
  −
 
  −
= Bulk statistics result  =
  −
Running time (all submitted to the MOSIX client nodes)
  −
<br>
  −
Calculated by "./parseRunbatch.py batch2.log  |cutrange 0,-1|charrange :-1".
  −
 
  −
Log file is from runbatch.pl and negative time means unfinished (at the moment of editing).
  −
<pre>
  −
BWA(second) Karma(second) Scenarios
  −
2594 7182 BS_SE_DEL200_1000000_50.fastq
  −
2641 -1 BS_SE_DEL30_1000000_50.fastq
  −
2355 -1 BS_SE_DEL5_1000000_50.fastq
  −
441 7941 BS_SE_EXACT_1000000_50.fastq
  −
809 282 BS_SE_INDEL30_1000000_50.fastq
  −
2217 -1 BS_SE_INDEL5_1000000_50.fastq
  −
645 7206 BS_SE_SNP1_1000000_50.fastq
  −
1102 -1 BS_SE_SNP2_1000000_50.fastq
  −
1142 -1 BS_SE_SNP3_1000000_50.fastq
  −
6536 8874 BS_PE_DEL200_1000000_50_?.fastq
  −
6699 9017 BS_PE_DEL30_1000000_50_?.fastq
  −
6468 9033 BS_PE_DEL5_1000000_50_?.fastq
  −
1743 10112 BS_PE_EXACT_1000000_50_?.fastq
  −
2305 231 BS_PE_INDEL30_1000000_50_?.fastq
  −
5703 2989 BS_PE_INDEL5_1000000_50_?.fastq
  −
1974 3718 BS_PE_SNP1_1000000_50_?.fastq
  −
2396 3339 BS_PE_SNP2_1000000_50_?.fastq
  −
2817 3131 BS_PE_SNP3_1000000_50_?.fastq
  −
4362 16074 CS_PE_DEL200_1000000_50_?.fastq
  −
4385 -1 CS_PE_DEL30_1000000_50_?.fastq
  −
4373 9287 CS_PE_DEL5_1000000_50_?.fastq
  −
773 -1 CS_PE_EXACT_1000000_50_?.fastq
  −
1735 3142 CS_PE_INDEL30_1000000_50_?.fastq
  −
4023 8591 CS_PE_INDEL5_1000000_50_?.fastq
  −
1034 10528 CS_PE_SNP1_1000000_50_?.fastq
  −
2236 -1 CS_PE_SNP2_1000000_50_?.fastq
  −
3810 6617 CS_PE_SNP3_1000000_50_?.fastq
  −
7129 1493 CS_SE_DEL200_1000000_50.fastq
  −
7115 1513 CS_SE_DEL30_1000000_50.fastq
  −
7065 1542 CS_SE_DEL5_1000000_50.fastq
  −
1544 1666 CS_SE_EXACT_1000000_50.fastq
  −
2954 289 CS_SE_INDEL30_1000000_50.fastq
  −
6547 1390 CS_SE_INDEL5_1000000_50.fastq
  −
1690 1661 CS_SE_SNP1_1000000_50.fastq
  −
2853 1449 CS_SE_SNP2_1000000_50.fastq
  −
4039 1237 CS_SE_SNP3_1000000_50.fastq
  −
</pre>
 

Navigation menu