Changes

From Genome Analysis Wiki
Line 159: Line 159:  
It is usually useful to examine the call sets against known data sets.
 
It is usually useful to examine the call sets against known data sets.
   −
   vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt  -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000
+
   vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt  -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000 -f "PASS"
    
   data set
 
   data set
     No Indels        :        720 [0.84] #720 indels, with an insertion/deletion ratio of 0.84
+
     No Indels        :        613 [0.72]
       FS/NFS        :      0.50 (2/2) #only 4 variants overlap with coding regions, half of which are frameshift variants
+
       FS/NFS        :      0.50 (2/2)
       Low complexity :      0.47 (335/720)   #47% of the variants are in low complexity regions <br>
+
       Low complexity :      0.46 (283/613) <br>
 
   1000G
 
   1000G
     A-B        719 [0.83] #value in brackets is the insertion deletion ratio
+
     A-B        371 [0.76]
     A&B         1 [inf]       #only one variant overlaps with 1000 Genomes phase 1 data set.
+
     A&B       242 [0.66]
     B-A        517 [0.77]
+
     B-A        276 [0.89]
     Precision     0.1%
+
     Precision   39.5%
     Sensitivity   0.2% <br>
+
     Sensitivity 46.7% <br>
 
   mills
 
   mills
     A-B        720 [0.84]
+
     A-B        542 [0.68]
     A&B         0 [-nan] #no variants overlaps with Mills et al. double hit variants.
+
     A&B         71 [1.03]
     B-A       102 [1.04]
+
     B-A         31 [1.07]
     Precision     0.0%
+
     Precision   11.6%
     Sensitivity   0.0% <br>
+
     Sensitivity 69.6% <br>
 
   dbsnp
 
   dbsnp
     A-B        720 [0.84]
+
     A-B        405 [0.68]
     A&B         0 [-nan] #no variants overlap with dbSNP.
+
     A&B       208 [0.79]
     B-A        702 [1.52]
+
     B-A        494 [2.03]
     Precision     0.0%
+
     Precision   33.9%
     Sensitivity   0.0%
+
     Sensitivity 29.6%
 
  −
This discovery set appears to have many novel variants! (or false positives)
      
Ins/Del ratios:  Reference-alignment-based methods tend to be biased towards detecting deletions.  The ratio is therefore a useful measure of how strongly a discovery Indel set is affected by this bias.  It also appears that as coverage increases, the ins/del ratio tends towards 1.
 
Ins/Del ratios:  Reference-alignment-based methods tend to be biased towards detecting deletions.  The ratio is therefore a useful measure of how strongly a discovery Indel set is affected by this bias.  It also appears that as coverage increases, the ins/del ratio tends towards 1.
Line 198: Line 196:  
* Mills:  contains double-hit common indels from the Mills et al. paper and is a relatively good measure of sensitivity for common variants.  Because not all Indels in this set are expected to be present in your sample, this actually gives you an underestimate of sensitivity.
 
* Mills:  contains double-hit common indels from the Mills et al. paper and is a relatively good measure of sensitivity for common variants.  Because not all Indels in this set are expected to be present in your sample, this actually gives you an underestimate of sensitivity.
    +
  vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt  -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000 -f "~PASS"
 +
 +
  data set
 +
    No Indels        :        107 [2.06]
 +
      FS/NFS        :      -nan (0/0)
 +
      Low complexity :      0.79 (85/107) <br>
 +
  1000G
 +
    A-B        107 [2.06]
 +
    A&B          0 [-nan]
 +
    B-A        518 [0.77]
 +
    Precision    0.0%
 +
    Sensitivity  0.0% <br>
 +
  mills
 +
    A-B        105 [2.09]
 +
    A&B          2 [1.00]
 +
    B-A        100 [1.04]
 +
    Precision    1.9%
 +
    Sensitivity  2.0% <br>
 +
  dbsnp
 +
    A-B        102 [2.00]
 +
    A&B          5 [4.00]
 +
    B-A        697 [1.51]
 +
    Precision    4.7%
 +
    Sensitivity  0.7%
 
This analysis supports filters too.
 
This analysis supports filters too.
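The Precision and Sensitivity figures in the profiles above appear consistent with simple ratios of the overlap counts, where A is the call set and B is the reference data set: precision = A&B / (A-B + A&B) and sensitivity = A&B / (B-A + A&B). This is inferred from the numbers shown, not taken from the vt source, and the function name `overlap_stats` is hypothetical. A minimal sketch that recomputes the PASS-filtered values:

```python
def overlap_stats(a_minus_b, a_and_b, b_minus_a):
    """Return (precision, sensitivity) in percent from overlap counts.

    Assumed formulas (inferred from the profile output, not from vt itself):
      precision   = A&B / (A-B + A&B)   # share of called indels found in B
      sensitivity = A&B / (B-A + A&B)   # share of B's indels that were called
    """
    called = a_minus_b + a_and_b   # all indels in the call set A
    known = b_minus_a + a_and_b    # all indels in the reference set B
    precision = 100.0 * a_and_b / called if called else float("nan")
    sensitivity = 100.0 * a_and_b / known if known else float("nan")
    return precision, sensitivity


# Counts (A-B, A&B, B-A) copied from the PASS-filtered profile above.
profiles = {
    "1000G": (371, 242, 276),
    "mills": (542, 71, 31),
    "dbsnp": (405, 208, 494),
}

for name, counts in profiles.items():
    precision, sensitivity = overlap_stats(*counts)
    print(f"{name}: precision {precision:.1f}%, sensitivity {sensitivity:.1f}%")
```

For these counts the sketch reproduces the reported figures (for example 39.5% and 46.7% for 1000G). Note that A-B + A&B equals 613 in every PASS-filtered comparison, matching the "No Indels" total for that call set.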
  