Changes

From Genome Analysis Wiki
Line 173:
This discovery set appears to have many novel variants! (or false positives)
 
Ins/Del ratios:  Reference alignment based methods tend to be biased towards the detection of deletions.  The ins/del ratio therefore provides a useful measure of the degree of this bias in a discovery indel set.  It also appears that as coverage increases, the ins/del ratio tends towards 1.

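As a rough illustration of the ratio described above, here is a minimal sketch that counts insertions and deletions from (REF, ALT) allele pairs. The function name `ins_del_ratio` and the tuple input format are assumptions made for this example, not part of any tool described on this page.

```python
def ins_del_ratio(variants):
    """Return (#insertions / #deletions) for an iterable of (ref, alt) pairs.

    Assumption: each variant is biallelic and given as plain allele strings.
    """
    insertions = 0
    deletions = 0
    for ref, alt in variants:
        if len(alt) > len(ref):
            insertions += 1
        elif len(alt) < len(ref):
            deletions += 1
        # Equal-length alleles (SNPs, MNPs) are not indels and are ignored.
    if deletions == 0:
        return float("nan")  # ratio is undefined without deletions
    return insertions / deletions


# Hypothetical call set: two insertions, two deletions, one SNP.
calls = [("A", "AT"), ("ACG", "A"), ("G", "GCC"), ("TTA", "T"), ("C", "T")]
ratio = ins_del_ratio(calls)
```

A ratio well below 1 would be consistent with the deletion bias mentioned above.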
Coding region analysis:  Coding region indels may be categorised as frameshift indels and non-frameshift indels.  A lower proportion of frameshift indels may indicate a better quality data set, but this also depends on the individuals sequenced.

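The frameshift categorisation can be sketched as follows, assuming the input has already been restricted to indels inside coding regions and that an indel is a frameshift when its length change is not a multiple of 3. The function names are hypothetical and chosen only for this example.

```python
def classify_coding_indel(ref, alt):
    """Classify a coding-region variant by its allele length change."""
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        return "not_indel"
    # A length change divisible by 3 keeps the reading frame intact.
    return "non_frameshift" if length_change % 3 == 0 else "frameshift"


def frameshift_proportion(variants):
    """Fraction of indels in (ref, alt) pairs that are frameshifts."""
    labels = [classify_coding_indel(ref, alt) for ref, alt in variants]
    indels = [label for label in labels if label != "not_indel"]
    if not indels:
        return float("nan")  # undefined when there are no indels
    return indels.count("frameshift") / len(indels)


# Hypothetical coding-region indels: length changes of 1, 3, 3 and 1 bases.
coding_indels = [("A", "AT"), ("A", "ATGC"), ("ACGT", "A"), ("AC", "A")]
proportion = frameshift_proportion(coding_indels)
```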
Complexity region analysis:  reports the indels that fall in regions marked by DUST, a low-complexity region masker used in the NCBI pipeline.

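One way to picture this analysis is as an interval lookup: given low-complexity intervals (for example, produced by a DUST-style masker), count how many indel positions fall inside them. The sketch below assumes half-open, non-overlapping intervals per chromosome and a single position per indel; the data layout and function names are illustrative assumptions only.

```python
from bisect import bisect_right


def build_index(masked):
    """Sort intervals per chromosome and keep their start positions for bisecting."""
    index = {}
    for chrom, intervals in masked.items():
        ordered = sorted(intervals)
        index[chrom] = ([start for start, _ in ordered], ordered)
    return index


def is_low_complexity(index, chrom, pos):
    """True if pos lies in a masked interval [start, end) on chrom."""
    if chrom not in index:
        return False
    starts, ordered = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ordered[i][1]


def low_complexity_fraction(index, indels):
    """Fraction of (chrom, pos) indels that fall in masked regions."""
    if not indels:
        return float("nan")
    hits = sum(is_low_complexity(index, chrom, pos) for chrom, pos in indels)
    return hits / len(indels)


# Hypothetical masked regions and indel positions.
masked_regions = {"20": [(100, 200), (500, 520)]}
indel_positions = [("20", 150), ("20", 200), ("20", 510), ("21", 150)]
fraction = low_complexity_fraction(build_index(masked_regions), indel_positions)
```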
Overlap analysis:  overlap with other data sets is an indicator of sensitivity.

* dbSNP: contains indels submitted from many sources, so it is not clear exactly what this set represents.  Assuming most of them are real, the precision estimated against this reference data set is a useful quantity.
* Mills:  contains double-hit common indels from the Mills et al. paper and is a relatively good measure of sensitivity for common variants.  Because not all indels in this set are expected to be present in your sample, this actually gives you an underestimate of sensitivity.

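The overlap quantities discussed in the list above can be sketched with plain set operations. This example assumes that variants in both sets have already been normalized and are represented as (chrom, pos, ref, alt) tuples; the function name and the sample data are hypothetical.

```python
def overlap_summary(calls, reference):
    """Summarise the overlap between a call set and a reference data set."""
    calls = set(calls)
    reference = set(reference)
    shared = calls & reference
    nan = float("nan")
    return {
        "shared": len(shared),
        # Fraction of calls found in the reference: a precision-like estimate
        # if most reference variants are real (the dbSNP case above).
        "calls_in_reference": len(shared) / len(calls) if calls else nan,
        # Fraction of the reference recovered: a sensitivity-like estimate,
        # an underestimate when not every reference variant is in the sample
        # (the Mills case above).
        "reference_in_calls": len(shared) / len(reference) if reference else nan,
    }


# Hypothetical call set and reference set.
my_calls = [("20", 100, "A", "AT"), ("20", 200, "ACG", "A"), ("20", 300, "G", "GC")]
ref_set = [
    ("20", 100, "A", "AT"),
    ("20", 200, "ACG", "A"),
    ("20", 400, "T", "TA"),
    ("20", 500, "CA", "C"),
]
summary = overlap_summary(my_calls, ref_set)
```

Exact tuple matching only works if both sets use the same representation for each indel, so normalization should come first.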
This analysis supports filters too.
    
==Normalization==
 