Changes

Sequencing Workshop Analysis of Indels

Revision as of 15:42, 16 June 2014

243 bytes added, 15:42, 16 June 2014

Line 159: Line 159:

It is usually useful to examine the call sets against known data sets.
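The reports below compare the call set (A) with each reference set (B). A-B counts calls absent from B, A&B counts calls present in both, and B-A counts reference indels that were not called. The following is a minimal sketch of those three counts. It assumes each set has already been reduced to one variant key per line (for example CHROM:POS:REF:ALT). The helper name overlap_counts and the exact-key matching are illustrative assumptions; vt profile_indels may match indels differently.

```bash
# Hypothetical helper (not part of vt): count A-B, A&B and B-A for two key files.
# $1 = keys of the call set (A), $2 = keys of the reference set (B),
# one variant key per line, e.g. CHROM:POS:REF:ALT. Exact string matching only.
overlap_counts() {
  local a b
  a=$(mktemp)
  b=$(mktemp)
  # comm needs both inputs sorted and de-duplicated with the same collation
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  echo "A-B $(LC_ALL=C comm -23 "$a" "$b" | wc -l | tr -d ' ')"  # only in A
  echo "A&B $(LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' ')"  # in both
  echo "B-A $(LC_ALL=C comm -13 "$a" "$b" | wc -l | tr -d ' ')"  # only in B
  rm -f "$a" "$b"
}
```

Keys could be produced from uncompressed VCF text with, for example, awk '!/^#/ {print $1":"$2":"$4":"$5}'.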

−

vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000

+

vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000 -f "PASS"

data set

−

No Indels : ~~720~~ [0.84] ~~#720 indels, with and insertion deletion ratio of 0.84~~

+

No Indels : 613 [0.72]

−

FS/NFS : 0.50 (2/2) ~~#only 4 variants overlap with coding regions, half of which are frameshift variants~~

+

FS/NFS : 0.50 (2/2)

−

Low complexity : 0.47 (~~335~~/~~720~~) ~~#47% of the variants are in low complexity regions~~

+

Low complexity : 0.46 (283/613)

1000G

−

A-B ~~719~~ [0.83] ~~#value in brackets is the insertion deletion ratio~~

+

A-B 371 [0.76]

−

A&B 1 [~~inf~~] ~~#only one variant overlaps with 1000 Genomes phase 1 data set.~~

+

A&B 242 [0.66]

−

B-A ~~517~~ [0.77]

+

B-A 276 [0.89]

−

Precision 0.1%

+

Precision 39.5%

−

Sensitivity 0.2%

+

Sensitivity 46.7%

mills

−

A-B ~~720~~ [0.84]

+

A-B 542 [0.68]

−

A&B 0 [~~-nan~~] ~~#no variants overlaps with Mills et al. double hit variants.~~

+

A&B 71 [1.03]

−

B-A ~~102~~ [1.04]

+

B-A 31 [1.07]

−

Precision 0.0%

+

Precision 11.6%

−

Sensitivity 0.0%

+

Sensitivity 69.6%

dbsnp

−

A-B ~~720~~ [0.84]

+

A-B 405 [0.68]

−

A&B 0 [~~-nan~~] ~~#no variants overlaps with Mills et al. double hit variants.~~

+

A&B 208 [0.79]

−

B-A ~~702~~ [1.52]

+

B-A 494 [2.03]

−

Precision 0.0%

+

Precision 33.9%

−

Sensitivity 0.0%

+

Sensitivity 29.6%

−

~~This discovery set appears to have many novel variants! (or false positives)~~

Ins/Del ratios: Reference-alignment-based methods tend to be biased towards detecting deletions, so the insertion/deletion ratio (the value in brackets) is a useful measure of how biased a discovery indel set is. It also appears that as coverage increases, the ins/del ratio tends towards 1.
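A minimal sketch of the bracketed ins/del ratio, assuming plain-text VCF records on stdin and classifying each biallelic record by comparing the lengths of REF and ALT. The helper name ins_del_ratio is hypothetical. Skipping multi-allelic records and ignoring symbolic alleles are simplifications, so the counts need not match vt profile_indels exactly.

```bash
# Hypothetical helper (not part of vt): count indels and the insertion/deletion
# ratio from VCF text on stdin. Assumes biallelic records with literal alleles.
ins_del_ratio() {
  awk -F'\t' '
    /^#/ { next }                          # skip VCF header lines
    $5 ~ /,/ { next }                      # skip multi-allelic records (simplification)
    length($5) > length($4) { ins++ }      # ALT longer than REF: insertion
    length($5) < length($4) { del++ }      # ALT shorter than REF: deletion
    END {
      if (del > 0) printf "%d indels [%.2f]\n", ins + del, ins / del
      else         printf "%d indels [nan]\n", ins + del
    }'
}
```

SNVs (REF and ALT of equal length) fall through both length tests and are not counted.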

Line 198: Line 196:

* Mills: contains double-hit common indels from the Mills et al. paper and is a relatively good measure of sensitivity for common variants. Because not all indels in this set are expected to be present in your sample, this actually gives an underestimate of sensitivity.
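The precision and sensitivity figures in these reports agree with the overlap counts if precision = A&B / (A&B + A-B) and sensitivity = A&B / (A&B + B-A). For 1000G, 242 / 613 = 39.5% and 242 / 518 = 46.7%. Since B-A also contains reference indels that are simply absent from the sample, the sensitivity denominator is inflated, which is why the Mills figure is an underestimate. A small sketch of the arithmetic follows; the helper name precision_sensitivity is hypothetical.

```bash
# Hypothetical helper (not part of vt): precision and sensitivity from overlap counts.
#   precision   = A&B / (A&B + A-B)   share of calls found in the reference set
#   sensitivity = A&B / (A&B + B-A)   share of reference indels that were called
# Usage: precision_sensitivity <A-B> <A&B> <B-A>
precision_sensitivity() {
  awk -v amb="$1" -v ab="$2" -v bma="$3" 'BEGIN {
    calls = ab + amb   # indels in the call set (A)
    truth = ab + bma   # indels in the reference set (B)
    if (calls > 0) printf "Precision %.1f%%\n", 100 * ab / calls
    else           print "Precision nan"
    if (truth > 0) printf "Sensitivity %.1f%%\n", 100 * ab / truth
    else           print "Sensitivity nan"
  }'
}

precision_sensitivity 371 242 276   # 1000G counts for the PASS calls
```

With the 1000G counts for the PASS calls this prints Precision 39.5% and Sensitivity 46.7%, matching the report.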

+

vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000 -f "~PASS"

+

data set

+

No Indels : 107 [2.06]

+

FS/NFS : -nan (0/0)

+

Low complexity : 0.79 (85/107)

+

1000G

+

A-B 107 [2.06]

+

A&B 0 [-nan]

+

B-A 518 [0.77]

+

Precision 0.0%

+

Sensitivity 0.0%

+

mills

+

A-B 105 [2.09]

+

A&B 2 [1.00]

+

B-A 100 [1.04]

+

Precision 1.9%

+

Sensitivity 2.0%

+

dbsnp

+

A-B 102 [2.00]

+

A&B 5 [4.00]

+

B-A 697 [1.51]

+

Precision 4.7%

+

Sensitivity 0.7%

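This analysis supports filters too: -f "PASS" profiles only the variants that passed filtering, and -f "~PASS" profiles those that did not.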
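A minimal sketch of the split that the two filter expressions appear to produce: indels whose FILTER column is exactly PASS, and all remaining indels. It assumes plain-text, biallelic VCF records on stdin. The helper name filter_split is hypothetical. Reading ~PASS as "everything except PASS" is an assumption, supported by the counts: the 613 PASS and 107 non-PASS indels sum to the 720 reported by the earlier, unfiltered run.

```bash
# Hypothetical helper (not part of vt): tally indels by the VCF FILTER column,
# mirroring -f "PASS" and (assumed) -f "~PASS". Reads VCF text on stdin.
filter_split() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    $5 ~ /,/ { next }                      # skip multi-allelic records (simplification)
    length($4) == length($5) { next }      # skip non-indels such as SNVs
    $7 == "PASS" { pass++; next }          # FILTER column is PASS
    { fail++ }                             # anything else is counted as ~PASS
    END { printf "PASS %d\n~PASS %d\ntotal %d\n", pass, fail, pass + fail }'
}
```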

Atks (1,102 edits)

