From Genome Analysis Wiki
243 bytes added
, 15:42, 16 June 2014
Line 159: |
Line 159: |
| It is usually useful to examine the call sets against known data sets. | | It is usually useful to examine the call sets against known data sets. |
| | | |
− | vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000 | + | vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000 -f "PASS" |
| | | |
| data set | | data set |
− | No Indels : 720 [0.84] #720 indels, with an insertion/deletion ratio of 0.84 | + | No Indels : 613 [0.72] |
− | FS/NFS : 0.50 (2/2) #only 4 variants overlap with coding regions, half of which are frameshift variants | + | FS/NFS : 0.50 (2/2) |
− | Low complexity : 0.47 (335/720) #47% of the variants are in low complexity regions <br> | + | Low complexity : 0.46 (283/613) <br> |
| 1000G | | 1000G |
− | A-B 719 [0.83] #value in brackets is the insertion deletion ratio | + | A-B 371 [0.76] |
− | A&B 1 [inf] #only one variant overlaps with 1000 Genomes phase 1 data set. | + | A&B 242 [0.66] |
− | B-A 517 [0.77] | + | B-A 276 [0.89] |
− | Precision 0.1% | + | Precision 39.5% |
− | Sensitivity 0.2% <br> | + | Sensitivity 46.7% <br> |
| mills | | mills |
− | A-B 720 [0.84] | + | A-B 542 [0.68] |
− | A&B 0 [-nan] #no variants overlap with the Mills et al. double hit variants. | + | A&B 71 [1.03] |
− | B-A 102 [1.04] | + | B-A 31 [1.07] |
− | Precision 0.0% | + | Precision 11.6% |
− | Sensitivity 0.0% <br> | + | Sensitivity 69.6% <br> |
| dbsnp | | dbsnp |
− | A-B 720 [0.84] | + | A-B 405 [0.68] |
− | A&B 0 [-nan] #no variants overlap with the dbSNP variants. | + | A&B 208 [0.79] |
− | B-A 702 [1.52] | + | B-A 494 [2.03] |
− | Precision 0.0% | + | Precision 33.9% |
− | Sensitivity 0.0% | + | Sensitivity 29.6% |
− | | |
− | This discovery set appears to have many novel variants! (or false positives)
| |
| | | |
| Ins/Del ratios: Reference-alignment-based methods tend to be biased towards the detection of deletions. The ins/del ratio is therefore a useful measure of how strongly a discovery Indel set is biased. It also appears that as coverage increases, the ins/del ratio tends to 1. | | Ins/Del ratios: Reference-alignment-based methods tend to be biased towards the detection of deletions. The ins/del ratio is therefore a useful measure of how strongly a discovery Indel set is biased. It also appears that as coverage increases, the ins/del ratio tends to 1. |
Line 198: |
Line 196: |
| * Mills: contains double-hit common indels from the Mills et al. paper and is a relatively good measure of sensitivity for common variants. Because not all Indels in this set are expected to be present in your sample, this actually gives you an underestimate of sensitivity. | | * Mills: contains double-hit common indels from the Mills et al. paper and is a relatively good measure of sensitivity for common variants. Because not all Indels in this set are expected to be present in your sample, this actually gives you an underestimate of sensitivity. |
| | | |
| + | vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000 -f "~PASS" |
| + | |
| + | data set |
| + | No Indels : 107 [2.06] |
| + | FS/NFS : -nan (0/0) |
| + | Low complexity : 0.79 (85/107) <br> |
| + | 1000G |
| + | A-B 107 [2.06] |
| + | A&B 0 [-nan] |
| + | B-A 518 [0.77] |
| + | Precision 0.0% |
| + | Sensitivity 0.0% <br> |
| + | mills |
| + | A-B 105 [2.09] |
| + | A&B 2 [1.00] |
| + | B-A 100 [1.04] |
| + | Precision 1.9% |
| + | Sensitivity 2.0% <br> |
| + | dbsnp |
| + | A-B 102 [2.00] |
| + | A&B 5 [4.00] |
| + | B-A 697 [1.51] |
| + | Precision 4.7% |
| + | Sensitivity 0.7% |
| This analysis supports filters too. | | This analysis supports filters too. |
| | | |
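The precision and sensitivity figures in the profiles above appear to follow directly from the three overlap counts, where A is the call set and B is the reference data set. The sketch below assumes precision = A&B / (A-B + A&B) and sensitivity = A&B / (A&B + B-A). These formulas are inferred from the reported numbers, not taken from the vt source, and the function name `overlap_metrics` is hypothetical. The counts are copied from the PASS-filtered output.

```python
def overlap_metrics(a_only, both, b_only):
    """Return (precision, sensitivity) in percent.

    a_only -- A-B: indels only in the call set
    both   -- A&B: indels shared with the reference data set
    b_only -- B-A: indels only in the reference data set
    The formulas are an assumption inferred from the reported output.
    """
    called = a_only + both   # all indels in the call set
    known = both + b_only    # all indels in the reference data set
    precision = 100.0 * both / called if called else float("nan")
    sensitivity = 100.0 * both / known if known else float("nan")
    return precision, sensitivity


# (A-B, A&B, B-A) copied from the PASS-filtered profile above
pass_counts = {
    "1000G": (371, 242, 276),
    "mills": (542, 71, 31),
    "dbsnp": (405, 208, 494),
}

for name, counts in pass_counts.items():
    precision, sensitivity = overlap_metrics(*counts)
    print(f"{name:6s} precision {precision:.1f}%  sensitivity {sensitivity:.1f}%")
```

Under these assumptions the printed values match the reported PASS figures (for example 39.5% and 46.7% for 1000G). In each data set, A-B plus A&B also equals the total of 613 indels, and the 613 PASS and 107 non-PASS indels together account for the 720 indels of the unfiltered profile.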