Changes

709 bytes added , 18:12, 15 June 2014

→‎Motivation

Line 5: Line 5:

=Tools=

−

~~You can download~~ [[vt|vt]] ~~and have some working knowledge of PERL to do stuff that vt does not support~~.

+

This walkthrough requires [[vt|vt]].

=Analyses=

+

The file generated by indel calling is in [[http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2 BCFv2.1]], the binary version of the Variant Call Format (VCF). BCFv2.1 is more efficient to process because the data is already stored on disk in a machine-readable binary form. It is, however, not necessarily more compact than VCF4.2, especially when the format fields are rich in detail.

==File Preparation==

Line 14: Line 16:

To convert to BCF format, which vt processes faster:

+

vt view mills.vcf -o mills.bcf

Line 203: Line 207:

data set

−

No Indels : 8904 [0.93]

+

No Indels : 8904 [0.93] //#variants in your data set [ins/del ratio]

−

FS/NFS : 0.66 (67/35)<br>

+

FS/NFS : 0.66 (67/35) //Proportion of frameshift Indels, i.e. 67/(67+35). The parentheses give (#frameshift Indels/#non-frameshift Indels)<br>

−

dbsnp

+

dbsnp //A represents the data set you input, B represents dbsnp

−

A-B 2975 [1.06]

+

A-B 2975 [1.06] //#variants in A only [ins/del ratio]

−

A&B 5929 [0.86]

+

A&B 5929 [0.86] //#variants in A and B

B-A 2059845 [1.51]

−

Precision 66.6%

+

Precision 66.6% //A&B/A: the proportion of your variants also found in dbsnp. The lower it is, the more novel your data set.

−

Sensitivity 0.3% <br>

+

Sensitivity 0.3% //A&B/B: a rough measure of sensitivity, provided dbsnp is considered a high quality Indel

+

//set and the samples are the same in both data sets (which they usually are not, but this is

+

//nonetheless a useful indicator)<br>

mills

A-B 5705 [0.81]

Line 230: Line 236:

Sensitivity 0.2%
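The Precision and Sensitivity lines in the overlap output can be reproduced from the three counts A-B, A&B and B-A. The following is a minimal Python sketch of that arithmetic, using the dbsnp counts shown above. The function name `overlap_stats` and its parameter names are illustrative only and are not part of vt.

```python
def overlap_stats(a_only, both, b_only):
    """Return (precision, sensitivity) in percent from overlap counts.

    a_only: variants only in your data set (A-B)
    both:   variants in both data sets (A&B)
    b_only: variants only in the reference data set (B-A)
    """
    precision = 100.0 * both / (a_only + both)    # A&B / A
    sensitivity = 100.0 * both / (both + b_only)  # A&B / B
    return precision, sensitivity


# Counts taken from the dbsnp block of the output above.
precision, sensitivity = overlap_stats(a_only=2975, both=5929, b_only=2059845)
print(f"Precision   {precision:.1f}%")    # Precision   66.6%
print(f"Sensitivity {sensitivity:.1f}%")  # Sensitivity 0.3%
```

Note that A-B plus A&B gives 2975 + 5929 = 8904, the total number of Indels reported at the top of the output.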

−

Ins/Del ratios: Reference alignment based methods tend to be biased towards the detection of deletions. The ins/del ratio is therefore a useful measure of the degree of this bias in a discovery Indel set.

+

Ins/Del ratios: Reference alignment based methods tend to be biased towards the detection of deletions. The ins/del ratio is therefore a useful measure of the degree of this bias in a discovery Indel set. It also appears that as coverage increases, the ins/del ratio tends to 1.

Coding region analysis: Coding region Indels may be categorised as frameshift Indels and non-frameshift Indels. A lower proportion of frameshift Indels may indicate a better quality data set, but this also depends on the individuals sequenced.
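To make the two summary measures above concrete, here is a small Python sketch that computes an ins/del ratio and a frameshift proportion from REF/ALT allele pairs. It assumes biallelic Indels that all lie in coding regions, and it treats a length change that is not a multiple of 3 as a frameshift. The function names and the toy alleles are made up for illustration, and this is not how vt itself implements the analysis.

```python
def classify_indel(ref, alt):
    """Classify one biallelic indel from its REF and ALT allele strings."""
    diff = len(alt) - len(ref)
    if diff == 0:
        raise ValueError("not an indel: REF and ALT have equal length")
    kind = "ins" if diff > 0 else "del"
    # A length change that is not a multiple of 3 shifts the reading frame.
    frame = "NFS" if diff % 3 == 0 else "FS"
    return kind, frame


def summarise(indels):
    """Return (counts, ins/del ratio, frameshift proportion)."""
    counts = {"ins": 0, "del": 0, "FS": 0, "NFS": 0}
    for ref, alt in indels:
        kind, frame = classify_indel(ref, alt)
        counts[kind] += 1
        counts[frame] += 1
    ins_del_ratio = counts["ins"] / counts["del"] if counts["del"] else float("nan")
    fs_proportion = counts["FS"] / (counts["FS"] + counts["NFS"])
    return counts, ins_del_ratio, fs_proportion


# Toy alleles (hypothetical): 1 bp and 3 bp insertions, 3 bp and 1 bp deletions.
toy = [("A", "AT"), ("A", "ACGT"), ("ACGT", "A"), ("AT", "A")]
counts, ratio, fs = summarise(toy)
print(counts, ratio, fs)
```

With the counts from the example output, the frameshift proportion is 67/(67+35), which rounds to the reported 0.66.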

Line 241: Line 247:

* Affy Exome Chip: This contains somewhat rare variants in exonic regions and is useful for exome chip analysis. You should subset your exome data to exome region Indels before comparing against this data set.

−

~~==STR ==~~

+

This analysis supports filters too.

−

~~Annotation of STRs is really important~~. ~~Show example of a deceptive single base pair variant~~

−

~~==Annotation of Indels==~~

−

~~==Examining Mendelian Errors==~~

−

~~==Useful to have call sets from several different callers==~~

−

~~==Concordance==~~

−

~~Can check concordance of genotypes between callers~~

−

~~==Overlapping percentages with known data sets==~~

−

~~With Mills~~

−

~~with dbSNP~~

−

~~with exome chips~~

−

~~with genotyping chips if available~~

−

~~==Useful stratifying features==~~

−

~~AF - rare versus common~~

−

~~Indel length - computed naively versus tract length~~

−

~~Allele frequency bins~~

−

~~Type of Indels - homopolymer types and STR types and isolated~~

−

~~Adjacent SNPs~~

−

~~Adjacent MNPs~~

−

~~Clumping variants~~

−

==~~Other useful evaluations~~==

+

==to document==

−

genotype likelihood concordance

+

* Annotation of STRs is really important. Show example of a deceptive single base pair variant

−

concordance stratified by indel length or tract length

+

* Mendelian analysis

−

mendelian concordance by tract length

+

* AFS

+

* Can check concordance of genotypes between callers - partition

+

* Type of Indels - homopolymer types and STR types and isolated, Adjacent SNPs, Adjacent MNPs, Clumping variants

+

* genotype likelihood concordance

+

* concordance stratified by indel length or tract length

+

* mendelian concordance by tract length

Atks

1,102

edits

Analyses of Indels (view source)

Revision as of 18:12, 15 June 2014
