
From Genome Analysis Wiki
995 bytes added, 18:12, 15 June 2014
=Tools=
This walkthrough requires [[vt|vt]]; some working knowledge of Perl is useful for analyses that vt does not support.
    
=Analyses=
 
The file generated by indel calling is in [http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2 BCFv2.1], the binary version of the Variant Call Format (VCF). BCFv2.1 is faster to process because the data are already stored on disk in a machine-readable binary encoding, so no text has to be parsed. It is, however, not necessarily more compact than VCF 4.2, especially when the FORMAT fields are rich in detail.
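The speed/size trade-off can be seen in a toy comparison. This is a simplified sketch, not the real BCF2 encoding: actual BCF chooses a typed value width per field and is BGZF-compressed, and the function names here are hypothetical. Text values must be scanned character by character and converted back into numbers. Binary values are copied out directly as machine integers. For small values, the text form can even be shorter.

```python
import struct
from typing import List


def encode_text(values: List[int]) -> bytes:
    """VCF-like: integers written as comma-separated ASCII text."""
    return ",".join(str(v) for v in values).encode("ascii")


def decode_text(data: bytes) -> List[int]:
    # Every character must be scanned and converted back into a number.
    return [int(tok) for tok in data.decode("ascii").split(",")]


def encode_binary(values: List[int]) -> bytes:
    """Toy binary encoding: fixed-width little-endian 32-bit integers.
    (Real BCF2 picks the integer width per field; this is only an illustration.)"""
    return struct.pack(f"<{len(values)}i", *values)


def decode_binary(data: bytes) -> List[int]:
    # Values are read straight out as machine integers; no text parsing.
    count = len(data) // 4
    return list(struct.unpack(f"<{count}i", data))


if __name__ == "__main__":
    small = [0, 1, 0, 1]       # small values: the text form is shorter
    large = [123456, 987654]   # large values: the binary form is shorter
    for values in (small, large):
        print(values, len(encode_text(values)), "text bytes vs",
              len(encode_binary(values)), "binary bytes")
```

With these inputs the text form takes 7 bytes against 16 for the small list. For the large list, the text form takes 13 bytes against 8.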
    
==File Preparation==
 
To convert a VCF file to BCF, which vt processes faster:
 
   vt view mills.vcf -o mills.bcf
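After conversion, you can confirm that the output really is BCF by inspecting its first bytes. Once decompressed, a BCF file begins with the magic string `BCF` followed by the major and minor version bytes. A VCF file begins with a `##fileformat=VCF` header line. This is a minimal sketch that assumes the input is plain or gzip/BGZF-compressed VCF or BCF. The function names are hypothetical, and the check is not part of vt.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # BGZF files are gzip-compatible and start with this


def classify_prefix(prefix: bytes) -> str:
    """Classify the first (uncompressed) bytes of a variant file."""
    if prefix[:3] == b"BCF" and len(prefix) >= 5:
        # Bytes 4 and 5 hold the major and minor BCF version numbers.
        return f"BCF v{prefix[3]}.{prefix[4]}"
    if prefix.startswith(b"##fileformat=VCF"):
        return "VCF"
    return "unknown"


def sniff_variant_file(path: str) -> str:
    """Open the file (decompressing if needed) and classify its header."""
    with open(path, "rb") as fh:
        compressed = fh.read(2) == GZIP_MAGIC
    opener = gzip.open if compressed else open
    with opener(path, "rb") as fh:
        return classify_prefix(fh.read(16))


if __name__ == "__main__":
    print(sniff_variant_file("mills.bcf"))  # should report a BCF version
```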
 
   data set
    No Indels :      8904 [0.93]  //number of indels in your data set [insertion/deletion ratio]
      FS/NFS :      0.66 (67/35)  //frameshift indels as a proportion of coding indels, FS/(FS+NFS) = 67/(67+35); counts shown as (#frameshift/#non-frameshift)<br>
  dbsnp  //A is the data set you input, B is dbSNP
    A-B      2975 [1.06]  //number of indels in A only [insertion/deletion ratio]
    A&B      5929 [0.86]  //number of indels in both A and B [insertion/deletion ratio]
    B-A    2059845 [1.51]  //number of indels in B only [insertion/deletion ratio]
    Precision    66.6%     //#(A&B)/#A: fraction of your indels that are also in dbSNP; 1 - precision is the
                           //fraction of novel indels in your data set
    Sensitivity  0.3%      //#(A&B)/#B: a rough sensitivity estimate if dbSNP is treated as a high-quality indel
                           //set and both data sets come from the same samples (they usually do not, but this
                           //is still a useful indicator)<br>
   mills
     A-B      5705 [0.81]
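The derived values in this output follow directly from the counts. The sketch below recomputes them from the numbers shown for the dbSNP comparison, taking A as your data set and B as the reference set. The function names are hypothetical, and this is not vt's own code. The insertion/deletion ratio helper is included for completeness; the raw insertion and deletion counts are not shown above.

```python
def fraction(numerator: int, denominator: int) -> float:
    """Ratio that returns nan instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def frameshift_proportion(n_fs: int, n_nfs: int) -> float:
    # FS/NFS line: frameshift indels as a share of all coding indels.
    return fraction(n_fs, n_fs + n_nfs)


def precision(a_only: int, both: int) -> float:
    # #(A&B) / #A, where #A = #(A-B) + #(A&B).
    return fraction(both, a_only + both)


def sensitivity(both: int, b_only: int) -> float:
    # #(A&B) / #B, where #B = #(B-A) + #(A&B).
    return fraction(both, b_only + both)


def ins_del_ratio(n_ins: int, n_del: int) -> float:
    # The value shown in square brackets on each count line.
    return fraction(n_ins, n_del)


if __name__ == "__main__":
    print(f"FS/NFS      {frameshift_proportion(67, 35):.2f}")         # 0.66
    print(f"Precision   {100 * precision(2975, 5929):.1f}%")          # 66.6%
    print(f"Sensitivity {100 * sensitivity(5929, 2059845):.1f}%")     # 0.3%
```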
* Annotation of short tandem repeats (STRs) is really important. Show an example of a deceptive single-base-pair variant.
* Mendelian analysis
* Allele frequency spectrum (AFS)
* Check concordance of genotypes between callers (partition)
* Types of indels: homopolymer, STR and isolated indels; adjacent SNPs, adjacent MNPs and clumping variants
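Genotype concordance between two callers can be sketched as follows. The sketch mirrors the A-B / A&B / B-A partition used in the summary output, then compares genotypes on the shared sites. The site keys, genotype tuples and function names are hypothetical. Real call sets would first need to be normalized so that the same indel is represented identically in both. Genotypes are treated as unphased, so 0/1 and 1/0 count as the same call.

```python
from typing import Dict, Tuple

# Hypothetical representation of a call set.
Site = Tuple[str, int, str, str]   # (chrom, pos, ref, alt)
Genotype = Tuple[int, int]         # allele indices, e.g. (0, 1) for 0/1


def unphased(gt: Genotype) -> Genotype:
    """Order the alleles so that 1/0 and 0/1 compare as equal."""
    a, b = gt
    return (a, b) if a <= b else (b, a)


def genotype_concordance(calls_a: Dict[Site, Genotype],
                         calls_b: Dict[Site, Genotype]) -> Dict[str, float]:
    """Partition sites into A-B, A&B and B-A, then compare genotypes on A&B."""
    shared = calls_a.keys() & calls_b.keys()
    concordant = sum(1 for site in shared
                     if unphased(calls_a[site]) == unphased(calls_b[site]))
    return {
        "A-B": len(calls_a.keys() - calls_b.keys()),
        "A&B": len(shared),
        "B-A": len(calls_b.keys() - calls_a.keys()),
        "concordant": concordant,
        "concordance": concordant / len(shared) if shared else float("nan"),
    }


if __name__ == "__main__":
    caller1 = {("20", 100, "A", "AT"): (0, 1), ("20", 200, "CT", "C"): (1, 1),
               ("20", 300, "G", "GA"): (0, 1)}
    caller2 = {("20", 100, "A", "AT"): (1, 0), ("20", 200, "CT", "C"): (0, 1),
               ("20", 400, "T", "TG"): (0, 1)}
    print(genotype_concordance(caller1, caller2))
```

In the usage example, two sites are shared. Only the site at position 100 agrees once phase is ignored, so the concordance is 0.5.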