Changes

From Genome Analysis Wiki
no edit summary
Line 1: Line 1:  +
'''Note:''' the latest version of this practical is available at: [[SeqShop: Variant Calling and Filtering for INDELs Practical]]
 +
* The version here is the original one from the June workshop (updated so that it can be run from elsewhere)
 +
 +
 
== Goals of This Session ==
 
== Goals of This Session ==
 
* What we want to learn  
 
* What we want to learn  
Line 5: Line 9:  
** How to evaluate the quality of INDEL calls
 
** How to evaluate the quality of INDEL calls
   −
[[Media:Variant Calling and Filtering for INDELs.pdf|Intro Slides]]
+
[[Media:Variant Calling and Filtering for INDELs.pdf|Lecture Slides]]
    
== Setup in person at the SeqShop Workshop ==
 
== Setup in person at the SeqShop Workshop ==
Line 45: Line 49:  
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
   −
This tutorial builds on the alignment tutorial. If you have not already done so, please run that tutorial first: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical|Alignment Tutorial]]
+
This tutorial builds on the alignment tutorial. If you have not already done so, please run that tutorial first: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014|Alignment Tutorial]]
   −
It also uses the bam.index file created in the SnpCall Tutorial.  If you have not yet run that tutorial, please follow the directions at: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#GotCloud_BAM_Index_File|GotCloud BAM Index File]]
+
It also uses the bam.index file created in the SnpCall Tutorial.  If you have not yet run that tutorial, please follow the directions at: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical, June 2014#GotCloud_BAM_Index_File|GotCloud BAM Index File]]
      Line 58: Line 62:  
* BAMs->INDELs rather than BAMs->SNPs
 
* BAMs->INDELs rather than BAMs->SNPs
   −
If you want a reminder of what they look like, here is a link to the previous tutorial: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#Examining_GotCloud_SnpCall_Input_files|GotCloud SnpCall Input Files]]
+
If you want a reminder of what they look like, here is a link to the previous tutorial: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical, June 2014#Examining_GotCloud_SnpCall_Input_files|GotCloud SnpCall Input Files]]
    
== Running GotCloud Indel ==
 
== Running GotCloud Indel ==
Line 264: Line 268:  
The following section details some simple analyses we can perform.
 
The following section details some simple analyses we can perform.
   −
== Summary ==
+
===Summary===
    
First you want to know what is in the vcf file.
 
First you want to know what is in the vcf file.
Line 305: Line 309:  
   
 
   
 
   #passed indels of length >4  
 
   #passed indels of length >4  
   ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&LEN>1"
+
   ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&LEN>4"
 
    
 
    
 
   #passed singletons of length 4 or insertions of length 3
 
   #passed singletons of length 4 or insertions of length 3
 
   ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&(LEN==4||DLEN==3)"
 
   ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&(LEN==4||DLEN==3)"
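As a rough cross-check of the length filter used above, the same count can be approximated directly from the VCF text. This is only a sketch: <code>count_long_pass_indels</code> is a hypothetical helper, not part of vt or GotCloud, and it assumes that the indel length is the absolute difference between the REF and ALT lengths of a biallelic record, which may not match vt's own definition of <code>LEN</code> in every case.

```shell
# Hypothetical helper (not part of vt): count PASS, biallelic records whose
# REF/ALT length difference is greater than the threshold given as $1.
# Reads an uncompressed VCF on standard input.
count_long_pass_indels() {
  awk -v min="$1" -F'\t' '
    /^#/ { next }                          # skip header lines
    $7 == "PASS" && $5 !~ /,/ {            # PASS filter, single ALT allele
      d = length($4) - length($5)          # REF length minus ALT length
      if (d < 0) d = -d                    # absolute length difference
      if (d > min) n++
    }
    END { print n + 0 }                    # print 0 when nothing matched
  '
}
```

For example, <code>zcat ${OUT}/final/all.genotypes.vcf.gz | count_long_pass_indels 4</code> should give a number in the same range as the <code>LEN>4</code> query; a large discrepancy would suggest that the assumption about length does not hold for this call set.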
   −
== Comparison with other data sets ==
+
=== Comparison with other data sets ===
    
It is usually useful to examine the call sets against known data sets for the passed variants.
 
It is usually useful to examine the call sets against known data sets for the passed variants.
Line 319: Line 323:  
Edit indel.reference.txt and specify the correct path to ${SS}
 
Edit indel.reference.txt and specify the correct path to ${SS}
 
  nedit ${OUT}/indel.reference.txt
 
  nedit ${OUT}/indel.reference.txt
 +
*'''Replace all occurrences of <code>username</code> with your username  (or the correct path to your seqshop example directory).'''
 +
*:[[File:IndelRef.png]]
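If you prefer not to edit the file by hand, the replacement described above can be sketched with <code>sed</code>. The helper name <code>replace_username</code> is hypothetical, and the sketch assumes that the literal string <code>username</code> appears in the file only as the placeholder and that your user name contains no characters that are special to <code>sed</code> (such as <code>|</code> or <code>&</code>).

```shell
# Hypothetical helper (not part of GotCloud or vt): replace every occurrence
# of the literal placeholder "username" in a file.
#   $1 = file to edit
#   $2 = replacement text (defaults to the current $USER)
replace_username() {
  file="$1"
  name="${2:-$USER}"
  # Write to a temporary file first so the original survives a failed sed.
  sed "s|username|${name}|g" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}
```

After running <code>replace_username ${OUT}/indel.reference.txt</code>, the command <code>grep username ${OUT}/indel.reference.txt</code> should print nothing. It is still worth opening the file to confirm that the paths point at your seqshop example directory.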
    
  ${GC}/bin/vt profile_indels -g ${OUT}/indel.reference.txt  -r ${SS}/ref22/human.g1k.v37.chr22.fa ${OUT}/final/all.genotypes.vcf.gz -i 22:36000000-37000000 -f "PASS"
 
  ${GC}/bin/vt profile_indels -g ${OUT}/indel.reference.txt  -r ${SS}/ref22/human.g1k.v37.chr22.fa ${OUT}/final/all.genotypes.vcf.gz -i 22:36000000-37000000 -f "PASS"
Line 388: Line 394:  
This analysis supports filters too.
 
This analysis supports filters too.
   −
==Normalization==
+
===Normalization===
    
A slight digression here: when analyzing indels, it is important to normalize them.  While it is a simple concept,
 
A slight digression here: when analyzing indels, it is important to normalize them.  While it is a simple concept,
