Changes

From Genome Analysis Wiki
Jump to navigationJump to search
no edit summary
Line 1: Line 1:  +
'''Note:''' the latest version of this practical is available at: [[SeqShop: Variant Calling and Filtering for INDELs Practical]]
 +
* The ones here is the original one from the June workshop (updated to be run from elsewhere)
 +
 +
 
== Goals of This Session ==
 
== Goals of This Session ==
 
* What we want to learn  
 
* What we want to learn  
Line 5: Line 9:  
** How to evaluate the quality of INDEL calls
 
** How to evaluate the quality of INDEL calls
   −
[[Media:Variant Calling and Filtering for INDELs.pdf|Intro Slides]]
+
[[Media:Variant Calling and Filtering for INDELs.pdf|Lecture Slides]]
    
== Setup in person at the SeqShop Workshop ==
 
== Setup in person at the SeqShop Workshop ==
Line 44: Line 48:  
''If you are running during the SeqShop Workshop, please skip this section.''
 
''If you are running during the SeqShop Workshop, please skip this section.''
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
=== Download the example data ===
  −
  −
=== Setup your run environment ===
  −
  −
Environment variables will be used throughout the tutorial.
     −
We recommend that you setup these variables so you won't have to modify every command in the tutorial.
+
This tutorial builds on the alignment tutorial, if you have not already, please first run that tutorial: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014|Alignment Tutorial]]
   −
<div class="mw-collapsible mw-collapsed" style="width:500px">
+
It also uses the bam.index file created in the SnpCall Tutorial.  If you have not yet run that tutorial, please follow the directions at: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical, June 2014#GotCloud_BAM_Index_File|GotCloud BAM Index File]]
I'm using bash (replace the paths below with the appropriate paths):
  −
<div class="mw-collapsible-content">
  −
* Point to where you installed GotCloud
  −
*:<pre>export GC=/home/username/gotcloud</pre>
  −
* Point to where you installed the seqshop files
  −
*:<pre>export SS=/home/username/seqshop/</pre>
  −
* Point to where you want the output to go
  −
*:<pre>export OUT=/home/username/seqshop_output/</pre>
  −
</div>
  −
</div>
     −
<div class="mw-collapsible mw-collapsed" style="width:500px">
  −
I'm using tcsh (replace the paths below with the appropriate paths):
  −
<div class="mw-collapsible-content">
  −
* Point to where you installed GotCloud
  −
*:<pre>setenv GC /home/username/gotcloud</pre>
  −
* Point to where you installed the seqshop files
  −
*:<pre>setenv SS /home/username/seqshop/</pre>
  −
* Point to where you want the output to go
  −
*:<pre>setenv OUT /home/username/seqshop_output/</pre>
  −
</div>
  −
</div>
      +
{{SeqShopRemoteEnv}}
 
</div>
 
</div>
 
</div>
 
</div>
Line 83: Line 62:  
* BAMs->INDELs rather than BAMs->SNPs
 
* BAMs->INDELs rather than BAMs->SNPs
   −
If you want a reminder, of what they look like, here is a link to the previous tutorial : [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#Examining_GotCloud_SnpCall_Input_files|GotCloud SnpCall Input Files]]
+
If you want a reminder, of what they look like, here is a link to the previous tutorial : [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical, June 2014#Examining_GotCloud_SnpCall_Input_files|GotCloud SnpCall Input Files]]
    
== Running GotCloud Indel ==
 
== Running GotCloud Indel ==
Line 289: Line 268:  
The following section details some simple analyses we can perform.
 
The following section details some simple analyses we can perform.
   −
== Summary ==
+
===Summary===
    
First you want to know what is in the vcf file.
 
First you want to know what is in the vcf file.
Line 330: Line 309:  
   
 
   
 
   #passed indels of length >4  
 
   #passed indels of length >4  
   ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&LEN>1"
+
   ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&LEN>4"
 
    
 
    
 
   #passed singletons of length 4 or insertions of length 3
 
   #passed singletons of length 4 or insertions of length 3
 
   ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&(LEN==4||DLEN==3)"
 
   ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&(LEN==4||DLEN==3)"
   −
== Comparison with other data sets ==
+
=== Comparison with other data sets ===
    
It is usually useful to examine the call sets against known data sets for the passed variants.
 
It is usually useful to examine the call sets against known data sets for the passed variants.
Line 344: Line 323:  
Edit indel.reference.txt and specify the correct path to ${SS}
 
Edit indel.reference.txt and specify the correct path to ${SS}
 
  nedit ${OUT}/indel.reference.txt
 
  nedit ${OUT}/indel.reference.txt
 +
*'''Replace all occurrences of <code>username</code> with your username  (or the correct path to your seqshop example directory).'''
 +
*:[[File:IndelRef.png]]
    
  ${GC}/bin/vt profile_indels -g ${OUT}/indel.reference.txt  -r ${SS}/ref22/human.g1k.v37.chr22.fa ${OUT}/final/all.genotypes.vcf.gz -i 22:36000000-37000000 -f "PASS"
 
  ${GC}/bin/vt profile_indels -g ${OUT}/indel.reference.txt  -r ${SS}/ref22/human.g1k.v37.chr22.fa ${OUT}/final/all.genotypes.vcf.gz -i 22:36000000-37000000 -f "PASS"
Line 413: Line 394:  
This analysis supports filters too.
 
This analysis supports filters too.
   −
==Normalization==
+
===Normalization===
    
A slight digression here, when analyzing indels, it is important to normalize it.  While it is a simple concept,
 
A slight digression here, when analyzing indels, it is important to normalize it.  While it is a simple concept,

Navigation menu