Line 1: |
Line 1: |
| + | '''Note:''' the latest version of this practical is available at: [[SeqShop: Variant Calling and Filtering for INDELs Practical]] |
| + | * The version here is the original from the June workshop (updated to be run from elsewhere) |
| + | |
| + | |
| == Goals of This Session == | | == Goals of This Session == |
| * What we want to learn | | * What we want to learn |
Line 5: |
Line 9: |
| ** How to evaluate the quality of INDEL calls | | ** How to evaluate the quality of INDEL calls |
| | | |
− | [[Media:Variant Calling and Filtering for INDELs.pdf|Intro Slides]] | + | [[Media:Variant Calling and Filtering for INDELs.pdf|Lecture Slides]] |
| | | |
| == Setup in person at the SeqShop Workshop == | | == Setup in person at the SeqShop Workshop == |
Line 45: |
Line 49: |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| | | |
− | This tutorial builds on the alignment tutorial. If you have not already run that tutorial, please do so first: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical|Alignment Tutorial]] | + | This tutorial builds on the alignment tutorial. If you have not already run that tutorial, please do so first: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014|Alignment Tutorial]] |
| | | |
− | It also uses the bam.index file created in the SnpCall Tutorial. If you have not yet run that tutorial, please follow the directions at: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#GotCloud_BAM_Index_File|GotCloud BAM Index File]] | + | It also uses the bam.index file created in the SnpCall Tutorial. If you have not yet run that tutorial, please follow the directions at: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical, June 2014#GotCloud_BAM_Index_File|GotCloud BAM Index File]] |
| | | |
| | | |
Line 58: |
Line 62: |
| * BAMs->INDELs rather than BAMs->SNPs | | * BAMs->INDELs rather than BAMs->SNPs |
| | | |
− | If you want a reminder of what they look like, here is a link to the previous tutorial: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#Examining_GotCloud_SnpCall_Input_files|GotCloud SnpCall Input Files]] | + | If you want a reminder of what they look like, here is a link to the previous tutorial: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical, June 2014#Examining_GotCloud_SnpCall_Input_files|GotCloud SnpCall Input Files]] |
| | | |
| == Running GotCloud Indel == | | == Running GotCloud Indel == |
Line 264: |
Line 268: |
| The following section details some simple analyses we can perform. | | The following section details some simple analyses we can perform. |
| | | |
− | == Summary == | + | ===Summary=== |
| | | |
| First you want to know what is in the vcf file. | | First you want to know what is in the vcf file. |
Line 305: |
Line 309: |
| | | |
| #passed indels of length >4 | | #passed indels of length >4 |
− | ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&LEN>1" | + | ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&LEN>4" |
| | | |
| #passed indels of length 4 or insertions of length 3 | | #passed indels of length 4 or insertions of length 3 |
| ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&(LEN==4||DLEN==3)" | | ${GC}/bin/vt peek ${OUT}/final/all.genotypes.vcf.gz -f "FILTER.PASS&&(LEN==4||DLEN==3)" |
| | | |
− | == Comparison with other data sets == | + | === Comparison with other data sets === |
| | | |
| It is usually useful to examine the call sets against known data sets for the passed variants. | | It is usually useful to examine the call sets against known data sets for the passed variants. |
Line 319: |
Line 323: |
| Edit indel.reference.txt and specify the correct path to ${SS} | | Edit indel.reference.txt and specify the correct path to ${SS} |
| nedit ${OUT}/indel.reference.txt | | nedit ${OUT}/indel.reference.txt |
| + | *'''Replace all occurrences of <code>username</code> with your username (or the correct path to your seqshop example directory).''' |
| + | *:[[File:IndelRef.png]] |
| | | |
| ${GC}/bin/vt profile_indels -g ${OUT}/indel.reference.txt -r ${SS}/ref22/human.g1k.v37.chr22.fa ${OUT}/final/all.genotypes.vcf.gz -i 22:36000000-37000000 -f "PASS" | | ${GC}/bin/vt profile_indels -g ${OUT}/indel.reference.txt -r ${SS}/ref22/human.g1k.v37.chr22.fa ${OUT}/final/all.genotypes.vcf.gz -i 22:36000000-37000000 -f "PASS" |
Line 388: |
Line 394: |
| This analysis supports filters too. | | This analysis supports filters too. |
| | | |
− | ==Normalization== | + | ===Normalization=== |
| | | |
| A slight digression here: when analyzing indels, it is important to normalize them. While it is a simple concept, | | A slight digression here: when analyzing indels, it is important to normalize them. While it is a simple concept, |