Changes

From Genome Analysis Wiki
Jump to: navigation, search
no edit summary
'''Note:''' the latest version of this practical is available at: [[SeqShop: Variant Calling and Filtering for SNPs Practical]]
* The ones here is the original one from the June workshop (updated to be run from elsewhere)
 
 
==Introduction==
See the [[Media:SeqShop - GotCloud snpcall.pdf|introductory slides]] for an intro to this tutorial.
''If you are running during the SeqShop Workshop, please skip this section.''
<div class="mw-collapsible-content">
=== Download the example data ===
 
=== Setup your run environment ===
Environment variables will be used throughout This tutorial builds on the alignment tutorial. We recommend that you setup these variables so , if you won't have to modify every command in the not already, please first run that tutorial.  <div class="mw-collapsible mw-collapsed" style="width:500px">I'm using bash (replace the paths below with the appropriate paths):<div class="mw-collapsible-content">* Point to where you installed GotCloud*:<pre>export GC=/home/username/gotcloud</pre>* Point to where you installed the seqshop files*:<pre>export SS=/home/username/seqshop/</pre>* Point to where you want the output to go*:<pre>export OUT=/home/username/seqshop_output/</pre></div></div> <div class="mw-collapsible mw-collapsed" style="width:500px">I'm using tcsh (replace the paths below with the appropriate paths):<div class="mw-collapsible-content">* Point to where you installed GotCloud*:<pre>setenv GC /home/username/gotcloud</pre>* Point to where you installed the seqshop files*:<pre>setenv SS /home/username/seqshop/</pre>* Point to where you want the output to go*[[SeqShop:<pre>setenv OUT /home/username/seqshop_output/</pre></div></div> </div></div>_Sequence_Mapping_and_Assembly_Practical, June 2014|Alignment Tutorial]]
{{SeqShopRemoteEnv}}
== Examining GotCloud SnpCall Input files ==
Per sample BAM files contain sequence reads that are mapped to positions in the genome.
For a reminder on how to look at/read BAM files, see: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014#BAM_Files|SeqShop Aligment: BAM Files]]
For this tutorial, we will use the 4 BAMs produced in the [[SeqShop: Sequence Mapping and Assembly Practical, June 2014]] as well as with 58 BAMs that were pre-aligned to that 1MB region of chromosome 22.
=== Reference Files ===
<li>View Screenshot</li>
<div class="mw-collapsible-content">
[[File:RefDir.png|500px700px]]
</div>
</div>
;Do you notice a difference between this index and yours?
<ul>
<div class="mw-collapsible mw-collapsed" style="width:500px550px">
<li>Answer:</li>
<div class="mw-collapsible-content">
<li>It doesn't have a full path to the BAM file, while your index has /home/...</li>
[[File:Bamindex1.png|300px]]
<li>That's ok, we will use the <code>--base_prefix ${SS}</code> command-line option to prefix the BAM paths</li><li>Alternatively, we could have set BAM_INDEX in <code>gotcloud.conf</code> contains to the path to those the BAMs<pre>BAM_INDEX = /home/username/seqshop/example</pre> </li>[[File<ul><li>NOTE:BamindexConfthe conf file can't interpret ${SS} environment variables or '~', so you would have to specify the full path</li><li>We just used the command-line option for this tutorial since this path will vary by user.png|300px]]</li></ul>
</div>
</div>
We need to add these BAMs to our index
* Append the bam.index from the pre-aligned BAMs to the one you generated from the alignment pipeline
** '''Be sure to do this command just once'''
cat ${SS}/bams/bam.index >> ${OUT}/bam.index
* ">>" will append to the file that follows it
* Be sure to do this command just once
** Check that your BAM index is the correct size
**:<pre>wc -l ${OUT}/bam.index</pre>
We will use the same configuration file as we used yesterday in GotCloud Align.
See [[SeqShop:_Sequence_Mapping_and_Assembly_Practical_Sequence Mapping and Assembly Practical, June 2014#GotCloud Configuration File|SeqShop: Alignment: GotCloud Configuration File]] for more details
* Note we want to limit snpcall to just chr22 so the configuration already has <code>CHRS = 22</code> (default was 1-22 & X).
Now that we have all of our input files, we need just a simple command to run:
${GC}/gotcloud snpcall --conf ${SS}/gotcloud.conf --numjobs 4 --region 22:36000000-37000000 --base_prefix ${SS} --outdir ${OUT}
* <code>${GC}/gotcloud</code> runs GotCloud
* <code>align</code> tells GotCloud you want to run the alignment pipeline.
* <code>--conf</code> tells GotCloud the name of the configuration file to use.
** The configuration for this test was downloaded with the seqshop input files.
* --numjobs tells GotCloud how many jobs to run in parallel
** Depends on your system
* --region 22:36000000-37000000
** The sample files are just a small region of chromosome 22, so to save time, we tell GotCloud to ignore the other regions
* <code>--base_prefix</code> tells GotCloud the prefix to append to relative paths.
** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS}
** Alternatively, gotcloud.conf could be updated to specify the full paths
* <code>--out_dir</code> tells GotCloud where to write the output.
** This could be specified in gotcloud.conf, but to allow you to use the ${OUT} to change the output location, it is specified on the command-line
<div class="mw-collapsible mw-collapsed" style="width:500px">
=== Running GotCloud Genotype Refinement ===
Since everything is setup, just run the following command (very similar to snpcall).
${GC}/gotcloud ldrefine --conf ${INSS}/gotcloud.conf --numjobs 2 --region 22:36000000-37000000--base_prefix ${SS} --outdir ${OUT}
* Beagle will take about 2-3 minutes to complete
* Thunder will automatically run and will take another 3-4 minutes
 
<div class="mw-collapsible mw-collapsed" style="width:350px">
When completed, it should look like this:
<div class="mw-collapsible-content">
[[File:GcldrefineOut.png]]
</div>
</div>
=== Genotype Refinement Output ===
; What's new in the output directory?
 
<ul>
<div class="mw-collapsible mw-collapsed" style="width:500px">
<li>Answer</li>
<div class="mw-collapsible-content">
:<pre>ls ${OUT}</pre>
<ul>
<li><code>beagle</code> directory : Beagle output</li>
<li><code>thunder</code> directory : Thunder output</li>
<li><code>umake.beagle.conf*</code> : Configuration values Contain the configuration & steps used for GotCloud beagle</li><li><code>umake.beagle.Makefile</code> : GNU makefile for commands run as part of GotCloud beagle</li><li><code>umake.beagle.Makefile.log</code> : Log of the in GotCloud beagle run</li>
<li><code>umake.thunder.*</code> files : Contain the configuration & steps used in GotCloud thunder</li>
</ul>
</ul>
Let's take a look at that interesting location we found in the [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014#Accessing_BAMs_by_Position|alignment tutorial]] : chromosome 22, positions 36907000-36907100
Use tabix to extract that from the VCFs:
The region we selected contains ''APOL1'' gene, which is known to play an important role in kidney diseases such as nephrotic syndrome. One of the non-synonymous risk allele, <code>rs73885139</code> located at position <code>22:36661906</code> increases the risk of nephrotic syndrome by >2-folds. Let's see if we found the interesting variant by looking at the VCF file by position.
${GC}/bin/tabix ${OUT}/vcfs/chr22/chr22.filtered.vcf.gz 22:36661906 | head -1
Did you see a variant at the position?
${GC}/bin/tabix ${OUT}/vcfs/chr22/chr22.filtered.vcf.gz 22:36661906 | head -1
22 36661906 . A G 18 PASS DP=409;MQ=59;NS=62;AN=124;AC=2;AF=0.013827;AB=0.4065;AZ=-0.5287;
FIC=-0.0092;SLRT=-0.0075;HWEAF=0.0138;HWDAF=0.0276,0.0000;LBS=36,36,0,0,1,1,0,0;
Let's check the sequence data to confirm that the variant really exists
${GC}/bin/samtools tview $IN{SS}/bams/HG01242.recal.bam $REF{SS}/ref22/human.g1k.v37.chr22.fa
* Type 'g' to go to a specific position
Let's get some information on the BEAGLE VCF:
perl ${SS}/ext/bed-diff.pl --vcf1 ${SS}/ref22/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/beagle/chr22/chr22.filtered.PASS.beagled.ALL.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.beagle
perl ${EXT}/bed-diff.pl --vcf1 ${REF}/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/beagle/chr22/chr22.filtered.PASS.beagled.ALL.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.beagle
Look at the results:
Now, let's see if it improved after running Thunder VCF:
perl ${EXTSS}/ext/bed-diff.pl --vcf1 ${REFSS}/ref22/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.thunder
Look at the results:

Navigation menu