From Genome Analysis Wiki
Jump to: navigation, search
no edit summary
'''Note:''' the latest version of this practical is available at: [[SeqShop: Variant Calling and Filtering for SNPs Practical]]
* The ones here is the original one from the June workshop (updated to be run from elsewhere)
See the [[Media:SeqShop - GotCloud snpcall.pdf|introductory slides]] for an intro to this tutorial.
''If you are running during the SeqShop Workshop, please skip this section.''
<div class="mw-collapsible-content">
=== Download the example data ===
=== Setup your run environment ===
Environment variables will be used throughout This tutorial builds on the alignment tutorial. We recommend that you setup these variables so , if you won't have to modify every command in the not already, please first run that tutorial.  <div class="mw-collapsible mw-collapsed" style="width:500px">I'm using bash (replace the paths below with the appropriate paths):<div class="mw-collapsible-content">* Point to where you installed GotCloud*:<pre>export GC=/home/username/gotcloud</pre>* Point to where you installed the seqshop files*:<pre>export SS=/home/username/seqshop/</pre>* Point to where you want the output to go*:<pre>export OUT=/home/username/seqshop_output/</pre></div></div> <div class="mw-collapsible mw-collapsed" style="width:500px">I'm using tcsh (replace the paths below with the appropriate paths):<div class="mw-collapsible-content">* Point to where you installed GotCloud*:<pre>setenv GC /home/username/gotcloud</pre>* Point to where you installed the seqshop files*:<pre>setenv SS /home/username/seqshop/</pre>* Point to where you want the output to go*[[SeqShop:<pre>setenv OUT /home/username/seqshop_output/</pre></div></div> </div></div>_Sequence_Mapping_and_Assembly_Practical, June 2014|Alignment Tutorial]]
== Examining GotCloud SnpCall Input files ==
Per sample BAM files contain sequence reads that are mapped to positions in the genome.
For a reminder on how to look at/read BAM files, see: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014#BAM_Files|SeqShop Aligment: BAM Files]]
For this tutorial, we will use the 4 BAMs produced in the [[SeqShop: Sequence Mapping and Assembly Practical, June 2014]] as well as with 58 BAMs that were pre-aligned to that 1MB region of chromosome 22.
=== Reference Files ===
<li>View Screenshot</li>
<div class="mw-collapsible-content">
;Do you notice a difference between this index and yours?
<div class="mw-collapsible mw-collapsed" style="width:500px550px">
<div class="mw-collapsible-content">
<li>It doesn't have a full path to the BAM file, while your index has /home/...</li>
<li>That's ok, we will use the <code>--base_prefix ${SS}</code> command-line option to prefix the BAM paths</li><li>Alternatively, we could have set BAM_INDEX in <code>gotcloud.conf</code> contains to the path to those the BAMs<pre>BAM_INDEX = /home/username/seqshop/example</pre> </li>[[File<ul><li>NOTE:BamindexConfthe conf file can't interpret ${SS} environment variables or '~', so you would have to specify the full path</li><li>We just used the command-line option for this tutorial since this path will vary by user.png|300px]]</li></ul>
We need to add these BAMs to our index
* Append the bam.index from the pre-aligned BAMs to the one you generated from the alignment pipeline
** '''Be sure to do this command just once'''
cat ${SS}/bams/bam.index >> ${OUT}/bam.index
* ">>" will append to the file that follows it
* Be sure to do this command just once
** Check that your BAM index is the correct size
**:<pre>wc -l ${OUT}/bam.index</pre>
We will use the same configuration file as we used yesterday in GotCloud Align.
See [[SeqShop:_Sequence_Mapping_and_Assembly_Practical_Sequence Mapping and Assembly Practical, June 2014#GotCloud Configuration File|SeqShop: Alignment: GotCloud Configuration File]] for more details
* Note we want to limit snpcall to just chr22 so the configuration already has <code>CHRS = 22</code> (default was 1-22 & X).
Now that we have all of our input files, we need just a simple command to run:
${GC}/gotcloud snpcall --conf ${SS}/gotcloud.conf --numjobs 4 --region 22:36000000-37000000 --base_prefix ${SS} --outdir ${OUT}
* <code>${GC}/gotcloud</code> runs GotCloud
* <code>align</code> tells GotCloud you want to run the alignment pipeline.
* <code>--conf</code> tells GotCloud the name of the configuration file to use.
** The configuration for this test was downloaded with the seqshop input files.
* --numjobs tells GotCloud how many jobs to run in parallel
** Depends on your system
* --region 22:36000000-37000000
** The sample files are just a small region of chromosome 22, so to save time, we tell GotCloud to ignore the other regions
* <code>--base_prefix</code> tells GotCloud the prefix to append to relative paths.
** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS}
** Alternatively, gotcloud.conf could be updated to specify the full paths
* <code>--out_dir</code> tells GotCloud where to write the output.
** This could be specified in gotcloud.conf, but to allow you to use the ${OUT} to change the output location, it is specified on the command-line
<div class="mw-collapsible mw-collapsed" style="width:500px">
* Beagle will take about 2-3 minutes to complete
* Thunder will automatically run and will take another 3-4 minutes
<div class="mw-collapsible mw-collapsed" style="width:350px">
When completed, it should look like this:
<div class="mw-collapsible-content">
=== Genotype Refinement Output ===
; What's new in the output directory?
<div class="mw-collapsible mw-collapsed" style="width:500px">
<div class="mw-collapsible-content">
:<pre>ls ${OUT}</pre>
<li><code>beagle</code> directory : Beagle output</li>
<li><code>thunder</code> directory : Thunder output</li>
<li><code>umake.beagle.conf*</code> : Configuration values Contain the configuration & steps used for GotCloud beagle</li><li><code>umake.beagle.Makefile</code> : GNU makefile for commands run as part of GotCloud beagle</li><li><code>umake.beagle.Makefile.log</code> : Log of the in GotCloud beagle run</li>
<li><code>umake.thunder.*</code> files : Contain the configuration & steps used in GotCloud thunder</li>
Let's take a look at that interesting location we found in the [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014#Accessing_BAMs_by_Position|alignment tutorial]] : chromosome 22, positions 36907000-36907100
Use tabix to extract that from the VCFs:
Let's get some information on the BEAGLE VCF:
perl ${SS}/ext/ --vcf1 ${SS}/ref22/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/beagle/chr22/chr22.filtered.PASS.beagled.ALL.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.beagle
perl ${EXT}/ --vcf1 ${REF}/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/beagle/chr22/chr22.filtered.PASS.beagled.ALL.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.beagle
Look at the results:
Now, let's see if it improved after running Thunder VCF:
perl ${EXTSS}/ext/ --vcf1 ${REFSS}/ref22/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.thunder
Look at the results:

Navigation menu