Changes

From Genome Analysis Wiki
no edit summary
Line 1: Line 1: −
  seqshop/gotcloud/gotcloud snpcall --conf seqshop/inputs/gotcloud.conf --numjobs 4 --region 22:36000000-37000000
+
== Goals of This Session ==
 +
* What we want to learn
 +
** Basic variant call file format (VCF)
 +
** How to generate filtered variant calls for SNPs
 +
** How to evaluate the quality of variant calls
 +
** How to visualize the variant calls to examine the variants at particular genomic positions
 +
 
 +
== GotCloud SnpCall Pipeline ==
 +
 
 +
[[File:SnpcallDiagram.png|500px]]
 +
 
 +
=== Why GotCloud?===
 +
Many of the reasons are the same as for using GotCloud align:
 +
* Easy to learn & run
 +
** All-in-one package for the '''SNP calling pipeline'''
 +
** You don’t have to know the details of each individual component
 +
* Robust parallelization
 +
** Automatic partitioning '''by region'''
 +
** Reliable and fault-tolerant parallelization via GNU make
 +
*** Restarts from where it stopped after an unexpected crash
 +
* Cloud & Cluster-friendly
 +
** Supports multiple clusters such as MOSIX, Slurm, & SGE
 +
** Amazon instances allow running large-scale jobs without having your own cluster
 +
* '''Easy to add new samples to your study'''
 +
** Just add them to your index
 +
** GotCloud will reuse the genotype likelihoods for samples already completed
 +
 
 +
 
 +
{{SeqShopLogin}}
 +
 
 +
== Set up your run environment ==
 +
 
 +
This will set up some environment variables that point to the tutorial files and to an output directory.
 +
  source /home/mktrost/seqshop/setup.txt
 +
 
 +
Alternatively, if you would like to change the output directory, copy the file, edit your copy, and source it:
 +
  cp /home/mktrost/seqshop/setup.txt ~/setup.txt
 +
  nedit ~/setup.txt
 +
  source ~/setup.txt
 +
(You can use your favorite editor instead of nedit. I typically use emacs, but nedit behaves more like a Windows editor.)
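After sourcing the setup file, it can help to confirm that the variables used by the commands later on this page are actually defined. The following is a minimal sketch, assuming only that the setup file defines <code>GC</code> and <code>OUTPUT</code> (the two names that appear in the commands on this page). The helper name <code>check_env</code> is hypothetical and not part of GotCloud.

```shell
# Hypothetical helper: report whether the variables used on this page are set.
# GC and OUTPUT are the names used in the tutorial commands; anything else
# that setup.txt may define is not checked here.
check_env() {
  missing=0
  for name in GC OUTPUT; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set; source your setup file first" >&2
      missing=1
    else
      echo "$name=$value"
    fi
  done
  return $missing
}
```

Running <code>check_env</code> after <code>source ~/setup.txt</code> prints each variable with its value, or a message on standard error for any that is missing.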
 +
 
 +
 
 +
== Examining GotCloud SnpCall Input files ==
 +
=== Sequence Alignment Files: BAM Files ===
 +
 
 +
=== Reference Files ===
 +
 
 +
=== GotCloud BAM Index File ===
 +
 
 +
=== GotCloud Configuration File ===
 +
We will use the same configuration file as we used yesterday in GotCloud Align.
 +
 
 +
 
 +
== Run GotCloud SnpCall ==
 +
Now that we have all of our input files, we need just a simple command to run:
 +
  ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/inputs/gotcloud.conf --numjobs 4 --region 22:36000000-37000000
 +
* --numjobs tells GotCloud how many jobs to run in parallel
 +
** Depends on your system
 +
* --region 22:36000000-37000000
 +
** The sample files cover just a small region of chromosome 22, so to save time we tell GotCloud to ignore the other regions
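Because the right <code>--numjobs</code> value depends on your system, one simple starting point is to cap it at the number of available CPU cores. The sketch below is an assumption about a sensible default, not a GotCloud recommendation. It uses <code>nproc</code> from GNU coreutils and falls back to <code>getconf</code> where <code>nproc</code> is unavailable.

```shell
# Suggest a --numjobs value: the tutorial's 4, capped at the core count.
# Capping at the number of cores is an assumption made for this sketch.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
numjobs=4
if [ "$cores" -lt "$numjobs" ]; then
  numjobs=$cores
fi
echo "Suggested --numjobs value: $numjobs"
```

The printed value can then be passed to <code>--numjobs</code> in the snpcall command in place of 4.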
 +
 
 +
This should take about 5 minutes to run.
 +
 
 +
It should end with a line like: <code>TBD</code>
 +
 
 +
If you cancel GotCloud partway through, just rerun the same GotCloud command and it will pick up where it left off.
 +
 
 +
== Examining GotCloud SnpCall Output ==
 +
 
 +
=== Filtering Summary Statistics ===
 +
 
 +
  cat ${OUTPUT}/vcfs/chr22/chr22.filtered.sites.vcf.summary
 +
 
 +
<div class="mw-collapsible mw-collapsed" style="width:250px">
 +
View Annotated Screenshot
 +
<div class="mw-collapsible-content">
 +
[[File:filterSum.png]]
 +
</div>
 +
</div>
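If the run did not finish, the summary file may be missing or empty. The following is a minimal sketch that checks for the file before printing it. The function name <code>show_summary</code> is hypothetical, and the path below the output directory is taken from the <code>cat</code> command in this section.

```shell
# Hypothetical helper: print the filtering summary if it exists and is non-empty.
# The argument is the output directory (the tutorial's ${OUTPUT}).
show_summary() {
  summary="$1/vcfs/chr22/chr22.filtered.sites.vcf.summary"
  if [ -s "$summary" ]; then
    cat "$summary"
  else
    echo "Summary not found or empty: $summary" >&2
    return 1
  fi
}
```

Call it as <code>show_summary "${OUTPUT}"</code>. A non-zero return status suggests that the snpcall step should be rerun.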
 +
 
 +
 
 +
== GotCloud Genotype Refinement ==
 +
 
 
  seqshop/gotcloud/gotcloud beagle --conf seqshop/inputs/gotcloud.conf --numjobs 2 --region 22:36000000-37000000
 
