Changes

From Genome Analysis Wiki
no edit summary
Line 5: Line 5:  
** How to examine the variants at particular genomic positions
 
** How to examine the variants at particular genomic positions
 
** How to evaluate the quality of SNP calls
 
** How to evaluate the quality of SNP calls
  −
== GotCloud SnpCall Pipeline ==
  −
  −
[[File:SnpcallDiagram.png|500px]]
  −
  −
=== Why GotCloud?===
  −
Many of the same reasons as using GotCloud align
  −
* Easy to learn & run
  −
** All-in-one package for '''snp calling pipeline'''
  −
** You don’t have to know the details of individual component
  −
* Robust parallelization
  −
** Automatic partitions '''by regions'''
  −
** Reliable and fault-tolerant parallelization via GNU make
  −
*** Restart from where it stopped upon unexpected crash
  −
* Cloud & Cluster-friendly
  −
** Supports multiple clusters such as MOSIX, Slurm, & SGE
  −
** Amazon instances allow running large-scale jobs without having your own cluster
  −
* '''Easy to add new samples to your study'''
  −
** Just add them to your index
  −
** GotCloud will reuse the genotype likelihoods for samples already completed
         
{{SeqShopLogin}}
 
{{SeqShopLogin}}
   −
== Setup your run environment ==
+
== Setup your run environment==
   −
This will setup some environment variables to point you to the Tutorial files as well as to an output directory.
+
This is the same setup you did for the previous tutorial, but you need to redo it each time you log in. It sets up environment variables that point to:
 +
* GotCloud program
 +
* Tutorial input files
 +
* An output directory
 +
** Your output directory from the previous tutorial is left intact.
 
  source /home/mktrost/seqshop/setup.txt
 
  source /home/mktrost/seqshop/setup.txt
 
  −
Alternatively, if you would like to change the output directory, copy the file, make the modifications and source your file:
  −
cp /home/mktrost/seqshop/setup.txt ~/setup.txt
  −
nedit ~/setup.txt
  −
source ~/setup.txt
  −
(You can use your favorite editor instead of nedit.  I typically use emacs, but nedit is more like Windows.)
+
* You won't see any output after running <code>source</code>
+
** It silently sets up your environment
+
<div class="mw-collapsible mw-collapsed" style="width:200px">
+
View setup.txt
+
<div class="mw-collapsible-content">
+
[[File:setup.png|500px]]
 +
</div>
 +
</div>
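
Because <code>source</code> prints nothing, a quick check can confirm that the environment is in place. The following is a minimal sketch, assuming that setup.txt defines the <code>GC</code> variable used by the snpcall command in this tutorial; the helper name <code>check_gc</code> is hypothetical and not part of GotCloud.

```shell
# Hypothetical helper: report whether GC is set in the current shell.
# Assumption: setup.txt sets GC, since the snpcall command refers to ${GC}.
check_gc() {
  if [ -n "${GC:-}" ]; then
    echo "GC is set to ${GC}"
  else
    echo "GC is not set; run the source command again"
  fi
}

check_gc
```

If the second message appears, sourcing setup.txt again in the same shell should restore the variable.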
       
== Examining GotCloud SnpCall Input files ==
 
== Examining GotCloud SnpCall Input files ==
 
=== Sequence Alignment Files: BAM Files ===
 
=== Sequence Alignment Files: BAM Files ===
 +
    
=== Reference Files ===
 
=== Reference Files ===
 +
Reference files can be downloaded with GotCloud or from other sources
 +
* See [[GotCloud: Genetic Reference and Resource Files]] for more information on downloading/generating reference files
 +
 +
For GotCloud snpcall, you need:
 +
# Reference genome FASTA file
 +
#* Contains the reference base for each position of each chromosome
 +
#* Additional information on the FASTA format: http://en.wikipedia.org/wiki/FASTA_format
 +
# VCF (variant call format) files with chromosomes/positions
 +
#* dbSNP - used to skip known variants when recalibrating
 +
#* HapMap - used for sample contamination/sample swap validation
 +
#*
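
To get a feel for what the reference FASTA contains, it can help to list each sequence name with its length. The sketch below is illustrative only: the function name <code>fasta_lengths</code> is hypothetical, and it assumes a plain-text (uncompressed) FASTA file in which each record starts with a <code>></code> header line.

```shell
# Hypothetical helper: print each FASTA record name and its sequence length,
# separated by a tab. Assumes an uncompressed FASTA file as the first argument.
fasta_lengths() {
  awk '
    /^>/ {
      # A new header: emit the previous record, then start counting again.
      if (name != "") print name "\t" len
      name = substr($1, 2)   # drop the leading ">" and any description text
      len = 0
      next
    }
    { len += length($0) }    # sequence lines add to the current record
    END { if (name != "") print name "\t" len }
  ' "$1"
}
```

It would be called with the path to your own reference file, for example <code>fasta_lengths my_reference.fa</code>, where the file name is a placeholder.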
 +
    
=== GotCloud BAM Index File ===
 
=== GotCloud BAM Index File ===
Line 53: Line 52:     
== Run GotCloud SnpCall ==
 
== Run GotCloud SnpCall ==
 +
[[File:SnpcallDiagram.png|500px]]
 +
 
Now that we have all of our input files, we need just a simple command to run:
 
Now that we have all of our input files, we need just a simple command to run:
 
  ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/inputs/gotcloud.conf --numjobs 4 --region 22:36000000-37000000
 
  ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/inputs/gotcloud.conf --numjobs 4 --region 22:36000000-37000000
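
The <code>--region</code> value in the command above reads as chromosome:start-end. As a small sketch of how such a string breaks down, the hypothetical helper below (not part of GotCloud) splits it with shell parameter expansion and reports the span, assuming both coordinates are inclusive.

```shell
# Hypothetical helper: split a region string of the form CHROM:START-END.
# Assumption: START and END are both inclusive, so span = END - START + 1.
region_info() {
  region="$1"
  chrom="${region%%:*}"    # text before the first ":"
  range="${region#*:}"     # text after the first ":"
  start="${range%-*}"      # text before the "-"
  end="${range#*-}"        # text after the "-"
  echo "chrom=${chrom} start=${start} end=${end} span=$((end - start + 1))"
}

region_info 22:36000000-37000000
# prints: chrom=22 start=36000000 end=37000000 span=1000001
```

So the tutorial run is limited to roughly one megabase of chromosome 22.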
