Changes

From Genome Analysis Wiki
2,273 bytes added ,  12:55, 31 October 2014
no edit summary
Line 1: Line 1:  +
__TOC__
 +
 +
Back to [[GotCloud]]
 +
 +
Back to [[GotCloud: Amazon]]
 +
 
== Introduction ==
 
 
This Amazon demo runs through the GotCloud SNP and INDEL calling pipelines.
 
Line 9: Line 15:  
* [[SeqShop: Variant Calling and Filtering for INDELs Practical]]
 
 
* [[SeqShop: Analysis of Structural Variation Practical]]
 
 +
 +
 +
To run this Demo on an Amazon Cluster rather than on a single node, see: [[StarCluster#Run_GotCloud_Demo_Using_StarCluster|StarCluster -> Run GotCloud Demo Using StarCluster]]
 +
    
== Starting up a Node ==
 
Line 15: Line 25:     
== Running the Demo on Already Running Node ==
 
=== Examine the Setup ===
+
{{GotCloud: Amazon Demo Setup}}
#After logging into the Amazon node as the ubuntu user, you should by default be in the ubuntu home directory: <code>/home/ubuntu</code>
  −
## You can check this by doing:
  −
##:<pre>pwd</pre>
  −
##* This should output: <code>/home/ubuntu</code>
  −
## Take a look at the contents of the ubuntu user home directory
  −
##: <pre>ls</pre>
  −
##* This should output 2 directories, <code>example</code> and <code>gotcloud</code>
  −
##** The <code>example</code> directory contains the files for this demo
  −
##** The <code>gotcloud</code> directory contains the gotcloud programs and pre-compiled source
  −
#:[[File:DemoHome.png|400px]]
  −
# Look at the example input files:
  −
#:<pre>ls example</pre>
  −
#:[[File:ExampleFiles.png|400px]]
  −
#:# <code>bam.list</code> contains the list of BAM files per sample
  −
#:# <code>bams</code> is a subdirectory containing the BAM files for this demo
  −
#:# <code>test.bed</code> contains the region we want to process in this demo
  −
#:#* To make the demo run faster, we only want to process a small region of chromosome 22. This file tells GotCloud the region.  The region we are using is the APOL1 region
  −
#:#:[[File:BedContents.png|400px]]
  −
#:# <code>test.conf</code> contains the settings we want GotCloud to use for this run
  −
#:#:[[File:ConfContents.png|400px]]
  −
#:#:* For the demo, we want to tell GotCloud:
  −
#:#:*# The list of bams to use: <code>BAM_LIST = example/bam.list</code>
  −
#:#:*# The region to process rather than the whole genome: <code>UNIFORM_TARGET_BED = example/test.bed</code>
  −
#:#:*# The chromosomes to process.  The default chromosomes are 1-22 & X, but we only want to process chromosome 22: <code>CHRS = 22</code>
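The three settings listed above can be checked from the command line before starting a run. The following is a minimal sketch, assuming the configuration file uses the <code>KEY = value</code> layout shown above; <code>check_conf_keys</code> is a hypothetical helper written for this illustration and is not part of GotCloud.

```shell
# Sketch only: check_conf_keys is a hypothetical helper, not a GotCloud tool.
# It reports the value of each of the three demo settings, or says it is missing.
check_conf_keys() {
    conf="$1"
    for key in BAM_LIST UNIFORM_TARGET_BED CHRS; do
        # Strip "KEY =" (with optional surrounding whitespace) and keep the value
        value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$conf" | head -n 1)
        if [ -n "$value" ]; then
            echo "$key = $value"
        else
            echo "$key is not set"
        fi
    done
}
```

On the demo node this could be called as <code>check_conf_keys example/test.conf</code>.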
      
=== Run GotCloud SnpCall ===
 
Line 55: Line 41:  
# ls on the ubuntu home directory to see the new output directory:
 
 
#:<pre>ls</pre>
 
#:[[File:SnpCallOutput1.png|400px]]
+
#:[[File:SnpCallOutput1.png|500px]]
 
# Look inside the output directory:
 
 
#:<pre>ls output</pre>
 
#:[[File:SnpCallOutput2.png|400px]]
+
#:[[File:SnpCallOutput2.png|700px]]
#:#<code>glfs</code>
+
#:#<code>glfs</code> - intermediate per sample genotype likelihood files
 +
#:#<code>jobfiles</code> - empty; was used to store commands as GotCloud was running
 +
#:#<code>pvcfs</code> - intermediate vcfs with per sample information
 +
#:#<code>split</code> - contains per chromosome directories with vcfs containing PASS only snps split up as required for beagle (part of ldrefine)
 +
#:#<code>target</code> - contains the bed with the region to be processed
 +
#:#<code>umake.snpcall.conf</code> - file containing all of the configuration settings used for this run of GotCloud
 +
#:#<code>umake.snpcall.Makefile</code> - Makefile containing commands for this run of GotCloud
 +
#:#<code>umake.snpcall.Makefile.cluster</code> - Makefile log of start/stop times of various steps
 +
#:#<code>umake.snpcall.Makefile.log</code> - log of the GotCloud run
 +
#:#<code>vcfs</code> - contains per chromosome directories with vcfs
 +
#:#* The important output files are:
 +
#:#*# filtered.vcf.gz file: <code>vcfs/chr22/chr22.filtered.vcf.gz</code>
 +
#:#*# summary information: <code>vcfs/chr22/chr22.filtered.sites.vcf.summary</code>
 +
# Look at the filtered file.
 +
#: <pre>zless -S output/vcfs/chr22/chr22.filtered.vcf.gz</pre>
 +
#: Scroll down some: [[File:SnpVcfOutput.png|700px]]
 +
## Check that GotCloud found the expected SNP at 22:36661906
 +
##:<pre>tabix output/vcfs/chr22/chr22.filtered.vcf.gz 22:36661906 |head -1</pre>
 +
##:[[File:SnpTabix.png|700px]]
 +
# Look at the summary information
 +
#:<pre>cat output/vcfs/chr22/chr22.filtered.sites.vcf.summary</pre>
 +
#:[[File:SnpSummary.png|700px]]
 +
#:*'''To understand how to interpret the filtering summary statistics, please refer to [[Understanding vcf-summary output]]'''
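As a quick cross-check of the filtered VCF examined above, the number of records that passed filtering can be counted directly. This is a minimal sketch, assuming the file follows the standard VCF layout in which column 7 is FILTER; <code>count_pass_variants</code> is a hypothetical helper name, not a GotCloud command.

```shell
# Sketch only: count_pass_variants is a hypothetical helper, not a GotCloud tool.
count_pass_variants() {
    # Decompress, skip header lines (starting with "#"),
    # and count data rows whose FILTER column (column 7) is exactly PASS.
    gzip -dc "$1" | awk -F '\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}
```

On the demo node this could be called as <code>count_pass_variants output/vcfs/chr22/chr22.filtered.vcf.gz</code>. The summary file remains the authoritative source for the filtering statistics.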
 +
 
 +
 
 +
=== Run GotCloud Indel ===
 +
Now that we have examined the SnpCall output, run the GotCloud indel pipeline:
 +
# <pre>gotcloud indel --conf example/test.conf --outdir output --numjobs 8</pre>
 +
#* The ubuntu user is set up to have the gotcloud program and tools in its path, so you can just type the program name and it will be found
 +
#: [[File:RunIndel.png|700px]]
 +
#:* This will take a few minutes to run.
 +
#:* GotCloud first generates a makefile, and then runs the makefile
 +
# When complete, GotCloud indel will indicate success/failure
 +
#:[[File:IndelSuccess.png|700px]]
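For unattended runs it can help to capture the on-screen output in a file and report the result afterwards. The following is a minimal sketch of a generic wrapper; <code>run_and_report</code> is a hypothetical helper, and it assumes the wrapped command signals failure through a non-zero exit status, which this page does not confirm for <code>gotcloud</code>.

```shell
# Sketch only: run_and_report is a hypothetical helper, not a GotCloud tool.
# Usage: run_and_report LOGFILE COMMAND [ARGS...]
run_and_report() {
    log="$1"
    shift
    # Send both stdout and stderr of the command to the log file
    if "$@" > "$log" 2>&1; then
        echo "SUCCESS: $1 (log: $log)"
    else
        status=$?
        echo "FAILURE: $1 exited with status $status (log: $log)"
        return "$status"
    fi
}
```

For example: <code>run_and_report indel.log gotcloud indel --conf example/test.conf --outdir output --numjobs 8</code>. The success or failure message that GotCloud itself prints ends up in the log file.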
 +
 
 +
==== Examining Indel Output ====
 +
# Look inside the output directory to see the new indel directories:
 +
#:<pre>ls output</pre>
 +
#:[[File:IndelSnpOutput.png|900px]]
 +
#:#<code>aux</code> - intermediate indel files
 +
#:#<code>final</code> - final output of Indel pipeline
 +
#:#<code>indelvcf</code> - intermediate indel files
 +
#:#<code>gotcloud.indel.conf</code> - file containing all of the configuration settings used for this run of GotCloud
 +
#:#<code>gotcloud.indel.Makefile</code> - Makefile containing commands for this run of GotCloud
 +
#:#<code>gotcloud.indel.Makefile.log</code> - log of the GotCloud run
 +
# Important Indel output is in <code>output/final</code>
 +
#:<pre>ls output/final/</pre>
 +
#: [[File:IndelFinal.png|900px]]
 +
# Look at the final Indel VCF file.
 +
#: <pre>zless output/final/all.genotypes.vcf.gz</pre>
 +
#: Scroll down some: [[File:IndelLess.png|900px]]
 +
## Check that GotCloud found the expected Indel at 22:36662041
 +
##:<pre>tabix output/final/all.genotypes.vcf.gz 22:36662041-36662041|less -S</pre>
 +
##:[[File:IndelTabix.png|900px]]
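The position checks above rely on <code>tabix</code> and an index file. Where no index is available, a similar check can be sketched with <code>gzip</code> and <code>awk</code>. The helper below is hypothetical and only approximates a region query: it treats a record as covering positions POS through POS + length(REF) - 1 and ignores any END information in the INFO column.

```shell
# Sketch only: variants_overlapping is a hypothetical helper, not a GotCloud or tabix command.
# Usage: variants_overlapping FILE.vcf.gz CHROM START [END]   (END defaults to START)
variants_overlapping() {
    vcf="$1"; chrom="$2"; start="$3"; end="${4:-$3}"
    gzip -dc "$vcf" | awk -F '\t' -v c="$chrom" -v s="$start" -v e="$end" '
        /^#/ { next }                      # skip header lines
        $1 == c {
            rec_end = $2 + length($4) - 1  # last reference base covered by REF
            if ($2 <= e && rec_end >= s) print
        }'
}
```

For example: <code>variants_overlapping output/final/all.genotypes.vcf.gz 22 36662041 | less -S</code>. Because this scans the whole file, it is only practical for small files such as the ones in this demo.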
 +
 
 +
== Exit & Terminate ==
 +
Before terminating, make sure you copy any data you want to keep off the root EBS volume attached to the instance, because that volume is deleted when the instance is terminated.
 +
# Exit from your terminal
 +
# Terminate your Amazon Instance
 +
#* Right click on the instance you want to terminate in the EC2 dashboard
 +
#* Select Terminate
 +
#* Select "Yes, Terminate" to confirm that you want to terminate the instance and delete its storage.
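To make copying results off the node easier before terminating, the directories of interest can first be bundled into a single archive. This is a minimal sketch; <code>archive_results</code> and the archive file name in the example are hypothetical choices for this illustration.

```shell
# Sketch only: archive_results is a hypothetical helper, not a GotCloud tool.
# Usage: archive_results ARCHIVE.tar.gz PATH [PATH...]
archive_results() {
    archive="$1"
    shift
    # Create the compressed archive, then list it once to confirm it is readable
    tar -czf "$archive" "$@" && tar -tzf "$archive" > /dev/null && echo "wrote $archive"
}
```

For example, from <code>/home/ubuntu</code>: <code>archive_results gotcloud_demo_results.tar.gz output/vcfs output/final</code>. The archive can then be copied to your own machine (for instance with <code>scp</code>, using your own key file and the instance's address) before you terminate.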
