
From Genome Analysis Wiki
5,581 bytes added, 10:49, 7 November 2014
__TOC__
 
Back to the beginning: [[GotCloud]]
 
    
Back to [[GotCloud: Amazon]]
 
      
If you have access to your own cluster, your task will be much simpler.
 
of the many variations you might choose, but should provide you a working example.
 
      
== Getting Started With StarCluster ==
 
StarCluster provides extensive documentation (http://star.mit.edu/cluster/), which covers it in far more detail than this page.
    
To install and set up StarCluster for the first time, you can follow the QuickStart instructions: http://star.mit.edu/cluster/docs/latest/quickstart.html
*** If you need help setting up your AWS credentials, see: [[AWS Credentials]]
 
If you want, you can skip the step in the QuickStart instructions that actually starts the cluster.
    
''' Troubleshooting: '''  When I tried this, the <code>starcluster start mycluster</code> step failed with an error similar to:
''' Don't forget to terminate your cluster:'''
 
 
  starcluster terminate mycluster
 
    
== StarCluster and GotCloud ==
 
=== StarCluster Config Settings ===
 
By default, StarCluster expects a configuration file in ~/.starcluster/config.
 
 
* StarCluster will create a model file for you
 
* You should have set these in [[#Getting Started With StarCluster|Getting Started With StarCluster]] above (QuickStart guide and AWS Credentials).
* GotCloud Cluster Definition
** You may want to create a new cluster section for running GotCloud (or you can use <code>smallcluster</code>) in your configuration file: <code>~/.starcluster/config</code>
** You can call it anything you want, for example, <code>gccluster</code>
** Example:
**: <pre>[cluster gccluster]&#10;KEYNAME = mykey&#10;CLUSTER_SIZE = 4&#10;CLUSTER_USER = sgeadmin&#10;CLUSTER_SHELL = bash&#10;MASTER_IMAGE_ID = ami-6ae65e02&#10;NODE_IMAGE_ID = ami-3393a45a&#10;NODE_INSTANCE_TYPE = m3.large</pre>
*** Set <code>KEYNAME</code> to the key you want to use
*** Set <code>CLUSTER_SIZE</code> to the number of nodes you want to start up (this may be different from 4)
*** Set <code>CLUSTER_USER</code> to the user account StarCluster should create on the cluster, like <code>sgeadmin</code>
*** Set <code>CLUSTER_SHELL</code> to the shell you want to use, like <code>bash</code>
*** Set <code>MASTER_IMAGE_ID</code> to the latest GotCloud AMI, see: [[GotCloud: AMIs]]
**** Contains GotCloud, the reference, and the demo files in the /home/ubuntu/ directory, which will be visible on all nodes in the cluster
**** Has a 30G volume, but only 6G available
*** Set <code>NODE_IMAGE_ID</code> to a StarCluster <code>ubuntu x86_64</code> AMI
**** Since each node does not need its own 30G volume containing GotCloud, the reference, and the demo files, we use a separate image for the nodes.
**** The nodes can just access the master's copy of GotCloud, the reference, and the Amazon demo
*** Set <code>NODE_INSTANCE_TYPE</code> to the type of instances you want to start in your cluster
**** See http://aws.amazon.com/ec2/pricing/ for instance descriptions and prices
**** We do not recommend running GotCloud on machines with less than 4GB of memory
** <code>CLUSTER_SIZE</code> * CPUs in <code>NODE_INSTANCE_TYPE</code> = the number of jobs you can run concurrently in GotCloud
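Because <code>~/.starcluster/config</code> is an INI-style file, a typo in the cluster section can be caught before you start any instances by parsing it yourself. The sketch below uses Python's standard <code>configparser</code> module (Python 3.9+ assumed). The list of required keys is only the set used in the example above, an assumption for illustration rather than StarCluster's own validation, and <code>check_cluster_section</code> is a hypothetical helper name.

```python
import configparser
from pathlib import Path

# Keys set in the example [cluster gccluster] section on this page.
# Illustrative assumption only: StarCluster applies its own checks on load.
REQUIRED_KEYS = (
    "KEYNAME",
    "CLUSTER_SIZE",
    "CLUSTER_USER",
    "CLUSTER_SHELL",
    "MASTER_IMAGE_ID",
    "NODE_IMAGE_ID",
    "NODE_INSTANCE_TYPE",
)


def check_cluster_section(config_text: str, template: str = "gccluster") -> list[str]:
    """Return a list of problems found in the [cluster <template>] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written (upper-case)
    parser.read_string(config_text)

    section = f"cluster {template}"
    if not parser.has_section(section):
        return [f"missing section [{section}]"]

    # A key counts as missing if it is absent or has an empty value.
    problems = [f"missing {key}" for key in REQUIRED_KEYS
                if not parser.get(section, key, fallback="")]
    size = parser.get(section, "CLUSTER_SIZE", fallback="")
    if size and not size.isdigit():
        problems.append(f"CLUSTER_SIZE should be a whole number, got {size!r}")
    return problems


if __name__ == "__main__":
    config_path = Path.home() / ".starcluster" / "config"
    if config_path.exists():
        for problem in check_cluster_section(config_path.read_text()):
            print(problem)
```

An empty result means every key from the example section is present and <code>CLUSTER_SIZE</code> is a whole number; it does not guarantee the AMI IDs or key name are valid on AWS.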
 
* Define Data Volumes
** By default, the GotCloud AMI contains about 5G of extra space that you can use
*** The /home/ubuntu/ directory is visible from all machines
**** Use /home/ubuntu/ for the output directory if your output is under 5G
**** This directory will be deleted when you terminate the cluster
** Create your own volumes and attach them to the GotCloud cluster
*** '''Instructions TBD'''
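Before choosing /home/ubuntu/ as your output directory, you can check how much free space is actually left there. This is a minimal sketch using Python's <code>shutil.disk_usage</code>; the function name, the 1 GiB safety margin, and the availability of Python 3.9+ on the master node are all assumptions, and the size estimate for your output has to come from you.

```python
import os
import shutil

GIB = 1024 ** 3


def fits_on_volume(path: str, expected_output_bytes: int,
                   margin_bytes: int = 1 * GIB) -> bool:
    """True if the filesystem holding `path` has room for the output plus a margin.

    The default 1 GiB margin is an illustrative assumption, not a GotCloud rule.
    """
    free = shutil.disk_usage(path).free
    return free >= expected_output_bytes + margin_bytes


if __name__ == "__main__":
    target = "/home/ubuntu"  # shared directory on the GotCloud master node
    if os.path.isdir(target):
        print(f"3 GiB of output fits: {fits_on_volume(target, 3 * GIB)}")
```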
 
=== Starting the Cluster ===
# Start the cluster:
#* <pre>starcluster start -c gccluster mycluster</pre>
#** Alternatively, if you set the default template in the <code>[global]</code> section at the start of the configuration file to gccluster (<code>DEFAULT_TEMPLATE=gccluster</code>), you can run:
#*** <pre>starcluster start mycluster</pre>
#* It will take a few minutes for the cluster to start
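The two ways of starting the cluster differ only in where the template name comes from: <code>-c</code> names it explicitly, otherwise StarCluster falls back to <code>DEFAULT_TEMPLATE</code> in the <code>[global]</code> section. The hypothetical helper below mirrors that rule with Python's <code>configparser</code> so you can confirm which template a plain <code>starcluster start mycluster</code> would use; it is an illustration, not StarCluster's own lookup code.

```python
import configparser
from typing import Optional


def template_to_use(config_text: str, c_option: Optional[str] = None) -> str:
    """Return the cluster template that `starcluster start` would use.

    Mirrors the rule described on this page: an explicit -c value wins,
    otherwise DEFAULT_TEMPLATE from the [global] section is used.
    """
    if c_option:
        return c_option
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # StarCluster keys are upper-case
    parser.read_string(config_text)
    # fallback also covers a config file with no [global] section at all
    template = parser.get("global", "DEFAULT_TEMPLATE", fallback="")
    if not template:
        raise ValueError("no -c option given and no DEFAULT_TEMPLATE in [global]")
    if not parser.has_section(f"cluster {template}"):
        raise ValueError(f"DEFAULT_TEMPLATE refers to a missing [cluster {template}] section")
    return template
```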
 
=== Copying Data to/from the Cluster ===
Copy data onto the cluster (run this command on your local machine):
<pre>starcluster put mycluster /path/to/local/file/or/dir /remote/path/</pre>

Pull data from the cluster onto your local machine (run this command on your local machine):
<pre>starcluster get mycluster /path/to/remote/file/or/dir /local/path/</pre>

'''Reminder: if you write your output to /home/ubuntu/, it will be deleted when you terminate the cluster.'''
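If you script your transfers (for example, to pull several result directories back before terminating), it can help to build the <code>starcluster put</code>/<code>get</code> command lines in one place. This is a hypothetical helper, assuming StarCluster's usage of giving the cluster tag before the source and destination paths; it only assembles the argument list, which you would then pass to <code>subprocess.run</code> yourself.

```python
import shlex


def starcluster_copy_cmd(direction: str, cluster: str, source: str, dest: str) -> list[str]:
    """Build the argv for `starcluster put` (local -> cluster) or `get` (cluster -> local).

    Hypothetical helper for scripting; it does not run anything.
    """
    if direction not in ("put", "get"):
        raise ValueError("direction must be 'put' or 'get'")
    # The cluster tag comes before the source and destination paths.
    return ["starcluster", direction, cluster, source, dest]


if __name__ == "__main__":
    # Example: pull a results directory back to the local machine before terminating.
    cmd = starcluster_copy_cmd("get", "mycluster", "/home/ubuntu/output", "./results/")
    print(shlex.join(cmd))  # review it, then run with subprocess.run(cmd, check=True)
```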
 
=== Running GotCloud on StarCluster ===
* If you have not already done so, log on to the cluster as ubuntu:
** <pre>starcluster sshmaster -u ubuntu mycluster</pre>
*** Type <code>yes</code> if the terminal asks if you want to continue connecting
* When running GotCloud:
** Set the cluster/batch type either in the configuration file or on the command line:
*** In the configuration file:
***: <pre>BATCH_TYPE = sgei</pre>
*** On the command line:
***: <pre>--batchtype sgei</pre>
** Set the number of jobs to run:
**: <pre>--numjobs #</pre>
*** Replace <code>#</code> with the number of concurrent jobs you want to run (probably <code>CLUSTER_SIZE</code> * CPUs in <code>NODE_INSTANCE_TYPE</code>)
** Otherwise, run GotCloud as you normally would.

To log in to a specific non-master node, run:
<pre>starcluster sshnode -u ubuntu mycluster node001</pre>
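The suggested <code>--numjobs</code> value is simple arithmetic: <code>CLUSTER_SIZE</code> multiplied by the CPUs per instance. With the example settings (<code>CLUSTER_SIZE = 4</code>, <code>m3.large</code> with 2 vCPUs) that gives the 8 used in the demo commands. The sketch below computes it and assembles a GotCloud command with <code>--batchtype sgei</code>; the CPU table is a hypothetical lookup you would extend from the AWS instance listing, and the script only prints the command rather than running <code>gotcloud</code>.

```python
import shlex

# vCPUs per EC2 instance type. Only m3.large (2 vCPUs) is filled in here;
# this table is a hypothetical lookup to extend from the AWS instance listing.
VCPUS_PER_INSTANCE = {"m3.large": 2}


def concurrent_jobs(cluster_size: int, instance_type: str) -> int:
    """CLUSTER_SIZE * CPUs in NODE_INSTANCE_TYPE, as described on this page."""
    return cluster_size * VCPUS_PER_INSTANCE[instance_type]


def gotcloud_cmd(step: str, conf: str, outdir: str,
                 cluster_size: int, instance_type: str) -> list[str]:
    """Assemble a GotCloud command line that submits its jobs through SGE."""
    return [
        "gotcloud", step,
        "--conf", conf,
        "--outdir", outdir,
        "--numjobs", str(concurrent_jobs(cluster_size, instance_type)),
        "--batchtype", "sgei",
    ]


if __name__ == "__main__":
    # Example settings from this page: CLUSTER_SIZE = 4, NODE_INSTANCE_TYPE = m3.large.
    cmd = gotcloud_cmd("snpcall", "example/test.conf", "output", 4, "m3.large")
    print(shlex.join(cmd))
    # prints: gotcloud snpcall --conf example/test.conf --outdir output --numjobs 8 --batchtype sgei
```

Passing <code>indel</code> instead of <code>snpcall</code> gives the matching indel command.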
=== Monitoring Cluster Usage ===
* Monitor jobs in the queue
** <pre>qstat</pre>
** This shows the currently running jobs and how they are spread across the nodes in your cluster
**:[[File:Qstat.png|800px]]
*** State descriptions:
**** <code>qw</code> : queued and waiting (not yet assigned to a node)
**** <code>r</code> : running
* View Sun Grid Engine load
** <pre>qhost</pre>
**:[[File:Qhost.png|600px]]
*** ARCH : architecture
*** NCPU : number of CPUs
*** LOAD : current load
*** MEMTOT : total memory
*** MEMUSE : memory in use
*** SWAPTO : total swap space
*** SWAPUS : swap space in use
* View the average load per node using:
** <pre>qstat -f</pre>
**:[[File:Qstatf.png|650px]]
*** The <code>load_avg</code> field contains the load average for each node
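If you want these numbers in a script rather than on screen (for example, to log how many jobs are still waiting), the plain-text output of <code>qstat</code> and <code>qhost</code> is easy to parse. This sketch assumes the default column layout (a header line followed by a dashed separator line, with the columns listed above); SGE versions differ, so check the columns on your own cluster before relying on it. The function names are hypothetical, and Python 3.9+ is assumed.

```python
from collections import Counter


def _data_rows(text: str) -> list[list[str]]:
    """Split SGE table output into whitespace-separated rows.

    Skips blank lines, the column-header line, and the dashed separator line.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("job-ID", "HOSTNAME") or fields[0].startswith("---"):
            continue
        rows.append(fields)
    return rows


def job_states(qstat_text: str) -> Counter:
    """Count jobs per state, e.g. 'r' (running) or 'qw' (queued and waiting)."""
    # Assumed default qstat columns: job-ID, prior, name, user, state, ...
    return Counter(fields[4] for fields in _data_rows(qstat_text))


def load_per_cpu(qhost_text: str) -> dict[str, float]:
    """Map each execution host to LOAD / NCPU.

    Skips the 'global' summary row and any host that reports '-' (no data).
    """
    # Assumed default qhost columns: HOSTNAME, ARCH, NCPU, LOAD, MEMTOT, ...
    loads = {}
    for fields in _data_rows(qhost_text):
        host, ncpu, load = fields[0], fields[2], fields[3]
        if host == "global" or "-" in (ncpu, load):
            continue
        loads[host] = float(load) / int(ncpu)
    return loads
```

A value from <code>load_per_cpu</code> near 1.0 means a host is roughly saturated; much lower values across all hosts may mean <code>--numjobs</code> is set lower than the cluster can handle.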
 
=== Terminate the Cluster ===
# Reminder: check whether you need to copy any data off the cluster, since it will be deleted upon termination
#* [[#Copying Data to/from the Cluster|Copying Data to/from the Cluster]]
# Terminate the cluster
#* <pre>starcluster terminate mycluster</pre>
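For the first step, it helps to know how much data is at stake before deciding what to copy off. This is a minimal standard-library sketch (Python 3.9+ assumed, which an older AMI may not provide); the function name and the script name in the usage comment are hypothetical.

```python
import os
import sys


def tree_size(path: str) -> int:
    """Total size in bytes of the regular files under `path` (symbolic links are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # does not descend into symlinked dirs
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    # Usage on the master node (hypothetical script name): python3 tree_size.py /home/ubuntu/output
    for path in sys.argv[1:]:
        print(f"{path}: {tree_size(path) / 1024 ** 3:.2f} GiB")
```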
 
== Run GotCloud Demo Using StarCluster ==

# Create a new cluster section in your configuration file: <code>~/.starcluster/config</code>
#*: <pre>[cluster gccluster]&#10;KEYNAME = mykey&#10;CLUSTER_SIZE = 4&#10;CLUSTER_USER = sgeadmin&#10;CLUSTER_SHELL = bash&#10;MASTER_IMAGE_ID = ami-6ae65e02&#10;NODE_IMAGE_ID = ami-3393a45a&#10;NODE_INSTANCE_TYPE = m3.large</pre>
# Start the cluster:
#* <pre>starcluster start -c gccluster mycluster</pre>
#** Alternatively, if you set the default template in the <code>[global]</code> section at the start of the configuration file to gccluster (<code>DEFAULT_TEMPLATE=gccluster</code>), you can run:
#*** <pre>starcluster start mycluster</pre>
#* It will take a few minutes for the cluster to start
# Log on to the cluster as ubuntu:
#* <pre>starcluster sshmaster -u ubuntu mycluster</pre>
#** Type <code>yes</code> if the terminal asks if you want to continue connecting

{{GotCloud: Amazon Demo Setup|hdr=====}}

==== Run GotCloud SnpCall Demo ====
 
# Run GotCloud snpcall
#* <pre>gotcloud snpcall --conf example/test.conf --outdir output --numjobs 8 --batchtype sgei</pre>
#** The ubuntu user is set up to have the gotcloud program and tools in its path, so you can just type the program name and it will be found
#** There is enough space in /home/ubuntu for the demo output
#*** /home/ubuntu is visible from all nodes in the cluster
#** This will take a few minutes to run.
#** GotCloud first generates a makefile, and then runs the makefile
#** After a while, GotCloud snpcall will print some messages to the screen; this is expected.
# See [[#Monitoring Cluster Usage|Monitoring Cluster Usage]] if you are interested in monitoring the cluster usage as GotCloud runs
# When complete, GotCloud snpcall will indicate success or failure
#* Look at the snpcall results, see: [[GotCloud:_Amazon_Demo#Examining_SnpCall_Output|GotCloud: Amazon Demo -> Examining SnpCall Output]]
 
==== Run GotCloud Indel Demo ====
# Run GotCloud indel
#* <pre>gotcloud indel --conf example/test.conf --outdir output --numjobs 8 --batchtype sgei</pre>
#** The ubuntu user is set up to have the gotcloud program and tools in its path, so you can just type the program name and it will be found
#** There is enough space in /home/ubuntu for the demo output
#*** /home/ubuntu is visible from all nodes in the cluster
#** This will take a few minutes to run.
# See [[#Monitoring Cluster Usage|Monitoring Cluster Usage]] if you are interested in monitoring the cluster usage as GotCloud runs
# When complete, GotCloud indel will indicate success or failure
#* Look at the indel results, see: [[GotCloud:_Amazon_Demo#Examining_Indel_Output|GotCloud: Amazon Demo -> Examining Indel Output]]
 
==== Terminate the Demo Cluster ====
# Exit out of your master node
#* <pre>exit</pre>
# Terminate the cluster
#* Since this is just a demo, we don't have to worry about the data getting deleted upon termination
#* <pre>starcluster terminate mycluster</pre>
#** Answer <code>y</code> to the question <code>Terminate EBS cluster mycluster (y/n)? </code>
 
      
== Old Instructions ==
If your data is visible on each node, you're ready to run the software as described in [[GotCloud]].
 
