Line 1: |
Line 1: |
| + | __TOC__ |
| + | |
| Back to the beginning: [[GotCloud]] | | Back to the beginning: [[GotCloud]] |
| | | |
| Back to [[GotCloud: Amazon]] | | Back to [[GotCloud: Amazon]] |
− |
| |
− | <!-- BANNER ACROSS TOP OF PAGE -->
| |
− | {| style="width:100%; background:#ffb6c1; margin-top:1.2em; border:1px solid #ccc;" |
| |
− | | style="width:100%; text-align:center; white-space:nowrap; color:#000;" |
| |
− | <div style="font-size:162%; border:none; margin:0; padding:.1em; color:#000;">This page is in the process of being updated...(10/29/14)</div>
| |
− | |}
| |
− |
| |
| | | |
| If you have access to your own cluster, your task will be much simpler. | | If you have access to your own cluster, your task will be much simpler. |
Line 23: |
Line 18: |
| of the many variations you might choose, but should provide you a working example. | | of the many variations you might choose, but should provide you a working example. |
| | | |
− | == Tasks to be completed ==
| |
− |
| |
− | * Install the ec2 tools package (ec2-api-tools for Ubuntu) on your machine (optional)
| |
− | * Install and configure starcluster on your machine (required)
| |
− | ** Note: gotcloud requires a 64bit machine
| |
− | ** Please use <code>NODE_IMAGE_ID = ami-765b3e1f</code>
| |
− | * Create an EBS volume based on the GotCloud snapshot
| |
− | * Configure StarCluster to use the volume just created
| |
− | * Create an AWS cluster
| |
− | * Create storage for your sequence data and make it available for the software
| |
− | * Run the GotCloud software
| |
| | | |
| == Getting Started With StarCluster == | | == Getting Started With StarCluster == |
− | StarCluster provides lots of documentation. | + | StarCluster provides extensive documentation (http://star.mit.edu/cluster/) that covers it in more detail than this page does. |
| | | |
| To install and set up StarCluster for the first time, you can follow the QuickStart instructions: http://star.mit.edu/cluster/docs/latest/quickstart.html | | To install and set up StarCluster for the first time, you can follow the QuickStart instructions: http://star.mit.edu/cluster/docs/latest/quickstart.html |
Line 45: |
Line 29: |
| *** If you need help setting up your AWS credentials, see: [[AWS Credentials]] | | *** If you need help setting up your AWS credentials, see: [[AWS Credentials]] |
| | | |
− | You can skip the cluster start section if you want. | + | You can skip actually starting the cluster in the QuickStart instructions if you want. |
| | | |
| ''' Troubleshooting: ''' When I tried this, the <code>starcluster start mycluster</code> step failed with an error similar to: | | ''' Troubleshooting: ''' When I tried this, the <code>starcluster start mycluster</code> step failed with an error similar to: |
Line 59: |
Line 43: |
| ''' Don't forget to terminate your cluster:''' | | ''' Don't forget to terminate your cluster:''' |
| starcluster terminate mycluster | | starcluster terminate mycluster |
| + | |
| | | |
| == StarCluster and GotCloud == | | == StarCluster and GotCloud == |
| + | === StarCluster Config Settings === |
| + | |
| By default, StarCluster expects a configuration file in ~/.starcluster/config. | | By default, StarCluster expects a configuration file in ~/.starcluster/config. |
| * StarCluster will create a model file for you | | * StarCluster will create a model file for you |
Line 72: |
Line 59: |
| * You should have set these in [[#Getting Started With StarCluster|Getting Started With StarCluster]] above (quickstart guide and AWS Credentials) . | | * You should have set these in [[#Getting Started With StarCluster|Getting Started With StarCluster]] above (quickstart guide and AWS Credentials) . |
| | | |
− | '''GotCloud settings:'''
| + | * GotCloud Cluster Definition |
− | * You may want to Create a new cluster description for running GotCloud (or you can use smallcluster) | + | ** You may want to create a new cluster section for running GotCloud (or you can use smallcluster) in your configuration file: <code>~/.starcluster/config</code> |
− | * Use the GotCloud AMIs: | + | ** You can call it anything you want, for example, <pre>gccluster</pre> |
− | MASTER_IMAGE_ID = ami-6ae65e02
| + | ** Example: |
− | NODE_IMAGE_ID = ami-3393a45a
| + | **: <pre>[cluster gccluster] |
| + | KEYNAME = mykey |
| + | CLUSTER_SIZE = 4 |
| + | CLUSTER_USER = sgeadmin |
| + | CLUSTER_SHELL = bash |
| + | MASTER_IMAGE_ID = ami-6ae65e02 |
| + | NODE_IMAGE_ID = ami-3393a45a |
| + | NODE_INSTANCE_TYPE = m3.large</pre> |
− | * We do not recommend running GotCloud on machines with less than 4MB of memory | + | *** Set <code>KEYNAME</code> to the key you want to use |
| + | *** Set <code>CLUSTER_SIZE</code> to the number of nodes you want to start up (this may be different than 4) |
| + | *** Set <code>CLUSTER_USER</code> to add additional users, like <code>sgeadmin</code> |
| + | *** Set <code>CLUSTER_SHELL</code> to define the shell you want to use, like <code>bash</code> |
| + | *** Set <code>MASTER_IMAGE_ID</code> to the latest GotCloud AMI, see: [[GotCloud: AMIs]] |
| + | **** Contains GotCloud, the reference, and the demo files in the /home/ubuntu/ directory that will be visible on all nodes in the cluster |
| + | **** Has a 30G volume, but only 6G available |
| + | *** Set <code>NODE_IMAGE_ID</code> to a StarCluster <code>ubuntu x86_64</code> AMI |
| + | **** Since each node does not need its own 30G volume containing GotCloud, the reference, and the demo files, we use a separate image for the nodes. |
| + | *** The nodes can just access the master's copy of GotCloud, the reference, and the Amazon demo |
| + | *** Set <code>NODE_INSTANCE_TYPE</code> to the type of instances you want to start in your cluster |
| + | **** See http://aws.amazon.com/ec2/pricing/ for instance descriptions and prices |
| + | **** We do not recommend running GotCloud on machines with less than 4GB of memory |
| + | ** <code>CLUSTER_SIZE</code> multiplied by the number of CPUs in each <code>NODE_INSTANCE_TYPE</code> instance gives the number of jobs you can run concurrently in GotCloud |
| + | |
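The job-count rule above can be sketched in a few lines of Python. This is only an illustration, not part of GotCloud or StarCluster. The helper name <code>concurrent_jobs</code> and the <code>VCPUS</code> table are assumptions made for this example, and the vCPU counts should be checked against the AWS instance descriptions linked above. It reads the example cluster section and multiplies <code>CLUSTER_SIZE</code> by the CPUs per instance.

```python
import configparser

# Example cluster section, copied from the sample configuration above.
SAMPLE_CONFIG = """
[cluster gccluster]
KEYNAME = mykey
CLUSTER_SIZE = 4
CLUSTER_USER = sgeadmin
CLUSTER_SHELL = bash
MASTER_IMAGE_ID = ami-6ae65e02
NODE_IMAGE_ID = ami-3393a45a
NODE_INSTANCE_TYPE = m3.large
"""

# Assumption: vCPUs per instance type for the m3 family.
# Verify against http://aws.amazon.com/ec2/pricing/ before relying on it.
VCPUS = {"m3.medium": 1, "m3.large": 2, "m3.xlarge": 4, "m3.2xlarge": 8}


def concurrent_jobs(config_text, template):
    """Return CLUSTER_SIZE * CPUs per instance for one cluster template."""
    # Interpolation is disabled so a literal '%' in a value is not an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["cluster " + template]
    cluster_size = section.getint("CLUSTER_SIZE")
    instance_type = section["NODE_INSTANCE_TYPE"]
    return cluster_size * VCPUS[instance_type]


if __name__ == "__main__":
    # 4 nodes of m3.large (2 vCPUs each)
    print(concurrent_jobs(SAMPLE_CONFIG, "gccluster"))  # prints 8
```

For the example section this gives 8, which matches the <code>--numjobs 8</code> used in the demo commands on this page.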
| + | * Define Data Volumes |
| + | ** By default, the GotCloud AMI contains about 5G of extra space that you can use |
| + | *** /home/ubuntu/ directory is visible from all machines |
| + | **** Use /home/ubuntu/ for the output directory if your output is smaller than 5G |
| + | **** This directory will be deleted when you terminate the cluster |
| + | ** Create your own volumes and attach them to the GotCloud cluster |
| + | *** '''Instructions TBD''' |
| + | |
| + | === Starting the Cluster === |
| + | # Start the cluster: |
| + | #* <pre>starcluster start -c gccluster mycluster</pre> |
| + | #** Alternatively, if you change the default template in the <code>[global]</code> section at the start of the configuration file to gccluster (<code>DEFAULT_TEMPLATE=gccluster</code>), you can run: |
| + | #*** <pre>starcluster start mycluster</pre> |
| + | #* It will take a few minutes for the cluster to start |
| + | |
| + | |
| + | === Copying Data to/from the Cluster === |
| + | Copy data onto the cluster (command run from your local machine) |
| + | starcluster put mycluster /path/to/local/file/or/dir /remote/path/ |
| + | |
| + | |
| + | Pull the data from the cluster onto your local machine (command run from your local machine) |
| + | starcluster get mycluster /path/to/remote/file/or/dir /local/path/ |
| + | |
| + | '''Reminder: if you write your output to /home/ubuntu/, it will be deleted when you terminate the cluster''' |
| + | |
| + | |
| + | === Running GotCloud on StarCluster === |
| + | * If you have not already, log on to the cluster as ubuntu: |
| + | ** <pre>starcluster sshmaster -u ubuntu mycluster</pre> |
| + | *** Type <code>yes</code> if the terminal asks if you want to continue connecting |
| + | * When running GotCloud: |
| + | ** Set the cluster/batch type in either configuration or on the command line: |
| + | *** In Configuration: |
| + | ***: <pre>BATCH_TYPE = sgei</pre> |
| + | *** On the command-line: |
| + | ***: <pre>--batchtype sgei</pre> |
| + | ** Set the number of jobs to run: |
| + | **: <pre>--numjobs #</pre> |
| + | *** Replace <code>#</code> with the number of concurrent jobs you want to run (probably <code>CLUSTER_SIZE</code> multiplied by the number of CPUs in each <code>NODE_INSTANCE_TYPE</code> instance) |
| + | ** Otherwise, run GotCloud as you normally would. |
| + | |
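As a rough sketch of how the settings above fit together, the following Python helper assembles a GotCloud command line with <code>--numjobs</code> and <code>--batchtype</code> filled in. The function name <code>gotcloud_command</code> is hypothetical and exists only for this example. The flags and the example paths are the ones used elsewhere on this page.

```python
import shlex


def gotcloud_command(pipeline, conf, outdir, numjobs, batchtype="sgei"):
    """Build a gotcloud command line for running on a StarCluster cluster.

    pipeline is the gotcloud subcommand, e.g. "snpcall" or "indel".
    numjobs is the number of concurrent jobs to run.
    """
    if numjobs < 1:
        raise ValueError("numjobs must be at least 1")
    args = [
        "gotcloud", pipeline,
        "--conf", conf,
        "--outdir", outdir,
        "--numjobs", str(numjobs),
        "--batchtype", batchtype,
    ]
    # Quote each argument so paths containing spaces survive the shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(gotcloud_command("snpcall", "example/test.conf", "output", 8))
```

The printed string can be pasted into the shell on the master node. Defaulting <code>batchtype</code> to <code>sgei</code> is a choice made for this sketch, since that is the value this page uses for StarCluster.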
| + | |
| + | To log in to a specific non-master node, run: |
| + | starcluster sshnode -u ubuntu mycluster node001 |
| | | |
− | [[GotCloud: AMIs]] | + | === Monitoring Cluster Usage === |
− |
| + | * Monitor jobs in the queue |
− | === Run GotCloud Demo Using StarCluster === | + | ** <pre>qstat</pre> |
− | Be sure to set:
| + | ** This will show you the currently running jobs and how they are spread across the nodes in your cluster |
− | MASTER_IMAGE_ID = ami-6ae65e02
| + | **:[[File:Qstat.png|800px]] |
− | NODE_IMAGE_ID = ami-3393a45a
| + | *** state descriptions: |
| + | **** <code>qw</code> : queued and waiting (not yet assigned to a node) |
| + | **** <code>r</code> : running |
| + | * View Sun Grid Engine Load |
| + | ** <pre>qhost</pre> |
| + | **:[[File:Qhost.png|600px]] |
| + | *** ARCH : architecture |
| + | *** NCPU : number of CPUs |
| + | *** LOAD : current load |
| + | *** MEMTOT : total memory |
| + | *** MEMUSE : memory in use |
| + | *** SWAPTO : swap space |
| + | *** SWAPUS : swap space in use |
| + | * View the average load per node using: |
| + | ** <pre>qstat -f</pre> |
| + | **:[[File:Qstatf.png|650px]] |
| + | *** <code>load_avg</code> field contains the load average for each node |
| + | |
| + | |
| + | === Terminate the Cluster === |
| + | # Reminder: before terminating, copy any data you need off of the cluster, since it will be deleted upon termination |
| + | #* [[#Copying Data to/from the Cluster|Copying Data to/from the Cluster]] |
| + | # Terminate the cluster |
| + | #* <pre>starcluster terminate mycluster</pre> |
| + | |
| + | |
| + | == Run GotCloud Demo Using StarCluster == |
| | | |
| #Create a new cluster section in your configuration file: <code>~/.starcluster/config</code> | | #Create a new cluster section in your configuration file: <code>~/.starcluster/config</code> |
Line 90: |
Line 162: |
| #*: <pre>[cluster gccluster] | | #*: <pre>[cluster gccluster] |
| KEYNAME = mykey | | KEYNAME = mykey |
| CLUSTER_SIZE = 4 | | CLUSTER_SIZE = 4 |
| CLUSTER_USER = sgeadmin | | CLUSTER_USER = sgeadmin |
| CLUSTER_SHELL = bash | | CLUSTER_SHELL = bash |
| MASTER_IMAGE_ID = ami-6ae65e02 | | MASTER_IMAGE_ID = ami-6ae65e02 |
| NODE_IMAGE_ID = ami-3393a45a | | NODE_IMAGE_ID = ami-3393a45a |
| NODE_INSTANCE_TYPE = m3.large</pre> | | NODE_INSTANCE_TYPE = m3.large</pre> |
| # Start the cluster: | | # Start the cluster: |
− | # <pre>starcluster start -c gccluster mycluster</pre> | + | #* <pre>starcluster start -c gccluster mycluster</pre> |
− | #* Alternatively, you can change the default template at the start of the configuration file in the <code>[global]</code> section to gccluster: <code>DEFAULT_TEMPLATE=gccluster</code> | + | #** Alternatively, if you change the default template in the <code>[global]</code> section at the start of the configuration file to gccluster (<code>DEFAULT_TEMPLATE=gccluster</code>), you can run: |
| + | #*** <pre>starcluster start mycluster</pre> |
| + | #* It will take a few minutes for the cluster to start |
| # Log on to the cluster as ubuntu: | | # Log on to the cluster as ubuntu: |
| #* <pre>starcluster sshmaster -u ubuntu mycluster</pre> | | #* <pre>starcluster sshmaster -u ubuntu mycluster</pre> |
| + | #** Type <code>yes</code> if the terminal asks if you want to continue connecting |
| + | |
| + | {{GotCloud: Amazon Demo Setup|hdr=====}} |
| + | |
| + | ==== Run GotCloud SnpCall Demo ==== |
| # Run GotCloud snpcall | | # Run GotCloud snpcall |
| #* <pre>gotcloud snpcall --conf example/test.conf --outdir output --numjobs 8 --batchtype sgei</pre> | | #* <pre>gotcloud snpcall --conf example/test.conf --outdir output --numjobs 8 --batchtype sgei</pre> |
− | # Run GotCloud indell | + | #** The ubuntu user is set up to have the gotcloud program and tools in its path, so you can just type the program name and it will be found |
− | #* <pre>gotcloud snpcall --conf example/test.conf --outdir output --numjobs 8 --batchtype sgei</pre> | + | #** There is enough space in /home/ubuntu to put the Demo output |
| + | #*** /home/ubuntu is visible from all nodes in the cluster |
| + | #** This will take a few minutes to run. |
| + | #** GotCloud first generates a makefile, and then runs the makefile |
| + | #** After a while GotCloud snpcall will print some messages to the screen. This is expected and ok. |
| + | # See [[#Monitoring Cluster Usage|Monitoring Cluster Usage]] if you are interested in monitoring the cluster usage as GotCloud runs |
| + | # When complete, GotCloud snpcall will indicate success/failure |
| + | #* Look at the snpcall results, see: [[GotCloud:_Amazon_Demo#Examining_SnpCall_Output|GotCloud: Amazon Demo -> Examining SnpCall Output]] |
| + | |
| + | ==== Run GotCloud Indel Demo ==== |
| + | # Run GotCloud indel |
| + | #* <pre>gotcloud indel --conf example/test.conf --outdir output --numjobs 8 --batchtype sgei</pre> |
| + | #** The ubuntu user is set up to have the gotcloud program and tools in its path, so you can just type the program name and it will be found |
| + | #** There is enough space in /home/ubuntu to put the Demo output |
| + | #*** /home/ubuntu is visible from all nodes in the cluster |
| + | #** This will take a few minutes to run. |
| + | # See [[#Monitoring Cluster Usage|Monitoring Cluster Usage]] if you are interested in monitoring the cluster usage as GotCloud runs |
| + | # When complete, GotCloud indel will indicate success/failure |
| + | #* Look at the indel results, see: [[GotCloud:_Amazon_Demo#Examining_Indel_Output|GotCloud: Amazon Demo -> Examining Indel Output]] |
| + | |
| + | ==== Terminate the Demo Cluster ==== |
| + | # Exit out of your master node |
| + | #* <pre>exit</pre> |
| # Terminate the cluster | | # Terminate the cluster |
| + | #* Since this is just a demo, we don't have to worry about the data getting deleted upon termination |
| #* <pre>starcluster terminate mycluster</pre> | | #* <pre>starcluster terminate mycluster</pre> |
− | | + | #** Answer <code>y</code> to the question <code>Terminate EBS cluster mycluster (y/n)? </code> |
− | | |
| | | |
| == Old Instructions== | | == Old Instructions== |
Line 197: |
Line 298: |
| If your data is visible on each node, you're ready to run the software as described | | If your data is visible on each node, you're ready to run the software as described |
| in [[GotCloud]]. | | in [[GotCloud]]. |
− |
| |
− | == Running GotCloud on StarCluster ==
| |
− | To tell GotCloud to run data on the StarCluster you have setup, specify the following on your gotcloud command-line:
| |
− | -batchtype sgei
| |
− |
| |
− | Alternatively, you can set the following in your configuration file:
| |
− | BATCH_TYPE = sgei
| |