Difference between revisions of "StarCluster"

From Genome Analysis Wiki
Line 19: Line 19:
  * Install the ec2 tools package (ec2-api-tools for Ubuntu) on your machine (optional)
  * Install and configure starcluster on your machine (required)
+ ** Note: gotcloud requires a 64bit machine
+ ** Please use <code>NODE_IMAGE_ID = ami-765b3e1f</code>
  * Create an EBS volume based on the GotCloud snapshot
  * Configure StarCluster to use the volume just created
Line 50: Line 52:
   AWS_USER_ID=199998888709

-  AWS_REGION_NAME = us-west-2                 # Choose your own region
+  AWS_REGION_NAME = us-east-1                 # Choose your own region
-  AWS_REGION_HOST = ec2.us-west-2.amazonaws.com
+  AWS_REGION_HOST = ec2.us-east-1.amazonaws.com
-  AWS_S3_HOST = s3-us-west-2.amazonaws.com
+  AWS_S3_HOST = s3-us-east-1.amazonaws.com

   ###########################
   ## EC2 Keypairs
   ###########################
-  [key <font color='green'>west2_starcluster</font>]
+  [key <font color='green'>east1_starcluster</font>]
-  KEY_LOCATION = ~/.ssh/AWS/west2_starcluster_key.rsa  # Same region
+  KEY_LOCATION = ~/.ssh/AWS/east1_starcluster_key.rsa  # Same region

   ###########################################
   ## Define Cluster
-  ##  starcluster start -c west2_starcluster nameichose4cluster
+  ##  starcluster start -c east1_starcluster nameichose4cluster
   ###########################################
   [cluster <font color='red'>myexample</font>]          # Name of this cluster definition
-  KEYNAME = <font color='green'>west2_starcluster</font>                # Name of keys I need
+  KEYNAME = <font color='green'>east1_starcluster</font>                # Name of keys I need
   CLUSTER_SIZE = 4                            # Number of nodes
   CLUSTER_SHELL = bash
Line 71: Line 73:
   #  Choose the base AMI using  starcluster listpublic
   #  (http://star.mit.edu/cluster/docs/0.93.3/faq.html)
-  NODE_IMAGE_ID = ami-c6bd30f6
+  NODE_IMAGE_ID = ami-765b3e1f
-  AVAILABILITY_ZONE = us-west-2a              # Region again!
+  AVAILABILITY_ZONE = us-east-1              # Region again!
   NODE_INSTANCE_TYPE = m1.medium              # 4G memory should work for Pipeline

Revision as of 14:45, 2 July 2013

Back to the beginning: GotCloud

If you have access to your own cluster, your task will be much simpler. Install the Pipeline software (links at GotCloud) and run it as described on the same pages.

For those who do not have access to a cluster, Amazon Web Services (AWS) provides an alternative: you can run the pipeline software on a cluster created in AWS. One tool that simplifies creating a cluster of instances launched from AMIs (Amazon Machine Images) is StarCluster (see http://star.mit.edu/cluster/).

The following shows an example of how you might use StarCluster to create an AWS cluster and set it up to run the Pipeline. Setting up StarCluster involves many details, and this page does not try to cover every variation you might choose, but it should give you a working example.

The tasks to be completed are:

  • Install the ec2 tools package (ec2-api-tools for Ubuntu) on your machine (optional)
  • Install and configure starcluster on your machine (required)
    • Note: gotcloud requires a 64bit machine
    • Please use NODE_IMAGE_ID = ami-765b3e1f
  • Create an EBS volume based on the GotCloud snapshot
  • Configure StarCluster to use the volume just created
  • Create an AWS cluster
  • Create storage for your sequence data and make it available for the software
  • Run the Pipeline software

Installing and configuring StarCluster on your machine is described at http://star.mit.edu/cluster/.

StarCluster Configuration Example

StarCluster creates a template configuration file in ~/.starcluster/config, which you then edit to set the correct values for the variables. Here is a highly simplified example of a config file that should work. There are many settings you might want to change, so craft the config file with care. You'll need to specify nodes with about 4GB of memory (type m1.medium) and make sure each node has access to the input and output data for the step being run.

####################################
## StarCluster Configuration File ##
####################################
[global]
DEFAULT_TEMPLATE=myexample

#############################################
## AWS Credentials Settings
#############################################
[aws info]
AWS_ACCESS_KEY_ID = AKImyexample8FHJJF2Q
AWS_SECRET_ACCESS_KEY = fthis_was_my_example_secretMqkMIkJjFCIGf
AWS_USER_ID=199998888709 

AWS_REGION_NAME = us-east-1                 # Choose your own region
AWS_REGION_HOST = ec2.us-east-1.amazonaws.com
AWS_S3_HOST = s3-us-east-1.amazonaws.com

###########################
## EC2 Keypairs
###########################
[key east1_starcluster]
KEY_LOCATION = ~/.ssh/AWS/east1_starcluster_key.rsa   # Same region

###########################################
## Define Cluster
##   starcluster start -c myexample  nameichose4cluster
###########################################
[cluster myexample]          # Name of this cluster definition
KEYNAME = east1_starcluster                 # Name of keys I need
CLUSTER_SIZE = 4                            # Number of nodes
CLUSTER_SHELL = bash

#  Choose the base AMI using   starcluster listpublic
#   (http://star.mit.edu/cluster/docs/0.93.3/faq.html)
NODE_IMAGE_ID = ami-765b3e1f
AVAILABILITY_ZONE = us-east-1a              # A zone inside your region; must match the zone of your EBS volumes
NODE_INSTANCE_TYPE = m1.medium              # 4G memory should work for Pipeline

VOLUMES = gotcloud, mydata
[volume mydata]
VOLUME_ID = vol-6e729657
MOUNT_PATH = /mydata

[volume gotcloud]
VOLUME_ID = vol-56071570
MOUNT_PATH = /gotcloud
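
The cross-references in a config like the one above (the key named by KEYNAME, the volumes named by VOLUMES, the zone inside the region) are easy to get wrong when switching regions. The following is a minimal sketch of a sanity check, not part of StarCluster itself. It assumes Python 3 and its standard configparser module; the function name check_starcluster_config is hypothetical, and treating a trailing "#" as a comment is an assumption of this sketch that may differ from how StarCluster reads the file. The zone test assumes zone names are the region name plus a suffix (for example us-east-1a).

```python
import configparser


def check_starcluster_config(text, template):
    """Return a list of cross-reference problems for one cluster template.

    Hypothetical helper: only checks that names used in the
    [cluster <template>] section point at sections that exist.
    """
    # Assumption: trailing "# ..." text on a line is a comment.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)

    cluster = "cluster " + template
    if not parser.has_section(cluster):
        return ["no [%s] section" % cluster]

    problems = []
    opts = parser[cluster]

    # KEYNAME must match a [key ...] section.
    keyname = opts.get("KEYNAME", "")
    if not parser.has_section("key " + keyname):
        problems.append("KEYNAME %r has no [key %s] section" % (keyname, keyname))

    # Every name in VOLUMES must match a [volume ...] section.
    for name in opts.get("VOLUMES", "").split(","):
        name = name.strip()
        if name and not parser.has_section("volume " + name):
            problems.append("volume %r has no [volume %s] section" % (name, name))

    # Assumption: a zone is the region name plus a suffix, e.g. us-east-1a.
    region = parser.get("aws info", "AWS_REGION_NAME", fallback="")
    zone = opts.get("AVAILABILITY_ZONE", "")
    if region and zone and not (zone.startswith(region) and zone != region):
        problems.append("AVAILABILITY_ZONE %r is not a zone in region %r" % (zone, region))

    return problems


# Shortened sample with one deliberate mistake: [volume mydata] is missing.
SAMPLE = """
[aws info]
AWS_REGION_NAME = us-east-1

[key east1_starcluster]
KEY_LOCATION = ~/.ssh/AWS/east1_starcluster_key.rsa

[cluster myexample]
KEYNAME = east1_starcluster
AVAILABILITY_ZONE = us-east-1a   # a zone inside the region
VOLUMES = gotcloud, mydata

[volume gotcloud]
VOLUME_ID = vol-56071570
MOUNT_PATH = /gotcloud
"""

for problem in check_starcluster_config(SAMPLE, "myexample"):
    print(problem)
```

To check a real file, read ~/.starcluster/config into a string and pass it with the template name given in DEFAULT_TEMPLATE. An empty list only means the names line up; it does not confirm that the keys, AMI, or volumes exist in AWS.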


Create Your Cluster

 starcluster start -c myexample myseq-example
 StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
 Software Tools for Academics and Researchers (STAR)
 Please submit bug reports to starcluster@mit.edu

 >>> Validating cluster template settings...
 >>> Cluster template settings are valid
 >>> Starting cluster...
     [lines deleted]
 >>> Mounting EBS volume vol-32273514 on /gotcloud...
 >>> Mounting EBS volume vol-36788522 on /mydata...
     [lines deleted]

When this completes, you are ready to run the GotCloud software on your data. Make sure you have defined and mounted volumes for your sequence data and for the output of the aligner and umake steps. These volumes (as well as /gotcloud) should be available on each node.

 starcluster sshmaster myseq-example
 StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
 Software Tools for Academics and Researchers (STAR)
   [lines deleted]

 df -h
 ssh node001 df -h

If your data is visible on each node, you're ready to run the software as described in GotCloud.
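
If you have more than a few nodes, reading df output by eye gets tedious. As a rough sketch (assuming Python 3; the function name missing_mounts is hypothetical), the check can be reduced to looking for the expected mount points in the last column of the df output captured from each node. The sample output below is invented for illustration and is not from a real cluster.

```python
def missing_mounts(df_output, required):
    """Return the required mount points that do not appear in df output."""
    mounted = set()
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            mounted.add(fields[-1])  # last column is "Mounted on"
    return [path for path in required if path not in mounted]


# Invented sample: /gotcloud is mounted, /mydata is not.
SAMPLE_DF = """Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      9.9G  4.1G  5.4G  44% /
/dev/xvdz        20G  1.2G   18G   7% /gotcloud
"""

print(missing_mounts(SAMPLE_DF, ["/gotcloud", "/mydata"]))
```

This simple split assumes mount paths contain no spaces. Running df -P instead of df -h keeps each filesystem on a single line, which makes the output easier to parse this way.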