StarCluster

From Genome Analysis Wiki
Revision as of 12:56, 29 October 2012 by Terry Gliedt

If you have access to your own cluster, your task will be much simpler. Install the Pipeline software (links at [2]) and run it as described on the same pages.

For those who do not have access to a cluster, AWS provides an alternative: you can run the Pipeline software on a cluster created in AWS. One tool that simplifies creating a cluster of instances from AMIs (Amazon Machine Images) is StarCluster (see http://star.mit.edu/cluster/).

The following example shows how you might use StarCluster to launch a set of AWS instances as a cluster and set it up to run the Pipeline. Setting up StarCluster involves many details, and this page does not try to explain every variation you might choose, but it should give you a working example.

The tasks to be completed are:

  • Install and configure StarCluster on a machine you use
  • Create an AWS cluster
  • Install the Pipeline software on the master node
  • Create storage for your sequence data and make it available for the software
  • Run the Pipeline software

Installing and configuring StarCluster on your machine is described at http://star.mit.edu/cluster/. This page concentrates on the second step, creating the cluster, since the others are described at [3].
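
The first task can be sketched in shell as follows. This is a minimal sketch, not the official procedure: it assumes a Python environment that StarCluster supports, it reuses the key name and key file from the example config on this page, and the names `step`, `install_starcluster` and `create_key` are hypothetical helpers introduced here. By default the functions only print the commands. Note that `starcluster createkey` needs your AWS credentials to be in ~/.starcluster/config already.

```shell
#!/usr/bin/env bash
# Key name and file mirror the example config on this page; change them freely.
KEY_NAME="west2_starcluster"
KEY_FILE="$HOME/.ssh/AWS/${KEY_NAME}_key.rsa"

# Print each command; execute it only when RUN=1 is set.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# 1. Install StarCluster from PyPI ("pip install StarCluster" is an alternative).
install_starcluster() {
    step easy_install StarCluster
}

# 2. By hand: run "starcluster help", let it write the template config to
#    ~/.starcluster/config, then fill in the [aws info] section.

# 3. Create an EC2 keypair and save the private key where KEY_LOCATION points.
create_key() {
    step mkdir -p "$(dirname "$KEY_FILE")"
    step starcluster createkey "$KEY_NAME" -o "$KEY_FILE"
}
```

Calling `install_starcluster` or `create_key` prints what would be run; setting `RUN=1` first makes them execute the commands.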


StarCluster Configuration Example

StarCluster creates a template configuration file at ~/.starcluster/config and asks you to edit it and set the correct values for the variables. Here is a highly simplified example of a config file that should work. There are many options you might want to set differently, so craft your StarCluster config file with care.

####################################
## StarCluster Configuration File ##
####################################
[global]
DEFAULT_TEMPLATE=myexample

#############################################
## AWS Credentials Settings
#############################################
[aws info]
AWS_ACCESS_KEY_ID = AKImyexample8FHJJF2Q
AWS_SECRET_ACCESS_KEY = fthis_was_my_example_secretMqkMIkJjFCIGf
AWS_USER_ID=199998888709 

AWS_REGION_NAME = us-west-2                 # Choose your own region
AWS_REGION_HOST = ec2.us-west-2.amazonaws.com
AWS_S3_HOST = s3-us-west-2.amazonaws.com

###########################
## EC2 Keypairs
###########################
[key west2_starcluster]
KEY_LOCATION = ~/.ssh/AWS/west2_starcluster_key.rsa   # Same region

###########################################
## Define Cluster
##   starcluster start -c myexample  nameichose4cluster
###########################################
[cluster myexample]
KEYNAME = west2_starcluster                 # Name I chose
CLUSTER_SIZE = 3                            # Number of nodes, master included
CLUSTER_SHELL = bash

#  Choose the base AMI:  starcluster listpublic
#   (http://star.mit.edu/cluster/docs/0.93.3/faq.html)
NODE_IMAGE_ID = ami-c6bd30f6
AVAILABILITY_ZONE = us-west-2a              # Region again!
NODE_INSTANCE_TYPE = m1.medium              # 3.75 GiB memory should work for Pipeline
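
Several names in this config must agree with each other: DEFAULT_TEMPLATE must name a [cluster ...] section, KEYNAME must name a [key ...] section, and the availability zone must belong to the chosen region. The following is a minimal sketch of a plain-text cross-check of those names. It is not StarCluster's own validation; `cfg_value` and `check_config` are hypothetical helpers, and the sketch only looks at the first occurrence of each setting.

```shell
#!/usr/bin/env bash
# cfg_value FILE NAME: print the first value of NAME, without any inline comment.
cfg_value() {
    sed -n "s/^$2[[:space:]]*=[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$1" | head -n 1
}

# check_config FILE: return non-zero if the cross-referenced names disagree.
check_config() {
    local cfg="$1" template keyname region zone status=0
    template=$(cfg_value "$cfg" DEFAULT_TEMPLATE)
    keyname=$(cfg_value "$cfg" KEYNAME)
    region=$(cfg_value "$cfg" AWS_REGION_NAME)
    zone=$(cfg_value "$cfg" AVAILABILITY_ZONE)

    if ! grep -q "^\[cluster ${template}\]" "$cfg"; then
        echo "no [cluster ${template}] section for DEFAULT_TEMPLATE"
        status=1
    fi
    if ! grep -q "^\[key ${keyname}\]" "$cfg"; then
        echo "no [key ${keyname}] section for KEYNAME"
        status=1
    fi
    # A zone such as us-west-2a must start with its region name, us-west-2.
    if [ -n "$zone" ] && [ -n "$region" ] && [ "${zone#"$region"}" = "$zone" ]; then
        echo "AVAILABILITY_ZONE ${zone} is not in AWS_REGION_NAME ${region}"
        status=1
    fi
    return "$status"
}
```

A typical call would be `check_config ~/.starcluster/config`, which prints one line per mismatch and is silent when the names agree.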


Create Your Cluster

 starcluster start -c myexample myseq-example
 StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
 Software Tools for Academics and Researchers (STAR)
 Please submit bug reports to starcluster@mit.edu

 >>> Validating cluster template settings...
 >>> Cluster template settings are valid
 >>> Starting cluster...
 >>> Launching a 3-node cluster...
 >>> Creating security group @sc-myseq-example...
 Reservation:r-c3b4f6f0
 >>> Waiting for cluster to come up... (updating every 30s)
 >>> Waiting for all nodes to be in a 'running' state...
 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Waiting for SSH to come up on all nodes...
 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Waiting for cluster to come up took 1.282 mins
 >>> The master node is ec2-50-112-230-67.us-west-2.compute.amazonaws.com
 >>> Setting up the cluster...
 >>> Configuring hostnames...
 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Creating cluster user: None (uid: 1001, gid: 1001)
 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Configuring scratch space for user(s): sgeadmin
 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Configuring /etc/hosts on each node
 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Starting NFS server on master
 >>> Configuring NFS exports path(s):
 /home
 >>> Mounting all NFS export path(s) on 2 worker node(s)
 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Setting up NFS took 0.096 mins
 >>> Configuring passwordless ssh for root
 >>> Configuring passwordless ssh for sgeadmin
 >>> Shutting down threads...
 20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Configuring SGE...
 >>> Configuring NFS exports path(s):
 /opt/sge6
 >>> Mounting all NFS export path(s) on 2 worker node(s)
 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Setting up NFS took 0.048 mins
 >>> Installing Sun Grid Engine...
 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Creating SGE parallel environment 'orte'
 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Adding parallel environment 'orte' to queue 'all.q'
 >>> Shutting down threads...
 20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Running plugin createusers
 >>> Creating 2 cluster users
 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Configuring passwordless ssh for 2 cluster users
 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Configuring scratch space for user(s): mktrost, tpg
 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
 >>> Tarring all SSH keys for cluster users...
 >>> Copying cluster users SSH keys to: /tmp/myseq-example-us-west-2.tar.gz
 /tmp/myseq-example-us-west-2.tar.gz 100% |||||||||||| Time: 00:00:00   0.00 B/s
 >>> Configuring cluster took 1.583 mins
 >>> Starting cluster took 2.895 mins
 The cluster is now ready to use. To login to the master node
 as root, run:
     $ starcluster sshmaster myseq-example
 If you're having issues with the cluster you can reboot the
 instances and completely reconfigure the cluster from
 scratch using:
     $ starcluster restart myseq-example
 When you're finished using the cluster and wish to terminate
 it and stop paying for service:
     $ starcluster terminate myseq-example
 Alternatively, if the cluster uses EBS instances, you can
 use the 'stop' command to shutdown all nodes and put them
 into a 'stopped' state preserving the EBS volumes backing
 the nodes:
     $ starcluster stop myseq-example
 WARNING: Any data stored in ephemeral storage (usually /mnt)
 will be lost!
 You can activate a 'stopped' cluster by passing the -x
 option to the 'start' command:
     $ starcluster start -x myseq-example
 This will start all 'stopped' nodes and reconfigure the
 cluster.
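
Once the cluster is up, it can be worth confirming that it is listed and that Sun Grid Engine sees all the nodes. The following is a minimal sketch, assuming the cluster name used above; `check_cluster` and the `STARCLUSTER` variable are hypothetical names introduced here, and `qhost` is the SGE command that lists execution hosts.

```shell
#!/usr/bin/env bash
# Set STARCLUSTER="echo starcluster" to print the commands instead of running them.
STARCLUSTER="${STARCLUSTER:-starcluster}"

# check_cluster NAME: list the active clusters, then run qhost on the
# master node to see which execution hosts SGE knows about.
check_cluster() {
    local name="$1"
    $STARCLUSTER listclusters
    $STARCLUSTER sshmaster "$name" qhost
}
```

For the example on this page the call would be `check_cluster myseq-example`. Remember to run `starcluster terminate myseq-example` when you are finished, since running instances keep accruing charges.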


Set Up Master Node, Log In as root

 starcluster sshmaster myseq-example
 StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
 Software Tools for Academics and Researchers (STAR)
   [lines deleted]
 #   Install the Pipeline software (see [3])
 mkdir debs
 cd debs
 wget ftp://share.sph.umich.edu/biopipe/current-align.deb
   [lines deleted]
 wget ftp://share.sph.umich.edu/biopipe/current-umake.deb
   [lines deleted]
 dpkg -i current-align.deb current-umake.deb
   [lines deleted]
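
A failed or partial download is easier to diagnose before dpkg runs than after. The following is a minimal sketch of such a check; `debs_ready` is a hypothetical helper introduced here, and the file names are the ones wget saves in the commands above.

```shell
#!/usr/bin/env bash
# debs_ready FILE...: succeed only if every named file exists and is non-empty.
debs_ready() {
    local f status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    return "$status"
}
```

From inside the debs directory this could be used as `debs_ready current-align.deb current-umake.deb && dpkg -i current-align.deb current-umake.deb`.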