Difference between revisions of "StarCluster"

From Genome Analysis Wiki

Revision as of 13:42, 15 November 2012

Back to the beginning: GotCloud

If you have access to your own cluster, your task will be much simpler. Install the Pipeline software (links at GotCloud) and run it as described on the same pages.

For those who do not have access to a cluster, AWS provides an alternative: you can run the Pipeline software on a cluster created in AWS. One tool that simplifies creating a cluster of instances launched from AMIs (Amazon Machine Images) is StarCluster (see http://star.mit.edu/cluster/).

The following shows an example of how you might use starcluster to create an AWS cluster and set it up to run the Pipeline.

We will use starcluster to launch a set of AWS instances. Setting up starcluster involves many details, and this page does not try to explain every variation you might choose, but it should give you a working example.

The tasks to be completed are:

  • Install and configure starcluster on a machine you use.
  • Create an AWS cluster
  • Install the Pipeline software on the master node
  • Create storage for your sequence data and make it available for the software
  • Run the Pipeline software

Installing and configuring starcluster on your machine is described at http://star.mit.edu/cluster/. Only the second step will be covered here, as the others are described at [1].


StarCluster Configuration Example

StarCluster creates a model configuration file in ~/.starcluster/config, and you are instructed to edit it and set the correct values for the variables. Here is a highly simplified example of a config file that should work. There are many settings you might want to choose differently, so craft the starcluster config file with care.

####################################
## StarCluster Configuration File ##
####################################
[global]
DEFAULT_TEMPLATE=myexample

#############################################
## AWS Credentials Settings
#############################################
[aws info]
AWS_ACCESS_KEY_ID = AKImyexample8FHJJF2Q
AWS_SECRET_ACCESS_KEY = fthis_was_my_example_secretMqkMIkJjFCIGf
AWS_USER_ID=199998888709 

AWS_REGION_NAME = us-west-2                 # Choose your own region
AWS_REGION_HOST = ec2.us-west-2.amazonaws.com
AWS_S3_HOST = s3-us-west-2.amazonaws.com

###########################
## EC2 Keypairs
###########################
[key west2_starcluster]
KEY_LOCATION = ~/.ssh/AWS/west2_starcluster_key.rsa   # Same region

###########################################
## Define Cluster
##   starcluster start -c myexample  nameichose4cluster
###########################################
[cluster myexample]
KEYNAME = west2_starcluster                 # Name I chose
CLUSTER_SIZE = 4                            # Number of nodes
CLUSTER_SHELL = bash

#  Choose the base AMI:  starcluster listpublic
#   (http://star.mit.edu/cluster/docs/0.93.3/faq.html)
NODE_IMAGE_ID = ami-c6bd30f6
AVAILABILITY_ZONE = us-west-2a              # Region again!
NODE_INSTANCE_TYPE = m1.medium              # 4G memory should work for Pipeline
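
The comments in the example config point out that the key pair and the availability zone must agree with the chosen region. As a rough sketch (the helper names cfg_value and check_config are hypothetical, not part of StarCluster), a few lines of shell can check two of those rules before you start a cluster: that KEYNAME refers to a [key ...] section, and that AVAILABILITY_ZONE begins with AWS_REGION_NAME. It assumes one setting per line, as in the example above.

```shell
#!/bin/sh
# Sketch: sanity-check a StarCluster config for the consistency rules the
# example relies on. cfg_value and check_config are hypothetical helpers.

# Print the first value of a setting, dropping any trailing "# comment".
cfg_value() {
    # $1 = config file, $2 = setting name
    sed -n "s/^[[:space:]]*${2}[[:space:]]*=[[:space:]]*//p" "$1" |
        sed 's/[[:space:]]*#.*$//; s/[[:space:]]*$//' | head -n 1
}

check_config() {
    cfg="$1"
    region=$(cfg_value "$cfg" AWS_REGION_NAME)
    zone=$(cfg_value "$cfg" AVAILABILITY_ZONE)
    keyname=$(cfg_value "$cfg" KEYNAME)
    status=0

    if [ -z "$region" ] || [ -z "$zone" ] || [ -z "$keyname" ]; then
        echo "missing AWS_REGION_NAME, AVAILABILITY_ZONE, or KEYNAME"
        return 1
    fi

    # The availability zone name must start with the region name.
    case "$zone" in
        "$region"?*) ;;
        *) echo "zone '$zone' is not in region '$region'"; status=1 ;;
    esac

    # KEYNAME must refer to a [key NAME] section in the same file.
    if ! grep -q "^\[key $keyname\]" "$cfg"; then
        echo "no [key $keyname] section found"
        status=1
    fi

    if [ "$status" -eq 0 ]; then
        echo "config looks consistent"
    fi
    return "$status"
}
```

To use it, run check_config ~/.starcluster/config after editing the file. It does not replace the validation starcluster itself performs when starting a cluster.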


Create Your Cluster

 starcluster start -c myexample myseq-example
 StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
 Software Tools for Academics and Researchers (STAR)
 Please submit bug reports to starcluster@mit.edu

 >>> Validating cluster template settings...
 >>> Cluster template settings are valid
 >>> Starting cluster...
     [lines deleted]
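
The start command above is only the first part of a cluster's life: you will usually want to confirm the cluster is running, and terminate it when the Pipeline run is finished, since AWS bills for running instances. The sketch below wraps the starcluster subcommands start, listclusters and terminate in two small functions. The function names and the STARCLUSTER override variable are assumptions made for this illustration; setting STARCLUSTER=echo prints the commands instead of contacting AWS.

```shell
#!/bin/sh
# Sketch of the cluster lifecycle around "starcluster start".
# STARCLUSTER may be overridden (e.g. STARCLUSTER=echo) to preview the
# commands; the override is part of this sketch, not a StarCluster feature.
STARCLUSTER="${STARCLUSTER:-starcluster}"

# Start a cluster from a template, then list clusters to confirm it is up.
launch_cluster() {
    # $1 = cluster template name from the config, $2 = tag for this cluster
    "$STARCLUSTER" start -c "$1" "$2" || return 1
    "$STARCLUSTER" listclusters
}

# Shut the cluster down once the run is finished.
teardown_cluster() {
    # $1 = cluster tag given at start time
    "$STARCLUSTER" terminate "$1"
}
```

With the example config, launch_cluster myexample myseq-example starts the cluster, and teardown_cluster myseq-example removes it afterwards.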


Set Up Master Node, Login as root

Log in as root and set up the Pipeline software as described in the Debian package instructions.

 starcluster sshmaster myseq-example
 StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
 Software Tools for Academics and Researchers (STAR)
   [lines deleted]

 mkdir debs
 cd debs
 wget ftp://share.sph.umich.edu/biopipe/current-align.deb
   [lines deleted]
 wget ftp://share.sph.umich.edu/biopipe/current-umake.deb
   [lines deleted]

 dpkg -i current-align*.deb current-umake*.deb
   [lines deleted]
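
Because an interrupted download can leave a missing or empty file behind, it can help to confirm that both packages arrived before running dpkg. The following is a minimal sketch. check_debs is a hypothetical helper, not part of the Pipeline software, and it only checks that each file exists and is non-empty.

```shell
#!/bin/sh
# Sketch: confirm both package files were downloaded before running dpkg.
# check_debs is a hypothetical helper, not part of the Pipeline software.
check_debs() {
    # $1 = directory holding the downloaded .deb files
    for name in current-align current-umake; do
        found=0
        for f in "$1"/"$name"*.deb; do
            # -s is true only for an existing, non-empty file
            if [ -s "$f" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "missing or empty: $name*.deb"
            return 1
        fi
    done
    echo "both packages present"
}
```

For example, check_debs ~/debs && dpkg -i ~/debs/current-align*.deb ~/debs/current-umake*.deb installs the packages only if both files are in place.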