StarCluster

From Genome Analysis Wiki
Revision as of 11:03, 29 October 2012 by Terry Gliedt (talk | contribs)

Back to the beginning [1]

If you have access to your own cluster, your task will be much simpler. Install the Pipeline software (links at [2]) and run it as described on the same pages.

For those who do not have access to a cluster, AWS provides an alternative: you can run the Pipeline software on a cluster created in AWS. One tool that simplifies creating a cluster of AWS instances launched from AMIs (Amazon Machine Images) is StarCluster (see http://star.mit.edu/cluster/).

The following example shows how you might use StarCluster to create an AWS cluster and set it up to run the Pipeline.

We will use StarCluster to launch a set of AWS instances. Setting up StarCluster involves many details, and this page does not try to explain every variation you might choose, but it should give you a working example.

The tasks to be completed are:

  • Install and configure starcluster on a machine you use
  • Create an AWS cluster
  • Install the Pipeline software on the master node
  • Create storage for your sequence data and make it available for the software
  • Run the Pipeline software

Installing and configuring starcluster on your machine is described at http://star.mit.edu/cluster/. Only the second step will be covered here, as the others are described at [3].
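
As a rough orientation for the cluster-creation step, the sketch below assembles the StarCluster command lines for starting, listing, logging in to, and terminating a cluster. It assumes the starcluster executable is on your PATH. The template name myexample and the cluster tag nameichose4cluster are taken from the configuration example on this page; the helpers cluster_commands and run are hypothetical names used only for illustration. By default the script only prints the commands and launches nothing.

```python
import shlex
import subprocess


def cluster_commands(template, tag):
    """Return the starcluster command lines for one cluster's life cycle.

    `template` is the name of a [cluster ...] section in ~/.starcluster/config;
    `tag` is the name you give to the running cluster.
    """
    return [
        ["starcluster", "start", "-c", template, tag],  # launch the instances
        ["starcluster", "listclusters"],                # confirm it is running
        ["starcluster", "sshmaster", tag],              # log in to the master node
        ["starcluster", "terminate", tag],              # shut down when finished
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            subprocess.check_call(argv)


if __name__ == "__main__":
    # Dry run only: nothing is launched and no AWS charges are incurred.
    run(cluster_commands("myexample", "nameichose4cluster"))
```

In practice you would most likely type these commands one at a time in a shell, since sshmaster opens an interactive session and you will want to terminate the cluster only after your Pipeline run has finished.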


StarCluster Configuration Example

StarCluster creates a template configuration file in ~/.starcluster/config, which you then edit to set the correct values for your account. Here is a highly simplified example of a config file that should work. There are many options you might want to set differently, so craft your StarCluster config file with care.

####################################
## StarCluster Configuration File ##
####################################
[global]
DEFAULT_TEMPLATE=myexample

#############################################
## AWS Credentials Settings
#############################################
[aws info]
AWS_ACCESS_KEY_ID = AKImyexample8FHJJF2Q
AWS_SECRET_ACCESS_KEY = fthis_was_my_example_secretMqkMIkJjFCIGf
AWS_USER_ID=199998888709 

AWS_REGION_NAME = us-west-2                 # Choose your own region
AWS_REGION_HOST = ec2.us-west-2.amazonaws.com
AWS_S3_HOST = s3-us-west-2.amazonaws.com

###########################
## EC2 Keypairs
###########################
[key west2_starcluster]
KEY_LOCATION = ~/.ssh/AWS/west2_starcluster_key.rsa   # Same region

###########################################
## Define Cluster
##   starcluster start -c myexample  nameichose4cluster
###########################################
[cluster myexample]
KEYNAME = west2_starcluster                 # Name I chose
CLUSTER_SIZE = 8                            # Number of nodes
CLUSTER_SHELL = bash

#  This is the 64 bit AMI from starcluster
NODE_IMAGE_ID = ami-c6bd50f6
AVAILABILITY_ZONE = us-west-2a              # Region again!
NODE_INSTANCE_TYPE = m1.large               # 7.5 GB memory should work for Pipeline
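
Several values in a config like this one must agree with each other: DEFAULT_TEMPLATE must name a [cluster ...] section, each KEYNAME must name a [key ...] section, and the availability zone must belong to the chosen region. The following is a minimal sketch of a sanity check for those cross-references, written with Python's standard configparser module. The function check_starcluster_config and the trimmed SAMPLE text are hypothetical and only for illustration; this is not how StarCluster itself validates its config, and the choice to strip inline # comments is an assumption of this sketch.

```python
import configparser

# Trimmed copy of the example config, keeping only the cross-referenced values.
SAMPLE = """
[global]
DEFAULT_TEMPLATE=myexample

[aws info]
AWS_REGION_NAME = us-west-2                 # Choose your own region

[key west2_starcluster]
KEY_LOCATION = ~/.ssh/AWS/west2_starcluster_key.rsa

[cluster myexample]
KEYNAME = west2_starcluster                 # Name I chose
AVAILABILITY_ZONE = us-west-2a              # Region again!
"""


def check_starcluster_config(text):
    """Return a list of cross-reference problems found in a config string."""
    # Assumption of this sketch: " #" starts an inline comment.
    cfg = configparser.ConfigParser(
        interpolation=None, inline_comment_prefixes=("#",)
    )
    cfg.read_string(text)
    problems = []

    # The default template must point at an existing [cluster NAME] section.
    template = cfg.get("global", "default_template", fallback=None)
    if template is None or not cfg.has_section("cluster " + template):
        problems.append("DEFAULT_TEMPLATE %r has no [cluster ...] section" % template)

    region = cfg.get("aws info", "aws_region_name", fallback=None)
    for section in cfg.sections():
        if not section.startswith("cluster "):
            continue
        # Each cluster's key pair must be defined in a [key NAME] section.
        keyname = cfg.get(section, "keyname", fallback=None)
        if keyname is None or not cfg.has_section("key " + keyname):
            problems.append("[%s] KEYNAME %r has no [key ...] section" % (section, keyname))
        # Availability zone names are the region name plus a letter suffix.
        zone = cfg.get(section, "availability_zone", fallback=None)
        if region and zone and not zone.startswith(region):
            problems.append("[%s] zone %r is not in region %r" % (section, zone, region))
    return problems


if __name__ == "__main__":
    print(check_starcluster_config(SAMPLE) or "no problems found")
```

To check your own file, you could pass its contents instead of SAMPLE, for example by reading ~/.starcluster/config into a string first.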