StarCluster

From Genome Analysis Wiki
Revision as of 10:53, 29 October 2012 by Terry Gliedt (talk | contribs)

Back to the beginning [1]

If you have access to your own cluster, your task is much simpler: install the Pipeline software (links at [2]) and run it as described on those pages.

For those not so lucky as to have access to a cluster, AWS provides an alternative: you may run the pipeline software on a cluster created in AWS. One tool that simplifies creating a cluster of EC2 instances (launched from Amazon Machine Images, or AMIs) is StarCluster (see http://star.mit.edu/cluster/).

The following shows an example of how you might use StarCluster to create an AWS cluster and set it up to run the Pipeline.

We will use StarCluster to launch a set of AWS instances. There are many details involved in setting up StarCluster; this is not intended to explain every variation you might choose, but it should provide a working example.

The tasks to be completed are:

  • Install and configure starcluster on a machine you use.
  • Create an AWS cluster
  • Install the Pipeline software on the master node
  • Create storage for your sequence data and make it available for the software
  • Run the Pipeline software

Installing and configuring StarCluster on your machine is described at http://star.mit.edu/cluster/. Only the second step (creating an AWS cluster) is covered here; the others are described at [3].


StarCluster Configuration Example

StarCluster creates a model configuration file in ~/.starcluster/config, and you are instructed to edit this file and set the correct values for the variables. Here is an example of a config file that we used (with some details changed, of course).

 ## StarCluster Configuration File ##
 [global]
 DEFAULT_TEMPLATE=xxx

 ## AWS Credentials Settings
 [aws info]
 AWS_ACCESS_KEY_ID = AKImyexample8FHJJF2Q
 AWS_SECRET_ACCESS_KEY = fthis_was_my_example_secretMqkMIkJjFCIGf
 AWS_USER_ID=199998888709

 AWS_REGION_NAME = us-west-2      # Choose your own region
 AWS_REGION_HOST = ec2.us-west-2.amazonaws.com
 AWS_S3_HOST = s3-us-west-2.amazonaws.com

 ## EC2 Keypairs
 [key east1_starcluster]
 KEY_LOCATION = ~/.ssh/AWS/east1_starcluster_key.rsa

 [key west1_starcluster]
 KEY_LOCATION = ~/.ssh/AWS/west1_starcluster_key.rsa

 [key west2_starcluster]
 KEY_LOCATION = ~/.ssh/AWS/west2_starcluster_key.rsa
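The StarCluster config uses INI syntax, so Python's standard configparser can read it. That makes it easy to sanity-check that the required credentials and at least one keypair section are present before launching anything. A minimal sketch (not part of StarCluster itself; the config text is embedded here for illustration, where in practice you would read ~/.starcluster/config):

```python
import configparser

# Embedded copy of a minimal StarCluster config (normally read from
# ~/.starcluster/config); used here to sanity-check required settings.
CONFIG_TEXT = """
[global]
DEFAULT_TEMPLATE=xxx

[aws info]
AWS_ACCESS_KEY_ID = AKImyexample8FHJJF2Q
AWS_SECRET_ACCESS_KEY = fthis_was_my_example_secretMqkMIkJjFCIGf
AWS_USER_ID=199998888709
AWS_REGION_NAME = us-west-2

[key west2_starcluster]
KEY_LOCATION = ~/.ssh/AWS/west2_starcluster_key.rsa
"""

parser = configparser.ConfigParser()
parser.optionxform = str              # keep option names upper-case
parser.read_string(CONFIG_TEXT)

# every config needs credentials and at least one [key ...] section
for option in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_USER_ID"):
    assert parser.has_option("aws info", option), "missing " + option

keys = [s for s in parser.sections() if s.startswith("key ")]
print("keypair sections:", keys)
```

Catching a missing key here is much faster than waiting for a failed `starcluster start`.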


For reference, the rest of the model configuration file that StarCluster generates documents all of the available settings. It is reproduced below with its comments intact.

 # Configure the default cluster template to use when starting a cluster
 # defaults to 'smallcluster' defined below. This template should be usable
 # out-of-the-box provided you've configured your keypair correctly

 DEFAULT_TEMPLATE=smallcluster

 # enable experimental features for this release
 #ENABLE_EXPERIMENTAL=True
 # number of seconds to wait when polling instances (default: 30s)
 #REFRESH_INTERVAL=15
 # specify a web browser to launch when viewing spot history plots
 #WEB_BROWSER=chromium
 # split the config into multiple files
 #INCLUDE=~/.starcluster/aws, ~/.starcluster/keys, ~/.starcluster/vols
 ## AWS Credentials and Connection Settings ##
 [aws info]
 # This is the AWS credentials section (required).
 # These settings apply to all clusters
 # replace these with your AWS keys
 AWS_ACCESS_KEY_ID = #your_aws_access_key_id
 AWS_SECRET_ACCESS_KEY = #your_secret_access_key
 # replace this with your account number
 AWS_USER_ID= #your userid
 # Uncomment to specify a different Amazon AWS region (OPTIONAL)
 # (defaults to us-east-1 if not specified)
 # NOTE: AMIs have to be migrated!
 #AWS_REGION_NAME = eu-west-1
 #AWS_REGION_HOST = ec2.eu-west-1.amazonaws.com
 # Uncomment these settings when creating an instance-store (S3) AMI (OPTIONAL)
 #EC2_CERT = /path/to/your/cert-asdf0as9df092039asdfi02089.pem
 #EC2_PRIVATE_KEY = /path/to/your/pk-asdfasd890f200909.pem
 # Uncomment these settings to use a proxy host when connecting to AWS
 #AWS_PROXY = your.proxyhost.com
 #AWS_PROXY_PORT = 8080
 #AWS_PROXY_USER = yourproxyuser
 #AWS_PROXY_PASS = yourproxypass

 ## Defining EC2 Keypairs ##
 # Sections starting with "key" define your keypairs. See "starcluster createkey
 # --help" for instructions on how to create a new keypair. Section name should
 # match your key name e.g.:
 [key mykey]
 KEY_LOCATION=~/.ssh/mykey.rsa

 # You can of course have multiple keypair sections
 #[key myotherkey]
 #KEY_LOCATION=~/.ssh/myotherkey.rsa
 ## Defining Cluster Templates ##
 # Sections starting with "cluster" represent a cluster template. These
 # "templates" are a collection of settings that define a single cluster
 # configuration and are used when creating and configuring a cluster. You can
 # change which template to use when creating your cluster using the -c option
 # to the start command:
 #     $ starcluster start -c mediumcluster mycluster
 # If a template is not specified then the template defined by DEFAULT_TEMPLATE
 # in the [global] section above is used. Below is the "default" template named
 # "smallcluster". You can rename it but don't forget to update the
 # DEFAULT_TEMPLATE setting in the [global] section above. See the next section
 # on defining multiple templates.

 [cluster smallcluster]
 # change this to the name of one of the keypair sections defined above
 KEYNAME = mykey

 # number of ec2 instances to launch
 CLUSTER_SIZE = 2

 # create the following user on the cluster
 CLUSTER_USER = sgeadmin

 # optionally specify shell (defaults to bash)
 # (options: tcsh, zsh, csh, bash, ksh)
 CLUSTER_SHELL = bash

 # AMI to use for cluster nodes. These AMIs are for the us-east-1 region.
 # Use the 'listpublic' command to list StarCluster AMIs in other regions
 # The base i386 StarCluster AMI is ami-899d49e0
 # The base x86_64 StarCluster AMI is ami-999d49f0
 # The base HVM StarCluster AMI is ami-4583572c
 NODE_IMAGE_ID = ami-899d49e0

 # instance type for all cluster nodes
 # (options: cg1.4xlarge, c1.xlarge, m1.small, c1.medium, m2.xlarge, t1.micro,
 #  cc1.4xlarge, m1.medium, cc2.8xlarge, m1.large, m1.xlarge, m2.4xlarge, m2.2xlarge)
 NODE_INSTANCE_TYPE = m1.small

 # Uncomment to disable installing/configuring a queueing system on the
 # cluster (SGE)
 #DISABLE_QUEUE=True
 # Uncomment to specify a different instance type for the master node (OPTIONAL)
 # (defaults to NODE_INSTANCE_TYPE if not specified)
 #MASTER_INSTANCE_TYPE = m1.small
 # Uncomment to specify a separate AMI to use for the master node. (OPTIONAL)
 # (defaults to NODE_IMAGE_ID if not specified)
 #MASTER_IMAGE_ID = ami-899d49e0 (OPTIONAL)
 # availability zone to launch the cluster in (OPTIONAL)
 # (automatically determined based on volumes (if any) or
 # selected by Amazon if not specified)
 #AVAILABILITY_ZONE = us-east-1c
 # list of volumes to attach to the master node (OPTIONAL)
 # these volumes, if any, will be NFS shared to the worker nodes
 # see "Configuring EBS Volumes" below on how to define volume sections
 #VOLUMES = oceandata, biodata
 # list of plugins to load after StarCluster's default setup routines (OPTIONAL)
 # see "Configuring StarCluster Plugins" below on how to define plugin sections
 #PLUGINS = myplugin, myplugin2
 # list of permissions (or firewall rules) to apply to the cluster's security
 # group (OPTIONAL).
 #PERMISSIONS = ssh, http
 # Uncomment to always create a spot cluster when creating a new cluster from
 # this template. The following example will place a $0.50 bid for each spot
 # request.
 #SPOT_BID = 0.50
 ## Defining Additional Cluster Templates ##
 # You can also define multiple cluster templates. You can either supply all
 # configuration options as with smallcluster above, or create an
 # EXTENDS=<cluster_name> variable in the new cluster section to use all
 # settings from <cluster_name> as defaults. Below are example templates that
 # use the EXTENDS feature:

 #[cluster mediumcluster]
 # Declares that this cluster uses smallcluster as defaults
 #EXTENDS=smallcluster
 # This section is the same as smallcluster except for the following settings:
 #KEYNAME=myotherkey
 #NODE_INSTANCE_TYPE = c1.xlarge
 #CLUSTER_SIZE=8
 #VOLUMES = biodata2

 #[cluster largecluster]
 # Declares that this cluster uses mediumcluster as defaults
 #EXTENDS=mediumcluster
 # This section is the same as mediumcluster except for the following variables:
 #CLUSTER_SIZE=16
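The EXTENDS mechanism is essentially a dictionary merge: a child template starts from its parent's settings and overrides selected keys, and chains (largecluster extends mediumcluster extends smallcluster) resolve recursively. A minimal sketch of that resolution logic, not StarCluster's actual code, using dicts that mirror the example templates above:

```python
# Sketch of how EXTENDS-style template inheritance resolves: the child
# template inherits every setting from its parent, then overrides a few.
templates = {
    "smallcluster": {"KEYNAME": "mykey", "CLUSTER_SIZE": "2",
                     "NODE_INSTANCE_TYPE": "m1.small"},
    "mediumcluster": {"EXTENDS": "smallcluster", "KEYNAME": "myotherkey",
                      "NODE_INSTANCE_TYPE": "c1.xlarge", "CLUSTER_SIZE": "8"},
    "largecluster": {"EXTENDS": "mediumcluster", "CLUSTER_SIZE": "16"},
}

def resolve(name):
    """Return a template's effective settings, following EXTENDS chains."""
    settings = dict(templates[name])
    parent = settings.pop("EXTENDS", None)
    if parent is not None:
        merged = resolve(parent)   # parent settings first...
        merged.update(settings)    # ...then child keys override
        return merged
    return settings

print(resolve("largecluster"))
# largecluster keeps mediumcluster's keypair and instance type,
# but overrides CLUSTER_SIZE with 16
```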
 ## Configuring EBS Volumes ##
 # StarCluster can attach one or more EBS volumes to the master and then
 # NFS-share these volumes to all of the worker nodes. A new [volume] section
 # must be created for each EBS volume you wish to use with StarCluster. The
 # section name is a tag for your volume. This tag is used in the VOLUMES
 # setting of a cluster template to declare that an EBS volume is to be mounted
 # and NFS-shared on the cluster. (see the commented VOLUMES setting in the
 # example 'smallcluster' template above) Below are some examples of defining
 # and configuring EBS volumes to be used with StarCluster:

 # Sections starting with "volume" define your EBS volumes
 #[volume biodata]
 # attach vol-c999999 to /home on master node and NFS-share to worker nodes
 #VOLUME_ID = vol-c999999
 #MOUNT_PATH = /home

 # Same volume as above, but mounts to different location
 #[volume biodata2]
 #VOLUME_ID = vol-c999999
 #MOUNT_PATH = /opt/

 # Another volume example
 #[volume oceandata]
 #VOLUME_ID = vol-d7777777
 #MOUNT_PATH = /mydata

 # By default StarCluster will attempt first to mount the entire volume device,
 # failing that it will try the first partition. If you have more than one
 # partition you will need to set the PARTITION number, e.g.:
 #[volume oceandata]
 #VOLUME_ID = vol-d7777777
 #MOUNT_PATH = /mydata
 #PARTITION = 2
 ## Configuring Security Group Permissions ##
 # Sections starting with "permission" define security group rules to
 # automatically apply to newly created clusters. PROTOCOL in the following
 # examples can be: tcp, udp, or icmp. CIDR_IP defaults to 0.0.0.0/0 or
 # "open to the world"

 # open port 80 on the cluster to the world
 #[permission http]
 #PROTOCOL = tcp
 #FROM_PORT = 80
 #TO_PORT = 80

 # open https on the cluster to the world
 #[permission https]
 #PROTOCOL = tcp
 #FROM_PORT = 443
 #TO_PORT = 443

 # open port 80 on the cluster to an ip range using CIDR_IP
 #[permission http]
 #PROTOCOL = tcp
 #FROM_PORT = 80
 #TO_PORT = 80
 #CIDR_IP = 18.0.0.0/8

 # restrict ssh access to a single ip address (<your_ip>)
 #[permission ssh]
 #PROTOCOL = tcp
 #FROM_PORT = 22
 #TO_PORT = 22
 #CIDR_IP = <your_ip>/32
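CIDR_IP narrows which source addresses a rule admits: 18.0.0.0/8 matches any address whose first octet is 18, while a /32 matches exactly one host (as in the ssh example). One quick way to sanity-check a CIDR value before putting it in the config is Python's standard ipaddress module; the sample addresses below are arbitrary:

```python
import ipaddress

# 18.0.0.0/8 covers every address whose first octet is 18
net = ipaddress.ip_network("18.0.0.0/8")
print(ipaddress.ip_address("18.4.5.6") in net)   # True: inside the /8
print(ipaddress.ip_address("19.4.5.6") in net)   # False: first octet differs

# a /32 matches exactly one host, as in the ssh rule above
single = ipaddress.ip_network("203.0.113.7/32")
print(single.num_addresses)                      # 1
```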


 ## Configuring StarCluster Plugins ##
 # Sections starting with "plugin" define a custom python class which performs
 # additional configuration on top of StarCluster's default routines. These plugins
 # can be assigned to a cluster template to customize the setup procedure when
 # starting a cluster from this template (see the commented PLUGINS setting in
 # the 'smallcluster' template above). Below is an example of defining a user
 # plugin called 'myplugin':

 #[plugin myplugin]
 # NOTE: myplugin module must either live in ~/.starcluster/plugins or be
 # on your PYTHONPATH
 #SETUP_CLASS = myplugin.SetupClass
 # extra settings are passed as __init__ arguments to your plugin:
 #SOME_PARAM_FOR_MY_PLUGIN = 1
 #SOME_OTHER_PARAM = 2
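As the comments note, the extra settings in a plugin section are passed as __init__ arguments, and StarCluster then calls the class's run() method after its default setup. A real plugin would subclass starcluster.clustersetup.ClusterSetup; the standalone skeleton below only sketches the shape (the class and parameter names echo the hypothetical 'myplugin' section above and are not a real plugin):

```python
# Standalone sketch of a StarCluster plugin class (hypothetical 'myplugin').
# A real plugin would subclass starcluster.clustersetup.ClusterSetup; this
# version is kept dependency-free to illustrate the interface only.
class SetupClass:
    def __init__(self, some_param_for_my_plugin=1, some_other_param=2):
        # values from the [plugin myplugin] config section arrive here;
        # config values are strings, hence the int() conversions
        self.some_param_for_my_plugin = int(some_param_for_my_plugin)
        self.some_other_param = int(some_other_param)

    def run(self, nodes, master, user, user_shell, volumes):
        # called on the live cluster after StarCluster's default setup;
        # a real plugin would run commands on each node over SSH here
        return "configured %d node(s) for user %s" % (len(nodes), user)

plugin = SetupClass(some_param_for_my_plugin="1", some_other_param="2")
print(plugin.run(["master", "node001"], "master", "sgeadmin", "bash", []))
```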
 ## Built-in Plugins ##
 # The following plugins ship with StarCluster and should work out-of-the-box.
 # Uncomment as needed. Don't forget to update your PLUGINS list!
 # See http://web.mit.edu/star/cluster/docs/latest/plugins for plugin details.

 # Use this plugin to install one or more packages on all nodes
 #[plugin pkginstaller]
 #SETUP_CLASS = starcluster.plugins.pkginstaller.PackageInstaller
 # list of apt-get installable packages
 #PACKAGES = mongodb, python-pymongo

 # Use this plugin to create one or more cluster users and download all user ssh
 # keys to $HOME/.starcluster/user_keys/<cluster>-<region>.tar.gz
 #[plugin createusers]
 #SETUP_CLASS = starcluster.plugins.users.CreateUsers
 #NUM_USERS = 30
 # you can also comment out NUM_USERS and specify exact usernames, e.g.
 #usernames = linus, tux, larry
 #DOWNLOAD_KEYS = True

 # Use this plugin to configure the Condor queueing system
 #[plugin condor]
 #SETUP_CLASS = starcluster.plugins.condor.CondorPlugin

 # The SGE plugin is enabled by default and not strictly required. Only use this
 # if you want to tweak advanced settings in which case you should also set
 # DISABLE_QUEUE=TRUE in your cluster template. See the plugin doc for more
 # details.
 #[plugin sge]
 #SETUP_CLASS = starcluster.plugins.sge.SGEPlugin
 #MASTER_IS_EXEC_HOST = False

 # The IPCluster plugin configures a parallel IPython cluster with optional
 # web notebook support. This allows you to run Python code in parallel with low
 # latency message passing via ZeroMQ.
 #[plugin ipcluster]
 #SETUP_CLASS = starcluster.plugins.ipcluster.IPCluster
 #ENABLE_NOTEBOOK = True
 # set a password for the notebook for increased security
 #NOTEBOOK_PASSWD = a-secret-password

 # Use this plugin to create a cluster SSH "dashboard" using tmux. The plugin
 # creates a tmux session on the master node that automatically connects to all
 # the worker nodes over SSH. Attaching to the session shows a separate window
 # for each node and each window is logged into the node via SSH.
 #[plugin tmux]
 #SETUP_CLASS = starcluster.plugins.tmux.TmuxControlCenter

 # Use this plugin to change the default MPI implementation on the
 # cluster from OpenMPI to MPICH2.
 #[plugin mpich2]
 #SETUP_CLASS = starcluster.plugins.mpich2.MPICH2Setup

 # Configure a hadoop cluster. (includes dumbo setup)
 #[plugin hadoop]
 #SETUP_CLASS = starcluster.plugins.hadoop.Hadoop

 # Configure a distributed MySQL Cluster
 #[plugin mysqlcluster]
 #SETUP_CLASS = starcluster.plugins.mysql.MysqlCluster
 #NUM_REPLICAS = 2
 #DATA_MEMORY = 80M
 #INDEX_MEMORY = 18M
 #DUMP_FILE = test.sql
 #DUMP_INTERVAL = 60
 #DEDICATED_QUERY = True
 #NUM_DATA_NODES = 2

 # Install and setup an Xvfb server on each cluster node
 #[plugin xvfb]
 #SETUP_CLASS = starcluster.plugins.xvfb.XvfbSetup