StarCluster
If you have access to your own cluster, your task will be much simpler. Install the Pipeline software (links at [2]) and run it as described on the same pages.
For those without access to a cluster, AWS provides an alternative: you can run the Pipeline software on a cluster created in AWS. One tool that makes it easier to create a cluster of AWS instances launched from AMIs (Amazon Machine Images) is StarCluster (see http://star.mit.edu/cluster/).
The following shows an example of how you might use starcluster to create an AWS cluster and set it up to run the Pipeline.
We will use starcluster to launch a set of AWS instances. There are many details to setting up starcluster; this page does not try to explain all the variations you might choose, but it should give you a working example.
The tasks to be completed are:
- Install and configure starcluster on a machine you use.
- Create an AWS cluster
- Install the Pipeline software on the master node
- Create storage for your sequence data and make it available for the software
- Run the Pipeline software
Installing and configuring starcluster on your machine is described at http://star.mit.edu/cluster/. Only the second step will be covered here, as the others are described at [3].
StarCluster Configuration Example
StarCluster creates a model configuration file in ~/.starcluster/config and you are instructed to edit this and set the correct values for the variables. Here is a highly simplified example of a config file that should work. Please note there are many things you might want to choose, so craft the starcluster config file with care.
####################################
## StarCluster Configuration File ##
####################################
[global]
DEFAULT_TEMPLATE=myexample
#############################################
## AWS Credentials Settings
#############################################
[aws info]
AWS_ACCESS_KEY_ID = AKImyexample8FHJJF2Q
AWS_SECRET_ACCESS_KEY = fthis_was_my_example_secretMqkMIkJjFCIGf
AWS_USER_ID=199998888709
AWS_REGION_NAME = us-west-2 # Choose your own region
AWS_REGION_HOST = ec2.us-west-2.amazonaws.com
AWS_S3_HOST = s3-us-west-2.amazonaws.com
###########################
## EC2 Keypairs
###########################
[key west2_starcluster]
KEY_LOCATION = ~/.ssh/AWS/west2_starcluster_key.rsa # Same region
###########################################
## Define Cluster
## starcluster start -c myexample nameichose4cluster
###########################################
[cluster myexample]
KEYNAME = west2_starcluster # Name I chose
CLUSTER_SIZE = 3 # Number of nodes
CLUSTER_SHELL = bash
# Choose the base AMI: starcluster listpublic
# (http://star.mit.edu/cluster/docs/0.93.3/faq.html)
NODE_IMAGE_ID = ami-c6bd30f6
AVAILABILITY_ZONE = us-west-2a # Region again!
NODE_INSTANCE_TYPE = m1.medium # 4G memory should work for Pipeline
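The comments in the config above stress that the region must be the same everywhere. A minimal sketch of a check for that, assuming the config uses simple NAME = value lines as in this example; the helper names get_setting and check_region are hypothetical and are not part of StarCluster:

```shell
# Hypothetical helper: print the value of one setting from a config file,
# with any trailing "# comment" and surrounding blanks removed.
get_setting() {
    # $1 = setting name, $2 = config file
    awk -v key="$1" '
        /^[ \t]*#/ { next }                  # skip full-line comments
        {
            line = $0
            sub(/#.*/, "", line)             # drop a trailing comment
            eq = index(line, "=")
            if (eq == 0) next                # not a NAME = value line
            name = substr(line, 1, eq - 1)
            value = substr(line, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }
    ' "$2"
}

# Hypothetical helper: check that the availability zone and the EC2 host
# both refer to the region named in AWS_REGION_NAME.
check_region() {
    # $1 = config file
    region=$(get_setting AWS_REGION_NAME "$1")
    zone=$(get_setting AVAILABILITY_ZONE "$1")
    host=$(get_setting AWS_REGION_HOST "$1")
    rc=0
    if [ -z "$region" ]; then
        echo "AWS_REGION_NAME is not set"
        return 1
    fi
    case "$zone" in
        "$region"[a-z]) ;;                   # e.g. us-west-2 -> us-west-2a
        *) echo "AVAILABILITY_ZONE '$zone' is not in region '$region'"; rc=1 ;;
    esac
    case "$host" in
        *"$region"*) ;;
        *) echo "AWS_REGION_HOST '$host' does not mention region '$region'"; rc=1 ;;
    esac
    if [ "$rc" -eq 0 ]; then
        echo "region settings agree: $region"
    fi
    return "$rc"
}
```

After defining the functions in your shell, you might run check_region ~/.starcluster/config before starting the cluster. This only compares the strings in the file; it does not contact AWS.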
Create Your Cluster
starcluster start -c myexample myseq-example
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 3-node cluster...
>>> Creating security group @sc-myseq-example...
Reservation:r-c3b4f6f0
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.282 mins
>>> The master node is ec2-50-112-230-67.us-west-2.compute.amazonaws.com
>>> Setting up the cluster...
>>> Configuring hostnames...
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating cluster user: None (uid: 1001, gid: 1001)
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): sgeadmin
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home
>>> Mounting all NFS export path(s) on 2 worker node(s)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.096 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for sgeadmin
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring SGE...
>>> Configuring NFS exports path(s):
/opt/sge6
>>> Mounting all NFS export path(s) on 2 worker node(s)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.048 mins
>>> Installing Sun Grid Engine...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating SGE parallel environment 'orte'
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Running plugin createusers
>>> Creating 2 cluster users
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring passwordless ssh for 2 cluster users
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): mktrost, tpg
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Tarring all SSH keys for cluster users...
>>> Copying cluster users SSH keys to: /tmp/myseq-example-us-west-2.tar.gz
/tmp/myseq-example-us-west-2.tar.gz 100% |||||||||||| Time: 00:00:00 0.00 B/s
>>> Configuring cluster took 1.583 mins
>>> Starting cluster took 2.895 mins
The cluster is now ready to use. To login to the master node
as root, run:
$ starcluster sshmaster myseq-example
If you're having issues with the cluster you can reboot the
instances and completely reconfigure the cluster from
scratch using:
$ starcluster restart myseq-example
When you're finished using the cluster and wish to terminate
it and stop paying for service:
$ starcluster terminate myseq-example
Alternatively, if the cluster uses EBS instances, you can
use the 'stop' command to shutdown all nodes and put them
into a 'stopped' state preserving the EBS volumes backing
the nodes:
$ starcluster stop myseq-example
WARNING: Any data stored in ephemeral storage (usually /mnt)
will be lost!
You can activate a 'stopped' cluster by passing the -x
option to the 'start' command:
$ starcluster start -x myseq-example
This will start all 'stopped' nodes and reconfigure the
cluster.
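The output above includes the public hostname of the master node. If you keep a copy of that output in a file (for example by piping the start command through tee), a small helper can pull the hostname back out. This is only a sketch: the function name master_host_from_log and the log file name are assumptions, and it relies on the ">>> The master node is" line looking exactly as it does in the output shown here.

```shell
# Hypothetical helper: print the master node's public hostname from a saved
# copy of the "starcluster start" output.
master_host_from_log() {
    # $1 = file holding the saved output
    # Keep only the text after the fixed prefix; if the line appears more
    # than once (e.g. several runs in one file), use the most recent one.
    sed -n 's/^>>> The master node is \(.*\)$/\1/p' "$1" | tail -n 1
}
```

For example, master_host_from_log start.log would print the ec2-... name, which you could then pass to scp when copying files to the master node.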
Set Up Master Node, Login as root
starcluster sshmaster myseq-example
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
[lines deleted]
# Install Pipeline software just like says in
mkdir debs
cd debs
wget ftp://share.sph.umich.edu/biopipe/current-align.deb
[lines deleted]
wget ftp://share.sph.umich.edu/biopipe/current-umake.deb
[lines deleted]
dpkg -i current-align.deb current-umake.deb
[lines deleted]
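The download and install steps above can also be wrapped in a small function that uses full paths, so it does not matter which directory you are in when dpkg runs. This is a sketch under assumptions: the names run and fetch_and_install and the DRY_RUN variable are hypothetical, and the URLs are the ones used above. With DRY_RUN=1 the commands are only printed, which lets you review them before running anything as root.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: download each .deb into one directory, then install
# them all with a single dpkg call.
fetch_and_install() {
    # $1 = download directory; remaining arguments = URLs of .deb files
    dir=$1
    shift
    run mkdir -p "$dir" || return 1
    files=""
    for url in "$@"; do
        file="$dir/${url##*/}"               # keep the name after the last "/"
        run wget -O "$file" "$url" || return 1
        files="$files $file"
    done
    # $files is left unquoted on purpose so each path becomes one argument;
    # this assumes the paths contain no spaces.
    run dpkg -i $files
}
```

A possible use on the master node would be: DRY_RUN=1 fetch_and_install /root/debs ftp://share.sph.umich.edu/biopipe/current-align.deb ftp://share.sph.umich.edu/biopipe/current-umake.deb, and then the same call without DRY_RUN=1 once the printed commands look right.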