Difference between revisions of "Amazon Snapshot"

From Genome Analysis Wiki

Revision as of 10:09, 13 November 2012

Back to the beginning [1]

GotCloud is made available in various forms. In Amazon Web Services the software is made available as an EBS (Elastic Block Store) Snapshot. This is simply a copy of a data volume we have created that holds our software and, additionally, some reference files you will find useful. You only need to create your own EBS volume from our snapshot and mount the new volume, and you are ready.

If this does not work or is unacceptable for some reason, you may instead install the software and reference files wherever you'd like from a Debian package.

Your first task is to get an AWS account and keys so that you can use the AWS EC2 Console Dashboard (see https://console.aws.amazon.com/ec2/). From here you can launch instances prepared by others or create your own. We cannot assist with this step, but Amazon has plenty of documentation. Once you are at the AWS EC2 Console Dashboard, you're almost ready for GotCloud.


Your First Instance

You'll need to know some details when launching an instance:

  • Launch an Instance - use any instance running 64 bit software and either an Ubuntu distribution of any version or a Redhat/CentOS 6.3 distribution.

  • Instance size (memory and number of processors). The pipeline software requires at least 4GB of memory (type m1.medium) and can use as many processors as are available.
  • GotCloud Volume (copy from GotCloud snapshot). We provide an AWS snapshot of a small volume which contains the aligner and umake software and reference files. Your task is to create an EBS volume based on our snapshot and then mount that volume on your instance (see below for more precise details).
  • Storage for the instance refers to the size of the root (/) partition. This can be quite small; as little as 8GB should work. If you intend to bring lots of other files or programs to the instance, you may want to increase this to something a bit larger (e.g. 30GB).
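
Once an instance is running, its processor count and memory can be checked from a shell. This is only a minimal sketch, assuming a Linux instance that provides /proc/meminfo and the coreutils nproc command; the function name mem_total_mb is illustrative and not part of GotCloud.

```shell
# Print total memory in MB, read from a meminfo-style file
# (defaults to /proc/meminfo, where MemTotal is reported in kB).
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

echo "Processors: $(nproc)"
echo "Memory:     $(mem_total_mb) MB"
```

Compare the reported values with the requirements listed above before starting the pipeline.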


Prepare Your Instance

You will also want additional storage volumes for:

  • GotCloud software and reference files
  • Your data
    • Sequence data
    • Output of the aligner
    • Output of umake

The first of these is a small volume based on our AWS snapshot, which contains the aligner and umake software and reference files. Create an EBS volume from the snapshot and then mount that volume on your instance. In the EC2 Management Console under ELASTIC BLOCK STORE, select Volumes -> Create Volume. In the prompt, supply the size and snapshot from the table below. You may take the defaults for the Volume Type and IOPS.

                            Availability
  Name                         Zone         Snapshot        Size
  GotCloud distribution      us-west-2a     snap-f8c854de   40GB

and create the volume. Once the volume is available, attach it to your instance and choose a device name such as /dev/sdf. Note that a device attached as /dev/sdf typically appears as /dev/xvdf inside your Linux instance. Once the volume is attached, mount it by logging into your instance with ssh and issuing the commands:

 sudo mkdir -p /gotcloud
 sudo mount /dev/xvdf  /gotcloud    # or whatever device yours is
 df -h

This will make the GotCloud software available under the path /gotcloud/bin etc. Each time your instance is started, you'll need to mount this volume. You may want to create a small shell script to mount the device.
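
A small mount script along those lines might look like the following. It is a sketch rather than a tested GotCloud tool: the function names are hypothetical, and the device and mount point defaults simply repeat the example above, so adjust them to match your instance.

```shell
#!/bin/sh
# Defaults follow the example above; override via the environment if yours differ.
DEVICE="${DEVICE:-/dev/xvdf}"
MOUNTPOINT="${MOUNTPOINT:-/gotcloud}"

# Succeed if something is mounted at directory $1, according to the
# mount table in $2 (defaults to /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Mount the GotCloud volume unless it is already mounted.
mount_gotcloud() {
    if is_mounted "$MOUNTPOINT"; then
        echo "$MOUNTPOINT is already mounted"
        return 0
    fi
    sudo mkdir -p "$MOUNTPOINT" && sudo mount "$DEVICE" "$MOUNTPOINT"
}
```

Save this to a file, add a final line that calls mount_gotcloud, and run it each time the instance starts. Checking the mount table first makes the script safe to run more than once.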

For Your Data, the storage volumes will vary based on what data you have. The sequence data might already exist on a volume provided by the vendor who created it. If not, you'll have to create a volume for this and copy your data into it. You'll have to mount volumes for all three types of data (sequence, aligner and umake).

You should expect that the three data volumes will all need to be the same size. That is, if your sequence data is 300GB, then you'll need an additional 300GB for the aligner output and another 300GB of storage for the umake output. We suggest you consider placing each set of data on its own volume.
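
The sizing rule above is simple arithmetic, sketched here as a shell function. The function name is illustrative, and the equal-size assumption is just the rule of thumb stated above, not a measured figure.

```shell
# Rough sizing: each of the three data volumes is assumed to be
# as large as the sequence data itself (size given in whole GB).
estimate_storage_gb() {
    seq_gb="$1"
    echo "sequence=${seq_gb}GB aligner=${seq_gb}GB umake=${seq_gb}GB total=$((seq_gb * 3))GB"
}

estimate_storage_gb 300
# prints: sequence=300GB aligner=300GB umake=300GB total=900GB
```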

You may also find that your sequence data is too large to be easily handled in one go, so you might choose to only use the aligner/umake on part of your sequence data, capture the files of interest from umake, and then go back and rerun the software with the next bit of sequence data.