Changes

1,114 bytes added , 16:49, 5 November 2014

no edit summary

Line 1: Line 1: −

Back to the beginning at [~~http://genome.sph.umich.edu/wiki/Pipelines~~]

+

Back to the beginning at [[GotCloud]]

Setting up your storage is perhaps the most difficult step because it depends entirely on the size of your data.

As a general rule, you will need three times the space required for your sequence data.

For instance, in the 1000 Genomes data, the data for one individual takes about 45GB.

−

If you have 1000 Genome data for nine individuals, you'll need about 1500GB of space (~~9x450x3~~ plus a little extra space).

+

If you have 1000 Genomes data for nine individuals, you'll need about 1500GB of space (9 x 45GB x 3 = 1215GB, plus a little extra space).
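The rule of thumb above can be written out as a small calculation. This is only a sketch: the function name `required_storage_gb` and its parameters are made up for illustration, and the defaults (45GB per individual, a factor of three) are the figures quoted above.

```python
def required_storage_gb(individuals, gb_per_individual=45, multiplier=3):
    """Estimate storage in GB using the 'three times the sequence data' rule.

    The result does not include the extra headroom mentioned in the text.
    """
    return individuals * gb_per_individual * multiplier


# Nine individuals at about 45GB each, times three.
print(required_storage_gb(9))  # 1215
```

Rounding the result up (for example to 1500GB) leaves the "little extra space" the text recommends.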

Making your data available for the Pipeline software can be accomplished in many ways.

Here is a simple, straightforward organization you might want to use.

+

===Making Use of Instance Storage===

+

Some instance types provide instance store volumes. By default they are not attached to your instance, so you need to add them before launching it.

−

'''~~Create Volumes~~'''

+

I found instructions at: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#InstanceStore_UsageScenarios

+

* That page referred me to: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html#Using_OverridingAMIBDM

+

'''Make sure you add the instance store prior to launching'''

+

* After launching, only one of the 2 instance stores was mounted at /mnt

+

** To mount the other instance store, I followed the instructions in the '''To make a volume available''' section of: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-add-volume-to-instance.html
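As a rough illustration of the launch-time step, the sketch below builds a block device mapping list for two instance store volumes. The helper name `instance_store_mappings` and the device names (`/dev/sdb`, `/dev/sdc`) are assumptions, not taken from this page; the `DeviceName`/`VirtualName` keys and the `ephemeral0`, `ephemeral1` names follow the block device mapping format described in the AWS documentation linked above. Check the device names against your instance type and AMI before using anything like this.

```python
def instance_store_mappings(count, first_device_letter="b"):
    """Build block device mappings for `count` instance store volumes.

    Device names are assumed to start at /dev/sdb; adjust for your AMI.
    """
    mappings = []
    for i in range(count):
        letter = chr(ord(first_device_letter) + i)
        mappings.append({
            "DeviceName": f"/dev/sd{letter}",  # assumed device name
            "VirtualName": f"ephemeral{i}",    # instance store volume i
        })
    return mappings


print(instance_store_mappings(2))
```

A list in this shape is what a launch request's block device mapping setting expects. Any instance store that is still not mounted after launch has to be mounted by hand, as described in the bullets above.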

+

===Create Volumes===

* Launch your instance and log in as explained in the AWS documentation.

Line 29: Line 39: −

~~'''~~Prepare and Attach Volumes~~'''~~

+

====Prepare and Attach Volumes====

The first time, you need to prepare the disks by formatting and mounting them.

Line 80: Line 90: −

~~'''~~Getting Your Sequence Data~~'''~~

+

===Getting Your Sequence Data===

Now it's time to get your sequence data so you can run the Pipeline on it.

Line 108: Line 118:

Files created

Completed bucket 'seq/data/HG01112' in 378.38 min

−

+

[lines deleted]

</code>

Line 115: Line 125:

By now you know that getting your sequence data can take a long time.

When this completes you are finally ready to run the Pipeline software.

+

In one case we copied 387GB of data from 1000 Genomes to our own EBS volume.

+

This took about 60 hours (longer actually because the copy failed and had to be restarted).

+

For an m1.medium instance (an instance of any size can be used for this step), this cost about $20 (Oct 2012).

+

The cost for the 500GB EBS volume where the 1000 Genomes data was copied is very low ($0.50/month).
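To plan a copy of your own, it can help to turn the figures above into an average transfer rate. The sketch below does that arithmetic; the function names are hypothetical, 1GB is treated as 1000MB, and the 60 hours is the approximate figure quoted above, so the result is only a ballpark.

```python
def average_rate_mb_per_s(gigabytes, hours):
    """Average transfer rate in MB/s, treating 1GB as 1000MB."""
    return gigabytes * 1000 / (hours * 3600)


def hours_to_copy(gigabytes, rate_mb_per_s):
    """Hours needed to copy `gigabytes` at a steady `rate_mb_per_s`."""
    return gigabytes * 1000 / rate_mb_per_s / 3600


rate = average_rate_mb_per_s(387, 60)
print(round(rate, 2))  # 1.79
```

Passing your own data size and this rate to `hours_to_copy` gives a first guess at how long a similar copy might take; actual rates will vary, and restarts after a failed copy add to the total.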

Mktrost

Administrators

3,045

edits

Amazon Storage (view source)

Revision as of 16:49, 5 November 2014
