Amazon Storage

From Genome Analysis Wiki
Revision as of 09:04, 29 October 2012 by Terry Gliedt (talk | contribs)


Setting up your storage is perhaps the most difficult step, as it is driven entirely by the size of your data. As a general rule you will need three times the space required for your sequence data. For instance, in the 1000 Genomes data, the data for one individual takes about 45GB. If you have 1000 Genomes data for nine individuals, you'll need about 1500GB of space (9 x 45 x 3 = 1215GB, plus a little extra space).
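The sizing rule above can be sketched as a quick shell calculation; the per-individual size and count are the example figures from this page:

```shell
# Rough EBS sizing: per-individual data size x number of individuals x 3
PER_INDIVIDUAL_GB=45      # approximate size of one 1000 Genomes individual
INDIVIDUALS=9
NEEDED_GB=$((PER_INDIVIDUAL_GB * INDIVIDUALS * 3))
echo "Minimum estimate: ${NEEDED_GB}GB"   # round up (e.g. to 1500GB) for headroom
```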

Making your data available to the Pipeline software can be accomplished in many ways. Here is a simple, straightforward organization you might want to use.

  • Launch your instance and login as explained in the AWS documentation.
  • Using the AWS EC2 Console Dashboard create one EBS volume (ELASTIC BLOCK STORE -> Volumes) for the sequence data (e.g. 500GB).
  • Using the Dashboard create another EBS volume for the output of the aligner step (e.g. another 500GB).
  • Using the Dashboard create another EBS volume for the output of the umake step (e.g. another 500GB).
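If you prefer the command line to the Dashboard, the three volumes above can also be created with the AWS CLI. This is only a sketch: the availability zone shown is an assumption and must match the zone of your instance.

```shell
# Hypothetical sketch: create three 500GB EBS volumes with the AWS CLI
# (us-east-1a is an assumption; use the availability zone of your instance)
for i in 1 2 3; do
    aws ec2 create-volume --size 500 --availability-zone us-east-1a
done
```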

Attach each volume to the instance you have just launched and specify a separate device for each: f, g and h (e.g. /dev/sdf, /dev/sdg and /dev/sdh). Note: as of this writing you specify a device as sdf, but it will actually show up as /dev/xvdf in the instance.
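Attaching can likewise be done from the command line. In this sketch the volume and instance IDs are placeholders for the ones the previous steps produced, not real identifiers:

```shell
# Hypothetical sketch: attach one volume as device sdf
# (vol-xxxxxxxx and i-xxxxxxxx are placeholders)
aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxx --device /dev/sdf
```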

The first time, you need to prepare the disks by formatting and mounting them:

 sudo fdisk -l /dev/xvdg
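The fdisk command above only verifies that the instance sees the device. A minimal sketch of the actual formatting and mounting might look like the following; the ext4 filesystem type and the mount point are assumptions of this sketch, not requirements from this page:

```shell
# Hedged sketch: format a fresh volume and mount it
# (ext4 and /seq are assumptions; repeat for /dev/xvdg and /dev/xvdh)
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /seq
sudo mount /dev/xvdf /seq
```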