Amazon Storage

Back to parent [http://genome.sph.umich.edu/wiki/Pipelines]

Setting up your storage is perhaps the most difficult step as it is controlled completely by the size of your data. As a general rule you will need three times the space required for your sequence data. For instance, in the 1000 Genomes data, the data for one individual takes about 45GB. If you have 1000 Genomes data for nine individuals, you'll need about 1500GB of space (9 x 45GB x 3 is about 1215GB, plus a little extra space).

Making your data available to the Pipeline can be accomplished in many ways. Here is a simple, straightforward organization you might want to use.

  • Using the AWS EC2 Console Dashboard, create one EBS volume (ELASTIC BLOCK STORE -> Volumes) for the sequence data (e.g. 500GB).
  • Using the Dashboard, create another EBS volume for the output of the aligner step (e.g. another 500GB).
  • Using the Dashboard, create another EBS volume for the output of the umake step (e.g. another 500GB). (If you prefer the command line, see the sketch after this list.)
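If you would rather script this step, a minimal sketch with the AWS CLI might look like the following; the availability zone (us-east-1a here) is an assumption and must match the zone where you will launch your instance:

 # Sketch: create the three 500GB volumes (the zone below is an assumption -- use your own)
 aws ec2 create-volume --size 500 --availability-zone us-east-1a   # sequence data
 aws ec2 create-volume --size 500 --availability-zone us-east-1a   # aligner output
 aws ec2 create-volume --size 500 --availability-zone us-east-1a   # umake output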

Configure these EBS volumes so they use separate devices f, g and h (e.g. /dev/sdf (probably /dev/xvdf), /dev/sdg (probably /dev/xvdg) and /dev/sdh (probably /dev/xvdh)).
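On the command line, attaching the volumes to these devices might look like this sketch; the volume IDs (vol-...) and the instance ID (i-...) are placeholders for the IDs shown in your own Dashboard:

 # Sketch: placeholder volume/instance IDs -- substitute your own
 aws ec2 attach-volume --volume-id vol-11111111 --instance-id i-00000000 --device /dev/sdf   # sequence data
 aws ec2 attach-volume --volume-id vol-22222222 --instance-id i-00000000 --device /dev/sdg   # aligner output
 aws ec2 attach-volume --volume-id vol-33333333 --instance-id i-00000000 --device /dev/sdh   # umake output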

Launch your instance and log in as explained in the AWS documentation. The first time, you will need to prepare the disks (create a filesystem on each new volume and mount it).
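As a sketch, preparing and mounting the first disk might look like this (run as root); the ext4 filesystem and the /mnt/seq mount point are assumptions, not requirements of the Pipeline:

 mkfs.ext4 /dev/xvdf        # create a filesystem on the new volume (erases anything on it)
 mkdir -p /mnt/seq          # hypothetical mount point for the sequence data
 mount /dev/xvdf /mnt/seq   # mount it; repeat for /dev/xvdg and /dev/xvdh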