Amazon Storage
Revision as of 09:04, 29 October 2012

Back to parent [http://genome.sph.umich.edu/wiki/Pipelines]

Setting up your storage is perhaps the most difficult step, as it is determined entirely by the size of your data. As a general rule you will need three times the space required for your sequence data. For instance, in the 1000 Genomes data, the data for one individual takes about 45GB. If you have 1000 Genomes data for nine individuals, you'll need about 1500GB of space (9x45x3, rounded up to leave a little extra room).
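The sizing rule above is simple arithmetic; a quick sanity check on the command line (the counts here are the example figures from the text, not fixed requirements):

```shell
# Estimate required storage: individuals x per-individual size x safety factor of 3.
INDIVIDUALS=9
PER_SAMPLE_GB=45
echo "$((INDIVIDUALS * PER_SAMPLE_GB * 3)) GB minimum"   # 1215 GB; round up to ~1500GB
```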

Making your data available to the Pipeline software can be accomplished in many ways. Here is a simple, straightforward organization you might want to use.

  • Launch your instance and login as explained in the AWS documentation.
  • Using the AWS EC2 Console Dashboard create one EBS volume (ELASTIC BLOCK STORE -> Volumes) for the sequence data (e.g. 500GB).
  • Using the Dashboard create another EBS volume for the output of the aligner step (e.g. another 500GB).
  • Using the Dashboard create another EBS volume for the output of the umake step (e.g. another 500GB).

Attach each volume to the instance you have just launched, specifying each as a separate device: f, g and h (e.g. /dev/sdf, /dev/sdg and /dev/sdh). Note: as of this writing you specify a device as sdf, but it will actually show up as /dev/xvdf in the instance.
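If you prefer the command line to the Dashboard, the same create-and-attach steps can be sketched with the AWS CLI. The availability zone, volume ID and instance ID below are placeholders you must replace with your own, and the volume must be created in the same availability zone as the instance:

```shell
# Create a 500GB EBS volume (zone is a placeholder -- use your instance's zone).
aws ec2 create-volume --size 500 --availability-zone us-east-1a

# Attach it to your running instance as device /dev/sdf
# (vol-xxxxxxxx and i-xxxxxxxx are placeholder IDs;
#  inside the instance it will appear as /dev/xvdf).
aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxx --device /dev/sdf
```

Repeat for the other two volumes with --device /dev/sdg and /dev/sdh.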

This first time you need to prepare the disks by formatting and mounting them:

 sudo fdisk -l /dev/xvdg
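The remaining one-time preparation might look like the following sketch. The filesystem type and mount points are assumptions chosen for illustration, not prescribed by the Pipeline; adjust the device names to match how your volumes actually appear:

```shell
# Create a filesystem on each new volume (WARNING: this erases the volume).
sudo mkfs -t ext4 /dev/xvdf
sudo mkfs -t ext4 /dev/xvdg
sudo mkfs -t ext4 /dev/xvdh

# Create mount points (example names) and mount one volume on each.
sudo mkdir -p /mnt/seq /mnt/aligner /mnt/umake
sudo mount /dev/xvdf /mnt/seq       # sequence data
sudo mount /dev/xvdg /mnt/aligner   # aligner output
sudo mount /dev/xvdh /mnt/umake     # umake output
```

On subsequent boots you only need the mount commands (or add the volumes to /etc/fstab), not the mkfs step.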