Difference between revisions of "Amazon Storage"

Revision as of 09:20, 29 October 2012


Setting up your storage is perhaps the most difficult step, because the space you need depends entirely on the size of your data. As a general rule, you will need three times the space required for your sequence data. For instance, in the 1000 Genomes data, the data for one individual takes about 45GB. If you have 1000 Genomes data for nine individuals, you'll need about 1500GB of space (9 x 45GB x 3 = 1215GB, plus a little extra space).
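The rule of thumb above is simple arithmetic. The following is a minimal sketch in shell; `estimate_storage_gb` is a hypothetical helper name, and the 45GB-per-individual figure and the factor of 3 come from the text.

```shell
#!/usr/bin/env bash
# estimate_storage_gb SAMPLES GB_PER_SAMPLE  (hypothetical helper)
# Rule of thumb from the text: total space = samples x data per sample x 3.
# Uses integer arithmetic only; round the result up for headroom.
estimate_storage_gb() {
  local samples=$1
  local gb_per_sample=$2
  echo $(( samples * gb_per_sample * 3 ))
}

# Nine 1000 Genomes individuals at about 45GB each:
estimate_storage_gb 9 45   # prints 1215
```

Rounding 1215GB up to about 1500GB leaves the extra space recommended above, and it matches the three 500GB volumes suggested below.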

There are many ways to make your data available to the Pipeline software. Here is a simple, straightforward organization you might want to use.

Create Volumes

Launch your instance and log in as explained in the AWS documentation.
Using the AWS EC2 Console Dashboard, create one EBS volume (ELASTIC BLOCK STORE -> Volumes) for the sequence data (e.g. 500GB). The volume must be in the same Availability Zone as your instance.
Using the Dashboard, create a second EBS volume for the output of the aligner step (e.g. another 500GB).
Using the Dashboard, create a third EBS volume for the output of the umake step (e.g. another 500GB).
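If you prefer the command line to the Dashboard, the same three volumes can be created with the AWS CLI. This is a sketch under assumptions: the AWS CLI is installed and configured, `us-east-1a` is only a placeholder zone, and `create_volume` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Assumes a configured AWS CLI. AZ is a placeholder and must match
# the Availability Zone of your instance.
AZ="us-east-1a"   # hypothetical; replace with your instance's zone
SIZE_GB=500

# create_volume NAME (hypothetical helper): creates one volume tagged
# Name=NAME and prints its volume ID.
create_volume() {
  local name=$1
  aws ec2 create-volume \
    --availability-zone "$AZ" \
    --size "$SIZE_GB" \
    --tag-specifications "ResourceType=volume,Tags=[{Key=Name,Value=$name}]" \
    --query VolumeId \
    --output text
}

# Usage (one volume per kind of data):
#   for name in seq aligner umake; do create_volume "$name"; done
```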

Attach each volume to the instance you have just launched, specifying a separate device for each: f, g and h (i.e. /dev/sdf, /dev/sdg and /dev/sdh). It may take a few minutes for a volume to show up in your instance. Note: as of this writing, a device specified as sdf actually shows up as /dev/xvdf in the instance.
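Because a volume can take a few minutes to appear, it helps to wait for the device before going on. A minimal sketch, assuming the AWS CLI is configured and the sdf to /dev/xvdf naming described above; the function names and the placeholder IDs are hypothetical.

```shell
#!/usr/bin/env bash
# xvd_device_for sdX: map the name given to AWS to the name seen
# inside the instance (sdf -> /dev/xvdf), per the note above.
xvd_device_for() {
  echo "/dev/xvd${1#sd}"
}

# wait_for_device DEV [TIMEOUT_SECONDS]: poll every 5 seconds until
# DEV exists as a block device; fail after TIMEOUT_SECONDS (default 600).
wait_for_device() {
  local dev=$1 timeout=${2:-600} waited=0
  until [ -b "$dev" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $dev" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}

# attach_and_wait VOLUME_ID INSTANCE_ID sdX: attach, then wait for the device.
attach_and_wait() {
  local volume_id=$1 instance_id=$2 sd_name=$3
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$instance_id" --device "/dev/$sd_name" || return 1
  wait_for_device "$(xvd_device_for "$sd_name")"
}

# Usage (placeholder IDs):
#   attach_and_wait vol-0123456789abcdef0 i-0123456789abcdef0 sdf
```

On newer (Nitro-based) instance types, EBS volumes appear as NVMe devices such as /dev/nvme1n1 instead, so `xvd_device_for` would need adjusting there.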

We suggest using each volume for one particular kind of data:

/dev/xvdf for sequence data
/dev/xvdg for aligner output data
/dev/xvdh for umake output data
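The layout above can be written down once and checked before any formatting. A minimal sketch, assuming bash 4 or later (for associative arrays); `VOLUME_LABELS` and `check_devices` are hypothetical names, and the labels match the ones used with `mkfs` below.

```shell
#!/usr/bin/env bash
# Device -> filesystem label, following the suggested layout.
declare -A VOLUME_LABELS=(
  [/dev/xvdf]=seq
  [/dev/xvdg]=aligner
  [/dev/xvdh]=umake
)

# check_devices: report every expected device that is not present;
# returns non-zero if any are missing.
check_devices() {
  local dev missing=0
  for dev in "${!VOLUME_LABELS[@]}"; do
    if [ ! -b "$dev" ]; then
      echo "missing: $dev (${VOLUME_LABELS[$dev]})" >&2
      missing=1
    fi
  done
  return "$missing"
}
```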

Prepare and Attach Volumes

The first time, you need to prepare the disks by formatting and mounting them. This step destroys any data on each volume, so be careful which volume you are working on.

sudo fdisk -l /dev/xvdf # Do not continue until this works

   Disk /dev/xvdf: 536.9 GB, 536870912000 bytes
     [lines deleted]
   Disk /dev/xvdf doesn't contain a valid partition table   # This is OK
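As an extra safeguard, you can refuse to format a device that is missing or already carries a filesystem. A minimal sketch; `safe_to_format` is a hypothetical helper, and it relies on `blkid` exiting with status 0 only when it finds a filesystem signature on the device.

```shell
#!/usr/bin/env bash
# safe_to_format DEV: succeed only if DEV is a block device with no
# recognizable filesystem on it (blkid exits 0 when it finds one).
safe_to_format() {
  local dev=$1
  if [ ! -b "$dev" ]; then
    echo "$dev is not a block device" >&2
    return 1
  fi
  if sudo blkid "$dev" >/dev/null 2>&1; then
    echo "$dev already has a filesystem; not formatting" >&2
    return 1
  fi
}

# Usage:
#   safe_to_format /dev/xvdf && sudo mkfs -t ext4 -L seq /dev/xvdf
```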

The device exists. Now format it; this destroys any data on it, so be sure of the device name.

sudo mkfs -t ext4 -L seq /dev/xvdf

   mke2fs 1.42 (29-Nov-2011)
   Filesystem label=seq
     [lines deleted]
   Allocating group tables: done                            
   Writing inode tables: done                            
   Creating journal (32768 blocks): done
   Writing superblocks and filesystem accounting information: done
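Optionally, you can confirm that the label was applied before moving on. A minimal sketch; `has_label` is a hypothetical helper built on `e2label`, which prints the label of an ext2/3/4 filesystem.

```shell
#!/usr/bin/env bash
# has_label DEV LABEL: succeed if DEV exists and its ext4 label is LABEL.
has_label() {
  local dev=$1 expected=$2
  [ -b "$dev" ] || return 1
  [ "$(sudo e2label "$dev")" = "$expected" ]
}

# Usage:
#   has_label /dev/xvdf seq && echo "label OK"
```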

Repeat these steps for the other volumes:

sudo fdisk -l /dev/xvdg
sudo mkfs -t ext4 -L aligner /dev/xvdg

sudo fdisk -l /dev/xvdh
sudo mkfs -t ext4 -L umake /dev/xvdh
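Once formatted, the volumes still need to be mounted. A minimal sketch, assuming hypothetical mount points /mnt/seq, /mnt/aligner and /mnt/umake; it mounts by the labels set with `mkfs -L` and gives the current user ownership of each mount point. The `fstab_line` helper (also hypothetical) builds an /etc/fstab entry so the mounts survive a reboot; `nofail` keeps the instance bootable if a volume is detached.

```shell
#!/usr/bin/env bash
# mount_volumes: mount each labelled filesystem at /mnt/<label>
# (mount points are hypothetical) and hand ownership to the current user.
mount_volumes() {
  local label
  for label in seq aligner umake; do
    sudo mkdir -p "/mnt/$label"
    sudo mount "LABEL=$label" "/mnt/$label"
    sudo chown "$(id -un)" "/mnt/$label"
  done
}

# fstab_line LABEL: print an /etc/fstab entry for the labelled volume.
fstab_line() {
  printf 'LABEL=%s /mnt/%s ext4 defaults,nofail 0 2\n' "$1" "$1"
}

# Usage:
#   mount_volumes
#   for l in seq aligner umake; do fstab_line "$l"; done | sudo tee -a /etc/fstab
```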
