Difference between revisions of "Amazon Storage"

From Genome Analysis Wiki

Revision as of 09:20, 29 October 2012


Setting up your storage is perhaps the most difficult step, because it depends entirely on the size of your data. As a general rule you will need three times the space required for your sequence data. For instance, in the 1000 Genomes data, the data for one individual takes about 45GB. If you have 1000 Genomes data for nine individuals, you'll need about 1500GB of space (9 x 45 x 3 = 1215GB, plus some extra space).
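
The rule of thumb above can be checked with shell arithmetic. This is a minimal sketch: the function name estimate_storage_gb is hypothetical, the factor of three is the general rule stated above, and the 45GB figure is the per-individual size quoted for the 1000 Genomes data. Any extra headroom is left for you to add.

```shell
#!/bin/sh
# Hypothetical helper: estimate the space needed, in GB, as
#   individuals x GB-per-individual x 3
# following the rule of thumb in the text.
estimate_storage_gb() {
    individuals="$1"
    gb_per_individual="$2"
    echo $(( individuals * gb_per_individual * 3 ))
}

# Nine individuals at about 45GB each.
estimate_storage_gb 9 45    # prints 1215
```

The result is a lower bound. Rounding it up (for example to three 500GB volumes) leaves room for intermediate files.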

There are many ways to make your data available to the Pipeline software. Here is a simple, straightforward organization you might want to use.

Create Volumes

  • Launch your instance and log in as explained in the AWS documentation.
  • Using the AWS EC2 Console Dashboard create one EBS volume (ELASTIC BLOCK STORE -> Volumes) for the sequence data (e.g. 500GB).
  • Using the Dashboard create another EBS volume for the output of the aligner step (e.g. another 500GB).
  • Using the Dashboard create another EBS volume for the output of the umake step (e.g. another 500GB).

Attach each volume to the instance you have just launched, and give each one a separate device name: f, g and h (e.g. /dev/sdf, /dev/sdg and /dev/sdh). It will take a few minutes for the volumes to show up in your instance. Note: as of this writing, if you specify a device as sdf, it will actually show up as /dev/xvdf in the instance.
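
The device renaming in the note above can be captured in a small helper that also reports whether each volume has appeared yet. This is only a sketch: the function name xvd_name is hypothetical, and the sd-to-xvd mapping is just the behaviour described in the note, which may differ on other instance types or kernels.

```shell
#!/bin/sh
# Hypothetical helper: translate the device name given at attach time
# (e.g. /dev/sdf) into the name the note says appears in the instance
# (e.g. /dev/xvdf). Names that do not start with /dev/sd pass through.
xvd_name() {
    case "$1" in
        /dev/sd*) echo "/dev/xvd${1#/dev/sd}" ;;
        *)        echo "$1" ;;
    esac
}

# Report which of the three attached volumes are visible as block devices.
for requested in /dev/sdf /dev/sdg /dev/sdh; do
    actual=$(xvd_name "$requested")
    if [ -b "$actual" ]; then
        echo "$requested is visible as $actual"
    else
        echo "$requested not visible yet (expected $actual)"
    fi
done
```

Since attaching takes a few minutes, rerunning the loop until all three devices are reported as visible is a simple way to know when to proceed.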

We suggest you dedicate each storage volume to one particular kind of data:

  • /dev/xvdf for sequence data
  • /dev/xvdg for aligner output data
  • /dev/xvdh for umake output data

Prepare and Attach Volumes

The first time you use the volumes, you need to prepare them by formatting and mounting them. Formatting destroys any data on the volume, so be careful which volume you are working on.
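
Because a mistyped device name is easy to miss, one option is to guard the destructive step with a check that the path really is a block device. The sketch below is an assumption-laden illustration, not part of the Pipeline software: the function name require_block_device is hypothetical, and the check uses only the standard shell test -b.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the argument is an existing block device.
require_block_device() {
    if [ -b "$1" ]; then
        return 0
    fi
    echo "refusing: $1 is not a block device (check for typos)" >&2
    return 1
}

# Example: a transposed name such as /dev/xdvf fails the check,
# so the branch that would format it is never reached.
if require_block_device /dev/xdvf; then
    echo "would format /dev/xdvf"
fi
```

The guard only confirms that the device exists. It cannot tell you whether it is the volume you intended, so the device letter still needs to be checked by eye.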

sudo fdisk -l /dev/xvdf # Do not continue until this works

   Disk /dev/xvdf: 536.9 GB, 536870912000 bytes
     [lines deleted]
   Disk /dev/xvdf doesn't contain a valid partition table   # This is OK

# Device exists, good. Format it, destroying any data there, so be sure of the device name.

sudo mkfs -t ext4 -L seq /dev/xvdf

   mke2fs 1.42 (29-Nov-2011)
   Filesystem label=seq
     [lines deleted]
   Allocating group tables: done                            
   Writing inode tables: done                            
   Creating journal (32768 blocks): done
   Writing superblocks and filesystem accounting information: done     

  • Repeat these steps for the other volumes:

sudo fdisk -l /dev/xvdg
sudo mkfs -t ext4 -L aligner /dev/xvdg

sudo fdisk -l /dev/xvdh
sudo mkfs -t ext4 -L umake /dev/xvdh
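
The per-volume commands follow one pattern, so they can be generated from a single list of device and label pairs. The sketch below deliberately prints the commands instead of running them, since mkfs is destructive. The variable VOLUMES and the function print_format_commands are hypothetical names, while the devices and labels are the ones suggested above.

```shell
#!/bin/sh
# In-instance device names paired with filesystem labels, as suggested above.
VOLUMES="xvdf:seq xvdg:aligner xvdh:umake"

# Print (do not run) the check and format commands for each volume,
# so they can be reviewed before being pasted into a terminal.
print_format_commands() {
    for pair in $VOLUMES; do
        dev="/dev/${pair%%:*}"     # text before the colon
        label="${pair##*:}"        # text after the colon
        echo "sudo fdisk -l $dev"
        echo "sudo mkfs -t ext4 -L $label $dev"
    done
}

print_format_commands
```

Keeping the device and label together in one list reduces the chance of a transposed device name in one of the repeated commands.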