Line 1: |
Line 1: |
− | Back to the beginning at [http://genome.sph.umich.edu/wiki/Pipelines] | + | Back to the beginning at [[GotCloud]] |
| | | |
| Setting up your storage is perhaps the most difficult step as it is controlled completely by the size of your data. | | Setting up your storage is perhaps the most difficult step as it is controlled completely by the size of your data. |
| As a general rule you will need three times the space required for your sequence data. | | As a general rule you will need three times the space required for your sequence data. |
| For instance, in the 1000 Genomes data, the data for one individual takes about 45GB. | | For instance, in the 1000 Genomes data, the data for one individual takes about 45GB. |
− | If you have 1000 Genome data for nine individuals, you'll need about 1500GB of space (9x450x3 plus a little extra space). | + | If you have 1000 Genomes data for nine individuals, you'll need about 1500GB of space (9x45x3 = 1215GB, plus some extra space). |
| | | |
| Making your data available for the Pipeline software can be accomplished in many ways. | | Making your data available for the Pipeline software can be accomplished in many ways. |
| Here is a simple straightforward organization you might want to use. | | Here is a simple straightforward organization you might want to use. |
| | | |
| + | ===Making Use of Instance Storage=== |
| + | Some instance types provide instance store volumes. By default these volumes are not added to your instance, so you need to set them up prior to launching your instance. |
| | | |
− | '''Create Volumes''' | + | I found instructions at: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#InstanceStore_UsageScenarios |
| + | * referred me to: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html#Using_OverridingAMIBDM |
| + | |
| + | '''Make sure you add the instance store prior to launching''' |
| + | * After launching, only one of the 2 instance stores was mounted at /mnt |
| + | ** To mount the other instance store, I followed the instructions in the '''To make a volume available''' section of: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-add-volume-to-instance.html |
| + | |
| + | |
| + | ===Create Volumes=== |
| | | |
| * Launch your instance and login as explained in the AWS documentation. | | * Launch your instance and login as explained in the AWS documentation. |
Line 29: |
Line 39: |
| | | |
| | | |
− | '''Prepare and Attach Volumes'''
| + | ====Prepare and Attach Volumes==== |
| | | |
| The first time, you need to prepare the disks by formatting and mounting them. | | The first time, you need to prepare the disks by formatting and mounting them. |
Line 80: |
Line 90: |
| | | |
| | | |
− | '''Getting Your Sequence Data'''
| + | ===Getting Your Sequence Data=== |
| | | |
| Now it's time to get your sequence data so you can run the Pipeline on it. | | Now it's time to get your sequence data so you can run the Pipeline on it. |
Line 108: |
Line 118: |
| Files created | | Files created |
| Completed bucket 'seq/data/HG01112' in 378.38 min | | Completed bucket 'seq/data/HG01112' in 378.38 min |
− | | + | |
| [lines deleted] | | [lines deleted] |
| </code> | | </code> |
Line 115: |
Line 125: |
| By now you know that getting your sequence data can take a long time. | | By now you know that getting your sequence data can take a long time. |
| When this completes you are finally ready to run the Pipeline software. | | When this completes you are finally ready to run the Pipeline software. |
| + | |
| + | In one case we copied 387GB of data from 1000 Genomes to our own EBS volume. |
| + | This took about 60 hours (actually longer, because the copy failed and had to be restarted). |
| + | For an m1.medium instance (an instance of any size can be used for this step), this cost about $20 (Oct 2012). |
| + | The cost for the 500GB EBS volume where the 1000 Genomes data was copied is very low ($0.50/month). |
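
The sizing rule and the copy timing described above can be turned into a quick back-of-the-envelope calculation. The following is only a minimal sketch: the function names are hypothetical, the 45GB-per-individual figure and the factor of three are taken from the text, and 1GB is assumed to be 1000MB. Your own data sizes and transfer rates may differ considerably.

```python
def required_storage_gb(individuals, gb_per_individual=45.0, multiplier=3.0):
    """Estimate storage using the 'three times the sequence data' rule.

    Defaults follow the figures quoted in the text (about 45GB per
    1000 Genomes individual, multiplied by three). The result does not
    include the "little extra space" the text recommends adding.
    """
    return individuals * gb_per_individual * multiplier


def transfer_rate_mb_per_s(total_gb, hours):
    """Average transfer rate in MB/s, assuming 1GB = 1000MB."""
    return total_gb * 1000.0 / (hours * 3600.0)


def estimated_copy_hours(total_gb, rate_mb_per_s):
    """Hours needed to copy total_gb at a given average rate in MB/s."""
    return total_gb * 1000.0 / rate_mb_per_s / 3600.0


if __name__ == "__main__":
    # Nine individuals, as in the example in the text.
    print("Storage needed (GB):", required_storage_gb(9))

    # The 387GB copy that took about 60 hours.
    rate = transfer_rate_mb_per_s(387, 60)
    print("Observed average rate (MB/s):", round(rate, 2))

    # Rough time to copy a different amount of data at the same rate.
    print("Hours for 100GB at that rate:", round(estimated_copy_hours(100, rate), 1))
```

For the nine-individual example this gives 1215GB before any extra space is added, which is why a volume of about 1500GB is suggested. The observed rate is an average over one copy only, so treat any time estimate derived from it as a rough guide.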