Changes

From Genome Analysis Wiki
1,114 bytes added ,  16:49, 5 November 2014
no edit summary
Line 1: Line 1: −
Back to the beginning at [http://genome.sph.umich.edu/wiki/Pipelines]
+
Back to the beginning at [[GotCloud]]
    
Setting up your storage is perhaps the most difficult step as it is controlled completely by the size of your data.
 
Setting up your storage is perhaps the most difficult step as it is controlled completely by the size of your data.
 
As a general rule you will need three times the space required for your sequence data.
 
As a general rule you will need three times the space required for your sequence data.
 
For instance, in the 1000 Genomes data, the data for one individual takes about 45GB.
 
For instance, in the 1000 Genomes data, the data for one individual takes about 45GB.
If you have 1000 Genome data for nine individuals, you'll need about 1500GB of space (9x450x3 plus a little extra space).
+
If you have 1000 Genome data for nine individuals, you'll need about 1500GB of space (9x45x3 plus a little extra space).
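The rule of thumb above can be sketched as a small calculation. This is only an illustration: the function name and its parameters are hypothetical, and the defaults (45GB per individual, a factor of three) are taken from the text above.

```python
def estimate_storage_gb(individuals, gb_per_individual=45, multiplier=3):
    """Estimate working space in GB: sequence data size times a safety multiplier.

    The defaults come from the rule of thumb above (about 45GB per
    1000 Genomes individual, three times the sequence data size).
    """
    return individuals * gb_per_individual * multiplier


# Nine individuals, as in the example above.
needed = estimate_storage_gb(9)
print(f"Baseline estimate: {needed} GB (add a little extra on top of this)")
```

For nine individuals the baseline is 9x45x3 = 1215GB, which is why the text rounds up to about 1500GB to leave some headroom.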
    
Making your data available for the Pipeline software can be accomplished in many ways.
 
Making your data available for the Pipeline software can be accomplished in many ways.
 
Here is a simple, straightforward organization you might want to use.
 
Here is a simple, straightforward organization you might want to use.
    +
===Making Use of Instance Storage===
 +
Some instance types provide instance store volumes.  By default these are not added to your instance.  You need to set them up before launching your instance.
   −
'''Create Volumes'''
+
I found instructions at: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#InstanceStore_UsageScenarios
 +
* referred me to: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html#Using_OverridingAMIBDM
 +
 
 +
'''Make sure you add the instance store prior to launching'''
 +
* After launching, only one of the 2 instance stores was mounted at /mnt
 +
** To mount the other instance store, I followed the instructions in the '''To make a volume available''' section of: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-add-volume-to-instance.html
 +
 
 +
 
 +
===Create Volumes===
    
* Launch your instance and log in as explained in the AWS documentation.
 
* Launch your instance and log in as explained in the AWS documentation.
Line 29: Line 39:       −
'''Prepare and Attach Volumes'''
+
====Prepare and Attach Volumes====
    
The first time, you need to prepare the disks by formatting and mounting them.
 
The first time, you need to prepare the disks by formatting and mounting them.
Line 80: Line 90:       −
'''Getting Your Sequence Data'''
+
===Getting Your Sequence Data===
    
Now it's time to get your sequence data so you can run the Pipeline on it.
 
Now it's time to get your sequence data so you can run the Pipeline on it.
Line 108: Line 118:  
   Files created
 
   Files created
 
   Completed bucket 'seq/data/HG01112' in 378.38 min
 
   Completed bucket 'seq/data/HG01112' in 378.38 min
 
+
 
   [lines deleted]
 
   [lines deleted]
 
</code>
 
</code>
Line 115: Line 125:  
By now you know that getting your sequence data can take a long time.
 
By now you know that getting your sequence data can take a long time.
 
When this completes you are finally ready to run the Pipeline software.
 
When this completes you are finally ready to run the Pipeline software.
 +
 +
In one case we copied 387GB of data from 1000 Genomes to our own EBS volume.
 +
This took about 60 hours (longer actually because the copy failed and had to be restarted).
 +
For an m1.medium instance (any instance size can be used for this step), this cost about $20 (Oct 2012).
 +
The cost for the 500GB EBS volume where the 1000 Genomes data was copied is very low ($0.50/month).
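To get a rough feel for how long your own copy might take, the figures above (387GB in about 60 hours) can be scaled to other data sizes. This is a minimal sketch that assumes your throughput matches that one observed copy, which may well not hold for your instance or network; the function name and parameters are hypothetical.

```python
def estimate_transfer_hours(size_gb, observed_gb=387, observed_hours=60):
    """Scale the single observed copy (387GB in about 60 hours) to another size.

    Assumes the same average throughput as that one copy, which is only
    a rough guide and ignores restarts after failed copies.
    """
    rate_gb_per_hour = observed_gb / observed_hours
    return size_gb / rate_gb_per_hour


# Rough estimate for one individual's data (about 45GB).
print(f"{estimate_transfer_hours(45):.1f} hours")
```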
