Changes

1,999 bytes removed , 13:06, 20 May 2013

no edit summary

Line 1: Line 1:

Back to the beginning: [[GotCloud]]

+

'''We no longer use a snapshot.''' It is very likely that you will need quite a few packages installed

+

so that you can compile your software, access the EC2 application data or access data on S3.

+

It just seemed foolish not to make this software available in an AMI.

+

GotCloud is made available in various forms.

Line 44: Line 50:

* '''Instance size''' (memory and number of processors). The pipeline software will require at least 4GB of memory (''type m1.medium'') and can use as many processors as are available.

−

* '''Storage''' for the instance refers to the size for root (/) partition. This can be quite small, as little as 8GB can work. Of course if you intend to bring ~~lots of~~ other files/programs to the instance, you may ~~want~~ to increase this to something a bit larger (e.g. 30GB).

+

* '''Storage''' for the instance refers to the size of the root (/) partition. This can be quite small; as little as 8GB can work. If you intend to bring other files or programs to the instance, you may need to increase this to something a bit larger (e.g. 30GB).

−

* '''Data Storage''' for the aligner or snpcaller will likely be far larger than the system you are creating.

−

~~You'll need to create EBS Volumes for the input and output of the aligner and snpcaller.~~

−

~~These can be quite substantial and because of that we recommend you create separate volumes like this:~~

−

* Your input FASTQ files for the aligner. This might have been done for you by some vendor when they put your FASTQ data on an S3 volume. If so, your vendor will need to provide you with the details of how to access your FASTQ files.

−

* The output of the aligner (BAM files)

−

* ~~The intermediate files of~~ the SNP caller

+

* '''Data Storage''' for the aligner or SNP caller (see below)

Line 61: Line 59:

You will also want additional storage volumes for:

−

* ~~GotCloud software and reference~~ files

+

* '''Local Storage''' for the instance refers to the size of the root (/) partition. This can be quite small; as little as 8GB can work. If you intend to bring other files or programs to the instance, you may need to increase this to something a bit larger (e.g. 30GB).

−

* Your data

−

** Sequence data

−

** Output of the ~~aligner~~

−

** Output of umake

−

~~The~~ '''~~first of these~~''' ~~is a small volume based on a snapshot containing~~ the ~~GotCloud files you~~ will ~~need.~~

+

* '''Data Storage''' for the aligner or SNP caller will likely be far larger than the system you are creating.

−

~~We provide an AWS snapshot of a small volume which contains~~ the ~~aligner and umake software and reference files~~.

+

You'll need to create EBS Volumes for the input and output of the aligner and SNP caller.

−

~~Create an~~ EBS ~~volume based on our snapshot and then mount that volume on your instance.~~

−

~~In the EC2 Management Console under ELASTIC BLOCK STORE, select~~ Volumes ~~-> Create Volume.~~

−

~~In the prompt supply~~ the ~~size~~ and ~~Snapshot (based on~~ the ~~table below).~~

−

~~You may take the defaults for the Volume Type~~ and ~~IOPS~~.

−

~~The snapshot ID varies by zone and the release of the software. You can see the complete list of GotCloud snapshots:~~

+

'''Prepare Your Storage'''

−

~~<code>~~

+

These volumes can be quite large, so we recommend creating a separate volume for each of the following:

−

~~wget -qO - share.sph.umich.edu:gotcloud/snapshots.txt~~

−

~~# GotCloud SnapShot List~~

−

#

−

~~# Create an EBS volume from these snapshots. Use the AWS console or~~

−

~~# with an ec2-api-tools command:~~

−

#

−

~~# ec2-create-volume -K ~/ec2/EC2-X509-private_key.pem \~~

−

~~# -C ~/ec2/EC2-X509-cert.pem -s 40 \~~

−

~~# --snapshot snap-14ea7632 --region us-west-2 -z us-west-2a~~

−

#

−

~~# Availability~~

−

~~# Zone Snapshot Size~~

−

~~us-west-2a snap-14ea7632 40GB~~

−

~~</code>~~

−

~~This will create a device which~~ you ~~need to mount in your instance.~~

−

~~This will~~ create ~~a device~~ like ~~/dev/sdf, which unfortunately actually translates to~~

−

~~the device /dev/xvdf in your Linux instance. Once the volume is ready, mount it~~

−

~~by logging into your instance with ssh and issuing the command~~:

−

~~<code>~~

−

~~sudo mkdir -p /gotcloud~~

−

~~sudo mount /dev/xvdf /gotcloud # or whatever device yours is~~

−

~~df -h~~

−

~~</code>~~

−

This will ~~make~~ the ~~GotCloud software available under the path /gotcloud/bin etc~~.

+

* Your '''input FASTQ''' files for the aligner.

−

~~Each time~~ your ~~instance is started~~, you'll ~~need~~ to ~~mount~~ this ~~volume~~.

+

This may have been done for you by a vendor who put your FASTQ data in an S3 bucket.

−

~~You may want to create~~ a ~~small shell script to mount the device~~.

+

If so, your vendor will need to provide you with the details of how to access your FASTQ files.

+

If your FASTQ files are not in S3 storage, you'll have to create a volume for this and copy your data into it.

+

This can take a very long time.

−

In '''~~Your Data~~''' ~~the storage volumes will vary based on what you data you have.~~

+

* The '''output of the aligner''' (BAM files)

−

~~The sequence data might already exist, provided by a vendor who created the sequence data.~~

−

~~If not, you'll have to create a volume for this and copy your data into it.~~

−

~~You'll have to mount volumes for all three types of data~~ (~~sequence, aligner and umake~~).

−

~~You should expect the three data volumes will all need to be the same size. That is, if your sequence data is 300GB, then you~~'~~ll need an additional 300GB for the aligner output and then another 300GB~~ of ~~storage for~~ the ~~umake output. We suggest you consider making each set of data be separate volumes.~~

+

* The '''intermediate files of the SNP caller''' (GLF files)

−

~~You may also find that your sequence data is too large to be easily handled in one go,~~

+

* The '''final output of the SNP caller''' (VCF files)
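
As a rough sketch of the layout described above, the helper below prints (but does not run) the commands that would mount one volume per data set. The device names (/dev/xvdf and so on) and the mount points under /data are assumptions for illustration only, and <code>print_mount_plan</code> is a hypothetical helper, not part of GotCloud. Note that a brand-new EBS volume also needs a filesystem created on it before its first mount; that step is not shown here.

```shell
# Dry run: print the commands to mount one volume per data set.
# Arguments are DEVICE MOUNTPOINT pairs; all names here are illustrative.
print_mount_plan() {
  while [ "$#" -ge 2 ]; do
    echo "sudo mkdir -p $2"      # create the mount point if it is missing
    echo "sudo mount $1 $2"      # attach the device at that mount point
    shift 2
  done
}

# Hypothetical layout: FASTQ input, BAM output, GLF intermediates, VCF output.
print_mount_plan \
  /dev/xvdf /data/fastq \
  /dev/xvdg /data/bam \
  /dev/xvdh /data/glf \
  /dev/xvdi /data/vcf
```

Review the printed commands against the devices your instance actually reports (for example with <code>df -h</code> or by listing /dev) before running any of them.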

−

~~so you might choose to only use the aligner/umake on part~~ of ~~your sequence data, capture~~ the files

−

~~of interest from umake, and then go back and rerun the software with the next bit of sequence data.~~

Terry Gliedt


Amazon Snapshot (view source)

Revision as of 13:06, 20 May 2013
