Back to the beginning: [[GotCloud]]

'''We no longer use a snapshot.''' It is very likely that you will need quite a few packages installed so that you can compile your software, access the EC2 application data, or access data on S3. It seemed foolish not to make this software available in an AMI.

GotCloud is made available in several forms. It is distributed as conventional packages for Ubuntu and as compressed TAR files for other systems. In addition, the source is available from GitHub. In Amazon Web Services the software is made available as an Amazon Machine Image (AMI).

The GotCloud software itself only requires a few packages to be installed on Ubuntu (java-common, default-jre, make, libssl0.9.8). However, there are a number of things you may well want to do to get your data ready for processing: access data on S3 storage, compile GotCloud or other software, or access the EC2 application data. Assuming this is the case, the GotCloud AMI has these packages pre-installed on Ubuntu. If you run on some other distribution, you may need to install the equivalent packages.

<code>
sudo apt-get install java-common default-jre make libssl0.9.8
sudo apt-get install libnet-amazon-ec2-perl
sudo apt-get install make g++ libcurl4-openssl-dev libssl-dev libxml2-dev libfuse-dev
</code>
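Once the packages are installed, a quick sanity check can confirm that the basic tools are on your PATH. This is only a sketch; the tool list below is an assumption, not an official GotCloud requirement list.

```shell
# Sketch: report whether each tool is on PATH ("found" or "MISSING").
# The list (java, make, perl) is an assumption, not an official requirement list.
for tool in java make perl; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: MISSING"
    fi
done
```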

You will almost certainly need to fetch and install your own reference files, regardless of the details of the system you are using. Finally, you'll need access to your FASTQ files, either copied to the Amazon instance or perhaps accessible from S3 storage.

If the GotCloud instance is unacceptable for some reason, you may install the software and reference files wherever you'd like (read about this in [[Pipeline_Debian_Package|Installing from a Debian package]]).

Your first task is to get an AWS account and keys so that you can use the AWS EC2 Console Dashboard.
From here you can launch instances prepared by others or create your own. We cannot assist in this step - Amazon has plenty of documentation. Once you are at the AWS EC2 Console Dashboard, you're almost ready for GotCloud.

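Once you have your keys, they need to be made available to whatever command-line tools you use. A sketch, assuming the modern AWS CLI (the EC2 tooling of this wiki's era used environment variables instead):

```shell
# Sketch: store AWS credentials for command-line use (assumes the AWS CLI).
aws configure
# Prompts for: AWS Access Key ID, AWS Secret Access Key,
# default region name, and default output format.
```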
'''Your First Instance'''

You'll need to know some details when launching an instance:

* '''Launch an Instance''' - use the GotCloud instance running 64 bit software.

* '''Instance size''' (memory and number of processors). The pipeline software requires at least 4GB of memory (''type m1.medium'') and can use as many processors as are available.
| | | |
* '''Storage''' for the instance refers to the size of the root (/) partition. This can be quite small; as little as 8GB can work. Of course, if you intend to bring other files/programs to the instance, you may need to increase this to something a bit larger (e.g. 30GB).

* '''Data Storage''' for the aligner or SNP caller (see below)
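If you prefer the command line to the console, an instance can also be launched with the AWS CLI. This is a sketch only - the AMI ID and key pair name below are placeholders, not real values.

```shell
# Sketch: launch one m1.medium instance from the command line.
# ami-xxxxxxxx and my-keypair are placeholders (assumptions) -- substitute your own.
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type m1.medium \
    --key-name my-keypair \
    --count 1
```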

'''Prepare Your Instance'''

You will also want additional storage volumes:

* '''Data Storage''' for the aligner or SNP caller will likely be far larger than the system you are creating. You'll need to create EBS Volumes for the input and output of the aligner and SNP caller.

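Creating and attaching an EBS volume can be done from the console or the command line. A sketch with the AWS CLI - the size, availability zone, volume/instance IDs, and device names are all placeholders:

```shell
# Sketch: create an EBS volume, attach it, then format and mount it on the instance.
# All IDs, sizes, zones, and mount points below are placeholders (assumptions).
aws ec2 create-volume --size 200 --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxx --device /dev/sdf
# Then, on the instance (newer kernels may expose /dev/sdf as /dev/xvdf):
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /mnt/bams
sudo mount /dev/xvdf /mnt/bams
```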
'''Prepare Your Storage'''

These volumes can be quite substantial, so we recommend you create separate volumes like this:

* Your '''input FASTQ''' files for the aligner. This may have been done for you by a vendor when they put your FASTQ data on an S3 volume. If so, the vendor will need to provide you with the details of how to access your FASTQ files. If your FASTQ files are not in S3 storage, you'll have to create a volume and copy your data into it. This can take a very long time.

* The '''output of the aligner''' (BAM files)

* The '''intermediate files of the SNP caller''' (GLF files)

* The '''final output of the SNP caller''' (VCF files)

Organizing this storage is described in more detail in [[Amazon Storage|Amazon Storage]].