Back to the beginning: [[GotCloud]]

'''We no longer use a snapshot.''' It is very likely that you will need quite a few packages installed
so that you can compile your software, access the EC2 application data, or access data on S3.
It just seemed foolish not to make this software available in an AMI.

GotCloud is made available in various forms.
It is distributed as conventional packages for Ubuntu and as compressed TAR files for other distributions.
In addition, the source is available from github.
In Amazon Web Services the software is made available as an Amazon Machine Image (AMI).

The GotCloud software itself only requires a few packages to be installed for Ubuntu installations
(java-common default-jre make libssl0.9.8).
However, there are a number of things you may well want to do in getting your data
ready for processing (access data on S3 storage, compile GotCloud or other software, or
access the EC2 application data).
Assuming this is the case, the GotCloud AMI has these packages installed on Ubuntu.
If you need to run on some other distribution, you may need to install the equivalent packages yourself.

<code>
sudo apt-get install java-common default-jre make libssl0.9.8
sudo apt-get install libnet-amazon-ec2-perl
sudo apt-get install make g++ libcurl4-openssl-dev libssl-dev libxml2-dev libfuse-dev
</code>

You will almost certainly need to fetch and install your own reference files - regardless
of the details of the system you are using.
Finally, you'll need access to your FASTQ files - either copied to the Amazon instance
or perhaps accessible from S3 storage.
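If your FASTQ files are in S3 storage, one way to copy them to the instance is with the AWS command line tools. This is only a sketch under the assumption that the AWS CLI is installed and configured; the bucket name and paths below are placeholders, and your vendor may direct you to a different tool entirely.

```shell
# Copy a directory of FASTQ files from S3 to local storage on the instance.
# Bucket name and paths are placeholders - substitute your own.
aws s3 cp s3://my-fastq-bucket/run1/ /data/fastq/run1/ --recursive
```

For large runs this copy can take hours, so plan accordingly.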

If the GotCloud AMI is unacceptable for some reason, you may install the
software and reference files wherever you'd like
(read about this in [[Pipeline_Debian_Package|Installing from a Debian package]]).

Your first task is to get an AWS account and keys so that you can use the AWS EC2 Console Dashboard
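Once you have your keys, command line tools generally pick them up from environment variables. A minimal sketch (the key values shown are placeholders, not real credentials):

```shell
# Make your AWS keys available to command line tools in this session.
# Both values are placeholders - substitute the keys from your AWS account.
export AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEYID
export AWS_SECRET_ACCESS_KEY=wJalrExampleSecretKey
```

You may want to put these in your shell startup file so they are set on every login.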
'''Your First Instance'''

You'll need to know some details when launching an instance:

* '''Launch an Instance''' - use the GotCloud AMI, which runs 64 bit software.

* '''Instance size''' (memory and number of processors). The pipeline software will require at least 4GB of memory (''type m1.medium'') and can use as many processors as are available.

* '''Storage''' for the instance refers to the size of the root (/) partition. This can be quite small; as little as 8GB can work. Of course if you intend to bring other files/programs to the instance, you may need to increase this to something a bit larger (e.g. 30GB).

* '''Data Storage''' for the aligner or SNP caller (see below)
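If you prefer the command line to the Console Dashboard, the launch described above can be sketched with the AWS CLI. This is an assumption about your tooling, and the image id and key pair name are placeholders:

```shell
# Launch one m1.medium instance from the GotCloud AMI.
# The image id and key pair name are placeholders - substitute your own.
aws ec2 run-instances --image-id ami-12345678 --count 1 \
    --instance-type m1.medium --key-name my-keypair
```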


'''Prepare Your Instance'''

You will also want additional storage volumes for:

* '''Local Storage''' for the instance refers to the size of the root (/) partition. This can be quite small; as little as 8GB can work. Of course if you intend to bring other files/programs to the instance, you may need to increase this to something a bit larger (e.g. 30GB).

* '''Data Storage''' for the aligner or SNP caller will likely be far larger than the system you are creating. You'll need to create EBS Volumes for the input and output of the aligner and SNP caller.

'''Prepare Your Storage'''

These volumes can be quite substantial and because of that we recommend you create separate volumes like this:

* Your '''input FASTQ''' files for the aligner. This may have been done for you by some vendor when they put your FASTQ data on an S3 volume. If so, your vendor will need to provide you with the details of how to access your FASTQ files. If your FASTQ files are not in S3 storage, you'll have to create a volume for this and copy your data into it. This can take a very long time.

* The '''output of the aligner''' (BAM files)

* The '''intermediate files of the SNP caller''' (GLF files)

* The '''final output of the SNP caller''' (VCF files)
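Creating one of the volumes above can be sketched as follows, assuming the AWS CLI is available. The volume size, availability zone, ids, and device names are placeholders; note that on the instance the attached device may appear under a different name (e.g. /dev/xvdf rather than /dev/sdf).

```shell
# Create a 200GB EBS volume in the same availability zone as your instance.
# Size, zone, and all ids below are placeholders - substitute your own.
aws ec2 create-volume --size 200 --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-12345678 --instance-id i-12345678 --device /dev/sdf

# Then, logged in to the instance: make a filesystem and mount it.
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /data/bam
sudo mount /dev/xvdf /data/bam
```

Repeat for each of the volumes listed above (FASTQ input, BAM output, GLF intermediates, VCF output), sizing each to your data.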