Back to the beginning: [[GotCloud]]

'''We no longer use a snapshot.''' It is very likely that you will need quite a few packages installed so that you can compile your software, access the EC2 application data, or access data on S3. It seemed foolish not to make this software available in an AMI.

GotCloud is made available in several forms. It is distributed as conventional packages for Ubuntu and as compressed TAR files for other systems. In addition, the source is available from GitHub. In Amazon Web Services the software is made available as an Amazon Machine Image (AMI).

The GotCloud software itself only requires a few packages to be installed on Ubuntu (java-common, default-jre, make, libssl0.9.8). However, there are a number of things you may well want to do to get your data ready for processing: access data on S3 storage, compile GotCloud or other software, or access the EC2 application data. Assuming this is the case, the GotCloud AMI has these packages pre-installed on Ubuntu. If you run on some other distribution, you may need to install the equivalent packages.

<code>
sudo apt-get install java-common default-jre make libssl0.9.8
sudo apt-get install libnet-amazon-ec2-perl
sudo apt-get install make g++ libcurl4-openssl-dev libssl-dev libxml2-dev libfuse-dev
</code>
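Once the packages are installed, a quick sanity check can confirm that the basic tools are on your PATH. This is only a sketch; the tool list below is an assumption, not an official GotCloud requirement list.

```shell
# Sketch: report whether each tool is on PATH ("found" or "MISSING").
# The list (java, make, perl) is an assumption, not an official requirement list.
for tool in java make perl; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: MISSING"
    fi
done
```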

You will almost certainly need to fetch and install your own reference files, regardless of the details of the system you are using. Finally, you'll need access to your FASTQ files, either copied to the Amazon instance or perhaps accessible from S3 storage.

If the GotCloud instance is unacceptable for some reason, you may install the software and reference files wherever you'd like (read about this in [[Pipeline_Debian_Package|Installing from a Debian package]]).

Your first task is to get an AWS account and keys so that you can use the AWS EC2 Console Dashboard.
From here you can launch instances prepared by others or create your own. We cannot assist in this step - Amazon has plenty of documentation. Once you are at the AWS EC2 Console Dashboard, you're almost ready for GotCloud.

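Once you have your keys, they need to be made available to whatever command-line tools you use. A sketch, assuming the modern AWS CLI (the EC2 tooling of this wiki's era used environment variables instead):

```shell
# Sketch: store AWS credentials for command-line use (assumes the AWS CLI).
aws configure
# Prompts for: AWS Access Key ID, AWS Secret Access Key,
# default region name, and default output format.
```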
'''Your First Instance'''

You'll need to know some details when launching an instance:

* '''Launch an Instance''' - use the GotCloud instance running 64 bit software.

* '''Instance size''' (memory and number of processors). The pipeline software requires at least 4GB of memory (''type m1.medium'') and can use as many processors as are available.
| | | |
* '''Storage''' for the instance refers to the size of the root (/) partition. This can be quite small; as little as 8GB can work. Of course, if you intend to bring other files/programs to the instance, you may need to increase this to something a bit larger (e.g. 30GB).

* '''Data Storage''' for the aligner or SNP caller (see below)
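If you prefer the command line to the console, an instance can also be launched with the AWS CLI. This is a sketch only - the AMI ID and key pair name below are placeholders, not real values.

```shell
# Sketch: launch one m1.medium instance from the command line.
# ami-xxxxxxxx and my-keypair are placeholders (assumptions) -- substitute your own.
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type m1.medium \
    --key-name my-keypair \
    --count 1
```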

'''Prepare Your Instance'''

You will also want additional storage volumes:

* '''Data Storage''' for the aligner or SNP caller will likely be far larger than the system you are creating. You'll need to create EBS Volumes for the input and output of the aligner and SNP caller.

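Creating and attaching an EBS volume can be done from the console or the command line. A sketch with the AWS CLI - the size, availability zone, volume/instance IDs, and device names are all placeholders:

```shell
# Sketch: create an EBS volume, attach it, then format and mount it on the instance.
# All IDs, sizes, zones, and mount points below are placeholders (assumptions).
aws ec2 create-volume --size 200 --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxx --device /dev/sdf
# Then, on the instance (newer kernels may expose /dev/sdf as /dev/xvdf):
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /mnt/bams
sudo mount /dev/xvdf /mnt/bams
```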
'''Prepare Your Storage'''

These volumes can be quite substantial, so we recommend you create separate volumes like this:

* Your '''input FASTQ''' files for the aligner. This may have been done for you by a vendor when they put your FASTQ data on an S3 volume. If so, the vendor will need to provide you with the details of how to access your FASTQ files. If your FASTQ files are not in S3 storage, you'll have to create a volume and copy your data into it. This can take a very long time.

* The '''output of the aligner''' (BAM files)

* The '''intermediate files of the SNP caller''' (GLF files)

* The '''final output of the SNP caller''' (VCF files)

Organizing this storage is described in more detail in [[Amazon Storage|Amazon Storage]].