Changes

From Genome Analysis Wiki
1,928 bytes removed, 13:03, 20 May 2013
no edit summary
Line 57: Line 57:  
* '''Data Storage''' for the aligner or SNP caller will likely be far larger than the system you are creating.
 
* '''Data Storage''' for the aligner or SNP caller will likely be far larger than the system you are creating.
 
You'll need to create EBS Volumes for the input and output of the aligner and SNP caller.
 
You'll need to create EBS Volumes for the input and output of the aligner and SNP caller.
 +
 +
'''Prepare Your Storage'''
 +
 
These can be quite substantial and because of that we recommend you create separate volumes like this:
 
These can be quite substantial and because of that we recommend you create separate volumes like this:
   −
* Your '''input FASTQ''' files for the aligner. This may have been done for you by some vendor when they put your FASTQ data on an S3 volume. If so, your vendor will need to provide you with the details of how to access your FASTQ files.
+
* Your '''input FASTQ''' files for the aligner.
 +
This may have been done for you by a vendor who put your FASTQ data in S3 storage.
 +
If so, your vendor will need to provide you with the details of how to access your FASTQ files.
 +
If your FASTQ files are not in S3 storage, you'll have to create a volume for this and copy your data into it.
 +
This can take a very long time.
    
* The '''output of the aligner''' (BAM files)
 
* The '''output of the aligner''' (BAM files)
Line 66: Line 73:     
* The '''final output of the SNP caller''' (VCF files)
 
* The '''final output of the SNP caller''' (VCF files)
  −
<code>
  −
  wget -qO -  share.sph.umich.edu:gotcloud/snapshots.txt
  −
  −
  #                          GotCloud SnapShot List
  −
  #
  −
  #  Create an EBS volume from these snapshots. Use the AWS console or
  −
  #  with an ec2-api-tools command:
  −
  #
  −
  #    ec2-create-volume -K ~/ec2/EC2-X509-private_key.pem \
  −
  #      -C ~/ec2/EC2-X509-cert.pem -s 40 \
  −
  #      --snapshot snap-14ea7632 --region us-west-2 -z us-west-2a
  −
  #
  −
  #                        Availability
  −
  #  Zone        Snapshot      Size
  −
  us-west-2a    snap-14ea7632  40GB
  −
</code>
  −
  −
This will create a device which you need to mount in your instance.
  −
The device is attached under a name like /dev/sdf, which shows up as
  −
the device /dev/xvdf in your Linux instance. Once the volume is ready, mount it
  −
by logging into your instance with ssh and issuing the command:
  −
  −
<code>
  −
  sudo mkdir -p /gotcloud
  −
  sudo mount /dev/xvdf  /gotcloud    # or whatever device yours is
  −
  df -h
  −
</code>
  −
  −
This will make the GotCloud software available under the path /gotcloud/bin etc.
  −
Each time your instance is started, you'll need to mount this volume.
  −
You may want to create a small shell script to mount the device.
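A minimal sketch of such a mount script follows. It assumes the device and mount point from the example above (/dev/xvdf and /gotcloud); the function names <code>is_mounted</code> and <code>mount_gotcloud</code> are hypothetical, and the device name on your instance may differ.

```shell
#!/bin/sh
# Hypothetical helper for mounting the GotCloud volume after an instance start.
# The defaults below come from the example above; override them via the
# DEVICE and MOUNT_POINT environment variables if your setup differs.
DEVICE="${DEVICE:-/dev/xvdf}"
MOUNT_POINT="${MOUNT_POINT:-/gotcloud}"

# Succeed if a filesystem is already mounted at directory $1.
# The optional $2 names the mount table to read (default: /proc/mounts),
# where each line is: device, mount point, filesystem type, options, ...
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Create the mount point, mount the device, and show the result.
# Does nothing if the mount point is already in use.
mount_gotcloud() {
    if is_mounted "$MOUNT_POINT"; then
        echo "$MOUNT_POINT is already mounted"
        return 0
    fi
    sudo mkdir -p "$MOUNT_POINT" &&
    sudo mount "$DEVICE" "$MOUNT_POINT" &&
    df -h "$MOUNT_POINT"
}
```

To use it, source the file after logging in and run <code>mount_gotcloud</code>, or add a call to <code>mount_gotcloud</code> as the last line of the script. The check in <code>is_mounted</code> makes it safe to run more than once.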
  −
  −
In '''Your Data''' the storage volumes will vary based on what data you have.
  −
The sequence data might already exist, provided by a vendor who created the sequence data.
  −
If not, you'll have to create a volume for this and copy your data into it.
  −
You'll have to mount volumes for all three types of data (sequence, aligner and umake).
  −
  −
You should expect that the three data volumes will all need to be the same size. That is, if your sequence data is 300GB, then you'll need an additional 300GB for the aligner output and another 300GB of storage for the umake output. We suggest you put each set of data on a separate volume.
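The sizing rule above can be turned into a quick estimate. This is only a sketch of that rule of thumb (three volumes, each the size of the sequence data); the function name <code>plan_volumes</code> is hypothetical and actual sizes will depend on your data.

```shell
#!/bin/sh
# Rough storage estimate: one volume each for the sequence data, the
# aligner output and the umake output, all the same size.
# Usage: plan_volumes SEQUENCE_GB   (size of the sequence data in whole GB)
plan_volumes() {
    seq_gb="$1"
    echo "sequence data:  ${seq_gb}GB"
    echo "aligner output: ${seq_gb}GB"
    echo "umake output:   ${seq_gb}GB"
    echo "total: $((seq_gb * 3))GB"
}

# Example from the text: 300GB of sequence data.
plan_volumes 300
```

For 300GB of sequence data the last line reports a total of 900GB across the three volumes.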
  −
  −
You may also find that your sequence data is too large to be easily handled in one go,
  −
so you might choose to only use the aligner/umake on part of your sequence data, capture the files
  −
of interest from umake, and then go back and rerun the software with the next bit of sequence data.
 