Mount S3 Volume
We are still working out how to get S3 working with the 1000G data.
For now, we have found it more reliable to copy the data into a volume and run on that.
Ref: http://thecrystalclouds.wordpress.com/2012/05/18/installation-and-setup-of-s3fs-on-amazon-web-services/
and http://code.google.com/p/s3fs/wiki/FuseOverAmazon
It seemed like such a good idea. Rather than copy the data from S3 storage, why not mount the S3 volume and run the pipeline on it directly? After all, in theory, the aligner reads the FASTQ files only once and then writes its output to local disk, which then becomes the input to UMAKE.
Here's what happened:
apt-get update
apt-get upgrade
apt-get install build-essential libcurl4-openssl-dev libxml2-dev libfuse-dev comerr-dev libfuse2 libidn11-dev libkrb5-dev libldap2-dev libselinux1-dev libsepol1-dev pkg-config fuse-utils sshfs
wget https://s3fs.googlecode.com/files/s3fs-r203.tar.gz
tar xzvf s3fs-r203.tar.gz
cd s3fs
# The Makefile does not work. The fix is to put the s3fs.cpp file right after g++
g++ s3fs.cpp -ggdb -Wall $(shell pkg-config ...
make # Should create an s3fs executable
make install
# Configuration change for fuse
vi /etc/fuse.conf
user_allow_other # Uncomment this line
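The same fuse.conf change can be made without opening an editor. This is a minimal sketch, assuming GNU sed (for `-i`) and that the line is present but commented out; `enable_allow_other` is a hypothetical helper name, not part of FUSE or s3fs.

```shell
# Hypothetical helper: uncomment the user_allow_other line in a fuse.conf-style file.
# $1 = path to the config file (for example /etc/fuse.conf, which needs root to edit).
enable_allow_other() {
    # Strip a leading '#' (and any spaces after it) from the user_allow_other line only.
    sed -i 's/^#[[:space:]]*user_allow_other/user_allow_other/' "$1"
}

# Example use (assumption: run as root on the instance):
#   enable_allow_other /etc/fuse.conf
```

Other commented lines in the file are left untouched, since the pattern matches only the `user_allow_other` entry.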
mkdir /mnt/s3
s3fs 1000genomes -o accessKeyId=AKourkey2Q -o secretAccessKey=ftoutsecretaccesskeyIGf -o use_cache=/tmp -o allow_other /mnt/s3
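Before listing files, it can help to confirm that the mount is actually registered with the kernel. The sketch below assumes a Linux system where active mounts are listed in /proc/mounts; `is_mounted` is a hypothetical helper name, and the optional second argument exists only so the check can be tried against a sample file.

```shell
# Hypothetical helper: succeed if the given directory appears as a mount point.
# $1 = mount point to look for (for example /mnt/s3)
# $2 = mounts table to read (defaults to /proc/mounts)
is_mounted() {
    # Field 2 of each /proc/mounts line is the mount point.
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example use:
#   is_mounted /mnt/s3 && echo "S3 bucket is mounted" || echo "mount failed"
```

A successful check only shows that the mount exists; as the results below show, it says nothing about whether large files can be read through it.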
# Did it work?
cd /mnt
ls /mnt/s3/data/HG01550 # Wow, look at all those files
rsync -av s3/data/HG01550 .
sending incremental file list
HG01550/
HG01550/alignment/
HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bas
HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bas
HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
rsync: read errors mapping "/mnt/s3/data/HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam": Bad file descriptor (9)
HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
rsync: read errors mapping "/mnt/s3/data/HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai": Bad file descriptor (9)
[deleted lines]
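With this much output, it is easier to count the failures than to read them. The sketch below assumes the rsync output was saved to a file, for example with `rsync -av s3/data/HG01550 . 2>&1 | tee rsync.log`; the log file name and the `count_read_errors` helper are hypothetical.

```shell
# Hypothetical helper: count the files rsync failed to read, from a saved log.
# $1 = path to a file holding rsync's combined stdout and stderr.
count_read_errors() {
    # grep -c prints the number of matching lines; '|| true' keeps the exit
    # status at 0 when there are no matches (grep itself would return 1).
    grep -c '^rsync: read errors mapping' "$1" || true
}

# Example use:
#   count_read_errors rsync.log
```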
After all that, and after waiting 80 minutes for the rsync to finish, we got almost no files.
It does not appear that you can actually mount an S3 bucket and make it work with data of this size.