Difference between revisions of "Mount S3 Volume"
From Genome Analysis Wiki
Terry Gliedt
Revision as of 10:32, 30 October 2012
Back to the beginning [1]
Ref: http://thecrystalclouds.wordpress.com/2012/05/18/installation-and-setup-of-s3fs-on-amazon-web-services/ and http://code.google.com/p/s3fs/wiki/FuseOverAmazon
It seemed like such a good idea. Rather than copy the data from S3 storage, why not mount the S3 volume and run the Pipeline on it directly? After all, in theory, the aligner only reads the FASTQ files once and then writes its output to local disk, which is then the input to UMAKE.
Here's what happened:
apt-get update
apt-get upgrade
apt-get install build-essential libcurl4-openssl-dev libxml2-dev libfuse-dev comerr-dev libfuse2 libidn11-dev libkrb5-dev libldap2-dev libselinux1-dev libsepol1-dev pkg-config fuse-utils sshfs
wget https://s3fs.googlecode.com/files/s3fs-r203.tar.gz
tar xzvf s3fs-r203.tar.gz
cd s3fs
# The Makefile does not work. The fix is to put the s3fs.cpp file right after g++
g++ s3fs.cpp -ggdb -Wall $(shell pkg-config ...
make # Should create an s3fs executable
make install
# Configuration change for fuse
vi /etc/fuse.conf
user_allow_other # Uncomment this line
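The same fuse.conf change can be made without opening an editor. This is a minimal sketch, assuming GNU sed (as shipped on the Debian/Ubuntu system implied by apt-get) and a stock file in which the option appears as a commented-out line. The function name enable_user_allow_other is hypothetical.

```shell
# Hypothetical helper: uncomment user_allow_other in a fuse.conf-style file.
# The path argument defaults to /etc/fuse.conf; editing that file needs root.
enable_user_allow_other() {
    # Remove the leading '#' (and any spaces after it) from the
    # user_allow_other line only; other comment lines are left alone.
    sed -i 's/^#[[:space:]]*user_allow_other[[:space:]]*$/user_allow_other/' \
        "${1:-/etc/fuse.conf}"
}
```

Run as root, for example: enable_user_allow_other /etc/fuse.conf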
mkdir /mnt/s3
s3fs 1000genomes -o accessKeyId=AKourkey2Q -o secretAccessKey=ftoutsecretaccesskeyIGf -o use_cache=/tmp -o allow_other /mnt/s3
# Did it work?
cd /mnt
ls /mnt/s3/data/HG01550 # Wow, look at all those files
rsync -av s3/data/HG01550 .
sending incremental file list
HG01550/
HG01550/alignment/
HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bas
HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bas
HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
rsync: read errors mapping "/mnt/s3/data/HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam": Bad file descriptor (9)
HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
rsync: read errors mapping "/mnt/s3/data/HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai": Bad file descriptor (9)
[deleted lines]
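Failures like the ones above can be tallied if the rsync output is saved to a file. This is a minimal sketch, assuming each rsync error message occupies a single line in the log and has the "read errors mapping" form seen here. The function name failed_files and the log file name are hypothetical.

```shell
# Hypothetical helper: print the paths rsync reported as unreadable,
# given a saved copy of its output. Assumes one error message per line:
#   rsync: read errors mapping "<path>": <reason>
failed_files() {
    # Keep only the quoted path from each matching line;
    # sort -u collapses repeats in case a file is reported more than once.
    sed -n 's/^rsync: read errors mapping "\(.*\)": .*$/\1/p' "$1" | sort -u
}
```

For example, capture both output streams with rsync -av s3/data/HG01550 . 2>&1 | tee rsync.log, then run failed_files rsync.log | wc -l to count the files that could not be read.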
After all that, and after waiting 80 minutes for the rsync to finish, we got almost no files.
It does not appear that you can actually mount an S3 bucket and make it work with data of this size.
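Before committing to a long copy from a mount like this, one cheap guard is to confirm that the mount point really is in the kernel's mount table. This is a minimal sketch, assuming Linux's /proc/mounts layout, where the second field is the mount point. The function name is_mounted is hypothetical. It only confirms that the mount exists, so it would not have caught the read errors seen here.

```shell
# Hypothetical helper: succeed only if DIR is listed as a mount point.
# Usage: is_mounted DIR [MOUNTS_FILE]   (MOUNTS_FILE defaults to /proc/mounts)
is_mounted() {
    # Field 2 of each mounts-table line is the mount point.
    # Exit status 0 if any line matches, 1 otherwise.
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

For example: is_mounted /mnt/s3 && rsync -av /mnt/s3/data/HG01550 .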