Mount S3 Volume
We are still working to figure out how to get S3 working with the 1000G data.
For now, we found it more reliable to copy the data into a volume and run on that.
It seemed like such a good idea. Rather than copy the data from S3 storage, why not mount the S3 volume and run the Pipeline on it directly? After all, in theory, the aligner only reads the FASTQ files once and then writes its output to local disk, which is then the input to UMAKE.
Here's what happened:
  apt-get update
  apt-get upgrade
  apt-get install build-essential libcurl4-openssl-dev libxml2-dev libfuse-dev \
      comerr-dev libfuse2 libidn11-dev libkrb5-dev libldap2-dev libselinux1-dev \
      libsepol1-dev pkg-config fuse-utils sshfs

  wget https://s3fs.googlecode.com/files/s3fs-r203.tar.gz
  tar xzvf s3fs-r203.tar.gz
  cd s3fs
  # The Makefile does not work. The fix is to put the s3fs.cpp file right after g++ in the Makefile:
  g++ s3fs.cpp -ggdb -Wall $(shell pkg-config ...
  make              # Should create an s3fs executable
  make install

  # Configuration change for fuse
  vi /etc/fuse.conf
  user_allow_other  # Uncomment this line

  mkdir /mnt/s3
  s3fs 1000genomes -o accessKeyId=AKourkey2Q -o secretAccessKey=ftoutsecretaccesskeyIGf -o use_cache=/tmp -o allow_other /mnt/s3

  # Did it work?
  cd /mnt
  ls /mnt/s3/data/HG01550    # Wow, look at all those files

  rsync -av s3/data/HG01550 .
  sending incremental file list
  HG01550/
  HG01550/alignment/
  HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
  HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
  HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bas
  HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
  HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
  HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bas
  HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
  rsync: read errors mapping "/mnt/s3/data/HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam": Bad file descriptor (9)
  HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
  rsync: read errors mapping "/mnt/s3/data/HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai": Bad file descriptor (9)
  [deleted lines]
After all that, and after waiting 80 minutes for the rsync to finish, we got almost no files.
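To see exactly which files failed to arrive after a copy like this, a small check along the following lines may help. This is only a sketch: the function name list_missing is made up for illustration, and it treats a destination file as missing if it is absent or zero bytes long, which would also flag files that are legitimately empty at the source.

```shell
# list_missing SRC DST
# Print the path (relative to SRC) of every file under SRC that is
# absent or zero-length under DST. Hypothetical helper, not part of
# s3fs, rsync or GotCloud.
list_missing() {
    src=$1
    dst=$2
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        # -s is true only when the file exists and is larger than 0 bytes
        [ -s "$dst/$f" ] || printf '%s\n' "${f#./}"
    done
}

# Example (paths taken from the session above):
#   list_missing /mnt/s3/data/HG01550 /mnt/HG01550
```

Note that walking the source tree still goes through the S3 mount, so on a flaky mount the listing itself may be slow or incomplete.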
It does not appear that you can actually mount an S3 bucket and make it work with data of this size.
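Copying the data into a volume first, which is what we fell back on, could look something like the sketch below. It assumes the AWS CLI is installed (any S3 client would do), that the 1000genomes bucket can be read anonymously with --no-sign-request, and that the volume is mounted at /mnt/data. The function name copy_sample and the DRY_RUN variable are made up for illustration.

```shell
# copy_sample SAMPLE DEST
# Copy one sample directory from the 1000genomes bucket into DEST/SAMPLE.
# Set DRY_RUN=1 to print the command instead of running it.
# Hypothetical helper; the bucket layout (data/SAMPLE) is taken from
# the paths seen in the s3fs session above.
copy_sample() {
    sample=$1
    dest=$2
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty
    ${DRY_RUN:+echo} aws s3 sync --no-sign-request \
        "s3://1000genomes/data/$sample" "$dest/$sample"
}

# Example:
#   DRY_RUN=1 copy_sample HG01550 /mnt/data   # show what would run
#   copy_sample HG01550 /mnt/data             # actually copy
```

Because a sync only transfers what is missing or changed, an interrupted copy can simply be rerun with the same arguments.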