Back to the beginning [http://genome.sph.umich.edu/wiki/Pipelines]

Ref: http://thecrystalclouds.wordpress.com/2012/05/18/installation-and-setup-of-s3fs-on-amazon-web-services/

It seemed like such a good idea: rather than copy the data out of S3 storage, why not mount the S3 volume
and run the Pipeline on it directly?
After all, in theory, the aligner only reads the FASTQ files once and then writes its output
to local disk, which is then the input to UMAKE.
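
The plan, in outline, was roughly the sketch below. The reference path, FASTQ name, and the choice of bwa are illustrative placeholders, not the Pipeline's actual configuration:

<code>
# Mount the public 1000genomes bucket (setup details below) ...
s3fs 1000genomes -o allow_other /mnt/s3

# ... then let the aligner read each FASTQ once, straight off the mount
# ("bwa" stands in for whatever aligner the Pipeline invokes; paths are made up):
bwa aln /local/ref/human.fa \
    /mnt/s3/data/HG01550/sequence_read/EXAMPLE_1.filt.fastq.gz \
    > /local/work/HG01550_1.sai

# Everything the aligner produces lands on local disk, which is then
# the input handed to UMAKE.
</code>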

Here's what happened:

<code>
apt-get update
apt-get upgrade
apt-get install build-essential libcurl4-openssl-dev libxml2-dev libfuse-dev comerr-dev libfuse2 libidn11-dev libkrb5-dev libldap2-dev libselinux1-dev libsepol1-dev pkg-config fuse-utils sshfs

wget https://s3fs.googlecode.com/files/s3fs-r203.tar.gz
tar xzvf s3fs-r203.tar.gz
cd s3fs
# The Makefile does not work; the fix is to move s3fs.cpp to just after g++ in the compile line:
g++ s3fs.cpp -ggdb -Wall $(shell pkg-config ...

make # Should create an s3fs executable
make install

# Configuration change for fuse
vi /etc/fuse.conf
user_allow_other # Uncomment this line

mkdir /mnt/s3
s3fs 1000genomes -o accessKeyId=AKourkey2Q -o secretAccessKey=ftoutsecretaccesskeyIGf -o use_cache=/tmp -o allow_other /mnt/s3
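# (Aside: s3fs can reportedly also read credentials from a passwd file,
#  e.g. ~/.passwd-s3fs containing "accessKeyId:secretAccessKey", which keeps
#  the keys off the command line -- not verified against this r203 build.)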

# Did it work?
cd /mnt
ls /mnt/s3/data/HG01550 # Wow, look at all those files
rsync -av s3/data/HG01550 .
sending incremental file list
HG01550/
HG01550/alignment/
HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
HG01550/alignment/HG01550.chrom11.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bas
HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
HG01550/alignment/HG01550.chrom20.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bas
HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam
rsync: read errors mapping "/mnt/s3/data/HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam": Bad file descriptor (9)
HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai
rsync: read errors mapping "/mnt/s3/data/HG01550/alignment/HG01550.mapped.ILLUMINA.bwa.CLM.low_coverage.20111114.bam.bai": Bad file descriptor (9)
[deleted lines]
</code>

After all that, and after waiting 80 minutes for the rsync to run, we ended up with almost no files.

It does not appear that you can mount an S3 bucket and make it work with data of this size.
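
For comparison, the approach this experiment was meant to avoid, copying the data out of S3 first and running the Pipeline on a local copy, is straightforward. A minimal sketch, assuming the s3cmd client (any S3 client, or the project's HTTP mirror, would do):

<code>
# Pull one sample's alignment directory from the public 1000genomes bucket
# to local disk, then run the Pipeline against the local copy.
mkdir -p /local/data/HG01550
s3cmd get --recursive s3://1000genomes/data/HG01550/alignment/ /local/data/HG01550/
</code>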