Changes

From Genome Analysis Wiki
5,183 bytes added, 14:54, 14 October 2014
Line 4: Line 4:     
The following are notes taken when creating the Amazon Machine Instance used for the CSG pipeline process.
 
 +
 
These notes assume you have already created an EC2 account and have the certificates and keys set up properly.
 
   −
== Launch an instance ==
+
 
 +
== Create new GotCloud AMI from StarCluster AMI ==
 +
=== Launch an instance ===
    
<code>
 
<code>
Line 13: Line 16:     
Pay attention to the region you are using; at least for now, it seems any StarCluster activity must be in '''us-east-1'''.
Launch a new instance which we will use to set up the software and ultimately save it as an AMI.
+
 
 +
Launch a new instance starting from a StarCluster AMI.  We will set up the software on this instance and ultimately save it as an AMI.
 +
 
 +
# <code>EC2 DashBoard -> Launch Instance</code>
 +
# Select: <code>Community AMIs</code>
 +
## Enter in the search box: <code>starcluster-base-ubuntu</code>
 +
## Select: <code>starcluster-base-ubuntu-12.04-x86_64 - ami-765b3e1f</code>
 +
# Select the Instance Type: <code>Compute optimized c3.2xlarge</code>
 +
#* You can use a smaller, cheaper machine. I originally used t1.micro, but I found things go much faster with a larger machine.
 +
# Click: <code>Review and Launch</code>
 +
## Select: <code>Make General Purpose (SSD) the boot volume for this instance.</code>
 +
## Select: <code>Next</code>
 +
# Scroll down to the <code>Storage</code> section
 +
# Click: <code>Edit storage</code>
 +
## Update the Size: <code>30</code>
 +
##* We use 30G to fit the GotCloud code and reference files.  Make it larger if you want additional space.
 +
## Click: <code>Review and Launch</code>
 +
# Click: <code>Launch</code>
 +
# Select the key pair you want to use, then launch the instance
 +
 
 +
=== Setup the instance with GotCloud ===
 +
This assumes you have already logged onto the instance.
 +
 
 +
# Get the latest version of GotCloud:
 +
#* There are multiple ways to do this; one way is:
 +
#*# <code>sudo git clone https://github.com/statgen/gotcloud.git</code>
 +
# Install cmake (required to build premo)
 +
#*<code>sudo apt-get update</code>
 +
#*<code>sudo apt-get upgrade</code>  (takes a while, may be able to skip this step)
 +
#*<code>sudo apt-get install cmake</code>
 +
## Build the source (if you obtained the source code).
 +
### <code>cd gotcloud/src</code>
 +
### <code>sudo make</code>
 +
###* Specify <code>-j #</code> based on the number of CPUs your instance has, if more than 1
 +
### <code>cd</code>
 +
# Get the reference files
 +
## <code>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db135-v3.tgz</code>
 +
# Untar: <code>tar xvf h37-db135-v3.tgz</code>
 +
# Move reference to gotcloud directory: <code>sudo mv gotcloud.ref gotcloud</code>
 +
# Remove tar file: <code>rm h37-db135-v3.tgz</code>
 +
# Set the paths, by updating .profile: <code>vi .profile</code>
 +
#* <code>i</code>
 +
#: <pre>if [ -d "$HOME/gotcloud" ] ; then&#10;    PATH="$HOME/gotcloud:$PATH"&#10;fi&#10;if [ -d "$HOME/gotcloud/bin" ] ; then&#10;    PATH="$HOME/gotcloud/bin:$PATH"&#10;fi&#10;if [ -d "$HOME/gotcloud/scripts" ] ; then&#10;    PATH="$HOME/gotcloud/scripts:$PATH"&#10;fi</pre>
 +
#* <code>ESC</code>
 +
#* <code>:wq</code>
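
The three <code>if</code> blocks added to .profile can also be written as a loop. The following is a minimal sketch, not part of GotCloud: the function name <code>prepend_path_if_dir</code> is hypothetical, and the check that skips a directory already on the PATH is an addition that the original snippet does not have.

```shell
# Prepend each existing directory to PATH, in the order given.
# Directories that do not exist, or that are already on PATH, are skipped.
# (prepend_path_if_dir is a hypothetical helper name.)
prepend_path_if_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            case ":$PATH:" in
                *":$dir:"*) ;;                # already on PATH, nothing to do
                *) PATH="$dir:$PATH" ;;
            esac
        fi
    done
}

# Same directories, same order, as the .profile snippet above.
prepend_path_if_dir "$HOME/gotcloud" "$HOME/gotcloud/bin" "$HOME/gotcloud/scripts"
export PATH
```

Because each directory is prepended in turn, the last one listed ends up first on the PATH, which matches the behaviour of the three separate <code>if</code> blocks.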
 +
 
 +
=== Set Up Swap Space ===
 +
 
 +
Issue the command '''swapon -s''' to see if there is swap space.
 +
If there is only a header line, you need to add a swap file like this:
    
<code>
 
<code>
   EC2 DashBoard -> Launch Instance
+
   df -h          # Be sure there's enough space, decide on swap size
   Class Wizard
+
   #  Create a file /swap to use (assuming / is large enough)
   Ubuntu Server 12.04.1 LTS   64 bit
+
   sudo bash      # Run these commands as root
   Instance type -> Micro,  EC2, no preference        # Memory size does not matter
+
   swap=/swap
   Advanced Instance Options  (take defaults)
+
   dd if=/dev/zero of=$swap bs=524288 count=16384    # 8GB swap (sized for t1.micro); for 15GB use bs=1073741824 count=15
   Storage Device Configuration -> Edit
+
   chown root:root $swap
   Change volume to 30G -> Save -> Continue          # Storage size does not matter
+
   mkswap $swap
   Key Name = GotCloud 1.06a
+
   chmod 0600 $swap
   Create Key/Pair if you need to, Name the PEM and save the pem file for access by ssh
+
   swapon $swap
  Choose a Security Group (take default)
+
   echo "$swap  none swap sw  0  0" >> /etc/fstab
   Launch
+
   
    No need to Create Status Check Alarms
+
   swapon -s      # Should show the swap device
    No need to Create EBS Volumes
   
</code>
 
</code>
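
The <code>bs</code> and <code>count</code> values passed to <code>dd</code> multiply to give the swap file size. As a rough sketch (the helper name <code>swap_dd_count</code> is hypothetical and is not part of GotCloud or StarCluster), the count for a given size can be computed like this:

```shell
# Print the dd "count" for a swap file of SIZE_GB gibibytes written in
# blocks of BS bytes. Fails if the block size does not divide the total.
# (swap_dd_count is a hypothetical helper name.)
swap_dd_count() {
    size_gb=$1
    bs=$2
    total=$((size_gb * 1024 * 1024 * 1024))
    if [ $((total % bs)) -ne 0 ]; then
        echo "block size $bs does not divide ${size_gb}GiB evenly" >&2
        return 1
    fi
    echo $((total / bs))
}

swap_dd_count 8 524288         # 8 GiB in 512 KiB blocks
swap_dd_count 15 1073741824    # 15 GiB in 1 GiB blocks
```

The two calls print 16384 and 15, the same counts used in the <code>dd</code> commands on this page. Smaller block sizes need less memory per write, which may matter on a small instance.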
   −
== Install Additional Software ==
+
=== Cleanup the instance for creating an AMI ===
 +
# Go to : [[#Cleanup Instance for AMI Creation|Cleanup Instance for AMI Creation]]
 +
 
 +
=== Create the AMI ===
 +
# Go to : [[#Create the AMI|Create the AMI]]
 +
 
 +
 
 +
== Update the GotCloud AMI ==
 +
# Start an instance of the current GotCloud AMI
 +
#* Choose an instance with multiple CPUs so you can parallelize the <code>make</code> call.
 +
# Login as ubuntu
 +
# <code>cd gotcloud</code>
 +
# <code>sudo git pull</code>
 +
# <code>cd src</code>
 +
# <code>sudo make</code>
 +
#* Specify <code>-j #</code> based on the number of CPUs your instance has
 +
# <code>cd</code>
 +
# Go to : [[#Create the AMI|Create the AMI]]
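
The value for <code>-j</code> can be looked up instead of typed by hand. This is a small sketch under the assumption that <code>getconf _NPROCESSORS_ONLN</code> is available, as it is on Ubuntu; the function name <code>cpu_count</code> is hypothetical. The last line only prints the resulting command and does not run the build.

```shell
# Print the number of online CPUs, falling back to 1 if it cannot be
# determined. (cpu_count is a hypothetical helper name.)
cpu_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;    # empty or non-numeric answer: assume one CPU
    esac
    echo "$n"
}

# Show the make command that would be used on this machine.
echo "sudo make -j $(cpu_count)"
```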
   −
There are a number of additional Debian packages that you may well need, so we make
+
 
 +
==Cleanup Instance for AMI Creation==
 +
The first time, when starting from a generic or StarCluster AMI:
 +
# Disable password-based logins for root
 +
## Open /etc/ssh/sshd_config
 +
## Change <code>PermitRootLogin yes</code> to <code>PermitRootLogin without-password</code>
 +
# Disable root access
 +
## <code> sudo passwd -l root</code>
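
The <code>sshd_config</code> change can be scripted rather than made in an editor. The following is a sketch only: <code>restrict_root_login</code> is a hypothetical helper, and it rewrites just an uncommented <code>PermitRootLogin yes</code> line, leaving everything else in the file alone.

```shell
# Rewrite "PermitRootLogin yes" to "PermitRootLogin without-password"
# in the given sshd_config-style file. The result is built in a temporary
# file and copied back, so the original keeps its owner and permissions.
# (restrict_root_login is a hypothetical helper name.)
restrict_root_login() {
    cfg=$1
    tmp=$(mktemp) || return 1
    sed 's/^[[:space:]]*PermitRootLogin[[:space:]][[:space:]]*yes[[:space:]]*$/PermitRootLogin without-password/' "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (needs a root shell on the instance):
#   restrict_root_login /etc/ssh/sshd_config
```

It is worth trying the function on a copy of the file first and comparing the two with <code>diff</code> before touching the real configuration.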
 +
 
 +
 
 +
Each time we generate a new AMI, run:
 +
<pre>sudo shred -u /etc/ssh/*_key /etc/ssh/*_key.pub
 +
sudo find / -name "authorized_keys" -exec rm -f {} \;
 +
rm -rf ~/.ssh
 +
shred -u ~/.*history
 +
sudo find /root/.*history /home/*/.*history -exec rm -f {} \;
 +
history -w
 +
history -c
 +
</pre>
 +
These commands do the following:
 +
# Remove SSH host key pairs
 +
# Remove SSH authorized keys
 +
# Remove the current user's <code>~/.ssh</code> directory
 +
# Delete shell history
 +
 
 +
== Create the AMI ==
 +
 
 +
Once your instance is all ready with everything you want, create the AMI.
 +
 
 +
In your browser at the EC2 Management Console do the following:
 +
# Select the running instance
 +
# Right click, <code>Create Image</code>
 +
# Enter a name and description
 +
# Ensure volume size is correct
 +
# Mark the volume to be deleted on termination
 +
#:This will take several minutes to complete.
 +
#:In the EC2 Dashboard, you can monitor the progress.
 +
#:When it is done, you'll see a new AMI under the list of AMIs.
 +
# When completed, terminate your old instance
 +
 
 +
 
 +
== Older/Additional Instructions ==
 +
=== Install the Software ===
 +
 
 +
'''(1)''' There are a number of additional Debian packages that you may well need, so we make
 
sure they are all installed.
 
    
<code>
 
<code>
 +
  sudo apt-get update
 +
  sudo apt-get upgrade          # Apply maintenance
 +
 
   sudo apt-get install java-common default-jre make libssl0.9.8  
 
   sudo apt-get install libnet-amazon-ec2-perl
+
   sudo apt-get install libnet-amazon-ec2-perl s3cmd
 
   sudo apt-get install make g++ libcurl4-openssl-dev libssl-dev libxml2-dev libfuse-dev
 
 
</code>
 
</code>
   −
'''S3fs''' allows one to access S3 storage as a conventional file system.
+
'''(2)''' '''S3fs''' allows one to access S3 storage as a conventional file system.
This can be quite handy, but our recent experience is that accessing S3 data this
+
This can be quite handy, if it is set up properly.
way (at least for 1000 Genomes data) is seldom really usable.
+
Our recent experience is that the 1000 Genomes data has many files with incorrect permissions.
 
Still, if you're lucky, your data will be useful.
 
Install the software like this:
 
Line 58: Line 173:  
</code>
 
</code>
   −
 
+
'''(3)''' Configure s3cmd.  This will ask for your AWS Access Key and Secret Key. It creates the file ~/.s3cfg.
== Make Sure System Has Swap Space ==
  −
 
  −
Issue the command '''swapon -s''' to see if there is swap space.
  −
If there is only a header line, you need to add a swap file like this:
      
<code>
 
<code>
   df -h          # Be sure there's enough space, decide on swap size
+
   s3cmd --configure
   # Create a file /swap to use (assuming / is large enough)
+
   sudo bash      # Run these commands as root
+
  Enter new values or accept defaults in brackets with Enter.
   swap=/swap
+
   Refer to user manual for detailed description of all options.
   dd if=/dev/zero of=$swap bs=1073741824 count=15    # 15GB swap
+
   
   chown root:root $swap
+
  Access key and Secret key are your identifiers for Amazon S3
   mkswap $swap
+
  Access Key: AKI1234QEUWZ3YCZF2Q
   chmod 0600 $swap
+
  Secret Key: ft1eJa1234NE8iitNlbA08x/G8iMqkMI1234IGf
   swapon $swap
+
   echo "$swap  none swap sw  0" >> /etc/fstab
+
  Encryption password is used to protect your files from reading
 +
  by unauthorized persons while in transfer to S3
 +
  Encryption password: password_you_do_not_need_to_know
 +
  Path to GPG program [/usr/bin/gpg]:
 +
 +
   When using secure HTTPS protocol all communication with Amazon S3
 +
   servers is protected from 3rd party eavesdropping. This method is
 +
   slower than plain HTTP and can't be used if you're behind a proxy
 +
   Use HTTPS protocol [No]:  
 +
 +
   On some networks all internet access must go through a HTTP proxy.
 +
   Try setting it here if you can't conect to S3 directly
 +
   HTTP Proxy server name:
 +
 +
   New settings:
 +
    Access Key: AKI1234QEUWZ3YCZF2Q
 +
    Secret Key: ft1eJa1234NE8iitNlbA08x/G8iMqkMI1234IGf
 +
    Encryption password: password_you_do_not_need_to_know
 +
    Path to GPG program: /usr/bin/gpg
 +
    Use HTTPS protocol: False
 +
    HTTP Proxy server name:
 +
    HTTP Proxy server port: 0
 +
   
 +
  Test access with supplied credentials? [Y/n]
 +
  Please wait...
 +
  Success. Your access key and secret key worked fine :-)
 +
 +
  Now verifying that encryption works...
 +
  Success. Encryption and decryption worked fine :-)
 
   
 
   
   swapon -s      # Should show the swap device
+
   Save settings? [y/N] y
 +
  Configuration saved to '/home/ubuntu/.s3cfg'
 
</code>
 
</code>
   −
== Configure the Host to be Usable ==
+
'''(4)''' Follow the instructions to install the [[Pipeline Debian Package|'''GotCloud Debian packages''']]
 +
Run the tests to be sure everything is OK.
 +
 
 +
=== Configure the Host to be Usable ===
    
It is useful to configure /etc/rc.local to do most things you need at boot time.
 
Line 89: Line 232:  
<code>
 
<code>
 
ubuntu@ip-10-254-60-210:~$ sudo more /etc/rc.local
 
  #!/bin/sh  
+
#!/bin/sh  
  USER=ubuntu
+
#
  THOUSANDG=/mnt/1000g
+
# rc.local
  FILES3=passwd-s3fs
+
#
  S3ERR=/tmp/s3fs.err
+
# This script is executed at the end of each multiuser runlevel.
 +
# Make sure that the script will "exit 0" on success or any other
 +
# value on error.
 +
#
 +
# In order to enable or disable this script just change the execution
 +
# bits.
 +
#
 +
# By default this script does nothing.
 +
USER=ubuntu
 +
THOUSANDG=/mnt/1000g
 +
FILES3=/etc/passwd-s3fs     # Where s3fs access info will live
 +
S3ERR=/tmp/s3fs.err
 +
#  These are needed for s3fs access
 +
AWSACCESSKEYID=AKIAxxxxxxZ3YCZF2Q
 +
AWSSECRETACCESSKEY=ft1eJa3WxxxxxxxNlbA08x/G8iMqkMIkJjFCIGf
 +
 +
 +
#    Check that we have swap set up
 +
a=`swapon -s | grep -v File`
 +
if [ "$a" = "" ]; then
 +
  echo "#######################################################"
 +
  echo "#  You have no SWAP file set up"
 +
  echo ""
 +
  echo "#  swap=/mnt/swapfile"
 +
  echo '#  sudo dd if=/dev/zero of=$swap bs=1073741824 count=20'
 +
  echo '#  sudo chown root:root $swap'
 +
  echo '#  sudo mkswap $swap'
 +
  echo '#  sudo chmod 0600 $swap'
 +
  echo '#  sudo swapon $swap'
 +
  echo ""
 +
  echo "#  If need be, add to /etc/fstab"
 +
  echo '#  echo "$swap  none swap sw  0  0" >> /etc/fstab'
 +
  echo "#######################################################"
 +
fi
 +
 +
#    Set up for GotCloud
 +
gc=/gotcloud.mnt
 +
if [ ! -r $gc/release_version.txt ]; then
 +
  mkdir -p $gc
 +
  mount /dev/xvdg $gc
 +
  if [ -d $gc/gotcloud.ref ]; then
 +
    echo "#######################################################"
 +
    echo "#  GotCloud is set up on $gc"
 +
    echo "#######################################################"
 +
  fi
 +
fi
 +
 +
#    Set up access to S3 storage as normal filesystem
 +
echo "${AWSACCESSKEYID}:$AWSSECRETACCESSKEY" > $FILES3
 +
chown root.root $FILES3
 +
chmod 640 $FILES3
 
   
 
   
  #    Set up for GotCloud    Assumes /dev/xvdf has reference files for GotCloud
+
usermod -aG fuse $USER
  mkdir -p /gotcloud
  −
  mount /dev/xvdf /gotcloud
  −
  if [ ! -d /gotcloud/gotcloud.ref ]; then
  −
    echo "#######################################################"
  −
    echo "#  GotCloud is not set up on /gotcloud"
  −
    echo "#######################################################"
  −
  fi
   
   
 
   
  #    Setup 1000g access by s3fs
+
#    Setup 1000genomes
  usermod -aG fuse $USER
+
mkdir -p $THOUSANDG
  echo 'AKIAxxxxxxZ3YCZF2Q:ft1eJa3WxxxxxxxNlbA08x/G8iMqkMIkJjFCIGf' > /etc/$FILES3
+
if [ ! -r $THOUSANDG/release ]; then
  chown root.root /etc/$FILES3
+
  chown $USER.$USER $THOUSANDG
  chmod 640 /etc/$FILES3
+
  /usr/local/bin/s3fs -o allow_other 1000genomes $THOUSANDG > $S3ERR 2>&1
  mkdir -p $THOUSANDG
+
  if [ ! -r $THOUSANDG/alignment.index ]; then
  chown $USER.$USER $THOUSANDG
+
    echo "#######################################################"
  #  It is tempting to use caching with  -o use_cache=/tmp 1000genomes
+
    echo "#  1000genomes is not set up on $THOUSANDG"
  #  But s3fs cache is exceedingly dumb and does not use a least recently used
+
    echo "#  See S3FS errors in $S3ERR"
  #  mechanism -- which will guarantee your root volume will fill up
+
    echo "#######################################################"
  /usr/local/bin/s3fs -o allow_other 1000genomes $THOUSANDG > $S3ERR 2>&1
+
  fi
  if [ ! -r $THOUSANDG/alignment.index ]; then
+
  df -h
    echo "#######################################################" >> $S3ERR
+
fi
    echo "#  1000genomes is not set up on $THOUSANDG" >> $S3ERR
+
exit 0
    echo "#######################################################" >> $S3ERR
  −
  fi
  −
  df -h
  −
 
  −
  exit 0
  −
</code>
  −
 
  −
== Create the AMI ==
  −
 
  −
Once your instance is all ready with the files you want, swap space etc, then create the AMI.
  −
In your browser at the EC2 Management Console do the following:
  −
 
  −
<code>
  −
  Create Image
  −
    Image Name  csg-biopipe_instance
  −
    Image Description:  Image for CSG Biopipe instance
  −
    Volume Size:  30GB
  −
    Take defaults otherwise
  −
</code>
  −
 
  −
This will take several minutes to complete.
  −
In the EC2 Dashboard, you can monitor the progress.
  −
When it is done, you'll see a new AMI under the list of AMIs.
  −
 
  −
Your new AMI should look pretty much like this:
  −
 
  −
<code>
  −
  AMI: Ubuntu Cloud Guest AMI ID ami-3d4ff254 (x86_64)
  −
  Name: Ubuntu Server 12.04.1 LTS
  −
  Description: Ubuntu Server 12.04.1 LTS with support available from Canonical (http://www.ubuntu.com/cloud/services).
  −
  Number of Instances: 1
  −
  Availability Zone: No Preference
  −
  Instance Type: Micro (t1.micro)
  −
  Instance Class: On Demand Edit Instance Details
  −
  EBS-Optimized: No
  −
  Monitoring: Disabled Termination Protection: Disabled
  −
  Tenancy: Default
  −
  Kernel ID: Use Default Shutdown Behavior: Stop
  −
  RAM Disk ID: Use Default
  −
  Network Interfaces:
  −
  Secondary IP Addresses:
  −
  User Data:
  −
  IAM Role: Edit Advanced Details
  −
  Key Pair Name: CSG Edit Key Pair
  −
  Security Group(s): sg-a098e9c8 Edit Firewall
   
</code>
 
</code>
   −
== Test the new AMI ==
+
=== Test the new AMI ===
    
Launch a new AMI instance and check that files are in the correct places.
 
Line 175: Line 316:  
   Advanced Instance Options  (take defaults)
 
 
   Storage Device Configuration -> Edit
 
   Change volume to 30G or whatever -> Continue       # Defaults are OK
+
   Change volume to 30G or larger -> Continue     # Defaults are OK
 
   Instance Details
 
 
     Key Name = test of instance
 
