GotCloud: Amazon Demo

From Genome Analysis Wiki


Introduction

This Amazon demo runs through the GotCloud SNP and INDEL calling pipelines.

The data used for this demo is originally from our sequencing workshop demos. We also have alignment and structural variation demos.

Links to the general GotCloud Demos (originally from our sequencing workshop):


To run this Demo on an Amazon Cluster rather than on a single node, see: StarCluster -> Run GotCloud Demo Using StarCluster


Starting up a Node

See Amazon Single Node for instructions on starting a node and getting a terminal running.

  • For the demo, we recommend using a c3.2xlarge instance.

Running the Demo on an Already Running Node

Examine the Setup

  1. After logging into the Amazon node as the ubuntu user, you should by default be in the ubuntu home directory: /home/ubuntu
    1. You can check this by doing:
      pwd
      • This should output: /home/ubuntu
    2. Take a look at the contents of the ubuntu user home directory
      ls
      • This should list two directories: example and gotcloud
        • The example directory contains the files for this demo
        • The gotcloud directory contains the gotcloud programs and pre-compiled source
     
  2. Look at the example input files:
    ls example
     
    1. bam.list contains the list of BAM files per sample
    2. bams is a subdirectory containing the BAM files for this demo
    3. test.bed contains the region we want to process in this demo
      • To make the demo run faster, we only want to process a small region of chromosome 22. This file tells GotCloud the region. The region we are using is the APOL1 region
       
    4. test.conf contains the settings we want GotCloud to use for this run
       
      • For the demo, we want to tell GotCloud:
        1. The list of bams to use: BAM_LIST = example/bam.list
        2. The region to process rather than the whole genome: UNIFORM_TARGET_BED = example/test.bed
        3. The chromosomes to process. The default chromosomes are 1-22 & X, but we only want to process chromosome 22: CHRS = 22
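
The setup checks above can be sketched as two small shell helpers. This is a minimal sketch, not part of GotCloud: check_demo_inputs and show_demo_settings are hypothetical names, and the second helper assumes the configuration file holds one KEY = VALUE pair per line, as in the three settings listed above.

```shell
# Hypothetical helper: confirm the four demo inputs described above exist.
check_demo_inputs() {
    dir="${1:-example}"
    missing=0
    for name in bam.list bams test.bed test.conf; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: print only the three settings this demo relies on.
# Assumes one KEY = VALUE pair per line in the configuration file.
show_demo_settings() {
    conf="${1:-example/test.conf}"
    grep -E '^[[:space:]]*(BAM_LIST|UNIFORM_TARGET_BED|CHRS)[[:space:]]*=' "$conf"
}
```

From /home/ubuntu, running check_demo_inputs example followed by show_demo_settings example/test.conf should print the BAM_LIST, UNIFORM_TARGET_BED, and CHRS lines if the demo files are in place.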

Run GotCloud SnpCall

Now that we have examined the instance files, run GotCloud snpcall

  1. gotcloud snpcall --conf example/test.conf --outdir output --numjobs 8
    • The ubuntu user is set up with the gotcloud program and tools in its path, so you can just type the program name and it will be found
     
    • This will take a few minutes to run.
    • GotCloud first generates a makefile, and then runs the makefile
    • After a while GotCloud snpcall will print some messages to the screen. This is expected and ok.
  2. When complete, GotCloud snpcall will indicate success/failure
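
If you would rather keep a record of the run than rely on the screen output, a wrapper along these lines may help. It is a sketch under two assumptions: run_logged is a hypothetical helper name, and gotcloud is assumed to return a non-zero exit status on failure (check the printed success/failure message if in doubt). Note that the wrapper sends GotCloud's messages to the log file instead of the screen.

```shell
# Hypothetical helper: run a command, save stdout and stderr to a log file,
# and report the exit status.
run_logged() {
    log="$1"
    shift
    if "$@" > "$log" 2>&1; then
        echo "OK: $1 (log: $log)"
    else
        status=$?
        echo "FAILED with exit status $status: $1 (see $log)" >&2
        return "$status"
    fi
}

# Example use on the demo node (nproc reports the number of available CPUs;
# on an 8-vCPU instance this matches the --numjobs 8 used above):
#   run_logged snpcall.log gotcloud snpcall --conf example/test.conf \
#       --outdir output --numjobs "$(nproc)"
```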
     

Examining SnpCall Output

  1. List the ubuntu home directory to see the new output directory:
    ls
     
  2. Look inside the output directory:
    ls output
     
    1. glfs - intermediate per sample genotype likelihood files
    2. jobfiles - empty; was used to store commands as GotCloud was running
    3. pvcfs - intermediate vcfs with per sample information
    4. split - contains per chromosome directories with vcfs containing PASS only snps split up as required for beagle (part of ldrefine)
    5. target - contains the bed with the region to be processed
    6. umake.snpcall.conf - file containing all of the configuration settings used for this run of GotCloud
    7. umake.snpcall.Makefile - Makefile containing commands for this run of GotCloud
    8. umake.snpcall.Makefile.cluster - Makefile log of start/stop times of various steps
    9. umake.snpcall.Makefile.log - log of the GotCloud run
    10. vcfs - contains per chromosome directories with vcfs
      • The important output is:
        1. filtered.vcf.gz file: vcfs/chr22/chr22.filtered.vcf.gz
        2. summary information: vcfs/chr22/chr22.filtered.sites.vcf.summary
  3. Look at the filtered file.
    zless -S output/vcfs/chr22/chr22.filtered.vcf.gz
    Scroll down past the header lines to see the variant records.
    1. Check that GotCloud found the expected SNP at 22:36661906
      tabix output/vcfs/chr22/chr22.filtered.vcf.gz 22:36661906 |head -1
     
  4. Look at the summary information
    cat output/vcfs/chr22/chr22.filtered.sites.vcf.summary
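
As a quick cross-check on the filtered file, the records can be tallied by their FILTER value (column 7 of a VCF). This is only a sketch: count_by_filter is a hypothetical helper name, and the tally is independent of whatever the GotCloud summary file reports.

```shell
# Hypothetical helper: count VCF records per FILTER value (column 7),
# skipping header lines that start with '#'.
count_by_filter() {
    vcf_gz="$1"
    gzip -dc "$vcf_gz" |
        awk -F '\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' |
        sort
}

# Example use on the demo output:
#   count_by_filter output/vcfs/chr22/chr22.filtered.vcf.gz
```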
     


Run GotCloud Indel

Next, run GotCloud indel with the same configuration file and output directory

  1. gotcloud indel --conf example/test.conf --outdir output --numjobs 8
    • The ubuntu user is set up with the gotcloud program and tools in its path, so you can just type the program name and it will be found
     
    • This will take a few minutes to run.
    • GotCloud first generates a makefile, and then runs the makefile
  2. When complete, GotCloud indel will indicate success/failure
     

Examining Indel Output

  1. Look inside the output directory to see the new indel directories:
    ls output
     
    1. aux - intermediate indel files
    2. final - final output of Indel pipeline
    3. indelvcf - intermediate indel files
    4. gotcloud.indel.conf - file containing all of the configuration settings used for this run of GotCloud
    5. gotcloud.indel.Makefile - Makefile containing commands for this run of GotCloud
    6. gotcloud.indel.Makefile.log - log of the GotCloud run
  2. Important Indel output is in output/final
     ls output/final/
     
  3. Look at the final Indel VCF file.
    zless output/final/all.genotypes.vcf.gz
    Scroll down past the header lines to see the indel records.
    1. Check that GotCloud found the expected Indel at 22:36662041
      tabix output/final/all.genotypes.vcf.gz 22:36662041-36662041|less -S
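
If tabix is unavailable or the VCF has no index, the same position check can be approximated by scanning the file. A minimal sketch, with records_at as a hypothetical helper name: it matches only records whose POS equals the given position, which is stricter than a tabix region query (tabix also returns records that merely overlap the region), and it reads the whole file, so it is slower on large VCFs.

```shell
# Hypothetical helper: print VCF records on a given chromosome whose POS
# (column 2) equals the given position. No index is required.
records_at() {
    vcf_gz="$1"
    chrom="$2"
    pos="$3"
    gzip -dc "$vcf_gz" |
        awk -F '\t' -v c="$chrom" -v p="$pos" '!/^#/ && $1 == c && $2 == p'
}

# Example use on the demo output:
#   records_at output/final/all.genotypes.vcf.gz 22 36662041 | less -S
```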
       

Exit & Terminate

Prior to terminating, make sure you copy any data off of the root EBS volume attached to the instance as it will be deleted when you terminate.
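
One way to get the results off the instance is to bundle the output directory into a single archive and copy it down with scp. This is a sketch, not a required step: archive_output is a hypothetical helper name, and the key file and host in the scp comment are placeholders you must replace with your own values.

```shell
# Hypothetical helper: pack a directory into a gzip-compressed tar archive
# and confirm the archive can be listed before reporting success.
archive_output() {
    src="${1:-output}"
    archive="${2:-gotcloud-output.tar.gz}"
    tar -czf "$archive" "$src" &&
        tar -tzf "$archive" > /dev/null &&
        echo "wrote $archive"
}

# On the instance, from /home/ubuntu:
#   archive_output output gotcloud-output.tar.gz
# Then, from your local machine (placeholders, not real values):
#   scp -i <your-key.pem> ubuntu@<instance-public-dns>:gotcloud-output.tar.gz .
```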

  1. Exit from your terminal
  2. Terminate your Amazon Instance
    • Right click on the instance you want to terminate in the EC2 dashboard
    • Select Terminate
    • Select "Yes, Terminate" to confirm that you want to terminate the instance and delete its storage.