Difference between revisions of "GotCloud: Amazon Demo"

From Genome Analysis Wiki

Revision as of 14:11, 17 October 2014

Introduction

This Amazon demo runs through the GotCloud SNP and INDEL calling pipelines.

The data used for this demo is originally from our sequencing workshop demos. We also have alignment and structural variation demos.

Links to the general GotCloud Demos (originally from our sequencing workshop):

Starting up a Node

See Amazon Single Node for instructions on starting a node and getting a terminal running.

  • For the demo, we recommend using a c3.2xlarge instance.

Running the Demo on an Already Running Node

Examine the Setup

  1. After logging into the Amazon node as the ubuntu user, you should by default be in the ubuntu home directory: /home/ubuntu
    1. You can check this by doing:
      pwd
      • This should output: /home/ubuntu
    2. Take a look at the contents of the ubuntu user home directory
      ls
      • This should output 2 directories: example and gotcloud
        • The example directory contains the files for this demo
        • The gotcloud directory contains the gotcloud programs and pre-compiled source
    DemoHome.png
  2. Look at the example input files:
    ls example
    ExampleFiles.png
    1. bam.list contains the list of BAM files per sample
    2. bams is a subdirectory containing the BAM files for this demo
    3. test.bed contains the region we want to process in this demo
      • To make the demo run faster, we only want to process a small region of chromosome 22. This file tells GotCloud the region. The region we are using is the APOL1 region
      BedContents.png
    4. test.conf contains the settings we want GotCloud to use for this run
      ConfContents.png
      • For the demo, we want to tell GotCloud:
        1. The list of bams to use: BAM_LIST = example/bam.list
        2. The region to process rather than the whole genome: UNIFORM_TARGET_BED = example/test.bed
        3. The chromosomes to process. The default chromosomes are 1-22 & X, but we only want to process chromosome 22: CHRS = 22
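
Before launching GotCloud, it can help to confirm that the inputs described above are all in place. The following is a minimal sketch, not part of GotCloud: check_demo_inputs is a hypothetical helper that only tests for the files listed above and for the three setting names shown in test.conf.

```shell
# Sketch: verify the demo inputs before launching GotCloud.
# check_demo_inputs is a hypothetical helper, not a GotCloud command.
check_demo_inputs() {
    dir="${1:-example}"
    rc=0

    # Files the demo expects under the example directory
    for f in bam.list test.bed test.conf; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $dir/$f" >&2
            rc=1
        fi
    done

    # Subdirectory holding the BAM files
    if [ ! -d "$dir/bams" ]; then
        echo "missing directory: $dir/bams" >&2
        rc=1
    fi

    # Settings the demo configuration is described as containing
    for key in BAM_LIST UNIFORM_TARGET_BED CHRS; do
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$dir/test.conf" 2>/dev/null; then
            echo "setting not found in test.conf: ${key}" >&2
            rc=1
        fi
    done

    return "$rc"
}
```

From the ubuntu home directory, `check_demo_inputs example && echo "inputs look complete"` would report whether anything is missing.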

Run GotCloud SnpCall

Now that we have examined the instance files, run GotCloud snpcall:

  1. gotcloud snpcall --conf example/test.conf --outdir output --numjobs 8
    • The ubuntu user is set up to have the gotcloud program and tools in its path, so you can just type the program name and it will be found
    RunSnpCall.png
    • This will take a few minutes to run.
    • GotCloud first generates a makefile, and then runs the makefile
    • After a while GotCloud snpcall will print some messages to the screen. This is expected and ok.
  2. When complete, GotCloud snpcall will indicate success/failure
    SnpcallSuccess.png
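
If you want to keep a copy of the screen messages and get a one-line result, the run can be wrapped as sketched below. run_step is a hypothetical helper, and it assumes that the wrapped command exits with a non-zero status on failure; that behavior is an assumption here and is not stated in this demo.

```shell
# Sketch: run one pipeline step, save its messages to a log, report the result.
# run_step is a hypothetical helper. It ASSUMES the wrapped command exits
# with a non-zero status when it fails.
run_step() {
    log="$1"
    shift
    rc=0
    # Capture both stdout and stderr in the log file
    "$@" >"$log" 2>&1 || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "OK: $* (log: $log)"
    else
        echo "FAILED with exit status $rc: $* (log: $log)" >&2
    fi
    return "$rc"
}

# Only attempt the real run where gotcloud is installed (as on the demo node)
if command -v gotcloud >/dev/null 2>&1; then
    run_step snpcall.log gotcloud snpcall --conf example/test.conf --outdir output --numjobs 8
fi
```

Because the messages go to the log file rather than the screen, use `tail -f snpcall.log` in a second terminal to follow progress.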

Examining SnpCall Output

  1. Run ls in the ubuntu home directory to see the new output directory:
    ls
    SnpCallOutput1.png
  2. Look inside the output directory:
    ls output
    SnpCallOutput2.png
    1. glfs - intermediate per sample genotype likelihood files
    2. jobfiles - empty; was used to store commands as GotCloud was running
    3. pvcfs - intermediate vcfs with per sample information
    4. split - contains per chromosome directories with vcfs containing only PASS SNPs, split up as required for beagle (part of ldrefine)
    5. target - contains the bed with the region to be processed
    6. umake.snpcall.conf - file containing all of the configuration settings used for this run of GotCloud
    7. umake.snpcall.Makefile - Makefile containing commands for this run of GotCloud
    8. umake.snpcall.Makefile.cluster - Makefile log of start/stop times of various steps
    9. umake.snpcall.Makefile.log - log of the GotCloud run
    10. vcfs - contains per chromosome directories with vcfs
      • important output is
        1. filtered.vcf.gz file: vcfs/chr22/chr22.filtered.vcf.gz
        2. summary information: vcfs/chr22/chr22.filtered.sites.vcf.summary
  3. Look at the filtered file.
    zless -S output/vcfs/chr22/chr22.filtered.vcf.gz
    Scroll down some: SnpVcfOutput.png
    1. Check that GotCloud found the expected SNP at 22:36661906
      tabix output/vcfs/chr22/chr22.filtered.vcf.gz 22:36661906 |head -1
    SnpTabix.png
  4. Look at the summary information
    cat output/vcfs/chr22/chr22.filtered.sites.vcf.summary
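    SnpSummary.png
    • To understand how to interpret the filtering summary statistics, please refer to Understanding vcf-summary output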
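
The checks above can also be scripted. The sketch below uses two hypothetical helpers, count_filters and has_site, that rely only on the standard VCF column layout (column 1 is CHROM, column 2 is POS, column 7 is FILTER). They read the whole file with gzip and awk rather than using the tabix index, which is fine for a region this small.

```shell
# Sketch: summarize FILTER values and look up one site in a compressed VCF.
# count_filters and has_site are hypothetical helpers, not GotCloud commands.

# Print each FILTER value and how many records carry it
count_filters() {
    gzip -dc "$1" |
        awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' |
        sort
}

# usage: has_site file.vcf.gz CHROM POS
# Succeeds (exit status 0) only if a record exists at CHROM:POS
has_site() {
    gzip -dc "$1" |
        awk -F'\t' -v c="$2" -v p="$3" \
            '!/^#/ && $1 == c && $2 == p { found = 1 } END { exit !found }'
}

# Run against the demo output only if it exists
vcf=output/vcfs/chr22/chr22.filtered.vcf.gz
if [ -f "$vcf" ]; then
    count_filters "$vcf"
    if has_site "$vcf" 22 36661906; then
        echo "found 22:36661906"
    fi
fi
```

count_filters gives a quick count of PASS versus filtered records; the summary file remains the fuller report.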

Run GotCloud Indel

Now that we have examined the snpcall output, run GotCloud indel:

  1. gotcloud indel --conf example/test.conf --outdir output --numjobs 8
    • The ubuntu user is set up to have the gotcloud program and tools in its path, so you can just type the program name and it will be found
    RunIndel.png
    • This will take a few minutes to run.
    • GotCloud first generates a makefile, and then runs the makefile
  2. When complete, GotCloud indel will indicate success/failure
    IndelSuccess.png

Examining Indel Output