Changes

From Genome Analysis Wiki
345 bytes removed, 12:35, 10 January 2014
no edit summary
Line 8: Line 8:  
GotCloud and this basic tutorial were presented at the [http://ibg.colorado.edu/dokuwiki/doku.php?id=workshop:2013:announcement 2013 IBG Workshop].  It was presented in two sessions.  On Wednesday an overview was presented with steps for running the tutorial data: [[Media:IBG2013GotCloud.pdf|IBG2013GotCloud.pdf]].  On Friday more detail on the input files and what goes into generating the input files was presented: [[Media:GotCloudIBGWorkshop2013Friday.pdf|GotCloudIBGWorkshop2013Friday.pdf]].
 
GotCloud and this basic tutorial were presented at the [http://ibg.colorado.edu/dokuwiki/doku.php?id=workshop:2013:announcement 2013 IBG Workshop].  It was presented in two sessions.  On Wednesday an overview was presented with steps for running the tutorial data: [[Media:IBG2013GotCloud.pdf|IBG2013GotCloud.pdf]].  On Friday more detail on the input files and what goes into generating the input files was presented: [[Media:GotCloudIBGWorkshop2013Friday.pdf|GotCloudIBGWorkshop2013Friday.pdf]].
   −
'''This tutorial is in the process of being updated for gotcloud version 1.06 (April 17. 2013).'''
+
'''This tutorial is in the process of being updated for gotcloud version 1.08 (July 30, 2013).'''
    
== STEP 1 : Setup GotCloud ==
 
== STEP 1 : Setup GotCloud ==
Line 15: Line 15:     
We will use 3 different directories for this tutorial:
 
We will use 3 different directories for this tutorial:
# path to the directory where gotcloud is installed, default ~/gotcloud/
+
# path to the directory where gotcloud is installed, default ~/gotcloud-latest/
 
# path to the directory where the example data is installed, default ~/gotcloudExample
 
# path to the directory where the example data is installed, default ~/gotcloudExample
 
# path to your output directory, default ~/gotcloudTutorialOut/
 
# path to your output directory, default ~/gotcloudTutorialOut/
Line 28: Line 28:     
Otherwise, you can install it in your own directory:
 
Otherwise, you can install it in your own directory:
# Change to the directory where you want gotcloud/ installed
+
# Change to the directory where you want gotcloud-latest/ installed
# Download the gotcloud tar from the ftp site.
+
# Download the gotcloud tar.
 
# Extract the tar
 
# Extract the tar
 
# Build (compile) the source
 
# Build (compile) the source
Line 35: Line 35:     
  cd ~
 
  cd ~
  wget ftp://share.sph.umich.edu/gotcloud/gotcloud_latest.tgz # Download
+
  wget https://github.com/statgen/gotcloud/archive/latest.tar.gz # Download
  tar xf gotcloud_latest.tgz     # Extracts into gotcloud/
+
  tar xf latest.tar.gz     # Extracts into gotcloud-latest/
  cd ~/gotcloud/src; make        # Build source
+
  cd gotcloud-latest/src; make        # Build source
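Before running make, it can help to confirm that the build tools are on the PATH. The following is a minimal sketch, not part of GotCloud: <code>missing_tools</code> is a hypothetical helper, and the tool names passed to it (java, make, gcc) are taken from the requirements list in the earlier revision of this page.

```shell
# Hypothetical helper (not shipped with GotCloud): print the names of any
# tools from the argument list that cannot be found on the PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the tool name
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Drop the leading space left by the accumulation above
    echo "${missing# }"
}

# Example use: report anything missing before building
# missing_tools java make gcc
```

An empty result means every listed tool was found; the check does not verify versions (for example, gcc 4.4 or newer).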
  −
GotCloud requires the following tools to be installed.
  −
You can run ~/gotcloud/scripts/check_requirements.sh
  −
...TBD – put in required programs/tools.
  −
* java (java-common default-jre on ubuntu)
  −
* make (make on ubuntu)
  −
* libssl (libssl0.9.8 on ubuntu)
  −
* gcc 4.4 or newer
      
=== Step 1b: Install Example Dataset ===
 
=== Step 1b: Install Example Dataset ===
Our dataset consists of 60 individuals from Great Britain (GBR) sequenced by the 1000 Genomes Project. These individuals have been sequenced to an average depth of about 4x.
+
Our dataset consists of individuals from Great Britain (GBR) sequenced by the 1000 Genomes Project. These individuals have been sequenced to an average depth of about 4x.
    
To conserve time and disk-space, our analysis will focus on a small region on chromosome 20, 42900000 - 43200000.  
 
To conserve time and disk-space, our analysis will focus on a small region on chromosome 20, 42900000 - 43200000.  
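As a quick check on the size of the target region, the coordinates can be parsed in the shell. This is a minimal sketch: <code>region_length</code> is a hypothetical helper, not part of GotCloud, and it assumes 1-based inclusive coordinates written in the chr:start-end form that is passed to --region later in this tutorial.

```shell
# Hypothetical helper (not part of GotCloud): length in bases of a
# chr:start-end region, assuming 1-based inclusive coordinates.
region_length() {
    range=${1#*:}       # strip the "chr:" prefix, leaving "start-end"
    start=${range%-*}   # text before the dash
    end=${range#*-}     # text after the dash
    echo $(( end - start + 1 ))
}

region_length 20:42900000-43200000   # prints 300001
```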
   −
The tutorial will run the alignment pipeline on 2 of the individuals (HG00096, HG00100).  The fastqs used for this step are reduced to reads that fall into our target region.
+
The alignment pipeline in this tutorial will be run on 2 of the individuals (HG00096, HG00100).  The fastqs used for this step are reduced to reads that fall into our target region.
   −
The tutorial will then used previously aligned/mapped reads for the full 60 individuals to generate a list of polymorphic sites and estimate accurate genotypes at each of these sites.  
+
The snpcall and ldrefine pipelines will use previously aligned/mapped reads for 60 individuals to generate a list of polymorphic sites and estimate accurate genotypes at each of these sites.  
    
The example dataset we'll be using is available at: ftp://share.sph.umich.edu/gotcloud/gotcloudExample.tgz  
 
The example dataset we'll be using is available at: ftp://share.sph.umich.edu/gotcloud/gotcloudExample.tgz  
Line 64: Line 56:  
  cd ~
 
  cd ~
 
  wget ftp://share.sph.umich.edu/gotcloud/gotcloudExample_latest.tgz  # Download  
 
  wget ftp://share.sph.umich.edu/gotcloud/gotcloudExample_latest.tgz  # Download  
  tar xvf gotcloudExample_latest.tgz    # Extracts into gotcloudExample/
+
  tar xf gotcloudExample_latest.tgz    # Extracts into gotcloudExample/
    
== STEP 2 : Run GotCloud Alignment Pipeline ==  
 
== STEP 2 : Run GotCloud Alignment Pipeline ==  
Line 83: Line 75:     
Run the alignment pipeline (the example aligns 2 samples) :  
 
Run the alignment pipeline (the example aligns 2 samples) :  
  ~/gotcloud/gotcloud align --conf ~/gotcloudExample/[[#Alignment Configuration File|GBR2align.conf]] --outdir [[#Alignment Output Directory|~/gotcloudTutorialOut]] --baseprefix ~/gotcloudExample
+
  cd ~
+
  gotcloud-latest/gotcloud align --conf gotcloudExample/[[#Alignment Configuration File|GBR2align.conf]] --outdir [[#Alignment Output Directory|gotcloudTutorialOut]] --baseprefix gotcloudExample
    
Upon successful completion of the alignment pipeline (about 1-3 minutes), you will see the following message:  
 
Upon successful completion of the alignment pipeline (about 1-3 minutes), you will see the following message:  
  Processing finished in nn secs with no errors reported  
+
  Processing finished in n secs with no errors reported  
    
The final BAM files produced by the alignment pipeline are:  
 
The final BAM files produced by the alignment pipeline are:  
  ls ~/gotcloudTutorialOut/bams
+
  ls gotcloudTutorialOut/bams
 
In this directory you will see:
 
In this directory you will see:
 
* BAM (.bam) files - 1 per sample
 
* BAM (.bam) files - 1 per sample
Line 97: Line 90:  
** HG00096.recal.bam.bai  
 
** HG00096.recal.bam.bai  
 
** HG00100.recal.bam.bai  
 
** HG00100.recal.bam.bai  
* BAM checksum files (.md5) – 1 per sample
  −
** HG00096.recal.bam.md5
  −
** HG00100.recal.bam.md5
   
* Indicator files that the step completed successfully:
 
* Indicator files that the step completed successfully:
 
** HG00096.recal.bam.done  
 
** HG00096.recal.bam.done  
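The presence of the per-sample files can be checked with a short script. This is a sketch under assumptions: <code>check_bam_outputs</code> is a hypothetical helper, not part of GotCloud, and it assumes each sample produces <code>SAMPLE.recal.bam</code>, <code>SAMPLE.recal.bam.bai</code>, and <code>SAMPLE.recal.bam.done</code> in the bams directory, following the file names listed above.

```shell
# Hypothetical helper (not part of GotCloud): verify that each sample has
# its BAM, BAM index, and .done indicator file in the given directory.
# Usage: check_bam_outputs DIR SAMPLE [SAMPLE ...]
check_bam_outputs() {
    dir=$1
    shift
    status=0
    for sample in "$@"; do
        for suffix in recal.bam recal.bam.bai recal.bam.done; do
            if [ ! -e "$dir/$sample.$suffix" ]; then
                echo "missing: $dir/$sample.$suffix" >&2
                status=1
            fi
        done
    done
    # Non-zero return means at least one expected file was not found
    return $status
}

# Example use for the two tutorial samples:
# check_bam_outputs gotcloudTutorialOut/bams HG00096 HG00100
```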
Line 105: Line 95:     
The Quality Control (QC) files are:  
 
The Quality Control (QC) files are:  
  ls ~/gotcloudTutorialOut/QCFiles
+
  ls gotcloudTutorialOut/QCFiles
 
In this directory you will see:
 
In this directory you will see:
 
* VerifyBamID output files:
 
* VerifyBamID output files:
Line 151: Line 141:     
Run the variant calling pipeline:  
 
Run the variant calling pipeline:  
  ~/gotcloud/gotcloud snpcall --conf ~/gotcloudExample/[[GBR60vc.conf]] --outdir ~/gotcloudTutorialOut --numjobs 2 --region 20:42900000-43200000 --baseprefix ~/gotcloudExample
+
  cd ~
+
  gotcloud-latest/gotcloud snpcall --conf gotcloudExample/[[GBR60vc.conf]] --outdir gotcloudTutorialOut --numjobs 2 --region 20:42900000-43200000 --baseprefix gotcloudExample
   −
Upon successful completion of the variant calling pipeline (about 3-4 minutes), you will see the following message:  
+
Upon successful completion of the variant calling pipeline (about 2-4 minutes), you will see the following message:  
 
   Commands finished in nnn secs with no errors reported  
 
   Commands finished in nnn secs with no errors reported  
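When the pipeline is run unattended, the success message can be checked from a saved log instead of by eye. This is a sketch under assumptions: <code>finished_ok</code> is a hypothetical helper, the log file name <code>snpcall.log</code> is made up for illustration, and it assumes the message is written to the terminal output so that it can be captured with <code>tee</code>.

```shell
# Hypothetical helper (not part of GotCloud): succeed if the saved log
# contains the "no errors reported" completion message.
finished_ok() {
    grep -q "with no errors reported" "$1"
}

# Example use (snpcall.log is an illustrative file name):
# gotcloud-latest/gotcloud snpcall --conf gotcloudExample/GBR60vc.conf \
#     --outdir gotcloudTutorialOut --numjobs 2 \
#     --region 20:42900000-43200000 --baseprefix gotcloudExample 2>&1 | tee snpcall.log
# finished_ok snpcall.log && echo "snpcall completed"
```

The same check matches the alignment pipeline's message, since both end with "with no errors reported".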
    
On SNP Call success, the VCF files of interest are:  
 
On SNP Call success, the VCF files of interest are:  
  ls ~/gotcloudTutorialOut/vcfs/chr20/chr20.filtered*
+
  ls gotcloudTutorialOut/vcfs/chr20/chr20.filtered*
    
This gives you the following files:
 
This gives you the following files:
Line 174: Line 165:  
** chr20.merged.vcf.OK - indicator that the step completed successfully
 
** chr20.merged.vcf.OK - indicator that the step completed successfully
 
* the hardfiltered (pre-svm filtered) variant calls:
 
* the hardfiltered (pre-svm filtered) variant calls:
** chr20.filtered.vcf.gz - vcf for whole chromosome after it has been run through hard filters
+
** chr20.hardfiltered.vcf.gz - vcf for whole chromosome after it has been run through hard filters
 
** chr20.hardfiltered.sites.vcf - vcf for whole chromosome after it has been run through filters and marked with PASS/FAIL without the per sample genotypes
 
** chr20.hardfiltered.sites.vcf - vcf for whole chromosome after it has been run through filters and marked with PASS/FAIL without the per sample genotypes
 
** chr20.hardfiltered.sites.vcf.log - log file
 
** chr20.hardfiltered.sites.vcf.log - log file
Line 195: Line 186:     
Note: the tutorial does not produce a target directory, but if you run with targeted data, you may see that.
 
Note: the tutorial does not produce a target directory, but if you run with targeted data, you may see that.
      
== STEP 4 : Run GotCloud Genotype Refinement Pipeline ==  
 
== STEP 4 : Run GotCloud Genotype Refinement Pipeline ==  
Line 201: Line 191:     
Run the LD-aware genotype refinement pipeline:  
 
Run the LD-aware genotype refinement pipeline:  
  ~/gotcloud/gotcloud ldrefine --conf ~/gotcloudExample/[[GBR60vc.conf]] --outdir ~/gotcloudTutorialOut --numjobs 2 --baseprefix ~/gotcloudExample
+
  cd ~
+
  gotcloud-latest/gotcloud ldrefine --conf gotcloudExample/[[GBR60vc.conf]] --outdir gotcloudTutorialOut --numjobs 2 --baseprefix gotcloudExample
   −
Upon successful completion of this pipeline (about 10 minutes), you will see the following message:  
+
Upon successful completion of this pipeline (about 3-10 minutes), you will see the following message:  
 
  Commands finished in nnn secs with no errors reported  
 
  Commands finished in nnn secs with no errors reported  
    
The output from the beagle step of the genotype refinement pipeline is found in:  
 
The output from the beagle step of the genotype refinement pipeline is found in:  
  ls ~/gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz ~/gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz.tbi  
+
  ls gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz.tbi  
    
The output from the thunderVcf (final step) of the genotype refinement pipeline is found in:  
 
The output from the thunderVcf (final step) of the genotype refinement pipeline is found in:  
  ls ~/gotcloudTutorialOut/thunder/chr20/GBR/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz ~/gotcloudTutorialOut/thunder/chr20/GBR/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz.tbi  
+
  ls gotcloudTutorialOut/thunder/chr20/GBR/thunder/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz gotcloudTutorialOut/thunder/chr20/GBR/thunder/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz.tbi
 
  −
 
      
== STEP 5 : Run GotCloud Association Analysis Pipeline (EPACTS) ==  
 
== STEP 5 : Run GotCloud Association Analysis Pipeline (EPACTS) ==  
