Changes

341 bytes added, 12:35, 10 January 2014

no edit summary

Line 8: Line 8:

GotCloud and this basic tutorial were presented at the [http://ibg.colorado.edu/dokuwiki/doku.php?id=workshop:2013:announcement 2013 IBG Workshop] in two sessions. On Wednesday, an overview was presented with steps for running the tutorial data: [[Media:IBG2013GotCloud.pdf|IBG2013GotCloud.pdf]]. On Friday, the input files and how they are generated were covered in more detail: [[Media:GotCloudIBGWorkshop2013Friday.pdf|GotCloudIBGWorkshop2013Friday.pdf]].

−

'''This tutorial is in the process of being updated for gotcloud version 1.06 (~~April 17.~~ 2013).'''

+

'''This tutorial is in the process of being updated for gotcloud version 1.08 (July 30, 2013).'''

== STEP 1 : Setup GotCloud ==

Line 15: Line 15:

We will use 3 different directories for this tutorial:

−

# path to the directory where gotcloud is installed, default ~/gotcloud/

+

# path to the directory where gotcloud is installed, default ~/gotcloud-latest/

# path to the directory where the example data is installed, default ~/gotcloudExample

# path to your output directory, default ~/gotcloudTutorialOut/
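As a convenience, the three directories can be kept in shell variables so that later commands do not depend on the current working directory. This is only a sketch: the variable names are illustrative and are not GotCloud settings, and the paths are the defaults listed above.

```shell
# Illustrative variable names (not GotCloud settings); paths are the tutorial defaults.
TUT_GOTCLOUD="$HOME/gotcloud-latest"     # where gotcloud is installed
TUT_EXAMPLE="$HOME/gotcloudExample"      # where the example data is installed
TUT_OUT="$HOME/gotcloudTutorialOut"      # output directory
printf '%s\n' "$TUT_GOTCLOUD" "$TUT_EXAMPLE" "$TUT_OUT"
```

If you installed to other locations, change the three assignments accordingly.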

Line 28: Line 28:

Otherwise, you can install it in your own directory:

−

# Change to the directory where you want gotcloud/ installed

+

# Change to the directory where you want gotcloud-latest/ installed

−

# Download the gotcloud tar ~~from the ftp site~~.

+

# Download the gotcloud tar.

# Extract the tar

# Build (compile) the source

Line 35: Line 35:

cd ~

−

wget ~~ftp~~://~~share~~.~~sph.umich.edu~~/gotcloud/~~gotcloud_latest~~.~~tgz~~ # Download

+

wget https://github.com/statgen/gotcloud/archive/latest.tar.gz # Download

−

tar xf ~~gotcloud_latest~~.~~tgz~~ # Extracts into gotcloud/

+

tar xf latest.tar.gz # Extracts into gotcloud-latest/

−

cd ~/gotcloud/src; make # Build source

+

cd gotcloud-latest/src; make # Build source

−

~~GotCloud requires the following tools to be installed.~~

−

~~You can run ~/gotcloud/scripts/check_requirements.sh~~

−

~~...TBD – put in required programs/tools.~~

−

* java (java-common default-jre on ubuntu)

−

* make (make on ubuntu)

−

* libssl (libssl0.9.8 on ubuntu)

−

* gcc 4.4 or newer

=== Step 1b: Install Example Dataset ===

−

Our dataset consists of 60 individuals from Great Britain (GBR) sequenced by the 1000 Genomes Project. These individuals have been sequenced to an average depth of about 4x.

+

Our dataset consists of individuals from Great Britain (GBR) sequenced by the 1000 Genomes Project. These individuals have been sequenced to an average depth of about 4x.

To conserve time and disk-space, our analysis will focus on a small region on chromosome 20, 42900000 - 43200000.
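To get a feel for how small the target region is, its span can be computed from a region string written as CHR:START-END, the form passed to --region later in this tutorial. The helper name region_span is hypothetical and not part of GotCloud.

```shell
# Print the span in base pairs (END minus START) of a region written as CHR:START-END.
# region_span is a hypothetical helper, not a GotCloud command.
region_span() {
    region="$1"
    coords="${region#*:}"    # drop the "CHR:" prefix, leaving START-END
    start="${coords%-*}"     # text before the dash
    end="${coords#*-}"       # text after the dash
    echo $(( end - start ))
}
```

For the tutorial region, `region_span 20:42900000-43200000` reports a span of 300000 base pairs.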

−

The tutorial will run ~~the alignment pipeline~~ on 2 of the individuals (HG00096, HG00100). The fastqs used for this step are reduced to reads that fall into our target region.

+

The alignment pipeline in this tutorial will be run on 2 of the individuals (HG00096, HG00100). The fastqs used for this step are reduced to reads that fall into our target region.

−

The ~~tutorial~~ will ~~then used~~ previously aligned/mapped reads for ~~the full~~ 60 individuals to generate a list of polymorphic sites and estimate accurate genotypes at each of these sites.

+

The snpcall and ldrefine pipelines will use previously aligned/mapped reads for 60 individuals to generate a list of polymorphic sites and estimate accurate genotypes at each of these sites.

The example dataset we'll be using is available at: ftp://share.sph.umich.edu/gotcloud/gotcloudExample.tgz

Line 64: Line 56:

cd ~

wget ftp://share.sph.umich.edu/gotcloud/gotcloudExample_latest.tgz # Download

−

tar ~~xvf~~ gotcloudExample_latest.tgz # Extracts into gotcloudExample/

+

tar xf gotcloudExample_latest.tgz # Extracts into gotcloudExample/

== STEP 2 : Run GotCloud Alignment Pipeline ==

Line 83: Line 75:

Run the alignment pipeline (the example aligns 2 samples) :

−

~/gotcloud/gotcloud align --conf ~/gotcloudExample/[[#Alignment Configuration File|GBR2align.conf]] --outdir [[#Alignment Output Directory|~/gotcloudTutorialOut]]

+

cd ~

+

gotcloud-latest/gotcloud align --conf gotcloudExample/[[#Alignment Configuration File|GBR2align.conf]] --outdir [[#Alignment Output Directory|gotcloudTutorialOut]] --baseprefix gotcloudExample

−

Upon successful completion of the alignment pipeline (about 1-2 minutes), you will see the following message:

+

Upon successful completion of the alignment pipeline (about 1-3 minutes), you will see the following message:

−

Processing finished in nn secs with no errors reported

+

Processing finished in n secs with no errors reported
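If the pipeline's terminal output was saved to a file, the success message can be checked from a script instead of by eye. The sketch below assumes you captured that output yourself (for instance to a hypothetical align.log); finished_ok is an illustrative name, not a GotCloud command.

```shell
# Succeed (exit status 0) if a captured log contains the "no errors reported"
# line printed on completion. Capturing the output to a log file is an
# assumption of this sketch.
finished_ok() {
    grep -q 'finished in .* secs with no errors reported' "$1"
}
```

A typical use would be `finished_ok align.log && echo "alignment OK"`.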

The final BAM files produced by the alignment pipeline are:

−

ls ~/gotcloudTutorialOut/bams

+

ls gotcloudTutorialOut/bams

In this directory you will see:

* BAM (.bam) files - 1 per sample

Line 97: Line 90:

** HG00096.recal.bam.bai

** HG00100.recal.bam.bai

−

* ~~BAM checksum~~ files ~~(.md5) – 1 per sample~~

+

* Indicator files that the step completed successfully:

−

** HG00096.recal.bam.md5

−

** HG00100.recal.bam.md5

−

* Indicator flies that the step completed successfully:

** HG00096.recal.bam.done

** HG00100.recal.bam.done
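The listing above can be turned into a quick check that every expected file exists for each sample. This is a minimal sketch: missing_bam_outputs is a hypothetical helper, and the .recal.bam name is inferred from the index file names shown above.

```shell
# Print any expected alignment output that is missing for the given samples.
# Prints nothing when every file is present.
# missing_bam_outputs is a hypothetical helper, not a GotCloud command.
missing_bam_outputs() {
    dir="$1"; shift
    for sample in "$@"; do
        for ext in recal.bam recal.bam.bai recal.bam.done; do
            [ -e "$dir/$sample.$ext" ] || echo "$dir/$sample.$ext"
        done
    done
}
```

For the tutorial, this would be called as `missing_bam_outputs gotcloudTutorialOut/bams HG00096 HG00100`.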

The Quality Control (QC) files are:

−

ls ~/gotcloudTutorialOut/QCFiles

+

ls gotcloudTutorialOut/QCFiles

In this directory you will see:

* VerifyBamID output files:

Line 133: Line 123:

For information on the VerifyBamID output, see: [[Understanding VerifyBamID output]]

−

For information on the QPLOT output, see: [[Understanding QPLOT output]]

+

For information on the QPLOT output, see: [[Understanding QPLOT output]]

== STEP 3 : Run GotCloud Variant Calling Pipeline ==

Line 151: Line 141:

Run the variant calling pipeline:

−

~/gotcloud/gotcloud snpcall --conf [[GBR60vc.conf]] --outdir ~/gotcloudTutorialOut --numjobs 2 --region 20:42900000-43200000

+

cd ~

+

gotcloud-latest/gotcloud snpcall --conf gotcloudExample/[[GBR60vc.conf]] --outdir gotcloudTutorialOut --numjobs 2 --region 20:42900000-43200000 --baseprefix gotcloudExample

−

Upon successful completion of the variant calling pipeline (about 3-4 minutes), you will see the following message:

+

Upon successful completion of the variant calling pipeline (about 2-4 minutes), you will see the following message:

Commands finished in nnn secs with no errors reported

On SNP Call success, the VCF files of interest are:

−

ls ~/gotcloudTutorialOut/vcfs/chr20/chr20.filtered*

+

ls gotcloudTutorialOut/vcfs/chr20/chr20.filtered*

This gives you the following files:

−

* '''chr20.filtered.vcf.gz ''' - vcf for whole chromosome after it has been run through filters and marked with PASS/FAIL including per sample genotypes

+

* '''chr20.filtered.vcf.gz''' - vcf for whole chromosome after it has been run through hard filters and SVM filters and marked with PASS/FAIL, including per sample genotypes

* chr20.filtered.sites.vcf - vcf for whole chromosome after it has been run through filters and marked with PASS/FAIL without the per sample genotypes

−

* chr20.filtered.sites.vcf.log - log file

+

* chr20.filtered.sites.vcf.norm.log - log file

* chr20.filtered.sites.vcf.summary - summary of filters applied

* chr20.filtered.vcf.gz.OK - indicator that the filtering completed successfully

Line 173: Line 164:

** chr20.merged.vcf - including per sample genotypes

** chr20.merged.vcf.OK - indicator that the step completed successfully

+

* the hardfiltered (pre-svm filtered) variant calls:

+

** chr20.hardfiltered.vcf.gz - vcf for whole chromosome after it has been run through hard filters

+

** chr20.hardfiltered.sites.vcf - vcf for whole chromosome after it has been run through filters and marked with PASS/FAIL without the per sample genotypes

+

** chr20.hardfiltered.sites.vcf.log - log file

+

** chr20.hardfiltered.sites.vcf.summary - summary of filters applied

+

** chr20.hardfiltered.vcf.gz.OK - indicator that the filtering completed successfully

+

** chr20.hardfiltered.vcf.gz.tbi - index file for the vcf file

* 40000001.45000000 subdirectory contains the data for just that region.

Line 189: Line 187:

Note: the tutorial does not produce a target directory, but if you run with targeted data, you may see that.
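To see how many sites passed filtering, the FILTER column of an uncompressed sites VCF can be tallied. In the VCF format, FILTER is the seventh tab-separated column, and any value other than PASS names the filters a site failed. The helper name filter_counts is hypothetical.

```shell
# Count sites by FILTER value (column 7 of a VCF), skipping header lines
# that start with '#'. Expects an uncompressed VCF.
# filter_counts is a hypothetical helper, not a GotCloud command.
filter_counts() {
    awk -F '\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}
```

For example, `filter_counts gotcloudTutorialOut/vcfs/chr20/chr20.filtered.sites.vcf` prints one line per FILTER value with its site count.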

−

== STEP 4 ~~: Run Support Vector Machine (SVM) Pipeline ==~~

+

== STEP 4 : Run GotCloud Genotype Refinement Pipeline ==

−

~~== STEP 5~~ : Run GotCloud Genotype Refinement Pipeline ==

The next step is to refine the genotypes with [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] and [[ThunderVCF]], which use linkage disequilibrium information.

Run the LD-aware genotype refinement pipeline:

−

~/gotcloud/gotcloud ldrefine --conf [[GBR60vc.conf]] --outdir ~/gotcloudTutorialOut --numjobs 2

+

cd ~

+

gotcloud-latest/gotcloud ldrefine --conf gotcloudExample/[[GBR60vc.conf]] --outdir gotcloudTutorialOut --numjobs 2 --baseprefix gotcloudExample

−

Upon successful completion of this pipeline, you will see the following message:

+

Upon successful completion of this pipeline (about 3-10 minutes), you will see the following message:

Commands finished in nnn secs with no errors reported

The output from the beagle step of the genotype refinement pipeline is found in:

−

ls ~/gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz ~/gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz.tbi

+

ls gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz.tbi

The output from the thunderVcf (final step) of the genotype refinement pipeline is found in:

−

ls ~/gotcloudTutorialOut/thunder/chr20/GBR/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz ~/gotcloudTutorialOut/thunder/chr20/GBR/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz.tbi

+

ls gotcloudTutorialOut/thunder/chr20/GBR/thunder/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz gotcloudTutorialOut/thunder/chr20/GBR/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz.tbi
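Each refined VCF is listed together with its .tbi index, so a script can confirm that both are present before moving on. This is a sketch; has_vcf_and_index is a hypothetical helper name.

```shell
# Succeed only if a compressed VCF and its .tbi index both exist.
# has_vcf_and_index is a hypothetical helper, not a GotCloud command.
has_vcf_and_index() {
    [ -f "$1" ] && [ -f "$1.tbi" ]
}
```

For the beagle step, this would be called as `has_vcf_and_index gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz`.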

−

== STEP 6 : Run GotCloud Association Analysis Pipeline (EPACTS) ==

+

== STEP 5 : Run GotCloud Association Analysis Pipeline (EPACTS) ==

We will assume that EPACTS is installed in the following directory

Line 230: Line 225:

= Frequently Asked Questions (FAQs) =

−

'''I ran the ~~tutorai~~ example successfully, how can I run it with my real sequence data?'''

+

'''I ran the tutorial example successfully, how can I run it with my real sequence data?'''

−

Congratulations for your successful run of your [[GotCloud]] Tutorial. Please see [[#Tutorial Inputs]] section to prepare your own input files for your sequence data. You will need to specify the FASTQ files associated with its sample names as explained. In addition, you will need to download the full reference and resource file across whole genome (the Tutorial contains only chr20 portion to make it compact) See [[#Alignment Configuration File]] section for the detailed information. Also, please refer to the original documentation of [[GotCloud]] for more detailed guide on installation beyond the scope of tutorial

+

Congratulations on successfully running the [[GotCloud]] Tutorial. Please see the [[#Tutorial Inputs]] section to prepare input files for your own sequence data. You will need to specify the FASTQ files associated with each sample name, as explained there. In addition, you will need to download the full reference and resource files for the whole genome (the Tutorial contains only the chr20 portion to keep it compact). See the [[#Alignment Configuration File]] section for detailed information. Also, please refer to the original [[GotCloud]] documentation for a more detailed installation guide beyond the scope of the tutorial.

= Input Files for GotCloud Tutorial =

Mktrost

Administrators

3,045

edits


Tutorial: GotCloud

Revision as of 12:35, 10 January 2014
