Line 8: |
Line 8: |
| GotCloud and this basic tutorial were presented in two sessions at the [http://ibg.colorado.edu/dokuwiki/doku.php?id=workshop:2013:announcement 2013 IBG Workshop]. On Wednesday, an overview was presented along with the steps for running the tutorial data: [[Media:IBG2013GotCloud.pdf|IBG2013GotCloud.pdf]]. On Friday, more detail was presented on the input files and how they are generated: [[Media:GotCloudIBGWorkshop2013Friday.pdf|GotCloudIBGWorkshop2013Friday.pdf]]. | | GotCloud and this basic tutorial were presented in two sessions at the [http://ibg.colorado.edu/dokuwiki/doku.php?id=workshop:2013:announcement 2013 IBG Workshop]. On Wednesday, an overview was presented along with the steps for running the tutorial data: [[Media:IBG2013GotCloud.pdf|IBG2013GotCloud.pdf]]. On Friday, more detail was presented on the input files and how they are generated: [[Media:GotCloudIBGWorkshop2013Friday.pdf|GotCloudIBGWorkshop2013Friday.pdf]]. |
| | | |
− | '''This tutorial is in the process of being updated for gotcloud version 1.06 (April 17. 2013).''' | + | '''This tutorial is in the process of being updated for gotcloud version 1.08 (July 30, 2013).''' |
| | | |
| == STEP 1 : Setup GotCloud == | | == STEP 1 : Setup GotCloud == |
Line 15: |
Line 15: |
| | | |
| We will use 3 different directories for this tutorial: | | We will use 3 different directories for this tutorial: |
− | # path to the directory where gotcloud is installed, default ~/gotcloud/ | + | # path to the directory where gotcloud is installed, default ~/gotcloud-latest/ |
| # path to the directory where the example data is installed, default ~/gotcloudExample | | # path to the directory where the example data is installed, default ~/gotcloudExample |
| # path to your output directory, default ~/gotcloudTutorialOut/ | | # path to your output directory, default ~/gotcloudTutorialOut/ |
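As a minimal sketch, the three directories listed above can be kept in shell variables and checked before running any pipeline step. The variable names (GC_DIR, GC_EXAMPLE, GC_OUT) and the check_dirs helper are illustrative assumptions, not part of GotCloud; the defaults are the ones given in the list.

```shell
# Defaults taken from the directory list above; export a variable beforehand
# to override it. The names are hypothetical, GotCloud does not read them.
GC_DIR="${GC_DIR:-$HOME/gotcloud-latest}"
GC_EXAMPLE="${GC_EXAMPLE:-$HOME/gotcloudExample}"
GC_OUT="${GC_OUT:-$HOME/gotcloudTutorialOut}"

# Report every argument that is not an existing directory.
# Returns 0 if all exist, 1 if at least one is missing.
check_dirs() {
    missing=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, after installing GotCloud and the example data, <code>check_dirs "$GC_DIR" "$GC_EXAMPLE"</code> should return without printing anything.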
Line 28: |
Line 28: |
| | | |
| Otherwise, you can install it in your own directory: | | Otherwise, you can install it in your own directory: |
− | # Change to the directory where you want gotcloud/ installed | + | # Change to the directory where you want gotcloud-latest/ installed |
− | # Download the gotcloud tar from the ftp site. | + | # Download the gotcloud tar. |
| # Extract the tar | | # Extract the tar |
| # Build (compile) the source | | # Build (compile) the source |
Line 35: |
Line 35: |
| | | |
| cd ~ | | cd ~ |
− | wget ftp://share.sph.umich.edu/gotcloud/gotcloud_latest.tgz # Download | + | wget https://github.com/statgen/gotcloud/archive/latest.tar.gz # Download |
− | tar xf gotcloud_latest.tgz # Extracts into gotcloud/ | + | tar xf latest.tar.gz # Extracts into gotcloud-latest/ |
− | cd ~/gotcloud/src; make # Build source | + | cd gotcloud-latest/src; make # Build source |
− | | |
− | GotCloud requires the following tools to be installed. | |
− | You can run ~/gotcloud/scripts/check_requirements.sh | |
− | ...TBD – put in required programs/tools. | |
− | * java (java-common default-jre on ubuntu) | |
− | * make (make on ubuntu) | |
− | * libssl (libssl0.9.8 on ubuntu) | |
− | * gcc 4.4 or newer | |
| | | |
| === Step 1b: Install Example Dataset === | | === Step 1b: Install Example Dataset === |
− | Our dataset consists of 60 individuals from Great Britain (GBR) sequenced by the 1000 Genomes Project. These individuals have been sequenced to an average depth of about 4x. | + | Our dataset consists of individuals from Great Britain (GBR) sequenced by the 1000 Genomes Project. These individuals have been sequenced to an average depth of about 4x. |
| | | |
| To conserve time and disk-space, our analysis will focus on a small region on chromosome 20, 42900000 - 43200000. | | To conserve time and disk-space, our analysis will focus on a small region on chromosome 20, 42900000 - 43200000. |
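To get a feel for how small the target region is, the sketch below computes its length from a region string in the chrom:start-end form used with --region later in this tutorial. The region_length helper is hypothetical, and it assumes both endpoints are included in the region.

```shell
# Print the number of bases in a "chrom:start-end" region string,
# assuming an inclusive start and end.
region_length() {
    range="${1#*:}"      # drop the chromosome prefix, e.g. "42900000-43200000"
    start="${range%-*}"  # text before the dash
    end="${range#*-}"    # text after the dash
    echo $((end - start + 1))
}

region_length "20:42900000-43200000"   # prints 300001
```

Under that assumption the tutorial region covers roughly 300 kb of chromosome 20.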
| | | |
− | The tutorial will run the alignment pipeline on 2 of the individuals (HG00096, HG00100). The fastqs used for this step are reduced to reads that fall into our target region. | + | The alignment pipeline in this tutorial will be run on 2 of the individuals (HG00096, HG00100). The fastqs used for this step are reduced to reads that fall into our target region. |
| | | |
− | The tutorial will then used previously aligned/mapped reads for the full 60 individuals to generate a list of polymorphic sites and estimate accurate genotypes at each of these sites. | + | The snpcall and ldrefine pipelines will use previously aligned/mapped reads for 60 individuals to generate a list of polymorphic sites and estimate accurate genotypes at each of these sites. |
| | | |
| The example dataset we'll be using is available at: ftp://share.sph.umich.edu/gotcloud/gotcloudExample.tgz | | The example dataset we'll be using is available at: ftp://share.sph.umich.edu/gotcloud/gotcloudExample.tgz |
Line 64: |
Line 56: |
| cd ~ | | cd ~ |
| wget ftp://share.sph.umich.edu/gotcloud/gotcloudExample_latest.tgz # Download | | wget ftp://share.sph.umich.edu/gotcloud/gotcloudExample_latest.tgz # Download |
− | tar xvf gotcloudExample_latest.tgz # Extracts into gotcloudExample/ | + | tar xf gotcloudExample_latest.tgz # Extracts into gotcloudExample/ |
| | | |
| == STEP 2 : Run GotCloud Alignment Pipeline == | | == STEP 2 : Run GotCloud Alignment Pipeline == |
Line 83: |
Line 75: |
| | | |
| Run the alignment pipeline (the example aligns 2 samples) : | | Run the alignment pipeline (the example aligns 2 samples) : |
− | ~/gotcloud/gotcloud align --conf ~/gotcloudExample/[[#Alignment Configuration File|GBR2align.conf]] --outdir [[#Alignment Output Directory|~/gotcloudTutorialOut]] --baseprefix ~/gotcloudExample | + | cd ~ |
| + | gotcloud-latest/gotcloud align --conf gotcloudExample/[[#Alignment Configuration File|GBR2align.conf]] --outdir [[#Alignment Output Directory|gotcloudTutorialOut]] --baseprefix gotcloudExample |
| | | |
| Upon successful completion of the alignment pipeline (about 1-3 minutes), you will see the following message: | | Upon successful completion of the alignment pipeline (about 1-3 minutes), you will see the following message: |
− | Processing finished in nn secs with no errors reported | + | Processing finished in n secs with no errors reported |
| | | |
| The final BAM files produced by the alignment pipeline are: | | The final BAM files produced by the alignment pipeline are: |
− | ls ~/gotcloudTutorialOut/bams | + | ls gotcloudTutorialOut/bams |
| In this directory you will see: | | In this directory you will see: |
| * BAM (.bam) files - 1 per sample | | * BAM (.bam) files - 1 per sample |
Line 97: |
Line 90: |
| ** HG00096.recal.bam.bai | | ** HG00096.recal.bam.bai |
| ** HG00100.recal.bam.bai | | ** HG00100.recal.bam.bai |
− | * BAM checksum files (.md5) – 1 per sample | |
− | ** HG00096.recal.bam.md5 | |
− | ** HG00100.recal.bam.md5 | |
| * Indicator files that the step completed successfully: | | * Indicator files that the step completed successfully: |
| ** HG00096.recal.bam.done | | ** HG00096.recal.bam.done |
Line 105: |
Line 95: |
| | | |
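The file listing above can be checked with a short loop. This is only a sketch: check_bams is a hypothetical helper, and the suffixes it looks for (.recal.bam, .recal.bam.bai, .recal.bam.done) are inferred from the file names shown in the listing.

```shell
# Report expected alignment outputs that are missing for the given samples.
# Usage: check_bams BAM_DIR SAMPLE...
# Returns 0 if every file exists, 1 otherwise.
check_bams() {
    bam_dir="$1"
    shift
    status=0
    for sample in "$@"; do
        for suffix in recal.bam recal.bam.bai recal.bam.done; do
            if [ ! -e "$bam_dir/$sample.$suffix" ]; then
                echo "missing: $bam_dir/$sample.$suffix" >&2
                status=1
            fi
        done
    done
    return "$status"
}
```

For the two tutorial samples this would be called from the home directory as <code>check_bams gotcloudTutorialOut/bams HG00096 HG00100</code>.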
| The Quality Control (QC) files are: | | The Quality Control (QC) files are: |
− | ls ~/gotcloudTutorialOut/QCFiles | + | ls gotcloudTutorialOut/QCFiles |
| In this directory you will see: | | In this directory you will see: |
| * VerifyBamID output files: | | * VerifyBamID output files: |
Line 151: |
Line 141: |
| | | |
| Run the variant calling pipeline: | | Run the variant calling pipeline: |
− | ~/gotcloud/gotcloud snpcall --conf ~/gotcloudExample/[[GBR60vc.conf]] --outdir ~/gotcloudTutorialOut --numjobs 2 --region 20:42900000-43200000 --baseprefix ~/gotcloudExample | + | cd ~ |
| + | gotcloud-latest/gotcloud snpcall --conf gotcloudExample/[[GBR60vc.conf]] --outdir gotcloudTutorialOut --numjobs 2 --region 20:42900000-43200000 --baseprefix gotcloudExample |
| | | |
− | Upon successful completion of the variant calling pipeline (about 3-4 minutes), you will see the following message: | + | Upon successful completion of the variant calling pipeline (about 2-4 minutes), you will see the following message: |
| Commands finished in nnn secs with no errors reported | | Commands finished in nnn secs with no errors reported |
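If the pipeline output is saved to a file, the success message can be checked from a script. The sketch below assumes the output was captured by the user (for example by appending <code>2>&1 | tee snpcall.log</code> to the command); the log file name and the finished_cleanly helper are hypothetical, and the pattern is based only on the two success messages quoted in this tutorial.

```shell
# Return success if a captured log contains a GotCloud success message such as
# "Commands finished in nnn secs with no errors reported".
# Usage: finished_cleanly LOGFILE
finished_cleanly() {
    grep -q "finished in .* secs with no errors reported" "$1"
}
```

A script could then run <code>finished_cleanly snpcall.log || echo "snpcall did not report success"</code> before moving on to the next step.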
| | | |
| On SNP Call success, the VCF files of interest are: | | On SNP Call success, the VCF files of interest are: |
− | ls ~/gotcloudTutorialOut/vcfs/chr20/chr20.filtered* | + | ls gotcloudTutorialOut/vcfs/chr20/chr20.filtered* |
| | | |
| This gives you the following files: | | This gives you the following files: |
Line 174: |
Line 165: |
| ** chr20.merged.vcf.OK - indicator that the step completed successfully | | ** chr20.merged.vcf.OK - indicator that the step completed successfully |
| * the hardfiltered (pre-svm filtered) variant calls: | | * the hardfiltered (pre-svm filtered) variant calls: |
− | ** chr20.filtered.vcf.gz - vcf for whole chromosome after it has been run through hard filters | + | ** chr20.hardfiltered.vcf.gz - vcf for whole chromosome after it has been run through hard filters |
| ** chr20.hardfiltered.sites.vcf - vcf for whole chromosome after it has been run through filters and marked with PASS/FAIL without the per sample genotypes | | ** chr20.hardfiltered.sites.vcf - vcf for whole chromosome after it has been run through filters and marked with PASS/FAIL without the per sample genotypes |
| ** chr20.hardfiltered.sites.vcf.log - log file | | ** chr20.hardfiltered.sites.vcf.log - log file |
Line 195: |
Line 186: |
| | | |
| Note: the tutorial does not produce a target directory, but if you run with targeted data, you may see that. | | Note: the tutorial does not produce a target directory, but if you run with targeted data, you may see that. |
− | | |
| | | |
| == STEP 4 : Run GotCloud Genotype Refinement Pipeline == | | == STEP 4 : Run GotCloud Genotype Refinement Pipeline == |
Line 201: |
Line 191: |
| | | |
| Run the LD-aware genotype refinement pipeline: | | Run the LD-aware genotype refinement pipeline: |
− | ~/gotcloud/gotcloud ldrefine --conf ~/gotcloudExample/[[GBR60vc.conf]] --outdir ~/gotcloudTutorialOut --numjobs 2 --baseprefix ~/gotcloudExample | + | cd ~ |
| + | gotcloud-latest/gotcloud ldrefine --conf gotcloudExample/[[GBR60vc.conf]] --outdir gotcloudTutorialOut --numjobs 2 --baseprefix gotcloudExample |
| | | |
− | Upon successful completion of this pipeline (about 10 minutes), you will see the following message: | + | Upon successful completion of this pipeline (about 3-10 minutes), you will see the following message: |
| Commands finished in nnn secs with no errors reported | | Commands finished in nnn secs with no errors reported |
| | | |
| The output from the beagle step of the genotype refinement pipeline is found in: | | The output from the beagle step of the genotype refinement pipeline is found in: |
− | ls ~/gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz ~/gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz.tbi | + | ls gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz gotcloudTutorialOut/beagle/chr20/chr20.filtered.PASS.beagled.vcf.gz.tbi |
| | | |
| The output from the thunderVcf (final step) of the genotype refinement pipeline is found in: | | The output from the thunderVcf (final step) of the genotype refinement pipeline is found in: |
− | ls ~/gotcloudTutorialOut/thunder/chr20/GBR/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz ~/gotcloudTutorialOut/thunder/chr20/GBR/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz.tbi | + | ls gotcloudTutorialOut/thunder/chr20/GBR/thunder/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz gotcloudTutorialOut/thunder/chr20/GBR/chr20.filtered.PASS.beagled.GBR.thunder.vcf.gz.tbi |
− | | |
− | | |
| | | |
| == STEP 5 : Run GotCloud Association Analysis Pipeline (EPACTS) == | | == STEP 5 : Run GotCloud Association Analysis Pipeline (EPACTS) == |