Difference between revisions of "Tutorial: GotCloud UW CMG"

From Genome Analysis Wiki

Revision as of 10:30, 6 August 2013

GotCloud / EPACTS Tutorial

In this tutorial, we illustrate some of the essential steps in the analysis of next-generation sequence data.

For background on GotCloud and sequence analysis pipelines, see GotCloud.

While GotCloud can run on a cluster of machines or instances, this tutorial is a small test that runs entirely on the machine where you type the commands.

This tutorial is a specialized version for the UW Center for Mendelian Genomics. For the general tutorial, see Tutorial: GotCloud.

STEP 0 : Log in to the cluster

On the UW CMG cluster, log in to the head node and then request an interactive session on one of the high-performance cluster nodes, as follows.

% ssh yourid@uwcmg-head.gs.washington.edu
Password:
% qlogin
Your job 120 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled …
Your interactive job 120 has been successfully scheduled.
Establishing /usr/local/bin/qlogin_command session to host uwcmg001.gs.washington.edu ...
yourid@uwcmg001.gs.washington.edu's password: 

% ls ${GC}/examples

STEP 1 : GotCloud Alignment Pipeline

To run this tutorial, GotCloud must be installed on your system.

Run the following commands in order to set up the environment and inspect the example input files:

% export GC=/home/presenter02/day2/session1/uwcmg_2013_08/
% ls ${GC}/examples/fastq
% ls ${GC}/examples
% cat ${GC}/examples/index/chr7.CFTR.fastq.index
% cat ${GC}/examples/index/chr7.CFTR.align.conf
% mkdir test
% cd test
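
Before continuing, it can help to confirm that GC points at a usable tutorial directory. The following is a minimal sketch, not part of GotCloud: check_gc_layout is a hypothetical helper, and the sub-directory names it looks for are simply the ones used by the commands in this tutorial.

```shell
# Hypothetical helper (not a GotCloud command): report which of the
# sub-directories used in this tutorial are missing under a given root.
check_gc_layout() {
    root=${1%/}          # drop one trailing slash, if present
    missing=0
    for sub in gotcloud examples/fastq examples/index; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub"
            missing=1
        fi
    done
    # Exit status 0 if everything was found, 1 otherwise
    return "$missing"
}
```

% check_gc_layout "${GC}" && echo "layout looks fine"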

Then run the alignment pipeline with the following command:

% ${GC}/gotcloud/gotcloud align --conf ${GC}/examples/index/chr7.CFTR.align.conf --outDir align --baseprefix ${GC}/examples

Examine the output BAM files and the QC metrics:

% ls align/bams

% ls align/QCFiles
% cat align/QCFiles/NA06984.qplot.stats
% cat align/QCFiles/NA06984.genoCheck.selfSM
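
The .selfSM file is wide, so it can be easier to pull out just the contamination estimate. The sketch below assumes the usual verifyBamID layout: a tab-delimited file whose first line is a header containing a column named FREEMIX. print_freemix is a hypothetical helper name; the column is located by name rather than by position in case the layout differs.

```shell
# Hypothetical helper: print "sample<TAB>FREEMIX" for each data row of a
# .selfSM file. Assumes a tab-delimited file with a header on line 1 that
# contains a FREEMIX column.
print_freemix() {
    awk -F'\t' '
        NR == 1 {
            # Find the FREEMIX column by name in the header line
            for (i = 1; i <= NF; i++) if ($i == "FREEMIX") col = i
            if (!col) { print "no FREEMIX column" > "/dev/stderr"; exit 1 }
            next
        }
        { print $1 "\t" $col }
    ' "$1"
}
```

% print_freemix align/QCFiles/NA06984.genoCheck.selfSM

A FREEMIX value close to 0 indicates little evidence of sample contamination.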

STEP 3 : Run GotCloud SNP Calling Pipeline

The next step is to analyze BAM files by calling SNPs and generating a VCF file containing the variant calls.

The variant calling pipeline has multiple built-in steps to generate VCFs:

  1. Filter out reads with low mapping quality
  2. Per Base Alignment Quality Adjustment (BAQ)
  3. Resolve overlapping paired end reads
  4. Generate genotype likelihood files
  5. Perform variant calling
  6. Extract features from variant sites
  7. Perform variant filtering


To speed up variant calling, each chromosome is broken up into smaller regions that are processed separately. The data are initially handled per sample; the per-sample data are then merged and processed together for each region. The regions are later merged into a single Variant Call Format (VCF) file per chromosome.
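
The idea of breaking a region into smaller pieces can be sketched in a few lines of shell. This is an illustration only: split_region is a hypothetical helper, and the chunk size of 200000 bases in the usage line is arbitrary. The chunk boundaries GotCloud actually uses are determined by its own configuration, not by this code.

```shell
# Hypothetical helper: split CHR:START-END into consecutive chunks of at
# most SIZE bases and print one chunk per line.
split_region() {
    region=$1
    size=$2
    chr=${region%%:*}        # text before the first ':'
    range=${region#*:}       # text after the first ':'
    start=${range%-*}
    end=${range#*-}
    while [ "$start" -le "$end" ]; do
        stop=$((start + size - 1))
        # The last chunk is clipped to the end of the region
        if [ "$stop" -gt "$end" ]; then
            stop=$end
        fi
        echo "$chr:$start-$stop"
        start=$((stop + 1))
    done
}
```

% split_region 7:117000000-117500000 200000
7:117000000-117199999
7:117200000-117399999
7:117400000-117500000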

% ls ${GC}/examples/bams
% cat ${GC}/examples/index/chr7.CFTR.low_coverage.index
% cat ${GC}/examples/index/chr7.CFTR.low_coverage.conf
% time ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
% cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary
% time ${GC}/gotcloud/gotcloud beagle --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
% samtools tview ${GC}/examples/bams/NA12843.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam ${GC}/examples/chr7Ref/hs37d5.chr7.fa
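
As a quick sanity check on the filtering step, you can count how many sites carry PASS in the FILTER column of a VCF. The sketch below relies only on the standard VCF layout (tab-delimited, header lines starting with '#', FILTER in column 7); count_pass_sites is a hypothetical helper name, not a GotCloud command.

```shell
# Hypothetical helper: count VCF records whose FILTER column (7) is PASS.
# Reads the named file, or standard input if no file is given.
count_pass_sites() {
    awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$@"
}
```

% zcat ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz | count_pass_sites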

STEP 4 : EPACTS association analysis

% head ${GC}/examples/index/chr7.CFTR.ped 
% mkdir assoc
% time ${GC}/epacts/bin/epacts single --ped ${GC}/examples/index/chr7.CFTR.ped --vcf ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --pheno PHENO --out assoc/single.b.score --test b.score --anno --ref ${GC}/examples/chr7Ref/hs37d5.chr7.fa --region 7:117000000-117500000 --run 1
% time ${GC}/epacts/bin/epacts anno --in ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.vcf.gz --ref ${GC}/examples/chr7Ref/hs37d5.chr7.fa
% zcat snps/chr7.filtered.PASS.beagled.anno.vcf.gz | grep Nonsynonymous | grep CFTR | cut -f 1-8 | head -1
% ${GC}/epacts/bin/epacts make-group --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.grp --nonsyn
% ${GC}/epacts/bin/epacts group --ped ${GC}/examples/index/chr7.CFTR.ped --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out assoc/group.skat.o --groupf snps/chr7.filtered.PASS.beagled.anno.grp --test skat --skat-o --run 2
% cat assoc/group.skat.o.epacts 
% ${GC}/epacts/bin/epacts group --ped ${GC}/examples/index/chr7.CFTR.ped --vcf snps/
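
To look at the strongest signals first, the result table can be sorted by p-value. The sketch below assumes the .epacts file is tab-delimited with a header on the first line that contains a PVALUE column, which is worth confirming with head before relying on it. top_hits is a hypothetical helper name, and rows whose p-value is NA are skipped.

```shell
# Hypothetical helper: print the N rows (default 10) with the smallest
# p-values. Assumes a tab-delimited table with a PVALUE column in the
# header on line 1.
top_hits() {
    file=$1
    n=${2:-10}
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "PVALUE") col = i; next }
        # Prefix each usable row with its p-value so it can be sorted on
        col && $col != "NA" { print $col "\t" $0 }
    ' "$file" | LC_ALL=C sort -g -k1,1 | head -n "$n" | cut -f 2-
}
```

% top_hits assoc/group.skat.o.epacts 5

The -g option of sort is used instead of -n so that p-values in scientific notation, such as 1.2e-05, are ordered correctly.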

STEP 5 : All-in-one running of all scripts (for later use)

% tar xzvf /home/presenter02/day2/session1/uwcmg_2013_08.tar.gz
% cd /home/presenter02/day2/session1/uwcmg_2013_08
% sh go.sh