Tutorial: EMMAX GotCloud STOM: Lecture 6

From Genome Analysis Wiki
Jump to navigationJump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

STOM 2014 Workshop - Practical Sessions 6

Lecture 6

The slides describing the notes below are available here (PDF)

Basic Setup

If you still have the logged in console from lecture 5 practical session, you may skip this part

  • To see the files for the session 5(,6, and 8), type
ls /data/stom2014/session5/

If you see any errors, please let me know now!

  • For convenience, let’s set some variables
export S5=/data/stom2014/session5
  • And check the input files
ls $S5/examples/bams

Preparing Input Files

  • Index file - See the example index file already prepared for this project
less $S5/examples/index/chr7.CFTR.low_coverage.index
  • Configuration File - See the example configuration file below.
cat $S5/examples/index/chr7.CFTR.low_coverage.conf
CHRS = 7
BAM_INDEX = index/chr7.CFTR.low_coverage.index 
############
# References
REF_ROOT = chr7Ref
#
REF = $(REF_ROOT)/hs37d5.chr7.fa
INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19
DBSNP_VCF =  $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz
HM3_VCF =  $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz

Default options should be mostly fine in many other cases. In this example, because it is not genome-wide calling, reference files are modified to be chr7-specific

Run GotCloud SNP calling Pipeline

  • Using the prepared input files, call SNPs using the gotcloud SNP calling pipeline

(WARNING: This step will take a long time, up to 5 minutes, because it processes nearly 100 BAMs, 4 jobs simultaneously at a time)

time $S5/gotcloud/gotcloud snpcall --conf $S5/examples/index/chr7.CFTR.low_coverage.conf --outDir ~/out/snps --baseprefix $S5/examples --region 7:117000000-117500000 --numjobs 4
  • Check the summary statistics of SNP call to evaluate its quality
cat ~/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary

Running LD-aware genotype refinement

(WARNING: This step will take a long time, up to 2 minutes)

time $S5/gotcloud/gotcloud beagle --conf $S5/examples/index/chr7.CFTR.low_coverage.conf --outDir ~/out/snps --baseprefix $S5/examples  --region 7:117000000-117500000 --numjobs 2

Understanding the Output Files

  • To see the content of VCF file, try
zless ~/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz
  • If you want to look at a variant at a particular genomic position or in a particular genomic region, use the tabix utility to facilitate random access without reading the whole VCF file.
$S5/epacts/bin/tabix ~/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz 7:117149147-117149147
The example above displays known CFTR risk variant, R75Q

Using samtools to visualize the sequence reads and verify the variant calls

samtools tview $S5/examples/bams/NA12843.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam $S5/examples/chr7Ref/hs37d5.chr7.fa

After running this.. try to

  1. Type 'g'
  2. Type '7:117149147'
  3. Type 'n' to change the display (color by nucleotide)
  4. Move with left arrow to see the variant site clearly