Difference between revisions of "Tutorial: EMMAX GotCloud STOM: Lecture 6"
From Genome Analysis Wiki
Jump to navigationJump to search (Created page with "STOM 2014 Workshop - Practical Sessions 6 == Lecture 6 == The slides describing the notes below are available here (PDF) === Basic Setup ==...") |
|||
(2 intermediate revisions by the same user not shown) | |||
Line 28: | Line 28: | ||
* Configuration File - See the example configuration file below. | * Configuration File - See the example configuration file below. | ||
− | + | cat $S5/examples/index/chr7.CFTR.low_coverage.conf | |
CHRS = 7 | CHRS = 7 | ||
Line 50: | Line 50: | ||
(WARNING: This step will take a long time, up to 5 minutes, because it processes nearly 100 BAMs, 4 jobs simultaneously at a time) | (WARNING: This step will take a long time, up to 5 minutes, because it processes nearly 100 BAMs, 4 jobs simultaneously at a time) | ||
− | time $S5/gotcloud/gotcloud snpcall --conf | + | time $S5/gotcloud/gotcloud snpcall --conf $S5/examples/index/chr7.CFTR.low_coverage.conf --outDir ~/out/snps --baseprefix $S5/examples --region 7:117000000-117500000 --numjobs 4 |
* Check the summary statistics of SNP call to evaluate its quality | * Check the summary statistics of SNP call to evaluate its quality | ||
Line 60: | Line 60: | ||
(WARNING: This step will take a long time, up to 2 minutes) | (WARNING: This step will take a long time, up to 2 minutes) | ||
− | time $S5/gotcloud/gotcloud beagle --conf | + | time $S5/gotcloud/gotcloud beagle --conf $S5/examples/index/chr7.CFTR.low_coverage.conf --outDir ~/out/snps --baseprefix $S5/examples --region 7:117000000-117500000 --numjobs 2 |
=== Understanding the Output Files === | === Understanding the Output Files === |
Latest revision as of 09:58, 6 January 2014
STOM 2014 Workshop - Practical Sessions 6
Lecture 6
The slides describing the notes below are available here (PDF)
Basic Setup
If you still have the logged in console from lecture 5 practical session, you may skip this part
- To see the files for the session 5(,6, and 8), type
ls /data/stom2014/session5/
If you see any errors, please let me know now!
- For convenience, let’s set some variables
export S5=/data/stom2014/session5
- And check the input files
ls $S5/examples/bams
Preparing Input Files
- Index file - See the example index file already prepared for this project
less $S5/examples/index/chr7.CFTR.low_coverage.index
- Configuration File - See the example configuration file below.
cat $S5/examples/index/chr7.CFTR.low_coverage.conf
CHRS = 7 BAM_INDEX = index/chr7.CFTR.low_coverage.index ############ # References REF_ROOT = chr7Ref # REF = $(REF_ROOT)/hs37d5.chr7.fa INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19 DBSNP_VCF = $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz HM3_VCF = $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz
Default options should be mostly fine in many other cases. In this example, because it is not genome-wide calling, reference files are modified to be chr7-specific
Run GotCloud SNP calling Pipeline
- Using the prepared input files, call SNPs using the gotcloud SNP calling pipeline
(WARNING: This step will take a long time, up to 5 minutes, because it processes nearly 100 BAMs, 4 jobs simultaneously at a time)
time $S5/gotcloud/gotcloud snpcall --conf $S5/examples/index/chr7.CFTR.low_coverage.conf --outDir ~/out/snps --baseprefix $S5/examples --region 7:117000000-117500000 --numjobs 4
- Check the summary statistics of SNP call to evaluate its quality
cat ~/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary
Running LD-aware genotype refinement
(WARNING: This step will take a long time, up to 2 minutes)
time $S5/gotcloud/gotcloud beagle --conf $S5/examples/index/chr7.CFTR.low_coverage.conf --outDir ~/out/snps --baseprefix $S5/examples --region 7:117000000-117500000 --numjobs 2
Understanding the Output Files
- To see the content of VCF file, try
zless ~/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz
- If you want to look at a variant at a particular genomic position or in a particular genomic region, use the tabix utility to facilitate random access without reading the whole VCF file.
$S5/epacts/bin/tabix ~/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz 7:117149147-117149147
The example above displays known CFTR risk variant, R75Q
Using samtools to visualize the sequence reads and verify the variant calls
samtools tview $S5/examples/bams/NA12843.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam $S5/examples/chr7Ref/hs37d5.chr7.fa
After running this.. try to
- Type 'g'
- Type '7:117149147'
- Type 'n' to change the display (color by nucleotide)
- Move with left arrow to see the variant site clearly