Difference between revisions of "Tutorial: GotCloud UW CMG"
Line 210: | Line 210: | ||
% ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2 | % ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2 | ||
+ | |||
+ | CHRS = 7 | ||
+ | BAM_INDEX = index/chr7.CFTR.low_coverage.index | ||
+ | ############ | ||
+ | # References | ||
+ | REF_ROOT = chr7Ref | ||
+ | # | ||
+ | REF = $(REF_ROOT)/hs37d5.chr7.fa | ||
+ | INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19 | ||
+ | DBSNP_VCF = $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz | ||
+ | HM3_VCF = $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz | ||
+ | OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz | ||
+ | [presenter02@uwcmg002 test]$ time ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2 | ||
+ | Key configurations: | ||
+ | GOTCLOUD_ROOT: /home/presenter02/day2/session1/uwcmg_2013_08/gotcloud | ||
+ | OUT_DIR: snps | ||
+ | BAM_INDEX: index/chr7.CFTR.low_coverage.index | ||
+ | REF: chr7Ref/hs37d5.chr7.fa | ||
+ | CHRS: 7 | ||
+ | BATCH_TYPE: local | ||
+ | BATCH_OPTS: | ||
+ | |||
+ | Processing the following steps... | ||
+ | 2: RUN_PILEUP | ||
+ | 3: RUN_GLFMULTIPLES | ||
+ | 4: RUN_VCFPILEUP | ||
+ | 5: RUN_FILTER | ||
+ | 6: RUN_SVM | ||
+ | 8: RUN_SPLIT | ||
+ | Call region is 7:117000000-117500000 | ||
+ | Generating commands for chr7... | ||
+ | Creating glf INDEX at 7:115000001-120000000.. | ||
+ | -------------------------------------------------------------------- | ||
+ | Finished creating makefile /home/presenter02/test/snps/umake.Makefile | ||
+ | |||
+ | Running /home/presenter02/test/snps/umake.Makefile | ||
+ | |||
+ | Start merging VCFs... | ||
+ | Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.vcf into STDOUT... | ||
+ | Start merging VCFs... | ||
+ | Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.stats.vcf into STDOUT... | ||
+ | |||
+ | The following parameters are available. Ones with "[]" are in effect: | ||
+ | |||
+ | Available Options | ||
+ | Recipes : --write-bed, --write-vcf [ON], --upgrade, | ||
+ | --summarize, --filter [ON], --subset | ||
+ | VCF Input options : --in-vcf [/home/presenter02/test/snps/vcfs/chr7/chr7.merged.stats.vcf] | ||
+ | BED Input options : --in-bfile [], --in-bed [], --in-bim [], | ||
+ | --in-fam [], | ||
+ | --ref [/data/local/ref/karma.ref/human.g1k.v37.fa] | ||
+ | Subsetting options : --in-subset [], --mono-subset, | ||
+ | --filt-only-subset | ||
+ | Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf], | ||
+ | --qGeno, --print-every [10000] | ||
+ | Output compression Options : --plain [ON], --bgzf, --gzip | ||
+ | Genotype-level Filter Options : --minGQ, --minGD | ||
+ | Filter Options : --winIndel [5], | ||
+ | --indelVCF [/home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf], | ||
+ | --minQUAL [5], --minMQ [20], | ||
+ | --maxDP [99000], --minDP [99], | ||
+ | --maxABL [70], --winFFRQ, --maxFFRQ, | ||
+ | --winFVAR, --merFVAR, --maxFVAR, | ||
+ | --minNS [1], --maxSTP [2147483647], | ||
+ | --maxTTT [2147483647], | ||
+ | --minTTT [-2147483648], --maxSTR [20], | ||
+ | --minSTR [-20], --maxSTZ [5], --minSTZ [-5], | ||
+ | --maxCBR [20], --minCBR [-100], | ||
+ | --maxQBR [100], --minQBR [-100], | ||
+ | --maxCBZ [2147483647], --maxCSR [100], | ||
+ | --minCSR [-100], --maxLQZ [2147483647], | ||
+ | --minLQZ [-2147483648], | ||
+ | --maxRBZ [2147483647], | ||
+ | --minRBZ [-2147483648], | ||
+ | --maxIOZ [2147483647], | ||
+ | --minIOR [-2147483648], | ||
+ | --maxIOR [2147483647], | ||
+ | --maxAOZ [2147483647], --maxAOI [5], | ||
+ | --maxMQ0 [10], --maxMQ10 [100], | ||
+ | --maxMQ20 [100], --minFIC [-20], | ||
+ | --minABE [-100], --maxABE [100], | ||
+ | --maxLQR [30], --minMBR [-100], | ||
+ | --maxMBR [100], --minABZ [-2147483648], | ||
+ | --maxABZ [2147483647], | ||
+ | --maxBCS [2147483647], --keepFilter | ||
+ | |||
+ | Analysis started on Tue Aug 6 12:25:04 2013 | ||
+ | |||
+ | The following filters are in effect: | ||
+ | m20 : MQ >= 20.00 | ||
+ | dp99 : DP >= 99.00 | ||
+ | DP99000 : DP <= 99000.00 | ||
+ | ns1 : NS >= 1.00 | ||
+ | AB70 : AB <= 0.70 | ||
+ | STR20 : STR <= 0.20 | ||
+ | str-20 : STR >= -0.20 | ||
+ | stz-5 : STZ >= -5.00 | ||
+ | STZ5 : STZ <= 5.00 | ||
+ | CBR20 : CBR <= 0.20 | ||
+ | AOI5 : AOI <= 5.00 | ||
+ | MQ0_10 : MQ0 <= 0.10 | ||
+ | fic-20 : FIC >= -0.20 | ||
+ | LQR30 : LQR <= 0.30 | ||
+ | q5 : QUAL >= 5 | ||
+ | INDEL5 : INDEL >= 5 bp with /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf | ||
+ | |||
+ | Writing to VCF file /home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf | ||
+ | Analysis finished on Tue Aug 6 12:25:05 2013 | ||
+ | |||
+ | loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input.. | ||
+ | finished loading dbSNP for 7415 variants | ||
+ | TOTAL 1538 1538 662 378 166 25 145 129 27 6 | ||
+ | |||
+ | The following parameters are available. Ones with "[]" are in effect: | ||
+ | |||
+ | Available Options | ||
+ | Input options : --in [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw], | ||
+ | --ignore | ||
+ | Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.norm], | ||
+ | --digits [3], --rank, --verbose [ON] | ||
+ | Output compression Options : --plain [ON], --bgzf, --gzip | ||
+ | |||
+ | Analysis started on Tue Aug 6 12:25:05 2013 | ||
+ | |||
+ | Reading Input File /home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw | ||
+ | Finished Reading 1538 lines | ||
+ | Starting sorting values | ||
+ | Writing output values | ||
+ | Analysis finished on Tue Aug 6 12:25:05 2013 | ||
+ | |||
+ | loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input.. | ||
+ | finished loading dbSNP for 7415 variants | ||
+ | TOTAL 1538 1538 662 378 166 25 145 129 27 6 | ||
+ | Commands finished in 306 secs with no errors reported | ||
+ | -------------------------------------------------------------------- | ||
% cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary | % cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary |
Revision as of 12:37, 6 August 2013
GotCloud / EPACTS Tutorial
In this tutorial, we illustrate some of the essential steps in the analysis of next generation sequence data.
For a background on GotCloud and Sequence Analysis Pipelines, see GotCloud
While GotCloud can run on a cluster of machines or instances, this tutorial is just a small test that just runs on the machine the commands are run on.
This tutorial is a specialized version for UW Center for Mendelian Genomics. For other tutorial, please see Tutorial: GotCloud.
STEP 0 : Login to the cluster
In UW CMG cluster, you need to login to one of the high-performance cluster node via the following way.
% ssh yourid@uwcmg-head.gs.washington.edu Password:
% qlogin Your job 120 ("QLOGIN") has been submitted waiting for interactive job to be scheduled … Your interactive job 120 has been successfully scheduled. Establishing /usr/local/bin/qlogin_command session to host uwcmg001.gs.washington.edu ... yourid@uwcmg001.gs.washington.edu's password:
Step 1 : GotCloud Alignment Pipeline
In order to run this tutorial, you need to make sure you have GotCloud installed on your system.
Try to run the following commands sequentially to follow the instruction.
First, let's set an environment variable for your convenience.
% export GC=/home/presenter02/day2/session1/uwcmg_2013_08/
The raw sequence reads can be found by
% ls ${GC}/examples/fastq SRR035022_1.fastq.gz SRR035023_1.fastq.gz SRR035024_1.fastq.gz SRR035025_1.fastq.gz SRR035026_1.fastq.gz SRR035027_1.fastq.gz SRR035669_1.fastq.gz SRR622461_1.fastq.gz SRR035022_2.fastq.gz SRR035023_2.fastq.gz SRR035024_2.fastq.gz SRR035025_2.fastq.gz SRR035026_2.fastq.gz SRR035027_2.fastq.gz SRR035669_2.fastq.gz SRR622461_2.fastq.gz SRR035022.fastq.gz SRR035023.fastq.gz SRR035024.fastq.gz SRR035025.fastq.gz SRR035026.fastq.gz SRR035027.fastq.gz SRR035669.fastq.gz SRR622461.fastq.gz
This set of FASTQ are extracted from two samples from the 1000 Genomes project, focusing around 500kb region are CFTR.
The example directory contain other set of files, such as reference sequence, index, and BAM files (for variant calling).
% ls ${GC}/examples bams chr7Ref fastq index
In order to run alignment pipeline, we need an index file for FASTQ files.
% cat ${GC}/examples/index/chr7.CFTR.fastq.index MERGE_NAME FASTQ1 FASTQ2 RGID SAMPLE LIBRARY CENTER PLATFORM NA06984 fastq/SRR035022.fastq.gz . SRR035022 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035022_1.fastq.gz fastq/SRR035022_2.fastq.gz SRR035022 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035023.fastq.gz . SRR035023 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035023_1.fastq.gz fastq/SRR035023_2.fastq.gz SRR035023 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035024.fastq.gz . SRR035024 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035024_1.fastq.gz fastq/SRR035024_2.fastq.gz SRR035024 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035025.fastq.gz . SRR035025 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035025_1.fastq.gz fastq/SRR035025_2.fastq.gz SRR035025 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035026.fastq.gz . SRR035026 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035026_1.fastq.gz fastq/SRR035026_2.fastq.gz SRR035026 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035027.fastq.gz . SRR035027 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035027_1.fastq.gz fastq/SRR035027_2.fastq.gz SRR035027 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035669.fastq.gz . SRR035669 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035669_1.fastq.gz fastq/SRR035669_2.fastq.gz SRR035669 NA06984 Solexa-16556 BI ILLUMINA NA12878 fastq/SRR622461.fastq.gz . SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA NA12878 fastq/SRR622461_1.fastq.gz fastq/SRR622461_2.fastq.gz SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA
Also, we need a configuration file for GotCloud (default configuration usually works if you install GotCloud on your own, but here we're using reduced examples).
% cat ${GC}/examples/index/chr7.CFTR.align.conf INDEX_FILE = index/chr7.CFTR.fastq.index ################### # References REF_DIR = chr7Ref AS = NCBI37 REF = $(REF_DIR)/hs37d5.chr7.fa DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.chr7.CFTR.vcf.gz HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
Create a test directory and enter it for running alignment pipeline
% mkdir test
% cd test
Then, run the alignment pipeline through the following command
% ${GC}/gotcloud/gotcloud align --conf ${GC}/examples/index/chr7.CFTR.align.conf --outDir align --baseprefix ${GC}/examples Retrieved 168 bytes. Random chance is 54 (compared to 30) File sizes of 32 FASTQ input files referenced in '/home/presenter02/day2/session1/uwcmg_2013_08/examples/index/chr7.CFTR.fastq.index' = 6849341.4 Size of BAMs from aligner will be about 0.01 GB Intermediate files from snpcaller will be about 0.01 GB Final VCF output from snpcaller will be about 0.00 GB Be sure you have enough space to hold all this data Created /home/presenter02/test/align/Makefiles/align_NA06984.Makefile Created /home/presenter02/test/align/Makefiles/align_NA12878.Makefile --------------------------------------------------------------------- Waiting while samples are processed... Processing finished in 42 secs with no errors reported
Let's see where the BAM files are generated
% ls align/bams NA06984.recal.bam NA06984.recal.bam.bai.done NA06984.recal.bam.metrics NA12878.recal.bam NA12878.recal.bam.bai.done NA12878.recal.bam.metrics NA06984.recal.bam.bai NA06984.recal.bam.done NA06984.recal.bam.qemp NA12878.recal.bam.bai NA12878.recal.bam.done NA12878.recal.bam.qemp
Examine the QC metrics by
% ls align/QCFiles NA06984.genoCheck.depthRG NA06984.genoCheck.selfRG NA06984.qplot.R NA12878.genoCheck.depthSM NA12878.genoCheck.selfSM NA12878.qplot.stats NA06984.genoCheck.depthSM NA06984.genoCheck.selfSM NA06984.qplot.stats NA12878.genoCheck.done NA12878.qplot.done NA06984.genoCheck.done NA06984.qplot.done NA12878.genoCheck.depthRG NA12878.genoCheck.selfRG NA12878.qplot.R
Text-version of QC metrics looks as follows
% cat align/QCFiles/NA06984.qplot.stats Stats\BAM /home/presenter02/test2/align/bams/NA06984.recal.bam TotalReads(e6) 0.06 MappingRate(%) 97.44 MapRate_MQpass(%) 97.44 TargetMapping(%) 0.00 ZeroMapQual(%) 0.56 MapQual<10(%) 0.70 PairedReads(%) 98.27 ProperPaired(%) 92.83 MappedBases(e9) 0.00 Q20Bases(e9) 0.00 Q20BasesPct(%) 95.27 MeanDepth 7.33 GenomeCover(%) 0.33 EPS_MSE 293.97 EPS_Cycle_Mean 25.06 GCBiasMSE 0.00 ISize_mode 335 ISize_medium 331 DupRate(%) 2.05 QCFailRate(%) 0.00 BaseComp_A(%) 31.8 BaseComp_C(%) 18.4 BaseComp_G(%) 18.1 BaseComp_T(%) 31.6 BaseComp_O(%) 0.0
You can also run the R command to create PDF files (see lecture slides for examples).
To check whether the samples are contaminated, look at the verifyBamID results.
% cat align/QCFiles/NA06984.genoCheck.selfSM #SEQ_ID RG CHIP_ID #SNPS #READS AVG_DP FREEMIX FREELK1 FREELK0 FREE_RH FREE_RA CHIPMIX CHIPLK1 CHIPLK0 CHIP_RH CHIP_RA DPREF RDPHET RDPALT NA06984 ALL NA 175 1038 5.93 0.00841 351.80 351.85 NA NA NA NA NA NA NA NA NA NA
STEP 3 : Run GotCloud SNP Calling Pipeline
The next step is to analyze BAM files by calling SNPs and generating a VCF file containing the variant calls.
The variant calling pipeline has multiple built-in steps to generate BAMs:
- Filter out reads with low mapping quality
- Per Base Alignment Quality Adjustment (BAQ)
- Resolve overlapping paired end reads
- Generate genotype likelihood files
- Perform variant calling
- Extract features from variant sites
- Perform variant filtering
To speed variant calling, each chromosome is broken up into smaller regions which are processed separately. While initially split by sample, the per sample data gets merged and is processed together for each region. These regions are later merged to result in a single Variant Call File (VCF) per chromosome.
% ls ${GC}/examples/bams NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA12272.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA12273.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.CFTR.bam NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA12273.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.CFTR.bam.bai NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA12275.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA12275.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai NA12282.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA12282.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA12283.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA12283.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA12286.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA12286.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai NA12287.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA07037.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.CFTR.bam NA12287.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai ...
% cat ${GC}/examples/index/chr7.CFTR.low_coverage.index NA06984 EUR bams/NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA06985 EUR bams/NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA06986 EUR bams/NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA06989 EUR bams/NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA06994 EUR bams/NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA07000 EUR bams/NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA07037 EUR bams/NA07037.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.CFTR.bam NA07048 EUR bams/NA07048.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA07051 EUR bams/NA07051.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA07056 EUR bams/NA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam ...
% cat ${GC}/examples/index/chr7.CFTR.low_coverage.conf CHRS = 7 BAM_INDEX = index/chr7.CFTR.low_coverage.index ############ # References REF_ROOT = chr7Ref # REF = $(REF_ROOT)/hs37d5.chr7.fa INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19 DBSNP_VCF = $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz HM3_VCF = $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz
% ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
CHRS = 7 BAM_INDEX = index/chr7.CFTR.low_coverage.index ############ # References REF_ROOT = chr7Ref # REF = $(REF_ROOT)/hs37d5.chr7.fa INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19 DBSNP_VCF = $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz HM3_VCF = $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz [presenter02@uwcmg002 test]$ time ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2 Key configurations: GOTCLOUD_ROOT: /home/presenter02/day2/session1/uwcmg_2013_08/gotcloud OUT_DIR: snps BAM_INDEX: index/chr7.CFTR.low_coverage.index REF: chr7Ref/hs37d5.chr7.fa CHRS: 7 BATCH_TYPE: local BATCH_OPTS: Processing the following steps... 2: RUN_PILEUP 3: RUN_GLFMULTIPLES 4: RUN_VCFPILEUP 5: RUN_FILTER 6: RUN_SVM 8: RUN_SPLIT Call region is 7:117000000-117500000 Generating commands for chr7... Creating glf INDEX at 7:115000001-120000000.. -------------------------------------------------------------------- Finished creating makefile /home/presenter02/test/snps/umake.Makefile Running /home/presenter02/test/snps/umake.Makefile Start merging VCFs... Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.vcf into STDOUT... Start merging VCFs... Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.stats.vcf into STDOUT... The following parameters are available. Ones with "[]" are in effect: Available Options Recipes : --write-bed, --write-vcf [ON], --upgrade, --summarize, --filter [ON], --subset VCF Input options : --in-vcf [/home/presenter02/test/snps/vcfs/chr7/chr7.merged.stats.vcf] BED Input options : --in-bfile [], --in-bed [], --in-bim [], --in-fam [], --ref [/data/local/ref/karma.ref/human.g1k.v37.fa] Subsetting options : --in-subset [], --mono-subset, --filt-only-subset Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf], --qGeno, --print-every [10000] Output compression Options : --plain [ON], --bgzf, --gzip Genotype-level Filter Options : --minGQ, --minGD Filter Options : --winIndel [5], --indelVCF [/home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf], --minQUAL [5], --minMQ [20], --maxDP [99000], --minDP [99], --maxABL [70], --winFFRQ, --maxFFRQ, --winFVAR, --merFVAR, --maxFVAR, --minNS [1], --maxSTP [2147483647], --maxTTT [2147483647], --minTTT [-2147483648], --maxSTR [20], --minSTR [-20], --maxSTZ [5], --minSTZ [-5], --maxCBR [20], --minCBR [-100], --maxQBR [100], --minQBR [-100], --maxCBZ [2147483647], --maxCSR [100], --minCSR [-100], --maxLQZ [2147483647], --minLQZ [-2147483648], --maxRBZ [2147483647], --minRBZ [-2147483648], --maxIOZ [2147483647], --minIOR [-2147483648], --maxIOR [2147483647], --maxAOZ [2147483647], --maxAOI [5], --maxMQ0 [10], --maxMQ10 [100], --maxMQ20 [100], --minFIC [-20], --minABE [-100], --maxABE [100], --maxLQR [30], --minMBR [-100], --maxMBR [100], --minABZ [-2147483648], --maxABZ [2147483647], --maxBCS [2147483647], --keepFilter Analysis started on Tue Aug 6 12:25:04 2013 The following filters are in effect: m20 : MQ >= 20.00 dp99 : DP >= 99.00 DP99000 : DP <= 99000.00 ns1 : NS >= 1.00 AB70 : AB <= 0.70 STR20 : STR <= 0.20 str-20 : STR >= -0.20 stz-5 : STZ >= -5.00 STZ5 : STZ <= 5.00 CBR20 : CBR <= 0.20 AOI5 : AOI <= 5.00 MQ0_10 : MQ0 <= 0.10 fic-20 : FIC >= -0.20 LQR30 : LQR <= 0.30 q5 : QUAL >= 5 INDEL5 : INDEL >= 5 bp with /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf Writing to VCF file /home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf Analysis finished on Tue Aug 6 12:25:05 2013 loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input.. finished loading dbSNP for 7415 variants TOTAL 1538 1538 662 378 166 25 145 129 27 6 The following parameters are available. Ones with "[]" are in effect: Available Options Input options : --in [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw], --ignore Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.norm], --digits [3], --rank, --verbose [ON] Output compression Options : --plain [ON], --bgzf, --gzip Analysis started on Tue Aug 6 12:25:05 2013 Reading Input File /home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw Finished Reading 1538 lines Starting sorting values Writing output values Analysis finished on Tue Aug 6 12:25:05 2013 loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input.. finished loading dbSNP for 7415 variants TOTAL 1538 1538 662 378 166 25 145 129 27 6 Commands finished in 306 secs with no errors reported --------------------------------------------------------------------
% cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary
cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary
-------------------------------------------------------------------------------------------------------------- FILTER #SNPs #dbSNP %dbSNP %CpG %CpG %Known %Novel %nCpG-K %nCpG-N %HM3 %HM3 Known Novel Ts/Tv Ts/Tv Ts/Tv Ts/Tv sens /SNP -------------------------------------------------------------------------------------------------------------- INDEL5 11 8 72.7 25.0 0.0 3.00 0.50 2.00 0.50 0.000 0.000 INDEL5;SVM 8 4 50.0 50.0 25.0 0.33 0.33 1.00 0.00 0.000 0.000 PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095 SVM 131 33 25.2 9.1 9.2 1.06 0.85 0.88 0.78 0.000 0.000 -------------------------------------------------------------------------------------------------------------- FILTER #SNPs #dbSNP %dbSNP %CpG %CpG %Known %Novel %nCpG-K %nCpG-N %HM3 %HM3 Known Novel Ts/Tv Ts/Tv Ts/Tv Ts/Tv sens /SNP -------------------------------------------------------------------------------------------------------------- INDEL5 19 12 63.2 33.3 14.3 1.40 0.40 1.67 0.20 0.000 0.000 PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095 SVM 139 37 26.6 13.5 9.8 0.95 0.82 0.88 0.74 0.000 0.000 -------------------------------------------------------------------------------------------------------------- PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095 FAIL 150 45 30.0 15.6 9.5 1.14 0.81 1.00 0.73 0.000 0.000 TOTAL 1538 1231 80.0 15.5 10.7 2.05 1.27 1.75 1.12 85.083 10.013 ------------------------------------------------------------------------------------------------------------
% time ${GC}/gotcloud/gotcloud beagle --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
% samtools tview ${GC}/examples/bams/NA12843.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam ${GC}/examples/chr7Ref/hs37d5.chr7.fa
STEP 4 : EPACTS association analysis
% head ${GC}/examples/index/chr7.CFTR.ped
% mkdir assoc
% time ${GC}/epacts/bin/epacts single --ped ${GC}/examples/index/chr7.CFTR.ped --vcf ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --pheno PHENO --out assoc/single.b.score --test b.score --anno --ref ${GC}/examples/chr7Ref/hs37d5.chr7.fa --region 7:117000000-117500000 --run 1
% time ${GC}/epacts/bin/epacts anno --in ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.vcf.gz --ref ${GC}examples/chr7Ref/hs37d5.chr7.fa
% zcat snps/chr7.filtered.PASS.beagled.anno.vcf.gz | grep Nonsynonymous | grep CFTR | cut -f 1-8 | head -1
% ${GC}/epacts/bin/epacts make-group --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.grp –nonsyn
% ${GC}/epacts/bin/epacts group --ped ${GC}/examples/index/chr7.CFTR.ped --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out assoc/group.skat.o --groupf snps/chr7.filtered.PASS.beagled.anno.grp --test skat --skat-o --run 2
% cat assoc/group.skat.o.epacts
% ${GC}/epacts/bin/epacts group --ped ${GC}/examples/index/chr7.CFTR.ped --vcf snps/
STEP 5 : All-in-one running of all scripts (for later use)
% tar xzvf /home/presenter02/day2/session1/uwcmg_2013_08.tar.gz�
% cd /home/presenter02/day2/session1/uwcmg_2013_08�
% sh go.sh �