Tutorial: GotCloud UW CMG
GotCloud / EPACTS Tutorial
In this tutorial, we illustrate some of the essential steps in the analysis of next generation sequence data.
For a background on GotCloud and Sequence Analysis Pipelines, see GotCloud
While GotCloud can run on a cluster of machines or instances, this tutorial is just a small test that just runs on the machine the commands are run on.
This tutorial is a specialized version for UW Center for Mendelian Genomics. For other tutorial, please see Tutorial: GotCloud.
STEP 0 : Login to the cluster
In UW CMG cluster, you need to login to one of the high-performance cluster node via the following way.
% ssh yourid@uwcmg-head.gs.washington.edu Password:
% qlogin Your job 120 ("QLOGIN") has been submitted waiting for interactive job to be scheduled … Your interactive job 120 has been successfully scheduled. Establishing /usr/local/bin/qlogin_command session to host uwcmg001.gs.washington.edu ... yourid@uwcmg001.gs.washington.edu's password:
Step 1 : GotCloud Alignment Pipeline
In order to run this tutorial, you need to make sure you have GotCloud installed on your system.
Try to run the following commands sequentially to follow the instruction.
First, let's set an environment variable for your convenience.
% export GC=/home/presenter02/day2/session1/uwcmg_2013_08/
The raw sequence reads can be found by
% ls ${GC}/examples/fastq SRR035022_1.fastq.gz SRR035023_1.fastq.gz SRR035024_1.fastq.gz SRR035025_1.fastq.gz SRR035026_1.fastq.gz SRR035027_1.fastq.gz SRR035669_1.fastq.gz SRR622461_1.fastq.gz SRR035022_2.fastq.gz SRR035023_2.fastq.gz SRR035024_2.fastq.gz SRR035025_2.fastq.gz SRR035026_2.fastq.gz SRR035027_2.fastq.gz SRR035669_2.fastq.gz SRR622461_2.fastq.gz SRR035022.fastq.gz SRR035023.fastq.gz SRR035024.fastq.gz SRR035025.fastq.gz SRR035026.fastq.gz SRR035027.fastq.gz SRR035669.fastq.gz SRR622461.fastq.gz
To see examples of FASTQ file.
% zless ${GC}/examples/fastq/SRR035022_1.fastq.gz
This set of FASTQ are extracted from two samples from the 1000 Genomes project, focusing around 500kb region are CFTR.
The example directory contain other set of files, such as reference sequence, index, and BAM files (for variant calling).
% ls ${GC}/examples bams chr7Ref fastq index
In order to run alignment pipeline, we need an index file for FASTQ files.
% cat ${GC}/examples/index/chr7.CFTR.fastq.index MERGE_NAME FASTQ1 FASTQ2 RGID SAMPLE LIBRARY CENTER PLATFORM NA06984 fastq/SRR035022.fastq.gz . SRR035022 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035022_1.fastq.gz fastq/SRR035022_2.fastq.gz SRR035022 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035023.fastq.gz . SRR035023 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035023_1.fastq.gz fastq/SRR035023_2.fastq.gz SRR035023 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035024.fastq.gz . SRR035024 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035024_1.fastq.gz fastq/SRR035024_2.fastq.gz SRR035024 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035025.fastq.gz . SRR035025 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035025_1.fastq.gz fastq/SRR035025_2.fastq.gz SRR035025 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035026.fastq.gz . SRR035026 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035026_1.fastq.gz fastq/SRR035026_2.fastq.gz SRR035026 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035027.fastq.gz . SRR035027 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035027_1.fastq.gz fastq/SRR035027_2.fastq.gz SRR035027 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035669.fastq.gz . SRR035669 NA06984 Solexa-16556 BI ILLUMINA NA06984 fastq/SRR035669_1.fastq.gz fastq/SRR035669_2.fastq.gz SRR035669 NA06984 Solexa-16556 BI ILLUMINA NA12878 fastq/SRR622461.fastq.gz . SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA NA12878 fastq/SRR622461_1.fastq.gz fastq/SRR622461_2.fastq.gz SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA
Also, we need a configuration file for GotCloud (default configuration usually works if you install GotCloud on your own, but here we're using reduced examples).
% cat ${GC}/examples/index/chr7.CFTR.align.conf INDEX_FILE = index/chr7.CFTR.fastq.index ################### # References REF_DIR = chr7Ref AS = NCBI37 REF = $(REF_DIR)/hs37d5.chr7.fa DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.chr7.CFTR.vcf.gz HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
Create a test directory and enter it for running alignment pipeline
% mkdir test
% cd test
Then, run the alignment pipeline through the following command
% ${GC}/gotcloud/gotcloud align --conf ${GC}/examples/index/chr7.CFTR.align.conf --outDir align --baseprefix ${GC}/examples Retrieved 168 bytes. Random chance is 54 (compared to 30) File sizes of 32 FASTQ input files referenced in '/home/presenter02/day2/session1/uwcmg_2013_08/examples/index/chr7.CFTR.fastq.index' = 6849341.4 Size of BAMs from aligner will be about 0.01 GB Intermediate files from snpcaller will be about 0.01 GB Final VCF output from snpcaller will be about 0.00 GB Be sure you have enough space to hold all this data Created /home/presenter02/test/align/Makefiles/align_NA06984.Makefile Created /home/presenter02/test/align/Makefiles/align_NA12878.Makefile --------------------------------------------------------------------- Waiting while samples are processed... Processing finished in 42 secs with no errors reported
Let's see where the BAM files are generated
% ls align/bams NA06984.recal.bam NA06984.recal.bam.bai.done NA06984.recal.bam.metrics NA12878.recal.bam NA12878.recal.bam.bai.done NA12878.recal.bam.metrics NA06984.recal.bam.bai NA06984.recal.bam.done NA06984.recal.bam.qemp NA12878.recal.bam.bai NA12878.recal.bam.done NA12878.recal.bam.qemp
You can view the aligned sequence reads by running
% samtools view align/bams/NA06984.recal.bam | less
Examine the QC metrics by
% ls align/QCFiles NA06984.genoCheck.depthRG NA06984.genoCheck.selfRG NA06984.qplot.R NA12878.genoCheck.depthSM NA12878.genoCheck.selfSM NA12878.qplot.stats NA06984.genoCheck.depthSM NA06984.genoCheck.selfSM NA06984.qplot.stats NA12878.genoCheck.done NA12878.qplot.done NA06984.genoCheck.done NA06984.qplot.done NA12878.genoCheck.depthRG NA12878.genoCheck.selfRG NA12878.qplot.R
Text-version of QC metrics looks as follows
% cat align/QCFiles/NA06984.qplot.stats Stats\BAM /home/presenter02/test2/align/bams/NA06984.recal.bam TotalReads(e6) 0.06 MappingRate(%) 97.44 MapRate_MQpass(%) 97.44 TargetMapping(%) 0.00 ZeroMapQual(%) 0.56 MapQual<10(%) 0.70 PairedReads(%) 98.27 ProperPaired(%) 92.83 MappedBases(e9) 0.00 Q20Bases(e9) 0.00 Q20BasesPct(%) 95.27 MeanDepth 7.33 GenomeCover(%) 0.33 EPS_MSE 293.97 EPS_Cycle_Mean 25.06 GCBiasMSE 0.00 ISize_mode 335 ISize_medium 331 DupRate(%) 2.05 QCFailRate(%) 0.00 BaseComp_A(%) 31.8 BaseComp_C(%) 18.4 BaseComp_G(%) 18.1 BaseComp_T(%) 31.6 BaseComp_O(%) 0.0
You can also run the R command to create PDF files (see lecture slides for examples).
To check whether the samples are contaminated, look at the verifyBamID results.
% cat align/QCFiles/NA06984.genoCheck.selfSM #SEQ_ID RG CHIP_ID #SNPS #READS AVG_DP FREEMIX FREELK1 FREELK0 FREE_RH FREE_RA CHIPMIX CHIPLK1 CHIPLK0 CHIP_RH CHIP_RA DPREF RDPHET RDPALT NA06984 ALL NA 175 1038 5.93 0.00841 351.80 351.85 NA NA NA NA NA NA NA NA NA NA
STEP 3 : Run GotCloud SNP Calling Pipeline
The next step is to analyze BAM files by calling SNPs and generating a VCF file containing the variant calls.
The variant calling pipeline has multiple built-in steps to generate BAMs:
- Filter out reads with low mapping quality
- Per Base Alignment Quality Adjustment (BAQ)
- Resolve overlapping paired end reads
- Generate genotype likelihood files
- Perform variant calling
- Extract features from variant sites
- Perform variant filtering
To speed variant calling, each chromosome is broken up into smaller regions which are processed separately. While initially split by sample, the per sample data gets merged and is processed together for each region. These regions are later merged to result in a single Variant Call File (VCF) per chromosome.
% ls ${GC}/examples/bams NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA12272.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA12273.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.CFTR.bam NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA12273.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.CFTR.bam.bai NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA12275.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA12275.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai NA12282.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA12282.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA12283.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA12283.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai NA12286.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA12286.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai NA12287.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA07037.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.CFTR.bam NA12287.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai ...
To run SNP calling pipeline, we need to prepare index file corresponding to the BAM files with sample IDs
% cat ${GC}/examples/index/chr7.CFTR.low_coverage.index NA06984 EUR bams/NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA06985 EUR bams/NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA06986 EUR bams/NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA06989 EUR bams/NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA06994 EUR bams/NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA07000 EUR bams/NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam NA07037 EUR bams/NA07037.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.CFTR.bam NA07048 EUR bams/NA07048.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA07051 EUR bams/NA07051.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam NA07056 EUR bams/NA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam ...
Also, we need a configuration file.
% cat ${GC}/examples/index/chr7.CFTR.low_coverage.conf CHRS = 7 BAM_INDEX = index/chr7.CFTR.low_coverage.index ############ # References REF_ROOT = chr7Ref # REF = $(REF_ROOT)/hs37d5.chr7.fa INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19 DBSNP_VCF = $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz HM3_VCF = $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz
Then we're ready to run SNP calling pipeline
% ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2 CHRS = 7 BAM_INDEX = index/chr7.CFTR.low_coverage.index ############ # References REF_ROOT = chr7Ref # REF = $(REF_ROOT)/hs37d5.chr7.fa INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19 DBSNP_VCF = $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz HM3_VCF = $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz [presenter02@uwcmg002 test]$ time ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2 Key configurations: GOTCLOUD_ROOT: /home/presenter02/day2/session1/uwcmg_2013_08/gotcloud OUT_DIR: snps BAM_INDEX: index/chr7.CFTR.low_coverage.index REF: chr7Ref/hs37d5.chr7.fa CHRS: 7 BATCH_TYPE: local BATCH_OPTS: Processing the following steps... 2: RUN_PILEUP 3: RUN_GLFMULTIPLES 4: RUN_VCFPILEUP 5: RUN_FILTER 6: RUN_SVM 8: RUN_SPLIT Call region is 7:117000000-117500000 Generating commands for chr7... Creating glf INDEX at 7:115000001-120000000.. -------------------------------------------------------------------- Finished creating makefile /home/presenter02/test/snps/umake.Makefile Running /home/presenter02/test/snps/umake.Makefile Start merging VCFs... Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.vcf into STDOUT... Start merging VCFs... Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.stats.vcf into STDOUT... The following parameters are available. Ones with "[]" are in effect: Available Options Recipes : --write-bed, --write-vcf [ON], --upgrade, --summarize, --filter [ON], --subset VCF Input options : --in-vcf [/home/presenter02/test/snps/vcfs/chr7/chr7.merged.stats.vcf] BED Input options : --in-bfile [], --in-bed [], --in-bim [], --in-fam [], --ref [/data/local/ref/karma.ref/human.g1k.v37.fa] Subsetting options : --in-subset [], --mono-subset, --filt-only-subset Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf], --qGeno, --print-every [10000] Output compression Options : --plain [ON], --bgzf, --gzip Genotype-level Filter Options : --minGQ, --minGD Filter Options : --winIndel [5], --indelVCF [/home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf], --minQUAL [5], --minMQ [20], --maxDP [99000], --minDP [99], --maxABL [70], --winFFRQ, --maxFFRQ, --winFVAR, --merFVAR, --maxFVAR, --minNS [1], --maxSTP [2147483647], --maxTTT [2147483647], --minTTT [-2147483648], --maxSTR [20], --minSTR [-20], --maxSTZ [5], --minSTZ [-5], --maxCBR [20], --minCBR [-100], --maxQBR [100], --minQBR [-100], --maxCBZ [2147483647], --maxCSR [100], --minCSR [-100], --maxLQZ [2147483647], --minLQZ [-2147483648], --maxRBZ [2147483647], --minRBZ [-2147483648], --maxIOZ [2147483647], --minIOR [-2147483648], --maxIOR [2147483647], --maxAOZ [2147483647], --maxAOI [5], --maxMQ0 [10], --maxMQ10 [100], --maxMQ20 [100], --minFIC [-20], --minABE [-100], --maxABE [100], --maxLQR [30], --minMBR [-100], --maxMBR [100], --minABZ [-2147483648], --maxABZ [2147483647], --maxBCS [2147483647], --keepFilter Analysis started on Tue Aug 6 12:25:04 2013 The following filters are in effect: m20 : MQ >= 20.00 dp99 : DP >= 99.00 DP99000 : DP <= 99000.00 ns1 : NS >= 1.00 AB70 : AB <= 0.70 STR20 : STR <= 0.20 str-20 : STR >= -0.20 stz-5 : STZ >= -5.00 STZ5 : STZ <= 5.00 CBR20 : CBR <= 0.20 AOI5 : AOI <= 5.00 MQ0_10 : MQ0 <= 0.10 fic-20 : FIC >= -0.20 LQR30 : LQR <= 0.30 q5 : QUAL >= 5 INDEL5 : INDEL >= 5 bp with /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf Writing to VCF file /home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf Analysis finished on Tue Aug 6 12:25:05 2013 loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input.. finished loading dbSNP for 7415 variants TOTAL 1538 1538 662 378 166 25 145 129 27 6 The following parameters are available. Ones with "[]" are in effect: Available Options Input options : --in [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw], --ignore Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.norm], --digits [3], --rank, --verbose [ON] Output compression Options : --plain [ON], --bgzf, --gzip Analysis started on Tue Aug 6 12:25:05 2013 Reading Input File /home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw Finished Reading 1538 lines Starting sorting values Writing output values Analysis finished on Tue Aug 6 12:25:05 2013 loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input.. finished loading dbSNP for 7415 variants TOTAL 1538 1538 662 378 166 25 145 129 27 6 Commands finished in 306 secs with no errors reported --------------------------------------------------------------------
We can summarize the quality of variants.
% cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary -------------------------------------------------------------------------------------------------------------- FILTER #SNPs #dbSNP %dbSNP %CpG %CpG %Known %Novel %nCpG-K %nCpG-N %HM3 %HM3 Known Novel Ts/Tv Ts/Tv Ts/Tv Ts/Tv sens /SNP -------------------------------------------------------------------------------------------------------------- INDEL5 11 8 72.7 25.0 0.0 3.00 0.50 2.00 0.50 0.000 0.000 INDEL5;SVM 8 4 50.0 50.0 25.0 0.33 0.33 1.00 0.00 0.000 0.000 PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095 SVM 131 33 25.2 9.1 9.2 1.06 0.85 0.88 0.78 0.000 0.000 -------------------------------------------------------------------------------------------------------------- FILTER #SNPs #dbSNP %dbSNP %CpG %CpG %Known %Novel %nCpG-K %nCpG-N %HM3 %HM3 Known Novel Ts/Tv Ts/Tv Ts/Tv Ts/Tv sens /SNP -------------------------------------------------------------------------------------------------------------- INDEL5 19 12 63.2 33.3 14.3 1.40 0.40 1.67 0.20 0.000 0.000 PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095 SVM 139 37 26.6 13.5 9.8 0.95 0.82 0.88 0.74 0.000 0.000 -------------------------------------------------------------------------------------------------------------- PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095 FAIL 150 45 30.0 15.6 9.5 1.14 0.81 1.00 0.73 0.000 0.000 TOTAL 1538 1231 80.0 15.5 10.7 2.05 1.27 1.75 1.12 85.083 10.013 ------------------------------------------------------------------------------------------------------------
To run LD-refinement you can run beagle only (gotcloud beagle) or beagle+thunder (gotcloud ldrefine) to improve the quality of genotypes.
If you're interested in looking at individual reads from the variant, samtools tview provides a good alternative to IGV. In order to see CFTR R75Q variant in one of the variant carriers, you can run
% samtools tview ${GC}/examples/bams/NA12843.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam ${GC}/examples/chr7Ref/hs37d5.chr7.fa
And follow the steps
- Type ‘g’
- Type 7:117149147
- Type ‘n’ to color by nucleotide
- Move with left arrow to see the variant site clearly
% ${GC}/gotcloud/gotcloud beagle --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2 Key configurations: GOTCLOUD_ROOT: /home/presenter02/day2/session1/uwcmg_2013_08/gotcloud OUT_DIR: snps BAM_INDEX: index/chr7.CFTR.low_coverage.index REF: chr7Ref/hs37d5.chr7.fa CHRS: 7 BATCH_TYPE: local BATCH_OPTS: Processing the following steps... 9: RUN_BEAGLE 10: RUN_SUBSET Call region is 7:117000000-117500000 Generating commands for chr7... -------------------------------------------------------------------- Finished creating makefile /home/presenter02/test/snps/umake.Makefile Running /home/presenter02/test/snps/umake.Makefile Commands finished in 74 secs with no errors reported --------------------------------------------------------------------
STEP 4 : EPACTS association analysis
In addition to variant calls, we need phenotype information to perform association mapping. Here, the phenotypes are created so that case is enriched in the individuals carrying rare nonsynonymous variants at CFTR.
% less ${GC}/examples/index/chr7.CFTR.ped #FAM_ID IND_ID DAD_ID MOM_ID SEX PHENO QT NA06984 NA06984 0 0 0 1 -0.657149296617419 NA06985 NA06985 0 0 0 2 -0.449034856685281 NA06986 NA06986 0 0 0 1 0.0975849179986626 NA06989 NA06989 0 0 0 2 1.77095069670763 NA06994 NA06994 0 0 0 1 0.287475900193007 NA07000 NA07000 0 0 0 1 -1.36632872407691 NA07037 NA07037 0 0 0 2 -0.1278913321612 NA07048 NA07048 0 0 0 2 -1.45798868745693 NA07051 NA07051 0 0 0 1 -0.599618650565132 ...
Create a directory to store association results
% mkdir assoc
Run single variant logistic score test.
% ${GC}/epacts/bin/epacts single --ped ${GC}/examples/index/chr7.CFTR.ped --vcf ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --pheno PHENO --out assoc/single.b.score --test b.score --anno --ref ${GC}/examples/chr7Ref/hs37d5.chr7.fa --region 7:117000000-117500000 --run 1 Successfully written phenotypes and 0 covariates across 99 individuals Processing chromosome 7... Finished generating EPACTS Makefile Running 1 parallel jobs of EPACTS make -f /home/presenter02/test.v1/assoc/single.b.score.Makefile -j 1 Loading required package: epactsR Sucessfully wrote ( 1388 * 10 ) matrix The following parameters are available. Ones with "[]" are in effect: Available Options Required Parameters : -i [/home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts] -o [/home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp] Gene Annotation Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz] -r [/home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa] --inputFormat [epacts], --checkReference, -f [refGene] -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt] -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt] -u [], -d [], --se [], --si [], --outputFormat [] Other Annotation Tools : --genomeScore [], --bed [], --tabix [] Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa... DONE: 1 chromosomes and 159138663 bases are loaded. Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt... DONE: codon file loaded. Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt... DONE: 24 priority annotation types loaded. Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz... DONE: 92627 gene loaded. DONE: Generated frequency of each annotype type in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq ]. DONE: Generated frequency of each highest priority annotation type in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq ]. Ts/Tv ratio: 2.02397 Ts observed: 929 times; Tv observed: 459 times. DONE: Generated frequency of each base change in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq ]. DONE: Generated frequency of each codon change in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq ]. DONE: Generated frequency of indel length in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq ]. nsnps = 866 Rscript /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/epactsSingle.R --vanilla /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../ /home/presenter02/test.v1/assoc/single.b.score.phe NULL /home/presenter02/test.v1/assoc/single.b.score.ind /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz 7:117000000-117500000 /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts GT 1e-06 1 3 1000000000 0.5 0 FALSE single.b.score NULL NULL /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-anno --in /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts --ref /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts -r /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp --inputFormat epacts -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt .............................................. ... Anno(tation) ... ... Xiaowei Zhan, Goncalo Abecasis ... ... Speical Thanks: ... ... Hyun Ming Kang, Yanming Li ... ... zhanxw@umich.edu ... ... Sep 2011 ... ................................................ DONE: 1388 varaints are annotated. DONE: Generated annotation output in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp ]. Annotation succeed! mv /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts rm /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.log /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-cat /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts | /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c > /home/presenter02/test.v1/assoc/single.b.score.epacts.gz /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -f -pbed /home/presenter02/test.v1/assoc/single.b.score.epacts.gz rm -f /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-plot --in /home/presenter02/test.v1/assoc/single.b.score.epacts.gz --region 7:0 Rscript /home/presenter02/test.v1/assoc/single.b.score.epacts.R --vanilla export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test.v1/assoc/single.b.score.epacts.qq.eps export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test.v1/assoc/single.b.score.epacts.mh.eps rm /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd /home/presenter02/test.v1/assoc/single.b.score.epacts.qq.eps /home/presenter02/test.v1/assoc/single.b.score.epacts.mh.eps /home/presenter02/test.v1/assoc/single.b.score.epacts.*.dat zcat /home/presenter02/test.v1/assoc/single.b.score.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /home/presenter02/test.v1/assoc/single.b.score.epacts.top5000 touch /home/presenter02/test.v1/assoc/single.b.score.epacts.OK
d-69-91-154-62:~ hmkang$ cat tmp.txt | perl -lane 's/test\.v1/test/g; print " $_"'
uccessfully written phenotypes and 0 covariates across 99 individuals Processing chromosome 7... Finished generating EPACTS Makefile Running 1 parallel jobs of EPACTS make -f /home/presenter02/test/assoc/single.b.score.Makefile -j 1 Loading required package: epactsR Sucessfully wrote ( 1388 * 10 ) matrix The following parameters are available. Ones with "[]" are in effect: Available Options Required Parameters : -i [/home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts] -o [/home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp] Gene Annotation Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz] -r [/home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa] --inputFormat [epacts], --checkReference, -f [refGene] -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt] -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt] -u [], -d [], --se [], --si [], --outputFormat [] Other Annotation Tools : --genomeScore [], --bed [], --tabix [] Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa... DONE: 1 chromosomes and 159138663 bases are loaded. Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt... DONE: codon file loaded. Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt... DONE: 24 priority annotation types loaded. Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz... DONE: 92627 gene loaded. DONE: Generated frequency of each annotype type in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq ]. DONE: Generated frequency of each highest priority annotation type in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq ]. Ts/Tv ratio: 2.02397 Ts observed: 929 times; Tv observed: 459 times. DONE: Generated frequency of each base change in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq ]. DONE: Generated frequency of each codon change in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq ]. DONE: Generated frequency of indel length in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq ]. nsnps = 866 Rscript /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/epactsSingle.R --vanilla /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../ /home/presenter02/test/assoc/single.b.score.phe NULL /home/presenter02/test/assoc/single.b.score.ind /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz 7:117000000-117500000 /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts GT 1e-06 1 3 1000000000 0.5 0 FALSE single.b.score NULL NULL /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-anno --in /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts --ref /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts -r /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp --inputFormat epacts -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt .............................................. ... Anno(tation) ... ... Xiaowei Zhan, Goncalo Abecasis ... ... Speical Thanks: ... ... Hyun Ming Kang, Yanming Li ... ... zhanxw@umich.edu ... ... Sep 2011 ... ................................................ DONE: 1388 varaints are annotated. DONE: Generated annotation output in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp ]. Annotation succeed! mv /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts rm /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.log /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-cat /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts | /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c > /home/presenter02/test/assoc/single.b.score.epacts.gz /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -f -pbed /home/presenter02/test/assoc/single.b.score.epacts.gz rm -f /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-plot --in /home/presenter02/test/assoc/single.b.score.epacts.gz --region 7:0 Rscript /home/presenter02/test/assoc/single.b.score.epacts.R --vanilla export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test/assoc/single.b.score.epacts.cmd export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test/assoc/single.b.score.epacts.qq.eps export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test/assoc/single.b.score.epacts.cmd export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test/assoc/single.b.score.epacts.mh.eps rm /home/presenter02/test/assoc/single.b.score.epacts.cmd /home/presenter02/test/assoc/single.b.score.epacts.qq.eps /home/presenter02/test/assoc/single.b.score.epacts.mh.eps /home/presenter02/test/assoc/single.b.score.epacts.*.dat zcat /home/presenter02/test/assoc/single.b.score.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /home/presenter02/test/assoc/single.b.score.epacts.top5000 touch /home/presenter02/test/assoc/single.b.score.epacts.OK
To examine the top signals of association, try
% head assoc/single.b.score.epacts.top5000 #CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE NS.CASE NS.CTRL AF.CASE AF.CTRL 7 117130755 117130755 7:117130755_C/G_Intron:CFTR 99 9 1 0.045455 0.0018407 3.1148 50 49 0.09 0 7 117149147 117149147 7:117149147_G/A_Nonsynonymous:CFTR 99 9 1 0.045455 0.0018407 3.1148 50 49 0.09 0 7 117120141 117120141 7:117120141_G/C_Utr5:CFTR 99 14 1 0.070707 0.0034417 -2.9253 50 49 0.02 0.12245 7 117356483 117356483 7:117356483_C/T_Intron:CTTNBP2 99 11 1 0.055556 0.0035695 -2.9139 50 49 0.01 0.10204 7 117459023 117459023 7:117459023_A/G_Intron:CTTNBP2 99 121 1 0.38889 0.0047674 2.8223 5049 0.71 0.5102 7 117430909 117430909 7:117430909_T/C_Intron:CTTNBP2 99 10 1 0.050505 0.0068926 -2.702 50 49 0.01 0.091837 7 117456904 117456904 7:117456904_C/T_Intron:CTTNBP2 99 115 1 0.41919 0.0076959 2.6651 5049 0.68 0.47959 7 117457141 117457141 7:117457141_G/C_Intron:CTTNBP2 99 115 1 0.41919 0.0076959 2.6651 5049 0.68 0.47959 7 117467003 117467003 7:117467003_T/A_Intron:CTTNBP2 99 122 1 0.38384 0.0077344 2.6634 5049 0.71 0.52041
Also you can examine the PDF output file (although it is a tiny fraction of genome), if your X11 works.
% xpdf assoc/single.b.score.epacts.qq.pdf % xpdf assoc/single.b.score.epacts.mh.pdf
Running variant annotation using EPACTS is also possible
% time ${GC}/epacts/bin/epacts anno --in ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.vcf.gz --ref ${GC}examples/chr7Ref/hs37d5.chr7.fa /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz -r /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o snps/chr7.filtered.PASS.beagled.anno.vcf.gz --inputFormat vcf -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt The following parameters are available. Ones with "[]" are in effect: Available Options Required Parameters : -i [/home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz] -o [snps/chr7.filtered.PASS.beagled.anno.vcf.gz] Gene Annotation Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz] -r [/home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa] --inputFormat [vcf], --checkReference, -f [refGene] -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt] -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt] -u [], -d [], --se [], --si [], --outputFormat [] Other Annotation Tools : --genomeScore [], --bed [], --tabix [] Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa... DONE: 1 chromosomes and 159138663 bases are loaded. Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt... DONE: codon file loaded. Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt... DONE: 24 priority annotation types loaded. Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz... DONE: 92627 gene loaded. DONE: Generated frequency of each annotype type in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.anno.frq ]. DONE: Generated frequency of each highest priority annotation type in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.top.anno.frq ]. Ts/Tv ratio: 2.02397 Ts observed: 929 times; Tv observed: 459 times. DONE: Generated frequency of each base change in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.base.frq ]. DONE: Generated frequency of each codon change in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.codon.frq ]. DONE: Generated frequency of indel length in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.indel.frq ]. .............................................. ... Anno(tation) ... ... Xiaowei Zhan, Goncalo Abecasis ... ... Speical Thanks: ... ... Hyun Ming Kang, Yanming Li ... ... zhanxw@umich.edu ... ... Sep 2011 ... ................................................ DONE: 1388 varaints are annotated. DONE: Generated annotation output in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz ]. Annotation succeed! mv snps/chr7.filtered.PASS.beagled.anno.vcf.gz /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp > snps/chr7.filtered.PASS.beagled.anno.vcf.gz /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -pvcf -f snps/chr7.filtered.PASS.beagled.anno.vcf.gz rm /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp rm snps/chr7.filtered.PASS.beagled.anno.vcf.gz.log snps/chr7.filtered.PASS.beagled.anno.vcf.gz.top.anno.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.anno.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.base.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.codon.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.indel.frq
You can look at example non-synonymous SNPs.
% zcat snps/chr7.filtered.PASS.beagled.anno.vcf.gz | grep Nonsynonymous | grep CFTR | cut -f 1-8 | head -1
The following examples running gene-level burden test (skipped for the workshop),for your record.
% ${GC}/epacts/bin/epacts make-group --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.grp –nonsyn
% ${GC}/epacts/bin/epacts group --ped ${GC}/examples/index/chr7.CFTR.ped --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out assoc/group.skat.o --groupf snps/chr7.filtered.PASS.beagled.anno.grp --test skat --skat-o --run 2
% cat assoc/group.skat.o.epacts
% ${GC}/epacts/bin/epacts group --ped ${GC}/examples/index/chr7.CFTR.ped --vcf snps/
STEP 5 : All-in-one running of all scripts (for later use)
% tar xzvf /home/presenter02/day2/session1/uwcmg_2013_08.tar.gz�
% cd /home/presenter02/day2/session1/uwcmg_2013_08�
% sh go.sh �