Difference between revisions of "Tutorial: GotCloud UW CMG"

From Genome Analysis Wiki
Jump to navigationJump to search
 
(9 intermediate revisions by the same user not shown)
Line 37: Line 37:
 
  SRR035022_2.fastq.gz  SRR035023_2.fastq.gz  SRR035024_2.fastq.gz  SRR035025_2.fastq.gz  SRR035026_2.fastq.gz  SRR035027_2.fastq.gz  SRR035669_2.fastq.gz  SRR622461_2.fastq.gz
 
  SRR035022_2.fastq.gz  SRR035023_2.fastq.gz  SRR035024_2.fastq.gz  SRR035025_2.fastq.gz  SRR035026_2.fastq.gz  SRR035027_2.fastq.gz  SRR035669_2.fastq.gz  SRR622461_2.fastq.gz
 
  SRR035022.fastq.gz    SRR035023.fastq.gz    SRR035024.fastq.gz    SRR035025.fastq.gz    SRR035026.fastq.gz    SRR035027.fastq.gz    SRR035669.fastq.gz    SRR622461.fastq.gz
 
  SRR035022.fastq.gz    SRR035023.fastq.gz    SRR035024.fastq.gz    SRR035025.fastq.gz    SRR035026.fastq.gz    SRR035027.fastq.gz    SRR035669.fastq.gz    SRR622461.fastq.gz
 +
 +
To see examples of FASTQ file.
 +
% zless ${GC}/examples/fastq/SRR035022_1.fastq.gz
  
 
This set of FASTQ are extracted from two samples from the 1000 Genomes project, focusing around 500kb region are CFTR.
 
This set of FASTQ are extracted from two samples from the 1000 Genomes project, focusing around 500kb region are CFTR.
Line 105: Line 108:
 
  NA06984.recal.bam      NA06984.recal.bam.bai.done  NA06984.recal.bam.metrics  NA12878.recal.bam      NA12878.recal.bam.bai.done  NA12878.recal.bam.metrics
 
  NA06984.recal.bam      NA06984.recal.bam.bai.done  NA06984.recal.bam.metrics  NA12878.recal.bam      NA12878.recal.bam.bai.done  NA12878.recal.bam.metrics
 
  NA06984.recal.bam.bai  NA06984.recal.bam.done      NA06984.recal.bam.qemp    NA12878.recal.bam.bai  NA12878.recal.bam.done      NA12878.recal.bam.qemp
 
  NA06984.recal.bam.bai  NA06984.recal.bam.done      NA06984.recal.bam.qemp    NA12878.recal.bam.bai  NA12878.recal.bam.done      NA12878.recal.bam.qemp
 +
 +
You can view the aligned sequence reads by running
 +
% samtools view align/bams/NA06984.recal.bam | less
  
 
Examine the QC metrics by
 
Examine the QC metrics by
Line 167: Line 173:
  
 
  % ls ${GC}/examples/bams
 
  % ls ${GC}/examples/bams
 +
NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12272.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai
 +
NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12273.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.CFTR.bam
 +
NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12273.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.CFTR.bam.bai
 +
NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12275.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam      NA12275.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai
 +
NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai  NA12282.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12282.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
 +
NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12283.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12283.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
 +
NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12286.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam      NA12286.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
 +
NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai  NA12287.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA07037.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.CFTR.bam      NA12287.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
 +
...
  
 +
To run SNP calling pipeline, we need to prepare index file corresponding to the BAM files with sample IDs
 +
 
  % cat ${GC}/examples/index/chr7.CFTR.low_coverage.index
 
  % cat ${GC}/examples/index/chr7.CFTR.low_coverage.index
 +
NA06984 EUR bams/NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA06985 EUR bams/NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA06986 EUR bams/NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA06989 EUR bams/NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA06994 EUR bams/NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA07000 EUR bams/NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA07037 EUR bams/NA07037.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.CFTR.bam
 +
NA07048 EUR bams/NA07048.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA07051 EUR bams/NA07051.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA07056 EUR bams/NA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
...
 +
 +
Also, we need a configuration file.
  
 
  % cat ${GC}/examples/index/chr7.CFTR.low_coverage.conf
 
  % cat ${GC}/examples/index/chr7.CFTR.low_coverage.conf
 +
CHRS = 7
 +
BAM_INDEX = index/chr7.CFTR.low_coverage.index
 +
############
 +
# References
 +
REF_ROOT = chr7Ref
 +
#
 +
REF = $(REF_ROOT)/hs37d5.chr7.fa
 +
INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19
 +
DBSNP_VCF =  $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz
 +
HM3_VCF =  $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
 +
OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz
 +
 +
Then we're ready to run SNP calling pipeline
 +
 +
% ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
 +
CHRS = 7
 +
BAM_INDEX = index/chr7.CFTR.low_coverage.index
 +
############
 +
# References
 +
REF_ROOT = chr7Ref
 +
#
 +
REF = $(REF_ROOT)/hs37d5.chr7.fa
 +
INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19
 +
DBSNP_VCF =  $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz
 +
HM3_VCF =  $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
 +
OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz
 +
[presenter02@uwcmg002 test]$ time ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
 +
Key configurations:
 +
GOTCLOUD_ROOT: /home/presenter02/day2/session1/uwcmg_2013_08/gotcloud
 +
OUT_DIR:      snps
 +
BAM_INDEX:    index/chr7.CFTR.low_coverage.index
 +
REF:          chr7Ref/hs37d5.chr7.fa
 +
CHRS:          7
 +
BATCH_TYPE:    local
 +
BATCH_OPTS:   
 +
 +
Processing the following steps...
 +
2: RUN_PILEUP
 +
3: RUN_GLFMULTIPLES
 +
4: RUN_VCFPILEUP
 +
5: RUN_FILTER
 +
6: RUN_SVM
 +
8: RUN_SPLIT
 +
Call region is 7:117000000-117500000
 +
Generating commands for chr7...
 +
Creating glf INDEX at 7:115000001-120000000..
 +
--------------------------------------------------------------------
 +
Finished creating makefile /home/presenter02/test/snps/umake.Makefile
 +
 +
Running /home/presenter02/test/snps/umake.Makefile
 +
 +
Start merging VCFs...
 +
Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.vcf into STDOUT...
 +
Start merging VCFs...
 +
Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.stats.vcf into STDOUT...
 +
 +
The following parameters are available.  Ones with "[]" are in effect:
 +
 +
Available Options
 +
                          Recipes : --write-bed, --write-vcf [ON], --upgrade,
 +
                                    --summarize, --filter [ON], --subset
 +
                VCF Input options : --in-vcf [/home/presenter02/test/snps/vcfs/chr7/chr7.merged.stats.vcf]
 +
                BED Input options : --in-bfile [], --in-bed [], --in-bim [],
 +
                                    --in-fam [],
 +
                                    --ref [/data/local/ref/karma.ref/human.g1k.v37.fa]
 +
              Subsetting options : --in-subset [], --mono-subset,
 +
                                    --filt-only-subset
 +
                  Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf],
 +
                                    --qGeno, --print-every [10000]
 +
      Output compression Options : --plain [ON], --bgzf, --gzip
 +
    Genotype-level Filter Options : --minGQ, --minGD
 +
                  Filter Options : --winIndel [5],
 +
                                    --indelVCF [/home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf],
 +
                                    --minQUAL [5], --minMQ [20],
 +
                                    --maxDP [99000], --minDP [99],
 +
                                    --maxABL [70], --winFFRQ, --maxFFRQ,
 +
                                    --winFVAR, --merFVAR, --maxFVAR,
 +
                                    --minNS [1], --maxSTP [2147483647],
 +
                                    --maxTTT [2147483647],
 +
                                    --minTTT [-2147483648], --maxSTR [20],
 +
                                    --minSTR [-20], --maxSTZ [5], --minSTZ [-5],
 +
                                    --maxCBR [20], --minCBR [-100],
 +
                                    --maxQBR [100], --minQBR [-100],
 +
                                    --maxCBZ [2147483647], --maxCSR [100],
 +
                                    --minCSR [-100], --maxLQZ [2147483647],
 +
                                    --minLQZ [-2147483648],
 +
                                    --maxRBZ [2147483647],
 +
                                    --minRBZ [-2147483648],
 +
                                    --maxIOZ [2147483647],
 +
                                    --minIOR [-2147483648],
 +
                                    --maxIOR [2147483647],
 +
                                    --maxAOZ [2147483647], --maxAOI [5],
 +
                                    --maxMQ0 [10], --maxMQ10 [100],
 +
                                    --maxMQ20 [100], --minFIC [-20],
 +
                                    --minABE [-100], --maxABE [100],
 +
                                    --maxLQR [30], --minMBR [-100],
 +
                                    --maxMBR [100], --minABZ [-2147483648],
 +
                                    --maxABZ [2147483647],
 +
                                    --maxBCS [2147483647], --keepFilter
 +
 +
Analysis started on Tue Aug  6 12:25:04 2013
 +
 +
The following filters are in effect:
 +
m20 : MQ >= 20.00
 +
dp99 : DP >= 99.00
 +
DP99000 : DP <= 99000.00
 +
ns1 : NS >= 1.00
 +
AB70 : AB <= 0.70
 +
STR20 : STR <= 0.20
 +
str-20 : STR >= -0.20
 +
stz-5 : STZ >= -5.00
 +
STZ5 : STZ <= 5.00
 +
CBR20 : CBR <= 0.20
 +
AOI5 : AOI <= 5.00
 +
MQ0_10 : MQ0 <= 0.10
 +
fic-20 : FIC >= -0.20
 +
LQR30 : LQR <= 0.30
 +
q5 : QUAL >= 5
 +
INDEL5 : INDEL >= 5 bp with /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf
 +
 +
Writing to VCF file /home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf
 +
Analysis finished on Tue Aug  6 12:25:05 2013
 +
 +
loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input..
 +
finished loading dbSNP for 7415 variants
 +
TOTAL 1538 1538 662 378 166 25 145 129 27 6
 +
 +
The following parameters are available.  Ones with "[]" are in effect:
 +
 +
Available Options
 +
                Input options : --in [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw],
 +
                                --ignore
 +
                Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.norm],
 +
                                --digits [3], --rank, --verbose [ON]
 +
    Output compression Options : --plain [ON], --bgzf, --gzip
 +
 +
Analysis started on Tue Aug  6 12:25:05 2013
 +
 +
Reading Input File /home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw
 +
Finished Reading 1538 lines
 +
Starting sorting values
 +
Writing output values
 +
Analysis finished on Tue Aug  6 12:25:05 2013
 +
 +
loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input..
 +
finished loading dbSNP for 7415 variants
 +
TOTAL 1538 1538 662 378 166 25 145 129 27 6
 +
  Commands finished in 306 secs with no errors reported
 +
--------------------------------------------------------------------
  
% time ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
+
We can summarize the quality of variants.
  
 
  % cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary
 
  % cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary
 +
--------------------------------------------------------------------------------------------------------------
 +
              FILTER #SNPs #dbSNP %dbSNP %CpG %CpG %Known %Novel %nCpG-K %nCpG-N %HM3 %HM3
 +
                    Known Novel Ts/Tv Ts/Tv Ts/Tv Ts/Tv sens /SNP
 +
--------------------------------------------------------------------------------------------------------------
 +
              INDEL5 11 8 72.7 25.0 0.0 3.00 0.50 2.00 0.50 0.000 0.000
 +
          INDEL5;SVM 8 4 50.0 50.0 25.0 0.33 0.33 1.00 0.00 0.000 0.000
 +
                PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095
 +
                  SVM 131 33 25.2 9.1 9.2 1.06 0.85 0.88 0.78 0.000 0.000
 +
--------------------------------------------------------------------------------------------------------------
 +
              FILTER #SNPs #dbSNP %dbSNP %CpG %CpG %Known %Novel %nCpG-K %nCpG-N %HM3 %HM3
 +
                    Known Novel Ts/Tv Ts/Tv Ts/Tv Ts/Tv sens /SNP
 +
--------------------------------------------------------------------------------------------------------------
 +
              INDEL5 19 12 63.2 33.3 14.3 1.40 0.40 1.67 0.20 0.000 0.000
 +
                PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095
 +
                  SVM 139 37 26.6 13.5 9.8 0.95 0.82 0.88 0.74 0.000 0.000
 +
--------------------------------------------------------------------------------------------------------------
 +
                PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095
 +
                FAIL 150 45 30.0 15.6 9.5 1.14 0.81 1.00 0.73 0.000 0.000
 +
                TOTAL 1538 1231 80.0 15.5 10.7 2.05 1.27 1.75 1.12 85.083 10.013
 +
------------------------------------------------------------------------------------------------------------
  
% time ${GC}/gotcloud/gotcloud beagle --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
+
To run LD-refinement you can run beagle only (gotcloud beagle) or beagle+thunder (gotcloud ldrefine) to improve the quality of genotypes.
 +
 
 +
If you're interested in looking at individual reads from the variant, samtools tview provides a good alternative to IGV. In order to see CFTR R75Q variant in one of the variant carriers, you can run
  
 
  % samtools tview ${GC}/examples/bams/NA12843.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam ${GC}/examples/chr7Ref/hs37d5.chr7.fa
 
  % samtools tview ${GC}/examples/bams/NA12843.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam ${GC}/examples/chr7Ref/hs37d5.chr7.fa
 +
 +
And follow the steps
 +
* Type ‘g’
 +
* Type 7:117149147
 +
* Type ‘n’ to color by nucleotide
 +
* Move with left arrow to see the variant site clearly
 +
 +
% ${GC}/gotcloud/gotcloud beagle --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
 +
Key configurations:
 +
GOTCLOUD_ROOT: /home/presenter02/day2/session1/uwcmg_2013_08/gotcloud
 +
OUT_DIR:      snps
 +
BAM_INDEX:    index/chr7.CFTR.low_coverage.index
 +
REF:          chr7Ref/hs37d5.chr7.fa
 +
CHRS:          7
 +
BATCH_TYPE:    local
 +
BATCH_OPTS:   
 +
 +
Processing the following steps...
 +
9: RUN_BEAGLE
 +
10: RUN_SUBSET
 +
Call region is 7:117000000-117500000
 +
Generating commands for chr7...
 +
--------------------------------------------------------------------
 +
Finished creating makefile /home/presenter02/test/snps/umake.Makefile
 +
 +
Running /home/presenter02/test/snps/umake.Makefile
 +
 +
  Commands finished in 74 secs with no errors reported
 +
--------------------------------------------------------------------
  
 
== STEP 4 : EPACTS association analysis  ==  
 
== STEP 4 : EPACTS association analysis  ==  
% head ${GC}/examples/index/chr7.CFTR.ped
 
  
 +
In addition to variant calls, we need phenotype information to perform association mapping. Here, the phenotypes are created so that case is enriched in the individuals carrying rare nonsynonymous variants at CFTR.
 +
 +
% less ${GC}/examples/index/chr7.CFTR.ped
 +
#FAM_ID IND_ID DAD_ID MOM_ID SEX PHENO QT
 +
NA06984 NA06984 0 0 0 1 -0.657149296617419
 +
NA06985 NA06985 0 0 0 2 -0.449034856685281
 +
NA06986 NA06986 0 0 0 1 0.0975849179986626
 +
NA06989 NA06989 0 0 0 2 1.77095069670763
 +
NA06994 NA06994 0 0 0 1 0.287475900193007
 +
NA07000 NA07000 0 0 0 1 -1.36632872407691
 +
NA07037 NA07037 0 0 0 2 -0.1278913321612
 +
NA07048 NA07048 0 0 0 2 -1.45798868745693
 +
NA07051 NA07051 0 0 0 1 -0.599618650565132
 +
...
 +
 +
Create a directory to store association results
 
  % mkdir assoc
 
  % mkdir assoc
  
  % time ${GC}/epacts/bin/epacts single  --ped ${GC}/examples/index/chr7.CFTR.ped  --vcf ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz  --pheno PHENO --out assoc/single.b.score --test b.score --anno  --ref ${GC}/examples/chr7Ref/hs37d5.chr7.fa  --region 7:117000000-117500000 --run 1
+
Run single variant logistic score test.
 +
  % ${GC}/epacts/bin/epacts single  --ped ${GC}/examples/index/chr7.CFTR.ped  --vcf ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz  --pheno PHENO --out assoc/single.b.score --test b.score --anno  --ref ${GC}/examples/chr7Ref/hs37d5.chr7.fa  --region 7:117000000-117500000 --run 1
 +
Successfully written phenotypes and 0 covariates across 99 individuals
 +
Processing chromosome 7...
 +
Finished generating EPACTS Makefile
 +
Running 1 parallel jobs of EPACTS
 +
make -f /home/presenter02/test.v1/assoc/single.b.score.Makefile -j 1
 +
Loading required package: epactsR
 +
Sucessfully wrote ( 1388 * 10 ) matrix
 +
The following parameters are available.  Ones with "[]" are in effect:
 +
 +
Available Options
 +
    Required Parameters :
 +
                          -i [/home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts]
 +
                          -o [/home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp]
 +
          Gene Annotation
 +
              Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz]
 +
                          -r [/home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa]
 +
                          --inputFormat [epacts], --checkReference, -f [refGene]
 +
                          -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt]
 +
                          -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt]
 +
                          -u [], -d [], --se [], --si [], --outputFormat []
 +
  Other Annotation Tools : --genomeScore [], --bed [], --tabix []
 +
Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa...
 +
DONE: 1 chromosomes and 159138663 bases are loaded.
 +
Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt...
 +
DONE: codon file loaded.
 +
Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt...
 +
DONE: 24 priority annotation types loaded.
 +
Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz...
 +
DONE: 92627 gene loaded.
 +
DONE: Generated frequency of each annotype type in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq ].
 +
DONE: Generated frequency of each highest priority annotation type in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq ].
 +
Ts/Tv ratio: 2.02397
 +
Ts observed: 929  times; Tv observed: 459 times.
 +
DONE: Generated frequency of each base change in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq ].
 +
DONE: Generated frequency of each codon change in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq ].
 +
DONE: Generated frequency of indel length in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq ].
 +
nsnps = 866
 +
Rscript /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/epactsSingle.R --vanilla /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../ /home/presenter02/test.v1/assoc/single.b.score.phe NULL /home/presenter02/test.v1/assoc/single.b.score.ind /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz 7:117000000-117500000 /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts GT 1e-06 1 3 1000000000 0.5 0 FALSE single.b.score
 +
NULL
 +
NULL
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-anno --in /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts --ref /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts -r /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp --inputFormat epacts -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt
 +
..............................................       
 +
  ...      Anno(tation)                      ...     
 +
  ...      Xiaowei Zhan, Goncalo Abecasis    ...     
 +
    ...      Speical Thanks:                    ...   
 +
    ...      Hyun Ming Kang, Yanming Li        ...   
 +
      ...      zhanxw@umich.edu                    ... 
 +
      ...      Sep 2011                            ...
 +
        ................................................
 +
                                                       
 +
DONE: 1388 varaints are annotated.
 +
DONE: Generated annotation output in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp ].
 +
Annotation succeed!
 +
mv /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts
 +
rm /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.log /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-cat /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts | /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c > /home/presenter02/test.v1/assoc/single.b.score.epacts.gz
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -f -pbed /home/presenter02/test.v1/assoc/single.b.score.epacts.gz
 +
rm -f /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-plot --in /home/presenter02/test.v1/assoc/single.b.score.epacts.gz --region 7:0
 +
Rscript /home/presenter02/test.v1/assoc/single.b.score.epacts.R --vanilla
 +
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test.v1/assoc/single.b.score.epacts.qq.eps
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test.v1/assoc/single.b.score.epacts.mh.eps
 +
rm /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd /home/presenter02/test.v1/assoc/single.b.score.epacts.qq.eps /home/presenter02/test.v1/assoc/single.b.score.epacts.mh.eps /home/presenter02/test.v1/assoc/single.b.score.epacts.*.dat
 +
zcat /home/presenter02/test.v1/assoc/single.b.score.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /home/presenter02/test.v1/assoc/single.b.score.epacts.top5000
 +
touch /home/presenter02/test.v1/assoc/single.b.score.epacts.OK
 +
d-69-91-154-62:~ hmkang$ cat tmp.txt | perl -lane 's/test\.v1/test/g; print " $_"'
 +
uccessfully written phenotypes and 0 covariates across 99 individuals
 +
Processing chromosome 7...
 +
Finished generating EPACTS Makefile
 +
Running 1 parallel jobs of EPACTS
 +
make -f /home/presenter02/test/assoc/single.b.score.Makefile -j 1
 +
Loading required package: epactsR
 +
Sucessfully wrote ( 1388 * 10 ) matrix
 +
The following parameters are available.  Ones with "[]" are in effect:
 +
 +
Available Options
 +
    Required Parameters :
 +
                          -i [/home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts]
 +
                          -o [/home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp]
 +
          Gene Annotation
 +
              Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz]
 +
                          -r [/home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa]
 +
                          --inputFormat [epacts], --checkReference, -f [refGene]
 +
                          -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt]
 +
                          -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt]
 +
                          -u [], -d [], --se [], --si [], --outputFormat []
 +
  Other Annotation Tools : --genomeScore [], --bed [], --tabix []
 +
Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa...
 +
DONE: 1 chromosomes and 159138663 bases are loaded.
 +
Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt...
 +
DONE: codon file loaded.
 +
Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt...
 +
DONE: 24 priority annotation types loaded.
 +
Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz...
 +
DONE: 92627 gene loaded.
 +
DONE: Generated frequency of each annotype type in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq ].
 +
DONE: Generated frequency of each highest priority annotation type in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq ].
 +
Ts/Tv ratio: 2.02397
 +
Ts observed: 929  times; Tv observed: 459 times.
 +
DONE: Generated frequency of each base change in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq ].
 +
DONE: Generated frequency of each codon change in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq ].
 +
DONE: Generated frequency of indel length in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq ].
 +
nsnps = 866
 +
Rscript /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/epactsSingle.R --vanilla /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../ /home/presenter02/test/assoc/single.b.score.phe NULL /home/presenter02/test/assoc/single.b.score.ind /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz 7:117000000-117500000 /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts GT 1e-06 1 3 1000000000 0.5 0 FALSE single.b.score
 +
NULL
 +
NULL
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-anno --in /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts --ref /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts -r /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp --inputFormat epacts -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt
 +
..............................................       
 +
  ...      Anno(tation)                      ...     
 +
  ...      Xiaowei Zhan, Goncalo Abecasis    ...     
 +
    ...      Speical Thanks:                    ...   
 +
    ...      Hyun Ming Kang, Yanming Li        ...   
 +
      ...      zhanxw@umich.edu                    ... 
 +
      ...      Sep 2011                            ...
 +
        ................................................
 +
                                                       
 +
DONE: 1388 varaints are annotated.
 +
DONE: Generated annotation output in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp ].
 +
Annotation succeed!
 +
mv /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts
 +
rm /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.log /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-cat /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts | /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c > /home/presenter02/test/assoc/single.b.score.epacts.gz
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -f -pbed /home/presenter02/test/assoc/single.b.score.epacts.gz
 +
rm -f /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-plot --in /home/presenter02/test/assoc/single.b.score.epacts.gz --region 7:0
 +
Rscript /home/presenter02/test/assoc/single.b.score.epacts.R --vanilla
 +
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test/assoc/single.b.score.epacts.cmd
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test/assoc/single.b.score.epacts.qq.eps
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test/assoc/single.b.score.epacts.cmd
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test/assoc/single.b.score.epacts.mh.eps
 +
rm /home/presenter02/test/assoc/single.b.score.epacts.cmd /home/presenter02/test/assoc/single.b.score.epacts.qq.eps /home/presenter02/test/assoc/single.b.score.epacts.mh.eps /home/presenter02/test/assoc/single.b.score.epacts.*.dat
 +
zcat /home/presenter02/test/assoc/single.b.score.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /home/presenter02/test/assoc/single.b.score.epacts.top5000
 +
touch /home/presenter02/test/assoc/single.b.score.epacts.OK
 +
 
 +
To examine the top signals of association, try
 +
 
 +
% head assoc/single.b.score.epacts.top5000
 +
#CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE NS.CASE NS.CTRL AF.CASE AF.CTRL
 +
7 117130755 117130755 7:117130755_C/G_Intron:CFTR 99 9 1 0.045455 0.0018407 3.1148 50 49 0.09 0
 +
7 117149147 117149147 7:117149147_G/A_Nonsynonymous:CFTR 99 9 1 0.045455 0.0018407 3.1148 50 49 0.09 0
 +
7 117120141 117120141 7:117120141_G/C_Utr5:CFTR 99 14 1 0.070707 0.0034417 -2.9253 50 49 0.02 0.12245
 +
7 117356483 117356483 7:117356483_C/T_Intron:CTTNBP2 99 11 1 0.055556 0.0035695 -2.9139 50 49 0.01 0.10204
 +
7 117459023 117459023 7:117459023_A/G_Intron:CTTNBP2 99 121 1 0.38889 0.0047674 2.8223 5049 0.71 0.5102
 +
7 117430909 117430909 7:117430909_T/C_Intron:CTTNBP2 99 10 1 0.050505 0.0068926 -2.702 50 49 0.01 0.091837
 +
7 117456904 117456904 7:117456904_C/T_Intron:CTTNBP2 99 115 1 0.41919 0.0076959 2.6651 5049 0.68 0.47959
 +
7 117457141 117457141 7:117457141_G/C_Intron:CTTNBP2 99 115 1 0.41919 0.0076959 2.6651 5049 0.68 0.47959
 +
7 117467003 117467003 7:117467003_T/A_Intron:CTTNBP2 99 122 1 0.38384 0.0077344 2.6634 5049 0.71 0.52041
 +
 
 +
Also you can examine the PDF output file (although it is a tiny fraction of genome), if your X11 works.
 +
 
 +
% xpdf assoc/single.b.score.epacts.qq.pdf
 +
% xpdf assoc/single.b.score.epacts.mh.pdf
 +
 
 +
Running variant annotation using EPACTS is also possible
  
 
  % time ${GC}/epacts/bin/epacts anno --in ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.vcf.gz --ref ${GC}examples/chr7Ref/hs37d5.chr7.fa
 
  % time ${GC}/epacts/bin/epacts anno --in ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.vcf.gz --ref ${GC}examples/chr7Ref/hs37d5.chr7.fa
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz -r /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o snps/chr7.filtered.PASS.beagled.anno.vcf.gz --inputFormat vcf -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt
 +
The following parameters are available.  Ones with "[]" are in effect:
 +
 +
Available Options
 +
    Required Parameters :
 +
                          -i [/home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz]
 +
                          -o [snps/chr7.filtered.PASS.beagled.anno.vcf.gz]
 +
          Gene Annotation
 +
              Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz]
 +
                          -r [/home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa]
 +
                          --inputFormat [vcf], --checkReference, -f [refGene]
 +
                          -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt]
 +
                          -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt]
 +
                          -u [], -d [], --se [], --si [], --outputFormat []
 +
  Other Annotation Tools : --genomeScore [], --bed [], --tabix []
 +
Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa...
 +
DONE: 1 chromosomes and 159138663 bases are loaded.
 +
Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt...
 +
DONE: codon file loaded.
 +
Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt...
 +
DONE: 24 priority annotation types loaded.
 +
Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz...
 +
DONE: 92627 gene loaded.
 +
DONE: Generated frequency of each annotype type in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.anno.frq ].
 +
DONE: Generated frequency of each highest priority annotation type in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.top.anno.frq ].
 +
Ts/Tv ratio: 2.02397
 +
Ts observed: 929  times; Tv observed: 459 times.
 +
DONE: Generated frequency of each base change in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.base.frq ].
 +
DONE: Generated frequency of each codon change in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.codon.frq ].
 +
DONE: Generated frequency of indel length in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.indel.frq ].
 +
..............................................       
 +
  ...      Anno(tation)                      ...     
 +
  ...      Xiaowei Zhan, Goncalo Abecasis    ...     
 +
    ...      Speical Thanks:                    ...   
 +
    ...      Hyun Ming Kang, Yanming Li        ...   
 +
      ...      zhanxw@umich.edu                    ... 
 +
      ...      Sep 2011                            ...
 +
        ................................................
 +
                                                       
 +
DONE: 1388 varaints are annotated.
 +
DONE: Generated annotation output in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz ].
 +
Annotation succeed!
 +
mv snps/chr7.filtered.PASS.beagled.anno.vcf.gz /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp > snps/chr7.filtered.PASS.beagled.anno.vcf.gz
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -pvcf -f snps/chr7.filtered.PASS.beagled.anno.vcf.gz
 +
rm /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp
 +
rm snps/chr7.filtered.PASS.beagled.anno.vcf.gz.log snps/chr7.filtered.PASS.beagled.anno.vcf.gz.top.anno.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.anno.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.base.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.codon.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.indel.frq
 +
 +
You can look at example non-synonymous SNPs.
  
 
  % zcat snps/chr7.filtered.PASS.beagled.anno.vcf.gz | grep Nonsynonymous | grep CFTR | cut -f 1-8 | head -1
 
  % zcat snps/chr7.filtered.PASS.beagled.anno.vcf.gz | grep Nonsynonymous | grep CFTR | cut -f 1-8 | head -1
 +
 +
The following examples running gene-level burden test (skipped for the workshop),for your record.
  
 
  % ${GC}/epacts/bin/epacts make-group --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.grp –nonsyn
 
  % ${GC}/epacts/bin/epacts make-group --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.grp –nonsyn
Line 201: Line 663:
 
== STEP 5 : All-in-one running of all scripts (for later use) ==  
 
== STEP 5 : All-in-one running of all scripts (for later use) ==  
  
  % tar xzvf /home/presenter02/day2/session1/uwcmg_2013_08.tar.gz�
+
  % tar xzvf /home/presenter02/day2/session1/uwcmg_2013_08.tar.gz
  
  % cd /home/presenter02/day2/session1/uwcmg_2013_08�
+
  % cd /home/presenter02/day2/session1/uwcmg_2013_08
  
  % sh go.sh
+
  % sh go.sh

Latest revision as of 16:15, 6 August 2013

GotCloud / EPACTS Tutorial

In this tutorial, we illustrate some of the essential steps in the analysis of next generation sequence data.

For a background on GotCloud and Sequence Analysis Pipelines, see GotCloud

While GotCloud can run on a cluster of machines or instances, this tutorial is just a small test that just runs on the machine the commands are run on.

This tutorial is a specialized version for UW Center for Mendelian Genomics. For other tutorial, please see Tutorial: GotCloud.

STEP 0 : Login to the cluster

In UW CMG cluster, you need to login to one of the high-performance cluster node via the following way.

% ssh yourid@uwcmg-head.gs.washington.edu
Password:
% qlogin
Your job 120 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled …
Your interactive job 120 has been successfully scheduled.
Establishing /usr/local/bin/qlogin_command session to host uwcmg001.gs.washington.edu ...
yourid@uwcmg001.gs.washington.edu's password: 

Step 1 : GotCloud Alignment Pipeline

In order to run this tutorial, you need to make sure you have GotCloud installed on your system.

Try to run the following commands sequentially to follow the instruction.

First, let's set an environment variable for your convenience.

% export GC=/home/presenter02/day2/session1/uwcmg_2013_08/

The raw sequence reads can be found by

% ls ${GC}/examples/fastq
SRR035022_1.fastq.gz  SRR035023_1.fastq.gz  SRR035024_1.fastq.gz  SRR035025_1.fastq.gz  SRR035026_1.fastq.gz  SRR035027_1.fastq.gz  SRR035669_1.fastq.gz  SRR622461_1.fastq.gz
SRR035022_2.fastq.gz  SRR035023_2.fastq.gz  SRR035024_2.fastq.gz  SRR035025_2.fastq.gz  SRR035026_2.fastq.gz  SRR035027_2.fastq.gz  SRR035669_2.fastq.gz  SRR622461_2.fastq.gz
SRR035022.fastq.gz    SRR035023.fastq.gz    SRR035024.fastq.gz    SRR035025.fastq.gz    SRR035026.fastq.gz    SRR035027.fastq.gz    SRR035669.fastq.gz    SRR622461.fastq.gz

To see examples of FASTQ file.

% zless ${GC}/examples/fastq/SRR035022_1.fastq.gz

This set of FASTQ are extracted from two samples from the 1000 Genomes project, focusing around 500kb region are CFTR.

The example directory contain other set of files, such as reference sequence, index, and BAM files (for variant calling).

% ls ${GC}/examples
bams  chr7Ref  fastq  index

In order to run alignment pipeline, we need an index file for FASTQ files.

% cat ${GC}/examples/index/chr7.CFTR.fastq.index
MERGE_NAME	FASTQ1	FASTQ2	RGID	SAMPLE	LIBRARY	CENTER	PLATFORM
NA06984	fastq/SRR035022.fastq.gz	.	SRR035022	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035022_1.fastq.gz	fastq/SRR035022_2.fastq.gz	SRR035022	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035023.fastq.gz	.	SRR035023	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035023_1.fastq.gz	fastq/SRR035023_2.fastq.gz	SRR035023	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035024.fastq.gz	.	SRR035024	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035024_1.fastq.gz	fastq/SRR035024_2.fastq.gz	SRR035024	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035025.fastq.gz	.	SRR035025	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035025_1.fastq.gz	fastq/SRR035025_2.fastq.gz	SRR035025	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035026.fastq.gz	.	SRR035026	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035026_1.fastq.gz	fastq/SRR035026_2.fastq.gz	SRR035026	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035027.fastq.gz	.	SRR035027	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035027_1.fastq.gz	fastq/SRR035027_2.fastq.gz	SRR035027	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035669.fastq.gz	.	SRR035669	NA06984	Solexa-16556	BI	ILLUMINA
NA06984	fastq/SRR035669_1.fastq.gz	fastq/SRR035669_2.fastq.gz	SRR035669	NA06984	Solexa-16556	BI	ILLUMINA
NA12878	fastq/SRR622461.fastq.gz	.	SRR622461	NA12878	Illumina_NA12878	ILLUMINA	ILLUMINA
NA12878	fastq/SRR622461_1.fastq.gz	fastq/SRR622461_2.fastq.gz	SRR622461	NA12878	Illumina_NA12878	ILLUMINA	ILLUMINA

Also, we need a configuration file for GotCloud (default configuration usually works if you install GotCloud on your own, but here we're using reduced examples).

% cat ${GC}/examples/index/chr7.CFTR.align.conf
INDEX_FILE = index/chr7.CFTR.fastq.index
###################
# References
REF_DIR = chr7Ref
AS = NCBI37
REF = $(REF_DIR)/hs37d5.chr7.fa
DBSNP_VCF =  $(REF_DIR)/dbsnp_135.b37.chr7.CFTR.vcf.gz
HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz

Create a test directory and enter it for running alignment pipeline

% mkdir test
% cd test

Then, run the alignment pipeline through the following command

% ${GC}/gotcloud/gotcloud align --conf ${GC}/examples/index/chr7.CFTR.align.conf --outDir align --baseprefix ${GC}/examples
Retrieved 168 bytes.
Random chance is 54  (compared to 30)
File sizes of 32 FASTQ input files referenced in '/home/presenter02/day2/session1/uwcmg_2013_08/examples/index/chr7.CFTR.fastq.index' = 6849341.4
Size of BAMs from aligner will be about 0.01 GB
Intermediate files from snpcaller will be about 0.01 GB
Final VCF output from snpcaller will be about 0.00 GB
Be sure you have enough space to hold all this data
Created /home/presenter02/test/align/Makefiles/align_NA06984.Makefile
Created /home/presenter02/test/align/Makefiles/align_NA12878.Makefile
---------------------------------------------------------------------
Waiting while samples are processed...
Processing finished in 42 secs with no errors reported

Let's see where the BAM files are generated

% ls align/bams
NA06984.recal.bam      NA06984.recal.bam.bai.done  NA06984.recal.bam.metrics  NA12878.recal.bam      NA12878.recal.bam.bai.done  NA12878.recal.bam.metrics
NA06984.recal.bam.bai  NA06984.recal.bam.done      NA06984.recal.bam.qemp     NA12878.recal.bam.bai  NA12878.recal.bam.done      NA12878.recal.bam.qemp

You can view the aligned sequence reads by running

% samtools view align/bams/NA06984.recal.bam | less

Examine the QC metrics by

% ls align/QCFiles
NA06984.genoCheck.depthRG  NA06984.genoCheck.selfRG  NA06984.qplot.R            NA12878.genoCheck.depthSM  NA12878.genoCheck.selfSM  NA12878.qplot.stats
NA06984.genoCheck.depthSM  NA06984.genoCheck.selfSM  NA06984.qplot.stats        NA12878.genoCheck.done     NA12878.qplot.done
NA06984.genoCheck.done     NA06984.qplot.done        NA12878.genoCheck.depthRG  NA12878.genoCheck.selfRG   NA12878.qplot.R

Text-version of QC metrics looks as follows

% cat align/QCFiles/NA06984.qplot.stats
Stats\BAM	/home/presenter02/test2/align/bams/NA06984.recal.bam
TotalReads(e6)	0.06
MappingRate(%)	97.44
MapRate_MQpass(%)	97.44
TargetMapping(%)	0.00
ZeroMapQual(%)	0.56
MapQual<10(%)	0.70
PairedReads(%)	98.27
ProperPaired(%)	92.83
MappedBases(e9)	0.00
Q20Bases(e9)	0.00
Q20BasesPct(%)	95.27
MeanDepth	7.33
GenomeCover(%)	0.33
EPS_MSE	293.97
EPS_Cycle_Mean	25.06
GCBiasMSE	0.00
ISize_mode	335
ISize_medium	331
DupRate(%)	2.05
QCFailRate(%)	0.00
BaseComp_A(%)	31.8
BaseComp_C(%)	18.4
BaseComp_G(%)	18.1
BaseComp_T(%)	31.6
BaseComp_O(%)	0.0

You can also run the R command to create PDF files (see lecture slides for examples).

To check whether the samples are contaminated, look at the verifyBamID results.

% cat align/QCFiles/NA06984.genoCheck.selfSM
#SEQ_ID	RG	CHIP_ID	#SNPS	#READS	AVG_DP	FREEMIX	FREELK1	FREELK0	FREE_RH	FREE_RA	CHIPMIX	CHIPLK1	CHIPLK0	CHIP_RH	CHIP_RA	DPREF	RDPHET	RDPALT
NA06984	ALL	NA	175	1038	5.93	0.00841	351.80	351.85	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

STEP 3 : Run GotCloud SNP Calling Pipeline

The next step is to analyze BAM files by calling SNPs and generating a VCF file containing the variant calls.

The variant calling pipeline has multiple built-in steps to generate BAMs:

  1. Filter out reads with low mapping quality
  2. Per Base Alignment Quality Adjustment (BAQ)
  3. Resolve overlapping paired end reads
  4. Generate genotype likelihood files
  5. Perform variant calling
  6. Extract features from variant sites
  7. Perform variant filtering


To speed variant calling, each chromosome is broken up into smaller regions which are processed separately. While initially split by sample, the per sample data gets merged and is processed together for each region. These regions are later merged to result in a single Variant Call File (VCF) per chromosome.

% ls ${GC}/examples/bams
NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12272.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai
NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12273.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.CFTR.bam
NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12273.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.CFTR.bam.bai
NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12275.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam      NA12275.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai
NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai  NA12282.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12282.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12283.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12283.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12286.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam      NA12286.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai  NA12287.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
NA07037.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.CFTR.bam      NA12287.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
...

To run SNP calling pipeline, we need to prepare index file corresponding to the BAM files with sample IDs

% cat ${GC}/examples/index/chr7.CFTR.low_coverage.index
NA06984	EUR	bams/NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
NA06985	EUR	bams/NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
NA06986	EUR	bams/NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
NA06989	EUR	bams/NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
NA06994	EUR	bams/NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
NA07000	EUR	bams/NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
NA07037	EUR	bams/NA07037.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.CFTR.bam
NA07048	EUR	bams/NA07048.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
NA07051	EUR	bams/NA07051.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
NA07056	EUR	bams/NA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
...

Also, we need a configuration file.

% cat ${GC}/examples/index/chr7.CFTR.low_coverage.conf
CHRS = 7
BAM_INDEX = index/chr7.CFTR.low_coverage.index 
############
# References
REF_ROOT = chr7Ref
#
REF = $(REF_ROOT)/hs37d5.chr7.fa
INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19
DBSNP_VCF =  $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz
HM3_VCF =  $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz

Then we're ready to run SNP calling pipeline

% ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
CHRS = 7
BAM_INDEX = index/chr7.CFTR.low_coverage.index 
############
# References
REF_ROOT = chr7Ref
#
REF = $(REF_ROOT)/hs37d5.chr7.fa
INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19
DBSNP_VCF =  $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz
HM3_VCF =  $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz
[presenter02@uwcmg002 test]$ time ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
Key configurations:
GOTCLOUD_ROOT: /home/presenter02/day2/session1/uwcmg_2013_08/gotcloud
OUT_DIR:       snps
BAM_INDEX:     index/chr7.CFTR.low_coverage.index 
REF:           chr7Ref/hs37d5.chr7.fa
CHRS:          7
BATCH_TYPE:    local
BATCH_OPTS:    

Processing the following steps...
2: RUN_PILEUP
3: RUN_GLFMULTIPLES
4: RUN_VCFPILEUP
5: RUN_FILTER
6: RUN_SVM
8: RUN_SPLIT
Call region is 7:117000000-117500000
Generating commands for chr7...
Creating glf INDEX at 7:115000001-120000000..
--------------------------------------------------------------------
Finished creating makefile /home/presenter02/test/snps/umake.Makefile

Running /home/presenter02/test/snps/umake.Makefile

Start merging VCFs...
Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.vcf into STDOUT...
Start merging VCFs...
Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.stats.vcf into STDOUT...

The following parameters are available.  Ones with "[]" are in effect:

Available Options
                         Recipes : --write-bed, --write-vcf [ON], --upgrade,
                                   --summarize, --filter [ON], --subset
               VCF Input options : --in-vcf [/home/presenter02/test/snps/vcfs/chr7/chr7.merged.stats.vcf]
               BED Input options : --in-bfile [], --in-bed [], --in-bim [],
                                   --in-fam [],
                                   --ref [/data/local/ref/karma.ref/human.g1k.v37.fa]
              Subsetting options : --in-subset [], --mono-subset,
                                   --filt-only-subset
                  Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf],
                                   --qGeno, --print-every [10000]
      Output compression Options : --plain [ON], --bgzf, --gzip
   Genotype-level Filter Options : --minGQ, --minGD
                  Filter Options : --winIndel [5],
                                   --indelVCF [/home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf],
                                   --minQUAL [5], --minMQ [20],
                                   --maxDP [99000], --minDP [99],
                                   --maxABL [70], --winFFRQ, --maxFFRQ,
                                   --winFVAR, --merFVAR, --maxFVAR,
                                   --minNS [1], --maxSTP [2147483647],
                                   --maxTTT [2147483647],
                                   --minTTT [-2147483648], --maxSTR [20],
                                   --minSTR [-20], --maxSTZ [5], --minSTZ [-5],
                                   --maxCBR [20], --minCBR [-100],
                                   --maxQBR [100], --minQBR [-100],
                                   --maxCBZ [2147483647], --maxCSR [100],
                                   --minCSR [-100], --maxLQZ [2147483647],
                                   --minLQZ [-2147483648],
                                   --maxRBZ [2147483647],
                                   --minRBZ [-2147483648],
                                   --maxIOZ [2147483647],
                                   --minIOR [-2147483648],
                                   --maxIOR [2147483647],
                                   --maxAOZ [2147483647], --maxAOI [5],
                                   --maxMQ0 [10], --maxMQ10 [100],
                                   --maxMQ20 [100], --minFIC [-20],
                                   --minABE [-100], --maxABE [100],
                                   --maxLQR [30], --minMBR [-100],
                                   --maxMBR [100], --minABZ [-2147483648],
                                   --maxABZ [2147483647],
                                   --maxBCS [2147483647], --keepFilter

Analysis started on Tue Aug  6 12:25:04 2013

The following filters are in effect:
m20 : MQ >= 20.00
dp99 : DP >= 99.00
DP99000 : DP <= 99000.00
ns1 : NS >= 1.00
AB70 : AB <= 0.70
STR20 : STR <= 0.20
str-20 : STR >= -0.20
stz-5 : STZ >= -5.00
STZ5 : STZ <= 5.00
CBR20 : CBR <= 0.20
AOI5 : AOI <= 5.00
MQ0_10 : MQ0 <= 0.10
fic-20 : FIC >= -0.20
LQR30 : LQR <= 0.30
q5 : QUAL >= 5
INDEL5 : INDEL >= 5 bp with /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf

Writing to VCF file /home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf
Analysis finished on Tue Aug  6 12:25:05 2013

loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input..
finished loading dbSNP for 7415 variants
TOTAL	1538	1538	662	378	166	25	145	129	27	6

The following parameters are available.  Ones with "[]" are in effect:

Available Options
                Input options : --in [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw],
                                --ignore
               Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.norm],
                                --digits [3], --rank, --verbose [ON]
   Output compression Options : --plain [ON], --bgzf, --gzip

Analysis started on Tue Aug  6 12:25:05 2013

Reading Input File /home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw
Finished Reading 1538 lines
Starting sorting values
Writing output values
Analysis finished on Tue Aug  6 12:25:05 2013

loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input..
finished loading dbSNP for 7415 variants
TOTAL	1538	1538	662	378	166	25	145	129	27	6
 Commands finished in 306 secs with no errors reported
--------------------------------------------------------------------

We can summarize the quality of variants.

% cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary
--------------------------------------------------------------------------------------------------------------
              FILTER	#SNPs	#dbSNP	%dbSNP	%CpG	%CpG	%Known	%Novel	%nCpG-K	%nCpG-N	%HM3	%HM3
                    				Known	Novel	Ts/Tv	Ts/Tv	Ts/Tv	Ts/Tv	sens	/SNP
--------------------------------------------------------------------------------------------------------------
              INDEL5	11	8	72.7	25.0	0.0	3.00	0.50	2.00	0.50	0.000	0.000
          INDEL5;SVM	8	4	50.0	50.0	25.0	0.33	0.33	1.00	0.00	0.000	0.000
                PASS	1388	1186	85.4	15.5	11.4	2.10	1.62	1.79	1.42	85.083	11.095
                 SVM	131	33	25.2	9.1	9.2	1.06	0.85	0.88	0.78	0.000	0.000
--------------------------------------------------------------------------------------------------------------
              FILTER	#SNPs	#dbSNP	%dbSNP	%CpG	%CpG	%Known	%Novel	%nCpG-K	%nCpG-N	%HM3	%HM3
                    				Known	Novel	Ts/Tv	Ts/Tv	Ts/Tv	Ts/Tv	sens	/SNP 
--------------------------------------------------------------------------------------------------------------
              INDEL5	19	12	63.2	33.3	14.3	1.40	0.40	1.67	0.20	0.000	0.000
                PASS	1388	1186	85.4	15.5	11.4	2.10	1.62	1.79	1.42	85.083	11.095
                 SVM	139	37	26.6	13.5	9.8	0.95	0.82	0.88	0.74	0.000	0.000 
--------------------------------------------------------------------------------------------------------------
                PASS	1388	1186	85.4	15.5	11.4	2.10	1.62	1.79	1.42	85.083	11.095
                FAIL	150	45	30.0	15.6	9.5	1.14	0.81	1.00	0.73	0.000	0.000
               TOTAL	1538	1231	80.0	15.5	10.7	2.05	1.27	1.75	1.12	85.083	10.013
------------------------------------------------------------------------------------------------------------

To run LD-refinement you can run beagle only (gotcloud beagle) or beagle+thunder (gotcloud ldrefine) to improve the quality of genotypes.

If you're interested in looking at individual reads from the variant, samtools tview provides a good alternative to IGV. In order to see CFTR R75Q variant in one of the variant carriers, you can run

% samtools tview ${GC}/examples/bams/NA12843.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam ${GC}/examples/chr7Ref/hs37d5.chr7.fa

And follow the steps

  • Type ‘g’
  • Type 7:117149147
  • Type ‘n’ to color by nucleotide
  • Move with left arrow to see the variant site clearly
% ${GC}/gotcloud/gotcloud beagle --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
Key configurations:
GOTCLOUD_ROOT: /home/presenter02/day2/session1/uwcmg_2013_08/gotcloud
OUT_DIR:       snps
BAM_INDEX:     index/chr7.CFTR.low_coverage.index 
REF:           chr7Ref/hs37d5.chr7.fa
CHRS:          7
BATCH_TYPE:    local
BATCH_OPTS:    

Processing the following steps...
9: RUN_BEAGLE
10: RUN_SUBSET
Call region is 7:117000000-117500000
Generating commands for chr7...
--------------------------------------------------------------------
Finished creating makefile /home/presenter02/test/snps/umake.Makefile

Running /home/presenter02/test/snps/umake.Makefile

 Commands finished in 74 secs with no errors reported
--------------------------------------------------------------------

STEP 4 : EPACTS association analysis

In addition to variant calls, we need phenotype information to perform association mapping. Here, the phenotypes are created so that case is enriched in the individuals carrying rare nonsynonymous variants at CFTR.

% less ${GC}/examples/index/chr7.CFTR.ped 
#FAM_ID	IND_ID	DAD_ID	MOM_ID	SEX	PHENO	QT
NA06984	NA06984	0	0	0	1	-0.657149296617419
NA06985	NA06985	0	0	0	2	-0.449034856685281
NA06986	NA06986	0	0	0	1	0.0975849179986626
NA06989	NA06989	0	0	0	2	1.77095069670763
NA06994	NA06994	0	0	0	1	0.287475900193007
NA07000	NA07000	0	0	0	1	-1.36632872407691
NA07037	NA07037	0	0	0	2	-0.1278913321612
NA07048	NA07048	0	0	0	2	-1.45798868745693
NA07051	NA07051	0	0	0	1	-0.599618650565132
...

Create a directory to store association results

% mkdir assoc

Run single variant logistic score test.

% ${GC}/epacts/bin/epacts single  --ped ${GC}/examples/index/chr7.CFTR.ped   --vcf ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz  --pheno PHENO --out assoc/single.b.score --test b.score --anno   --ref ${GC}/examples/chr7Ref/hs37d5.chr7.fa   --region 7:117000000-117500000 --run 1
Successfully written phenotypes and 0 covariates across 99 individuals
Processing chromosome 7...
Finished generating EPACTS Makefile
Running 1 parallel jobs of EPACTS
make -f /home/presenter02/test.v1/assoc/single.b.score.Makefile -j 1
Loading required package: epactsR
Sucessfully wrote ( 1388 * 10 ) matrix
The following parameters are available.  Ones with "[]" are in effect:

Available Options
    Required Parameters :
                          -i [/home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts]
                          -o [/home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp]
          Gene Annotation
             Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz]
                          -r [/home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa]
                          --inputFormat [epacts], --checkReference, -f [refGene]
                          -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt]
                          -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt]
                          -u [], -d [], --se [], --si [], --outputFormat []
 Other Annotation Tools : --genomeScore [], --bed [], --tabix []
Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa...
DONE: 1 chromosomes and 159138663 bases are loaded.
Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt...
DONE: codon file loaded.
Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt...
DONE: 24 priority annotation types loaded.
Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz...
DONE: 92627 gene loaded.
DONE: Generated frequency of each annotype type in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq ].
DONE: Generated frequency of each highest priority annotation type in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq ].
Ts/Tv ratio: 2.02397
Ts observed: 929  times; Tv observed: 459 times.
DONE: Generated frequency of each base change in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq ].
DONE: Generated frequency of each codon change in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq ].
DONE: Generated frequency of indel length in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq ].
nsnps = 866
Rscript /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/epactsSingle.R --vanilla /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../ /home/presenter02/test.v1/assoc/single.b.score.phe NULL /home/presenter02/test.v1/assoc/single.b.score.ind /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz 7:117000000-117500000 /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts GT 1e-06 1 3 1000000000 0.5 0 FALSE single.b.score
NULL
NULL
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-anno --in /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts --ref /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts -r /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp --inputFormat epacts -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt
..............................................         
 ...      Anno(tation)                       ...       
  ...      Xiaowei Zhan, Goncalo Abecasis     ...      
   ...      Speical Thanks:                    ...     
    ...      Hyun Ming Kang, Yanming Li         ...    
     ...      zhanxw@umich.edu                    ...  
      ...      Sep 2011                            ... 
       ................................................
                                                       
DONE: 1388 varaints are annotated.
DONE: Generated annotation output in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp ].
Annotation succeed!
mv /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts
rm /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.log /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-cat /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts | /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c > /home/presenter02/test.v1/assoc/single.b.score.epacts.gz
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -f -pbed /home/presenter02/test.v1/assoc/single.b.score.epacts.gz
rm -f /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-plot --in /home/presenter02/test.v1/assoc/single.b.score.epacts.gz --region 7:0
Rscript /home/presenter02/test.v1/assoc/single.b.score.epacts.R --vanilla

export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test.v1/assoc/single.b.score.epacts.qq.eps
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test.v1/assoc/single.b.score.epacts.mh.eps
rm /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd /home/presenter02/test.v1/assoc/single.b.score.epacts.qq.eps /home/presenter02/test.v1/assoc/single.b.score.epacts.mh.eps /home/presenter02/test.v1/assoc/single.b.score.epacts.*.dat
zcat /home/presenter02/test.v1/assoc/single.b.score.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /home/presenter02/test.v1/assoc/single.b.score.epacts.top5000
touch /home/presenter02/test.v1/assoc/single.b.score.epacts.OK

d-69-91-154-62:~ hmkang$ cat tmp.txt | perl -lane 's/test\.v1/test/g; print " $_"'

uccessfully written phenotypes and 0 covariates across 99 individuals
Processing chromosome 7...
Finished generating EPACTS Makefile
Running 1 parallel jobs of EPACTS
make -f /home/presenter02/test/assoc/single.b.score.Makefile -j 1
Loading required package: epactsR
Sucessfully wrote ( 1388 * 10 ) matrix
The following parameters are available.  Ones with "[]" are in effect:

Available Options
    Required Parameters :
                          -i [/home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts]
                          -o [/home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp]
          Gene Annotation
             Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz]
                          -r [/home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa]
                          --inputFormat [epacts], --checkReference, -f [refGene]
                          -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt]
                          -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt]
                          -u [], -d [], --se [], --si [], --outputFormat []
 Other Annotation Tools : --genomeScore [], --bed [], --tabix []
Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa...
DONE: 1 chromosomes and 159138663 bases are loaded.
Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt...
DONE: codon file loaded.
Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt...
DONE: 24 priority annotation types loaded.
Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz...
DONE: 92627 gene loaded.
DONE: Generated frequency of each annotype type in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq ].
DONE: Generated frequency of each highest priority annotation type in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq ].
Ts/Tv ratio: 2.02397
Ts observed: 929  times; Tv observed: 459 times.
DONE: Generated frequency of each base change in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq ].
DONE: Generated frequency of each codon change in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq ].
DONE: Generated frequency of indel length in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq ].
nsnps = 866
Rscript /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/epactsSingle.R --vanilla /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../ /home/presenter02/test/assoc/single.b.score.phe NULL /home/presenter02/test/assoc/single.b.score.ind /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz 7:117000000-117500000 /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts GT 1e-06 1 3 1000000000 0.5 0 FALSE single.b.score
NULL
NULL
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-anno --in /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts --ref /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts -r /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp --inputFormat epacts -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt
..............................................         
 ...      Anno(tation)                       ...       
  ...      Xiaowei Zhan, Goncalo Abecasis     ...      
   ...      Speical Thanks:                    ...     
    ...      Hyun Ming Kang, Yanming Li         ...    
     ...      zhanxw@umich.edu                    ...  
      ...      Sep 2011                            ... 
       ................................................
                                                       
DONE: 1388 varaints are annotated.
DONE: Generated annotation output in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp ].
Annotation succeed!
mv /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts
rm /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.log /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-cat /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts | /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c > /home/presenter02/test/assoc/single.b.score.epacts.gz
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -f -pbed /home/presenter02/test/assoc/single.b.score.epacts.gz
rm -f /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-plot --in /home/presenter02/test/assoc/single.b.score.epacts.gz --region 7:0
Rscript /home/presenter02/test/assoc/single.b.score.epacts.R --vanilla

export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test/assoc/single.b.score.epacts.cmd
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test/assoc/single.b.score.epacts.qq.eps
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test/assoc/single.b.score.epacts.cmd
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test/assoc/single.b.score.epacts.mh.eps
rm /home/presenter02/test/assoc/single.b.score.epacts.cmd /home/presenter02/test/assoc/single.b.score.epacts.qq.eps /home/presenter02/test/assoc/single.b.score.epacts.mh.eps /home/presenter02/test/assoc/single.b.score.epacts.*.dat
zcat /home/presenter02/test/assoc/single.b.score.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /home/presenter02/test/assoc/single.b.score.epacts.top5000
touch /home/presenter02/test/assoc/single.b.score.epacts.OK

To examine the top signals of association, try

% head assoc/single.b.score.epacts.top5000
#CHROM	BEGIN	END	MARKER_ID	NS	AC	CALLRATE	MAF	PVALUE	SCORE	NS.CASE	NS.CTRL	AF.CASE	AF.CTRL
7	117130755	117130755	7:117130755_C/G_Intron:CFTR	99	9	1	0.045455	0.0018407	3.1148	50	49	0.09	0
7	117149147	117149147	7:117149147_G/A_Nonsynonymous:CFTR	99	9	1	0.045455	0.0018407	3.1148	50	49	0.09	0
7	117120141	117120141	7:117120141_G/C_Utr5:CFTR	99	14	1	0.070707	0.0034417	-2.9253	50	49	0.02	0.12245
7	117356483	117356483	7:117356483_C/T_Intron:CTTNBP2	99	11	1	0.055556	0.0035695	-2.9139	50	49	0.01	0.10204
7	117459023	117459023	7:117459023_A/G_Intron:CTTNBP2	99	121	1	0.38889	0.0047674	2.8223	5049	0.71	0.5102
7	117430909	117430909	7:117430909_T/C_Intron:CTTNBP2	99	10	1	0.050505	0.0068926	-2.702	50	49	0.01	0.091837
7	117456904	117456904	7:117456904_C/T_Intron:CTTNBP2	99	115	1	0.41919	0.0076959	2.6651	5049	0.68	0.47959
7	117457141	117457141	7:117457141_G/C_Intron:CTTNBP2	99	115	1	0.41919	0.0076959	2.6651	5049	0.68	0.47959
7	117467003	117467003	7:117467003_T/A_Intron:CTTNBP2	99	122	1	0.38384	0.0077344	2.6634	5049	0.71	0.52041

Also you can examine the PDF output file (although it is a tiny fraction of genome), if your X11 works.

% xpdf assoc/single.b.score.epacts.qq.pdf
% xpdf assoc/single.b.score.epacts.mh.pdf

Running variant annotation using EPACTS is also possible

% time ${GC}/epacts/bin/epacts anno --in ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.vcf.gz --ref ${GC}examples/chr7Ref/hs37d5.chr7.fa
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz -r /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o snps/chr7.filtered.PASS.beagled.anno.vcf.gz --inputFormat vcf -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt
The following parameters are available.  Ones with "[]" are in effect:

Available Options
    Required Parameters :
                          -i [/home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz]
                          -o [snps/chr7.filtered.PASS.beagled.anno.vcf.gz]
          Gene Annotation
             Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz]
                          -r [/home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa]
                          --inputFormat [vcf], --checkReference, -f [refGene]
                          -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt]
                          -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt]
                          -u [], -d [], --se [], --si [], --outputFormat []
 Other Annotation Tools : --genomeScore [], --bed [], --tabix []
Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa...
DONE: 1 chromosomes and 159138663 bases are loaded.
Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt...
DONE: codon file loaded.
Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt...
DONE: 24 priority annotation types loaded.
Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz...
DONE: 92627 gene loaded.
DONE: Generated frequency of each annotype type in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.anno.frq ].
DONE: Generated frequency of each highest priority annotation type in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.top.anno.frq ].
Ts/Tv ratio: 2.02397
Ts observed: 929  times; Tv observed: 459 times.
DONE: Generated frequency of each base change in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.base.frq ].
DONE: Generated frequency of each codon change in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.codon.frq ].
DONE: Generated frequency of indel length in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.indel.frq ].
..............................................         
 ...      Anno(tation)                       ...       
  ...      Xiaowei Zhan, Goncalo Abecasis     ...      
   ...      Speical Thanks:                    ...     
    ...      Hyun Ming Kang, Yanming Li         ...    
     ...      zhanxw@umich.edu                    ...  
      ...      Sep 2011                            ... 
       ................................................
                                                       
DONE: 1388 varaints are annotated.
DONE: Generated annotation output in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz ].
Annotation succeed!
mv snps/chr7.filtered.PASS.beagled.anno.vcf.gz /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp > snps/chr7.filtered.PASS.beagled.anno.vcf.gz
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -pvcf -f snps/chr7.filtered.PASS.beagled.anno.vcf.gz
rm /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp
rm snps/chr7.filtered.PASS.beagled.anno.vcf.gz.log snps/chr7.filtered.PASS.beagled.anno.vcf.gz.top.anno.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.anno.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.base.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.codon.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.indel.frq

You can look at example non-synonymous SNPs.

% zcat snps/chr7.filtered.PASS.beagled.anno.vcf.gz | grep Nonsynonymous | grep CFTR | cut -f 1-8 | head -1

The following examples running gene-level burden test (skipped for the workshop),for your record.

% ${GC}/epacts/bin/epacts make-group --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.grp –nonsyn
% ${GC}/epacts/bin/epacts group --ped ${GC}/examples/index/chr7.CFTR.ped --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out assoc/group.skat.o --groupf snps/chr7.filtered.PASS.beagled.anno.grp --test skat --skat-o --run 2
% cat assoc/group.skat.o.epacts 
% ${GC}/epacts/bin/epacts group --ped ${GC}/examples/index/chr7.CFTR.ped --vcf snps/

STEP 5 : All-in-one running of all scripts (for later use)

% tar xzvf /home/presenter02/day2/session1/uwcmg_2013_08.tar.gz
% cd /home/presenter02/day2/session1/uwcmg_2013_08
% sh go.sh