Changes

From Genome Analysis Wiki
Jump to navigationJump to search
Line 22: Line 22:  
  yourid@uwcmg001.gs.washington.edu's password:  
 
  yourid@uwcmg001.gs.washington.edu's password:  
   −
  % ls ${GC}/examples
+
== Step 1 : GotCloud Alignment Pipeline ==
 +
In order to run this tutorial, you need to make sure you have GotCloud installed on your system.  
   −
=== Step 1 : GotCloud Alignment Pipeline ===
+
Try to run the following commands sequentially to follow the instruction.
In order to run this tutorial, you need to make sure you have GotCloud installed on your system.
     −
Try to run the following commands sequentially to follow the instruction
+
First, let's set an environment variable for your convenience.
    
  % export GC=/home/presenter02/day2/session1/uwcmg_2013_08/
 
  % export GC=/home/presenter02/day2/session1/uwcmg_2013_08/
 +
 +
The raw sequence reads can be found by
    
  % ls ${GC}/examples/fastq
 
  % ls ${GC}/examples/fastq
 +
SRR035022_1.fastq.gz  SRR035023_1.fastq.gz  SRR035024_1.fastq.gz  SRR035025_1.fastq.gz  SRR035026_1.fastq.gz  SRR035027_1.fastq.gz  SRR035669_1.fastq.gz  SRR622461_1.fastq.gz
 +
SRR035022_2.fastq.gz  SRR035023_2.fastq.gz  SRR035024_2.fastq.gz  SRR035025_2.fastq.gz  SRR035026_2.fastq.gz  SRR035027_2.fastq.gz  SRR035669_2.fastq.gz  SRR622461_2.fastq.gz
 +
SRR035022.fastq.gz    SRR035023.fastq.gz    SRR035024.fastq.gz    SRR035025.fastq.gz    SRR035026.fastq.gz    SRR035027.fastq.gz    SRR035669.fastq.gz    SRR622461.fastq.gz
 +
 +
To see examples of FASTQ file.
 +
% zless ${GC}/examples/fastq/SRR035022_1.fastq.gz
 +
 +
This set of FASTQ are extracted from two samples from the 1000 Genomes project, focusing around 500kb region are CFTR.
 +
 +
The example directory contain other set of files, such as reference sequence, index, and BAM files (for variant calling).
    
  % ls ${GC}/examples
 
  % ls ${GC}/examples
 +
bams  chr7Ref  fastq  index
 +
 +
In order to run alignment pipeline, we need an index file for FASTQ files.
    
  % cat ${GC}/examples/index/chr7.CFTR.fastq.index
 
  % cat ${GC}/examples/index/chr7.CFTR.fastq.index
 +
MERGE_NAME FASTQ1 FASTQ2 RGID SAMPLE LIBRARY CENTER PLATFORM
 +
NA06984 fastq/SRR035022.fastq.gz . SRR035022 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035022_1.fastq.gz fastq/SRR035022_2.fastq.gz SRR035022 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035023.fastq.gz . SRR035023 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035023_1.fastq.gz fastq/SRR035023_2.fastq.gz SRR035023 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035024.fastq.gz . SRR035024 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035024_1.fastq.gz fastq/SRR035024_2.fastq.gz SRR035024 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035025.fastq.gz . SRR035025 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035025_1.fastq.gz fastq/SRR035025_2.fastq.gz SRR035025 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035026.fastq.gz . SRR035026 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035026_1.fastq.gz fastq/SRR035026_2.fastq.gz SRR035026 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035027.fastq.gz . SRR035027 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035027_1.fastq.gz fastq/SRR035027_2.fastq.gz SRR035027 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035669.fastq.gz . SRR035669 NA06984 Solexa-16556 BI ILLUMINA
 +
NA06984 fastq/SRR035669_1.fastq.gz fastq/SRR035669_2.fastq.gz SRR035669 NA06984 Solexa-16556 BI ILLUMINA
 +
NA12878 fastq/SRR622461.fastq.gz . SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA
 +
NA12878 fastq/SRR622461_1.fastq.gz fastq/SRR622461_2.fastq.gz SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA
 +
 +
Also, we need a configuration file for GotCloud (default configuration usually works if you install GotCloud on your own, but here we're using reduced examples).
    
  % cat ${GC}/examples/index/chr7.CFTR.align.conf
 
  % cat ${GC}/examples/index/chr7.CFTR.align.conf
 +
INDEX_FILE = index/chr7.CFTR.fastq.index
 +
###################
 +
# References
 +
REF_DIR = chr7Ref
 +
AS = NCBI37
 +
REF = $(REF_DIR)/hs37d5.chr7.fa
 +
DBSNP_VCF =  $(REF_DIR)/dbsnp_135.b37.chr7.CFTR.vcf.gz
 +
HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
 +
 +
Create a test directory and enter it for running alignment pipeline
    
  % mkdir test
 
  % mkdir test
Line 46: Line 90:     
  % ${GC}/gotcloud/gotcloud align --conf ${GC}/examples/index/chr7.CFTR.align.conf --outDir align --baseprefix ${GC}/examples
 
  % ${GC}/gotcloud/gotcloud align --conf ${GC}/examples/index/chr7.CFTR.align.conf --outDir align --baseprefix ${GC}/examples
 +
Retrieved 168 bytes.
 +
Random chance is 54  (compared to 30)
 +
File sizes of 32 FASTQ input files referenced in '/home/presenter02/day2/session1/uwcmg_2013_08/examples/index/chr7.CFTR.fastq.index' = 6849341.4
 +
Size of BAMs from aligner will be about 0.01 GB
 +
Intermediate files from snpcaller will be about 0.01 GB
 +
Final VCF output from snpcaller will be about 0.00 GB
 +
Be sure you have enough space to hold all this data
 +
Created /home/presenter02/test/align/Makefiles/align_NA06984.Makefile
 +
Created /home/presenter02/test/align/Makefiles/align_NA12878.Makefile
 +
---------------------------------------------------------------------
 +
Waiting while samples are processed...
 +
Processing finished in 42 secs with no errors reported
 +
 +
Let's see where the BAM files are generated
 +
 +
% ls align/bams
 +
NA06984.recal.bam      NA06984.recal.bam.bai.done  NA06984.recal.bam.metrics  NA12878.recal.bam      NA12878.recal.bam.bai.done  NA12878.recal.bam.metrics
 +
NA06984.recal.bam.bai  NA06984.recal.bam.done      NA06984.recal.bam.qemp    NA12878.recal.bam.bai  NA12878.recal.bam.done      NA12878.recal.bam.qemp
 +
 +
You can view the aligned sequence reads by running
 +
% samtools view align/bams/NA06984.recal.bam | less
    
Examine the QC metrics by
 
Examine the QC metrics by
   −
% ls align/bams
  −
   
  % ls align/QCFiles
 
  % ls align/QCFiles
 +
NA06984.genoCheck.depthRG  NA06984.genoCheck.selfRG  NA06984.qplot.R            NA12878.genoCheck.depthSM  NA12878.genoCheck.selfSM  NA12878.qplot.stats
 +
NA06984.genoCheck.depthSM  NA06984.genoCheck.selfSM  NA06984.qplot.stats        NA12878.genoCheck.done    NA12878.qplot.done
 +
NA06984.genoCheck.done    NA06984.qplot.done        NA12878.genoCheck.depthRG  NA12878.genoCheck.selfRG  NA12878.qplot.R
 +
 +
Text-version of QC metrics looks as follows
    
  % cat align/QCFiles/NA06984.qplot.stats
 
  % cat align/QCFiles/NA06984.qplot.stats
 +
Stats\BAM /home/presenter02/test2/align/bams/NA06984.recal.bam
 +
TotalReads(e6) 0.06
 +
MappingRate(%) 97.44
 +
MapRate_MQpass(%) 97.44
 +
TargetMapping(%) 0.00
 +
ZeroMapQual(%) 0.56
 +
MapQual<10(%) 0.70
 +
PairedReads(%) 98.27
 +
ProperPaired(%) 92.83
 +
MappedBases(e9) 0.00
 +
Q20Bases(e9) 0.00
 +
Q20BasesPct(%) 95.27
 +
MeanDepth 7.33
 +
GenomeCover(%) 0.33
 +
EPS_MSE 293.97
 +
EPS_Cycle_Mean 25.06
 +
GCBiasMSE 0.00
 +
ISize_mode 335
 +
ISize_medium 331
 +
DupRate(%) 2.05
 +
QCFailRate(%) 0.00
 +
BaseComp_A(%) 31.8
 +
BaseComp_C(%) 18.4
 +
BaseComp_G(%) 18.1
 +
BaseComp_T(%) 31.6
 +
BaseComp_O(%) 0.0
 +
 +
You can also run the R command to create PDF files (see lecture slides for examples).
 +
 +
To check whether the samples are contaminated, look at the verifyBamID results.
    
  % cat align/QCFiles/NA06984.genoCheck.selfSM
 
  % cat align/QCFiles/NA06984.genoCheck.selfSM
 +
#SEQ_ID RG CHIP_ID #SNPS #READS AVG_DP FREEMIX FREELK1 FREELK0 FREE_RH FREE_RA CHIPMIX CHIPLK1 CHIPLK0 CHIP_RH CHIP_RA DPREF RDPHET RDPALT
 +
NA06984 ALL NA 175 1038 5.93 0.00841 351.80 351.85 NA NA NA NA NA NA NA NA NA NA
    
== STEP 3 : Run GotCloud SNP Calling Pipeline ==  
 
== STEP 3 : Run GotCloud SNP Calling Pipeline ==  
Line 73: Line 173:     
  % ls ${GC}/examples/bams
 
  % ls ${GC}/examples/bams
 +
NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12272.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai
 +
NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12273.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.CFTR.bam
 +
NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12273.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.CFTR.bam.bai
 +
NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12275.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam      NA12275.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai
 +
NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai  NA12282.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12282.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
 +
NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12283.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam      NA12283.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
 +
NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam.bai  NA12286.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam      NA12286.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
 +
NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai  NA12287.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA07037.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.CFTR.bam      NA12287.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam.bai
 +
...
    +
To run SNP calling pipeline, we need to prepare index file corresponding to the BAM files with sample IDs
 +
 
  % cat ${GC}/examples/index/chr7.CFTR.low_coverage.index
 
  % cat ${GC}/examples/index/chr7.CFTR.low_coverage.index
 +
NA06984 EUR bams/NA06984.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA06985 EUR bams/NA06985.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA06986 EUR bams/NA06986.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA06989 EUR bams/NA06989.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA06994 EUR bams/NA06994.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA07000 EUR bams/NA07000.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
NA07037 EUR bams/NA07037.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.CFTR.bam
 +
NA07048 EUR bams/NA07048.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA07051 EUR bams/NA07051.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.CFTR.bam
 +
NA07056 EUR bams/NA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam
 +
...
 +
 +
Also, we need a configuration file.
    
  % cat ${GC}/examples/index/chr7.CFTR.low_coverage.conf
 
  % cat ${GC}/examples/index/chr7.CFTR.low_coverage.conf
 +
CHRS = 7
 +
BAM_INDEX = index/chr7.CFTR.low_coverage.index
 +
############
 +
# References
 +
REF_ROOT = chr7Ref
 +
#
 +
REF = $(REF_ROOT)/hs37d5.chr7.fa
 +
INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19
 +
DBSNP_VCF =  $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz
 +
HM3_VCF =  $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
 +
OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz
   −
  % time ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
+
Then we're ready to run SNP calling pipeline
 +
 
 +
  % ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
 +
CHRS = 7
 +
BAM_INDEX = index/chr7.CFTR.low_coverage.index
 +
############
 +
# References
 +
REF_ROOT = chr7Ref
 +
#
 +
REF = $(REF_ROOT)/hs37d5.chr7.fa
 +
INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19
 +
DBSNP_VCF =  $(REF_ROOT)/dbsnp_135.b37.chr7.CFTR.vcf.gz
 +
HM3_VCF =  $(REF_ROOT)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
 +
OMNI_VCF = $(REF_ROOT)/1000G_omni2.5.b37.sites.PASS.chr7.CFTR.vcf.gz
 +
[presenter02@uwcmg002 test]$ time ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
 +
Key configurations:
 +
GOTCLOUD_ROOT: /home/presenter02/day2/session1/uwcmg_2013_08/gotcloud
 +
OUT_DIR:      snps
 +
BAM_INDEX:    index/chr7.CFTR.low_coverage.index
 +
REF:          chr7Ref/hs37d5.chr7.fa
 +
CHRS:          7
 +
BATCH_TYPE:    local
 +
BATCH_OPTS:   
 +
 +
Processing the following steps...
 +
2: RUN_PILEUP
 +
3: RUN_GLFMULTIPLES
 +
4: RUN_VCFPILEUP
 +
5: RUN_FILTER
 +
6: RUN_SVM
 +
8: RUN_SPLIT
 +
Call region is 7:117000000-117500000
 +
Generating commands for chr7...
 +
Creating glf INDEX at 7:115000001-120000000..
 +
--------------------------------------------------------------------
 +
Finished creating makefile /home/presenter02/test/snps/umake.Makefile
 +
 +
Running /home/presenter02/test/snps/umake.Makefile
 +
 +
Start merging VCFs...
 +
Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.vcf into STDOUT...
 +
Start merging VCFs...
 +
Merging /home/presenter02/test/snps/vcfs/chr7/115000001.120000000/chr7.115000001.120000000.stats.vcf into STDOUT...
 +
 +
The following parameters are available.  Ones with "[]" are in effect:
 +
 +
Available Options
 +
                          Recipes : --write-bed, --write-vcf [ON], --upgrade,
 +
                                    --summarize, --filter [ON], --subset
 +
                VCF Input options : --in-vcf [/home/presenter02/test/snps/vcfs/chr7/chr7.merged.stats.vcf]
 +
                BED Input options : --in-bfile [], --in-bed [], --in-bim [],
 +
                                    --in-fam [],
 +
                                    --ref [/data/local/ref/karma.ref/human.g1k.v37.fa]
 +
              Subsetting options : --in-subset [], --mono-subset,
 +
                                    --filt-only-subset
 +
                  Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf],
 +
                                    --qGeno, --print-every [10000]
 +
      Output compression Options : --plain [ON], --bgzf, --gzip
 +
    Genotype-level Filter Options : --minGQ, --minGD
 +
                  Filter Options : --winIndel [5],
 +
                                    --indelVCF [/home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf],
 +
                                    --minQUAL [5], --minMQ [20],
 +
                                    --maxDP [99000], --minDP [99],
 +
                                    --maxABL [70], --winFFRQ, --maxFFRQ,
 +
                                    --winFVAR, --merFVAR, --maxFVAR,
 +
                                    --minNS [1], --maxSTP [2147483647],
 +
                                    --maxTTT [2147483647],
 +
                                    --minTTT [-2147483648], --maxSTR [20],
 +
                                    --minSTR [-20], --maxSTZ [5], --minSTZ [-5],
 +
                                    --maxCBR [20], --minCBR [-100],
 +
                                    --maxQBR [100], --minQBR [-100],
 +
                                    --maxCBZ [2147483647], --maxCSR [100],
 +
                                    --minCSR [-100], --maxLQZ [2147483647],
 +
                                    --minLQZ [-2147483648],
 +
                                    --maxRBZ [2147483647],
 +
                                    --minRBZ [-2147483648],
 +
                                    --maxIOZ [2147483647],
 +
                                    --minIOR [-2147483648],
 +
                                    --maxIOR [2147483647],
 +
                                    --maxAOZ [2147483647], --maxAOI [5],
 +
                                    --maxMQ0 [10], --maxMQ10 [100],
 +
                                    --maxMQ20 [100], --minFIC [-20],
 +
                                    --minABE [-100], --maxABE [100],
 +
                                    --maxLQR [30], --minMBR [-100],
 +
                                    --maxMBR [100], --minABZ [-2147483648],
 +
                                    --maxABZ [2147483647],
 +
                                    --maxBCS [2147483647], --keepFilter
 +
 +
Analysis started on Tue Aug  6 12:25:04 2013
 +
 +
The following filters are in effect:
 +
m20 : MQ >= 20.00
 +
dp99 : DP >= 99.00
 +
DP99000 : DP <= 99000.00
 +
ns1 : NS >= 1.00
 +
AB70 : AB <= 0.70
 +
STR20 : STR <= 0.20
 +
str-20 : STR >= -0.20
 +
stz-5 : STZ >= -5.00
 +
STZ5 : STZ <= 5.00
 +
CBR20 : CBR <= 0.20
 +
AOI5 : AOI <= 5.00
 +
MQ0_10 : MQ0 <= 0.10
 +
fic-20 : FIC >= -0.20
 +
LQR30 : LQR <= 0.30
 +
q5 : QUAL >= 5
 +
INDEL5 : INDEL >= 5 bp with /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/1kg.pilot_release.merged.indels.sites.hg19.chr7.vcf
 +
 +
Writing to VCF file /home/presenter02/test/snps/vcfs/chr7/chr7.hardfiltered.sites.vcf
 +
Analysis finished on Tue Aug  6 12:25:05 2013
 +
 +
loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input..
 +
finished loading dbSNP for 7415 variants
 +
TOTAL 1538 1538 662 378 166 25 145 129 27 6
 +
 +
The following parameters are available.  Ones with "[]" are in effect:
 +
 +
Available Options
 +
                Input options : --in [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw],
 +
                                --ignore
 +
                Output Options : --out [/home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.norm],
 +
                                --digits [3], --rank, --verbose [ON]
 +
    Output compression Options : --plain [ON], --bgzf, --gzip
 +
 +
Analysis started on Tue Aug  6 12:25:05 2013
 +
 +
Reading Input File /home/presenter02/test/snps/vcfs/chr7/chr7.filtered.sites.vcf.raw
 +
Finished Reading 1538 lines
 +
Starting sorting values
 +
Writing output values
 +
Analysis finished on Tue Aug  6 12:25:05 2013
 +
 +
loading /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/dbsnp_135.b37.chr7.CFTR.vcf.gz as a VCF input..
 +
finished loading dbSNP for 7415 variants
 +
TOTAL 1538 1538 662 378 166 25 145 129 27 6
 +
  Commands finished in 306 secs with no errors reported
 +
--------------------------------------------------------------------
 +
 
 +
We can summarize the quality of variants.
    
  % cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary
 
  % cat ${GC}/out/snps/vcfs/chr7/chr7.filtered.sites.vcf.summary
 +
--------------------------------------------------------------------------------------------------------------
 +
              FILTER #SNPs #dbSNP %dbSNP %CpG %CpG %Known %Novel %nCpG-K %nCpG-N %HM3 %HM3
 +
                    Known Novel Ts/Tv Ts/Tv Ts/Tv Ts/Tv sens /SNP
 +
--------------------------------------------------------------------------------------------------------------
 +
              INDEL5 11 8 72.7 25.0 0.0 3.00 0.50 2.00 0.50 0.000 0.000
 +
          INDEL5;SVM 8 4 50.0 50.0 25.0 0.33 0.33 1.00 0.00 0.000 0.000
 +
                PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095
 +
                  SVM 131 33 25.2 9.1 9.2 1.06 0.85 0.88 0.78 0.000 0.000
 +
--------------------------------------------------------------------------------------------------------------
 +
              FILTER #SNPs #dbSNP %dbSNP %CpG %CpG %Known %Novel %nCpG-K %nCpG-N %HM3 %HM3
 +
                    Known Novel Ts/Tv Ts/Tv Ts/Tv Ts/Tv sens /SNP
 +
--------------------------------------------------------------------------------------------------------------
 +
              INDEL5 19 12 63.2 33.3 14.3 1.40 0.40 1.67 0.20 0.000 0.000
 +
                PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095
 +
                  SVM 139 37 26.6 13.5 9.8 0.95 0.82 0.88 0.74 0.000 0.000
 +
--------------------------------------------------------------------------------------------------------------
 +
                PASS 1388 1186 85.4 15.5 11.4 2.10 1.62 1.79 1.42 85.083 11.095
 +
                FAIL 150 45 30.0 15.6 9.5 1.14 0.81 1.00 0.73 0.000 0.000
 +
                TOTAL 1538 1231 80.0 15.5 10.7 2.05 1.27 1.75 1.12 85.083 10.013
 +
------------------------------------------------------------------------------------------------------------
 +
 +
To run LD-refinement you can run beagle only (gotcloud beagle) or beagle+thunder (gotcloud ldrefine) to improve the quality of genotypes.
   −
% time ${GC}/gotcloud/gotcloud beagle --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
+
If you're interested in looking at individual reads from the variant, samtools tview provides a good alternative to IGV. In order to see CFTR R75Q variant in one of the variant carriers, you can run
    
  % samtools tview ${GC}/examples/bams/NA12843.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam ${GC}/examples/chr7Ref/hs37d5.chr7.fa
 
  % samtools tview ${GC}/examples/bams/NA12843.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.CFTR.bam ${GC}/examples/chr7Ref/hs37d5.chr7.fa
 +
 +
And follow the steps
 +
* Type ‘g’
 +
* Type 7:117149147
 +
* Type ‘n’ to color by nucleotide
 +
* Move with left arrow to see the variant site clearly
 +
 +
% ${GC}/gotcloud/gotcloud beagle --conf ${GC}/examples/index/chr7.CFTR.low_coverage.conf --outDir snps --baseprefix ${GC}/examples --region 7:117000000-117500000 --numjobs 2
 +
Key configurations:
 +
GOTCLOUD_ROOT: /home/presenter02/day2/session1/uwcmg_2013_08/gotcloud
 +
OUT_DIR:      snps
 +
BAM_INDEX:    index/chr7.CFTR.low_coverage.index
 +
REF:          chr7Ref/hs37d5.chr7.fa
 +
CHRS:          7
 +
BATCH_TYPE:    local
 +
BATCH_OPTS:   
 +
 +
Processing the following steps...
 +
9: RUN_BEAGLE
 +
10: RUN_SUBSET
 +
Call region is 7:117000000-117500000
 +
Generating commands for chr7...
 +
--------------------------------------------------------------------
 +
Finished creating makefile /home/presenter02/test/snps/umake.Makefile
 +
 +
Running /home/presenter02/test/snps/umake.Makefile
 +
 +
  Commands finished in 74 secs with no errors reported
 +
--------------------------------------------------------------------
    
== STEP 4 : EPACTS association analysis  ==  
 
== STEP 4 : EPACTS association analysis  ==  
% head ${GC}/examples/index/chr7.CFTR.ped
      +
In addition to variant calls, we need phenotype information to perform association mapping. Here, the phenotypes are created so that case is enriched in the individuals carrying rare nonsynonymous variants at CFTR.
 +
 +
% less ${GC}/examples/index/chr7.CFTR.ped
 +
#FAM_ID IND_ID DAD_ID MOM_ID SEX PHENO QT
 +
NA06984 NA06984 0 0 0 1 -0.657149296617419
 +
NA06985 NA06985 0 0 0 2 -0.449034856685281
 +
NA06986 NA06986 0 0 0 1 0.0975849179986626
 +
NA06989 NA06989 0 0 0 2 1.77095069670763
 +
NA06994 NA06994 0 0 0 1 0.287475900193007
 +
NA07000 NA07000 0 0 0 1 -1.36632872407691
 +
NA07037 NA07037 0 0 0 2 -0.1278913321612
 +
NA07048 NA07048 0 0 0 2 -1.45798868745693
 +
NA07051 NA07051 0 0 0 1 -0.599618650565132
 +
...
 +
 +
Create a directory to store association results
 
  % mkdir assoc
 
  % mkdir assoc
   −
  % time ${GC}/epacts/bin/epacts single  --ped ${GC}/examples/index/chr7.CFTR.ped  --vcf ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz  --pheno PHENO --out assoc/single.b.score --test b.score --anno  --ref ${GC}/examples/chr7Ref/hs37d5.chr7.fa  --region 7:117000000-117500000 --run 1
+
Run single variant logistic score test.
 +
  % ${GC}/epacts/bin/epacts single  --ped ${GC}/examples/index/chr7.CFTR.ped  --vcf ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz  --pheno PHENO --out assoc/single.b.score --test b.score --anno  --ref ${GC}/examples/chr7Ref/hs37d5.chr7.fa  --region 7:117000000-117500000 --run 1
 +
Successfully written phenotypes and 0 covariates across 99 individuals
 +
Processing chromosome 7...
 +
Finished generating EPACTS Makefile
 +
Running 1 parallel jobs of EPACTS
 +
make -f /home/presenter02/test.v1/assoc/single.b.score.Makefile -j 1
 +
Loading required package: epactsR
 +
Sucessfully wrote ( 1388 * 10 ) matrix
 +
The following parameters are available.  Ones with "[]" are in effect:
 +
 +
Available Options
 +
    Required Parameters :
 +
                          -i [/home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts]
 +
                          -o [/home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp]
 +
          Gene Annotation
 +
              Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz]
 +
                          -r [/home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa]
 +
                          --inputFormat [epacts], --checkReference, -f [refGene]
 +
                          -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt]
 +
                          -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt]
 +
                          -u [], -d [], --se [], --si [], --outputFormat []
 +
  Other Annotation Tools : --genomeScore [], --bed [], --tabix []
 +
Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa...
 +
DONE: 1 chromosomes and 159138663 bases are loaded.
 +
Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt...
 +
DONE: codon file loaded.
 +
Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt...
 +
DONE: 24 priority annotation types loaded.
 +
Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz...
 +
DONE: 92627 gene loaded.
 +
DONE: Generated frequency of each annotype type in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq ].
 +
DONE: Generated frequency of each highest priority annotation type in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq ].
 +
Ts/Tv ratio: 2.02397
 +
Ts observed: 929  times; Tv observed: 459 times.
 +
DONE: Generated frequency of each base change in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq ].
 +
DONE: Generated frequency of each codon change in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq ].
 +
DONE: Generated frequency of indel length in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq ].
 +
nsnps = 866
 +
Rscript /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/epactsSingle.R --vanilla /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../ /home/presenter02/test.v1/assoc/single.b.score.phe NULL /home/presenter02/test.v1/assoc/single.b.score.ind /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz 7:117000000-117500000 /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts GT 1e-06 1 3 1000000000 0.5 0 FALSE single.b.score
 +
NULL
 +
NULL
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-anno --in /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts --ref /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts -r /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp --inputFormat epacts -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt
 +
..............................................       
 +
  ...      Anno(tation)                      ...     
 +
  ...      Xiaowei Zhan, Goncalo Abecasis    ...     
 +
    ...      Speical Thanks:                    ...   
 +
    ...      Hyun Ming Kang, Yanming Li        ...   
 +
      ...      zhanxw@umich.edu                    ... 
 +
      ...      Sep 2011                            ...
 +
        ................................................
 +
                                                       
 +
DONE: 1388 varaints are annotated.
 +
DONE: Generated annotation output in [ /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp ].
 +
Annotation succeed!
 +
mv /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts
 +
rm /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.log /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-cat /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts | /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c > /home/presenter02/test.v1/assoc/single.b.score.epacts.gz
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -f -pbed /home/presenter02/test.v1/assoc/single.b.score.epacts.gz
 +
rm -f /home/presenter02/test.v1/assoc/single.b.score.7.117000000.117500000.epacts
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-plot --in /home/presenter02/test.v1/assoc/single.b.score.epacts.gz --region 7:0
 +
Rscript /home/presenter02/test.v1/assoc/single.b.score.epacts.R --vanilla
 +
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test.v1/assoc/single.b.score.epacts.qq.eps
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test.v1/assoc/single.b.score.epacts.mh.eps
 +
rm /home/presenter02/test.v1/assoc/single.b.score.epacts.cmd /home/presenter02/test.v1/assoc/single.b.score.epacts.qq.eps /home/presenter02/test.v1/assoc/single.b.score.epacts.mh.eps /home/presenter02/test.v1/assoc/single.b.score.epacts.*.dat
 +
zcat /home/presenter02/test.v1/assoc/single.b.score.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /home/presenter02/test.v1/assoc/single.b.score.epacts.top5000
 +
touch /home/presenter02/test.v1/assoc/single.b.score.epacts.OK
 +
d-69-91-154-62:~ hmkang$ cat tmp.txt | perl -lane 's/test\.v1/test/g; print " $_"'
 +
uccessfully written phenotypes and 0 covariates across 99 individuals
 +
Processing chromosome 7...
 +
Finished generating EPACTS Makefile
 +
Running 1 parallel jobs of EPACTS
 +
make -f /home/presenter02/test/assoc/single.b.score.Makefile -j 1
 +
Loading required package: epactsR
 +
Sucessfully wrote ( 1388 * 10 ) matrix
 +
The following parameters are available.  Ones with "[]" are in effect:
 +
 +
Available Options
 +
    Required Parameters :
 +
                          -i [/home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts]
 +
                          -o [/home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp]
 +
          Gene Annotation
 +
              Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz]
 +
                          -r [/home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa]
 +
                          --inputFormat [epacts], --checkReference, -f [refGene]
 +
                          -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt]
 +
                          -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt]
 +
                          -u [], -d [], --se [], --si [], --outputFormat []
 +
  Other Annotation Tools : --genomeScore [], --bed [], --tabix []
 +
Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa...
 +
DONE: 1 chromosomes and 159138663 bases are loaded.
 +
Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt...
 +
DONE: codon file loaded.
 +
Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt...
 +
DONE: 24 priority annotation types loaded.
 +
Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz...
 +
DONE: 92627 gene loaded.
 +
DONE: Generated frequency of each annotype type in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq ].
 +
DONE: Generated frequency of each highest priority annotation type in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq ].
 +
Ts/Tv ratio: 2.02397
 +
Ts observed: 929  times; Tv observed: 459 times.
 +
DONE: Generated frequency of each base change in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq ].
 +
DONE: Generated frequency of each codon change in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq ].
 +
DONE: Generated frequency of indel length in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq ].
 +
nsnps = 866
 +
Rscript /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/epactsSingle.R --vanilla /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../ /home/presenter02/test/assoc/single.b.score.phe NULL /home/presenter02/test/assoc/single.b.score.ind /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz 7:117000000-117500000 /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts GT 1e-06 1 3 1000000000 0.5 0 FALSE single.b.score
 +
NULL
 +
NULL
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-anno --in /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts --ref /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts -r /home/presenter02/day2/session1/uwcmg_2013_08//examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp --inputFormat epacts -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt
 +
..............................................       
 +
  ...      Anno(tation)                      ...     
 +
  ...      Xiaowei Zhan, Goncalo Abecasis    ...     
 +
    ...      Speical Thanks:                    ...   
 +
    ...      Hyun Ming Kang, Yanming Li        ...   
 +
      ...      zhanxw@umich.edu                    ... 
 +
      ...      Sep 2011                            ...
 +
        ................................................
 +
                                                       
 +
DONE: 1388 varaints are annotated.
 +
DONE: Generated annotation output in [ /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp ].
 +
Annotation succeed!
 +
mv /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts
 +
rm /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.log /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.top.anno.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.anno.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.base.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.codon.frq /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts.tmp.indel.frq
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-cat /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts | /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c > /home/presenter02/test/assoc/single.b.score.epacts.gz
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -f -pbed /home/presenter02/test/assoc/single.b.score.epacts.gz
 +
rm -f /home/presenter02/test/assoc/single.b.score.7.117000000.117500000.epacts
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/epacts-plot --in /home/presenter02/test/assoc/single.b.score.epacts.gz --region 7:0
 +
Rscript /home/presenter02/test/assoc/single.b.score.epacts.R --vanilla
 +
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test/assoc/single.b.score.epacts.cmd
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test/assoc/single.b.score.epacts.qq.eps
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; gnuplot /home/presenter02/test/assoc/single.b.score.epacts.cmd
 +
export GDFONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_FONTPATH=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export GNUPLOT_PS_DIR=/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../share/EPACTS; export PATH=$PATH:/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/; export GNUPLOT_PFBTOPFA="pfbtops %s"; /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/../bin/epstopdf /home/presenter02/test/assoc/single.b.score.epacts.mh.eps
 +
rm /home/presenter02/test/assoc/single.b.score.epacts.cmd /home/presenter02/test/assoc/single.b.score.epacts.qq.eps /home/presenter02/test/assoc/single.b.score.epacts.mh.eps /home/presenter02/test/assoc/single.b.score.epacts.*.dat
 +
zcat /home/presenter02/test/assoc/single.b.score.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /home/presenter02/test/assoc/single.b.score.epacts.top5000
 +
touch /home/presenter02/test/assoc/single.b.score.epacts.OK
 +
 
 +
To examine the top signals of association, try
 +
 
 +
% head assoc/single.b.score.epacts.top5000
 +
#CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE NS.CASE NS.CTRL AF.CASE AF.CTRL
 +
7 117130755 117130755 7:117130755_C/G_Intron:CFTR 99 9 1 0.045455 0.0018407 3.1148 50 49 0.09 0
 +
7 117149147 117149147 7:117149147_G/A_Nonsynonymous:CFTR 99 9 1 0.045455 0.0018407 3.1148 50 49 0.09 0
 +
7 117120141 117120141 7:117120141_G/C_Utr5:CFTR 99 14 1 0.070707 0.0034417 -2.9253 50 49 0.02 0.12245
 +
7 117356483 117356483 7:117356483_C/T_Intron:CTTNBP2 99 11 1 0.055556 0.0035695 -2.9139 50 49 0.01 0.10204
 +
7 117459023 117459023 7:117459023_A/G_Intron:CTTNBP2 99 121 1 0.38889 0.0047674 2.8223 5049 0.71 0.5102
 +
7 117430909 117430909 7:117430909_T/C_Intron:CTTNBP2 99 10 1 0.050505 0.0068926 -2.702 50 49 0.01 0.091837
 +
7 117456904 117456904 7:117456904_C/T_Intron:CTTNBP2 99 115 1 0.41919 0.0076959 2.6651 5049 0.68 0.47959
 +
7 117457141 117457141 7:117457141_G/C_Intron:CTTNBP2 99 115 1 0.41919 0.0076959 2.6651 5049 0.68 0.47959
 +
7 117467003 117467003 7:117467003_T/A_Intron:CTTNBP2 99 122 1 0.38384 0.0077344 2.6634 5049 0.71 0.52041
 +
 
 +
Also you can examine the PDF output file (although it is a tiny fraction of genome), if your X11 works.
 +
 
 +
% xpdf assoc/single.b.score.epacts.qq.pdf
 +
% xpdf assoc/single.b.score.epacts.mh.pdf
 +
 
 +
Running variant annotation using EPACTS is also possible
    
  % time ${GC}/epacts/bin/epacts anno --in ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.vcf.gz --ref ${GC}examples/chr7Ref/hs37d5.chr7.fa
 
  % time ${GC}/epacts/bin/epacts anno --in ${GC}/out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.vcf.gz --ref ${GC}examples/chr7Ref/hs37d5.chr7.fa
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/anno -i /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz -r /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa -f refGene -g /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz -c /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt -o snps/chr7.filtered.PASS.beagled.anno.vcf.gz --inputFormat vcf -p /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt
 +
The following parameters are available.  Ones with "[]" are in effect:
 +
 +
Available Options
 +
    Required Parameters :
 +
                          -i [/home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz]
 +
                          -o [snps/chr7.filtered.PASS.beagled.anno.vcf.gz]
 +
          Gene Annotation
 +
              Parameters : -g [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz]
 +
                          -r [/home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa]
 +
                          --inputFormat [vcf], --checkReference, -f [refGene]
 +
                          -p [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt]
 +
                          -c [/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt]
 +
                          -u [], -d [], --se [], --si [], --outputFormat []
 +
  Other Annotation Tools : --genomeScore [], --bed [], --tabix []
 +
Load reference genome /home/presenter02/day2/session1/uwcmg_2013_08/examples/chr7Ref/hs37d5.chr7.fa...
 +
DONE: 1 chromosomes and 159138663 bases are loaded.
 +
Load codon file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/codon.txt...
 +
DONE: codon file loaded.
 +
Load priority file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/priority.txt...
 +
DONE: 24 priority annotation types loaded.
 +
Load gene file /home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//share/EPACTS/hg19_gencodeV14.txt.gz...
 +
DONE: 92627 gene loaded.
 +
DONE: Generated frequency of each annotype type in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.anno.frq ].
 +
DONE: Generated frequency of each highest priority annotation type in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.top.anno.frq ].
 +
Ts/Tv ratio: 2.02397
 +
Ts observed: 929  times; Tv observed: 459 times.
 +
DONE: Generated frequency of each base change in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.base.frq ].
 +
DONE: Generated frequency of each codon change in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.codon.frq ].
 +
DONE: Generated frequency of indel length in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz.indel.frq ].
 +
..............................................       
 +
  ...      Anno(tation)                      ...     
 +
  ...      Xiaowei Zhan, Goncalo Abecasis    ...     
 +
    ...      Speical Thanks:                    ...   
 +
    ...      Hyun Ming Kang, Yanming Li        ...   
 +
      ...      zhanxw@umich.edu                    ... 
 +
      ...      Sep 2011                            ...
 +
        ................................................
 +
                                                       
 +
DONE: 1388 varaints are annotated.
 +
DONE: Generated annotation output in [ snps/chr7.filtered.PASS.beagled.anno.vcf.gz ].
 +
Annotation succeed!
 +
mv snps/chr7.filtered.PASS.beagled.anno.vcf.gz /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/bgzip -c /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp > snps/chr7.filtered.PASS.beagled.anno.vcf.gz
 +
/home/presenter02/day2/session1/uwcmg_2013_08/epacts/bin/..//bin/tabix -pvcf -f snps/chr7.filtered.PASS.beagled.anno.vcf.gz
 +
rm /home/presenter02/day2/session1/uwcmg_2013_08//out/snps/beagle/chr7/chr7.filtered.PASS.beagled.vcf.gz.tmp
 +
rm snps/chr7.filtered.PASS.beagled.anno.vcf.gz.log snps/chr7.filtered.PASS.beagled.anno.vcf.gz.top.anno.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.anno.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.base.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.codon.frq snps/chr7.filtered.PASS.beagled.anno.vcf.gz.indel.frq
 +
 +
You can look at example non-synonymous SNPs.
    
  % zcat snps/chr7.filtered.PASS.beagled.anno.vcf.gz | grep Nonsynonymous | grep CFTR | cut -f 1-8 | head -1
 
  % zcat snps/chr7.filtered.PASS.beagled.anno.vcf.gz | grep Nonsynonymous | grep CFTR | cut -f 1-8 | head -1
 +
 +
The following examples running gene-level burden test (skipped for the workshop),for your record.
    
  % ${GC}/epacts/bin/epacts make-group --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.grp –nonsyn
 
  % ${GC}/epacts/bin/epacts make-group --vcf snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out snps/chr7.filtered.PASS.beagled.anno.grp –nonsyn
Line 107: Line 663:  
== STEP 5 : All-in-one running of all scripts (for later use) ==  
 
== STEP 5 : All-in-one running of all scripts (for later use) ==  
   −
  % tar xzvf /home/presenter02/day2/session1/uwcmg_2013_08.tar.gz�
+
  % tar xzvf /home/presenter02/day2/session1/uwcmg_2013_08.tar.gz
   −
  % cd /home/presenter02/day2/session1/uwcmg_2013_08�
+
  % cd /home/presenter02/day2/session1/uwcmg_2013_08
   −
  % sh go.sh
+
  % sh go.sh

Navigation menu