Changes

From Genome Analysis Wiki
Jump to navigationJump to search
Line 1: Line 1: −
(''WARNING: Under Construction'')
+
'''Note:''' the latest version of this practical is available at: [[SeqShop: Genetic Association Analysis Practical]]
 +
* The ones here is the original one from the June workshop (updated to be run from elsewhere)
 +
 
 +
 
 +
== Introduction ==
 +
[[Media:Seqshop association 2014 06.pdf|View Lecture Slides]]
 +
 
 +
[[Media:Seqshop association practice 2014 06.pdf|View Introductory Slides for Practical Session]]
    
== Goals of This Session ==
 
== Goals of This Session ==
 +
 
* Understand how to annotate variants using EPACTS
 
* Understand how to annotate variants using EPACTS
 
* Understand how to run single variant association analysis using EPACTS
 
* Understand how to run single variant association analysis using EPACTS
 
* Understand how to run rare variant association test using EPACTS
 
* Understand how to run rare variant association test using EPACTS
 
* Understand how to visualize the association output from EPACTS
 
* Understand how to visualize the association output from EPACTS
 +
 +
== Setup in person at the SeqShop Workshop ==
 +
''This section is specifically for the SeqShop Workshop computers.''
 +
<div class="mw-collapsible mw-collapsed" style="width:600px">
 +
''If you are not running during the SeqShop Workshop, please skip this section.''
 +
<div class="mw-collapsible-content">
    
{{SeqShopLogin}}
 
{{SeqShopLogin}}
   −
== Setup your run environment==
+
=== Setup your run environment===
 +
This is the same setup you did for the previous tutorial, but you need to redo it each time you log in.
   −
This is a  setup similar to the Structural Variant session.
+
This will setup some environment variables to point you to
 
+
* [[GotCloud]] program
* GotCloud program
   
* Tutorial input files
 
* Tutorial input files
 
* Setup an output directory
 
* Setup an output directory
Line 20: Line 34:  
* You won't see any output after running <code>source</code>
 
* You won't see any output after running <code>source</code>
 
** It silently sets up your environment
 
** It silently sets up your environment
** If you want to view the detail of the set up, type
+
** If you want to view the detail of the setup, type
  less /home/hmkang/seqshop/setup.txt
+
  less /home/mktrost/seqshop/setup.txt
 
and press 'q' to finish.
 
and press 'q' to finish.
 +
 
<div class="mw-collapsible mw-collapsed" style="width:200px">
 
<div class="mw-collapsible mw-collapsed" style="width:200px">
 
View setup.txt
 
View setup.txt
Line 32: Line 47:  
  export SV=/home/hmkang/seqshop/reference/svtoolkit
 
  export SV=/home/hmkang/seqshop/reference/svtoolkit
 
  export EXT=/home/hmkang/seqshop/external
 
  export EXT=/home/hmkang/seqshop/external
 +
export EPACTS=/home/hmkang/seqshop/epacts
 
  export OUT=~/out
 
  export OUT=~/out
 
  mkdir -p ${OUT}
 
  mkdir -p ${OUT}
 +
</div>
 +
</div>
 +
</div>
 +
</div>
 +
 +
 +
== Setup when running on your own outside of the SeqShop Workshop ==
 +
''This section is specifically for running on your own outside of the SeqShop Workshop.''
 +
<div class="mw-collapsible" style="width:600px">
 +
''If you are running during the SeqShop Workshop, please skip this section.''
 +
<div class="mw-collapsible-content">
 +
 +
This tutorial builds on the alignment & snpcall tutorials, if you have not already, please first run those tutorials: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014|Alignment Tutorial]] & [[SeqShop: Variant Calling and Filtering for SNPs Practical, June 2014|SNP Calling Tutorial]]
 +
 +
=== Download & Build EPACTS ===
 +
If you do not already have EPACTS:
 +
* cd to where you want EPACTS installed (you can change this to any directory you want)
 +
mkdir -p ~/seqshop
 +
cd ~/seqshop/
 +
* download, decompress, and build the version of epacts that was tested with this tutorial:
 +
wget http://csg.sph.umich.edu//kang/epacts/download/EPACTS-3.2.6.tar.gz
 +
tar xvf EPACTS-3.2.6.tar.gz
 +
cd EPACTS-3.2.6
 +
./configure --prefix=$HOME/seqshop/epacts
 +
make
 +
make install
 +
cd ../..
 +
 +
{{SeqShopRemoteEnv}}
 +
 +
<ul>
 +
<li> Additional variables for EPACTS:</li>
 +
<ul>
 +
<div class="mw-collapsible" style="width:500px">
 +
<li>Using bash (replace the paths below with the appropriate paths):</li>
 +
<div class="mw-collapsible-content">
 +
:<pre>export EPACTS=~/seqshop/epacts</pre>
 +
</div>
 +
</div>
 +
<div class="mw-collapsible mw-collapsed" style="width:500px">
 +
<li>Using tcsh (replace the paths below with the appropriate paths):</li>
 +
<div class="mw-collapsible-content">
 +
:<pre>setenv EPACTS ~/seqshop/epacts</pre>
 +
</div>
 +
</div>
 +
</ul>
 +
</ul>
 +
 
</div>
 
</div>
 
</div>
 
</div>
Line 44: Line 108:  
Check the contents of the VCF file using the following command.
 
Check the contents of the VCF file using the following command.
   −
  zless $OUT/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz
+
  zless ${OUT}/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz
    
=== Phenotype Information ===
 
=== Phenotype Information ===
Line 50: Line 114:  
Phenotype information is prepared in PED format commonly used in other GWAS software such as MERLIN or PLINK.
 
Phenotype information is prepared in PED format commonly used in other GWAS software such as MERLIN or PLINK.
   −
  less $IN/assoc/seqshop.ped
+
  less ${SS}/assoc/seqshop.ped
    
The first several line should look like below.
 
The first several line should look like below.
Line 57: Line 121:  
View Data
 
View Data
 
<div class="mw-collapsible-content" style="width:800px">
 
<div class="mw-collapsible-content" style="width:800px">
  #FAM_ID IND_ID DAD_ID MOM_ID SEX     GENO    PHENO
+
  #FAM_ID IND_ID DAD_ID MOM_ID SEX PHENO
  HG00551 HG00551 0       0       0       1      0
+
  HG00551 HG00551 0 0 0 0
  HG00553 HG00553 0       0       0       0      0
+
  HG00553 HG00553 0 0 0 0
  HG00554 HG00554 0       0       0       0      0
+
  HG00554 HG00554 0 0 0 0
  HG00637 HG00637 0       0      0       0       0
+
  HG00637 HG00637 0 0 0 0
  HG00638 HG00638 0       0      0       0       0
+
  HG00638 HG00638 0 0 0 0
  HG00640 HG00640 0       0       0       1      1
+
  HG00640 HG00640 0 0 0 1
  HG00641 HG00641 0       0       0       2      1
+
  HG00641 HG00641 0 0 0 1
  HG00734 HG00734 0       0       0       1      1
+
  HG00734 HG00734 0 0 0 1
  HG00736 HG00736 0       0      0      0      0
+
  HG00736 HG00736 0 0 0 0
HG00737 HG00737 0      0      0       0       0
   
  ...
 
  ...
 
</div>
 
</div>
Line 100: Line 163:  
   
 
   
 
  mkdir --p $OUT/assoc
 
  mkdir --p $OUT/assoc
  $EPACTS/bin/epacts-anno --in $OUT/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz --out $OUT/assoc/snps.anno.vcf.gz
+
  $EPACTS/bin/epacts-anno --in $OUT/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz --out $OUT/assoc/snps.anno.vcf.gz --ref $SS/ref22/human.g1k.v37.chr22.fa
    
Then you will see a series of messages before annotation finishes.
 
Then you will see a series of messages before annotation finishes.
Line 108: Line 171:  
<div class="mw-collapsible-content" style="width:800px">
 
<div class="mw-collapsible-content" style="width:800px">
 
  /home/hmkang/seqshop/epacts/bin/anno -i /net/seqshop-server/hmkang/out/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz -r \
 
  /home/hmkang/seqshop/epacts/bin/anno -i /net/seqshop-server/hmkang/out/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz -r \
     /home/hmkang/seqshop/epacts/share/EPACTS/human_g1k_v37.fasta -f refGene -g /home/hmkang/seqshop/epacts/share/EPACTS/hg19_gencodeV14.txt.gz \
+
     /home/hmkang/seqshop/ref22/human_g1k_v37.chr22.fa -f refGene -g /home/hmkang/seqshop/epacts/share/EPACTS/hg19_gencodeV14.txt.gz \
 
   -c /home/hmkang/seqshop/epacts/share/EPACTS/codon.txt -o  /net/seqshop-server/hmkang/out/assoc/snps.anno.vcf.gz --inputFormat vcf \
 
   -c /home/hmkang/seqshop/epacts/share/EPACTS/codon.txt -o  /net/seqshop-server/hmkang/out/assoc/snps.anno.vcf.gz --inputFormat vcf \
 
   -p /home/hmkang/seqshop/epacts/share/EPACTS/priority.txt
 
   -p /home/hmkang/seqshop/epacts/share/EPACTS/priority.txt
Line 119: Line 182:  
           Gene Annotation
 
           Gene Annotation
 
             Parameters : -g [/home/hmkang/seqshop/epacts/share/EPACTS/hg19_gencodeV14.txt.gz]
 
             Parameters : -g [/home/hmkang/seqshop/epacts/share/EPACTS/hg19_gencodeV14.txt.gz]
                           -r [/home/hmkang/seqshop/epacts/share/EPACTS/human_g1k_v37.fasta]
+
                           -r [/home/hmkang/seqshop/ref22/human_g1k_v37.chr22.fa]
 
                           --inputFormat [vcf], --checkReference, -f [refGene]
 
                           --inputFormat [vcf], --checkReference, -f [refGene]
 
                           -p [/home/hmkang/seqshop/epacts/share/EPACTS/priority.txt]
 
                           -p [/home/hmkang/seqshop/epacts/share/EPACTS/priority.txt]
Line 125: Line 188:  
                           -u [], -d [], --se [], --si [], --outputFormat []
 
                           -u [], -d [], --se [], --si [], --outputFormat []
 
   Other Annotation Tools : --genomeScore [], --bed [], --tabix []
 
   Other Annotation Tools : --genomeScore [], --bed [], --tabix []
  Load reference genome /home/hmkang/seqshop/epacts/share/EPACTS/human_g1k_v37.fasta...
+
  Load reference genome /home/hmkang/seqshop/ref22/human_g1k_v37.chr22.fa...
  DONE: 84 chromosomes and 3101804739 bases are loaded.
+
  DONE: 1 chromosomes and 51304566 bases are loaded.
 
  Load codon file /home/hmkang/seqshop/epacts/share/EPACTS/codon.txt...
 
  Load codon file /home/hmkang/seqshop/epacts/share/EPACTS/codon.txt...
 
  DONE: codon file loaded.
 
  DONE: codon file loaded.
Line 186: Line 249:  
== Single Variant Association Analysis ==
 
== Single Variant Association Analysis ==
   −
  $EPACTS/bin/epacts-single --ped $IN/assoc/seqshop.ped --vcf $OUT/assoc/snps.anno.vcf.gz --out $OUT/assoc/single --region 22:36000000-37000000 --test b.score --pheno PHENO --run 2  
+
Let's run a single-variant association analysis using a score test.
 +
 
 +
  $EPACTS/bin/epacts-single --ped $SS/assoc/seqshop.ped --vcf $OUT/assoc/snps.anno.vcf.gz --out $OUT/assoc/single --region 22:36000000-37000000 --test b.score --pheno PHENO --run 2  
 +
 
 +
After running it, you will see EPACTS output files by looking at
 +
 
 +
ls $OUT/assoc
 +
 
 +
The top association results can be viewed by
    
  head $OUT/assoc/single.epacts.top5000
 
  head $OUT/assoc/single.epacts.top5000
    +
<div class="mw-collapsible mw-collapsed" style="width:400px">
 +
View top association results
 +
<div class="mw-collapsible-content" style="width:800px">
 
  #CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE NS.CASE NS.CTRL AF.CASE AF.CTRL
 
  #CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE NS.CASE NS.CTRL AF.CASE AF.CTRL
 
  22 36995620 36995620 22:36995620_A/G 62 36 1 0.29032 5.6717e-09 5.8262 31 31 0.51613 0.064516
 
  22 36995620 36995620 22:36995620_A/G 62 36 1 0.29032 5.6717e-09 5.8262 31 31 0.51613 0.064516
Line 200: Line 274:  
  22 36998907 36998907 22:36998907_C/T 62 61 1 0.49194 0.00015557 -3.782 31 31 0.30645 0.67742
 
  22 36998907 36998907 22:36998907_C/T 62 61 1 0.49194 0.00015557 -3.782 31 31 0.30645 0.67742
 
  22 36667082 36667082 22:36667082_T/G 62 28 1 0.22581 0.0003506 -3.5747 31 31 0.080645 0.37097
 
  22 36667082 36667082 22:36667082_T/G 62 28 1 0.22581 0.0003506 -3.5747 31 31 0.080645 0.37097
 +
</div>
 +
</div>
   −
=== Visualizing Zoom Plot ===
+
You can look also visualize the results by QQ-plot and Manhattan plot
   −
* Use ENCODE GM12878 cell type to diplsy variants in the locusZoom-like plot
+
<div class="mw-collapsible mw-collapsed" style="width:400px">
 +
View QQ plots
 +
<div class="mw-collapsible-content" style="width:800px">
 +
:<pre>evince $OUT/assoc/single.epacts.qq.pdf&</pre>
 +
[[File:Single.epacts.qq.png]]
 +
</div>
 +
</div>
 +
 
 +
<div class="mw-collapsible mw-collapsed" style="width:400px">
 +
View Manhattan plots
 +
<div class="mw-collapsible-content" style="width:800px">
 +
:<pre>evince $OUT/assoc/single.epacts.mh.pdf&</pre>
 +
[[File:Single.epacts.mh.png|900px]]
 +
</div>
 +
</div>
 +
 
 +
Also, you can create a zoom plot focusing on the region of interest
 +
 
 +
$EPACTS/bin/epacts-zoom --vcf $OUT/assoc/snps.anno.vcf.gz --pos 22:36995620 --prefix $OUT/assoc/single
 +
 
 +
<div class="mw-collapsible mw-collapsed" style="width:400px">
 +
View Zoom Plots
 +
<div class="mw-collapsible-content" style="width:800px">
 +
:<pre>evince $OUT/assoc/single.zoom.22.36995620.pdf&</pre>
 +
[[File:Single.zoom.22.36995620.png]]
 +
</div>
 +
</div>
 +
 
 +
If you want to run EMMAX, you first need to create a kinship matrix
 +
 
 +
$EPACTS/bin/epacts-make-kin --vcf $OUT/assoc/snps.anno.vcf.gz --min-maf 0.01 --out $OUT/assoc/snps.anno.kinf --run 2 --chr 22
 +
 
 +
And run EMMAX test specifying the kinship matrix
 +
 
 +
$EPACTS/bin/epacts-single --ped $SS/assoc/seqshop.ped --vcf $OUT/assoc/snps.anno.vcf.gz --out $OUT/assoc/emmax --region 22:36000000-37000000 --test q.emmax --pheno PHENO --run 2 --kinf $OUT/assoc/snps.anno.kinf
   −
$S5/epacts/bin/epacts-zoom --vcf ~/out/snps/chr7.filtered.PASS.beagled.anno.vcf.gz --pos 7:117149147 --prefix ~/out/assoc/single.b.score --cellType Gm12878
+
Then the results may look similar to previous ones.
   −
Copy the output file single.b.score.zoom.7.117149147.pdf using WinSCP or other file transfer software
+
cat $OUT/assoc/emmax.epacts.top5000
    
== Run Groupwise Test ==
 
== Run Groupwise Test ==
   −
* Create a set of markers to test
+
To run group-wise test such as gene-level burden test, you need to create a marker group file. If your VCF is already annotated, you can create a group file by running
 +
 
 +
$EPACTS/bin/epacts make-group --vcf $OUT/assoc/snps.anno.vcf.gz --out $OUT/assoc/snps.anno.grp --nonsyn
 +
 
 +
The group file is simply a list of marker per group name, as shown below.
 +
 
 +
cat $OUT/assoc/snps.anno.grp
 +
APOL1 22:36655735_G/A 22:36657740_G/A 22:36661330_G/A 22:36661566_G/A 22:36661646_G/A 22:36661891_G/A 22:36661906_A/G
 +
APOL2 22:36623731_T/C 22:36623920_G/A 22:36629466_T/A 22:36633107_C/A
 +
APOL3 22:36537763_C/T 22:36537798_G/A 22:36556768_G/A 22:36556823_G/T
 +
APOL4 22:36587154_G/T 22:36587202_G/A 22:36587223_G/T 22:36587346_C/T 22:36587511_C/T 22:36587704_T/C 22:36587886_C/T 22:36593714_G/A 22:36597744_A/C 22:36598049_C/G 22:36598058_T/C 22:36598081_A/T
 +
APOL5 22:36122356_G/A 22:36122380_T/A 22:36122930_C/T 22:36123083_C/T 22:36124860_C/G
 +
FOXRED2 22:36900271_T/C 22:36900806_A/G
 +
MYH9 22:36681163_G/C 22:36684354_T/C 22:36710183_T/C
 +
Metazoa_SRP 22:36711990_C/G
 +
RBFOX2 22:36424450_A/C
 +
RP4-633O19__A.1 22:36792162_G/A
 +
 
 +
If you have your own annotation, you can create your own burden test unit by modifying this file.
 +
 
 +
If you want to run a collapsing burden test (CMC), run the following command
 +
 
 +
$EPACTS/bin/epacts group --ped $SS/assoc/seqshop.ped --vcf $OUT/assoc/snps.anno.vcf.gz --out $OUT/assoc/group.collapse --test b.collapse --groupf $OUT/assoc/snps.anno.grp --pheno PHENO --run 2
   −
  $S5/epacts/bin/epacts make-group --vcf ~/out/snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out ~/out/snps/chr7.filtered.PASS.beagled.anno.grp --nonsyn
+
You can view the results by examining the output file
 +
  cat $OUT/assoc/group.collapse.epacts
   −
* Run SKAT-O test
+
<div class="mw-collapsible mw-collapsed" style="width:400px">
 +
View Output file
 +
<div class="mw-collapsible-content" style="width:800px">
 +
#CHROM BEGIN END MARKER_ID NS FRAC_WITH_RARE NUM_ALL_VARS NUM_PASS_VARS NUM_SING_VARS PVALUE STATRHO
 +
22 36655735 36661906 22:36655735-36661906_APOL1 62 0.14516 7 4 0 0.42748 1
 +
22 36623731 36633107 22:36623731-36633107_APOL2 62 0.064516 4 1 0 0.038657 NA
 +
22 36537763 36556823 22:36537763-36556823_APOL3 62 0.080645 4 2 0 0.40634 0
 +
22 36587154 36598081 22:36587154-36598081_APOL4 62 0.14516 12 4 0 0.67891 0
 +
22 36122356 36124860 22:36122356-36124860_APOL5 62 0.1129 5 2 0 0.15055 0.3
 +
22 36900271 36900806 22:36900271-36900806_FOXRED2 NA NA 2 0 0 NA NA
 +
22 36681163 36710183 22:36681163-36710183_MYH9 62 0.032258 3 1 0 1 NA
 +
22 36711990 36711990 22:36711990-36711990_Metazoa_SRP NA NA 1 0 0 NA NA
 +
22 36424450 36424450 22:36424450-36424450_RBFOX2 62 0.032258 1 1 0 1 NA
 +
22 36792162 36792162 22:36792162-36792162_RP4-633O19__A.1 NA NA 1 0 0 NA NA
 +
</div>
 +
</div>
   −
$S5/epacts/bin/epacts group --ped $S5/examples/index/chr7.CFTR.ped --vcf ~/out/snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out ~/out/assoc/group.skat.o --groupf ~/out/snps/chr7.filtered.PASS.beagled.anno.grp --test skat --skat-o --run 2
+
You can run SKAT-O test in a similar way, but with a special tag
   −
  cat ~/out/assoc/group.skat.o.epacts | cut -f 4,6,10,11
+
  $EPACTS/bin/epacts group --ped $SS/assoc/seqshop.ped --vcf $OUT/assoc/snps.anno.vcf.gz --out $OUT/assoc/group.skato --test skat --skat-o --groupf $OUT/assoc/snps.anno.grp --pheno PHENO --run 2
   −
* Run Variable Threshold test
+
And view output files
  $S5/epacts/bin/epacts group --ped $S5/examples/index/chr7.CFTR.ped --vcf ~/out/snps/chr7.filtered.PASS.beagled.anno.vcf.gz --out ~/out/assoc/group.VT --groupf ~/out/snps/chr7.filtered.PASS.beagled.anno.grp --test VT --run 2
+
   
 +
cat $OUT/assoc/group.skato.epacts
   −
  cat ~/out/assoc/group.VT.epacts | cut -f 4,6,10,13
+
<div class="mw-collapsible mw-collapsed" style="width:400px">
 +
View Output file
 +
<div class="mw-collapsible-content" style="width:800px">
 +
#CHROM BEGIN END MARKER_ID NS FRAC_WITH_RARE NUM_ALL_VARS NUM_PASS_VARS NUM_SING_VARS PVALUE STATRHO
 +
22 36655735 36661906 22:36655735-36661906_APOL1 62 0.14516 7 4 0 0.42748 1
 +
22 36623731 36633107 22:36623731-36633107_APOL2 62 0.064516 4 1 0 0.038657 NA
 +
  22 36537763 36556823 22:36537763-36556823_APOL3 62 0.080645 4 2 0 0.40634 0
 +
22 36587154 36598081 22:36587154-36598081_APOL4 62 0.14516 12 4 0 0.67891 0
 +
22 36122356 36124860 22:36122356-36124860_APOL5 62 0.1129 5 2 0 0.15055 0.3
 +
22 36900271 36900806 22:36900271-36900806_FOXRED2 NA NA 2 0 0 NA NA
 +
22 36681163 36710183 22:36681163-36710183_MYH9 62 0.032258 3 1 0 1 NA
 +
22 36711990 36711990 22:36711990-36711990_Metazoa_SRP NA NA 1 0 0 NA NA
 +
22 36424450 36424450 22:36424450-36424450_RBFOX2 62 0.032258 1 1 0 1 NA
 +
22 36792162 36792162 22:36792162-36792162_RP4-633O19__A.1 NA NA 1 0 0 NA NA
 +
</div>
 +
</div>
96

edits

Navigation menu