Tutorial: EMMAX GotCloud STOM: Lecture 2

From Genome Analysis Wiki
Jump to: navigation, search

STOM 2014 Workshop - Practical Sessions 2

Lecture 2

The slides describing the notes below are available here (PDF)

Basic Setup

  • To see the files for the session, type
ls /data/stom2014/session2/

If you see any errors, please let me know now!

  • For convenience, let’s set some variables
export S2=/data/stom2014/session2
mkdir ~/out

Naive Association Test

  • Run naive association test using PLINK
$S2/bin/plink --noweb --bfile $S2/data/1000G.auto.omni.phased.EUR --pheno $S2/data/1000G_EUR_20_1459060.phe --linear --out ~/out/naive
  • Check your output file and see what it looks like
less ~/out/naive.assoc.linear
  • Check the p-value at the causal variant
grep -w ADD ~/out/naive.assoc.linear | grep 20:1459060
  • Draw QQ plot using the following R commands
> source('/data/stom2014/session2/r/qqconf.r')
> T <- read.table('~/out/naive.assoc.linear',header=TRUE)
> pdf('~/out/naive.pdf')
> qq.conf.beta(T$P)
> dev.off()

Genomic Control

  • Add --adjust option to enable genomic control
$S2/bin/plink --noweb --bfile $S2/data/1000G.auto.omni.phased.EUR --pheno $S2/data/1000G_EUR_20_1459060.phe --linear --adjust --out ~/out/naive
  • Calculate inflation factor on your own
> T <- read.table('~/out/naive.assoc.linear',header=TRUE)
    • First, find the median p-value
> median(T$P)
> 0.4814
    • Convert p-value into chi-square using R, and compute lambda
> qchisq(0.4814,1,lower.tail=FALSE)
[1] 0,4956901
> 0.4958032/0.456
[1] 1.08704
  • Check out the custom script to calculate inflation factor
less $S2/r/calc.GC.lambda.r
    • Feed the p-values from association results
cut -c 96- ~/out/naive.assoc.linear | Rscript $S2/r/calc.GC.lambda.r

Principal Component Analysis

  • Convert PLINK format file to EMMAX-compatible format to obtain PCs
$S2/bin/plink --noweb --bfile $S2/data/1000G.auto.omni.phased.EUR --recode12 --output-missing-genotype 0 --transpose --out ~/out/1000G.auto.omni.phased.EUR
  • Create kinship matrix using EMMAX

$S2/bin/emmax-kin-intel64 -T 1 -M 0.2 -v -d 10 ~/out/1000G.auto.omni.phased.EUR less ~/out/1000G.auto.omni.phased.EUR.aBN.kinf

  • Calculate principal component using custom script
Rscript $S2/r/calc.PC.from.kinf.r ~/out/1000G.auto.omni.phased.EUR.aBN.kinf ~/out/1000G.auto.omni.phased.EUR.tfam ~/out/1000G.auto.omni.phased.EUR.pc10
  • Check out how the PCA outcome looks like
less ~/out/1000G.auto.omni.phased.EUR.pc10
  • Visualize the population structure using PCs with custom script
Rscript $S2/r/plot_pc_pop.r ~/out/1000G.auto.omni.phased.EUR.pc10 $S2/data/1000G.auto.omni.phased.EUR.pop ~/out/1000G.auto.omni.phased.EUR.pc10.pdf
  • Use PCs as covariates to adjust for PCs
$S2/bin/plink --noweb --bfile $S2/data/1000G.auto.omni.phased.EUR --pheno $S2/data/1000G_EUR_20_1459060.phe --covar ~/out/1000G.auto.omni.phased.EUR.pc10 --covar-number 1,2,3,4 --linear --adjust --out ~/out/pca
  • Check out the p-value at the causal variant and inflation of statistics
grep -w ADD ~/out/pca.assoc.linear | grep 20:1459060 
grep -w ADD ~/out/pca.assoc.linear | cut -c 96- | Rscript $S2/r/calc.GC.lambda.r

Mixed Model Association

  • Run EMMAX association

$S2/bin/emmax-intel64 -t ~/out/1000G.auto.omni.phased.EUR -o ~/out/emmax -p $S2/data/1000G_EUR_20_1459060.phe -k ~/out/1000G.auto.omni.phased.EUR.aBN.kinf

  • Check the inflation factor
cut -f 4 ~out/emmax.ps | Rscript $S2/r/calc.GC.lambda.r
[1] 1.006079
  • Draw and compare multiple QQ plots using the R function provided
> source('/data/stom2014/session2/r/qqconf.r')
> T1 <- read.table('~/out/naive.assoc.linear',header=TRUE)
> T2 <- read.table('~/out/pca.assoc.linear',header=TRUE)
> T3 <- read.table('~/out/emmax.ps')
> pdf('~/out/all.pdf')
> qq.conf.beta(T1$P)
> qq.conf.beta(T2$P,drawaxis=FALSE,ptcolor="blue")
> qq.conf.beta(T3$V4,drawaxis=FALSE,ptcolor="red")
> dev.off()