EPACTS for DIAGRAM

From Genome Analysis Wiki
Revision as of 10:20, 19 September 2012 by Clement Ma (talk | contribs)

Motivation and Rationale

EPACTS is a software pipeline developed to perform various statistical tests for analysis of whole-genome / whole-exome sequencing data.  The main motivation for using EPACTS is to provide a consistent framework for association analysis across the DIAGRAM consortium.  In addition, for low-frequency variants (minor allele frequency [MAF] < 5%), the standard logistic regression Wald and likelihood ratio tests found in existing association software are conservative and anti-conservative, respectively.  We implemented two statistical tests recommended for analysis of low-frequency variants: (1) the logistic regression-based score test and (2) Firth bias-corrected logistic regression (Firth, 1993).  For common variants, any asymptotic logistic regression test has well-controlled type I error rates and asymptotically equivalent power.  For simplicity and consistency, we propose using both the score and Firth tests for variants at all allele frequencies.

Outline of analysis protocol

This is an overview of the protocol for analyzing imputed DIAGRAM datasets with the EPACTS pipeline.  We assume that your dataset has been imputed using minimac or Impute2.  Starting with the minimac or Impute2 output:

  1. Download and install EPACTS
  2. Convert the minimac or Impute2 output into VCF format
  3. Prepare PED file for phenotypes and covariates
  4. Run EPACTS association pipeline

1.  Download and install EPACTS

EPACTS is available for download here.

2.  Convert the minimac or Impute2 output into VCF format

[ Explanation wrapper program usage ]

An explanation of VCF files can be found here.

3.  Prepare PED file for phenotypes and covariates

EPACTS accepts the PED format supported by MERLIN or PLINK to represent the phenotypes and covariates.  You may prepare either (1) a PED file without column headers + accompanying DAT file, or (2) a PED file with column headers.  The standard PED format has 6 mandatory columns:

  1. Family ID
  2. Individual ID
  3. Paternal ID
  4. Maternal ID
  5. Sex (1=male; 2=female; other=unknown)
  6. Phenotype

Columns 7 onward hold additional covariates and/or phenotypes.  For example:

  7. QT
  8. AGE

etc.

An example PED file with a header is as follows:

#FAM_ID IND_ID FAT_ID MOT_ID SEX DISEASE QT AGE
13281 NA12344 NA12347 NA12348 1 1 94.17 66.1
13281 NA12347 0 0 1 1 109.54 44.0
13281 NA12348 0 0 2 2 119.40 46.6
1328 NA06984 0 0 1 2 87.72 39.3
1328 NA06989 0 0 2 1 100.60 41.7
1328 NA12329 NA06984 NA06989 2 1 100.85 46.4
13291 NA06986 0 0 1 2 91.94 61.9
13291 NA06995 NA07435 NA07037 1 2 104.36 57.4
13291 NA06997 NA06986 NA07045 2 2 107.53 53.1
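
As a rough illustration of the header-based layout, the following Python sketch reads such a file and checks that every row has as many fields as the header.  It is not part of EPACTS; the function name `read_ped_with_header` and the constant `MANDATORY` are hypothetical, and the sketch assumes whitespace-delimited fields and a header line that begins with `#`, as in the example above.

```python
import io

MANDATORY = 6  # family, individual, father, mother, sex, phenotype


def read_ped_with_header(handle):
    """Parse a whitespace-delimited PED file whose first line is a '#' header.

    Returns (header, rows), where rows is a list of dicts keyed by column name.
    All values are kept as strings; no type conversion is attempted.
    """
    header = None
    rows = []
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            if not fields[0].startswith("#"):
                raise ValueError("first non-blank line must be a '#' header")
            fields[0] = fields[0].lstrip("#")
            if not fields[0]:
                fields = fields[1:]  # header written as "# FAM_ID ..."
            if len(fields) < MANDATORY:
                raise ValueError("header needs at least 6 columns")
            header = fields
            continue
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(header, fields)))
    if header is None:
        raise ValueError("no header line found")
    return header, rows


# First three rows of the example PED file shown above.
EXAMPLE = """\
#FAM_ID IND_ID FAT_ID MOT_ID SEX DISEASE QT AGE
13281 NA12344 NA12347 NA12348 1 1 94.17 66.1
13281 NA12347 0 0 1 1 109.54 44.0
13281 NA12348 0 0 2 2 119.40 46.6
"""

header, rows = read_ped_with_header(io.StringIO(EXAMPLE))
print("extra columns:", header[MANDATORY:])
print("individuals:", len(rows))
```

A check like this can catch ragged rows before the file is handed to the association pipeline.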

Alternatively, you can prepare a PED file without a header and supply a corresponding DAT file that describes the columns.  The headerless PED file is:

13281 NA12344 NA12347 NA12348 1 1 94.17 66.1
13281 NA12347 0 0 1 1 109.54 44.0
13281 NA12348 0 0 2 2 119.40 46.6
1328 NA06984 0 0 1 2 87.72 39.3
1328 NA06989 0 0 2 1 100.60 41.7
1328 NA12329 NA06984 NA06989 2 1 100.85 46.4
13291 NA06986 0 0 1 2 91.94 61.9
13291 NA06995 NA07435 NA07037 1 2 104.36 57.4
13291 NA06997 NA06986 NA07045 2 2 107.53 53.1

The corresponding DAT file is:

M DISEASE
M QT
M AGE
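
The two layouts carry the same information, so one can be derived from the other.  The Python sketch below builds a header line from a DAT file and prepends it to a headerless PED file.  It is only an illustration under stated assumptions: each DAT line is taken to be a type code followed by a column name, the DAT entries are taken to describe the columns from the sixth onward in order (as in the example above), and the first five column names in `FIXED` are simply borrowed from the header example earlier on this page.  The function names are hypothetical and not part of EPACTS.

```python
# Names for the first five PED columns, borrowed from the header example
# earlier on this page (an assumption, not an EPACTS requirement).
FIXED = ["FAM_ID", "IND_ID", "FAT_ID", "MOT_ID", "SEX"]


def dat_column_names(dat_text):
    """Return the column names listed in a DAT file, in order."""
    names = []
    for line in dat_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected '<code> <name>', got: {line!r}")
        names.append(fields[1])  # fields[0] is the type code
    return names


def add_header(ped_text, dat_text):
    """Prepend a '#' header, built from the DAT file, to a headerless PED file."""
    columns = FIXED + dat_column_names(dat_text)
    body = [ln for ln in ped_text.splitlines() if ln.strip()]
    for n, ln in enumerate(body, start=1):
        if len(ln.split()) != len(columns):
            raise ValueError(f"PED row {n} does not have {len(columns)} fields")
    return "\n".join(["#" + " ".join(columns)] + body) + "\n"


DAT = "M DISEASE\nM QT\nM AGE\n"
PED = (
    "13281 NA12344 NA12347 NA12348 1 1 94.17 66.1\n"
    "13281 NA12347 0 0 1 1 109.54 44.0\n"
)

with_header = add_header(PED, DAT)
print(with_header, end="")
```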

4.  Run EPACTS association pipeline

The basic EPACTS command for single variant tests is:

perl epacts.pl single -vcf [INPUT VCF FILENAME] -ped [INPUT PED FILENAME] -out [OUTPUT FILENAME PREFIX] -test b.score -pheno DISEASE -cov AGE -sepchr -anno -run 10

This command runs single variant analysis with the logistic regression score test (-test b.score) on the DISEASE phenotype, adjusting for AGE.  The option -sepchr indicates that the VCF files are split by chromosome.  The option -anno annotates the results by functional category, and -run 10 runs the analysis on 10 CPUs in parallel.

To run the Firth test as well, use:

perl epacts.pl single -vcf [INPUT VCF FILENAME] -ped [INPUT PED FILENAME] -out [OUTPUT FILENAME PREFIX] -test b.firth -pheno DISEASE -cov AGE -sepchr -anno -run 10
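
Because the two commands differ only in the value passed to -test, they can be generated from one template.  The Python sketch below assembles both argument lists using only the options shown above; it does not run EPACTS.  The file names `input.vcf.gz` and `pheno.ped`, the prefix `diagram`, and the convention of appending the test name to the output prefix are hypothetical choices made for illustration.

```python
import shlex


def epacts_single_command(vcf, ped, out_prefix, test,
                          pheno="DISEASE", cov="AGE", jobs=10):
    """Build the argument list for one single-variant EPACTS run.

    Only options that appear in the commands above are used.
    """
    return [
        "perl", "epacts.pl", "single",
        "-vcf", vcf,
        "-ped", ped,
        "-out", out_prefix,
        "-test", test,
        "-pheno", pheno,
        "-cov", cov,
        "-sepchr",
        "-anno",
        "-run", str(jobs),
    ]


def both_test_commands(vcf, ped, out_prefix):
    """Return one command per test, each writing to its own output prefix."""
    return [
        # Appending the test name to the prefix is an illustrative convention
        # that keeps the two result sets apart.
        epacts_single_command(vcf, ped, f"{out_prefix}.{test}", test)
        for test in ("b.score", "b.firth")
    ]


# Hypothetical input names; substitute your own files.
for argv in both_test_commands("input.vcf.gz", "pheno.ped", "diagram"):
    print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to `subprocess.run` once the paths point at real files.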

For a detailed description of the options, use:

perl epacts.pl single -man