Difference between revisions of "EPACTS for DIAGRAM"

From Genome Analysis Wiki

Revision as of 11:54, 18 September 2012

Motivation and Rationale

EPACTS is a software pipeline developed to perform various statistical tests for analysis of whole-genome / whole-exome sequencing data.  The main motivation for using EPACTS is to use a consistent analysis framework for association analysis in the DIAGRAM consortium.  In addition, for analysis of low frequency variants (minor allele frequency [MAF] < 5%), the standard logistic regression Wald and likelihood ratio tests found in existing association software are conservative and anti-conservative, respectively.  We implemented two statistical tests recommended for analysis of low frequency variants: (1) a logistic regression-based score test and (2) Firth bias-corrected logistic regression (Firth, 1993).  For analysis of common variants, any asymptotic logistic regression test has well-controlled type I error rates and asymptotically equivalent power.  For simplicity and consistency, we propose the use of both score and Firth tests for variants of all allele frequencies.

Outline of analysis protocol

This is an overview of the protocol for analyzing imputed DIAGRAM datasets using the EPACTS pipeline.  We assume that your dataset has been imputed using minimac or Impute2.  Starting with minimac or Impute2 output:

  1. Download and install EPACTS
  2. Convert the minimac or Impute2 output into VCF format
  3. Prepare PED file for phenotypes and covariates
  4. Run EPACTS association pipeline

1.  Download and install EPACTS

EPACTS is available for download here.

2.  Convert the minimac or Impute2 output into VCF format

[ Explanation wrapper program usage ]

An explanation of VCF files can be found here.

3.  Prepare PED file for phenotypes and covariates

EPACTS accepts the PED format supported by MERLIN or PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml) to represent the phenotypes and covariates.  You may prepare either (1) a PED file without column headers plus an accompanying DAT file, or (2) a PED file with column headers.  The standard PED format has 6 mandatory columns:

  1. Family ID
  2. Individual ID
  3. Paternal ID
  4. Maternal ID
  5. Sex (1=male; 2=female; other=unknown)
  6. Phenotype

Columns 7 and onwards hold covariate information.  For example:

  1. AGE
  2. SEX
  3. PC1
  4. PC2

etc.
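
As an illustration of option (2), the following is a minimal Python sketch that writes a tab-delimited PED file with a header line: the six mandatory columns followed by covariate columns.  The column names (FAM_ID, IND_ID, FAT_ID, MOT_ID, PHENO), the leading '#' on the header line, the tab delimiter, the function name write_ped and all sample values are assumptions made for this example only; please check them against the EPACTS documentation before use.  The covariates here are AGE, PC1 and PC2.

```python
import csv
import io

# Hypothetical names for the six mandatory PED columns (assumed, not prescribed by EPACTS).
MANDATORY = ["FAM_ID", "IND_ID", "FAT_ID", "MOT_ID", "SEX", "PHENO"]


def write_ped(samples, covariates, handle):
    """Write samples as a tab-delimited PED file with a header line.

    samples    -- list of dicts keyed by column name
    covariates -- names of the covariate columns (column 7 onwards)
    handle     -- any writable text file object
    """
    header = MANDATORY + list(covariates)
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Assumption: the header line is marked with a leading '#'.
    writer.writerow(["#" + header[0]] + header[1:])
    for sample in samples:
        # A missing key raises KeyError, so incomplete records are caught early.
        writer.writerow([sample[name] for name in header])


# Made-up example records; sex is coded 1=male, 2=female as in the list above.
samples = [
    {"FAM_ID": "FAM1", "IND_ID": "IND1", "FAT_ID": 0, "MOT_ID": 0,
     "SEX": 1, "PHENO": 2, "AGE": 54, "PC1": 0.012, "PC2": -0.034},
    {"FAM_ID": "FAM2", "IND_ID": "IND2", "FAT_ID": 0, "MOT_ID": 0,
     "SEX": 2, "PHENO": 1, "AGE": 61, "PC1": -0.008, "PC2": 0.021},
]

buf = io.StringIO()
write_ped(samples, ["AGE", "PC1", "PC2"], buf)
ped_text = buf.getvalue()
print(ped_text, end="")
```

To write to disk instead of an in-memory buffer, pass an open file object (for example, one returned by open() in text mode) as the handle argument.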