Difference between revisions of "EPACTS for DIAGRAM"

From Genome Analysis Wiki
Jump to navigationJump to search
Line 15: Line 15:
 
== 1.  Download and install EPACTS  ==
 
== 1.  Download and install EPACTS  ==
  
EPACTS is available for download [http://www.sph.umich.edu/csg/kang/epacts/download/index.html here].
+
EPACTS is available for download [http://www.sph.umich.edu/csg/kang/epacts/download/index.html here].  
  
For users in CSG, EPACTS can be found here:
+
For users in CSG, EPACTS can be found here:  
<pre>/net/fantasia/home/hmkang/sw/epacts2/</pre>
+
<pre>/net/fantasia/home/hmkang/sw/epacts2/</pre>  
 +
<br> Once installed, test out the software by running a quick example using the test data provided in the "example" directory. The example VCF and PED files are:
 +
<pre>$ epacts2/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz
  
 
+
$ epacts2/example/1000G_dummy_pheno.ped
Once installed, test out the software by running a quick example using the test data provided in the "example" directory. The example VCF and PED files are:
 
<pre>${EPACTS_DIRECTORY}/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz
 
 
 
${EPACTS_DIRECTORY}/example/1000G_dummy_pheno.ped
 
 
</pre>  
 
</pre>  
 
<br> Run the single variant score test on the example data using this command:  
 
<br> Run the single variant score test on the example data using this command:  
<pre>${EPACTS_DIR}/epacts single  
+
<pre>$ epacts2/epacts single  
--vcf {EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz  
+
--vcf epacts2/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz  
--ped {EPACTS_DIR}/example/1000G_dummy_pheno.ped  
+
--ped epacts2/example/1000G_dummy_pheno.ped  
 
--min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno  
 
--min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno  
 
--out {OUTPUT_DIR}/test --run 2 &amp;
 
--out {OUTPUT_DIR}/test --run 2 &amp;

Revision as of 17:21, 26 September 2012

Motivation and Rationale

EPACTS is a software pipeline developed to perform various statistical tests for analysis of whole-genome / whole-exome sequencing data.  The main motivation for using EPACTS is to use a consistent analysis framework for association analysis in the DIAGRAM consortium.  In addition, for analysis of low frequency variants (minor allele frequency [MAF] < 5%), standard logistic regression Wald or likelihood ratio tests found in existing association software are conservative or anti-conservative respectively.  We implemented two statistical tests recommended for analysis of low frequency variants: (1) logistic regresion-based score test and (2) Firth bias-corrected logistic regression (Firth, 1993).  For analysis of common variants, any asyptotic logistic regression test has well-controlled type I error rates and asymptotically equivalent power.  For simplicity and consistency, we propose the use of both score and Firth tests for testing all allele frequencies.

Outline of analysis protocol

This is an overview of the analysis protocol for analyzing imputed DIAGRAM datasets using the EPACTS pipeline.  We assume that your dataset has been imputed using minimac or Impute2.  Starting with minimac or impute2 output:

  1. Download and install EPACTS
  2. Convert the minimac or Impute2 output into VCF format
  3. Prepre PED file for phenotypes and covariates
  4. Run EPACTS association pipeline
  5. Report association results in appropriate format

1.  Download and install EPACTS

EPACTS is available for download here.

For users in CSG, EPACTS can be found here:

/net/fantasia/home/hmkang/sw/epacts2/


Once installed, test out the software by running a quick example using the test data provided in the "example" directory. The example VCF and PED files are:

$ epacts2/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz

$ epacts2/example/1000G_dummy_pheno.ped


Run the single variant score test on the example data using this command:

$ epacts2/epacts single 
--vcf epacts2/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 
--ped epacts2/example/1000G_dummy_pheno.ped 
--min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno 
--out {OUTPUT_DIR}/test --run 2 &

This command will run the single variant test on the input VCF and PED files, with a minimum MAF threshold of 0.001.  The phenotype is "DISEASE" and we are adjusting the analysis with covariates AGE and SEX.  The output file directory prefix is {OUTPUT_DIR}/test.  Finally, EPACTS will run the analysis in parallel on 2 CPUs.

2.  Convert the minimac or Impute2 output into VCF format

[ Explanation wrapper program usage ]

An explanation of VCF files can be found here.

3.  Prepare PED file for phenotypes and covariates

EPACTS accepts the PED format supported by MERLIN or PLINK to represent the phenotypes and covariates.  You may prepare either (1) a PED file without column headers + accompanying DAT file, or (2) a PED file with column headers.  The standard PED format has 6 mandatory columns:

  1. Family ID
  2. Individual ID
  3. Paternal ID
  4. Maternal ID
  5. Sex (1=male; 2=female; other=unknown)
  6. Phenotype

Columns 7 and onwards are additonal covariates and or phenotypes.  For example

  1. QT
  2. AGE

etc.

An example PED file with a header is as follows:

#FAM_ID IND_ID FAT_ID MOT_ID SEX DISEASE QT AGE
13281 NA12344 NA12347 NA12348 1 1 94.17 66.1
13281 NA12347 0 0 1 1 109.54 44.0
13281 NA12348 0 0 2 2 119.40 46.6
1328 NA06984 0 0 1 2 87.72 39.3
1328 NA06989 0 0 2 1 100.60 41.7
1328 NA12329 NA06984 NA06989 2 1 100.85 46.4
13291 NA06986 0 0 1 2 91.94 61.9
13291 NA06995 NA07435 NA07037 1 2 104.36 57.4
13291 NA06997 NA06986 NA07045 2 2 107.53 53.1

Alternatively, you can prepare a PED file without a header, and include a corresponding DAT file describing the column headers.

13281 NA12344 NA12347 NA12348 1 1 94.17 66.1
13281 NA12347 0 0 1 1 109.54 44.0
13281 NA12348 0 0 2 2 119.40 46.6
1328 NA06984 0 0 1 2 87.72 39.3
1328 NA06989 0 0 2 1 100.60 41.7
1328 NA12329 NA06984 NA06989 2 1 100.85 46.4
13291 NA06986 0 0 1 2 91.94 61.9
13291 NA06995 NA07435 NA07037 1 2 104.36 57.4
13291 NA06997 NA06986 NA07045 2 2 107.53 53.1

The corresponding DAT file is:

M DISEASE
M QT
M AGE

4.  Run EPACTS association pipeline

The basic EPACTS command for single variant tests is:

perl epacts.pl single -vcf [INPUT VCF FILENAME] -ped [INPUT PED FILENAME] -out [OUTPUT FILENAME PREFIX] -test b.score -pheno DISEASE -cov AGE -sepchr -anno -min-mac 1 -run 10

This command will run single variant analysis using the score test logistic regression on the DISEASE phenotype adjusting for AGE.  The VCF files are separated by chromosomes (option -sepchr).  All variants with at least one minor allele count will be analyzed (option -min-mac 1).  It will annotate results by functional category (option -anno) and run the analysis on 10 parallel CPUs (option -run 10).

To run the Firth test as well, use:

perl epacts.pl single -vcf [INPUT VCF FILENAME] -ped [INPUT PED FILENAME] -out [OUTPUT FILENAME PREFIX] -test b.firth -pheno DISEASE -cov AGE -sepchr -anno -run 10

For detailed description of options, use:

perl epacts.pl single -man

5.  Report association results in appropriate format

Once the EPACTS pipeline has completed, you will find all of the results in the .epacts file.  Here is an example of score test results:

#CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE
20 68303 68303 20:68303_A/G_Upstream:DEFB125 266 1 1 0.0018797 NA NA
20 68319 68319 20:68319_C/A_Upstream:DEFB125 266 0 1 0 NA NA
20 68396 68396 20:68396_C/T_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA
20 76635 76635 20:76635_A/T_Intron:DEFB125 266 0 1 0 NA NA
20 76689 76689 20:76689_T/C_Synonymous:DEFB125 266 0 1 0 NA NA
20 76690 76690 20:76690_T/C_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA
20 76700 76700 20:76700_G/A_Nonsynonymous:DEFB125 266 0 1 0 NA NA
20 76726 76726 20:76726_C/G_Nonsynonymous:DEFB125 266 0 1 0 NA NA
20 76771 76771 20:76771_C/T_Nonsynonymous:DEFB125 266 3 1 0.0056391 0.68484 0.40587

You will need to parse the information into the format as required here [NEED FINAL DRAFT OF ASSOCIATION ANALYSIS FILE AND COLUMN SPECIFICATIONS].