Difference between revisions of "Test EPACTS for DIAGRAM"

From Genome Analysis Wiki
 
(11 intermediate revisions by one other user not shown)
Line 1: Line 1:
 
= Download EPACTS  =
 
= Download EPACTS  =
  
EPACTS is available for download [http://www.sph.umich.edu/csg/kang/epacts/download/epacts_v2.1.noref_binary.2012_09_27.tar.gz here (100Mb) ].  
+
EPACTS is available for download [http://csg.sph.umich.edu//kang/epacts/download/epacts_v2.12.noref_binary.2012_10_01.tar.gz here (100Mb) ].  
  
 
Requirements  
 
Requirements  
Line 11: Line 11:
  
 
Uncompress EPACTS package to the directory you would like to install  
 
Uncompress EPACTS package to the directory you would like to install  
 
+
<pre> tar xzvf epacts_v2_12.noref_binary.2012_10_01.tar.gz</pre>
  tar xzvf epacts_v2_01.noref_binary.2012_09_27.tar.gz
+
Download the reference FASTA files from 1000 Genomes FTP automatically by running the following commands<br>
 +
<pre>cd epacts2.1/
 +
./ref_download.sh
 +
(For advanced users, to save time for downloading the FASTA files (~900MB), you may copy a local copy of GRCh37 FASTA file and the index file to ${EPACTS_DIR}/ext/ref/)
 +
</pre>
  
 
= Accessing help  =
 
= Accessing help  =
Line 51: Line 55:
 
</pre>
 
</pre>
  
= Getting started in EPACTS with an example =
+
= Getting started in EPACTS with an example =
  
 
Once installed, test out the software by running a quick example using the test data provided in the "example" directory. The example VCF and PED files are:  
 
Once installed, test out the software by running a quick example using the test data provided in the "example" directory. The example VCF and PED files are:  
<pre>$ epacts2/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz
+
<pre>$ epacts2.1/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz
  
$ epacts2/example/1000G_dummy_pheno.ped
+
$ epacts2.1/example/1000G_dummy_pheno.ped
 
</pre>  
 
</pre>  
 
<br> Run the single variant score test on the example data using this command:  
 
<br> Run the single variant score test on the example data using this command:  
<pre>$ epacts2/epacts single \
+
<pre>$ epacts2.1/epacts single \
--vcf epacts2/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
+
--vcf epacts2.1/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
--ped epacts2/example/1000G_dummy_pheno.ped \
+
--ped epacts2.1/example/1000G_dummy_pheno.ped \
 
--min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno \
 
--min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno \
 
--out {OUTPUT_DIR}/test --run 2 &amp;
 
--out {OUTPUT_DIR}/test --run 2 &amp;
Line 72: Line 76:
  
 
1. &nbsp;'''test.epacts.gz'''&nbsp;contains all the association results.  
 
1. &nbsp;'''test.epacts.gz'''&nbsp;contains all the association results.  
<pre>$ head test.epacts
+
<pre>$ head {OUTPUT_DIR}/test.epacts
#CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE
+
#CHROM BEGIN   END     MARKER_ID       NS     AC     CALLRATE       MAF     PVALUE SCORE   N.CASE  N.CTRL  AF.CASE AF.CTRL
20 68303 68303 20:68303_A/G_Upstream:DEFB125 266 1 1 0.0018797 NA NA
+
20     68303   68303   20:68303_A/G_Upstream:DEFB125   266     1       1       0.0018797       NA     NA      NA      NA      NA      NA
20 68319 68319 20:68319_C/A_Upstream:DEFB125 266 0 1 0 NA NA
+
20     68319   68319   20:68319_C/A_Upstream:DEFB125   266     0       1       0       NA     NA      NA      NA      NA      NA
20 68396 68396 20:68396_C/T_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA
+
20     68396   68396   20:68396_C/T_Nonsynonymous:DEFB125     266     1       1       0.0018797       NA      NA      NA      NA      NA     NA
20 76635 76635 20:76635_A/T_Intron:DEFB125 266 0 1 0 NA NA
+
20     76635   76635   20:76635_A/T_Intron:DEFB125     266     0       1       0       NA     NA      NA      NA      NA      NA
20 76689 76689 20:76689_T/C_Synonymous:DEFB125 266 0 1 0 NA NA
+
20     76689   76689   20:76689_T/C_Synonymous:DEFB125 266     0       1       0       NA     NA      NA      NA      NA      NA
20 76690 76690 20:76690_T/C_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA
+
20     76690   76690   20:76690_T/C_Nonsynonymous:DEFB125     266     1       1       0.0018797       NA      NA      NA     NA      NA      NA
20 76700 76700 20:76700_G/A_Nonsynonymous:DEFB125 266 0 1 0 NA NA
+
20     76700   76700   20:76700_G/A_Nonsynonymous:DEFB125     266     0       1       0       NA      NA      NA      NA     NA      NA
20 76726 76726 20:76726_C/G_Nonsynonymous:DEFB125 266 0 1 0 NA NA
+
20     76726   76726   20:76726_C/G_Nonsynonymous:DEFB125     266     0       1       0       NA      NA      NA      NA      NA     NA
20 76771 76771 20:76771_C/T_Nonsynonymous:DEFB125 266 3 1 0.0056391 0.68484 0.40587
+
20     76771   76771   20:76771_C/T_Nonsynonymous:DEFB125     266     3       1       0.0056391       0.68484 0.40587 145    121    0.013793        0.0082645
  
 
</pre>  
 
</pre>  
 +
The columns in the results file are:
 +
 +
#CHROM: &nbsp;chromosome
 +
#BEGIN: &nbsp;starting position
 +
#END: ending position (same as BEGIN if a SNP)
 +
#MARKER_ID: &nbsp;name of varian
 +
#NS: &nbsp;Number of samples (cases + controls)
 +
#AC: &nbsp;Total allele count in sample
 +
#CALLRATE: &nbsp;call rate
 +
#MAF: &nbsp;minor allele frequency in full sample
 +
#PVALUE: &nbsp;score test association p-value
 +
#SCORE: &nbsp;test statistic for score test
 +
#N.CASE: &nbsp;number of cases
 +
#N.CTRL: &nbsp;number of controls
 +
#AF.CASE: &nbsp;allele frequency in cases only
 +
#AF.CTRL: &nbsp;allele frequency in controls only
 +
 +
Note: &nbsp;For variants below the minimum MAF threshold (min-maf = 0.001), the number of cases and controls (N.CASE, N.CTRL) are not outputted (listed as "NA").
 +
 +
<br>
 +
 
2. &nbsp;'''test.epacts.top5000''' contains the top 5000 associated variants ordered by p-value.  
 
2. &nbsp;'''test.epacts.top5000''' contains the top 5000 associated variants ordered by p-value.  
<pre>$ head out/test.single.b.score.epacts.top5000  
+
<pre>$ head {OUTPUT_DIR}/test.epacts.top5000  
#CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE
+
#CHROM BEGIN   END     MARKER_ID       NS     AC     CALLRATE       MAF     PVALUE SCORE   N.CASE  N.CTRL  AF.CASE AF.CTRL
20 1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266 136 1 0.25564 0.0001097 3.8681
+
20     1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266     136     1       0.25564 0.0001097       3.8681 145    121    0.64138 0.35537
20 4162411 4162411 20:4162411_T/C_Intron:SMOX 266 204 1 0.38346 0.00055585 -3.4523
+
20     4162411 4162411 20:4162411_T/C_Intron:SMOX     266     204     1       0.38346 0.00055585     -3.4523 145    121    0.62759 0.93388
20 34061918 34061918 20:34061918_T/C_Intron:CEP250 266 39 1 0.073308 0.0011231 3.2577
+
20     34061918       34061918       20:34061918_T/C_Intron:CEP250   266     39     1       0.073308       0.0011231       3.2577 145    121    0.21379 0.066116
20 4155948 4155948 20:4155948_G/A_Intron:SMOX 266 215 1 0.40414 0.0020791 -3.0787
+
20     4155948 4155948 20:4155948_G/A_Intron:SMOX     266     215     1       0.40414 0.0020791       -3.0787 145    121    0.68276 0.95868
20 4680251 4680251 20:4680251_A/G_Nonsynonymous:PRNP 266 186 1 0.34962 0.0025962 3.0119
+
20     4680251 4680251 20:4680251_A/G_Nonsynonymous:PRNP       266     186     1       0.34962 0.0025962       3.0119 145    121    0.8069  0.57025
20 36668874 36668874 20:36668874_G/A_Synonymous:RPRD1B 266 96 1 0.18045 0.003031 2.9646
+
20     36668874       36668874       20:36668874_G/A_Synonymous:RPRD1B       266     96     1       0.18045 0.003031       2.9646 145    121    0.44828 0.2562
20 36641871 36641871 20:36641871_G/A_Synonymous:TTI1 266 10 1 0.018797 0.004308 -2.8547
+
20     36641871       36641871       20:36641871_G/A_Synonymous:TTI1 266     10     1       0.018797       0.004308       -2.8547 145    121    0.0068966      0.07438
20 32664926 32664926 20:32664926_G/A_Nonsynonymous:RALY 266 20 1 0.037594 0.0046365 2.8313
+
20     32664926       32664926       20:32664926_G/A_Nonsynonymous:RALY     266     20     1       0.037594       0.0046365       2.8313 145    121    0.11724 0.024793
20 34288854 34288854 20:34288854_C/T_Utr3:ROMO1 266 28 1 0.052632 0.0047722 2.822
+
20     34288854       34288854       20:34288854_C/T_Utr3:ROMO1     266     28     1       0.052632       0.0047722       2.822   145    121    0.15862 0.041322
 +
 
 
</pre>  
 
</pre>  
 
3. &nbsp;'''test.epacts.qq.pdf''' contains the Q-Q plot of test p-values (stratified by MAF)<br>[[Image:Test b score epacts qq.png]]  
 
3. &nbsp;'''test.epacts.qq.pdf''' contains the Q-Q plot of test p-values (stratified by MAF)<br>[[Image:Test b score epacts qq.png]]  

Latest revision as of 10:28, 2 February 2017

Download EPACTS

EPACTS is available for download here (100 MB): http://csg.sph.umich.edu//kang/epacts/download/epacts_v2.12.noref_binary.2012_10_01.tar.gz

Requirements

  • Linux 64bit
  • Perl 5

Install EPACTS

Uncompress the EPACTS package in the directory where you would like to install it:

 tar xzvf epacts_v2.12.noref_binary.2012_10_01.tar.gz

Download the reference FASTA files from the 1000 Genomes FTP site automatically by running the following commands:

cd epacts2.1/
./ref_download.sh
(For advanced users: to avoid downloading the FASTA files (~900MB), you may instead copy a local copy of the GRCh37 FASTA file and its index file to ${EPACTS_DIR}/ext/ref/)

Accessing help

For a list of commands available in EPACTS, run the following command:

$ epacts2.1/epacts help

Usage:
epacts [command] [options]

Command:
help Print out brief help message
man Print the full documentation in man page style
single Perform single variant association
group Perform groupwise (burden-style) association test
anno Annotate a VCF file
zoom Create a locus zoom plot from epacts results
meta Perform meta-analysis across multiple epacts results
make-group Create the group information for gene-based testing
make-kin Create a kinship matrix

Visit http://genome.sph.umich.edu/wiki/EPACTS for more detailed documentation


To view the options for single variant testing only, run:

$ epacts2.1/epacts single -help
Usage:
epacts single [options]

Required Options (Run epacts single -man or see wiki for more info):
-vcf STR Input VCF file (tabixed and bgzipped)
-ped STR Input PED file for phenotypes and covariates
-out STR Prefix of output files
-test STR Statistical test to use
...

Getting started in EPACTS with an example

Once installed, test out the software by running a quick example using the test data provided in the "example" directory. The example VCF and PED files are:

$ epacts2.1/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz

$ epacts2.1/example/1000G_dummy_pheno.ped


Run the single variant score test on the example data using this command:

$ epacts2.1/epacts single \
--vcf epacts2.1/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
--ped epacts2.1/example/1000G_dummy_pheno.ped \
--min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno \
--out {OUTPUT_DIR}/test --run 2 &

This command runs the single variant test on the input VCF and PED files for chromosome 20, with a minimum MAF threshold of 0.001.  The phenotype is "DISEASE", and the analysis is adjusted for the covariates AGE and SEX.  Output files are written with the prefix {OUTPUT_DIR}/test.  Finally, EPACTS runs the analysis in parallel on 2 CPUs.
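
The {OUTPUT_DIR} placeholder has to be replaced with a real directory before the command is run. The following is a minimal sketch of one way to do that, not part of the EPACTS documentation. The variable names EPACTS_DIR and OUTPUT_DIR and the directory name epacts_out are illustrative assumptions. The sketch only assembles and prints the command so it can be inspected; the final comment shows how it could then be launched.

```shell
# Illustrative locations (assumed names); adjust to your installation.
EPACTS_DIR="epacts2.1"
OUTPUT_DIR="epacts_out"

# Create the output directory so files with the prefix ${OUTPUT_DIR}/test can be written.
mkdir -p "${OUTPUT_DIR}"

# Assemble the same command as above, one argument per positional parameter.
set -- "${EPACTS_DIR}/epacts" single \
  --vcf "${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz" \
  --ped "${EPACTS_DIR}/example/1000G_dummy_pheno.ped" \
  --min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX \
  --test b.score --anno \
  --out "${OUTPUT_DIR}/test" --run 2

# Print the assembled command for inspection before running it.
CMD_LINE=$(printf '%s ' "$@")
echo "${CMD_LINE}"

# To launch it in the background and block until it finishes:
#   "$@" &
#   wait
```

Printing the command first makes it easy to confirm that the output prefix points where you expect before any files are written.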

Expected output

EPACTS produces a number of files and plots.

1.  test.epacts.gz contains all the association results.

$ zcat {OUTPUT_DIR}/test.epacts.gz | head
#CHROM  BEGIN   END     MARKER_ID       NS      AC      CALLRATE        MAF     PVALUE  SCORE   N.CASE  N.CTRL  AF.CASE AF.CTRL
20      68303   68303   20:68303_A/G_Upstream:DEFB125   266     1       1       0.0018797       NA      NA      NA      NA      NA      NA
20      68319   68319   20:68319_C/A_Upstream:DEFB125   266     0       1       0       NA      NA      NA      NA      NA      NA
20      68396   68396   20:68396_C/T_Nonsynonymous:DEFB125      266     1       1       0.0018797       NA      NA      NA      NA      NA      NA
20      76635   76635   20:76635_A/T_Intron:DEFB125     266     0       1       0       NA      NA      NA      NA      NA      NA
20      76689   76689   20:76689_T/C_Synonymous:DEFB125 266     0       1       0       NA      NA      NA      NA      NA      NA
20      76690   76690   20:76690_T/C_Nonsynonymous:DEFB125      266     1       1       0.0018797       NA      NA      NA      NA      NA      NA
20      76700   76700   20:76700_G/A_Nonsynonymous:DEFB125      266     0       1       0       NA      NA      NA      NA      NA      NA
20      76726   76726   20:76726_C/G_Nonsynonymous:DEFB125      266     0       1       0       NA      NA      NA      NA      NA      NA
20      76771   76771   20:76771_C/T_Nonsynonymous:DEFB125      266     3       1       0.0056391       0.68484 0.40587 145     121     0.013793        0.0082645

The columns in the results file are:

  1. CHROM:  chromosome
  2. BEGIN:  starting position
  3. END: ending position (same as BEGIN if a SNP)
  4. MARKER_ID:  name of variant
  5. NS:  Number of samples (cases + controls)
  6. AC:  Total allele count in sample
  7. CALLRATE:  call rate
  8. MAF:  minor allele frequency in full sample
  9. PVALUE:  score test association p-value
  10. SCORE:  test statistic for score test
  11. N.CASE:  number of cases
  12. N.CTRL:  number of controls
  13. AF.CASE:  allele frequency in cases only
  14. AF.CTRL:  allele frequency in controls only

Note:  For variants below the minimum MAF threshold (min-maf = 0.001), the numbers of cases and controls (N.CASE, N.CTRL) are not reported and are listed as "NA".
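
Many rows in the sample above carry "NA" in the PVALUE column. The following is a minimal sketch of how such rows might be dropped with awk, assuming the tab-delimited layout shown above. The file name sample.epacts is a hypothetical stand-in that holds three rows copied from the output above; for real results, the compressed test.epacts.gz would need to be decompressed first. The PVALUE column is looked up by its header name rather than by position.

```shell
# Build a small tab-delimited sample (rows copied from the output above).
# sample.epacts is a hypothetical file name used only for this sketch.
tr ' ' '\t' > sample.epacts <<'EOF'
#CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE N.CASE N.CTRL AF.CASE AF.CTRL
20 68303 68303 20:68303_A/G_Upstream:DEFB125 266 1 1 0.0018797 NA NA NA NA NA NA
20 76700 76700 20:76700_G/A_Nonsynonymous:DEFB125 266 0 1 0 NA NA NA NA NA NA
20 76771 76771 20:76771_C/T_Nonsynonymous:DEFB125 266 3 1 0.0056391 0.68484 0.40587 145 121 0.013793 0.0082645
EOF

# Keep the header line, then keep only rows whose PVALUE is not "NA".
tested=$(awk -F '\t' '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "PVALUE") p = i; print; next }
  $p != "NA"
' sample.epacts)

printf '%s\n' "${tested}"
```

Looking the column up by name keeps the filter working if the column order differs between EPACTS versions, as it did between the two revisions of this page.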


2.  test.epacts.top5000 contains the top 5000 associated variants ordered by p-value.

$ head {OUTPUT_DIR}/test.epacts.top5000 
#CHROM  BEGIN   END     MARKER_ID       NS      AC      CALLRATE        MAF     PVALUE  SCORE   N.CASE  N.CTRL  AF.CASE AF.CTRL
20      1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266     136     1       0.25564 0.0001097       3.8681  145     121     0.64138 0.35537
20      4162411 4162411 20:4162411_T/C_Intron:SMOX      266     204     1       0.38346 0.00055585      -3.4523 145     121     0.62759 0.93388
20      34061918        34061918        20:34061918_T/C_Intron:CEP250   266     39      1       0.073308        0.0011231       3.2577  145     121     0.21379 0.066116
20      4155948 4155948 20:4155948_G/A_Intron:SMOX      266     215     1       0.40414 0.0020791       -3.0787 145     121     0.68276 0.95868
20      4680251 4680251 20:4680251_A/G_Nonsynonymous:PRNP       266     186     1       0.34962 0.0025962       3.0119  145     121     0.8069  0.57025
20      36668874        36668874        20:36668874_G/A_Synonymous:RPRD1B       266     96      1       0.18045 0.003031        2.9646  145     121     0.44828 0.2562
20      36641871        36641871        20:36641871_G/A_Synonymous:TTI1 266     10      1       0.018797        0.004308        -2.8547 145     121     0.0068966       0.07438
20      32664926        32664926        20:32664926_G/A_Nonsynonymous:RALY      266     20      1       0.037594        0.0046365       2.8313  145     121     0.11724 0.024793
20      34288854        34288854        20:34288854_C/T_Utr3:ROMO1      266     28      1       0.052632        0.0047722       2.822   145     121     0.15862 0.041322
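
To see which direction each top association points in, the case and control allele frequencies can be compared directly. The following is a minimal sketch, assuming the tab-delimited layout shown above. The file name top_sample.tsv is a hypothetical stand-in holding the first two rows of the output above. For each variant it prints MARKER_ID, PVALUE, and the difference AF.CASE minus AF.CTRL.

```shell
# Build a small tab-delimited sample (first two rows of the output above).
# top_sample.tsv is a hypothetical file name used only for this sketch.
tr ' ' '\t' > top_sample.tsv <<'EOF'
#CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE N.CASE N.CTRL AF.CASE AF.CTRL
20 1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266 136 1 0.25564 0.0001097 3.8681 145 121 0.64138 0.35537
20 4162411 4162411 20:4162411_T/C_Intron:SMOX 266 204 1 0.38346 0.00055585 -3.4523 145 121 0.62759 0.93388
EOF

# Map header names to column indices, then print marker, p-value,
# and the case-minus-control allele frequency difference.
af_diff=$(awk -F '\t' '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  {
    diff = $(col["AF.CASE"]) - $(col["AF.CTRL"])
    printf("%s\t%s\t%.4f\n", $(col["MARKER_ID"]), $(col["PVALUE"]), diff)
  }
' top_sample.tsv)

printf '%s\n' "${af_diff}"
```

In these two rows the sign of the difference matches the sign of SCORE: the variant with a positive score is more frequent in cases, and the one with a negative score is more frequent in controls.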

3.  test.epacts.qq.pdf contains the Q-Q plot of test p-values (stratified by MAF)
Test b score epacts qq.png

4.  test.epacts.mh.pdf contains the Manhattan Plot of test p-values
The file {OUTPUT_DIR}/test.epacts.mh.pdf will be generated for chr20 only.

Test b score epacts mh.png

An example genome-wide Manhattan plot (from a genome-wide run) looks like this:

Tes b score epacts mh gw.png