Changes

From Genome Analysis Wiki
7,546 bytes added ,  08:03, 19 February 2019
Line 1: Line 1: −
(LAST UPDATED on July 3rd, 2012)
+
'''EPACTS''' (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline to perform various statistical tests for identifying genome-wide association from sequence data through a user-friendly interface, both to scientific analysts and to method developers.
(NOTE: This document is written in format Supporting Wiki)
     −
'''EPACTS''' (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline to perform various statistical test for identifying genome-wide association from sequence data
+
== Join in EPACTS mailing list ==
   −
== ChangeLog ==
+
Please join the [http://groups.google.com/group/epacts EPACTS Google Group] to ask questions about, discuss, or comment on EPACTS.
* Jul 3, 2012 : EPACTS v2.0-beta is released with the following updates
+
 
** Major restructuring of the software
+
== Latest ChangeLog ==
** Annotation software is switched with built-in application
+
* Dec 15th, 2016 : EPACTS v3.3.0 release (github)
** Addition of SKAT-O and EMMAX burden test
+
** Moved the repository into github
** Minor bug fixes
+
** Some major fixes in handling large sample size (>18,000)
* Apr 8, 2012 : EPACTS v1.2-alpha is released with the following updates, in addition to the following updates
+
** Other minor bug fixes
** EMMAX bug in handling covariates was fixed
+
* July 10th, 2014 : EPACTS v3.2.6 release
** Variable Threshold Test is added
+
** Minor bug fix in epacts-make-kin
** Variable Threshold Test with genomic score (e.g. GERP or PhyloP) is added.
+
* March 11th, 2014 : EPACTS v3.2.5 release
* Apr 4, 2012 : EPACTS v1.1-alpha is released with the following updates, in addition to minor updates
+
** EMMAX-SKAT is implemented with major bug fix
** EMMAX burden test (Hyun Min Kang)
+
* November 21st, 2013 : EPACTS v3.2.4 release
** Likelihood ratio test (Clement Ma)
+
** Fixed a number of minor bugs (more comprehensive fix is still pending)
** Updated version of Firth bias-corrected likelihood ratio test (Clement Ma)
+
* March 25th, 2013 : EPACTS v3.2.3 release
** Updated version of EMMAX single variant test (Hyun Min Kang)
+
** Relaxed the checking of low-rank matrix in SKAT tests (to avoid unnecessary skipping of genes)
* Mar 29, 2012 : EPACTS v1.0-alpha is released
+
* March 13th, 2013 : EPACTS v3.2.2 release
 +
** Fixed an error which occasionally reported mismatches in the number of samples
 +
* March 9th, 2013 : EPACTS v3.2.1 release
 +
** Fixed errors in loading the dynamic library
 +
** Fixed errors in SKAT-O (thanks to Anubha Mahajan and Jason Flannick)
 +
** Fixed bugs in emmax-CMC
 +
** Added emmax-SKAT (contributed by Seunggeun Lee)
 +
** And additional minor bug fixes
 +
See [[#Full ChangeLog]] for full details
    
== Key Features ==
 
== Key Features ==
Line 45: Line 52:  
== Obtaining EPACTS ==
 
== Obtaining EPACTS ==
   −
The EPACTS software is in 'beta development stage, and we are working on setting active git repository.
+
* The official release of EPACTS software is available at https://github.com/statgen/EPACTS
Currently, EPACTS is only provided as linux binary, can it can be downloaded at http://www.sph.umich.edu/csg/kang/epacts/ . From the CSG cluster, it is available at /net/fantasia/home/hmkang/sw/epacts2/ . Current binary copy will run only in 64-bit linux machine
+
** From the CSG cluster, it is available at /net/fantasia/home/bin/epacts/
 +
* Note that R (version 2.10 or higher) and gnuplot (version 4.2 or higher) must be installed in order to run EPACTS correctly.
   −
=== Currently Supported Statistical Tests ===
+
== Currently Supported Statistical Tests ==
    
EPACTS supports the following sets of widely used statistical tests for single variant tests and burden tests
 
EPACTS supports the following sets of widely used statistical tests for single variant tests and burden tests
   −
==== Single Variant Tests ====
+
=== Single Variant Tests ===
    
<noinclude>
 
<noinclude>
Line 64: Line 72:  
| Implemented by
 
| Implemented by
 
|-  
 
|-  
| b.glm
+
| b.wald
 
| Binary
 
| Binary
 
| YES <br> (Joint)
 
| YES <br> (Joint)
Line 84: Line 92:  
| Firth Bias-Corrected Logistic Likelihood Ratio Test  
 
| Firth Bias-Corrected Logistic Likelihood Ratio Test  
 
| Clement Ma
 
| Clement Ma
 +
|-
 +
| b.spa2
 +
| Binary
 +
| YES <br>
 +
| Moderate
 +
| Saddlepoint Approximation Method
 +
| Shawn Lee & Rounak Dey
 
|-
 
|-
 
| b.lrt
 
| b.lrt
Line 136: Line 151:  
|}
 
|}
   −
==== Gene-wise or group-wise tests ====
+
=== Gene-wise or group-wise tests ===
    
<noinclude>
 
<noinclude>
Line 157: Line 172:  
| b.madsen
 
| b.madsen
 
| Binary
 
| Binary
| YES <br> (Joint Estimation)
+
| NO
 
| Slow
 
| Slow
| Wilcoxon Rank Sum Test between binary phenotypes and weighted rare variant scores
+
| Wilcoxon Rank Sum Test between binary phenotypes and weighted rare variant scores (slightly different version from the published method - it uses pooled allele frequency across cases and controls for weighting each variant)
 
| Hyun Min Kang
 
| Hyun Min Kang
 
|-
 
|-
Line 183: Line 198:  
| Hyun Min Kang
 
| Hyun Min Kang
 
|-
 
|-
| q.skat
+
| skat
| Quantitative
+
| Binary/Quantitative
 
| YES <br> (Joint Estimation)
 
| YES <br> (Joint Estimation)
 
| Slow
 
| Slow
| SKAT Test by Wu et al, AJHG (2011) 89:82-93
+
| SKAT-O Test by Lee et al, Biostatistics (2012)
 
| Seunggeun Lee <br> (adaptive by Xueling Sim and Hyun Min Kang)
 
| Seunggeun Lee <br> (adaptive by Xueling Sim and Hyun Min Kang)
 
|-
 
|-
 
| VT
 
| VT
| Variable Threshold Test <br> with adaptive permutation
+
| Binary/Quantitative
 
| YES <br> (Regressed out first)
 
| YES <br> (Regressed out first)
 
| Slow
 
| Slow
| Price et al, AJHG (2010) 86:832-8
+
| Variable Threshold Test <br> with adaptive permutation <br> Price et al, AJHG (2010) 86:832-8
 
| Hyun Min Kang
 
| Hyun Min Kang
 
|-
 
|-
 
| emmaxCMC
 
| emmaxCMC
| Quantitative
+
| Binary/Quantitative
 
| YES <br> (Regressed Out First)
 
| YES <br> (Regressed Out First)
 
| Slow
 
| Slow
| CMC burden test using EMMAX
+
| Collapsing burden test using EMMAX
 
| Hyun Min Kang
 
| Hyun Min Kang
 
|-
 
|-
 
| emmaxVT
 
| emmaxVT
| Quantitative
+
| Binary/Quantitative
 
| YES <br> (Regressed Out First)
 
| YES <br> (Regressed Out First)
 
| Slow
 
| Slow
 
| Variable-threshold burden test using EMMAX
 
| Variable-threshold burden test using EMMAX
 
| Hyun Min Kang
 
| Hyun Min Kang
 +
|-
 +
| mmskat
 +
| Quantitative
 +
| YES <br> (Regressed Out First)
 +
| Slow
 +
| SKAT test using EMMAX
 +
| Seunggeun Lee & Hyun Min Kang
 
|}
 
|}
   −
== Installation Details ==
+
== Installation Details ==
If you want to use EPACTS in an Ubuntu platform, following the step below
+
 
# Download EPACTS binary at http://www.sph.umich.edu/csg/kang/epacts/download/epacts2.noref_binary.2012_07_03.tar.gz (94MB)
+
If you want to use EPACTS on an Ubuntu platform, follow the steps below
# Uncompress EPACTS package to the directory you would like to install
+
 
  tar xzvf epacts2.noref_binary.2012_07_03.tar.gz
+
$ git clone https://github.com/statgen/EPACTS.git
# Download the reference FASTA files by running the following commands
+
$ cd EPACTS
   cd epacts2/
+
$ ./configure --prefix [/path/to/install]
  ./ref_download.sh (Or copy the FASTA and index file locally you have to ${EPACTS_DIR}/ext/ref/)
+
$ make
# Perform a test run by running the following command
+
$ make install
   example/test_run_epacts.sh
+
 
 +
 
 +
(Important Note: '''make sure to specify --prefix=/path/to/install''' to avoid installing to the default path /usr/local/, for which you may not have write permission. If you are not sure where to install, /home/your_userid/epacts might be a good choice)
 +
 
 +
* Now ${EPACTS_DIR} represents the '/path/to/install' directory
 +
 
 +
* Download the reference FASTA files from 1000 Genomes FTP automatically by running the following commands
 +
 
 +
   ${EPACTS_DIR}/bin/epacts download
 +
 
 +
(For advanced users: to save the time needed to download the FASTA files (~900MB), you may instead copy an existing local GRCh37 FASTA file and its index file to ${EPACTS_DIR}/share/EPACTS/)
 +
 
 +
* Perform a test run by running the following command
 +
 
 +
   ${EPACTS_DIR}/bin/test_run_epacts.sh
    
In order to use EPACTS in the CSG cluster, you do not need to install it. You can directly use or make a copy of the in-house release version at
  /net/fantasia/home/hmkang/sw/epacts2
+
 
 +
  /net/fantasia/home/hmkang/tools/epacts-3.3.0/bin/epacts/
 +
 
 +
* If you want to access previous versions, visit http://csg-old.sph.umich.edu/kang/epacts/download
    
== Getting Started With Examples ==
 
== Getting Started With Examples ==
 
If you are using EPACTS from the CSG cluster, please set the following environment variable
 
If you are using EPACTS from the CSG cluster, please set the following environment variable
  EPACTS_DIR=/net/fantasia/home/hmkang/sw/epacts2 (in bash)
+
  EPACTS_DIR=/net/fantasia/home/hmkang/tools/epacts-3.3.0/bin/epacts (in bash)
  setenv EPACTS_DIR /net/fantasia/home/hmkang/sw/epacts2 (in csh)
+
  setenv EPACTS_DIR /net/fantasia/home/hmkang/tools/epacts-3.3.0/bin/epacts (in csh)
    
If you downloaded the EPACTS binary, please set EPACTS_DIR to the full path of the downloaded and uncompressed directory.
Line 235: Line 274:  
=== All-in-one example ===
 
=== All-in-one example ===
   −
To get started with EPACTS, use the example files located at
+
To get started with EPACTS, run the following command to perform an example run
${EPACTS_DIR}/example
+
  ${EPACTS_DIR}/bin/test_run_epacts.sh
 
  −
Running the following command will perform an example run
  −
  ${EPACTS_DIR}/example/test_run_epacts.sh
   
   
 
   
 
You will find a series of lines in test_run_epacts.sh script commented out for each possible test.  
 
You will find a series of lines in test_run_epacts.sh script commented out for each possible test.  
 +
 +
The example phenotype (PED format) and genotype (VCF format) can be found at
 +
${EPACTS_DIR}/share/EPACTS/
    
=== Single Variant Test ===
 
=== Single Variant Test ===
Line 247: Line 286:  
Or you can run the EPACTS command yourself by running
 
  ${EPACTS_DIR}/epacts single \
 
  ${EPACTS_DIR}/epacts single \
   --vcf  ${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
+
   --vcf  ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
   --ped  ${EPACTS_DIR}/example/1000G_dummy_pheno.ped  \
+
   --ped  ${EPACTS_DIR}/data/1000G_dummy_pheno.ped  \
   --minAF 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno \  
+
   --min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno \  
 
   --out out/test --run 2
 
   --out out/test --run 2
   Line 260: Line 299:  
The filename is out/test.single.b.score.epacts.gz and the contents will look like
 
The filename is out/test.single.b.score.epacts.gz and the contents will look like
 
  $ zcat out/test.single.b.score.epacts.gz | head
 
  $ zcat out/test.single.b.score.epacts.gz | head
  #CHROM BEGIN   END     MARKER_ID       NS     AC     CALLRATE       MAF     PVALUE SCORE
+
  #CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE N.CASE N.CTRL AF.CASE AF.CTRL
  20     68303   68303   20:68303_A/G_Upstream:DEFB125   266     1       1       0.0018797       NA     NA
+
  20 68303 68303 20:68303_A/G_Upstream:DEFB125 266 1 1 0.0018797 NA NA NA NA NA NA
  20     68319   68319   20:68319_C/A_Upstream:DEFB125   266     0      1       0       NA     NA
+
  20 68319 68319 20:68319_C/A_Upstream:DEFB125 266 1.4467e-36 1 0 NA NA NA NA NA NA
  20     68396   68396   20:68396_C/T_Nonsynonymous:DEFB125     266     1       1       0.0018797       NA     NA
+
  20 68396 68396 20:68396_C/T_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA NA NA NA NA
  20     76635   76635   20:76635_A/T_Intron:DEFB125     266     0      1       0       NA     NA
+
  20 76635 76635 20:76635_A/T_Intron:DEFB125 266 1.534e-37 1 0 NA NA NA NA NA NA
  20     76689   76689   20:76689_T/C_Synonymous:DEFB125 266     0       1       0       NA     NA
+
  20 76689 76689 20:76689_T/C_Synonymous:DEFB125 266 0 1 0 NA NA NA NA NA NA
  20     76690   76690   20:76690_T/C_Nonsynonymous:DEFB125     266     1       1       0.0018797       NA     NA
+
  20 76690 76690 20:76690_T/C_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA NA NA NA NA
  20     76700   76700   20:76700_G/A_Nonsynonymous:DEFB125     266     0       1       0       NA     NA
+
  20 76700 76700 20:76700_G/A_Nonsynonymous:DEFB125 266 0 1 0 NA NA NA NA NA NA
  20     76726   76726   20:76726_C/G_Nonsynonymous:DEFB125     266     0       1       0       NA     NA
+
  20 76726 76726 20:76726_C/G_Nonsynonymous:DEFB125 266 0 1 0 NA NA NA NA NA NA
  20     76771   76771   20:76771_C/T_Nonsynonymous:DEFB125     266     3       1       0.0056391       0.68484 0.40587
+
  20 76771 76771 20:76771_C/T_Nonsynonymous:DEFB125 266 3 1 0.0056391 0.68484 0.40587 145 121 0.013793 0.0082645
    
==== Output Text of Top Associations ====
 
==== Output Text of Top Associations ====
Line 276: Line 315:     
  $ head out/test.single.b.score.epacts.top5000  
 
  $ head out/test.single.b.score.epacts.top5000  
  #CHROM BEGIN   END     MARKER_ID       NS     AC     CALLRATE       MAF     PVALUE SCORE
+
  #CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE N.CASE N.CTRL AF.CASE AF.CTRL
  20     1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266     136    1       0.25564 0.0001097      3.8681
+
  20 1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266 138.64 1 0.26061 6.9939e-05 3.9765 145 121 0.65177 0.36476
  20     4162411 4162411 20:4162411_T/C_Intron:SMOX     266     204     1       0.38346 0.00055585      -3.4523
+
  20 4162411 4162411 20:4162411_T/C_Intron:SMOX 266 204 1 0.38346 0.00055583 -3.4523 145 121 0.62759 0.93388
  20     34061918       34061918       20:34061918_T/C_Intron:CEP250   266     39      1       0.073308        0.0011231      3.2577
+
  20 34061918 34061918 20:34061918_T/C_Intron:CEP250 266 41.815 1 0.0786 0.00095471 3.3035 145 121 0.22543 0.075436
  20     4155948 4155948 20:4155948_G/A_Intron:SMOX     266     215     1       0.40414 0.0020791      -3.0787
+
  20 4155948 4155948 20:4155948_G/A_Intron:SMOX 266 215 1 0.40414 0.0020792 -3.0787 145 121 0.68276 0.95868
  20     4680251 4680251 20:4680251_A/G_Nonsynonymous:PRNP       266     186     1       0.34962 0.0025962       3.0119
+
  20 4680251 4680251 20:4680251_A/G_Nonsynonymous:PRNP 266 186 1 0.34962 0.0025962 3.0119 145 121 0.8069 0.57025
  20     36668874       36668874       20:36668874_G/A_Synonymous:RPRD1B       266     96     1       0.18045 0.003031       2.9646
+
  20 36668874 36668874 20:36668874_G/A_Synonymous:RPRD1B 266 96 1 0.18045 0.003031 2.9646 145 121 0.44828 0.2562
  20     36641871       36641871       20:36641871_G/A_Synonymous:TTI1 266     10     1       0.018797       0.004308       -2.8547
+
  20 36641871 36641871 20:36641871_G/A_Synonymous:TTI1 266 10 1 0.018797 0.004308 -2.8547 145 121 0.0068966 0.07438
  20     32664926        32664926        20:32664926_G/A_Nonsynonymous:RALY      266     20      1       0.037594        0.0046365      2.8313
+
  20 1616892 1616892 20:1616892_A/G_Synonymous:SIRPG 266 144 1 0.27068 0.0051239 2.7991 145 121 0.63449 0.42975
  20     34288854        34288854        20:34288854_C/T_Utr3:ROMO1      266     28      1       0.052632        0.0047722      2.822
+
  20 25038372 25038372 20:25038372_G/A_Intron:ACSS1 266 103.3 1 0.19418 0.005748 2.7618 145 121 0.47201 0.28813
+
 
 +
The key columns represent:
 +
* '''NS''' : Number of phenotyped samples with non-missing genotypes
 +
* '''AC''' : Total Non-reference Allele Count
 +
* '''CALLRATE''' : Fraction of non-missing genotypes.
 +
* '''MAF''' : Minor allele frequencies
 +
* '''PVALUE''' : P-value of single variant test
 +
* '''AF.CASE''' : Non-reference allele frequencies for cases
 +
* '''AF.CTRL''' : Non-reference allele frequencies for controls
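The column definitions above can be illustrated with a short script. This is a minimal sketch rather than part of EPACTS: the helper names <code>read_epacts</code> and <code>maf_from_counts</code> are hypothetical, the two data rows are copied from the example output above, and the MAF recomputation assumes diploid genotypes (so that the allele frequency is AC divided by twice NS), which the page does not state explicitly.

```python
import io

# Header and two rows copied from the top-associations example above.
EXAMPLE = """\
#CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE N.CASE N.CTRL AF.CASE AF.CTRL
20 1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266 138.64 1 0.26061 6.9939e-05 3.9765 145 121 0.65177 0.36476
20 4162411 4162411 20:4162411_T/C_Intron:SMOX 266 204 1 0.38346 0.00055583 -3.4523 145 121 0.62759 0.93388
"""


def read_epacts(handle):
    """Yield one dict per result row, keyed by the header column names."""
    header = None
    for line in handle:
        fields = line.split()  # handles tab- or space-delimited text
        if not fields:
            continue
        if header is None:
            # The first non-empty line is the header; drop the leading '#'.
            header = [name.lstrip("#") for name in fields]
            continue
        yield dict(zip(header, fields))


def maf_from_counts(ns, ac):
    """Recompute MAF from NS and AC (assumption: diploid genotypes)."""
    af = ac / (2.0 * ns)
    return min(af, 1.0 - af)


rows = list(read_epacts(io.StringIO(EXAMPLE)))
for row in rows:
    recomputed = maf_from_counts(float(row["NS"]), float(row["AC"]))
    print(row["MARKER_ID"], row["MAF"], round(recomputed, 5), row["PVALUE"])
```

For a real result file, a handle from <code>gzip.open("out/test.single.b.score.epacts.gz", "rt")</code> could be passed in place of the <code>io.StringIO</code> object.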
 +
 
 
==== Q-Q plot of test statistics (stratified by MAF) ====
 
==== Q-Q plot of test statistics (stratified by MAF) ====
   −
The file outPutwash/example.exome.DISEASE.score.epacts.qq.pdf will be generated as shown below
+
The file out/test.b.score.epacts.qq.pdf will be generated as shown below
   −
[[File:Pugwash example qq.png]]
+
[[File:test_b_score_epacts_qq.png]]
    
==== Manhattan Plot of Test Statistics ====
 
==== Manhattan Plot of Test Statistics ====
   −
The file outPutwash/example.exome.DISEASE.score.epacts.mh.pdf will be generated for chr20 only.  
+
The file out/test.b.score.epacts.mh.pdf will be generated for chr20 only.  
   −
[[File:Pugwash example mh.png]]
+
[[File:test_b_score_epacts_mh.png]]
    
An example Genome-wide manhattan plot (from a genome-wide run) will look like below
 
An example Genome-wide manhattan plot (from a genome-wide run) will look like below
   −
[[File:Pugwash example mh gw.png]]
+
[[File:tes_b_score_epacts_mh_gw.png]]
    
=== Gene-wise or group-wise burden test ===
 
=== Gene-wise or group-wise burden test ===
Line 320: Line 368:  
Note that [MARKER_ID_K] has to be sorted by increasing order of genomic coordinate
 
Note that [MARKER_ID_K] has to be sorted by increasing order of genomic coordinate
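The sorting requirement above can be sketched as follows. This is a minimal sketch, assuming that marker IDs begin with <code>CHROM:POS_</code> as in the example output shown earlier (e.g. <code>20:68303_A/G_...</code>) and that all markers in a group lie on one chromosome; the helper names are hypothetical and not part of EPACTS.

```python
import re

# Assumed marker ID layout: CHROM:POS_REF/ALT, optionally followed by more text.
MARKER_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<pos>\d+)_")


def position_key(marker_id):
    """Return the genomic position encoded in a marker ID."""
    match = MARKER_RE.match(marker_id)
    if match is None:
        raise ValueError(f"unexpected marker ID format: {marker_id!r}")
    return int(match.group("pos"))


def sort_group(marker_ids):
    """Sort the marker IDs of one group by increasing genomic coordinate."""
    return sorted(marker_ids, key=position_key)


print(sort_group(["20:76700_G/A", "20:68303_A/G", "20:76635_A/T"]))
```

Sorting on the integer position matters here, because a plain string sort would place <code>20:100000_...</code> before <code>20:68303_...</code>.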
   −
In oeder to create gene-level group file from typically formatted VCF file, one may use the following utility  
+
In order to create gene-level group file from typically formatted VCF file, one may use the following utility  
   −
  epacts makegroup --vcf [input-vcf] --out [output-group-file] --format [epacts, annovar, chaos or gatk] --nonsyn
+
  ${EPACTS_DIR}/epacts make-group --vcf [input-vcf] --out [output-group-file] --format [epacts, annovar, chaos or gatk] --nonsyn
    
The above command creates a file [output-group-file] containing a list of missense and nonsense variants for each gene. To incorporate different types of functional annotations, use --type option as follows
   −
  epacts makegroup --vcf [input-vcf] --out [output-group-file] --format [epacts, annovar, chaos or gatk] --type [function_type_1] --type [function_type_2] ...
+
  ${EPACTS_DIR}/epacts make-group --vcf [input-vcf] --out [output-group-file] --format [epacts, annovar, chaos or gatk] --type [function_type_1] --type [function_type_2] ...
    
Type 'epacts make-group -man' for the detailed documentation
Line 335: Line 383:     
  ${EPACTS_DIR}/epacts anno \
 
  ${EPACTS_DIR}/epacts anno \
     --in ${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
+
     --in ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
     --out ${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.anno.vcf.gz
+
     --out ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.anno.vcf.gz
    
The epacts anno script will add "ANNO=[function]:[genename]" entry into the INFO field based on gencodeV7 (default) or refGene database.
 
The epacts anno script will add "ANNO=[function]:[genename]" entry into the INFO field based on gencodeV7 (default) or refGene database.
Line 346: Line 394:  
To perform a groupwise burden test on the example VCF (annotated as above), run the following command
 
To perform a groupwise burden test on the example VCF (annotated as above), run the following command
   −
  ${EPACTS_DIR}/epacts group --vcf ${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.anno.vcf.gz \
+
  ${EPACTS_DIR}/epacts group --vcf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.anno.vcf.gz \
   --groupf ${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.anno.grp --out out/test.gene.skat \
+
   --groupf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.anno.grp --out out/test.gene.skat \
   --ped ${EPACTS_DIR}/example/example/1000G_dummy_pheno.ped --maxAF 0.05 \
+
   --ped ${EPACTS_DIR}/data/1000G_dummy_pheno.ped --maxAF 0.05 \
   --chr 20 --pheno QT --cov AGE --cov SEX --test skat --ska-o --run 2
+
   --chr 20 --pheno QT --cov AGE --cov SEX --test skat --skat-o --run 2
    
==== Example Output ====
 
==== Example Output ====
 
  $ head out/test.gene.skat.epacts.top5000
 
  $ head out/test.gene.skat.epacts.top5000
  #CHROM BEGIN  END    MARKER_ID      NS      FRAC_BURDEN     NUM_ALL_VARS    NUM_PASS_VARS  NUM_SING_VARS  PVALUE  STATRHO
+
  #CHROM BEGIN  END    MARKER_ID      NS      FRAC_WITH_RARE     NUM_ALL_VARS    NUM_PASS_VARS  NUM_SING_VARS  PVALUE  STATRHO
 
  20    62607037        62608720        20:62607037-62608720_SAMD10    266    0.14662 9      5      1      0.0020064      1
 
  20    62607037        62608720        20:62607037-62608720_SAMD10    266    0.14662 9      5      1      0.0020064      1
 
  20    2816211 2820493 20:2816211-2820493_FAM113A      266    0.011278        12      2      1      0.0032542      0
 
  20    2816211 2820493 20:2816211-2820493_FAM113A      266    0.011278        12      2      1      0.0032542      0
Line 362: Line 410:  
  20    60962895        60963559        20:60962895-60963559_RPS21      266    0.06015 6      3      2      0.016409        0
 
  20    60962895        60963559        20:60962895-60963559_RPS21      266    0.06015 6      3      2      0.016409        0
 
  20    55904961        55917801        20:55904961-55917801_SPO11      266    0.011278        11      3      3      0.018031        0
 
  20    55904961        55917801        20:55904961-55917801_SPO11      266    0.011278        11      3      3      0.018031        0
 +
 +
The key columns represent:
 +
* '''NS''' : Number of phenotyped samples with non-missing genotypes
 +
* '''FRAC_WITH_RARE''' : Fraction of individuals carrying rare variants below --max-maf (default : 0.05) threshold.
 +
* '''NUM_ALL_VARS''' : Number of all variants defining the group.
 +
* '''NUM_PASS_VARS''' : Number of variants passing the --min-maf, --min-mac, --max-maf, --min-callrate thresholds
 +
* '''NUM_SING_VARS''' : Number of singletons among variants in NUM_PASS_VARS
 +
* '''PVALUE''' : P-value of burden tests
 +
* Other columns are test specific auxiliary columns. For example, in the VT test, the optimal MAF threshold is recorded as an auxiliary output column.
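The relationship between the three variant-count columns above can be sketched as below. This is an illustration under stated assumptions, not the EPACTS implementation: the <code>Variant</code> record and both functions are hypothetical, the thresholds are treated as inclusive bounds, a singleton is taken to be a variant with a minor allele count of 1, and all threshold values other than the documented --max-maf default of 0.05 are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    marker_id: str
    maf: float       # minor allele frequency
    mac: float       # minor allele count
    callrate: float  # fraction of non-missing genotypes


def passes(v, min_maf, min_mac, max_maf, min_callrate):
    """Assumed semantics: every threshold is an inclusive bound."""
    return (min_maf <= v.maf <= max_maf
            and v.mac >= min_mac
            and v.callrate >= min_callrate)


def summarize_group(variants, min_maf, min_mac, max_maf, min_callrate):
    """Count all, passing, and singleton variants for one group."""
    passing = [v for v in variants
               if passes(v, min_maf, min_mac, max_maf, min_callrate)]
    singletons = [v for v in passing if v.mac == 1]
    return {
        "NUM_ALL_VARS": len(variants),
        "NUM_PASS_VARS": len(passing),
        "NUM_SING_VARS": len(singletons),
    }


# Hypothetical variants and thresholds, for illustration only.
group = [
    Variant("20:100_A/G", maf=0.002, mac=1, callrate=1.0),   # passes, singleton
    Variant("20:200_C/T", maf=0.10, mac=53, callrate=1.0),   # fails max_maf
    Variant("20:300_G/A", maf=0.01, mac=5, callrate=0.4),    # fails callrate
]
print(summarize_group(group, min_maf=0.0, min_mac=1, max_maf=0.05, min_callrate=0.5))
```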
    
=== Specialized Instruction for EMMAX tests ===
 
=== Specialized Instruction for EMMAX tests ===
Line 374: Line 431:     
* '''Creating Kinship Matrix''' : From VCF, we recommend to set a MAF (e.g. 0.01) and call rate (e.g. 0.95) threshold to select high-quality markers to generate kinship matrix as follows.
 
* '''Creating Kinship Matrix''' : From VCF, we recommend to set a MAF (e.g. 0.01) and call rate (e.g. 0.95) threshold to select high-quality markers to generate kinship matrix as follows.
  ${EPACTS_DIR}/epacts single \
+
  ${EPACTS_DIR}/epacts make-kin \
   --vcf [input.vcf.gz] --ped  [input.ped] --minAF 0.01 --minCallRate 0.95 \
+
   --vcf [input.vcf.gz] --ped  [input.ped (Optional)] --min-maf 0.01 --minCallRate 0.95 \
   --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] --test emmax --kinOnly \
+
   --sepchr (if VCF is separated by chromosome) --out [outprefix.kinf] --run [# of parallel jobs]
  --out [outprefix.kinf] --unit 50000000 --run [# of parallel jobs]
+
 
 +
If you provide an [input.ped] file, the kinship matrix will be calculated only for the subset of individuals contained in the PED file.
    
The procedure above will create a file [outprefix.kinf] after splitting and merging the genomes into multiple pieces. If only a certain subset of SNPs needs to be considered due to target regions, LD-pruning, or any other reasons, a VCF containing the subset of markers must be created beforehand and should be used as input VCF file.
 
The procedure above will create a file [outprefix.kinf] after splitting and merging the genomes into multiple pieces. If only a certain subset of SNPs needs to be considered due to target regions, LD-pruning, or any other reasons, a VCF containing the subset of markers must be created beforehand and should be used as input VCF file.
Line 383: Line 441:  
* '''Perform Single Variant Association''' : From VCF and PED, we recommend to use less stringent MAF threshold (e.g. 0.001) and call rate (e.g. 0.50) to perform single variant association
 
* '''Perform Single Variant Association''' : From VCF and PED, we recommend to use less stringent MAF threshold (e.g. 0.001) and call rate (e.g. 0.50) to perform single variant association
 
  ${EPACTS_DIR}/epacts single \
 
  ${EPACTS_DIR}/epacts single \
   --vcf [input.vcf.gz] --ped  [input.ped] --minAF 0.001 --kin [outputprefix.kinf] \
+
   --vcf [input.vcf.gz] --ped  [input.ped] --min-maf 0.001 --kin [outputprefix.kinf] \
   --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] --test emmax \
+
   --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] --test q.emmax \
 
   --out [outprefix] --run [# of parallel jobs]
 
   --out [outprefix] --run [# of parallel jobs]
   Line 402: Line 460:  
* Run CMC-style burden test by
 
* Run CMC-style burden test by
 
  ${EPACTS_DIR}/epacts group --groupf [group.grp] \
 
  ${EPACTS_DIR}/epacts group --groupf [group.grp] \
   --vcf [input.vcf.gz] --ped  [input.ped] --maxAF [max-MAF-for-rare-variants] \
+
   --vcf [input.vcf.gz] --ped  [input.ped] --max-maf [max-MAF-for-rare-variants] \
 
   --kin [outputprefix.kinf] --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] \
 
   --kin [outputprefix.kinf] --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] \
 
   --test emmaxCMC --out [outprefix]  
 
   --test emmaxCMC --out [outprefix]  
 
* Run Variable Threshold burden test by
 
* Run Variable Threshold burden test by
 
  ${EPACTS_DIR}/epacts group --groupf [group.grp] \
 
  ${EPACTS_DIR}/epacts group --groupf [group.grp] \
   --vcf [input.vcf.gz] --ped  [input.ped] --maxAF [max-MAF-for-rare-variants] \
+
   --vcf [input.vcf.gz] --ped  [input.ped] --max-maf [max-MAF-for-rare-variants] \
 
   --kin [outputprefix.kinf] --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] \
 
   --kin [outputprefix.kinf] --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] \
   --test emmaxVT --out [outprefix]  
+
   --test emmaxVT --out [outprefix]
    
== Preparing Your Own Input Data ==
 
== Preparing Your Own Input Data ==
Line 419: Line 477:  
   bgzip input.vcf    ## this command will produce input.vcf.gz
 
   bgzip input.vcf    ## this command will produce input.vcf.gz
 
   tabix -pvcf -f input.vcf.gz  ## this command will produce input.vcf.gz.tbi
 
   tabix -pvcf -f input.vcf.gz  ## this command will produce input.vcf.gz.tbi
* If the VCF file is separated by chromosome, the VCF file must contain the string "chr1" in the chromosome 1 file, and corresponding chromosome name for other chromosomes.
+
* If the VCF file is separated by chromosome, the VCF file name specified in the input argument must contain the string "chr1" for the chromosome 1 file, and the corresponding chromosome name for the other chromosomes. Thus, the file names should look like <code>[prefix]chr1[suffix].vcf.gz</code>, <code>[prefix]chr2[suffix].vcf.gz</code>, ..., <code>[prefix]chr22[suffix].vcf.gz</code>, <code>[prefix]chrX[suffix].vcf.gz</code>.
 
* Sample IDs in the VCF file must be consistent with those in the PED file
* Currently EPACTS only supports bi-allelic variants, but it handles SNPs, INDELs, and SVs.
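The per-chromosome file naming convention described in the list above can be checked before a run with a small helper. This is a minimal sketch, assuming the chromosome 1 file name contains the string "chr1" exactly once and that chromosomes 1 to 22 and X are expected; the function name is hypothetical, and how EPACTS itself resolves the other file names is not described here.

```python
def per_chromosome_paths(chr1_path, chromosomes=None):
    """Derive the expected per-chromosome VCF paths from the chr1 path."""
    if chr1_path.count("chr1") != 1:
        raise ValueError('expected exactly one "chr1" in the file name')
    if chromosomes is None:
        # Assumption: autosomes 1-22 plus X, as in the naming example.
        chromosomes = [str(i) for i in range(1, 23)] + ["X"]
    return {c: chr1_path.replace("chr1", "chr" + c) for c in chromosomes}


paths = per_chromosome_paths("study.chr1.filtered.vcf.gz")
print(paths["2"])
print(paths["X"])
```

Each derived path could then be tested with <code>os.path.exists</code>, together with its <code>.tbi</code> index, to catch missing files early.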
Line 481: Line 539:  
EPACTS also accepts a PED format with header information. The above file can be combined into one file as follows
   −
  $ head example/1000G_dummy_pheno.ped
+
  $ head data/1000G_dummy_pheno.ped
 
  #FAM_ID    IND_ID  FAT_ID  MOT_ID  SEX DISEASE QT  AGE
 
  #FAM_ID    IND_ID  FAT_ID  MOT_ID  SEX DISEASE QT  AGE
 
  13281  NA12344 NA12347 NA12348 1  1  94.17  66.1
 
  13281  NA12344 NA12347 NA12348 1  1  94.17  66.1
Line 496: Line 554:     
== Frequently Asked Questions ==
 
== Frequently Asked Questions ==
 +
=== Installation ===
 +
# How should I install EPACTS?
 +
#* See [[EPACTS#Installation_Details | Installation Details]]
 +
# I am having the following error message '''configure: error: libR.{so,a} was not found. Please install it at http://www.r-project.org/ first'''. What do I have to do?
 +
#* First, you need to find out where R was installed. Try to type "locate libR.so" and see if it returns anything
 +
#* If "locate libR.so" returns something, as explained in [[EPACTS#Installation_Details | Installation Details]], try adding "LDFLAGS=-L/path/to/R/library" and rerun '''configure''' and '''make'''
 +
#* If you cannot find libR.so, you may have to recompile R with the --enable-R-shlib option as described in http://cran.r-project.org/doc/manuals/R-admin.html#Installation
    
=== Input Files ===
 
=== Input Files ===
 
# What is VCF?
 
# What is VCF?
 
#* VCF refers to Variant Call Format
 
#* VCF refers to Variant Call Format
#* See [[http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 | 1000 Genomes wiki page]] for the detailed description of VCF format
+
#* See [[http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 1000 Genomes wiki page]] for the detailed description of VCF format
 
# Should the input VCF be compressed in a certain format?
#* Yes. EPACTS assumes that the VCF file is already bgzipped and tabixed.
Line 510: Line 575:  
#* If non-GT field is used, the field is considered as dosage and should be a single numeric value.
 
#* If non-GT field is used, the field is considered as dosage and should be a single numeric value.
 
# What are the acceptable input formats to encode phenotypes and covariates?
** See [[#PED file for Phenotypes and Covariates]] for the detailed information
+
#* See [[#PED file for Phenotypes and Covariates]] for the detailed information
 
# How should I encode binary phenotypes?
 
# How should I encode binary phenotypes?
 
#* If you encode your phenotypes into two different numeric values (e.g. 0/1 or 1/2), EPACTS will automatically recognize them as binary phenotypes and encode them into 1/2 values. Higher value will be considered as cases for case-control association
 
#* If you encode your phenotypes into two different numeric values (e.g. 0/1 or 1/2), EPACTS will automatically recognize them as binary phenotypes and encode them into 1/2 values. Higher value will be considered as cases for case-control association
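The recoding rule described above can be sketched as follows. This is an illustration of the stated behaviour only, not EPACTS code: the function name is hypothetical, and representing missing phenotypes as <code>None</code> is an assumption made for the example.

```python
def recode_binary(values):
    """Map a two-valued numeric phenotype to 1 (control) and 2 (case)."""
    observed = sorted({v for v in values if v is not None})
    if len(observed) != 2:
        raise ValueError("a binary phenotype needs exactly two distinct values")
    low, high = observed
    mapping = {low: 1, high: 2}  # the higher value is treated as the case
    return [None if v is None else mapping[v] for v in values]


print(recode_binary([0, 1, 1, 0]))
print(recode_binary([1, 2, None, 2]))
```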
Line 529: Line 594:  
#* [[#Manhattan Plot of Test Statistics]] will inform us the genome-wide distribution of association signals
 
#* [[#Manhattan Plot of Test Statistics]] will inform us the genome-wide distribution of association signals
 
#* [[#Output Text of All Test Statistics]] will contain the full information of test results across all units tested
 
#* [[#Output Text of All Test Statistics]] will contain the full information of test results across all units tested
 +
# The Q-Q and Manhattan plots cannot be found. Why?
 +
#* This is probably because gnuplot 4.2 or higher is not installed on your system, or it is installed but cannot be found in your ${PATH}. Please visit the [[http://gnuplot.info/ GNUPLOT web page]] for installation instructions.
 +
# How can I read the EMMAX kinship file produced by EPACTS?
 +
#* You can run the following command to dump your kinship matrix into a human-readable text format.
 +
${EPACTS_DIR}/bin/pEmmax kin-util --kinf [input.kinf] --outf [output.prefix] --dump
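For the missing-plot question above, a quick way to see whether gnuplot is the cause is to check both its presence in the PATH and its reported version. The following is a minimal sketch, not part of EPACTS: the helper names <code>gnuplot_version_ok</code> and <code>check_gnuplot</code> are hypothetical, and the parsing assumes that <code>gnuplot --version</code> prints a line of the form "gnuplot MAJOR.MINOR patchlevel N".

```shell
#!/bin/sh
# Sketch: check that gnuplot is on PATH and reports version 4.2 or higher.
# Both function names are hypothetical helpers, not EPACTS commands.

# Succeeds (status 0) if a version line such as "gnuplot 4.6 patchlevel 6"
# reports version 4.2 or higher; returns 2 if the line cannot be parsed.
gnuplot_version_ok() {
  # The second whitespace-separated field is assumed to be MAJOR.MINOR.
  ver=$(printf '%s\n' "$1" | awk '{print $2}')
  major=${ver%%.*}
  minor=${ver#*.}
  minor=${minor%%.*}
  case "$major$minor" in
    ''|*[!0-9]*) return 2 ;;   # not a numeric version string
  esac
  [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 2 ]; }
}

# Reports whether gnuplot can be found and whether its version is sufficient.
check_gnuplot() {
  if ! command -v gnuplot >/dev/null 2>&1; then
    echo "gnuplot not found in PATH"
    return 1
  fi
  if gnuplot_version_ok "$(gnuplot --version)"; then
    echo "gnuplot OK: $(command -v gnuplot)"
  else
    echo "gnuplot found, but version 4.2 or higher could not be confirmed"
    return 1
  fi
}
```

Running <code>check_gnuplot</code> in the same shell that launches EPACTS shows which of the two causes applies, since that shell uses the same ${PATH}.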
    
=== More questions ===
 
=== More questions ===
# If you have more questions, please contact [[mailto:hmkang@umich.edu | Hyun Min Kang]].
+
# If you have more questions, please contact [[mailto:hmkang@umich.edu Hyun Min Kang]].
    
== Detailed Options ==
 
== Detailed Options ==
    
The detailed options can be viewed by running the following commands
 
The detailed options can be viewed by running the following commands
${EPACTS_DIR}/epacts -man          (for overall structure)
+
${EPACTS_DIR}/bin/epacts -man          (for overall structure)  
${EPACTS_DIR}/epacts single -man    (for single variant test)
+
${EPACTS_DIR}/bin/epacts single -man    (for single variant test)
${EPACTS_DIR}/epacts group -man    (for groupwise test)
+
${EPACTS_DIR}/bin/epacts group -man    (for groupwise test)
${EPACTS_DIR}/epacts anno -man      (for annotation)
+
${EPACTS_DIR}/bin/epacts anno -man      (for annotation)
${EPACTS_DIR}/epacts plot -man      (for QQ and Manhattan plot)
+
${EPACTS_DIR}/bin/epacts plot -man      (for QQ and Manhattan plot)
${EPACTS_DIR}/epacts zoom -man      (for zoom plot)
+
${EPACTS_DIR}/bin/epacts zoom -man      (for zoom plot)
${EPACTS_DIR}/epacts meta -man      (for meta-analysis)
+
${EPACTS_DIR}/bin/epacts meta -man      (for meta-analysis)
${EPACTS_DIR}/epacts makegroup -man (for creating gene group)
+
${EPACTS_DIR}/bin/epacts make-group -man (for creating gene group)
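The commands above call ${EPACTS_DIR}/bin/epacts, whereas the earlier revision of this page called ${EPACTS_DIR}/epacts. If it is unclear which layout an installation uses, a small lookup can try both. This is a sketch under that assumption only; <code>find_epacts</code> is a hypothetical helper name and is not shipped with EPACTS.

```shell
#!/bin/sh
# Sketch: locate the epacts executable under an installation directory.
# find_epacts is a hypothetical helper, not an EPACTS command.
# Usage: find_epacts [DIR]   (DIR defaults to $EPACTS_DIR)
find_epacts() {
  dir=${1:-${EPACTS_DIR:-}}
  # Try the current layout first, then the older one.
  for candidate in "$dir/bin/epacts" "$dir/epacts"; do
    if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "epacts not found under '$dir'" >&2
  return 1
}
```

For example, <code>"$(find_epacts)" single -man</code> would show the single variant manual with whichever path was found.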
   −
=== Implementing Additional Statistical Tests ===
+
== Implementing Additional Statistical Tests ==
    
To add an additional statistical test to EPACTS, the following procedure is recommended
 
To add an additional statistical test to EPACTS, the following procedure is recommended
   −
# Create a file named 'single.[testname].R' for single variant test or 'gene.[testname].R' for gene-level test under ${EPACTS_DIR}/R
+
# Create a file named 'single.[testname].R' for a single variant test or 'gene.[testname].R' for a gene-level test under ${EPACTS_DIR}/share/EPACTS/
 
# Test your implementation using the --test [testname] option to perform sanity checks and debugging
 
# Test your implementation using the --test [testname] option to perform sanity checks and debugging
 
# If you want to add your test in the official in-house version, please send your code to Hyun
 
# If you want to add your test in the official in-house version, please send your code to Hyun
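The naming rule in the first step can be sketched as a small shell helper that builds the expected file path for a new test. The function name <code>new_test_path</code> and the example test name <code>mytest</code> are hypothetical; only the 'single.[testname].R' / 'gene.[testname].R' pattern and the ${EPACTS_DIR}/share/EPACTS/ directory come from the steps above.

```shell
#!/bin/sh
# Sketch: build the file path expected for a new EPACTS test implementation.
# new_test_path is a hypothetical helper, not an EPACTS command.
# Usage: new_test_path KIND TESTNAME   (KIND is "single" or "gene")
new_test_path() {
  kind=$1   # "single" for single variant tests, "gene" for gene-level tests
  name=$2   # the name later given to the --test option
  case "$kind" in
    single|gene) ;;
    *) echo "kind must be 'single' or 'gene'" >&2; return 1 ;;
  esac
  [ -n "$name" ] || { echo "test name is required" >&2; return 1; }
  printf '%s/share/EPACTS/%s.%s.R\n' "${EPACTS_DIR:-}" "$kind" "$name"
}
```

With EPACTS_DIR set, <code>new_test_path single mytest</code> prints the path where the R file would be created, and the test would then be selected with <code>--test mytest</code>.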
Line 578: Line 648:  
  ## MISSING VALUES : IGNORED
 
  ## MISSING VALUES : IGNORED
 
  single.q.lm <- function() {
 
  single.q.lm <- function() {
   cname <- c("BETA","SEBETA","TSTAT")
+
   cname <- c("BETA","SEBETA","TSTAT") # column names for additional variables in the EPACTS output
 
   m <- nrow(genos)
 
   m <- nrow(genos)
 
   p <- rep(NA,m)
 
   p <- rep(NA,m)
Line 584: Line 654:  
   if ( m > 0 ) {
 
   if ( m > 0 ) {
 
     for(i in 1:m) {
 
     for(i in 1:m) {
       r <- summary(lm(pheno~genos[i,]+cov-1))$coefficients[1,]
+
       r <- summary(lm(pheno~genos[i,]+cov-1))$coefficients[1,] # run simple linear regression
       p[i] <- r[4]
+
       p[i] <- r[4]  # store p-value to p[i]
       add[i,] <- r[1:3]
+
       add[i,] <- r[1:3] # store additional variables to add[i,]
 
     }
 
     }
 
   }
 
   }
Line 679: Line 749:  
# PVALUE : P-value from the test
 
# PVALUE : P-value from the test
 
# Additional columns specified by return values 'add'
 
# Additional columns specified by return values 'add'
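Because the number of additional columns depends on the test, it is safer to find the PVALUE column by name than by position when post-processing results. The sketch below assumes a plain-text, tab-delimited copy of the output whose first line is a header containing a column named PVALUE, and it assumes that missing p-values are written as NA; <code>filter_by_pvalue</code> is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: keep the header and the rows whose PVALUE is below a threshold.
# filter_by_pvalue is a hypothetical helper, not an EPACTS command.
# Usage: filter_by_pvalue THRESHOLD FILE
filter_by_pvalue() {
  awk -F '\t' -v thr="$1" '
    NR == 1 {
      # Locate the PVALUE column by name rather than by position.
      for (i = 1; i <= NF; i++) if ($i == "PVALUE") col = i
      if (!col) { print "no PVALUE column in header" > "/dev/stderr"; exit 1 }
      print
      next
    }
    # Skip rows with a missing p-value (assumed to be written as NA).
    $col != "NA" && $col + 0 < thr + 0
  ' "$2"
}
```

For example, <code>filter_by_pvalue 5e-8 results.txt</code> would print the header followed by the rows below that threshold, where <code>results.txt</code> is a placeholder file name.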
 +
 +
== Full ChangeLog ==
 +
* July 10th, 2014 : EPACTS v3.2.6 release
 +
** Minor bug fix in epacts-make-kin
 +
* March 11th, 2014 : EPACTS v3.2.5 release
 +
** EMMAX-SKAT is implemented with major bug fix
 +
* November 21st, 2013 : EPACTS v3.2.4 release
 +
** Fixed a number of minor bugs
 +
** Some known bugs still exist
 +
*** SKAT-O Lambda eigenvalue error. This happens in a particular context, but a way to prevent it has not been nailed down yet.
 +
*** EMMAX has case and control frequency flipped.
 +
* EMMAX test has a silly known bug in which case / ctrl frequencies are flipped
 +
* March 25th, 2013 : EPACTS v3.2.3 release
 +
** Relaxed the checking of low-rank matrix in SKAT tests (to avoid unnecessary skipping of genes)
 +
* March 13th, 2013 : EPACTS v3.2.2 release
 +
** Fixed an error which occasionally reported mismatches in the number of samples
 +
* March 9th, 2013 : EPACTS v3.2.1 release
 +
** Fixed errors in loading the dynamic library
 +
** Fixed errors in SKAT-O (thanks to Anubha Mahajan and Jason Flannick)
 +
** Fixed bugs in emmax-CMC
 +
** Added emmax-SKAT (contributed by Seunggeun Lee)
 +
** And additional minor bug fixes
 +
* February 28th, 2013 : EPACTS v3.2.0 release
 +
** R package installation bug (for some users) was fixed
 +
** A bug in the MAF error for high frequency variants (AF>0.25) is now fixed
 +
** SKAT version is updated to 0.81
 +
** --bprange option is added to allow testing for small region size
 +
** Additional minor bug fixes
 +
* December 4th, 2012 : EPACTS v3.1.0 release
 +
** Removed dependency on libR.so
 +
** Additional minor bug fixes
 +
** --bprange option is added to allow testing for small region size
 +
* November 25th, 2012 : EPACTS v3.0.0 release
 +
** Restructured with source code release (with autoconf / automake / libtools)
 +
** Added zoom plot feature
 +
** FRAC_BURDEN keyword was replaced with FRAC_WITH_RARE for groupwise testing
 +
* October 26th, 2012 : EPACTS v2.2.0-beta is released with the following updates
 +
** Added --max-mac option
 +
** Fixed Firth's bias-corrected test (by Clement Ma)
 +
** Added more informative warning messages when index files do not exist
 +
** Fixed the bug in the epacts-plot in plotting ties
 +
** Fixed errors in the MAF estimates per case and control
 +
** Fixed bug in --minRSQ option
 +
* September 28, 2012 : EPACTS v2.11-beta is released with the following updates
 +
** Counts and allele frequencies for case/control added for binary tests
 +
** --max-maf parameter is added
 +
** Fixed EMMAX error in MAF in the output
 +
** More informative error messages
 +
* September 27, 2012 : EPACTS v2.1-beta is released with the following updates
 +
** EMMAX interface is changed. --kinOnly option is replaced with a new command '''make-kin'''
 +
** SKAT-O is upgraded to version 0.77 with additional configurable parameter settings
 +
** Some parameter names are renamed (e.g. --min-maf, --min-mac)
 +
** Many minor bugs are fixed
 +
* Jul 6, 2012 : EPACTS v2.01-beta is released with the following updates
 +
** SKAT-O is upgraded to version 0.76
 +
** Fixed minor bugs in option names (Thanks to Xueling Sim)
 +
* Jul 3, 2012 : EPACTS v2.0-beta is released with the following updates
 +
** Major restructuring of the software
 +
** Annotation software is switched with built-in application
 +
** Addition of SKAT-O and EMMAX burden test
 +
** Minor bug fixes
 +
* Apr 8, 2012 : EPACTS v1.2-alpha is released with the following updates
 +
** EMMAX bug in handling covariates was fixed
 +
** Variable Threshold Test is added
 +
** Variable Threshold Test with genomic score (e.g. GERP or PhyloP) is added.
 +
* Apr 4, 2012 : EPACTS v1.1-alpha is released with the following updates, in addition to minor updates
 +
** EMMAX burden test (Hyun Min Kang)
 +
** Likelihood ratio test (Clement Ma)
 +
** Updated version of Firth bias-corrected likelihood ratio test (Clement Ma)
 +
** Updated version of EMMAX single variant test (Hyun Min Kang)
 +
* Mar 29, 2012 : EPACTS v1.0-alpha is released
