Line 1: |
Line 1: |
− | (LAST UPDATED on July 3rd, 2012) | + | '''EPACTS''' (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline for performing various statistical tests to identify genome-wide associations from sequence data, through an interface that is user-friendly for both scientific analysts and method developers. |
− | (NOTE: This document is written in format Supporting Wiki)
| |
| | | |
− | '''EPACTS''' (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline to perform various statistical test for identifying genome-wide association from sequence data
| + | == Join the EPACTS mailing list == |
| | | |
− | == ChangeLog == | + | Please join the [http://groups.google.com/group/epacts EPACTS Google Group] to ask questions about, discuss, or comment on EPACTS. |
− | * Jul 3, 2012 : EPACTS v2.0-beta is released with the following updates | + | |
− | ** Major restructuring of the software | + | == Latest ChangeLog == |
− | ** Annotation software is switched with built-in application | + | * Dec 15th, 2016 : EPACTS v3.3.0 release (github) |
− | ** Addition of SKAT-O and EMMAX burden test | + | ** Moved the repository into github |
− | ** Minor bug fixes | + | ** Some major fixes for handling large sample sizes (>18,000) |
− | * Apr 8, 2012 : EPACTS v1.2-alpha is released with the following updates, in addition to the following updates | + | ** Other minor bug fixes |
− | ** EMMAX bug in handling covariates was fixed | + | * July 10th, 2014 : EPACTS v3.2.6 release |
− | ** Variable Threshold Test is added | + | ** Minor bug fix in epacts-make-kin |
− | ** Variable Threshold Test with genomic score (e.g. GERP or PhyloP) is added. | + | * March 11th, 2014 : EPACTS v3.2.5 release |
− | * Apr 4, 2012 : EPACTS v1.1-alpha is released with the following updates, in addition to minor updates | + | ** EMMAX-SKAT is implemented, with a major bug fix |
− | ** EMMAX burden test (Hyun Min Kang) | + | * November 21st, 2013 : EPACTS v3.2.4 release |
− | ** Likelihood ratio test (Clement Ma) | + | ** Fixed a number of minor bugs (more comprehensive fix is still pending) |
− | ** Updated version of Firth bias-corrected likelihood ratio test (Clement Ma) | + | * March 25th, 2013 : EPACTS v3.2.3 release |
− | ** Updated version of EMMAX single variant test (Hyun Min Kang) | + | ** Relaxed the low-rank matrix check in SKAT tests (to avoid unnecessarily skipping genes) |
− | * Mar 29, 2012 : EPACTS v1.0-alpha is released
| + | * March 13th, 2013 : EPACTS v3.2.2 release |
| + | ** Fixed an error that occasionally reported mismatches in the number of samples |
| + | * March 9th, 2013 : EPACTS v3.2.1 release |
| + | ** Fixed errors in loading the dynamic library |
| + | ** Fixed errors in SKAT-O (thanks to Anubha Mahajan and Jason Flannick) |
| + | ** Fixed bugs in emmax-CMC |
| + | ** Added emmax-SKAT (contributed by Seunggeun Lee) |
| + | ** Additional minor bug fixes |
| + | See [[#Full ChangeLog]] for full details |
| | | |
| == Key Features == | | == Key Features == |
Line 45: |
Line 52: |
| == Obtaining EPACTS == | | == Obtaining EPACTS == |
| | | |
− | The EPACTS software is in 'beta development stage, and we are working on setting active git repository. | + | * The official release of EPACTS software is available at https://github.com/statgen/EPACTS |
− | Currently, EPACTS is only provided as linux binary, can it can be downloaded at http://www.sph.umich.edu/csg/kang/epacts/ . From the CSG cluster, it is available at /net/fantasia/home/hmkang/sw/epacts2/ . Current binary copy will run only in 64-bit linux machine
| + | ** From the CSG cluster, it is available at /net/fantasia/home/bin/epacts/ |
| + | * Note that R (version 2.10 or higher) and gnuplot (version 4.2 or higher) must be installed in order to run EPACTS correctly. |
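Before building, you may want to confirm both requirements from the shell. The POSIX sh sketch below is not part of EPACTS: `version_ge`, `check_tool` and `check_epacts_prereqs` are hypothetical helper names, and it assumes a `sort` that supports `-V` (GNU coreutils) plus the usual first-line output of `R --version` ("R version X.Y.Z ...") and `gnuplot --version` ("gnuplot X.Y patchlevel N").

```sh
#!/bin/sh
# Sketch only: not part of EPACTS; all helper names are hypothetical.
# Assumes `sort -V` (GNU coreutils) for version-aware ordering.

# Succeed when version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report one tool: $1 = name, $2 = detected version (empty if missing), $3 = minimum.
check_tool() {
  if [ -z "$2" ]; then
    echo "$1: not found in PATH" >&2
    return 1
  elif ! version_ge "$2" "$3"; then
    echo "$1: found $2, need >= $3" >&2
    return 1
  fi
  echo "$1 $2 OK"
}

# Check R >= 2.10 and gnuplot >= 4.2; returns non-zero if either is unmet.
check_epacts_prereqs() {
  r_ver=$(R --version 2>/dev/null | awk 'NR == 1 { print $3 }')
  gp_ver=$(gnuplot --version 2>/dev/null | awk 'NR == 1 { print $2 }')
  status=0
  check_tool R "$r_ver" 2.10 || status=1
  check_tool gnuplot "$gp_ver" 4.2 || status=1
  return "$status"
}
```

After sourcing the snippet, `check_epacts_prereqs` prints one line per tool and returns non-zero if either requirement is unmet. `sort -V` is used because a plain string comparison would rank 2.9 above 2.10.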
| | | |
− | === Currently Supported Statistical Tests ===
| + | == Currently Supported Statistical Tests == |
| | | |
| EPACTS supports the following widely used statistical tests, grouped into single variant tests and burden tests | | EPACTS supports the following widely used statistical tests, grouped into single variant tests and burden tests |
| | | |
− | ==== Single Variant Tests ====
| + | === Single Variant Tests === |
| | | |
| <noinclude> | | <noinclude> |
Line 64: |
Line 72: |
| | Implemented by | | | Implemented by |
| |- | | |- |
− | | b.glm | + | | b.wald |
| | Binary | | | Binary |
| | YES <br> (Joint) | | | YES <br> (Joint) |
Line 84: |
Line 92: |
| | Firth Bias-Corrected Logistic Likelihood Ratio Test | | | Firth Bias-Corrected Logistic Likelihood Ratio Test |
| | Clement Ma | | | Clement Ma |
| + | |- |
| + | | b.spa2 |
| + | | Binary |
| + | | YES <br> |
| + | | Moderate |
| + | | Saddlepoint Approximation Method |
| + | | Shawn Lee & Rounak Dey |
| |- | | |- |
| | b.lrt | | | b.lrt |
Line 136: |
Line 151: |
| |} | | |} |
| | | |
− | ==== Gene-wise or group-wise tests ====
| + | === Gene-wise or group-wise tests === |
| | | |
| <noinclude> | | <noinclude> |
Line 157: |
Line 172: |
| | b.madsen | | | b.madsen |
| | Binary | | | Binary |
− | | YES <br> (Joint Estimation) | + | | NO |
| | Slow | | | Slow |
− | | Wilcoxon Rank Sum Test between binary phenotypes and weighted rare variant scores | + | | Wilcoxon Rank Sum Test between binary phenotypes and weighted rare variant scores (differs slightly from the published method: each variant is weighted using the allele frequency pooled across cases and controls) |
| | Hyun Min Kang | | | Hyun Min Kang |
| |- | | |- |
Line 184: |
Line 199: |
| |- | | |- |
| | skat | | | skat |
− | | Quantitative | + | | Binary/Quantitative |
| | YES <br> (Joint Estimation) | | | YES <br> (Joint Estimation) |
| | Slow | | | Slow |
Line 191: |
Line 206: |
| |- | | |- |
| | VT | | | VT |
− | | Variable Threshold Test <br> with adaptive permutation | + | | Binary/Quantitative |
| | YES <br> (Regressed out first) | | | YES <br> (Regressed out first) |
| | Slow | | | Slow |
− | | Price et al, AJHG (2010) 86:832-8 | + | | Variable Threshold Test <br> with adaptive permutation <br> Price et al, AJHG (2010) 86:832-8 |
| | Hyun Min Kang | | | Hyun Min Kang |
| |- | | |- |
| | emmaxCMC | | | emmaxCMC |
− | | Quantitative | + | | Binary/Quantitative |
| | YES <br> (Regressed Out First) | | | YES <br> (Regressed Out First) |
| | Slow | | | Slow |
− | | CMC burden test using EMMAX | + | | Collapsing burden test using EMMAX |
| | Hyun Min Kang | | | Hyun Min Kang |
| |- | | |- |
| | emmaxVT | | | emmaxVT |
− | | Quantitative | + | | Binary/Quantitative |
| | YES <br> (Regressed Out First) | | | YES <br> (Regressed Out First) |
| | Slow | | | Slow |
| | Variable-threshold burden test using EMMAX | | | Variable-threshold burden test using EMMAX |
| | Hyun Min Kang | | | Hyun Min Kang |
| + | |- |
| + | | mmskat |
| + | | Quantitative |
| + | | YES <br> (Regressed Out First) |
| + | | Slow |
| + | | SKAT test using EMMAX |
| + | | Seunggeun Lee & Hyun Min Kang |
| |} | | |} |
| | | |
− | == Installation Details == | + | == Installation Details == |
− | If you want to use EPACTS in an Ubuntu platform, following the step below | + | |
− | # Download EPACTS binary at http://www.sph.umich.edu/csg/kang/epacts/download/epacts2.noref_binary.2012_07_03.tar.gz (94MB)
| + | To install EPACTS on an Ubuntu platform, follow the steps below |
− | # Uncompress EPACTS package to the directory you would like to install
| + | |
− | tar xzvf epacts2.noref_binary.2012_07_03.tar.gz
| + | $ git clone https://github.com/statgen/EPACTS.git |
− | # Download the reference FASTA files by running the following commands
| + | $ cd EPACTS |
− | cd epacts2/ | + | $ ./configure --prefix [/path/to/install] |
− | ./ref_download.sh (Or copy the FASTA and index file locally you have to ${EPACTS_DIR}/ext/ref/)
| + | $ make |
− | # Perform a test run by running the following command
| + | $ make install |
− | example/test_run_epacts.sh | + | |
| + | |
| + | (Important note: '''make sure to specify --prefix=/path/to/install''' to avoid installing into the default path /usr/local/, for which you may not have write permission. If you are not sure where to install, /home/your_userid/epacts is a reasonable choice.) |
| + | |
| + | * From here on, ${EPACTS_DIR} refers to the '/path/to/install' directory |
| + | |
| + | * Download the reference FASTA files automatically from the 1000 Genomes FTP site by running the following command |
| + | |
| + | ${EPACTS_DIR}/bin/epacts download |
| + | |
| + | (Advanced users can skip downloading the FASTA files (~900MB) by copying a local GRCh37 FASTA file and its index file to ${EPACTS_DIR}/share/EPACTS/) |
| + | |
| + | * Perform a test run with the following command |
| + | |
| + | ${EPACTS_DIR}/bin/test_run_epacts.sh |
| | | |
| To use EPACTS on the CSG cluster, you do not need to install it. You can directly use, or make a copy of, the in-house release version at | | To use EPACTS on the CSG cluster, you do not need to install it. You can directly use, or make a copy of, the in-house release version at |
− | /net/fantasia/home/hmkang/sw/epacts2 | + | |
| + | /net/fantasia/home/hmkang/tools/epacts-3.3.0/bin/epacts/ |
| + | |
| + | * If you want to access previous versions, visit http://csg-old.sph.umich.edu/kang/epacts/download |
| | | |
| == Getting Started With Examples == | | == Getting Started With Examples == |
| If you are using EPACTS from the CSG cluster, please set the following environment variable | | If you are using EPACTS from the CSG cluster, please set the following environment variable |
− | EPACTS_DIR=/net/fantasia/home/hmkang/sw/epacts2 (in bash) | + | EPACTS_DIR=/net/fantasia/home/hmkang/tools/epacts-3.3.0/bin/epacts (in bash) |
− | setenv EPACTS_DIR /net/fantasia/home/hmkang/sw/epacts2 (in csh) | + | setenv EPACTS_DIR /net/fantasia/home/hmkang/tools/epacts-3.3.0/bin/epacts (in csh) |
| | | |
| If you downloaded the EPACTS binary, please set EPACTS_DIR to the full path of the downloaded and uncompressed directory. | | If you downloaded the EPACTS binary, please set EPACTS_DIR to the full path of the downloaded and uncompressed directory. |
Line 235: |
Line 274: |
| === All-in-one example === | | === All-in-one example === |
| | | |
− | To get started with EPACTS, use the example files located at | + | To get started with EPACTS, run the following command to perform an example run |
− | ${EPACTS_DIR}/example
| + | ${EPACTS_DIR}/bin/test_run_epacts.sh |
− | | |
− | Running the following command will perform an example run
| |
− | ${EPACTS_DIR}/example/test_run_epacts.sh | |
| | | |
| The test_run_epacts.sh script contains a series of commented-out lines, one for each possible test. | | The test_run_epacts.sh script contains a series of commented-out lines, one for each possible test. |
| + | |
| + | The example phenotype (PED format) and genotype (VCF format) files can be found at |
| + | ${EPACTS_DIR}/share/EPACTS/ |
| | | |
| === Single Variant Test === | | === Single Variant Test === |
Line 247: |
Line 286: |
| Alternatively, you can run the EPACTS command yourself as follows | | Alternatively, you can run the EPACTS command yourself as follows |
| ${EPACTS_DIR}/epacts single \ | | ${EPACTS_DIR}/epacts single \ |
− | --vcf ${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \ | + | --vcf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \ |
− | --ped ${EPACTS_DIR}/example/1000G_dummy_pheno.ped \ | + | --ped ${EPACTS_DIR}/data/1000G_dummy_pheno.ped \ |
− | --minAF 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno \ | + | --min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno \ |
| --out out/test --run 2 | | --out out/test --run 2 |
| | | |
Line 260: |
Line 299: |
| The filename is out/test.single.b.score.epacts.gz and the contents will look like | | The filename is out/test.single.b.score.epacts.gz and the contents will look like |
| $ zcat out/test.single.b.score.epacts.gz | head | | $ zcat out/test.single.b.score.epacts.gz | head |
− | #CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE | + | #CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE N.CASE N.CTRL AF.CASE AF.CTRL |
− | 20 68303 68303 20:68303_A/G_Upstream:DEFB125 266 1 1 0.0018797 NA NA | + | 20 68303 68303 20:68303_A/G_Upstream:DEFB125 266 1 1 0.0018797 NA NA NA NA NA NA |
− | 20 68319 68319 20:68319_C/A_Upstream:DEFB125 266 0 1 0 NA NA | + | 20 68319 68319 20:68319_C/A_Upstream:DEFB125 266 1.4467e-36 1 0 NA NA NA NA NA NA |
− | 20 68396 68396 20:68396_C/T_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA | + | 20 68396 68396 20:68396_C/T_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA NA NA NA NA |
− | 20 76635 76635 20:76635_A/T_Intron:DEFB125 266 0 1 0 NA NA | + | 20 76635 76635 20:76635_A/T_Intron:DEFB125 266 1.534e-37 1 0 NA NA NA NA NA NA |
− | 20 76689 76689 20:76689_T/C_Synonymous:DEFB125 266 0 1 0 NA NA | + | 20 76689 76689 20:76689_T/C_Synonymous:DEFB125 266 0 1 0 NA NA NA NA NA NA |
− | 20 76690 76690 20:76690_T/C_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA | + | 20 76690 76690 20:76690_T/C_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA NA NA NA NA |
− | 20 76700 76700 20:76700_G/A_Nonsynonymous:DEFB125 266 0 1 0 NA NA | + | 20 76700 76700 20:76700_G/A_Nonsynonymous:DEFB125 266 0 1 0 NA NA NA NA NA NA |
− | 20 76726 76726 20:76726_C/G_Nonsynonymous:DEFB125 266 0 1 0 NA NA | + | 20 76726 76726 20:76726_C/G_Nonsynonymous:DEFB125 266 0 1 0 NA NA NA NA NA NA |
− | 20 76771 76771 20:76771_C/T_Nonsynonymous:DEFB125 266 3 1 0.0056391 0.68484 0.40587 | + | 20 76771 76771 20:76771_C/T_Nonsynonymous:DEFB125 266 3 1 0.0056391 0.68484 0.40587 145 121 0.013793 0.0082645 |
| | | |
| ==== Output Text of Top Associations ==== | | ==== Output Text of Top Associations ==== |
Line 276: |
Line 315: |
| | | |
| $ head out/test.single.b.score.epacts.top5000 | | $ head out/test.single.b.score.epacts.top5000 |
− | #CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE | + | #CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE N.CASE N.CTRL AF.CASE AF.CTRL |
− | 20 1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266 136 1 0.25564 0.0001097 3.8681 | + | 20 1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266 138.64 1 0.26061 6.9939e-05 3.9765 145 121 0.65177 0.36476 |
− | 20 4162411 4162411 20:4162411_T/C_Intron:SMOX 266 204 1 0.38346 0.00055585 -3.4523 | + | 20 4162411 4162411 20:4162411_T/C_Intron:SMOX 266 204 1 0.38346 0.00055583 -3.4523 145 121 0.62759 0.93388 |
− | 20 34061918 34061918 20:34061918_T/C_Intron:CEP250 266 39 1 0.073308 0.0011231 3.2577 | + | 20 34061918 34061918 20:34061918_T/C_Intron:CEP250 266 41.815 1 0.0786 0.00095471 3.3035 145 121 0.22543 0.075436 |
− | 20 4155948 4155948 20:4155948_G/A_Intron:SMOX 266 215 1 0.40414 0.0020791 -3.0787 | + | 20 4155948 4155948 20:4155948_G/A_Intron:SMOX 266 215 1 0.40414 0.0020792 -3.0787 145 121 0.68276 0.95868 |
− | 20 4680251 4680251 20:4680251_A/G_Nonsynonymous:PRNP 266 186 1 0.34962 0.0025962 3.0119 | + | 20 4680251 4680251 20:4680251_A/G_Nonsynonymous:PRNP 266 186 1 0.34962 0.0025962 3.0119 145 121 0.8069 0.57025 |
− | 20 36668874 36668874 20:36668874_G/A_Synonymous:RPRD1B 266 96 1 0.18045 0.003031 2.9646 | + | 20 36668874 36668874 20:36668874_G/A_Synonymous:RPRD1B 266 96 1 0.18045 0.003031 2.9646 145 121 0.44828 0.2562 |
− | 20 36641871 36641871 20:36641871_G/A_Synonymous:TTI1 266 10 1 0.018797 0.004308 -2.8547 | + | 20 36641871 36641871 20:36641871_G/A_Synonymous:TTI1 266 10 1 0.018797 0.004308 -2.8547 145 121 0.0068966 0.07438 |
− | 20 32664926 32664926 20:32664926_G/A_Nonsynonymous:RALY 266 20 1 0.037594 0.0046365 2.8313 | + | 20 1616892 1616892 20:1616892_A/G_Synonymous:SIRPG 266 144 1 0.27068 0.0051239 2.7991 145 121 0.63449 0.42975 |
− | 20 34288854 34288854 20:34288854_C/T_Utr3:ROMO1 266 28 1 0.052632 0.0047722 2.822 | + | 20 25038372 25038372 20:25038372_G/A_Intron:ACSS1 266 103.3 1 0.19418 0.005748 2.7618 145 121 0.47201 0.28813 |
− |
| + | |
| + | The key columns are: |
| + | * '''NS''' : Number of phenotyped samples with non-missing genotypes |
| + | * '''AC''' : Total non-reference allele count |
| + | * '''CALLRATE''' : Fraction of non-missing genotypes |
| + | * '''MAF''' : Minor allele frequency |
| + | * '''PVALUE''' : P-value of the single variant test |
| + | * '''AF.CASE''' : Non-reference allele frequency among cases |
| + | * '''AF.CTRL''' : Non-reference allele frequency among controls |
| + | |
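As a quick sanity check of these columns, the rows shown above are consistent with MAF = min(AF, 1 − AF), where AF = AC / (2 × NS), i.e. assuming diploid genotypes (AC = 1 and NS = 266 give 1/532 ≈ 0.0018797). The POSIX sh/awk sketch below uses the hypothetical helper names `maf_from_ac` and `epacts_filter`: the first recomputes MAF this way, the second prints the rows of a gzipped .epacts output whose PVALUE falls below a cutoff, locating the PVALUE column from the header and skipping NA values.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not EPACTS commands.

# MAF from AC and NS, assuming diploid genotypes:
# AF = AC / (2 * NS), MAF = min(AF, 1 - AF).
maf_from_ac() {
  awk -v ac="$1" -v ns="$2" 'BEGIN {
    af = ac / (2 * ns)
    if (af > 0.5) af = 1 - af
    printf "%.5g\n", af
  }'
}

# Print the header plus every row of a gzipped .epacts file with PVALUE < cutoff.
# $1 = file, $2 = cutoff. The PVALUE column is found by name; NA rows are skipped.
epacts_filter() {
  gzip -cd -- "$1" | awk -v cutoff="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "PVALUE") col = i
      if (!col) { print "PVALUE column not found" > "/dev/stderr"; exit 1 }
      print
      next
    }
    $col != "NA" && $col + 0 < cutoff + 0'
}
```

For example, `epacts_filter out/test.single.b.score.epacts.gz 1e-3` prints the header and the variants with p < 0.001, and `maf_from_ac 1 266` prints 0.0018797, matching the first row of the output above.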
| ==== Q-Q plot of test statistics (stratified by MAF) ==== | | ==== Q-Q plot of test statistics (stratified by MAF) ==== |
| | | |
− | The file outPutwash/example.exome.DISEASE.score.epacts.qq.pdf will be generated as shown below | + | The file out/test.b.score.epacts.qq.pdf will be generated as shown below |
| | | |
− | [[File:Pugwash example qq.png]] | + | [[File:test_b_score_epacts_qq.png]] |
| | | |
| ==== Manhattan Plot of Test Statistics ==== | | ==== Manhattan Plot of Test Statistics ==== |
| | | |
− | The file outPutwash/example.exome.DISEASE.score.epacts.mh.pdf will be generated for chr20 only. | + | The file out/test.b.score.epacts.mh.pdf will be generated for chr20 only. |
| | | |
− | [[File:Pugwash example mh.png]] | + | [[File:test_b_score_epacts_mh.png]] |
| | | |
| An example Manhattan plot from a genome-wide run looks like the one below | | An example Manhattan plot from a genome-wide run looks like the one below |
| | | |
− | [[File:Pugwash example mh gw.png]] | + | [[File:tes_b_score_epacts_mh_gw.png]] |
| | | |
| === Gene-wise or group-wise burden test === | | === Gene-wise or group-wise burden test === |
Line 320: |
Line 368: |
| Note that the markers up to [MARKER_ID_K] must be sorted in increasing order of genomic coordinate | | Note that the markers up to [MARKER_ID_K] must be sorted in increasing order of genomic coordinate |
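If a group file was assembled by hand, the markers on each line can be put in coordinate order with standard tools. This POSIX sh sketch assumes, based on the placeholders above, that each line holds a group ID followed by whitespace-separated marker IDs of the form CHROM:POS_REF/ALT (as in the example outputs, e.g. 20:68303_A/G), all on the same chromosome; `sort_group_file` is a hypothetical helper name. It sorts numerically on the position after the first colon, so 20:9999_A/G comes before 20:10000_C/T, which a plain lexicographic sort would get wrong.

```sh
#!/bin/sh
# Sketch only: `sort_group_file` is a hypothetical helper, not an EPACTS command.
# Assumed line layout: GROUP_ID MARKER_1 MARKER_2 ... (space or tab separated),
# with markers like CHROM:POS_REF/ALT on a single chromosome.
sort_group_file() {
  while read -r group markers || [ -n "$group" ]; do
    [ -n "$group" ] || continue
    # One marker per line, numeric sort on the position after ':', rejoin with tabs.
    sorted=$(printf '%s\n' "$markers" | tr -s ' \t' '\n\n' | sort -t: -k2,2n | paste -s -d '\t' -)
    printf '%s\t%s\n' "$group" "$sorted"
  done
}
```

Typical use: `sort_group_file < unsorted.grp > sorted.grp`.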
| | | |
− | In oeder to create gene-level group file from typically formatted VCF file, one may use the following utility | + | To create a gene-level group file from a typically formatted VCF file, you can use the following utility |
| | | |
− | epacts makegroup --vcf [input-vcf] --out [output-group-file] --format [epacts, annovar, chaos or gatk] --nonsyn | + | ${EPACTS_DIR}/epacts make-group --vcf [input-vcf] --out [output-group-file] --format [epacts, annovar, chaos or gatk] --nonsyn |
| | | |
| The above command creates a file [output-group-file] containing a list of missense and nonsense variants for each gene. To incorporate other types of functional annotation, use the --type option as follows | | The above command creates a file [output-group-file] containing a list of missense and nonsense variants for each gene. To incorporate other types of functional annotation, use the --type option as follows |
| | | |
− | epacts makegroup --vcf [input-vcf] --out [output-group-file] --format [epacts, annovar, chaos or gatk] --type [function_type_1] --type [function_type_2] ... | + | ${EPACTS_DIR}/epacts make-group --vcf [input-vcf] --out [output-group-file] --format [epacts, annovar, chaos or gatk] --type [function_type_1] --type [function_type_2] ... |
| | | |
| Type 'epacts make-group -man' for detailed documentation | | Type 'epacts make-group -man' for detailed documentation |
Line 335: |
Line 383: |
| | | |
| ${EPACTS_DIR}/epacts anno \ | | ${EPACTS_DIR}/epacts anno \ |
− | --in ${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \ | + | --in ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \ |
− | --out ${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.anno.vcf.gz | + | --out ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.anno.vcf.gz |
| | | |
| The epacts anno script adds an "ANNO=[function]:[genename]" entry to the INFO field, based on the gencodeV7 (default) or refGene database. | | The epacts anno script adds an "ANNO=[function]:[genename]" entry to the INFO field, based on the gencodeV7 (default) or refGene database. |
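To see which functional classes the annotation step assigned, you can tally the ANNO values in the annotated VCF, which may help when deciding which function types to group. A minimal POSIX sh/awk sketch, assuming a bgzipped VCF whose INFO field (column 8) carries ANNO=[function]:[genename] as described above; `vcf_anno_counts` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch only: `vcf_anno_counts` is a hypothetical helper, not an EPACTS command.
# Counts variants per [function] in ANNO=[function]:[genename] entries of a bgzipped VCF.
vcf_anno_counts() {
  gzip -cd -- "$1" | awk -F'\t' '
    /^#/ { next }                      # skip VCF header lines
    {
      n = split($8, kv, ";")           # INFO is column 8, entries separated by ";"
      for (i = 1; i <= n; i++)
        if (substr(kv[i], 1, 5) == "ANNO=") {
          split(substr(kv[i], 6), parts, ":")
          count[parts[1]]++            # parts[1] is the function class
        }
    }
    END { for (f in count) print f "\t" count[f] }' | sort
}
```

For example: `vcf_anno_counts ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.anno.vcf.gz`.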
Line 346: |
Line 394: |
| To perform a groupwise burden test on the example VCF (annotated as above), run the following command | | To perform a groupwise burden test on the example VCF (annotated as above), run the following command |
| | | |
− | ${EPACTS_DIR}/epacts group --vcf ${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.anno.vcf.gz \ | + | ${EPACTS_DIR}/epacts group --vcf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.anno.vcf.gz \ |
− | --groupf ${EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.anno.grp --out out/test.gene.skat \ | + | --groupf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.anno.grp --out out/test.gene.skat \ |
− | --ped ${EPACTS_DIR}/example/example/1000G_dummy_pheno.ped --maxAF 0.05 \ | + | --ped ${EPACTS_DIR}/data/1000G_dummy_pheno.ped --max-maf 0.05 \ |
− | --chr 20 --pheno QT --cov AGE --cov SEX --test skat --ska-o --run 2 | + | --chr 20 --pheno QT --cov AGE --cov SEX --test skat --skat-o --run 2 |
| | | |
| ==== Example Output ==== | | ==== Example Output ==== |
| $ head out/test.gene.skat.epacts.top5000 | | $ head out/test.gene.skat.epacts.top5000 |
− | #CHROM BEGIN END MARKER_ID NS FRAC_BURDEN NUM_ALL_VARS NUM_PASS_VARS NUM_SING_VARS PVALUE STATRHO | + | #CHROM BEGIN END MARKER_ID NS FRAC_WITH_RARE NUM_ALL_VARS NUM_PASS_VARS NUM_SING_VARS PVALUE STATRHO |
| 20 62607037 62608720 20:62607037-62608720_SAMD10 266 0.14662 9 5 1 0.0020064 1 | | 20 62607037 62608720 20:62607037-62608720_SAMD10 266 0.14662 9 5 1 0.0020064 1 |
| 20 2816211 2820493 20:2816211-2820493_FAM113A 266 0.011278 12 2 1 0.0032542 0 | | 20 2816211 2820493 20:2816211-2820493_FAM113A 266 0.011278 12 2 1 0.0032542 0 |
Line 362: |
Line 410: |
| 20 60962895 60963559 20:60962895-60963559_RPS21 266 0.06015 6 3 2 0.016409 0 | | 20 60962895 60963559 20:60962895-60963559_RPS21 266 0.06015 6 3 2 0.016409 0 |
| 20 55904961 55917801 20:55904961-55917801_SPO11 266 0.011278 11 3 3 0.018031 0 | | 20 55904961 55917801 20:55904961-55917801_SPO11 266 0.011278 11 3 3 0.018031 0 |
| + | |
| + | The key columns are: |
| + | * '''NS''' : Number of phenotyped samples with non-missing genotypes |
| + | * '''FRAC_WITH_RARE''' : Fraction of individuals carrying rare variants below the --max-maf threshold (default: 0.05) |
| + | * '''NUM_ALL_VARS''' : Number of all variants defining the group |
| + | * '''NUM_PASS_VARS''' : Number of variants passing the --min-maf, --min-mac, --max-maf and --min-callrate thresholds |
| + | * '''NUM_SING_VARS''' : Number of singletons among the variants counted in NUM_PASS_VARS |
| + | * '''PVALUE''' : P-value of the burden test |
| + | * Other columns are test-specific auxiliary columns. For example, in the VT test, the optimal MAF threshold is recorded as an auxiliary output column. |
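Because FRAC_WITH_RARE is a fraction of the NS samples, multiplying the two gives the approximate number of rare-variant carriers per group (0.14662 × 266 ≈ 39 for SAMD10 above). A POSIX sh/awk sketch with the hypothetical helper name `group_carriers`, which finds the columns by header name in a top5000-style text output and rounds to the nearest integer:

```sh
#!/bin/sh
# Sketch only: `group_carriers` is a hypothetical helper, not an EPACTS command.
# Reads a group-test text output (file arguments or stdin) and prints
# MARKER_ID, NS and round(FRAC_WITH_RARE * NS) for each group.
group_carriers() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "MARKER_ID") id = i
        if ($i == "NS") ns = i
        if ($i == "FRAC_WITH_RARE") fr = i
      }
      if (!(id && ns && fr)) { print "expected columns not found" > "/dev/stderr"; exit 1 }
      print "MARKER_ID\tNS\tN_CARRIERS"
      next
    }
    { printf "%s\t%s\t%d\n", $id, $ns, $fr * $ns + 0.5 }' "$@"
}
```

For example: `group_carriers out/test.gene.skat.epacts.top5000`.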
| | | |
| === Specialized Instruction for EMMAX tests === | | === Specialized Instruction for EMMAX tests === |
Line 374: |
Line 431: |
| | | |
| * '''Creating Kinship Matrix''' : From VCF, we recommend setting MAF (e.g. 0.01) and call rate (e.g. 0.95) thresholds to select high-quality markers for generating the kinship matrix, as follows. | | * '''Creating Kinship Matrix''' : From VCF, we recommend setting MAF (e.g. 0.01) and call rate (e.g. 0.95) thresholds to select high-quality markers for generating the kinship matrix, as follows. |
− | ${EPACTS_DIR}/epacts single \ | + | ${EPACTS_DIR}/epacts make-kin \ |
− | --vcf [input.vcf.gz] --ped [input.ped] --minAF 0.01 --minCallRate 0.95 \ | + | --vcf [input.vcf.gz] --ped [input.ped (Optional)] --min-maf 0.01 --minCallRate 0.95 \ |
− | --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] --test emmax --kinOnly \ | + | --sepchr (if VCF is separated by chromosome) --out [outprefix.kinf] --run [# of parallel jobs] |
− | --out [outprefix.kinf] --unit 50000000 --run [# of parallel jobs]
| + | |
| + | If you provide an [input.ped] file, the kinship matrix is calculated only for the subset of individuals contained in the PED file. |
| | | |
| The procedure above creates a file [outprefix.kinf] by splitting the genome into multiple pieces and merging the results. If only a subset of SNPs should be considered (e.g. because of target regions, LD pruning, or other reasons), create a VCF containing only those markers beforehand and use it as the input VCF file. | | The procedure above creates a file [outprefix.kinf] by splitting the genome into multiple pieces and merging the results. If only a subset of SNPs should be considered (e.g. because of target regions, LD pruning, or other reasons), create a VCF containing only those markers beforehand and use it as the input VCF file. |
Line 383: |
Line 441: |
| * '''Perform Single Variant Association''' : From VCF and PED, we recommend using less stringent MAF (e.g. 0.001) and call rate (e.g. 0.50) thresholds for single variant association | | * '''Perform Single Variant Association''' : From VCF and PED, we recommend using less stringent MAF (e.g. 0.001) and call rate (e.g. 0.50) thresholds for single variant association |
| ${EPACTS_DIR}/epacts single \ | | ${EPACTS_DIR}/epacts single \ |
− | --vcf [input.vcf.gz] --ped [input.ped] --minAF 0.001 --kin [outputprefix.kinf] \ | + | --vcf [input.vcf.gz] --ped [input.ped] --min-maf 0.001 --kin [outputprefix.kinf] \ |
− | --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] --test emmax \ | + | --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] --test q.emmax \ |
| --out [outprefix] --run [# of parallel jobs] | | --out [outprefix] --run [# of parallel jobs] |
| | | |
Line 402: |
Line 460: |
| * Run CMC-style burden test by | | * Run CMC-style burden test by |
| ${EPACTS_DIR}/epacts group --groupf [group.grp] \ | | ${EPACTS_DIR}/epacts group --groupf [group.grp] \ |
− | --vcf [input.vcf.gz] --ped [input.ped] --maxAF [max-MAF-for-rare-variants] \ | + | --vcf [input.vcf.gz] --ped [input.ped] --max-maf [max-MAF-for-rare-variants] \ |
| --kin [outputprefix.kinf] --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] \ | | --kin [outputprefix.kinf] --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] \ |
| --test emmaxCMC --out [outprefix] | | --test emmaxCMC --out [outprefix] |
| * Run Variable Threshold burden test by | | * Run Variable Threshold burden test by |
| ${EPACTS_DIR}/epacts group --groupf [group.grp] \ | | ${EPACTS_DIR}/epacts group --groupf [group.grp] \ |
− | --vcf [input.vcf.gz] --ped [input.ped] --maxAF [max-MAF-for-rare-variants] \ | + | --vcf [input.vcf.gz] --ped [input.ped] --max-maf [max-MAF-for-rare-variants] \ |
| --kin [outputprefix.kinf] --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] \ | | --kin [outputprefix.kinf] --sepchr --pheno [PHENO_NAME] --cov [COV1] --cov [COV2] \ |
− | --test emmaxVT --out [outprefix] | + | --test emmaxVT --out [outprefix] |
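When both EMMAX burden tests are run on the same data, the shared arguments only need to be written once. A POSIX sh sketch, assuming the flags exactly as shown in the two commands above; every file name, phenotype and covariate in it (group.grp, input.chr1.vcf.gz, input.ped, input.kinf, PHENO_NAME, COV1, COV2, and the 0.01 MAF cutoff) is a hypothetical placeholder, and `run` and `run_emmax_burden` are hypothetical helpers. The VCF name contains "chr1" because the input is split by chromosome (--sepchr).

```sh
#!/bin/sh
# Sketch only: all file names, phenotype/covariate names and the MAF cutoff
# are hypothetical placeholders; the flags are those used in the commands above.
EPACTS_DIR=${EPACTS_DIR:-/path/to/install}

# Execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Run emmaxCMC and emmaxVT with identical inputs; each test gets its own --out prefix.
run_emmax_burden() {
  for test_name in emmaxCMC emmaxVT; do
    run "${EPACTS_DIR}/epacts" group --groupf group.grp \
      --vcf input.chr1.vcf.gz --ped input.ped --max-maf 0.01 \
      --kin input.kinf --sepchr --pheno PHENO_NAME --cov COV1 --cov COV2 \
      --test "$test_name" --out "out/${test_name}"
  done
}
```

Exporting DRY_RUN=1 before calling `run_emmax_burden` prints the two command lines instead of executing them, which is a cheap way to check arguments; giving each test its own --out prefix keeps the two result sets from overwriting each other.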
| | | |
| == Preparing Your Own Input Data == | | == Preparing Your Own Input Data == |
Line 419: |
Line 477: |
| bgzip input.vcf ## this command will produce input.vcf.gz | | bgzip input.vcf ## this command will produce input.vcf.gz |
| tabix -pvcf -f input.vcf.gz ## this command will produce input.vcf.gz.tbi | | tabix -pvcf -f input.vcf.gz ## this command will produce input.vcf.gz.tbi |
− | * If the VCF file is separated by chromosome, the VCF file must contain the string "chr1" in the chromosome 1 file, and corresponding chromosome name for other chromosomes. | + | * If the VCF is split by chromosome, the name of the VCF file specified in the input argument must contain the string "chr1" (i.e., it is the chromosome 1 file), and the file names for the other chromosomes must contain the corresponding chromosome names. Thus, the file names should look like <code>[prefix]chr1[suffix].vcf.gz</code>, <code>[prefix]chr2[suffix].vcf.gz</code>, ..., <code>[prefix]chr22[suffix].vcf.gz</code>, <code>[prefix]chrX[suffix].vcf.gz</code>. |
| * Sample IDs in the VCF file must be consistent with those in the PED file | | * Sample IDs in the VCF file must be consistent with those in the PED file |
| * Currently EPACTS only supports bi-allelic variants, but it handles SNPs, INDELs, and SVs. | | * Currently EPACTS only supports bi-allelic variants, but it handles SNPs, INDELs, and SVs. |
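For chromosome-split input, it can help to list the file names the per-chromosome naming rule implies and confirm that each file and its .tbi index exist before starting a run. A POSIX sh sketch with the hypothetical helper `per_chrom_vcfs`: it substitutes chr2 through chr22 and chrX for the first occurrence of "chr1" in the given path. How EPACTS itself performs this substitution is not described on this page, so treat that as an assumption; in particular, a directory name containing "chr1" would be matched first.

```sh
#!/bin/sh
# Sketch only: `per_chrom_vcfs` is a hypothetical helper, not an EPACTS command.
# Given the chr1 file name, print the expected names for chr1..chr22 and chrX.
per_chrom_vcfs() {
  case $1 in
    *chr1*) ;;
    *) echo "expected a path containing 'chr1': $1" >&2; return 1 ;;
  esac
  prefix=${1%%chr1*}   # text before the first "chr1"
  suffix=${1#*chr1}    # text after the first "chr1"
  c=1
  while [ "$c" -le 22 ]; do
    printf '%schr%s%s\n' "$prefix" "$c" "$suffix"
    c=$((c + 1))
  done
  printf '%schrX%s\n' "$prefix" "$suffix"
}
```

For example, `for f in $(per_chrom_vcfs input.chr1.vcf.gz); do [ -e "$f.tbi" ] || echo "missing index: $f"; done` reports every expected file whose tabix index is absent.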
Line 481: |
Line 539: |
| EPACTS also accepts a PED format with header information. The above files can be combined into one file as follows | | EPACTS also accepts a PED format with header information. The above files can be combined into one file as follows |
| | | |
− | $ head example/1000G_dummy_pheno.ped | + | $ head data/1000G_dummy_pheno.ped |
| #FAM_ID IND_ID FAT_ID MOT_ID SEX DISEASE QT AGE | | #FAM_ID IND_ID FAT_ID MOT_ID SEX DISEASE QT AGE |
| 13281 NA12344 NA12347 NA12348 1 1 94.17 66.1 | | 13281 NA12344 NA12347 NA12348 1 1 94.17 66.1 |
Line 496: |
Line 554: |
| | | |
| == Frequently Asked Questions == | | == Frequently Asked Questions == |
| + | === Installation === |
| + | # How should I install EPACTS? |
| + | #* See [[EPACTS#Installation_Details | Installation Details]] |
| + | # I get the following error message: '''configure: error: libR.{so,a} was not found. Please install it at http://www.r-project.org/ first'''. What should I do? |
| + | #* First, find out where R is installed. Type "locate libR.so" and check whether it returns anything |
| + | #* If "locate libR.so" returns something, try adding "LDFLAGS=-L/path/to/R/library" and rerun '''configure''' and '''make''' as described in [[EPACTS#Installation_Details | Installation Details]] |
| + | #* If you cannot find libR.so, you may have to recompile R with the --enable-R-shlib option, as described in http://cran.r-project.org/doc/manuals/R-admin.html#Installation |
| | | |
| === Input Files === | | === Input Files === |
| # What is VCF? | | # What is VCF? |
| #* VCF refers to Variant Call Format | | #* VCF refers to Variant Call Format |
− | #* See [[http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 | 1000 Genomes wiki page]] for the detailed description of VCF format | + | #* See the [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 1000 Genomes wiki page] for a detailed description of the VCF format |
| # Does the input VCF need to be compressed in a specific format? | | # Does the input VCF need to be compressed in a specific format? |
| #* Yes. EPACTS assumes that the VCF file is already bgzipped and tabix-indexed. | | #* Yes. EPACTS assumes that the VCF file is already bgzipped and tabix-indexed. |
Line 510: |
Line 575: |
| #* If a non-GT field is used, it is treated as a dosage and must be a single numeric value. | | #* If a non-GT field is used, it is treated as a dosage and must be a single numeric value. |
| # Which input formats are accepted for phenotypes and covariates? | | # Which input formats are accepted for phenotypes and covariates? |
− | ** See [[#PED file for Phenotypes and Covariates]] for the detailed information
| + | #* See [[#PED file for Phenotypes and Covariates]] for the detailed information |
| # How should I encode binary phenotypes? | | # How should I encode binary phenotypes? |
| #* If your phenotype takes two different numeric values (e.g. 0/1 or 1/2), EPACTS automatically recognizes it as a binary phenotype and re-encodes it as 1/2. The higher value is treated as the case status for case-control association | | #* If your phenotype takes two different numeric values (e.g. 0/1 or 1/2), EPACTS automatically recognizes it as a binary phenotype and re-encodes it as 1/2. The higher value is treated as the case status for case-control association |
Line 529: |
Line 594: |
#* [[#Manhattan Plot of Test Statistics]] shows the genome-wide distribution of association signals
#* [[#Output Text of All Test Statistics]] contains the full test results for every unit tested
# Why can't I find the Q-Q and Manhattan plots?
#* Most likely gnuplot 4.2 or higher is not installed on your system, or it is installed but cannot be found in your ${PATH}. See the [http://gnuplot.info/ gnuplot web page] for installation instructions.
# How can I read the EMMAX kinship file produced by EPACTS?
#* Run the following command to dump the kinship matrix into a human-readable text format:
 ${EPACTS_DIR}/bin/pEmmax kin-util --kinf [input.kinf] --outf [output.prefix] --dump
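The FAQ rule that a non-GT field is read as a dosage and must hold a single numeric value can be illustrated with a small R sketch. The helper 'get_dosage' is hypothetical and is not part of EPACTS. It assumes the standard colon-separated VCF layout (a FORMAT string such as 'GT:DS' with a matching per-sample entry such as '0/1:0.98'). It returns NA when the field is absent, missing ('.'), or not a single number.

```r
## Hypothetical helper, not part of EPACTS: extract a per-sample dosage from a
## VCF genotype entry. Assumes the usual colon-separated FORMAT layout,
## e.g. format = "GT:DS" and entry = "0/1:0.98".
get_dosage <- function(format, entry, field = "DS") {
  keys <- strsplit(format, ":", fixed = TRUE)[[1]]
  vals <- strsplit(entry, ":", fixed = TRUE)[[1]]
  idx <- match(field, keys)
  # Field not declared in FORMAT, or trailing fields dropped from the entry
  if (is.na(idx) || idx > length(vals)) return(NA_real_)
  # "." (missing) and multi-valued fields such as "0.1,0.8,0.1" are not a
  # single number, so as.numeric() yields NA; silence its coercion warning
  suppressWarnings(as.numeric(vals[idx]))
}

get_dosage("GT:DS", "0/1:0.98")                # 0.98
get_dosage("GT:GP", "0/1:0.1,0.8,0.1", "GP")   # NA (not a single value)
```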
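The binary-phenotype rule in the FAQ works as follows: exactly two distinct numeric values are recoded to 1/2, and the higher value becomes the case. The R sketch below mimics that rule. It is only an illustration of the documented behavior, not the EPACTS source. The function name 'recode_binary' and its list return value are hypothetical.

```r
## Hypothetical sketch, not EPACTS source code: mimic the documented rule that
## a phenotype with exactly two distinct numeric values is treated as binary
## and recoded to 1/2, with the higher original value becoming 2 (case).
recode_binary <- function(y) {
  vals <- sort(unique(y[!is.na(y)]))
  if (length(vals) != 2) {
    return(list(binary = FALSE, pheno = y))  # left as a quantitative trait
  }
  # Higher value -> 2 (case), lower value -> 1 (control); NA stays NA
  list(binary = TRUE, pheno = ifelse(y == vals[2], 2, 1))
}

recode_binary(c(0, 1, 1, 0, NA))$pheno   # 1 2 2 1 NA
recode_binary(c(1.2, 3.4, 2.0))$binary   # FALSE
```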
=== More questions ===
# If you have more questions, please contact [mailto:hmkang@umich.edu Hyun Min Kang].

== Detailed Options ==

The detailed options can be viewed by running the following commands:
 ${EPACTS_DIR}/bin/epacts -man              # overall structure
 ${EPACTS_DIR}/bin/epacts single -man       # single variant test
 ${EPACTS_DIR}/bin/epacts group -man        # groupwise test
 ${EPACTS_DIR}/bin/epacts anno -man         # annotation
 ${EPACTS_DIR}/bin/epacts plot -man         # QQ and Manhattan plot
 ${EPACTS_DIR}/bin/epacts zoom -man         # zoom plot
 ${EPACTS_DIR}/bin/epacts meta -man         # meta-analysis
 ${EPACTS_DIR}/bin/epacts make-group -man   # creating gene groups

== Implementing Additional Statistical Tests ==

To add a new statistical test to EPACTS, the following procedure is recommended:

# Create a file named 'single.[testname].R' for a single-variant test, or 'gene.[testname].R' for a gene-level test, under ${EPACTS_DIR}/share/EPACTS/
# Test your implementation with the --test [testname] option for sanity checking and debugging
# If you would like your test added to the official in-house version, please send your code to Hyun
 ## MISSING VALUES : IGNORED
 single.q.lm <- function() {
   cname <- c("BETA","SEBETA","TSTAT")  # column names for additional variables in the EPACTS output
   m <- nrow(genos)
   p <- rep(NA,m)
   if ( m > 0 ) {
     for(i in 1:m) {
       r <- summary(lm(pheno~genos[i,]+cov-1))$coefficients[1,]  # linear regression; row 1 is the genotype term
       p[i] <- r[4]        # store the p-value in p[i]
       add[i,] <- r[1:3]   # store BETA, SEBETA and TSTAT in add[i,]
     }
   }
# PVALUE : P-value from the test
# Additional columns, taken from the returned 'add' values and named by 'cname'
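Before running a new test through EPACTS with --test, it can help to exercise its logic on simulated data. The R sketch below reproduces the per-variant regression of single.q.lm under a hypothetical name, 'single.q.lm.sketch'. Several details are assumptions, not confirmed here:

* EPACTS supplies 'pheno' (length-n vector), 'genos' (m-by-n matrix, variants in rows) and 'cov' (n-by-k matrix) as globals.
* 'cov' already contains an intercept column, as the '-1' in the model formula suggests.
* The function returns list(p, add, cname).

```r
## Hypothetical harness: simulate the globals a single-variant test reads,
## then run a copy of the single.q.lm logic and assemble output columns.
## Assumed (not confirmed) return convention: list(p, add, cname).
set.seed(1)
n <- 200                                           # samples
cov <- cbind(1, rnorm(n))                          # intercept + one covariate
genos <- matrix(rbinom(5 * n, 2, 0.3), nrow = 5)   # 5 variants, 0/1/2 coding
pheno <- 1.0 * genos[1, ] + cov[, 2] + rnorm(n)    # only variant 1 is causal

single.q.lm.sketch <- function() {
  cname <- c("BETA", "SEBETA", "TSTAT")
  m <- nrow(genos)
  p <- rep(NA, m)
  add <- matrix(NA, m, length(cname))
  for (i in seq_len(m)) {
    # Row 1 of the coefficient table is the genotype term; its columns are
    # Estimate, Std. Error, t value, Pr(>|t|)
    r <- summary(lm(pheno ~ genos[i, ] + cov - 1))$coefficients[1, ]
    p[i] <- r[4]
    add[i, ] <- r[1:3]
  }
  list(p = p, add = add, cname = cname)
}

res <- single.q.lm.sketch()
out <- cbind(res$p, res$add)                 # PVALUE followed by extra columns
colnames(out) <- c("PVALUE", res$cname)
print(signif(out, 3))
```

Fitting a separate lm() per variant keeps the sketch close to the original code. It favors simplicity over speed.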
== Full ChangeLog ==
* July 10th, 2014 : EPACTS v3.2.6 release
** Minor bug fix in epacts-make-kin
* March 11th, 2014 : EPACTS v3.2.5 release
** EMMAX-SKAT is implemented, with a major bug fix
* November 21st, 2013 : EPACTS v3.2.4 release
** Fixed a number of minor bugs
** Known bugs that remain:
*** SKAT-O lambda eigenvalue error. This happens in a particular context, and a way to prevent it has not yet been found.
*** EMMAX reports case and control frequencies flipped.
* March 25th, 2013 : EPACTS v3.2.3 release
** Relaxed the low-rank matrix check in SKAT tests (to avoid unnecessarily skipping genes)
* March 13th, 2013 : EPACTS v3.2.2 release
** Fixed an error that occasionally reported mismatches in the number of samples
* March 9th, 2013 : EPACTS v3.2.1 release
** Fixed errors in loading the dynamic library
** Fixed errors in SKAT-O (thanks to Anubha Mahajan and Jason Flannick)
** Fixed bugs in emmax-CMC
** Added emmax-SKAT (contributed by Seunggeun Lee)
** Additional minor bug fixes
* February 28th, 2013 : EPACTS v3.2.0 release
** Fixed an R package installation bug affecting some users
** Fixed a MAF-related bug for high-frequency variants (AF>0.25)
** Updated SKAT to version 0.81
** Added the --bprange option to allow testing of small regions
** Additional minor bug fixes
* December 4th, 2012 : EPACTS v3.1.0 release
** Removed dependency on libR.so
** Additional minor bug fixes
** Added the --bprange option to allow testing of small regions
* November 25th, 2012 : EPACTS v3.0.0 release
** Restructured for source code release (using autoconf / automake / libtool)
** Added zoom plot feature
** Replaced the FRAC_BURDEN keyword with FRAC_WITH_RARE for groupwise testing
* October 26th, 2012 : EPACTS v2.2.0-beta is released with the following updates
** Added --max-mac option
** Fixed Firth's bias-corrected test (by Clement Ma)
** Added more informative warning messages when index files do not exist
** Fixed a bug in epacts-plot when plotting ties
** Fixed errors in the MAF estimates for cases and controls
** Fixed a bug in the --minRSQ option
* September 28, 2012 : EPACTS v2.11-beta is released with the following updates
** Added case/control counts and allele frequencies for binary tests
** Added the --max-maf parameter
** Fixed an EMMAX error in the MAF reported in the output
** More informative error messages
* September 27, 2012 : EPACTS v2.1-beta is released with the following updates
** Changed the EMMAX interface: the --kinOnly option is replaced by a new command '''make-kin'''
** Upgraded SKAT-O to version 0.77 with additional configurable parameter settings
** Renamed some parameters (e.g. --min-maf, --min-mac)
** Fixed many minor bugs
* Jul 6, 2012 : EPACTS v2.01-beta is released with the following updates
** Upgraded SKAT-O to version 0.76
** Fixed minor bugs in option names (thanks to Xueling Sim)
* Jul 3, 2012 : EPACTS v2.0-beta is released with the following updates
** Major restructuring of the software
** Replaced the annotation software with a built-in application
** Added SKAT-O and EMMAX burden tests
** Minor bug fixes
* Apr 8, 2012 : EPACTS v1.2-alpha is released with the following updates
** Fixed an EMMAX bug in handling covariates
** Added the Variable Threshold Test
** Added the Variable Threshold Test with genomic scores (e.g. GERP or PhyloP)
* Apr 4, 2012 : EPACTS v1.1-alpha is released with the following updates, in addition to minor updates
** EMMAX burden test (Hyun Min Kang)
** Likelihood ratio test (Clement Ma)
** Updated version of Firth bias-corrected likelihood ratio test (Clement Ma)
** Updated version of EMMAX single variant test (Hyun Min Kang)
* Mar 29, 2012 : EPACTS v1.0-alpha is released