Changes

From Genome Analysis Wiki
1,230 bytes added ,  10:28, 2 February 2017
Line 1: Line 1:  
= Download EPACTS  =
 
   −
EPACTS is available for download [http://www.sph.umich.edu/csg/kang/epacts/download/epacts_v2_01.noref_binary.2012_07_06.tar.gz here (100Mb) ].  
+
EPACTS is available for download [http://csg.sph.umich.edu//kang/epacts/download/epacts_v2.12.noref_binary.2012_10_01.tar.gz here (100Mb) ].  
    
Requirements  
 
Line 11: Line 11:     
Uncompress the EPACTS package in the directory where you would like to install it:
 +
<pre> tar xzvf epacts_v2_12.noref_binary.2012_10_01.tar.gz</pre>
 +
Download the reference FASTA files automatically from the 1000 Genomes FTP site by running the following commands:<br>
 +
<pre>cd epacts2.1/
 +
./ref_download.sh
 +
(Advanced users: to avoid downloading the FASTA files (~900MB), you may instead copy a local GRCh37 FASTA file and its index file to ${EPACTS_DIR}/ext/ref/)
 +
</pre>
   −
  tar xzvf epacts_v2_01.noref_binary.2012_07_06.tar.gz
+
= Accessing help =
 
  −
= Example =
  −
 
  −
Once installed, test out the software by running a quick example using the test data provided in the "example" directory. The example VCF and PED files are:
  −
<pre>${EPACTS_DIRECTORY}/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz
  −
 
  −
${EPACTS_DIRECTORY}/example/1000G_dummy_pheno.ped
  −
</pre>
  −
<br> Run the single variant score test on the example data using this command:
  −
<pre>${EPACTS_DIR}/epacts single \
  −
--vcf {EPACTS_DIR}/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
  −
--ped {EPACTS_DIR}/example/1000G_dummy_pheno.ped \
  −
--min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno \
  −
--out {OUTPUT_DIR}/test --run 2 &amp;
  −
</pre>
  −
This command will run the '''single '''variant test on the input '''VCF '''and '''PED '''files, with a '''minimum MAF''' threshold of 0.001. &nbsp;The '''phenotype '''is "DISEASE" and we are adjusting the analysis with '''covariates '''AGE and SEX. &nbsp;The '''output file directory prefix''' is {OUTPUT_DIR}/test. &nbsp;Finally, EPACTS will run the analysis '''in parallel on 2 CPUs'''.
  −
 
  −
== Expected output  ==
  −
 
  −
EPACTS produces a number of files and plots.
  −
 
  −
1. &nbsp;'''test.epacts.gz'''&nbsp;contains all the association results.
  −
<pre>&gt; head test.epacts
  −
#CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE
  −
20 68303 68303 20:68303_A/G_Upstream:DEFB125 266 1 1 0.0018797 NA NA
  −
20 68319 68319 20:68319_C/A_Upstream:DEFB125 266 0 1 0 NA NA
  −
20 68396 68396 20:68396_C/T_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA
  −
20 76635 76635 20:76635_A/T_Intron:DEFB125 266 0 1 0 NA NA
  −
20 76689 76689 20:76689_T/C_Synonymous:DEFB125 266 0 1 0 NA NA
  −
20 76690 76690 20:76690_T/C_Nonsynonymous:DEFB125 266 1 1 0.0018797 NA NA
  −
20 76700 76700 20:76700_G/A_Nonsynonymous:DEFB125 266 0 1 0 NA NA
  −
20 76726 76726 20:76726_C/G_Nonsynonymous:DEFB125 266 0 1 0 NA NA
  −
20 76771 76771 20:76771_C/T_Nonsynonymous:DEFB125 266 3 1 0.0056391 0.68484 0.40587
  −
 
  −
</pre>
  −
2. &nbsp;'''test.epacts.top5000''' contains the top 5000 associated variants ordered by p-value.
  −
<pre>$ head out/test.single.b.score.epacts.top5000
  −
#CHROM BEGIN END MARKER_ID NS AC CALLRATE MAF PVALUE SCORE
  −
20 1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266 136 1 0.25564 0.0001097 3.8681
  −
20 4162411 4162411 20:4162411_T/C_Intron:SMOX 266 204 1 0.38346 0.00055585 -3.4523
  −
20 34061918 34061918 20:34061918_T/C_Intron:CEP250 266 39 1 0.073308 0.0011231 3.2577
  −
20 4155948 4155948 20:4155948_G/A_Intron:SMOX 266 215 1 0.40414 0.0020791 -3.0787
  −
20 4680251 4680251 20:4680251_A/G_Nonsynonymous:PRNP 266 186 1 0.34962 0.0025962 3.0119
  −
20 36668874 36668874 20:36668874_G/A_Synonymous:RPRD1B 266 96 1 0.18045 0.003031 2.9646
  −
20 36641871 36641871 20:36641871_G/A_Synonymous:TTI1 266 10 1 0.018797 0.004308 -2.8547
  −
20 32664926 32664926 20:32664926_G/A_Nonsynonymous:RALY 266 20 1 0.037594 0.0046365 2.8313
  −
20 34288854 34288854 20:34288854_C/T_Utr3:ROMO1 266 28 1 0.052632 0.0047722 2.822
  −
</pre>
  −
3. &nbsp;'''test.epacts.qq.pdf''' contains the Q-Q plot of test p-values (stratified by MAF)<br>[[Image:Test b score epacts qq.png]]
     −
4. &nbsp;'''test.epacts.mh.pdf''' contains the Manhattan Plot of test p-values<br>The file out/test.b.score.epacts.mh.pdf will be generated for chr20 only.
+
For a list of commands available in EPACTS, run the following command:
 +
<pre>$ epacts2.1/epacts help
   −
[[Image:Test b score epacts mh.png]]
  −
  −
An example Genome-wide manhattan plot (from a genome-wide run) will look like below<br><br> [[Image:Tes b score epacts mh gw.png]] <br>
  −
  −
= Additional options =
  −
  −
Type in the following command to view additional options available in EPACTS.
  −
<pre>&gt; /net/fantasia/home/hmkang/sw/epacts2/epacts help
   
Usage:
 
 
epacts [command] [options]
 
Line 84: Line 34:  
zoom Create a locus zoom plot from epacts results
 
 
meta Perform meta-analysis across multiple epacts results
 
 +
make-group Create the group information for gene-based testing
 +
make-kin Create a kinship matrix
    
Visit http://genome.sph.umich.edu/wiki/EPACTS for more detailed documentation
 
    
</pre>
 
</pre>
To view options for single variant testing only type in:
+
 
<pre>&gt; /net/fantasia/home/hmkang/sw/epacts2/epacts single -help
+
 
 +
To view the options for single variant testing only, run:
 +
<pre>$ epacts2.1/epacts single -help
 
Usage:
 
 
epacts single [options]
 
Line 98: Line 52:  
-out STR Prefix of output files
 
 
-test STR Statistical test to use
 
 +
...
 +
</pre>
   −
Key Options (Run epacts single -man or see wiki for more info):
+
= Getting started in EPACTS with an example  =
-help Print out brief help message [OFF]
+
 
-man Print the full documentation in man page style [OFF]
+
Once installed, test out the software by running a quick example using the test data provided in the "example" directory. The example VCF and PED files are:
-pheno STR Name of phenotype column from PED file [6th column]
+
<pre>epacts2.1/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz
-cov STR Name of covariate column(s) from PED file. []
+
 
-field STR VCF's FORMAT field of genotypes or dosages [GT]
+
epacts2.1/example/1000G_dummy_pheno.ped
-unit INT Base pair units for a parallel run [10000000]
+
</pre>
-sepchr Indicator of separated VCF per chromosome [OFF]
+
<br> Run the single variant score test on the example data using this command:
-anno Annotate the results with functional category [OFF]
+
<pre>$ epacts2.1/epacts single \
-run INT Run EPACTS immediately with specified # CPUs [0]
+
--vcf epacts2.1/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
-min-maf FLT Minimum minor allele frequency [1e-6]
+
--ped epacts2.1/example/1000G_dummy_pheno.ped \
-min-callrate FLT Minimum call rate [0.5]
+
--min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno \
 +
--out {OUTPUT_DIR}/test --run 2 &amp;
 +
</pre>
 +
This command runs the '''single''' variant test on the input '''VCF''' and '''PED''' files, with a '''minimum MAF''' threshold of 0.001. &nbsp;The '''phenotype''' is "DISEASE", and the analysis is adjusted for the '''covariates''' AGE and SEX. &nbsp;The '''output file prefix''' is {OUTPUT_DIR}/test. &nbsp;Finally, EPACTS runs the analysis '''in parallel on 2 CPUs'''.
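The {OUTPUT_DIR} placeholder in the command above has to be replaced with a real directory before running. A minimal sketch of one way to do this is shown below. It only prints the command rather than running it, so it can be reviewed first. The function name <code>print_epacts_single</code> and the directory values <code>epacts2.1</code> and <code>out</code> are assumptions made for illustration; the options are the ones used in the example above.

```shell
# Print (do not run) the example single-variant command for a given
# EPACTS install directory ($1) and output directory ($2).
# The helper name and the directory values used below are illustrative assumptions.
print_epacts_single() {
    epacts_dir=$1
    output_dir=$2
    printf '%s ' \
        "$epacts_dir/epacts single" \
        "--vcf $epacts_dir/example/1000G_exome_chr20_example_softFiltered.calls.vcf.gz" \
        "--ped $epacts_dir/example/1000G_dummy_pheno.ped" \
        "--min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno"
    # Last piece ends the line; the output prefix is <output_dir>/test.
    printf '%s\n' "--out $output_dir/test --run 2"
}

# Example: print the command for an install in epacts2.1 writing to out/
print_epacts_single epacts2.1 out
```

Creating the output directory beforehand (for example with <code>mkdir -p out</code>) avoids depending on whether the tool creates it.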
 +
 
 +
== Expected output  ==
 +
 
 +
EPACTS produces a number of files and plots.
 +
 
 +
1. &nbsp;'''test.epacts.gz'''&nbsp;contains all the association results.
 +
<pre>$ zcat {OUTPUT_DIR}/test.epacts.gz | head
 +
#CHROM  BEGIN  END    MARKER_ID      NS      AC      CALLRATE        MAF    PVALUE  SCORE  N.CASE  N.CTRL  AF.CASE AF.CTRL
 +
20      68303  68303  20:68303_A/G_Upstream:DEFB125  266    1      1      0.0018797      NA      NA      NA      NA      NA      NA
 +
20      68319  68319  20:68319_C/A_Upstream:DEFB125  266    0      1      0      NA      NA      NA      NA      NA      NA
 +
20      68396  68396  20:68396_C/T_Nonsynonymous:DEFB125      266    1      1      0.0018797      NA      NA      NA      NA      NA      NA
 +
20      76635  76635  20:76635_A/T_Intron:DEFB125    266    0      1      0      NA      NA      NA      NA      NA      NA
 +
20      76689  76689  20:76689_T/C_Synonymous:DEFB125 266    0      1      0      NA      NA      NA      NA      NA      NA
 +
20      76690  76690  20:76690_T/C_Nonsynonymous:DEFB125      266    1      1      0.0018797      NA      NA      NA      NA      NA      NA
 +
20      76700  76700  20:76700_G/A_Nonsynonymous:DEFB125      266    0      1      0      NA      NA      NA      NA      NA      NA
 +
20      76726  76726  20:76726_C/G_Nonsynonymous:DEFB125      266    0      1      0      NA      NA      NA      NA      NA      NA
 +
20      76771  76771  20:76771_C/T_Nonsynonymous:DEFB125      266    3      1      0.0056391      0.68484 0.40587 145    121    0.013793        0.0082645
 +
 
 +
</pre>
 +
The columns in the results file are:
 +
 
 +
#CHROM: &nbsp;chromosome
 +
#BEGIN: &nbsp;starting position
 +
#END: ending position (same as BEGIN if a SNP)
 +
#MARKER_ID: &nbsp;name of variant
 +
#NS: &nbsp;Number of samples (cases + controls)
 +
#AC: &nbsp;Total allele count in sample
 +
#CALLRATE: &nbsp;call rate
 +
#MAF: &nbsp;minor allele frequency in full sample
 +
#PVALUE: &nbsp;score test association p-value
 +
#SCORE: &nbsp;test statistic for score test
 +
#N.CASE: &nbsp;number of cases
 +
#N.CTRL: &nbsp;number of controls
 +
#AF.CASE: &nbsp;allele frequency in cases only
 +
#AF.CTRL: &nbsp;allele frequency in controls only
 +
 
 +
Note: &nbsp;For variants below the minimum MAF threshold (min-maf = 0.001), the numbers of cases and controls (N.CASE, N.CTRL) are not reported (listed as "NA").
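As a sanity check on the NS, AC and MAF columns, the MAF values in the example output appear to match AC / (2 &times; NS), folded to the minor allele. The sketch below recomputes it with awk. It assumes diploid genotypes and a call rate of 1, which holds for the chr20 example rows shown here but is not confirmed as the general EPACTS definition. The function name <code>maf_from_counts</code> is hypothetical.

```shell
# Recompute MAF from an allele count (AC) and a sample count (NS).
# Assumption: diploid genotypes and CALLRATE = 1, so the allele number is 2 * NS.
maf_from_counts() {
    # $1 = AC, $2 = NS
    awk -v ac="$1" -v ns="$2" 'BEGIN {
        af = ac / (2 * ns)                # allele frequency
        maf = (af > 0.5) ? 1 - af : af    # fold to the minor allele
        printf "%.5g\n", maf              # 5 significant digits, as in the table
    }'
}

# Rows from the example output: AC=1 and AC=136, both with NS=266
maf_from_counts 1 266
maf_from_counts 136 266
```

For the two example rows, this gives 0.0018797 and 0.25564, which agree with the MAF column above.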
 +
 
 +
<br>
 +
 
 +
2. &nbsp;'''test.epacts.top5000''' contains the top 5000 associated variants ordered by p-value.
 +
<pre>$ head {OUTPUT_DIR}/test.epacts.top5000
 +
#CHROM  BEGIN  END    MARKER_ID      NS      AC      CALLRATE        MAF    PVALUE  SCORE  N.CASE  N.CTRL  AF.CASE AF.CTRL
 +
20      1610894 1610894 20:1610894_G/A_Synonymous:SIRPG 266    136    1      0.25564 0.0001097      3.8681  145    121    0.64138 0.35537
 +
20      4162411 4162411 20:4162411_T/C_Intron:SMOX      266    204    1      0.38346 0.00055585      -3.4523 145    121    0.62759 0.93388
 +
20      34061918        34061918        20:34061918_T/C_Intron:CEP250  266    39      1      0.073308        0.0011231      3.2577  145    121    0.21379 0.066116
 +
20      4155948 4155948 20:4155948_G/A_Intron:SMOX      266    215    1      0.40414 0.0020791      -3.0787 145    121    0.68276 0.95868
 +
20      4680251 4680251 20:4680251_A/G_Nonsynonymous:PRNP      266    186    1      0.34962 0.0025962      3.0119  145    121    0.8069  0.57025
 +
20      36668874        36668874        20:36668874_G/A_Synonymous:RPRD1B      266    96      1      0.18045 0.003031        2.9646  145    121    0.44828 0.2562
 +
20      36641871        36641871        20:36641871_G/A_Synonymous:TTI1 266    10      1      0.018797        0.004308        -2.8547 145    121    0.0068966      0.07438
 +
20      32664926        32664926        20:32664926_G/A_Nonsynonymous:RALY      266    20      1      0.037594        0.0046365      2.8313  145    121    0.11724 0.024793
 +
20      34288854        34288854        20:34288854_C/T_Utr3:ROMO1      266    28      1      0.052632        0.0047722      2.822  145    121    0.15862 0.041322
 +
 
 +
</pre>
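A similar ranking can be approximated from the full results table with standard tools. The sketch below is not how EPACTS itself builds the top5000 file. It simply drops the header and rows whose PVALUE is "NA", sorts by the ninth column (PVALUE) and keeps the first N rows. The function name <code>top_by_pvalue</code> is hypothetical, and the <code>g</code> sort modifier assumes a sort implementation that supports general numeric ordering (GNU and BSD sort do).

```shell
# Keep the N most significant rows of a tab-delimited EPACTS-style table read from stdin.
# Assumption: PVALUE is column 9, as in the header shown above.
top_by_pvalue() {
    # $1 = number of rows to keep
    awk -F '\t' 'NR > 1 && $9 != "NA"' |       # skip header and untested variants
        sort -t "$(printf '\t')" -k9,9g |      # ascending p-value, general numeric sort
        head -n "$1"
}

# Small made-up table (marker names shortened) to show the ordering
printf '#CHROM\tBEGIN\tEND\tMARKER_ID\tNS\tAC\tCALLRATE\tMAF\tPVALUE\tSCORE\n20\t34061918\t34061918\tCEP250\t266\t39\t1\t0.073308\t0.0011231\t3.2577\n20\t68303\t68303\tDEFB125\t266\t1\t1\t0.0018797\tNA\tNA\n20\t1610894\t1610894\tSIRPG\t266\t136\t1\t0.25564\t0.0001097\t3.8681\n' |
    top_by_pvalue 2
```

On a real run the input would come from something like <code>zcat {OUTPUT_DIR}/test.epacts.gz | top_by_pvalue 10</code>.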
 +
3. &nbsp;'''test.epacts.qq.pdf''' contains the Q-Q plot of test p-values (stratified by MAF)<br>[[Image:Test b score epacts qq.png]]  
 +
 
 +
4. &nbsp;'''test.epacts.mh.pdf''' contains the Manhattan Plot of test p-values<br>The file out/test.b.score.epacts.mh.pdf will be generated for chr20 only.
 +
 
 +
[[Image:Test b score epacts mh.png]]  
   −
Other Options (Run epacts single -man or see wiki for more info):
+
An example genome-wide Manhattan plot (from a genome-wide run) looks like this:<br><br> [[Image:Tes b score epacts mh gw.png]] <br>
-all-cov Use all possible covariates from PED file [OFF]
  −
-chr STR Specific chromosome to run association []
  −
-pass use only pass-filtered sites [OFF]
  −
-info STR substring in the INFO field to be matched []
  −
-kinf STR Kinship file if '-test q.oemmax' is used []
  −
-kin-only Create kinship matrix only [OFF]
  −
-inv-norm Inverse-normal transformation of phenotypes [OFF]
  −
-restart Ignore intermediate results and restart [OFF]
  −
-nodes STR Comma-separated list of MOSIX cluster nodes []
  −
-missing STR String representing missing value [NA]
  −
-noplot Skip producing the Manhattan and QQ plots [OFF]
  −
-topzoom INT Produce locus zoom plot for top N signals [0]
  −
...</pre>
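After the example run finishes, it can be useful to confirm that the four files described under "Expected output" were written. The sketch below lists any that are missing for a given output prefix. The function name <code>missing_outputs</code> is hypothetical, and the four suffixes are taken from the file names listed in that section, so they may need adjusting if a run names its files differently.

```shell
# Print the expected EPACTS output files that do not exist for an output prefix.
# Assumption: outputs are named <prefix>.epacts.gz, .epacts.top5000,
# .epacts.qq.pdf and .epacts.mh.pdf, following the list under "Expected output".
missing_outputs() {
    prefix=$1
    for suffix in epacts.gz epacts.top5000 epacts.qq.pdf epacts.mh.pdf; do
        [ -e "$prefix.$suffix" ] || printf '%s\n' "$prefix.$suffix"
    done
}

# Example: check the outputs of a run started with "--out out/test"
missing_outputs out/test
```

An empty result means all four files are present.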
 