Changes

From Genome Analysis Wiki
2,453 bytes added, 10:05, 2 February 2017
Line 19: Line 19:  
For external users, follow the instructions on the [[EPACTS]] page, summarized below.
 
For external users, follow the instructions on the [[EPACTS]] page, summarized below.
   −
*http://www.sph.umich.edu/csg/kang/epacts/download/EPACTS-3.0.0.tar.gz (99MB)
+
*Please download the latest version of EPACTS here:  http://csg.sph.umich.edu//kang/epacts/download/
 
*Uncompress the EPACTS package into the directory where you would like to install it, and then type the following commands
 
*Uncompress the EPACTS package into the directory where you would like to install it, and then type the following commands
   Line 61: Line 61:  
=== A.  Convert dosage file into VCF format  ===
 
=== A.  Convert dosage file into VCF format  ===
   −
Use the wrapper program "dose2vcf" to convert your dosage output to pseudo VCF format.  Download the tool from [http://www.sph.umich.edu/csg/cfuchsb/dose2vcf_v0.5.gz here]. If you used rs numbers during imputation, you can find mapping tables ready for dose2vcf [http://www.sph.umich.edu/csg/cfuchsb/mapping_rs_ALL.GIANT.phase1_release_v3.20101123.tgz here (214 Mb) ]
+
Use the wrapper program "dose2vcf" to convert your dosage output to pseudo VCF format.  Download the tool from [http://csg.sph.umich.edu//cfuchsb/dose2vcf_v0.5.gz here]. If you used rs numbers during imputation, you can find mapping tables ready for dose2vcf [http://csg.sph.umich.edu//cfuchsb/mapping_rs_ALL.GIANT.phase1_release_v3.20101123.tgz here (214 Mb) ]
    
<br>  
 
<br>  
Line 74: Line 74:     
</pre>  
 
</pre>  
Note that for longer chromosomes, the program is quite memory intensive. &nbsp;In this case, please convert dosages in shorter sections of the chromosome. &nbsp;For example, if the imputation was performed by sections, then convert these sections to vcf first, and then merge the vcf files together using vcftools [http://vcftools.sourceforge.net/docs.html#concat vcf-concat]:  
+
Note that for longer chromosomes, the program is quite memory intensive. &nbsp;In this case, please convert dosages in shorter sections of the chromosome. &nbsp;For example, if the imputation was performed by sections, then convert these sections to vcf first, and then merge the vcf files together using vcftools [http://vcftools.sourceforge.net/docs.html#concat vcf-concat]:
    
=== B. &nbsp;bgzip and tabix VCF files  ===
 
=== B. &nbsp;bgzip and tabix VCF files  ===
Line 366: Line 366:  
=== A. Typical DIAGRAM analysis using existing association pipeline<br>  ===
 
=== A. Typical DIAGRAM analysis using existing association pipeline<br>  ===
   −
This is the typical DIAGRAM analysis using your current association pipeline and software. &nbsp; [[Image:1000Genomes march2012 imputation analysis plan 08312012.pdf]]  
+
This is the typical DIAGRAM analysis using your current association pipeline and software. &nbsp; [[Image:1000Genomes_march2012_imputation_analysis_plan_08312012_v2.pdf]] (Updated Dec 14, 2012)
 +
 
 +
For frequently asked questions regarding the file format, please see: &nbsp;[http://genome.sph.umich.edu/wiki/EPACTS_for_DIAGRAM#Results_FIle_Clarifications genome.sph.umich.edu/wiki/EPACTS_for_DIAGRAM#Results_FIle_Clarifications]
    
==== Alternative: &nbsp;Analyze VCF and PED files using the Wald test with the EPACTS software:  ====
 
==== Alternative: &nbsp;Analyze VCF and PED files using the Wald test with the EPACTS software:  ====
Line 374: Line 376:  
-test b.wald -pheno DISEASE -cov AGE -sepchr -anno -min-mac 1 -field EC -run 10
 
-test b.wald -pheno DISEASE -cov AGE -sepchr -anno -min-mac 1 -field EC -run 10
 
</pre>  
 
</pre>  
'''Important:''' To analyze dosages (not genotypes), you must specify the dosage field with the "--field EC" option. Without this option, you will be analyzing the hard genotypes (i.e. the --field option defaults to "GT", the genotypes)!
+
'''Important:''' To analyze dosages (not genotypes), you must specify the dosage field with the "--field EC" option. Without this option, you will be analyzing the hard genotypes (i.e. the --field option defaults to "GT", the genotypes)!
    
=== B. Analysis of low frequency variants using Firth bias-corrected logistic regression  ===
 
=== B. Analysis of low frequency variants using Firth bias-corrected logistic regression  ===
Line 408: Line 410:  
Again use the Firth test on EPACTS for your analysis with BMI  
 
Again use the Firth test on EPACTS for your analysis with BMI  
   −
== 5. &nbsp;Report EPACTS results<br>  ==
+
== 5. &nbsp;Report results<br>  ==
 +
For '''analysis 1''', please follow these results file guidelines: &nbsp; [[Image:1000Genomes_march2012_imputation_analysis_plan_08312012_v2.pdf]] (Updated Dec 14, 2012)
   −
For analyses 2 and 3, please upload the two epacts.gz files to the FTP server (ftp.broadinstitute.org):  
+
For frequently asked questions regarding the file format, please see: &nbsp;[http://genome.sph.umich.edu/wiki/EPACTS_for_DIAGRAM#Results_FIle_Clarifications genome.sph.umich.edu/wiki/EPACTS_for_DIAGRAM#Results_FIle_Clarifications]
 +
 
 +
 
 +
For '''analyses 2 and 3''', please upload the two epacts.gz files to the FTP server:  
    
#'''Firth test (no BMI): '''&nbsp;DIAGRAMv4_iSNPs_XXX_1000G_KKK_FBC_YYY_ZZZ.epacts.gz  
 
#'''Firth test (no BMI): '''&nbsp;DIAGRAMv4_iSNPs_XXX_1000G_KKK_FBC_YYY_ZZZ.epacts.gz  
Line 416: Line 422:     
<br>  
 
<br>  
 +
 +
The FTP hostname is: &nbsp;'''ftp.broadinstitute.org'''. &nbsp;Please place your files into the /incoming/ directory.
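
As a rough illustration of the upload step, the following Python sketch uses the standard-library ftplib module. The hostname and the /incoming/ directory come from the text above; the function names, and the assumption that you have been given a user name and password for the server, are hypothetical and not part of the analysis plan.

```python
import os
from ftplib import FTP

FTP_HOST = "ftp.broadinstitute.org"  # hostname given in the text
REMOTE_DIR = "/incoming/"            # directory given in the text


def upload_results(ftp, local_path, remote_dir=REMOTE_DIR):
    """Upload one results file into remote_dir over an open FTP session.

    `ftp` is any object with cwd() and storbinary(), such as a logged-in
    ftplib.FTP instance. Returns the remote file name that was used.
    """
    remote_name = os.path.basename(local_path)
    ftp.cwd(remote_dir)
    with open(local_path, "rb") as handle:
        ftp.storbinary(f"STOR {remote_name}", handle)
    return remote_name


def upload_all(paths, user, password, host=FTP_HOST):
    """Log in once and upload every file in `paths` (hypothetical helper).

    The credentials are an assumption: the text does not say how to log in.
    """
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        return [upload_results(ftp, path) for path in paths]
```

Calling upload_all with the paths of your two epacts.gz files and your own credentials would place both files in /incoming/. Any standard FTP client does the same job.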
 +
 +
    
Here's an example score test .epacts file  
 
Here's an example score test .epacts file  
Line 446: Line 456:  
= Troubleshooting Common Issues  =
 
= Troubleshooting Common Issues  =
   −
== EPACTS installation errors ==
+
== EPACTS installation errors ==
   −
 
+
== Errors when running EPACTS ==
 
  −
== Errors when running EPACTS ==
      
=== Rscript execution error: No such file or directory  ===
 
=== Rscript execution error: No such file or directory  ===
Line 476: Line 484:  
If you can find Rscript (e.g. /usr/bin/Rscript, /usr/local/bin/Rscript), or if you can re-install the full Rscript, you can simply avoid the problem by setting your environment variable.  
 
If you can find Rscript (e.g. /usr/bin/Rscript, /usr/local/bin/Rscript), or if you can re-install the full Rscript, you can simply avoid the problem by setting your environment variable.  
   −
Otherwise, Hyun will modify EPACTS so that it does not require Rscript (so you can run R CMD BATCH instead).
+
Otherwise, Hyun will modify EPACTS so that it does not require Rscript (so you can run R CMD BATCH instead).  
    
=== ERROR: No overlapping IDs between VCF and PED file. Cannot proceed.  ===
 
=== ERROR: No overlapping IDs between VCF and PED file. Cannot proceed.  ===
Line 506: Line 514:     
</pre>  
 
</pre>  
<br> The genotype information has FORMAT "GT:EC". &nbsp;For the first SNP (chr11:180567) and individual A001, the genotype is 1/1 and dosage is 2.0000. &nbsp;To access the dosages, you must specify the option "--field EC"
+
<br> The genotype information has FORMAT "GT:EC". &nbsp;For the first SNP (chr11:180567) and individual A001, the genotype is 1/1 and dosage is 2.0000. &nbsp;To access the dosages, you must specify the option "--field EC".
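
To make the FORMAT lookup concrete, here is a minimal Python sketch of how a colon-separated sample entry is matched against the FORMAT column. It only illustrates the VCF layout described above and is not EPACTS code; the function name extract_field is hypothetical, and the sample entry "1/1:2.0000" is assembled from the genotype and dosage quoted for individual A001.

```python
def extract_field(format_spec, sample_entry, key):
    """Return the value of `key` from one VCF sample entry.

    format_spec  -- the FORMAT column, e.g. "GT:EC"
    sample_entry -- one sample column, e.g. "1/1:2.0000"
    key          -- the field to look up, e.g. "GT" or "EC"
    """
    keys = format_spec.split(":")
    values = sample_entry.split(":")
    if key not in keys:
        raise KeyError(f"field {key!r} is not listed in FORMAT {format_spec!r}")
    index = keys.index(key)
    if index >= len(values):
        raise ValueError(f"sample entry {sample_entry!r} has no value for {key!r}")
    return values[index]


# The same entry yields the hard genotype or the dosage, depending on the field.
genotype = extract_field("GT:EC", "1/1:2.0000", "GT")
dosage = float(extract_field("GT:EC", "1/1:2.0000", "EC"))
print(genotype, dosage)
```

The same sample column therefore gives different inputs to the test depending on which field is selected, which is why the choice of field matters for dosage analysis.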
 +
 
 +
 
 +
 
 +
== Results FIle Clarifications ==
 +
 
 +
=== 1. How do I code the INDEL variant names and alleles?  ===
 +
 
 +
Please use the variant name and the allele name directly from IMPUTE or minimac. Please do NOT recode variant names or alleles. We will do this step in the analysis for consistency.
 +
 
 +
ACTION IF YOU HAVE UPLOADED YOUR FILE: If you have recoded your INDEL alleles, please tell us so that we can remove your file, and let us know when you can re-upload it with the original variant and allele names.
 +
 
 +
=== <br>2. The document asks for the number of homozygotes and heterozygotes in case and control. How do I get this from my data? Is this relevant for imputed data?  ===
 +
 
 +
These numbers are relevant to genotyped data but not to imputed data, and we did not intend to ask for them. To keep the file format consistent between results already submitted and those still to be submitted, please retain the columns and enter "." as the value.
 +
 
 +
ACTION IF YOU HAVE UPLOADED YOUR FILE: No action. You do not need to redo the file. We will skip these columns.
 +
 
 +
=== 3. For the "Imputed" variable, what does imputed mean in the context of the data output from MACH and IMPUTE?  ===
 +
 
 +
This is a holdover from the last round of analysis, where we asked for results separately for genotyped SNPs and imputed SNPs and wanted to distinguish between the two. We will use r2_hat or info measures to estimate the accuracy of the genotypes. This column will be retained for consistency with files already submitted, but it should be filled in with "." or "1". It will not be used in the analysis.
 +
 
 +
ACTION IF YOU HAVE UPLOADED YOUR FILE: No action. You do not need to redo the file. We will skip this column.