Changes

From Genome Analysis Wiki
93 bytes added, 10:58, 11 October 2012
#Download and install EPACTS
#Prepare VCF file with genotypes / dosages
#Prepare PED file with phenotypes and covariates
#Run EPACTS association pipeline

./ref_download.sh

(For advanced users: to save time downloading the FASTA file (~900MB), you may instead copy a local GRCh37 FASTA file and its index file to ${EPACTS_DIR}/ext/ref/)
</pre>

*Perform a test run by running the following command:
This command runs the single-variant test on the input VCF and PED files with a minimum MAF threshold of 0.001. &nbsp;The phenotype is "DISEASE", and the analysis is adjusted for the covariates AGE and SEX. &nbsp;The output file prefix is {OUTPUT_DIR}/test. &nbsp;Finally, EPACTS runs the analysis in parallel on 2 CPUs.

A more detailed description of the example can be found [http://genome.sph.umich.edu/wiki/Test_EPACTS_for_DIAGRAM here].

== 2. &nbsp;Prepare VCF file with genotypes / dosages ==

EPACTS requires genotype / dosage input in VCF format. &nbsp;If you imputed with minimac or Impute2, you will start from your imputed dosage file.

=== A. &nbsp;Convert dosage file into VCF format ===

<pre>&gt; dose2vcf/dose2vcf --dose FUSION.GWAS.1KG.imp.chr1.dose --info FUSION.GWAS.1KG.imp.chr1.info --out out/FUSION.GWAS.1KG.imp.chr1</pre>

The expected output file is:
<pre>out/FUSION.GWAS.1KG.imp.chr1.vcf</pre>

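The core of a dose-to-VCF conversion is a transposition: the dosage file holds one row per individual, while VCF holds one row per marker. The Python sketch below illustrates only that idea and is not the dose2vcf implementation. It assumes a minimac-style `.dose` layout (`FAMID->INDID`, a `DOSE` tag, then one dosage per marker), marker names of the form `CHROM:POS` with REF/ALT taken from the `.info` file, and dosages that count the ALT allele; the function name and the use of a `DS` FORMAT field are illustrative choices, so check your imputation tool's allele convention.

```python
from typing import Iterable, List, Tuple


def dose_to_vcf_lines(dose_rows: Iterable[str],
                      markers: List[Tuple[str, str, str]]) -> List[str]:
    """Hypothetical sketch: transpose per-individual dosage rows into VCF records.

    dose_rows: lines like "FAM1->IND1 DOSE 0.012 1.987 ..." (assumed minimac layout)
    markers:   (name, ref, alt) per marker, in .dose column order; name is "CHROM:POS"
    Each dosage is assumed to be the expected ALT allele count (0..2).
    """
    sample_ids, columns = [], []
    for row in dose_rows:
        fields = row.split()
        if not fields:
            continue
        sample_ids.append(fields[0].split("->")[-1])  # keep the individual ID
        dosages = fields[2:]                           # skip the ID and the DOSE tag
        if len(dosages) != len(markers):
            raise ValueError(f"{fields[0]}: expected {len(markers)} dosages")
        columns.append(dosages)

    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    lines = ["##fileformat=VCFv4.1",
             '##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype dosage">',
             "\t".join(header + sample_ids)]
    for j, (name, ref, alt) in enumerate(markers):
        chrom, pos = name.split(":")[:2]
        per_sample = [col[j] for col in columns]  # one marker across all samples
        lines.append("\t".join([chrom, pos, name, ref, alt, ".", "PASS", ".", "DS"]
                               + per_sample))
    return lines
```

For real data, use dose2vcf as shown above; this sketch keeps everything in memory and skips the `.info` quality fields.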
=== B. &nbsp;bgzip and tabix VCF files ===

The input VCF file must be bgzipped and tabix-indexed before running the association analysis, which allows efficient random access to the file. Below is an example of converting a plain VCF into a bgzipped, tabix-indexed VCF:

<pre>bgzip input.vcf ## this command will produce input.vcf.gz
tabix -pvcf -f input.vcf.gz ## this command will produce input.vcf.gz.tbi
</pre>
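A frequent mistake is compressing with plain gzip instead of bgzip, which tabix cannot index. The Python sketch below is an illustrative pre-flight check (helper names are hypothetical). It tests for the BGZF block header described in the SAM/BAM format specification: a gzip member with the FEXTRA flag set and a `BC` extra subfield of length 2. It also checks that a `.tbi` index sits next to the file.

```python
import gzip
import os

# The fixed 28-byte empty BGZF block that bgzip writes as an end-of-file marker
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00")


def looks_like_bgzf(header: bytes) -> bool:
    """True if `header` starts with a BGZF block header (gzip member + 'BC' subfield)."""
    return (len(header) >= 16
            and header[0:2] == b"\x1f\x8b"      # gzip magic number
            and header[2] == 8                  # CM: deflate
            and (header[3] & 4) != 0            # FLG.FEXTRA is set
            and header[12:14] == b"BC"          # BGZF subfield identifier
            and header[14:16] == b"\x02\x00")   # subfield length 2 (holds block size)


def vcf_ready_for_tabix_access(path: str) -> bool:
    """Hypothetical check: the VCF is bgzipped and has a .tbi index next to it."""
    with open(path, "rb") as fh:
        is_bgzf = looks_like_bgzf(fh.read(16))
    return is_bgzf and os.path.exists(path + ".tbi")


if __name__ == "__main__":
    print(looks_like_bgzf(BGZF_EOF))                         # True
    print(looks_like_bgzf(gzip.compress(b"#CHROM\tPOS\n")))  # False: plain gzip, no BC subfield
```

If `vcf_ready_for_tabix_access("input.vcf.gz")` returns False, rerun the bgzip and tabix commands above.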
If the VCF files are split by chromosome, each file name must contain the chromosome name: "chr1" in the name of the chromosome 1 file, and the corresponding name (chr2, chr3, ...) for the other chromosomes.<br>Sample IDs in the VCF file must be consistent with those in the PED file.
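Both requirements can be checked before starting a long run. The Python sketch below assumes per-chromosome files named by swapping `chr1` for `chrN`, and a PED file whose second column is the individual ID with `#`-prefixed header lines; these conventions and all helper names are hypothetical, so verify them against your own files.

```python
import gzip
from typing import Dict, Iterable, List


def per_chrom_paths(chr1_path: str, chroms: Iterable = range(1, 23)) -> Dict[str, str]:
    """Hypothetical helper: derive each chromosome's file name from the chr1 name."""
    if chr1_path.count("chr1") != 1:
        raise ValueError("file name must contain 'chr1' exactly once")
    return {str(c): chr1_path.replace("chr1", f"chr{c}") for c in chroms}


def vcf_sample_ids(lines: Iterable[str]) -> List[str]:
    """Sample IDs are the columns after FORMAT on the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break
    raise ValueError("no #CHROM header line found")


def ped_sample_ids(lines: Iterable[str]) -> List[str]:
    """Individual IDs from the 2nd PED column; '#' lines treated as headers (assumption)."""
    return [line.split()[1] for line in lines
            if line.strip() and not line.startswith("#")]


def id_mismatches(vcf_ids: List[str], ped_ids: List[str]) -> Dict[str, List[str]]:
    vcf_set, ped_set = set(vcf_ids), set(ped_ids)
    return {"only_in_vcf": sorted(vcf_set - ped_set),
            "only_in_ped": sorted(ped_set - vcf_set)}


def check_files(vcf_path: str, ped_path: str) -> Dict[str, List[str]]:
    # gzip.open also reads bgzipped files, because BGZF is a series of gzip members
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as vcf, open(ped_path) as ped:
        return id_mismatches(vcf_sample_ids(vcf), ped_sample_ids(ped))
```

Empty lists in both entries of the result mean the two files list the same samples.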
    
== 3. &nbsp;Prepare PED file for phenotypes and covariates ==

M QT
M AGE
</pre>

   
== 4. &nbsp;Run EPACTS association pipeline ==

=== A. Typical DIAGRAM analysis using existing association pipeline ===

This is the typical DIAGRAM analysis using your current association pipeline and software. &nbsp; [[Image:Example.jpg]]

=== B. Analysis of all SNPs using logistic regression score test ===

To run the Firth test using the EPACTS software:

<pre>epacts2.1/epacts single -vcf [INPUT VCF FILENAME] -ped [INPUT PED FILENAME] -out [OUTPUT FILENAME PREFIX] \
-test b.firth -pheno DISEASE -cov AGE -sepchr -anno -min-mac 1 -max-mac 200 -run 10</pre>
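When the same command is repeated for several phenotypes or covariate sets, it can help to build it programmatically. The Python sketch below reproduces the Firth command above as an argument list; the helper names are hypothetical, the defaults simply mirror the flags shown, and passing one `-cov` flag per covariate is an assumption to confirm against the EPACTS documentation.

```python
import shlex
import subprocess
from typing import List, Sequence


def epacts_single_argv(vcf: str, ped: str, out: str,
                       test: str = "b.firth", pheno: str = "DISEASE",
                       covs: Sequence[str] = ("AGE",),
                       min_mac: int = 1, max_mac: int = 200, run: int = 10,
                       epacts: str = "epacts2.1/epacts") -> List[str]:
    """Build the 'epacts single' Firth command as an argument list."""
    argv = [epacts, "single", "-vcf", vcf, "-ped", ped, "-out", out,
            "-test", test, "-pheno", pheno]
    for cov in covs:
        argv += ["-cov", cov]  # assumption: one -cov flag per covariate
    argv += ["-sepchr", "-anno",
             "-min-mac", str(min_mac), "-max-mac", str(max_mac),
             "-run", str(run)]
    return argv


def run_epacts(argv: List[str]) -> None:
    # A list argument bypasses the shell, so file names need no quoting
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    argv = epacts_single_argv("input.vcf.gz", "input.ped", "out/firth")
    print(shlex.join(argv))  # shell-ready form of the same command
```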
 
   
== 5. &nbsp;Report EPACTS results ==

20      76771  76771  20:76771_C/T_Nonsynonymous:DEFB125      266    3      1      0.0056391      0.68484 0.40587 145    121    0.013793        0.0082645
</pre>

Each column represents:
* CHROM : Chromosome name
* BEGIN, END : Start and end base positions of the variant
* MARKER_ID : Marker ID in the format [CHROM]:[POS]_[REF]/[ALT]_[ANNOTATION]. The [ANNOTATION] part is available only with the --anno option
* NS : Number of samples with non-missing genotypes
* AC : Non-reference allele count
* CALLRATE : Genotype call rate
* MAF : Minor allele frequency
* PVALUE : P-value
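For downstream filtering, result rows can be read by column position. The Python sketch below assumes whitespace-separated columns in the order listed above, followed by the five extra b.score columns (SCORE, N.CASE, N.CTRL, AF.CASE, AF.CTRL); the helper names are hypothetical. It also recomputes MAF as AC / (2 * NS), which reproduces the example row: 3 / 532 is about 0.0056391.

```python
from typing import Dict, Optional, Sequence, Union

BASE_COLUMNS = ["CHROM", "BEGIN", "END", "MARKER_ID", "NS", "AC",
                "CALLRATE", "MAF", "PVALUE"]
# Extra columns of the b.score test; other tests add different columns
B_SCORE_EXTRA = ["SCORE", "N.CASE", "N.CTRL", "AF.CASE", "AF.CTRL"]
TEXT_COLUMNS = {"CHROM", "MARKER_ID"}


def parse_row(line: str, extra: Sequence[str] = B_SCORE_EXTRA) -> Dict[str, Union[str, float]]:
    """Split one whitespace-separated result row into a column -> value dict."""
    names = BASE_COLUMNS + list(extra)
    fields = line.split()
    if len(fields) != len(names):
        raise ValueError(f"expected {len(names)} columns, got {len(fields)}")
    row: Dict[str, Union[str, float]] = {}
    for name, value in zip(names, fields):
        # Keep identifiers and missing values ("NA") as text; convert the rest
        row[name] = value if name in TEXT_COLUMNS or value == "NA" else float(value)
    return row


def split_marker_id(marker_id: str) -> Dict[str, Optional[Union[str, int]]]:
    """Parse [CHROM]:[POS]_[REF]/[ALT]_[ANNOTATION]; ANNOTATION may be absent."""
    parts = marker_id.split("_", 2)
    chrom, pos = parts[0].split(":")
    ref, alt = parts[1].split("/")
    return {"CHROM": chrom, "POS": int(pos), "REF": ref, "ALT": alt,
            "ANNOTATION": parts[2] if len(parts) > 2 else None}


def maf_from_counts(ac: float, ns: float) -> float:
    """Minor allele frequency from AC over the 2 * NS alleles of diploid samples."""
    af = ac / (2 * ns)
    return min(af, 1 - af)
```

For the example row, N.CASE + N.CTRL (145 + 121) also equals NS (266).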
The remaining columns vary by statistical test. For example, in the b.score test, SCORE is the score test statistic, N.CASE and N.CTRL are the numbers of cases and controls, and AF.CASE and AF.CTRL are the allele frequencies in cases and controls.
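In the example b.score row, the reported PVALUE (0.68484) equals the two-sided standard-normal p-value of SCORE (0.40587), that is, 2 * (1 - Phi(|SCORE|)). This is an observation from a single row, not a documented property of EPACTS. If it holds for your output, the Python sketch below (helper names hypothetical) gives a quick consistency check.

```python
import math


def two_sided_normal_p(z: float) -> float:
    """2 * (1 - Phi(|z|)) for the standard normal CDF Phi, written via erfc."""
    return math.erfc(abs(z) / math.sqrt(2))


def pvalue_matches_score(score: float, pvalue: float, tol: float = 1e-4) -> bool:
    """Hypothetical sanity check: does PVALUE agree with a normal test on SCORE?"""
    return abs(two_sided_normal_p(score) - pvalue) <= tol


if __name__ == "__main__":
    print(two_sided_normal_p(0.40587))  # about 0.68484, the PVALUE in the example row
```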