Changes

EPACTS for DIAGRAM (view source)

Revision as of 10:58, 11 October 2012

93 bytes added, 10:58, 11 October 2012

→‎A. Typical DIAGRAM analysis using existing association pipeline

Line 8: Line 8:

#Download and install EPACTS

−

#Prepare VCF file with genotypes / dosages

+

#Prepare VCF file with genotypes / dosages

#Prepare PED file with phenotypes and covariates

#Run EPACTS association pipeline

Line 28: Line 28:

./ref_download.sh

(Advanced users: to avoid downloading the FASTA files (~900MB), you may instead copy a local GRCh37 FASTA file and its index file to ${EPACTS_DIR}/ext/ref/)

−

</pre>

+

</pre>

*Perform a test run by running the following command

Line 53: Line 53:

This command runs the single variant test on the input VCF and PED files with a minimum MAF threshold of 0.001. The phenotype is "DISEASE", and the analysis is adjusted for the covariates AGE and SEX. The output file prefix is {OUTPUT_DIR}/test. Finally, EPACTS runs the analysis in parallel on 2 CPUs.

−

A more detailed description of the example can be found [http://genome.sph.umich.edu/wiki/Test_EPACTS_for_DIAGRAM here].

+

A more detailed description of the example can be found [http://genome.sph.umich.edu/wiki/Test_EPACTS_for_DIAGRAM here].

−

== 2.  Prepare VCF file with genotypes / dosages ==

+

== 2.  Prepare VCF file with genotypes / dosages ==

−

EPACTS requires input genotype / dosage information in VCF format. If you imputed with minimac or Impute2, you will start from your imputed dosage file.

+

EPACTS requires input genotype / dosage information in VCF format. If you imputed with minimac or Impute2, you will start from your imputed dosage file.

=== A.  Convert dosage file into VCF format ===

Line 69: Line 69:

<pre>> dose2vcf/dose2vcf --dose FUSION.GWAS.1KG.imp.chr1.dose --info FUSION.GWAS.1KG.imp.chr1.info --out out/FUSION.GWAS.1KG.imp.chr1</pre>

The expected output file is:

−

<pre>out/FUSION.GWAS.1KG.imp.chr1.vcf</pre>

+

<pre>out/FUSION.GWAS.1KG.imp.chr1.vcf</pre>

−

+

=== B.  bgzip and tabix VCF files ===

−

=== B.  bgzip and tabix VCF files ===

−

The input VCF file must be bgzipped and tabix-indexed before running the association analysis, which allows efficient random access to the file. Below is an example command to convert a plain VCF into a bgzipped and tabix-indexed VCF:

+

The input VCF file must be bgzipped and tabix-indexed before running the association analysis, which allows efficient random access to the file. Below is an example command to convert a plain VCF into a bgzipped and tabix-indexed VCF:

<pre>bgzip input.vcf ## this command will produce input.vcf.gz

tabix -pvcf -f input.vcf.gz ## this command will produce input.vcf.gz.tbi

−

</pre>

+

</pre>
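When the VCF files are split by chromosome, the two commands above can be wrapped in a loop. The following is a minimal sketch and not part of EPACTS: the function names `run` and `index_vcfs`, the `DRY_RUN` variable, and the `${prefix}.chr${chr}.vcf` naming scheme are assumptions made for illustration. Only the `bgzip` and `tabix -pvcf -f` invocations come from the example above.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}

# Compress and index one plain VCF per chromosome.
# Usage: index_vcfs PREFIX CHR [CHR ...]
# Assumed (hypothetical) file naming: ${prefix}.chr${chr}.vcf
index_vcfs() {
    prefix=$1
    shift
    for chr in "$@"; do
        vcf="${prefix}.chr${chr}.vcf"
        run bgzip "$vcf"                  # produces ${vcf}.gz
        run tabix -pvcf -f "${vcf}.gz"    # produces ${vcf}.gz.tbi
    done
}
```

For example, setting `DRY_RUN=1` and calling `index_vcfs out/FUSION.GWAS.1KG.imp 1 2` prints the four commands for chromosomes 1 and 2 without running them, which is a cheap way to check the file names first.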

−

If the VCF files are separated by chromosome, the file name must contain the string "chr1" for the chromosome 1 file, and the corresponding chromosome name for the other chromosomes.<br>Sample IDs in the VCF file must be consistent with those in the PED file.

+

If the VCF files are separated by chromosome, the file name must contain the string "chr1" for the chromosome 1 file, and the corresponding chromosome name for the other chromosomes.<br>Sample IDs in the VCF file must be consistent with those in the PED file.
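One way to check sample ID consistency before running the association is to compare the two ID lists directly. The sketch below is not an EPACTS tool. It assumes that the VCF is plain text (pipe it through `bgzip -dc` into a temporary file first if it is compressed), that the PED file stores the individual ID in column 2, and that PED header lines start with `#`. The function names are hypothetical.

```shell
# Print the sample IDs from the #CHROM header line of a plain-text VCF.
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1" | sort
}

# Print the individual IDs (assumed to be column 2) from a PED file,
# skipping header lines that start with "#".
ped_samples() {
    awk '!/^#/ && NF >= 2 { print $2 }' "$1" | sort
}

# Print every ID found on only one side, tagged with its source.
# Usage: compare_samples INPUT.vcf INPUT.ped
compare_samples() {
    vcf_ids=$(mktemp)
    ped_ids=$(mktemp)
    vcf_samples "$1" > "$vcf_ids"
    ped_samples "$2" > "$ped_ids"
    comm -23 "$vcf_ids" "$ped_ids" | sed 's/^/VCF_ONLY /'
    comm -13 "$vcf_ids" "$ped_ids" | sed 's/^/PED_ONLY /'
    rm -f "$vcf_ids" "$ped_ids"
}
```

Empty output from `compare_samples` means the two files list the same set of IDs. Any `VCF_ONLY` or `PED_ONLY` line points at a sample to investigate.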

== 3.  Prepare PED file for phenotypes and covariates ==

Line 125: Line 124:

M QT

M AGE

−

</pre>

+

</pre>

−

== 4.  Run EPACTS association pipeline ==

Line 230: Line 228:

=== A. Typical DIAGRAM analysis using existing association pipeline ===

−

This is the typical DIAGRAM analysis using your current association pipeline and software.   [[Image:Example.jpg]]

+

This is the typical DIAGRAM analysis using your current association pipeline and software.   [[Image:Example.jpg]]

=== B. Analysis of all SNPs using logistic regression score test ===

Line 248: Line 246:

To run the Firth test using the EPACTS software:

<pre>epacts2.1/epacts single -vcf [INPUT VCF FILENAME] -ped [INPUT PED FILENAME] -out [OUTPUT FILENAME PREFIX] \

−

-test b.firth -pheno DISEASE -cov AGE -sepchr -anno -min-mac 1 -max-mac 200 -run 10</pre>

+

-test b.firth -pheno DISEASE -cov AGE -sepchr -anno -min-mac 1 -max-mac 200 -run 10</pre>

−

== 5.  Report EPACTS results ==

Line 270: Line 267:

20 76771 76771 20:76771_C/T_Nonsynonymous:DEFB125 266 3 1 0.0056391 0.68484 0.40587 145 121 0.013793 0.0082645

−

</pre>

+

</pre>

+

Each column represents

−

~~Each column represents~~

+

*CHROM : Chromosome Name

−

* CHROM : Chromosome Name

+

*BEGIN, END : Start and end base positions of the variant

−

* BEGIN, END : Start and end base positions of the variant

+

*MARKER_ID : [CHROM]:[POS]_[REF]/[ALT]_[ANNOTATION] formatted marker ID. The [ANNOTATION] part is available only with the --anno option

−

* MARKER_ID : [CHROM]:[POS]_[REF]/[ALT]_[ANNOTATION] formatted marker ID. The [ANNOTATION] part is available only with the --anno option

+

*NS : Number of samples with non-missing genotypes

−

* NS : Number of samples with non-missing genotypes

+

*AC : Non-reference allele count

−

* AC : Non-reference allele count

+

*CALLRATE : Genotype call rate

−

* CALLRATE : Genotype call rate

+

*MAF : Minor allele frequency

−

* MAF : Minor allele frequency

+

*PVALUE : P-value

−

* PVALUE : P-value
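The MARKER_ID layout described in the list above can be split back into its parts with standard tools. This is a minimal sketch, assuming the chromosome name and the alleles contain no underscore. The function name `split_marker_id` is hypothetical and not part of EPACTS.

```shell
# Split a [CHROM]:[POS]_[REF]/[ALT]_[ANNOTATION] marker ID into
# space-separated fields: CHROM POS REF ALT ANNOTATION.
# ANNOTATION is printed as "." when the ID carries no annotation.
split_marker_id() {
    printf '%s\n' "$1" | awk '{
        n = split($0, part, "_")        # locus, alleles, annotation pieces
        split(part[1], locus, ":")      # CHROM and POS
        split(part[2], allele, "/")     # REF and ALT
        anno = (n >= 3) ? part[3] : "."
        for (i = 4; i <= n; i++)        # re-join annotations containing "_"
            anno = anno "_" part[i]
        print locus[1], locus[2], allele[1], allele[2], anno
    }'
}
```

Applied to the marker ID from the example output, `split_marker_id '20:76771_C/T_Nonsynonymous:DEFB125'` prints `20 76771 C T Nonsynonymous:DEFB125`.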

−

The remaining columns vary by statistical test. For example, in the b.score test, SCORE is the score test statistic, N.CASE and N.CTRL are the case and control counts, and AF.CASE and AF.CTRL are the case and control allele frequencies.

+

The remaining columns vary by statistical test. For example, in the b.score test, SCORE is the score test statistic, N.CASE and N.CTRL are the case and control counts, and AF.CASE and AF.CTRL are the case and control allele frequencies.
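Because the trailing columns differ between tests, it is safer to look up columns by name than by position when post-processing results. The sketch below assumes a tab-delimited results file whose first line is a header carrying the column names listed above, optionally prefixed with `#`. That file layout and the function name `epacts_top_hits` are assumptions for illustration.

```shell
# Print MARKER_ID and PVALUE for rows at or below a p-value threshold.
# Usage: epacts_top_hits RESULTS_FILE THRESHOLD
epacts_top_hits() {
    awk -F'\t' -v max_p="$2" '
        NR == 1 {
            sub(/^#/, "", $1)                       # drop a leading "#" if present
            for (i = 1; i <= NF; i++) col[$i] = i   # map column name to index
            next
        }
        # Skip untested markers (NA) and keep rows with PVALUE <= threshold.
        $col["PVALUE"] != "NA" && $col["PVALUE"] + 0 <= max_p + 0 {
            print $col["MARKER_ID"] "\t" $col["PVALUE"]
        }
    ' "$1"
}
```

The same name-to-index lookup works for test-specific columns such as SCORE or AF.CASE, so a script written this way does not need to change when a different test adds or removes columns.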

+

Clement Ma

216

edits
