EMADS Primary Analysis Plan

From Genome Analysis Wiki
Revision as of 15:38, 4 April 2013 by Svrieze

Exome Meta-Analysis of Drinking and Smoking (EMADS) Draft Analysis Plan

Genotypes

All samples have some version of the Exome Chip or exome/whole genome sequences. Individual studies will provide information about the manufacturer and version of the exome chip, or sequencing platform, they are using.

Inclusion Criteria

For our first analysis, samples must be between ages 18 and 70 (inclusive) and be of European ancestry. We hope to extend analysis to other ancestral groups in the future.
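As a concrete illustration of the inclusion rule, the sketch below keeps samples aged 18 to 70 inclusive who are of European ancestry. The `Sample` record, its field names, and the `"EUR"` ancestry label are hypothetical; each site will have its own data layout and ancestry coding.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical per-sample record; field names and ancestry coding are illustrative.
    sample_id: str
    age: float
    ancestry: str  # e.g. "EUR" for European ancestry (coding is an assumption)


def meets_inclusion(sample: Sample, min_age: float = 18, max_age: float = 70) -> bool:
    """True if the sample is aged 18-70 (inclusive) and of European ancestry."""
    return min_age <= sample.age <= max_age and sample.ancestry == "EUR"


def filter_samples(samples):
    """Keep only samples that meet the first-analysis inclusion criteria."""
    return [s for s in samples if meets_inclusion(s)]


if __name__ == "__main__":
    cohort = [
        Sample("a", 18, "EUR"),
        Sample("b", 70, "EUR"),
        Sample("c", 71, "EUR"),
        Sample("d", 40, "AFR"),
    ]
    print([s.sample_id for s in filter_samples(cohort)])  # ['a', 'b']
```

Both age bounds use `<=`, so 18- and 70-year-olds are retained, matching "inclusive".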

Quality Control

We leave calling algorithms, marker filters, and sample filters to the discretion of local sites, although we will evaluate the possibility of batch effects (where batch might be a study) during the meta-analysis step.

For reference, four currently participating studies have used Illumina chips and Illumina’s genotype caller in GenomeStudio (Gencall). Some studies also performed manual curation, reclustering the intensity data for ~1,500 markers.

Strand Orientation

Chip TOP allele annotations (typical output from Gencall) need to be updated to the forward strand of build 37.

The strand file for exome chip version 12v1_A is available at: http://www.well.ox.ac.uk/~wrayner/strand/HumanExome-12v1_A-b37-strand.zip

Usage instructions, including scripts, are available here: http://www.well.ox.ac.uk/~wrayner/strand/

Future strand files will also be available at that site.
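The core of the strand update is a per-marker complement: alleles on a marker whose chip orientation is reverse relative to build 37 are complemented, and the others are left alone. The sketch below assumes a simple lookup of `"+"`/`"-"` per marker, for example one built from the strand file above. The function names and data shapes are hypothetical, and the scripts distributed with the strand files remain the recommended route.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_forward(allele: str, strand: str) -> str:
    """Map one chip allele to the build 37 forward strand.

    Assumes strand is "+" when the chip allele already matches the forward
    strand and "-" when it must be complemented.
    """
    if strand == "+":
        return allele
    if strand == "-":
        return COMPLEMENT[allele]
    raise ValueError(f"unknown strand {strand!r}")


def convert_genotypes(genotypes, strand_lookup):
    """genotypes: {marker: (allele1, allele2)} on the chip (TOP) strand.
    strand_lookup: {marker: "+" or "-"}, e.g. built from a strand file.
    Markers absent from the lookup are dropped rather than guessed."""
    converted = {}
    for marker, (a1, a2) in genotypes.items():
        strand = strand_lookup.get(marker)
        if strand is None:
            continue
        converted[marker] = (to_forward(a1, strand), to_forward(a2, strand))
    return converted
```

For A/T and C/G markers, complementing yields the same allele pair, so orientation cannot be inferred from the alleles themselves; it has to come from the strand file.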

Primary Phenotypes

Average cigarettes smoked per day, either as a current smoker or former smoker

Individuals who either never smoked, or on whom we have no data (e.g., someone was a former smoker but former smoking was never assessed) will be excluded from analysis. Only cigarettes will be included in the estimate. If preferable, repeated measures designs (longitudinal data) can use all assessments by scaling and correcting for covariates within waves of assessment, then averaging across assessments.
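One way to read the repeated-measures option is: within each wave, regress CPD on covariates, scale the residuals to unit variance, then average each person's scaled values across the waves they attended. The sketch below does this with age as the only covariate, purely for brevity; a real analysis would correct for the full covariate set. The `(person_id, wave, cpd, age)` record layout is a hypothetical choice.

```python
from collections import defaultdict
from statistics import mean, pstdev


def residualize_on_one_covariate(y, x):
    """Residuals of y after simple linear regression on a single covariate x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx if sxx > 0 else 0.0
    return [yi - my - beta * (xi - mx) for xi, yi in zip(x, y)]


def average_across_waves(records):
    """records: iterable of (person_id, wave, cpd, age) tuples.

    Within each wave, CPD is corrected for age and scaled to unit variance;
    each person's scaled values are then averaged across the waves they attended."""
    by_wave = defaultdict(list)
    for rec in records:
        by_wave[rec[1]].append(rec)

    per_person = defaultdict(list)
    for recs in by_wave.values():
        resid = residualize_on_one_covariate([r[2] for r in recs], [r[3] for r in recs])
        sd = pstdev(resid)
        for rec, e in zip(recs, resid):
            per_person[rec[0]].append(e / sd if sd > 0 else 0.0)

    return {pid: mean(vals) for pid, vals in per_person.items()}
```

Because scaling happens within each wave, differences in means and variances between waves (for example, from different question wording) do not leak into the averaged phenotype.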

There was some cross-study variability on this measure. Some studies specified average smoking during a specific window, such as the last 12 months; most made no such specification. One study allowed respondents to report packs rather than cigarettes.

Smoking Initiation

Every study had some usable measure of whether a respondent has ever smoked regularly. Almost all asked directly; the others had the information needed to derive this variable (e.g., Smoked 100 cigarettes in lifetime? Ever smoked every day for 2 weeks straight?).
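A sketch of deriving the initiation variable: a direct question is used when available; otherwise either proxy item counts. The result is coded 0/1 so it can enter the continuous-trait pipeline. The parameter names and the "either proxy suffices" rule are assumptions for illustration, not an agreed harmonization.

```python
from typing import Optional


def ever_regular_smoker(direct: Optional[bool] = None,
                        smoked_100_lifetime: Optional[bool] = None,
                        daily_for_2_weeks: Optional[bool] = None) -> Optional[int]:
    """Return 1 (ever regular smoker), 0 (never), or None (no usable data).

    A direct answer takes precedence; otherwise a "yes" on either proxy item
    (hypothetical names) counts as regular smoking."""
    if direct is not None:
        return int(direct)
    proxies = [v for v in (smoked_100_lifetime, daily_for_2_weeks) if v is not None]
    if not proxies:
        return None  # no usable measure: exclude from analysis
    return int(any(proxies))
```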

Note that we are among the first groups to conduct such meta-analyses, and our analysis pipeline is currently restricted to continuous traits. Until methods are developed for binary traits, we propose analyzing smoking initiation as a continuous trait.

Average drinks per week, either as a current drinker or former drinker

Individuals who either never drank, or on whom we have no data (e.g., someone was a former drinker but former drinking was not assessed), will be excluded from analysis. All types of alcoholic beverage (e.g., beer, wine, and liquor) will be combined in the total estimate. If preferable, repeated measures designs (longitudinal data) can use all assessments by scaling and correcting for covariates within waves of assessment, then averaging across assessments.

There was some cross-study variability on this measure. Some studies specified average drinking during a specific window, such as the last 12 months or the last month; most made no such specification. Two studies required respondents to select from predefined ranges.
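For studies that collected ranges, one simple harmonization is to assign each range its midpoint and then sum across beverage types. The category labels and midpoints below are hypothetical, and the value used for the open-ended top category is an arbitrary choice that each site would need to justify.

```python
# Hypothetical drinks-per-week categories; each study's actual ranges differ.
RANGE_MIDPOINTS = {
    "0": 0.0,
    "1-2": 1.5,
    "3-6": 4.5,
    "7-13": 10.0,
    "14+": 14.0,  # open-ended top category: lower bound used (an arbitrary choice)
}


def to_weekly_drinks(report):
    """A report is either a numeric count of drinks per week or a range label."""
    if isinstance(report, str):
        return RANGE_MIDPOINTS[report]
    return float(report)


def drinks_per_week(reports):
    """reports: {beverage_type: weekly report}; all beverage types are summed."""
    return sum(to_weekly_drinks(r) for r in reports.values())


if __name__ == "__main__":
    print(drinks_per_week({"beer": 3, "wine": "1-2", "liquor": "0"}))  # 4.5
```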

Secondary Phenotypes

Pack Years

Number of packs of cigarettes per day multiplied by the number of years the person has smoked, corrected for age.

Age of Initiation of Smoking

The age at which an individual first became a regular smoker.

Covariates

Appropriate covariates are often study-specific, so we will rely on local investigators to determine the most appropriate ones. Below we list some covariates that will likely be necessary.

Main Effects

1. Age at assessment for current smokers/drinkers
   a. At assessment for current smokers/drinkers
2. Age of smoking/drinking for former smokers/drinkers
   a. Could be age at quitting
3. Age at assessment for Pack Years, Smoking Initiation, and Age of Smoking Initiation
4. Sex
5. Date of birth (or year, or range)
6. Cohort
7. Height, weight, and BMI, for drinking (a single beer has different effects on a 200 lb man than on a 100 lb woman)
8. Genetic principal components (or empirical kinships)
9. Adolescence versus adulthood (e.g., < 21 years of age versus >= 21)
10. Date of assessment (e.g., the calendar year of the assessment)?


Interactions

11. Sex X Adolescence interaction
12. Sex X Age interaction
13. Sex X Weight/Height/BMI interaction
14. Age X Adolescence interaction
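To show how the adolescence indicator and the listed interactions enter a model, the sketch below builds one participant's covariate row. It covers only age, sex, and BMI (standing in for the weight/height/BMI terms); the sex coding and field names are assumptions, and principal components, cohort, and dates would be appended in the same way.

```python
from dataclasses import dataclass

ADULTHOOD_AGE = 21  # example cutoff from the plan: < 21 adolescent, >= 21 adult


@dataclass
class Participant:
    # Hypothetical record; sex coding (0 = male, 1 = female) is an assumption.
    age: float
    sex: int
    bmi: float


def covariate_row(p: Participant) -> dict:
    """Main effects plus the four listed interaction terms for one participant."""
    adolescent = 1 if p.age < ADULTHOOD_AGE else 0
    return {
        "age": p.age,
        "sex": p.sex,
        "bmi": p.bmi,
        "adolescent": adolescent,
        "sex_x_adolescent": p.sex * adolescent,
        "sex_x_age": p.sex * p.age,
        "sex_x_bmi": p.sex * p.bmi,
        "age_x_adolescent": p.age * adolescent,
    }
```

Each interaction is simply the product of its two main-effect columns, so it is zero whenever either factor is zero (e.g., every interaction with `adolescent` vanishes for adults).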


Analysis

The basic analysis has two stages. In the first stage, local investigators produce, for each phenotype, a set of single-variant summary statistics using a tool developed at the University of Michigan. In the second stage, these summary statistics are pooled for meta-analysis. All single-variant and gene-based (‘burden’) tests can be conducted from the summary statistics.

These two stages are described in more detail below.

Stage 1: Local Sites Produce Summary Statistics Using Rare-Metal-Worker

The meta-analysis step (Stage 2) requires a very specific set of summary statistics: single-variant test statistics and p-values, plus the covariance matrix of the test statistics within a sliding window (default: 1 Mb). Shuang Feng, Dajiang Liu, and Goncalo Abecasis at the University of Michigan have developed software for exactly this purpose, called Rare-Metal-Worker. The software and instructions for generating the necessary single-variant statistics are available at:

http://genome.sph.umich.edu/wiki/Rare-Metal-Worker
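To make the required quantities concrete, here is a minimal sketch of a linear-model score statistic for each variant and the covariance of those scores for variant pairs within a window. It is not Rare-Metal-Worker's implementation; the scaling by the trait variance, the input layout, and the treatment of the window as a maximum base-pair distance are assumptions.

```python
from statistics import mean


def score_stats(genotypes, positions, y, window=1_000_000):
    """genotypes: per-variant dosage lists (variants x samples).
    positions: base-pair position of each variant (one chromosome).
    y: covariate-corrected, inverse-normalized trait for the same samples.

    Returns (U, V): U[j] is the score for variant j, and V[(j, k)] is the
    covariance of U[j] and U[k] for variants at most `window` bp apart."""
    n = len(y)
    ybar = mean(y)
    sigma2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    centered = []
    for g in genotypes:
        gbar = mean(g)
        centered.append([gi - gbar for gi in g])
    U = [sum(c * (yi - ybar) for c, yi in zip(cj, y)) / sigma2 for cj in centered]
    V = {}
    for j in range(len(genotypes)):
        for k in range(j, len(genotypes)):
            if abs(positions[k] - positions[j]) <= window:
                v = sum(a * b for a, b in zip(centered[j], centered[k])) / sigma2
                V[(j, k)] = V[(k, j)] = v
    return U, V
```

The single-variant z-score is `U[j] / V[(j, j)] ** 0.5`. The off-diagonal entries are what allow gene-based tests to be reconstructed later from summary statistics alone, and restricting them to a window keeps the output size manageable.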

NOTE: It is essential that each trait first be corrected for covariates, that the residuals then be inverse normalized, and only then that association testing be conducted. Rare-Metal-Worker provides this functionality when --makeResiduals and --inverseNormalize are specified together.

Alternatively, you could correct for covariates prior to using Rare-Metal-Worker, and then specify only --makeResiduals and --inverseNormalize.
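For sites that correct for covariates themselves, the required order of operations is sketched below: least-squares residuals on the covariates (with an intercept), followed by a rank-based inverse normal transform of those residuals. The rank offset `c` is an assumption (Rare-Metal-Worker's exact choice may differ), and the small Gauss-Jordan solver assumes the covariates are not collinear.

```python
from statistics import NormalDist


def ols_residuals(y, X):
    """Residuals from least squares of y on the covariate rows in X plus an intercept.

    Solves the normal equations by Gauss-Jordan elimination with partial pivoting;
    assumes the covariates are not collinear."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = []
    for i in range(p):
        xtx = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        xty = sum(r[i] * yi for r, yi in zip(rows, y))
        A.append(xtx + [xty])
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        lead = A[col][col]
        A[col] = [v / lead for v in A[col]]
        for k in range(p):
            if k != col:
                factor = A[k][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    beta = [A[i][p] for i in range(p)]
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


def inverse_normalize(values, c=0.5):
    """Rank-based inverse normal transform: Phi^-1((rank - c) / (n - 2c + 1)).

    Tied values receive their average rank. The offset c is an assumption;
    c = 3/8 (Blom) is another common choice."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def prepare_trait(y, X):
    """Correct for covariates first, then inverse normalize the residuals."""
    return inverse_normalize(ols_residuals(y, X))
```

Doing the steps in this order matters: inverse normalizing the raw trait and then adjusting for covariates would leave residuals that are no longer normally distributed.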

All output files from Rare-Metal-Worker can then be emailed to Scott Vrieze (svrieze at umich dot edu) for centralized meta-analysis.

Stage 2: Single-Variant and Gene-Based Meta-Analysis

Single-Variant Tests

We will meta-analyze the score statistics for individual variants, weighting each study by its sample size.
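One common way to weight by sample size is a weighted Z-score (Stouffer) combination with weights equal to the square root of each study's N, as in METAL's sample-size scheme. The sketch below assumes that reading; each study's z-score is taken as its score statistic divided by the square root of its variance.

```python
from math import sqrt
from statistics import NormalDist


def score_to_z(u, v):
    """Per-study z-score from a variant's score statistic U and its variance V."""
    return u / sqrt(v)


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-study z-scores, weighting study i by sqrt(N_i).

    Returns the combined z-score and its two-sided p-value."""
    weights = [sqrt(n) for n in sample_sizes]
    combined = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(sum(w * w for w in weights))
    p_value = 2 * (1 - NormalDist().cdf(abs(combined)))
    return combined, p_value
```

With two equally sized studies each reporting z = 2, the combined z is 2√2 ≈ 2.83, reflecting the doubled sample.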

Gene-Based Tests

Gene-based tests can be conducted centrally by Scott Vrieze using output from Rare-Metal-Worker.
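Gene-based tests need only each variant's score U and the score covariance V within the gene. The sketch below shows a fixed-threshold collapsing (burden) statistic, a variable-threshold version that takes the largest |z| over MAF thresholds, and the SKAT statistic in score form. The p-values for the variable-threshold maximum and for SKAT need extra machinery (multivariate normal or permutation, and a chi-square mixture such as Davies' method, respectively), which is not shown. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def burden_z(U, V, w):
    """z-score for the weighted burden sum(w_j * U_j), given score covariance V."""
    m = len(w)
    num = sum(w[j] * U[j] for j in range(m))
    var = sum(w[j] * V[j][k] * w[k] for j in range(m) for k in range(m))
    if var <= 0:
        raise ValueError("burden has zero variance (no variants selected?)")
    return num / sqrt(var)


def fixed_threshold_burden(U, V, maf, threshold=0.05):
    """Collapse all variants with MAF < threshold; return (z, two-sided p)."""
    w = [1.0 if f < threshold else 0.0 for f in maf]
    z = burden_z(U, V, w)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def variable_threshold_max_z(U, V, maf):
    """Largest |z| over thresholds placed at each observed MAF (p-value not shown)."""
    return max(abs(burden_z(U, V, [1.0 if f <= t else 0.0 for f in maf]))
               for t in set(maf))


def skat_q(U, weights):
    """SKAT statistic in score form, Q = sum_j (w_j * U_j)^2 (p-value not shown)."""
    return sum((w * u) ** 2 for w, u in zip(weights, U))
```

The contrast between the two approaches is easy to see with two independent rare variants whose scores are 2 and -2: the burden z-score is exactly 0 because the effects cancel, while SKAT's Q is 8 because each variant contributes its squared score.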

We will implement two gene-based tests. First, a Variable Threshold Combined Multivariate and Collapsing count method, in which the number of rare alleles in each gene is counted and the gene is then tested for association. Second, SKAT, applied to all rare variants (MAF < .05) within a gene. SKAT allows for variants with opposite directions of effect within the same gene, whereas the variable threshold combined multivariate and collapsing method does not.

Other Analysis Considerations

Genotype Annotation

Gene-based burden tests can be augmented with genotype annotation. We currently plan to use only nonsynonymous variants from ANNO-generated annotations relative to GENCODE transcripts. All annotation can be done centrally at the meta-analysis stage to ensure consistency across sites.

Multivariate Test

We will pursue development of a multivariate test for drinking and smoking jointly. This could be as simple as averaging, on a per-marker or per-gene basis, the effect sizes or p-values from the cigarettes-per-day (CPD) and drinks-per-week (DPW) meta-analyses.

Further Downstream Analysis

To be determined; this will depend on results from the main analysis above. Some possible ideas include:

1. Pathway-based analysis (e.g., with MAGENTA: http://www.broadinstitute.org/mpg/magenta/)
2. Gene set analysis (e.g., grouping together all nicotinic receptor genes)
3. Human knockout analysis of all individuals carrying rare variants that result in effective gene knockouts
4. Conditional analysis on known variants and any newly discovered rare variants
5. Sex-specific analysis

Individual sites are more than welcome to propose additional analyses, as well as to take the lead on additional projects related to the primary aims of this meta-analysis.