Changes

From Genome Analysis Wiki
no edit summary
Line 1: Line 1: −
[[File:EMADS_thumb.png|thumb|360px|right]]
+
<!--        BANNER ACROSS TOP OF PAGE        -->
 +
 
 +
{| style="width:100%; background:#fcfcfc; margin-top:1.2em; border:1px solid #ccc;"
 +
|-
 +
| style="width:100%; text-align:center; white-space:nowrap; color:#000;" | <div style="font-size:162%; border:none; margin:0; padding:.1em; color:#000;">This Page is No Longer Supported. Please Visit http://gscan.sph.umich.edu</div>
 +
|}
 +
 
 +
[[File:EMADS_thumb5.png|thumb|340px|right]]
 
Exome Meta-Analysis of Drinking and Smoking (EMADS) Analysis Plan
 
Exome Meta-Analysis of Drinking and Smoking (EMADS) Analysis Plan
   Line 8: Line 15:     
== Inclusion Criteria ==
 
== Inclusion Criteria ==
For our first analysis, samples must be between ages 18 and 70 (inclusive) and be of European ancestry. We hope to extend analysis to other ancestral groups in the future.
+
For our first analysis, samples must be between ages 18 and 70 (inclusive) and be of European ancestry. We will extend analysis to other ancestral groups in the future.
    
== Quality Control ==
 
== Quality Control ==
Line 42: Line 49:     
=== (2) Smoking Initiation ===
 
=== (2) Smoking Initiation ===
Every study had some usable measure of whether a respondent has ever regularly smoked.  Almost all asked directly.  Some have necessary information for this variable (e.g., 100 cigs lifetime? Ever smoked every day for 2 weeks straight?).
+
This is a binary phenotype. Code "1" for everyone in the study who reports ever being a regular smoker in their life (current or former). Code "0" for everyone who denies ever being a regular smoker in their life.
 +
 
 +
Every study had some usable measure of whether a respondent has ever regularly smoked.  Almost all asked directly.  Some have the necessary information to code this variable (e.g., 100 cigarettes lifetime? Ever smoked every day for 2 weeks straight?).
    
Note that we’re among the first groups conducting such meta-analyses, and our analysis pipeline is currently restricted to continuous traits. Until methods are developed for binary traits, it is proposed that we analyze smoking initiation as a continuous trait.
 
Note that we’re among the first groups conducting such meta-analyses, and our analysis pipeline is currently restricted to continuous traits. Until methods are developed for binary traits, it is proposed that we analyze smoking initiation as a continuous trait.
Line 53: Line 62:     
=== (5) Average drinks per week, either as a current drinker or former drinker ===
 
=== (5) Average drinks per week, either as a current drinker or former drinker ===
 +
The average number of drinks a subject reports drinking each week. Most studies asked this question directly. Other studies have converted to grams per day, or grams per week. The latter are fine to analyze directly for our purposes.
 +
 
Individuals who either never drank, or on whom we have no data (e.g., someone was a former drinker but former drinking was not assessed) will be excluded from analysis.  Please combine all types of liquor in the total estimate.  If preferable, repeated measures designs (longitudinal data) can use all assessments by scaling and correcting for covariates within waves of assessment, then averaging across assessments.   
 
Individuals who either never drank, or on whom we have no data (e.g., someone was a former drinker but former drinking was not assessed) will be excluded from analysis.  Please combine all types of liquor in the total estimate.  If preferable, repeated measures designs (longitudinal data) can use all assessments by scaling and correcting for covariates within waves of assessment, then averaging across assessments.   
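The repeated-measures option above could be sketched as follows. This is a minimal illustration, not part of the analysis plan: it assumes "scaling" means z-scoring within each wave, it omits the within-wave covariate correction for brevity, and the function name and the record layout (subject, wave, value) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def average_across_waves(records: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Z-score values within each wave, then average the z-scores per subject.

    records holds (subject_id, wave, value) tuples; the layout is an assumption.
    """
    # Collect the values observed in each wave of assessment.
    by_wave = defaultdict(list)
    for _, wave, value in records:
        by_wave[wave].append(value)

    # Mean and (population) standard deviation per wave.
    stats = {wave: (mean(vals), pstdev(vals)) for wave, vals in by_wave.items()}

    # Scale each observation within its own wave.
    scaled = defaultdict(list)
    for subject, wave, value in records:
        mu, sd = stats[wave]
        if sd == 0:
            raise ValueError(f"wave {wave!r} has no variance; cannot scale")
        scaled[subject].append((value - mu) / sd)

    # Average across whichever assessments each subject has.
    return {subject: mean(z) for subject, z in scaled.items()}
```

In practice the covariate correction would be applied within each wave before averaging, as the paragraph above describes.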
   Line 60: Line 71:  
For CPD we will consider the binned responses to be on a quantitative scale from 1-4 (see above under the CPD phenotype description). '''CPD therefore will not require transformation''' prior to covariate correction.
 
For CPD we will consider the binned responses to be on a quantitative scale from 1-4 (see above under the CPD phenotype description). '''CPD therefore will not require transformation''' prior to covariate correction.
   −
For the other four quantitative phenotypes (Pack Years, Age of Initiation, Drinks Per Week) please '''left-anchor''' the distribution at 1 and '''log-transform it'''. Left-anchoring, such that no value is less than 1, prevents the log-transform from returning nonsensical values like negative infinity. Then apply the covariate correction to the transformed phenotypes. This step is unnecessary for the binary smoking initiation phenotype
+
For the other three quantitative phenotypes (Pack Years, Age of Initiation, Drinks Per Week) please '''left-anchor''' the distribution at 1 and '''log-transform it'''. Left-anchoring, such that no value is less than 1, prevents the log-transform from returning nonsensical values like negative infinity. Then apply the covariate correction to the transformed phenotypes.
 +
 
 +
No transformations are necessary for the binary smoking initiation phenotype, but we will still correct for covariates for smoking initiation (recall that we are treating this binary phenotype in our analysis as if it were a continuous trait).
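A minimal sketch of the left-anchor and log-transform step, under two assumptions that the plan does not spell out: that "left-anchoring at 1" means shifting the whole distribution so its minimum equals 1, and that the natural log is intended. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def left_anchor_log(values: Sequence[float]) -> List[float]:
    """Shift values so the smallest equals 1, then take the natural log.

    Assumes the input has no missing values; drop or handle those beforehand.
    """
    # After this shift no value is below 1, so every log is >= 0 and finite.
    shift = 1.0 - min(values)
    return [math.log(v + shift) for v in values]
```

For example, drinks-per-week values of 0, 1, and 9 are shifted to 1, 2, and 10 before the log is taken, so a reported value of 0 no longer maps to negative infinity.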
    
Appropriate covariates can often be study-specific.  We will depend on local investigators to determine the most appropriate covariates.  We list here some covariates that will likely be necessary.
 
Appropriate covariates can often be study-specific.  We will depend on local investigators to determine the most appropriate covariates.  We list here some covariates that will likely be necessary.
Line 73: Line 86:  
*Date of birth (or year, or range)
 
*Date of birth (or year, or range)
 
*Cohort
 
*Cohort
*Height, weight, BMI, for drinking (a single beer has different effects on a 200 lb man versus a 100 lb woman)
   
*Genetic principal components (alternatively could use empirical kinships in Rare-Metal-Worker)
 
*Genetic principal components (alternatively could use empirical kinships in Rare-Metal-Worker)
*Adolescence versus adulthood (e.g., < 21 years of age versus >=21)
+
*Adolescence versus adulthood (e.g., < 21 years of age versus >=21). Only consider using this covariate if you have a large number of adolescents in your study.
 
*Date of assessment (e.g., the calendar year of the assessment)?
 
*Date of assessment (e.g., the calendar year of the assessment)?
*Current versus former smoker/drinker?
+
*Current versus former smoker for smoking phenotypes. This would be a binary covariate.
 +
*Current versus former drinker for drinking phenotypes. This would be a binary covariate.
 +
*For the drinking phenotype, consider height, weight, and/or BMI (the idea is that a similar amount of alcohol has different effects on a 200 lb person versus a 100 lb person).
    
=== Interactions ===
 
=== Interactions ===
 +
These covariates may not be necessary, but we list them for local analysts to consider.
 
*Sex X Adolescence interaction
 
*Sex X Adolescence interaction
 
*Sex X Age interaction
 
*Sex X Age interaction
Line 92: Line 107:  
=== Stage 1: Local Sites Produce Summary Statistics Using Rare-Metal-Worker ===
 
=== Stage 1: Local Sites Produce Summary Statistics Using Rare-Metal-Worker ===
 
The meta-analysis step (stage 2) requires a very specific set of summary statistics, which includes single-variant test statistics and p-values, as well as the test statistic covariance matrix within a sliding window (default: 1Mb). Shuang Feng, Dajiang Liu, and Goncalo Abecasis at the University of Michigan have developed software specifically for this purpose, called Rare-Metal-Worker.  Software and usage instructions to generate the necessary single-variant statistics are available at [http://genome.sph.umich.edu/wiki/Rare-Metal-Worker  Rare-Metal-Worker]. If there are installation problems please let Scott know.
 
The meta-analysis step (stage 2) requires a very specific set of summary statistics, which includes single-variant test statistics and p-values, as well as the test statistic covariance matrix within a sliding window (default: 1Mb). Shuang Feng, Dajiang Liu, and Goncalo Abecasis at the University of Michigan have developed software specifically for this purpose, called Rare-Metal-Worker.  Software and usage instructions to generate the necessary single-variant statistics are available at [http://genome.sph.umich.edu/wiki/Rare-Metal-Worker  Rare-Metal-Worker]. If there are installation problems please let Scott know.
 +
 +
Rare-Metal-Worker works best, IMHO, when the genotype files are coded as VCF. There are several ways to convert to VCF, including PLINK/SEQ and WDIST (https://www.cog-genomics.org/wdist/).
    
'''NOTE:''' It is essential that analysis proceeds in the following order. For CPD, please bin quantitative responses and correct for covariates to obtain residuals. For Pack Years, Age of Initiation, and Drinks Per Week, please left-anchor responses at 1, log-transform, and then correct for covariates to obtain residuals. In this way we will obtain residualized phenotypes ready for analysis with Rare-Metal-Worker. These steps are probably easier to do in your software of choice.  
 
'''NOTE:''' It is essential that analysis proceeds in the following order. For CPD, please bin quantitative responses and correct for covariates to obtain residuals. For Pack Years, Age of Initiation, and Drinks Per Week, please left-anchor responses at 1, log-transform, and then correct for covariates to obtain residuals. In this way we will obtain residualized phenotypes ready for analysis with Rare-Metal-Worker. These steps are probably easier to do in your software of choice.  
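The "correct for covariates to obtain residuals" step in the note above can be sketched as an ordinary least-squares regression of the (already binned or log-transformed) phenotype on the covariates, keeping the residuals. This is an illustrative sketch only: the plan leaves the software and the covariate set to local analysts, the function names are hypothetical, and the code is plain Python so that it has no dependencies. A statistics package would normally be used instead.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariates are collinear; drop one and retry")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def residualize(y: Sequence[float], covariates: Sequence[Sequence[float]]) -> List[float]:
    """Return residuals of y after OLS regression on covariates plus an intercept.

    covariates has one row per sample and one column per covariate.
    """
    design = [[1.0] + list(row) for row in covariates]  # prepend intercept
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(design, y)]
```

The residuals returned by `residualize` would then serve as the phenotype passed to Rare-Metal-Worker, following the order given in the note: bin or transform first, residualize second.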
Line 103: Line 120:  
'''Running Times'''
 
'''Running Times'''
   −
Run times depend heavily on the type of analysis. If all samples are unrelated, and no kinship matrix is used, then run times should be relatively fast (tens of minutes). If a mixed model is used, for example using an empirical kinship, then in samples of a few thousand rare-metal-worker should take less than 20 minutes to complete. In larger samples (~10,000 or more with phenotype data) it can take several days to complete an exome-chip-wide scan.
+
Run times depend heavily on the type of analysis. If all samples are unrelated, and no kinship matrix is used, then run times should be relatively fast (tens of minutes). If a mixed model is used, for example using an empirical kinship, then in samples of a few thousand, Rare-Metal-Worker should take less than 20 minutes to complete. In larger samples, especially of related individuals (~10,000 or more with phenotype data), it can take several days to complete an exome-chip-wide scan.
    
'''Submitting Results for Meta-Analysis'''
 
'''Submitting Results for Meta-Analysis'''
   −
All output files from Rare-Metal-Worker can then be uploaded to an sftp server at the University of Michigan for central analysis -- please email [mailto:svrieze@umich.edu Scott Vrieze] for the hostname, username, and password.
+
All output files from Rare-Metal-Worker can then be uploaded to an sftp server at the University of Michigan for central analysis -- please email [mailto:svrieze@umich.edu Scott Vrieze] for the hostname, username, and password. One site used Aspera to transmit results, which worked well.
    
=== Stage 2: Single-Variant and Gene-Based Meta-Analysis ===
 
=== Stage 2: Single-Variant and Gene-Based Meta-Analysis ===