Difference between revisions of "METAL Documentation"

From Genome Analysis Wiki
 
(56 intermediate revisions by 5 users not shown)
Line 1: Line 1:
==METAL==
+
== Useful Wiki Pages ==
'''Goncalo Abecasis, Yun Li and Cristen Willer, 2007'''
 
  
METAL is a tool for performing meta-analysis of p-values from two or more individual studies.  Metal creates a single summary p-value from studies which could not be analyzed together because of differences in ethnicity, phenotype distribution, gender, inability to share individual-level data, or any other reason.
+
There are a few pages in this Wiki that may be useful to METAL users. Here are links to key pages:
  
For each marker, an arbitrary reference allele is selected and a z-statistic characterizing the evidence for association is used as input. The z-statistic summarizes the magnitude and the direction of effect relative to the reference allele.  An overall z-statistic and p-value are then calculated from the weighted average of the individual statistics.  Weights are proportional to the square-root of the number of individuals examined in each sample and selected such that the squared weights sum to 1.0. If a sample contains related individuals, a smaller ‘effective’ population size may be used, but simulations suggest that modest changes in the effective sample size have very little impact on the final p-value.
+
* The [[METAL Program|METAL Home Page]]
  
===Usage instructions===
+
* The [[METAL Quick Start|METAL Quick Start Tutorial]]
  
METAL is invoked with the command ‘metal’ and allows for analysis to be performed interactively.  A convenient alternative is to save all commands into a single text file which can be provided as input.  An example is at the bottom of this document.
+
* The [[METAL FAQ]]
  
METAL allows for a variety of tabular formats in the input files, but the following information must be provided for each marker in each file;
+
* The [[METAL Command Reference]]
  
There are a number of useful commands related to the analysis that are typically set early in the analysis.  For example, the user can choose to weight studies in the meta-analysis using the inverse of the standard error, or the square root of the sample size.  These are proportionate.  Users should be cautious when weighting based on standard error that the beta and standard error are in the same units for all studies (i.e. same trait and same transformation applied to the trait).  The default weighting scheme is SAMPLESIZE.
+
== History ==
SCHEME STDERR
 
  
METAL has an option to perform genomic control correction to all input files.  METAL will estimate the inflation of the test statistic by comparing the median test statistic to that expected by chance, and then apply the genomic control correction to the p-values (for SAMPLESIZE weighted meta-analysis), or the standard error (for STDERR weighted meta-analysis). This should only be applied to files with whole genome data (i.e. should not be used for cohorts that only performed genotyping of replication SNPs). Genomic control can be turned off and on for different input files.  We recommend applying genomic control correction to all input files, and also to the final output by loading the initial results file into METAL to perform genomic control correction on the final results.
+
METAL was developed by Goncalo Abecasis, Yun Li and Cristen Willer ([http://www.sph.umich.edu/csg/abecasis/publications/pdf/Bioinformatics.vol.26-pp.2190.pdf manuscript available here]). The first version was developed in 2007 and was used for the analyses presented in [http://www.sph.umich.edu/csg/abecasis/publications/18193045.html Sanna et al (2008)] and [http://www.sph.umich.edu/csg/abecasis/publications/18193043.html Willer et al (2008)]. Since then, it has become quite a popular tool for the analysis of genomewide association scans.
GENOMICCONTROL ON
 
  
METAL will optionally keep track of the effect allele frequency across all files and provide the mean, minimum and maximum.  This can be quite useful to determine whether the frequencies are similar across different cohorts after METAL performs all strand alignment.  METAL requires all input files to have an allele frequency column when this feature is turned on.
+
== Brief Description ==
AVERAGEFREQ ON
 
MINMAXFREQ ON
 
  
Then, for each individual file, the following command will be used;
+
METAL is a tool for meta-analysis of genomewide association scans. METAL can combine either (a) test statistics and standard errors or (b) p-values across studies (taking sample size and direction of effect into account).  METAL analysis is a convenient alternative to a direct analysis of merged data from multiple studies. It is especially appropriate when data from the individual studies cannot be analyzed together because of differences in ethnicity, phenotype distribution or gender, or because of constraints imposed on sharing individual level data. Meta-analysis results in little or no loss of efficiency compared to analysis of a combined dataset including data from all individual studies.
FREQLABEL EffectAlleleFrequencyColumnHeading
 
  
We allow users to keep cumulative counts of custom variables across input files.  An example of this might be to keep track of the sample size when performing standard-error weighted meta-analysis.  The name of the custom variable should be defined once, before input files are loaded.  The name of the heading in each file can be specified using the command LABEL for each file.
+
== Approach ==
  
CUSTOMVARIABLE TotalSampleSize
+
One of the most common questions we receive is about the approach used by METAL to carry out a meta-analysis using p-values as input. The process is actually quite simple! First, for each marker, a reference allele is selected and a z-statistic characterizing the evidence for association is calculated. The z-statistic summarizes the magnitude and the direction of effect relative to the reference allele, and all studies are aligned to the same reference allele.  Next, an overall z-statistic and p-value are calculated from a weighted sum of the individual statistics. Weights are proportional to the square-root of the number of individuals examined in each sample and selected such that the squared weights sum to 1.0. For samples that contain related individuals, a smaller ‘effective’ sample size may be used, but simulations suggest that modest changes in the effective sample size have very little impact on the final p-value.
For each individual input file;
 
LABEL TotalSampleSize as N
 
  
We allow flexible input formats, including a method for providing SNPs on different strands.  Input files can contain a column which can indicate which strand the alleles are coded on (given as +/-).  This feature can be turned on and off for different files in the same analysis.  If USESTRAND is off, the strand is assumed to be “+” for all SNPs, although obvious strand problems for unambiguous SNPs are identified by METAL and appropriately handled (i.e. one study provides A/G alleles and a different study provides C/T alleles)
+
== Basic Usage Instructions ==
  
USESTRAND ON
+
METAL is a command line tool. It is typically run from a Linux, Unix or DOS prompt by invoking the command <code>metal</code>. Analyses can be run interactively or a simple script can be provided as input. Interactive analyses are usually convenient when learning how to use METAL, whereas the scripting approach is preferred for production use (as it allows analyses to be conveniently repeated).  An example METAL script is included at the bottom of this page.
For each individual file;
 
STRAND StrandColumnHeading
 
  
METAL allows for complete output of individual summary statistics for all SNPs in all input files.  This can create a very large file and should be used with caution.  Users should create custom variables to restrict analyses to significant SNPs or specific SNPs of interest before using this option.  However, this option can be useful for comparing direction of effect across many studies since METAL takes care of all the strand flipping and provides the direction of effect relative to the same allele.  This is also a way to double-check that the expected data are being used appropriately by METAL.
+
METAL has lots of options and here we have listed some common ones that, hopefully, will help you get started.  
VERBOSE ON
 
  
Another option allows METAL to check the appropriate number of columns exist for each input file, or allows METAL to ignore situations when there are not enough columns.  The default is STRICT column counting.
+
=== Help! ===
COLUMNCOUNTING LENIENT
 
  
 +
Issuing the <code>HELP</code> command lists all available commands and the current settings for each option. The list of all available commands is also available in the [[METAL Command Reference]].
  
Mandatory input for each input file;
+
=== Input File Separators ===
* Marker name
 
* Reference allele (also known as the ‘effect allele’) and the non-reference allele
 
* P-value
 
* Weight (sample size or standard error)
 
* Direction of effect relative to reference allele
 
  
Tables must have column headers that specify where the mandatory input can be found. The default name for the Marker column is ‘MARKER’, but can be changed to match the relevant input file column with the following command;
+
METAL expects that each set of results will be summarized in a table. This table must be stored in a text file but otherwise METAL is quite flexible about details such as column separators, column headers and the like. This does mean that an essential bit of information needed before any meta-analysis is a description of each input file.
  
MARKER SNP
+
The first thing you should specify is the column separator. By default, METAL assumes columns are separated by whitespace (which consists of any combination of space and tab characters). You can also specify:
  
Similarly, the reference allele column, P-value column and effect column can be changed to match the input file;
+
  SEPARATOR  WHITESPACE    - the default
 +
  SEPARATOR  COMMA        - for comma delimited files that are popular in some platforms
 +
  SEPARATOR  TAB          - columns separated by a single tab, so that consecutive tabs indicate an empty column
  
ALLELE RefAlleleColumnHeading NonRefAlleleColumnHeading
+
=== Input File Columns ===
PVALUE PvalueColumnHeading
 
EFFECT EffectColumnHeading
 
  
We strongly recommend that both allele labels, corresponding to the effect allele and non-effect allele, respectively, are given for all SNPs.  Alleles can be numeric (1,2,3,4) or alphabetical (A,C,G,T,a,c,g,t) and can be on either strand if not an A/T or C/G SNP.  For A/T or C/G SNPs, METAL requires SNPs to be on a consistent strand in different input files for the results to be interpretable.  For A/C, A/G, C/T, and G/T SNPs, METAL will flip the strand the alleles are on if not consistent between input files and METAL will output results with respect to the lowest numeric reference allele (see Examples 1, 2, and 3, below).  If all files are consistent (for example, using the HapMap allele naming conventions), the strand of the alleles is left alone. As long as both allele columns are given for each input file, METAL appropriately accounts for situations when different input files use different reference alleles.
+
Each input file should include the following information:
  
P-values of 0 or any other non-numeric value are assumed to be missing.  Missing values are tolerated and a meta-analysis p-value will include results from any input file with non-missing values, even if only one input file has a p-value for this marker (see MINWEIGHT below for exclusion of markers with a small combined N).
+
* A column with marker name, which should be consistent across studies
 +
* A column indicating the tested allele
 +
* A column indicating the other allele
  
The EFFECT column can have positive and negative values (beta values from regression, for example), or simply directions of effect relative to the reference allele, listed as “+” and “-”.  An EFFECT of “+” (or any positive number) with respect to the reference allele A (or effect allele A), for example, represents a case where an increasing number of copies of allele A is correlated with increasing trait values.
+
If you are carrying out a sample size weighted analysis (based on p-values), you will also need:
  
To perform odds-ratio based meta-analysis, select SCHEME STDERR at the beginning of the script.  Then, for each file, provide the natural log of the odds ratio as the EFFECT column;
+
* A column indicating the direction of effect for the tested allele
EFFECT logOddsRatioColumnHeading
+
* A column indicating the corresponding p-value
 +
* An optional column indicating the sample size (if the sample size varies by marker)
  
Or, METAL can compute the log of the odds ratio for you;
+
If you are carrying out a meta-analysis based on standard errors, you will need:
EFFECT log(OddsRatioColumnHeading)
 
  
The weight for each MARKER can be assigned using a column;
+
* A column indicating the estimated effect size for each marker
WEIGHTLABEL SampleSizeColumnHeading
+
* A column indicating the standard error of this effect size estimate
  
Or;
+
The header for each of these columns must be specified so that METAL knows how to interpret the data. As noted below, additional columns including allele frequency information, strand information, and others can also be present.
WEIGHT SampleSizeColumnHeading
 
  
Or the default weight for the entire file can be specified with the following command;
+
Here is a typical set of commands that would describe a table where the headers SNP, RefAllele, NonRefAllele, P-value and Effect correspond to the MARKER, ALLELE 1 and 2, PVALUE and EFFECT columns:
E.g., if you have a sample size of 2000 for all markers in an input file
 
DEFAULTWEIGHT 2000
 
  
The default delimiter in METAL is WHITESPACE (any space or tab is considered a delimiter) but can be changed to comma, tab or space.
+
  MARKERLABEL  SNP
 +
  ALLELELABELS  RefAllele NonRefAllele
 +
  PVALUELABEL  P-value
 +
  EFFECTLABEL  Effect
 +
 
 +
These can be abbreviated as:
 +
 
 +
  MARKER        SNP
 +
  ALLELE        RefAllele NonRefAllele
 +
  PVALUE        P-value
 +
  EFFECT        Effect
 +
 
 +
=== Specifying Weights in P-value Based Analysis ===
 +
 
 +
The weight for each MARKER can be stored in a column in the table (specified with the <code>WEIGHTLABEL</code> or <code>WEIGHT</code> commands). Most commonly, the weight will be the number of individuals contributing to that particular p-value.
 +
 
 +
  WEIGHTLABEL    N
 +
 
 +
Alternatively, the same weight can be used for all markers in an input file (in which case the fixed weight can be set with the <code>DEFAULTWEIGHT</code> command).  The WEIGHTLABEL command takes precedence over the DEFAULTWEIGHT command, so for the default weight to be used, the weight column label in effect must not match any column in the input file.
 +
 
 +
  WEIGHTLABEL    DONTUSECOLUMN
 +
  DEFAULTWEIGHT  1000
 +
 
 +
=== Reading Each Input File ===
 +
 
 +
Once all appropriate headers have been specified, issuing the <code>PROCESS</code> command will read an input file and update summary statistics to take the results it contains into account. Thus:
 +
 
 +
  PROCESS      study1-results.tbl
 +
 
 +
=== Performing the Final Analysis ===
 +
 
 +
Once all input files have been processed, simply issue the <code>ANALYZE</code> command to execute a meta-analysis. If you'd like to execute an interim analysis that includes only a subset of the studies, issue the ANALYZE command after the corresponding input files have been processed.
 +
 
 +
  ANALYZE
 +
 
 +
To allow for heterogeneity, use the <code>ANALYZE HETEROGENEITY</code> command. This command will take a little longer to run, because it requires each input file to be examined twice. The METAL heterogeneity analysis requires a second pass of analysis to decide whether observed effect sizes (or test statistics) are homogeneous across samples.  The resulting heterogeneity statistic has n-1 degrees of freedom for n samples.
 +
 
 +
  ANALYZE HETEROGENEITY
 +
 
 +
METAL does not require that all input files report a result for every marker.  Any available data is used.  To restrict the output to only markers that have at least a specific number of individuals analysed (or weight), use a command like the following:
 +
 
 +
  MINWEIGHT 10000
 +
 
 +
For example, the command above restricts the output to markers with a total sample size of at least 10,000 individuals.
 +
 
 +
== Additional Analysis Options ==
 +
 
 +
=== Selecting an Analysis Scheme ===
 +
 
 +
  SCHEME SAMPLESIZE        - default approach, uses p-value and direction of effect, weighted according to sample size
 +
  SCHEME STDERR            - classical approach, uses effect size estimates and standard errors
 +
  STDERR SE                - specify the label for the standard error column.
 +
 
 +
By default, METAL combines p-values across studies taking into account a study specific weight (typically, the sample size) and direction of effect. This behavior can be requested explicitly with the <code>SCHEME SAMPLESIZE</code> command. An alternative can be requested with the <code>SCHEME STDERR</code> command and weights effect size estimates using the inverse of the corresponding standard errors. To enable this option, you will also need to specify which of your input columns contains standard error information using the <code>STDERRLABEL</code> command (or <code>STDERR</code> for short). While standard error based weights are more common in the biostatistical literature, if you decide to use this approach, it is very important to ensure that effect size estimates (''beta'' coefficients) and standard errors use the same units in all studies (i.e. make sure that the exact same trait was examined in each study and that the same transformations were applied). Inconsistent use of measurement units across studies is the most common cause of discrepancies between these two analysis strategies.
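
The standard error based scheme can be illustrated with a short Python sketch. This is not METAL's own code: it assumes the classical fixed-effects convention in which each estimate is weighted by the inverse of its squared standard error, and that all effects have already been aligned to the same reference allele. The function name <code>inverse_variance_meta</code> is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def inverse_variance_meta(estimates):
    """Combine (beta, standard_error) pairs from several studies.

    Assumption: weights are 1 / SE^2 (inverse variance), and every beta
    is reported relative to the same reference allele and in the same units.
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    weight_sum = sum(weights)
    # Weighted average of the study-specific effect sizes
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / weight_sum
    # Standard error of the combined estimate
    se = sqrt(1.0 / weight_sum)
    z = beta / se
    # Two-sided p-value from the standard normal distribution
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, z, p


# Two hypothetical studies with equal precision
beta, se, z, p = inverse_variance_meta([(0.10, 0.05), (0.20, 0.05)])
print(round(beta, 4), round(se, 4), round(z, 3))
```

Because the weights depend directly on the reported standard errors, a study that measured the trait on a different scale would distort the combined estimate, which is why consistent units matter for this scheme.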
 +
 
 +
=== Genomic Control Correction ===
 +
 
 +
  GENOMICCONTROL OFF      - the default, no adjustment to test statistics
 +
  GENOMICCONTROL ON      - automatically correct test statistics to account for small amounts of population stratification or unaccounted for relatedness
 +
  GENOMICCONTROL [value]  - correct test statistics using the specified inflation factor
 +
 
 +
METAL has the ability to apply a genomic control correction to all input files.  METAL will estimate the inflation of the test statistic by comparing the median test statistic to that expected by chance, and then apply the genomic control correction to the p-values (for SAMPLESIZE weighted meta-analysis) or the standard error (for STDERR weighted meta-analysis).  This should only be applied to files with whole genome data (i.e. should not be used for settings where results are only available for a candidate locus or a small number of SNPs selected for follow-up of [[GWAS]] results). Genomic control settings can be customized for each input file.  We recommend applying genomic control correction to all input files that include genomewide data and, in addition, to the meta-analysis results. To apply genomic control to the meta-analysis results, just perform an initial meta-analysis and then load the initial set of results into METAL to get final, genomic control adjusted results.
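
The idea behind the correction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not METAL's implementation: test statistics are taken to be z-scores, the expected median is that of a chi-square distribution with 1 degree of freedom, and inflation factors below 1.0 are left uncorrected (an assumption of this sketch). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# computed as the squared upper quartile of the standard normal distribution.
EXPECTED_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(z_scores):
    """Ratio of the observed median chi-square statistic to its expectation."""
    return median([z * z for z in z_scores]) / EXPECTED_MEDIAN


def adjust_z(z_scores, lam):
    """Shrink z-scores (p-value based analysis); assumes no change if lam < 1."""
    scale = sqrt(max(lam, 1.0))
    return [z / scale for z in z_scores]


def adjust_se(std_errors, lam):
    """Inflate standard errors (standard error based analysis)."""
    scale = sqrt(max(lam, 1.0))
    return [se * scale for se in std_errors]
```

With only a handful of follow-up SNPs the median is dominated by true associations rather than by background inflation, which is why the correction is only meaningful for genomewide input.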
 +
 
 +
=== Sample Overlap Correction ===
 +
 
 +
METAL can correct for sample overlap in sample size weighted meta-analysis (method developed by Sebanti Sengupta and implemented by Daniel Taliun).
 +
 
 +
First, METAL estimates the number of individuals that are common among two or more studies based on Z-statistics from each study. Then, METAL adjusts for sample overlap when calculating overall Z-statistics by correcting the weights with the estimated number of individuals in common.
 +
 
 +
To enable correction for sample overlap in your sample size weighted meta-analysis, use the OVERLAP ON command (valid only with SCHEME SAMPLESIZE). By default, METAL uses Z-statistics <1 for estimating the number of individuals that are common among studies. To change this threshold, use the ZCUTOFF [number] command.
 +
 
 +
More information on the method can be found in:
 +
 
 +
* [[media:METAL_sample_overlap_2017-11-15.pptx|Method overview and results]]
 +
* [[media:METAL_sample_overlap_method_2017-11-15.pdf|Full method description]] (current draft, manuscript est. 2018)
 +
 
 +
=== Strand Information ===
 +
 
 +
  USESTRAND  ON
 +
  STRANDLABEL StrandColumnHeading
 +
 
 +
Input files can contain a column that indicates which strand the alleles are coded on (given as +/-).  If this column is present, you should issue the <code>USESTRAND ON</code> command and specify an appropriate header with the <code>STRANDLABEL</code> command.  If USESTRAND is off, the strand is assumed to be “+” for all SNPs, although obvious strand problems are identified by METAL and appropriately handled (for example, when one study provides A/G alleles and a different study provides C/T alleles).
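
The kind of allele alignment described here can be sketched in Python. This is a simplified illustration, not METAL's actual algorithm: it assumes that no strand column is used, that alleles are single bases (or the numeric codes A=1, C=2, G=3, T=4), and that A/T and C/G SNPs are on the same strand in every file. The function names are invented for this example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
NUMERIC = {"1": "A", "2": "C", "3": "G", "4": "T"}


def normalize(allele):
    """Upper-case an allele label and translate numeric codes to letters."""
    allele = allele.upper()
    return NUMERIC.get(allele, allele)


def align_effect(ref, other, effect, target_ref, target_other):
    """Return the effect relative to target_ref, or None if alleles disagree."""
    ref, other = normalize(ref), normalize(other)
    target = (normalize(target_ref), normalize(target_other))
    # First try the alleles as given, then their strand-flipped complements
    candidates = [(ref, other)]
    if ref in COMPLEMENT and other in COMPLEMENT:
        candidates.append((COMPLEMENT[ref], COMPLEMENT[other]))
    for a, b in candidates:
        if (a, b) == target:
            return effect       # same reference allele
        if (b, a) == target:
            return -effect      # reference allele swapped, so flip the sign
    return None                 # e.g. A/G in one study, A/C in another


# A study reporting G/T with a negative effect, aligned to a/c
print(align_effect("G", "T", -1.0, "a", "c"))
```

For an A/T or C/G SNP the complement pair is just the swapped pair, so a strand flip cannot be distinguished from a reference allele swap without a strand column; the sketch simply assumes the same strand in that case.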
 +
 
 +
=== Filtering ===
 +
 
 +
Custom filters can be used to select SNPs for inclusion in the meta-analysis.  This can be used, for example, to select SNPs within a specified minor-allele frequency range for analysis.
 +
 
 +
Here are some possible filters:
 +
 
 +
  ADDFILTER N > 1000
 +
  ADDFILTER MAF > 0.01
 +
 
 +
Together, these two filters would only consider entries where the value in the N column is greater than 1000 and the value in the MAF column is also greater than 0.01.
 +
 
 +
Filters can be defined using the <, >, <=, >=, =, != and IN operators. The IN operator tests membership in a set. For example, to restrict analysis to three interesting SNPs, use (''note the absence of spaces in the list of SNPs''):
 +
 
 +
  ADDFILTER MARKER_ID IN (rs1234,rs123456,rs123)
 +
 
 +
To remove all previously defined filters, use the command:
 +
 
 +
  REMOVEFILTERS
 +
 
 +
=== Verbose Mode ===
 +
 
 +
  VERBOSE ON
 +
 
 +
METAL allows for complete output of individual summary statistics for all SNPs in all input files.  This can create a very large file and should be used with caution.  Typically, one should create custom filters to restrict analyses to SNPs of interest before using this option.  This option can be useful for comparing direction of effect across many studies since METAL takes care of all the strand flipping and provides the direction of effect relative to the same allele.  This is also a way to double-check that the expected data are being used appropriately by METAL.
 +
 
 +
=== Lenient Mode ===
 
   
 
   
SEPARATOR commas
+
    COLUMNCOUNTING STRICT        - requires expected number of columns in every row
 +
    COLUMNCOUNTING LENIENT        - tries to interpret rows with fewer columns than expected
 +
 
 +
By default, METAL will skip lines in each input file that don't have the expected number of columns. This is usually a good idea because it avoids producing incorrect results when a column is missing. Sometimes (for example, when there are optional extra columns at the end of each line), the <code>COLUMNCOUNTING LENIENT</code> option can be useful.
 +
 
 +
=== Tracking Allele Frequencies ===
 +
 
 +
  AVERAGEFREQ ON
 +
  MINMAXFREQ ON
  
Custom-designed filters can be used to select SNPs for inclusion in the meta-analysis.  This can be used to select SNPs above or below a certain value (> or <) from any column in the table, which can be useful for including SNPs with a minor allele frequency above a certain threshold.
+
METAL can optionally track the effect allele frequency across all files and report the mean, minimum and maximum effect allele frequency.  These can be quite useful to check that allele frequencies are similar across different cohorts after METAL performs all strand alignment. Large differences in allele frequencies across studies can suggest inconsistent naming of reference alleles across studies. METAL requires all input files to have an allele frequency column when this feature is turned on. To specify the column header for allele frequency information, use the <code>FREQLABEL</code> command.
FILTER N > 1000
 
CUSTOMVARIABLE MAF
 
LABEL MAF as MAF
 
FILTER MAF > 0.01
 
  
To remove filters so that they no longer apply to files processed later, use;
+
=== Custom Variables ===
REMOVEFILTERS
 
  
Once the appropriate WEIGHT, MARKER, PVALUE and EFFECT labels are defined, with or without optional parameters to set the FREQLABEL, DELIMITER, STRAND, FILTER and LABEL commands, load an input file;
+
We allow users to keep cumulative counts of custom variables across input files.  An example of this might be to keep track of the sample size when performing standard-error weighted meta-analysis.  The name of the custom variable should be defined once, before input files are loaded.  The corresponding column label in each input file can be specified using the <code>LABEL</code> command. For example, to create a custom variable labeled TotalSampleSize that tallies the total of the N column across files, one could issue the commands:
PROCESS firstinputfile_bmi.txt
 
  
METAL does not require that all input files have a p-value result to calculate a meta-analysis p-value.  Any available data is used.  To restrict the output to only markers that have at least a specific weight (number of individuals), then use;
+
  CUSTOMVARIABLE TotalSampleSize
> MINWEIGHT 10000
+
  LABEL TotalSampleSize as N
For example, the command above restricts the output to markers with at least 10,000 individuals.
 
  
Once all input files have had their column names defined and been loaded, then define your output filename (optional) and analyze!
+
If needed, the <code>LABEL</code> command can be used multiple times to customize column headers for each input file.
OUTPUTFILE myoutputfilename
 
ANALYZE
 
  
METAL can also evaluate the evidence for heterogeneity.  When you do this, METAL will do a second pass of analysis to decide whether observed effect sizes (or test statistics) are homogeneous across samples.  This will result in a test statistic (with n-1) degrees of freedom for n samples.
+
=== Input File Recommendations ===
  
==ANALYZE HETEROGENEITY==
+
We strongly recommend that both allele labels, corresponding to the effect allele and non-effect allele, be provided for all SNPs.  As long as both allele columns are given for each input file, METAL appropriately accounts for situations when different input files use different reference alleles. Alleles can be coded numerically (A=1,C=2,G=3,T=4) or alphabetically (A,C,G,T,a,c,g,t) and can be on either strand if not an A/T or C/G SNP.  For A/T or C/G SNPs, METAL requires SNPs to be on a consistent strand in different input files for the results to be interpretable.  For other SNPs, METAL can automatically identify and resolve strand inconsistencies.
  
Example 1; Strand flips required
+
P-values that are &lt; 0.0, &gt; 1.0 or non-numeric will be treated as missing and generate a warning. 
ALLELES EFFECT ALLELES Analyzed EFFECT Analyzed
 
Input file 1 T/G + a/c +
 
Input file 2 T/G + a/c +
 
Input file 3 A/C + a/c +
 
Output a/c +
 
  
Example 2; Reference allele flips required
+
The EFFECT column can have positive and negative values (beta values from regression, for example), or simply directions of effect relative to the reference allele, listed as “+” and “-”.  An EFFECT of “+” (or any positive number) with respect to the reference allele A (or effect allele A), for example, represents a case where an increasing number of copies of allele A is correlated with increasing trait values. For discrete traits, it is common to report odds ratios, which are always positive. In this case, to calculate the direction of effect, one should look at the log of the odds ratio. METAL can compute the log of the odds ratio for you if you specify <code>EFFECT log(ODDS_RATIO_COLUMN)</code>.
ALLELES EFFECT ALLELES Analyzed EFFECT Analyzed
 
Input file 1 C/A - a/c +
 
Input file 2 C/A - a/c +
 
Input file 3 A/C + a/c +
 
Output a/c +
 
  
Example 3; Strand flips, numeric flips, and reference allele flips required
+
To perform odds-ratio based meta-analysis, select SCHEME STDERR at the beginning of the script.  Then, for each file, provide the natural log of the odds ratio as the EFFECT column or another appropriate statistic (such as the corresponding regression coefficient from a logistic regression analysis).
ALLELES EFFECT ALLELES Analyzed EFFECT Analyzed
 
Input file 1 G/T - a/c +
 
Input file 2 2/1 - a/c +
 
Input file 3 A/C + a/c +
 
Output a/c +
 
  
+
== Example: A METAL Meta-Analysis Script ==
Example text file to run metal;
 
  
# THIS FILE EXECUTES AN ANALYSIS OF ALL AVAILABLE INFORMATION
+
<pre>
 +
#THIS SCRIPT EXECUTES AN ANALYSIS OF EIGHT STUDIES
 +
#THE RESULTS FOR EACH STUDY ARE STORED IN FILES Inputfile1.txt THROUGH Inputfile8.txt
  
mkdir output-metal
+
#LOAD THE FIRST FOUR INPUT FILES
  
metal << EOT
+
# UNCOMMENT THE NEXT LINE TO ENABLE GenomicControl CORRECTION
 +
# GENOMICCONTROL ON
  
# loading in the first half of inputfiles
+
# === DESCRIBE AND PROCESS THE FIRST INPUT FILE ===
MARKER SNP
+
MARKER SNP
ALLELE REF_ALLELE OTHER_ALLELE
+
ALLELE REF_ALLELE OTHER_ALLELE
EFFECT BETA
+
EFFECT BETA
WEIGHT N
+
PVALUE PVALUE
PVALUE  PVALUE
+
WEIGHT N
 
PROCESS inputfile1.txt
 
PROCESS inputfile1.txt
PROCESS inputfile2.txt
+
 
PVALUE  pvalue
+
# === THE SECOND INPUT FILE HAS THE SAME FORMAT AND CAN BE PROCESSED IMMEDIATELY ===
ALLELE A_REF OTHER_ALLELE
+
PROCESS inputfile2.txt
MARKER  SNP
+
 
EFFECT BETA
+
# === DESCRIBE AND PROCESS THE THIRD INPUT FILE ===
WEIGHT N
+
MARKER SNP
 +
ALLELE A_REF OTHER_ALLELE
 +
EFFECT BETA
 +
PVALUE pvalue
 +
WEIGHT N
 
PROCESS inputfile3.txt
 
PROCESS inputfile3.txt
MARKER MARKERNAME
+
 
ALLELE EFFECTALLELE NON_EFFECT_ALLELE
+
</pre>

Latest revision as of 15:52, 22 December 2017

Useful Wiki Pages

There are a few pages in this Wiki that may be useful to METAL users. Here are links to key pages:

History

METAL was developed by Goncalo Abecasis, Yun Li and Cristen Willer (manuscript available here). The first version was developed in 2007 and was used for the analyses presented in Sanna et al (2008) and Willer et al (2008). Since then, it has become quite a popular tool for the analysis of genomewide association scans.

Brief Description

METAL is a tool for meta-analysis of genomewide association scans. METAL can combine either (a) test statistics and standard errors or (b) p-values across studies (taking sample size and direction of effect into account). METAL analysis is a convenient alternative to a direct analysis of merged data from multiple studies. It is especially appropriate when data from the individual studies cannot be analyzed together because of differences in ethnicity, phenotype distribution or gender, or because of constraints on the sharing of individual-level data. Meta-analysis results in little or no loss of efficiency compared to analysis of a combined dataset including data from all individual studies.

Approach

One of the most common questions we receive is about the approach used by METAL to carry out a meta-analysis using p-values as input. The process is actually quite simple! First, for each marker, a reference allele is selected and a z-statistic characterizing the evidence for association is calculated. The z-statistic summarizes the magnitude and the direction of effect relative to the reference allele, and all studies are aligned to the same reference allele. Next, an overall z-statistic and p-value are calculated from a weighted sum of the individual statistics. Weights are proportional to the square-root of the number of individuals examined in each sample and selected such that the squared weights sum to 1.0. For samples that contain related individuals, a smaller ‘effective’ sample size may be used, but simulations suggest that modest changes in the effective sample size have very little impact on the final p-value.
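The calculation described above can be sketched in a few lines of Python. This is an illustration only, not METAL's source code: the function names are hypothetical, p-values are assumed to be two-sided and strictly between 0 and 1, and each direction of effect is assumed to be already aligned to the same reference allele.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution

def p_to_z(p_value, direction):
    """Convert a two-sided p-value and a direction (+1 or -1) to a signed z-statistic."""
    return direction * _STD_NORMAL.inv_cdf(1.0 - p_value / 2.0)

def combine_sample_size_weighted(studies):
    """Combine (p_value, direction, sample_size) tuples for one marker.

    Weights are sqrt(sample_size), rescaled so the squared weights sum to 1.0.
    Returns the overall z-statistic and its two-sided p-value.
    """
    weights = [sqrt(n) for _, _, n in studies]
    z_scores = [p_to_z(p, d) for p, d, _ in studies]
    norm = sqrt(sum(w * w for w in weights))
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / norm
    p = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))
    return z, p

# Hypothetical example: three studies, the third with an opposite direction of effect.
z, p = combine_sample_size_weighted([(0.01, +1, 2000), (0.20, +1, 500), (0.60, -1, 800)])
```

Because the weights are rescaled, only the relative sample sizes matter, which is consistent with the observation that modest changes in effective sample size have little impact on the final p-value.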

Basic Usage Instructions

METAL is a command line tool. It is typically run from a Linux, Unix or DOS prompt by invoking the command metal. Analyses can be run interactively or a simple script can be provided as input. Interactive analyses are usually convenient when learning how to use METAL, whereas the scripting approach is preferred for production use (as it allows analyses to be conveniently repeated). An example METAL script is included at the bottom of this page.

METAL has lots of options and here we have listed some common ones that, hopefully, will help you get started.

Help!

Issuing the HELP command lists all available commands and the current settings for each option. The list of all available commands is also available in the METAL Command Reference.

Input File Separators

METAL expects that each set of results will be summarized in a table. This table must be stored in a text file but otherwise METAL is quite flexible about details such as column separators, column headers and the like. This does mean that an essential bit of information needed before any meta-analysis is a description of each input file.

The first thing you should specify is the column separator. By default, METAL assumes columns are separated by whitespace (which consists of any combination of space and tab characters). You can also specify:

  SEPARATOR  WHITESPACE    - the default
  SEPARATOR  COMMA         - for comma delimited files that are popular in some platforms
  SEPARATOR  TAB           - columns separated by a single tab, so that consecutive tabs indicate an empty column

Input File Columns

Each input file should include the following information:

  • A column with marker name, which should be consistent across studies
  • A column indicating the tested allele
  • A column indicating the other allele

If you are carrying out a sample size weighted analysis (based on p-values), you will also need:

  • A column indicating the direction of effect for the tested allele
  • A column indicating the corresponding p-value
  • An optional column indicating the sample size (if the sample size varies by marker)

If you are carrying out a meta-analysis based on standard errors, you will need:

  • A column indicating the estimated effect size for each marker
  • A column indicating the standard error of this effect size estimate

The header for each of these columns must be specified so that METAL knows how to interpret the data. As noted below, additional columns including allele frequency information, strand information, and others can also be present.

Here is a typical set of commands that would describe a table where the headers SNP, RefAllele, NonRefAllele, P-value and Effect correspond to the MARKER, ALLELE 1 and 2, PVALUE and EFFECT columns:

 MARKERLABEL   SNP
 ALLELELABELS  RefAllele NonRefAllele
 PVALUELABEL   P-value
 EFFECTLABEL   Effect

These can be abbreviated as:

 MARKER        SNP
 ALLELE        RefAllele NonRefAllele
 PVALUE        P-value
 EFFECT        Effect

Specifying Weights in P-value Based Analysis

The weight for each MARKER can be stored in a column in the table (specified with the WEIGHTLABEL or WEIGHT commands). Most commonly, the weight will be the number of individuals contributing to that particular p-value.

 WEIGHTLABEL     N

Alternatively, the same weight can be used for all markers in an input file, in which case the fixed weight is set with the DEFAULTWEIGHT command. The WEIGHTLABEL command takes precedence over the DEFAULTWEIGHT command, so for the default weight to apply, the weight column label in use must not match any column in the input file.

 WEIGHTLABEL     DONTUSECOLUMN
 DEFAULTWEIGHT   1000

Reading Each Input File

Once all appropriate headers have been specified, issuing the PROCESS command will read an input file and update summary statistics to take the results it contains into account. Thus:

 PROCESS      study1-results.tbl

Performing the Final Analysis

Once all input files have been processed, simply issue the ANALYZE command to execute a meta-analysis. If you'd like to execute an interim analysis that includes only a subset of the studies, issue the ANALYZE command after the corresponding input files have been processed.

 ANALYZE

To test for heterogeneity, use the ANALYZE HETEROGENEITY command. This command takes a little longer to run because each input file must be examined twice: a second pass is needed to decide whether observed effect sizes (or test statistics) are homogeneous across samples. The resulting heterogeneity statistic has n-1 degrees of freedom for n samples.

 ANALYZE HETEROGENEITY

METAL does not require that all input files report a result for every marker. Any available data is used. To restrict the output to only markers that have at least a specific number of individuals analysed (or weight), use a command like the following:

 MINWEIGHT 10000

The command above restricts the output to markers with a total sample size (or weight) of at least 10,000 individuals.

Additional Analysis Options

Selecting an Analysis Scheme

 SCHEME SAMPLESIZE        - default approach, uses p-value and direction of effect, weighted according to sample size
 SCHEME STDERR            - classical approach, uses effect size estimates and standard errors
 STDERR SE                - specify the label for the standard error column.

By default, METAL combines p-values across studies taking into account a study specific weight (typically, the sample size) and direction of effect. This behavior can be requested explicitly with the SCHEME SAMPLESIZE command. An alternative can be requested with the SCHEME STDERR command, which weights effect size estimates using the inverse of the corresponding squared standard errors (inverse variance weights). To enable this option, you will also need to specify which of your input columns contains standard error information using the STDERRLABEL command (or STDERR for short). While standard error based weights are more common in the biostatistical literature, if you decide to use this approach, it is very important to ensure that effect size estimates (beta coefficients) and standard errors use the same units in all studies (i.e. make sure that the exact same trait was examined in each study and that the same transformations were applied). Inconsistent use of measurement units across studies is the most common cause of discrepancies between these two analysis strategies.
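For comparison with the sample size scheme, here is a minimal Python sketch of a standard error based combination. It assumes the usual fixed-effects inverse-variance weighting, in which each estimate is weighted by 1/SE²; the function name is hypothetical and the code illustrates the arithmetic rather than reproducing METAL's implementation.

```python
from math import sqrt
from statistics import NormalDist

def combine_inverse_variance(estimates):
    """Combine (beta, standard_error) pairs for one marker.

    All betas are assumed to be in the same units and aligned to the same
    reference allele. Returns (beta, standard_error, z, p_value).
    """
    weights = [1.0 / (se * se) for _, se in estimates]  # assumption: weight = 1 / SE^2
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return beta, se, z, p

# Hypothetical example: two studies reporting the same trait in the same units.
beta, se, z, p = combine_inverse_variance([(0.12, 0.05), (0.08, 0.04)])
```

The sketch also shows why units matter: if one study reported the trait on a different scale, its beta and standard error would both be rescaled, changing its weight and pulling the combined estimate toward a meaningless value.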

Genomic Control Correction

  GENOMICCONTROL OFF      - the default, no adjustment to test statistics
  GENOMICCONTROL ON       - automatically correct test statistics to account for small amounts of population stratification or unaccounted for relatedness
  GENOMICCONTROL [value]  - correct test statistics using the specified inflation factor

METAL has the ability to apply a genomic control correction to all input files. METAL will estimate the inflation of the test statistic by comparing the median test statistic to that expected by chance, and then apply the genomic control correction to the p-values (for SAMPLESIZE weighted meta-analysis) or the standard error (for STDERR weighted meta-analysis). This should only be applied to files with whole genome data (i.e. should not be used for settings where results are only available for a candidate locus or a small number of SNPs selected for follow-up of GWAS results). Genomic control settings can be customized for each input file. We recommend applying genomic control correction to all input files that include genomewide data and, in addition, to the meta-analysis results. To apply genomic control to the meta-analysis results, just perform an initial meta-analysis and then load the initial set of results into METAL to get final, genomic control adjusted results.
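The following Python sketch illustrates the idea for p-value input. It is an assumption-laden illustration, not METAL's code: it treats each p-value as coming from a 1 degree of freedom chi-square test, estimates the inflation factor as the ratio of the observed to the expected median, and (as an assumption of this sketch) never deflates statistics when the estimate falls below 1.0.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square distribution under the null (about 0.4549),
# obtained as the square of the 75th percentile of the standard normal.
EXPECTED_MEDIAN_CHISQ = _STD_NORMAL.inv_cdf(0.75) ** 2

def genomic_control(p_values):
    """Return (inflation_factor, adjusted_p_values) for two-sided p-values in (0, 1)."""
    chisq = [_STD_NORMAL.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    # Assumption: only correct when the statistics are inflated.
    inflation = max(1.0, median(chisq) / EXPECTED_MEDIAN_CHISQ)
    adjusted = [2.0 * (1.0 - _STD_NORMAL.cdf((c / inflation) ** 0.5)) for c in chisq]
    return inflation, adjusted

# Hypothetical, deliberately inflated set of p-values (far too few for real use).
inflation, adjusted = genomic_control([0.01, 0.05, 0.10, 0.20, 0.30])
```

With only a handful of markers the median is a poor estimate of the null distribution, which is why the correction is meant for files with genomewide data.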

Sample Overlap Correction

METAL can correct for sample overlap in a sample size weighted meta-analysis (the method was developed by Sebanti Sengupta and implemented by Daniel Taliun).

First, METAL estimates the number of individuals that are common among two or more studies based on Z-statistics from each study. Then, METAL adjusts for sample overlap when calculating overall Z-statistics by correcting the weights with the estimated number of individuals in common.

To enable correction for sample overlap in your sample size weighted meta-analysis, use the OVERLAP ON command (valid only with SCHEME SAMPLESIZE). By default, METAL uses Z-statistics <1 for estimating the number of individuals that are common among studies. To change this threshold, use the ZCUTOFF [number] command.

More information on the method can be found in:

Strand Information

  USESTRAND   ON
  STRANDLABEL StrandColumnHeading

Input files can contain a column that indicates which strand the alleles are coded on (given as +/-). If this column is present, you should issue the USESTRAND ON command and specify an appropriate header with the STRANDLABEL command. If USESTRAND is off, the strand is assumed to be “+” for all SNPs, although obvious strand problems are identified by METAL and appropriately handled (for example, when one study provides A/G alleles and a different study provides C/T alleles).

Filtering

Custom filters can be used to select SNPs for inclusion in the meta-analysis. This can be used, for example, to select SNPs within a specified minor-allele frequency range for analysis.

Here are some possible filters:

  ADDFILTER N > 1000
  ADDFILTER MAF > 0.01

Together, these two filters would only consider entries where the value in the N column is greater than 1000 and the value in the MAF column is also greater than 0.01.

Filters can be defined using the <, >, <=, >=, =, != and IN operators. The IN operator tests membership in a set. For example, to restrict analysis to three interesting SNPs, use (note the absence of spaces in the list of SNPs):

  ADDFILTER MARKER_ID IN (rs1234,rs123456,rs123)

To remove all previously defined filters, use the command:

  REMOVEFILTERS
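The way several filters combine can be illustrated with a short Python sketch. It only mirrors the semantics described above (an entry is kept when it satisfies every active filter); the data structures and the function name are hypothetical, and the comparison operators are assumed to apply to numeric columns.

```python
import operator

# Comparison operators, assumed here to be applied to numeric column values.
OPERATORS = {
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}

def passes_filters(row, filters):
    """Return True when a row (dict of column name -> text) satisfies all filters.

    Each filter is a (column, operator, value) tuple; for IN, value is a set of names.
    """
    for column, op, value in filters:
        if op == "IN":
            if row[column] not in value:
                return False
        elif not OPERATORS[op](float(row[column]), value):
            return False
    return True

# Equivalent of: ADDFILTER N > 1000 and ADDFILTER MAF > 0.01
filters = [("N", ">", 1000), ("MAF", ">", 0.01)]
keep = passes_filters({"MARKER_ID": "rs1234", "N": "1500", "MAF": "0.20"}, filters)
```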

Verbose Mode

  VERBOSE ON

METAL allows for complete output of individual summary statistics for all SNPs in all input files. This can create a very large file and should be used with caution. Typically, one should create custom filters to restrict analyses to SNPs of interest before using this option. This option can be useful for comparing direction of effect across many studies, since METAL takes care of all the strand flipping and provides the direction of effect relative to the same allele. It is also a way to double-check that the expected data are being used appropriately by METAL.

Lenient Mode

   COLUMNCOUNTING STRICT         - requires expected number of columns in every row
   COLUMNCOUNTING LENIENT        - tries to interpret rows with fewer columns than expected

By default, METAL will skip lines in each input file that don't have the expected number of columns. This is usually a good idea because it avoids producing incorrect results when a column is missing. Sometimes (for example, when there are optional extra columns at the end of each line), the COLUMNCOUNTING LENIENT option can be useful.

Tracking Allele Frequencies

  AVERAGEFREQ ON
  MINMAXFREQ ON

METAL can optionally track the effect allele frequency across all files and report the mean, minimum and maximum effect allele frequency. These can be quite useful to check that allele frequencies are similar across different cohorts after METAL performs all strand alignment. Large differences in allele frequencies across studies can suggest inconsistent naming of reference alleles across studies. METAL requires all input files to have an allele frequency column when this feature is turned on. To specify the column header for allele frequency information, use the FREQLABEL command.

Custom Variables

We allow users to keep cumulative counts of custom variables across input files. An example of this might be to keep track of the sample size when performing standard-error weighted meta-analysis. The name of the custom variable should be defined once, before input files are loaded. The corresponding column label in each input file can be specified using the LABEL command. For example, to create a custom variable labeled TotalSampleSize that tallies the total of the N column across files, one could issue the commands:

 CUSTOMVARIABLE TotalSampleSize
 LABEL TotalSampleSize as N

If needed, the LABEL command can be used multiple times to customize column headers for each input file.

Input File Recommendations

We strongly recommend that both allele labels, corresponding to the effect allele and the non-effect allele, be provided for all SNPs. As long as both allele columns are given for each input file, METAL appropriately accounts for situations when different input files use different reference alleles. Alleles can be coded numerically (A=1,C=2,G=3,T=4) or alphabetically (A,C,G,T,a,c,g,t) and can be on either strand if not an A/T or C/G SNP. For A/T or C/G SNPs, METAL requires SNPs to be on a consistent strand in different input files for the results to be interpretable. For other SNPs, METAL can automatically identify and resolve strand inconsistencies.
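The reasoning behind these recommendations can be made concrete with a small Python sketch of allele alignment. This is a simplified illustration with a hypothetical function name, not METAL's algorithm: it handles only alphabetic allele codes and, for A/T and C/G SNPs, simply assumes that all files use the same strand.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def align_effect(reference_pair, study_pair, effect):
    """Express a study's effect relative to the reference effect allele.

    Both pairs are (effect_allele, other_allele). Returns the aligned effect,
    or None when the alleles cannot be reconciled with the reference pair.
    """
    ref1, ref2 = (allele.upper() for allele in reference_pair)
    a1, a2 = (allele.upper() for allele in study_pair)
    candidates = [(a1, a2)]
    # For A/T and C/G SNPs the complement is indistinguishable from an allele
    # swap, so a strand flip is only attempted for unambiguous SNPs.
    if COMPLEMENT[a1] != a2:
        candidates.append((COMPLEMENT[a1], COMPLEMENT[a2]))
    for c1, c2 in candidates:
        if (c1, c2) == (ref1, ref2):
            return effect       # same effect allele
        if (c1, c2) == (ref2, ref1):
            return -effect      # alleles swapped: flip the direction of effect
    return None

# A study reporting C/T alleles for a SNP whose reference alleles are A/G is on
# the opposite strand with the alleles swapped, so the direction is flipped.
aligned = align_effect(("A", "G"), ("C", "T"), 0.2)
```

Without the second allele column, the swapped and strand-flipped cases could not be told apart, which is why both allele labels are needed.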

P-values that are < 0.0, > 1.0 or non-numeric will be treated as missing and generate a warning.

The EFFECT column can have positive and negative values (beta values from regression, for example), or simply directions of effect relative to the reference allele, listed as “+” and “-”. An EFFECT of “+” (or any positive number) with respect to the reference allele A (or effect allele A), for example, represents a case where an increasing number of copies of allele A is correlated with increasing trait values. For discrete traits, it is common to report odds ratios, which are always positive. In this case, to calculate the direction of effect, one should look at the log of the odds ratio. METAL can compute the log of the odds ratio for you if you specify EFFECT log(ODDS_RATIO_COLUMN).

To perform odds-ratio based meta-analysis, select SCHEME STDERR at the beginning of the script. Then, for each file, provide the natural log of the odds ratio as the EFFECT column or another appropriate statistic (such as the corresponding regression coefficient from a logistic regression analysis).

Example: A METAL Meta-Analysis Script

#THIS SCRIPT EXECUTES AN ANALYSIS OF EIGHT STUDIES
#THE RESULTS FOR EACH STUDY ARE STORED IN FILES Inputfile1.txt THROUGH Inputfile8.txt

#LOAD THE FIRST FOUR INPUT FILES

# UNCOMMENT THE NEXT LINE TO ENABLE GenomicControl CORRECTION
# GENOMICCONTROL ON

# === DESCRIBE AND PROCESS THE FIRST INPUT FILE ===
MARKER SNP
ALLELE REF_ALLELE OTHER_ALLELE
EFFECT BETA
PVALUE PVALUE 
WEIGHT N
PROCESS inputfile1.txt

# === THE SECOND INPUT FILE HAS THE SAME FORMAT AND CAN BE PROCESSED IMMEDIATELY ===
PROCESS inputfile2.txt

# === DESCRIBE AND PROCESS THE THIRD INPUT FILE ===
MARKER SNP
ALLELE A_REF OTHER_ALLELE
EFFECT BETA
PVALUE pvalue 
WEIGHT N
PROCESS inputfile3.txt

# === DESCRIBE AND PROCESS THE FOURTH INPUT FILE ===
MARKER MARKERNAME
ALLELE EFFECTALLELE NON_EFFECT_ALLELE
EFFECT EFFECT1
PVALUE PVALUE
WEIGHT NONMISS
PROCESS inputfile4.txt 

# === CARRY OUT AN INTERIM ANALYSIS OF THE FIRST FOUR FILES ===
OUTFILE METAANALYSIS_inputfile1to4_ .tbl
ANALYZE 

# LOAD THE NEXT FOUR INPUT FILES

# === DESCRIBE AND PROCESS THE FIFTH INPUT FILE ===
MARKER rsid
ALLELE EFFECT_ALLELE OTHER_ALLELE
EFFECT BETA
PVALUE Add_p
WEIGHT total_N
SEPARATOR COMMAS
PROCESS inputfile5.txt

# === THE SIXTH INPUT FILE HAS THE SAME FORMAT AND CAN BE PROCESSED IMMEDIATELY ===
PROCESS inputfile6.txt

# === DESCRIBE AND PROCESS THE SEVENTH INPUT FILE ===
ALLELE ALLELE OTHER_ALLELE
MARKER SNP
EFFECT BETA
PVALUE PVALUE
WEIGHT N
SEPARATOR WHITESPACE
PROCESS inputfile7.txt

# === DESCRIBE AND PROCESS THE EIGHTH INPUT FILE ===
ALLELE BETA_ALLELE OTHER_ALLELE
MARKER SNP
EFFECT BETA
PVALUE P_VAL
WEIGHT N
PROCESS inputfile8.txt 

#for the final meta-analysis of all 8 samples only output results if the
#combined weight is greater than 10000 people

OUTFILE METAANALYSIS_inputfile1-8_ .tbl
MINWEIGHT 10000
ANALYZE 

QUIT