Difference between revisions of "METAL Documentation"

Revision as of 10:46, 25 March 2010

History

METAL was developed by Goncalo Abecasis, Yun Li and Cristen Willer. The first version was developed in 2007 and was used for the analyses presented in Sanna et al (2008) and Willer et al (2008). Since then, it has become quite a popular tool for the analysis of genomewide association scans.

Brief Description

METAL is a tool for meta-analysis of genomewide association scans. METAL can combine either (a) effect size estimates and standard errors or (b) p-values across studies (taking sample size and direction of effect into account). Meta-analysis with METAL is a convenient alternative to a direct analysis of merged data from multiple studies. It is especially appropriate when data from the individual studies cannot be analyzed together because of differences in ethnicity, phenotype distribution or gender, or because of constraints on sharing individual level data. Meta-analysis results in little or no loss of efficiency compared to analysis of a combined dataset including data from all individual studies.

Approach

One of the most common questions we get concerns the approach used by METAL to carry out a meta-analysis using p-values as input. The process is actually quite simple! First, for each marker, an arbitrary reference allele is selected and a z-statistic characterizing the evidence for association is calculated. The z-statistic summarizes the magnitude and the direction of effect relative to the reference allele. Next, an overall z-statistic and p-value are calculated from a weighted sum of the individual statistics. Weights are proportional to the square-root of the number of individuals examined in each sample and selected such that the squared weights sum to 1.0. For samples that contain related individuals, a smaller ‘effective’ sample size may be used, but simulations suggest that modest changes in the effective sample size have very little impact on the final p-value.
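The weighting described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not METAL's own code: the function names are hypothetical, p-values are assumed to be two-sided and strictly between 0 and 1, and the direction is assumed to be given as +1 or -1 relative to a common reference allele.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def z_from_p(p_value, direction):
    """Turn a two-sided p-value and a direction (+1 or -1) into a signed z-statistic."""
    # inv_cdf(p/2) is negative, so negating it gives the magnitude of z.
    return -direction * _STD_NORMAL.inv_cdf(p_value / 2.0)


def combine_sample_size_weighted(studies):
    """Combine (p_value, direction, n) tuples for one marker.

    Weights are proportional to sqrt(n) and scaled so that the
    squared weights sum to 1.0, as described in the text.
    """
    studies = list(studies)
    total_n = sum(n for _, _, n in studies)
    z = sum(math.sqrt(n / total_n) * z_from_p(p, d) for p, d, n in studies)
    p = 2.0 * _STD_NORMAL.cdf(-abs(z))
    return z, p
```

For example, `combine_sample_size_weighted([(0.05, +1, 1000), (0.20, +1, 4000)])` returns the overall z-statistic and two-sided p-value for a marker seen in two studies with consistent direction of effect.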

Usage instructions

METAL is a command line tool. It is typically run from a Linux, Unix or DOS prompt by invoking the command metal. Analyses can be run interactively or a simple script can be provided as input. Interactive analyses are usually convenient when learning how to use METAL, whereas the scripting approach is preferred for production use (as it allows analyses to be conveniently repeated). An example METAL script is included at the bottom of this page.

METAL has lots of options and here we have listed some common ones that, hopefully, will help you get started.

Help!

Issuing the HELP command lists all available commands and the current settings for each option. The list of all available commands is also available in the METAL Command Reference.

Input File Separators

METAL expects that each set of results will be summarized in a table. This table must be stored in a text file but otherwise METAL is quite flexible about details such as column separators, column headers and the like. This does mean that an essential bit of information needed before any meta-analysis is a description of each input file.

The first thing you should specify is the column separator. By default, METAL assumes columns are separated from each other by whitespace (which consists of any combination of space and tab characters). You can also specify:

  SEPARATOR  WHITESPACE    - the default
  SEPARATOR  COMMA         - for comma delimited files that are popular on some platforms
  SEPARATOR  TAB           - columns separated by a single tab, so that consecutive tabs indicate an empty column

Input File Columns

Each input file should include the following information:

A column with marker name, which should be consistent across studies
A column indicating the tested allele
A column indicating the other allele

If you are carrying out a sample size weighted analysis (based on p-values), you will also need:

A column indicating the direction of effect for the tested allele
A column indicating the corresponding p-value
An optional column indicating the sample size (if the sample size varies by marker)

If you are carrying out a meta-analysis based on standard errors, you will need:

A column indicating the estimated effect size for each marker
A column indicating the standard error of this effect size estimate

The header for each of these columns must be specified so that METAL knows how to interpret the data. As noted below, additional columns including allele frequency information, strand information, and others can also be present.

Here is a typical set of commands that would describe a table where the headers SNP, RefAllele, NonRefAllele, P-value and Effect correspond to the MARKER, ALLELE 1 and 2, PVALUE and EFFECT columns:

 MARKERLABEL   SNP
 ALLELELABELS  RefAllele NonRefAllele
 PVALUELABEL   P-value
 EFFECTLABEL   Effect

These can be abbreviated as:

 MARKER        SNP
 ALLELE        RefAllele NonRefAllele
 PVALUE        P-value
 EFFECT        Effect

Once all appropriate headers have been specified, issuing the PROCESS command will read an input file and update summary statistics to take the results it contains into account. Thus:

 PROCESS      study1-results.tbl

After issuing the PROCESS command, you can update headers (if needed) and then issue other PROCESS commands to read additional input files.

Input File Recommendations

We strongly recommend that both allele labels, corresponding to the effect allele and non-effect allele, be provided for all SNPs. As long as both allele columns are given for each input file, METAL appropriately accounts for situations when different input files use different reference alleles. Alleles can be coded numerically (A=1,C=2,G=3,T=4) or alphabetically (A,C,G,T,a,c,g,t) and, except for A/T and C/G SNPs, can be on either strand. For A/T or C/G SNPs, METAL requires SNPs to be on a consistent strand in different input files for the results to be interpretable. For other SNPs, METAL can automatically identify and resolve strand inconsistencies.

P-values that are < 0.0, > 1.0 or non-numeric will be treated as missing and generate a warning.

The EFFECT column can have positive and negative values (beta values from regression, for example), or simply directions of effect relative to the reference allele, listed as “+” and “-”. An EFFECT of “+” (or any positive number) with respect to the reference allele A (or effect allele A), for example, represents a case where an increasing number of copies of allele A is correlated with increasing trait values. For discrete traits, it is common to report odds ratios, which are always positive. In this case, to calculate the direction of effect, one should look at the log of the odds ratio. METAL can compute the log of the odds ratio for you if you specify EFFECT log(ODDS_RATIO_COLUMN)

To perform odds-ratio based meta-analysis, select SCHEME STDERR at the beginning of the script. Then, for each file, provide the natural log of the odds ratio as the EFFECT column or another appropriate statistic (such as the corresponding regression coefficient from a logistic regression analysis).
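When preparing input files by hand, the conversion from odds ratios can be sketched as follows. The function names are hypothetical, and the standard error calculation assumes the study reported a 95% confidence interval that is symmetric on the log scale, which is common for logistic regression output but should be checked for each study.

```python
import math
from statistics import NormalDist

# Normal quantile matching a two-sided 95% confidence interval (about 1.96).
_Z_95 = NormalDist().inv_cdf(0.975)


def effect_from_odds_ratio(odds_ratio):
    """Return (log odds ratio, direction) for use as an EFFECT value."""
    beta = math.log(odds_ratio)
    direction = "+" if beta >= 0 else "-"
    return beta, direction


def stderr_from_ci(ci_lower, ci_upper):
    """Approximate the standard error of the log odds ratio from a 95% CI."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * _Z_95)
```

An odds ratio of 0.8, for instance, gives a negative log odds ratio and therefore a direction of "-" for the tested allele.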

Specifying Weights in P-value Based Analysis

The weight for each MARKER can be stored in a column in the table (specified with the WEIGHTLABEL or WEIGHT commands) or can be fixed for each file (in which case the fixed weight can be set with the DEFAULTWEIGHT command).

Lenient Mode

   COLUMNCOUNTING STRICT         - requires expected number of columns in every row
   COLUMNCOUNTING LENIENT        - tries to interpret rows with fewer columns than expected

By default, METAL will skip lines in each input file that don't have the expected number of columns. This is usually a good idea because it avoids producing incorrect results when a column is missing. Sometimes (for example, when there are optional extra columns at the end of each line), the COLUMNCOUNTING LENIENT option can be useful.

Genomic Control Correction

  GENOMICCONTROL OFF      - the default, no adjustment to test statistics
  GENOMICCONTROL ON       - automatically correct test statistics to account for small amounts of population stratification or unaccounted for relatedness
  GENOMICCONTROL [value]  - correct test statistics using the specified inflation factor

METAL has the ability to apply a genomic control correction to all input files. METAL will estimate the inflation of the test statistic by comparing the median test statistic to that expected by chance, and then apply the genomic control correction to the p-values (for SAMPLESIZE weighted meta-analysis) or the standard error (for STDERR weighted meta-analysis). This should only be applied to files with whole genome data (i.e. should not be used for settings where results are only available for a candidate locus or a small number of SNPs selected for follow-up of GWAS results). Genomic control settings can be customized for each input file. We recommend applying genomic control correction to all input files that include genomewide data and, in addition, to the meta-analysis results. To apply genomic control to the meta-analysis results, just perform an initial meta-analysis and then load the initial set of results into METAL to get final, genomic control adjusted results.
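The median-based estimate described above can be illustrated with a short Python sketch. The names are hypothetical and the details are assumptions rather than METAL's exact implementation: the inflation factor is taken as the median squared z-statistic divided by the median of a chi-squared distribution with one degree of freedom, and no correction is applied when the estimate is below 1.

```python
import math
from statistics import NormalDist, median

# Median of a 1 degree of freedom chi-squared distribution (about 0.455).
_CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(z_scores):
    """Estimate the genomic control inflation factor from z-statistics."""
    return median(z * z for z in z_scores) / _CHI2_1DF_MEDIAN


def gc_correct_z(z_scores, lam=None):
    """Shrink z-statistics (sample size weighted analysis)."""
    z_scores = list(z_scores)
    if lam is None:
        lam = inflation_factor(z_scores)
    lam = max(lam, 1.0)  # assumption: never inflate the statistics
    return [z / math.sqrt(lam) for z in z_scores]


def gc_correct_se(std_errs, lam):
    """Widen standard errors (standard error weighted analysis)."""
    lam = max(lam, 1.0)
    return [se * math.sqrt(lam) for se in std_errs]
```

Because the estimate relies on the median of many test statistics, it is only meaningful for genomewide input, which matches the recommendation in the text.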

Selecting an Analysis Scheme

 SCHEME SAMPLESIZE        - default approach, uses p-value and direction of effect, weighted according to sample size
 SCHEME STDERR            - classical approach, uses effect size estimates and standard errors

By default, METAL combines p-values across studies taking into account a study specific weight (typically, the sample size) and direction of effect. This behavior can be requested explicitly with the SCHEME SAMPLESIZE command. An alternative can be requested with the SCHEME STDERR command, which weights effect size estimates using the inverse of the corresponding squared standard errors (inverse variance weighting). While standard error based weights are more common in the biostatistical literature, if you decide to use this approach, it is very important to ensure that effect size estimates (beta coefficients) and standard errors use the same units in all studies (i.e. make sure that the exact same trait was examined in each study and that the same transformations were applied). Inconsistent use of measurement units across studies is the most common cause of discrepancies between these two analysis strategies.
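For comparison with the sample size scheme, the classical standard error based approach can be sketched as below. This assumes the usual fixed-effects, inverse variance form (each study weighted by one over its squared standard error); the function name is hypothetical and effect sizes are assumed to be already aligned to the same allele and expressed in the same units.

```python
import math
from statistics import NormalDist


def combine_inverse_variance(estimates):
    """Combine (beta, se) pairs for one marker with inverse variance weights.

    Returns the pooled effect, its standard error, the z-statistic
    and a two-sided p-value.
    """
    estimates = list(estimates)
    weights = [1.0 / (se * se) for _, se in estimates]
    weight_sum = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, z, p
```

Two studies reporting the same effect with the same standard error yield the same pooled effect with a standard error smaller by a factor of the square root of two, which is why mismatched units between studies distort the result so badly.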

Tracking Allele Frequencies

  AVERAGEFREQ ON
  MINMAXFREQ ON

METAL can optionally track the effect allele frequency across all files and report its mean, minimum and maximum. These can be quite useful to check that allele frequencies are similar across different cohorts after METAL performs all strand alignment. Large differences in allele frequencies across studies can suggest inconsistent naming of alleles across studies. METAL requires all input files to have an allele frequency column when this feature is turned on. To specify the column header for allele frequency information, use the FREQLABEL command.

Custom Variables

We allow users to keep cumulative counts of custom variables across input files. An example of this might be to keep track of the sample size when performing standard-error weighted meta-analysis. The name of the custom variable should be defined once, before input files are loaded. The corresponding column label in each input file can be specified using the LABEL command. For example, to create a custom variable labeled TotalSampleSize that tallies the total of the N column across files, one could issue the commands:

 CUSTOMVARIABLE TotalSampleSize
 LABEL TotalSampleSize as N

If needed, the LABEL command can be used multiple times to customize column headers for each input file.

Strand Information

  USESTRAND   ON
  STRANDLABEL StrandColumnHeading

Input files can contain a column that indicates which strand the alleles are coded on (given as +/-). If this column is present, you should issue the USESTRAND ON command and specify an appropriate header with the STRANDLABEL command. If USESTRAND is off, the strand is assumed to be “+” for all SNPs, although obvious strand problems are identified by METAL and appropriately handled (for example, when one study provides A/G alleles and a different study provides C/T alleles).

Verbose Mode

  VERBOSE ON

METAL allows for complete output of individual summary statistics for all SNPs in all input files. This can create a very large file and should be used with caution. Typically, one should create custom filters to restrict analyses to SNPs of interest before using this option. This option can be useful for comparing direction of effect across many studies since METAL takes care of all the strand flipping and provides the direction of effect relative to the same allele. This is also a way to double-check that the expected data are being used appropriately by METAL.

Custom-designed filters can be used to select SNPs for inclusion in the meta-analysis. This can be used to select SNPs above or below a certain value (> or <) from any column in the table, which can be useful for including SNPs with a minor allele frequency above a certain threshold.

 FILTER N > 1000
 CUSTOMVARIABLE MAF
 LABEL MAF as MAF
 FILTER MAF > 0.01

To remove filters so that they no longer apply to files processed later, use:

 REMOVEFILTERS

Once the appropriate WEIGHT, MARKER, PVALUE and EFFECT labels are defined, along with any optional FREQLABEL, SEPARATOR, STRAND, FILTER and LABEL settings, load an input file:

 PROCESS firstinputfile_bmi.txt

METAL does not require that all input files have a p-value result for a marker in order to calculate a meta-analysis p-value. Any available data is used. To restrict the output to markers that reach at least a specific total weight (number of individuals), use the MINWEIGHT command. For example, to show only markers with at least 10,000 individuals:

 MINWEIGHT 10000

Once all input files have had their column names defined and been loaded, define your output filename (optional) and analyze:

 OUTFILE myoutputfilename
 ANALYZE

METAL can also evaluate the evidence for heterogeneity. When you do this, METAL will do a second pass of analysis to decide whether observed effect sizes (or test statistics) are homogeneous across samples. This will result in a test statistic with n-1 degrees of freedom for n samples.

 ANALYZE HETEROGENEITY
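To make the heterogeneity statistic concrete, the sketch below computes Cochran's Q, a standard heterogeneity statistic with n-1 degrees of freedom, together with the derived I-squared percentage. Whether METAL uses exactly this form is an assumption here, the function name is hypothetical, and the inputs are assumed to be effect sizes and standard errors already aligned to the same allele.

```python
def cochran_q(estimates):
    """Return (Q, degrees of freedom, I-squared in percent) for (beta, se) pairs."""
    estimates = list(estimates)
    weights = [1.0 / (se * se) for _, se in estimates]
    # Inverse variance weighted mean effect across studies.
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    # Weighted sum of squared deviations from the pooled effect.
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared
```

Under homogeneity, Q is compared against a chi-squared distribution with the returned degrees of freedom; a large Q relative to that distribution suggests that effects differ between samples.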

Example 1: Strand flips required

	ALLELES	EFFECT	ALLELES Analyzed	EFFECT Analyzed
Input file 1	T/G	+	a/c	+
Input file 2	T/G	+	a/c	+
Input file 3	A/C	+	a/c	+
Output			a/c	+

Example 2: Reference allele flips required

	ALLELES	EFFECT	ALLELES Analyzed	EFFECT Analyzed
Input file 1	C/A	-	a/c	+
Input file 2	C/A	-	a/c	+
Input file 3	A/C	+	a/c	+
Output			a/c	+

Example 3: Strand flips, numeric flips, and reference allele flips required

	ALLELES	EFFECT	ALLELES Analyzed	EFFECT Analyzed
Input file 1	G/T	-	a/c	+
Input file 2	2/1	-	a/c	+
Input file 3	A/C	+	a/c	+
Output			a/c	+
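The allele handling shown in the three examples above can be mimicked with a small Python sketch. The canonical ordering rule used here (pick the alphabetically smallest allele pair among the strand and reference allele alternatives) is an assumption chosen to reproduce the tables, not a description of METAL's internal logic, and the function name is hypothetical. A/T and C/G SNPs are never strand flipped, since the text requires them to be on a consistent strand.

```python
_NUMERIC = {"1": "a", "2": "c", "3": "g", "4": "t"}
_COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def _normalise(allele):
    """Lower-case an allele and translate numeric codes (A=1, C=2, G=3, T=4)."""
    allele = allele.strip().lower()
    return _NUMERIC.get(allele, allele)


def align(allele1, allele2, direction):
    """Return (allele1, allele2, direction) in a canonical orientation.

    direction is "+" or "-" relative to allele1.
    """
    a1, a2 = _normalise(allele1), _normalise(allele2)
    flipped = "-" if direction == "+" else "+"
    # Same strand: keep as is, or swap the reference allele and flip the sign.
    candidates = [(a1, a2, direction), (a2, a1, flipped)]
    # Opposite strand is only considered for SNPs that are not A/T or C/G.
    if _COMPLEMENT[a1] != a2:
        c1, c2 = _COMPLEMENT[a1], _COMPLEMENT[a2]
        candidates += [(c1, c2, direction), (c2, c1, flipped)]
    return min(candidates)
```

With this rule, `align("G", "T", "-")` and `align("2", "1", "-")` both give `("a", "c", "+")`, matching the "ALLELES Analyzed" and "EFFECT Analyzed" columns of Example 3.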

Example script to run METAL:

#THIS FILE EXECUTES AN ANALYSIS OF ALL AVAILABLE INFORMATION

mkdir output-metal 

metal << EOT 

#loading in the first half of inputfiles

MARKER SNP
ALLELE REF_ALLELE OTHER_ALLELE
EFFECT BETA
WEIGHT N
PVALUE PVALUE
PROCESS inputfile1.txt
PROCESS inputfile2.txt
PVALUE pvalue
ALLELE A_REF OTHER_ALLELE
MARKER SNP
EFFECT BETA
WEIGHT N
PROCESS inputfile3.txt
MARKER MARKERNAME
ALLELE EFFECTALLELE NON_EFFECT_ALLELE
EFFECT EFFECT1
WEIGHT NONMISS
PVALUE PVALUE
PROCESS inputfile4.txt 

#meta-analysis can be performed at any stage and will include inputfiles 1-4

OUTFILE METAANALYSIS_inputfile1to4_ .tbl
ANALYZE 

#load the second half of inputfiles

MARKER rsid
ALLELE EFFECT_ALLELE OTHER_ALLELE
EFFECT BETA
WEIGHT total_N
PVALUE Add_p
SEPARATOR COMMA
PROCESS inputfile5.txt
PROCESS inputfile6.txt
ALLELE ALLELE OTHER_ALLELE
MARKER SNP
EFFECT BETA
WEIGHT N
PVALUE PVALUE
SEPARATOR WHITESPACE
PROCESS inputfile7.txt
ALLELE BETA_ALLELE OTHER_ALLELE
PVALUE P_VAL
MARKER SNP
EFFECT BETA
WEIGHT N
PROCESS inputfile8.txt 

#for the final meta-analysis of all 8 samples only output results if the
#combined weight is greater than 10000 people

OUTFILE METAANALYSIS_inputfile1-8_ .tbl
MINWEIGHT 10000
ANALYZE 

QUIT
EOT
