METAL Documentation
Revision as of 10:46, 25 March 2010
History
METAL was developed by Goncalo Abecasis, Yun Li and Cristen Willer. The first version was developed in 2007 and was used for the analyses presented in Sanna et al (2008) and Willer et al (2008). Since then, it has become quite a popular tool for the analysis of genomewide association scans.
Brief Description
METAL is a tool for meta-analysis of genomewide association scans. METAL can combine either (a) test statistics and standard errors or (b) p-values across studies (taking sample size and direction of effect into account). METAL analysis is a convenient alternative to a direct analysis of merged data from multiple studies. It is especially appropriate when data from the individual studies cannot be analyzed together because of differences in ethnicity, phenotype distribution or gender, or because of constraints imposed on the sharing of individual level data. Meta-analysis results in little or no loss of efficiency compared to analysis of a combined dataset including data from all individual studies.
Approach
One of the most common questions we get concerns the approach used by METAL to carry out a meta-analysis using p-values as input. The process is actually quite simple! First, for each marker, an arbitrary reference allele is selected and a z-statistic characterizing the evidence for association is calculated. The z-statistic summarizes the magnitude and the direction of effect relative to the reference allele. Next, an overall z-statistic and p-value are calculated from a weighted sum of the individual statistics. Weights are proportional to the square-root of the number of individuals examined in each sample and selected such that the squared weights sum to 1.0. For samples that contain related individuals, a smaller ‘effective’ sample size may be used, but simulations suggest that modest changes in the effective sample size have very little impact on the final p-value.
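The weighting scheme described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not METAL's own code: the function names are hypothetical, p-values are assumed to be two-sided and strictly greater than zero, and every direction is assumed to be already expressed relative to the same reference allele.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def p_to_z(p_value, direction):
    """Convert a two-sided p-value and a direction ('+' or '-') into a signed z-statistic."""
    z = abs(_STD_NORMAL.inv_cdf(p_value / 2.0))
    return z if direction == "+" else -z


def combine(studies):
    """Combine (p_value, direction, sample_size) tuples for a single marker."""
    total_n = sum(n for _, _, n in studies)
    # Weights are proportional to sqrt(N) and scaled so that the squared weights sum to 1.0.
    z_overall = sum(sqrt(n / total_n) * p_to_z(p, d) for p, d, n in studies)
    p_overall = 2.0 * _STD_NORMAL.cdf(-abs(z_overall))
    return z_overall, p_overall


# Three hypothetical studies of the same marker.
z, p = combine([(0.01, "+", 1000), (0.20, "+", 500), (0.50, "-", 250)])
print(f"z = {z:.3f}, p = {p:.4g}")
```

With a single study the combined p-value equals the input p-value, which is a quick sanity check on the scaling of the weights.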
Usage instructions
METAL is a command line tool. It is typically run from a Linux, Unix or DOS prompt by invoking the command metal. Analyses can be run interactively or a simple script can be provided as input. Interactive analyses are usually convenient when learning how to use METAL, whereas the scripting approach is preferred for production use (as it allows analyses to be conveniently repeated). An example METAL script is included at the bottom of this page.
METAL has lots of options and here we have listed some common ones that, hopefully, will help you get started.
Help!
Issuing the HELP command lists all available commands and the current settings for each option. The list of all available commands is also available in the METAL Command Reference.
Input File Separators
METAL expects that each set of results will be summarized in a table. This table must be stored in a text file but otherwise METAL is quite flexible about details such as column separators, column headers and the like. This does mean that an essential bit of information needed before any meta-analysis is a description of each input file.
The first thing you should specify is the column separator. By default, METAL assumes columns are separated from each other by whitespace (which consists of any combination of space and tab characters). You can also specify:
SEPARATOR WHITESPACE - the default
SEPARATOR COMMA - for comma delimited files that are popular in some platforms
SEPARATOR TAB - columns separated by a single tab, so that consecutive tabs indicate an empty column
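The practical difference between the three separators is easiest to see on a line with a missing value. The Python sketch below mimics the behavior described above; the function `split_columns` is a hypothetical helper, not part of METAL.

```python
def split_columns(line, separator="WHITESPACE"):
    """Split one line of an input table according to a SEPARATOR-style setting."""
    line = line.rstrip("\r\n")
    if separator == "WHITESPACE":
        # Any run of spaces and tabs counts as a single separator.
        return line.split()
    if separator == "TAB":
        # Each tab separates two columns, so consecutive tabs give an empty column.
        return line.split("\t")
    if separator == "COMMA":
        return line.split(",")
    raise ValueError(f"unknown separator: {separator}")


row = "rs123\t\tA\tG"
print(split_columns(row, "WHITESPACE"))  # the empty column disappears
print(split_columns(row, "TAB"))         # the empty column is kept
```

With whitespace separation an empty field shifts every later column to the left, which is why tab or comma separation is the safer choice for files that contain missing values.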
Input File Columns
Each input file should include the following information:
- A column with marker name, which should be consistent across studies
- A column indicating the tested allele
- A column indicating the other allele
If you are carrying out a sample size weighted analysis (based on p-values), you will also need:
- A column indicating the direction of effect for the tested allele
- A column indicating the corresponding p-value
- An optional column indicating the sample size (if the sample size varies by marker)
If you are carrying out a meta-analysis based on standard errors, you will need:
- A column indicating the estimated effect size for each marker
- A column indicating the standard error of this effect size estimate
The header for each of these columns must be specified so that METAL knows how to interpret the data. As noted below, additional columns including allele frequency information, strand information, and others can also be present.
Here is a typical set of commands that would describe a table where the headers SNP, RefAllele, NonRefAllele, P-value and Effect correspond to the MARKER, ALLELE 1 and 2, PVALUE and EFFECT columns:
MARKERLABEL SNP
ALLELELABELS RefAllele NonRefAllele
PVALUELABEL P-value
EFFECTLABEL Effect
These can be abbreviated as:
MARKER SNP
ALLELE RefAllele NonRefAllele
PVALUE P-value
EFFECT Effect
Once all appropriate headers have been specified, issuing the PROCESS command will read an input file and update summary statistics to take the results it contains into account. Thus:
PROCESS study1-results.tbl
After issuing the PROCESS command, you can update headers (if needed) and then issue other PROCESS commands to read additional input files.
Input File Recommendations
We strongly recommend that both allele labels, corresponding to the effect allele and non-effect allele, be provided for all SNPs. As long as both allele columns are given for each input file, METAL appropriately accounts for situations when different input files use different reference alleles. Alleles can be coded numerically (A=1,C=2,G=3,T=4) or alphabetically (A,C,G,T,a,c,g,t) and can be on either strand if not an A/T or C/G SNP. For A/T or C/G SNPs, METAL requires SNPs to be on a consistent strand in different input files for the results to be interpretable. For other SNPs, METAL can automatically identify and resolve strand inconsistencies.
P-values that are < 0.0, > 1.0 or non-numeric will be treated as missing and generate a warning.
The EFFECT column can have positive and negative values (beta values from regression, for example), or simply directions of effect relative to the reference allele, listed as “+” and “-”. An EFFECT of “+” (or any positive number) with respect to the reference allele A (or effect allele A), for example, represents a case where an increasing number of copies of allele A is correlated with increasing trait values. For discrete traits, it is common to report odds ratios, which are always positive. In this case, to calculate the direction of effect, one should look at the log of the odds ratio. METAL can compute the log of the odds ratio for you if you specify EFFECT log(ODDS_RATIO_COLUMN).
To perform odds-ratio based meta-analysis, select SCHEME STDERR at the beginning of the script. Then, for each file, provide the natural log of the odds ratio as the EFFECT column or another appropriate statistic (such as the corresponding regression coefficient from a logistic regression analysis).
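The reason for working on the log scale is that an odds ratio of 1.0 means no effect, so its logarithm is positive for risk-increasing alleles and negative for protective ones. A minimal Python sketch of that conversion follows; the helper name is hypothetical, and reporting "0" for an odds ratio of exactly 1.0 is a choice made for this example only.

```python
from math import log


def effect_from_odds_ratio(odds_ratio):
    """Return (natural log odds ratio, direction of effect) for one marker."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    beta = log(odds_ratio)
    if beta > 0:
        direction = "+"
    elif beta < 0:
        direction = "-"
    else:
        direction = "0"  # assumption for this sketch: no effect in either direction
    return beta, direction


for odds_ratio in (1.25, 0.80):
    beta, direction = effect_from_odds_ratio(odds_ratio)
    print(f"OR = {odds_ratio}: log(OR) = {beta:+.4f}, direction {direction}")
```

Reciprocal odds ratios such as 1.25 and 0.80 give log odds ratios of equal size and opposite sign, which is what makes a reference allele flip a simple change of sign.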
Specifying Weights in P-value Based Analysis
The weight for each MARKER can be stored in a column in the table (specified with the WEIGHTLABEL or WEIGHT commands) or can be fixed for each file (in which case the fixed weight can be set with the DEFAULTWEIGHT command).
Lenient Mode
COLUMNCOUNTING STRICT - requires expected number of columns in every row
COLUMNCOUNTING LENIENT - tries to interpret rows with fewer columns than expected
By default, METAL will skip lines in each input file that don't have the expected number of columns. This is usually a good idea because it avoids producing incorrect results when a column is missing. Sometimes (for example, when there are optional extra columns at the end of each line), the COLUMNCOUNTING LENIENT option can be useful.
Genomic Control Correction
GENOMICCONTROL OFF - the default, no adjustment to test statistics
GENOMICCONTROL ON - automatically correct test statistics to account for small amounts of population stratification or unaccounted for relatedness
GENOMICCONTROL [value] - correct test statistics using the specified inflation factor
METAL has the ability to apply a genomic control correction to all input files. METAL will estimate the inflation of the test statistic by comparing the median test statistic to that expected by chance, and then apply the genomic control correction to the p-values (for SAMPLESIZE weighted meta-analysis) or the standard error (for STDERR weighted meta-analysis). This should only be applied to files with whole genome data (i.e. should not be used for settings where results are only available for a candidate locus or a small number of SNPs selected for follow-up of GWAS results). Genomic control settings can be customized for each input file. We recommend applying genomic control correction to all input files that include genomewide data and, in addition, to the meta-analysis results. To apply genomic control to the meta-analysis results, just perform an initial meta-analysis and then load the initial set of results into METAL to get final, genomic control adjusted results.
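The genomic control adjustment described above can be sketched in Python. This is an illustration rather than METAL's own code: the function names are hypothetical, the expected median is that of a chi-square distribution with 1 degree of freedom, and an estimated inflation factor below 1.0 is treated as 1.0, which is a common convention assumed here rather than something stated in the text.

```python
from math import sqrt
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1 degree of freedom chi-square: the squared 75th percentile of a standard normal.
MEDIAN_CHISQ_1DF = _STD_NORMAL.inv_cdf(0.75) ** 2


def inflation_factor(p_values):
    """Estimate the inflation factor from two-sided p-values in (0, 1]."""
    chisq = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chisq) / MEDIAN_CHISQ_1DF


def adjust_p_values(p_values, lam):
    """Sample size scheme: shrink each z-statistic by sqrt(lambda) and recompute p."""
    lam = max(lam, 1.0)  # assumption: never make results more significant
    adjusted = []
    for p in p_values:
        z = abs(_STD_NORMAL.inv_cdf(p / 2.0)) / sqrt(lam)
        adjusted.append(2.0 * _STD_NORMAL.cdf(-z))
    return adjusted


def adjust_standard_errors(std_errors, lam):
    """Standard error scheme: inflate each standard error by sqrt(lambda)."""
    lam = max(lam, 1.0)
    return [se * sqrt(lam) for se in std_errors]


print(adjust_p_values([0.001, 0.05, 0.5], lam=1.1))
print(adjust_standard_errors([0.10, 0.25], lam=1.1))
```

Because the estimate relies on the median of many test statistics, it is only meaningful for genomewide data, which matches the warning above about candidate locus or follow-up files.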
Selecting an Analysis Scheme
SCHEME SAMPLESIZE - default approach, uses p-value and direction of effect, weighted according to sample size
SCHEME STDERR - classical approach, uses effect size estimates and standard errors
By default, METAL combines p-values across studies taking into account a study specific weight (typically, the sample size) and direction of effect. This behavior can be requested explicitly with the SCHEME SAMPLESIZE command. An alternative, requested with the SCHEME STDERR command, weights effect size estimates using the inverse of the corresponding standard errors. While standard error based weights are more common in the biostatistical literature, if you decide to use this approach, it is very important to ensure that effect size estimates (beta coefficients) and standard errors use the same units in all studies (i.e. make sure that the exact same trait was examined in each study and that the same transformations were applied). Inconsistent use of measurement units across studies is the most common cause of discrepancies between these two analysis strategies.
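For comparison with the sample size scheme, the classical fixed-effects calculation can be sketched as follows. The sketch assumes the usual inverse-variance form, in which each study is weighted by 1/SE²; the text above only states that the weights are based on the inverse of the standard errors, so treat the exact formula and the function name as assumptions.

```python
from math import sqrt
from statistics import NormalDist


def inverse_variance_meta(estimates):
    """Combine (beta, standard_error) pairs that refer to the same allele and units."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    std_error = sqrt(1.0 / total_weight)
    z = beta / std_error
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return beta, std_error, p_value


# Two hypothetical studies with equal precision.
beta, std_error, p_value = inverse_variance_meta([(0.2, 0.1), (0.4, 0.1)])
print(f"beta = {beta:.3f}, se = {std_error:.4f}, p = {p_value:.3g}")
```

The pooled estimate is an average of the betas themselves, so a study that reports effects in different units (for example, untransformed instead of standardized trait values) distorts the result directly. This is the reason for the warning about consistent units.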
Tracking Allele Frequencies
AVERAGEFREQ ON
MINMAXFREQ ON
METAL can optionally track the effect allele frequency across all files and report its mean, minimum and maximum. These can be quite useful to check that allele frequencies are similar across different cohorts after METAL performs all strand alignment. Large differences in allele frequencies across studies can suggest inconsistent naming of alleles across studies. METAL requires all input files to have an allele frequency column when this feature is turned on. To specify the column header for allele frequency information, use the FREQLABEL command.
Custom Variables
We allow users to keep cumulative counts of custom variables across input files. An example of this might be to keep track of the sample size when performing standard-error weighted meta-analysis. The name of the custom variable should be defined once, before input files are loaded. The corresponding column label in each input file can be specified using the LABEL command. For example, to create a custom variable labeled TotalSampleSize that tallies the total of the N column across files, one could issue the commands:
CUSTOMVARIABLE TotalSampleSize
LABEL TotalSampleSize as N
If needed, the LABEL command can be used multiple times to customize column headers for each input file.
Strand Information
USESTRAND ON
STRANDLABEL StrandColumnHeading
Input files can contain a column that indicates which strand the alleles are coded on (given as +/-). If this column is present, you should issue the USESTRAND ON command and specify an appropriate header with the STRANDLABEL command. If USESTRAND is off, the strand is assumed to be “+” for all SNPs, although obvious strand problems are identified by METAL and appropriately handled (for example, when one study provides A/G alleles and a different study provides C/T alleles).
Verbose Mode
VERBOSE ON
METAL allows for complete output of individual summary statistics for all SNPs in all input files. This can create a very large file and should be used with caution. Typically, one should create custom filters to restrict analyses to SNPs of interest before using this option. This option can be useful for comparing direction of effect across many studies since METAL takes care of all the strand flipping and provides the direction of effect relative to the same allele. This is also a way to double-check that the expected data are being used appropriately by METAL.
Custom-designed filters can be used to select SNPs for inclusion in the meta-analysis. A filter selects SNPs whose value in any column of the table is above or below a given threshold (> or <), which can be useful for including only SNPs with a minor allele frequency above a certain threshold. For example:
FILTER N > 1000
CUSTOMVARIABLE MAF
LABEL MAF as MAF
FILTER MAF > 0.01
To remove filters so that they no longer apply to files processed later, use:
REMOVEFILTERS
Once the appropriate WEIGHT, MARKER, PVALUE and EFFECT labels are defined, with or without the optional FREQLABEL, SEPARATOR, STRANDLABEL, FILTER and LABEL commands, load an input file:
PROCESS firstinputfile_bmi.txt
METAL does not require that all input files have a p-value for a marker in order to calculate a meta-analysis p-value. Any available data is used. To restrict the output to markers that have at least a specific total weight (number of individuals), use the MINWEIGHT command. For example, to show only markers with at least 10,000 individuals:
MINWEIGHT 10000
Once all input files have had their column names defined and been loaded, define your output filename (optional) and analyze!
OUTFILE myoutputfilename
ANALYZE
METAL can also evaluate the evidence for heterogeneity. When you do this, METAL will do a second pass of analysis to decide whether observed effect sizes (or test statistics) are homogeneous across samples. This will result in a test statistic with n-1 degrees of freedom for n samples. To request this analysis, use:
ANALYZE HETEROGENEITY
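A statistic of this kind can be illustrated with Cochran's Q, a standard heterogeneity test for effect size estimates. The Python sketch below is not taken from METAL: the choice of Cochran's Q, the inverse-variance weights and the function name are all assumptions made for illustration.

```python
def cochran_q(estimates):
    """Return (Q, degrees of freedom) for a list of (beta, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    # Q sums the weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    return q, len(estimates) - 1


# Three hypothetical studies, one of which points in the opposite direction.
q, df = cochran_q([(0.30, 0.10), (0.25, 0.12), (-0.20, 0.15)])
print(f"Q = {q:.2f} on {df} degrees of freedom")
```

Under homogeneity Q follows a chi-square distribution with n-1 degrees of freedom, so values of Q far above n-1 point to effects that differ between samples.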
Example 1: Strand flips required
File | ALLELES | EFFECT | ALLELES Analyzed | EFFECT Analyzed |
---|---|---|---|---|
Input file 1 | T/G | + | a/c | + |
Input file 2 | T/G | + | a/c | + |
Input file 3 | A/C | + | a/c | + |
Output | | | a/c | + |
Example 2: Reference allele flips required
File | ALLELES | EFFECT | ALLELES Analyzed | EFFECT Analyzed |
---|---|---|---|---|
Input file 1 | C/A | - | a/c | + |
Input file 2 | C/A | - | a/c | + |
Input file 3 | A/C | + | a/c | + |
Output | | | a/c | + |
Example 3: Strand flips, numeric flips, and reference allele flips required
File | ALLELES | EFFECT | ALLELES Analyzed | EFFECT Analyzed |
---|---|---|---|---|
Input file 1 | G/T | - | a/c | + |
Input file 2 | 2/1 | - | a/c | + |
Input file 3 | A/C | + | a/c | + |
Output | | | a/c | + |
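The alignment steps shown in these examples can be reproduced with a short Python sketch. It is an illustration only: the function names are hypothetical, the reference orientation a/c is passed in explicitly, and only “+” and “-” effects are handled. For A/T and C/G SNPs the complement of the pair is the pair itself, so a strand flip cannot be detected and the sketch simply assumes the strands already agree.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}
NUMERIC = {"1": "a", "2": "c", "3": "g", "4": "t"}


def normalize(allele):
    """Lower-case an allele label and translate numeric coding (A=1, C=2, G=3, T=4)."""
    allele = allele.strip().lower()
    return NUMERIC.get(allele, allele)


def align(alleles, effect, reference=("a", "c")):
    """Return (alleles, effect) expressed relative to the reference orientation."""
    a1, a2 = (normalize(a) for a in alleles.split("/"))
    if {a1, a2} != set(reference):
        # Strand flip: read both alleles from the opposite strand.
        a1, a2 = COMPLEMENT[a1], COMPLEMENT[a2]
    if {a1, a2} != set(reference):
        raise ValueError(f"alleles {alleles} do not match {reference}")
    if a1 != reference[0]:
        # Reference allele flip: swap the alleles and reverse the direction of effect.
        a1, a2 = a2, a1
        effect = "-" if effect == "+" else "+"
    return f"{a1}/{a2}", effect


for alleles, effect in [("G/T", "-"), ("2/1", "-"), ("A/C", "+")]:
    print(alleles, effect, "->", align(alleles, effect))
```

Each of the three inputs, which are the rows of the last example, ends up as a/c with a “+” effect.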
Example text file to run metal:
#THIS FILE EXECUTES AN ANALYSIS OF ALL AVAILABLE INFORMATION
mkdir output-metal
metal << EOT

#loading in the first half of inputfiles
MARKER SNP
ALLELE REF_ALLELE OTHER_ALLELE
EFFECT BETA
WEIGHT N
PVALUE PVALUE
PROCESS inputfile1.txt
PROCESS inputfile2.txt

PVALUE pvalue
ALLELE A_REF OTHER_ALLELE
MARKER SNP
EFFECT BETA
WEIGHT N
PROCESS inputfile3.txt

MARKER MARKERNAME
ALLELE EFFECTALLELE NON_EFFECT_ALLELE
EFFECT EFFECT1
WEIGHT NONMISS
PVALUE PVALUE
PROCESS inputfile4.txt

#meta-analysis can be performed at any stage and will include inputfiles 1-4
OUTFILE METAANALYSIS_inputfile1to4_ .tbl
ANALYZE

#load the second half of inputfiles
MARKER rsid
ALLELE EFFECT_ALLELE OTHER_ALLELE
EFFECT BETA
WEIGHT total_N
PVALUE Add_p
SEPARATOR COMMA
PROCESS inputfile5.txt
PROCESS inputfile6.txt

ALLELE ALLELE OTHER_ALLELE
MARKER SNP
EFFECT BETA
WEIGHT N
PVALUE PVALUE
SEPARATOR WHITESPACE
PROCESS inputfile7.txt

ALLELE BETA_ALLELE OTHER_ALLELE
PVALUE P_VAL
MARKER SNP
EFFECT BETA
WEIGHT N
PROCESS inputfile8.txt

#for the final meta-analysis of all 8 samples only output results if the
#combined weight is greater than 10000 people
OUTFILE METAANALYSIS_inputfile1-8_ .tbl
MINWEIGHT 10000
ANALYZE

QUIT
EOT