Difference between revisions of "METAL Documentation"

From Genome Analysis Wiki

Revision as of 22:01, 2 February 2010

METAL

Goncalo Abecasis, Yun Li and Cristen Willer, 2007

METAL is a tool for performing meta-analysis of p-values from two or more individual studies. METAL creates a single summary p-value from studies that could not be analyzed together because of differences in ethnicity, phenotype distribution, gender, inability to share individual-level data, or any other reason.

For each marker, an arbitrary reference allele is selected, and a z-statistic characterizing the evidence for association in each study is used as input. The z-statistic summarizes the magnitude and the direction of effect relative to the reference allele. An overall z-statistic and p-value are then calculated from the weighted average of the individual statistics. Weights are proportional to the square root of the number of individuals examined in each sample and are scaled so that the squared weights sum to 1.0. If a sample contains related individuals, a smaller "effective" sample size may be used, but simulations suggest that modest changes in the effective sample size have very little impact on the final p-value.
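The weighted z-score combination described above can be sketched in Python as follows. This is a minimal illustration, not METAL's source code. It assumes each study supplies a two-sided p-value, a direction (+1 or -1) and a sample size N. The function names `signed_z` and `samplesize_meta` are hypothetical. Each weight is sqrt(N_i / sum(N)), so the squared weights sum to 1.0.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def signed_z(pvalue, direction):
    """Convert a two-sided p-value and a direction (+1/-1) into a signed z-score."""
    z = STD_NORMAL.inv_cdf(1.0 - pvalue / 2.0)  # magnitude of z for a two-sided test
    return z if direction > 0 else -z


def samplesize_meta(studies):
    """Combine (pvalue, direction, n) tuples for one marker.

    Each z-score gets weight sqrt(n_i / total_n), so the squared weights sum to 1.0.
    Returns the overall z-score and its two-sided p-value.
    """
    total_n = sum(n for _, _, n in studies)
    z = sum(signed_z(p, d) * (n / total_n) ** 0.5 for p, d, n in studies)
    return z, 2.0 * STD_NORMAL.cdf(-abs(z))


# Hypothetical input: three studies of one marker.
z, p = samplesize_meta([(0.01, +1, 2000), (0.20, +1, 500), (0.60, -1, 800)])
print(f"z = {z:.3f}, p = {p:.3g}")
```

Because the weights grow with sqrt(N), a study with four times as many individuals contributes twice the weight to the combined z-score.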

Usage instructions

METAL is invoked with the command ‘metal’ and can be run interactively. A convenient alternative is to save all commands in a single text file and provide it as input. An example is at the bottom of this document.

METAL accepts a variety of tabular formats for the input files, but certain information must be provided for each marker in each file (see "Mandatory input for each input file" below).

Several useful commands are typically set early in the analysis. For example, the user can choose to weight studies in the meta-analysis by the inverse of the standard error or by the square root of the sample size. These two weights are approximately proportional, because a study's standard error scales roughly as 1/sqrt(N). When weighting by standard error, users should make sure that the beta and standard error are in the same units for all studies (i.e. same trait and same transformation applied to the trait). The default weighting scheme is SAMPLESIZE. To weight by standard error instead, use:

SCHEME STDERR
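A sketch of standard-error weighting follows. It assumes each study provides a beta and its standard error on the same scale, and the name `stderr_meta` is hypothetical. The sketch uses the usual inverse-variance form, which weights each beta by 1/SE^2. Written in terms of per-study z-scores (beta/SE), the same statistic weights each z-score by 1/SE. That is the sense in which the text above describes weighting by the inverse of the standard error. The second function checks this equivalence.

```python
import math
from statistics import NormalDist


def stderr_meta(studies):
    """Inverse-variance meta-analysis of (beta, se) pairs for one marker.

    Returns (pooled_beta, pooled_se, z, two_sided_pvalue).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, studies)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    return beta, se, z, 2.0 * NormalDist().cdf(-abs(z))


def z_weighted_by_inverse_se(studies):
    """Same z-score, written as per-study z_i = beta_i/se_i weighted by 1/se_i."""
    numerator = sum((b / se) * (1.0 / se) for b, se in studies)
    return numerator / math.sqrt(sum(1.0 / se ** 2 for _, se in studies))
```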

METAL has an option to apply genomic control correction to all input files. METAL estimates the inflation of the test statistic by comparing the median test statistic to that expected by chance. It then applies the correction to the p-values (for a SAMPLESIZE-weighted meta-analysis) or to the standard errors (for a STDERR-weighted meta-analysis). Genomic control should only be applied to files with genome-wide data (i.e. it should not be used for cohorts that only genotyped replication SNPs). It can be turned on and off for different input files. We recommend applying genomic control correction to all input files, and also to the final output. To correct the final results, load the initial results file back into METAL and apply genomic control correction to it. To turn genomic control on:

GENOMICCONTROL ON
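The genomic control step can be sketched as follows. The sketch uses the usual definition of the inflation factor lambda: the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The function names are hypothetical. The sketch applies whatever lambda it is given, since the text above does not say how METAL treats lambda values below 1.

```python
import math
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()
# Median of a 1-df chi-square under the null: (z at the 75th percentile)^2, about 0.455.
CHI2_1DF_MEDIAN = STD_NORMAL.inv_cdf(0.75) ** 2


def gc_lambda(pvalues):
    """Estimate the genomic control inflation factor from two-sided p-values."""
    chi2 = [STD_NORMAL.inv_cdf(1.0 - p / 2.0) ** 2 for p in pvalues]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def gc_correct_pvalue(pvalue, lam):
    """SAMPLESIZE scheme: divide the chi-square by lambda and recompute the p-value."""
    z = STD_NORMAL.inv_cdf(1.0 - pvalue / 2.0) / math.sqrt(lam)
    return 2.0 * STD_NORMAL.cdf(-z)


def gc_correct_stderr(se, lam):
    """STDERR scheme: inflate the standard error by sqrt(lambda)."""
    return se * math.sqrt(lam)
```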

METAL can optionally keep track of the effect allele frequency across all files and report the mean, minimum and maximum. This is useful for checking whether frequencies are similar across cohorts after METAL has performed all strand alignment. When this feature is turned on, METAL requires every input file to have an allele frequency column.

AVERAGEFREQ ON
MINMAXFREQ ON

Then, for each individual file, specify the frequency column:

FREQLABEL EffectAlleleFrequencyColumnHeading
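A sketch of frequency tracking follows. It assumes alleles have already been strand-aligned, so each file's frequency refers either to the common reference allele or to the other allele. The unweighted mean is an assumption, because the text does not say whether METAL weights the average. The function names are hypothetical.

```python
def aligned_frequency(effect_allele, freq, reference_allele):
    """Express a file's effect-allele frequency relative to the common reference allele."""
    if effect_allele.upper() == reference_allele.upper():
        return freq
    return 1.0 - freq  # the file reported the other allele's frequency


def summarize_frequencies(records, reference_allele):
    """records: (effect_allele, frequency) pairs from each file, already strand-aligned.

    Returns (mean, minimum, maximum) of the reference-allele frequency.
    Uses an unweighted mean (an assumption; METAL may weight it differently).
    """
    freqs = [aligned_frequency(a, f, reference_allele) for a, f in records]
    return sum(freqs) / len(freqs), min(freqs), max(freqs)
```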

We allow users to keep cumulative counts of custom variables across input files. An example of this might be to keep track of the sample size when performing standard-error weighted meta-analysis. The name of the custom variable should be defined once, before input files are loaded. The name of the heading in each file can be specified using the command LABEL for each file.

CUSTOMVARIABLE TotalSampleSize

For each individual input file:

LABEL TotalSampleSize as N
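The sketch below shows how a custom variable might accumulate across files. It assumes rows are parsed into dictionaries keyed by column heading, with the marker name stored under 'MARKER'. Each file supplies its own heading for the variable, mirroring `LABEL TotalSampleSize as N`. `accumulate_custom` is a hypothetical name.

```python
from collections import defaultdict


def accumulate_custom(files):
    """Sum a custom variable per marker across input files.

    files: list of (rows, column) pairs, where rows are dicts keyed by column
    heading (marker name assumed under 'MARKER') and `column` is that file's
    heading for the variable, as set by LABEL.
    Markers missing from a file simply receive no contribution from it.
    """
    totals = defaultdict(float)
    for rows, column in files:
        for row in rows:
            totals[row["MARKER"]] += float(row[column])
    return dict(totals)
```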

We allow flexible input formats, including a method for providing SNPs on different strands. Input files can contain a column indicating which strand the alleles are coded on (given as +/-). This feature can be turned on and off for different files in the same analysis. If USESTRAND is off, the strand is assumed to be “+” for all SNPs. Obvious strand problems for unambiguous SNPs (e.g. one study provides A/G alleles and a different study provides C/T alleles) are still identified by METAL and handled appropriately.

USESTRAND ON

For each individual file:

STRAND StrandColumnHeading
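The strand logic can be sketched as follows, using hypothetical helper names. With a strand column, alleles marked '-' are complemented onto the '+' strand. Without one, a mismatch such as A/G versus C/T can still be recognized as the same SNP on opposite strands. A/T and C/G SNPs cannot be resolved this way.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_plus_strand(allele1, allele2, strand="+"):
    """Return the allele pair on the '+' strand (complemented when strand is '-')."""
    a1, a2 = allele1.upper(), allele2.upper()
    if strand == "-":
        a1, a2 = COMPLEMENT[a1], COMPLEMENT[a2]
    return a1, a2


def is_strand_ambiguous(allele1, allele2):
    """A/T and C/G SNPs read the same on both strands, so strand cannot be inferred."""
    return COMPLEMENT[allele1.upper()] == allele2.upper()


def opposite_strand(pair, other_pair):
    """True if two unambiguous allele pairs only match after complementing one of them."""
    p = {a.upper() for a in pair}
    q = {a.upper() for a in other_pair}
    if p == q or is_strand_ambiguous(*pair):
        return False
    return {COMPLEMENT[a] for a in p} == q
```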

METAL can output the individual summary statistics for all SNPs in all input files. This can create a very large file and should be used with caution. Before using this option, users should restrict the analysis to significant SNPs or specific SNPs of interest (for example, using custom variables and filters). The option is nevertheless useful for comparing the direction of effect across many studies. METAL takes care of all the strand flipping and reports the direction of effect relative to the same allele. It is also a way to double-check that METAL is using the expected data appropriately.

VERBOSE ON

Another option controls whether METAL checks that each input file has the expected number of columns, or ignores lines that do not have enough columns. The default is STRICT column counting. To relax it, use:

COLUMNCOUNTING LENIENT
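A minimal sketch of the two column-counting modes follows. It assumes each row is a list of already-split fields. Treating a short row as an error in STRICT mode and skipping it in LENIENT mode follows the description above. How METAL handles extra columns is not stated, so the sketch simply accepts them. `usable_row` is a hypothetical name.

```python
def usable_row(fields, expected_columns, lenient=False):
    """Decide whether a split input row can be used.

    STRICT (default): a row with too few columns is an error.
    LENIENT: such a row is silently skipped.
    Rows with extra columns are accepted (an assumption of this sketch).
    """
    if len(fields) >= expected_columns:
        return True
    if lenient:
        return False
    raise ValueError(f"expected {expected_columns} columns, found {len(fields)}")
```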


Mandatory input for each input file:

  • Marker name
  • Reference allele (also known as the ‘effect allele’) and the non-reference allele
  • P-value
  • Weight (sample size or standard error)
  • Direction of effect relative to reference allele

Tables must have column headers that specify where the mandatory input can be found. The default name for the marker column is ‘MARKER’, but it can be changed to match the relevant column in the input file with the following command:

MARKER SNP

Similarly, the allele columns (reference and non-reference), the p-value column and the effect column can be changed to match the input file:

ALLELE RefAlleleColumnHeading NonRefAlleleColumnHeading
PVALUE PvalueColumnHeading
EFFECT EffectColumnHeading
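The header lookup can be sketched as a mapping from each required field to its column index. Only the MARKER default ('MARKER') comes from the text. The other headings must be passed in, mirroring the ALLELE, PVALUE, EFFECT and WEIGHT commands. `resolve_columns` is a hypothetical name. Matching here is exact and case-sensitive, which may differ from METAL.

```python
def resolve_columns(header_fields, *, allele1, allele2, pvalue, effect, weight,
                    marker="MARKER"):
    """Map each required field to its column index in a header line.

    Only the MARKER default comes from the documentation; the other headings
    must be supplied explicitly.
    """
    wanted = {"marker": marker, "allele1": allele1, "allele2": allele2,
              "pvalue": pvalue, "effect": effect, "weight": weight}
    position = {name: i for i, name in enumerate(header_fields)}
    missing = [h for h in wanted.values() if h not in position]
    if missing:
        raise ValueError(f"missing column headings: {missing}")
    return {field: position[heading] for field, heading in wanted.items()}


# Hypothetical header resembling inputfile1.txt in the example script.
header = "SNP REF_ALLELE OTHER_ALLELE BETA N PVALUE".split()
columns = resolve_columns(header, marker="SNP", allele1="REF_ALLELE",
                          allele2="OTHER_ALLELE", pvalue="PVALUE",
                          effect="BETA", weight="N")
```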

We strongly recommend that both allele labels, corresponding to the effect allele and the non-effect allele respectively, are given for all SNPs. Alleles can be numeric (1,2,3,4) or alphabetical (A,C,G,T,a,c,g,t) and can be on either strand unless the SNP is an A/T or C/G SNP. For A/T or C/G SNPs, METAL requires SNPs to be on a consistent strand in different input files for the results to be interpretable. For A/C, A/G, C/T, and G/T SNPs, METAL will flip the strand of the alleles if it is not consistent between input files, and will output results with respect to the lowest numeric reference allele (see Examples 1, 2, and 3, below). If all files are consistent (for example, using the HapMap allele naming conventions), the strand of the alleles is left alone. As long as both allele columns are given for each input file, METAL appropriately accounts for situations when different input files use different reference alleles.
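The three examples below can be reproduced with the following simplified sketch, which uses hypothetical names. It assumes numeric alleles 1-4 stand for A, C, G, T, consistent with the third example, where 2/1 becomes a/c. Unambiguous SNPs are put on the strand whose allele pair contains A. The alphabetically lowest allele is then reported as the reference, and the sign of the effect is flipped when needed. Unlike METAL, which leaves consistent strands alone, this sketch always canonicalizes, so it is an illustration rather than METAL's algorithm.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
NUMERIC = {"1": "A", "2": "C", "3": "G", "4": "T"}  # assumed numeric coding


def normalize(allele):
    """Upper-case letters and translate numeric alleles 1-4 to A, C, G, T."""
    a = allele.strip().upper()
    return NUMERIC.get(a, a)


def align(ref, other, effect):
    """Express (ref, other, effect) relative to a canonical allele pair.

    Simplified rules for this sketch:
    - unambiguous SNPs are put on the strand whose pair contains A (A/C or A/G);
    - the alphabetically lowest allele becomes the reference, and the effect's
      sign is flipped if the file used the other allele as reference.
    """
    r, o = normalize(ref), normalize(other)
    ambiguous = COMPLEMENT[r] == o  # A/T or C/G: strand cannot be inferred
    if not ambiguous and "A" not in (r, o):
        r, o = COMPLEMENT[r], COMPLEMENT[o]
    if r > o:
        r, o, effect = o, r, -effect
    return r.lower(), o.lower(), effect
```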

P-values of 0, or any non-numeric value, are treated as missing. Missing values are tolerated: a meta-analysis p-value will include results from any input file with non-missing values, even if only one input file has a p-value for this marker (see MINWEIGHT below for exclusion of markers with a small combined N).

The EFFECT column can contain positive and negative values (beta values from regression, for example), or simply directions of effect relative to the reference allele, listed as “+” and “-”. For example, an EFFECT of “+” (or any positive number) with respect to reference allele A (the effect allele) means that increasing numbers of copies of allele A are associated with increasing trait values.
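Parsing of the p-value and EFFECT entries, as described above, might look like the sketch below (hypothetical helper names). Treating NaN as missing is an additional assumption of this sketch.

```python
import math


def parse_pvalue(text):
    """Return a p-value, or None when it is 0 or non-numeric (treated as missing)."""
    try:
        p = float(text)
    except ValueError:
        return None
    if p == 0 or math.isnan(p):  # NaN handling is an assumption of this sketch
        return None
    return p


def parse_direction(text):
    """Return +1, -1 or 0 for an EFFECT entry given as '+', '-' or a number (e.g. a beta)."""
    text = text.strip()
    if text == "+":
        return 1
    if text == "-":
        return -1
    value = float(text)
    return (value > 0) - (value < 0)
```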

To perform an odds-ratio-based meta-analysis, select SCHEME STDERR at the beginning of the script. Then, for each file, provide the natural log of the odds ratio as the EFFECT column:

EFFECT logOddsRatioColumnHeading

Alternatively, METAL can compute the log of the odds ratio for you:

EFFECT log(OddsRatioColumnHeading)
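The sketch below shows why the log scale matters. It assumes each study reports an odds ratio and the standard error of its natural log, and the name `log_or_meta` is hypothetical. Pooling log odds ratios with inverse-variance weights treats an odds ratio of 2 and one of 0.5 symmetrically, which averaging raw odds ratios would not.

```python
import math


def log_or_meta(studies):
    """Inverse-variance pooling of odds ratios on the natural-log scale.

    studies: (odds_ratio, se_of_log_odds_ratio) pairs.
    Returns (pooled_log_or, pooled_se, pooled_odds_ratio).
    """
    betas = [math.log(odds_ratio) for odds_ratio, _ in studies]
    weights = [1.0 / se ** 2 for _, se in studies]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    return beta, math.sqrt(1.0 / total_w), math.exp(beta)
```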

The weight for each marker can be assigned from a column:

WEIGHTLABEL SampleSizeColumnHeading

Or:

WEIGHT SampleSizeColumnHeading

Alternatively, a single default weight can be specified for the entire file. For example, if you have a sample size of 2000 for all markers in an input file:

DEFAULTWEIGHT 2000
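Per-marker weight lookup can be sketched as below. The precedence shown, where a configured weight column wins over DEFAULTWEIGHT, is an assumption. `marker_weight` is a hypothetical name.

```python
def marker_weight(row, weight_column=None, default_weight=None):
    """Weight for one marker: the WEIGHT column if configured, else DEFAULTWEIGHT.

    The precedence between the two is an assumption of this sketch.
    """
    if weight_column is not None:
        return float(row[weight_column])
    if default_weight is not None:
        return float(default_weight)
    raise ValueError("no weight column or default weight configured")
```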

The default delimiter in METAL is WHITESPACE (a space or tab is treated as a delimiter), but it can be changed to comma, tab or space:

SEPARATOR commas
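A sketch of splitting a line under different separator settings follows. WHITESPACE and COMMAS appear in the text and the example script. The 'TAB' and 'SPACE' keywords used here are hypothetical stand-ins for the tab and space options.

```python
def split_fields(line, separator="WHITESPACE"):
    """Split one input line according to a separator setting.

    WHITESPACE and COMMAS appear in the documentation; TAB and SPACE are
    hypothetical names for the tab and single-space options.
    """
    line = line.rstrip("\r\n")
    separator = separator.upper()  # the text uses both 'commas' and 'COMMAS'
    if separator == "WHITESPACE":
        return line.split()  # any run of spaces and/or tabs
    if separator == "COMMAS":
        return line.split(",")
    if separator == "TAB":
        return line.split("\t")
    if separator == "SPACE":
        return line.split(" ")
    raise ValueError(f"unknown separator: {separator}")
```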

Custom filters can be used to select SNPs for inclusion in the meta-analysis. A filter selects SNPs above or below a certain value (> or <) in any column of the table. This is useful, for example, for including only SNPs with a minor allele frequency above a certain threshold:

FILTER N > 1000
CUSTOMVARIABLE MAF
LABEL MAF as MAF
FILTER MAF > 0.01

To remove filters so that they no longer apply to files processed later, use:

REMOVEFILTERS
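Filter bookkeeping can be sketched as a list of conditions that every row must pass, cleared by REMOVEFILTERS. The class and method names are hypothetical. Only the > and < comparisons mentioned in the text are supported.

```python
import operator

COMPARISONS = {">": operator.gt, "<": operator.lt}


class FilterSet:
    """Conditions that a row must satisfy to enter the meta-analysis."""

    def __init__(self):
        self.conditions = []

    def add(self, column, op, threshold):
        """Mirror of e.g. `FILTER N > 1000`."""
        self.conditions.append((column, COMPARISONS[op], float(threshold)))

    def remove_all(self):
        """Mirror of REMOVEFILTERS: files processed later are no longer filtered."""
        self.conditions.clear()

    def keep(self, row):
        """True if the row passes every active condition."""
        return all(cmp(float(row[col]), thr) for col, cmp, thr in self.conditions)
```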

Once the appropriate WEIGHT, MARKER, PVALUE and EFFECT labels are defined, with or without the optional FREQLABEL, SEPARATOR, STRAND, FILTER and LABEL commands, load an input file:

PROCESS firstinputfile_bmi.txt

METAL does not require every input file to have a p-value for a marker in order to calculate a meta-analysis p-value; any available data are used. To restrict the output to markers with at least a specific combined weight (number of individuals), use MINWEIGHT. For example, to show only markers with at least 10,000 individuals:

MINWEIGHT 10000
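MINWEIGHT can be sketched as a filter on each marker's combined weight, summed over the files in which the marker has a non-missing result. `apply_minweight` is a hypothetical name. "At least" is implemented as >=, following the wording above.

```python
def apply_minweight(weights_by_marker, min_weight):
    """Keep markers whose combined weight is at least min_weight.

    weights_by_marker: marker -> list of weights from the files where the marker
    has a non-missing result. Returns marker -> combined weight.
    """
    combined = {m: sum(ws) for m, ws in weights_by_marker.items()}
    return {m: w for m, w in combined.items() if w >= min_weight}
```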

Once all input files have had their column names defined and have been loaded, define your output file prefix and suffix (optional) and analyze:

OUTFILE myoutputprefix .tbl
ANALYZE

METAL can also evaluate the evidence for heterogeneity. When you do this, METAL performs a second pass of analysis to decide whether the observed effect sizes (or test statistics) are homogeneous across samples. This results in a test statistic with n-1 degrees of freedom for n samples. To request it, use:

ANALYZE HETEROGENEITY
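One standard heterogeneity statistic with n-1 degrees of freedom is Cochran's Q, computed from inverse-variance-weighted betas. Q needs the pooled estimate first, which fits the second pass described above. The sketch assumes this choice, since the text does not give METAL's exact formula, and the function names are hypothetical. It uses a closed-form chi-square tail probability for integer degrees of freedom so that it needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1 (closed form)."""
    y = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def cochran_q(studies):
    """Cochran's Q for (beta, se) pairs: returns (Q, degrees_of_freedom, pvalue)."""
    w = [1.0 / se ** 2 for _, se in studies]
    pooled = sum(wi * b for wi, (b, _) in zip(w, studies)) / sum(w)
    q = sum(wi * (b - pooled) ** 2 for wi, (b, _) in zip(w, studies))
    df = len(studies) - 1
    return q, df, (chi2_sf(q, df) if df > 0 else 1.0)
```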

Example 1: Strand flips required

                ALLELES   EFFECT   ALLELES Analyzed   EFFECT Analyzed
Input file 1    T/G       +        a/c                +
Input file 2    T/G       +        a/c                +
Input file 3    A/C       +        a/c                +
Output                             a/c                +

Example 2: Reference allele flips required

                ALLELES   EFFECT   ALLELES Analyzed   EFFECT Analyzed
Input file 1    C/A       -        a/c                +
Input file 2    C/A       -        a/c                +
Input file 3    A/C       +        a/c                +
Output                             a/c                +

Example 3: Strand flips, numeric flips, and reference allele flips required

                ALLELES   EFFECT   ALLELES Analyzed   EFFECT Analyzed
Input file 1    G/T       -        a/c                +
Input file 2    2/1       -        a/c                +
Input file 3    A/C       +        a/c                +
Output                             a/c                +


Example text file to run METAL:

# THIS FILE EXECUTES AN ANALYSIS OF ALL AVAILABLE INFORMATION
mkdir output-metal
metal << EOT

# loading in the first half of inputfiles
MARKER   SNP
ALLELE   REF_ALLELE OTHER_ALLELE
EFFECT   BETA
WEIGHT   N
PVALUE   PVALUE
PROCESS  inputfile1.txt
PROCESS  inputfile2.txt
PVALUE   pvalue
ALLELE   A_REF OTHER_ALLELE
MARKER   SNP
EFFECT   BETA
WEIGHT   N
PROCESS  inputfile3.txt
MARKER   MARKERNAME
ALLELE   EFFECTALLELE NON_EFFECT_ALLELE
EFFECT   EFFECT1
WEIGHT   NONMISS
PVALUE   PVALUE
PROCESS  inputfile4.txt

# meta-analysis can be performed at any stage
# and will include inputfiles 1-4
OUTFILE  METAANALYSIS_inputfile1to4_ .tbl
ANALYZE

# load the second half of inputfiles
MARKER   rsid
ALLELE   EFFECT_ALLELE OTHER_ALLELE
EFFECT   BETA
WEIGHT   total_N
PVALUE   Add_p
SEPARATOR COMMAS
PROCESS  inputfile5.txt
PROCESS  inputfile6.txt
ALLELE   ALLELE OTHER_ALLELE
MARKER   SNP
EFFECT   BETA
WEIGHT   N
PVALUE   PVALUE
SEPARATOR WHITESPACE
PROCESS  inputfile7.txt
ALLELE   BETA_ALLELE OTHER_ALLELE
PVALUE   P_VAL
MARKER   SNP
EFFECT   BETA
WEIGHT   N
PROCESS  inputfile8.txt

# for the final meta-analysis of all 8 samples
# only output results if the combined weight
# is at least 10000 people
OUTFILE  METAANALYSIS_inputfile1-8_ .tbl
MINWEIGHT 10000
ANALYZE

QUIT
EOT