METAL Quick Start

From Genome Analysis Wiki

Learning by Example

Many people find it easiest to learn by example. So, if reading through our extensive documentation is not for you, you might find it more appealing to walk through a simple METAL analysis.

The Glucose Data

This example examines evidence for association between fasting glucose levels and genetic markers in the G6PC2 (chr. 2), GCK (chr. 7) and MTNR1B (chr. 11) regions. It uses results from three genome-wide association studies: FUSION, SardiNIA and DGI. Genetic variants in the three loci impact fasting glucose levels and, in the case of MTNR1B, also impact the risk of type 2 diabetes.

You can download a copy of the input files used in this example analysis from the METAL Download Page.

Input Files

Initially, each of the three studies analyzed association between fasting glucose levels and genotyped and imputed variants in each of the three loci. Study specific results are stored in the following files:

Input Files with Single Study Results
STUDY     INPUT FILE                   FILE SIZE
FUSION    MAGIC_FUSION_Results.txt.gz  46 kb
SardiNIA  magic_SARDINIA.tbl           188 kb
DGI       DGI_three_regions.txt        188 kb

Although some effort was made to harmonize analysis strategies (for example, by excluding individuals with a diagnosis of diabetes as well as other individuals with elevated fasting glucose levels), you will notice that the three files are formatted somewhat differently. In order to combine results across studies, one critical piece of information that METAL needs is a description of the formatting used for each file. You probably also noticed that the FUSION input file is smaller than those for the other studies. This is because it has been compressed with gzip (www.gzip.org). This is not a problem, because METAL can transparently handle gzip-compressed files.
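Before describing each file to METAL, it can help to look at the column headers it actually contains. The following is a minimal Python sketch for doing that outside of METAL; it is not part of METAL itself. It assumes that each file starts with a header line and that columns are separated by whitespace, neither of which is guaranteed for every study. The helper names open_text and read_header are hypothetical.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a results file as text, decompressing it if it is gzip-compressed.

    Compression is detected from the two gzip magic bytes at the start of
    the file rather than from the file extension.
    """
    path = Path(path)
    with path.open("rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return path.open("r", encoding="utf-8")


def read_header(path):
    """Return the column names on the first line (assumes whitespace-separated columns)."""
    with open_text(path) as handle:
        return handle.readline().split()


if __name__ == "__main__":
    # File names are those listed in the table above; they must be in the
    # current directory for the headers to be printed.
    for name in (
        "MAGIC_FUSION_Results.txt.gz",
        "magic_SARDINIA.tbl",
        "DGI_three_regions.txt",
    ):
        if Path(name).exists():
            print(name, read_header(name))
        else:
            print(name, "not found in the current directory")
```

The column names printed for each file are the ones that the study-specific commands in a METAL script need to refer to.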

Running METAL

If you haven't used METAL before, try starting the program from the Linux or Windows command prompt. By default, METAL runs in interactive mode and responds to each command you type (unfortunately, there is no point-and-click interface).

For example, you can try issuing the HELP command, to which METAL responds by printing a list of available options. Or you could issue the command MARKERLABEL SNP to indicate that marker names are tabulated in a column labelled SNP. METAL would respond to this latter command by reporting:

## Set marker header to SNP ...

For convenience, many commands can be shortened. For example, instead of writing MARKERLABEL SNP, you could write MARKER SNP. If you make a mistake and METAL doesn't understand your command, it will usually say:

## ERROR: The command you issued could not be processed ...

Interactive mode is an easy way to learn METAL. Once you are familiar with the basic workings of the program, it will typically be better to store a series of commands in a METAL script, which can be conveniently edited and run multiple times. To run commands stored in a script, just specify the script name on the METAL command line. As with other command line programs, you can redirect screen output to a file using the > operator.
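If you drive your analyses from Python rather than from a shell prompt, the same "run a script and keep the screen output" step can be sketched as below. This is an illustration only: the executable name metal is an assumption about how the program is installed on your system, and the function name run_metal_script is hypothetical. Unlike the > operator, which captures standard output only, this sketch also sends error output to the same log file.

```python
import subprocess
import sys


def run_metal_script(script_path, log_path, executable="metal"):
    """Run `executable script_path` and write its screen output to log_path.

    Similar in spirit to typing `metal script_path > log_path` at a prompt.
    The default executable name "metal" is an assumption, not a guarantee.
    Returns the exit code of the program.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        result = subprocess.run(
            [executable, script_path],
            stdout=log,
            stderr=subprocess.STDOUT,  # keep error messages in the same log
            check=False,
        )
    return result.returncode


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(run_metal_script(sys.argv[1], sys.argv[2]))
    print("usage: python run_metal.py SCRIPT LOGFILE")
```

Keeping the log next to the script makes it easy to check later which input files were processed and which messages METAL reported.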

The METAL Script

We will now walk through the METAL Glucose Example Script. The first thing to know is that METAL scripts can include comments and that these are indicated by using a hash sign # as the first character in a line. Thus:

 # This is a comment.

Our example script starts with a series of comments, which we will ignore for now. Instead, we will proceed directly to the description of study specific input files -- which is an essential step in any meta-analysis with METAL.

Describing the DGI Input Files

METAL expects that study-specific results will be stored in a plain text tabular file, with one line per marker. While that sounds simple, there is a wide variety of ways to implement the details. Thus, the first step in any analysis is to specify these details for each study being analyzed. The Diabetes Genetics Initiative (DGI) results are described in the following snippet of METAL code:

MARKER   SNP
WEIGHT   N
ALLELE   EFFECT_ALLELE NON_EFFECT_ALLELE
FREQ     EFFECT_ALLELE_FREQ
EFFECT   BETA
STDERR   SE
PVAL     P_VAL

Each line specifies the header for a key column in the input. The MARKER SNP command indicates that the marker name is stored in a column labelled SNP. The WEIGHT N command indicates that the number of individuals analyzed for each row is stored in a column labelled N; this number can be used to weight the contribution of each study in sample size and p-value based meta-analysis. The ALLELE EFFECT_ALLELE NON_EFFECT_ALLELE command indicates that the two allele labels are stored in columns labelled EFFECT_ALLELE and NON_EFFECT_ALLELE, and the FREQ EFFECT_ALLELE_FREQ command indicates that the frequency of the first of these alleles is stored in the column EFFECT_ALLELE_FREQ. Finally, the EFFECT BETA command indicates that the effect size is stored in a column labelled BETA, and the STDERR SE and PVAL P_VAL commands indicate that the standard error and p-value are stored in columns labelled SE and P_VAL.
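To see why the weight, effect direction and p-value columns matter, it may help to work through how a sample size and p-value based meta-analysis can combine them for a single marker. The Python sketch below uses a common scheme: each study's p-value is converted to a Z-score signed by the direction of effect, and the Z-scores are combined with weights equal to the square root of the sample size. Whether this matches METAL's internal calculations in every detail is an assumption. The function names are hypothetical, the numbers are made up rather than taken from the glucose data, and all effects are assumed to refer to the same allele.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal distribution


def signed_z(p_value, effect):
    """Convert a two-sided p-value and an effect direction into a signed Z-score.

    p_value must be greater than 0 and at most 1.
    """
    magnitude = -_NORMAL.inv_cdf(p_value / 2.0)
    return magnitude if effect >= 0 else -magnitude


def sample_size_meta(studies):
    """Combine (n, effect, p_value) tuples for one marker.

    Each study is weighted by sqrt(n). Returns (overall_z, two_sided_p).
    """
    numerator = 0.0
    sum_sq_weights = 0.0
    for n, effect, p_value in studies:
        weight = sqrt(n)
        numerator += weight * signed_z(p_value, effect)
        sum_sq_weights += weight * weight
    overall_z = numerator / sqrt(sum_sq_weights)
    overall_p = 2.0 * _NORMAL.cdf(-abs(overall_z))
    return overall_z, overall_p


if __name__ == "__main__":
    # Made-up (N, BETA, P_VAL) values for one marker in three studies.
    example = [(1000, 0.08, 0.01), (4000, 0.05, 0.002), (1500, -0.01, 0.60)]
    z, p = sample_size_meta(example)
    print(f"Z = {z:.3f}, P = {p:.3g}")
```

In this scheme only the sign of the effect column is used; the standard error column would matter for an analysis that weights effect size estimates by their precision instead.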

Describing the FUSION Input Files

Describing the SardiNIA Input Files

Acknowledgements

Thanks to colleagues in the FUSION, DGI and SardiNIA studies for sharing study results for the MTNR1B, G6PC2 and GCK regions. Special thanks also to Josee Dupuis, at Boston University, for helping prepare scripts and input files for this worked example.