Garlic

From Genome Analysis Wiki

Revision as of 21:21, 4 April 2011

Garlic is a quality control program for GWAS association results. The quality-controlled results are typically fed into a meta-analysis program such as METAL.


Garlic is currently only available on the CSG cluster.


Running garlic requires the creation of a configuration file, which tells the program:

  • The names of the studies (datasets) and their locations
  • The quality control criteria to use for filtering SNPs
  • How to apply genomic control (using all available SNPs, or a subset)
  • Other settings, described below
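As an illustration of the genomic control setting, the sketch below estimates the inflation factor lambda either from all SNPs or from a subset. It is not Garlic's code: the function names are hypothetical, and it assumes that lambda is the median of the 1-df chi-square statistics (beta/SE)^2 divided by the median of the chi-square distribution with 1 degree of freedom, which is the usual definition but is not confirmed by this page.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(betas, ses):
    """Estimate lambda as median((beta/se)^2) / median of chi-square(1 df)."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def lambda_from_rows(rows, gc_snps=None):
    """rows: iterable of (snp, beta, se) tuples.

    gc_snps: optional set of SNP names (hypothetical stand-in for the
    contents of gc_snp_file). If None, all SNPs are used.
    """
    kept = [(b, s) for snp, b, s in rows if gc_snps is None or snp in gc_snps]
    if not kept:
        raise ValueError("no SNPs available for genomic control")
    betas, ses = zip(*kept)
    return genomic_control_lambda(betas, ses)


def gc_corrected_se(se, lam):
    """Inflate a standard error by sqrt(lambda); lambda below 1 is treated as 1."""
    return se * max(lam, 1.0) ** 0.5
```

Passing a set of SNP names as gc_snps corresponds to the "subset" option; leaving it as None corresponds to using all available SNPs.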

Creating a configuration file

A sample configuration file is given below, along with comments that describe each setting.

## ------------------------------------------------------
## Default values section. 
## These values apply to all studies, unless overridden 
## in their sections. 
## ------------------------------------------------------

# Directory to write your QC'd data files. 
# The default is the current directory. 
out_dir = .

# Prefix to add to your QC'd files. 
# Defaults to "quality_".
out_prefix = "quality_"

# Missing data character for QC'd data files. 
# This is the character that will be used for missing values, 
# regardless of what the input file used. 
out_missing = .

# Delimiter to use when writing QC'd data files. 
# This can be "tab", "space", "comma", or an actual character
out_delim = tab

# Default delimiter when reading data files. 
# If you set "delim = guess", the program will try to guess the delimiter
# for each file. Ideally, you should provide the delimiter for each 
# file by including them in the study sections (see below.) 
delim = tab

# Default missing data characters. 
# Give a list of characters separated by space. 
# This means space is not a valid missing data character!
missing = . NA

# Specify the names of columns in your datasets. 
# These can be overridden in the study sections if one of the files
# deviates from the rest (for example, if the sample size column 
# is N_QTL, but one study mistakenly names it NQTL.)  
snp = SNP
strand = STRAND
build = BUILD
chr = CHR
pos = POS
effect_allele = EFFECT_ALLELE
non_effect_allele = NON_EFFECT_ALLELE
n = N
hwe_pval = HWE_PVAL
call_rate = CALL_RATE
beta = BETA
se = SE
pval = PVAL
imputed = IMPUTED
imp_quality = IMP_QUALITY

# Genomic control can be computed from a list of SNPs, rather than genome-wide. 
# Give a file with one SNP per line. 
# Example:
# gc_snp_file = /path/to/QT_Replication_SNPs

# QC cutoff thresholds. 
# call_rate_cutoff -- Call rate
# hwe_cutoff -- Hardy Weinberg p-value 
# se_cutoff -- Standard error 
# mac_cutoff -- Minor allele count
# maf_cutoff -- Minor allele frequency 
call_rate_cutoff = 0.95
hwe_cutoff = 1E-04
se_cutoff = 10
mac_cutoff = 10
maf_cutoff = 0.01

# SNPs can be renamed according to a list of substitutions. 
# This is useful if chr:pos SNP names differ on chromosome names
# (see below for an example.) 
# Note: you should list more complicated substitutions before simpler ones. 
# Substitutions are applied in the order they are given. 
# snp_rename = ('chr23','chrX'), ('chrmt','chrMT'), ('chrc6_COX','chr6'), ('chr24','chrY'), ('chr6_QBL','chr6'), ('c6_QBL','chr6'), ('c6_COX','chr6')

## ------------------------------------------------------
## Study-specific sections. 
## Each section should contain at minimum the location 
## of the study's association results file.  
## ------------------------------------------------------

# An example study section below: 
# delim = comma
# file = /path/to/AMCPAS_CASES_FULLMETABOCHIP_06AUG2010_SK.txt.gz
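To make two of the settings above more concrete, here is a minimal sketch of how ordered SNP renaming and the QC cutoffs might act on a single row. It is not Garlic's implementation. The function names are hypothetical, snp_rename is assumed to do plain substring replacement, and the direction of each comparison (drop if call rate or HWE p-value is below its cutoff, or if the standard error is above its cutoff) is an assumption. The MAC and MAF cutoffs are left out because the sample configuration does not name an allele frequency column.

```python
def rename_snp(name, substitutions):
    """Apply (old, new) substring substitutions in the order given.

    Assumption: plain substring replacement, not regular expressions.
    """
    for old, new in substitutions:
        name = name.replace(old, new)
    return name


def qc_failures(row, cutoffs):
    """Return the names of the QC checks that a row fails.

    row: dict of column name -> float, using the default column names
    from the sample configuration. Comparison directions are assumed.
    """
    failures = []
    if row["CALL_RATE"] < cutoffs["call_rate_cutoff"]:
        failures.append("call_rate")
    if row["HWE_PVAL"] < cutoffs["hwe_cutoff"]:
        failures.append("hwe")
    if row["SE"] > cutoffs["se_cutoff"]:
        failures.append("se")
    return failures


# Order matters: the longer pattern must come before the shorter one.
good_order = [("chrc6_COX", "chr6"), ("c6_COX", "chr6")]
bad_order = [("c6_COX", "chr6"), ("chrc6_COX", "chr6")]
renamed_good = rename_snp("chrc6_COX:100", good_order)
renamed_bad = rename_snp("chrc6_COX:100", bad_order)
```

With the longer pattern first, "chrc6_COX:100" becomes "chr6:100". With the shorter pattern first, the "c6_COX" part is replaced inside the longer name and the result is "chrchr6:100", which is why the order of the substitutions matters.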

Running garlic

On snowwhite, wonderland, or fantasia, run:

garlic /path/to/your/config_file

If garlic is not on your PATH, you can refer to it directly as /usr/cluster/bin/garlic.


Garlic creates two important files: a log file and a file of plots.


The log file contains a copy of all console output from the program. It primarily contains:

  • Information on how many rows (SNPs) were dropped due to QC criteria
  • Genomic control information
  • Any errors encountered when reading the file, including:
    • Bad strands or alleles in the file
    • Duplicated SNPs or positions
    • P-values that don't match their beta/SE
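The last check in the list can be sketched as follows. This is only an illustration, not Garlic's code: it assumes the reported p-value comes from a two-sided Wald test on z = beta/SE under a standard normal distribution, and the function names and the 5% relative tolerance are hypothetical. Studies that used a t-distribution or another test would need a different comparison.

```python
import math


def wald_pvalue(beta, se):
    """Two-sided p-value for z = beta / se under a standard normal."""
    z = abs(beta / se)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for z >= 0.
    return math.erfc(z / math.sqrt(2.0))


def pvalue_matches(beta, se, reported_p, rel_tol=0.05):
    """True if the reported p-value is within rel_tol of the Wald p-value."""
    expected = wald_pvalue(beta, se)
    return math.isclose(expected, reported_p, rel_tol=rel_tol, abs_tol=1e-300)
```

For example, a row with beta = 1.96 and SE = 1 has a Wald p-value close to 0.05, so a reported p-value of 0.5 for that row would be flagged as inconsistent.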


The second file contains QQ plots for each study, and allele frequency plots for each pair of studies.

Reporting bugs/issues

Email welchr@umich.edu.