Difference between revisions of "Garlic"

Revision as of 21:21, 4 April 2011

Garlic is a quality control program for GWAS results. Its output is typically fed into a meta-analysis program such as METAL.

Availability

Garlic is currently only available on the CSG cluster.

Usage

Running garlic requires the creation of a configuration file, which tells the program:

The names of the studies (datasets) and their locations
The quality control criteria to use for filtering SNPs
How to apply genomic control (using all available SNPs, or a subset)
.. and other information listed below.
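
As a rough illustration of what filtering SNPs on QC criteria means, the sketch below applies the five cutoffs from the sample configuration file to a single SNP record. The comparison directions (drop when call rate, HWE p-value, MAC, or MAF fall below the cutoff, and when SE exceeds it), the minor allele count formula 2 * N * MAF, and the function name `qc_failures` are assumptions made for this example; garlic's actual rules may differ.

```python
# Cutoff names and values copied from the sample configuration file.
CUTOFFS = {
    "call_rate_cutoff": 0.95,
    "hwe_cutoff": 1e-4,
    "se_cutoff": 10.0,
    "mac_cutoff": 10.0,
    "maf_cutoff": 0.01,
}


def qc_failures(row, cutoffs=CUTOFFS):
    """Return the list of QC criteria that a SNP record fails.

    Assumed (not taken from garlic): the direction of each comparison
    and the minor allele count formula 2 * N * MAF.
    """
    maf = min(row["eaf"], 1.0 - row["eaf"])  # minor allele frequency
    mac = 2.0 * row["n"] * maf               # assumed minor allele count
    failures = []
    if row["call_rate"] < cutoffs["call_rate_cutoff"]:
        failures.append("call_rate")
    if row["hwe_pval"] < cutoffs["hwe_cutoff"]:
        failures.append("hwe")
    if row["se"] > cutoffs["se_cutoff"]:
        failures.append("se")
    if mac < cutoffs["mac_cutoff"]:
        failures.append("mac")
    if maf < cutoffs["maf_cutoff"]:
        failures.append("maf")
    return failures


# Two made-up records: a common SNP and a very rare one.
good = {"eaf": 0.30, "n": 1000, "call_rate": 0.99, "hwe_pval": 0.5, "se": 0.05}
rare = {"eaf": 0.998, "n": 1000, "call_rate": 0.99, "hwe_pval": 0.5, "se": 0.05}
print(qc_failures(good))
print(qc_failures(rare))
```

The common SNP passes every check, while the rare one fails both the MAC and MAF checks under these assumptions.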

Creating a configuration file

A sample configuration file is given below, along with comments that describe each setting.

## ------------------------------------------------------
## Default values section. 
## These values apply to all studies, unless overridden 
## in their sections. 
## ------------------------------------------------------

[defaults]
# Directory to write your QC'd data files. 
# The default is the current directory. 
out_dir = .

# Prefix to add to your QC'd files. 
# Defaults to: 
out_prefix = "quality_"

# Missing data character for QC'd data files. 
# This is the character that will be used for missing values, 
# regardless of what the input file used. 
out_missing = .

# Delimiter to use when writing QC'd data files. 
# This can be "tab", "space", "comma", or an actual character
out_delim = tab

# Default delimiter when reading data files. 
# If you set "delim = guess", the program will try to guess the delimiter
# for each file. Ideally, you should provide the delimiter for each 
# file by including them in the study sections (see below.) 
delim = tab

# Default missing data characters. 
# Give a list of characters separated by space. 
# This means space is not a valid missing data character!
missing = . NA

# Specify the names of columns in your datasets. 
# These can be overridden in the study sections if one of the files
# deviates from the rest (for example, if the sample size column 
# is N_QTL, but one study mistakenly uses NQTL.)
snp = SNP
strand = STRAND
build = BUILD
chr = CHR
pos = POS
effect_allele = EFFECT_ALLELE
non_effect_allele = NON_EFFECT_ALLELE
n = N
hwe_pval = HWE_PVAL
eaf = EFFECT_ALLELE_FREQ
call_rate = CALL_RATE
beta = BETA
se = SE
pval = PVAL
imputed = IMPUTED
imp_quality = IMP_QUALITY

# Genomic control can be computed from a list of SNPs, rather than genome-wide. 
# Give a file with one SNP per line. 
# Example:
# gc_snp_file = /path/to/QT_Replication_SNPs

# QC cutoff thresholds. 
# call_rate_cutoff -- Call rate
# hwe_cutoff -- Hardy Weinberg p-value 
# se_cutoff -- Standard error 
# mac_cutoff -- Minor allele count
# maf_cutoff -- Minor allele frequency 
call_rate_cutoff = 0.95
hwe_cutoff = 1E-04
se_cutoff = 10
mac_cutoff = 10
maf_cutoff = 0.01

# SNPs can be renamed according to a list of substitutions. 
# This is useful if chr:pos SNP names differ on chromosome names
# (see below for an example.) 
# Note: substitutions are applied in the order they are given, so list
# longer, more specific patterns before shorter ones. 
# snp_rename = ('chr23','chrX'), ('chrmt','chrMT'), ('chrc6_COX','chr6'), ('chr24','chrY'), ('chr6_QBL','chr6'), ('c6_QBL','chr6'), ('c6_COX','chr6')

## ------------------------------------------------------
## Study-specific sections. 
## Each section should contain at minimum the location 
## of the study's association results file.  
## ------------------------------------------------------

# An example study section below: 
# [AMCPAS]
# delim = comma
# snp = MARKER_NAME
# file = /path/to/AMCPAS_CASES_FULLMETABOCHIP_06AUG2010_SK.txt.gz
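
To make the defaults-and-override behaviour concrete, here is a minimal sketch that reads a trimmed-down copy of the sample file with Python's standard `configparser` module. It is not garlic's own parser: the function name `load_studies` and the decision to treat `[defaults]` as configparser's default section are assumptions made for illustration.

```python
import configparser

# Trimmed-down copy of the sample configuration, with the example study enabled.
SAMPLE_CONFIG = """
[defaults]
delim = tab
missing = . NA
snp = SNP

[AMCPAS]
delim = comma
snp = MARKER_NAME
file = /path/to/AMCPAS_CASES_FULLMETABOCHIP_06AUG2010_SK.txt.gz
"""


def load_studies(text):
    """Return {study name: settings}, with [defaults] filled in per study."""
    # Treating [defaults] as configparser's default section makes every
    # study section inherit its values unless the study overrides them.
    parser = configparser.ConfigParser(default_section="defaults",
                                       interpolation=None)
    parser.read_string(text)
    studies = {}
    for name in parser.sections():  # the default section is not listed here
        settings = dict(parser[name])
        if "file" not in settings:
            raise ValueError(f"study [{name}] has no 'file' setting")
        studies[name] = settings
    return studies


studies = load_studies(SAMPLE_CONFIG)
amcpas = studies["AMCPAS"]
print(amcpas["delim"])            # overridden in the study section
print(amcpas["missing"].split())  # inherited from [defaults]
```

A study section therefore only needs to list `file` plus whatever differs from the defaults.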

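The ordering note on `snp_rename` can be illustrated with a short sketch. It assumes the substitutions are plain substring replacements applied one after another; whether garlic uses substring or pattern matching is not stated here, and the function name `rename_snp` is hypothetical.

```python
# Substitution pairs copied from the commented-out snp_rename example.
SNP_RENAME = [
    ("chr23", "chrX"), ("chrmt", "chrMT"), ("chrc6_COX", "chr6"),
    ("chr24", "chrY"), ("chr6_QBL", "chr6"), ("c6_QBL", "chr6"),
    ("c6_COX", "chr6"),
]


def rename_snp(name, substitutions=SNP_RENAME):
    """Apply each (old, new) pair in order using substring replacement."""
    for old, new in substitutions:
        name = name.replace(old, new)
    return name


print(rename_snp("chr23:1234"))
print(rename_snp("chrc6_COX:1234"))

# Listing the shorter pattern first rewrites only part of the name,
# so the longer pattern never gets a chance to match.
wrong_order = [("c6_COX", "chr6"), ("chrc6_COX", "chr6")]
print(rename_snp("chrc6_COX:1234", wrong_order))
```

With the order from the sample file, `chrc6_COX:1234` becomes `chr6:1234`; with the shorter pattern first, it becomes `chrchr6:1234`.
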
Running garlic

On snowwhite/wonderland/fantasia, you can run:

garlic /path/to/your/config_file

If garlic is not on your PATH, you can refer to it directly as /usr/cluster/bin/garlic.

Output

Garlic creates two important files:

garlic_log.txt

The log file contains a copy of all console output by the program. It primarily contains:

Information on how many rows (SNPs) were dropped due to QC criteria
Genomic control information
Any errors encountered when reading the file, including:
- Bad strands or alleles in the file
- Duplicated SNPs or positions
- P-values that don't match their beta/SE
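
The last check can be sketched as follows, assuming the study reported a two-sided p-value from a normal (Wald) test of beta/SE. Studies that used a t-test or a likelihood ratio test will not match exactly. The function names and the relative tolerance are hypothetical choices for this example, not garlic's.

```python
import math


def expected_pval(beta, se):
    """Two-sided p-value for z = beta / se under a standard normal."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))


def pval_matches(beta, se, pval, rel_tol=0.05):
    """True if the reported p-value is within rel_tol of the expected one.

    rel_tol is a hypothetical tolerance chosen for this sketch.
    """
    return math.isclose(pval, expected_pval(beta, se), rel_tol=rel_tol)


# z = 1.96 corresponds to a two-sided p-value of about 0.05.
print(pval_matches(0.196, 0.1, 0.05))
# The same beta and SE with a reported p-value of 0.5 is flagged.
print(pval_matches(0.196, 0.1, 0.5))
```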

qc_plots.pdf

This file contains QQ plots for each study, and allele frequency plots for each pair of studies.
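
QQ plots are usually read alongside the genomic control inflation factor (lambda) reported in the log. The sketch below shows the standard textbook calculation: lambda is the median of the observed 1-df chi-square statistics divided by the median of the chi-square distribution with 1 degree of freedom, computed either from all SNPs or from a subset such as the one named by `gc_snp_file`. The function names, the input layout, and the choice not to correct when lambda is below 1 are assumptions for illustration, not a description of garlic's implementation.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def gc_lambda(results, gc_snps=None):
    """Estimate lambda from (snp, beta, se) tuples.

    If gc_snps is given, only SNPs in that set contribute to the estimate.
    """
    chisq = [(beta / se) ** 2
             for snp, beta, se in results
             if gc_snps is None or snp in gc_snps]
    return statistics.median(chisq) / CHI2_1DF_MEDIAN


def gc_correct_se(se, lam):
    """Inflate a standard error by sqrt(lambda); unchanged if lambda <= 1."""
    return se * max(lam, 1.0) ** 0.5


# Made-up association results: (snp, beta, se).
results = [("rs1", 0.10, 0.10), ("rs2", 0.05, 0.10), ("rs3", 0.30, 0.10)]
lam_all = gc_lambda(results)                             # all available SNPs
lam_subset = gc_lambda(results, gc_snps={"rs1", "rs3"})  # listed SNPs only
print(lam_all, lam_subset)
print(gc_correct_se(0.10, lam_all))
```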

Reporting bugs/issues

Email welchr@umich.edu.
