Garlic
From Genome Analysis Wiki
Garlic is a quality control (QC) program for GWAS association results. It is typically run on each study's results before they are fed into a meta-analysis program such as METAL.
Availability
Garlic is currently only available on the CSG cluster.
Usage
Running garlic requires a configuration file, which tells the program:
- The names of the studies (datasets) and the locations of their results files
- The quality control criteria to use for filtering SNPs
- How to apply genomic control (using all available SNPs, or a subset)
- Other settings, described in the sample configuration file below.
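Genomic control is usually applied by estimating an inflation factor λ from the median association test statistic and then scaling standard errors by √λ. This page does not say exactly how garlic computes or applies it, so the Python below is only a sketch of the standard median-based estimator, assuming per-SNP `beta` and `se` values. The optional `snp_subset` stands in for the SNP list given through `gc_snp_file` in the sample configuration; the function names are hypothetical.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom: the square of
# the standard normal 75th percentile, about 0.4549.
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(results, snp_subset=None):
    """Estimate the inflation factor lambda from (snp, beta, se) tuples.

    If snp_subset is given (e.g. SNP names read from a gc_snp_file),
    only those SNPs contribute; otherwise all SNPs are used.
    """
    chi2 = [
        (beta / se) ** 2
        for snp, beta, se in results
        if se > 0 and (snp_subset is None or snp in snp_subset)
    ]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def gc_correct_se(se, lam):
    # Common convention (assumed here): inflate standard errors by
    # sqrt(lambda), but never shrink them when lambda < 1.
    return se * math.sqrt(max(lam, 1.0))


if __name__ == "__main__":
    results = [("rs1", 0.1, 0.1), ("rs2", 0.2, 0.1), ("rs3", 0.3, 0.1)]
    print(genomic_control_lambda(results))                  # all SNPs
    print(genomic_control_lambda(results, {"rs1", "rs2"}))  # subset only
```

Restricting the estimate to a subset (for example, SNPs not expected to be associated) keeps true signals from inflating λ; this is the trade-off the "all SNPs or a subset" option refers to.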
Creating a configuration file
A sample configuration file is given below, along with comments that describe each setting.
## ------------------------------------------------------
## Default values section.
## These values apply to all studies, unless overridden
## in their sections.
## ------------------------------------------------------
[defaults]

# Directory to write your QC'd data files.
# The default is the current directory.
out_dir = .

# Prefix to add to your QC'd files.
# Defaults to:
out_prefix = "quality_"

# Missing data character for QC'd data files.
# This is the character that will be used for missing values,
# regardless of what the input file used.
out_missing = .

# Delimiter to use when writing QC'd data files.
# This can be "tab", "space", "comma", or an actual character.
out_delim = tab

# Default delimiter when reading data files.
# If you set "delim = guess", the program will try to guess the delimiter
# for each file. Ideally, you should provide the delimiter for each
# file by including it in the study sections (see below).
delim = tab

# Default missing data characters.
# Give a list of characters separated by spaces.
# This means a space is not a valid missing data character!
missing = . NA

# Specify the names of columns in your datasets.
# These can be overridden in the study sections if one of the files
# deviates from the rest (for example, if the sample size column
# is N_QTL, but one study mistakenly names it NQTL).
snp = SNP
strand = STRAND
build = BUILD
chr = CHR
pos = POS
effect_allele = EFFECT_ALLELE
non_effect_allele = NON_EFFECT_ALLELE
n = N
hwe_pval = HWE_PVAL
eaf = EFFECT_ALLELE_FREQ
call_rate = CALL_RATE
beta = BETA
se = SE
pval = PVAL
imputed = IMPUTED
imp_quality = IMP_QUALITY

# Genomic control computed from a list of SNPs, rather than genome-wide.
# Give a file with one SNP per line.
# Example:
# gc_snp_file = /path/to/QT_Replication_SNPs

# QC cutoff thresholds.
# call_rate_cutoff -- Call rate
# hwe_cutoff -- Hardy-Weinberg p-value
# se_cutoff -- Standard error
# mac_cutoff -- Minor allele count
# maf_cutoff -- Minor allele frequency
call_rate_cutoff = 0.95
hwe_cutoff = 1E-04
se_cutoff = 10
mac_cutoff = 10
maf_cutoff = 0.01

# SNPs can be renamed according to a list of substitutions.
# This is useful if chr:pos SNP names differ in how chromosomes are named
# (see below for an example).
# Note: list more specific substitutions (e.g. 'chrc6_COX') before
# more general ones (e.g. 'c6_COX').
# Substitutions are executed in the order they are given.
# snp_rename = ('chr23','chrX'), ('chrmt','chrMT'), ('chrc6_COX','chr6'), ('chr24','chrY'), ('chr6_QBL','chr6'), ('c6_QBL','chr6'), ('c6_COX','chr6')

## ------------------------------------------------------
## Study-specific sections.
## Each section should contain at minimum the location
## of the study's association results file.
## ------------------------------------------------------

# An example study section below:
# [AMCPAS]
# delim = comma
# snp = MARKER_NAME
# file = /path/to/AMCPAS_CASES_FULLMETABOCHIP_06AUG2010_SK.txt.gz
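The comments above describe how defaults, overrides, cutoffs, and renames combine, but not garlic's exact comparisons. The Python below is one hypothetical reading, not taken from garlic's source. It parses an abbreviated copy of the sample with Python's `configparser`, treating `[defaults]` as the default section so study sections inherit any key they do not set. It applies the five cutoffs assuming inclusive bounds, MAF = min(EAF, 1 - EAF), and MAC = 2 × N × MAF (diploid samples). It applies `snp_rename` pairs as ordered substring replacements. All function names are illustrative.

```python
import configparser

# Abbreviated copy of the sample configuration (a subset of its settings).
SAMPLE_CONFIG = """\
[defaults]
delim = tab
snp = SNP
call_rate_cutoff = 0.95
hwe_cutoff = 1E-04
se_cutoff = 10
mac_cutoff = 10
maf_cutoff = 0.01

[AMCPAS]
delim = comma
snp = MARKER_NAME
file = /path/to/AMCPAS_CASES_FULLMETABOCHIP_06AUG2010_SK.txt.gz
"""

CUTOFF_KEYS = ("call_rate_cutoff", "hwe_cutoff", "se_cutoff", "mac_cutoff", "maf_cutoff")


def load_config(text):
    # Treat [defaults] as configparser's default section, so every study
    # section inherits its keys unless the study sets them itself.
    cfg = configparser.ConfigParser(default_section="defaults", interpolation=None)
    cfg.read_string(text)
    return cfg


def study_cutoffs(cfg, study):
    # Read the QC thresholds that apply to one study as floats.
    return {key: cfg.getfloat(study, key) for key in CUTOFF_KEYS}


def passes_qc(row, cut):
    # Assumption: every bound is inclusive; garlic's comparisons may differ.
    maf = min(row["eaf"], 1.0 - row["eaf"])
    mac = 2 * row["n"] * maf  # assumes diploid samples and N = sample size
    return (
        row["call_rate"] >= cut["call_rate_cutoff"]
        and row["hwe_pval"] >= cut["hwe_cutoff"]
        and row["se"] <= cut["se_cutoff"]
        and mac >= cut["mac_cutoff"]
        and maf >= cut["maf_cutoff"]
    )


def rename_snp(name, rules):
    # Apply substitutions in order, as plain substring replacements (assumed).
    for old, new in rules:
        name = name.replace(old, new)
    return name


if __name__ == "__main__":
    cfg = load_config(SAMPLE_CONFIG)
    print(cfg.sections())               # ['AMCPAS']
    print(cfg.get("AMCPAS", "delim"))   # comma (overridden in the study)
    cut = study_cutoffs(cfg, "AMCPAS")  # inherited from [defaults]
    good = {"eaf": 0.30, "n": 1000, "call_rate": 0.99, "hwe_pval": 0.5, "se": 0.05}
    rare = dict(good, eaf=0.004)        # fails the MAF and MAC cutoffs
    print(passes_qc(good, cut), passes_qc(rare, cut))  # True False
    rules = [("chrc6_COX", "chr6"), ("c6_COX", "chr6")]
    print(rename_snp("chrc6_COX:123", rules))        # chr6:123
    print(rename_snp("chrc6_COX:123", rules[::-1]))  # chrchr6:123
```

The last two lines show why the rename order matters under this reading: if `c6_COX` is replaced first, `chrc6_COX` turns into `chrchr6`, and the more specific rule never matches.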
Running garlic
On snowwhite, wonderland, or fantasia, run:
garlic /path/to/your/config_file
If garlic is not on your PATH, you can refer to it directly as /usr/cluster/bin/garlic.
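If you launch garlic from a Python pipeline, the PATH fallback described above can be scripted with the standard library. This is a minimal sketch; `garlic_command` and `run_garlic` are hypothetical helpers, and treating a nonzero exit status as failure is an assumption this page does not confirm.

```python
import shutil
import subprocess

# Absolute path given on this page for the CSG cluster.
CLUSTER_GARLIC = "/usr/cluster/bin/garlic"


def garlic_command(config_path):
    # Prefer a garlic found on PATH; fall back to the cluster location.
    exe = shutil.which("garlic") or CLUSTER_GARLIC
    return [exe, config_path]


def run_garlic(config_path):
    # check=True raises subprocess.CalledProcessError on a nonzero exit
    # status (assumes garlic signals failure that way).
    return subprocess.run(garlic_command(config_path), check=True)


if __name__ == "__main__":
    print(garlic_command("/path/to/your/config_file"))
```

Calling `run_garlic("/path/to/your/config_file")` then runs the same command as the shell example above.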
Output
Garlic creates 2 important files: