Difference between revisions of "RareMETALS"

From Genome Analysis Wiki

Revision as of 07:59, 27 June 2014

rareMETALS is an R package for performing single-variant or gene-level tests to detect rare variant associations. For questions regarding the use of this package, please contact Dajiang Liu (dajiang at umich dot edu) or Gonçalo Abecasis (goncalo at umich dot edu). The same methodology is also implemented in command-line tools; please see [1].

What is new

  • 06/27/2014 Updated to version 4.0: Many updates are implemented, including support for group files in both single variant and gene-level association tests, checks for allele flips based upon variant frequency, and detection of possible allele flips using a novel statistic based upon variation in allele frequency between studies.

Where to download

The R package can be downloaded from rareMETALS_4.0.tar.gz. It will eventually be released on the Comprehensive R Archive Network (CRAN). To perform gene-level association tests, you will also need refFlat_hg19.txt.gz, which contains gene definitions modified from refFlat.

How to install

To install the package, please use the command "R CMD INSTALL rareMETALS_4.0.tar.gz".

Supported Functionalities

  • Marginal analysis of single variant or gene-level association test
  • Conditional analysis of single variant or gene-level association, for variants (or genes) where covariance information is available between candidate variants and known variants.
  • Estimates of genetic effects and locus genetic variance

Preparing Input Files for rareMETALS

  1. Generate summary level statistics files: Summary statistics files can be generated by rvtests [2] or rare-metal-worker [3].
  2. Annotate your summary level statistics: In order to perform gene-level association tests, the summary level statistics files have to be annotated first. The default program for performing annotation is ANNO (by Xiaowei Zhan). The usage of the program can be found at [4].
  3. Compress and index summary statistics files: The rareMETALS R package takes compressed and tabix-indexed files as input for performing meta-analysis.
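
Before running a meta-analysis, it may help to confirm that every compressed file has a tabix index next to it. The following is a minimal sketch in R, assuming the index sits beside the data file with the default .tbi suffix written by tabix. The helper name missing.tabix.index and the file names are hypothetical and are not part of rareMETALS.

```r
# Return the input files whose tabix index (<file>.tbi) is absent.
# Hypothetical helper, not part of rareMETALS.
missing.tabix.index <- function(files) {
  files[!file.exists(paste0(files, ".tbi"))]
}

# Demonstration with empty placeholder files in a temporary directory.
# These stand in for real bgzip-compressed summary statistics files.
demo.dir <- tempfile("rareMETALS_demo")
dir.create(demo.dir)
indexed <- file.path(demo.dir, "study1.MetaScore.assoc.gz")
unindexed <- file.path(demo.dir, "study2.MetaScore.assoc.gz")
file.create(indexed, paste0(indexed, ".tbi"), unindexed)

# Only the second file lacks an index, so only it is reported.
not.indexed <- missing.tabix.index(c(indexed, unindexed))
print(basename(not.indexed))
```

A check like this only confirms that an index file exists; it does not verify that the index matches the data file.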

Performing single variant association analysis

  • Single variant association analysis statistics can be calculated using the following function in the package:
 rareMETALS.single(score.stat.file, cov.file, range, alternative = c("two.sided", "greater", "less"), ix.gold = 1, callrate.cutoff = 0, hwe.cutoff = 0)
  • Input parameters are described below:
    • score.stat.file is the vector of file names for single variant score statistics.
    • cov.file is the vector of files of covariance matrices for single variant score statistics
    • range is a tabix [5]-like range (e.g. 1:12345-23456). All variants in the specified region will be analyzed
    • alternative specifies alternative hypothesis to be tested. The default is two.sided.
    • ix.gold is the index to be used for choosing a "gold standard" population, in case flips of alleles are observed, and the gold standard population can be used to correct for the flips
    • callrate.cutoff specifies the call rate cutoff that will be used. All sites with call rates lower than the cutoff will be labelled as missing.
    • hwe.cutoff specifies the cutoff for the Hardy-Weinberg equilibrium (HWE) test p-value. All sites with HWE p-values lower than the cutoff will be labelled as missing.
  • Output is a data frame that consists of the following fields:
    • pos: Variant physical position
    • ref: Reference allele
    • alt: Alternative allele
    • no.sample: Number of samples
    • p.value: P-values
    • statistic: Test statistic (score statistic)
    • maf: Minor allele frequency
    • beta1.est: Beta estimates
    • beta1.sd: The standard deviation for beta estimates
    • hsq.est: Locus genetic variance estimates
    • direction.by.study: Direction of genetic effects by study
    • anno: Annotation for variants
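
As an illustration of the call described above, here is a hedged sketch of a two-study single variant analysis. The file names are hypothetical placeholders, and the call is only attempted when the package is installed and the files exist, so the snippet can be read as a template rather than a definitive workflow.

```r
# Hypothetical file names: one score file and one covariance file per study,
# listed in the same study order.
score.stat.file <- c("study1.MetaScore.assoc.gz", "study2.MetaScore.assoc.gz")
cov.file <- c("study1.MetaCov.assoc.gz", "study2.MetaCov.assoc.gz")
stopifnot(length(score.stat.file) == length(cov.file))

# tabix-like region in the form chromosome:start-end.
region <- "1:12345-23456"
stopifnot(grepl("^[^:]+:[0-9]+-[0-9]+$", region))

# Run only when the package is installed and all input files are present.
ready <- requireNamespace("rareMETALS", quietly = TRUE) &&
  all(file.exists(score.stat.file, cov.file))

if (ready) {
  res <- rareMETALS::rareMETALS.single(score.stat.file = score.stat.file,
                                       cov.file = cov.file,
                                       range = region,
                                       alternative = "two.sided",
                                       ix.gold = 1,
                                       callrate.cutoff = 0,
                                       hwe.cutoff = 0)
  str(res)  # inspect the returned fields (pos, ref, alt, p.value, ...)
}
```

Here ix.gold = 1 treats the first study in score.stat.file as the "gold standard" population for resolving allele flips.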

Perform single variant association test using a group file

Summary level statistics collected in meta-analysis can often be messy, messier for lower frequency variants than for common variants. To correct for the right alleles, and make sure that ref/alt alleles from each study are properly aligned, it is often desirable to


Performing gene level association test

  • A gene-level association test can be performed using the following function:
 rareMETALS.gene(ANNO, score.stat.file, cov.file, gene, test = "GRANVIL", maf.cutoff, no.boot = 10000, alternative = c("two.sided", "greater",
 "less"), alpha = 0.05, ix.gold = 1, out.digits = 4, callrate.cutoff = 0, hwe.cutoff = 0, gene.file = "refFlat_hg19.txt.gz")
  • Input parameters are described below:
    • ANNO is the annotation information for variants. Possible choices include Nonsynonymous, Stop_Gain, Stop_Loss, Synonymous, Essential_Splice_Site, or any logical combination of them, such as "Nonsynonymous|Stop_Gain|Stop_Loss"
    • score.stat.file is the vector of file names for single variant score statistics.
    • cov.file is the vector of files of covariance matrices for single variant score statistics
    • gene is the gene name such as PCSK9
    • no.boot is the number of bootstraps performed for evaluating significance, such as 10,000. If you choose to use analytic evaluation, please specify no.boot=0
    • alternative specifies alternative hypothesis to be tested. The default is two.sided.
    • ix.gold is the index to be used for choosing a "gold standard" population, in case flips of alleles are observed, and the gold standard population can be used to correct for the flips
    • out.digits is the number of digits in the output, which is used to prettify output.
    • callrate.cutoff specifies the call rate cutoffs that will be used. All sites with call rates lower than the cutoff will be labelled as missing.
    • hwe.cutoff specifies the cutoff for the Hardy-Weinberg equilibrium (HWE) test p-value. All sites with HWE p-values lower than the cutoff will be labelled as missing.
    • gene.file is the gene definition file used to locate the gene region
  • Output: The output res.out consists of the following fields:
    • gene.name.out: gene names
    • p.value.out: P-value
    • statistic.out: Score statistics for meta-analysis
    • no.site.out: Number of variant sites in the gene.
    • beta1.est.out: Estimates for beta.
    • beta1.sd.out: Standard deviation for the beta estimates
    • maf.cutoff.out: The minor allele frequency cutoffs used to analyze the data
    • direction.burden.by.study.out: Direction of meta-analysis burden statistics across different studies
    • direction.meta.single.var.out: Direction of meta-analysis statistics for the single variant test. It may be useful for inspecting whether any of the variants in the gene have opposite effects.
    • pos.ref.alt.out: Position, reference and alternative alleles for each variant position in the gene
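
A gene-level call can be sketched in the same way. The file names and the maf.cutoff value of 0.05 are assumptions chosen for illustration; no.boot = 0 requests analytic evaluation as described above, and the call is guarded so that it runs only when the package and input files are available.

```r
# Hypothetical per-study input files, in matching study order.
score.stat.file <- c("study1.MetaScore.assoc.gz", "study2.MetaScore.assoc.gz")
cov.file <- c("study1.MetaCov.assoc.gz", "study2.MetaCov.assoc.gz")
gene.file <- "refFlat_hg19.txt.gz"

# Combine annotation categories with "|" to select any of them.
anno <- paste(c("Nonsynonymous", "Stop_Gain", "Stop_Loss"), collapse = "|")
print(anno)

# Run only when the package is installed and all input files are present.
ready <- requireNamespace("rareMETALS", quietly = TRUE) &&
  all(file.exists(score.stat.file, cov.file, gene.file))

if (ready) {
  res <- rareMETALS::rareMETALS.gene(ANNO = anno,
                                     score.stat.file = score.stat.file,
                                     cov.file = cov.file,
                                     gene = "PCSK9",
                                     test = "GRANVIL",
                                     maf.cutoff = 0.05,  # assumed value
                                     no.boot = 0,        # analytic evaluation
                                     alternative = "two.sided",
                                     gene.file = gene.file)
  str(res)  # inspect gene.name.out, p.value.out, no.site.out, ...
}
```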

Perform gene-level test with a group file

Performing conditional analysis for single variant tests

  • We provide functions for performing single variant conditional meta-analysis. For variants within the sliding window, the conditional analysis is exact, in the sense that the results are equal to the conditional analysis results obtained using individual-level data.
  • The following function can be used:
conditional.rareMETALS.single.basic(candidate.variant,score.stat.file,cov.file,known.variant.vec,maf.cutoff,no.boot=0,alternative=c('two.sided','greater','less'),
alpha=0.05,ix.gold=1,out.digits=4,callrate.cutoff=0,hwe.cutoff=0,gene.file="refFlat_hg19.txt.gz",p.value.known.variant.vec="N/A",
anno.known.variant.vec="N/A",anno.candidate.variant="N/A")
  • Input parameters are described below:
    • candidate.variant is the chromosomal position for the candidate variant to be tested, e.g. "1:12345";
    • score.stat.file is the vector of file names for single variant score statistics.
    • cov.file is the vector of files of covariance matrices for single variant score statistics
    • known.variant.vec is the vector of chromosomal positions for known variants. Examples include c("1:12345","1:1234567");
    • alternative specifies alternative hypothesis to be tested. The default is two.sided.
    • ix.gold is the index to be used for choosing a "gold standard" population, in case flips of alleles are observed, and the gold standard population can be used to correct for the flips
    • callrate.cutoff specifies the call rate cutoffs that will be used. All sites with call rates lower than the cutoff will be labelled as missing.
    • hwe.cutoff specifies the cutoff for the Hardy-Weinberg equilibrium (HWE) test p-value. All sites with HWE p-values lower than the cutoff will be labelled as missing.
    • anno.known.variant.vec: Annotation information for known variants. It is optional. If the annotation is not present, please use "NA". Note that the quotation marks are required.
    • anno.candidate.variant: Annotation information for the candidate variant. It is optional and can be ignored
  • Output is a data frame that consists of the following fields:
    • pos.single.out: Chromosomal position for candidate variants
    • ref.single.out: Reference allele
    • alt.single.out: Alternative allele
    • p.value.single.out: P values
    • maf.single.out: Minor allele frequencies
    • beta1.est.single.out: Estimates of alternative allele effects
    • beta1.sd.single.out: Standard deviation for beta estimates
    • direction.single.out: Direction of effects
    • anno.single.out: Annotation information for candidate variants
    • pos.ref.alt.known.single.out: Position/ref/alt alleles for known variants
    • p.value.known.single.out: p-values for known variants
    • anno.known.single.out: annotation for known variants
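
The conditional analysis can be sketched along the same lines. The variant positions, file names and maf.cutoff value below are hypothetical and serve only to show the expected "chromosome:position" format; the call is guarded so that it runs only when the package and input files are available.

```r
# Hypothetical candidate and known variants as "chromosome:position" strings.
candidate.variant <- "1:12345"
known.variant.vec <- c("1:23456", "1:1234567")
stopifnot(all(grepl("^[^:]+:[0-9]+$",
                    c(candidate.variant, known.variant.vec))))
# The candidate should not be one of the variants being conditioned on.
stopifnot(!candidate.variant %in% known.variant.vec)

# Hypothetical per-study input files, in matching study order.
score.stat.file <- c("study1.MetaScore.assoc.gz", "study2.MetaScore.assoc.gz")
cov.file <- c("study1.MetaCov.assoc.gz", "study2.MetaCov.assoc.gz")
gene.file <- "refFlat_hg19.txt.gz"  # default taken from the signature above

# Run only when the package is installed and all input files are present.
ready <- requireNamespace("rareMETALS", quietly = TRUE) &&
  all(file.exists(score.stat.file, cov.file, gene.file))

if (ready) {
  res <- rareMETALS::conditional.rareMETALS.single.basic(
    candidate.variant = candidate.variant,
    score.stat.file = score.stat.file,
    cov.file = cov.file,
    known.variant.vec = known.variant.vec,
    maf.cutoff = 0.05,  # assumed value
    no.boot = 0,
    alternative = "two.sided",
    gene.file = gene.file)
  str(res)  # inspect p.value.single.out, beta1.est.single.out, ...
}
```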