RareMETALS2

From Genome Analysis Wiki
Jump to navigationJump to search

The R package rareMETALS2 was an extension of the R package rareMETALS. It was designed to meta-analyze gene-level association tests for binary trait . While rareMETALS offers a near-complete solution for meta-analysis of gene-level tests for quantitative trait, it does not offer the optimal solution for binary trait. The package rareMETALS2 offers improved features for analyzing gene-level association tests in meta-analyses for binary trait. If you have any questions for using rareMETALS2 or rvtests, please post your questions to our google group https://groups.google.com/forum/#!forum/raremetals

The package rareMETALS2 is under development. It takes summary association statistics generated by rvtests as input. It offers the following unique features

  • 1.) It allows the meta-analysis of samples with related individuals and samples with unrelated individuals, and allows locally efficient estimate of genetic effects.
  • 2.) It allows the adjustment of covariates in meta-analysis.
  • 3.) It allows conditional meta-analysis of single variant and gene-level associations.

Change Log

June, 14, 2015 0.1 Version released

Download

The R package can be downloaded from rareMETALS2_0.1.tar.gz. It will be eventually released on the Comprehensive R-archive Network.

How to install

To install the package, please use "R CMD INSTALL rareMETALS2_XXX.tar.gz" command, where XXX is the version number for rareMETALS2

Forum to Ask Questions

I have created a google group for discussion on the usage and for bug reports etc. If you find any issues to the package and think that the discussions may also benefit others, please post them to the user group. Here is the link to the discussion group https://groups.google.com/forum/#!forum/raremetals

Exemplar Datasets

The following exemplar datasets ExampleDataset.zip can be downloaded and tested with rareMETALS2 package.

  • Score Statistics File
  • Covariance Matrix File
  • Tabix index file

These files are all automatically generated by rvtests.

Meta-analysis of Single Variant Associations

A Simple Tutorial for Using the rareMETALS2.single function

Example:

Use the ExampleDataset above which is a set of single variant score statistics and their covariance matrices.

cov.file <- c("example1.MetaCov.assoc.gz","example2.MetaCov.assoc.gz")
score.stat.file <- c("example1.MetaScore.assoc.gz","example2.MetaScore.assoc.gz")
library(rareMETALS2)
res <- rareMETALS2.single(score.stat.file,range="19:42906914-45854919",alternative="two.sided",ix.gold=1,callrate.cutoff=0,hwe.cutoff=0)
###result can be explored as below###
> names(res)
 [1] "p.value"            "ref"                "alt"               
 [4] "integratedData"     "raw.data"           "clean.data"        
 [7] "statistic"          "direction.by.study" "anno"              
[10] "af"                 "afCase"             "afCtrl"            
[13] "QC.by.study"        "no.sample"          "no.case"           
[16] "no.ctrl"            "beta1.est"          "beta1.sd"          
[19] "hsq.est"            "nearby"             "pos"  
> print(res$pos) ###the first 25 position###
  [1] "19:42906914" "19:42907064" "19:42907171" "19:42909626" "19:42909645"
  [6] "19:42910407" "19:42910445" "19:42910454" "19:42911617" "19:42911769"
 [11] "19:42911869" "19:42911877" "19:42912216" "19:42912223" "19:42912229"
 [16] "19:42912244" "19:42912441" "19:42914568" "19:42914587" "19:42914626"
 [21] "19:42914665" "19:42914668" "19:42914682" "19:42914703" "19:42914772"
> print(res$p.value) ###the P-value of the first 25 position###
  [1]          NA 0.311755625          NA 0.793836855 0.292744822 0.393802545
  [7]          NA 0.175761040 0.106303677          NA 0.217054457          NA
 [13] 0.063036490 0.516061931 0.089290910          NA          NA          NA
 [19]          NA 0.364595966          NA 0.482571300          NA 0.434685495
 [25]          NA

Function Detail:

  rareMETALS2.single <- function(score.stat.file,range,alternative=c('two.sided','greater','less'),ix.gold=1,callrate.cutoff=0,hwe.cutoff=0,hwe.ctrl.cutoff=0)

Relevant Parameters:

  • score.stat.file files of score statistics
  • range tabix range of variants to be analyzed
  • alternative alternative hypothesis to be specified
  • ix.gold Gold standard population to align reference allele to.
  • callrate.cutoff Cutoffs of call rate, lower than which will NOT be analyzed (labelled as missing)
  • hwe.cutoff Cutoffs of HWE p-values; Variants with HWE p-value smaller than the cutoffs are removed from subsequent analysis and labelled as missing;
  • hwe.ctrl.cutoff Cutoffs of HWE p-values using controls; Variants with HWE p-value smaller than the cutoffs are removed from subsequent analysis and labelled as missing; In case control studies, it is recommended to use hwe.ctrl.cutoff, since large effect variants may violate HWE.

Meta-analysis of Gene-level Association

A Simple Tutorial for Using the rareMETALS2.range function

Example:

res1 <- rareMETALS2.range(score.stat.file,cov.file,range="19:42906914-45854919",range.name="LDLR",test = "GRANVIL",maf.cutoff = 0.05,alternative = c("two.sided"),ix.gold = 1,out.digits = 4,callrate.cutoff = 0,hwe.cutoff = 0,max.VT = NULL)
> print(res1$res.out)
    gene.name.out p.value.out statistic.out no.site.out beta1.est.out
[1,] "LDLR"        "0.5575"    "0.344"       "332"       "-0.01899"   
    beta1.sd.out maf.cutoff.out direction.burden.by.study.out
[1,] "0.03237"    "0.05"         "+-"                         
     direction.meta.single.var.out                                                                                                                                                                                                                                                                                                                 
[1,]   "++--+++-+--++-++-++---++-++-+----++-+++-++-++--+--++-++----++-++++++++---+++-------+++--++-++---++----++--+--+-+-++--+++++++-++-++--
        --+-+++----+++----++---+---+---++++---+-+--+--+++--+--+-++--++-+-+++---+++++-----++--+-++---+----++-+-++--+-+++-++-+++-+--++---+---
        +++++-+----+++ -+--++---+---++++---+++---+-+++++----+---+-+----+++-++-"
    top.singlevar.pos top.singlevar.refalt top.singlevar.pval top.singlevar.af
[1,] "19:45662295"     "G/A"                "7.804e-15"        "0.02303"       
    pos.ref.alt.out   
[1,] "19:42907064/A/G,19:42909626/C/T,19:42909645/C/T,19:42910407/C/T,19:42910454/C/T,19:42911617/T/C,19:42911869/C/T,19:42912216/C/T,19:42912223/C/T,
      19:42912229/C/T,19:42914626/C/T,19:42914668/G/A,19:42914703/C/T,19:42914880/C/T,19:42930652/C/T,19:42930721/G/A,19:42931022/C/T,19:43013329/G/T,19:43031219/T/C,
      19:43093001/G/A,19:43093011/C/T,19:43093026/T/C,19:43097692/C/T,19:43097788/C/T,19:43243059/C/T,19:43268142/G/T,19:43411959/C/G,19:43411997/G/A,19:43420322/C/T,
      19:43420408/G/C,19:43420415/G/A,19:43420565/C/T,19:43433773/C/T,19:43439628/A/G,

Function Detail:

 rareMETALS2.range <- function(score.stat.file,cov.file,range,range.name,test='GRANVIL',maf.cutoff=1,alternative=c('two.sided','greater','less'),
 ix.gold=1,out.digits=4,callrate.cutoff=0,hwe.cutoff=0,hwe.ctrl.cutoff=0,max.VT=NULL)
  • score.stat.file files of score statistics
  • cov.file covariance matrix files
  • range tabix range for each gene/region
  • range.name The name of the range,e.g. gene names can be used
  • test rare variant tests to be used
  • maf.cutoff MAF cutoff used to analyze variants
  • alternative alternative hypothesis to be specified
  • ix.gold Gold standard population to align reference allele to
  • out.digits Number of digits used in the output
  • callrate.cutoff Cutoffs of call rate, lower than which will NOT be analyzed (labelled as missing)
  • hwe.cutoff Cutoffs of HWE p-values; Variants with HWE p-value smaller than the cutoffs are removed from subsequent analysis and labelled as missing;
  • hwe.ctrl.cutoff Cutoffs of HWE p-values using controls; Variants with HWE p-value smaller than the cutoffs are removed from subsequent analysis and labelled as missing; In case control studies, it is recommended to use hwe.ctrl.cutoff, since large effect variants may violate HWE.
  • max.VT The maximum number of thresholds used in VT; Setting max.VT to 10 can improve the speed for calculation without affecting the power too much. The default parameter is NULL, which does not set upper limit on the number of variable frequency threhsold.

Conditional Meta-analysis

A Simple Tutorial for Using the conditional.rareMETALS2.single

Example

 res2<-conditional.rareMETALS2.single(candidate.variant.vec=c("19:42906914","19:45854819"), score.stat.file, cov.file,known.variant.vec=c("19:43995275","19:44047839","19:44084155"), maf.cutoff=0.05, no.boot =1000,
                            alternative = c("two.sided"), ix.gold = 1,out.digits = 4, callrate.cutoff = 0, hwe.cutoff = 0,p.value.known.variant.vec = NA, anno.known.variant.vec = NA,anno.candidate.variant.vec = NA)
 > print(res2$res.out)
    POS           REF ALT N:N_CASE:N_CTRL  PVALUE AF:AF_CASE:AF_CTRL BETA_EST
[1,] "19:42906914" "G" "T" "4012:2008:2004" "NaN"  "0:0:0"            "NaN"   
[2,] ""            ""  ""  "4012:2008:2004" NA     "0:NA:NA"          NA      
    BETA_SD DIRECTION_BY_STUDY ANNO 
[1,] "Inf"   "=="               "N/A"
[2,] NA      NA                 ""   
    POS_REF_ALT_ANNO_KNOWN                               
[1,] "0"                                                  
[2,] "19:43995275/NA/NA,19:44047839/C/T,19:44084155/NA/NA"


 Function Detail: 
                                                                                                                              
 conditional.rareMETALS2.single <- function(candidate.variant.vec,score.stat.file,cov.file,known.variant.vec,maf.cutoff,no.boot=10000,alternative=c('two.sided','greater','less'),
 ix.gold=1,out.digits=4,callrate.cutoff=0,hwe.cutoff=0,hwe.ctrl.cutoff=0,p.value.known.variant.vec=NA,anno.known.variant.vec=NA,anno.candidate.variant.vec=NA)
  • candidate.variant Position of candidate variant
  • score.stat.file Files of score statistics
  • cov.file Covariance matrix files
  • known.variant.vec Range of candidate variant, expressed in a vector, e.g. c("1:12345","1:234567");
  • test test of rare variant tests
  • maf.cutoff Cutoffs of MAF used for determining rare variants
  • alternative Alternative hypothesis to be tested
  • out.digits The number of digits used in the output
  • callrate.cutoff Cutoff of call rates. Sites with callrates lower than the cutoff will be labeled as missing
  • hwe.cutoff Cutoff of HWE p-values. Sites with HWE pvalues lower than the cutoff will be labeled as missing