Tutorial: RAREMETAL

From Genome Analysis Wiki

Latest revision as of 17:51, 16 March 2018

Useful Wiki Pages

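There are several pages in this Wiki that may be useful to RAREMETAL users. Here are links to a few:

  • The RAREMETAL Home Page
  • The RAREMETAL Command Reference
  • The RAREMETAL Documentation
  • The RAREMETAL FAQ
  • The RAREMETALWORKER documentation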

Introduction

In this tutorial, we will use RAREMETAL to perform single variant and gene-level meta-analysis using summary statistics of two small studies. We will use RAREMETALWORKER to generate summary statistics for each study.

STEP 1: Install Software and Download Example Data Sets

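  • If RAREMETAL and RAREMETALWORKER have not been installed on your local computer yet, you will first need to install them. Download and installation instructions are in the "Software Download and Installation" sections of the RAREMETAL and RAREMETALWORKER pages.
  • Then download the tutorial package, which includes the example data sets.
  • To unpack the example dataset, use the following two Unix commands: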
 tar xvzf raremetal_tutorial.tar.gz 
 cd raremetal_tutorial
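
Before moving on, you can confirm that the unpacked directory contains the input files used in the commands below. This is a minimal sketch and is not part of the tutorial package. check_inputs is a hypothetical helper. The file list is an assumption based on the file names used later in this tutorial: example1.ped, example1.dat, example1.vcf.gz, the matching example2 files, and group.file.

```shell
#!/bin/sh
# Hypothetical helper: report which tutorial inputs are missing from the
# current directory. The file names are assumptions taken from the commands
# used later in this tutorial; adjust the list if your copy differs.
check_inputs() {
  missing=0
  for f in example1.ped example1.dat example1.vcf.gz \
           example2.ped example2.dat example2.vcf.gz group.file; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing files (0 means all are present)
  return "$missing"
}
```

After sourcing the sketch, run check_inputs from inside raremetal_tutorial. It prints each missing file, and a non-zero exit status means something is absent.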

STEP 2: Analyze individual samples using RAREMETALWORKER

  • Our first dataset has 743 unrelated individuals. Their phenotypes and relationships are described in a pair of pedigree and data files; you can get a summary of their contents using pedstats.
  • These individuals have been genotyped at ~1000 markers, and their genotypes are stored in a compressed VCF file (example1.vcf.gz). VCF is a text format, so you can peek at the contents (for example, with zcat or zless) if you like.
  • Although these individuals are "unrelated", there is the possibility that some of them are more closely related than others. In our analysis, we will calculate an empirical (i.e. data-driven) kinship (i.e. relatedness) matrix to describe the similarity between individuals.
  • To analyse the first study, we execute the following command:
 raremetalworker  --ped example1.ped --dat example1.dat --vcf example1.vcf.gz --traitName QT1 \
                  --inverseNormal --makeResiduals --kinSave --kinGeno --prefix STUDY1 

This command transforms phenotypes to normality (--inverseNormal) and calculates trait residuals after adjusting for covariates (--makeResiduals). It also estimates relatedness between individuals from their genotypes (--kinGeno) and saves the resulting kinship matrix for later use (--kinSave). Finally, it generates a PDF file summarizing the results.
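
The first two bullets of this step suggest summarizing the pedigree and data files with pedstats and peeking at the VCF. The sketch below assumes pedstats is installed and on your PATH; pedstats reads the data file via -d and the pedigree file via -p. summarize_study and peek_vcf are hypothetical helper names. peek_vcf skips the ## meta-information lines and shows the #CHROM header plus the first few records. Its output is cut to the first ten columns (the eight fixed VCF columns, FORMAT, and the first sample), so the 743 genotype columns do not flood the terminal.

```shell
#!/bin/sh
# Hypothetical helper: print the #CHROM header line plus the first n records
# of a (b)gzip-compressed VCF, skipping ## meta-information lines and keeping
# only the first 10 tab-separated columns.
peek_vcf() {
  vcf=$1
  n=${2:-5}   # number of records to show (default 5)
  gzip -dc "$vcf" | grep -v '^##' | head -n "$((n + 1))" | cut -f1-10
}

# Hypothetical helper: summarize one study's DAT/PED files with pedstats,
# if pedstats is available on the PATH.
summarize_study() {
  study=$1    # study number: 1 or 2
  if command -v pedstats >/dev/null 2>&1; then
    pedstats -d "example$study.dat" -p "example$study.ped"
  else
    echo "pedstats not found on PATH" >&2
    return 1
  fi
}
```

For example, run summarize_study 1, then peek_vcf example1.vcf.gz 3.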

  • After RAREMETALWORKER finishes running, you will see several output files:
STUDY1.QT1.singlevar.score.txt     ## single variant statistics
STUDY1.QT1.singlevar.cov.txt       ## covariance matrices between score statistics
STUDY1.plots.pdf                   ## QQ plots and Manhattan plots
STUDY1.Empirical.Kinship.gz        ## Relatedness matrix
STUDY1.singlevar.log               ## Log file
  • We next analyse the second study using a similar command:
  raremetalworker  --ped example2.ped --dat example2.dat --vcf example2.vcf.gz --traitName QT1 \
                   --inverseNormal --makeResiduals --kinSave --kinGeno --prefix STUDY2
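
The two RAREMETALWORKER commands differ only in the study number, so they can be wrapped in a small loop. The sketch below assumes raremetalworker is on your PATH and the example files are in the current directory. run_study and check_study_outputs are hypothetical helper names. The check only confirms that the score and covariance files listed above exist and are non-empty; it does not validate their contents.

```shell
#!/bin/sh
# Sketch: run RAREMETALWORKER for one tutorial study with the options used
# above. Assumes example<N>.ped/.dat/.vcf.gz are in the current directory.
run_study() {
  study=$1   # study number: 1 or 2
  raremetalworker --ped "example$study.ped" --dat "example$study.dat" \
                  --vcf "example$study.vcf.gz" --traitName QT1 \
                  --inverseNormal --makeResiduals --kinSave --kinGeno \
                  --prefix "STUDY$study"
}

# Sketch: confirm the score and covariance files for one prefix exist and
# are non-empty. Exit status is the number of missing/empty files.
check_study_outputs() {
  prefix=$1  # e.g. STUDY1
  missing=0
  for suffix in QT1.singlevar.score.txt QT1.singlevar.cov.txt; do
    if [ ! -s "$prefix.$suffix" ]; then
      echo "missing or empty: $prefix.$suffix" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Usage: for i in 1 2; do run_study "$i" && check_study_outputs "STUDY$i"; done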

STEP 3: Run RAREMETAL for Meta-Analysis

  • In this step, we will use RAREMETAL to combine the summary statistics we just generated.
  • We will first index the result files generated by RAREMETALWORKER. This step relies on bgzip and tabix, two tools that allow rapid indexing and retrieval of results from compressed text files.
 bgzip STUDY1.QT1.singlevar.score.txt
 tabix -c "#" -s 1 -b 2 -e 2 STUDY1.QT1.singlevar.score.txt.gz
 bgzip STUDY1.QT1.singlevar.cov.txt
 tabix -c "#" -s 1 -b 2 -e 2 STUDY1.QT1.singlevar.cov.txt.gz
  • And again for the second study:
 bgzip STUDY2.QT1.singlevar.score.txt
 tabix -c "#" -s 1 -b 2 -e 2 STUDY2.QT1.singlevar.score.txt.gz
 bgzip STUDY2.QT1.singlevar.cov.txt
 tabix -c "#" -s 1 -b 2 -e 2 STUDY2.QT1.singlevar.cov.txt.gz
  • We next create two text files that will drive the meta-analysis. The first file lists the input files with summary statistics. Let's call it summaryfiles. On most Linux workstations, you can use a text editor such as pico or nano to create this file. These should be the contents of "summaryfiles":
 STUDY1.QT1.singlevar.score.txt.gz
 STUDY2.QT1.singlevar.score.txt.gz
  • The second file lists the input files with variance-covariance information between markers. Let's call it covfiles. These should be its contents:
 STUDY1.QT1.singlevar.cov.txt.gz
 STUDY2.QT1.singlevar.cov.txt.gz
  • Now, we are ready for meta-analysis. To perform single variant and gene-level meta-analyses all at once, use the following command:
 raremetal --summaryFiles summaryfiles --covFiles covfiles --groupFile group.file \
           --SKAT --burden --MB --VT --longOutput --tabulateHits --hitsCutoff 1e-05 \
           --prefix COMBINED.QT1 --hwe 1.0e-05 --callRate 0.95

This command excludes variants with an HWE p-value below 1e-05 or a call rate below 95% (--hwe, --callRate). It then generates single variant meta-analysis results and gene-level meta-analysis results for the variant groups in group.file. The gene-level tests are the simple burden test, the variable threshold test, the Madsen-Browning weighted burden test, and SKAT. The command also tabulates top hits with p-values below 1e-05, including detailed single variant results (--tabulateHits, --hitsCutoff, --longOutput), and generates a PDF file summarizing the results.
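
The indexing commands and the two list files above can also be produced without an interactive editor. The sketch below assumes bgzip and tabix (both distributed with htslib) are on your PATH, and that you run it in the directory holding the STUDY1 and STUDY2 outputs. index_study and write_list_files are hypothetical helper names, and the tabix options are copied from the commands above. Note that bgzip replaces each .txt file with its .txt.gz version, so run the indexing only once per file.

```shell
#!/bin/sh
# Sketch: compress and index one study's RAREMETALWORKER outputs using the
# same bgzip/tabix commands shown above.
index_study() {
  prefix=$1   # e.g. STUDY1
  for kind in score cov; do
    f="$prefix.QT1.singlevar.$kind.txt"
    bgzip "$f"                             # replaces $f with $f.gz
    tabix -c "#" -s 1 -b 2 -e 2 "$f.gz"    # writes the $f.gz.tbi index
  done
}

# Sketch: write the summaryfiles and covfiles lists, one path per line,
# with the same contents as the examples above.
write_list_files() {
  printf '%s\n' STUDY1.QT1.singlevar.score.txt.gz \
                STUDY2.QT1.singlevar.score.txt.gz > summaryfiles
  printf '%s\n' STUDY1.QT1.singlevar.cov.txt.gz \
                STUDY2.QT1.singlevar.cov.txt.gz > covfiles
}
```

Usage: for s in STUDY1 STUDY2; do index_study "$s"; done; write_list_files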

  • The following output will be generated:
 COMBINED.QT1.meta.plots.pdf    ## QQ plots and Manhattan plots
 COMBINED.QT1.meta.singlevar.results
 COMBINED.QT1.meta.burden.results
 COMBINED.QT1.meta.SKAT.results
 COMBINED.QT1.meta.VT.results
 COMBINED.QT1.meta.MB.results
 COMBINED.QT1.meta.tophits.SKAT.tbl
 COMBINED.QT1.meta.tophits.VT.tbl
 COMBINED.QT1.meta.tophits.burden.tbl
 COMBINED.QT1.meta.tophits.MB.tbl
 COMBINED.QT1.raremetal.log
  • It is probably a good idea to spend a few minutes reviewing these files: they contain the key results of your analysis! A detailed description of the output files is available in the "Gene-level Tests Meta-Analysis Output" section of the RAREMETAL documentation.
  • If you are feeling adventurous and are not yet totally confused, you can continue to explore advanced features of RAREMETAL.
  • For example, RAREMETAL can carry out conditional analyses using the summary statistics in summaryfiles and covfiles.
  • To carry out a conditional analysis, create a text file that specifies the variant that you want to condition on (let's call it "conditioningfile") and add "--condition conditioningfile" to the command line. The "conditioningfile" might look like this:
 9:505484545:C:T
  • When you now run RAREMETAL with the extra option --condition conditioningfile, results will be adjusted for the variant 9:505484545:C:T.
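  • An alternative to generating a group.file is to provide RAREMETAL with an annotated VCF as input. For details, see the "Group Rare Variants from Annotated VCF" section of the RAREMETAL documentation. This option is especially useful in a three-step workflow. First, use RAREMETAL's --writeVCF option to create a file listing a superset of all available variants. Next, annotate that file with your preferred annotation tool. Finally, return to RAREMETAL for the gene-level meta-analysis.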
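
The conditional analysis described above can be scripted the same way. The sketch below assumes summaryfiles, covfiles and group.file from STEP 3 are in the current directory and raremetal is on your PATH. run_conditional is a hypothetical helper that reuses the meta-analysis command above with --condition added. The variant is written in the same CHR:POS:REF:ALT form as the example. The helper takes an output prefix so that the conditional results do not overwrite the COMBINED.QT1 files; the prefix COMBINED.QT1.COND in the usage example is an arbitrary, hypothetical choice.

```shell
#!/bin/sh
# Sketch: write a one-variant conditioningfile and rerun the STEP 3
# meta-analysis command with --condition added.
run_conditional() {
  variant=$1   # e.g. 9:505484545:C:T, same form as the example above
  prefix=$2    # output prefix, distinct from COMBINED.QT1 to avoid overwriting
  printf '%s\n' "$variant" > conditioningfile
  raremetal --summaryFiles summaryfiles --covFiles covfiles --groupFile group.file \
            --SKAT --burden --MB --VT --longOutput --tabulateHits --hitsCutoff 1e-05 \
            --prefix "$prefix" --hwe 1.0e-05 --callRate 0.95 \
            --condition conditioningfile
}
```

Usage: run_conditional 9:505484545:C:T COMBINED.QT1.COND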