RAREMETAL Documentation
Revision as of 23:04, 7 August 2013
- 1 Useful Wiki Pages
- 2 Key Features
- 3 Brief Description
- 4 Approach
- 5 Download and Installation
- 6 Basic Usage Instructions
- 7 Additional Analysis Options
- 8 Reports Generated by rareMETAL
- 9 Example Command lines
- 10 TUTORIAL
- 11 CONTACT
- 12 Change Log
Useful Wiki Pages
- The rareMETAL FAQ
Key Features
rareMETAL has the following features:
- rareMETAL performs single variant meta-analysis by default.
- rareMETAL allows customized groups of variants to be tested.
- rareMETAL generates QQ plots and manhattan plots by default.
Brief Description
rareMETAL is a computationally efficient tool for meta-analysis of rare variants using sequencing or genotyping array data. rareMETAL takes summary statistics and LD matrices generated by rareMetalWorker, handles related and unrelated individuals, and supports both single variant and burden meta-analysis. rareMETAL generates high quality plots by default and has options that allow users to build reports at different levels.
rareMETAL is developed by Shuang Feng, Dajiang Liu and Gonçalo Abecasis. An R package using the same methodology is available. A manuscript for this tool is in preparation. Please contact sfengsph at umich dot edu with questions.
Download and Installation
- University of Michigan CSG users can go to the following:
Where to Download
How to Compile
- Save it to your local path and decompress using the following command:
tar xvzf raremetal.0.1.5.tgz
- Go to raremetal_0.1.5/raremetal/src and type the following command to compile:
How to Execute
- Go to raremetal_0.1.5/raremetal/bin and use the following:
- For example usage, please refer to the Example Command lines section.
Basic Usage Instructions
rareMETAL is a command line tool. It is typically run from a Linux or Unix prompt by invoking the command raremetal. Basic usage for meta-analysis is described below. A detailed TUTORIAL with toy data is also available.
Prepare Input Files
rareMETAL requires the following basic input files: summary statistics and covariance matrices of score statistics generated by rareMetalWorker, a file listing the studies to be included, and a group file if gene-level meta-analysis is expected.
Files containing summary statistics and LD matrices generated by rareMetalWorker should be compressed and tabix indexed using the following commands:
bgzip study1.singlevar.score.txt
tabix -s 1 -b 2 -e 2 -c "#" study1.singlevar.score.txt.gz
bgzip study1.singlevar.cov.txt
tabix -s 1 -b 2 -e 2 -c "#" study1.singlevar.cov.txt.gz
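When several studies need to be prepared, the commands above can be wrapped in a small loop. The following is a minimal sketch, assuming bgzip and tabix are on your PATH and that files follow the study1.singlevar.score.txt / study1.singlevar.cov.txt naming shown above. The function name index_study, the DRY_RUN switch, and the prefixes study1 and study2 are illustrative and not part of rareMETAL.

```shell
# Print (DRY_RUN=1, the default here) or run the bgzip/tabix commands
# for one study prefix. DRY_RUN and index_study are illustrative names.
DRY_RUN="${DRY_RUN:-1}"

index_study() {
    prefix="$1"
    for kind in score cov; do
        file="${prefix}.singlevar.${kind}.txt"
        if [ "$DRY_RUN" = "1" ]; then
            # Show what would be run without touching any file.
            echo "bgzip ${file}"
            echo "tabix -s 1 -b 2 -e 2 -c \"#\" ${file}.gz"
        else
            bgzip "${file}"
            tabix -s 1 -b 2 -e 2 -c "#" "${file}.gz"
        fi
    done
}

# Hypothetical study prefixes; replace with your own.
for study in study1 study2; do
    index_study "$study"
done
```

Run it once with the default dry-run setting to review the commands, then set DRY_RUN=0 to execute them.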
List of Studies
- The --studyName option is required. Omitting it causes a FATAL ERROR and rareMETAL stops.
- The file should contain the path and prefix of the studies you want to include.
- To exclude one or more studies without creating a new file, put a "#" at the start of that study's line. rareMETAL automatically excludes such studies from the meta-analysis.
- An example file is in the following:
- The above example study name file directs rareMETAL to look for summary statistics from the TwinsUK study only, because the "HUNT" study is commented out. The following two files, together with their tabix index files, are needed for rareMETAL to perform further analysis:
/net/fantasia/home/sfengsph/prj/raremetal/raremetal/bin/META/TwinsUK/TwinsUK.TG.singlevar.score.txt.gz
/net/fantasia/home/sfengsph/prj/raremetal/raremetal/bin/META/TwinsUK/TwinsUK.TG.singlevar.cov.txt.gz
/net/fantasia/home/sfengsph/prj/raremetal/raremetal/bin/META/TwinsUK/TwinsUK.TG.singlevar.score.txt.gz.tbi
/net/fantasia/home/sfengsph/prj/raremetal/raremetal/bin/META/TwinsUK/TwinsUK.TG.singlevar.cov.txt.gz.tbi
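Before starting a run, it can help to confirm that every active study in the list has its four files in place. The sketch below is one way to check this. It assumes each non-commented line of the study list is a prefix to which the suffixes seen in the file names above (.singlevar.score.txt.gz, .singlevar.cov.txt.gz and their .tbi indexes) are appended directly; that mapping is an assumption and may differ from how rareMETAL resolves names. The function names and the StudyA/StudyB prefixes are hypothetical.

```shell
# List active (non-commented, non-empty) lines of a study list file.
active_studies() {
    grep -v -e '^#' -e '^[[:space:]]*$' "$1"
}

# Print every expected file that is absent; print nothing if all exist.
missing_files() {
    active_studies "$1" | while IFS= read -r prefix; do
        for suffix in singlevar.score.txt.gz singlevar.cov.txt.gz \
                      singlevar.score.txt.gz.tbi singlevar.cov.txt.gz.tbi; do
            [ -e "${prefix}.${suffix}" ] || echo "${prefix}.${suffix}"
        done
    done
}

# Demo in a throwaway directory with hypothetical study prefixes.
demo_dir="$(mktemp -d)"
printf '%s\n' "${demo_dir}/StudyA.TG" "#${demo_dir}/StudyB.TG" > "${demo_dir}/studyName.file"
touch "${demo_dir}/StudyA.TG.singlevar.score.txt.gz" \
      "${demo_dir}/StudyA.TG.singlevar.cov.txt.gz"

# StudyB is commented out, so only StudyA is checked;
# its two .tbi index files are reported as missing.
missing_files "${demo_dir}/studyName.file"
```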
Group Rare Variants
From a Group File
- With the --groupFile option, you can specify a particular set of variants to be grouped for burden tests.
- The group file must be a tab or space delimited file in the following format:
GROUP_ID MARKER1_ID MARKER2_ID MARKER3_ID ...
- MARKER_ID must be in the following format:
- An example group file is:
From an Annotated VCF File
If the --groupFile option is NOT specified, rareMETAL will look for an annotated VCF file as the blueprint for variants to group. Users can also generate a VCF file based on the superset of variants from pooled samples and annotate it outside rareMETAL. The annotated VCF file can then be used as input to rareMETAL for gene-level meta-analysis, or group files can be generated from it. Detailed descriptions of these options are available. There are also examples of this usage at the bottom of this page.
- rareMETAL allows filtering of variants from individual studies by their HWE pvalue and call rate, which are generated as part of the output from rareMetalWorker.
- Currently, CMC type burden test, Madsen-Browning burden test, Variable Threshold burden test and SKAT are provided in rareMETAL, by specifying --burden, --MB, --VT and --SKAT.
- To help decide whether a signal is merely the shadow of a significant common variant nearby, rareMETAL supports conditional analysis. The variants to condition on are listed in a file passed to the --condition option. The file should be space or tab delimited, as in the example below. When a listed variant's alleles do not match the ref and alt alleles from the samples, the variant is skipped in the conditional analysis.
1:861349:C:T 1:905901:G:A 20:986998:G:C 22:3670691:A:G
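Because mismatched entries are skipped rather than rejected, a quick format check on the condition file can catch typos early. The sketch below flags any entry that does not look like CHR:POS:REF:ALT. That pattern is inferred from the example entries above and is an assumption, not rareMETAL's own parsing rule; the function name and the deliberately malformed entry are hypothetical.

```shell
# Print entries of a space/tab delimited condition file that do not
# match CHR:POS:REF:ALT (pattern inferred from the example entries).
bad_condition_entries() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[0-9A-Za-z]+:[0-9]+:[ACGT]+:[ACGT]+$/)
                print $i
    }' "$1"
}

# Demo: the four entries from the example plus one malformed entry.
cond_file="$(mktemp)"
printf '1:861349:C:T 1:905901:G:A\t20:986998:G:C 22:3670691:A:G chr1-100-A-G\n' > "$cond_file"
bad_condition_entries "$cond_file"   # prints only the malformed entry
```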
Additional Analysis Options
Group Rare Variants from Annotated VCF
- If the --groupFile option is NOT specified, rareMETAL will look for an annotated VCF file as the blueprint for variants to group.
- The annotated VCF file should be specified using --annotatedVcf option.
Generate a VCF File to Annotate Outside of Rare Metal
- --writeVCF allows users to write a VCF file including pooled single variants from all studies. Users can then annotate this VCF file with their favorite annotation tool and use the annotated file as input to rareMETAL for further gene-based or region-based meta-analysis.
Options for Report Generation
- --correctGC generates QQ plots and manhattan plots with pvalues corrected using genomic control.
- --prefix allows customized prefix for output files.
- --tabulateHits works together with --hitsCutoff to generate reports for genes whose p-value from burden tests or SKAT is below the specified cutoff. The default p-value cutoff is 1.0e-06; use the --hitsCutoff option to change it. For more explanations and examples, please see Tabulated Top Hits.
- --tabix allows rapid analysis when the number of groups/genes of interest is small. Currently, when the number of groups is less than 100, the --tabix option is automatically turned on.
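To make the effect of the cutoff concrete, the sketch below applies the same "p-value below cutoff" rule to a tiny table. This only illustrates the rule, not how rareMETAL implements it. The two-column gene/p-value layout, the gene names, and the function name are made up for the example and do not reflect rareMETAL's actual output format.

```shell
# Print gene names whose p-value (column 2) is below the cutoff.
# $1 = cutoff, $2 = whitespace-delimited file with columns: GENE PVALUE
hits_below_cutoff() {
    awk -v cutoff="$1" '($2 + 0) < (cutoff + 0) { print $1 }' "$2"
}

# Hypothetical gene-level results.
genes="$(mktemp)"
printf '%s\n' 'GENE_A 3.2e-08' 'GENE_B 0.04' 'GENE_C 9.9e-07' > "$genes"

# With the default cutoff of 1.0e-06, GENE_A and GENE_C pass; GENE_B does not.
hits_below_cutoff 1.0e-06 "$genes"
```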
Reports Generated by rareMETAL
Single Variant Meta Analysis Output
- Single variant meta analysis output has the following components: header, results and footnote.
- Header lines start with "#" and give the column headers for the results table.
- Footnote also starts with "#", where genomic controls from each study and the overall sample are reported.
- An example single variant meta analysis output is shown below:
- A detailed explanation of each column is in the following:
rareMETAL generates QQ plots and manhattan plots from single variant meta-analysis by default. Three QQ plots are generated, one with all variants included, one of variants with maf<0.05 and one of variants with maf<0.01. All plots are saved in a pdf file named yourPrefix.meta.plots.pdf. Genomic controls are also reported in the title of plots. When --correctGC option is specified, GC corrected plots are also generated.
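For readers unfamiliar with genomic control, the value is conventionally defined as the median of the observed 1-df chi-square test statistics divided by about 0.4549, the median of a chi-square distribution with one degree of freedom. The sketch below computes that conventional quantity from a file of chi-square statistics. Whether rareMETAL computes its reported value in exactly this way is an assumption; the function name and the input values are illustrative only. It relies on sort -g (general numeric sort), which GNU and BSD sort provide.

```shell
# Conventional genomic control lambda:
#   median(chi-square statistics) / 0.4549
# $1 = file with one 1-df chi-square statistic per line
gc_lambda() {
    sort -g "$1" | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) m = v[(NR + 1) / 2]
            else        m = (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "%.3f\n", m / 0.4549
        }'
}

# Illustrative input: the median of these three values is 0.4549.
stats="$(mktemp)"
printf '%s\n' 2.0 0.1 0.4549 > "$stats"
gc_lambda "$stats"
```

A value close to 1 suggests little inflation of the test statistics; values well above 1 suggest inflation.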
Burden Tests Meta Analysis Output
When --longOutput is used, output includes both burden test results of genes and single variant results of the variants included in burden tests. Here is an example of output file from SKAT when --longOutput is specified.
Otherwise, single variant results of variants included in burden tests will not be included in the output. Here is an example of output file from SKAT when --longOutput is not specified.
- Tabulated top hits are saved in the file:
- The following items are tabulated in the output:
rareMETAL generates QQ plots and manhattan plots from single variant and gene-level meta-analysis by default. Example QQ plots and manhattan plots are:
The following parameters are in effect:

List of Studies:
============================
--studyName [studyName.SardiNia]

Grouping Methods:
============================
--groupFile [genes.file]
--annotatedVcf
--annotation
--writeVcf [OFF]

QC Options:
============================
--hwe
--callRate

Association Methods:
============================
--burden [true]
--MB [false]
--SKAT [false]
--VT [false]
--condition [condition.file]

Other Options:
============================
--tabix [OFF]
--correctGC [ON]
--prefix [test]
--maf [0.05]
--longOutput [false]
--tabulateHits [false]
--hitsCutoff [1e-06]
Example Command lines
- Here is an example command line to do single variant meta analysis only:
./raremetal --studyName your.studyName.file --prefix yourPrefix
- When you want to do all burden tests using a group file to specify which variants to group:
- Here is an example of adding QC filters to variants when doing meta analysis.
- Here is how to do the same thing but reading grouping information from an annotated VCF file:
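As a rough sketch of how the options described on this page might be combined for a gene-level run, the snippet below assembles a command line from --studyName, --groupFile, the four test options and --prefix. It is assembled from the option descriptions above rather than copied from a tested example, so treat the exact combination as an assumption. The file names are placeholders, and the command is only printed, not executed.

```shell
# Placeholder inputs; replace with your own files.
study_file="your.studyName.file"
group_file="your.group.file"
prefix="yourPrefix"

# Combine a group file with all four gene-level tests named on this page.
raremetal_cmd="./raremetal --studyName ${study_file} --groupFile ${group_file} --burden --MB --SKAT --VT --prefix ${prefix}"

# Print the command for review; run it yourself once the inputs exist.
echo "$raremetal_cmd"
```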
CONTACT
Please email Shuang Feng (sfengsph at umich dot edu) for questions.
Change Log
- Version 0.0.1 released to U of M CSG group. (2/13/2013)
- Version 0.0.1 released to public. (2/24/2013)
- Version 0.1.2 released to public after fixing a few bugs, adding conditional analysis and automatic graphing to the tool. (8/5/2013)
- Version 0.1.5 released to public after fixing a few bugs, adding conditional analysis and automatic graphing to the tool. (8/8/2013)