From Genome Analysis Wiki
* The [[RAREMETAL_Change_Log | Change Log]]
* The [[RAREMETAL_DOWNLOAD_%26_BUILD | DOWNLOAD page]]
* The [[RAREMETALWORKER|RAREMETALWORKER documentation]]
== Approach ==
The key idea behind meta-analysis with RAREMETAL is that various gene-level test statistics can be reconstructed from single variant score statistics and that, when the linkage disequilibrium relationships between variants are known, the distribution of these gene-level statistics can be derived and used to evaluate significance. Single variant statistics are calculated using the Cochran-Mantel-Haenszel method. Our method has been published in [http://www.nature.com/ng/journal/v46/n2/abs/ng.2852.html '''Liu et al.'''] in Nature Genetics. Please go to [http://genome.sph.umich.edu/wiki/RAREMETAL_method '''method'''] for details.
== Basic Usage Instructions ==
==== Prepare Input Files ====
'''RAREMETAL''' requires the following basic input files: summary statistics and covariance matrices of score statistics generated by '''RAREMETALWORKER''' or [http://genome.sph.umich.edu/wiki/Rvtests '''rvtests'''], a file listing the studies to be included, and a group file if gene-level meta-analysis is expected.
=====Summary Statistics=====
Files containing summary statistics and LD matrices generated by '''RAREMETALWORKER''' should be compressed and [http://samtools.sourceforge.net/tabix.shtml '''tabix'''] indexed using the following commands (Note in RAREMETALWORKER, if --zip is specified, these .gz and .tbi files will be automatically generated):
===== List of Studies =====
* The --summaryFiles option is crucial for '''RAREMETAL''' to work. Omitting this option leads to a FATAL ERROR and '''RAREMETAL''' stops.
* The file should contain the path and prefix of the studies you want to include.
* If there are one or more studies that you want to exclude from your list without generating a new file, you can put a "#" in front of the corresponding line. '''RAREMETAL''' automatically excludes that study from meta analysis. An example list of summary files is in the following:
===== Group Rare Variants =====
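* Grouping methods are only necessary when doing gene-based or group-based burden tests in meta-analysis.
* If no grouping method is specified, only single variant meta-analysis is performed.
* A group file is specified with the --groupFile option. An example line, with the group name followed by the variants in that group: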
C1orf159 1:1021285:G:T 1:1021302:T:C 1:1021315:A:C 1:1021386:G:A 1:1022534:C:T 1:1025751:C:T 1:1026913:C:T
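The group file line above can be read with a few lines of Python. This is only a sketch for checking your own group files, not part of RAREMETAL: it assumes whitespace-delimited fields and variant IDs of the form CHR:POS:REF:ALT as in the example, and the names <code>Variant</code> and <code>parse_group_line</code> are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_group_line(line: str) -> tuple[str, list[Variant]]:
    """Split one group-file line into (group name, list of variants).

    Assumes whitespace-delimited fields and CHR:POS:REF:ALT variant IDs,
    as in the example line shown on this page.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected a group name and at least one variant: {line!r}")
    name, variant_ids = fields[0], fields[1:]
    variants = []
    for vid in variant_ids:
        chrom, pos, ref, alt = vid.split(":")
        variants.append(Variant(chrom, int(pos), ref, alt))
    return name, variants


name, variants = parse_group_line(
    "C1orf159 1:1021285:G:T 1:1021302:T:C 1:1021315:A:C"
)
print(name, len(variants), variants[0])
```

A malformed variant ID (for example one with a missing allele) raises a <code>ValueError</code> when it is unpacked, which makes this usable as a quick sanity check before running a gene-level analysis.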
====== From an Annotated VCF File ======
* If --groupFile option is '''NOT''' specified, '''RAREMETAL''' will look for an annotated VCF file as the blueprint for grouping variants. Users can also generate a VCF file based on the superset of variants from pooled samples and annotate it outside '''RAREMETAL'''. The annotated VCF file can then be used as input to '''RAREMETAL''' for gene-level meta-analysis, or group files can be generated from it. A detailed description of these options is [[Rare-Metal#Group_Rare_Variants_from_Annotated_VCF|'''available''']]. There are also [[Rare-Metal#Example_Command_lines|'''examples''']] of this usage at the bottom of this page.

==== QC Options ====
* '''RAREMETAL''' allows filtering of variants from individual studies by their HWE p-value and call rate, which are generated as part of the output from '''RAREMETALWORKER''' or [http://genome.sph.umich.edu/wiki/Rvtests '''rvtests'''].
* To filter by HWE p-values, use the --hwe option. The default is 0.0, which means no variants are filtered.
* To filter by call rate, use the --callRate option. The default is 0.0, which means no variants are filtered.

==== Association Options ====
* Currently, the CMC type burden test, Madsen-Browning burden test, Variable Threshold burden test and SKAT are provided in '''RAREMETAL''', and are requested by specifying --burden, --MB, --VT and --SKAT.
* --maf specifies the minor allele frequency cutoff when doing gene-based or group-based burden tests. Variants with maf '''above''' this threshold will be ignored. The default is maf<0.05.
* In '''a single study''' of sample size N, if a site is monomorphic or not reported in vcf/ped, the sample size of this study is considered too small to sample the rare allele. Thus, this study contributes 2*N reference alleles and 0 alternative alleles towards meta-analysis. To let such studies contribute no alleles towards the pooled allele frequency, specify --altMAF.
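The effect of --altMAF on the pooled allele frequency can be illustrated with a small calculation. The sketch below assumes the pooled frequency is a plain ratio of alternative allele counts to total allele counts across studies; how RAREMETAL computes it internally is not described on this page, and the function name <code>pooled_alt_af</code> and the input layout are hypothetical.

```python
def pooled_alt_af(studies, alt_maf=False):
    """Pooled alternative allele frequency across studies.

    studies: list of (sample_size, alt_allele_count) tuples, where
    alt_allele_count is None if the site is monomorphic or not reported
    in that study. Plain allele counting is an assumption of this sketch.
    """
    alt_total = 0
    allele_total = 0
    for n, alt_count in studies:
        if alt_count is None:
            if alt_maf:
                continue  # --altMAF: the study contributes no alleles
            alt_count = 0  # default: 2*N reference alleles, 0 alternative alleles
        alt_total += alt_count
        allele_total += 2 * n
    return alt_total / allele_total if allele_total else float("nan")


# Study 1: N=1000 with 20 alternative alleles; study 2: N=500, site not reported.
studies = [(1000, 20), (500, None)]
print(pooled_alt_af(studies))                # default behaviour: 20 / 3000
print(pooled_alt_af(studies, alt_maf=True))  # with --altMAF: 20 / 2000
```

With the default, the unreported study dilutes the pooled frequency because its 2*N reference alleles enter the denominator; with --altMAF it is left out entirely.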
==== Conditional Analysis ====
* To decide whether a signal is caused by shadowing a significant common variant nearby, '''RAREMETAL''' also enables conditional analysis, with the list of variants to be conditioned upon provided in a file as input for the --condition option. The input file should be space or tab delimited, as in the following example. When alleles do not match the ref and alt alleles from samples, the variant will be skipped from conditional analysis.
 1:861349:C:T 1:905901:G:A 20:986998:G:C 22:3670691:A:G

== Additional Analysis Options ==
=== Generate a VCF File to Annotate Outside RAREMETAL ===
* --writeVCF allows users to write a VCF file including pooled single variants from all studies. Users can then annotate the VCF file with their favorite annotation tool and use the annotated file as input for '''RAREMETAL''' for further gene-based or region-based meta analysis.
* The output VCF file will be named yourPrefix.pooled.variants.vcf. An example output VCF file is in the following:
 #CHROM POS ID REF ALT QUAL FILTER INFO
 1 115658497 115658497 G A . . ALT_AF=0.380906;
 2 74688884 74688884 G A . . ALT_AF=8.33611e-05;
 3 121414217 121414217 C A . . ALT_AF=0.0747833;

=== Annotation ===
* RAREMETAL automatically recognizes the annotation format generated by [[TabAnno | '''ANNO''']] or [[EPACTS#Annotating_VCF_file_using_EPACTS | '''EPACTS''']].
* To annotate the VCF generated in the previous step, you can use the following command:
 ./anno --in your.in.vcf.gz --out your.out.vcf.gz

=== Group Rare Variants from Annotated VCF ===
* If --groupFile option is '''NOT''' specified, '''RAREMETAL''' will look for an annotated VCF file as the blueprint for grouping variants.
* The annotated VCF file should be specified using the --annotatedVcf option.
* --annotation should be used together with --annotatedVcf when only a specific category of functional variants should be grouped. For example, to group nonsynonymous and splicing variants, include the following in the command line:
 --annotatedVcf your.annotated.vcf --annotation nonsyn/splicing
* (Only available after v4.13.8) When --annotation is not specified, '''RAREMETAL''' groups all non-intergenic variants.
* Each variant may have more than one annotation, but each annotation should start with a new key "ANNO=" followed by annotation:genename:other transcript information.
* The generated group file will be named test.groupfile in your running directory.
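The "ANNO=" convention described above can be checked with a short Python sketch. It assumes that each annotation is its own semicolon-separated entry in the VCF INFO column and that the first two colon-separated parts are the annotation and the gene name; the function name <code>parse_anno</code> and the example INFO string are hypothetical and are not output from ANNO or EPACTS.

```python
def parse_anno(info: str) -> list[tuple[str, str]]:
    """Return (annotation, gene name) pairs from every ANNO= entry in an INFO string.

    Assumes entries are separated by ';' and that each annotation has the
    form ANNO=annotation:genename:other transcript information.
    """
    pairs = []
    for entry in info.split(";"):
        if not entry.startswith("ANNO="):
            continue  # skip other INFO keys such as ALT_AF
        parts = entry[len("ANNO="):].split(":")
        if len(parts) >= 2:
            pairs.append((parts[0], parts[1]))
    return pairs


# Hypothetical INFO value with two annotations for one variant.
info = "ALT_AF=0.01;ANNO=Nonsynonymous:GENE1:transcriptA;ANNO=Essential_Splice_Site:GENE2:transcriptB"
for annotation, gene in parse_anno(info):
    print(gene, annotation)
```

Listing the annotation categories that actually occur in your file this way can help when choosing the value to pass to --annotation.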
=== Options for Report Generation ===
* --correctGC generates QQ plots and manhattan plots with p-values corrected using genomic control.
* --prefix allows customized prefix for output files.
* --longOutput allows users to output not only burden test results but also the single variant results (allele frequencies, effect sizes, and p-values) for the variants being grouped together. Please refer to the output files section for detailed explanation and examples.
* --tabulateHits works together with --hitsCutoff to generate reports for genes whose p-value from burden tests or SKAT is below the specified cutoff. The default p-value cutoff is 1.0e-06, which can be changed with the --hitsCutoff option. For more explanations and examples, please go to [[#TABULATED_HITS|Tabulated Top Hits]].

=== Miscellaneous Options ===
* --tabix allows rapid analysis when the number of groups/genes of interest is small. Currently, when the number of groups is less than 100, the --tabix option is automatically turned on.
== Reports Generated by RAREMETAL ==
=== Single Variant Meta Analysis Output Files ===
==== TABLES ====
* Single variant meta analysis output has the following components: header, results and footnote.
* Header lines starting with "##" show a summary of the meta analysis, including the method used, the number of studies, and the total sample size.
* The header line starting with "#" contains the column headers of the results table.
* The footnote also starts with "#" and reports the genomic controls from each study and from the overall sample.
* An example single variant meta analysis output is shown below:
EFFECT_SIZE: Alternative Allele Effect Size
DIRECTION_BY_STUDY: Effect size direction of alternative allele from each study.
The order of studies is consistent with the order of studies listed in the input file for option --summaryFiles.
"?" means the variant is not observed or is monomorphic in the study.
"!" means the variant observed in this study has different alleles from those in the first study.
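The DIRECTION_BY_STUDY column can be decoded per study with a short Python sketch. The meanings of "?" and "!" are taken from the description above; the use of "+" and "-" for positive and negative effect directions is an assumption of this sketch, and the function name <code>decode_direction</code> and the study names are hypothetical.

```python
MEANINGS = {
    "+": "positive effect of the alternative allele",  # assumed symbol
    "-": "negative effect of the alternative allele",  # assumed symbol
    "?": "variant not observed or monomorphic",
    "!": "alleles differ from those in the first study",
}


def decode_direction(direction, study_names):
    """Map each character of DIRECTION_BY_STUDY to its study.

    study_names must be in the same order as the studies listed in the
    file given to --summaryFiles.
    """
    if len(direction) != len(study_names):
        raise ValueError("direction string and study list differ in length")
    return {
        name: MEANINGS.get(symbol, "unknown symbol")
        for name, symbol in zip(study_names, direction)
    }


decoded = decode_direction("+-?!", ["studyA", "studyB", "studyC", "studyD"])
for study, meaning in decoded.items():
    print(study, meaning)
```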
==== PLOTS ====
'''RAREMETAL''' generates QQ plots and manhattan plots from single variant meta-analysis by default. Three QQ plots are generated: one with all variants included, one of variants with maf<0.05 and one of variants with maf<0.01. All plots are saved in a pdf file named yourPrefix.meta.plots.pdf. Genomic controls are also reported in the title of the plots. When the --correctGC option is specified, GC corrected plots are also generated.
{| border="1" cellpadding="5" cellspacing="0" align="center"
|-
| align="center" width="100" | [[File:QQ.png]]
|-
| align="center" width="200" | [[File:Single_var_manhattan.png]]
|}

=== Gene-level Tests Meta-Analysis Output ===
==== LONG TABLES ====
* When --longOutput is used, the output includes both the burden test results of genes and the single variant results of the variants included in the burden tests. Here is an example of an output file when --longOutput is specified.
##Method=Burden
##STUDY_NUM=2
KLHL17 2 1:897285:A:G;1:898869:C:T 0.0148408,0.00108369 -0.0502034,-0.0256403 0.528269,0.934606 0.00796222 0.00108369 0.0148408 -0.0484494 0.528878
==== SHORT TABLES ====
* When --longOutput is not specified, the single variant results of variants included in burden tests are not included in the output. Here is an example of an output file when --longOutput is not specified.
##Method=Burden
KLHL17 2 1:897285:A:G;1:898869:C:T 0.00796222 0.00108369 0.0148408 -0.0484494 0.528878
==== TABULATED HITS ====
* When --tabulateHits is specified, top hits from burden tests will be generated. Each method has its own tabulated file. The purpose of this file is to list the burden test results of top hits together with the single variant results of the variants grouped in the burden tests. The difference between this file and the standard long-format output file from a burden test is that each row represents a single variant that is included in the gene for the burden test. This format allows easy sorting on the user's end.
* According to the example above, PCSK9 had a p-value of 7.54587e-11 from the gene-based burden test, where three variants from this gene were included. Another hit from this meta analysis is APOE, where only one variant was included in the burden test.
==== PLOTS ====
'''RAREMETAL''' generates QQ plots and manhattan plots from single variant and gene-level meta-analysis by default. Example QQ plots and manhattan plots are:
{| border="1" cellpadding="5" cellspacing="0" align="center"
|-
| align="center" width="200" | [[File:manhattan.png]]
|}

==== LOG ====
* A log file is automatically generated by '''RAREMETAL''' to save the parameters in effect. An example is in the following:
The following parameters are in effect:
Grouping Methods:
============================
--groupFile [genes.file] --annotatedVcf [../../groupvcf/bin/debug/nonsynonymous.vcf]
--annotation []
--writeVcf [OFF]
--SKAT [false]
--VT [false]
--condition [condition.file]
Other Options:
============================
--tabix [OFF]
--correctGC [ON]
--prefix [test]
--maf [0.05]
--tabulateHits [false]
--hitsCutoff [1e-06]
--dosage [false]
--altMAF [false]
==Example Command lines==
* Here is an example command line to do single variant meta analysis only:
 ./raremetal --summaryFiles your.list.of.summary.files --prefix yourPrefix
* When you want to do all burden tests using a group file to specify which variants to group:
 ./raremetal --summaryFiles your.list.of.summary.files --covFiles your.list.of.cov.files --groupFile your.groupfile --burden --MB --SKAT --VT --maf 0.01 --prefix yourPrefix
(NOTE: this will generate single variant meta analysis result and the short format output for burden test results.)
* Here is how to do SKAT meta analysis using a group file and request long format output together with tabulated hits:
 ./raremetal --summaryFiles your.list.of.summary.files --covFiles your.list.of.cov.files --groupFile your.groupfile --SKAT --longOutput --tabulateHits --hitsCutoff 1.0e-07 --prefix yourPrefix
* Here is an example of adding QC filters to variants when doing meta analysis.
 ./raremetal --summaryFiles your.list.of.summary.files --covFiles your.list.of.cov.files --groupFile your.groupfile --SKAT --longOutput --tabulateHits --hitsCutoff 1.0e-07 --hwe 1e-06 --callRate 0.98 --prefix yourPrefix
* Here is how to do the same thing but reading grouping information from an annotated VCF file:
 ./raremetal --summaryFiles your.list.of.summary.files --covFiles your.list.of.cov.files --annotatedVcf your.annotated.vcf --annotation nonsyn/stop/splicing --SKAT --longOutput --tabulateHits --hitsCutoff 1.0e-07 --hwe 1e-06 --callRate 0.98 --prefix yourPrefix
* If you want to write a VCF file of pooled variants from all studies, annotate them using your favorite annotation program, and then come back to '''RAREMETAL''' with the annotated VCF file to do burden tests:
First, use the following command to write the VCF file:
 ./raremetal --summaryFiles your.list.of.summary.files --writeVcf --prefix yourPrefix
Second, annotate the VCF file using your favorite annotation program. (The annotated VCF file has to follow the format described here: [[#Group_Rare_Variants_from_Annotated_VCF|annotated VCF format]])
Third, use the following command to do meta analysis:
 ./raremetal --summaryFiles your.list.of.summary.files --covFiles your.list.of.cov.files --annotatedVcf your.annotated.vcf --annotation nonsyn/splicing/stop --burden --MB --SKAT --VT --maf 0.01 --prefix yourPrefix

==Other Useful Info==
* Summary specs can be found at [[Summary Files Specification for RAREMETAL]]
==TUTORIAL==
* For a comprehensive tutorial of RAREMETALWORKER and RAREMETAL using example data sets, please go to the following: [http://genome.sph.umich.edu/wiki/Tutorial:_RareMETAL '''RAREMETAL Tutorial''']
* For a brief tutorial of rvtests, please go to: [http://genome.sph.umich.edu/wiki/Rvtests '''rvtests''']

==CONTACT==