RvTests
From Genome Analysis Wiki
Overview
A few rare-variant tests (Li-Leal's CMC and Madsen-Browning's weighted method) are implemented in C++ within a logistic regression framework.
The source code is located at wonderland:/home/youna/prj/RV/RV3Tests.v1/
The binary file is located at wonderland:/home/youna/prj/RV/RV3Tests.v1/executables/rvTests
Example
See a detailed example here.
Syntax
rvTests uses the following command-line interface:
RARE VARIANT ANALYSIS OPTIONS:
  GENOTYPE : --genofile [pos.012], --geneList [outGeneSorted.txt], --cutoff [0.010], --collapseChoice [or]
  PHENOTYPE : --phenofile [LDL.y.ID]
  COVARIATES : --covConsider, --covfile [covFile.ID.2.txt]
  PERMUTATION : --nPermute [10], --PermutationSeed [1]
  GENE LEVEL TEST RESULT : --geneGlobalTestOut [globalPermuteSummary.txt], --geneTestpvalueFile [geneTestPvalues.txt]
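As a sketch of how these options fit together, the Python function below assembles an rvTests argument list from the bracketed values in the option summary (we assume those values are defaults or examples, not required settings). It also assumes that --covConsider takes the value 1 when a covariate file is supplied, as the COVARIATES entry suggests. The page spells the p-value output flag both --geneTestpvalueFile and --geneTestPvalueFile; the sketch uses the first spelling, so check it against the binary before relying on it. The function name build_rvtests_argv is hypothetical.

```python
import shlex

def build_rvtests_argv(binary="rvTests",
                       genofile="pos.012",
                       gene_list="outGeneSorted.txt",
                       cutoff=0.01,
                       collapse_choice="or",
                       phenofile="LDL.y.ID",
                       cov_file=None,
                       n_permute=10,
                       seed=1,
                       global_out="globalPermuteSummary.txt",
                       pvalue_out="geneTestPvalues.txt"):
    """Assemble an rvTests argument list (sketch; values are illustrative)."""
    if collapse_choice not in ("or", "sum", "wt"):
        raise ValueError("collapseChoice must be one of: or, sum, wt")
    argv = [binary,
            "--genofile", genofile,
            "--geneList", gene_list,
            "--cutoff", str(cutoff),
            "--collapseChoice", collapse_choice,
            "--phenofile", phenofile]
    if cov_file is not None:
        # Assumption: --covConsider takes 1 to switch covariates on (default 0).
        argv += ["--covConsider", "1", "--covfile", cov_file]
    argv += ["--nPermute", str(n_permute),
             "--PermutationSeed", str(seed),
             "--geneGlobalTestOut", global_out,
             # Spelling from the option summary; the option list below spells it
             # --geneTestPvalueFile, so verify against the binary.
             "--geneTestpvalueFile", pvalue_out]
    return argv

if __name__ == "__main__":
    # Print a shell-quoted command line that includes a covariate file.
    print(shlex.join(build_rvtests_argv(cov_file="covFile.ID.2.txt")))
```

Returning a list rather than a single string lets the arguments be passed straight to a process launcher without shell quoting issues.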
- GENOTYPE
- --genofile
- A genotype matrix in 012 format (the .012 file). This file can be prepared with prepare012s:
source code: wonderland:/home/youna/prj/RV/vcfReader.v1
binary file: wonderland:/home/youna/prj/RV/vcfReader.v1/executables/prepare012s
Note: If you are going to analyze nonsynonymous and stop-annotated variants, first run Yanming's VCF annotation [1] on the VCF file.
Data File Preparation
Input files: --vcf [LDL.test.vcf], --log [], --IDfile []
Subsetting choices: --All
Output files: --outputPrefix [subsetGeno], --outputGeneList [LDL.geneList.txt]
--vcf: Input VCF file.
--log: The log file from Yanming's annotation output; it is used to obtain the gene list.
--IDfile: A file with one column of subject IDs to subset from the VCF file. If it is not specified, all subjects are included in the format conversion.
--All: 1 includes all variants; 0 includes only nonsynonymous and stop-annotated variants.
--outputPrefix: The prefix for the four output files used by rvTests:
  *.012: A genotype matrix with subjects as rows and variant sites as columns.
  *.012.pos: Chromosome and position numbers.
  *.012.indv: Subject IDs.
  *.012.frq: The frequencies of the included variants.
--outputGeneList: A file to store the gene list used by rvTests. The list file looks like this:
  1 OR4F5 69090 70008
  1 SAMD11 860529 871276
  1 NOC2L 879583 893918
  1 KLHL17 895966 901095
  1 PLEKHN1 901876 910482
  1 C1orf170 910578 912021
- --geneList
- This file is produced by prepare012s with the --outputGeneList option. Its columns are chromosome number, gene name, start position and end position. The file must not have a header.
Chromosome numbers MUST be numeric: use 1 for chromosome 1, NOT chr1.
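The gene list and the *.012.pos file together determine which genotype columns belong to each gene. The sketch below is an illustration only: it assumes both files are whitespace-separated, treats gene boundaries as inclusive, and lets a variant fall into every gene that overlaps it; rvTests may handle boundaries and overlapping genes differently. It rejects non-numeric chromosome labels such as chr1, as required above. The function names are hypothetical.

```python
def parse_gene_list(lines):
    """Parse gene-list rows: chromosome, gene name, start, end (no header)."""
    genes = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) < 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        chrom, name, start, end = fields[:4]
        if not chrom.isdigit():
            # The page requires numeric chromosomes: 1, not chr1.
            raise ValueError(f"line {line_no}: non-numeric chromosome {chrom!r}")
        genes.append((int(chrom), name, int(start), int(end)))
    return genes

def parse_positions(lines):
    """Parse *.012.pos rows, assumed to be 'chromosome position'."""
    return [(int(f[0]), int(f[1])) for f in (l.split() for l in lines) if f]

def assign_variants(genes, positions):
    """Map gene name -> genotype column indices whose position lies in [start, end]."""
    by_gene = {name: [] for _, name, _, _ in genes}
    for col, (chrom, pos) in enumerate(positions):
        for g_chrom, name, start, end in genes:
            if chrom == g_chrom and start <= pos <= end:
                by_gene[name].append(col)
    return by_gene
```

The nested loop is quadratic but keeps the logic obvious; a real tool would sort genes by position and sweep.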
- --cutoff
- The minor allele frequency (MAF) cutoff used to define rare variants, e.g. 0.01 or 0.05.
- --collapseChoice
- Specify one of {or, sum, wt}. or: Li-Leal's CMC test; sum: the number of rare variants carried by each subject is used as the score; wt: Madsen-Browning's weighted rare-variant score.
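The three --collapseChoice scores can be illustrated for a single gene as follows. This is a hypothetical Python sketch, not the rvTests C++ code. It assumes the .012 matrix holds 0/1/2 alternate-allele counts with no missing values, calls a variant rare when 0 < MAF < cutoff, counts minor alleles (flipping the coding when the alternate allele is the major one), and, for sum, counts the rare sites at which a subject carries at least one minor allele. The wt branch uses the Madsen-Browning weight w_i = sqrt(n q_i (1 - q_i)), with q_i estimated from controls as (m_i + 1) / (2 n_controls + 2); rvTests may compute the weights differently.

```python
import math

def minor_allele_info(genotypes, col):
    """Return (maf, alt_is_minor) for one column of 0/1/2 alternate-allele counts."""
    alt = sum(row[col] for row in genotypes)
    p = alt / (2 * len(genotypes))
    return (p, True) if p <= 0.5 else (1 - p, False)

def minor_count(g, alt_is_minor):
    """Number of minor alleles carried, flipping the coding if ALT is the major allele."""
    return g if alt_is_minor else 2 - g

def rare_columns(genotypes, cols, cutoff):
    """Keep polymorphic columns with MAF strictly below the cutoff (assumed rule)."""
    rare = []
    for col in cols:
        maf, alt_is_minor = minor_allele_info(genotypes, col)
        if 0 < maf < cutoff:
            rare.append((col, alt_is_minor))
    return rare

def collapse(genotypes, phenotypes, rare, choice):
    """Per-subject collapsed score for one gene; phenotypes are 0 (control) / 1 (case)."""
    if choice not in ("or", "sum", "wt"):
        raise ValueError("choice must be one of: or, sum, wt")
    weights = []
    if choice == "wt":
        controls = [i for i, y in enumerate(phenotypes) if y == 0]
        n = len(genotypes)  # no missing data assumed, so every subject is genotyped
        for col, alt_is_minor in rare:
            m_u = sum(minor_count(genotypes[i][col], alt_is_minor) for i in controls)
            q = (m_u + 1) / (2 * len(controls) + 2)
            weights.append(math.sqrt(n * q * (1 - q)))
    scores = []
    for row in genotypes:
        counts = [minor_count(row[col], a) for col, a in rare]
        if choice == "or":
            scores.append(1 if any(counts) else 0)  # carries any rare variant (CMC-style)
        elif choice == "sum":
            scores.append(sum(1 for k in counts if k > 0))  # rare sites carried
        else:
            scores.append(sum(k / w for k, w in zip(counts, weights)))  # weighted score
    return scores
```

In the logistic regression framework described in the overview, the resulting per-subject score would enter the model as the gene-level predictor.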
- PHENOTYPE
- --phenofile
- A file whose first column is the subject ID and whose second column is the phenotype (0 or 1).
- COVARIATES
- --covConsider
- Default = 0: no covariates are considered. Set to 1 to include covariates.
- --covfile
- A covariate file whose first column is the subject ID and whose remaining columns are the covariates to include in the model.
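Phenotype and covariate rows have to be matched to the genotype rows, whose order is given by the *.012.indv file. A minimal sketch, assuming whitespace-separated files without header lines; dropping subjects that lack a phenotype or covariates is this sketch's own choice, and rvTests may behave differently. The function names are hypothetical.

```python
def read_id_table(lines):
    """Map subject ID -> remaining whitespace-separated fields (no header assumed)."""
    table = {}
    for line in lines:
        fields = line.split()
        if fields:
            table[fields[0]] = fields[1:]
    return table

def align_subjects(indv_ids, pheno_lines, cov_lines=None):
    """Return (kept genotype rows, phenotypes, covariate rows) in *.012.indv order."""
    pheno = read_id_table(pheno_lines)
    cov = read_id_table(cov_lines) if cov_lines is not None else None
    keep, y, covariates = [], [], []
    for row, sid in enumerate(indv_ids):
        if sid not in pheno or (cov is not None and sid not in cov):
            continue  # sketch's choice: drop subjects lacking phenotype or covariates
        value = int(pheno[sid][0])
        if value not in (0, 1):
            raise ValueError(f"{sid}: phenotype must be 0 or 1, got {value}")
        keep.append(row)
        y.append(value)
        covariates.append([float(v) for v in cov[sid]] if cov is not None else [])
    return keep, y, covariates
```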
- PERMUTATION
- --nPermute
- Number of permutations used to evaluate p-values.
- --PermutationSeed
- Seed for the permutations. Default = 1; other values can be used.
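The permutation p-value controlled by --nPermute and --PermutationSeed can be sketched as below. rvTests uses a logistic regression statistic; to keep the example short, this sketch substitutes a hypothetical stand-in statistic (the absolute difference in mean collapsed score between cases and controls) and uses the common (b + 1) / (B + 1) estimator, which may differ from what rvTests reports. Fixing the seed makes the permutations, and therefore the p-value, reproducible.

```python
import random

def mean_difference(scores, phenotypes):
    """Hypothetical stand-in statistic: |mean score in cases - mean score in controls|."""
    cases = [s for s, y in zip(scores, phenotypes) if y == 1]
    controls = [s for s, y in zip(scores, phenotypes) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def permutation_pvalue(scores, phenotypes, n_permute=10, seed=1,
                       statistic=mean_difference):
    """Estimate a p-value by shuffling phenotype labels n_permute times."""
    rng = random.Random(seed)  # fixed seed -> reproducible permutations
    observed = statistic(scores, phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_permute):
        rng.shuffle(shuffled)
        if statistic(scores, shuffled) >= observed:
            exceed += 1
    # (b + 1) / (B + 1) keeps the estimate strictly above zero.
    return (exceed + 1) / (n_permute + 1)
```

With the default of 10 permutations the smallest attainable p-value under this estimator is 1/11, so larger --nPermute values are needed to resolve small p-values.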
- GENE LEVEL TEST RESULT
- --geneGlobalTestOut
- This file stores the 5% and 95% quantiles of the p-values across all genes at each permutation.
- --geneTestPvalueFile
- This file gives the gene name, the number of rare variants, the counts of variants in cases and controls, and the p-value from the rare-variant test selected by --collapseChoice.
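The summary written to --geneGlobalTestOut can be illustrated as follows: given one row of per-gene p-values for each permutation, the sketch returns that row's 5% and 95% quantiles. It uses Python's statistics.quantiles with the inclusive method; the quantile definition rvTests uses is not documented here, so its values may differ slightly. The function name global_quantiles is hypothetical.

```python
import statistics

def global_quantiles(perm_pvalues):
    """For each permutation's per-gene p-values, return its (5%, 95%) quantiles."""
    summary = []
    for row in perm_pvalues:
        # n=20 gives 19 cut points at 5%, 10%, ..., 95%; needs at least 2 genes.
        cuts = statistics.quantiles(row, n=20, method="inclusive")
        summary.append((cuts[0], cuts[-1]))
    return summary
```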