RvTests
From Genome Analysis Wiki
Revision as of 15:03, 30 January 2011
Overview
A few rare variant tests (Li-Leal's CMC and Madsen-Browning's weighted method) are implemented in C++ within a logistic regression framework.
The source code is located at File:RV3Tests.v1.tar.
To compile, download the tar file, extract it, go to the RV3Test.v1 folder and type make all. The binary file will then be in the executables folder.
Example
See a detailed example here.
Syntax
This software uses the following command-line interface:
RARE VARIANT ANALYSIS OPTIONS:
GENOTYPE : --genofile [pos.012], --geneList [outGeneSorted.txt], --cutoff [0.010], --collapseChoice [or]
PHENOTYPE : --phenofile [LDL.y.ID]
COVARIATES : --covConsider, --covfile [covFile.ID.2.txt]
PERMUTATION : --nPermute [10], --PermutationSeed [1]
GENE LEVEL TEST RESULT : --geneGlobalTestOut [globalPermuteSummary.txt], --geneTestpvalueFile [geneTestPvalues.txt]
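The option summary can be turned into a concrete command line. The Python sketch below assembles an argument list from the default values shown in the summary. The binary path is hypothetical (this page only says the binary ends up in the executables folder), and the assumption that --covConsider takes the value 1 to switch covariates on comes from its description further down, not from the tool's source.

```python
import shlex

# Hypothetical path: the actual binary name in the executables folder is not given here.
RVTEST_BINARY = "./executables/rvTests"

# Default values copied from the option summary on this page.
DEFAULT_OPTIONS = {
    "--genofile": "pos.012",
    "--geneList": "outGeneSorted.txt",
    "--cutoff": "0.010",
    "--collapseChoice": "or",
    "--phenofile": "LDL.y.ID",
    "--nPermute": "10",
    "--PermutationSeed": "1",
    "--geneGlobalTestOut": "globalPermuteSummary.txt",
}


def build_command(binary, overrides=None, covfile=None):
    """Return an argv list: the defaults, updated by overrides, plus optional covariates."""
    options = dict(DEFAULT_OPTIONS)
    options.update(overrides or {})
    if covfile is not None:
        # Assumption: --covConsider takes the value 1 to turn covariates on.
        options["--covConsider"] = "1"
        options["--covfile"] = covfile
    argv = [binary]
    for flag, value in options.items():
        argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    cmd = build_command(RVTEST_BINARY, {"--collapseChoice": "wt", "--nPermute": 1000},
                        covfile="covFile.ID.2.txt")
    # Print a shell-quoted version that can be pasted into a terminal.
    print(shlex.join(cmd))
```

The resulting list can be passed directly to subprocess.run once the real binary path is filled in.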
- GENOTYPE
- --genofile
- A genotype matrix in 012 format (the *.012 file). This file can be prepared with the prepare012s program, whose source code is at File:VcfReader.v1.tar. As before, extract the tar file, go into the directory and type make all to compile the code; the binary file will then be in the executables folder.
Note: Before converting, run Yanming's VCF annotation (http://genome.sph.umich.edu/wiki/VcfCodingSnps) on your VCF file to produce an annotated VCF file. You SHOULD keep the log file from the annotation, because it is used to create the gene list.
Data file preparation
Input files : --vcf [LDL.test.vcf], --log [], --IDfile []
Subsetting choices : --All
Output files : --outputPrefix [subsetGeno], --outputGeneList [LDL.geneList.txt]
- --vcf
- Input VCF file.
- --log
- The log file from Yanming's annotation output; the gene list is obtained from this log.
- --IDfile
- A file with one column of subject IDs to extract from the VCF file. If it is not specified, all subjects are included in the format conversion.
- --All
- Specify 1 to include all variants, or 0 to include only variants annotated as nonsynonymous or stop.
- --outputPrefix
- The prefix for the four output files used by rvTests:
- *.012: A genotype matrix with subjects as rows and variant sites as columns.
- *.012.pos: Chromosome and position numbers of the variant sites.
- *.012.indv: Subject IDs.
- *.012.frq: The frequencies of the included variants.
- --outputGeneList
- A file to store the gene list used by rvTests. The list file looks like this:
1 OR4F5 69090 70008
1 SAMD11 860529 871276
1 NOC2L 879583 893918
1 KLHL17 895966 901095
1 PLEKHN1 901876 910482
1 C1orf170 910578 912021
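The gene list is what ties the variant columns of the *.012 matrix to genes. The Python sketch below parses a gene list in the format shown above, rejects non-numeric chromosomes, and assigns each *.012.pos site to the genes whose interval contains it. It assumes whitespace-separated fields, that *.012.pos holds one "chromosome position" pair per line in the same order as the matrix columns, and that gene boundaries are inclusive; the function names are hypothetical and not part of rvTests.

```python
def parse_gene_list(lines):
    """Parse gene-list lines (chromosome, gene name, start, end; no header)."""
    genes = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, name, start, end = fields[:4]
        if not chrom.isdigit():
            raise ValueError(f"line {lineno}: chromosome must be numeric (1, not chr1): {chrom!r}")
        genes.append((int(chrom), name, int(start), int(end)))
    return genes


def parse_positions(lines):
    """Parse *.012.pos lines into (chromosome, position) tuples, one per matrix column."""
    sites = []
    for line in lines:
        fields = line.split()
        if fields:
            sites.append((int(fields[0]), int(fields[1])))
    return sites


def assign_sites_to_genes(genes, sites):
    """Map gene name -> list of *.012 column indices lying in [start, end] (inclusive)."""
    mapping = {name: [] for _, name, _, _ in genes}
    for col, (chrom, pos) in enumerate(sites):
        for g_chrom, name, start, end in genes:
            if chrom == g_chrom and start <= pos <= end:
                mapping[name].append(col)
    return mapping
```

The nested loop is deliberately simple; for whole-exome data, sorting both lists by position and sweeping through them once would be faster.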
- --geneList
- This file is an output of prepare012s, produced with the option --outputGeneList. It has four columns: chromosome number, gene name, start position and end position. There should be no header in this file.
THE CHROMOSOME NUMBERS MUST BE NUMERIC: use 1 for chromosome 1, DO NOT USE chr1.
- --cutoff
- The minor allele frequency cutoff used to define rare variants, for example 0.01 or 0.05.
- --collapseChoice
- Specify one of {or, sum, wt}. or: Li-Leal's CMC test; sum: uses the number of rare variants carried by each subject as the score; wt: Madsen-Browning's weighted rare variant score.
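The three collapseChoice options turn a gene's rare variants into a single score per subject. The Python sketch below illustrates them under stated assumptions: genotype values count copies of the minor allele, there are no missing genotypes, allele frequency is estimated from all subjects and compared with a strict "<" against --cutoff, and "sum" counts rare sites where the subject carries at least one minor allele. For "wt" it uses the weighting from Madsen and Browning's paper (q estimated in controls with a pseudo-count, weight sqrt(n q (1 - q)), score = sum of genotype / weight). rvTests' internal details may differ.

```python
import math


def rare_columns(genotypes, cutoff):
    """Indices of variants whose minor-allele frequency is below cutoff.

    Assumes rows are subjects and values (0, 1, 2) count copies of the minor allele.
    """
    n = len(genotypes)
    n_sites = len(genotypes[0]) if genotypes else 0
    rare = []
    for i in range(n_sites):
        freq = sum(row[i] for row in genotypes) / (2 * n)
        if freq < cutoff:
            rare.append(i)
    return rare


def collapse(genotypes, phenotypes, cutoff, choice):
    """Return one collapsed score per subject for a single gene."""
    rare = rare_columns(genotypes, cutoff)
    if choice == "or":
        # CMC-style indicator: 1 if the subject carries any rare allele.
        return [int(any(row[i] > 0 for i in rare)) for row in genotypes]
    if choice == "sum":
        # Number of rare sites at which the subject carries a minor allele.
        return [sum(1 for i in rare if row[i] > 0) for row in genotypes]
    if choice == "wt":
        # Madsen-Browning: q from controls (phenotype 0) with a pseudo-count,
        # weight sqrt(n * q * (1 - q)), score = sum of genotype / weight.
        controls = [row for row, y in zip(genotypes, phenotypes) if y == 0]
        n_all = len(genotypes)
        n_ctrl = len(controls)
        weights = {}
        for i in rare:
            m_ctrl = sum(row[i] for row in controls)
            q = (m_ctrl + 1) / (2 * n_ctrl + 2)
            weights[i] = math.sqrt(n_all * q * (1 - q))
        return [sum(row[i] / weights[i] for i in rare) for row in genotypes]
    raise ValueError(f"unknown collapseChoice: {choice!r}")
```

Because the weights are smallest for the rarest variants in controls, "wt" up-weights alleles that are rare among unaffected subjects, whereas "or" and "sum" treat all rare sites equally.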
- PHENOTYPE
- --phenofile
- A file where the first column is subject ID and the second column is phenotype (0 or 1).
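A phenotype file in this format is easy to check before a run. The Python sketch below reads it, rejects values other than 0 or 1, and lines the phenotypes up with the subject order of the genotype matrix (the *.012.indv file). It assumes whitespace-separated columns and no header; whether rvTests matches subjects by ID or relies on row order is not stated on this page, so the alignment step is an illustrative safeguard, and the function names are hypothetical.

```python
def read_phenotypes(lines):
    """Parse 'subjectID phenotype' lines into a dict; phenotype must be 0 or 1."""
    phenos = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        subject, value = fields[0], fields[1]
        if value not in ("0", "1"):
            raise ValueError(f"line {lineno}: phenotype must be 0 or 1, got {value!r}")
        phenos[subject] = int(value)
    return phenos


def align_to_genotypes(phenos, indv_ids):
    """Return phenotypes in the row order of the genotype matrix (*.012.indv)."""
    missing = [s for s in indv_ids if s not in phenos]
    if missing:
        raise KeyError(f"no phenotype for subjects: {missing}")
    return [phenos[s] for s in indv_ids]
```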
- COVARIATES
- --covConsider
- Default = 0: no covariates are considered. Set to 1 to include covariates.
- --covfile
- Covariate file in which the first column is the subject ID and the remaining columns are the covariates to include in the model.
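The overview says the tests sit in a logistic regression framework, with covariates optionally added to the model. The pure-Python sketch below fits logit P(y = 1) = b0 + b1 * score + covariates by Newton-Raphson and compares it with the covariates-only model through a likelihood ratio statistic. The choice of a likelihood ratio statistic and of Newton-Raphson fitting are assumptions for illustration; the page does not say which statistic rvTests computes, and this sketch has no safeguard against complete separation.

```python
import math


def _sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iterations=25):
    """Newton-Raphson fit; X rows must include a leading 1. Returns (beta, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        mu = [_sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((yi - mi) * row[k] for row, yi, mi in zip(X, y, mu)) for k in range(p)]
        info = [[sum(mi * (1 - mi) * row[j] * row[k] for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # y * eta - log(1 + exp(eta)), written to avoid overflow.
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return beta, loglik


def lr_statistic(score, covariates, y):
    """2 * (loglik with score - loglik without); covariate rows may be empty lists."""
    X0 = [[1.0] + list(c) for c in covariates]
    X1 = [[1.0, s] + list(c) for s, c in zip(score, covariates)]
    _, ll0 = fit_logistic(X0, y)
    _, ll1 = fit_logistic(X1, y)
    return 2 * (ll1 - ll0)
```

With covariates switched off, pass an empty list per subject, e.g. lr_statistic(score, [[] for _ in y], y).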
- PERMUTATION
- --nPermute
- Number of permutations used to evaluate p values.
- --PermutationSeed
- Random seed for the permutations. Default = 1; any other number can be used.
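The --nPermute and --PermutationSeed options point to phenotype-permutation p values. The Python sketch below shows that idea: shuffle the phenotypes with a seeded generator, recompute a statistic each time, and count how often it reaches the observed value. The default statistic (absolute difference in mean collapsed score between cases and controls) is a stand-in chosen to keep the example short, and the (exceed + 1) / (nPermute + 1) estimate is a common convention, not confirmed to be rvTests' formula. Python's random.Random will not reproduce the random stream of the C++ tool even with the same seed.

```python
import random


def mean_difference(score, y):
    """Stand-in statistic: |mean score in cases - mean score in controls|."""
    cases = [s for s, yi in zip(score, y) if yi == 1]
    controls = [s for s, yi in zip(score, y) if yi == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_pvalue(score, y, n_permute=10, seed=1, statistic=mean_difference):
    """Permute phenotypes n_permute times and compare against the observed statistic."""
    rng = random.Random(seed)  # seeded so repeated runs give identical results
    observed = statistic(score, y)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_permute):
        rng.shuffle(shuffled)  # keeps the numbers of cases and controls fixed
        if statistic(score, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return (exceed + 1) / (n_permute + 1)
```

With the default of 10 permutations the smallest attainable p value is 1/11, so small p values require a much larger --nPermute.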
- GENE LEVEL TEST RESULT
- --geneGlobalTestOut
- This file stores the 5% and 95% quantiles of the p values across all genes at each permutation.
- --geneTestPvalueFile
- This file gives you the gene name, number of rare variants, count of variants in case/control and p values from the RV test specified by collapseChoice.
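The --geneGlobalTestOut summary reduces each permutation's gene p values to two quantiles. The Python sketch below computes the 5% and 95% quantiles with linear interpolation between order statistics (the same rule as R's default, type 7). The page does not say which quantile definition rvTests uses, so results near the tails may differ slightly; the function names are hypothetical.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile between order statistics (R type 7)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def global_summary(pvalues_per_permutation):
    """For each permutation's list of gene p values, return (5% quantile, 95% quantile)."""
    return [(quantile(pvals, 0.05), quantile(pvals, 0.95))
            for pvals in pvalues_per_permutation]
```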