Difference between revisions of "RvTests"
From Genome Analysis Wiki
Revision as of 15:37, 30 January 2011
Overview
A few rare-variant tests (Li-Leal's CMC and Madsen-Browning's weighted method) are implemented in a logistic regression framework using C++.
The source code is located at wonderland:/home/youna/prj/RV/RV3Tests.v1/
The binary file is located at wonderland:/home/youna/prj/RV/RV3Tests.v1/executables/rvTests
Example
See a detailed example here.
Syntax
This software uses a command-line interface with the following options:
 RARE VARIANT ANALYSIS OPTIONS:
                GENOTYPE : --genofile [pos.012],
                           --geneList [outGeneSorted.txt], --cutoff [0.010],
                           --collapseChoice [or]
               PHENOTYPE : --phenofile [LDL.y.ID]
              COVARIATES : --covConsider, --covfile [covFile.ID.2.txt]
             PERMUTATION : --nPermute [10], --PermutationSeed [1]
  GENE LEVEL TEST RESULT : --geneGlobalTestOut [globalPermuteSummary.txt],
                           --geneTestpvalueFile [geneTestPvalues.txt]
- GENOTYPE
- --genofile
- A genotype 012 matrix (the .012 file). This file can be prepared with prepare012s:
source code (wonderland:/home/youna/prj/RV/vcfReader.v1)
binary file (wonderland:/home/youna/prj/RV/vcfReader.v1/executables/prepare012s)
Note: If you are going to analyze nonsynonymous and stop-annotated variants, you should first run Yanming's vcf annotation [http://genome.sph.umich.edu/wiki/VcfCodingSnps] on the vcf file.
Data File PREPARATION
         Input files : --vcf [LDL.test.vcf], --log [], --IDfile []
  Subsetting choices : --All
        Output files : --outputPrefix [subsetGeno],
                       --outputGeneList [LDL.geneList.txt]
 --vcf: Input vcf file.
 --log: The log file from Yanming's annotation output; this log is used to obtain the gene list.
 --IDfile: A file with one column of subject IDs to subset from the vcf file.
           If it is not specified, all subjects are included in the format conversion.
 --All: Specify 1 to include all variants, or 0 to include only nonsyn and stop annotated variants.
 --outputPrefix: The prefix for the four output files that will be used in rvTests:
           *.012: A genotype matrix with subjects as rows and variant sites as columns.
           *.012.pos: Chromosome and position numbers.
           *.012.indv: Subject IDs.
           *.012.frq: The frequency of the included variants.
 --outputGeneList: A file to store the gene list that will be used in rvTests.
           The list file looks like this:
           1 OR4F5 69090 70008
           1 SAMD11 860529 871276
           1 NOC2L 879583 893918
           1 KLHL17 895966 901095
           1 PLEKHN1 901876 910482
           1 C1orf170 910578 912021
- --geneList
- This file is the output of prepare012s with the option --outputGeneList. Its columns are chromosome number, gene name, start position, and end position. The file must have no header.
THE CHROMOSOME NUMBERS MUST BE NUMERIC: use 1 for chromosome 1, DO NOT USE chr1.
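The gene list format described above can be checked before running rvTests. The following is a minimal sketch, not part of rvTests or prepare012s: the names `Gene` and `parse_gene_list` are hypothetical, and whitespace-separated columns are assumed from the example listing.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    chrom: int
    name: str
    start: int
    end: int


def parse_gene_list(lines: List[str]) -> List[Gene]:
    """Parse header-less gene list lines: chromosome, gene name, start, end."""
    genes = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # assumption: columns are whitespace-separated
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        chrom, name, start, end = fields
        if not chrom.isdigit():
            # Rejects values such as "chr1"; only plain numbers are accepted.
            raise ValueError(f"line {line_no}: chromosome must be numeric, got {chrom!r}")
        genes.append(Gene(int(chrom), name, int(start), int(end)))
    return genes
```

Calling `parse_gene_list(open("LDL.geneList.txt").read().splitlines())` would then raise a `ValueError` on the first malformed line instead of letting the problem surface later in the analysis.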
- --cutoff
- This is the minor allele frequency cutoff, for example 0.01 or 0.05.
- --collapseChoice
- Specify one of {or,sum,wt}. or: Li-Leal's CMC test. sum: Use the number of rare variants for each subject as the score. wt: Madsen-Browning's weighted rare variant score.
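To make the --cutoff and --collapseChoice options concrete, here is a rough sketch of how a per-subject score could be built from a 012 matrix. It is an illustration under stated assumptions, not the rvTests source: the function names are hypothetical, the counted allele is assumed to be the minor allele, "rare" is taken to mean a frequency strictly below the cutoff, "sum" is read as the number of rare sites a subject carries, and the "wt" weights follow the published Madsen-Browning definition (allele frequency estimated in controls with a +1 correction).

```python
import math
from typing import List, Sequence


def minor_allele_freq(genotypes: Sequence[Sequence[int]], site: int) -> float:
    """Frequency of the counted allele at one site (genotypes coded 0/1/2)."""
    total = sum(row[site] for row in genotypes)
    return total / (2 * len(genotypes))


def rare_sites(genotypes: Sequence[Sequence[int]], cutoff: float) -> List[int]:
    """Indices of polymorphic sites whose frequency is below the cutoff."""
    n_sites = len(genotypes[0])
    return [j for j in range(n_sites)
            if 0 < minor_allele_freq(genotypes, j) < cutoff]


def collapse(genotypes: Sequence[Sequence[int]], phenotypes: Sequence[int],
             sites: Sequence[int], choice: str) -> List[float]:
    """Per-subject rare-variant score for choice in {'or', 'sum', 'wt'}."""
    n = len(genotypes)
    if choice == "or":
        # CMC-style indicator: 1 if the subject carries any rare variant.
        return [float(any(row[j] > 0 for j in sites)) for row in genotypes]
    if choice == "sum":
        # Assumption: count the rare sites at which the subject carries a variant.
        return [float(sum(1 for j in sites if row[j] > 0)) for row in genotypes]
    if choice == "wt":
        controls = [row for row, y in zip(genotypes, phenotypes) if y == 0]
        scores = [0.0] * n
        for j in sites:
            m_u = sum(row[j] for row in controls)      # variant alleles in controls
            q = (m_u + 1) / (2 * len(controls) + 2)    # smoothed control frequency
            w = math.sqrt(n * q * (1 - q))             # Madsen-Browning weight
            for i, row in enumerate(genotypes):
                scores[i] += row[j] / w
        return scores
    raise ValueError(f"unknown collapse choice: {choice!r}")
```

The resulting score vector is what would enter the logistic regression as the gene-level predictor; rarer variants receive larger weight under "wt" because `w` shrinks as `q` approaches zero.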
- PHENOTYPE
- --phenofile
- A file where the first column is subject ID and the second column is phenotype (0 or 1).
- COVARIATES
- --covConsider
- Default = 0: no covariates are considered. 1: covariates are considered.
- --covfile
- Covariate file with the first column as subject ID and the remaining columns as the covariates to include in the model.
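Because both the phenotype and covariate files are keyed by subject ID in the first column, they can be lined up before model fitting. The sketch below is only an illustration: the helper names are hypothetical, whitespace-separated header-less files are assumed, and keeping only subjects present in both files is a design choice made here, not documented rvTests behavior.

```python
from typing import Dict, List, Sequence, Tuple


def read_id_table(lines: Sequence[str]) -> Dict[str, List[float]]:
    """Map subject ID (first column) to the numeric values in the other columns."""
    table = {}
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated, no header
        if fields:
            table[fields[0]] = [float(x) for x in fields[1:]]
    return table


def align(pheno_lines: Sequence[str],
          cov_lines: Sequence[str]) -> Tuple[List[str], List[int], List[List[float]]]:
    """Return IDs, 0/1 phenotypes, and covariate rows for subjects in both files."""
    pheno = read_id_table(pheno_lines)
    cov = read_id_table(cov_lines)
    ids = [sid for sid in pheno if sid in cov]  # keeps phenotype-file order
    y = [int(pheno[sid][0]) for sid in ids]
    x = [cov[sid] for sid in ids]
    return ids, y, x
```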
- PERMUTATION
- --nPermute
- Number of permutations used to evaluate the p values.
- --PermutationSeed
- Seed for the permutations. Default = 1; any other number can be used.
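The role of --nPermute and --PermutationSeed can be illustrated with a generic permutation test. This is a sketch only: rvTests works in a logistic regression framework, whereas the `mean_difference` statistic below is a simple stand-in chosen to keep the example short, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def mean_difference(scores: Sequence[float], phenotypes: Sequence[int]) -> float:
    """Stand-in statistic: absolute difference in mean score, cases vs controls."""
    cases = [s for s, y in zip(scores, phenotypes) if y == 1]
    controls = [s for s, y in zip(scores, phenotypes) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(scores: Sequence[float], phenotypes: Sequence[int],
                        statistic: Callable[[Sequence[float], Sequence[int]], float],
                        n_permute: int, seed: int = 1) -> float:
    """Empirical p value from shuffling phenotype labels n_permute times."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    observed = statistic(scores, phenotypes)
    shuffled: List[int] = list(phenotypes)
    hits = 0
    for _ in range(n_permute):
        rng.shuffle(shuffled)
        if statistic(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permute + 1)
```

With a fixed seed, two runs on the same data give the same p value, which is the practical reason for exposing the seed as an option. The smallest attainable p value is 1 / (n_permute + 1), so a larger number of permutations is needed to resolve small p values.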
- GENE LEVEL TEST RESULT
- --geneGlobalTestOut
- This file stores the 5% and 95% quantiles of the p values across all genes at each permutation.
- --geneTestPvalueFile
- This file gives the gene name, the number of rare variants, the counts of variants in cases and controls, and the p values from the rare-variant test specified by --collapseChoice.
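The per-permutation summary described for --geneGlobalTestOut can be sketched as follows. The function names are hypothetical, and the linear-interpolation quantile used here is an assumption; the quantile definition used by rvTests is not stated on this page.

```python
from typing import List, Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics (0 <= q <= 1)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("quantile of an empty sequence")
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def summarize_permutations(
        p_values_by_permutation: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """For each permutation, the (5%, 95%) quantiles of the gene-level p values."""
    return [(quantile(p, 0.05), quantile(p, 0.95)) for p in p_values_by_permutation]
```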