RvTests

From Genome Analysis Wiki
Revision as of 14:37, 30 January 2011 by Youna

Overview

A few rare-variant tests (Li-Leal's CMC and Madsen-Browning's weighted method) are implemented in C++ within a logistic regression framework.

The source code is located at wonderland:/home/youna/prj/RV/RV3Tests.v1/

The binary file is located at wonderland:/home/youna/prj/RV/RV3Tests.v1/executables/rvTests

Example

See a detailed example here.

Syntax

The software is run from the command line with the following options:

RARE VARIANT ANALYSIS OPTIONS:

                GENOTYPE : --genofile [pos.012],
                           --geneList [outGeneSorted.txt], --cutoff [0.010],
                           --collapseChoice [or]
               PHENOTYPE : --phenofile [LDL.y.ID]
              COVARIATES : --covConsider, --covfile [covFile.ID.2.txt]
             PERMUTATION : --nPermute [10], --PermutationSeed [1]
  GENE LEVEL TEST RESULT : --geneGlobalTestOut [globalPermuteSummary.txt],
                           --geneTestpvalueFile [geneTestPvalues.txt]
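
As an illustration only, the options above could be assembled into a command line like this. The sketch assumes each option takes its value as the next argument (the bracketed values in the listing are reused as example values) and that the rvTests binary is on the PATH; neither is confirmed on this page. Covariate options are left out.

```python
import shlex

# Example values are copied from the bracketed entries in the option listing.
# Assumption: options are passed as "--name value" pairs.
options = {
    "--genofile": "pos.012",
    "--geneList": "outGeneSorted.txt",
    "--cutoff": "0.01",
    "--collapseChoice": "or",
    "--phenofile": "LDL.y.ID",
    "--nPermute": "10",
    "--PermutationSeed": "1",
    "--geneGlobalTestOut": "globalPermuteSummary.txt",
    "--geneTestpvalueFile": "geneTestPvalues.txt",
}

command = ["rvTests"]
for name, value in options.items():
    command += [name, value]

# Print the command so it can be inspected before running it in a shell.
print(shlex.join(command))
```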


GENOTYPE
--genofile
A genotype 012 matrix (the .012 file). This file can be prepared with prepare012s:
 source code (wonderland:/home/youna/prj/RV/vcfReader.v1)
 binary file (wonderland:/home/youna/prj/RV/vcfReader.v1/executables/prepare012s)
Note: If you are going to analyze nonsynonymous and stop-annotated variants,
 you should first run Yanming's vcf annotation [1] on the vcf file.

Data File PREPARATION

         Input files : --vcf [LDL.test.vcf], --log [], --IDfile []
  Subsetting choices : --All
        Output files : --outputPrefix [subsetGeno],
                       --outputGeneList [LDL.geneList.txt]
  --vcf: Input vcf file.
  --log: The log file from Yanming's annotation output; it is used to obtain the gene list.
  --IDfile: A file with one column of subject IDs to subset from the vcf file.
   If it is not specified, all subjects are included in the format conversion.
  --All: Set to 1 to include all variants, or 0 to include only nonsynonymous and stop-annotated variants.
  --outputPrefix: The prefix for the four output files that will be used by rvTests:
   *.012: A genotype matrix with subjects as rows and variant sites as columns.
   *.012.pos: Chromosome and position numbers.
   *.012.indv: Subject IDs.
   *.012.frq: The frequencies of the included variants.
  --outputGeneList: A file to store the gene list that will be used by rvTests.
  The list file looks like this:
 1	OR4F5	69090	70008
 1	SAMD11	860529	871276
 1	NOC2L	879583	893918
 1	KLHL17	895966	901095
 1	PLEKHN1	901876	910482
 1	C1orf170	910578	912021
--geneList
This file is the output of prepare012s with the option --outputGeneList. Its columns are chromosome number, gene name, start position, and end position. The file must not have a header.

THE CHROMOSOME NUMBERS MUST BE NUMERIC: use 1 for chromosome 1, DO NOT USE chr1.
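
A gene list can be checked against these rules before it is passed to rvTests. The following is a minimal sketch, not part of the software: the function name read_gene_list is hypothetical, and it assumes whitespace-separated columns as in the listing above.

```python
import io

def read_gene_list(handle):
    """Parse gene-list rows: chromosome, gene name, start, end (no header)."""
    genes = []
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        chrom, name, start, end = fields
        if not chrom.isdigit():
            # Catches "chr1" style names as well as a header row.
            raise ValueError(f"line {line_no}: chromosome {chrom!r} is not numeric")
        genes.append((int(chrom), name, int(start), int(end)))
    return genes

sample = io.StringIO("1\tOR4F5\t69090\t70008\n1\tSAMD11\t860529\t871276\n")
genes = read_gene_list(sample)
print(genes[0])
```

A row such as "chr1 OR4F5 69090 70008" raises a ValueError instead of being silently accepted.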

--cutoff
The minor allele frequency cutoff, e.g. 0.01 or 0.05.
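
To make the role of the cutoff concrete, the sketch below picks the variant sites whose minor allele frequency falls below it. It is an illustration under stated assumptions: genotypes are coded 0/1/2 with no missing values, the comparison is strict (maf < cutoff), and monomorphic sites are dropped. The function name rare_sites is hypothetical, and rvTests may apply the cutoff differently (for example, by reading the .012.frq file).

```python
def rare_sites(genotypes, cutoff=0.01):
    """Return column indices of sites with 0 < MAF < cutoff.

    genotypes: list of rows (subjects), each a list of 0/1/2 allele counts.
    """
    n_subjects = len(genotypes)
    keep = []
    for i in range(len(genotypes[0])):
        freq = sum(row[i] for row in genotypes) / (2 * n_subjects)
        maf = min(freq, 1 - freq)  # fold to the minor allele
        if 0 < maf < cutoff:
            keep.append(i)
    return keep

# Five subjects, three sites: allele frequencies are 0.1, 0.7 and 0.0.
toy = [
    [1, 2, 0],
    [0, 2, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
selected = rare_sites(toy, cutoff=0.2)
```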
--collapseChoice
Specify one of {or,sum,wt}. or: Li-Leal's CMC test, sum: Use the number of rare variants for each subject as the score, wt: Madsen-Browning's weighted rare variant score.
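
The three choices can be sketched as per-subject scores computed over the rare variants of one gene. This is an illustration, not the rvTests source. It assumes 0/1/2 genotype coding without missing values, counts a variant for "sum" when the subject carries at least one minor allele, and uses the weights from Madsen and Browning's paper (allele frequency estimated in controls with a +1/+2 adjustment); the implementation in rvTests may differ in these details. The function name collapse is hypothetical.

```python
import math

def collapse(genotypes, phenotypes, choice):
    """Return one score per subject for a gene's rare variants.

    genotypes: rows = subjects, columns = rare variant sites, coded 0/1/2.
    phenotypes: 0 (control) or 1 (case) for each subject.
    """
    n_subjects = len(genotypes)
    n_sites = len(genotypes[0])
    if choice == "or":
        # CMC-style indicator: 1 if the subject carries any rare variant.
        return [1 if any(g > 0 for g in row) else 0 for row in genotypes]
    if choice == "sum":
        # Number of rare variant sites carried by the subject (assumption).
        return [sum(1 for g in row if g > 0) for row in genotypes]
    if choice == "wt":
        controls = [row for row, y in zip(genotypes, phenotypes) if y == 0]
        weights = []
        for i in range(n_sites):
            m_u = sum(row[i] for row in controls)      # minor alleles in controls
            q = (m_u + 1) / (2 * len(controls) + 2)    # adjusted frequency
            weights.append(math.sqrt(n_subjects * q * (1 - q)))
        # Rarer variants get smaller weights, so they contribute more.
        return [sum(g / w for g, w in zip(row, weights)) for row in genotypes]
    raise ValueError(f"unknown collapseChoice: {choice!r}")

toy_geno = [[0, 0], [1, 0], [0, 2], [1, 1]]
toy_pheno = [0, 0, 1, 1]
or_scores = collapse(toy_geno, toy_pheno, "or")
sum_scores = collapse(toy_geno, toy_pheno, "sum")
wt_scores = collapse(toy_geno, toy_pheno, "wt")
```

In each case the resulting score would enter the regression model as the predictor for the gene.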
PHENOTYPE
--phenofile
A file where the first column is subject ID and the second column is phenotype (0 or 1).
COVARIATES
--covConsider
Default = 0: no covariates are considered. Set to 1 to include covariates.
--covfile
Covariate file with the subject ID in the first column and the covariates to include in the model in the remaining columns.
PERMUTATION
--nPermute
Number of permutations used to evaluate the p values.
--PermutationSeed
Seed for the permutations. Default = 1; any other number can be used.
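
The idea behind a seeded permutation p value can be sketched as follows. This is a generic illustration, not the procedure coded in rvTests: the test statistic here is a simple absolute difference in mean score between cases and controls, standing in for the logistic regression statistic, and the function names are hypothetical.

```python
import random

def mean_diff(scores, phenotypes):
    """Absolute difference in mean score between cases (1) and controls (0)."""
    cases = [s for s, y in zip(scores, phenotypes) if y == 1]
    controls = [s for s, y in zip(scores, phenotypes) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def permutation_pvalue(scores, phenotypes, n_permute=10, seed=1):
    """Estimate a p value by shuffling case/control labels n_permute times."""
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    observed = mean_diff(scores, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_permute):
        rng.shuffle(shuffled)
        if mean_diff(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permute + 1)

toy_scores = [0, 0, 0, 1, 1, 1]
toy_labels = [0, 0, 0, 1, 1, 1]
p_value = permutation_pvalue(toy_scores, toy_labels, n_permute=100, seed=1)
```

With this estimator the smallest attainable p value is 1 / (n_permute + 1), so a small number of permutations limits the resolution of the result.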
GENE LEVEL TEST RESULT
--geneGlobalTestOut
This file stores the 5% and 95% quantiles of the p values across all genes for each permutation.
--geneTestPvalueFile
This file lists, for each gene, the gene name, the number of rare variants, the variant counts in cases and controls, and the p value from the rare variant test specified by --collapseChoice.