Difference between revisions of "RvTests"

From Genome Analysis Wiki
Revision as of 15:37, 30 January 2011

Overview

A few rare-variant tests (Li-Leal's CMC and Madsen-Browning's weighted method) are implemented in the logistic regression framework using C++.

The source code is located at wonderland:/home/youna/prj/RV/RV3Tests.v1/

The binary file is located at wonderland:/home/youna/prj/RV/RV3Tests.v1/executables/rvTests

Example

See a detailed example here.

Syntax

This software uses a command-line interface as follows:

RARE VARIANT ANALYSIS OPTIONS:

                GENOTYPE : --genofile [pos.012],
                           --geneList [outGeneSorted.txt], --cutoff [0.010],
                           --collapseChoice [or]
               PHENOTYPE : --phenofile [LDL.y.ID]
              COVARIATES : --covConsider, --covfile [covFile.ID.2.txt]
             PERMUTATION : --nPermute [10], --PermutationSeed [1]
  GENE LEVEL TEST RESULT : --geneGlobalTestOut [globalPermuteSummary.txt],
                           --geneTestpvalueFile [geneTestPvalues.txt]
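
The option summary above can be turned into an invocation. The following is a minimal sketch in Python that only assembles the argument list; the helper name build_rvtests_command is hypothetical, the binary name on the PATH is an assumption, and it is assumed (not confirmed by this page) that each flag is followed by its value as a separate argument. The flag spellings and example file names are copied from the summary above.

```python
def build_rvtests_command(binary="rvTests", **options):
    """Return an argument list such as ['rvTests', '--genofile', 'pos.012', ...].

    Assumption: every option is passed as '--flag value'.
    """
    args = [binary]
    for flag, value in options.items():
        args.extend([f"--{flag}", str(value)])
    return args


# Values below are the bracketed examples from the option summary.
cmd = build_rvtests_command(
    genofile="pos.012",
    geneList="outGeneSorted.txt",
    cutoff=0.01,
    collapseChoice="or",
    phenofile="LDL.y.ID",
    covConsider=1,
    covfile="covFile.ID.2.txt",
    nPermute=10,
    PermutationSeed=1,
    geneGlobalTestOut="globalPermuteSummary.txt",
    geneTestpvalueFile="geneTestPvalues.txt",
)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd) once the location of the binary is known.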


GENOTYPE
--genofile
A genotype 012 matrix (the .012 file). This file can be prepared using prepare012s:
 source code (wonderland:/home/youna/prj/RV/vcfReader.v1)
 binary file (wonderland:/home/youna/prj/RV/vcfReader.v1/executables/prepare012s)
Note: If you are going to analyze nonsynonymous and stop annotated variants,
 you should use Yanming's vcf annotation [http://genome.sph.umich.edu/wiki/VcfCodingSnps] on the vcf file.

Data File PREPARATION

         Input files : --vcf [LDL.test.vcf], --log [], --IDfile []
  Subsetting choices : --All
        Output files : --outputPrefix [subsetGeno],
                       --outputGeneList [LDL.geneList.txt]
  --vcf: Input vcf file 
  --log: The log file from Yanming's annotation output; this log is used to obtain the gene list.
  --IDfile: Specifies a file with one column of subject IDs to subset from the vcf file.
   If it is not specified, then all subjects are included for the format conversion.
  --All: Specify 1 to include all variants or 0 to include only nonsynonymous and stop annotated variants.
  --outputPrefix: Specify the prefix for the four output files which will be used in rvTests
  *.012: A genotype matrix with subjects as rows and variant sites as columns.
  *.012.pos: Chromosome and position numbers. 
  *.012.indv: Subject IDs.
  *.012.frq: The frequency of the included variants.
 --outputGeneList: Specify a file to store the gene list which will be used in rvTests.
 The list file looks like this 
 1	OR4F5	69090	70008
 1	SAMD11	860529	871276
 1	NOC2L	879583	893918
 1	KLHL17	895966	901095
 1	PLEKHN1	901876	910482
 1	C1orf170	910578	912021
--geneList
This file is the output of prepare012s produced with the option --outputGeneList. Its columns are chromosome number, gene name, start position, and end position. The file should have no header.

THE CHROMOSOME NUMBERS MUST BE NUMERIC! Use 1 for chromosome 1; DO NOT USE chr1.
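
A gene list can be checked against these rules before it is passed to rvTests. The sketch below is an illustration only: the function name parse_gene_list is hypothetical, and it assumes the four columns are separated by whitespace, as in the listing above.

```python
def parse_gene_list(lines):
    """Parse gene list lines into (chromosome, gene, start, end) tuples.

    Raises ValueError for a wrong column count or a non-numeric
    chromosome such as 'chr1'.
    """
    genes = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        chrom, name, start, end = fields
        if not chrom.isdigit():
            raise ValueError(f"line {line_no}: chromosome must be numeric, got {chrom!r}")
        genes.append((int(chrom), name, int(start), int(end)))
    return genes


example = ["1\tOR4F5\t69090\t70008", "1\tSAMD11\t860529\t871276"]
for gene in parse_gene_list(example):
    print(gene)
```

For a file on disk, the same function can be called on an open file handle, since it only iterates over lines.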

--cutoff
This is the minor allele frequency cutoff; you can specify it as 0.01, 0.05, etc.
--collapseChoice
Specify one of {or,sum,wt}. or: Li-Leal's CMC test; sum: use the number of rare variants for each subject as the score; wt: Madsen-Browning's weighted rare variant score.
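
The three collapsing choices can be illustrated on a small 012 matrix. This is a sketch under stated assumptions, not the rvTests implementation: the function names are hypothetical, the 0/1/2 codes are assumed to count the minor allele, variants are treated as rare when their frequency is strictly below the cutoff, "sum" is taken to be the count of rare alleles carried, and "wt" follows the weight published by Madsen and Browning (estimated from unaffected subjects), which may differ in detail from what the C++ code does.

```python
import math


def allele_freq(column):
    """Frequency of the counted allele for one variant coded 0/1/2."""
    return sum(column) / (2 * len(column))


def collapse(genotypes, phenotypes, cutoff=0.01, choice="or"):
    """Return one score per subject from the rare variants of a gene.

    genotypes: list of rows (subjects) of 0/1/2 codes; phenotypes: 0/1 per subject.
    """
    n_subjects = len(genotypes)
    columns = [list(col) for col in zip(*genotypes)]
    # Assumption: rare means 0 < frequency < cutoff.
    rare = [j for j, col in enumerate(columns) if 0 < allele_freq(col) < cutoff]

    if choice == "or":   # CMC-style indicator: carries any rare allele
        return [int(any(row[j] > 0 for j in rare)) for row in genotypes]
    if choice == "sum":  # count of rare alleles carried
        return [sum(row[j] for j in rare) for row in genotypes]
    if choice == "wt":   # Madsen-Browning weighted sum
        controls = [i for i, y in enumerate(phenotypes) if y == 0]
        weights = {}
        for j in rare:
            m_u = sum(genotypes[i][j] for i in controls)
            q = (m_u + 1) / (2 * len(controls) + 2)
            weights[j] = math.sqrt(n_subjects * q * (1 - q))
        return [sum(row[j] / weights[j] for j in rare) for row in genotypes]
    raise ValueError(f"unknown collapseChoice: {choice!r}")


genotypes = [[1, 0, 2], [0, 1, 1], [0, 0, 2], [0, 0, 1], [0, 0, 0]]
phenotypes = [1, 1, 0, 0, 0]
for choice in ("or", "sum", "wt"):
    print(choice, collapse(genotypes, phenotypes, cutoff=0.25, choice=choice))
```

In this toy matrix the third variant is common, so only the first two columns contribute to the scores.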
PHENOTYPE
--phenofile
A file where the first column is subject ID and the second column is phenotype (0 or 1).
COVARIATES
--covConsider
Default = 0: no covariates are considered. 1: covariates are considered.
--covfile
Covariate file in which the first column is the subject ID and the remaining columns are the covariates to be included in the model.
PERMUTATION
--nPermute
Number of permutations used to evaluate p values.
--PermutationSeed
Default = 1. Can be changed to other numbers too.
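
How a permutation count and a seed combine into a reproducible p value can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, the test statistic is a simple difference in mean score between cases and controls (a stand-in for the logistic regression statistic used by rvTests), and the (hits + 1) / (n + 1) estimator is one common convention that this page does not confirm.

```python
import random


def permutation_p_value(scores, phenotypes, n_permute=10, seed=1):
    """Empirical p value from shuffling case/control labels."""
    rng = random.Random(seed)  # fixed seed makes the result reproducible

    def statistic(labels):
        cases = [s for s, y in zip(scores, labels) if y == 1]
        controls = [s for s, y in zip(scores, labels) if y == 0]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = statistic(phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_permute):
        rng.shuffle(shuffled)  # keeps the number of cases and controls fixed
        if statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permute + 1)


scores = [1, 1, 0, 0, 0, 1, 0, 0]
phenotypes = [1, 1, 1, 0, 0, 0, 0, 0]
print(permutation_p_value(scores, phenotypes, n_permute=10, seed=1))
```

With only 10 permutations the smallest attainable value under this estimator is 1/11, so a larger count is needed to resolve small p values.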
GENE LEVEL TEST RESULT
--geneGlobalTestOut
This file stores the 5% and 95% quantiles of the p values across all genes for each permutation.
--geneTestPvalueFile
This file gives you the gene name, number of rare variants, count of variants in case/control and p values from the RV test specified by collapseChoice.
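
The per-permutation summary described for --geneGlobalTestOut can be illustrated with the Python standard library. This is only a sketch: the function name is hypothetical, and the quantile definition (the default "exclusive" method of statistics.quantiles) is an assumption, since this page does not say how rvTests computes its quantiles.

```python
import statistics


def p_value_quantiles(p_values):
    """Return the (5%, 95%) quantiles of the gene-level p values of one permutation."""
    cuts = statistics.quantiles(p_values, n=20)  # 19 cut points at 5% steps
    return cuts[0], cuts[-1]


# Hypothetical p values for 99 genes within a single permutation.
gene_p_values = [i / 100 for i in range(1, 100)]
low, high = p_value_quantiles(gene_p_values)
print(low, high)
```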