Changes

Rvtests (view source)

Revision as of 15:43, 12 January 2015

7,561 bytes added , 15:43, 12 January 2015

no edit summary

Line 1: Line 1: −

~~'''Table of Contents'''~~

+

−

* [[#introduction|Introduction]]

+

'''Table of Contents'''

−

* [[#download|Download]]

+

* [[#introduction|Introduction]]<br />

+

* [[#download|Download]]<br />

* [[#quick-tutorial|Quick Tutorial]]

−

** [[#single-variant-tests|Single variant tests]]

+

** [[#single-variant-tests|Single variant tests]]<br />

−

** [[#groupwise-tests|Groupwise tests]]

+

** [[#groupwise-tests|Groupwise tests]]<br />

−

** [[#related-individual-tests|Related individual tests]]

+

** [[#related-individual-tests|Related individual tests]]<br />

** [[#meta-analysis-tests|Meta-analysis tests]]

+

*** [[#dominant-models-and-recessive-models|Dominant models and recessive models]]<br />

* [[#input-files|Input files]]

−

** [[#genotype-file-vcf|Genotype file (VCF)]]

+

** [[#genotype-file-vcf|Genotype file (VCF)]]<br />

−

** [[#phenotype-file|Phenotype file]]

+

** [[#phenotype-file|Phenotype file]]<br />

−

** [[#covariate-file|Covariate file]]

+

** [[#covariate-file|Covariate file]]<br />

−

** [[#trait-transformation|Trait transformation]]

+

** [[#trait-transformation|Trait transformation]]<br />

* [[#models|Models]]

−

** [[#single-variant-tests-1|Single variant tests]]

+

** [[#single-variant-tests|Single variant tests]]<br />

−

** [[#burden-tests|Burden tests]]

+

** [[#burden-tests|Burden tests]]<br />

−

** [[#variable-threshold-models|Variable threshold models]]

+

** [[#variable-threshold-models|Variable threshold models]]<br />

−

** [[#kernel-models|Kernel models]]

+

** [[#kernel-models|Kernel models]]<br />

−

** [[#utility-models|Utility models]]

+

** [[#meta-analysis-models|Meta-analysis models]]<br />

+

** [[#utility-models|Utility models]]<br />

* [[#association-test-options|Association test options]]

−

** [[#sample-inclusionexclusion|Sample inclusion/exclusion]]

+

** [[#sample-inclusionexclusion|Sample inclusion/exclusion]]<br />

−

** [[#variant-site-filters|Variant site filters]]

+

** [[#variant-site-filters|Variant site filters]]<br />

−

** [[#~~genotyep~~-filters|~~Genotyep~~ filters]]

+

** [[#genotype-filters|Genotype filters]]<br />

−

** [[#handle-missing-genotypes-and-phenotypes|Handle missing genotypes and phenotypes]]

+

** [[#handle-missing-genotypes-and-phenotypes|Handle missing genotypes and phenotypes]]<br />

−

** [[#specify-groups-eg-burden-unit|Specify groups (e.g burden unit)]]

+

** [[#specify-groups-eg-burden-unit|Specify groups (e.g burden unit)]]<br />

−

* [[#~~contact~~|Contact]]

+

* [[#sex-chromosome-analysis|Sex chromosome analysis]]<br />

+

* [[#kinship-generation|Kinship generation]]<br />

+

* [[#frequently-asked-questions-faq|Frequently Asked Questions (FAQ)]]<br />

+

* [[#feedbackcontact|Feedback/Contact]]

+

+

[https://travis-ci.org/zhanxw/rvtests [[Image:https://travis-ci.org/zhanxw/rvtests.png?branch=master|Build Status]]]

+

(Updated: Janurary 2015)

= Introduction =

Line 33: Line 47:

= Download =

−

Source ~~file~~ can be downloaded from [https://github.com/zhanxw/rvtests/archive/master.zip github] or [https://github.com/zhanxw/rvtests github page]. Executable binary ~~file~~ can be downloaded from [~~http~~://~~www~~.~~sph.umich.edu~~/~~csg~~/~~zhanxw~~/~~software~~/~~rvtests~~/rvtests-~~latest~~.tar.gz here].

+

Source files can be downloaded from [https://github.com/zhanxw/rvtests/archive/master.zip github] or [https://github.com/zhanxw/rvtests github page].<br />Executable binary files (for Linux 64bit) can be downloaded from [https://github.com/zhanxw/rvtests/releases/download/v1.8.6/rvtests-20150104.tar.gz here].

= Quick Tutorial =

Line 42: Line 56:

<pre>rvtests --inVcf input.vcf --pheno phenotype.ped --out output --single wald,score</pre>

−

This specifies single variant Wald and score test for association tests for every variant in the <code>input.vcf</code> file. The 6th column of the phenotype file, <code>phenotype.ped</code>, which is in PLINK format, is used. Rvtests will automatically check whether the phenotype is binary trait or quantitative trait.

+

This specifies single variant Wald and score test for association<br />tests for every variant in the <code>input.vcf</code> file. The 6th column of the phenotype file, <code>phenotype.ped</code>, which is in PLINK format, is used. Rvtests will automatically check whether the phenotype is binary trait or quantitative trait.<br />For binary trait, the recommended way of coding is to code controls as 1, cases as 2, missing phenotypes as -9 or 0.

−

For other types of association tests, you can refer to [[#~~Models~~|Models]]

+

For other types of association tests, you can refer to [[#models|Models]]

== Groupwise tests ==

Line 50: Line 64:

Groupwise tests includes three major kinds of tests.

−

* Burden tests: group variants, which are usually less than 1% or 5% rare variants, for association tests. The category includes: CMC test, Zeggini test, Madsen-Browning test, CMAT test, and rare-cover test.

+

* Burden tests: group variants, which are usually less than 1% or 5% rare variants, for association tests. The category includes: CMC test, Zeggini test, Madsen-Browning test, CMAT test, and rare-cover test.<br />

−

* Variable threshold tests: group variants under different frequency thresholds.

+

* Variable threshold tests: group variants under different frequency thresholds.<br />

* Kernel methods: suitable to tests rare variants having different directions of effects. These includes SKAT test and KBAC test.

Line 58: Line 72:

To perform rare variant tests by gene, you need to use <code>--geneFile</code> to specify the gene range in a refFlat format. We provided different gene definitions in the [[#Resources|Resources]] section. You can use <code>--gene</code> to specify which gene(s) to test. For example, specify <code>--gene CFH,ARMS2</code> will perform association tests on CFH and ARMS2 genes. If there is no providing <code>--gene</code> option, all genes will be tests.

−

The following command line demonstrate how to use CMC method, variable threshold method(proposed by Price) and kernel based method (SKAT by Shawn Lee and KBAC by Dajiang Liu) to test every gene listed in ''refFlat_hg19_uniq_gene.txt.gz''.

+

The following command line demonstrate how to use CMC method, variable threshold method(proposed by Price) and kernel based method (SKAT by Shawn Lee and KBAC by<br />Dajiang Liu) to test every gene listed in ''refFlat_hg19_uniq_gene.txt.gz''.

<pre>rvtests --inVcf input.vcf --pheno phenotype.ped --out output --geneFile refFlat_hg19_uniq_gene.txt.gz --burden cmc --vt price --kernel skat,kbac</pre>

Line 73: Line 87:

== Meta-analysis tests ==

−

The meta-analysis models outputs association test results and genotype covariance matrix. These statistics can be used in rare variant association analysis. We provide single variant score test and generate genotype covariance matrix. You can use command:

+

The meta-analysis models outputs association test results and genotype covariance matrix. These statistics can be used in rare variant association analysis.<br />We provide single variant score test and generate genotype covariance matrix.<br />You can use command:

+

<pre>rvtests --inVcf input.vcf --pheno phenotype.ped --meta score,cov --out output</pre>

+

In a more realistic scenario, you may want to adjust for covariates and want to inverse normalized residuals obtained in null model ([http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.2852.html link] to our methodology paper), then this command will work:

<pre>rvtests --inVcf input.vcf --pheno phenotype.ped --covar example.covar --covar-name age,bmi --inverseNormal --useResidualAsPhenotype --meta score,cov --out output</pre>

Here the <code>--covar</code> specify a covariate file, and <code>--covar-name</code> specify which covariates can used in the analysis. Covariate file format can be found [[#Covariate%20file|here]]. <code>--inverseNormal --useResidualAsPhenotype</code> specifies trait transformation method. That means first fit a regression model of the phenotype on covariates (intercept automatically added), then the residuals are inverse normalized. Trait transformation details can be found [[#Trait%20transformation|here]].

+

We support both unrelated individuals and related indivudlas (e.g. family data). You need to append <code>--kinship input.kinship</code> to the command line:

+

<pre>rvtests --inVcf input.vcf --pheno phenotype.ped --meta score,cov --out output --kinship input.kinship</pre>

+

The file <code>input.kinship</code> is calculated by <code>vcf2kinship</code> program, and usage to this program is described in [[#related-individual-tests|Related individual tests]].

+

=== Dominant models and recessive models ===

+

Dominant and recessive disease models are supported by appending "dominant" and/or "recessive" after "--meta" option. For example, use "--meta dominant,recessive" will<br />generate two sets of files. For dominant model, they are "prefix.MetaDominant.assoc" and "prefix.MetaDominantCov.assoc.gz"; for recessive model,<br />they are "prefix.MetaRecessive.assoc" and "prefix.MetaRecessiveCov.assoc.gz". Internally, in dominant models, genotypes 0/1/2 are coded as 0/1/1; in recessive models, genotypes 0/1/2 are<br />coded as 0/0/1. Missing genotypes will be imputed to the mean.

= Input files =

Line 89: Line 115:

tabix -f -p vcf $your_vcf_file</pre>

The above commands will (1) remove the <code>chr</code> prefix from chromosome names; (2) sort VCF files by chromosome first, then by chromosomal positions; (3) compress using bgzip; (4) create tabix index.

+

Rvtests support genotype dosages. Use <code>--dosage DosageTag</code> to specify the dosage tag. For example, if VCF format field is "GT:EC" and individual genotype fields is "0/0:0.02", you can use <code>--dosage EC</code>, and rvtests will use the dosage 0.02 in the regression models.

== Phenotype file ==

Line 97: Line 125:

<pre>fid iid fatid matid sex y1 y2 y3 y4

−

P1 P1 0 0 0 1.7642934435605 -0.733862638327895 -0.980843608339726 1

+

P1 P1 0 0 0 1.7642934435605 -0.733862638327895 -0.980843608339726 2

−

P2 P2 0 0 0 0.457111744989746 0.623297281416372 -2.24266162284447 0

+

P2 P2 0 0 0 0.457111744989746 0.623297281416372 -2.24266162284447 1

−

P3 P3 0 0 0 0.566689682543218 1.44136462889459 -1.6490100777089 0

+

P3 P3 0 0 0 0.566689682543218 1.44136462889459 -1.6490100777089 1

−

P4 P4 0 0 0 0.350528353203767 -1.79533911725537 -1.11916876241804 0

+

P4 P4 0 0 0 0.350528353203767 -1.79533911725537 -1.11916876241804 1

−

P5 P5 0 0 1 2.72675074738545 -1.05487747371158 -0.33586430010589 1</pre>

+

P5 P5 0 0 1 2.72675074738545 -1.05487747371158 -0.33586430010589 2</pre>

−

Phenotype file is specified by the option <code>--pheno example.pheno</code> . The default phenotype column header is “<code>y1</code>”. If you want to use alternative columns as phenotype for association analysis (e.g the column with header y2), you may ~~specific~~ the ~~header names~~ using either

+

Phenotype file is specified by the option <code>--pheno example.pheno</code> . The default phenotype column header is “<code>y1</code>”. If you want to use alternative columns as phenotype for association analysis (e.g the column with header y2), you may specify the phenotype by column or by name using either

−

* --mpheno 2

+

* --mpheno 2<br />

* --pheno-name y2

Line 110: Line 138:

In phenotype file, missing values can be denoted by NA or any non-numeric values. Individuals with missing phenotypes will be automatically dropped from subsequent association analysis. For each missing phenotype value, a warning will be generated and recorded in the log file.

+

When the phenotype values are only 0, 1 and 2, rvtests will automatically treat it as binary traits. However, if you want to treat it as continuous trait, please use "<code>--qtl</code>" option.

== Covariate file ==

Line 117: Line 147:

The covariate file, (e.g. <code>example.covar</code>) has a similar format as the phenotype file:

−

<pre>fid iid ~~age bmi pc1 pc2 pc3~~

+

<pre>fid iid fatid matid sex y1 y2 y3 y4

−

P1 P1 ~~23 24.2~~ 1.~~7642934435605~~ -0.~~733862638327895~~ -0.~~980843608339726~~

+

P1 P1 0 0 0 1.911 -1.465 -0.817 1

−

P2 P2 ~~32 29.~~0 0~~.457111744989746~~ 0.~~623297281416372~~ -2.~~24266162284447~~

+

P2 P2 0 0 0 2.146 -2.451 -0.178 2

−

P3 P3 ~~44 32.4~~ 0.~~566689682543218~~ 1.~~44136462889459~~ -1~~.6490100777089~~

+

P3 P3 0 0 0 1.086 -1.194 -0.899 1

−

P4 P4 ~~25 28.2~~ 0.~~350528353203767~~ -1.~~79533911725537~~ -1~~.11916876241804~~

+

P4 P4 0 0 0 0.704 -1.052 -0.237 1

−

P5 P5 ~~30 19.8~~ 2.~~72675074738545~~ -1.~~05487747371158~~ -0.~~33586430010589~~</pre>

+

P5 P5 0 0 1 2.512 -3.085 -2.579 1</pre>

−

The covariate file is specified by the <code>--covar</code> option (e.g. <code>--covar example.covar</code>). To specify covariates that will be used in the association analysis, the option <code>--covar-name</code> can be used. For example, when age, bmi and 3 PCs are used for association analysis, the following option can be specified for the rvtest program, i.e. <code>--covar example.covar --covar-name age,bmi,pc1,pc2,pc3</code>.

+

The covariate file is specified by the <code>--covar</code> option (e.g. <code>--covar example.covar</code>). To specify covariates that will be used in the association analysis, the option <code>--covar-name</code> can be used. For example, when age, bmi and 3 PCs are used for association analysis, the following option can be specified for the rvtest program, i.e.<br /><code>--covar example.covar --covar-name age,bmi,pc1,pc2,pc3</code>.

Note: Missing data in the covariate file can be labeled by any non-numeric value (e.g. NA). They will be automatically imputed to the mean value in the data file.

Line 140: Line 170:

== Single variant tests ==

−

~~<table>~~

+

{|

−

~~<thead>~~

+

!Single variant

−

~~<tr class="header">~~

+

!align="center"|Model(#)

−

~~<th align="left">~~Single variant~~</th>~~

+

!align="center"|Traits(##)

−

~~<th~~ align="center">Model(*)~~</th>~~

+

!align="center"|Covariates

−

~~<th~~ align="center">Traits(#)~~</th>~~

+

!align="center"|Related / unrelated

−

~~<th~~ align="center">Covariates~~</th>~~

+

!Description

−

~~<th~~ align="center">Related / unrelated~~</th>~~

+

|-

−

~~<th align="left">~~Description~~</th>~~

+

|Score test

−

~~</tr>~~

+

|align="center"|score

−

~~</thead>~~

+

|align="center"|B, Q

−

~~<tbody>~~

+

|align="center"|Y

−

~~<tr class="odd">~~

+

|align="center"|U

−

~~<td align="left">~~Score test~~</td>~~

+

|Only null model is used to performed the test

−

~~<td~~ align="center">score~~</td>~~

+

|-

−

~~<td~~ align="center">B, Q~~</td>~~

+

|Wald test

−

~~<td~~ align="center">Y~~</td>~~

+

|align="center"|wald

−

~~<td~~ align="center">U~~</td>~~

+

|align="center"|B, Q

−

~~<td align="left">~~Only null model is used to performed the test~~</td>~~

+

|align="center"|Y

−

~~</tr>~~

+

|align="center"|U

−

~~<tr class="even">~~

+

|Only fit alternative model, and effect size will be estimated

−

~~<td align="left">~~Wald test~~</td>~~

+

|-

−

~~<td~~ align="center">wald~~</td>~~

+

|Exact test

−

~~<td~~ align="center">B, Q~~</td>~~

+

|align="center"|exact

−

~~<td~~ align="center">Y~~</td>~~

+

|align="center"|B

−

~~<td~~ align="center">U~~</td>~~

+

|align="center"|N

−

~~<td align="left">~~Only fit alternative model, and effect size will be estimated~~</td>~~

+

|align="center"|U

−

~~</tr>~~

+

|Fisher's test

−

~~<tr class="odd">~~

+

|-

−

~~<td align="left">~~Exact test~~</td>~~

+

|Fam LRT

−

~~<td~~ align="center">exact~~</td>~~

+

|align="center"|famLRT

−

~~<td~~ align="center">B~~</td>~~

+

|align="center"|Q

−

~~<td~~ align="center">N~~</td>~~

+

|align="center"|Y

−

~~<td~~ align="center">U~~</td>~~

+

|align="center"|R, U

−

~~<td align="left">~~Fisher's test~~</td>~~

+

|Fast-LMM model

−

~~</tr>~~

+

|-

−

~~<tr class="even">~~

+

|Fam Score

−

~~<td align="left">~~Fam LRT~~</td>~~

+

|align="center"|famScore

−

~~<td~~ align="center">famLRT~~</td>~~

+

|align="center"|Q

−

~~<td~~ align="center">Q~~</td>~~

+

|align="center"|Y

−

~~<td~~ align="center">Y~~</td>~~

+

|align="center"|R, U

−

~~<td~~ align="center">R, U~~</td>~~

+

|Fast-LMM model style likelihood ratio test

−

~~<td align="left">~~Fast-LMM model~~</td>~~

+

|-

−

~~</tr>~~

+

|Grammar-gamma

−

~~<tr class="odd">~~

+

|align="center"|famGrammarGamma

−

~~<td align="left">~~Fam Score~~</td>~~

+

|align="center"|Q

−

~~<td~~ align="center">famScore~~</td>~~

+

|align="center"|Y

−

~~<td~~ align="center">Q~~</td>~~

+

|align="center"|R, U

−

~~<td~~ align="center">Y~~</td>~~

+

|Grammar-gamma method

−

~~<td~~ align="center">R, U~~</td>~~

+

|}

−

~~<td align="left">~~Fast-LMM model style likelihood ratio test~~</td>~~

−

~~</tr>~~

−

~~<tr class="even">~~

−

~~<td align="left">~~Grammar-gamma~~</td>~~

−

~~<td~~ align="center">famGrammarGamma~~</td>~~

−

~~<td~~ align="center">Q~~</td>~~

−

~~<td~~ align="center">Y~~</td>~~

−

~~<td~~ align="center">R, U~~</td>~~

−

~~<td align="left">~~Grammar-gamma method~~</td>~~

−

~~</tr>~~

−

~~</tbody>~~

−

~~</table>~~

−

(*) Model columns list the regconized names in rvtests. For example, use <code>--single score</code> will apply score test.

+

(#) Model columns list the regconized names in rvtests. For example, use <code>--single score</code> will apply score test.

−

# In trait column, B and Q stand for binary, quantitiave trait.

+

(##) In trait column, B and Q stand for binary, quantitiave trait.

== Burden tests ==

−

~~<table>~~

+

{|

−

~~<thead>~~

+

!Burden tests

−

~~<tr class="header">~~

+

!align="center"|Model(#)

−

~~<th align="left">~~Burden tests~~</th>~~

+

!align="center"|Traits(##)

−

~~<th~~ align="center">Model(*)~~</th>~~

+

!align="center"|Covariates

−

~~<th~~ align="center">Traits(#)~~</th>~~

+

!align="center"|Related / unrelated

−

~~<th~~ align="center">Covariates~~</th>~~

+

!Description

−

~~<th~~ align="center">Related / unrelated~~</th>~~

+

|-

−

~~<th align="left">~~Description~~</th>~~

+

|CMC

−

~~</tr>~~

+

|align="center"|cmc

−

~~</thead>~~

+

|align="center"|B, Q

−

~~<tbody>~~

+

|align="center"|N

−

~~<tr class="odd">~~

+

|align="center"|U

−

~~<td align="left">~~CMC~~</td>~~

+

|Collapsing and combine rare variants by Bingshan Li.

−

~~<td~~ align="center">cmc~~</td>~~

+

|-

−

~~<td~~ align="center">B, Q~~</td>~~

+

|Zeggini

−

~~<td~~ align="center">N~~</td>~~

+

|align="center"|zeggini

−

~~<td~~ align="center">U~~</td>~~

+

|align="center"|B, Q

−

~~<td align="left">~~Collapsing and combine rare variants by Bingshan Li.~~</td>~~

+

|align="center"|N

−

~~</tr>~~

+

|align="center"|U

−

~~<tr class="even">~~

+

|Aggregate counts of rare variants by Morris Zeggini.

−

~~<td align="left">~~Zeggini~~</td>~~

+

|-

−

~~<td~~ align="center">zeggini~~</td>~~

+

|Madsen-Browning

−

~~<td~~ align="center">B, Q~~</td>~~

+

|align="center"|mb

−

~~<td~~ align="center">N~~</td>~~

+

|align="center"|B

−

~~<td~~ align="center">U~~</td>~~

+

|align="center"|N

−

~~<td align="left">~~Aggregate counts of rare variants by Morris Zeggini.~~</td>~~

+

|align="center"|U

−

~~</tr>~~

+

|Up-weight rare variant using inverse frequency from controls by Madsen.

−

~~<tr class="odd">~~

+

|-

−

~~<td align="left">~~Madsen-Browning~~</td>~~

+

|Fp

−

~~<td~~ align="center">mb~~</td>~~

+

|align="center"|fp

−

~~<td~~ align="center">B~~</td>~~

+

|align="center"|B

−

~~<td~~ align="center">N~~</td>~~

+

|align="center"|N

−

~~<td~~ align="center">U~~</td>~~

+

|align="center"|U

−

~~<td align="left">~~Up-weight rare variant using inverse frequency from controls by Madsen.~~</td>~~

+

|Up-weight rare variant using inverse frequency from controls by Danyu Lin.

−

~~</tr>~~

+

|-

−

~~<tr class="even">~~

+

|Exact CMC

−

~~<td align="left">~~Fp~~</td>~~

+

|align="center"|exactCMC

−

~~<td~~ align="center">fp~~</td>~~

+

|align="center"|B

−

~~<td~~ align="center">B~~</td>~~

+

|align="center"|N

−

~~<td~~ align="center">N~~</td>~~

+

|align="center"|U

−

~~<td~~ align="center">U~~</td>~~

+

|Collapsing and combine rare variants, then pefore Fisher's exact test.

−

~~<td align="left">~~Up-weight rare variant using inverse frequency from controls by Danyu Lin.~~</td>~~

+

|-

−

~~</tr>~~

+

|CMC Wald

−

~~<tr class="odd">~~

+

|align="center"|cmcWald

−

~~<td align="left">~~Exact CMC~~</td>~~

+

|align="center"|B, Q

−

~~<td~~ align="center">exactCMC~~</td>~~

+

|align="center"|N

−

~~<td~~ align="center">B~~</td>~~

+

|align="center"|U

−

~~<td~~ align="center">N~~</td>~~

+

|Collapsing and combine rare variants, then pefore Wald test.

−

~~<td~~ align="center">U~~</td>~~

+

|-

−

~~<td align="left">~~Collapsing and combine rare variants, then pefore Fisher's exact test.~~</td>~~

+

|RareCover

−

~~</tr>~~

+

|align="center"|rarecover

−

~~<tr class="even">~~

+

|align="center"|B

−

~~<td align="left">RareCover</td>~~

+

|align="center"|N

−

~~<td~~ align="center"~~>rarecover</td>~~

+

|align="center"|U

−

~~<td~~ align="center">B~~</td>~~

+

|Find optimal grouping unit for rare variant tests by Thomas Hoffman.

−

~~<td~~ align="center">N~~</td>~~

+

|-

−

~~<td~~ align="center">U~~</td>~~

+

|CMAT

−

~~<td align="left">Find optimal grouping unit for~~ rare ~~variant tests by Thomas Hoffman~~.~~</td>~~

+

|align="center"|cmat

−

~~</tr>~~

+

|align="center"|B

−

~~<tr class="odd">~~

+

|align="center"|N

−

~~<td align="left">CMAT</td>~~

+

|align="center"|U

−

~~<td~~ align="center"~~>cmat</td>~~

+

|Test non-coding variants by Matt Z.

−

~~<td~~ align="center">B~~</td>~~

+

|}

−

~~<td~~ align="center">N~~</td>~~

−

~~<td~~ align="center">U~~</td>~~

−

~~<td align="left">Test non-coding variants~~ by ~~Matt Z~~.~~</td>~~

−

~~</tr>~~

−

~~<tr class="even">~~

−

~~<td align="left">CMC Wald</td>~~

−

~~<td~~ align="center"~~>cmcWald</td>~~

−

~~<td~~ align="center">B~~, Q</td>~~

−

~~<td~~ align="center">N~~</td>~~

−

~~<td~~ align="center">U~~</td>~~

−

~~<td align="left">Collapsing and combine rare~~ variants~~, then pefore Wald test~~.~~</td>~~

−

~~</tr>~~

−

~~</tbody>~~

−

~~</table>~~

−

(*) Model columns list the regconized names in rvtests. For example, use <code>--burden cmc</code> will apply CMC test.

+

(#) Model columns list the regconized names in rvtests. For example, use <code>--burden cmc</code> will apply CMC test.

−

# In trait column, B and Q stand for binary, quantitiave trait.

+

(##) In trait column, B and Q stand for binary, quantitiave trait.

== Variable threshold models ==

−

~~<table>~~

+

{|

−

~~<thead>~~

+

!Single variant

−

~~<tr class="header">~~

+

!align="center"|Model(#)

−

~~<th align="left">~~Single variant~~</th>~~

+

!align="center"|Traits(##)

−

~~<th~~ align="center">Model(*)~~</th>~~

+

!align="center"|Covariates

−

~~<th~~ align="center">Traits(#)~~</th>~~

+

!align="center"|Related / unrelated

−

~~<th~~ align="center">Covariates~~</th>~~

+

!Description

−

~~<th~~ align="center">Related / unrelated~~</th>~~

+

|-

−

~~<th align="left">~~Description~~</th>~~

+

|Variable threshold model

−

~~</tr>~~

+

|align="center"|vt

−

~~</thead>~~

+

|align="center"|B, Q

−

~~<tbody>~~

+

|align="center"|N

−

~~<tr class="odd">~~

+

|align="center"|U

−

~~<td align="left">~~Variable threshold model~~</td>~~

+

|Every rare-variant frequency cutoffs are tests by Alkes Price.

−

~~<td~~ align="center">vt~~</td>~~

+

|-

−

~~<td~~ align="center">B, Q~~</td>~~

+

|Variable threshold CMC

−

~~<td~~ align="center">N~~</td>~~

+

|align="center"|cmc

−

~~<td~~ align="center">U~~</td>~~

+

|align="center"|B, Q

−

~~<td align="left">~~Every rare-variant frequency cutoffs are tests by Alkes Price.~~</td>~~

+

|align="center"|N

−

~~</tr>~~

+

|align="center"|U

−

~~<tr class="even">~~

+

|This models is natiive so that it output CMC test statistics under all possible frequency cutoffs.

−

~~<td align="left">~~Variable threshold CMC~~</td>~~

+

|}

−

~~<td~~ align="center">cmc~~</td>~~

−

~~<td~~ align="center">B, Q~~</td>~~

−

~~<td~~ align="center">N~~</td>~~

−

~~<td~~ align="center">U~~</td>~~

−

~~<td align="left">~~This models is natiive so that it output CMC test statistics under all possible frequency cutoffs.~~</td>~~

−

~~</tr>~~

−

~~</tbody>~~

−

~~</table>~~

−

(*) Model columns list the regconized names in rvtests. For example, use <code>--vt price</code> will apply score test.

+

(#) Model columns list the regconized names in rvtests. For example, use <code>--vt price</code> will apply score test.

−

# In trait column, B and Q stand for binary, quantitiave trait.

+

(##) In trait column, B and Q stand for binary, quantitiave trait.

== Kernel models ==

−

~~<table>~~

+

{|

−

~~<thead>~~

+

!Kernel

−

~~<tr class="header">~~

+

!align="center"|Model(#)

−

~~<th align="left">~~Kernel~~</th>~~

+

!align="center"|Traits(##)

−

~~<th~~ align="center">Model(*)~~</th>~~

+

!align="center"|Covariates

−

~~<th~~ align="center">Traits(#)~~</th>~~

+

!align="center"|Related / unrelated

−

~~<th~~ align="center">Covariates~~</th>~~

+

!Description

−

~~<th~~ align="center">Related / unrelated~~</th>~~

+

|-

−

~~<th align="left">~~Description~~</th>~~

+

|SKAT

−

~~</tr>~~

+

|align="center"|skat

−

~~</thead>~~

+

|align="center"|B, Q

−

~~<tbody>~~

+

|align="center"|Y

−

~~<tr class="odd">~~

+

|align="center"|U

−

~~<td align="left">~~SKAT~~</td>~~

+

|Sequencing kernel association test by Shawn Lee.

−

~~<td~~ align="center">skat~~</td>~~

+

|-

−

~~<td~~ align="center">B, Q~~</td>~~

+

|KBAC

−

~~<td~~ align="center">Y~~</td>~~

+

|align="center"|kbac

−

~~<td~~ align="center">U~~</td>~~

+

|align="center"|B

−

~~<td align="left">~~Sequencing kernel association test by Shawn Lee.~~</td>~~

+

|align="center"|N

−

~~</tr>~~

+

|align="center"|U

−

~~<tr class="even">~~

+

|Kernel-based adaptive clustering model by Dajiang Liu.

−

~~<td align="left">~~KBAC~~</td>~~

+

|}

−

~~<td~~ align="center">kbac~~</td>~~

+

−

~~<td~~ align="center">B~~</td>~~

+

(#) Model columns list the regconized names in rvtests. For example, use <code>--kernel skat</code> will apply SKAT test.<br />To further customize SKAT test, you can use ''--kernel skat[nPerm=100:alpha=0.001:beta1=1:beta2=20]'' to specify permutation counts, type-1 error,<br />beta distribution parameters for upweighting rare variants. Rvtests will output a message showing:

−

~~<td~~ align="center">N~~</td>~~

+

−

~~<td~~ align="center">U~~</td>~~

+

<pre>[INFO] SKAT test significance will be evaluated using 10000 permutations at alpha = 0.001 (beta1 = 1.00, beta2 = 20.00)</pre>

−

~~<td align="left">~~Kernel-based adaptive clustering model by Dajiang Liu.</td>

+

(##) In trait column, B and Q stand for binary, quantitiave trait.

−

</tr>

−

</~~tbody~~>

−

</~~table~~>

−

~~(*) Model columns list the regconized names in rvtests. For example, use <code>~~-~~-kernel skat</code> will apply SKAT test.~~

+

== Meta-analysis models ==

−

# In trait column, B and Q stand for ~~binary~~, ~~quantitiave~~ trait.

+

{|

+

!Type

+

!align="center"|Model(#)

+

!align="center"|Traits(##)

+

!align="center"|Covariates

+

!align="center"|Related / unrelated

+

!Description

+

|-

+

|Score test

+

|align="center"|score

+

|align="center"|Q

+

|align="center"|Y

+

|align="center"|R, U

+

|standard score tests

+

|-

+

|Dominant model

+

|align="center"|dominant

+

|align="center"|Q

+

|align="center"|Y

+

|align="center"|R, U

+

|score tests and covariance matrix under dominant disease model

+

|-

+

|Recessive model

+

|align="center"|recessive

+

|align="center"|Q

+

|align="center"|Y

+

|align="center"|R, U

+

|score tests and covariance matrix under recessive disease model

+

|-

+

|Covariance

+

|align="center"|cov

+

|align="center"|Q

+

|align="center"|Y

+

|align="center"|R, U

+

|covariance matrix

+

|}

+

(#) Model columns list the regconized names in rvtests. For example, use <code>--meta score,cov</code> will generate score statistics and covariance matrix for meta-analysis.<br />(##) In trait column, B and Q stand for (b)inary, (q)uantitiave trait.

== Utility models ==

Line 380: Line 410:

--peopleExcludeID : give IDs of people that will be included in study

--peopleExcludeFile : from given file, set IDs of people that will be included in study</pre>

−

<code>--peopleIncludeID</code> and <code>--peopleExcludeID</code> are used to include/exclude samples from command line. For example, specify <code>--peopleIncludeID A,B,C</code> will include A, B and C sample from the VCF files if they exists. <code>--peopleIncludeID</code> and <code>--peopleExcludeID</code> followed by a file name will include or exclude the IDs in the file. So to include sample A, B and C, you can provide a file, <code>people.txt</code>, looks like:

+

<code>--peopleIncludeID</code> and <code>--peopleExcludeID</code> are used to include/exclude samples from command line.<br />For example, specify <code>--peopleIncludeID A,B,C</code> will include A, B and C sample from the VCF files if they exists.<br /><code>--peopleIncludeID</code> and <code>--peopleExcludeID</code> followed by a file name will include or exclude the IDs in the file.<br />So to include sample A, B and C, you can provide a file, <code>people.txt</code>, looks like:

<pre>A

Line 389: Line 419:

== Variant site filters ==

−

It is common that different frequency cutoffs are applied in rare-variant analysis. Therefore, rvtests specify frequency cutoffs.

+

It is common that different frequency cutoffs are applied in rare-variant analysis.<br />Therefore, rvtests specify frequency cutoffs.

Frequency Cutoff

−

<pre> --freqUpper : Specify upper frequency bound to be included in analysis

+

<pre> --freqUpper : Specify upper minor allele frequency bound to be included in analysis

−

--freqLower : Specify lower frequency bound to be included in analysis</pre>

+

--freqLower : Specify lower minor allele frequency bound to be included in analysis</pre>

−

Similar to sample inclusion/exclusion options, you can specify a range of variants to be included by specifying <code>--rangeList</code> option. For example <code>--rangeList 1:100-200</code> will include the chromosome 1 position 100bp to 200bp region. Alternatively, use a separate file, <code>range.txt</code>, and <code>--rangeFile range.txt</code> to speicify association tests range.

+

If you specify <code>--freqLower 0.01 --freqUpper 0.05</code>, only the variants with minor allele ferquncy between 0.01 and 0.05 (boundary inclusive) will be analyzed.

+

Similar to sample inclusion/exclusion options, you can specify a range of variants to be included by<br />specifying <code>--rangeList</code> option. For example <code>--rangeList 1:100-200</code> will include the chromosome 1 position 100bp to 200bp region.<br />Alternatively, use a separate file, <code>range.txt</code>, and <code>--rangeFile range.txt</code> to speicify association tests range.

<pre> --rangeList : Specify some ranges to use, please use chr:begin-end format.

Line 406: Line 438:

--siteMACMin : Specify minimum Minor Allele Count(inclusive) to be incluced in analysis

--annoType : Specify annotation type that is follwed by ANNO= in the VCF INFO field, regular expression is allowed</pre>

−

''NOTE'': <code>--annoType Nonsynonymous</code> will only analyze nonsynonymous variants where they have <code>ANNO=Nonsynonymous</code> in the INFO field. VCF with annotatino information are called annotated VCF here. And to annotate a VCF file, you can use [https://github.com/zhanxw/anno ANNO], a fast and accurate annotation software.

+

''NOTE'': <code>--annoType Nonsynonymous</code> will only analyze nonsynonymous variants where they have <code>ANNO=Nonsynonymous</code> in the INFO field.<br />VCF with annotatino information are called annotated VCF here. And to annotate<br />a VCF file, you can use [https://github.com/zhanxw/anno ANNO], a fast and accurate annotation software.

== Genotype filters ==

Line 415: Line 447:

--indvDepthMax : Specify maximum depth(inclusive) of a sample to be incluced in analysis

--indvQualMin : Specify minimum depth(inclusive) of a sample to be incluced in analysis</pre>

−

When genotypes are filtered, they are marked as missing genotypes. Consequently, samples with missing genotype may or may not be included in the analysis. That means samples with genotypes may be dropped (<code>--impute drop</code>) or may still be included (<code>--impute mean</code> or <code>--impute hwe</code>). By default, genotypes are imputed to its means. See next section about how you like to handle missing genotypes.

+

When genotypes are filtered, they are marked as missing genotypes.<br />Consequently, samples with missing genotype may or may not be included in the analysis.<br />That means samples with genotypes may be dropped (<code>--impute drop</code>)<br />or may still be included (<code>--impute mean</code> or <code>--impute hwe</code>).<br />By default, genotypes are imputed to its means.<br />See next section about how you like to handle missing genotypes.

== Handle missing genotypes and phenotypes ==

−

When genotypes are missing (e.g. genotype = "./.") or gentoypes are filtered out, there are three options to handle them: (1) impute to its mean(default option); (2) impute by HWE equilibrium; (3) remove from the model. Use <code>--impute [mean|hwe|drop]</code> to specify which option to use.

+

When genotypes are missing (e.g. genotype = "./.") or gentoypes are filtered out,<br />there are three options to handle them: (1) impute to its mean(default option); (2) impute by HWE equilibrium; (3) remove from the model.<br />Use <code>--impute [mean|hwe|drop]</code> to specify which option to use.

−

When quantitative phenotypes are missing, for example, some samples have gneotype files, but not phenotypes, rvtests can impute missing phenotype to its mean.

+

When quantitative phenotypes are missing, for example, some samples have gneotype files, but not phenotypes,<br />rvtests can impute missing phenotype to its mean.

''NOTE:'' Do not use <code>--imputePheno</code> for binary trait.

Line 434: Line 466:

== Specify groups (e.g burden unit) ==

−

Rare variants association tests are usually performed in gruops of variants. The natural grouping unit is gene. Rvtests can read gene definition file in <code>refFlat</code> format, and perform association for each gene. Use <code>--geneFile</code> option to specify the gene file name. For example, <code>--geneFile refFlat_hg19.txt.gz</code> will use <code>refFlat_hg19.txt.gz</code> as gene definition file, and then perform association tests for every gene. Use <code>--gene</code> to specify a subset of genes to test. For example, <code>--gene CFH</code> will only test CFH gene.

+

Rare variants association tests are usually performed in gruops of variants.<br />The natural grouping unit is gene. Rvtests can read gene definition file in <code>refFlat</code> format,<br />and perform association for each gene. Use <code>--geneFile</code> option to specify the gene file name.<br />For example, <code>--geneFile refFlat_hg19.txt.gz</code> will use <code>refFlat_hg19.txt.gz</code> as gene definition file,<br />and then perform association tests for every gene. Use <code>--gene</code> to specify a subset of genes to test.<br />For example, <code>--gene CFH</code> will only test CFH gene.

−

Alternative grouping unit can be specified as ''set''. These ''sets'' are treated similar to gene. You can thus use <code>--setFile</code> to define sets (similar to <code>--geneFile</code> option), and use <code>--set</code> to define a specific set (similar to <code>--gene</code> option). Additionally, use <code>--setList</code> can speicify a set to test from command line.

+

Alternative grouping unit can be specified as ''set''.<br />These ''sets'' are treated similar to gene.<br />You can thus use <code>--setFile</code> to define sets (similar to <code>--geneFile</code> option),<br />and use <code>--set</code> to define a specific set (similar to <code>--gene</code> option).<br />Additionally, use <code>--setList</code> can speicify a set to test from command line.

−

The format of a set file is: (1) set names; (2) ranges (e.g. chrom:begin-end); For example, you have a set file, <code>example.set</code>, like this:

+

The format of a set file is: (1) set names; (2) ranges (e.g. chrom:begin-end);<br />For example, you have a set file, <code>example.set</code>, like this:

<pre>set1 1:100-200,1:250-300

set2 2:500-600</pre>

−

You can specify <code>--setFile example.set --set set2</code> to group variants within chromosome 2, position 500 to 600bp. If you want to test a particular region, for example, chromosome 2, position 500 to 550bp, but do not want to make another file, you can use <code>--setList 2:500-600</code>.

+

You can specify <code>--setFile example.set --set set2</code> to group variants<br />within chromosome 2, position 500 to 600bp.<br />If you want to test a particular region, for example, chromosome 2, position 500 to 550bp,<br />but do not want to make another file, you can use <code>--setList 2:500-600</code>.

In summary, options related to ''Grouping Unit'' are listed below:

Line 452: Line 484:

setName chr:beg-end)

--set : specify which set to test (1st column)</pre>

−

= ~~Contact~~ =

+

= Sex chromosome analysis =

+

Rvtests suppport X chromosome analysis. In human X chromosome, there is PAR (pseudoautosomal region) and non-PAR region.<br />For males, there are two X allele in PAR region and one allele in non-PAR region.<br />While the PAR region is treated in the same way as autosomes, rvtests treate non-PAR region differently.<br />Below we will describe the details about how rvtests handles non-PAR region.

+

''Prepare data''. According to VCF standard, male genotype needs to coded as 0 or 1. For compatibility, rvtests also support 0/0 or 1/1 coding.<br />In VCF files, male genotypes can be written as "0", "1", "0|0", "0/0", "1|1", "1/1". All other genotypes will be treated as missing.

+

''Genotype in the regression model''. For consistencmaine, male genotypes are converted to 0 or 2.

+

''MetaScore results''. If specify <code>--meta score</code>, the output file <code>prefix.MetaScore.assoc</code> includes PAR-region and non-PAR region analysis.<br />But in the non-PAR region, the difference is that Hardy-Weinberg P-value are calculated using female samples.

+

''Related individuals''. Just append <code>--xHemi</code> to the <code>vcf2kinship</code> (more details in [[#kinship-generation|Kinship generation]]) and <code>rvtest</code> command lines. Rvtests can recognize non-PAR region kinship and use it in the analysis.

+

''PAR region''. PAR region is defined as two regions X:60001-2699520 and X:154931044-155270560. Use <code>--xLabel</code> can speicify which chromosome has PAR region (default: 23|X)<br />and use <code>--xParRegion</code> to specify PAR region (default: hg19, meaning '60001-2699520,154931044-155260560' in the UCSC build hg19, specify "hg18" will use PAR region definition in the UCSC build hg18).

+

= Kinship generation =

+

Analysis of related individual usually requires estimation of kinship. You can a separate tool, <code>vcf2kinship</code>.<br /><code>vcf2kinship</code> is usually included in rvtests binary distribution or can be built from software source codes.

+

<code>vcf2kinship</code> can calcualte pedigree kinship using a pedigree input file (PED format, see [[#phenotype-file|Phenotype file]], use option <code>--ped</code>).<br />The output file name is specified by <code>--prefix</code> option. If you use <code>--prefix output</code> then the output files will include <code>output.kinship</code>.

+

It can also calculate empirical kinship using genotyp input file (VCF format, see [[#genotype-file-vcf|Genotype file (VCF)]], use option <code>--inVcf</code>).<br />For empiricial kinship, you also need to specify the kinship model, either Balding-Nicols model (ue option <code>--bn</code>) or Identity-by-state model (use option <code>--ibs</code>).

+

In sex chromosome analysis, it is often required to generate kinship on X chromsoome regions, then you need to speicfy <code>--xHemi</code>. If your input VCF file has different X chromosome label (e.g. chromosome name is '23' instead of 'X'), you can use <code>--xLabel 23</code>.

+

If principal component decomposition (PCA) results are needed, you can use option <code>--pca</code>. Then output files with suffix '.pca' include PCA results.

+

When dealing with large input files, it is often preferred to use multiple CPU to speed up calculation using the option <code>--thread N</code> in which N is the number of CPUs.

+

For example, to generate pedigree-based kinship (<code>--ped</code>) on both autosomal region and X chromosome (<code>--xHemi</code>) region, the command line is:

+

<pre>vcf2kinship --ped input.ped --xHemi --out output</pre>

+

To generate empirical kinship (<code>--inVcf</code>) on both autosomal region and X chromosome (<code>--xHemi</code>) region using Balding-Nicols model, the command line is:

+

<pre>vcf2kinship --inVcf input.vcf.gz --ped input.ped --bn --xHemi --out output</pre>

+

NOTE: you need to provide a pedigree file (PED) in the above case, as <code>vcf2kinship</code> need the sex information of samples.

+

= Frequently Asked Questions (FAQ) =

+

* Does rvtests suppport binary traits of related-individuals?

+

Not yet. It's a complex scenario and we have not found good solutions.

+

* Can you provide a list of command line options?

−

Questions and requests can be sent to Xiaowei Zhan ([mailto:zhanxw@umich.edu zhanxw@umich.edu]) or Goncalo Abecasis ([mailto:goncalo@umich.edu goncalo@umich.edu])

+

Rvtests have build help taht can be found by executing <code>rvtest --help</code>.<br />We also put all available options in this [https://github.com/zhanxw/rvtests/wiki/Command-Line-Options link].

+

* Can you provide standard error (SE) or confidence interval (CI) for the estimated Beta in the score model?

+

In the output of MetaScore model (--meta score), the standard error is the inverse of SQRT_V_STAT.<br />For example, if SQRT_V_STAT = 2, that means the standard error of estimated beta is 1/2 = 0.5.

+

* Why the INFORMATIVE_ALT_AC, N_REF and N_ALT columns have zero counts for certain chromosome X regions in meta-analysis models?

+

These counts are calculated from female individuals. If your study only has male samples, rvtests cannot report these counts. Because if a male carries a non-reference allele, we cannot conclude that this is heterozygous (0/1) site or homozygous alternatives (1/1) site.

+

= Feedback/Contact =

+

Questions and requests can be sent to Xiaowei Zhan<br />([mailto:zhanxw@umich.edu [mailto:zhanxw@umich.edu zhanxw@umich.edu]])<br />or Goncalo Abecasis<br />([mailto:goncalo@umich.edu [mailto:goncalo@umich.edu goncalo@umich.edu]])

Rvtests is a collaborative effort by Youna Hu, Bingshan Li, Dajiang Liu.

+

[https://bitdeli.com/free [[Image:https://d2weczhvl823v0.cloudfront.net/zhanxw/rvtests/trend.png|Bitdeli Badge]]]

Zhanxw

255

edits

Changes

Rvtests (view source)

Revision as of 15:43, 12 January 2015

Navigation menu

Page actions

Page actions

Personal tools

quick links

teaching

Navigation

Search

Tools