Difference between revisions of "GTDT"

Revision as of 11:57, 4 September 2014

Introduction

gTDT implements a gene-based (group-wise) TDT for rare variants. It currently provides 6 tests for 6 models, M1-M6. As input it takes a ped file and a dat file that specify the family relationships, and a VCF file that stores the genotype data.

The 6 models are as follows:

M1:    additive model with equal weights (sum statistics)
M2:    additive model with unequal weights (weighted sum statistics)
M3:    dominant model (i.e. carriers vs. non-carrier, or indicator model, or collapsing model)
M4:    compound heterozygous model (CH), also 2-hits or multi-hits models, or recessive model.
M5:    CH model with weighted sum on the rare variants that form compound heterozygotes, i.e. a hybrid of M4 and M2
M6:    2-allele additive model where haplotypes are classified into two alleles, one with rare variants and one with all common alleles
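To make the model definitions above concrete, the following is a minimal sketch of how one individual's rare-variant score for a gene could be computed under M1-M4 and M6. It is not taken from the gTDT source: the function name `model_scores`, the 0/1 haplotype encoding, and the user-supplied `weights` are assumptions made for illustration (the weighting scheme used by M2 is not specified here), and M5 is omitted because its exact definition is not given above.

```python
def model_scores(pat_hap, mat_hap, weights=None):
    """Score one individual for one gene under models M1-M4 and M6.

    pat_hap, mat_hap: lists of 0/1 flags, one per rare variant in the gene,
    marking whether the paternal / maternal haplotype carries the rare allele.
    weights: per-variant weights for M2 (hypothetical; defaults to 1.0 each).
    """
    if len(pat_hap) != len(mat_hap):
        raise ValueError("haplotypes must cover the same variants")
    if weights is None:
        weights = [1.0] * len(pat_hap)

    # Rare-allele count (0, 1 or 2) at each variant.
    counts = [p + m for p, m in zip(pat_hap, mat_hap)]
    pat_hit = int(any(pat_hap))  # paternal haplotype carries a rare variant
    mat_hit = int(any(mat_hap))  # maternal haplotype carries a rare variant

    return {
        "M1": sum(counts),                                   # equal-weight sum
        "M2": sum(w * c for w, c in zip(weights, counts)),   # weighted sum
        "M3": int(pat_hit or mat_hit),                       # carrier indicator
        "M4": int(pat_hit and mat_hit),                      # both haplotypes hit
        "M6": pat_hit + mat_hit,                             # 0, 1 or 2 "rare" haplotypes
    }
```

For example, an individual with two rare variants on the same haplotype scores 2 under M1 but 0 under M4 and 1 under M6, which is why M4 and M6 depend on phase.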

Usage

Running gtdt without any arguments displays the following message:

The following parameters are available.  Ones with "[]" are in effect:
                      pedfile :                 (-pname)
                      datfile :                 (-dname)
                    groupfile :                 (-gname)
                      vcffile :                 (-vname)
                      outfile :                 (-oname)

Additional Options
           Groupwise test : --sum [OFF] , --wss [OFF] , --cmc [OFF] ,
                            --comphet [OFF] , --comphet-weight [OFF] ,
                            --all [ON]
          Variant filters : --min_qual [0.00], --min_maf [0.00],
                            --max_maf [0.01], --min_avg_dp [0.00],
                            --max_avg_dp [0.00], --max_missing_rate [1.00],
                            --pass [OFF] 
         Genotype filters : --min_dp [0], --min_gq [0.00],
                            --missing_as_ref [OFF] 
         Phased genotypes : --phased [OFF] 
  Mendelian inconsistency : --ignore_mi_trio [OFF] 
       Disease to analyze : --disease []
            Non-autosomes : --chrX [X], --chrY [Y], --MT [MT]
       Empirical p-values : --permute [0], --seed [13579]
          Multi-threading : --nthreads [1]

Example 1: using default parameters

gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt

Example 2: analyze disease_A as specified in the .dat file. The .ped and .dat files can contain multiple phenotypes

gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --disease disease_A

Example 3: using a MAF cutoff of 0.05

gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05

Example 4: using a MAF cutoff of 0.05 and the phased genotypes provided in the VCF

gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05 --phased

Example 5: using a minimum depth of 10; a variant is filtered out in a trio if any individual in the trio has depth lower than 10

gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --min_dp 10
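When the same analysis is repeated with different settings, it can be convenient to assemble the command line programmatically. The sketch below uses only the options that appear in the examples above; the helper name `build_gtdt_command` and its keyword arguments are hypothetical, and it assumes a `gtdt` executable is on the PATH.

```python
def build_gtdt_command(ped, dat, grp, vcf, out,
                       disease=None, max_maf=None, min_dp=None, phased=False):
    """Return the gtdt command line as an argument list.

    Only options shown in the usage examples are supported here.
    """
    cmd = ["gtdt", "-p", ped, "-d", dat, "-g", grp, "-v", vcf, "-o", out]
    if disease is not None:
        cmd += ["--disease", disease]
    if max_maf is not None:
        cmd += ["--max_maf", str(max_maf)]
    if min_dp is not None:
        cmd += ["--min_dp", str(min_dp)]
    if phased:
        cmd.append("--phased")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with file names.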

Input

A ped file with 6 columns (see the Merlin documentation). An example ped file is as follows:

fam1 p1  0  0   1 0
fam1 p2  0  0   2 0
fam1 p3  p1 p2  1 2
fam2 p4  0  0   1 0
fam2 p5  0  0   2 0
fam2 p6  p4 p5  1 2

A dat file describes column 6 and beyond of the ped file. Here, column 6 is the affection status for disease_A:

A disease_A
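The relationship between the two files can be sketched as follows: the dat file names the phenotype columns, and the ped file supplies the trio structure. This is an illustrative reader, not gTDT's own parser. The function names are hypothetical, and it assumes that each dat entry corresponds to exactly one ped column and that affection status follows the Merlin coding (2 = affected, 1 = unaffected, 0 = missing).

```python
def read_dat(dat_text):
    """Return the (type, name) pairs listed in a dat file, in order."""
    return [tuple(line.split()[:2])
            for line in dat_text.splitlines() if line.strip()]


def affected_trios(ped_text, dat_text, disease):
    """List (family, child, father, mother) for affected children
    whose parents are both present in the ped file."""
    names = [name for _, name in read_dat(dat_text)]
    col = 5 + names.index(disease)  # phenotypes start at column 6 (index 5)

    rows = [line.split() for line in ped_text.splitlines() if line.strip()]
    known = {(r[0], r[1]) for r in rows}  # (family, individual) pairs

    trios = []
    for r in rows:
        fam, ind, father, mother = r[:4]
        has_parents = (fam, father) in known and (fam, mother) in known
        if has_parents and r[col] == "2":  # assumed coding: 2 = affected
            trios.append((fam, ind, father, mother))
    return trios
```

Applied to the example ped and dat files above with disease "disease_A", this returns the two trios headed by p3 and p6.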

A VCF file (see the VCF specification). It can contain variant information for more individuals than are listed in the ped file.

The genotypes in the VCF file can be either phased or unphased. If they are unphased, gTDT will phase the genotypes by transmission.
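The general idea of phasing by transmission can be illustrated at a single site: the child's two alleles are assigned to the father and the mother whenever only one assignment is compatible with the parental genotypes. The sketch below shows that idea only; it is not gTDT's actual algorithm, and the function name and the tuple encoding of genotypes are assumptions.

```python
def phase_by_transmission(child, father, mother):
    """Phase a child's genotype at one site using the parents' genotypes.

    Each genotype is a pair of alleles, e.g. (0, 1).
    Returns (paternal_allele, maternal_allele), or None when both
    assignments are possible (for example, all three are heterozygous).
    Raises ValueError on a Mendelian inconsistency.
    """
    a, b = child
    options = set()
    # Try both ways of assigning the child's alleles to the parents.
    for pat, mat in ((a, b), (b, a)):
        if pat in father and mat in mother:
            options.add((pat, mat))

    if not options:
        raise ValueError("Mendelian inconsistency")
    if len(options) > 1:
        return None  # phase cannot be resolved from this trio alone
    return next(iter(options))
```

For instance, a heterozygous child of a homozygous-reference father and a heterozygous mother must have received the alternate allele from the mother, so the result is (0, 1).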

Output

Group   nVar    maf_sum carrier-ca:co:unphe     allele-ca:co:unphe      fa-t:nt mo-t:nt p_s     z_s     p_ws    z_ws    p_c     z_c     p_ch    z_ch    p_ch_w  z_ch_w  p_2a_add        z_2a_add
OR4F5   0       0       0:0:0   0:0:0   0:0     0:0     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan
SAMD11  8       0.0317978       8:0:14  10:0:15 4:3     6:2     0.225253        1.21268 0.352245        0.930244        0.376344        0.884652        0.414216        0.816497        0.220113        1.22623 0.376344        0.884652
NOC2L   3       0.0105932       2:0:5   2:0:5   1:2     0:2     0.179712        -1.34164        0.148039        -1.44649        0.179712        -1.34164        nan     nan     nan     nan     0.179712        -1.34164
KLHL17  5       0.0105932       2:0:5   2:0:5   1:2     1:1     0.654721        -0.447214       0.654721        -0.447214       0.654721        -0.447214       nan     nan     nan     nan     0.654721        -0.447214
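For downstream analysis, the output shown above can be loaded with a few lines of Python. This is a sketch that assumes the file is whitespace-delimited with a single header line, as in the sample; the function names are hypothetical. Colon-separated count fields become tuples of integers, and "nan" entries become floating-point NaN.

```python
def parse_counts(field):
    """Turn a colon-separated count field such as '2:0:5' into (2, 0, 5)."""
    return tuple(int(x) for x in field.split(":"))


def parse_gtdt_output(text):
    """Parse gTDT output text into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    records = []
    for ln in lines[1:]:
        rec = {}
        for key, value in zip(header, ln.split()):
            if key == "Group":
                rec[key] = value
            elif key == "nVar":
                rec[key] = int(value)
            elif ":" in key:            # e.g. fa-t:nt, carrier-ca:co:unphe
                rec[key] = parse_counts(value)
            else:                       # maf_sum, p-values and z-scores
                rec[key] = float(value)  # float('nan') handles 'nan'
        records.append(rec)
    return records
```

Because tests that could not be computed are reported as nan, filter with `math.isnan` before ranking genes by p-value.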

By default there are 19 columns in the output. The meaning of each column is as follows:

Group: group/gene name
nVar: # of variants included in the analysis
maf_sum: sum of MAF across all variants included in the analysis
carrier-ca:co:unphe: #rare variant carriers in the case:control:unphenotyped groups
allele-ca:co:unphe: #rare alleles in the case:control:unphenotyped groups
fa-t:nt: father #transmitted:#non-transmitted alleles
mo-t:nt: mother #transmitted:#non-transmitted alleles
p_s: p-value of equal weight sum statistics
z_s: z-score of equal weight sum statistics
p_ws: p-value of weighted sum statistics
z_ws: z-score of weighted sum statistics
p_c: p-value of collapsing test (i.e. carrier vs. non-carrier, considered as the dominant model)
z_c: z-score of the collapsing test
p_ch: p-value of compound heterozygous model (CH)
z_ch: z-score of CH model
p_ch_w: p-value of CH model with weights
z_ch_w: z-score of CH model with weights
p_2a_add: p-value of the 2-allele additive model where haplotypes are classified into two alleles, one with rare variants and one with all common alleles.
z_2a_add: z-score of the 2-allele additive model
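As a rough aid to reading the z-score and p-value columns, the classical TDT statistic can be computed from the pooled transmitted and non-transmitted counts (fa-t:nt plus mo-t:nt) with a two-sided normal p-value. This is a sketch of the textbook statistic, not gTDT's implementation. It reproduces z_s and p_s for the NOC2L and KLHL17 rows of the sample output, but not for SAMD11 (about 1.29 from pooled counts versus the reported 1.21268), so the statistics gTDT reports evidently differ from it in general.

```python
import math


def tdt_z_and_p(transmitted, non_transmitted):
    """Classical TDT z-score and two-sided p-value.

    z = (T - NT) / sqrt(T + NT); returns (nan, nan) when there are
    no informative transmissions.
    """
    n = transmitted + non_transmitted
    if n == 0:
        return float("nan"), float("nan")
    z = (transmitted - non_transmitted) / math.sqrt(n)
    # Two-sided tail probability of a standard normal at |z|.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

For NOC2L the pooled counts are 1 transmitted and 4 non-transmitted alleles, giving z of about -1.3416 and p of about 0.1797, in agreement with the z_s and p_s columns.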
