GTDT

From Genome Analysis Wiki

Introduction

  • gTDT implements a gene-based (group-wise) transmission disequilibrium test (TDT) for rare-variant aggregation analysis. It currently provides haplotype-based tests for 6 models, M1-M6. As input it takes a ped file and a dat file that specify the family relationships and phenotypes, a VCF file that stores the genotype data, and a group file that assigns variants to groups.
  • The 6 models, in order M1 to M6, are as follows:
AD (M1):    additive model with equal weights (sum statistic)
wAD (M2):    additive model with unequal weights (weighted sum statistic)
DOM (M3):    dominant model (i.e. carriers vs. non-carriers; also called the indicator or collapsing model)
CH (M4):    compound heterozygous model, also called the 2-hit or multi-hit model, or recessive model
wCH (M5):    CH model with a weighted sum over the rare variants that form compound heterozygotes, i.e. a hybrid of M4 and M2
2AD (M6):    2-allele additive model in which each haplotype is classified as one of two alleles: one carrying rare variants and one carrying only common alleles
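As a rough illustration of how the six codings differ, the sketch below scores one individual's pair of haplotypes under each description. It is not gTDT's implementation: the function name `score_haplotypes`, the 0/1 haplotype encoding, and the exact definitions (for example, treating CH as "both haplotypes carry at least one rare allele") are assumptions read off the one-line descriptions above, and the actual test statistics are built from transmissions in families rather than from a single individual.

```python
def score_haplotypes(hap1, hap2, weights):
    """Score two haplotypes under six illustrative burden codings.

    hap1, hap2: lists of 0/1 flags, 1 = rare allele at that variant (assumed encoding).
    weights:    per-variant weights, same length as the haplotypes.
    """
    n1, n2 = sum(hap1), sum(hap2)                      # rare-allele counts per haplotype
    w1 = sum(w * a for w, a in zip(weights, hap1))     # weighted counts per haplotype
    w2 = sum(w * a for w, a in zip(weights, hap2))
    comp_het = n1 > 0 and n2 > 0                       # a rare allele on each haplotype
    return {
        "AD": n1 + n2,                                 # equal-weight sum
        "wAD": w1 + w2,                                # weighted sum
        "DOM": int(n1 + n2 > 0),                       # carrier indicator
        "CH": int(comp_het),                           # compound-heterozygote indicator
        "wCH": (w1 + w2) if comp_het else 0.0,         # weighted sum, only if compound het
        "2AD": int(n1 > 0) + int(n2 > 0),              # number of rare-carrying haplotypes
    }


# Hypothetical individual: two rare alleles on one haplotype, none on the other.
same_hap = score_haplotypes([1, 0, 1], [0, 0, 0], [2.0, 1.0, 0.5])
# Hypothetical individual: one rare allele on each haplotype.
both_haps = score_haplotypes([1, 0, 0], [0, 1, 0], [2.0, 1.0, 0.5])
print(same_hap)
print(both_haps)
```

Both individuals carry two rare alleles, so AD cannot tell them apart, while CH and 2AD separate them because only the second individual has a rare allele on each haplotype.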

Usage

Running gtdt without any arguments displays the following message:

The following parameters are available.  Ones with "[]" are in effect:
                      pedfile :                 (-pname)
                      datfile :                 (-dname)
                    groupfile :                 (-gname)
                      vcffile :                 (-vname)
                      outfile :                 (-oname)
Additional Options
           Groupwise test : --sum [OFF] , --wss [OFF] , --cmc [OFF] ,
                            --comphet [OFF] , --comphet-weight [OFF] ,
                            --all [ON]
          Variant filters : --min_qual [0.00], --min_maf [0.00],
                            --max_maf [0.01], --min_avg_dp [0.00],
                            --max_avg_dp [0.00], --max_missing_rate [1.00],
                            --pass [OFF] 
         Genotype filters : --min_dp [0], --min_gq [0.00],
                            --missing_as_ref [OFF] 
         Phased genotypes : --phased [OFF] 
  Mendelian inconsistency : --ignore_mi_trio [OFF] 
       Disease to analyze : --disease []
            Non-autosomes : --chrX [X], --chrY [Y], --MT [MT]
       Empirical p-values : --permute [0], --seed [13579]
          Multi-threading : --nthreads [1]


  • Example 1: using default parameters
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt
  • Example 2: analyzing the disease disease_A specified in the .dat file. The .ped and .dat files can contain multiple phenotypes
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --disease disease_A
  • Example 3: using a MAF cutoff of 0.05
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05
  • Example 4: using a MAF cutoff of 0.05 and the phased genotypes provided in the VCF
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05 --phased
  • Example 5: using a minimum depth of 10, so that a variant is filtered out in a trio if any individual in the trio has depth lower than 10
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --min_dp 10
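When gtdt is run from a script, the command line can be assembled programmatically. The following is a minimal sketch that uses only the options shown in the examples above; the helper name `build_gtdt_command` and its keyword arguments are hypothetical and are not part of gTDT.

```python
import shlex


def build_gtdt_command(ped, dat, grp, vcf, out, disease=None,
                       max_maf=None, min_dp=None, phased=False):
    """Assemble a gtdt argument list from the options used in the examples."""
    cmd = ["gtdt", "-p", ped, "-d", dat, "-g", grp, "-v", vcf, "-o", out]
    if disease is not None:
        cmd += ["--disease", disease]
    if max_maf is not None:
        cmd += ["--max_maf", str(max_maf)]
    if min_dp is not None:
        cmd += ["--min_dp", str(min_dp)]
    if phased:
        cmd.append("--phased")
    return cmd


cmd = build_gtdt_command("fam.ped", "fam.dat", "input.grp", "input.vcf",
                         "out.txt", max_maf=0.05, phased=True)
print(shlex.join(cmd))  # shell-quoted form of the argument list
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, assuming the gtdt binary is on the PATH.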

Input

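  • A ped file describing the family structure. The first five columns are family ID, individual ID, father ID, mother ID and sex; columns 6 and beyond hold phenotypes. For example:
fam1 p1  0  0   1 0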
fam1 p2  0  0   2 0
fam1 p3  p1 p2  1 2
fam2 p4  0  0   1 0
fam2 p5  0  0   2 0
fam2 p6  p4 p5  1 2
  • A dat file describing columns 6 and beyond of the ped file. Here column 6 is the affection status for disease_A:
A disease_A
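Before running an analysis it can help to check which trios the ped and dat files actually define. The sketch below reads the two example files and lists trios with an affected child. The function names are hypothetical, and the coding of affection status (2 = affected, 0 = unknown) and of founders (parent ID 0) is assumed from the usual linkage-style ped conventions rather than stated on this page.

```python
def parse_dat(text):
    """Return the trait names from dat-file lines of the form 'TYPE NAME'."""
    return [f[1] for f in (ln.split() for ln in text.splitlines()) if len(f) >= 2]


def parse_ped(text, trait_names):
    """Map (family, individual) to parents, sex and named phenotype values."""
    people = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        fam, ind, father, mother, sex = fields[:5]
        people[(fam, ind)] = {
            "father": father,
            "mother": mother,
            "sex": sex,
            "traits": dict(zip(trait_names, fields[5:])),
        }
    return people


def affected_trios(people, trait, affected_code="2"):
    """List (family, child, father, mother) for affected children whose parents are both in the file."""
    trios = []
    for (fam, ind), rec in people.items():
        has_parents = (fam, rec["father"]) in people and (fam, rec["mother"]) in people
        if has_parents and rec["traits"].get(trait) == affected_code:
            trios.append((fam, ind, rec["father"], rec["mother"]))
    return trios


ped_text = """fam1 p1  0  0   1 0
fam1 p2  0  0   2 0
fam1 p3  p1 p2  1 2
fam2 p4  0  0   1 0
fam2 p5  0  0   2 0
fam2 p6  p4 p5  1 2
"""
dat_text = "A disease_A\n"

traits = parse_dat(dat_text)
people = parse_ped(ped_text, traits)
trios = affected_trios(people, "disease_A")
print(trios)
```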
  • A VCF file [VCF specs]. It can contain variant information for more individuals than are listed in the ped file. The genotypes in the VCF file can be either phased or unphased. If they are unphased, gTDT will phase the genotypes by transmission.
  • A group file indicating which variants should be analyzed as a unit. It has 3 required columns, plus optional columns for weights and other information. For example:
#CHR   POS       Group
1    10000       gene1
1    20000       gene1
2    15000       gene2
2    25000       gene2
2    35000       gene2

If you have specific weights for each variant, you can provide them in column 4.

#CHR   POS       Group     WEIGHT
1    10000       gene1    2.1
1    20000       gene1    2.5
2    15000       gene2    1.5
2    25000       gene2    2.2
2    35000       gene2    2.8
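A quick way to sanity-check a group file before an analysis is to read it back and list the variants and weights per group. This is a minimal sketch, not part of gTDT: the function name `read_group_file` is hypothetical, and the default weight of 1.0 for files without a fourth column is an assumption made here for illustration.

```python
from collections import defaultdict


def read_group_file(text, default_weight=1.0):
    """Return {group: [(chrom, pos, weight), ...]} from whitespace-separated lines."""
    groups = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected at least 3 columns")
        chrom, pos, group = fields[:3]
        # Column 4, when present, is read as the variant weight.
        weight = float(fields[3]) if len(fields) > 3 else default_weight
        groups[group].append((chrom, int(pos), weight))
    return dict(groups)


grp_text = """#CHR   POS       Group     WEIGHT
1    10000       gene1    2.1
1    20000       gene1    2.5
2    15000       gene2    1.5
2    25000       gene2    2.2
2    35000       gene2    2.8
"""

groups = read_group_file(grp_text)
for name, variants in groups.items():
    print(name, len(variants), variants)
```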

Output

Group   nVar    maf_sum carrier-ca:co:unphe     allele-ca:co:unphe      fa-t:nt mo-t:nt p_s     z_s     p_ws    z_ws    p_c     z_c     p_ch    z_ch    p_ch_w  z_ch_w  p_2a_add        z_2a_add
OR4F5   0       0       0:0:0   0:0:0   0:0     0:0     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan
SAMD11  8       0.0317978       8:0:14  10:0:15 4:3     6:2     0.225253        1.21268 0.352245        0.930244        0.376344        0.884652        0.414216        0.816497        0.220113        1.22623 0.376344        0.884652
NOC2L   3       0.0105932       2:0:5   2:0:5   1:2     0:2     0.179712        -1.34164        0.148039        -1.44649        0.179712        -1.34164        nan     nan     nan     nan     0.179712        -1.34164
KLHL17  5       0.0105932       2:0:5   2:0:5   1:2     1:1     0.654721        -0.447214       0.654721        -0.447214       0.654721        -0.447214       nan     nan     nan     nan     0.654721        -0.447214


  • By default there are 19 columns in the output. The meaning of each column is as follows:


Group:      group/gene name
nVar:         number of variants included in the analysis
maf_sum:  sum of MAF across all variants included in the analysis
carrier-ca:co:unphe:        number of rare-variant carriers in the case:control:unphenotyped groups
allele-ca:co:unphe:        number of rare alleles in the case:control:unphenotyped groups
fa-t:nt:     number of transmitted:non-transmitted alleles from fathers
mo-t:nt:   number of transmitted:non-transmitted alleles from mothers
p_s:          p-value of M1
z_s:          z-score of M1
p_ws:         p-value of M2
z_ws:         z-score of M2
p_c:         p-value of M3
z_c:         z-score of M3
p_ch:         p-value of M4
z_ch:         z-score of M4
p_ch_w:   p-value of M5
z_ch_w:   z-score of M5
p_2a_add:  p-value of M6
z_2a_add:  z-score of M6
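The p-values in the sample rows above are numerically consistent with two-sided normal p-values computed from the corresponding z-scores. The sketch below parses two of the sample rows and checks that relationship. This is an observation about the sample output, not documented gTDT behavior, and the function names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def parse_gtdt_output(text):
    """Return one dict per data row, keyed by the header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


sample = """Group nVar maf_sum carrier-ca:co:unphe allele-ca:co:unphe fa-t:nt mo-t:nt p_s z_s p_ws z_ws p_c z_c p_ch z_ch p_ch_w z_ch_w p_2a_add z_2a_add
NOC2L 3 0.0105932 2:0:5 2:0:5 1:2 0:2 0.179712 -1.34164 0.148039 -1.44649 0.179712 -1.34164 nan nan nan nan 0.179712 -1.34164
KLHL17 5 0.0105932 2:0:5 2:0:5 1:2 1:1 0.654721 -0.447214 0.654721 -0.447214 0.654721 -0.447214 nan nan nan nan 0.654721 -0.447214
"""

rows = parse_gtdt_output(sample)
suffixes = ["s", "ws", "c", "ch", "ch_w", "2a_add"]  # column suffixes for M1..M6
max_diff = 0.0
for row in rows:
    for sfx in suffixes:
        z, p = float(row["z_" + sfx]), float(row["p_" + sfx])
        if math.isnan(z) or math.isnan(p):
            continue  # model not computed for this group
        max_diff = max(max_diff, abs(two_sided_p(z) - p))
print("largest |recomputed p - reported p|:", max_diff)
```

Groups with nan entries, such as the CH columns here, are skipped.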

Download

The v0.01 source code can be downloaded here. A pre-compiled v0.01 binary can be downloaded here.

Contact

For questions please contact the authors (Bingshan Li: bingshan.li@vanderbilt.edu)