GTDT

From Genome Analysis Wiki
Revision as of 13:20, 4 September 2014 by Bingshan

Introduction

  • gTDT implements gene-based (group-wise) transmission disequilibrium tests (TDT) for aggregation analysis of rare variants. It currently implements haplotype-based tests for six models, M1-M6. It takes as input a ped file and a dat file that specify family relationships and phenotypes, a group file that defines the variant groups (e.g., genes) to test, and a VCF file that stores the genotype data.
  • The six models are:
M1:    additive model with equal weights (sum statistic)
M2:    additive model with unequal weights (weighted sum statistic)
M3:    dominant model (carriers vs. non-carriers; also called the indicator or collapsing model)
M4:    compound heterozygous (CH) model, also called the 2-hit or multi-hit model, or recessive model
M5:    CH model with a weighted sum over the rare variants that form compound heterozygotes, i.e., a hybrid of M4 and M2
M6:    two-allele additive model, in which haplotypes are classified into two alleles: those carrying rare variants and those carrying only common alleles
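The haplotype-based idea behind these models can be illustrated with a minimal sketch. It assumes each trio is supplied as phased parental haplotypes, split into the haplotype each parent transmitted to the affected child and the one not transmitted. Under the null hypothesis either parental haplotype is transmitted with probability 1/2, so the child's score is compared with the mean and variance over the four equally likely offspring, and the standardized sum over trios is referred to a normal distribution. The trio layout, the function names and the M2 weights are hypothetical, M5 is omitted, and gTDT's exact statistics may differ. As a check, when each of five informative parents differs by one rare allele, with 1 transmitted and 4 not transmitted in total (the fa-t:nt and mo-t:nt counts of the NOC2L row in the Output section), the M1 statistic is z = -1.5/sqrt(1.25) ≈ -1.34164, matching the z_s value reported for NOC2L.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A haplotype is a sequence of 0/1 rare-allele indicators, one per variant in the group.
Haplotype = Sequence[int]
# Hypothetical trio layout: (father_transmitted, father_untransmitted,
#                            mother_transmitted, mother_untransmitted)
Trio = Tuple[Haplotype, Haplotype, Haplotype, Haplotype]
Score = Callable[[Haplotype, Haplotype], float]


def m1_sum(h1: Haplotype, h2: Haplotype) -> float:
    """M1: additive model with equal weights (total rare alleles)."""
    return float(sum(h1) + sum(h2))


def make_m2_weighted(weights: Sequence[float]) -> Score:
    """M2: additive model with per-variant weights (weights are an assumed input)."""
    def score(h1: Haplotype, h2: Haplotype) -> float:
        return sum(w * (a + b) for w, a, b in zip(weights, h1, h2))
    return score


def m3_dominant(h1: Haplotype, h2: Haplotype) -> float:
    """M3: carrier (collapsing) model, 1 if any rare allele is present."""
    return 1.0 if any(h1) or any(h2) else 0.0


def m4_comphet(h1: Haplotype, h2: Haplotype) -> float:
    """M4: compound heterozygous model, 1 if both haplotypes carry a rare allele."""
    return 1.0 if any(h1) and any(h2) else 0.0


def m6_two_allele(h1: Haplotype, h2: Haplotype) -> float:
    """M6: number of haplotypes carrying at least one rare allele (0, 1 or 2)."""
    return float(any(h1)) + float(any(h2))


def haplotype_tdt(trios: List[Trio], score: Score) -> Tuple[float, float]:
    """Score-type TDT: returns (z, two-sided normal p), or (nan, nan) if uninformative."""
    u = 0.0  # sum of (observed child score - expected score under H0)
    v = 0.0  # sum of null variances
    for f_t, f_nt, m_t, m_nt in trios:
        # The four equally likely offspring: one haplotype from each parent.
        pseudo = [score(f, m) for f in (f_t, f_nt) for m in (m_t, m_nt)]
        mean = sum(pseudo) / 4.0
        u += score(f_t, m_t) - mean
        v += sum((s - mean) ** 2 for s in pseudo) / 4.0
    if v == 0.0:
        return math.nan, math.nan
    z = u / math.sqrt(v)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Three single-variant trios: 1 rare allele transmitted and 4 not transmitted overall.
trios = [([0], [1], [0], [1]), ([1], [0], [0], [1]), ([0], [1], [0], [0])]
z, p = haplotype_tdt(trios, m1_sum)
print(f"z = {z:.5f}, p = {p:.6f}")
```

Returning nan when no trio is informative mirrors the nan entries shown for groups such as OR4F5 in the example output.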

Usage

Running gtdt without any arguments displays the following message:

The following parameters are available.  Ones with "[]" are in effect:
                      pedfile :                 (-pname)
                      datfile :                 (-dname)
                    groupfile :                 (-gname)
                      vcffile :                 (-vname)
                      outfile :                 (-oname)
Additional Options
           Groupwise test : --sum [OFF] , --wss [OFF] , --cmc [OFF] ,
                            --comphet [OFF] , --comphet-weight [OFF] ,
                            --all [ON]
          Variant filters : --min_qual [0.00], --min_maf [0.00],
                            --max_maf [0.01], --min_avg_dp [0.00],
                            --max_avg_dp [0.00], --max_missing_rate [1.00],
                            --pass [OFF] 
         Genotype filters : --min_dp [0], --min_gq [0.00],
                            --missing_as_ref [OFF] 
         Phased genotypes : --phased [OFF] 
  Mendelian inconsistency : --ignore_mi_trio [OFF] 
       Disease to analyze : --disease []
            Non-autosomes : --chrX [X], --chrY [Y], --MT [MT]
       Empirical p-values : --permute [0], --seed [13579]
          Multi-threading : --nthreads [1]
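The --permute and --seed options request empirical p-values. One common way to permute a TDT under the null hypothesis is to swap each parent's transmitted and untransmitted haplotypes with probability 1/2. The minimal sketch below applies this to an M1-style sum statistic, for which a swap simply flips the sign of that parent's difference in rare-allele count between the transmitted and untransmitted haplotypes. The function names, the per-parent difference input and the (hits + 1) / (n + 1) estimator are assumptions, not necessarily what gTDT does; only the default seed 13579 is taken from the help text.

```python
import math
import random
from typing import Sequence


def sum_z(diffs: Sequence[int]) -> float:
    """M1-style z-score from per-parent differences d = (rare alleles on the
    transmitted haplotype) - (rare alleles on the untransmitted haplotype)."""
    denom = math.sqrt(sum(d * d for d in diffs))
    return sum(diffs) / denom if denom > 0 else math.nan


def empirical_p(diffs: Sequence[int], n_perm: int, seed: int = 13579) -> float:
    """Two-sided empirical p-value from random transmitted/untransmitted swaps."""
    observed = abs(sum_z(diffs))
    if math.isnan(observed):
        return math.nan  # no informative parents
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Swapping one parent's two haplotypes flips the sign of its difference.
        permuted = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum_z(permuted)) >= observed:
            hits += 1
    # Count the observed data as one permutation so the estimate is never 0.
    return (hits + 1) / (n_perm + 1)


diffs = [-1, -1, 1, -1, -1]  # illustrative: five informative parents
print(sum_z(diffs), empirical_p(diffs, n_perm=1000))
```

With only five informative parents, the exact sign-flip p-value for this data is 12/32 = 0.375, well above the normal-approximation p of about 0.18, which illustrates why empirical p-values can matter when few trios are informative.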


  • Example 1: using default parameters
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt
  • Example 2: analyze disease_A, as named in the .dat file. The .ped and .dat files can contain multiple phenotypes
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --disease disease_A
  • Example 3: using a maximum MAF cutoff of 0.05 (the default is 0.01)
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05
  • Example 4: using a maximum MAF cutoff of 0.05 and the phased genotypes provided in the VCF
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05 --phased
  • Example 5: using a minimum depth of 10, so that a variant is filtered in a trio if any individual in the trio has depth lower than 10
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --min_dp 10
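The per-trio depth rule in the last example can be sketched as follows, assuming per-sample depths for one variant have already been read from the VCF FORMAT/DP field into a dictionary keyed by ped individual ID. The function name and trio layout are hypothetical, and two behaviours are assumptions: a min_dp of 0 (the default) disables the filter, and otherwise a sample with no DP value fails it.

```python
from typing import Dict, List, Optional, Tuple

Trio = Tuple[str, str, str]  # (father_id, mother_id, child_id) from the ped file


def passes_depth(dp: Optional[int], min_dp: int) -> bool:
    if min_dp <= 0:
        return True  # default 0: no depth filtering (assumption)
    return dp is not None and dp >= min_dp  # missing DP fails (assumption)


def trios_passing_depth(trios: List[Trio],
                        depth: Dict[str, Optional[int]],
                        min_dp: int) -> List[Trio]:
    """Keep, for one variant, only trios in which all three members pass min_dp."""
    return [trio for trio in trios
            if all(passes_depth(depth.get(sample), min_dp) for sample in trio)]


# The two trios from the example ped file, with hypothetical depths at one variant.
trios = [("p1", "p2", "p3"), ("p4", "p5", "p6")]
depth = {"p1": 25, "p2": 31, "p3": 18, "p4": 40, "p5": 9, "p6": 22}
print(trios_passing_depth(trios, depth, min_dp=10))  # [('p1', 'p2', 'p3')]
```

Filtering at the trio level rather than per sample keeps the transmission comparison balanced: a trio either contributes all three genotypes at a variant or none.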

Input

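  • A ped file in Merlin format, with columns family ID, individual ID, father ID, mother ID and sex (1 = male, 2 = female), followed by the phenotype columns described in the dat file. In this example, each family is a trio in which the parents have unknown status (0) and the child is affected (2):
fam1 p1  0  0   1 0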
fam1 p2  0  0   2 0
fam1 p3  p1 p2  1 2
fam2 p4  0  0   1 0
fam2 p5  0  0   2 0
fam2 p6  p4 p5  1 2
  • A dat file describes column 6 and beyond of the ped file. Here, column 6 is the affection status (A) for disease_A:
A disease_A
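Together, the ped and dat files identify the trios to analyze for a given disease. A minimal sketch of reading them, assuming Merlin-style affection coding (2 = affected) and a dat file in which every line describes exactly one ped column (Merlin marker lines, which span two columns, are not handled). The function names are hypothetical and this is not gTDT's own parser.

```python
from typing import List, Tuple


def read_dat(dat_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (type, name) pairs, one per ped column from column 6 on."""
    fields = []
    for line in dat_lines:
        parts = line.split()
        if len(parts) >= 2:
            fields.append((parts[0], parts[1]))
    return fields


def affected_trios(ped_lines: List[str], dat_lines: List[str],
                   disease: str) -> List[Tuple[str, str, str, str]]:
    """Return (family, father, mother, child) for each affected child whose
    father and mother are both listed in the ped file."""
    names = [name for _, name in read_dat(dat_lines)]
    # 0-based columns 0-4 are family, individual, father, mother, sex.
    col = 5 + names.index(disease)
    rows = [line.split() for line in ped_lines if line.strip()]
    present = {(r[0], r[1]) for r in rows}
    trios = []
    for r in rows:
        fam, ind, father, mother = r[:4]
        if (fam, father) in present and (fam, mother) in present and r[col] == "2":
            trios.append((fam, father, mother, ind))
    return trios


ped_lines = """fam1 p1  0  0   1 0
fam1 p2  0  0   2 0
fam1 p3  p1 p2  1 2
fam2 p4  0  0   1 0
fam2 p5  0  0   2 0
fam2 p6  p4 p5  1 2""".splitlines()
dat_lines = ["A disease_A"]
print(affected_trios(ped_lines, dat_lines, "disease_A"))
```

Because the phenotype column is looked up by name, the same files can hold several diseases and the one passed to --disease selects which column defines the affected children.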
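  • A VCF file (see the VCF specification) that stores the genotype data. It can contain genotypes for more individuals than are listed in the ped file.
  • The genotypes in the VCF file can be either phased or unphased. If they are unphased, gTDT phases them by transmission within each trio; with --phased, the phasing provided in the VCF is used instead.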
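Phasing by transmission can be sketched per site, assuming biallelic variants coded as alternate-allele counts (0, 1, 2). The child's paternal and maternal alleles are determined whenever the child is homozygous or at least one parent is homozygous; the site is ambiguous only when father, mother and child are all heterozygous. The function name and return layout are hypothetical, raising an error on Mendelian inconsistencies is an assumption (gTDT's handling is governed by options such as --ignore_mi_trio), and how gTDT resolves ambiguous sites is not described here.

```python
from typing import Optional, Tuple

# (father_transmitted, father_untransmitted, mother_transmitted, mother_untransmitted),
# each 0 (reference allele) or 1 (alternate allele).
Phase = Tuple[int, int, int, int]


def phase_by_transmission(father: int, mother: int, child: int) -> Optional[Phase]:
    """Phase one biallelic site in a trio from unphased alt-allele counts.

    Returns None when all three genotypes are heterozygous (ambiguous).
    Raises ValueError on invalid genotypes or a Mendelian inconsistency.
    """
    for g in (father, mother, child):
        if g not in (0, 1, 2):
            raise ValueError(f"genotype must be 0, 1 or 2, got {g}")
    if child == 1 and father == 1 and mother == 1:
        return None
    if child != 1:
        f_t = m_t = child // 2   # homozygous child: both transmitted alleles known
    elif father != 1:
        f_t = father // 2        # homozygous father can only pass his allele
        m_t = 1 - f_t
    else:
        m_t = mother // 2        # heterozygous father, homozygous mother
        f_t = 1 - m_t
    f_nt, m_nt = father - f_t, mother - m_t
    if f_nt not in (0, 1) or m_nt not in (0, 1):
        raise ValueError("Mendelian inconsistency in trio")
    return f_t, f_nt, m_t, m_nt


print(phase_by_transmission(father=0, mother=1, child=1))  # (0, 0, 1, 0)
```

Applying this to every variant in a group and stacking the results gives the transmitted and untransmitted parental haplotypes for that trio, provided no site in the group is ambiguous.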

Output

Group   nVar    maf_sum carrier-ca:co:unphe     allele-ca:co:unphe      fa-t:nt mo-t:nt p_s     z_s     p_ws    z_ws    p_c     z_c     p_ch    z_ch    p_ch_w  z_ch_w  p_2a_add        z_2a_add
OR4F5   0       0       0:0:0   0:0:0   0:0     0:0     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan
SAMD11  8       0.0317978       8:0:14  10:0:15 4:3     6:2     0.225253        1.21268 0.352245        0.930244        0.376344        0.884652        0.414216        0.816497        0.220113        1.22623 0.376344        0.884652
NOC2L   3       0.0105932       2:0:5   2:0:5   1:2     0:2     0.179712        -1.34164        0.148039        -1.44649        0.179712        -1.34164        nan     nan     nan     nan     0.179712        -1.34164
KLHL17  5       0.0105932       2:0:5   2:0:5   1:2     1:1     0.654721        -0.447214       0.654721        -0.447214       0.654721        -0.447214       nan     nan     nan     nan     0.654721        -0.447214


  • By default the output has 19 columns, described below:


Group:                 group or gene name
nVar:                  number of variants included in the analysis
maf_sum:               sum of MAFs across all variants included in the analysis
carrier-ca:co:unphe:   number of rare-variant carriers among cases:controls:unphenotyped individuals
allele-ca:co:unphe:    number of rare alleles among cases:controls:unphenotyped individuals
fa-t:nt:               number of transmitted:non-transmitted alleles from fathers
mo-t:nt:               number of transmitted:non-transmitted alleles from mothers
p_s:                   p-value of M1
z_s:                   z-score of M1
p_ws:                  p-value of M2
z_ws:                  z-score of M2
p_c:                   p-value of M3
z_c:                   z-score of M3
p_ch:                  p-value of M4
z_ch:                  z-score of M4
p_ch_w:                p-value of M5
z_ch_w:                z-score of M5
p_2a_add:              p-value of M6
z_2a_add:              z-score of M6
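Because the output is whitespace-separated with one row per group, it is easy to load for downstream filtering. The sketch below is a hypothetical helper, not part of gTDT. It assumes the file has exactly one header line, as in the example output above, converts nan to math.nan so that untestable groups (such as OR4F5, with no qualifying variants) drop out of p-value comparisons, and splits the colon-separated count columns.

```python
import math
from typing import Dict, List, Union

Value = Union[str, int, float]


def parse_gtdt_output(text: str) -> List[Dict[str, Value]]:
    """Parse whitespace-separated gTDT output into one dict per group."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows: List[Dict[str, Value]] = []
    for line in lines[1:]:
        row: Dict[str, Value] = {}
        for name, field in zip(header, line.split()):
            if name == "Group" or ":" in name:
                row[name] = field         # group names and a:b:c count columns stay strings
            elif name == "nVar":
                row[name] = int(field)
            else:
                row[name] = float(field)  # float("nan") gives math.nan
        rows.append(row)
    return rows


def split_counts(field: str) -> List[int]:
    """Split a colon-separated count column, e.g. '2:0:5' -> [2, 0, 5]."""
    return [int(x) for x in field.split(":")]


# Header and three rows copied from the example output above.
sample = "\n".join([
    "Group nVar maf_sum carrier-ca:co:unphe allele-ca:co:unphe fa-t:nt mo-t:nt "
    "p_s z_s p_ws z_ws p_c z_c p_ch z_ch p_ch_w z_ch_w p_2a_add z_2a_add",
    "OR4F5 0 0 0:0:0 0:0:0 0:0 0:0 " + " ".join(["nan"] * 12),
    "NOC2L 3 0.0105932 2:0:5 2:0:5 1:2 0:2 0.179712 -1.34164 0.148039 -1.44649 "
    "0.179712 -1.34164 nan nan nan nan 0.179712 -1.34164",
    "KLHL17 5 0.0105932 2:0:5 2:0:5 1:2 1:1 0.654721 -0.447214 0.654721 -0.447214 "
    "0.654721 -0.447214 nan nan nan nan 0.654721 -0.447214",
])
rows = parse_gtdt_output(sample)
# nan never compares as less than a threshold, so OR4F5 is dropped automatically.
print([r["Group"] for r in rows if r["p_s"] < 0.2])  # ['NOC2L']
```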

Contact

For questions, please contact the authors (Bingshan Li: bingshan.li@vanderbilt.edu).