Difference between revisions of "GTDT"

Latest revision as of 08:18, 26 March 2015

Introduction

gTDT implements gene-based (group-wise) transmission disequilibrium tests (TDT) for rare-variant aggregation analysis. It currently implements haplotype-based tests for six models, M1-M6. It takes as input a ped file and a dat file that specify the family relationships and phenotypes, a VCF file that stores the genotype data, and a group file that defines which variants are analyzed together.

The six models are as follows:

AD (M1):    additive model with equal weights (sum statistic)
wAD (M2):    additive model with unequal weights (weighted sum statistic)
DOM (M3):    dominant model (carriers vs. non-carriers; also called the indicator or collapsing model)
CH (M4):    compound heterozygous model, also called the 2-hit, multi-hit, or recessive model
wCH (M5):    CH model with a weighted sum over the rare variants that form compound heterozygotes, i.e., a hybrid of M4 and M2
2AD (M6):    2-allele additive model, in which haplotypes are classified into two alleles: one for haplotypes carrying rare variants and one for haplotypes carrying only common alleles
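The additive models can be illustrated with a minimal Python sketch of a haplotype-based sum test. It assumes, rather than takes from the gTDT source, that each parent contributes d = s(transmitted haplotype) - s(untransmitted haplotype), where s is the weighted count of rare alleles on the haplotype (all weights 1 for AD, per-variant weights for wAD), and that z = sum(d) / sqrt(sum(d^2)) with a two-sided normal p-value. The names `haplotype_score` and `sum_tdt` are hypothetical, and DOM, CH, wCH and 2AD are not covered.

```python
import math
from typing import List, Optional, Sequence, Tuple

Haplotype = Sequence[int]  # one entry per variant in the group: 1 = rare allele, 0 = common


def haplotype_score(hap: Haplotype, weights: Optional[Sequence[float]] = None) -> float:
    """(Weighted) count of rare alleles carried on one haplotype."""
    if weights is None:
        weights = [1.0] * len(hap)  # equal weights, as in the AD (sum) model
    return sum(w * a for w, a in zip(weights, hap))


def sum_tdt(parents: List[Tuple[Haplotype, Haplotype]],
            weights: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """parents: one (transmitted, untransmitted) haplotype pair per parent.

    Returns (z, two-sided p) for the assumed statistic z = sum(d) / sqrt(sum(d^2)).
    """
    diffs = [haplotype_score(t, weights) - haplotype_score(u, weights) for t, u in parents]
    denom = math.sqrt(sum(d * d for d in diffs))
    if denom == 0:
        return float("nan"), float("nan")  # no informative parents
    z = sum(diffs) / denom
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical data: one parent transmits a rare-variant haplotype, four parents do not.
parents = [([1], [0])] + [([0], [1])] * 4
z, p = sum_tdt(parents)
print(f"z = {z:.4f}, p = {p:.4f}")  # z = -1.3416, p = 0.1797
```

With these hypothetical parents, z = -3/sqrt(5), which matches the z_s and p_s values reported for NOC2L in the example output of the Output section; that agreement is consistent with, but does not confirm, the assumed form of the statistic.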

Usage

Running gtdt without any arguments displays the following message:

The following parameters are available.  Ones with "[]" are in effect:
                      pedfile :                 (-pname)
                      datfile :                 (-dname)
                    groupfile :                 (-gname)
                      vcffile :                 (-vname)
                      outfile :                 (-oname)

Additional Options
           Groupwise test : --sum [OFF] , --wss [OFF] , --cmc [OFF] ,
                            --comphet [OFF] , --comphet-weight [OFF] ,
                            --all [ON]
          Variant filters : --min_qual [0.00], --min_maf [0.00],
                            --max_maf [0.01], --min_avg_dp [0.00],
                            --max_avg_dp [0.00], --max_missing_rate [1.00],
                            --pass [OFF] 
         Genotype filters : --min_dp [0], --min_gq [0.00],
                            --missing_as_ref [OFF] 
         Phased genotypes : --phased [OFF] 
  Mendelian inconsistency : --ignore_mi_trio [OFF] 
       Disease to analyze : --disease []
            Non-autosomes : --chrX [X], --chrY [Y], --MT [MT]
       Empirical p-values : --permute [0], --seed [13579]
          Multi-threading : --nthreads [1]
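The variant filters can be read as a per-variant keep/drop decision. The Python sketch below shows one plausible reading of the QUAL, PASS, missing-rate and MAF options; it is an illustration, not gTDT's code. Inclusive bounds, computing MAF from the called alleles of all VCF samples, and the `Variant`/`VariantFilters` field names are all assumptions, and the depth options are left out.

```python
from dataclasses import dataclass


@dataclass
class VariantFilters:
    # Defaults mirror the values shown in the help message.
    min_qual: float = 0.0  # --min_qual
    min_maf: float = 0.0  # --min_maf
    max_maf: float = 0.01  # --max_maf
    max_missing_rate: float = 1.0  # --max_missing_rate
    require_pass: bool = False  # --pass


@dataclass
class Variant:
    # Hypothetical per-variant summary; field names are illustrative only.
    qual: float
    filter_field: str  # VCF FILTER column, e.g. "PASS"
    alt_alleles: int  # number of called ALT alleles
    called_alleles: int  # number of non-missing alleles
    total_alleles: int  # 2 * number of samples


def keep_variant(v: Variant, f: VariantFilters) -> bool:
    """Return True if the variant survives the (assumed) variant-level filters."""
    if v.qual < f.min_qual:
        return False
    if f.require_pass and v.filter_field != "PASS":
        return False
    if v.called_alleles == 0:
        return False  # nothing called, so MAF is undefined
    missing_rate = 1.0 - v.called_alleles / v.total_alleles
    if missing_rate > f.max_missing_rate:
        return False
    af = v.alt_alleles / v.called_alleles
    maf = min(af, 1.0 - af)
    return f.min_maf <= maf <= f.max_maf  # assumed inclusive bounds


# A cutoff of --max_maf 0.05 keeps a variant with MAF 0.02 that the default 0.01 drops.
v = Variant(qual=60.0, filter_field="PASS", alt_alleles=4, called_alleles=200, total_alleles=200)
print(keep_variant(v, VariantFilters()), keep_variant(v, VariantFilters(max_maf=0.05)))  # False True
```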

Example 1: using default parameters

gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt

Example 2: analyzing disease_A, which is specified in the .dat file. The .ped and .dat files can contain multiple phenotypes.

gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --disease disease_A

Example 3: using a maximum MAF cutoff of 0.05

gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05

Example 4: using a maximum MAF cutoff of 0.05 and the phased genotypes provided in the VCF

gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05 --phased

Example 5: using a minimum depth of 10, so that a variant is filtered out in a trio if any individual in that trio has a depth lower than 10

gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --min_dp 10
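The trio-level depth rule in the last example can be sketched as a per-variant filter over trios. This is a hypothetical helper, assuming that a trio failing the threshold is simply excluded from that variant's analysis and that samples without a DP value count as depth 0; gTDT's actual handling (for example, setting genotypes to missing instead) may differ.

```python
from typing import Dict, List, Tuple

Trio = Tuple[str, str, str]  # (child, father, mother) individual IDs


def trios_passing_min_dp(trios: List[Trio], depth: Dict[str, int], min_dp: int) -> List[Trio]:
    """Keep the trios in which every member has DP >= min_dp at this variant."""
    # Samples missing from `depth` are treated as DP 0 (an assumption).
    return [t for t in trios if all(depth.get(s, 0) >= min_dp for s in t)]


trios = [("p3", "p1", "p2"), ("p6", "p4", "p5")]
dp_at_variant = {"p1": 12, "p2": 30, "p3": 15, "p4": 25, "p5": 8, "p6": 40}
print(trios_passing_min_dp(trios, dp_at_variant, min_dp=10))  # [('p3', 'p1', 'p2')]
```

Here the second trio is dropped for this variant because p5 has DP 8, even though the affected child p6 is well covered.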

Input

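A ped file in Merlin format [see the Merlin documentation]. The first five columns are family ID, individual ID, father ID, mother ID (0 for founders), and sex (1 = male, 2 = female); column 6 onward holds the phenotypes declared in the dat file, so the example below has 6 columns. Affection status is coded 0 = unknown, 1 = unaffected, 2 = affected. An example ped file is as follows: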

fam1 p1  0  0   1 0
fam1 p2  0  0   2 0
fam1 p3  p1 p2  1 2
fam2 p4  0  0   1 0
fam2 p5  0  0   2 0
fam2 p6  p4 p5  1 2

A dat file describes column 6 and beyond of the ped file. Here it declares a single affection-status column (type A) for disease_A:

A disease_A
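Reading the ped and dat files together is mostly column bookkeeping. The Python sketch below (hypothetical helper names) maps each dat entry to one ped column starting at column 6, which holds for single-column entries such as A (affection) but is an assumption in general; marker (M) entries are not handled. It then lists the trios whose child is affected (coded 2) for a chosen disease. How gTDT itself selects trios is not documented here.

```python
from typing import Dict, List, NamedTuple, Tuple


class Person(NamedTuple):
    fam: str
    pid: str
    father: str  # "0" for founders
    mother: str  # "0" for founders
    sex: int  # 1 = male, 2 = female
    traits: Dict[str, str]


def read_dat(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (type, name) for each dat entry, e.g. ("A", "disease_A")."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def read_ped(lines: List[str], dat: List[Tuple[str, str]]) -> List[Person]:
    people = []
    for line in lines:
        f = line.split()
        if not f:
            continue
        # Assumes one ped column per dat entry, starting at column 6 (index 5).
        traits = {name: f[5 + i] for i, (_, name) in enumerate(dat)}
        people.append(Person(f[0], f[1], f[2], f[3], int(f[4]), traits))
    return people


def affected_trios(people: List[Person], disease: str) -> List[Tuple[str, str, str]]:
    """(child, father, mother) for affected children whose parents are both in the file."""
    ids = {(p.fam, p.pid) for p in people}
    return [
        (p.pid, p.father, p.mother)
        for p in people
        if (p.fam, p.father) in ids and (p.fam, p.mother) in ids
        and p.traits.get(disease) == "2"
    ]


ped_lines = [
    "fam1 p1  0  0   1 0",
    "fam1 p2  0  0   2 0",
    "fam1 p3  p1 p2  1 2",
    "fam2 p4  0  0   1 0",
    "fam2 p5  0  0   2 0",
    "fam2 p6  p4 p5  1 2",
]
people = read_ped(ped_lines, read_dat(["A disease_A"]))
print(affected_trios(people, "disease_A"))  # [('p3', 'p1', 'p2'), ('p6', 'p4', 'p5')]
```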

A VCF file [VCF specs]. It can contain genotypes for more individuals than are listed in the ped file. The genotypes can be either phased or unphased; if they are unphased, gTDT phases them by transmission. To use the phased genotypes provided in the VCF, add --phased.
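Phasing by transmission means assigning each of the child's alleles to the parent it came from. For a single biallelic site this can be sketched as below; the helper names are hypothetical, genotypes are ALT-allele counts (0, 1, 2), and a trio in which all three members are heterozygous is returned as unresolved. How gTDT resolves such sites, or combines several sites into haplotypes, is not covered by this sketch.

```python
from typing import Optional, Tuple


def can_transmit(genotype: int, allele: int) -> bool:
    """A 0/0 parent transmits only 0, a 1/1 parent only 1, a 0/1 parent either."""
    return genotype >= 1 if allele == 1 else genotype <= 1


def phase_by_transmission(child: int, father: int, mother: int) -> Optional[Tuple[int, int]]:
    """Return the child's (paternal, maternal) alleles, or None when the trio is
    ambiguous (all heterozygous) or Mendelian-inconsistent."""
    options = [
        (pa, ma)
        for pa in (0, 1) if can_transmit(father, pa)
        for ma in (0, 1) if can_transmit(mother, ma)
        if pa + ma == child
    ]
    return options[0] if len(options) == 1 else None


# Father 0/0, mother 0/1, child 0/1: the child's ALT allele must be maternal.
paternal, maternal = phase_by_transmission(child=1, father=0, mother=1)
# Each parent's untransmitted allele is its genotype minus the transmitted allele.
print((paternal, maternal), (0 - paternal, 1 - maternal))  # (0, 1) (0, 0)
```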

A group file indicating which variants should be analyzed as a unit. It has three required columns (chromosome, position, and group name), plus optional columns for weights and other information. For example:

#CHR   POS       Group
1    10000       gene1
1    20000       gene1
2    15000       gene2
2    25000       gene2
2    35000       gene2

If you have specific weights for each variant, you can provide them in column 4:

#CHR   POS       Group     WEIGHT
1    10000       gene1    2.1
1    20000       gene1    2.5
2    15000       gene2    1.5
2    25000       gene2    2.2
2    35000       gene2    2.8
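A group file of this shape can be loaded into a map from group name to its variants. A minimal Python sketch, assuming whitespace-separated columns, '#'-prefixed header lines, and a default weight of 1.0 when column 4 is absent (gTDT's own default is not stated here); `read_group_file` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GroupVariant = Tuple[str, int, float]  # (chromosome, position, weight)


def read_group_file(lines: List[str]) -> Dict[str, List[GroupVariant]]:
    """Map each group name to its variants, in file order."""
    groups: Dict[str, List[GroupVariant]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        chrom, pos, group = fields[0], int(fields[1]), fields[2]
        # Optional column 4 holds the variant weight; 1.0 is an assumed default.
        weight = float(fields[3]) if len(fields) > 3 else 1.0
        groups[group].append((chrom, pos, weight))
    return dict(groups)


example = [
    "#CHR   POS       Group     WEIGHT",
    "1    10000       gene1    2.1",
    "1    20000       gene1    2.5",
    "2    15000       gene2    1.5",
]
print(read_group_file(example))
# {'gene1': [('1', 10000, 2.1), ('1', 20000, 2.5)], 'gene2': [('2', 15000, 1.5)]}
```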

Output

Group   nVar    maf_sum carrier-ca:co:unphe     allele-ca:co:unphe      fa-t:nt mo-t:nt p_s     z_s     p_ws    z_ws    p_c     z_c     p_ch    z_ch    p_ch_w  z_ch_w  p_2a_add        z_2a_add
OR4F5   0       0       0:0:0   0:0:0   0:0     0:0     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan
SAMD11  8       0.0317978       8:0:14  10:0:15 4:3     6:2     0.225253        1.21268 0.352245        0.930244        0.376344        0.884652        0.414216        0.816497        0.220113        1.22623 0.376344        0.884652
NOC2L   3       0.0105932       2:0:5   2:0:5   1:2     0:2     0.179712        -1.34164        0.148039        -1.44649        0.179712        -1.34164        nan     nan     nan     nan     0.179712        -1.34164
KLHL17  5       0.0105932       2:0:5   2:0:5   1:2     1:1     0.654721        -0.447214       0.654721        -0.447214       0.654721        -0.447214       nan     nan     nan     nan     0.654721        -0.447214

By default the output has 19 columns, described as follows:

Group:      group/gene name
nVar:       number of variants included in the analysis
maf_sum:    sum of MAFs across all variants included in the analysis
carrier-ca:co:unphe:    number of rare-variant carriers in cases:controls:unphenotyped individuals
allele-ca:co:unphe:    number of rare alleles in cases:controls:unphenotyped individuals
fa-t:nt:    number of transmitted:non-transmitted alleles from fathers
mo-t:nt:    number of transmitted:non-transmitted alleles from mothers
p_s:        p-value of M1 (AD)
z_s:        z-score of M1 (AD)
p_ws:       p-value of M2 (wAD)
z_ws:       z-score of M2 (wAD)
p_c:        p-value of M3 (DOM)
z_c:        z-score of M3 (DOM)
p_ch:       p-value of M4 (CH)
z_ch:       z-score of M4 (CH)
p_ch_w:     p-value of M5 (wCH)
z_ch_w:     z-score of M5 (wCH)
p_2a_add:   p-value of M6 (2AD)
z_2a_add:   z-score of M6 (2AD)
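The output is a whitespace-separated table, so it can be post-processed with a few lines of code. The Python sketch below (hypothetical helper `parse_gtdt_row`) parses one row against the header line, turning colon-separated count columns into integer lists and p-values and z-scores into floats, with "nan" kept as NaN (as for OR4F5, which has no variants).

```python
from typing import Dict, List, Union

Value = Union[str, int, float, List[int]]


def parse_gtdt_row(header: str, row: str) -> Dict[str, Value]:
    """Parse one whitespace-separated gTDT output row into a dict keyed by column name."""
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    record: Dict[str, Value] = {}
    for name, value in zip(names, values):
        if name == "Group":
            record[name] = value
        elif name == "nVar":
            record[name] = int(value)
        elif ":" in value:
            # e.g. carrier-ca:co:unphe "2:0:5" -> [2, 0, 5]
            record[name] = [int(x) for x in value.split(":")]
        else:
            record[name] = float(value)  # float("nan") yields NaN
    return record


header = ("Group nVar maf_sum carrier-ca:co:unphe allele-ca:co:unphe fa-t:nt mo-t:nt "
          "p_s z_s p_ws z_ws p_c z_c p_ch z_ch p_ch_w z_ch_w p_2a_add z_2a_add")
row = ("NOC2L 3 0.0105932 2:0:5 2:0:5 1:2 0:2 0.179712 -1.34164 0.148039 -1.44649 "
       "0.179712 -1.34164 nan nan nan nan 0.179712 -1.34164")
rec = parse_gtdt_row(header, row)
print(rec["fa-t:nt"], rec["mo-t:nt"], rec["p_s"], rec["p_ch"])  # [1, 2] [0, 2] 0.179712 nan
```

For the SAMD11 row, the same function yields [4, 3] for fa-t:nt and [6, 2] for mo-t:nt.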

Download

The source code of v0.01 can be downloaded here. A pre-compiled binary of v0.01 can be downloaded here.

Contact

For questions please contact the authors (Bingshan Li: bingshan.li@vanderbilt.edu)
