GTDT

From Genome Analysis Wiki

Revision as of 11:47, 4 September 2014

Introduction

Usage

Running gtdt without any arguments displays the following message:

The following parameters are available.  Ones with "[]" are in effect:
                      pedfile :                 (-pname)
                      datfile :                 (-dname)
                    groupfile :                 (-gname)
                      vcffile :                 (-vname)
                      outfile :                 (-oname)
Additional Options
           Groupwise test : --sum [OFF] , --wss [OFF] , --cmc [OFF] ,
                            --comphet [OFF] , --comphet-weight [OFF] ,
                            --all [ON]
          Variant filters : --min_qual [0.00], --min_maf [0.00],
                            --max_maf [0.01], --min_avg_dp [0.00],
                            --max_avg_dp [0.00], --max_missing_rate [1.00],
                            --pass [OFF] 
         Genotype filters : --min_dp [0], --min_gq [0.00],
                            --missing_as_ref [OFF] 
         Phased genotypes : --phased [OFF] 
  Mendelian inconsistency : --ignore_mi_trio [OFF] 
       Disease to analyze : --disease []
            Non-autosomes : --chrX [X], --chrY [Y], --MT [MT]
       Empirical p-values : --permute [0], --seed [13579]
          Multi-threading : --nthreads [1]


  • Example 1: using default parameters
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt
  • Example 2: analyze disease_A as specified in the .dat file (the .ped and .dat files can contain multiple phenotypes)
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --disease disease_A
  • Example 3: using a MAF cutoff of 0.05
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05
  • Example 4: using a MAF cutoff of 0.05 and the phased genotypes provided in the VCF
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05 --phased
  • Example 5: using a minimum depth of 10, so that a variant is filtered in a trio if any individual in the trio has depth lower than 10
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --min_dp 10
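The examples above differ only in their options, so they are easy to script. The following is a minimal Python sketch that assembles gtdt argument lists from the flags shown on this page and runs one job per MAF cutoff. The helper names and the out_maf<cutoff>.txt output naming are hypothetical, and by default the sketch only prints the commands (a dry run) instead of calling gtdt.

```python
import shlex
import subprocess


def build_gtdt_command(ped, dat, grp, vcf, out, max_maf=None,
                       disease=None, phased=False, min_dp=None):
    """Assemble a gtdt argument list using only flags documented on this page."""
    cmd = ["gtdt", "-p", ped, "-d", dat, "-g", grp, "-v", vcf, "-o", out]
    if disease is not None:
        cmd += ["--disease", disease]       # phenotype named in the .dat file
    if max_maf is not None:
        cmd += ["--max_maf", str(max_maf)]  # variant filter: upper MAF cutoff
    if phased:
        cmd.append("--phased")              # use phasing already present in the VCF
    if min_dp is not None:
        cmd += ["--min_dp", str(min_dp)]    # genotype filter: minimum depth
    return cmd


def run_maf_sweep(cutoffs, dry_run=True):
    """Print (dry run) or execute one gtdt job per MAF cutoff, each with its own output file."""
    commands = []
    for maf in cutoffs:
        cmd = build_gtdt_command("fam.ped", "fam.dat", "input.grp", "input.vcf",
                                 f"out_maf{maf}.txt", max_maf=maf)
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if gtdt exits non-zero
    return commands


if __name__ == "__main__":
    run_maf_sweep([0.01, 0.05])
```

Passing an argument list to subprocess.run (rather than one shell string) avoids quoting problems with file names.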


Input

  • A ped file with 6 columns [see merlin documentation]. An example ped file is as follows:
fam1 p1  0  0   1 0
fam1 p2  0  0   2 0
fam1 p3  p1 p2  1 2
fam2 p4  0  0   1 0
fam2 p5  0  0   2 0
fam2 p6  p4 p5  1 2
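A minimal Python sketch of reading the ped file above and finding complete trios (a child whose father and mother both appear in the same family). It assumes the Merlin column order of family ID, individual ID, father, mother and sex, with 0 meaning the parent is not in the file; the function names are hypothetical and this is not gTDT's own parser.

```python
from dataclasses import dataclass


@dataclass
class Person:
    family: str
    person: str
    father: str   # "0" means the father is not in the file
    mother: str   # "0" means the mother is not in the file
    sex: str      # 1 = male, 2 = female (Merlin convention)
    traits: list  # column 6 onward, described by the .dat file


def read_ped(lines):
    people = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        fam, pid, fa, mo, sex, *traits = fields
        people.append(Person(fam, pid, fa, mo, sex, traits))
    return people


def find_trios(people):
    """Return (family, child, father, mother) for every child whose parents are both present."""
    ids = {(p.family, p.person) for p in people}
    return [(p.family, p.person, p.father, p.mother)
            for p in people
            if (p.family, p.father) in ids and (p.family, p.mother) in ids]


PED_TEXT = """\
fam1 p1  0  0   1 0
fam1 p2  0  0   2 0
fam1 p3  p1 p2  1 2
fam2 p4  0  0   1 0
fam2 p5  0  0   2 0
fam2 p6  p4 p5  1 2
"""

if __name__ == "__main__":
    for trio in find_trios(read_ped(PED_TEXT.splitlines())):
        print(trio)
```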
  • A dat file describes column 6 and beyond in the ped file. Here, column 6 is the affection status for disease_A:
A disease_A
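The hypothetical sketch below (not gTDT's code) pairs .dat entries with ped columns 6 onward and decodes the affection status. It assumes every .dat entry occupies one ped column, which holds for the A entry here, and the usual Merlin/LINKAGE affection coding of 0 = unknown, 1 = unaffected, 2 = affected; under that coding the example parents are unphenotyped and the children are affected.

```python
# Merlin/LINKAGE affection codes; other codes are treated as unknown (assumption).
AFFECTION = {"0": "unknown", "1": "unaffected", "2": "affected"}


def read_dat(lines):
    """Return (type, name) pairs, e.g. ("A", "disease_A") for an affection entry."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def phenotypes(ped_line, dat_entries):
    """Map each .dat entry name to its raw value in column 6 onward of one ped line."""
    values = ped_line.split()[5:]
    return {name: value for (_, name), value in zip(dat_entries, values)}


def affection_status(ped_lines, dat_entries, disease):
    """Return {(family, individual): status} for one affection entry."""
    if ("A", disease) not in dat_entries:
        raise ValueError(f"{disease} is not an affection (A) entry in the .dat file")
    status = {}
    for line in ped_lines:
        if line.strip():
            fields = line.split()
            code = phenotypes(line, dat_entries)[disease]
            status[(fields[0], fields[1])] = AFFECTION.get(code, "unknown")
    return status


PED_LINES = ["fam1 p1  0  0   1 0", "fam1 p2  0  0   2 0", "fam1 p3  p1 p2  1 2"]

if __name__ == "__main__":
    print(affection_status(PED_LINES, read_dat(["A disease_A"]), "disease_A"))
```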
  • A VCF file [VCF specs]. It can contain genotypes for more individuals than are listed in the ped file.
  • The genotypes in the VCF file can be either phased or unphased. If they are unphased, gTDT phases them by transmission.
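This page does not describe how gTDT phases by transmission. As an illustration of the general idea only, here is a minimal Python sketch for a single biallelic site in one trio: it finds the paternal/maternal allele assignments consistent with Mendelian inheritance and reports which allele a heterozygous parent transmitted. When all three trio members are heterozygous the phase cannot be resolved, and the sketch returns None; the function names are hypothetical.

```python
def phase_by_transmission(father, mother, child):
    """Phase a child's genotype at one biallelic site from its parents' genotypes.

    Genotypes are unordered pairs of allele codes (0 = REF, 1 = ALT).
    Returns (paternal_allele, maternal_allele), or None when the phase is
    ambiguous (e.g. all three heterozygous) or Mendelian-inconsistent.
    """
    options = set()
    for pat in set(father):
        for mat in set(mother):
            if sorted((pat, mat)) == sorted(child):
                options.add((pat, mat))
    return options.pop() if len(options) == 1 else None


def transmission(parent, transmitted):
    """For a heterozygous parent, return (transmitted, untransmitted) alleles."""
    if parent[0] == parent[1]:
        return None  # homozygous parents carry no transmission information
    untransmitted = parent[1] if parent[0] == transmitted else parent[0]
    return transmitted, untransmitted


if __name__ == "__main__":
    # Father 0/0, mother 0/1, child 0/1: the ALT allele must have come from the mother.
    pat, mat = phase_by_transmission((0, 0), (0, 1), (0, 1))
    print("paternal", pat, "maternal", mat)
    print("mother transmitted:untransmitted", transmission((0, 1), mat))
```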

Output

Group   nVar    maf_sum carrier-ca:co:unphe     allele-ca:co:unphe      fa-t:nt mo-t:nt p_s     z_s     p_ws    z_ws    p_c     z_c     p_ch    z_ch    p_ch_w  z_ch_w  p_2a_add        z_2a_add
OR4F5   0       0       0:0:0   0:0:0   0:0     0:0     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan
SAMD11  8       0.0317978       8:0:14  10:0:15 4:3     6:2     0.225253        1.21268 0.352245        0.930244        0.376344        0.884652        0.414216        0.816497        0.220113        1.22623 0.376344        0.884652
NOC2L   3       0.0105932       2:0:5   2:0:5   1:2     0:2     0.179712        -1.34164        0.148039        -1.44649        0.179712        -1.34164        nan     nan     nan     nan     0.179712        -1.34164
KLHL17  5       0.0105932       2:0:5   2:0:5   1:2     1:1     0.654721        -0.447214       0.654721        -0.447214       0.654721        -0.447214       nan     nan     nan     nan     0.654721        -0.447214
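A minimal Python sketch for loading the output into per-group records, assuming whitespace-delimited columns with the header shown above. Count columns such as fa-t:nt become integer tuples, and nan (printed for groups with nothing to test, such as OR4F5 with 0 variants) becomes a float NaN. The sample text is copied from the output above; the function names are hypothetical.

```python
import math

SAMPLE = """\
Group nVar maf_sum carrier-ca:co:unphe allele-ca:co:unphe fa-t:nt mo-t:nt p_s z_s p_ws z_ws p_c z_c p_ch z_ch p_ch_w z_ch_w p_2a_add z_2a_add
OR4F5 0 0 0:0:0 0:0:0 0:0 0:0 nan nan nan nan nan nan nan nan nan nan nan nan
NOC2L 3 0.0105932 2:0:5 2:0:5 1:2 0:2 0.179712 -1.34164 0.148039 -1.44649 0.179712 -1.34164 nan nan nan nan 0.179712 -1.34164
"""


def parse_value(name, value):
    if name == "Group":
        return value
    if name == "nVar":
        return int(value)
    if ":" in value:
        return tuple(int(x) for x in value.split(":"))  # e.g. "1:2" -> (1, 2)
    return float(value)  # float("nan") yields NaN for untestable groups


def parse_gtdt_output(text):
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}")
        records.append({n: parse_value(n, v) for n, v in zip(header, fields)})
    return records


if __name__ == "__main__":
    for rec in parse_gtdt_output(SAMPLE):
        if rec["nVar"] == 0 or math.isnan(rec["p_s"]):
            continue  # skip groups with no testable variants
        print(rec["Group"], rec["p_s"])
```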


  • By default the output has 19 columns, described as follows:
Group: group/gene name
nVar: # of variants included in the analysis
maf_sum: sum of MAF across all variants included in the analysis
carrier-ca:co:unphe: number of rare-variant carriers in the case:control:unphenotyped groups
allele-ca:co:unphe:  number of rare alleles in the case:control:unphenotyped groups
fa-t:nt: number of transmitted:non-transmitted alleles from fathers
mo-t:nt: number of transmitted:non-transmitted alleles from mothers
p_s: p-value of the equal-weight sum statistic
z_s: z-score of the equal-weight sum statistic
p_ws: p-value of the weighted sum statistic
z_ws: z-score of the weighted sum statistic
p_c: p-value of the collapsing test (carrier vs. non-carrier, i.e. a dominant model)
z_c: z-score of the collapsing test
p_ch: p-value of compound heterozygous model (CH)
z_ch: z-score of CH model
p_ch_w: p-value of CH model with weights
z_ch_w: z-score of CH model with weights
p_2a_add: p-value of the 2-allele additive model, in which haplotypes are classified into two alleles: one carrying rare variants and one carrying only common alleles
z_2a_add: z-score of the 2-allele additive model
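This page does not give the formulas behind p_s and z_s. As a hedged check, the classic TDT statistic z = (t - nt) / sqrt(t + nt), with t and nt pooled from the fa-t:nt and mo-t:nt columns and a two-sided normal p-value, reproduces z_s and p_s for the NOC2L (t = 1, nt = 4) and KLHL17 (t = 2, nt = 3) rows of the sample output. It does not reproduce SAMD11 ((10 - 5) / sqrt(15) is about 1.29, versus 1.21268), so gTDT's statistic is not simply this count-based formula in general. Treat the sketch as an illustration of the TDT idea, not as gTDT's implementation; the function names are hypothetical.

```python
import math


def tdt_z(transmitted, untransmitted):
    """Classic TDT z-score: (t - nt) / sqrt(t + nt)."""
    n = transmitted + untransmitted
    if n == 0:
        return float("nan")  # no informative transmissions
    return (transmitted - untransmitted) / math.sqrt(n)


def two_sided_p(z):
    """Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # (fa-t, fa-nt), (mo-t, mo-nt) taken from the sample output rows
    rows = [("NOC2L", (1, 2), (0, 2)), ("KLHL17", (1, 2), (1, 1))]
    for group, (fa_t, fa_nt), (mo_t, mo_nt) in rows:
        z = tdt_z(fa_t + mo_t, fa_nt + mo_nt)
        print(group, round(z, 5), round(two_sided_p(z), 6))
```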