Difference between revisions of "GTDT"
From Genome Analysis Wiki
Revision as of 11:57, 4 September 2014
Introduction
- gTDT implements gene-based (group-wise) TDT (transmission disequilibrium test) for rare variants. It currently implements 6 tests for 6 models, M1-M6. It takes as input a ped file and a dat file that specify the family relationships and phenotypes, and a VCF file that stores genotype data.
- The 6 models are described as follows
M1: additive model with equal weights (sum statistics)
M2: additive model with unequal weights (weighted sum statistics)
M3: dominant model (i.e. carriers vs. non-carriers, or indicator model, or collapsing model)
M4: compound heterozygous model (CH), also called the 2-hits, multi-hits, or recessive model
M5: CH model with weighted sum on the rare variants that form compound heterozygotes, i.e. a hybrid of M4 and M2
M6: 2-allele additive model where haplotypes are classified into two alleles, one with rare variants and one with all common alleles
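The following is a minimal sketch of how the per-individual scores behind several of these models could be computed from a pair of phased haplotypes. It is an illustration of the model descriptions above, not gTDT's actual implementation: the function names, the 0/1 haplotype encoding, and the weights passed to the M2 score are all assumptions made for this example.

```python
from typing import Sequence

# A haplotype is a sequence of 0/1 flags, one per variant in the group:
# 1 = rare allele present, 0 = common allele (hypothetical encoding).
Haplotype = Sequence[int]


def sum_score(hap_a: Haplotype, hap_b: Haplotype) -> int:
    """M1-style score: number of rare alleles, all variants weighted equally."""
    return sum(hap_a) + sum(hap_b)


def weighted_sum_score(hap_a: Haplotype, hap_b: Haplotype,
                       weights: Sequence[float]) -> float:
    """M2-style score: rare alleles weighted per variant (weights are assumed)."""
    return sum(w * (a + b) for w, a, b in zip(weights, hap_a, hap_b))


def carrier_indicator(hap_a: Haplotype, hap_b: Haplotype) -> int:
    """M3-style score: 1 if the individual carries any rare allele, else 0."""
    return int(any(hap_a) or any(hap_b))


def compound_het_indicator(hap_a: Haplotype, hap_b: Haplotype) -> int:
    """M4-style score: 1 if both haplotypes carry at least one rare allele."""
    return int(any(hap_a) and any(hap_b))


def two_allele_count(hap_a: Haplotype, hap_b: Haplotype) -> int:
    """M6-style score: number of haplotypes (0, 1 or 2) carrying a rare allele."""
    return int(any(hap_a)) + int(any(hap_b))


# Example individual: two rare alleles on one haplotype, none on the other.
hap_a = [1, 0, 1]
hap_b = [0, 0, 0]
example_weights = [1.0, 2.0, 0.5]  # hypothetical weights
print(sum_score(hap_a, hap_b),
      weighted_sum_score(hap_a, hap_b, example_weights),
      carrier_indicator(hap_a, hap_b),
      compound_het_indicator(hap_a, hap_b),
      two_allele_count(hap_a, hap_b))
```

In this example the individual counts as a carrier under the dominant model but not as a compound heterozygote, because both rare alleles sit on the same haplotype.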
Usage
Running gtdt without any input displays the following message
The following parameters are available. Ones with "[]" are in effect:
pedfile : (-pname)
datfile : (-dname)
groupfile : (-gname)
vcffile : (-vname)
outfile : (-oname)
Additional Options
Groupwise test : --sum [OFF] , --wss [OFF] , --cmc [OFF] , --comphet [OFF] , --comphet-weight [OFF] , --all [ON]
Variant filters : --min_qual [0.00], --min_maf [0.00], --max_maf [0.01], --min_avg_dp [0.00], --max_avg_dp [0.00], --max_missing_rate [1.00], --pass [OFF]
Genotype filters : --min_dp [0], --min_gq [0.00], --missing_as_ref [OFF]
Phased genotypes : --phased [OFF]
Mendelian inconsistency : --ignore_mi_trio [OFF]
Disease to analyze : --disease []
Non-autosomes : --chrX [X], --chrY [Y], --MT [MT]
Empirical p-values : --permute [0], --seed [13579]
Multi-threading : --nthreads [1]
- Example 1: using default parameters
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt
- Example 2: analyze disease_A as specified in the .dat file. The .ped and .dat files can contain multiple phenotypes
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --disease disease_A
- Example 3: using a MAF cutoff of 0.05
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05
- Example 4: using a MAF cutoff of 0.05 and the phased genotypes provided in the VCF
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05 --phased
- Example 5: using a minimum depth of 10, so that a variant is filtered in a trio if any individual in the trio has depth lower than 10
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --min_dp 10
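When gtdt is run from a script, the command lines above can be assembled programmatically. The sketch below is one possible wrapper, using only the options shown in the examples on this page. The function name `build_gtdt_command` and its keyword arguments are hypothetical, and it assumes the `gtdt` executable is on the PATH.

```python
from typing import List, Optional


def build_gtdt_command(ped: str, dat: str, grp: str, vcf: str, out: str,
                       disease: Optional[str] = None,
                       max_maf: Optional[float] = None,
                       phased: bool = False,
                       min_dp: Optional[int] = None) -> List[str]:
    """Build an argument list for gtdt from the options used in the examples."""
    cmd = ["gtdt", "-p", ped, "-d", dat, "-g", grp, "-v", vcf, "-o", out]
    if disease is not None:
        cmd += ["--disease", disease]
    if max_maf is not None:
        cmd += ["--max_maf", str(max_maf)]
    if phased:
        cmd.append("--phased")
    if min_dp is not None:
        cmd += ["--min_dp", str(min_dp)]
    return cmd


# Same command line as the example with a MAF cutoff and phased genotypes.
cmd = build_gtdt_command("fam.ped", "fam.dat", "input.grp", "input.vcf",
                         "out.txt", max_maf=0.05, phased=True)
print(" ".join(cmd))
# To actually run it (requires gtdt to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.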
Input
- A ped file with 6 columns [see merlin documentation]. An example ped file is as follows
fam1 p1 0 0 1 0
fam1 p2 0 0 2 0
fam1 p3 p1 p2 1 2
fam2 p4 0 0 1 0
fam2 p5 0 0 2 0
fam2 p6 p4 p5 1 2
- A dat file describes column 6 and beyond in the ped file. Here it is the affection status for disease_A
A disease_A
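As an illustration of how the ped and dat files fit together, the sketch below reads both and lists the trios (individuals with both parents specified). It is not part of gTDT. It assumes whitespace-separated files, a dat file that contains only affection rows (`A name`), and that a parent ID of `0` means "no parent", as in the example above. All function and class names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Person(NamedTuple):
    family: str
    name: str
    father: str
    mother: str
    sex: str
    traits: Dict[str, str]  # trait name -> raw value from the ped file


def parse_dat(text: str) -> List[str]:
    """Return the names of affection traits ('A' rows), in file order."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "A":
            names.append(fields[1])
    return names


def parse_ped(text: str, trait_names: List[str]) -> List[Person]:
    """Parse ped lines; columns 6 and beyond are labeled using the dat file."""
    people = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        traits = dict(zip(trait_names, fields[5:]))
        people.append(Person(fields[0], fields[1], fields[2], fields[3],
                             fields[4], traits))
    return people


def find_trios(people: List[Person]) -> List[Tuple[str, str, str, str]]:
    """Return (family, child, father, mother) for individuals with both parents."""
    return [(p.family, p.name, p.father, p.mother)
            for p in people if p.father != "0" and p.mother != "0"]


ped_text = """fam1 p1 0 0 1 0
fam1 p2 0 0 2 0
fam1 p3 p1 p2 1 2
fam2 p4 0 0 1 0
fam2 p5 0 0 2 0
fam2 p6 p4 p5 1 2"""
dat_text = "A disease_A"

people = parse_ped(ped_text, parse_dat(dat_text))
print(find_trios(people))
```

For the example files this yields two trios, with p3 and p6 as the children.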
- A VCF file [VCF specs]. It can contain variant information for more individuals than in the ped file.
- The genotypes in the VCF file can be either phased or unphased. If unphased, gTDT will phase the genotypes by transmission.
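Phasing by transmission can be illustrated at a single biallelic site: the child's two alleles are assigned to the father and the mother whenever the parental genotypes leave only one possibility. The sketch below is a simplified illustration under stated assumptions, not gTDT's actual algorithm. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with no missing data, and the function name is hypothetical.

```python
from typing import Optional, Tuple

# Alleles a parent can transmit, keyed by the parent's alternate-allele count.
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def phase_child_by_transmission(father: int, mother: int,
                                child: int) -> Optional[Tuple[int, int]]:
    """Return (paternal_allele, maternal_allele) for the child.

    Returns None when the phase is ambiguous (e.g. all three individuals are
    heterozygous) or when the trio is Mendelian-inconsistent.
    """
    options = [(p, m)
               for p in TRANSMITTABLE[father]
               for m in TRANSMITTABLE[mother]
               if p + m == child]
    return options[0] if len(options) == 1 else None


print(phase_child_by_transmission(0, 1, 1))  # alternate allele came from the mother
print(phase_child_by_transmission(1, 1, 1))  # ambiguous: all heterozygous
print(phase_child_by_transmission(0, 0, 1))  # Mendelian inconsistency
```

Enumerating the possible transmissions keeps the ambiguous and inconsistent cases explicit, instead of hiding them in a chain of special cases.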
Output
Group nVar maf_sum carrier-ca:co:unphe allele-ca:co:unphe fa-t:nt mo-t:nt p_s z_s p_ws z_ws p_c z_c p_ch z_ch p_ch_w z_ch_w p_2a_add z_2a_add
OR4F5 0 0 0:0:0 0:0:0 0:0 0:0 nan nan nan nan nan nan nan nan nan nan nan nan
SAMD11 8 0.0317978 8:0:14 10:0:15 4:3 6:2 0.225253 1.21268 0.352245 0.930244 0.376344 0.884652 0.414216 0.816497 0.220113 1.22623 0.376344 0.884652
NOC2L 3 0.0105932 2:0:5 2:0:5 1:2 0:2 0.179712 -1.34164 0.148039 -1.44649 0.179712 -1.34164 nan nan nan nan 0.179712 -1.34164
KLHL17 5 0.0105932 2:0:5 2:0:5 1:2 1:1 0.654721 -0.447214 0.654721 -0.447214 0.654721 -0.447214 nan nan nan nan 0.654721 -0.447214
- By default there are 19 columns in the output. The meaning of each column is as follows
Group: group/gene name
nVar: # of variants included in the analysis
maf_sum: sum of MAF across all variants included in the analysis
carrier-ca:co:unphe: # of rare variant carriers in the case:control:unphenotyped groups
allele-ca:co:unphe: # of rare alleles in the case:control:unphenotyped groups
fa-t:nt: father # of transmitted:non-transmitted alleles
mo-t:nt: mother # of transmitted:non-transmitted alleles
p_s: p-value of equal weight sum statistics
z_s: z-score of equal weight sum statistics
p_ws: p-value of weighted sum statistics
z_ws: z-score of weighted sum statistics
p_c: p-value of the collapsing test (i.e. carrier vs. non-carrier, considered as the dominant model)
z_c: z-score of the collapsing test
p_ch: p-value of the compound heterozygous (CH) model
z_ch: z-score of the CH model
p_ch_w: p-value of the CH model with weights
z_ch_w: z-score of the CH model with weights
p_2a_add: p-value of the 2-allele additive model where haplotypes are classified into two alleles, one with rare variants and one with all common alleles
z_2a_add: z-score of the 2-allele additive model
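The output can be post-processed with a short script. The sketch below parses the header and rows into dictionaries, splits the colon-separated count fields, and recomputes a two-sided normal p-value from a z-score. It assumes whitespace-separated columns. The two-sided normal relationship between the z and p columns is inferred from the example rows above, not from the gTDT source, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def parse_gtdt_output(text: str) -> List[Dict[str, str]]:
    """Parse gTDT output text into one dict per group, keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def split_counts(field: str) -> List[int]:
    """Split a colon-separated count field such as '2:0:5' into integers."""
    return [int(x) for x in field.split(":")]


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


output_text = """Group nVar maf_sum carrier-ca:co:unphe allele-ca:co:unphe fa-t:nt mo-t:nt p_s z_s p_ws z_ws p_c z_c p_ch z_ch p_ch_w z_ch_w p_2a_add z_2a_add
NOC2L 3 0.0105932 2:0:5 2:0:5 1:2 0:2 0.179712 -1.34164 0.148039 -1.44649 0.179712 -1.34164 nan nan nan nan 0.179712 -1.34164"""

rows = parse_gtdt_output(output_text)
row = rows[0]
print(row["Group"], split_counts(row["fa-t:nt"]), split_counts(row["mo-t:nt"]))
# Compare the reported p_s with the value recomputed from z_s.
print(float(row["p_s"]), two_sided_p(float(row["z_s"])))
```

Values reported as `nan` (for example the CH columns when no compound heterozygotes are present) convert to a float NaN and should be filtered out before any downstream ranking or multiple-testing correction.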