GTDT
From Genome Analysis Wiki
Revision as of 12:06, 4 September 2014
Introduction
- gTDT implements gene-based (group-wise) transmission disequilibrium tests (TDT) for rare variants. It currently implements six tests for six models, M1-M6. It takes as input a ped file and a dat file, which together specify the family relationships and phenotypes, and a VCF file, which stores the genotype data.
- The 6 models are as follows:
M1: additive model with equal weights (sum statistic)
M2: additive model with unequal weights (weighted sum statistic)
M3: dominant model (i.e. carriers vs. non-carriers; also called the indicator or collapsing model)
M4: compound heterozygous (CH) model, also called the 2-hit, multi-hit, or recessive model
M5: CH model with a weighted sum over the rare variants that form compound heterozygotes, i.e. a hybrid of M4 and M2
M6: 2-allele additive model, in which haplotypes are classified into two alleles: one carrying rare variants and one carrying only common alleles
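To make the codings concrete, here is a minimal Python sketch of how one haplotype (a 0/1 vector marking rare alleles at the variants in a group) might be scored under several of these models. The function names are hypothetical, and the M2 weights 1/sqrt(maf*(1-maf)) are an assumption borrowed from Madsen-Browning-style weighted-sum tests; this page does not say which weights gTDT uses.

```python
import math
from typing import Sequence

# A haplotype is a 0/1 vector over the variants in one group:
# 1 means the haplotype carries the rare allele at that variant.
Haplotype = Sequence[int]


def additive_score(hap: Haplotype) -> float:
    """M1-style coding: count of rare alleles on the haplotype (equal weights)."""
    return float(sum(hap))


def weighted_score(hap: Haplotype, mafs: Sequence[float]) -> float:
    """M2-style coding. ASSUMPTION: Madsen-Browning-style weights
    1/sqrt(maf * (1 - maf)); every maf must lie strictly between 0 and 1."""
    return sum(a / math.sqrt(m * (1.0 - m)) for a, m in zip(hap, mafs))


def carrier_score(hap: Haplotype) -> float:
    """Indicator coding: 1 if the haplotype carries any rare allele, else 0.
    This is the two-allele split described for M6; M3 applies the same
    carrier/non-carrier idea to individuals."""
    return 1.0 if any(hap) else 0.0


def is_compound_het(hap1: Haplotype, hap2: Haplotype) -> bool:
    """M4-style coding: both haplotypes of one individual carry a rare allele."""
    return any(hap1) and any(hap2)


if __name__ == "__main__":
    h1, h2 = [1, 0, 1], [0, 1, 0]
    print(additive_score(h1))       # 2.0
    print(carrier_score(h2))        # 1.0
    print(is_compound_het(h1, h2))  # True
```

Under this sketch, M5 would combine the two ideas by applying weighted_score only to individuals for whom is_compound_het is true; how gTDT actually forms its test statistics from these codings is not described here.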
Usage
Running gtdt without any arguments displays the following message:
The following parameters are available. Ones with "[]" are in effect:
  pedfile : (-pname)
  datfile : (-dname)
  groupfile : (-gname)
  vcffile : (-vname)
  outfile : (-oname)
Additional Options
  Groupwise test : --sum [OFF], --wss [OFF], --cmc [OFF], --comphet [OFF], --comphet-weight [OFF], --all [ON]
  Variant filters : --min_qual [0.00], --min_maf [0.00], --max_maf [0.01], --min_avg_dp [0.00], --max_avg_dp [0.00], --max_missing_rate [1.00], --pass [OFF]
  Genotype filters : --min_dp [0], --min_gq [0.00], --missing_as_ref [OFF]
  Phased genotypes : --phased [OFF]
  Mendelian inconsistency : --ignore_mi_trio [OFF]
  Disease to analyze : --disease []
  Non-autosomes : --chrX [X], --chrY [Y], --MT [MT]
  Empirical p-values : --permute [0], --seed [13579]
  Multi-threading : --nthreads [1]
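One plausible reading of the site-level variant filters is sketched below in Python. The defaults are copied from the help message; the inclusive MAF bounds, the treatment of --max_avg_dp 0 as "no upper limit", and the reading of --pass as "keep only variants whose FILTER is PASS" are assumptions, not documented gTDT behavior. VariantFilters and variant_passes are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VariantFilters:
    """Defaults copied from the gtdt help message; semantics are assumed."""
    min_qual: float = 0.0
    min_maf: float = 0.0
    max_maf: float = 0.01
    min_avg_dp: float = 0.0
    max_avg_dp: float = 0.0        # ASSUMPTION: 0 disables the upper bound
    max_missing_rate: float = 1.0
    require_pass: bool = False     # stands in for --pass


def variant_passes(f: VariantFilters, qual: float, maf: float, avg_dp: float,
                   missing_rate: float, filter_field: str) -> bool:
    """Return True if a variant survives the site-level filters."""
    if qual < f.min_qual:
        return False
    if not (f.min_maf <= maf <= f.max_maf):   # inclusive bounds assumed
        return False
    if avg_dp < f.min_avg_dp:
        return False
    if f.max_avg_dp > 0 and avg_dp > f.max_avg_dp:
        return False
    if missing_rate > f.max_missing_rate:
        return False
    if f.require_pass and filter_field != "PASS":
        return False
    return True
```

In this sketch, the "--max_maf 0.05" used in the examples corresponds to VariantFilters(max_maf=0.05); with the default 0.01 cutoff a variant with MAF 0.03 would be dropped.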
- Example 1: using default parameters
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt
- Example 2: analyze disease_A, as specified in the .dat file. The .ped and .dat files can contain multiple phenotypes
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --disease disease_A
- Example 3: using a MAF cutoff of 0.05
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05
- Example 4: using a MAF cutoff of 0.05 and the phased genotypes provided in the VCF
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05 --phased
- Example 5: using a minimum depth of 10, so that a variant is filtered out in a trio if any individual in the trio has depth lower than 10
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --min_dp 10
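Example 5 drops a variant for a whole trio as soon as one member falls below --min_dp. A minimal Python sketch of that rule follows; trio_genotypes_pass is a hypothetical helper, and both applying --min_gq in the same way and treating a missing DP/GQ value as a failure are assumptions.

```python
from typing import Optional, Sequence


def trio_genotypes_pass(depths: Sequence[Optional[int]],
                        gqs: Sequence[Optional[float]],
                        min_dp: int = 0,
                        min_gq: float = 0.0) -> bool:
    """Return True if the variant should be kept for this trio.

    depths and gqs hold the DP and GQ values for father, mother and child.
    One member below --min_dp drops the variant for the trio (Example 5).
    ASSUMPTION: --min_gq works the same way, and a missing value (None)
    fails any threshold greater than zero.
    """
    for dp, gq in zip(depths, gqs):
        if min_dp > 0 and (dp is None or dp < min_dp):
            return False
        if min_gq > 0 and (gq is None or gq < min_gq):
            return False
    return True
```

For instance, with min_dp=10 a trio with depths 12, 15 and 9 fails because the third member is below the cutoff, while depths 12, 15 and 10 pass.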
Input
- A ped file with 6 columns [see the Merlin documentation]. An example ped file is as follows:
fam1 p1 0 0 1 0
fam1 p2 0 0 2 0
fam1 p3 p1 p2 1 2
fam2 p4 0 0 1 0
fam2 p5 0 0 2 0
fam2 p6 p4 p5 1 2
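The example contains two trios, each with two founders (parents coded "0") and one child. A minimal Python sketch of reading such a file and extracting the trios is shown below, assuming whitespace-separated fields and numeric sex codes (1 = male, 2 = female, as in Merlin); parse_ped and find_trios are hypothetical helpers, not part of gTDT.

```python
from typing import List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    family: str
    individual: str
    father: str                   # "0" = parent not in the pedigree (founder)
    mother: str
    sex: int                      # 1 = male, 2 = female (numeric codes assumed)
    phenotypes: Tuple[str, ...]   # column 6 onward, described by the dat file


def parse_ped(text: str) -> List[PedRecord]:
    """Parse whitespace-separated ped lines, skipping blank lines."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        fam, ind, fa, mo, sex, *phen = fields
        records.append(PedRecord(fam, ind, fa, mo, int(sex), tuple(phen)))
    return records


def find_trios(records: List[PedRecord]) -> List[Tuple[str, str, str, str]]:
    """(family, father, mother, child) for every individual with both parents listed."""
    return [(r.family, r.father, r.mother, r.individual)
            for r in records if r.father != "0" and r.mother != "0"]
```

Applied to the example above, find_trios(parse_ped(...)) yields the trios (fam1, p1, p2, p3) and (fam2, p4, p5, p6).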
- A dat file containing information about column 6 and beyond in the ped file. Here it is the affection status for disease_A:
A disease_A
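In this dat file the single line "A disease_A" declares one affection trait, so ped column 6 holds the disease_A status, coded in the Merlin convention (2 = affected, 1 = unaffected, 0 = unknown); in the example ped file the children are affected and the parents unknown. A minimal Python sketch of resolving a --disease name to its phenotype column is below. affection_columns and affection_status are hypothetical helpers, and the assumption that every dat line with a type code and a name describes exactly one ped column is a simplification of the full Merlin format.

```python
from typing import Dict


def affection_columns(dat_text: str) -> Dict[str, int]:
    """Map each affection trait (type code 'A') to its 0-based index among the
    ped phenotype columns, so ped column 6 is index 0.
    ASSUMPTION: each dat line with a code and a name describes one ped column."""
    columns: Dict[str, int] = {}
    index = 0
    for line in dat_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        code, name = fields[0], fields[1]
        if code == "A":
            columns[name] = index
        index += 1
    return columns


def affection_status(value: str) -> str:
    """Merlin coding: 2 = affected, 1 = unaffected, anything else = unknown."""
    return {"2": "affected", "1": "unaffected"}.get(value, "unknown")
```

For example, with col = affection_columns("A disease_A")["disease_A"] (which is 0), the child line "fam1 p3 p1 p2 1 2" has "fam1 p3 p1 p2 1 2".split()[5 + col] equal to "2", i.e. affected.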
- A VCF file [VCF specs]. It can contain genotype data for more individuals than are listed in the ped file.
- The genotypes in the VCF file can be either phased or unphased. If they are unphased, gTDT phases them by transmission.
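Phasing by transmission can be illustrated at a single biallelic site: when the parents' genotypes determine which of the child's alleles came from which parent, the child's genotype splits into a paternal and a maternal allele, and each parent's non-transmitted allele follows. The Python sketch below is a simplified single-site illustration with hypothetical function names; gTDT phases haplotypes across all variants in a group and may treat ambiguous or Mendelian-inconsistent trios differently.

```python
from typing import Optional, Tuple


def _alleles(g: int) -> Tuple[int, ...]:
    """Alleles (0 = ref, 1 = alt) a parent with alt-allele count g can transmit."""
    return {0: (0,), 1: (0, 1), 2: (1,)}[g]


def phase_child(fa: int, mo: int, ch: int) -> Optional[Tuple[int, int]]:
    """Return the child's (paternal, maternal) alleles, or None.

    Genotypes are alt-allele counts (0, 1, 2) at one biallelic site.
    None means either a Mendelian inconsistency (no valid split) or an
    ambiguous trio (more than one valid split, e.g. all heterozygous).
    """
    candidates = {(p, m) for p in _alleles(fa) for m in _alleles(mo) if p + m == ch}
    if len(candidates) != 1:
        return None
    return candidates.pop()


def parental_transmission(fa: int, mo: int, ch: int):
    """Return ((father_t, father_nt), (mother_t, mother_nt)) alt-allele counts."""
    phased = phase_child(fa, mo, ch)
    if phased is None:
        return None
    pat, mat = phased
    return (pat, fa - pat), (mat, mo - mat)
```

For example, phase_child(0, 1, 1) returns (0, 1): the heterozygous child must have received the alternate allele from the mother. A trio in which all three members are heterozygous cannot be phased from genotypes alone, so the sketch returns None; transmitted and non-transmitted counts like those reported in the fa-t:nt and mo-t:nt output columns can only be tallied for phased trios.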
Output
Group nVar maf_sum carrier-ca:co:unphe allele-ca:co:unphe fa-t:nt mo-t:nt p_s z_s p_ws z_ws p_c z_c p_ch z_ch p_ch_w z_ch_w p_2a_add z_2a_add
OR4F5 0 0 0:0:0 0:0:0 0:0 0:0 nan nan nan nan nan nan nan nan nan nan nan nan
SAMD11 8 0.0317978 8:0:14 10:0:15 4:3 6:2 0.225253 1.21268 0.352245 0.930244 0.376344 0.884652 0.414216 0.816497 0.220113 1.22623 0.376344 0.884652
NOC2L 3 0.0105932 2:0:5 2:0:5 1:2 0:2 0.179712 -1.34164 0.148039 -1.44649 0.179712 -1.34164 nan nan nan nan 0.179712 -1.34164
KLHL17 5 0.0105932 2:0:5 2:0:5 1:2 1:1 0.654721 -0.447214 0.654721 -0.447214 0.654721 -0.447214 nan nan nan nan 0.654721 -0.447214
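Because several columns pack colon-separated counts and untestable groups (such as OR4F5 with nVar 0) report nan, a small reader helps when post-processing results. The Python sketch below assumes the file is whitespace-delimited with a single header line; parse_gtdt_output is a hypothetical helper, not part of gTDT.

```python
from typing import Dict, List, Union

Value = Union[str, int, float, tuple]

# Columns holding colon-separated integer counts.
COLON_FIELDS = {"carrier-ca:co:unphe", "allele-ca:co:unphe", "fa-t:nt", "mo-t:nt"}


def parse_gtdt_output(text: str) -> List[Dict[str, Value]]:
    """Parse gtdt output (assumed whitespace-delimited, one header line)."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    records = []
    for fields in rows:
        rec: Dict[str, Value] = {}
        for name, value in zip(header, fields):
            if name == "Group":
                rec[name] = value
            elif name == "nVar":
                rec[name] = int(value)
            elif name in COLON_FIELDS:
                rec[name] = tuple(int(x) for x in value.split(":"))
            else:
                rec[name] = float(value)   # float("nan") covers the nan entries
        records.append(rec)
    return records
```

On the example above, the SAMD11 record would have "fa-t:nt" equal to (4, 3) and "carrier-ca:co:unphe" equal to (8, 0, 14), while every p and z value for OR4F5 would be a float NaN.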
- By default the output has 19 columns, described as follows:
Group: group/gene name
nVar: number of variants included in the analysis
maf_sum: sum of MAF across all variants included in the analysis
carrier-ca:co:unphe: number of rare-variant carriers in the case:control:unphenotyped groups
allele-ca:co:unphe: number of rare alleles in the case:control:unphenotyped groups
fa-t:nt: number of alleles transmitted:not transmitted by fathers
mo-t:nt: number of alleles transmitted:not transmitted by mothers
p_s: p-value of M1
z_s: z-score of M1
p_ws: p-value of M2
z_ws: z-score of M2
p_c: p-value of M3
z_c: z-score of M3
p_ch: p-value of M4
z_ch: z-score of M4
p_ch_w: p-value of M5
z_ch_w: z-score of M5
p_2a_add: p-value of M6
z_2a_add: z-score of M6
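In the example rows above, each reported p-value is consistent with a two-sided standard-normal p-value of the neighboring z-score (e.g. z_s = -1.34164 gives p_s of about 0.1797 for NOC2L). The Python sketch below shows that conversion together with the classic allelic TDT z-score (T - NT)/sqrt(T + NT), where T and NT sum the fa-t:nt and mo-t:nt counts. The classic formula is only an illustration: it reproduces z_s for NOC2L (T = 1, NT = 4) and KLHL17 (T = 2, NT = 3) but not for SAMD11 (T = 10, NT = 5), so gTDT's M1 statistic is evidently more involved. The function names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value P(|Z| >= |z|) for a standard-normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def classic_tdt_z(t: int, nt: int) -> float:
    """Classic allelic TDT z-score (T - NT) / sqrt(T + NT).

    Returns NaN when there are no transmissions to count.
    """
    n = t + nt
    if n == 0:
        return float("nan")
    return (t - nt) / math.sqrt(n)


if __name__ == "__main__":
    # NOC2L: fa-t:nt = 1:2 and mo-t:nt = 0:2, so T = 1 and NT = 4.
    z = classic_tdt_z(1, 4)
    print(round(z, 5), round(two_sided_p(z), 4))
```

Note that when --permute is set the p-values are presumably empirical rather than normal-based, so this relation should only be expected for the default run.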