Difference between revisions of "GTDT"
From Genome Analysis Wiki
Latest revision as of 08:18, 26 March 2015
Introduction
- gTDT implements gene-based (group-wise) TDT for rare-variant aggregation analysis. It currently implements haplotype-based tests for 6 models, M1-M6. It takes as input a ped file and a dat file that specify the family relationships, and a VCF file that stores the genotype data.
- The 6 models are as follows:
 M1 (AD): additive model with equal weights (sum statistic)
 M2 (wAD): additive model with unequal weights (weighted sum statistic)
 M3 (DOM): dominant model (i.e. carriers vs. non-carriers; also called the indicator or collapsing model)
 M4 (CH): compound heterozygous model, also called the 2-hit or multi-hit model, or recessive model
 M5 (wCH): CH model with a weighted sum over the rare variants that form compound heterozygotes, i.e. a hybrid of M4 and M2
 M6 (2AD): 2-allele additive model, in which haplotypes are classified into two alleles: one carrying rare variants and one carrying only common alleles
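To make the six model descriptions concrete, here is a minimal Python sketch of how each one could be read as a score for one individual's pair of haplotypes. It is only an illustration of the descriptions above: the function name `burden_scores`, the 0/1 haplotype encoding, and the exact scoring rules are assumptions, and the sketch does not reproduce gTDT's actual test statistics, which compare transmitted and non-transmitted haplotypes.

```python
from typing import Dict, Sequence


def burden_scores(hap1: Sequence[int], hap2: Sequence[int],
                  weights: Sequence[float]) -> Dict[str, float]:
    """Score one individual's two haplotypes under the six model descriptions.

    hap1 and hap2 hold 0/1 rare-allele indicators, one entry per variant in
    the group. Illustrative coding only (assumption), not gTDT's implementation.
    """
    if not (len(hap1) == len(hap2) == len(weights)):
        raise ValueError("haplotypes and weights must have the same length")

    count1, count2 = sum(hap1), sum(hap2)
    wsum1 = sum(w * a for w, a in zip(weights, hap1))
    wsum2 = sum(w * a for w, a in zip(weights, hap2))
    # Compound heterozygote: at least one rare allele on each haplotype.
    both_carry = count1 > 0 and count2 > 0

    return {
        "AD": float(count1 + count2),                    # unweighted sum
        "wAD": wsum1 + wsum2,                            # weighted sum
        "DOM": float(count1 + count2 > 0),               # carrier indicator
        "CH": float(both_carry),                         # compound-het indicator
        "wCH": (wsum1 + wsum2) if both_carry else 0.0,   # weighted sum, CH only
        "2AD": float((count1 > 0) + (count2 > 0)),       # rare-carrying haplotypes
    }


# Two variants in one group; rare alleles sit on different haplotypes.
print(burden_scores([1, 0], [0, 1], [2.1, 2.5]))
```

In this reading, an individual carrying two rare alleles on the same haplotype scores 2 under AD but 0 under CH, which is the distinction the compound heterozygous models are meant to capture.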
Usage
Running gtdt without any arguments displays the following message:
 The following parameters are available. Ones with "[]" are in effect:
   pedfile : (-pname)
   datfile : (-dname)
   groupfile : (-gname)
   vcffile : (-vname)
   outfile : (-oname)

 Additional Options
   Groupwise test : --sum [OFF] , --wss [OFF] , --cmc [OFF] , --comphet [OFF] , --comphet-weight [OFF] , --all [ON]
   Variant filters : --min_qual [0.00], --min_maf [0.00], --max_maf [0.01], --min_avg_dp [0.00], --max_avg_dp [0.00], --max_missing_rate [1.00], --pass [OFF]
   Genotype filters : --min_dp [0], --min_gq [0.00], --missing_as_ref [OFF]
   Phased genotypes : --phased [OFF]
   Mendelian inconsistency : --ignore_mi_trio [OFF]
   Disease to analyze : --disease []
   Non-autosomes : --chrX [X], --chrY [Y], --MT [MT]
   Empirical p-values : --permute [0], --seed [13579]
   Multi-threading : --nthreads [1]
- Example 1: using default parameters
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt
- Example 2: analyzing the disease disease_A specified in the .dat file. The .ped and .dat files can contain multiple phenotypes
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --disease disease_A
- Example 3: using a MAF cutoff of 0.05
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05
- Example 4: using a MAF cutoff of 0.05 and the phased genotypes provided in the VCF
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --max_maf 0.05 --phased
- Example 5: using a minimum depth of 10, so that a variant is filtered out in a trio if any individual in the trio has depth lower than 10
gtdt -p fam.ped -d fam.dat -g input.grp -v input.vcf -o out.txt --min_dp 10
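When gtdt is run from a script, for instance once per phenotype or per MAF cutoff, the command lines above can be assembled programmatically. The sketch below is one possible way to do that in Python. The helper name `build_gtdt_command` and its keyword arguments are hypothetical; it only uses options that appear in the usage message and examples above.

```python
import shlex
from typing import List, Optional


def build_gtdt_command(ped: str, dat: str, grp: str, vcf: str, out: str,
                       disease: Optional[str] = None,
                       max_maf: Optional[float] = None,
                       min_dp: Optional[int] = None,
                       phased: bool = False) -> List[str]:
    """Assemble a gtdt argument list from the documented options (hypothetical helper)."""
    cmd = ["gtdt", "-p", ped, "-d", dat, "-g", grp, "-v", vcf, "-o", out]
    if disease is not None:
        cmd += ["--disease", disease]
    if max_maf is not None:
        cmd += ["--max_maf", str(max_maf)]
    if min_dp is not None:
        cmd += ["--min_dp", str(min_dp)]
    if phased:
        cmd.append("--phased")
    return cmd


# Reproduces the MAF-cutoff-plus-phased example as a printable command line.
cmd = build_gtdt_command("fam.ped", "fam.dat", "input.grp", "input.vcf",
                         "out.txt", max_maf=0.05, phased=True)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with unusual file names.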
Input
- A ped file with 6 columns [see merlin documentation]. An example ped file is as follows:
 fam1 p1 0 0 1 0
 fam1 p2 0 0 2 0
 fam1 p3 p1 p2 1 2
 fam2 p4 0 0 1 0
 fam2 p5 0 0 2 0
 fam2 p6 p4 p5 1 2
- A dat file that describes column 6 and beyond of the ped file. Here column 6 is the affection status for disease_A:
A disease_A
- A VCF file [VCF specs]. It can contain variant information for more individuals than are listed in the ped file. The genotypes in the VCF file can be either phased or unphased. If they are unphased, gTDT will phase the genotypes by transmission.
- A group file indicating which variants should be analyzed as a unit. It has 3 required columns, plus optional additional columns for weights and other information. For example:
 #CHR POS Group
 1 10000 gene1
 1 20000 gene1
 2 15000 gene2
 2 25000 gene2
 2 35000 gene2
If you have specific weights for each variant, you can provide them in column 4:
 #CHR POS Group WEIGHT
 1 10000 gene1 2.1
 1 20000 gene1 2.5
 2 15000 gene2 1.5
 2 25000 gene2 2.2
 2 35000 gene2 2.8
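It can be useful to sanity-check a group file before running an analysis. The following Python sketch reads the whitespace-separated layout shown above and collects the variants of each group. The function name `read_groups` is hypothetical, and the default weight of 1.0 for files without a fourth column is an assumption made for this example, not documented gTDT behaviour.

```python
from typing import Dict, Iterable, List, Tuple

Variant = Tuple[str, int, float]  # (chromosome, position, weight)


def read_groups(lines: Iterable[str],
                default_weight: float = 1.0) -> Dict[str, List[Variant]]:
    """Collect variants per group from group-file lines (hypothetical helper)."""
    groups: Dict[str, List[Variant]] = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#CHR ...' header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got: {line!r}")
        chrom, pos, group = fields[0], int(fields[1]), fields[2]
        # Column 4 is optional; default_weight is an assumption of this sketch.
        weight = float(fields[3]) if len(fields) > 3 else default_weight
        groups.setdefault(group, []).append((chrom, pos, weight))
    return groups


example = """#CHR POS Group WEIGHT
1 10000 gene1 2.1
1 20000 gene1 2.5
2 15000 gene2 1.5
2 25000 gene2 2.2
2 35000 gene2 2.8
""".splitlines()

for name, variants in read_groups(example).items():
    print(name, len(variants), variants)
```

To read from disk, pass an open file handle, e.g. `read_groups(open("input.grp"))`.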
Output
 Group nVar maf_sum carrier-ca:co:unphe allele-ca:co:unphe fa-t:nt mo-t:nt p_s z_s p_ws z_ws p_c z_c p_ch z_ch p_ch_w z_ch_w p_2a_add z_2a_add
 OR4F5 0 0 0:0:0 0:0:0 0:0 0:0 nan nan nan nan nan nan nan nan nan nan nan nan
 SAMD11 8 0.0317978 8:0:14 10:0:15 4:3 6:2 0.225253 1.21268 0.352245 0.930244 0.376344 0.884652 0.414216 0.816497 0.220113 1.22623 0.376344 0.884652
 NOC2L 3 0.0105932 2:0:5 2:0:5 1:2 0:2 0.179712 -1.34164 0.148039 -1.44649 0.179712 -1.34164 nan nan nan nan 0.179712 -1.34164
 KLHL17 5 0.0105932 2:0:5 2:0:5 1:2 1:1 0.654721 -0.447214 0.654721 -0.447214 0.654721 -0.447214 nan nan nan nan 0.654721 -0.447214
- By default there are 19 columns in the output. The meaning of each column is as follows:
 Group: group/gene name
 nVar: # of variants included in the analysis
 maf_sum: sum of MAF across all variants included in the analysis
 carrier-ca:co:unphe: # of rare-variant carriers in the case:control:unphenotyped groups
 allele-ca:co:unphe: # of rare alleles in the case:control:unphenotyped groups
 fa-t:nt: father # of transmitted:non-transmitted alleles
 mo-t:nt: mother # of transmitted:non-transmitted alleles
 p_s: p-value of M1
 z_s: z-score of M1
 p_ws: p-value of M2
 z_ws: z-score of M2
 p_c: p-value of M3
 z_c: z-score of M3
 p_ch: p-value of M4
 z_ch: z-score of M4
 p_ch_w: p-value of M5
 z_ch_w: z-score of M5
 p_2a_add: p-value of M6
 z_2a_add: z-score of M6
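In the sample output above, each p-value agrees with the two-sided standard normal tail probability of its paired z-score. This is an observation from the example numbers, not a statement about how gTDT computes its p-values internally. The short Python check below shows the relationship on a few (z, p) pairs copied from the sample rows; the function name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided standard normal p-value for a z-score; nan stays nan."""
    if math.isnan(z):
        return float("nan")
    # P(|Z| >= |z|) for Z ~ N(0, 1), written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


# (gene, column pair, z-score, reported p-value) taken from the sample output.
SAMPLE = [
    ("SAMD11", "z_s/p_s", 1.21268, 0.225253),
    ("SAMD11", "z_ws/p_ws", 0.930244, 0.352245),
    ("NOC2L", "z_s/p_s", -1.34164, 0.179712),
    ("KLHL17", "z_s/p_s", -0.447214, 0.654721),
]

for gene, cols, z, reported in SAMPLE:
    print(f"{gene:7s} {cols:10s} z={z:9.6f} "
          f"computed={two_sided_p(z):.6f} reported={reported:.6f}")
```

A check like this is a quick way to confirm that an output file was parsed with the columns in the right order.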
Download
Source code of v0.01: download here. Pre-compiled binary of v0.01: download here.
Contact
For questions please contact the authors (Bingshan Li: bingshan.li@vanderbilt.edu)