From Genome Analysis Wiki
, 23:28, 27 August 2014
Genetic Model-based Simulator [GMS] is an efficient C++ program for simulating case-control data sets based on genetic models. The input is a pool of haplotypes and a text file that specifies the model.
The output is a set of simulated data sets in Merlin ped file format.
== Basic Usage Example ==
In a typical command line, a few options need to be specified together with the input files.
Here is an example of how to run GMS:
./GMS --hapfile test.hap --snplist test.lst --model model.heter.txt --f0 0.01 --nrep 100 --ncase 250 --nctrl 250 --causal --prefix tmp
== Command Line Options ==
=== Basic Output Options ===
--hapfile a pool of simulated or real haplotypes, one chromosome per row
--snplist SNP names in the order of the haplotypes in hapfile, one SNP per row
--model a model file specifying genetic models, see below for details
--nrep the number of replications
--seed seed for random number generator
--ncase the number of cases in each replicate
--nctrl the number of controls in each replicate
--f0 overall baseline prevalence
--prefix prefix of output files (e.g. prefix.rep1.ped, prefix.rep2.ped)
--causal only generate causal SNPs in the output pedigree file
=== Model File Annotation ===
The model file consists of one header line followed by one or more rows. Each row corresponds to a set of
SNPs with a desired frequency range and relative risk (RR) or odds ratio (OR).
1. Heterogeneity Model
a) COUNT FREQ_MIN FREQ_MAX RR1 RR2
b) FRACTION FREQ_MIN FREQ_MAX RR1 RR2
2. Logistic Model
a) COUNT FREQ_MIN FREQ_MAX OR1 OR2
b) FRACTION FREQ_MIN FREQ_MAX OR1 OR2
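The column layouts above can be illustrated with a small reader. This is only a sketch in Python, not part of GMS: it assumes the model file is whitespace-delimited with exactly five columns per row, and the names <code>ModelRow</code> and <code>parse_model</code>, as well as the numeric values in the example, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ModelRow:
    size: float      # value of the COUNT or FRACTION column
    freq_min: float  # FREQ_MIN
    freq_max: float  # FREQ_MAX
    effect1: float   # RR1 or OR1
    effect2: float   # RR2 or OR2


def parse_model(text: str) -> Tuple[List[str], List[ModelRow]]:
    """Split a model file into its header names and its data rows.

    Assumes whitespace-delimited columns (an assumption, not documented).
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("model file is empty")
    header, body = lines[0], lines[1:]
    if len(header) != 5:
        raise ValueError("expected 5 header columns, got %d" % len(header))
    rows = []
    for fields in body:
        if len(fields) != 5:
            raise ValueError("expected 5 columns, got %d" % len(fields))
        rows.append(ModelRow(*(float(f) for f in fields)))
    return header, rows


# Hypothetical heterogeneity model using the COUNT layout.
example = """COUNT FREQ_MIN FREQ_MAX RR1 RR2
5 0.001 0.01 2.0 4.0
10 0.01 0.05 1.2 1.5
"""

header, rows = parse_model(example)
print(header[0], len(rows), rows[0].effect2)
```

The first header column tells the reader whether the first value in each row is a count or a fraction, and the last two header columns tell it whether the effects are relative risks or odds ratios.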
== How It Works ==
There are two underlying models. In both, disease status follows a Bernoulli distribution with probability P, defined as follows.
1. Heterogeneity Model
<math> P(D | (AA,AA,...,AA)) = f_0 </math>
<math> P = \sum_{i=1}^N P(D|x_i) </math>
2. Logistic Model
<math>\operatorname{logit}(P) = \beta_0 + \sum_{i=1}^{N}\beta_i\times x_i</math>
<math> P = \frac{e^{\beta_0 + \sum_{i=1}^{N}\beta_i\times x_i}}{1+e^{\beta_0 + \sum_{i=1}^{N}\beta_i\times x_i}}</math>
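The logistic formula for P can be sketched in a few lines of Python. This is an illustration of the equation only, not the GMS source code. It assumes that each <math>x_i</math> is a numeric genotype code; the function names are hypothetical. Setting <math>\beta_0</math> to the logit of f0 makes P equal f0 when every <math>x_i</math> is zero, but whether GMS derives <math>\beta_0</math> from --f0 in this way is an assumption.

```python
import math
import random


def disease_probability(beta0, betas, genotypes):
    """Return P = e^eta / (1 + e^eta), with eta = beta0 + sum(beta_i * x_i)."""
    if len(betas) != len(genotypes):
        raise ValueError("betas and genotypes must have the same length")
    eta = beta0 + sum(b * x for b, x in zip(betas, genotypes))
    # 1 / (1 + e^-eta) is algebraically the same as e^eta / (1 + e^eta).
    return 1.0 / (1.0 + math.exp(-eta))


def draw_status(p, rng):
    """Draw a Bernoulli(p) disease status: 1 = case, 0 = control."""
    return 1 if rng.random() < p else 0


# Assumption: beta0 = logit(f0), so P = f0 when all genotype codes are 0.
f0 = 0.01
beta0 = math.log(f0 / (1.0 - f0))

# One SNP with log odds ratio log(2) and a genotype code of 1 (hypothetical values).
p_ref = disease_probability(beta0, [math.log(2.0)], [0])
p_alt = disease_probability(beta0, [math.log(2.0)], [1])

rng = random.Random(1)  # fixed seed so the draw is reproducible
status = draw_status(p_alt, rng)
```

With these values the odds of disease, P / (1 - P), double when the genotype code goes from 0 to 1, which is what a coefficient of log(2) means in a logistic model.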
== Download ==
The current version is available for download from http://www.sph.umich.edu/csg/weich/GMS.tar.gz
== TODO ==
1. Support quantitative traits.
2. Support family structures.
3. Support more "reasonable" models.