RareSimu

From Genome Analysis Wiki
Revision as of 11:11, 2 February 2017 by Ppwhite

Genetic Model-based Simulator (GMS) is an efficient C++ program for simulating case-control data sets based on genetic models. The input is a pool of haplotypes and a text file that specifies the model. The output is a set of simulated data sets in Merlin ped file format.

Basic Usage Example

In a typical command line, a few options need to be specified together with the input files. Here is an example of how GMS works:

./GMS --hapfile test.hap --snplist test.lst --model model.heter.txt --f0 0.01 --nrep 100 --ncase 250 --nctrl 250 --causal --prefix tmp

Command Line Options

Basic Output Options

 --hapfile       a pool of simulated or real haplotypes, one chromosome per row
 --snplist       SNP names in the order of haplotypes in hapfile, one SNP per row
 --model         a model file specifying genetic models, see below for details
 --nrep          the number of replications
 --seed          seed for random number generator
 --ncase         the number of cases in each replicate
 --nctrl         the number of controls in each replicate
 --f0            overall baseline prevalence
 --prefix        prefix of output files (e.g. prefix.rep1.ped, prefix.rep2.ped)
 --causal        only generate causal SNPs in the output pedigree file
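
As a convenience when scripting many runs, the options above can be assembled programmatically. The following is a minimal Python sketch, not part of GMS itself: the helper names `build_gms_command` and `expected_outputs` are hypothetical, and the assumption that output files are numbered from rep1 up to rep<nrep> is extrapolated from the two example names given for --prefix.

```python
import shlex

def build_gms_command(hapfile, snplist, model, f0, nrep, ncase, nctrl,
                      prefix, seed=None, causal=False, exe="./GMS"):
    """Assemble an argument list using only the options documented above."""
    args = [
        exe,
        "--hapfile", hapfile,
        "--snplist", snplist,
        "--model", model,
        "--f0", str(f0),
        "--nrep", str(nrep),
        "--ncase", str(ncase),
        "--nctrl", str(nctrl),
    ]
    if seed is not None:
        args += ["--seed", str(seed)]
    if causal:
        # In the usage example, --causal appears as a flag without a value.
        args.append("--causal")
    args += ["--prefix", prefix]
    return args

def expected_outputs(prefix, nrep):
    """Assumed naming scheme: prefix.rep1.ped ... prefix.rep<nrep>.ped."""
    return [f"{prefix}.rep{i}.ped" for i in range(1, nrep + 1)]

cmd = build_gms_command("test.hap", "test.lst", "model.heter.txt",
                        f0=0.01, nrep=100, ncase=250, nctrl=250,
                        prefix="tmp", causal=True)
print(shlex.join(cmd))            # shell-quoted form of the command
print(expected_outputs("tmp", 3))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.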


Model File Annotation

The model file consists of one header line followed by one or more rows. Each row corresponds to a set of SNPs with a desired frequency range and relative risk (RR) or odds ratio (OR).

1. Heterogeneity Model

a) COUNT FREQ_MIN FREQ_MAX RR1 RR2

b) FRACTION FREQ_MIN FREQ_MAX RR1 RR2

2. Logistic Model

a) COUNT FREQ_MIN FREQ_MAX OR1 OR2

b) FRACTION FREQ_MIN FREQ_MAX OR1 OR2
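
To check a model file before a long simulation run, the four header layouts above can be validated with a small script. This is a hedged sketch only: it assumes the file is whitespace-delimited and that every row has five numeric fields matching the header, neither of which is stated on this page. The example rows are hypothetical values chosen for illustration, and the names `ModelRow` and `parse_model` are not part of GMS.

```python
from dataclasses import dataclass

@dataclass
class ModelRow:
    size: float      # value under COUNT or FRACTION, depending on the header
    freq_min: float
    freq_max: float
    effect1: float   # value under RR1 or OR1
    effect2: float   # value under RR2 or OR2

def parse_model(text):
    """Return (model name, first header field, parsed rows)."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty model file")
    header, rows = lines[0], lines[1:]
    if (len(header) != 5 or header[0] not in ("COUNT", "FRACTION")
            or header[1:3] != ["FREQ_MIN", "FREQ_MAX"]):
        raise ValueError(f"unrecognised header: {header}")
    kinds = {("RR1", "RR2"): "heterogeneity", ("OR1", "OR2"): "logistic"}
    model = kinds.get(tuple(header[3:5]))
    if model is None:
        raise ValueError(f"unrecognised effect columns: {header[3:5]}")
    parsed = []
    for fields in rows:
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}")
        row = ModelRow(*map(float, fields))
        if row.freq_min > row.freq_max:
            raise ValueError("FREQ_MIN exceeds FREQ_MAX")
        parsed.append(row)
    return model, header[0], parsed

# Hypothetical file contents, for illustration only.
example = """COUNT FREQ_MIN FREQ_MAX RR1 RR2
10 0.001 0.01 2.0 4.0
5 0.01 0.05 1.5 2.25
"""
model, selector, rows = parse_model(example)
print(model, selector, len(rows))
```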

How It Works

There are two underlying models. In both, disease status follows a Bernoulli distribution with probability P, which each model defines differently.

1. Heterogeneity Model  

 


2. Logistic Model

 

 
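
The Bernoulli step can be illustrated with a short Python sketch. It does not reproduce the GMS equations, which are not shown on this page. Instead it assumes a textbook logistic form, logit(P) = logit(f0) + sum_i log(OR_i) * x_i, where x_i is the number of risk alleles carried at causal SNP i and f0 is treated as the risk for an individual carrying none. Both the form and that reading of f0 are assumptions, and the function names are hypothetical.

```python
import math
import random

def logistic_risk(f0, odds_ratios, allele_counts):
    """Disease probability under the ASSUMED logistic form described above."""
    logit = math.log(f0 / (1.0 - f0))
    for or_i, x_i in zip(odds_ratios, allele_counts):
        logit += math.log(or_i) * x_i
    return 1.0 / (1.0 + math.exp(-logit))

def draw_status(p, rng):
    """Bernoulli draw: 1 (affected) with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(2017)  # fixed seed so the sketch is reproducible

# Hypothetical odds ratios for two causal SNPs.
p_noncarrier = logistic_risk(0.01, [2.0, 3.0], [0, 0])
p_carrier = logistic_risk(0.01, [2.0, 3.0], [1, 0])
print(p_noncarrier, p_carrier)

# Simulate disease status for 1000 carriers of the first variant.
statuses = [draw_status(p_carrier, rng) for _ in range(1000)]
print(sum(statuses), "affected out of", len(statuses))
```

Under this assumed form a non-carrier has risk f0, and each risk allele multiplies the odds of disease, not the probability, by the corresponding odds ratio.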

Download

The current version is available for download from http://csg.sph.umich.edu//weich/GMS.tar.gz

TODO

1. Support quantitative traits.

2. Support family structures.

3. Support more "reasonable" models.