relativeFinder is a program for checking relationships between pairs of individuals. There are many excellent programs that carry out similar tasks. Two features distinguish relativeFinder. First, its batch mode options allow large jobs to be divided into many smaller jobs, suitable for deployment on a compute cluster. Second, the flexibility of the underlying Merlin engine allows relativeFinder to handle large pedigrees and consider a variety of alternate relationships, including potential relationships specified by the user on the fly!
How It Works
relativeFinder examines every pair of genotyped individuals in a pedigree. For each pair, it calculates the likelihood of the observed genotype data under the relationship specified in the pedigree and under a series of alternate relationships (described in a template pedigree file). If any of these alternate relationships makes the observed genotype data more likely, the pair is flagged.
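The comparison described above can be sketched in a few lines of Python. This is only an illustration of the decision rule, not relativeFinder's own code: the function name `flag_pair` and the log-likelihood values are hypothetical, and the real likelihoods are computed by the Merlin engine.

```python
def flag_pair(log_likelihoods, specified):
    """Compare the pedigree-specified relationship against the alternates.

    log_likelihoods maps a relationship name to the log-likelihood of the
    pair's genotype data under that relationship (hypothetical values here).
    Returns (most likely relationship, log-likelihood difference, flagged).
    """
    best = max(log_likelihoods, key=log_likelihoods.get)
    diff = log_likelihoods[best] - log_likelihoods[specified]
    # The pair is flagged only if some alternate beats the specified relationship.
    return best, diff, diff > 0


# Made-up log-likelihoods for one pair recorded as siblings in the pedigree.
example = {"SIBS": -1520.4, "HALFSIBS": -1498.7, "UNRELATED": -1533.0}
best, diff, flagged = flag_pair(example, specified="SIBS")
print(best, round(diff, 1), flagged)
```

In this made-up case the half-sibling relationship explains the data better than the specified sibling relationship, so the pair would be flagged.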
Before evaluating relationship likelihoods, relativeFinder will flag individuals with outlier patterns for heterozygosity or missing genotypes. To do this, relativeFinder checks whether the fraction of heterozygous sites or the total number of available genotypes differs from the sample average by three standard deviations or more. Results for any flagged samples should be treated with caution, as problematic heterozygosity or missing data patterns can lead to odd results for the relativeFinder algorithm.
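A minimal sketch of this three standard deviation check is shown below. It assumes the population standard deviation is used and that a single summary value per sample is already available; the function name `flag_outliers`, the sample IDs and the heterozygosity values are all hypothetical.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=3.0):
    """Return the sample IDs whose value lies `threshold` or more
    standard deviations away from the sample average."""
    data = list(values.values())
    mu = mean(data)
    sd = pstdev(data)  # assumption: population standard deviation
    if sd == 0:
        return []  # all samples identical, nothing to flag
    return [s for s, v in values.items() if abs(v - mu) >= threshold * sd]


# Made-up heterozygosity fractions: nineteen typical samples and one low one.
het = {f"S{i:02d}": 0.30 for i in range(1, 20)}
het["S20"] = 0.10
print(flag_outliers(het))
```

The same function could be applied to the number of available genotypes per sample to catch individuals with unusual amounts of missing data.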
Command Line Options
- -d datafile -p pedigreefile -m mapfile
- A set of required input files in Merlin format. These files list the individuals to be evaluated. A whole genome's worth of marker data is recommended.
- -t testRelationshipData -q testRelationshipPedigree
- An optional set of files, also in Merlin format, that describe relationships to be considered by RelativeFinder.
- -f alleleFrequencyModel
- Allele frequencies can be provided in a Merlin format file (-f filename) or can be calculated from the available pedigree data. In the latter case, recommended options are -fa (estimate frequencies from all available genotypes), -ff (estimate frequencies from founder genotypes only) or -fm (estimate frequencies using a maximum likelihood algorithm).
- --perAllele errorRate
- Specifies the genotyping error rate per allele; the default is 0.002.
- --perGenotype errorRate
- Specifies the genotyping error rate per genotype. This option and the --perAllele option are mutually exclusive.
- Specifies that all possible pairings should be considered
- Specifies that only within family pairings should be considered
- --job k --of N
- Specifies that the analysis should be divided into N parallel batches and that the current invocation corresponds to batch k.
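As an illustration of the batch options, the Python sketch below builds one command line per batch, ready to be handed to a cluster scheduler. Several details are assumptions: that the executable is invoked as `relativeFinder`, that batches are numbered from 1 to N, and the file names `study.dat`, `study.ped` and `study.map`, which are placeholders.

```python
import shlex


def batch_commands(n_jobs, datafile, pedfile, mapfile, executable="relativeFinder"):
    """Build one command line per batch using the --job k --of N options."""
    commands = []
    for k in range(1, n_jobs + 1):  # assumption: batches are numbered 1..N
        args = [executable, "-d", datafile, "-p", pedfile, "-m", mapfile,
                "--job", str(k), "--of", str(n_jobs)]
        commands.append(shlex.join(args))  # quote arguments safely for a shell
    return commands


# Placeholder file names; substitute the real Merlin format input files.
for command in batch_commands(3, "study.dat", "study.ped", "study.map"):
    print(command)
```

Each printed line can then be submitted as a separate job, so the batches run in parallel.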
The set of relationships to be considered by RelativeFinder can be specified in a pair of Merlin format data and pedigree files. Names for these files can be specified with the -t (for the data file) and -q (for the pedigree file) command line options. Within this pedigree, each family should describe one potential relationship. The two individuals whose relationship is being tested should be given the IDs "TEST1" and "TEST2" to mark their placement within the test pedigree.
If these files are not available, RelativeFinder will consider a basic set of relationships for each pair of individuals (siblings, half-siblings, identical twins, parent-offspring pairs, unrelated individuals, or an avuncular relationship).
For example, files specifying the relationships to be considered might look like:
Example of testCases.dat file
In this case, the data file simply indicates that the five canonical columns in the pedigree (family id, individual id, father, mother and sex) will be followed by a column indicating whether individuals are MZ twins.
Example of testCases.ped file
MZTWIN 1 0 0 1 0
MZTWIN 2 0 0 2 0
MZTWIN TEST1 1 2 1 MZ
MZTWIN TEST2 1 2 2 MZ
SIBS 1 0 0 1 0
SIBS 2 0 0 2 0
SIBS TEST1 1 2 1 0
SIBS TEST2 1 2 2 0
HALFSIBS 1 0 0 1 0
HALFSIBS 2 0 0 2 0
HALFSIBS 3 0 0 2 0
HALFSIBS TEST1 1 2 1 0
HALFSIBS TEST2 1 3 2 0
AVUNCULAR 1 0 0 1 0
AVUNCULAR 2 0 0 2 0
AVUNCULAR 3 1 2 1 0
AVUNCULAR TEST1 1 2 2 0
AVUNCULAR 4 0 0 2 0
AVUNCULAR TEST2 3 4 1 0
PARENT-OFFSPRING 1 0 0 1 0
PARENT-OFFSPRING TEST1 0 0 2 0
PARENT-OFFSPRING TEST2 1 TEST1 2 0
UNRELATED TEST1 0 0 1 0
UNRELATED TEST2 0 0 2 0
In the example above, the pedigree describes 6 alternate relationships that match the default relationships considered by RelativeFinder (identical twins, siblings, half-siblings, avuncular pairs, parent-offspring pairs and unrelated individuals). Note that, for each putative relationship, the placement of the individuals being evaluated is indicated by the TEST1 and TEST2 tags.
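Before running an analysis with a custom test pedigree, it can be useful to confirm that every family contains exactly one TEST1 and one TEST2 individual. The Python sketch below does this for the six-column layout used in the example (family id, individual id, father, mother, sex, MZ status). The function names are hypothetical and the check is independent of relativeFinder itself.

```python
from collections import defaultdict


def read_test_pedigree(lines):
    """Group the rows of a six-column test pedigree by family id."""
    families = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        family, person, father, mother, sex, zygosity = fields
        families[family].append({"id": person, "father": father,
                                 "mother": mother, "sex": sex,
                                 "zygosity": zygosity})
    return dict(families)


def families_missing_tags(families):
    """Return families that lack exactly one TEST1 and one TEST2 individual."""
    bad = []
    for name, members in families.items():
        ids = [m["id"] for m in members]
        if ids.count("TEST1") != 1 or ids.count("TEST2") != 1:
            bad.append(name)
    return bad


# The HALFSIBS family from the example pedigree.
rows = ["HALFSIBS 1 0 0 1 0", "HALFSIBS 2 0 0 2 0", "HALFSIBS 3 0 0 2 0",
        "HALFSIBS TEST1 1 2 1 0", "HALFSIBS TEST2 1 3 2 0"]
families = read_test_pedigree(rows)
print(families_missing_tags(families))
```

An empty list means every family places both tagged individuals, so each family describes one testable relationship.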
A source package is available for download from here.
Current Limitations and Todo List
The current implementation does not include support for X linked markers and should only be used with autosomal markers.
The current implementation simply reports the most likely relationship and the difference in log-likelihood between this relationship and the originally specified relationship. It would be better to use an E-M algorithm to calculate a prior probability for each relationship and to only report as problematic pairs where the posterior probability of a mis-specified relationship is high.
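One way the proposed E-M step might look is sketched below in Python. This is not part of relativeFinder; it is a generic mixture-proportion E-M that assumes the per-pair log-likelihoods under each relationship are already available, and that a single set of prior probabilities is shared by all pairs. The function names and the log-likelihood values are hypothetical.

```python
import math


def posterior(log_lik, priors):
    """Posterior probability of each relationship for one pair (E-step)."""
    weighted = {r: math.log(priors[r]) + log_lik[r]
                for r in log_lik if priors[r] > 0}
    top = max(weighted.values())  # subtract the maximum to avoid underflow
    unnorm = {r: math.exp(w - top) for r, w in weighted.items()}
    total = sum(unnorm.values())
    return {r: unnorm.get(r, 0.0) / total for r in log_lik}


def em_priors(log_liks, n_iter=100):
    """Estimate prior probabilities for each relationship across all pairs."""
    relationships = list(log_liks[0])
    priors = {r: 1.0 / len(relationships) for r in relationships}
    for _ in range(n_iter):
        posts = [posterior(ll, priors) for ll in log_liks]
        # M-step: the new prior is the average posterior over all pairs.
        priors = {r: sum(p[r] for p in posts) / len(posts)
                  for r in relationships}
    return priors


# Made-up log-likelihoods for three pairs under two candidate relationships.
pairs = [{"SIBS": -100.0, "UNRELATED": -120.0},
         {"SIBS": -100.0, "UNRELATED": -125.0},
         {"SIBS": -130.0, "UNRELATED": -100.0}]
priors = em_priors(pairs)
print(priors)
print(posterior(pairs[2], priors))
```

With priors estimated in this way, a pair would be reported only when the posterior probability of a relationship other than the one specified in the pedigree is high.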