SEQMIX is a C++ program that takes advantage of off-target sequence reads from exome/targeted sequencing experiments for accurate local ancestry inference.
Before running SEQMIX, it is important to LD prune your data: pairs of sites in high LD (r^2 > 0.1) are identified, and only the site with the higher sequence depth is included in the model. Because the sequence depth distribution is sample dependent, the sequence data must be pruned separately for each individual.
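The pruning rule above can be illustrated with a minimal Python sketch. It is not SEQMIX's own pruning code: it assumes that pairwise r^2 values have already been computed elsewhere (for example from a reference panel), and the function name `ld_prune`, the site IDs, and the toy depths are all hypothetical. The sketch uses a simple greedy strategy, visiting sites from highest to lowest depth and keeping a site only if it is not in high LD with a site already kept.

```python
from typing import Dict, FrozenSet, List

R2_THRESHOLD = 0.1  # from the text: pairs with r^2 > 0.1 count as high LD


def ld_prune(depth: Dict[str, int],
             r2: Dict[FrozenSet[str], float],
             threshold: float = R2_THRESHOLD) -> List[str]:
    """Greedy pruning for ONE individual (hypothetical helper).

    Sites are visited from deepest to shallowest; a site is kept only if
    it is not in high LD with any site that has already been kept.
    Pairs missing from `r2` are treated as unlinked (r^2 = 0).
    """
    kept: List[str] = []
    # Sort by depth, descending; break ties by site ID so the result is deterministic.
    for site in sorted(depth, key=lambda s: (-depth[s], s)):
        in_high_ld = any(
            r2.get(frozenset((site, other)), 0.0) > threshold for other in kept
        )
        if not in_high_ld:
            kept.append(site)
    return kept


# Toy data for a single individual (made-up values, for illustration only).
depth = {"rs1": 12, "rs2": 30, "rs3": 5, "rs4": 8}
r2 = {
    frozenset(("rs1", "rs2")): 0.85,  # high LD: only the deeper site survives
    frozenset(("rs2", "rs3")): 0.04,  # below the threshold
    frozenset(("rs3", "rs4")): 0.40,  # high LD: only the deeper site survives
}

kept = ld_prune(depth, r2)
print(kept)  # ['rs2', 'rs4']
```

Because depth differs between samples, the same function would be called once per individual with that individual's depths, so different individuals may retain different sites.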
The tarball Media:SEQMIX_0.1.tar contains the following three folders:
- libsrc: library source code written by Goncalo
- src: source code for SEQMIX
- Release: example files and commands, as well as the Readme.txt file that explains how to run SEQMIX
Before you run SEQMIX, or even while you are running the SEQMIX examples (summarized in the example.sh file), please refer to the Readme.txt file for a detailed explanation of the two steps for running SEQMIX.
Note that SEQMIX requires the following input files:
- allele frequencies for Africans
- allele frequencies for Europeans
- genetic distance file
- input vcf
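As a convenience, a short pre-flight check can confirm that all four inputs are present before SEQMIX is launched. The sketch below is only an illustration: the helper `missing_inputs` and every file name in it are hypothetical placeholders, and it checks only that each file exists, not that its contents are in the format SEQMIX expects (see Readme.txt for that).

```python
from pathlib import Path
from typing import Dict, List


def missing_inputs(inputs: Dict[str, str]) -> List[str]:
    """Return the labels of required inputs whose path is not an existing file."""
    return [label for label, path in inputs.items() if not Path(path).is_file()]


# Hypothetical file names; replace them with your own paths.
required = {
    "African allele frequencies": "afr.freq",
    "European allele frequencies": "eur.freq",
    "genetic distance file": "genetic_distance.map",
    "input VCF": "sample.vcf",
}

missing = missing_inputs(required)
if missing:
    print("Missing inputs: " + ", ".join(missing))
else:
    print("All four required inputs were found.")
```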
The input vcf file is generated from the sequencing experiment and the downstream data processing steps. The allele frequency and genetic distance files must be prepared by the user, which can be tedious. The good news is that I already have these files at the whole-genome level. Please contact me (firstname.lastname@example.org) if you would like to use them. If you are an internal user, I will point you to the path; if you are an external user, I will figure out a way to share them (the files are fairly big).
Whole-genome ancestry inference from ultra-low-coverage sequence data can be performed with LASER.