SEQMIX

Overview

SEQMIX is a C++ program that uses off-target sequence reads from exome/targeted sequencing experiments for accurate local ancestry inference. The paper has been accepted by AJHG and will appear in the November issue (link coming soon).

Method

Before running SEQMIX, it is important to LD prune your data: for each pair of sites in high LD (r^2 > 0.1), only the site with the higher sequence depth is included in the model. Because the sequence depth distribution is sample dependent, the sequence data must be pruned separately for each individual.
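The pruning rule above can be illustrated with a short Python sketch. This is not the pruning code used by SEQMIX; it is only a minimal illustration under stated assumptions: the function name `ld_prune_by_depth`, the input layout (a per-individual depth dictionary and a dictionary of precomputed pairwise r^2 values), and the greedy deepest-first order are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def ld_prune_by_depth(
    depth: Dict[str, int],
    r2_pairs: Dict[Tuple[str, str], float],
    r2_threshold: float = 0.1,
) -> List[str]:
    """Greedy LD pruning for ONE individual (hypothetical helper).

    depth        -- site id -> sequence depth in this individual
    r2_pairs     -- (site_a, site_b) -> precomputed r^2 between the two sites
    r2_threshold -- pairs with r^2 above this value count as high LD
    Returns the retained site ids in sorted order.
    """
    # Build the list of high-LD partners for every site.
    high_ld = {site: set() for site in depth}
    for (a, b), r2 in r2_pairs.items():
        if r2 > r2_threshold and a in high_ld and b in high_ld:
            high_ld[a].add(b)
            high_ld[b].add(a)

    kept = []
    removed = set()
    # Visit sites from deepest to shallowest (ties broken by site id),
    # so the deeper site of any high-LD pair is the one that survives.
    for site in sorted(depth, key=lambda s: (-depth[s], s)):
        if site in removed:
            continue
        kept.append(site)
        removed.update(high_ld[site])
    return sorted(kept)


# Toy data for one individual (made-up site ids and values).
example_depth = {"site1": 12, "site2": 30, "site3": 5, "site4": 8}
example_r2 = {
    ("site1", "site2"): 0.8,   # high LD: keep site2 (deeper)
    ("site2", "site3"): 0.05,  # below threshold: not linked
    ("site3", "site4"): 0.4,   # high LD: keep site4 (deeper)
}
retained = ld_prune_by_depth(example_depth, example_r2)
```

Because depth differs between samples, the function would be called once per individual with that individual's own depth values, and the retained site list may differ from sample to sample.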

Download

SEQMIX (version 0.1) is available as a tarball, Media:SEQMIX_0.1.tar, which contains the following three folders:

  • libsrc: library source code written by Goncalo Abecasis
  • src: source code for SEQMIX
  • Release: example files and a command file, as well as the Readme.txt file that explains how to run SEQMIX

Before running SEQMIX, or while working through the SEQMIX examples (summarized in the example.sh file), please refer to the Readme.txt file for a detailed explanation of the two steps for running SEQMIX.

Note that SEQMIX requires the following input files:

  • allele frequency for Africans
  • allele frequency for Europeans
  • genetic distance file
  • input vcf

The input vcf file is generated from the sequencing experiment and the downstream data processing steps. The allele frequency and genetic distance files must be prepared by the user, which can be tedious. The good news is that I already have these files for the whole genome. Please contact me (youna@umich.edu) if you would like to have them. If you are an internal user, I will point you to the path; if you are an external user, I will work out a way to share them (the files are fairly large).

Maintainer

Please contact Youna Hu (youna@umich.edu) if you have any questions or suggestions for SEQMIX.

Related Programs

Local ancestry inference with high-density genotype array data can be done with the existing programs HAPMIX, LAMP, and ANCESTRYMAP.

Whole-genome ancestry inference with ultra-low-coverage sequence data can be done with LASER.