Difference between revisions of "SEQMIX"

From Genome Analysis Wiki

Revision as of 03:05, 4 November 2013

Overview

SEQMIX is a C++ program that uses off-target sequence reads from exome/targeted sequencing experiments for accurate local ancestry inference. The paper has been accepted by AJHG and will appear in the November issue (link coming soon).

Method

Before running SEQMIX, it is important to LD-prune your data: identify pairs of sites in high LD (r^2 > 0.1) and include only the site with the higher sequence depth in the model. Because the sequence depth distribution is sample dependent, the sequence data must be pruned separately for each individual.
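
The pruning rule above can be sketched in Python. This is only an illustration, not the procedure shipped with SEQMIX: the function name prune_by_depth, the site labels, and the dictionary inputs are all hypothetical, and the greedy order (visit sites from highest to lowest depth, and drop any site in high LD with one already kept) is an assumption about how to apply the rule when a site takes part in several high-LD pairs. Pairs missing from the r^2 table are assumed to be in low LD.

```python
from typing import Dict, FrozenSet, List

R2_THRESHOLD = 0.1  # high-LD cutoff quoted in the text


def prune_by_depth(
    depth: Dict[str, int],
    r2: Dict[FrozenSet[str], float],
    threshold: float = R2_THRESHOLD,
) -> List[str]:
    """Greedy, depth-aware LD pruning for ONE individual (illustrative only).

    depth maps a site label to that individual's sequence depth at the site.
    r2 maps an unordered pair of site labels to their r^2; pairs that are
    absent are treated as r^2 = 0 (an assumption of this sketch).
    """
    kept: List[str] = []
    # Visit sites from highest to lowest depth; ties are broken by label
    # so the result is deterministic.
    for site in sorted(depth, key=lambda s: (-depth[s], s)):
        # Keep the site only if it is not in high LD with any kept site.
        if all(r2.get(frozenset((site, k)), 0.0) <= threshold for k in kept):
            kept.append(site)
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical r^2 values shared by all individuals.
    r2 = {
        frozenset(("chr1:100", "chr1:150")): 0.6,
        frozenset(("chr1:150", "chr1:900")): 0.05,
    }
    # Depth differs per individual, so pruning is run once per individual.
    individual_a = {"chr1:100": 12, "chr1:150": 3, "chr1:900": 5}
    individual_b = {"chr1:100": 2, "chr1:150": 9, "chr1:900": 5}
    print(prune_by_depth(individual_a, r2))
    print(prune_by_depth(individual_b, r2))
```

Because the two hypothetical individuals have different depths at the same sites, they retain different members of the high-LD pair, which is why the pruning has to be repeated for each individual.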

Download

The tarball Media:SEQMIX_0.1.tar contains the following three folders:

  • libsrc: a folder containing source code written by Goncalo Abecasis
  • src: source code for SEQMIX
  • Release: example files and a command file, as well as the Readme.txt file that explains how to run SEQMIX

Before you run SEQMIX, or even while you are running SEQMIX examples (summarized by the example.sh file), please refer to the Readme.txt file for detailed explanations of the two steps for running SEQMIX.

Note that SEQMIX requires the following input files:

  • allele frequencies for Africans
  • allele frequencies for Europeans
  • genetic distance file
  • input VCF

The input VCF file is generated from the sequencing experiment and the downstream data processing steps. The allele frequency and genetic distance files must be prepared by the user, which can be tedious. The good news is that I already have these files for the whole genome. Please contact me (youna@umich.edu) if you would like them. If you are an internal user, I will point you to the path; if you are an external user, I will figure out a way to share them (the files are fairly large).

Related Programs

Local ancestry inference with high-density genotype array data can be done with existing software such as HAPMIX, LAMP, and ANCESTRYMAP.

Whole-genome ancestry inference from ultra-low-coverage sequence data can be performed with LASER.