UMAKE is a software pipeline to detect SNPs and call their genotypes from a list of BAM files. The UMAKE pipeline has been successfully applied to detect SNPs in many large-scale next-generation sequencing studies.
To get a copy, go to the UMAKE Download page.
To build UMAKE, download the UMAKE package from the link above and run the following series of commands.
tar xzvf umake.r100.20110705.tar.gz
cd umake
make
UMAKE is designed to be portable. However, since development occurs only on Ubuntu 9.10 and later (x86 and x64 platforms), portability issues on other platforms are likely.
Currently we support UMAKE only on Ubuntu 9.10 and later on 64-bit processors. Perl (5.0 or higher) must be installed with the IO::File, IO::Zlib, and Getopt::Long packages.
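The Perl requirement above can be checked before the first run. The following is a minimal sketch, not part of UMAKE itself: the function name check_perl_modules is hypothetical, and it relies only on the standard `perl -MModule -e 1` idiom, which exits with a non-zero status when a module cannot be loaded.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with UMAKE): report which of the given
# Perl modules can be loaded. Returns 0 if all load, 1 otherwise
# (also 1 if perl itself is not installed).
check_perl_modules() {
    rc=0
    for module in "$@"; do
        # "perl -MModule -e 1" loads the module and runs an empty program
        if perl "-M$module" -e 1 2>/dev/null; then
            echo "ok       $module"
        else
            echo "MISSING  $module"
            rc=1
        fi
    done
    return "$rc"
}

# The three modules named in the requirements above
check_perl_modules IO::File IO::Zlib Getopt::Long \
    || echo "Install the missing modules before running UMAKE"
```

This checks only that the modules load. It does not verify the Perl version or the external software packages UMAKE depends on.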
Note that UMAKE requires external software packages to be copied to
Basic Usage Example
Here is a typical command line:
perl $(UMAKE_HOME)/scripts/umake.pl --conf [conf.file]
An example configuration file can be found at examples/umake-example.conf. Users have to modify the configuration files to
The full UMAKE pipeline is partitioned into three parts: (1) SNP detection, (2) LD-aware genotype refinement using BEAGLE, and (3) MaCH/Thunder genotype refinement on top of the BEAGLE haplotypes. These three steps can be run with the same configuration file using the following options:
perl $(UMAKE_HOME)/scripts/umake.pl --conf [conf.file] --snpcall
perl $(UMAKE_HOME)/scripts/umake.pl --conf [conf.file] --beagle
perl $(UMAKE_HOME)/scripts/umake.pl --conf [conf.file] --thunder
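Because each step builds on the output of the previous one, it can be convenient to run the three commands from one small wrapper that stops at the first failure. The sketch below is one possible way to do that. The function name run_umake_stages and the DRY_RUN switch are hypothetical, and it assumes that umake.pl exits with a non-zero status when a step fails. The commands in this document write the path as `$(UMAKE_HOME)`; the sketch uses the shell form `${UMAKE_HOME}`, since a POSIX shell would treat `$(...)` as command substitution.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with UMAKE): run the three stages in
# order with one configuration file and stop at the first failing stage.
# Set DRY_RUN=1 to print the commands instead of running them.
run_umake_stages() {
    conf=$1
    for stage in snpcall beagle thunder; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "perl ${UMAKE_HOME}/scripts/umake.pl --conf ${conf} --${stage}"
        else
            # Assumption: umake.pl returns non-zero when the stage fails
            perl "${UMAKE_HOME}/scripts/umake.pl" --conf "$conf" "--${stage}" || {
                echo "stage --${stage} failed; stopping" >&2
                return 1
            }
        fi
    done
}

# Example: preview the commands without running anything.
# /path/to/umake and my.conf are placeholders.
UMAKE_HOME=/path/to/umake
DRY_RUN=1
run_umake_stages my.conf
```

Unset DRY_RUN (or set it to 0) and point UMAKE_HOME at the real installation to run the stages.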
Exercise with Example Resources
Example input files can be downloaded at UMAKE Download. These example resource files include sequence alignment files over 60 individuals from the 1000 Genomes project, focusing on a 300kb region in chromosome 20. Note that the reference genome FASTA file has also been modified to use chromosome 20 only.
Let UMAKE_HOME be the path to the UMAKE package and
EXAMPLE_HOME be the path to the example resource files.
- First, modify
UMAKE_ROOT, INPUT_ROOT, OUTPUT_ROOT parameters accordingly.
- Second, perform the SNP calling procedure using the following command:
perl $(UMAKE_HOME)/scripts/umake.pl --snpcall
- Third, run BEAGLE genotype refinement using the following command:
perl $(UMAKE_HOME)/scripts/umake.pl --beagle
- Finally, run BEAGLE/THUNDER genotype refinement using the following command:
perl $(UMAKE_HOME)/scripts/umake.pl --thunder
Preparing Your Own Input Files
UMAKE requires three types of input files: (1) a set of BAM files, (2) an index file, and (3) a configuration file.
- BAM files need to be duplicate-marked and base-quality recalibrated in order to obtain high-quality SNP calls.
- Each line of the index file represents one individual, in the following format. Note that multiple BAMs per individual may be provided.
[SAMPLE_ID] [COMMA SEPARATED POPULATION LABELS] [BAM_FILE1] [BAM_FILE2] ...
- Additional input files may be provided, including pedigree files (PED format, to specify gender information for chrX calling) and target information (UCSC BED format) for targeted or whole-exome capture sequencing.
- The configuration file contains the core run-time options, including the software binaries and command-line arguments. Refer to the example configuration file for further information.
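A malformed index file is easy to produce by hand, so a quick check before starting the pipeline can save a failed run. The sketch below is illustrative only. It assumes the fields are separated by whitespace (the format line above does not say whether tabs or spaces are expected), and it treats a repeated sample ID as a problem on the assumption that all BAMs of one individual belong on a single line. The function name validate_index and the sample names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with UMAKE): check an index file.
# Every non-empty line needs a sample ID, a population label field, and at
# least one BAM path, i.e. three or more whitespace-separated fields.
# Prints one message per problem and returns 1 if any problem was found.
validate_index() {
    awk '
        NF == 0 { next }    # skip blank lines
        NF < 3  { printf "line %d: expected at least 3 fields, found %d\n", NR, NF; bad = 1 }
        # Assumption: one line per individual, so a repeated ID is flagged
        seen[$1]++ { printf "line %d: duplicate sample ID %s\n", NR, $1; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Example with a small made-up index file; the second line has no BAM path
index=$(mktemp)
cat > "$index" <<'EOF'
SAMPLE1 EUR,ALL /data/SAMPLE1.run1.bam /data/SAMPLE1.run2.bam
SAMPLE2 ASN,ALL
EOF
validate_index "$index" || echo "index file needs fixing"
rm -f "$index"
```

The check looks at the layout only. It does not confirm that the BAM files exist or that they are duplicate-marked and recalibrated.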
The UMAKE pipeline consists of the following software components (details TBA)
UMAKE is the result of a collaborative effort by Hyun Min Kang, Goo Jun, Carlo Sidore, Paul Anderson, Mary Kate Trost, Wei Chen, Tom Blackwell, and Goncalo Abecasis. Please email Hyun Min Kang [email@example.com] with any questions.