C++ Library: libStatGen
Description
Open source, freely available (GPL license), easy to use C++ APIs
- General Operation Classes including:
- File/Stream I/O – uncompressed, BGZF, GZIP, stdin, stdout
- String processing
- Parameter Parsing
- Statistical Genetic Specific Classes including:
- Handling Common file formats – SAM/BAM, FASTQ, GLF, VCF (coming soon)
- Accessors to get/set values
- Indexed access to BAM files
- Utility classes, including:
- Cigar – interpretation and mapping between query and reference
- Pileup – structured access to data by individual reference position
The library can be used to create your own C++ programs.
Currently the repository is recommended for Unix/Linux users with access to the GNU C++ compiler.
Copyrights
If you use this software, please e-mail me, Mary Kate Trost, at mktrost@umich.edu
Here are links to the copyrights for our code and some of the utilities it uses:
- GNU GENERAL PUBLIC LICENSE and Our Copyright Note
- Copyright for MERSENNE TWISTER (used in Random.cpp)
- Samtools Copyright (MIT License)
Copies of these can be found in our library under libStatGen/copyrights/.
Where to Find It
libStatGen can be found at: https://github.com/statgen/libStatGen
libStatGen is also bundled with certain releases of the statgen tools: those tools have a release version that includes a copy of libStatGen.
On github, you can both browse and download the latest version of the library, as well as explore the history of changes.
You can access the latest version with or without using git.
Using Git To Track the Current Development Version
Clone (get your own copy)
You can create your own git clone (copy) using:
git clone https://github.com/statgen/libStatGen.git
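or, over the git protocol (GitHub has since disabled unauthenticated git:// access, so use the https URL above if this fails):
git clone git://github.com/statgen/libStatGen.git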
Either of these commands creates a directory called libStatGen in the current directory.
Then just cd libStatGen and compile the library.
Get the latest Updates (update your copy)
To update your copy to the latest version (a major advantage of using git):
cd pathToYourCopy/libStatGen
make clean
git pull
make all
Git Refresher
If you decide to use git, but need a refresher, see How To Use Git or Notes on how to use git (if you have access)
Getting the Latest Development Version without Using Git
Periodically download the latest copy from GitHub, either via the "Downloads" link on the webpage or from https://github.com/statgen/libStatGen/archives/master.
The downloaded tar file is named "statgen-libStatGen-someHexNumber.tar.gz". The directory created when it is untarred shares the same base name. I recommend that you do not change the name of the directory; if you want one called libStatGen, create a link to this directory. The hex number in the directory name identifies the version of the library that you downloaded, which makes it much easier to troubleshoot any issues you encounter. If you must rename the directory, be sure to record the hex number from the download for future reference.
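The linking and record-keeping advice above can be sketched in shell. This is only a sketch and not part of libStatGen: the function name link_libstatgen and the record file libStatGen.version are hypothetical, and the directory name is assumed to end in a dash followed by the hex number, as described above.

```shell
# Hypothetical helper (not shipped with libStatGen): keep the unpacked
# directory under its original name, add a "libStatGen" link to it, and
# record the hex identifier taken from the directory name.
link_libstatgen() {
    dir="${1%/}"            # unpacked directory, e.g. statgen-libStatGen-<hex>
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    hex="${dir##*-}"        # text after the last '-' in the name
    ln -sfn "$dir" libStatGen
    printf '%s\n' "$hex" > libStatGen.version   # hypothetical record file
}

# Usage, after unpacking the tar file in the current directory:
#   link_libstatgen statgen-libStatGen-someHexNumber
```

Keeping the link separate from the real directory means the hex number stays visible on disk even if the record file is lost.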
Released Versions
Released Versions are documented at libStatGen Download
What has changed
The pipeline and statgen repositories have been deprecated, so please update to our new framework.
libStatGen is the new git repository for our library code.
There are now separate repositories for specific tools/groups of tools, allowing us to track everything separately so it is easier to follow changes that impact a specific tool or the library in general.
Library Documentation
Latest Doxygen documentation: 8/23/11 Library Documentation in Doxygen
Additional documentation: Currently outdated, but updates will be coming soon
- libStatGen: general - General classes for file processing and performing common tasks (used by most other libraries).
- libStatGen: BAM - Classes specific for reading/writing/analyzing SAM/BAM files.
- libStatGen: GLF - Classes specific for reading/writing/analyzing GLF files.
- libStatGen: FASTQ - Classes specific for reading/writing/analyzing FastQ files.
Using the Library
Dependencies
- This software requires the following to be installed:
- g++
- development version of zlib (zlib1g-dev on ubuntu)
- openssl and md5 (libssl-dev on ubuntu)
- Compiles on Linux/Unix
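A quick way to check the command-line prerequisites before building is sketched below. The helper name have_cmd is hypothetical, and checking for make is an assumption based on the build instructions. The sketch does not verify the zlib or openssl development headers, which on Ubuntu come from the zlib1g-dev and libssl-dev packages listed above.

```shell
# Hypothetical helper: succeed if a command is available on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report which build tools are present (make is assumed, since the
# library is built with it).
for tool in g++ make; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```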
Building the Library
If you type make help, you get the build options.
Makefile help
-------------
Type...        To...
make           Compile opt
make help      Display this help screen
make all       Compile everything (opt, debug, & profile)
make opt       Compile optimized
make debug     Compile for debug
make profile   Compile for profile
make clean     Delete temporary files
make test      Execute tests (if there are any)
When you just type make, it defaults to make opt (optimized).
make all builds opt, debug, and profile: opt creates libStatGen.a, debug creates libStatGen_debug.a, and profile creates libStatGen_profile.a.
These libraries are created in the top level libStatGen directory. Link against the one that matches how you are building your tool: optimized, debug, or profile.
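To confirm which of the three libraries a build produced, a small check such as the following can help. It is a sketch that assumes it is given the path to the top level libStatGen directory; the function name check_statgen_libs is hypothetical.

```shell
# Hypothetical helper: report which library variants exist in the
# top-level libStatGen directory given as $1 (default: current directory).
check_statgen_libs() {
    top="${1:-.}"
    for lib in libStatGen.a libStatGen_debug.a libStatGen_profile.a; do
        if [ -f "$top/$lib" ]; then
            echo "built:   $lib"
        else
            echo "missing: $lib"
        fi
    done
}

# Usage, after running make (opt only) or make all:
#   check_statgen_libs pathToYourCopy/libStatGen
```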
Under the main libStatGen repository, there are:
- bam - library code for operating on bam files.
- copyrights - copyrights for the library and any code included with it.
- fastq - library code for operating on fastq files.
- general - library code for general operations
- glf - library code for operating on glf files.
- include - after compiling, the library headers are linked here
- Makefiles - directory containing Makefiles that are used in the library and can be used for developing programs using the library
- samtools - library code used from samtools
After Compiling: libStatGen.a, libStatGen_debug.a, libStatGen_profile.a are created at the top level.
For bam, fastq, general, glf, and samtools:
- Object files are placed in an obj directory under each subdirectory, with debug and profile objects in obj/debug and obj/profile.
- Most also have a test directory. Tests are executed by running make test.
Makefiles
This directory contains base makefiles and makefile settings that are used by the library and by programs being written to use the library.
Using the Library in Your Own Program
Starting from a Sample Program
https://github.com/statgen/SampleProgram is a simple program demonstrating how to write a tool that uses libStatGen and can be used as a starting point for your tool.
SampleProgram has 4 subdirectories:
- copyrights - contains the copyright information, add your own copyrights as necessary
- obj - this directory is where the object files are placed when the code is compiled (with a subdirectory for debug and profile objects)
- src - this is where your own program code goes
- test - this is where your test code goes. Test code can be set up to run with make test to ensure the program works properly.
Using SampleProgram as a starting point for your tool:
- Copy SampleProgram into a directory with your program name (it is the starting point for your own program).
- Update ChangeLog, .gitignore, and README.txt as appropriate.
- Add any necessary copyrights to the copyrights directory.
- No changes to Makefile should be necessary.
- Update Makefile.inc
- Update the VERSION as necessary.
- Replace all occurrences of SAMPLE_PROGRAM with an all caps name for your program.
- You can then use the LIB_PATH_<your program name> environment variable to specify an alternate path to libStatGen specific for your program. In most cases you will not need to do this.
- No other updates to Makefile.inc should be necessary.
- Add your program (cpp & h files) to the src directory.
- Update src/Makefile
- Set EXE to your program executable (replacing sampleProgram)
- Set TOOLBASE, SRCONLY, and HDRONLY as appropriate for specifying your program file names.
- Set any of the other optional settings as specified in the sample makefile.
- No other changes should be necessary to src/Makefile.
- Add your tests to the test directory.
- Update test/Makefile as appropriate for specifying how to compile/run your tests.
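The SAMPLE_PROGRAM replacement step in Makefile.inc can be done with sed. The sketch below is one possible approach and is not part of SampleProgram; the function name rename_sample_program, the example name MY_TOOL, and the path /path/to/libStatGen are hypothetical placeholders.

```shell
# Hypothetical helper: replace every SAMPLE_PROGRAM in a file ($2, for
# example Makefile.inc) with an all-caps program name ($1).
rename_sample_program() {
    name="$1"
    file="$2"
    # Write to a temporary file first so the original survives a failed sed.
    sed "s/SAMPLE_PROGRAM/$name/g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage, from the top of your copy of SampleProgram:
#   rename_sample_program MY_TOOL Makefile.inc
# An alternate library path can then be supplied through the environment:
#   LIB_PATH_MY_TOOL=/path/to/libStatGen make
```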
Working from Scratch
When compiling your code, be sure to include the library header files found in libStatGen/include/ and link in the appropriate library (opt: libStatGen.a, debug: libStatGen_debug.a, or profile: libStatGen_profile.a).
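As a rough illustration of the paragraph above, the helper below prints a g++ command line for a chosen build type. It is a sketch under assumptions: the function name statgen_compile_cmd is hypothetical, a single .cpp source file is assumed, and -lz is included only because zlib is a listed dependency. Your program may need further flags, for example for openssl.

```shell
# Hypothetical helper: print (not run) a g++ command for one build type.
#   $1 = path to the libStatGen directory
#   $2 = build type: opt, debug, or profile
#   $3 = a single .cpp source file
statgen_compile_cmd() {
    lib_dir="$1"; build="$2"; src="$3"
    case "$build" in
        opt)     lib="libStatGen.a" ;;
        debug)   lib="libStatGen_debug.a" ;;
        profile) lib="libStatGen_profile.a" ;;
        *) echo "unknown build type: $build" >&2; return 1 ;;
    esac
    # Headers come from include/; the static library follows the source
    # file on the command line so that its symbols can be resolved.
    echo "g++ -I$lib_dir/include -o ${src%.cpp} $src $lib_dir/$lib -lz"
}

# Usage:
#   statgen_compile_cmd ../libStatGen debug myTool.cpp
```

Printing the command rather than running it makes it easy to inspect or paste into your own Makefile.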
Troubleshooting
If you are having trouble compiling any of the versions, check libStatGen Troubleshooting for help. If that does not solve your problem, email me for support.