Difference between revisions of "C++ Library: libStatGen"

From Genome Analysis Wiki
(28 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[Category:Software Libraries]]
 
 
[[Category:C++]]
 
[[Category:C++]]
 
[[Category:libStatGen]]
 
[[Category:libStatGen]]
  
= DESCRIPTION =
+
= Description =
 
Open source, freely available (GPL license), easy to use C++ APIs
 
Open source, freely available (GPL license), easy to use C++ APIs
 
* General Operation Classes including:
 
* General Operation Classes including:
Line 9: Line 8:
 
** String processing
 
** String processing
 
** Parameter Parsing
 
** Parameter Parsing
* Statistical Genetic Specific Classes including:
+
* '''Statistical Genetic Specific Classes''' including:
**Handling Common file formats – SAM/BAM, FASTQ, GLF
+
**Handling Common file formats – SAM/BAM, FASTQ, GLF, VCF (coming soon)
 
***Accessors to get/set values
 
***Accessors to get/set values
 
***Indexed access to BAM files
 
***Indexed access to BAM files
Line 16: Line 15:
 
***Cigar – interpretation and mapping between query and reference
 
***Cigar – interpretation and mapping between query and reference
 
***Pileup – structured access to data by individual reference position
 
***Pileup – structured access to data by individual reference position
 +
 +
Can be used to create your own C++ programs.
 +
 +
Currently the repository is recommended for Unix/Linux users with access to the GNU C++ compiler.
 +
 +
 +
= Copyrights =
 +
'''If you use this software, please e-mail me, Mary Kate Wing, at mktrost@umich.edu'''
 +
 +
Here are links to the copyrights for our code and some of the utilities it uses:
 +
*[https://github.com/statgen/libStatGen/blob/master/general/COPYING GNU GENERAL PUBLIC LICENSE] and [https://github.com/statgen/libStatGen/blob/master/general/LICENSE.txt Our Copyright Note]
 +
*[https://github.com/statgen/libStatGen/blob/master/general/LICENSE.twister Copyright for MERSENNE TWISTER (used in Random.cpp)]
 +
*[https://github.com/statgen/libStatGen/blob/master/samtools/COPYING Samtools Copyright (MIT License)]
 +
Copies of these can be found in our library under libStatGen/copyrights/.
 +
 +
= Join in libStatGen mailing list =
 +
 +
Please join in the [http://groups.google.com/group/libStatGen libStatGen Google Group] to ask / discuss / comment about this library.
 +
 +
 +
= Troubleshooting =
 +
If you are having trouble compiling any of the versions, check [[libStatGen Troubleshooting]] for help.  If that does not solve your problem, email me for support.
 +
 +
 +
= Where to Find It =
 +
 +
{{ToolGitRepo|repoName=libStatGen|libStatGen=true|libBaseName=libStatGen}}
 +
 +
== Releases ==
 +
Released Versions are documented at [[libStatGen Download]]
 +
 +
= What has changed =
 +
The <code>pipeline</code> and <code>statgen</code> repositories have been deprecated, so please update to our new framework.
 +
 +
<code>libStatGen</code> is the new git repository for our library code. 
 +
 +
There are now separate repositories for specific tools/groups of tools, allowing us to track everything separately so it is easier to follow changes that impact a specific tool or the library in general.
  
  
 
= Library Documentation =
 
= Library Documentation =
Latest Doxygen documentation:  
+
Latest Doxygen documentation:
[http://www.sph.umich.edu/csg/mktrost/doxygen/march22_2011/ 3/22/11 Library Documentation in Doxygen]
+
<!-- <a href="http://csg.sph.umich.edu//abecasis/GOLD/ -->
 +
<!-- [http://www.sph.umich.edu/csg/mktrost/doxygen/current/ Current Library Documentation in Doxygen] -->
 +
[http://csg.sph.umich.edu//mktrost/doxygen/current/ Current Library Documentation in Doxygen]
  
[http://www.sph.umich.edu/csg/mktrost/doxygen/version.0.1.2/ Version 0.1.2 Library Documentation in Doxygen]
+
Additional documentation:
 +
* [[libStatGen: general]] - General classes for file processing and performing common tasks (used by most other libraries).
 +
* [[libStatGen: BAM]] - Classes specific for reading/writing/analyzing SAM/BAM files.
 +
* [[libStatGen: GLF]] - Classes specific for reading/writing/analyzing GLF files.
 +
* [[libStatGen: FASTQ]] - Classes specific for reading/writing/analyzing FastQ files.
 +
* [[libStatGen: ASP]] - Classes specific for reading/writing/analyzing ASP files.
 +
* [[libStatGen: VCF]] - Classes specific for reading/writing/analyzing VCF files.
  
Possibly out of date documentaitons:
+
= Using the Library =
* [[StatGenLibrary: general]] - General classes for file processing and performing common tasks (used by most other libraries).
+
== Dependencies ==
* [[StatGenLibrary: BAM]] - Classes specific for reading/writing/analyzing SAM/BAM files.
+
* This software requires the following to be installed:
* [[StatGenLibrary: GLF]] - Classes specific for reading/writing/analyzing GLF files.
+
** g++
* [[StatGenLibrary: FASTQ]] - Classes specific for reading/writing/analyzing FastQ files.
+
** development version of zlib (zlib1g-dev on ubuntu)
 +
* Compiles on Linux/Unix
  
 +
== Building the Library ==
  
= Using the Library =
+
If you type make help, you get the build options.
To use the StatGen Library, first download and compile via [[StatGen Download | StatGen Download Instructions]] and [[StatGen Repository#Compile.2FBuild | StatGen Compile/Build Instructions]]
+
<pre>
This creates the library: statgen/lib/libStatGen.a
+
Makefile help
 +
-------------
 +
Type...          To...
 +
make              Compile opt
 +
make help        Display this help screen
 +
make all          Compile everything (opt, debug, & profile)
 +
make opt          Compile optimized
 +
make debug        Compile for debug
 +
make profile      Compile for profile
 +
make clean        Delete temporary files
 +
make test        Execute tests (if there are any)
 +
</pre>
 +
 
 +
When you just type make, it will by default to make opt (optimized).
 +
 
 +
Make all indicates opt, debug, and profile.
 +
 
 +
opt creates <code>libStatGen.a</code>, debug creates <code>libStatGen_debug.a</code>, profile creates <code>libStatGen_profile.a</code>
 +
 
 +
These libraries are created in the top level libStatGen directory and can then be linked to appropriately for building tools as optimized, debugging, and/or profiling.
 +
 
 +
== Navigating the Library Subdirectories ==
 +
Under the main libStatGen repository, there are:
 +
*bam - library code for operating on bam files.
 +
*copyrights - copyrights for the library and any code included with it.
 +
*fastq - library code for operating on fastq files.
 +
*general - library code for general operations
 +
*glf - library code for operating on glf files.
 +
*include - after compiling, the library headers are linked here
 +
*Makefiles - directory containing Makefiles that are used in the library and can be used for developing programs using the library
 +
*samtools - library code used from samtools
 +
 
 +
After Compiling: libStatGen.a, libStatGen_debug.a, libStatGen_profile.a are created at the top level.
 +
 
 +
=== bam, fastq, general, glf, samtools ===
 +
Object files are placed in an obj directory under each subdirectory with debug & profile objects in obj/debug and obj/profile.
 +
 
 +
Most also have a test directory. Tests are executed by running <code>make test</code>
 +
 
 +
=== Makefiles ===
 +
This directory contains base makefiles and makefile settings that are used by the library and by programs being written to use the library.
 +
 
 +
== Using the Library in Your Own Program ==
 +
 
 +
=== Starting from a Sample Program (Recommended) ===
 +
[https://github.com/statgen/SampleProgram https://github.com/statgen/SampleProgram] is a simple program demonstrating how to write a tool that uses libStatGen and can be used as a starting point for your tool. 
 +
 
 +
SampleProgram has 4 subdirectories:
 +
* copyrights - contains the copyright information, add your own copyrights as necessary
 +
* obj - this directory is where the object files are placed when the code is compiled (with a subdirectory for debug and profile objects)
 +
* src - this is where your own program code goes
 +
* test - this is where your test code goes.  Test code can be setup to run with <code>make test</code> to ensure the program works properly.
 +
 
 +
'''Using SampleProgram as a starting point for your tool:'''
 +
# Copy SampleProgram into a directory with your program name (it is the starting point for your own program).
 +
# Update ChangeLog, .gitignore, and README.txt as appropriate.
 +
# Add any necessary copyrights to the copyrights directory.
 +
#* No changes to Makefile should be necessary.
 +
# Update Makefile.inc
 +
## Update the VERSION as necessary.
 +
## Replace all occurrences of <code>SAMPLE_PROGRAM</code> with an all caps name for your program.
 +
##*  You can then use the <code>LIB_PATH_<your program name></code> environment variable to specify an alternate path to libStatGen specific for your program.  In most cases you will not need to do this.
 +
#* No other updates to Makefile.inc should be necessary.
 +
# Add your program (cpp & h files) to the <code>src</code> directory.
 +
# Update src/Makefile
 +
## Set EXE to your program executable (replacing sampleProgram)
 +
## Set TOOLBASE, SRCONLY, and HDRONLY as appropriate for specifying your program file names.
 +
## Set any of the other optional settings as specified in the sample makefile.
 +
#* No other changes should be necessary to src/Makefile.
 +
# Add your tests to the <code>test</code> directory.
 +
# Update test/Makefile as appropriate for specifying how to compile/run your tests.
 +
 
 +
 
 +
After compiling a <code>bin</code> directory is created in the top level directory.  Your executable goes in there. If you build for <code>debug</code> and/or <code>profile</code>, subdirectories for those are created under <code>bin/</code> and <code>obj</code>.
 +
 
 +
 
 +
=== Working from Scratch ===
 +
When compiling your code, be sure to include the library header files found in libStatgen/include/ and link in the appropriate library (opt: libStatGen.a, debug: libStatGen_debug.a, or profile: libStatGen_profile.a).
 +
 
 +
 
 +
=== Starting from a Sample Set of Tools ===
 +
[https://github.com/statgen/SampleTools https://github.com/statgen/SampleTools] is a repository containing multiple programs within one directory structure.  It demonstrates how to have subdirectories for each tool using libStatGen and can be used as a starting point for your set of tools.
 +
 
 +
SampleTools has 3 subdirectories:
 +
* copyrights - contains the copyright information, add your own copyrights as necessary
 +
* SampleProgram1 - a dummy demo program to show the structure for having multiple programs
 +
* SampleProgram2 - a second dummy demo program to show the structure for having multiple programs
 +
 
 +
SampleProgram1 & SampleProgram2 have 2 subdirectories:
 +
* src - this is where your own program code goes
 +
* test - this is where your test code goes.  Test code can be setup to run with <code>make test</code> to ensure the program works properly.
  
== Creating programs ==
+
Upon compiling, an <code>obj</code> directory is created under <code>SampleProgram1</code> and <code>SampleProgram2</code> and a <code>bin</code> directory is created at the top level.  If you build for <code>debug</code> and/or <code>profile</code>, subdirectories for those are created under <code>bin/</code> and <code>SampleProgram1(2)/obj</code>.
After the statgen repository has been created:
 
  
  
''' THIS IS IN THE PROCESS OF BEING UPDATED - CHECK BACK SOON'''
+
'''Using SampleTools as a starting point for your set of tools:'''
 +
# Copy <code>SampleTools</code> into a directory with your toolset name (it is the starting point for your own set of tools).
 +
# Update <code>ChangeLog</code>, <code>.gitignore</code>, and <code>README.txt</code> as appropriate.
 +
# Add any necessary copyrights to the copyrights directory.
 +
# Rename the <code>SampleProgram1</code> and <code>SampleProgram2</code> directories
 +
# Create any additional directories as necessary.
 +
#* Recursively copy the structure/Makefiles from <code>SampleProgram1</code>.
 +
# Update <code>SUBDIRS</code> in <code>Makefile</code> as necessary.
 +
# Update <code>Makefile.inc</code>
 +
## Update the <code>VERSION</code> as necessary.
 +
## Replace all occurrences of <code>SAMPLE_PROGRAM</code> with an all caps name for your toolset.
 +
##*  You can then use the <code>LIB_PATH_<your toolset name></code> environment variable to specify an alternate path to libStatGen specific for your program.  In most cases you will not need to do this.
 +
#* No other updates to <code>Makefile.inc</code> should be necessary.
 +
# For each Program you want to add:
 +
## Move into the appropriate subdirectory.
 +
##* No change should be made to the program's <code>Makefile</code>
 +
## Add your program (cpp & h files) to the <code>src</code> subdirectory.
 +
## Update src/Makefile
 +
### Set EXE to your program executable (replacing sampleProgram)
 +
### Set TOOLBASE, SRCONLY, and HDRONLY as appropriate for specifying your program file names.
 +
### Set any of the other optional settings as specified in the sample makefile.
 +
##* No other changes should be necessary to src/Makefile.
 +
## Add your tests to the <code>test</code> directory.
 +
## Update test/Makefile as appropriate for specifying how to compile/run your tests.
  
If you are creating a program, you can start with Makefile.src found at statgen/src/.
 
  
(If you are creating a program in a directory outside of statgen/src, you may need additional modifications.)
+
= How To Use the APIs =
 +
More coming soon, see: http://genome.sph.umich.edu/wiki/Sam_Library_Usage_Examples
  
'''Example: creating statgen/src/myprog/'''
+
[[LibStatGen: ASP#API for Reading ASP Files| ASP APIs]]
# cd statgen/src/myprog
 
# ln -s ../Makefile.src Makefile
 
# cp ../bam/Makefile.tool .
 
# Update Makefile.tool for your specific program.
 
#* You may need settings beyond were set in statgen/src/bam/Makefile.tool.  See Makefile.src for what settings you can use.
 
  
== Recently Added Capabilities ==
+
[[LibStatGen: VCF#API for Reading VCF Files| VCF APIs]]
* [[SAM/BAM Convert Sequence|SAM/BAM support conversion between '=' and the base in a sequence]]
 

Revision as of 15:54, 31 January 2017


Description

Open source, freely available (GPL license), easy to use C++ APIs

  • General Operation Classes including:
    • File/Stream I/O – uncompressed, BGZF, GZIP, stdin, stdout
    • String processing
    • Parameter Parsing
  • Statistical Genetic Specific Classes including:
    • Handling Common file formats – SAM/BAM, FASTQ, GLF, VCF (coming soon)
      • Accessors to get/set values
      • Indexed access to BAM files
    • Utility classes, including:
      • Cigar – interpretation and mapping between query and reference
      • Pileup – structured access to data by individual reference position

Can be used to create your own C++ programs.

Currently the repository is recommended for Unix/Linux users with access to the GNU C++ compiler.


Copyrights

If you use this software, please e-mail me, Mary Kate Wing, at mktrost@umich.edu

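Here are links to the copyrights for our code and some of the utilities it uses:

  • GNU GENERAL PUBLIC LICENSE (https://github.com/statgen/libStatGen/blob/master/general/COPYING) and Our Copyright Note (https://github.com/statgen/libStatGen/blob/master/general/LICENSE.txt)
  • Copyright for MERSENNE TWISTER (used in Random.cpp) (https://github.com/statgen/libStatGen/blob/master/general/LICENSE.twister)
  • Samtools Copyright (MIT License) (https://github.com/statgen/libStatGen/blob/master/samtools/COPYING)

Copies of these can be found in our library under libStatGen/copyrights/.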

Join the libStatGen Mailing List

Please join the libStatGen Google Group (http://groups.google.com/group/libStatGen) to ask questions about, discuss, or comment on this library.


Troubleshooting

If you are having trouble compiling any of the versions, check libStatGen Troubleshooting for help. If that does not solve your problem, email me for support.


Where to Find It

The libStatGen repository is available both via release downloads and via github.

On github, https://github.com/statgen/libStatGen, you can both browse and download the libStatGen source code as well as explore the history of changes.

You can obtain the source either with or without git.

A copy of libStatGen is included in certain releases of some statgen tools.

Using Git To Track the Current Development Version

Clone (get your own copy)

You can create your own git clone (copy) using:

git clone https://github.com/statgen/libStatGen.git

or

git clone git://github.com/statgen/libStatGen.git

Either of these commands creates a directory called libStatGen in the current directory.

Then just cd libStatGen and compile.

Get the latest Updates (update your copy)

To update your copy to the latest version (a major advantage of using git):

  1. cd pathToYourCopy/libStatGen
  2. make clean
  3. git pull
  4. make all
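
The clone and update steps above can be combined into one small helper. This is only a sketch: the names run, sync_libstatgen and DRY_RUN are made up for illustration and are not part of libStatGen. It assumes git and make are on your PATH and a git version that supports the -C option.

```shell
#!/bin/sh
# Sketch: clone libStatGen on first use, otherwise update the existing copy.
# run, sync_libstatgen and DRY_RUN are illustrative names, not part of libStatGen.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: sync_libstatgen [directory]   (defaults to ./libStatGen)
sync_libstatgen() {
    dir="${1:-libStatGen}"
    if [ -d "$dir/.git" ]; then
        # Existing clone: same order as the update steps above.
        run make -C "$dir" clean || return 1
        run git -C "$dir" pull || return 1
        run make -C "$dir" all
    else
        # No clone yet: get a copy, then build it.
        run git clone https://github.com/statgen/libStatGen.git "$dir" || return 1
        run make -C "$dir" all
    fi
}
```

Setting DRY_RUN=1 before calling sync_libstatgen prints the commands without running them, which is a cheap way to check the sequence before touching an existing copy.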

Git Refresher

If you decide to use git, but need a refresher, see How To Use Git or Notes on how to use git (if you have access)


Downloading From GitHub Without Git

If you download the latest code/version, make sure you periodically update it by downloading a newer version.

From github you can download:

  1. Latest Code (master branch)
    via Website
    1. Go to: https://github.com/statgen/libStatGen
    2. Click on the Download ZIP button on the right side panel.
    via Command Line
    wget https://github.com/statgen/libStatGen/archive/master.tar.gz
    or
    wget https://github.com/statgen/libStatGen/archive/master.zip
  2. Specific Release (via a tag)
    via Website
    1. Go to: https://github.com/statgen/libStatGen/releases to see the available releases
    2. Click zip or tar.gz for the desired version.
    via Command Line
    wget https://github.com/statgen/libStatGen/archive/<tagName>.tar.gz
    or
    wget https://github.com/statgen/libStatGen/archive/<tagName>.zip


After downloading the file, uncompress (unzip/untar) it. The directory created will be named libStatGen-<name of version you downloaded>.
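
The download URLs listed above all follow one pattern, so they can be built by a small helper. This is a sketch only: archive_url is a hypothetical name, and the function just assembles the URL shapes shown above for either master or a release tag.

```shell
#!/bin/sh
# Sketch: build the GitHub archive URL for libStatGen.
# archive_url is a hypothetical helper name, not something shipped with the library.
# Usage: archive_url [ref] [tar.gz|zip]   (ref is "master" or a release tag)
archive_url() {
    ref="${1:-master}"
    ext="${2:-tar.gz}"
    case "$ext" in
        tar.gz|zip) ;;
        *) echo "unsupported archive type: $ext" >&2; return 1 ;;
    esac
    echo "https://github.com/statgen/libStatGen/archive/${ref}.${ext}"
}
```

For example, wget "$(archive_url master zip)" fetches the same file as the second master command above.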

Building

After obtaining the libStatGen repository (either by download or from github), compile the code using:

make all  

Object (.o) files are written to the obj directory; debugging and profiling objects go in its debug and profile subdirectories.

This creates the libraries libStatGen.a, libStatGen_debug.a, and libStatGen_profile.a in the top-level directory.

make test compiles for opt, debug, and profile and runs the tests (found in the test subdirectory).

To see all make options, type make help.


If compilation fails due to warnings being treated as errors, please contact us so we can fix the warnings. As a work-around to get it to compile, you can disable the treatment of warnings as errors by editing libStatGen/general/Makefile to remove -Werror.

Releases

Released Versions are documented at libStatGen Download

What has changed

The pipeline and statgen repositories have been deprecated, so please update to our new framework.

libStatGen is the new git repository for our library code.

There are now separate repositories for specific tools/groups of tools, allowing us to track everything separately so it is easier to follow changes that impact a specific tool or the library in general.


Library Documentation

Latest Doxygen documentation: Current Library Documentation in Doxygen

Additional documentation:

  • libStatGen: general - General classes for file processing and performing common tasks (used by most other libraries).
  • libStatGen: BAM - Classes specific for reading/writing/analyzing SAM/BAM files.
  • libStatGen: GLF - Classes specific for reading/writing/analyzing GLF files.
  • libStatGen: FASTQ - Classes specific for reading/writing/analyzing FastQ files.
  • libStatGen: ASP - Classes specific for reading/writing/analyzing ASP files.
  • libStatGen: VCF - Classes specific for reading/writing/analyzing VCF files.

Using the Library

Dependencies

  • This software requires the following to be installed:
    • g++
    • development version of zlib (zlib1g-dev on ubuntu)
  • Compiles on Linux/Unix

Building the Library

If you type make help, you get the build options.

Makefile help
-------------
Type...           To...
make              Compile opt 
make help         Display this help screen
make all          Compile everything (opt, debug, & profile)
make opt          Compile optimized
make debug        Compile for debug
make profile      Compile for profile
make clean        Delete temporary files
make test         Execute tests (if there are any)

When you just type make, it defaults to make opt (optimized).

make all builds opt, debug, and profile.

opt creates libStatGen.a, debug creates libStatGen_debug.a, and profile creates libStatGen_profile.a.

These libraries are created in the top level libStatGen directory and can then be linked to appropriately for building tools as optimized, debugging, and/or profiling.
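
To see which of the three libraries a given build actually produced, a quick check such as the following can help. It is a sketch: built_libraries is a made-up helper name, and it only looks for the three file names given above in the directory you pass in.

```shell
#!/bin/sh
# Sketch: list which libStatGen static libraries exist in a directory.
# built_libraries is an illustrative name, not part of libStatGen.
# Usage: built_libraries [path-to-libStatGen]   (defaults to the current directory)
built_libraries() {
    root="${1:-.}"
    for lib in libStatGen.a libStatGen_debug.a libStatGen_profile.a; do
        if [ -f "$root/$lib" ]; then
            echo "$lib"
        fi
    done
    return 0
}
```

After a plain make, you would expect only libStatGen.a to be listed; after make all, all three.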

Navigating the Library Subdirectories

Under the main libStatGen repository, there are:

  • bam - library code for operating on bam files.
  • copyrights - copyrights for the library and any code included with it.
  • fastq - library code for operating on fastq files.
  • general - library code for general operations
  • glf - library code for operating on glf files.
  • include - after compiling, the library headers are linked here
  • Makefiles - directory containing Makefiles that are used in the library and can be used for developing programs using the library
  • samtools - library code used from samtools

After Compiling: libStatGen.a, libStatGen_debug.a, libStatGen_profile.a are created at the top level.

bam, fastq, general, glf, samtools

Object files are placed in an obj directory under each subdirectory with debug & profile objects in obj/debug and obj/profile.

Most also have a test directory. Tests are executed by running make test

Makefiles

This directory contains base makefiles and makefile settings that are used by the library and by programs being written to use the library.

Using the Library in Your Own Program

Starting from a Sample Program (Recommended)

https://github.com/statgen/SampleProgram is a simple program demonstrating how to write a tool that uses libStatGen and can be used as a starting point for your tool.

SampleProgram has 4 subdirectories:

  • copyrights - contains the copyright information, add your own copyrights as necessary
  • obj - this directory is where the object files are placed when the code is compiled (with a subdirectory for debug and profile objects)
  • src - this is where your own program code goes
  • test - this is where your test code goes. Test code can be set up to run with make test to ensure the program works properly.

Using SampleProgram as a starting point for your tool:

  1. Copy SampleProgram into a directory with your program name (it is the starting point for your own program).
  2. Update ChangeLog, .gitignore, and README.txt as appropriate.
  3. Add any necessary copyrights to the copyrights directory.
    • No changes to Makefile should be necessary.
  4. Update Makefile.inc
    1. Update the VERSION as necessary.
    2. Replace all occurrences of SAMPLE_PROGRAM with an all caps name for your program.
      • You can then use the LIB_PATH_<your program name> environment variable to specify an alternate path to libStatGen specific for your program. In most cases you will not need to do this.
    • No other updates to Makefile.inc should be necessary.
  5. Add your program (cpp & h files) to the src directory.
  6. Update src/Makefile
    1. Set EXE to your program executable (replacing sampleProgram)
    2. Set TOOLBASE, SRCONLY, and HDRONLY as appropriate for specifying your program file names.
    3. Set any of the other optional settings as specified in the sample makefile.
    • No other changes should be necessary to src/Makefile.
  7. Add your tests to the test directory.
  8. Update test/Makefile as appropriate for specifying how to compile/run your tests.
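
Step 4.2 above (replacing SAMPLE_PROGRAM in Makefile.inc) can be done with sed. The helper below is a sketch under assumptions: rename_sample_program is a hypothetical name, and it prints the rewritten file to standard output instead of editing it in place, so you can review the result before overwriting anything.

```shell
#!/bin/sh
# Sketch: replace every SAMPLE_PROGRAM in a file with an all-caps program name.
# rename_sample_program is an illustrative name, not part of SampleProgram.
# Usage: rename_sample_program <program-name> <path-to-Makefile.inc>
rename_sample_program() {
    # Upper-case the name, as the instructions ask for an all-caps name.
    name_upper="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
    case "$name_upper" in
        ""|*[!A-Z0-9_]*)
            echo "name must use only letters, digits and underscores" >&2
            return 1 ;;
    esac
    # Write the result to stdout; the input file is left unchanged.
    sed "s/SAMPLE_PROGRAM/${name_upper}/g" "$2"
}
```

A typical use would be rename_sample_program mytool Makefile.inc > Makefile.inc.new, followed by a diff against the original before replacing it.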


After compiling, a bin directory is created in the top level directory, and your executable is placed there. If you build for debug and/or profile, subdirectories for those are created under bin/ and obj/.


Working from Scratch

When compiling your code, be sure to include the library header files found in libStatGen/include/ and link in the appropriate library (opt: libStatGen.a, debug: libStatGen_debug.a, or profile: libStatGen_profile.a).
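
As a rough illustration of the paragraph above, the helper below prints a g++ command line for a chosen build type. It is a sketch, not the library's official build recipe: libstatgen_archive and link_command are made-up names, -lz is assumed because zlib is listed as a dependency, and your tool may need further flags or libraries.

```shell
#!/bin/sh
# Sketch: print a g++ command that compiles one source file against libStatGen.
# libstatgen_archive and link_command are illustrative names, not part of libStatGen.

# Map a build type to the static library name given on this page.
libstatgen_archive() {
    case "$1" in
        opt)     echo "libStatGen.a" ;;
        debug)   echo "libStatGen_debug.a" ;;
        profile) echo "libStatGen_profile.a" ;;
        *) echo "unknown build type: $1" >&2; return 1 ;;
    esac
}

# Usage: link_command <opt|debug|profile> <path-to-libStatGen> <source.cpp> <executable>
link_command() {
    lib="$(libstatgen_archive "$1")" || return 1
    # Headers come from <root>/include; the library sits at the top level of <root>.
    echo "g++ -I $2/include -o $4 $3 $2/$lib -lz"
}
```

The function only prints the command, so you can inspect it first and then run it with, for example, sh -c "$(link_command opt ../libStatGen myTool.cpp myTool)".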


Starting from a Sample Set of Tools

https://github.com/statgen/SampleTools is a repository containing multiple programs within one directory structure. It demonstrates how to have subdirectories for each tool using libStatGen and can be used as a starting point for your set of tools.

SampleTools has 3 subdirectories:

  • copyrights - contains the copyright information, add your own copyrights as necessary
  • SampleProgram1 - a dummy demo program to show the structure for having multiple programs
  • SampleProgram2 - a second dummy demo program to show the structure for having multiple programs

SampleProgram1 & SampleProgram2 have 2 subdirectories:

  • src - this is where your own program code goes
  • test - this is where your test code goes. Test code can be set up to run with make test to ensure the program works properly.

Upon compiling, an obj directory is created under SampleProgram1 and SampleProgram2 and a bin directory is created at the top level. If you build for debug and/or profile, subdirectories for those are created under bin/ and SampleProgram1(2)/obj.


Using SampleTools as a starting point for your set of tools:

  1. Copy SampleTools into a directory with your toolset name (it is the starting point for your own set of tools).
  2. Update ChangeLog, .gitignore, and README.txt as appropriate.
  3. Add any necessary copyrights to the copyrights directory.
  4. Rename the SampleProgram1 and SampleProgram2 directories
  5. Create any additional directories as necessary.
    • Recursively copy the structure/Makefiles from SampleProgram1.
  6. Update SUBDIRS in Makefile as necessary.
  7. Update Makefile.inc
    1. Update the VERSION as necessary.
    2. Replace all occurrences of SAMPLE_PROGRAM with an all caps name for your toolset.
      • You can then use the LIB_PATH_<your toolset name> environment variable to specify an alternate path to libStatGen specific for your program. In most cases you will not need to do this.
    • No other updates to Makefile.inc should be necessary.
  8. For each Program you want to add:
    1. Move into the appropriate subdirectory.
      • No change should be made to the program's Makefile
    2. Add your program (cpp & h files) to the src subdirectory.
    3. Update src/Makefile
      1. Set EXE to your program executable (replacing sampleProgram)
      2. Set TOOLBASE, SRCONLY, and HDRONLY as appropriate for specifying your program file names.
      3. Set any of the other optional settings as specified in the sample makefile.
      • No other changes should be necessary to src/Makefile.
    4. Add your tests to the test directory.
    5. Update test/Makefile as appropriate for specifying how to compile/run your tests.


How To Use the APIs

More coming soon, see: http://genome.sph.umich.edu/wiki/Sam_Library_Usage_Examples

ASP APIs

VCF APIs