Difference between revisions of "C++ Library: libStatGen"

From Genome Analysis Wiki
'''Below are the outdated instructions for <code>statgen</code>:'''
 
 
In the following instructions/comments:
 
* replace STATGEN_DIR with the path to the directory that contains the statgen directory (do not include statgen/ itself).
 
* replace MY_CODE_DIR with the path where you want your code located.
 
 
To use the StatGen library, first download and compile it by following the [[StatGen Download | StatGen Download Instructions]] and the [[StatGen Repository#Compile.2FBuild | StatGen Compile/Build Instructions]].


This creates the library STATGEN_DIR/statgen/lib/libStatGen.a.
 
 
== Creating programs using the Default Makefile ==
 
# Create a directory for your own code.
 
#* <code>mkdir MY_CODE_DIR</code>
 
# Move into your directory.
 
#* <code>cd MY_CODE_DIR</code>
 
# Copy the Makefile from the statgen directory.
 
#* <code>cp STATGEN_DIR/statgen/src/Makefile.src Makefile</code>

#* You could instead link to the Makefile, but then be careful not to modify it for your program, because that may break any other programs that link to it (including those within the statgen repository).
 
# Create <code>Makefile.tool</code>
 
#* See [[Makefile.tool]] for what to put into Makefile.tool.
 
# Compile your program
 
#* For optimal performance (be sure you also compiled statgen for optimal performance):
 
#** <code>make</code>
 
#* For debug (be sure that you also compiled statgen for debug):
 
#** <code>make OPTFLAG="-ggdb -O0"</code>
 
 
NOTE: When you compile, all of your '.o' files will go in a directory called obj that will be created by the Makefile if it does not already exist.
 
 
== Creating programs Without the Default Makefile ==
 
You can also use your own Makefile or method of building.
 
 
Be sure to add <code>-ISTATGEN_DIR/statgen/lib/include</code> to your compile line to pull in the library header files.
 
 
Add <code>STATGEN_DIR/statgen/lib/libStatGen.a STATGEN_DIR/statgen/lib/samtools/libbam.a -lm -lz -lssl</code>, in that order, to the end of your compile line to pull in the necessary libraries.
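The ordering matters because, with static archives, the GNU linker searches each archive only once, at the point where it appears on the command line, and pulls in only the symbols that are still undefined at that point. An archive therefore has to come after the objects that use it. The sketch below is a hypothetical helper (the name `buildCommand` and the use of `g++` are assumptions, not part of statgen) that assembles a command line in the order described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: build a compile-and-link command for a program that
// uses the (outdated) statgen library layout described above.
std::string buildCommand(const std::string& statgenDir,
                         const std::vector<std::string>& sources,
                         const std::string& output) {
    // Header search path comes before the sources.
    std::string cmd = "g++ -I" + statgenDir + "/statgen/lib/include";
    for (const std::string& src : sources) cmd += " " + src;
    cmd += " -o " + output;
    // Archives and system libraries go last, in the documented order, so each
    // one follows the code that refers to it.
    cmd += " " + statgenDir + "/statgen/lib/libStatGen.a";
    cmd += " " + statgenDir + "/statgen/lib/samtools/libbam.a";
    cmd += " -lm -lz -lssl";
    return cmd;
}
```

For example, `buildCommand("/home/me", {"main.cpp"}, "myTool")` yields `g++ -I/home/me/statgen/lib/include main.cpp -o myTool /home/me/statgen/lib/libStatGen.a /home/me/statgen/lib/samtools/libbam.a -lm -lz -lssl`.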
 
 
NOTE: These are all handled for you if you use Makefile.src from the statgen repository.
 
 
= Recently Added Capabilities =
 
* [[SAM/BAM Convert Sequence|SAM/BAM support conversion between '=' and the base in a sequence]]
 

Revision as of 11:28, 23 August 2011


Description

Open source, freely available (GPL license), easy-to-use C++ APIs

  • General Operation Classes including:
    • File/Stream I/O – uncompressed, BGZF, GZIP, stdin, stdout
    • String processing
    • Parameter Parsing
  • Statistical Genetic Specific Classes including:
    • Handling common file formats – SAM/BAM, FASTQ, GLF, VCF (coming soon)
      • Accessors to get/set values
      • Indexed access to BAM files
    • Utility classes, including:
      • Cigar – interpretation and mapping between query and reference
      • Pileup – structured access to data by individual reference position
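To give a concrete idea of what mapping between query and reference involves, here is a standalone sketch that parses a CIGAR string and maps each query (read) base to its offset from the alignment start on the reference. It does not use libStatGen's Cigar class; the names `parseCigar` and `queryToReference` are hypothetical, and the operation semantics follow the SAM specification (M, =, X consume both query and reference; I, S consume query only; D, N consume reference only; H, P consume neither).

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// One CIGAR operation: a length and an operation character (e.g. 5M).
struct CigarOp {
    int length;
    char op;
};

// Hypothetical helper: split a CIGAR string such as "3M1I2M" into operations.
std::vector<CigarOp> parseCigar(const std::string& cigar) {
    std::vector<CigarOp> ops;
    int length = 0;
    bool haveDigits = false;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            length = length * 10 + (c - '0');
            haveDigits = true;
        } else {
            if (!haveDigits) throw std::invalid_argument("missing length in CIGAR");
            ops.push_back({length, c});
            length = 0;
            haveDigits = false;
        }
    }
    if (haveDigits) throw std::invalid_argument("trailing length in CIGAR");
    return ops;
}

// Hypothetical helper: for each query base, return its 0-based offset from the
// alignment start on the reference, or -1 if the base has no reference
// position (insertion or soft clip).
std::vector<int> queryToReference(const std::string& cigar) {
    std::vector<int> mapping;
    int refOffset = 0;
    for (const CigarOp& o : parseCigar(cigar)) {
        switch (o.op) {
            case 'M': case '=': case 'X':  // consumes query and reference
                for (int i = 0; i < o.length; ++i) mapping.push_back(refOffset++);
                break;
            case 'I': case 'S':            // consumes query only
                for (int i = 0; i < o.length; ++i) mapping.push_back(-1);
                break;
            case 'D': case 'N':            // consumes reference only
                refOffset += o.length;
                break;
            case 'H': case 'P':            // consumes neither
                break;
            default:
                throw std::invalid_argument("unknown CIGAR operation");
        }
    }
    return mapping;
}
```

For example, `queryToReference("2S3M1I2M1D2M")` returns `{-1, -1, 0, 1, 2, -1, 3, 4, 6, 7}`: the soft-clipped bases and the inserted base have no reference position, and the deletion skips reference offset 5. Adding the read's alignment start to these offsets gives reference positions, which is the kind of per-position view a pileup is organized around.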

Where to Find It

libStatGen can be found at: https://github.com/statgen/libStatGen

You can both browse and download the library at that address.

This is also a git repository. You can create your own git clone from this location:

git clone https://github.com/statgen/libStatGen.git

or

git clone git://github.com/statgen/libStatGen.git

Either of these commands creates a directory called libStatGen in the current directory.

If you decide to use git but need a refresher, see How To Use Git or Notes on how to use git (if you have access).

What has changed

The pipeline and statgen repositories have been deprecated, so please update to our new framework.

libStatGen is the new git repository for our library code.

There are now separate repositories for specific tools/groups of tools, allowing us to track everything separately so it is easier to follow changes that impact a specific tool or the library in general.


Library Documentation

Latest Doxygen documentation: 8/23/11 Library Documentation in Doxygen

Additional documentation: Currently outdated, but updates will be coming soon

  • libStatGen: general - General classes for file processing and performing common tasks (used by most other libraries).
  • libStatGen: BAM - Classes specific for reading/writing/analyzing SAM/BAM files.
  • libStatGen: GLF - Classes specific for reading/writing/analyzing GLF files.
  • libStatGen: FASTQ - Classes specific for reading/writing/analyzing FastQ files.

Using the Library

Building the Library

Typing make help lists the build options:

Makefile help
-------------
Type...           To...
make              Compile opt 
make help         Display this help screen
make all          Compile everything (opt, debug, & profile)
make opt          Compile optimized
make debug        Compile for debug
make profile      Compile for profile
make clean        Delete temporary files
make test         Execute tests (if there are any)

Typing make with no target builds opt.

make all builds opt, debug, and profile.

opt creates libStatGen.a, debug creates libStatGen_debug.a, and profile creates libStatGen_profile.a.

These libraries are created in the top level libStatGen directory and can then be linked to appropriately for optimized, debugging, or profiling builds.
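If a build script or test harness needs to pick the right archive, the naming convention above can be captured in a small helper. This is a hypothetical sketch (`libraryForBuild` is not part of libStatGen) that encodes only the opt/debug/profile mapping described above, treating an empty build type like a bare make, which builds opt.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: map a libStatGen build type to the archive that
// "make opt", "make debug", or "make profile" produces at the top level.
// An empty build type is treated like a bare "make", which builds opt.
std::string libraryForBuild(const std::string& buildType) {
    if (buildType.empty() || buildType == "opt") return "libStatGen.a";
    if (buildType == "debug") return "libStatGen_debug.a";
    if (buildType == "profile") return "libStatGen_profile.a";
    throw std::invalid_argument("unknown build type: " + buildType);
}
```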

Navigating the Library Subdirectories

Under the main libStatGen repository, there are:

  • bam - library code for operating on bam files.
  • copyrights - copyrights for the library and any code included with it.
  • fastq - library code for operating on fastq files.
  • general - library code for general operations
  • glf - library code for operating on glf files.
  • include - after compiling, the library headers are linked here
  • Makefiles - directory containing Makefiles that are used in the library and can be used for developing programs using the library
  • samtools - library code used from samtools

After compiling, libStatGen.a, libStatGen_debug.a, and libStatGen_profile.a (depending on which builds you ran) are created at the top level.

bam, fastq, general, glf, samtools

Object files are placed in an obj directory under each of these subdirectories, with debug and profile objects in obj/debug and obj/profile.

Most also have a test directory. Run make test to execute the tests.

Makefiles

This directory contains base makefiles and makefile settings that are used by the library and by programs being written to use the library.

Using the Library in Your Own Program

Starting from a Sample Program

https://github.com/statgen/SampleProgram is a simple program demonstrating how to write a tool that uses libStatGen and can be used as a starting point for your tool.

SampleProgram has 4 subdirectories:

  • copyrights - contains the copyright information, add your own copyrights as necessary
  • obj - this directory is where the object files are placed when the code is compiled (with a subdirectory for debug and profile objects)
  • src - this is where your own program code goes
  • test - this is where your test code goes. Test code can be set up to run with make test to ensure the program works properly.

Using SampleProgram as a starting point for your tool:

  1. Copy SampleProgram into a directory with your program name (it is the starting point for your own program).
  2. Update ChangeLog, .gitignore, and README.txt as appropriate.
  3. Add any necessary copyrights to the copyrights directory.
    • No changes to Makefile should be necessary.
  4. Update Makefile.inc
    1. Update the VERSION as necessary.
    2. Replace all occurrences of SAMPLE_PROGRAM with an all caps name for your program.
      • You can then use the LIB_PATH_<your program name> environment variable to specify an alternate path to libStatGen specific for your program. In most cases you will not need to do this.
    • No other updates to Makefile.inc should be necessary.
  5. Add your program (cpp & h files) to the src directory.
  6. Update src/Makefile
    1. Set EXE to your program executable (replacing sampleProgram)
    2. Set TOOLBASE, SRCONLY, and HDRONLY as appropriate for specifying your program file names.
    3. Set any of the other optional settings as specified in the sample makefile.
    • No other changes should be necessary to src/Makefile.
  7. Add your tests to the test directory.
  8. Update test/Makefile as appropriate for specifying how to compile/run your tests.
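Step 4 relies on an all-caps version of your program name, which is then used in the LIB_PATH_<your program name> override. The actual lookup is done by make through Makefile.inc; the hypothetical C++ sketch below only mirrors the idea. The conversion rule (split camelCase words with underscores, so sampleProgram becomes SAMPLE_PROGRAM), the default path, and the helper names `toMakeName` and `resolveLibPath` are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: turn a program name such as "sampleProgram" into an
// all-caps form such as "SAMPLE_PROGRAM". Non-alphanumeric characters become
// underscores, and a lowercase-to-uppercase boundary starts a new word.
std::string toMakeName(const std::string& programName) {
    std::string result;
    for (std::size_t i = 0; i < programName.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(programName[i]);
        if (!std::isalnum(c)) {
            result += '_';
            continue;
        }
        if (i > 0 && std::isupper(c) &&
            std::islower(static_cast<unsigned char>(programName[i - 1]))) {
            result += '_';
        }
        result += static_cast<char>(std::toupper(c));
    }
    return result;
}

// Hypothetical helper: prefer the LIB_PATH_<NAME> environment variable when it
// is set and non-empty, otherwise fall back to the caller's default path.
std::string resolveLibPath(const std::string& programName,
                           const std::string& defaultPath) {
    const std::string var = "LIB_PATH_" + toMakeName(programName);
    const char* value = std::getenv(var.c_str());
    return (value != nullptr && *value != '\0') ? std::string(value) : defaultPath;
}
```

For example, with LIB_PATH_MY_TOOL unset, `resolveLibPath("myTool", "../libStatGen")` returns the hypothetical default `../libStatGen`.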


Working from Scratch

When compiling your code, be sure to include the library header files found in libStatGen/include/ and link in the appropriate library (libStatGen.a, libStatGen_debug.a, or libStatGen_profile.a for opt, debug, or profile builds).