[[Category:C++]]
[[Category:libStatGen]]

= Description =
Open source, freely available (GPL license), easy-to-use C++ APIs:
* General operation classes, including:
** File/stream I/O – uncompressed, BGZF, GZIP, stdin, stdout
** String processing
** Parameter parsing
* '''Statistical genetics-specific classes''', including:
** Handling of common file formats – SAM/BAM, FASTQ, GLF, VCF (coming soon)
*** Accessors to get/set values
*** Indexed access to BAM files
** Utility classes, including:
*** Cigar – interpretation and mapping between query and reference
*** Pileup – structured access to data by individual reference position
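
The Cigar and Pileup utilities above rest on one idea. Each CIGAR operation consumes query (read) bases, reference bases, or both. Walking the operations therefore maps every read base to a reference position, and grouping bases by position gives a pileup.

The following is a minimal, standalone sketch of that idea in plain C++. It is not libStatGen's Cigar or Pileup API; see the Doxygen documentation for those classes. The names <code>parseCigar</code>, <code>queryToReference</code>, <code>Read</code>, and <code>buildPileup</code> are hypothetical and exist only for illustration. The per-operation rules follow the SAM specification's CIGAR table.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only; not libStatGen's Cigar/Pileup classes.

// One CIGAR operation, e.g. {'M', 3} for "3M".
struct CigarOp {
    char op;
    int length;
};

// Parse a SAM CIGAR string such as "3M1I2M" into its operations.
// Throws std::invalid_argument on malformed input.
std::vector<CigarOp> parseCigar(const std::string& cigar) {
    const std::string validOps = "MIDNSHP=X";
    std::vector<CigarOp> ops;
    int length = 0;
    bool haveDigits = false;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            length = length * 10 + (c - '0');
            haveDigits = true;
        } else if (haveDigits && validOps.find(c) != std::string::npos) {
            ops.push_back({c, length});
            length = 0;
            haveDigits = false;
        } else {
            throw std::invalid_argument("malformed CIGAR: " + cigar);
        }
    }
    if (haveDigits) {
        throw std::invalid_argument("CIGAR ends without an operation: " + cigar);
    }
    return ops;
}

// For each query (read) base, return the 0-based reference position it is
// aligned to, or -1 if it has none (insertions and soft clips).
// refStart is the 0-based position of the first aligned base (SAM POS - 1).
std::vector<long> queryToReference(const std::vector<CigarOp>& ops, long refStart) {
    std::vector<long> positions;
    long ref = refStart;
    for (const CigarOp& o : ops) {
        switch (o.op) {
            case 'M': case '=': case 'X':  // consume query and reference
                for (int i = 0; i < o.length; ++i) positions.push_back(ref++);
                break;
            case 'I': case 'S':            // consume query only
                for (int i = 0; i < o.length; ++i) positions.push_back(-1);
                break;
            case 'D': case 'N':            // consume reference only
                ref += o.length;
                break;
            default:                       // 'H' and 'P' consume neither
                break;
        }
    }
    return positions;
}

// A read as (bases, CIGAR, 0-based alignment start).
struct Read {
    std::string sequence;
    std::string cigar;
    long refStart;
};

// Minimal pileup: reference position -> bases observed there, in read order.
std::map<long, std::string> buildPileup(const std::vector<Read>& reads) {
    std::map<long, std::string> pileup;
    for (const Read& r : reads) {
        std::vector<long> pos = queryToReference(parseCigar(r.cigar), r.refStart);
        for (std::size_t i = 0; i < pos.size() && i < r.sequence.size(); ++i) {
            if (pos[i] >= 0) pileup[pos[i]] += r.sequence[i];
        }
    }
    return pileup;
}
```

For example, the CIGAR <code>2S3M1I2M2D1M</code> at 0-based start 100 maps the nine read bases to -1, -1, 100, 101, 102, -1, 103, 104, 107. The soft-clipped and inserted bases have no reference position, and the <code>2D</code> skips positions 105 and 106.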

The library can be used to build your own C++ programs.

Currently, the library is recommended for Unix/Linux users with access to the GNU C++ compiler.

= Copyrights =
'''If you use this software, please e-mail me, Mary Kate Wing, at mktrost@umich.edu'''

Here are links to the copyrights for our code and some of the utilities it uses:
*[https://github.com/statgen/libStatGen/blob/master/general/COPYING GNU GENERAL PUBLIC LICENSE] and [https://github.com/statgen/libStatGen/blob/master/general/LICENSE.txt Our Copyright Note]
*[https://github.com/statgen/libStatGen/blob/master/general/LICENSE.twister Copyright for MERSENNE TWISTER (used in Random.cpp)]
*[https://github.com/statgen/libStatGen/blob/master/samtools/COPYING Samtools Copyright (MIT License)]
Copies of these can be found in our library under <code>libStatGen/copyrights/</code>.

= Join in libStatGen mailing list =

Please join the [http://groups.google.com/group/libStatGen libStatGen Google Group] to ask questions, discuss, or comment on this library.

= Troubleshooting =
If you are having trouble compiling any of the versions, check [[libStatGen Troubleshooting]] for help. If that does not solve your problem, email me for support.

= Where to Find It =

{{ToolGitRepo|repoName=libStatGen|libStatGen=true|libBaseName=libStatGen}}

== Releases ==
Released versions are documented at [[libStatGen Download]].

= What has changed =
The <code>pipeline</code> and <code>statgen</code> repositories have been deprecated, so please update to our new framework.

<code>libStatGen</code> is the new git repository for our library code.

There are now separate repositories for specific tools or groups of tools. Each is tracked separately, which makes it easier to follow changes that affect a specific tool or the library in general.

= Library Documentation =
Latest Doxygen documentation:
[http://csg.sph.umich.edu//mktrost/doxygen/current/ Current Library Documentation in Doxygen]

Additional documentation:
* [[libStatGen: general]] - General classes for file processing and performing common tasks (used by most other libraries).
* [[libStatGen: BAM]] - Classes specific to reading/writing/analyzing SAM/BAM files.
* [[libStatGen: GLF]] - Classes specific to reading/writing/analyzing GLF files.
* [[libStatGen: FASTQ]] - Classes specific to reading/writing/analyzing FastQ files.
* [[libStatGen: ASP]] - Classes specific to reading/writing/analyzing ASP files.
* [[libStatGen: VCF]] - Classes specific to reading/writing/analyzing VCF files.

= Using the Library =
== Dependencies ==
* This software requires the following to be installed:
** g++
** development version of zlib (zlib1g-dev on Ubuntu)
* Compiles on Linux/Unix
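
zlib is needed for the compressed input types listed in the Description. These are GZIP files and BGZF files such as BAM; BGZF is a series of gzip members that each carry an extra <code>BC</code> header subfield.

To show how these formats differ at the byte level, here is a standalone C++ sketch that classifies a stream from its first bytes. It is not libStatGen code and not necessarily how the library itself detects formats. It follows the gzip header layout in RFC 1952 and the BGZF description in the SAM/BAM specification. <code>StreamFormat</code> and <code>detectFormat</code> are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; not part of libStatGen.
enum class StreamFormat { Uncompressed, Gzip, Bgzf };

// Classify a stream from its first bytes.
// gzip header (RFC 1952): ID1=0x1f, ID2=0x8b, CM=8 (deflate), FLG, MTIME(4), XFL, OS,
// then, if FLG.FEXTRA is set, XLEN (2 bytes, little-endian) followed by subfields.
// BGZF (SAM/BAM spec): an extra subfield with SI1='B', SI2='C', SLEN=2.
StreamFormat detectFormat(const std::vector<std::uint8_t>& head) {
    if (head.size() < 10 || head[0] != 0x1f || head[1] != 0x8b || head[2] != 8) {
        return StreamFormat::Uncompressed;  // no gzip magic: treat as plain data
    }
    const std::uint8_t kFextra = 0x04;
    if ((head[3] & kFextra) == 0 || head.size() < 12) {
        return StreamFormat::Gzip;  // no extra field (or too few bytes to read it)
    }
    const std::size_t xlen = head[10] | (head[11] << 8);
    const std::size_t end = 12 + xlen;
    std::size_t pos = 12;
    // Each subfield is SI1, SI2, SLEN (2 bytes, little-endian), then SLEN data bytes.
    while (pos + 4 <= end && pos + 4 <= head.size()) {
        const std::size_t slen = head[pos + 2] | (head[pos + 3] << 8);
        if (head[pos] == 'B' && head[pos + 1] == 'C' && slen == 2) {
            return StreamFormat::Bgzf;
        }
        pos += 4 + slen;
    }
    return StreamFormat::Gzip;
}
```

In a standard BGZF block header, FLG is 4 and XLEN is 6, so the <code>BC</code> subfield starts at byte 12. A file written by plain <code>gzip</code> normally has no extra field and is reported as <code>Gzip</code>.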

== Building the Library ==

Typing <code>make help</code> lists the build options:
<pre>
Makefile help
-------------
Type...           To...
make              Compile opt
make help         Display this help screen
make all          Compile everything (opt, debug, & profile)
make opt          Compile optimized
make debug        Compile for debug
make profile      Compile for profile
make clean        Delete temporary files
make test         Execute tests (if there are any)
</pre>

Typing <code>make</code> with no target builds <code>opt</code> (optimized) by default.

<code>make all</code> builds opt, debug, and profile.

<code>opt</code> creates <code>libStatGen.a</code>, <code>debug</code> creates <code>libStatGen_debug.a</code>, and <code>profile</code> creates <code>libStatGen_profile.a</code>.

These libraries are created in the top-level libStatGen directory. Link against the one that matches how you are building your tool: optimized, debug, and/or profile.

== Navigating the Library Subdirectories ==
The main libStatGen repository contains the following subdirectories:
*bam - library code for operating on BAM files.
*copyrights - copyrights for the library and any code included with it.
*fastq - library code for operating on FASTQ files.
*general - library code for general operations.
*glf - library code for operating on GLF files.
*include - after compiling, the library headers are linked here.
*Makefiles - Makefiles used by the library, which can also be used when developing programs that use the library.
*samtools - library code used from samtools.

After compiling, the libraries (<code>libStatGen.a</code>, <code>libStatGen_debug.a</code>, and/or <code>libStatGen_profile.a</code>, depending on the targets built) are created at the top level.

=== bam, fastq, general, glf, samtools ===
Object files are placed in an <code>obj</code> directory under each subdirectory, with debug and profile objects in <code>obj/debug</code> and <code>obj/profile</code>.

Most also have a <code>test</code> directory. Tests are executed by running <code>make test</code>.

=== Makefiles ===
This directory contains base makefiles and makefile settings that are used by the library and by programs written to use the library.

== Using the Library in Your Own Program ==

=== Starting from a Sample Program (Recommended) ===
[https://github.com/statgen/SampleProgram https://github.com/statgen/SampleProgram] is a simple program demonstrating how to write a tool that uses libStatGen. It can be used as a starting point for your tool.

SampleProgram has 4 subdirectories:
* copyrights - contains the copyright information; add your own copyrights as necessary
* obj - where the object files are placed when the code is compiled (with subdirectories for debug and profile objects)
* src - where your own program code goes
* test - where your test code goes. Test code can be set up to run with <code>make test</code> to ensure the program works properly.

'''Using SampleProgram as a starting point for your tool:'''
# Copy SampleProgram into a directory named after your program (it is the starting point for your own program).
# Update ChangeLog, .gitignore, and README.txt as appropriate.
# Add any necessary copyrights to the copyrights directory.
#* No changes to Makefile should be necessary.
# Update Makefile.inc
## Update the VERSION as necessary.
## Replace all occurrences of <code>SAMPLE_PROGRAM</code> with an all-caps name for your program.
##* You can then use the <code>LIB_PATH_<your program name></code> environment variable to specify an alternate path to libStatGen for your program. In most cases you will not need to do this.
#* No other updates to Makefile.inc should be necessary.
# Add your program (cpp & h files) to the <code>src</code> directory.
# Update src/Makefile
## Set EXE to the name of your program executable (replacing sampleProgram).
## Set TOOLBASE, SRCONLY, and HDRONLY as appropriate for specifying your program file names.
## Set any of the other optional settings as specified in the sample makefile.
#* No other changes should be necessary to src/Makefile.
# Add your tests to the <code>test</code> directory.
# Update test/Makefile as appropriate for specifying how to compile/run your tests.

After compiling, a <code>bin</code> directory is created in the top-level directory and your executable is placed there. If you build for <code>debug</code> and/or <code>profile</code>, subdirectories for those builds are created under <code>bin/</code> and <code>obj/</code>.

=== Working from Scratch ===
When compiling your code, include the library header files found in <code>libStatGen/include/</code>. Link in the appropriate library (opt: <code>libStatGen.a</code>, debug: <code>libStatGen_debug.a</code>, or profile: <code>libStatGen_profile.a</code>). Because the library depends on zlib, you also need to link against zlib (e.g. <code>-lz</code>).

=== Starting from a Sample Set of Tools ===
[https://github.com/statgen/SampleTools https://github.com/statgen/SampleTools] is a repository containing multiple programs within one directory structure. It demonstrates how to give each tool that uses libStatGen its own subdirectory, and it can be used as a starting point for your set of tools.

SampleTools has 3 subdirectories:
* copyrights - contains the copyright information; add your own copyrights as necessary
* SampleProgram1 - a dummy demo program showing the structure for having multiple programs
* SampleProgram2 - a second dummy demo program showing the structure for having multiple programs

SampleProgram1 and SampleProgram2 each have 2 subdirectories:
* src - where your own program code goes
* test - where your test code goes. Test code can be set up to run with <code>make test</code> to ensure the program works properly.

Upon compiling, an <code>obj</code> directory is created under <code>SampleProgram1</code> and <code>SampleProgram2</code>, and a <code>bin</code> directory is created at the top level. If you build for <code>debug</code> and/or <code>profile</code>, subdirectories for those builds are created under <code>bin/</code> and <code>SampleProgram1(2)/obj</code>.

'''Using SampleTools as a starting point for your set of tools:'''
# Copy <code>SampleTools</code> into a directory named after your toolset (it is the starting point for your own set of tools).
# Update <code>ChangeLog</code>, <code>.gitignore</code>, and <code>README.txt</code> as appropriate.
# Add any necessary copyrights to the copyrights directory.
# Rename the <code>SampleProgram1</code> and <code>SampleProgram2</code> directories.
# Create any additional directories as necessary.
#* Recursively copy the structure/Makefiles from <code>SampleProgram1</code>.
# Update <code>SUBDIRS</code> in <code>Makefile</code> as necessary.
# Update <code>Makefile.inc</code>
## Update the <code>VERSION</code> as necessary.
## Replace all occurrences of <code>SAMPLE_PROGRAM</code> with an all-caps name for your toolset.
##* You can then use the <code>LIB_PATH_<your toolset name></code> environment variable to specify an alternate path to libStatGen for your toolset. In most cases you will not need to do this.
#* No other updates to <code>Makefile.inc</code> should be necessary.
# For each program you want to add:
## Move into the appropriate subdirectory.
##* No changes should be made to the program's <code>Makefile</code>.
## Add your program (cpp & h files) to the <code>src</code> subdirectory.
## Update src/Makefile
### Set EXE to the name of your program executable (replacing sampleProgram).
### Set TOOLBASE, SRCONLY, and HDRONLY as appropriate for specifying your program file names.
### Set any of the other optional settings as specified in the sample makefile.
##* No other changes should be necessary to src/Makefile.
## Add your tests to the <code>test</code> directory.
## Update test/Makefile as appropriate for specifying how to compile/run your tests.

= How To Use the APIs =
More documentation is coming soon; in the meantime, see http://genome.sph.umich.edu/wiki/Sam_Library_Usage_Examples

[[LibStatGen: ASP#API for Reading ASP Files| ASP APIs]]

[[LibStatGen: VCF#API for Reading VCF Files| VCF APIs]]