
From Genome Analysis Wiki
11,277 bytes added, 15:54, 31 January 2017
[[Category:C++]]
[[Category:libStatGen]]

= Description =
libStatGen provides open-source, freely available (GPL-licensed), easy-to-use C++ APIs:
* General operation classes, including:
** File/Stream I/O – uncompressed, BGZF, GZIP, stdin, stdout
** String processing
** Parameter parsing
* '''Statistical genetics-specific classes''', including:
** Handling common file formats – SAM/BAM, FASTQ, GLF, VCF (coming soon)
*** Accessors to get/set values
*** Indexed access to BAM files
** Utility classes, including:
*** Cigar – interpretation and mapping between query and reference
*** Pileup – structured access to data by individual reference position
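The Cigar and Pileup utility classes listed above handle CIGAR interpretation and per-position access for you. To show roughly what that bookkeeping involves, here is a minimal standalone C++ sketch, assuming the CIGAR rules from the SAM specification (M, =, X consume both query and reference; I and S consume only the query; D and N consume only the reference; H and P consume neither). It does not use libStatGen: <code>CigarOp</code>, <code>AlignedRead</code>, <code>parseCigar</code>, <code>queryToReference</code>, and <code>depthByPosition</code> are hypothetical names, not the library's API, and positions are 1-based like the SAM POS field.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical standalone sketch: these names are NOT part of libStatGen's API.

// One CIGAR operation, e.g. "3M" -> {'M', 3}.
struct CigarOp {
    char op;
    int length;
};

// A read's alignment: 1-based leftmost reference position (SAM POS) and CIGAR.
struct AlignedRead {
    int refStart;
    std::string cigar;
};

// Parse a CIGAR string such as "2S3M1I" into its operations.
std::vector<CigarOp> parseCigar(const std::string& cigar) {
    const std::string validOps = "MIDNSHP=X";
    std::vector<CigarOp> ops;
    int length = 0;
    bool haveDigits = false;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            length = length * 10 + (c - '0');
            haveDigits = true;
        } else if (haveDigits && validOps.find(c) != std::string::npos) {
            ops.push_back({c, length});
            length = 0;
            haveDigits = false;
        } else {
            throw std::invalid_argument("malformed CIGAR: " + cigar);
        }
    }
    if (haveDigits) {
        throw std::invalid_argument("CIGAR ends without an operation: " + cigar);
    }
    return ops;
}

// Map each query (read) base to its 1-based reference position, or -1 when
// the base is not aligned to the reference (insertion or soft clip).
std::vector<int> queryToReference(const std::string& cigar, int refStart) {
    std::vector<int> mapping;
    int refPos = refStart;
    for (const CigarOp& c : parseCigar(cigar)) {
        switch (c.op) {
            case 'M': case '=': case 'X':   // consumes query and reference
                for (int i = 0; i < c.length; ++i) mapping.push_back(refPos++);
                break;
            case 'I': case 'S':             // consumes query only
                for (int i = 0; i < c.length; ++i) mapping.push_back(-1);
                break;
            case 'D': case 'N':             // consumes reference only
                refPos += c.length;
                break;
            default:                        // 'H' and 'P' consume neither
                break;
        }
    }
    return mapping;
}

// Pileup-style depth: for each position in [regionStart, regionEnd], count the
// reads with an aligned base (M, =, X) there. Deletions are not counted.
std::vector<int> depthByPosition(const std::vector<AlignedRead>& reads,
                                 int regionStart, int regionEnd) {
    assert(regionStart >= 1 && regionStart <= regionEnd);
    std::vector<int> depth(regionEnd - regionStart + 1, 0);
    for (const AlignedRead& read : reads) {
        for (int pos : queryToReference(read.cigar, read.refStart)) {
            if (pos != -1 && pos >= regionStart && pos <= regionEnd) {
                ++depth[pos - regionStart];
            }
        }
    }
    return depth;
}
```

For example, a read at position 100 with CIGAR <code>2S3M1I2M2D1M</code> maps its nine bases to -1, -1, 100, 101, 102, -1, 103, 104, 107, where -1 marks soft-clipped or inserted bases. Counting only M/=/X bases in <code>depthByPosition</code> is one design choice; a full pileup may also want to report deletions at a position.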

The library can be used to create your own C++ programs.

Currently, the repository is recommended for Unix/Linux users with access to the GNU C++ compiler.


= Copyrights =
'''If you use this software, please e-mail me, Mary Kate Wing, at mktrost@umich.edu'''

Here are links to the copyrights for our code and some of the utilities it uses:
*[https://github.com/statgen/libStatGen/blob/master/general/COPYING GNU GENERAL PUBLIC LICENSE] and [https://github.com/statgen/libStatGen/blob/master/general/LICENSE.txt Our Copyright Note]
*[https://github.com/statgen/libStatGen/blob/master/general/LICENSE.twister Copyright for MERSENNE TWISTER (used in Random.cpp)]
*[https://github.com/statgen/libStatGen/blob/master/samtools/COPYING Samtools Copyright (MIT License)]
Copies of these can be found in our library under libStatGen/copyrights/.

= Join the libStatGen mailing list =

Please join the [http://groups.google.com/group/libStatGen libStatGen Google Group] to ask questions about, discuss, or comment on this library.


= Troubleshooting =
If you are having trouble compiling any of the versions, check [[libStatGen Troubleshooting]] for help.  If that does not solve your problem, email me for support.


= Where to Find It =

{{ToolGitRepo|repoName=libStatGen|libStatGen=true|libBaseName=libStatGen}}

== Releases ==
Released versions are documented at [[libStatGen Download]].

= What has changed =
The <code>pipeline</code> and <code>statgen</code> repositories have been deprecated, so please update to our new framework.

<code>libStatGen</code> is the new git repository for our library code.

There are now separate repositories for specific tools and groups of tools.  Tracking them separately makes it easier to follow changes that affect a specific tool or the library in general.


= Library Documentation =
Latest Doxygen documentation:
<!-- <a href="http://csg.sph.umich.edu//abecasis/GOLD/ -->
<!-- [http://www.sph.umich.edu/csg/mktrost/doxygen/current/ Current Library Documentation in Doxygen] -->
[http://csg.sph.umich.edu//mktrost/doxygen/current/ Current Library Documentation in Doxygen]

Additional documentation:
* [[libStatGen: general]] - General classes for file processing and performing common tasks (used by most other libraries).
* [[libStatGen: BAM]] - Classes specific for reading/writing/analyzing SAM/BAM files.
* [[libStatGen: GLF]] - Classes specific for reading/writing/analyzing GLF files.
* [[libStatGen: FASTQ]] - Classes specific for reading/writing/analyzing FastQ files.
* [[libStatGen: ASP]] - Classes specific for reading/writing/analyzing ASP files.
* [[libStatGen: VCF]] - Classes specific for reading/writing/analyzing VCF files.

= Using the Library =
== Dependencies ==
* This software requires the following to be installed:
** g++
** development version of zlib (zlib1g-dev on Ubuntu)
* Compiles on Linux/Unix

== Building the Library ==

Typing <code>make help</code> displays the build options:
<pre>
Makefile help
-------------
Type...           To...
make              Compile opt
make help         Display this help screen
make all          Compile everything (opt, debug, & profile)
make opt          Compile optimized
make debug        Compile for debug
make profile      Compile for profile
make clean        Delete temporary files
make test         Execute tests (if there are any)
</pre>

Typing just <code>make</code> builds <code>opt</code> (optimized) by default.

<code>make all</code> builds all three: opt, debug, and profile.

<code>opt</code> creates <code>libStatGen.a</code>, <code>debug</code> creates <code>libStatGen_debug.a</code>, and <code>profile</code> creates <code>libStatGen_profile.a</code>.

These libraries are created in the top-level libStatGen directory, so you can link against whichever one matches how you are building your tool (optimized, debug, or profile).

== Navigating the Library Subdirectories ==
Under the main libStatGen repository, there are:
*bam - library code for operating on BAM files.
*copyrights - copyrights for the library and any code included with it.
*fastq - library code for operating on FASTQ files.
*general - library code for general operations.
*glf - library code for operating on GLF files.
*include - after compiling, the library headers are linked here.
*Makefiles - directory containing Makefiles that are used in the library and can be used for developing programs using the library.
*samtools - library code used from samtools.

After compiling, the libraries (<code>libStatGen.a</code>, <code>libStatGen_debug.a</code>, and/or <code>libStatGen_profile.a</code>, depending on what you built) are created at the top level.

=== bam, fastq, general, glf, samtools ===
Object files are placed in an <code>obj</code> directory under each subdirectory, with debug and profile objects in <code>obj/debug</code> and <code>obj/profile</code>.

Most also have a <code>test</code> directory; tests are executed by running <code>make test</code>.

=== Makefiles ===
This directory contains base makefiles and makefile settings that are used by the library and by programs being written to use the library.

== Using the Library in Your Own Program ==

=== Starting from a Sample Program (Recommended) ===
[https://github.com/statgen/SampleProgram https://github.com/statgen/SampleProgram] is a simple program demonstrating how to write a tool that uses libStatGen and can be used as a starting point for your tool.

SampleProgram has 4 subdirectories:
* copyrights - contains the copyright information; add your own copyrights as necessary.
* obj - this directory is where the object files are placed when the code is compiled (with a subdirectory for debug and profile objects).
* src - this is where your own program code goes.
* test - this is where your test code goes.  Test code can be set up to run with <code>make test</code> to ensure the program works properly.

'''Using SampleProgram as a starting point for your tool:'''
# Copy SampleProgram into a directory with your program name (it is the starting point for your own program).
# Update ChangeLog, .gitignore, and README.txt as appropriate.
# Add any necessary copyrights to the copyrights directory.
#* No changes to Makefile should be necessary.
# Update Makefile.inc
## Update the VERSION as necessary.
## Replace all occurrences of <code>SAMPLE_PROGRAM</code> with an all-caps name for your program.
##*  You can then use the <code>LIB_PATH_<your program name></code> environment variable to specify an alternate path to libStatGen specific to your program.  In most cases you will not need to do this.
#* No other updates to Makefile.inc should be necessary.
# Add your program (cpp & h files) to the <code>src</code> directory.
# Update src/Makefile
## Set EXE to your program executable (replacing sampleProgram)
## Set TOOLBASE, SRCONLY, and HDRONLY as appropriate for specifying your program file names.
## Set any of the other optional settings as specified in the sample makefile.
#* No other changes should be necessary to src/Makefile.
# Add your tests to the <code>test</code> directory.
# Update test/Makefile as appropriate for specifying how to compile/run your tests.


After compiling, a <code>bin</code> directory is created in the top-level directory, and your executable is placed there.  If you build for <code>debug</code> and/or <code>profile</code>, subdirectories for those are created under <code>bin/</code> and <code>obj/</code>.


=== Working from Scratch ===
When compiling your code, be sure to include the library header files found in <code>libStatGen/include/</code> and link in the appropriate library (opt: <code>libStatGen.a</code>, debug: <code>libStatGen_debug.a</code>, or profile: <code>libStatGen_profile.a</code>).  Since the library depends on zlib, you will typically also need to link zlib (for example, with <code>-lz</code>).


=== Starting from a Sample Set of Tools ===
[https://github.com/statgen/SampleTools https://github.com/statgen/SampleTools] is a repository containing multiple programs within one directory structure.  It demonstrates how to have subdirectories for each tool using libStatGen and can be used as a starting point for your set of tools.

SampleTools has 3 subdirectories:
* copyrights - contains the copyright information; add your own copyrights as necessary.
* SampleProgram1 - a dummy demo program to show the structure for having multiple programs.
* SampleProgram2 - a second dummy demo program to show the structure for having multiple programs.

SampleProgram1 and SampleProgram2 each have 2 subdirectories:
* src - this is where your own program code goes.
* test - this is where your test code goes.  Test code can be set up to run with <code>make test</code> to ensure the program works properly.

Upon compiling, an <code>obj</code> directory is created under <code>SampleProgram1</code> and <code>SampleProgram2</code>, and a <code>bin</code> directory is created at the top level.  If you build for <code>debug</code> and/or <code>profile</code>, subdirectories for those are created under <code>bin/</code> and <code>SampleProgram1(2)/obj</code>.


'''Using SampleTools as a starting point for your set of tools:'''
# Copy <code>SampleTools</code> into a directory with your toolset name (it is the starting point for your own set of tools).
# Update <code>ChangeLog</code>, <code>.gitignore</code>, and <code>README.txt</code> as appropriate.
# Add any necessary copyrights to the copyrights directory.
# Rename the <code>SampleProgram1</code> and <code>SampleProgram2</code> directories.
# Create any additional directories as necessary.
#* Recursively copy the structure/Makefiles from <code>SampleProgram1</code>.
# Update <code>SUBDIRS</code> in <code>Makefile</code> as necessary.
# Update <code>Makefile.inc</code>
## Update the <code>VERSION</code> as necessary.
## Replace all occurrences of <code>SAMPLE_PROGRAM</code> with an all-caps name for your toolset.
##*  You can then use the <code>LIB_PATH_<your toolset name></code> environment variable to specify an alternate path to libStatGen specific to your toolset.  In most cases you will not need to do this.
#* No other updates to <code>Makefile.inc</code> should be necessary.
# For each program you want to add:
## Move into the appropriate subdirectory.
##* No change should be made to the program's <code>Makefile</code>
## Add your program (cpp & h files) to the <code>src</code> subdirectory.
## Update src/Makefile
### Set EXE to your program executable (replacing sampleProgram)
### Set TOOLBASE, SRCONLY, and HDRONLY as appropriate for specifying your program file names.
### Set any of the other optional settings as specified in the sample makefile.
##* No other changes should be necessary to src/Makefile.
## Add your tests to the <code>test</code> directory.
## Update test/Makefile as appropriate for specifying how to compile/run your tests.


= How To Use the APIs =
More documentation is coming soon; in the meantime, see [http://genome.sph.umich.edu/wiki/Sam_Library_Usage_Examples Sam Library Usage Examples].

[[LibStatGen: ASP#API for Reading ASP Files| ASP APIs]]

[[LibStatGen: VCF#API for Reading VCF Files| VCF APIs]]