Difference between revisions of "Software"

From Genome Analysis Wiki

Revision as of 02:10, 2 November 2010


Software

Due to the increasing volume of next generation sequencing and genotyping data, we have created a C++ library and a set of tools that use that library.

This page points to downloads, documentation, and papers for software written here at the Center for Statistical Genetics.

StatGen C++ Software

A library and set of tools developed for handling and analyzing next generation sequencing and genotyping data.

Download

Click the link to download the tar of the statgen library and tools: StatGen.0.1.0.tgz

If you use this software, please e-mail me, Mary Kate Trost, at mktrost@umich.edu

This version is recommended for Unix users with access to the GNU C++ compiler.

To install, unpack the downloaded file (tar xvf) and type make.

Library

  • C++ Library: libStatGen - Library containing easy-to-use APIs for developing tools that process and analyze next generation sequencing and genotyping data. Allows easy processing of SAM/BAM, GLF, and FASTQ files.


Tools

SAM/BAM

General Tools

  • QPLOT - Calculate & plot summary statistics
  • BamValidator – Check file format & print statistics
  • Convert – Convert between SAM & BAM
  • WriteRegion – Write only reads in the specified region
  • Pileup – Pile up every base, or only the bases in a specified region, and write VCF (coming soon)
  • ReadIndexedBam - Read an indexed BAM file one reference at a time, from reference ID -1 through the maximum reference ID, and write it out as a SAM/BAM file


Update the File

  • RGMergeBam – Merge sorted BAM files adding Read Groups
  • PolishBam – Add/Update header lines & add RG tag to each record
  • TrimBam – Trim end of reads, changing read ends to ‘N’ & quality to ‘!’
  • Filter – Soft clip read ends whose mismatch percentage is too high, and mark the read as unmapped if the quality of its mismatches is too high
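
The masking that TrimBam is described as performing (read ends changed to ‘N’, their qualities changed to ‘!’) can be illustrated with a short Python sketch. This is not TrimBam's code or command line: the function name trim_read and its trim_len parameter are hypothetical, and the sketch assumes the same number of bases is masked at both ends of the read.

```python
from typing import Tuple


def trim_read(sequence: str, quality: str, trim_len: int) -> Tuple[str, str]:
    """Mask trim_len bases at each end of a read (hypothetical helper).

    Masked bases become 'N' and their qualities become '!', so the read
    keeps its original length.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings must have equal length")

    n = len(sequence)
    k = max(0, min(trim_len, n))  # clamp the request to a sensible range

    # If the two masked ends meet or overlap, the whole read is masked.
    if 2 * k >= n:
        return "N" * n, "!" * n

    kept = slice(k, n - k)  # the untouched middle of the read
    masked_sequence = "N" * k + sequence[kept] + "N" * k
    masked_quality = "!" * k + quality[kept] + "!" * k
    return masked_sequence, masked_quality
```

For example, trim_read("ACGTACGT", "IIIIIIII", 2) gives ("NNGTACNN", "!!IIII!!"). Masking rather than deleting bases keeps the read length unchanged, so the other fields of the record that depend on that length still describe it.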


Split the File


Helper Tools to Print Readable Information

  • DumpHeader - Print the File Header to the screen.
  • DumpRefInfo - Print the reference information from the SAM/BAM header.
  • DumpIndex - Print the BAM Index to the screen in a readable format
  • ReadReference - Print the reference string for the specified region to the screen.


FASTQ

  • fastqValidator - validate a FASTQ file
    • Reports errors for badly formatted files
    • Reports Base Composition Statistics (%reads at each read index)
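
The two kinds of report listed above (format errors and per-index base composition) can be sketched in Python as follows. This is an illustration only, not fastqValidator's implementation: the function names parse_fastq and base_composition are hypothetical, and the sketch assumes plain four-line FASTQ records with no blank lines.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) per record; raise ValueError if malformed."""
    rows = [line.rstrip("\n") for line in lines]
    if len(rows) % 4 != 0:
        raise ValueError("file ends with an incomplete record")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        line_no = i + 1  # 1-based line number of the record's header
        if not header.startswith("@"):
            raise ValueError(f"line {line_no}: header must start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"line {line_no + 2}: separator must start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"line {line_no + 3}: quality length differs from sequence length")
        yield header[1:], seq, qual


def base_composition(sequences: Iterable[str]) -> List[Dict[str, float]]:
    """For each read index, return the percentage of reads carrying each base."""
    counts: List[Counter] = []
    for seq in sequences:
        for idx, base in enumerate(seq):
            if idx == len(counts):
                counts.append(Counter())  # first read long enough to reach this index
            counts[idx][base.upper()] += 1
    # Percentages are relative to the reads that are long enough to cover each index.
    return [
        {base: 100.0 * n / sum(c.values()) for base, n in c.items()}
        for c in counts
    ]
```

With the two records (ACGT, AGGT), base_composition reports 100% A at index 0 and 50% C, 50% G at index 1. Reading the whole input into memory keeps the sketch short; for large files the records would be consumed as a stream instead.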


Other Tools

  • VcfGenomeStat – Print flanking sequences and how often they appear for input VCF file

Other Tools

Read Mapping

  • Examples - Sample command lines with discussion
  • MapabilityScores - Definitions of the various mappability scores used by the UCSC Genome Browser.


SAM/BAM

  • VerifyBamID – Check sample identities for contamination/sample swap
    • Genotype concordance based detection
    • Estimate based on population allele frequencies without genotype data
  • Recalibrator – Resource-efficient tool that recalibrates base qualities using an adaptive logistic regression model (available upon request)
  • Deduper – Mark or remove duplicates (coming soon)

Variant Calling

  • glfSingle - Variant calling for a single, deeply sequenced individual
  • glfTrio - Variant calling for a single, deeply sequenced nuclear family with two parents and one child
  • glfMultiples - Variant calling for multiple, unrelated individuals

Variant Annotation

Quality Control

  • GenotypeIDcheck - Check that mapped reads are consistent with known genotypes for each individual.

File Conversion

  • bam2FastQ - Convert BAM files into FastQ files


Other Useful Links