Difference between revisions of "Software"
Revision as of 16:39, 12 October 2011
Software
Due to the increasing volume of next generation sequencing and genotyping data, we have created a C++ library and tools that use that library.
This page points to downloads, documentation, and papers for software written here at the Center for Statistical Genetics.
If you have any questions or comments, please email Mary Kate Trost (mktrost@umich.edu).
StatGen C++ Software
We have developed a C++ library and tools for handling and analyzing next generation sequencing and genotyping data.
Library
The library contains easy-to-use APIs for developing tools that process and analyze next generation sequencing and genotyping data. It allows easy processing of SAM/BAM, GLF, and FASTQ files (VCF support is coming).
More information on the library can be found at: C++ Library: libStatGen
The library can be downloaded at: libStatGen Download
Programs/Tools
Follow the program links for more information on obtaining the tool. Some tools are packaged together.
SAM/BAM
General Tools
- QPLOT - Calculate & plot summary statistics
- Validate – Check file format & print statistics
- VerifyBamID – Check sample identities for contamination/sample swap
  - Genotype concordance based detection
  - Estimate based on population allele frequencies without genotype data
- Diff - Print the diffs between 2 bams
- Stats - Generate some statistics for a SAM/BAM file
Rewrite SAM/BAM file
- Convert – Convert between SAM & BAM
- SplitBam – Split into 1 file per Read Group
- SplitChromosome – Split into 1 file per Chromosome
- WriteRegion – Write only reads that are in the specified region and/or have the specified read name
- Pileup – Pileup every base or just bases in specified region and write VCF
- ReadIndexedBam - Read an indexed BAM file one reference at a time, from reference id -1 to the max reference id, and write it out as a SAM/BAM file
- BAM Recovery - Recover corrupted BAM files
Update the File
- SuperDeDuper - Determine duplicate alignments, either marking or removing the lower quality duplicates. In addition, it may modify paired-end reads where the ends overlap by soft clipping the end with the lower quality bases in the region of overlap.
- RGMergeBam – Merge sorted BAM files adding Read Groups
- PolishBam – Add/Update header lines & add RG tag to each record
- TrimBam – Trim end of reads, changing read ends to ‘N’ & quality to ‘!’
- Filter – Soft clip ends with too high mismatch % and mark unmapped if quality of mismatches is too high
- Revert - Revert a SAM/BAM file, replacing the specified fields with their previous values (if known) and removing the specified tags
- Squeeze - Reduce file size by dropping OQ fields, duplicates, and specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers
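The masking described for TrimBam above (read ends changed to 'N', qualities changed to '!') can be illustrated with a minimal C++ sketch. This is not the TrimBam source and does not use the libStatGen API: the `Read` struct and the `trimEnds` function are hypothetical names introduced for illustration, and the sketch assumes the same number of bases is masked at both ends of the read.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical record type for illustration only (not libStatGen's record class).
struct Read {
    std::string sequence;  // bases, e.g. "ACGT"
    std::string quality;   // Phred+33 quality string, normally the same length
};

// Mask one position: the base becomes 'N' and, if a quality value exists
// at that position, the quality becomes '!' (Phred 0).
static void maskPosition(Read& read, std::size_t i) {
    read.sequence[i] = 'N';
    if (i < read.quality.size()) {
        read.quality[i] = '!';
    }
}

// Mask `count` bases at each end of the read (assumption: both ends are
// treated the same). Reads no longer than `count` are masked completely.
void trimEnds(Read& read, std::size_t count) {
    const std::size_t len = read.sequence.size();

    // Left end: positions [0, min(count, len)).
    const std::size_t leftEnd = std::min(count, len);
    for (std::size_t i = 0; i < leftEnd; ++i) {
        maskPosition(read, i);
    }

    // Right end: positions [len - count, len), clamped at 0.
    const std::size_t rightStart = (len > count) ? len - count : 0;
    for (std::size_t i = rightStart; i < len; ++i) {
        maskPosition(read, i);
    }
}
```

Note that the read length is unchanged by this kind of trimming; only the base and quality characters at the ends are overwritten, so alignment positions in the record stay valid.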
Helper Tools to Print Readable Information
- DumpHeader - Print the File Header to the screen.
- DumpRefInfo - Print the reference information from the SAM/BAM header.
- DumpIndex - Print the BAM Index to the screen in a readable format
- ReadReference - Print the reference string for the specified region to the screen.
FASTQ
- fastqValidator - validate a FASTQ file
  - Reports errors for badly formatted files
  - Reports Base Composition Statistics (%reads at each read index)
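As a rough illustration of what a base composition statistic per read index means, the following C++ sketch counts how often each base appears at each position across a set of reads. It is not the fastqValidator implementation: `BaseCounts`, `baseIndex`, and `baseComposition` are hypothetical names, and the sketch assumes the read sequences have already been extracted from the FASTQ file into strings.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Counts of each base observed at one read index.
// Slot order (an assumption of this sketch): A, C, G, T, other (e.g. N).
struct BaseCounts {
    std::array<std::size_t, 5> counts{};

    // Number of reads that have a base at this index.
    std::size_t total() const {
        std::size_t sum = 0;
        for (std::size_t c : counts) sum += c;
        return sum;
    }

    // Percentage of reads with the given base slot at this index.
    double percent(std::size_t slot) const {
        const std::size_t n = total();
        return n == 0 ? 0.0 : 100.0 * static_cast<double>(counts[slot]) / n;
    }
};

// Map a base character to its slot in BaseCounts::counts.
static std::size_t baseIndex(char base) {
    switch (base) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return 4;
    }
}

// Tally base composition at every read index. Reads may differ in length;
// shorter reads simply contribute nothing to the later indexes.
std::vector<BaseCounts> baseComposition(const std::vector<std::string>& reads) {
    std::vector<BaseCounts> composition;
    for (const std::string& read : reads) {
        if (read.size() > composition.size()) {
            composition.resize(read.size());
        }
        for (std::size_t i = 0; i < read.size(); ++i) {
            ++composition[i].counts[baseIndex(read[i])];
        }
    }
    return composition;
}
```

Because the percentage at each index is computed against the number of reads that actually reach that index, reads of mixed lengths do not dilute the statistics for later positions.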
Other Tools
- createUMref - Create the University of Michigan formatted reference used by many of our tools
- thunderVCF
- vcfCooker – Manipulate, filter, summarize VCF/BED file in various forms
- VcfGenomeStat – Print flanking sequences and how often they appear for input VCF file
Requested Tools
Other Tools
Read Mapping
- Karma - Our fast short read aligner, which generates Mapping Quality Scores
- Karma-ColorSpace - QUICKSTART on mapping color space reads
- baseQualityCheck - a mature tool to calculate the observed base quality vs. empirical base quality (helps to evaluate mappers)
- Examples - Sample command lines with discussion
- MapabilityScores - Definitions of the various mappability scores adopted by the UCSC Genome Browser.
SAM/BAM
- Recalibrator – Resource-efficient tool, which recalibrates base qualities based on an adaptive logistic regression model - Available upon request
- Deduper – Mark or remove duplicates - Coming Soon
Variant Calling
- glfSingle - Variant calling for a single, deeply sequenced individual
- glfTrio - Variant calling for a single, deeply sequenced nuclear family with two parents and one child
- glfMultiples - Variant calling for multiple, unrelated individuals
- polymutt - Variant and de novo mutation detection in families (nuclear or extended pedigrees) from sequencing
Variant Annotation
- vcfCodingSnps - Annotate coding variants in a VCF file.
Quality Control
- GenotypeIDcheck - Check that mapped reads are consistent with known genotypes for each individual.
File Conversion
- bam2FastQ - Convert BAM files into FastQ files
Other Useful Links
Links to Sequence Analysis Tools
Other
ASHG 2010 Poster: C++ library & tools for next generation sequence data