Line 1 (old) / Line 1 (new):
+ [[Category:Software]]
+ [[Category:C++]]
+
  =Software=
− Due to increasing volume of next generation sequencing and genotyping data, we have created these created C++ library and tools that use that library.
+ Due to the increasing volume of next generation sequencing and genotyping data, we have created a C++ library and a set of tools that use it.

  This page points to downloads, documentation, and papers for software that is written here at the [http://genome.sph.umich.edu Center for Statistical Genetics].
+
+ If you have any questions or comments, please raise an issue in [https://github.com/statgen/ our GitHub repositories].

  =StatGen C++ Software=
− A library and set of set of tools developed for handling and analyzing next generation sequencing and genotyping data.
−
− == Download ==

+ We have developed a C++ library and tools for handling and analyzing next generation sequencing and genotyping data.

  == Library ==
− * [[C++ Library: libStatGen]] - Library containing easy-to-use APIs for developing tools for processing and analyzing next generation sequencing and genotyping data. Allows easy processing of SAM/BAM, GLF, FASTQ.

+ The library contains easy-to-use APIs for developing tools that process and analyze next generation sequencing and genotyping data. It allows easy processing of SAM/BAM, GLF, and FASTQ files (VCF support is coming).

− == Tools ==
+ More information on the library can be found at: [[C++ Library: libStatGen]]
− === SAM/BAM ===

− ==== General Tools ====
+ The library can be downloaded at: [[libStatGen Download]]
− *[[QPLOT]] - Calculate & plot summary statistics
− *[[BamValidator]] – Check file format & print statistics
− *[[C++ Executable: bam#convert|Convert]] – Convert between SAM & BAM
− *[[C++ Executable: bam#writeRegion|WriteRegion]] – Write only reads in the specified region
− *[[Pileup]] – Pileup every base or just bases in specified region and write VCF - <span style="color:#D2691E">Coming Soon</span>
− *[[C++ Executable: bam#readIndexedBam|ReadIndexedBam]] - Read an indexed BAM file reference by reference id -1 to the max reference id and write it out as a SAM/BAM file

+ == Programs/Tools ==

− ==== Update the File ====
+ Follow the program links for more information on obtaining the tool. Some tools are packaged together.
− *[[RGMergeBam]] – Merge sorted BAM files adding Read Groups
− *[[PolishBam]] – Add/Update header lines & add RG tag to each record
− *[[TrimBam]] – Trim end of reads, changing read ends to ‘N’ & quality to ‘!’
− *[[C++ Executable: bam#filter|Filter]] – Soft clip ends with too high mismatch % and mark unmapped if quality of mismatches is too high

+ === SAM/BAM ===

− ==== Split the File ====
+ *[[QPLOT]] - Calculate & plot summary statistics
− *[[SplitBam]] – Split into 1 file per Read Group
+ *[[VerifyBamID]] – Check sample identities for contamination/sample swap
− *[[C++ Executable: bam#splitChromosome|SplitChromosome]] – Split into 1 file per Chromosome
+ **Genotype concordance based detection
−
+ **Estimate based on population allele frequencies without genotype data
−
+ *[[Pileup]] – Pileup every base or just bases in specified region and write VCF
− ==== Helper Tools to Print Readable Information ====
− *[[C++ Executable: bam#dumpHeader|DumpHeader]] - Print the File Header to the screen.
− *[[C++ Executable: bam#dumpRefInfo|DumpRefInfo]] - Print the reference information from the SAM/BAM header.
− *[[C++ Executable: bam#dumpIndex|DumpIndex]] - Print the BAM Index to the screen in a readable format
− *[[C++ Executable: bam#readReference|ReadReference]] - Print the reference string for the specified region to the screen.
−

+ ==== BAM Util Tools ====
+ {{BamUtilPrograms}}

  === FASTQ ===
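The TrimBam entry removed in this revision describes masking the ends of reads: bases become 'N' and base qualities become '!'. A minimal C++ sketch of that operation on SAM-style sequence and quality strings follows. It assumes a fixed number of bases is masked at both ends and that '!' stands for Phred quality 0 in the Phred+33 encoding; `trimReadEnds` is a hypothetical helper, not the actual API of TrimBam, bamUtil, or libStatGen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only; not part of libStatGen or bamUtil.
// Masks `numTrim` bases at each end of a read in place: bases become 'N' and
// base qualities become '!' (Phred 0 in the Phred+33 encoding used by SAM).
// A quality of "*" (SAM's marker for "qualities not stored") is left untouched.
void trimReadEnds(std::string& sequence, std::string& quality, std::size_t numTrim) {
    const bool hasQuality = (quality != "*");
    // Precondition: a stored quality string has exactly one character per base.
    assert(!hasQuality || quality.size() == sequence.size());

    const std::size_t len = sequence.size();
    for (std::size_t i = 0; i < len; ++i) {
        // True if position i lies within numTrim bases of either end.
        // Writing `i + numTrim >= len` avoids unsigned underflow of `len - numTrim`.
        const bool inTrimmedEnd = (i < numTrim) || (i + numTrim >= len);
        if (inTrimmedEnd) {
            sequence[i] = 'N';
            if (hasQuality) {
                quality[i] = '!';
            }
        }
    }
}
```

For example, masking 2 bases at each end of the read `ACGTACGT` with qualities `IIIIIIII` gives `NNGTACNN` and `!!IIII!!`. If `numTrim` covers the whole read, every base is masked.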
Line 51 (old) / Line 41 (new):
  **Reports Base Composition Statistics (%reads at each read index)

+
+ === Meta Analysis ===
+ * [[Rare-Metal-Worker|RAREMETALWORKER - generate summary level statistics for meta analysis using Rare-Metal]]
+ * [[Rare-Metal|RAREMETAL - perform genome-wide meta analysis of rare variants]]

  === Other Tools ===
+ *[[statgenTools#createUMref|createUMref - Create the University of Michigan formatted reference used by many of our tools]]
+ *[[Thunder|thunderVCF]]
+ *[[vcfCooker]] – Manipulate, filter, summarize VCF/BED file in various forms
  *[[VcfGenomeStat]] – Print flanking sequences and how often they appear for input VCF file

− =Other Tools=

− == [[Read Mapping]] ==
− *[[Karma|Karma]] - Our fast short read aligner, which generates [[Mapping Quality Scores]]
− *[[Karma-colorspace|Karma-ColorSpace]] - QUICKSTART on mapping color space reads
− *[[baseQualityCheck]] - a mature tool to calculate the observed base quality vs. empirical base quality (helps to evaluate mappers)

− *[[Examples|Examples]] - Sample command lines with discussion
+ === Requested Tools ===
−
− *[[MapabilityScores]] - Definitions of various mappability scores adopted at UCSC genome browser.

+ =Other Tools=
−
+ * [[samtools-hybrid]] - Since many of our tools still rely on GLF files and samtools stopped supporting GLF files, we created a version of samtools that still supports pileup to GLF files AND incorporates the updated BAQ logic. This version is called samtools-hybrid. The code can be downloaded at: https://github.com/statgen/samtools-0.1.7a-hybrid
− ==SAM/BAM==
+ *[[baseQualityCheck]] - a tool to calculate the observed base quality vs. empirical base quality (helps to evaluate mappers)
− *[[VerifyBamID]] – Check sample identities for contamination/sample swap
− **Genotype concordance based detection
− **Estimate based on population allele frequencies without genotype data
− *Recalibrator – Resource-efficient tool, which recalibrates base qualities based on an adaptive logistic regression model - <span style="color:#D2691E">Available upon request</span>
− *Deduper – Mark or remove duplicates - <span style="color:#D2691E">Coming Soon</span>
  == Variant Calling ==
  * [[glfSingle]] - Variant calling for a single, deeply sequenced individual
− * [[glfTrio]]- Variant calling for a single, deeply sequenced nuclear family with two parents and one child
  * [[glfMultiples]] - Variant calling for multiple, unrelated individuals
+ * [[Polymutt|polymutt]] - Variant and ''de novo'' mutation detection in families (nuclear or extended pedigrees) from sequencing

  == Variant Annotation ==
  *[[vcfCodingSnps]] - Annotate coding variants in a VCF file.

− == Quality Control ==
+ == Genotype Imputation ==
− *[[GenotypeIDcheck]] - Check that mapped reads are consistent with known genotypes for each individual.
+ *[[Minimac3]] - Fast and Efficient Genotype Imputation.

− == File Conversion ==
+ == Additional Pedigree & Sequence Analysis Tools ==
− *[[bam2FastQ]] - Convert BAM files into FastQ files
+ Can be found at: http://sph.umich.edu/csg/abecasis/software.html

+ = Other Useful Links =
+ [[Links to Sequence Analysis Tools]]

− = [[Links to Sequence Analysis Tools|Other Useful Links]] =
+ = Other =
+ ASHG 2010 Poster: [[Media:TrostASHG2010.pdf|C++ library & tools for next generation sequence data]]
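The baseQualityCheck entry describes comparing the base quality reported for each base with the empirical base quality. A common way to estimate empirical quality, given here as an assumption about the general technique and not as baseQualityCheck's actual method, is to group aligned bases by reported Phred quality and convert each group's mismatch rate back to the Phred scale, Q = -10 log10(mismatches / total). The names `BaseObservation`, `QualityBin`, and `empiricalQualities` are hypothetical. The sketch counts every mismatch as a sequencing error; a real tool would also need to exclude true variant sites.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical types and function for illustration only; baseQualityCheck's
// real inputs, algorithm, and output format are not described on this page.

// One aligned base: its reported Phred quality and whether it mismatched
// the reference at that position.
struct BaseObservation {
    int reportedQuality;
    bool mismatch;
};

// Counts for one reported-quality bin.
struct QualityBin {
    std::size_t total = 0;
    std::size_t mismatches = 0;
};

// Maps each reported quality to its empirical Phred quality,
//   Q_emp = -10 * log10(mismatches / total).
// Bins without any mismatch are omitted because the estimate is unbounded.
// Assumption: every mismatch is a sequencing error (true variants are not
// filtered out here).
std::map<int, double> empiricalQualities(const std::vector<BaseObservation>& observations) {
    std::map<int, QualityBin> bins;
    for (const BaseObservation& obs : observations) {
        QualityBin& bin = bins[obs.reportedQuality];
        ++bin.total;
        if (obs.mismatch) {
            ++bin.mismatches;
        }
    }

    std::map<int, double> empirical;
    for (const auto& entry : bins) {
        const QualityBin& bin = entry.second;
        if (bin.mismatches == 0) {
            continue;
        }
        const double errorRate =
            static_cast<double>(bin.mismatches) / static_cast<double>(bin.total);
        empirical[entry.first] = -10.0 * std::log10(errorRate);
    }
    return empirical;
}
```

For example, 1 mismatch among 100 bases reported at Q30 gives an empirical quality of 20, meaning those bases are less accurate than their reported quality claims.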