Line 1: |
Line 1: |
− | = Software Page Overview = | + | =Software= |
| + | Due to the increasing volume of next-generation sequencing and genotyping data, we have created a C++ library and a set of tools that use it. |
| | | |
| This page points to downloads, documentation, and papers for software written here at the [http://genome.sph.umich.edu Center for Statistical Genetics]. | | This page points to downloads, documentation, and papers for software written here at the [http://genome.sph.umich.edu Center for Statistical Genetics]. |
| | | |
| + | =StatGen C++ Software= |
| + | A library and a set of tools for handling and analyzing next-generation sequencing and genotyping data. |
| | | |
− | = [[Read Mapping]] = | + | == Download == |
| | | |
− | ==[[Karma|Karma]]==
| |
− | Our fast short read aligner, which generates [[Mapping Quality Scores]]
| |
| | | |
− | ==[[Karma-colorspace|Karma-ColorSpace]]== | + | == Library == |
− | QUICKSTART on mapping color space reads
| + | * [[C++ Library: libStatGen]] - Library with easy-to-use APIs for developing tools that process and analyze next-generation sequencing and genotyping data. Supports SAM/BAM, GLF, and FASTQ files. |
| | | |
− | ==[[Examples|Examples]]==
| |
− | Sample command lines with discussion
| |
| | | |
− | ==[[MapabilityScores]]== | + | == Tools == |
− | Definitions of various mappability scores adopted at UCSC genome browser.
| + | === SAM/BAM === |
| | | |
− | ==Evaluation of Mappers== | + | ==== General Tools ==== |
− | [[baseQualityCheck]] is a mature tool to calculate the observed base quality vs. empirical base quality. | + | *[[QPLOT]] - Calculate & plot summary statistics |
| + | *[[BamValidator]] – Check file format & print statistics |
| + | *[[C++ Executable: bam#convert|Convert]] – Convert between SAM & BAM |
| + | *[[C++ Executable: bam#writeRegion|WriteRegion]] – Write only reads in the specified region |
| + | *[[Pileup]] – Pile up every base, or only the bases in a specified region, and write VCF - <span style="color:#D2691E">Coming Soon</span> |
| + | *[[C++ Executable: bam#readIndexedBam|ReadIndexedBam]] - Read an indexed BAM file reference by reference, from reference ID -1 up to the maximum reference ID, and write it out as a SAM/BAM file |
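Pileup is listed as coming soon, so its interface is not documented here. As a rough illustration of what piling up bases involves, the sketch below walks one read's CIGAR string and appends the read base aligned to each reference position in a region. It assumes a simplified, hypothetical `AlignedRead` type (position, CIGAR, sequence) rather than libStatGen's record classes, and it ignores base qualities and VCF output.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Simplified aligned read (hypothetical type, not libStatGen's record class).
struct AlignedRead {
    long pos;           // 0-based leftmost reference position
    std::string cigar;  // e.g. "2S3M1D2M"
    std::string seq;    // read bases, including soft-clipped ones
};

// Appends one read's aligned bases to `pile`, keyed by 0-based reference
// position. M/=/X consume read and reference, I/S consume the read only,
// D/N consume the reference only, H/P consume neither. Positions outside
// [regionStart, regionEnd) are skipped.
void addToPileup(const AlignedRead& r, long regionStart, long regionEnd,
                 std::map<long, std::string>& pile) {
    long refPos = r.pos;
    std::size_t readPos = 0;
    std::size_t len = 0;  // operation length parsed from the CIGAR digits
    for (char c : r.cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            len = len * 10 + static_cast<std::size_t>(c - '0');
            continue;
        }
        switch (c) {
            case 'M': case '=': case 'X':
                for (std::size_t i = 0; i < len; ++i, ++refPos, ++readPos)
                    if (refPos >= regionStart && refPos < regionEnd &&
                        readPos < r.seq.size())
                        pile[refPos] += r.seq[readPos];
                break;
            case 'I': case 'S':
                readPos += len;
                break;
            case 'D': case 'N':
                refPos += static_cast<long>(len);
                break;
            default:  // H and P do not move either coordinate
                break;
        }
        len = 0;
    }
}
```

Calling `addToPileup` once per read in the region builds a map from position to the string of observed bases; a real pileup would also track qualities and strand before calling variants.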
| | | |
− | = Variant Calling =
| |
| | | |
− | ==[[glfSingle]]== | + | ==== Update the File ==== |
− | Variant calling for a single, deeply sequenced individual
| + | *[[RGMergeBam]] – Merge sorted BAM files adding Read Groups |
| + | *[[PolishBam]] – Add/Update header lines & add RG tag to each record |
| + | *[[TrimBam]] – Trim the ends of reads, changing the trimmed bases to ‘N’ & their qualities to ‘!’ |
| + | *[[C++ Executable: bam#filter|Filter]] – Soft clip read ends with too high a mismatch percentage, and mark a read unmapped if the quality of its mismatches is too high |
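TrimBam's description says trimmed read ends become 'N' with quality '!' (Phred+33 quality 0). A minimal sketch of that masking step on plain strings follows; `trimReadEnds` and its per-end trim counts are hypothetical names and parameters, not TrimBam's actual options, and reading and writing BAM records is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: masks the first `trimLeft` and last `trimRight` bases
// of a read in place. Masked bases become 'N' and their Phred+33 quality
// characters become '!' (quality 0). Trims longer than the read are clamped,
// so an over-long trim masks the whole read.
void trimReadEnds(std::string& sequence, std::string& quality,
                  std::size_t trimLeft, std::size_t trimRight) {
    const std::size_t len = std::min(sequence.size(), quality.size());
    const std::size_t left = std::min(trimLeft, len);
    const std::size_t right = std::min(trimRight, len);
    for (std::size_t i = 0; i < left; ++i) {
        sequence[i] = 'N';
        quality[i] = '!';
    }
    for (std::size_t i = len - right; i < len; ++i) {
        sequence[i] = 'N';
        quality[i] = '!';
    }
}
```

Because bases are masked rather than removed, the read length stays the same, so in this sketch the alignment position and CIGAR string need no adjustment.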
| | | |
− | ==[[glfTrio]]==
| |
− | Variant calling for a single, deeply sequenced nuclear family with two parents and one child
| |
| | | |
− | ==[[glfMultiples]]== | + | ==== Split the File ==== |
− | Variant calling for multiple, unrelated individuals
| + | *[[SplitBam]] – Split into 1 file per Read Group |
| + | *[[C++ Executable: bam#splitChromosome|SplitChromosome]] – Split into 1 file per Chromosome |
| | | |
− | = Variant Annotation =
| |
| | | |
− | ==[[vcfCodingSnps]]== | + | ==== Helper Tools to Print Readable Information ==== |
− | Annotate coding variants in a VCF file.
| + | *[[C++ Executable: bam#dumpHeader|DumpHeader]] - Print the file header to the screen. |
| + | *[[C++ Executable: bam#dumpRefInfo|DumpRefInfo]] - Print the reference information from the SAM/BAM header. |
| + | *[[C++ Executable: bam#dumpIndex|DumpIndex]] - Print the BAM index to the screen in a readable format. |
| + | *[[C++ Executable: bam#readReference|ReadReference]] - Print the reference string for the specified region to the screen. |
| | | |
− | = Quality Control Utilities =
| |
| | | |
− | == Validators ==
| |
| | | |
− | [[C++ Executable: fastQValidator|FastQValidator]] -- Check that a FASTQ file conforms to specification. | + | === FASTQ === |
| + | * [[FastQValidator|fastqValidator]] - Validate a FASTQ file |
| + | **Reports errors for badly formatted files |
| + | **Reports base composition statistics (% of reads with each base at each read index) |
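The two checks listed for fastqValidator can be illustrated with a small, self-contained sketch. It assumes the common four-line FASTQ layout (no wrapped sequence lines) and Phred+33 quality characters, and it checks only a few basic rules: the header starts with '@', the separator starts with '+', sequence and quality lengths match, and quality characters fall in '!'..'~'. `validateFastq` and `FastqSummary` are hypothetical names; the real validator is not shown here and checks more than this.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Summary produced by the hypothetical validateFastq() sketch.
struct FastqSummary {
    std::size_t records = 0;  // complete 4-line records read
    std::size_t errors = 0;   // format problems found (bad or truncated records)
    // counts[i][b] = reads with base b at read index i, b indexing "ACGTN";
    // any other character is counted as N. Only valid records contribute.
    std::vector<std::array<std::size_t, 5>> counts;
};

static int baseIndex(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return 4;
    }
}

FastqSummary validateFastq(std::istream& in) {
    FastqSummary s;
    std::string header, seq, plus, qual;
    while (std::getline(in, header)) {
        if (header.empty()) continue;  // tolerate blank lines between records
        if (!std::getline(in, seq) || !std::getline(in, plus) ||
            !std::getline(in, qual)) {
            ++s.errors;  // truncated final record
            break;
        }
        ++s.records;
        bool ok = header[0] == '@' && !plus.empty() && plus[0] == '+' &&
                  seq.size() == qual.size();
        for (char q : qual) ok = ok && q >= '!' && q <= '~';
        if (!ok) {
            ++s.errors;
            continue;
        }
        if (s.counts.size() < seq.size()) s.counts.resize(seq.size());
        for (std::size_t i = 0; i < seq.size(); ++i)
            ++s.counts[i][baseIndex(seq[i])];
    }
    return s;
}
```

For example, wrapping file contents in a `std::istringstream` and passing it to `validateFastq` yields the record count, the number of problems, and per-index base counts; dividing each count by the number of valid reads covering that index gives the percentages.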
| | | |
− | [[GenotypeIDcheck]] -- Check that mapped reads are consistent with known genotypes for each individual.
| |
| | | |
− | [[BamValidator]] -- Checks that a SAM/BAM file conforms to specification and generates some statistics on the file. | + | === Other Tools === |
| + | *[[VcfGenomeStat]] – Print the flanking sequences of the sites in an input VCF file and how often they appear |
| | | |
− | == File Readers == | + | =Other Tools= |
| | | |
− | [[C++ Library: libbam|BamFile]] -- Reads a BAM/SAM file. | + | == [[Read Mapping]] == |
| + | *[[Karma|Karma]] - Our fast short read aligner, which generates [[Mapping Quality Scores]] |
| + | *[[Karma-colorspace|Karma-ColorSpace]] - Quickstart guide to mapping color-space reads |
| + | *[[baseQualityCheck]] - A mature tool that compares observed base quality with empirical base quality (helps to evaluate mappers) |
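One common way to compare reported base quality with empirical quality is to bin bases by their reported Phred score and convert each bin's observed mismatch rate back to the Phred scale, Q_emp = -10 log10(mismatches / total). The sketch below assumes mismatches are counted against the reference at sites believed to carry no true variation; `EmpiricalQuality` is a hypothetical class, not baseQualityCheck's code.

```cpp
#include <cmath>
#include <cstddef>
#include <map>

// Mismatch tally for one reported quality score.
struct QualTally {
    std::size_t total = 0;
    std::size_t mismatches = 0;
};

// Hypothetical accumulator: each aligned base is recorded with its reported
// Phred quality and whether it mismatched the reference.
class EmpiricalQuality {
public:
    void add(int reportedQ, bool mismatch) {
        QualTally& t = tallies_[reportedQ];
        ++t.total;
        if (mismatch) ++t.mismatches;
    }

    // Empirical Phred quality for one reported score: -10 * log10(mismatch rate).
    // Returns NaN if no bases were seen with that score, and +infinity if
    // bases were seen but none mismatched.
    double empiricalQ(int reportedQ) const {
        auto it = tallies_.find(reportedQ);
        if (it == tallies_.end() || it->second.total == 0) return std::nan("");
        const double rate = static_cast<double>(it->second.mismatches) /
                            static_cast<double>(it->second.total);
        return -10.0 * std::log10(rate);
    }

private:
    std::map<int, QualTally> tallies_;
};
```

For example, if bases reported at Q30 mismatch the reference 1% of the time, their empirical quality is Q20, meaning the reported scores are overconfident; a well-calibrated mapper and base caller keep the two values close.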
| | | |
− | [[C++ Library: libfqf|FastQFile]] -- Read a FASTQ file sequence by sequence. Validating the sequence as it is read. | + | *[[Examples|Examples]] - Sample command lines with discussion |
| + | |
| + | *[[MapabilityScores]] - Definitions of various mappability scores adopted by the UCSC Genome Browser. |
| + | |
| + | |
| + | |
| + | ==SAM/BAM== |
| + | *[[VerifyBamID]] – Check sample identities for contamination/sample swap |
| + | **Genotype concordance based detection |
| + | **Estimate based on population allele frequencies without genotype data |
| + | *Recalibrator – Resource-efficient tool that recalibrates base qualities based on an adaptive logistic regression model - <span style="color:#D2691E">Available upon request</span> |
| + | *Deduper – Mark or remove duplicates - <span style="color:#D2691E">Coming Soon</span> |
| + | |
| + | == Variant Calling == |
| + | * [[glfSingle]] - Variant calling for a single, deeply sequenced individual |
| + | * [[glfTrio]] - Variant calling for a single, deeply sequenced nuclear family with two parents and one child |
| + | * [[glfMultiples]] - Variant calling for multiple, unrelated individuals |
| + | |
| + | == Variant Annotation == |
| + | *[[vcfCodingSnps]] - Annotate coding variants in a VCF file. |
| + | |
| + | == Quality Control == |
| + | *[[GenotypeIDcheck]] - Check that mapped reads are consistent with known genotypes for each individual. |
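A simple way to express consistency between mapped reads and known genotypes is a concordance rate: call a genotype from the reads at each site, compare it with the known genotype (for example, from a genotyping array), and report the fraction that match. A low rate suggests a sample swap. The sketch assumes genotypes are already coded as alternate-allele counts; `genotypeConcordance` is a hypothetical helper and is not presented as GenotypeIDcheck's method.

```cpp
#include <cstddef>
#include <vector>

// Genotypes are coded as alternate-allele counts (0, 1, 2); -1 means missing.
// Returns the fraction of sites called in both vectors where the read-based
// call equals the known genotype, or 0.0 if no site can be compared.
double genotypeConcordance(const std::vector<int>& fromReads,
                           const std::vector<int>& known) {
    std::size_t compared = 0;
    std::size_t matched = 0;
    const std::size_t n =
        fromReads.size() < known.size() ? fromReads.size() : known.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (fromReads[i] < 0 || known[i] < 0) continue;  // skip missing calls
        ++compared;
        if (fromReads[i] == known[i]) ++matched;
    }
    return compared == 0 ? 0.0 : static_cast<double>(matched) / compared;
}
```

Skipping sites that are missing on either side keeps low coverage from being counted as disagreement; in practice, read-based calls at low depth are noisy, so a threshold on this rate is only a rough screen.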
| | | |
| == File Conversion == | | == File Conversion == |
− | | + | *[[bam2FastQ]] - Convert BAM files into FastQ files |
− | [[bam2FastQ]] -- Convert BAM files into FastQ files | |
| | | |
| | | |
| = [[Links to Sequence Analysis Tools|Other Useful Links]] = | | = [[Links to Sequence Analysis Tools|Other Useful Links]] = |