Line 3 (old) / Line 3 (new):

 =Software=
−Due to increasing volume of next generation sequencing and genotyping data, we have created these created C++ library and tools that use that library.
+Due to the increasing volume of next generation sequencing and genotyping data, we have created this C++ library and tools that use that library.

 This page points to downloads, documentation, and papers for software that is written here at the [http://genome.sph.umich.edu Center for Statistical Genetics]
+
+If you have any questions or comments, please raise an issue in [https://github.com/statgen/ our GitHub repositories].

 =StatGen C++ Software=
Line 25 (old) / Line 27 (new):
 === SAM/BAM ===

−==== General Tools ====
 *[[QPLOT]] - Calculate & plot summary statistics
−*[[BamUtil: validate|Validate]] – Check file format & print statistics
 *[[VerifyBamID]] – Check sample identities for contamination/sample swap
 **Genotype concordance based detection
 **Estimate based on population allele frequencies without genotype data
−*[[BamUtil: diff|Diff]] - Print the diffs between 2 bams
−*[[BamUtil: stats|Stats]] - Generate some statistics for a SAM/BAM file
−
−
−==== Rewrite SAM/BAM file ====
−*[[BamUtil: convert|Convert]] – Convert between SAM & BAM
−*[[SplitBam]] – Split into 1 file per Read Group
−*[[BamUtil: splitChromosome|SplitChromosome]] – Split into 1 file per Chromosome
−*[[BamUtil: writeRegion|WriteRegion]] – Write only reads in the specified region and/or have the specified read name
 *[[Pileup]] – Pileup every base or just bases in specified region and write VCF
−*[[BamUtil: readIndexedBam|ReadIndexedBam]] - Read an indexed BAM file reference by reference id -1 to the max reference id and write it out as a SAM/BAM file
−
−
−==== Update the File ====
−*[[SuperDeDuper]] - Determine duplicate alignments, either marking or removing the lower quality duplicates. In addition, it may modify paired-end reads where the ends overlap by soft clipping the end with the lower quality bases in the region of overlap.
−*[[RGMergeBam]] – Merge sorted BAM files adding Read Groups
−*[[PolishBam]] – Add/Update header lines & add RG tag to each record
−*[[TrimBam]] – Trim end of reads, changing read ends to ‘N’ & quality to ‘!’
−*[[BamUtil: filter|Filter]] – Soft clip ends with too high mismatch % and mark unmapped if quality of mismatches is too high
−*[[BamUtil: revert|Revert]] - Revert SAM/BAM replacing the specified fields with their previous values (if known) and removes specified tags
−*[[BamUtil: squeeze|Squeeze]] - Reduce files size by dropping OQ fields, duplicates, specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers
−
−==== Helper Tools to Print Readable Information ====
−*[[BamUtil: dumpHeader|DumpHeader]] - Print the File Header to the screen.
−*[[BamUtil: dumpRefInfo|DumpRefInfo]] - Print the reference information from the SAM/BAM header.
−*[[BamUtil: dumpIndex|DumpIndex]] - Print the BAM Index to the screen in a readable format
−*[[BamUtil: readReference|ReadReference]] - Print the reference string for the specified region to the screen.
−

+==== BAM Util Tools ====
+{{BamUtilPrograms}}

 === FASTQ ===
Line 66 (old) / Line 41 (new):
 **Reports Base Composition Statistics (%reads at each read index)

+
+=== Meta Analysis ===
+* [[Rare-Metal-Worker|RAREMETALWORKER - generate summary level statistics for meta analysis using Rare-Metal]]
+* [[Rare-Metal|RAREMETAL - perform genome-wide meta analysis of rare variants]]

 === Other Tools ===
Line 76 (old) / Line 55 (new):

 === Requested Tools ===
−[[BAM to FASTQ]]

 =Other Tools=

−== [[Read Mapping]] ==
+* [[samtools-hybrid]] - Since many of our tools still rely on GLF files and samtools stopped supporting GLF files, we created a version of samtools that still supports pileup to GLF files AND incorporates the updated BAQ logic. This version is called samtools-hybrid. That code can be downloaded at: https://github.com/statgen/samtools-0.1.7a-hybrid
−*[[Karma|Karma]] - Our fast short read aligner, which generates [[Mapping Quality Scores]]
+*[[baseQualityCheck]] - a tool to calculate the observed base quality vs. empirical base quality (helps to evaluate mappers)
−*[[Karma-colorspace|Karma-ColorSpace]] - QUICKSTART on mapping color space reads
−*[[baseQualityCheck]] - a mature tool to calculate the observed base quality vs. empirical base quality (helps to evaluate mappers)
−
−*[[Examples|Examples]] - Sample command lines with discussion
−
−*[[MapabilityScores]] - Definitions of various mappability scores adopted at UCSC genome browser.
−
−
−
−==SAM/BAM==
−*Recalibrator – Resource-efficient tool, which recalibrates base qualities based on an adaptive logistic regression model - <span style="color:#D2691E">Available upon request</span>
−*Deduper – Mark or remove duplicates - <span style="color:#D2691E">Coming Soon</span>

 == Variant Calling ==
 * [[glfSingle]] - Variant calling for a single, deeply sequenced individual
−* [[glfTrio]]- Variant calling for a single, deeply sequenced nuclear family with two parents and one child
 * [[glfMultiples]] - Variant calling for multiple, unrelated individuals
−* [[Polymutt:_a_tool_for_calling_polymorphism_and_de_novo_mutations|polymutt]] - Variant and ''de novo'' mutation detection in families (nuclear or extended pedigrees) from sequencing
+* [[Polymutt|polymutt]] - Variant and ''de novo'' mutation detection in families (nuclear or extended pedigrees) from sequencing

 == Variant Annotation ==
 *[[vcfCodingSnps]] - Annotate coding variants in a VCF file.

−== Quality Control ==
+== Genotype Imputation ==
−*[[GenotypeIDcheck]] - Check that mapped reads are consistent with known genotypes for each individual.
+*[[Minimac3]] - Fast and Efficient Genotype Imputation.
−
−== File Conversion ==
−*[[bam2FastQ]] - Convert BAM files into FastQ files

+== Additional Pedigree & Sequence Analysis Tools ==
+Can be found at: http://sph.umich.edu/csg/abecasis/software.html

 = Other Useful Links =