Changes

From Genome Analysis Wiki
2,575 bytes removed ,  01:07, 31 January 2015
Line 3: Line 3:     
=Software=
 
=Software=
Due to increasing volume of next generation sequencing and genotyping data, we have created these created C++ library and tools that use that library.
+
Due to the increasing volume of next generation sequencing and genotyping data, we have created this C++ library and tools that use it.
    
This page points to downloads, documentation, and papers for software that is written here at the [http://genome.sph.umich.edu Center for Statistical Genetics]
 
This page points to downloads, documentation, and papers for software that is written here at the [http://genome.sph.umich.edu Center for Statistical Genetics]
   −
If you have any questions or comments, please email Mary Kate Trost (mktrost@umich.edu).
+
If you have any questions or comments, please email Mary Kate Wing (mktrost@umich.edu).
    
=StatGen C++ Software=
 
=StatGen C++ Software=
Line 27: Line 27:  
=== SAM/BAM ===
 
=== SAM/BAM ===
   −
==== General Tools ====
   
*[[QPLOT]] - Calculate & plot summary statistics
 
*[[QPLOT]] - Calculate & plot summary statistics
*[[BamUtil: validate|Validate]] – Check file format & print statistics
   
*[[VerifyBamID]] – Check sample identities for contamination/sample swap
 
*[[VerifyBamID]] – Check sample identities for contamination/sample swap
 
**Genotype concordance based detection
 
**Genotype concordance based detection
 
**Estimate based on population allele frequencies without genotype data
 
**Estimate based on population allele frequencies without genotype data
*[[BamUtil: diff|Diff]] - Print the diffs between 2 bams
  −
*[[BamUtil: stats|Stats]] - Generate some statistics for a SAM/BAM file
  −
  −
  −
==== Rewrite SAM/BAM file ====
  −
*[[BamUtil: convert|Convert]] – Convert between SAM & BAM
  −
*[[BamUtil: splitBam|SplitBam]] – Split into 1 file per Read Group
  −
*[[BamUtil: splitChromosome|SplitChromosome]] – Split into 1 file per Chromosome
  −
*[[BamUtil: writeRegion|WriteRegion]] – Write only reads in the specified region and/or have the specified read name
   
*[[Pileup]] – Pileup every base or just bases in specified region and write VCF
 
*[[Pileup]] – Pileup every base or just bases in specified region and write VCF
*[[BamUtil: readIndexedBam|ReadIndexedBam]] - Read an indexed BAM file reference by reference id -1 to the max reference id and write it out as a SAM/BAM file
  −
*[[BamUtil: convert#BAM File Recovery | BAM Recovery]] - Recover corrupted BAM files
  −
  −
==== Update the File ====
  −
*[[SuperDeDuper]] - Determine duplicate alignments, either marking or removing the lower quality duplicates. In addition, it may modify paired-end reads where the ends overlap by soft clipping the end with the lower quality bases in the region of overlap.
  −
*[[RGMergeBam]] – Merge sorted BAM files adding Read Groups
  −
*[[PolishBam]] – Add/Update header lines & add RG tag to each record
  −
*[[TrimBam]] – Trim end of reads, changing read ends to ‘N’ & quality to ‘!’
  −
*[[BamUtil: filter|Filter]] – Soft clip ends with too high mismatch % and mark unmapped if quality of mismatches is too high
  −
*[[BamUtil: revert|Revert]] - Revert SAM/BAM replacing the specified fields with their previous values (if known) and removes specified tags
  −
*[[BamUtil: squeeze|Squeeze]] - Reduce files size by dropping OQ fields, duplicates, specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers
  −
*[[BamUtil: clipOverlap|ClipOverlap]] - Clip overlapping read pairs so they do not overlap
  −
  −
==== Helper Tools to Print Readable Information ====
  −
*[[BamUtil: dumpHeader|DumpHeader]] - Print the File Header to the screen.
  −
*[[BamUtil: dumpRefInfo|DumpRefInfo]] - Print the reference information from the SAM/BAM header.
  −
*[[BamUtil: dumpIndex|DumpIndex]] - Print the BAM Index to the screen in a readable format
  −
*[[BamUtil: readReference|ReadReference]] - Print the reference string for the specified region to the screen.
  −
      +
==== BAM Util Tools ====
 +
{{BamUtilPrograms}}
    
=== FASTQ ===
 
=== FASTQ ===
Line 69: Line 41:  
**Reports Base Composition Statistics (%reads at each read index)
 
**Reports Base Composition Statistics (%reads at each read index)
    +
 +
=== Meta Analysis ===
 +
* [[Rare-Metal-Worker|RAREMETALWORKER - generate summary level statistics for meta analysis using Rare-Metal]]
 +
* [[Rare-Metal|RAREMETAL - perform genome-wide meta analysis of rare variants]]
    
=== Other Tools ===
 
=== Other Tools ===
Line 79: Line 55:     
=== Requested Tools ===
 
=== Requested Tools ===
[[BAM to FASTQ]]
      
=Other Tools=
 
=Other Tools=
   −
== [[Read Mapping]] ==
+
* [[samtools-hybrid]] - Since many of our tools still rely on GLF files and samtools stopped supporting GLF files, we created a version of samtools that still supports pileup to GLF files and incorporates the updated BAQ logic. This version is called samtools-hybrid. The code can be downloaded at: https://github.com/statgen/samtools-0.1.7a-hybrid
*[[Karma|Karma]] - Our fast short read aligner, which generates [[Mapping Quality Scores]]
+
*[[baseQualityCheck]] - tool to calculate the observed base quality vs. empirical base quality (helps to evaluate mappers)
*[[Karma-colorspace|Karma-ColorSpace]] - QUICKSTART on mapping color space reads
  −
*[[baseQualityCheck]] - a mature tool to calculate the observed base quality vs. empirical base quality (helps to evaluate mappers)
  −
 
  −
*[[Examples|Examples]] - Sample command lines with discussion
  −
 
  −
*[[MapabilityScores]] - Definitions of various mappability scores adopted at UCSC genome browser.
  −
 
  −
 
  −
 
  −
==SAM/BAM==
  −
*Recalibrator – Resource-efficient tool, which recalibrates base qualities based on an adaptive logistic regression model - <span style="color:#D2691E">Available upon request</span>
  −
*Deduper – Mark or remove duplicates - <span style="color:#D2691E">Coming Soon</span>
      
== Variant Calling ==
 
== Variant Calling ==
 
* [[glfSingle]] - Variant calling for a single, deeply sequenced individual
 
* [[glfSingle]] - Variant calling for a single, deeply sequenced individual
* [[glfTrio]]- Variant calling for a single, deeply sequenced nuclear family with two parents and one child
   
* [[glfMultiples]] - Variant calling for multiple, unrelated individuals
 
* [[glfMultiples]] - Variant calling for multiple, unrelated individuals
* [[Polymutt:_a_tool_for_calling_polymorphism_and_de_novo_mutations|polymutt]] - Variant and ''de novo'' mutation detection in families (nuclear or extended pedigrees) from sequencing
+
* [[Polymutt|polymutt]] - Variant and ''de novo'' mutation detection in families (nuclear or extended pedigrees) from sequencing
    
== Variant Annotation ==
 
== Variant Annotation ==
 
*[[vcfCodingSnps]] - Annotate coding variants in a VCF file.
 
*[[vcfCodingSnps]] - Annotate coding variants in a VCF file.
   −
== Quality Control ==
+
== Genotype Imputation ==
*[[GenotypeIDcheck]] - Check that mapped reads are consistent with known genotypes for each individual.
+
*[[Minimac3]] - Fast and Efficient Genotype Imputation.
 
  −
== File Conversion ==
  −
*[[bam2FastQ]] - Convert BAM files into FastQ files
      +
== Additional Pedigree & Sequence Analysis Tools ==
 +
Can be found at: http://sph.umich.edu/csg/abecasis/software.html
    
= Other Useful Links =
 
= Other Useful Links =