Difference between revisions of "Base Caller Summaries"

From Genome Analysis Wiki

Revision as of 12:04, 26 February 2010

Standard Illumina Base Caller

Sequencing-by-Synthesis (SBS)

  • DNA sample containing many copies of the same sequences obtained and randomly fragmented
  • Single-stranded DNA fragments attached to slide and amplified so there is a cluster of each fragment
  • DNA polymerase and 4 terminator bases (each with a distinct fluorescent marker) added
  • Clusters excited by lasers and images taken at the optimal wavelength for each of the 4 fluorophores
  • Fluorophores and terminators removed and process repeated for L cycles

Image Analysis

  • Corrects for imperfect repositioning of camera and aberrations of lens by aligning each image to a reference image from the first cycle
  • Signal for each cluster characterized as time series data of fluorescence intensities and noise
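
The alignment step can be illustrated with a minimal sketch. It assumes the misalignment is a pure integer translation and finds it by brute-force cross-correlation. Real pipelines also handle lens aberration and sub-pixel shifts, which this sketch does not attempt. The function names `best_offset` and `shift_image` are hypothetical, chosen for illustration.

```python
def best_offset(reference, image, max_shift=2):
    """Estimate the integer (dy, dx) shift of `image` relative to `reference`.

    Tries every shift up to max_shift in each direction and keeps the one
    whose overlapping pixels have the highest mean product.
    """
    rows, cols = len(reference), len(reference[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(rows):
                for x in range(cols):
                    sy, sx = y + dy, x + dx
                    if 0 <= sy < rows and 0 <= sx < cols:
                        total += reference[y][x] * image[sy][sx]
                        count += 1
            if count == 0:
                continue  # no overlap at this shift
            score = total / count
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def shift_image(image, dy, dx):
    """Undo a (dy, dx) shift; pixels with no source value are set to 0."""
    rows, cols = len(image), len(image[0])
    return [
        [
            image[y + dy][x + dx]
            if 0 <= y + dy < rows and 0 <= x + dx < cols
            else 0
            for x in range(cols)
        ]
        for y in range(rows)
    ]
```

Applying `shift_image` with the offset returned by `best_offset` puts a later-cycle image back on the reference grid, so that each cluster's intensities can be read from the same pixel positions in every cycle.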

Base Calling

  • Converts fluorescence signals into actual sequence data with quality scores
  • Takes intensities of four channels for every cluster in each cycle and determines concentration of each base
  • Renormalizes concentrations by multiplying by ratio of average concentration in first cycle to average concentration in current cycle
  • Uses Markov model to determine transition matrix modeling probability of phasing (no new base synthesized), prephasing (two new bases synthesized), and normal incorporation
  • Uses transition matrix and observed concentrations of each base to determine concentrations in absence of phasing and reports these as base calls
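
The phasing correction described above can be sketched as follows. This is an illustration under stated assumptions, not the Illumina implementation: the phasing and prephasing probabilities are taken as known inputs rather than estimated, the per-cycle concentrations are assumed to be already renormalized, and all function names are hypothetical. The sketch builds a matrix giving the fraction of strands at each position after each cycle, then solves the resulting linear system to recover the concentrations that would be seen without phasing.

```python
BASES = "ACGT"


def phasing_matrix(n_cycles, p_phase, p_prephase):
    """Row t, column j: fraction of strands showing base j+1 after t+1 cycles."""
    p_normal = 1.0 - p_phase - p_prephase
    state = [1.0] + [0.0] * n_cycles  # index = bases incorporated so far
    rows = []
    for _ in range(n_cycles):
        nxt = [0.0] * (n_cycles + 1)
        for j, mass in enumerate(state):
            nxt[j] += mass * p_phase              # phasing: no new base
            if j + 1 <= n_cycles:
                nxt[j + 1] += mass * p_normal     # normal incorporation
            if j + 2 <= n_cycles:
                nxt[j + 2] += mass * p_prephase   # prephasing: two new bases
        state = nxt
        rows.append(state[1:])  # strands with 0 bases emit no signal
    return rows


def solve(matrix, rhs):
    """Solve matrix * X = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + r[:] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def simulate_observed(sequence, p_phase, p_prephase):
    """Noise-free per-cycle concentrations of A, C, G, T for a known sequence."""
    q = phasing_matrix(len(sequence), p_phase, p_prephase)
    return [
        [sum(q[t][j] for j, s in enumerate(sequence) if s == b) for b in BASES]
        for t in range(len(sequence))
    ]


def call_bases(observed, p_phase, p_prephase):
    """Remove phasing from observed concentrations and call the top base."""
    q = phasing_matrix(len(observed), p_phase, p_prephase)
    corrected = solve(q, observed)
    return "".join(BASES[row.index(max(row))] for row in corrected)
```

Passing the output of `simulate_observed` to `call_bases` with the same probabilities recovers the input sequence. On real data the result also depends on noise and on how well the two probabilities were estimated.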

General Noise Factors

  • Phasing
    • Failures in nucleotide incorporation or block removal or incorporation of more than one nucleotide in a particular cycle
  • Fading
    • Decay in fluorescent signal intensity with each cycle
    • Likely attributable to material loss during sequencing
  • Crosstalk
    • C and A channels overlap: a C label also fluoresces in the A channel (G and T overlap similarly)
    • Likely caused by overlap in dye emission spectra
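
Two of the noise factors above lend themselves to a short sketch. It assumes crosstalk is linear and confined to the A/C and G/T channel pairs, and that fading is compensated by the ratio of average first-cycle intensity to average current-cycle intensity. The leak fractions are made-up illustrative values, not measured ones, and the function names are hypothetical.

```python
def unmix_pair(obs_x, obs_y, y_into_x, x_into_y):
    """Invert obs_x = x + y_into_x * y and obs_y = y + x_into_y * x."""
    det = 1.0 - y_into_x * x_into_y
    x = (obs_x - y_into_x * obs_y) / det
    y = (obs_y - x_into_y * obs_x) / det
    return x, y


def correct_crosstalk(obs, c_into_a=0.3, a_into_c=0.1,
                      t_into_g=0.3, g_into_t=0.1):
    """Remove crosstalk from one cluster's four channel intensities.

    `obs` maps "A", "C", "G", "T" to observed intensities. The default
    leak fractions are illustrative assumptions only.
    """
    a, c = unmix_pair(obs["A"], obs["C"], c_into_a, a_into_c)
    g, t = unmix_pair(obs["G"], obs["T"], t_into_g, g_into_t)
    return {"A": a, "C": c, "G": g, "T": t}


def compensate_fading(intensity, cycle, cycle_means):
    """Rescale an intensity from `cycle` (0-based) to the first-cycle level.

    `cycle_means[k]` is the average intensity over all clusters in cycle k.
    """
    return intensity * cycle_means[0] / cycle_means[cycle]
```

For example, a pure C signal of 1000 that leaks 30% into the A channel is observed as A = 300, C = 1000. `correct_crosstalk` returns an A value close to zero and leaves C near 1000.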

Alta-Cyclic

Training Stage

  • Learns run-specific noise patterns according to model and finds optimized solution that reduces the effect of noise sources using a Support Vector Machine (SVM)
  • Half of training set used for cross-validation

Base Calling Stage

  • Reports all sequences from run with optimized parameters

Differences from Standard Illumina Base Caller

  • Calling parameters optimized empirically and tested to enhance accuracy of each run
  • Calculates phasing parameters based on parametric model
  • Dynamically tracks changes in crosstalk, which disrupt signals in later cycles

Probabilistic Base Calling

BayesCall

Swift

Ibis

(To be added soon.)

References

Erlich, Y., Mitra, P.P., delaBastide, M., McCombie, W.R., Hannon, G.J. (2008) Alta-Cyclic: A self-optimizing base caller for next-generation sequencing. Nature Methods 5:679-682

Kao, W.-C., Stevens, K., Song, Y.S. (2009) BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing. Genome Research 19:1884-1895

Rougemont, J., Amzallag, A., Iseli, C., Farinelli, L., Xenarios, I., Naef, F. (2008) Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics 9:431

Whiteford, N., Skelly, T., Curtis, C., Ritchie, M.E., Löhr, A., Zaranek, A.W., Abnizova, I., Brown, C. (2009) Swift: Primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics 25:2194-2199