Base Caller Summaries
Standard Illumina Base Caller
Sequencing-by-Synthesis (SBS)
- DNA sample containing many copies of the same sequences obtained and randomly fragmented
- Single-stranded DNA fragments attached to slide and amplified so that each fragment forms a cluster of copies
- DNA polymerase and 4 terminator bases (each with a distinct fluorescent marker) added
- Clusters excited by lasers and images taken at the optimal wavelength for each of the 4 fluorophores
- Fluorophores and terminators removed and process repeated for L cycles
Image Analysis
- Corrects for imperfect repositioning of the camera and lens aberrations by aligning images to a reference image from the first cycle
- Signal for each cluster characterized as time series data of fluorescence intensities and noise
Base Calling
- Converts fluorescence signals into actual sequence data with quality scores
- Takes intensities of four channels for every cluster in each cycle and determines concentration of each base
- Renormalizes concentrations by multiplying by the ratio of the average concentration in the first cycle to that in the current cycle
- Uses Markov model to determine transition matrix modeling probability of phasing (no new base synthesized), prephasing (two new bases synthesized), and normal incorporation
- Uses transition matrix and observed concentrations of each base to determine concentrations in absence of phasing and reports these as base calls
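The renormalization and phasing-correction steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Illumina implementation. The function names, the fixed phasing and prephasing probabilities, and the choice to average over the four channels of a single cluster (a real run would presumably average over many clusters) are all hypothetical.

```python
BASES = "ACGT"

def renormalize(observed):
    """Scale each cycle so its mean concentration matches the first cycle.

    Assumption: the mean is taken over the four channels of one cluster
    and is never zero.
    """
    first_mean = sum(observed[0]) / 4
    return [[v * first_mean / (sum(row) / 4) for v in row] for row in observed]

def phasing_matrix(n_cycles, p_phase, p_prephase):
    """Return Q where Q[t][k] is the probability that a strand has
    incorporated k + 1 bases after cycle t + 1."""
    p_normal = 1.0 - p_phase - p_prephase
    # dist[k] = probability of having incorporated k bases so far
    dist = [1.0] + [0.0] * n_cycles
    rows = []
    for _ in range(n_cycles):
        new = [0.0] * (n_cycles + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * p_phase                  # phasing: no new base
            if k + 1 <= n_cycles:
                new[k + 1] += prob * p_normal         # normal incorporation
            if k + 2 <= n_cycles:
                new[k + 2] += prob * p_prephase       # prephasing: two new bases
        dist = new
        rows.append(dist[1:])  # strands with no base yet emit no signal
    return rows

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def correct_phasing(observed, p_phase, p_prephase):
    """Estimate per-cycle concentrations in the absence of phasing."""
    q = phasing_matrix(len(observed), p_phase, p_prephase)
    columns = [solve(q, [row[c] for row in observed]) for c in range(4)]
    return [[columns[c][t] for c in range(4)] for t in range(len(observed))]

def call_bases(concentrations):
    """Report the base with the highest corrected concentration per cycle."""
    return "".join(BASES[row.index(max(row))] for row in concentrations)
```

Given a list of per-cycle [A, C, G, T] concentrations for one cluster, call_bases(correct_phasing(renormalize(observed), p_phase, p_prephase)) returns the called sequence as a string. Quality scores are not modeled in this sketch.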
General Noise Factors
- Phasing
  - Failure to incorporate a nucleotide or remove the block, or incorporation of more than one nucleotide, in a particular cycle
- Fading
  - Decay in fluorescent signal intensity with each cycle
  - Likely attributable to material loss during sequencing
- Crosstalk
  - C channel illumination overlaps with A: a C label also fluoresces in the A channel (G and T overlap similarly)
  - Likely caused by overlap in dye emission spectra
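Fading and crosstalk can be illustrated with a small forward model and its inverse. The sketch below assumes a constant per-cycle decay factor and a fixed 4x4 crosstalk matrix. All numeric values and function names are hypothetical and chosen only to mirror the A/C and G/T overlap described above.

```python
BASES = "ACGT"

# Hypothetical crosstalk matrix: CROSSTALK[i][j] is the fraction of dye j's
# emission that is recorded in channel i (channel and dye order: A, C, G, T).
CROSSTALK = [
    [1.0, 0.6, 0.0, 0.0],  # A channel also picks up the C dye
    [0.2, 1.0, 0.0, 0.0],  # C channel picks up a little of the A dye
    [0.0, 0.0, 1.0, 0.5],  # G channel also picks up the T dye
    [0.0, 0.0, 0.1, 1.0],  # T channel picks up a little of the G dye
]

def apply_noise(dye_signal, cycle, decay=0.97):
    """Forward model: fade the dye signal, then mix it across channels."""
    faded = [v * decay ** cycle for v in dye_signal]
    return [sum(CROSSTALK[i][j] * faded[j] for j in range(4)) for i in range(4)]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]  # copies the rows
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def remove_crosstalk(channel_signal):
    """Recover per-dye signal from per-channel intensities."""
    return solve(CROSSTALK, channel_signal)
```

The recovered signal is still attenuated by fading. In this sketch, undoing that would require dividing by the assumed decay factor raised to the cycle number.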
Alta-Cyclic
Training Stage
- Learns run-specific noise patterns according to the model and uses a Support Vector Machine (SVM) to find an optimized solution that reduces the effect of the noise sources
- Half of training set used for cross-validation
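The idea of tuning on one half of a training set and checking on the other can be sketched as follows. This is not Alta-Cyclic's implementation: a plain grid search with highest-intensity calling stands in for the SVM, and the single leakage parameter, the function names and the data layout are hypothetical.

```python
BASES = "ACGT"

def correct(signal, leak):
    """Subtract the assumed leakage of the C dye into the A channel."""
    a, c, g, t = signal
    return [a - leak * c, c, g, t]

def call(signal, leak):
    """Call the base with the highest corrected intensity."""
    corrected = correct(signal, leak)
    return BASES[corrected.index(max(corrected))]

def squared_error(examples, leak):
    """Total squared distance between corrected signals and ideal one-hot signals."""
    total = 0.0
    for signal, base in examples:
        target = [1.0 if b == base else 0.0 for b in BASES]
        total += sum((x - y) ** 2 for x, y in zip(correct(signal, leak), target))
    return total

def accuracy(examples, leak):
    """Fraction of examples whose called base matches the known base."""
    hits = sum(call(signal, leak) == base for signal, base in examples)
    return hits / len(examples)

def train_with_holdout(examples, candidates):
    """Pick the leak value that fits the first half best, then report
    calling accuracy on the held-out second half.

    examples: list of ([A, C, G, T] intensities, known base) pairs.
    A real run would normally shuffle the examples before splitting.
    """
    half = len(examples) // 2
    fit_half, validation_half = examples[:half], examples[half:]
    best = min(candidates, key=lambda leak: squared_error(fit_half, leak))
    return best, accuracy(validation_half, best)
```

Keeping the validation half out of the parameter search means the reported accuracy is not inflated by the same data that was used to choose the parameter.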
Base Calling Stage
- Reports all sequences from run with optimized parameters
Differences from Standard Illumina Base Caller
- Calling parameters optimized empirically and tested to enhance accuracy of each run
- Calculates phasing parameters based on a parametric model
- Dynamically tracks changes in crosstalk, which disrupt signals in later cycles
Probabilistic Base Calling
BayesCall
Swift
Ibis
(To be added soon.)
References
Erlich, Y., Mitra, P.P., delaBastide, M., McCombie, W.R., Hannon, G.J. (2008) Alta-Cyclic: A self-optimizing base caller for next-generation sequencing. Nature Methods 5:679-682
Kao, W.-C., Stevens, K., Song, Y.S. (2009) BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing. Genome Research 19:1884-1895
Rougemont, J., Amzallag, A., Iseli, C., Farinelli, L., Xenarios, I., Naef, F. (2008) Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics 9:431
Whiteford, N., Skelly, T., Curtis, C., Ritchie, M.E., Löhr, A., Zaranek, A.W., Abnizova, I., Brown, C. (2009) Swift: Primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics 25:2194-2199