Base Caller Summaries
From Genome Analysis Wiki
Standard Illumina Base Caller
Sequencing-by-Synthesis (SBS)
- DNA sample obtained, containing many copies of the same sequences, and randomly fragmented
- Single-stranded DNA fragments attached to slide and amplified so there is a cluster of each fragment
- DNA polymerase and four reversible terminator bases (each with a distinct fluorescent label) added
- Clusters excited by lasers and images taken at the optimal wavelengths for the four fluorophores
- Fluorophores and terminators removed and process repeated for L cycles
Image Analysis
- Corrects for imperfect repositioning of the camera and lens aberrations by aligning each image to a reference image from the first cycle
- Signal for each cluster characterized as time series data of fluorescence intensities and noise
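The alignment step can be illustrated with a toy registration routine. This is only a sketch under strong assumptions: images are small 2D lists of intensities, the misalignment is a whole-pixel translation, and the score is a plain (unnormalized) cross-correlation. The function name `best_shift` and the `max_shift` parameter are hypothetical, and lens aberration is not modeled.

```python
def best_shift(reference, image, max_shift=2):
    """Return (dy, dx) such that image[y + dy][x + dx] best matches reference[y][x].

    Brute-force search over integer translations, scored by the sum of
    products of overlapping pixels (assumption: translation-only offset).
    """
    h, w = len(reference), len(reference[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        score += reference[y][x] * image[yy][xx]
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def blank(h, w):
    return [[0.0] * w for _ in range(h)]


# Usage: two clusters in the first-cycle image, seen one row down and
# one column left in a later cycle.
ref = blank(6, 6)
ref[1][1] = ref[3][2] = 1.0
later = blank(6, 6)
later[2][0] = later[4][1] = 1.0
shift = best_shift(ref, later)
```

An unnormalized score favors bright regions, so a practical implementation would more likely use a normalized correlation and sub-pixel refinement.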
Base Calling
- Converts fluorescence signals into actual sequence data with quality scores
- Takes intensities of four channels for every cluster in each cycle and determines concentration of each base
- Renormalizes concentrations by multiplying by the ratio of the average concentration in the first cycle to that in the current cycle
- Uses a Markov model whose transition matrix gives the probabilities of phasing (no new base synthesized), prephasing (two new bases synthesized), and normal incorporation (one new base synthesized)
- Uses transition matrix and observed concentrations of each base to determine concentrations in absence of phasing and reports these as base calls
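The renormalization and phasing-correction steps above can be sketched as follows. This is a minimal illustration, not the Illumina implementation: it assumes fixed, known phasing and prephasing probabilities (`p_phase`, `p_pre`), drops strands that run past the end of the read, takes the per-cycle averages (`cycle_means`) as given, and calls each base as the channel with the largest corrected concentration. All function and parameter names are hypothetical.

```python
BASES = "ACGT"


def renormalize(conc, cycle_means):
    """Scale cycle t by mean(first cycle) / mean(cycle t) to undo signal decay."""
    return [[x * cycle_means[0] / cycle_means[t] for x in row]
            for t, row in enumerate(conc)]


def phasing_matrix(n_cycles, p_phase, p_pre):
    """Q[t][j]: fraction of strands showing template base j at cycle t."""
    p_norm = 1.0 - p_phase - p_pre
    dist = [1.0] + [0.0] * n_cycles  # dist[n]: fraction with n bases incorporated
    Q = []
    for _ in range(n_cycles):
        new = [0.0] * (n_cycles + 1)
        for n, mass in enumerate(dist):
            # One Markov step: stall (phasing), advance 1, or advance 2 (prephasing).
            for step, prob in ((0, p_phase), (1, p_norm), (2, p_pre)):
                if n + step <= n_cycles:
                    new[n + step] += mass * prob
        dist = new
        Q.append(dist[1:])  # strands with 0 bases incorporated emit no signal
    return Q


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def correct_and_call(conc, p_phase, p_pre):
    """conc[t][b]: concentration of base b at cycle t, already renormalized."""
    n = len(conc)
    Q = phasing_matrix(n, p_phase, p_pre)
    # Solve Q x = observed separately for each of the four channels.
    channels = [solve(Q, [conc[t][b] for t in range(n)]) for b in range(4)]
    return "".join(BASES[max(range(4), key=lambda b: channels[b][t])]
                   for t in range(n))


# Usage: simulate a phased signal for a known read, then recover it.
truth = "ACGTTGCA"
Q = phasing_matrix(len(truth), 0.02, 0.01)
observed = [[sum(Q[t][j] for j, base in enumerate(truth) if base == b)
             for b in BASES] for t in range(len(truth))]
called = correct_and_call(observed, 0.02, 0.01)
```

In practice the phasing and prephasing rates have to be estimated from the data, and the corrected concentrations would also feed the quality scores rather than only a hard call.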
General Noise Factors
- Phasing
  - Failure of nucleotide incorporation or of block removal, or incorporation of more than one nucleotide, in a particular cycle
- Fading
  - Decay in fluorescent signal intensity with each cycle
  - Likely attributable to material loss during sequencing
- Crosstalk
  - The A and C channels overlap: a C label also fluoresces in the A channel (G and T overlap similarly)
  - Likely caused by overlap in the dye emission spectra
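Crosstalk of the kind described above can be undone with a small linear unmixing step. The sketch below assumes leakage occurs only within the A/C and G/T pairs and that the leak fractions are already known; the coefficient values, the `leak` keys, and the function names are hypothetical and chosen for illustration only.

```python
def unmix_pair(i1, i2, x12, x21):
    """Solve  i1 = c1 + x21 * c2,  i2 = x12 * c1 + c2  for (c1, c2).

    x12 is the fraction of dye 1's signal leaking into channel 2,
    x21 the fraction of dye 2's signal leaking into channel 1.
    """
    det = 1.0 - x12 * x21
    c1 = (i1 - x21 * i2) / det
    c2 = (i2 - x12 * i1) / det
    return c1, c2


def correct_crosstalk(intensity, leak):
    """Map observed channel intensities to estimated dye signals."""
    a, c = unmix_pair(intensity["A"], intensity["C"], leak["A->C"], leak["C->A"])
    g, t = unmix_pair(intensity["G"], intensity["T"], leak["G->T"], leak["T->G"])
    return {"A": a, "C": c, "G": g, "T": t}


# Usage with made-up leak fractions: a pure C signal of 100 and a pure
# T signal of 50 show up partly in the A and G channels.
leak = {"A->C": 0.2, "C->A": 0.6, "G->T": 0.1, "T->G": 0.4}
observed = {"A": 60.0, "C": 100.0, "G": 20.0, "T": 50.0}
corrected = correct_crosstalk(observed, leak)
```

A full 4x4 crosstalk matrix, estimated from the data, would replace the two 2x2 blocks if leakage between the pairs could not be neglected.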
Alta-Cyclic
Probabilistic Base Calling
BayesCall
Swift
Ibis
(To be added soon.)
References
Erlich, Y., Mitra, P.P., delaBastide, M., McCombie, W.R., Hannon, G.J. (2008) Alta-Cyclic: A self-optimizing base caller for next-generation sequencing. Nature Methods 5:679-682
Kao, W.-C., Stevens, K., Song, Y.S. (2009) BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing. Genome Research 19:1884-1895
Rougemont, J., Amzallag, A., Iseli, C., Farinelli, L., Xenarios, I., Naef, F. (2008) Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics 9:431
Whiteford, N., Skelly, T., Curtis, C., Ritchie, M.E., Löhr, A., Zaranek, A.W., Abnizova, I., Brown, C. (2009) Swift: Primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics 25:2194-2199