1 — Why batch growth follows a sigmoidal curve
In a closed batch, all substrate is loaded at the start. Cells progress through four characteristic phases — lag (metabolic adaptation, no division), exponential (μ ≈ μmax, substrate non-limiting), deceleration (substrate falling, μ declining), and stationary (substrate exhausted or product-inhibited, μ → 0). The biomass trajectory X(t) plotted against time traces an asymmetric S-curve: a slow start, an inflection where dX/dt is maximal, and a slow approach to an asymptote Xmax.
The differential form is simple:
dX/dt = μ(X, S) · X
but μ depends on substrate concentration through Monod-style saturation, on biomass through density-dependent feedback (waste accumulation, oxygen depletion, ethanol toxicity), and on time through the lag-phase adaptation. Closed-form integration of the full mechanistic system is intractable except in trivial cases. The practical approach is to fit X(t) directly with empirical sigmoidal functions whose parameters correspond to, or can be converted into, biologically meaningful quantities such as lag time, maximum growth rate, and asymptotic biomass.
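As a minimal sketch of why this system produces an S-curve, the block below integrates dX/dt = μ(S) · X with a plain Monod term and a fixed-step Euler scheme. The function name simulate_batch and every numeric value (mu_max, Ks, Y, X0, S0) are illustrative assumptions, not calculator settings, and lag and density-dependent feedback are left out on purpose.

```python
def simulate_batch(X0=0.1, S0=20.0, mu_max=0.4, Ks=0.5, Y=0.5, dt=0.01, t_end=30.0):
    """Euler integration of Monod batch growth (all values hypothetical)."""
    t, X, S = 0.0, X0, S0
    traj = [(t, X, S)]
    while t < t_end:
        mu = mu_max * S / (Ks + S)   # Monod saturation term
        dX = mu * X * dt             # biomass formed during this step
        dX = min(dX, Y * S)          # cannot use more substrate than remains
        X += dX
        S -= dX / Y                  # substrate consumed at constant yield Y
        t += dt
        traj.append((t, X, S))
    return traj

traj = simulate_batch()
t_final, X_final, S_final = traj[-1]
print(f"X_final = {X_final:.3f} g/L, S_final = {S_final:.4f} g/L")
```

Under these assumptions biomass rises exponentially while S is far above Ks, then levels off at X0 + Y · S0 once the substrate runs out.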
2 — Five sigmoidal models
The calculator implements five canonical models. They differ in symmetry around the inflection point, in how lag time is parameterized, and in their parameter count (3 vs 4). All five fit batch data through phenomenological matching rather than mechanistic derivation, but each emerges from a defensible theoretical limit.
2.1 Logistic (Verhulst 1838)
The earliest sigmoidal growth law — originally proposed for human population growth — assumes the per-capita growth rate declines linearly with population size:
dX/dt = μmax · X · (1 − X/Xmax)
which integrates to:
X(t) = Xmax / (1 + (Xmax/X0 − 1) · exp(−μmax t))
Three parameters: X0, Xmax, μmax. The curve is symmetric around the inflection point at X = Xmax/2; there is no explicit lag term. Use when lag is short or absent and the data is close to symmetric.
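The integrated form and its symmetry can be checked numerically. The sketch below evaluates the logistic at its inflection time, t = ln(Xmax/X0 − 1) / μmax, which follows from setting X = Xmax/2 in the equation above. The function names and parameter values are hypothetical.

```python
import math

def logistic(t, X0, Xmax, mu_max):
    """Integrated Verhulst logistic."""
    return Xmax / (1.0 + (Xmax / X0 - 1.0) * math.exp(-mu_max * t))

def logistic_inflection_time(X0, Xmax, mu_max):
    """Time at which X = Xmax / 2, where dX/dt peaks."""
    return math.log(Xmax / X0 - 1.0) / mu_max

X0, Xmax, mu_max = 0.1, 5.0, 0.45          # hypothetical values (g/L, g/L, 1/h)
t_i = logistic_inflection_time(X0, Xmax, mu_max)
X_i = logistic(t_i, X0, Xmax, mu_max)

# Symmetry: points equally far before and after t_i sum to Xmax.
d = 2.0
pair_sum = logistic(t_i - d, X0, Xmax, mu_max) + logistic(t_i + d, X0, Xmax, mu_max)
print(t_i, X_i, pair_sum)
```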
2.2 Gompertz (Gompertz 1825)
Originally proposed for human-mortality probabilities, Gompertz curves are asymmetric: the inflection occurs early, at X = Xmax/e ≈ 0.37 Xmax, and the approach to the asymptote is slow.
X(t) = Xmax · exp(−b · exp(−c t))
Three parameters: Xmax, b, c. The specific growth rate is μ(t) = b · c · exp(−c t); at the inflection point (t = ln(b)/c) it equals c, which is the rate at the inflection point (c · Xmax/e) divided by the biomass at the inflection point (Xmax/e). Use when growth approaches the asymptote slowly with no lag.
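The position of the Gompertz inflection can be verified directly from the formula. In the sketch below, the inflection time t = ln(b)/c comes from setting b · exp(−c t) = 1, and X(0) = Xmax · exp(−b) ties b to the inoculum. Parameter values and helper names are hypothetical.

```python
import math

def gompertz(t, Xmax, b, c):
    """Basic three-parameter Gompertz curve."""
    return Xmax * math.exp(-b * math.exp(-c * t))

def slope(t, Xmax, b, c, h=1e-5):
    """Central-difference estimate of dX/dt."""
    return (gompertz(t + h, Xmax, b, c) - gompertz(t - h, Xmax, b, c)) / (2.0 * h)

Xmax, b, c = 5.0, 4.0, 0.3                 # hypothetical values
t_i = math.log(b) / c                      # inflection: b * exp(-c t) = 1
X_i = gompertz(t_i, Xmax, b, c)
print(t_i, X_i, X_i / Xmax)                # the ratio is 1/e
```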
2.3 Modified Gompertz / Zwietering reparameterization (Zwietering 1990)
The default model. Zwietering showed that the basic Gompertz function can be rewritten so that its parameters are directly the biologically meaningful quantities — lag time, maximum specific growth rate, and asymptotic biomass — rather than the abstract b and c:
X(t) = X0 + A · exp(−exp((μmax · e / A)(λ − t) + 1))
Four parameters: X0, A = Xmax − X0, μmax, λ (lag time). Using A as the additive amplitude (rather than Xmax directly) keeps the parameters orthogonal — small changes in μmax and λ don't bleed into A. This is the recommended default for batch fermentation: the lag is explicit, the curve is asymmetric, and the parameters are interpretable.
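A quick numerical check shows how the reparameterized parameters read off the curve: the tangent at the inflection has slope μmax and crosses the X0 baseline at t = λ. The sketch below assumes the formula exactly as written above; the parameter values and the function name are hypothetical.

```python
import math

def modified_gompertz(t, X0, A, mu_max, lam):
    """Zwietering-style Gompertz applied to X(t) with baseline X0."""
    z = (mu_max * math.e / A) * (lam - t) + 1.0
    return X0 + A * math.exp(-math.exp(z))

X0, A, mu_max, lam = 0.1, 4.9, 0.6, 3.0    # hypothetical values
t_i = lam + A / (mu_max * math.e)          # inflection, where z = 0
X_i = modified_gompertz(t_i, X0, A, mu_max, lam)

h = 1e-5                                   # central-difference slope at the inflection
slope_i = (modified_gompertz(t_i + h, X0, A, mu_max, lam)
           - modified_gompertz(t_i - h, X0, A, mu_max, lam)) / (2.0 * h)

t_cross = t_i - (X_i - X0) / slope_i       # where the tangent meets the X0 baseline
print(slope_i, t_cross)
```

Note that X(0) is slightly above X0 in this form, because the double exponential is small but not zero at t = 0.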
2.4 Baranyi-Roberts (Baranyi & Roberts 1994)
A dynamic model with mechanistic motivation. The lag is attributed to an unspecified intracellular quantity Q(t) (sometimes interpreted as substrate adaptation, RNA pool, or membrane fluidity) that must accumulate before growth begins. The full expression is:
ln(X/X0) = μmax · A(t) − ln(1 + (exp(μmax · A(t)) − 1) / (Xmax/X0))
where A(t) = t + (1/μmax) · ln(exp(−μmax t) + exp(−h0) − exp(−μmax t − h0)) and h0 = μmax · λ.
Four parameters: X0, Xmax, μmax, λ. Compared to Modified Gompertz, the lag-to-exponential transition is sharper. Recommended for low-inoculum or stress-adapted cultures where the entry into exponential growth is abrupt.
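The expression can be evaluated in a few lines. The sketch below implements it in the explicit form with h0 = μmax · λ and checks the two limits X(0) = X0 and X(t) → Xmax. The function name and parameter values are hypothetical, and no overflow protection is included for very large μmax · t.

```python
import math

def baranyi_roberts(t, X0, Xmax, mu_max, lam):
    """Baranyi-Roberts curve, explicit form with h0 = mu_max * lam."""
    h0 = mu_max * lam
    # Adjustment function A(t): close to 0 during the lag, close to t - lam afterwards.
    A = t + math.log(math.exp(-mu_max * t) + math.exp(-h0)
                     - math.exp(-mu_max * t - h0)) / mu_max
    ln_ratio = mu_max * A - math.log(1.0 + (math.exp(mu_max * A) - 1.0) / (Xmax / X0))
    return X0 * math.exp(ln_ratio)

X0, Xmax, mu_max, lam = 0.1, 5.0, 0.5, 2.0   # hypothetical values
curve = [baranyi_roberts(float(t), X0, Xmax, mu_max, lam) for t in range(0, 61, 2)]
print(curve[0], curve[-1])
```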
2.5 Richards (Richards 1959)
An empirical generalization of the logistic with an extra shape parameter ν that lets the inflection point move:
X(t) = Xmax · (1 + ν · exp(−μmax(t − τ)))^(−1/ν)
Four parameters: Xmax, μmax, τ (time of inflection), ν (shape). When ν → 0 the curve approaches Gompertz; when ν = 1 it equals the Logistic; intermediate values give a flexible asymmetric shape. There is no explicit lag term, so τ absorbs both the lag and the position of the inflection. Use when the data shows clear asymmetry that doesn't match canonical Gompertz, or as a tiebreaker when no other model fits well.
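The two limiting cases can be confirmed numerically. The sketch below compares the Richards curve at ν = 1 with a logistic centered at τ, and at a very small ν with the Gompertz-type limit Xmax · exp(−exp(−μmax(t − τ))). Function names and parameter values are hypothetical.

```python
import math

def richards(t, Xmax, mu_max, tau, nu):
    """Richards curve with shape parameter nu (nu > 0)."""
    return Xmax * (1.0 + nu * math.exp(-mu_max * (t - tau))) ** (-1.0 / nu)

def logistic_centered(t, Xmax, mu_max, tau):
    return Xmax / (1.0 + math.exp(-mu_max * (t - tau)))

def gompertz_centered(t, Xmax, mu_max, tau):
    return Xmax * math.exp(-math.exp(-mu_max * (t - tau)))

Xmax, mu_max, tau = 5.0, 0.5, 8.0            # hypothetical values
times = [float(t) for t in range(3, 21)]

# Largest deviation from each limiting curve over the sampled times.
gap_logistic = max(abs(richards(t, Xmax, mu_max, tau, 1.0)
                       - logistic_centered(t, Xmax, mu_max, tau)) for t in times)
gap_gompertz = max(abs(richards(t, Xmax, mu_max, tau, 1e-6)
                       - gompertz_centered(t, Xmax, mu_max, tau)) for t in times)
print(gap_logistic, gap_gompertz)
```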
3 — Nonlinear regression: Levenberg-Marquardt
Fitting any of these models to data Xi at times ti means solving an iterative least-squares problem. Define the residuals:
ri(p) = Xi − f(ti, p)
where p is the parameter vector and f is the chosen growth model. The objective is:
SSE(p) = Σi ri(p)²
The Levenberg-Marquardt algorithm (Levenberg 1944, Marquardt 1963) interpolates between two simpler methods. Gauss-Newton uses the Jacobian Jij = ∂f(ti)/∂pj and solves the normal equations:
(JᵀJ) · Δp = Jᵀr
which is fast near the optimum but unstable far from it (JᵀJ can be near-singular). Steepest descent uses just the gradient and is robust but very slow near the optimum. LM combines them by adding a damping term λ (Marquardt parameter, not the lag time):
(JᵀJ + λ · diag(JᵀJ)) · Δp = Jᵀr
When λ is large, the step approximates steepest descent (small, robust); when λ is small, it approximates Gauss-Newton (large, fast). After each successful step, λ is reduced by a factor of 10; after each failed step (where SSE increases), λ is multiplied by 10. The calculator initializes λ = 10⁻³ and uses central-difference numerical derivatives to compute J. Convergence is declared when the relative change in SSE falls below 10⁻⁹ or after 200 iterations.
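The update rule above can be sketched in plain Python. This is an illustrative reading of the description, not the calculator's source. It fits the three-parameter logistic to synthetic data, and all helper names, the data, and the starting guess are assumptions. There are no safeguards against a singular damped matrix.

```python
import math

def logistic(t, p):
    X0, Xmax, mu = p
    return Xmax / (1.0 + (Xmax / X0 - 1.0) * math.exp(-mu * t))

def sse(f, p, ts, xs):
    try:
        return sum((x - f(t, p)) ** 2 for t, x in zip(ts, xs))
    except (OverflowError, ZeroDivisionError):
        return float("inf")                  # invalid parameters count as a failed step

def jacobian(f, p, ts, rel_step=1e-6):
    """Central-difference Jacobian, J[i][j] = d f(t_i) / d p_j."""
    J = [[0.0] * len(p) for _ in ts]
    for j in range(len(p)):
        h = rel_step * max(abs(p[j]), 1e-8)
        up, down = list(p), list(p)
        up[j] += h
        down[j] -= h
        for i, t in enumerate(ts):
            J[i][j] = (f(t, up) - f(t, down)) / (2.0 * h)
    return J

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            k = M[r][c] / M[c][c]
            for q in range(c, n + 1):
                M[r][q] -= k * M[c][q]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][q] * x[q] for q in range(r + 1, n))) / M[r][r]
    return x

def levenberg_marquardt(f, p0, ts, xs, lam=1e-3, tol=1e-9, max_iter=200):
    p, cur, k = list(p0), sse(f, p0, ts, xs), len(p0)
    for _ in range(max_iter):
        J = jacobian(f, p, ts)
        r = [x - f(t, p) for t, x in zip(ts, xs)]
        JtJ = [[sum(row[a] * row[b] for row in J) for b in range(k)] for a in range(k)]
        Jtr = [sum(row[a] * ri for row, ri in zip(J, r)) for a in range(k)]
        # Marquardt damping: scale the diagonal of JtJ by (1 + lam).
        damped = [[JtJ[a][b] + (lam * JtJ[a][a] if a == b else 0.0)
                   for b in range(k)] for a in range(k)]
        trial = [pj + dj for pj, dj in zip(p, solve(damped, Jtr))]
        new = sse(f, trial, ts, xs)
        if new < cur:                        # successful step: accept, relax damping
            converged = (cur - new) <= tol * cur
            p, cur, lam = trial, new, lam / 10.0
            if converged:
                break
        else:                                # failed step: keep p, increase damping
            lam *= 10.0
    return p, cur

# Synthetic data: logistic curve plus a small deterministic perturbation.
ts = [float(i) for i in range(21)]
xs = [logistic(t, (0.1, 5.0, 0.5)) + 0.02 * math.sin(7.0 * i) for i, t in enumerate(ts)]
p0 = [0.15, 4.5, 0.4]
p_fit, sse_fit = levenberg_marquardt(logistic, p0, ts, xs)
print(p_fit, sse_fit)
```

In this sketch lam is the Marquardt damping parameter, not the lag time, matching the note in the text.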
Initial guesses are derived from data heuristics: X0 from the first data point, Xmax from the 90th percentile, μmax from the steepest log-slope across consecutive points, λ from the first time biomass exceeds X0 by 5%. Poor initial guesses can trap LM at local optima. The auto-select feature offers some protection by fitting all five models, so a fit that stalls in one model can be replaced by a better fit from another.
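The four heuristics translate into a few lines of code. The sketch below is one possible reading: the nearest-rank percentile rule, the fallback for λ, and the function name are assumptions, and the data are invented. It requires strictly positive biomass values because of the logarithm.

```python
import math

def initial_guesses(ts, xs):
    """Starting values from simple data heuristics (xs must be positive)."""
    X0 = xs[0]                                        # first data point
    ranked = sorted(xs)
    Xmax = ranked[round(0.9 * (len(ranked) - 1))]     # nearest-rank 90th percentile
    mu_max = max((math.log(xs[i + 1]) - math.log(xs[i])) / (ts[i + 1] - ts[i])
                 for i in range(len(xs) - 1))         # steepest log-slope
    # First time biomass exceeds X0 by 5%; fall back to the last time if it never does.
    lam = next((t for t, x in zip(ts, xs) if x > 1.05 * X0), ts[-1])
    return {"X0": X0, "Xmax": Xmax, "mu_max": mu_max, "lam": lam}

ts = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]               # hypothetical times (h)
xs = [0.10, 0.10, 0.11, 0.20, 0.40, 0.80, 1.50, 2.40, 3.00, 3.20, 3.25]
guess = initial_guesses(ts, xs)
print(guess)
```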
4 — Goodness-of-fit metrics
Three metrics are reported for each fit, addressing different questions.
4.1 RMSE (root-mean-square error)
RMSE = √(SSE / n)
The typical residual size in the same units as X. Best read alongside the data scale: an RMSE of 0.1 g/L is excellent for Xmax = 4 g/L data but poor for Xmax = 0.1 g/L data.
4.2 R² (coefficient of determination)
R² = 1 − SSE / SST, SST = Σi (Xi − X̄)²
The fraction of variance the model explains. R² is unit-independent and roughly comparable across datasets, but it never penalizes extra parameters: when one model nests another (Richards reduces to the Logistic at ν = 1, for example), the larger model reaches an R² at least as high on the same data, even when the extra parameter contributes nothing useful.
4.3 AIC (Akaike Information Criterion)
AIC = n · ln(SSE/n) + 2k
where k is the number of parameters. AIC penalizes parameter count: a more complex model only "wins" if its n · ln(SSE/n) term drops by more than the extra penalty of 2 per added parameter. Lower is better. AIC is the right metric for choosing between 3-parameter and 4-parameter models; R² alone tends to favor the 4-parameter ones.
For small samples (rule of thumb: n / k < 40), use the corrected AICc:
AICc = AIC + 2k(k+1) / (n − k − 1)
which adds an additional small-sample penalty and is reported alongside AIC.
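All four quantities follow from the residuals and the parameter count. The sketch below computes them from the formulas in this section; the function name and the example numbers are hypothetical, and it assumes SSE > 0 and n > k + 1.

```python
import math

def fit_metrics(observed, predicted, k):
    """RMSE, R2, AIC and AICc for a least-squares fit with k parameters."""
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean = sum(observed) / n
    sst = sum((o - mean) ** 2 for o in observed)
    aic = n * math.log(sse / n) + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)        # small-sample correction
    return {"RMSE": math.sqrt(sse / n), "R2": 1.0 - sse / sst, "AIC": aic, "AICc": aicc}

observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]             # hypothetical data
predicted = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9]            # hypothetical model output
metrics = fit_metrics(observed, predicted, k=3)
print(metrics)
```

With n = 6 and k = 3 the correction term is 2 · 3 · 4 / 2 = 12, which illustrates how heavily AICc penalizes a 3-parameter fit on very few points.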
5 — Confidence intervals: Wald approximation
The covariance matrix of the parameter estimates is approximated as:
Cov(p̂) ≈ σ̂² · (JᵀJ)⁻¹, σ̂² = SSE / (n − k)
The standard error of each parameter is the square root of the corresponding diagonal element. The 95% confidence interval is:
p̂j ± t0.975, n−k · SE(p̂j)
where t0.975, n−k is the 97.5th percentile of Student's t distribution with n − k degrees of freedom, i.e. the critical value for a two-sided 95% interval. The calculator computes t-critical values via a Wilson-Hilferty approximation. The Wald CI assumes the SSE landscape is locally quadratic around the optimum. That is a reasonable approximation for batch growth curves with moderate noise, but it can underestimate true uncertainty for parameters that are weakly identified by the data (e.g., λ when no points lie clearly within the lag phase).
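The interval calculation can be sketched as follows. The code takes JᵀJ, SSE, and n from a finished fit and expects the caller to supply the t-critical value, so it does not reproduce the calculator's own approximation for it. The helper names and the toy numbers are hypothetical; 2.306 is the standard table value for 8 degrees of freedom.

```python
import math

def invert(M):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                k = aug[r][c]
                aug[r] = [vr - k * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def wald_intervals(p_hat, JtJ, sse, n, t_crit):
    """Wald confidence intervals from Cov = sigma^2 * inverse(JtJ)."""
    k = len(p_hat)
    sigma2 = sse / (n - k)
    cov = [[sigma2 * v for v in row] for row in invert(JtJ)]
    intervals = []
    for j, pj in enumerate(p_hat):
        se = math.sqrt(cov[j][j])            # standard error of parameter j
        intervals.append((pj - t_crit * se, pj + t_crit * se))
    return intervals

# Toy example: 2 parameters, 10 points, so 8 degrees of freedom.
intervals = wald_intervals([1.0, 3.0], [[4.0, 0.0], [0.0, 25.0]], sse=8.0, n=10, t_crit=2.306)
print(intervals)
```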
6 — Auto-select algorithm
When invoked, the auto-select feature fits all five models in sequence, then ranks the successful fits in two stages: primary by R² descending, tiebreak by AIC ascending. Failed fits (singular Jacobian, non-finite parameters) are listed at the bottom. The user can override the auto-choice by clicking Pick on any other row in the ranking table.
The two-stage ranking handles the common case where two models give nearly identical R²: AIC then picks the simpler one. It does not perfectly mirror AIC-only ranking: occasionally a 4-parameter model will lead on R² by a fraction of a percent while having a higher AIC. In those cases the auto-choice is the higher-R² model; for principled model selection on parameter parsimony, use AIC directly.
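The two-stage ranking amounts to a sort on a compound key. The sketch below is an assumption-laden illustration: the text does not say how close two R² values must be to count as a tie, so the rounding to r2_decimals places is hypothetical, as are the record layout and the numbers.

```python
def rank_fits(fits, r2_decimals=3):
    """Successful fits by R2 (descending), then AIC (ascending); failed fits last."""
    ok = [f for f in fits if f["ok"]]
    failed = [f for f in fits if not f["ok"]]
    # Rounding R2 is an assumed way of treating nearly identical values as ties.
    ok.sort(key=lambda f: (-round(f["R2"], r2_decimals), f["AIC"]))
    return ok + failed

fits = [                                              # hypothetical fit results
    {"model": "Logistic",          "ok": True,  "R2": 0.9912, "AIC": -40.0},
    {"model": "Gompertz",          "ok": True,  "R2": 0.9914, "AIC": -42.0},
    {"model": "Modified Gompertz", "ok": True,  "R2": 0.9951, "AIC": -41.0},
    {"model": "Baranyi-Roberts",   "ok": False, "R2": None,   "AIC": None},
    {"model": "Richards",          "ok": True,  "R2": 0.9949, "AIC": -39.0},
]
ranking = [f["model"] for f in rank_fits(fits)]
print(ranking)
```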
7 — Number of generations Z
The number of population doublings from inoculum to capacity:
Z = log2(Xmax / X0)
Z is the number of times the population doubled between inoculation and the plateau, regardless of how fast each doubling was. It is independent of μmax and λ. Z matters in three contexts:
- Yeast lineage tracking. Petite-mutant accumulation, plasmid loss, and certain epigenetic drifts scale with the cumulative number of doublings, not with elapsed time. A culture that ran for 24 h with Z = 10 carries more genetic baggage than one that ran for 48 h with Z = 5.
- Brewing pitch-rate planning. Successive yeast harvest-and-repitch cycles compound Z; commercial brewers track total Z across generations and discard slurries past a threshold (typically Z > 60–100).
- Population genetic experiments. Mutation rates per generation are calibrated against Z, not chronological time.
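As a worked example of the formula for Z, the sketch below computes the doublings in a single batch and the cumulative total over several repitch cycles. The biomass values and the function name are hypothetical.

```python
import math

def generations(X0, Xmax):
    """Number of population doublings between X0 and Xmax."""
    return math.log2(Xmax / X0)

Z_single = generations(0.1, 6.4)                      # a 64-fold increase

# Each repitch cycle starts from a fresh pitch, so Z adds up across cycles.
cycles = [(0.5, 4.0), (0.5, 4.0), (0.5, 4.0)]         # hypothetical (X0, Xmax) in g/L
Z_total = sum(generations(x0, xm) for x0, xm in cycles)
print(Z_single, Z_total)
```

A 64-fold increase corresponds to about 6 doublings, and three 8-fold cycles add up to 9.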
8 — Phase durations
Three durations are reported, all derived from the fitted curve rather than from raw data:
- Lag time λ: the explicit fitted parameter (Modified Gompertz, Baranyi-Roberts) or zero (Logistic, Gompertz, Richards, which have no lag term). Operationally, λ is the time at which the tangent drawn at the point of maximum slope intersects the X0 baseline.
- Exponential phase duration: the time interval between λ and the time at which X reaches 95% of Xmax. The 95% threshold is conventional; a higher threshold (99%) gives a longer "exponential" duration that bleeds into the deceleration phase.
- Stationary phase duration: the time interval between the end of exponential phase (X = 0.95 Xmax) and the last data point. This is a duration of observation, not a duration of stationary biology — if the experiment ended before the cells entered death phase, no death phase is captured.
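The three durations listed above can be derived from any fitted curve once the 95% crossing time is known. The sketch below finds that time by bisection, which assumes the fitted curve is monotonic. The function names, the example curve, and its parameters are hypothetical.

```python
import math

def time_to_fraction(model, Xmax, t_lo, t_hi, frac=0.95, iters=60):
    """Bisection for the time at which model(t) reaches frac * Xmax."""
    target = frac * Xmax
    for _ in range(iters):
        mid = 0.5 * (t_lo + t_hi)
        if model(mid) < target:
            t_lo = mid
        else:
            t_hi = mid
    return 0.5 * (t_lo + t_hi)

def phase_durations(model, Xmax, lam, t_last):
    t95 = time_to_fraction(model, Xmax, 0.0, t_last)
    return {
        "lag": lam,
        "exponential": t95 - lam,
        "stationary": max(0.0, t_last - t95),         # observation time, not biology
    }

def fitted(t):
    """Hypothetical fitted logistic: X0 = 0.1, Xmax = 5, mu_max = 0.5."""
    return 5.0 / (1.0 + 49.0 * math.exp(-0.5 * t))

durations = phase_durations(fitted, Xmax=5.0, lam=0.0, t_last=24.0)
print(durations)
```

If the curve never reaches the threshold within the observed window, the bisection converges to t_last and the stationary duration comes out as zero.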
9 — Yield coefficients and specific rates
When a substrate column is present, the batch-integral yield is:
YX/S = (Xmax − X0) / (S0 − Sfinal)
This lumps together substrate consumed for growth and substrate consumed for maintenance, so whenever maintenance is non-zero it is lower than the true growth-coupled yield YX/Smax. To separate the two, use Tab 2 (Fed-Batch) or Tab 3 (Continuous), which run a Pirt regression across multiple operating points.
When a product column is also present, YP/S is computed analogously. The specific rates qS and qP are reported as windowed averages over the exponential phase:
qS = ΔS / (X̄ · Δt), qP = ΔP / (X̄ · Δt)
The window is the first half of the dataset, taken as a proxy for the exponential phase. These are batch-integral approximations and do not match the precision of chemostat (continuous-culture) measurements.
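A minimal sketch of these calculations is given below. Several choices are assumptions made for illustration: max(X) stands in for the fitted Xmax, ΔS is taken as substrate consumed so that qS is positive, and X̄ is the arithmetic mean of biomass over the window. The function name and the data are hypothetical.

```python
def yields_and_rates(ts, X, S, P):
    """Batch-integral yields and first-half windowed specific rates."""
    consumed = S[0] - S[-1]
    Yxs = (max(X) - X[0]) / consumed                  # max(X) stands in for fitted Xmax
    Yps = (P[-1] - P[0]) / consumed
    m = len(ts) // 2                                  # window: first half of the data
    dt = ts[m] - ts[0]
    X_bar = sum(X[: m + 1]) / (m + 1)                 # mean biomass in the window
    qS = (S[0] - S[m]) / (X_bar * dt)                 # substrate uptake, taken positive
    qP = (P[m] - P[0]) / (X_bar * dt)
    return {"Yxs": Yxs, "Yps": Yps, "qS": qS, "qP": qP}

ts = [0.0, 2.0, 4.0, 6.0, 8.0]                        # h
X = [0.1, 0.3, 0.9, 2.0, 2.1]                         # g/L biomass
S = [20.0, 19.0, 16.0, 6.0, 0.0]                      # g/L substrate
P = [0.0, 0.4, 1.6, 5.6, 8.0]                         # g/L product
result = yields_and_rates(ts, X, S, P)
print(result)
```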
10 — Metabolism classifier
When yields are computed, the calculator classifies the metabolism into one of four regimes based on YX/S and YP/S envelopes for S. cerevisiae:
- Aerobic respiratory: YX/S ≥ 0.40 g/g, YP/S ≤ 0.05 g/g. Glucose is fully oxidized to CO2 + H2O via the TCA cycle; ethanol is not produced. Typical of low-glucose chemostat or restricted fed-batch cultivation.
- Aerobic respiro-fermentative: 0.10 ≤ YX/S < 0.40 g/g, 0.10 ≤ YP/S ≤ 0.40 g/g. Mixed metabolism — partial respiration alongside ethanol overflow under high-glucose aerobic conditions (Crabtree effect).
- Anaerobic fermentative: YX/S ≤ 0.15 g/g, YP/S ≥ 0.40 g/g. Glucose is converted almost entirely to ethanol + CO2 via the Embden-Meyerhof-Parnas pathway; biomass yield is low because only 2 ATP per glucose are recovered (vs ~32 ATP in respiration). Typical of brewery and bioethanol fermentations.
- Indeterminate: yields fall outside the canonical envelopes — might reflect mixed conditions, atypical substrates (e.g., xylose for engineered strains), aeration transitions, or under-sampled batches.
The thresholds are operational, not mechanistic — they are based on canonical glucose-cerevisiae behavior and may misclassify other organisms or non-glucose substrates. The classifier is informational; the user can override it via the metabolism dropdown.
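The envelopes listed above map onto a short chain of comparisons. In the sketch below the thresholds are taken from the list; the order of the checks is an assumption, and it matters only at the shared boundary YP/S = 0.40 with YX/S between 0.10 and 0.15, where two envelopes touch. The function name is hypothetical.

```python
def classify_metabolism(Yxs, Yps):
    """Classify a batch by biomass and product yields (g/g)."""
    if Yxs >= 0.40 and Yps <= 0.05:
        return "aerobic respiratory"
    if Yxs <= 0.15 and Yps >= 0.40:                   # checked first at the shared boundary
        return "anaerobic fermentative"
    if 0.10 <= Yxs < 0.40 and 0.10 <= Yps <= 0.40:
        return "aerobic respiro-fermentative"
    return "indeterminate"

examples = [(0.50, 0.00), (0.10, 0.45), (0.20, 0.30), (0.30, 0.05)]
labels = [classify_metabolism(yxs, yps) for yxs, yps in examples]
print(labels)
```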
11 — When the models break
The five sigmoidal models all assume monotonic growth followed by a stationary plateau. Three patterns systematically violate that assumption:
- Death phase. Post-stationary biomass decline is not captured — including death-phase data in the fit will distort Xmax downward and inflate residuals. Truncate to the stationary plateau before fitting.
- Diauxic growth. Two-phase growth (e.g., glucose → ethanol re-utilization) gives a double-S curve that no single model fits well. Use Tab 4 (Diauxic) instead, which detects the phase boundary and fits each phase independently.
- Catastrophic shifts. Sudden μ collapse (oxygen limitation, pH crash, contamination) breaks the smooth-curve assumption. Inspect residuals visually; structured residuals (a curve in the residuals plot) mean the model is misspecified for the data, not just noisy.
Reasonable fit quality (R² ≥ 0.95) and structureless residuals are jointly necessary to trust the parameter values. Either alone is insufficient.