Modeling of biocatalytic reactions: A workflow for model calibration, selection, and validation using Bayesian statistics

Funding information Deutsche Forschungsgemeinschaft, Grant/Award Numbers: PL145/16-1, EXC 2075-390740016, EXC 310/2

Abstract We present a workflow for kinetic modeling of biocatalytic reactions which combines methods from Bayesian learning and uncertainty quantification for model calibration, model selection, evaluation, and model reduction in a consistent statistical framework. Our workflow is particularly tailored to sparse data settings in which considerable variability of the parameters remains after the models have been adapted to the available data, a ubiquitous problem in many real-world applications. The workflow is exemplified on an enzyme-catalyzed two-substrate reaction mechanism describing the symmetric carboligation of 3,5-dimethoxybenzaldehyde to (R)-3,3′,5,5′-tetramethoxybenzoin catalyzed by benzaldehyde lyase from Pseudomonas fluorescens. Results indicate a substrate-dependent inactivation of the enzyme, which is in accordance with other recent studies.


| INTRODUCTION
Modeling is a difficult task with many challenges. A good model is predictive and provides deeper insight into the described system or phenomenon by, for example, explaining underlying mechanisms or giving rise to nonobvious hypotheses which can be tested in a subsequent step. In many applications, building a good model makes it necessary to adapt the model's granularity to the available data and to the kind of scientific questions the model should answer.
For (bio-)chemical reaction networks, considered here on the example of an enzyme-catalyzed two-substrate conversion, standard modeling approaches based on chemical reaction kinetics exist. In addition, several workflows guide the researcher through a sequence or iteration of experiments, parameter estimation, and model quality assessment, as well as model refinement and/or experimental design steps. [1][2][3][4][5] These models are usually available in parameterized form, and parameter estimation is formulated as an optimization problem in which the model behavior is calibrated to experimental data. We often face the problem that the data sets do not contain enough information for the parameters to be uniquely identified. This is caused either by structural nonidentifiabilities, that is, the kind of data used for estimation generally does not allow for a unique parameter identification regardless of the particular realization and measurement noise, or by practical nonidentifiabilities, which can occur in settings in which the given data are sparse or very noisy. Methods based on point estimates and local approximations cannot be used in these cases. Structural identifiability can be tested by algebraic methods, which are often only applicable to small or medium size models or need at least a reference value and a suitable neighborhood. 6 Regularization methods can be used to convert ill-posed into well-posed problems for particular problem settings. 7 Suitable methodology to investigate practical identifiability comprises profile likelihood analysis 8 and Bayesian methods. 9 The latter also allow for a consistent uncertainty quantification from data to model parameters to model predictions for any quantity of interest.
In addition to nonidentifiable parameters, sometimes the reaction mechanism is not known in full detail, the reaction kinetics is influenced by its environment, or the reaction system is embedded in a larger reaction network with yet uncharacterized crosstalk effects. Then, the model is not completely specified, resulting in problems such as structure identification or model selection, which are usually even more difficult than parameter estimation for a single model. 10 Here, we introduce a modeling workflow for parameter estimation, model selection, model reduction, and validation based on Bayesian statistics, which is particularly tailored for consistent uncertainty quantification, and compare it to a similar workflow which uses local methods. 11 Moreover, we discuss different ways to visualize outcomes of individual steps in the workflow. We regard such a workflow as a prerequisite for an automated data and model management system, which makes modeling results transparent and reproducible and facilitates standardization of processes.

| Modeling workflow
Our proposed modeling workflow is depicted in Figure 1. We start with an experimental data set and a set of initial models, here exemplified with models 1 and 2. These models are calibrated independently by using sampling-based Bayesian approaches. For model selection, we compare model fits and introduce additional methods, which make use of residual and parameter identifiability analysis, to judge the overall plausibility of the models. If both models give satisfactory results, one can use model-based experiment design to suggest further experiments which help to discriminate between the two models. 13 In case none of the two models provides satisfactory fits, the model has to be revised, leading to a new model hypothesis. If the fits are good but the parameter precision is not satisfactory, we employ model reduction techniques based on the sampled parameters, which also leads to a new model hypothesis that can be compared in the same way. The individual workflow steps are explained in more detail for the particular example in the following.
Color codes for the three models in Figure 1 are reused in the simulations of the respective models in this study. We do not explicitly consider model-driven experiment design methods and model revision here, which is indicated by dashed lines and boxes in Figure 1, since standard methods are available for this and model revision is usually done manually, for example, via including expert knowledge and/or more details about the process at hand.

| Experimental time course data and competing modeling approaches
We use experimental data obtained from Zavrel et al. 11 Two initial model hypotheses (Models 1 and 2), based on the mechanistic kinetic models derived in Zavrel et al., 11 are calibrated and compared in a first step. The enzyme reaction of BAL follows an ordered bi-uni reaction mechanism, since two (identical) substrate molecules are converted to one product molecule. Both models are based on the same reaction mechanism, which is shown with its microreaction steps in Figure 2c, 14 with seven parameters that are listed in Figure 2d. Since only six microreaction constants exist (Figure 2c), we employ the functional relation from Zavrel et al. 11 We refer to this model as Model 1, with parameter vector θ_1 = (k_catf, K_eq, K_mA, K_mB, K_mP, K_iA). Model 2 additionally includes a substrate-dependent enzyme inactivation, which is described by mass action kinetics, such that Model 2 comprises one additional parameter k_inS and hence θ_2 = (k_catf, K_eq, K_mA, K_mB, K_mP, K_iA, k_inS). Often it makes sense to use a shrinkage approach or to partly pool parameters such as the SD for the same model output within the same experiment. As pooling reduces the number of parameters, it is often used for computational reasons.
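To illustrate the structural difference between the two hypotheses, the following minimal sketch integrates a Model 2-like system: the rate law `rate` is a generic reversible placeholder (not the actual ordered bi-uni rate equation of Zavrel et al.), and all numerical values are invented for illustration. Only the inactivation term dE/dt = −k_inS·A·E reflects the mass-action inactivation described above; Model 1 is obtained by dropping it.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rate(A, P, E, k_catf, K_eq, K_mA):
    # placeholder reversible rate for 2 A -> P, driven towards equilibrium;
    # NOT the ordered bi-uni rate law of the paper
    return k_catf * E * (A**2 - P / K_eq) / (K_mA**2 + A**2)

def model2_rhs(t, y, k_catf, K_eq, K_mA, k_inS):
    A, P, E = y
    v = rate(A, P, E, k_catf, K_eq, K_mA)
    return [-2.0 * v,          # two substrate molecules consumed per turnover
            v,                 # one product molecule formed
            -k_inS * A * E]    # substrate-dependent enzyme inactivation (Model 2 only)

# invented initial conditions and parameters, for illustration only
sol = solve_ivp(model2_rhs, (0.0, 60.0), [2.0, 0.0, 1e-3],
                args=(10.0, 50.0, 0.5, 0.05))
A_end, P_end, E_end = sol.y[:, -1]
```

With the inactivation term active, the simulated enzyme concentration decays over the time course, which is the qualitative signature that distinguishes Model 2 from Model 1.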
Here, we employ a multiplicative error model,

y_A^j(t_k) = z_A^j(t_k, θ) · exp(ε_jk), ε_jk ~ N(0, σ²),

where z_A^j(t_k, θ) refers to the solution of the differential equation for substrate A at time point t_k, k ∈ {1, …, T}, in experiment j ∈ {1, …, 9}, and y_A^j(t_k) denotes the respective noise-corrupted measurement.
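The multiplicative error model can be sketched as follows; the trajectory `z` and the noise level `sigma` are invented placeholders. Under this model the error scales with the signal, and log y is Gaussian around log z.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.05  # assumed relative noise level, for illustration

def add_multiplicative_noise(z, sigma, rng):
    # y = z * exp(eps), eps ~ N(0, sigma^2): noise scales with signal size
    return z * np.exp(rng.normal(0.0, sigma, size=z.shape))

z = np.linspace(2.0, 0.1, 50)   # stand-in for the noise-free model output z_A(t_k)
y = add_multiplicative_noise(z, sigma, rng)
rel_err = np.abs(y - z) / z     # relative errors are roughly constant in size
```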
The resulting likelihood function for the experimental data set D reads

p(D|θ) = ∏_{j=1}^{9} ∏_{k=1}^{T} (1 / (√(2π) σ y_A^j(t_k))) exp( −(log y_A^j(t_k) − log z_A^j(t_k, θ))² / (2σ²) ).

A Bayesian approach additionally requires a prior distribution π(θ) on the parameters, for which we choose a uniform distribution on the log-transformed parameters. This transformation is usually employed in cases where the order of magnitude of the unknown parameters is not known a priori, since it makes it possible to cover several orders of magnitude and obeys Benford's law, according to which the mantissas of the logarithms of numbers are uniformly distributed. 21,22 In addition, it maps positive parameters onto the entire set of real numbers, and in this way we get rid of the positivity assumption as a constraint. This log transformation has been shown to be highly advantageous for parameter estimation in systems biology models, see, for example, Kreutz 23 and Villaverde et al. 24 Note that probability distributions are generally not invariant under such nonlinear transformations, and the results differ from choosing a uniform prior on the nontransformed parameters. An unbounded uniform measure corresponds to an improper prior distribution, resulting in the posterior distribution being proportional to the likelihood function.
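A minimal sketch of the log-posterior under this multiplicative (log-normal) error model with known σ and a bounded uniform prior on the log-parameters is given below; `simulate` stands in for the ODE solution z(t_k, θ) and is a trivial placeholder here so that the sketch is runnable.

```python
import numpy as np

def log_likelihood(log_theta, t, y, sigma, simulate):
    # log of the product of log-normal densities for the multiplicative error model
    z = simulate(np.exp(log_theta), t)
    if np.any(z <= 0):
        return -np.inf
    r = np.log(y) - np.log(z)                      # residuals on the log scale
    return (-0.5 * np.sum(r**2) / sigma**2
            - r.size * (np.log(sigma) + 0.5 * np.log(2 * np.pi))
            - np.sum(np.log(y)))

def log_prior(log_theta, lo, hi):
    # uniform prior on the log-transformed parameters within [lo, hi]
    return 0.0 if np.all((lo <= log_theta) & (log_theta <= hi)) else -np.inf

def log_posterior(log_theta, t, y, sigma, simulate, lo, hi):
    lp = log_prior(log_theta, lo, hi)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(log_theta, t, y, sigma, simulate)

# tiny runnable check with a placeholder model z = theta0 * exp(-theta1 * t)
simulate = lambda theta, t: theta[0] * np.exp(-theta[1] * t)
t = np.linspace(0.0, 1.0, 20)
y = simulate(np.array([2.0, 1.0]), t)              # noise-free "data"
lp_true = log_posterior(np.log([2.0, 1.0]), t, y, 0.05, simulate, -5.0, 5.0)
lp_off = log_posterior(np.log([3.0, 0.5]), t, y, 0.05, simulate, -5.0, 5.0)
```

For noise-free data the log-posterior is maximal at the generating parameters, which is a quick sanity check of such an implementation.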
For the implementation, we use a proper prior distribution on the parameters by choosing reasonable bounds. This is usually done in an adaptive way: we cover a large range initially, solve the optimization problem to find the maximum a posteriori (MAP) estimator, and then adapt the boundaries such that they enclose this estimator and comprise most of the probability mass. The MAP estimator is defined as

θ̂_MAP = argmax_θ p(θ|D),

with corresponding objective function value J_opt. By Bayes' theorem, the posterior distribution then reads

p(θ|D) ∝ p(D|θ) π(θ).

The posterior predictive distribution (PPD) of the model output or, generally, the PPD for any quantity of interest ỹ can be inferred via Monte Carlo integration,

p(ỹ|D) = ∫ p(ỹ|θ) p(θ|D) dθ ≈ (1/N) ∑_{i=1}^{N} p(ỹ|θ_i),

where the θ_i are N samples from the posterior distribution p(θ|D). For parameter marginals, that is, when ỹ is a subset of the parameters, this translates into just considering the respective entries in the sampled parameter vectors and using these to estimate marginal densities. Since the θ_i have been generated from a Markov process, the series is usually autocorrelated, which has to be taken into account when estimating the variance of the measure of interest. The variable ỹ can, for example, also be a state y(t*) at a particular time point t* or a waiting time until a particular event occurs.
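The sampling step can be sketched with a random-walk Metropolis algorithm, the simplest member of the family of samplers used in such workflows; the 2D Gaussian target below merely stands in for p(θ|D). A real analysis would use adaptive or gradient-based samplers, discard burn-in, and check autocorrelation, as noted above.

```python
import numpy as np

def metropolis(log_post, x0, n_samples, step, rng):
    # random-walk Metropolis: propose a Gaussian step, accept with
    # probability min(1, exp(lp_prop - lp)), otherwise keep the old state
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_samples, x.size))
    for i in range(n_samples):
        prop = x + rng.normal(0.0, step, size=x.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

rng = np.random.default_rng(1)
# toy 2D Gaussian target standing in for the posterior p(theta | D)
target = lambda th: -0.5 * np.sum((th - np.array([1.0, -2.0]))**2)
chain = metropolis(target, [0.0, 0.0], 20000, 0.5, rng)
posterior_mean = chain[5000:].mean(axis=0)   # discard burn-in before averaging
```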
If the measurement noise σ² in Equation (4) is small, interquartile ranges of this distribution can be approximated directly by using the posterior sample to simulate a bundle of trajectories and estimating the ranges empirically from this bundle. Model fits of the remaining six experiments are shown in Figure S1. Visually, there is not much difference between the two model variants in terms of fit quality.
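The trajectory-bundle approximation can be sketched as follows; the exponential-decay model and the stand-in posterior sample for a single rate parameter are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 5.0, 50)

# stand-in posterior sample for a decay-rate parameter
k_samples = rng.normal(1.0, 0.1, size=500).clip(min=1e-6)

# bundle of trajectories, one per posterior sample (placeholder model y = 2 e^{-kt})
bundle = 2.0 * np.exp(-np.outer(k_samples, t))

# pointwise empirical interquartile band over the bundle
q25, q75 = np.percentile(bundle, [25, 75], axis=0)
band_width = q75 - q25
```

At the fixed initial condition the band collapses to zero width, and it widens where parameter uncertainty propagates into the trajectories.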
The MAP estimators for Models 1 and 2 are listed in Figures 4a and 5a, respectively, and are indicated as white dots in Figures 4b and 5b. Figures 4b and 5b visualize the estimated 2D parameter marginals p(θ_i, θ_j|D) (Equation (9)) of the posterior distributions for Models 1 and 2 as 2D scatterplots (lower right half) and corresponding binning plots on a hexagonal lattice (upper left half). 1D parameter marginals (Equation (9)) are depicted below.
MAP estimates of both models are similar only for the parameters k_catf and K_eq. All other parameters differ significantly, by several orders of magnitude, showing that estimated parameter values strongly depend on the model equations and thus have to be handled with care. This phenomenon was also observed in Buchholz et al. 5 As indicated by the 1D marginal distributions for Model 1, the parameters k_catf and K_eq are well identifiable. The parameter K_iA has a broader distribution, and those of K_mA, K_mB, and K_mP cover a wide range and are not identifiable. Correlations with absolute coefficient values larger than 0.5 only exist between K_iA and K_mA (ρ = 0.67) and between K_iA and K_mB (ρ = −0.69).
For Model 2, k_catf, K_eq, K_mA, K_mP, and k_inS are well identifiable, while K_mB and K_iA have broader distributions. Moreover, the parameters are much more correlated. Besides the two correlations which are also present in Model 1, ρ(K_iA, K_mA) = 0.72 and ρ(K_iA, K_mB) = −0.96, the pairs (K_iA, K_mP), (K_mP, K_mA), (K_mP, K_mB), and (K_mA, K_mB) are also correlated with coefficients |ρ| > 0.5.
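Screening the posterior sample for strongly correlated parameter pairs, as done above with the |ρ| > 0.5 threshold, can be sketched as follows; the synthetic sample and the parameter names are illustrative.

```python
import numpy as np

def strong_correlations(samples, names, threshold=0.5):
    # pairwise Pearson correlations between (log-)parameters from the
    # posterior sample; report pairs exceeding the threshold in magnitude
    rho = np.corrcoef(samples, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(rho[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(rho[i, j]), 2)))
    return pairs

rng = np.random.default_rng(3)
x = rng.normal(size=2000)
# synthetic sample: first two columns strongly anticorrelated, third independent
samples = np.column_stack([x, -0.9 * x + 0.3 * rng.normal(size=2000),
                           rng.normal(size=2000)])
pairs = strong_correlations(samples, ["K_iA", "K_mB", "k_catf"])
```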
We note here that these 1D and 2D parameter marginals clearly show that a FIM-based local analysis, as presented in Reference 11, comes to its limits. Confidence bounds are estimated in such an analysis by approximating the 2D densities with bivariate normal distributions centered around θ̂_MAP, with corresponding 1D marginals that are also normally distributed. While this might still be a reasonable approximation for some parameters of Model 1, in particular the well-identifiable parameters k_catf and K_eq, it cannot describe the nonlinear relationships between the parameters of Model 2.
In summary, although Model 2 has more parameters, these parameters are much better identifiable than those of Model 1, which indicates that Model 2 is better suited to describe the data.

| Model comparison suggests substrate-dependent enzyme inactivation
Residual analysis is a way to judge whether the deviations between the progress curves produced by θ̂_MAP and the data are in the range of the noise. To this end, we use our trained stochastically embedded differential equation models. Overall, Model 2 is favorable according to our analysis, which supports the hypothesis of a substrate-dependent enzyme inactivation.

| Model reduction based on statistical analysis
Equipped with a model which gives satisfactory predictions but whose parameters are not completely identifiable, we applied model reduction techniques to Model 2 according to our workflow (Figure 1). Model reduction is a broad field, and many different techniques are available. For larger chemical reaction networks, time-scale separation techniques such as quasi-steady state or rapid equilibrium approximations are sensible approaches. [31][32][33] Model-order reduction techniques can be applied to reduce the complexity of the model. The scatterplots in Figure 8 show the 2D (left) and 3D (right) correlations. In order to be able to compare to the results in Zavrel et al. 11 and Buchholz et al., 5 these linear relations were used to eliminate the parameters K_mP and K_iA.
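The correlation-based elimination step can be sketched as follows: a linear relation between two log-parameters is fitted to the posterior sample, and one parameter is then expressed through the other in the reduced model. The near-linear synthetic sample and the parameter names are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)
# synthetic posterior sample with a near-linear relation in log space
log_KiA = rng.normal(0.0, 1.0, size=1000)
log_KmP = 2.0 * log_KiA - 1.0 + 0.05 * rng.normal(size=1000)

# fit log_KmP = slope * log_KiA + intercept by least squares
slope, intercept = np.polyfit(log_KiA, log_KmP, 1)

def eliminated_log_KmP(log_KiA_value):
    # in the reduced model, K_mP is replaced by its fitted dependence on K_iA
    return slope * log_KiA_value + intercept
```

Eliminating a parameter this way removes the corresponding correlation from the posterior, at the cost of hard-wiring an empirical relation into the model structure.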
The parameters of the reduced model (Model 3) were sampled, and Figure S2b shows that all parameters except k_inS are well identifiable with small variances. Interestingly, model reduction led to an almost completely flat distribution of k_inS.
Results of the fit analysis are shown in Figure 9 and in Figure S5.
In summary, the reduced model is inferior to Model 2 in terms of model fit and overall plausibility. This conclusion is further supported by statistical measures for model comparison such as the Akaike and Bayesian information criteria (Table S3). While the enzyme inactivation rate k_inS derived in Buchholz et al. 5 is in the same range as that of our Model 2, which was selected here, it is two orders of magnitude smaller than that of our Model 3, which was rejected; this further supports our results.
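The information criteria used for this comparison are AIC = 2k − 2 ln L_max and BIC = k ln n − 2 ln L_max, where k is the number of parameters and n the number of data points; lower values are better. The numbers below are invented for illustration and show how a one-parameter extension must improve the maximum log-likelihood enough to pay its complexity penalty.

```python
import numpy as np

def aic(log_lik_max, k):
    # Akaike information criterion: complexity penalty 2k
    return 2.0 * k - 2.0 * log_lik_max

def bic(log_lik_max, k, n):
    # Bayesian information criterion: penalty grows with sample size n
    return k * np.log(n) - 2.0 * log_lik_max

# illustrative numbers: Model 2 (7 parameters) vs Model 1 (6 parameters)
aic_m1, aic_m2 = aic(-120.0, 6), aic(-100.0, 7)
bic_m1, bic_m2 = bic(-120.0, 6, 200), bic(-100.0, 7, 200)
```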
Subsequently, we present the methodological key findings: The superiority of statistical methods for model calibration over classical least squares approaches lies in the fact that they include information about the data generation process by taking the noise characteristics of the data into account. This enables a thorough analysis, including uncertainties and confidence bounds on all levels, which ultimately also allows one to judge overall model plausibility.
Sampling-based approaches as applied here are computationally expensive (for runtimes, see Table S3), which constitutes a clear limitation. Despite many attempts and much progress, the development of advanced sampling schemes to improve the scalability of these approaches is still an active research topic. 34 Usually, many parameter samples are needed for convergence of the sampling schemes, which is, for example, caused by low acceptance rates of standard sampling schemes, especially in cases where parameter correlations are high, 25,26 or because the posterior mass is spread out and a large space has to be explored. In addition, if, for example, the data set contains many experiments, evaluation of the likelihood function via numerical integration for each parameter sample may be time consuming as well.
In other studies, we have observed that the parametric bootstrap statistics, as presented here, is highly influenced by the error model, which therefore has to be chosen carefully. Here, we have used a multiplicative error model with predefined SD. Residual analysis indicated that this was a good choice, but we still lack suitable methods for setting up a good error model in general.
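A parametric bootstrap under the multiplicative error model can be sketched as follows: synthetic data sets are simulated from the fitted model at the MAP estimate and each is refitted, yielding an empirical distribution of the estimator. The exponential placeholder model, its closed-form refit, and all numerical values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0.1, 3.0, 30)
k_map, sigma = 1.2, 0.05                    # stand-in fitted rate and noise level
model = lambda k: 2.0 * np.exp(-k * t)      # placeholder model output z(t, k)

def refit(y):
    # closed-form least squares on the log scale: log y = log 2 - k t,
    # so -slope of the fitted line estimates k
    return -np.polyfit(t, np.log(y), 1)[0]

# simulate multiplicative noise around the MAP trajectory and refit each replicate
boot = np.array([refit(model(k_map) * np.exp(rng.normal(0.0, sigma, t.size)))
                 for _ in range(200)])
k_se = boot.std()                            # bootstrap standard error of k
```

Because the replicates are generated under the assumed error model, the resulting bootstrap statistics inherit any misspecification of that model, which is exactly the sensitivity noted above.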
Furthermore, we have used linear correlation analysis for model reduction. This was justified by the fact that we observed high correlations between pairs of parameters in the scatterplot analysis. Analysis methods that are able to detect nonlinear relations between model parameters also exist. 35 In general, however, such purely data-based model reduction techniques are more difficult to interpret, and we recommend using techniques that do not deviate much from the underlying physical process whenever possible.
Overall, we are convinced that Bayesian methods for the analysis of dynamic models will become a standard approach once computation times are no longer as limiting. They are increasingly used in different contexts for model calibration, model selection, and uncertainty quantification, see, for example, Davies et al. 36 and Luzyanina and Bocharov. 37 Future work includes the development of bootstrap methods which exploit information from the full posterior distribution rather than just the MAP estimate, as well as methodology for making decisions in case the results of the presented analysis methods are not as consistent as they were in this study. Moreover, in future interdisciplinary projects in which we acquire new data, we intend to explicitly integrate optimal experiment design methods into our workflow that go beyond local methods (see, e.g., Stigter et al. 6) and that make use of the Bayesian viewpoint.