Computers & Operations Research 37 (2010) 1427 -- 1438
Contents lists available at ScienceDirect
Computers & Operations Research journal homepage: w w w . e l s e v i e r . c o m / l o c a t e / c o r
Optimization of biochemical systems through mathematical programming: Methods and applications Julio Vera a , Carlos González-Alcón b,∗ , Alberto Marín-Sanguino c , Néstor Torres d a
Systems Biology and Grupo de Tecnología Max Planck Institute d Grupo de Tecnología b c
A R T I C L E
Bioinformatics Group, Department of Computer Science, University of Rostock, Rostock, Germany Bioquímica, Departamento de Estadística, Investigación Operativa y Computación, Universidad de La Laguna, La Laguna, Tenerife, Islas Canarias, Spain of Biochemistry, Department of Membrane Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany Bioquímica, Departamento de Bioquímica y Biología Molecular, Universidad de La Laguna, La Laguna, Islas Canarias, Spain
I N F O
Available online 10 March 2009 Keywords: Biochemical systems Power-law formalism S-systems Generalized mass action Linear programming Multiobjective optimization Geometric programming
A B S T R A C T
In this work we present a general (mono and multiobjective) optimization framework for the technological improvement of biochemical systems. The starting point of the method is a mathematical model in ordinary differential equations (ODEs) of the investigated system, based on qualitative biological knowledge and quantitative experimental data. In the method we take advantage of the special structural features of a family of ODEs called power-law models to reduce the computational complexity of the optimization program. In this way, the genetic manipulation of a biochemical system to meet a certain biotechnological goal can be expressed as an optimization program with some desired properties such as linearity or convexity. The general method of optimization is presented and discussed in its linear and geometric programming versions. We furthermore illustrate the use of the method by several real case studies. We conclude that the technological improvement of microorganisms can be afforded using the combination of mathematical modelling and optimization. The systematic nature of this approach facilitates the redesign of biochemical systems and makes this a predictive exercise rather than a trial-and-error procedure. © 2009 Elsevier Ltd. All rights reserved.
1. Introduction In the last decade the advancement in the experimental characterisation of the biological systems at the cellular and intracellular level has revealed that a new integrative biology is necessary to investigate them (Voit [65], Kitano [22], Dollery et al. [15], Wolkenhauer [67]). In this new paradigm, the structure and dynamics of cellular systems are investigated as a whole, rather than as isolated parts, in order to understand their essential features. In this regard, the mathematical modelling of biochemical systems supported with quantitative experimental data has emerged as a new approach to investigate the dynamics of biochemical pathways. This approach is a suitable tool to support the biologists in the process of hypotheses generation and the design of experiments (Kitano [23], Wolkenhauer [67]), decreasing the experimental effort necessary to characterise a biological system. In addition, there are specific properties of the biochemical systems (feedback loops, multistability, ultrasensitivity, etc.) that escape the purely experimental approach used in the
∗ Corresponding author. E-mail address:
[email protected] (C. González-Alcón). 0305-0548/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.cor.2009.02.021
traditional molecular biology and can be understood only in terms of a dynamical mathematical description of the systems. One of the usual approaches to model biochemical pathways is the use of mathematical models in ordinary differential equations (ODEs) (Heinrich and Schuster [20]). The power-law models (Savageau [38], Voit [65], Sorribas et al. [49], Vera et al. [59]) are a special kind of ODE models in which the processes that integrate biochemical networks are modelled using power-law expansions in the variables of the system (n dependent and m independent) and then included in non-linear ODEs with the following structure: n+m dX i gijk = cij Xk dt j
for i = 1, . . . , n,
(1)
k=1
where Xi are the variables of the model (concentrations of biochemical compounds of the investigated network: metabolites, proteins, messenger RNAs, etc.), and cij (rate constants) and gijk (kinetic orders) are different kinds of parameters defining the dynamics of the system. The specific numerical values for the parameters are determined using prior biological knowledge, information about the basal steady states of the system (Voit [65]), or experimental data from dynamical perturbations (Vera et al. [60,57]). Unlike in conventional
1428
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
kinetic models, where the kinetic orders are always integer numbers, power-law models admit non-integer values. The most remarkable property of power-law equations is that two such equations with the same structure can describe totally different dynamics (from inhibitory process to the description of cooperativity) by just modifying the numerical values of the kinetic orders involved (Vera et al. [59]). A special simplified sub-category of power-law models are the S-system models, in which rate aggregation is applied to the differential equations describing the dynamics of every time-dependent compound (Torres and Voit [54], Voit [65]). In S-system models, every positive (or production) fluxes are aggregated into a unique input flux and every negative (or degradation) fluxes are aggregated into a unique negative flux. Every differential equation in an S-system model has the same characteristic structure no matter the complexity or the nature of the biochemical system investigated: n+m n+m gij hij dX i = i Xj − i Xj dt j=1
for i = 1, . . . , n.
(2)
j=1
S-system models have remarkable mathematical properties related to their structural simplicity. One of their most interesting features is the possibility of computing their steady states using linear algebra by just applying a simple logarithmic transformation to the steadystate equations (Voit [65]). From a biotechnological and biomedical perspective, mathematical models of biochemical systems can be a very valuable tool in the coming times. The use of suitable mathematical models may allow to investigate the effect of pathway modifications that are not accessible with the current experimental techniques. In addition they can be used to predict in advance the effect of costly complex interventions over biochemical systems with technological or biomedical purposes. In this vein, mathematical modelling can be combined with operations research in order to support microbiologists in the biotechnological improvement of microorganisms with industrial applications (Torres and Voit [54]). “What is the minimum set of enzymes in a given microorganism that should be modified in order to maximize the industrial production of a desired end product?” “How to make a proper decision when genetically modifying a microorganism if we find that more than one biological response should be optimized?” These are technologically relevant questions that can be rationally answered using mathematical modelling and operations research. Moreover, the combination of mathematical modelling, data mining and operations research is a promising approach in drug discovery (Vera et al. [60]). In that case, some biomedical dilemma concerning the design of new therapies can also be presented as optimization programs: “What is the minimum set of biochemical interactions in a network that must be inhibited via specifically designed drugs in order to subvert the pathological condition associated to a certain disease?” Several groups have worked in the past decade using this combination of mathematical modelling and optimization in biotechnology and biomedicine (Voit [64], Torres et al. [53], Mendes and Kell [30], Hatzimanikatis et al. [19], Banga et al. [4]). In those works, the tools and routines developed by the operations research have been used, these ranging from linear programming techniques to the most sophisticated, up to date optimization algorithms. In this paper we review and discuss the basic concepts of the mathematical optimization of biochemical systems and illustrate them using several case studies of technological and biomedical relevance. 2. Linear programming and the IOM method The indirect optimization method (IOM) (see Fig. 1) is based on the fact that, although S-system models are non-linear, their
Fig. 1. Flow diagram of the indirect optimization method (IOM).
steady-state equations are linear when the variables are expressed in logarithmic coordinates. Since a function and its logarithm assume their maxima for the same argument, yields or fluxes can thus be optimized with linear programs expressed in terms of the logarithms of the original variables. Also, some additional constraints frequently found in biological systems, such ratios or products, become linear in the logarithmic coordinate system. Thus, transporting the steady-state system and the constraints into a logarithmic space reduces the non-linear optimization problem to a problem of straightforward linear programming (Voit [64], Vera et al. [58]). Since then, the method has been extended and applied to different types of systems (Torres et al. [53], Alvarez-Vasquez et al. [1,2], Marín-Sanguino and Torres [28], Vera et al. [58,60,62,63], Xu et al. [68], Sendín et al. [47]). The method is basically a procedure that systematizes the steps to follow for optimum results, or at least efficient S-system models. Fig. 1 illustrates the stages of the method in an outline for the most general case. Step 1: Getting the model. The first step is the development of a model of the system. This can be modelled directly on the S-system formalism or in the form of kinetic model or GMA, and then transferred to S-system. Step 2: Analysis of quality of the model. The S-system modelling formalism provides us with tools to analyse its stability, robustness and dynamic response. It is important to ensure the validity of the model, i.e., that it properly describes the known behaviour of the
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
system. Otherwise, it is necessary to refine the model prior to its optimization. Step 3: Construction and resolution program optimization. This is the core of the procedure. All information gathered on the system and the limitations of biological or technological constraints on the solutions are converted into the mathematical equations that make up the program optimization. After ensuring the validity of the model and the adequacy of the components of the program optimization, the stated objectives and the conditions imposed on the system are moved into logarithmic space. In the logarithmic space, the optimization problem becomes a linear program and once the optimal solution is obtained it is transferred to the area of the original variables. In the more general case (multiobjective optimization) once we reach this point we can choose among some options. The first one is the pure multicriteria problem, the multiple objective linear program, a direct extension of the linear programming case. An alternative is the weighted sum approach. Here we assign a weight to each objective or function to be optimized. In the present case, these weights are chosen between zero and one such that they sum up to one. Finally, the third option is goal programming, where a goal level of achievement is established for each objective. These goal levels are soft constraints that are included in the definitions of the objective functions. Moreover the optimized system must be at a steady state and fulfil some constraints. These constraints, as well as the objective functions, are readily translated into linear functions and inequalities, and the optimization task becomes a linear program. Some software packages are available that produces the efficient solution set for multiobjective linear programming tasks. This is the case of ADBASE (Steuer [52]) used here, besides the Matlab optimization toolbox for the goal programming and weighted sum approach. Step 4: Analysis and refinement of the solution. At this stage the solution is analysed, that is, the system's properties at the optimal state (stability, robustness and dynamics). If the optimal solution is sound, it is chosen as a solution to the problem. If the predicted steady state is unstable or not robust enough, the imposed restrictions or the objective function are refined and the optimization process repeated until a satisfactory solution is reached. In the case where an S-system model was built from a previous kinetic model, the S-system solution must be transferred to the original model. In the following we will illustrate the application of this approach to three case studies. 2.1. Case study 1. Optimization of the Escherichia coli L-(-)-carnitine production L-(-)-carnitine is a chiral compound widely distributed in nature and with a wide range of medical applications. A great deal of the Lcarnitine is produced by chemical synthesis, with the disadvantage of producing a racemic mixture that it is necessary to separate in a costly process. Fig. 2 shows a schematic diagram of one experimental set-up for the biotransformation of crotonobetaine into L-carnitine by an overproducing E. coli strain in a cell recycle bioreactor (Canovas et al. [9], Alvarez-Vasquez et al. [2]). The mathematical model of the system was presented by Cánovas et al. [9]. In their original presentation, the authors included the metabolic transformation of crotonobetaine into -butyrobetaine, through the activity of a previously described crotonobetaine reductase (Roth et al. [36], Preusser et al. [35]). In Fig. 2 appear the process variables represented in the model. In the optimization procedure the dependent variables are X1 –X4 , while X5 –X9 are the parameters. We started from the original model set-up and carried out its translation to the S-system representation
1429
Fig. 2. Schematic representation of the experimental set-up for the biotransformation of crotonobetaine into L-(-)-carnitine by E. coli strains.
(Alvarez-Vasquez et al. [2]): dX 1 = X5 X6 − 1.0213X10.9162 X40.5675 X50.4324 X80.5675 , dt dX 2 = 0.2438X30.3538 X40.9394 X50.0605 X70.0605 X90.9292 dt − 0.464X20.1424 X40.9643 X50.0356 X90.9537 , dX 3 = 0.3844X20.1106 X4 X90.989 − 0.2058X30.3926 X40.9742 X50.0257 X90.9636 , dt dX 4 = 0.0053X10.8524 X4 X8 − 0.08X4 . dt Subsequently we proceed with the mono-objective version of the IOM approach. Accordingly, we first define the objective function. In the present case it was the net rate of L-carnitine, that in terms of the model is expressed as Vc = X3 X5 . This function becomes in logarithmic coordinates Y3 + Y5 , where Yi is ln(Xi ). Then the steady-state condition, once linearized in logarithmic coordinates was calculated: 0.0211 = −0.9162Y1 − 0.5675Y4 + 0.5676Y5 + Y6 − 0.5675Y8 , 0.6433 = − 0.1424Y2 + 0.3538Y3 − 0.0249Y4 + 0.0249Y5 + 0.0605Y7 − 0.0245Y9 , −0.6246 = 0.1106Y2 − 0.3926Y3 + 0.0257Y4 − 0.0257Y5 + 0.0254Y9 , 2.706 = 0.8524Y1 + Y8 . At this point we define the extent to which it is allowed to change the variables and parameters of the model. In general, the variation range will be between 0.5 and 1.5 times the baseline values (except for X4 , biomass, which is allowed to increase up to 2 times the baseline). Accordingly, we obtain the following bounds: 3.0746 Y1 4.1732,
2.6909 Y2 3.7895,
2.3277 Y3 3.4263,
1.8068 Y4 3.1931, −0.6931 Y5 0.4054, 3.9120 Y6 5.0106, 3.2188 Y7 4.3174, −1.1988 Y8 − 0.1002, 4.1214 Y9 6.424.
Once the linear program was so obtained we perform two optimization tasks: with one and two decision (independent) variables. The solutions were translated from the S-system to the original model (K-M). The obtained optimal profiles (Table 1) were then implemented in the bioreactor. It can be seen in Table 1 that the experimental results practically coincided with the theoretical
1430
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
Table 1 Comparison between predicted (K-M) and actual experimental values (Exp.) of L-(-)carnitine production rate by E. coli in a cell recycle crotonobetaine biotransformation system. Variables
Basal
Dependent Glycerol, X1 Crotonobetaine, X2 Carnitine, X3 Biomass, X4
43.28 mM 29.49 mM 20.51 mM 12.18 g/L
Independent Dilution rate, X5 Crotonobetaine input, X7 Production rate, Vc
1 L/h 50 mM 21.1 mM/h
1
2
K-M
Exp.
K-M
Exp.
1 1 1 1.5
0.96 0.97 1.01 1.53
1 1.49 1.11 1.5
0.95 1.77 1.13 1.53
1.5 1 1.5
1.5 1.33 1.54
1.65
1.74
The optimized parameter profiles for changes in one (dilution rate, X5 ) and two parameters (dilution rate and crotonobetaine concentration in the input, X7 ) are shown. Results are given as the values divided by the basal.
predictions, both in terms of stationary states experimentally observed as well as to the production speeds of L-carnitine. What can be seen in Table 1 is that there is close agreement between the experimental results and the model-based optimization predictions. In both cases assayed, the experimental steady state was the same predicted by the model and, more importantly, the increase in the rate of L-carnitine production was almost coincident with the predicted increase. 2.2. Case study 2. Application to biomedicine: design of new therapeutic approaches to human hyperuricemia (gout) In classical hyperuricemia, a functional defect in enzyme that controls the synthesis of de novo purines increases its activity and leads to an increase in the activity of degradative metabolic fluxes yielding uric acid (see Fig. 3). A complete model, in power-law formalism was presented and used for optimization purposes (Curto et al. [12,13]). The symptoms of this disease include acute episodes of arthritic pain and inflammation, and various kinds of nephropathy. Currently, gout treatments usually include a symptomatic treatment for joint pain, a restricted diet that precludes consumption of food with high concentrations of purine precursors, and the prescription of allopurinol. Allopurinol is a specific inhibitor of the enzyme xanthine oxidase that can lead to a drastic reduction in the concentrations of uric acid and urate (Klinenberg [24]). The IOM (mono-objective version) was applied for the design of new therapeutic strategies (Vera et al. [60]). The usual steady state of a metabolic network in a typical healthy individual is referred as the healthy state (HS). This HS is a metabolic configuration in which metabolite concentrations and fluxes have values that do not lead to metabolic diseases. In contrast, the pathologic state (PS) corresponds to that of an individual suffering from the disease. In this state, the system contains one or more enzymes with significantly altered activity that provoke changes in certain fluxes and concentrations (called key metabolites and fluxes) that ultimately lead to the manifestation of the symptoms. We can alter a metabolic network by using drugs to modulate enzyme activities. In this case, the aim is to alter the values of critical metabolite concentrations and fluxes in the network and shift them towards those encountered in the HS. Eventually we can also change the initial substrate concentrations, e.g., by dietary restriction. Once the metabolic network and variables associated with the PS are known, we aims to identify modifications of enzyme activities and substrate concentrations that shift the system to a steady state as close as possible to that of the HS, while also verifying all the additionally imposed physiological and biological constraints. These concepts can be translated into a mathematical framework for the identification of new target
enzymes for therapeutic treatment of metabolic diseases. Given that the aim is to shift the PS towards the HS, we look for solutions in which the values of metabolite concentrations and fluxes that play a key role in the manifestation of the disease will be shifted towards those seen in the HS. This aim can be mathematically represented in the following objective function: ⎤ ⎡ i |Xi − XiHS | + j |Jj − JjHS |⎦ , min ⎣ i
j
where i , j 0, and Xi and Jj are the key metabolites and fluxes, respectively. The values of i and j are determined by the relative importance of each key metabolite and flux in relation to the symptoms. We model the functional origin of the disease by imposing a characteristic value to the enzyme activities that underlie the PS profile: Xi = XiPS . In this equation, Xi represents the activity value for the deficient enzyme and XiPS is its characteristic value in the PS. The solution near to the HS would be a stable steady state of the considered metabolic network, which is reflected by the following set of equations: dX i = 0, dt
i = 1, . . . , n.
Moreover, we have to assume additional conditions to guarantee physiologically acceptable solutions. These conditions can be modelled as bounds in the concentrations: XiLB Xi XiUB , where XiLB and XiUB establish the physiologically admissible lower and upper bounds for each metabolite and modifiable enzyme activity Xi . Metabolites were allowed to vary from a 50% to 1000% their basal values: 0.5XiHS Xi 10XiHS ,
i = 1, . . . , 16,
meanwhile X17 only can vary a 50% up and down: HS HS X17 1.5X17 . 0.5X17
We consider an optimization problem for each modifiable enzyme. So, for each target enzyme Xi (i = 20, . . . , 46) we solve a problem considering a unique pair of bounds on enzymes: 0.1XiHS Xi XiHS taking the other enzymes their HS values. The outlined approach integrates modelling of the system and identification of target enzymes with the analysis of the available information and the choice of the best solutions. Each solution consists of a set of predicted values for metabolites, initial substrates and enzyme activities that describes a biologically acceptable steady state of the system that shifts the PS towards the HS. The computation of programs was carried out using the Matlab toolbox MetMAP (Vera and Torres [63]). Additional selection criteria were imposed in order to select the most feasible solutions within the generated set (Vera et al. [60]). Firstly, we imposed an upper limit on the concentration HS . The of uric acid (X16 ) equal to 105% of the HS value: X16 1.05X16 solutions that verify this equation were called satisfactory solutions. A second criterion was defined using the Cartesian distance in the space of the metabolite concentrations from the considered solution (X) to the HS (X HS ): 2 16 Xi HS − 1 . D(X, X ) =
XiHS i=1
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
1431
Fig. 3. Scheme of the considered model of purines metabolism in humans. Time dependent metabolites [S-System variable, Name]: X1 , phosphoribosilpyrophosphate; X2 , inosine monophosphate; X3 , adenylsuccinate; X4 , pool of adenosine derivatives; X5 , S-adenosyl-L-methionine; X6 , adenine; X7 , xanthosine monophosphate; X8 , pool of guanosine derivatives; X9 , pool of deoxyadenosine derivatives; X10 , pool of phosphated deoxyguanosine derivatives; X11 , ribonucleic acid; X12 , deoxyribonucleic acid; X13 , pool of inosine derivatives; X14 , xantine; X15 , pool of guanosine derivatives; X16 , uric acid. Non-time dependent metabolites: X17 , ribose-5-phosphate; X18 , inorganic phosphate. Enzyme activities [S-System variable (abbreviation)]: X19 (Vprpps); X20 (Vgprt-hprt); X21 (Vaprt); X22 (Vden); X23 (Vpyr); X24 (Vasuc); X25 (Vasli); X26 (Vimpd); X27 (Vgmps); X28 (Vampd); X29 (Vgmpr); X30 (Vtrans); X31 (Vmat); X32 (Vpoliam); X33 (Dade); X34 (Vinuc-gnuc); X35 (Vgrna-arna); X36 (Vrnag-rnaa); X37 (Vdgnuc); X38 (Vada-dada); X39 (Vgdrnr-adrnr); X40 (Vgua); X41 (Dgdna-adna); X42 (Vdnag-dnaa); X43 (Vhx); X44 (Vxd-hxd); X45 (Vx); X46 (Vua). Thick gray arrows: de novo synthesis; square dotted arrows: recovery pathways; slash-dot arrows: processes directly associated with the degradation and synthesis of DNA and RNA; black arrows: degradation and elimination processes. Very thin and dotted black arrows: regulatory signal. (See [60]).
We called this the metabolic distance, which measures the deviation of the metabolite concentrations from the HS values in the considered solution. Solutions with a high value of D(X, X HS ) would cause metabolic disequilibrium and undesirable physiological side effects. Finally, solutions with the highest values for the activity of the modified target enzymes were chosen. We took into account the fact that higher enzyme activities imply the use of a lower dose of the specific drug, and reduced drug doses minimise adverse side effects. Six satisfactory solutions were obtained: Inhibition of 1. Adenine phosphoribosyltransferase (Vden X22 ); 2. AMP deaminase (Vampd X28 ), 3. RNases
to AMP and GMP (Vrgna Vrnaa X36 ); 4. 5 (3 )-nucleotidase (Vdgnuc X37 ); 5. guanine hydrolase (Vgua X40 ) and 6. xanthine oxidase (Vxd Vhxd X44 ). Further details on the selected solutions can be found in Table 2. Examination of the table also shows that all the solutions have significant values of D(X, X HS ) and require substantial inhibition of the targeted enzyme (low values in Xi ). In Fig. 4, this group of solutions is shown in the space of D(X, X HS ) and Xi /XiHS . The figure also shows the so-called utopian solution, the potential solution which has the lowest computed distance and the highest value of Xi . Solution 1, that inhibit X22 , has the
1432
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
Table 2 Set of satisfactory solutions. Sol
i
Xi
D(X, X HS )
X1
X2
X3
X4
X5
X6
X7
X8
X9
X10
X11
X12
X13
X14
X15
X16
X17
1 2 3 4 5 6
22 28 36 37 40 44
0.22 0.33 0.36 0.18 0.28 0.33
1.56 1.86 2.07 1.63 4.81 7.73
2.15 1.03 1.55 1.3 1.11 1.07
1.58 0.84 1.38 1.20 1.32 1.7
1.61 1.51 1.15 1.07 1.37 1.49
1.34 1.99 0.94 0.95 1.22 1.27
1.06 1.15 0.99 0.99 1.04 1.05
0.88 2.41 0.69 0.79 1.2 1.3
1.09 0.65 1.2 1.12 1.03 1.2
1.48 0.97 1.27 1.16 1.37 1.09
1.07 1.03 1.03 1.69 1.05 1.02
1.06 0.97 1.05 2.25 1.05 1.01
1.07 1.03 2.89 1.02 1.05 1.02
1.05 1.0 1.03 1.63 1.04 1.01
0.99 1.14 0.95 0.98 1.76 5.38
1 1 1 1 1 7.29
1.01 0.83 1.07 1.03 5.7 1.24
1 1 1 1 1 1
1.14 0.6 0.67 0.51 0.54 0.5
All variables entries are quotients of dividing the true value by the corresponding HS value.
8
X
44
7
D(X,XHS)
6 5
X
40
4 3 2 1
X
0.1
0.15
37
X
X 22
0.2
0.25
0.3
28
X
36
0.35
0.4
HS
Xi /Xi
Fig. 4. Solutions with a single target enzyme. The large square represents the utopian point, while diamonds indicate the selected satisfactory solutions.
lowest metabolic distance (1.56), but the highest enzyme activity value corresponds to solution 3, X36 = 0.36. Examination of Fig. 2 shows that the best treatments with prescription of diet are through inhibition of X36 (solution 3) or X28 (solution 2), the closest solutions to the utopian point. An important observation relates to the solution 6, that involving inhibition of X44 . This solution coincides with the most common current clinical treatment for hyperuricemia, which is based on the use of a combination of allopurinol (an inhibitor of X44 ) and a diet low in purine precursors (X17 ) (see Table 2). Moreover, our method predicts the well-known metabolic side effects of treatment with allopurinol, namely a strong increase in the concentration of xanthine and hypoxanthine (X13 and X14 ). This represents an a posteriori pragmatic verification of our model predictions. IOM has been efficient not only to found the current clinical treatment of the disease but also several other possible target enzymes with potentially lower side effects. 2.3. Case study 3. The multiobjective version of the IOM: application to the ethanol production by Saccharomyces cerevisiae We have used a mathematical model of the ethanol production by S. cerevisiae presented by Schlosser et al. [44] as a case study to apply the multiobjective version of the IOM procedure. Fig. 5 is a schematic representation of the investigated metabolic system. As we can see, the glucose provided to the microorganism is converted in ethanol after five consecutive biochemical processes. In addition, the biochemical pathway has two branches and part of the glucose is transformed into the side-products polysaccharides and glycerol. The model refers to the ethanol production under anaerobic, no growing conditions, with glucose as the sole carbon source for the
Fig. 5. Anaerobic fermentation pathway of yeast Saccharomyces cerevisiae from glucose to ethanol, glycerol and polysaccharides. This metabolic pathway involves five dependent concentrations and eight fluxes. Chemical species: Intracellular glucose (X1 ); glucose-6-phosphate (X2 ); fructose-1,6-bisphosphate (X3 ); phosphoenolpyruvate (X4 ); ATP (X5 ); polysaccharides; glycerol; ethanol (X16 ); ADP. ADP is function of ATP, the equilibrium constant Keq for the reaction ATP + AMP ↔ 2 ADP, and the sum of adenine nucleotides (ATP + ADP + AMP) (for details see Schlosser et al. [44]). Polysaccharides, glycerol and ethanol correspond to the outputs of the system. Enzymes/pathway steps: Vin , glucose uptake (X6 ); VHK , hexokinase (X7 ); VPFK , phosphofructokinase (X8 ); VGAPD , glyceraldehyde 3-phosphate dehydrogenase (X9 ); VPK , pyruvate kinase (X10 ); VPOL , total disaccharide and polysaccharide storage (X11 ); VGOL , glycerol production (X12 ); VATPase (X13 ), generalized rate for all ATP-utilizing process, with the exception of VHK , VPFK and VPOL .
microorganism and absence of nitrogen. Under these conditions, only the metabolic pathways to ethanol, glycerol and to polysaccharide storage were operative. The preliminary model by Schlosser et al. [44] was converted into an S-system model in Vera et al. [58] with the following structure: dX 1 = 1.0006X2−0.0492 X6 − 1.6497X10.5582 X50.0465 X7 , dt dX 2 0.1678 = 1.6497X10.5582 X50.0465 X7 − 0.5793X20.5097 X5−0.2218 X80.8322 X11 , dt dX 3 = 0.4536X20.4407 X5−0.2665 X8 dt 0.1453 − 0.2456X30.4506 X40.0441 X50.092 X90.8547 X12 , dX 4 = 0.2365X30.5285 X50.0994 X9 − 2.0892X3−0.0075 X40.304 X50.0484 X10 , dt dX 5 0.5 = 1.406X30.2605 X40.152 X50.0739 X90.5 X10 dt 0.0589 0.297 − 2.9437X10.1962 X20.1791 X50.2354 X70.3514 X80.2925 X11 X13 .
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
In our investigation, we have considered three biotechnologically relevant optimization objectives (Vera et al. [58,63]): (i) the maximization of the ethanol production, Fprod ; (ii) the minimization of the total amount of internal metabolite concentrations, Fint and (iii) the minimization of the total amount of enzyme activities, Fenz . The first objective directly relates to the desired enhancement of productivity in the microorganism, measured in terms of ethanol production. The other two objectives relates to the minimization of the amount of glucose transformed into other products different from ethanol (objective (ii)), but also to the cell viability by reducing the osmolarity stress (Atkinson [3]) and the metabolic burden (Snoep et al. [48]) (objective (iii)). Being Y = ln(X), they are defined as follows: min Z(Y),
Z(Y) = (−Fprod (Y), Fint (Y), Fenz (Y)),
where the maximization objective becomes a minimization objective by changing the sign of the objective function, which has no repercussions in the position of its extrema. The first objective function take the following form in the logarithmic transformed space: Fprod (Y) = −0.0075Y3 + 0.304Y4 + 0.0483Y5 + Y10 , and we consider the other two objectives as Fint (Y) =
5 i=1
Yi
and
Fenz (Y) =
13
Yi .
1433
provided by the solutions (Vera et al. [58,61]). Thus, three magnitudes were formulated, , , and , with the following structure:
(X) =
VPK (X) , VPK (X 0 )
5
(X) = 5i=1
Xi
0 i=1 Xi
13
,
i=6 (X) = 13
Xi
0 i=6 Xi
.
is defined as the ratio between the total intermediate concentration of the current solution and the one in the original basal solution of the system. In a similar way, we defined as the ratio between the total enzyme activities at the optimum solution and the ones at the basal steady state, while describes the same ratio for the ethanol production of the system. In order to support the biotechnologists making the proper decision, we have introduced an additional criterion. Firstly, we ruled out all solutions with the lowest admissible ethanol production (min = 2.0). We furthermore ranked the remaining solutions according with their parametric distance to the so-called utopian point, a fictitious solution constructed with the optimal values of ethanol production (utp ), total intermediate concentration (utp ) and total enzyme conutp centration ( ) from the computation of separated mono-objective programs (Vera et al. [58,61]): utp = 4.986,
utp = 0.5,
utp = 1.0.
i=6
In our optimization program we considered three types of constraints (Vera et al. [58,61]): (a) those that guarantee that solutions correspond to a steady-state solution of the system; (b) those setting the physiologically feasible lower and upper boundaries for the model variables and (c) a set of additional constraints related to special features of the considered biosystem. The constraints that set up the boundaries for the model variables (metabolite or enzyme concentrations) are especially important and configure the properties of the solutions; their values are dictated by the experience and are specifically formulated for each variable. In our case, previous studies with similar metabolic systems (Alvarez-Vasquez et al. [1], Marín-Sanguino and Torres [27]) led us to consider for the lower boundary 50% of the original basal steady-state value and for the upper boundary five times this basal value for most of the variables (0.5Xi0 < Xi < 5Xi0 ). The exception to this rule were the ATPase activity (X5 < 1.5X50 ), limited by additional physiological constrains, and the side-products polysaccharides and glycerol, which were main0 0 ; X13 = X13 ) (see [58,61]). Moretained at their basal values (X11 = X11 over, an additional constraint was introduced based on our experience analysing the dynamics of the metabolic system, limiting the value of the flux ratio through the enzymes X9 and X10 to prevent dynamical instability in the solutions (X9 /X10 1.75). We furthermore imposed a last constraint in order to ensure that any computed solution doubles at least the ethanol production in the original unmodified microorganism: VPK (X) 2VPK (X 0 ). After computing the multiobjective program using a version of ADBASE (Steuer [52]) integrated in the software package MetMAP (Vera and Torres [63]), 22 efficient solutions were obtained. In our investigation, we considered only the efficient vertexes of the problem, which already contain complex enough information about the properties of the biochemical system. The initial analysis of the results suggested that solutions with a high ethanol production are only possible when the maximum of ATP (X5 = 1.5) and ATPase (X13 = 5) are provided to the system (Vera et al. [58]). From a biological perspective, this information indicates the extreme importance of the ATP turnover to increase the productivity of the system. We furthermore faced task of analysing and classifying the generated solutions. In order to facilitate this, we defined some auxiliary biologically relevant parameters that integrate the information
This solution is actually a unfeasible solution because it does not satisfy all the physiological constraints imposed. In a similar way, we defined the anti-utopian solution, extracting the worst value from every column in the payment matrix for the separated monoobjective programs (Vera et al. [61]):
fun = 2.0,
fun = 4.76,
fun = 3.30.
With this information we defined a weighted parametric distance from every solution to the utopian point as follows: ⎡ 2 ⎤ 2 utp − (X) 2 utp − (X) ⎦ 1 ⎣ utp −(X) utp . + + D(X, X )=
3 utp −fun utp −fun utp −fun Fig. 6 shows the solutions grouped in three sets acscording with their D(X, X utp ) value. Set I corresponds to those solutions with the largest D(X, X utp ) value. This group is characterized by a very high productivity (), a positive feature, but also by high total intermediate concentration (), an undesirable property (both with values bigger than 4). Set II includes the closer solutions to the utopian point D(X, X utp ), characterised by productivity and enzyme levels bigger than 3, but low metabolite concentrations (0.8–1.85). Finally, Set III represents those with intermediate values of D(X, X utp ), high productivity and enzyme levels (bigger than 4), but medium values of total metabolites concentration (1.25–1.75). Some other solutions are not classified in any of the previous sets. When the parametric distance is considered the criterion to choose the appropriate solution for biotechnological purposes, we found two solutions which are clearly better (highlighted in Fig. 6 with a five-pointed star) and are shown in Table 3. These two solutions represent totally opposite alternatives from a biotechnological perspective. Solution I in Table 3 prefers the increase in the ethanol production at the price of high cell resources consumption, while solution II establishes a biotechnological compromise between satisfactory increasing in the ethanol production and non-excessive cell resource consumption.
1434
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
Fig. 6. Efficient solutions for the multicriteria optimization of ethanol production by S. cerevisiae. (•) Set I, solutions with the largest D(X, X utp ) value. () Set II, solutions closer to the utopian point (star). (䊏) Set III, solutions with intermediate values of D(X, X utp ). (×) represents the anti-ideal point.
Table 3 Best two efficient solutions for the multicriteria optimization of ethanol production by S. cerevisiae. Sol.
D(X, X utp )
I II
4.987 2.791
1.812 0.596
3.301 2.798
0.604 0.613
3. Beyond linearity: geometric programming The previous examples used an S-system formulation of the model in order to use linear programming. The power-law method offers a more general kind of model of which the S-system is a particular case, the generalized mass action (GMA). Such systems are interesting not only because they preserve the structure of the fluxes in the original system (no aggregation is performed) but also because virtually any non-linear system can be represented exactly by a GMA of higher dimension through algebraic equivalence transformations of variables in a process called recasting (Savageau and Voit [42]). In order to deal with GMA equations, linear programming has to be replaced by a more flexible tool such as general nonlinear programming techniques or structured problem methods as geometric programming (GP) (Zener [70]). Geometric programs are a class of optimization problems that present a special structure in all the functions involved (objective and constraints). As linear programming (which is included in GP) it falls in the broader category of convex optimization. Convex problems are among the few non-linear tasks where, thanks to powerful interior point methods, the efficient determination of global optima is feasible even for large scale systems. For example, a geometric program of 1000 variables and 10,000 constraints can be solved in less than a minute on a desktop computer (Boyd and Vandenberghe [5]); the solution is even faster for sparse problems as they are found in metabolic engineering. Furthermore, easy to use solvers are starting to become available (Grant and Boyd [17]; Koh et al. [25]). The special structure that shows the objective function and the constraints of a geometric program is a sum of monomials, i.e., power-law terms as each term of the right-hand side of Eq. (1): a
a
M(X) = cX 11 X22 · · · Xnan ,
where c is positive and ai are any real number. (Do not confuse with the standard “monomial” in algebra, where c can be any real and all the exponents are integers.) Sums of monomials, all with positive sign, are called posynomials. If some of the monomials enter the sum with negative signs, the collection is called a signomial. The peculiarities of convexity and GP methods render the difference between posynomials and signomials crucial. A GP problem in standard form shows as min
P0 (X)
(3)
s.t. Pj (X) 1, Mk (X) = 1, Xi > 0,
j = 1, . . . , p, k = 1, . . . , q,
i = 1, . . . , n
(4) (5) (6)
being Mk (X) monomials, while the objective function P0 (X) and the functions Pj (X) involved in inequalities must be posynomials. Signomials are not allowed, and optimization problems involving them require additional effort. Under these conditions (minimization of a posynomial subject to posynomials smaller or equal to 1, monomials equal to 1) it can be guaranteed the conversion of the problem to a convex program and so the existence of a global solution, and the possibility to solve it in a very efficient manner. The equivalence between monomials and power laws immediately suggests the potential use of GP for optimization problems formulated within the framework of biochemical systems theory (BST). In the next section two different ways to afford the optimization of GMA models by a sequence of GP problems will be proposed. The optimization of an S-system is included in Eqs. (3) and (5) for the particular case of a monomial objective function. This can easily be seen by undoing the logarithmic transformation. Therefore, GP can replace linear programming in the general scheme of the IOM method. The immediate advantage of this would be the possibility of including upper bounds to posynomial functions through Eq. (4). Such functions appear frequently in biochemical systems as sums of variables, i.e., the total protein content of the cell should not increase too much. By contrast, steady-state GMA equations (the right-hand side of Eq. (2) equalled to zero) do not automatically fall within the GP structure, because GMA systems usually include negative terms, thus making them signomials. Two ways are proposed to adapt GP solvers for the treatment of GMA systems, both relying on condensation (Floudas [16]).
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
Condensation is a standard procedure in GP of which flux aggregation is a particular case. Namely, a posynomial is approximated by a single monomial. In the terminology of GP, the condensation of a posynomial P(x) is generically denoted as C[P(X)]=C[M1 (X)+M2 (X)+ · · · + Mk (X)] = M0 (X), and it is done at a certain operating point X 0 : C[M1 (X) + M2 (X) + · · · + Mk (X)] =
k Mi (X) wi i=1
wi
(7)
being wi = Mi (X 0 )/P(X 0 ). In the terminology of Eq. (2): ⎡ ⎤ n+m k n+m fjk g C⎣ j X ⎦= X k, k
j=1
k
j=1
k=1
where and gk are chosen such that equality holds at the chosen operating point; thus, the result is equivalent to the Taylor series linearization that is fundamental in BST (Savageau [38,39], Voit [65]). As in the Taylor series, the condensed form is equal to the original equation at the operating point. It can be shown that the left and right-hand side of Eq. (7) are equivalent to those of the arithmetic–geometric (A–G) inequality, n
ai
i=1
n ai wi i=1
wi
.
(8)
Therefore, at any point the condensed form is an underestimation of the original. Objective functions can only be minimized in GP. This is seldom a problem given that the functions to maximize are often monomials that can be inverted: a variable, a reaction rate or a flux ratio. Posynomial objectives are usually entitled for minimization, like the sum of certain variables. Nonetheless, it is also relevant in metabolic engineering to consider the maximization of posynomials, such as the sum of variables or fluxes. In such cases, condensation or recasting can be used. For an extensive introduction on GP modelling see Boyd et al. [6]. A local approach: controlled error method. Each of the steady-state equations of a GMA system may be written as the single difference of two posynomials representing the formation and degradation terms of certain variable: P(X) − Q(X) = 0.
(9)
If both posynomials are condensed, the equation will be reduced to the standard form for monomial equality restriction: 1=
C[P(X)] P(X) ≈ Q(X) C[Q(X)]
as the quotient of two monomials is a monomial. Since the condensed steady state equations of the GMA are equivalent to those of an S-system, this method could be regarded as a direct generalization of classical IOM methods. In other words, GP can include the equations of IOM as monomial equalities but also enables us to include posynomial inequalities in the problem. The first consequence of this is that S-systems can be optimized globally together with a constraint establishing an upper level for the total enzyme in the cell. Such a constraint or any other posynomial expression cannot be directly treated in IOM since it is non-linear in the logarithmic space. There is an additional advantage coming from the possibility of using posynomial constraints. When a posynomial is approximated by condensation, the A–G inequality, Eq. (8), guarantees that the monomial is an underestimation of the original posynomial. Furthermore, the posynomial structure is not altered when divided by a monomial so the quotient between a posynomial and its condensed form, always greater than or equal to 1, provides the exact condensation
1435
error as a posynomial function that can be added to the geometric problem as an extra constraint: P(X) 1 + tol. C[P(X)] The importance of controlling the aggregation error has been pointed out and led to the proposal of variations in IOM [16] that included linear approximations to the error as constraints added to the problem. With the above discussed constraints, the exact error is taken into account. The constraints on the error can even go beyond providing a safer flux aggregation. When the GMA system has been obtained by recasting (and is therefore an exact representation of the original model) the approximated S-system form can be obtained through condensation instead of using a Taylor series. Although both methods yield the same result, the condensation approach can be subject to error controlling constraints to limit every possible source of error, as shown in the example below. This method is intended to be used iteratively so that the original problem is solved as a series of GPs. An initial point X 0 is used to set up a GP taking X 0 as the operating point for the necessary condensations. The solution X 1 is used for the next problem. The extra set of constraints ensures that every iteration will only explore the neighbourhood of the feasible region in which the error due to condensation remains below an arbitrary tolerance set by the user. When two successive solutions coincide the parameter tol can be diminished. A global approach: penalty treatment. A similar yet distinct strategy that minimizes the use of condensation is an extension of the penalty treatment method (Roundtree and Rigler [37]), a classic algorithm for signomial programming. In this method, a signomial constraint such as Eq. (9) is replaced by two posynomial equalities through the creation of an ancilliary variable t: P(X) = t, Q(X) = t. These are not valid GP constraints, so the following relaxed version is used: P(X) t, Q(X) t. Upon dividing by t, the feasible area of the original problem is contained in the feasible area of the new relaxed version and approximation by condensation is not needed. In order to force these inequalities to be tight in the final solution, the objective function is augmented with penalty terms that grow with the slackness of the constraints, namely the inverses of the condensation of the relaxed constraints. The result of this procedure is a legal GP: t t + w− w+ min P0 (X) + i C[P (X)] i C[Q (X)] i i i
s.t.
Pi (X) 1, t Qi (X) 1, t
i = 1, . . . , n,
where the condensed terms are initially calculated at the basal steady state. If the obtained solution falls within the feasible area of the original problem, it is taken as a solution; if it does not (any of the relaxed inequalities is below 1) the solution is used as the next reference point: condensations are calculated again, the weights of the violated constraints are increased and the new problem is solved. This procedure is repeated until a satisfactory solution is obtained. These variations on the original method serve to prevent the penalty terms from dominating the objective function and pushing the relaxed problem towards the boundaries of the feasible region from the very beginning.
1436
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
Table 4 Comparison of results obtained by IOM and GP (both penalty and error tolerance) when applied to anaerobic fermentation in S. cerevisiae.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 Flux
Basal
Predicted IOM
Final IOM
GP
0.034 1.01 9.18 0.0095 1.127 19.7 68.5 31.7 49.9 3440 14.31 203 25.1 0.042 30.22
2.0 2.0 2.0 2.0 0.50
2.19 1.58 1.53 1.19 0.28
2.0 2.0 2.0 2.0 0.50 7.33 3.78 2.86 4.72 4.16 0.010 0.010 14.04 1.0 198.85
7.49 3.85 2.92 6.48 5.72 0.010 0.010 27.04 1.0 273.12
214.62
Violated constraints appear in boldface. The three last columns are given as quotient by the basal value.
Table 5 Comparison of results obtained by IOM, modified IOM and GP (both penalty and error tolerance) when applied to a model of tryptophan production in E. coli.
X1 X2 X3 X4 X5 X6 X7 Vtrp
Basal
Iterative IOM
Modified IOM
GP
0.18 7.99 1418 0.0031 5 2283 430 1.31
1.20 1.09 0.46 0.005 4 5000 1000 3.88
1.20 1.05 0.27 0.062 4 5000 1000 4.54
1.20 1.15 0.8 0.004 4 5000 1000 3.06
Violated constraints appear in boldface. Values divided by the corresponding basal.
Application to the optimization of the ethanol production by S. cerevisiae (Marin-Sanguino et al. [29]). In Table 4 we can see the results of optimizing a GMA version of the fermentation model presented above for S. cerevisiae. The imposed constraints limited the maximum and minimum values of each metabolite to twofold and half the basal level, respectively. An additional constraint was imposed to keep the total enzyme concentration below four times the basal value. The `Predicted IOM' column shows the results predicted by IOM after aggregating the fluxes of the GMA and applying linear programming to the resulting S-system. The `Final IOM' column shows the results of substituting the enzyme levels predicted by IOM in the original GMA model. The discrepancies between them are not too significant for most metabolites, which remain within the designated margins. An exception is the value of ATP (X5 ) which drops to almost half the lower bound imposed to it in the optimization. As can be seen in the next column, the optimization through GP (both controlled error and penalty treatment) finds a solution without violating any constraints. Application to the optimization of the tryptophan production by E. coli. Table 5 compares several optimization methods applied to a model of tryptophan production in E. coli. (Xiu et al. [69]). The original model was not a power law, so it was transformed into a GMA through recasting (Marin-Sanguino et al. [29]). The recasting process generated some extra posynomial and signomial constraints that were condensed in each iteration together with the steady-state constraints. The first column shows the results predicted by the IOM in S-system form. The second column shows the result of adjusting the parameters of the original system according to the results of IOM and letting it reach a new steady state (Marin-Sanguino and Torres [27]). As can be seen, there is a discrepancy between the prediction
of the S-system and the real steady state. Such discrepancy is especially significant in the level of Tryptophan (represented by X3 ). The next column shows the results of optimizing with an extra set of constraints that approximate the condensation error linearly (Xu et al. [68]). Finally, the results obtained by both GP methods are shown in the last column. 4. Discussion Although several groups have published a number of applications during the last decade, the integrated approach for the improvement of biotechnological systems optimization consisting in the combination of mathematical modelling and optimization of biochemical systems remains underexploited. Two main reasons explain this situation. Firstly, the importance of mathematical modelling in biochemical systems has been acknowledged only in very recent times, but also the experimental techniques necessary for a proper characterisation of such models have been accessible only in the last years. Secondly, the optimization of heavily non-linear systems presents inherent difficulties, which are evident when one confronts (structurally and dynamically) complex biological systems. Thus, most of the recent practical and experimental applications of mathematical optimization in biochemical systems deal almost exclusively with stoichiometric models, a special fully linear class of ODE models for biochemical systems (Palsson [33]). In this paper we have shown how to overcome these difficulties through the use of a special kind of ODE models called power-law models and linear programming (Torres and Voit [54]). The homogeneous structure of the power-law models is the most relevant feature of this formalism because it allows the standardization of the modelling-based analysis of biochemical systems (composed by model design and calibration, steady-state analysis, dynamical analysis and simulation), which in many cases becomes a fuzzy, nonsystematic procedure within other modelling formalisms. This makes power-law models very convenient for the integrated optimization of biochemical systems. In addition, when the logarithmic transformation proposed in the IOM is applied to biotechnological optimization programs (Voit [64], Torres et al. [53]), they become linear and a plethora of refined computationally efficient algorithms for linear programming can be applied. Moreover, the method scales very well with problem size: when using the IOM the computational requirements are negligible in most of the state-of-the-art biochemical problems analysed and therefore they can be solved in almost real time. This means that the IOM procedure allows an interactive optimization of biochemical models, in which the researcher can define the program, compute and analyse the solutions and refine the program in a fast iterative manner. This is a very valuable feature when biochemical systems are investigated, which very often requires the use of experience and intuition over several iterations. In this vein, simulation software packages like MetMAP [63] enables this iterative procedure because the user can define the optimization program, compute the solutions and analyse their biological features via computational simulation using a unique integrated software tool. However, the IOM has some drawbacks. While the computation of the solution via Simplex method is fast, every time the optimization program is modified the logarithmic transformation of the equations must be performed. This increases the total computational time required for the method. In addition, the heuristic nature of the last step in the IOM method may provoke a loss of efficiency in the solutions when the behaviour of the original model departs from their Ssystem expansion. This may result in the violation of some of the restrictions imposed. However, our experience shows that these violations are often negligible and the steady-state solutions usually stay in the vicinity of the real efficient solution. Moreover, the method
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
incorporates tools to estimate solutions closer to the optimum in the non-linear domain. This latter step consumes considerable computational effort, but even more important, the solution obtained ceases to be optimal. The practical application of the method indicates however that in most cases it remains a high quality solution. The original IOM method is based on a local linearization in logarithmic coordinates. Replacing LP by GP in the error controlled method provides an important improvement since it provides the optimization process with exact knowledge on the error introduced by such linearization. Furthermore, the penalty method offers a completely different approach that aims to converge on a global optimum. GP seems to offer interesting possibilities in the optimization of biotechnological systems, both through local and global methods. It remains yet to be seen how both approaches perform comparatively. On one hand, the penalty treatment is most promising since it offers a global approach. On the other hand, its convergence capabilities are yet to be tested on large scale problems. The special nature of the biological systems makes the interpretation of the solutions arising from the IOM or other methods rather peculiar. From a biotechnological point of view, we are actually not interested in the numerical values for the model variables encoded by the optimal solution (this precision is actually not reachable with genetic engineering). Our interest is to focus on the information that such solution contains concerning what are the critical cellular compounds which concentration must be modified to improve the technological features of the investigated biosystem. The ability of the IOM to allow real time iterative analysis rather than the accuracy of its solutions makes this method a suitable tool for prototyping optimal genetic modifications of microorganisms or suggesting critical biochemical interactions in drug discovery. Finally, we think that in the next future the development of the DNA recombinant technologies (Porro et al. [34]), together with the emergence of the new paradigms of the metabolic engineering (Stephanopoulus [50], Bro and Nielsen [8]) and synthetic biology (Meyer et al. [31], Greber and Fussenegger [18]) will enhance the importance of the model-based biochemical optimization in biotechnology, making virtually possible any change in the cell proteome. Acknowledgements This work was supported by the Spanish Ministry of Education and Science, research Grant no. BIO2005-08898-C02-02. The contribution of Julio Vera is supported by the German Federal Ministry of Education and Research (BMBF) as a part of the project CALSYSFORSYS under contract 0315264 (www.sbi.uni-rostock.de/calsys). The authors acknowledge the support of José Hormiga in the figure design. References [1] Alvarez-Vasquez F, González-Alcón CM, Torres NV. Metabolism of citric acid production by Aspergillus niger. Model definition, steady state analysis, dynamic behavior and constrained optimization of citric acid production rate. Biotechnol Bioeng 2000;70(1):82–108. [2] Alvarez-Vasquez F, Cánovas M, Iborra JL, Torres NV. Modeling, optimization and experimental assessment of continuous L-(-)-carnitine production by Escherichia coli cultures. Biotechnol Bioeng 2002;80(7):794–805. [3] Atkinson DE. Limitation of metabolite concentrations and the conservation of solvent capacity in the living cell. Curr Top Cell Regul 1969;1:29–43. [4] Banga JR, Balsa-Canto E, Moles CG, Alonso AA. Dynamic optimization of bioprocesses: efficient and robust numerical strategies. J Biotechnol 2005;117(4):407–19. [5] Boyd S, Vandenberghe L. Convex optimization. Cambridge: Cambridge University Press; 2004. [6] Boyd S, Kim S, Vandenberghe L, Hassibi A. A tutorial on geometric programming. Optim Eng 2007;8:67–127. [8] Bro C, Nielsen J. Impact of `ome' analyses on inverse metabolic engineering. Metab Eng 2004;6(3):204–11. [9] Canovas M, Maiquez JR, Obón JM, Iborra JL. Modeling of the biotransformation of crotonobetaine into L-(-)-carnitine by Escherichia coli strains. Biotechnol Bioeng 2002;77(7):764–75.
1437
[12] Curto R, Voit EO, Sorribas A, Cascante M. Validation and steady-state analysis of a power-law model of purine metabolism in man. Biochem J 1997;324:761–75. [13] Curto R, Voit EO, Cascante M. Analysis of abnormalities in purine metabolism leading to gout and to neurological dysfunction in man. Biochem J 1998;320: 477–87. [15] Dollery C, Kitney R, Challis R, Delpy D, Edwards D, Henney A, et al. Systems biology: a vision for engineering and medicine. Report of the Royal Academy of Engineering and Academy of Medical Sciences, London; 2007 [ISBN No. 1-904401-13-5]. [16] Floudas CA. Deterministic global optimization. Dordrecht: Kluwer Academic Publishers; 2000. [17] Grant M, Boyd S. CVX: matlab software for disciplined convex programming (web page and software), April 2008 http://standford.edu/∼boyd/cvx. [18] Greber D, Fussenegger M. Mammalian synthetic biology: engineering of sophisticated gene networks. J Biotechnol 2007;130(4):329–45. [19] Hatzimanikatis V, Emmerling M, Sauer U, Bailey JE. Application of mathematical tools for metabolic design of microbial ethanol production. Biotechnol Bioeng 1998;58(2–3):154–61 Review. [20] Heinrich R, Schuster S. The regulation of cellular systems. New York: Chapman & Hall; 1996. [22] Kitano H. Foundations of systems biology. Cambridge, MA: MIT Press; 2001. [23] Kitano H. Towards a theory of biological robustness. Mol Syst Biol 2007;3:137. [24] Klinenberg JR, Goldfinger SE, Seegmiller JE. The effectiveness of the xantine oxidase inhibitor allopurinol in the treatment of gout. Ann Intern Med 1965;62:639–47. [25] Koh K, Kim S, Mutapic A, Boyd S, GGPLAB: a simple matlab toolbox for geometric programming, Version 1.00, May 2006 http://standford.edu/∼boyd/ggplab. [27] Marín-Sanguino A, Torres NV. Optimization of tryptophan production in bacteria. Design of a strategy for genetic manipulation of the tryptophan operon for tryptophan flux maximization. Biotechnol Prog 2000;16(2):133–45. [28] Marín-Sanguino A, Torres NV. Modelling, steady state analysis and optimization of the catalytic efficiency of triosephosphate. Bull Math Biol 2002;64(2): 301–26. [29] Marin-Sanguino A, Voit EO, Gonzalez-Alcon C, Torres NV. Optimization of biotechnological systems through geometric programming. Theor Biol Med Model 2007;4:38. [30] Mendes P, Kell D. Non-linear optimization of biochemical pathways: applications to metabolic engineering and parameter estimation. Bioinformatics 1998;14(10):869–83. [31] Meyer A, Pellaux R, Panke S. Bioengineering novel in vitro metabolic pathways using synthetic biology. Curr Opin Microbiol 2007;10(3):246–53. [33] Palsson BO. Systems biology: properties of reconstructed networks. Cambridge: Cambridge University Press; 2006. [34] Porro D, Sauer M, Branduardi P, Mattanovich D. Recombinant protein production in yeasts. Mol Biotechnol 2005;31(3):245–59. [35] Preusser A, Wagner U, Elsner T, Kleber HP. Crotonobetaine reductase from E. coli consist of two proteins. Biochim Biophys Acta 1999;14331:166–78. [36] Roth S, Jung K, Jung H, Hommel RK, Kleber HP. Crotonobetaine reductase from Escherichia coli. A new inducible enzyme of aerobic metabolism of L-(-)-carnitine. Antoine van Leewenhoek J Microbiol Serol 1994;65:63–9. [37] Roundtree D, Rigler A. A penalty treatment of equality constraints in generalized geometric programming. J Optim Theory Appl 1982;38(2):169–78. [38] Savageau M. Biochemical system analysis. I. Some mathematical properties of the rate law for the component enzymatic reactions. J Theor Biol 1969;25(3): 365–9. [39] Savageau M. Biochemical system analysis: a study of function and design in molecular biology. Reading, MA: Addison-Wesley; 1976. [42] Savageau M, Voit EO. Recasting nonlinear differential equations as S-systems: a canonical nonlinear form. Math Biosci 1987;87:83–115. [44] Schlosser PM, Riedy G, Bailey J. Ethanol production in baker's yeast: application of experimental perturbation techniques for model development and resultant changes in flux control analysis. Biotechnol Prog 1994;10(2):141–54. [47] Sendín OH, Vera J, Torres NV, Banga JR. Model based optimization of biochemical systems using multiple objectives: a comparison of several solution strategies. Math Comp Model Dyn 2006;12(5):469–87. [48] Snoep JL, Yomano LP, Westerhoff HV, Ingram LO. Protein burden in Zymomonas mobilis: negative flux and growth control due to overproduction of glycolytic enzymes. Microbiology+ 1995;141:2329–37. [49] Sorribas A, Hernández-Bermejo B, Vilaprinyo E, Alves R. Cooperativity and saturation in biochemical networks: a saturable formalism using Taylor series approximations. Biotechnol Bioeng 2007;97(5):1259–77. [50] Stephanopoulos G. Metabolic fluxes and metabolic engineering. Metab Eng 1999;1(1):1–11. [52] Steuer RE. ADBASE: A multiple objective linear programming solver for all efficient extreme points and all unbounded efficient edges. Terry college of Business, University of Georgia, Athens; 2003. [53] Torres NV, Rodríguez F, González-Alcón C, Voit EO. An indirect optimization method for biochemical systems: description of the method and application to ethanol, glycerol and carbohydrates production in Saccharomyces cerevisiae. Biotechnol Bioeng 1997;55(5):758–72. [54] Torres NV, Voit EO. Pathway analysis and optimization in metabolic engineering. Cambridge: Cambridge University Press; 2002. [57] Vera J, Bachmann J, Pfeifer AC, Becker V, Hormiga JA, Torres Darias NV. et al. A systems biology approach to analyse amplification in the JAK2-STAT5 signalling pathway. BMC Syst Biol 2008;2:38.
1438
J. Vera et al. / Computers & Operations Research 37 (2010) 1427 -- 1438
[58] Vera J, De Atauri P, Cascante M, Torres NV. Multicriteria optimization of biochemical systems by linear programming. Application to the ethanol production by Saccharomyces cerevisiae. Biotechnol Bioeng 2003;83(3):335–43. [59] Vera J, Balsa-Canto E, Wellstead P, Banga JR, Wolkenhauer O. Power-law models of signal transduction pathways. Cell Signalling 2007;19:1531–41. [60] Vera J, Curto R, Cascante M, Torres NV. Detection of potential enzyme targets by metabolic modelling and optimization. Application to a simple enzymopathy. Bioinformatics 2007;23(17):2281–9. [61] Vera J, González-Alcón CM, Torres NV. Multiobjective optimization of biochemical systems. In: Pandalay SG, editor. Recent research developments in applied microbiology & biotechnology, vol. 1. Kerala, India: Research Signpost; 2004. p. 69–85. [62] Vera J, Torres NV, Moles CG, Banga JR. Integrated nonlinear optimization of bioprocesses via linear programming. AIChE J 2003;49(12):3173–87. 䉷 [63] Vera J, Torres NV. MetMAP: an integrated Matlab package for analysis and optimisation of metabolic systems. In Silico Biol 2004;4(2):97–108.
[64] Voit EO. Optimization in integrated biochemical systems. Biotechnol Bioeng 1992;40:572–82. [65] Voit EO. Computational analysis of biochemical systems. A practical guide for biochemists and molecular biologist. Cambridge, UK: Cambridge University Press; 2000. [67] Wolkenhauer O. Defining systems biology: an engineering perspective. IET Syst Biol 2007;1(4):1–4. [68] Xu G-X, Shao CX, Zhi-Long K. A new algorithm for steady-state optimization of biochemical systems. Control Theory Appl 2007;23(4):574–80. [69] Xiu Z, Chang Z, Zeng A. Nonlinear dynamics of regulation of bacterial trp operon: model analysis of integrated effects of repression, feedback inhibition, and attenuation. Biotechnol Prog 2002;18(4):686–93. [70] Zener C. Engineering design by geometric programming. New York: Wiley; 1971.