Gene Modification Identification Under Flux ... - Semantic Scholar

Report 2 Downloads 66 Views
Gene Modification Identification Under Flux Capacity Uncertainty Mona Yousofshahi1, Michael Orshansky2, Kyongbum Lee3, and Soha Hassoun1 1 2

Department of Computer Science, Tufts University

Department of Electrical and Computer Engineering, University of Texas at Austin 3

Department of Chemical and Biological Engineering, Tufts University

Abstract Re-engineering cellular behavior promises to advance the production of commercially significant biomolecules and to enhance cellular function for many applications. To achieve a desired cellular objective, it is necessary to identify within a metabolic network a set of reactions whose fluxes should be changed using gene modifications. We develop a computational method, CCOpt, to optimize the selection of an intervention set that consists of gene up/down-regulation using uncertaintyaware chance-constrained optimization. In contrast to deterministic approaches where constraints are met with 100% certainty, constraints in CCOpt are probabilistically met at a user-specified confidence level. We investigate the application of CCOpt to two case studies that utilize the Chinese Hamster Ovary (CHO) cell metabolism. Our results demonstrate that CCOpt is capable of identifying optimal intervention sets without the run-time cost of a sampling based (Monte Carlo) approach.

1. Introduction Biological cells have been engineered to produce various commercially significant biomolecules ranging from therapeutic antibodies to biofuels. Engineered cells have been used as therapeutic agents, e.g. to accelerate wound healing or regenerate damaged or lost tissue function, and play an implicit role in the efficacy of drugs and other molecular therapeutics designed to mitigate a harmful (e.g. uncontrolled growth) or beneficial (e.g. detoxification) process in the cell. Systematic engineering of cell behavior is just beginning to benefit from computational approaches that provide analysis and optimization capabilities. To enable cellular engineering to fully exploit computational advances, computational models and methods must account for uncertainties inherent to biological systems: stochastic fluctuations of molecular processes, incomplete knowledge, and the imprecision of engineering interventions. Addressing uncertainty is a challenging issue that has become increasingly important not only for engineering biological systems, but also human-made systems such as electronic systems. Indeed, the past decade has witnessed a paradigm shift in the design of electronics, where circuits are now designed to maximize tolerance to manufacturing and operational variations or to include tuning circuitry for post-manufacturing re-calibration. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC ’13, May 29 - June 07 2013, Austin, TX, USA. Copyright 2013 ACM 978-1-4503-2071-9/13/05 ...$15.00

As biological engineering efforts progress from proof-ofprinciple to scaled-up manufacturing, computational methods to effectively address biological and engineering uncertainties at the design stage will become increasingly important in ensuring the identification of the most robustly optimal gene modifications. Several computational methods have been recently developed to identify optimal gene modifications to achieve a desired metabolic engineering objective. Here, a gene modification refers to changing the enzyme expression level, and thus the activity, for a particular network reaction. The problem of identifying optimal genetic modifications translates to identifying reactions whose fluxes should be modified to achieve a metabolic objective. The problem can be expressed in terms of operating state variables (flux associated with each reaction), and control (decision) variables such as the presence or absence of gene modifications. The optimal design “tunes” the control variables such that the solution maximizes a desired engineering objective while satisfying several constraints reflecting physicochemical considerations and experimental observations. Notable computational methods include OptReg [1], OptKnock [2], and GDLS [3]. None of the techniques however address uncertainty. Due to biological variability, stochastic effects associated with gene expression, and imprecision in genetic engineering implementation, it is questionable that enzyme levels can be precisely tuned to exactly match the target values calculated using computational design tools. More likely, the target enzyme levels, and thus the corresponding reaction flux capacities, can only be achieved with some uncertainty. The uncertainty in achieving the targeted enzyme activities suggests that the enzyme levels, and hence the corresponding flux carrying capacities, could be considered statistical distributions rather than fixed value parameters. In this statistical interpretation, a flux constraint in a conventional deterministic optimization problem represents the most conservative point in the flux capacity distribution, since a deterministic problem enforces all constraints with zero uncertainty. Such deterministic approaches [1-3], might lead to choosing an intervention set that could fail to achieve in practice the best possible yield. Repeatedly applying deterministic approaches to problem instances obtained using Monte Carlo sampling of flux capacities could identify some intervention sets. However, such an approach is computationally expensive. An alternate approach is to use chance-constrained programming (CCP). CCP selects an optimal solution with a user-defined degree of probabilistic confidence in meeting constraints. Chanceconstrained programming was first introduced to solve the problem of temporal planning when uncertainty is present [4]. Since then, CCP has been utilized in numerous applications such as circuit sizing [5], soil conservation [6], ground water management [7], and energy management [8].

Figure 1: Deterministic vs. chance-constraained upper-bound constraint interpretation. The red dottted lines represent the upper-bounds for reaction j. The arrows sh how the flux ranges. In the deterministic approach, vj is striictly less than any value in the upper-bound distribution. The chancce-constrained approach allows a violation of the upper-boun nd distribution by ε.

We present in this paper a mathematical formullation to identify gene modification while addressing uncerrtainty due to engineering interventions. We investigate how thhe results of our CCP optimization (CCOpt) method compare witth those obtained using Monte Carlo optimization. We apply tthe optimization methods to two test cases that describe the C Chinese Hamster Ovary (CHO) metabolism.

1. Background A cellular process is represented using a biochhemical network consisting of a set of reactions and compounds. A network is at steady state if the total rate of producing eachh of its internal compounds is equal to the total rate of consuming the compound. A network may be able to operate aat more than one such steady state. The steady state flux is a quanntitative measure that characterizes the degrees of engagements off the reactions in a network. One of the most widely used techhniques for flux calculation is Flux Balance Analysis (FBA) [9]]. FBA assumes that the cell is at a metabolic steady state, which dictates that fluxes leading to the formation of a metabolite are balanced by the fluxes leading to the degradation of the metabbolite. There are typically more reaction fluxes in the system m than balance equations that constrain the magnitude and diirection of these fluxes. Consequently, the system of equatioons is typically underdetermined. Using linear programming, FBA finds flux values for a particular cellular objective, such ass maximizing the production of a metabolite, minimizing nutrrient uptake, or maximizing biomass production (cellular growth). Flux Variability Analysis (FVA) is a related technnique useful for computing steady-state flux ranges [10]. Each reaction flux is separately minimized and maximized, subject too similar balance constraints as FBA, to calculate the maximal allowable range supported by the cells operating at steady statee. Measurements on metabolite uptake or release, when availabble, can further restrict such ranges. In the present study, we usee the steady state ranges calculated using FVA as upper and loweer bounds on the reaction fluxes, which are the operating statte variables for optimization.

2. Formulations 2.1 Chance-Constrained Optimization n (CCOpt) In this section, we investigate a way of capturinng uncertainty in parameter values when formulating an optim mization problem with the objective of maximizing the productiion of a desired metabolite through gene up/down-regulating operations. The objective function can be formulated as:

where vtarget is the production rate of a desired metabolite. The objective function in equation (1) ( is subject to certain restrictions on reaction fluxes and decision variables. These constraints include:

• The flux for each reaction is limiteed within a possible range of zero and an upper bound (inequaliity (4)). The upper bound is an uncertain parameter and will bee discussed in detail.

• The modified cell needs to rem main viable which can be guaranteed with a minimal growth rate constraint.

• The rate of production of each intracellular metabolite is equal to its rate of consumption.

• The number of allowed interventiions is limited by an upper bound.

• The gene manipulations are excluusive, i.e. a reaction can be either up- or down-regulated in a solution, but not both.

• The uncertain parameters in this foormulation are the regulated

flux capacities which are defined as a the flux upper bounds and are set by the expression levels of the t corresponding genes.

Figure 1 illustrates the deterministtic and chance-constrained interpretation of an uncertain upper-b bound constraint on the flux of reaction j. In a deterministic interp pretation, the value of flux vj is enforced to be strictly less than all a the values in the upperbound distribution . In the probabilistic p interpretation, there is a nonzero probability that flux vj will be equal to or he distribution . In the larger than some of the values in th case of CCP, the constraint is relaxed by introducing a parameter which reflects the confidence level for the probability that the solution satisfies the t constraint: Prob

1

(2)

To generalize the previous inequality y to also consider the effects of up/down-regulating an enzyme (orr the expression of the gene that encodes the enzyme), we intrroduce two sets of binary decision variables and . A vaalue of 1 indicates that the corresponding enzyme is up/down-reegulated, whereas a value of 0 indicates the corresponding enzymee expression is unchanged. Prob

(1)

1

(3)

Figure 2. Maximum production rates and iintervention sets obtained by CCOpt for the small CHO celll model (a), and for the large CHO model (b). The blue circles reepresent the maximum production rates obtained by CCO Opt with . . The reactions involved in each intervention set aare denoted above each data point. When there is no modification in the enzyme leevel, the flux for is reaction j is less than the upper reference level (SSUj). When 1), vj cann go above the the reaction is up-regulated ( reference state upper bound up to . In thee case of downregulation, the upper bound for vj, decreases to . This probabilistic constraint is illustrated in Figure 2. Various approaches have been developed to solve CCP problems. One method to solve CCP is to convert the probabilistic constraints into their deterministicc equivalents at their specified confidence level ε. This approaach works if the chance constraints are linear, independent and aassume the form of random right-hand side as described in [8]. Our formulation meets all of these conditions; therefore, the chhance constraints can be converted into their deterministic equivaalents. Using the inverse of the cumulative distribution functions (CDF) for , inequality (3) can be reformulated as: and

, ,

(4)

where , and , denote the inverse CDFs of and respectively, which can be numerically calculateed if needed.

3. Results 3.1 Test Cases We use two test cases based on models for thee metabolism of Chinese Hamster Ovary (CHO) cells to assess thhe benefits of our chance-constrained optimization technique. Thee first test case, which is a small-scale model describing CHO ceell metabolism in fed-batch culture [11], comprises 24 metabbolites and 46 reactions. The optimization objective is to maximize the production of a therapeutic antibody. The seecond test case, which is a significantly larger model, repressents a reduced version [12] of a genome scale model [13]. Wee augmented the model from [12] by adding a biomass reaction,, as the test case objective was to maximize cell growth. The modified model comprises 270 metabolites and 423 reactions. Traditionally, gene modification has been modeled as a deterministic event leading to a fold-change in the level of the corresponding enzyme, and hence a fold-channge in the flux

capacity of the reaction catalyzed by the enzyme. Here, we model enzyme level modification ass an uncertain event using a probability distribution. We assum me a normal distribution , [14] with an average fold--change of 6 following gene up-regulation and a standard deviation of 1.3. The average fold-change value reflects ex xperimental data reported in [15]. The standard deviation is cho osen such that 3 1, which ensures that the flux capacity after up-regulating the modified state. A decrease in enzyme level is higher than the unm enzyme level, and hence reaction flu ux capacity, is modeled by a , with an average fold-change of normal distribution 0.5 and a spread of 0.17. Based on the probabilistic interpreetation of fold-changes in enzyme levels, we also estimate the t resulting reaction flux capacities as probability distributionss. A fold-change in enzyme level is assumed to directly correlatee with a fold-change in the maximal reaction velocity ( , ). ) Therefore, flux capacity distributions were calculated by mu ultiplying the enzyme foldchange distributions with , .

3.2 Identifying Intervention Sets Figure 3(a) shows the maximum an ntibody production rate and the intervention sets obtained by CC COpt using the small CHO cell. The intervention sets are shown n above their corresponding flux values. An empty set indicates that no interventions were identified. Reaction 17 represents the antibody synthesis on, allows more antibody reaction, which upon up-regulatio production. Reactions 13 and 25 are in series with each other with a common goal of synthesizin ng cysteine. As reported in [16], one of the rate-limiting steps alo ong the antibody production process in CHO cells is the folding f and assembly of polypeptides in endoplasmic reticulu um where cysteine residues enter the assembly process. Up-regu ulating reaction 4, increases the production of α-ketoglutarate whiich is an intermediate of the tricarboxylic acid cycle. A reccent study showed that supplementing the culture medium with α-ketoglutarate has a positive effect on growth and antib body production of mouse hybridoma cell lines [17]. pplying CCOpt to the large The intervention sets obtained by ap CHO cell models are shown in Figurre 3(b). Sets A, B, C and D represent reaction sets {89, 256, 276 6}, {89, 90, 109, 110, 122, 209, 256, 276, 359, 363}, {86, 89, 90, 9 109, 110, 122, 167, 168, 209, 256, 276, 359, 363}, and {89, 109, 110, 256, 276, 363}, respectively. As for the small CHO cell, the maximal predicted

{17}

1500

{17}

1300 1100 900 700 {} 500 0

1

2

3

4

Intervention upper bound (L)

Maximum production rate (mmol/g-DW/h)

Maximum production rate (nmol/106cells/day)

{4,13,17,25} {13,17,25}

1700

0.25

B

C

0.23 0.21 0.19 0.17

A

0.15 0.13 0.11 0.09

{}

0.07 0

5

10

15

Intervention upper bound (L)

(a)

(b)

Figure 3. Maximum production rates and intervention sets obtained by CCOpt for the small CHO cell model (a), and for the large CHO model (b). The blue circles represent the maximum production rates obtained by CCOpt with . . The reactions involved in each intervention set are denoted above each data point. target flux increases with the number of allowed interventions. A general trend is that the smaller sets of interventions are largely subsets of the larger sets of interventions. The intervention set represented by point A (L = 5) in Figure 3(b) includes hexokinase, nucleoside-diphosphate kinase and the biomass reaction. Set B (L = 10) adds glucose phosphate isomerase, UDP-glucose pyrophosphorylase, glycogen synthase, malic enzyme, phosphatidylinositol synthase, ribulose-5phosphate-3-epimerase and ribose 5-phosphate isomerase. The selection of these enzymes in set B for up-regulation is consistent with experimental observations that CHO cell cultures in log-phase growth exhibit large fluxes through glycolysis (from glucose to lactate), anaplerosis (from pyruvate to oxaloacetate and from glutamate to a-ketoglutarate), and the malate cycle [18].

3.3 Comparison against Monte Carlo Sampling To further understand the quality of solutions obtained using CCOpt, we compare the maximum production rate with those obtained using Monte Carlo sampling. We use Monte Carlo methods to randomly select a flux capacity bound from the flux capacity distribution and then determine an intervention set. Using this approach, the repeated solutions eventually identify multiple intervention sets with some frequencies. We iterated the Monte Carlo simulation 1,000,000 times. The results for the small CHO cell are shown in Figure 4(a). When the number of allowed interventions is 1 or 2 (L = 1, 2), Monte-Carlo based optimization generates only a single intervention set that is identical to the result of CCOpt. For L = 3, Monte Carlo identifies two sets of interventions. The intervention set identified with the greater frequency exactly matches the set identified by CCOpt. Interestingly, this intervention set also corresponds to the highest target production rate among all intervention sets comprising three reactions

( 3). For L = 4, the trend is the same as for L = 3; that is, among the different solutions generated by Monte Carlo sampling, the intervention set with the highest frequency is the same as the set identified by CCOpt. The CCOpt results for the large CHO cell are also compared to solutions generated using the Monte Carlo approach. As was the case for the smaller CHO cell model, the intervention sets identified by CCOpt (shown in red) represent the most frequently generated Monte Carlo solutions (Figure 4(b)). CCOpt solutions A, B and C occur with 100%, 98% and 97% frequency, respectively. These solutions also have the highest maximum fluxes compared to other possible intervention sets. Together, these results indicate that the chance-constrained optimization approach, CCOpt, is as capable of identifying the best intervention sets without the run-time cost of multiple optimizations used in the Monte Carlo sampling approach.

4. Conclusion The presented work addresses an efficient computational framework to identify an optimal set of gene modifications that can be used to maximize cellular production of a particular metabolite. The novelty is in explicitly accounting for likely variations in flux capacities due to engineering modifications. Our algorithm, CCOpt, is based on chance-constrained programming where constraints are probabilistically met at a user-specified confidence level. Evaluation of the approach for two test cases demonstrates that CCOpt consistently finds the solution most frequently found when using Monte Carlo sampling, but at a fraction of the computational cost. The CCOpt formulation is the first work to incorporate uncertainty when computing gene modifications. It can be extended to capture other types of uncertainties, such as biological variability in measured data and cell transfection efficiency, making CCOpt an effective technique for probabilistic strain optimization.

98%