Parameter estimation using simulated annealing for Ssystem models of biochemical networks Orland Gonzalez
Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration
• General techniques – Decoupling – Structure identification
• The cadBA network in E. coli
Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration
• General techniques – Decoupling – Structure identification
• The cadBA network in E. coli
S-systems review • Coupled ODEs – Power-laws
• Special form of GMAs – Production and degradation fluxes are aggregated
n: number of dependent variables m: number of independet variable
S-system example
Very simple symbolic setup! X1 production:
X&1+ = α1 X 0g10 X 2g12
where: g10 > 0 since X1 is produced from X0 g12 < 0 since X1 is inhibited by X2
X1 degradation:
X&1− = β1 X 1h11 Therefore:
where: h11 > 0 since we expect the current state of X1 affects its own degradation rate.
X&1 = α1 X 0g10 X 2g12 − β1 X 1h11
S-system example
Very simple symbolic setup! Whole Network:
X&1 = α1 X 0g10 X 2g12 − β1 X 1h11 X& = α X g 21 − β X h22 2
2
1
2
2
where α2 = β1 and h11 = g21 since the amount of X1 degraded is the amount of X2 that is produced.
Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration
• General techniques – Decoupling – Structure identification
• The cadBA network in E. coli
Parameter Estimation • From literature – From Kms and other enzyme data
• From perturbation experiments – From a system in steady state: a single perturbation is made
• From time series data (biochemical profiles) – Inverse problem – Curve fitting
Inverse problem ex. Given the network:
And the profile:
Problem: Find the parameters that fit the data.
Challenges • Mathematical redundancy – Multiple possible solutions
• Computational complexity – Abundant local minima – Extreme computational requirements • due to numerical integration An early EXTREME example (Kikuchi et. al. 2003, Bioinformatics) : •System with 5 variables •Artificial data •Cluster of 1040 Pentium-III 933MHz processors Each algorithmic loop took about 10 hours!!!
Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration
• General techniques – Decoupling – Structure identification
• The cadBA network in E. coli
Simulated annealing • Optimization technique – roughly analogous to glass and metal creation – melting and then slow cooling
• Properties – solutions less fit than the current one can be accepted (probabilistically) – can „escape“ from local minima
Pseudo code create initial solution h (random) initialize „temperature“ (melting) slow cooling create new candidate solution h‘ probabilistically accept h‘
lower the temperature (gradually) Error:
2 Aspects of SA • Annealing schedule – temperature control – t←αt where α < 1
• Perturbation function – creation of candidate solution – initial form used
where c is a constant
Perturbation function • Problems – does not converge for too large c – frequently gets trapped with too small c
• What we did – convert c to a function of the error E
Construction of p-function error of a specific metabolite Xi and a specific time t
first order approx. of Xit given Xit-1
substitution yields
...cont‘d Suppose it is possible to reduce Eit to a factor of p < 0 by perturbing a particular gik by ∆g, this relationship is described by
Subtracting this from the orginal relationship, doing algebraic manipulations, shifting to logarithmic space, and isolating ∆g yields
Direct use of equation
• Problems – Computationally expensive • Aggregation for each parameter over all possible t
– How to aggregate?
• Note: Aggregation actually makes it possible to increase the overall error E.
Indirect use of equation • Defines a roughly linear relationship between Note: Consistent with BST theory of linearization in logarithmic space.
Perturbation function:
where l is externally optimized.
Graph of perturbation function •Perturbation magnitudes go down with the error •Encourages exploration of promising areas of the solution space •The lower E gets, the smaller the average magnitudes are
Demonstration
Generate artificial data Pretend we do not know the true parameters. Try to estimate them using the generated artificial data set (4 sets in this example)
Parameter estimation
•18 free parameters •Initial solution randomly created •[0, 90] for rate constants •[-2, 2] for kinetic orders (based on general observations) •10 hours on a 1.5GHz Intel Celeron machine
Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration
• General techniques – Decoupling – Structure identification
• The cadBA network in E. coli
Decoupling / slope estimation • Approximate slopes – Finite difference, splines, etc... • Equate them directly to the ODE‘s • Advantages – removes the need for numerical integration – runtime dropped to under 2 mins per iteration!
• Disadvantages – additional search bias – susceptibility to noise – not always applicable
Structure identification • Try to identify structure – simple due to canonical form of S-systems – leave all/most parameters free – let optimization determine structure
• Note: still in infancy but can be useful for limited applications – ex. leaving only alternative parameters free (alternative regulations)
...cont‘d
Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration
• General techniques – Decoupling – Structure identification
• The cadBA network in E. coli
The cadBA network in E. coli • Believed to be part of coping mechanism for acidic environments • Has regulatory (transcriptional) and metabolic elements
Existing cadBA model (CMA) Similarity: We used same data set Difference: We tried to avoid speculating on dynamics of components with no data.
Existing cadBA model (CMA)
Problems & handling • Data availability – CadC is constitutive • hard to measure active form • bypassed by redirecting regulatory signals from pH and lysine to the equation for the transcript
– No data for internal lysine and cadaverine • decarboxylation and transport activities were coupled
Working model Network with estimated parameters
Model fit
Uncertain reason (data for lysine measured separately from the rest)
Working model Network with estimated parameters play around: (neutral pH) Model fit
cadBA model • Challenges left – nature of data • CadA in arbitrary units (activity) • cadBA in relative intensity
• Further work – include other players • lysP and LysP
– determine correctness of hypothesized structure
Possible SA refinements • Annealing schedule – quick convergence heuristics
• Perturbation function – perturbation by parameter subsets • probably useful for handling larger problems
– solution histories • momentum terms
– automatic control – direct use of derived equation