Parameter estimation using simulated annealing for S- system models ...

Comment

Report 5 Downloads 120 Views

Parameter estimation using simulated annealing for Ssystem models of biochemical networks Orland Gonzalez

Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration

• General techniques – Decoupling – Structure identification

• The cadBA network in E. coli

Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration

• General techniques – Decoupling – Structure identification

• The cadBA network in E. coli

S-systems review • Coupled ODEs – Power-laws

• Special form of GMAs – Production and degradation fluxes are aggregated

n: number of dependent variables m: number of independet variable

S-system example

Very simple symbolic setup! X1 production:

X&1+ = α1 X 0g10 X 2g12

where: g10 > 0 since X1 is produced from X0 g12 < 0 since X1 is inhibited by X2

X1 degradation:

X&1− = β1 X 1h11 Therefore:

where: h11 > 0 since we expect the current state of X1 affects its own degradation rate.

X&1 = α1 X 0g10 X 2g12 − β1 X 1h11

S-system example

Very simple symbolic setup! Whole Network:

X&1 = α1 X 0g10 X 2g12 − β1 X 1h11 X& = α X g 21 − β X h22 2

2

1

2

2

where α2 = β1 and h11 = g21 since the amount of X1 degraded is the amount of X2 that is produced.

Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration

• General techniques – Decoupling – Structure identification

• The cadBA network in E. coli

Parameter Estimation • From literature – From Kms and other enzyme data

• From perturbation experiments – From a system in steady state: a single perturbation is made

• From time series data (biochemical profiles) – Inverse problem – Curve fitting

Inverse problem ex. Given the network:

And the profile:

Problem: Find the parameters that fit the data.

Challenges • Mathematical redundancy – Multiple possible solutions

• Computational complexity – Abundant local minima – Extreme computational requirements • due to numerical integration An early EXTREME example (Kikuchi et. al. 2003, Bioinformatics) : •System with 5 variables •Artificial data •Cluster of 1040 Pentium-III 933MHz processors Each algorithmic loop took about 10 hours!!!

Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration

• General techniques – Decoupling – Structure identification

• The cadBA network in E. coli

Simulated annealing • Optimization technique – roughly analogous to glass and metal creation – melting and then slow cooling

• Properties – solutions less fit than the current one can be accepted (probabilistically) – can „escape“ from local minima

Pseudo code create initial solution h (random) initialize „temperature“ (melting) slow cooling create new candidate solution h‘ probabilistically accept h‘

lower the temperature (gradually) Error:

2 Aspects of SA • Annealing schedule – temperature control – t←αt where α < 1

• Perturbation function – creation of candidate solution – initial form used

where c is a constant

Perturbation function • Problems – does not converge for too large c – frequently gets trapped with too small c

• What we did – convert c to a function of the error E

Construction of p-function error of a specific metabolite Xi and a specific time t

first order approx. of Xit given Xit-1

substitution yields

...cont‘d Suppose it is possible to reduce Eit to a factor of p < 0 by perturbing a particular gik by ∆g, this relationship is described by

Subtracting this from the orginal relationship, doing algebraic manipulations, shifting to logarithmic space, and isolating ∆g yields

Direct use of equation

• Problems – Computationally expensive • Aggregation for each parameter over all possible t

– How to aggregate?

• Note: Aggregation actually makes it possible to increase the overall error E.

Indirect use of equation • Defines a roughly linear relationship between Note: Consistent with BST theory of linearization in logarithmic space.

Perturbation function:

where l is externally optimized.

Graph of perturbation function •Perturbation magnitudes go down with the error •Encourages exploration of promising areas of the solution space •The lower E gets, the smaller the average magnitudes are

Demonstration

Generate artificial data Pretend we do not know the true parameters. Try to estimate them using the generated artificial data set (4 sets in this example)

Parameter estimation

•18 free parameters •Initial solution randomly created •[0, 90] for rate constants •[-2, 2] for kinetic orders (based on general observations) •10 hours on a 1.5GHz Intel Celeron machine

Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration

• General techniques – Decoupling – Structure identification

• The cadBA network in E. coli

Decoupling / slope estimation • Approximate slopes – Finite difference, splines, etc... • Equate them directly to the ODE‘s • Advantages – removes the need for numerical integration – runtime dropped to under 2 mins per iteration!

• Disadvantages – additional search bias – susceptibility to noise – not always applicable

Structure identification • Try to identify structure – simple due to canonical form of S-systems – leave all/most parameters free – let optimization determine structure

• Note: still in infancy but can be useful for limited applications – ex. leaving only alternative parameters free (alternative regulations)

...cont‘d

Outline • S-systems quick review • Definition of the problem • Simulated annealing – Perturbation function – Demonstration

• General techniques – Decoupling – Structure identification

• The cadBA network in E. coli

The cadBA network in E. coli • Believed to be part of coping mechanism for acidic environments • Has regulatory (transcriptional) and metabolic elements

Existing cadBA model (CMA) Similarity: We used same data set Difference: We tried to avoid speculating on dynamics of components with no data.

Existing cadBA model (CMA)

Problems & handling • Data availability – CadC is constitutive • hard to measure active form • bypassed by redirecting regulatory signals from pH and lysine to the equation for the transcript

– No data for internal lysine and cadaverine • decarboxylation and transport activities were coupled

Working model Network with estimated parameters

Model fit

Uncertain reason (data for lysine measured separately from the rest)

Working model Network with estimated parameters play around: (neutral pH) Model fit

cadBA model • Challenges left – nature of data • CadA in arbitrary units (activity) • cadBA in relative intensity

• Further work – include other players • lysP and LysP

– determine correctness of hypothesized structure

Possible SA refinements • Annealing schedule – quick convergence heuristics

• Perturbation function – perturbation by parameter subsets • probably useful for handling larger problems

– solution histories • momentum terms

– automatic control – direct use of derived equation

Thanks!!!! Reference

Recommend Documents

Simulated annealing for maximum a posteriori parameter estimation of ...

Parameter estimation for Markov chain models - NIMBioS

PARAMETER ESTIMATION FOR STATISTICAL PARSING MODELS ...

Observations on Using Simulated Annealing for ... - Semantic Scholar

Using simulated annealing for resource allocation - Semantic Scholar