An efficient approach to stochastic optimal control
Bert Kappen, SNN, Radboud University Nijmegen, the Netherlands
Examples of control tasks
- Motor control
- Foraging
- Collaborating agents

Pascal workshop, 27-29 May 2008
Stochastic optimal control theory
Control: how to act (now) to optimize future rewards
- the optimal solution is noise dependent
- computation is intractable
- tractable approaches are unimodal (LQ, deterministic)
Outline
- Control theory
- Path integral control theory
- Spontaneous symmetry breaking, timing of decisions
- Agents
- Summary
- If time permits: learning and neural implementation
Discrete time control

Consider the control of a discrete time dynamical system:

    x_{t+1} = f(t, x_t, u_t),   t = 0, 1, \ldots, T    (1)

x_t is an n-dimensional vector describing the state of the system and u_t is an m-dimensional vector that specifies the control or action at time t. Note that Eq. 1 describes a noiseless dynamics. If we specify the state at t = 0 as x_0 and a sequence of controls u_{0:T} = u_0, u_1, \ldots, u_T, we can compute the future states x_1, \ldots, x_{T+1} recursively from Eq. 1. Define a cost function that assigns a cost to each sequence of controls:
    C(x_0, u_{0:T}) = \sum_{t=0}^{T} R(t, x_t, u_t)    (2)
R(t, x, u) is the cost associated with taking action u at time t in state x.
Discrete time control

The problem of optimal control is to find the sequence u_{0:T} that minimizes C(x_0, u_{0:T}). The problem has a standard solution, known as dynamic programming. Introduce the optimal cost-to-go:

    J(t, x_t) = \min_{u_{t:T}} \sum_{s=t}^{T} R(s, x_s, u_s)    (3)

which solves the optimal control problem from an intermediate time t until the fixed end time T, starting at an arbitrary location x_t. The minimum of Eq. 2 is given by J(0, x_0).
Discrete time control

One can compute J(t, x) recursively from J(t + 1, x) for all x in the following way:

    J(T + 1, x) = 0

    J(t, x_t) = \min_{u_{t:T}} \sum_{s=t}^{T} R(s, x_s, u_s)
              = \min_{u_t} \left( R(t, x_t, u_t) + \min_{u_{t+1:T}} \sum_{s=t+1}^{T} R(s, x_s, u_s) \right)
              = \min_{u_t} \left( R(t, x_t, u_t) + J(t + 1, x_{t+1}) \right)

The minimizers u_{0:T} give the optimal control path.
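The backward recursion above can be sketched numerically. This is a minimal illustration, not from the lecture: the dynamics f, cost R, horizon, and state/control grids below are all assumed choices for a simple scalar regulation problem.

```python
import numpy as np

# Illustrative problem (assumed, not from the lecture): drive a scalar
# state toward 0 with a small finite control set and quadratic costs.
T = 20
xs = np.linspace(-2.0, 2.0, 81)        # discretized state grid
us = np.array([-1.0, 0.0, 1.0])        # admissible controls

def f(t, x, u):
    # noiseless dynamics x_{t+1} = f(t, x_t, u_t), clipped to the grid
    return np.clip(x + 0.1 * u, xs[0], xs[-1])

def R(t, x, u):
    # running cost R(t, x, u)
    return x**2 + 0.1 * u**2

J = np.zeros((T + 2, len(xs)))          # J(T+1, x) = 0
policy = np.zeros((T + 1, len(xs)), dtype=int)

# backward recursion: J(t,x) = min_u [ R(t,x,u) + J(t+1, f(t,x,u)) ]
for t in range(T, -1, -1):
    for i, x in enumerate(xs):
        q = [R(t, x, u) + np.interp(f(t, x, u), xs, J[t + 1]) for u in us]
        policy[t, i] = int(np.argmin(q))
        J[t, i] = min(q)

# roll the minimizers forward from x_0 to get the optimal control path
x = 1.5
for t in range(T + 1):
    u = us[policy[t, np.abs(xs - x).argmin()]]
    x = f(t, x, u)
```

The controller steers the state toward the origin, and starting farther from the origin incurs a larger cost-to-go J(0, x_0).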
Continuous limit

In continuous time the running cost accrues at rate R, and the recursion becomes:

    J(t, x_t) = \min_{u_t} \left( R(t, x_t, u_t)\, dt + J(t + dt, x_{t+dt}) \right)

Taylor expanding to first order,

    J(t + dt, x_{t+dt}) = J(t, x_t) + dt\, \partial_t J(t, x_t) + dx\, \partial_x J(t, x_t),   dx = f(x, u, t)\, dt

Substituting and dividing by dt gives the Hamilton-Jacobi-Bellman (HJB) equation

    -\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\, \partial_x J(t, x) \right)

with boundary condition J(x, T) = R(T, x) = \phi(x).
Example: Bang-bang control
The spring exerts a force F = -z toward the rest position; in addition, a control force u is applied. Newton's law F = m\ddot{z} with m = 1 gives:

    \ddot{z} = -z + u

Control problem: given initial position and velocity z_i = \dot{z}_i = 0 at time t = 0, find the control path -1 \le u(0 \to T) \le 1 such that z(T) is maximal.
Example: Bang-bang control

Introduce x_1 = z, x_2 = \dot{z}; then

    \dot{x}_1 = x_2
    \dot{x}_2 = -x_1 + u

The end cost is \phi(x) = -x_1 and R(x, u, t) = 0. The HJB equation takes the form:

    -\partial_t J = \min_u \left( x_2 \frac{\partial J}{\partial x_1} + (-x_1 + u) \frac{\partial J}{\partial x_2} \right)
                 = x_2 \frac{\partial J}{\partial x_1} - x_1 \frac{\partial J}{\partial x_2} - \left| \frac{\partial J}{\partial x_2} \right|,
    \qquad u = -\mathrm{sign}\left( \frac{\partial J}{\partial x_2} \right)
Example: Bang-bang control

The solution is

    J(t, x_1, x_2) = -\cos(t - T)\, x_1 + \sin(t - T)\, x_2 + \alpha(t)
    u(t, x_1, x_2) = -\mathrm{sign}(\sin(t - T))

As an example consider T = 2\pi. Then the optimal control is

    u = -1,   0 < t < \pi
    u = +1,   \pi < t < 2\pi
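The bang-bang policy can be checked by forward simulation. This is a minimal sketch: the explicit Euler integrator, step size, and `simulate` helper are illustrative assumptions, not from the lecture. For T = 2π, solving the piecewise-linear dynamics analytically gives z(π) = -2 under u = -1 and then z(2π) = 4 under u = +1.

```python
import numpy as np

# Simulate z' = v, v' = -z + u from z(0) = v(0) = 0 up to T = 2*pi
# with a simple explicit Euler scheme (assumed step size, not from the lecture).
T = 2 * np.pi
dt = 1e-4
steps = int(T / dt)

def simulate(control):
    z, v = 0.0, 0.0
    for k in range(steps):
        t = k * dt
        u = control(t)
        z, v = z + dt * v, v + dt * (-z + u)   # explicit Euler step
    return z

# bang-bang policy u = -sign(sin(t - T)): u = -1 on (0, pi), u = +1 on (pi, 2*pi)
z_bang = simulate(lambda t: -np.sign(np.sin(t - T)))
# no control, for comparison: the system simply stays at rest
z_zero = simulate(lambda t: 0.0)
```

The simulated final position under the bang-bang policy approaches the analytical value z(2π) = 4, far beyond what zero control achieves: the controller first compresses the spring and then rides its release.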