An efficient approach to stochastic optimal control

Bert Kappen
SNN, Radboud University Nijmegen, the Netherlands

Examples of control tasks

- Motor control
- Foraging
- Collaborating agents

Stochastic optimal control theory

Control: how to act (now) to optimize future rewards.
- The optimal solution is noise dependent
- Computation is intractable
- Tractable approaches are unimodal (LQ, deterministic)

Outline

- Control theory
- Path integral control theory
- Spontaneous symmetry breaking, timing of decisions
- Agents
- Summary
- If time permits: learning and neural implementation

Discrete time control

Consider the control of a discrete time dynamical system:

x_{t+1} = f(t, x_t, u_t), \qquad t = 0, 1, \ldots, T    (1)

x_t is an n-dimensional vector describing the state of the system and u_t is an m-dimensional vector that specifies the control or action at time t. Note that Eq. 1 describes noiseless dynamics. If we specify x at t = 0 as x_0 and we specify a sequence of controls u_{0:T} = u_0, u_1, \ldots, u_T, we can compute the future states x_1, \ldots, x_{T+1} recursively from Eq. 1.

Define a cost function that assigns a cost to each sequence of controls:

C(x_0, u_{0:T}) = \sum_{t=0}^{T} R(t, x_t, u_t)    (2)

R(t, x, u) is the cost associated with taking action u at time t in state x.
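To make Eqs. 1 and 2 concrete, here is a minimal Python sketch (not from the slides) that rolls out the dynamics and accumulates the cost of a given control sequence; the dynamics f and cost R below are hypothetical placeholders.

```python
def simulate_and_cost(x0, u_seq, f, R):
    """Roll out x_{t+1} = f(t, x_t, u_t) and accumulate C = sum_t R(t, x_t, u_t)."""
    x, cost = x0, 0.0
    for t, u in enumerate(u_seq):   # u_seq = u_0, ..., u_T
        cost += R(t, x, u)
        x = f(t, x, u)              # states x_1, ..., x_{T+1} follow recursively
    return x, cost

# Hypothetical 1D example: additive dynamics with a quadratic cost.
f = lambda t, x, u: x + u
R = lambda t, x, u: x**2 + 0.1 * u**2
x_final, C = simulate_and_cost(1.0, [0.0] * 10, f, R)
```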

The problem of optimal control is to find the sequence u_{0:T} that minimizes C(x_0, u_{0:T}). The problem has a standard solution, known as dynamic programming. Introduce the optimal cost-to-go:

J(t, x_t) = \min_{u_{t:T}} \sum_{s=t}^{T} R(s, x_s, u_s)    (3)

which solves the optimal control problem from an intermediate time t until the fixed end time T, starting at an arbitrary location x_t. The minimum of Eq. 2 is given by J(0, x_0).

One can compute J(t, x) recursively from J(t + 1, x) for all x in the following way:

J(T + 1, x) = 0

J(t, x_t) = \min_{u_{t:T}} \sum_{s=t}^{T} R(s, x_s, u_s)
          = \min_{u_t} \left( R(t, x_t, u_t) + \min_{u_{t+1:T}} \sum_{s=t+1}^{T} R(s, x_s, u_s) \right)
          = \min_{u_t} \left( R(t, x_t, u_t) + J(t + 1, x_{t+1}) \right)

The minimizers u_{0:T} give the optimal control path.
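This backward recursion translates directly into code. Below is a minimal sketch (not from the slides), assuming a finite set of states and actions that is closed under the dynamics; the slides treat general state spaces.

```python
def dynamic_programming(T, states, actions, f, R):
    """Backward recursion J(t, x) = min_u [ R(t, x, u) + J(t+1, f(t, x, u)) ]
    with J(T+1, x) = 0, over a finite state/action set."""
    J = {x: 0.0 for x in states}                 # J(T+1, .) = 0
    policy = {}
    for t in range(T, -1, -1):                   # t = T, T-1, ..., 0
        J_next, J = J, {}
        for x in states:
            cost, u_star = min((R(t, x, u) + J_next[f(t, x, u)], u)
                               for u in actions)
            J[x], policy[(t, x)] = cost, u_star
    return J, policy                             # J is now J(0, .)

# Hypothetical example: a clamped random walk penalized for distance from 0.
states, actions = range(-10, 11), (-1, 0, 1)
f = lambda t, x, u: max(-10, min(10, x + u))
R = lambda t, x, u: x * x
J0, pi = dynamic_programming(T=5, states=states, actions=actions, f=f, R=R)
```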

Continuous limit

The discrete time recursion, with the running cost scaled by the time step, is

J(t, x_t) = \min_{u_t} \left( R(t, x_t, u_t)\, dt + J(t + dt, x_{t+dt}) \right)

In the limit of continuous time we Taylor expand

J(t + dt, x_{t+dt}) = J(t, x_t) + dt\, \partial_t J(t, x_t) + dx\, \partial_x J(t, x_t), \qquad dx = f(x, u, t)\, dt

Thus,

-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\, \partial_x J(t, x) \right)

with boundary condition J(x, T) = \phi(x), the end cost.
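As a quick illustration (not on the slides) of how the minimization in this HJB equation can be carried out in closed form, take the simple case f(x, u, t) = u with quadratic control cost R = u^2/2. Setting the derivative with respect to u to zero gives u^* = -\partial_x J, so

-\partial_t J = \min_u \left( \tfrac{1}{2} u^2 + u\, \partial_x J \right) = -\tfrac{1}{2} \left( \partial_x J \right)^2

This is the unimodal (LQ-like) situation mentioned at the outset, where the minimization is tractable.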

Example: Bang-bang control

The spring exerts a force F = -z towards the rest position, and we apply a control force u. Newton's law F = m \ddot{z} with m = 1 gives

\ddot{z} = -z + u

Control problem: given initial position and velocity z(0) = \dot{z}(0) = 0 at time t = 0, find the control path -1 \le u(0 \to T) \le 1 such that z(T) is maximal.

Introduce x_1 = z and x_2 = \dot{z}; then

\dot{x}_1 = x_2
\dot{x}_2 = -x_1 + u

The end cost is \phi(x) = -x_1 and R(x, u, t) = 0. The HJB equation takes the form

-\partial_t J = \min_u \left( x_2 \frac{\partial J}{\partial x_1} + (-x_1 + u) \frac{\partial J}{\partial x_2} \right)
             = x_2 \frac{\partial J}{\partial x_1} - x_1 \frac{\partial J}{\partial x_2} - \left| \frac{\partial J}{\partial x_2} \right|, \qquad u = -\mathrm{sign}\left( \frac{\partial J}{\partial x_2} \right)

The solution is

J(t, x_1, x_2) = -\cos(t - T)\, x_1 + \sin(t - T)\, x_2 + \alpha(t)
u(t, x_1, x_2) = -\mathrm{sign}(\sin(t - T))

As an example consider T = 2\pi. Then the optimal control is u = -1 for 0 < t < \pi and u = 1 for \pi < t < 2\pi.
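A short numerical check (a sketch, not from the slides): integrating \ddot{z} = -z + u under this switching control with semi-implicit Euler steps drives the oscillator resonantly; analytically z(T) = 4 for T = 2\pi.

```python
import numpy as np

T, dt = 2 * np.pi, 1e-4
z, zdot = 0.0, 0.0                    # initial rest state z(0) = zdot(0) = 0
for t in np.arange(0.0, T, dt):
    u = -np.sign(np.sin(t - T))       # u = -1 for 0 < t < pi, u = +1 for pi < t < 2*pi
    zdot += (-z + u) * dt             # Newton's law: zddot = -z + u (m = 1)
    z += zdot * dt
print(z)                              # approaches the analytic value z(T) = 4
```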