A Conjugate Gradient-based BPTT-like Optimal Control Algorithm

Josip Kasać*, Joško Deur*, Branko Novaković*, Ilya Kolmanovsky**

* University of Zagreb, Faculty of Mech. Eng. & Naval Arch., Zagreb, Croatia (e-mail: [email protected], [email protected], [email protected]).

** Ford Research Laboratory, Dearborn, MI 48121-2053 USA (e-mail: [email protected]).

Introduction

- In this paper, a gradient-based algorithm for optimal control of nonlinear multivariable systems with control and state vector constraints is proposed.
- The algorithm has a backward-in-time recurrent structure similar to the backpropagation-through-time (BPTT) algorithm.
- The original algorithm (Euler method and standard gradient algorithm) is extended with:
  - implementation of higher-order Adams methods
  - implementation of conjugate gradient methods
- Vehicle dynamics control example: a double lane change maneuver executed using the control actions of active rear steering and active rear differential actuators.

Continuous-time optimal control problem formulation

- Find the control vector $u(t) \in \mathbb{R}^m$ that minimizes the cost function
$$J_0 = \Phi_0\big(\bar{x}(t_f)\big) + \int_0^{t_f} F_0\big(\bar{x}(t), u(t)\big)\, dt$$
- subject to the nonlinear MIMO process dynamics equations
$$\dot{\bar{x}}(t) = \varphi\big(\bar{x}(t), u(t)\big), \qquad \bar{x}(t) \in \mathbb{R}^{n_0}$$
- with initial and final conditions on the state vector
$$\bar{x}(0) = \bar{x}_0, \qquad \bar{x}(t_f) = \bar{x}_f,$$
- and subject to control and state vector inequality and equality constraints
$$g\big(\bar{x}(t), u(t)\big) \ge 0, \qquad h\big(\bar{x}(t), u(t)\big) = 0$$

Extending the cost function with constraints-related terms

- Reduced optimal control problem: find u(t) that minimizes
$$J = \Phi\big(\bar{x}(t_f)\big) + \int_0^{t_f} F\big(\bar{x}(t), u(t)\big)\, dt$$
- subject to the process equations (only):
$$\dot{\bar{x}}(t) = \varphi\big(\bar{x}(t), u(t)\big), \qquad \bar{x}(0) = \bar{x}_0,$$
- where the penalty terms are introduced:
$$F\big(\bar{x}(t), u(t)\big) = F_0\big(\bar{x}(t), u(t)\big) + K_E \sum_{k=1}^{q} h_k^2\big(\bar{x}(t), u(t)\big) + K_V \sum_{k=1}^{p} g_k^2\big(\bar{x}(t), u(t)\big)\, H_-\big(g_k(\bar{x}(t), u(t))\big),$$
$$\Phi\big(\bar{x}(t_f)\big) = \Phi_0\big(\bar{x}(t_f)\big) + K_B \sum_{k=1}^{n_0} \big(\bar{x}_k(t_f) - \bar{x}_{k,f}\big)^2,$$
$$H_-(z) = \begin{cases} 0, & \text{if } z \ge 0, \\ 1, & \text{if } z < 0. \end{cases}$$
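As a minimal sketch of how these penalty terms might be assembled in code (the constraint functions, gains, and the example constraint below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def heaviside_minus(z):
    """H_-(z): 1 where z < 0, 0 where z >= 0 (penalty active only when g_k is violated)."""
    return np.where(z < 0.0, 1.0, 0.0)

def penalized_running_cost(F0, h, g, K_E, K_V):
    """Build F(x, u) = F0 + K_E * sum_k h_k^2 + K_V * sum_k g_k^2 * H_-(g_k)."""
    def F(x, u):
        h_val = np.atleast_1d(h(x, u))   # equality constraint values h_k
        g_val = np.atleast_1d(g(x, u))   # inequality constraint values g_k
        return (F0(x, u)
                + K_E * np.sum(h_val ** 2)
                + K_V * np.sum(g_val ** 2 * heaviside_minus(g_val)))
    return F

# Illustrative usage: penalize |u| > 1 via g(x, u) = 1 - u^2 >= 0.
F = penalized_running_cost(
    F0=lambda x, u: float(u ** 2),
    h=lambda x, u: np.array([]),            # no equality constraints in this example
    g=lambda x, u: np.array([1.0 - u ** 2]),
    K_E=100.0, K_V=100.0)
```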

Transformation to terminal optimal control problem

- In order to simplify the application of higher-order numerical integration methods, an additional state variable is introduced:
$$\dot{x}_n = F\big(\bar{x}(t), u(t)\big), \qquad x_n(0) = 0, \qquad n = n_0 + 1$$
- The final continuous-time optimization problem: find the control vector u(t) that minimizes the terminal condition
$$J(t_f) = \Phi\big(\bar{x}(t_f)\big) + x_n(t_f)$$
- subject to the process equations:
$$\dot{x}(t) = f\big(x(t), u(t)\big), \qquad x(0) = x_0,$$
- where:
$$x(t) = \big[\bar{x}_1 \;\; \bar{x}_2 \;\; \cdots \;\; \bar{x}_{n_0} \;\; x_n\big]^T \in \mathbb{R}^n, \qquad f = \big[\varphi_1 \;\; \varphi_2 \;\; \cdots \;\; \varphi_{n_0} \;\; F\big]^T$$
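A small sketch (not the authors' code) of how the augmented vector field f = [φ; F] can be formed by stacking the original dynamics with the running cost:

```python
import numpy as np

def augment_dynamics(phi, F):
    """Return f(x, u) for the augmented state x = [x_bar, x_n], where x_bar follows
    phi and the extra state x_n integrates the running cost F."""
    def f(x, u):
        x_bar = x[:-1]                         # original n0-dimensional state
        return np.concatenate([np.atleast_1d(phi(x_bar, u)), [F(x_bar, u)]])
    return f
```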

Multistep Adams methods

- 1st order Adams method (Euler method):
$$x(i+1) = x(i) + \tau f(i), \qquad x(0) = x_0, \qquad i = 0, 1, 2, \ldots, \qquad f(i) \equiv f\big(x(i), u(i)\big)$$
- 2nd order Adams method:
$$x(i+1) = x(i) + \tau \left( \tfrac{3}{2} f(i) - \tfrac{1}{2} f(i-1) \right), \qquad x(0) = x_0, \; x(1) = x_1, \qquad i = 1, 2, 3, \ldots$$
- 3rd order Adams method:
$$x(i+1) = x(i) + \tau \left( \tfrac{23}{12} f(i) - \tfrac{16}{12} f(i-1) + \tfrac{5}{12} f(i-2) \right), \qquad x(0) = x_0, \; x(1) = x_1, \; x(2) = x_2, \qquad i = 2, 3, 4, \ldots$$
- k-th order Adams method:
$$x(i+1) = x(i) + \tau \sum_{j=1}^{k} a_j^{(k)} f(i-j+1), \qquad x(0) = x_0, \; x(1) = x_1, \ldots, x(k-1) = x_{k-1}, \qquad i = k-1, k, \ldots$$
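A minimal NumPy sketch of the explicit Adams update x(i+1) = x(i) + τ Σ_j a_j^(k) f(i−j+1); the coefficient table uses the standard explicit Adams-Bashforth coefficients, which match the 1st- to 3rd-order formulas above:

```python
import numpy as np

# Standard explicit Adams-Bashforth coefficients a_j^{(k)}, j = 1..k
ADAMS_COEFFS = {
    1: np.array([1.0]),
    2: np.array([3.0, -1.0]) / 2.0,
    3: np.array([23.0, -16.0, 5.0]) / 12.0,
}

def adams_step(x_i, f_history, tau, k):
    """One k-th order Adams step.

    f_history = [f(i), f(i-1), ..., f(i-k+1)] (newest first), so that
    x(i+1) = x(i) + tau * sum_j a_j^{(k)} * f(i-j+1).
    """
    a = ADAMS_COEFFS[k]
    return x_i + tau * sum(a_j * f_j for a_j, f_j in zip(a, f_history))
```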

State-space representation of the Adams methods

- 3rd order Adams method:
$$x_j(i+1) = x_j(i) + \tau \tfrac{23}{12} f_j(i) + \tau x_{n+j}(i),$$
$$x_{n+j}(i+1) = -\tfrac{16}{12} f_j(i) + x_{2n+j}(i),$$
$$x_{2n+j}(i+1) = \tfrac{5}{12} f_j(i),$$
- Initial conditions:
$$x_j(2) = x_{j2}, \qquad x_{n+j}(2) = -\tfrac{16}{12} f_j(1) + \tfrac{5}{12} f_j(0), \qquad x_{2n+j}(2) = \tfrac{5}{12} f_j(1)$$
- The fourth-order Runge-Kutta method is used (only) for the calculation of the first k initial conditions of the k-th order Adams method:
$$k_1 = f\big(x(i), u(t_i)\big), \qquad k_2 = f\big(x(i) + \tfrac{\tau}{2} k_1, u(t_i + \tfrac{\tau}{2})\big), \qquad k_3 = f\big(x(i) + \tfrac{\tau}{2} k_2, u(t_i + \tfrac{\tau}{2})\big), \qquad k_4 = f\big(x(i) + \tau k_3, u(t_i + \tau)\big),$$
$$x(i+1) = x(i) + \tfrac{\tau}{6}\big(k_1 + 2 k_2 + 2 k_3 + k_4\big).$$
- The Adams method of k-th order requires only one computation of the function f(i) per sampling time t_i.
- The Runge-Kutta method of k-th order requires k computations of the function f(i) per sampling time t_i.
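A compact sketch of the classical fourth-order Runge-Kutta step used to generate the starting values (the dynamics f and the control signal u are generic placeholders):

```python
def rk4_step(f, x, u, t, tau):
    """One classical RK4 step; u is a callable giving the control at time t."""
    k1 = f(x, u(t))
    k2 = f(x + 0.5 * tau * k1, u(t + 0.5 * tau))
    k3 = f(x + 0.5 * tau * k2, u(t + 0.5 * tau))
    k4 = f(x + tau * k3, u(t + tau))
    return x + (tau / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```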

State-space representation of the Adams methods

- k-th order Adams method:
$$x_j(i+1) = x_j(i) + \tau a_1^{(k)} f_j(i) + \tau x_{n+j}(i),$$
$$x_{rn+j}(i+1) = a_{r+1}^{(k)} f_j(i) + x_{(r+1)n+j}(i), \qquad r = 1, 2, \ldots, k-2,$$
$$x_{(k-1)n+j}(i+1) = a_k^{(k)} f_j(i),$$
- Initial conditions:
$$x_j(k-1) = x_{j(k-1)}, \qquad x_{rn+j}(k-1) = \sum_{l=r+1}^{k} a_l^{(k)} f_j(k-1+r-l), \qquad r = 1, 2, \ldots, k-1$$
- The k-th order Adams discretization of the continuous process equations:
$$\tilde{x}(i+1) = \tilde{f}\big(\tilde{x}(i), u(i)\big), \qquad \tilde{x}(0) = \tilde{x}_0$$
- where:
$$\tilde{x}(t) = \big[x_1 \;\; x_2 \;\; \cdots \;\; x_{nk-1} \;\; x_{nk}\big]^T \in \mathbb{R}^{nk}$$
$$\tilde{f} = \big[x_1(i) + \tau a_1^{(k)} f_1(i) + \tau x_{n+1}(i) \;\; x_2(i) + \tau a_2^{(k)} f_2(i) + \tau x_{n+2}(i) \;\; \cdots \;\; a_k^{(k)} f_{n-1}(i) \;\; a_k^{(k)} f_n(i)\big]^T$$
- The Adams methods provide the same state-space formulation of the optimal control problem as the Euler method: $\tilde{x}(i+1) = \tilde{f}(\tilde{x}(i), u(i))$
- The Runge-Kutta method leads to: $x(i+1) = f(x(i), u(i), u(i+1))$

Discrete-time optimization problem

- The final discrete-time optimization problem: find the control sequence u(0), u(1), u(2), ..., u(N-1) that minimizes the terminal cost function
$$J = \Phi\big(x(N)\big) + x_n(N)$$
- subject to the discrete-time process equations (k-th order Adams method):
$$\tilde{x}(i+1) = \tilde{f}\big(\tilde{x}(i), u(i)\big), \qquad \tilde{x}(0) = \tilde{x}_0$$
- Gradient algorithm:
$$u^{(l+1)}(i) = u^{(l)}(i) - \eta \, \frac{\partial J}{\partial u^{(l)}(i)},$$
- The cost function J depends explicitly only on the state vector at the terminal time x(N); the implicit dependence on u(0), u(1), u(2), ..., u(N-1) follows from the discrete-time state equations
$$\tilde{x}(i+1) = \tilde{f}(i), \qquad \tilde{f}(i) \equiv \tilde{f}\big(\tilde{x}(i), u(i)\big)$$
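A schematic view of the resulting first-order update over the whole control sequence (a sketch only; `simulate_forward` and `gradient_bptt` are hypothetical placeholders for the forward pass through the discretized dynamics and the backward gradient pass described on the following slides):

```python
import numpy as np

def gradient_descent(u, x0, simulate_forward, gradient_bptt, eta, iters):
    """u has shape (N, m): the control sequence u(0), ..., u(N-1)."""
    for _ in range(iters):
        x_traj = simulate_forward(x0, u)   # forward pass: x(0), ..., x(N)
        J_u = gradient_bptt(x_traj, u)     # backward pass: dJ/du(i), shape (N, m)
        u = u - eta * J_u                  # u^(l+1)(i) = u^(l)(i) - eta * dJ/du(i)
    return u
```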

Exact gradient calculation

- Implicit but exact calculation of the cost function gradient:
$$\frac{\partial J}{\partial u_j(i)} = \sum_{r=1}^{nk} \frac{\partial J}{\partial \tilde{x}_r(N)} \frac{\partial \tilde{x}_r(N)}{\partial u_j(i)}$$
- The partial derivatives can be calculated backward in time.
- For i = N-1:
$$\tilde{x}(N) = \tilde{f}\big(\tilde{x}(N-1), u(N-1)\big) \equiv \tilde{f}(N-1), \qquad \frac{\partial \tilde{x}_r(N)}{\partial u_j(N-1)} = \frac{\partial \tilde{f}_r(N-1)}{\partial u_j(N-1)}$$
- For i = N-2:
$$\tilde{x}(N-1) = \tilde{f}\big(\tilde{x}(N-2), u(N-2)\big) \equiv \tilde{f}(N-2),$$
$$\frac{\partial \tilde{x}_r(N)}{\partial u_j(N-2)} = \frac{\partial \tilde{f}_r(N-1)}{\partial u_j(N-2)} = \sum_{l=1}^{nk} \frac{\partial \tilde{f}_r(N-1)}{\partial \tilde{x}_l(N-1)} \frac{\partial \tilde{x}_l(N-1)}{\partial u_j(N-2)} = \sum_{l=1}^{nk} \frac{\partial \tilde{f}_r(N-1)}{\partial \tilde{x}_l(N-1)} \frac{\partial \tilde{f}_l(N-2)}{\partial u_j(N-2)}$$
- Chain rule for ordered derivatives → back-propagation-through-time (BPTT) algorithm.

The final algorithm for gradient calculation

- Initialization (i = N-1):
$$J_u(N-1) = \tilde{U}(N-1) \cdot J_x(N), \qquad D(N-1) = I$$
- Backward-in-time iterations (i = N-2, N-3, ..., 1, 0):
$$D(i) = D(i+1) \cdot \tilde{X}(i+1), \qquad Y(i) = D(i) \cdot \tilde{U}(i), \qquad J_u(i) = Y(i) \cdot J_x(N)$$
- where $\tilde{U}(i)$ and $\tilde{X}(i)$ are Jacobians with elements:
$$\tilde{U}_{rj}(i) = \frac{\partial \tilde{f}_r(i)}{\partial u_j(i)}, \qquad \tilde{X}_{rj}(i) = \frac{\partial \tilde{f}_r(i)}{\partial \tilde{x}_j(i)}, \qquad Y_{rj}(i) = \frac{\partial \tilde{x}_r(N)}{\partial u_j(i)}$$
- and where:
$$J_u(i) = \frac{\partial J}{\partial u(i)}, \qquad J_x(N) = \frac{\partial J}{\partial \tilde{x}(N)}$$
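The backward recursion above might be coded roughly as follows (a sketch, not the authors' implementation; `X_jac(i)` and `U_jac(i)` are assumed callables returning the Jacobians X̃(i) and Ũ(i), and the transposes are written explicitly so that the vector shapes work out with the element definitions on this slide):

```python
import numpy as np

def bptt_gradient(X_jac, U_jac, J_x_N, N):
    """Backward-in-time gradient calculation.

    X_jac(i): Jacobian X~(i) = d f~(i) / d x~(i), shape (nk, nk)
    U_jac(i): Jacobian U~(i) = d f~(i) / d u(i),  shape (nk, m)
    J_x_N:    dJ / d x~(N), shape (nk,)
    Returns J_u with J_u[i] = dJ / d u(i), shape (N, m).
    """
    nk = J_x_N.shape[0]
    m = U_jac(N - 1).shape[1]
    J_u = np.zeros((N, m))

    # Initialization (i = N-1): D(N-1) = I
    D = np.eye(nk)
    J_u[N - 1] = U_jac(N - 1).T @ J_x_N

    # Backward-in-time iterations (i = N-2, ..., 1, 0)
    for i in range(N - 2, -1, -1):
        D = D @ X_jac(i + 1)      # D(i) = D(i+1) * X~(i+1)
        Y = D @ U_jac(i)          # Y(i) = D(i) * U~(i), Y_rj = dx~_r(N)/du_j(i)
        J_u[i] = Y.T @ J_x_N      # dJ/du_j(i) = sum_r dJ/dx~_r(N) * Y_rj(i)
    return J_u
```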

Conjugate gradient methods

$$w(k+1) = w(k) + \eta_k d(k), \qquad d(k+1) = -g(k+1) + \beta_k d(k)$$
- d(k): search direction vector; g(k): gradient
$$w \equiv \big[u_1(0) \;\; u_1(1) \;\; \cdots \;\; u_m(N-2) \;\; u_m(N-1)\big]^T \in \mathbb{R}^{mN}$$
$$g \equiv \left[\frac{\partial J}{\partial u_1(0)} \;\; \frac{\partial J}{\partial u_1(1)} \;\; \cdots \;\; \frac{\partial J}{\partial u_m(N-2)} \;\; \frac{\partial J}{\partial u_m(N-1)}\right]^T \in \mathbb{R}^{mN}$$
- Standard gradient algorithm: β_k = 0 and η_k = const.
- Gradient algorithm with momentum: β_k = const.
- The standard method for computing η_k is a line search algorithm, which requires one-dimensional minimization of the cost function.
- This is computationally expensive, since it requires many evaluations of the cost function during one iteration of the gradient algorithm.
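A sketch of one conjugate gradient iteration on the stacked control vector w; the Polak-Ribiere formula for β_k is used here as one common choice, since the excerpt above does not fix a particular formula, and `grad` is an assumed callable returning dJ/dw (e.g. the BPTT gradient reshaped to an mN-vector):

```python
import numpy as np

def cg_update(w, d, g_old, eta, grad):
    """One CG iteration: w(k+1) = w(k) + eta_k d(k); d(k+1) = -g(k+1) + beta_k d(k)."""
    w_next = w + eta * d
    g_new = grad(w_next)
    # Polak-Ribiere beta_k (clipped at zero), one common choice among several
    beta = max(0.0, float(g_new @ (g_new - g_old)) / float(g_old @ g_old))
    d_next = -g_new + beta * d
    return w_next, d_next, g_new
```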

Learning rate adaptation

- Learning rate adaptation (a modified version of the SuperSAB algorithm):
$$\eta_k = \begin{cases} d^+ \cdot \eta_{k-1}, & \text{if } g(k)^T \cdot g(k-1) \ge 0, \\ d^- \cdot \eta_{k-1}, & \text{if } g(k)^T \cdot g(k-1) < 0, \end{cases}$$
where $0 < d^- < 1 < d^+$.
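A minimal sketch of this sign-based adaptation rule (the factors d⁺ and d⁻ below are illustrative default values, not taken from the source):

```python
import numpy as np

def adapt_learning_rate(eta_prev, g_new, g_prev, d_plus=1.05, d_minus=0.5):
    """SuperSAB-style rule: grow eta while successive gradients agree in direction,
    shrink it when g(k)^T g(k-1) < 0. The d_plus/d_minus values are illustrative."""
    if float(np.dot(g_new, g_prev)) >= 0.0:
        return d_plus * eta_prev
    return d_minus * eta_prev
```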