A Conjugate Gradient-based BPTT-like Optimal Control Algorithm

Josip Kasać*, Joško Deur*, Branko Novaković*, Ilya Kolmanovsky**

* University of Zagreb, Faculty of Mech. Eng. & Naval Arch., Zagreb, Croatia (e-mail: [email protected], [email protected], [email protected])
** Ford Research Laboratory, Dearborn, MI 48121-2053 USA (e-mail: [email protected])
Introduction
- In this paper, a gradient-based algorithm for optimal control of nonlinear multivariable systems with control and state vector constraints is proposed.
- The algorithm has a backward-in-time recurrent structure similar to the back-propagation-through-time (BPTT) algorithm.
- The original algorithm (Euler method and standard gradient algorithm) is extended with:
  - implementation of higher-order Adams methods,
  - implementation of conjugate gradient methods.
- Vehicle dynamics control example: a double lane change maneuver executed using the control actions of active rear steering and active rear differential actuators.
Continuous-time optimal control problem formulation

Find the control vector u(t) ∈ R^m that minimizes the cost function

J_0 = \Phi_0\big(x(t_f)\big) + \int_0^{t_f} F_0\big(x(t), u(t)\big)\,dt

subject to the nonlinear MIMO process equations

\dot{x}(t) = \varphi\big(x(t), u(t)\big), \qquad x(t) \in \mathbb{R}^{n_0},

with initial and final conditions on the state vector

x(0) = x_0, \qquad x(t_f) = x_f,

and subject to control and state vector inequality and equality constraints

g\big(x(t), u(t)\big) \ge 0, \qquad h\big(x(t), u(t)\big) = 0.
Extending the cost function with constraints-related terms

Reduced optimal control problem: find u(t) that minimizes

J = \Phi\big(x(t_f)\big) + \int_0^{t_f} F\big(x(t), u(t)\big)\,dt

subject to the process equations (only):

\dot{x}(t) = \varphi\big(x(t), u(t)\big), \qquad x(0) = x_0,

where the following penalty terms are introduced:

F\big(x(t), u(t)\big) = F_0\big(x(t), u(t)\big) + K_E \sum_{k=1}^{q} h_k^2\big(x(t), u(t)\big) + K_V \sum_{k=1}^{p} g_k^2\big(x(t), u(t)\big)\, H_-\big(g_k(x(t), u(t))\big),

\Phi\big(x(t_f)\big) = \Phi_0\big(x(t_f)\big) + K_B \sum_{k=1}^{n_0} \big(x_k(t_f) - x_{k,f}\big)^2,

H_-(z) = \begin{cases} 0, & z \ge 0, \\ 1, & z < 0. \end{cases}
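As an illustration of how the penalty-augmented integrand can be assembled, the following minimal Python sketch (not part of the original slides) builds F from user-supplied F_0, equality constraints h_k, and inequality constraints g_k; the constraint functions and weight values in the usage example are made up.

```python
import numpy as np

def penalized_integrand(F0, h_list, g_list, K_E, K_V):
    """Build the penalty-augmented integrand from the slide:
    F = F0 + K_E * sum_k h_k^2 + K_V * sum_k g_k^2 * H_-(g_k),
    where H_-(z) = 0 for z >= 0 and 1 for z < 0, so only violated
    inequality constraints are penalized."""
    def F(x, u):
        val = F0(x, u)
        # Equality constraints h_k(x, u) = 0: always penalized quadratically.
        val += K_E * sum(h(x, u) ** 2 for h in h_list)
        # Inequality constraints g_k(x, u) >= 0: penalized only when violated.
        val += K_V * sum(g(x, u) ** 2 * (1.0 if g(x, u) < 0 else 0.0)
                         for g in g_list)
        return val
    return F

# Illustrative usage with made-up scalar constraints:
F0 = lambda x, u: float(u ** 2)          # control effort term
h1 = lambda x, u: x[0] - 1.0             # equality constraint x1 = 1
g1 = lambda x, u: 2.0 - abs(u)           # inequality constraint |u| <= 2
F = penalized_integrand(F0, [h1], [g1], K_E=100.0, K_V=100.0)
print(F(np.array([0.5]), 3.0))
```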
Transformation to terminal optimal control problem

In order to simplify the application of higher-order numerical integration methods, an additional state variable is introduced:

\dot{x}_n = F\big(x(t), u(t)\big), \qquad x_n(0) = 0, \qquad n = n_0 + 1.

The final continuous-time optimization problem: find the control vector u(t) that minimizes the terminal condition

J(t_f) = \Phi\big(x(t_f)\big) + x_n(t_f)

subject to the process equations

\dot{x}(t) = f\big(x(t), u(t)\big), \qquad x(0) = x_0,

where

x(t) = \begin{bmatrix} x_1 & x_2 & \cdots & x_{n_0} & x_n \end{bmatrix}^T \in \mathbb{R}^n, \qquad f = \begin{bmatrix} \varphi_1 & \varphi_2 & \cdots & \varphi_{n_0} & F \end{bmatrix}^T.
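The augmentation step above can be sketched in code as follows; this is an illustrative Python fragment, with the example dynamics and running cost chosen arbitrarily.

```python
import numpy as np

def augment_dynamics(phi, F):
    """Append the running cost as an extra state x_n with dx_n/dt = F(x, u),
    so the cost reduces to the terminal value Phi(x(t_f)) + x_n(t_f)."""
    def f(x_aug, u):
        x = x_aug[:-1]                              # original n0 states
        dx = np.asarray(phi(x, u), dtype=float)
        dxn = np.array([F(x, u)], dtype=float)      # running-cost accumulator
        return np.concatenate([dx, dxn])
    return f

# Example: single integrator with quadratic running cost (illustrative only).
phi = lambda x, u: np.array([u])
F   = lambda x, u: x[0] ** 2 + u ** 2
f = augment_dynamics(phi, F)
x_aug0 = np.array([1.0, 0.0])                       # [x1(0), x_n(0) = 0]
print(f(x_aug0, 0.5))
```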
Multistep Adams methods

1st order Adams method (Euler method):

x(i+1) = x(i) + \tau f(i), \qquad x(0) = x_0, \qquad i = 0, 1, 2, \ldots, \qquad f(i) \equiv f\big(x(i), u(i)\big).

2nd order Adams method:

x(i+1) = x(i) + \tau \left( \frac{3}{2} f(i) - \frac{1}{2} f(i-1) \right), \qquad x(0) = x_0, \quad x(1) = x_1, \qquad i = 1, 2, 3, \ldots

3rd order Adams method:

x(i+1) = x(i) + \tau \left( \frac{23}{12} f(i) - \frac{16}{12} f(i-1) + \frac{5}{12} f(i-2) \right), \qquad x(0) = x_0, \quad x(1) = x_1, \quad x(2) = x_2, \qquad i = 2, 3, 4, \ldots

k-th order Adams method:

x(i+1) = x(i) + \tau \sum_{j=1}^{k} a_j^{(k)} f(i-j+1), \qquad x(0) = x_0, \quad x(1) = x_1, \; \ldots, \; x(k-1) = x_{k-1}, \qquad i = k-1, k, \ldots
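A minimal Python sketch of the k-th order Adams-Bashforth recursion is given below; the coefficient values for k = 1, 2, 3 follow the slide, while the Euler-based starting value in the usage example is a simplification (the slides use the Runge-Kutta method for startup, see the next slide).

```python
import numpy as np

# Adams-Bashforth coefficients a_j^{(k)} for k = 1, 2, 3 (as on the slide).
AB_COEFFS = {
    1: [1.0],
    2: [3.0 / 2.0, -1.0 / 2.0],
    3: [23.0 / 12.0, -16.0 / 12.0, 5.0 / 12.0],
}

def adams_step(x_i, f_hist, tau, k):
    """One k-th order Adams-Bashforth step.
    f_hist[0] = f(i), f_hist[1] = f(i-1), ..., f_hist[k-1] = f(i-k+1)."""
    a = AB_COEFFS[k]
    return x_i + tau * sum(a_j * f_j for a_j, f_j in zip(a, f_hist))

# Illustrative usage: dx/dt = -x with x(0) = 1, 2nd order method.
f = lambda x: -x
tau, k = 0.1, 2
x = [np.array([1.0])]
x.append(x[0] + tau * f(x[0]))       # starting value x(1) (Euler here for brevity)
for i in range(1, 50):
    f_hist = [f(x[i]), f(x[i - 1])]
    x.append(adams_step(x[i], f_hist, tau, k))
print(x[-1])                          # approaches exp(-5) ~ 0.0067
```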
State-space representation of the Adams methods

3rd order Adams method:

x_j(i+1) = x_j(i) + \tau \frac{23}{12} f_j(i) + \tau x_{n+j}(i),
x_{n+j}(i+1) = -\frac{16}{12} f_j(i) + x_{2n+j}(i),
x_{2n+j}(i+1) = \frac{5}{12} f_j(i).

Initial conditions:

x_j(2) = x_{j,2},
x_{n+j}(2) = -\frac{16}{12} f_j(1) + \frac{5}{12} f_j(0),
x_{2n+j}(2) = \frac{5}{12} f_j(1).

The fourth-order Runge-Kutta method is used (only) for calculation of the first k initial conditions for the k-th order Adams method:

k_1 = f\big(x(i), u(t_i)\big),
k_2 = f\big(x(i) + \tfrac{\tau}{2} k_1, \; u(t_i + \tfrac{\tau}{2})\big),
k_3 = f\big(x(i) + \tfrac{\tau}{2} k_2, \; u(t_i + \tfrac{\tau}{2})\big),
k_4 = f\big(x(i) + \tau k_3, \; u(t_i + \tau)\big),
x(i+1) = x(i) + \frac{\tau}{6} \big(k_1 + 2k_2 + 2k_3 + k_4\big).

- The Adams method of k-th order requires only one evaluation of the function f(i) per sampling instant t_i.
- The Runge-Kutta method of k-th order requires k evaluations of the function f(i) per sampling instant t_i.
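The following Python sketch illustrates how the classical RK4 step could be used to generate the k-1 starting values required by the k-th order Adams method; the function names and example dynamics are illustrative, not taken from the slides.

```python
import numpy as np

def rk4_step(f, x, u_of_t, t, tau):
    """One classical 4th order Runge-Kutta step for dx/dt = f(x, u(t))."""
    k1 = f(x, u_of_t(t))
    k2 = f(x + 0.5 * tau * k1, u_of_t(t + 0.5 * tau))
    k3 = f(x + 0.5 * tau * k2, u_of_t(t + 0.5 * tau))
    k4 = f(x + tau * k3, u_of_t(t + tau))
    return x + (tau / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def adams_starting_values(f, x0, u_of_t, tau, k):
    """Generate x(0), x(1), ..., x(k-1) needed by the k-th order Adams method."""
    xs = [np.asarray(x0, dtype=float)]
    for i in range(k - 1):
        xs.append(rk4_step(f, xs[-1], u_of_t, i * tau, tau))
    return xs

# Illustrative usage: dx/dt = -x + u with constant control.
f = lambda x, u: -x + u
u_of_t = lambda t: 0.0
print(adams_starting_values(f, [1.0], u_of_t, tau=0.1, k=3))
```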
State-space representation of the Adams methods

k-th order Adams method:

x_j(i+1) = x_j(i) + \tau a_1^{(k)} f_j(i) + \tau x_{n+j}(i),
x_{rn+j}(i+1) = a_{r+1}^{(k)} f_j(i) + x_{(r+1)n+j}(i), \qquad r = 1, 2, \ldots, k-2,
x_{(k-1)n+j}(i+1) = a_k^{(k)} f_j(i).

Initial conditions:

x_j(k-1) = x_{j,k-1},
x_{rn+j}(k-1) = \sum_{l=r+1}^{k} a_l^{(k)} f_j(k-1+r-l), \qquad r = 1, 2, \ldots, k-1.

The k-th order Adams discretization of the continuous process equations:

\tilde{x}(i+1) = \tilde{f}\big(\tilde{x}(i), u(i)\big), \qquad \tilde{x}(0) = \tilde{x}_0,

where

\tilde{x}(t) = \begin{bmatrix} x_1 & x_2 & \cdots & x_{nk-1} & x_{nk} \end{bmatrix}^T \in \mathbb{R}^{nk},
\tilde{f} = \begin{bmatrix} x_1(i) + \tau a_1^{(k)} f_1(i) + \tau x_{n+1}(i) & x_2(i) + \tau a_1^{(k)} f_2(i) + \tau x_{n+2}(i) & \cdots & a_k^{(k)} f_{n-1}(i) & a_k^{(k)} f_n(i) \end{bmatrix}^T.

- The Adams method provides the same state-space formulation of the optimal control problem as the Euler method: \tilde{x}(i+1) = \tilde{f}\big(\tilde{x}(i), u(i)\big).
- The Runge-Kutta method, in contrast, leads to x(i+1) = f\big(x(i), u(i), u(i+1)\big).
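For concreteness, a Python sketch of the one-step map \tilde{f} for the 3rd order case (k = 3) is given below, packing the physical state and the weighted f-history blocks into one augmented vector; the zero-initialized history in the usage example is a simplification of the RK4-based initial conditions described above.

```python
import numpy as np

A3 = [23.0 / 12.0, -16.0 / 12.0, 5.0 / 12.0]   # a_j^{(3)} from the slides

def f_tilde(x_tilde, u, f, tau, n, a=A3):
    """One-step map of the 3rd order Adams method in state-space form.
    x_tilde = [x_1..x_n, x_{n+1}..x_{2n}, x_{2n+1}..x_{3n}], where the extra
    blocks store weighted past values of f."""
    k = len(a)
    x = x_tilde[:n]                       # physical (augmented) state
    z = x_tilde[n:].reshape(k - 1, n)     # history blocks
    fx = np.asarray(f(x, u), dtype=float)

    x_next = x + tau * a[0] * fx + tau * z[0]
    z_next = np.empty_like(z)
    for r in range(k - 2):                # middle blocks shift and accumulate
        z_next[r] = a[r + 1] * fx + z[r + 1]
    z_next[k - 2] = a[k - 1] * fx         # last block holds only the newest term
    return np.concatenate([x_next, z_next.ravel()])

# Illustrative usage: scalar dynamics dx/dt = -x + u, n = 1.
f = lambda x, u: np.array([-x[0] + u])
x_tilde = np.array([1.0, 0.0, 0.0])       # history blocks set to zero for brevity
for i in range(50):
    x_tilde = f_tilde(x_tilde, 0.0, f, tau=0.1, n=1)
print(x_tilde[0])
```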
Discrete-time optimization problem

The final discrete-time optimization problem: find the control sequence u(0), u(1), u(2), ..., u(N-1) that minimizes the terminal cost function

J = \Phi\big(x(N)\big) + x_n(N)

subject to the discrete-time process equations (k-th order Adams method):

\tilde{x}(i+1) = \tilde{f}\big(\tilde{x}(i), u(i)\big), \qquad \tilde{x}(0) = \tilde{x}_0.

Gradient algorithm:

u^{(l+1)}(i) = u^{(l)}(i) - \eta \frac{\partial J}{\partial u^{(l)}(i)}.

The cost function J depends explicitly only on the state vector at the terminal time, x(N); the implicit dependence on u(0), u(1), u(2), ..., u(N-1) follows from the discrete-time state equations

\tilde{x}(i+1) = \tilde{f}(i), \qquad \tilde{f}(i) \equiv \tilde{f}\big(\tilde{x}(i), u(i)\big).
Exact gradient calculation

Implicit but exact calculation of the cost function gradient:

\frac{\partial J}{\partial u_j(i)} = \sum_{r=1}^{nk} \frac{\partial J}{\partial \tilde{x}_r(N)} \frac{\partial \tilde{x}_r(N)}{\partial u_j(i)}.

The partial derivatives can be calculated backward in time.

For i = N-1:

\tilde{x}(N) = \tilde{f}\big(\tilde{x}(N-1), u(N-1)\big) \equiv \tilde{f}(N-1), \qquad \frac{\partial \tilde{x}_r(N)}{\partial u_j(N-1)} = \frac{\partial \tilde{f}_r(N-1)}{\partial u_j(N-1)}.

For i = N-2:

\tilde{x}(N-1) = \tilde{f}\big(\tilde{x}(N-2), u(N-2)\big) \equiv \tilde{f}(N-2),

\frac{\partial \tilde{x}_r(N)}{\partial u_j(N-2)} = \frac{\partial \tilde{f}_r(N-1)}{\partial u_j(N-2)} = \sum_{l=1}^{nk} \frac{\partial \tilde{f}_r(N-1)}{\partial \tilde{x}_l(N-1)} \frac{\partial \tilde{x}_l(N-1)}{\partial u_j(N-2)} = \sum_{l=1}^{nk} \frac{\partial \tilde{f}_r(N-1)}{\partial \tilde{x}_l(N-1)} \frac{\partial \tilde{f}_l(N-2)}{\partial u_j(N-2)}.

This chain rule for ordered derivatives leads to the back-propagation-through-time (BPTT) algorithm.
The final algorithm for gradient calculation

Initialization (i = N-1):

J_u(N-1) = \tilde{U}(N-1) \cdot J_x(N), \qquad D(N-1) = I.

Backward-in-time iterations (i = N-2, N-3, ..., 1, 0):

D(i) = D(i+1) \cdot \tilde{X}(i+1),
Y(i) = D(i) \cdot \tilde{U}(i),
J_u(i) = Y(i) \cdot J_x(N),

where \tilde{U}(i) and \tilde{X}(i) are Jacobians with elements

\tilde{U}_{rj}(i) = \frac{\partial \tilde{f}_r(i)}{\partial u_j(i)}, \qquad \tilde{X}_{rj}(i) = \frac{\partial \tilde{f}_r(i)}{\partial \tilde{x}_j(i)},

and where

J_u(i) = \frac{\partial J}{\partial u(i)}, \qquad J_x(N) = \frac{\partial J}{\partial \tilde{x}(N)}, \qquad Y_{rj}(i) = \frac{\partial \tilde{x}_r(N)}{\partial u_j(i)}.
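The backward-in-time recursion above can be sketched in Python as follows, assuming the Jacobians \tilde{X}(i) and \tilde{U}(i) are supplied by the user; explicit transposes are used here to map the row/column conventions of the slide onto standard matrix-vector products.

```python
import numpy as np

def bptt_gradient(X_jac, U_jac, Jx_N):
    """Backward-in-time gradient of a terminal cost J(x~(N)).
    X_jac[i] = d f~ / d x~ at step i  (nk x nk),
    U_jac[i] = d f~ / d u  at step i  (nk x m),
    Jx_N     = d J / d x~(N)          (nk,).
    Returns a list Ju with Ju[i] = d J / d u(i)  (m,)."""
    N = len(X_jac)                        # steps i = 0, ..., N-1
    nk = Jx_N.shape[0]
    Ju = [None] * N

    D = np.eye(nk)                        # D(N-1) = I
    Ju[N - 1] = U_jac[N - 1].T @ Jx_N     # initialization, i = N-1
    for i in range(N - 2, -1, -1):        # backward iterations i = N-2, ..., 0
        D = D @ X_jac[i + 1]              # D(i) = D(i+1) X~(i+1)
        Y = D @ U_jac[i]                  # Y(i) = D(i) U~(i) = d x~(N) / d u(i)
        Ju[i] = Y.T @ Jx_N
    return Ju

# Illustrative usage: scalar linear system x(i+1) = a x(i) + b u(i), J = x(N).
a, b, N = 0.9, 0.5, 5
X_jac = [np.array([[a]]) for _ in range(N)]
U_jac = [np.array([[b]]) for _ in range(N)]
Jx_N = np.array([1.0])
print(bptt_gradient(X_jac, U_jac, Jx_N))  # [a^4 b, a^3 b, ..., b]
```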
Conjugate gradient methods

w(k+1) = w(k) + \eta_k d(k),
d(k+1) = -g(k+1) + \beta_k d(k),

where d(k) is the search direction vector and g(k) is the gradient, with

w \equiv \begin{bmatrix} u_1(0) & u_1(1) & \cdots & u_m(N-2) & u_m(N-1) \end{bmatrix}^T \in \mathbb{R}^{mN},

g \equiv \begin{bmatrix} \dfrac{\partial J}{\partial u_1(0)} & \dfrac{\partial J}{\partial u_1(1)} & \cdots & \dfrac{\partial J}{\partial u_m(N-2)} & \dfrac{\partial J}{\partial u_m(N-1)} \end{bmatrix}^T \in \mathbb{R}^{mN}.

- Standard gradient algorithm: \beta_k = 0 and \eta_k = const.
- Gradient algorithm with momentum: \beta_k = const.
- The standard method for computing \eta_k is a line search algorithm, which requires a one-dimensional minimization of the cost function. This is computationally expensive, since it requires many evaluations of the cost function within a single iteration of the gradient algorithm.
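A Python sketch of one conjugate gradient update on the stacked control vector w is given below. The slides do not fix a particular \beta_k formula or step size rule, so the Polak-Ribière choice and the constant \eta used here are illustrative assumptions.

```python
import numpy as np

def cg_update(w, g_new, g_old, d_old, eta):
    """One conjugate gradient step on the stacked control vector w in R^{mN}.
    The beta_k formula is not fixed on the slide; Polak-Ribiere is used here
    purely as an illustration (beta clipped at 0 to allow restarts)."""
    beta = max(0.0, g_new @ (g_new - g_old) / (g_old @ g_old + 1e-12))
    d_new = -g_new + beta * d_old         # d(k+1) = -g(k+1) + beta_k d(k)
    w_new = w + eta * d_new               # step along the new search direction
    return w_new, d_new

# Illustrative usage on a quadratic cost J(w) = 0.5 ||w||^2 (gradient = w).
w = np.ones(6)                            # stacked controls u_j(i)
g_old = w.copy()
d = -g_old                                # first direction: steepest descent
w = w + 0.1 * d
for _ in range(20):
    g_new = w                             # gradient of the toy cost
    w, d = cg_update(w, g_new, g_old, d, eta=0.1)
    g_old = g_new
print(np.linalg.norm(w))
```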
Learning rate adaptation

Learning rate adaptation (a modified version of the SuperSAB algorithm):

\eta_k = \begin{cases} d^+ \cdot \eta_{k-1}, & \text{if } g(k)^T \cdot g(k-1) \ge 0, \\ d^- \cdot \eta_{k-1}, & \text{if } g(k)^T \cdot g(k-1) < 0, \end{cases}

where 0 < d^- < 1 and d^+ > 1.
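A Python sketch of this sign-based adaptation rule is given below; the particular factors d^+ = 1.2 and d^- = 0.5 and the toy cost in the usage example are illustrative assumptions, not values from the slides.

```python
import numpy as np

def adapt_learning_rate(eta_prev, g_k, g_km1, d_plus=1.2, d_minus=0.5):
    """SuperSAB-style adaptation: grow eta while consecutive gradients point
    in a consistent direction, shrink it when they disagree (oscillation)."""
    if g_k @ g_km1 >= 0.0:
        return d_plus * eta_prev          # consistent descent direction
    return d_minus * eta_prev             # sign change: step was too large

# Illustrative usage inside a plain gradient loop on J(w) = 0.5 ||w||^2.
w = np.ones(4)
eta = 0.05
g_prev = w.copy()
for _ in range(30):
    g = w                                 # gradient of the toy cost
    eta = adapt_learning_rate(eta, g, g_prev)
    w = w - eta * g
    g_prev = g
print(eta, np.linalg.norm(w))
```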