IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 44, NO. 10, OCTOBER 1999
By Gronwall's inequality, we have

$$|x_1(t) - \phi_1(t)|_R^2 \le \int_0^t e^{Ct}\,|x_2(s) - \phi_2(s)|_R^2\,ds \le \text{Const.}\,\sup_t |x_2(t) - \phi_2(t)|_R^2. \tag{32}$$

It is easy to show that $x_2(t) - \phi_2(t)$ satisfies

$$d\bigl(x_2(t) - \phi_2(t)\bigr) = -\frac{d\phi_2(t)}{dt}\,dt + f_2(x_1(t), x_2(t))\,dt + dw_2(t). \tag{33}$$

By Girsanov's transformation, we have

$$\frac{P\{\sup_t |x_2(t) - \phi_2(t)|_R \le \varepsilon \mid x_1(0) = x_{1o},\ x_2(0) = x_{2o}\}}{P\{\sup_t |w_2(t)|_R \le \varepsilon\}} = E\Bigl[\exp\Bigl\{\int_0^T \Bigl(f_2(x_1(t), w_2(t) + \phi_2) - \frac{d\phi_2(t)}{dt}\Bigr)^{*} dw_2(t) - \frac{1}{2}\int_0^T \Bigl|f_2(x_1(t), w_2(t) + \phi_2) - \frac{d\phi_2(t)}{dt}\Bigr|_R^2\,dt\Bigr\} \Bigm| \sup_t |w_2(t)|_R \le \varepsilon;\ x_1(0) = x_{1o},\ x_2(0) = x_{2o}\Bigr] \tag{34}$$

where $x_1(t) = x_{1o} + \int_0^t f_1(x_1, w_2 + \phi_2)\,ds$. Now we shall evaluate the following term:

$$I_1 = \int_0^T f_2^{*}(x_1(t), w_2(t) + \phi_2(t))\,dw_2(t). \tag{35}$$

By using the Ito lemma and (7), we obtain for some $k \in L^2(0, T; R^p)$

$$I_1 = -\frac{1}{2}\int_0^T \mathrm{Tr}\{D_x f_2(\phi_1(t), \phi_2(t))\}\,dt + \int_0^T O(w_2^2)\,dw_2(t) + c\int_0^T k^{*}(s, \cdot)\,dw_2(t). \tag{36}$$

From the recent results given by Shepp and Zeitouni [5], we have

$$E\Bigl[\exp\Bigl\{c\int_0^T O(w_2^2)\,dw_2(t)\Bigr\} \Bigm| \sup_t |w_2(t)|_R \le \varepsilon\Bigr] \to 1, \quad \exists c > 0$$

and

$$E\Bigl[\exp\Bigl\{c\int_0^T k^{*}(s, \cdot)\,dw_2(t)\Bigr\} \Bigm| \sup_t |w_2(t)|_R \le \varepsilon\Bigr] \to 1, \quad \exists c > 0$$

for any $k \in L^2(0, T; R^p)$. Hence we can derive the desired results.

REFERENCES
[1] G. Kallianpur and R. L. Karandikar, White Noise Theory of Prediction, Filtering and Smoothing. New York: Gordon and Breach, 1988.
[2] O. Zeitouni and A. Dembo, "A maximum a posteriori estimator for trajectories of diffusion processes," Stochastics, vol. 20, pp. 221–246, 1987.
[3] R. E. Mortensen, "Maximum-likelihood recursive nonlinear filtering," JOTA, vol. 2, pp. 386–394, 1968.
[4] O. B. Hijab, "Minimum energy estimation," Ph.D. dissertation, Univ. California, Berkeley, 1979.
[5] L. A. Shepp and O. Zeitouni, "A note on conditional exponential moments and Onsager–Machlup functional," Ann. Probability, vol. 20, pp. 652–654, 1992.
[6] A. Isidori, Nonlinear Control Systems, 2nd ed. Berlin, Germany: Springer-Verlag, 1989.
[7] A. Bensoussan, Perturbation Methods in Optimal Control. Chichester, U.K.: Wiley, 1988.
[8] H. J. Kushner and P. Dupuis, Numerical Method for Stochastic Control Problems in Continuous Time. New York: Springer-Verlag, 1992.
[9] W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions. New York: Springer-Verlag, 1993.

Adaptive Sampling Control of High-Gain Stabilizable Systems

Achim Ilchmann and Stuart Townley

Abstract: It is well known that proportional output feedback control can stabilize any relative-degree one, minimum-phase system if the sign of the feedback is correct and the proportional gain is high enough. Moreover, there exist simple adaptation laws for tuning the proportional gain (so-called high-gain adaptive controllers) which do not need to know the system and do not attempt to identify system parameters. In this paper the authors consider sampled versions of the high-gain adaptive controller. The motivation for sampling arises from the possibility that the output of a system may not be available continuously, but only at sampled times. The main point of interest is the need to develop techniques for adapting the sampling rate, since the stiffness of the system increases as the proportional gain is increased. Our main result shows that adaptive sampling stabilization is possible if the product $hk$ of the decreasing sampling interval $h$ and the increasing proportional gain $k$ decreases at a rate proportional to $1/\log k$.

Index Terms: Adaptive stabilization, minimum-phase systems, proportional control, sampled-data control.

I. INTRODUCTION

In this paper, we will show that the ideas and techniques of high-gain adaptive output feedback stabilization carry over when the output of the system is not available continuously but is only available at sampled instants of time. This situation arises naturally when digital computations of control inputs are used. It is well known (see [13]) that
$$u(t) = -k(t)\,y(t), \qquad \dot k(t) = y^2(t)$$
is a continuous-time, high-gain adaptive stabilizer for the class of minimum-phase systems with positive high-frequency gain. This controller arose from the work of [6] and has been developed by [5], [4], [2], [11], [3], [1], and [12], to name but a few. While not all of these papers deal with adaptive control of minimum-phase systems, they are all similar in spirit in the sense that the adaptation of the controller gain is not based on any attempt to identify the parameters of the system. We continue in this spirit but focus on developing a mechanism to deal with the restriction that the output is only available at sampled time instants. The main novelty, which distinguishes this problem from either continuous or discrete-time adaptive control, is the need to develop suitable mechanisms for adjusting a variable sampling rate. Reference [9] is the only paper we are aware of which deals with this issue.

Manuscript received June 17, 1998. Recommended by Associate Editor, G. Tao. This work was supported by the Human Capital and Mobility Programme under Project CHRX-CT93-0402 and the University of Exeter, Small Grants Committee. The authors are with the School of Mathematical Sciences, University of Exeter, Exeter EX4 4QE, U.K. and the Centre for Systems and Control Engineering, School of Engineering and Computer Science, University of Exeter, Exeter EX4 4QF, U.K. Publisher Item Identifier S 0018-9286(99)07139-1. 0018–9286/99$10.00 © 1999 IEEE

We focus on adaptive stabilization of minimum-phase multi-input/multi-output systems with the spectrum of the high-frequency gain unmixed. These systems are high-gain stabilizable. While this might be considered a restriction, it is precisely in this high-gain case that the conflict between gain adaptation and sampling is most apparent. Variable sampling could of course be considered in many other situations in adaptive control. More precisely, we consider systems of the form
$$\dot x(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t), \qquad x(0) = x_0 \tag{1}$$

where $A \in \mathbb{R}^{n \times n}$; $B, C^T \in \mathbb{R}^{n \times m}$; $x_0 \in \mathbb{R}^n$ and $n$ are all unknown. The assumption that (1) is minimum phase means that

$$\det\begin{bmatrix} sI_n - A & B \\ C & 0 \end{bmatrix} \ne 0 \quad \text{for all } s \in \overline{\mathbb{C}}_+. \tag{2}$$

The assumption that the high-frequency gain $CB$ is unmixed means that

$$\sigma(sCB) \subset \mathbb{C}_+ \quad \text{for some unknown } s \in \{-1, 1\}. \tag{3}$$

The sign of the high-frequency gain is called positive if and only if

$$\sigma(CB) \subset \mathbb{C}_+. \tag{4}$$

The control objectives are described as follows. Design a simple scalar adaptation law

$$k_{j+1} = f(k_j, y_j), \qquad t_{j+1} = g(t_j, k_j) \tag{5}$$

so that the proportional sampled-data output feedback

$$u(t) = -k_j y_j, \qquad t \in [t_j, t_{j+1}) \tag{6}$$
which uses only sampled output information $y_j := y(t_j)$, when applied to a system (1) satisfying (2) and either (3) or (4), yields a closed-loop system (1), (5), (6) with convergent gain adaptation, positive sampling interval length, and stabilized sampled output, i.e.,

$$\lim_{j \to \infty} k_j = k_\infty \in \mathbb{R}, \qquad \lim_{j \to \infty} (t_{j+1} - t_j) = h_\infty > 0, \qquad \lim_{j \to \infty} y_j = 0.$$

The paper is organized as follows. Section II is devoted to sampled-data adaptive stabilization of multivariable systems satisfying (2) and either (3) or (4), while in Section III we study the intersampling behavior and prove that under additional mild assumptions we can guarantee that the continuous-time state $x(t)$ tends to zero.

II. SAMPLING STABILIZATION OF MULTIVARIABLE SYSTEMS

The following theorem is the main result of this section. An adaptive gain and sampling time mechanism is presented which stabilizes the output at the sampling instants and guarantees convergent gain and sampling period adaptation.

Theorem 2.1: Suppose the system (1) satisfies (2) and (4), i.e., (1) is minimum phase with positive high-frequency gain. Define the adaptive-sampling output feedback law by

$$u(t) = -k_i y_i, \qquad t \in [t_i, t_{i+1}) \tag{7}$$

where $y_i := y(t_i)$, and $\{k_j\}_{j \in \mathbb{N}}$ and $\{t_j\}_{j \in \mathbb{N}}$ are generated by the gain and sampling-time adaptation mechanism

$$h_i = \frac{1}{k_i \log k_i}, \qquad t_{i+1} = t_i + h_i, \qquad k_{i+1} = k_i + k_i h_i \|y_i\|^2, \qquad \text{for all } i \in \mathbb{N}_0 \tag{8}$$

with $t_0 = 0$ and $k_0 > 1$. Then the closed-loop system (1), (7), and (8) admits a unique solution $x(\cdot)$ defined on the whole half-axis $[0, \infty)$. Furthermore:
1) $\lim_{i \to \infty} k_i = k_\infty \in \mathbb{R}$;
2) $\lim_{i \to \infty} h_i = h_\infty > 0$;
3) $\{y_i\}_{i \in \mathbb{N}} \in l^2$.

Before proving this result we discuss the underlying adaptation, especially the adaptation of the sampling rate.

Remark 2.2-1): The basic ideas underlying Theorem 2.1 can be motivated by considering the simplest situation of scalar systems

$$\dot x(t) = ax(t) + bu(t), \qquad y(t) = cx(t), \qquad x(0) = x_0 \tag{9}$$

where $a, b, c, x_0 \in \mathbb{R}$ are all unknown. It is well known (see, e.g., [13]) that the continuous-time adaptive control law

$$u(t) = -k(t)\,y(t), \qquad \dot k(t) = y^2(t) \tag{10}$$

will stabilize any system given by (9) with $cb > 0$. The reason, loosely speaking, is that in the resulting closed-loop system

$$\dot x(t) = [a - k(t)cb]\,x(t) \tag{11}$$

$k(t)$ must increase until $a - k(t)cb$ is negative, after which $x(t)$ tends to zero exponentially and $k(t)$ converges to a finite limit. An Euler discretization of the $k$ dynamics in (10), with a step length $\tau_j$, is given by

$$\frac{k_{j+1} - k_j}{\tau_j} = y_j^2. \tag{12}$$

On the other hand, sampling (11) on a sampling interval of length $h_j$ gives $x(t_j)$ determined approximately by an Euler discretization of (11) with step length $h_j$. Since the "stiffness" of (11) increases affinely with $k(t)$, one would need to sample (11) with a sampling interval shorter than $1/k(t)$. It is also natural to sample the $x$-dynamics (which are responding to changes in $k$) more rapidly than the numerical integration of the $k$-dynamics. With these observations in mind we choose

$$\tau_j = \frac{1}{\log k_j} \tag{13}$$

and

$$h_j = t_{j+1} - t_j = [k_j \log k_j]^{-1} = o(\tau_j). \tag{14}$$
Note that (12) and (13) coincide with (8), $k_j$ is monotonically increasing and $h_j$ given by (14) is monotonically decreasing. If $k$ is large enough, so that $a - kcb$ is negative and the continuous-time system (11) is stable, and the sampling period $h_j$ is small enough, then exponential decay of $x(t)$ can be expected.

2): The idea of using a variable sampling rate has been considered by [9]. Our approach differs from this work in two crucial aspects. In [9] the functional dependence between $h$ and $k$ ($k = k(h)$) is such that

$$\lim_{h \to 0} h\,k(h) = \hat k_\infty > 0 \tag{15}$$

whereas in our approach $\lim_{h \to 0} h\,k(h) = \lim_{k \to \infty} (1/(k \log k))\,k = 0$. More significantly, in the context of adaptive control without identification, we require neither the extra assumption (15) nor that $0 < N \hat k_\infty CB < 2$ holds for $N = +1$ or $-1$, imposed in [9].

3): We stress that, in general, we cannot expect $x(t) \to 0$ as $t \to \infty$. However, in the case $n = 1$, boundedness and monotonicity of $k_j$ and $h_j$ gives

$$|x(t)| \le [e^{|a|h_0} + cb\,e^{|a|h_0}\,k_\infty h_0]\,|x_j|, \qquad \text{for } t \in [t_j, t_{j+1}).$$

Since $\{y_j\}_{j \in \mathbb{N}} \in l^2$ and $c \ne 0$ it follows that $x_j$ tends to zero, so that $\lim_{t \to \infty} x(t) = 0$.
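The scalar discussion in Remark 2.2 can be tried out numerically. The following is a minimal sketch, assuming the scalar plant (9) with illustrative values $a = b = c = 1$, $x_0 = 1$, $k_0 = 2$ (these numbers, the function name `simulate_adaptive_sampling`, and the step count are assumptions made here, not taken from the paper). It applies the sampled feedback (7) with the updates of (8) and integrates the plant exactly over each sampling interval, since the input is held constant there.

```python
import math


def simulate_adaptive_sampling(a=1.0, b=1.0, c=1.0, x0=1.0, k0=2.0, steps=200):
    """Simulate x' = a*x + b*u, y = c*x under u(t) = -k_i*y_i on [t_i, t_{i+1}).

    The gain and sampling-time updates follow (8); all numeric defaults are
    illustrative assumptions. Requires k0 > 1 so that log(k) > 0.
    """
    x, k, t = x0, k0, 0.0
    history = []  # rows of (t_i, k_i, h_i, y_i)
    for _ in range(steps):
        y = c * x                       # sampled output y_i
        h = 1.0 / (k * math.log(k))     # sampling interval h_i from (8)
        history.append((t, k, h, y))
        u = -k * y                      # input held constant on [t_i, t_i + h_i)
        if a != 0.0:
            growth = math.exp(a * h)
            # exact solution of the linear ODE with constant input
            x = growth * x + (growth - 1.0) / a * b * u
        else:
            x = x + h * b * u
        t += h                          # t_{i+1} = t_i + h_i
        k += k * h * y * y              # k_{i+1} = k_i + k_i*h_i*y_i^2
    return history


if __name__ == "__main__":
    rows = simulate_adaptive_sampling()
    t_end, k_end, h_end, y_end = rows[-1]
    print(f"final gain {k_end:.4f}, final interval {h_end:.4f}, final output {y_end:.3e}")
```

With these assumed values the gain settles after a few steps, the sampling interval stays bounded away from zero, and the sampled output decays geometrically, which is the qualitative behaviour stated in Theorem 2.1 for this one example; it is not a substitute for the proof.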
The crucial step in proving Theorem 2.1 is the investigation of fixed (nonadaptive) high-gain feedback control. Indeed, if a feedback of the form

$$u(t) = -k y_i, \qquad t \in [hi, h(i+1))$$

with fixed gain $k$ and sampling length $h = (k \log k)^{-1}$, is applied to (1), with sufficiently high gain $k$, then every solution of the closed-loop system tends to zero exponentially. The following lemma, which is of interest in its own right, clarifies this "high-gain" idea.

Lemma 2.3: Consider a system (1) satisfying (2) and (4), and let $h = (k \log k)^{-1}$. Then there exists $\bar k > 1$ sufficiently large such that for all $k \ge \bar k$, the feedback

$$u(t) = -k y_i, \qquad t \in [hi, h(i+1)) \tag{16}$$

applied to (1) yields an exponentially stable closed-loop system

$$\dot x(t) = Ax(t) - kBCx_i, \qquad t \in [hi, h(i+1)). \tag{17}$$

Here $x_i = x(t_i)$. Moreover, the associated discrete-time system

$$x_{i+1} = [I_n + h(A - kBC) + h^2 U_{h,k}]\,x_i \tag{18}$$

where $U_{h,k} := T_h(A)[A - kBC]$ and $T_h(A) := (1/2!)A + (1/3!)hA^2 + \cdots$, is power stable, i.e., there exist some $M > 0$, independent of $k$, and $\lambda_h \in (0, 1)$ so that

$$\|x_i\| \le M \lambda_h^{\,i}\,\|x_0\| \qquad \text{for all } i \in \mathbb{N}_0. \tag{19}$$

Proof: Applying variation-of-constants to (17) yields

$$x_{i+1} = e^{Ah} x_i - k \int_0^h e^{As}\,ds\,BC\,x_i = \Bigl[e^{Ah} - k \int_0^h e^{As}\,ds\,BC\Bigr] x_i.$$

Equation (18) follows from the uniform power series expansion

$$e^{Ah} - k \int_0^h e^{As}\,ds\,BC = \Bigl[I_n + hA + \frac{1}{2!} h^2 A^2 + \cdots\Bigr] - k \Bigl[hI_n + \frac{h^2}{2!} A + \cdots\Bigr] BC.$$

Since (1) is minimum phase and $\det CB \ne 0$, the state space can be decomposed into $\operatorname{im} B \oplus \ker C$ so that without loss of generality (see, e.g., [1, p. 11]), we may assume $A$, $B$, $C$, and $x$ are of the form

$$A = \begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}, \qquad B = \begin{bmatrix} CB \\ 0 \end{bmatrix}, \qquad C = [I_m, 0], \qquad x = \begin{pmatrix} y \\ z \end{pmatrix} \tag{20}$$

with $\sigma(A_4) \subset \mathbb{C}_-$ and blocks structured according to $y \in \mathbb{R}^m$, $z \in \mathbb{R}^{n-m}$. Setting

$$\Psi_{h,k} := I_n - kh \begin{bmatrix} CB & 0 \\ 0 & 0 \end{bmatrix} + h \begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}$$

(18) becomes

$$x_{i+1} = [\Psi_{h,k} + h^2 U_{h,k}]\,x_i. \tag{21}$$

To prove that there exists $\bar k$ sufficiently large so that (19) holds for all $k > \bar k$, we consider

$$V(y, z) := \begin{pmatrix} y \\ z \end{pmatrix}^{T} R \begin{pmatrix} y \\ z \end{pmatrix}, \qquad \text{with } R = \begin{bmatrix} P & 0 \\ 0 & Q \end{bmatrix}$$

as a Lyapunov-function candidate. Here $P = P^T \in \mathbb{R}^{m \times m}$, $Q = Q^T \in \mathbb{R}^{(n-m) \times (n-m)}$ denote the positive-definite solutions of

$$(CB)^T P + P(CB) = I_m \qquad \text{and} \qquad A_4^T Q + Q A_4 = -I_{n-m}. \tag{22}$$

Computing $\Delta V := V(y_{i+1}, z_{i+1}) - V(y_i, z_i)$ along the solution of (21) gives

$$\Delta V = [x_i^T(\Psi_{h,k}^T + h^2 U_{h,k}^T)]\,R\,[(\Psi_{h,k} + h^2 U_{h,k})\,x_i] - x_i^T R x_i.$$

Let $k_1 > 1$. Then there exists $M_1 > 0$ so that

$$\|\Psi_{h,k}\| + \frac{1}{k\sqrt{\log k}}\,\|U_{h,k}\| \le M_1 \qquad \text{for all } k \ge k_1.$$

Hence, there exists $M_2 > 0$ so that

$$\Delta V = x_i^T \Psi_{h,k}^T R \Psi_{h,k} x_i - x_i^T R x_i + 2h^2 x_i^T U_{h,k}^T R \Psi_{h,k} x_i + h^4 x_i^T U_{h,k}^T R U_{h,k} x_i \le x_i^T[\Psi_{h,k}^T R \Psi_{h,k} - R]\,x_i + M_2 \Bigl[\frac{h}{\sqrt{\log k}} + \frac{h^2}{\log k}\Bigr] \|x_i\|^2.$$

Now

$$\Psi_{h,k}^T R \Psi_{h,k} = \Bigl(I_n + h \begin{bmatrix} -kCB + A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}^{T}\Bigr) \begin{bmatrix} P & 0 \\ 0 & Q \end{bmatrix} \Bigl(I_n + h \begin{bmatrix} -kCB + A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}\Bigr)$$

so that by using (22)

$$\Psi_{h,k}^T R \Psi_{h,k} - R = -h \begin{bmatrix} kI_m & 0 \\ 0 & I_{n-m} \end{bmatrix} + h \begin{bmatrix} A_1^T P + P A_1 & A_3^T Q + P A_2 \\ A_2^T P + Q A_3 & 0 \end{bmatrix} + h^2 \begin{bmatrix} -kCB + A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}^{T} R \begin{bmatrix} -kCB + A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}.$$

Therefore, there exists $M_3 > 0$ so that

$$\Delta V \le -hk\|y_i\|^2 - h\|z_i\|^2 + hM_3(\|y_i\|^2 + \|y_i\|\,\|z_i\|) + h^2 M_3 [k^2\|y_i\|^2 + k\|y_i\|\,\|z_i\| + \|z_i\|^2] + M_2 \Bigl[\frac{h}{\sqrt{\log k}} + \frac{h^2}{\log k}\Bigr] [\|y_i\|^2 + \|z_i\|^2]$$

and hence, by using $\|y_i\|\,\|z_i\| \le 2M_3\|y_i\|^2 + (2M_3)^{-1}\|z_i\|^2$,

$$\Delta V \le -hk \Bigl[1 - M_3\Bigl(\frac{1}{k} + \frac{2M_3}{k} + hk + 2M_3 h\Bigr) - \frac{M_2}{k\sqrt{\log k}}\Bigl(1 + \frac{h}{\sqrt{\log k}}\Bigr)\Bigr] \|y_i\|^2 - h \Bigl[\frac{1}{2} - \frac{hk}{2} - M_3 h - \frac{M_2}{\sqrt{\log k}} - \frac{M_2 h}{\log k}\Bigr] \|z_i\|^2 \tag{23}$$

for all $k \ge k_1$. By choosing $\bar k > k_1$ sufficiently large, we obtain for all $k > \bar k$

$$\Delta V \le -\frac{h}{4}\,k\,\|y_i\|^2 - \frac{h}{4}\,\|z_i\|^2. \tag{24}$$

Hence

$$V(x_{i+1}) - V(x_i) \le -\frac{h}{4}\,\|x_i\|^2 \le -\frac{h}{4\|R\|}\,V(x_i) \qquad \text{for all } i \in \mathbb{N}_0.$$

Therefore $V(x_{i+1}) \le (1 - h(4\|R\|)^{-1})^{i+1}\,V(x_0)$. Then (19) follows, with $\lambda_h := 1 - h(4\|R\|)^{-1}$, by using the standard inequalities $\lambda_{\min}(R)\|x\|^2 \le x^T R x \le \|R\|\,\|x\|^2$.
It remains to prove exponential stability of (17). Again by variation-of-constants applied to (17) we have, for $t \in [hi, h(i+1))$ and for some suitable $M_4 > 0$, that

$$\|x(t)\| \le \Bigl[e^{\|A\|h} + k\,\frac{e^{\|A\|h} - 1}{\|A\|}\,\|BC\|\Bigr] \|x_i\| \le M_4\,\frac{\|R\|}{\lambda_{\min}(R)}\,\lambda_h^{\,i+1}\,\|x_0\|$$

and, for $\omega := (1/h) \log \lambda_h < 0$, we conclude that

$$\|x(t)\| \le M_4\,\frac{\|R\|}{\lambda_{\min}(R)}\,e^{\omega h(i+1)}\,\|x_0\| \le M_4\,\frac{\|R\|}{\lambda_{\min}(R)}\,e^{\omega t}\,\|x_0\|.$$
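As a sanity check on Lemma 2.3, the scalar case can be worked out by hand. The following sketch assumes $n = m = 1$ with $A = a \ne 0$, $B = b$, $C = c$ and $cb > 0$; the symbol $\mu(k)$ for the one-step multiplier is introduced here for illustration only and does not appear in the paper.

```latex
\begin{align*}
x_{i+1} &= \mu(k)\,x_i, \qquad
\mu(k) := e^{ah} - k\,cb\,\frac{e^{ah}-1}{a}, \qquad h = \frac{1}{k\log k},\\
\mu(k) &= 1 + ah - k\,cb\,h + O(k h^2)
        = 1 - \frac{cb}{\log k} + O\!\left(\frac{1}{k\log k}\right),\\
\frac{1}{h}\log\mu(k) &= -\,cb\,k\,\bigl(1 + o(1)\bigr) \qquad (k\to\infty).
\end{align*}
```

Under these assumptions $\mu(k)$ lies in $(0, 1)$ for all sufficiently large $k$, which is the power stability asserted in (19), while $\mu(k) \to 1$ as $k \to \infty$ because the sampling interval shrinks. Measured in continuous time, however, the decay rate grows like $cb\,k$, in line with the coefficient $a - k\,cb$ of the continuous-time closed loop (11).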
Proof of Theorem 2.1: Existence and uniqueness of the solution to the closed-loop system (1), (7), and (8) is obvious. Clearly $\{k_i\}_{i \in \mathbb{N}}$ is nondecreasing. To prove convergence of $\{k_i\}_{i \in \mathbb{N}}$, suppose to the contrary that $\lim_{i \to \infty} k_i = \infty$. Consider the associated discrete-time system

$$x_{i+1} = [I_n + h_i(A - k_i BC) + h_i^2 U_{h_i,k_i}]\,x_i \tag{25}$$

which we derive from (17) as in the proof of Lemma 2.3; see (20) and (21). Analogously to (24) we obtain, for sufficiently large $i_0 \in \mathbb{N}_0$ (recall that $\lim_{i \to \infty} k_i = \infty$) and all $i \ge i_0$

$$\Delta V_i \le -\frac{h_i}{4}\,k_i\,\|y_i\|^2 = -\frac{1}{4}(k_{i+1} - k_i).$$

Hence

$$V(y_N, z_N) - V(y_{i_0}, z_{i_0}) = \sum_{i=i_0}^{N-1} \Delta V(y_i, z_i) \le -\frac{1}{4} \sum_{i=i_0}^{N-1} (k_{i+1} - k_i) = -\frac{1}{4}[k_N - k_{i_0}]$$

so that

$$\frac{1}{4}\,k_N \le V(y_{i_0}, z_{i_0}) + \frac{1}{4}\,k_{i_0}, \qquad \text{for all } N \ge i_0.$$

This contradicts the unboundedness of $\{k_i\}_{i \in \mathbb{N}}$. Hence $\{k_i\}_{i \in \mathbb{N}}$ converges. Now the statements 1) and 2) are immediate from boundedness of $\{k_i\}_{i \in \mathbb{N}}$. The proof is completed by noting that 3) follows from the inequality

$$\frac{1}{\log k_\infty} \sum_{i=0}^{N} \|y_i\|^2 \le \sum_{i=0}^{N} \frac{1}{\log k_i}\,\|y_i\|^2 = k_{N+1} - k_0 \le k_\infty.$$

If the sign of the high-frequency gain $CB$ is unknown, so that we only know that (3) holds, then the adaptation law has to additionally find the sign for the feedback. For continuous-time feedback this problem was solved by the famous contribution of [7]. Nussbaum's idea was to introduce sign-switching feedbacks of the form $u(t) = -k(t) \sin(\sqrt{k(t)})\,y(t)$. The analogue for adaptive-sampling feedback control is obtained by replacing the continuous sign changing function $\sin\sqrt{k}$ by a piecewise continuous sign changing function as follows.

Algorithm 2.4: For a monotone nondecreasing sequence $1 < k_0 \le k_1 \le \cdots$ define a switching sequence $\{S_i\}_{i \in \mathbb{N}} \subset \{-1, 1\}$ by the flow chart shown in Fig. 1, initialized with $i = L = 0$, $S_0 = 1$ and where

$$\eta_i(k, S) := \begin{cases} 1, & \text{if } k_0 = \cdots = k_i \\[4pt] \dfrac{1}{k_i - k_0} \displaystyle\sum_{j=0}^{i-1} (k_{j+1} - k_j)\,S_j, & \text{otherwise.} \end{cases} \tag{26}$$

Fig. 1. The switching procedure.

If $\{k_i\}_{i \in \mathbb{N}}$ diverges to infinity, then the Algorithm 2.4 ensures that $\eta_i \in (-1, 1)$ for all $i \ge i_0$ (for $i_0$ sufficiently large) and $\eta_i$ has the two accumulation points $1$ and $-1$. Thus $S_i$ will stay at $1$, respectively $-1$, for longer and longer intervals and it is then natural to choose the feedback

$$u(t) = -k_i S_i y_i, \qquad t \in [t_i, t_{i+1}). \tag{27}$$
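The quantity in (26) is a weighted average of the past signs $S_j$, with the gain increments $k_{j+1} - k_j$ as weights, so it always lies in $[-1, 1]$. A minimal sketch of that computation follows. The function name `eta`, the list-based interface, and the value returned in the degenerate case $k_0 = \cdots = k_i$ (taken as 1 here) are assumptions for illustration; the switching rule of the flow chart in Fig. 1 is not reproduced, since only its caption is available in the text.

```python
def eta(i, k, S):
    """Weighted sign average of (26) for index i.

    k : nondecreasing gains k[0], ..., k[i] (at least i + 1 entries)
    S : signs S[0], ..., S[i-1], each +1 or -1 (at least i entries)

    Assumption: the degenerate case k[0] == ... == k[i] returns 1.0.
    """
    if k[i] == k[0]:
        # k is nondecreasing, so k[i] == k[0] means all gains so far coincide
        return 1.0
    weighted = sum((k[j + 1] - k[j]) * S[j] for j in range(i))
    return weighted / (k[i] - k[0])


if __name__ == "__main__":
    gains = [2.0, 3.0, 5.0, 6.0]
    signs = [1, -1, 1]
    for i in range(len(gains)):
        print(i, eta(i, gains, signs))
```

Because the weights are exactly the gain increments, a long stretch of large increments with one sign pulls the average toward that sign, which is the mechanism the text describes for $S_i$ staying at $1$ or $-1$ over longer and longer intervals.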
This switching procedure is similar to the one used in [9]. However, our implementation is direct and does not use a "switching activation sequence."

Theorem 2.5: Suppose the system (1) satisfies (2) and (3). Let $S_0 = 1$ and $k_0 > 1$. Then the adaptive-sampling output feedback law (27), where $y_i := y(t_i)$, and the gain and sampling-time are adapted according to (8) in Theorem 2.1, with switching sequence $\{S_i\}_{i \in \mathbb{N}}$ defined by Algorithm 2.4, applied to (1) yields a closed-loop system which admits a unique solution $x(\cdot)$ defined on the whole half-axis $[0, \infty)$. Furthermore:
1) $\lim_{i \to \infty} k_i = k_\infty \in \mathbb{R}$;
2) $\lim_{i \to \infty} h_i = h_\infty > 0$;
3) $\lim_{i \to \infty} \eta_i = \eta_\infty \in (-1, 1]$;
4) there exists some $i_0$ such that $S_i = S_{i+1}$ for all $i \ge i_0$;
5) $\{y_i\}_{i \in \mathbb{N}} \in l^2$.

Proof: This proof is very similar to the proof of Theorem 2.1. The main difference is in proving convergence of $\{k_i\}_{i \in \mathbb{N}}$, and the remainder is straightforward and is omitted. Suppose that $\{k_i\}_{i \in \mathbb{N}}$ is not bounded. For suitable $s = 1$ or $-1$ we have

$$(CB)^T P + P(CB) = sI_m.$$

If $k$ in (20) is replaced by $kS$, then we can deduce, as in the proof of Lemma 2.3 [see in particular (23)] that

$$\Delta V_i \le -s h_i k_i S_i \|y_i\|^2 + \Bigl[M_3(h_i + 2M_3 h_i + h_i^2 k_i^2 + 2M_3 h_i^2 k_i) + \frac{M_2 h_i}{\sqrt{\log k_i}}\Bigl(1 + \frac{h_i}{\sqrt{\log k_i}}\Bigr)\Bigr] \|y_i\|^2 - h_i \Bigl[\frac{1}{2} - \frac{h_i k_i}{2} - M_3 h_i - \frac{M_2}{\sqrt{\log k_i}} - \frac{M_2 h_i}{\log k_i}\Bigr] \|z_i\|^2.$$
Therefore, there exists $i_0$ sufficiently large and $M_5 > 0$, so that for all $i \ge i_0$

$$\Delta V_i \le \Bigl[\frac{M_5}{\log k_i} - s h_i k_i S_i\Bigr] \|y_i\|^2 = [M_5 - sS_i]\,[k_{i+1} - k_i].$$

Hence, for all $N \ge i_0$

$$V(y_N, z_N) - V(y_{i_0}, z_{i_0}) \le M_5 [k_N - k_{i_0}] - s \sum_{i=i_0}^{N-1} S_i [k_{i+1} - k_i] = (k_N - k_{i_0}) \Bigl[M_5 - \frac{s}{k_N - k_{i_0}} \sum_{i=i_0}^{N-1} S_i [k_{i+1} - k_i]\Bigr].$$

Now it is easy to show, by construction of the switching sequence, that the right-hand side above tends to $-\infty$, whilst the left-hand side is bounded. Hence we have a contradiction and therefore $\{k_i\}_{i \in \mathbb{N}}$ is bounded. This completes the proof.

III. STABILIZATION OF THE STATE BY SAMPLING OUTPUT FEEDBACK

In Remark 2.2-3), we indicated that we would not expect the adaptive controller in Section II to stabilize the whole state, not even the state at sampling instants. In fact, while the sequence $y(t_i)$ converges to zero, the following example (see [9] for the details) shows that the continuous-time output $y(t)$ need not converge to zero. Consider the minimum phase system

$$\dot x(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t) \qquad \text{with} \qquad A = \begin{bmatrix} 0 & 1 \\ -4\pi^2 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad C = (1, 0) \tag{28}$$

for which the high-frequency gain $CB = 1$. It is easy to see that with initial data $x(0) = (0, 2\pi)^T$, $t_0 = 0$ and $k_0 > 1$ such that $h_0 = (k_0 \log k_0)^{-1} = 1$, the adaptive feedback law (7), (8) applied to (28) yields $y_j = 0$, $h_j = 1$, $k_j = k_0$ for all $j \in \mathbb{N}_0$ and $u(\cdot) \equiv 0$, but $x(t) = (\sin 2\pi t, 2\pi \cos 2\pi t)^T$. Note that this example is pathological since the sampling times occur exactly where the continuous-time output vanishes.

Since the continuous-time system (1) is detectable (this is a consequence of the minimum phase assumption; see [1]), we could overcome this problem by choosing the sampling periods in such a way that sampling preserves detectability. Now it is well known that the sampled system (with constant sampling period $h > 0$) is detectable if and only if

$$\frac{(\lambda - \mu)h}{2\pi i} \notin \mathbb{Z} \qquad \text{for any } \lambda \ne \mu;\ \lambda, \mu \in \sigma(A) \cup \{0\}. \tag{29}$$

We shall modify the adaptive sampling time algorithm (8) under the additional assumption that (29) holds for some known $h$. This detectability of the sampled system at some known sampling time $h$ is used in [8] in constructing sampled-data identification-based adaptive controllers. The main benefit of the extra assumption is that (29) then holds for a sampling period $h/q$, for any $q \in \mathbb{N}$. We exploit this in the following result.

Theorem 3.1: Suppose the system (1) satisfies (2) and (4). Let $h$ be such that (29) holds. Then the adaptive sampling output feedback law (7) where $y_i := y(t_i)$, and $\{k_i\}_{i \in \mathbb{N}}$ and $\{t_i\}_{i \in \mathbb{N}}$ are generated by the gain and sampling-time adaptation mechanism

$$\hat h_i = \frac{h}{j} \quad \text{where } j \text{ is such that } h_i = \frac{1}{k_i \log k_i} \in \Bigl[\frac{1}{j+1}, \frac{1}{j}\Bigr), \qquad t_{i+1} = t_i + \hat h_i \quad \text{and} \quad k_{i+1} = k_i + k_i \hat h_i \|y_i\|^2 \tag{30}$$

with $t_0 = 0$, $k_0 > 1$, applied to (1) yields a closed-loop system which admits a unique solution $x(\cdot)$ defined on the whole axis $[0, \infty)$. Moreover:
1) $\lim_{i \to \infty} k_i = k_\infty \in \mathbb{R}$;
2) there exists some $i_0, j_1 \in \mathbb{N}$ such that $\hat h_i = h/j_1$ for all $i \ge i_0$;
3) $\{y_i\}_{i \in \mathbb{N}} \in l^2$;
4) $\lim_{t \to \infty} x(t) = 0$.

Remark 3.2-1): If an upper bound of $A$ in the spectral norm is known, i.e., for some $M > 0$ we have $\|A\| \le M$, then we can choose $h \in (0, \pi/M)$. This is an immediate consequence of

$$0 < \Bigl|\frac{(\lambda - \mu)h}{2\pi i}\Bigr| \le \frac{2Mh}{2\pi} < 1$$

from which (29) follows.

2): If $A$ is rational, then (29) holds for any $h \in \mathbb{Q}$. To see this, note that since $\det(\lambda I_n - A)$ is a polynomial with rational coefficients, the real and imaginary parts of the eigenvalues of $A$ are algebraic numbers. And since the difference of any two algebraic numbers is algebraic (see, e.g., [10]), we have for any $h \in \mathbb{Q}$ that $(\lambda - \mu)h/(2\pi i) \notin \mathbb{Z}$ and therefore the claim follows.

Proof of Theorem 3.1: This proof is similar to the proof of Theorem 2.1 but with $h_i$ replaced by $\hat h_i$. Existence and uniqueness of the solution is again straightforward.

Step 1: We will prove boundedness of $\{k_i\}_{i \in \mathbb{N}}$. Suppose to the contrary that $\lim_{i \to \infty} k_i = \infty$. Analogously to (24) we can find $i_0 \in \mathbb{N}$ such that for all $i \ge i_0$

$$\Delta V_i \le -\frac{\hat h_i}{2}\,k_i\,\|y_i\|^2 = -\frac{1}{2}(k_{i+1} - k_i)$$

and 1) follows as in part 1) of the proof of Theorem 2.1.

Step 2: Since $k_i$ converges to $k_\infty$, there exists some $j_1, i_0 \in \mathbb{N}$ such that

$$h_i = \frac{1}{k_i \log k_i} \in \Bigl[\frac{1}{j_1 + 1}, \frac{1}{j_1}\Bigr) \qquad \text{for all } i \ge i_0.$$

This proves 2), and 3) follows from

$$\frac{h}{\log k_\infty} \sum_{i=0}^{N} \|y_i\|^2 \le \sum_{i=0}^{N} k_i \hat h_i \|y_i\|^2 = k_{N+1} - k_0 \le k_\infty.$$

Step 3: It remains to prove 4). To this end consider

$$x_{i+1} = e^{A\hat h_i} x_i + \int_0^{\hat h_i} e^{sA} B\,ds\;u_i, \qquad y_i = Cx_i$$

which is the sampled version of (1) on a sampling interval of length $\hat h_i$. Since $h_i \in [1/(j_1 + 1), 1/j_1)$ for all $i \ge i_0$ it follows that $\hat h_i = h/j_1 =: \hat h$ for all $i \ge i_0$. Hence

$$x_{i+1} = e^{A\hat h} x_i + \int_0^{\hat h} e^{sA} B\,ds\;u_i, \qquad y_i = Cx_i.$$

By the minimum phase assumption $(A, C)$ is detectable. Using (29) it follows that $(e^{A\hat h}, C)$ is detectable. Since by 3) we have $\lim_{i \to \infty} y_i = 0$, it follows that $\lim_{i \to \infty} x_i = 0$. Now for all $t \in [t_i, t_{i+1})$ we have

$$\|x(t)\| = \Bigl\|e^{A(t - t_i)} x_i - k_i \int_{t_i}^{t} e^{A(t - s)} BC x_i\,ds\Bigr\| \le \Bigl[e^{\|A\|\hat h} + k_\infty\,\frac{e^{\|A\|\hat h} - 1}{\|A\|}\,\|BC\|\Bigr] \|x_i\|.$$

This shows $\lim_{t \to \infty} \|x(t)\| = 0$ and completes the proof.

IV. CONCLUSIONS

In this paper we have considered the problem of how to adapt a variable sampling rate in the high-gain adaptive stabilization of minimum phase, relative degree one systems. The adaptive sampling rate is used to counter the increasing stiffness of the closed-loop system caused by an increased gain which is needed to exploit the stability of the zero dynamics. The ideas explored in this paper are prototypical of many problems in adaptive sampled-data control where the possible conflict between variable stiffness, caused by adaptive gains, and adaptive sampling rates has to be resolved.

REFERENCES
[1] A. Ilchmann, Non-Identifier-Based High-Gain Adaptive Control. London, U.K.: Springer-Verlag, 1993.
[2] H. K. Khalil and A. Saberi, "Adaptive stabilization of a class of nonlinear systems using high-gain feedback," IEEE Trans. Automat. Contr., vol. 32, pp. 1031–1035, 1987.
[3] H. Logemann and B. Mårtensson, "Adaptive stabilization of infinite-dimensional systems," IEEE Trans. Automat. Contr., vol. 37, pp. 1869–1883, 1992.
[4] B. Mårtensson, "Adaptive stabilization," Ph.D. dissertation, Lund Inst. Technol., Lund, Sweden, 1986.
[5] I. Mareels, "A simple selftuning controller for stably invertible systems," Syst. Contr. Lett., vol. 4, pp. 5–16, 1984.
[6] A. S. Morse, "Recent problems in parameter adaptive control," in Outils et Modèles Mathématiques pour l'Automatique, l'Analyse de Systèmes et le Traitement du Signal, I. D. Landau, Ed. 1983, pp. 733–740.
[7] R. D. Nussbaum, "Some remarks on a conjecture in parameter adaptive control," Syst. Contr. Lett., vol. 3, pp. 243–246, 1983.
[8] R. Ortega and G. Kreisselmeier, "Discrete-time, model reference adaptive control for continuous-time systems using generalized sampled-data hold functions," IEEE Trans. Automat. Contr., vol. 35, pp. 334–338, 1990.
[9] D. H. Owens, "Adaptive stabilization using a variable sampling rate," Int. J. Contr., vol. 63, no. 1, pp. 107–119, 1996.
[10] H. E. Rose, A Course in Number Theory. New York: Oxford Univ. Press, 1988.
[11] E. P. Ryan, "Adaptive stabilization of a class of uncertain nonlinear systems: A differential inclusion approach," Syst. Contr. Lett., vol. 10, pp. 95–101, 1988.
[12] S. Townley, "Topological aspects of universal adaptive stabilization," SIAM J. Contr. Optim., vol. 34, no. 3, pp. 1044–1070, 1996.
[13] J. C. Willems and C. I. Byrnes, "Global adaptive stabilization in the absence of information on the sign of the high frequency gain," in Lect. Notes in Control and Inf. Sciences, vol. 62. Berlin, Germany: Springer-Verlag, 1984.

Sample-Path Average Optimality for Markov Control Processes
Jean B. Lasserre

Abstract: The authors consider a Markov control process with Borel state and action spaces, unbounded costs, and under the long-run sample-path average cost criterion. They prove that under very weak assumptions on the transition law and a moment assumption for the one-step cost, there exists a stationary policy with invariant probability distribution $\mu$, that is sample-path average cost optimal for $\mu$-almost all initial states. In addition, every expected average-cost optimal stationary policy is in fact (liminf) sample-path average-cost optimal and strongly expected average-cost optimal.

Index Terms: Borel spaces, Markov control (decision) processes, sample-path average optimality.

Manuscript received June 1, 1997. Recommended by Associate Editor, W.-B. Gong. The author is with LAAS-CNRS, 31077 Toulouse Cédex 4, France (e-mail: [email protected]). Publisher Item Identifier S 0018-9286(99)07135-4. 0018–9286/99$10.00 © 1999 IEEE

I. INTRODUCTION

In this paper, we consider a (discrete-time) Markov control process with Borel state and action spaces and unbounded one-step costs. The standard average-cost criterion is the expected long-run average cost and for various results concerning in particular the existence of stationary policies, the interested reader is referred to [9] and the references therein. A stronger (but less studied) average cost criterion is the sample-path long-run average cost, i.e., a policy that is sample-path average cost optimal is not only optimal in expectation, but also optimal almost surely, a highly desirable property. Recent results have covered the case where the state space is countably infinite (see, e.g., Arapostathis et al. [1], Cavazos-Cadena and Fernandez-Gaucherand [5], Borkar [3], [4]). For instance, in [5], the action space is assumed to be compact and a Lyapunov function condition (LFC) ensures that the average cost optimality equation (ACOE) holds. In particular, every stationary policy has a unique invariant probability distribution and a finite expected average cost. Similarly, in [1], [3], and [4], all the stationary policies also induce a unichain Markov chain and a near-monotone condition on the one-step cost holds. The countable semi-Markov case is treated in [2] along the same lines.

However, a weaker notion of sample-path average optimality may be satisfactory and we will show that existence of a sample-path average optimal stationary policy can be achieved under very weak assumptions. Indeed, in many cases, it may be enough to require that a policy be sample-path average optimal only for a subset of initial states (with, of course, every other policy being worse for every initial state). For instance, this is quite satisfactory in $\mathbb{R}^n$, when the complement of the subset has null-Lebesgue measure (which is true for some additive-noise systems) or when one is free to choose the initial state from which to operate the system.

The purpose of this paper is to investigate the sample-path average optimality criterion and extend previous results that hold in the countable case to the case where the state and action spaces are locally compact separable metric spaces and the one-step cost is unbounded and with no other assumption on the transition law than the usual weak continuity. Most practical applications fall into this framework. The one-step cost function is assumed to be a moment, which is