SIAM J. CONTROL OPTIM. Vol. 40, No. 4, pp. 1189–1226

© 2001 Society for Industrial and Applied Mathematics

MINIMAX LQG CONTROL OF STOCHASTIC PARTIALLY OBSERVED UNCERTAIN SYSTEMS∗

VALERY A. UGRINOVSKII† AND IAN R. PETERSEN†

Abstract. We consider an infinite-horizon linear-quadratic minimax optimal control problem for stochastic uncertain systems with output measurement. A new description of stochastic uncertainty is introduced using a relative entropy constraint. For the stochastic uncertain system under consideration, a connection between the worst-case control design problem and a specially parametrized risk-sensitive stochastic control problem is established. Using this connection, a minimax optimal LQG controller is constructed which is based on a pair of algebraic matrix Riccati equations arising in risk-sensitive control. It is shown that this minimax optimal controller absolutely stabilizes the stochastic uncertain system.

Key words. robust control, LQG control, stochastic control, stochastic risk-sensitive control, stochastic dynamic games

AMS subject classifications. 93E20, 93E05, 93C41

PII. S0363012998349352

1. Introduction. One of the important ideas in modern robust control theory emerges from the fact that many robust control problems can be formulated as optimization problems. The advantage of this approach is that it allows one to readily convert a problem of robust controller design into a mathematically tractable game-type minimax optimization problem. For linear systems with full state measurement, this methodology leads to a robust version of the linear quadratic regulator (LQR) approach to state feedback controller design [15, 18]. However, the development of a robust version of the LQG technique appears to be a challenging problem. The problem becomes especially difficult in situations in which one wishes to take into account the fact that in real physical systems, noise disturbances entering into the controlled plant differ from Gaussian white noise. A suitable way of introducing noise disturbances in this case may be to treat the disturbances as uncertain stochastic processes. A formalization of this idea leads to the concept of an uncertain stochastic system introduced in recent papers [11, 12, 19]. Note that in the case of a finite time horizon, the uncertain systems framework introduced in [12, 19] allows one to extend the standard LQG design methodology into a partial information minimax optimal control methodology for stochastic uncertain systems. The problem considered in [12, 19] involves constructing a controller which minimizes worst-case performance in the face of system uncertainty which satisfies a certain stochastic uncertainty constraint. This constraint restricts the relative entropy between an uncertain probability measure related to the distribution of the uncertainty input and the reference probability measure. This relative entropy constraint can be thought of as a stochastic counterpart of the deterministic integral quadratic constraint uncertainty description; see [15, 23].
∗Received by the editors December 17, 1998; accepted for publication (in revised form) April 16, 2001; published electronically November 28, 2001. This work was supported by the Australian Research Council. http://www.siam.org/journals/sicon/40-4/34935.html

†School of Electrical Engineering, Australian Defence Force Academy, Canberra ACT 2600, Australia ([email protected], [email protected]).

One advantage of the relative entropy uncertainty description is that it allows for stochastic uncertainty inputs


to depend dynamically on the uncertainty outputs. In this paper, we address an infinite-horizon version of the robust LQG problems considered in [12, 19]. As we proceed from a finite time interval to an infinite time interval, the fact that the systems under consideration are those with additive noise becomes important. The solutions of such systems do not necessarily belong to L²[0, ∞). Hence, the approaches used to describe admissible uncertainties in the deterministic case (e.g., see [15]) and the multiplicative noise case [18] are not applicable here. Note that the class of admissible uncertainties defined using the approach of [15, 18] is consistent with the notion of absolute stabilizability defined in terms of the L²[0, ∞)-summability of uncertainty inputs and corresponding solutions to the closed-loop system. However, in the present paper the uncertainty inputs and solutions need not be L²[0, ∞)-summable. Instead, we will consider the time-averaged properties of the system solutions. This requires us to correspondingly modify the definitions of admissible uncertainty and absolute stabilizability in order to properly account for the nature of the systems under consideration. In particular, our new definition of the class of admissible uncertainties is one of the contributions of this paper. In the case of an uncertain system with additive noise considered on the infinite time interval, we use an approximating sequence of martingales to describe the class of admissible uncertainties. In particular, we give an example which shows that H∞ norm-bounded uncertainty can be incorporated into the proposed framework by constructing a corresponding sequence of martingales. The main result of the paper is a robust LQG control synthesis procedure based on a pair of algebraic Riccati equations arising in risk-sensitive optimal control; see [9].
We show that solutions to a certain specially parametrized risk-sensitive control problem provide us with a controller which guarantees an optimal upper bound on the time-averaged performance of the closed-loop system in the presence of admissible uncertainties.

2. Definitions. Let (Ω, F, P) be a complete probability space on which a p-dimensional standard Wiener process W(·) and a Gaussian random variable x0 : Ω → R^n with mean x̌0 and nonsingular covariance matrix Y0 are defined, p = r + l. The first r entries of the vector process W(·) correspond to the system noise, while the last l entries correspond to the measurement noise. The space Ω can be thought of as the noise space R^n × R^l × C([0, ∞), R^p) [1]. The probability measure P can then be defined as the product of a given probability measure on R^n × R^l and the standard Wiener measure on C([0, ∞), R^p). The space Ω is endowed with a filtration {F_t, t ≥ 0} which has been completed by including all sets of probability zero. The filtration {F_t, t ≥ 0} can be thought of as the filtration generated by the mappings {Π_t, t ≥ 0}, where Π_0(x, η, W(·)) = (x, η) and Π_t(x, η, W(·)) = W(t) for t > 0 [1]. The random variable x0 and the Wiener process W(·) are stochastically independent in (Ω, F, P).

2.1. The nominal system. On the probability space (Ω, F, P) defined above, we consider the system and measurement dynamics driven by the noise input W(·) and a control input u(·). These dynamics are described by the following stochastic differential equation:

(1)

dx(t) = (Ax(t) + B1 u(t)) dt + B2 dW(t),   x(0) = x0,
z(t) = C1 x(t) + D1 u(t),
dy(t) = C2 x(t) dt + D2 dW(t),   y(0) = 0.


In the above equations, x(t) ∈ R^n is the state, u(t) ∈ R^m is the control input, z(t) ∈ R^q is the uncertainty output, and y(t) ∈ R^l is the measured output. System (1) is referred to as the nominal system. All coefficients in (1) are assumed to be constant matrices of corresponding dimensions. Also, we assume that D2 D2′ > 0. In the minimax optimal control problem to be considered in this paper, our attention will be restricted to linear output-feedback controllers of the form

(2)  dx̂ = Ac x̂ dt + Bc dy,   u = K x̂,

where x̂ ∈ R^n̂ is the state of the controller and Ac ∈ R^{n̂×n̂}, K ∈ R^{m×n̂}, and Bc ∈ R^{n̂×l}. Let U denote this class of linear controllers. Note that the controller (2) is adapted to the filtration {Y_t, t ≥ 0} generated by the observation process y. The closed-loop nominal system corresponding to controller (2) is described by a linear Itô differential equation of the form

(3)  dx̄ = Ā x̄ dt + B̄ dW(t),   z = C̄ x̄,   u = [0  K] x̄,

and is considered on the probability space (Ω, F, P). In (3), x̄ = [x′ x̂′]′ ∈ R^{n+n̂} is the state of the closed-loop system. Also, the following notation is used:

(4)  Ā = [ A   B1 K ;  Bc C2   Ac ],   B̄ = [ B2 ;  Bc D2 ],   C̄ = [ C1   D1 K ].

2.2. The stochastic uncertain system. In this paper, we introduce an uncertainty description for stochastic uncertain systems with additive noise which can be regarded as an extension of the uncertainty description considered in [12, 19] to the case of an infinite time horizon. As in [12, 19], the stochastic uncertain systems to be considered are described by the nominal system (1) considered over the probability space (Ω, F, P), and also by a set of perturbations of the reference probability measure P. These perturbations are defined as follows. Consider the set M of continuous positive martingales (ζ(t), F_t, t ≥ 0) such that for each T ≥ 0, Eζ(T) = 1; here, E denotes the expectation with respect to the probability measure P. Note that the set M is convex. Every martingale ζ(·) ∈ M gives rise to a probability measure Q^T on the measurable space (Ω, F_T) defined by the equation (5)

Q^T(dω) = ζ(T) P^T(dω).

Here, P^T denotes the restriction of the reference probability measure P to (Ω, F_T). From this definition, for every T > 0, the probability measure Q^T is absolutely continuous with respect to the probability measure P^T, Q^T ≪ P^T. The uncertain system is described by the stochastic differential equation (1) considered over the probability space (Ω, F_T, Q^T) for every T > 0. The expectation in this probability space is denoted E^{Q^T}. We now present an infinite-horizon uncertainty description for stochastic uncertain systems with additive noise. This uncertainty description may be regarded as an extension of the uncertainty description considered in [19] to the infinite-horizon case. Also, this uncertainty description can be thought of as an extension of the


deterministic integral quadratic constraint uncertainty description [15, 16, 23] and the stochastic integral quadratic constraint uncertainty description [18] to the case of stochastic uncertain systems with additive noise. Recall that the integral quadratic constraints arising in [15, 16, 23, 18] exploit a sequence of times {t_i}_{i=1}^∞ to “localize” the uncertainty inputs and uncertainty outputs to time intervals [0, t_i]. The consideration of the system dynamics on these finite time intervals then allows one to deal with bounded energy processes. However, in this paper the systems under consideration are those with additive noise. For this class of stochastic systems, it is natural to consider bounded power processes rather than bounded energy processes. This motivates us to propose the relative entropy uncertainty description given below in Definition 1 to accommodate bounded power processes. In contrast to the case of deterministic integral quadratic constraints, the uncertainty description considered in this paper exploits a sequence of continuous positive martingales {ζ_i(t), F_t, t ≥ 0}_{i=1}^∞ ⊂ M which converges to a limiting martingale ζ(·) in the following sense: For any T > 0, the sequence {ζ_i(T)}_{i=1}^∞ converges weakly to ζ(T) in L¹(Ω, F_T, P^T). Using the martingales ζ_i(t), we define a sequence of probability measures {Q_i^T}_{i=1}^∞ as follows:

(6)

Q_i^T(dω) = ζ_i(T) P^T(dω).

From the definition of the martingales ζ_i(t), it follows that for each T > 0 the sequence {Q_i^T}_{i=1}^∞ converges to the probability measure Q^T corresponding to a limiting martingale ζ(·) in the following sense: For any bounded F_T-measurable random variable η,

(7)  lim_{i→∞} ∫ η Q_i^T(dω) = ∫ η Q^T(dω).

We denote this fact by Q_i^T ⇒ Q^T as i → ∞.

Remark 1. The property Q_i^T ⇒ Q^T implies that the sequence of probability measures Q_i^T converges weakly to the probability measure Q^T. Indeed, consider the Polish space of probability measures on the measurable space (Ω, F_T) endowed with the topology of weak convergence of probability measures. Note that Ω is a metric space. Hence, such a topology can be defined; e.g., see [2]. For the sequence {Q_i^T} to converge weakly to Q^T, it is required that (7) hold for all bounded continuous random variables η. Obviously, this requirement is satisfied if Q_i^T ⇒ Q^T.

As in the finite-horizon case [12, 19], we describe the class of admissible uncertainties in terms of the relative entropy functional h(·‖·); for the definition and properties of the functional h(·‖·), see Appendix A and also [2].

Definition 1. Let d be a given positive constant. A martingale ζ(·) ∈ M is said to define an admissible uncertainty if there exists a sequence of continuous positive martingales {ζ_i(t), F_t, t ≥ 0}_{i=1}^∞ ⊂ M which satisfies the following conditions:
(i) For each i, h(Q_i^T ‖ P^T) < ∞ for all T > 0;
(ii) For all T > 0, Q_i^T ⇒ Q^T as i → ∞;
(iii) The following stochastic uncertainty constraint is satisfied: For any sufficiently large T > 0, there exists a constant δ(T) such that lim_{T→∞} δ(T) = 0 and

(8)  inf_{T′>T} (1/T′) [ (1/2) E^{Q_i^{T′}} ∫_0^{T′} ‖z(t)‖² dt − h(Q_i^{T′} ‖ P^{T′}) ] ≥ −d/2 + δ(T)

for all i = 1, 2, . . . . In (8), the uncertainty output z(·) is defined by (1) considered on the probability space (Ω, F_T, Q_i^T).
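For intuition about the relative entropy functional h(·‖·) entering the constraint (8), here is a minimal discrete-probability sketch of our own (not from the paper): for distributions on finitely many points, h(Q‖P) = Σ q_i log(q_i/p_i), it is nonnegative, and it vanishes when Q = P, which is the analogue of the martingale ζ(t) ≡ 1 and the reason the reference measure always satisfies the constraint.

```python
import numpy as np

# Discrete-distribution sketch of the relative entropy h(Q || P).
def relative_entropy(q, p):
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0                     # 0 * log 0 is taken as 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

p = [0.5, 0.3, 0.2]
print(relative_entropy(p, p))                        # 0.0: h(P||P) = 0
print(relative_entropy([0.6, 0.3, 0.1], p) >= 0.0)   # nonnegativity
```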


In the above conditions, Q_i^T is the probability measure defined by (6) corresponding to the martingale ζ_i(t) and time T > 0. We let Ξ denote the set of martingales ζ(·) ∈ M corresponding to admissible uncertainties. Elements of Ξ are also called admissible martingales.

Observe that the reference probability measure P corresponds to the admissible martingale ζ(t) ≡ 1. Hence, the set Ξ is not empty. Indeed, choose ζ_i(t) = 1 for all i and t. Then Q_i^T = P^T for all i. It follows from the identity h(P^T ‖ P^T) = 0 that

inf_{T′>T} (1/T′) [ (1/2) E^{Q_i^{T′}} ∫_0^{T′} ‖z(t)‖² dt − h(Q_i^{T′} ‖ P^{T′}) ] = inf_{T′>T} (1/(2T′)) E ∫_0^{T′} ‖z(t)‖² dt.

Note that the expectations are well defined. Also, the infimum on the right-hand side of the above equation is nonnegative for any T > 0. Therefore, for any constant d > 0, one can find a sufficiently small δ = δ(T) such that lim_{T→∞} δ(T) = 0 and the constraint (8) is satisfied strictly in this case.

Remark 2. Note that condition (8) implies that

lim inf_{T→∞} (1/T) [ (1/2) E^{Q_i^T} ∫_0^T ‖z(t)‖² dt − h(Q_i^T ‖ P^T) ] ≥ −d/2

for all i = 1, 2, . . . .

In what follows, we will use the following notation. Let P_T be the set of probability measures Q^T on (Ω, F_T) such that h(Q^T ‖ P^T) < ∞. Also, the notation M∞ will denote the set of martingales ζ(·) ∈ M such that h(Q^T ‖ P^T) < ∞ for all T > 0. It is readily verified that the set M∞ is convex. Note that the martingales ζ_i(·) from Definition 1 belong to M∞.

2.3. A discussion of the class of stochastic uncertain systems under consideration. In this section, we give more insight into the class of stochastic uncertain systems under consideration. In the integral quadratic constraint approach to robust control theory, the uncertainty is described in terms of a given set of uncertainty input signals. In contrast, Definition 1 presents a martingale uncertainty description or, equivalently, a probability measure uncertainty description. The motivation behind Definition 1 is as follows.
The proposed uncertain system model allows us to obtain a tractable solution to the corresponding problem of minimax optimal LQG controller design. Also, the stochastic uncertainty description presented in Definition 1 encompasses many important classes of uncertainty arising in robust control theory. In particular, it includes H∞ norm-bounded linear time-invariant (LTI) uncertainties and cone-bounded nonlinear uncertainties. This makes the approach developed in this paper applicable to a broad range of control system design problems. We show below that H∞ norm-bounded uncertainties satisfy the requirements of Definition 1.

The definition of admissible uncertainties given above involves a collection of martingales {ζ_i(·)}_{i=1}^∞ which has a given uncertainty martingale ζ(·) as its limit point. In the deterministic case and the multiplicative noise case, similar approximations have been defined by restricting uncertainty inputs to finite time intervals and then extending the restricted processes by zero beyond these intervals; e.g., see [15, 16, 18]. In the case of a stochastic uncertain system with additive noise considered on an infinite time interval, we apply a similar idea. However, in contrast to the deterministic and multiplicative noise cases, we use a sequence of martingales and corresponding probability measures in Definition 1. This procedure may be thought of as involving


a spatial restriction rather than the temporal restriction used previously. Indeed, a natural way to define the required sequence of martingales and corresponding probability measures is to consider martingales corresponding to the uncertainty inputs as “truncated” at certain Markov times t_i. For example, this can be achieved by choosing an expanding sequence of compact sets K_i in the uncertainty input space and letting t_i be the Markov time when the uncertainty input reaches the boundary of the set K_i. In this case, we focus on spatial domains rather than time intervals on which the uncertainty inputs and uncertainty outputs are then constrained. An illustration of this idea will be given in section 2.3.2.

2.3.1. A connection between uncertainty input signals and martingale uncertainty. A connection between the uncertainty input signal uncertainty model and the perturbation martingale uncertainty model is based on Novikov's theorem [6]. Using the result of Novikov's theorem, a given uncertainty input ξ(·) satisfying the conditions of this theorem on every finite interval [0, T] can be associated with a martingale ζ(·) ∈ M. This result is summarized in the following lemma.

Lemma 1. Suppose a random process (ξ(t), F_t), 0 ≤ t ≤ T, satisfies the conditions

(9)  P( ∫_0^T ‖ξ(s)‖² ds < ∞ ) = 1,   E exp( (1/2) ∫_0^T ‖ξ(s)‖² ds ) < ∞.

Then the equation

(10)  ζ(t) = 1 + ∫_0^t ζ(s) ξ(s)′ dW(s)

defines a continuous positive martingale ζ(t). Furthermore, the stochastic process

(11)  W̃(t) = W(t) − ∫_0^t ξ(s) ds

is a Wiener process with respect to the system {F_t, 0 ≤ t ≤ T} and the probability measure Q^T defined by (5), where ζ(·) is defined by (10).

Proof. Conditions (9) are the conditions of Novikov's theorem (e.g., see Theorem 6.1 on page 216 of [6]). It follows from this theorem that the random process (ζ(t), F_t), 0 ≤ t ≤ T, defined by (10) is a continuous martingale and, in particular, Eζ(T) = 1. Furthermore, this martingale is given by

(12)  ζ(t) = exp( ∫_0^t ξ′(s) dW(s) − (1/2) ∫_0^t ‖ξ(s)‖² ds ).

The statement of the lemma now follows from Girsanov's theorem; e.g., see Theorem 6.3 on page 232 of [6].

We now consider an uncertain system with H∞ norm-bounded LTI uncertainty, driven by a Gaussian white noise process v(t), as shown in Figure 1. We will show that such an uncertain system can be described in terms of the stochastic uncertain system framework defined above. Note that if ∆(s) ≡ 0 and ξ(·) = 0, then w(t) = v(t).
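The normalization Eζ(T) = 1 guaranteed by Lemma 1 is what makes Q^T in (5) a probability measure, and it can be checked numerically in the simplest case. The sketch below (our illustration, not from the paper) takes a constant scalar disturbance ξ, for which (12) reduces to the lognormal variable ζ(T) = exp(ξ W(T) − ξ²T/2), and estimates its mean by Monte Carlo.

```python
import numpy as np

# Monte Carlo check that the exponential martingale (12) has mean 1
# for a constant scalar disturbance xi (illustrative value).
rng = np.random.default_rng(1)
xi, T, N = 0.8, 1.0, 200_000
W_T = rng.normal(scale=np.sqrt(T), size=N)       # W(T) ~ N(0, T)
zeta_T = np.exp(xi * W_T - 0.5 * xi**2 * T)      # zeta(T) from (12)
print(zeta_T.mean())                             # close to 1
```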


[Figure 1 shows a block diagram: the signals v(t) and ξ(t) are summed to form the input w(t) to the nominal system, which also receives the control u(t) and produces the measured output y(t) and the uncertainty output z(t); the uncertainty block ∆(s) maps z(t) back to ξ(t).]

Fig. 1. An uncertain system.

That is, the nominal system is driven by a Gaussian white noise. However, in the presence of uncertainty, the input w(·) ceases to be a Gaussian white noise. For each T > 0, a rigorous mathematical description of the system shown in Figure 1 can be given by the equations

(13)  dx = (Ax + B1 u + B2 ξ) dt + B2 dW̃(t),
      z = C1 x + D1 u,
      dy = (C2 x + D2 ξ) dt + D2 dW̃(t)

considered on the probability space (Ω, F_T, Q^T), where Q^T is the probability measure constructed in Lemma 1. Also, the uncertainty input is related to the uncertainty output by the relation ξ = ∆(s)z. Now, the substitution of (11) into (13) leads to a set of equations of the form (1) considered on the probability space (Ω, F_T, Q^T). Thus, the uncertain system shown in Figure 1 can be considered in the stochastic uncertain system framework defined above. In what follows, we will show that an H∞ norm bound on the LTI uncertainty ∆(s) implies the satisfaction of the relative entropy constraint described above. Note that the case ξ(·) = 0 corresponds to ζ(t) ≡ 1 and Q^T = P^T.

2.3.2. H∞ norm-bounded uncertainty and the relative entropy constraint. In this section we will show that if the LTI uncertainty ∆(s) shown in Figure 1 satisfies an H∞ norm bound, then the corresponding stochastic uncertain system satisfies the relative entropy constraint defined above. This completes the proof of our assertion that the standard H∞ norm-bounded uncertainty description can be incorporated into the framework of Definition 1. In a similar fashion, one can also show that a cone-bounded nonlinear uncertainty defines an admissible uncertainty according to Definition 1. This proof has been omitted for the sake of brevity.

In what follows, we will use the following well-known property of linear stochastic systems. On the probability space (Ω, F, P̃), consider the following linear system driven by the Wiener process W̃(·) and a disturbance input ξ(t), t ∈ [0, T]:

(14)

dx̄ = (Ā x̄ + B̄ ξ(t)) dt + B̄ dW̃(t).


Proposition 1. If for some constant ρ > 0

(15)  Ẽ ∫_0^T ‖ξ(t)‖² dt ≤ ρ,

then the corresponding solution to (14) is mean square bounded on the interval [0, T]. Here Ẽ denotes the expectation with respect to the probability measure P̃.

Proof. The proof of the proposition follows straightforwardly using standard Lyapunov arguments.

Consider an uncertain system of the form (1) on the probability space (Ω, F, P), driven by a controller (2). Associated with the system (1) and controller (2), consider the disturbance input ξ(·) defined by the convolution operator

(16)  ξ(t) = ∫_0^t g(t − θ) z(θ) dθ

corresponding to a given causal uncertainty transfer function ∆(s) which belongs to the Hardy space H∞. In (16), z(·) is the uncertainty output of the closed-loop system corresponding to the system (1) and the given controller (2).

Lemma 2. Let an uncertainty transfer function ∆(s) ∈ H∞ be given which satisfies the norm bound condition (17)

‖∆(s)‖∞ ≤ 1.

Also, suppose that the random process (ζ(t), F_t) defined by (10) is a martingale; here ξ(·) is the disturbance input generated by the operator (16). Then this martingale satisfies the conditions of Definition 1.

Remark 3. The requirements of Lemma 2 are satisfied if ∆(s) is a stable rational transfer function satisfying condition (17). Indeed, in this case one can show that the augmented dynamics [x′(·), x̂′(·), η′(·), z′(·), ξ′(·)] are described by a linear system driven by a Wiener process, with Gaussian initial condition; here η denotes the state of the uncertainty. Hence for any T > 0 there exists a constant δ_T such that

sup_{t≤T} E exp(δ_T ‖ξ(t)‖²) < ∞;

see the remark on page 138 of [6]. This implies that ζ(t) is a martingale; see Example 3 on page 220 of [6]. Hence, any uncertainty described by a stable rational transfer function satisfying condition (17) will belong to the class Ξ of uncertainties admissible for system (1) controlled by a linear output-feedback controller of the form (2).

Proof of Lemma 2. Since the random process (ζ(t), F_t), 0 ≤ t ≤ T, defined by (10) is a martingale and Eζ(T) = 1, it follows from Girsanov's theorem that the random process W̃(·) defined by (11) is a Wiener process with respect to the filtration {F_t, 0 ≤ t ≤ T} and the probability measure Q^T defined as in (5); see [6]. Note that on the probability space (Ω, F_T, Q^T), system (1) becomes a system of the form (13). To verify that the martingale ζ(t) corresponding to the H∞ norm-bounded uncertainty under consideration defines an admissible uncertainty, we need to prove the existence of a sequence of martingales {ζ_i(t)}_{i=1}^∞ satisfying the conditions of Definition 1. To construct such a sequence, consider the following family of Markov stopping times {t_ρ, ρ > 0} [6]. For any ρ > 0, define

t_ρ := inf{ t ≥ 0 : ∫_0^t ‖ξ(s)‖² ds > ρ }  if ∫_0^∞ ‖ξ(s)‖² ds > ρ,
t_ρ := ∞  if ∫_0^∞ ‖ξ(s)‖² ds ≤ ρ.
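On a sampled path this stopping rule is easy to visualize: t_ρ is the first time the accumulated disturbance energy crosses ρ, and the truncated input vanishes afterwards. The sketch below is our own discrete illustration with an artificial path, not a construction from the paper.

```python
import numpy as np

# Discrete sketch of the stopping time t_rho and the truncated input:
# t_rho = first time int_0^t ||xi(s)||^2 ds exceeds rho, and the
# truncated path xi_rho equals xi up to t_rho and 0 afterwards.
def truncate(xi, dt, rho):
    energy = np.cumsum(xi**2) * dt               # running energy integral
    exceeded = energy > rho
    hit = int(np.argmax(exceeded)) if exceeded.any() else len(xi)
    xi_rho = xi.copy()
    xi_rho[hit:] = 0.0                           # stop the input at t_rho
    return xi_rho, hit * dt

dt = 0.01
xi = np.ones(500)                                # energy grows linearly in t
xi_rho, t_rho = truncate(xi, dt, rho=0.995)
print(t_rho)                                     # approximately 0.99
```

The retained energy of the truncated path never exceeds ρ by more than one discretization step, mirroring the bound ∫‖ξ_ρ‖² ≤ ρ used later in the proof.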


The family {t_ρ} is monotonically increasing and t_ρ → ∞ P-a.s. We are now in a position to construct an approximating sequence of martingales {ζ_i(t)}_{i=1}^∞ using the above sequence of Markov stopping times. First, note that the stochastic integral µ(t) := ∫_0^t ξ(s)′ dW(s) defines a local continuous martingale; see Definition 6 on page 69 of [6]. Also, for any stopping time t_ρ defined above,

µ(t ∧ t_ρ) = ∫_0^{t∧t_ρ} ξ(s)′ dW(s) = ∫_0^t ξ_ρ(s)′ dW(s) = ∫_0^t ξ(s)′ dW(s ∧ t_ρ),

where the process ξ_ρ(·) is defined as follows:

(18)  ξ_ρ(t) = ξ(t) χ_{t_ρ ≥ t}.

Here, χ_Λ denotes the indicator function of a set Λ ⊆ Ω. In the above definitions, the notation s ∧ t := min{s, t} is used. Associated with the positive continuous martingale ζ(t) and the family of stopping times {t_ρ, ρ > 0} defined above, consider the stopped process ζ_ρ(t) = ζ(t ∧ t_ρ). From this definition, ζ_ρ(t) is a continuous martingale; e.g., see Lemma 3.3 on page 69 of [6]. Furthermore, using the representation (10) of the martingale ζ(t), it follows that ζ_ρ(t) is an Itô process with the stochastic differential (19)

dζ_ρ(t) = ζ_ρ(t) ξ_ρ(t)′ dW(t) = ζ_ρ(t) dµ(t ∧ t_ρ);   ζ_ρ(0) = 1.

From (19), the martingale ζ_ρ(t) admits the following representation:

(20)  ζ_ρ(t) = exp( ∫_0^t ξ_ρ(s)′ dW(s) − (1/2) ∫_0^t ‖ξ_ρ(s)‖² ds ).

Also, Eζ_ρ(t) = Eζ_ρ(0) = 1. Hence, ζ_ρ(·) ∈ M. Using the martingale ζ_ρ(t) defined above, we define probability measures Q_ρ^T on (Ω, F_T) as follows: Q_ρ^T(dω) = ζ(T ∧ t_ρ) P^T(dω). From (20), the relative entropy between the probability measures Q_ρ^T and P^T is given by

(21)  h(Q_ρ^T ‖ P^T) = (1/2) E^{Q_ρ^T} ∫_0^T ‖ξ_ρ(s)‖² ds = (1/2) E^{Q_ρ^T} ∫_0^{t_ρ∧T} ‖ξ(s)‖² ds.

From this equation and from (18), it follows that h(Q_ρ^T ‖ P^T) ≤ (1/2)ρ < ∞ for all T > 0. Thus, condition (i) of Definition 1 is satisfied. Also, using part 1 of Theorem 3.7 on page 62 of [6], we observe that for every T > 0 the family {ζ(t_ρ ∧ T), ρ > 0} is uniformly integrable. Also, since t_ρ → ∞ with probability one as ρ → ∞, then ζ_ρ(T) → ζ(T) with probability one. This fact together with the property of uniform integrability of the family {ζ_ρ(T), ρ > 0} implies that

(22)  lim_{ρ→∞} E( |ζ(T ∧ t_ρ) − ζ(T)| | G ) = 0   P-a.s.


for any σ-algebra G ⊂ F_T; see the Corollary on page 16 of [6]. We now observe that for any F_T-measurable bounded random variable η with values in R,

E|ηζ(T ∧ t_ρ) − ηζ(T)| ≤ sup_ω |η| · E|ζ(T ∧ t_ρ) − ζ(T)|.

Therefore, it follows from the definition of the probability measures Q_ρ^T and Q^T and from (22) that Q_ρ^T ⇒ Q^T as ρ → ∞ for all T > 0. Thus, we have verified that the family of martingales ζ_ρ(t) satisfies condition (ii) of Definition 1.

We now consider system (1) on the probability space (Ω, F_T, Q_ρ^T). Equivalently, we consider system (13) driven by the uncertainty input ξ_ρ(t) on the probability space (Ω, F_T, Q_ρ^T). Note that since ∫_0^T ‖ξ_ρ(t)‖² dt ≤ ρ P-a.s., Proposition 1 implies that the corresponding output z(·) of system (1) satisfies the conditions

(23)  E^{Q_ρ^T} ∫_0^T ‖z(s)‖² ds < ∞,   ∫_0^T ‖z(s)‖² ds < ∞  Q_ρ^T-a.s.

for any T > 0. We now use the fact that condition (17) implies that for any pair (z̃(·), ξ̃(·)), z̃(·) ∈ L²[0, T], T > 0, related by (16),

∫_0^T ‖ξ̃(t)‖² dt ≤ ∫_0^T ‖z̃(t)‖² dt;

e.g., see [25]. Hence from this observation and from (23), it follows that the pair (z(·), ξ(·)), where z(·) and ξ(·) are defined by system (1) and the operator (16), satisfies the condition

(24)  ∫_0^T ‖ξ(t)‖² dt ≤ ∫_0^T ‖z(t)‖² dt   Q_ρ^T-a.s.
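The energy bound underlying (24) can be seen in a discrete-time sketch of our own (not from the paper): the kernel g(t) = e^{−t} corresponds to the transfer function 1/(s + 1), whose H∞ norm is 1, so the sampled convolution (16) cannot increase signal energy.

```python
import numpy as np

# Discrete illustration of the contraction property behind (24):
# xi(t) = int_0^t g(t - s) z(s) ds with g(t) = exp(-t), i.e.
# Delta(s) = 1/(s+1), which satisfies ||Delta||_inf = 1.
rng = np.random.default_rng(4)
dt, N = 0.01, 2000
t = np.arange(N) * dt
g = np.exp(-t)                                   # kernel of Delta(s) = 1/(s+1)
z = rng.normal(size=N)                           # a sample "uncertainty output"
xi = np.convolve(g, z)[:N] * dt                  # sampled operator (16)
print(np.sum(xi**2) * dt <= np.sum(z**2) * dt)   # True: no energy gain
```

For broadband z the inequality is far from tight, since |1/(jω + 1)| is well below 1 at all but the lowest frequencies.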

Then, the definition of the uncertainty input ξ_ρ(·) and condition (24) imply that for each T > 0

(25)  (1/T) ∫_0^T ( ‖z(s)‖² − ‖ξ_ρ(s)‖² ) ds ≥ 0   Q_ρ^T-a.s.

From the above condition, it follows that for each ρ > 0

inf_{T′>T} (1/T′) E^{Q_ρ^{T′}} ∫_0^{T′} ( ‖z(s)‖² − ‖ξ_ρ(s)‖² ) ds ≥ 0.

Note that the expectation on the left-hand side of the above inequality exists by virtue of (23). Obviously in this case, one can find a constant d > 0 and a variable δ(T) which is independent of ρ and such that lim_{T→∞} δ(T) = 0 and

inf_{T′>T} (1/(2T′)) E^{Q_ρ^{T′}} ∫_0^{T′} ( ‖z(s)‖² − ‖ξ_ρ(s)‖² ) ds ≥ −d/2 + δ(T).

This, along with the representation of the relative entropy between the probability measure Q_ρ^T and the reference probability measure P^T given in (21), leads us to the conclusion that for the H∞ norm-bounded uncertainty under consideration, the corresponding martingale ζ_ρ(t), ρ > 0, satisfies the constraint (8). This completes the proof of the lemma.
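The relative entropy formula (21) can be checked numerically in the simplest degenerate case: a constant scalar disturbance ξ with no truncation, for which (21) gives h(Q^T‖P^T) = ξ²T/2. Under P this quantity also equals E[ζ(T) log ζ(T)], which the Monte Carlo sketch below (our illustration, with placeholder values) estimates.

```python
import numpy as np

# Check of (21) for constant scalar xi without truncation:
# h(Q^T || P^T) = xi^2 * T / 2, also computable under P as
# E[zeta(T) log zeta(T)] with zeta(T) from (12).
rng = np.random.default_rng(2)
xi, T, N = 0.5, 1.0, 400_000
W_T = rng.normal(scale=np.sqrt(T), size=N)
zeta = np.exp(xi * W_T - 0.5 * xi**2 * T)
h_mc = float(np.mean(zeta * np.log(zeta)))   # E[zeta log zeta] under P
print(h_mc, 0.5 * xi**2 * T)                 # both ~ 0.125
```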


Remark 4. In the special case where the uncertainty is modeled by the operator (16) with L²-induced norm less than one, and where the uncertainty output z(·) of the closed-loop system is known to be Q^T mean square–integrable on any interval [0, T], the above proof shows that such an uncertainty can be characterized directly in terms of the martingale ζ(t) and the associated probability measures Q^T. That is, one can choose ζ_i(t) = ζ(t) and Q_i^T = Q^T in Definition 1. This will be true, for example, if the chosen controller is a stabilizing controller; see Definition 2. However, in the general case, the connection between the uncertainty output z(·) and the uncertainty input ξ(·) can be of a more complex nature than that described by (16). In this case, the Q^T mean square–integrability of the uncertainty output z(·) is not known a priori. Hence, one cannot guarantee that h(Q^T ‖ P^T) < ∞ for all T > 0. Also, the expectation

(1/T) [ E^{Q^T} ∫_0^T ‖z(t)‖² dt − h(Q^T ‖ P^T) ]

may not exist for all T > 0 unless it has already been proved that the controller (2) is a stabilizing controller. In this case, the approximations of the martingale ζ(t) allow us to avoid the difficulties arising when defining an admissible uncertainty for the uncertain system (1) controlled by a generic linear output-feedback controller.

3. Absolute stability and absolute stabilizability. An important issue in any optimal control problem on an infinite time interval concerns the stabilizing properties of the optimal controller. For example, a critical issue addressed in [15, 16, 18] was to prove the absolutely stabilizing property of the optimal control schemes presented in those papers. In this paper, the systems under consideration are subject to additive noise. Hence, we need a definition of absolute stabilizability which properly accounts for this feature of the systems under consideration.

Definition 2. A controller of the form (2) is said to be an absolutely stabilizing output-feedback controller for the stochastic uncertain system (1), (8) if the process x(·) defined by the closed-loop system corresponding to this controller satisfies the following condition: there exist constants c1 > 0, c2 > 0 such that for any admissible uncertainty martingale ζ(·) ∈ Ξ,

(26)  lim sup_{T→∞} (1/T) [ E^{Q^T} ∫_0^T ( ‖x(t)‖² + ‖u(t)‖² ) dt + h(Q^T ‖ P^T) ] ≤ c1 + c2 d.

The property of absolute stability is defined as a special case of Definition 2 corresponding to u(·) ≡ 0. In this case, system (1) becomes a system of the form (27)

dx(t) = Ax(t) dt + B2 dW(t),   z(t) = C1 x(t).

Definition 3. The stochastic uncertain system corresponding to the state equations (27) with uncertainty satisfying the relative entropy constraint (8) is said to be absolutely stable if there exist constants c1 > 0, c2 > 0 such that for any admissible uncertainty martingale ζ(·) ∈ Ξ,

(28)  lim sup_{T→∞} (1/T) [ E^{Q^T} ∫_0^T ‖x(t)‖² dt + h(Q^T ‖ P^T) ] ≤ c1 + c2 d.
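For the nominal part of (27), mean square stability can be verified concretely: with a Hurwitz A, the stationary state covariance P of dx = Ax dt + B2 dW solves the Lyapunov equation AP + PA′ + B2 B2′ = 0. The sketch below uses illustrative matrices of our own (not from the paper) and solves the Lyapunov equation by Kronecker vectorization.

```python
import numpy as np

# Illustrative check: A Hurwitz and stationary covariance from
# A P + P A' + B2 B2' = 0, solved via (I (x) A + A (x) I) vec(P) = -vec(Q)
# with column-major vec.
A  = np.array([[0., 1.], [-2., -3.]])
B2 = 0.1 * np.eye(2)
n = A.shape[0]

hurwitz = bool(np.all(np.linalg.eigvals(A).real < 0))

M = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
p = np.linalg.solve(M, -(B2 @ B2.T).flatten('F'))
P = p.reshape((n, n), order='F')
print(hurwitz, np.allclose(A @ P + P @ A.T + B2 @ B2.T, 0))  # True True
```

Lemma 3 below runs in the converse direction: mean square stability plus stabilizability of (A, B2) forces A to be stable.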


In what follows, the following property of mean square stable systems will be used; see [21]. For the sake of completeness, the proof of the following lemma is given in Appendix B.

Lemma 3. Suppose the stochastic nominal system (27) is mean square stable; i.e.,

(29)  lim sup_{T→∞} (1/T) E ∫_0^T ‖x(t)‖² dt < ∞.

Also, suppose the pair (A, B2) is stabilizable. Then, the matrix A must be stable.

4. Infinite-horizon minimax optimal control problem. Associated with the stochastic uncertain system (1), (8), consider a cost functional J(·) of the form (30)

J(u(·), ζ(·)) = lim sup T →∞

1 QT E 2T

 0

T

F (x(t), u(t))dt,

defined on solutions x(·) to (1). In (30), F (x, u) := x Rx + u Gu, and R and G are positive-definite symmetric matrices, R ∈ Rn×n , G ∈ Rm×m . Also, in (30), QT is the probability measure corresponding to the martingale ζ(·); see (5). In this paper, we are concerned with a minimax optimal control problem associated with system (1), cost functional (30), and uncertainty set Ξ. In this problem, we seek to find a controller u∗ (·) of the form (2) which minimizes the worst-case value of the cost functional J in the face of uncertainty ζ(·) ∈ Ξ satisfying the constraint (8): (31)

sup J(u∗ (·), ζ(·)) = inf

sup J(u(·), ζ(·)).

u(·)∈U ζ(·)∈Ξ

ζ(·)∈Ξ

The derivation of a solution to the above minimax optimal control problem relies on a duality relationship between free energy and relative entropy established in [1]; see Lemma 8 of Appendix A. Associated with system (1), consider the parameterdependent risk-sensitive cost functional    T 2τ 1 τ,T (u(·)) := (32) Fτ (x(t), u(t))dt , log E exp T 2τ 0 where τ > 0 is a given constant and (33)

(34)

Fτ (x, u) := x Rτ x + 2x Υτ u + u Gτ u, Rτ := R + τ C1 C1 , Gτ := G + τ D1 D1 , Υτ := τ C1 D1 .
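The weighting matrices of (34) are simple functions of the nominal cost weights and the uncertainty-output matrices. As a minimal sketch (the numerical matrices below are illustrative, not taken from the paper):

```python
import numpy as np

def risk_sensitive_weights(R, G, C1, D1, tau):
    """Form the weighting matrices of (34): R_tau, G_tau, Upsilon_tau.

    R, G are the positive-definite cost weights and C1, D1 define the
    uncertainty output z = C1 x + D1 u; tau > 0 is the scaling parameter.
    """
    R_tau = R + tau * C1.T @ C1
    G_tau = G + tau * D1.T @ D1
    Ups_tau = tau * C1.T @ D1
    return R_tau, G_tau, Ups_tau

# Illustrative data: a 2-state, 1-input example with tau = 2.
R = np.eye(2)
G = np.eye(1)
C1 = np.array([[1.0, 0.0]])
D1 = np.array([[0.5]])
R_tau, G_tau, Ups_tau = risk_sensitive_weights(R, G, C1, D1, tau=2.0)
```

Note that R_τ ≥ R and G_τ ≥ G, so the risk-sensitive cost penalizes the state and input at least as heavily as the nominal cost.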

We will apply the duality result of Lemma 8 of Appendix A; also, see [1]. When applied to system (1) and the risk-sensitive cost functional (32) (see Corollary 3.1 and Remark 3.2 of [1]), this result states that for each admissible controller u(·)

(35)  sup_{Q^T ∈ P_T} J_{τ,T}(u(·), Q^T) = (1/2) ℓ_{τ,T}(u(·)),

where

(36)  J_{τ,T}(u(·), Q^T) := (1/T) [ (1/2) E^{Q^T} ∫_0^T F_τ(x(t), u(t)) dt − τ h(Q^T‖P^T) ].

The use of the duality result (35) is a key step that enables us to replace the minimax optimal control problem by a risk-sensitive optimal control problem. Hence, we will be interested in constructing an output-feedback controller of the form (2) solving the following stochastic risk-sensitive optimal control problem:¹

(37)  inf_{u(·)∈U} lim_{T→∞} ℓ_{τ,T}(u(·)).

5. A connection between risk-sensitive optimal control and minimax optimal control. In this section, we present results establishing a connection between the risk-sensitive optimal control problem (37) and the minimax optimal control problem (31). For a given constant τ > 0, let V_τ denote the optimal value of the risk-sensitive control problem (37); i.e.,

V_τ := inf_{u(·)∈U} lim_{T→∞} ℓ_{τ,T}(u(·)) = inf_{u(·)∈U} lim_{T→∞} (2τ/T) log E exp[ (1/(2τ)) ∫_0^T F_τ(x(s), u(s)) ds ].

Theorem 1. Suppose that for a given τ > 0 the risk-sensitive control problem (37) admits an optimal controller u_τ(·) ∈ U of the form (2) which guarantees a finite optimal value: V_τ < ∞. Then this controller is an absolutely stabilizing controller for the stochastic uncertain system (1) satisfying the relative entropy constraint (8). Furthermore,

(38)  sup_{ζ(·)∈Ξ} J(u_τ(·), ζ(·)) ≤ (1/2)(V_τ + τ d).

Proof. It follows from the condition of the theorem that

V_τ = lim_{T→∞} (2τ/T) log E exp[ (1/(2τ)) ∫_0^T F_τ(x(s), u_τ(s)) ds ] < ∞,

where u_τ(·) is the risk-sensitive optimal controller of the form (2) corresponding to the given τ. We wish to prove that this risk-sensitive optimal controller satisfies condition (26) of Definition 2.

¹A risk-sensitive control problem of the form (37) was considered in [9]. That paper defines the class of admissible infinite-horizon risk-sensitive controllers as those controllers which satisfy a certain causality condition. This causality condition is formulated in terms of corresponding martingales and ensures that the probability measure transformations required in [9] are well defined. As observed in [9], linear controllers satisfy this causality condition. Furthermore, it is shown in [9] that a solution to the risk-sensitive optimal control problem (37), in the broader class of nonlinear output-feedback controllers satisfying such a causality condition, is attained by a linear controller of the form (2). This implies that the class of admissible controllers in the risk-sensitive control problem (37) can be restricted to include only linear output-feedback controllers.
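The free-energy/relative-entropy duality behind (35) can be checked numerically in the simplest scalar setting. The sketch below verifies log E_P[exp f(X)] = sup_Q { E_Q[f] − h(Q‖P) } for P = N(0,1), a linear f, and Q ranging over mean shifts; this toy family of tilts is an assumption for illustration, not the construction used in the paper.

```python
import numpy as np

# Scalar check of the duality  log E_P[exp f(X)] = sup_Q { E_Q[f] - h(Q||P) }
# for P = N(0,1), f(x) = lam*x, and Q = N(theta,1), where h(Q||P) = theta^2/2.
lam = 1.0

# Left-hand side: log moment generating function, by numerical integration.
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
pdf = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
lhs = np.log(np.sum(np.exp(lam * x) * pdf) * dx)

# Right-hand side: grid search over the tilt parameter theta.
theta = np.linspace(0.0, 2.0, 2001)
rhs = np.max(lam * theta - 0.5 * theta**2)

# Both sides equal lam^2 / 2 = 0.5 for this family of tilts.
```

The supremum is attained at θ = λ, i.e., by the exponentially tilted measure, which mirrors the role of the worst-case measure Q^T in (35).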


Using the duality result (35), we obtain

(39)  lim_{T→∞} sup_{Q^T ∈ P_T} (1/T) [ (1/2) E^{Q^T} ∫_0^T F_τ(x(s), u_τ(s)) ds − τ h(Q^T‖P^T) ] = V_τ/2.

Equation (39) implies that, for any sufficiently large T > 0, one can choose a sufficiently small positive constant δ̂ = δ̂(T) > 0 such that lim_{T→∞} δ̂(T) = 0 and

(40)  sup_{Q^{T′} ∈ P_{T′}} (1/T′) [ (1/2) E^{Q^{T′}} ∫_0^{T′} F_τ(x(s), u_τ(s)) ds − τ h(Q^{T′}‖P^{T′}) ] ≤ V_τ/2 + δ̂(T)

for all T′ > T. Thus, for the chosen constants T > 0 and δ̂(T) > 0 and for all T′ > T,

(1/T′) [ (1/2) E^{Q^{T′}} ∫_0^{T′} F_τ(x(s), u_τ(s)) ds − τ h(Q^{T′}‖P^{T′}) ] ≤ V_τ/2 + δ̂(T)

for any Q^{T′} ∈ P_{T′}. Furthermore, if Q^{T′} ∈ P_{T′} for all T′ > T, then

(41)  sup_{T′>T} (1/T′) [ (1/2) E^{Q^{T′}} ∫_0^{T′} F_τ(x(s), u_τ(s)) ds − τ h(Q^{T′}‖P^{T′}) ] ≤ V_τ/2 + δ̂(T).

Let ζ(·) ∈ Ξ be a given admissible uncertainty martingale and let ζ_i(·) be a corresponding sequence of martingales as in Definition 1. Recall that the corresponding probability measures Q_i^T belong to the set P_T for all T > 0. Hence each probability measure Q_i^T satisfies condition (41); i.e.,

(42)  sup_{T′>T} (1/T′) [ (1/2) E^{Q_i^{T′}} ∫_0^{T′} F_τ(x(s), u_τ(s)) ds − τ h(Q_i^{T′}‖P^{T′}) ] ≤ V_τ/2 + δ̂(T).

Note that in condition (42), δ̂(T) and T are constants which are independent of i. Since F(x, u) ≥ 0 and τ > 0, condition (42) implies

sup_{T′>T} (1/T′) [ (1/2) E^{Q_i^{T′}} ∫_0^{T′} ‖z(s)‖² ds − h(Q_i^{T′}‖P^{T′}) ] < ∞.

From this, it follows from (42) that for each integer i

sup_{T′>T} (1/(2T′)) E^{Q_i^{T′}} ∫_0^{T′} F(x(s), u_τ(s)) ds + τ inf_{T′>T} (1/T′) [ (1/2) E^{Q_i^{T′}} ∫_0^{T′} ‖z(s)‖² ds − h(Q_i^{T′}‖P^{T′}) ]
  ≤ sup_{T′>T} (1/T′) [ (1/2) E^{Q_i^{T′}} ∫_0^{T′} ( F(x(s), u_τ(s)) + τ ‖z(s)‖² ) ds − τ h(Q_i^{T′}‖P^{T′}) ]
  ≤ V_τ/2 + δ̂(T).


This implies

(43)  sup_{T′>T} (1/(2T′)) E^{Q_i^{T′}} ∫_0^{T′} F(x(s), u_τ(s)) ds
    ≤ V_τ/2 + δ̂(T) − τ inf_{T′>T} (1/T′) [ (1/2) E^{Q_i^{T′}} ∫_0^{T′} ‖z(s)‖² ds − h(Q_i^{T′}‖P^{T′}) ]
    ≤ (1/2)(V_τ + τ d) + δ̂(T) − τ δ(T).

The derivation of the last line of inequality (43) uses the fact that the probability measure Q_i^T satisfies condition (8). Also, note that in condition (43), the constants δ̂(T), δ(T), and T are independent of i and T′ > T. We now let i → ∞ in inequality (43). This leads to the following proposition.

Proposition 2. For any admissible uncertainty ζ(·) ∈ Ξ,

(44)  sup_{T′>T} (1/(2T′)) E^{Q^{T′}} ∫_0^{T′} F(x(s), u_τ(s)) ds ≤ (1/2)(V_τ + τ d) + δ̂(T) − τ δ(T).

To establish this proposition, consider the space L1(Ω, F_{T′}, P^{T′}) endowed with the topology of weak convergence of random variables, where T′ > T. We denote this space by L1^w. Define the functional

(45)  φ(ν) := (1/T′) E [ ν ∫_0^{T′} F(x(s), u_τ(s)) ds ],

mapping L1^w into the space of extended reals R̄ = R ∪ {−∞, ∞}. Also, consider a sequence of functionals mapping L1^w → R̄ defined by

(46)  φ_N(ν) := (1/T′) E [ ν ∫_0^{T′} F_N(x(s), u_τ(s)) ds ],   N = 1, 2, . . . ,

where each function F_N(·) is defined as follows: F_N(x, u) := F(x, u) if F(x, u) ≤ N, and F_N(x, u) := N if F(x, u) > N.

Note that from the above definition, the sequence φ_N(ν) is monotonically increasing in N for each ν. Also, we note that for any N > 0

P( (1/T′) ∫_0^{T′} F_N(x(s), u_τ(s)) ds ≤ N ) = 1.

Hence, if ν_i → ν weakly, then φ_N(ν_i) → φ_N(ν). That is, each functional φ_N(·) is continuous on the space L1^w. Therefore, the functional

φ(ν) = lim_{N→∞} φ_N(ν)

is lower semicontinuous; e.g., see Theorem 10 on page 330 of [13]. Now let ν = ζ(T′) be the Radon–Nikodým derivative of the probability measure Q^{T′}, and let ν_i = ζ_i(T′)


be the Radon–Nikodým derivative of the probability measure Q_i^{T′}. Then, the fact that ζ_i(T′) → ζ(T′) weakly implies

(47)  (1/(2T′)) E^{Q^{T′}} ∫_0^{T′} F(x(s), u_τ(s)) ds ≤ lim inf_{i→∞} (1/(2T′)) E^{Q_i^{T′}} ∫_0^{T′} F(x(s), u_τ(s)) ds ≤ (1/2)(V_τ + τ d) + δ̂(T) − τ δ(T).

Since the constants on the right-hand side of (47) are independent of T′ > T, condition (44) of the proposition now follows. This completes the proof of the proposition.

Note that from the above proposition, (38) follows. Indeed, for any ζ(·) ∈ Ξ, Proposition 2 and the fact that δ̂(T), δ(T) → 0 as T → ∞ together imply

(48)  J(u_τ(·), ζ(·)) = lim sup_{T→∞} sup_{T′>T} (1/(2T′)) E^{Q^{T′}} ∫_0^{T′} F(x(s), u_τ(s)) ds ≤ (1/2)(V_τ + τ d).

From condition (48), equation (38) of the theorem follows.

We now establish the absolutely stabilizing property of the risk-sensitive optimal controller u_τ(·). Indeed, since the matrices R and G are positive-definite, inequality (44) implies

(49)  lim sup_{T→∞} (1/T) E^{Q^T} ∫_0^T ( ‖x(s)‖² + ‖u_τ(s)‖² ) ds ≤ α (V_τ + τ d),

where α is a positive constant which depends only on R and G. To complete the proof, it remains to prove that there exist constants c1, c2 > 0 such that

(50)  lim sup_{T→∞} (1/T) h(Q^T‖P^T) < c1 + c2 d.

To this end, we note that for any sufficiently large T and for all T′ > T, the constraint (8) implies

(51)  (1/T′) h(Q_i^{T′}‖P^{T′}) ≤ (1/(2T′)) E^{Q_i^{T′}} ∫_0^{T′} ‖z(s)‖² ds + d/2 − δ(T)

for all i = 1, 2, . . . . We now observe that condition (43) implies that for all T′ > T

(52)  (1/(2T′)) E^{Q_i^{T′}} ∫_0^{T′} ‖z(s)‖² ds ≤ c̄ [ (1/2)(V_τ + τ d) + δ̂(T) − τ δ(T) ],

where c̄ is a positive constant determined only by the matrices R, G, C1, and D1. From conditions (51), (52), Remark 1, and the fact that the relative entropy functional is lower semicontinuous, it follows that

(1/T′) h(Q^{T′}‖P^{T′}) ≤ lim inf_{i→∞} (1/T′) h(Q_i^{T′}‖P^{T′}) ≤ c̄ [ (1/2)(V_τ + τ d) + δ̂(T) − τ δ(T) ] + d/2 − δ(T)

for any T′ > T. This inequality and the fact that δ(T) → 0 and δ̂(T) → 0 as T → ∞ together imply that

lim sup_{T→∞} (1/T) h(Q^T‖P^T) ≤ (1/2) ( c̄ V_τ + (1 + c̄ τ) d ).

Combining this condition and inequality (49), we obtain condition (26), where the constants c1, c2 are determined by V_τ, τ, α, and c̄, and hence are independent of ζ(·) ∈ Ξ.

Remark 5. It is straightforward to extend the result of Theorem 1 to the case in which the uncertainty output is structured; i.e.,

z_1(t) = C_{1,1} x(t) + D_{1,1} u(t),
  ⋮
z_k(t) = C_{1,k} x(t) + D_{1,k} u(t).

In this case, we need k relative entropy uncertainty constraints of the form (8) to define the admissible uncertainty. The corresponding risk-sensitive control problem involves k scaling parameters τ_1 ≥ 0, . . . , τ_k ≥ 0, Σ_{j=1}^k τ_j > 0.

To formulate conditions under which a converse to Theorem 1 holds, we consider the closed-loop system corresponding to system (1) and a linear time-invariant output-feedback controller of the form (2). Recall that the closed-loop nominal system corresponding to controller (2) is described by the linear Ito differential equation (3) on the probability space (Ω, F, P). In what follows, we will consider the class of linear controllers of the form (2) satisfying the following assumptions: the matrix Ā is stable, the pair (Ā, B̄) is controllable, and the pair (Ā, R̄) is observable, where

(53)  R̄ = [ R  0 ; 0  K′GK ].

Also, let D0 be the set of all linear functions φ(x̄) = Φx̄ such that the matrix Ā + B̄Φ is stable. Note that the pair (Ā + B̄Φ, B̄) is controllable since the pair (Ā, B̄) is controllable. Under these assumptions, the Markov process generated by the linear system

(54)  dx̄_φ(t) = (Ā + B̄Φ) x̄_φ(t) dt + B̄ dW(t)

has a unique invariant probability measure ν^φ on R^{n+n̂}; e.g., see [24]. It is shown in [24] that the probability measure ν^φ is a Gaussian probability measure.

Lemma 4. For every function φ(x̄) = Φx̄, φ(·) ∈ D0, there exists a martingale ζ(·) ∈ M_∞ such that for any T > 0 the process

(55)  W̃(t) = W(t) − ∫_0^t Φx̄(s) ds

is a Wiener process with respect to {F_t, t ∈ [0, T]} and the probability measure Q^T corresponding to the martingale ζ(·). In (55), x̄(·) is the solution to the nominal closed-loop system (3) with initial probability distribution ν^φ. Furthermore,

(56)  dx̄ = (Ā + B̄Φ) x̄ dt + B̄ dW̃(t),
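For the stationary solution of the shifted system, the relative entropy rate of Q^T with respect to P^T reduces to a moment of the invariant Gaussian measure, computable from a Lyapunov equation for the stationary covariance. A minimal numerical sketch (the scalar data are illustrative, not from the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def entropy_rate(A_cl, B, Phi):
    """Relative entropy rate (1/T) h(Q^T || P^T) = (1/2) tr(Phi P Phi')
    for the stationary Gaussian diffusion dx = A_cl x dt + B dW whose drift
    is shifted by Phi x, where the stationary covariance P solves
    A_cl P + P A_cl' + B B' = 0 (A_cl stable).
    """
    # solve_continuous_lyapunov solves A X + X A^H = Q, so pass Q = -B B'.
    P = solve_continuous_lyapunov(A_cl, -B @ B.T)
    return 0.5 * np.trace(Phi @ P @ Phi.T)

# Scalar example: A_cl = -1, B = 1 gives stationary variance P = 1/2;
# with Phi = 2 the entropy rate is 0.5 * (2 * 0.5 * 2) = 1.
rate = entropy_rate(np.array([[-1.0]]), np.array([[1.0]]), np.array([[2.0]]))
```

This is the computation underlying the finiteness of h(Q^T‖P^T) in Lemma 4: the rate is finite exactly because the invariant measure ν^φ has finite second moments.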


considered on the probability space (Ω, F_T, Q^T), admits a stationary solution x̄_ζ(·) such that

(57)  Q^T( x̄_ζ(t) ∈ dx̄ ) = ν^φ(dx̄).

Proof. Let ν^φ be the Gaussian invariant probability measure corresponding to a given φ(·) ∈ D0. Consider a stochastic process x̄(t) defined by (3) and having initial probability distribution ν^φ; i.e., P( x̄(0) ∈ dx̄ ) = ν^φ(dx̄). Since the probability measure ν^φ is Gaussian, there exists a constant δ0 > 0 such that

E exp( δ0 ‖x̄(0)‖² ) = ∫ exp( δ0 ‖x̄‖² ) ν^φ(dx̄) < ∞.

Hence, using the multivariate version of Theorem 4.7 on page 137 of [6] along with Example 3 on page 220 of [6], this leads to the satisfaction of the conditions of Lemma 1, which shows that the random process W̃(·) defined by (55) is a Wiener process with respect to {F_t, t ∈ [0, T]} and the probability measure Q^T defined in Lemma 1. We now consider system (56) on the probability space (Ω, F_T, Q^T) with initial distribution ν^φ. Also, consider system (54) on the probability space (Ω, F_T, P^T) with initial distribution ν^φ. It follows from Proposition 3.10 on page 304 of [4] that the stochastic process x̄_ζ(·) defined by (56) and the corresponding stochastic process x̄_φ(·) defined by (54) have the same probability distribution under their respective probability measures. Also, as in [2, 1],

h(Q^T‖P^T) = (1/2) E^{Q^T} ∫_0^T ‖Φx̄(t)‖² dt = (T/2) ∫ ‖Φx̄‖² ν^φ(dx̄) < ∞

for each T < ∞, since x̄(t) is the solution to system (3) with Gaussian initial distribution ν^φ. Thus, Q^T ∈ P_T for all T > 0. Hence, ζ(·) ∈ M_∞. From this, the lemma follows.

We now present a converse to Theorem 1.

Theorem 2. Suppose that there exists a controller u*(·) ∈ U such that the following conditions are satisfied:
(i) sup_{ζ(·)∈Ξ} J(u*(·), ζ(·)) < c < ∞.
(ii) The controller u*(·) is an absolutely stabilizing controller such that the corresponding closed-loop matrix Ā is stable, the pair (Ā, B̄) is controllable, and the pair (Ā, R̄) is observable.
Then there exists a constant τ > 0 such that the corresponding risk-sensitive optimal control problem (37) has a solution which guarantees a finite optimal value. Furthermore,

(58)  (1/2)(V_τ + τ d) < c.

The proof of this theorem follows along the same lines as the proof of the necessity part of the main result of [21]. For the sake of completeness, the modification of this proof adapted to the conditions of Theorem 2 is presented below. We first establish the following lemma.


Lemma 5. Consider the uncertain closed-loop system (3), (8) in which the pair (Ā, B̄) is controllable. Also, consider a nonnegative-definite matrix R̄ such that the pair (Ā, R̄) is observable. If the system (3), (8) is absolutely stable, then there exists a positive constant τ0 > 0 such that the Riccati equation

(59)  Ā′Π + ΠĀ + R̄ + τ0 C̄′C̄ + (1/τ0) ΠB̄B̄′Π = 0

admits a positive-definite stabilizing solution.

Proof. Since the uncertain system (3), (8) is absolutely stable, there exists a positive constant c̃ such that for all ζ(·) ∈ Ξ

(60)  lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T x̄(s)′R̄x̄(s) ds + ε̄ lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T ‖x̄(t)‖² dt ≤ c̃.

Here ε̄ > 0 is a sufficiently small positive constant. Consider the functionals

(61)  G0(ζ(·)) := c̃ − lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T x̄(s)′R̄x̄(s) ds − ε̄ lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T ‖x̄(t)‖² dt,
      G1(ζ(·)) := −d/2 − lim inf_{T→∞} (1/T) [ (1/2) E^{Q^T} ∫_0^T ‖z(s)‖² ds − h(Q^T‖P^T) ].

Note that since the system (3), (8) is absolutely stable, both of these functionals are well defined on the set Ξ. Now consider a martingale ζ(·) ∈ M_∞ such that

(62)  G1(ζ(·)) ≤ 0.

This condition implies that the martingale ζ(·) satisfies the conditions of Definition 1 with ζ_i(·) = ζ(·). Indeed, condition (i) of Definition 1 is satisfied since ζ(·) ∈ M_∞. Condition (ii) is trivial in this case. Also, let δ(T) be any function chosen to satisfy the conditions lim_{T→∞} δ(T) = 0 and

inf_{T′>T} (1/T′) [ (1/2) E^{Q^{T′}} ∫_0^{T′} ‖z(s)‖² ds − h(Q^{T′}‖P^{T′}) ] ≥ −d/2 + δ(T)

for all sufficiently large T > 0. The existence of such a function δ(T) follows from condition (62). Then condition (8) of Definition 1 is also satisfied. Thus, condition (62) implies that each martingale ζ(·) ∈ M_∞ satisfying this condition is an admissible uncertainty martingale. That is, ζ(·) ∈ Ξ. From condition (60), it follows that G0(ζ(·)) ≥ 0. We have now shown that the satisfaction of condition (60) implies that the following condition is satisfied:

(63)  If G1(ζ(·)) ≤ 0, then G0(ζ(·)) ≥ 0.

Furthermore, the set of martingales satisfying condition (62) has an interior point ζ(t) ≡ 1; see the remark following Definition 1. Also, it follows from the properties


of the relative entropy functional that the functionals G0(·) and G1(·) are convex. We have now verified all of the conditions needed to apply the Lagrange multiplier result (e.g., see [7]). Indeed, Theorem 1 on page 217 of [7] implies that there exists a constant τ0 ≥ 0 such that

(64)  G0(ζ(·)) + τ0 G1(ζ(·)) ≥ 0

for all ζ(·) ∈ M_∞. We now show that the conditions of the theorem guarantee that τ0 > 0.

Proposition 3. In inequality (64), τ0 > 0.

Consider system (56) where φ(x̄) := Φx̄ belongs to D0. From Lemma 4, the corresponding martingale ζ_Φ(·) belongs to the set M_∞. Now consider the quantity

lim inf_{T→∞} (1/(2T)) E^{Q_Φ^T} ∫_0^T x̄(t)′R̄x̄(t) dt.

Here, Q_Φ^T is the probability measure corresponding to the martingale ζ_Φ(·), and x̄(·) is the solution to the corresponding system (1) considered on the probability space (Ω, F_T, Q_Φ^T). Also, consider the Lyapunov equation

(65)  (Ā + B̄Φ)′Π + Π(Ā + B̄Φ) + R̄ = 0.

Since the matrix Ā + B̄Φ is stable, this matrix equation admits a nonnegative-definite solution Π. Using Ito's formula, it is straightforward to show that (65) leads to the inequality

lim inf_{T→∞} (1/T) E^{Q_Φ^T} ∫_0^T x̄(t)′R̄x̄(t) dt ≥ tr B̄B̄′Π.

This condition implies that

(66)  sup_{ζ(·)∈M_∞} lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T x̄(t)′R̄x̄(t) dt ≥ sup_{Φ: Ā+B̄Φ stable} (1/2) tr B̄B̄′Π = ∞.

Using (66), the proposition follows. Indeed, suppose that τ0 = 0. Then condition (64) implies that

(67)  sup_{ζ(·)∈M_∞} [ lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T x̄(t)′R̄x̄(t) dt + ε̄ lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T ‖x̄(t)‖² dt ] ≤ c̃ < ∞.

Inequality (67) leads to a contradiction with condition (66). From this, it follows that τ0 > 0.

Proposition 4. The Riccati equation (59) with τ0 defined above admits a positive-definite stabilizing solution.

We first note that the pair (Ā, R̄ + τ0 C̄′C̄) is observable, since the pair (Ā, R̄) is observable. Hence, if Π ≥ 0 satisfies (59), then Π > 0. Thus, it is sufficient to prove that (59) admits a nonnegative-definite stabilizing solution. This is true if and only if the following bound on the H∞ norm of the corresponding transfer function is satisfied:

(68)  ‖H_{τ0}(s)‖_∞ ≤ 1,


where

H_{τ0}(s) := [ (1/√τ0) R̄^{1/2} ; C̄ ; √(ε̄/τ0) I ] (sI − Ā)^{−1} B̄;

see Lemma 5 and Theorem 5 of [22]. In order to prove the above claim, we note that condition (64) implies that for any martingale ζ(·) ∈ M_∞

(69)  (1/(2τ0)) lim inf_{T→∞} (1/T) E^{Q^T} ∫_0^T x̄(t)′R̄x̄(t) dt + (ε̄/τ0) lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T ‖x̄(t)‖² dt + lim inf_{T→∞} (1/T) [ (1/2) E^{Q^T} ∫_0^T ‖z(s)‖² ds − h(Q^T‖P^T) ] ≤ c̃/τ0 − d/2.

We will show that the satisfaction of condition (68) follows from (69). Suppose condition (68) is not true. That is, suppose that

(70)  ‖H_{τ0}(s)‖_∞ > 1.

Consider a set P₊ of deterministic power signals ξ(t), t ∈ (−∞, ∞), for which the autocorrelation matrix exists and is finite and for which the power spectral density function exists. Furthermore, ξ(t) = 0 if t < 0. It can be shown that ‖H_{τ0}‖_{P₊} = ‖H_{τ0}‖_∞, where ‖H_{τ0}‖_{P₊} denotes the induced norm of the convolution operator P₊ → P₊ defined by the transfer function H_{τ0}(s). The proof of this fact is a minor variation of the proof of the corresponding fact given in [25]. Now consider the following state space realization of the transfer function H_{τ0}(s):

(71)  dx̄₁/dt = Āx̄₁ + B̄ξ(t),
      z₁ = [ (1/√τ0) R̄^{1/2} ; C̄ ; √(ε̄/τ0) I ] x̄₁.

Then, the fact that ‖H_{τ0}‖_{P₊} = ‖H_{τ0}‖_∞ > 1 leads to the following conclusion:

(72)  sup_{ξ(·)∈P₊} lim_{T→∞} (1/T) ∫_0^T ( ‖z₁(t)‖² − ‖ξ(t)‖² ) dt = ∞.

In (72), z₁(·) is the output of system (71) corresponding to the input ξ(·) ∈ P₊ and an arbitrarily chosen initial condition x̄₁(0).² That is, for any N > 0 there exists an uncertainty input ξ_N(·) ∈ P₊ such that

lim_{T→∞} (1/T) ∫_0^T ( ‖z₁(t)‖² − ‖ξ_N(t)‖² ) dt > N.

²Note that the limit on the left-hand side of (72) is independent of the initial condition of system (71).
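The H∞ norm appearing in (68) and (70) is the peak of the largest singular value of the transfer function over the imaginary axis. A crude frequency-sweep estimate can be sketched as follows (the first-order example is illustrative, not the paper's H_{τ0}):

```python
import numpy as np

def hinf_lower_bound(A, B, C, ws):
    """Frequency-sweep estimate of ||C (sI - A)^{-1} B||_inf, the quantity
    in conditions (68) and (70).  A finite sweep only yields a lower bound;
    bisection on a Hamiltonian matrix is the standard exact method.
    """
    n = A.shape[0]
    gains = []
    for w in ws:
        H = C @ np.linalg.solve(1j * w * np.eye(n) - A, B)
        gains.append(np.linalg.svd(H, compute_uv=False)[0])
    return max(gains)

# First-order example H(s) = 1/(s+1): |H(jw)| = 1/sqrt(1+w^2), peak 1 at w = 0.
ws = np.concatenate([[0.0], np.logspace(-3, 3, 200)])
peak = hinf_lower_bound(np.array([[-1.0]]), np.array([[1.0]]),
                        np.array([[1.0]]), ws)
```

In this example the peak equals 1, the borderline value in (68); a peak strictly above 1 is exactly the situation (70) from which the contradiction in the proof is derived.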


This condition implies that for a sufficiently small ε > 0 there exists a constant T(ε, N) > 0 such that

(73)  (1/T) ∫_0^T ( ‖z₁(t)‖² − ‖ξ_N(t)‖² ) dt > N − ε

for all T > T(ε, N). We now suppose that the initial condition of system (71) is a random variable x̄0. This system is driven by the input ξ_N(·). In this case, system (71) gives rise to an F0-measurable stochastic process x̄₁(·). Furthermore, for all T > T(ε, N), inequality (73) holds with probability one. Now note that the signal ξ_N(·) is a deterministic signal; hence, it satisfies the conditions of Lemma 1. Therefore, for this process, the martingale ζ_N(·) ∈ M, the probability measure Q_N^T, and the Wiener process W̃(·) can be constructed as described in Lemma 1. Also, since ξ_N(·) ∈ P₊ and is deterministic, then for any T > 0

E^{Q_N^T} ∫_0^T ‖ξ_N(t)‖² dt = ∫_0^T ‖ξ_N(t)‖² dt < ∞.

From this observation, it follows that ζ_N(·) ∈ M_∞, and also the random variable on the left-hand side of inequality (73) has finite expectation with respect to the probability measure Q_N^T. Furthermore, using inequality (73), one can prove that the system

(74)  dx̄ = ( Āx̄ + B̄ξ_N(t) ) dt + B̄dW̃(t),   x̄(0) = x̄0,

considered on the probability space (Ω, F_T, Q_N^T), satisfies the following condition:

(1/T) E^{Q_N^T} ∫_0^T [ x̄(t)′( (1/τ0) R̄ + C̄′C̄ ) x̄(t) + (ε̄/τ0) ‖x̄(t)‖² − ‖ξ_N(t)‖² ] dt > N − ε.

This condition can be established using the same arguments as those used in proving the corresponding fact in [19]. Hence,

(75)  lim_{T→∞} (1/T) E^{Q_N^T} ∫_0^T [ x̄(t)′( (1/τ0) R̄ + C̄′C̄ ) x̄(t) + (ε̄/τ0) ‖x̄(t)‖² − ‖ξ_N(t)‖² ] dt ≥ N.

Letting N → ∞ in (75) and using the representation of the relative entropy h(Q_N^T‖P^T), we obtain a contradiction with (69):

sup_{ζ∈M_∞} { (1/(2τ0)) lim inf_{T→∞} (1/T) E^{Q^T} ∫_0^T x̄(t)′R̄x̄(t) dt + (ε̄/τ0) lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T ‖x̄(s)‖² ds + lim inf_{T→∞} (1/T) [ (1/2) E^{Q^T} ∫_0^T ‖z(s)‖² ds − h(Q^T‖P^T) ] }
  ≥ sup_{N>0} { lim_{T→∞} (1/T) E^{Q_N^T} ∫_0^T (1/(2τ0)) x̄(t)′R̄x̄(t) dt + (ε̄/τ0) lim_{T→∞} (1/(2T)) E^{Q_N^T} ∫_0^T ‖x̄(s)‖² ds + lim_{T→∞} (1/(2T)) E^{Q_N^T} ∫_0^T ( ‖z(s)‖² − ‖ξ_N(s)‖² ) ds } = ∞.


Thus, condition (68) holds. As observed above, the proposition follows from this condition. Consequently, Lemma 5 follows from Proposition 4.

Proof of Theorem 2. This proof exploits a large deviation result established in [14]. We first note that since the given controller u*(·) is an absolutely stabilizing controller and the pair (Ā, R̄) is observable, the uncertain closed-loop system (3), (8) is absolutely stable. Furthermore, condition (i) of the theorem implies that there exists a sufficiently small positive constant ε̄ > 0 such that for all ζ(·) ∈ Ξ

(76)  lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T x̄(s)′R̄x̄(s) ds + ε̄ lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T ‖x̄(t)‖² dt ≤ c − ε̄.

Here R̄ is the matrix corresponding to the controller u*(·) as defined in (53). Also, c > 0 is the constant defined in condition (i) of the theorem. Then, it follows from Lemma 5 that there exists a positive constant τ0 > 0 such that the Riccati equation (59) has a positive-definite stabilizing solution. The existence of such a constant τ0 is established using condition (76) in the same manner as in the proof of Lemma 5. Also, as in the proof of Lemma 5, it follows that for any martingale ζ(·) ∈ M_∞,

(77)  lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T x̄(t)′R̄x̄(t) dt + ε̄ lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T ‖x̄(t)‖² dt + τ0 lim inf_{T→∞} (1/T) [ (1/2) E^{Q^T} ∫_0^T ‖z(s)‖² ds − h(Q^T‖P^T) ] ≤ c − ε̄ − τ0 d/2.

Furthermore, the matrix Ā is stable, the pair (Ā, B̄) is controllable, and the pair (Ā, R̄ + τ0 C̄′C̄) is observable. The above conditions and the condition that Riccati equation (59) has a positive-definite stabilizing solution are the conditions of Example 2.2 of [14]. It follows from this example that

(78)  lim_{T→∞} (τ0/T) log E exp[ (1/(2τ0)) ∫_0^T x̄(t)′( R̄ + τ0 C̄′C̄ ) x̄(t) dt ] = (1/2) ∫ [ x̄′( R̄ + τ0 C̄′C̄ ) x̄ − τ0 ‖φ(x̄)‖² ] ν^φ(dx̄),

where φ(x̄) = (1/τ0) B̄′Πx̄ and Π is the positive-definite stabilizing solution to Riccati equation (59). On the left-hand side of (78), x̄(·) is the solution to (3) corresponding to the given controller of form (2) and a given initial condition. It is shown in [14] that the value on both sides of (78) is independent of this initial condition. For the function φ(·) defined above, consider the martingale ζ(·) ∈ M_∞ and the corresponding stationary solution x̄(·) to system (56) with initial distribution ν^φ constructed as in Lemma 4. For this martingale ζ(·) and stationary solution x̄(·), condition (78) leads to the following expression for the risk-sensitive cost:


(79)  lim_{T→∞} (τ0/T) log E exp[ (1/(2τ0)) ∫_0^T x̄(t)′( R̄ + τ0 C̄′C̄ ) x̄(t) dt ]
    = (1/2) ∫ [ x̄′( R̄ + τ0 C̄′C̄ ) x̄ − τ0 ‖φ(x̄)‖² ] ν^φ(dx̄)
    = lim inf_{T→∞} (1/(2T)) E^{Q^T} ∫_0^T F(x(s), u*(s)) ds + τ0 lim inf_{T→∞} (1/T) [ (1/2) E^{Q^T} ∫_0^T ‖z(s)‖² ds − h(Q^T‖P^T) ].

Also, note that the right-hand side of the above equation is independent of the initial condition of system (56). This fact is readily established using Ito's formula and the fact that the matrix Ā + (1/τ0) B̄B̄′Π is stable. Therefore, on the right-hand side of equation (79), the stationary process x̄(·) can be replaced by the solution x̄(·) to system (56) corresponding to the given initial condition. Then, (79) and (77) imply that

(80)  lim_{T→∞} (τ0/T) log E exp[ (1/(2τ0)) ∫_0^T x̄(t)′( R̄ + τ0 C̄′C̄ ) x̄(t) dt ] ≤ c − ε̄ − τ0 d/2.

Thus,

V_{τ0} ≤ lim_{T→∞} (2τ0/T) log E exp[ (1/(2τ0)) ∫_0^T x̄(t)′( R̄ + τ0 C̄′C̄ ) x̄(t) dt ] < 2c − τ0 d.

Hence the optimal value of the corresponding risk-sensitive control problem (37) is finite.

6. Design of the infinite-horizon minimax optimal controller. In this section, we present the main result of the paper. This result shows that the solution to an infinite-horizon minimax optimal control problem of the form (31) can be obtained via optimization over solutions to a scaled risk-sensitive control problem of the form (37). Therefore, this result extends the corresponding result of [19] to the case where the underlying system is considered on an infinite time interval. Consider the class U of linear controllers of the form (2). In what follows, we will focus on linear output-feedback controllers of the form (2) having a controllable and observable state-space realization. The class of such controllers is denoted by U0. The derivation of the main result of this paper makes use of parameter-dependent algebraic Riccati equations. Let τ > 0 be a constant. We consider the algebraic Riccati equations

(81)  (A − B2D2′(D2D2′)^{−1}C2) Y∞ + Y∞ (A − B2D2′(D2D2′)^{−1}C2)′ − Y∞ ( C2′(D2D2′)^{−1}C2 − (1/τ) R_τ ) Y∞ + B2 ( I − D2′(D2D2′)^{−1}D2 ) B2′ = 0,

(82)  X∞ (A − B1G_τ^{−1}Υ_τ′) + (A − B1G_τ^{−1}Υ_τ′)′ X∞ + ( R_τ − Υ_τ G_τ^{−1} Υ_τ′ ) − X∞ ( B1G_τ^{−1}B1′ − (1/τ) B2B2′ ) X∞ = 0.
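Equations of the form (81)-(82) are algebraic Riccati equations whose quadratic term may be indefinite (in (82), B1 G_τ^{-1} B1′ − (1/τ) B2 B2′ need not be sign definite). A standard numerical route is the stable invariant subspace of the associated Hamiltonian matrix. The following is a sketch under that assumption, not the paper's algorithm; a Schur-based method is preferable in practice:

```python
import numpy as np

def solve_are(A, S, Q):
    """Stabilizing solution X of A'X + X A - X S X + Q = 0 via the stable
    invariant subspace of the Hamiltonian matrix.  With
    A := A - B1 Gtau^{-1} Ups', S := B1 Gtau^{-1} B1' - (1/tau) B2 B2',
    Q := R_tau - Ups Gtau^{-1} Ups', this has the form of equation (82);
    the Hamiltonian approach tolerates an indefinite S.
    """
    n = A.shape[0]
    H = np.block([[A, -S], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    U = V[:, w.real < 0]              # basis of the stable eigenspace
    X = np.real(U[n:, :] @ np.linalg.inv(U[:n, :]))
    return X

# Scalar sanity check: A = 0, S = 1, Q = 1 gives -x^2 + 1 = 0 with
# stabilizing root x = 1 (closed-loop matrix A - S x = -1 is stable).
X = solve_are(np.array([[0.0]]), np.array([[1.0]]), np.array([[1.0]]))
res = np.array([[1.0]]) - X @ np.array([[1.0]]) @ X   # residual with A = 0
```

The stabilizing property of the solution corresponds to the eigenvalue selection w.real < 0: the closed-loop matrix A − SX inherits exactly the stable eigenvalues of the Hamiltonian.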

The subsequent development relies on Theorem 3 of [9]. We now present a version of this theorem adapted to the notation used in this paper. We first note that some


of the conditions of Theorem 3 of [9] are automatically satisfied. Indeed, using the notation

C̃1 := [ (1/√τ) R^{1/2} ; 0 ; C1 ],   D̃1 := [ 0 ; (1/√τ) G^{1/2} ; D1 ],

we obtain R_τ − Υ_τ G_τ^{−1} Υ_τ′ = τ C̃1′( I − D̃1(D̃1′D̃1)^{−1}D̃1′ ) C̃1 ≥ 0. Also, the pair (A − B1G_τ^{−1}Υ_τ′, R_τ − Υ_τ G_τ^{−1}Υ_τ′) is detectable since the matrix

[ A − sI   B1 ;  R^{1/2}  0 ;  0  G^{1/2} ;  √τ C1  √τ D1 ]

has full column rank for all s such that Re s ≥ 0.

Lemma 6. Consider the risk-sensitive optimal control problem (37) with underlying system (1). Suppose the pair

(83)  ( A − B2D2′(D2D2′)^{−1}C2,  B2( I − D2′(D2D2′)^{−1}D2 ) )

is stabilizable. Also, suppose that there exists a constant τ > 0 such that the following assumptions are satisfied:
(i) Algebraic Riccati equation (81) admits a minimal positive-definite solution Y∞.
(ii) Algebraic Riccati equation (82) admits a minimal nonnegative-definite solution X∞.
(iii) The matrix I − (1/τ) Y∞X∞ has only positive eigenvalues; that is, the spectral radius of the matrix Y∞X∞ satisfies the condition

(84)  ρ(Y∞X∞) < τ;

ρ(·) denotes the spectral radius of a matrix.
If Y∞ ≥ Y0, then there exists a controller solving risk-sensitive optimal control problem (37) where the infimum is taken over the set U. This optimal risk-sensitive controller is a controller of the form (2) with

(85)  K := −G_τ^{−1}( B1′X∞ + Υ_τ′ ),
      Ac := A + B1K − BcC2 + (1/τ)( B2 − BcD2 ) B2′X∞,
      Bc := ( I − (1/τ) Y∞X∞ )^{−1} ( Y∞C2′ + B2D2′ )( D2D2′ )^{−1}.

The corresponding optimal value of the risk-sensitive cost is given by

(86)  V_τ := inf_{u∈U} lim_{T→∞} ℓ_{τ,T}(u(·)) = tr[ Y∞R_τ + ( Y∞C2′ + B2D2′ )( D2D2′ )^{−1}( C2Y∞ + D2B2′ ) X∞ ( I − (1/τ) Y∞X∞ )^{−1} ].

Proof. See Theorem 3 of [9].
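Given Riccati solutions X∞ and Y∞, the controller matrices of (85) are a direct transcription, with the coupling condition (84) checked first. A sketch (the scalar inputs are hypothetical, chosen only to exercise the formulas, not solutions of (81)-(82)):

```python
import numpy as np

def minimax_controller(A, B1, B2, C2, D2, X, Y, Gt, Ups, tau):
    """Assemble K, Ac, Bc of (85) from the Riccati solutions X = X_inf,
    Y = Y_inf, after checking the spectral-radius condition (84)
    rho(Y X) < tau.
    """
    if max(abs(np.linalg.eigvals(Y @ X))) >= tau:
        raise ValueError("spectral-radius condition (84) fails")
    K = -np.linalg.solve(Gt, B1.T @ X + Ups.T)
    Bc = np.linalg.inv(np.eye(A.shape[0]) - Y @ X / tau) \
         @ (Y @ C2.T + B2 @ D2.T) @ np.linalg.inv(D2 @ D2.T)
    Ac = A + B1 @ K - Bc @ C2 + (B2 - Bc @ D2) @ B2.T @ X / tau
    return K, Ac, Bc

# Hypothetical scalar data for shape- and formula-checking only.
one = np.array([[1.0]])
K, Ac, Bc = minimax_controller(A=-one, B1=one, B2=one, C2=one, D2=one,
                               X=0.5 * one, Y=0.5 * one, Gt=one,
                               Ups=0.0 * one, tau=1.0)
```

In this toy instance ρ(Y X) = 0.25 < τ = 1, so (84) holds and the formulas yield K = −0.5, Bc = 2, Ac = −4.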


Remark 6. The condition Y∞ ≥ Y0 required by Lemma 6 is a technical condition needed to apply the results of [9] to risk-sensitive control problem (37). However, it can be seen from Lemma 6 that the resulting optimal risk-sensitive controller and the optimal risk-sensitive cost are independent of the matrix Y0. Therefore, the condition of Lemma 6 requiring Y∞ ≥ Y0 can always be satisfied by a suitable choice of the matrix Y0. Reference [9] does not address the issue of stability for the closed-loop system corresponding to the optimal risk-sensitive controller. However, Theorem 1 shows that the controller (2), (85) leads to a robustly stable closed-loop system. This fact is consistent with results showing that risk-sensitive controllers enjoy certain robustness properties; e.g., see [3, 20].

The following results show that the conditions of Lemma 6 are not only sufficient conditions, but also necessary conditions for the existence of a solution to the risk-sensitive optimal control problem under consideration, if such a solution is sought in the class of linear stabilizing controllers; cf. [8].

Lemma 7. Suppose the pair (83) is controllable and for some τ′ > 0 there exists an absolutely stabilizing controller ũ(·) ∈ U0 such that

(87)  V_{τ′}^0 := lim_{T→∞} ℓ_{τ′,T}(ũ) = inf_{u∈U0} lim_{T→∞} ℓ_{τ′,T}(u) < +∞.

Then there exists a constant τ > 0 which satisfies conditions (i)–(iii) of Lemma 6. Furthermore, if for this τ the corresponding pairs (Ac, Bc) and (Ac, K) defined by (85) are controllable and observable, respectively, then

(88)  V_τ^0 = tr[ Y∞R_τ + ( Y∞C2′ + B2D2′ )( D2D2′ )^{−1}( C2Y∞ + D2B2′ ) X∞ ( I − (1/τ) Y∞X∞ )^{−1} ].

In the proof of Lemma 7, the following proposition is used.

Proposition 5. Suppose the pair (83) is controllable. Then, for any controller u(·) ∈ U0, the pair (Ā, B̄) in the corresponding closed-loop system is controllable and the pair (Ā, R̄) is observable.

The proof of this proposition is given in Appendix B.

Proof of Lemma 7. We prove the lemma by contradiction. Suppose that for any τ > 0 at least one of conditions (i)–(iii) of Lemma 6 does not hold. That is, either (81) does not admit a positive-definite stabilizing solution, or (82) does not admit a nonnegative-definite stabilizing solution, or (84) fails to hold. Note that conditions (i)–(iii) of Lemma 6 are standard conditions arising in H∞ control. Since for any stabilizing controller u(·) ∈ U0 the corresponding matrix Ā is stable (see Proposition 5 and Lemma 3), it follows from standard results on H∞ control that if at least one of conditions (i)–(iii) of Lemma 6 fails to hold, then for any controller of the form (2)

(89)  ‖ [ C̃1   D̃1K ] ( jωI − Ā )^{−1} B̄ ‖_∞ ≥ 1;

see Theorem 3.1 of [10]. It is straightforward to verify that the conditions of Theorem 3.1 of [10] are satisfied. Furthermore, the strict bounded real lemma implies that the Riccati equation

(90)  Ā′X̄ + X̄Ā + (1/τ) X̄B̄B̄′X̄ + R̄ + τ C̄′C̄ = 0

does not have a stabilizing positive-definite solution. In this case, Lemma 5 implies that none of the controllers u(·) ∈ U0 leads to an absolutely stable closed-loop system. This leads to a contradiction with the assumption that an absolutely stabilizing


controller exists and belongs to U0. This completes the proof by contradiction that there exists a constant τ which satisfies conditions (i)–(iii) of Lemma 6.
It remains to prove (88). Note that Lemma 6 states that for each τ > 0 satisfying the conditions of that lemma, the optimal controller solving the risk-sensitive control problem (86) is the controller (2), (85). Furthermore, it is assumed that the state-space realization of this controller is controllable and observable, and hence the optimal controller from Lemma 6 belongs to the set U0. Therefore,
$$ V_\tau^0 = \inf_{u\in\mathcal{U}_0} \lim_{T\to\infty} \mathcal{F}_{\tau,T}(u(\cdot)) = \inf_{u\in\mathcal{U}} \lim_{T\to\infty} \mathcal{F}_{\tau,T}(u(\cdot)) = V_\tau. \tag{91} $$

From this observation, (88) follows.
We now define a set T ⊂ R as the set of constants τ ∈ R satisfying the conditions of Lemma 6. It follows from Lemma 6 that, for any τ ∈ T, the controller of the form (2) with coefficients given by (85) represents an optimal controller in the risk-sensitive control problem (37), which guarantees the optimal value (88).
Theorem 3. Assume that the pair (83) is controllable.
(i) Suppose that the set T is nonempty and that τ∗ ∈ T attains the infimum in
$$ \inf_{\tau\in\mathcal{T}} \frac{1}{2}\bigl(V_\tau^0 + \tau d\bigr), \tag{92} $$
where V_τ^0 is defined in (88). Then the corresponding controller u∗(·) := u_{τ∗}(·) of the form (2) defined by (85), with the pair (Ac, Bc) being controllable and the pair (Ac, K) being observable, is an output-feedback controller guaranteeing that
$$ \inf_{u\in\mathcal{U}_0} \sup_{\zeta\in\Xi} J(u,\zeta) \le \sup_{\zeta\in\Xi} J(u^*,\zeta) \le \inf_{\tau\in\mathcal{T}} \frac{1}{2}\bigl(V_\tau^0 + \tau d\bigr). \tag{93} $$

Furthermore, this controller is an absolutely stabilizing controller for the stochastic uncertain system (1), (8).
(ii) Conversely, if there exists an absolutely stabilizing minimax optimal controller ũ(·) ∈ U0 for the stochastic uncertain system (1), (8) such that
$$ \sup_{\zeta\in\Xi} J(\tilde u,\zeta) < \infty, $$
then the set T is nonempty. Moreover,
$$ \inf_{\tau\in\mathcal{T}} \frac{1}{2}\bigl(V_\tau^0 + \tau d\bigr) \le \sup_{\zeta\in\Xi} J(\tilde u,\zeta). \tag{94} $$
Proof. Part (i). The conditions of this part of the theorem guarantee that u∗(·) ∈ U0. Then V_{τ∗}^0 = V_{τ∗}. This fact together with Theorem 1 implies that
$$ \inf_{u\in\mathcal{U}_0} \sup_{\zeta\in\Xi} J(u,\zeta) \le \sup_{\zeta\in\Xi} J(u^*,\zeta) \le \frac{1}{2}\bigl(V_{\tau_*}^0 + \tau_* d\bigr) = \inf_{\tau\in\mathcal{T}} \frac{1}{2}\bigl(V_\tau^0 + \tau d\bigr). \tag{95} $$

Also from Theorem 1, the controller u∗(·) solving the corresponding risk-sensitive control problem is an absolutely stabilizing controller. From this observation, part (i) of the theorem follows.
Part (ii). Note that the controller ũ(·) ∈ U0 satisfies the conditions of Theorem 2; see Proposition 5. Let c be a constant such that
$$ \sup_{\zeta\in\Xi} J(\tilde u,\zeta) < c. $$


When proving Theorem 2, it was shown that there exists a constant τ > 0 such that the Riccati equation (90) has a stabilizing positive-definite solution and
$$ \frac{1}{2} \lim_{T\to\infty} \mathcal{F}_{\tau,T}(\tilde u) < c - \frac{\tau}{2}\, d < \infty; \tag{96} $$
see (80). Hence, V_τ^0 < ∞. From the above conditions and using Lemma 7, we conclude that the set T is nonempty.
We now prove (94). Consider a sequence {c_i}, i = 1, 2, …, such that
$$ c_i \downarrow \sup_{\zeta\in\Xi} J(\tilde u,\zeta) \quad \text{as } i\to\infty. $$
From (96) it follows that
$$ \inf_{\tau\in\mathcal{T}} \frac{1}{2}\bigl(V_\tau^0 + \tau d\bigr) < c_i. $$

Hence, letting i approach infinity leads to the satisfaction of (94).
The first part of Theorem 3 provides a sufficient condition for the existence of an optimal solution to the minimax LQG control problem considered in this section. This condition is given in terms of certain Riccati equations. This makes the result useful in practical controller design, since there is a wide range of software available for solving such Riccati equations.
In the control literature, there is a great deal of interest concerning the issue of conservatism in robust controller design. For example, a significant issue considered in [15, 16, 18] is to prove that the results on minimax optimal control considered in those papers are not conservative, in that the corresponding Riccati equations fail to have stabilizing solutions if the minimax optimal controller does not exist. Thus, the conditions for the existence of a minimax optimal controller presented in those papers are necessary and sufficient conditions. The second part of Theorem 3 is analogous to the necessity results of [15, 16, 18, 19]. It follows from this part of Theorem 3 that the controller u∗(·) constructed in the first part of Theorem 3 represents a minimax optimal controller in the subclass U_{0,stab} ⊂ U0 of stabilizing linear output-feedback controllers. This result is summarized in the following theorem.
Theorem 4. Assume that the conditions of part (i) of Theorem 3 are satisfied. Then, the controller u∗(·) constructed in part (i) of Theorem 3 is the minimax optimal controller such that
$$ \inf_{u\in\mathcal{U}_{0,\mathrm{stab}}} \sup_{\zeta\in\Xi} J(u,\zeta) = \sup_{\zeta\in\Xi} J(u^*,\zeta) = \inf_{\tau\in\mathcal{T}} \frac{1}{2}\bigl(V_\tau^0 + \tau d\bigr). \tag{97} $$

Proof. It was shown in part (i) of Theorem 3 that the controller u∗(·) belongs to the set U_{0,stab}. Hence,
$$ \inf_{u\in\mathcal{U}_{0,\mathrm{stab}}} \sup_{\zeta\in\Xi} J(u,\zeta) \le \sup_{\zeta\in\Xi} J(u^*,\zeta) \le \inf_{\tau\in\mathcal{T}} \frac{1}{2}\bigl(V_\tau^0 + \tau d\bigr). \tag{98} $$
Furthermore, condition (98) implies that
$$ \inf_{u\in\mathcal{U}_{0,\mathrm{stab}}} \sup_{\zeta\in\Xi} J(u,\zeta) < \infty. $$


Fig. 2. A two mass spring system.

That is, for any sufficiently small ε > 0, there exists a controller ũ(·) ∈ U_{0,stab} such that
$$ \sup_{\zeta\in\Xi} J(\tilde u,\zeta) \le \inf_{u\in\mathcal{U}_{0,\mathrm{stab}}} \sup_{\zeta\in\Xi} J(u,\zeta) + \varepsilon. $$
This controller satisfies the conditions of part (ii) of Theorem 3. Therefore, it follows from Theorem 3 that
$$ \inf_{\tau\in\mathcal{T}} \frac{1}{2}\bigl(V_\tau^0 + \tau d\bigr) \le \inf_{u\in\mathcal{U}_{0,\mathrm{stab}}} \sup_{\zeta\in\Xi} J(u,\zeta) + \varepsilon. $$
The above inequality holds for any arbitrarily small ε > 0. Therefore,
$$ \inf_{\tau\in\mathcal{T}} \frac{1}{2}\bigl(V_\tau^0 + \tau d\bigr) \le \inf_{u\in\mathcal{U}_{0,\mathrm{stab}}} \sup_{\zeta\in\Xi} J(u,\zeta). $$

This inequality together with (98) implies (97).
7. Illustrative example. We now consider the tracking problem which was used as an illustrative example in [15, 17]. In this tracking problem, the goal is to design an output-feedback controller so that the controlled output of a two-cart system tracks a reference step input. The system to be controlled is shown in Figure 2. As in [15, 17], the masses of the carts are assumed to be m1 = 1 and m2 = 1. Furthermore, the spring constant k is treated as an uncertain parameter subject to the bound 0.5 ≤ k ≤ 2.0. From this, a corresponding uncertain system was derived in [17]. This uncertain system is described by the following state equations:
$$ \begin{aligned} \dot x &= \begin{bmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ -1.25 & 1.25 & 0 & 0\\ 1.25 & -1.25 & 0 & 0 \end{bmatrix} x + \begin{bmatrix} 0\\ 0\\ 0\\ 1 \end{bmatrix} u + \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ -0.70 & 0 & 0\\ 0.80 & 0 & 0 \end{bmatrix} \xi,\\ z &= \begin{bmatrix} 1 & -1 & 0 & 0 \end{bmatrix} x,\\ y &= \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 \end{bmatrix} x + \begin{bmatrix} 0 & 0.05 & 0\\ 0 & 0 & 0.05 \end{bmatrix} \xi,\\ y_T &= \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} x. \end{aligned} \tag{99} $$
Here, the uncertainty is subject to an integral quadratic constraint which will be specified below. The output y_T is the output which is required to track a step input.


The control problem solved in [17] involved finding a controller which absolutely stabilized the system and also ensured that the output y_T tracks a reference step input. In [17], the system was transformed into the form
$$ \begin{aligned} \dot{\bar x} &= \begin{bmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & -2.5 & 0 \end{bmatrix} \bar x + \begin{bmatrix} 0\\ 0.5\\ 0\\ -0.5 \end{bmatrix} u + \begin{bmatrix} 0 & 0 & 0\\ 0.05 & 0 & 0\\ 0 & 0 & 0\\ -0.75 & 0 & 0 \end{bmatrix} \xi,\\ z &= \begin{bmatrix} 0 & 0 & 2 & 0 \end{bmatrix} \bar x,\\ \bar y &= \begin{bmatrix} 1 & 0 & 1 & 0\\ 1 & 0 & -1 & 0 \end{bmatrix} \bar x + \begin{bmatrix} 0 & 0.05 & 0\\ 0 & 0 & 0.05 \end{bmatrix} \xi,\\ y_T - \tilde y_T &= \begin{bmatrix} 1 & 0 & 1 & 0 \end{bmatrix} \bar x, \end{aligned} \tag{100} $$
where
$$ \bar y := y - \begin{bmatrix} 1\\ 1 \end{bmatrix} \eta. $$
Here, η denotes the state of the reference input signal model:
$$ \dot\eta = 0, \qquad \eta(0) = 1, \qquad \tilde y_T = \eta. \tag{101} $$

The above transformation involved the following change of variables:
$$ \begin{aligned} \bar x_1 &= (x_1 + x_2)/2 - \eta, & \bar x_2 &= (\dot x_1 + \dot x_2)/2 = (x_3 + x_4)/2,\\ \bar x_3 &= (x_1 - x_2)/2, & \bar x_4 &= (\dot x_1 - \dot x_2)/2 = (x_3 - x_4)/2. \end{aligned} \tag{102} $$
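Differentiating these new variables along the trajectories of (99) is a short verification that the transformation reproduces the rows of (100); e.g., for x̄2 and x̄4:

```latex
\begin{aligned}
\dot{\bar x}_2 &= \tfrac12(\dot x_3 + \dot x_4)
  = \tfrac12\bigl[-1.25(x_1-x_2) - 0.70\,\xi_1 + 1.25(x_1-x_2) + u + 0.80\,\xi_1\bigr]
  = 0.5\,u + 0.05\,\xi_1,\\
\dot{\bar x}_4 &= \tfrac12(\dot x_3 - \dot x_4)
  = -1.25(x_1-x_2) - 0.5\,u - 0.75\,\xi_1
  = -2.5\,\bar x_3 - 0.5\,u - 0.75\,\xi_1,
\end{aligned}
```

which are exactly the second and fourth rows of (100).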

To construct the required controller, the following cost function was used:
$$ \int_0^\infty \bigl[(y_T - \tilde y_T)^2 + 0.1\|\bar x\|^2 + u^2\bigr]\,dt. \tag{103} $$
Hence, the matrices R and G are as defined in [17]:
$$ R = \begin{bmatrix} 1.1 & 0 & 1 & 0\\ 0 & 0.1 & 0 & 0\\ 1 & 0 & 1.1 & 0\\ 0 & 0 & 0 & 0.1 \end{bmatrix} > 0, \qquad G = 1. $$

In (100), the uncertainty input ξ(·) has three components, ξ(·) = [ξ1(·), ξ2(·), ξ3(·)]′. The uncertainty input ξ1(·) describes the uncertainty in the spring rate. This uncertainty satisfies the constraint |ξ1(t)| ≤ |z(t)|.


The components ξ2 and ξ3 of the uncertainty input vector ξ are fictitious uncertainty inputs which were added to system (99) in [17] in order to fit this system into the framework of the method presented in that paper. Specifically, it was assumed in [17] that the uncertainty input ξ(·) satisfies the following integral quadratic constraint:
$$ \int_0^{t_i} \|\xi(t)\|^2\,dt \le \int_0^{t_i} \|z(t)\|^2\,dt + \bar x_0' S \bar x_0, \tag{104} $$

where {t_i} is a sequence of times as discussed in [17]. Also, in [17], the initial condition of system (100) was chosen to be x̄0 = [−1 0 0 0]′. This choice of the initial condition corresponds to a zero initial condition on the system dynamics and an initial condition of η(0) = 1 on the reference input dynamics. Also, the mismatch matrix S was chosen to be
$$ S = \begin{bmatrix} 0.1 & 0 & 0 & 0\\ 0 & 0.1 & 0 & 0\\ 0 & 0 & 0.1 & 0\\ 0 & 0 & 0 & 0.1 \end{bmatrix} > 0. $$
The output-feedback robust controller designed in [17] was a suboptimal time-varying controller. We now apply the controller design procedure presented in this paper to design a time-invariant output-feedback minimax optimal controller solving the above tracking problem. We will use the state space transformation (102), which reduces the original tracking problem to a regulator problem. However, in order to apply the results of this paper to this robust control problem, we must introduce a stochastic description of the system. To satisfy this requirement, a noise input will be added to the system, and the controller will be designed for the system with additive noise. That is, we replace the nominal system corresponding to (100) with ξ(·) ≡ 0 with a stochastic system described by the following stochastic differential equation:
$$ \begin{aligned} d\bar x &= \left( \begin{bmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & -2.5 & 0 \end{bmatrix} \bar x + \begin{bmatrix} 0\\ 0.5\\ 0\\ -0.5 \end{bmatrix} u \right) dt + \begin{bmatrix} 0 & 0 & 0\\ 0.05 & 0 & 0\\ 0 & 0 & 0\\ -0.75 & 0 & 0 \end{bmatrix} dW(t),\\ z &= \begin{bmatrix} 0 & 0 & 2 & 0 \end{bmatrix} \bar x,\\ d\bar y &= \begin{bmatrix} 1 & 0 & 1 & 0\\ 1 & 0 & -1 & 0 \end{bmatrix} \bar x\,dt + \begin{bmatrix} 0 & 0.05 & 0\\ 0 & 0 & 0.05 \end{bmatrix} dW(t),\\ y_T - \tilde y_T &= \begin{bmatrix} 1 & 0 & 1 & 0 \end{bmatrix} \bar x, \end{aligned} \tag{105} $$

where W (t) = [W1 (t), W2 (t), W3 (t)] is a 3-dimensional Wiener process on a certain measurable space (Ω, F, P ). Here, P is the reference probability measure. Also, the uncertain system (100) is replaced by an uncertain system of the form (105) considered on an uncertain measurable space defined using an uncertain martingale ζ(·). Also, as noted in section 2, uncertain systems of this type can be described using a stochastic differential equation of the form (13). System (105) is a system of the form (1) to which the design technique presented in this paper is applicable. Note that in this example, a robust controller is sought which stabilizes the system in the face of stochastic uncertainty. It can readily be shown using Lemma 5 that the absolute stability of the stochastic closed-loop system consisting of system


(105) and this controller implies the robust stability of the closed-loop system corresponding to the deterministic system (100) driven by the same linear output-feedback controller. Indeed, Lemma 5 shows that the corresponding Riccati equation (59) has a nonnegative-definite stabilizing solution. Then, using the strict bounded real lemma leads to the conclusion that the corresponding deterministic closed-loop system with norm-bounded uncertainty is quadratically stable [5]. Also, the corresponding deterministic closed-loop system with the uncertainty modeled using an integral quadratic constraint of the form (104) is absolutely stable [23]. It follows from this observation that a robust output-feedback controller designed for the uncertain stochastic system (105) also serves as a robust controller for the original uncertain system (100). Thus, a controller designed for the stochastic uncertain system (105) will solve the original tracking problem.
We now proceed to the derivation of a robust output-feedback controller for system (105). We first replace the integral quadratic constraint (104) by the following stochastic uncertainty constraint: For any T > 0,
$$ \frac{1}{2T}\, \mathbf{E}^{Q^T} \int_0^T \|\xi(t)\|^2\,dt \le \frac{1}{2T}\, \mathbf{E}^{Q^T} \int_0^T \|z(t)\|^2\,dt + d, \qquad d = \frac{1}{2}\, \bar x_0' S \bar x_0. \tag{106} $$
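Constraint (106) bounds the mean-square size of the uncertainty input ξ; as discussed in section 2, such quadratic bounds can be recast as relative entropy bounds of the form (8). A small numeric sketch of the underlying mechanism (ours, for illustration only, not from the paper): for a Gaussian reference measure, a drift perturbation of size μ carries relative entropy exactly μ²/2, which is how a mean-square bound on the perturbation becomes a relative entropy bound.

```python
import numpy as np

# Relative entropy h(Q || P) between Q = N(mu, 1) and the reference
# P = N(0, 1), computed by direct numerical integration of q*log(q/p).
# The closed form mu^2/2 shows how a quadratic (mean-square) bound on a
# drift perturbation turns into a relative entropy bound.
mu = 0.8
x = np.linspace(-12.0, 12.0, 240001)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)           # reference density
q = np.exp(-(x - mu)**2 / 2) / np.sqrt(2 * np.pi)    # perturbed density
h = float(np.sum(q * np.log(q / p)) * dx)            # h(Q || P)
print(abs(h - mu**2 / 2) < 1e-6)                     # True: h equals mu^2/2
```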

It was shown in section 2 that the uncertainty class defined by the constraint (106) can be embedded into an uncertainty class described by the corresponding relative entropy uncertainty constraint of the form (8). The cost functional is chosen to have the form
$$ \limsup_{T\to\infty} \frac{1}{2T} \int_0^T \mathbf{E}^{Q^T}\bigl[(y_T - \tilde y_T)^2 + 0.1\|\bar x\|^2 + u^2\bigr]\,dt. \tag{107} $$

We are now in a position to apply the design procedure outlined in Theorem 3. For each value of τ > 0, the Riccati equations (81) and (82) are solved, and then a line search is carried out to find the value of τ > 0 which attains the minimum of the function $\frac12(V_\tau^0 + \tau d)$ defined in Theorem 3. A graph of $\frac12(V_\tau^0 + \tau d)$ versus τ for this example is shown in Figure 3. It was found that the optimal value of the parameter τ is τ = 5.6931. With this optimal value of τ, the following positive-definite stabilizing solutions to Riccati equations (82) and (81) were obtained:

$$ X_\infty = \begin{bmatrix} 4.0028 & 6.8156 & -6.3708 & 3.8312\\ 6.8156 & 18.3891 & -20.6541 & 9.3784\\ -6.3708 & -20.6541 & 48.5330 & -5.8268\\ 3.8312 & 9.3784 & -5.8268 & 12.5738 \end{bmatrix}, $$
$$ Y_\infty = \begin{bmatrix} 0.0007 & 0.0003 & -0.0005 & -0.0014\\ 0.0003 & 0.0008 & -0.0017 & -0.0108\\ -0.0005 & -0.0017 & 0.0077 & 0.0236\\ -0.0014 & -0.0108 & 0.0236 & 0.1641 \end{bmatrix}. $$

Fig. 3. Cost bound $\frac12(V_\tau^0 + \tau d)$ versus the parameter τ.
Furthermore, a corresponding time-invariant controller of the form (2), (85) was constructed to be
$$ \begin{aligned} d\hat x &= \begin{bmatrix} -0.5868 & 1.0000 & 0.4581 & 0\\ -1.0384 & -2.3064 & 5.7466 & 0.7202\\ 0.4581 & 0 & -7.6627 & 1.0000\\ 2.6530 & 3.0582 & -34.7464 & 0.3817 \end{bmatrix} \hat x\,dt + \begin{bmatrix} 0.0643 & 0.5225\\ -0.8702 & 1.1403\\ 3.6023 & -4.0604\\ 13.2633 & -14.8366 \end{bmatrix} dy(t),\\ u &= \begin{bmatrix} -1.4922 & -4.5053 & 7.4137 & 1.5977 \end{bmatrix} \hat x. \end{aligned} $$

Then referring to system (99), the required tracking control system is constructed by replacing the time-varying controller of [17] with the above time-invariant controller, as shown in Figure 4. To verify the robust tracking properties of this control system, Figure 5 shows the step response of the system for various values of the spring constant parameter k. It can be seen from these plots that the stochastic minimax optimization approach of this paper leads to a robust tracking system which exhibits transient behavior similar to the behavior of the tracking system designed using the deterministic approach of [17]. However, the controller designed using the approach of this paper is time-invariant.
Appendix A. Relative entropy. This appendix presents a result on the duality between free energy and relative entropy which is exploited in this paper. This result is taken from [1]. Let (Ω, F) be a measurable space, and let P(Ω) be the set of probability measures on (Ω, F).


Fig. 4. Block diagram of a tracking control system.


Fig. 5. Control system step response for various spring constants.

Definition 4. Let P ∈ P(Ω), and ψ : Ω → R be a measurable function. The quantity
$$ E(\psi) := \log \int_\Omega e^{\psi(\omega)}\, P(d\omega) $$
is called the free energy of ψ with respect to P.
Definition 5. Given any two probability measures Q, P ∈ P(Ω), the relative entropy of the probability measure Q with respect to the probability measure P is defined by
$$ h(Q\|P) := \begin{cases} \displaystyle \int_\Omega \log\frac{dQ}{dP}\, Q(d\omega) & \text{if } Q \ll P \text{ and } \log\dfrac{dQ}{dP} \in L^1(\Omega,\mathcal{F},Q),\\[1ex] +\infty & \text{otherwise}. \end{cases} \tag{A.1} $$
In the above definition, dQ/dP is the Radon–Nikodým derivative of the probability measure Q with respect to the probability measure P. Note that the relative entropy is a convex, lower semicontinuous functional of Q; e.g., see [2]. It is shown in [1]


that the functions E(ψ) and h(Q‖P) are in duality with respect to a Legendre-type transform as follows.
Lemma 8. (i) For every Q ∈ P(Ω),
$$ h(Q\|P) = \sup_{\substack{\psi:\; e^{\psi}\in L^1(\Omega,\mathcal{F},P),\\ \psi \text{ bounded below}}} \left\{ \int_\Omega \psi\, Q(d\omega) - E(\psi) \right\}; \tag{A.2} $$
(ii) For every ψ bounded from below,
$$ E(\psi) = \sup_{\substack{Q\in\mathcal{P}(\Omega),\\ h(Q\|P)<\infty}} \left\{ \int_\Omega \psi\, Q(d\omega) - h(Q\|P) \right\}. \tag{A.3} $$
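The duality (A.2)–(A.3) becomes concrete in the finite (discrete) case, where all integrals are sums. The following sketch (ours, for illustration) checks that the supremum in (A.2) is attained at ψ = log(dQ/dP), where it equals h(Q‖P), and that randomly chosen ψ never do better:

```python
import numpy as np

# Discrete check of the Legendre-type duality between the free energy
# E(psi) = log sum_i p_i e^{psi_i} and the relative entropy h(Q || P):
# sup_psi { <psi, q> - E(psi) } = h(Q || P), attained at psi = log(q/p).
p = np.array([0.5, 0.3, 0.2])               # reference measure P
q = np.array([0.2, 0.5, 0.3])               # perturbed measure Q
h = float(np.sum(q * np.log(q / p)))        # relative entropy h(Q || P)

def dual_value(psi):                        # <psi, q> - log <e^psi, p>
    return float(q @ psi - np.log(p @ np.exp(psi)))

print(np.isclose(dual_value(np.log(q / p)), h))        # True: supremum attained
rng = np.random.default_rng(0)
trials = [dual_value(rng.normal(size=3)) for _ in range(1000)]
print(max(trials) <= h + 1e-12)                        # True: no psi does better
```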

We now consider the following two cases.
Case 1. Reλ > 0. In this case, we obtain the following bound on y′E[x(t)x′(t)]y:
$$ \begin{aligned} y' \mathbf{E}[x(t)x'(t)] y &= y' e^{At}\, \mathbf{E}[x(0)x'(0)]\, e^{A't} y + \int_0^t y' e^{A(t-s)} B_2 B_2' e^{A'(t-s)} y\, ds\\ &= e^{2\operatorname{Re}\lambda\, t}\, y' \mathbf{E}[x(0)x'(0)] y + \int_0^t e^{2\operatorname{Re}\lambda\,(t-s)}\, y' B_2 B_2' y\, ds\\ &\ge \frac{e^{2\operatorname{Re}\lambda\, t} - 1}{2\operatorname{Re}\lambda}\, y' B_2 B_2' y. \end{aligned} $$
Thus for any T > 0,
$$ \frac{1}{T} \int_0^T y' \mathbf{E}[x(t)x'(t)] y\, dt \ge y' B_2 B_2' y \cdot \frac{1}{T} \int_0^T \frac{e^{2\operatorname{Re}\lambda\, t} - 1}{2\operatorname{Re}\lambda}\, dt. \tag{B.3} $$
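The divergence of the right-hand side of (B.3) is easy to see numerically in the scalar case; the following sketch (ours) averages the integrand (e^{2Reλ t} − 1)/(2Reλ) over [0, T] for growing T:

```python
import numpy as np

# Time average of the Case 1 integrand (e^{2at} - 1)/(2a), a = Re(lambda) > 0.
# The average grows without bound as T grows, so the left-hand side of (B.3)
# cannot remain bounded -- the contradiction used in the proof.
a = 0.3

def average(T, n=100001):
    t = np.linspace(0.0, T, n)
    return float(np.mean((np.exp(2 * a * t) - 1) / (2 * a)))

print(average(10.0) < average(20.0) < average(40.0))   # True: diverges with T
```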


Case 2. Reλ = 0. In this case, we obtain the following bound on y′E[x(t)x′(t)]y:
$$ \begin{aligned} y' \mathbf{E}[x(t)x'(t)] y &= y' e^{At}\, \mathbf{E}[x(0)x'(0)]\, e^{A't} y + \int_0^t y' e^{A(t-s)} B_2 B_2' e^{A'(t-s)} y\, ds\\ &= y' \mathbf{E}[x(0)x'(0)] y + \int_0^t y' B_2 B_2' y\, ds \;\ge\; t \cdot y' B_2 B_2' y. \end{aligned} $$
Thus for any T > 0,
$$ \frac{1}{T} \int_0^T y' \mathbf{E}[x(t)x'(t)] y\, dt \ge y' B_2 B_2' y \cdot \frac{1}{T} \cdot \frac{T^2}{2} = y' B_2 B_2' y\, \frac{T}{2}. \tag{B.4} $$
Since y′B₂B₂′y > 0, the expressions on the right-hand side of inequalities (B.3) and (B.4) both approach infinity as T → ∞. That is, in both cases,
$$ \limsup_{T\to\infty} \frac{1}{T} \int_0^T y' \mathbf{E}[x(t)x'(t)] y\, dt = \infty. $$

This yields the desired contradiction with (B.1).
Proof of Proposition 5. Note that by definition, for any controller u(·) ∈ U0, the corresponding pair (Ac, Bc) is controllable and the pair (Ac, K) is observable.
To prove the controllability of the pair (Ā, B̄), we first note that the matrix
$$ \begin{bmatrix} A' - sI & C_2'\\ B_2' & D_2' \end{bmatrix} \tag{B.5} $$
has full column rank for all s ∈ C [25]. Next, consider the matrix pair
$$ (\bar A, \bar B) = \left( \begin{bmatrix} A & B_1 K\\ B_c C_2 & A_c \end{bmatrix},\; \begin{bmatrix} B_2\\ B_c D_2 \end{bmatrix} \right). \tag{B.6} $$
For this matrix pair to be controllable, the equations
$$ \begin{aligned} (A - sI)' x_1 + C_2' B_c' x_2 &= 0, &\text{(B.7a)}\\ B_2' x_1 + D_2' B_c' x_2 &= 0, &\text{(B.7b)}\\ K' B_1' x_1 + (A_c - sI)' x_2 &= 0 &\text{(B.7c)} \end{aligned} $$
must imply that x1 = 0 and x2 = 0 for every s ∈ C. Equations (B.7a) and (B.7b) can be written as follows:
$$ \begin{bmatrix} A' - sI & C_2'\\ B_2' & D_2' \end{bmatrix} \begin{bmatrix} x_1\\ B_c' x_2 \end{bmatrix} = 0. $$
It was noted above that the matrix (B.5) has full column rank for all s ∈ C. Hence, the above equation and (B.7c) imply that
$$ x_1 = 0, \qquad B_c' x_2 = 0, \qquad (A_c - sI)' x_2 = 0. $$
Since the pair (Ac, Bc) is controllable, the last two equations imply that x2 = 0. Thus, the pair (B.6) is controllable.
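The chain of implications above can be exercised on a small numeric instance (ours, purely illustrative): with (Ac, Bc) controllable and a plant for which the full-rank condition (B.5) holds for every s, the block pair (B.6) is controllable, as confirmed by the rank of the controllability matrix:

```python
import numpy as np

# Toy scalar instance of (B.6): Abar = [[A, B1 K], [Bc C2, Ac]],
# Bbar = [[B2], [Bc D2]].  With D2 = 0 here, the matrix in (B.5) has a
# constant nonzero determinant, hence full column rank for every s.
A,  B1, B2 = np.array([[0.0]]), np.array([[1.0]]), np.array([[1.0]])
C2, D2     = np.array([[1.0]]), np.array([[0.0]])
Ac, Bc, K  = np.array([[-1.0]]), np.array([[1.0]]), np.array([[1.0]])
Abar = np.block([[A, B1 @ K], [Bc @ C2, Ac]])
Bbar = np.vstack([B2, Bc @ D2])
ctrb = np.hstack([Bbar, Abar @ Bbar])        # controllability matrix [B, AB]
print(np.linalg.matrix_rank(ctrb))           # 2: full rank, so controllable
```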


In order to prove the observability of the pair (Ā, R̄), we need to show that the equations
$$ \begin{aligned} (A - sI) x_1 + B_1 K x_2 &= 0, &\text{(B.8a)}\\ B_c C_2 x_1 + (A_c - sI) x_2 &= 0, &\text{(B.8b)}\\ R^{1/2} x_1 &= 0, &\text{(B.8c)}\\ G^{1/2} K x_2 &= 0 &\text{(B.8d)} \end{aligned} $$

imply that x1 = 0, x2 = 0 for every s ∈ C. Indeed, since the matrices R, G are positive-definite, it follows from (B.8c) and (B.8d) that x1 = 0 and Kx2 = 0. Using these equations, we also obtain from (B.8b) that (Ac − sI)x2 = 0. Since the pair (Ac, K) is observable, this implies that x2 = 0. Thus, the pair (Ā, R̄) is observable.

REFERENCES

[1] P. Dai Pra, L. Meneghini, and W. Runggaldier, Connections between stochastic control and dynamic games, Math. Control Signals Systems, 9 (1996), pp. 303–326.
[2] P. Dupuis and R. Ellis, A Weak Convergence Approach to the Theory of Large Deviations, Wiley, New York, 1997.
[3] P. Dupuis, M. R. James, and I. R. Petersen, Robust properties of risk-sensitive control, in Proceedings of the IEEE Conference on Decision and Control, Tampa, FL, 1998, pp. 2365–2370.
[4] I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus, Springer-Verlag, New York, 1988.
[5] P. P. Khargonekar, I. R. Petersen, and K. Zhou, Robust stabilization of uncertain systems and H∞ optimal control, IEEE Trans. Automat. Control, AC-35 (1990), pp. 356–361.
[6] R. S. Liptser and A. N. Shiryayev, Statistics of Random Processes. I. General Theory, Springer-Verlag, New York, 1977.
[7] D. G. Luenberger, Optimization by Vector Space Methods, Wiley, New York, 1969.
[8] D. Mustafa and K. Glover, Minimum Entropy H∞ Control, Springer-Verlag, Berlin, 1990.
[9] Z. Pan and T. Başar, Model simplification and optimal control of stochastic singularly perturbed systems under exponentiated quadratic cost, SIAM J. Control Optim., 34 (1996), pp. 1734–1766.
[10] I. R. Petersen, B. D. O. Anderson, and E. A. Jonckheere, A first principles solution to the non-singular H∞ control problem, Internat. J. Robust Nonlinear Control, 1 (1991), pp. 171–185.
[11] I. R. Petersen and M. R. James, Performance analysis and controller synthesis for nonlinear systems with stochastic uncertainty constraints, Automatica, 32 (1996), pp. 959–972.
[12] I. R. Petersen, M. R. James, and P. Dupuis, Minimax optimal control of stochastic uncertain systems with relative entropy constraints, IEEE Trans. Automat. Control, 45 (2000), pp. 398–412.
[13] J. F. Randolph, Basic Real and Abstract Analysis, Academic Press, New York, 1968.
[14] T. Runolfsson, The equivalence between infinite-horizon optimal control of stochastic systems with exponential-of-integral performance index and stochastic differential games, IEEE Trans. Automat. Control, 39 (1994), pp. 1551–1563.
[15] A. V. Savkin and I. R. Petersen, Minimax optimal control of uncertain systems with structured uncertainty, Internat. J. Robust Nonlinear Control, 5 (1995), pp. 119–137.
[16] A. V. Savkin and I. R. Petersen, Nonlinear versus linear control in the absolute stabilizability of uncertain linear systems with structured uncertainty, IEEE Trans. Automat. Control, 40 (1995), pp. 122–127.
[17] A. V. Savkin and I. R. Petersen, Output feedback guaranteed cost control of uncertain systems on an infinite time interval, Internat. J. Robust Nonlinear Control, 7 (1997), pp. 43–58.
[18] V. A. Ugrinovskii and I. R. Petersen, Absolute stabilization and minimax optimal control of uncertain systems with stochastic uncertainty, SIAM J. Control Optim., 37 (1999), pp. 1089–1122.
[19] V. A. Ugrinovskii and I. R. Petersen, Finite horizon minimax optimal control of stochastic partially observed time varying uncertain systems, Math. Control Signals Systems, 12 (1999), pp. 1–23.


[20] V. A. Ugrinovskii and I. R. Petersen, Robust output feedback stabilization via risk-sensitive control, in Proceedings of the IEEE Conference on Decision and Control, Phoenix, AZ, 1999, pp. 546–551.
[21] V. A. Ugrinovskii and I. R. Petersen, Robust stability and performance of stochastic uncertain systems on an infinite time interval, Systems Control Lett., to appear.
[22] J. C. Willems, Least-squares stationary optimal control and the algebraic Riccati equation, IEEE Trans. Automat. Control, AC-16 (1971), pp. 621–634.
[23] V. A. Yakubovich, Dichotomy and absolute stability of nonlinear systems with periodically nonstationary linear part, Systems Control Lett., 11 (1988), pp. 221–228.
[24] M. Zakai, A Lyapunov criterion for the existence of stationary probability distributions for systems perturbed by noise, SIAM J. Control Optim., 7 (1969), pp. 390–397.
[25] K. Zhou, J. Doyle, and K. Glover, Robust and Optimal Control, Prentice-Hall, Upper Saddle River, NJ, 1996.