IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 50, NO. 1, JAN. 2005 (TO APPEAR; SUBMITTED MAY 2003, REVISED MAY AND OCTOBER 2004)
H∞ Control and Estimation with Preview—Part II: Fixed-Size ARE Solutions in Discrete Time

Gilead Tadmor and Leonid Mirkin
Abstract—H∞ preview control and fixed-lag smoothing problems are solved for general discrete-time linear systems, via a reduction to equivalent open-loop differential games. To avoid the high-order Riccati equations found in some existing solutions, the state of the Hamilton-Jacobi system (HJS) resides in a quotient space of an auxiliary extended state space system. The dimension of that auxiliary space is equal to the state space dimension of the original system (ignoring the delay).

Index Terms—Preview control; fixed-lag smoothing; differential games; H∞ optimization.
I. INTRODUCTION

This paper studies the discrete-time H∞ preview tracking and fixed-lag smoothing problems, thus complementing [1], where continuous-time results are presented. In the preview tracking problem, preview information is represented as a delay in the exogenous input, whereby the complete state comprises both the difference equation's state and the exogenous input history over the preview interval. In the fixed-lag smoothing problem, preview information shows up in the delay between the measurement and the estimation generation. The reader is referred to [1] and to references therein for a detailed motivation.

One might be tempted to consider discrete-time problems with preview as simpler than the corresponding continuous-time problems, since the delay operator in discrete time is finite dimensional. A straightforward and standard trick is to incorporate the delay into an augmented state dynamics, resulting in a finite-dimensional delay-free problem. While this is certainly possible, the result is an increase in state dimension, hence in both the computational and conceptual burden of treating such systems, a burden that grows rapidly with the delay. Indeed, smoothing results formulated in terms of such high-dimensional models [2]–[4] are computationally involved and do not enable one to account for the effect of preview on achievable performance. To the best of our knowledge, previous attempts to simplify such high-dimensional results to low-dimensional formulae, similar to what was done in the H2 case [5], fell short of providing a complete solution. Recent studies [6]–[8] provide solution procedures based upon a combination of the standard (low-dimensional) filtering H∞ algebraic Riccati equation (ARE) and an iterative procedure, to verify the solvability of the smoothing problem. Yet it is easy to construct examples (see, e.g., §III-B) where the filtering ARE does not admit a stabilizing solution, or where the iterative procedure fails, under some performance levels γ for which the smoothing problem is solvable. In other words, the solvability conditions in [6]–[8] are only sufficient. The J-spectral factorization approach of [9] is also based on the (potentially ill-posed) filtering ARE.

In this paper we propose a first complete solution to the preview H∞ control and estimation problems, which is based on fixed-size (equal to the original state space dimension and independent of the preview interval) AREs. We derive necessary and sufficient solvability conditions in terms of two AREs: one is the standard H2 ARE and the other is a nonstandard H∞-like ARE. We also do not impose any restrictive assumptions on the plant realization, beyond what is standard in systems without delays. For example, we do not require the nonsingularity of either the "A" or the "D2" matrices of the plant (the invertibility of these matrices is critical in [7]–[9]).

A difficulty similar to the one arising in the analysis of discrete augmented systems has been encountered in continuous-time systems with delayed control, e.g., in [10], [11], and resolved by exploiting the interplay between the differential game's solution in the original ODE and in an extended state space model. The former provides a computation of the optimal cost in terms of fixed-size matrix Riccati equations. The kernel of the quadratic form of the optimal cost is the solution of the Riccati equation in the augmented model, and is used to provide explicit expressions for suboptimal feedback gains. Direct application of these ideas to the preview tracking problem is not straightforward, as here the state projection of the stable subspace of the Hamilton-Jacobi system (HJS) need not cover the entire original state space.

(This research was supported by the U.S.-Israel Binational Science Foundation (grant no. 2000167) and the Israel Science Foundation (grant no. 106/01). G. Tadmor is with the Electrical & Computer Engineering Dept., Northeastern University, Boston, MA 02115, U.S.A.; e-mail: [email protected]. L. Mirkin is with the Faculty of Mechanical Engineering, Technion—IIT, Haifa 32000, Israel; e-mail: [email protected].)
This difficulty is exacerbated by other mischievous aspects of discrete-time systems, such as a possibly singular "A". An enabling observation is that the state and co-state in the partition of a HJS should capture the respective causal effects of past inputs and the effects of future input decisions. Having identified the appropriate HJS, via its state and co-state, suboptimal values are characterized in terms of the associated ARE, and the form of the (central) suboptimal compensator is found by an analysis of the interplay between that system and the auxiliary model. Conceptually, the HJS state is similar to the quotient state obtained by the so-called structural F-operator [12]–[14], in systems with state delays.

The paper is organized as follows. In Section II the main results are formulated: preview control is addressed in §II-A and fixed-lag smoothing in §II-B. In Section III two simple examples illustrating our results are presented. Section IV
is devoted to the proof of the preview control result (the smoothing counterpart follows by duality arguments). Finally, concluding remarks are provided in Section V.

A. Notations

The notations of this paper basically follow those of [1], with the obvious replacement of L2 by ℓ2. To simplify the exposition, we write A′^m to denote (A′)^m. The notation η̆k stands for the finite-window history of length l of a signal η at the time k: η̆k = [η′_{k−1} η′_{k−2} . . . η′_{k−l}]′.

II. MAIN RESULTS

A. Preview control

Consider an LTI system

x_{k+1} = A xk + B1 w_{k−l} + B2 uk,
zk = C xk + D1 w_{k−l} + D2 uk.  (1)

The delay, l ≥ 1, in the action of the exogenous input w reflects preview information, available for control decisions. An admissible triplet (x0, w, u) ∈ Rn × ℓ2 × ℓ2 is one that is associated with a state trajectory x ∈ ℓ2 in (1). The optimal value is

γopt := inf_{uk = f(xk, wk, w̆k)} ‖Twz : w ↦ z‖_{ℓ2}.

Alternative definitions of γopt, e.g., where x0 is included as part of the exogenous input, are easily incorporated. The following are standard hypotheses that guarantee the well-posedness of the problem:

A1: (A, B2) is stabilizable.
A2: The realization [A, B2, C, D2] has no invariant zeros on the unit circle and is left invertible.

It is worth emphasizing that we do not impose conventional "simplifying" assumptions, such as that det(A) ≠ 0, D1 = 0, or that D2 has full column rank. In the discrete-time case, these assumptions hardly simplify the arguments and the formulation of final results. Indeed, where they seem to do so, simplicity tends to be accompanied by incomplete solutions.

The statement of the main result for the preview problem is made in terms of a succession of definitions that now follows. Let Xκ ≥ 0 be the stabilizing solution of the H2 algebraic Riccati equation (ARE)

Xκ = C′C + A′XκA − (A′XκB2 + C′D2) ϒκ^{−1} (B2′XκA + D2′C),  (2)

where ϒκ := B2′XκB2 + D2′D2 > 0. The strict positivity of ϒκ and the existence and properties of this solution, under A1 and A2, are well known and will be reviewed in Section IV. In these terms, define

[Aκ Bκ; Cκ Dκ] := (I − [B2; D2] ϒκ^{−1} [B2′Xκ  D2′]) [A B1; C D1],
[Eκ′; Qκ] := [Aκ Bκ; Cκ Dκ]′ [XκBκ; Dκ],
Gk := Σ_{i=0}^{k−1} Aκ^i B2 ϒκ^{−1} B2′ Aκ′^i,
Bk := Bκ − Aκ Gk Eκ′,
Ψ := [Bl  Aκ^l B2],

where we ignore summation over empty index sets, so that Gi = 0 whenever i ≤ 0, and then

Λγ := Qκ − Eκ Gl Eκ′ − γ²I,
Aγ := Aκ − Bl Λγ^{−1} Eκ,
Φγ := [Λγ 0; 0 ϒκ].

The definition of Aγ is obviously valid only when Λγ is nonsingular, a fact that will be established before any use of this notation. The main result of this paper, which will be proved in Section IV, is formulated as follows:

Theorem 2.1: The following two statements are equivalent:
1) γ > γopt;
2) Λγ < 0 and there exists the stabilizing solution Xγ ≥ 0 to the ARE

Xγ = Aγ′ Xγ (I + Ψ Φγ^{−1} Ψ′ Xγ)^{−1} Aγ − Eκ′ Λγ^{−1} Eκ  (3)

such that the inertias of ϒγ := Φγ + Ψ′XγΨ and of diag{−I, I} coincide.

Assume that, indeed, γ > γopt and let Xγ be the said solution of (3). Then one stabilizing, strictly γ-suboptimal complete information feedback control law is given by

uk = −ϒγ22^{−1} ( [B2′(Xκ + Aκ′^l Xγ Aκ^l)  D2′] [A B1; C D1] [xk; w_{k−l}] + B2′ Σ_{i=−l}^{−1} (Aκ′^l Xγ Aκ^{−(i+1)} B_{l+i} + Aκ′^{l+i} Eκ′) w_{k+i+1} ),  (4)

where ϒγ22 is the (2, 2) sub-block of ϒγ, with the partitioning compatible with that of Φγ.

Remark 2.1: Several parts of the ARE (3) involve the inverse of the matrix Λγ. Although by the solvability conditions this matrix is strictly negative, it might be close to singular as γ approaches γopt. To avoid the inversion of this potentially ill-conditioned matrix, it might be more convenient to express (3) in terms of the stable deflating subspace of the extended symplectic matrix pencil

[Aκ 0 Bl; 0 −I Eκ′; Eκ 0 Λγ] − z [I  Aκ^l B2 ϒκ^{−1} B2′ Aκ′^l  0; 0 −Aκ′ 0; 0 −Bl′ 0];

see [15] for relevant definitions.
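The H2 ARE (2) and the derived quantities ϒκ, Aκ and Gk can be computed with standard numerical tools. The sketch below (Python/NumPy; the system matrices are an arbitrary toy example chosen for illustration, not data from the paper) solves (2) by the plain Riccati difference iteration — `scipy.linalg.solve_discrete_are` is an off-the-shelf alternative — and also forms Gk both from its defining sum and from the recursion G_{k+1} = Aκ Gk Aκ′ + B2 ϒκ^{−1} B2′ that the definition implies.

```python
import numpy as np

# Toy data (an assumption for illustration only): unstable A, scalar control.
A  = np.array([[1.2, 0.1],
               [0.0, 0.8]])
B2 = np.array([[0.0],
               [1.0]])
C  = np.array([[1.0, 0.0],
               [0.0, 0.0]])
D2 = np.array([[0.0],
               [1.0]])

Q, R, S = C.T @ C, D2.T @ D2, C.T @ D2

# Riccati difference iteration for (2); under A1/A2 (stabilizability and the
# zero condition) it converges from X = 0 to the stabilizing solution.
X = np.zeros_like(A)
for _ in range(10000):
    U  = R + B2.T @ X @ B2
    Xn = Q + A.T @ X @ A \
         - (A.T @ X @ B2 + S) @ np.linalg.solve(U, B2.T @ X @ A + S.T)
    if np.max(np.abs(Xn - X)) < 1e-13:
        X = Xn
        break
    X = Xn

Upsilon = R + B2.T @ X @ B2                       # ϒκ = B2'XκB2 + D2'D2
F  = np.linalg.solve(Upsilon, B2.T @ X @ A + D2.T @ C)
Ak = A - B2 @ F                                   # closed-loop matrix Aκ
rho = max(abs(np.linalg.eigvals(Ak)))             # spectral radius; Schur if < 1

# Gk via the defining sum and via the one-step recursion.
M = B2 @ np.linalg.solve(Upsilon, B2.T)
l = 5
G_sum = sum(np.linalg.matrix_power(Ak, i) @ M @ np.linalg.matrix_power(Ak.T, i)
            for i in range(l))
G_rec = np.zeros_like(A)
for _ in range(l):
    G_rec = Ak @ G_rec @ Ak.T + M
```

The residual of (2), the Schur property of Aκ, the match of the two Gl computations, and the Lyapunov rewriting Xκ = Aκ′XκAκ + Cκ′Cκ (used in the proof in Section IV) all serve as sanity checks on the computed solution.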
Remark 2.2: As in the continuous-time case, there exists a connection between the stabilizing solution Xγ to the H∞ ARE (3) and the stabilizing solution X̃γ to the standard state-feedback H∞ ARE (when the latter exists). Following the arguments in [1], it can be shown that

X̃γ = Xκ + Xγ (I − Gl Xγ)^{−1}  ⟺  Xγ = (I + (X̃γ − Xκ) Gl)^{−1} (X̃γ − Xκ).

Hence, X̃γ exists only if I − Gl Xγ is nonsingular. This condition, however, might fail when the preview problem is solvable.
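The mutual inversion formulas in Remark 2.2 are purely algebraic, so they can be spot-checked numerically. In the sketch below (Python/NumPy; the matrices are random symmetric placeholders standing in for Xκ, Xγ and Gl, not solutions of the actual AREs), X̃γ is formed by the first formula and Xγ is then recovered by the second.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def rand_sym(n):
    # Random symmetric matrix (placeholder for an ARE solution).
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

Xk = rand_sym(n)          # stands in for Xκ
Xg = 0.1 * rand_sym(n)    # stands in for Xγ (scaled so I − Gl Xγ is nonsingular)
Gl = rand_sym(n)

# X̃γ = Xκ + Xγ (I − Gl Xγ)^{−1}
Xt = Xk + Xg @ np.linalg.inv(np.eye(n) - Gl @ Xg)

# Recover Xγ = (I + (X̃γ − Xκ) Gl)^{−1} (X̃γ − Xκ)
Xg_back = np.linalg.solve(np.eye(n) + (Xt - Xk) @ Gl, Xt - Xk)
```

Since the two formulas are exact algebraic inverses of one another, `Xg_back` reproduces `Xg` to machine precision whenever I − Gl Xγ is nonsingular.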
B. Fixed-lag smoothing

This is a variant of the H∞ estimation (output reconstruction) problem, where a finite delay is allowed:

x_{k+1} = A xk + B wk,
zk = C1 xk + D1 wk,
yk = C2 xk + D2 wk,
ek = z_{k−l} − ẑk,  ẑ = F y,  (5)

where, as in [1], the (stable, causal, linear) system F is defined by a full state Luenberger observer, and is used to reconstruct the output z from the measured signal y. The delay, z_{k−l}, in the definition of the reconstruction error ek represents the available latency in the reconstruction problem. As usual, it is required that the closed-loop system governing the propagation of state estimation errors be stable. The purpose of design is to minimize the induced ℓ2 norm of the closed-loop mapping Twe : w ↦ e, and "γopt" stands, again, for the infimal attainable induced ℓ2 norm of that mapping. The following are the counterparts of A1 and A2:

A3: (C2, A) is detectable.
A4: The realization [A, B, C2, D2] has no invariant zeros on the unit circle and is right invertible.

Introduce now the counterparts of the notations in the previous subsection. Let Yκ be the stabilizing solution of

Yκ = BB′ + AYκA′ − (AYκC2′ + BD2′) ϒ̄κ^{−1} (C2YκA′ + D2B′),

where ϒ̄κ := C2YκC2′ + D2D2′. Assumptions A3 and A4 guarantee that ϒ̄κ > 0 and that the stabilizing solution Yκ ≥ 0 exists. In these terms, define

[Āκ B̄κ; C̄κ D̄κ] := [A B; C1 D1] (I − [YκC2′; D2′] ϒ̄κ^{−1} [C2  D2]),
[Ēκ; Q̄κ] := [Āκ B̄κ; C̄κ D̄κ] [YκC̄κ′; D̄κ′],
Ḡk := Σ_{i=0}^{k−1} Āκ′^i C2′ ϒ̄κ^{−1} C2 Āκ^i,
Ck := C̄κ − Ēκ′ Ḡk Āκ,
Ψ̄ := [Cl; C2 Āκ^l],

and then

Λ̄γ := Q̄κ − Ēκ′ Ḡl Ēκ − γ²I,
Āγ := Āκ − Ēκ Λ̄γ^{−1} Cl,
Φ̄γ := [Λ̄γ 0; 0 ϒ̄κ].

The main result concerning the smoothing problem is then formulated as follows:

Theorem 2.2: The following two statements are equivalent:
1) γ > γopt;
2) Λ̄γ < 0 and there exists the stabilizing solution Yγ ≥ 0 to the ARE

Yγ = Āγ (I + Yγ Ψ̄′ Φ̄γ^{−1} Ψ̄)^{−1} Yγ Āγ′ − Ēκ Λ̄γ^{−1} Ēκ′  (6)

such that the inertias of ϒ̄γ := Φ̄γ + Ψ̄ Yγ Ψ̄′ and of diag{−I, I} coincide.

Assume that, indeed, γ > γopt and let Yγ be the said solution of (6). Then one stable, strictly γ-suboptimal smoother is given by

x̂_{k+1} = A x̂k + (A(Yκ + Āκ^l Yγ Āκ′^l)C2′ + BD2′) ϒ̄γ22^{−1} εk,
ẑk = C1 x̂_{k−l} + Σ_{i=−l}^{−1} (C_{l+i} Āκ^{−(i+1)} Yγ Āκ′^l + Ēκ′ Āκ′^{l+i}) C2′ ϒ̄γ22^{−1} ε_{k+i+1} + (C1(Yκ + Āκ^l Yγ Āκ′^l)C2′ + D1D2′) ϒ̄γ22^{−1} ε_{k−l},  (7)

where εk := yk − C2 x̂k and ϒ̄γ22 is the (2, 2) sub-block of ϒ̄γ, with the partitioning compatible with that of Φ̄γ.

Remark 2.3: Note that the H2 smoothing result can be thought of as a particular case of Theorem 2.2 for γ → ∞. It can be verified that in this case Yγ = 0 and ϒ̄γ22 = ϒ̄κ, so that the smoother in (7) is simplified.

III. ILLUSTRATIVE EXAMPLES

In this section we consider two simple examples that illustrate our approach and its advantages over previously available solutions.

A. Example 1

The first example is the discrete-time version of the continuous-time example in [1, Section III]. It is discretized here with the sampling period T = 0.05, admitting the form

[A B1 B2] = [0.9987 0.0512 0.0500 0.0013; −0.0512 1.0500 −0.0013 0.0512],
[C D1 D2] = [1 −α 0 0; 0 0 0 1].

Applying the conditions of Theorem 2.1 to this system with α = 0.2 leads to the plot of γopt versus l in Fig. 1.

[Fig. 1. Tracking performance vs. the length of preview in Example 1: achievable H∞ performance γopt (vertical axis) against the preview length l (horizontal axis).]

This curve practically repeats its continuous-time counterpart in [1]. Remarkably, it also saturates at a finite prediction horizon l. This phenomenon is known to be generic in
continuous-time systems [16], [17], yet its applicability to the discrete-time case had been an open question for quite some time. Indeed, Theorem 2.1 does not provide readily useful means to predict performance saturation. The genericity of this phenomenon (when lim_{l→∞} γopt ≠ 0) in the discrete-time case was recently proven in [18].
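The γ-independent ingredient of Theorem 2.1 for Example 1 — the H2 ARE (2) and the resulting Schur matrix Aκ — can be reproduced numerically. The sketch below (Python/NumPy) solves (2) for the matrices above with α = 0.2 by Riccati iteration; reproducing the full γopt-versus-l curve of Fig. 1 would additionally require the H∞ ARE (3) and a bisection over γ, which is not attempted here.

```python
import numpy as np

alpha = 0.2
A  = np.array([[ 0.9987, 0.0512],
               [-0.0512, 1.0500]])
B2 = np.array([[0.0013],
               [0.0512]])
C  = np.array([[1.0, -alpha],
               [0.0,  0.0]])
D2 = np.array([[0.0],
               [1.0]])

Q, R, S = C.T @ C, D2.T @ D2, C.T @ D2   # here S = 0

# Riccati difference iteration for the H2 ARE (2), started from X = 0.
X = np.zeros_like(A)
for _ in range(20000):
    W  = R + B2.T @ X @ B2
    Xn = Q + A.T @ X @ A \
         - (A.T @ X @ B2 + S) @ np.linalg.solve(W, B2.T @ X @ A + S.T)
    if np.max(np.abs(Xn - X)) < 1e-13:
        X = Xn
        break
    X = Xn

Upsilon = R + B2.T @ X @ B2
Ak = A - B2 @ np.linalg.solve(Upsilon, B2.T @ X @ A + D2.T @ C)
rho_open   = max(abs(np.linalg.eigvals(A)))    # expected > 1: the plant is unstable
rho_closed = max(abs(np.linalg.eigvals(Ak)))   # expected < 1: Aκ is Schur
```

The open-loop spectral radius exceeds one (the discretized plant is unstable), while the closed-loop Aκ is Schur, as the theory requires of the stabilizing solution.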
B. Example 2

Consider now the fixed-lag smoothing problem with

[A B; C1 D1; C2 D2] = [α  α−β; 0  1; 1  1].

This system represents the equalization problem in the communication channel (z − β)/(z − α) with no measurement noise (see [19, §15.2]). We assume that |β| > 1, that is, this channel is non-minimum-phase (i.e., non-invertible). It can be shown that in this case

Λ̄γ = β^{−2l}(1 − β^{−2}) − γ²  and  Yγ = β^{2l}(β² − 1)/(γ²β^{2l} − 1).

Thus, according to Theorem 2.2, the problem is solvable iff

γ > γopt = β^{−l}.

Note that this γopt does not depend on the channel pole, α. Also, in this case γopt does not saturate as a function of l, which agrees with the analysis in [17], [18].

It might be of interest to compare this solution with those in [7], [8]. The solvability conditions there are based on the stabilizing solution of the standard H∞ ARE, which for this problem is

Ỹγ = γ²(β² − 1)/(γ² − 1),

and on the following H2 Riccati recursion over k = 1, . . . , l:

Ỹ_{k+1} = BB′ + AỸkA′ − (AỸkC2′ + BD2′) ϒ̄k^{−1} (C2ỸkA′ + D2B′)

with Ỹ0 = Ỹγ, where ϒ̄k := C2ỸkC2′ + D2D2′. Since the initial condition here is typically not positive semi-definite, ϒ̄k might become singular at some k. Thus, the recursion might not exist, whereby the solvability conditions in [7], [8] are only sufficient. It is argued in [8] (see also [9]) that this shortcoming is insignificant, since it discards only a zero-measure set of possible values of γ. It is worth noting, however, that this argument might be misleading, as the zero-measure set of γ values that give rise to the singularity of ϒ̄k might be arbitrarily dense. This, in turn, puts in doubt the numerical robustness of the suggested criterion. Indeed, an explicit computation in the current example yields

Ỹk = γ²β^{2k}(β² − 1)/(γ²β^{2k} − 1).

Therefore, the conditions of [7], [8] fail for γ's in the set

{γ : γ = β^{−k}, for some k = 0, . . . , l}.

Allowing β to be (arbitrarily) close to unity, and allowing an arbitrarily large l (for improved performance), this set can be made arbitrarily dense in the interval (γopt, 1).

While this is surely an academic example, the sole purpose of which is to emphasize the point made, it does illustrate a clear advantage of the solvability conditions of Theorem 2.2 (in addition to the fact that the latter are also numerically simpler).

IV. PROOF OF THE MAIN RESULT

A. Game theoretic analysis: necessity
The proof of Theorem 2.1 is based on a reduction to the differential game

max_{w∈ℓ2} min_{u∈ℓ2} { ‖z‖²_{ℓ2} − γ²‖w‖²_{ℓ2} : x ∈ ℓ2 }.  (8)
The difference in the setting of (8) from standard H∞ problems is in the determining data, which now consist of the pair (x0, w̆0) ∈ Rn × ℓ2[−l, −1]. The following is a basic observation.

Observation 4.1: If γ > γopt then, for every (x0, w̆0) ∈ Rn × ℓ2[−l, −1], there is a unique solution of (8). The optimal cost of the game is then non-negative and is bounded below by the value of the internal LQ optimal control problem, with wk = 0, k ≥ 0.

Immediately below is a sketched proof. Additional details on the existence and properties of the two optimizations in (8) occupy much of the remainder of this section.

Outline of the Proof: The inner optimization (in u) is a standard LQ optimal control problem, whose unique solution is guaranteed by assumptions A1 and A2. Moreover, the mappings from the data (x0, w̆0, w) ∈ Rn × ℓ2[−l, −1] × ℓ2 to the optimal control u#(x0, w̆0, w), optimal state x#(x0, w̆0, w) and optimal controlled output z#(x0, w̆0, w) then define bounded linear operators. If, indeed, γ > γopt, then there is a complete information controller under which the closed-loop response z = Twz w (i.e., the response with the initial data (x0, w̆0) = (0, 0) and the exogenous input w ∈ ℓ2) satisfies

γ²‖w‖²_{ℓ2} − ‖z‖²_{ℓ2} > δ‖w‖²_{ℓ2}  (9)

with some fixed δ > 0 and all w ∈ ℓ2. Clearly, the inequality in (9) will only become sharper if the closed-loop response z = Twz w is replaced by the optimal z#(0, 0, w), so that

γ²‖w‖²_{ℓ2} − ‖z#(0, 0, w)‖²_{ℓ2} > δ‖w‖²_{ℓ2}.  (10)

In complete analogy to what was done in [20], [21], and now in [1], this implies the existence and uniqueness of an optimal w. For future use we introduce the notation w∗(x0, w̆0) for that solution, along with u∗(x0, w̆0) = u#(x0, w̆0, w∗(x0, w̆0)), x∗(x0, w̆0) = x#(x0, w̆0, w∗(x0, w̆0)) and z∗(x0, w̆0) = z#(x0, w̆0, w∗(x0, w̆0)). All of w∗, u∗, x∗ and z∗ then define bounded linear operators from the data (x0, w̆0) ∈ Rn × ℓ2[−l, −1] to the appropriate ℓ2 space. The claim regarding the lower bound on the optimal cost stems from the simple fact that w = 0 is one candidate for optimality in (8), and that, with that selection, the value of the game is the non-negative ‖z#(x0, w̆0, 0)‖²_{ℓ2} (i.e., the optimal value with wk = 0, k ≥ 0).

The remainder of this section reviews a cohesive solution method for the two optimization problems in the differential
game (8), and makes connections with the original H∞ control problem. Despite an overlap with components of previous solutions of the discrete-time LQ optimal control and H∞ problems (of which [22] provides a thorough account), the need to customize tools specifically for our variant of (8) makes this review inescapable.

1) LQ optimal control: the optimal u: The application of [1, Theorem 3] to the discrete-time LQ optimization of u in (8) is as follows. Let the initial data and w ∈ ℓ2 be fixed. Let −K be a stabilizing feedback control gain, so that u = −Kx + ũ, ũ ∈ ℓ2, is a complete parameterization of admissible controls in (1). This transformation affects neither A1 nor A2, and the optimization over u is equivalently reduced to an optimization over ũ, where the requirement that x ∈ ℓ2 becomes implicit. Let z̃ ∈ ℓ2 be the response with the prescribed initial data (x0, w̆0) ∈ Rn × ℓ2[−l, −1], the selected w ∈ ℓ2, and ũ ≡ 0. Let the bounded operator S : ℓ2 → ℓ2, S : ũ ↦ z, be defined as the input-output (I/O) response of the stable, homogeneous system [A − B2K, B2, C − D2K, D2] (i.e., with x0 = 0 and w ≡ 0), and let J = I. These definitions bring the optimization of ũ into the framework of [1, Theorem 3], with ũ and −z̃ substituting for u and v. Assumptions A1 and A2 mean that S′S is uniformly positive. Thus, a unique optimal control ũ# (equivalently, u#) is completely characterized by the condition 0 = S′(Sũ# + z̃) = S′z#. By direct computation, this equation has the anti-causal realization

pk = (A − B2K)′ p_{k+1} + (C − D2K)′ z_{k+1},
0 = B2′ pk + D2′ zk.  (11)

By the second equation in (11), the expression in K in the first equation can be dropped. Thus, the arbitrary inclusion of K, used to obtain a bounded S, has no impact on the solution. Keeping K is still useful to establish the boundedness of the mapping z ↦ p. In particular, a bounded linear operator p#(x0, w̆0, w) determines p as a function of the data. The conclusion is that ũ is optimal (equivalently, u = −Kx + ũ is optimal) if and only if x, p, u and z are ℓ2 trajectories satisfying both (1) and (11). The uniqueness of the optimal control further means that this set of trajectories is uniquely determined as an ℓ2 solution of the combined (1) and (11), with the data (x0, w̆0) and wk, k ≥ 0.

a) The homogeneous case: Our next step is to review the association of the homogeneous case, where both w̆0 and w are zero, with an algebraic Riccati equation (ARE) and an explicit expression for the optimal control. The linear mappings x0 ↦ p#0(x0, 0, 0) and x0 ↦ z#0(x0, 0, 0) have matrix realizations, say p0 = Px0 and z0 = Zx0. The preceding analysis of the optimization problem applies equally to the shifted problem of minimizing ‖z‖²_{ℓ2[k,∞)}, given xk and w ≡ 0, for all k ≥ 0. Thus, the equalities pk = Pxk and zk = Zxk remain valid throughout (this is the essence of the principle of dynamic programming). Substituting into the right hand side of (11), a relation

pk = (A′P + C′Z) x_{k+1} =: Xκ x_{k+1}

emerges. We proceed to characterize the matrix Xκ. Appealing to (1) (for an expression for x_{k+1}) and both equations in (11) (for
expressions for p_{k−1} and B2′pk + D2′zk), there holds

⟨xk, p_{k−1}⟩ − ⟨x_{k+1}, pk⟩ = ‖zk‖² − ⟨w_{k−l}, B1′pk + D1′zk⟩.  (12)

Setting w ≡ 0 and summing over k, this leads to a quadratic expression for the optimal cost,

‖z#‖²_{ℓ2} = ⟨x0, p_{−1}⟩ = ⟨x0, Xκ x0⟩.

This equation reveals, in particular, that the self-adjoint component of Xκ (i.e., (Xκ + Xκ′)/2) is nonnegative. Substituting pk = Xκ x_{k+1}, the second equation in (11) becomes

0 = B2′pk + D2′zk = B2′Xκ x_{k+1} + D2′C xk + D2′D2 uk = ϒκ uk + (B2′XκA + D2′C) xk,  (13)

where ϒκ := B2′XκB2 + D2′D2, as in §II-A.

Observation 4.2: ϒκ is invertible and its self-adjoint component, (ϒκ + ϒκ′)/2, is strictly positive definite.

Proof: The unique optimal solution for the zero initial data, x0 = 0 and w̆0 = 0, is u# ≡ 0, associated with the optimal cost ‖z#‖²_{ℓ2} = 0. The uniqueness of the optimal control means that the cost with any non-zero control must be strictly positive. Select u0 ≠ 0. Along [1, ∞) implement the optimal control that minimizes ‖z‖²_{ℓ2[1,∞)}, subject to the initial state x1 = B2u0. Thus, the total cost of this process,

‖z‖²_{ℓ2} = ‖D2u0‖² + ⟨x1, Xκ x1⟩ = ⟨u0, ϒκ u0⟩,

must be positive, as claimed.

By (13), the optimal feedback control is

uk = −ϒκ^{−1} (B2′XκA + D2′C) xk.  (14)

Under the statement of the optimization problem, this feedback control law must be stabilizing, meaning that the closed-loop matrix, Aκ := A − B2ϒκ^{−1}(B2′XκA + D2′C), defined in §II-A, must be Schur. Substituting (14) into (1), and eventually into (11), it is a standard procedure to obtain the ARE (2). The fact that Xκ (and hence ϒκ) is self-adjoint can be shown by rewriting (2) as a (discrete-time) Lyapunov equation with a self-adjoint free term,

Xκ = Aκ′ Xκ Aκ + Cκ′ Cκ.  (15)

The advantage of this argument over familiar alternatives, such as spectral analysis of the associated Hamilton-Jacobi system, is that it carries over, as is, to time-varying systems. Invoking the fact that the optimal cost is given as a quadratic form with the kernel Xκ, the conclusion is that Xκ ≥ 0. An additional conclusion is that ϒκ is self-adjoint, hence positive definite.

b) The inhomogeneous case: We utilize the observations made above to obtain a computable version of the coupled (1) and (11), as well as an explicit expression for u. Denote ζk := pk − Xκ x_{k+1}. In complete analogy to (13), one now has

0 = B2′pk + D2′zk = ϒκ uk + [B2′Xκ  D2′] [A B1; C D1] [xk; w_{k−l}] + B2′ζk.  (16)
Thus

uk = −ϒκ^{−1} B2′ζk − ϒκ^{−1} [B2′Xκ  D2′] [A B1; C D1] [xk; w_{k−l}].  (17)

Substituting (17) into (1), one also has

x_{k+1} = Aκ xk − B2 ϒκ^{−1} B2′ ζk + Bκ w_{k−l},
zk = Cκ xk − D2 ϒκ^{−1} B2′ ζk + Dκ w_{k−l},  (18)

where Bκ, Dκ, and Cκ are as defined in §II-A. Substituting (17) also into (11), a dynamic equation is obtained for ζk, as follows:

ζk = pk − Xκ x_{k+1}
 = A′p_{k+1} + C′z_{k+1} − Xκ x_{k+1}
 = A′Xκ x_{k+2} + A′ζ_{k+1} + (C′C − Xκ) x_{k+1} + C′D1 w_{k+1−l} + C′D2 u_{k+1}
 = (A′XκA + C′C − Xκ) x_{k+1} + A′ζ_{k+1} + (A′XκB1 + C′D1) w_{k+1−l} + (A′XκB2 + C′D2) u_{k+1}
 = Aκ′ ζ_{k+1} + Eκ′ w_{k+1−l},  (19)

where Eκ is as defined in §II-A and the last equality in (19) is obtained from the ARE (2). In effect, the coupled (18) and (19) is a convenient partition of the coupled (1) and (11) into a cascade (i.e., a block triangular system), comprising the anti-causal, anti-stable system (19), followed by the causal, stable system (18).

c) The interval [0, l − 1]: A clear distinction exists in the optimization of w, in (8), between the dynamics over the interval [0, l − 1], where only pre-determined initial values of w come to bear, and the ray [l, ∞), where the selection of w ∈ ℓ2 takes effect. Indeed, this distinction is a key ingredient of the difference from a standard H∞ problem. It is therefore useful to obtain a detailed picture of the inner optimization (in u) along [0, l − 1], based on (18) and (19). We begin with the computation of an explicit expression for ζk, k = −1, . . . , l − 1:

ζk = Σ_{i=k+1−l}^{−1} Aκ′^{i+l−k−1} Eκ′ wi + Aκ′^{l−k−1} ζ_{l−1},  (20a)

where the summation term (ignored for k = l − 1) is completely determined by the initial data, w̆0, and

ζ_{l−1} = Σ_{i=0}^{∞} Aκ′^i Eκ′ wi  (20b)

captures the dependence on the selection of w ∈ ℓ2. Turning to xk, for k = 1, . . . , l, one obtains a similar partition

xk = Aκ^k x0 + Σ_{i=0}^{k−1} Aκ^{k−1−i} (Bκ w_{i−l} − B2 ϒκ^{−1} B2′ ζi)
 = Aκ^k x0 + Σ_{i=−l}^{k−l−1} Aκ^{k−l−1−i} Bκ wi − Σ_{i=0}^{k−1} Aκ^{k−1−i} B2 ϒκ^{−1} B2′ Σ_{j=i+1−l}^{−1} Aκ′^{j+l−i−1} Eκ′ wj − Gk Aκ′^{l−k} ζ_{l−1},  (21)

where, again, a sum is ignored when the index set becomes empty, and where the Grammians Gk are as defined in §II-A. In particular, the value of xl can be written as

xl = ξl − Gl ζ_{l−1},  (22)

where the term ξl, defined in (23), right below, captures the part of (the optimal) xl that is completely determined by the initial data (x0, w̆0):

ξl = Aκ^l x0 + Σ_{i=−l}^{−1} Aκ^{−(i+1)} (Bκ − Aκ G_{l+i} Eκ′) wi,  (23)

and where we use the convention Gi = 0 for i ≤ 0. The component −Gl ζ_{l−1} of xl is completely determined by the free selection of w ∈ ℓ2 and is thus the subject of the outer optimization in (8). Again, the principle of dynamic programming implies that, for any k ≥ 0,

x_{k+l} = ξ_{k+l} − Gl ζ_{k+l−1},  (24)

where ξ_{k+l} is defined by (xk, w̆k), in complete analogy to (23). We conclude with an expression for p_{−1}:

p_{−1} = Xκ x0 + ζ_{−1} = Xκ x0 + Aκ′^l ζ_{l−1} + Σ_{i=−l}^{−1} Aκ′^{i+l} Eκ′ wi.  (25)
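The collapse of the double sum in (21) into the single term Gk Aκ′^{l−k} ζ_{l−1} rests on the purely algebraic identity Σ_{i=0}^{k−1} Aκ^{k−1−i} M Aκ′^{l−1−i} = Gk Aκ′^{l−k}, with M = B2 ϒκ^{−1} B2′, which holds for any square matrix in place of Aκ, any M, and any k ≤ l. A quick numerical spot check (Python/NumPy; the matrices are random placeholders, not quantities from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, l = 3, 2, 5
Ak = rng.standard_normal((n, n))      # plays the role of Aκ
M  = rng.standard_normal((n, n))
M  = M @ M.T                          # plays the role of B2 ϒκ^{-1} B2'

mp = np.linalg.matrix_power
# Left-hand side: the double-sum kernel appearing in (21).
lhs = sum(mp(Ak, k - 1 - i) @ M @ mp(Ak.T, l - 1 - i) for i in range(k))
# Right-hand side: the Grammian Gk times the shifted power of Aκ'.
Gk  = sum(mp(Ak, i) @ M @ mp(Ak.T, i) for i in range(k))
rhs = Gk @ mp(Ak.T, l - k)
```

The two sides agree to machine precision; the identity follows by the substitution m = k − 1 − i in the left-hand sum.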
2) The complete game – the optimal w: To complete the solution of (8), we now consider the outer optimization, appealing again to [1, Theorem 3]. That is done with the following definitions: we set V = ℓ2 × ℓ2 and embed in that space pairs (w, z# (x0 , w˘ 0 , w)); using the notations of [1, Theorem 3], the target vector “−v” is defined as the contribution of the data, (x0 , w˘ 0 ), to (w, z# (x0 , w˘ 0 , w)), which is (0, z# (x0 , w˘ 0 , 0)); the role of the optimization variable “u” is now played by the trajectory w; the operator S is defined via Sw = (w, z# (0, 0, w)) (i.e., the contribution of w with zero
initial data); finally, we substitute J = diag{γ²I, −I}. With these definitions,

⟨Sw, JSw⟩_{ℓ2} = γ²‖w‖²_{ℓ2} − ‖z#(0, 0, w)‖²_{ℓ2}.  (26)

As noted in the outline of the proof of Observation 4.1 (specifically, in (10)), if γ > γopt then the expression in (26) is uniformly positive, and the condition of [1, Theorem 3] is satisfied. Thus, there then exists a unique solution to (8). That solution is completely characterized by the condition S′J(w∗, z∗) = 0. That is, for all w ∈ ℓ2,

0 = ⟨Sw, J(w∗, z∗)⟩_{ℓ2} = γ²⟨w, w∗⟩_{ℓ2} − ⟨z#(0, 0, w), z#(x0, w̆0, w∗)⟩_{ℓ2}.

Since S is defined in terms of combined causal and anti-causal systems, direct computation of S′ is awkward, and we use a trick from [20] to first simplify the equality above. Again, let u = −Kx be a stabilizing feedback. Let z̃ ∈ ℓ2 be the response to w and the zero initial data, under this feedback, and let △z = z#(0, 0, w) − z̃ ∈ ℓ2. Then we can continue as follows:

0 = γ²⟨w, w∗⟩_{ℓ2} − ⟨z̃, z#(x0, w̆0, w∗)⟩_{ℓ2} − ⟨△z, z#(x0, w̆0, w∗)⟩_{ℓ2}
 = γ²⟨w, w∗⟩_{ℓ2} − ⟨z̃, z#(x0, w̆0, w∗)⟩_{ℓ2}.  (27)

The fact that the term ⟨△z, z#(x0, w̆0, w∗)⟩_{ℓ2} vanishes follows from the fact that z∗ = z#(x0, w̆0, w∗) is, in particular, the optimal solution of the inner LQ optimization in (8), with the predetermined initial data and the exogenous input w∗. Returning to §IV-A.1, the optimality condition of [1, Theorem 3], applied to the optimization of u, is that z∗ must be orthogonal to the ℓ2 response of [A, B2, C, D2] to any admissible control with x0 = 0, w̆0 ≡ 0 and w ≡ 0. The trajectory △z is one such response, hence ⟨△z, z∗⟩_{ℓ2} = 0.

Now we can consider the alternative characterization of the optimal solution, in terms of the condition S̃′J(w∗, z∗) = 0, where S̃w = (w, z̃) is as defined above. The adjoint S̃′ is realized by the same anti-causal dynamics as (11), and the condition S̃′J(w∗, z∗) = 0 thus means that

0 = B1′pk + D1′zk − γ² w∗_{k−l},  k ≥ l.  (28)

In summary, the combination of (1), (11), and (28) provides a complete and unique characterization of the optimal trajectories of x, u, w and z ∈ ℓ2, and the auxiliary trajectories p, ζ, and ξ ∈ ℓ2.

3) The optimal cost: The restriction of the combined (1), (11), and (28) to the positive ray [l, ∞) is a homogeneous system, in the sense that both the control and the exogenous input are completely (albeit, so far, implicitly) determined by the initial data. In complete analogy to (12), the combined (1), (11), and (28) yield

⟨x∗k, p∗_{k−1}⟩ − ⟨x∗_{k+1}, p∗k⟩ = ‖z∗k‖² − γ²‖w∗_{k−l}‖²,  k ≥ l.  (29)

Summing over k, the contribution to the optimal cost is therefore

⟨x∗l, p∗_{l−1}⟩ = ‖z∗‖²_{ℓ2[l,∞)} − γ²‖w∗‖²_{ℓ2[0,∞)}.  (30)

Since w_{k−l} is part of the initial data, and need not satisfy (28) for k ∈ [0, l − 1], the expression for the complete cost, including that initial interval, requires additional details. The starting point is the inhomogeneous (12),

⟨xk, p_{k−1}⟩ − ⟨x_{k+1}, pk⟩ = ‖zk‖² − ⟨w_{k−l}, B1′pk + D1′zk⟩.  (31)

Invoking the expression (17) for the optimal control, one can substitute

B1′pk + D1′zk = B1′Xκ x_{k+1} + B1′ζk + D1′zk
 = (B1′XκA + D1′C) xk + B1′ζk + (B1′XκB1 + D1′D1) w_{k−l} + (B1′XκB2 + D1′D2) uk
 = Eκ xk + Bκ′ζk + Qκ w_{k−l},  (32)

where Bκ, Eκ, and Qκ are as in §II-A. The expression for the combined cost is now obtained by summing (31) for k ∈ [0, l − 1] and adding (30). This results in

Jγ := ‖z∗‖²_{ℓ2} − γ²‖w∗‖²_{ℓ2} = ⟨x0, p_{−1}⟩ + Σ_{k=0}^{l−1} ⟨w_{k−l}, B1′p∗k + D1′z∗k⟩
 = ⟨x0, Xκ x0⟩ + ⟨x0, Aκ′^l ζ_{l−1}⟩ + ⟨x0, Σ_{i=−l}^{−1} Aκ′^{l+i} Eκ′ wi⟩ + Σ_{k=−l}^{−1} ⟨wk, Eκ x_{k+l} + Bκ′ζ_{k+l} + Qκ wk⟩
 = ⟨x0, Xκ x0⟩ + ⟨ξl, ζ_{l−1}⟩ + 2⟨x0, Σ_{i=−l}^{−1} Aκ′^{l+i} Eκ′ wi⟩ + ⟨w̆0, Qκ w̆0⟩_{ℓ2[−l,−1]}
   + Σ_{k=−l+1}^{−1} ⟨wk, Eκ Σ_{i=−l}^{k−1} Aκ^{k−1−i} Bκ wi⟩ + Σ_{k=−l}^{−2} ⟨wk, Bκ′ Σ_{i=k+1}^{−1} Aκ′^{i−k−1} Eκ′ wi⟩
   − Σ_{k=−l+1}^{−1} ⟨wk, Eκ Σ_{i=0}^{l+k−1} Aκ^{l+k−1−i} B2 ϒκ^{−1} B2′ Σ_{j=i+1−l}^{−1} Aκ′^{j+l−i−1} Eκ′ wj⟩.  (33)

To simplify summation expressions we maintain our earlier convention that a sum is ignored when the index set is empty and that Gi = 0 whenever i ≤ 0.

4) A change of state variable: Our eventual purpose is to characterize solutions of (8) in terms of the H∞-type ARE (3). The desired solution of such Riccati equations typically defines the stable subspace of a related homogeneous Hamilton-Jacobi system, which in the present setting consists of the coupled (1), (11) and (28). The current deviation from the standard case is in the fact that members of the stable subspace need not be defined in terms of the state xl, and it is not guaranteed that any selection of xl is associated with an ℓ2 extension. This point has already been made in [1], for the continuous-time case. Here we develop the more elaborate, discrete-time case.
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 50, NO. 1, JAN. 2005 (TO APPEAR)
That is, the role of a parameterization of the stable subspace is now taken by ξ_l, as shown below.

Observation 4.3: A linear mapping ζ_{l−1} = X_γξ_l is well defined, relating the two components of the optimal solution. Moreover, the unique combined solution of (1), (11) and (28) over [l, ∞) (including the optimal selections of u_k and w_{k−l}, k ≥ l) is linearly determined by ξ_l.

Proof: We begin with the observation that the term involving ζ_{l−1}, in (33), which captures the contribution of w* ∈ ℓ₂ to the optimal cost, is ⟨ξ_l, ζ_{l−1}⟩. Thus, when the contribution of the initial data to x_l is ξ_l = 0, the contribution of w*_k, k ≥ 0, to the optimal cost is zero, as well. Then, the optimal cost is determined only by terms that involve the initial data (x₀, w̆₀), in (33). The same contribution is achieved with the selection of w_i = 0, i ≥ 0, hence with ζ_{l−1} = 0, x_l = 0 and u_k = 0, k ≥ l. Since the solution of (8) is unique, we conclude that, indeed, then w* ≡ 0, hence that ζ_{l−1} = 0. Pairs (ξ_l, ζ_{l−1}) depend linearly on the initial data and form a linear manifold. The fact that ξ_l = 0 implies ζ_{l−1} = 0 thus means that ζ_{l−1} depends linearly on ξ_l, as claimed. Similarly, a linear dependence on ξ_l of the unique ℓ₂ solution of the combined (1), (11) and (28), and of the optimal values of u_k and w_{k−l}, k ≥ l, follows.

The mapping X_γ is uniquely defined over the subspace Ξ ⊂ R^n of possible values of ξ_l, and should be understood in that sense. We denote Z = Im(X_γ) ⊂ R^n. Defining X_γ|_{Ξ⊥} = 0, the mapping X_γ is extended to the entire R^n, with an n × n matrix representation. Since ξ_l and ζ_{l−1} = X_γξ_l fully determine the initial conditions x_l and p_{l−1} of the unique ℓ₂ solution of the combined (1), (11) and (28) over the positive ray [l, ∞), they can be viewed as the appropriate state and co-state of that system. Having applied the optimal input up to a point k − l ≥ 0, the game (8) can be solved again, now over the ray [k − l, ∞), with the initial data (x_{k−l}, w̆_{k−l}).
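The extension of X_γ by zero over Ξ⊥, used above, has a one-line matrix realization: if the columns of one matrix span Ξ and a second matrix holds the corresponding values of ζ_{l−1}, the Moore-Penrose pseudoinverse yields exactly the extension with X_γ|_{Ξ⊥} = 0. A minimal numerical sketch with arbitrary stand-in data (the bases below are generic, not derived from the paper's system):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 4, 2

Xi_basis = rng.standard_normal((n, r))   # stand-in basis of the subspace Xi
Zeta = rng.standard_normal((n, r))       # corresponding zeta_{l-1} values

# X_gamma on Xi, extended by zero on the orthogonal complement:
# Zeta * pinv(Xi_basis) realizes exactly this extension.
X = Zeta @ np.linalg.pinv(Xi_basis)

# reproduces the graph relation on Xi ...
assert np.allclose(X @ Xi_basis, Zeta)

# ... and vanishes on the orthogonal complement of Xi
q, _ = np.linalg.qr(Xi_basis, mode='complete')
perp = q[:, r:]                          # basis of Xi-perp
assert np.allclose(X @ perp, 0)
```

The pseudoinverse construction is a standard device; any basis of Ξ gives the same extended n × n matrix.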
The new solution must coincide with the restriction of the original solution. In particular, if ζ_{k−1} and ξ_k are defined in the shifted system, the relation ζ_{k−1} = X_γξ_k must be maintained for all k ≥ l. The relations with the original x and p are derived from (16) and (23):

  x_i = ξ_i − G_lζ_{i−1} = (I − G_lX_γ)ξ_i,
  p_{i−1} = X_κx_i + ζ_{i−1} = (X_κ(I − G_lX_γ) + X_γ)ξ_i,   i ≥ l.

A dynamic equation for ξ is derived, using the equality ξ_k = x_k + G_lζ_{k−1}:

  ξ_{k+1} = x_{k+1} + G_lζ_k
    = A_κx_k + (G_l − B₂ϒ_κ^{−1}B₂′)ζ_k + B_κw_{k−l}
    = A_κ(ξ_k − G_lζ_{k−1}) + (G_l − B₂ϒ_κ^{−1}B₂′)ζ_k + B_κw_{k−l}
    = A_κξ_k + (G_l − B₂ϒ_κ^{−1}B₂′ − A_κG_lA_κ′)ζ_k + (B_κ − A_κG_lE_κ′)w_{k−l}
    = A_κξ_k − A_κ^lB₂ϒ_κ^{−1}B₂′A_κ′^lζ_k + (B_κ − A_κG_lE_κ′)w_{k−l}.

5) Solving for X_γ – an ARE characterization: We return to an evaluation of the optimal solution, and of the optimal cost, in (8). Equation (28) and its interpretation in (32) provide an implicit characterization of w*. We now elaborate on this characterization, using the definitions and the dynamic equations of ξ and ζ:

  0 = E_κx_k + B_κ′ζ_k + (Q_κ − γ²I)w_{k−l}
    = E_κξ_k − E_κG_lζ_{k−1} + B_κ′ζ_k + (Q_κ − γ²I)w_{k−l}
    = E_κξ_k + (B_κ′ − E_κG_lA_κ′)ζ_k + Λ_γw_{k−l},

where Λ_γ = Q_κ − E_κG_lE_κ′ − γ²I, as defined in §II-A.

Observation 4.4: Λ_γ is negative definite.

Proof: The basic idea is the same as in the proof of Observation 4.2. Consider (8) with the zero data x₀ = 0 and w̆₀ = 0. By (10), if γ > γ_opt then the unique optimal solution is the zero solution: w* ≡ 0 and u* ≡ 0, associated with the zero value for (8). Any other selection of w would thus result in a strictly lower, negative value for the game. Select w₀ ≠ 0 and w_k = 0, k ≥ 1, and let u be the associated optimal control. Returning to (12) and appealing to (32), one computes the associated cost as

  ‖z‖²_{ℓ₂} − γ²‖w‖²_{ℓ₂} = ⟨w₀, E_κx_l + B_κ′ζ_l + (Q_κ − γ²I)w₀⟩.

All it takes now is to realize that, by (20) and (22), ζ_l = 0, ζ_{l−1} = E_κ′w₀ and x_l = −G_lζ_{l−1} = −G_lE_κ′w₀. In short, the cost is ⟨w₀, Λ_γw₀⟩, which must be negative.

The conclusion is that w* is given by

  w*_{k−l} = −Λ_γ^{−1}(E_κξ_k + (B_κ′ − E_κG_lA_κ′)ζ_k),   k ≥ l.   (34)

This equality reduces the dynamic equations in ξ and ζ (for k ≥ l) to a homogeneous Hamilton-Jacobi system

  ξ_{k+1} = (A_κ − (B_κ − A_κG_lE_κ′)Λ_γ^{−1}E_κ)ξ_k
            − ((B_κ − A_κG_lE_κ′)Λ_γ^{−1}(B_κ′ − E_κG_lA_κ′) + A_κ^lB₂ϒ_κ^{−1}B₂′A_κ′^l)ζ_k   (35)
  ζ_{k−1} = (A_κ′ − E_κ′Λ_γ^{−1}(B_κ′ − E_κG_lA_κ′))ζ_k − E_κ′Λ_γ^{−1}E_κξ_k.

To facilitate further reference we use the abbreviated notations from §II-A, rewriting these equations as

  ξ_{k+1} = A_γξ_k − ΨΦ_γ^{−1}Ψ′ζ_k   (36)
  ζ_{k−1} = −E_κ′Λ_γ^{−1}E_κξ_k + A_γ′ζ_k.

Our earlier conclusions can be stated in terms of this system as follows:

Observation 4.5: Let Ξ be the subspace spanned by possible values of ξ_l, as defined above. Then for any ξ_l ∈ Ξ there is a unique ℓ₂ solution of (36). The linear mapping ζ_{k−1} = X_γξ_k, k ≥ l, defines the dependence between the state and co-state in ℓ₂ solutions of (36).

This observation provides a complete characterization of X_γ in terms of the state component of a basis for the stable subspace of (36). We proceed to compute an ARE characterization of X_γ.

Corollary 4.1: The matrix I + ΨΦ_γ^{−1}Ψ′X_γ is invertible and the optimal dynamics of ξ is governed by the discrete-time stable matrix F_X = (I + ΨΦ_γ^{−1}Ψ′X_γ)^{−1}A_γ = (I − Ψ(Φ_γ + Ψ′X_γΨ)^{−1}Ψ′X_γ)A_γ.

Proof: We begin with the observation that Im(Ψ) ⊂ Ξ and Im(B_κ − A_κG_lE_κ′) ⊂ Ξ, and consequently, that Ξ is
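The equality of the two expressions for F_X in Corollary 4.1 is an instance of the matrix inversion (Woodbury-type) identity (I + UΦ^{−1}V)^{−1} = I − U(Φ + VU)^{−1}V. A quick numerical check of this identity, with generic, well-conditioned matrices standing in for Ψ, Φ_γ, X_γ and A_γ (the specific structure of the paper's matrices is not needed for the identity itself):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3

Psi = rng.standard_normal((n, m))   # stand-in for Psi (n x m)
Phi = 2.0 * np.eye(m)               # positive definite stand-in for Phi_gamma
M = rng.standard_normal((n, n))
X = M @ M.T                         # symmetric psd stand-in for X_gamma
A = rng.standard_normal((n, n))     # stand-in for A_gamma

# F_X via the first expression: (I + Psi Phi^{-1} Psi' X)^{-1} A
F1 = np.linalg.solve(np.eye(n) + Psi @ np.linalg.solve(Phi, Psi.T) @ X, A)

# F_X via the second expression: (I - Psi (Phi + Psi' X Psi)^{-1} Psi' X) A
F2 = (np.eye(n) - Psi @ np.linalg.solve(Phi + Psi.T @ X @ Psi, Psi.T @ X)) @ A

assert np.allclose(F1, F2)
```

With Φ positive definite and X positive semi-definite, both inverses above are guaranteed to exist, mirroring the invertibility claim of the corollary.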
invariant under A_κ. Indeed, both facts follow from the definition (23) and from the following equality:

  Ψ = [B_κ − A_κG_lE_κ′ , A_κ^lB₂]
    = [B_κ − A_κG_{l−1}E_κ′ , A_κ^lB₂] [[I, 0], [−ϒ_κ^{−1}B₂′A_κ′^{l−1}E_κ′, I]].

The invariance of Ξ under I + ΨΦ_γ^{−1}Ψ′X_γ is an immediate conclusion. The invariance of Ξ under A_κ now follows from the state equation in (35), which can be written in the form

  A_κξ_k = (I + ΨΦ_γ^{−1}Ψ′X_γ)ξ_{k+1} + (B_κ − A_κG_lE_κ′)Λ_γ^{−1}E_κξ_k.

We now rewrite this equation in the form

  (I + ΨΦ_γ^{−1}Ψ′X_γ)ξ_{k+1} = A_γξ_k

and observe that Ξ is invariant also under A_γ. The uniqueness of the ℓ₂ solution of (36) and the relation ζ_{k−1} = X_γξ_k imply that the last equation has a unique ℓ₂ solution for each initial value ξ_l ∈ Ξ. In particular, the only solution associated with ξ_l = 0 is ξ_k = 0, k > l. This would be violated if there existed 0 ≠ ξ_{l+1} ∈ N(I + ΨΦ_γ^{−1}Ψ′X_γ) ∩ Ξ. Adding the facts that Ξ is ΨΦ_γ^{−1}Ψ′X_γ-invariant and that the latter matrix vanishes over Ξ⊥, we conclude that N(I + ΨΦ_γ^{−1}Ψ′X_γ) = {0}. The fact that the closed loop dynamics of ξ is generated by F_X is now immediate. This provides for the invariance of Ξ under F_X and for the stability of the restriction of F_X to Ξ. The proof of the invariance properties above shows that F_X can be written in the form

  F_X = A_κ + L,  where Im(L) ⊂ Ξ.   (37)

Since Ξ is A_κ-invariant and, by (23), Im(A_κ^l) ⊂ Ξ, it follows that Im(F_X^l) ⊂ Ξ, which completes the proof.

The ARE characterization of X_γ, in (3), is now readily obtained from the co-state equation in (36). The proof that a stabilizing solution of an ARE is self adjoint is by a standard reduction to an equivalent Lyapunov equation with a self adjoint free term, in complete analogy to the reduction of (2) to (15), above. To see that X_γ ≥ 0, one first notes that the contribution of the optimal w* to the cost of (8), as compared with the cost with w_k = 0, k ≥ 0, must be non-negative. That contribution is captured by the term ⟨ζ_{l−1}, ξ_l⟩ = ⟨ξ_l, X_γξ_l⟩, in (33). Further analysis of X_γ follows.
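The characterization of X_γ through the stable subspace of the Hamilton-Jacobi system (36) parallels the classical construction of stabilizing Riccati solutions. As a generic illustration only (using a standard scalar LQR-type DARE with invented data, not the paper's equation (3), whose coefficients are defined in §II-A), the stabilizing solution can be read off the stable eigenvectors of the associated symplectic matrix:

```python
import numpy as np

# Scalar DARE: X = A'XA + Q - A'XB (R + B'XB)^{-1} B'XA  (hypothetical data)
A, B, Q, R = 2.0, 1.0, 1.0, 1.0

# Symplectic matrix of the DARE (A invertible); its stable invariant
# subspace [U1; U2] yields the stabilizing solution X = U2 U1^{-1}.
S = np.array([
    [A + B / R * B / A * Q, -B / R * B / A],
    [-Q / A,                 1.0 / A      ],
])

lam, V = np.linalg.eig(S)
stable = np.abs(lam) < 1.0   # exactly one stable eigenvalue in this example
U = V[:, stable]
X = (U[1] / U[0]).real.item()

# X solves X^2 - 4X - 1 = 0, i.e., X = 2 + sqrt(5)
assert abs(X - (2.0 + np.sqrt(5.0))) < 1e-9
# and satisfies the DARE
assert abs(A * X * A + Q - (A * X * B) ** 2 / (R + B * X * B) - X) < 1e-9
```

In the paper's setting the same "graph of the stable subspace" idea applies to (36), with the extra twist that the subspace is parameterized by ξ_l ∈ Ξ rather than by the full state.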
B. An extended state-space solution

Here we recast the preview H∞ problem – and the game (8) – as a standard problem in an extended state space model that subsumes the preview. The solution from the previous sections will be interpreted in that context and a suboptimal feedback will be derived as a standard H∞ result. In the continuous time case, the extended state space model is an abstract evolution model [1]. Extended state space models were also used in the context of the discrete-time preview/smoothing problems, e.g., in [3]. The main disadvantage of a solution based solely on the extended state space is that the dimension of the resulting ARE grows with the length of the preview lag. The contribution
here, as compared with previous solutions, is in utilizing the interplay between the direct analysis in the original system, above, and the analysis in the extended state space model, to obtain a solution based on the ARE (3), whose dimension does not depend on the preview lag.

1) The extended state space model & solution: Capturing the entire relevant data at the time k, the complete state of the preview system is

  x̌_k = [x_k; w̆_k].

In developing a state space realization it will be convenient to shift the control input as follows:

  u_k = û_k − ϒ_κ^{−1}[B₂′X_κ  D₂′] [[A, B₁], [C, D₁]] [x_k; w_{k−l}].   (38)

The extended state space realization is then of the form

  x̌_{k+1} = Ǎx̌_k + B̌₁w_k + B̌₂û_k   (39)
  z_k = Čx̌_k + Ď₂û_k,

where

  [[Ǎ, B̌₁, B̌₂], [Č, Ď₁, Ď₂]] =
    [ A_κ  B_κ  0  ⋯  0 | 0 | B₂ ]
    [ 0    0    I  ⋯  0 | 0 | 0  ]
    [ ⋮    ⋮       ⋱  ⋮ | ⋮ | ⋮  ]
    [ 0    0    0  ⋯  I | 0 | 0  ]
    [ 0    0    0  ⋯  0 | I | 0  ]
    [ C_κ  D_κ  0  ⋯  0 | 0 | D₂ ].   (40)
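The structure of (40) is easy to mechanize. The sketch below assembles the lifted matrices from given blocks (written Ak, Bk, Ck, Dk for A_κ, B_κ, C_κ, D_κ; the numerical values in the example call are invented stand-ins) and a preview lag l, with the delay line w_{k−l}, …, w_{k−1} stacked beneath the state:

```python
import numpy as np

def extended_realization(Ak, Bk, B2, Ck, Dk, D2, l):
    """Assemble the lifted matrices of (40) for a preview lag l >= 1.

    Extended state: x_check = [x; w_{k-l}; ...; w_{k-1}].
    """
    n = Ak.shape[0]
    m = Bk.shape[1]        # dimension of w
    p = B2.shape[1]        # dimension of u-hat
    q = Ck.shape[0]        # dimension of z
    N = n + l * m          # extended state dimension

    A_ext = np.zeros((N, N))
    A_ext[:n, :n] = Ak
    A_ext[:n, n:n + m] = Bk                       # x_{k+1} picks up w_{k-l}
    for j in range(l - 1):                        # shift the delay line
        A_ext[n + j * m: n + (j + 1) * m,
              n + (j + 1) * m: n + (j + 2) * m] = np.eye(m)

    B1_ext = np.zeros((N, m))
    B1_ext[N - m:, :] = np.eye(m)                 # fresh w_k enters at the tail

    B2_ext = np.zeros((N, p))
    B2_ext[:n, :] = B2

    C_ext = np.zeros((q, N))
    C_ext[:, :n] = Ck
    C_ext[:, n:n + m] = Dk                        # z_k sees w_{k-l} through Dk

    return A_ext, B1_ext, B2_ext, C_ext, D2

# hypothetical scalar data, lag l = 2
Aext, B1ext, B2ext, Cext, D2ext = extended_realization(
    np.array([[0.5]]), np.array([[1.0]]), np.array([[1.0]]),
    np.array([[1.0]]), np.array([[0.2]]), np.array([[1.0]]), l=2)
```

Note that the lifted state dimension is n + l·m, so the ARE (41) grows linearly with the preview lag — exactly the growth that the fixed-size ARE (3) avoids.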
The following observation is immediate.

Observation 4.6: The input/output mapping in (1) is realized by (39), the latter satisfies A1 and A2, and the H∞ problems for (1) and for (39) are identical.

We quote the well known solution of the discrete-time full-information H∞ problem [22] (hereafter, we use the notations B̌ = [B̌₁ B̌₂] and Ď = [Ď₁ Ď₂]):

Theorem 4.1: The following are equivalent:
1) γ > γ_opt, in the complete information problem, in (39).
2) The Riccati equation

  X̌ = Ǎ′X̌Ǎ + Č′Č − (Ǎ′X̌B̌ + Č′Ď)Γ̌_γ^{−1}(B̌′X̌Ǎ + Ď′Č),   (41)

where Γ̌_γ = B̌′X̌B̌ + Ď′Ď − γ²[[I, 0], [0, 0]], admits the stabilizing solution X̌ ≥ 0 such that

  Γ̌_{γ12}Γ̌_{γ22}^{−1}Γ̌_{γ21} − Γ̌_{γ11} > 0   (42)

(the partitioning of Γ̌_γ corresponds to that of B̌ and Ď).

Furthermore, if γ > γ_opt indeed, then the optimal cost in the game (8) is x̌₀′X̌x̌₀ and

  û_k = −Γ̌_{γ22}^{−1}((B̌₂′X̌Ǎ + Ď₂′Č)x̌_k + B̌₂′X̌B̌₁w_k)   (43)

is one strictly γ-suboptimal solution.
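The Riccati equation (41) can be approached numerically by the value-iteration-like recursion X_{j+1} = Ǎ′X_jǍ + Č′Č − (Ǎ′X_jB̌ + Č′Ď)Γ̌(X_j)^{−1}(B̌′X_jǍ + Ď′Č), started from X₀ = 0, which for γ above the optimal level converges to the stabilizing solution. A toy scalar sketch (the plant data here is invented for illustration and is unrelated to the paper's development):

```python
import numpy as np

# scalar plant x+ = A x + B1 w + B2 u, z = (C x, D2 u); gamma chosen large
A = np.array([[0.5]])
B = np.array([[1.0, 1.0]])               # B = [B1 B2]
C = np.array([[1.0], [0.0]])
D = np.array([[0.0, 0.0], [0.0, 1.0]])   # D = [D1 D2]
gamma = 10.0
J = np.diag([gamma ** 2, 0.0])           # gamma^2 [I 0; 0 0]

X = np.zeros((1, 1))
for _ in range(500):
    Gam = B.T @ X @ B + D.T @ D - J
    K = B.T @ X @ A + D.T @ C
    X = A.T @ X @ A + C.T @ C - (A.T @ X @ B + C.T @ D) @ np.linalg.solve(Gam, K)

# residual of the Riccati equation (41)-analogue
Gam = B.T @ X @ B + D.T @ D - J
K = B.T @ X @ A + D.T @ C
res = A.T @ X @ A + C.T @ C - (A.T @ X @ B + C.T @ D) @ np.linalg.solve(Gam, K) - X
assert np.linalg.norm(res) < 1e-10

# condition (42)-analogue: Gam_12 Gam_22^{-1} Gam_21 - Gam_11 > 0 (all scalars)
assert Gam[0, 1] * Gam[1, 0] / Gam[1, 1] - Gam[0, 0] > 0

# the closed loop A_F = A - B Gam^{-1} K is stable
A_F = A - B @ np.linalg.solve(Gam, K)
assert abs(A_F[0, 0]) < 1
```

This iteration is only one of several standard routes to a stabilizing solution; invariant-subspace methods apply equally, as sketched earlier for the DARE.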
  χ = (X_κ + A_κ′^lX_γA_κ^l)x₀ + Σ_{i=−l}^{−1} (A_κ′^lX_γA_κ^{−i−1}(B_κ − A_κG_{l+i}E_κ′) + A_κ′^{l+i}E_κ′)w_i   (45a)

and, for k = −l, …, −1,

  η_k = (E_κA_κ^{k+l} + (B_κ′ − E_κG_{l+k}A_κ′)A_κ′^{−(k+1)}X_γA_κ^l)x₀ + Q_κw_k
      + (B_κ′ − E_κG_{l+k}A_κ′)A_κ′^{−(k+1)}X_γ Σ_{i=−l}^{−1} A_κ^{−(i+1)}(B_κ − A_κG_{l+i}E_κ′)w_i
      + E_κ Σ_{i=−l}^{k−1} A_κ^{k−1−i}B_κw_i + B_κ′ Σ_{i=k+1}^{−1} A_κ′^{i−k−1}E_κ′w_i
      − E_κ Σ_{i=−l}^{k−1} A_κ^{k−1−i}B₂ϒ_κ^{−1}B₂′ Σ_{j=i+1}^{−1} A_κ′^{j−i−1}E_κ′w_j.   (45b)

2) Analysis of the extended state space solution: A key fact, from our perspective, is that the solution X̌ is completely determined as the positive semi-definite kernel of the quadratic form for the optimal cost, which we have already computed, in (33): substituting ζ_{l−1} = X_γξ_l and the expression (23) for ξ_l into (33), the equality

  [χ; η̆] = X̌x̌₀   (44)

translates to the expressions in (45) at the top of this page, where, again, we simplify notations with the conventions that G_j = 0 whenever j ≤ 0 and A_κ^j = 0 whenever j < 0. It is observed that χ = p_{−1} is the initial co-state of the original Hamilton-Jacobi system, formed by (1), (11), and (28).

Equations (45) can be further utilized to derive explicit expressions for the coefficients in the ARE (41). We, however, only need the first block row of X̌ to express the coefficient of the control law (43) in terms of the parameters of (1), see §IV-C. We thus omit these cumbersome expressions. We do note, however, that by laborious yet straightforward computations, one shows that

  Γ̌_γ = [[I, E_κA_κ^{l−1}B₂ϒ_κ^{−1}], [0, I]] ϒ_γ [[I, 0], [ϒ_κ^{−1}B₂′A_κ′^{l−1}E_κ′, I]],   (46)

where ϒ_γ is defined in Theorem 2.1. Therefore, the inertias of Γ̌_γ and ϒ_γ coincide. Having obtained (46), it is noted that all the ingredients of the necessary conditions in Theorem 2.1 (i.e., the implications of the assumption "γ > γ_opt") are now established. That is, the proof of necessity in Theorem 2.1 is complete.

3) The proof of sufficiency: Our goal is to translate the existence of a solution of the ARE (3), with the specified properties, into validation of the feedback (4) as a stabilizing and strictly γ-suboptimal control policy. Given that (4) is identical to (43), as observed above, this would have been trivial, had we known that the extended matrix X̌, as defined by (45) and (44), is a solution of the ARE (41) with the properties specified in Theorem 4.1. This is roughly the direction in which our proof goes.

Henceforth it will thus be assumed that, indeed, the matrix X_γ is a solution of (3), satisfying the various conditions in Theorem 2.1, and that a matrix X̌ is defined over the extended state space, as in (45) and (44).

Observation 4.7: X̌ is a self adjoint, stabilizing solution of the ARE (41).

Proof: The fact that X̌ is self adjoint readily follows from its definition, in (45). Given initial data in (1), define ξ_l as in (23), ζ_{l−1} = X_γξ_l, and then a control u, a state x and auxiliary trajectories p and ζ, for k < l, by the expressions derived for the optimal trajectories, in §IV-A.1.b and §IV-A.1.c. The association of the ARE (3) with the Hamilton-Jacobi system (36) means that the trajectories ξ_{k+l} = F_X^kξ_l and ζ_{k+l−1} = X_γξ_{k+l}, k ≥ 0, define an ℓ₂ solution of (36). Denote x_{k+l} = ξ_{k+l} − G_lζ_{k+l−1} and p_{k+l−1} = X_κx_{k+l} + ζ_{k+l−1}. Given the invertibility of ϒ_κ (guaranteed by A1 and A2) and of Λ_γ (one of the requirements in Theorem 2.1), these trajectories can be associated with ℓ₂ trajectories of u and w, defined by (17) and (34), respectively, and a trajectory z, along the entire positive ray [0, ∞). Together, x, p, w, u and z form an ℓ₂ solution of the original (implicit) Hamilton-Jacobi system, comprising (1), (11), and (28). In particular, the computations made above show that the equality

  ‖z‖²_{ℓ₂} − γ²‖w‖²_{ℓ₂} = ⟨x̌₀, X̌x̌₀⟩

remains valid, albeit, so far, with no claim of optimality or uniqueness. We shall use these facts to establish that X̌ is a solution of (41).

As a preliminary observation, direct computation reveals that the characterization of the input in (1) via (17) and (34) (equivalently, by the second equation in (11) and by (28)) is identical to the characterization of the input in (39), via the following equation:

  0 = (B̌′X̌Ǎ + Ď′Č)x̌_k + Γ̌_γ[w_k; û_k].

Details of the algebra used to establish this equivalence are deferred to the end of this section. Thus,

  [w_k; û_k] = −Γ̌_γ^{−1}(B̌′X̌Ǎ + Ď′Č)x̌_k.   (47)

The implication here is that the state trajectory of the original system, with inputs given by (17) and (34), defines a trajectory x̌_k in (39), generated by Ǎ_F = Ǎ − B̌Γ̌_γ^{−1}(B̌′X̌Ǎ + Ď′Č). The fact that X̌ satisfies (41) is now obtained from the following set of equalities, which hold for any x̌₀:

  ⟨x̌₀, X̌x̌₀⟩ = ‖z‖²_{ℓ₂[0,∞)} − γ²‖w‖²_{ℓ₂[0,∞)}
            = ‖z₀‖² − γ²‖w₀‖² + ‖z‖²_{ℓ₂[1,∞)} − γ²‖w‖²_{ℓ₂[1,∞)}
            = ‖z₀‖² − γ²‖w₀‖² + ⟨x̌₁, X̌x̌₁⟩
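The factorization (46) is a congruence: Γ̌_γ = Tϒ_γT′ with T unit triangular, so Sylvester's law of inertia yields the claimed coincidence of inertias. The principle is easy to confirm numerically with generic data (the matrices below are arbitrary stand-ins, not the paper's Γ̌_γ and ϒ_γ):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 2, 3
N = n1 + n2

# an arbitrary symmetric matrix standing in for Upsilon_gamma
M = rng.standard_normal((N, N))
Ups = (M + M.T) / 2

# unit upper-triangular congruence transformation, as in (46)
T = np.eye(N)
T[:n1, n1:] = rng.standard_normal((n1, n2))
G = T @ Ups @ T.T

def inertia(S):
    """Counts of (positive, zero, negative) eigenvalues of a symmetric S."""
    lam = np.linalg.eigvalsh(S)
    return (int(np.sum(lam > 1e-10)),
            int(np.sum(np.abs(lam) <= 1e-10)),
            int(np.sum(lam < -1e-10)))

assert inertia(G) == inertia(Ups)
```

Since T is invertible, the signs of the eigenvalues (though not their values) are preserved, which is all the proof of necessity requires.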
            = ‖(Č − ĎΓ̌_γ^{−1}(B̌′X̌Ǎ + Ď′Č))x̌₀‖² − γ²‖[I  0]Γ̌_γ^{−1}(B̌′X̌Ǎ + Ď′Č)x̌₀‖² + ⟨x̌₀, Ǎ_F′X̌Ǎ_Fx̌₀⟩.
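The chain of equalities above rests on the one-step identity ⟨x̌₀, X̌x̌₀⟩ = ‖z₀‖² − γ²‖w₀‖² + ⟨x̌₁, X̌x̌₁⟩ along the feedback (47), which holds exactly for any solution of the Riccati equation. A scalar numerical check (the plant data is invented for illustration; the solution is computed by iterating the Riccati map):

```python
import numpy as np

A = np.array([[0.5]])
B = np.array([[1.0, 1.0]])               # [B1 B2]
C = np.array([[1.0], [0.0]])
D = np.array([[0.0, 0.0], [0.0, 1.0]])   # [D1 D2]
gamma = 10.0
J = np.diag([gamma ** 2, 0.0])

X = np.zeros((1, 1))
for _ in range(500):                     # solve the Riccati equation by iteration
    Gam = B.T @ X @ B + D.T @ D - J
    K = B.T @ X @ A + D.T @ C
    X = A.T @ X @ A + C.T @ C - (A.T @ X @ B + C.T @ D) @ np.linalg.solve(Gam, K)

Gam = B.T @ X @ B + D.T @ D - J
K = B.T @ X @ A + D.T @ C

x0 = np.array([[1.0]])
v = -np.linalg.solve(Gam, K) @ x0        # v = [w0; u0], the feedback (47)
w0 = v[:1]
x1 = A @ x0 + B @ v
z0 = C @ x0 + D @ v

lhs = (x0.T @ X @ x0).item()
rhs = (z0.T @ z0).item() - gamma ** 2 * (w0.T @ w0).item() + (x1.T @ X @ x1).item()
assert abs(lhs - rhs) < 1e-8
```

Expanding the right hand side and substituting the feedback reproduces the Riccati equation itself, which is exactly how the text identifies X̌ with the kernel of the last expression.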
Since the equality holds for all x̌₀, the matrix kernel of the first expression, which is X̌, must be equal to the matrix kernel of the last expression. It is a matter of simple algebra to see that the latter is no other than the right hand side of (41). Again, details are deferred to the end of this section.

To establish stability of Ǎ_F we appeal again to the observation that the feedback (47) is equivalent to the combined (17) and (34). This means that the trajectory generated by Ǎ_F is eventually defined by an associated trajectory of ξ, generated by F_X. The stability of F_X implies exponential decay of ξ_k and ζ_k, relative to the initial data, hence of the associated trajectories of x, p, u and w, and eventually, of the trajectory x̌ that is generated by Ǎ_F.

Observation 4.8: X̌ is non-negative.

Proof: Given any exogenous input w, the control policy (43) and state response x̌, in (39), denote

  Δw_k = (Γ̌_{γ12}Γ̌_{γ22}^{−1}Γ̌_{γ12}′ − Γ̌_{γ11})^{1/2}w_k
       + (Γ̌_{γ12}Γ̌_{γ22}^{−1}Γ̌_{γ12}′ − Γ̌_{γ11})^{−1/2}(B̌₁′X̌Ǎ − Γ̌_{γ12}Γ̌_{γ22}^{−1}(B̌₂′X̌Ǎ + Ď₂′Č))x̌_k.   (48)

As Γ̌_{γ12}Γ̌_{γ22}^{−1}Γ̌_{γ12}′ − Γ̌_{γ11} > 0 (by (46) the inertias of Γ̌_γ and ϒ_γ coincide), this expression is well defined. The following standard equality is obtained by completion of squares. It holds over any finite interval, with any input w:

  ‖z‖²_{ℓ₂[0,k]} − γ²‖w‖²_{ℓ₂[0,k]} + ‖Δw‖²_{ℓ₂[0,k]} + ⟨x̌_k, X̌x̌_k⟩ = ⟨x̌₀, X̌x̌₀⟩,   k > l.   (49)

Select w_i = 0 for all i ≥ 0. Then w̆_k = 0 for k ≥ l, where (by (45)) ⟨x̌_k, X̌x̌_k⟩ reduces to ⟨x_k, (X_κ + A_κ′^lX_γA_κ^l)x_k⟩. Thus, then (49) becomes

  ‖z‖²_{ℓ₂[0,k]} + ‖Δw‖²_{ℓ₂[0,k]} + ⟨x_k, (X_κ + A_κ′^lX_γA_κ^l)x_k⟩ = ⟨x̌₀, X̌x̌₀⟩,   k > l.
Since both X_κ ≥ 0 and (by assumption) X_γ ≥ 0, the left hand side is nonnegative. This proves that X̌ ≥ 0.

At this point one can invoke the sufficient condition in Theorem 4.1, applied to (39) and the ARE (41), to conclude that, indeed, the feedback (43) (equivalently, (4)) is strictly γ-suboptimal, and in particular, that γ > γ_opt.

C. Controller formula

To complete the proof of Theorem 2.1 we only need to show that (4) is equivalent to (38) and (43). Toward this end, denote by [X̌₀ X̌₁ … X̌_l] the first block row of X̌, partitioned compatibly with x̌, and also X̂ = X_κ + X̌₀. Then, using (40),

  û_k = −ϒ_{γ22}^{−1}[B₂′X̌₀  D₂′] [[A_κ, B_κ], [C_κ, D_κ]] [x_k; w_{k−l}] − ϒ_{γ22}^{−1}B₂′ Σ_{i=−l}^{−1} X̌_{−i}w_{k+i+1}.
It is readily seen that the summation term above is exactly the summation term in (4). Then,

  [B₂′X̌₀  D₂′] [[A_κ, B_κ], [C_κ, D_κ]]
    = [B₂′X̌₀  D₂′] [[A, B₁], [C, D₁]] − (B₂′X̌₀B₂ + D₂′D₂)ϒ_κ^{−1}[B₂′X_κ  D₂′] [[A, B₁], [C, D₁]]
    = [B₂′X̂  D₂′] [[A, B₁], [C, D₁]] − ϒ_{γ22}ϒ_κ^{−1}[B₂′X_κ  D₂′] [[A, B₁], [C, D₁]]

(the last equality is obtained using the fact that ϒ_{γ22} = Γ̌_{γ22} = B₂′X̂B₂ + D₂′D₂). Now, the second term in the right-hand side above, pre-multiplied by ϒ_{γ22}^{−1}, cancels the second term in the right-hand side of (38). We thus obtain (4) by noticing that X̂ = X_κ + A_κ′^lX_γA_κ^l. The proof of Theorem 2.1 is now complete.

V. CONCLUDING REMARKS

In retrospect, a main stumbling block to the application of standard H∞ results to the two problems considered here concerns the explicit formulation of an appropriate Hamilton-Jacobi system. The breakthrough in this respect is the introduction of appropriate concepts of a state and co-state (here ξ and ζ): appealing to differential games reasoning, the state and co-state are defined as the vectors capturing the impact of past inputs and the effects of future inputs, respectively. A second challenge is due to the presence of input delays. The approach taken here builds on developments from the mid 1990's, whereby the solution of a delay problem exploits the interplay between the original delay model and an equivalent, extended state space model without delay: the optimal value of the differential game is computed, using finite dimensional tools, in the former, and the format of the compensator, as well as the suboptimality arguments, are obtained by exploiting the simpler formalism of the latter. While still technically challenging (relative to other common H∞ problems), henceforth the road toward a solution followed familiar milestones. An important aspect, as compared with earlier results, is that the necessary and sufficient characterization of suboptimal values, in a general setting, is in terms of a single ARE whose size is the same as the original state space dimension.
This improves on solutions which required higher dimensional equations, iterative algorithms, or stringent sufficient but not necessary conditions.

REFERENCES

[1] G. Tadmor and L. Mirkin, “H∞ control and estimation with preview—Part I: Matrix ARE solutions in continuous time,” IEEE Trans. Automat. Control, vol. 50, no. 1, 2005 (to appear).
[2] M. J. Grimble, “H∞ fixed-lag smoothing filter for scalar systems,” IEEE Trans. Signal Processing, vol. 39, no. 9, pp. 1955–1963, 1991.
[3] Y. Theodor and U. Shaked, “Game theory approach to H∞-optimal discrete-time fixed-point and fixed-lag smoothing,” IEEE Trans. Automat. Control, vol. 39, no. 9, pp. 1944–1948, 1994.
[4] H. Zhang, L. Xie, and Y. C. Soh, “H∞ deconvolution filtering, prediction, and smoothing: A Krein space polynomial approach,” IEEE Trans. Signal Processing, vol. 48, no. 3, pp. 888–892, 2000.
[5] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[6] H. Zhang, L. Xie, and Y. C. Soh, “A unified approach to linear estimation for discrete-time systems. II. H∞ estimation,” in Proc. 40th IEEE Conf. Decision and Control, Orlando, FL, 2001, pp. 2923–2928.
[7] A. C. Kahane, L. Mirkin, and Z. J. Palmor, “On the discrete-time H∞ fixed-lag smoothing,” in Proc. 15th IFAC World Congress, Barcelona, Spain, 2002.
[8] P. Bolzern, P. Colaneri, and G. De Nicolao, “H∞ smoothing in discrete-time: A direct approach,” in Proc. 41st IEEE Conf. Decision and Control, Las Vegas, NV, 2002, pp. 4233–4238.
[9] P. Colaneri and A. Ferrante, “A J-spectral factorization approach for H∞ estimation problems in discrete time,” IEEE Trans. Automat. Control, vol. 47, no. 12, pp. 2108–2113, 2002.
[10] A. Kojima and S. Ishijima, “Robust controller design for delay systems in the gap-metric,” IEEE Trans. Automat. Control, vol. 40, no. 2, pp. 370–374, 1995.
[11] G. Tadmor, “Robust control in the gap: A state space solution in the presence of a single input delay,” IEEE Trans. Automat. Control, vol. 42, no. 9, pp. 1330–1335, 1997.
[12] M. C. Delfour, E. B. Lee, and A. Manitius, “F-reduction of the operator Riccati equation for hereditary differential equations,” Automatica, vol. 14, pp. 385–395, 1978.
[13] M. C. Delfour and A. Manitius, “The structural operator F and its role in the theory of retarded systems,” J. Math. Analysis and Appl., vol. 73, pp. 466–490, 1980.
[14] D. Salamon, Control and Observation of Neutral Systems. Pitman, 1984.
[15] V. Ionescu, C. Oară, and M. Weiss, Generalized Riccati Theory and Robust Control. Chichester: John Wiley & Sons, 1999.
[16] L. Mirkin, “On the H∞ fixed-lag smoothing: How to exploit the information preview,” Automatica, vol. 39, no. 8, pp. 1495–1504, 2003.
[17] L. Mirkin and G. Meinsma, “When does the H∞ fixed-lag smoothing performance saturate for a finite smoothing lag?” IEEE Trans. Automat.
Control, vol. 49, no. 1, pp. 131–134, 2004.
[18] L. Mirkin and G. Tadmor, “Fixed-lag smoothing as a constrained version of the fixed-interval case,” in Proc. 2004 American Control Conf., Boston, MA, 2004, pp. 4165–4170.
[19] B. Hassibi, A. H. Sayed, and T. Kailath, Indefinite Quadratic Estimation and Control: A Unified Approach to H2 and H∞ Theories. Philadelphia: SIAM, 1999.
[20] G. Tadmor, “Worst case design in the time domain: The maximum principle and the standard H∞ problem,” Math. Control, Signals and Systems, vol. 3, pp. 301–324, 1990.
[21] ——, “The standard H∞ problem and the maximum principle: The general linear case,” SIAM J. Control Optim., vol. 31, pp. 831–846, 1993.
[22] A. A. Stoorvogel, The H∞ Control Problem: A State Space Approach. London: Prentice-Hall, 1992.
Gilead Tadmor received a B.Sc. from Tel Aviv University in 1977, and the M.Sc., in 1979, and Ph.D., in 1984, from the Weizmann Institute of Science, Israel, all in mathematics (systems and control). In 1989 he joined Northeastern University where he is a professor of Electrical & Computer Engineering and the Director of the Communications & Digital Signal Processing Center for Research and Graduate Studies. Previously he held research and faculty positions at Tel Aviv University, Brown University, the University of Texas (Dallas) and the Laboratory for Information and Decision Systems, at M.I.T. During 1998/9 he visited SatCon Technology Co., Cambridge, MA, and during the summer of 2001, the Air Force Research Laboratory at Wright-Patterson Air Force Base. He has consulted at SatCon, United Technologies Research Center and at Corning Applied Technologies. Dr. Tadmor’s background is in the areas of robust and optimal control, distributed parameter systems, and mathematical systems theory. His recent active interests include robust and nonlinear control with applications in modeling and control of mechanical and fluid flow systems, dynamic imaging and neural systems.
Leonid Mirkin was born in Frunze, USSR (now Bishkek, Kyrgyz Republic) in 1967. He received the electrical engineer degree from Frunze Polytechnic Institute and the PhD (candidate of sciences) degree in automatic control from the Institute of Automation, Academy of Sciences of Kyrgyz Republic, in 1989 and 1992, respectively. From 1989 to 1993 he was with the Institute of Automation, Academy of Sciences of Kyrgyz Republic. In 1994 he joined the Faculty of Mechanical Engineering at the Technion — Israel Institute of Technology, where he is currently an Associate Professor. His research interests include control and estimation of sampled-data systems, deadtime compensation, control and estimation for systems with preview, and the application of control to mechanical and optical devices and A/D data converters.