Conditional Central Algorithms for Worst-Case Set Membership Identification and Filtering

A. Garulli, A. Vicino and G. Zappa

Abstract— This paper deals with conditional central estimators in a set membership setting. The role and importance of these algorithms in identification and filtering is illustrated by showing that problems like worst-case optimal identification and state filtering, in contexts where disturbances are described through norm bounds, reduce to the computation of conditional central algorithms. The conditional Chebyshev center problem is solved for the case when energy norm bounded disturbances are considered. A closed form solution is obtained in terms of finding the unique real root of a polynomial equation in a semi-infinite interval.

Keywords— Identification, filtering, set membership, bounded disturbances, conditional estimation.
I. Introduction

Identification for robust control has assumed a key role in the past few years. The basic objective of this research area is to design identification schemes providing variables tailored to the robust control design techniques developed in the last two decades. The essential feature of these methods is to construct a mixed model of uncertainty, consisting of a parametric model and an associated nonparametric perturbation. Loosely speaking, the parametric part accounts for the frequency behavior of the system within the control bandwidth, while the nonparametric part accounts for the unmodelled dynamics always present in any model of a real system. The objective is to derive an estimate of the parametric model and of the magnitude of the associated nonparametric perturbation, such that the real system is guaranteed to belong to the set of admissible plants associated with the estimated model. First efforts along this line of research can be found in [1], [2], [3]. Both H∞ and ℓ1 error criteria have been adopted (see [4], [5], [6] for more extensive lists of references). The main problem with these approaches is the overparameterization of the estimated models, which are generally of high order, with a number of parameters larger than or equal to the number of data. Obviously, this fact makes synthesis techniques like H∞ or ℓ1 control more difficult to use in practical applications. Moreover, for a number of additional reasons, a "parsimonious" model of the system is more acceptable in real situations. Easy interpretation of the dominant system dynamics, limited complexity of the synthesized controller and tractability of the robustness problem for the overall control system are only some of the reasons why simple models are used in common practice.

A. Garulli and A. Vicino are with the Dipartimento di Ingegneria dell'Informazione, Università di Siena, 53100 Siena, Italy (e-mail: [email protected]). G. Zappa is with the Dipartimento di Sistemi e Informatica, Università di Firenze, 50139 Firenze, Italy.
These reasons motivate the use of restricted (or reduced) complexity models, which have been investigated in many contributions, either in a "hard bound" context or in a "soft bound" setting of the identification problem [7], [8]. For a given model parameterization, the problem consists in choosing an estimate in a subset of the parameter space with a prescribed structure, and an associated bound on the nonparametric perturbation. The simplest example of this situation is when the estimate of the unknown variable is constrained to belong to a linear manifold in the parameter space. In this paper an approach founded on Information Based Complexity (IBC) [9], [10] with set membership description of uncertainty is taken. A linear framework is considered. The purpose is to characterize restricted complexity optimal estimators in a worst-case setting and to evaluate the associated nonparametric estimation error. Reduced complexity estimators are called conditional algorithms in the IBC literature [11]. Conditional algorithms deserve special attention also for the role they play in the important area of state estimation with norm bounded disturbances. In fact, it can be shown that the optimal filtering problem in a set membership setting can be reduced to the computation of a conditional central algorithm (see [12]). Unfortunately, these algorithms are difficult to compute, except for some special situations (see e.g. [11]). In several papers, suboptimal conditional estimators have been proposed for the problem of H2 and H∞ estimation ([8], [13], [14]). Although in many cases suboptimal estimators provide acceptable results, the problem of quantifying how much their worst-case error differs from the optimal one is to a large extent unsolved (see [14], [15] for preliminary results).
The main objective of this paper is to characterize the solution of the conditional optimal estimation problem and to provide an efficient procedure for the computation of the conditional center of the feasible problem element set, i.e. the set of all problem elements compatible with the a priori assumptions and the measured data, for the case when disturbances are bounded according to the energy norm. For this case, it turns out that the conditional optimal estimator can be derived in closed form, as a function of the unique real solution of an algebraic equation in a semi-infinite interval.

The paper is organized as follows. Section 2 illustrates the problem formulation and shows how two important estimation problems, namely worst-case identification and set membership state filtering, lead naturally to conditional optimal estimators. Section 3 provides the main results on conditional central estimators and a procedure for computing the conditional center of an ellipsoidal set. In Section 4 numerical simulations on worst-case system identification are shown, while concluding remarks are reported in Section 5.

The notation in the paper is standard: ′ denotes transpose, In is the n × n identity matrix, ‖·‖X is the norm adopted in the normed space X. The linear subspace spanned by the columns of A is denoted by R(A). If A is a set, δA denotes its boundary.
II. Problem formulation

Consider the following linear set-membership estimation setting [10]. Let X and Y be linear normed spaces, and

y = F x + e ,   (1)

where y ∈ Y is the measurement vector, x ∈ X is the unknown problem element that must be estimated, and e is the error vector, which is supposed to be unknown but bounded in some norm

‖e‖Y ≤ ε .   (2)

Let M ⊂ X be a p-dimensional linear manifold. The aim is to find a restricted complexity algorithm (or estimator) φ : Y → M, providing an estimate z = φ(y) ∈ M of the problem element x. The class of restricted complexity (or conditional) estimators will be denoted by ΦR. The quality of the estimate is measured according to the worst-case Y-local error

Ey(φ) = sup_{x ∈ FPSy} ‖φ(y) − x‖X   (3)

where

FPSy = {x ∈ X : ‖F x − y‖Y ≤ ε}   (4)

is the feasible problem element set, i.e. the set of all the problem elements compatible with the available measurement y and the error bound. Let us now introduce the notion of conditional Chebyshev center of a set. Given two sets S, Z ⊆ X, the center of S, "conditioned" to Z, is defined as

cZ(S) = arg inf_{z∈Z} sup_{x∈S} ‖x − z‖X .   (5)

Hence, the conditional central algorithm

φcc(y) = cM(FPSy)   (6)

enjoys the following property [11].

Proposition 1: φcc is Y-locally optimal in the class of conditional estimators ΦR:

Ey(φcc) ≤ Ey(φ)   ∀y , ∀φ ∈ ΦR .

The optimal error Ey(φcc) is called conditional radius of information. When X = IR^n and energy-bounded errors are of concern, the set FPSy is an n-dimensional ellipsoid. Therefore, the conditional central algorithm φcc requires the computation of the Chebyshev center of an ellipsoid, conditioned to belong to a linear manifold. Before we attempt to characterize the solution of this problem, we show two examples of restricted complexity set-membership estimation, in which the algorithm φcc plays an important role.

A. Set-membership worst-case identification

Consider the linear, time-invariant, discrete-time, SISO system, described by the input/output relationship

yk = Σ_{i=0}^{∞} hi u_{k−i} .   (7)

It is assumed that the impulse response sequence h = {hi}_{i=0}^{∞} belongs to a linear normed space H, equipped with the norm ‖·‖H. An identification experiment consists in collecting N + 1 noisy input/output pairs {(uk, yk), k = 0, 1, . . . , N}, related by

yk = Σ_{i=0}^{k} hi u_{k−i} + vk   (8)

where vk is the disturbance affecting the k-th measurement, and it has been assumed that uk = 0, for k < 0. The relationship (8) can be rewritten as

y^N = UN h^N + v^N

where y^N = [y0 y1 . . . yN]′, h^N = [h0 h1 . . . hN]′, v^N = [v0 v1 . . . vN]′ and UN is the (N+1) × (N+1) lower triangular Toeplitz matrix

UN = [ u0 0 . . . 0 ; u1 u0 . . . 0 ; . . . ; uN u_{N−1} . . . u0 ] .
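As an aside for readers who wish to experiment, the convolution model (8) and the lower triangular Toeplitz matrix UN are easy to reproduce numerically. The following sketch is our own illustration, not part of the paper; all numerical values are toy choices.

```python
import numpy as np

# Illustrative sketch (ours, not from the paper): building the lower
# triangular Toeplitz matrix U_N of (8) and the stacked data
# y^N = U_N h^N + v^N for a toy truncated impulse response.

def toeplitz_lower(u):
    """Return U_N with (U_N)_{k,i} = u_{k-i} for i <= k, and 0 otherwise."""
    n = len(u)
    U = np.zeros((n, n))
    for k in range(n):
        for i in range(k + 1):
            U[k, i] = u[k - i]
    return U

N = 4                                          # N + 1 = 5 samples
u = np.ones(N + 1)                             # step input: u_k = 1 for k >= 0
h = np.array([1.0, 0.5, 0.25, 0.125, 0.0625])  # toy h^N
v = np.zeros(N + 1)                            # noiseless, for an exact check
U_N = toeplitz_lower(u)
y = U_N @ h + v

# For a step input, y_k is the cumulative sum of h_0 ... h_k
assert np.allclose(y, np.cumsum(h))
```

For a step input the stacked model reduces to a cumulative sum of the impulse response, which the final assertion checks.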
The aim of the identification experiment is to estimate the unknown impulse response h on the basis of measurements and suitable a priori information. A priori information is necessary in order to bound the set of feasible systems, because measurements provide information only on the first N + 1 samples of the system impulse response. More specifically, we assume that v^N is bounded in some norm, ‖v^N‖Y ≤ ε, and the unknown impulse response h belongs to a given subset K of H. It is convenient to introduce the truncation operator TN in H, as TN : h → h^N. Hence, (I − TN)h denotes the tail of h and TN(K) is the set of truncated sequences obtained by applying the operator TN to every element of K. A mild assumption on K adopted here, according to most of the literature, is that (I − TN)(K) is a bounded balanced set. Moreover, in order to get rid of unnecessary mathematical complications, it is assumed that a priori information concerns only the tail of the impulse response sequence, i.e. TN K = TN H (see e.g. [8]). Now, define the Feasible System Set FSSy corresponding to the measurements {u^N, y^N}, i.e. the set of impulse response sequences compatible with the measurements and the a priori information, as

FSSy = {h ∈ K : ‖y^N − UN h^N‖Y ≤ ε} .   (9)

Notice that FSSy is bounded if and only if u0 ≠ 0. We look for an estimate of h in a restricted class of linearly parameterized models M

M = {h : h = Mh α, α ∈ IR^p}   (10)

where Mh is a linear operator, Mh : IR^p → H. M is a linear manifold of H and α is the parameter vector to be identified. In system identification theory, the basis of M is typically chosen as a collection of impulse responses of linear filters, such as Laguerre, Kautz, and other orthonormal functions (see e.g. [16], [17]). In the above context, an estimator is a mapping φ from y^N to the parameter space IR^p. For a given φ, the local worst-case error is defined as

Ey(M, φ) = sup_{h ∈ FSSy} ‖h − Mh φ(y^N)‖H   (11)
and it accounts for the maximum "gap" between the selected model and the set of feasible systems. The local error (11) depends on the choice of the model class M. The following result says that, for many norms, it is convenient to select M such that M ⊂ TN H (see [8], [18]).

Proposition 2: Assume that the set (I − TN)(K) is balanced and that the norm ‖·‖H satisfies

‖h‖H^r = ‖TN h‖H^r + ‖(I − TN)h‖H^r   (12)

for some positive integer r. Then

Ey(TN M, φ) ≤ Ey(M, φ)   ∀φ, ∀y.

Notice that assumption (12) is satisfied for every ℓq norm, 1 ≤ q < ∞. On the contrary, for the H∞ norm

‖h‖H∞ = sup_{ω∈[0,2π]} | Σ_{k=0}^{∞} hk e^{−jωk} | ,

the R.H.S. of (12) is only an upper bound (for r = 1). In the following discussion, we will assume M ⊂ TN H, i.e.

M = {h^N ∈ IR^N : h^N = M α, α ∈ IR^p, p < N}

where M ∈ IR^{N×p} defines the chosen pth-order model class M. Then, if Proposition 2 applies, the worst-case error is given by

Ey(M, φ) = ( sup_{h^N ∈ TN(FSSy)} ‖h^N − M φ(y^N)‖H^r + sup_{h∈K} ‖(I − TN)h‖H^r )^{1/r}   (13)

where TN(FSSy) = {h^N ∈ TN H : ‖y^N − UN h^N‖Y ≤ ε}. Since the term depending on K can be computed a priori, the domain of search for the optimal estimate can be restricted to the finite dimensional space IR^N. In particular, Proposition 1 guarantees that the conditional central algorithm

φcc(y^N) = cM(TN(FSSy)) = arg inf_{α∈IR^p} sup_{h^N ∈ TN(FSSy)} ‖h^N − M α‖H   (14)

provides the worst-case optimal impulse response estimate, among the reduced-order models M. If Proposition 2 does not apply, even if M ⊂ TN H, the R.H.S. of (13) provides an upper bound of the worst-case error Ey(M, φ), and φcc minimizes the value of this upper bound.

B. Set-membership state filtering

Consider the linear, time-varying system

x_{k+1} = Ak xk + wk
yk = Ck xk + vk   k = 0, . . . , N   (15)

where xk ∈ IR^{nx} is the state, yk ∈ IR^{ny} the output, wk the process error and vk the measurement noise. Let x̂0 be a given estimate of the initial state x0. According to the framework introduced above, equations (15) can be written as y^N = FN x^N + e^N, where

x^N = [x0′ x1′ . . . xN′]′
y^N = [x̂0′ 0 . . . 0 y0′ . . . yN′]′
e^N = [(x̂0 − x0)′ w0′ . . . w_{N−1}′ v0′ . . . vN′]′

and FN is obtained by stacking the block rows [I_{nx} 0 . . . 0], the state equation rows [A0 −I_{nx} 0 . . . 0], [0 A1 −I_{nx} . . . 0], . . . , [0 . . . A_{N−1} −I_{nx}], and the output equation rows [C0 0 . . . 0], [0 C1 . . . 0], . . . , [0 . . . 0 CN].

The vector e^N contains all the uncertainties in the system. An upper bound on some norm of e^N is assumed to be known, ‖e^N‖Y ≤ ε. Assume that one wants to estimate xk on the basis of x̂0, y0, y1, . . . , yk, for each k = 1, . . . , N (filtering problem). Let x̂_{j|j} denote the estimate of xj on the basis of the information up to time j (i.e. y^j), and let

FFS_{y^j} = {x^j : ‖Fj x^j − y^j‖Y ≤ ε}

be the Feasible Filtering Set, containing the state sequences compatible with y^j. Then, at time k the desired state sequence estimate has the form x̂^k = [x̂′_{0|0} x̂′_{1|1} . . . x̂′_{k|k}]′ and cannot be freely selected among all the elements of FFS_{y^k}.
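The stacked reformulation y^N = FN x^N + e^N can be illustrated numerically. The sketch below is our own illustration with arbitrary scalar data (nx = ny = 1, N = 2), not code from the paper; it assembles FN and verifies that the identity holds by construction.

```python
import numpy as np

# Hedged sketch: stacking the time-varying system (15) into y^N = F_N x^N + e^N
# for scalar state/output (nx = ny = 1) and N = 2. All values are illustrative.
rng = np.random.default_rng(0)
N, A, C = 2, 0.9, 1.0                     # constant A_k, C_k for simplicity
x = np.zeros(N + 1)
w = rng.normal(size=N) * 0.1              # process errors w_0 ... w_{N-1}
v = rng.normal(size=N + 1) * 0.1          # measurement noises v_0 ... v_N
x[0] = 1.0
for k in range(N):
    x[k + 1] = A * x[k] + w[k]
ymeas = C * x + v
xhat0 = x[0] + 0.05                       # prior estimate of the initial state

# Stacked quantities, as in the text
xN = x.copy()
yN = np.concatenate(([xhat0], np.zeros(N), ymeas))
eN = np.concatenate(([xhat0 - x[0]], w, v))

FN = np.zeros((1 + N + (N + 1), N + 1))
FN[0, 0] = 1.0                            # I_{nx} row: x_0 = xhat_0 - (xhat_0 - x_0)
for k in range(N):                        # state rows: A x_k - x_{k+1} + w_k = 0
    FN[1 + k, k] = A
    FN[1 + k, k + 1] = -1.0
for k in range(N + 1):                    # output rows: C x_k + v_k = y_k
    FN[1 + N + k, k] = C

assert np.allclose(yN, FN @ xN + eN)
```

The final assertion simply confirms that every block row of FN reproduces the corresponding equation of (15) or the initial-state prior.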
In fact, at time k the filtered estimates x̂_{0|0} . . . x̂_{k−1|k−1} are fixed, since they have already been computed at previous time instants. Therefore, the new estimate x̂^k must belong to the set

Mk = {[x0′ . . . xk′]′ : xi = x̂_{i|i}, i = 0, . . . , k − 1}

which is an nx-dimensional linear manifold in IR^{(k+1)nx}. The set-membership state filtering problem can be summarized as follows: at each time k, find an estimate x̂^k ∈ Mk of x^k ∈ FFS_{y^k}. According to Proposition 1, the conditional central algorithm

x̂_{k|k} = φcc(y^k) = cMk(FFS_{y^k})

is the optimal set-membership state filter, according to the worst-case error defined in (3).

III. Conditional central algorithm for energy bounded disturbances

In this section, an efficient procedure for the computation of the conditional central algorithm is presented for the case when energy bounded disturbances are considered and errors are measured in the energy norm. This means that the ℓ2 norm is used in (2) and (3). In the worst-case identification context, this corresponds to the H2 identification problem which has been widely addressed in the recent literature (see e.g. [17], [19], [20]). For ease of notation, the ℓ2 norm will be denoted by ‖·‖ in the following. Without loss of generality, the problem can be reformulated as follows, through a suitable change of coordinates.

Problem [CCC (Conditional Chebyshev Center)]: For a given axis-aligned ellipsoid

E = {x ∈ IR^n : x′Qx ≤ 1}   (16)
Q = diag{qi}_{i=1}^{n} ,  0 < q1 ≤ q2 ≤ · · · ≤ qn   (17)

and a p-dimensional linear manifold (p < n)

M = {z ∈ IR^n : z = z^o + M α, α ∈ IR^p}   (18)
M′M = Ip   (19)
M′z^o = 0 ,   (20)

find zcc ∈ M such that

zcc = arg min_{z∈M} max_{x∈E} ‖x − z‖² .   (21)

If the distance of a point z from a set E is defined as

dE(z) = max_{x∈E} ‖x − z‖² ,   (22)

we can say that zcc is the element of the linear manifold M whose distance from the ellipsoid E is minimum. Notice that, while in general (22) is nonconvex, the problem

min_{z∈M} dE(z)

is a convex optimization problem, because dE(z) is convex in z. The aim of this section is to exploit the special structure of this problem, in order to characterize the solution of the conditional center problem.

A. Characterization of the conditional Chebyshev center

To start with, let us recall a necessary condition for the solution of a general min-max problem (see [21]). Let

SE(z) = {ξ ∈ E : ‖ξ − z‖² = dE(z)}   (23)

be the set of points of E where the maximum in (22) is achieved.

Theorem 1: A necessary and sufficient condition for the function dE(z) to achieve its minimum on M, at a point zm ∈ M, is that

min_{g∈R(M), ‖g‖=1}  max_{ξ∈SE(zm)}  g′(ξ − zm) ≥ 0 .   (24)
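For intuition, problem (21) can be checked by brute force in low dimension. The following sketch is our own illustration (the values of Q, m and z^o are arbitrary toy choices, not from the paper): it samples the boundary of a 2-D ellipsoid E and a line M, approximates zcc, and confirms numerically that dE is convex along M, as claimed above.

```python
import numpy as np

# Brute-force illustration of the min-max problem (21) for n = 2, p = 1.
# All numerical values (q, m, z_o) are toy choices of ours, not from the paper.
q = np.array([0.25, 1.0])                      # Q = diag(q): semiaxes 2 and 1
theta = np.linspace(0.0, 2.0 * np.pi, 2000)
bd = np.stack([np.cos(theta) / np.sqrt(q[0]),  # boundary of the ellipsoid E
               np.sin(theta) / np.sqrt(q[1])])

def d_E(z):
    """Sampled version of (22): max over E of ||x - z||^2 (attained on the boundary)."""
    return np.max(np.sum((bd - z[:, None]) ** 2, axis=0))

# Manifold M: z = z_o + alpha * m, with ||m|| = 1 and m' z_o = 0, cf. (18)-(20)
m = np.array([1.0, 1.0]) / np.sqrt(2.0)
z_o = 0.3 * np.array([1.0, -1.0]) / np.sqrt(2.0)
alphas = np.linspace(-3.0, 3.0, 1201)
vals = np.array([d_E(z_o + a * m) for a in alphas])
z_cc = z_o + alphas[np.argmin(vals)] * m       # approximate conditional center

assert np.min(vals) >= 1.0 / q[0] - 1e-9       # d_E >= squared largest semiaxis
assert np.all(np.diff(vals, n=2) >= -1e-8)     # d_E is convex along M
```

Since `vals` is a pointwise maximum of convex quadratics in α, its discrete second differences are nonnegative, matching the convexity property of dE(z) used above.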
According to the previous result, we need to characterize the set SE(z) for a generic point z. In order to simplify the discussion, let us assume that q1 < q2 ≤ q3 ≤ . . . , i.e. there is a unique largest semiaxis of the ellipsoid E (the extension to the general case being straightforward). Moreover, let us define z̄ = [z2 . . . zn]′ and Q̄ = diag{qi}_{i=2}^{n}. The next preliminary lemmas will be used in the following derivations.

Lemma 1: Let z be given. Define the functions

c(λ) = z′(In − λQ)^{−2} Q z − 1   (25)

and

d1(λ) = ‖(In − λQ)^{−1} λQz‖² .   (26)

If λ1, λ2 are two real solutions of c(λ) = 0, with λ1 > λ2, then d1(λ1) > d1(λ2).
Proof: See Appendix A.

Lemma 2: Let c(λ) and d1(λ) be defined as in Lemma 1 and set

d2 = (1/q1) [ 1 − z̄′ (I_{n−1} − (1/q1) Q̄)^{−1} Q̄ z̄ ] .   (27)

Then, if z1 = 0, for any λ ∈ IR satisfying c(λ) = 0

d2 ≥ d1(λ)

and the inequality is strict if λ ≠ 1/q1.
Proof: See Appendix B.

Now, we are ready to characterize the function dE(z).

Theorem 2: Let z ∈ IR^n be given and define the (n − 1)-dimensional ellipsoid

Ē = {z : z1 = 0 and z̄′ (I_{n−1} − (1/q1) Q̄)^{−2} Q̄ z̄ ≤ 1} .   (28)

Then:
- If z ∈ Ē, then SE(z) is a pair and

dE(z) = d2 ,   (29)

with d2 given by (27).
- If z ∉ Ē, then SE(z) is a singleton and

dE(z) = d1(λ̂) ,   (30)

where d1(λ) is given by (26) and λ̂ is the largest real solution of the equation c(λ) = 0, with c(λ) given by (25).

Proof: Let z be fixed. From the theory of constrained optimization [22], the first-order necessary condition for x̂ to be a local maximum of ‖x − z‖², subject to the constraint x′Qx ≤ 1, is that there exists λ ∈ IR such that

(In − λQ) x̂ = z   (31)
x̂′Q x̂ = 1 .   (32)
Let us discuss the possible solutions of (31)-(32), for different values of λ.

(i) Let λ ≠ 1/qi, i ∈ {1, 2, . . . , n}. Then, from (31)

x̂ = (In − λQ)^{−1} z   (33)

and ‖x̂ − z‖² = d1(λ), defined by (26). Substituting (33) into (32), one gets c(λ) = 0. Then, due to Lemma 1, the real solution of c(λ) = 0 which maximizes the cost d1(λ) is the largest one, namely λ̂. Hence, for λ ≠ 1/qi, the unique candidate maximum is

x̂ = (In − λ̂Q)^{−1} z .   (34)

(ii) Let λ = 1/q1. From (31) it must be z1 = 0. Then, from (31)-(32) one gets two real solutions

x̂1 = ± sqrt{ (1/q1) [ 1 − z̄′ (I_{n−1} − (1/q1) Q̄)^{−2} Q̄ z̄ ] }
[x̂2 . . . x̂n]′ = (I_{n−1} − (1/q1) Q̄)^{−1} z̄   (35)

provided that z̄′ (I_{n−1} − (1/q1) Q̄)^{−2} Q̄ z̄ ≤ 1. It is easy to verify that ‖x̂ − z‖² = d2, given by (27). Then, from Lemma 2 one has that d2 ≥ d1(λ) for any real λ satisfying c(λ) = 0. Hence, when z ∈ Ē, the solutions (35) exist and the corresponding cost d2 is larger than the cost d1(λ̂) relative to the solution (34) in (i). Conversely, when z ∉ Ē and z1 = 0, one has c(λ) = z̄′(I_{n−1} − λQ̄)^{−2} Q̄ z̄ − 1, with c(1/q1) > 0 and lim_{λ→∞} c(λ) = −1. Since c(λ) is a continuous decreasing function in [1/q1, +∞), the unique candidate maximum is given by (34), with λ̂ > 1/q1. In this case, x̂1 = 0.

(iii) Let λ = 1/qi, i ∈ {2, . . . , n}. In order to complete the proof, it remains to show that in this case the solutions of (31)-(32) do not provide a maximum point for dE(z). From constrained optimization theory [22], recall that the solution x̂, λ of (31)-(32) must satisfy also the second-order necessary condition, i.e. the matrix In − λQ must be negative semidefinite on the tangent plane {ξ : x̂′Qξ = 0}. It is easy to see that this condition is violated by λ = 1/qi, i ∈ {3, . . . , n}, for any x̂, and by λ = 1/q2 for all x̂ such that x̂2 ≠ 0. Then, the only remaining case to be examined is λ = 1/q2, x̂2 = 0 (which implies z2 = 0). Now, the solution of (31)-(32) is given by

x̂i = zi / (1 − qi/q2) ,  i ≠ 2
x̂2 = 0   (36)
and its cost is d3 = Σ_{i≠2} zi² qi² / (qi − q2)². It can be shown that it is always dE(z) > d3. In fact, define the functions

c̄(λ) = Σ_{i=1, i≠2}^{n} zi² qi / (1 − λqi)² − 1
d̄1(λ) = Σ_{i=1, i≠2}^{n} λ² zi² qi² / (1 − λqi)² .

Notice that c̄(λ) = c(λ), d̄1(λ) = d1(λ), for all λ ≠ 1/qi. By substituting (36) into (32) one gets c̄(1/q2) = 0. Then, if z1 ≠ 0, it is easy to see that the largest λ̂ ∈ IR such that c̄(λ̂) = 0 satisfies λ̂ > 1/q1 > 1/q2 and hence, due to Lemma 1, d1(λ̂) = d̄1(λ̂) > d̄1(1/q2) = d3. Otherwise, if z1 = 0, let

d̄2 = (1/q1) [ 1 − Σ_{i=3}^{n} zi² qi / (1 − qi/q1) ] .

Then, following Lemma 2, one gets d̄2 > d̄1(λ), for all λ satisfying c̄(λ) = 0, λ ≠ 1/q1, and hence also d̄2 = d2 > d̄1(1/q2) = d3. Therefore, the maximum of ‖x − z‖² cannot be achieved at x̂ given by (36). This completes the proof.

According to Theorem 2, the general condition given by Theorem 1 becomes much simpler when applied to the min-max problem (21). In particular, if SE(zm) is a singleton, SE(zm) = {ξm}, condition (24) boils down to

min_{g∈R(M), ‖g‖=1} g′(ξm − zm) ≥ 0

which implies that the vector ξm − zm must be orthogonal to every element in the span of M, i.e.

M′(ξm − zm) = 0 .   (37)

We shall exploit condition (37) in order to characterize the solution of the CCC problem when SE(zm) is a singleton. For this purpose, let us introduce some geometrical quantities. Let M⊥ = {z ∈ IR^n : M′z = 0} be the (n − p)-dimensional linear subspace orthogonal to M, and let Π : IR^n → M⊥ be the orthogonal projection operator on M⊥. Notice that, by (20), z^o ∈ M⊥. Moreover, let Eπ = Π(E) be the (n − p)-dimensional ellipsoid obtained by projecting E on M⊥. Then, let

ξ* = arg max_{ξ∈Eπ} ‖ξ − z^o‖²   (38)

be the point of Eπ having maximum distance from z^o (if SEπ(z^o) is not a singleton, let ξ* be any element of the solution set). Necessarily, ξ* ∈ δEπ and therefore there exists a unique point x* ∈ δE such that Π(x*) = ξ*. Finally, let
z* be the orthogonal projection of x* onto M. A sketch of the above geometric construction is reported in Fig. 1 for the case n = 2, p = 1. The expressions of z* and x* will be derived in Sect. III-B.

Fig. 1. A sketch of the geometric construction of x* and z*

Now we can prove the following lemma.

Lemma 3: i) If SE(zcc) is a singleton, then zcc = z*. ii) If SE(z*) is a singleton, then zcc = z* if and only if SE(z*) = {x*}.

Proof: i) By contradiction. Assume that zcc = z̃ ∈ M such that z̃ ≠ z* and SE(z̃) = {x̃}. Then, according to Theorem 1, it must be M′(z̃ − x̃) = 0. From the construction of z* and x*, one has that for each x ∈ E and z ∈ M, such that (z − x) ∈ M⊥, it holds ‖z − x‖ ≤ ‖z* − x*‖. Therefore,

‖z̃ − x̃‖ ≤ ‖z* − x*‖ .   (39)

But since z* is the projection of x* onto M, we have also

‖z* − x*‖ ≤ ‖z − x*‖   (40)

∀z ∈ M, z ≠ z*. Therefore, from (39) and (40) we get

‖z̃ − x̃‖ ≤ ‖z̃ − x*‖ .

But this contradicts SE(z̃) = {x̃}.
ii) Since by construction (x* − z*) ∈ M⊥, if SE(z*) = {x*} Theorem 1 guarantees that zcc = z*. Conversely, if SE(z*) = {x̃}, x̃ ≠ x* and zcc = z*, one has M′(z* − x̃) = 0 and hence, by construction, ‖z* − x̃‖ ≤ ‖z* − x*‖. But this contradicts SE(z*) = {x̃}.

From Lemma 3, it is clear that z* is the only candidate solution of the CCC problem in the set {z ∈ M : SE(z) is a singleton}. Moreover, it is sufficient to check the condition SE(z*) = {x*} to guarantee that z* solves (21). Otherwise, due to Theorem 2, the solution must belong to the set M̄ ∩ Ē, where Ē is given by (28) and M̄ = {z ∈ M : z1 = 0}. First notice that, if this intersection is empty, then SE(z) is a singleton for any z ∈ M, and hence zcc = z*. Then, recall that, when SE(z) is a pair, dE(z) = d2 given by (27), and define

z** = arg min_{z∈M̄} d2 .   (41)

Notice that (41) is a quadratic optimization problem, whose solution can be easily computed in closed form (see e.g. [22]). Details of the complete derivation of z** are reported in Sect. III-B. The following lemma characterizes the role of z**.

Lemma 4: If SE(z*) ≠ {x*}, then z** ∈ Ē.

Proof: Assume by contradiction that z** does not belong to Ē. Then, according to Theorem 2, SE(z**) is a singleton and dE(z**) = d1(λ**) for some λ** > 1/q1 such that c(λ**) = 0. If we consider d2 in (27) as a function of z, we get from (41)

d2(z) ≥ d2(z**) ,   ∀z ∈ M̄ ∩ Ē .   (42)

Moreover, as z1** = 0 and λ** > 1/q1, from Lemma 2 one has

d2(z**) > d1(λ**) .

Since, according to Lemma 3, SE(z*) ≠ {x*} guarantees that zcc ∈ Ē, we get

dE(zcc) = d2(zcc) > dE(z**)

which is clearly a contradiction.

Finally, we are ready to state the main result of the paper.

Theorem 3: The solution of the CCC problem (16)-(21) is given by

zcc = z* , if SE(z*) = {x*} ;  zcc = z** , otherwise.   (43)

Proof: If SE(z*) = {x*}, Lemma 3 guarantees that zcc = z*. Otherwise,

zcc = arg min_{z∈M̄∩Ē} d2 .

Since (M̄ ∩ Ē) ⊂ M̄ and, by Lemma 4, z** ∈ Ē, we get zcc = z**, which completes the proof.

Remark 1: The theory developed in this section can be easily extended to the case when q1 = q2 = · · · = qr < q_{r+1} ≤ · · · ≤ qn, the only significant change concerning Theorem 2. It can be checked that SE(z) is not a singleton only if z1 = z2 = · · · = zr = 0 (i.e. z lies on the (n − r)-dimensional hyperplane orthogonal to the r largest semiaxes of E). Moreover, in this case SE(z) is either a singleton or the boundary of an r-dimensional hypersphere. Nevertheless, all the reasoning following Theorem 2 still holds. Notice that M̄ in (42) is now defined as {z ∈ M : z1 = · · · = zr = 0} and, in general, it is empty if r > p.
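The key computational primitive behind Theorem 3 is the distance dE(z) of Theorem 2. As a numerical sanity check (our own sketch; the values of q and z are arbitrary and illustrative), the largest root λ̂ of c(λ) = 0 can be found by bisection on the semi-infinite interval (1/q1, +∞), after which the candidate maximizer (34) can be verified directly.

```python
import numpy as np

# Sketch of the distance computation of Theorem 2 (singleton case, z_1 != 0):
# bisection for the largest root of c(lambda) on (1/q_1, +inf).
# The values of q and z are illustrative choices of ours.
q = np.array([0.25, 1.0, 4.0])            # 0 < q_1 <= ... <= q_n
z = np.array([0.3, -0.2, 0.5])            # z_1 != 0, so S_E(z) is a singleton

c  = lambda lam: np.sum(z**2 * q / (1.0 - lam * q)**2) - 1.0        # (25)
d1 = lambda lam: np.sum(lam**2 * z**2 * q**2 / (1.0 - lam * q)**2)  # (26)

lo, hi = 1.0 / q[0] + 1e-9, 1.0 / q[0] + 1.0
while c(hi) > 0.0:                        # c decreases to -1 on (1/q_1, +inf)
    hi *= 2.0
for _ in range(200):                      # bisection keeps c(lo) > 0 > c(hi)
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if c(mid) > 0.0 else (lo, mid)
lam_hat = 0.5 * (lo + hi)

xhat = z / (1.0 - lam_hat * q)            # (33)-(34): the farthest point of E
assert abs(np.sum(q * xhat**2) - 1.0) < 1e-6        # (32): xhat on the boundary
assert np.allclose(xhat - z, lam_hat * q * xhat)    # (31) rearranged

# No sampled boundary point of E is farther than d_E(z) = d1(lam_hat)
bd = np.random.default_rng(1).normal(size=(3, 5000))
bd /= np.sqrt(np.sum(q[:, None] * bd**2, axis=0))
assert np.max(np.sum((bd - z[:, None])**2, axis=0)) <= d1(lam_hat) + 1e-9
```

The bisection exploits the monotonicity of c(λ) on (1/q1, +∞) noted in Sect. III-B; any other one-dimensional root search would do equally well.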
B. A procedure for the computation of the conditional Chebyshev center

According to Theorem 3, the computation of zcc boils down to that of z*, x* and z**. The following propositions provide explicit expressions for these quantities. First, let us consider z* and x*, defined through the geometric construction preceding Lemma 3.

Proposition 3: Define V ∈ IR^{n×(n−p)} such that V′M = 0 and V′V = I_{n−p}. Then

z* = z^o − M (M′QM)^{−1} M′QV ξ̂1   (44)
x* = [In − M (M′QM)^{−1} M′Q] V ξ̂1 ,   (45)

where

ξ̂1 = arg dE1(V′z^o) = arg max_{ξ1∈E1} ‖ξ1 − V′z^o‖²   (46)

and

E1 = {ξ1 ∈ IR^{n−p} : ξ1′ V′ [Q − QM (M′QM)^{−1} M′Q] V ξ1 ≤ 1} .   (47)

Proof: First, notice that the column vectors of the matrix W = [V M] provide an orthonormal basis for IR^n, and let us introduce the change of coordinates x̃ = W′x. In these new coordinates, the sets M and E are given respectively by

M̃ = {z̃ : z̃ = [(V′z^o)′ 0]′ + [0 Ip]′ α, α ∈ IR^p}
Ẽ = {x̃ : x̃′ W′QW x̃ ≤ 1} .

Now, it is not difficult to check that, in the new coordinates, the ellipsoid Eπ = Π(E) is given by

Ẽπ = {x̃ : x̃ = [ξ1′ 0]′ , ξ1 ∈ E1} .

Define ξ̂1 as in (46) and consider the linear manifold

M̃1 = {z̃ : z̃ = [ξ̂1′ 0]′ + [0 Ip]′ α, α ∈ IR^p} ,

passing through [ξ̂1′ 0]′ and parallel to M̃. By definition of Ẽπ and ξ̂1, we get that M̃1 is tangent to Ẽ, and the contact point is given by

M̃1 ∩ Ẽ = [ξ̂1′ (−(M′QM)^{−1} M′QV ξ̂1)′]′ = x̃* .

Moreover, the projection of x̃* onto M̃ is

[(V′z^o)′ (−(M′QM)^{−1} M′QV ξ̂1)′]′ = z̃* .

Finally, going back to the original coordinates, it is easy to verify that W [ξ̂1′ 0]′ = ξ* in (38), and W x̃* = x*, W z̃* = z*, with z* and x* given by (44) and (45), respectively.

In order to derive an explicit expression for z**, one must solve the quadratic optimization problem (41). Standard calculations allow one to rewrite M̄ as

M̄ = {z = [0 z̄′]′ , z̄ ∈ IR^{n−1} : z̄ = z̄^o + M̄ ᾱ, ᾱ ∈ IR^{p−1}}   (48)

where z̄^o and M̄ can be determined from z^o and M, by imposing the constraint z1 = 0. Then, the following proposition holds.

Proposition 4: Let D = (Q̄ − q1 I_{n−1})^{−1} Q̄. Then

z** = [0 (z̄**)′]′  with  z̄** = [I_{n−1} − M̄ (M̄′DM̄)^{−1} M̄′D] z̄^o .   (49)

Proof: By using (27) and the definition of D, problem (41) can be rewritten as

z̄** = arg min_{z̄∈M̄} [ 1/q1 + z̄′Dz̄ ]

from which (49) easily follows.

Finally, the algorithm for the solution of the CCC problem can be summarized as follows:
1. Compute z* and x*, according to Proposition 3.
2. Check if SE(z*) = {x*} by using Theorem 2. If the condition is satisfied, set zcc = z*; otherwise go to step 3.
3. Compute z** according to Proposition 4 and set zcc = z**.

Notice that the main tasks that must be performed are related to the computation of the distance of a point from an ellipsoid (respectively of V′z^o from E1 in Proposition 3, and of z* from E for checking the condition in step 2). This can be easily accomplished according to Theorem 2. In particular, it is easy to verify that the equation c(λ) = 0 has a unique real solution for λ > 1/qf, where f = min{j : zj ≠ 0}, and that dc(λ)/dλ < 0, ∀λ ∈ (1/qf, +∞). Moreover, from the proof of Theorem 2, it can be argued that when z1 = 0, SE(z) is a singleton if and only if λ̂ > 1/q1. Therefore, λ̂ in (30) can be easily computed within the desired precision, using any search technique in the semi-infinite interval (1/q1, +∞). Finally, the set SE(z) can be obtained by using equations (34) or (35), depending on SE(z) being a singleton or a pair.
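Step 3 of the procedure, i.e. Proposition 4, is easy to validate numerically. The sketch below is an illustration of ours (the values of q, M̄ and z̄^o are arbitrary choices): it evaluates the closed form (49) and checks both optimality of z̄** along the manifold and the stationarity condition M̄′D z̄** = 0.

```python
import numpy as np

# Numerical check of Proposition 4 (step 3 of the procedure). The quantities
# q, Mbar and zbar_o below are arbitrary illustrative choices of ours.
q = np.array([0.25, 1.0, 2.0, 4.0])       # q_1 < q_2 <= ... <= q_n
q1, qbar = q[0], q[1:]
D = np.diag(qbar / (qbar - q1))           # D = (Qbar - q1*I)^{-1} Qbar

Mbar = np.array([[1.0], [1.0], [0.0]]) / np.sqrt(2.0)   # one free direction
zbar_o = np.array([0.5, -0.5, 0.8])

# Closed form (49): zbar** = [I - Mbar (Mbar' D Mbar)^{-1} Mbar' D] zbar_o
P = np.eye(3) - Mbar @ np.linalg.inv(Mbar.T @ D @ Mbar) @ Mbar.T @ D
zbar_ss = P @ zbar_o

d2 = lambda zb: 1.0 / q1 + zb @ D @ zb    # d2 rewritten as in the proof
best = min(d2(zbar_o + a * Mbar[:, 0]) for a in np.linspace(-5.0, 5.0, 20001))
assert d2(zbar_ss) <= best + 1e-8         # no sampled point on Mbar does better
assert abs(Mbar[:, 0] @ (D @ zbar_ss)) < 1e-9   # stationarity along the manifold
```

The stationarity check reflects the normal equation of the quadratic program: the gradient of z̄′Dz̄ at z̄** is orthogonal to the free directions of M̄.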
IV. Numerical examples

The conditional central algorithm derived in the previous section has been applied to the set membership worst-case identification problem of Section II-A. Input/output data have been generated according to the model (8), with the following assumptions:
- the impulse response h belongs to the a priori set

K = {h : Σ_{k=N+1}^{∞} hk² ρ^{−2k} < L²}

with 0 < ρ < 1;
- the noise sequence is energy bounded

(1/N) Σ_{k=0}^{N} vk² ≤ ε² .   (50)

The worst-case error (11) has been measured in the H2 norm. Recall that in this case Proposition 2 guarantees that the conditional central algorithm provides the optimal worst-case estimate.

The nominal system generating the data is taken from [8]:

H(z) = (0.486z^5 + 0.493z^4 + 0.779z^3 + 0.904z^2 + 0.074z + 0.383) / (z^6 − 1.54z^5 + 2.128z^4 − 2.409z^3 + 1.453z^2 − 0.942z + 0.384) .

It can be checked that ‖H(z)‖H2 ≅ 7.26 and H(z) satisfies the a priori assumptions for L = 2 and ρ = 0.995. In the simulation experiments, N = 50 measurements have been collected and the disturbance v has been chosen as an i.u.d. signal satisfying (50), with ε = 0.2. Two different input signals have been used: a step and a square wave of period 10, both satisfying (1/√N)‖u^N‖2 = 1. Moreover, two linearly parameterized model classes, based on FIR and Laguerre series expansions, have been considered. The Laguerre models have been formed according to [17], with Laguerre pole equal to 0.75 (the same value chosen in [8]). Simulation results relative to different input and model class choices are reported in Figures 2-5. The conditional radius of information (i.e. the worst-case error of the φcc algorithm) is compared with the estimated worst-case error of the central projection suboptimal algorithm provided in [8], for different parametric model orders. The reported plots show the average results over 10 different noise realizations.

Fig. 2. Comparison of estimation errors: step input, FIR model class (solid: conditional central estimator; dashed: central projection estimator)

Fig. 3. Comparison of estimation errors: square wave input, FIR model class (solid: conditional central estimator; dashed: central projection estimator)

Fig. 4. Comparison of estimation errors: step input, Laguerre model class (solid: conditional central estimator; dashed: central projection estimator)

Fig. 5. Comparison of estimation errors: square wave input, Laguerre model class (solid: conditional central estimator; dashed: central projection estimator)
As can be seen from Figures 2–5, the conditional central algorithm provides a significant improvement of the worst-case estimation error. The error reduction, over model orders ranging from 1 to 10, varies from approximately 15% for the case of step input to 25% for the case of square wave input. The improvement of estimate reliability is due to two distinct factors. First, the central projection estimator is known to be only suboptimal. Second, the available error associated to this suboptimal estimator is only an upper bound of the true error. Actually, it can be observed that the exact evaluation of the central projection estimator error shows the same level of complexity as the computation of the optimal solution.

In the above simulations, it was assumed that the bound ε is known exactly. In order to evaluate the effect of overestimation of the noise bound, the previous results are compared to those obtained in the cases ε = 0.3 and ε = 0.4 (respectively, 50% and 100% overestimated noise bound) in Figure 6. It can be observed that while the increase of the estimation error is moderate, the rate of decrease of the error as the model order grows remains approximately unchanged.

Fig. 6. Estimation errors for different values of the noise bound: ε = 0.2 (solid), ε = 0.3 (dashed), ε = 0.4 (dotted). (a) square wave input, FIR model class; (b) step input, Laguerre model class

V. Conclusions

In this paper, the role and optimality of conditional central algorithms in worst-case system identification and state estimation with set membership uncertainty have been investigated. The computation of optimal estimators is a very difficult task in general. If disturbances are bounded according to the energy norm, a closed form for the conditional central estimator can be derived in terms of the unique real root of a polynomial equation in a semi-infinite interval. A comparison of the error of the conditional central estimator with the estimated error of a suboptimal algorithm available in the literature has been performed on simulated data, showing the improvement provided by the optimal solution.

Several interesting problems deserve attention for future work. The quality of the estimate provided by suboptimal algorithms, such as the central projection estimator or the reduced least-squares estimator, needs deeper investigation. Moreover, very little is known about conditional central algorithms when disturbances are bounded in different norms. For example, the case of the l∞ norm in the measurement space, which has been extensively investigated in the recent literature, appears of much interest in practical applications.

Appendix

A. Proof of Lemma 1

Consider λ1, λ2 ∈ IR such that λ1 > λ2 and c(λ1) = c(λ2) = 0. Then, from (25) one gets

0 = c(λ1) − c(λ2)
  = Σ_{i=1}^n z_i² q_i [ 1/(1 − λ1 q_i)² − 1/(1 − λ2 q_i)² ]
  = Σ_{i=1}^n γ_i [2 − (λ1 + λ2) q_i]                          (51)

where

γ_i = z_i² q_i² (λ1 − λ2) / [ (1 − λ1 q_i)² (1 − λ2 q_i)² ] > 0.

Then

d1(λ1) − d1(λ2)
  = Σ_{i=1}^n z_i² q_i² [ λ1²/(1 − λ1 q_i)² − λ2²/(1 − λ2 q_i)² ]
  = Σ_{i=1}^n γ_i (λ1 + λ2 − 2 λ1 λ2 q_i).                     (52)

From (51) we get
Σ_i γ_i = [(λ1 + λ2)/2] Σ_i γ_i q_i

and hence, substituting into (52), we obtain

d1(λ1) − d1(λ2) = [ (λ1 + λ2)²/2 − 2 λ1 λ2 ] Σ_{i=1}^n γ_i q_i = [ (λ1 − λ2)²/2 ] Σ_{i=1}^n γ_i q_i > 0,

as it was claimed.

B. Proof of Lemma 2

Let z1 = 0. From (25), c(λ) = 0 can be rewritten as

Σ_{i=2}^n z_i² q_i / (1 − λ q_i)² = w′s = 1                    (53)

where s = [z2² ... zn²]′ and w = [ q2/(1 − λq2)² ... qn/(1 − λqn)² ]′. For fixed λ satisfying (53), we want to show that d2 − d1(λ) is nonnegative on the set

L = { s ∈ IR^{n−1} : w′s = 1, s_j ≥ 0 ∀j }.

Since d2 − d1(λ) is a linear functional of s, it suffices to prove that the claim is true at the vertices of L, i.e. for

s = s^i = [0 ... 0 (1 − λq_i)²/q_i 0 ... 0]′,  2 ≤ i ≤ n.

Substituting [z2² ... zn²]′ = s^i into (26) and (27) one gets

d1 = λ² q_i ,
d2 = λ² q_i + (1 − λ q1)² / [ q1 (q_i − q1) ] ,

from which it is easily obtained that d2 − d1 ≥ 0 for all λ satisfying c(λ) = 0, and d2 − d1 > 0 if λ ≠ 1/q1.

References
[1] A. J. Helmicki, C. A. Jacobson, and C. N. Nett, "Control oriented system identification: a worst-case/deterministic approach in H∞," IEEE Transactions on Automatic Control, vol. 36, pp. 1163–1176, 1991.
[2] M. Gevers, "Towards a joint design of identification and control," in H. L. Trentelman and J. C. Willems (Eds.), Essays on Control: Perspectives in the Theory and Its Applications, Birkhäuser, Berlin, 1993.
[3] R. L. Kosut, G. C. Goodwin, and M. P. Polis (Guest Eds.), "Special issue on system identification for robust control design," IEEE Transactions on Automatic Control, vol. 37, pp. 899–1008, 1992.
[4] M. Milanese and A. Vicino, "Information-based complexity and nonparametric worst-case system identification," Journal of Complexity, vol. 9, pp. 427–446, 1993.
[5] P. M. Mäkilä, J. R. Partington, and T. K. Gustafsson, "Worst-case control-relevant identification," Automatica, vol. 31, no. 12, pp. 1799–1819, 1995.
[6] P. M. Mäkilä, "Robust control-oriented identification," in Proc. of IFAC Symposium on System Identification, Fukuoka, Japan, July 1997, vol. 1, pp. 111–116.
[7] G. C. Goodwin, M. Gevers, and B. Ninness, "Quantifying the error in estimated transfer functions with application to model order selection," IEEE Transactions on Automatic Control, vol. 37, pp. 913–928, 1992.
[8] L. Giarrè, B. Z. Kacewicz, and M. Milanese, "Model quality evaluation in set membership identification," Automatica, vol. 33, no. 6, pp. 1133–1139, 1997.
[9] J. F. Traub, G. W. Wasilkowski, and H. Woźniakowski, Information-based Complexity, Academic Press, San Diego, 1988.
[10] M. Milanese and A. Vicino, "Optimal estimation theory for dynamic systems with set membership uncertainty: an overview," Automatica, vol. 27, no. 6, pp. 997–1009, 1991.
[11] B. Z. Kacewicz, M. Milanese, and A. Vicino, "Conditionally optimal algorithms and estimation of reduced order models," Journal of Complexity, vol. 4, pp. 73–85, 1988.
[12] A. Garulli, A. Vicino, and G. Zappa, "Optimal induced-norm and set membership state smoothing and filtering for linear systems with bounded disturbances," Automatica, vol. 35, no. 5, 1999.
[13] L. Giarrè, M. Milanese, and M. Taragna, "H∞ identification and model quality evaluation," IEEE Transactions on Automatic Control, vol. 42, no. 2, pp. 188–199, 1997.
[14] A. Garulli, A. Vicino, and G. Zappa, "Optimal and suboptimal H2 and H∞ estimators for set-membership identification," in Proc. of 36th IEEE Conf. on Decision and Control, San Diego, December 1997.
[15] B. Z. Kacewicz, "Quality of conditional system identification in a general class of norms," Tech. Rep. RW 97-02, Institute of Applied Mathematics and Mechanics, Warsaw University, 1997.
[16] P. M. J. Van den Hof, P. S. C. Heuberger, and J. Bokor, "System identification with generalized orthonormal basis functions," Automatica, vol. 31, no. 12, pp. 1821–1834, 1995.
[17] B. Wahlberg and P. Mäkilä, "On approximation of stable linear dynamical systems using Laguerre and Kautz functions," Automatica, vol. 32, no. 5, pp. 693–708, 1996.
[18] A. Garulli, P. Guarnieri, A. Vicino, and G. Zappa, "Set membership identification of mixed parametric/nonparametric models for robust control," in Prep. 2nd IFAC Symposium on Robust Control Design, Budapest, Hungary, June 25–27, 1997, pp. 95–100.
[19] L. Giarrè and M. Milanese, "Model quality evaluation in H2 identification," IEEE Transactions on Automatic Control, vol. 42, pp. 691–698, 1997.
[20] S. R. Venkatesh and M. A. Dahleh, "System identification of complex systems: problem formulation and results," in Proc. of 36th IEEE Conf. on Decision and Control, San Diego, December 1997, pp. 2441–2446.
[21] V. F. Dem'yanov and V. N. Malozemov, Introduction to Minimax, Dover, New York, 1974.
[22] D. G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, 1989.