1246
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 45, NO. 7, JULY 2000
Persistent Identification of Systems with Unmodeled Dynamics and Exogenous Disturbances Le Yi Wang and G. George Yin
Abstract—In this paper, a novel framework of system identification is introduced to capture the hybrid features of systems subject to both deterministic unmodeled dynamics and stochastic observation disturbances. Using the concepts of persistent identification, control-oriented system modeling, and stochastic analysis, we investigate the central issues of irreducible identification errors and time complexity in such identification problems. Upper and lower bounds on errors and speed of persistent identification are obtained. The error bounds are expressed as functions of observation lengths, sizes of unmodeled dynamics, and probability distributions of disturbances. Asymptotic normality and complexity lower bounds are investigated when periodic inputs and LS estimation are applied. Generic features of asymptotic normality are further explored to extend the asymptotic lower bounds to a wider range of signals and identification mappings. Index Terms—Error bounds, exogenous noises, identification, time complexity, uncertainties, unmodeled dynamics.
I. INTRODUCTION
T
HIS paper introduces a novel framework of system identification to capture the hybrid features of systems subject to both deterministic unmodeled dynamics and stochastic disturbances. Using the concepts of persistent identification, control-oriented system modeling, and stochastic analysis, we investigate the central issues of persistent identification errors and time complexity. There have been significant efforts on integrated treatments of unmodeled dynamics and stochastics. Traditionally, unmodeled dynamics were treated through model order selection. For instance, the well-known Akaike’s information criterion (see [1] and the related references of the author on Annals of the Institute of Statistical Mathematics), and Rissanen’s minimum length description [26] uses certain complexity criteria to select model orders. Treating infinite-dimensional autoregressive models [27], [28], Shibata proposed an asymptotically efficient selection of autoregressive model orders. These frameworks can be employed, at least in principle, to treat both unmodeled dynamics and stochastic noises. More recently, in the paradiam of worst case identification, Venkatesh and Dahleh [32] introduced a methodology in which Manuscript received December 1, 1997; revised September 8, 1998, May 8, 1999, and July 11, 1999. The research of the first author was supported in part by the National Science Foundation under Grants ECS-9412471 and ECS9634375, and the research of the second author was supported in part by the National Science Foundation under Grant DMS-9877090. Recommended by Associate Editor, B. Pasik-Duncan. L. Y. Wang is with the Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202 USA. G. G. Yin is with the Department of Mathematics, Wayne State University, Detroit, MI 48202 USA. Publisher Item Identifier S 0018-9286(00)05950-X.
informational constraints are imposed on unmodeled dynamics and temporal constraints are imposed on disturbances. Motivated by the similar issues discussed in this paper such as time complexity, [32] provides a framework in which sample path statistics can be used to characterize classes of disturbances and probing inputs which restore consistency of estimates and polynomial time complexity. The current work adds to the literature a new perspective. The framework is characterized by the following features: 1) Prior information is deterministic on plants (deterministic uncertainty sets) and stochastic on disturbances (stochastic distributions). It follows that in measuring the performance of identification algorithms, identification errors are evaluated against worst case unmodeled dynamics but statistical effects from disturbances. 2) For applications to adaptive control, identification must persist in time in the sense that input signals and identification algorithms must provide satisfactory models for control design for all possible observation windows. As a result, the concept of persistent identification is employed. The following main issues are pursued. 1) Upper and lower bounds on persistent identification errors. 2) Time complexity: How fast can one acquire information about an unknown plant via input/output observations? 3) Input signals selection: How can one characterize probing inputs which can facilitate fast acquisition of plant information, and hence, are desirable for hybrid identification problems? We derive upper and lower bounds on identification errors and obtain the speed of persistent identification. It is revealed that the class of full-rank periodic inputs and the standard least squares estimation possess certain appealing properties for the problems under study. Asymptotic normality and complexity lower bounds are investigated when such inputs and LS estimation procedures are applied. Generic features of asymptotic normality are further explored to extend the asymptotic lower bounds to a wider class of signals and identification mappings. The remainder of the paper is organized as follows. Section II is a prologue that contains the basic notation of this paper. Section III proceeds with the precise problem formulation. Section IV establishes certain upper bounds of estimation errors. Lower bounds on identification errors are presented in Sections V and VI. Starting with general cases in Section V, we first show that identification errors cannot be reduced to the level below the size of unmodeled dynamics for noise-free case. We then deduce expectation and probabilistic lower bounds when noises are present. Section VI concentrates on asymptotic lower bounds. The basic premise of the section is the asymptotic normality. Full rank periodic inputs with the least squares estimates are treated. Some open issues are
0018–9286/00$10.00 © 2000 IEEE
WANG AND YIN: PERSISTENT IDENTIFICATION OF SYSTEMS WITH UNMODELED DYNAMICS
summarized in Section VII. Finally, the paper is closed with an Appendix including the proofs of a number of technical results. Related Literature: The concept of persistent identification in deterministic identification problems was introduced in Wang [33], and Zames et al. [38]. Complexity issues in identification have been pursued by many researchers. The concepts of -net and -dimension in the Kolmogorov sense [16] were first employed by Zames [37] in studies of model complexity and system identification. Time complexity in worst case identification was studied by Tse et al. [31], Dahleh et al. [8], and Poolla and Tikku [25]. Results of -widths of many other classes of functions and operators were summarized in Pinkus [23]. A general and comprehensive framework of information-based complexity was developed in Traub et al. [30]. Milanese is one of the first researchers to recognize the importance of worst case identification. Milanese and Belforte [21], and Milanese and Vicino [22] introduced the problem of set membership identification and produced many interesting results on the subject. Algorithms for worst case identification were developed in Gu and Khargonekar [13], Makila [20], and Chen et al. [7]. A unified methodology which combines worst case identification and probability framework was recently introduced in [32]. The issues of estimation consistency in worst case identification were treated by Kakvoort and Van den Hof [14] in which main ideas from probability frameworks were extended to set-membership descriptions for disturbances. Like its deterministic counterpart, many significant results have been obtained for identification and adaptive control involving random disturbances in the past a few decades. For instance, model validation approaches provide means of obtaining identification error bounds in both deterministic and statistical frameworks. There is a large amount of literature available. Here, we cite only the books by Aström and Wittenmark [2], Caines [5], Chen and Guo [6], Kumar and Varaiya [17], Ljung and Söderström [19], and Solo and Kong [29], among others. For related work in analyzing recursive stochastic algorithms, we refer the reader to the most recent work of Kushner and Yin [18] and the references therein. II. BASIC NOTATION The real numbers, complex numbers, and integers are denoted , and , respectively. For , is its modulus or by is the identity matrix. For a matrix absolute value. denotes its transpose, and is its Frobenius or is the normed space that consists of norm. For , for which sequences
For a positive integer and a sequence is the -truncation operator . Denote
, defined by
1247
and (1) of For a subspace 0 and radius in , and addition, for an operator . We will use the probability.
is the closed ball of center is its complement in . In on to denote the expectation and
III. PROBLEM FORMULATION Consider single-input single-output, discrete-time, stable, with input–output linear-time-invariant (LTI) systems where denotes relationships with , and convolution, the probing input the exogenous disturbance is a sequence of random variables. Conditions on will be given in the subsequent sections. This paper is concerned with open-loop identification problems. Hence, the probing input is deterministic and can be selected , by the designer. arbitrarily, albeit A priori information on is given by a deterministic uncerwhich contains . A priori uncertainty sets tainty set commonly used in deterministic worst case identification are the following decaying-memory types: and , where , for is a monotone nonin. creasing function and the sequence Typical examples are given by systems with exponentially defor some and caying memory . To capture the hybrid nature of a priori deterministic information on and stochastic information on , we introduce a combined framework which characterizes identification errors in a unified manner. A. Selection of Model Spaces The first question in system identification is the selection are of model spaces. Suppose -dimensional subspaces of be the set of all -dimensional used to model . Let and a subspace , we subspaces of . For . Then, define the distance is the optimal worst case by models from . The modeling error in representing optimal modeling error, when the best model space is selected, . is precisely is given by the Kolmogorov -width of [16], [37]. and , the optimal model space turns out to be For defined in (1) (see [23]). For this reason and for simplicity, we as our model space in this paper. It follows that will select can be decomposed into ; and . is the modeled part, is the unmodeled dynamics, and is a measure of model complexity. In this paper, we assume that for a given model complexity , the a priori uncertainty on the plant is .
1248
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 45, NO. 7, JULY 2000
B. Identification Errors After applying an input to the system and taking output observations in the time interval obtain the observation equations
, we
Unlike the approach of worst case identification in which minimization of is sought, we seek identification algorithms to minimize the confidence level. Namely
Furthermore, in persistent identification, starting time of observation windows cannot be fixed. Hence, the worst case over all is considered
where
Or, in a vector form Finally, optimal inputs are employed to achieve (4)
where
Assumptions: is full column rank. A1) For all is a sequence of i.i.d. random variA2) The disturbance ables, with a common distribution that is symmetric with . respect to the origin and A3) The estimators are linear and unbiased. That is, for , depending on some matrix such that for and all , when . all To obtain upper bounds on identification errors, we will need to modify A2) slightly as follows. A2") Condition A2) holds. In addition, the moment generating exists. function . It Define and , alshould be emphasized that depends on though this dependence is suppressed from the notation. Now, , we have the estimation error for any (2) . Note that where is an -dimensional random vector whose distribution depends , input , model uncertainty on the identification mapping , and the distribution of . Also the first term on the last line of (2) is deterministic and the second is stochastic.
In summary, we have the following definition. is called optimal persistent identiDefinition 1: fication error, which is a function of the signal space and identification mapping set . is an intrinsic relationship among estimation errors, observation lengths, and the corresponding probabilities. , we are Especially, for a selected confidence level seeking the minimal observation length defined by (5) is a complexity measure of the identiThe quantity fication problem, which indicates how fast one can reduce the to with confidence , when the size size of uncertainty on of unmodeled dynamics is . We point out in passing that the total estimation errors on the . As a result, in this paper we plant is bounded by will focus on only. D. Class
Inputs and LS Estimation
While probing inputs and identification algorithms are not specified in Definition 1, we pay close attention to the least squares (LS) estimation and periodic signals. For simplicity of for some positive integer . analysis, we only consider Namely, the observation length is a multiple of the model order. has a simple In the case where is an -periodic signal, expression. Due to periodicity of ,
(6) the following class of input signals: is -periodic and full rank , where “full Toeplitz matrix is full rank. rank” means that the The importance of the class stems from some basic observations: 1) In the case of noise-free observations for systems with typical unmodeled dynamics, it was shown in [33] that all are optimal probing signals for persistent identification, and the least squares estimation is an optimal identification algorithm. is 2) The class is feedback invariant, in the sense that if LTI, stable and does not have boundary zeros, then whenever . This property is of particular importance for closed-loop identification where the plant input is the output of Denote by
C. Persistent Identification: General Cases , denote the For a given tolerable identification error by . probability of the event Since unmodeled dynamics is deterministic, we take the worst case over all possible unmodeled dynamics , and arrive at
(3) represents the confidence level for estimation errors to be bounded by .
WANG AND YIN: PERSISTENT IDENTIFICATION OF SYSTEMS WITH UNMODELED DYNAMICS
a feedback mapping from an external input (see, e.g., [34] for detail). and the identification algorithm is the stanSuppose dard least squares estimation
1249
computing the corresponding observation length required to reduce the uncertainty size to with confidence . We select the input to be the -periodic signal with the . As a result, and first components . It follows that (11)
By (6)
It follows from (8) and (11) that the identification error is . Now, for any , the bounded by set inclusions and
for some It follows that the deterministic part of identification errors becomes (7) , for where Hence, the deterministic part is bounded by (8) Here, the full rank condition results in a reduction of estimation errors from unmodeled dynamics. Furthermore, for the stochastic part
(9) , for . Here, where periodicity leads to an averaging in disturbance. When the input is limited to and the identification mapping is specified to the LS estimation, we introduce the following definition. Definition 2:
and
imply that . The last is a sequence of i.i.d. inequality follows from the fact that random variables. with , the probaTherefore, for any given is bounded by bility . To proceed, define , where is the moment generating function of . An exponential upper bound can be obtained for . Lemma 1 (Chernoff): Under A2 ), for any . Proof: See ([24, p. 326]). Remark 1: The bound given in Lemma 1 is tight. Alis bounded above by , it is though for sufficiently large and much larger than . In fact, for any sufficiently small . is symmetric with It is observed that the distribution of respect to the origin, and hence . By Chernoff’s bound, with . Therefore, to have , it suffices that . , and hence . We Note that summarize the discussion above into the following theorem. . An upper bound of Theorem 1: Let is given by , where
Obviously (10) IV. UPPER BOUNDS ON
AND
By (10), any upper bound on will be an . Due to the relationship between upper bound on and , we need only establish upper . Upper bounds on can be bounds on , and then obtained by first selecting a special input
To illustrate, consider a couple of examples below. , where Example 1: Consider the case Now the density is
.
(12) Since the density belongs to the exponential family, a is easily obtainable. In simple representation of
1250
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 45, NO. 7, JULY 2000
fact, the moment generating function takes the form . This leads to
By
virtue
of
Lemma 1, . As in Theorem 1
. The following lemma characterizes conis implied. ditions on under which . Define the manifold , we denote by At the vector the normal vector of at . It can be easily verified that , where
Lemma 2: Suppose that with . or , then . , Proof: Consider the total estimation error . Without loss of generality, we assume that where are nonnegative. It follows that will have all elements of , the number of nonzero elements 0 or 1. Let elements in . can be orthogonally decomposed into components Now, and the corresponding orthogonal complement, along , where and . Then . As a result, since contains only 0 or 1 and and If
where
denotes the smallest integer above .
Example 2: Consider tion. Since the density of
, the uniform distribuis given by (13) otherwise
. Consequently, . Then the upper bound can be computed via Theorem 1. Modeling the disturbance using a uniform distribution is suggested by our previous consideration of worst case analysis under bounded disturbances from a deterministic point of view. Note that when is sufficiently large, the well-known asymptotic normality allows us to approximate the underlying distribution by that of a normal random variable, leading to an upper bound as discussed in Example 1. Remark 2: To obtain upper bounds of the estimation errors, Lemma 1 is sufficient. The essence is that for fixed and , one may be interested in obtaining probabilities or . It exploits detailed asympsuch as totics of the observation disturbances. It should be pointed out that a related result can be obtained in terms of the well-known large deviations result of Gärtner [12]. In deriving such a result, there is no distributional assumption on the disturbance. In need not be independent and identically distributed fact, “white noise.” it is plain that
V. LOWER BOUNDS ON
AND
The estimation error (2) contains errors from deterministic unmodeled dynamics and stochastic disturbances, and depends on the input and identification algorithms. We are seeking and , which can then be used lower bounds on . This section deals with to derive a lower bound on and linear general cases in which all possible inputs unbiased identification mappings are considered. with . The main Suppose that complication in establishing lower bounds stems from the inand . To obtain the lower bounds, it is teraction between , then observed that if , and hence,
It follows that a sufficient condition for . Namely or
is (14)
A. Noise-Free Deterministic Lower Bounds , In the special case of noise-free observations, i.e., . the estimation error is reduced to Theorem 2: The optimal deterministic estimation error is bounded below by (15) Proof: See the Appendix. This theorem shows that no matter how long the observation windows are, how the input signals are selected, and how the identification algorithms are designed, persistent identification errors on cannot be reduced below , the size of unmodeled , this result was obtained dynamics. For the special case by Wang [33]. The general result proved here turns out to be much more difficult to establish. It should also be pointed out that this conclusion is unique in persistent identification problems. If the starting time is fixed, then it can be easily shown that the lower bound is 0. B. Expectation Lower Bounds Theorem 3: The following moment bound holds:
WANG AND YIN: PERSISTENT IDENTIFICATION OF SYSTEMS WITH UNMODELED DYNAMICS
Proof: By Theorem 2, there exist and for has norm . For this determinwhich istic error , the total estimation error becomes and as a result
Proof: Since . Then (14) becomes Therefore
It only remains to show that . It follows that , or on both sides gives us where is the distribution of with respect to the origin
. Since
1251
. Denote or
.
. Observe that . Taking the norm
is symmetric
Namely, . Lemma 3 leads to the following theorem. Its proof is given in the Appendix. Theorem 4: Assume that A1)–A3) are satisfied. Denote the common distribution of the i.i.d. disturbance by . For is bounded below by
and
We will show that for any (16) which will imply that
D. Normal and Uniform Distributions or , tighter In the special case of lower bounds than those of Theorem 4 can be explicitly obtained. Theorem 5: Suppose that the conditions of Theorem 4 are . Then satisfied and
as required. and It remains to prove (16). Let . Then, (16) follows from the following inequalities:
(17)
Proof: First, by Lemma 3, there exists a such that
with
C. Probability Lower Bounds This subsection derives probabilistic lower bounds when the random disturbance has a known distribution. The observation length can be either large or small. with such Lemma 3: There exists a . that
Obviously, in this case is also Gaussian and , where . Since and , we have . , a stanSince given by dard normal random variable has the density
1252
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 45, NO. 7, JULY 2000
(12) and the distribution function . It follows that
E. Remarks on Moving Average Noise , suppose the observation is given , where is a sequence of independent and identically distributed random variables. As before, this may be written as , where For
by
By virtue of [10 Lemma 2, p. 175], for any . Thus, (17) holds. Next, suppose that . The case of the uniform distribution is of particular importance in embedding the paradigm of worst case identification in a stochastic framework. Essentially, the worst case identification may be viewed as a special case of the framework here, in which the disturbances are i.i.d. , and the required confidence level . In and . It will be other words, it is required that in the worst shown that the requirement , which case identification mandates the error bound is the noise/signal ratio. . Theorem 6: Suppose 1) The following lower bounds hold. . If If . If . , then . 2) If Proof: 1) This follows from Theorem 4 and if if 2) By Lemma 3
Thus, we only need to show that . Without loss of generality, assume . . In view of the hypothesis Then . Select satisfying , and define
It is clear that Consequently, for any
Note that is a random vector with independent components. In lieu of A2), assume A2 ) below. is a sequence of indepenA2”) The random disturbance dent and identically distributed random variables, whose distribution is symmetric with respect to the origin, and . To establish upper bounds, let us consider the case of periodic . In such a case, for are given signals . Assumption A2 ) by for have an identical distribution. implies that w.p. 1. In view of Remark 2, the Moreover, as upper bound continues to hold. For lower bounds, we point out that in the derivation of and . Lemma 2 we used mainly the fact No conditions on the distribution of the noise is needed. As for Theorem 3, the moment bound is also distribution free, except the condition that the noise is symmetric with respect to the origin. Likewise, Lemma 3 and Theorem 4 are also independent of particular distribution functions of the noise processes. Hence, we have the following proposition. Proposition 1: Under conditions A1), A2 ), and A3), Lemma 2, Lemma 3, Theorem 3, and Theorem 4 continue to hold. Furthermore, we can prove that Theorem 5 also holds for moving average noise. The result is recorded below. Proposition 2: Suppose the conditions of Proposition 1 are for some . Denote satisfied, and , and , where is as given before. Then the replaced lower bound (17) holds with replaced by and by . and the normal Proof: First, the independence of is a normal random vector with mean assumption imply . As in the proof of Theorem 5, there 0 and covariance . Then is exists a vector satisfying . also normally distributed with mean 0 and variance The rest of the proof is exactly the same as that of Theorem 5.
. and VI. LOWER BOUNDS BASED ON ASYMPTOTIC NORMALITY
This implies that claimed.
as
In this section, we present several lower bounds on and , based on asymptotic normality. The first subsection will focus on the case that the inputs are and the identification algorithm is constrained to the class specified to the LS estimation. The second subsection treats
WANG AND YIN: PERSISTENT IDENTIFICATION OF SYSTEMS WITH UNMODELED DYNAMICS
more general classes of inputs and identification algorithms which verify the asymptotic normality. A. Class
, the main contribution to the lower bound comes from the first term on the right-hand side of
Signals and LS Estimation
signals and LS estimation, the stochastic part in For can be expressed as the estimation error . Since is a sequence of i.i.d. random variables with zero mean and covariance
It follows that
. Denote , which is independent of . The standard central limit theorem (see, for instance, [11, p. 252]) is asymptotically normal. then yields that converges in distribution Lemma 4: Under A1)–A3), , as . to . Since Now, consider where
since . To derive the lower bounds, we would like to assert that (18) However, it is well known that this equation is not true in genis “slowly varying.” In what follows, we first eral unless state a sufficient condition for (18) and then proceed with the desired asymptotic bounds. After establishing the results, we make some remarks regarding alternative choices of . exists Lemma 5: If for some (where denotes the distribution function for all , then (18) holds, where of ), and if means or as . Proof: See [11, Th. 1, p. 517]. Theorem 7: Suppose and Then below by , there exists
set is asymptotically bounded , in the sense that for any such that for all .
Proof: By symmetry
(19) It follows that
By virtue of Lemma 5, equivalent to We have
is a fixed positive real number for the deterministic as . With
is asymptotically .
where denotes the distribution of the standard normal random variable. By [10, Lemma 2, p. 175], for any , where is given by (12). , as desired. The Hence, proof of the theorem is concluded. Remark 3: The essence of the above theorem is the use of , we use Lemma 5. If in lieu of , then we have (see [11, p. 520])
where than
but
. In particular, if , then
is faster
with where denotes . The corresponding lower bounds then can be obtained. is any natural norm, and is a nonsingular Note that if , where matrix, then by ([4], p. 406), is any eigenvalue of . Note that , denotes the Frobenius norm. The above result imwhere plies that
where matrix
Since bound,
1253
denotes any eigenvalue of the . Since for all of these eigenvalues, where is the . largest singular value,
1254
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 45, NO. 7, JULY 2000
Furthermore, since . Consequently, we obtain the following corollary, which is . advantageous since the bound is independent of Corollary 1: The asymptotic lower bound in Theorem 7 can with . be replaced by B. Class
Signals
The development of the previous subsection reveals certain generic features of the results which carry over to the cases beyond the class signals and LS estimation. The main ingredient is the asymptotic normality. In this section, we concentrate on a class of signals and identification mappings that verify the asymptotic normality. The main effort is still on obtaining the . To proceed, we first define lower bounds of the class of signals and identification mappings. Definition 3: Let be the collection of operators and input signals such that stochastic estimation errors satisfy in distribution, where is a positive definite matrix. contains a wide range of random processes The class and identification algorithms. Before proceeding to the lower bounds, consider the following examples first. and be chosen as the least squares Example 3: Let estimation scheme. As demonstrated in the previous sections, under conditions A1)–A3), they belong to class . Example 4: With the absence of the unmodeled dynamics, a necessary condition guaranteeing the estimators to be consistent w.p. 1 or weakly is that the random noise should be averaged w.p. 1 or weakly. Consider a class of out, i.e., there is an operator rescaled operators defined by satisfying as . Note that is symmetric and posihas full column tive definite since we have assumed that rank. , for each , define For
where
denotes the integer part of . Denote and . Assuming A1)–A3), and using a weak convergence arguconverges weakly to ,a ment, one can derive that process with independent Gaussian increments and covariance . Consequently, converges in distribution to a normal random vector with mean 0 and covariance , so such rescaled operators belong to . Example 5: It has not escaped our attention that the condition on the random disturbances can be much relaxed. In fact, we can treat correlated noises of mixing type [3], [9], that include -dependent sequences, moving average processes, and
processes with diminishing correlation. All that is really required is that a central limit theorem holds. We chose the simple condition for presentation in order to reach a wider audience and to communicate the main ideas to many people whose primary interest is in robust design and worst case analysis. In fact, with some modifications, the technique used here works for the stationary -mixing processes. Suppose that is a sequence of stationary -mixing process with 0 mean, mixing , and for some such that rate . Then the corresponding signals belong to the class . Observe that for a given input and an identification map. The requirement ping then implies . Once again that Lemma 2 is in force. Similar to the development of last section, we obtain the following asymptotic lower bounds. , and , Theorem 8: If is asymptotically bounded below by then where . VII. CONCLUDING REMARKS The framework introduced in this paper enables the hybrid natures of deterministic unmodeled dynamics and stochastic disturbances to be treated in a coherent and unified manner. The upper and lower bounds obtained on estimation errors and identification speed provide posteriori uncertainty for robust control on one hand, and complexity properties on the other. In lieu of the uncorrelated “white noise,” much of the analysis of this paper can be carried out for more general exogenous disturbances such as moving average processes, -mixing processes, or functions of mixing processes. There are still many questions and issues whose answers remain open. The lower bounds for most of the systems considered in this paper are obtained by means of the asymptotic normality and large deviation estimates for normal random variables. Can we further improve the bounds? In addition, the framework we proposed immediately suggests a possible application of the nonparametric estimation methods. In particular, the work of Hasminskii and Ibragimov [15] could be of help. Furthermore, various signs appear to suggest that there is a relationship between the results obtained in this paper and information theoretic ideas such as entropy and capacity measures. Nevertheless, completely revealing the connections has not been finished yet. Although involving unmodeled dynamics, it seems still possible to design stopping rules. One of the possibilities is to use rules similar to Yin [36], in which stopping rules were given via asymptotic properties of a stopped stochastic process. For applications to adaptive control, it will be of great importance to extend our results to closed-loop identification of time-varying systems. APPENDIX Before proving Theorem 2, we will reduce (15) to a more tractable form. First, we notice that the identification errors in (15) are pertinent to the noise-free observations, where the estiwhich mation error (2) is reduced to is deterministic.
WANG AND YIN: PERSISTENT IDENTIFICATION OF SYSTEMS WITH UNMODELED DYNAMICS
Secondly, the set of estimation errors
for any then which implies, noting that
contains the set , for some . Indeed, if , , for some , that
(20) . Hence, . or, Consequently, to prove (15) it suffices to show that for any . Furthercan be equivalently expressed as follows: Demore, the set . Then, note by . Now, due to the Toeplitz structure of the matrices and contains all the and , namely, column vectors in where is some matrix whose actual value is irrelevant to our analysis. It follows that
As a result, Therefore, a sufficient condition for Theorem 2 is
.
(21)
1255
unbounded when . This will contradict the fact that which implies the uniform boundedness of . is a column vector in First, we show that if , then along the direction will be expanded to by a factor of at least . is full rank, there is a Since , where is the boundary of such that for some constant . By the hypothesis (22), . contains Since is a vector contained in . The conclusion follows since the vector . where the inequality Denote is full rank. A direct consequence of the is true since previous conclusion is that if there are vectors in distinct , which are on the same direction , the size of along the direction will . be at least , let be the smallest integer for Now, for any . It follows that at any given direction which the input can generate a maximum of vectors in distinct before the size of exceeds . Furthermore, by continuity there exists a neighborhood of such that the input can generate no more than vectors . The class of all such neighborhoods whose directions is an open cover of . Since is compact, contains a finite subcover of . Consequently, the input can generate a total of vectors in distinct before the size no more than exceeds . Since is arbitrary, this proves that of is unbounded when . Since the argument is , the proof is complete. valid for any choice of inputs Proof of Theorem 4: Lemma 3 implies that
We are now ready to prove Theorem 2. Proof of Theorem 2: We will prove by contradiction the following lower bound which is stronger than (21): . Hence, assume that (22) . Let is full Without loss of generality, we assume that Otherwise, if becomes rank for all , which will imply that is full rank full rank at , we may simply reindex as the initial time for all is for our discussions. On the other hand, if none of full rank, the subsequent proof is still valid by restricting our . discussions to the subspace For notational convenience, we express a vector in the polar coordinate, , where is an -dimensional angular variable , defining the direction of in ; and the length of . We shall use to denote the domain of which is a compact set. is selected, which will Now, suppose an input signal generate infinitely many (column) vectors in the regression We will show that under matrices in the set will become hypothesis (22), the sets
For any satisfying , we assume, without loss of generality, that all elements of are non) and is the largest element negative (i.e., . It follows that (by symmetry of )
Similarly, . Therefore, we have . REFERENCES [1] H. Akaike, “Maximum likelihood identification of Gaussian autoregressive moving average models,” Biometrika, vol. 60, pp. 407–419, 1973. [2] K. Aström and B. Wittenmark, Adaptive Control. New York: AddisonWesley, 1989.
1256
[3] P. Billingsley, Convergence of Probability Measures. New York: Wiley, 1968. [4] R. L. Burden and J. D. Faires, Numerical Analysis, 5th ed. Boston, MA: PWS , 1993. [5] P. E. Caines, Linear Stochastic Systems. New York: Wiley, 1988. [6] H.-F. Chen and L. Guo, Identification and Stochastic Adaptive Control. Boston, MA: Birkhäuser, 1991. [7] J. Chen, C. N. Nett, and M. K. H. Fan, “Optimal nonparametric system identification from arbitrary corrupt finite time series,” IEEE Trans. Automat. Contr., vol. 40, pp. 769–776, 1995. [8] M. A. Dahleh, T. Theodosopoulos, and J. N. Tsitsiklis, “The sample complexity of worst case identification of FIR linear systems,” Syst. Contr. Lett., vol. 20, 1993. [9] S. N. Ethier and T. G. Kurtz, Markov Processes, Characterization and Convergence. New York: Wiley, 1986. [10] W. Feller, An Introduction to Probability Theory and Its Applications Volume I, 3rd ed. New York: Wiley, 1968. , An Introduction to Probability Theory and Its Applications Volume [11] II. New York: Wiley, 1966. [12] J. Gärtner, “On large deviations from the invariant measure,” Theory Probab. Appl., vol. 22, pp. 24–39, 1977. [13] G. Gu and P. P. Khargonekar, “Linear and nonlinear algorithms for identification in H with error bounds,” IEEE Trans. Automat. Contr., vol. 37, pp. 953–963, 1992. [14] R. G. Kakvoort and P. M. J. Van den Hof, “Consistent parameter bounding identification for linearly parameterized model sets,” Automatica, vol. 31, pp. 957–969, 1995. [15] R. Z. Hasminskii and I. A. Ibragimov, “On density estimation in the view of Kolmogorov’s ideas in approximation theory,” Ann. Statist., vol. 18, pp. 999–1010, 1990. [16] A. N. Kolmogorov, “On some asymptotic characteristics of completely bounded spaces,” Dokl. Akad. Nauk SSSR, vol. 108, pp. 385–389, 1956. [17] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control. Englewood Cliffs, NJ: Prentice-Hall, 1986. [18] H. J. Kushner and G. Yin, Stochastic Approximation Algorithms and Applications. New York: Springer-Verlag, 1997. [19] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification. Cambridge, MA: MIT Press, 1983. [20] P. M. Mäkilä, “Robust identification and Galois sequences,” Int. J. Contr., vol. 54, no. 5, pp. 1189–1200, 1991. [21] M. Milanese and G. Belforte, “Estimation theory and uncertainty intervals evaluation in the presence of unknown but bounded errors: Linear families of models and estimators,” IEEE Trans. Automat. Contr., vol. AC-27, pp. 408–414, 1982. [22] M. Milanese and A. Vicino, “Optimal estimation theory for dynamic systems with set membership uncertainty: An overview,” Automatica, vol. 27, pp. 997–1009, 1991. [23] A. Pinkus, n-Widths in Approximation Theory. New York: SpringerVerlag, 1985. [24] R. J. Serfling, Approximation Theorems of Mathematical Statistics. New York: Wiley , 1980. [25] K. Poolla and A. Tikku, “On the time complexity of worst case system identification,” IEEE Trans. Automat. Contr., vol. 39, pp. 944–950, 1994. [26] J. Rissanen, “Estimation of structure by minimum description length,” presented at the Workshop Ration. Approx. Syst., Louvain, France. [27] R. Shibata, “Asymptotically efficient selection of the order of the model for estimating parameters of a linear process,” Ann. Statist., vol. 8, pp. 147–164, 1980.
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 45, NO. 7, JULY 2000
[28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38]
, “An optimal autoregressive spectral estimate,” Ann. Statist., vol. 9, pp. 300–306, 1981. V. Solo and X. Kong, Adaptive Signal Processing Algorithms. Englewood Cliffs, NJ: Prentice-Hall, 1995. J. F. Traub, G. W. Wasilkowski, and H. Wozniakowski, InformationBased Complexity. New York: Academic, 1988. D. C. N. Tse, M. A. Dahleh, and J. N. Tsitsiklis, “Optimal asymptotic identification under bounded disturbances,” IEEE Trans. Automat. Contr., vol. 38, pp. 1176–1190, 1993. S. R. Venkatesh and M. A. Dahleh, “Identification in the presence of classes of unmodeled dynamics and noise,” IEEE Trans. Automat. Contr., vol. .42, pp. 1620–1635, 1997. L. Y. Wang, “Persistent identification of time varying systems,” IEEE Trans. Automat. Contr., vol. 42, pp. 66–82, 1997. L. Y. Wang and J. Chen, “Persistent identification of unstable LTV systems,” presented at the Proc. 1997 CDC Conf., San Diego, CA, 1997. L. Y. Wang and L. Lin, “Persistent identification and adaptation: Stabilization of slowly varying systems in H ,” IEEE Trans. Automat. Contr., vol. 43, 1998. G. Yin, “A stopping rule for least-squares identification,” IEEE Trans. Automat. Contr., vol. 34, pp. 659–662, 1989. G. Zames, “On the metric complexity of causal linear systems: "-entropy and "-dimension for continuous time,” IEEE Trans. Automat. Contr., vol. AC-24, pp. 222–230, 1979. G. Zames, L. Lin, and L. Y. Wang, “Fast identification n-widths and uncertainty principles for LTI and slowly varying systems,” IEEE Trans. Automat. Contr., vol. 39, pp. 1827–1838, 1994.
Le Yi Wang received the M.E. degree in computer control from the Shanghai Institute of Mechanical Engineering, China, in 1982 and the Ph.D. degree in electrical engineering from McGill University, Montreal, PQ, Canada, in 1990. Since 1990, he has been with Wayne State University, Detroit, MI, where he is currently an Associate Professor in the Department of Electrical and Computer Engineering. His research interests include H optimization, robust control, time-varying systems, system identification, and adaptive systems, as well as hybrid and nonlinear systems with automotive applications. Dr. Wang was awarded the Research Initiation Award in 1992 from the National Science Foundation. He also received the Faculty Research Award from Wayne State University in 1992 and the College Outstanding Teaching Award from the College of Engineering, Wayne State University, in 1995. He is an Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL.
G. George Yin received the B.S. degree in mathematics from the University of Delaware, Newark, in 1983 and the M.S. degree in electrical engineering and the Ph.D. degree in applied mathematics, both from Brown University, Providence, RI, in 1987. Subsequently, he joined the Department of Mathematics, Wayne State University, where he is currently a Professor. Dr. Yin served on the editorial board of Stochastic Optimization & Design, the Mathematical Review Date Base Committee, and various conference program committees. He was the Editor of SIAM Activity Group on Control and Systems Theory Newsletters, the SIAM Representative to the 34th CDC, and Co-Chair of 1996 AMS-SIAM Summer Seminar in Applied Mathematics. He has been an Associate Editor of IEEE TRANSACTIONS ON AUTOMATIC CONTROL.