Automatica 38 (2002) 1035 – 1043
www.elsevier.com/locate/automatica
Brief Paper
Recursive 4SID algorithms using gradient type subspace tracking

Hiroshi Oku^a,∗, Hidenori Kimura^b

a Systems and Control Engineering Division, Faculty of Applied Physics, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
b Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
Received 29 November 1999; received in revised form 6 August 2001; accepted 26 November 2001
Abstract

Sometimes we obtain prior information about a system to be identified, e.g., its order, model structure, etc. In this paper, we consider the case where the order of a MIMO system to be identified is a priori known. The recursive subspace state-space system identification algorithms presented here are based on the gradient type subspace tracking method used in array signal processing. The algorithms enable us to estimate directly the subspace spanned by the column vectors of the extended observability matrix of the system to be identified, without performing the singular value decomposition. A new convergence proof of the gradient type subspace tracking is also given in this paper. Under the condition of a step size between 0 and 1, we prove the convergence property of the recursive equation of the gradient type subspace tracking. A numerical example illustrates that our algorithm is more robust with respect to the choice of initial values than the corresponding PAST one. © 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Identification algorithms; Subspace methods; Gradient methods; Convergence proofs; Matrix inversion; Adaptive filters
1. Introduction

In this paper, we are concerned with on-line recursive subspace state-space system identification. A great deal of attention has been paid to subspace state-space system identification (4SID) over the last decade, and a number of papers have been published, e.g., Moonen, De Moor, Vandenberghe, and Vandewalle (1989), Van Overschee and De Moor (1994, 1996), Verhaegen (1993, 1994), and Verhaegen and Dewilde (1992a,b). In 4SID, it is important to obtain an estimate of the extended observability matrix of the system to be identified; from it, the state-space representation of the identified model is determined. In off-line 4SID, the extended observability matrix is derived from the singular value decomposition of a certain matrix built from the given input and output (I/O) data. The larger the number of I/O data, the more computation is required to perform its singular value decomposition. To remedy this computational burden, several results on recursive 4SID algorithms, especially MOESP (MIMO output-error state-space model identification) methods, have been reported (Cho, Xu, & Kailath, 1994; Verhaegen & Deprettere, 1991). The authors have presented recursive algorithms for the ordinary MOESP and the PO-MOESP (the ordinary MOESP scheme with instrumental variables constructed from past input and output measurements) based on the matrix inversion lemma (Oku & Kimura, 1999). All the recursive algorithms proposed in those three articles compress the given I/O data recursively into a matrix of fixed size; they do not update an estimate of the extended observability matrix directly. In order to obtain the estimate, it is necessary to perform the singular value decomposition of the data-compressed matrix at every update step. For on-line identification, the computation of the singular value decomposition has been the bottleneck in the recursification of 4SID.

Under the assumption that the order of the system to be identified is a priori known, Gustafsson, Lovera, and Verhaegen (1998) presented recursive algorithms which directly update an estimate of the extended observability matrix. To be exact, they focused on how to update a vector to be fed into the IV-PAST (IV: instrumental variable, PAST: projection approximation subspace tracking) algorithm given by Gustafsson (1997, 1998). The PAST algorithm was originally introduced into array signal processing by Yang (1995, 1996). He considered a minimization problem with respect to a

This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor H. Hjalmarsson, under the direction of Editor Torsten Söderström.
∗ Corresponding author. Tel.: +31-53-4894389; fax: +31-53-4893184. E-mail address: [email protected] (H. Oku).
0005-1098/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0005-1098(01)00286-2
fourth-order cost function of a tall matrix variable to retrieve a signal subspace from measurements contaminated by temporally and spatially white noise. He showed that the global minimizer of the cost function corresponds to the desired signal subspace and that, moreover, the function has no local minimum other than the global minimum. The PAST algorithm was derived by approximately reducing the minimization problem to an exponentially weighted linear least-squares optimization problem, making use of the previous estimate at each update step. Therefore, the algorithm has a convergence property as good as that of such a weighted least-squares algorithm. However, because of the approximation, the estimate converges to a matrix whose column vectors span a slightly different subspace from the one obtained by the original minimization problem. The IV-PAST algorithm is its extension, introducing an instrumental variable for the case where the noise is not spatially white.

In this paper, we also consider the case where the order of the system to be identified is a priori known. Although we apply the same minimization problem to recursive 4SID, we adopt gradient type subspace tracking to search for the global minimizer. A numerical example in this paper illustrates that our algorithm is more robust with respect to the choice of initial values than the PAST one. This is the remarkable advantage of our algorithm. For the gradient algorithms corresponding to the cost function considered in this paper, no information about the step size has been given except that it is a "nonnegative suitable" (Yang, 1995) or "small constant" (Yang & Kaveh, 1988) number. In this paper, however, we show the convergence property of the algorithms presented here under the assumption that the step size lies in (0, 1). In a sense, we obtain upper and lower bounds on the step size of these algorithms.

This paper is organized as follows.
In Section 2, the general problem is formulated and several notations used here are introduced, along with several assumptions related to the feasibility of PO-MOESP. In Sections 3 and 4, however, we concentrate on recursive estimation of the extended observability matrix for a special case where an output-error model is contaminated by white noise with a diagonal covariance matrix. In Section 4, a gradient-based update algorithm for an estimate of the extended observability matrix is proposed, along with its convergence property. This algorithm is extended to the general case in Section 5. Section 6 gives a numerical example to illustrate the advantage of our algorithm in comparison with the corresponding PAST one.

The differentiation of a scalar function J of a matrix variable E, denoted by ∂J/∂E, is adopted in the following manner. Suppose every entry of E changes independently of the others. Then, the (i, j)-entry of ∂J/∂E is defined as (∂J/∂E)_ij := ∂J/∂e_ij, where e_ij denotes the (i, j)-entry of E. For example, with a constant symmetric matrix A, we have (∂/∂E) Trace(E^T A E) = 2AE (Athans, 1968).
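As a quick sanity check on this convention, the identity (∂/∂E) Trace(E^T A E) = 2AE can be verified against central finite differences. The sketch below is our own illustration (not code from the paper), using a random symmetric A:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2
A = rng.standard_normal((m, m))
A = A + A.T  # the identity d/dE trace(E^T A E) = 2 A E holds for symmetric A
E = rng.standard_normal((m, n))

grad_analytic = 2 * A @ E

# central finite differences, entry by entry, per the (i, j)-entry convention
eps = 1e-6
grad_fd = np.zeros_like(E)
for i in range(m):
    for j in range(n):
        Ep, Em = E.copy(), E.copy()
        Ep[i, j] += eps
        Em[i, j] -= eps
        grad_fd[i, j] = (np.trace(Ep.T @ A @ Ep) - np.trace(Em.T @ A @ Em)) / (2 * eps)

assert np.allclose(grad_analytic, grad_fd, atol=1e-4)
```

Since the cost is quadratic in each entry, the central difference is exact up to rounding, so the two gradients agree to high accuracy.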
2. Problem formulation and notations

2.1. General problem formulation

Let the finite-dimensional LTI system under consideration be represented as

x_{k+1} = A x_k + B u_k + F w_k,  (1a)
y_k = C x_k + D u_k + G w_k,  (1b)
ỹ_k = y_k + v_k,  (1c)

where x_k ∈ R^n, u_k ∈ R^r, y_k, v_k, ỹ_k ∈ R^m and w_k ∈ R^p. It is assumed that the order n of system (1) is known. Let the coefficient matrix A be stable and the pair (A, C) be observable. It is assumed that the sequence {u_k} is a realization of an ergodic process with bounded fourth-order moment (Ljung, 1999), and that there exists an integer N_0 such that, for a given integer α satisfying α ≥ n + 1, a given input sequence {u_i} and every N ≥ N_0, the matrix

Σ_{j=α}^{N} u_α(j) u_α(j)^T  (2)

is nonsingular, where u_α(j) := [u_{j−α+1}^T · · · u_j^T]^T. (If there is no data available at j < 0, u_j = 0 is imposed for j < 0.) Eq. (2) is a kind of persistent excitation condition, and it guarantees that certain matrices are invertible. Let α ≪ N. The process noise w_k and the measurement noise v_k are assumed to be independent, identically distributed zero-mean Gaussian white noise sequences and to have no correlation with u_i for any i or with the initial state x_0. They are assumed to satisfy

E[ [w_k; v_k] [w_k^T  v_k^T] ] = [Q_w, S; S^T, Q_v].

Let the pair (A, [B  F Q_w^{1/2}]) be controllable. This means that all modes of system (1) are excited by the input u_k and/or the noise w_k (Van Overschee & De Moor, 1996). Additionally, let the output y_k be ergodic. Our goal is to derive a scheme for estimating the unknown coefficient matrices (A, B, C, D) recursively from given I/O data, without being influenced by the process and measurement noises.

2.2. Notations

For a finite sequence of input–output (for short, "I/O") data {(u_j, y_j)}, j = 1, …, N, the stacked vectors u_α(i) ∈ R^{αr} and y_α(i) ∈ R^{αm} are defined, with an integer α satisfying α ≥ n + 1, as follows:

u_α(i) := [u_{i−α+1}^T · · · u_i^T]^T,  y_α(i) := [y_{i−α+1}^T · · · y_i^T]^T.  (3)
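For illustration, the stacked vectors of (3) can be assembled as follows. This is our own sketch (the helper name is ours), with `alpha` standing for the block-row integer (≥ n + 1) and with zero-padding before time 0 as stated in the text:

```python
import numpy as np

def stack_io(u, i, alpha):
    """Return the stacked vector [u_{i-alpha+1}^T ... u_i^T]^T of Eq. (3)
    for a sequence u[0], u[1], ... stored row-wise (shape: samples x r).
    Samples before time 0 are padded with zeros, as in the text."""
    r = u.shape[1]
    pieces = []
    for j in range(i - alpha + 1, i + 1):
        pieces.append(u[j] if j >= 0 else np.zeros(r))
    return np.concatenate(pieces)
```

For example, with `u` holding rows `[0, 1], [2, 3], [4, 5], ...`, `stack_io(u, 2, 2)` stacks samples 1 and 2, while `stack_io(u, 0, 2)` pads the missing sample at time −1 with zeros.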
For the process and measurement noises, w_α(i) and v_α(i), and for the contaminated measurement ỹ_α(i), the definitions are analogous to (3). Then, from (1), it follows that

y_α(i) = Γ_α x_{i−α+1} + H_α u_α(i) + K_α w_α(i),  (4)

where

Γ_α := [C^T  (CA)^T  · · ·  (CA^{α−1})^T]^T,
H_α := [D, 0, …, 0; CB, D, …, 0; …; CA^{α−2}B, CA^{α−3}B, …, D],
K_α := [G, 0, …, 0; CF, G, …, 0; …; CA^{α−2}F, CA^{α−3}F, …, G].  (5)

Arranging Eq. (4) from i = α to N yields

Y_N = Γ_α X_{N−α+1} + H_α U_N + K_α W_N,  (6)

where

U_N := [u_α(α)  u_α(α+1)  · · ·  u_α(N)],  (7)
Y_N := [y_α(α)  y_α(α+1)  · · ·  y_α(N)],  (8)
X_{N−α+1} := [x_1  x_2  · · ·  x_{N−α+1}],  (9)
W_N := [w_α(α)  w_α(α+1)  · · ·  w_α(N)].  (10)

Note that U_N and Y_N are αr × (N−α+1) and αm × (N−α+1) matrices which consist of N input and output data, respectively. Γ_α ∈ R^{αm×n} and H_α ∈ R^{αm×αr} are said to be the extended observability matrix and a lower block-triangular Toeplitz matrix containing the Markov parameters, respectively. Defining Ỹ_N in the same manner as (8), the following relation is obtained from (1c) and (6):

Ỹ_N = Y_N + V_N = Γ_α X_{N−α+1} + H_α U_N + K_α W_N + V_N.  (11)

Notice that the matrix U_N U_N^T is invertible because (2) is invertible. We define the following two projection matrices:

Π_{U_N} := U_N^T (U_N U_N^T)^{−1} U_N,  (12)
Π⊥_{U_N} := I − Π_{U_N}.

We introduce the following matrix Ξ_N ∈ R^{α(r+m)×(N−α+1)}, the so-called regressor matrix,

Ξ_N := [u_α(0)  u_α(1)  · · ·  u_α(N−α); y_α(0)  y_α(1)  · · ·  y_α(N−α)] =: [φ(0)  φ(1)  · · ·  φ(N−α)].  (13)

(If the system starts at time instant 0, the I/O data up to time instant 0 are set to zero vectors, i.e., · · · = u_{−2} = u_{−1} = 0 and · · · = y_{−2} = y_{−1} = 0.) The matrix Ξ_N plays the role of an instrumental variable in 4SID identification (Jansson & Wahlberg, 1996). Following Jansson and Wahlberg (1997, 1998) and Verhaegen (1994), the input and the noise are assumed to be persistently exciting, such that

lim_{N→∞} (1/(N−α+1)) [X_{N−α+1}; U_N] [Ξ_N^T  U_N^T] > 0.  (14)

We further assume that for every N ≥ N_0

(1/(N−α)) [Ξ_{N−1}; U_{N−1}] [Ξ_{N−1}^T  U_{N−1}^T] > 0.  (15)

Let the Nth compressed I/O data matrix Θ̃_N be defined as follows:

Θ̃_N := (Ỹ_N Π⊥_{U_N} Ξ_N^T Φ_N^{1/2})(Ỹ_N Π⊥_{U_N} Ξ_N^T Φ_N^{1/2})^T = Ỹ_N Π⊥_{U_N} Ξ_N^T Φ_N Ξ_N Π⊥_{U_N} Ỹ_N^T,  (16)

where

Φ_N := (Ξ_N Π⊥_{U_N} Ξ_N^T)^{−1}  (17)

is well defined because of the persistent excitation condition (15). Note that the matrix Θ̃_N can be built entirely from the input and measurement data.

Remark 1. An estimate of the extended observability matrix, denoted by E_N, can be given by the singular value decomposition (or, equivalently, the eigenvalue decomposition) of the real symmetric matrix Θ̃_N, i.e.,

Θ̃_N = [E_N  E_N^⊥] [S_n, 0; 0, S̄] [E_N  E_N^⊥]^T,  (18)

where the diagonal entries of the n × n diagonal matrix S_n consist of the n largest singular values of Θ̃_N. Moreover, according to Verhaegen and Dewilde (1992a), from E_N, E_N^⊥, Y_N U_N^T and P_N := (U_N U_N^T)^{−1}, estimates of the coefficient matrices of the system (A_{T_N}, B_{T_N}, C_{T_N}, D_{T_N}) can be calculated up to a similarity transformation T_N; that is, the following relations hold for a sufficiently large N:

A_{T_N} = T_N A T_N^{−1},  B_{T_N} = T_N B,  (19)
C_{T_N} = C T_N^{−1},  D_{T_N} = D.  (20)

For the details of how to obtain the coefficient matrices, see Oku (2000) and Oku and Kimura (1999). Hence, we develop methods of obtaining an estimate E_N of the extended observability matrix recursively.
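Before turning to the recursive algorithms, the batch quantities of Eqs. (16)–(18) can be sketched in a few lines. This is our own illustrative implementation (the function and variable names are ours, not from the paper), assuming the data matrices have already been assembled:

```python
import numpy as np

def compressed_matrix(Ytil, U, Xi):
    """Batch compressed I/O data matrix of Eq. (16):
    Theta = Ytil Pi_U^perp Xi^T Phi Xi Pi_U^perp Ytil^T,
    with Phi = (Xi Pi_U^perp Xi^T)^{-1} as in Eq. (17)."""
    Ns = U.shape[1]
    # orthogonal projection onto the complement of the row space of U, Eq. (12)
    Pi_perp = np.eye(Ns) - U.T @ np.linalg.solve(U @ U.T, U)
    G = Xi @ Pi_perp                 # Xi Pi_U^perp
    Phi = np.linalg.inv(G @ Xi.T)    # (Xi Pi^perp Xi^T)^{-1}; Pi_perp is idempotent
    M = Ytil @ Pi_perp @ Xi.T
    return M @ Phi @ G @ Ytil.T      # symmetric by construction

def observability_estimate(Theta, n):
    """Remark 1: the n dominant left singular vectors of the symmetric Theta
    give an estimate E_N of the extended observability matrix (up to similarity)."""
    Uo, s, _ = np.linalg.svd(Theta)
    return Uo[:, :n]
```

The SVD step here is exactly the per-update cost that the direct recursions of Sections 4 and 5 are designed to avoid.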
3. Special case with the restrictive assumptions w_k ≡ 0 and Q_v = σ_v² I

In this and the next sections, we consider a special case where there is no process noise disturbing the system to be identified, i.e., w_k ≡ 0 for all k, and the covariance of the measurement noise Q_v equals σ_v² I. This case is the so-called ordinary MOESP problem setting (Verhaegen & Dewilde, 1992a). In this case, system (1) can be described as follows:

x_{k+1} = A x_k + B u_k,  (21a)
y_k = C x_k + D u_k,  (21b)
ỹ_k = y_k + v_k.  (21c)

The input is assumed to satisfy (2). Note that, in this problem setting, the compressed I/O data matrix Θ̃_N can be defined simply as Θ̃_N := Ỹ_N Π⊥_{U_N} Ỹ_N^T. For the details, see Oku (2000) and Oku and Kimura (1999). The following algorithm is a recursive 4SID algorithm for this case (Oku & Kimura, 1999). It is based on the matrix inversion lemma, and is an indirect recursive algorithm with respect to an estimate of the extended observability matrix. Define P_N := (U_N U_N^T)^{−1}.

Algorithm 1 (Oku & Kimura, 1999). Suppose Θ̃_{N−1}, P_{N−1} and Ỹ_{N−1} U_{N−1}^T have already been obtained. When a pair of the Nth I/O data (u_N, ỹ_N) is obtained, we can construct u_α(N) and ỹ_α(N) from (3). Then, the Nth compressed I/O data matrix Θ̃_N can be updated by the following recursion:

Θ̃_N = Θ̃_{N−1} + γ_N ẽ_N ẽ_N^T  (22)

together with the following equations:

P_N = P_{N−1} − γ_N P_{N−1} u_α(N) u_α(N)^T P_{N−1},  (23)
γ_N = (1 + u_α(N)^T P_{N−1} u_α(N))^{−1},  (24)
ẽ_N = ỹ_α(N) − Ỹ_{N−1} U_{N−1}^T P_{N−1} u_α(N),  (25)
Ỹ_N U_N^T = Ỹ_{N−1} U_{N−1}^T + ỹ_α(N) u_α(N)^T.  (26)

Remark 2. At every update step, we require the eigenvalue decomposition of the compressed I/O data matrix Θ̃_N to obtain an estimate of the extended observability matrix. This is why Algorithm 1 is called an indirect one. Note that it is not necessary to know the order of the system a priori.

4. Direct recursive algorithm for the special case

In this section, a direct recursive estimation method for the extended observability matrix of a system contaminated only by measurement noise is considered. When the order n of the system to be identified is a priori known, the column subspace of the extended observability matrix can be estimated directly by applying a subspace tracking technique from array signal processing to Eq. (22) in Algorithm 1. In this paper we focus on the gradient method, which is one of the subspace tracking methods (Yang, 1995).

4.1. Interpretation of ẽ_N from the viewpoint of array signal processing

Let us denote by e_N the vector obtained by leaving out the noise terms on the right-hand side of (25), i.e.,

e_N := y_α(N) − Y_{N−1} U_{N−1}^T P_{N−1} u_α(N).  (27)

Notice that for every N ≥ N_0 the vector e_N in (27) belongs to the subspace spanned by the column vectors of the extended observability matrix because, from (4) and (6), taking w_α(N) = 0 and W_N = 0 into account,

e_N = Γ_α (x_{N−α+1} − X_{N−α} U_{N−1}^T P_{N−1} u_α(N)).

From now on, let the notation span Γ_α stand for the subspace mentioned above. Then, from (25) and (27), the vector ẽ_N is described as follows:

ẽ_N = y_α(N) − Y_{N−1} U_{N−1}^T P_{N−1} u_α(N) + v_α(N) − V_{N−1} U_{N−1}^T P_{N−1} u_α(N) = e_N + Δe_N,  (28)

where Δe_N := v_α(N) − V_{N−1} U_{N−1}^T P_{N−1} u_α(N). Note that the term Δe_N in (28) comes from the measurement noise v_k. Suppose ẽ_N ≠ 0 for all N. Now, define the normalized vector ξ̃_N as

ξ̃_N := ẽ_N/‖ẽ_N‖ = e_N/‖ẽ_N‖ + Δe_N/‖ẽ_N‖ =: ξ_N + Δξ_N.  (29)

Since the vector ξ_N belongs to span Γ_α, there exists a vector s_N ∈ R^n such that

ξ_N = Γ_α s_N.  (30)

Then, we obtain

ξ̃_N = Γ_α s_N + Δξ_N.  (31)

Regarding ξ̃_N, Γ_α, s_N and Δξ_N as an observed data vector, a signal subspace, a random source vector and an additive noise, respectively, in spatial-domain spectral analysis (Yang, 1995) allows us to apply subspace tracking methods to estimate span Γ_α. For sufficiently large N (≥ N_0), let the correlation matrix of ξ̃_N be defined as

C_ξ := E[ξ̃_N ξ̃_N^T].  (32)

Note that, if the effect of the measurement noise v_k is relatively small, the subspace spanned by the n dominant eigenvectors of C_ξ coincides with span Γ_α. All eigenvalues of C_ξ are obviously not greater than 1 because the vector ξ̃_N has
norm 1. To see the existence of (32), note that since ‖ξ̃_N ξ̃_N^T‖_F = 1 almost everywhere and ∫ p(ξ̃_N) dξ̃_N = 1, where p(ξ̃_N) denotes the probability density function of ξ̃_N, the integral

∫ ‖ξ̃_N ξ̃_N^T‖_F p(ξ̃_N) dξ̃_N

exists, and hence

∫ ξ̃_N ξ̃_N^T p(ξ̃_N) dξ̃_N

also exists.

Remark 3. We make use of the normalized ξ̃_N instead of ẽ_N itself because we need only the direction of ẽ_N in order to estimate span Γ_α. Moreover, the normalization of ẽ_N enables us to analyze the stability of the algorithm shown below.

4.2. A direct recursive algorithm based on the subspace tracking

The following theorem due to Yang (1995) is useful for our purpose. (For the proof, see the proofs of Theorems 1 and 2 in that reference.)

Theorem 4 (Yang, 1995). Consider the following scalar function with respect to an αm × n matrix variable E:

J(E) := E[‖ξ̃_N − E E^T ξ̃_N‖²].  (33)

Then: (1) E is a stationary point of J(E) if and only if there exists an n × n unitary matrix T such that E = F_C T, where F_C ∈ R^{αm×n} contains n distinct eigenvectors of C_ξ. At each stationary point, J(E) equals the sum of the eigenvalues whose eigenvectors are not involved in F_C. (2) All stationary points of J(E) are saddle points except when F_C contains the n dominant eigenvectors of C_ξ. In this case, J(E) attains the global minimum.

Using (32), Eq. (33) is rewritten as follows:

J(E) = E[Trace((ξ̃_N − E E^T ξ̃_N)(ξ̃_N − E E^T ξ̃_N)^T)]
     = Trace C_ξ − 2 Trace(E^T C_ξ E) + Trace(E^T C_ξ E E^T E).

Then, the differentiation of J(E) with respect to E yields

∂J/∂E = −2(C_ξ (I − E E^T) + (I − E E^T) C_ξ) E.  (34)

For practical use, we replace the correlation matrix C_ξ by the following finite summation:

Ĉ_{ξ,N} := (1/(N − N_0 + 1)) Σ_{i=N_0}^{N} ξ̃_i ξ̃_i^T,

or, equivalently,

Ĉ_{ξ,N} = (1/(N − N_0 + 1)) ((N − N_0) Ĉ_{ξ,N−1} + ξ̃_N ξ̃_N^T),  Ĉ_{ξ,N} = 0 if N < N_0.  (35)

Now, we obtain the following algorithm, which is an application of the gradient method.

Algorithm 2. The initial value E_{N_0} is assumed to be any column-full-rank matrix whose column vectors are normalized. Assume that the parameter μ, called a step size, satisfies 0 < μ < 1. Suppose E_{N−1}, Ĉ_{ξ,N−1}, P_{N−1} and Ỹ_{N−1} U_{N−1}^T have already been obtained. When the Nth I/O data (u_N, ỹ_N) are obtained, the Nth estimate of the extended observability matrix E_N can be updated recursively as

E_N = E_{N−1} − (μ/2)(−2 Ĉ_{ξ,N} + Ĉ_{ξ,N} E_{N−1} E_{N−1}^T + E_{N−1} E_{N−1}^T Ĉ_{ξ,N}) E_{N−1}  (36)

together with the four equations (23)–(26), the normalized vector ξ̃_N defined by (29), and Ĉ_{ξ,N} updated by (35).

4.3. The convergence of Algorithm 2

Let the eigenvalue decomposition of C_ξ be denoted by

C_ξ = [E_opt  E_opt^⊥] [Λ_s, 0; 0, Λ_n] [E_opt  E_opt^⊥]^T = E_C Λ_C E_C^T,  (37)

where the diagonal entries of the diagonal matrices Λ_s ∈ R^{n×n} and Λ_n ∈ R^{(αm−n)×(αm−n)} are the eigenvalues of C_ξ, arranged in descending order. Note that Λ_s < I since all the eigenvalues of the positive semidefinite real symmetric matrix C_ξ are less than 1. From Theorem 4 and Eq. (34), E_opt satisfies

∂J/∂E |_{E=E_opt} = −2(C_ξ (I − E_opt E_opt^T) + (I − E_opt E_opt^T) C_ξ) E_opt = 0.

We can now establish the convergence of Algorithm 2.

Theorem 5. In (36), suppose that we obtain the precise covariance matrix C_ξ at every update step, i.e., Ĉ_{ξ,N} = C_ξ for all N. It is assumed that the noise effect is sufficiently small, i.e.,

Λ_n ≪ I  and  λ_n^s > λ_1^n,  (38)

where λ_i^s (or λ_i^n) denotes the ith diagonal entry of the diagonal matrix Λ_s (or Λ_n). Then, E_opt is a locally asymptotically stable point of recursion (36).

Proof. See Appendix A.

5. A recursive algorithm for the case with both process and measurement noises

In this section, we consider a direct recursive 4SID algorithm for the general case where the system is described by (1). To begin with, we recall the previous result, the indirect algorithm presented by Oku and Kimura (1999). This algorithm recursively updates the compressed I/O data matrix Θ̃_N in (16) based on the matrix inversion lemma.

Algorithm 3 (Oku & Kimura, 1999). Suppose Θ̃_{N−1}, P_{N−1}, Ξ_{N−1} U_{N−1}^T, Ỹ_{N−1} U_{N−1}^T and Φ_{N−1} have already been obtained at the previous step. If a pair of the Nth I/O data (u_N, ỹ_N) is obtained, the Nth compressed I/O data matrix Θ̃_N := Ỹ_N Π⊥_{U_N} Ξ_N^T Φ_N Ξ_N Π⊥_{U_N} Ỹ_N^T can be updated recursively by the following procedure: First, update γ_N and ẽ_N by (24) and (25), respectively, and q_N and
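Stepping back to the special case of Section 4, one full update of the direct gradient-type recursion (Algorithm 2: Eqs. (23)–(26), (29), (35) and (36)) can be sketched as follows. This is our own sketch, not code from the paper; all names are ours, and it assumes the innovation-like vector is nonzero:

```python
import numpy as np

def algorithm2_step(E, C, P, YU, u_stack, y_stack, count, mu):
    """One step of the direct gradient-type recursion (Algorithm 2, special case).

    E       : current estimate of the extended observability matrix
    C       : running correlation-matrix estimate, Eq. (35)
    P       : P_{N-1} = (U_{N-1} U_{N-1}^T)^{-1}
    YU      : Ytil_{N-1} U_{N-1}^T
    u_stack : u_alpha(N); y_stack : ytil_alpha(N)  (stacked vectors, Eq. (3))
    count   : number of samples absorbed into C so far
    mu      : step size, assumed 0 < mu < 1
    """
    # (24), (25): scalar gain and residual via the matrix inversion lemma
    Pu = P @ u_stack
    gamma = 1.0 / (1.0 + u_stack @ Pu)
    e = y_stack - YU @ Pu
    # (29): normalized direction (assumes e != 0)
    xi = e / np.linalg.norm(e)
    # (35): sample-mean update of the correlation matrix estimate
    C = (count * C + np.outer(xi, xi)) / (count + 1)
    # (36): gradient step toward the dominant subspace of C
    EEt = E @ E.T
    E = E - 0.5 * mu * (-2.0 * C @ E + C @ EEt @ E + EEt @ C @ E)
    # (23), (26): bookkeeping updates for the next step
    P = P - gamma * np.outer(Pu, Pu)
    YU = YU + np.outer(y_stack, u_stack)
    return E, C, P, YU, count + 1
```

Only matrix–vector products and one small matrix product appear per step, which is the point of the direct recursion: no eigenvalue or singular value decomposition is performed.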