IDENTIFICATION OF NONLINEAR ERRORS-IN-VARIABLES MODELS

Copyright © 2002 IFAC 15th Triennial World Congress, Barcelona, Spain

István Vajk, Jenő Hetthéssy
Department of Automation and Applied Informatics
Budapest University of Technology and Economics
H-1521 Budapest, Hungary
fax: (361) 463-2871, e-mail: [email protected]

Abstract: The paper presents a generalization of a classical eigenvalue-decomposition method, originally developed for errors-in-variables linear system identification, to handle an important class of nonlinear problems. A number of examples are presented to call attention to the most critical part of the procedure: turning the identification problem into a generalized eigenvalue-eigenvector calculation problem with symmetric matrices. The elaborated method delivers consistent parameter estimates. Simulation results demonstrate the effectiveness of the proposed algorithm. Copyright © 2002 IFAC

Keywords: errors-in-variables model, nonlinear system identification, eigenvalue-eigenvector decomposition.

1. INTRODUCTION

The problem of identifying a model from noisy input and output measurements is known as the errors-in-variables (EIV) problem. Considering linear systems, a number of methods have been elaborated in the last few years to handle the EIV problem; e.g. Castaldi et al. (1996), Chou et al. (1997) and Heij et al. (1999) presented results for EIV model based identification. The nonlinear problem has been exposed in Amemiya et al. (1988) and Vandersteen (1997). The central problem discussed in this paper is an extension of one of the EIV methods available for linear models to handle nonlinear problems. The Koopmans method (Koopmans, 1937) is a well-known classical method, which has been serving as a firm theoretical basis for a number of identification algorithms for decades. Using the Koopmans algorithm, Levin (Levin, 1964; Fernando et al., 1985) proposed methods for the identification of EIV dynamic linear systems. This algorithm can be considered the ancestor of the nowadays popular subspace

algorithms. As a major difference, Levin's algorithm is based on eigenvalue-eigenvector calculations, while the subspace algorithms use singular value decomposition, resulting in improved numerical robustness. The goal of this paper is to extend the capability of the eigenvalue-eigenvector calculation based algorithms to handle nonlinear system identification problems. The presented algorithm exhibits procedural similarities with the approach followed in Vandersteen (1997); however, the a priori conditions and, consequently, the elaborated algorithm are completely different. The paper is organized as follows. First an overview of the Koopmans-Levin method will be given for linear systems. Then a simple example will be used as a vehicle to show the key idea proposed to handle the nonlinear case. The steps used in the introductory example will be generalized, and further examples will demonstrate the proper treatment of the recorded data in the critical phase of the proposed algorithm. Simulation results will conclude the paper.

2. EIGENVALUE-EIGENVECTOR METHOD – THE LINEAR CASE

Considering discrete-time linear systems, assume that a set of noiseless observations $z_{i,t}^o$ results in the model output by the following linear combination:

$$y_t^o = a_1 z_{1,t}^o + a_2 z_{2,t}^o + \dots + a_m z_{m,t}^o, \qquad (1)$$

where $t$ denotes the discrete time instants ($t = 0, 1, 2, \dots$) and the $a_i$ parameters are to be estimated. Collecting the unknown parameters and the noiseless observations in parameter and observation vectors, respectively,

$$\theta^T = [a_1 \;\; a_2 \;\; \dots \;\; a_m] \qquad (2)$$

and

$$z_t^o = [z_{1,t}^o \;\; z_{2,t}^o \;\; \dots \;\; z_{m,t}^o]^T, \qquad (3)$$

form the model equation explicitly by

$$y_t^o = \theta^T z_t^o. \qquad (4)$$

The model equation can be reformulated in the implicit form $\vartheta^T x_t^o = 0$, where

$$\vartheta^T = [-1 \;\; \theta^T] = [-1 \;\; a_1 \;\; a_2 \;\; \dots \;\; a_m] \qquad (5)$$

and

$$x_t^o = [y_t^o \;\; z_t^{o\,T}]^T = [y_t^o \;\; z_{1,t}^o \;\; z_{2,t}^o \;\; \dots \;\; z_{m,t}^o]^T \qquad (6)$$

are the extended parameter and noise-free observation vectors, respectively. The main consequence of using an implicit model is that each variable plays the same role: no distinction is made between input and output signals. The application of the well-known Least Squares (LS) estimation technique leads to unbiased estimates of the $a_i$ parameters if the set of the measured records $x_t$ available for the estimation is such that

$$x_t^T = [y_t^o + \varepsilon_t \;\; z_t^{o\,T}] = [y_t \;\; z_{1,t}^o \;\; z_{2,t}^o \;\; \dots \;\; z_{m,t}^o], \qquad (7)$$

where $\varepsilon_t$ is white noise. Koopmans (Koopmans, 1937) generalized the LS estimation scheme by developing a maximum likelihood (ML) estimation assuming extended observation vectors. In his model all components of the observation vector may be corrupted by additive noise according to $x_t = x_t^o + n_t$, where $n_t$ is Gaussian white noise with covariance matrix $\mu C$. Assuming $N$ samples, find an estimate of the parameter vector $\vartheta$. With no a priori information available there is no better estimation than the maximum likelihood (ML) estimation. To find this estimate we need to set up the likelihood function of the observations:

$$\mathrm{prob}\left(x_1, \dots, x_N \,\middle|\, \vartheta,\; \vartheta^T x_1^o = 0, \dots, \vartheta^T x_N^o = 0\right) = \mathrm{const}\cdot\exp\left(-\frac{1}{2}\sum_{t=1}^{N}\left(x_t - x_t^o\right)^T (\mu C)^{-1}\left(x_t - x_t^o\right)\right), \qquad (8)$$

where

$$\vartheta^T x_t^o = 0, \qquad t = 1, \dots, N. \qquad (9)$$

Now the ML estimate can be obtained by maximizing the likelihood function in terms of $\vartheta$ and $x_t^o$. With the constraint $\vartheta^T x_t^o = 0$, the above maximization is equivalent to minimizing the loss function

$$J = \frac{1}{2}\,\frac{\vartheta^T\left(\sum_{t=1}^{N} x_t x_t^T\right)\vartheta}{\vartheta^T C\,\vartheta} \qquad (10)$$

by $\vartheta$ with no constraints, which in turn requires solving

$$W\vartheta - C\vartheta\,\frac{\vartheta^T W\vartheta}{\vartheta^T C\vartheta} = 0, \qquad (11)$$

where

$$W = \sum_{t=1}^{N} x_t x_t^T. \qquad (12)$$

According to the above equation, the optimal value of $\vartheta$ can be determined as a solution of the generalized eigenvalue problem defined by

$$(W - \mu C)\vartheta = 0. \qquad (13)$$

More exactly, the optimal parameter estimate will be equal to the eigenvector related to the smallest eigenvalue found, namely

$$\min_{\vartheta} J = \mu_1 \qquad (14)$$

holds. The original Koopmans method relates to static linear models. Levin, however, applied the above algorithm to linear dynamic EIV models. In this case the observation vector is composed of the delayed recorded input and output samples of the process; otherwise the parameter vector is still calculated by solving a generalized eigenvalue-eigenvector problem. The advantage of the Koopmans-Levin method is clearly seen from the above equations: the consistent estimation is based on the relatively easy calculation of eigenvectors and eigenvalues of symmetric matrices. Specifically, for symmetric matrices $W = \Phi^T\Phi$ and $C = \Gamma^T\Gamma$, the generalized eigenvalue calculation for the pair $(W, C)$ can be traced back to a generalized singular value decomposition of $(\Phi, \Gamma)$, representing a far more robust numerical algorithm. The presented methods can be qualified as direct strategies in the sense that no iteration is required to obtain the parameter estimates.
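To make the linear Koopmans-Levin computation concrete, the following minimal sketch (ours, not from the paper; the toy system and noise settings are illustrative assumptions) builds $W$ from noisy records per eq. (12) and solves the symmetric-definite generalized eigenproblem of eq. (13):

```python
import numpy as np
from scipy.linalg import eigh

def koopmans_levin(X, C):
    """Estimate theta from rows x_t = [y_t, z_t] of X, all channels noisy.
    C is the noise covariance known up to the scale mu; see eqs. (12)-(14)."""
    W = X.T @ X                       # eq. (12)
    mu, V = eigh(W, C)                # solves (W - mu*C) v = 0, eq. (13)
    v = V[:, 0]                       # eigenvector of the smallest eigenvalue
    v = -v / v[0]                     # scale so v = [-1, theta^T]^T, eq. (5)
    return v[1:], mu[0]

# Assumed toy system: y = 2*z1 - z2, equal white noise on every channel
rng = np.random.default_rng(0)
N = 1000
z = rng.normal(size=(N, 2))
X = np.column_stack([z @ [2.0, -1.0], z]) + 0.1 * rng.normal(size=(N, 3))
theta, mu1 = koopmans_levin(X, np.eye(3))
print(theta)                          # close to [2, -1]
```

Here `scipy.linalg.eigh` exploits exactly the symmetric structure noted above; the smallest eigenvalue also carries the noise-level information of eq. (14).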

3. EIGENVALUE-EIGENVECTOR METHOD – THE NONLINEAR CASE

In this section the above-presented algorithm will be generalized for nonlinear EIV identification. First a simple example will be shown to demonstrate the way the nonlinear extension is performed. Then a nonlinear eigenvalue-eigenvector algorithm will be presented. Finally some technical hints will be given for the calculation.

3.1 An introductory example

Just to get started, let us consider how to fit a second order parabolic function to a set of input/output observation records $u_1, \dots, u_N$ and $y_1, \dots, y_N$. Working with the standard assumptions, the input/output records are assumed to be corrupted by independent, normally distributed white noise with zero mean and variances $\mu\sigma_u^2$ and $\mu\sigma_y^2$, respectively. As far as the noiseless observations $u_1^o, \dots, u_N^o$ and $y_1^o, \dots, y_N^o$ are concerned, the constraint

$$y_t^o = \theta u_t^{o\,2}, \qquad t = 1, 2, \dots \qquad (15)$$

holds, expressing the second order parabolic relation, though the parameter $\theta$ is not known. The maximum likelihood estimate of the unknown $\theta$ can be obtained by minimizing the loss function

$$J = \frac{1}{2}\sum_{t=1}^{N}\begin{bmatrix} y_t - y_t^o & u_t - u_t^o \end{bmatrix}\begin{bmatrix} 1/\sigma_y^2 & 0 \\ 0 & 1/\sigma_u^2 \end{bmatrix}\begin{bmatrix} y_t - y_t^o \\ u_t - u_t^o \end{bmatrix} + \sum_{t=1}^{N}\lambda_t\left(y_t^o - \theta u_t^{o\,2}\right). \qquad (16)$$

The straightforward minimization would lead to a set of $3N+1$ equations for the $3N+1$ unknown variables (i.e. $u_1^o, u_2^o, \dots, u_N^o$, $y_1^o, y_2^o, \dots, y_N^o$, $\lambda_1, \lambda_2, \dots, \lambda_N$, and the parameter $\theta$ itself). It is obvious, however, that with an increasing number of observations this simple approach will not work in practice. Consequently, to derive a feasible solution, a different approach should be applied.

The approach to be followed hereafter aims to turn the parameter estimation problem into a generalized eigenvalue-eigenvector calculation problem. It is noted, however, that the elaborated method will not give a maximum likelihood estimate. All we can say is that the estimate will be consistent and the underlying method will be adequate for practical applications.

Coming back to the problem of fitting a second order parabolic function to the input/output records, consider the following matrix constructed from the noisy (available) observations:

$$D = \frac{1}{N}\sum_{t=1}^{N}\begin{bmatrix} y_t \\ u_t^2 \end{bmatrix}\begin{bmatrix} y_t & u_t^2 \end{bmatrix}, \qquad (17)$$

and find the expected value of $D$. The components of $E\{D\}$ can be calculated as follows:

$$E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^2\right\} = \frac{1}{N}\sum_{t=1}^{N} y_t^{o\,2} + \mu\sigma_y^2 \qquad (18)$$

$$E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t u_t^2\right\} = \frac{1}{N}\sum_{t=1}^{N} y_t^o u_t^{o\,2} + \frac{1}{N}\sum_{t=1}^{N} y_t^o\,\mu\sigma_u^2 \qquad (19)$$

$$E\left\{\frac{1}{N}\sum_{t=1}^{N} u_t^4\right\} = \frac{1}{N}\sum_{t=1}^{N} u_t^{o\,4} + \frac{6}{N}\sum_{t=1}^{N} u_t^{o\,2}\,\mu\sigma_u^2 + 3\mu^2\sigma_u^4 \qquad (20)$$

All the right hand side expressions above contain sums made out of the noiseless signals, which are not available for calculations. To isolate the terms with unknown variables we are going to use the following evaluations valid for the expected values:

$$E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t\right\} = \frac{1}{N}\sum_{t=1}^{N} y_t^o \qquad (21)$$

$$E\left\{\frac{1}{N}\sum_{t=1}^{N} u_t^2\right\} = \frac{1}{N}\sum_{t=1}^{N} u_t^{o\,2} + \mu\sigma_u^2 \qquad (22)$$

Substituting $\frac{1}{N}\sum_{t=1}^{N} y_t^o$ and $\frac{1}{N}\sum_{t=1}^{N} u_t^{o\,2}$ back into the expressions of the $E\{D\}$ components results in

$$\frac{1}{N}\sum_{t=1}^{N} y_t^{o\,2} = E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^2\right\} - \mu\sigma_y^2 \qquad (23)$$

$$\frac{1}{N}\sum_{t=1}^{N} y_t^o u_t^{o\,2} = E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t u_t^2\right\} - \mu\sigma_u^2\,E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t\right\} \qquad (24)$$

$$\frac{1}{N}\sum_{t=1}^{N} u_t^{o\,4} = E\left\{\frac{1}{N}\sum_{t=1}^{N} u_t^4\right\} - 6\mu\sigma_u^2\,E\left\{\frac{1}{N}\sum_{t=1}^{N} u_t^2\right\} + 3\mu^2\sigma_u^4 \qquad (25)$$

Arranging the above components into vector-matrix form:

$$\frac{1}{N}\sum_{t=1}^{N}\begin{bmatrix} y_t^o \\ u_t^{o\,2} \end{bmatrix}\begin{bmatrix} y_t^o & u_t^{o\,2} \end{bmatrix} = E\left\{\frac{1}{N}\sum_{t=1}^{N}\begin{bmatrix} y_t \\ u_t^2 \end{bmatrix}\begin{bmatrix} y_t & u_t^2 \end{bmatrix}\right\} - E\left\{\begin{bmatrix} \mu\sigma_y^2 & \mu\sigma_u^2\,\frac{1}{N}\sum_{t=1}^{N} y_t \\ \mu\sigma_u^2\,\frac{1}{N}\sum_{t=1}^{N} y_t & 6\mu\sigma_u^2\,\frac{1}{N}\sum_{t=1}^{N} u_t^2 - 3\mu^2\sigma_u^4 \end{bmatrix}\right\}. \qquad (26)$$

Now introducing

$$D^o = \frac{1}{N}\sum_{t=1}^{N}\begin{bmatrix} y_t^o \\ u_t^{o\,2} \end{bmatrix}\begin{bmatrix} y_t^o & u_t^{o\,2} \end{bmatrix} \qquad (27)$$

and

$$H(\mu) = \begin{bmatrix} \mu\sigma_y^2 & \mu\sigma_u^2\,\frac{1}{N}\sum_{t=1}^{N} y_t \\ \mu\sigma_u^2\,\frac{1}{N}\sum_{t=1}^{N} y_t & 6\mu\sigma_u^2\,\frac{1}{N}\sum_{t=1}^{N} u_t^2 - 3\mu^2\sigma_u^4 \end{bmatrix}, \qquad (28)$$

we have

$$D^o = E\{D - H(\mu)\}. \qquad (29)$$

Observe that $D^o$ is singular as a result of the linear-in-parameters dependency introduced by $y_t^o = \theta u_t^{o\,2}$:

$$D^o = \left(\frac{1}{N}\sum_{t=1}^{N} u_t^{o\,4}\right)\begin{bmatrix} \theta \\ 1 \end{bmatrix}\begin{bmatrix} \theta & 1 \end{bmatrix}. \qquad (30)$$

Multiplication by

$$\vartheta = [-1 \;\; \theta]^T \qquad (31)$$

leads to

$$D^o\vartheta = E\{D - H(\mu)\}\vartheta = 0. \qquad (32)$$

The operation instructed by the expected value relates to calculated mean values, thus the approximation

$$E\left\{\frac{1}{N}\sum(\dots)\right\} \approx \frac{1}{N}\sum(\dots) \qquad (33)$$

allows the parameter estimation to be reformulated as

$$(D - H(\mu))\vartheta = 0. \qquad (34)$$

In other words, the parameter estimation problem is equivalent to a generalized eigenvalue-eigenvector problem in the sense that the smallest generalized eigenvalue $\mu$ is looked for satisfying

$$\det(D - H(\mu)) = 0, \qquad (35)$$

while the estimate $\hat{\vartheta}$ will be obtained as the eigenvector belonging to this eigenvalue.
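As a numerical illustration of the introductory example, the following sketch (ours, not from the paper; the data generation and the root-bracketing scan are illustrative assumptions) builds $D$ per eq. (17) and $H(\mu)$ per eq. (28), locates the smallest root of eq. (35), and extracts $\hat{\vartheta}$ from eq. (34):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
theta_true, N, mu_true = 2.0, 2000, 0.04     # noise variance is mu*sigma^2
sig_u = sig_y = 1.0
u0 = rng.uniform(-2.0, 2.0, N)
y0 = theta_true * u0**2                      # eq. (15)
u = u0 + np.sqrt(mu_true) * sig_u * rng.normal(size=N)
y = y0 + np.sqrt(mu_true) * sig_y * rng.normal(size=N)

G = np.column_stack([y, u**2])               # g(x_t) = [y_t, u_t^2]
D = G.T @ G / N                              # eq. (17)
ybar, u2bar = y.mean(), (u**2).mean()

def H(mu):                                   # eq. (28)
    return np.array([
        [mu * sig_y**2,        mu * sig_u**2 * ybar],
        [mu * sig_u**2 * ybar, 6*mu*sig_u**2*u2bar - 3*mu**2*sig_u**4]])

f = lambda mu: np.linalg.det(D - H(mu))      # eq. (35)
grid = np.linspace(0.0, 1.0, 2001)           # det > 0 at mu = 0; bracket the
k = next(i for i in range(1, len(grid))      # first sign change, then refine
         if f(grid[i]) < 0.0)
mu_min = brentq(f, grid[k - 1], grid[k])

w, V = np.linalg.eigh(D - H(mu_min))         # eq. (34): (D - H(mu)) v = 0
v = V[:, np.argmin(np.abs(w))]
v = -v / v[0]                                # scale so v = [-1, theta]
print(f"mu_min = {mu_min:.4f}, theta_hat = {v[1]:.4f}")
```

Note that $\mu_{\min}$ also estimates the noise level: with the settings above it should come out near the simulated value of 0.04.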

In the sequel, the introductory example just discussed will be directly generalized to more complex linear-in-parameters nonlinear models.

3.2 Generalization

In contrast to the linear equation $\vartheta^T x_t^o = 0$ discussed in Section 2, nonlinear process models will now be considered. Denoting the set of the noiseless observations by $x_t^o$ and the noisy observations by $x_t$, and further assuming white noise conditions with zero mean and covariance $\mu C$, we have

$$x_t \sim N\left(x_t^o, \mu C\right). \qquad (36)$$

For the noiseless observations a nonlinear constraint is assumed to exist, and this relation is assumed to be linear in the parameters:

$$\vartheta^T g(x_t^o) = 0 \qquad (37)$$

Further on we assume that the elements of $g(x_t^o)$ are polynomials of the observations. Based on the noisy observations $x_t$ for $t = 1, 2, \dots, N$, the goal is to find an estimate of $\vartheta$. Calculating $g(x_t)$ from the available observations, consider the following matrix:

$$D = \frac{1}{N}\sum_{t=1}^{N} g(x_t)\,g(x_t)^T \qquad (38)$$

Now find $E\{D\}$,

$$E\{D\} = E\left\{\frac{1}{N}\sum_{t=1}^{N} g(x_t)\,g(x_t)^T\right\}, \qquad (39)$$

by repeating the very same steps applied in the introductory example. It can be shown that

$$E\{D\} = D^o + E\{H(\mu)\}, \qquad (40)$$

where

$$D^o = \frac{1}{N}\sum_{t=1}^{N} g(x_t^o)\,g(x_t^o)^T. \qquad (41)$$

However, it is difficult to give a general analytical form for $H(\mu)$. Anyway, we have

$$D^o = E\{D - H(\mu)\}. \qquad (42)$$

Due to the constraint $\vartheta^T g(x_t^o) = 0$, the matrix $D^o$ is singular, meaning that

$$D^o\vartheta = E\{D - H(\mu)\}\vartheta = 0 \qquad (43)$$

or

$$\det\left(E\{D - H(\mu)\}\right) = 0. \qquad (44)$$

As it has already been shown, approximating the expected value of a calculated mean by the calculated mean itself allows us to drop the expected value operation, and results in

$$D^o(x_1^o, \dots, x_N^o)\,\vartheta = \left(D(x_1, \dots, x_N) - H(\mu C, x_1, \dots, x_N)\right)\vartheta = 0 \qquad (45)$$

Again, the parameter estimation problem has been turned into a generalized eigenvalue-eigenvector problem.

Due to the polynomial construction of $g(\cdot)$, the matrix $H$ is a polynomial-like function of $\mu$:

$$H = \mu H_1 + \mu^2 H_2 + \dots + \mu^m H_m \qquad (46)$$

where $m$ is the maximum order in the elements of $g(\cdot)$, and the matrices $H_i$ are

$$H_i = H_i(C, x_1, \dots, x_N). \qquad (47)$$

Note that for the original Koopmans method

$$H(\mu C, x_1, \dots, x_N) = \mu C \qquad (48)$$

holds.
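As a small sketch of the data matrix construction in eq. (38) (ours, not the paper's code; the conic feature map shown is only an example, matching the model of eq. (66) later in the paper):

```python
import numpy as np

def build_D(X, g):
    """D = (1/N) * sum_t g(x_t) g(x_t)^T, eq. (38); X has one observation per row."""
    G = np.array([g(x) for x in X])    # stack g(x_t) as rows
    return G.T @ G / len(X)

# Example feature map for a conic-section constraint, x = (y, u)
g_conic = lambda x: np.array([x[0]**2, x[0]*x[1], x[1]**2, x[0], x[1], 1.0])
# usage: D = build_D(np.column_stack([y, u]), g_conic)
```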

3.3 Practical hints

As it has been pointed out earlier, it is quite difficult to derive a general analytical form for $H(\mu C, x_1, \dots, x_N)$. However, to demonstrate the technique of finding $H(\mu C, x_1, \dots, x_N)$, consider a few examples. In the examples we apply the following properties, valid for the normal distribution:

$$\mu_{2k} = 1\cdot 3\cdots(2k-1)\,\sigma^{2k}, \qquad \mu_{2k-1} = 0 \qquad (k = 1, 2, \dots), \qquad (49)$$

where

$$\mu_r \equiv E\{(\xi - E\{\xi\})^r\} \qquad (50)$$

is defined as the central moment.
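A quick numerical check of the central moment identity in eq. (49) (a throwaway sketch of ours, with an arbitrary mean and $\sigma = 2$):

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.normal(loc=3.0, scale=2.0, size=10**6)
c = xi - xi.mean()
for k in (1, 2, 3):
    lhs = np.mean(c**(2*k))                            # empirical mu_{2k}
    rhs = np.prod(np.arange(1, 2*k, 2)) * 2.0**(2*k)   # 1*3*...*(2k-1) * sigma^{2k}
    print(2*k, round(lhs, 1), rhs)                     # mu_2 = 4, mu_4 = 48, mu_6 = 960
```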

Example 1

Find the following expected value:

$$E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^6\right\} = ? \qquad (51)$$

Straightforward calculations give

$$E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^6\right\} = E\left\{\frac{1}{N}\sum_{t=1}^{N}\left(y_t^o + \eta_t\right)^6\right\} = 15\mu^3\sigma^6 + \binom{6}{4}3\mu^2\sigma^4\,\frac{1}{N}\sum_{t=1}^{N} y_t^{o\,2} + \binom{6}{2}\mu\sigma^2\,\frac{1}{N}\sum_{t=1}^{N} y_t^{o\,4} + \frac{1}{N}\sum_{t=1}^{N} y_t^{o\,6}, \qquad (52)$$

where the summations can further be expressed by $\mu$ and $\sigma$, resulting in the following matrix form:

$$\begin{bmatrix} 1 \\[2pt] E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^2\right\} \\[2pt] E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^4\right\} \\[2pt] E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^6\right\} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ \mu\sigma^2 & 1 & 0 & 0 \\ 3\mu^2\sigma^4 & 6\mu\sigma^2 & 1 & 0 \\ 15\mu^3\sigma^6 & 45\mu^2\sigma^4 & 15\mu\sigma^2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\[2pt] \frac{1}{N}\sum_{t=1}^{N} y_t^{o\,2} \\[2pt] \frac{1}{N}\sum_{t=1}^{N} y_t^{o\,4} \\[2pt] \frac{1}{N}\sum_{t=1}^{N} y_t^{o\,6} \end{bmatrix} \qquad (53)$$

According to the special triangular structure of the coefficient matrix, the above equation can easily be solved, and we obtain:

$$\frac{1}{N}\sum_{t=1}^{N} y_t^{o\,6} = E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^6\right\} - 15\mu\sigma^2\,E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^4\right\} + 45\mu^2\sigma^4\,E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^2\right\} - 15\mu^3\sigma^6 \qquad (54)$$

This means that for a matrix $D$ with entries like

$$D_{i,j} = \frac{1}{N}\sum_{t=1}^{N} y_t^6, \qquad (55)$$

the appropriate entry of the matrix $H$ becomes

$$H_{i,j} = 15\mu\sigma^2\,\frac{1}{N}\sum_{t=1}^{N} y_t^4 - 45\mu^2\sigma^4\,\frac{1}{N}\sum_{t=1}^{N} y_t^2 + 15\mu^3\sigma^6. \qquad (56)$$

Example 2

Consider a matrix $D$ given by

$$D_{i,j} = \frac{1}{N}\sum_{t=1}^{N} u_t^2 y_t^2; \qquad (57)$$

then to derive $H$ the following set of equations should be solved:

$$\begin{bmatrix} 1 \\[2pt] E\left\{\frac{1}{N}\sum_{t=1}^{N} u_t^2\right\} \\[2pt] E\left\{\frac{1}{N}\sum_{t=1}^{N} u_t y_t\right\} \\[2pt] E\left\{\frac{1}{N}\sum_{t=1}^{N} y_t^2\right\} \\[2pt] E\left\{\frac{1}{N}\sum_{t=1}^{N} u_t^2 y_t^2\right\} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ \mu\sigma_u^2 & 1 & 0 & 0 & 0 \\ \mu c_{uy} & 0 & 1 & 0 & 0 \\ \mu\sigma_y^2 & 0 & 0 & 1 & 0 \\ \mu^2\sigma_u^2\sigma_y^2 + 2\mu^2 c_{uy}^2 & \mu\sigma_y^2 & 4\mu c_{uy} & \mu\sigma_u^2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\[2pt] \frac{1}{N}\sum_{t=1}^{N} u_t^{o\,2} \\[2pt] \frac{1}{N}\sum_{t=1}^{N} u_t^o y_t^o \\[2pt] \frac{1}{N}\sum_{t=1}^{N} y_t^{o\,2} \\[2pt] \frac{1}{N}\sum_{t=1}^{N} u_t^{o\,2} y_t^{o\,2} \end{bmatrix} \qquad (58)$$

where $c_{uy}$ stands for the cross-covariance between the noise on $u_t$ and the noise on $y_t$.

Example 3

Assume we have random variables $x_t$ with normal distribution. Considering transcendental functions $f(x_t)$ of $x_t$, it is noted that $E\{f(x_t)\}$ may or may not exist. As an example we state that $E\left\{\frac{1}{N}\sum_{t=1}^{N}\ln(y_t)\right\}$ does not exist. The proof of this statement is straightforward by expanding $\ln(y_t)$ into a Taylor series and substituting the central moments.

Summary of the proposed algorithm

Based on the discussions so far, and evaluating the examples presented in detail earlier, a procedural algorithm can be given to generate the matrix $H$, provided a $D$ matrix of

$$D = \frac{1}{N}\sum_{t=1}^{N} g(x_t)\,g(x_t)^T, \qquad (59)$$

i.e. a matrix with entries like

$$D_{i,j} = \frac{1}{N}\sum_{t=1}^{N} g_i(x_t)\,g_j(x_t), \qquad (60)$$

and further on assuming the noise conditions to be normal:

Step 1: Find

$$E\left\{\frac{1}{N}\sum_{t=1}^{N} g_i(x_t)\,g_j(x_t)\right\} \qquad (61)$$

expressed by the noiseless observations and the noise parameters (mean and variance).

Step 2: When finding the expected values for the noiseless observations, use the calculated means of the noisy observations as approximations (see the examples shown previously). Set up a set of equations based on the relation between the calculated means of the noisy and noiseless observations.

Step 3: Solve the equations set up in Step 2 for

$$\frac{1}{N}\sum_{t=1}^{N} g_i(x_t^o)\,g_j(x_t^o). \qquad (62)$$

Step 4: The $H_{i,j}$ entries can be calculated by approximating the expected values of the sums involved with the sums themselves.

Step 5: The following equation should be solved numerically for $\mu_{\min}$:

$$\det\left(D(x_1, \dots, x_N) - H(\mu C, x_1, \dots, x_N)\right) = 0 \qquad (63)$$

Step 6: Finally, using the fundamental relation of the algorithm,

$$\left(D(x_1, \dots, x_N) - H(\mu_{\min} C, x_1, \dots, x_N)\right)\hat{\vartheta} = 0, \qquad (64)$$

$\hat{\vartheta}$ can be determined.

The presented algorithm results in a consistent, but not ML, estimate. The effectiveness of the algorithm can be further improved by an appropriate weighting sequence applied in the composition of the matrix $D$.

4. SIMULATION RESULTS

In the sequel, two numerical examples will be shown to demonstrate the effectiveness of the elaborated algorithm.

4.1 Identification of a parabolic curve

Consider $N = 100$ samples of the following system:

$$y_t = a_2 u_t^2 + a_1 u_t + a_0 \qquad (65)$$

with measurements corrupted by normal noise with $\sigma_u = \sigma_y = 0.2$. In Fig. 1 the noisy observations, the LS estimate, the true system, as well as the results obtained by applying the proposed algorithm are shown. It is seen that the true system and the nonlinear EIV model identified by the method derived in this paper almost cover each other.

4.2 Identification of an elliptic curve

Consider $N = 1000$ samples of the following system:

$$a_0 y_t^2 + a_1 y_t u_t + a_2 u_t^2 + a_3 y_t + a_4 u_t + a_5 = 0 \qquad (66)$$

with measurements corrupted by normal noise with $\sigma_u = \sigma_y = 1$. As Fig. 2 shows, we now have a really high noise-to-signal ratio; this is why 1000 samples were selected for the identification. In spite of the noisy environment, the result obtained by the new algorithm almost completely reconstructs the contour determined by the noiseless observations.

5. CONCLUSION

The goal of this paper has been to extend the capability of the eigenvalue-eigenvector calculation based algorithms to handle nonlinear system identification problems. A new identification algorithm for nonlinear, linear-in-parameters errors-in-variables models has been presented. The algorithm is based on generalized eigenvalue-eigenvector calculations of the form

$$(D - H(\mu))\vartheta = 0. \qquad (67)$$

The vector composed of the unknown system parameters is then associated with the smallest eigenvalue found. Assuming sufficient input excitation, the parameter estimation is consistent, and it is direct in the sense that the estimate is obtained in one step rather than as a result of iterations.
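To show Steps 1-6 end to end, here is a self-contained sketch in the spirit of the first simulation example (ours, not the authors' code; the coefficients, the noise settings, the larger sample count, and the correction matrix $H$ derived for $g(x) = [y, u^2, u, 1]^T$ under independent input/output noise are all assumptions):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
a2, a1, a0, N = 1.0, -2.0, 0.5, 2000          # assumed coefficients; N larger than
sig_u = sig_y = 1.0                           # the paper's 100 for a cleaner demo
mu_true = 0.04                                # noise std = sqrt(mu)*sigma = 0.2
u0 = rng.uniform(0.0, 2.0, N)
y0 = a2*u0**2 + a1*u0 + a0                    # eq. (65)
u = u0 + np.sqrt(mu_true)*sig_u*rng.normal(size=N)
y = y0 + np.sqrt(mu_true)*sig_y*rng.normal(size=N)

G = np.column_stack([y, u**2, u, np.ones(N)]) # g(x) = [y, u^2, u, 1]
D = G.T @ G / N                               # eq. (59)
ybar, ubar, u2bar = y.mean(), u.mean(), (u**2).mean()

def H(mu):
    """Correction matrix from Steps 1-4 for independent noise (our derivation)."""
    su, sy = mu*sig_u**2, mu*sig_y**2
    h = np.zeros((4, 4))
    h[0, 0] = sy                              # from E{y^2} = y_o^2 + sy
    h[0, 1] = h[1, 0] = su*ybar               # from E{y u^2} = y_o u_o^2 + y_o su
    h[1, 1] = 6*su*u2bar - 3*su**2            # from E{u^4} = u_o^4 + 6 u_o^2 su + 3 su^2
    h[1, 2] = h[2, 1] = 3*su*ubar             # from E{u^3} = u_o^3 + 3 u_o su
    h[1, 3] = h[3, 1] = su                    # from E{u^2} = u_o^2 + su
    h[2, 2] = su
    return h

f = lambda mu: np.linalg.det(D - H(mu))       # Step 5, eq. (63)
grid = np.linspace(0.0, 1.0, 2001)
k = next(i for i in range(1, len(grid)) if f(grid[i]) * f(grid[0]) < 0.0)
mu_min = brentq(f, grid[k - 1], grid[k])

w, V = np.linalg.eigh(D - H(mu_min))          # Step 6, eq. (64)
v = V[:, np.argmin(np.abs(w))]
v = -v / v[0]                                 # scale so v = [-1, a2, a1, a0]
print(mu_min, v[1:])                          # approx. 0.04 and [1.0, -2.0, 0.5]
```

The entries of $H$ above follow from the same Gaussian-moment bookkeeping as in eqs. (23)-(25), applied entry by entry to $D$.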

Acknowledgements: This work was supported by the fund of the Hungarian Academy of Sciences for control research and by the Hungarian National Research Fund (grant number T29815).

Fig. 1. Identification of a parabolic curve (the plot shows the measured values, the original system, the LS estimation and the EIV estimation).

Fig. 2. Identification of an elliptic curve (the plot shows the measured values, the original system and the estimated system).

REFERENCES

Amemiya, Y. and W.A. Fuller (1988). Estimation for the nonlinear functional relationship. The Annals of Statistics, Vol. 16, No. 1, 147-160.
Castaldi, P. and U. Soverini (1996). Identification of dynamic errors-in-variables models. Automatica, Vol. 32, No. 4, 631-636.
Chou, C.T. and M. Verhaegen (1997). Subspace algorithms for the identification of multivariable dynamic errors-in-variables models. Automatica, Vol. 33, No. 10, 1857-1869.
Fernando, K.V. and H. Nicholson (1985). Identification of linear systems with input and output noise: the Koopmans-Levin method. IEE Proceedings, Vol. 132, Pt. D, No. 1, 30-36.
Heij, C. and W. Scherrer (1999). Consistency of system identification by global total least squares. Automatica, Vol. 35, 993-1008.
Koopmans, T. (1937). Linear regression analysis of economic time series. De Erven F. Bohn, N.V., Haarlem, The Netherlands.
Levin, M.J. (1964). Estimation of a system pulse transfer function in the presence of noise. IEEE Trans. on Automatic Control, 229-235.
Vandersteen, G. (1997). Identification of linear and nonlinear systems in an errors-in-variables least squares and total least squares framework. PhD thesis, Vrije Universiteit Brussel.