A Modified LOLIMOT Algorithm for Nonlinear Estimation Fusion

Javad Rezaie, Control and intelligent processing centre of excellence, University of Tehran, Tehran, Iran (Student member, IEEE)
Behzad Moshiri, Control and intelligent processing centre of excellence, University of Tehran, Tehran, Iran (Senior member, IEEE)
Babak N. Araabi, Control and intelligent processing centre of excellence, University of Tehran, Tehran, Iran (Member, IEEE)
Amir Rafati, National Iranian South Oil Company, Ahwaz, Iran

1-4244-1500-4/07/$25.00 ©2007 IEEE

Abstract
This paper first presents an enhanced NeuroFuzzy method for modeling nonlinear systems. The method uses the EM algorithm to identify the local models, which also yields the model mismatch covariance. The resulting model can be written in state-space form as a linear time-varying system. Because the noise and model mismatch covariances are known, a Kalman filter can readily be used for centralized estimation fusion. Simulations show that data fusion substantially improves estimation accuracy. They also show that centralized estimation fusion is more accurate than distributed estimation fusion.

1. Introduction
When developing model-based methods for state estimation or control of a priori unknown dynamic processes, the first step is to establish plant models from available observational data and/or expert process knowledge. The quality of these models significantly affects the performance of model-based estimators and controllers. Nonlinear process modeling is generally a difficult and complex task. It is frequently subject to problems such as data sparsity, measurement noise, parametric overfitting/underfitting, and the curse of dimensionality. Divide-and-conquer is a widely used principle for attacking complex problems: a complex problem is divided into simpler problems whose solutions can be combined to yield a solution to the original problem. Fuzzy local linearization (FLL) has recently emerged as a useful divide-and-conquer approach to nonlinear process modeling [1]-[3]. It differs from traditional piecewise linearization by its fuzzy (soft) partitioning of the input space.

New methods have been developed for crisp input space partitioning in piecewise linear modeling [4]. However, their application is restricted by the inherent ambiguity, or fuzziness, of partitioning the input space according to its local linearity. FLL provides a potential way to resolve this problem [5]. In an FLL model, the input space is partitioned by fuzzy sets into local regions. A local linear model is constructed on each region, and the local models are combined by membership function weighting. In general, both the membership functions that define the fuzzy sets and the corresponding local linear models must be identified from observational data and/or fuzzy rules. This is done with optimization techniques such as least squares (LS) and least mean squares (LMS) [5]. Expectation-maximization (EM) is a general technique for maximum likelihood or maximum a posteriori estimation. It has become an alternative to LS and LMS in many estimation problems, because it can also provide covariance information about the model mismatch [6]. For input space partitioning or membership function construction, evolutionary or growing/pruning algorithms based on optimality criteria such as structural risk minimization (SRM) have been developed [5]. For instance, the adaptive spline modeling (ASMOD) algorithm, which is based on the analysis of variance (ANOVA) decomposition, has been widely used in spline models to combat the curse of dimensionality in high-dimensional system modeling [5], [2], [3], [7], [8].

For state estimation and control, linear local models and associated model mismatch information are desirable. FLL models can be given a probabilistic interpretation. The relation between the input vector x and the output vector y of a nonlinear system can be described by multiple probabilistic models, in which the conditional probability of y given x is a weighted sum of local Gaussian probabilistic models (mixture of experts). The linear functional means and covariance matrices of these local models are determined by learning algorithms [6], [9]. Jordan et al. [6] used the EM algorithm to identify the parameters of both the weight functions and the local probabilistic models. The difficulty with this approach lies in selecting the number of local models and the learning rate for updating the weight function parameters. Note that the expected value of y given x is the weighted sum of the linear functional means of the local probabilistic models. This formulation is similar to the FLL scheme, with the weight functions playing the role of membership functions [5].

This observation motivates combining the local linear model tree (LOLIMOT) method, which identifies the weight functions, with the EM method, which identifies the local probabilistic models. The aim is to improve model performance and to provide covariance information about the model mismatch. This information is essential in subsequent state estimation, data fusion, or control applications, and the LS and LMS estimators used in the original LOLIMOT algorithm do not provide it.

The remainder of this paper is organized as follows. Section 2 presents a hybrid learning scheme for FLL modeling that embeds the EM algorithm into the LOLIMOT algorithm. Section 3 discusses nonlinear state estimation based on FLL models, using the standard Kalman filter (KF) and extended forgetting factor recursive least squares (EFRLS). Section 4 introduces a robot benchmark, Section 5 compares the two state estimators on it, and Section 6 concludes the paper.
2. NeuroFuzzy modeling

2.1. The LOLIMOT algorithm

NeuroFuzzy systems have become an attractive and powerful data modeling technique. They combine the well-established learning laws of neural networks with the linguistic transparency of fuzzy logic, but they suffer from the curse of dimensionality. To solve high-dimensional problems while retaining the desirable properties of NeuroFuzzy systems (e.g., linearity in the weights, transparency, partition of unity), some form of model complexity reduction must be performed to produce parsimonious models. Hence, the following principles should be employed during model construction [10]:
• Principle of data reduction;
• Principle of network parsimony.

The LOLIMOT approach is a piecewise linear modeling algorithm that uses a k-d tree structure to partition the input space into hyper-rectangles. Each hyper-rectangular region is modeled by a separate local linear model, as in the ANFIS approach. A LOLIMOT model with Gaussian weighting functions is structurally similar to the T-S fuzzy system and to the ANFIS model with Gaussian membership functions [10]. Fig. 1 depicts the network structure of a local linear NeuroFuzzy model. Each neuron realizes a local linear model (LLM) together with an associated validity function that determines the region in which the LLM is valid. The outputs of the LLMs are

$$ \hat{y}_i = w_{i0} + w_{i1} x_1 + \cdots + w_{ip} x_p $$

where $w_{ij}$ denote the LLM parameters for neuron $i$. The validity functions are chosen as normalized Gaussians, so they form a partition of unity:

$$ \sum_{i=1}^{M} \Phi_i(x) = 1. $$

Finally, the output of a local linear NeuroFuzzy model is

$$ \hat{y} = \sum_{i=1}^{M} \hat{y}_i \, \Phi_i(x). $$

Fig. 1: The network structure of a local linear NeuroFuzzy model with M neurons for p inputs.
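The model output above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation. The validity functions are taken as normalized Gaussians with hypothetical centres `centers` and widths `sigmas`, and the helper names are illustrative only.

```python
import numpy as np

def validity_functions(X, centers, sigmas):
    """Normalized Gaussian validity functions Phi_i(x), shape (N, M).

    X: (N, p) inputs; centers, sigmas: (M, p) hypothetical centres and widths.
    """
    d = (X[:, None, :] - centers[None, :, :]) / sigmas[None, :, :]
    mu = np.exp(-0.5 * np.sum(d ** 2, axis=2))     # unnormalized Gaussians, (N, M)
    return mu / mu.sum(axis=1, keepdims=True)      # normalize: partition of unity

def lolimot_output(X, W, centers, sigmas):
    """y_hat = sum_i y_hat_i * Phi_i(x), with y_hat_i = w_i0 + w_i1 x_1 + ... + w_ip x_p.

    W: (M, p + 1) LLM parameters; the first column holds the offsets w_i0.
    """
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])  # regressor [1, x_1, ..., x_p]
    y_local = Xa @ W.T                             # (N, M) outputs of the LLMs
    return np.sum(y_local * validity_functions(X, centers, sigmas), axis=1)

# Example: two LLMs on a 1-D input, each valid around its own centre.
X = np.linspace(-1.0, 1.0, 5)[:, None]
centers = np.array([[-0.5], [0.5]])
sigmas = np.array([[1 / 3], [1 / 3]])
W = np.array([[0.0, -1.0],    # y_hat_1 = -x
              [0.0,  1.0]])   # y_hat_2 = +x
print(lolimot_output(X, W, centers, sigmas))
```

The Gaussians are normalized, so every row of the validity matrix sums to one. As a result, if all LLMs share the same parameters, the blended output reduces to that single linear model.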
The LOLIMOT algorithm has two nested loops [11]. The outer loop determines the rule premise structure. The inner loop optimizes the rule consequent parameters by local estimation. The steps are:

I. Start with an initial model.
II. Find the worst LLM. Calculate a local loss function for each of the $i = 1, \ldots, M$ LLMs according to
$$ I_i = \sum_{t=1}^{N} \big(y(t) - \hat{y}_i(t)\big)^2 \, \Phi_i\big(x(t)\big) $$
and select the worst-performing LLM, i.e., the one with $\max_i(I_i)$.
III. Check all divisions. The LLM selected in the previous step is refined by splitting its hyper-rectangle into two halves with an axis-orthogonal split. Divisions in all dimensions are tried, and for each division $\mathrm{dim} = 1, \ldots, p$ the following steps are carried out:
a) construction of the multidimensional membership functions (MSFs) for both hyper-rectangles;
b) construction of all validity functions;
c) local estimation of the rule consequent parameters for both newly generated LLMs (using LS or EM);
d) calculation of the loss function for the current overall model.
IV. Find the best division. The best of the $p$ alternatives checked in step III is selected, and the number of LLMs is incremented, $M \rightarrow M + 1$.
V. Test for convergence. If the termination criterion is met, stop; otherwise, go to step II.

Various termination criteria exist, e.g., a maximum number of LLMs, statistical validation tests, or information criteria. Note that these termination criteria must use the number of effective parameters.
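Steps I-V can be sketched as follows, with step c) carried out by the EM updates (7)-(9) of Section 2.2. This is a hedged sketch under several assumptions that are not taken from the paper:
- Each validity function is a Gaussian centred on its hyper-rectangle, with a standard deviation of one third of the side length.
- The gate $p(k \mid x)$ in (7) is held fixed at these validity functions.
- The output is scalar, so (9) reduces to weighted least squares with weights $h_k(t)$.
- For every candidate split, all LLMs are re-estimated, not only the two new ones.
- The termination criterion is simply a maximum number of LLMs.

Function names such as `lolimot` and `em_local_models` are hypothetical.

```python
import numpy as np

def validity(X, boxes, k_sigma=1 / 3):
    """Normalized Gaussian validity functions, one per hyper-rectangle (lo, hi).

    Assumption: centre = box centre, std = k_sigma * side length.
    """
    mu = []
    for lo, hi in boxes:
        c, s = (lo + hi) / 2, k_sigma * (hi - lo)
        mu.append(np.exp(-0.5 * np.sum(((X - c) / s) ** 2, axis=1)))
    mu = np.stack(mu, axis=1)                        # (N, M)
    return mu / mu.sum(axis=1, keepdims=True)        # partition of unity

def wls(Xa, y, weights):
    """Weighted LS: minimize sum_t weights[t] * (y[t] - Xa[t] @ w) ** 2."""
    sw = np.sqrt(weights)
    w, *_ = np.linalg.lstsq(Xa * sw[:, None], y * sw, rcond=None)
    return w

def em_local_models(Xa, y, Phi, n_iter=20, floor=1e-8):
    """EM estimates of LLM parameters W (M, p+1) and mismatch variances S (M,).

    The gate p(k | x) in (7) is held fixed at the validity functions Phi.
    """
    M = Phi.shape[1]
    W = np.stack([wls(Xa, y, Phi[:, k]) for k in range(M)])     # LS initialization
    R = y[:, None] - Xa @ W.T                                    # (N, M) residuals
    S = np.maximum((Phi * R**2).sum(axis=0) / Phi.sum(axis=0), floor)
    for _ in range(n_iter):
        # E-step, Eq. (7), evaluated in the log domain to avoid underflow
        logp = np.log(Phi + 1e-300) - 0.5 * R**2 / S - 0.5 * np.log(2 * np.pi * S)
        H = np.exp(logp - logp.max(axis=1, keepdims=True))
        H /= H.sum(axis=1, keepdims=True)
        # M-step: Eq. (8) uses the previous w_k; for scalar y, Eq. (9) is weighted LS
        S = np.maximum((H * R**2).sum(axis=0) / (H.sum(axis=0) + 1e-12), floor)
        W = np.stack([wls(Xa, y, H[:, k]) for k in range(M)])
        R = y[:, None] - Xa @ W.T
    return W, S

def fit(X, y, boxes):
    """Validity functions plus EM-estimated LLMs for a given partition."""
    Xa = np.hstack([np.ones((len(X), 1)), X])                    # X^(t) = [1, x^T]^T
    Phi = validity(X, boxes)
    W, S = em_local_models(Xa, y, Phi)
    return W, S, Phi, Xa @ W.T                                   # last item: LLM outputs

def lolimot(X, y, max_models=4):
    """Steps I-V, terminating at a fixed number of LLMs."""
    boxes = [(X.min(axis=0), X.max(axis=0))]                     # I. one global LLM
    W, S, Phi, y_loc = fit(X, y, boxes)
    while len(boxes) < max_models:                               # V. termination test
        loss = ((y[:, None] - y_loc) ** 2 * Phi).sum(axis=0)     # II. local loss I_i
        i = int(np.argmax(loss))
        lo, hi = boxes[i]
        best = None
        for d in range(X.shape[1]):                              # III. try every dimension
            hi_a, lo_b = hi.copy(), lo.copy()
            hi_a[d] = lo_b[d] = (lo[d] + hi[d]) / 2              # axis-orthogonal halving
            cand = boxes[:i] + [(lo, hi_a), (lo_b, hi)] + boxes[i + 1:]
            res = fit(X, y, cand)
            mse = np.mean((y - (res[3] * res[2]).sum(axis=1)) ** 2)   # III d) global loss
            if best is None or mse < best[0]:
                best = (mse, cand, res)
        _, boxes, (W, S, Phi, y_loc) = best                      # IV. keep best, M -> M + 1
    return boxes, W, S

# Usage on a 1-D kink, y = |x| plus noise (synthetic data for illustration only).
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200)[:, None]
y = np.abs(X[:, 0]) + 0.05 * rng.standard_normal(200)
boxes, W, S = lolimot(X, y)
print(len(boxes), W.round(2), S.round(4))
```

To get the LS variant of step c), replace the call to `em_local_models` with `wls` weighted by the validity functions. The EM variant additionally returns the per-model mismatch variances `S`. This is the kind of covariance information that the state estimators of Section 3 need.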
2.2. The Expectation Maximization (EM) algorithm

EM is embedded into LOLIMOT to obtain covariance information about the model mismatch, which the subsequent state estimation and control applications require. The strategy of the EM algorithm is to maximize the conditionally expected log likelihood

$$ E_{Y_{mis}}\big[\ln p(Y, Y_{mis} \mid \Theta) \mid Y, \Theta^{(q)}\big] \tag{1} $$

where $\Theta^{(q)}$ represents the current estimate of $\Theta$. The EM algorithm consists of two steps [10]:
(i) The expectation (E) step calculates the following conditional expectation of the log likelihood given $Y$ and $\Theta^{(q)}$:
$$ Q(\Theta \mid \Theta^{(q)}) = E_{Y_{mis}}\big[\ln p(Y, Y_{mis} \mid \Theta) \mid Y, \Theta^{(q)}\big] \tag{2} $$
where $\Theta^{(q)}$ is the estimate of the parameter vector $\Theta$ at iteration $q$.
(ii) The maximization (M) step computes
$$ \Theta^{(q+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(q)}) \tag{3} $$

The main difficulty in using the EM algorithm lies in choosing the "missing" variables and defining the "complete-data" log likelihood. Jordan et al. [12] chose the following set of indicator random variables as the "missing" data:
$$ Y_{mis} = \{ I_k^{(t)},\ k = 1, 2, \ldots, M;\ t = 1, 2, \ldots, N \} \tag{4} $$
with
$$ I_k^{(t)} = \begin{cases} 1 & \text{if } y^{(t)} \text{ is generated from the } k\text{th local model},\\ 0 & \text{otherwise}, \end{cases} $$
and defined the "complete-data" likelihood as
$$ p(Y, Y_{mis} \mid \Theta) = \prod_{t=1}^{N} \prod_{k=1}^{M} \big[ p(k \mid x^{(t)}, w'_k) \cdot p(y^{(t)} \mid x^{(t)}, w_k, \Sigma_k) \big]^{I_k^{(t)}} \tag{5} $$
Here $p(k \mid x^{(t)}, w'_k)$ is the membership function (MF), and
$$ p(y \mid x, w_k, \Sigma_k) = \frac{1}{(2\pi)^{1/2} |\Sigma_k|^{1/2}} \exp\Big\{ -\frac{1}{2} \big[y - f_k(x, w_k)\big]^2 \Sigma_k^{-1} \Big\}. $$
Substituting (5) into (2) gives
$$ Q(\Theta \mid \Theta^{(q)}) = \sum_{t=1}^{N} \sum_{k=1}^{M} h_k^{(q)}(t) \ln\big[ p(k \mid x^{(t)}, w'_k) \cdot p(y^{(t)} \mid x^{(t)}, w_k, \Sigma_k) \big] \tag{6} $$
with
$$ h_k^{(q)}(t) = \frac{p(k \mid x^{(t)}, w'^{(q)}_k)\, p(y^{(t)} \mid x^{(t)}, w_k^{(q)}, \Sigma_k^{(q)})}{\sum_{l=1}^{M} p(l \mid x^{(t)}, w'^{(q)}_l)\, p(y^{(t)} \mid x^{(t)}, w_l^{(q)}, \Sigma_l^{(q)})} \tag{7} $$

It can be shown that each EM iteration increases not only the "complete-data" log likelihood but also the "incomplete-data" log likelihood. Setting $\partial Q/\partial \Sigma_k = 0$ and $\partial Q/\partial w_k = 0$ gives explicit update equations for the parameters $\Sigma_k$ and $w_k$:
$$ \Sigma_k^{(q+1)} = \frac{\sum_{t=1}^{N} h_k^{(q)}(t) \big[y(t) - f_k(x^{(t)}, w_k^{(q)})\big]^2}{\sum_{t=1}^{N} h_k^{(q)}(t)} \tag{8} $$
$$ w_k^{(q+1)} = \Big[ \sum_{t=1}^{N} h_k^{(q)}(t)\, X^{(t)} \big(\Sigma_k^{(q)}\big)^{-1} X^{(t)T} \Big]^{-1} \sum_{t=1}^{N} h_k^{(q)}(t)\, X^{(t)} \big(\Sigma_k^{(q)}\big)^{-1} y^{(t)} \tag{9} $$
where $X^{(t)} = [1\ \ x^{(t)T}]^T$. For more details on this algorithm see [1]-[6] and [10]. Note that the parameters $w'_k$ of the validity functions (membership functions) are updated by LOLIMOT, not by EM, at each iteration.

3. Nonlinear state estimation

Section 2 presented algorithms for identifying FLL models of unknown nonlinear processes from observational data. This section develops methods for FLL model-based state estimation of nonlinear processes described by the following nonlinear discrete-time state-space model:
$$ x(t) = f[x(t-1), u(t)] + v(t) \tag{10} $$
$$ y(t) = h(x(t)) + w(t) \tag{11} $$
where:
- $x(t) = [x_1(t) \ldots x_n(t)]^T$ is a state vector;
- $u(t) = [u_1(t) \ldots u_m(t)]^T$ is an input vector;
- $y(t) = [y_1(t) \ldots y_Q(t)]^T$ is an observation vector;
- $f[\cdot]$ is an unknown smooth nonlinear function vector describing the state transition process;
- $h[\cdot]$ is a smooth function vector describing the measurement process;
- $v(t)$ and $w(t)$ represent additive Gaussian noise with
$$ E[v(t)] = E[w(t)] = 0, \quad E[v(i)v(j)^T] = \delta_{ij} Q(i), \quad E[w(i)w(j)^T] = \delta_{ij} R(i), \quad E[v(i)w(j)^T] = 0. $$

With the FLL modeling methods above, an a priori unknown nonlinear state-space model can be transformed into a time-varying linear state-space model. Let $x'(t) = [x^T(t-1)\ u^T(t)]^T$ and $\hat f[x(t-1), u(t)] = \hat f[x'(t)] = [\hat f_1(x') \ldots \hat f_n(x')]^T$. Using the FLL modeling formulation, $\hat f_l(x')$, $l = 1, \ldots, n$, can be modeled as follows:
$$
\begin{aligned}
\hat f_l(x') &= \sum_{k'=1}^{M} \Phi_{k'}(x') \big[ a_{k'l1} x_1(t-1) + \cdots + a_{k'ln} x_n(t-1) + b_{k'l1} u_1(t) + \cdots + b_{k'lm} u_m(t) \big] \\
&= \alpha_{l1}(t) x_1(t-1) + \cdots + \alpha_{ln}(t) x_n(t-1) + \beta_{l1}(t) u_1(t) + \cdots + \beta_{lm}(t) u_m(t)
\end{aligned} \tag{12}
$$
with
$$ \alpha_{li}(t) = \sum_{k'=1}^{M} \Phi_{k'}(x')\, a_{k'li}, \quad i = 1, \ldots, n, \tag{13} $$
$$ \beta_{lj}(t) = \sum_{k'=1}^{M} \Phi_{k'}(x')\, b_{k'lj}, \quad j = 1, \ldots, m, \tag{14} $$
where $k'$ corresponds to an ordered sequence $k_1, \ldots, k_n, k_{n+1}, \ldots, k_{n+m}$ and $M$ represents the total number of sequences [5]. This local model construction can produce state estimates in two ways [2]:
(i) the indirect method: system identification parameterizes a Kalman filter, which in turn generates the states (see Fig. 2);
(ii) the direct method: identification and estimation are combined in a bootstrap scheme that generates state estimates directly (see Fig. 3).

Fig. 2: Indirect state estimation scheme
Fig. 3: Direct state estimation scheme

Based on Eqs. (12)-(14), Eq. (10) can be rewritten as
$$ x(t) = A(t)\,x(t-1) + B(t)\,u(t) + v'(t) \tag{15} $$
where $A(t) = [\alpha_{li}(t)]_{n \times n}$ and $B(t) = [\beta_{lj}(t)]_{n \times m}$. The term $v'(t)$ represents the original state transition noise plus the error introduced by model mismatch. Similarly, Eq. (11) can be rewritten as
$$ y(t) = C(t)\,x(t) + w'(t) \tag{16} $$
where $C(t) = [\gamma_{ql}(t)]_{Q \times n}$, and $w'(t)$ includes the observation noise and the error resulting from model mismatch.

Based on (15) and (16), the standard Kalman filter can be used for estimation. If no information about the process and observation noises is available, extended forgetting factor recursive least squares (EFRLS) can be used for state estimation instead, because it does not require knowledge of the noise statistics [13]:
$$ \hat x(t \mid t) = A(t-1)\big[\hat x(t-1 \mid t-1) + L(t)\big(y(t) - C(t)A(t-1)\hat x(t-1 \mid t-1)\big)\big] + B(t)u(t) \tag{17} $$
$$ L(t) = P(t-1 \mid t-1)A^T(t-1)C^T(t)\big[\lambda_f I + C(t)A(t-1)P(t-1 \mid t-1)A^T(t-1)C^T(t)\big]^{-1} \tag{18} $$
$$ P(t \mid t) = \lambda_f^{-1} A(t)\big[I - L(t)C(t)A(t)\big]P(t-1 \mid t-1)A^T(t) \tag{19} $$
where $\lambda_f$ is the forgetting factor.
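The indirect scheme with the standard Kalman filter can be sketched as below. This is an illustrative sketch, not the authors' code, and it makes the following assumptions:
- The scalar local models `A_loc` and `B_loc`, the validity function `phi_of`, and the noise levels are hypothetical.
- The validity functions are evaluated on the known input $u(t)$ only, whereas in (12)-(14) they depend on $x'(t)$.
- $Q$ and $R$ are assumed known, which is the situation in which the KF, rather than EFRLS, applies.

The second filter stacks two identical sensors into one measurement vector, which is one common way to realize centralized fusion.

```python
import numpy as np

def blend(phi, local_mats):
    """Validity-weighted sum of local matrices, as in Eqs. (13)-(14)."""
    return np.tensordot(phi, np.asarray(local_mats), axes=1)

def kf_step(x, P, y, u, A, B, C, Q, R):
    """One standard KF step for x(t) = A x(t-1) + B u(t) + v', y(t) = C x(t) + w'."""
    x_pred = A @ x + B @ u                           # time update
    P_pred = A @ P @ A.T + Q
    S = C @ P_pred @ C.T + R                         # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)              # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)            # measurement update
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new

def phi_of(u):
    """Hypothetical normalized Gaussian validity functions centred at u = -1 and u = +1."""
    mu = np.exp(-0.5 * ((u[0] - np.array([-1.0, 1.0])) / 0.7) ** 2)
    return mu / mu.sum()

# Hypothetical scalar local models and noise levels (not from the paper).
A_loc = [np.array([[0.95]]), np.array([[1.0]])]
B_loc = [np.array([[0.1]]), np.array([[0.1]])]
Q = np.array([[0.01]])                               # v'(t) covariance, assumed known
r = 0.25                                             # per-sensor w'(t) variance, assumed known
C1, R1 = np.array([[1.0]]), np.array([[r]])          # one sensor
C2, R2 = np.array([[1.0], [1.0]]), r * np.eye(2)     # two sensors, stacked (centralized)

rng = np.random.default_rng(1)
x_true = np.zeros(1)
x1, P1 = np.zeros(1), np.eye(1)                      # single-sensor filter
x2, P2 = np.zeros(1), np.eye(1)                      # centralized two-sensor filter
err1, err2 = [], []
for t in range(300):
    u = np.array([np.sin(0.05 * t)])
    phi = phi_of(u)
    A, B = blend(phi, A_loc), blend(phi, B_loc)      # A(t), B(t) as in Eq. (15)
    x_true = A @ x_true + B @ u + rng.normal(0.0, np.sqrt(Q[0, 0]), 1)
    y = x_true + rng.normal(0.0, np.sqrt(r), 2)      # two noisy measurements of the state
    x1, P1 = kf_step(x1, P1, y[:1], u, A, B, C1, Q, R1)
    x2, P2 = kf_step(x2, P2, y, u, A, B, C2, Q, R2)
    err1.append((x1[0] - x_true[0]) ** 2)
    err2.append((x2[0] - x_true[0]) ** 2)
print(np.mean(err1), np.mean(err2), P1[0, 0], P2[0, 0])
```

Both filters see the same $A(t)$ sequence, and the covariance recursion does not depend on the measurements. The fused filter's posterior variance `P2` therefore stays below the single-sensor `P1`. The empirical squared errors `err1` and `err2` show how this difference carries over to the estimates.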
4. Robot model

Consider the kinematics of a rotating rigid body, as given in [14]:

$$
\begin{aligned}
\dot\theta_1 &= w_2 \cos\theta_2 - w_3 \sin\theta_2 \\
\dot\theta_2 &= w_1 + w_2 \sin\theta_2 \tan\theta_1 + w_3 \cos\theta_2 \tan\theta_1 \\
\dot\eta_1 &= \tau_1 (\theta_1 - \eta_1) \\
\dot\eta_2 &= \tau_2 (\theta_2 - \eta_2)
\end{aligned} \tag{20}
$$

The available sensors are rate gyros, which measure the angular velocities $w_i$, and inclinometers, which measure $\eta_i$. The inclinometers are modeled as first-order low-pass filters of $\theta_i$. Here $\tau_i = 1/T_i$ are the inverse time constants and $\eta_i$ are the inclinometer outputs. We use $\theta_1$ for pitch and $\theta_2$ for roll. This parameterization suits many applications because the angles have an intuitive meaning. Moreover, typical motions of mobile land robots satisfy $\theta_1, \theta_2 < \pi/2$, and in this range the parameterization is unique.

Finally, letting $x = [\theta_1\ \theta_2\ \eta_1\ \eta_2]^T$, the system can be written in the standard form

$$ \dot x = f(x, w) \tag{21} $$
$$ y = Cx \tag{22} $$

where $C = [0_{2 \times 2}\ \ I_{2 \times 2}]$.

In this paper, we consider the problem of reconstructing pitch and roll from sensor measurements. Throughout, we assume that the measurements of $w_i$ are accurate enough to be treated as known time-varying inputs to the pitch-and-roll equations.

Fig. 4: $\theta_1$ estimation with centralized fusion
Fig. 5: $\theta_2$ estimation with centralized fusion

5. Simulation results

The simulation results support four observations:
- Measuring pitch and roll with two inclinometers and fusing the data with a centralized Kalman filter lowers the estimation error (see Figs. 6-7).
- When LOLIMOT+EM is used for modeling and the KF for centralized estimation, the results are more accurate than with distributed estimation fusion (see Figs. 8-9).
- When LOLIMOT+LS is used for modeling and EFRLS for estimation, fusion does not change the estimation accuracy.
- Overall (see Figs. 6-9), centralized estimation fusion with the KF substantially increases accuracy and gives the best results among the methods compared.

Fig. 6: $\theta_1$ estimation mean square error (MSE)
Fig. 7: $\theta_2$ estimation mean square error (MSE)
Fig. 8: $\theta_1$ estimation mean square error (MSE)
Fig. 9: $\theta_2$ estimation mean square error (MSE)

6. Conclusion

This paper first presented an enhanced NeuroFuzzy method for modeling nonlinear systems. The method uses the EM algorithm to identify the local models, which also yields the model mismatch covariance. The resulting model can be written in state-space form as a linear time-varying system. Because the noise and model mismatch covariances are known, a Kalman filter can readily be used for centralized estimation fusion. The simulations show that centralized estimation fusion substantially improves the estimation accuracy.

7. References
[1] Z. Q. Wu and C. J. Harris, "A neurofuzzy network structure for modeling and state estimation of unknown nonlinear systems," Int. J. Syst. Sci., vol. 28, pp. 335-345, 1997.
[2] Q. Gan and C. J. Harris, "Neurofuzzy state estimators using a modified ASMOD and Kalman filter algorithm," presented at the International Conference on Computational Intelligence for Modeling Control and Automation, Vienna, Austria, 1999.
[3] Q. Gan and C. J. Harris, "Fuzzy local linearisation and local basis function expansion in nonlinear system modeling," IEEE Trans. Syst. Man Cybern., vol. 29, pp. 559-565, 1999.
[4] Q. Gan and C. J. Harris, "Multi-sensor data fusion using Kalman filters based on linearised process models," presented at the International Conference on Data Fusion, EuroFusion99, UK, 1999.
[5] C. J. Harris and Q. Gan, "State estimation and multi-sensor data fusion using data-based neurofuzzy local linearization process models," Information Fusion, vol. 2, pp. 17-29, 2001.
[6] M. I. Jordan and R. A. Jacobs, "Hierarchical mixtures of experts and the EM algorithm," Neural Comput., vol. 6, pp. 181-214, 1994.
[7] T. Kavli, "ASMOD - an algorithm for adaptive spline modeling of observation data," Int. J. Control, vol. 58, pp. 947-967, 1993.
[8] J. Stone, M. H. Hansen, C. Kooperberg, and Y. K. Truong, "Polynomial splines and their tensor products in extended linear modeling," Ann. Statist., vol. 25, pp. 1371-1470, 1997.
[9] N. Gershenfeld, B. Schoner, and E. Metois, "Cluster-weighted modelling for time-series analysis," Nature, pp. 329-332, 1999.
[10] C. Harris, X. Hong, and Q. Gan, Adaptive Modeling, Estimation and Fusion from Data: A Neurofuzzy Approach. Berlin Heidelberg New York: Springer-Verlag, 2002.
[11] O. Nelles, Nonlinear System Identification. Berlin Heidelberg New York: Springer-Verlag, 2001.
[12] M. I. Jordan and L. Xu, "Convergence results for the EM approach to mixtures of experts architectures," Neural Networks, vol. 8, pp. 1409-1431, 1995.
[13] Y. M. Zhu, Multisensor Decision and Estimation Fusion. Boston, MA: Kluwer, 2003.
[14] H. Rehbinder and X. Hu, "Nonlinear state estimation for rigid-body motion with low-pass sensors," Systems & Control Letters, vol. 40, pp. 183-190, 2000.
[15] A. Rafati, B. Moshiri, K. Salahshoor, and M. Tabatabaei pour, "Asynchronous sensor bias estimation in multisensor-multitarget systems," International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2006), Heidelberg, Germany, September 3-6, 2006, pp. 402-407.