Neural Networks 17 (2004) 1477–1493 www.elsevier.com/locate/neunet

An on-line algorithm for creating self-organizing fuzzy neural networks Gang Leng1, Girijesh Prasad*, Thomas Martin McGinnity Intelligent Systems Engineering Laboratory, School of Computing and Intelligent Systems, University of Ulster at Magee, Derry, Northern Ireland BT48 7JL, UK Received 11 April 2003; accepted 15 July 2004

Abstract This paper presents a new on-line algorithm for creating a self-organizing fuzzy neural network (SOFNN) from sample patterns to implement a singleton or Takagi-Sugeno (TS) type fuzzy model. The SOFNN is based on ellipsoidal basis function (EBF) neurons consisting of a center vector and a width vector. New methods of structure learning and parameter learning, based on new adding and pruning techniques and a recursive on-line learning algorithm, are proposed and developed. A proof of the convergence of both the estimation error and the linear network parameters is also given in the paper. The proposed methods are very simple and effective and generate a fuzzy neural model with a high accuracy and compact structure. Simulation work shows that the SOFNN has the capability of self-organization to determine the structure and parameters of the network automatically. © 2004 Published by Elsevier Ltd. Keywords: EBF; Recursive least squares algorithm; Self-organizing fuzzy neural network; TS model

1. Introduction Fuzzy neural networks are hybrid systems that combine the theories of fuzzy logic and neural networks. In these hybrid systems, the fuzzy techniques are used to create or enhance certain aspects of the neural network’s performance (Nauck, 1997). In recent years, the idea of self-organization has also been introduced in hybrid systems (Cho & Wang, 1996; Lin, 1995; Wu & Er, 2000) to create adaptive models, mainly for representing nonlinear and time-varying systems. Such hybrid systems are found to be very effective and useful in several areas (Cho & Wang, 1996; Jang, 1993; Wu & Er, 2000). One important area of interest is the generation of models from observations of complex systems, where little or insufficient expert knowledge is available to describe the underlying behavior. Problems that arise in these systems are large dimensions, time-varying characteristics, large amounts of data

* Corresponding author. E-mail addresses: [email protected] (G. Leng), [email protected] (G. Prasad), [email protected] (T.M. McGinnity). 1 Tel.: +44 2871 3757; fax: +44 2871 3755. 0893-6080/$ - see front matter © 2004 Published by Elsevier Ltd. doi:10.1016/j.neunet.2004.07.009

and noisy measurements, as well as the need for an interpretation of the resulting model. The main purpose of the proposed hybrid fuzzy neural network architecture is to create self-adaptive fuzzy rules for on-line identification of a singleton or Takagi-Sugeno (TS) type (Takagi & Sugeno, 1985) fuzzy model of a nonlinear time-varying complex system. The proposed algorithm therefore aims to build a self-organizing neural network which is designed to approximate a fuzzy algorithm or a process of fuzzy inference through the structure of neural networks and thus create a more interpretable hybrid neural network model making effective use of the superior learning ability of neural networks and easy interpretability of fuzzy systems. The twin issues associated with the identification of a singleton or TS type fuzzy model are: (1) parameter estimation that includes identifying parameters of premises and consequences and (2) structure identification which involves partitioning the input–output space and thus identifying the number of fuzzy rules for the desired performance. With the availability of reasonably good quality and sufficient training data, there are several supervised training approaches (Mitra & Hayashi, 2000; Wang & Lee, 2001) proposed in the literature for an adaptive fuzzy model


identification. The main difference lies in the method adopted for the partitioning of input–output space for the structure identification. As suggested in Wang and Lee (2001), the methods proposed in the literature can be placed into two broad categories: (1) the static adaptation method where the number of input–output partitions are fixed, while their corresponding fuzzy rule configurations are adapted through optimization to obtain the desired performance and (2) the dynamic adaptation method where both the number of input–output space partitions and their corresponding fuzzy rule configurations are simultaneously and concurrently adapted. The main focus of this paper is the design of an approach for dynamic adaptation of the structure of the hybrid network, so that the underlying behavior of a nonlinear time-varying complex system could be captured more easily and accurately. Given the number of input–output space partitions, a range of techniques is proposed to arrive at the best set of fuzzy rules under static adaptation method. One of the early notable works is the adaptive-network-based fuzzy inference system (ANFIS), given in Jang (1993). The ANFIS creates an input–output map based on the input– output space partitions fixed by a priori expert knowledge. The partitions are grid-type whose number depends on the number of input variables and the number of membership functions for each input variable (Jang, 1993; Jang & Sun, 1995). In a slightly improved algorithm presented in Nauck and Kruse (1999), though the structure learning selects fuzzy rules based on a predefined grid over the input space, it is shown that a smaller number of fuzzy rules are required by selectively choosing appropriate number of grids to define fuzzy rules for the desired approximation performance. In general, the grid-type partitioning, however, suffers from the curse of dimensionality, as the number of inputs becomes larger. For similar performance, the number of fuzzy partitions, and thereby fuzzy rules, can be greatly reduced by cluster-based partitioning of the input–output space (Zadeh, 1994). Therefore, for identifying a more compact structure, the input–output space partitioning based on the traditional clustering approaches, such as hard c-means (HCM) and fuzzy c-means (FCM) (Gonzalez, Rojas, Pomares, Ortega, & Prieto, 2002; Klawonn & Keller, 1998; Wang & Lee, 2001), has been proposed. The number of partitions, however, needs to be fixed by a priori expert knowledge in these approaches also. The final configuration of the clusters and the corresponding fuzzy rule parameters are obtained by a nonlinear optimization. However, there is no guarantee of convergence to an optimal solution, as the final solution greatly relies on the selection of initial locations of cluster centers. An efficient and easy to use algorithm to construct fuzzy graphs from example data is presented in Berthold and Huber (1999). The fuzzy graphs are based on locally based independent fuzzy rules that operate solely on selected important attributes. It involves automatic partitioning of input space and fuzzy structure identification. However, the number and shape of

membership functions for the output variable have to be predetermined manually in Berthold and Huber (1999). These static adaptation based approaches thus provide a systematic method for fuzzy rule-base identification and facilitate adaptation as a result of changes in the reasoning environment. However, it is difficult and time consuming to visualize the optimal number of partitions required for modeling a complex system, particularly if the underlying structural behaviour of the system is time-varying and inadequately known. Under the dynamic adaptation method, the training may start with none or a single fuzzy rule or neuron (Er & Wu, 2002; Rizzi, Panella, & Mascioli, 2002; Wang & Lee, 2001; Wu & Er, 2000). During training, the network structure grows and concurrently the parameters of antecedents and consequents are adapted to obtain the desired modeling accuracy. A typical example of a dynamic adaptation method is the approach based on the aligned clustering algorithm (ACA) presented in Berthold and Huber (1999). This self-constructing paradigm is a growing partitioning algorithm and starts from empty clusters. A self-organizing learning algorithm adds and adapts new clusters until a prespecified criterion is satisfied. After structure identification, backpropagation learning is performed to identify the final values of the parameters. The new cluster selection is, however, mainly based on the criterion of coverage by existing clusters. This makes the final cluster configuration quite sensitive to the sequence of training data (Wang & Lee, 2001). A constructive approach for creating an ANFIS-like network is proposed in Mascioli and Martinelli (1998), Mascioli, Rizzi, Panella, and Martinelli (2000) and Rizzi et al. (2002) based on Simpson's min–max technique (Simpson, 1992, 1993). The input space is partitioned by constructing hyperboxes using the min–max procedure. The hyperbox-based framework facilitates application of different types of fuzzy membership functions. To decrease the complexity of the network, a pruning method, named pruning adaptive resolution classifier (PARC), is developed in Rizzi et al. (2002). This consists of deleting some negligible hyperboxes and a fusion procedure to make the actual coverage complete. This constructive approach is basically developed for batch learning to create a structure-adaptive network with a significantly high degree of automation. A self-organizing dynamic fuzzy neural network (DFNN) architecture reported in Er and Wu (2002) and Wu and Er (2000) is a recent notable work that can be used in the on-line situation. It makes use of a hierarchical learning approach and an orthogonal least squares based pruning technique (Chen, Cowan, & Grant, 1991). In the DFNN architecture, fuzzy rules are represented by RBF neurons in the first hidden layer. However, the representation is restrictive in the sense that the widths of the membership functions belonging to the various inputs that create an RBF neuron of the DFNN have the same value. An enhanced version of DFNN is GDFNN


(Wu, Er, & Gao, 2001). It introduces width vectors in RBF neurons, so that the Gaussian membership functions within a neuron could be assigned appropriate widths separately. The GDFNN also attempts to provide explanations for selecting the values of the width parameter of the Gaussian fuzzy membership functions based on the concept of ε-completeness (Wang, 1992). However, the hierarchical learning approach proposed for training GDFNN is dependent on the total number of training data patterns. This implies that the GDFNN approach is primarily designed for batch training using a fixed number of training data patterns. In order to model a time-varying nonlinear system, a truly on-line training algorithm is required. Ensuring convergence of the network parameters and the estimation error is essential for an on-line training algorithm. To the best of the authors' knowledge, a convergence analysis of the self-organizing on-line learning approaches proposed in the literature has not been reported in the fuzzy neural network or neuro-fuzzy context (Mitra & Hayashi, 2000). Based on the dynamic adaptation method, this paper presents a new algorithm for creating a self-organizing fuzzy neural network (SOFNN) that identifies a singleton or TS type fuzzy model (Takagi & Sugeno, 1985) on-line. A modified recursive least squares (RLS) algorithm is derived including a proof of convergence. The proofs of the convergence of the estimation error and the linear network parameters provide conditions for guaranteed convergence in the proposed RLS parameter learning algorithm. For structure learning, a new approach is proposed to decide how to create a new ellipsoidal basis function (EBF) neuron with a center vector and a width vector. The EBF neuron represents a fuzzy rule formed by AND logic (or T-norm) operating on Gaussian fuzzy membership functions. The elements of the center vector and width vector of the EBF neuron are the centers and widths of the Gaussian membership functions. The structure learning approach consists of a combination of system error and firing strength based criteria for adding new EBF neurons using the concepts from statistical theory and the ε-completeness of fuzzy rules (Lee, 1990). This ensures that the membership functions have the capability to cover more data than that in GDFNN. The SOFNN approach uses a pruning method that is distinctly different from that of GDFNN. The pruning method is based on the optimal brain surgeon (OBS) approach (Hassibi & Stork, 1993) and the research in Leung, Wong, Sum, and Chan (2001). This method is computationally very efficient for on-line implementation, as it does not involve any significant additional computation because it makes direct use of the Hermitian matrix obtained as a part of the proposed RLS algorithm. These two features of adding and pruning neurons automatically make the SOFNN have an adaptive structure to model nonlinear and time-varying systems on-line. The proposed structure learning and parameter learning methods are very simple


and efficient, and yield a fuzzy neural model with a high accuracy and compact structure. In Section 2, the architecture of the SOFNN is described. Section 3 presents the structure learning of the SOFNN, including the new approach for adding and pruning neurons. It also describes the modified RLS algorithm and the analysis of the properties of this algorithm for the parameter learning. Simulations of the SOFNN are compared with other algorithms in Section 4. The results show that the SOFNN has a high accuracy with a compact structure, and neurons can be added or pruned automatically in the learning process. Finally, conclusions are summarized in Section 5.

2. Architecture of the SOFNN

The architecture of the self-organizing fuzzy neural network (SOFNN) is a five-layer fuzzy neural network shown in Fig. 1. The five layers are the input layer, the ellipsoidal basis function (EBF) layer, the normalized layer, the weighted layer, and the output layer. Some of the interconnections are shown by dotted lines. This is to indicate that the SOFNN has the ability to self-organize its own neurons in the learning process for implementing singleton or Takagi-Sugeno (TS) fuzzy models. In the EBF layer, each neuron is a T-norm of Gaussian fuzzy membership functions belonging to the inputs of the network. Every membership function (MF) thus has its own distinct center and width, which means every neuron has both a center vector and a width vector, and the dimensions of these vectors are the same as the dimension of the input vector. Fig. 2 illustrates the internal structure of the jth neuron, where x = [x_1 x_2 ... x_r] is the input vector, c_j = [c_{1j} c_{2j} ... c_{rj}] is the vector of centers in the jth EBF neuron, and σ_j = [σ_{1j} σ_{2j} ... σ_{rj}] is the vector of widths in the jth neuron. Here, multi-input and single-output (MISO) systems are considered. However, all results could also be applied to multi-input and multi-output (MIMO) systems. A layerwise mathematical description of the network follows.

Fig. 1. Self-organizing fuzzy neural network.


Fig. 2. Structure of the jth neuron Rj with cj and σj in the EBF layer.

Layer 1 is the input layer. Each neuron in this layer represents an input variable, x_i, i = 1, 2, ..., r.

Layer 2 is the EBF layer. Each neuron in this layer represents an if-part (or premise) of a fuzzy rule. The outputs of EBF neurons are computed by products of the grades of MFs. Each MF is in the form of a Gaussian function

\mu_{ij} = \exp\left[ -\frac{(x_i - c_{ij})^2}{2\sigma_{ij}^2} \right], \quad i = 1, 2, \ldots, r, \; j = 1, 2, \ldots, u,   (1)

where
\mu_{ij} is the ith membership function in the jth neuron;
c_{ij} is the center of the ith membership function in the jth neuron;
\sigma_{ij} is the width of the ith membership function in the jth neuron;
r is the number of input variables;
u is the number of neurons.

For the jth neuron, the output is

\phi_j = \exp\left[ -\sum_{i=1}^{r} \frac{(x_i - c_{ij})^2}{2\sigma_{ij}^2} \right], \quad j = 1, 2, \ldots, u.   (2)

Layer 3 is the normalized layer. The number of neurons in this layer is equal to that of layer 2. The output of the jth neuron in this layer is

\psi_j = \frac{\phi_j}{\sum_{k=1}^{u} \phi_k} = \frac{\exp\left[ -\sum_{i=1}^{r} \frac{(x_i - c_{ij})^2}{2\sigma_{ij}^2} \right]}{\sum_{k=1}^{u} \exp\left[ -\sum_{i=1}^{r} \frac{(x_i - c_{ik})^2}{2\sigma_{ik}^2} \right]}, \quad j = 1, 2, \ldots, u.   (3)

Layer 4 is the weighted layer. Each neuron in this layer has two inputs and the product of these inputs as its output. One of the inputs is the output of the related neuron in layer 3 and the other is the weighted bias w_{2j}. For the Takagi-Sugeno (TS) model (Takagi & Sugeno, 1985), the bias is the column vector B = [1 x_1 x_2 ... x_r]^T. For the singleton fuzzy model, B = [1 0 0 ... 0]^T. The dimension of the column vector B is r + 1. The row vector A_j = [a_{j0} a_{j1} a_{j2} ... a_{jr}] represents the set of parameters corresponding to the then-part (or consequent) of the fuzzy rule j. The weighted bias w_{2j} is

w_{2j} = A_j \cdot B = a_{j0} + a_{j1} x_1 + \cdots + a_{jr} x_r, \quad j = 1, 2, \ldots, u.   (4)

This is thus the then-part (or consequent) of the jth fuzzy rule of the fuzzy model. The output of each neuron is f_j = w_{2j} \psi_j.

Layer 5 is the output layer. Each neuron in this layer represents an output variable as the summation of incoming signals from layer 4. The output of a neuron in layer 5 is

y(x) = \sum_{j=1}^{u} f_j = \frac{\sum_{j=1}^{u} w_{2j} \exp\left[ -\sum_{i=1}^{r} \frac{(x_i - c_{ij})^2}{2\sigma_{ij}^2} \right]}{\sum_{k=1}^{u} \exp\left[ -\sum_{i=1}^{r} \frac{(x_i - c_{ik})^2}{2\sigma_{ik}^2} \right]},   (5)

where y is the value of an output variable.

Suppose u EBF neurons are generated from the n training patterns of the input vector x_t and the corresponding desired output d_t (t = 1, 2, ..., n). Rewrite the output of the network as

Y = W_2 \Psi,   (6)

where

Y = [\, y_1 \; y_2 \; \cdots \; y_n \,],   (7)

W_2 = [\, a_{10} \; a_{11} \; \cdots \; a_{1r} \; \cdots \; a_{u0} \; a_{u1} \; \cdots \; a_{ur} \,],   (8)

\Psi = \begin{bmatrix}
\psi_{11} & \cdots & \psi_{1n} \\
\psi_{11} x_{11} & \cdots & \psi_{1n} x_{1n} \\
\vdots & & \vdots \\
\psi_{11} x_{r1} & \cdots & \psi_{1n} x_{rn} \\
\vdots & & \vdots \\
\psi_{u1} & \cdots & \psi_{un} \\
\psi_{u1} x_{11} & \cdots & \psi_{un} x_{1n} \\
\vdots & & \vdots \\
\psi_{u1} x_{r1} & \cdots & \psi_{un} x_{rn}
\end{bmatrix},   (9)

where W_2 is the parameter matrix and \psi_{jt} is the output of the jth neuron in the normalized layer when the tth training pattern enters the network.
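To make the layer-wise description concrete, the following Python sketch evaluates Eqs. (1)-(5) for a single input vector. It is an illustrative implementation only, assuming NumPy arrays for the centers, widths and consequent parameters; the variable names (centers, widths, A) are not from the paper.

```python
import numpy as np

def sofnn_output(x, centers, widths, A, singleton=False):
    """Forward pass of the SOFNN for one input vector x (Eqs. (1)-(5)).

    x       : (r,)    input vector
    centers : (r, u)  c_ij, center of the i-th MF in the j-th EBF neuron
    widths  : (r, u)  sigma_ij, width of the i-th MF in the j-th EBF neuron
    A       : (u, r+1) consequent parameters [a_j0, a_j1, ..., a_jr] per neuron
    """
    # Layer 2: EBF neurons, phi_j = exp(-sum_i (x_i - c_ij)^2 / (2 sigma_ij^2))
    phi = np.exp(-np.sum((x[:, None] - centers) ** 2 / (2.0 * widths ** 2), axis=0))
    # Layer 3: normalization, psi_j = phi_j / sum_k phi_k
    psi = phi / np.sum(phi)
    # Layer 4: weighted bias w_2j = A_j . B, with B = [1, x] (TS) or [1, 0, ..., 0] (singleton)
    B = np.concatenate(([1.0], np.zeros_like(x) if singleton else x))
    w2 = A @ B
    # Layer 5: network output y = sum_j w_2j * psi_j
    return float(np.dot(w2, psi))

# Example usage with two inputs and three EBF neurons (arbitrary illustrative values)
rng = np.random.default_rng(0)
x = np.array([0.3, -1.2])
centers = rng.normal(size=(2, 3))
widths = np.full((2, 3), 1.0)
A = rng.normal(size=(3, 3))          # u = 3 rows, r + 1 = 3 columns
print(sofnn_output(x, centers, widths, A))
```

The normalization in layer 3 is what makes the weighted sum in layer 5 a convex combination of the rule consequents, which is the standard TS defuzzification.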

3. The SOFNN learning algorithm

The learning process of the SOFNN includes the structure learning and the parameter learning. The structure learning attempts to achieve an economical network size with a new self-organizing approach. As a result of this,


neurons in the EBF layer are augmented or pruned dynamically in the learning process. The parameter learning makes the network converge quickly through an on-line recursive least squares algorithm.

3.1. Adding a neuron

There are two criteria to judge whether to generate an EBF neuron or not. One is the system error criterion, and the other is the if-part criterion. The error criterion considers the generalization performance of the overall network. The if-part criterion evaluates whether existing fuzzy rules or EBF neurons can cover and cluster the input vector suitably.

Error criterion. Consider the nth observation (x_n, d_n), where x_n is the input vector and d_n is the desired output. The output of the network with the current structure is y_n. The system error ε(n) is defined as

|\varepsilon(n)| = |d_n - y_n|.   (10)

If

|\varepsilon(n)| > \delta,   (11)

where δ is the predefined error tolerance, a new EBF neuron should be considered either for addition to the structure, or the widths of some membership functions should be modified.

If-part criterion. Every EBF neuron in the EBF layer represents an if-part of a fuzzy rule. For the nth observation (x_n, d_n) with r as the dimension of the input vector x_n = [x_{1n} x_{2n} ... x_{rn}], i.e. the number of input variables, the firing strength or the output of each EBF neuron is as given in (2):

\phi_j = \exp\left[ -\sum_{i=1}^{r} \frac{(x_{in} - c_{ij})^2}{2\sigma_{ij}^2} \right], \quad j = 1, 2, \ldots, u.   (12)

The input vector x_n would be assumed to have been appropriately clustered by existing fuzzy rules or EBF neurons if the firing strength of at least one neuron is greater than 0.1354. This condition would ensure that no individual input can have a fuzzy membership grade less than 0.1354, and thus the ε-completeness of fuzzy rules (Lee, 1990), with ε = 0.1354, would be satisfied. It is also to be noted that for the Gaussian membership function MF(c_{ij}, σ_{ij}) the grade is 0.1354 for the input at c_{ij} ± 2σ_{ij}. Assuming a normal input data distribution, 95% of the input data belonging to this membership function will lie within the input range [c_{ij} − 2σ_{ij}, c_{ij} + 2σ_{ij}]. Thus the threshold for the output of each EBF neuron is set to 0.1354. Now define

\phi(n) = \max_{j}(\phi_j);   (13)

if φ(n) < 0.1354, it means no neuron in this structure can cluster this input vector. So the widths should be modified to cover the input vector suitably, or a new EBF neuron should be considered for addition to the structure. Keeping the neuron output threshold constant, the capability of the membership functions for covering input vectors may be increased by enlarging the widths of the membership functions.

When the first training pattern (x_1, d_1) enters the network, the structure of the first neuron is defined as

c_1 = x_1^T,   (14)

\sigma_1 = [\sigma_0 \; \sigma_0 \; \cdots \; \sigma_0]^T,   (15)

where c_1 is the center vector whose dimension is r × 1, σ_1 is the width vector whose dimension is r × 1, and σ_0 is a predefined initial width. For the nth training pattern, the following scenarios are possible (a sketch of this decision logic is given below the list):

(a) |ε(n)| ≤ δ and φ(n) ≥ 0.1354. This means that the network has good generalization and some neurons can cluster this input vector. The fuzzy neural network can thus accommodate the observation. Either nothing should be done or only the parameters should be adjusted.

(b) |ε(n)| ≤ δ and φ(n) < 0.1354. In this case, the network has good generalization, but no neuron can cluster this input vector. To ensure the ε-completeness of fuzzy rules (Lee, 1990), the widths of the EBF neuron with the largest output φ(n) should be enlarged to make its output match the threshold 0.1354. So the width of the membership function that has the least value in this neuron should be enlarged to cover the input vector suitably using (21), which is described later.

(c) |ε(n)| > δ and φ(n) ≥ 0.1354. This means that the network has poor generalization performance, but a neuron can cluster the current input vector. Thus, despite proper coverage of the current input by the existing neurons, the approximation performance of the network is poor. A new EBF neuron is therefore required to be added to the current structure to improve the performance. Suppose the kth new neuron is to be added to the structure. For every neuron in the EBF layer except the kth new neuron, find the minimum distance vector

Dist_k = \begin{bmatrix} dist_{1n}(h_{m1}) \\ dist_{2n}(h_{m2}) \\ \vdots \\ dist_{rn}(h_{mr}) \end{bmatrix} = \begin{bmatrix} \min_{j=1,\ldots,k-1} |x_{1n} - c_{1j}| \\ \min_{j=1,\ldots,k-1} |x_{2n} - c_{2j}| \\ \vdots \\ \min_{j=1,\ldots,k-1} |x_{rn} - c_{rj}| \end{bmatrix},   (16)

where dist_{in}(h_{mi}) is the distance between the ith input of the nth observation x_n = [x_{1n} x_{2n} ... x_{rn}] and the center of the h_{mi}th membership function in the m_ith neuron. The center c_{ih_{mi}} of the h_{mi}th membership function is the nearest center from x_{in}. Here, i = 1, 2, ..., r. Let the center vector c_k and the width vector σ_k of the kth new neuron be

c_k = [c_{1k} \; c_{2k} \; \cdots \; c_{rk}]^T,   (17)

\sigma_k = [\sigma_{1k} \; \sigma_{2k} \; \cdots \; \sigma_{rk}]^T.   (18)

If dist_{in}(h_{mi}) ≤ k_d(i), where k_d(i) is a predefined distance threshold of the ith input, i = 1, 2, ..., r, then

c_{ik} = c_{ih_{mi}}, \quad \sigma_{ik} = \sigma_{ih_{mi}},   (19)

and if dist_{in}(h_{mi}) > k_d(i), then

c_{ik} = x_{in}, \quad \sigma_{ik} = dist_{in}(h_{mi}).   (20)

The parameters of the current network should be updated to minimize the system error ε(n), since a new neuron is added in this situation.

(d) |ε(n)| > δ and φ(n) < 0.1354. This shows that the network has poor generalization and no neurons can cluster this input vector. The strategy for this case is to try improving the entire performance of the current network by enlarging some of the widths to cover the current input vector. Suppose φ(n) is the output of the jth EBF neuron. The width of the ith membership function that has the least value in this neuron should be enlarged to cover the input vector suitably:

\sigma_{ij}^{new} = k_s \, \sigma_{ij}^{old},   (21)

where k_s is a predefined constant which is larger than 1. For simulations in this paper, k_s = 1.12. The width enlargement process is continued until φ(n) ≥ 0.1354. If the network still has bad generalization, i.e. |ε(n)| > δ, the old widths are maintained and a new EBF neuron is added to the structure using the same procedure as described in case (c) above. The parameters of the new network should also be updated.
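The following Python sketch summarizes the four cases (a)-(d) as executable decision logic. It is a simplified illustration under stated assumptions: the width-enlargement loop of case (d) is only indicated in a comment, and the function name and return strings are placeholders rather than routines defined in the paper.

```python
import numpy as np

PHI_MIN = 0.1354  # epsilon-completeness threshold: MF grade at c +/- 2*sigma

def structure_update(x_n, err, delta, centers, widths, k_d, k_s=1.12):
    """One structure-learning decision for pattern x_n (cases (a)-(d), Section 3.1).

    centers, widths : (r, u) arrays of MF centers and widths (widths may be modified)
    err             : |epsilon(n)|, absolute system error for this pattern
    k_d             : (r,) distance thresholds; k_s : width enlargement factor, Eq. (21)
    Returns a string describing the action taken (illustrative only).
    """
    r = len(x_n)
    phi = np.exp(-np.sum((x_n[:, None] - centers) ** 2 / (2 * widths ** 2), axis=0))
    j_best = int(np.argmax(phi))

    if err <= delta and phi[j_best] >= PHI_MIN:            # case (a)
        return "accommodate: adjust parameters only"

    if err <= delta and phi[j_best] < PHI_MIN:             # case (b)
        grades = np.exp(-(x_n - centers[:, j_best]) ** 2 / (2 * widths[:, j_best] ** 2))
        widths[int(np.argmin(grades)), j_best] *= k_s      # enlarge weakest MF, Eq. (21)
        return "enlarge width of the smallest-grade MF in the best-firing neuron"

    if err > delta and phi[j_best] < PHI_MIN:              # case (d)
        # The paper first repeats the enlargement of Eq. (21) until phi(n) >= 0.1354;
        # if the error then still exceeds delta, the old widths are restored and a
        # neuron is added as in case (c).  That loop is omitted in this sketch.
        pass

    # case (c) (and the fall-through of case (d)): add a neuron, Eqs. (16)-(20)
    dist = np.abs(x_n[:, None] - centers)                  # |x_in - c_ij|
    h_m = np.argmin(dist, axis=1)                          # index of nearest center per input
    d_min = dist[np.arange(r), h_m]
    new_c = np.where(d_min <= k_d, centers[np.arange(r), h_m], x_n)
    new_s = np.where(d_min <= k_d, widths[np.arange(r), h_m], d_min)
    return f"add EBF neuron with center {new_c} and width {new_s}"
```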

3.2. The linear parameter learning method

The SOFNN model can be rewritten as a special case of a linear regression model (Chen et al., 1991; Wu & Er, 2000):

d(t) = \sum_{i=1}^{M} p_i(t)\,\theta_i + \varepsilon(t),   (22)

where
d(t) is the desired output;
p_i(t) is the ith regressor, which is a fixed function of the input vector x_t = [x_{1t} x_{2t} ... x_{rt}], i.e. p_i(t) = p_i(x_t);
\theta_i is the ith linear parameter;
\varepsilon(t) is the error signal;
M is the dimension of the parameters; here, M = u × (r + 1).

Using (6)–(9), the above (22) can be written in matrix form at time t = n as

D(n) = P(n)\Theta(n) + E(n),   (23)

where

D(n) = [\, d(1) \; d(2) \; \cdots \; d(n) \,]^T \in \mathbb{R}^{n},
P(n) = \Psi^T = [\, p^T(1) \; p^T(2) \; \cdots \; p^T(n) \,]^T \in \mathbb{R}^{n \times M},
\Theta(n) = W_2^T = [\, \theta_1 \; \theta_2 \; \cdots \; \theta_M \,]^T \in \mathbb{R}^{M},
E(n) = [\, \varepsilon(1) \; \varepsilon(2) \; \cdots \; \varepsilon(n) \,]^T \in \mathbb{R}^{n},

and here p^T(i) = [\, p_1(i) \; p_2(i) \; \cdots \; p_M(i) \,], 1 ≤ i ≤ n.

Using the linear least squares method, the parameter of the regression model is given by

\hat{\Theta}(t) = [P^T(t)P(t)]^{-1} P^T(t) D(t).   (24)

Based on the recursive least squares (RLS) algorithm (Astrom & Wittenmark, 1995; Franklin, Powell, & Workman, 1990), at time t, an on-line weight learning algorithm for the SOFNN is developed for the parameter learning. Define an M × M Hermitian matrix Q as

Q(t) = [P^T(t)P(t)]^{-1}.   (25)

Using the matrix Q, the final update equations of the standard RLS algorithm (Astrom & Wittenmark, 1995; Franklin et al., 1990) can be written as

L(t) = Q(t)p(t) = Q(t-1)p(t)\,[1 + p^T(t)Q(t-1)p(t)]^{-1},   (26)

Q(t) = [I - L(t)p^T(t)]\,Q(t-1),   (27)

\hat{\Theta}(t) = \hat{\Theta}(t-1) + L(t)[d(t) - p^T(t)\hat{\Theta}(t-1)].   (28)

In order to ensure convergence, a variation of the above standard recursive least squares algorithm is used. At time t, when the estimation error |e(t)| is less than the approximation error |ε(t)|, the parameters of the network are not adjusted and the structure is also not modified. Thus the on-line learning algorithm is modified to:

L(t) = Q(t)p(t) = Q(t-1)p(t)\,[1 + p^T(t)Q(t-1)p(t)]^{-1},   (29)

Q(t) = [I - aL(t)p^T(t)]\,Q(t-1),   (30)

\hat{\Theta}(t) = \hat{\Theta}(t-1) + aL(t)[d(t) - p^T(t)\hat{\Theta}(t-1)],   (31)

a = \begin{cases} 1, & |e(t)| \ge |\varepsilon(t)|, \\ 0, & |e(t)| < |\varepsilon(t)|. \end{cases}   (32)

This modification ensures that the modified algorithm guarantees an asymptotic convergence of the estimation error to zero and the convergence of the parameters to finite values. The proof of the convergence is detailed in Section 3.5.

3.3. Pruning a neuron

The pruning strategy is based on the optimal brain surgeon (OBS) approach (Hassibi & Stork, 1993) and the research of Leung et al. (2001). The basic idea is to use second derivative information to find the unimportant neurons. If the performance of the entire network is still acceptable after an unimportant neuron is deleted, the new structure of the network is adopted. The pruning method thus relies on a sensitivity analysis of the parameter vector Θ: a parameter is critical if a slight change in it corresponds to a large change of the cost function. The pruning method is given below.

In the learning process, the network reaches a local minimum in error. The cost function is defined as the squared error

E(\Theta, t) = \frac{1}{2} \sum_{i=1}^{t} [d(i) - p^T(i)\Theta]^2.

At time t, the functional Taylor series of the error with respect to the parameters is

E(\Theta + \Delta\Theta, t) = E(\Theta, t) + \left( \frac{\partial E(\Theta, t)}{\partial \Theta} \right)^T \Delta\Theta + \frac{1}{2} \Delta\Theta^T \frac{\partial^2 E(\Theta, t)}{\partial \Theta^2} \Delta\Theta + O(\|\Delta\Theta\|^3),   (33)

where

H \equiv \frac{\partial^2 E(\Theta, t)}{\partial \Theta^2}   (34)

is the Hessian matrix. At a local minimum in error,

\left( \frac{\partial E(\Theta, t)}{\partial \Theta} \right)^T \Delta\Theta = 0,   (35)

and the third and all higher order terms can be ignored, so (33) becomes

\Delta E(\Theta, t) \approx \frac{1}{2} \Delta\Theta^T H \Delta\Theta.   (36)

Also, the first partial derivative of the squared error is

\frac{\partial E(\Theta, t)}{\partial \Theta} = -\sum_{i=1}^{t} p(i)[d(i) - p^T(i)\Theta].

Then, using (25), the Hessian matrix can be defined as

H = \frac{\partial^2 E(\Theta, t)}{\partial \Theta^2} = \sum_{i=1}^{t} p(i)p^T(i) = P^T(t)P(t) = Q^{-1}(t-1) + p(t)p^T(t) = Q^{-1}(t).   (37)

Deleting a neuron is equivalent to setting the values of the related parameters to zero. The smaller the value of the change in the squared error, the less important is the neuron. Based on these twin concepts, the pruning method is designed using the following steps (a code sketch follows the list):

(a) Calculate the training root mean squared error E_rmse at time t.
(b) Define the tolerance limit for the root mean squared error as λE_rmse, where λ is a predefined value and 0 < λ < 1. For simulations in this paper, λ = 0.8.
(c) Calculate the change of the squared error ΔE for every neuron. The smaller the value of ΔE, the less important is the neuron.
(d) Choose E = max(λE_rmse, k_rmse), where k_rmse is the expected training root mean squared error, which is a predefined value.
(e) Select the least important neuron, delete this neuron, and then calculate the training root mean squared error (E_RMSE) of the new structure. If E_RMSE < E, this neuron should be deleted. Then do the same for the second least important neuron, and so on. Otherwise, stop and do not delete any further neurons.
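As a rough illustration of steps (a)-(e), the sketch below ranks neurons by the saliency ΔE ≈ ½ΔΘᵀHΔΘ obtained when a neuron's consequent parameters are forced to zero, using H = Q⁻¹(t) from (37). It is a simplified, assumed implementation: it deletes candidate parameter columns and recomputes the RMSE with a batch least-squares refit rather than reproducing the exact on-line bookkeeping of the paper.

```python
import numpy as np

def prune_neurons(P, d, Q, theta, r, lam=0.8, k_rmse=0.03):
    """Tentatively prune EBF neurons following steps (a)-(e) of Section 3.3.

    P     : (n, M) regressor matrix, M = u*(r+1), columns grouped per neuron
    d     : (n,)   desired outputs
    Q     : (M, M) [P^T P]^{-1}; H = Q^{-1} is the Hessian of the squared error
    theta : (M,)   current linear parameters
    Returns the list of neuron indices that are kept.
    """
    n, M = P.shape
    blk = r + 1
    u = M // blk
    H = np.linalg.inv(Q)
    rmse = np.sqrt(np.mean((d - P @ theta) ** 2))          # step (a)
    E_tol = max(lam * rmse, k_rmse)                        # steps (b) and (d)

    # step (c): saliency of zeroing each neuron's consequent parameters, Eq. (36)
    saliency = []
    for j in range(u):
        sl = slice(j * blk, (j + 1) * blk)
        dtheta = np.zeros(M)
        dtheta[sl] = -theta[sl]
        saliency.append(0.5 * dtheta @ H @ dtheta)

    keep = list(range(u))
    for j in np.argsort(saliency):                         # step (e): least important first
        trial = [k for k in keep if k != j]
        if not trial:
            break
        cols = np.concatenate([np.arange(k * blk, (k + 1) * blk) for k in trial])
        P_t = P[:, cols]
        th_t, *_ = np.linalg.lstsq(P_t, d, rcond=None)     # refit remaining parameters
        if np.sqrt(np.mean((d - P_t @ th_t) ** 2)) < E_tol:
            keep = trial
        else:
            break
    return keep
```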

3.4. Combining membership functions and rules

If some membership functions are very similar to each other, they can be put in the same group and combined into one new membership function, as in Chao, Chen, and Teng (1995) and Wang (1997). In this research, the membership functions which have the same center are considered for combination into one new membership function. Consider n membership functions that have the same center c_s and different widths σ_{s1}, σ_{s2}, ..., σ_{sn}. The new membership function is defined with the center

c_{new} = c_s,

and the width

\sigma_{new} = \frac{\sigma_{s1} + \sigma_{s2} + \cdots + \sigma_{sn}}{n}.   (38)

Owing to the elimination of redundant membership functions, fewer membership functions are generated and the complexity of the network is reduced. After combining the similar membership functions, it is possible that some neurons have the same center vector and width vector. This means that the premise parts of these fuzzy rules are the same, which requires replacing all these similar rules by a single rule. Suppose there are i neurons having the same membership functions and their consequent parts are W_{21}, W_{22}, ..., W_{2i}; the new consequent part W_{2new} is obtained by combining them as

W_{2new} = \frac{W_{21} + W_{22} + \cdots + W_{2i}}{i}.   (39)

A small sketch of this combination step is given below.
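The following sketch shows the combination rules (38) and (39) for membership functions sharing a center and for rules sharing a premise. It is an illustrative helper only; the exact-equality test on centers is taken literally from the text (a tolerance would typically be used in practice), and the function names are not from the paper.

```python
import numpy as np

def combine_mfs(centers, widths):
    """Merge MFs of one input variable that share the same center (Eq. (38))."""
    merged = {}
    for c, s in zip(centers, widths):
        merged.setdefault(float(c), []).append(float(s))
    new_centers = np.array(list(merged.keys()))
    new_widths = np.array([np.mean(v) for v in merged.values()])  # averaged widths
    return new_centers, new_widths

def combine_rules(premises, consequents):
    """Merge rules whose premise (center vector, width vector) is identical (Eq. (39)).

    premises    : list of (c_j, sigma_j) pairs of 1-D arrays
    consequents : list of consequent parameter vectors W_2j
    """
    groups = {}
    for (c, s), w2 in zip(premises, consequents):
        key = (tuple(np.round(c, 12)), tuple(np.round(s, 12)))
        groups.setdefault(key, []).append(np.asarray(w2, dtype=float))
    new_premises = [(np.array(k[0]), np.array(k[1])) for k in groups]
    new_consequents = [np.mean(ws, axis=0) for ws in groups.values()]  # Eq. (39)
    return new_premises, new_consequents
```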

3.5. Convergence analysis

An analysis of the convergence of both the estimation error and the parameters for this algorithm is given as follows.

Remark 1. If a nonlinear system is stable, using the SOFNN to approximate this system with the on-line weight learning algorithm described above, the estimation error converges to a finite value.

Proof. For a stable nonlinear system, using the SOFNN, it is assumed that the approximation error ε(t) satisfies (Hornik, Stinchcombe, & White, 1989; Wang, 1992)

|\varepsilon(t)| < \delta.

Here, δ is an arbitrary positive small value. The estimation error e(t) is

e(t) = d(t) - p^T(t)\hat{\Theta}(t-1),

i.e. e(t) is the error in predicting the signal d(t) one step ahead based on the estimated parameter \hat{\Theta}(t-1). Using (26)–(28), the absolute value of the approximation error can be expressed as

|\varepsilon(t)| = |d(t) - p^T(t)\hat{\Theta}(t)| = |[1 - p^T(t)L(t)][d(t) - p^T(t)\hat{\Theta}(t-1)]| = \left| \left( 1 - \frac{p^T(t)Q(t-1)p(t)}{1 + p^T(t)Q(t-1)p(t)} \right) e(t) \right| = \left| \frac{e(t)}{1 + p^T(t)Q(t-1)p(t)} \right| < \delta.

Notice here that, for MISO systems, p^T(t) is a 1 × M matrix, Q(t−1) is an M × M matrix, and p(t) is an M × 1 matrix, so p^T(t)Q(t-1)p(t) is a scalar. Therefore, if 1 + p^T(t)Q(t-1)p(t) is finite and p^T(t)Q(t-1)p(t) does not equal −1 for all time, the estimation error e(t) converges to a finite value as given below:

|e(t)| < |1 + p^T(t)Q(t-1)p(t)| \, \delta.

However, ensuring that the condition in (32) is always met during the training process leads to

|1 + p^T(t)Q(t-1)p(t)| \ge 1.   (40)

As δ is an arbitrary positive small value, |1 + p^T(t)Q(t-1)p(t)| δ is also an arbitrary positive small value. So

\lim_{t \to \infty} e(t) = 0.   (41)

Thus Remark 1 is proved. □

Remark 2. If a nonlinear system is stable, using the SOFNN to approximate this system with the above on-line weight learning algorithm, the parameters (i.e. weights) converge to a finite vector as time approaches infinity.

Proof. Using (26) and (28) and the l1 matrix norm, the change in the parameter vector between the two consecutive instants t and t−1 can be written as

\|\hat{\Theta}(t) - \hat{\Theta}(t-1)\|_1 = \|L(t)[d(t) - p^T(t)\hat{\Theta}(t-1)]\|_1 = \|Q(t)p(t)e(t)\|_1 \le \|Q(t)\|_1 \, \|p(t)\|_1 \, \|e(t)\|_1.   (42)

And

\|Q(t)\|_1 \le \sqrt{M}\,\|Q(t)\|_2 = \sqrt{M}\,\sqrt{\rho(Q^H(t)Q(t))} = \sqrt{M}\,\lambda_{max}(Q(t)),

where λ_max(Q(t)) is the maximum eigenvalue of the M × M Hermitian matrix Q(t). Using (26) and (27), Q(t)p(t) can be expressed as

Q(t)p(t) = [I - L(t)p^T(t)]Q(t-1)p(t) = Q(t-1)p(t) - \frac{Q(t-1)p(t)p^T(t)Q(t-1)p(t)}{1 + p^T(t)Q(t-1)p(t)} = \frac{Q(t-1)p(t)}{1 + p^T(t)Q(t-1)p(t)}.   (43)

However, from (40),

\frac{1}{|1 + p^T(t)Q(t-1)p(t)|} \le 1,

and using the above (43), the matrix norm of Q(t)p(t) can be expressed as

\|Q(t)p(t)\| = \left\| \frac{Q(t-1)p(t)}{1 + p^T(t)Q(t-1)p(t)} \right\| = \frac{\|Q(t-1)p(t)\|}{|1 + p^T(t)Q(t-1)p(t)|} \le \|Q(t-1)p(t)\|.

Then,

\|Q(t)\| = \max_{p(t) \ne 0} \frac{\|Q(t)p(t)\|}{\|p(t)\|} \le \max_{p(t) \ne 0} \frac{\|Q(t-1)p(t)\|}{\|p(t)\|} = \|Q(t-1)\|.

Using the l2 matrix norm results in

\|Q(t)\|_2 \le \|Q(t-1)\|_2.

Thus,

\|Q(t)\|_2 \le \|Q(t-1)\|_2 \le \cdots \le \|Q(1)\|_2.   (44)

This can be written in terms of the largest eigenvalues as

\lambda_{max}(Q(t)) \le \lambda_{max}(Q(t-1)) \le \cdots \le \lambda_{max}(Q(1)).   (45)

Also, \|Q(t)\|_1 can be rewritten as \|Q(t)\|_1 \le \sqrt{M}\,\lambda_{max}(Q(t)) \le \sqrt{M}\,\lambda_{max}(Q(1)). Next,

\|p(t)\|_1 = \sum_{j=1}^{u} |\psi_{jt}| + \sum_{k=1}^{r}\sum_{j=1}^{u} |\psi_{jt} x_{kt}| = 1 + \sum_{k=1}^{r} |x_{kt}|,

where at time t the input is x_t = [x_{1t} x_{2t} ... x_{rt}]. Among all input vectors, suppose the jth input vector has the maximum value V_max:

V_{max} = \sum_{k=1}^{r} |x_{kj}|.

For a stable system, V_max should be a bounded value and therefore

\|p(t)\|_1 = 1 + \sum_{k=1}^{r} |x_{kt}| \le 1 + V_{max}.

Then, using (44), (42) can be rewritten as

\|\hat{\Theta}(t) - \hat{\Theta}(t-1)\|_1 \le \sqrt{M}\,\lambda_{max}(Q(1))\,(1 + V_{max})\,|e(t)|.

However, from (41),

\lim_{t \to \infty} e(t) = 0,

so

\lim_{t \to \infty} \|\hat{\Theta}(t) - \hat{\Theta}(t-1)\|_1 = 0.   (46)

Hence Remark 2 is proved. □

3.6. Computational complexity

After the pruning or addition of a neuron, the structure has to be updated. This requires updating the matrix Q(t) using (25). The time required is therefore of the order O(nUM²) and the memory required is of the order O(nM), where n is the number of training data, U is the number of times the structure needs to be updated by adding or pruning a neuron, and M = u × (r + 1), in which r is the number of input variables and u is the number of EBF neurons in the network structure. Compared with the other dynamic adaptation method based algorithms used in the DFNN (Er & Wu, 2002; Wu & Er, 2000) and the GDFNN (Wu et al., 2001), which use the linear least squares (LLS) algorithm in batch mode, the computational complexity of the SOFNN is significantly reduced. In the DFNN and the GDFNN, the time required by the approach based on the LLS algorithm is of the order O(n²M²). Usually n ≫ U, so the time required for the SOFNN is better than that of the DFNN and the GDFNN.

4. Test results

To demonstrate the effectiveness of the proposed algorithm, three example problems are analysed under MATLAB in this paper: the two-input nonlinear sinc function approximation, the nonlinear dynamic system identification, and the currency exchange rate prediction. The results of the proposed algorithm are also compared with other algorithms, such as the GDFNN (Wu et al., 2001), the ANFIS (Jang, 1993), and the DFNN (Wu & Er, 2000).

Example 1. Two-input nonlinear sinc function. This example was used to demonstrate the efficiency of the ANFIS in Jang (1993). The function is defined as

z = \mathrm{sinc}(x, y) = \frac{\sin(x)\sin(y)}{xy}, \quad x \in [-10, 10], \; y \in [-10, 10].   (47)

The training data consisted of 121 uniformly sampled two-input data points and the corresponding target data. Another set of 121 uniformly sampled input-target data was used as the testing data. The training parameters were δ = 0.12, σ_0 = 4, k_rmse = 0.03, k_d(1) = 2, k_d(2) = 2. Fig. 3 shows the highly nonlinear surface for the training data. The results are presented in Figs. 4–8. A set of nine EBF neurons is generated, with four membership functions for input x and five membership functions for input y. The total number of parameters is 45. The RMSE of training is 0.0565 and the RMSE of testing is 0.0956. Compared with the ANFIS, the RMSE of training of the SOFNN is larger than that of the ANFIS with 250 epochs (see Fig. 11 in Jang, 1993), but it is better than that of a neural network trained by quick-propagation learning with 250 epochs (Jang, 1993). The number of rules (EBF neurons) in this work is less than the 16 rules in Jang (1993). The total number of parameters in this network is also less than the 72 in Jang (1993).
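For reproducibility, a minimal Python sketch of how such a training set can be generated is shown below. It assumes an 11 × 11 uniform grid over [-10, 10] × [-10, 10], which yields 121 samples; the grid layout is an assumption consistent with the reported count, not a detail stated explicitly in the paper.

```python
import numpy as np

def sinc2d(x, y):
    """Two-input sinc of Eq. (47), with the removable singularity at 0 handled."""
    sx = np.where(x == 0.0, 1.0, np.sin(x) / np.where(x == 0.0, 1.0, x))
    sy = np.where(y == 0.0, 1.0, np.sin(y) / np.where(y == 0.0, 1.0, y))
    return sx * sy

# 11 x 11 uniform grid over [-10, 10]^2 gives 121 training patterns
grid = np.linspace(-10.0, 10.0, 11)
X, Y = np.meshgrid(grid, grid)
inputs = np.column_stack([X.ravel(), Y.ravel()])   # shape (121, 2)
targets = sinc2d(X.ravel(), Y.ravel())             # shape (121,)
print(inputs.shape, targets.shape)
```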

Fig. 3. Training data.



Fig. 4. RMSE during training.
Fig. 6. Growth of neurons.

Considering that the neurons in the SOFNN were created without prior knowledge and with noniterative learning, the result using the SOFNN represents a significantly favorable performance. To determine the effect of noise, the training data were mixed with Gaussian white noise sequences that had zero mean and different variances (or standard deviations), as given in Table 1. The results are presented in Table 1 and Figs. 9–14. The algorithm clearly finds the most notable characteristics of the output surface. The small perturbation caused by the noise is ignored. The noise becomes a part of the training error and, as a result, the training error increases within the tolerance allowed in the algorithm. However, when white noise with a significantly high variance (e.g. σ = 0.1) is introduced, more neuron(s) are needed for adjustment to a noisy surface so as to arrive at a training error within the tolerance limit. The result is thus similar to that obtained in Rojas, Pomares, Ortega, and Prieto (2000).

Fig. 7. Membership functions of input x.

Fig. 5. Errors during training.

Fig. 8. Membership functions of input y.

Table 1
Results of two-input nonlinear sinc function with noise

Noise (σ)    Number of neurons    Number of parameters    RMSE of training    RMSE of testing
σ = 0        9                    45                      0.0565              0.0956
σ = 0.01     9                    45                      0.0568              0.0963
σ = 0.05     9                    45                      0.0605              0.0983
σ = 0.1      16                   68                      0.0767              0.1496

Example 2. Nonlinear dynamic system identification. The plant is described as

y(t+1) = \frac{y(t)\,y(t-1)\,[y(t) + 2.5]}{1 + y^2(t) + y^2(t-1)} + u(t), \quad t \in [1, 200], \; y(0) = 0, \; y(1) = 0, \; u(t) = \sin\left(\frac{2\pi t}{25}\right).   (48)

Fig. 10. The output of trained SOFNN (σ = 0.01).

A set of 200 input-target data was chosen as training data. The model is identified in series-parallel mode, as given below:

\hat{y}(t+1) = f(y(t), y(t-1), u(t)).   (49)

There are three inputs and one output in this network. Another 200 input-target data in the interval [401, 600] are chosen as the testing data. The values of the training parameters are δ = 0.04, σ_0 = 0.4, k_rmse = 0.028, k_d(1) = 0.5, k_d(2) = 0.5, k_d(3) = 0.2. The results are shown in Table 2 and Figs. 14–22. As shown in Fig. 18, the SOFNN organizes its structure with five neurons. There are two instances when one neuron is pruned and one instance when three neurons are pruned in the learning process. Note that in the learning process, the center vector and the width vector change during the course of adding or pruning neuron(s). Fig. 17 shows that neuron(s) may be added when the error is larger than the threshold δ = 0.04.
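A short Python sketch of how the training sequence for this plant can be simulated is given below. It is an illustrative reconstruction of Eq. (48) and of the series-parallel regressors of Eq. (49); details such as running the plant out to t = 600 to obtain the testing interval are reasonable assumptions rather than specifics stated in the paper.

```python
import numpy as np

def simulate_plant(t_end):
    """Simulate the plant of Eq. (48) for t = 1, ..., t_end with y(0) = y(1) = 0."""
    y = np.zeros(t_end + 2)
    u = np.sin(2.0 * np.pi * np.arange(t_end + 1) / 25.0)
    for t in range(1, t_end + 1):
        y[t + 1] = (y[t] * y[t - 1] * (y[t] + 2.5)) / (1.0 + y[t] ** 2 + y[t - 1] ** 2) + u[t]
    return y, u

def series_parallel_data(y, u, t_range):
    """Regressors [y(t), y(t-1), u(t)] and targets y(t+1), as in Eq. (49)."""
    X = np.array([[y[t], y[t - 1], u[t]] for t in t_range])
    d = np.array([y[t + 1] for t in t_range])
    return X, d

y, u = simulate_plant(600)                                       # long run covering the test interval
X_train, d_train = series_parallel_data(y, u, range(1, 201))     # 200 training patterns
X_test, d_test = series_parallel_data(y, u, range(401, 601))     # testing interval [401, 600]
print(X_train.shape, X_test.shape)
```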

Fig. 9. Training data with noise (σ = 0.01).

Fig. 11. Training data with noise (σ = 0.05).

Fig. 12. The output of trained SOFNN (σ = 0.05).


Table 2
Results of nonlinear dynamic system identification

Works    Number of neurons    Number of parameters    RMSE of training    RMSE of testing
DFNN     6                    48                      0.0283              –a
GDFNN    6                    48                      0.0241              –a
SOFNN    5                    46                      0.0157              0.0151

a No result is listed in the original paper.

Fig. 13. Training data with noise (σ = 0.1).

Fig. 16 shows that the network judges whether to delete neuron(s) or not when the RMSE is less than the threshold E = max(λE_rmse, k_rmse). Fig. 18 shows that no new neuron is added to the structure after t = 39, but the widths and/or the parameter matrix W_2 are adjusted continuously. Fig. 16 shows that the RMSE of training is 0.0139 at t = 39. Figs. 20–22 show that the numbers of membership functions for the inputs y(t), y(t−1), and u(t) are five, four, and four, respectively. Compared with the DFNN and the GDFNN, the SOFNN gives a very good performance in this example.

Example 3. Currency exchange rate prediction. A set of real data is applied in this investigation to demonstrate the effectiveness of the proposed algorithm on a forecasting problem. The data represent the daily averaged exchange rates between the UK pound and the US dollar during the period from 3 January 1986 to 31 May 2002. There are 4281 observations. These data are divided into two sets: the first 3000 data are the training data and the remaining data are the testing data.

Fig. 14. The output of trained SOFNN (σ = 0.1).

Fig. 15. Training result (— desired output, · network output).

In the N-step-ahead model

\hat{y}(t+N) = f(y(t), y(t-\tau), \ldots, y(t-(n-1)\tau)),   (50)

defining N = 6, τ = 1, and n = 6, the prediction model is given by

\hat{y}(t+6) = f(y(t), y(t-1), y(t-2), y(t-3), y(t-4), y(t-5)).   (51)

Fig. 16. RMSE during training.


Fig. 17. Errors during training.

Fig. 20. Membership functions of input y(t).

Fig. 18. Growth of neurons.

Fig. 21. Membership functions of input y(tK1).

Fig. 19. Testing result (— desired output, · network output).

Fig. 22. Membership functions of input u(t).



Fig. 23. Training result (— desired data, · actual data).

Fig. 26. Growth of neurons.

Fig. 24. RMSE during training.

Fig. 27. Testing result (— desired data, · actual data).

Fig. 25. Errors during training.

Fig. 28. Actual output error of testing data.


Fig. 29. Membership functions of input y(t).

This model has six inputs and one output. It is used to predict the value six steps ahead. The values of the training parameters are δ = 0.05, σ_0 = 0.4, k_rmse = 0.01, k_d(1) = 0.2, k_d(2) = 0.2, k_d(3) = 0.2, k_d(4) = 0.2, k_d(5) = 0.2, k_d(6) = 0.2. The results are shown in Figs. 23–28. A 25-EBF-neuron network is generated. The RMSE of the training data is 0.0266. The RMSE of the testing data is 0.0183. The membership functions of the input variables are shown in Figs. 29–34. Fig. 35 shows a frequency distribution of the prediction errors for six-step-ahead prediction. This histogram has an approximately Gaussian shape. The mean is −0.0044 and the standard deviation is 0.0177. As the average of the errors is near 0 and the spread of the errors is small, this figure shows that the obtained network is a valid model. The coefficient of correlation between the testing target and the testing output is 0.9824.
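The sketch below shows how the six-step-ahead regression pairs of Eq. (51) can be assembled from a daily exchange-rate series and split (approximately as described) into training and testing portions. The series itself is assumed to be loaded from the reader's own data source; a synthetic stand-in of the same length is used here, since the dataset is not provided in the paper.

```python
import numpy as np

def make_prediction_dataset(series, n_lags=6, horizon=6):
    """Build inputs [y(t), y(t-1), ..., y(t-5)] and targets y(t+6), as in Eq. (51)."""
    X, d = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1 : t + 1][::-1])   # [y(t), y(t-1), ..., y(t-n_lags+1)]
        d.append(series[t + horizon])
    return np.array(X), np.array(d)

# 'rates' would hold the 4281 daily GBP/USD averages; a synthetic stand-in is used here.
rates = 1.6 + 0.1 * np.sin(np.linspace(0.0, 60.0, 4281))
X, d = make_prediction_dataset(rates)
X_train, d_train = X[:3000], d[:3000]        # first portion for training
X_test, d_test = X[3000:], d[3000:]          # remainder for testing
print(X_train.shape, X_test.shape)
```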

Fig. 30. Membership functions of input y(tK1).


Fig. 31. Membership functions of input y(tK2).

5. Conclusions

A new algorithm is proposed for creating a self-organizing fuzzy neural network (SOFNN) that identifies a singleton or TS type fuzzy model on-line. Based on the dynamic adaptation method, both the number of input–output space partitions and their corresponding fuzzy rule configurations are simultaneously and concurrently adapted by the SOFNN algorithm. The architecture of the SOFNN consists of ellipsoidal basis function (EBF) neurons, with a center vector and a width vector, in the first hidden layer. The EBF neuron represents the premise part of a fuzzy rule formed by AND logic (or T-norm) operating on Gaussian fuzzy membership functions. The elements of the center vector and width vector of the EBF neuron are the centers and widths of the Gaussian membership functions.

Fig. 32. Membership functions of input y(tK3).


Fig. 33. Membership functions of input y(tK4).

Fig. 35. Distribution of prediction error for testing data.

The structure learning approach combines a system error criterion and a firing strength based criterion for adding new EBF neurons, using concepts from statistical theory and the ε-completeness of fuzzy rules. This ensures that the membership functions have the capability to cover more data than in the GDFNN, which uses an algorithm falling in a similar category. The algorithm includes a pruning method that is based on the optimal brain surgeon (OBS) approach. This method is computationally very efficient for on-line implementation, as it does not involve any significant additional computation, because it makes direct use of the Hermitian matrix obtained as a part of the proposed RLS algorithm. A modified recursive least squares (RLS) algorithm is derived through the proof of convergence. The proofs of the convergence of the estimation error and the linear network parameters provide conditions for guaranteed convergence in the proposed RLS parameter learning algorithm. Since the new EBF neurons are created based

on both the approximation error and the coverage criteria, the SOFNN architecture is less sensitive to the sequence of training data patterns. Moreover, the SOFNN algorithm is superior in time complexity to the other dynamic adaptation method based algorithms used in the DFNN and the GDFNN, which fall in a similar category but make use of the linear least squares (LLS) method for determining the linear network parameters. The presented approach is shown to be very simple and effective and generates a fuzzy neural model with a high accuracy and compact structure. Simulation work shows that the SOFNN is suitable for application to the area of function approximation with less effort from the designer and has the capability of self-organization to determine the structure of the network automatically. Though the proposed algorithm has a number of free training parameters, these can be decided based on easily understood criteria. Based on detailed experimental analysis (Leng, 2004), a general rule for selecting these training parameters can be suggested. The smaller the error tolerance (δ) and the expected training RMSE (k_rmse), the better the performance of the resulting network, but the more expensive the structure. By choosing about 10–20% of the input data range as the distance threshold (k_d), extremely good performance is obtained in almost all the cases, as clearly seen in the results presented in the paper. Usually, the smaller the percentage, the more EBF neurons are generated, and the performance of the obtained network may be better. Generally, an initial width (σ_0) of about two times the smallest value of the input distance thresholds is observed to be an appropriate value in all the simulation problems. A detailed sensitivity analysis of all the parameters is yet to be performed. This issue is being looked into as part of further work. Further investigation is also ongoing to apply the SOFNN to other application areas.
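These rules of thumb can be expressed compactly. The helper below is only an illustration of the stated heuristics (distance thresholds at roughly 10-20% of each input's range, initial width about twice the smallest threshold); the function name and the default fraction are assumptions, not prescriptions from the paper.

```python
import numpy as np

def suggest_training_parameters(X, fraction=0.15):
    """Heuristic SOFNN settings from training inputs X of shape (n, r).

    k_d(i)  ~ 10-20% of the range of input i (here 'fraction' of the range)
    sigma_0 ~ 2 * min_i k_d(i)
    """
    ranges = X.max(axis=0) - X.min(axis=0)
    k_d = fraction * ranges
    sigma_0 = 2.0 * float(k_d.min())
    return k_d, sigma_0

# Example: two inputs roughly spanning [-10, 10] give k_d ~ [3, 3] and sigma_0 ~ 6
X = np.random.uniform(-10.0, 10.0, size=(121, 2))
print(suggest_training_parameters(X))
```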

Fig. 34. Membership functions of input y(tK5).


Acknowledgements

The first author is supported by a Vice-Chancellor's Research Scholarship of the University of Ulster. The authors would like to thank Dr Liam Maguire for his comments on the draft paper.

References

Astrom, K. J., & Wittenmark, B. (1995). Adaptive control (2nd ed.). Reading, MA: Addison Wesley.
Berthold, M., & Huber, K. P. (1999). Constructing fuzzy graphs from examples. International Journal of Intelligent Data Analysis, 3, 37–53.
Chao, C. T., Chen, Y. J., & Teng, C. C. (1995). Simplification of fuzzy-neural systems using similarity analysis. IEEE Transactions on Systems, Man, and Cybernetics. Part B: Cybernetics, 26, 344–354.
Chen, S., Cowan, C. F. N., & Grant, P. M. (1991). Orthogonal least squares learning algorithm for radial basis function network. IEEE Transactions on Neural Networks, 2, 302–309.
Cho, K. B., & Wang, B. H. (1996). Radial basis function based adaptive fuzzy systems and their applications to identification and prediction. Fuzzy Sets and Systems, 83, 325–339.
Er, M. J., & Wu, S. (2002). A fast learning algorithm for parsimonious fuzzy neural systems. Fuzzy Sets and Systems, 126, 337–351.
Franklin, G. F., Powell, J. D., & Workman, M. L. (1990). Digital control of dynamic systems (2nd ed.). Reading, MA: Addison Wesley.
Gonzalez, J., Rojas, I., Pomares, H., Ortega, J., & Prieto, A. (2002). A new clustering technique for function approximation. IEEE Transactions on Neural Networks, 13, 132–142.
Hassibi, B., & Stork, D. G. (1993). Second-order derivatives for network pruning: Optimal brain surgeon. In Advances in Neural Information Processing Systems, Vol. 4 (pp. 164–171). Los Altos, CA: Morgan Kaufmann.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366.
Jang, J. S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23, 665–684.
Jang, J. S. R., & Sun, C. T. (1995). Neuro-fuzzy modeling and control. Proceedings of the IEEE, 83, 378–406.
Klawonn, F., & Keller, A. (1998). Grid clustering for generating fuzzy rules. European Congress on Intelligent Techniques and Soft Computing (EUFIT'98), Aachen, Germany (pp. 1365–1369).
Lee, C. C. (1990). Fuzzy logic in control systems: Fuzzy logic controller—Parts I and II. IEEE Transactions on Systems, Man, and Cybernetics, 20, 404–435.
Leng, G. (2004). Algorithmic developments for self-organising fuzzy neural networks. PhD Thesis, University of Ulster, UK.
Leung, C. S., Wong, K. W., Sum, P. F., & Chan, L. W. (2001). A pruning method for the recursive least squared algorithm. Neural Networks, 14, 147–174.
Lin, C. T. (1995). A neural fuzzy control system with structure and parameter learning. Fuzzy Sets and Systems, 70, 183–212.
Mascioli, F. M. F., & Martinelli, G. (1998). A constructive approach to neuro-fuzzy networks. Signal Processing, 64, 347–358.
Mascioli, F. M. F., Rizzi, A., Panella, M., & Martinelli, G. (2000). Scale-based approach to hierarchical fuzzy clustering. Signal Processing, 80, 1001–1016.
Mitra, S., & Hayashi, Y. (2000). Neuro-fuzzy rule generation: Survey in soft computing framework. IEEE Transactions on Neural Networks, 11, 748–768.
Nauck, D. (1997). Neuro-fuzzy systems: Review and prospects. Proceedings of the Fifth European Congress on Intelligent Techniques and Soft Computing (EUFIT'97) (pp. 1044–1053).
Nauck, D., & Kruse, R. (1999). Neuro-fuzzy systems for function approximation. Fuzzy Sets and Systems, 101, 261–271.
Rizzi, A., Panella, M., & Mascioli, F. M. F. (2002). Adaptive resolution min–max classifiers. IEEE Transactions on Neural Networks, 13, 402–414.
Rojas, I., Pomares, H., Ortega, J., & Prieto, A. (2000). Self-organized fuzzy system generation from training examples. IEEE Transactions on Fuzzy Systems, 8, 23–36.
Simpson, P. K. (1992). Fuzzy min–max neural networks. Part 1: Classification. IEEE Transactions on Neural Networks, 3, 776–786.
Simpson, P. K. (1993). Fuzzy min–max neural networks. Part 2: Clustering. IEEE Transactions on Fuzzy Systems, 1, 32–45.
Takagi, T., & Sugeno, M. (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics, 15, 116–132.
Wang, J. S., & Lee, C. S. G. (2001). Efficient neuro-fuzzy control systems for autonomous underwater vehicle control. Proceedings of the 2001 IEEE International Conference on Robotics and Automation (pp. 2986–2991).
Wang, L. X. (1992). Fuzzy systems are universal approximators. Proceedings of the International Conference on Fuzzy Systems, 1163–1170.
Wang, W. J. (1997). New similarity measures on fuzzy sets and on elements. Fuzzy Sets and Systems, 85, 305–309.
Wu, S., & Er, M. J. (2000). Dynamic fuzzy neural networks—a novel approach to function approximation. IEEE Transactions on Systems, Man, and Cybernetics. Part B: Cybernetics, 30, 358–364.
Wu, S., Er, M. J., & Gao, Y. (2001). A fast approach for automatic generation of fuzzy rules by generalized dynamic fuzzy neural networks. IEEE Transactions on Fuzzy Systems, 9, 578–594.
Zadeh, L. A. (1994). Soft computing and fuzzy logic. IEEE Software, 11, 48–56.