

Multivariable Gaussian Evolving Fuzzy Modeling System

Andre Lemos, Student Member, IEEE, Walmir Caminhas, and Fernando Gomide, Senior Member, IEEE

Abstract—This paper introduces a class of evolving fuzzy rule-based systems as an approach for multivariable Gaussian adaptive fuzzy modeling. The system is an evolving Takagi-Sugeno functional fuzzy model whose rule base can be continuously updated using a new recursive clustering algorithm based on participatory learning. The fuzzy sets of the rule antecedents are multivariable Gaussian membership functions, adopted to preserve information about interactions between the input variables. The parameters of the membership functions are estimated by the clustering algorithm. A weighted recursive least squares algorithm updates the parameters of the rule consequents. Experiments considering time series forecasting and nonlinear system identification are performed to evaluate the performance of the proposed approach. The multivariable Gaussian evolving fuzzy models are compared with alternative evolving fuzzy models and classic models with fixed structures. The results suggest that multivariable Gaussian evolving fuzzy modeling is a promising approach for adaptive system modeling.

Index Terms—Evolving Fuzzy Systems, Participatory Learning, Adaptive Fuzzy Rule-Based Modeling

I. INTRODUCTION

During the last decades, fuzzy rule-based systems and their hybrid derivations have played an important role in modeling complex systems. Initially, the design of such systems was based on expert knowledge [1]. During the 1990s, a new trend emerged, based on data-driven rule/knowledge extraction methods [2]–[8], where the system structure is identified from data and expert knowledge plays a complementary role [9]. The techniques used in this period were mainly clustering, linear least squares, and/or nonlinear optimization for fine-tuning of model parameters. Nowadays, increasing interest in adaptive system modeling has motivated the development of highly adaptive fuzzy systems, denominated evolving fuzzy systems, whose models are self-developed from a stream of data [10]. Evolving fuzzy systems (eFS) can be seen as a synergy between fuzzy systems, as a mechanism for evolvable information compaction and representation, and recursive methods of machine learning [11]. Several eFS can be found in the literature, some of them proposing evolvable functional rule-based systems in which the structure (number of rules and antecedent/consequent parameters) continuously evolves based on clusters created/excluded by recursive clustering algorithms [12]. The parameters of the consequents are updated using recursive least squares or its variations [13], [14]. In [15], an evolving Takagi-Sugeno model (eTS) was proposed using an incremental version of the subtractive clustering algorithm [3] with recursive evaluation of the information potential of new data samples to create new clusters or revise the existing ones. The rule consequent


parameters are updated with the recursive least squares algorithm. A simplified version of the eTS was suggested in [16] to reduce the complexity of the information potential calculations. The Simpl_eTS replaces the notion of information potential by the concept of scatter to provide a similar, but computationally more effective, algorithm. The DENFIS (Dynamic Evolving Neural-Fuzzy System) [17] is another evolvable TS system, derived from a distance-based recursive evolving clustering method (ECM) to adapt the rule base structure, and a weighted recursive least squares with forgetting factor algorithm to update the rule consequent parameters. SONFIN (Self-constructing Neural Fuzzy Inference Network) [18] uses an input space partition approach with an aligned clustering-based method to define and eliminate redundant fuzzy sets for each input variable in an incremental manner. FLEXFIS (Flexible Fuzzy Inference System), detailed in [19], uses a recursive clustering algorithm derived from a modification of the vector quantization technique [20], called eVQ (Evolving Vector Quantization) [21]. Consequent parameter estimation uses the weighted recursive least squares algorithm. More recently, an evolving neuro-fuzzy type-2 model, SOFMLS (Self-organizing Fuzzy Modified Least-squares Network), was developed [22]. The SOFMLS employs an evolving nearest-neighborhood clustering algorithm. The literature also reports alternative methods to adapt the model structure. The SAFIS (Sequential Adaptive Fuzzy Inference System) [23] uses a distance criterion in conjunction with a measure of the influence of newly created rules. The SOFNN (Self-organising Fuzzy Neural Network) [24] adopts an error criterion considering the generalisation performance of the network. Although most evolving system approaches develop functional fuzzy models, the literature also mentions linguistic models. For instance, in [25] an evolving fuzzy classifier is developed using the clustering algorithm of [16]. In [26], an evolvable linguistic modeling methodology for fault detection is proposed. The methodology, denominated FSPC, uses a distance-based clustering inspired by statistical process control [27] to learn the parameters of the antecedent and consequent fuzzy sets. The recursive clustering algorithms used in most of the eFS found in the literature can be summarized by a two-step procedure. First, a distance measure is defined, and an initial number of clusters (and the respective cluster centers) is estimated from a priori knowledge about the problem; alternatively, a single cluster with its center at the first data sample is created. Second, for each new data sample, the distance between the existing clusters and the sample is computed; if the distance exceeds a threshold, then a new cluster is created; otherwise, the parameters of the closest cluster are updated using recursive algorithms [16], [17], [19], [22], [25]. Despite


their effectiveness, the clustering algorithms constructed upon this two-step procedure have a major drawback: they lack robustness. Whenever noisy data or outliers exceed a threshold, the two-step algorithms create new clusters instead of rejecting or smoothing the data. A robust evolving TS system was developed in [28] using a recursive clustering algorithm inspired by the idea of participatory learning [29]. Participatory learning is a learning paradigm which assumes that learning and beliefs about the system to be modeled depend on what the learning process has already learned. An essential characteristic of this learning process is that the impact of a new observation in causing learning or belief revision depends on its compatibility with the current system belief. Therefore, clustering algorithms based on this learning process tend to be robust to noisy data because single outliers are likely to be incompatible with the current system belief and can consequently be either discarded or have their effect smoothed. This paper introduces a new evolving fuzzy modeling approach called Multivariable Gaussian Evolving Fuzzy Modeling System, eMG for short. The eMG uses an evolving Gaussian clustering algorithm rooted in the concept of participatory learning. The evolving clustering procedure developed here can be seen as an extension of the one addressed in [30]. Differently from the batch algorithm of [30], here each cluster is represented by a multivariable Gaussian membership function, the cluster structure (number, center, and shape of the clusters) is recursively updated at each step of the algorithm, and the thresholds are set automatically. More specifically, the evolving clustering algorithm developed in this paper considers the possibility that input variables interact with each other. Clusters are estimated using a normalized distance measure (similar to the Mahalanobis distance) and form ellipsoidal clusters whose axes are not necessarily parallel to the input variable axes, as would be the case if the Euclidean distance were used [15], [17], [19]. The idea is to preserve information about interactions between input variables. The fuzzy sets of the rule antecedents are multivariable Gaussian membership functions characterized by a center vector and a dispersion matrix representing the dispersion of each variable and the interactions between variables. Similarly to other evolving system modeling approaches [15], [19], the parameters of the fuzzy rule consequents are updated using weighted recursive least squares. As discussed later, the evolving clustering algorithms used by some eFS create a new cluster whenever a distance measure exceeds a given threshold value [17], [19], [22]. As pointed out in [19], to avoid the curse of dimensionality, the threshold value must be chosen considering the input or input-output data space dimension. This is because the higher the space dimension, the greater the distance between two adjacent data points [31]. Thus, if the threshold does not account for the dimension, then as the dimensionality increases more observations will have distances exceeding the threshold. Therefore, more clusters are created, the model becomes more complex, and over-fitting may occur. The recursive clustering algorithm introduced here avoids the curse of dimensionality through an automatic mechanism to adjust the

threshold value based on the input space dimension.

The eMG developed in this paper differs from existing evolving modeling approaches because:
• it uses multivariable membership functions for the fuzzy sets of the rule antecedents to prevent loss of information about input variable interactions (Section II);
• the recursive clustering algorithm is robust to noisy data and outliers because it derives from the concept of participatory learning, a concept which provides a mechanism to smooth incompatible data (Section III);
• the creation of new rules is governed by an automatic mechanism that adjusts the threshold value considering the input space dimension and, therefore, does not suffer from the curse of dimensionality (Section III);
• there is no need to normalize input and output data.

The remainder of the paper is organized as follows. Section II reviews why multivariable Gaussian membership functions of fuzzy rule-based systems created by clustering procedures help to estimate interactions between the input variables and improve modeling. Next, Section III details the evolving clustering algorithm within the framework of participatory learning. The complete eMG algorithm is presented in Section IV. Section V addresses applications of eMG in time series forecasting and nonlinear system identification. Finally, conclusions and further developments are summarized in Section VI.

II. MULTIVARIABLE GAUSSIAN MEMBERSHIP FUNCTIONS

All sections and experiments reported in this paper assume multivariable Gaussian membership functions of the form:

$$H(x) = \exp\left[-\frac{1}{2}(x - v)\Sigma^{-1}(x - v)^T\right] \qquad (1)$$

where x is a 1 × m input vector, v is the 1 × m center vector, and Σ is an m × m symmetric, positive definite matrix. The center vector v is the modal value and represents the typical element of H(x). The matrix Σ denotes the dispersion and represents the spread of H(x) [32]. Both v and Σ are parameters of the membership function, associated with the cluster center and cluster spread, respectively.

There are several motivations for using Gaussian membership functions. They include:
• their infinite support: they do not omit inputs;
• their parameters are easy to define, and only two are required;
• interactions between input variables can be easily captured by the dispersion matrix.

It is interesting to note that, because the modeling framework is adaptive and runs recursively, infinite support is convenient: the bounds of the input variables may not be known a priori. It is also known in the neural network literature that Gaussian radial basis functions are good at characterizing local properties [33]. Moreover, Gaussians provide a way to construct basis functions, and fuzzy systems can be represented as linear combinations of fuzzy basis functions. Linear combinations of fuzzy basis functions are capable of uniformly approximating any real continuous function on a compact set to arbitrary accuracy, i.e., they are universal approximators [34]. Functional fuzzy models with Gaussians are also universal approximators [35].
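For illustration, the membership function in (1) takes only a few lines of Python/NumPy. The sketch below is ours, not part of the original implementation, and the function and variable names are illustrative:

import numpy as np

def membership(x, v, sigma):
    # H(x) = exp(-0.5 * (x - v) Sigma^{-1} (x - v)^T), Eq. (1)
    d = np.asarray(x, float) - np.asarray(v, float)
    return float(np.exp(-0.5 * (d @ np.linalg.solve(sigma, d))))

For a diagonal Σ the function factors into a product of single variable Gaussians; a nondiagonal Σ tilts the ellipsoidal level sets, which is precisely the interaction information that the projection approach discussed below loses.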


Fig. 1. Cluster structure in the input space

Fig. 2. Single variable Gaussian membership functions: projections of the clusters of Fig. 1 on the input spaces

Most eFS perform clustering in the input or input-output data space and create rules using one-dimensional, single variable fuzzy sets that are projections of the clusters on each input variable space. During fuzzy inference, the fuzzy relation induced by the antecedent of each fuzzy rule is computed using an aggregation operator (e.g., a t-norm) and the input fuzzy sets. This approach is commonly used, but it may cause information loss if the input variables interact [36], [37]. For instance, system identification and time series forecasting usually use lagged values of the input and/or output as inputs, and these lagged values tend to be highly related. To avoid information loss, the algorithm introduced herein uses multivariable, instead of single variable, Gaussian membership functions to represent each cluster developed by the recursive clustering algorithm. The parameters of the membership functions are extracted directly from the corresponding clusters. These multivariable membership functions use the information about the dispersion matrix of each cluster (estimated by the clustering algorithm proposed in this paper) and thus provide information about input variable interactions.

Fig. 1 illustrates a typical scenario where the use of the projection approach may imply information loss. While Fig. 1 illustrates the cluster structure of the input space, Fig. 2 shows the single variable Gaussian membership functions created by projecting the clusters on the input spaces. There are three clusters in the example, and each cluster represents the input relation of a fuzzy rule antecedent. The three input relations are:

x1 is A1 AND x2 is B1
x1 is A2 AND x2 is B2     (2)
x1 is A3 AND x2 is B3

Fig. 3 shows the fuzzy relations induced by the single variable membership functions, assuming the product t-norm as the aggregation (AND) operator for each rule antecedent.

Fig. 3. Fuzzy relations induced by the rule antecedents (product t-norm for AND)

Fig. 4 depicts the multivariable Gaussian membership functions constructed using the cluster center vector and dispersion matrix estimated for each cluster. The multivariable membership functions suggest three rules whose antecedents are:

x is H1
x is H2     (3)
x is H3

Comparing the fuzzy relations shown in Fig. 3 and those of Fig. 4 with the original clusters in Fig. 1, one can note the information loss due to projection and reconstruction with the product t-norm (the issue of finding a t-norm that recovers the original relation precisely is, generally speaking, still open). Cluster shapes and their orientation differ from the original ones. In Fig. 3, the clusters have diagonal dispersion matrices and the input variables x1 and x2 do not interact. In contrast, the dispersion matrices of the clusters in Fig. 4 are nondiagonal and the variables x1 and x2 do interact. Clearly, the cluster structure of Fig. 4 is closer to the original in Fig. 1.

III. GAUSSIAN PARTICIPATORY EVOLVING CLUSTERING

The evolving clustering algorithm introduced in this paper is based on the concept of participatory learning (PL) [29].


Fig. 4. Multivariable Gaussian membership functions

Participatory learning assumes that learning and beliefs about the system to be modeled depend on what the model has already learned. In other words, the current knowledge about the system is part of the learning process itself and influences the way in which new observations are used in learning. An essential characteristic of this learning process is that the impact of a new observation in causing learning or belief revision depends on its compatibility with the current system belief. Therefore, clustering algorithms based on this learning process tend to be robust to noise and outliers, because these are likely to be incompatible with the current system belief and can consequently be discarded or smoothed. The participatory learning clustering algorithm provides an automatic mechanism to decide whether a new observation lying far outside the current cluster structure denotes a new cluster to be included into the model, or whether it is an outlier that should be discarded or smoothed. The outlier smoothing mechanism provided by participatory learning also introduces a sort of stability during learning. Models based on participatory learning have the characteristic of protecting the current system belief from wide swings of the inputs caused by erroneous or anomalous observations, while still allowing learning of new knowledge [38].

The proposed clustering algorithm assumes that the knowledge about the system to be modeled is the cluster structure, i.e., the number of clusters, the corresponding cluster centers v_i^k for i = 1, ..., c^k, where c^k is the number of clusters at step k, and the shape of the clusters encoded in Σ_i^k. At each step, the learning process may create a new cluster, modify the parameters of an existing one, or merge two similar clusters. The cluster structure is updated using a compatibility measure ρ_i^k ∈ [0, 1] and an arousal index a_i^k ∈ [0, 1]. The compatibility measure computes how compatible an observation is with the current cluster structure, while the arousal index is the output of an arousal mechanism that acts as a critic, indicating when the current structure should be revised in the light of new information contained in the data. Thresholds are defined for the compatibility measure (T_ρ) and the arousal index (T_a). If at some step the compatibility measure of the current observation is less than the threshold for all clusters, i.e., ρ_i^k < T_ρ ∀ i = 1, ..., c^k, and the arousal index of the cluster with the greatest compatibility is greater than the threshold, i.e., a_i^k > T_a for i = arg max_i ρ_i^k, then a new cluster is created. Otherwise, the cluster center with the highest compatibility is adjusted as follows:

$$v_i^{k+1} = v_i^k + G_i^k (x^k - v_i^k) \qquad (4)$$

where G_i^k is defined as:

$$G_i^k = \alpha (\rho_i^k)^{1 - a_i^k} \qquad (5)$$

and α ∈ [0, 1] is the basic learning rate. According to [29], the compatibility ρ_i^k is a function that measures the compatibility between the current belief of the model, represented by each cluster center, and the current observation:

$$\rho_i^k = F(x^k, v_i^k) \qquad (6)$$

The function F(x^k, v_i^k) ∈ [0, 1] should approach zero as observations become contradictory with the current belief (the cluster centers) and approach one as observations come into complete agreement with the current belief. For example, if x^k is equal to a cluster center, then F(x^k, v_i^k) = 1. If a_i^k = 0, then G_i^k = αρ_i^k and the PL procedure has no arousal: the learning rate is modulated by the compatibility measure only. An observation with ρ_i^k = 0 provides no new information because v_i^{k+1} = v_i^k, while an observation with ρ_i^k = 1 does bring new information; for example, if α = 1 and ρ_i^k = 1, then v_i^{k+1} = x^k. Notice that the basic learning rate α is modulated by the compatibility ρ. When there are no participatory considerations, α is often made small to preclude great swings due to spurious input values far from the current cluster structure. Small values of α protect against the influence of bad inputs, but may slow down the learning process. The introduction of the participatory term ρ allows the use of higher values of α: the learning rate of the participatory learning model is dynamic, and ρ acts in a way that lowers the learning rate when large deviations occur, while speeding up learning when the compatibility is large.

The arousal index is the output of an arousal mechanism used to measure the confidence in the current knowledge of the system. For example, while a single low value of the compatibility measure causes aversion to learning, a sequence of low values of the compatibility measure should imply a revision of the current knowledge about the system. The arousal mechanism is defined as a monitoring mechanism of the dynamics of the compatibility measure: it monitors the values of the compatibility level, and its output is interpreted as the complement of the confidence in the current knowledge. A low value of a_i^k implies high confidence in the system belief, while a high value indicates the need to revise the current belief. Analysis of expression (5) shows that, as the arousal index increases, the compatibility measure has a reduced effect.
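For concreteness, a minimal sketch of the participatory update (4)-(5) follows (our illustration, not the authors' code; names are ours):

import numpy as np

def update_center(v, x, rho, a, alpha=0.01):
    # Participatory update of one cluster center, Eqs. (4)-(5);
    # rho is the compatibility and a the arousal index, both in [0, 1]
    g = alpha * rho ** (1.0 - a)          # effective learning rate, Eq. (5)
    return v + g * (np.asarray(x) - np.asarray(v)), g

The returned rate g is also the one used later in the recursive dispersion estimate (8).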


Fig. 5. Participatory learning procedure

Thus, if a sequence of observations presents low compatibility values, then it is more likely that the current knowledge is incorrect and must be revised. When this happens, the effect of the compatibility measure is reduced and the current observation provides more information about the system than it would without the arousal mechanism. As explained later in this section, the extreme case is when the arousal index exceeds a threshold and a new cluster is generated. Fig. 5 illustrates the participatory learning procedure, including the basic idea of using the current beliefs in the learning process and the arousal monitoring mechanism.

The compatibility measure ρ_i^k suggested in this paper uses the squared value of the normalized distance between the new observation and the cluster centers (M-distance):

$$M(x^k, v_i^k) = (x^k - v_i^k)(\Sigma_i^k)^{-1}(x^k - v_i^k)^T \qquad (7)$$

To compute the M-distance, the dispersion matrix Σ_i^k of each cluster must be estimated at each step. The recursive estimation of the dispersion matrix proceeds as follows:

$$\Sigma_i^{k+1} = (1 - G_i^k)\left(\Sigma_i^k - G_i^k (x^k - v_i^k)^T (x^k - v_i^k)\right) \qquad (8)$$

The compatibility measure at each step k is given by:

$$\rho_i^k = F(x^k, v_i^k) = \exp\left[-\frac{1}{2} M(x^k, v_i^k)\right] \qquad (9)$$

To find a threshold value for the compatibility measure, we assume that the values M(x^k, v_i^k) can be modeled by a Chi-Square distribution. Thus, given a significance level λ, the threshold is computed as follows:

$$T_\rho = \exp\left[-\frac{1}{2} \chi^2_{m,\lambda}\right] \qquad (10)$$

where χ²_{m,λ} is the λ upper unilateral confidence bound of a Chi-Square distribution with m degrees of freedom, m being the number of inputs.
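The quantities in (7)-(10) are straightforward to compute; the sketch below is ours (illustrative names), and it uses scipy.stats.chi2.ppf to obtain the λ upper tail point of the Chi-Square distribution:

import numpy as np
from scipy.stats import chi2

def m_distance(x, v, sigma):
    # Eq. (7): squared normalized (Mahalanobis-like) distance
    d = np.asarray(x, float) - np.asarray(v, float)
    return float(d @ np.linalg.solve(sigma, d))

def compatibility(x, v, sigma):
    # Eq. (9): rho = exp(-M/2); same form as H(x) in Eq. (1)
    return float(np.exp(-0.5 * m_distance(x, v, sigma)))

def rho_threshold(m, lam=0.05):
    # Eq. (10): chi2.ppf(1 - lam, m) is the lambda upper tail point
    return float(np.exp(-0.5 * chi2.ppf(1.0 - lam, m)))

def update_dispersion(sigma, x, v, g):
    # Eq. (8), with (x - v)^T (x - v) written as an outer product
    d = (np.asarray(x, float) - np.asarray(v, float)).reshape(-1, 1)
    return (1.0 - g) * (sigma - g * (d @ d.T))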

The compatibility measure is based on the normalized distance measure (7). As discussed below, the corresponding threshold (10) must be adjusted considering the input space dimension to avoid the curse of dimensionality. This is because, as the input space dimension increases, the distance between two adjacent points also increases. If a fixed threshold value is used that does not depend on the input space dimension, then the number of threshold violations will increase, which may lead to an excessive generation of clusters. Looking at expression (10), one can note that the compatibility measure threshold includes information about the data space dimensionality, because χ²_{m,λ} is a function of the number m of inputs. Therefore, no manual adjustment is needed and the curse of dimensionality is automatically avoided. In other words, the clustering method has an automatic mechanism to adjust the compatibility measure threshold according to the input space dimension. As the data dimension increases, the distance between two adjacent points also increases, and the respective compatibility measure decreases. However, the compatibility measure threshold also decreases, avoiding excessive threshold violations.

To further demonstrate the effectiveness of the automatic adjustment of the threshold T_ρ, Table I summarizes the results of a numerical experiment performed to compute the average number of threshold violations as the input data dimension increases. Data sets were generated randomly using Gaussian distributions for different data dimensions m ∈ [1, 100]. For each data set, a cluster centered at the mean vector with dispersion matrix equal to the covariance was constructed. The compatibility measure was computed for each data sample and the cluster center. Next, the number of compatibility threshold violations was counted. Since the compatibility measure threshold derives from the unilateral confidence interval built using confidence level λ, the expected number of violations is nλ. For each dimension m, 100 data sets with n = 1000 samples were generated. The t-test [39] was used to check whether the differences between the computed and expected numbers of threshold violations are statistically significant. The t-test has the null hypothesis that the differences between the expected and computed threshold violations are random samples from a normal distribution with zero mean and unknown variance, i.e., that there is no statistically significant difference between the expected and computed numbers of threshold violations. The expected number of threshold violations for all data sets is 50, because λ = 0.05 and n = 1000 in all experiments. Table I displays the average number of compatibility threshold violations for each value of the data dimension m, together with the p-value of the statistical test. Low p-values mean significant differences between the expected and computed numbers of violations. Table I shows that, as the dimension increases, there is no statistically significant difference between the computed and expected numbers of compatibility threshold violations for a t-test significance level of 0.01, i.e., all p-values in the table are greater than 0.01. These results demonstrate experimentally the effectiveness of the compatibility measure threshold suggested here in preventing the curse of dimensionality.

This paper adopts an arousal mechanism that monitors the compatibility index using a sliding window assembled from the last w observations. More specifically, we define the arousal index as the probability of observing fewer than nv violations of the compatibility threshold in a sequence of w observations. Low values of the arousal index are associated with no or few violations of the compatibility threshold, implying high confidence in the system knowledge. High values of the arousal index are associated with several threshold violations, meaning that the current cluster structure must be revised.


TABLE I
AVERAGE COMPATIBILITY MEASURE THRESHOLD VIOLATIONS AS DATA DIMENSION INCREASES

Data dimension    Average # of observed threshold violations    p-value
1                 48.96                                         0.13
5                 49.00                                         0.19
10                49.92                                         0.91
15                49.40                                         0.38
25                51.09                                         0.09
50                49.55                                         0.55
100               50.00                                         1.00

To compute the arousal index for each observation, a related occurrence value o_i^k is found using the following expression:

$$o_i^k = \begin{cases} 0 & \text{for } M(x^k, v_i^k) < \chi^2_{m,\lambda} \\ 1 & \text{otherwise} \end{cases} \qquad (11)$$

Notice that the occurrence value o_i^k = 1 indicates a threshold violation. The occurrence value o_i^k can also be viewed as the output of a statistical test that evaluates whether the values of M(x^k, v_i^k) are normal. The null hypothesis of the corresponding test is that M(x^k, v_i^k) can be modeled by a Chi-Square distribution with m degrees of freedom. Under the null hypothesis, the probability of observing o_i^k = 1 is λ, because λ defines χ²_{m,λ} and is the probability of observing a false positive, i.e., M(x^k, v_i^k) > χ²_{m,λ}. Since o_i^k is binary and the probability of observing o_i^k = 1 is known, the random variable associated with o_i^k can be described by a Bernoulli distribution [39] with probability of success λ. Given a sequence assembled from the last w observations, the number of threshold violations nv_i^k is:

$$nv_i^k = \begin{cases} \sum_{j=0}^{w-1} o_i^{k-j} & k > w \\ 0 & \text{otherwise} \end{cases} \qquad (12)$$

Notice that nv_i^k is not computed during the first w steps. This means that the algorithm has an initial latency of w steps. However, this causes no problem, because usually w is much smaller than the number of steps during which learning occurs; in real-time applications, for instance, learning can happen continuously. The discrete probability distribution of observing nv threshold violations in a window of size w is p(NV_i^k = nv), with NV_i^k assuming the values nv = 0, 1, ..., w. Because NV_i^k is the sum of a sequence of i.i.d. random variables drawn from a Bernoulli distribution with the same probability of success λ, p(NV_i^k = nv) can be characterized by the binomial distribution [39]:

$$p(NV_i^k = nv) = \begin{cases} \binom{w}{nv} \lambda^{nv} (1-\lambda)^{w-nv} & nv = 0, \ldots, w \\ 0 & \text{otherwise} \end{cases} \qquad (13)$$

The binomial distribution gives the probability of observing nv threshold violations in a sequence of w observations. High probability values reinforce the assumption that the observations fit the current cluster structure, while low probability values suggest that the observations should be described by a new cluster. For example, Fig. 6 shows p(NV_i^k = nv), the probability of observing 0 to 50 threshold violations in a sequence of 50 observations, given T_ρ = exp[-χ²_{2,0.1}/2], λ = 0.1 and w = 50. One can note that there is a significant probability of observing 10 or fewer violations, and a very low probability of observing 20 or more violations.
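Since the arousal index defined next is a binomial cumulative probability, it can be computed directly; a sketch under our naming, using scipy.stats.binom:

from scipy.stats import binom

def arousal(nv, w, lam):
    # a = p(NV < nv), i.e., the binomial CDF evaluated at nv - 1, cf. Eq. (13)
    return 0.0 if nv <= 0 else float(binom.cdf(nv - 1, w, lam))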

Fig. 6. Probability of observing nv threshold violations in a sequence of 50 observations, λ = 0.1

The arousal index is defined as the value of the cumulative probability of NV_i^k, i.e., a_i^k = p(NV_i^k < nv). Fig. 7 illustrates the arousal index as a function of the number nv of violations observed when λ = 0.1 and w = 50. As Fig. 7 indicates, a_i^k ≈ 1 for about 12 violations and above.

Fig. 7. Arousal index as the probability of observing less than nv threshold violations in a sequence of w = 50 observations given λ = 0.1

The threshold value of the arousal index T_a is 1 − λ, where λ is the same as the one that defines the threshold for the compatibility measure. The minimum number of compatibility threshold violations in a window of size w necessary to exceed T_a can be computed numerically by looking for the first value of nv for which the discrete cumulative distribution is equal to or greater than 1 − λ. More formally:


$$nv^* = \arg\min_{nv} \left| \sum_{k=1}^{nv} \binom{w}{k} \lambda^k (1-\lambda)^{w-k} - (1-\lambda) \right| \qquad (14)$$

As discussed later, the proposed clustering algorithm continually revises the current cluster structure and eventually merges similar clusters. The compatibility between the updated or created cluster and all remaining cluster centers is computed at each step. If, for a given pair, the compatibility exceeds the threshold T_ρ, then the two clusters are merged, i.e., if ρ_i^k(v_j^k, v_i^k) > T_ρ or ρ_j^k(v_i^k, v_j^k) > T_ρ, then clusters j and i are merged. The compatibility between two clusters i and j is computed as follows:

$$\rho_i^k(v_j^k, v_i^k) = \exp\left[-\frac{1}{2} M(v_j^k, v_i^k)\right] \qquad (15)$$

where M(v_j^k, v_i^k) is the M-distance between the cluster centers i and j, that is:

$$M(v_j^k, v_i^k) = (v_j^k - v_i^k)(\Sigma_i^k)^{-1}(v_j^k - v_i^k)^T \qquad (16)$$

To check whether two clusters are similar, it is necessary to compute both ρ_i^k(v_j^k, v_i^k) and ρ_j^k(v_i^k, v_j^k), because usually Σ_i^k ≠ Σ_j^k.
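A sketch of the numerical search for nv* in (14), following the text's description via the binomial CDF, and of the two-way merge test in (15)-(16) (our illustration; names are ours):

import numpy as np
from scipy.stats import binom

def min_violations(w, lam):
    # First nv whose cumulative binomial probability reaches 1 - lambda, cf. Eq. (14)
    cdf = binom.cdf(np.arange(w + 1), w, lam)
    return int(np.argmax(cdf >= 1.0 - lam))

def should_merge(v_i, sigma_i, v_j, sigma_j, t_rho):
    # Eqs. (15)-(16): test both directions since usually Sigma_i != Sigma_j
    def rho(a, b, sigma):
        d = np.asarray(a, float) - np.asarray(b, float)
        return float(np.exp(-0.5 * (d @ np.linalg.solve(sigma, d))))
    return rho(v_j, v_i, sigma_i) > t_rho or rho(v_i, v_j, sigma_j) > t_rho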

Notice that the clustering algorithm has only three parameters:
• the basic learning rate α;
• the window size w used by the arousal mechanism;
• the confidence level λ used to compute the thresholds T_ρ and T_a.

The basic learning rate is usually set to a small value, typically α ∈ [10⁻⁵, 10⁻¹]. The window size w is a problem-specific parameter, because it defines how many consecutive observations must be considered to compute the arousal index. In other words, considering the current system knowledge, w defines the length of the anomaly pattern needed to classify data either as a new cluster or as noise or an outlier. The value of the significance level λ depends on w. It must be set such that the arousal threshold T_a corresponds to more than one compatibility threshold violation, i.e., nv > 1 when a_i^k > T_a. Suggested ranges for λ, given w, are:

$$\lambda \geq \begin{cases} 0.01, & \text{if } w \geq 100 \\ 0.05, & \text{if } 20 \leq w < 100 \\ 0.1, & \text{if } 10 \leq w < 20 \end{cases} \qquad (17)$$

The clustering process can be started using either a single observation or an initial data set. If an initial data set is available, then an off-line clustering algorithm can be used to estimate the initial number of clusters and their respective parameters. The off-line algorithm should be capable of providing both the cluster centers and the respective dispersion matrices. If the clustering process starts with a single observation, then an initial dispersion matrix Σinit must be chosen, eventually using a priori information about the problem. Whenever a new cluster is created during the clustering process, the new cluster center is set to the current observation and the new dispersion matrix is the initial value Σinit. If two clusters are merged, then the center of the resulting cluster is the average of the corresponding cluster centers and the dispersion matrix is Σinit. Fig. 8 summarizes the recursive multivariable participatory fuzzy clustering algorithm, assuming that it starts with a single observation.

IV. GAUSSIAN EVOLVING FUZZY SYSTEM MODELING

This section details the procedure to estimate the remaining parameters of the eMG model. For simplicity, we assume that an eMG model is a set of first-order Takagi-Sugeno (TS) fuzzy rules whose consequent parameters are estimated using the weighted recursive least squares algorithm. The number of eMG rules is the same as the number of clusters found by the clustering algorithm at each step. At each iteration, a new cluster can be created, an existing cluster removed, or existing clusters updated. In other words, rules can be created, merged, or adapted at each step of the algorithm. Rule antecedents are of the form:

$$x^k \text{ is } H_i \qquad (18)$$

where x^k is a 1 × m input vector and H_i is a fuzzy set with multivariable Gaussian membership function (1), with parameters extracted from the corresponding cluster center and dispersion. The model is formed by a set of functional fuzzy rules:

$$R_i: \text{IF } x^k \text{ is } H_i \text{ THEN } y_i^k = \gamma_{i0}^k + \sum_{j=1}^{m} \gamma_{ij}^k x_j^k \qquad (19)$$

where R_i is the i-th fuzzy rule, for i = 1, ..., c^k, c^k is the number of rules, and γ_{i0}^k and γ_{ij}^k are the parameters of the consequent at step k. The model output is the weighted average of the outputs of each rule, that is:

$$\hat{y}^k = \sum_{i=1}^{c^k} \Psi_i(x^k)\, y_i^k \qquad (20)$$

with normalized membership functions:

$$\Psi_i(x^k) = \frac{\exp\left[-\frac{1}{2}(x^k - v_i^k)(\Sigma_i^k)^{-1}(x^k - v_i^k)^T\right]}{\sum_{l=1}^{c^k} \exp\left[-\frac{1}{2}(x^k - v_l^k)(\Sigma_l^k)^{-1}(x^k - v_l^k)^T\right]} \qquad (21)$$

where v_i^k and Σ_i^k are the center and dispersion matrix of the i-th cluster membership function at step k. The parameters of the consequent are updated using the weighted recursive least squares algorithm [14], [40], similarly to other TS evolving fuzzy models [15], [19]. Hence, the consequent parameters γ_i and the matrix Q_i of the update formulas for rule i at each iteration k are:

$$\gamma_i^{k+1} = \gamma_i^k + Q_i^{k+1} x^k \Psi_i(x^k)\left[y^k - (x^k)^T \gamma_i^k\right]$$

$$Q_i^{k+1} = Q_i^k - \frac{\Psi_i(x^k)\, Q_i^k x^k (x^k)^T Q_i^k}{1 + (x^k)^T Q_i^k x^k} \qquad (22)$$
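One step of (22) for a single rule can be sketched as below. This is our illustration, and it embeds two readings of ours: x is the extended input column vector [1, x_1, ..., x_m]^T, so that the affine term γ_{i0} is estimated alongside the others, and y is the observed output at step k:

import numpy as np

def wrls_update(gamma, Q, x, psi, y):
    # Weighted recursive least squares step, Eq. (22);
    # gamma, x are (m+1, 1) column vectors, Q is (m+1, m+1)
    denom = 1.0 + (x.T @ Q @ x).item()
    Q_new = Q - (psi * (Q @ x) @ (x.T @ Q)) / denom
    err = y - (x.T @ gamma).item()        # prediction error of this rule
    gamma_new = gamma + (Q_new @ x) * (psi * err)
    return gamma_new, Q_new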


Procedure MGEC
Input: k, x^k, v, Σ, λ, w, Σinit, c
Output: v, Σ, cluster_created, cluster_merged, c, idx

if k == 1 then                                  // initialize the first cluster
    v_1 = x^1; Σ_1 = Σinit; c = 1;
end
for i = 1, ..., c do                            // compatibility and arousal of all clusters
    M(x^k, v_i) = (x^k − v_i)(Σ_i)^(−1)(x^k − v_i)^T;
    ρ_i = exp[−M(x^k, v_i)/2];
    if ρ_i < T_ρ then o_i^k = 1; else o_i^k = 0;
    if k > w then
        nv_i = Σ_{l=0..w−1} o_i^{k−l};
        a_i = p(NV_i^k < nv_i);
    else
        a_i = 0;
    end
end
idx = arg max_i ρ_i;
if ρ_i < T_ρ ∀i and a_idx > T_a then            // create a new cluster
    c = c + 1; v_c = x^k; Σ_c = Σinit; idx = c;
    cluster_created = c;
else                                             // update the most compatible cluster
    G_idx = α(ρ_idx)^(1−a_idx);
    v_idx = v_idx + G_idx (x^k − v_idx);
    Σ_idx = (1 − G_idx)(Σ_idx − G_idx (x^k − v_idx)^T (x^k − v_idx));
end
for i = 1, ..., c, i ≠ idx do                    // check for redundant clusters
    if ρ_idx(v_idx, v_i) > T_ρ or ρ_i(v_i, v_idx) > T_ρ then
        v_idx = mean(v_idx, v_i);                // merge the two redundant clusters
        Σ_idx = Σinit; c = c − 1;
        cluster_merged = [idx i];
    end
end

Fig. 8. Multivariable Gaussian evolving clustering

As discussed previously, the eMG algorithm can be initialized either with an existing data set or with a single observation. If the eMG starts with an existing data set, then an off-line clustering algorithm can be used to estimate the number and the parameters of the initial set of rules. Clustering can be done in the input space, and a rule created for each cluster. The antecedent parameters of each rule are extracted from the clusters, and the consequent parameters are estimated by the weighted least squares algorithm. If the eMG starts with a single observation, then one rule is created with the antecedent membership function centered at the observation and the respective dispersion matrix set to the pre-defined initial value. The consequent parameters are initialized as γ⁰ = [y⁰ 0 ... 0] and Qᵏ = ωI_{m+1}, where I_{m+1} is the (m+1) × (m+1) identity matrix and ω is a large real value, for example ω ∈ [10², 10⁴] [40]. As new data are input, the eMG algorithm may create, update, or merge clusters. Thus, the set of rules (the rule base) must be updated as well. This is done as follows. If a new cluster is created, then a corresponding rule is also created, with antecedent parameters extracted from the cluster and consequent parameters computed as the weighted average of the parameters of the existing clusters:

$$\gamma_{new}^k = \frac{\sum_{i=1}^{c^k} \gamma_i^k \rho_i^k}{\sum_{i=1}^{c^k} \rho_i^k} \qquad (23)$$

The matrix Q is set as Q_new^k = ωI_{m+1}. If an existing cluster is updated, then the antecedent parameters of the corresponding rule are updated accordingly. Finally, if two clusters i and j are merged, then the consequent parameters of the resulting rule are computed as follows:

$$\gamma_{new}^k = \frac{\gamma_i^k \rho_i^k + \gamma_j^k \rho_j^k}{\rho_i^k + \rho_j^k} \qquad (24)$$

The matrix Q is set as Q_new^k = ωI_{m+1}. Fig. 9 summarizes the multivariable Gaussian evolving fuzzy system modeling algorithm.

V. EXPERIMENTS

The eMG model was tested using short-term load forecasting, long-term time series forecasting, classic nonlinear system identification, and high dimensional system identification problems. The results obtained were compared with alternative evolving and fixed-structure modeling approaches. All data sets selected for the experiments, except the short-term load forecasting problem, which uses actual data, are the same as the ones used to evaluate evolving models reported in the literature [15]–[17], [19], [22], [23], [41]. In all experiments, modeling performance was evaluated using the root mean squared error (RMSE) and/or the nondimensional error index (NDEI). The NDEI is the ratio of the root mean squared error to the standard deviation of the target data. The error measures are computed as follows:


Procedure eMG
Input: x, y_d, λ, w, Σinit
Output: y_s

γ_1 = [y¹ 0 ... 0]; Q_1 = ωI_{m+1};
for k = 1, ..., length(x) do
    for i = 1, ..., c do                         // compute the model output
        ρ_i = exp[−(x^k − v_i)(Σ_i)^(−1)(x^k − v_i)^T / 2];
        y_i = x^k γ_i;
    end
    y_s = (Σ_{i=1..c} ρ_i y_i) / (Σ_{i=1..c} ρ_i);
    [v, Σ, cluster_created, cluster_merged, c, idx] = MGEC(k, x^k, v, Σ, λ, w, Σinit, c);
    if cluster_created then                      // create a new rule
        γ_c = (Σ_{i=1..c−1} γ_i ρ_i) / (Σ_{i=1..c−1} ρ_i);
        Q_c = ωI_{m+1};
    end
    for i = 1, ..., c do                         // update the consequent parameters
        γ_i = γ_i + Q_i x^k Ψ_i(x^k) [y_d^k − (x^k)^T γ_i];
        Q_i = Q_i − Ψ_i(x^k) Q_i x^k (x^k)^T Q_i / (1 + (x^k)^T Q_i x^k);
    end
    if cluster_merged then                       // merge two rules
        [i j] = cluster_merged;
        γ_i = (γ_i ρ_i + γ_j ρ_j) / (ρ_i + ρ_j);
        Q_i = ωI_{m+1};
    end
end

Fig. 9. Evolving multivariable Gaussian fuzzy system modeling

$$RMSE = \left( \frac{1}{N} \sum_{k=1}^{N} (y^k - \hat{y}^k)^2 \right)^{1/2} \qquad (25)$$

$$NDEI = \frac{RMSE}{std(y^k)} \qquad (26)$$

where N is the size of the test data set, y^k is the target output, and std(·) is the standard deviation function.

A. Short Term Load Forecasting

Forecasts of load demand are very important in the operation of electric energy systems, because several decision making processes, such as system operation planning, security analysis, and market decisions, are strongly influenced by the future values of the load. In this context, a significant error in the load forecast may result in economic losses, security constraint violations, and system operation drawbacks. Accurate and reliable load forecasting models are essential for suitable system operation. The load forecasting problem can be classified into long, medium, and short-term, depending on the situation. Long-term forecasting is important for capacity expansion of the electric system.

Fig. 10. Normalized load data for the first 3 days of August, 2000

Medium-term forecasting is important to organize fuel supply, maintenance operations, and interchange scheduling. Short-term forecasting is generally used for daily planning and operation of the electrical system, energy transfer, and demand management [42]. Particularly, in the context of short-term hydrothermal scheduling, load forecasting is important to elaborate the next-day operation schedule, because errors in load forecasting can have serious consequences, affecting the efficiency and the safety of the system (cost increases, insufficient electrical energy supply for the existing demand). The goal of short-term load forecasting is to accurately predict the 24 hourly loads of the next operation day, one step ahead. The effectiveness of the eMG approach is illustrated using load data of a major electrical utility located in the southeast region of Brazil. The load data set used in this experiment is expressed in kilowatts per hour (kW/hour) and corresponds to the 31 days of August 2000. Fig. 10 depicts the normalized load values of the first 3 days of August 2000. The eMG approach is scale-invariant, but for this experiment the data was normalized between 0 and 1 to preserve privacy. The eMG model is a one-step-ahead forecaster whose purpose is to predict the current load value using lagged load values of the series. The sample partial autocorrelation function [43] for the first 36 observations of the series suggests the use of the last 2 previous load values as inputs of the models, that is, the forecast model is of the form:

$$\hat{y}^k = f(y^{k-1}, y^{k-2}) \qquad (27)$$
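Building the lagged regressors for a model such as (27) is a simple slicing exercise; a sketch (ours; the lag set is a parameter):

import numpy as np

def lagged_inputs(y, lags=(1, 2)):
    # Regressor matrix for y^k = f(y^{k-1}, y^{k-2}), Eq. (27):
    # row k holds (y^{k-1}, y^{k-2}) and the target is y^k
    y = np.asarray(y, float)
    p = max(lags)
    X = np.column_stack([y[p - l:len(y) - l] for l in lags])
    return X, y[p:]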

The experiment has been conducted as follows. The hourly loads of the first 30 days were input to the eMG algorithm (718 observations), and the evolved model performance was evaluated using the data of the last day (24 observations), keeping the model structure and parameters fixed at the values found after evolving over the 30-day period. The eMG started clustering with the first observation, and the parameters were chosen as λ = 0.05, w = 20, Σinit = 10⁻² I₂ and α = 0.01. Fig. 11 shows the forecasting result and Fig. 12 sketches the final 5 clusters found after learning. Looking at Fig. 12, one can note that the resulting clusters have distinct orientations and most of them are not parallel to the input axes.


Fig. 11. Load forecasting

Fig. 12. Input space cluster structure for load forecasting

Fig. 13 shows the evolution of the number of fuzzy rules during the learning stage. Rules were merged at t ≈ 320 and t ≈ 700. Table II shows how eMG performs against evolving and fixed-structure modeling methods using the RMSE and NDEI error measures. The MLP has one hidden layer with five neurons trained with the backpropagation algorithm; the ANFIS has three fuzzy sets for each input variable and seven fuzzy rules. The MLP adopted the following initialization scheme: small weight values randomly assigned; α = 0.9 as momentum parameter; 1000 as the maximum number of epochs; and an adaptive learning rate starting from η = 0.01 as initial step size. The ANFIS has 1000 as the maximum number of epochs, ηi = 0.01 as initial step size, sd = 0.9 as step size decrease rate, and si = 1.1 as step size increase rate. The parameters of the eTS model were set to r = 0.4 and Ω = 750. The xTS [41] has Ω = 750. The parameters of the ePL model were τ = 0.3, r = 0.25, λ = 0.35. Table II suggests that eMG performs best among all the models. If the window size is set to w = 25, then eMG also outperforms xTS using the same number of rules.

TABLE II
PERFORMANCE OF THE ELECTRICITY LOAD FORECASTING METHODS

Model name and source    Number of rules (or nodes)    RMSE      NDEI
xTS [41]                 4                             0.0574    0.2135
eMG (w = 25)             4                             0.0524    0.1948
MLP [44]                 5                             0.0552    0.2053
ANFIS [2]                9                             0.0541    0.2012
eTS [15]                 6                             0.0527    0.1960
ePL [28]                 5                             0.0508    0.1889
eMG (w = 20)             5                             0.0459    0.1707

Fig. 13. Number of rules for load forecasting during learning

B. Mackey-Glass Time Series Forecasting

In this experiment, the eMG model is used to develop long-term forecasting of the Mackey-Glass time series [45]. The time series is created using the following time-delay differential equation:

$$\frac{dx(t)}{dt} = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t) \qquad (28)$$

where x(0) = 1.2 and τ = 17. The aim is to forecast the value x^{k+85} from the input vector [x^{k−18}, x^{k−12}, x^{k−6}, x^k] for any value of k. The experiment has been done as follows. Initially, 3000 data samples were collected for k ∈ [201, 3200] and used as inputs of the evolving learning procedure of all models used for comparison. Next, 500 data samples for k ∈ [5001, 5500] were collected to verify the performance after evolving. The performance was evaluated using the NDEI and compared with evolving fuzzy approaches. Fig. 14 shows the forecast result and Fig. 15 depicts the evolution of the number of fuzzy rules during the learning stage. During learning, several rule merges occurred. The eMG started clustering with the first observation, and the parameters were set as λ = 0.05, w = 30, Σinit = 10⁻² I₄ and α = 0.01. Table III summarizes the performance of eMG against evolving fuzzy models. The parameters of the eTS model were set to r = 0.4 and Ω = 750. The xTS [41] has Ω = 750. The results for DENFIS [17] and FLEXFIS [19] were taken from the respective references. From Table III, one can note that eMG with Σinit = 10⁻² I₄ achieves better performance than eTS and xTS with approximately the same number of fuzzy rules. With the initial dispersion matrix set to Σinit = 10⁻³ I₄ and w = 20, eMG outperforms the remaining three models with a similar or simpler structure.
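For reference, series such as (28) are usually generated numerically. A minimal Euler-integration sketch follows; the step size, the constant initial history, and the unit-time sampling are our assumptions, since the paper does not state its integration scheme:

import numpy as np

def mackey_glass(n, tau=17, dt=0.1, x0=1.2):
    # Euler integration of Eq. (28); returns n samples at unit time intervals
    spu = int(round(1.0 / dt))       # integration steps per unit of time
    delay = int(round(tau / dt))     # number of delayed steps for x(t - tau)
    x = np.full(n * spu + delay, x0)  # constant initial history x(t) = x0
    for t in range(delay, len(x) - 1):
        x[t + 1] = x[t] + dt * (0.2 * x[t - delay]
                                / (1.0 + x[t - delay] ** 10) - 0.1 * x[t])
    return x[delay::spu][:n]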


Fig. 14. Mackey-Glass time series forecasting

Fig. 15. Number of rules for the Mackey-Glass time series forecast during learning

TABLE III
PERFORMANCE OF MACKEY-GLASS TIME SERIES FORECASTING METHODS

Model name and source              Number of rules    NDEI
eTS [15]                           9                  0.372
xTS [41]                           10                 0.331
eMG (Σinit = 10⁻² I₄, w = 30)      9                  0.321
DENFIS [17]                        58                 0.276
FLEXFIS Var A [19]                 69                 0.206
FLEXFIS Var B [19]                 89                 0.157
eMG (Σinit = 10⁻³ I₄, w = 20)      58                 0.139

C. Nonlinear System Identification

In this section, the eMG model is compared with alternative evolving fuzzy modeling approaches using a classic nonlinear system identification problem. The nonlinear system to be identified is the following:

$$y^k = \frac{y^{k-1} y^{k-2} (y^{k-1} - 0.5)}{1 + (y^{k-1})^2 + (y^{k-2})^2} + u^{k-1} \qquad (29)$$

where u^k = sin(2πk/25) and y⁰ = y¹ = 0. The aim is to predict the current output based on past inputs and outputs. The model for this data set is of the form:

$$\hat{y}^k = f(y^{k-1}, y^{k-2}, u^{k-1}) \qquad (30)$$

where ŷ^k is the output.
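Generating data from the plant (29) is a direct iteration; a sketch (ours, with illustrative names):

import numpy as np

def simulate_plant(n):
    # Iterate Eq. (29) with u^k = sin(2*pi*k/25) and y^0 = y^1 = 0
    y = np.zeros(n)
    for k in range(2, n):
        u = np.sin(2.0 * np.pi * (k - 1) / 25.0)   # u^{k-1}
        y[k] = (y[k - 1] * y[k - 2] * (y[k - 1] - 0.5)
                / (1.0 + y[k - 1] ** 2 + y[k - 2] ** 2) + u)
    return y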

The experiment was performed as follows. First, 5000 samples were created for evolving learning, and then an additional 200 samples were created to evaluate the performance after evolving, keeping the model structure and parameters fixed. Performance was evaluated using the RMSE and compared with other evolving fuzzy approaches. Fig. 16 shows the result and Fig. 17 the evolution of the number of fuzzy rules. The model structure was defined after k = 175, and no rule merging was performed. The eMG starts clustering with the first observation. The parameters are λ = 0.05, w = 40, Σinit = 10⁻¹ I₃ and α = 0.01.

Fig. 16. Nonlinear system identification

Fig. 17. Number of rules for the nonlinear system identification during learning


TABLE IV
PERFORMANCE OF NONLINEAR SYSTEM IDENTIFICATION METHODS

Model name and source    Number of rules    RMSE
SAFIS [23]               17                 0.0221
SOFMLS [22]              5                  0.0201
FLEXFIS Var A [19]       5                  0.0176
FLEXFIS Var B [19]       8                  0.0171
xTS [41]                 5                  0.0063
eMG                      5                  0.0058

Table IV summarizes the performance of eMG against representative evolving models. The xTS [41] parameter was set to Ω = 750. The results for SAFIS [23], FLEXFIS [19], and SOFMLS [22] were taken from the respective references. Table IV suggests that eMG performs better than the first four models and has performance similar to that of xTS [41].

D. High Dimensional System Identification Problem

The performance of the proposed model was also evaluated using a high dimensional system identification problem. This experiment aims to evaluate how the proposed model behaves when dealing with higher dimensional input spaces. The nonlinear system to be identified is:

$$y^k = \frac{\sum_{i=1}^{m} y^{k-i}}{1 + \sum_{i=1}^{m} (y^{k-i})^2} + u^{k-1} \qquad (31)$$

where u^k = sin(2πk/20), y^j = 0 for j = 1, ..., m, and m = 10. The purpose is to predict the current output using past inputs and outputs. The model for this data set is of the form:

$$\hat{y}^k = f(y^{k-1}, y^{k-2}, \ldots, y^{k-10}, u^{k-1}) \qquad (32)$$

where ŷ^k is the model output.

Fig. 18. High dimensional nonlinear system identification

Fig. 19. Number of rules for the high dimensional nonlinear system identification during learning

The experiment has been done as follows. The first 3000 samples were created for learning and the next 300 samples were used to evaluate performance, keeping the model structure and parameters fixed after evolving. Performance was evaluated using the RMSE and compared with representative evolving fuzzy approaches. Fig. 18 shows the result and Fig. 19 shows how the fuzzy rules evolve during learning. The model structure was learned during the first 1000 steps. Three rule merges occurred during learning. As before, the eMG started clustering with the first observation. The parameters adopted were λ = 0.05, w = 25, Σinit = 10⁻¹ I₁₁ and α = 0.01. Table V summarizes the performance of eMG, eTS [15] (r = 0.6, Ω = 750), xTS [41] (Ω = 750), and FLEXFIS [19]. Table V suggests that eMG performs best among all the models, with a smaller or similar number of rules. With the initial dispersion matrix set to Σinit = 2 × 10⁻¹ I₁₁, eMG outperforms xTS with the same number of fuzzy rules.

TABLE V
PERFORMANCE OF HIGH DIMENSIONAL NONLINEAR SYSTEM IDENTIFICATION METHODS

Model name and source           Number of rules    RMSE
xTS [41]                        9                  0.0331
eMG (Σinit = 2 × 10⁻¹ I₁₁)      9                  0.0288
FLEXFIS Var A [19]              15                 0.0085
eTS [15]                        14                 0.0075
eMG (Σinit = 10⁻¹ I₁₁)          13                 0.0050

VI. CONCLUSION

This paper has introduced a new method to develop evolving fuzzy models, emphasizing functional Takagi-Sugeno models within the framework of multivariable Gaussian membership functions. The method uses a recursive clustering algorithm and is capable of continually modifying the cluster structure whenever new data samples become available. It can be used both off-line and in on-line, real-time environments. The clustering algorithm uses the concept of participatory learning and is robust to noisy data and outliers, because it smooths their effect during clustering. The clustering algorithm recursively estimates the cluster centers and the respective cluster dispersion matrices. Rule antecedents use multivariable Gaussian membership functions whose parameters are extracted directly from


the clusters. Multivariable Gaussian membership functions were adopted because they preserve information about input variables interactions, estimated by the clustering algorithm through the dispersion matrix. The method was evaluated using time series forecasting and nonlinear system identification problems. The experiments performed and their results suggest that the method introduced here is a promising alternative to build adaptive functional fuzzy models. Future work shall address the use of neural like structures and evolving procedures using fuzzy neuron models based on generalizations of triangular norms and conorms. ACKNOWLEDGEMENT The authors thank the Brazilian National Research Council, CNPq, for grants 141323/2009-4, 309666/2007-4 and 304857/2006-8, respectively. The second author also acknowledges the support of FAPEMIG, the Research Foundation of the State of Minas Gerais, for grant PPM-00252-09. Comments of the anonymous referees are also kindly acknowledged. R EFERENCES [1] E. H. Mamdani and S. Assilian, “An experimenti in linguistic synthesis with a fuzzy logic controller,” Int. J. Man-Machine Studies, vol. 7, no. 1, pp. 1–13, 1975. [2] J.-S. Jang, “Anfis: adaptive-network-based fuzzy inference system,” IEEE Transactions on Systems Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=256541 [3] S. L. Chiu, “Fuzzy model identification based on cluster estimation,” J. Intell. Fuzzy Syst., vol. 2, pp. 267–278, 1994. [4] R. Yager and D. Filev, “Approximate clustering via the mountain method,” IEEE Transactions on Systems Man and Cybernetics, vol. 24, no. 8, pp. 1279–1284, 1994. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=299710 [5] N. K. Kasabov, Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering. Cambridge, MA, USA: MIT Press, 1996. [6] J.-S. R. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, 1st ed. Prentice-Hall, 1997. [7] P. Angelov and R. Guthke, “A genetic-algorithm-based approach to optimization of bioprocesses described by fuzzy rules,” Bioprocess Engineering, vol. 16, no. 5, p. 299, 1997. [Online]. Available: http://www.springerlink.com/index/10.1007/s004490050326 [8] R. Babuska, Fuzzy Modeling for Control. Kluwer, 1998. [9] L.-X. Wang and J. Mendel, “Generating fuzzy rules by learning from examples,” IEEE Transactions on Systems Man and Cybernetics, vol. 22, no. 6, pp. 1414–1427, 1992. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=199466 [10] P. Angelov, D. Filev, and N. Kasabov, “Guest editorial evolving fuzzy systems - preface to the special section,” IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, pp. 1390–1392, Dec 2008. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4712536 [11] N. Kasabov and D. Filev, “Evolving intelligent systems: Methods, learning, applications.” IEEE, Sep 2006, pp. 8–18. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4016749 [12] J. V. d. Oliveira and W. Pedrycz, Advances in Fuzzy Clustering and its Applications. New York, NY, USA: John Wiley & Sons, Inc., 2007. [13] P. C. Young, Recursive Estimation and Time-Series Analysis: An Introduction. Springer-Verlag, 1984. [14] L. Ljung, System Identification. Prentice-Hall, 1999. [15] P. Angelov and D. 
ACKNOWLEDGEMENT

The authors thank the Brazilian National Research Council, CNPq, for grants 141323/2009-4, 309666/2007-4 and 304857/2006-8, respectively. The second author also acknowledges the support of FAPEMIG, the Research Foundation of the State of Minas Gerais, for grant PPM-00252-09. The comments of the anonymous referees are also gratefully acknowledged.

REFERENCES

[1] E. H. Mamdani and S. Assilian, “An experiment in linguistic synthesis with a fuzzy logic controller,” Int. J. Man-Machine Studies, vol. 7, no. 1, pp. 1–13, 1975.
[2] J.-S. Jang, “ANFIS: adaptive-network-based fuzzy inference system,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
[3] S. L. Chiu, “Fuzzy model identification based on cluster estimation,” J. Intell. Fuzzy Syst., vol. 2, pp. 267–278, 1994.
[4] R. Yager and D. Filev, “Approximate clustering via the mountain method,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1279–1284, 1994.
[5] N. K. Kasabov, Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering. Cambridge, MA, USA: MIT Press, 1996.
[6] J.-S. R. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, 1st ed. Prentice-Hall, 1997.
[7] P. Angelov and R. Guthke, “A genetic-algorithm-based approach to optimization of bioprocesses described by fuzzy rules,” Bioprocess Engineering, vol. 16, no. 5, p. 299, 1997.
[8] R. Babuska, Fuzzy Modeling for Control. Kluwer, 1998.
[9] L.-X. Wang and J. Mendel, “Generating fuzzy rules by learning from examples,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 6, pp. 1414–1427, 1992.
[10] P. Angelov, D. Filev, and N. Kasabov, “Guest editorial: evolving fuzzy systems - preface to the special section,” IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, pp. 1390–1392, Dec 2008.
[11] N. Kasabov and D. Filev, “Evolving intelligent systems: methods, learning, applications,” IEEE, Sep 2006, pp. 8–18.
[12] J. V. de Oliveira and W. Pedrycz, Eds., Advances in Fuzzy Clustering and its Applications. New York, NY, USA: John Wiley & Sons, 2007.
[13] P. C. Young, Recursive Estimation and Time-Series Analysis: An Introduction. Springer-Verlag, 1984.
[14] L. Ljung, System Identification. Prentice-Hall, 1999.
[15] P. Angelov and D. Filev, “An approach to online identification of Takagi-Sugeno fuzzy models,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 34, no. 1, pp. 484–498, Feb 2004.
[16] ——, “Simpl_eTS: a simplified method for learning evolving Takagi-Sugeno fuzzy models,” IEEE, 2005, pp. 1068–1073.

[17] N. K. Kasabov and Q. Song, “DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction,” IEEE Trans. Fuzzy Syst., vol. 10, no. 2, pp. 144–154, Apr 2002.
[18] C. F. Juang and C. T. Lin, “An online self-constructing neural fuzzy inference network and its applications,” IEEE Trans. Fuzzy Syst., vol. 6, no. 1, pp. 12–32, Feb 1999.
[19] E. D. Lughofer, “FLEXFIS: a robust incremental learning approach for evolving Takagi-Sugeno fuzzy models,” IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, pp. 1393–1410, Dec 2008.
[20] R. M. Gray, “Vector quantization,” IEEE ASSP Mag., pp. 4–29, Apr 1984.
[21] E. D. Lughofer, “Extensions of vector quantization for incremental clustering,” Pattern Recognition, vol. 41, no. 3, pp. 995–1011, Mar 2008.
[22] J. de J. Rubio, “SOFMLS: online self-organizing fuzzy modified least-squares network,” IEEE Transactions on Fuzzy Systems, vol. 17, no. 6, pp. 1296–1309, Dec 2009.
[23] H. Rong, N. Sundararajan, G. Huang, and P. Saratchandran, “Sequential adaptive fuzzy inference system (SAFIS) for nonlinear system identification and prediction,” Fuzzy Sets and Systems, vol. 157, no. 9, pp. 1260–1275, May 2006.
[24] G. Leng, T. M. McGinnity, and G. Prasad, “An approach for on-line extraction of fuzzy rules using a self-organising fuzzy neural network,” Fuzzy Sets Syst., vol. 150, no. 2, pp. 211–243, 2005.
[25] P. P. Angelov and X. Zhou, “Evolving fuzzy-rule-based classifiers from data streams,” IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, pp. 1462–1475, Dec 2008.
[26] D. Filev and P. Angelov, “Algorithms for real-time clustering and generation of rules from data,” in Advances in Fuzzy Clustering and its Applications, J. V. de Oliveira and W. Pedrycz, Eds. New York, NY, USA: John Wiley & Sons, 2007.
[27] D. C. Montgomery, Introduction to Statistical Quality Control. Wiley, 2001.
[28] E. Lima, M. Hell, R. Ballini, and F. Gomide, “Evolving fuzzy modeling using participatory learning,” in Evolving Intelligent Systems: Methodology and Applications, P. Angelov, D. Filev, and N. Kasabov, Eds. Wiley-Interscience/IEEE Press, 2010.
[29] R. Yager, “A model of participatory learning,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, no. 5, pp. 1229–1234, 1990.
[30] L. Silva, F. Gomide, and R. Yager, “Participatory learning in fuzzy clustering,” IEEE, 2005, pp. 857–861.
[31] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.
[32] W. Pedrycz and F. Gomide, Fuzzy Systems Engineering: Toward Human-Centric Computing. NJ, USA: Wiley-Interscience, 2007.
[33] R. Lippmann, “A critical overview of neural network pattern classifiers,” in Proceedings of the 1991 IEEE Workshop on Neural Networks for Signal Processing, 1991, pp. 266–275.
[34] L.-X. Wang and J. Mendel, “Fuzzy basis functions, universal approximation, and orthogonal least-squares learning,” IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 807–814, 1992.
[35] V. Kreinovich, G. C. Mouzouris, and H. Nguyen, “Fuzzy rule-based modeling as universal approximation tool,” in Fuzzy Systems: Modeling and Control, H. Nguyen and M. Sugeno, Eds. Boston, MA: Kluwer Academic, 1998, pp. 135–195.
[36] E. Kim, M. Park, S. Kim, and M. Park, “A transformed input-domain approach to fuzzy modeling,” IEEE Trans. Fuzzy Syst., vol. 6, no. 4, pp. 596–604, Nov 1998.
[37] J. Abonyi, R. Babuska, and F. Szeifert, “Modified Gath-Geva fuzzy clustering for identification of Takagi-Sugeno fuzzy models,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 32, no. 5, pp. 612–621, Oct 2002.
[38] R. R. Yager, “Extending the participatory learning paradigm to include source credibility,” Fuzzy Optimization and Decision Making, vol. 6, no. 2, pp. 85–97, Jun 2007.



[39] A. Papoulis, Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1984.
[40] K. Astrom and B. Wittenmark, Adaptive Systems, 1st ed. USA: Addison-Wesley, 1988.
[41] P. Angelov and X. Zhou, “Evolving fuzzy systems from data streams in real-time,” IEEE, Sep 2006, pp. 29–35.
[42] S. Rahman and O. Hazim, “A generalized knowledge-based short-term load-forecasting technique,” IEEE Transactions on Power Systems, vol. 8, no. 2, pp. 508–514, May 1993.
[43] G. E. P. Box and G. Jenkins, Time Series Analysis, Forecasting and Control. Holden-Day, 1990.
[44] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. Wiley-Interscience, 2000.
[45] M. C. Mackey and L. Glass, “Oscillation and chaos in physiological control systems,” Science, vol. 197, pp. 287–289, 1977.

Andre Lemos received the B.Sc. degree in computer science in 2002 and the M.Sc. degree in electrical engineering in 2007, both from the Federal University of Minas Gerais, Belo Horizonte, Brazil. He is currently a Ph.D. student in the Department of Electrical Engineering of the Federal University of Minas Gerais. His current research interests include adaptive and evolving intelligent systems, machine learning, and applications in time series forecasting, fault detection and diagnosis, and nonlinear systems modeling.

Fernando Gomide received the B.Sc. degree in electrical engineering from the Polytechnic Institute of the Pontifical Catholic University of Minas Gerais (IPUC, PUC-MG), Belo Horizonte, Brazil, the M.Sc. degree in electrical engineering from the University of Campinas (Unicamp), Campinas, Brazil, and the Ph.D. degree in systems engineering from Case Western Reserve University (CWRU), Cleveland, Ohio, USA. He has been a professor in the Department of Computer Engineering and Automation (DCA), Faculty of Electrical and Computer Engineering (FEEC), University of Campinas, since 1983. His interest areas include fuzzy systems, neural and evolutionary computation, modeling, control and optimization, logistics, multi-agent systems, decision-making, and applications. He is a past vice-president and past Secretary of IFSA (International Fuzzy Systems Association) and a former member of the editorial boards of the IEEE Transactions on SMC-B and Fuzzy Sets and Systems. Currently he serves on the board of NAFIPS (North American Fuzzy Information Processing Society) and on the editorial boards of the IEEE Transactions on SMC-A, Fuzzy Optimization and Decision Making, International Journal of Fuzzy Systems, and Mathware and Soft Computing. He is a past editor of Controle & Automação, the journal of the Brazilian Society for Automatics (SBA), the Brazilian National Member Organization of IFAC and IFSA. He is on the Advisory Board of the International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Journal of Advanced Computational Intelligence, and Intelligent Automation and Soft Computing. He is a senior member of the IEEE, a member of NAFIPS and EUSFLAT, and an IFSA Fellow. He also serves on the IEEE Task Force on Adaptive Fuzzy Systems and the IEEE Emergent Technology Technical Committee.

Walmir Caminhas received the B.Sc. degree in electrical engineering from the Federal University of Minas Gerais in 1987, the M.Sc. degree in electrical engineering from the Federal University of Minas Gerais in 1989, and the Ph.D. degree in electrical engineering from the University of Campinas in 1997. He is currently an associate professor in the Department of Electronics Engineering at the Federal University of Minas Gerais, Belo Horizonte, Brazil. His research interests include computational intelligence and fault detection in dynamic systems.
