Estimating Fuzzy Membership Functions Parameters by the Levenberg-Marquardt Algorithm

J. Botzheim
Department of Telecommunication and Media Informatics, Budapest University of Technology and Economics, Hungary
Email: [email protected]

C. Cabrita
Centre for Intelligent Systems, University of Algarve, Portugal
Email: [email protected]

L. T. Kóczy
Department of Telecommunication and Media Informatics, Budapest University of Technology and Economics, Hungary; Institute of Information Technology and Electrical Engineering, Széchenyi István University, Győr, Hungary
Email: [email protected]

A. E. Ruano
Centre for Intelligent Systems, University of Algarve, Portugal
Email: [email protected]

Abstract – In our previous papers, fuzzy model identification methods were discussed: a bacterial algorithm for extracting a fuzzy rule base from a training set was presented, and the Levenberg-Marquardt algorithm was proposed for determining membership functions in fuzzy systems. In this paper the Levenberg-Marquardt technique is improved so that it optimises the membership functions in the fuzzy rules without requiring a Ruspini partition. The class of membership functions investigated is the trapezoidal one, as it is general enough and widely used; the method can easily be extended to arbitrary piecewise linear functions as well.

I. INTRODUCTION

One of the crucial problems of fuzzy rule based modeling is how to find an optimal, or at least quasi-optimal, rule base for a given system. In most applications there is no human expert, so some automatic method to determine the fuzzy rule base must be employed. Some of these methods were inspired by evolutionary processes that can be found in nature. Apart from the evolutionary methods, there are training algorithms known in the area of neural networks, and these can be applied to fuzzy systems as well. They are useful when only numerical data about the problem are available: neural networks can learn these data and have the capability to generalise. In this paper the Levenberg-Marquardt algorithm [2,4], one of the best neural network training algorithms, is improved in order to extract trapezoidal membership functions from a pattern set. The paper is organised as follows. Section II describes the structure of the fuzzy systems used in this paper. The training algorithm is presented in Section III, and simulation results in Section IV. Section V draws some conclusions.

II. FUZZY SYSTEMS
It is assumed that the model consists of a rule base:

$$R = \{R_i\}, \qquad (1)$$

$$R_i: \text{IF } (x_1 \text{ is } A_{i1}) \text{ AND } (x_2 \text{ is } A_{i2}) \text{ AND } \ldots \text{ AND } (x_n \text{ is } A_{in}) \text{ THEN } (y \text{ is } B_i), \qquad (2)$$

where $R_i$ are the fuzzy rules, $A_{ij}$ is the fuzzy set of the $j$-th antecedent in the $i$-th rule, $B_i$ is the fuzzy set of the $i$-th consequent, $x_j$ is the input variable in the $j$-th dimension, and $y$ is the output. The fuzzy system computes the output $y$ for an input vector $x$. The task is to create the most suitable fuzzy rule base for a given pattern set. In the following, the coding scheme and the inference method used in the training algorithm will be detailed.

In this paper the fuzzy rules need not form a Ruspini partition, unlike in our previous paper [4]. In this case the task can be solved with fewer rules, because no restriction is prescribed for the position of the membership functions in the fuzzy rules. The class of membership functions is restricted to trapezoidal, as it is general enough and widely used. Each membership function is described by four parameters, the four breakpoints of the trapezium, and is identified by the two indices $i$ and $j$: the membership function $A_{ij}(a_{ij}, b_{ij}, c_{ij}, d_{ij})$ belongs to the $i$-th rule and the $j$-th input variable, and $B_i(a_i, b_i, c_i, d_i)$ is the output membership function of the $i$-th rule. The relative importance of the $j$-th fuzzy variable in the $i$-th rule is:

$$\mu_{ij}(x_j) = \frac{x_j - a_{ij}}{b_{ij} - a_{ij}}\, N_{i,j,1}(x_j) + N_{i,j,2}(x_j) + \frac{d_{ij} - x_j}{d_{ij} - c_{ij}}\, N_{i,j,3}(x_j) \qquad (3)$$
where $a_{ij} \leq b_{ij} \leq c_{ij} \leq d_{ij}$ must hold and:

$$N_{i,j,1}(x_j) = \begin{cases} 1, & \text{if } x_j \in [a_{ij}, b_{ij}) \\ 0, & \text{otherwise} \end{cases} \qquad
N_{i,j,2}(x_j) = \begin{cases} 1, & \text{if } x_j \in [b_{ij}, c_{ij}] \\ 0, & \text{otherwise} \end{cases} \qquad
N_{i,j,3}(x_j) = \begin{cases} 1, & \text{if } x_j \in (c_{ij}, d_{ij}] \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
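Equations (3)-(4) amount to evaluating a trapezoid segment by segment. A minimal sketch in Python (the function name and the half-open interval boundaries are our own conventions; the paper does not fix where the segment boundaries fall):

```python
def trapezoid_mu(x, a, b, c, d):
    """Membership value of Eq. (3): rising edge on [a, b), plateau on
    [b, c], falling edge on (c, d]; zero outside [a, d].
    Assumes a <= b <= c <= d, as required after Eq. (3)."""
    if x < a or x > d:
        return 0.0
    if x < b:                     # N_{i,j,1} segment is active
        return (x - a) / (b - a)
    if x <= c:                    # N_{i,j,2} segment (plateau)
        return 1.0
    return (d - x) / (d - c)      # N_{i,j,3} segment is active
```

Note that the indicator functions $N_{i,j,1}$, $N_{i,j,2}$, $N_{i,j,3}$ partition the support, so exactly one branch fires for any $x$ inside $[a, d]$.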
Besides the structure of the fuzzy rule base, the inference method must also be discussed. The activation degree of the $i$-th rule (the t-norm is the minimum) is:

$$w_i = \min_{j=1}^{n} \mu_{ij}(x_j) \qquad (5)$$

where $n$ is the number of input dimensions, $w_i$ is the importance of the $i$-th rule for the input vector $x$, and $\mu_{ij}(x_j)$ is the $j$-th membership function in the $i$-th rule. The $i$-th output membership function is cut at the height $w_i$. Then the output is calculated with the COG (centre of gravity) defuzzification method:
$$y(x) = \frac{\displaystyle\sum_{i=1}^{R} \int_{y \in \operatorname{supp} \mu_i(y)} y\, \mu_i(y)\, dy}{\displaystyle\sum_{i=1}^{R} \int_{y \in \operatorname{supp} \mu_i(y)} \mu_i(y)\, dy} \qquad (6)$$

where $R$ is the number of rules. If this defuzzification method is used, the integrals can be easily computed. In (6), $y(x)$ becomes:

$$y(x) = \frac{1}{3} \cdot \frac{\displaystyle\sum_{i=1}^{R} (C_i + D_i + E_i)}{\displaystyle\sum_{i=1}^{R} \left[ 2 w_i (d_i - a_i) + w_i^2 (c_i + a_i - d_i - b_i) \right]} \qquad (7)$$

where

$$C_i = 3 w_i (d_i^2 - a_i^2)(1 - w_i), \qquad D_i = 3 w_i^2 (c_i d_i - a_i b_i),$$
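The closed form (7) can be checked against known centroids; a sketch in Python (the function name and the list-of-tuples layout for the consequents are our own):

```python
def cog_output(consequents, w):
    """Closed-form COG output of Eq. (7). `consequents` holds the output
    trapezoids B_i = (a_i, b_i, c_i, d_i); w[i] is the activation degree
    at which B_i is cut."""
    num = 0.0
    den = 0.0
    for (a, b, c, d), wi in zip(consequents, w):
        # C_i, D_i, E_i as defined under Eq. (7)
        Ci = 3.0 * wi * (d * d - a * a) * (1.0 - wi)
        Di = 3.0 * wi * wi * (c * d - a * b)
        Ei = wi ** 3 * (c - d + a - b) * (c - d - a + b)
        num += Ci + Di + Ei
        den += 2.0 * wi * (d - a) + wi * wi * (c + a - d - b)
    return num / (3.0 * den)
```

As a sanity check, a single symmetric consequent (e.g. the triangle $(0, 1, 1, 2)$) yields the centroid $1.0$ for any cut height, as symmetry demands.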
$$E_i = w_i^3 (c_i - d_i + a_i - b_i)(c_i - d_i - a_i + b_i)$$

III. THE TRAINING ALGORITHM

As pointed out above, it is assumed that the structure of the system has already been determined. The number of rules is $R$, and it is a parameter of the training algorithm. The number of input dimensions depends on the given problem and is denoted by $n$. The task is to determine the breakpoints of the membership functions. For that purpose, a minimisation criterion related only to the quality of the fitting will be employed: the usual Sum-of-the-Square-of-the-Errors (SSE):

$$\Omega = \frac{\left\| t - y \right\|_2^2}{2} \qquad (8)$$

where $t$ stands for the target vector, $y$ for the output vector, $e$ for the error vector, and $\|\cdot\|_2$ denotes the 2-norm. It will be assumed that there are $m$ patterns in the training set.

The most used method to minimise (8) is the Error-Back-Propagation (BP) [3] algorithm, which is a steepest descent algorithm. BP is a first-order method, as it only uses derivatives of the first order. If no line search is used, it has no guarantee of convergence, and the convergence rate obtained is usually very slow. If a second-order method is to be employed, the best algorithm to use is the Levenberg-Marquardt (LM) algorithm [2], which explicitly exploits the underlying sum-of-squares structure of the optimisation problem.

Denoting by $J$ the Jacobean matrix

$$J[k] = \frac{\partial y(x^{(p)})[k]}{\partial \mathrm{par}[k]} \qquad (9)$$

where the vector $\mathrm{par}$ contains all the membership function parameters (all breakpoints of the membership functions), and $k$ is the iteration variable, the LM update $s[k] = \mathrm{par}[k] - \mathrm{par}[k-1]$ is given as the solution of

$$\left( J^T[k]\, J[k] + \alpha I \right) s[k] = -J^T[k]\, e[k] \qquad (10)$$

In (10), $\alpha$ is a regularisation parameter, which controls both the search direction and the magnitude of the update. The search direction varies between the Gauss-Newton direction and the steepest descent direction, according to the value of $\alpha$. This is dependent on how well the actual criterion agrees with a quadratic function in a particular neighbourhood.
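Equation (10) is a damped normal-equations solve. A minimal sketch in Python (numpy assumed; the function name is ours, and the adaptation of $\alpha$ between iterations is left out):

```python
import numpy as np

def lm_step(J, e, alpha):
    """One Levenberg-Marquardt update per Eq. (10):
    solve (J^T J + alpha * I) s = -J^T e for the parameter update s."""
    n = J.shape[1]                       # number of parameters
    A = J.T @ J + alpha * np.eye(n)      # damped Gauss-Newton matrix
    return np.linalg.solve(A, -J.T @ e)
```

In a full implementation, $\alpha$ would typically be decreased after a successful step (toward the Gauss-Newton direction) and increased after a failed one (toward steepest descent), which is exactly the trade-off described above.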
The good results presented by the LM method (compared with other second-order methods, such as the quasi-Newton and conjugate gradient methods) are due to the explicit exploitation by the training algorithm of the underlying characteristics of the optimisation problem (a sum of squared errors). Notice that (10) can be recast as:

$$s[k] = -\begin{bmatrix} J[k] \\ \sqrt{\alpha}\, I \end{bmatrix}^{+} \begin{bmatrix} e[k] \\ 0 \end{bmatrix} \qquad (11)$$

The complexity of this operation is of $O(n^3)$, where $n$ is the number of columns of $J$.
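The equivalence between the normal-equations form (10) and the stacked least-squares form (11) can be verified numerically; a sketch in Python (numpy assumed, function name ours; the $\sqrt{\alpha}$ scaling is what makes the two forms agree):

```python
import numpy as np

def lm_step_recast(J, e, alpha):
    """LM update via the stacked least-squares form of Eq. (11):
    s = -[J; sqrt(alpha) I]^+ [e; 0]."""
    n = J.shape[1]
    A = np.vstack([J, np.sqrt(alpha) * np.eye(n)])   # stacked system
    rhs = -np.concatenate([e, np.zeros(n)])
    s, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return s
```

Solving the stacked system by QR is numerically preferable to forming $J^T J$ explicitly, at the same $O(n^3)$ asymptotic cost noted above.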
A. Jacobean computation

Irrespective of the method used and of the training criterion employed, the Jacobean matrix with respect to the parameters in the rules has to be computed. This will be done, as shown below, on a pattern-by-pattern basis. Eq. (9) can be written as follows:

$$J^{(p)} = \left[ \frac{\partial y(x^{(p)})}{\partial a_{11}} \;\; \frac{\partial y(x^{(p)})}{\partial b_{11}} \;\; \cdots \;\; \frac{\partial y(x^{(p)})}{\partial a_{12}} \;\; \cdots \;\; \frac{\partial y(x^{(p)})}{\partial d_1} \;\; \cdots \;\; \frac{\partial y(x^{(p)})}{\partial d_R} \right] \qquad (12)$$

where

$$\frac{\partial y(x^{(p)})}{\partial a_{ij}} = \frac{\partial y}{\partial w_i} \frac{\partial w_i}{\partial \mu_{ij}} \frac{\partial \mu_{ij}}{\partial a_{ij}}, \quad
\frac{\partial y(x^{(p)})}{\partial b_{ij}} = \frac{\partial y}{\partial w_i} \frac{\partial w_i}{\partial \mu_{ij}} \frac{\partial \mu_{ij}}{\partial b_{ij}}, \quad
\frac{\partial y(x^{(p)})}{\partial c_{ij}} = \frac{\partial y}{\partial w_i} \frac{\partial w_i}{\partial \mu_{ij}} \frac{\partial \mu_{ij}}{\partial c_{ij}}, \quad
\frac{\partial y(x^{(p)})}{\partial d_{ij}} = \frac{\partial y}{\partial w_i} \frac{\partial w_i}{\partial \mu_{ij}} \frac{\partial \mu_{ij}}{\partial d_{ij}} \qquad (13)$$

From (5) it can be seen that $w_i$ depends on the membership functions, and each membership function depends only on four parameters (breakpoints). So the derivative of $w_i$ is:

$$\frac{\partial w_i}{\partial \mu_{ij}} = \begin{cases} 1, & \text{if } \mu_{ij} = \min\limits_{k=1}^{n} \mu_{ik} \\ 0, & \text{otherwise} \end{cases} \qquad (14)$$

The derivatives of the membership functions are calculated as follows:

$$\frac{\partial \mu_{ij}}{\partial a_{ij}} = \frac{x_j^{(p)} - b_{ij}}{(b_{ij} - a_{ij})^2}\, N_{i,j,1}(x_j^{(p)}), \quad
\frac{\partial \mu_{ij}}{\partial b_{ij}} = \frac{a_{ij} - x_j^{(p)}}{(b_{ij} - a_{ij})^2}\, N_{i,j,1}(x_j^{(p)}), \quad
\frac{\partial \mu_{ij}}{\partial c_{ij}} = \frac{d_{ij} - x_j^{(p)}}{(d_{ij} - c_{ij})^2}\, N_{i,j,3}(x_j^{(p)}), \quad
\frac{\partial \mu_{ij}}{\partial d_{ij}} = \frac{x_j^{(p)} - c_{ij}}{(d_{ij} - c_{ij})^2}\, N_{i,j,3}(x_j^{(p)}) \qquad (15)$$

The derivative $\partial y / \partial w_i$ and the derivatives with respect to the output membership function parameters also have to be computed. From (7) the following can be written:

$$\frac{\partial y}{\partial *_i} = \frac{1}{3} \cdot \frac{\mathrm{den} \dfrac{\partial F_i}{\partial *_i} - \mathrm{num} \dfrac{\partial G_i}{\partial *_i}}{(\mathrm{den})^2} \qquad (16)$$

where $*_i \in \{w_i, a_i, b_i, c_i, d_i\}$; $\mathrm{den}$ is the denominator and $\mathrm{num}$ the numerator of (7), respectively; $F_i$ is the $i$-th member of the sum in the numerator, and $G_i$ is the $i$-th member in the denominator. The derivatives are given as follows:

$$\begin{aligned}
\frac{\partial F_i}{\partial a_i} &= -6 w_i a_i + 6 w_i^2 a_i - 3 w_i^2 b_i - 2 w_i^3 (a_i - b_i) \\
\frac{\partial F_i}{\partial b_i} &= -3 w_i^2 a_i + 2 w_i^3 (a_i - b_i) \\
\frac{\partial F_i}{\partial c_i} &= 3 w_i^2 d_i - 2 w_i^3 (d_i - c_i) \\
\frac{\partial F_i}{\partial d_i} &= 6 w_i d_i - 6 w_i^2 d_i + 3 w_i^2 c_i + 2 w_i^3 (d_i - c_i) \\
\frac{\partial F_i}{\partial w_i} &= 3 (d_i^2 - a_i^2)(1 - 2 w_i) + 6 w_i (c_i d_i - a_i b_i) + 3 w_i^2 \left[ (c_i - d_i)^2 - (a_i - b_i)^2 \right] \\
\frac{\partial G_i}{\partial a_i} &= -2 w_i + w_i^2 \\
\frac{\partial G_i}{\partial b_i} &= -w_i^2 \\
\frac{\partial G_i}{\partial c_i} &= w_i^2 \\
\frac{\partial G_i}{\partial d_i} &= 2 w_i - w_i^2 \\
\frac{\partial G_i}{\partial w_i} &= 2 (d_i - a_i) + 2 w_i (c_i + a_i - d_i - b_i)
\end{aligned} \qquad (17)$$
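The breakpoint derivatives of (15) can be coded directly; a sketch in Python (the function name is ours, and the convention that only the segment containing $x_j$ contributes mirrors the indicator functions of Eq. (4)):

```python
def d_mu_d_breakpoints(x, a, b, c, d):
    """Partial derivatives of the trapezoidal membership value with
    respect to its four breakpoints, per Eq. (15).
    Returns (dmu/da, dmu/db, dmu/dc, dmu/dd)."""
    da = db = dc = dd = 0.0
    if a <= x < b:                        # rising edge: N_{i,j,1} = 1
        da = (x - b) / (b - a) ** 2
        db = (a - x) / (b - a) ** 2
    elif c < x <= d:                      # falling edge: N_{i,j,3} = 1
        dc = (d - x) / (d - c) ** 2
        dd = (x - c) / (d - c) ** 2
    return da, db, dc, dd
```

On the plateau $[b, c]$ the membership value is constant, so all four derivatives are zero there, which is why only rows of $J$ whose pattern falls on a sloped segment receive non-zero antecedent entries.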
The number of columns of $J$ is $4(n+1)R$.

IV. SIMULATION RESULTS

Two problems will be presented in this section.
pH problem. The aim of this example is to approximate the inverse of a titration-like curve. This type of non-linearity relates the pH (a measure of the activity of the hydrogen ions in a solution) to the concentration ($x$) of chemical substances.

Inverse Coordinate Transformation (ICT) problem. This example illustrates an inverse kinematic transformation between two Cartesian coordinates and one of the angles of a two-link manipulator. For a more complete description of the examples refer to [1].

The following termination criterion (with $\tau_f = 10^{-4}$) was used:

$$\Omega[k-1] - \Omega[k] < \theta[k], \qquad
\left\| \mathrm{par}[k-1] - \mathrm{par}[k] \right\| < \sqrt{\tau_f} \left( 1 + \left\| \mathrm{par}[k] \right\| \right), \qquad
\left\| g[k] \right\| \leq \sqrt[3]{\tau_f} \left( 1 + \Omega[k] \right) \qquad (18)$$

where $\theta[k] = \tau_f (1 + \Omega[k])$ and $g[k]$ is the gradient of $\Omega$ at iteration $k$.
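The three tests of (18) can be sketched as follows (numpy assumed; the function name, the choice to require all three conditions simultaneously, and the square/cube roots on $\tau_f$ follow the standard form of this criterion and are assumptions where the extraction is ambiguous):

```python
import numpy as np

def converged(omega_prev, omega, par_prev, par, g, tau_f=1e-4):
    """Evaluate the three termination tests of Eq. (18):
    small criterion decrease, small parameter change, small gradient."""
    theta = tau_f * (1.0 + omega)
    c1 = (omega_prev - omega) < theta
    c2 = np.linalg.norm(par_prev - par) < np.sqrt(tau_f) * (1.0 + np.linalg.norm(par))
    c3 = np.linalg.norm(g) <= tau_f ** (1.0 / 3.0) * (1.0 + omega)
    return bool(c1 and c2 and c3)
```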
Fig. 1. The Original Fuzzy System for the pH Problem

A. First Case

This first experiment shows the training capabilities of the Levenberg-Marquardt algorithm when optimising the fuzzy system, using both problems; for the sake of simplicity, only two membership function parameters are adjusted. These are the $\{b, c\}$ parameters belonging to the first input membership function of the first rule. We compare the performance of the LM method with the Back-Propagation (BP) method, which is the one usually applied for this purpose. The initial fuzzy system structures are composed of three rules and are represented in the first two figures. The local minimum of the performance surface for the pH problem is located at approximately $\{b, c\} = \{0.349, 0.800\}$, where the MSE value is 0.010. A local minimum of the performance surface for the ICT problem is located at approximately $\{b, c\} = \{0.128, 0.245\}$, where the MSE value is 0.865. Three different initial parameter positions are considered; the initial and final MSE and parameter values are shown in figures 1 to 6 and in Tables I and II. From the figures one can see that even a two-parameter performance surface features many local minima, located along the same flat region. Despite this, from Table I one can conclude that the LM is much faster than the BP, converging to any of the minima. For the ICT problem, we observe that, though not reaching the local minimum of the surface, both algorithms reach final positions with similar final MSE, close to the minima pointed out in the figures. The LM is still much faster than the BP.
Fig. 2. The Original Fuzzy System for the ICT Problem

Fig. 3. Performance of the LM Method for the pH Problem

Fig. 4. Performance of the BP Method for the pH Problem

Fig. 6. Performance of the BP Method for the ICT Problem
TABLE I
SUMMARY OF THE RESULTS, FOR THE PH PROBLEM

| Method | Initial point | Final point | Initial MSE | Final MSE | Iter |
|---|---|---|---|---|---|
| LM | {0.12, 0.20} | {0.355, 0.744} | 0.019 | 0.0100 | 6 |
| LM | {0.40, 0.50} | {0.387, 0.763} | 0.013 | 0.0100 | 4 |
| LM | {0.60, 0.65} | {0.355, 0.747} | 0.011 | 0.0100 | 8 |
| BP (α = 0.01) | {0.12, 0.20} | {0.278, 0.744} | 0.019 | 0.0100 | 86 |
| BP (α = 0.01) | {0.40, 0.50} | {0.398, 0.743} | 0.013 | 0.0100 | 38 |
| BP (α = 0.01) | {0.60, 0.65} | {0.355, 0.747} | 0.011 | 0.0100 | 8 |

TABLE II
SUMMARY OF THE RESULTS, FOR THE ICT PROBLEM

| Method | Initial point | Final point | Initial MSE | Final MSE | Iter |
|---|---|---|---|---|---|
| LM | {0.40, 0.70} | {0.100, 0.239} | 0.890 | 0.8650 | 4 |
| LM | {0.30, 0.40} | {0.190, 0.256} | 0.870 | 0.8651 | 4 |
| LM | {0.50, 0.65} | {0.113, 0.218} | 0.890 | 0.8653 | 7 |
| BP (α = 0.01) | {0.40, 0.70} | {0.155, 0.301} | 0.890 | 0.8652 | 21 |
| BP (α = 0.01) | {0.30, 0.40} | {0.162, 0.279} | 0.870 | 0.8650 | 14 |
| BP (α = 0.01) | {0.50, 0.65} | {0.156, 0.283} | 0.890 | 0.8650 | 24 |
Fig. 5. Performance of the LM Method for the ICT Problem

B. Second Case

The aim here is to estimate every parameter in the fuzzy rule base. We intend to evaluate the performance of such an approach when the LM method is applied in a hybrid scheme, for example with the Bacterial Algorithm [5], where the LM would estimate the parameters of the multiple fuzzy rule bases given by the individuals of the population. Thus, in the next figures, we show the MSE obtained along the LM evolution and the parameter evolution lines, for a total of 15 iterations on the pH problem, for a fuzzy system composed of 5 rules (40 parameters). The MSE, MSRE and %MRE of the final fuzzy system given by the LM, and of the system generated by the Bacterial Algorithm with a population of 10 individuals (infections = 4, clones = 10) over 40 generations, are also presented.
Fig. 7. The Fuzzy System Parameters Evolution

Fig. 8. MSE Evolution

Though it is not possible to evaluate the ability to reach a local minimum, the MSE line shows that the algorithm is actually optimising the fuzzy rule base, and the parameter evolution lines show that it takes about 15 iterations to reach a local minimum. From Table III one can see that, though the initial fuzzy rule base is not the most appropriate, it takes only 15 iterations for the LM to reach the performance figures shown, similar to the ones provided by the Bacterial Algorithm, which were obtained after 40 generations with 10 candidates.

TABLE III
SUMMARY OF RESULTS FOR THE LM AND BP OPTIMISING A COMPLETE FUZZY RULE BASE, FOR THE PH PROBLEM

| Fuzzy rule base | MSE | MSRE | %MRE |
|---|---|---|---|
| Initial (LM) | 6.5×10⁻³ | 1.98×10⁻¹ | 25.5 |
| Final (LM) | 9.83×10⁻⁴ | 1.42×10⁻¹ | 16.7 |
| Bacterial Algorithm (final) | 7.01×10⁻⁴ | 1.23×10⁻¹ | 14.9 |

V. CONCLUSIONS

A new adaptation of the Levenberg-Marquardt algorithm was introduced in this paper. The structure of the fuzzy rules is no longer restricted to a Ruspini partition, allowing more flexible rule estimation. As the number of rules is also a parameter of the algorithm, and the rules are not restrained as in a Ruspini partition, the optimisation task can be solved with fewer rules. The LM algorithm has proven feasible in its search for the best fuzzy rule base structure: it can estimate the refined parameters after a reduced number of iterations, no matter how many parameters there are. Since it can be used to produce a better fuzzy structure, its incorporation in a population of many candidates (as is the case in evolutionary algorithms) can produce faster convergence to the global optimum, as long as each candidate represents the search for a different local minimum within the whole search space.

ACKNOWLEDGEMENT

The authors wish to acknowledge the support of the National Scientific Research Fund OTKA T034233 and T034212, a bilateral GRICES-OMFB research cooperation grant, a Széchenyi University Research Grant 2004, the National Research and Development Project Grant 2/0015/2002, and the project Inovalgarve.

REFERENCES

[1] A. E. Ruano, C. Cabrita, J. V. Oliveira, L. T. Kóczy, "Supervised Training Algorithms for B-Spline Neural Networks and Neuro-Fuzzy Systems", International Journal of Systems Science, 33(8), 2002, pp. 689-711.
[2] D. Marquardt, "An Algorithm for Least-Squares Estimation of Nonlinear Parameters", SIAM J. Appl. Math., 11, 1963, pp. 431-441.
[3] P. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, PhD Dissertation, Appl. Math., Harvard University, USA, 1974.
[4] J. Botzheim, L. T. Kóczy, A. E. Ruano, "Extension of the Levenberg-Marquardt algorithm for the extraction of trapezoidal and general piecewise linear fuzzy rules", WCCI 2002, Honolulu, Hawaii, pp. 815-821.
[5] J. Botzheim, B. Hámori, L. T. Kóczy, A. E. Ruano, "Bacterial algorithm applied for fuzzy rule extraction", IPMU 2002, Annecy, France, pp. 1021-1026.
A. E. Ruano, C. Cabrita, J. V. Oliveira, L. T. Kóczy, Supervised Training Algorithms for B-Spline Neural Networks and NeuroFuzzy Systems, International Journal of Systems Science, 33, 8, 2002, pp. 689-711. D. Marquardt, An Algorithm for Least-Squares Estimation of Nonlinear Parameters, SIAM J. Appl. Math., 11, 1963, pp. 431441. P. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, PhD. Dissertation, Appl. Math. Harvard University, USA, 1974. J. Botzheim, L. T. Kóczy, A. E. Ruano, “Extension of the Levenberg-Marquardt algorithm for the extraction of trapezoidal and general piecewise linear fuzzy rules”, WCCI 2002, Honolulu, Hawaii pp. 815-821. J. Botzheim, B. Hámori, L. T. Kóczy, A. E. Ruano, “Bacterial algorithm applied for fuzzy rule extraction”, IPMU 2002, Annecy, France, pp. 1021-1026.