A Comparative Study on Kernel Smoothers in Differential Evolution with Estimated Comparison Method for Reducing Function Evaluations
Tetsuyuki Takahama, Member, IEEE, Setsuko Sakai, Member, IEEE
Abstract— As a new research topic for reducing the number of function evaluations effectively in function optimization, the idea of utilizing a rough approximation model, which is an approximation model with low accuracy and without a learning process, has been proposed. Although the approximation errors between true function values and the values estimated by the rough approximation model are not small, the rough model can estimate the order relation of two points with fair accuracy. In order to use this feature of the rough model, we have proposed the estimated comparison method, which omits function evaluations when the result of a comparison can be judged from the approximation values. In this study, kernel smoothers are adopted as rough approximation models. Various types of benchmark functions are solved by Differential Evolution (DE) with the estimated comparison method and the results are compared with those obtained by DE. It is shown that the estimated comparison method is a general-purpose method for reducing function evaluations and can work well with kernel smoothers. It is also shown that the potential model, which is a rough approximation model proposed by us, reduces function evaluations more effectively than the kernel smoothers.
T. Takahama is with the Department of Intelligent Systems, Hiroshima City University, Asaminami-ku, Hiroshima, 731-3194 Japan (e-mail: [email protected]). S. Sakai is with the Faculty of Commercial Sciences, Hiroshima Shudo University, Asaminami-ku, Hiroshima, 731-3195 Japan (e-mail: [email protected]). This work was supported by Grant-in-Aid for Scientific Research (C) (No. 16500083, 17510139) of the Japan Society for the Promotion of Science and by the Hiroshima City University Grant for Special Academic Research (General Studies) 7111.

I. INTRODUCTION
Evolutionary computation has been successfully applied to various fields of science and engineering. Evolutionary algorithms have proved to be powerful function optimization algorithms that outperform conventional optimization algorithms on various problems, including discontinuous, non-differentiable, multi-modal, noisy and multi-objective problems. A disadvantage of evolutionary algorithms is that they need a large number of function evaluations before an acceptable solution can be found. Recently, the size of optimization problems has tended to become larger and the cost of function evaluations higher. It is therefore necessary to develop more efficient optimization algorithms that reduce the number of function evaluations. An effective way of reducing function evaluations is to build an approximation model of the objective function and to solve the problem using the approximation values [1]. If an approximation model with high accuracy can be
built, it is possible to reduce the number of function evaluations greatly. However, building a high-quality approximation model is very difficult and time-consuming: the model must be learned from many pairs of known solutions and their function values. Also, a proper approximation model depends on the problem to be optimized. It is therefore difficult to design a general-purpose approximation model with high accuracy.
We have proposed to utilize a rough approximation model, which is an approximation model with low accuracy and without a learning process, to reduce the number of function evaluations effectively [2], [3], [4]. Although the approximation errors between true function values and the values estimated by the rough approximation model are not small, the rough model can estimate, with fair accuracy, whether the function value of one point is smaller than that of another point. In order to use this feature of the rough approximation model, we have proposed the estimated comparison method. In the estimated comparison method, two approximation values are compared first. When one value is sufficiently worse than the other, the estimated comparison returns an estimated result without evaluating the objective function. When it is difficult to judge the result from the approximation values, true values are obtained by evaluating the objective function and the estimated comparison returns a true result based on the true values. By using the estimated comparison, the evaluation of the objective function is sometimes omitted and the number of function evaluations can be reduced.
The reduction of function evaluations by the estimated comparison method is not as large as that achieved by other optimization methods using approximation models with high accuracy. However, the estimated comparison method does not need the learning process of the approximation model, which is often time-consuming and requires much effort for tuning the learning parameters. The estimated comparison method is a fast and easy-to-use approach and can be applied to a wide range of problems, from those with low or medium computation cost to those with high computation cost. It is thought that the estimated comparison method is a more general-purpose method than other methods with high-quality approximation models.
In this paper, kernel regression [5], [6], [7] is studied and kernel smoothers are used as rough approximation models. The kernel smoothers used in this study estimate the function value of a point from some other points without a learning process and can be used as a general-purpose rough
approximation model. Differential Evolution (DE) [8], [9], [10], [11] is used as the optimization algorithm and the estimated comparison method is introduced into the survivor selection phase of DE. Various types of benchmark functions are solved using the estimated comparison method and the results are compared with those obtained by DE. It is shown that the estimated comparison method is a general-purpose method for reducing function evaluations and can work well with kernel smoothers. It is also shown that the potential model [2], [3], [4], which is a rough approximation model proposed by us, reduces function evaluations more effectively than the kernel smoothers.
The rest of this paper is organized as follows: Section II briefly describes evolutionary algorithms using approximation models. Section III describes kernel regression and kernel smoothers. DE with the estimated comparison is described in Section IV. Section V presents experimental results on various benchmark problems. Finally, Section VI concludes with a brief summary of this paper and a few remarks.
II. OPTIMIZATION AND APPROXIMATION MODELS

A. Optimization Problems
In this study, the following optimization problem (P) with upper-bound and lower-bound constraints will be discussed:

(P) \quad \min f(x) \quad \text{subject to} \quad l_i \le u_i \le x_i \le u_i, \; i = 1, \dots, n, \qquad (1)
where x = (x_1, x_2, ..., x_n) is an n-dimensional vector and f(x) is the objective function. The values u_i and l_i are the upper and lower bounds of x_i, respectively. Also, let the search space in which every point satisfies the upper and lower bound constraints be denoted by S. The objective function f(x) will be approximated using a rough approximation model.

B. Evolutionary Algorithms using Approximation Models
In this section, evolutionary algorithms using approximation models are briefly reviewed. Various approximation models are utilized to approximate the objective function. For example, the quadratic model is used as a simple case of polynomial models [12]. Kriging models [12], [13] approximate the function by a global model and a localized deviation. Also, neural network models [14] and radial basis function (RBF) network models [15], [16], [17], [18] are often used. In most approximation models, the model parameters are learned by the least squares method, a gradient method, the maximum likelihood method and so on. In general, learning the model parameters is a time-consuming process, especially when models with higher accuracy or models of larger functions, such as functions with many dimensions, are required. Evolutionary algorithms with approximation models can be classified into the following types [2]:
• All individuals have only approximation values. A very high-quality approximation model is built and the objective function is optimized using approximation values only. It is possible for these models to reduce function evaluations greatly. However, they can be applied only to well-informed objective functions and cannot be applied to general problems.
• Some individuals have approximation values and others have true values. Methods of this type are called evolution control approaches and can be classified into individual-based and generation-based control. Individual-based control means that good individuals (or randomly selected individuals) use true values and the others use approximation values in each generation [15], [17]. Generation-based control means that all individuals use true values once in a fixed number of generations and use approximation values in the other generations [15], [16]. In these approaches, the approximation model should be accurate because approximation values are compared with true values. Also, it is known that approximation models with high accuracy sometimes generate a false optimum or hide the true optimum. Individuals may converge to a false optimum while they are optimized using the approximation model over some generations. Thus, these approaches are much affected by the quality of the approximation model, and it is difficult to utilize rough approximation models in them.
• All individuals have true values. Some methods of this type are called surrogate approaches. In the surrogate approaches, an estimated optimum is searched for using an approximation model that is usually a local model. The estimated optimum is evaluated to obtain the true value and also to improve the approximation model [19], [20], [18]. If the true value is good, the point is included as an individual. In these approaches, rough approximation models might be used because approximation values are compared with other approximation values. These approaches are less affected by the approximation model than the evolution control approaches. However, they contain a process of optimization that uses the approximation model only, and if this process is repeated many times, they are much affected by the quality of the approximation model.
In order to solve the difficulties mentioned above, we have proposed the estimated comparison method. The method is classified into the last category because all individuals have true values. However, the method is different from the surrogate approaches. It uses a global approximation model of the current individuals built with a rough approximation model. It does not search for an estimated optimum; instead, it judges whether a new individual is worth evaluating with the true objective function or not. Also, it can specify a margin for the approximation error when the comparison is carried out, so it is not much affected by the quality of the approximation model. Thus, it is thought that the estimated comparison method can adopt various rough approximation models, which are
easy-to-use and fast approximation models.

III. KERNEL REGRESSION AND KERNEL SMOOTHERS
In this study, kernel smoothers are used as rough approximation models.

A. Kernel Regression
Kernel regression is a nonparametric regression technique for estimating the regression function y = f(x) + ε using a data set {(x_i, y_i) | i = 1, 2, ..., N}, where N is the number of data and ε is a small noise term. The following Nadaraya-Watson estimator [21], [22], a weighted average of the function values y_i in which the weighting function is a kernel, is often used:

\hat{f}(x) = \frac{\sum_i K_h(x - x_i) y_i}{\sum_i K_h(x - x_i)} \qquad (2)

K_h(u) = \frac{1}{h} K(u/h) \qquad (3)

where \hat{f} is the estimated function of f and K is the kernel with a bandwidth h. The kernel K is a non-negative integrable function satisfying the following conditions:

\int_{-\infty}^{\infty} K(u)\,du = 1 \qquad (4)

K(-u) = K(u), \quad \text{for all } u \qquad (5)

For example, the following are representative kernels.
• Triangle:

K(u) = \begin{cases} 1 - |u| & (|u| \le 1) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6)

• Epanechnikov:

K(u) = \begin{cases} \frac{3}{4}(1 - u^2) & (|u| \le 1) \\ 0 & (\text{otherwise}) \end{cases} \qquad (7)

• Gaussian:

K(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} u^2} \qquad (8)
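As an illustration of how the Nadaraya-Watson estimator of Eqs. (2) and (3) with the Gaussian kernel of Eq. (8) can be computed, a minimal Python sketch is given below. The sketch is not part of the original method description; the function names are ours, and for a vector-valued x the argument of the kernel is taken to be the Euclidean distance, anticipating the kernel smoothers of Section III-B.

import numpy as np

def gaussian_kernel(u):
    # Gaussian kernel K(u) of Eq. (8)
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x, data_x, data_y, h):
    # Nadaraya-Watson estimate of f(x), Eqs. (2)-(3).
    # data_x: (N, n) array of known points x_i; data_y: (N,) array of values y_i; h: bandwidth.
    # The 1/h factor of K_h cancels between numerator and denominator.
    dist = np.linalg.norm(data_x - x, axis=1)
    w = gaussian_kernel(dist / h)
    return np.sum(w * data_y) / np.sum(w)

With a small bandwidth h the estimate follows the data closely, while a large h smooths the data heavily.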
B. Kernel Smoothers
A kernel smoother is an estimation model based on kernel regression, in which the estimated function is smooth and the level of smoothness is adjustable. A kernel smoother can be defined as follows:

\hat{f}(x) = \frac{\sum_i K_{h_\lambda}(x, x_i) y_i}{\sum_i K_{h_\lambda}(x, x_i)} \qquad (9)

K_{h_\lambda}(x, x_i) = D\!\left(\frac{\|x - x_i\|}{h_\lambda(x)}\right) \qquad (10)

where P = {(x_i, y_i) | i = 1, 2, ..., N} is the data set used for estimation, K_{h_\lambda} is a kernel, h_\lambda is a kernel radius parameter, and D is a positive real-valued, non-increasing function of the distance between x and x_i. For example, the following are representative kernel smoothers.
• Nearest neighbor smoother:

h_\lambda(x) = \|x - x_{[k]}\| \qquad (11)

D(t) = \begin{cases} 1 & (|t| \le 1) \\ 0 & (\text{otherwise}) \end{cases} \qquad (12)

where x_{[k]} is the k-th closest point to x.
• Kernel average smoother:

h_\lambda(x) = \lambda \;(= \text{const.}) \qquad (13)

where D(t) is defined by a kernel K(t). In this study, the Gaussian kernel is used as D(t), and the following bandwidth h_0, which minimizes the asymptotic mean integrated squared error (AMISE), is used as λ:

h_0 = 1.5874\, N^{-1/3} \sigma \qquad (14)

where σ is the standard deviation of a variable and is estimated by the standard deviation of the data.
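The two smoothers can be sketched in Python as follows. This is only an illustration under our own assumptions: the Gaussian kernel is used as D(t) for the kernel average smoother as stated above, and since the text does not specify how the σ of Eq. (14) is aggregated over dimensions, the sketch simply averages the per-dimension standard deviations of the data.

import numpy as np

def knn_smoother(x, data_x, data_y, k):
    # Nearest neighbor smoother, Eqs. (11)-(12):
    # h_lambda(x) is the distance to the k-th closest point and D(t) = 1 for |t| <= 1,
    # so the estimate is the average of the y values of the k nearest points.
    dist = np.linalg.norm(data_x - x, axis=1)
    idx = np.argsort(dist)[:k]
    return np.mean(data_y[idx])

def kernel_average_smoother(x, data_x, data_y):
    # Kernel average smoother, Eqs. (13)-(14), with constant radius lambda = h0
    # and the Gaussian kernel as D(t).
    n_data = len(data_y)
    sigma = np.mean(np.std(data_x, axis=0))          # assumed aggregation of sigma in Eq. (14)
    h0 = 1.5874 * n_data ** (-1.0 / 3.0) * sigma     # AMISE bandwidth, Eq. (14)
    dist = np.linalg.norm(data_x - x, axis=1)
    w = np.exp(-0.5 * (dist / h0) ** 2)              # normalizing constant cancels in the ratio
    return np.sum(w * data_y) / np.sum(w)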
IV. DIFFERENTIAL EVOLUTION AND ESTIMATED COMPARISON METHOD
In this section, Differential Evolution (DE) and DE with the estimated comparison method are described.

A. Differential Evolution
DE is a variant of ES proposed by Storn and Price [8], [9]. DE is a stochastic direct search method using a population, or multiple search points. DE has been successfully applied to optimization problems including non-linear, non-differentiable, non-convex and multi-modal functions, and it has been shown that DE is fast and robust on these functions [23]. Some variants of DE, such as DE/best/1/bin and DE/rand/1/exp, have been proposed. The variants are classified using the notation DE/base/num/cross. "base" indicates the method of selecting a base vector; for example, DE/rand/num/cross selects the base vector at random from the population, while DE/best/num/cross selects the best individual in the population. "num" indicates the number of difference vectors used to perturb the base vector. "cross" indicates the crossover mechanism used to create a trial vector, or child. For example, DE/base/num/bin indicates that crossover is controlled by binomial crossover with a constant crossover rate, and DE/base/num/exp indicates that crossover is controlled by two-point crossover in which the crossover rate decreases exponentially. In this study, the DE/rand/1/exp variant, where the number of difference vectors is 1 (num = 1), is used.

B. Estimated Comparison
The estimated comparison judges whether a child point is better than the parent point. In the comparison, an estimate of the approximation error σ and a margin parameter for the approximation error δ are introduced. The function of the estimated comparison can be defined as follows:

EstimatedBetter(x'_i, x_i, σ) {
  if(f̂(x'_i) < f̂(x_i) + δσ) return yes;
  else return no;
}

where x_i is the parent point, x'_i is the child point, and the true value at the parent is known. The parameter δ ≥ 0 controls the margin for the approximation error.
When δ is 0, the estimated comparison can reject many children and omit a large number of function evaluations, but the possibility of rejecting a good child becomes high and the true optimum may sometimes be skipped. When δ is large, the possibility of rejecting a good child becomes low, but the estimated comparison rejects fewer children and omits fewer function evaluations. Thus, δ should be set to a proper value. The estimation error σ can be given by the standard deviation of the errors between true values and their approximation values.
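A direct transcription of EstimatedBetter into Python is shown below (a sketch only; approx stands for whichever rough model f̂ is in use, and the argument names are ours).

def estimated_better(x_child, x_parent, approx, sigma, delta):
    # Return True when the child is estimated to be better than the parent:
    # f_hat(child) < f_hat(parent) + delta * sigma, where sigma is the estimated
    # approximation error and delta >= 0 is the margin parameter.
    return approx(x_child) < approx(x_parent) + delta * sigma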
C. DE with Estimated Comparison Method
The algorithm of DE with the estimated comparison method based on the DE/rand/1/exp variant, which is used in this study, is as follows:
Step0 Initialization. N initial individuals x_i are generated randomly in the search space S and form the initial population P = {x_i, i = 1, 2, ..., N}.
Step1 Termination condition. If a predefined condition is satisfied, such as the number of generations (iterations) exceeding the maximum generation T_max, the algorithm is terminated.
Step2 Mutation. For each individual x_i, three mutually different individuals x_{p1}, x_{p2} and x_{p3}, none equal to x_i, are chosen from the population. A new vector x' is generated from the base vector x_{p1} and the difference vector x_{p2} - x_{p3} as follows:

x' = x_{p1} + F (x_{p2} - x_{p3}) \qquad (15)

where F is a scaling factor.
Step3 Crossover. The vector x' is recombined with the parent x_i. A crossover point j is chosen randomly from all dimensions [1, n]. The element at the j-th dimension of the trial vector x^new is inherited from the j-th element of x'. The elements of subsequent dimensions are inherited from x' with exponentially decreasing probability defined by a crossover rate CR; otherwise, they are inherited from the parent x_i. In actual processing, Step2 and Step3 are integrated into one operation.
Step4 Survivor selection. The estimated comparison is used to compare the trial vector and the parent. The trial vector x^new is accepted for the next generation if x^new is judged better than the parent x_i by the estimated comparison.
Step5 Go back to Step1.
The pseudo-code of DE/rand/1/exp with the estimated comparison method is as follows:

DE/rand/1/exp with estimated comparison() {
  P = Generate N individuals {x_i} randomly;
  Evaluate x_i, i = 1, 2, ..., N;
  for(t=1; t < T_max; t++) {
    σ = estimation of approximation error in P;
    for(i=1; i ≤ N; i++) {
      (p1, p2, p3) = select randomly from [1, N]\{i} such that p_j ≠ p_k (j, k = 1, 2, 3, j ≠ k);
      x^new_i = x_i ∈ P;
      j = select randomly from [1, n];
      k = 1;
      do {
        x^new_{ij} = x_{p1,j} + F (x_{p2,j} - x_{p3,j});
        j = (j + 1) % n;
        k++;
      } while(k ≤ n && u(0,1) < CR);
      // estimated comparison
      if(EstimatedBetter(x^new_i, x_i, σ)) {
        Evaluate x^new_i;
        if(f(x^new_i) < f(x_i)) x_i = x^new_i;
      }
    }
  }
}
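The pseudo-code above can be turned into the following Python sketch. It is illustrative only: approx_builder is an assumed helper that wraps the chosen rough model (a kernel smoother or the potential model) around the current population, bound handling is omitted as in the pseudo-code, and σ is estimated as described in Section IV-B.

import numpy as np

def de_estimated_comparison(f, approx_builder, lower, upper,
                            pop_size=50, F=0.8, CR=0.8, delta=0.003, t_max=1000):
    # DE/rand/1/exp whose survivor selection is filtered by the estimated comparison
    rng = np.random.default_rng()
    n = len(lower)
    pop = rng.uniform(lower, upper, size=(pop_size, n))
    fit = np.array([f(x) for x in pop])                  # true values of all individuals
    for _ in range(t_max):
        approx = approx_builder(pop, fit)                # rough model of the current population
        sigma = np.std(fit - np.array([approx(x) for x in pop]))   # approximation error estimate
        for i in range(pop_size):
            candidates = [j for j in range(pop_size) if j != i]
            p1, p2, p3 = rng.choice(candidates, 3, replace=False)
            child = pop[i].copy()
            j = rng.integers(n)
            k = 1
            while True:                                  # exponential crossover (do-while)
                child[j] = pop[p1][j] + F * (pop[p2][j] - pop[p3][j])
                j = (j + 1) % n
                k += 1
                if not (k <= n and rng.random() < CR):
                    break
            # estimated comparison: evaluate f only when the child looks promising
            if approx(child) < approx(pop[i]) + delta * sigma:
                fc = f(child)
                if fc < fit[i]:
                    pop[i], fit[i] = child, fc
    return pop, fit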
D. Potential Model
In [4] and [2], we proposed the potential model as a rough approximation model for estimating the value of the objective function f:

\hat{f}(x) = U_o(x) / U_c(x) \qquad (16)

U_o(x) = \sum_{x_k \in P} \frac{f(x_k)}{d(x, x_k)^p} \qquad (17)

U_c(x) = \sum_{x_k \in P} \frac{1}{d(x, x_k)^p} \qquad (18)

where U_o(x) is the objective potential, U_c(x) is the congestion potential, and d(x, x_k) is the distance between x and x_k. In the potential model, an approximation value is estimated using the points with known objective values. In the estimated comparison method, the current population P is used as the set of points with known objective values. As the search process progresses, the region where individuals exist may become elliptical. In order to handle such a case, the normalized distance, in which the distance is normalized by the width of each dimension in the current population P, is introduced as follows:

d(x, x_i) = \sqrt{\sum_j \left( \frac{x_j - x_{ij}}{\max_{x_i \in P} x_{ij} - \min_{x_i \in P} x_{ij}} \right)^2} \qquad (19)

The potential model is similar to the kernel average smoother with λ = 1 and D(t) = t^{-p}. However, this definition of D(t) is not a kernel because D(0) = ∞. Also, when a new child point x'_i is generated from a parent point x_i, x_i is omitted from Eqs. (17) and (18) when obtaining f̂(x_i) and f̂(x'_i), in order to avoid the infinity.
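A Python sketch of the potential model of Eqs. (16)-(19) is given below. This is our own transcription: the distance exponent p is not fixed in this section, so p = 2 is only an assumed default, and the exclude argument implements the omission of the parent point described above.

import numpy as np

def potential_model(x, pop, fit, p=2, exclude=None):
    # Rough approximation f_hat(x) = U_o(x) / U_c(x), Eqs. (16)-(18),
    # with the normalized distance of Eq. (19).
    # pop: (N, n) points with known objective values; fit: (N,) their true values;
    # exclude: optional index left out (e.g. the parent) to avoid a zero distance.
    width = pop.max(axis=0) - pop.min(axis=0)        # per-dimension width of the population
    width[width == 0.0] = 1.0                        # guard against a collapsed dimension
    mask = np.ones(len(pop), dtype=bool)
    if exclude is not None:
        mask[exclude] = False
    d = np.sqrt(np.sum(((pop[mask] - x) / width) ** 2, axis=1))   # Eq. (19)
    w = 1.0 / d ** p                                 # weights 1 / d(x, x_k)^p
    return np.sum(w * fit[mask]) / np.sum(w)         # U_o(x) / U_c(x)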
V. NUMERICAL EXPERIMENTS

A. Test Problems
In this section, the estimated comparison method is applied to the sphere function, the Rosenbrock function and the Rastrigin function. These functions have various surfaces such as unimodal, multimodal, smooth, bumpy, or steep surfaces. Table I shows the features of the functions. The function definitions and their search spaces, where n is the dimension of the decision vector, are as follows:
• f_1: Sphere function

f(x) = \sum_{i=1}^{n} x_i^2, \quad -5.12 \le x_i \le 5.12 \qquad (20)

This function is unimodal and has the minimum value 0 at (0, 0, ..., 0).
• f_2: Rosenbrock function

f(x) = \sum_{i=2}^{n} \{100(x_1 - x_i^2)^2 + (x_i - 1)^2\}, \quad -2.048 \le x_i \le 2.048 \qquad (21)

This function is unimodal with a steep surface and has the minimum value 0 at (1, 1, ..., 1).
• f_3: ill-scaled Rosenbrock function

f(x) = \sum_{i=2}^{n} \{100(x_1 - (i x_i)^2)^2 + (i x_i - 1)^2\}, \quad -2.048/i \le x_i \le 2.048/i \qquad (22)

This function is unimodal and ill-scaled with a steep surface and has the minimum value 0 at (1, 1/2, ..., 1/n).
• f_4: Rastrigin function

f(x) = 10n + \sum_{i=1}^{n} \{x_i^2 - 10 \cos(2\pi x_i)\}, \quad -5.12 \le x_i \le 5.12 \qquad (23)

This function is multimodal with a bumpy surface and has the minimum value 0 at (0, 0, ..., 0).

TABLE I
FEATURES OF TEST PROBLEMS

Function  modality    surface  dependency of variables  ill-scaled
f1        unimodal    smooth   —                        —
f2        unimodal    steep    strong                   —
f3        unimodal    steep    strong                   strong
f4        multimodal  bumpy    —                        —
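For reference, the four test functions of Eqs. (20)-(23) can be written in Python as follows (a straightforward transcription; x is a NumPy vector of length n).

import numpy as np

def f1_sphere(x):
    # Eq. (20): unimodal, minimum 0 at the origin
    return np.sum(x ** 2)

def f2_rosenbrock(x):
    # Eq. (21): sum over i = 2..n of 100*(x_1 - x_i^2)^2 + (x_i - 1)^2
    return np.sum(100.0 * (x[0] - x[1:] ** 2) ** 2 + (x[1:] - 1.0) ** 2)

def f3_ill_scaled_rosenbrock(x):
    # Eq. (22): as f2 but with x_i replaced by i*x_i for i = 2..n
    i = np.arange(2, len(x) + 1)
    z = i * x[1:]
    return np.sum(100.0 * (x[0] - z ** 2) ** 2 + (z - 1.0) ** 2)

def f4_rastrigin(x):
    # Eq. (23): multimodal, minimum 0 at the origin
    return 10.0 * len(x) + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))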
Fig. 1. Graphs of f_1, f_2 and f_4 (surface plots of f(x) over x_1 and x_2).
Figure 1 shows the graphs of the three functions f_1, f_2 and f_4 in the case of n = 2.
B. Conditions of Experiments
All functions are optimized with the dimension of the decision vector set to n = 50. The experimental conditions for DE and for DE with the estimated comparison method are as follows: the parameters for DE are population size N = 50, scaling factor F = 0.8 and crossover rate CR = 0.8, and DE/rand/1/exp is adopted in DE with the estimated comparison method. In the estimated comparison EstimatedBetter, the estimation error σ is given by the standard deviation of the errors between true values and their approximation values. In order to examine the effect of the margin parameter δ, the parameter value is selected from 0.001, 0.003 and 0.005. In this paper, 25 independent runs are performed. In each run, the optimization is terminated when the number of true function evaluations reaches 150,000.

C. Experimental Results
1) Kernel Average Smoother: Table II shows the results of optimization using the kernel average smoother with the Gaussian kernel defined by Eq. (8). The column labeled "Func." shows the objective function, and "δ" shows the value of the margin parameter. The columns labeled "Best", "Average", "Worst" and "Std" show the best value, the average value, the worst value and the standard deviation of the best values over all runs, respectively. The best average value is obtained with δ = 0.001 for f_1 and f_4, and with δ = 0.005 for f_2 and f_3. However, the effect of the margin parameter is not so large and similar results
were obtained. It is thought that the setting of δ = 0.003 can attain stable results.

TABLE II
OPTIMIZATION RESULTS OF KERNEL SMOOTHER

Func.  δ      Best      Average   Worst     Std
f1     0.001  1.71e-15  3.60e-15  6.57e-15  1.27e-15
f1     0.003  1.79e-15  3.68e-15  6.79e-15  1.09e-15
f1     0.005  2.13e-15  3.84e-15  7.14e-15  1.14e-15
f2     0.001  2.70      4.79      8.22      1.54e+00
f2     0.003  1.21      4.04      11.18     2.11e+00
f2     0.005  0.991     3.78      7.67      1.44e+00
f3     0.001  2.70      4.79      8.22      1.54e+00
f3     0.003  1.21      4.04      11.18     2.11e+00
f3     0.005  0.991     3.78      7.67      1.44e+00
f4     0.001  1.85e-07  7.23e-07  1.50e-06  3.94e-07
f4     0.003  1.22e-07  7.50e-07  2.66e-06  5.18e-07
f4     0.005  1.21e-07  8.09e-07  3.17e-06  7.49e-07
2) Nearest Neighbor Smoother: Table III shows the results of optimization using the nearest neighbor smoother defined by Eqs. (11) and (12). In order to examine the effect of the neighborhood parameter k, the parameter is selected from the four values 5, 10, 20 and 30. The best average value is obtained with k = 20 and δ = 0.003 for f_1, and with k = 10 and δ = 0.003 for f_2, f_3 and f_4. It is thought that the setting of k = 10 and δ = 0.003 can attain stable results.

D. Comparison with Other Methods
Table IV shows a comparison of the results obtained by several methods. The column labeled "Method" shows the optimization method, where "kernel" means DE with the estimated comparison method using the kernel average smoother (δ = 0.003), "kNN" means DE with the estimated comparison method using the nearest neighbor smoother (k = 10, δ = 0.003), and "DE/rand" and "DE/best" mean the original DE/rand/1/exp and DE/best/1/exp. As a reference, the result obtained by DE with the estimated comparison using the potential model is also shown as "potential". The column labeled "FES" shows the relative number of function evaluations at which each method attains the same or better average best value than DE/rand does. As for the best average value, the kNN smoother found better values than DE/rand, DE/best and the kernel average smoother. Also, the kernel average smoother found better values than DE/rand for all functions and better values than DE/best for all functions except f_1. Note that it is difficult for DE/best to solve multi-modal problems such as f_4, because DE/best is easily trapped by a local minimum. The kNN smoother and the kernel average smoother can stably reduce the number of function evaluations by about 25% and 15%, respectively, compared with DE/rand. Thus, the estimated comparison method using the kernel smoothers can find better function values and can reduce the number of function evaluations effectively. However, compared with the potential model, the kNN smoother and the kernel average smoother cannot attain better results. On these functions, the potential model is superior to the kNN smoother and the kernel average smoother.
TABLE III
OPTIMIZATION RESULTS OF NEAREST NEIGHBOR SMOOTHER

Func.  k   δ      Best      Average   Worst     Std
f1     5   0.001  8.77e-17  2.49e-16  4.56e-16  8.48e-17
f1     5   0.003  1.55e-16  2.95e-16  4.45e-16  8.57e-17
f1     5   0.005  1.06e-16  2.89e-16  5.03e-16  1.01e-16
f1     10  0.001  1.73e-17  4.83e-17  9.08e-17  1.64e-17
f1     10  0.003  2.00e-17  4.89e-17  8.87e-17  1.91e-17
f1     10  0.005  3.87e-17  6.08e-17  8.44e-17  1.49e-17
f1     20  0.001  1.01e-17  2.68e-17  6.28e-17  1.33e-17
f1     20  0.003  1.36e-17  2.54e-17  4.38e-17  6.97e-18
f1     20  0.005  1.37e-17  2.77e-17  5.44e-17  9.92e-18
f1     30  0.001  2.24e-17  6.59e-17  1.53e-16  3.04e-17
f1     30  0.003  2.85e-17  5.66e-17  9.42e-17  1.77e-17
f1     30  0.005  2.79e-17  7.52e-17  1.5e-16   2.88e-17
f2     5   0.001  1.12      3.41      6.9       1.64e+00
f2     5   0.003  1.86      3.76      8.92      1.65e+00
f2     5   0.005  1.24      3.39      5.68      1.09e+00
f2     10  0.001  1.35      3.12      4.99      1.04e+00
f2     10  0.003  1.27      2.78      5.35      1.07e+00
f2     10  0.005  0.763     3.17      7.57      1.35e+00
f2     20  0.001  1.37      3.33      6.22      1.23e+00
f2     20  0.003  1.12      2.85      5.80      1.11e+00
f2     20  0.005  1.62      3.26      7.18      1.24e+00
f2     30  0.001  1.62      3.04      6.83      1.17e+00
f2     30  0.003  1.25      3.01      4.75      8.52e-01
f2     30  0.005  1.79      3.20      5.42      1.07e+00
f3     5   0.001  1.12      3.41      6.9       1.64e+00
f3     5   0.003  1.86      3.76      8.92      1.65e+00
f3     5   0.005  1.24      3.39      5.68      1.09e+00
f3     10  0.001  1.35      3.12      4.99      1.07e+00
f3     10  0.003  1.27      2.78      5.35      1.07e+00
f3     10  0.005  0.763     3.17      7.57      1.35e+00
f3     20  0.001  1.37      3.33      6.22      1.23e+00
f3     20  0.003  1.12      2.85      5.8       1.11e+00
f3     20  0.005  1.62      3.26      7.18      1.24e+00
f3     30  0.001  1.62      3.04      6.83      1.17e+00
f3     30  0.003  1.25      3.01      4.75      8.52e-01
f3     30  0.005  1.79      3.20      5.42      1.07e+00
f4     5   0.001  4.16e-09  6.46e-08  2.59e-07  6.17e-08
f4     5   0.003  8.47e-09  6.20e-08  3.67e-07  7.11e-08
f4     5   0.005  2.41e-08  5.41e-08  1.25e-07  2.69e-08
f4     10  0.001  6.16e-09  2.87e-08  7.99e-08  1.92e-08
f4     10  0.003  3.88e-09  2.19e-08  7.36e-08  1.71e-08
f4     10  0.005  7.28e-09  2.55e-08  6.94e-08  1.60e-08
f4     20  0.001  4.03e-09  4.57e-08  3.01e-07  6.92e-08
f4     20  0.003  7.32e-09  2.69e-08  9.98e-08  1.92e-08
f4     20  0.005  5.46e-09  4.74e-08  2.71e-07  5.54e-08
f4     30  0.001  6.71e-09  6.27e-08  3.02e-07  7.08e-08
f4     30  0.003  1.18e-08  8.91e-08  9.35e-07  1.79e-07
f4     30  0.005  7.36e-09  7.01e-08  2.89e-07  6.18e-08
TABLE IV
COMPARISON OF RESULTS

Func.  Method         Best      Average   Worst     Std       FES
f1     DE/rand        3.73e-13  7.69e-13  1.48e-12  2.45e-13  1.000
f1     DE/best        2.43e-17  5.88e-17  1.20e-16  2.17e-17  0.781
f1     kernel(0.003)  1.79e-15  3.68e-15  6.79e-15  1.09e-15  0.863
f1     kNN(10,0.003)  2.00e-17  4.89e-17  8.87e-17  1.91e-17  0.777
f1     potential      1.43e-20  5.88e-20  1.08e-19  2.25e-20  0.674
f2     DE/rand        2.02      6.73      14.5      2.82      1.000
f2     DE/best        1.95      5.66      10.2      2.19      0.931
f2     kernel(0.003)  1.21      4.04      11.18     2.11      0.852
f2     kNN(10,0.003)  1.27      2.78      5.35      1.07      0.753
f2     potential      0.939     2.39      6.07      1.39      0.652
f3     DE/rand        2.02      6.73      14.5      2.82      1.000
f3     DE/best        1.95      5.66      10.2      2.19      0.931
f3     kernel(0.003)  1.21      4.04      11.2      2.11      0.852
f3     kNN(10,0.003)  1.27      2.78      5.35      1.07      0.753
f3     potential      0.939     2.39      6.07      1.39      0.652
f4     DE/rand        7.72e-05  3.33e-04  9.01e-04  1.82e-04  1.000
f4     DE/best        1.16e-08  0.677     3.98      1.04      —
f4     kernel(0.003)  1.22e-07  7.50e-07  2.66e-06  5.18e-07  0.843
f4     kNN(10,0.003)  3.88e-09  2.19e-08  7.36e-08  1.71e-08  0.777
f4     potential      2.06e-11  2.38e-10  1.54e-09  3.07e-10  0.716

Figures 2, 3, 4 and 5 show semi-logarithmic plots of the best function value over the number of function evaluations for functions f_1, f_2, f_3 and f_4, respectively. It is clear that the estimated comparison method using the potential model, the kNN smoother and the kernel average smoother can find better solutions faster than the original DEs on almost all functions.
Figures 6, 7, 8 and 9 show semi-logarithmic plots of the mean square error between true values and approximation values in a population over the number of function evaluations for f_1, f_2, f_3 and f_4, respectively. In f_1 and f_4, all approximation models can decrease the approximation error successfully; apparently, the potential model is the best approximation model for these functions. In f_2 and f_3, the nearest neighbor smoother is the best approximation model, although the approximation error on these functions is quite large. These functions have a steep surface and it is difficult to approximate them. It is thought that the approximation models do not approximate these functions globally, but approximate them locally. It should be noted that the rough approximation models can reduce function evaluations even in such difficult situations.
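The approximation error plotted in Figs. 6-9 can be obtained as sketched below; this reflects our reading of "mean square error between true values and approximation values in a population" and uses illustrative names only.

import numpy as np

def population_mse(pop, fit, approx):
    # Mean square error of the rough model over the current population
    approx_vals = np.array([approx(x) for x in pop])
    return np.mean((fit - approx_vals) ** 2)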
Fig. 2. Optimization of f_1 (best value vs. number of evaluations for the kernel average smoother, kNN smoother, potential model, DE/rand and DE/best).
Fig. 3. Optimization of f_2.
Fig. 4. Optimization of f_3.
Fig. 5. Optimization of f_4.
Fig. 6. Approximation error of f_1 (approximation error vs. number of evaluations for the kernel average smoother, kNN smoother and potential model).
Fig. 7. Approximation error of f_2.
Fig. 8. Approximation error of f_3.
Fig. 9. Approximation error of f_4.
VI. CONCLUSIONS
We have proposed to utilize rough approximation models, which are approximation models with low accuracy and without a learning process, to reduce the number of function evaluations over a wide range of problems, from those with low or medium computation cost to those with high computation cost. We have proposed the estimated comparison method, in which the function evaluation of a solution is skipped when the goodness of the solution can be judged from its approximation value. In this study, in order to show that the estimated comparison method can work with a wide range of rough approximation models, kernel smoothers were used as rough approximation models. Through the optimization of various types of test problems, it was shown that the estimated comparison method using kernel smoothers can make the search process faster and reduce function evaluations effectively. Also, it was shown that the potential model is superior to the kernel smoothers and is well suited to the estimated comparison method.
In the future, we will investigate the effect of the margin parameter further. We plan to apply the estimated comparison method to other algorithms such as particle swarm optimization. Also, we will apply the estimated comparison method to constrained optimization problems.
REFERENCES
[1] Y. Jin, "A comprehensive survey of fitness approximation in evolutionary computation," Soft Computing, vol. 9, pp. 3–12, 2005.
[2] T. Takahama and S. Sakai, "Reducing function evaluations in differential evolution using rough approximation-based comparison," in Proc. of the 2008 IEEE Congress on Evolutionary Computation, June 2008, pp. 2307–2314.
[3] T. Takahama and S. Sakai, "Efficient optimization by differential evolution using rough approximation model with adaptive control of error margin," in Proc. of the Joint 4th International Conference on Soft Computing and Intelligent Systems and 9th International Symposium on Advanced Intelligent Systems, Sept. 2008, pp. 1412–1417.
[4] T. Takahama, S. Sakai, and A. Hara, "Reducing the number of function evaluations in differential evolution by estimated comparison method using an approximation model with low accuracy," IEICE Trans. on Information and Systems, vol. J91-D, no. 5, pp. 1275–1285, 2008, in Japanese.
[5] M. P. Wand and M. C. Jones, Kernel Smoothing. Chapman & Hall, Dec. 1994.
[6] M. G. Schimek, Ed., Smoothing and Regression: Approaches, Computation, and Application. Wiley-Interscience, Aug. 2000.
[7] K. Weinberger and G. Tesauro, "Metric learning for kernel regression," in Eleventh International Conference on Artificial Intelligence and Statistics, M. Meila and X. Shen, Eds. Puerto Rico: Omnipress, 2007, pp. 608–615.
[8] R. Storn and K. Price, "Minimizing the real functions of the ICEC'96 contest by differential evolution," in Proc. of the International Conference on Evolutionary Computation, 1996, pp. 842–844.
[9] R. Storn and K. Price, "Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, pp. 341–359, 1997.
[10] T. Takahama and S. Sakai, "Constrained optimization by the ε constrained differential evolution with gradient-based mutation and feasible elites," in Proc. of the 2006 IEEE Congress on Evolutionary Computation, July 2006, pp. 308–315.
[11] T. Takahama, S. Sakai, and N. Iwane, "Solving nonlinear constrained optimization problems by the ε constrained differential evolution," in Proc. of the 2006 IEEE Conference on Systems, Man, and Cybernetics, Oct. 2006, pp. 2322–2327.
[12] A. Giunta and L. Watson, "A comparison of approximation modeling techniques: Polynomial versus interpolating models," AIAA, Tech. Rep. 98-4755, 1998.
[13] T. W. Simpson, T. M. Mauery, J. J. Korte, and F. Mistree, "Comparison of response surface and kriging models in the multidisciplinary design of an aerospike nozzle," AIAA, Tech. Rep. 98-4758, 1998.
[14] W. Shyy, P. K. Tucker, and R. Vaidyanathan, "Response surface and neural network techniques for rocket engine injector optimization," AIAA, Tech. Rep. 99-2455, 1999.
[15] Y. Jin, M. Olhofer, and B. Sendhoff, "On evolutionary optimization with approximate fitness functions," in Proc. of the Genetic and Evolutionary Computation Conference. Morgan Kaufmann, 2000, pp. 786–792.
[16] Y. Jin, M. Olhofer, and B. Sendhoff, "A framework for evolutionary optimization with approximate fitness functions," IEEE Trans. on Evolutionary Computation, vol. 6, no. 5, pp. 481–494, 2002.
[17] Y. Jin and B. Sendhoff, "Reducing fitness evaluations using clustering techniques and neural network ensembles," in Genetic and Evolutionary Computation Conference, ser. LNCS, vol. 3102. Springer, 2004, pp. 688–699.
[18] F. G. Guimarães, E. F. Wanner, F. Campelo, R. H. Takahashi, H. Igarashi, D. A. Lowther, and J. A. Ramírez, "Local learning and search in memetic algorithms," in Proc. of the 2006 IEEE Congress on Evolutionary Computation, Vancouver, BC, Canada, July 2006, pp. 9841–9848.
[19] D. Büche, N. N. Schraudolph, and P. Koumoutsakos, "Accelerating evolutionary algorithms with Gaussian process fitness function models," IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 35, no. 2, pp. 183–194, May 2005.
[20] Y. S. Ong, Z. Zhou, and D. Lim, "Curse and blessing of uncertainty in evolutionary algorithm using approximation," in Proc. of the 2006 IEEE Congress on Evolutionary Computation, Vancouver, BC, Canada, July 2006, pp. 9833–9840.
[21] E. A. Nadaraya, "On estimating regression," Theory of Probability and its Applications, vol. 9, no. 1, pp. 141–142, 1964.
[22] G. S. Watson, "Smooth regression analysis," Sankhya, vol. 26, pp. 359–372, 1964.
[23] U. K. Chakraborty, Ed., Advances in Differential Evolution. Springer, 2008.