PARAMETER ADAPTATION FOR DIFFERENTIAL EVOLUTION WITH DESIGN OF EXPERIMENTS

Karin Zielinski and Rainer Laur
Institute for Electromagnetic Theory and Microelectronics (ITEM)
University of Bremen, Germany
email: {zielinski,rlaur}@item.uni-bremen.de

ABSTRACT
Optimal settings for the control parameters of the Differential Evolution algorithm depend on the optimization problem and may also change during an optimization run. This work proposes an approach that adaptively controls the parameters F and CR, which govern the mutation and recombination processes in Differential Evolution. Using Design of Experiments methods, significant performance differences caused by different parameter settings are detected during an optimization run. Interaction effects between the parameters are detected as well. Because the parameter settings are changed on the basis of these results, feedback from the current state of the optimization run is taken into account. The method is tested on a constrained single-objective optimization problem. A comparison with another study that used tuned fixed parameter values on the same problem shows promising results.

KEY WORDS
Differential Evolution, Adaptive Parameter Setting
1 Introduction

Several studies in the literature try to determine optimal settings of the control parameters of Differential Evolution (DE). The recommendations derived in these studies largely agree with each other. However, they also show that different problems may require different control parameter settings for reliable and fast convergence [1, 2]. Usually no information is available that would provide a basis for selecting suitable settings, so parameters are generally adjusted by trial and error. Because parameter tuning can be time-consuming, an automatic parameter control that selects settings based on feedback from the problem at hand would be beneficial. Furthermore, it is argued in [3] that different parameter settings may be optimal in different stages of an optimization run, because the aim changes from exploration in early stages to exploitation in later stages. A fixed set of parameter values may therefore be disadvantageous, and on-line adaptation of parameter settings would be useful. In [3] it is stated that most studies about parameter
control concentrate on only one parameter at a time. Because parameters often interact, this approach may fail to identify optimal parameter settings. On the other hand, tuning several parameters simultaneously generally increases the computational cost considerably compared to examining one parameter at a time. Based on these requirements, the statistical technique Design of Experiments (DoE) is chosen for tuning the parameter settings in this work. Instead of relying on trial and error, parameter settings are adjusted automatically. Information from the search process is evaluated to adapt the parameter values, so the settings do not remain fixed during an optimization run. Furthermore, DoE methods can identify not only the influence of single parameters but also interaction effects, and owing to efficient experimental designs the computational cost is kept low. In this work the parameters F and CR, which are used in the mutation and recombination processes of DE, are adaptively controlled by DoE methods. It is generally assumed in the literature that the third control parameter NP, the number of individuals, depends linearly on the dimension D of the optimization problem [1, 2, 4]. However, the literature gives different recommendations for the linear coefficient, so the experiments in this work are conducted for different values of NP.

This paper is organized as follows: Section 2 presents the Differential Evolution algorithm and Section 3 gives the basics of Design of Experiments. Section 4 describes the approach for adaptive control used in this work, and Section 5 presents the experimental settings. Section 6 gives the results, and Section 7 closes with conclusions and an outlook on future work.
2 Differential Evolution

Differential Evolution (DE) belongs to the class of evolutionary algorithms and was first presented in 1995 [5]. As in other evolutionary algorithms, the first generation is initialized randomly, and further generations evolve by applying the evolutionary operators mutation, recombination and selection until a stopping criterion is satisfied. DE uses a floating-point representation that allows vector differences to be incorporated in the evolutionary operators. This leads to an automatic scaling of step sizes, which is the main characteristic of DE. Advantages of DE are its speed, simplicity and ease of use, as it has only three control parameters. The evolutionary operators are presented in the following.

For every individual $\mathbf{x}_i$ with $i \in \{0, \dots, NP-1\}$, a mutated vector is built by calculating the difference of two randomly chosen population members, weighting the result with the control parameter F and adding it to a third randomly chosen individual:

$$\mathbf{v}_i = \mathbf{x}_{r_1} + F \cdot (\mathbf{x}_{r_2} - \mathbf{x}_{r_3}) \tag{1}$$
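A minimal Python sketch of the DE/rand/1 mutation in Eq. (1) may make the index constraints concrete. It assumes that the population is stored as a list of equal-length lists of floats and that NP ≥ 4, so that three indices different from the target index i exist; the function name `mutate` is illustrative only.

```python
import random

def mutate(pop, i, F, rng=random):
    """DE/rand/1 mutation (Eq. 1): v_i = x_r1 + F * (x_r2 - x_r3).

    pop: list of NP individuals, each a list of D floats (NP >= 4 assumed).
    i:   index of the target vector x_i.
    """
    # Draw r1, r2, r3 without replacement from all indices except i,
    # so they are mutually different and different from the target.
    candidates = [r for r in range(len(pop)) if r != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    # Weighted difference of x_r2 and x_r3 added to the base vector x_r1.
    return [a + F * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]


# Usage: a small random population with NP = 6 and D = 3.
rng = random.Random(42)
population = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(6)]
v0 = mutate(population, 0, F=0.5, rng=rng)
```

Sampling without replacement from the index list excluding i is a simple way to satisfy both conditions on r1, r2 and r3 at once.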
The indices $r_1$, $r_2$, $r_3$ denote three mutually different individuals that are also different from the so-called target vector $\mathbf{x}_i$. Recombination builds a trial vector $\mathbf{u}_i$ for every $i \in \{0, \dots, NP-1\}$. For every vector component $j \in \{0, \dots, D-1\}$, a random number $\mathrm{rand}_j \in [0, 1]$ is compared with the control parameter CR to decide whether the component is copied from the target vector $\mathbf{x}_i$ or from the mutated vector $\mathbf{v}_i$. A randomly chosen index $k \in \{0, \dots, D-1\}$ ensures that at least one component of $\mathbf{u}_i$ is copied from $\mathbf{v}_i$:

$$u_{i,j} = \begin{cases} v_{i,j} & \text{if } \mathrm{rand}_j \le CR \text{ or } j = k \\ x_{i,j} & \text{otherwise} \end{cases} \tag{2}$$

For selection, every target vector $\mathbf{x}_i$ competes against the corresponding trial vector $\mathbf{u}_i$, and the individual with the smaller objective function value is chosen for the next generation. As a consequence, the objective function value of an individual can never deteriorate (greedy selection scheme). Several variants of Differential Evolution have been developed that differ in details of the mutation and recombination operators; the variant described here is known as DE/rand/1/bin [6].

For constraint handling, the modified selection procedure also used in [7] is employed: if both individuals are feasible, the comparison is unchanged; a feasible individual is favored over an infeasible one; and two infeasible individuals are compared by their sums of constraint violations. An advantage of this method is that no additional parameters are needed. For unconstrained problems, the replacement procedure is the same as in the original DE. Boundary constraints are handled with the same method as in [7]: if a generated individual violates a boundary constraint, the offending value is reset to the middle between the old position and the limit [8], so that boundaries are approached asymptotically.
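The recombination, selection and boundary handling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in [7]: the function names are hypothetical, `viol_*` denotes a precomputed sum of constraint violations that is 0 for a feasible individual, ties are resolved in favor of the trial vector (a common DE convention), and the "old position" in the boundary rule is taken to be the corresponding component of the target vector.

```python
import random

def crossover(x, v, CR, rng=random):
    """Binomial crossover (Eq. 2): build trial vector u from target x and mutant v."""
    D = len(x)
    k = rng.randrange(D)  # component k is always taken from v
    return [v[j] if (rng.random() <= CR or j == k) else x[j] for j in range(D)]

def repair_bounds(u, x, lower, upper):
    """Reset out-of-bounds components to the midpoint between the old
    position (assumed: the target vector component) and the violated limit."""
    repaired = []
    for uj, xj, lo, hi in zip(u, x, lower, upper):
        if uj < lo:
            uj = (xj + lo) / 2.0
        elif uj > hi:
            uj = (xj + hi) / 2.0
        repaired.append(uj)
    return repaired

def trial_wins(f_u, viol_u, f_x, viol_x):
    """Feasibility-based selection; viol_* is the sum of constraint
    violations (0 means feasible). Returns True if the trial vector is kept."""
    if viol_u == 0 and viol_x == 0:
        return f_u <= f_x        # both feasible: ordinary DE comparison
    if viol_u == 0 or viol_x == 0:
        return viol_u == 0       # a feasible individual beats an infeasible one
    return viol_u <= viol_x      # both infeasible: smaller total violation wins


# Usage: one crossover followed by boundary repair for D = 3.
rng = random.Random(1)
x_target = [0.0, 0.0, 0.0]
u = crossover(x_target, [1.0, 1.0, 1.0], CR=0.2, rng=rng)
u = repair_bounds(u, x_target, [-1.0] * 3, [0.5] * 3)  # 1.0 > 0.5 becomes 0.25
```

Because the repaired value always lies between the old position and the limit, repeated violations move a component ever closer to the boundary without crossing it.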
3 Design of Experiments

Design of Experiments is a statistical technique for model building and optimization. It can identify the most influential factors in an experiment with many factors (screening), and it is also useful for reliability analysis and robust design optimization [9]. Using DoE, it can be determined statistically whether changing a certain parameter yields a significant effect or whether performance variations are caused by randomness. Not only the influence of single parameters but also the interaction of parameters can be analyzed. Owing to sophisticated designs (settings of input variables), fewer test runs are necessary than if only one parameter is varied at a time. DoE can be used for optimization purposes [10], but in this work DE performs the actual optimization, while the ability of DoE methods to draw conclusions from little information is employed for the adaptive control of the DE parameters.

DoE was developed in the 1930s for agricultural experiments [11]. Early applications of DoE were based on real (non-simulated) systems, while later applications also included simulations. A difference between these two variants is that real systems contain random components, which are accounted for by taking the same data multiple times (replicates), whereas simulations are often deterministic. The design and analysis of such deterministic simulations is termed design and analysis of computer experiments (DACE) [9], and it is used, e.g., for building surrogate models in optimization [12, 13]. Although the problem examined here is based on computer experiments, the analysis is done with DoE methods because the observed mutation and recombination processes involve randomness. The performance measure calculated in each generation is regarded as one replicate.

More specifically, the problem examined here can be characterized as evolutionary operation (EVOP) [14], a method for continuous monitoring and improvement of a process, which here is the optimization run. EVOP is especially suitable for processes with varying performance. This is usually the case for an optimization run, as the task changes from exploration in the early stages to exploitation in the later stages. The first step in applying DoE is the choice of an appropriate design.
The simplest designs are two-level factorial designs, in which two settings of every factor are examined at a time. If more information is needed, e.g. to correctly reflect the curvature of a function, response surface designs are used. Two-level factorial designs are usually employed for EVOP. An alternative is a simplex design, which requires fewer runs but complicates the determination of interaction effects. Because the control parameters of DE are expected to interact, a two-level factorial design is used here.

Results are evaluated using a statistical method called ANOVA (analysis of variance). Some basics of ANOVA are given in the following; a detailed introduction can be found in [14]. For a two-level factorial design with factors (parameters) A and B, the main effect E(A) of factor A is calculated by subtracting the average process outcome $\bar{Y}_{A^-}$ at the low setting of A from the average outcome $\bar{Y}_{A^+}$ at the high setting:

$$E(A) = \bar{Y}_{A^+} - \bar{Y}_{A^-} = \frac{1}{2n}\left(Y_{A^+B^+} + Y_{A^+B^-} - Y_{A^-B^+} - Y_{A^-B^-}\right) \tag{3}$$

where n is the number of replicates and $Y_{A^+B^+}$ is the sum of the process outcomes y over all replicates with high levels of both A and B (the other cell totals are defined analogously). The interaction effect of factors A and B is calculated similarly:

$$E(AB) = \frac{1}{2n}\left(Y_{A^+B^+} + Y_{A^-B^-} - Y_{A^+B^-} - Y_{A^-B^+}\right) \tag{4}$$

Using the calculated effects, the sums of squares and mean squares of the effects can be computed [14]. From these and the mean square of the model error, a value $F_0$ is calculated for every main effect and interaction. $F_0$ is used to test whether the corresponding effect is significant by comparing it to a reference value (a critical value of the F distribution). Reference tables are given as functions of the significance level α and the degrees of freedom [15]. If the calculated $F_0$ is larger than the reference value, the associated effect is significant, e.g. at a confidence level of 95% if α = 0.05.
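To make Eqs. (3) and (4) and the subsequent F test concrete, the following Python sketch computes the effects and the $F_0$ values for a 2 × 2 factorial design with n ≥ 2 replicates per cell, following the standard fixed-effects ANOVA for two-level factorial designs (see [14]). Each effect has one degree of freedom, so its mean square equals its sum of squares, and the error has 4(n − 1) degrees of freedom. The critical value is passed in rather than computed, to keep the sketch free of third-party dependencies; the function name and the cell-key convention are illustrative assumptions.

```python
def two_level_anova(cells):
    """Effects and F0 values for a 2x2 factorial design.

    cells maps (level_A, level_B), each level '+' or '-', to a list of
    n replicate outcomes (n >= 2 and equal for all four cells).
    """
    n = len(cells[("+", "+")])
    Y = {key: sum(values) for key, values in cells.items()}  # cell totals
    contrasts = {
        "A": Y["+", "+"] + Y["+", "-"] - Y["-", "+"] - Y["-", "-"],
        "B": Y["+", "+"] + Y["-", "+"] - Y["+", "-"] - Y["-", "-"],
        "AB": Y["+", "+"] + Y["-", "-"] - Y["+", "-"] - Y["-", "+"],
    }
    effects = {k: c / (2 * n) for k, c in contrasts.items()}  # Eqs. (3) and (4)
    ss = {k: c ** 2 / (4 * n) for k, c in contrasts.items()}  # 1 dof per effect
    y_all = [y for values in cells.values() for y in values]
    ss_total = sum(y * y for y in y_all) - sum(y_all) ** 2 / (4 * n)
    ss_error = ss_total - sum(ss.values())
    ms_error = ss_error / (4 * (n - 1))  # must be > 0 for the F test
    f0 = {k: ss[k] / ms_error for k in ss}
    return effects, f0


# Usage: n = 2 replicates per cell.
cells = {("+", "+"): [10, 12], ("+", "-"): [8, 6],
         ("-", "+"): [5, 7], ("-", "-"): [4, 4]}
effects, f0 = two_level_anova(cells)
F_CRIT = 7.71  # F(0.95; 1, 4): alpha = 0.05, 1 and 4(n - 1) = 4 dof
significant = {k: v > F_CRIT for k, v in f0.items()}
# effects == {"A": 4.0, "B": 3.0, "AB": 1.0}; A and B are significant, AB is not.
```

Here the error mean square is 1.5, so $F_0$ is about 21.3 for A, 12.0 for B and 1.33 for AB.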
4 Setting of DE Control Parameters

DE includes three control parameters: NP, F and CR. In the following, guidelines from the literature on the settings of these parameters are summarized before the approach for adaptive control is described.
4.1 Recommendations from Literature

Recommendations for NP are usually given as a function of the dimension of the considered optimization problem: NP = a · D, with values of 3 ≤ a ≤ 10 usually suggested [1, 2, 4]. However, for the optimization problem considered here, a setting of NP ≈ 2 · D was sufficient for reliable convergence with tuned fixed parameter settings [7]. The range for F is mostly given as F ∈ [0, 1] or F ∈ [0, 2], but values of 0.5 ≤ F ≤ 0.9 are usually assumed to yield the best results [1, 2, 4, 7]. CR is chosen from the interval [0, 1]. Two different recommendations regarding CR can be found in the literature: either high values of CR ≈ 0.9 or low values of CR ≈ 0.2 provide good results [1, 2, 4, 7]. According to [1], a low setting of CR is beneficial if interactions between the objective function parameters are small. However, for real-world problems no knowledge about these interactions is generally available.
4.2 Adaptive Control

Although guidelines for parameter settings are available for DE, some experimentation is always necessary, e.g. to determine whether a low or a high setting of CR is better for a specific problem. Furthermore, it is concluded in [3] that no optimal fixed set of parameters exists, because the requirements change during an optimization run. In this work DoE methods are used for the adaptive adjustment of the DE control parameters F and CR. For a two-level factorial design two settings of each parameter
are used. The initial values could be chosen randomly, but it was decided to incorporate available information instead, so the initial values are set according to guidelines from the literature: F− = 0.5, F+ = 0.9, CR− = 0.2 and CR+ = 0.9. Every combination of F and CR is applied to one fourth of the population. Instead of using fixed subpopulations, the assignment of parameter settings to individuals is chosen randomly in each generation; otherwise a different development of the subpopulations might influence the results. For the calculation of effects, the ratio of successful trial vectors (those that enter the next generation) to generated trial vectors is computed for every parameter combination. However, if this ratio alone were used, successful trial vectors would be rewarded regardless of the amount of change. As a result, parameter values might be favored that induce small improvements of many individuals, whereas other parameter settings might lead to larger improvements for fewer individuals. Because such settings may be essential to escape from local optima, the ratio of successful trial vectors to all trial vectors is weighted with the average improvement of the respective parameter combination.

The flow chart in Fig. 1 gives an overview of the adaptive adjustment of CR and F. After each new generation has been created, it is checked whether the parameters have a significant effect. If not, the next generation is built, which is regarded as a further replicate in the DoE analysis. However, as the magnitude of effects may change over time, the DoE measures are reset if the average improvement of the current generation is below 1% of the improvement averaged over all generations that participate in the current DoE analysis. In [14] it is even suggested to restart the DoE calculations if no significant effects are detected after five to eight cycles (generations).
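The random assignment of the four (F, CR) combinations, the weighted performance measure and the reset criterion might look as follows in Python. This is a sketch under explicit assumptions: the average improvement is taken over the successful trial vectors of a combination (the text leaves the averaging basis open), improvements are passed in as positive numbers (e.g. decreases of the objective function value), the groups differ in size by at most one if NP is not divisible by four, and the current generation is included in the overall average of the reset rule. All function names are hypothetical.

```python
import random
from itertools import product

def assign_settings(NP, F_levels, CR_levels, rng=random):
    """Randomly assign each of the four (F, CR) combinations to a quarter
    of the population; meant to be called anew in every generation."""
    combos = list(product(F_levels, CR_levels))  # four combinations
    order = list(range(NP))
    rng.shuffle(order)
    settings = [None] * NP
    for pos, i in enumerate(order):
        settings[i] = combos[pos % len(combos)]
    return settings

def performance(improvements, n_trials):
    """Success ratio weighted by the mean improvement of the successful
    trial vectors of one (F, CR) combination (assumed averaging basis)."""
    if n_trials == 0 or not improvements:
        return 0.0
    success_ratio = len(improvements) / n_trials
    mean_improvement = sum(improvements) / len(improvements)
    return success_ratio * mean_improvement

def should_reset(avg_improvement_per_generation):
    """Reset the DoE data if the current generation's average improvement is
    below 1% of the average over all generations in the current analysis."""
    history = avg_improvement_per_generation
    if not history:
        return False
    overall = sum(history) / len(history)
    return history[-1] < 0.01 * overall


# Usage with the initial levels from the text.
settings = assign_settings(8, (0.5, 0.9), (0.2, 0.9), random.Random(0))
perf = performance([1.0, 3.0], n_trials=4)  # 0.5 * 2.0 = 1.0
```

Under the assumed averaging basis the measure reduces to the total improvement divided by the number of trial vectors, so one large improvement can outweigh many small ones, which is the intended effect described in the text.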
However, this restriction seems too strict for the problem considered here. If a significant main effect is detected, the corresponding parameter is adjusted. If the interaction effect is also significant, it is additionally checked whether both main effects are significant and have the same sign. In this case it is difficult to decide definitively in which direction the settings should be changed, so a random change is made. A random modification is also made if an interaction effect is significant but the corresponding main effects are not. In the present implementation of adaptive control, the statistical analysis determines which parameter should be changed, but not by how much. Based on the parameter limits taken from the literature (F ∈ [0, 2] and CR ∈ [0, 1]), it was decided that if the calculated effects suggest increasing a parameter, its higher value is increased by 0.05 and its lower value by 0.1. Similarly, if a parameter should be lowered, its lower value is decreased by 0.05 and its higher value by 0.1. Care is taken that the high and low settings never become equal, because the effects could then no longer be calculated.
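The level-shifting rule can be sketched in Python as follows. The direction is assumed to follow the sign of a significant main effect (a positive effect means the high level performed better, since the performance measure is larger-is-better); how the random change for significant interactions is drawn is not specified and is therefore left out. Clipping to the parameter limits, rounding to suppress floating-point noise, rejecting a step that would make the two levels equal, and the function names are all assumptions of this sketch.

```python
LIMITS = {"F": (0.0, 2.0), "CR": (0.0, 1.0)}  # parameter ranges from the literature

def shift_levels(low, high, direction, limits):
    """Shift the (low, high) levels of one parameter up (direction > 0)
    or down (direction < 0) by the step sizes given in the text."""
    lo_lim, hi_lim = limits
    if direction > 0:
        new_low, new_high = low + 0.1, high + 0.05
    else:
        new_low, new_high = low - 0.05, high - 0.1
    # Assumption: keep both levels inside the admissible range.
    new_low = min(max(new_low, lo_lim), hi_lim)
    new_high = min(max(new_high, lo_lim), hi_lim)
    # Assumption: reject a step that would make the levels equal (or cross),
    # because effects could then no longer be computed.
    if round(new_high - new_low, 10) <= 0:
        return low, high
    return round(new_low, 10), round(new_high, 10)

def direction_from_effect(effect):
    """+1 if the high level performed better, -1 otherwise (assumed mapping)."""
    return 1 if effect > 0 else -1


# Usage: a positive CR effect raises CR; a negative F effect lowers F.
cr_low, cr_high = shift_levels(0.2, 0.9, direction_from_effect(0.8), LIMITS["CR"])
f_low, f_high = shift_levels(0.5, 0.9, direction_from_effect(-0.4), LIMITS["F"])
# (cr_low, cr_high) == (0.3, 0.95); (f_low, f_high) == (0.45, 0.8)
```

Because each step moves the outer level by 0.05 less than the inner one, the gap between the two levels shrinks by 0.05 per shift, so repeated changes in the same direction gradually narrow the examined range.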
adaptive control using the same optimization problem [7]. In [7], F and CR were each varied from 0.1 to 1.0 in steps of 0.1. Because of the computational cost, NP = 80 was used for the first experiments; after suitable settings for F and CR had been identified (F = 0.7 and CR = 0.9), NP was varied from 10 to 100 in steps of 10 (the best results were obtained with NP = 30). Because 100 independent runs were performed for every examined parameter combination (100 combinations of F and CR plus 10 values of NP), 11000 optimization runs were conducted in total.
Fig. 1: Flow chart of the adaptive adjustment of F and CR (decisions: significant interaction effect?, significant effect for F or CR?, significant effect for both F and CR?, sgn(Effect_CR) ≠ sgn(Effect_F)?; actions: generate next generation, or change parameter(s) and reset DoE)