Benchmarking SPSA on BBOB-2010 noisy ... - Semantic Scholar

Report 2 Downloads 15 Views
Benchmarking SPSA on BBOB-2010 Noisy Function Testbed Steffen Finck

Hans-Georg Beyer

University of Applied Sciences Vorarlberg Hochschulstrasse 1 Dornbirn, Austria

University of Applied Sciences Vorarlberg Hochschulstrasse 1 Dornbirn, Austria

[email protected]

[email protected]

ABSTRACT

Compared with other stochastic gradient approximation strategies, e.g, [5, 1], SPSA uses only two function evaluations per iteration step instead of Θ(D), where D represents the search space dimensionality. However, convergence analysis of SPSA [6, 8] revealed that SPSA needs about the same number of iteration steps to reach a given target value. While this result may not apply to all types of functions (cf. [8] for a list of the used assumptions and necessary conditions), it showed that SPSA can achieve a better performance w.r.t. the number of function evaluations than other stochastic gradient approximation strategies. The intention of this benchmark is to evaluate a rather simple variant of SPSA and to see how it performs over a wide range of test functions with the same experimental setup. This should allow to draw some conclusions about the robustness of SPSA. The paper is organized in the following manner: A description of the strategy steps is presented in Section 2 and the chosen experimental setup is given in Section 3. Section 4 presents the obtained results and a short summary is given in Section 5. For more details on SPSA, including applications, the interested reader is referred to http://www.jhuapl.edu/spsa/.

This paper presents the result for Simultaneous Perturbation Stochastic Approximation (SPSA) on the BBOB 2010 noisy testbed. SPSA is a stochastic gradient approximation strategy which uses random directions for the gradient estimate. The paper describes the steps performed by the strategy and the experimental setup. The chosen setup represents a rather basic variant of SPSA. The strategy can successfully solve 5 functions for D = 2, 2 for D = 3, and 1 for D = 5. For each function at least one target level is reached up to D = 3.

Categories and Subject Descriptors G.1.6 [Numerical Analysis]: Optimization—global optimization, unconstrained optimization; F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems

General Terms Algorithms

Keywords Benchmarking, Black-box optimization

2. ALGORITHM PRESENTATION

1.

Algorithm 1 Single Iteration Step of SPSA 1: % set the step sizes 2: ak = a0 (k + A)−α 3: ck = c0 k−γ 4: % approximate gradient 5: for l := 1 to λ do 6: choose ∆ from symmetric ±1 Bernoulli distribution 7: f + = f (x + ck ∆) 8: f − = f (x − ck ∆) f + − f − −1 9: gl = ∆ 2ck 10: end for 1 Pλ 11: Gk = gl λ l=1 12: % update current solution 13: x ← x − ak Gk

INTRODUCTION 1

The algorithm under consideration is Simultaneous Perturbation Stochastic Approximation (SPSA). It was developed in [6] and it is a strategy which approximates the gradient of the current solution by using random directions for the gradient estimate. Its intended main area of application is noisy optimization. 1

This paper uses (almost) exactly the same description of the the algorithm and the experimental setup as for the noiseless testbed. However, due to the page limitations per paper and to keep the paper self-consistent, this information must be repeated here.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GECCO’10, July 7–11, 2010, Portland, Oregon, USA. Copyright 2010 ACM 978-1-4503-0073-5/10/07 ...$10.00.

In Alg. 1 a single iteration step of SPSA is given, where k is the number of the current iteration step (k ≥ 1). The algorithm consists of three parts: a) the setting of the current step sizes ak and ck (lines 2–3), b) the gradient approximation procedure (lines 5–11), and c) the update of the cur-

1665

max. FEvals max. restarts xinit α γ λ(first run) λ max.λ β A

≈ 105 D 105 N (−4, 16)D 0.602 0.101 1 1 + 2#restarts 4D2 1 5000D/(2λ)

where β is the desired minimal change in the magnitude of xi (components of the solution) and gmean is the mean magnitude of the gradient at xinit . The gradient is obtained by performing the gradient approximation steps as given in lines 6–9 in Alg. 1. Choosing A > 0 increases a0 and ensures a sufficiently large ak in late iterations. The chosen A corresponds to 5% of the maximal number of iterations. With this setup, the crafting effort [3] is CrE = 0. Next to the standard termination criteria for ft and the maximal number of FEvals, three additional criteria were considered. First, if the update step is too small the current run should be stopped. That is, if the largest absolute value of the components of ak Gk in iteration step k is less than 10−10 the current run is terminated. Further, if f + = f − for all λ, i.e. Gk = 0, the run is also terminated. Finally, a history of recent best function values is kept during each run. If the median of these values is not improving by a value of at least 10−8 over 100 iterations, one assumes that the strategy is stagnating and the run will be terminated. This criterion will be referred to as stagnation criterion in the following. If the budget of function evaluations is not exhausted a restart is initiated.

Table 1: Parameter setup for the BBOB 2010 noiseless testbed. rent solution (line 13). The strategy uses two independent step sizes, ck for the gradient approximation and ak for the update of the current solution x. Both step sizes decrease during the optimization process, where the rate of the decrease is controlled by α (ak ) and γ (ck ). Both rates and the initial values a0 and c0 must be set by the user. In the gradient approximation phase, only two function evaluations are necessary to approximate the gradient. However, it might be useful to perform λ ≥ 1 such approximations and use the average gradient estimate for the update of the solution. The function values necessary for the gradient approximation are evaluated at x ± ck ∆, where the components of ∆ are drawn form the symmetric ±1 Bernoulli distribution. This distribution obeys the following properties:

4. RESULTS Results from experiments according to [3] on the benchmark functions given in [2, 4] are presented in Figures 1, 2, and 3 and in Tables 2, 3, and 4. Overall SPSA can reach the target function value (ft ) for f101–f103, f115, and f130 in D = 2. For D = 3 only f101 and f102 can be successfully solved, and for D = 5 f101 can be solved. For all functions SPSA can reach at least one target level up to D = 3. Comparing the different noise models it appears that SPSA attains the best performance for the Cauchy model. Note, this noise model is an additive model, while the other two are multiplicative noise models. This effects the setting of c0 . The difference between the Uniform and the Gaussian model is not very pronounced and differs for the different functions. Next a short overview about the performance in each function subgroup is given.

• symmetric, • zero mean and finite variance, • finite inverse moments. These properties match the required properties for the theoretical analysis of SPSA in [6]. The components of ∆ are ±1, each value with a probability of p = 0.5. Note, the inverse ∆−1 in line 9 of Alg. 1) is defined as ˆ −1 −1 ˜T ∆−1 = ∆−1 . 1 ∆2 . . . ∆D

3.

EXPERIMENTAL PROCEDURE

Moderate Noise

In Table 1 the settings of the necessary parameters for the experiments are given. The values for α and γ represent the respective smallest values derived from the theoretical analysis and were recommended in [8]. In contrast to the noiseless setup, the parameter c0 is chosen equal to the standard deviation of 10 function evaluations of the initial solution xinit . This procedure was recommended in [8]. To avoid too large test steps c0 is clamped in the interval [0.1, 10], with 0.1 being the value chosen for the noiseless benchmark. In the first run for each setting λ = 1 was used. If a restart was performed λ was increased as recommended in [3] to improve the success probability of the strategy. A critical parameter is the step size a0 . If chosen too small, SPSA will achieve progress with a very slow rate and possibly converge prematurely, while choosing a0 too large might result in a diverging behavior. To avoid parameter tuning of a0 for each function, a procedure for calculating a0 from [8] was applied. To this end, one determines a0 by evaluating (1 + A)α , a0 = β gmean

In this subgroup SPSA reaches at least one target level for all dimensions only for f103. Except for f103, the scaling behavior w.r.t. the search space dimensionality is greater than quadratic. For f103 it appears to be between linear and quadratic up to target level ’-3’. The Rosenbrock functions (f104–f106) together with the ellipsoid functions (f116-f118) appear to be the of the most difficult functions to solve for SPSA. Taking a look at the data obtained for f104 and f105, one observes that the strategy performs considerably more restarts for D = 2, 3 than for the other dimensionalities. Most runs are terminated by the triggering of the stagnation criterion.

Severe Noise For this group the best performance is observed for the sphere with Cauchy noise (f109) and the worst for the ellipsoid with the Uniform noise model (f117). Further, the strategy reaches ft only for f115 and D = 2. For most target levels the scaling behavior w.r.t. the search space dimensionality appears to be more than quadratic.

1666

Severe Noise on Multi-Modal Functions

Table 4: ERT loss ratio (see Figure 3) compared to the respective best result from BBOB-2009 for budgets given in the first column. The last row RLUS /D gives the number of function evaluations in unsuccessful runs divided by dimension. Shown are the smallest, 10%-ile, 25%-ile, 50%-ile, 75%-ile and 90%ile value (smaller values are better). f 101–f 130 in 5-D, maxFE/D=100040 #FEs/D best 10% 25% med 75% 90% 2 10 10 10 10 10 10 10 31 44 50 50 50 50 100 13 70 1.1e2 3.1e2 5.0e2 5.0e2 72 1.1e2 1.6e2 4.4e2 1.5e3 5.0e3 1e3 9.7 3.2e2 6.0e2 1.1e3 4.9e3 5.0e4 1e4 1e5 35 1.7e2 1.7e3 4.2e3 1.2e4 2.7e5 1e5 1e5 1e5 1e5 1e5 RLUS /D 1e5 f 101–f 130 in 20-D, maxFE/D=100159 #FEs/D best 10% 25% med 75% 90% 2 40 40 40 40 40 40 10 2.0e2 2.0e2 2.0e2 2.0e2 2.0e2 2.0e2 100 16 31 3.3e2 2.0e3 2.0e3 2.0e3 1e3 4.4 27 37 1.4e3 2.0e4 2.0e4 1e4 4.4 1.1e2 2.1e2 1.2e3 2.0e5 2.0e5 1e5 30 1.7e2 1.1e3 2.8e3 2.1e5 2.0e6 1e6 2.7e2 4.7e2 4.2e3 2.0e4 1.1e5 2.0e7 RLUS /D 1e5 1e5 1e5 1e5 1e5 1e5

4 CrE = 0 3 1 0

-1 1 4 2 3 log10 of FEvals / dimension f101-106

5

2 1 0

-1 1 4 2 3 log10 of FEvals / dimension f107-121

5

1 0

-1

3

1 4 2 3 log10 of FEvals / dimension f101-106

5

2 1 0

5. SUMMARY

3

1 4 2 3 log10 of FEvals / dimension f107-121

5

1 4 2 3 log10 of FEvals / dimension f122-130

5

1 4 2 3 log10 of FEvals / dimension

5

In this paper SPSA, a stochastic gradient approximation strategy, was analyzed on the BBOB 2010 noisy testbed. SPSA was able to solve 5 of the 30 functions for D = 2 and achieved at least one target level for each function up to D = 3. Similar to the results from the noiseless benchmark, the performance might suffer from using one parameter setup for all functions. An improvement in the performance on some functions might be achieved by a more careful selection of the strategy parameters. Additionally, the value of c0 was chosen equal to the standard deviation of the initial solution. However, since the Gaussian and the Uniform noise model are essentially multiplicative noise model, i.e. the noise strength depends on the current true fitness value, large standard deviations were observed. To avoid creating trial points far away from the considered domain [−5, 5]D , c0 was restricted. In such case, the value of c0 should be chosen differently to achieve a better performance. Note, in [6] it also was shown that the gradient approximation error is O(c2k ). Furthermore, the considered variant is rather simple and extensions to strategy as suggested in [8] or using additional function evaluations to approximate the Hessian matrix [7, 9] could prove helpful.

2 1 0

-1

1 4 2 3 log10 of FEvals / dimension f122-130

5

2 1 0

-1 -2

1 0

-2 4

-2 4 log10 of ERT loss ratio

log10 of ERT loss ratio

3

f101-130

-1

2

-2 4

D = 20

2

-2 4

log10 of ERT loss ratio

log10 of ERT loss ratio

3

4 CrE = 0 3

-1

log10 of ERT loss ratio

log10 of ERT loss ratio

moderate noise

3

-2 4

severe noise

f101-130

2

-2 4

severe noise multimod.

D=5 log10 of ERT loss ratio

log10 of ERT loss ratio

all functions

In this function subgroup the best performance is observed for the Griewank-Rosenbrock functions (f125–f127). There, at least two target levels are reached up to D = 20. In the case of the Schaffer functions (f122–f124), the target level ’+1’ is reached for at least D = 10, however, no target level better than ’-2’ is found in any dimensionality. On the other hand, for the Gallagher functions (f128–f130) the target level ’+1’ is always reached up to D = 5, however, for D = 2 at least target level ’-3’ is achieved for all 3 noise models. For the results of the timing experiment the reader is referred to the paper describing the noiseless benchmark.

3 2 1 0

-1

1 4 2 3 log10 of FEvals / dimension

5

-2

Figure 3: ERT loss ratio versus given budget FEvals. The target value ft for ERT (see Figure 1) is the smallest (best) recorded function value such that ERT(ft ) ≤ FEvals for the presented algorithm. Shown is FEvals divided by the respective best ERT(ft ) from BBOB-2009 for functions f101 –f130 in 5-D and 20D. Each ERT is multiplied by exp(CrE) correcting for the parameter crafting effort. Line: geometric mean. Box-Whisker error bar: 25-75%-ile with median (box), 10-90%-ile (caps), and minimum and maximum ERT loss ratio (points). The vertical line gives the maximal number of function evaluations in this function subset.

6. ACKNOWLEDGMENTS Support by the Austrian Science Fund (FWF) under grant P19069-N18 is gratefully acknowledged.

7. REFERENCES [1] J. R. Blum. Multidimensional Stochastic Approximation Methods. Annals of Mathematical Statistics, 25:737–744, 1954.

1667

7 6 5 4 3 2 1 0

101 Sphere moderate Gauss 9

2

8 7 66 5 4 3 2 1 0 2 7 66 5 4 3 2 1 0 2

3

4

5

10

20

+1 +0 -1 -2 -3 -5 -8 40

102 Sphere moderate unif

3

5

10

20

40

103 Sphere moderate Cauchy

5

10

20

40

116 Ellipsoid Gauss

7 6 5 4 3 2 1 0 2

3

5

10

20

40

117 Ellipsoid unif

8 7 6 5 4 3 2 1 0 2

3

5

10

20

40

118 Ellipsoid Cauchy

8 7 6 5 4 3 2 1 0 2

3

5

10

20

7 6 5 4 3 2 1 0

104 Rosenbrock moderate Gauss

40

6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0

107 Sphere Gauss

8

110 Rosenbrock Gauss

7

7

113 Step-ellipsoid Gauss

7

6

6

6

5

5

4

4

3

3

2

2

5 4 3 2 1

2

3

5

10

20

40

105 Rosenbrock moderate unif

0 2

3

5

10

20

7

1

1

0

0

40

108 Sphere unif

2

3

5

5

10

20

40

111 Rosenbrock unif

8

6

2

3

8

7

7

6

6

5

5

4

4

3

3

2

2

5

10

20

40

114 Step-ellipsoid unif

4 3 2 1

2

3

5

10

20

40

106 Rosenbrock moderate Cauchy 7

4

3

7 6 5 4 3 2 1 0

1

0

1

0 2

3

5

10

20

40

109 Sphere Cauchy

8

0 2

3

5

10

20

40

112 Rosenbrock Cauchy

8

2

3

5

10

20

40

115 Step-ellipsoid Cauchy

7 2

7

7

6

6

5

5

6 5 4

4

4

3

3

2

2

1

1

0

0

3

2

3

5

10

20

40

119 Sum of different powers Gauss

2

3

5

10

20

40

122 Schaffer F7 Gauss

8

2 1 0 2

7

6

6

5

5

4

4

3

3

2

2

1

1

5

10

20

40

125 Griewank-Rosenbrock Gauss

8

7

3

2

3

5

10

20

40

128 Gallagher Gauss

9 8 7 6 5 4

2

3

5

10

20

40

120 Sum of different powers unif

0

3 2 1

0 2

3

5

10

20

40

123 Schaffer F7 unif

8

0 2

7

6

6

5

5

4

4

3

3

2

2

1

1

5

10

20

40

126 Griewank-Rosenbrock unif

8

7

3

2

3

5

10

20

40

129 Gallagher unif

9 8 7 6 5 4

2

3

5

10

20

40

121 Sum of different powers Cauchy

0

3 2 1

0 2

3

5

10

20

40

124 Schaffer F7 Cauchy

7

8

6 5

0 2

3

5

10

20

40

127 Griewank-Rosenbrock Cauchy

2

3

7 1

6

6 5

4

4

3

3

2

2

1

1

10

20

+1

4

+0

3

1

3

5

10

20

40

-1 -2

2

2

40

130 Gallagher Cauchy

8

7

5

5

-3 -5 -8

0

0 2

3

5

10

20

40

0 2

3

5

10

20

40

2

3

5

10

20

40

Figure 1: Expected Running Time (ERT, ) to reach fopt + ∆f and median number of f -evaluations from successful trials (+), for ∆f = 10{+1,0,−1,−2,−3,−5,−8} (the exponent is given in the legend of f101 and f130 ) versus dimension in log-log presentation. For each function and dimension, ERT(∆f ) equals to #FEs(∆f ) divided by the number of successful trials, where a trial is successful if fopt + ∆f was surpassed. The #FEs(∆f ) are the total number (sum) of f -evaluations while fopt + ∆f was not surpassed in the trial, from all (successful and unsuccessful) trials, and fopt is the optimal function value. Crosses (×) indicate the total number of f -evaluations, #FEs(−∞), divided by the number of trials. Numbers above ERT-symbols indicate the number of successful trials. Y-axis annotations are decimal logarithms. The thick light line with diamonds shows the single best results from BBOB-2009 for ∆f = 10−8 . Additional grid lines show linear and quadratic scaling.



1668

∆f 10 1 1e−1 1e−3 1e−5 1e−8 ∆f 10 1 1e−1 1e−3 1e−5 1e−8 ∆f 10 1 1e−1 1e−3 1e−5 1e−8 ∆f 10 1 1e−1 1e−3 1e−5 1e−8 ∆f 10 1 1e−1 1e−3 1e−5 1e−8 ∆f 10 1 1e−1 1e−3 1e−5 1e−8 ∆f 10 1 1e−1 1e−3 1e−5 1e−8 ∆f 10 1 1e−1 1e−3 1e−5 1e−8 ∆f 10 1 1e−1 1e−3 1e−5 1e−8 ∆f 10 1 1e−1 1e−3 1e−5 1e−8

f 101 in 5-D, N=15, mFE=500142 # ERT 10% 90% RTsucc 15 9.1 e2 1.8 e2 2.2 e3 9.1 e2 15 6.3 e3 1.2 e3 1.5 e4 6.3 e3 15 1.7 e4 1.8 e3 1.7 e4 1.7 e4 15 2.8 e4 3.1 e3 7.8 e4 2.8 e4 15 3.9 e4 4.6 e3 9.6 e4 3.9 e4 4 1.6 e6 2.0 e5 3.7 e6 1.8 e5 f 103 in 5-D, N=15, mFE=500039 # ERT 10% 90% RTsucc 15 1.7 e3 5.8 e2 3.2 e3 1.7 e3 15 3.3 e3 1.4 e3 6.3 e3 3.3 e3 15 4.9 e3 2.3 e3 9.3 e3 4.9 e3 15 1.3 e4 4.3 e3 3.1 e4 1.3 e4 2 3.3 e6 9.9 e3 7.8 e6 8.3 e3 0 42e–6 82e–7 32e–5 2.9 e4 f 105 in 5-D, N=15, mFE=500197 # ERT 10% 90% RTsucc 0 36e+0 15e+0 38e+0 2.0 e4 . . . . . . . . . . . . . . . . . . . . . . . . . f 107 in 5-D, N=15, mFE=500158 # ERT 10% 90% RTsucc 10 3.4 e5 1.6 e2 1.0 e6 8.9 e4 0 88e–1 31e–1 25e+0 5.4 e4 . . . . . . . . . . . . . . . . . . . . f 109 in 5-D, N=15, mFE=500196 # ERT 10% 90% RTsucc 15 1.5 e3 2.7 e2 3.7 e3 1.5 e3 14 5.4 e4 1.9 e3 1.0 e5 1.9 e4 6 7.6 e5 2.8 e3 2.0 e6 1.0 e4 0 13e–2 12e–3 97e–2 1.4 e4 . . . . . . . . . . f 111 in 5-D, N=15, mFE=500153 # ERT 10% 90% RTsucc 1 7.3 e6 7.9 e5 1.6 e7 2.9 e5 0 20e+0 10e+0 30e+0 2.0 e5 . . . . . . . . . . . . . . . . . . . . f 113 in 5-D, N=15, mFE=500200 # ERT 10% 90% RTsucc 10 2.5 e5 1.8 e2 1.0 e6 9.8 e2 0 56e–1 18e–1 16e+0 5.1 e3 . . . . . . . . . . . . . . . . . . . . f 115 in 5-D, N=15, mFE=500197 # ERT 10% 90% RTsucc 15 1.4 e4 1.6 e2 2.8 e4 1.4 e4 6 8.0 e5 1.5 e3 2.0 e6 4.5 e4 1 7.1 e6 5.7 e5 1.7 e7 6.9 e4 0 12e–1 18e–2 39e–1 4.3 e4 . . . . . . . . . . f 117 in 5-D, N=15, mFE=500181 # ERT 10% 90% RTsucc 0 13e+1 35e+0 75e+1 2.9 e5 . . . . . . . . . . . . . . . . . . . . . . . . . f 119 in 5-D, N=15, mFE=500182 # ERT 10% 90% RTsucc 14 3.6 e4 9.3 e1 1.1 e3 3.7 e2 0 32e–1 13e–1 82e–1 8.1 e4 . . . . . . . . . . . . . . . . . . . .

f 101 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 5 4.9 e6 1.7 e5 1.2 e7 9.5 e5 0 25e+0 40e–1 35e+0 9.0 e5 . . . . . . . . . . . . . . . . . . . . f 103 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 15 8.1 e3 4.4 e3 1.6 e4 8.1 e3 15 1.3 e4 6.8 e3 2.5 e4 1.3 e4 15 1.9 e4 9.7 e3 3.4 e4 1.9 e4 15 1.1 e5 1.7 e4 2.4 e5 1.1 e5 0 37e–5 10e–5 75e–5 9.9 e4 . . . . . f 105 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 14e+1 13e+1 14e+1 8.7 e4 . . . . . . . . . . . . . . . . . . . . . . . . . f 107 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 65e+0 48e+0 87e+0 3.2 e4 . . . . . . . . . . . . . . . . . . . . . . . . . f 109 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 15 1.8 e4 3.2 e3 3.7 e4 1.8 e4 9 1.5 e6 6.5 e3 4.1 e6 1.3 e5 4 5.8 e6 1.6 e5 1.4 e7 3.3 e5 0 30e–2 32e–3 39e–1 6.4 e4 . . . . . . . . . . f 111 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 48e+1 16e+1 13e+2 1.5 e5 . . . . . . . . . . . . . . . . . . . . . . . . . f 113 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 98e+0 42e+0 15e+1 1.5 e5 . . . . . . . . . . . . . . . . . . . . . . . . . f 115 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 7 2.3 e6 1.1 e4 6.1 e6 3.7 e4 0 12e+0 60e–1 15e+0 4.1 e4 . . . . . . . . . . . . . . . . . . . . f 117 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 13e+3 73e+2 23e+3 5.4 e5 . . . . . . . . . . . . . . . . . . . . . . . . . f 119 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 19e+0 14e+0 24e+0 5.1 e3 . . . . . . . . . . . . . . . . . . . . . . . . .

f 102 in 5-D, N=15, mFE=500188 ∆f # ERT 10% 90% RTsucc 10 15 2.2 e4 4.6 e2 8.5 e4 2.2 e4 1 12 1.7 e5 3.8 e3 5.6 e5 4.7 e4 1e−1 8 5.0 e5 6.3 e3 1.1 e6 5.9 e4 1e−3 7 6.6 e5 7.2 e3 1.6 e6 9.2 e4 1e−5 3 2.1 e6 1.0 e5 5.1 e6 9.8 e4 1e−8 0 31e–3 36e–7 26e–1 1.7 e5 f 104 in 5-D, N=15, mFE=500197 ∆f # ERT 10% 90% RTsucc 10 0 36e+0 26e+0 38e+0 3.1 e4 1 . . . . . 1e−1 . . . . . 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . . f 106 in 5-D, N=15, mFE=500066 ∆f # ERT 10% 90% RTsucc 10 14 1.9 e5 4.1 e3 4.7 e5 1.5 e5 1 0 26e–1 10e–1 83e–1 2.2 e5 1e−1 . . . . . 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . . f 108 in 5-D, N=15, mFE=500184 ∆f # ERT 10% 90% RTsucc 10 15 7.7 e3 2.4 e3 1.4 e4 7.7 e3 1 15 5.0 e4 1.7 e4 9.2 e4 5.0 e4 1e−1 2 3.5 e6 3.8 e5 8.1 e6 2.5 e5 1e−3 0 15e–2 81e–3 21e–2 3.7 e5 1e−5 . . . . . 1e−8 . . . . . f 110 in 5-D, N=15, mFE=500195 ∆f # ERT 10% 90% RTsucc 10 1 7.0 e6 5.4 e5 1.6 e7 3.7 e4 1 0 31e+0 13e+0 70e+0 2.8 e4 1e−1 . . . . . 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . . f 112 in 5-D, N=15, mFE=500067 ∆f # ERT 10% 90% RTsucc 10 14 1.1 e5 2.1 e3 3.6 e5 7.8 e4 1 0 45e–1 18e–1 89e–1 5.6 e4 1e−1 . . . . . . . . . 1e−3 . 1e−5 . . . . . 1e−8 . . . . . f 114 in 5-D, N=15, mFE=500181 ∆f # ERT 10% 90% RTsucc 10 15 1.4 e5 2.5 e4 3.3 e5 1.4 e5 1 2 3.6 e6 3.8 e5 8.3 e6 3.6 e5 1e−1 0 29e–1 75e–2 66e–1 2.5 e5 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . . f 116 in 5-D, N=15, mFE=500190 ∆f # ERT 10% 90% RTsucc 10 0 74e+1 10e+1 14e+2 6.7 e4 1 . . . . . 1e−1 . . . . . 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . . f 118 in 5-D, N=15, mFE=500154 ∆f # ERT 10% 90% RTsucc 10 5 1.1 e6 6.4 e4 2.6 e6 1.1 e5 1 0 18e+0 56e–1 87e+0 6.5 e4 1e−1 . . . . . 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . . f 120 in 5-D, N=15, mFE=500196 ∆f # ERT 10% 90% RTsucc 10 15 4.4 e3 1.4 e3 9.2 e3 4.4 e3 1 9 4.8 e5 4.7 e4 1.2 e6 1.5 e5 1e−1 0 84e–2 27e–2 32e–1 1.2 e5 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . .

f 102 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 43e+0 31e+0 56e+0 2.1 e5 . . . . . . . . . . . . . . . . . . . . . . . . . f 104 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 14e+1 13e+1 14e+1 5.0 e4 . . . . . . . . . . . . . . . . . . . . . . . . . f 106 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 52e+0 22e+0 17e+1 1.8 e5 . . . . . . . . . . . . . . . . . . . . . . . . . f 108 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 14 6.6 e5 2.5 e5 1.6 e6 5.2 e5 0 59e–1 31e–1 91e–1 8.7 e5 . . . . . . . . . . . . . . . . . . . . f 110 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 42e+1 24e+1 17e+2 8.5 e4 . . . . . . . . . . . . . . . . . . . . . . . . . f 112 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 53e+0 26e+0 19e+1 2.7 e5 . . . . . . . . . . . . . . . . . . . . . . . . . f 114 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 71e+0 29e+0 14e+1 4.2 e5 . . . . . . . . . . . . . . . . . . . . . . . . . f 116 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 11e+3 43e+2 17e+3 5.2 e5 . . . . . . . . . . . . . . . . . . . . . . . . . f 118 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 63e+0 33e+0 13e+1 2.5 e5 . . . . . . . . . . . . . . . . . . . . . . . . . f 120 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 3 8.3 e6 1.4 e5 2.0 e7 3.1 e5 0 12e+0 86e–1 22e+0 3.9 e5 . . . . . . . . . . . . . . . . . . . .

Table 2: Shown are, for functions f101 -f120 and for a given target difference to the optimal function value ∆f : the number of successful trials (#); the expected running time to surpass fopt + ∆f (ERT, see Figure 1); the 10%-ile and 90%-ile of the bootstrap distribution of ERT; the average number of function evaluations in successful trials or, if none was successful, as last entry the median number of function evaluations to reach the best function value (RTsucc ). If fopt + ∆f was never reached, figures in italics denote the best achieved ∆f -value of the median trial and the 10% and 90%-ile trial. Furthermore, N denotes the number of trials, and mFE denotes the maximum of number of function evaluations executed in one trial. See Figure 1 for the names of functions.

1669

D=5

D = 20

1.0

1.0 +1:26/30

f101-130

+1:13/30

-4:3/30 -8:1/30

0.5

0.0 0

f101-130 1

2

3

4

5

0

2

4

log10 of FEvals / DIM

6

8

-8:0/30

0.5

0.0 0

10 12 14 16 18 20 22 24

proportion of trials

-8:1/6

0.5

f101-106 1

2

3

4

5

0

2

4

6

2

4

6

8

10

12

14

16

18

20

log10 of Df / Dftarget

8

-8:0/6

0.5

0.0 0

10 12 14 16 18 20 22 24

f101-106 1

2

3

4

5

0

2

4

log10 of FEvals / DIM

f107-121

6

8

10

12

14

16

18

20

log10 of Df / Dftarget

proportion of trials

-1:1/15

-4:0/15 -8:0/15

0.5

f107-121 1

f107-121

+1:5/15

-1:3/15 proportion of trials

0

1.0 +1:13/15

severe noise

5

-4:1/6

log10 of Df / Dftarget

1.0

2

3

4

5

0

2

4

log10 of FEvals / DIM

6

8

-4:0/15 -8:0/15

0.5

0.0 0

10 12 14 16 18 20 22 24

f107-121 1

log10 of Df / Dftarget

2

3

4

5

0

2

4

log10 of FEvals / DIM

1.0

6

8

10

12

14

16

18

20

log10 of Df / Dftarget

1.0 f122-130

+1:9/9

proportion of trials

-1:2/9

-4:0/9 -8:0/9

0.5

f122-130 1

f122-130

+1:6/9

-1:3/9 proportion of trials

4

-1:1/6

log10 of FEvals / DIM

severe noise multimod.

3

f101-106

+1:2/6

-4:3/6

0.0 0

2

1.0

-1:3/6

0.0 0

f101-130 1

log10 of FEvals / DIM

f101-106

+1:4/6

proportion of trials

moderate noise

-4:1/30

log10 of Df / Dftarget

1.0

0.0 0

f101-130

-1:4/30 proportion of trials

proportion of trials

all functions

-1:9/30

2

3

log10 of FEvals / DIM

4

5

0

2

4

6

8

-4:0/9 -8:0/9

0.5

0.0 0

10 12 14 16 18 20 22 24

log10 of Df / Dftarget

f122-130 1

2

3

log10 of FEvals / DIM

4

5

0

2

4

6

8

10

12

14

16

18

20

log10 of Df / Dftarget

Figure 2: Empirical cumulative distribution functions (ECDFs), plotting the fraction of trials versus running time (left subplots) or versus ∆f (right subplots). The thick red line represents the best achieved results. Left subplots: ECDF of the running time (number of function evaluations), divided by search space dimension D, to fall below fopt + ∆f with ∆f = 10k , where k is the first value in the legend. Right subplots: ECDF of the best achieved ∆f divided by 10k (upper left lines in continuation of the left subplot), and best achieved ∆f divided by 10−8 for running times of D, 10 D, 100 D . . . function evaluations (from right to left cycling blackcyan-magenta). The legends indicate the number of functions that were solved in at least one trial. FEvals denotes number of function evaluations, D and DIM denote search space dimension, and ∆f and Df denote the difference to the optimal function value. Light brown lines in the background show ECDFs for target value 10−8 of all algorithms benchmarked during BBOB-2009.

1670

f 121 in 5-D, N=15, mFE=500182 ∆f # ERT 10% 90% RTsucc 10 15 1.0 e3 4.6 e1 2.2 e3 1.0 e3 1 6 7.6 e5 1.6 e3 2.0 e6 1.1 e4 1e−1 0 11e–1 14e–2 26e–1 6.1 e3 . . . . 1e−3 . 1e−5 . . . . . 1e−8 . . . . . f 123 in 5-D, N=15, mFE=500185 ∆f # ERT 10% 90% RTsucc 10 12 1.4 e5 1.7 e3 5.1 e5 1.1 e4 1 0 71e–1 27e–1 24e+2 3.0 e4 1e−1 . . . . . 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . . f 125 in 5-D, N=15, mFE=500199 ∆f # ERT 10% 90% RTsucc 10 14 3.6 e4 4.5 e1 1.1 e2 7.1 e1 1 14 3.6 e4 6.0 e1 3.2 e2 1.2 e2 1e−1 14 5.3 e4 6.1 e2 2.0 e5 1.7 e4 1e−3 0 51e–3 13e–3 95e–3 5.0 e3 1e−5 . . . . . 1e−8 . . . . . f 127 in 5-D, N=15, mFE=500198 ∆f # ERT 10% 90% RTsucc 10 15 1.2 e2 2.0 e1 2.2 e2 1.2 e2 1 15 2.2 e4 4.8 e1 1.0 e5 2.2 e4 1e−1 2 3.4 e6 2.4 e5 8.1 e6 1.5 e5 1e−3 0 15e–2 35e–3 37e–2 1.2 e5 1e−5 . . . . . 1e−8 . . . . . f 129 in 5-D, N=15, mFE=500189 ∆f # ERT 10% 90% RTsucc 10 15 7.3 e4 4.3 e3 2.5 e5 7.3 e4 1 0 58e–1 19e–1 70e–1 2.5 e5 1e−1 . . . . . 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . .

f 121 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 6 3.1 e6 8.7 e3 8.0 e6 7.2 e4 0 10e+0 47e–1 12e+0 2.9 e4 . . . . . . . . . . . . . . . . . . . . f 123 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 20e+1 17e+0 63e+3 5.4 e3 . . . . . . . . . . . . . . . . . . . . . . . . . f 125 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 10 1.0 e6 1.5 e2 4.0 e6 2.0 e2 10 1.0 e6 2.6 e2 4.0 e6 3.2 e2 6 3.4 e6 2.3 e5 8.4 e6 4.4 e5 0 12e–2 72e–3 65e+3 8.9 e4 . . . . . . . . . . f 127 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 15 9.6 e2 1.7 e2 1.9 e3 9.6 e2 7 2.3 e6 5.5 e2 6.0 e6 8.5 e3 3 8.0 e6 4.1 e3 2.0 e7 1.2 e4 0 10e–1 44e–3 32e–1 1.7 e4 . . . . . . . . . . f 129 in 20-D, N=15, mFE=2.00 e6 # ERT 10% 90% RTsucc 0 69e+0 65e+0 74e+0 1.0 e5 . . . . . . . . . . . . . . . . . . . . . . . . .

f 122 in 5-D, N=15, mFE=500200 ∆f # ERT 10% 90% RTsucc 10 15 1.2 e3 5.1 e1 3.5 e3 1.2 e3 1 0 46e–1 26e–1 69e–1 1.8 e4 1e−1 . . . . . . . . . 1e−3 . 1e−5 . . . . . 1e−8 . . . . . f 124 in 5-D, N=15, mFE=500197 ∆f # ERT 10% 90% RTsucc 10 15 5.0 e3 8.3 e1 8.6 e3 5.0 e3 1 2 3.4 e6 2.0 e5 8.2 e6 1.0 e5 1e−1 0 39e–1 30e–2 61e–1 7.6 e4 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . . f 126 in 5-D, N=15, mFE=500162 ∆f # ERT 10% 90% RTsucc 10 2 3.3 e6 6.1 e2 8.0 e6 4.3 e2 1 2 3.3 e6 2.7 e3 8.0 e6 2.0 e3 1e−1 1 7.2 e6 6.7 e5 1.6 e7 1.7 e5 1e−3 0 16e+2 11e–2 41e+2 1.7 e1 1e−5 . . . . . 1e−8 . . . . . f 128 in 5-D, N=15, mFE=500184 ∆f # ERT 10% 90% RTsucc 10 9 4.0 e5 8.4 e1 1.0 e6 7.0 e4 1 0 89e–1 45e–1 22e+0 8.0 e4 1e−1 . . . . . 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . . f 130 in 5-D, N=15, mFE=500198 ∆f # ERT 10% 90% RTsucc 10 15 7.8 e3 3.3 e2 2.3 e4 7.8 e3 1 4 1.6 e6 1.1 e5 3.6 e6 2.1 e5 1e−1 0 19e–1 14e–2 36e–1 1.0 e5 1e−3 . . . . . 1e−5 . . . . . 1e−8 . . . . .

f 122 in # ERT 5 4.0 e6 0 11e+0 . . . . . . . . f 124 in # ERT 4 6.1 e6 0 13e+0 . . . . . . . . f 126 in # ERT 1 2.8 e7 1 2.8 e7 0 45e+3 . . . . . . f 128 in # ERT 0 73e+0 . . . . . . . . . . f 130 in # ERT 1 2.8 e7 0 73e+0 . . . . . . . .

20-D, N=15, mFE=2.00 e6 10% 90% RTsucc 4.9 e2 1.0 e7 3.8 e4 76e–1 45e+3 5.9 e2 . . . . . . . . . . . . 20-D, N=15, mFE=2.00 e6 10% 90% RTsucc 1.6 e4 1.5 e7 5.6 e5 83e–1 53e+3 1.6 e4 . . . . . . . . . . . . 20-D, N=15, mFE=2.00 e6 10% 90% RTsucc 2.0 e6 6.6 e7 4.2 e3 2.0 e6 6.4 e7 2.9 e4 20e+2 95e+3 3.3 e1 . . . . . . . . . 20-D, N=15, mFE=2.00 e6 10% 90% RTsucc 65e+0 76e+0 3.7 e4 . . . . . . . . . . . . . . . 20-D, N=15, mFE=2.00 e6 10% 90% RTsucc 2.0 e6 6.4 e7 2.3 e4 60e+0 83e+0 9.0 e4 . . . . . . . . . . . .

Table 3: Shown are, for functions f121 -f130 and for a given target difference to the optimal function value ∆f : the number of successful trials (#); the expected running time to surpass fopt + ∆f (ERT, see Figure 1); the 10%-ile and 90%-ile of the bootstrap distribution of ERT; the average number of function evaluations in successful trials or, if none was successful, as last entry the median number of function evaluations to reach the best function value (RTsucc ). If fopt + ∆f was never reached, figures in italics denote the best achieved ∆f -value of the median trial and the 10% and 90%-ile trial. Furthermore, N denotes the number of trials, and mFE denotes the maximum of number of function evaluations executed in one trial. See Figure 1 for the names of functions. [2] S. Finck, N. Hansen, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2010: Presentation of the noisy functions. Technical Report 2009/21, Research Center PPE, 2010. [3] N. Hansen, A. Auger, S. Finck, and R. Ros. Real-parameter black-box optimization benchmarking 2010: Experimental setup. Technical Report RR-7215, INRIA, 2010. [4] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Noisy functions definitions. Technical Report RR-6869, INRIA, 2009. Updated February 2010. [5] J. Kiefer and J. Wolfowitz. Stochastic Estimation of the Maximum of a Regression Function. Annals of Mathematical Statistics, 23:462–466, 1952. [6] J. C. Spall. Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation. IEEE Transactions on Automatic Control, 37(3):332–341, March 1992. [7] J. C. Spall. Adaptive Stochastic Approximation by the Simultaneous Perturbation Method. IEEE Transactions on Automatic Control, 45, 2000. [8] J. C. Spall. Introduction to Stochastic Search and Optimization. John Wiley & Sons, Hoboken, NJ, 2003. [9] J. C. Spall. Feedback and Weighting Mechanisms for Improving Jacobian Estimates in the Adaptive Simultaneous Perturbation Algorithm. IEEE Transactions on Automatic Control, 54, 2009.

1671