Sequential Parameter Optimization for Symbolic Regression
Thomas Bartz-Beielstein, Oliver Flasch, Martin Zaefferer
{firstname.lastname}@fh-koeln.de
SPOT SEVEN
Cologne University of Applied Sciences
Faculty for Computer and Engineering Science
July 2012
SEVEN
Bartz-Beielstein, Flasch, Zaefferer (CUAS)
SPO for Symbolic Regression
July 2012
1 / 28
Agenda
- Goals
- Introducing RGP
- Introducing SPOT
- Experiments with SPOT
- Advanced SPOT Features
Goals
Goals of this Work
- general (conceptual) framework for empirical analysis of
  - GP system components and their ...
  - ... influence on GP system performance
- based on Experimental Research in Evolutionary Computation (Bartz-Beielstein, 2006)
  - principles for obtaining statistically validated results ...
  - ... of high reproducibility
- prototypic software implementation of this framework
  - automation of much of the necessary repeated work
  - standardized result analysis
  - available as open-source software based on the R environment
Introducing RGP
R: Programming Language for Statistics
- R is "GNU S", a freely available language and environment for statistical computing and graphics.
- R provides a wide variety of statistical and graphical techniques:
  - linear and nonlinear modelling
  - statistical tests
  - time series analysis
  - classification
  - clustering, etc.
- Useful platform for GP, providing:
  - flexible interactive environment
  - fast expression manipulation and evaluation
  - powerful visualization tools
  - tools for parallel computing
- See R project homepage http://cran.r-project.org/ for further information.
RGP Overview
- modular GP implementation in R
  - simplicity beats complexity
  - convention over configuration
- large feature set
  - multiple search heuristics (Pareto GP, TinyGP, ...)
  - multiple representations (tree GP and linear GP)
  - multiple sets of variation operators
  - support for strongly-typed GP
  → complex parameterization
- performance-critical functions also implemented in C
- comprehensive documentation
- Freely available (GPL-2) on CRAN: install.packages("rgp")
- See RGP project homepage http://rsymbolic.org/ for details and "bleeding edge releases".
GP Search Heuristics
- concrete search strategy employed by a GP system
- independent of the concrete GP search space
  → decouple the search heuristic from the search space
- GP search heuristic components:
  - selection strategy
  - variation pipeline (order of variation operator application)
  - diversity preservation
- GP system components independent of the search heuristic:
  - GP individual representation
  - GP individual initialization and variation (mutation and crossover)
  - GP individual evaluation
- examples of GP search heuristics:
  - classical single-objective steady-state EAs with tournament selection
  - modern multi-objective steady-state heuristics
TinyGP
- popular small GP implementation mainly used in teaching
- steady-state single-objective search heuristic with tournament selection
- no direct means of diversity preservation
- included as a reference with well-known performance characteristics

Table: Parameters of the TinyGP search heuristic.

Parameter                   Variable (Symbol)                   Domain   Default
Population Size             mu (µ)                              ℕ        300
Tournament Size             tournamentSize (s_tournament)       ℕ        2
Recombination Probability   recombinationProbability (p_rec)    [0, 1]   0.9
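The tournament selection step named above can be sketched in base R as follows. This is a minimal illustration, not RGP or TinyGP code: the helper `tournamentSelect`, the assumption that contestants are drawn without replacement, and the placeholder fitness values are all hypothetical.

```r
# Hypothetical helper, not part of RGP: one tournament selection step.
# `fitness` holds one error value per individual (lower is better).
tournamentSelect <- function(fitness, tournamentSize = 2) {
  # Draw tournamentSize distinct contestants uniformly at random (assumption).
  contestants <- sample.int(length(fitness), tournamentSize)
  # Return the population index of the contestant with the lowest error.
  contestants[which.min(fitness[contestants])]
}

set.seed(1)
mu <- 300                 # default population size from the table
fitness <- runif(mu)      # placeholder error values, not real GP results
winner <- tournamentSelect(fitness, tournamentSize = 2)
```

A larger `tournamentSize` increases selection pressure, because the best of more contestants is returned.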
Generational Multi-Objective GP (GMOGP)
- based on the well-known multi-objective generational (µ + λ) EA NSGA-II
- coarsely scalable complexity through optional selection criteria: individual age, individual complexity
- diversity preservation through age-layering
- included as search heuristic with scalable complexity

Table: Parameters of the GMOGP search heuristic.

Parameter                        Variable (Symbol)                   Domain   Default
Population Size                  mu (µ)                              ℕ        300
Children per Generation          lambda (λ)                          ℕ        20
Recombination Probability        recombinationProbability (p_rec)    [0, 1]   0.1
Enable Complexity Criterion      complexityCriterion                 𝔹        true
Enable Age Criterion             ageCriterion                        𝔹        true
New Individuals per Generation   nu (ν)                              ℕ₀       1
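Multi-objective selection with error, complexity, and age as criteria rests on Pareto dominance. The sketch below shows only how a first non-dominated front could be computed in base R. The helpers `dominates` and `nonDominated` and the criteria values are hypothetical, and this is not the RGP implementation of NSGA-II.

```r
# Hypothetical helper: TRUE if criteria vector a Pareto-dominates b
# (all criteria are minimized).
dominates <- function(a, b) all(a <= b) && any(a < b)

# Indices of the rows of a criteria matrix that no other row dominates.
nonDominated <- function(criteria) {
  n <- nrow(criteria)
  keep <- vapply(seq_len(n), function(i) {
    !any(vapply(seq_len(n), function(j) {
      j != i && dominates(criteria[j, ], criteria[i, ])
    }, logical(1)))
  }, logical(1))
  which(keep)
}

# Made-up criteria for four individuals: error, complexity, age.
criteria <- rbind(c(0.10, 12, 3),
                  c(0.20,  8, 1),
                  c(0.25, 15, 5),
                  c(0.10, 12, 4))
front <- nonDominated(criteria)
```

Disabling the complexity or age criterion corresponds to dropping the respective column before the dominance check.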
Experiment Setup
- Research goal: Quantify influence of RGP search heuristic parameters on algorithm performance (single algorithm, single problem).

Table: RGP parameters independent of the search heuristic.

Problem                          Symbolic Regression of f(x) := sin(x) + cos(2 · x)
Fitness Cases                    200 equidistant samples in [0, 4 · π]
Error Measure                    sample RMSE
Function Set                     {+, −, ·, ÷}
Input Variable Set               {x}
Constant Set                     uniform random constants in [−1, 1]
Individual Size Limit            64
Mutation Operator Set            { insert/delete subtree, change function/constant }
Crossover Operator Set           { random subtree crossover }
Time Budget per GP Run           5 minutes
Initial Experiment Design Size   10
Number of Sequential GP Runs     100
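The test problem and error measure from the table can be written down directly in base R. The sketch below generates the 200 fitness cases and computes the sample RMSE. The `candidate` expression is a made-up individual for illustration only, not a result of any GP run.

```r
# Target function from the experiment setup table.
f <- function(x) sin(x) + cos(2 * x)

# 200 equidistant fitness cases in [0, 4 * pi].
x <- seq(0, 4 * pi, length.out = 200)
y <- f(x)

# Sample RMSE between target values and the predictions of a candidate.
rmse <- function(target, predicted) sqrt(mean((target - predicted)^2))

# Hypothetical candidate built only from the function set {+, -, *, /},
# the input variable x, and constants in [-1, 1].
candidate <- function(x) 0.5 * x - x * x * 0.1
error <- rmse(y, candidate(x))
```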
Introducing SPOT
SPOT Introduction
Sequential Parameter Optimization [?] Toolbox (SPOT [1])
- Based on statistical methods and Design of Experiments
- Workflow:
  1. Create initial design
  2. Evaluate design on target function
  3. Build surrogate model
  4. Exploit model: choose new design
  5. Terminate? no: continue the loop; yes: Report + End
- Optional user input, e.g.:
  - Add specific points to design (expert knowledge)
  - Check output, remove outliers

[1] SPOT and all other used R packages can be retrieved from the CRAN homepage, i.e. http://cran.r-project.org.
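The sequential loop described above can be sketched in a few lines of base R. This is not the SPOT implementation: the toy `targetFunction` stands in for an expensive GP run, a quadratic regression replaces the Kriging surrogate, and random candidate sampling replaces the surrogate optimizer. All names and values are assumptions made for illustration.

```r
# Toy stand-in for an expensive, noisy target such as one GP run.
# (Assumption: the real target would run RGP and return its RMSE.)
targetFunction <- function(p) (p - 0.3)^2 + rnorm(length(p), sd = 0.01)

set.seed(42)
budget <- 20                                          # total evaluations
design <- data.frame(p = seq(0, 1, length.out = 5))   # create initial design
design$y <- targetFunction(design$p)                  # evaluate initial design

while (nrow(design) < budget) {                       # terminate on budget
  # Build surrogate model (quadratic regression instead of Kriging).
  surrogate <- lm(y ~ p + I(p^2), data = design)
  # Exploit model: pick the candidate with the best predicted value.
  candidates <- data.frame(p = runif(100))
  best <- candidates$p[which.min(predict(surrogate, newdata = candidates))]
  # Evaluate the new design point and add it to the results.
  design <- rbind(design, data.frame(p = best, y = targetFunction(best)))
}

# Report: best observed parameter setting.
bestObserved <- design[which.min(design$y), ]
```

Each pass through the loop spends one evaluation of the expensive target, so the budget directly limits the number of surrogate refits.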
Experiments with SPOT
SPOT Setup

Table: Parameters influencing SPOT performance.

Initial Design Size             10
Initial Design Repeats          2
Maximum Repeats                 5
Budget (GP-Runs)                100
Old Best Size                   3
New Design Size                 1
Budget Allocation               Linearly increasing
Surrogate Model                 Kriging Model
Surrogate Optimization Method   CMA-ES
Surrogate Optimization Budget   1000
Code Example
Interfacing RGP and SPOT
> spotRgpTargetFunction [...]
[...] Pr(>F) 0.000711
Advanced SPOT Features
Linear Models: Screening, Model Selection
[Figure: effect plot for aov(formula=y~VARX3,data=cbind(y,x)); x-axis VARX3 from 0.0 to 1.0, y-axis from 0.955 to 0.970]
- Starting 6 additional GP runs to refine the model: 3 old models, 1 new with three repeats

VARX1   VARX2   VARX3   VARX4   VARX5   VARX6   CONFIG   REPEATS   STEP   SEED
519     0.46    0.65    0       0       0.04    2        1         1      3
208     0.40    0.02    1       1       0.14    9        1         1      3
108     0.26    0.49    0       0       0.96    4        1         1      3
75      0.24    0.00    0       0       0.37    21       3         1      1
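A screening model of the form shown in the aov call can be reproduced in base R on synthetic data. In the sketch below, the design `x` and the response `y` are made up for illustration and are not the experiment's results. Only the shape of the call follows the slide.

```r
set.seed(7)
# Made-up design: VARX3 plays the role of a tuned parameter in [0, 1].
x <- data.frame(VARX3 = runif(30))
# Made-up response that increases with VARX3, plus noise.
y <- 0.95 + 0.02 * x$VARX3 + rnorm(30, sd = 0.002)

# Same call shape as on the slide: one-way screening model for VARX3.
fit <- aov(formula = y ~ VARX3, data = cbind(y, x))
tab <- summary(fit)[[1]]        # ANOVA table as a data frame
pValue <- tab[["Pr(>F)"]][1]    # p-value of the VARX3 effect
```

A small `pValue` suggests that the parameter should stay in the model when the design is refined.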
Linear Models: Regression and Dummy Variables
[Figure: SPOT search path; panels show Y, VARX1, VARX2, VARX3, VARX4, VARX5 and VARX6 plotted against step. Eval: 46, Y: 0.952603021030235]
- SPOT search path at step 2
Linear Models: Regression Analysis
- Second step shows similar results
- Suggested design points after step 2 of the SPO:

VARX1   VARX2   VARX3   VARX4   VARX5   VARX6   CONFIG   REPEATS   STEP   SEED
208     0.40    0.02    1       1       0.14    9        1         2      4
108     0.26    0.49    0       0       0.96    4        1         2      4
555     0.54    0.50    1       0       0.91    15       2         2      3
464     0.46    0.00    0       1       0.29    22       4         2      1
Linear Models: Regression and Dummy Variables
[Figure: SPOT search path; panels show Y, VARX1, VARX2, VARX3, VARX4, VARX5 and VARX6 plotted against step. Eval: 54, Y: 0.953649013942794]
- SPOT search path at step 3
Linear Models: Regression Analysis
- Second step shows similar results
- Additional configurations proposed after step 3 of the SPO:

VARX1   VARX2   VARX3   VARX4   VARX5   VARX6   CONFIG   REPEATS   STEP   SEED
208     0.40    0.02    1       1       0.14    9        1         3      5
108     0.26    0.49    0       0       0.96    4        1         3      5
75      0.24    0.00    0       0       0.37    21       2         3      4
352     0.60    0.00    1       1       0.99    23       5         3      1
Linear Models: Regression and Dummy Variables
[Figure: SPOT search path; panels show Y, VARX1, VARX2, VARX3, VARX4, VARX5 and VARX6 plotted against step. Eval: 63, Y: 0.954956008081695]
- SPOT search path at step 4
Linear Models with Factors after Step 4 of SPO

Parameter                        Variable (Symbol)                   Domain   Default
Population Size                  mu (µ)                              ℕ        300
Children per Generation          lambda (λ)                          ℕ        20
Recombination Probability        recombinationProbability (p_rec)    [0, 1]   0.1
Enable Complexity Criterion      complexityCriterion                 𝔹        true
Enable Age Criterion             ageCriterion                        𝔹        true
New Individuals per Generation   nu (ν)                              ℕ₀       1

- Refinement of the automated analysis:
  - Perform stepwise model selection by AIC

              Df          Sum Sq     Mean Sq    F value     Pr(>F)
VARX1          1.000000   0.000588   0.000588    9.403483   0.003309
VARX3          1.000000   0.001004   0.001004   16.056302   0.000180
VARX5          1.000000   0.000048   0.000048    0.765542   0.385272
VARX3:VARX5    1.000000   0.000290   0.000290    4.640992   0.035457
VARX1:VARX3    1.000000   0.000550   0.000550    8.790953   0.004414
Residuals     57.000000   0.003564   0.000063
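The F values and p-values in the ANOVA table follow from the sums of squares and degrees of freedom: each F value is the term's mean square divided by the residual mean square. The base R sketch below recomputes them from the rounded table entries, so small deviations in the last digits are expected. The variable names are chosen here for illustration.

```r
# Values copied from the ANOVA table above (rounded to six decimals).
sumSq <- c(VARX1 = 0.000588, VARX3 = 0.001004, VARX5 = 0.000048,
           "VARX3:VARX5" = 0.000290, "VARX1:VARX3" = 0.000550)
termDf <- 1                   # each term has one degree of freedom
residSumSq <- 0.003564
residDf <- 57

# F = (Sum Sq / Df) / (residual Sum Sq / residual Df)
meanSqResid <- residSumSq / residDf
fValue <- (sumSq / termDf) / meanSqResid
# Upper-tail probability of the F distribution gives Pr(>F).
pValue <- pf(fValue, termDf, residDf, lower.tail = FALSE)
round(cbind(fValue, pValue), 4)
```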
Linear Models: Regression and Dummy Variables
[Figure: effect plots for aov(formula=y~VARX1+VARX3+VARX5+VARX3:VARX5+VAR...; main-effect panels VARX1, VARX3, VARX5; interaction panels VARX1:VARX3, VARX1:VARX5, VARX3:VARX5]
- VARX1 + VARX3 + VARX5 + VARX3:VARX5 + VARX1:VARX3
- Interactions between popsize (x1) and age (x5) and also between recombination prob. individuals (x3) and age (x5)
Summary: Regression Analysis
- Categorical parameters can be easily integrated into the SPOT tuning framework
- More complex models (MARS, GLM) can be used as well, however: Occam's razor
- Crossover prob. has significant impact, should be low
- Children per generation and age criterion: no effect
- Interactions
- Further steps: nested designs for complicated settings
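How a categorical parameter enters a linear model as a dummy variable, and how stepwise selection by AIC prunes such a model, can be illustrated in base R. The data frame `d`, its coefficients, and the noise level are synthetic assumptions. Only the variable names mirror the slides.

```r
set.seed(3)
n <- 60
d <- data.frame(
  VARX1 = runif(n, 50, 600),                           # numeric parameter
  VARX3 = runif(n),                                    # numeric parameter
  VARX5 = factor(sample(c(0, 1), n, replace = TRUE))   # categorical switch
)
# Made-up response: depends on VARX1 and VARX3 only, plus noise.
d$y <- 0.95 + 2e-5 * d$VARX1 - 0.01 * d$VARX3 + rnorm(n, sd = 0.002)

# Full model with all main effects and two-way interactions.
full <- lm(y ~ (VARX1 + VARX3 + VARX5)^2, data = d)
# Backward stepwise model selection by AIC.
reduced <- step(full, trace = 0)

# The factor is encoded as a 0/1 dummy column in the model matrix.
dummies <- model.matrix(~ VARX5, data = d)
```

Calling `formula(reduced)` shows which terms survive the selection.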
Acknowledgments
- This work has been supported by the Federal Ministry of Education and Research (BMBF) under the grants FIWA (AIF FKZ 17N1009) and CIMO (FKZ 17002X11).