Anticipation Mappings for Learning Classifier Systems

Larry Bull, Pier Luca Lanzi, Toby O'Hara

Abstract— In this paper, we study the use of anticipation mappings in learning classifier systems. At first, we enrich the eXtended Classifier System (XCS) with two types of anticipation mappings: one based on an array of perceptrons, one based on neural networks. We apply XCS with anticipation mappings (XCSAM) to several multistep problems taken from the literature and compare its anticipatory performance with that of the Neural Classifier System X-NCS, which is based on a similar approach. Our results show that, although XCSAM is not a "true" Anticipatory Classifier System like ACS, MACS, or X-NCS, it can provide accurate anticipatory predictions while requiring smaller populations than those needed by X-NCS.
I. INTRODUCTION

Learning Classifier Systems (LCSs) [10] combine reinforcement learning and genetic algorithms [8] to solve problems modeled as trial-and-error interactions with an unknown environment. They maintain a population of condition-action-prediction rules, called classifiers, which represents the current solution to the target problem. Each classifier represents a small portion of the overall solution. The classifier condition c identifies an area of the problem domain. The classifier action a represents a decision on the subproblem identified by the condition c. The classifier prediction p (or strength) provides an estimate of how valuable action a is, in terms of problem solution, on the subproblem identified by condition c. In learning classifier systems, the genetic component works on classifier conditions, searching for an adequate decomposition of the target problem into a set of subproblems; the reinforcement component works on classifier predictions to estimate the action values in each subproblem.

Anticipatory classifier systems (e.g., [19], [20], [4]) extend the typical classifier structure by adding the prediction of the next state that the system will encounter after the classifier action is performed. While learning classifier systems focus on the reward incoming from the environment and on the value of actions, anticipatory classifier systems focus on the accurate prediction of the effect that actions have on the environment. Accordingly, in an anticipatory classifier system the current knowledge consists of a population of condition-action-effect classifiers, which also represents a model of the problem. The model can be used for planning, for speeding up learning, or even to disambiguate perceptual aliasing [4].

In [13], we proposed an approach to estimate the effect of classifier actions so as to provide some anticipatory capabilities to typical learning classifier systems. The idea was not to develop an anticipatory classifier system but to keep the usual LCS structure and to extend classifiers so as to permit interesting, although limited, anticipatory capabilities. In [13], the focus is still on the value of actions, as in typical models; anticipatory ability is obtained as a side effect. The results we reported in [13] show that, by adding elementary statistics to classifiers, it is possible to partially anticipate the effect of classifier actions.

Recently, O'Hara and Bull [18] introduced neural anticipation as a way to add anticipatory capabilities to their neural classifier system X-NCS [3]. In [18], neural classifiers are extended with an additional anticipatory neural network trained in a supervised fashion on the current input st and the next input st+1 encountered after the classifier action has been performed. To measure the accuracy of lookahead prediction, as previously done in the anticipatory version of YCSL [2], a metric called lookahead accuracy tracks the percentage accuracy of the classifier with the highest numerosity in the current subproblem. In X-NCS with neural anticipation, called X-NCS(LNN) [18], classifiers are also extended with a lookahead error: a measure of whether the classifier anticipation is correct, computed as the error affecting the last anticipation. Thus the lookahead error is not an average error (like the prediction error in XCS [21]) but the error on the last next-state prediction. The classifier prediction error, typical of accuracy-based LCSs like XCS, and the lookahead error are combined into an overall error measure. X-NCS tends to evolve accurate classifiers, as XCS does, but in X-NCS accurate classifiers are those that are "overall accurate", i.e., they provide an accurate estimate of both the classifier prediction and the classifier anticipation.

Larry Bull and Toby O'Hara are with the University of the West of England, Bristol (email: [email protected], [email protected]). Pier Luca Lanzi is with the Dipartimento di Elettronica e Informazione, Politecnico di Milano (email: [email protected]).
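The overall error measure described above can be sketched in a few lines. This is a hedged illustration, not X-NCS's actual implementation: the text (and Section III) states that the overall error is the sum of the XCS-style prediction error and the lookahead error on the last anticipation; measuring the lookahead error as the fraction of mispredicted bits is an assumption made for this sketch.

```python
# Illustrative sketch of an "overall error" combining the XCS-style
# prediction error with a lookahead error computed on the last anticipation
# only. The bitwise-error definition of the lookahead term is an assumption.

def overall_error(pred_error, anticipation, observed_next):
    """pred_error: running XCS prediction error of the classifier.
    anticipation: last predicted next state (list of 0/1 bits).
    observed_next: next state actually encountered (list of 0/1 bits)."""
    # Lookahead error: fraction of bits the last anticipation got wrong.
    lookahead_error = sum(a != o for a, o in zip(anticipation, observed_next)) / len(observed_next)
    # Overall error: sum of the two components, as described for X-NCS(LNN).
    return pred_error + lookahead_error
```

A classifier with a running prediction error of 0.1 that mispredicted one of four bits on its last anticipation would score 0.1 + 0.25 = 0.35 under this sketch.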
In this paper, we follow up on our previous results [13]. We borrow the idea of function mappings from the work of Wilson [23] on computed prediction and the idea of neural anticipation from [18], and we analyze the use of anticipation mappings in XCS. Inspired by [23], we extend the classifier structure with a parametrized anticipatory function anf which, following the approach of [18], is trained using supervised learning based on the current state st and the next state st+1. In contrast to what was done in [18], we do not introduce a lookahead error. Instead, following the experience in [13], our XCS with anticipation mappings (XCSAM) only focuses on the value of actions, as XCS does [21]. Anticipatory capability is obtained as a "side effect" [13]. Since in the proposed model no component aims at developing accurate anticipatory behavior, the anticipatory burden falls completely on the function anf: the more powerful anf is, the more accurate the lookahead prediction will be. In our approach, the difficulty of the anticipatory task also depends on the amount of generalization that a problem allows. If the problem allows few generalizations, the anticipatory task will be easier and accurate lookahead prediction will be obtained rather easily: the more specific a classifier is, the fewer states anf needs to predict. On the other hand, the anticipatory task will be more difficult in problems that allow many generalizations: the more general a classifier is, the higher the number of accurate predictions that the anticipatory function anf has to provide. We applied XCS with a perceptron-based anticipation function and XCS with the neural anticipation function used in [18] to several multistep environments. The results we present show that both models achieve similar anticipatory accuracy, both in typical Markov problems and in partially observable environments where the anticipatory prediction is inherently affected by errors.

2007 IEEE Congress on Evolutionary Computation (CEC 2007), 1-4244-1340-0/07/$25.00 ©2007 IEEE

II. THE XCS CLASSIFIER SYSTEM

XCS is a reinforcement learning algorithm that works on a rule-based representation [12]. It maintains a population of rules (the classifiers) which represents the solution to a reinforcement learning problem. Classifiers consist of a condition, an action, and four main parameters [21], [6]: (i) the prediction p, which estimates the relative payoff that the system expects when the classifier is used; (ii) the prediction error ε, which estimates the error of the prediction p; (iii) the fitness F, which estimates the accuracy of the payoff prediction given by p; and (iv) the numerosity num, which indicates how many copies of classifiers with the same condition and the same action are present in the population.

At time t, XCS builds a match set [M] containing the classifiers in the population [P] whose condition matches the current sensory input st; if [M] contains fewer than θnma actions, covering takes place and creates a new classifier that matches st and has a random action.
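The matching and covering step just described can be sketched as follows. This is an illustrative sketch, not XCS's reference implementation: the `Classifier` record, the hash-mark generalization rate, and the function names are assumptions made for the example.

```python
import random

# Sketch of XCS match-set formation with covering, over the ternary
# condition alphabet {'0', '1', '#'} ('#' matches either input bit).
# The minimal Classifier record and names are illustrative assumptions.

class Classifier:
    def __init__(self, condition, action):
        self.condition = condition  # e.g. "1#0#"
        self.action = action

def matches(condition, state):
    """A condition matches a binary state if every non-'#' bit agrees."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def build_match_set(population, state, all_actions, theta_nma=1):
    match_set = [cl for cl in population if matches(cl.condition, state)]
    # Covering: while fewer than theta_nma distinct actions are represented,
    # create a classifier matching the state with a missing (random) action.
    while len({cl.action for cl in match_set}) < theta_nma:
        cond = ''.join(bit if random.random() > 0.33 else '#' for bit in state)
        missing = list(set(all_actions) - {cl.action for cl in match_set})
        new_cl = Classifier(cond, random.choice(missing))
        population.append(new_cl)
        match_set.append(new_cl)
    return match_set
```

For instance, with a population containing conditions "1#0" and "111" and input "100", only the first classifier matches; raising `theta_nma` forces covering to add classifiers until enough distinct actions are present.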
For each possible action a in [M], XCS computes the system prediction P(st, a), which estimates the payoff that XCS expects if action a is performed in st. The system prediction P(st, a) is computed as the fitness-weighted average of the predictions of the classifiers in [M] that advocate action a:

$$P(s_t, a) = \frac{\sum_{cl_k \in [M](a)} p_k\,F_k}{\sum_{cl_i \in [M](a)} F_i} \qquad (1)$$
where [M](a) represents the subset of classifiers in [M] with action a, p_k the prediction of classifier cl_k, and F_k the fitness of classifier cl_k. Then XCS selects an action to perform; the classifiers in [M] that advocate the selected action form the current action set [A]. The selected action a_t is performed, and a scalar reward r_{t+1} is returned to XCS together with a new input s_{t+1}. When the reward r_{t+1} is received, the estimated payoff P(t) is computed as follows:

$$P(t) = r_{t+1} + \gamma \max_{a \in [M]} P(s_{t+1}, a) \qquad (2)$$
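Equations (1) and (2) can be sketched together as follows. The `Classifier` record is an illustrative container; the discount factor value 0.71 is the one commonly used with XCS in the literature, not a value stated in this section.

```python
from collections import namedtuple

# Sketch of the system prediction (Eq. 1) and the estimated payoff (Eq. 2).
# The Classifier fields mirror the paper's notation; the container itself
# and the gamma value (0.71, common in the XCS literature) are assumptions.

Classifier = namedtuple("Classifier", "condition action prediction fitness")

GAMMA = 0.71  # discount factor

def system_prediction(match_set, action):
    """Fitness-weighted average prediction of classifiers advocating `action`."""
    advocates = [cl for cl in match_set if cl.action == action]
    total_fitness = sum(cl.fitness for cl in advocates)
    if total_fitness == 0:
        return 0.0
    return sum(cl.prediction * cl.fitness for cl in advocates) / total_fitness

def payoff(reward, next_match_set, actions):
    """P(t) = r_{t+1} + gamma * max_a P(s_{t+1}, a)."""
    return reward + GAMMA * max(system_prediction(next_match_set, a)
                                for a in actions)
```

With two classifiers advocating action 0 (predictions 100 and 50, equal fitness) and one advocating action 1 (prediction 10), the system predictions are 75 and 10 respectively.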
Next, the parameters of the classifiers in [A] are updated in the following order [6]: prediction, prediction error, and finally fitness. The prediction p is updated with learning rate β (0 ≤ β ≤ 1):

$$p_k \leftarrow p_k + \beta\,(P(t) - p_k) \qquad (3)$$
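The update sequence can be sketched as follows. Equation (3) is from the text; the error and fitness steps are the standard XCS updates from [21], [6], abbreviated here, and the parameter values and the `Cl` container are illustrative assumptions.

```python
# Sketch of the action-set updates in the order given above: prediction
# (Eq. 3), then prediction error, then accuracy-based fitness. The error
# and fitness formulas follow the standard XCS updates [21], [6]; the
# constants and the Cl record are illustrative assumptions.

BETA = 0.2        # learning rate beta
EPSILON_0 = 10.0  # accuracy threshold epsilon_0
ALPHA, NU = 0.1, 5.0

class Cl:
    def __init__(self):
        self.prediction = 0.0
        self.error = 0.0
        self.fitness = 0.0
        self.numerosity = 1

def update_action_set(action_set, P):
    for cl in action_set:
        cl.prediction += BETA * (P - cl.prediction)             # Eq. 3
        cl.error += BETA * (abs(P - cl.prediction) - cl.error)  # error update
    # Accuracy kappa: 1 if the error is below epsilon_0, else a power law.
    kappas = [1.0 if cl.error < EPSILON_0
              else ALPHA * (cl.error / EPSILON_0) ** (-NU)
              for cl in action_set]
    total = sum(k * cl.numerosity for k, cl in zip(kappas, action_set))
    for k, cl in zip(kappas, action_set):
        # Fitness moves toward the classifier's relative accuracy in [A].
        cl.fitness += BETA * (k * cl.numerosity / total - cl.fitness)
```

Starting from zeroed parameters, a single classifier receiving payoff 1000 moves its prediction to 200, its error to 160, and its fitness to 0.2 after one update.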
Then, the prediction error and the classifier fitness are updated as usual [21], [6]. On a regular basis (depending on the parameter θga), the genetic algorithm is applied to the classifiers in [A]. It selects two classifiers, copies them, and with probability χ performs crossover on the copies; then, with probability μ, it mutates each allele. The resulting offspring are inserted into the population and two classifiers are deleted to keep the population size constant.

III. ANTICIPATION WITH FUNCTION MAPPINGS
The extension of XCS with anticipation mappings is straightforward: it basically follows the principles used in computed prediction [23] and in X-NCS(LNN) [18]. In XCS with anticipation mappings (briefly, XCSAM) each classifier is extended with a parameter vector w (the network weights in [18]) and an anticipation function anf(st, w) parametrized by w. While in [18] the classifier error is the sum of the prediction error and the additional lookahead error, in our implementation we did not introduce any additional error. As in [13], anticipatory prediction is obtained as a side effect, so that nothing except the anticipation function is added to the system.

XCSAM works as XCS. At time t, given the current input st, XCSAM builds the match set [M] containing the classifiers in [P] that match st; if [M] contains fewer than θnma actions, covering takes place as in XCS. For each possible action a in [M], XCSAM computes the system prediction P(a), then it selects an action to perform and builds the action set [A]. Given the current action set [A], the anticipatory prediction is computed as follows. For each classifier cli in [A], the corresponding anticipation st+1,i is computed as anf(st, wi), where wi is the parameter vector associated with classifier cli. The overall anticipatory prediction ŝt+1 is obtained as the fitness-weighted average of the anticipatory predictions of the classifiers in [A], i.e.,

$$\hat{s}_{t+1} = \frac{\sum_{cl_i \in [A]} anf(s_t, w_i)\,F_i}{\sum_{cl_i \in [A]} F_i} \qquad (4)$$

As in [18], the real-valued anticipatory prediction ŝt+1 is translated into a vector of 0s and 1s by setting to 0 any value in ŝt+1 less than or equal to 0.5 and to 1 any value in ŝt+1 greater than 0.5. Then, the selected action is performed and a scalar reward rt+1 is returned to the system together with the next input st+1. During learning, the next input st+1 is used to train the weight vectors wi of the classifiers in [A].
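The aggregation in Eq. 4 and the 0.5-thresholding can be sketched as follows. The `SimpleClassifier` container and the constant anticipation stub are illustrative assumptions; in XCSAM, `anf` would be the perceptron array or the neural network described next.

```python
# Sketch of Eq. 4: the system's anticipatory prediction is the
# fitness-weighted average of the classifiers' anticipations anf(s_t, w_i),
# thresholded at 0.5 to recover a binary next-state vector.
# The classifier container and the stub anticipation are illustrative.

class SimpleClassifier:
    def __init__(self, fitness, anticipation):
        self.fitness = fitness
        self._anticipation = anticipation

    def anf(self, state):
        # Stand-in for the perceptron- or network-based anticipation function.
        return self._anticipation

def predict_next_state(action_set, state):
    """Compute s_hat_{t+1} (Eq. 4) and binarize it at 0.5."""
    total_fitness = sum(cl.fitness for cl in action_set)
    n_bits = len(state)
    s_hat = [0.0] * n_bits
    for cl in action_set:
        a = cl.anf(state)  # real-valued vector, one entry per bit
        for i in range(n_bits):
            s_hat[i] += a[i] * cl.fitness / total_fitness
    # Values <= 0.5 map to 0, values > 0.5 map to 1.
    return [1 if v > 0.5 else 0 for v in s_hat]
```

For example, with two classifiers of fitness 1 and 3 anticipating [1, 0] and [1, 1], the weighted average is [1.0, 0.75], which binarizes to [1, 1].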
As in [23], this phase depends on the type of anticipation function involved and will be illustrated in detail later. Note that, while in [18] the update of the anticipation network continues until either the preset number of cycles is finished or the rule vector matches the target next vector, in our case the classifiers in [A] are updated only once. During test, the weight vectors wi of the classifiers in [A] are not updated. To estimate the anticipatory performance, we track (i) the anticipatory accuracy, measured as the percentage of bits of the next state that the system is able to predict, and (ii) the mean square error between the encountered next state st+1 and the predicted next state ŝt+1.

Perceptron-based Anticipation. In this case, the anticipation function anf(st, w) is defined by an array of as many perceptrons as the number of bits that need to be predicted. The i-th perceptron predicts the i-th bit of the next state st+1 based on the weight vector wi. It takes the current state st and outputs 0 or 1 through a two-stage process: first the linear combination wi·st of the inputs st and the weight vector wi is calculated; then the perceptron outputs 1 if wi·st is greater than zero, 0 otherwise. For this purpose, the original binary inputs st are enriched with the usual bias [9] (i.e., the constant input x0 [23], [14]). In addition, since zero-valued inputs should generally be avoided [9], binary inputs are also mapped into integer values by replacing the zeros and ones in st with -1 and +1, respectively (as also done in [18]). Given the current state st and the next state st+1, the weight wi,j connecting the i-th perceptron to the j-th input bit st(j) is updated as [9]:

$$w_{i,j} \leftarrow w_{i,j} + \eta\,(s_{t+1}(i) - o_i)\,s_t(j) \qquad (5)$$
where oi is the output of the i-th perceptron for input st, and η is the usual learning rate.

Neural Anticipation. As done in [18], the classifier anticipation function anf(st, w) is computed using a multilayer neural network that maps the current state st into the next state st+1. Given the current input xt and the desired output yt, the network weights are updated using online backpropagation [9]. All the neural networks used in the experiments discussed in this paper contain 5 hidden nodes. In [18], the update of the anticipation network continues until either the preset number of cycles is finished or the rule vector matches the target next vector. In our case, the network is updated only once, independently of whether the network has actually learned to predict the correct anticipation.

IV. DESIGN OF EXPERIMENTS
The experiments reported in this paper were performed following the standard settings used in the literature [21]. Each experiment consists of a number of problems that the system must solve. When the system solves a problem correctly, it receives a constant reward equal to 1000; otherwise it receives a constant reward equal to 0. Each problem is either a learning problem or a test problem. In learning problems, XCS selects actions randomly from those represented in the match set. In test problems, XCS always selects the action with the highest prediction. The genetic algorithm is enabled only during learning problems and is turned off during test problems. The covering operator is always enabled, but operates only if needed. Learning problems and test problems alternate. The performance is computed as the moving average of the correctly classified examples in the last 50 test problems. All the statistics reported in this paper are averaged over 20 experiments. In all the experiments, the bias for perceptron-based and neural-based anticipation is 1; the learning rate for perceptron-based and neural-based anticipation is always 0.2.

V. EXPERIMENTS

In the first set of experiments, we applied XCS with anticipation mappings (briefly, XCSAM) to woods environments [21]. These are simple grid worlds like those depicted in Figure 1, which contain obstacles (T), empty positions, and goal positions (F). There are eight sensors, one for each adjacent cell. Each sensor is encoded with two bits: 10 indicates the presence of an obstacle T; 11 indicates a goal F; 00 indicates an empty cell. Classifier inputs are thus 16 bits long (2 bits × 8 cells). There are eight possible actions, encoded with three bits. The system has to learn how to reach the goal positions from any free position; the system always occupies a free position and can move to any surrounding free position; when the system reaches the goal position (F), the problem ends and the system is rewarded with 1000; in all the other cases the system receives zero reward.

We applied XCSAM with perceptron-based and neural-based anticipation to the very simple Woods1 (Figure 1a) using the usual settings [21] and a population of 1600 classifiers. Figure 2 reports (i) the performance of XCSAM, computed as the average number of steps to the goal position (Figure 2a), and (ii) the number of macroclassifiers in the population (Figure 2b). Note that, since the anticipatory prediction does not influence XCSAM performance and generalization, there is no difference between the performance and generalization of the two versions considered.
Accordingly, Figure 2 only reports the plots for perceptron-based anticipation. Figure 3 compares the anticipatory performance of perceptron-based and neural-based anticipation, measured as (i) the percentage of the next state that is correctly predicted during test problems (Figure 3a) and (ii) the average mean square error (MSE) affecting the next-state prediction during test problems (Figure 3b). Perceptron-based prediction easily reaches 100% accuracy on the next-state prediction (Figure 3a), while the mean square error affecting the classifier anticipation rapidly drops to 0. Neural anticipation is slightly slower but also easily reaches optimal anticipatory prediction. When compared with the results reported in [18, Figures 3 to 8], we note that our simple extension of XCS, which works on an elementary ternary representation, learns faster (XCSAM reaches optimal performance within 200 learning problems, while X-NCS needs around 500 to reach optimality) and needs fewer classifiers (at the end, XCSAM needs around 800 classifiers, while X-NCS needs around 1200). XCSAM also reaches 100% anticipation accuracy faster than X-NCS: perceptron-based anticipation reaches optimal anticipatory prediction in 200 learning problems, whereas X-NCS needs around 2000 learning problems.

When we move to more complex environments like Maze5 (Figure 1b) and Maze6 (Figure 1c), the difference between perceptron-based and neural-based prediction becomes even smaller. We applied XCS with perceptron-based and neural-based prediction to Maze5 and Maze6 with a population of 4000 classifiers; θnma is set to 5, and during exploration XCS selects a random action with probability 1.0; all the other parameters are set as usual [11]. These settings allow XCS to perform optimally in Maze5 without any additional help from techniques like those considered in [11], [5]. Figure 4 reports the performance and the number of macroclassifiers in the population for XCSAM in Maze5. Figure 5 compares the anticipatory performance of the two versions of XCS in Maze5: both perceptron-based and neural-based anticipation easily reach 100% performance, while the mean square error affecting the prediction rapidly drops to zero. Because it is simpler, perceptron-based anticipation reaches optimal anticipatory prediction in less than 200 learning problems; neural-based anticipation needs around 1500 learning problems to reach the same accuracy. Similar results are obtained for Maze6 (see Figure 6). When we compare our results in Maze5 with those reported in [18, Figures 10 to 12] for X-NCS(LNN), we note that XCSAM and X-NCS perform similarly in terms of learning speed: both models reach optimal performance around 1000 problems (Figure 4 and [18, Fig. 10]). XCSAM works on a simpler representation and thus requires fewer classifiers, around 3000 (Figure 4), than X-NCS(LNN), which requires around 5000 [18, Fig. 11]. XCSAM with perceptron-based anticipation is faster than X-NCS, reaching 100% accuracy in less than 200 learning problems whereas X-NCS needs around 2000 problems. However, neural-based anticipation performs similarly in XCSAM and X-NCS.
In fact, both models need 2000 problems to reach 100% anticipation accuracy [18].

VI. ENVIRONMENTS WITH ALIASING
Fig. 1. The Woods1 (a), Maze5 (b), and Maze6 (c) environments.

Fig. 2. XCS applied to Woods1: (a) performance; (b) number of macroclassifiers.

Fig. 3. Perceptron-based and neural-based anticipation in Woods1: (a) anticipation accuracy; (b) anticipatory error.

In the previous set of experiments we focused on Markov problems, in which the consequences of the system's actions depend only on the current state. In the next set of experiments, we consider environments in which perceptual aliasing makes the anticipatory task more difficult. At first, we applied XCSAM with perceptron-based and neural-based prediction to Woods2, depicted in Figure 7. This is a grid world with two types of obstacles (represented by "O" and "Q"), two types of goal positions (represented by "F" and "G"), and empty positions. Woods2 is a torus: its left and right edges are connected, as are its top and bottom edges. As in the previous environments, XCSAM can stay in any of the empty positions and can move to any adjacent empty position. XCSAM has eight sensors, one for each adjacent position. Sensors are encoded by three bits coding features of the object: one bit determines whether the position is empty, one bit determines the type of obstacle ("O" or "Q"), and one bit determines the type of goal ("F" or "G"). Thus, the agent's sensory input is a string of 24 bits (3 bits × 8 positions). The available actions and the reward policy are the same as in the previous experiments. Woods2 has a larger input space than the previous environments and allows many generalizations [22]. Accordingly, Woods2 makes the anticipatory task more difficult, since the more general the condition, the more states the anticipatory function associated with each classifier needs to predict. More importantly, Woods2 is a Class 1 environment [17]: it has some aliased states which, however, do not affect the possibility of solving the problem even without internal memory [1]. For instance, the three positions highlighted with a circle in Figure 7 are aliased (i.e., they are perceived as identical by the system although they are actually different positions), so that the same action has different consequences: when the system moves north, the next state differs depending on which aliased position it is in. As a consequence, the anticipatory prediction will be affected by uncertainty. Figure 8 reports (a) the performance and (b) the number of macroclassifiers in the population for XCS with perceptron-based anticipation.¹ As expected, XCS reaches optimal performance, since the aliased positions do not affect the incoming reward [17], [1]. Figure 9 compares the anticipatory behavior obtained by perceptron-based and neural-based anticipation. Both approaches reach the same accuracy, being able to correctly predict, on average, 96% of the next state. The problem is slightly more difficult than the previous ones; as a consequence, the neural networks are slower. The analysis of the final populations reveals that the bits affected by uncertain prediction are the ones that determine the type of obstacle and the type of goal. This is consistent with what we can easily observe in Figure 7. The environment has a regular structure (obstacles and goals appear in the same positions); thus the perceptual aliasing does not affect the prediction of whether positions are empty, contain an obstacle, or contain a goal. In the aliased positions in Figure 7, the action that moves the system north always has the same consequence in terms of empty/occupied positions: the system always faces a wall of obstacles. The perceptual aliasing comes from the type of obstacles that the system encounters in each of the consequent positions.

¹We remind the reader that there is no difference between the performance of the two XCS models, since anticipations do not take part in the learning process as happens in some ACS models [4].

Next, we considered an environment in which perceptual aliasing also affects the performance. The Woods101 environment in Figure 10 is non-Markov: it has two distinct positions, indicated by the two arrows, which the system perceives as identical but which require different optimal actions [7]. In the right aliased position the optimal action is "go south-west"; in the left aliased position the optimal action is "go south-east." When XCSAM is in one of the aliased positions, the effect of its action will differ, although from the system's viewpoint the two positions are perceived as one. Accordingly, in the two aliased positions,
anticipatory prediction will be affected by noise. Figure 11 reports (a) the performance and (b) the number of macroclassifiers in the population for XCSAM with perceptron-based anticipation in Woods101. As expected, XCSAM does not reach optimal performance, which would require an additional memory mechanism [7], [16]. Nevertheless, XCSAM is able to learn a policy that allows the system to reach a goal position in a reasonable number of actions. Figure 12a compares the anticipatory accuracy obtained by perceptron-based and neural-based anticipation. In this case, perceptron-based anticipation reaches a slightly more accurate prediction (98%) than neural-based prediction (94%). Similarly, the mean square error affecting anticipations is lower for perceptron-based than for neural-based prediction (Figure 12b). In this environment, the perceptual aliasing has a stronger effect on the system performance, and to reach the goal position XCS quickly adapts its strategy based on the aliased position it enters. Accordingly, perceptron-based anticipation, which is faster than neural-based anticipation, can adapt faster and thus obtain, on average, a more accurate anticipatory prediction.

Fig. 4. XCS applied to Maze5: (a) performance; (b) number of macroclassifiers.

Fig. 5. Perceptron-based and neural-based anticipation in Maze5: (a) anticipation accuracy; (b) anticipatory error.

VII. CONCLUSIONS

In this paper, we introduced a version of XCS in which anticipatory prediction is computed using either an array of perceptrons or a neural network. We compared the anticipatory and generalization performance of XCS with anticipation mappings, briefly XCSAM, with that of the neural classifier system, X-NCS, introduced by O'Hara and Bull [18]. Our results show that XCSAM can quickly reach optimal anticipatory performance while needing smaller populations than X-NCS. They also show that the simple approach based on perceptron arrays provides performance competitive with the more advanced neural-based anticipations. Overall, we believe that anticipation mappings represent a promising approach to the development of classifier systems with anticipatory capabilities. While in this work we applied anticipation mappings only to simple binary anticipation problems, the approach can be generalized to environments involving continuous inputs, such as those considered in [15].
REFERENCES

[1] Anthony J. Bagnall and Zhanna Zatuchna. On the classification of maze problems. Volume 183 of Studies in Fuzziness and Soft Computing, pages 307–316. Springer, 2005.
[2] Larry Bull. Lookahead and latent learning in a simple accuracy-based learning classifier system. In X. Yao et al., editors, Parallel Problem Solving from Nature – PPSN VIII, pages 1042–1050. Springer-Verlag.
[3] Larry Bull and Toby O'Hara. Accuracy-based neuro and neuro-fuzzy classifier systems. In William B. Langdon, Erick Cantú-Paz, Keith E. Mathias, Rajkumar Roy, David Davis, Riccardo Poli, Karthik Balakrishnan, Vasant Honavar, Günter Rudolph, Joachim Wegener, Larry Bull, Mitchell A. Potter, Alan C. Schultz, Julian F. Miller, Edmund K. Burke, and Natasa Jonoska, editors, GECCO, pages 905–911. Morgan Kaufmann, 2002.
[4] Martin V. Butz. Anticipatory Learning Classifier Systems, volume 4 of Genetic Algorithms and Evolutionary Computation. Springer-Verlag, 2000.
Fig. 6. Perceptron-based and neural-based anticipation in Maze6: (a) anticipation accuracy; (b) anticipatory error.
Fig. 7. The Woods2 environment.
Fig. 8. XCS applied to Woods2: (a) performance; (b) number of macroclassifiers.

Fig. 9. Perceptron-based and neural-based anticipation in Woods2: (a) anticipation accuracy; (b) anticipatory error.
[5] Martin V. Butz, David E. Goldberg, and Pier Luca Lanzi. Gradient descent methods in learning classifier systems: Improving XCS performance in multistep problems. IEEE Transactions on Evolutionary Computation, 9(5):452–473, October 2005.
[6] Martin V. Butz and Stewart W. Wilson. An algorithmic description of XCS. Soft Computing, 6(3–4):144–153, 2002.
[7] Dave Cliff and Susi Ross. Adding memory to ZCS. Adaptive Behavior, 3(2):101–150, 1994.
[8] David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Mass., 1989.
[9] Simon Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, 1998.
[10] J. H. Holland. Escaping brittleness: The possibilities of general purpose learning algorithms applied to parallel rule-based systems. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning, An Artificial Intelligence Approach, volume 2, chapter 20, pages 593–623. Morgan Kaufmann, Los Altos, CA, 1986.
[11] Pier Luca Lanzi. An analysis of generalization in the XCS classifier system. Evolutionary Computation, 7(2):125–149, 1999.
Fig. 10. The Woods101 environment.
[12] Pier Luca Lanzi. Learning classifier systems from a reinforcement learning perspective. Soft Computing – A Fusion of Foundations, Methodologies and Applications, 6(3):162–170, 2002.
[13] Pier Luca Lanzi. Estimating classifier generalization and action's effect: A minimalist approach. In E. Cantú-Paz et al., editors, Genetic and Evolutionary Computation – GECCO-2003, volume 2724 of LNCS, pages 1894–1905, Chicago, 12–16 July 2003. Springer-Verlag.
[14] Pier Luca Lanzi, Daniele Loiacono, Stewart W. Wilson, and David E. Goldberg. XCS with computed prediction for the learning of Boolean functions. In Proceedings of the IEEE Congress on Evolutionary Computation – CEC-2005, pages 588–595, Edinburgh, UK, September 2005. IEEE.
[15] Pier Luca Lanzi, Daniele Loiacono, Stewart W. Wilson, and David E. Goldberg. XCS with computed prediction in continuous multistep environments. In Proceedings of the IEEE Congress on Evolutionary Computation – CEC-2005, pages 2032–2039, Edinburgh, UK, September 2005. IEEE.
[16] Pier Luca Lanzi and Stewart W. Wilson. Toward optimal classifier system performance in non-Markov environments. Evolutionary Computation, 8(4):393–418, 2000.
[17] Michael L. Littman. An optimization-based categorization of reinforcement learning environments. In Jean-Arcady Meyer, Herbert L. Roitblat, and Stewart W. Wilson, editors, From Animals to Animats: Proceedings of the Second International Conference on Simulation of Adaptive Behavior, pages 262–270. MIT Press, 1992.
[18] Toby O'Hara and Larry Bull. Building anticipations in an accuracy-based learning classifier system by use of an artificial neural network. In IEEE Congress on Evolutionary Computation, pages 2046–2052. IEEE Press, 2005.
[19] Rick L. Riolo. Lookahead planning and latent learning in a classifier system. In J. A. Meyer and S. W. Wilson, editors, From Animals to Animats 1: Proceedings of the First International Conference on Simulation of Adaptive Behavior (SAB90), pages 316–326. A Bradford Book, MIT Press, 1990.
[20] Wolfgang Stolzmann. An introduction to anticipatory classifier systems. In Learning Classifier Systems: From Foundations to Applications, volume 1813 of LNAI, pages 175–194, Berlin, 2000. Springer-Verlag.
[21] Stewart W. Wilson. Classifier fitness based on accuracy. Evolutionary Computation, 3(2):149–175, 1995. http://prediction-dynamics.com/.
[22] Stewart W. Wilson. Generalization in the XCS classifier system. In Genetic Programming 1998: Proceedings of the Third Annual Conference, pages 665–674. Morgan Kaufmann, 1998.
[23] Stewart W. Wilson. Classifiers that approximate functions. Natural Computing, 1(2–3):211–234, 2002.
Fig. 11. XCS applied to Woods101: (a) performance; (b) number of macroclassifiers.
Fig. 12. Perceptron-based and neural-based anticipation in Woods101: (a) anticipation accuracy; (b) anticipatory error.