A Measure of Landscapes

Wim Hordijk

SFI Working Paper 1995-05-049
[email protected] May 1995 Abstract The structure of a tness landscape is still an ill-dened concept. This paper introduces a statistical tness landscape analysis, that can be used on a multitude of tness landscapes. The result of this analysis is a statistical model that, together with some statistics denoting the explanatory and predictive value of this model, can serve as a measure for the structure of the landscape. The analysis is based on a statistical time series analysis known as the Box-Jenkins approach, that, among others, estimates the autocorrelations of a time series of tness values generated by a random walk on the landscape. From these estimates, a correlation length for the landscape can be derived.
Keywords:
Fitness landscapes, Correlation structure, Correlation length
1 Introduction

"We need a real theory relating the structure of rugged multipeaked fitness landscapes to the flow of a population upon those landscapes. We do not yet have such a theory."
This quote, from Stuart A. Kauffman [Kau93], states the goal of this paper. Its purpose is to contribute another brick in the wall of this (badly needed) theory. The part of the theory this paper focuses on is the structure of fitness landscapes.

The fitness landscape paradigm is used more and more in evolutionary computation, without being very well defined. Recently, a new landscape model was introduced that tries to put an end to this ambiguity [Jon95]. As with the landscape itself, the structure of a fitness landscape is also still an ill-defined concept. So far, different methods and definitions have been used to define and measure this structure. This paper introduces a statistical fitness landscape analysis that can be used on a multitude of fitness landscapes, the result of which is an expression for the structure of the landscape in the form of a statistical model. This model, together with some statistics denoting its explanatory and predictive value, can be used to characterize the landscape and to compare different landscapes with each other. The analysis is based on the above-mentioned landscape model. This model, together with the landscape analysis introduced here, can serve as a "standard" for building a theory that relates the structure of a fitness landscape to the population flow on it.

In the next section, the concept of a fitness landscape is explained, and the model introduced in [Jon95] is described. Also, the NK-model, introduced by Kauffman [Kau89], is explained. The NK-model is used in this paper for constructing landscapes that differ in "ruggedness". In Section 3, different statistical concepts and definitions are given, which are put together to construct the statistical fitness landscape analysis. In Section 4, this landscape analysis is applied to different fitness landscapes. Finally, in Section 5, the major conclusions that can be drawn from the application of the landscape analysis are given.
2 Fitness landscapes

The notion of a fitness landscape comes from biology (first introduced in [Wri32]), where it is used as a framework for thinking about evolution. This approach turned out to be very useful, which is why the landscape paradigm has also been adopted in the field of evolutionary computation. A fitness landscape is defined by three things:
1. A genotype space (i.e., the space of all possible genotypes).
2. A neighborhood relation that denotes which points are each other's neighbors in the genotype space.
3. A fitness function that attaches a fitness value to each point in the space.
Most (if not all) people agree on this definition, but there seems to be no real agreement about what exactly the neighborhood relation should be. Mostly, this relation is based on some metric that is the most natural for the chosen genotype space (see for example [Pal91]), for example the Hamming distance for bit strings. This, however, seems to miss an important notion of evolutionary processes. The landscape metaphor was originally introduced to gain insight into the working of evolutionary processes, and evolutionary processes are driven by certain operators (for example mutation and selection). So it would be more natural to define the neighborhood relation in terms of the specific operator that is studied. This is exactly what is done in the landscape model introduced in [Jon95].

In this landscape model, the fitness landscape is defined as a directed graph. In this graph, the vertices are points in a space of possible inputs and outputs for the specific operator that is considered. Each vertex is labeled with a fitness value that is defined by the fitness function. An arc from point v to point w is labeled with the probability that point v is transformed into point w by the specific operator.

A vertex in such a graph does not necessarily correspond to one genotype. The crossover operator, for example, may take two genotypes as its input and produce two genotypes as output. In this case, a vertex in the graph is a pair of genotypes. So in this landscape model, the "genotype space" should be taken in a broader sense. As the fitness label of such a vertex one can take, for example, the average of the fitness values (as defined by the fitness function) of the two genotypes making up the vertex.

In this paper, fitness landscapes with different structures, or different "ruggedness", are considered. For all landscapes, bit strings are taken as genotypes. Three different operators are considered, each of which defines its own neighborhood relation. The operators are described in Section 4.1. For the fitness function, the NK-model is used. This model is explained in Section 2.2. The next section explains the above-mentioned landscape model in more detail.
2.1 The landscape model

In this paper, only operators that take one genotype as input and produce one genotype as output are considered, so the landscape model will be explained for this case only. See [Jon95] for the complete description of the model. A fitness landscape L is defined as a 5-tuple

    L = (R, \phi, f, F, >_F)

where the five components mean the following:

R: a representation space (or a genotype space, to keep the terminology used above).
φ: an operator, defined as φ: R × R → [0, 1]. If v, w ∈ R, then p = φ(v, w) denotes the chance that v is transformed into w by a single application of the stochastic procedure represented by φ.
f: a fitness function defined as f: R → F, attaching to each genotype a value that is an element of F.
F: a fitness space (i.e., a set of possible fitness values).
>_F: a partial order over F. This means that if v, w ∈ R and f(v) >_F f(w), then v is "better" than w.
This 5-tuple can be viewed as defining a directed, labeled graph G_L = (V, E), where V = R, E ⊆ V × V, and (v, w) ∈ E if and only if φ(v, w) > 0. A vertex v is labeled with f(v) ∈ F, and an edge (v, w) is labeled with φ(v, w), i.e., the probability that the operator produces w from v. The partial order >_F is used to determine relative fitness. When φ(v, w) = φ(w, v) for all v, w ∈ V, the landscape graph is considered undirected (each undirected edge denotes two directions). A simple example is given next, to make the model intuitively clearer:
R: the space of bit strings of length 2, or {0, 1}^2.
φ: point mutation, i.e., a randomly chosen bit in the bit string is flipped (0 ↔ 1).
f: ONEMAX, i.e., the number of 1's in the bit string.
F: {0, 1, 2}.
>_F: 2 >_F 1 >_F 0.

The fitness landscape in this case is the (undirected) graph in Figure 1.
Figure 1: A simple fitness landscape for bit strings of length 2, with point mutation as operator and ONEMAX as fitness function. All edge probabilities are 1/2.
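As an illustration of how small such a landscape graph is to enumerate, here is a minimal sketch in Python (the function names are mine, not from the paper) that prints the vertices, fitness labels, and edge probabilities of exactly this example:

```python
from itertools import product

def onemax(genotype):
    """ONEMAX fitness: the number of 1's in the bit string."""
    return sum(genotype)

def point_mutation_edges(genotype):
    """Yield (neighbor, probability) pairs under point mutation:
    each of the L bits is flipped with probability 1/L."""
    L = len(genotype)
    for i in range(L):
        neighbor = list(genotype)
        neighbor[i] = 1 - neighbor[i]
        yield tuple(neighbor), 1.0 / L

# Enumerate the landscape graph G_L of Figure 1: bit strings of length 2,
# point mutation as operator, ONEMAX as fitness function.
for v in product((0, 1), repeat=2):
    for w, p in point_mutation_edges(v):
        print(f"{v} (f={onemax(v)}) --{p:.2f}--> {w} (f={onemax(w)})")
```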
Three properties of the landscapes that result from this model are important for the remainder of this paper.

First, when the operator defining the neighborhood relation takes as many genotypes as input as it produces as output, the resulting landscape is called walkable. This is the case, for example, for mutation (one genotype as input and one as output) and some forms of crossover (two genotypes as input and two as output). The output of applying the operator once can be used as the input for a second application, and so on. This makes the landscape walkable by the operator.

Second, an operator is called symmetric if φ(v, w) = φ(w, v) for all v, w ∈ V. This means that the probability of producing w from v is equal to the probability of producing v from w. This is the case in the landscape in Figure 1. The resulting graph is undirected.

Third, a landscape is connected if there exists a path in the graph between every pair of vertices. The landscape in Figure 1 is connected. If a landscape is not connected, then it consists of several connected components, each with a path between every pair of vertices within that component. Note that a component can consist of only one vertex, which is connected only to itself.
2.2 The NK-model

The NK-model was introduced by Kauffman as a problem-independent model for constructing fitness landscapes that can be tuned from smooth to rugged (see [Kau89]). The main parameters of the model are N, the number of parts, or genes, in the genotype, and K, the number of genes that epistatically influence a particular gene. Another parameter is A, the number of alleles every gene has. It appears, though, that almost all properties of the NK-model are independent of A, so here A will be set to 2, i.e., the genotypes are just bit strings (strings of 0's and 1's). The NK-model is explained here for the case A = 2, but it can easily be extended to larger values of A.

Suppose every bit b_i (i = 1, ..., N) in the bit string b is assigned a fitness f_i of its own. The fitness f_i of a bit b_i, however, does not only depend on the value (0 or 1) of the bit itself, but also on the values of K other bits b_j in the same bit string (0 ≤ K ≤ N − 1). These dependencies are called epistatic interactions.

So, the fitness contribution of one bit depends on the values of K + 1 bits (itself and K others), giving rise to a total of 2^(K+1) possibilities. Each of these possibilities is assigned a random fitness value, drawn from a uniform probability distribution between 0.0 and 1.0. Therefore, the fitness contribution f_i of bit b_i is specified by a list of 2^(K+1) random decimals between 0.0 and 1.0. This assignment of fitness values is repeated for every bit b_i, i = 1, ..., N, in the bit string b.
Having thus assigned the fitness contributions for every bit in the string, the fitness of the entire bit string, or genotype, is now defined as the average of the fitness contributions of all the bits:

    F = \frac{1}{N} \sum_{i=1}^{N} f_i
One further aspect of the NK-model characterizes how the K epistatic interactions for each bit are chosen. Generally, this is done in one of two ways. The first way is by choosing them at random (without repetition) from among the other N − 1 bits. This is called random interactions. It is important to note that no reciprocity in epistatic influence is assumed: if the fitness of bit b_i (partly) depends on bit b_j, the reverse does not necessarily hold. So, the epistatic interactions for one bit are determined independently of the epistatic interactions of other bits, but once they are chosen, they remain fixed.
The second way of choosing the epistatic interactions for a bit is by taking the K neighboring bits: the K/2 bits on each side of a bit influence the fitness of this bit. This is called nearest neighbor interactions. To make this possible, periodic boundary conditions are used. This means that the bit string is considered circular, so the first and the last bit are each other's neighbors. Note that for K = 0 and K = N − 1, there is no difference between the two types of interactions. In the first case, the fitness of each bit depends only on its own value, and in the second case, the fitness of each bit depends on the values of all the bits in the string.

By increasing the value of K from 0 to N − 1, the fitness landscape can be tuned from "smooth" to "rugged". When K is small, neighboring bit strings (that is, neighboring in terms of the operator that defines the neighborhood relation) will have a relatively small difference in fitness, because the bits that differ between the two strings influence the fitness contributions of only a few bits in both strings. So only a few bits in both strings will have a different fitness contribution, which has a small influence on the overall fitness of both strings.
When K is large, however, the differing bits of two neighboring strings influence the fitness of a large number of bits in both strings, which has a big influence on the overall fitness of both strings. When K = N − 1, the fitness landscape will be completely random, because changing the value of only one bit changes the fitness contributions of all bits in the string, so the overall fitness will be completely different (the larger N is, the more the fitness values will cluster around 0.5, however).

The NK-model is used here as a fitness function for constructing fitness landscapes in the way explained in the previous section. Changing the value of N and/or K changes the fitness function, and thus results in a different fitness landscape. One way to characterize different fitness landscapes is by their correlation structure. The next section introduces a way to measure and express this correlation structure.
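The construction above is straightforward to implement. The following is a minimal sketch of an NK fitness function (the names and data layout are mine; the paper gives no implementation). Note that each bit's lookup table has 2^(K+1) entries, so this direct construction is practical only for modest K:

```python
import random

def make_nk_fitness(N, K, nearest_neighbor=False, seed=0):
    """Construct a random NK fitness function with A = 2 (bit strings)."""
    rng = random.Random(seed)
    if nearest_neighbor:
        # The K surrounding bits, with periodic (circular) boundaries.
        partners = [[(i + d) % N for d in range(-(K // 2), K - K // 2 + 1) if d != 0]
                    for i in range(N)]
    else:
        # K other bits chosen at random, without repetition, fixed thereafter.
        partners = [rng.sample([j for j in range(N) if j != i], K) for i in range(N)]
    # For each bit, a table of 2^(K+1) uniform random fitness contributions.
    # (Only practical for modest K; K = 99 would need 2^100 entries.)
    tables = [[rng.random() for _ in range(2 ** (K + 1))] for _ in range(N)]

    def fitness(bits):
        total = 0.0
        for i in range(N):
            index = bits[i]
            for j in partners[i]:
                index = (index << 1) | bits[j]   # pack the K+1 relevant bits
            total += tables[i][index]
        return total / N                          # F = (1/N) * sum of the f_i

    return fitness

f = make_nk_fitness(N=100, K=5, seed=42)
genotype = [random.Random(1).randint(0, 1) for _ in range(100)]
print(f(genotype))  # a value between 0.0 and 1.0
```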
3 A fitness landscape analysis

As a measure for characterizing a fitness landscape, the correlation structure of the landscape is often used. As with the definition of the fitness landscape itself, there appears to be no general agreement about what exactly the correlation structure and correlation length of a fitness landscape are, or how to measure them. Different approaches to measuring the correlation structure and defining the correlation length have been used so far (see for example [Wei90, MWS91, Lip91]). In this paper, aspects of all of these are combined and extended to construct a statistical fitness landscape analysis, based on a time series analysis. The result of this fitness landscape analysis is a general expression, in the form of a statistical model, for the correlation structure of the landscape. Different fitness landscapes can then be compared on the basis of this statistical model. Moreover, the same analysis can be restricted to just a subspace of the entire landscape, making a more detailed analysis possible, if desired.

The next two subsections explain the autocorrelation function, on which the analysis is based, and give the statistical definition of correlation length. The following two subsections introduce the concept of time series analysis and describe such an analysis, known as the Box-Jenkins approach. Finally, in the last subsection, the entire landscape analysis is put together.
3.1 The autocorrelation function

Weinberger introduced a procedure to measure the correlation structure of fitness landscapes based on the autocorrelation function [Wei90]. The idea is to generate a random walk on the landscape via neighboring points. His landscape model is based on point mutation, so neighboring points are points in the landscape that differ in exactly one bit (when the genotypes are bit strings). Starting at a randomly chosen point, at every step one randomly chosen bit in the string is flipped, and at each step the fitness of the genotype encountered is recorded. This way, a time series of fitness values is generated. Next, the autocorrelation function is used to determine the correlation structure of this time series.

The autocorrelation function relates the values of two points in the time series that are i steps (called time lags in the case of a time series) apart. The autocorrelation for time lag i of a time series y_t, t = 1, ..., T, is defined as

    \rho_i = \mathrm{Corr}(y_t, y_{t+i}) = \frac{E[y_t y_{t+i}] - E[y_t]\,E[y_{t+i}]}{\mathrm{Var}(y_t)}

where E[y_t] is the expected value of y_t and Var(y_t) is the variance of y_t. It always holds that −1 ≤ ρ_i ≤ 1. If |ρ_i| is close to one, there is much correlation between two values i steps apart; if it is close to zero, there is hardly any correlation. Estimates of these autocorrelations are

    r_i = \frac{\sum_{t=1}^{T-i} (y_t - \bar{y})(y_{t+i} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

where \bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t and 0 ≤ i < T.
Weinberger's fitness landscape is based on point mutation, or rather Hamming distance, and remains that way even when operators other than point mutation are applied to it. The values of r_i, the estimates of ρ_i, that result from this analysis are considered the correlation structure of the landscape. In Section 3.5, this will be generalized to incorporate other operators, and thus other neighborhood relations, as well.
One further important assumption made here is that the fitness landscape is statistically isotropic. This means that the statistics of the time series generated by a random walk are the same regardless of the starting point. In other words, the landscape looks (globally) the same everywhere. The significance of statistical isotropy is that the random walk is "representative" of the entire landscape, and thus that the correlation structure of the time series can be regarded as the correlation structure of the landscape. This, too, will be generalized in Section 3.5. The fitness landscape analysis introduced in this paper is based on Weinberger's method of generating a time series of fitness values by a random walk, and estimating the autocorrelations of this time series.
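For concreteness, the estimator r_i can be computed directly from a time series. A minimal sketch, assuming numpy (the example series is a synthetic AR(1) process, not an actual landscape walk):

```python
import numpy as np

def autocorrelations(y, max_lag):
    """Estimate r_i for i = 1, ..., max_lag using the estimator of Section 3.1."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = np.sum(d * d)  # sum over t of (y_t - ybar)^2
    return np.array([np.sum(d[:-i] * d[i:]) / denom for i in range(1, max_lag + 1)])

# Illustrative series: a synthetic AR(1) process with phi_1 = 0.95.
rng = np.random.default_rng(0)
T = 10_000
y = np.empty(T)
y[0] = 0.5
eps = rng.normal(scale=0.01, size=T)
for t in range(1, T):
    y[t] = 0.025 + 0.95 * y[t - 1] + eps[t]
print(autocorrelations(y, 5))  # should taper off roughly as 0.95**i
```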
3.2 Correlation length

There still seems to be no agreement about an exact definition of the correlation length of a fitness landscape, since everybody uses their own definition (see for example [Wei90, MWS91, Lip91]). In a statistical sense, the correlation length gives an indication of the largest "distance", or time lag, between two points at which the value of one point can still provide some information about the expected value of the other point. In other words, the correlation length is the largest time lag i for which one can still expect some correlation between two points i steps apart.

In statistics, it is usual to compare an estimated value with its two-standard-error bound¹ to see whether the estimated value is significantly different from zero. For the r_i, the estimates of the ρ_i (see Section 3.1), this two-standard-error bound is 2/√T (see for example [J+88]). So the statistical definition of the correlation length of a time series is one less than the first time lag i for which the estimated autocorrelation r_i falls inside the region (−2/√T, +2/√T), and thus becomes (statistically) equal to zero. This way, the correlation length of a time series is the largest time lag i for which the correlation between two points i steps apart is still statistically significant. In this paper, this statistical definition of correlation length is used. This measure differs from the ones used in, for example, [Wei90, MWS91, Lip91].
¹ The statistical estimation of a variable will never be exact, but contains some uncertainty. The standard error of an estimated value gives an indication of the amount of this uncertainty.
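This definition translates into a few lines of code; a minimal sketch (assuming the r_i have already been estimated, for instance with the function in the previous sketch):

```python
import math

def correlation_length(r, T):
    """One less than the first lag i at which r_i falls inside
    (-2/sqrt(T), +2/sqrt(T)); if no r_i does, report the largest lag examined."""
    bound = 2.0 / math.sqrt(T)
    for i, r_i in enumerate(r, start=1):
        if abs(r_i) < bound:
            return i - 1
    return len(r)  # still significant at every lag examined ("> max lag")

# With T = 10,000 the bound is 2/sqrt(10000) = 0.02, as used in Section 4.
print(correlation_length([0.9, 0.5, 0.1, 0.03, 0.01], T=10_000))  # -> 4
```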
3.3 Time series analysis

Methods used so far for measuring the correlation structure of fitness landscapes somehow generate a time series of fitness values, as in the method of Weinberger [Wei90], and calculate the autocorrelations of this time series. Or they start with a random population, apply an operator to all members of the population, and calculate the correlation between the fitness values of the parent and offspring populations ([MWS91, Lip91]). This is essentially the same as the first method, where the original time series can be viewed as the parent population, and the same time series shifted one time lag as the offspring population. Moreover, the time series method has the advantage of being able to consider more generations (time lag > 1) with one and the same time series, without the danger of converging populations.

But this measuring of correlation is all that is done. Having a time series at hand, however, invites one to do a complete time series analysis. This involves identifying an appropriate model that adequately represents the data-generating process (in this case the random walk that generated the time series), estimating the parameters of this model, and applying statistical tests to see how well the estimated model approximates the given data and what the explanatory and predictive value of the model is. In other words, a model of the form
    y_{t+1} = f(y_t, y_{t-1}, \ldots, y_0)

is derived from the observed data, which can be used to simulate the outcome of a random process similar to the one that generated the original data, or to predict future values in the time series. This model, together with some statistics that denote the explanatory and predictive value of the estimated model, provides (statistically) complete information about the time series and the process that generated it. So this provides far more information about the landscape on which the time series was produced than calculating only the (auto)correlations.

The next section introduces a time series analysis that is based on the estimation of the autocorrelations of a given time series. This time series analysis is known as the Box-Jenkins approach, and it is used in the fitness landscape analysis introduced in this paper.
3.4 The Box-Jenkins approach

The Box-Jenkins approach [BJ70] is a very useful standard statistical method of model building, based on the analysis of a time series y_1, y_2, ..., y_T generated by a stochastic process. The purpose of the Box-Jenkins approach is to find an ARMA model² that adequately represents this data-generating process. Once an adequate model is found, it can be used to make forecasts about future values, or to simulate a process similar to the one that generated the original data.

An ARMA model represents an autoregressive moving-average process, and is obtained by combining an autoregressive (AR) process and a moving-average (MA) process. An AR process of order p (AR(p)) has the form

    y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t

where the stochastic variable ε_t is white noise, that is, E[ε_t] = 0, Var(ε_t) < ∞ for all t, and Cov(ε_s, ε_t) = 0 for s ≠ t, so all ε_t are independent of each other. So each value in an AR(p) process depends on p past values and a stochastic variable ε_t.

² Actually, it should be an ARIMA model if the original time series is not stationary. This case will not be dealt with here.
An MA process of order q (MA(q)) has the form

    y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}

where ε_t is again white noise. So each value in an MA(q) process is a weighted sum of members of a white noise series. An ARMA(p, q) process, then, is a combination of an AR(p) and an MA(q) process:

    y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}

The mean of a time series generated by one of these three processes is zero. If this does not match the original data, a constant c can be added to the model, resulting in a non-zero mean of the time series.
In economics (and business) the Box-Jenkins approach is used frequently when a model is needed to make forecasts about future values of some (partly) stochastic variable, for example the price of some commodity, or the index of industrial production (see for example [Gra89]). The approach consists of three stages:
1. Identification, in which a choice is made for an appropriate ARMA model, by looking at the (partial) autocorrelations of the time series.
2. Estimation, in which the parameters of the chosen model are estimated.
3. Diagnostic checking, which involves applying various tests to see if the chosen model is adequate.
Next, the three stages of the approach are explained in more detail.
Identification
At the identification stage, an appropriate model is specified on the basis of the correlogram and the partial correlogram. The correlogram of a time series y_t is a plot of the (estimated) autocorrelations (the r_i as given in Section 3.1) of this series against the time lag i. The partial correlogram is the plot of the (estimated) partial autocorrelations of the time series against the time lag. It will not be explained here how to calculate the partial autocorrelations (see [BJ70, Gra89, J+88]), but the i'th partial autocorrelation can be interpreted as the estimated correlation between y_t and y_{t+i}, after the effects of all intermediate y's on this correlation are taken out. The choice of an appropriate model can now be made on the following basis:

- If the correlogram tapers off to zero and the partial correlogram suddenly "cuts off" after some point, say p, then an appropriate model is AR(p). To determine this cut-off point p, the partial autocorrelations are compared with their two-standard-error bound, which is 2/√T (T being the length of the time series).
- If the correlogram "cuts off" after some point, say q, and the partial correlogram tapers off to zero, then an appropriate model is MA(q). Here the cut-off point is also determined by comparing the autocorrelations with their two-standard-error bound of 2/√T.
- If neither diagram "cuts off" at some point, but both taper off, then an appropriate model is ARMA(p, q). The values of p and q have to be inferred from the particular pattern of the two diagrams.
Estimation
Once the appropriate model is chosen, the parameters of this model can be estimated. This is achieved by using the estimates of the autocorrelations of the given time series (see Section 3.1). From these values, estimates for the parameters of the model can be derived (see again [BJ70, Gra89, J+88]).

As a measure of significance of the estimated parameters, the t-statistic is used. This statistic is defined as the estimated value of the parameter divided by its estimated standard error. Because the estimation of a parameter will never be exact, an interval of two times the standard error on both sides of the estimate determines a so-called 95% confidence interval. The probability that this confidence interval contains the real value of the parameter is 95%. But if zero is also contained in this interval, then the parameter could just as well be equal to zero. For this reason, a parameter is called significant (meaning significantly different from zero) if the absolute value of the t-statistic of its estimate is greater than two, because zero will then be outside the 95% confidence interval.

As a measure of "goodness of fit" of the estimated model, the R² is used (0 ≤ R² ≤ 1). This value is a measure of the proportion of the total variance in the data accounted for by the influence of the explanatory variables of the estimated model. A value of R² close to one means that the explanatory variables can explain the observed data very well. A value of R² close to zero means that the stochastic component of the model plays a dominant role (or it could be that there exist more explanatory variables than are currently in the model; this will usually not be the case here, because it is assumed that an appropriate model has already been chosen in the identification stage).
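The paper derives the parameter estimates from the autocorrelations in Box-Jenkins fashion. For an AR(1) model with a constant, ordinary least squares on the pairs (y_{t-1}, y_t) yields essentially the same estimates, which allows a compact sketch (assuming numpy; this is a stand-in for, not a reproduction of, the estimation procedure used in the paper):

```python
import numpy as np

def estimate_ar1(y):
    """Fit y_t = c + phi_1 * y_{t-1} + eps_t by ordinary least squares and
    return the estimates with their t-statistics, Var(eps), and R^2."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # design matrix [1, y_{t-1}]
    z = y[1:]
    beta = np.linalg.lstsq(X, z, rcond=None)[0]
    resid = z - X @ beta
    var_eps = resid @ resid / (len(z) - 2)              # 2 estimated parameters
    cov = var_eps * np.linalg.inv(X.T @ X)              # usual OLS covariance matrix
    t_stats = beta / np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((z - z.mean()) ** 2)
    return {"c": beta[0], "phi_1": beta[1],
            "t_c": t_stats[0], "t_phi_1": t_stats[1],
            "Var(eps)": var_eps, "R2": r2}
```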
Diagnostic checking
Before the estimated model is used, it is important to check that it is a satisfactory one. The usual test is to fit the model to the data and calculate the autocorrelations of the residuals (the differences between the observed values and those predicted by the estimated model). These residuals should be white noise, so none of the autocorrelations should be significantly different from zero. To check this, the residual autocorrelations are compared with a two-standard-error bound (also 2/√T). Another test is to fit a slightly higher-order model and then to see whether the extra parameters are significantly different from zero. So, if an AR(p) model is estimated, an AR(p+1) model could also be estimated, and the significance of the extra parameter should be checked (it should be insignificant).

Summarizing, the Box-Jenkins approach is used to find an appropriate model for a given time series generated by some stochastic process, in order to make forecasts for, or simulations of, the process that generated the original data. In the next subsection, all concepts and definitions introduced in the previous subsections are put together to construct the entire statistical fitness landscape analysis.
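The first of these two checks is equally easy to sketch (assuming numpy and an already-estimated AR(1) model; the names are mine):

```python
import numpy as np

def residual_check(y, c, phi_1, max_lag=25):
    """Compute the residual autocorrelations of a fitted AR(1) model and
    flag which of them stay inside the two-standard-error bound 2/sqrt(T)."""
    y = np.asarray(y, dtype=float)
    resid = y[1:] - (c + phi_1 * y[:-1])     # observed minus predicted
    d = resid - resid.mean()
    denom = d @ d
    r = np.array([d[:-i] @ d[i:] / denom for i in range(1, max_lag + 1)])
    ok = np.abs(r) < 2.0 / np.sqrt(len(resid))
    return r, ok  # the model passes if (almost) all entries of ok are True
```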
3.5 The statistical fitness landscape analysis

Having introduced the concepts of random walks, the autocorrelation function, the definition of correlation length, time series analysis, and the Box-Jenkins approach, the entire statistical fitness landscape analysis can now be put together. Starting from the landscape model as described in Section 2.1, there is a specific operator that defines the neighborhood relation for the fitness landscape. This operator is used to perform a random walk on the landscape: start at a random point in the landscape, record its fitness, apply the operator once to this point, resulting in a new point that is a neighbor of the former one, record the fitness of this new point, apply the operator again, and so on.
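A minimal sketch of this random-walk procedure (the names are mine; ONEMAX is used only as a stand-in fitness function):

```python
import random

def random_walk_series(start, operator, fitness, steps, seed=0):
    """Record the fitness at the start, apply the operator, record the new
    fitness, and repeat, yielding a time series of fitness values."""
    rng = random.Random(seed)
    x = list(start)
    series = [fitness(x)]
    for _ in range(steps):
        x = operator(x, rng)        # one application of the operator
        series.append(fitness(x))
    return series

def point_mutation(bits, rng):
    """Flip one randomly chosen bit."""
    child = list(bits)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child

series = random_walk_series([0] * 100, point_mutation,
                            fitness=lambda b: sum(b) / len(b), steps=10_000)
print(series[:5])
```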
This way, a time series of fitness values is generated. Next, the Box-Jenkins approach is applied to this time series, providing the (estimated) autocorrelations of this time series, a statistical model that represents the data-generating process (i.e., the random walk), and some statistics from which the explanatory and predictive value of the estimated model can be derived. From the (estimated) autocorrelations, a correlation length can be derived. All these statistics can then be used for expressing the correlation structure of the fitness landscape, and for comparing the structure of different landscapes.

There are, however, two limitations on this fitness landscape analysis. First, it can only be applied to walkable landscapes (see Section 2.1). But since most known operators in, for example, hillclimbing and genetic algorithms produce just as many genotypes as output as they take as input, thus creating a walkable landscape, this limitation will not really be a problem. A second limitation occurs when the landscape is not connected. The correlation structure that results from the analysis cannot then be used as a global measure of the fitness landscape, because the random walk cannot leave the component of the landscape on which it started. However, the same analysis can be applied to a number of components of the landscape, resulting in a correlation structure for each of those components. Together, these results can be used as the structure of the entire landscape, if desired.

This limitation, however, points to a more sophisticated analysis that is possible on any walkable landscape, whether connected or not. The analysis can be restricted to just a subspace of the landscape, by restricting the random walk to this subspace. A more detailed analysis can thus be applied, and the correlation structure of one or more parts of the entire landscape can be determined. This way, the assumption of statistical isotropy can be checked for correctness, or the correlation structures of different parts of the landscape can be compared to that of the entire landscape. This makes the above analysis a powerful tool for both global and local fitness landscape analysis.

So every operator defines its own fitness landscape, with its own correlation structure. In the next section, three different operators are considered, their landscapes are constructed, and to each landscape the statistical fitness landscape analysis is applied.
4 Correlation structure of fitness landscapes

In this section, various landscapes are constructed, and their correlation structure is determined using the statistical landscape analysis introduced in Section 3. Landscapes are constructed using bit strings as genotypes, three different operators for defining the neighborhood relation, and the NK-model as fitness function. The three operators are such that all landscapes are walkable, symmetric, and connected. All operators take one genotype as input and produce one genotype as output. In the next subsection, the experimental setup is given and the three operators are described. The following subsections show the results of applying the complete landscape analysis to fitness landscapes defined by each of the operators.
4.1 Experimental setup

First, different landscapes are constructed by implementing the landscape model (see Section 2.1) as follows:

R: the space of bit strings of length N, or {0, 1}^N, where N = 100.
φ: one of three different operators, all taking one bit string as input and producing one bit string as result:
   1. Point mutation
   2. Random-mate random-child one-point crossover
   3. Long jumps
   The different operators are described below.
f: for the fitness function, the NK-model is used (see Section 2.2).
F: the possible fitness values are in the range [0, 1] (see again Section 2.2).
>_F: the "standard" order on the real numbers, which in fact is a total order.
In the NK-model, N is set to 100, and for each of the three operators defining a neighborhood, the values 0, 2, 5, 25, 50, and 99 are considered for K. Furthermore, both random and nearest neighbor interactions are considered for each situation. This gives rise to a total of 3 × 6 × 2 = 36 different fitness landscapes. On each landscape, the operator that defines the neighborhood relation for that landscape is used to perform a random walk of 10,000 steps. This way, a time series of fitness values is generated, to which the Box-Jenkins approach (see Section 3.4) is applied.

The three operators are defined as follows (operating on bit strings):

Point mutation: at every step, one randomly chosen bit in the bit string is flipped (0 ↔ 1).

Random-mate random-child one-point crossover: at every step, a second parent is randomly chosen out of all possible bit strings. A crossover point is selected at random, and the parts of the parents after this crossover point are exchanged, creating two children. One of these two children is selected for the next step (each with chance 0.5).

Long jumps: at every step, each bit in the bit string is flipped with chance 0.5.
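For concreteness, the three operators might be implemented as follows (a sketch under the definitions above; all names are mine):

```python
import random

def point_mutation(bits, rng):
    """Flip one randomly chosen bit (0 <-> 1)."""
    child = list(bits)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child

def crossover(bits, rng):
    """Random-mate random-child one-point crossover: pick a random mate,
    exchange the tails after a random crossover point, keep one child."""
    mate = [rng.randint(0, 1) for _ in bits]
    cut = rng.randrange(1, len(bits))
    children = (bits[:cut] + mate[cut:], mate[:cut] + bits[cut:])
    return list(children[rng.randrange(2)])

def long_jumps(bits, rng):
    """Flip every bit independently with chance 0.5."""
    return [b ^ rng.randint(0, 1) for b in bits]
```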
The crossover operator is implemented in this way to make the resulting fitness landscape connected. The operator is just called "crossover" throughout the rest of this paper, instead of its full name random-mate random-child crossover.

Long jumps is called this way because this operator is thought to make large jumps in the fitness landscape. But this is only because most people use the Hamming distance for defining the neighborhood relation of a fitness landscape. In a fitness landscape where the neighborhood relation is defined by the operator long jumps itself (as is the case in this paper), long jumps, of course, makes "jumps" of just one step, i.e., to a neighboring point. So the name "long jumps" does not actually make much sense anymore, but it will be used here anyway, mainly for historical reasons.

The next three subsections present the results of applying the complete fitness landscape analysis to the different landscapes defined above. In each subsection, a simple example of the resulting landscape is first given, and then the results of the three steps of the Box-Jenkins approach are presented.
4.2 Results for Point mutation

The landscape

Figure 2 gives the fitness landscape for point mutation for bit strings of length 3. All edges are labeled with probability 1/3, because each bit has a chance of 1/3 of being flipped by the operator point mutation. Fitness values, as defined by an implementation of the NK-model as given in the example on page 42 of [Kau93], are shown in parentheses.
Figure 2:
Identication
The correlograms for the dierent values of K are given in Figures 3 (randompinteractions) and 4 (nearest neighbor interactions), together with the two-standard-error bound of 2= T , or 0:02 for T =10,000. The correlograms all taper o to zero (except for K =99 here the autocorrelations almost immediately drop to zero), so an AR(p) or an ARMA(p,q) process should be most appropriate here. The graphs show clearly that the correlation length decreases as K increases, and that there is not much dierence between random interactions and nearest neighbor interactions. Table 1 gives the correlation lengths (as de ned in Section 3.2) for the dierent values of K . What is striking here is the fact that for K =99 there still is some correlation left. According to Kauman Kau89] it should be expected that the correlation is zero in the case of K = N ;1, because the landscape is completely random. So the time series generated by a random walk should be white noise (i.e. completely random) around the mean of the series. But although the estimated correlations are very small, as the graphs show, statistically they are signi cantly dierent from zero. Repeating the procedure for this value of K always gives the same sort of result, so it is not \just an accident". 13
Figure 3: The first 50 autocorrelations for point mutation for different values of K, random interactions.

Figure 4: The first 50 autocorrelations for point mutation for different values of K, nearest neighbor interactions.
Table 1: The correlation lengths for the operator point mutation for different values of K.

     K   random interactions   nearest neighbor interactions
     0          >50                      >50
     2          >50                      >50
     5           49                      >50
    25           18                       14
    50            5                        7
    99            2                        3
A possible explanation for these minor correlations is that for K = N − 1 all fitness values vary only slightly around the mean of 0.5. This might introduce some slight correlations, which, strictly speaking, should not be present.

To see which of the two models (AR(p) or ARMA(p, q)) is the most appropriate, the partial correlogram for K = 0 (random interactions) is shown in Figure 5. This plot shows that the first partial autocorrelation is almost equal to one, and thus well outside the two-standard-error bound of 0.02. The other partial autocorrelations are all within this bound (apart from one minor exception). So it is clear that the partial correlogram cuts off after one time lag, and thus an AR(1) model is the most appropriate choice here. The partial correlograms for the other values of K and for nearest neighbor interactions look very similar, except for K = 99. In this case, the first two or three partial autocorrelations are outside the two-standard-error bound, indicating an AR(2) or AR(3) process. Although these partial autocorrelations are very small, they are statistically significant. This is also an ever-recurring result when repeating the whole procedure.
Estimation
In all cases an AR(1) process of the form

    y_t = c + \phi_1 y_{t-1} + \varepsilon_t

is estimated. The constant is added because the mean of the time series is not equal to zero. Table 2 shows the results of the estimation. The t-statistics (see Section 3.4) of the estimated parameters are shown in parentheses.
Table 2: The results of the estimation of an AR(1) process for the operator point mutation for different values of K (t-statistics in parentheses).

    random interactions
     K        c                 φ_1               Var(ε_t)    R²
     0   0.00850 (9.59)    0.98181 (518.20)   0.0000182   0.964
     2   0.01620 (13.08)   0.96629 (375.48)   0.0000467   0.934
     5   0.03309 (18.46)   0.93400 (261.59)   0.0000964   0.872
    25   0.12857 (38.35)   0.74272 (110.91)   0.0003779   0.552
    50   0.25419 (58.29)   0.49141 (56.41)    0.0006357   0.242
    99   0.48293 (96.51)   0.03374 (3.38)     0.0008257   0.001

    nearest neighbor interactions
     K        c                 φ_1               Var(ε_t)    R²
     0   0.00959 (9.94)    0.98090 (510.85)   0.0000177   0.963
     2   0.01763 (13.47)   0.96431 (364.39)   0.0000461   0.930
     5   0.02999 (17.55)   0.94005 (275.72)   0.0000961   0.884
    25   0.13140 (38.86)   0.73687 (109.00)   0.0003808   0.543
    50   0.25494 (58.40)   0.49003 (56.21)    0.0006349   0.240
    99   0.49013 (97.78)   0.02063 (2.06)     0.0008335   0.000
Furthermore, the R2 decreases as K increases, so the estimated model is less capable of explaining the observed data, apart from random variances, for higher values of K . The table, just as the correlograms, shows no dierence between random and nearest neighbor interactions. It appears that the estimated 1 and the R2 for K =99 are both very small. So, the estimated model for K = N ; 1 is hardly any dierent from an ordinary white noise series around the mean of the series, which is, as mentioned, theoretically expected.
Diagnostic checking
Figure 6 shows the rst 25 residual autocorrelations for K =0 (random interactions). This plot shows that they are all well within the two-standard-error bound of 0.02 (apart from one minor exception). The plots for the other values of K and for nearest neighbor interactions look very similar. 1
0.1
Residual autocorrelation
Partial autocorrelation
0.8
0.6
0.4
0.2
0.05
0
-0.05
0 -0.1 0
5
10
15
20
25
0
Time lag
5
10
15 Time lag
20
25
The rst 25 partial autocorrelations for point Figure 6: The rst 25 residual autocorrelations for point mutation for K = 0, random interactions. mutation for K = 0, random interactions.
Figure 5:
15
To make absolutely sure, an AR(2) model is estimated for all cases. The results are presented in Table 3.
K 0 2 5 25 50 99
Table 3:
random interactions 2 t-statistic 0.00624 0.62 -0.01104 -1.10 -0.00716 -0.72 0.00526 0.53 0.00812 0.81 0.03189 3.19
nearest neighbor interactions 2 t-statistic -0.00548 -0.55 -0.01437 -1.44 0.00642 0.64 0.00955 0.96 -0.00746 -0.75 0.04821 4.83
The results of the overestimation of the chosen model for point mutation for dierent values of K .
As expected, the extra parameter is not signi cantly dierent from zero in all cases except for K =99. This exception is not surprising, because the partial autocorrelations already suggested an AR(2) or AR(3) process. But as noted above, the estimated parameters in this model are very small (as well as the R2 ), and it eectively comes down to an ordinary white noise series. So these two checks show that the chosen AR(1) model seems to be adequate in all cases, except for K =99.
4.3 Results for Crossover The landscape
Figure 7 shows the tness landscape for crossover for bit strings of length 3. Edge probabilities are 6/16, 3/16, 2/16 and 1/16. The same tness values as in Figure 2 are used. (0.47)
000 (0.43)
(0.50)
010
001
101
011
(0.40)
(0.53)
111
100
(0.70)
(0.83)
110 (0.63)
6/16
2/16
3/16
1/16
Figure 7: The fitness landscape for bit strings of length 3 and random-mate random-child one-point crossover as operator. Fitness values are shown in parentheses. The different edge probabilities are shown beneath the landscape graph.
Identification
The correlograms for crossover for the different values of K are presented in Figures 8 (random interactions) and 9 (nearest neighbor interactions). These correlograms also taper off to zero, but much faster than for point mutation. The graphs furthermore show that for crossover there is some difference between random interactions and nearest neighbor interactions. This is shown more clearly in Table 4, which gives the correlation lengths for the different values of K.
Figure 8: The first 50 autocorrelations for crossover for different values of K, random interactions.

Figure 9: The first 50 autocorrelations for crossover for different values of K, nearest neighbor interactions.
Table 4: The correlation lengths for the crossover operator for different values of K.

     K   random interactions   nearest neighbor interactions
     0            6                       4
     2            5                       4
     5            3                       4
    25            1                       3
    50            0                       2
    99            1                       1
The table shows that the correlation dies out for random interactions, whereas for nearest neighbor interactions some correlation still remains as K increases. This difference can be explained by the fact that the genetic operator one-point crossover makes use of building blocks (see [Hol92, Gol89]). If the epistatic interactions in the NK-model are spread randomly across the genotype, then every possible crossover point will affect the epistatic relations of almost all bits in the string, especially for larger values of K. But if the epistatic interactions are with the K neighboring bits, then only the epistatic relations of the bits in the vicinity of the crossover point are affected (this is stated in another way in the Schema Theorem, which implies that schemata of short defining length have a higher chance to survive under one-point crossover than longer ones; see [Hol92, Gol89]). So, if more epistatic relations stay intact, more bits will keep the same fitness, and the fitness of the entire genotype in the next step will be more correlated with that of its parents.

Furthermore, it is again striking that for both random and nearest neighbor interactions there is some correlation for K = N − 1. Repeating the procedure for this case gives similar results. The same explanation as for point mutation can be given here: the fitnesses of the local optima vary only a little around 0.5, which might introduce slight correlations, which, strictly speaking, should not exist.
Looking at the partial correlogram for K = 0 (random interactions), as shown in Figure 10, it can be concluded that an AR(1) process is the most appropriate model here too. The partial correlograms for the other values of K and for nearest neighbor interactions look very similar, except for K = 50, random interactions. In this case, all partial autocorrelations are within the two-standard-error bound, indicating that a white noise series around the mean is the most appropriate model here. This is not surprising, because the correlation length is zero in this case, as Table 4 shows.
Estimation
As with point mutation, an AR(1) process with constant is estimated here. The results are presented in Table 5 (t-statistics shown in parentheses).

Table 5: The results of the estimation of an AR(1) process for crossover for different values of K (t-statistics in parentheses).

    random interactions
     K        c                 φ_1              Var(ε_t)    R²
     0   0.26140 (57.52)   0.50278 (58.21)   0.0003423   0.253
     2   0.31878 (67.56)   0.37191 (40.06)   0.0006060   0.138
     5   0.37223 (76.42)   0.26104 (27.04)   0.0007311   0.068
    25   0.47254 (94.56)   0.05400 (5.41)    0.0008388   0.003
    50   0.49038 (97.92)   0.01925 (1.92)    0.0008214   0.000
    99   0.48515 (96.95)   0.02921 (2.92)    0.0008209   0.001

    nearest neighbor interactions
     K        c                 φ_1              Var(ε_t)    R²
     0   0.23012 (57.57)   0.50130 (57.93)   0.0003581   0.251
     2   0.26521 (58.21)   0.49274 (56.62)   0.0005581   0.243
     5   0.26929 (60.85)   0.45817 (51.54)   0.0006116   0.210
    25   0.35425 (73.87)   0.29229 (30.56)   0.0007491   0.085
    50   0.42640 (86.14)   0.14640 (14.80)   0.0008310   0.021
    99   0.48189 (96.84)   0.03048 (3.05)    0.0008354   0.001
All parameters appear to be significant, except for the φ_1 for K = 50, random interactions (indicating again the absence of correlation in this case). The table also shows that the estimated value of φ_1 is larger for nearest neighbor interactions than for random interactions for 0 < K < N − 1 (for the two extremes K = 0 and K = N − 1 there is no difference between random and nearest neighbor interactions; see Section 2.2). This indicates that there is indeed more correlation for one-point crossover on a landscape with nearest neighbor interactions in the NK-model than on a landscape with random interactions. Also, the R² is larger for nearest neighbor interactions for 0 < K < N − 1, which means that the estimated model has more explanatory and predictive value than it does for random interactions. For crossover too, the model for K = 99 is hardly distinguishable from ordinary white noise, considering the estimated values of φ_1 and R².
Diagnostic checking
Figure 11 shows the first 25 residual autocorrelations for K = 0 (random interactions). All are well within the two-standard-error bound. The plots for larger values of K and for nearest neighbor interactions look very similar. Table 6 gives the results of the overestimation, the extra check on the adequacy of the chosen model. The table shows that none of the extra parameters are significantly different from zero, indicating once more that the AR(1) model is adequate.
Figure 10: The first 25 partial autocorrelations for crossover for K = 0, random interactions.

Figure 11: The first 25 residual autocorrelations for crossover for K = 0, random interactions.
Table 6: The results of the overestimation of the chosen model for crossover for different values of K.

          random interactions       nearest neighbor interactions
     K     φ_2        t-statistic    φ_2        t-statistic
     0    0.00412      0.41         -0.00623     -0.62
     2    0.01767      1.77         -0.00866     -0.87
     5    0.01198      1.20          0.00613      0.61
    25    0.00013      0.01          0.01576      1.58
    50    0.00386      0.39          0.01270      1.30
    99    0.00284      0.28          0.00160      0.16
4.4 Results for Long jumps

The landscape

Figure 12 gives the fitness landscape for long jumps for bit strings of length 3. It is the complete graph on the genotype space, plus a loop at each vertex, and all edge probabilities are 1/8. The same fitness values as in Figures 2 and 7 are used.
Identification, Estimation and Diagnostic checking

For all values of K (both random and nearest neighbor interactions), the correlation length for long jumps turns out to be zero (results not shown). So the fitness landscape for long jumps is totally random, even for small values of K. The estimated parameters for an AR(1) process all proved to be insignificant (except for the constants, of course), indicating that the only appropriate model is just a white noise series. So for long jumps, every landscape looks the same: completely random.
Figure 12: The fitness landscape for bit strings of length 3 and long jumps as operator. Fitness values are shown in parentheses. All edges are labeled with probability 1/8 (labels not shown).
5 Conclusions

The structure of the fitness landscapes considered in this paper can be expressed by their correlation structure in the form of an AR(1) process:

    y_t = c + \phi_1 y_{t-1} + \varepsilon_t

where ε_t is white noise. This verifies the claim made by Weinberger that NK-landscapes are generic members of AR(1) landscapes [Wei90]. This AR(1) model is obtained by applying a time series analysis, the Box-Jenkins approach, to a time series of fitness values obtained by generating a random walk via neighboring points in the fitness landscape. Every landscape has its own specific values for the parameters of this model, which are estimated in the time series analysis. Using this model to describe the correlation structure of these fitness landscapes tells a lot about this structure:

- The AR(1) model implies that the fitness at a particular step (y_t) in a random walk generated on this landscape depends entirely on the fitness one step earlier (y_{t-1}) and a stochastic variable (ε_t). Knowing the fitness two steps earlier does not give any extra information about the expected value of the fitness at the current step.
- One of the properties of an AR(1) process is that the value of the parameter φ_1 is the correlation coefficient between the fitnesses of two points one step apart in a random walk. The results show that this value decreases as K, the richness of epistatic interactions in the NK-model, increases. As a consequence, the correlation length of the landscape also decreases.
- The variance of the stochastic variable ε_t, also estimated in the time series analysis, indicates the amount of influence that this variable has in the model. As K increases, this variance increases, indicating a larger influence, and thus less correlation between the fitnesses of points one step apart.
- The value of R², a measure of goodness of fit of the model, indicates the explanatory and predictive value of the model. As K increases, this value decreases, indicating less explanatory and predictive value.
So the correlation structures of the different landscapes considered here can be compared in terms of this AR(1) model. In general, the structures of all possible fitness landscapes can be compared in terms of the models that are produced by applying the landscape analysis introduced here. Preliminary results show that this model is not necessarily an AR(1) model, but can also be an autoregressive model of higher order. Furthermore, a more detailed analysis would be possible by restricting the random walk to a subspace of the landscape (which is not done here, though). This way, the statistical isotropy assumption can be checked, or the structures of different parts of the landscape can be compared to each other.

The statistical fitness landscape analysis introduced in this paper could be a good candidate for a "standard" landscape analysis, since at this moment everyone has their own way of doing this, and different methods are not always comparable. The analysis here makes use of standard statistical methods that are well understood and often used. This makes the analysis widely applicable.
6 Acknowledgments

This paper resulted from an M.Sc. thesis done at the Erasmus University in Rotterdam, The Netherlands, under the supervision of Bernard Manderick. The paper itself was completed during a stay at the Santa Fe Institute. The visit and research done at SFI were supported by National Science Foundation grant NSF IRI-9320200 and Department of Energy grant DE-FG03-94ER25231.
References

[BJ70] G. E. P. Box and G. M. Jenkins. Time Series Analysis, Forecasting and Control. Holden Day, 1970.

[Gol89] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.

[Gra89] C. W. J. Granger. Forecasting in Business and Economics. Academic Press, 2nd edition, 1989.

[Hol92] J. H. Holland. Adaptation in Natural and Artificial Systems. MIT Press, 2nd edition, 1992.

[Hor94] W. Hordijk. Population Flow on Fitness Landscapes. Master's thesis, Erasmus University Rotterdam, 1994. Available via anonymous ftp at ftp.cs.few.eur.nl in pub/doc/masterstheses.

[J+88] G. G. Judge et al. Introduction to the Theory and Practice of Econometrics. John Wiley & Sons, 2nd edition, 1988.

[Jon95] T. C. Jones. Evolutionary Algorithms, Fitness Landscapes and Search. PhD thesis, University of New Mexico, Albuquerque, NM, March 1995.

[Kau89] S. A. Kauffman. Adaptation on Rugged Fitness Landscapes. In D. Stein, editor, Lectures in the Sciences of Complexity, pages 527-618. Addison-Wesley, 1989.

[Kau93] S. A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, 1993.

[Lip91] M. Lipsitch. Adaptation on Rugged Landscapes Generated by Iterated Local Interactions of Neighboring Genes. In R. K. Belew and L. B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 128-135. Morgan Kaufmann, 1991.

[MWS91] B. Manderick, M. de Weger, and P. Spiessens. The Genetic Algorithm and the Structure of the Fitness Landscape. In R. K. Belew and L. B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 143-150. Morgan Kaufmann, 1991.

[Pal91] R. Palmer. Optimization on Rugged Landscapes. In A. S. Perelson and S. A. Kauffman, editors, Molecular Evolution on Rugged Landscapes: Proteins, RNA and the Immune System, pages 3-25. Addison-Wesley, 1991.

[Wei90] E. D. Weinberger. Correlated and Uncorrelated Fitness Landscapes and How to Tell the Difference. Biological Cybernetics, 63:325-336, 1990.

[Wri32] S. Wright. The roles of mutation, inbreeding, crossbreeding and selection in evolution. In D. F. Jones, editor, Proceedings of the Sixth International Congress on Genetics, volume 1, pages 356-366, 1932.