published in:

Terence C. Fogarty, editor, Evolutionary Computing - Selected Papers from the 1996 AISB Workshop, vol. 1134 of Lecture Notes in Computer Science, pages 257-269, Springer-Verlag, September 1996.

Analysis of possible genome-dependence of mutation rates in Genetic Algorithms

Bernhard Sendhoff and Martin Kreutz
Institut für Neuroinformatik, Ruhr-Universität Bochum, 44780 Bochum, Germany
e-mail: [email protected]; [email protected]

Abstract.

The performance of evolutionary algorithms depends strongly upon the combined effect of the operators (e.g. mutation) and the mappings from genotype to phenotype space and from phenotype to fitness space. We demonstrate, with the example of the canonical Genetic Algorithm (cGA) for parameter optimization, that the right choice of the mutation operator should depend on the genome position, and we show that in general point mutation alone might not be sufficient for the particular binary mapping in the cGA. We take up the idea from the Evolution Strategy to mutate via addition of normally distributed random numbers and construct the point-mutation operator in a way that resembles this on average. This concrete approach for genetic algorithms is accompanied by more general remarks on the analysis of evolutionary algorithms in the first and the last section of this paper.

1 Introduction

The central point of all evolutionary algorithms is the application of Darwin's theory of evolution, which can be characterized by the major components mutation/crossover, recombination and selection. Once this principle is applied, the performance of any algorithm depends on the mapping between the three spaces (genotype space G, phenotype space P, fitness space F) and the way the "evolutionary" operators (mutation etc.) act upon them. In order to understand why one particular evolutionary algorithm performs better than another, the mappings and operators must be analysed. Conversely, the right combination of these two is essential in constructing an algorithm that performs well on a particular class of problems. In this paper, we will restrict ourselves to the analysis of the mutation of the bit string of a canonical Genetic Algorithm (cGA) for parameter optimization. The advantage of this choice is that in the domain of parameter optimization another type of evolutionary algorithm is available, the Evolution Strategy (ES), which uses a completely different setting and performs better than the Genetic Algorithm (GA) [2].

We will keep the "evolutionary" spaces of both algorithms ({0,1}^L denotes all bit strings of length L and x_k(i) the i-th bit of the k-th string):

ES:   G = P = \mathbb{R}^n, \qquad G \to P:\ \mathrm{id}_{\mathbb{R}}

cGA:  G = \{0,1\}^L \times \cdots \times \{0,1\}^L, \quad P = \mathbb{R}^n,

      G \to P:\ (x_1, \dots, x_n) \mapsto \left( \sum_{i=0}^{L-1} x_1(i)\, 2^i,\ \dots,\ \sum_{i=0}^{L-1} x_n(i)\, 2^i \right)
and incorporate the idea of adding normally distributed numbers as mutation (ES) into the mutation rate of the cGA. Thus, we calculate the probability P(i) ≡ p_m that bit i in a bit string is mutated ({0,1} → {1,0}). We concentrate on the genome dependence and neglect the time dependence of p_m. An analysis of the dependence of the mutation rate on genome and population size has been carried out in [6]. The idea to allow variable mutation probabilities along the genome goes back to Fogarty [5], who proposed an exponential decay of the mutation probability over the binary string and over evolution time. This result is a special case of the more general one presented in this paper, where for sufficiently small normally distributed numbers P(i) indeed decays exponentially. The concept of self-adaptation was incorporated into GAs by Bäck [1], who proposed a separate mutation rate for the binary representation of each object variable. Using a kind of self-mutation, he claimed that the strategy parameters show self-adaptation.
The paper is organized as follows. In sections 2 and 3 we calculate a mutation probability P(i) for a cGA for parameter optimization which resembles (on average) a mutation by adding a normally distributed number. In section 4 the particular form of P(i) found will be applied to two standard optimization problems and tested against the cGA with constant p_m. Furthermore, we will critically examine the question of self-adaptation. Section 5 picks up on the more general comments above and identifies two major conditions which the combination of "evolutionary" spaces and operators should fulfill. These are examined for both the variable and the constant mutation operator, and their overall importance for the analysis of evolutionary algorithms is pointed out. The paper closes with some conclusions drawn in section 6.
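To make the cGA genotype-to-phenotype mapping given above concrete, the following minimal sketch decodes a concatenated bit string into real-valued object parameters. The function names and the optional rescaling of the decoded integer to a target interval are our own illustrative assumptions, not part of the original algorithm description.

```python
# Sketch of the cGA genotype-to-phenotype mapping: each object variable is an
# L-bit standard binary block, and bit i of block k contributes x_k(i) * 2**i.
# (Illustrative only; names and the rescaling step are assumptions.)

def decode_block(bits):
    """bits[i] is x_k(i); returns sum_i x_k(i) * 2**i."""
    return sum(b << i for i, b in enumerate(bits))

def genotype_to_phenotype(genome, L, lo=0.0, hi=None):
    """Split a concatenated genome into blocks of length L and decode each.
    If hi is given, rescale the integer from [0, 2**L - 1] to [lo, hi]."""
    values = []
    for k in range(len(genome) // L):
        v = decode_block(genome[k * L:(k + 1) * L])
        if hi is not None:
            v = lo + (hi - lo) * v / (2**L - 1)
        values.append(v)
    return values

if __name__ == "__main__":
    genome = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0,   # first variable:  1 + 4   = 5
              0, 1, 0, 0, 0, 0, 0, 0, 0, 1]   # second variable: 2 + 512 = 514
    print(genotype_to_phenotype(genome, L=10))          # [5, 514]
    print(genotype_to_phenotype(genome, L=10, hi=1.0))  # rescaled to [0, 1]
```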

2 The connection between bit string mutation probabilities and normally distributed random numbers

We analyse a GA with an L = 10 bit long binary encoding². We assume a uniformly distributed number x_t ∈ [0, L̃], which has been mapped onto the bit string³.

² A longer binary sequence does not change the qualitative behaviour of the mutation rate.

In analogy to the ES we then add a normally distributed number⁴ δ to x_t. x_t(i) and δ(i) denote the i-th bit in the binary representation of x_t and δ, respectively. P(i) is the probability that bit i changes during the summation; this corresponds to the mutation probability p_m in GAs. Furthermore, we define the following notation for the additional probabilities: P_δ(i) and P_x(i) are the probabilities that the i-th bit of the normally distributed number δ and of the genome x, respectively, is set. P_c(i) is the probability that the sum of the i-th bits of x and δ results in a carry bit.

In order to find an expression for the probability P_δ(i), we have to write down all intervals whose elements have the i-th bit set. It is obvious that all of these intervals start with the bit configuration (a ? can be replaced by 1 or 0; the highest bit is on the left)

    ? ... ? 1 0 ... 0     (the 1 standing at bit position i)

The last bit string included is

    ? ... ? 1 1 ... 1     (again with the 1 at bit position i)

Therefore, the interval length is 2^i - 1. The distance between the intervals depends on which ? are set to 1/0. If higher bits are set, the distances get larger. Carefully writing down the contributions from the intervals and grouping them together, we can deduce the following formula [7]:

P_\delta(i) \;=\; \sum_{n=1}^{L-i-1} \sum_{p=0}^{2^{n-1}-1} P\!\left( \delta \in \left[\, 2^{\,i+n+1} - p \cdot 2^{\,i+1} - 2^{\,i},\; 2^{\,i+n+1} - p \cdot 2^{\,i+1} \,\right[ \right) \;+\; P\!\left( \delta \in \left[\, 2^{\,i},\; 2^{\,i+1} \,\right[ \right) \qquad (1)
As an example it is straightforward to verify that in the case i = 3, 2^i = 8, Eq. (1) simplifies to P_δ(3) = P(δ ∈ [8, 16[) + P(δ ∈ [24, 32[) + P(δ ∈ [56, 64[) + P(δ ∈ [40, 48[) + ... Since δ is normally distributed⁵ with variance σ² and mean μ, Eq. (1) can be written as

P_\delta(i) \;=\; \sum_{n=1}^{L-i-1} \sum_{p=0}^{2^{n-1}-1} \int_{2^{\,i+n+1} - p\cdot 2^{\,i+1} - 2^{\,i}}^{2^{\,i+n+1} - p\cdot 2^{\,i+1}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) dx \;+\; \int_{2^{\,i}}^{2^{\,i+1}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) dx \qquad (2)

³ As long as the bit string is chosen to be sufficiently large, the difference between the real-valued space for the Evolution Strategy and the integer-valued space for the GAs is negligible.
⁴ Of course δ ∈ [0, L̃]; δ > L̃ does not make any sense, since x_extremal ∈ [0, L̃] by definition. δ < 0 is mapped to [0, L̃], since +/- is replaced in GAs by (0 → 1)/(1 → 0).
⁵ For a uniformly distributed number δ, Eq. (1) simplifies to P_u(i) = 0.5, as is to be expected.

P_c(i) can be recursively defined as a combination of P_δ(i) and P_x(i):

P_c(i) = \begin{cases} P_x(i)\, P_\delta(i) + P_x(i)\,\bigl(1 - P_\delta(i)\bigr)\, P_c(i-1) + \bigl(1 - P_x(i)\bigr)\, P_\delta(i)\, P_c(i-1) & i > 0 \\[2pt] P_x(i)\, P_\delta(i) & i = 0 \end{cases} \qquad (3)

The probability P(i) can now be derived as

P(i) = \begin{cases} P_\delta(i)\,\bigl(1 - P_c(i-1)\bigr) + P_c(i-1)\,\bigl(1 - P_\delta(i)\bigr) & i > 0 \\[2pt] P_\delta(i) & i = 0 \end{cases} \qquad (4)
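To make Eqs. (1)-(4) concrete, the following sketch evaluates them numerically for a Gaussian δ with mean μ and standard deviation σ, using the normal CDF for the interval probabilities of Eq. (2). The variable names and the choice P_x(i) = 0.5 (which anticipates section 3) are assumptions made for illustration only.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def p_delta(i, L, mu, sigma):
    """P_delta(i) of Eqs. (1)/(2): probability that bit i of delta is set."""
    def prob(a, b):  # P(delta in [a, b[)
        return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
    total = prob(2**i, 2**(i + 1))                 # the last interval in (1)
    for n in range(1, L - i):                      # n = 1 .. L-i-1
        for p in range(2**(n - 1)):                # p = 0 .. 2^(n-1)-1
            hi = 2**(i + n + 1) - p * 2**(i + 1)
            total += prob(hi - 2**i, hi)
    return total

def carry_and_flip_probs(L, mu, sigma, px=0.5):
    """P_c(i) from Eq. (3) and the bit-flip probability P(i) from Eq. (4)."""
    pd = [p_delta(i, L, mu, sigma) for i in range(L)]
    pc, pflip = [], []
    for i in range(L):
        if i == 0:
            pc.append(px * pd[0])
            pflip.append(pd[0])
        else:
            pc.append(px * pd[i] + px * (1 - pd[i]) * pc[i - 1]
                      + (1 - px) * pd[i] * pc[i - 1])
            pflip.append(pd[i] * (1 - pc[i - 1]) + pc[i - 1] * (1 - pd[i]))
    return pc, pflip

if __name__ == "__main__":
    pc, pflip = carry_and_flip_probs(L=10, mu=0.0, sigma=5.0)
    for i, (c, f) in enumerate(zip(pc, pflip)):
        print(f"bit {i}: P_c = {c:.4f}   P(i) = {f:.4f}")
```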

3 Approximation of P(i) and experimental simulations

We will start by setting P_x(i) constant to 0.5 in Eq. (3), being aware that this is only true if we assume that the probability distribution of the parameter is uniform over the whole time of evolution:

P_c(i) = \begin{cases} \tfrac{1}{2}\, P_\delta(i) + \tfrac{1}{2}\,\bigl(1 - P_\delta(i)\bigr)\, P_c(i-1) + \tfrac{1}{2}\, P_\delta(i)\, P_c(i-1) & i > 0 \\[2pt] \tfrac{1}{2}\, P_\delta(i) & i = 0 \end{cases} \qquad (5)

This can be written in the form of a linear difference equation

P_c(i+1) - \tfrac{1}{2}\, P_c(i) = \tfrac{1}{2}\, P_\delta(i+1), \qquad P_c(0) = \tfrac{1}{2}\, P_\delta(0) \qquad (6)

and solved using the standard technique of Z-transforms (see e.g. [4]):

P_c(i) = \left(\tfrac{1}{2}\right)^{i+1} P_\delta(0) + \sum_{\nu=0}^{i-1} \tfrac{1}{2}\, P_\delta(\nu+1) \left(\tfrac{1}{2}\right)^{i-1-\nu} = \sum_{\nu=0}^{i} \left(\tfrac{1}{2}\right)^{i-\nu+1} P_\delta(\nu) \qquad (7)
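The equivalence of the recursion (5)/(6) and the closed form (7) is easy to verify numerically; the short check below does so for an arbitrary vector of P_δ values (the numbers are made up purely for the test).

```python
# Check that the Z-transform solution (7) reproduces the recursion (5)/(6).
# The P_delta values below are arbitrary test inputs.
p_delta = [0.5, 0.48, 0.42, 0.31, 0.18, 0.07, 0.02, 0.005]

# Recursion, Eqs. (5)/(6): P_c(i) = 0.5 * P_delta(i) + 0.5 * P_c(i-1)
pc_rec = []
for i, pd in enumerate(p_delta):
    prev = pc_rec[i - 1] if i > 0 else 0.0
    pc_rec.append(0.5 * pd + 0.5 * prev)

# Closed form, Eq. (7): P_c(i) = sum_{nu=0}^{i} (1/2)^(i-nu+1) * P_delta(nu)
pc_closed = [sum(0.5**(i - nu + 1) * p_delta[nu] for nu in range(i + 1))
             for i in range(len(p_delta))]

for a, b in zip(pc_rec, pc_closed):
    assert abs(a - b) < 1e-12
print("recursion and closed form agree:", [round(x, 4) for x in pc_closed])
```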

In the following two subsections we will introduce an approximation for P_δ(i) for the cases of large and small σ. This is a technical necessity which works fine in the specific case considered here. Surely more refined approximations can be found; this, however, is not the objective of this paper. Due to space limitations the following discussion will be very compact; for a more detailed discussion of the quality and validity of the approximation see [7].

3.1 P_δ(i), P_c(i) for standard deviations larger than one

Practical applications of Eq. (7) make an approximation of the Gaussian integral in Eq. (2) necessary. Firstly, for σ ≥ 1 we approximate P_δ(i) (a_1, a_2 are constants whose numerical values are determined in (14)):

P_\delta(i) \simeq \begin{cases} a_1 - a_2 \left( \dfrac{i}{\sigma} \right)^{2} & i \le \sigma \sqrt{a_1 / a_2} \\[2pt] 0 & i > \sigma \sqrt{a_1 / a_2} \end{cases} \qquad (8)
Inserting (8) into (7),

P_c(i) = \left(\tfrac{1}{2}\right)^{i+1} P_\delta(0) + a_1\, 2^{-i} \sum_{\nu=0}^{i-1} 2^{\nu} - \frac{a_2}{\sigma^2}\, 2^{-i} \sum_{\nu=0}^{i-1} (\nu+1)^2\, 2^{\nu} \qquad (9)

calculating the sums (see [7]) and grouping terms together gives (b = ln 2):

q_1(i) = a_1 \left( 1 - 2^{-(i+1)} \right) - \frac{a_2}{\sigma^2} \left( i^2 - 2i + 3 - 3 \cdot 2^{-i} \right) \qquad (10)

P_c(i) = \begin{cases} q_1(i) & i \le \sigma \sqrt{a_1 / a_2} \\[2pt] q_1\!\left( \sigma \sqrt{a_1 / a_2} \right) \exp\!\left( -b \left( i - \sigma \sqrt{a_1 / a_2} \right) \right) & i > \sigma \sqrt{a_1 / a_2} \end{cases} \qquad (11)

3.2 P_δ(i), P_c(i) for standard deviations smaller than one

For σ < 1 we tackle (2) directly to obtain a simplified solution for P_δ(i). Because exp(-x²/(2σ²)) ≈ 0 for all x ≥ 3σ (σ² ≤ 1), we can neglect all integrals in the sum in (2) whose boundaries are larger than this threshold:

P_\delta(i) \simeq \int_{2^{\,i}}^{2^{\,i+1}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{x^2}{2\sigma^2} \right) dx = \frac{2^{\,i}}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\bigl(2^{\,i}(1+\alpha)\bigr)^2}{2\sigma^2} \right), \qquad \alpha = \alpha(\sigma) \in [0, 1] \qquad (12)

The crucial point of the last step⁶ is the right choice of the parameter α, which we will approximate by a second-order polynomial in σ, whose constants will be fitted numerically.

Defining the constants c_1 = (2πσ²)^{-1/2} and c_2 = (1+α)²/(2σ²) for convenience, we can insert (12) into (7):

P_c(i) = 2^{-i-1} c_1\, e^{-c_2} + 2^{-i-1} c_1 \sum_{\nu=1}^{i} 2^{2\nu} \exp\!\left( -c_2\, 2^{2\nu} \right) \simeq 2^{-i} \left( \frac{c_1}{2}\, e^{-c_2} + 2 c_1\, e^{-4 c_2} + 8 c_1\, e^{-16 c_2} \right) := 2^{-i}\, q_2(c_1, c_2) \qquad (13)

⁶ If f(x) is continuous on [a, b], there exists a ξ with a ≤ ξ ≤ b and ∫_a^b f(x) dx = f(ξ)(b - a).

Fig. 1. (a) P(i) for σ ∈ [1.0, 19.0] in Δσ = 1.0 steps (in the figure s ≡ σ). (b) Top: P(i) for σ = 10.0 (interlaced curve: approximation from Eq. (14); dots: numerical results from Eq. (2)); bottom: P(i) (rescaled with a factor 8) for σ = 0.5 (interlaced curve: approximation from Eq. (14); dots: numerical results from Eq. (2)).

Since the adaptation of parameters which depend on each other can be problematic, we reduce (a_1, a_2, α) from Eq. (10) and Eq. (13) to only one parameter σ, from which the three can be derived. Inserting Eqs. (8), (10), (12) and (13) into Eq. (4), we arrive at the mutation probability for the binary encoding over the whole parameter range of σ:

P(i) \simeq \begin{cases}
c_1 2^{\,i} e^{-c_2 2^{2i}} \bigl( 1 - 2^{-(i-1)} q_2(c_1, c_2) \bigr) + 2^{-(i-1)} q_2(c_1, c_2) \bigl( 1 - c_1 2^{\,i} e^{-c_2 2^{2i}} \bigr) & \sigma < 1,\; i > 0 \\[2pt]
c_1\, e^{-c_2} & \sigma < 1,\; i = 0 \\[4pt]
\left( a_1(\sigma) - a_2(\sigma)\, \frac{i^2}{\sigma^2} \right) \bigl( 1 - q_1(i-1) \bigr) + q_1(i-1) \left( 1 - a_1(\sigma) + a_2(\sigma)\, \frac{i^2}{\sigma^2} \right) & \sigma \ge 1,\; 0 < i \le \sigma \sqrt{a_1/a_2} \\[2pt]
q_1\!\left( \sigma \sqrt{a_1/a_2} \right) \exp\!\left( -b \left( i - \sigma \sqrt{a_1/a_2} \right) \right) & \sigma \ge 1,\; i > \sigma \sqrt{a_1/a_2} \\[2pt]
a_1(\sigma) & \sigma \ge 1,\; i = 0
\end{cases} \qquad (14)

with

q_1(i) = a_1(\sigma) \left( 1 - 2^{-(i+1)} \right) - \frac{a_2(\sigma)}{\sigma^2} \left( i^2 - 2i + 3 - 3 \cdot 2^{-i} \right), \qquad q_2(x, y) = \frac{x}{2}\, e^{-y} + 2x\, e^{-4y} + 8x\, e^{-16y},

a_i(\sigma) = \sum_{j=0}^{n} k_{i,j} \left( \frac{\log \sigma}{\log 2} \right)^{j}, \qquad \alpha(\sigma) = 0.4577 - 0.5052\,(1 - \sigma^2),

c_1 = (2\pi\sigma^2)^{-1/2}, \qquad c_2 = \frac{(1+\alpha)^2}{2\sigma^2}, \qquad b = \ln 2.

The k_{i,j} are numerically fitted.
The fact that P(i) does not vanish in Fig. 2 (b) is a result of the need for correlated mutations among the bits near the points 2^i, i ∈ ℕ, where an inversion of the typical curve of Fig. 1 (b) can be observed. In order to reach these problematic points, it is necessary to flip some (or even all) bits simultaneously. In general this effect is averaged out. However, in cases where the population "lingers around" some of these points during evolution, their contribution is not negligible anymore, and averaging the carry bit probabilities of Eq. (3) cannot provide the determinism needed among the bit-flip probabilities. The effect is also strongly visible in Figure 4 (b).
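The need for correlated flips near the points 2^i can be made explicit by counting how many bits of the standard binary code have to change for a phenotypic step of size one; the short check below does this for a 10-bit encoding (the chosen example values are only an illustration).

```python
# Number of bit flips (Hamming distance) needed for a step x -> x + 1 in the
# standard binary code; near x = 2**i - 1 almost all bits must flip at once.
def hamming_step(x):
    return bin(x ^ (x + 1)).count("1")

for x in [5, 126, 127, 128, 255, 511]:
    print(f"x = {x:3d}: {hamming_step(x)} bit(s) must flip for x -> x + 1")
# x = 127 -> 128 flips 8 bits and x = 511 -> 512 flips all 10 bits: with
# independent per-bit point mutation such steps become extremely unlikely.
```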

Fig. 2. Position dependence of the bit-flip probability taken from the Evolution Strategy. (a) The 30-dimensional sphere model serves as a test function. A (15,100)-strategy with one self-adapting standard deviation was used. The mean over all object parameters and all individuals was taken and mapped to the standard binary code of 10 bit. (b) Same settings as in (a), but the optimum lies exactly in the middle of the interval, a problematic point.

4 Application of position-dependent mutation probabilities to Genetic Algorithms

We carry out several optimization runs for both unimodal and multimodal objective functions. Results are given for the sphere model⁷ and Ackley's function⁷. In order to obtain comparable results, the expected number of bits to be mutated in each generation is held constant over the whole genome and over time. The representation of our strategy parameter σ is included in the genome of each individual; in this way the mutation probabilities are endogenous quantities. This results in a reduction of the number of external parameters and serves as a basis for problem-dependent self-adaptation. Since our test problems are symmetric, one σ for all object variables is sufficient. We use rank-based selection according to Baker [3] in order to inhibit premature convergence.

Fig. 3. Convergence plots of optimization runs for the sphere model (a) and Ackley's function (b), best function value over function evaluations. The solid line shows the results obtained by position-dependent mutations, the dashed lines those of the standard GA. (Population size: 50; dimension: n = 30; encoding length: 32 bits using Gray code⁸.)

⁷ Implementation and documentation are available via ftp://ls11.informatik.uni-dortmund.de/pub/GA/src/GENEsYs-1.0.tar.Z

We examine the following methods for the variation of σ, which is restricted to [0.1, 20.0]: (1) σ is encoded as a floating point number and is mutated by a multiplicative logarithmic-normally distributed process. Intermediate recombination is applied. This method is used in some Evolution Strategies for adjusting the standard deviations. (2) σ is encoded as a bit string and mutated (a) with a fixed global mutation rate and (b) using the position-dependent mutation probabilities. In case (b) we use a kind of self-mutation similar to the one presented by Bäck in [1]. We observe a similar asymptotic behaviour but without the convergence towards zero. Varying σ according to (1), our position-dependent mutation rate converges faster to a better value than the canonical Genetic Algorithm.

Figure 3 shows that in the case of the sphere model the increase in convergence speed is of order 10^5 and for Ackley's function of order 10^{1.5}. We are not able to reach similar results with method (2).
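Method (1) above corresponds to the usual ES-style update of the strategy parameter; a minimal sketch, assuming one σ per individual, a multiplicative log-normal mutation and clipping to the range [0.1, 20.0] used here. The learning rate τ is our own assumption (a common ES choice), not a value given in the paper.

```python
import math
import random

SIGMA_MIN, SIGMA_MAX = 0.1, 20.0     # range of sigma used in the experiments
TAU = 1.0 / math.sqrt(30)            # learning rate; 1/sqrt(n) is an assumption

def mutate_sigma(sigma):
    """Multiplicative log-normal mutation of the strategy parameter sigma."""
    sigma *= math.exp(TAU * random.gauss(0.0, 1.0))
    return min(max(sigma, SIGMA_MIN), SIGMA_MAX)

def recombine_sigma(sigma_a, sigma_b):
    """Intermediate recombination of two parents' strategy parameters."""
    return 0.5 * (sigma_a + sigma_b)

if __name__ == "__main__":
    random.seed(1)
    s = 5.0
    for _ in range(5):
        s = mutate_sigma(s)
        print(round(s, 3))
```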

4.1 Some aspects of self-adaptation

Self-adaptation of parameters which govern the exploration of the "evolutionary" spaces is one of the features which make evolutionary algorithms so attractive among stochastic optimization methods. However, the process of self-adaptation itself is not easy to differentiate. We do not claim that our strategy parameter optimizes itself depending on position and history of the search spaces. However, using method (2) we are able to reproduce the kind of directed variation described in [1], but we are not convinced that this represents problem-dependent self-adaptation. Bäck himself showed that the convergence of his strategy parameter is achieved without any fitness-based selection at all. We believe that one of the prerequisites of successful self-adaptation must be the principle of causality. We refer to causality in the physical sense of a strongly causal system, that is, small variations in the cause only lead to small variations in the effect. In the case of evolutionary algorithms, we use this principle to describe the relation between the "evolutionary" spaces via mutations; see also section 5 and [8]. Coarsely, this can be compared to a continuous mapping between distances on G and on F. The distance on G can be expressed as a function of the transition probability. Although the new mutation operator improves the causal relation between G and F (see [8] for a quantitative analysis), it still only provides point mutation. We believe that exactly the problem of correlated mutations among the bits is responsible for the self-adaptation difficulties. Since real adaptation of σ is not possible, the best way is to provide a set of highly diverse values of σ to the population members and to try to preserve their diversity during the process of evolution. Method (1) seems to meet this requirement and therefore produces the best results.

5 Causality and Genetic Algorithms

We determined the genome-dependence of the mutation probability in GAs for parameter optimization by comparison with the ES. The practical results in section 4 were encouraging but still not as successful as the Evolution Strategy, in both respects: quality and self-adaptation. We believe that the settings in evolutionary algorithms have to be chosen in a way that causality in the evolution process is guaranteed, in order to achieve problem-dependent self-adaptation.

⁸ Gray code is used because of its unimodality in the decoding process. The standard binary code may turn unimodal objective functions on a phenotypic level into multimodal functions on a genotypic level. Moreover, there is the necessity to change several bits simultaneously to achieve any progress (crossing Hamming cliffs).
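For reference, the binary-reflected Gray code mentioned in the footnote can be sketched as follows; with it, neighbouring integers always differ in exactly one bit, which removes the Hamming cliffs of the standard code. The helper names are ours.

```python
def to_gray(x):
    """Standard binary -> binary-reflected Gray code."""
    return x ^ (x >> 1)

def from_gray(g):
    """Binary-reflected Gray code -> standard binary."""
    x = 0
    while g:
        x ^= g
        g >>= 1
    return x

# Neighbouring integers differ in exactly one Gray-coded bit, even across 2**i:
for x in [127, 255, 511]:
    flips = bin(to_gray(x) ^ to_gray(x + 1)).count("1")
    print(f"{x} -> {x + 1}: {flips} Gray bit flip(s)")   # always 1
assert all(from_gray(to_gray(x)) == x for x in range(1024))
```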


Since we are concerned with continuous parameter optimization, we will examine two manifestations of causality; see [8] for a complete treatment of the principle of causality in the analysis of evolutionary algorithms. (1) The transition probability to jump a distance of e.g. s = 1.0 in parameter space should not significantly depend on the starting point. (2) Smaller steps for the system should be more likely than larger steps, especially in later stages of the evolution. Additionally, the step length should depend on its parameters in a physically causal sense. Figure 4 shows experimental results for the transition probability (δ denotes the Kronecker symbol)

P_{\mathrm{trans}} = \prod_{i=0}^{l-1} \Bigl[ \bigl(1 - P(i)\bigr)\, \delta_{x_t(i),\, x_{t+1}(i)} + P(i) \bigl( 1 - \delta_{x_t(i),\, x_{t+1}(i)} \bigr) \Bigr] \qquad (15)
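Eq. (15) is straightforward to evaluate; the sketch below computes P_trans for a given start value x_t and target value x_{t+1} from a vector of per-bit flip probabilities P(i). The bit-extraction helper and the example probabilities are ours.

```python
def p_trans(x_t, x_t1, pflip):
    """Transition probability of Eq. (15): over all bits, multiply
    (1 - P(i)) if bit i stays the same and P(i) if it has to flip."""
    prob = 1.0
    for i, p in enumerate(pflip):
        same = ((x_t >> i) & 1) == ((x_t1 >> i) & 1)
        prob *= (1.0 - p) if same else p
    return prob

L = 10
constant_rate = [0.01] * L                # constant mutation rate p_m = 0.01
print(p_trans(64, 65, constant_rate))     # one bit flip needed
print(p_trans(127, 128, constant_rate))   # eight flips needed -> tiny value
```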

P_trans for the genome-dependent mutation rate shows an exponential decay over the Euclidean distance in Figure 4 (b). The probability in Fig. 4 (a) for the constant mutation rate is dominated by non-causal peaks belonging to problematic step sizes which can be written as powers of two, since in their vicinity only a few bits have to be flipped even for large jumps. The observation that the standard deviation in Fig. 4 (b) is generally larger than in (a) is a consequence of the larger transition probabilities. In order to sensibly compare the performance of the two mutation probabilities, they should therefore be rescaled so that the overall probability remains the same and only its distribution along the bit positions changes; this was done in section 4. Figure 5 examines the dependence of the s = 1.0 transition probability on the starting points. Both (a) and (b) are plagued with a zig-zag profile corresponding to the occurrence of problematic starting points. Nevertheless, our genome-dependent mutation rate (b) minimizes the number and influence of these points as long as correlation can be averaged over. It should be noted that the Evolution Strategy inherently fulfills both requirements (1) and (2). One is tempted to say that it has the optimal coding/mutation combination for continuous parameter optimization.
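The rescaling mentioned above, which keeps the expected number of flipped bits per genome fixed while only redistributing probability over the bit positions, can be sketched as follows (the function name and the example numbers are assumptions):

```python
def rescale(pflip, expected_flips):
    """Scale the per-bit probabilities so that sum_i P(i) = expected_flips,
    i.e. the expected number of mutated bits per genome stays constant."""
    factor = expected_flips / sum(pflip)
    return [min(1.0, p * factor) for p in pflip]

position_dependent = [0.45, 0.40, 0.31, 0.20, 0.11, 0.05, 0.02, 0.01, 0.005, 0.002]
rescaled = rescale(position_dependent, expected_flips=1.0)
print(round(sum(rescaled), 6))   # ~1.0 expected bit flip per 10-bit block
```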

Fig. 4. (a) Probability of making a step of length s = n + 1 for a constant mutation rate, averaged over the possible starting points [0, 2^L - 1 - n_max], including the standard deviation as vertical lines. (b) As in (a) with the bit-dependent mutation rate P(i), σ = 5.0, Eq. (14).

Fig. 5. Probability of making a step of length s = 1.0 for (a) the constant mutation rate and (b) the bit-dependent mutation rate P(i), σ = 5.0, for starting points x ∈ [40, 160].

6 Conclusion

In this paper we have highlighted the importance of the analysis of the intrinsic interplay between the mappings between the "evolutionary" spaces and the effect of mutation. We have done this on a more abstract basis in the first and the last section, thereby building the framework for the direct approach of looking into the genome dependence of the mutation probability p_m ≡ P(i) for canonical Genetic Algorithms for parameter optimization in sections 2, 3 and 4. We calculated this dependence by comparison with Evolution Strategies. Therefore, we aimed at a mutation probability which resembles the addition of normally distributed random numbers. We have done so in sections 2 and 3 and applied the results successfully to two standard problems in section 4.

However, we also noted the difficulties we encountered during this research, especially with self-adaptation; this again led us back to the more general question of the (mapping, operator) relation. Analysis of the performance of evolutionary algorithms is the only way to reach new, challenging and realistic applications for these optimization methods. We believe we have presented in this paper both new, interesting ideas for the evaluation of EAs and a concrete mathematical analysis of the specific genome dependence in one particular case, the canonical Genetic Algorithm.

Acknowledgements We would like to thank Werner von Seelen, Willfried Wienholt and Christian Goerick for stimulating discussions and ideas. This research work is part of the BMBF SONN project under Grant No. 01IB401A9.

References

1. Thomas Bäck. Self-adaptation in genetic algorithms. In Proceedings of the First European Conference on Artificial Life, pages 263-271, Paris, France, 1991. MIT Press.
2. Thomas Bäck and Hans-Paul Schwefel. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, 1(1):1-23, 1993.
3. James Edward Baker. Adaptive selection methods for genetic algorithms. In John J. Grefenstette, editor, Proc. 1st International Conference on Genetic Algorithms and Their Applications, pages 101-111, Pittsburgh, PA, 1985. Lawrence Erlbaum Associates (Hillsdale, NJ, 1988).
4. Bronstein and Semendjajew. Taschenbuch der Mathematik. Verlag Harri Deutsch, 23rd edition, 1987.
5. Terence C. Fogarty. Varying the probability of mutation in the genetic algorithm. In J. David Schaffer, editor, Proc. 3rd Int. Conf. on Genetic Algorithms, pages 104-109, San Mateo, CA, 1989. Morgan Kaufmann Publishers.
6. Jürgen Hesser and Reinhard Männer. Towards an optimal mutation probability for genetic algorithms. In H.-P. Schwefel and Reinhard Männer, editors, Parallel Problem Solving from Nature PPSN, pages 23-32. Springer-Verlag, 1990.
7. Bernhard Sendhoff and Martin Kreutz. Variable, genome-dependent mutation probability for genetic algorithms. Internal Report 95-07, Institut für Neuroinformatik, Ruhr-Universität Bochum, 44780 Bochum, Germany, 1995. Available via WWW: http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VS/PUBLIST/1995/html/irini95.html.
8. Bernhard Sendhoff and Martin Kreutz. Causality and the analysis of evolutionary algorithms. Submitted to Evolutionary Computation, 1996.

This article was processed using the LaTEX macro package with LLNCS style
