Modeling and convergence analysis of a continuous multi-objective differential evolution algorithm Feng Xue

Arthur C. Sanderson

Robert J. Graves

General Electric Global Research Center 1 Research Circle Niskayuna, NY 12309 USA [email protected]

Dept. of Electrical, Computer and Systems Engineering Rensselaer Polytechnic Institute Troy, NY 12180 USA [email protected]

Thayer School of Engineering Dartmouth College Hanover, NH 03755 USA [email protected]

Abstract— This paper reports a mathematical modeling and convergence analysis of a recently proposed continuous multi-objective differential evolution (C-MODE) algorithm. The C-MODE is studied in the context of global random search, and convergence of the population to the Pareto optimal solutions with probability one is established. To facilitate the understanding of the C-MODE operators in a continuous space, a mathematical analysis of the operators is conducted based upon a Gaussian distributed initial population. A set of guidelines for the parameter setting of the C-MODE is derived from the theoretical results of this analysis. A simulation study on a specific numerical example validates the mathematical results and the parameter-setting guidelines. A performance comparison based on a suite of complex benchmark functions further demonstrates the merits of these guidelines.

I. INTRODUCTION

The mathematical theory of generation-based global random search due to Zhigljavsky [1] can be used to model the simple genetic algorithm [2]. The theory studies how the population distribution evolves as the algorithm produces the next generation, by constructing sampling strategies based on whole-population information and local information. Peck and Dhawan [3] reformulate the simple genetic algorithm in the framework of generational random optimization described in [1], with the proportional selection operator corresponding to global sampling and crossover and mutation to local sampling. The operators of the simple genetic algorithm fit the general framework, and a convergence conclusion was drawn based on the convergence theory in [1]. Qi et al. [4] consider the situation in which the search space is continuous and a probability density function can be explicitly formulated for the population sampling strategies. The convergence of a simple genetic algorithm was concluded from a reformulation of the more general, probability-measure-based theory in [1]. Specific cases with quadratic objective functions and a Gaussian initial population are considered in [4] to exemplify the convergence theory. Subbu and Sanderson [5], [6] extend the convergence theory to the co-evolutionary optimization framework, where the whole search space is divided into multiple subspaces and multiple evolutionary agents are executed to search these

subspaces, with coordinated information interchange among the agents to update the decision variables. All the theoretical analysis mentioned above is based on the assumption of a unique global optimal solution. Although cases with multiple global optimal solutions can be considered, as stated in [1], the conclusion is limited to finding a global optimal solution rather than the concentration of the distribution on all the global optimal solutions. Markov processes have been successfully applied to modeling single-objective evolutionary algorithms in the past, for example by Rudolph [7], [8], Suzuki [9], and Fogel [10]. The convergence of single-objective evolutionary algorithms in continuous search spaces has thus been studied to some extent, while the convergence properties of multi-objective evolutionary algorithms in continuous spaces are seldom investigated. Rudolph [11] considered a simplified version of an MOEA emulating the (1+1) evolution strategy. Several adaptive updating rules for the evolutionary algorithm are numerically studied on a multi-objective optimization problem with two quadratic functions; convergence analysis is performed for one of the adaptive updating rules, and convergence to some point of the Pareto optimal set is concluded. Hanne [12] takes a more general route to prove the convergence of an evolutionary algorithm to some solutions (at least one) of the Pareto optimal set based on the non-dominated selection scheme. Recently, the running time of simplified MOEAs has been studied for specific functions [13]. In this paper, however, we concentrate on convergence analysis: we perform a theoretical analysis of the continuous multi-objective evolutionary algorithm based on random search techniques and specific numerical examples (for a mathematical analysis of the discrete MODE, the reader is referred to [14]).
The paper is organized as follows: Section II highlights the main features of the C-MODE; in Section III, the convergence of the C-MODE is treated in the framework of global random search, without a detailed analysis of the operators; in Section IV, the operators of the C-MODE are mathematically analyzed under a Gaussian initial population assumption, and simulation analyses on a simple numerical example and on complex benchmark functions are conducted; the last section presents conclusions and future research work.

II. CONTINUOUS MULTI-OBJECTIVE DIFFERENTIAL EVOLUTION


To facilitate the understanding of the theoretical analysis of the Multi-Objective Differential Evolution (MODE) in this paper, we summarize the main features and operators of the MODE in this section. For a complete design of the MODE, the reader is referred to [15]–[17]. The MODE algorithm is an extension of the Differential Evolution concepts to the multi-objective context.

A. Differential Evolution

Differential Evolution (DE) is a type of evolutionary algorithm proposed by Storn and Price [18] for optimization problems over a continuous domain. DE is similar to the (µ, λ) evolution strategy, in which mutation plays the key role. There are several variants of the original differential evolution; the particular one described below follows Joshi and Sanderson [19], [20]. The main operators that control the evolutionary process are the reproduction and selection operators. The algorithm follows the general procedure of an evolutionary algorithm: an initial population is created by random sampling and evaluated; then the algorithm enters a loop of generating offspring, evaluating offspring, and selecting individuals to create the next generation. In DE, for every individual p_i in the parent population, the following reproduction operator is used to create its offspring:

p′_i = γ · p_best + (1 − γ) · p_i + F · Σ_{k=1}^{K} (p_{ik_a} − p_{ik_b})    (1)

where p_best is the best individual in the parent population, γ represents the greediness of the operator, K is the number of perturbation vectors, F is the scale factor of the perturbation, p_{ik_a} and p_{ik_b} are randomly selected, mutually distinct pairs of individuals from the parent population, and p′_i is the offspring that is generated; γ, K, and F are the parameters of the algorithm. The basic idea of DE is to adapt the search step inherently along the evolutionary process in a manner that trades off exploitation against exploration. At the beginning of the evolution, the perturbation is large, since the parental individuals are far away from each other. As the evolutionary process proceeds toward the final stage, the population converges to a small region and the perturbation becomes small. As a result, the adaptive search step benefits the algorithm by performing global search with a large perturbation step at the beginning of the evolutionary process and refining the population with a small search step at the end. The selection operator in DE selects the better of a parent and its offspring by comparing their fitness values.
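The reproduction step of Equation (1) can be sketched in a few lines. The function below is an illustrative implementation under assumed conventions (minimization, a single scalar fitness per individual, NumPy arrays), not the authors' code; the name `de_reproduce` is hypothetical.

```python
import numpy as np

def de_reproduce(pop, fitness, gamma=0.7, F=0.5, K=2, rng=None):
    """Sketch of the DE reproduction operator of Equation (1).

    pop: (N, n) array of parent individuals; fitness: (N,) array,
    lower is better.  Names and structure are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = len(pop)
    p_best = pop[np.argmin(fitness)]  # best individual of the parents
    offspring = np.empty_like(pop)
    for i in range(N):
        # K perturbation vectors from randomly chosen, mutually distinct pairs
        idx = rng.choice(N, size=2 * K, replace=False)
        a, b = idx[:K], idx[K:]
        perturb = F * (pop[a] - pop[b]).sum(axis=0)
        # Equation (1): greedy pull toward p_best plus the scaled perturbation
        offspring[i] = gamma * p_best + (1 - gamma) * pop[i] + perturb
    return offspring
```

Note how the perturbation magnitude is inherited from the spread of the current population, which is exactly the self-adaptive step-size behavior described above.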


Fig. 1. The non-dominated sorting method assigns the first rank to the non-dominated individuals in the population and the second rank to the non-dominated individuals in the rest of the unlabelled population. This process continues until all the individuals are assigned ranks.

B. Multi-objective differential evolution

Fitness assignment to the individuals of an evolving population in a multi-objective evolutionary algorithm is rather different from the single-objective counterpart. There are different approaches to evaluating the fitness of the individuals of a population, most of which can be regarded as variants of the non-dominated sorting scheme suggested by Goldberg [2]. This Pareto-based rank assignment assigns ranks to individuals based on their Pareto dominance relation to the other individuals in the same population. The rank assignment process is illustrated in Figure 1, where the whole population is divided into multiple ranks and all individuals in each rank are assigned the same fitness value. In order to mimic the reproduction operator of the DE approach described in Section II-A, we need to define two kinds of vectors for the reproduction of each individual in the population: the differential vector and the perturbation vectors. In the MODE, a Pareto-based approach is introduced to implement the selection of the best individual that defines the differential vector for the reproduction operation of an individual. The algorithm identifies the set of non-dominated solutions (those assigned the first rank) of the population at each generation of the evolutionary process. In order to apply the reproduction operation to an individual p_i, we need to examine whether the individual is dominated or not. If it is a dominated individual, a subset of non-dominated individuals, D_i, that dominate this individual can be identified. A "best" solution, p_best, is chosen randomly from the set D_i. The vector defined between p_i and p_best becomes the differential vector for the reproduction operation of the individual p_i. If the individual is already non-dominated, p_best is the individual itself; in this case, the differential vector becomes 0 and only the perturbation vectors take effect.
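The Pareto-based choice of p_best described above can be sketched as follows. `dominates` and `choose_best` are hypothetical helper names for a minimization problem; this is an illustration of the scheme, not the authors' implementation.

```python
import numpy as np

def dominates(z1, z2):
    """True if objective vector z1 Pareto-dominates z2 (minimization):
    no worse in every objective and strictly better in at least one."""
    return bool(np.all(z1 <= z2) and np.any(z1 < z2))

def choose_best(i, objs, rng):
    """Pick p_best for individual i: a random member of the set D_i of
    solutions dominating i, or i itself if it is non-dominated."""
    D_i = [j for j in range(len(objs)) if dominates(objs[j], objs[i])]
    return i if not D_i else int(rng.choice(D_i))
```

For a non-dominated individual `choose_best` returns the individual itself, so the differential vector vanishes and only the perturbation vectors act, matching the description above.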
The perturbation vectors are defined by randomly chosen distinct pairs of individuals from the parent population:

p′_i = { p_i + F · Σ_{k=1}^{K} (p_{ik_a} − p_{ik_b})                          if p_i is non-dominated
       { γ · p_best + (1 − γ) · p_i + F · Σ_{k=1}^{K} (p_{ik_a} − p_{ik_b})   if p_i is dominated      (2)

where the symbols have the same meaning as in the single-objective DE algorithm. In MODE, all the parents and the children (one parental individual generates one child) are pooled to compete for entry into the next generation based on their ranks and the crowding distance metric.

III. STOCHASTIC MODEL OF C-MODE IN CONTINUOUS SPACE

In this section, the convergence properties of the C-MODE are studied in a manner similar to [12]. In order to formulate the analysis, we need the concept of a convex cone in R^k, which has been used extensively in the development of multi-objective optimization theory [21]. The convex cone is defined as

V = {v : v ∈ R^k, v ≼ 0}

It should be noted that the symbol ≼ is used to compare two vectors v1 and v2: v1 ≼ v2 iff v1_i ≤ v2_i for all i = 1, · · · , k and v1_i < v2_i for some i. Having defined the convex cone, we denote the set of solutions that dominate a particular solution x by {≺ x} = {x′ : x′ ∈ χ, Z(x′) ∈ (Z(x) + V)}. Let us also define the ε-neighborhood of a point z in the space R^k as N_ε(z) = {z′ : |z′ − z| ≤ ε}. We have previously defined the Pareto optimal (or efficient) solution set Λ∗ with respect to a MOOP (χ, Z(.)), where χ ⊆ R^n is the feasible space and Z(.) is the vector objective function whose argument is any solution from the feasible decision space χ. For convenience of explanation, we denote Λ∗ = E(χ, Z(.)), where E(·, ·) is a map which generates all the Pareto optimal solutions within the first argument with respect to the criteria (vector functions) described by the second argument. The ε-efficient solution set can be defined as Λ_ε = {x : x ∈ χ, Z({≺ x}) ⊆ N_ε(Z(x))}. In other words, an ε-efficient solution is a solution all of whose dominating solutions are located within its ε-neighborhood in the criterion space. Clearly, if a solution x is ε1-efficient, it is also ε2-efficient for any ε2 ≥ ε1. Let P^t denote the population at generation t of the C-MODE. At each generation, a set of locally efficient solutions with respect to the population at that generation can be obtained, denoted Λ^t = E(P^t, Z(.)). For any solution i in Λ^t (i = 1, · · · , |Λ^t|), there is a minimal value ε^t_i ≥ 0 such that this solution is ε^t_i-efficient.
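For a finite set of sampled solutions, the ε-efficiency test follows directly from the definition. The sketch below uses an illustrative helper name, `is_eps_efficient`, and measures the ε-neighborhood with the Euclidean norm; it is a finite-population approximation, since the definition quantifies over all of χ.

```python
import numpy as np

def dominates(z1, z2):
    # Pareto dominance for minimization: z1 no worse everywhere, better somewhere
    return bool(np.all(z1 <= z2) and np.any(z1 < z2))

def is_eps_efficient(i, objs, eps):
    """Solution i is ε-efficient if every sampled solution dominating it
    lies within the ε-neighborhood of Z(x_i) in criterion space."""
    z = objs[i]
    for zj in objs:
        if dominates(zj, z) and np.linalg.norm(zj - z) > eps:
            return False
    return True
```

A non-dominated solution is trivially ε-efficient for every ε > 0, which is consistent with the nesting property Λ_{ε1} ⊆ Λ_{ε2} for ε1 ≤ ε2 noted above.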
Proposition 1: Make the mild assumption that the selection operator of the C-MODE guarantees that the parents enter the next generation whenever the number of non-dominated solutions E(P + P′, Z) obtained from the combined parents P and their offspring P′ exceeds the population size N. Then, letting ε^t_min = min_{i=1,··· ,|Λ^t|} ε^t_i, we have ε^{t+1}_min ≤ ε^t_min.

Proof: The selection operator in the C-MODE assures that Λ^{t+1} ≼ Λ^t, which means that any solution in Λ^t is weakly dominated by some solution in Λ^{t+1}. This dominance relation assures that for a solution that is ε^t-efficient in Λ^t there exists at least one solution that is ε^{t+1}-efficient in Λ^{t+1} with ε^{t+1} ≤ ε^t. Therefore, ε^{t+1}_min ≤ ε^t_min.

If we approximate the reproduction operation of the C-MODE by an independent Gaussian perturbation of each component of

the n-dimensional decision variable, the following proposition about convergence of the population into the subset Λ_ε for any ε > 0 can be proved in a manner similar to [12].

Proposition 2: Let the MOOP (χ, Z(.)) be regular. Then

lim_{t→∞} Pr(P^t ⊆ Λ_ε) = 1, ∀ε > 0

Proof: The regularity condition assures that the objective functions z_j(.), j = 1, · · · , k, are continuous and the feasible set χ is compact (closed and bounded). This also leads to the conclusion that the ε-efficient sets Λ_ε, ∀ε > 0, are closed and bounded. Under the component-wise Gaussian perturbation assumption, the probability that an individual in the parental population generates an offspring belonging to the set Λ_ε via the evolutionary operators is just the integral of a certain Gaussian probability density function over the compact region Λ_ε. This probability is positive, since the region Λ_ε is compact and measurable. Therefore, there exists a probability p^t > 0 at each generation with which the parental population generates an offspring population whose individuals all lie within the region Λ_ε via the reproduction operators. These p^t need not be the same from one generation to another. However, there is a lower bound on this probability, corresponding to the population containing the solutions with the largest distance to Λ_ε. Since the feasible space χ is compact, such a lower bound exists; it is attained by a population consisting of the solutions that are least likely to generate offspring in the region Λ_ε. Let p denote this lower bound. Then we have Pr(P^t ⊄ Λ_ε) ≤ (1 − p)^t and therefore

Pr(P^t ⊆ Λ_ε) ≥ 1 − (1 − p)^t

Taking the limit of the above inequality as t → ∞ leads to the proposition.

IV. MATHEMATICAL ANALYSIS OF C-MODE OPERATORS

In this section, we study the C-MODE operators and their effects on the convergence properties of the algorithm by examining the evolving probability distribution of the population on which the search algorithm operates. The C-MODE is such a complex dynamic system that a tractable mathematical analysis of the operators in a general context is barely possible. We tackle this complex system by first examining the population evolution of the C-MODE with only the reproduction operators, i.e., the differential and perturbation factors. The selection factor is then incorporated into the analysis at a second stage in an indirect way.

A. Analysis of population distribution

To facilitate the mathematical analysis, we assume that the population is initialized by sampling from a Gaussian distribution with mean µ0 and covariance matrix Σ0; that is, any individual in the initial population satisfies X^0 ∼ N(µ0, Σ0). The analysis of the C-MODE without the selection operator can be performed with rather simple mathematical analysis of the differential and perturbation operators.

Proposition 3: If the initial population is Gaussian distributed and contains the Pareto optimal set Λ∗, the subsequent

populations generated by the C-MODE without selection are also Gaussian distributed, and the population mean converges to the center of the Pareto optimal set Λ∗.

Proof: We prove this by induction. The population is initialized from a Gaussian distribution. Assume that the population is Gaussian distributed at generation t, t ≥ 0, so that any individual at generation t is a random solution X^t ∼ N(µ^t, Σ^t). We want to examine the population distribution at generation t + 1. Since we consider the case without selection, the individuals of the next generation are generated by applying only the differential and perturbation operators to the individuals at generation t:

X^{t+1} = γX∗ + (1 − γ)X^t + F · Σ_{i=1}^{K} (X_a^i − X_b^i)

where X^t, X_a^i, X_b^i are individuals from the population at generation t, while X∗ is a randomly chosen solution from the Pareto optimal set Λ∗. Here we can regard X∗ as a random solution uniformly distributed on the probability support defined by Λ∗. We assume that a randomly chosen Pareto optimal solution is used to define the differential vector for the reproduction of all the individuals in the population. We make some approximations here for ease of mathematical analysis; in the practical implementation of the C-MODE, the differential vector for each individual is defined independently of the others by choosing one dominating solution from the first non-dominated rank (the first non-dominated rank here is the Pareto optimal set). X^t, X_a^i, X_b^i are individuals from the parental population, which means that they come from the same Gaussian distribution N(µ^t, Σ^t). Therefore, X^{t+1} is a linear combination of independent Gaussian random variables, from which we conclude that X^{t+1} is also Gaussian distributed. The expected value conditioned on a specific Pareto optimal solution x∗ being chosen to define the differential vector is E(X^{t+1} | X∗ = x∗) = γx∗ + (1 − γ)µ^t. Therefore, the unconditioned expected value is E(X^{t+1}) = γE(X∗) + (1 − γ)µ^t. The covariance matrix can be expressed as Var(X^{t+1}) = (2KF² + (1 − γ)²) · Σ^t. The induction step thus gives X^{t+1} ∼ N(γX∗ + (1 − γ)µ^t, (2KF² + (1 − γ)²) · Σ^t) provided that X^t ∼ N(µ^t, Σ^t). If we start the evolution with initial population X^0 ∼ N(µ0, Σ0), we can obtain the population distribution at generation t by recursively applying the above formula. In terms of the parameters of the initial distribution,

X^t ∼ N(γ Σ_{i=0}^{t−1} (1 − γ)^i X∗ + (1 − γ)^t µ0, (2KF² + (1 − γ)²)^t · Σ0)

The expected value of X^t is

E(X^t) = γ Σ_{i=0}^{t−1} (1 − γ)^i E(X∗) + (1 − γ)^t µ0    (3)

The two limiting results

lim_{t→∞} (1 − γ)^t µ0 = 0,    lim_{t→∞} Σ_{i=0}^{t−1} (1 − γ)^i = 1/γ

lead us to the conclusion that

lim_{t→∞} E(X^t) = E(X∗)
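This mean recursion can be checked with a small Monte Carlo experiment. The sketch below uses the line-segment Pareto set of the numerical example in Section IV-B; as a simplification relative to the analysis, each individual draws its own X∗ from Λ∗ (rather than one shared X∗ per generation), which leaves the mean recursion unchanged. All parameter values are illustrative.

```python
import numpy as np

# Monte Carlo sketch of Proposition 3: evolve a Gaussian population with the
# differential and perturbation operators only (no selection) and watch the
# sample mean drift toward the center of the Pareto set.
rng = np.random.default_rng(1)
N, n, G = 2000, 2, 60
gamma, F, K = 0.7, 0.3, 2          # 2KF^2 + (1 - gamma)^2 = 0.45 < 1
x0, x1 = np.array([3.0, 0.0]), np.array([5.0, 0.0])
pop = rng.normal(loc=(8.0, 8.0), scale=1.0, size=(N, n))
for t in range(G):
    r = rng.uniform(size=(N, 1))
    x_star = r * x0 + (1 - r) * x1  # uniform on the segment Λ*
    perturb = np.zeros_like(pop)
    for _ in range(K):
        a, b = rng.integers(N, size=N), rng.integers(N, size=N)
        perturb += F * (pop[a] - pop[b])
    pop = gamma * x_star + (1 - gamma) * pop + perturb
print(pop.mean(axis=0))  # close to the segment center (4, 0)
```

Rerunning with 2KF² + (1 − γ)² > 1 (e.g. F = 0.5) reproduces the exploding-covariance regime discussed below, while the mean still converges.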

Since X∗ denotes a uniformly distributed random solution with probability support Λ∗, this expected value is the center of the Pareto optimal set. It should be noted, however, that the limiting behavior of the covariance matrix depends on the factor (2KF² + (1 − γ)²). If this factor is less than 1, the limiting covariance matrix vanishes, while it explodes if the factor is greater than 1. We examine this phenomenon in the simulations performed later. This result can also be used to guide the parameter setting of the C-MODE in real-world applications, which is discussed later. In a practical application with a finite population size, however, the optimal solutions will not be present in the initial population. The exploration of the C-MODE identifies globally optimal solutions during the evolutionary process, and the selection operator keeps the optimal solutions found during the evolution. A model that is closer to practical application is therefore one in which the best solutions found by the C-MODE evolve as the algorithm proceeds. Let X^{t∗} denote a non-dominated solution identified up to generation t, while the whole non-dominated solution set is denoted Λ^t as defined before. Similarly, we can regard X^{t∗} as a random variable defined on the probability support Λ^t, with a specific solution from Λ^t used to define the differential vector for the whole population. The reproduction operation is then performed in the following way:

X^{t+1} = γX^{t∗} + (1 − γ)X^t + F · Σ_{i=1}^{K} (X_a^i − X_b^i)    (4)

By arguments similar to those in Proposition 3, the subsequent populations in the evolutionary process are Gaussian distributed with the same covariance matrix as in the simplified model analyzed before, but with a different mean, E(X^t) = γ Σ_{i=0}^{t−1} (1 − γ)^{t−i−1} X^{i∗} + (1 − γ)^t µ0. We can reformulate Equation 4 by introducing a difference variable D^t = X^{t∗} − X∗:

X^{t+1} = γ(X∗ + D^t) + (1 − γ)X^t + F · Σ_{i=1}^{K} (X_a^i − X_b^i)    (5)

This symbolic reformulation can be interpreted in the following way. After a specific non-dominated solution x^{t∗} is sampled from the set Λ^t to generate the differential vectors for the individuals in the population, we can assume that a Pareto optimal solution x∗ can be identified in the set Λ∗ such that x∗ is the point of Λ∗ closest to the chosen non-dominated solution x^{t∗}. The difference between these two solutions can then be calculated. Clearly, this difference varies with the choice of the non-dominated solution used to generate the differential vectors. We can then express the population mean at generation t as

E(X^t) = γ Σ_{i=0}^{t−1} (1 − γ)^{t−i−1} E(X∗) + γ Σ_{i=0}^{t−1} (1 − γ)^{t−i−1} E(D^i) + (1 − γ)^t µ0    (6)

Here we make some approximations. The expected value of X∗ should be calculated with respect to how the non-dominated solution X^{t∗} is distributed; we assume that this expected value can be approximated by the center of the Pareto optimal set. Under this approximation, there is one extra term beyond those present in the previous case (Equation 3). In order to examine the convergence properties of the population mean, we need to study this extra term, γ · Σ_{i=0}^{t−1} (1 − γ)^{t−i−1} · E(D^i), which we denote D_e. Let us define the sequence q^i = |E(D^{i−1})| / |E(D^i)|, i = 1, · · · , t, and let q = max_{i=1,··· ,t} q^i be its maximal value, so that |E(D^{i−1})| ≤ q |E(D^i)|, i = 1, · · · , t. Expressed in terms of D^{t−1}, this gives |E(D^i)| ≤ q^{t−i−1} |E(D^{t−1})|, i = 0, · · · , t − 1. We can then bound the magnitude of the extra term:

|D_e| = γ |Σ_{i=0}^{t−1} (1 − γ)^{t−i−1} · E(D^i)| ≤ γ |E(D^{t−1})| · Σ_{i=0}^{t−1} (1 − γ)^{t−i−1} q^{t−i−1}

If the condition

(1 − γ)q < 1    (7)

holds, then the term |D_e| has the limit γ/(1 − (1 − γ)q) · |E(D^{t−1})| as the algorithm evolves over an infinite number of generations. The reproduction and selection operators of the C-MODE ensure that |E(D^{t−1})| decreases to 0 with probability one as the evolution proceeds. The limiting population mean with dynamically changing optimal solutions is therefore

lim_{t→∞} E(X^t) = E(X∗)
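The geometric-series limit behind condition (7) is easy to verify numerically; the values of γ and q below are illustrative.

```python
# Numeric check of the limit used for |D_e|: when (1 - gamma) * q < 1, the
# geometric sum gamma * sum_{j=0}^{t-1} ((1 - gamma) * q)**j approaches
# gamma / (1 - (1 - gamma) * q).
gamma, q = 0.7, 1.05
r = (1 - gamma) * q                      # 0.315 < 1, so condition (7) holds
partial = gamma * sum(r ** j for j in range(200))
limit = gamma / (1 - r)
print(partial, limit)
```

With γ = 0.1 the same computation needs q < 1/0.9 ≈ 1.11 for the series to converge at all, which foreshadows the parameter-setting discussion in the simulations below.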

B. Simulation analysis on a numerical example

In order to examine the behavior of the C-MODE numerically and to validate the mathematical analysis developed in the previous section, we study a specific example with two quadratic objective functions. The bi-objective optimization problem is formulated as

min Z(x) = ( ‖x − x0‖₂ , ‖x − x1‖₂ )    (8)

where x ∈ R², x0 and x1 are two distinct points in R², and ‖ · ‖₂ is the 2-norm distance. The optimal solution set is Λ∗ = {x : x = r x0 + (1 − r) x1, r ∈ [0, 1]}. In the numerical analysis, we set x0 = (3, 0)′, x1 = (5, 0)′. The C-MODE is performed with a population size N = 100, a maximal number of generations G = 50, a number

of perturbation vectors K = 2, and an initial population sampled from a Gaussian distribution with mean (8, 8)′ and a standard covariance matrix (the identity matrix). The C-MODE is run with different parameter settings to reflect the different scenarios identified by the theoretical analysis in the previous section. Simulations of the C-MODE without selection are used to study the reproduction operators, while the full version of the C-MODE is simulated to examine the combined performance. These studies provide insight into the impact of the C-MODE operators and the proper parameter settings for these operators. In order to show the population distribution over the course of the evolution, a normal probability distribution is fitted to each dimension of the decision variable. If the population can be well approximated by a bivariate normal distribution, then the probability plot for each dimension can be well fitted by a normal distribution. (Due to limited space, only a portion of the plots are shown here; for a complete analysis, please refer to [17].) The trend of the variance evolution can easily be illustrated by plotting the subsequent populations in the decision space. Of course, one can also statistically estimate the evolving population variances by calculating the sample covariance matrix. In this paper, we use probability plots to analyze the population distribution: for a normally distributed population, the probability plot of each dimension of the decision variables should be close to a straight line. The mathematical analysis in the previous section shows that if 2KF² + (1 − γ)² is greater than 1, the population covariance matrix explodes; otherwise, it vanishes. We set the C-MODE parameters γ = 0.7, F = 0.5 to correspond to the situation in which the covariance matrix explodes.
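The stability factor 2KF² + (1 − γ)² is cheap to compute for the settings exercised in the simulations; the helper below (an illustrative name) confirms which settings explode and which vanish, with K = 2 as in the experiments.

```python
# The factor 2*K*F**2 + (1 - gamma)**2 from the analysis determines whether
# the population covariance explodes (> 1) or vanishes (< 1) without selection.
def variance_factor(gamma, F, K=2):
    return 2 * K * F ** 2 + (1 - gamma) ** 2

for gamma, F in [(0.7, 0.5), (0.7, 0.3), (0.1, 0.3)]:
    print(gamma, F, variance_factor(gamma, F))
```

For (γ = 0.7, F = 0.5) the factor is 1.09 (exploding), for (γ = 0.7, F = 0.3) it is 0.45 (vanishing), and for (γ = 0.1, F = 0.3) it is 1.17 (exploding), matching the three scenarios examined below.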
As shown in Figure 2, the subsequence populations can be well fitted by normal distribution since the plots are close to straight lines, and the variance explodes as the algorithm evolves. In order to simulate another scenario where the variance vanishes as the evolution proceeds, we can simply change the parameters to γ = 0.7, F = 0.3. This setting satisfies the condition of 2KF 2 + (1 − γ)2 < 1. The simulation results are shown in Figure 3. In this case, the population tends to clumps together compared to the previous case as shown in Figure 2 though the variance does not vanish completely. In the mathematical analysis with dynamic non-dominated solution set, the parameter settings satisfying the condition (1 − γ)q < 1 can guarantee the convergence to the optimal solution set. The parameter q can be regarded as the relative progress that the algorithm makes from the current generation to the next one. The progress is dependent on the search performance with the combined operations of the exploration part (differential and perturbation operators) and the selection. Although it is still not clear about the control of the value q by changing the evolutionary operators and the associated parameters, we can manipulate the parameter γ to loose the restriction on the value of q to meet the condition (1 − γ)q < 1. It is obviously that the restriction of the

Normal Probability Plot

Normal Probability Plot

0.10 0.05 0.02 0.01 0.003

0.75 0.50 0.25 0.10 0.05 0.02 0.01 0.003

G−0 6

7

8 Data

9

10

0

Normal Probability Plot

0.50 0.25 0.10 0.05 0.02 0.01 0.003 −5

Normal Probability Plot

Normal Probability Plot

0

5 Data

10

7

9

0.50 0.25

0.50 0.25 0.10 0.05 0.02 0.01 0.003

G−50

15

−20

0 Data

0

1

2

3

4

5

Data Normal Probability Plot 0.997 0.99 0.98 0.95 0.90

0.75 0.50 0.25 0.10 0.05 0.02 0.01 0.003

20

G−15

10

0.997 0.99 0.98 0.95 0.90

1

2

3

4

0.75 0.50 0.25 0.10 0.05 0.02 0.01 0.003

G−25 0

5

G−50 0

1

2

Data

10

3

4

5

Data 4

10

6

8

8

4

6

2

6

0

2

0

−4

−2

−6

G−0 −4

0

5 x

10

0

2

−2 0

4

x2

2

x

x2

2

4

x

2

6

0.75

0.10 0.05 0.02 0.01 0.003

G−0 8 Data

0.75

G−25

0.25

10

Probability

0.75

0.50

Data

0.997 0.99 0.98 0.95 0.90 Probability

0.997 0.99 0.98 0.95 0.90

5

0.75

0.10 0.05 0.02 0.01 0.003

G−15 −5

Probability

0.25

0.997 0.99 0.98 0.95 0.90

Probability

0.50

Normal Probability Plot

0.997 0.99 0.98 0.95 0.90 Probability

0.75

Probability

Normal Probability Plot

0.997 0.99 0.98 0.95 0.90 Probability

Probability

0.997 0.99 0.98 0.95 0.90

G−15 −5

0

5 x

1

10

−2

−2

G−0

−4

0

5 x1

1

G−15 −4

10

4

4

2

2

0

2

4 x1

6

0

2

4 x1

6

8

20 5

−5

−10

−5

0

5 x

10

0

−2

−20

G−25

−10

x2

x2

2

0

x

x

2

10 0

G−50

15

−20

0 x

1

20

G−25 −4

0

2

4 x1

1

Fig. 2. Population probability plot and individual plotting in the decision space generated by C-MODE without selection with γ = 0.7, F = 0.5, the variance of the population explodes as algorithm evolves

0

−2

6

G−50 −4

8

8

Fig. 4. Population probability plot and individual plotting in the decision space generated by C-MODE with selection with γ = 0.7, F = 0.5, the population converges to the optimal solution set

4

4

10 8

8

2

6

2

0

−2 G−0

−4

0

5 x

0

2

4 x

1

6

−2

−2

G−15 −4

10

G−0

−4

8

0

5 x

G−15 −4

10

2

2

2

−2

−2

G−25 −4

0

2

4 x

1

6

8

0

x

0

2

2

x2

4

2

4

−2

0

2

4 x

6

q value can be mostly loosed by setting a relatively large value of γ. It should be noted that the mathematical model considers only the evolution of the non-dominated solution set. It is mathematically intractable to fully model the rankbased selection operation of the C-MODE. Therefore, we might expect that the normal fitting to the population will not be good for the cases to apply C-MODE with selection. Considering the C-MODE with selection, the simulated results with parameter setting of γ = 0.7, F = 0.5 are shown in Figure 4. It is shown that the normal distribution still fits the population to some extent and the population converges to the global optimal solution set. The normal fit to the evolved population using C-MODE with selection is not as good as using C-MODE without selection. This is reasonable since the diversity control mechanism and selection operator bias the population evolution in the MODE. The C-MODE successfully identify the optimal solution set

6

4 x

6

0

2

4 x

1

Fig. 5.

8

0

G−25 −4

8

1

C-MODE without selection with γ = 0.7, F = 0.3

4 x

−2

G−50 −4

2

1

4

0

0

1

1

4

x

2

0

2

−2

x

4

x

0

x2

x2

2

x

2 0

Fig. 3.

2

6

4

6

8

G−50 −4

0

2

8

1

C-MODE with selection with γ = 0.7, F = 0.3

within a small number of generations, and all the individuals clump around the optimal line segment. This parameter setting corresponds to a situation in which the population variance explodes; it is the selection operator that prevents such explosion from actually happening while leaving the C-MODE with enough capability for exploration. In contrast, the simulated results (shown in Figure 5) of the C-MODE with selection under the parameter setting γ = 0.7, F = 0.3 do not converge to the optimal solution set. The reason is that the reproduction operators provide only a limited capability to explore the search space, owing to the inherently vanishing population variance. Although a vanishing population variance helps to promote convergence once the optimal solutions are already present in the population, it is harmful to the exploration of the search space when the C-MODE runs with a limited population size. The analysis performed above therefore suggests the use of parameter settings that lead to an exploding population variance, so that the C-MODE has enough capability to explore the search space.

The following simulation results provide further insight into the parameter setting. It is easy to check that the setting γ = 0.1, F = 0.3 also corresponds to exploding variance, since it satisfies 2KF² + (1 − γ)² > 1. The mathematical analysis thus predicts an exploding population variance when this setting is applied to the C-MODE without selection, and the simulated results (shown in Figure 6) confirm this prediction. However, the simulated results of the C-MODE with selection using γ = 0.1, F = 0.3 are not good, as shown in Figure 7: the population does not converge to the optimal solution set. This phenomenon can be explained by the condition described in Equation 7. The relatively small γ places a rigid restriction on the value of q: in our case, we need q < 1.1 for γ = 0.1. Recall that q ≥ 1 by construction of the selection procedure, so this condition is difficult to meet under such a parameter setting.

Fig. 6. C-MODE without selection with γ = 0.1, F = 0.3

Fig. 7. C-MODE with selection with γ = 0.1, F = 0.3

The mathematical results developed in the previous section are thus confirmed by the simulation results obtained by applying the C-MODE to the numerical example with different parameter settings. The analysis provides considerable insight into how the evolutionary operators impact the performance of the C-MODE. Two conditions are introduced, based on both the mathematical analysis and the simulation results, to guide the parameter setting of the C-MODE:

• 2KF² + (1 − γ)² > 1
• (1 − γ)q < 1

In practice, we want to set the greediness factor γ and the perturbation factor F such that they combine to make 2KF² + (1 − γ)² > 1, so that the C-MODE has enough capability to explore the search space. On the other hand, we want to set the greediness γ to a relatively large value to loosen the restriction on q. The mathematical analysis by no means intends to obtain the optimal parameter setting for the C-MODE, since the analytical results reflect only portions of the heuristics involved in this evolutionary algorithm. Rather, the analysis provides some insight into the process inside the evolutionary black box, and it hopefully indicates some guidelines on parameter setting when the full-fledged version of the C-MODE is applied to real-world problems. The following paragraphs exemplify such guidelines by applying the C-MODE to a set of rather complicated benchmark functions.

C. Simulation analysis on complex functions

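The two parameter-setting conditions above can be encoded as a small screening helper for candidate (γ, F, K) settings. The sketch below is our own illustrative code, not part of the original C-MODE implementation; the function names are ours.

```python
def variance_expands(gamma: float, F: float, K: int) -> bool:
    """First guideline: 2*K*F^2 + (1 - gamma)^2 > 1 means the population
    variance grows, giving the C-MODE enough capability to explore."""
    return 2 * K * F**2 + (1 - gamma) ** 2 > 1

def q_upper_bound(gamma: float) -> float:
    """Second guideline: (1 - gamma)*q < 1 restricts q to q < 1/(1 - gamma).
    Since q >= 1 by the selection procedure, a small gamma leaves
    almost no room for q."""
    return 1.0 / (1.0 - gamma)

# The three settings studied in the simulations (K = 2 perturbation vectors).
for gamma, F in [(0.7, 0.5), (0.7, 0.3), (0.1, 0.3)]:
    print(gamma, F, variance_expands(gamma, F, K=2), q_upper_bound(gamma))
```

Under this check, γ = 0.7, F = 0.5 expands the variance while leaving a loose bound on q, whereas γ = 0.1, F = 0.3 expands the variance but forces q below roughly 1.1, consistent with the discussion above.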
We apply the C-MODE to the benchmark functions described in [22] (function 6 of [22] becomes T5 here) with different parameter settings, chosen to reflect the different scenarios identified by the aforementioned mathematical results. The C-MODE is run with population size N = 100, mutation probability pr = 0.3, maximal number of generations Gmax = 250, crowding distance σcrowd = 0.001, and number of perturbation vectors K = 2. We set the greediness γ = 0.7 and the perturbation factor F = 0.5 to reflect the case 2KF² + (1 − γ)² > 1, and the greediness γ = 0.7 with perturbation factor F = 0.2 to reflect the case 2KF² + (1 − γ)² < 1. A third setting, with greediness γ = 0.1 and perturbation factor F = 0.3, reflects the case 2KF² + (1 − γ)² > 1 while imposing a tight restriction on q through the condition (1 − γ)q < 1. The simulation results obtained from 30 independent runs for each parameter setting are summarized in Table I in terms of the performance metrics. We want both d˜min and d¯q to be small and close to each other; at the same time, we need a reasonable number of effective solutions, represented by C. The metric d˜min measures the closeness of the computed non-dominated front to the true Pareto front, while the extra distance d¯q − d˜min measures by how much the computed non-dominated front fails to cover the whole of the true Pareto front. The detailed performance metrics are described in [17]. The results show that the runs with γ = 0.7, F = 0.5 are better than those of the other cases.

From the simulation results on these complicated continuous benchmark functions, it is shown that the C-MODE performs better when the parameters are set to meet the condition 2KF² + (1 − γ)² > 1. A parameter setting satisfying this condition enables the C-MODE to fully explore the search space, as evidenced by the exploding variance in the mathematical analysis. On the other hand, it is also beneficial to set the greediness parameter γ to a relatively large value such that the condition (1 − γ)q < 1 can be met easily.

TABLE I
STATISTICAL SIMULATION RESULTS OF 30 RUNS OF THE CONTINUOUS BENCHMARK FUNCTIONS USING DIFFERENT PARAMETER SETTINGS

Function   d¯q Mean   d¯q Std   d˜min Mean   d˜min Std   C Mean   C Std

Results for γ = 0.7, F = 0.5
T1         0.009      0.0004    0.003        0.0005      98.2     0.73
T2         0.01       0.0006    0.003        0.0006      99.9     0.18
T3         0.01       0.001     0.004        0.0006      85       1.83
T4         0.75       0.33      0.67         0.25        27       11.3
T5         0.02       0.024     0.01         0.017       50.8     10.6

Results for γ = 0.7, F = 0.2
T1         0.054      0.024     0.049        0.023       71       12
T2         0.54       0.52      0.33         0.21        27.1     12.8
T3         0.028      0.016     0.007        0.002       79       6.5
T4         2.18       1.26      1.6          1.03        5.6      6.9
T5         0.6        0.676     0.36         0.45        22.2     21.6

Results for γ = 0.1, F = 0.3
T1         0.01       0.003     0.009        0.003       96       2.5
T2         0.025      0.02      0.019        0.015       88       13.3
T3         0.017      0.004     0.006        0.0008      84       2.4
T4         2.8        1.5       2.2          1.34        2.3      2.7
T5         0.63       0.51      0.37         0.3         14.6     19.5

It should be noted that the simulations conducted on the benchmark functions and on the simple quadratic functions are compared under the same settings of the other important parameters, such as the reproduction probability. This parameter has a significant impact on the performance of the C-MODE; however, a value between 0.1 and 0.4 worked well in all the cases we have experimented with.

V. CONCLUSION AND FUTURE RESEARCH WORK

In this paper, a mathematical analysis is developed for the C-MODE based on global random search techniques. The convergence of the C-MODE population to the Pareto optimal solutions with probability one is derived. The C-MODE is further studied under a Gaussian initial population assumption, and the evolutionary operators of the C-MODE are examined in detail. A set of parameter-setting guidelines is derived from the mathematical analysis of the C-MODE. Simulation results of the C-MODE on a specific example validate the mathematical results and the corresponding parameter-setting guidelines. A set of rather complex benchmark functions is further simulated, and statistical results from multiple runs also demonstrate the merits of the parameter-setting guidelines.

The mathematical analysis of the convergence of the C-MODE in this work proves that the population converges to the Pareto optimal solution set. However, how this population is distributed in the decision space is still unknown. To facilitate an understanding at this level, we analyze the population behavior under the assumption of a Gaussian distributed initial population, and we demonstrate this behavior by numerical simulation on a specific example. It would nevertheless be advantageous to develop a mathematical analysis that examines the behavior of the population during the evolutionary process in a more general context, without the initial Gaussian assumption.

ACKNOWLEDGMENT

This work has been conducted at the Electronics Agile Manufacturing Research Institute (EAMRI) at Rensselaer Polytechnic Institute. The EAMRI is partially funded by grant DMI-0121902 from the National Science Foundation. Dr. Sanderson was supported in part by grant IIS-0329837 from the National Science Foundation.

REFERENCES

[1] A. Zhigljavsky, Theory of Global Random Search. Kluwer Academic, 1991.
[2] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
[3] C. Peck and A. Dhawan, "Genetic algorithms as global random search methods: an alternative perspective," Evolutionary Computation, vol. 3, no. 1, pp. 39–80, 1995.
[4] X. Qi and F. Palmieri, "Theoretical analysis of evolutionary algorithms with an infinite population size in continuous space, part I: basic properties of selection and mutation," IEEE Transactions on Neural Networks, vol. 5, no. 1, pp. 102–119, 1994.
[5] R. Subbu and A. Sanderson, "Modeling and convergence analysis of distributed coevolutionary algorithms," in Proc. of the IEEE Congress on Evolutionary Computation, 2000, pp. 1276–1283.
[6] ——, "Modeling and convergence analysis of distributed coevolutionary algorithms," IEEE Transactions on Systems, Man and Cybernetics, Part B, vol. 34, no. 2, pp. 806–822, 2004.
[7] G. Rudolph, "Convergence analysis of canonical genetic algorithms," IEEE Transactions on Neural Networks, vol. 5, no. 1, pp. 96–101, 1994.
[8] ——, "Convergence of evolutionary algorithms in general search space," in Proc. of the IEEE Int. Conf. on Evolutionary Computation, 1996, pp. 50–54.
[9] J. Suzuki, "A Markov chain analysis on a simple genetic algorithm," IEEE Transactions on Systems, Man, and Cybernetics, vol. 25, no. 4, pp. 655–659, 1995.
[10] D. Fogel, "Asymptotic convergence properties of genetic algorithms and evolutionary programming: analysis and experiments," Cybernetics and Systems, vol. 25, no. 3, pp. 389–407, 1994.
[11] G. Rudolph, "On a multi-objective evolutionary algorithm and its convergence to the Pareto set," in Proc. of the IEEE Congress on Evolutionary Computation, 1998, pp. 511–516.
[12] T. Hanne, "On the convergence of multiobjective evolutionary algorithms," European Journal of Operational Research, no. 117, 1999.
[13] M. Laumanns, L. Thiele, and E. Zitzler, "Running time analysis of multiobjective evolutionary algorithms on pseudo-Boolean functions," IEEE Transactions on Evolutionary Computation, vol. 8, no. 2, pp. 170–182, 2004.
[14] F. Xue, A. Sanderson, and R. Graves, "Modeling and convergence analysis of a discrete multi-objective differential evolution algorithm," in sub. to IEEE Congress on Evolutionary Computation, 2005.
[15] ——, "Multi-objective differential evolution and its application to enterprise planning," in Proc. of the IEEE Int. Conf. on Robotics and Automation, 2003.
[16] ——, "Pareto-based multi-objective differential evolution," in Proc. of the IEEE Congress on Evolutionary Computation, 2003.
[17] F. Xue, "Multi-objective differential evolution: theory and applications," Ph.D. dissertation, Rensselaer Polytechnic Institute, 2004.
[18] R. Storn and K. Price, "Differential evolution: a simple and efficient adaptive scheme for global optimization over continuous spaces," International Computer Science Institute, Berkeley, Tech. Rep. TR-95-012, 1995.
[19] R. Joshi and A. Sanderson, "Minimal representation multisensor fusion using differential evolution," IEEE Transactions on Systems, Man, and Cybernetics, Part A, vol. 29, no. 1, pp. 63–76, 1999.
[20] ——, Multisensor Fusion: A Minimal Representation Framework. World Scientific, 1999.
[21] P.-L. Yu, Multiple-Criteria Decision Making. Plenum Press, 1985.
[22] E. Zitzler, K. Deb, and L. Thiele, "Comparison of multiobjective evolutionary algorithms: empirical results," Evolutionary Computation, vol. 8, no. 2, pp. 173–195, 2000.