
Algorithmica (1991) 6: 419-436

© 1991 Springer-Verlag New York Inc.

Simulated Annealing Type Algorithms for Multivariate Optimization

Saul B. Gelfand and Sanjoy K. Mitter

Abstract. We study the convergence of a class of discrete-time continuous-state simulated annealing type algorithms for multivariate optimization. The general algorithm that we consider is of the form $X_{k+1} = X_k - a_k(\nabla U(X_k) + \xi_k) + b_k W_k$. Here $U(\cdot)$ is a smooth function on a compact subset $D$ of $\mathbb{R}^d$, $\{\xi_k\}$ is a sequence of $\mathbb{R}^d$-valued random variables, and $\{W_k\}$ is a sequence of independent standard $d$-dimensional Gaussian random variables. [...]

[...] w.p.1; $W_k$ is independent of $\mathscr{F}_k$ for all $k$. For $\varepsilon > 0$ let

$$d\pi^\varepsilon(x) = \frac{1}{Z^\varepsilon}\exp\left(-\frac{2U(x)}{\varepsilon^2}\right)dx, \qquad Z^\varepsilon = \int_D \exp\left(-\frac{2U(x)}{\varepsilon^2}\right)dx.$$

(A6) $\pi^\varepsilon$ has a unique weak limit $\pi$ as $\varepsilon \to 0$.

We remark that $\pi$ concentrates on $S^*$, the set of global minima of $U(\cdot)$. The existence of $\pi$ and a simple characterization of it in terms of the Hessian of $U(\cdot)$ are discussed in [14]. We also remark that under the above assumptions it is clear that $x(t)$ always stays in $D$, and it can be shown (see the remark following Proposition 1) that $X_k$ eventually stays in $D$. For a process $u(\cdot)$ and a function $f(\cdot)$, let $E_{t_1,u_1}\{f(u(t))\}$ denote conditional expectation given $u(t_1) = u_1$, and let $E_{t_1,u_1;\,t_2,u_2}\{f(u(t))\}$ denote conditional expectation given $u(t_1) = u_1$ and $u(t_2) = u_2$ (more precisely, these are suitable fixed versions of conditional expectations). Also, for a measure $\mu(\cdot)$ and a function $f(\cdot)$, let $\mu(f) = \int f\,d\mu$.
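To see assumption (A6) at work numerically, the following minimal sketch tabulates $\pi^\varepsilon$ on a grid for a one-dimensional double-well; the potential $U$, the grid, and the neighborhood size are illustrative assumptions, not constructions from the paper. As $\varepsilon \to 0$ the mass piles up near the global minimum:

```python
import numpy as np

# A minimal sketch of (A6): the Gibbs measures pi^eps concentrate on the
# global minima S* as eps -> 0. The tilted double-well U below and the grid
# are illustrative assumptions, not taken from the paper.
def U(x):
    return (x**2 - 1.0)**2 + 0.2 * x   # unique global minimum near x = -1

x = np.linspace(-2.0, 2.0, 4001)
dx = x[1] - x[0]

for eps in (1.0, 0.5, 0.2, 0.1):
    w = np.exp(-2.0 * U(x) / eps**2)   # unnormalized density exp(-2U(x)/eps^2)
    pi_eps = w / (w.sum() * dx)        # normalize by Z^eps (Riemann sum)
    near = np.abs(x + 1.0) < 0.1       # small neighborhood of the global minimum
    print(f"eps = {eps:4.2f}   pi^eps(neighborhood of S*) = {pi_eps[near].sum() * dx:.3f}")
```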

By a modification of the main result of [11] there exist constants $C_0$, $C_1$ such that for $C_0 < C < C_1$ and any continuous function $f(\cdot)$ on $D$

(2.3)

$$\lim_{t\to\infty} E_{0,x}\{f(x(t))\} = \pi(f)$$

uniformly for $x \in D$ (this modification follows easily from Lemma 3 below). The modification is needed here because [11] deals with a nondegenerate diffusion ($\sigma(x) = 1$ for all $x$ in (2.2)) while we are concerned with a degenerate diffusion ($\sigma(x) \to 0$ as $|x| \uparrow r$ in (2.2)). The constant $C_0$ depends only on $U(x)$ for $|x| \le r_0$ and is defined in [11] in terms of the action functional for the dynamical system $\dot z(t) = -\nabla U(z(t))$. The constant $C_1$ depends only on $U(x)$ for $|x| \ge r_0$ and is given by

$$C_1 = \tfrac{1}{2}\Bigl(\inf_{|x|=r_1} U(x) - \sup_{|x|=r_0} U(x)\Bigr).$$
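For intuition, the following sketch evaluates the constant $C_1$ for a concrete one-dimensional potential; the potential and the radii $r_0$, $r_1$ are illustrative assumptions for this sketch:

```python
# Illustrative evaluation of C1 = (1/2)(inf_{|x|=r1} U(x) - sup_{|x|=r0} U(x));
# the potential U and the radii r0, r1 are assumptions for this sketch.
def U(x):
    return (x**2 - 1.0)**2

r0, r1 = 1.0, 2.0
inf_r1 = min(U(-r1), U(r1))  # in one dimension {|x| = r} is just {-r, +r}
sup_r0 = max(U(-r0), U(r0))
C1 = 0.5 * (inf_r1 - sup_r0)
print(C1)  # 0.5 * (9.0 - 0.0) = 4.5 for this potential
```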

In [11] only $C > C_0$ and not $C < C_1$ is required; however, $U(x)$ and $\nabla U(x)$ must then satisfy certain growth conditions as $|x| \to \infty$. Note that a penalty function can be added to $U(\cdot)$ so that $C_1$ is as large as desired. Here is our theorem on the convergence of $\{X_k\}$.

THEOREM. Let $\alpha > -1$, $\beta > 0$, and $C_0 < B/A < C_1$. Then for any continuous function $f(\cdot)$ on $D$

(2.4)

$$\lim_{k\to\infty} E_{0,x}\{f(X_k)\} = \pi(f)$$

uniformly for $x \in D$.

Since $\pi$ concentrates on $S^*$, (2.3) and (2.4) imply $x(t) \to S^*$ as $t \to \infty$ and $X_k \to S^*$ as $k \to \infty$ in probability, respectively.
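As a concrete illustration of the recursion in the theorem, here is a minimal simulation sketch. The objective $U$, the constants $A$ and $B$, the noise model for $\xi_k$, and the radial truncation that keeps the iterate in $D$ are all illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of X_{k+1} = X_k - a_k*(grad U(X_k) + xi_k) + b_k*W_k with
# a_k = A/k and b_k^2 = B/(k log log k). U, A, B, the noise scale, and
# the truncation onto D = {|x| <= r} are illustrative assumptions.
def grad_U(x):
    # gradient of U(x) = (|x|^2 - 1)^2 + 0.2*x_1 (global minimum near (-1, 0))
    return 4.0 * x * (x @ x - 1.0) + np.array([0.2, 0.0])

d, r, A, B = 2, 2.0, 1.0, 1.0
X = np.array([1.5, 0.5])

for k in range(2, 100000):
    a_k = A / k
    b_k = np.sqrt(B / (k * np.log(np.log(k + 2))))  # k+2 keeps log log positive
    xi = 0.1 * rng.standard_normal(d)               # noisy-gradient error xi_k
    W = rng.standard_normal(d)                      # standard Gaussian W_k
    X = X - a_k * (grad_U(X) + xi) + b_k * W
    if (n := np.linalg.norm(X)) > r:                # keep the iterate in D
        X *= r / n

print("final iterate:", X)  # typically close to the global minimizer
```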

The proof of the theorem requires the following three lemmas. Let $\{t_k\}$ and $\beta(\cdot)$ be defined by

$$t_k = \sum_{n=0}^{k-1} a_n, \qquad k = 0, 1, \ldots,$$

$$\int_s^{\beta(s)} \frac{\log s}{\log u}\,du = s^{2/3}, \qquad s > 1.$$

It is easy to check that $\beta(s)$ is well defined by this expression and in fact satisfies $s + s^{2/3} \le \beta(s) \le s + 2s^{2/3}$.

LEMMA 1. Let $\alpha > -1$, $\beta > 0$, and $B/A = C$. Then there exists $\gamma > 1$ such that for any continuous function $f(\cdot)$ on $D$

$$\lim_{n\to\infty}\ \sup_{k\,:\,t_n \le t_k \le t_n^\gamma} \bigl(E_{0,x;\,n,y}\{f(X_k)\} - E_{t_n,y}\{f(x(t_k))\}\bigr) = 0$$

uniformly for $x, y \in D$.

[...] $> 0$ for $k$ large enough. Hence $X_k \in D$ for $k$ large implies $X_k - a_k\nabla U(X_k) \in D$ for $k$ large. Hence for $k$ large enough $P\{X_{k+1} \notin D,\ |W_k| \le \ldots\}$ [...]

PROPOSITION 2.

For each $n$ let $\{u_{n,k}\}_{k\ge n}$ be a sequence of nonnegative numbers such that

$$u_{n,k+1} \le (1 + M a_k)\,u_{n,k} + \varepsilon a_k^\delta, \qquad k \ge n, \qquad u_{n,n} = 0,$$

where $\delta > 1$, $\varepsilon > 0$, and $M > 0$. Then there exists $\gamma > 1$ such that

$$\lim_{n\to\infty}\ \sup_{k\,:\,t_n \le t_k \le t_n^\gamma} u_{n,k} = 0.$$
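The mechanism of the proposition can be checked numerically. In the sketch below the constants $A$, $M$, $\varepsilon$, $\delta$, and the particular $\gamma$ are illustrative assumptions, and the recursion is run with equality in place of the inequality:

```python
# Numerical check of Proposition 2 (illustrative constants): iterate
# u_{n,k+1} = (1 + M*a_k)*u_{n,k} + eps*a_k**delta with u_{n,n} = 0 and
# a_k = A/k, and take the sup over the window t_n <= t_k <= t_n**gamma.
A, M, eps, delta, gamma = 1.0, 1.0, 1.0, 2.0, 1.1

def sup_u(n):
    a = lambda k: A / k
    t_n = sum(a(j) for j in range(1, n))   # t_n, summing the gains below n
    u, worst, k, t_k = 0.0, 0.0, n, t_n
    while t_k <= t_n ** gamma:
        u = (1.0 + M * a(k)) * u + eps * a(k) ** delta
        worst = max(worst, u)
        t_k += a(k)
        k += 1
    return worst

for n in (10, 100, 1000):
    print(n, round(sup_u(n), 4))  # the sup over the window shrinks as n grows
```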


PROOF. We may set $M = 1$, since $a_k = A/k$ for $k$ large and the proof is for arbitrary $A > 0$. Now [...]

[...] $+\ 2a_k^2\,\bigl|E\{\langle \nabla U(Y_k + \Delta_k) - \nabla U(Y_k),\ \xi_k\rangle\}\bigr| + 2a_k^{3/2}\,\bigl|E\{\langle\cdots\rangle\}\bigr|, \qquad k \ge n.$

Let $K_1$, $K_2$ be Lipschitz constants for $\nabla U(\cdot)$ and $\sigma(\cdot)$, respectively. Using the fact that $X_k$, $Y_k$, and hence $\Delta_k$ are $\mathscr{F}_k$-measurable, $W_k$ is independent of $\mathscr{F}_k$, $E\{W_k\} = 0$, and

$$E\{|\xi_k|^2 \mid \mathscr{F}_k\} \le c_3 a_k^\alpha, \qquad |E\{\xi_k \mid \mathscr{F}_k\}| \le c_3 a_k^\beta \qquad \text{w.p.1},$$

we have

$$E\{|\nabla U(Y_k + \Delta_k) - \nabla U(Y_k)|^2\} \le K_1^2\,E\{|\Delta_k|^2\},$$

$$E\{|(\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k|^2\} \le K_2^2\,E\{|\Delta_k|^2\}, \qquad k \ge n.$$

Substituting these expressions into (3.9) gives (after some simplification)

(3.10)

$$E\{|\Delta_{k+1}|^2\,\mathbf{1}_{\{X_{k+1}\in D\}\cap\{Y_{k+1}\in D\}}\} \le (1 + c_4 a_k)E\{|\Delta_k|^2\} + c_4 a_k^{\delta_1} E\{|\Delta_k|\} + c_3 a_k^{2+\alpha}$$
$$\le (1 + c_4 a_k)E\{|\Delta_k|^2\} + c_4 a_k^{\delta_1} E\{|\Delta_k|^2\}^{1/2} + c_3 a_k^{2+\alpha}$$
$$\le (1 + c_5 a_k)E\{|\Delta_k|^2\} + c_5 a_k^{\delta_2}, \qquad k \ge n,$$

where $\delta_1 = \min\{1+\beta,\ (3+\alpha)/2\} > 1$ and $\delta_2 = \min\{\delta_1,\ 2+\alpha\} > 1$, since $\alpha > -1$ and $\beta > 0$. Now combine (3.7), (3.8), and (3.10) to get

$$E\{|\Delta_{k+1}|^2\} \le (1 + c_6 a_k)E\{|\Delta_k|^2\} + c_6 a_k^{\delta_2}, \qquad k \ge n,$$

and $E\{|\Delta_n|^2\} = 0$ for $n$ large enough. Applying Proposition 2, there exists $\gamma > 1$ such that

(3.11)

$$\lim_{n\to\infty}\ \sup_{k\,:\,t_n \le t_k \le t_n^\gamma} E\{|\Delta_k|^2\} = 0.$$

Let $\varepsilon > 0$. Since $f(\cdot)$ is uniformly continuous on $D$, there exists $\delta > 0$ such that $|f(u) - f(v)| < \varepsilon$ whenever $|u - v| < \delta$ and $u, v \in D$. Hence

$$|E\{f(X_k)\} - E\{f(Y_k)\}| \le \varepsilon P\{|\Delta_k| < \delta\} + 2\|f\|\,P\{|\Delta_k| \ge \delta\} \le \varepsilon + \frac{2\|f\|}{\delta^2}\,E\{|\Delta_k|^2\},$$

and by (3.11)

$$\varlimsup_{n\to\infty}\ \sup_{k\,:\,t_n \le t_k \le t_n^\gamma} |E\{f(X_k)\} - E\{f(Y_k)\}| \le \varepsilon,$$

[...]

$$P_{s,y}\Bigl\{\sup_{s\le t\le s+T}\Bigl|\int_s^t c(u)\sigma(x(u))\,dw(u)\Bigr| > b\Bigr\} \le \frac{1}{b^2}\,E_{s,y}\Bigl\{\Bigl|\int_s^{s+T} c(u)\sigma(x(u))\,dw(u)\Bigr|^2\Bigr\} = \frac{1}{b^2}\int_s^{s+T} E_{s,y}\{|c(u)\sigma(x(u))|^2\}\,du \le \frac{c^2(s)KT}{b^2}.$$

[...] $\square$


LEMMA 3.2. Let $C_0 < C < C_1$. Then there exists $\delta > 0$ such that

$$\lim_{s\to\infty} P_{s,y}\{\tau > \beta(s)\} = 1$$

uniformly for $|y| \le r_0 + \delta$.

PROOF. Let $\bar U(\cdot)$ be a twice continuously differentiable function from $\mathbb{R}^d$ to $\mathbb{R}$ such that for some $R > r$ and $K > 0$

$$\bar U(x) = \begin{cases} U(x), & |x| \le r, \\ K|x|^2, & |x| > R, \end{cases}$$

and $\nabla\bar U(x) \ne 0$ for $r < |x| < R$ (in view of (A1) such a $\bar U(\cdot)$ exists). For $\varepsilon > 0$ let

$$d\tilde x^\varepsilon(t) = -\nabla\bar U(\tilde x^\varepsilon(t))\,dt + \varepsilon\,dw(t)$$

and $\tilde\tau^\varepsilon = \inf\{t : |\tilde x^\varepsilon(t)| > r_1\}$. For $0 < \delta < r_1 - r_0$ let

$$C_2(\delta) = \inf_{|x|=r_1} \bar U(x) - \sup_{|x|=r_0+\delta} U(x).$$

On p. 750 of [11] it is shown that for any $\eta > 0$ and $\delta > 0$

$$P_{0,y}\bigl\{\tilde\tau^\varepsilon > \exp\bigl(\tfrac{1}{\varepsilon^2}(C_2(\delta) - \eta)\bigr)\bigr\} \to 1 \qquad \text{as} \quad \varepsilon \to 0,$$

uniformly for $|y| \le r_0 + \delta$. Since $C_2(\delta) \to 2C_1$ as $\delta \to 0$, it follows that for any $\eta > 0$ there exists $\delta > 0$ such that

(4.3)

$$P_{0,y}\bigl\{\tilde\tau^\varepsilon > \exp\bigl(\tfrac{1}{\varepsilon^2}(2C_1 - \eta)\bigr)\bigr\} \to 1 \qquad \text{as} \quad \varepsilon \to 0,$$

uniformly for $|y| \le r_0 + \delta$. Next let

$$d\tilde x(t) = -\nabla\bar U(\tilde x(t))\,dt + c(t)\,dw(t)$$

and $\tilde\tau = \inf\{t : |\tilde x(t)| > r_1\}$.


On p. 745 of [11] it is shown that

(4.4)

$$P_{s,y}\{\tilde\tau > \beta(s)\} - P_{0,y}\{\tilde\tau^{c(s)} > s^{2/3}\} \to 0 \qquad \text{as} \quad s \to \infty,$$

uniformly for $|y| \le r$. Now choose $\eta > 0$ such that $\tfrac{1}{C}(2C_1 - \eta) \ge \tfrac{2}{3}$, and choose $\delta > 0$ such that (4.3) is satisfied. Hence, using (4.3) and (4.4),

$$P_{s,y}\{\tau > \beta(s)\} = P_{s,y}\{\tilde\tau > \beta(s)\} = P_{0,y}\{\tilde\tau^{c(s)} > s^{2/3}\} + \bigl(P_{s,y}\{\tilde\tau > \beta(s)\} - P_{0,y}\{\tilde\tau^{c(s)} > s^{2/3}\}\bigr)$$
$$\ge P_{0,y}\Bigl\{\tilde\tau^{c(s)} > \exp\Bigl(\frac{1}{c^2(s)}(2C_1 - \eta)\Bigr)\Bigr\} + o(1) \to 1 \qquad \text{as} \quad s \to \infty,$$

uniformly for $|y| \le r_0 + \delta$. $\square$
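The exponential exit-time scaling that drives (4.3) can be observed in a small Monte Carlo experiment. The one-dimensional quadratic well, the Euler-Maruyama discretization, and the noise levels below are illustrative assumptions, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(1)

# Euler-Maruyama estimate of the mean exit time of dx = -U'(x) dt + eps dw
# from {|x| <= r1} with U(x) = x^2; all constants are illustrative. The
# Freidlin-Wentzell prediction is growth like exp(2*(U(r1) - U(0))/eps^2).
def exit_time(eps, r1=1.0, dt=1e-3, t_max=1e5):
    x, t = 0.0, 0.0
    while abs(x) <= r1 and t < t_max:
        x += -2.0 * x * dt + eps * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return t

for eps in (1.0, 0.8, 0.7):
    times = [exit_time(eps) for _ in range(10)]
    print(f"eps = {eps}: mean exit time ~ {np.mean(times):.1f}")
```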

PROOF OF LEMMA 3. Let $\tilde x(\cdot)$ be defined as in the proof of Lemma 3.2. In Lemmas 1-3 of [11] it is shown that

(4.5)

$$E_{s,y}\{f(\tilde x(\beta(s)))\} - \pi^{c(s)}(f) \to 0 \qquad \text{as} \quad s \to \infty,$$

uniformly for $|y| \le r$. By Lemma 3.2 there exists $\delta > 0$ such that

(4.6)

$$|E_{s,y}\{f(x(\beta(s)))\} - E_{s,y}\{f(\tilde x(\beta(s)))\}| \le |E_{s,y}\{f(x(\beta(s))) - f(\tilde x(\beta(s))),\ \tau > \beta(s)\}| + 2\|f\|\,P_{s,y}\{\tau \le \beta(s)\} \to 0$$

as $s \to \infty$, uniformly for $|y| \le r_0 + \delta$. Hence, combining (4.5) and (4.6) and using Lemma 3.1, there exists $T > 0$ such that

$$|E_{s,y}\{f(x(\beta(s+T)))\} - \pi^{c(s+T)}(f)| = |E_{s,y}\{E_{s+T,\,x(s+T)}\{f(x(\beta(s+T)))\} - \pi^{c(s+T)}(f)\}|$$
$$\le |E_{s,y}\{\bigl(E_{s+T,\,x(s+T)}\{f(x(\beta(s+T)))\} - \pi^{c(s+T)}(f)\bigr)\,\mathbf{1}(|x(s+T)| \le r_0 + \delta)\}| + 2\|f\|\,P_{s,y}\{|x(s+T)| > r_0 + \delta\} \to 0$$

as $s \to \infty$, uniformly for $|y| \le r$. $\square$


References

[1] Wasan, M. T., Stochastic Approximation, Cambridge University Press, Cambridge, 1969.
[2] Rubinstein, R. Y., Simulation and the Monte Carlo Method, Wiley, New York, 1981.
[3] Ljung, L., Analysis of recursive stochastic algorithms, IEEE Transactions on Automatic Control, Vol. 22, pp. 551-575, 1977.
[4] Kushner, H. J., and Clark, D. S., Stochastic Approximation Methods for Constrained and Unconstrained Systems, Applied Mathematical Sciences, Vol. 26, Springer-Verlag, Berlin, 1978.
[5] Metivier, M., and Priouret, P., Applications of a Kushner and Clark lemma to general classes of stochastic algorithms, IEEE Transactions on Information Theory, Vol. 30, pp. 140-151, 1984.
[6] Grenander, U., Tutorial in Pattern Theory, Division of Applied Mathematics, Brown University, Providence, RI, 1984.
[7] Geman, S., and Hwang, C.-R., Diffusions for global optimization, SIAM Journal on Control and Optimization, Vol. 24, pp. 1031-1043, 1986.
[8] Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P., Optimization by simulated annealing, Science, Vol. 220, pp. 671-680, 1983.
[9] Gelfand, S. B., Analysis of Simulated Annealing Type Algorithms, Ph.D. Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 1987.
[10] Gidas, B., Global optimization via the Langevin equation, Proceedings of the IEEE Conference on Decision and Control, Fort Lauderdale, FL, pp. 774-778, 1985.
[11] Chiang, T.-S., Hwang, C.-R., and Sheu, S.-J., Diffusion for global optimization in $\mathbb{R}^n$, SIAM Journal on Control and Optimization, Vol. 25, pp. 737-752, 1987.
[12] Kushner, H. J., Asymptotic global behavior for stochastic approximation and diffusions with slowly decreasing noise effects: global minimization via Monte Carlo, SIAM Journal on Applied Mathematics, Vol. 47, pp. 169-185, 1987.
[13] Aluffi-Pentini, F., Parisi, V., and Zirilli, F., Global optimization and stochastic differential equations, Journal of Optimization Theory and Applications, Vol. 47, pp. 1-16, 1985.
[14] Hwang, C.-R., Laplace's method revisited: weak convergence of probability measures, Annals of Probability, Vol. 8, pp. 1177-1182, 1980.
[15] Gikhman, I. I., and Skorohod, A. V., Stochastic Differential Equations, Springer-Verlag, Berlin, 1972.