Mathematical Programming 81 (1998) 23-35
© 1998 The Mathematical Programming Society, Inc. Published by Elsevier Science B.V.
ON THE PROJECTED SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION IN A HILBERT SPACE

Ya.I. Alber*, A.N. Iusem† and M.V. Solodov‡

Instituto de Matemática Pura e Aplicada, Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro, RJ, CEP 22460-320, Brazil
Abstract

We consider the method for constrained convex optimization in a Hilbert space, consisting of a step in the direction opposite to an $\varepsilon_k$-subgradient of the objective at a current iterate, followed by an orthogonal projection onto the feasible set. The normalized stepsizes $\alpha_k$ are exogenously given, satisfying $\sum_{k=0}^{\infty} \alpha_k = \infty$, $\sum_{k=0}^{\infty} \alpha_k^2 < \infty$, and $\varepsilon_k$ is chosen so that $\varepsilon_k \le \delta\alpha_k$ for some $\delta > 0$. We prove that the sequence generated in this way is weakly convergent to a minimizer if the problem has solutions, and is unbounded otherwise. Among the features of our convergence analysis, we mention that it covers the nonsmooth case, in the sense that we make no assumption of differentiability of $f$, much less of Lipschitz continuity of its gradient. Also, we prove weak convergence of the whole sequence, rather than just boundedness of the sequence and optimality of its weak accumulation points, thus improving over all previously known convergence results. We also present convergence rate results.
1991 AMS Classification numbers: 90C25, 90C30.

Keywords: Convex optimization, nonsmooth optimization, projected gradient method, steepest descent method, weak convergence, convergence rate.
* Permanent address: Department of Mathematics, The Technion – Israel Institute of Technology, 32000 Haifa, Israel.
† Research of this author was partially supported by CNPq grant no. 301280/86.
‡ Research of this author was partially supported by CNPq grant no. 300734/95-6.
1. Introduction

We consider in this paper an extension of the projected subgradient method for convex optimization in a Hilbert space $H$. Let $C$ be a closed and convex subset of $H$ and $f: H \to \mathbb{R}$ a convex and continuous function. The problem under consideration is

$$\min f(x) \qquad (1)$$

$$\text{s.t. } x \in C. \qquad (2)$$
The projected subgradient method consists of generating a sequence $\{x^k\}$ by taking from $x^k$ a step in the direction opposite to a subgradient of $f$ at $x^k$ and then projecting the resulting vector orthogonally onto $C$. When $C = H$ and $f$ is differentiable this is just the steepest descent method. Different variants of the method arise according to the rule used to choose the stepsizes. Frequently, these are chosen so as to ensure functional decrease at each iteration, e.g. through either exact one-dimensional minimization or an Armijo-type search. The first option cannot be implemented in actual computation and the second one works only when $f$ is smooth. In the nonsmooth case the only reasonable alternative seems to be exogenously given stepsizes. In this paper we use stepsizes $\alpha_k$ satisfying $\sum_{k=0}^{\infty} \alpha_k = \infty$, $\sum_{k=0}^{\infty} \alpha_k^2 < \infty$. This selection rule has been considered several times in the literature (e.g. [7], [18]). We also generalize the projected subgradient method by allowing inexact computation of the subgradient: the $k$-th direction need not be a subgradient of $f$ at $x^k$ but rather an $\varepsilon_k$-subgradient, where $\{\varepsilon_k\}$ is a nonincreasing sequence of nonnegative numbers satisfying $\varepsilon_k \le \delta\alpha_k$ for some $\delta > 0$. We remark that these two features (exogenously given stepsizes and inexact subgradients) have as a consequence that the sequence of functional values need not be decreasing, which provokes considerable complications in the convergence analysis. Nevertheless, we establish that the sequence $\{x^k\}$ is always a "minimizing" one (in the sense that $\liminf_{k\to\infty} f(x^k) = \inf_{x\in C} f(x)$), that it is weakly convergent to a solution of (1)–(2) when this problem has solutions, and that it is unbounded otherwise. We emphasize three features of our convergence analysis:
1) We make no differentiability assumptions on $f$, much less Lipschitz continuity of its gradient: convexity and continuity of $f$ are enough. We also need no boundedness assumption either on $C$ or on the level sets of $f$; in fact the solution set might even be unbounded.
2) We prove weak convergence of the whole sequence to a solution (provided that a solution exists), rather than just boundedness of the sequence and optimality of all its weak accumulation points.
3) All our results hold in a Hilbert space (of course, in the finite dimensional case we get strong, rather than weak, convergence).
In the following section, after a formal statement of the algorithm, we compare our result with other related results in the literature, particularly in connection with the features mentioned above.
2. Statement of the algorithm and discussion of related results
Let $H$ be a Hilbert space, $C$ a closed and convex subset of $H$, and $f: H \to \mathbb{R}$ a convex and continuous function. We assume that $f$ is finite valued, so that its effective domain is $H$. We recall that for $\varepsilon \ge 0$ the $\varepsilon$-subdifferential of $f$ at $x$ is the set $\partial_\varepsilon f(x)$ defined by

$$\partial_\varepsilon f(x) = \{u \in H : f(y) - f(x) \ge \langle u, y - x\rangle - \varepsilon \ \text{for all } y \in H\}. \qquad (3)$$
Since $f$ is convex and continuous, and its effective domain is $H$, $\partial_\varepsilon f(x)$ is nonempty for all $\varepsilon \ge 0$ and all $x \in H$ [10, Lemma, p. 174 and Theorem 9, p. 112]. We also mention that a sufficient condition for continuity of a convex function $f$ at any $x \in H$ is boundedness of $f$ on some neighborhood of some $x \in H$ [10, Theorem 8, p. 110]. We need the following boundedness assumption on $\partial_\varepsilon f$.

(A) $\partial_\varepsilon f$ is bounded on bounded sets, i.e. $\bigcup_{x\in B} \partial_\varepsilon f(x)$ is bounded for any bounded subset $B$ of $H$.

In connection with (A) we mention that $\partial_\varepsilon f$ is always locally bounded (i.e. for any $\bar{x} \in H$ there exists a neighborhood $V$ of $\bar{x}$ such that $\bigcup_{x\in V} \partial_\varepsilon f(x)$ is bounded). This follows from local boundedness of $\partial f$ [19, Theorem 1] and the fact that for all bounded $B$, $\mathrm{Diam}\bigl(\bigcup_{x\in B} \partial_\varepsilon f(x)\bigr) \le \mathrm{Diam}\bigl(\bigcup_{x\in B} \partial f(x)\bigr) + \varepsilon/\mathrm{Diam}(B)$, where $\mathrm{Diam}(B) = \sup\{\|x - y\| : x, y \in B\}$ [5, Lemma 1]. In finite dimension this result implies, through an easy compactness argument, that (A) always holds, but this is not the case in a Hilbert space, as the following example shows: let $H = \ell_2$ and $f(x) = \sum_{n=1}^{\infty} (2n)^{-1}(x_n)^{2n}$. It is easy to check that $f$ is well defined, convex and differentiable, with $\nabla f(x)_n = (x_n)^{2n-1}$. Take now $e^j \in \ell_2$ defined as $e^j_n = 2\delta_{jn}$ ($\delta_{jn}$ being Kronecker's delta) and observe that $\|e^j\| = 2$ for all $j$ while $\|\nabla f(e^j)\| = 2^{2j-1}$, i.e. $\nabla f$ is unbounded in the ball with center at $0$ and radius $2$. A sufficient (indeed also necessary) condition for (A) to hold is that $|f|$ is bounded on bounded sets: in order to prove that $\bigcup_{x\in B} \partial_\varepsilon f(x)$ is bounded, take a nonzero $u \in \partial_\varepsilon f(x)$, let $x' = x + u/\|u\|$ and get, by definition of $\partial_\varepsilon f$, $\|u\| = \langle u, x' - x\rangle \le f(x') - f(x) + \varepsilon \le |f(x')| + |f(x)| + \varepsilon$. Let $B' = \{x \in H : \|y - x\| \le 1 \text{ for some } y \in B\}$. Then $B'$ is bounded and $x' \in B'$, so that we get a bound on $\|u\|$ in terms of $\varepsilon$ and the bounds of $|f|$ on $B$ and $B'$. We also recall that the subdifferential $\partial f(x)$ of $f$ coincides with $\partial_0 f(x)$, i.e. the right hand side of (3) with $\varepsilon = 0$.
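The blow-up in this example is easy to tabulate; the following throwaway check (ours, not part of the paper) just evaluates the closed-form quantities derived above.

```python
# Check of the l2 counterexample: ||e^j|| = 2 for every j, while the only
# nonzero gradient coordinate is grad f(e^j)_j = 2**(2j-1), so the gradient
# norm is unbounded on the ball of radius 2.
for j in range(1, 8):
    grad_norm = 2.0 ** (2 * j - 1)    # ||grad f(e^j)|| = 2**(2j-1)
    print(f"j={j}: ||e^j||=2, ||grad f(e^j)||={grad_norm}")
```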
Take a sequence $\{\alpha_k\}$ of nonnegative real numbers satisfying

$$\sum_{k=0}^{\infty} \alpha_k = \infty, \qquad (4)$$

$$\sum_{k=0}^{\infty} \alpha_k^2 < \infty, \qquad (5)$$
and a nonincreasing sequence of nonnegative real numbers $\{\varepsilon_k\}$ such that there exists $\delta > 0$ satisfying

$$\varepsilon_k \le \delta\alpha_k \qquad (6)$$

for all $k$. Let $P_C: H \to C$ be the orthogonal projection onto $C$. The algorithm is defined as follows.
Initialization.

$$x^0 \in H. \qquad (7)$$

Iterative step. Given $x^k$, if $0 \in \partial f(x^k)$ then stop. Otherwise, take $u^k \in \partial_{\varepsilon_k} f(x^k)$, $u^k \ne 0$, let $\beta_k = \max\{1, \|u^k\|\}$ and define

$$x^{k+1} = P_C\!\left(x^k - \frac{\alpha_k}{\beta_k}\, u^k\right) \qquad (8)$$
with $\alpha_k$, $\varepsilon_k$ satisfying (4)–(6).

We now make some remarks on Algorithm (7)–(8); a numerical sketch of the method is given at the end of this section. First, note that $\alpha_k = \varepsilon_k = (k+1)^{-1}$ satisfies (4)–(6). Secondly, in connection with the stopping criterion, it is usually assumed, in the nonsmooth case, that an "oracle" is available which can provide an $\varepsilon$-subgradient of $f$ at any $x \in H$. Our stopping criterion requires a little bit more: besides the oracle, we assume that we have a procedure which decides whether a given vector (the null vector in our case) is or is not a subgradient of $f$ at $x$. This looks reasonable, since checking the subgradient inequality for a given vector should be easier than finding a vector which satisfies it, but if such a procedure is not available, then the iterative step should be rewritten as: "Take $u^k \in \partial_{\varepsilon_k} f(x^k)$; if $u^k = 0$ stop, otherwise let $\beta_k = \dots$". In this case two consequences follow. First, in the stopping case we can only ensure that $x^k$ is an $\varepsilon_k$-solution (meaning that $f(x^k) \le f(\bar{x}) + \varepsilon_k$, where $\bar{x}$ is a solution of (1)–(2)). Secondly, the sequence can hit an exact solution at iteration $k$ and nevertheless continue, converging eventually to the same solution or to another one.

We discuss next convergence results on algorithms related to (7)–(8). First we mention that, assuming differentiability of $f$, finite dimension of $H$ and Lipschitz continuity of $\nabla f$ with constant $L$, it is rather straightforward to prove that $\{x^k\}$ is bounded and all its accumulation points are solutions of (1)–(2), when the problem has solutions. In this case (5) can be relaxed to $\alpha_k < 2L^{-1}$ (see e.g. [18] for the finite dimensional case, i.e. $H = \mathbb{R}^n$). The unrestricted case (i.e. $C = H$) in a Hilbert space with nonsmooth $f$ and exact subgradients (i.e. $x^{k+1} = x^k - \alpha_k u^k/\|u^k\|$, with $u^k \in \partial f(x^k)$) is considered in [17], where it is proved mainly that the sequence is minimizing (i.e. $\liminf_{k\to\infty} f(x^k) = \inf_{x\in H} f(x)$), but no results are given on convergence of $\{x^k\}$. In this work (5) is relaxed to $\lim_{k\to\infty} \alpha_k = 0$. The constrained case with exact subgradients and the same rule for $\alpha_k$ (i.e. with $\lim_{k\to\infty} \alpha_k = 0$ instead of (5)) is studied in [8], but only in the case of finite dimensional $H$. In addition, boundedness of $C$ or of the level sets of $f$ is assumed. The result is the same as in [17], namely that $\{x^k\}$ is a minimizing sequence. In [1] the unrestricted case is considered in a Hilbert space with $f$ differentiable and $\nabla f$ Lipschitz or Hölder continuous. $\alpha_k$ is given by explicit formulae in terms of the Lipschitz or Hölder constants. Under these hypotheses it is proved that if the problem has solutions then
$\{x^k\}$ is bounded and all its weak accumulation points are optimal. For finite dimensional $H$, convergence of the whole sequence $\{x^k\}$ to a solution is established. The algorithm uses exact gradients (i.e. $u^k = \nabla f(x^k)$). This work, as well as [2], also presents convergence rate results, which are related to our results in Section 3. The constrained case with nonsmooth $f$, and $\alpha_k$ satisfying (4) and $\lim_{k\to\infty} \alpha_k = 0$ (instead of (5)), is studied in [2]. The iterative step is given by a formula similar to (8) with $u^k \in \partial f(x^k)$, but an error term is allowed, as in [20], which we discuss below. It is assumed that $\partial f$ is uniformly monotone, i.e. $\langle u - v, x - y\rangle \ge \varphi(\|x - y\|)$ for all $x, y \in H$, $u \in \partial f(x)$, $v \in \partial f(y)$, where $\varphi: \mathbb{R}_+ \to \mathbb{R}_+$ satisfies $\varphi(0) = 0$ and some additional regularity conditions. In fact, the algorithm is discussed in the context of variational inequalities and a general operator $T$ is used instead of $\partial f$. We mention that uniform monotonicity of $\partial f$ does not imply differentiability of $f$, but it does imply uniqueness of the solution. Under this rather strong assumption on $\partial f$ it is possible to prove strong convergence of $\{x^k\}$ to the unique solution of the problem.

The unconstrained case with exact subgradients and our rule for $\alpha_k$ (i.e. (4)–(5)) is studied in [7]. The iteration is of the form $x^{k+1} = x^k - \alpha_k u^k/\|u^k\|$ with $u^k \in \partial f(x^k)$. In this work convergence of the whole sequence to a solution is proved (provided that the problem has solutions) without further assumptions on $f$, like those used in [1] and [2], but the result is obtained only in the finite dimensional case, previously considered in [11] and [16]. In infinite dimension, it is only proved in [7] that $\{x^k\}$ is a minimizing sequence, as in [8] and [17]. The case of inexact subgradients is considered in [2] and [20]. The iteration in [2] and [20] is of the form $x^{k+1} = P_C(x^k - \alpha_k(u^k + v^k))$ with $u^k \in \partial f(x^k)$. In [2] it is assumed that $\lim_{k\to\infty} v^k = 0$. In [20], on the other hand, the hypothesis is that $\|v^k\| \le \bar{\varepsilon}$ for some fixed $\bar{\varepsilon} > 0$, i.e. an error of magnitude less than $\bar{\varepsilon}$ is allowed in the computation of the subgradient. In [20] $f$ is not assumed to be convex, just locally Lipschitzian, but in the convex case this algorithm is virtually identical to (7)–(8), since $\partial_\varepsilon f(x)$ contains and is contained in the image through $\partial f$ of appropriate balls around $x$. [20] characterizes the set of attractors of the sequence $\{x^k\}$, which is shown to consist of approximate solutions of the problem.

Finally, we mention briefly some similar results for the unconstrained and smooth case with an Armijo-type rule for the $\alpha_k$'s. It is rather straightforward to prove boundedness of the sequence and optimality of the accumulation points assuming boundedness of the level sets of $f$ and Lipschitz continuity of its gradient, and this result can be found in several textbooks (e.g. [15] and [18]). Without such assumptions, i.e. assuming just convexity and continuous differentiability of $f$, together with existence of solutions, convergence of the whole sequence to one solution has been established in [4], [6] and [13] for finite dimensional spaces, and in [21] for Hilbert spaces. Our analysis in this paper uses several results which appear in [6] and [21], particularly the notion of quasi-Fejér convergence. Some of the results presented here have been further extended by the authors in the subsequent paper [3], where convergence analysis for subgradient-type methods is developed for uniformly smooth and uniformly convex Banach spaces.
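To make the method concrete, here is a minimal numerical sketch of Algorithm (7)–(8), run in $\mathbb{R}^3$ (where weak and strong convergence coincide) on a toy problem chosen for this illustration and not taken from the paper: minimize $f(x) = \|x - c\|_1$ over the Euclidean ball $C = \{x : \|x\|_2 \le r\}$. For simplicity the sketch uses exact subgradients; any exact subgradient is in particular an $\varepsilon_k$-subgradient, so (6) holds trivially.

```python
import numpy as np

# A minimal sketch of Algorithm (7)-(8); the problem data below are
# illustrative assumptions, not taken from the paper:
# minimize f(x) = ||x - c||_1 subject to ||x||_2 <= r.
rng = np.random.default_rng(0)
c = np.array([2.0, -1.0, 0.5])        # unconstrained minimizer of f
r = 1.0                               # radius of the feasible ball C

f = lambda x: np.abs(x - c).sum()

def subgrad(x):
    # An exact subgradient of the l1 distance (0 is a valid choice at
    # kinks); exact subgradients belong to every eps-subdifferential.
    return np.sign(x - c)

def project(x):
    # Orthogonal projection P_C onto the ball of radius r
    n = np.linalg.norm(x)
    return x if n <= r else (r / n) * x

x = rng.normal(size=3)                # x^0 in H, cf. (7)
for k in range(200000):
    u = subgrad(x)
    if np.linalg.norm(u) == 0.0:      # 0 in the subdifferential: stop
        break
    alpha = 1.0 / (k + 1)             # stepsizes satisfying (4)-(5)
    beta = max(1.0, np.linalg.norm(u))
    x = project(x - (alpha / beta) * u)     # iteration (8)

print(x, f(x))  # x settles on the boundary of C, f(x) near its minimum
```

Because the stepsizes are exogenous, $f(x^k)$ is not monotone along such a run; this is exactly the difficulty that the convergence analysis of the next section has to address.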
3. Convergence analysis
We first need two preliminary results unrelated to the algorithm. The first one is needed mainly to ensure uniqueness of the weak accumulation point of $\{x^k\}$. The second one, on series of nonnegative real numbers, is related to conditions (4), (5) on the $\alpha_k$'s. Loosely speaking, condition (5) ensures that the stepsizes are small enough to guarantee boundedness of $\{x^k\}$, while (4) ensures that they are not too small, in which case $\{x^k\}$ could get stuck midway to the solution set, i.e. converge to a point which is not a solution. Our second preliminary result will be used together with (5) to establish that $\{x^k\}$ is a minimizing sequence, i.e. that $\liminf_{k\to\infty} f(x^k) = \inf_{x\in C} f(x)$. In order to state our first result we need a definition.

Definition 1. Let $H$ be a Hilbert space and $V$ a nonempty subset of $H$. A sequence $\{x^k\} \subset H$ is said to be quasi-Fejér convergent to $V$ if and only if for all $x \in V$ there exist $\tilde{k} \ge 0$ and a sequence $\{\nu_k\} \subset \mathbb{R}_+$ such that $\sum_{k=0}^{\infty} \nu_k < \infty$ and $\|x^{k+1} - x\|^2 \le \|x^k - x\|^2 + \nu_k$ for all $k \ge \tilde{k}$. This definition originates in [9] and has been further elaborated in [12].
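As a toy illustration of Definition 1 (ours, not part of the paper's argument; the recursion and constants are arbitrary choices), the sketch below produces a sequence whose squared distance to $V = \{0\}$ occasionally increases, yet the increases are summable, so the sequence is quasi-Fejér convergent to $V$.

```python
# Quasi-Fejer toy: x^{k+1} = 0.9 x^k + (-1)^k/(k+1)^2. The contraction
# drives x^k to 0, but the summable noise lets ||x^k - 0||^2 increase
# at some steps (e.g. at k = 0), which Definition 1 permits.
x = 5.0
d = [x * x]
for k in range(500):
    x = 0.9 * x + (-1) ** k / (k + 1) ** 2
    d.append(x * x)
print(d[1] > d[0], d[-1])   # True (an increase occurred), then decay to ~0
```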
Proposition 1. If $\{x^k\}$ is quasi-Fejér convergent to $V$ then
i) $\{x^k\}$ is bounded,
ii) $\{\|x^k - x\|^2\}$ converges for all $x \in V$,
iii) if all weak accumulation points of $\{x^k\}$ belong to $V$ then $\{x^k\}$ is weakly convergent, i.e. it has a unique weak accumulation point.
Proof: i) Using Definition 1 recursively, for $k > \tilde{k}$,

$$\|x^k - x\|^2 \le \|x^{\tilde{k}} - x\|^2 + \sum_{j=\tilde{k}}^{k-1} \nu_j \le \|x^{\tilde{k}} - x\|^2 + \sum_{j=0}^{\infty} \nu_j.$$

So the tail of the sequence, i.e. $\{x^k\}_{k > \tilde{k}}$, is contained in a certain ball centered at $x$, and the result follows.
ii) The sequence $\{\|x^k - x\|^2\}$ is bounded by (i). Assume that it has two accumulation points, say $\lambda$ and $\xi$. Take subsequences $\{x^{j_k}\}$ and $\{x^{\ell_k}\}$ of $\{x^k\}$ such that $\lim_{k\to\infty} \|x^{j_k} - x\|^2 = \lambda$ and $\lim_{k\to\infty} \|x^{\ell_k} - x\|^2 = \xi$. Fix $\varepsilon > 0$. Take $\hat{k}$ such that $\ell_{\hat{k}} > \tilde{k}$ and $\|x^{\ell_k} - x\|^2 \le \xi + \varepsilon/2$ for all $k \ge \hat{k}$. Take $\bar{k} \ge \hat{k}$ such that $\sum_{i=\ell_{\bar{k}}}^{\infty} \nu_i \le \varepsilon/2$. Using Definition 1 recursively, we get, for all $k$ such that $j_k > \ell_{\bar{k}}$,

$$\|x^{j_k} - x\|^2 \le \|x^{\ell_{\bar{k}}} - x\|^2 + \sum_{i=\ell_{\bar{k}}}^{j_k - 1} \nu_i \le \|x^{\ell_{\bar{k}}} - x\|^2 + \sum_{i=\ell_{\bar{k}}}^{\infty} \nu_i \le \xi + \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \xi + \varepsilon, \qquad (9)$$

using $\ell_{\bar{k}} > \tilde{k}$ in the leftmost inequality and $\bar{k} \ge \hat{k}$ in the rightmost inequality. Taking limits in (9) as $k$ goes to $\infty$, we get $\lambda \le \xi + \varepsilon$ for all $\varepsilon > 0$. It follows that $\lambda \le \xi$. Reversing the roles of $\{x^{\ell_k}\}$ and $\{x^{j_k}\}$, a similar argument shows that $\xi \le \lambda$, and we conclude therefore that $\lambda = \xi$. It follows that all accumulation points of $\{\|x^k - x\|^2\}$ coincide, i.e. that $\{\|x^k - x\|^2\}$ converges (not necessarily to $0$).

iii) Existence of weak accumulation points of $\{x^k\}$ follows from (i). Let $\tilde{x}, \hat{x}$ be two weak accumulation points of $\{x^k\}$, and let $\{x^{\tilde{j}_k}\}$, $\{x^{\hat{j}_k}\}$ be two subsequences of $\{x^k\}$ weakly convergent to $\tilde{x}, \hat{x}$ respectively. Let $a = \lim_{k\to\infty} \|x^k - \tilde{x}\|^2$ and $b = \lim_{k\to\infty} \|x^k - \hat{x}\|^2$; $a$ and $b$ exist by (ii), since $\tilde{x}, \hat{x}$ belong to $V$ by hypothesis. Let $\omega = \|\hat{x} - \tilde{x}\|^2$. Then

$$\|x^{\hat{j}_k} - \tilde{x}\|^2 = \|x^{\hat{j}_k} - \hat{x}\|^2 + \|\hat{x} - \tilde{x}\|^2 + 2\,\langle x^{\hat{j}_k} - \hat{x},\ \hat{x} - \tilde{x}\rangle, \qquad (10)$$

$$\|x^{\tilde{j}_k} - \hat{x}\|^2 = \|x^{\tilde{j}_k} - \tilde{x}\|^2 + \|\tilde{x} - \hat{x}\|^2 + 2\,\langle x^{\tilde{j}_k} - \tilde{x},\ \tilde{x} - \hat{x}\rangle. \qquad (11)$$

Take limits in (10), (11) as $k$ goes to $\infty$, observing that the inner products in the right hand sides of (10), (11) converge to $0$ because $\hat{x}, \tilde{x}$ are the weak limits of $\{x^{\hat{j}_k}\}$, $\{x^{\tilde{j}_k}\}$ respectively, and get, using the definitions of $a$, $b$, $\omega$,

$$a = b + \omega, \qquad (12)$$

$$b = a + \omega. \qquad (13)$$

From (12), (13) we get $a - b = \omega = b - a$, which implies $\omega = 0$, i.e. $\tilde{x} = \hat{x}$. It follows that all weak accumulation points of $\{x^k\}$ coincide, i.e. that $\{x^k\}$ is weakly convergent.
A slightly stronger result holds in the finite dimensional case: it is enough to have one accumulation point in $V$ in order to ensure convergence of $\{x^k\}$. The proof, much easier than in the Hilbert space case, can be found in [6]. In the finite dimensional case, as a consequence of the observation just made, item (ii) of the foregoing proposition is not needed. The result of Proposition 1(ii) in the finite dimensional case appears in [16, Lemma 3.2.1].
Proposition 2. Let $\{\gamma_k\}, \{\alpha_k\} \subset \mathbb{R}$. Assume that $\alpha_k \ge 0$ for all $k \ge 0$, $\sum_{k=0}^{\infty} \alpha_k = \infty$, $\sum_{k=0}^{\infty} \alpha_k\gamma_k < \infty$, and that there exists $\tilde{k} \ge 0$ such that $\gamma_k \ge 0$ for all $k \ge \tilde{k}$. Then
i) there exists a subsequence $\{\gamma_{i(k)}\}$ of $\{\gamma_k\}$ such that $\lim_{k\to\infty} \gamma_{i(k)} = 0$;
ii) if, additionally, there exists $\theta > 0$ such that $|\gamma_{k+1} - \gamma_k| \le \theta\alpha_k$ for all $k$, then $\lim_{k\to\infty} \gamma_k = 0$.
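Before the proof, a small numerical illustration (ours, not the paper's) of why part (ii) needs the additional hypothesis: take $\alpha_k = (k+1)^{-1}$ and let $\gamma_k = 1$ exactly at the sparse indices $k = 2^n$, $\gamma_k = 0$ elsewhere. Then $\sum \alpha_k = \infty$ and $\sum \alpha_k\gamma_k < \infty$, yet $\gamma_k \not\to 0$; the unit-size spikes violate $|\gamma_{k+1} - \gamma_k| \le \theta\alpha_k$ for every fixed $\theta$.

```python
# gamma_k spikes to 1 at k = 2^n and is 0 otherwise: the weighted series
# sum(alpha_k * gamma_k) converges although sum(alpha_k) diverges.
N = 2 ** 18
spikes = {2 ** n for n in range(18)}
s_alpha = s_weighted = 0.0
for k in range(N):
    alpha = 1.0 / (k + 1)
    gamma = 1.0 if k in spikes else 0.0
    s_alpha += alpha
    s_weighted += alpha * gamma
print(s_alpha, s_weighted)   # ~ log(N) versus a bounded value (< 2)
```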
Proof: i) If the result does not hold then there exist $\rho > 0$ and $\bar{k} \ge \tilde{k}$ such that $\gamma_k \ge \rho$ for all $k \ge \bar{k}$, so that $\infty > \sum_{k=\bar{k}}^{\infty} \alpha_k\gamma_k \ge \rho \sum_{k=\bar{k}}^{\infty} \alpha_k$, in contradiction with $\sum_{k=0}^{\infty} \alpha_k = \infty$.

ii) By (i) there exists a subsequence $\{\gamma_{i(k)}\}$ of $\{\gamma_k\}$ such that $\lim_{k\to\infty} \gamma_{i(k)} = 0$. If the result does not hold then there exist some $\rho > 0$ and some other subsequence $\{\gamma_{m(k)}\}$ of $\{\gamma_k\}$ such that $\gamma_{m(k)} \ge \rho$ for all $k$. In this case, we can construct a third subsequence $\{\gamma_{j(k)}\}$ of $\{\gamma_k\}$, where the subindices $j(k)$ are chosen in the following way:

$$j(0) = \min\{\ell \ge 0 : \gamma_\ell \ge \rho\} \qquad (14)$$

and, given $j(2k)$,

$$j(2k+1) = \min\{\ell \ge j(2k) : \gamma_\ell \le \rho/2\}, \qquad (15)$$

$$j(2k+2) = \min\{\ell \ge j(2k+1) : \gamma_\ell \ge \rho\}. \qquad (16)$$

Note that the existence of the subsequences $\{\gamma_{i(k)}\}$, $\{\gamma_{m(k)}\}$ guarantees that $j(k)$ is well defined for all $k \ge 0$. Observe also that, by (15), (16),

$$\gamma_\ell \ge \rho/2 \quad \text{for } j(2k) \le \ell \le j(2k+1) - 1. \qquad (17)$$

Then, since $\sum_{k=0}^{\infty} \alpha_k\gamma_k < \infty$, we have, in view of (17),

$$\infty > \sum_{k=0}^{\infty} \alpha_k\gamma_k \ge \sum_{k=0}^{\infty}\ \sum_{\ell=j(2k)}^{j(2k+1)-1} \alpha_\ell\gamma_\ell \ge \frac{\rho}{2} \sum_{k=0}^{\infty}\ \sum_{\ell=j(2k)}^{j(2k+1)-1} \alpha_\ell. \qquad (18)$$

Let $\tau_k = \sum_{\ell=j(2k)}^{j(2k+1)-1} \alpha_\ell$. It follows from (18) that $\sum_{k=0}^{\infty} \tau_k < \infty$, implying

$$\lim_{k\to\infty} \tau_k = 0. \qquad (19)$$

On the other hand, by (15), (16), we have $\gamma_{j(2k)} \ge \rho$ and $\gamma_{j(2k+1)} \le \rho/2$, so that for all $k$,

$$\frac{\rho}{2} \le \gamma_{j(2k)} - \gamma_{j(2k+1)} = \sum_{\ell=j(2k)}^{j(2k+1)-1} (\gamma_\ell - \gamma_{\ell+1}) \le \theta \sum_{\ell=j(2k)}^{j(2k+1)-1} \alpha_\ell = \theta\tau_k, \qquad (20)$$

using the hypothesis of (ii) in the rightmost inequality of (20). By (20), $\tau_k \ge \rho/(2\theta)$ for all $k$, in contradiction with (19). The contradiction arises from assuming that there exists a subsequence of $\{\gamma_k\}$ which is bounded away from $0$, and therefore $\lim_{k\to\infty} \gamma_k = 0$.

To finish with the preliminaries, we gather in the following proposition two well known facts on orthogonal projections, to be used in the sequel.
Proposition 3. i) $\|P_C(y) - P_C(z)\| \le \|y - z\|$ for all $y, z \in H$.
ii) $\langle y - \bar{y},\ y - P_C(y)\rangle \ge 0$ for all $y \in H$, $\bar{y} \in C$.
Proof: See [18, p. 121].
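The following quick sanity check (illustrative only; the ball $C$ and the random trials are our choices) exercises both properties for the Euclidean unit ball in $\mathbb{R}^3$, whose projection has the closed form $P_C(y) = y/\max\{1, \|y\|\}$.

```python
import numpy as np

# Numerical check of Proposition 3 for C = {x : ||x|| <= 1}:
# (i) nonexpansiveness of P_C, (ii) <y - ybar, y - P_C(y)> >= 0, ybar in C.
rng = np.random.default_rng(1)
P = lambda y: y / max(1.0, np.linalg.norm(y))
for _ in range(1000):
    y, z = 3 * rng.normal(size=3), 3 * rng.normal(size=3)
    ybar = P(3 * rng.normal(size=3))          # an arbitrary point of C
    assert np.linalg.norm(P(y) - P(z)) <= np.linalg.norm(y - z) + 1e-12
    assert np.dot(y - ybar, y - P(y)) >= -1e-12
print("both properties verified on random samples")
```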
The following lemma contains the main ideas of our result. It is written in an indirect way (with a hypothesis on existence of some $\tilde{x}$) in order to cover both the case of a nonempty and of an empty solution set. For $x \in C$, let $L(x) = \{y \in C : f(y) \le f(x)\}$.

Lemma 1. If the algorithm generates an infinite sequence and there exist $\tilde{x} \in C$ and $\tilde{k} \ge 0$ such that $f(\tilde{x}) \le f(x^k)$ for all $k \ge \tilde{k}$, then
i) $\{x^k\}$ is quasi-Fejér convergent to $L(\tilde{x})$,
ii) $\{f(x^k)\}$ is a convergent sequence, and $\lim_{k\to\infty} f(x^k) = f(\tilde{x})$,
iii) the sequence $\{x^k\}$ is weakly convergent to some $\bar{x} \in L(\tilde{x})$.

Proof: Take any $x \in L(\tilde{x})$. Let $z^k = x^k - (\alpha_k/\beta_k)u^k$ and $\gamma_k = f(x^k) - f(x)$. It follows from (8) that $x^k \in C$ for all $k \ge 1$, so that $P_C(x^k) = x^k$ and therefore
$$\|x^{k+1} - x^k\| = \|P_C(z^k) - P_C(x^k)\| \le \|z^k - x^k\| = \frac{\alpha_k}{\beta_k}\|u^k\| \le \alpha_k \qquad (21)$$

using Proposition 3(i) and $\|u^k\| \le \beta_k$. We proceed to prove that $\{x^k\}$ is quasi-Fejér convergent to $L(\tilde{x})$. In the following chain of equalities and inequalities, where we establish a summable upper bound on $\alpha_k\gamma_k/\beta_k$, the equalities are trivial and the inequalities are justified immediately below. We have
$$
\begin{aligned}
\alpha_k^2 + \|x^k - x\|^2 - \|x^{k+1} - x\|^2
&\ge \|x^{k+1} - x^k\|^2 + \|x^k - x\|^2 - \|x^{k+1} - x\|^2 \\
&= 2\langle x^k - x,\ x^k - x^{k+1}\rangle \\
&= 2\langle x^k - x,\ x^k - z^k\rangle + 2\langle x^k - x,\ z^k - x^{k+1}\rangle \\
&= 2\frac{\alpha_k}{\beta_k}\langle u^k,\ x^k - x\rangle + 2\langle x^k - z^k,\ z^k - x^{k+1}\rangle + 2\langle z^k - x,\ z^k - x^{k+1}\rangle \\
&= 2\frac{\alpha_k}{\beta_k}\langle u^k,\ x^k - x\rangle + 2\langle x^k - z^k,\ z^k - x^{k+1}\rangle + 2\langle z^k - x,\ z^k - P_C(z^k)\rangle \\
&\ge 2\frac{\alpha_k}{\beta_k}\langle u^k,\ x^k - x\rangle + 2\langle x^k - z^k,\ z^k - x^{k+1}\rangle \\
&= 2\frac{\alpha_k}{\beta_k}\langle u^k,\ x^k - x\rangle + 2\langle x^k - z^k,\ z^k - x^k\rangle + 2\langle x^k - z^k,\ x^k - x^{k+1}\rangle \\
&\ge 2\frac{\alpha_k}{\beta_k}\langle u^k,\ x^k - x\rangle - 2\|x^k - z^k\|^2 - 2\|x^k - z^k\|\,\|x^k - x^{k+1}\| \\
&\ge 2\frac{\alpha_k}{\beta_k}\langle u^k,\ x^k - x\rangle - 2\frac{\alpha_k^2}{\beta_k^2}\|u^k\|^2 - 2\frac{\alpha_k}{\beta_k}\|u^k\|\,\alpha_k \\
&\ge 2\frac{\alpha_k}{\beta_k}\langle u^k,\ x^k - x\rangle - 4\alpha_k^2 \\
&\ge 2\frac{\alpha_k}{\beta_k}\left[f(x^k) - f(x) - \varepsilon_k\right] - 4\alpha_k^2 \\
&= 2\frac{\alpha_k}{\beta_k}\gamma_k - 2\frac{\alpha_k}{\beta_k}\varepsilon_k - 4\alpha_k^2 \\
&\ge 2\frac{\alpha_k}{\beta_k}\gamma_k - 2\alpha_k\varepsilon_k - 4\alpha_k^2 \\
&\ge 2\frac{\alpha_k}{\beta_k}\gamma_k - (2\delta + 4)\alpha_k^2
\end{aligned}
\qquad (22)
$$

using (21) in the first inequality, Proposition 3(ii) in the second one, the Cauchy–Schwarz inequality in the third one, (21) again (together with $\|x^k - z^k\| = (\alpha_k/\beta_k)\|u^k\|$) in the fourth one, $\|u^k\| \le \beta_k$ in the fifth one, the definition of $\partial_{\varepsilon_k} f(x^k)$ in the sixth one, $\beta_k \ge 1$ in the seventh one and (6) in the eighth one.
Since $x \in L(\tilde{x})$ we have, for $k \ge \tilde{k}$, $f(x) \le f(\tilde{x}) \le f(x^k)$, so that $\gamma_k = f(x^k) - f(x) \ge 0$. Therefore we get from (22),

$$0 \le 2\frac{\alpha_k}{\beta_k}\gamma_k \le \|x^k - x\|^2 - \|x^{k+1} - x\|^2 + (2\delta + 5)\alpha_k^2 \qquad (23)$$

for $k \ge \tilde{k}$. Let $\nu_k = (2\delta+5)\alpha_k^2$ and $\sigma = \sum_{k=0}^{\infty} \alpha_k^2$. By (5), $\sigma < \infty$, so that $\sum_{k=0}^{\infty} \nu_k = (2\delta+5)\sigma < \infty$. Since $\|x^{k+1} - x\|^2 \le \|x^k - x\|^2 + \nu_k$ for $k \ge \tilde{k}$ by (23), and $x$ is an arbitrary element of $L(\tilde{x})$, we conclude that $\{x^k\}$ is quasi-Fejér convergent to $L(\tilde{x})$, and therefore (i) holds.

ii) $\{x^k\}$ is bounded by (i) and Proposition 1(i). Let $B$ be a bounded set containing $\{x^k\}$ and $\bar{\varepsilon} = \sup_k \{\varepsilon_k\}$. Then
$$u^k \in \partial_{\varepsilon_k} f(x^k) \subset \partial_{\bar{\varepsilon}} f(x^k) \subset \bigcup_{y\in B} \partial_{\bar{\varepsilon}} f(y). \qquad (24)$$

By (24) and assumption (A), $\{u^k\}$ is bounded, so that there exists $\mu \ge 1$ such that $\|u^k\| \le \mu$ for all $k$. Therefore $\beta_k = \max\{1, \|u^k\|\} \le \max\{1, \mu\} = \mu$. By (23),

$$0 \le \frac{2}{\mu}\,\alpha_k\gamma_k \le \|x^k - x\|^2 - \|x^{k+1} - x\|^2 + (2\delta+5)\alpha_k^2 \qquad (25)$$
for $k \ge \tilde{k}$ and $x \in L(\tilde{x})$. Summing (25) from $k = \tilde{k}$ to $n$,

$$0 \le \frac{2}{\mu} \sum_{k=\tilde{k}}^{n} \alpha_k\gamma_k \le \|x^{\tilde{k}} - x\|^2 - \|x^{n+1} - x\|^2 + (2\delta+5)\sum_{k=\tilde{k}}^{n} \alpha_k^2 \le \|x^{\tilde{k}} - x\|^2 + (2\delta+5)\sigma. \qquad (26)$$

By (26), $\sum_{k=\tilde{k}}^{\infty} \alpha_k\gamma_k \le (\mu/2)\left(\|x^{\tilde{k}} - x\|^2 + (2\delta+5)\sigma\right) < \infty$, implying

$$\sum_{k=0}^{\infty} \alpha_k\gamma_k < \infty. \qquad (27)$$
Up to now, $x$ is an arbitrary element of $L(\tilde{x})$. Take now $x = \tilde{x}$, so that $\gamma_k = f(x^k) - f(\tilde{x})$. Observe that

$$\gamma_{k+1} - \gamma_k = f(x^{k+1}) - f(x^k) \le \langle u^{k+1}, x^{k+1} - x^k\rangle + \varepsilon_{k+1} \le \|u^{k+1}\|\,\|x^{k+1} - x^k\| + \varepsilon_k \le (\mu + \delta)\alpha_k \qquad (28)$$

using the definition of $\partial_{\varepsilon_{k+1}} f(x^{k+1})$ in the first inequality, $\varepsilon_{k+1} \le \varepsilon_k$ and the Cauchy–Schwarz inequality in the second one, and (6) together with (21) in the third one. The same chain with the roles of $x^k$ and $x^{k+1}$ exchanged (now using $u^k \in \partial_{\varepsilon_k} f(x^k)$) bounds $f(x^k) - f(x^{k+1})$ by the same quantity, so that $|\gamma_{k+1} - \gamma_k| \le (\mu + \delta)\alpha_k$. Let $\theta = \mu + \delta$. Since $\gamma_k \ge 0$ for $k \ge \tilde{k}$, we are, in view of (27) and (28), within the hypotheses of Proposition 2, and we can conclude that $\lim_{k\to\infty} \gamma_k = 0$, i.e. that $\lim_{k\to\infty} f(x^k) = f(\tilde{x})$.
iii) Let $\bar{x}$ be a weak accumulation point of $\{x^k\}$, which exists by (i) and Proposition 1(i). If $\{x^{j_k}\}$ is a subsequence of $\{x^k\}$ whose weak limit is $\bar{x}$, then we have, since $f$, being convex and continuous, is weakly lower semicontinuous,

$$f(\bar{x}) \le \liminf_{k\to\infty} f(x^{j_k}) = \lim_{k\to\infty} f(x^k) = f(\tilde{x}). \qquad (29)$$

It follows from (29) that $\bar{x} \in L(\tilde{x})$, noting that $\bar{x} \in C$ because $C$ is closed and convex, hence weakly closed. We have proved that all weak accumulation points of $\{x^k\}$ belong to $L(\tilde{x})$. By (i) and Proposition 1(iii), we conclude that there exists only one weak accumulation point, i.e. that $\{x^k\}$ is weakly convergent to some $\bar{x} \in L(\tilde{x})$.

Finally, we state and prove our main convergence result.
Theorem 1. i) If Algorithm (7)–(8) generates an infinite sequence then $\liminf_{k\to\infty} f(x^k) = \inf_{x\in C} f(x)$.
ii) If the set $S$ of solutions of problem (1)–(2) is nonempty, then either Algorithm (7)–(8) stops at some iteration $k$, in which case $x^k \in S$, or it generates an infinite sequence which converges weakly to some $\bar{x} \in S$.
iii) If $S$ is empty then $\{x^k\}$ is unbounded.
Proof: i) Let $f^* = \inf_{x\in C} f(x)$ (possibly $f^* = -\infty$). Since $x^k \in C$ for all $k \ge 1$, we have $\liminf_{k\to\infty} f(x^k) \ge f^*$. Assume that $\liminf_{k\to\infty} f(x^k) > f^*$. Then there exists $\tilde{x} \in C$ such that

$$\liminf_{k\to\infty} f(x^k) > f(\tilde{x}). \qquad (30)$$

It follows from (30) that there exists $\tilde{k}$ such that $f(x^k) \ge f(\tilde{x})$ for all $k \ge \tilde{k}$. By Lemma 1(ii), $\lim_{k\to\infty} f(x^k) = f(\tilde{x})$, in contradiction with (30). The result follows.

ii) Since $S \ne \emptyset$, take any $x^* \in S$, in which case $L(x^*) = S$. By optimality of $x^*$, $f(x^k) \ge f(x^*)$ for all $k$. Apply Lemma 1(iii) with $\tilde{x} = x^*$, $\tilde{k} = 0$, and conclude that $\{x^k\}$ converges weakly to some $\bar{x} \in S$.

iii) Assume that $S$ is empty but $\{x^k\}$ is bounded. Let $\{x^{j_k}\}$ be a subsequence of $\{x^k\}$ such that $\lim_{k\to\infty} f(x^{j_k}) = \liminf_{k\to\infty} f(x^k)$. Since $\{x^{j_k}\}$ is bounded, without loss of generality (i.e. refining $\{x^{j_k}\}$ if necessary) we may assume that $\{x^{j_k}\}$ converges weakly to some $\bar{x} \in C$. By weak lower semicontinuity of $f$,

$$f(\bar{x}) \le \liminf_{k\to\infty} f(x^{j_k}) = \lim_{k\to\infty} f(x^{j_k}) = \liminf_{k\to\infty} f(x^k) = f^*, \qquad (31)$$

using (i) in the last equality. By (31), $\bar{x}$ belongs to $S$, in contradiction with the hypothesis. It follows that $\{x^k\}$ is unbounded.

We make a few comments on the results of Theorem 1. To our knowledge, this is the first proof of convergence of the whole sequence generated by (7)–(8) to a unique weak limit, without assuming finite dimensionality (as in [7]) or uniform monotonicity of $\partial f$ (as in [2]). Additionally, our analysis includes the feature of approximate subgradients. The result of Theorem 1(i), on the other hand, is similar to the result in [7], except for the
inclusion of inexact subgradients and constrained problems, which are not considered in [7]. We remark also that (5) (i.e. summability of $\alpha_k^2$) is needed only to establish quasi-Fejér convergence of $\{x^k\}$ to $S$ in the case of nonempty $S$. It is easy to check that if $\{x^k\}$ is known to be bounded beforehand (e.g. when $C$ is bounded) then our results hold also with $\lim_{k\to\infty} \alpha_k = 0$ instead of (5). Finally, we present a convergence rate result for the sequence of functional values $\{f(x^k)\}$.

Theorem 2. If problem (1)–(2) has solutions and the sequence $\{x^k\}$ generated by Algorithm (7)–(8) is infinite, then there exists a subsequence $\{x^{\ell_k}\}$ of $\{x^k\}$ such that $f(x^{\ell_k}) - f(x^*) \le \left(\sum_{j=0}^{\ell_k} \alpha_j\right)^{-1}$, where $x^*$ is any solution of (1)–(2).

Proof: We look at the proof of Lemma 1 with $\tilde{x} = x^* \in S$, $\tilde{k} = 0$ and $\gamma_k = f(x^k) - f(x^*)$. By (27), $0 \le \sum_{k=0}^{\infty} \alpha_k\gamma_k < \infty$. Let $s_k = \sum_{j=0}^{k} \alpha_j$, $N_1 = \{k : \gamma_k \le s_k^{-1}\}$ and $N_2 = \{k : \gamma_k > s_k^{-1}\}$. Suppose that $N_1$ is finite. Then there exists $\bar{k}$ such that $k \in N_2$ for all $k \ge \bar{k}$, so that

$$\infty > \sum_{k=0}^{\infty} \alpha_k\gamma_k \ge \sum_{k=\bar{k}}^{\infty} \alpha_k\gamma_k \ge \sum_{k=\bar{k}}^{\infty} \frac{\alpha_k}{s_k}.$$

It follows that $\sum_{k=0}^{\infty} \alpha_k/s_k < \infty$. On the other hand, Abel–Dini's criterion for divergent series [see 14, §39] states that if $\sum_{n=0}^{\infty} \eta_n = \infty$ then $\sum_{n=0}^{\infty} \left[\eta_n/\left(\sum_{j=0}^{n} \eta_j\right)\right] = \infty$. So we conclude, in view of (4), that $\sum_{k=0}^{\infty} \alpha_k/s_k = \infty$. This contradiction implies that $N_1$ is infinite, and we can take $\{x^{\ell_k}\}$ as consisting precisely of those $x^k$ with $k \in N_1$.

This result does not give any information on the asymptotic behavior of $\{f(x^k)\}$ outside the subsequence $\{x^{\ell_k}\}$. If we assume that $f$ is Gâteaux differentiable, that its gradient is uniformly continuous and that $\varepsilon_k = 0$ for all $k$ (i.e. $u^k = \nabla f(x^k)$ in (8)), then we can get results on the asymptotic behavior of the whole sequence $\{f(x^k)\}$. More precisely, if $\varphi: \mathbb{R}_+ \to \mathbb{R}_+$ is a continuous and nondecreasing function such that $\varphi(0) = 0$ and $\|\nabla f(x) - \nabla f(y)\| \le \varphi(\|x - y\|)$ for all $x, y \in H$, then we get, in addition to the result of Theorem 2, that $\gamma_{\ell_k+1} \le \gamma_{\ell_k} + \alpha_{\ell_k}\varphi(\alpha_{\ell_k})$ and $\gamma_i \le \gamma_{\ell_k+1}$ for all $i$ such that $\ell_k + 1 \le i < \ell_{k+1}$. The proof is rather involved and we will not develop it in this paper. For the finite dimensional case, a sharper and nonasymptotic convergence rate result can be found in [16, Theorem 3.2.2].
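As an illustration of Theorem 2 (on a toy problem of our choosing with a closed-form optimum, not an example from the paper), the sketch below counts how often $f(x^k) - f(x^*) \le 1/s_k$ holds for $f(x) = |x - 2|$ over $C = [-1, 1]$, where $x^* = 1$ and $f(x^*) = 1$.

```python
# Scalar projected subgradient run; P_C is just clipping to [-1, 1].
x, s, hits, total = -1.0, 0.0, 0, 20000
for k in range(total):
    u = -1.0 if x < 2.0 else 1.0          # exact subgradient of |x - 2|
    alpha = 1.0 / (k + 1)                 # satisfies (4)-(5)
    s += alpha                            # s_k = sum_{j<=k} alpha_j
    beta = max(1.0, abs(u))
    x = min(1.0, max(-1.0, x - (alpha / beta) * u))   # iteration (8)
    if abs(x - 2.0) - 1.0 <= 1.0 / s:     # Theorem 2's bound at index k
        hits += 1
print(hits, "of", total)
```

In this particular run the bound holds at every iteration once $x^k$ reaches $x^* = 1$; Theorem 2 only guarantees it along a subsequence in general.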
Acknowledgement. The first author thanks the Institute for Pure and Applied Mathematics (IMPA), at Rio de Janeiro, Brazil, where he was a visiting professor while this paper was written.
References
[1] Alber, Ya.I. On minimization of smooth functional by gradient methods. USSR Computational Mathematics and Mathematical Physics 11 (1971) 752-758.
[2] Alber, Ya.I. Recurrence relations and variational inequalities. Soviet Mathematics Doklady 27 (1983) 511-517.
[3] Alber, Ya.I., Iusem, A.N., Solodov, M.V. Minimization of nonsmooth convex functionals in Banach spaces. Journal of Convex Analysis, to appear.
[4] Bereznyev, V.A., Karmanov, V.G., Tretyakov, A.A. The stabilizing properties of the gradient method. USSR Computational Mathematics and Mathematical Physics 26 (1986) 84-85.
[5] Brøndsted, A., Rockafellar, R.T. On the subdifferentiability of convex functions. Proceedings of the American Mathematical Society 16 (1965) 605-611.
[6] Burachik, R., Graña Drummond, L.M., Iusem, A.N., Svaiter, B.F. Full convergence of the steepest descent method with inexact line searches. Optimization 32 (1995) 137-146.
[7] Correa, R., Lemaréchal, C. Convergence of some algorithms for convex minimization. Mathematical Programming 62 (1993) 261-275.
[8] Ermoliev, Yu.M. Methods for solving nonlinear extremal problems. Cybernetics 2 (1966) 1-17.
[9] Ermoliev, Yu.M. On the method of generalized stochastic gradients and quasi-Fejér sequences. Cybernetics 5 (1969) 208-220.
[10] Giles, J.R. Convex Analysis with Applications in Differentiation of Convex Functions. Research Notes in Mathematics 58, Pitman, Boston (1982).
[11] Golstein, E., Tretyakov, N. Modified Lagrangian Functions. Moscow (1989).
[12] Iusem, A.N., Svaiter, B.F., Teboulle, M. Entropy-like proximal methods in convex programming. Mathematics of Operations Research 19 (1994) 790-814.
[13] Kiwiel, K., Murty, K. Convergence of the steepest descent method for minimizing quasiconvex functions. Journal of Optimization Theory and Applications 89 (1996) 221-226.
[14] Knopp, K. Theory and Application of Infinite Series. Dover, New York (1990).
[15] Minoux, M. Mathematical Programming, Theory and Algorithms. John Wiley, New York (1986).
[16] Nesterov, Yu. Effective Methods in Nonlinear Programming. Moscow (1989).
[17] Polyak, B.T. A general method of solving extremum problems. Soviet Mathematics Doklady 8 (1967) 593-597.
[18] Polyak, B.T. Introduction to Optimization. Optimization Software, New York (1987).
[19] Rockafellar, R.T. Local boundedness of nonlinear monotone operators. Michigan Mathematical Journal 16 (1969) 397-407.
[20] Solodov, M.V., Zavriev, S.K. Error stability properties of generalized gradient-type algorithms. Journal of Optimization Theory and Applications 98 (1998), to appear.
[21] Svaiter, B.F. Steepest descent method in Hilbert spaces with Armijo search (to be published).