J Optim Theory Appl (2013) 158:216–233 DOI 10.1007/s10957-012-0206-3
Convergence Analysis of the Gauss–Newton-Type Method for Lipschitz-Like Mappings

M.H. Rashid · S.H. Yu · C. Li · S.Y. Wu
Received: 30 January 2011 / Accepted: 6 October 2012 / Published online: 20 October 2012 © Springer Science+Business Media New York 2012
Abstract We introduce in the present paper a Gauss–Newton-type method for solving generalized equations defined by sums of differentiable mappings and set-valued mappings in Banach spaces. Semi-local convergence and local convergence of the Gauss–Newton-type method are analyzed.

Keywords Set-valued mappings · Lipschitz-like mappings · Generalized equations · Gauss–Newton-type method · Semi-local convergence
1 Introduction

The generalized equation problem, defined by the sum of a differentiable mapping and a set-valued mapping in Banach spaces, was introduced by Robinson [1, 2] as
Communicated by Nguyen Dong Yen.

M.H. Rashid · C. Li (✉)
Department of Mathematics, Zhejiang University, Hangzhou 310027, P.R. China
e-mail: [email protected]

M.H. Rashid
e-mail: [email protected]

Present address:
M.H. Rashid
Department of Mathematics, University of Rajshahi, Rajshahi 6205, Bangladesh

S.H. Yu
Department of Mathematics, Zhejiang Normal University, Jinhua 321004, P.R. China
e-mail: [email protected]

S.Y. Wu
Department of Mathematics, National Cheng Kung University, Tainan, Taiwan
e-mail: [email protected]
a general tool for describing, analyzing, and solving different problems in a unified manner. This kind of generalized equation has been studied extensively; typical examples are systems of inequalities, variational inequalities, linear and nonlinear complementarity problems, systems of nonlinear equations, equilibrium problems, etc.; see, for example, [1–3].

The classical method for finding an approximate solution is the Newton-type method (see Algorithm 3.1 in Sect. 3), which was introduced by Dontchev in [4]. Under suitable conditions, it was proved in [4] that, around a solution x* of the generalized equation, there exists a neighborhood U(x*) of x* such that, for any initial point in U(x*), there exists a sequence generated by the Newton-type method that converges quadratically to x*. In general, for an initial point near a solution, the sequences generated by the Newton-type method are not uniquely defined, and not every generated sequence is convergent; the convergence result established in [4] only guarantees the existence of one convergent sequence. From the viewpoint of practical computation, this kind of Newton-type method is therefore inconvenient in applications. This drawback motivates us to propose a (modified) Gauss–Newton-type method (see Algorithm 3.2 in Sect. 3), which is an extension of the famous Gauss–Newton method for solving nonlinear least squares problems and of the extended Newton method for solving convex inclusion problems; see, for example, [5–11].

Convergence issues for the Gauss–Newton-type method are usually of two types: local convergence, which is concerned with the convergence ball based on information in a neighborhood of a solution, and semi-local analysis, which is concerned with convergence criteria based on information around an initial point.
There has been fruitful work on semi-local analysis of the Gauss–Newton-type method in special cases, such as the Gauss–Newton method for nonlinear least squares problems (cf. [6, 8, 10]) and the extended Gauss–Newton method for convex inclusion problems (cf. [12]). However, to the best of our knowledge, there is no study of semi-local analysis for the general case considered here, even for the Newton-type method. Our purpose in the present paper is to analyze the semi-local convergence of the Gauss–Newton-type method. The main tool is the Lipschitz-like property for set-valued mappings, which was introduced by Aubin in [13] in the context of nonsmooth analysis and has been studied by many mathematicians; see, for example, [14–18] and the references therein. Our main results are the convergence criteria established in Sect. 3, which, based on information around the initial point, provide sufficient conditions ensuring the convergence to a solution of any sequence generated by the Gauss–Newton-type method. As a consequence, local convergence results for the Gauss–Newton-type method are obtained.

This paper is organized as follows. The next section contains necessary notation and preliminary results. In Sect. 3, we introduce a Gauss–Newton-type method for solving the generalized equation, and establish existence results for solutions of the generalized equation and convergence results of the Gauss–Newton-type method for Lipschitz-like mappings. In the last section, we give a summary of the major results to close our paper.
2 Notations and Preliminary Results

Throughout the whole paper, we assume that X and Y are two real or complex Banach spaces. Let x ∈ X and r > 0. The closed ball centered at x with radius r is denoted by B_r(x). Let F : X ⇒ Y be a set-valued mapping. The domain dom F, the inverse F⁻¹ and the graph gph F of F are, respectively, defined by

dom F := {x ∈ X : F(x) ≠ ∅},   F⁻¹(y) := {x ∈ X : y ∈ F(x)} for each y ∈ Y,

and

gph F := {(x, y) ∈ X × Y : y ∈ F(x)}.

Let A ⊆ X. The distance function of A is defined by

dist(x, A) := inf{‖x − a‖ : a ∈ A} for each x ∈ X,

while the excess from a set C ⊆ X to the set A is defined by

e(C, A) := sup{dist(x, A) : x ∈ C}.

Recall from [19] the notions of pseudo-Lipschitz and Lipschitz-like set-valued mappings. These notions were introduced by Aubin in [13, 20], and have been studied extensively. In particular, connections to linear rate of openness, coderivatives and metric regularity of set-valued mappings were established by Penot and Mordukhovich; see, for example, [21, 22] and the book [19] for details.

Definition 2.1 Let Γ : Y ⇒ X be a set-valued mapping, and let (ȳ, x̄) ∈ gph Γ. Let r_x̄ > 0, r_ȳ > 0 and M > 0. Γ is said to be

(a) Lipschitz-like on B_{r_ȳ}(ȳ) relative to B_{r_x̄}(x̄) with constant M iff the following inequality holds:

e(Γ(y₁) ∩ B_{r_x̄}(x̄), Γ(y₂)) ≤ M‖y₁ − y₂‖ for any y₁, y₂ ∈ B_{r_ȳ}(ȳ).

(b) Pseudo-Lipschitz around (ȳ, x̄) iff there exist constants r_ȳ > 0, r_x̄ > 0 and M > 0 such that Γ is Lipschitz-like on B_{r_ȳ}(ȳ) relative to B_{r_x̄}(x̄) with constant M.

The following lemma is useful; its proof is similar to that for [19, Theorem 1.49(i)].

Lemma 2.1 Let Γ : Y ⇒ X be a set-valued mapping, and let (ȳ, x̄) ∈ gph Γ. Assume that Γ is Lipschitz-like on B_{r_ȳ}(ȳ) relative to B_{r_x̄}(x̄) with constant M. Then

dist(x, Γ(y)) ≤ M dist(y, Γ⁻¹(x))    (1)

for any x ∈ B_{r_x̄}(x̄) and y ∈ B_{r_ȳ/3}(ȳ) satisfying dist(y, Γ⁻¹(x)) ≤ r_ȳ/3.
Proof Let x ∈ B_{r_x̄}(x̄) and y ∈ B_{r_ȳ/3}(ȳ) be such that dist(y, Γ⁻¹(x)) ≤ r_ȳ/3. We have to show that (1) holds. To do this, let 0 < ε ≤ r_ȳ/3 and ỹ ∈ Γ⁻¹(x) be such that

‖ỹ − y‖ ≤ dist(y, Γ⁻¹(x)) + ε ≤ 2r_ȳ/3.    (2)

Then x ∈ Γ(ỹ) and ‖ỹ − ȳ‖ ≤ ‖ỹ − y‖ + ‖y − ȳ‖ ≤ r_ȳ, that is, ỹ ∈ B_{r_ȳ}(ȳ). Thus, by the assumed Lipschitz-like property of Γ, we have

e(Γ(ỹ) ∩ B_{r_x̄}(x̄), Γ(y)) ≤ M‖ỹ − y‖.

Since x ∈ Γ(ỹ) ∩ B_{r_x̄}(x̄), it follows from the definition of excess that

dist(x, Γ(y)) ≤ e(Γ(ỹ) ∩ B_{r_x̄}(x̄), Γ(y)) ≤ M‖ỹ − y‖.

This together with (2) implies that dist(x, Γ(y)) ≤ M dist(y, Γ⁻¹(x)), as 0 < ε ≤ r_ȳ/3 is arbitrary. This completes the proof.
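For intuition, the distance dist(x, A) and the excess e(C, A) defined above can be evaluated directly when A and C are finite; the sketch below is an illustration of ours (not part of the paper's framework), using the Euclidean norm on Rⁿ. It also makes visible that the excess is not symmetric in its arguments.

```python
import numpy as np

def dist(x, A):
    """dist(x, A) = inf{||x - a|| : a in A} for a finite set A of points."""
    return min(np.linalg.norm(np.asarray(x) - np.asarray(a)) for a in A)

def excess(C, A):
    """e(C, A) = sup{dist(x, A) : x in C} for finite sets C, A.
    Note that e(C, A) != e(A, C) in general."""
    return max(dist(x, A) for x in C)
```

For example, on the real line with A = {0, 1} and C = {0, 3}, one gets e(C, A) = 2 but e(A, C) = 1.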
We end this section with the following known lemma from [23].

Lemma 2.2 Let φ : X ⇒ X be a set-valued mapping. Let η₀ ∈ X, r > 0 and 0 < λ < 1 be such that

dist(η₀, φ(η₀)) < r(1 − λ)    (3)

and

e(φ(x₁) ∩ B_r(η₀), φ(x₂)) ≤ λ‖x₁ − x₂‖ for any x₁, x₂ ∈ B_r(η₀).    (4)

Then φ has a fixed point in B_r(η₀), that is, there exists x ∈ B_r(η₀) such that x ∈ φ(x). If φ is additionally single-valued, then the fixed point of φ in B_r(η₀) is unique.
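In the single-valued case, Lemma 2.2 is the standard contraction principle: condition (3) keeps the Picard iterates inside B_r(η₀), and condition (4) makes them converge to the unique fixed point there. The following is a minimal numerical illustration of ours; the particular contraction used is arbitrary and not taken from the paper.

```python
def picard_fixed_point(phi, eta0, lam, r, tol=1e-12, max_iter=500):
    """Single-valued instance of Lemma 2.2: assuming
    |eta0 - phi(eta0)| < r*(1 - lam)  (condition (3)) and that phi is a
    lam-contraction on the ball of radius r about eta0 (condition (4)),
    the Picard iterates stay in that ball and converge to the fixed point."""
    assert abs(phi(eta0) - eta0) < r * (1.0 - lam)   # hypothesis (3)
    x = eta0
    for _ in range(max_iter):
        x_new = phi(x)
        assert abs(x_new - eta0) <= r                # iterates remain in the ball
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```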
3 Convergence Analysis of the Gauss–Newton-Type Method

Let f : Ω ⊆ X → Y be Fréchet differentiable with its Fréchet derivative denoted by ∇f, and let F : X ⇒ Y be a set-valued mapping with closed graph, where Ω is an open set in X. The generalized equation problem considered in the present paper is to find a point x ∈ Ω satisfying

0 ∈ f(x) + F(x).    (5)

Fixing x ∈ X, by D(x) we denote the subset of X defined by

D(x) := {d ∈ X : 0 ∈ f(x) + ∇f(x)d + F(x + d)}.

Recall that the Newton-type method introduced in [4] is defined as follows.
Algorithm 3.1 (The Newton-type method)

Step 1. Select x₀ ∈ X and put k := 0.
Step 2. If 0 ∈ D(x_k) then stop; otherwise, go to Step 3.
Step 3. Choose d_k ∈ D(x_k) and set x_{k+1} := x_k + d_k.
Step 4. Replace k by k + 1 and go to Step 2.
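For orientation, in the classical case F := {0} one has D(x_k) = {d : f(x_k) + ∇f(x_k)d = 0}, and Algorithm 3.1 becomes Newton's method. The sketch below is our own illustration under the assumptions X = Y = Rⁿ; the pseudoinverse is used to pick one particular d_k ∈ D(x_k), namely the minimal-norm solution when the linearized system is consistent (for an invertible Jacobian this is the ordinary Newton step).

```python
import numpy as np

def newton_type(f, jac, x0, tol=1e-10, max_iter=50):
    """Algorithm 3.1 in the special case F := {0}: D(x_k) is the solution set
    of the linear system f(x_k) + Jf(x_k) d = 0, and we select its
    minimal-norm element via the Moore-Penrose pseudoinverse."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:              # Step 2: 0 in D(x_k), up to tol
            break
        x = x - np.linalg.pinv(jac(x)) @ fx       # Steps 3-4: x_{k+1} = x_k + d_k
    return x
```

With f(x) = (x₁² − 2, x₁x₂ − 1) and starting point (1.5, 1), the iterates converge to (√2, 1/√2).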
The Gauss–Newton-type method we propose here is given in the following:

Algorithm 3.2 (The Gauss–Newton-type method)

Step 1. Select η ∈ [1, ∞[, x₀ ∈ X and put k := 0.
Step 2. If 0 ∈ D(x_k) then stop; otherwise, go to Step 3.
Step 3. Choose d_k ∈ D(x_k) such that

‖d_k‖ ≤ η dist(0, D(x_k)).    (6)
Step 4. Set x_{k+1} := x_k + d_k.
Step 5. Replace k by k + 1 and go to Step 2.

We remark that in the case where F := 0, Algorithm 3.2 reduces to the famous Gauss–Newton method, a well-known iterative technique for solving nonlinear least squares (model fitting) problems that has been studied extensively; see, for example, [5–10]. In the case where F := C, with C a closed convex cone, the algorithm reduces to the extended Newton method for solving convex inclusion problems, which was presented and studied by Robinson in [11]. The Gauss–Newton method for solving convex composite optimization problems was studied in [12, 24, 25] and the references therein.

3.1 Linear Convergence

Let x ∈ X and define the mapping Q_x by

Q_x(·) := f(x) + ∇f(x)(· − x) + F(·).

Then
D(x) = {d ∈ X : 0 ∈ Q_x(x + d)}.    (7)

Moreover, the following equivalence is clear for any z ∈ X and y ∈ Y:

z ∈ Q_x⁻¹(y)  ⟺  y ∈ f(x) + ∇f(x)(z − x) + F(z).    (8)

In particular,

x̄ ∈ Q_x̄⁻¹(ȳ) for each (x̄, ȳ) ∈ gph(f + F).    (9)

Let (x̄, ȳ) ∈ gph(f + F) and let r_x̄ > 0, r_ȳ > 0. Throughout the whole paper, we assume that B_{r_x̄}(x̄) ⊆ Ω ∩ dom F and that the mapping Q_x̄⁻¹(·) is Lipschitz-like on B_{r_ȳ}(ȳ) relative to B_{r_x̄}(x̄) with constant M, that is,

e(Q_x̄⁻¹(y₁) ∩ B_{r_x̄}(x̄), Q_x̄⁻¹(y₂)) ≤ M‖y₁ − y₂‖ for any y₁, y₂ ∈ B_{r_ȳ}(ȳ).    (10)
Let ε₀ > 0 and write

r̄ := min{r_ȳ − 2ε₀r_x̄, r_x̄(1 − Mε₀)/(4M)}.    (11)

Then

r̄ > 0  ⟺  ε₀ < min{r_ȳ/(2r_x̄), 1/M}.    (12)
The following lemma plays a crucial role in the convergence analysis of the Gauss–Newton-type method. The proof is a refinement of the one for [26, Lemma 1].

Lemma 3.1 Suppose that Q_x̄⁻¹(·) is Lipschitz-like on B_{r_ȳ}(ȳ) relative to B_{r_x̄}(x̄) with constant M and that

sup_{x ∈ B_{r_x̄/2}(x̄)} ‖∇f(x) − ∇f(x̄)‖ ≤ ε₀ < min{r_ȳ/(2r_x̄), 1/M}.    (13)

Let x ∈ B_{r_x̄/2}(x̄). Then Q_x⁻¹(·) is Lipschitz-like on B_r̄(ȳ) relative to B_{r_x̄/2}(x̄) with constant M/(1 − Mε₀), that is,

e(Q_x⁻¹(y₁) ∩ B_{r_x̄/2}(x̄), Q_x⁻¹(y₂)) ≤ (M/(1 − Mε₀))‖y₁ − y₂‖ for any y₁, y₂ ∈ B_r̄(ȳ).

Proof Note that (12) and (13) imply r̄ > 0. Now let

y₁, y₂ ∈ B_r̄(ȳ) and x′ ∈ Q_x⁻¹(y₁) ∩ B_{r_x̄/2}(x̄).    (14)

It suffices to show that there exists x″ ∈ Q_x⁻¹(y₂) such that

‖x′ − x″‖ ≤ (M/(1 − Mε₀))‖y₁ − y₂‖.
To this end, we shall verify that there exists a sequence {x_k} ⊂ B_{r_x̄}(x̄) such that

y₂ ∈ f(x) + ∇f(x)(x_{k−1} − x) + ∇f(x̄)(x_k − x_{k−1}) + F(x_k)    (15)

and

‖x_k − x_{k−1}‖ ≤ M‖y₁ − y₂‖(Mε₀)^{k−2}    (16)

hold for each k = 2, 3, 4, …. We proceed by induction on k. Write

z_i := y_i − f(x) − ∇f(x)(x′ − x) + f(x̄) + ∇f(x̄)(x′ − x̄) for each i = 1, 2.

Note by (14) that ‖x′ − x‖ ≤ ‖x′ − x̄‖ + ‖x̄ − x‖ ≤ r_x̄. It follows from (14) and the relation r̄ ≤ r_ȳ − 2ε₀r_x̄ (thanks to (11)) that

‖z_i − ȳ‖ ≤ ‖y_i − ȳ‖ + ‖f(x) − f(x̄) − ∇f(x̄)(x − x̄)‖ + ‖∇f(x) − ∇f(x̄)‖‖x′ − x‖
≤ r̄ + ε₀‖x − x̄‖ + ε₀‖x′ − x‖
≤ r̄ + ε₀(r_x̄/2 + r_x̄) ≤ r_ȳ.

That is, z_i ∈ B_{r_ȳ}(ȳ) for each i = 1, 2. Define x₁ := x′. Then x₁ ∈ Q_x⁻¹(y₁) by (14), and it follows from (8) that y₁ ∈ f(x) + ∇f(x)(x₁ − x) + F(x₁), which can be rewritten as

y₁ + f(x̄) + ∇f(x̄)(x₁ − x̄) ∈ f(x) + ∇f(x)(x₁ − x) + F(x₁) + f(x̄) + ∇f(x̄)(x₁ − x̄).

This, by the definition of z₁, means that z₁ ∈ f(x̄) + ∇f(x̄)(x₁ − x̄) + F(x₁). Hence x₁ ∈ Q_x̄⁻¹(z₁) by (8), and then x₁ ∈ Q_x̄⁻¹(z₁) ∩ B_{r_x̄}(x̄) thanks to (14). By the assumed Lipschitz-like property of Q_x̄⁻¹(·) and noting that z₁, z₂ ∈ B_{r_ȳ}(ȳ), from (10) we infer that there exists x₂ ∈ Q_x̄⁻¹(z₂) with

‖x₂ − x₁‖ ≤ M‖z₁ − z₂‖ = M‖y₁ − y₂‖.

Moreover, by the definition of z₂ and noting x₁ = x′, we have

x₂ ∈ Q_x̄⁻¹(z₂) = Q_x̄⁻¹(y₂ − f(x) − ∇f(x)(x₁ − x) + f(x̄) + ∇f(x̄)(x₁ − x̄)),

which, together with (8), implies that

y₂ ∈ f(x) + ∇f(x)(x₁ − x) + ∇f(x̄)(x₂ − x₁) + F(x₂).

This shows that (15) and (16) are true for the constructed points x₁, x₂. We assume that x₁, x₂, …, x_n are constructed such that (15) and (16) are true for k = 2, 3, …, n. We need to construct x_{n+1} such that (15) and (16) are also true for k = n + 1. For this purpose, we write

z_iⁿ := y₂ − f(x) − ∇f(x)(x_{n+i−1} − x) + f(x̄) + ∇f(x̄)(x_{n+i−1} − x̄) for each i = 0, 1.

Then, by the inductive assumption,

‖z₀ⁿ − z₁ⁿ‖ = ‖(∇f(x̄) − ∇f(x))(x_n − x_{n−1})‖ ≤ ε₀‖x_n − x_{n−1}‖ ≤ ‖y₁ − y₂‖(Mε₀)^{n−1}.    (17)
Since ‖x₁ − x̄‖ ≤ r_x̄/2 and ‖y₁ − y₂‖ ≤ 2r̄ by (14), it follows from (16) that

‖x_n − x̄‖ ≤ Σ_{k=2}^{n} ‖x_k − x_{k−1}‖ + ‖x₁ − x̄‖
≤ 2Mr̄ Σ_{k=2}^{n} (Mε₀)^{k−2} + r_x̄/2
≤ 2Mr̄/(1 − Mε₀) + r_x̄/2.

By (11), we have r̄ ≤ r_x̄(1 − Mε₀)/(4M), and so

‖x_n − x̄‖ ≤ r_x̄.    (18)

Consequently,

‖x_n − x‖ ≤ ‖x_n − x̄‖ + ‖x̄ − x‖ ≤ (3/2)r_x̄.    (19)

Furthermore, using (14) and (19), one has, for each i = 0, 1,

‖z_iⁿ − ȳ‖ ≤ ‖y₂ − ȳ‖ + ‖f(x) − f(x̄) − ∇f(x̄)(x − x̄)‖ + ‖(∇f(x) − ∇f(x̄))(x − x_{n+i−1})‖
≤ r̄ + ε₀(‖x − x̄‖ + ‖x − x_{n+i−1}‖) ≤ r̄ + ε₀(r_x̄/2 + 3r_x̄/2) = r̄ + 2ε₀r_x̄.

It follows from the definition of r̄ in (11) that z_iⁿ ∈ B_{r_ȳ}(ȳ) for each i = 0, 1. Since assumption (15) holds for k = n, we have

y₂ ∈ f(x) + ∇f(x)(x_{n−1} − x) + ∇f(x̄)(x_n − x_{n−1}) + F(x_n),

which can be written as

y₂ + f(x̄) + ∇f(x̄)(x_{n−1} − x̄) ∈ f(x) + ∇f(x)(x_{n−1} − x) + ∇f(x̄)(x_n − x_{n−1}) + F(x_n) + f(x̄) + ∇f(x̄)(x_{n−1} − x̄),

that is, z₀ⁿ ∈ f(x̄) + ∇f(x̄)(x_n − x̄) + F(x_n) by the definition of z₀ⁿ. This, together with (8) and (18), yields x_n ∈ Q_x̄⁻¹(z₀ⁿ) ∩ B_{r_x̄}(x̄). By using (10) again, there exists an element x_{n+1} ∈ Q_x̄⁻¹(z₁ⁿ) such that

‖x_{n+1} − x_n‖ ≤ M‖z₀ⁿ − z₁ⁿ‖ ≤ M‖y₁ − y₂‖(Mε₀)^{n−1},    (20)

where the last inequality holds by (17). By the definition of z₁ⁿ, we have

x_{n+1} ∈ Q_x̄⁻¹(z₁ⁿ) = Q_x̄⁻¹(y₂ − f(x) − ∇f(x)(x_n − x) + f(x̄) + ∇f(x̄)(x_n − x̄)),
which, together with (8), implies

y₂ ∈ f(x) + ∇f(x)(x_n − x) + ∇f(x̄)(x_{n+1} − x_n) + F(x_{n+1}).

This, together with (20), completes the induction step, and the existence of a sequence {x_n} satisfying (15) and (16) is established.

Since Mε₀ < 1, we see from (16) that {x_k} is a Cauchy sequence, and hence it is convergent. Let x″ := lim_{k→∞} x_k. Then, taking the limit in (15) and noting that F has closed graph, we get

y₂ ∈ f(x) + ∇f(x)(x″ − x) + F(x″),

and so x″ ∈ Q_x⁻¹(y₂). Moreover,

‖x′ − x″‖ ≤ lim sup_{n→∞} Σ_{k=2}^{n} ‖x_k − x_{k−1}‖ ≤ lim_{n→∞} Σ_{k=2}^{n} (Mε₀)^{k−2} M‖y₁ − y₂‖ = (M/(1 − Mε₀))‖y₁ − y₂‖.

This completes the proof of the lemma.
For convenience, we define, for each x ∈ X, the mapping Z_x : X → Y by

Z_x(·) := f(x̄) + ∇f(x̄)(· − x̄) − f(x) − ∇f(x)(· − x),

and the set-valued mapping φ_x : X ⇒ X by

φ_x(·) := Q_x̄⁻¹(Z_x(·)).    (21)

Then

‖Z_x(x′) − Z_x(x″)‖ ≤ ‖∇f(x̄) − ∇f(x)‖‖x′ − x″‖ for any x′, x″ ∈ X.    (22)

Our first main theorem, which provides some sufficient conditions ensuring the convergence of the Gauss–Newton-type method with initial point x₀, is as follows.

Theorem 3.1 Suppose that η > 1 and Q_x̄⁻¹(·) is Lipschitz-like on B_{r_ȳ}(ȳ) relative to B_{r_x̄}(x̄) with constant M. Let

ε₀ ≥ sup_{x, x′ ∈ B_{r_x̄/2}(x̄)} ‖∇f(x) − ∇f(x′)‖,    (23)

and let r̄ be defined by (11). Let δ > 0 be such that

(a) δ ≤ min{r_x̄/4, r̄/(3ε₀), r_ȳ/(5ε₀), 1};
(b) ε₀M(1 + 2η) ≤ 1;
(c) ‖ȳ‖ < ε₀δ.
Suppose that

lim_{x→x̄} dist(ȳ, f(x) + F(x)) = 0.    (24)

Then there exists some δ̂ > 0 such that any sequence {x_n} generated by Algorithm 3.2 with initial point in B_δ̂(x̄) converges to a solution x* of (5).

Proof By assumption (b), we obtain

q := ηMε₀/(1 − Mε₀) ≤ (η/(1 + 2η))/(1 − 1/(1 + 2η)) = 1/2.

Take 0 < δ̂ ≤ δ such that

dist(0, f(x₀) + F(x₀)) ≤ ε₀δ for each x₀ ∈ B_δ̂(x̄)    (25)

(noting that such a δ̂ exists by (24) and assumption (c)). Let x₀ ∈ B_δ̂(x̄). We will proceed by induction to show that Algorithm 3.2 generates at least one sequence and that any sequence {x_n} generated by Algorithm 3.2 satisfies the following assertions:

‖x_n − x̄‖ ≤ 2δ    (26)

and

‖x_{n+1} − x_n‖ ≤ q^{n+1}δ    (27)
for each n = 0, 1, 2, …. For this purpose, we define

r_x := (3/2)(ε₀M‖x − x̄‖ + M‖ȳ‖) for each x ∈ X.

By assumptions (b) and (c), we see that 3Mε₀ ≤ 1 and ‖ȳ‖ < ε₀δ. It follows that

r_x < (3/2)(3Mε₀δ) ≤ 2δ for each x ∈ B_{2δ}(x̄).    (28)

Note that (26) is trivial for n = 0. Firstly, we need to show that x₁ exists and that (27) holds for n = 0. To complete this, we have to prove that D(x₀) ≠ ∅ by applying Lemma 2.2 to the mapping φ_{x₀} with η₀ := x̄. Let us check that both assumptions (3) and (4) of Lemma 2.2 hold with r := r_{x₀} and λ := 1/3. Since x̄ ∈ Q_x̄⁻¹(ȳ) by (9), according to the definition of the excess e and the mapping φ_{x₀} in (21), we obtain

dist(x̄, φ_{x₀}(x̄)) ≤ e(Q_x̄⁻¹(ȳ) ∩ B_δ(x̄), φ_{x₀}(x̄)) ≤ e(Q_x̄⁻¹(ȳ) ∩ B_{r_x̄}(x̄), Q_x̄⁻¹(Z_{x₀}(x̄)))    (29)

(note that B_δ(x̄) ⊆ B_{r_x̄}(x̄)). By the choice of ε₀, we see that

‖Z_{x₀}(x) − ȳ‖ = ‖f(x̄) + ∇f(x̄)(x − x̄) − f(x₀) − ∇f(x₀)(x − x₀) − ȳ‖
≤ ‖f(x̄) − f(x₀) − ∇f(x₀)(x̄ − x₀)‖ + ‖(∇f(x₀) − ∇f(x̄))(x̄ − x)‖ + ‖ȳ‖
≤ ε₀(‖x̄ − x₀‖ + ‖x̄ − x‖) + ‖ȳ‖.    (30)

Note that ‖x₀ − x̄‖ ≤ δ̂ ≤ δ, 5δε₀ ≤ r_ȳ by assumption (a), and ‖ȳ‖ < ε₀δ by assumption (c). It follows from (30) that, for each x ∈ B_δ(x̄),

‖Z_{x₀}(x) − ȳ‖ ≤ ε₀(‖x̄ − x₀‖ + ‖x̄ − x‖) + ‖ȳ‖ ≤ 5δε₀ ≤ r_ȳ.

In particular,

Z_{x₀}(x̄) ∈ B_{r_ȳ}(ȳ) and ‖Z_{x₀}(x̄) − ȳ‖ ≤ ε₀‖x̄ − x₀‖ + ‖ȳ‖.

Hence, by (29) and the assumed Lipschitz-like property, we have

dist(x̄, φ_{x₀}(x̄)) ≤ M‖ȳ − Z_{x₀}(x̄)‖ ≤ Mε₀‖x₀ − x̄‖ + M‖ȳ‖ = (1 − 1/3)r_{x₀} = (1 − λ)r,

that is, assumption (3) of Lemma 2.2 is checked.

To fulfill assumption (4) of Lemma 2.2, let x′, x″ ∈ B_{r_{x₀}}(x̄). Then we have x′, x″ ∈ B_{r_{x₀}}(x̄) ⊆ B_{2δ}(x̄) ⊆ B_{r_x̄}(x̄) by (28) and assumption (a), and Z_{x₀}(x′), Z_{x₀}(x″) ∈ B_{r_ȳ}(ȳ) by (30). This, together with the assumed Lipschitz-like property, implies that

e(φ_{x₀}(x′) ∩ B_{r_{x₀}}(x̄), φ_{x₀}(x″)) ≤ e(φ_{x₀}(x′) ∩ B_{r_x̄}(x̄), φ_{x₀}(x″))
= e(Q_x̄⁻¹(Z_{x₀}(x′)) ∩ B_{r_x̄}(x̄), Q_x̄⁻¹(Z_{x₀}(x″)))
≤ M‖Z_{x₀}(x′) − Z_{x₀}(x″)‖.

Applying (22), we get

‖Z_{x₀}(x′) − Z_{x₀}(x″)‖ ≤ ‖∇f(x̄) − ∇f(x₀)‖‖x′ − x″‖ ≤ ε₀‖x′ − x″‖.

Combining the above two inequalities yields

e(φ_{x₀}(x′) ∩ B_{r_{x₀}}(x̄), φ_{x₀}(x″)) ≤ Mε₀‖x′ − x″‖ ≤ (1/3)‖x′ − x″‖ = λ‖x′ − x″‖.

This means that assumption (4) of Lemma 2.2 is also checked. Thus Lemma 2.2 is applicable, and there exists x̂₁ ∈ B_{r_{x₀}}(x̄) satisfying x̂₁ ∈ φ_{x₀}(x̂₁). Hence

0 ∈ f(x₀) + ∇f(x₀)(x̂₁ − x₀) + F(x̂₁),

and so D(x₀) ≠ ∅.

Below we show that (27) also holds for n = 0. Note by (23) that

ε₀ ≥ sup_{x ∈ B_{r_x̄/2}(x̄)} ‖∇f(x) − ∇f(x̄)‖,

and note also that r̄ > 0 by assumption (a). Therefore assumption (13) is satisfied by (12). Since Q_x̄⁻¹(·) is Lipschitz-like on B_{r_ȳ}(ȳ) relative to B_{r_x̄}(x̄), it follows from
Lemma 3.1 that the mapping Q_x⁻¹(·) is Lipschitz-like on B_r̄(ȳ) relative to B_{r_x̄/2}(x̄) with constant M/(1 − Mε₀) for each x ∈ B_{r_x̄/2}(x̄). In particular, Q_{x₀}⁻¹(·) is Lipschitz-like on B_r̄(ȳ) relative to B_{r_x̄/2}(x̄) with constant M/(1 − Mε₀), as x₀ ∈ B_δ̂(x̄) ⊂ B_δ(x̄) ⊂ B_{r_x̄/2}(x̄) by assumption (a) and the choice of δ̂. Furthermore, assumptions (a) and (c) imply that

‖ȳ‖ < ε₀δ ≤ r̄/3,    (31)

and (25) implies that

dist(0, Q_{x₀}(x₀)) = dist(0, f(x₀) + F(x₀)) ≤ ε₀δ ≤ r̄/3.    (32)

Thus Lemma 2.1 is applicable, and by applying it we have

dist(x₀, Q_{x₀}⁻¹(0)) ≤ (M/(1 − Mε₀)) dist(0, Q_{x₀}(x₀))

(noting that x₀ ∈ B_{r_x̄/2}(x̄) as observed earlier and 0 ∈ B_{r̄/3}(ȳ) by (31)). This, together with (7), yields

dist(0, D(x₀)) = dist(x₀, Q_{x₀}⁻¹(0)) ≤ (M/(1 − Mε₀)) dist(0, Q_{x₀}(x₀)).    (33)

According to (6) in Algorithm 3.2 and using (32) and (33), we have

‖x₁ − x₀‖ = ‖d₀‖ ≤ η dist(0, D(x₀)) ≤ (ηM/(1 − Mε₀)) dist(0, Q_{x₀}(x₀)) ≤ ηMε₀δ/(1 − Mε₀) ≤ qδ.
This shows that (27) holds for n = 0. We assume that x₁, x₂, …, x_k are constructed such that (26) and (27) hold for n = 0, 1, …, k − 1. We show that there exists x_{k+1} such that assertions (26) and (27) hold for n = k. Since (26) and (27) are true for each n ≤ k − 1, we have the following inequality:

‖x_k − x̄‖ ≤ Σ_{i=0}^{k−1} ‖d_i‖ + ‖x₀ − x̄‖ ≤ δ Σ_{i=0}^{k−1} q^{i+1} + δ ≤ δq/(1 − q) + δ ≤ 2δ.

This shows that (26) holds for n = k. Now, with almost the same argument as for the case n = 0, we can show that assertion (27) holds for n = k. The proof is complete.

In particular, in the case where x̄ is a solution of (5), that is, ȳ = 0, Theorem 3.1 reduces to the following corollary, which gives a local convergence result for the Gauss–Newton-type method.
Corollary 3.1 Suppose that η > 1, 0 ∈ f(x̄) + F(x̄), and Q_x̄⁻¹(·) is pseudo-Lipschitz around (0, x̄). Let r̃ > 0, and suppose that ∇f is continuous on B_r̃(x̄) and

lim_{x→x̄} dist(0, f(x) + F(x)) = 0.    (34)

Then there exists some δ̂ > 0 such that any sequence {x_n} generated by Algorithm 3.2 with an initial point in B_δ̂(x̄) converges to a solution x* of (5).

Proof Since Q_x̄⁻¹(·) is pseudo-Lipschitz around (0, x̄), there exist constants r₀, r̂_x̄ and M such that Q_x̄⁻¹(·) is Lipschitz-like on B_{r₀}(0) relative to B_{r̂_x̄}(x̄) with constant M. Then, for each 0 < r ≤ r̂_x̄, one has

e(Q_x̄⁻¹(y₁) ∩ B_r(x̄), Q_x̄⁻¹(y₂)) ≤ M‖y₁ − y₂‖ for any y₁, y₂ ∈ B_{r₀}(0),

that is, Q_x̄⁻¹(·) is Lipschitz-like on B_{r₀}(0) relative to B_r(x̄) with constant M. Let ε₀ ∈ ]0, 1[ be such that Mε₀ ≤ 1/(1 + 2η). By the continuity of ∇f, we can choose r_x̄ ∈ ]0, r̂_x̄[ such that r_x̄/2 ≤ r̃, r₀ − 2ε₀r_x̄ > 0, and

ε₀ ≥ sup_{x, x′ ∈ B_{r_x̄/2}(x̄)} ‖∇f(x) − ∇f(x′)‖.

Then

r̄ = min{r₀ − 2ε₀r_x̄, r_x̄(1 − Mε₀)/(4M)} > 0,

and

min{r_x̄/4, r̄/(3ε₀), r₀/(5ε₀)} > 0.

Thus we can choose 0 < δ ≤ 1 such that

δ ≤ min{r_x̄/4, r̄/(3ε₀), r₀/(5ε₀)}.

Now it is routine to check that inequalities (a)–(c) of Theorem 3.1 are satisfied. Thus we can apply Theorem 3.1 to complete the proof.

3.2 Quadratic Convergence

In the following theorem we show that, if the derivative of f is Lipschitz continuous around x̄, then any sequence generated by Algorithm 3.2 with initial point near x̄ converges quadratically.

Theorem 3.2 Suppose that Q_x̄⁻¹(·) is Lipschitz-like on B_{r_ȳ}(ȳ) relative to B_{r_x̄}(x̄) with constant M, and that ∇f is Lipschitz continuous on B_{r_x̄/2}(x̄) with Lipschitz constant L. Let η > 1 and let

r̄ := min{r_ȳ − 2Lr_x̄², r_x̄(1 − MLr_x̄)/(4M)}.
Let δ > 0 be such that

(a) δ ≤ min{r_x̄/4, r_ȳ/(11L), r̄/6, 1};
(b) (M + 1)L(ηδ + 2r_x̄) ≤ 2;
(c) ‖ȳ‖ < Lδ²/4.

Suppose that (24) holds. Then there exists some δ̂ > 0 such that any sequence {x_n} generated by Algorithm 3.2 with an initial point in B_δ̂(x̄) converges quadratically to a solution x* of (5).

Proof By assumption (b), one sees that

q := LηMδ/(2(1 − MLr_x̄)) ≤ 1.

Select a δ̂ ∈ ]0, δ] with

dist(0, f(x₀) + F(x₀)) ≤ Lδ²/4 for each x₀ ∈ B_δ̂(x̄)

(noting that such a δ̂ exists by (24) and assumption (c)). Let x₀ ∈ B_δ̂(x̄). As in the proof of Theorem 3.1, we use induction to show that Algorithm 3.2 generates at least one sequence and that any sequence {x_n} generated by Algorithm 3.2 satisfies the following assertions:

‖x_n − x̄‖ ≤ 2δ    (35)

and

‖d_n‖ ≤ q(1/2)^{2ⁿ}δ    (36)

for each n = 0, 1, 2, …. For this purpose, we define

r_x := (9/10)(ML‖x − x̄‖² + 2M‖ȳ‖) for each x ∈ X.

Due to η > 1 and δ ≤ r_x̄/4 in assumption (a), it follows from assumption (b) that

9(M + 1)Lδ = (M + 1)L(δ + 8δ) ≤ (M + 1)L(ηδ + 2r_x̄) ≤ 2.

This gives

MLδ ≤ 2/9 and Lδ ≤ 2/9,

and so ‖ȳ‖ < Lδ²/4 ≤ … and r̄ > 0 by assumption (a). Therefore (12) and (41) imply that assumption (13) is satisfied with ε₀ := Lr_x̄. By the choice of δ̂ and assumption (a), one has x₀ ∈ B_δ̂(x̄) ⊂ B_δ(x̄) ⊂ B_{r_x̄/2}(x̄). Since Q_x̄⁻¹(·) is Lipschitz-like on B_{r_ȳ}(ȳ) relative to B_{r_x̄}(x̄), it follows from Lemma 3.1 that Q_{x₀}⁻¹(·) is Lipschitz-like on B_r̄(ȳ) relative to B_{r_x̄/2}(x̄) with constant M/(1 − MLr_x̄). The remainder of the proof is similar to the corresponding part of the proof of Theorem 3.1, and so we omit it.
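The quadratic rate asserted here can be observed numerically in the simplest instance F := {0}, X = Y = R (our own illustration, not an example from the paper): for f(x) = x² − 2 the set D(x_k) is the singleton {−f(x_k)/f′(x_k)}, and the error is roughly squared at each step.

```python
def newton_errors(x0, steps=4):
    """Errors |x_k - sqrt(2)| of Algorithm 3.2 in the case F := {0} with
    f(x) = x^2 - 2, where D(x_k) = {-f(x_k)/f'(x_k)} is a singleton,
    so the iteration is independent of eta."""
    root = 2.0 ** 0.5
    x, errs = float(x0), []
    for _ in range(steps):
        x = x - (x * x - 2.0) / (2.0 * x)   # the unique d_k
        errs.append(abs(x - root))
    return errs
```

Starting from x₀ = 1.5, each error is approximately the square of the previous one divided by 2√2, the quadratic pattern of Theorem 3.2.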
Consider the special case where x̄ is a solution of (5) (that is, ȳ = 0) in Theorem 3.2. The following corollary describes the local quadratic convergence of the Gauss–Newton-type method. The proof is similar to that of Corollary 3.1.

Corollary 3.2 Suppose that η > 1, 0 ∈ f(x̄) + F(x̄), and Q_x̄⁻¹(·) is pseudo-Lipschitz around (0, x̄). Let r̃ > 0, and suppose that ∇f is Lipschitz continuous on B_r̃(x̄) with Lipschitz constant L and that (34) holds. Then there exists some δ̂ > 0 such that any sequence {x_n} generated by Algorithm 3.2 with initial point in B_δ̂(x̄) converges quadratically to a solution x* of (5).

Remark 3.1 For the case where η = 1, the question whether the results remain true for the Gauss–Newton-type method is more delicate. However, from the proofs of the main theorems, one sees that all the results in the present paper remain true provided that, for any x ∈ Ω with D(x) ≠ ∅, there exists d̄ ∈ D(x) such that ‖d̄‖ = min_{d∈D(x)} ‖d‖. The following proposition provides some sufficient conditions
ensuring the existence of such a d̄ ∈ D(x). Thus, by Proposition 3.1, Theorems 3.1 and 3.2 as well as Corollaries 3.1 and 3.2 remain true for η = 1 provided that X is finite dimensional, or that X is reflexive and the graph of F is convex.

Proposition 3.1 Suppose that X is finite dimensional, or that X is reflexive and the graph of F is convex. Then, for any x ∈ Ω with D(x) ≠ ∅, there exists d̄ ∈ D(x) such that ‖d̄‖ = min_{d∈D(x)} ‖d‖.

Proof Let x ∈ Ω be such that D(x) ≠ ∅. By assumption, D(x) is closed. Hence the conclusion holds trivially if X is finite dimensional. Below we consider the case where X is reflexive and the graph of F is convex. Then the set-valued mapping T(·) := f(x) + ∇f(x)(· − x) + F(·) has closed convex graph. Hence D(x) = T⁻¹(0) − x is weakly closed and convex. This means that the intersection of D(x) with any closed ball is weakly compact; since the norm is weakly lower semicontinuous, it attains its minimum over D(x), and the conclusion follows.
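When X = Rⁿ and F := {0}, the set D(x) = {d : ∇f(x)d = −f(x)} is a closed affine subspace, so the minimal-norm element d̄ of Remark 3.1 exists and can be computed explicitly; the following NumPy sketch is our own illustration.

```python
import numpy as np

def min_norm_step(J, fx):
    """Minimal-norm element of D(x) = {d : J d = -fx} (case F := {0}, X = R^n).
    For a consistent linear system, np.linalg.lstsq returns the least-squares
    solution of minimal Euclidean norm, i.e. the d-bar of Proposition 3.1."""
    d, *_ = np.linalg.lstsq(np.asarray(J, dtype=float),
                            -np.asarray(fx, dtype=float), rcond=None)
    return d
```

For the underdetermined system d₁ + d₂ = −2, every point of the solution line belongs to D(x), and the minimal-norm element is (−1, −1).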
4 Concluding Remarks

We have established semi-local convergence and local convergence results for the Gauss–Newton-type method with η > 1 under the assumptions that Q_x̄⁻¹(·) is Lipschitz-like and ∇f is continuous. In particular, if ∇f is additionally Lipschitz continuous, we have further shown that the Gauss–Newton-type method is quadratically convergent. As noted in Remark 3.1, our results remain true for η = 1 in some special cases. The Gauss–Newton-type method and the established convergence results seem new for the generalized equation problem (5).

Acknowledgements The authors thank the referees and the associate editor for their valuable comments and constructive suggestions, which improved the presentation of this manuscript. Research work of the first author is fully supported by the Chinese Scholarship Council, and research work of the third author is partially supported by the National Natural Science Foundation (grant 11171300) and the Zhejiang Provincial Natural Science Foundation (grant Y6110006) of China.
References

1. Robinson, S.M.: Generalized equations and their solutions, part I: basic theory. Math. Program. Stud. 10, 128–141 (1979)
2. Robinson, S.M.: Generalized equations and their solutions, part II: applications to nonlinear programming. Math. Program. Stud. 19, 200–221 (1982)
3. Ferris, M.C., Pang, J.S.: Engineering and economic applications of complementarity problems. SIAM Rev. 39, 669–713 (1997)
4. Dontchev, A.L.: Local convergence of the Newton method for generalized equation. C. R. Acad. Sci. Paris, Sér. I 322, 327–331 (1996)
5. Dedieu, J.P., Kim, M.H.: Newton's method for analytic systems of equations with constant rank derivatives. J. Complex. 18, 187–209 (2002)
6. Dedieu, J.P., Shub, M.: Newton's method for overdetermined systems of equations. Math. Comput. 69, 1099–1115 (2000)
7. Li, C., Zhang, W.H., Jin, X.Q.: Convergence and uniqueness properties of Gauss–Newton's method. Comput. Math. Appl. 47, 1057–1067 (2004)
8. He, J.S., Wang, J.H., Li, C.: Newton's method for underdetermined systems of equations under the modified γ-condition. Numer. Funct. Anal. Optim. 28, 663–679 (2007)
9. Xu, X.B., Li, C.: Convergence of Newton's method for systems of equations with constant rank derivatives. J. Comput. Math. 25, 705–718 (2007)
10. Xu, X.B., Li, C.: Convergence criterion of Newton's method for singular systems with constant rank derivatives. J. Math. Anal. Appl. 345, 689–701 (2008)
11. Robinson, S.M.: Extension of Newton's method to nonlinear functions with values in a cone. Numer. Math. 19, 341–347 (1972)
12. Li, C., Ng, K.F.: Majorizing functions and convergence of the Gauss–Newton method for convex composite optimization. SIAM J. Optim. 18, 613–642 (2007)
13. Aubin, J.P.: Lipschitz behavior of solutions to convex minimization problems. Math. Oper. Res. 9, 87–111 (1984)
14. Jean-Alexis, C., Piétrus, A.: On the convergence of some methods for variational inclusions. Rev. R. Acad. Cienc. Ser. A Mat. 102, 355–361 (2008)
15. Argyros, I.K., Hilout, S.: Local convergence of Newton-like methods for generalized equations. Appl. Math. Comput. 197, 507–514 (2008)
16. Dontchev, A.L.: The Graves theorem revisited. J. Convex Anal. 3, 45–53 (1996)
17. Hilout, S., Piétrus, A.: A semilocal convergence of a secant-type method for solving generalized equations. Positivity 10, 693–700 (2006)
18. Piétrus, A.: Does Newton's method for set-valued maps converges uniformly in mild differentiability context? Rev. Colomb. Mat. 34, 49–56 (2000)
19. Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation I: Basic Theory. Springer, Berlin (2006)
20. Aubin, J.P., Frankowska, H.: Set-Valued Analysis. Birkhäuser, Boston (1990)
21. Mordukhovich, B.S.: Sensitivity analysis in nonsmooth optimization. In: Field, D.A., Komkov, V. (eds.) Theoretical Aspects of Industrial Design. SIAM Proc. Appl. Math., vol. 58, pp. 32–46 (1992)
22. Penot, J.P.: Metric regularity, openness and Lipschitzian behavior of multifunctions. Nonlinear Anal. 13, 629–643 (1989)
23. Dontchev, A.L., Hager, W.W.: An inverse mapping theorem for set-valued maps. Proc. Am. Math. Soc. 121, 481–489 (1994)
24. Burke, J.V., Ferris, M.C.: A Gauss–Newton method for convex composite optimization. Math. Program. 71, 179–194 (1995)
25. Li, C., Wang, X.H.: On convergence of the Gauss–Newton method for convex composite optimization. Math. Program., Ser. A 91, 349–356 (2002)
26. Dontchev, A.L.: Uniform convergence of the Newton method for Aubin continuous maps. Serdica Math. J. 22, 385–398 (1996)