J Optim Theory Appl (2014) 163:105–129 DOI 10.1007/s10957-013-0489-z
Inexact Alternating-Direction-Based Contraction Methods for Separable Linearly Constrained Convex Optimization

Guoyong Gu · Bingsheng He · Junfeng Yang
Received: 19 February 2013 / Accepted: 17 November 2013 / Published online: 13 December 2013 © Springer Science+Business Media New York 2013
Abstract The alternating direction method of multipliers has been well studied in the context of linearly constrained convex optimization. In the last few years, we have witnessed a number of novel applications arising from image processing, compressive sensing, statistics, etc., where the approach is surprisingly efficient. In the early applications, the objective function of the linearly constrained convex optimization problem is separable into two parts. Recently, the alternating direction method of multipliers has been extended to the case where the number of separable parts in the objective function is finite. However, in each iteration, the subproblems are required to be solved exactly. In this paper, by introducing some reasonable inexactness criteria, we propose two inexact alternating-direction-based contraction methods, which substantially broaden the applicable scope of the approach. The convergence and complexity results for both methods are derived in the framework of variational inequalities.

Keywords Alternating direction method of multipliers · Separable linearly constrained convex optimization · Contraction method · Complexity
Communicated by Panos M. Pardalos.
G. Gu (✉) · B. He · J. Yang
Department of Mathematics, Nanjing University, Nanjing 210093, China
e-mail: [email protected]

B. He
e-mail: [email protected]

J. Yang
e-mail: [email protected]
1 Introduction

Because of its significant efficiency and easy implementation, the alternating direction method of multipliers (ADMM) has attracted wide attention in various areas [1, 2]. In particular, some novel and attractive applications of the ADMM have been discovered very recently; e.g., total-variation regularization problems in image processing [3], $\ell_1$-norm minimization in compressive sensing [4], semidefinite optimization problems [5], the covariance selection problem and semidefinite least squares problem in statistics [6, 7], the sparse and low-rank recovery problem in engineering [8], etc.

The paper is organized as follows. As a preparation, in the next section, we give a brief review of the ADMM, which serves as a motivation of our inexact alternating-direction-based contraction methods. Some basic properties of the projection mappings and variational inequalities are recalled as well. Then, in Sect. 3 we present two contraction methods, which are based on two descent directions generated from an inexact alternating minimization of $\mathcal{L}_A$ defined in (4). The rationale of the two descent search directions follows in Sect. 4, and the convergence and complexity results are proved in Sects. 5 and 6, respectively. Finally, some conclusions are drawn in Sect. 7.
2 Preliminaries

Before introducing the ADMM, we briefly review the augmented Lagrangian method (ALM), for which we consider the following linearly constrained convex optimization problem:
$$\min \{\theta(x) \mid Ax = b,\ x \in \mathcal{X}\}. \tag{1}$$
Here $\theta(x): \mathbb{R}^n \to \mathbb{R}$ is a convex function (not necessarily smooth), $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $\mathcal{X} \subseteq \mathbb{R}^n$ is a closed and convex set. For solving problem (1), the classical ALM generates a sequence of iterates via the following scheme:
$$\begin{cases} x^{k+1} = \arg\min\{\theta(x) - \langle \lambda^k, Ax - b\rangle + \frac{1}{2}\|Ax - b\|_H^2 \mid x \in \mathcal{X}\},\\[2pt] \lambda^{k+1} = \lambda^k - H(Ax^{k+1} - b), \end{cases}$$
where $H \in \mathbb{R}^{m \times m}$ is a positive definite scaling matrix penalizing the violation of the linear constraints, and $\lambda^k \in \mathbb{R}^m$ is the associated Lagrange multiplier; see, e.g., [9, 10] for more details.

An important special scenario of (1), which captures concrete applications in many fields [6, 11–13], is the following case, where the objective function is separable into two parts:
$$\min \{\theta_1(x_1) + \theta_2(x_2) \mid A_1 x_1 + A_2 x_2 = b,\ x_i \in \mathcal{X}_i,\ i = 1, 2\}. \tag{2}$$
Here $n_1 + n_2 = n$, and, for $i = 1, 2$, $\theta_i: \mathbb{R}^{n_i} \to \mathbb{R}$ are convex functions (not necessarily smooth), $A_i \in \mathbb{R}^{m \times n_i}$, and $\mathcal{X}_i \subseteq \mathbb{R}^{n_i}$ are closed and convex sets. For solving (2), the ADMM, which dates back to [14] and is closely related to the Douglas–Rachford
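As a concrete illustration of the ALM scheme above, the following sketch specializes (1) to $\theta(x) = \frac{1}{2}\|x - c\|^2$ with $\mathcal{X} = \mathbb{R}^n$ and $H = \beta I$, so that the $x$-subproblem has a closed-form solution. The data `A`, `b`, `c` and the penalty `beta` are illustrative assumptions, not taken from the paper.

```python
# Classical ALM sketch for min 0.5*||x - c||^2 s.t. Ax = b (X = R^n, H = beta*I).
# With this quadratic objective the x-subproblem is a linear system:
# (I + beta*A^T A) x = c + A^T lam + beta*A^T b.
import numpy as np

def alm(A, b, c, beta=1.0, iters=100):
    m, n = A.shape
    x, lam = np.zeros(n), np.zeros(m)
    K = np.eye(n) + beta * A.T @ A          # Hessian of the x-subproblem
    for _ in range(iters):
        x = np.linalg.solve(K, c + A.T @ lam + beta * A.T @ b)
        lam = lam - beta * (A @ x - b)      # multiplier update with H = beta*I
    return x, lam

rng = np.random.default_rng(0)
A, b, c = rng.standard_normal((3, 6)), rng.standard_normal(3), rng.standard_normal(6)
x, lam = alm(A, b, c)
print(np.linalg.norm(A @ x - b))            # feasibility residual decays with k
```

After each iteration, the pair satisfies $x = c + A^T\lambda$ exactly, which is the KKT stationarity condition of this toy instance; only feasibility $Ax = b$ is approached in the limit.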
operator splitting method [15], is perhaps one of the most popular methods. Given $(x_2^k, \lambda^k)$, the ADMM generates $(x_1^{k+1}, x_2^{k+1}, \lambda^{k+1})$ via the following scheme:
$$\begin{cases} x_1^{k+1} = \arg\min\{\theta_1(x_1) - \langle \lambda^k, A_1 x_1 + A_2 x_2^k - b\rangle + \frac{1}{2}\|A_1 x_1 + A_2 x_2^k - b\|_H^2 \mid x_1 \in \mathcal{X}_1\},\\[2pt] x_2^{k+1} = \arg\min\{\theta_2(x_2) - \langle \lambda^k, A_1 x_1^{k+1} + A_2 x_2 - b\rangle + \frac{1}{2}\|A_1 x_1^{k+1} + A_2 x_2 - b\|_H^2 \mid x_2 \in \mathcal{X}_2\},\\[2pt] \lambda^{k+1} = \lambda^k - H(A_1 x_1^{k+1} + A_2 x_2^{k+1} - b). \end{cases}$$
Therefore, the ADMM can be viewed as a practical and structure-exploiting variant (in a split or relaxed form) of the classical ALM for solving the separable problem (2), with the adaptation of minimizing the involved separable variables separately in an alternating order.

In this paper, we consider a more general separable case of (1) in the sense that the objective function is separable into finitely many parts:
$$\min \left\{ \sum_{i=1}^N \theta_i(x_i) \;\middle|\; \sum_{i=1}^N A_i x_i = b,\ x_i \in \mathcal{X}_i,\ i = 1, \dots, N \right\}, \tag{3}$$
where $\sum_{i=1}^N n_i = n$, and, for $i = 1, \dots, N$, $\theta_i: \mathbb{R}^{n_i} \to \mathbb{R}$ are convex functions (not necessarily smooth), $A_i \in \mathbb{R}^{m \times n_i}$, $b \in \mathbb{R}^m$, and $\mathcal{X}_i \subseteq \mathbb{R}^{n_i}$ are closed and convex sets. Without loss of generality, we assume that the solution set of (3) is nonempty.

Recently, the ADMM was extended to handle (3) by He et al. [16]. For convenience, we denote the augmented Lagrangian function of (3) by
$$\mathcal{L}_A(x_1, \dots, x_N, \lambda) := \sum_{i=1}^N \theta_i(x_i) - \left\langle \lambda,\ \sum_{i=1}^N A_i x_i - b \right\rangle + \frac{1}{2}\left\| \sum_{i=1}^N A_i x_i - b \right\|_H^2, \tag{4}$$
where $H \in \mathbb{R}^{m \times m}$ is a symmetric positive definite scaling matrix, and $\lambda \in \mathbb{R}^m$ is the Lagrangian multiplier. Given $(x_1^k, \dots, x_N^k, \lambda^k)$, the algorithm in [16] generates the next iterate $(x_1^{k+1}, \dots, x_N^{k+1}, \lambda^{k+1})$ in two steps. First, the algorithm produces a trial point $(\tilde{x}_1^k, \dots, \tilde{x}_N^k, \tilde{\lambda}^k)$ by the following alternating minimization scheme:
$$\begin{cases} \tilde{x}_i^k = \arg\min\{\mathcal{L}_A(\tilde{x}_1^k, \dots, \tilde{x}_{i-1}^k, x_i, x_{i+1}^k, \dots, x_N^k, \lambda^k) \mid x_i \in \mathcal{X}_i\}, \quad i = 1, \dots, N,\\[2pt] \tilde{\lambda}^k = \lambda^k - H\left( \sum_{i=1}^N A_i \tilde{x}_i^k - b \right). \end{cases} \tag{5}$$
Then, the next iterate $(x_1^{k+1}, \dots, x_N^{k+1}, \lambda^{k+1})$ is obtained from $(x_1^k, \dots, x_N^k, \lambda^k)$ and the trial point $(\tilde{x}_1^k, \dots, \tilde{x}_N^k, \tilde{\lambda}^k)$. To the best of our knowledge, this is the first time that tractable algorithms for (3) based on the full utilization of their separable structure have been developed. As demonstrated in [16], the resulting method falls into the frameworks of both the descent-like methods, in the sense that the iterates generated by the extended ADMM scheme (5) can be used to construct descent directions, and the contraction-type methods (according to the definition in [17]), as the distance between the iterates and the solution set of (3) is monotonically decreasing. Therefore, the method is called the alternating-direction-based contraction method.
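To make the trial-point scheme (5) concrete, the sketch below performs one Gauss–Seidel sweep for a toy instance with $\theta_i(x_i) = \frac{1}{2}\|x_i - c_i\|^2$, $A_i = I$, $\mathcal{X}_i = \mathbb{R}^n$, and $H = \beta I$, in which each block minimization has a closed form. All data and names are illustrative assumptions, not from the paper.

```python
# One sweep of the alternating minimization scheme (5) for a toy quadratic model:
# block i minimizes L_A with blocks j < i already updated and blocks j > i at
# their old values, followed by the multiplier update.
import numpy as np

def trial_point(xs, lam, cs, b, beta=1.0):
    N = len(xs)
    xt = [x.copy() for x in xs]
    for i in range(N):
        others = sum(xt[j] for j in range(i)) + sum(xs[j] for j in range(i + 1, N))
        # argmin_x 0.5*||x - c_i||^2 - <lam, x> + 0.5*beta*||x + others - b||^2
        xt[i] = (cs[i] + lam - beta * (others - b)) / (1.0 + beta)
    lam_t = lam - beta * (sum(xt) - b)      # multiplier part of (5)
    return xt, lam_t
```

Each block update can be checked against its own first-order optimality condition, since the subproblem is an unconstrained strongly convex quadratic.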
From a practical point of view, however, in many cases, solving each subproblem in (5) accurately is either expensive or even impossible. On the other hand, there seems to be little justification of the effort required to calculate the accurate solutions at each iteration. In this paper, we develop two inexact methods for solving problem (3), in which the subproblems in (5) are solved inexactly. So, the methods presented in this paper are named inexact alternating-direction-based contraction methods.

In the sequel, we briefly review some basic properties and related definitions that will be used in the forthcoming analysis. We consider the generally separable linearly constrained convex optimization problem (3). For $i = 1, \dots, N$, we let $f_i(x_i) \in \partial(\theta_i(x_i))$, where $\partial(\theta_i(x_i))$ denotes the subdifferential of $\theta_i$ at $x_i$. Moreover, we let $\mathcal{W} := \mathcal{X}_1 \times \cdots \times \mathcal{X}_N \times \mathbb{R}^m$. Then, it is evident that the first-order optimality condition of (3) is equivalent to the following variational inequality of finding $(x_1^*, \dots, x_N^*, \lambda^*) \in \mathcal{W}$:
$$\begin{cases} \langle x_i - x_i^*,\ f_i(x_i^*) - A_i^T \lambda^* \rangle \geq 0, & i = 1, \dots, N,\\[2pt] \left\langle \lambda - \lambda^*,\ \sum_{i=1}^N A_i x_i^* - b \right\rangle \geq 0, \end{cases} \quad \forall (x_1, \dots, x_N, \lambda) \in \mathcal{W}. \tag{6}$$
By defining
$$w := \begin{pmatrix} x_1 \\ \vdots \\ x_N \\ \lambda \end{pmatrix} \quad \text{and} \quad F(w) := \begin{pmatrix} f_1(x_1) - A_1^T \lambda \\ \vdots \\ f_N(x_N) - A_N^T \lambda \\ \sum_{i=1}^N A_i x_i - b \end{pmatrix},$$
the problem (6) can be rewritten in the more compact form
$$w^* \in \mathcal{W}, \quad \langle w - w^*,\ F(w^*) \rangle \geq 0 \quad \forall w \in \mathcal{W},$$
which we denote by VI$(\mathcal{W}, F)$. Recall that $F(w)$ is said to be monotone iff
$$\langle u - v,\ F(u) - F(v) \rangle \geq 0 \quad \forall u, v \in \mathcal{W}.$$
One can easily verify that $F(w)$ is monotone whenever all $f_i$, $i = 1, \dots, N$, are monotone. Under the nonemptiness assumption on the solution set of (3), the solution set, $\mathcal{W}^*$, of VI$(\mathcal{W}, F)$ is nonempty and convex.

Given a positive definite matrix $G$ of size $(n+m) \times (n+m)$, we define the $G$-norm of $u \in \mathbb{R}^{n+m}$ as $\|u\|_G = \sqrt{\langle u, Gu \rangle}$. The projection onto $\mathcal{W}$ under the $G$-norm is defined as
$$P_{\mathcal{W},G}(v) := \arg\min\{\|v - w\|_G \mid w \in \mathcal{W}\}.$$
From the above definition it follows that
$$\langle v - P_{\mathcal{W},G}(v),\ G(w - P_{\mathcal{W},G}(v)) \rangle \leq 0 \quad \forall v \in \mathbb{R}^{n+m},\ \forall w \in \mathcal{W}.$$
Consequently, we have
$$\|P_{\mathcal{W},G}(u) - P_{\mathcal{W},G}(v)\|_G \leq \|u - v\|_G \quad \forall u, v \in \mathbb{R}^{n+m} \tag{7}$$
and
$$\|P_{\mathcal{W},G}(v) - w\|_G^2 \leq \|v - w\|_G^2 - \|v - P_{\mathcal{W},G}(v)\|_G^2 \quad \forall v \in \mathbb{R}^{n+m},\ \forall w \in \mathcal{W}. \tag{8}$$
An important property of the projection is contained in the following lemma, for which the omitted proof can be found in [18, p. 267].

Lemma 2.1 Let $\mathcal{W}$ be a closed convex set in $\mathbb{R}^{n+m}$, and let $G$ be any $(n+m) \times (n+m)$ positive definite matrix. Then $w^*$ is a solution of VI$(\mathcal{W}, F)$ if and only if
$$w^* = P_{\mathcal{W},G}\big( w^* - \alpha G^{-1} F(w^*) \big) \quad \forall \alpha > 0.$$
In other words, we have
$$w^* = P_{\mathcal{W},G}\big( w^* - \alpha G^{-1} F(w^*) \big) \quad \Longleftrightarrow \quad w^* \in \mathcal{W}, \quad \langle w - w^*,\ F(w^*) \rangle \geq 0 \quad \forall w \in \mathcal{W}. \tag{9}$$
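As a quick numerical illustration of the nonexpansiveness property (7), consider a diagonal positive definite $G$ and a box $\mathcal{W} = [l, u]$: the $G$-norm projection then separates coordinatewise and reduces to ordinary clipping. The diagonal $G$ and the box are illustrative assumptions, not from the paper.

```python
# Numerical check of property (7): for diagonal positive definite G and a box
# W = [l, u], the G-norm projection separates coordinatewise, so it reduces to
# componentwise clipping and is nonexpansive in the G-norm.
import numpy as np

def proj_box_G(v, l, u):
    return np.clip(v, l, u)                 # argmin ||v - w||_G over the box

def g_norm(x, g):
    return np.sqrt(np.sum(g * x * x))       # ||x||_G with G = diag(g)

rng = np.random.default_rng(1)
g = rng.uniform(0.5, 2.0, 5)                # diagonal of G
l, u = -np.ones(5), np.ones(5)
p, q = 3 * rng.standard_normal(5), 3 * rng.standard_normal(5)
lhs = g_norm(proj_box_G(p, l, u) - proj_box_G(q, l, u), g)
rhs = g_norm(p - q, g)
print(lhs <= rhs)
```

The inequality holds because clipping is 1-Lipschitz in each coordinate, and the diagonal weighting preserves the componentwise bound.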
3 Two Contraction Methods

In this section, we present two algorithms, each of which consists of a search direction and a step length. The search directions are based on the inexact alternating minimization of $\mathcal{L}_A$, while both algorithms use the same step length.

3.1 Search Directions

For given $w^k = (x_1^k, \dots, x_N^k, \lambda^k)$, the alternating direction scheme of the $k$th iteration generates a trial point $\tilde{w}^k = (\tilde{x}_1^k, \dots, \tilde{x}_N^k, \tilde{\lambda}^k)$ via the following procedure. For $i = 1, \dots, N$, $\tilde{x}_i^k$ is computed as
$$\tilde{x}_i^k = P_{\mathcal{X}_i}\left[ \tilde{x}_i^k - \left( f_i(\tilde{x}_i^k) - A_i^T \lambda^k + A_i^T H \left( \sum_{j=1}^i A_j \tilde{x}_j^k + \sum_{j=i+1}^N A_j x_j^k - b \right) + \xi_i^k \right) \right], \tag{10}$$
where
$$\|\xi_i^k\| \leq \big\|A_i(x_i^k - \tilde{x}_i^k)\big\|_H \quad \text{and} \quad \langle x_i^k - \tilde{x}_i^k,\ \xi_i^k \rangle \leq \frac{1}{4}\big\|A_i(x_i^k - \tilde{x}_i^k)\big\|_H^2. \tag{11}$$
Then, we set
$$\tilde{\lambda}^k = \lambda^k - H\left( \sum_{j=1}^N A_j \tilde{x}_j^k - b \right). \tag{12}$$
We claim that the above approximation scheme is appropriate for inexactly solving the minimization subproblems, i.e.,
$$\min\left\{ \mathcal{L}_A(\tilde{x}_1^k, \dots, \tilde{x}_{i-1}^k, x_i, x_{i+1}^k, \dots, x_N^k, \lambda^k) \mid x_i \in \mathcal{X}_i \right\}. \tag{13}$$
Assume that $\bar{x}_i^k$ is an optimal solution of (13). By the definition of $\mathcal{L}_A$ in (4), the first-order optimality condition of (13) reduces to finding $\bar{x}_i^k \in \mathcal{X}_i$ such that
$$\left\langle x_i - \bar{x}_i^k,\ f_i(\bar{x}_i^k) - A_i^T \lambda^k + A_i^T H\left( \sum_{j=1}^{i-1} A_j \tilde{x}_j^k + A_i \bar{x}_i^k + \sum_{j=i+1}^N A_j x_j^k - b \right) \right\rangle \geq 0 \quad \forall x_i \in \mathcal{X}_i,$$
where $f_i(\bar{x}_i^k)$ is a subgradient of $\theta_i(x_i)$ at $\bar{x}_i^k$, i.e., $f_i(\bar{x}_i^k) \in \partial(\theta_i(\bar{x}_i^k))$. According to Lemma 2.1, the above variational inequality is equivalent to
$$\bar{x}_i^k = P_{\mathcal{X}_i}\left[ \bar{x}_i^k - \left( f_i(\bar{x}_i^k) - A_i^T \lambda^k + A_i^T H\left( \sum_{j=1}^{i-1} A_j \tilde{x}_j^k + A_i \bar{x}_i^k + \sum_{j=i+1}^N A_j x_j^k - b \right) \right) \right].$$
If $\tilde{x}_i^k$ is an optimal solution of (13), then by setting $\xi_i^k = 0$ in (10) the approximation conditions in (11) are satisfied. In fact, if $\|A_i(x_i^k - \bar{x}_i^k)\|_H = 0$, then the approximation conditions in (11) require that the corresponding subproblem (10) be solved exactly. Thus, without loss of generality, we may assume that
$$\big\|A_i(x_i^k - \bar{x}_i^k)\big\|_H \neq 0.$$
Suppose that $\hat{x}_i^k$ is an approximate solution of (13), i.e.,
$$\hat{x}_i^k \approx P_{\mathcal{X}_i}\left[ \hat{x}_i^k - \left( f_i(\hat{x}_i^k) - A_i^T \lambda^k + A_i^T H\left( \sum_{j=1}^{i-1} A_j \tilde{x}_j^k + A_i \hat{x}_i^k + \sum_{j=i+1}^N A_j x_j^k - b \right) \right) \right].$$
By setting
$$\tilde{x}_i^k = P_{\mathcal{X}_i}\left[ \hat{x}_i^k - \left( f_i(\hat{x}_i^k) - A_i^T \lambda^k + A_i^T H\left( \sum_{j=1}^{i-1} A_j \tilde{x}_j^k + A_i \hat{x}_i^k + \sum_{j=i+1}^N A_j x_j^k - b \right) \right) \right],$$
simple manipulation shows that (10) holds with
$$\xi_i^k = \tilde{x}_i^k - \hat{x}_i^k - \big( f_i(\tilde{x}_i^k) - f_i(\hat{x}_i^k) \big) - A_i^T H A_i \big( \tilde{x}_i^k - \hat{x}_i^k \big).$$
When $\hat{x}_i^k$ is sufficiently close to the exact solution $\bar{x}_i^k$, the relations between $\bar{x}_i^k$, $\hat{x}_i^k$, and $\tilde{x}_i^k$ ensure that
$$\hat{x}_i^k \approx \tilde{x}_i^k \quad \text{and} \quad \big\|A_i(x_i^k - \tilde{x}_i^k)\big\|_H \approx \big\|A_i(x_i^k - \bar{x}_i^k)\big\|_H.$$
Therefore, for a suitable approximate solution $\hat{x}_i^k$, we can define an $\tilde{x}_i^k$ such that the inexactness conditions (10) and (11) are satisfied.
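In an implementation, the inexactness criteria (11) can be verified directly for each block once a candidate $\tilde{x}_i^k$ and the associated residual $\xi_i^k$ are at hand. The following helper is a sketch under illustrative data; the function name and its inputs are assumptions, not from the paper.

```python
# Sketch of a direct check of the inexactness criteria (11) for a single block:
# ||xi|| <= ||A_i(x_i^k - x~_i^k)||_H  and
# <x_i^k - x~_i^k, xi> <= 0.25 * ||A_i(x_i^k - x~_i^k)||_H^2.
import numpy as np

def criteria_ok(x, x_tilde, xi, A, H):
    d = A @ (x - x_tilde)
    h_sq = d @ H @ d                        # ||A_i(x_i^k - x~_i^k)||_H^2
    return (np.linalg.norm(xi) <= np.sqrt(h_sq)
            and (x - x_tilde) @ xi <= 0.25 * h_sq)
```

An exact solve corresponds to $\xi_i^k = 0$, which always passes; a residual that is too large relative to $\|A_i(x_i^k - \tilde{x}_i^k)\|_H$ fails the first condition.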
Instead of accepting $\tilde{w}^k$ as a new iterate, we use it to generate descent search directions. In the sequel, we define the matrix $M$ and the vector $\xi^k$ as
$$M := \begin{pmatrix} A_1^T H A_1 & 0 & \cdots & 0 & 0\\ A_2^T H A_1 & A_2^T H A_2 & \ddots & \vdots & \vdots\\ \vdots & \vdots & \ddots & 0 & 0\\ A_N^T H A_1 & A_N^T H A_2 & \cdots & A_N^T H A_N & 0\\ 0 & 0 & \cdots & 0 & H^{-1} \end{pmatrix} \quad \text{and} \quad \xi^k := \begin{pmatrix} \xi_1^k \\ \xi_2^k \\ \vdots \\ \xi_N^k \\ 0 \end{pmatrix}. \tag{14}$$
Then, our two search directions are, respectively, given by
$$d_1(w^k, \tilde{w}^k, \xi^k) = M(w^k - \tilde{w}^k) - \xi^k \tag{15}$$
and
$$d_2(w^k, \tilde{w}^k, \xi^k) = F(\tilde{w}^k) + \begin{pmatrix} A_1^T \\ \vdots \\ A_N^T \\ 0 \end{pmatrix} H \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k). \tag{16}$$
Based on $w^k$, $\tilde{w}^k$, and $\xi^k$, we define
$$\varphi(w^k, \tilde{w}^k, \xi^k) = \langle w^k - \tilde{w}^k,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle + \left\langle \lambda^k - \tilde{\lambda}^k,\ \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \right\rangle. \tag{17}$$
The function $\varphi(w^k, \tilde{w}^k, \xi^k)$ is a key component in analyzing the proposed methods. In the subsequent section, we prove (in Theorem 4.1) that
$$\begin{cases} \varphi(w^k, \tilde{w}^k, \xi^k) \geq \frac{1}{4}\left( \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2 + \big\|\lambda^k - \tilde{\lambda}^k\big\|_{H^{-1}}^2 \right),\\[2pt] \varphi(w^k, \tilde{w}^k, \xi^k) = 0 \ \Longleftrightarrow\ \tilde{w}^k \in \mathcal{W}^*. \end{cases} \tag{18}$$
Hence, $\varphi(w^k, \tilde{w}^k, \xi^k)$ can be viewed as an error measuring function, which measures how much $w^k$ fails to be in $\mathcal{W}^*$. Furthermore, by utilizing $\varphi(w^k, \tilde{w}^k, \xi^k)$ we will prove in Theorems 4.2 and 4.3 that, for any given positive definite matrix $G \in \mathbb{R}^{(n+m)\times(n+m)}$, both $-G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)$ and $-G^{-1} d_2(w^k, \tilde{w}^k, \xi^k)$ are descent directions with respect to the unknown distance function $\|w - w^*\|_G^2$.

3.2 The Step Length

In this subsection, we provide a step length for the search directions $d_1(w^k, \tilde{w}^k, \xi^k)$ and $d_2(w^k, \tilde{w}^k, \xi^k)$. Later, we will justify the choice of the step length to be defined and show that the step length has a positive lower bound.
Given a positive definite matrix $G \in \mathbb{R}^{(n+m)\times(n+m)}$, the new iterate $w^{k+1}$ is generated as
$$w^{k+1} = w^k - \alpha_k G^{-1} d_1(w^k, \tilde{w}^k, \xi^k) \tag{19}$$
or
$$w^{k+1} = P_{\mathcal{W},G}\big( w^k - \alpha_k G^{-1} d_2(w^k, \tilde{w}^k, \xi^k) \big), \tag{20}$$
where
$$\alpha_k = \gamma \alpha_k^*, \quad \alpha_k^* = \frac{\varphi(w^k, \tilde{w}^k, \xi^k)}{\|G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)\|_G^2}, \quad \text{and} \quad \gamma \in (0, 2). \tag{21}$$
The sequence $\{w^k\}$ generated by (19) is not necessarily contained in $\mathcal{W}$, while the sequence produced by (20) lies in $\mathcal{W}$. Note that the step length $\alpha_k$ in both (19) and (20) depends merely on $\varphi(w^k, \tilde{w}^k, \xi^k)$, $d_1(w^k, \tilde{w}^k, \xi^k)$, and $\gamma$. The proposed methods utilize different search directions but the same step length. According to our numerical experiments in [19], the update form (20) usually outperforms (19), provided that the projection onto $\mathcal{W}$ can easily be carried out. We mention that the proposed inexact ADMMs (19) and (20) are different from those in the literature [19, 20]. In fact, the inexact methods proposed in [19, 20] belong to the proximal-type methods [21–24], while the ADMM subproblems in this paper do not include proximal terms of any form.
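The step-length rule (21) and the update (19) are cheap to implement once $\varphi(w^k,\tilde{w}^k,\xi^k)$ and $d_1(w^k,\tilde{w}^k,\xi^k)$ are available, using the identity $\|G^{-1}d_1\|_G^2 = d_1^T G^{-1} d_1$. The following is a minimal sketch with generic placeholder inputs, not the paper's full method.

```python
# Sketch of the step-length rule (21) and the update (19):
# alpha_k^* = phi / ||G^{-1} d1||_G^2, with ||G^{-1} d1||_G^2 = d1^T G^{-1} d1.
import numpy as np

def update(w, G, d1, phi, gamma=1.8):
    g_inv_d1 = np.linalg.solve(G, d1)       # G^{-1} d1
    alpha_star = phi / (d1 @ g_inv_d1)      # (21)
    return w - gamma * alpha_star * g_inv_d1, alpha_star
```

With $G = I$ and $\varphi = \|d_1\|^2$, the rule gives $\alpha_k^* = 1$, so the update reduces to a plain relaxed step $w - \gamma d_1$.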
4 Rationale of the Two Directions

To derive the convergence of the proposed methods, we use similar arguments as those in the general framework proposed in [25]. By Lemma 2.1, the equality in (10) is equivalent to
$$\left\langle x_i - \tilde{x}_i^k,\ f_i(\tilde{x}_i^k) - A_i^T \lambda^k + A_i^T H\left( \sum_{j=1}^i A_j \tilde{x}_j^k + \sum_{j=i+1}^N A_j x_j^k - b \right) + \xi_i^k \right\rangle \geq 0 \quad \forall x_i \in \mathcal{X}_i.$$
By substituting $\tilde{\lambda}^k$ given in (12), the above inequality can be rewritten as
$$\left\langle x_i - \tilde{x}_i^k,\ f_i(\tilde{x}_i^k) - A_i^T \tilde{\lambda}^k + A_i^T H \sum_{j=i+1}^N A_j(x_j^k - \tilde{x}_j^k) + \xi_i^k \right\rangle \geq 0 \quad \forall x_i \in \mathcal{X}_i. \tag{22}$$
According to the general framework [25], for the pair $(w^k, \tilde{w}^k)$, the following conditions are required to guarantee the convergence:
$$\tilde{w}^k = P_{\mathcal{W}}\big[ \tilde{w}^k - \big( d_2(w^k, \tilde{w}^k, \xi^k) - d_1(w^k, \tilde{w}^k, \xi^k) \big) \big], \tag{23}$$
$$\langle \tilde{w}^k - w^*,\ d_2(w^k, \tilde{w}^k, \xi^k) \rangle \geq \varphi(w^k, \tilde{w}^k, \xi^k) - \langle w^k - \tilde{w}^k,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle, \tag{24}$$
for which we give proofs in Lemmas 4.1 and 4.2, respectively.
Lemma 4.1 Let $\tilde{w}^k = (\tilde{x}_1^k, \dots, \tilde{x}_N^k, \tilde{\lambda}^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k, \dots, x_N^k, \lambda^k)$. Then, we have $\tilde{w}^k \in \mathcal{W}$ and
$$\langle w - \tilde{w}^k,\ d_2(w^k, \tilde{w}^k, \xi^k) - d_1(w^k, \tilde{w}^k, \xi^k) \rangle \geq 0 \quad \forall w \in \mathcal{W},$$
where $d_1(w^k, \tilde{w}^k, \xi^k)$ and $d_2(w^k, \tilde{w}^k, \xi^k)$ are defined in (15) and (16), respectively.

Proof Denote $\tilde{x}^k := (\tilde{x}_1^k, \dots, \tilde{x}_N^k)$ and $\mathcal{X} := \mathcal{X}_1 \times \cdots \times \mathcal{X}_N$. It is clear from (10) that $\tilde{x}^k \in \mathcal{X}$. From (22), for all $x \in \mathcal{X}$, we have
$$\begin{pmatrix} x_1 - \tilde{x}_1^k \\ x_2 - \tilde{x}_2^k \\ \vdots \\ x_N - \tilde{x}_N^k \end{pmatrix}^T \left\{ \begin{pmatrix} f_1(\tilde{x}_1^k) - A_1^T \tilde{\lambda}^k \\ f_2(\tilde{x}_2^k) - A_2^T \tilde{\lambda}^k \\ \vdots \\ f_N(\tilde{x}_N^k) - A_N^T \tilde{\lambda}^k \end{pmatrix} + \begin{pmatrix} A_1^T H \sum_{j=2}^N A_j(x_j^k - \tilde{x}_j^k) \\ A_2^T H \sum_{j=3}^N A_j(x_j^k - \tilde{x}_j^k) \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} \xi_1^k \\ \xi_2^k \\ \vdots \\ \xi_N^k \end{pmatrix} \right\} \geq 0. \tag{25}$$
By adding
$$\begin{pmatrix} x_1 - \tilde{x}_1^k \\ x_2 - \tilde{x}_2^k \\ \vdots \\ x_N - \tilde{x}_N^k \end{pmatrix}^T \begin{pmatrix} A_1^T H \sum_{j=1}^{1} A_j(x_j^k - \tilde{x}_j^k) \\ A_2^T H \sum_{j=1}^{2} A_j(x_j^k - \tilde{x}_j^k) \\ \vdots \\ A_N^T H \sum_{j=1}^{N} A_j(x_j^k - \tilde{x}_j^k) \end{pmatrix}$$
to both sides of (25), we get
$$\begin{pmatrix} x_1 - \tilde{x}_1^k \\ \vdots \\ x_N - \tilde{x}_N^k \end{pmatrix}^T \left\{ \begin{pmatrix} f_1(\tilde{x}_1^k) - A_1^T \tilde{\lambda}^k + A_1^T H \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \\ \vdots \\ f_N(\tilde{x}_N^k) - A_N^T \tilde{\lambda}^k + A_N^T H \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \end{pmatrix} + \begin{pmatrix} \xi_1^k \\ \vdots \\ \xi_N^k \end{pmatrix} \right\} \geq \begin{pmatrix} x_1 - \tilde{x}_1^k \\ \vdots \\ x_N - \tilde{x}_N^k \end{pmatrix}^T \begin{pmatrix} A_1^T H A_1 & 0 & \cdots & 0 \\ A_2^T H A_1 & A_2^T H A_2 & \ddots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ A_N^T H A_1 & A_N^T H A_2 & \cdots & A_N^T H A_N \end{pmatrix} \begin{pmatrix} x_1^k - \tilde{x}_1^k \\ \vdots \\ x_N^k - \tilde{x}_N^k \end{pmatrix} \tag{26}$$
for all $x \in \mathcal{X}$. Since $\sum_{i=1}^N A_i \tilde{x}_i^k - b = H^{-1}(\lambda^k - \tilde{\lambda}^k)$, by embedding the equality
$$\left\langle \lambda - \tilde{\lambda}^k,\ \sum_{i=1}^N A_i \tilde{x}_i^k - b \right\rangle = \left\langle \lambda - \tilde{\lambda}^k,\ H^{-1}(\lambda^k - \tilde{\lambda}^k) \right\rangle$$
into (26), we obtain $\tilde{w}^k \in \mathcal{W}$ and
$$\begin{pmatrix} x_1 - \tilde{x}_1^k \\ \vdots \\ x_N - \tilde{x}_N^k \\ \lambda - \tilde{\lambda}^k \end{pmatrix}^T \left\{ \begin{pmatrix} f_1(\tilde{x}_1^k) - A_1^T \tilde{\lambda}^k + A_1^T H \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \\ \vdots \\ f_N(\tilde{x}_N^k) - A_N^T \tilde{\lambda}^k + A_N^T H \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \\ \sum_{i=1}^N A_i \tilde{x}_i^k - b \end{pmatrix} + \begin{pmatrix} \xi_1^k \\ \vdots \\ \xi_N^k \\ 0 \end{pmatrix} \right\} \geq \begin{pmatrix} x_1 - \tilde{x}_1^k \\ \vdots \\ x_N - \tilde{x}_N^k \\ \lambda - \tilde{\lambda}^k \end{pmatrix}^T M \begin{pmatrix} x_1^k - \tilde{x}_1^k \\ \vdots \\ x_N^k - \tilde{x}_N^k \\ \lambda^k - \tilde{\lambda}^k \end{pmatrix}$$
for all $w \in \mathcal{W}$. Recalling the definitions of $d_2(w^k, \tilde{w}^k, \xi^k)$ and $M$ (see (16) and (14)), the last inequality can be rewritten in the following compact form:
$$\tilde{w}^k \in \mathcal{W}, \quad \langle w - \tilde{w}^k,\ d_2(w^k, \tilde{w}^k, \xi^k) + \xi^k \rangle \geq \langle w - \tilde{w}^k,\ M(w^k - \tilde{w}^k) \rangle \quad \forall w \in \mathcal{W}.$$
The assertion of this lemma follows directly from the above inequality and the definition of $d_1(w^k, \tilde{w}^k, \xi^k)$ in (15).

According to (9), the assertion in Lemma 4.1 is equivalent to (23).

Lemma 4.2 Let $\tilde{w}^k = (\tilde{x}_1^k, \dots, \tilde{x}_N^k, \tilde{\lambda}^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k, \dots, x_N^k, \lambda^k)$. Then, we have
$$\langle \tilde{w}^k - w^*,\ d_2(w^k, \tilde{w}^k, \xi^k) \rangle \geq \varphi(w^k, \tilde{w}^k, \xi^k) - \langle w^k - \tilde{w}^k,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle \quad \forall w^* \in \mathcal{W}^*, \tag{27}$$
where $d_1(w^k, \tilde{w}^k, \xi^k)$, $d_2(w^k, \tilde{w}^k, \xi^k)$, and $\varphi(w^k, \tilde{w}^k, \xi^k)$ are defined in (15), (16), and (17).

Proof According to (16), we have
$$\langle \tilde{w}^k - w^*,\ d_2(w^k, \tilde{w}^k, \xi^k) \rangle = \langle \tilde{w}^k - w^*,\ F(\tilde{w}^k) \rangle + \left\langle \sum_{j=1}^N A_j(\tilde{x}_j^k - x_j^*),\ H \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \right\rangle. \tag{28}$$
From the monotonicity of $F(w)$ and the fact that $\tilde{w}^k \in \mathcal{W}$ it follows that
$$\langle \tilde{w}^k - w^*,\ F(\tilde{w}^k) \rangle \geq \langle \tilde{w}^k - w^*,\ F(w^*) \rangle \geq 0.$$
Substituting the above inequality into (28), we obtain
$$\langle \tilde{w}^k - w^*,\ d_2(w^k, \tilde{w}^k, \xi^k) \rangle \geq \left\langle \sum_{j=1}^N A_j(\tilde{x}_j^k - x_j^*),\ H \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \right\rangle.$$
Since $\sum_{j=1}^N A_j x_j^* = b$ and $H(\sum_{j=1}^N A_j \tilde{x}_j^k - b) = \lambda^k - \tilde{\lambda}^k$, the above inequality becomes
$$\langle \tilde{w}^k - w^*,\ d_2(w^k, \tilde{w}^k, \xi^k) \rangle \geq \left\langle \lambda^k - \tilde{\lambda}^k,\ \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \right\rangle \quad \forall w^* \in \mathcal{W}^*.$$
Inequality (27) follows immediately by further considering the definition of $\varphi(w^k, \tilde{w}^k, \xi^k)$.

Lemma 4.3 Let $\tilde{w}^k = (\tilde{x}_1^k, \dots, \tilde{x}_N^k, \tilde{\lambda}^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k, \dots, x_N^k, \lambda^k)$. Then, we have
$$\langle w^k - \tilde{w}^k,\ M(w^k - \tilde{w}^k) \rangle + \left\langle \lambda^k - \tilde{\lambda}^k,\ \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \right\rangle = \frac{1}{2}\left( \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2 + \big\|\lambda^k - \tilde{\lambda}^k\big\|_{H^{-1}}^2 + \Big\| \sum_{j=1}^N A_j x_j^k - b \Big\|_H^2 \right).$$

Proof From the definition of the matrix $M$ in (14) we have
$$M(w^k - \tilde{w}^k) = \begin{pmatrix} A_1^T H A_1 & 0 & \cdots & 0 & 0 \\ A_2^T H A_1 & A_2^T H A_2 & \ddots & \vdots & \vdots \\ \vdots & \vdots & \ddots & 0 & 0 \\ A_N^T H A_1 & A_N^T H A_2 & \cdots & A_N^T H A_N & 0 \\ 0 & 0 & \cdots & 0 & H^{-1} \end{pmatrix} \begin{pmatrix} x_1^k - \tilde{x}_1^k \\ x_2^k - \tilde{x}_2^k \\ \vdots \\ x_N^k - \tilde{x}_N^k \\ \lambda^k - \tilde{\lambda}^k \end{pmatrix},$$
from which we can easily verify the following equality:
$$\langle w^k - \tilde{w}^k,\ M(w^k - \tilde{w}^k) \rangle = \frac{1}{2}\left( \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2 + \big\|\lambda^k - \tilde{\lambda}^k\big\|_{H^{-1}}^2 \right) + \frac{1}{2} v_k^T \begin{pmatrix} H & \cdots & H & 0 \\ \vdots & \ddots & \vdots & \vdots \\ H & \cdots & H & 0 \\ 0 & \cdots & 0 & H^{-1} \end{pmatrix} v_k, \tag{29}$$
where, for brevity, $v_k := \big( A_1(x_1^k - \tilde{x}_1^k);\ \dots;\ A_N(x_N^k - \tilde{x}_N^k);\ \lambda^k - \tilde{\lambda}^k \big)$. Simple calculations show that
$$\left\langle \lambda^k - \tilde{\lambda}^k,\ \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \right\rangle = \frac{1}{2} v_k^T \begin{pmatrix} 0 & \cdots & 0 & I \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 0 & I \\ I & \cdots & I & 0 \end{pmatrix} v_k. \tag{30}$$
The addition of (29) and (30) yields
$$\begin{aligned} \langle w^k - \tilde{w}^k,\ M(w^k - \tilde{w}^k) \rangle + \left\langle \lambda^k - \tilde{\lambda}^k,\ \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) \right\rangle &= \frac{1}{2} v_k^T \begin{pmatrix} H & \cdots & H & I \\ \vdots & \ddots & \vdots & \vdots \\ H & \cdots & H & I \\ I & \cdots & I & H^{-1} \end{pmatrix} v_k + \frac{1}{2}\left( \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2 + \big\|\lambda^k - \tilde{\lambda}^k\big\|_{H^{-1}}^2 \right) \\ &= \frac{1}{2}\Big\| \sum_{j=1}^N A_j(x_j^k - \tilde{x}_j^k) + H^{-1}(\lambda^k - \tilde{\lambda}^k) \Big\|_H^2 + \frac{1}{2}\left( \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2 + \big\|\lambda^k - \tilde{\lambda}^k\big\|_{H^{-1}}^2 \right) \\ &= \frac{1}{2}\Big\| \sum_{j=1}^N A_j x_j^k - b \Big\|_H^2 + \frac{1}{2}\left( \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2 + \big\|\lambda^k - \tilde{\lambda}^k\big\|_{H^{-1}}^2 \right), \end{aligned}$$
where the last equality follows from (12). The lemma is proved.
Now, we prove (18) and the descent properties of $-d_1(w^k, \tilde{w}^k, \xi^k)$ and $-d_2(w^k, \tilde{w}^k, \xi^k)$.

Theorem 4.1 Let $\tilde{w}^k = (\tilde{x}_1^k, \dots, \tilde{x}_N^k, \tilde{\lambda}^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k, \dots, x_N^k, \lambda^k)$. If the inexactness criteria (11) are satisfied, then
$$\varphi(w^k, \tilde{w}^k, \xi^k) \geq \frac{1}{4}\left( \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2 + \big\|\lambda^k - \tilde{\lambda}^k\big\|_{H^{-1}}^2 \right). \tag{31}$$
In addition, if $\varphi(w^k, \tilde{w}^k, \xi^k) = 0$, then $\tilde{w}^k \in \mathcal{W}^*$ is a solution of VI$(\mathcal{W}, F)$.

Proof First, it follows from (15), (17), and Lemma 4.3 that
$$\varphi(w^k, \tilde{w}^k, \xi^k) = \frac{1}{2}\left( \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2 + \big\|\lambda^k - \tilde{\lambda}^k\big\|_{H^{-1}}^2 \right) + \frac{1}{2}\Big\| \sum_{j=1}^N A_j x_j^k - b \Big\|_H^2 - \langle w^k - \tilde{w}^k,\ \xi^k \rangle. \tag{32}$$
From the inexactness criteria (11) we have
$$-\langle w^k - \tilde{w}^k,\ \xi^k \rangle \geq -\frac{1}{4}\sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2.$$
Substituting the above inequality into (32), the first part of the theorem follows immediately.

Consequently, if $\varphi(w^k, \tilde{w}^k, \xi^k) = 0$, it follows that
$$A_j(x_j^k - \tilde{x}_j^k) = 0,\ j = 1, \dots, N, \quad \text{and} \quad \lambda^k = \tilde{\lambda}^k.$$
Moreover, it follows from (11) that $\xi_j^k = 0$, $j = 1, \dots, N$. By substituting the above equations into (22), we get
$$\tilde{x}_i^k \in \mathcal{X}_i, \quad \langle x_i - \tilde{x}_i^k,\ f_i(\tilde{x}_i^k) - A_i^T \tilde{\lambda}^k \rangle \geq 0 \quad \forall x_i \in \mathcal{X}_i,\ i = 1, \dots, N, \tag{33}$$
and
$$\sum_{j=1}^N A_j \tilde{x}_j^k - b = H^{-1}(\lambda^k - \tilde{\lambda}^k) = 0. \tag{34}$$
Combining (33) and (34), we get
$$\tilde{w}^k \in \mathcal{W}, \quad \langle w - \tilde{w}^k,\ F(\tilde{w}^k) \rangle \geq 0 \quad \forall w \in \mathcal{W},$$
and thus $\tilde{w}^k$ is a solution of VI$(\mathcal{W}, F)$.
Theorem 4.2 Let $\tilde{w}^k = (\tilde{x}_1^k, \dots, \tilde{x}_N^k, \tilde{\lambda}^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k, \dots, x_N^k, \lambda^k)$. Then, we have
$$\langle w^k - w^*,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle \geq \varphi(w^k, \tilde{w}^k, \xi^k) \quad \forall w^* \in \mathcal{W}^*, \tag{35}$$
where $d_1(w^k, \tilde{w}^k, \xi^k)$ and $\varphi(w^k, \tilde{w}^k, \xi^k)$ are defined in (15) and (17), respectively.

Proof First, it follows from Lemma 4.1 that
$$\langle \tilde{w}^k - w^*,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle \geq \langle \tilde{w}^k - w^*,\ d_2(w^k, \tilde{w}^k, \xi^k) \rangle \quad \forall w^* \in \mathcal{W}^*.$$
Combining with Lemma 4.2, we have
$$\langle \tilde{w}^k - w^*,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle \geq \varphi(w^k, \tilde{w}^k, \xi^k) - \langle w^k - \tilde{w}^k,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle \quad \forall w^* \in \mathcal{W}^*,$$
which implies (35).

Recall that the sequence $\{w^k\}$ generated by (19) is not necessarily contained in $\mathcal{W}$. Therefore, the $w^k$ in Theorem 4.2 must be allowed to be any point in $\mathbb{R}^{n+m}$. In contrast, the sequence produced by (20) lies in $\mathcal{W}$, and thus it is required in the next theorem that $w^k$ belongs to $\mathcal{W}$.

Theorem 4.3 Let $\tilde{w}^k = (\tilde{x}_1^k, \dots, \tilde{x}_N^k, \tilde{\lambda}^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k, \dots, x_N^k, \lambda^k)$. If $w^k \in \mathcal{W}$, then
$$\langle w^k - w^*,\ d_2(w^k, \tilde{w}^k, \xi^k) \rangle \geq \varphi(w^k, \tilde{w}^k, \xi^k) \quad \forall w^* \in \mathcal{W}^*, \tag{36}$$
where $d_2(w^k, \tilde{w}^k, \xi^k)$ and $\varphi(w^k, \tilde{w}^k, \xi^k)$ are defined in (16) and (17), respectively.

Proof Since $w^k \in \mathcal{W}$, it follows from Lemma 4.1 that
$$\langle w^k - \tilde{w}^k,\ d_2(w^k, \tilde{w}^k, \xi^k) \rangle \geq \langle w^k - \tilde{w}^k,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle. \tag{37}$$
Then, the addition of (27) to both sides of (37) yields (36).
5 Convergence

Using the directions $d_1(w^k, \tilde{w}^k, \xi^k)$ and $d_2(w^k, \tilde{w}^k, \xi^k)$ offered by (15) and (16), the new iterate $w^{k+1}$ is determined by the positive definite matrix $G$ and the step length $\alpha_k$; see (19) and (20). In order to explain how to determine the step length, we define the new step-length-dependent iterates by
$$w_1^{k+1}(\alpha_k) = w^k - \alpha_k G^{-1} d_1(w^k, \tilde{w}^k, \xi^k) \tag{38}$$
and
$$w_2^{k+1}(\alpha_k) = P_{\mathcal{W},G}\big( w^k - \alpha_k G^{-1} d_2(w^k, \tilde{w}^k, \xi^k) \big). \tag{39}$$
In this way,
$$\vartheta_1(\alpha_k) = \|w^k - w^*\|_G^2 - \|w_1^{k+1}(\alpha_k) - w^*\|_G^2 \tag{40}$$
and
$$\vartheta_2(\alpha_k) = \|w^k - w^*\|_G^2 - \|w_2^{k+1}(\alpha_k) - w^*\|_G^2 \tag{41}$$
measure the improvement in the $k$th iteration by using the updating forms (38) and (39), respectively. Since the optimal solution $w^* \in \mathcal{W}^*$ is unknown, it is generally infeasible to maximize the improvement directly. The following theorem introduces a tight lower bound on $\vartheta_1(\alpha_k)$ and $\vartheta_2(\alpha_k)$, which does not depend on the unknown vector $w^*$.

Theorem 5.1 For any $w^* \in \mathcal{W}^*$ and $\alpha_k \geq 0$, we have
$$\vartheta_1(\alpha_k) \geq q(\alpha_k) \quad \text{and} \quad \vartheta_2(\alpha_k) \geq q(\alpha_k),$$
where
$$q(\alpha_k) = 2\alpha_k \varphi(w^k, \tilde{w}^k, \xi^k) - \alpha_k^2 \|G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)\|_G^2. \tag{42}$$
Proof From (38) and (40) we have
$$\begin{aligned} \vartheta_1(\alpha_k) &= \|w^k - w^*\|_G^2 - \|w^k - w^* - \alpha_k G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)\|_G^2\\ &= 2\langle w^k - w^*,\ \alpha_k d_1(w^k, \tilde{w}^k, \xi^k) \rangle - \alpha_k^2 \|G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)\|_G^2\\ &\geq 2\alpha_k \varphi(w^k, \tilde{w}^k, \xi^k) - \alpha_k^2 \|G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)\|_G^2 = q(\alpha_k), \end{aligned}$$
where the last inequality follows from (35). Hence, the first assertion of this theorem is proved.

By setting $v = w^k - \alpha_k G^{-1} d_2(w^k, \tilde{w}^k, \xi^k)$ and $w = w^*$ in (8), we get
$$\|w_2^{k+1}(\alpha_k) - w^*\|_G^2 \leq \|w^k - \alpha_k G^{-1} d_2(w^k, \tilde{w}^k, \xi^k) - w^*\|_G^2 - \|w^k - \alpha_k G^{-1} d_2(w^k, \tilde{w}^k, \xi^k) - w_2^{k+1}(\alpha_k)\|_G^2.$$
Substituting the above inequality into (41), we obtain
$$\begin{aligned} \vartheta_2(\alpha_k) &\geq \|w^k - w^*\|_G^2 - \|w^k - w^* - \alpha_k G^{-1} d_2(w^k, \tilde{w}^k, \xi^k)\|_G^2 + \|w^k - w_2^{k+1}(\alpha_k) - \alpha_k G^{-1} d_2(w^k, \tilde{w}^k, \xi^k)\|_G^2\\ &= \|w^k - w_2^{k+1}(\alpha_k)\|_G^2 + 2\langle w_2^{k+1}(\alpha_k) - w^*,\ \alpha_k d_2(w^k, \tilde{w}^k, \xi^k) \rangle. \end{aligned} \tag{43}$$
Since $w_2^{k+1}(\alpha_k) \in \mathcal{W}$, it follows from Lemma 4.1 that
$$\langle w_2^{k+1}(\alpha_k) - \tilde{w}^k,\ d_2(w^k, \tilde{w}^k, \xi^k) \rangle \geq \langle w_2^{k+1}(\alpha_k) - \tilde{w}^k,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle. \tag{44}$$
The addition of (27) to both sides of (44) yields
$$\langle w_2^{k+1}(\alpha_k) - w^*,\ d_2(w^k, \tilde{w}^k, \xi^k) \rangle \geq \varphi(w^k, \tilde{w}^k, \xi^k) + \langle w_2^{k+1}(\alpha_k) - w^k,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle. \tag{45}$$
Substituting (45) into the right-hand side of (43), we obtain
$$\begin{aligned} \vartheta_2(\alpha_k) &\geq \|w^k - w_2^{k+1}(\alpha_k)\|_G^2 + 2\alpha_k \varphi(w^k, \tilde{w}^k, \xi^k) + 2\alpha_k \langle w_2^{k+1}(\alpha_k) - w^k,\ d_1(w^k, \tilde{w}^k, \xi^k) \rangle\\ &= \|w^k - w_2^{k+1}(\alpha_k) - \alpha_k G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)\|_G^2 - \alpha_k^2 \|G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)\|_G^2 + 2\alpha_k \varphi(w^k, \tilde{w}^k, \xi^k)\\ &\geq 2\alpha_k \varphi(w^k, \tilde{w}^k, \xi^k) - \alpha_k^2 \|G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)\|_G^2 = q(\alpha_k). \end{aligned}$$
Hence, the proof is completed.

Denote
$$\vartheta(\alpha_k) := \min\{\vartheta_1(\alpha_k),\ \vartheta_2(\alpha_k)\} \geq q(\alpha_k). \tag{46}$$
Note that $q(\alpha_k)$ is a quadratic function of $\alpha_k$, and it reaches its maximum at
$$\alpha_k^* = \frac{\varphi(w^k, \tilde{w}^k, \xi^k)}{\|G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)\|_G^2}, \tag{47}$$
which is just the same as defined in (21). Consequently, it follows from Theorem 5.1 that
$$\vartheta(\alpha_k^*) \geq q(\alpha_k^*) = \alpha_k^* \varphi(w^k, \tilde{w}^k, \xi^k).$$
Since inequality (35) is used in the proof of Theorem 5.1, in practice, multiplication of the "optimal" step length $\alpha_k^*$ by a factor $\gamma > 1$ may result in faster convergence. By using (42) and (47), we have
$$q(\gamma \alpha_k^*) = 2\gamma \alpha_k^* \varphi(w^k, \tilde{w}^k, \xi^k) - (\gamma \alpha_k^*)^2 \|G^{-1} d_1(w^k, \tilde{w}^k, \xi^k)\|_G^2 = \gamma(2 - \gamma)\alpha_k^* \varphi(w^k, \tilde{w}^k, \xi^k). \tag{48}$$
In order to guarantee that the right-hand side of (48) is positive, we choose $\gamma \in [1, 2)$. The following theorem shows that the sequence $\{w^k\}$ generated by the proposed methods is Fejér monotone with respect to $\mathcal{W}^*$.

Theorem 5.2 For any $w^* \in \mathcal{W}^*$, the sequence $\{w^k\}$ generated by each of the proposed methods (with update form (19) or (20)) satisfies
$$\|w^{k+1} - w^*\|_G^2 \leq \|w^k - w^*\|_G^2 - \gamma(2 - \gamma)\alpha_k^* \varphi(w^k, \tilde{w}^k, \xi^k) \quad \forall w^* \in \mathcal{W}^*. \tag{49}$$

Proof It follows from Theorem 5.1 that $\vartheta(\gamma \alpha_k^*) \geq q(\gamma \alpha_k^*)$, which is equivalent to
$$\|w^{k+1} - w^*\|_G^2 \leq \|w^k - w^*\|_G^2 - q(\gamma \alpha_k^*).$$
Then, the result of this theorem directly follows from (48).
Theorem 5.2 indicates that the sequence $\{w^k\}$ converges to the solution set monotonically in the Fejér sense. Thus, according to [16, 17], the proposed methods belong to the class of contraction methods. In the following, we show that $\alpha_k^* > 0$ has a positive lower bound.

Lemma 5.1 For any given (but fixed) positive definite matrix $G$, there exists a constant $c_0 > 0$ such that $\alpha_k^* \geq c_0$ for all $k > 0$.

Proof From the definition of $M$ in (14), it is easy to show that
$$M(w^k - \tilde{w}^k) = \begin{pmatrix} A_1^T H^{1/2} & 0 & \cdots & 0 & 0\\ A_2^T H^{1/2} & A_2^T H^{1/2} & \ddots & \vdots & \vdots\\ \vdots & \vdots & \ddots & 0 & 0\\ A_N^T H^{1/2} & A_N^T H^{1/2} & \cdots & A_N^T H^{1/2} & 0\\ 0 & 0 & \cdots & 0 & H^{-1/2} \end{pmatrix} \begin{pmatrix} H^{1/2} A_1(x_1^k - \tilde{x}_1^k)\\ H^{1/2} A_2(x_2^k - \tilde{x}_2^k)\\ \vdots\\ H^{1/2} A_N(x_N^k - \tilde{x}_N^k)\\ H^{-1/2}(\lambda^k - \tilde{\lambda}^k) \end{pmatrix}.$$
Therefore, there exists a constant $K > 0$ such that
$$\|M(w^k - \tilde{w}^k)\|^2 \leq K\left( \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2 + \big\|\lambda^k - \tilde{\lambda}^k\big\|_{H^{-1}}^2 \right).$$
Moreover, according to the inexactness criteria (11), we have
$$\|\xi^k\|^2 \leq \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2.$$
Since $d_1(w^k, \tilde{w}^k, \xi^k) = M(w^k - \tilde{w}^k) - \xi^k$, we have
$$\|d_1(w^k, \tilde{w}^k, \xi^k)\|^2 \leq 2\|M(w^k - \tilde{w}^k)\|^2 + 2\|\xi^k\|^2 \leq 2(K + 1)\left( \sum_{j=1}^N \big\|A_j(x_j^k - \tilde{x}_j^k)\big\|_H^2 + \big\|\lambda^k - \tilde{\lambda}^k\big\|_{H^{-1}}^2 \right).$$
Using (31) and recalling that any two norms in a finite-dimensional space are equivalent, we have
$$\alpha_k^* = \frac{\varphi(w^k, \tilde{w}^k, \xi^k)}{\|d_1(w^k, \tilde{w}^k, \xi^k)\|_{G^{-1}}^2} \geq \frac{c\, \varphi(w^k, \tilde{w}^k, \xi^k)}{\|d_1(w^k, \tilde{w}^k, \xi^k)\|^2} \geq c_0 := \frac{c}{8(K + 1)} > 0, \tag{50}$$
where $c > 0$ is a constant. This completes the proof of the lemma.
Theorem 5.3 Let {w k } and {w˜ k } be the sequences generated by the proposed alternating-direction-based contraction methods (19) and (20) for problem (3). Then, we have (i) (ii) (iii) (iv)
k The sequence N {w } is bounded. limk→∞ { j =1 Aj (xjk − x˜jk )2 + λk − λ˜ k 2 } = 0. Any accumulation point of {w˜ k } is a solution of (3). When Ai , i = 1, . . . , N , are full column rank matrices, the sequence {w˜ k } converges to a unique w ∞ ∈ W ∗ .
Proof The first assertion follows from (49). Moreover, by combining the recursion of (49) with the fact that αk* ≥ c0 > 0, it is easy to show that
$$
\lim_{k\to\infty} \varphi\bigl(w^k,\tilde w^k,\xi^k\bigr) = 0.
$$
Consequently, it follows from Theorem 4.1 that
$$
\lim_{k\to\infty}\bigl\|A_i\bigl(x_i^k-\tilde x_i^k\bigr)\bigr\| = 0,\quad i=1,\dots,N, \qquad\text{and}\qquad \lim_{k\to\infty}\bigl\|\lambda^k-\tilde\lambda^k\bigr\| = 0, \qquad (51)
$$
and the second assertion is proved. Using similar arguments as in the proof of Theorem 4.1, we obtain
$$
\tilde w^k \in \mathcal{W}, \qquad \lim_{k\to\infty}\bigl\langle w-\tilde w^k, F\bigl(\tilde w^k\bigr)\bigr\rangle \ge 0 \quad \forall w\in\mathcal{W}, \qquad (52)
$$
and thus any accumulation point of {w̃^k} is a solution of VI(W, F), i.e., a solution of (3). If all $A_i$ are full column rank matrices, it follows from the first assertion and (51) that {w̃^k} is also bounded. Let $w^\infty$ be an accumulation point of {w̃^k}. Then, there exists a subsequence $\{\tilde w^{k_j}\}$ that converges to $w^\infty$. It follows from (52) that
$$
\tilde w^{k_j} \in \mathcal{W}, \qquad \lim_{j\to\infty}\bigl\langle w-\tilde w^{k_j}, F\bigl(\tilde w^{k_j}\bigr)\bigr\rangle \ge 0 \quad \forall w\in\mathcal{W},
$$
and consequently we have
$$
w^\infty \in \mathcal{W}, \qquad \bigl\langle w-w^\infty, F\bigl(w^\infty\bigr)\bigr\rangle \ge 0 \quad \forall w\in\mathcal{W},
$$
which implies that $w^\infty \in \mathcal{W}^*$. Since {w^k} is Fejér monotone and $\lim_{k\to\infty}\|w^k-\tilde w^k\| = 0$, the sequence {w̃^k} cannot have any other accumulation point and thus must converge to $w^\infty$.
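For intuition, Fejér monotonicity of {w^k} means the distance to the solution never increases along the iterations; a scalar toy sketch (the update below, with made-up parameters, is only a stand-in for (19), not the method itself):

```python
# Fejér-monotone toy: each step contracts the distance to the known solution.
gamma, alpha = 1.5, 0.5          # gamma in (0, 2), alpha a step length (made up)
w, w_star = 4.0, 0.0             # starting point and solution of the toy problem
dists = []
for _ in range(10):
    dists.append(abs(w - w_star))
    w = w - gamma * alpha * (w - w_star)   # stand-in for the update (19)

# distances are monotonically nonincreasing (Fejér monotonicity)
assert all(d2 <= d1 for d1, d2 in zip(dists, dists[1:]))
```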
6 Complexity

The analysis in this section is inspired by [26]. It is based on a key inequality (see Lemmas 6.1 and 6.2) that is similar to the one in [26]. In the current framework of variational inequalities, the analysis becomes much simpler and more elegant. As a preparation for the proof of the complexity result, we first give an alternative characterization of the optimal solution set $\mathcal{W}^*$, namely,
$$
\mathcal{W}^* = \bigcap_{w\in\mathcal{W}} \bigl\{\, w^* \in \mathcal{W} : \bigl\langle w - w^*, F(w)\bigr\rangle \ge 0 \,\bigr\}.
$$
For a proof of this characterization, we refer to Theorem 2.3.5 in the book [27]. According to the above alternative characterization, we have that $\hat w \in \mathcal{W}$ is an ε-approximate solution of VI(W, F) if it satisfies
$$
\hat w \in \mathcal{W} \qquad\text{and}\qquad \sup_{w\in\mathcal{W}} \bigl\langle \hat w - w, F(w)\bigr\rangle \le \epsilon. \qquad (53)
$$
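Criterion (53) can be made concrete on a toy one-dimensional VI; in the sketch below, W = [0, 1] and F(w) = w − 0.5 (so w* = 0.5) are our own illustrative choices, and the supremum is approximated on a grid:

```python
# Toy VI: W = [0, 1], F(w) = w - 0.5, exact solution w* = 0.5.
# Criterion (53): w_hat is an eps-solution if sup_{w in W} <w_hat - w, F(w)> <= eps.
F = lambda w: w - 0.5

def gap(w_hat, grid_size=10001):
    # approximate the supremum over W = [0, 1] on a fine grid
    return max((w_hat - i / (grid_size - 1)) * F(i / (grid_size - 1))
               for i in range(grid_size))

assert gap(0.5) <= 1e-6      # the exact solution has (essentially) zero gap
assert gap(0.51) <= 0.02     # a nearby point is an eps-solution for small eps
assert gap(0.9) > 0.03       # a point far from w* is not
```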
In general, our complexity analysis follows the lines of [28], but, instead of using $\tilde w^k$ directly, we need to introduce an auxiliary vector, namely,
$$
\hat w^k = \begin{pmatrix}\tilde x_1^k\\ \vdots\\ \tilde x_N^k\\ \hat\lambda^k\end{pmatrix},
\qquad\text{where}\quad
\hat\lambda^k = \tilde\lambda^k - H\sum_{j=1}^N A_j\bigl(x_j^k-\tilde x_j^k\bigr).
$$
Since $\tilde\lambda^k = \lambda^k - H(\sum_{j=1}^N A_j\tilde x_j^k - b)$ (see (12)), we have $\hat\lambda^k = \lambda^k - H(\sum_{j=1}^N A_j x_j^k - b)$ or, equivalently,
$$
H^{-1}\bigl(\lambda^k-\tilde\lambda^k\bigr) = \sum_{j=1}^N A_j\tilde x_j^k - b = -\sum_{j=1}^N A_j\bigl(x_j^k-\tilde x_j^k\bigr) + H^{-1}\bigl(\lambda^k-\hat\lambda^k\bigr).
$$
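The algebraic identity above can be verified on a scalar toy instance (all data below are made up; H is a positive scalar h and N = 2, with scalar A_j and x_j):

```python
# Scalar toy check of: H^{-1}(lam - lam_tilde) = sum_j A_j x~_j - b
#                      = -sum_j A_j (x_j - x~_j) + H^{-1}(lam - lam_hat)
h = 2.0                       # "H", a positive scalar here
A = [1.5, -0.7]               # A_1, A_2 (scalars)
x = [0.3, 1.1]                # x_1^k, x_2^k
xt = [0.25, 1.2]              # x~_1^k, x~_2^k
b, lam = 0.4, 0.9

lam_tilde = lam - h * (sum(a * v for a, v in zip(A, xt)) - b)   # as in (12)
lam_hat = lam_tilde - h * sum(a * (u - v) for a, u, v in zip(A, x, xt))

lhs = (lam - lam_tilde) / h
rhs = -sum(a * (u - v) for a, u, v in zip(A, x, xt)) + (lam - lam_hat) / h
assert abs(lhs - rhs) < 1e-12
# equivalently, lam_hat = lam - H (sum_j A_j x_j - b)
assert abs(lam_hat - (lam - h * (sum(a * u for a, u in zip(A, x)) - b))) < 1e-12
```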
As a consequence, we may rewrite the two descent directions separately as
$$
d_1\bigl(w^k,\tilde w^k,\xi^k\bigr) = M\bigl(w^k-\tilde w^k\bigr) - \xi^k = \hat M\bigl(w^k-\hat w^k\bigr) - \xi^k =: \hat d_1\bigl(w^k,\hat w^k,\xi^k\bigr), \qquad (54)
$$
where
$$
\hat M := \begin{pmatrix}
A_1^T H A_1 & 0 & \cdots & 0 & 0\\
A_2^T H A_1 & A_2^T H A_2 & \ddots & \vdots & \vdots\\
\vdots & \vdots & \ddots & 0 & 0\\
A_N^T H A_1 & A_N^T H A_2 & \cdots & A_N^T H A_N & 0\\
-A_1 & -A_2 & \cdots & -A_N & H^{-1}
\end{pmatrix}
$$
and
$$
d_2\bigl(w^k,\tilde w^k,\xi^k\bigr) = F\bigl(\tilde w^k\bigr) + \begin{pmatrix}A_1^T\\ \vdots\\ A_N^T\\ 0\end{pmatrix} H\sum_{j=1}^N A_j\bigl(x_j^k-\tilde x_j^k\bigr) = F\bigl(\hat w^k\bigr). \qquad (55)
$$
In fact, we constructed $\hat w^k$ and $\hat M$ such that $\hat d_1(w^k,\hat w^k,\xi^k)$ is exactly the direction $d_1(w^k,\tilde w^k,\xi^k)$ and $F(\hat w^k)$ is exactly the direction $d_2(w^k,\tilde w^k,\xi^k)$. Moreover, the assertion in Lemma 4.1 can be rewritten accordingly as
$$
\bigl\langle w-\hat w^k, F\bigl(\hat w^k\bigr) - \hat d_1\bigl(w^k,\hat w^k,\xi^k\bigr)\bigr\rangle \ge 0 \quad \forall w\in\mathcal{W}. \qquad (56)
$$
Now we are ready to prove the key inequality for both algorithms, which is given in the following two lemmas.

Lemma 6.1 If the new iterate $w^{k+1}$ is updated by (19), then we have
$$
\bigl\langle w-\hat w^k, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle + \frac{1}{2}\bigl(\bigl\|w-w^k\bigr\|_G^2 - \bigl\|w-w^{k+1}\bigr\|_G^2\bigr) \ge 0 \quad \forall w\in\mathcal{W}.
$$

Proof Due to (56), we have
$$
\bigl\langle w-\hat w^k, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle \ge \bigl\langle w-\hat w^k, \gamma\alpha_k^* \hat d_1\bigl(w^k,\hat w^k,\xi^k\bigr)\bigr\rangle \quad \forall w\in\mathcal{W}. \qquad (57)
$$
In addition, by using (54) and (19), we have
$$
\gamma\alpha_k^* \hat d_1\bigl(w^k,\hat w^k,\xi^k\bigr) = \gamma\alpha_k^* d_1\bigl(w^k,\tilde w^k,\xi^k\bigr) = G\bigl(w^k-w^{k+1}\bigr).
$$
Substitution of the last equation into (57) yields
$$
\bigl\langle w-\hat w^k, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle \ge \bigl\langle w-\hat w^k, G\bigl(w^k-w^{k+1}\bigr)\bigr\rangle.
$$
Thus, it suffices to show that
$$
\bigl\langle w-\hat w^k, G\bigl(w^k-w^{k+1}\bigr)\bigr\rangle + \frac{1}{2}\bigl(\bigl\|w-w^k\bigr\|_G^2 - \bigl\|w-w^{k+1}\bigr\|_G^2\bigr) \ge 0 \quad \forall w\in\mathcal{W}. \qquad (58)
$$
Applying the identity
$$
\langle a-b, G(c-d)\rangle + \frac{1}{2}\bigl(\|a-c\|_G^2 - \|a-d\|_G^2\bigr) = \frac{1}{2}\bigl(\|c-b\|_G^2 - \|d-b\|_G^2\bigr),
$$
we derive that
$$
\bigl\langle w-\hat w^k, G\bigl(w^k-w^{k+1}\bigr)\bigr\rangle + \frac{1}{2}\bigl(\bigl\|w-w^k\bigr\|_G^2 - \bigl\|w-w^{k+1}\bigr\|_G^2\bigr) = \frac{1}{2}\bigl(\bigl\|w^k-\hat w^k\bigr\|_G^2 - \bigl\|w^{k+1}-\hat w^k\bigr\|_G^2\bigr). \qquad (59)
$$
In view of (19), we have
$$
\begin{aligned}
\bigl\|w^k-\hat w^k\bigr\|_G^2 - \bigl\|w^{k+1}-\hat w^k\bigr\|_G^2
&= \bigl\|w^k-\hat w^k\bigr\|_G^2 - \bigl\|\bigl(w^k-\hat w^k\bigr) - \gamma\alpha_k^* G^{-1} d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\|_G^2\\
&= 2\gamma\alpha_k^*\bigl\langle w^k-\hat w^k, d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\rangle - \gamma^2\bigl(\alpha_k^*\bigr)^2\bigl\|G^{-1} d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\|_G^2.
\end{aligned}
$$
Moreover, we have
$$
\begin{aligned}
\bigl\langle w^k-\hat w^k, d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\rangle
&= \bigl\langle w^k-\tilde w^k, d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\rangle + \bigl\langle \tilde w^k-\hat w^k, d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\rangle\\
&= \bigl\langle w^k-\tilde w^k, d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\rangle + \Bigl\langle H\sum_{j=1}^N A_j\bigl(x_j^k-\tilde x_j^k\bigr), H^{-1}\bigl(\lambda^k-\tilde\lambda^k\bigr)\Bigr\rangle\\
&= \bigl\langle w^k-\tilde w^k, d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\rangle + \Bigl\langle \lambda^k-\tilde\lambda^k, \sum_{j=1}^N A_j\bigl(x_j^k-\tilde x_j^k\bigr)\Bigr\rangle\\
&= \varphi\bigl(w^k,\tilde w^k,\xi^k\bigr),
\end{aligned} \qquad (60)
$$
where the second equality follows from the definitions of $\tilde w^k$, $\hat w^k$, and $d_1(w^k,\tilde w^k,\xi^k)$. By combining the last two equations and using (21), we obtain
$$
\bigl\|w^k-\hat w^k\bigr\|_G^2 - \bigl\|w^{k+1}-\hat w^k\bigr\|_G^2 = 2\gamma\alpha_k^*\varphi\bigl(w^k,\tilde w^k,\xi^k\bigr) - \gamma^2\bigl(\alpha_k^*\bigr)^2\bigl\|G^{-1}d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\|_G^2 = \gamma(2-\gamma)\alpha_k^*\varphi\bigl(w^k,\tilde w^k,\xi^k\bigr).
$$
Substituting the last equation into (59) yields
$$
\bigl\langle w-\hat w^k, G\bigl(w^k-w^{k+1}\bigr)\bigr\rangle + \frac{1}{2}\bigl(\bigl\|w-w^k\bigr\|_G^2 - \bigl\|w-w^{k+1}\bigr\|_G^2\bigr) = \frac{1}{2}\gamma(2-\gamma)\alpha_k^*\varphi\bigl(w^k,\tilde w^k,\xi^k\bigr) \ge 0,
$$
which is just (58). Thus, the proof is complete.
Lemma 6.2 If the new iterate $w^{k+1}$ is updated by (20), then we have
$$
\bigl\langle w-\hat w^k, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle + \frac{1}{2}\bigl(\bigl\|w-w^k\bigr\|_G^2 - \bigl\|w-w^{k+1}\bigr\|_G^2\bigr) \ge 0 \quad \forall w\in\mathcal{W}.
$$

Proof To begin with, we separate the term $\langle w-\hat w^k, \gamma\alpha_k^* F(\hat w^k)\rangle$ into two parts as
$$
\bigl\langle w-\hat w^k, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle = \bigl\langle w^{k+1}-\hat w^k, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle + \bigl\langle w-w^{k+1}, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle. \qquad (61)
$$
In the sequel, we deal with these two terms separately. Since $w^{k+1} \in \mathcal{W}$, we have, by substituting $w$ with $w^{k+1}$ in (56),
$$
\begin{aligned}
\bigl\langle w^{k+1}-\hat w^k, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle
&\ge \bigl\langle w^{k+1}-\hat w^k, \gamma\alpha_k^* \hat d_1\bigl(w^k,\hat w^k,\xi^k\bigr)\bigr\rangle\\
&= \bigl\langle w^{k+1}-\hat w^k, \gamma\alpha_k^* d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\rangle\\
&= \bigl\langle w^k-\hat w^k, \gamma\alpha_k^* d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\rangle - \bigl\langle w^k-w^{k+1}, \gamma\alpha_k^* d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\rangle.
\end{aligned} \qquad (62)
$$
Recall that the first term on the right-hand side of (62) was calculated in (60). As to the second term, we have
$$
\bigl\langle w^k-w^{k+1}, \gamma\alpha_k^* d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\rangle = \bigl\langle w^k-w^{k+1}, G\bigl(\gamma\alpha_k^* G^{-1} d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr)\bigr\rangle \le \frac{1}{2}\bigl\|w^k-w^{k+1}\bigr\|_G^2 + \frac{1}{2}\gamma^2\bigl(\alpha_k^*\bigr)^2\bigl\|G^{-1}d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\|_G^2.
$$
Thus, we obtain
$$
\bigl\langle w^{k+1}-\hat w^k, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle \ge \gamma\alpha_k^*\varphi\bigl(w^k,\tilde w^k,\xi^k\bigr) - \frac{1}{2}\gamma^2\bigl(\alpha_k^*\bigr)^2\bigl\|G^{-1}d_1\bigl(w^k,\tilde w^k,\xi^k\bigr)\bigr\|_G^2 - \frac{1}{2}\bigl\|w^k-w^{k+1}\bigr\|_G^2 = \frac{1}{2}\gamma(2-\gamma)\alpha_k^*\varphi\bigl(w^k,\tilde w^k,\xi^k\bigr) - \frac{1}{2}\bigl\|w^k-w^{k+1}\bigr\|_G^2, \qquad (63)
$$
where the equality follows from (21). Now, we turn to the second term $\langle w-w^{k+1}, \gamma\alpha_k^* F(\hat w^k)\rangle$ in (61). Since $w^{k+1}$ is updated by (20), it is the projection of $w^k - \gamma\alpha_k^* G^{-1} F(\hat w^k)$ onto $\mathcal{W}$ under the G-norm. It follows from (7) that
$$
\bigl\langle w^k - \gamma\alpha_k^* G^{-1} F\bigl(\hat w^k\bigr) - w^{k+1}, G\bigl(w-w^{k+1}\bigr)\bigr\rangle \le 0 \quad \forall w\in\mathcal{W}.
$$
As a consequence, we have
$$
\bigl\langle w-w^{k+1}, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle \ge \bigl\langle w-w^{k+1}, G\bigl(w^k-w^{k+1}\bigr)\bigr\rangle.
$$
By applying the formula $\langle a, Gb\rangle = \frac{1}{2}(\|a\|_G^2 - \|a-b\|_G^2 + \|b\|_G^2)$ to the right-hand side of the last inequality, we derive that
$$
\bigl\langle w-w^{k+1}, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle \ge \frac{1}{2}\bigl(\bigl\|w-w^{k+1}\bigr\|_G^2 - \bigl\|w-w^k\bigr\|_G^2\bigr) + \frac{1}{2}\bigl\|w^k-w^{k+1}\bigr\|_G^2. \qquad (64)
$$
By incorporating inequalities (63) and (64) into Eq. (61), the assertion of Lemma 6.2 follows.

Having the same key inequality for both methods, the O(1/t) rate of convergence (in an ergodic sense) can be obtained easily.

Theorem 6.1 For any integer t > 0, we have a $\hat w_t \in \mathcal{W}$ that satisfies
$$
\bigl\langle \hat w_t - w, F(w)\bigr\rangle \le \frac{\|w-w^0\|_G^2}{2\gamma\Upsilon_t} \quad \forall w\in\mathcal{W},
$$
where
$$
\hat w_t = \frac{1}{\Upsilon_t}\sum_{k=0}^t \alpha_k^*\hat w^k \qquad\text{and}\qquad \Upsilon_t = \sum_{k=0}^t \alpha_k^*.
$$
Proof In Lemmas 6.1 and 6.2, we have proved the same key inequality for both methods, namely,
$$
\bigl\langle w-\hat w^k, \gamma\alpha_k^* F\bigl(\hat w^k\bigr)\bigr\rangle + \frac{1}{2}\bigl(\bigl\|w-w^k\bigr\|_G^2 - \bigl\|w-w^{k+1}\bigr\|_G^2\bigr) \ge 0 \quad \forall w\in\mathcal{W}.
$$
Since F is monotone, we have
$$
\bigl\langle w-\hat w^k, \gamma\alpha_k^* F(w)\bigr\rangle + \frac{1}{2}\bigl(\bigl\|w-w^k\bigr\|_G^2 - \bigl\|w-w^{k+1}\bigr\|_G^2\bigr) \ge 0 \quad \forall w\in\mathcal{W},
$$
or, equivalently,
$$
\bigl\langle \hat w^k-w, \gamma\alpha_k^* F(w)\bigr\rangle + \frac{1}{2}\bigl(\bigl\|w-w^{k+1}\bigr\|_G^2 - \bigl\|w-w^k\bigr\|_G^2\bigr) \le 0 \quad \forall w\in\mathcal{W}.
$$
Taking the sum of the above inequalities over k = 0, ..., t, we obtain
$$
\Bigl\langle \sum_{k=0}^t \alpha_k^*\hat w^k - \Bigl(\sum_{k=0}^t \alpha_k^*\Bigr)w, F(w)\Bigr\rangle + \frac{1}{2\gamma}\bigl(\bigl\|w-w^{t+1}\bigr\|_G^2 - \bigl\|w-w^0\bigr\|_G^2\bigr) \le 0.
$$
By dropping the term $\|w-w^{t+1}\|_G^2$ and incorporating $\Upsilon_t$ and $\hat w_t$ into the above inequality, we have
$$
\bigl\langle \hat w_t - w, F(w)\bigr\rangle \le \frac{\|w-w^0\|_G^2}{2\gamma\Upsilon_t} \quad \forall w\in\mathcal{W}.
$$
Hence, the proof is complete.

Since it follows from (50) that
$$
\Upsilon_t = \sum_{k=0}^t \alpha_k^* \ge (t+1)c_0,
$$
we have, by Theorem 6.1,
$$
\bigl\langle \hat w_t - w, F(w)\bigr\rangle \le \frac{\|w-w^0\|_G^2}{2\gamma\Upsilon_t} \le \frac{\|w-w^0\|_G^2}{2\gamma c_0(t+1)} \quad \forall w\in\mathcal{W}.
$$
According to (53), the above inequality immediately implies the O(1/t) rate of convergence. We emphasize that our convergence rate is in the ergodic sense. From a theoretical point of view, this suggests using a larger parameter γ ∈ (0, 2) in implementations.
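The ergodic point ŵ_t of Theorem 6.1 is simply the αk*-weighted average of the ŵ^k, and the lower bound Υ_t ≥ (t + 1)c_0 from (50) is what turns the bound into an O(1/t) rate; a minimal sketch with made-up scalar data:

```python
# Weighted ergodic average: w_hat_t = (1/Upsilon_t) * sum_k alpha_k * w_hat_k,
# with Upsilon_t = sum_k alpha_k and every alpha_k >= c0 > 0.
alphas = [0.6, 0.8, 0.7, 0.9]          # stand-ins for alpha_k^*, all >= c0
w_hats = [1.0, 0.5, 0.25, 0.125]       # stand-ins for (scalar) w_hat^k
c0 = 0.5

upsilon = sum(alphas)
w_erg = sum(a * w for a, w in zip(alphas, w_hats)) / upsilon

t = len(alphas) - 1
assert upsilon >= (t + 1) * c0               # Upsilon_t >= (t+1) c0
assert min(w_hats) <= w_erg <= max(w_hats)   # the average stays in the convex hull
```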
7 Conclusions

Attracted by the practical efficiency of the alternating direction method of multipliers, an alternating-direction-based contraction method was proposed in [16]. The
new method deals with the general separable and linearly constrained convex optimization problem, where the objective function is separable into finitely many parts. However, it requires the exact solution of the ADMM subproblems, which limits its applicability. To overcome this limitation, this paper presented two inexact alternating-direction-based contraction methods. These methods are practically more viable, since the subproblems are solved only inexactly. The convergence properties and complexity results (an O(1/t) rate of convergence) of the proposed methods were derived. We emphasize that, even for the simplest case, where the objective function is separable into two parts, our methods differ from the common inexact methods in the literature [19, 20], as our subproblems for computing the search directions do not include proximal terms of any form. In addition, the complexity results, which are proved in the framework of variational inequalities, are new for this kind of inexact alternating direction methods.

Acknowledgements The authors wish to thank Prof. Panos M. Pardalos (the associate editor) and Prof. Cornelis Roos (Delft University of Technology) for useful comments and suggestions. G. Gu was supported by the NSFC grant 11001124, B. He by the NSFC grant 91130007, and J. Yang by the NSFC grant 11371192.
References

1. Floudas, C.A., Pardalos, P.M. (eds.): Encyclopedia of Optimization, 2nd edn. Springer, Berlin (2009)
2. Pardalos, P.M., Resende, M.G.C. (eds.): Handbook of Applied Optimization. Oxford University Press, Oxford (2002)
3. Ng, M.K., Weiss, P., Yuan, X.: Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods. SIAM J. Sci. Comput. 32(5), 2710–2736 (2010)
4. Yang, J., Zhang, Y.: Alternating direction algorithms for ℓ1-problems in compressive sensing. SIAM J. Sci. Comput. 33(1), 250–278 (2011)
5. Wen, Z., Goldfarb, D., Yin, W.: Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Program. Comput. 2(3–4), 203–230 (2010)
6. He, B., Xu, M., Yuan, X.: Solving large-scale least squares semidefinite programming by alternating direction methods. SIAM J. Matrix Anal. Appl. 32(1), 136–152 (2011)
7. Yuan, X.: Alternating direction method for covariance selection models. J. Sci. Comput. 51(2), 261–273 (2012)
8. Yuan, X., Yang, J.: Sparse and low-rank matrix decomposition via alternating direction method. Pac. J. Optim. 9(1), 167–180 (2013)
9. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Computer Science and Applied Mathematics. Academic Press/Harcourt Brace Jovanovich Publishers, New York (1982)
10. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, New York (2006)
11. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
12. Chan, R.H., Yang, J., Yuan, X.: Alternating direction method for image inpainting in wavelet domains. SIAM J. Imaging Sci. 4(3), 807–826 (2011)
13. Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci. 2(2), 323–343 (2009)
14. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
15. Douglas, J. Jr., Rachford, H.H. Jr.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
16. He, B., Tao, M., Xu, M., Yuan, X.: An alternating direction-based contraction method for linearly constrained separable convex programming problems. Optimization 62(4), 573–596 (2013)
17. Blum, E., Oettli, W.: Mathematische Optimierung: Grundlagen und Verfahren. Ökonometrie und Unternehmensforschung, No. XX. Springer, Berlin (1975)
18. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Upper Saddle River (1989)
19. He, B., Liao, L., Qian, M.: Alternating projection based prediction-correction methods for structured variational inequalities. J. Comput. Math. 24(6), 693–710 (2006)
20. He, B., Liao, L., Han, D., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Program., Ser. A 92(1), 103–118 (2002)
21. Chen, G., Teboulle, M.: A proximal-based decomposition method for convex minimization problems. Math. Program., Ser. A 64(1), 81–101 (1994)
22. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
23. Teboulle, M.: Convergence of proximal-like algorithms. SIAM J. Optim. 7(4), 1069–1083 (1997)
24. Tseng, P.: Alternating projection-proximal methods for convex programming and variational inequalities. SIAM J. Optim. 7(4), 951–965 (1997)
25. He, B., Xu, M.: A general framework of contraction methods for monotone variational inequalities. Pac. J. Optim. 4(2), 195–212 (2008)
26. Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
27. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. I. Springer Series in Operations Research. Springer, New York (2003)
28. Cai, X., Gu, G., He, B.: On the O(1/t) convergence rate of the projection and contraction methods for variational inequalities with Lipschitz continuous monotone operators. Comput. Optim. Appl. (2013). doi:10.1007/s10589-013-9599-7