
First- and Second-Order Necessary Conditions via Exact Penalty Functions

K. W. Meng
School of Economics and Management, Southwest Jiaotong University, Chengdu 610031, China
Email: [email protected]

X. Q. Yang
Department of Applied Mathematics, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Email: [email protected]

November 18, 2013

Abstract: In this paper we study first- and second-order necessary conditions for nonlinear programming problems from the viewpoint of exact penalty functions. By applying the variational description of regular subgradients, we first establish necessary and sufficient conditions for a penalty term to be of KKT-type by using the regular subdifferential of the penalty term. In terms of the kernel of the subderivative of the penalty term, we also present sufficient conditions for a penalty term to be of KKT-type. We then derive a second-order necessary condition by assuming a second-order constraint qualification which requires that the second-order linearized tangent set is included in the closed convex hull of the kernel of the parabolic subderivative of the penalty term. In particular, for an $l_{2/3}$ penalty term, by assuming the nonpositiveness of a sum of a second-order derivative and a third-order derivative of the original data and applying a third-order Taylor expansion, we obtain the second-order necessary condition.

Keywords: nonlinear programming problem, exact penalty function, KKT condition, second-order necessary condition, subderivative, regular subdifferential.

1 Introduction

Consider the nonlinear programming problem

\[
\text{(NLP)} \qquad \min f(x) \quad \text{s.t.} \quad g_i(x) \le 0,\ i \in I := \{1, 2, \dots, m\}, \qquad h_j(x) = 0,\ j \in J := \{m+1, m+2, \dots, m+q\},
\]

where the functions $f, g_i, h_j : \mathbb{R}^n \to \mathbb{R}$ are assumed to be twice continuously differentiable. Throughout this paper, let $C$ be the feasible set of (NLP) and let the Lagrange function $L : \mathbb{R}^n \times \mathbb{R}^{m+q} \to \mathbb{R}$ be defined by

\[
L(x, \lambda) := f(x) + \sum_{i \in I} \lambda_i g_i(x) + \sum_{j \in J} \lambda_j h_j(x).
\]

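We say that the KKT condition, originated with Karush (1939) and Kuhn and Tucker (1951), holds at a local minimum $\bar x$ of (NLP) if there exists a vector $\lambda \in \mathbb{R}^{m+q}$, called a KKT multiplier, such that

\[
\nabla_x L(\bar x, \lambda) = 0, \qquad \lambda_i \ge 0, \quad \lambda_i g_i(\bar x) = 0 \quad \forall i \in I.
\]

Denote the set of all KKT multipliers at $\bar x$ by $\mathrm{KKT}(\bar x)$ and the critical cone at $\bar x$ by

\[
V(\bar x) := \left\{ w \in \mathbb{R}^n \;\middle|\;
\begin{array}{l}
\langle \nabla f(\bar x), w \rangle \le 0, \\
\langle \nabla g_i(\bar x), w \rangle \le 0 \ \ \forall i \in I \text{ with } g_i(\bar x) = 0, \\
\langle \nabla h_j(\bar x), w \rangle = 0 \ \ \forall j \in J
\end{array}
\right\}.
\]

We say that the second-order necessary condition (for short, SON), originated with Ioffe (1979), holds at a local minimum $\bar x$ of (NLP) if

\[
\sup_{\lambda \in \mathrm{KKT}(\bar x)} \langle w, \nabla^2_{xx} L(\bar x, \lambda) w \rangle \ge 0 \qquad \forall w \in V(\bar x), \tag{1}
\]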

where the convention $\sup \emptyset := -\infty$ is used. Note that if the SON condition (1) holds at $\bar x$, then the KKT condition holds at $\bar x$, i.e., $\mathrm{KKT}(\bar x) \neq \emptyset$.

It is well known that the KKT or SON conditions are not necessarily fulfilled at local minima of (NLP) unless some regularity conditions are assumed. If the assumed regularity conditions rely on the constraint functions only, they are more often referred to as constraint qualifications (for short, CQs). It is well known that the Guignard constraint qualification (for short, GCQ), originated with Guignard (1969), is the weakest one in the sense that the GCQ holds at a feasible point $\bar x$ of (NLP) if and only if the KKT condition holds at $\bar x$ for any objective function having a local minimum at $\bar x$ subject to the same constraints; see Gould and Tolle (1971) for the original version of this result, and Theorem 6.11 of Rockafellar and Wets (1998) for new features of this result.

The SON conditions in the form of (1) (i.e., depending on the entire KKT multiplier set) have been extensively investigated in, e.g., Ben-Tal (1980); Ben-Tal and Zowe (1982); Rockafellar (1989); Ioffe (1989); Burke and Poliquin (1992). As for the second-order CQs ensuring the SON conditions of type (1), it is worth mentioning the MFCQ (see Mangasarian and Fromovitz (1967)) and the second-order Guignard constraint qualification (for short, SGCQ), see Rockafellar (1988) and Kawasaki (1988) respectively. Note that the SGCQ at the particular direction 0 is identical with the GCQ. Concerning the classical second-order necessary conditions depending on some fixed KKT multiplier, we refer the reader to Fiacco and McCormick (1968); Anitescu (2000); Baccari and Trad (2005); Andreani et al. (2010). In particular, Arutyunov (1991) showed by a counterexample that the MFCQ alone is not sufficient for the classical second-order necessary condition to hold, see also Anitescu (2000). Later on, Baccari and Trad (2005) established this type of second-order necessary condition under the MFCQ plus the active constraint rank deficiency being 1. We refer the reader to Bonnans and Shapiro (2000) for detailed bibliographical discussions on the developments in the theory of second-order conditions.

There is another type of regularity condition in the literature, which in contrast relies not only on the constraint functions but also on the objective function. Regularity conditions of this type are often related to an exact penalty function of (NLP). To be precise, we consider the $l_p$ ($0 \le p \le 1$) penalty function of (NLP) as follows:

\[
F_p(x) := f(x) + \mu S^p(x) \qquad \forall x \in \mathbb{R}^n,
\]

where the nonnegative real number $\mu$ is the penalty parameter, the function $S$ is defined by

\[
S(x) = \sum_{i \in I} \max\{g_i(x), 0\} + \sum_{j \in J} |h_j(x)| \qquad \forall x \in \mathbb{R}^n,
\]

and the function $S^p(x) := (S(x))^p$ is the penalty term with the convention $0^0 := 0$ in the case of $p = 0$. When $p = 1$, $F_p(x)$ reduces to the classical $l_1$ penalty function, which dates back to Eremin (1967) and Zangwill (1967), and has been investigated by many researchers, e.g., Pietrzykowski (1969), Clarke (1983), Burke (1991a,b), and Rockafellar (1989). When $0 \le p < 1$, $F_p(x)$ is often referred to as the lower order $l_p$ penalty function, which was introduced by Luo et al. (1996) for the study of mathematical programs with equilibrium constraints, and was rediscovered from nonlinear Lagrangian and unified augmented Lagrangian schemes by Rubinov and Yang (2003) and Huang and Yang (2003) respectively. Another application of the $l_p$ penalty function $F_p(x)$ is given in the regularization of sparse optimization for compressed sensing theory, where $f(x)$ is the data error (a quadratic function) and $S(x)$ is the $l_p$ norm defined by $\left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$ measuring the sparsity of the unknown image to be estimated, see Chartrand and Staneva (2008); Wright et al. (2009). It is worth noting that these structured penalty/regularized problems are special cases of the composite optimization model investigated in Lewis and Wright (2011), where the convergence analysis has been established by virtue of active constraint identification.
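As a concrete reading of the definitions of $S$ and $F_p$ above, the following minimal Python sketch evaluates both for a toy problem. The problem data (the objective, one inequality and one equality constraint) and the function names `violation` and `lp_penalty` are illustrative assumptions and do not come from the paper.

```python
def violation(gs, hs, x):
    """S(x): sum of max{g_i(x), 0} over inequalities plus sum of |h_j(x)| over equalities."""
    return sum(max(g(x), 0.0) for g in gs) + sum(abs(h(x)) for h in hs)


def lp_penalty(f, gs, hs, x, mu, p):
    """F_p(x) = f(x) + mu * S(x)**p for 0 <= p <= 1, with the convention 0**0 := 0."""
    s = violation(gs, hs, x)
    if p == 0:
        term = 0.0 if s == 0.0 else 1.0  # S^0 is 0 on the feasible set and 1 elsewhere
    else:
        term = s ** p
    return f(x) + mu * term


# Hypothetical toy data (an assumption for illustration only):
# minimise x1 + x2 subject to x1^2 + x2^2 - 1 <= 0 and x1 - x2 = 0.
def f(x):
    return x[0] + x[1]


gs = [lambda x: x[0] ** 2 + x[1] ** 2 - 1.0]
hs = [lambda x: x[0] - x[1]]

if __name__ == "__main__":
    for p in (0, 0.5, 1):
        print(p, lp_penalty(f, gs, hs, (2.0, 0.0), mu=3.0, p=p))
```

At a feasible point the penalty term vanishes for every $p$, so all the $F_p$ agree with $f$ there; they differ only in how strongly they penalize small violations.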

Besides the penalty term $S^p$, we consider any lower semicontinuous function $\phi : \mathbb{R}^n \to \mathbb{R}_+ \cup \{+\infty\}$ with the property $C = \{x \in \mathbb{R}^n \mid \phi(x) = 0\}$ as a general penalty term associated with (NLP). Corresponding to such a penalty term, there is a penalty function of the form $f + \mu\phi$ associated with (NLP). It is clear that the penalty function $f + \mu\phi$ includes all the $l_p$ ($0 \le p \le 1$) penalty functions as special cases. The penalty function $f + \mu\phi$, as well as the penalty function $F_p$, is said to be exact at a local minimum $\bar x$ of (NLP) if it has a local minimum at $\bar x$ for all sufficiently large values of the penalty parameter. By definition, the penalty function $F_0$ is exact at any local minimum of (NLP). It was shown in Rubinov and Yang (2003) that the penalty function $F_p$ with $0 < p \le 1$ is exact if and only if the following generalized calmness-type condition holds:

\[
\liminf_{u \to 0} \frac{\beta(u) - \beta(0)}{\|u\|^p} > -\infty, \tag{2}
\]

where $u \in \mathbb{R}^{m+q}$ and $\beta(u)$ is the perturbation function defined to be the optimal value of the optimization problem

\[
\min f(x) \quad \text{s.t.} \quad g_i(x) \le u_i,\ i \in I, \qquad h_j(x) = u_j,\ j \in J.
\]

When $p = 1$, this result was established in Clarke (1983) and Burke (1991a). We refer the reader to the excellent survey paper by Burke (1991b) for a comprehensive investigation of the central roles that the exact penalty function $F_1$ plays in constrained optimization.

It is well known that both the KKT condition and the SON condition hold at $\bar x$ if the penalty function $F_1$ is exact at $\bar x$, see in particular Clarke (1983) and Rockafellar (1989). But for $0 \le p < 1$, the KKT condition may not hold at $\bar x$ even if the penalty function $F_p$ is exact at $\bar x$. However, by applying Farkas' Lemma and by estimating Dini upper-directional derivatives of $F_p(x)$ using the tools of (generalized) second-order Taylor expansions, Yang and Meng (2007) showed that it is possible to derive KKT conditions from lower order exact penalty functions when the constraint functions satisfy some additional conditions. In presenting these additional conditions, the kernel consisting of the directions at which the Dini upper-directional derivative of the penalty term $S^p$ vanishes plays a key role. Later on, Meng and Yang (2010) presented some weaker conditions by using the kernel of the contingent derivative of the penalty term $S^p$.
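The statement that exactness of $F_p$ with $p < 1$ need not yield the KKT condition can be illustrated numerically. The sketch below uses the standard toy problem $\min x$ subject to $x^2 \le 0$, which is an assumption chosen for illustration and is not taken from the paper. Its only feasible point is $\bar x = 0$, where $\nabla g(\bar x) = 0$, so no KKT multiplier exists, while $F_{1/2}(x) = x + \mu|x|$ has a minimum at $0$ for $\mu \ge 1$. The grid-based check is a heuristic probe, not a proof.

```python
def F(x, mu, p):
    """l_p penalty function for: min x  s.t.  x^2 <= 0 (toy problem, assumed for illustration)."""
    s = max(x * x, 0.0)  # S(x) = max{g(x), 0} with g(x) = x^2
    return x + mu * s ** p


def is_local_min_at_zero(mu, p, radius=1e-3, n=2000):
    """Heuristic check on a fine grid around 0 that F(., mu, p) is minimised at 0."""
    xs = [radius * k / n for k in range(-n, n + 1)]
    return all(F(x, mu, p) >= F(0.0, mu, p) for x in xs)


def kkt_residual(lam):
    """|grad f(0) + lam * grad g(0)| with grad f(0) = 1 and grad g(0) = 0."""
    return abs(1.0 + lam * 0.0)


if __name__ == "__main__":
    print("F_{1/2} minimised at 0 (mu=2):", is_local_min_at_zero(2.0, 0.5))
    print("F_1 minimised at 0 (mu=10):", is_local_min_at_zero(10.0, 1.0))
    print("F_1 minimised at 0 (mu=1000):", is_local_min_at_zero(1000.0, 1.0))
    print("KKT residual for lam = 0, 1, 100:", [kkt_residual(l) for l in (0.0, 1.0, 100.0)])
```

In this toy problem $F_1(x) = x + \mu x^2$ is negative on $(-1/\mu, 0)$ for every $\mu > 0$, so $F_1$ is not exact, in line with the fact that exactness of $F_1$ would force the KKT condition.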

The first aim of the paper is to study refinements of the first-order necessary conditions for (NLP) obtained in Yang and Meng (2007) and Meng and Yang (2010) by using the exact penalty functions $F_p$. We say that the penalty term $\phi$ (resp. the penalty term $S^p$) is of KKT-type at a local minimum $\bar x$ of (NLP) if it has the ability of detecting KKT conditions in the sense that the KKT condition holds at $\bar x$ whenever the penalty function $f + \mu\phi$ (resp. the penalty function $F_p$) is exact at $\bar x$. By definition, $S$ is of KKT-type at $\bar x$, and so is $S^0$ if and only if the GCQ holds at $\bar x$. We begin by showing an auxiliary result which asserts that the polar cones of the subderivative kernel and the subderivative domain of an extended-real-valued function at a local minimum are closely related to its regular subdifferential at this local minimum. We then establish necessary and sufficient conditions for the general penalty term $\phi$, in particular the $l_p$ penalty term $S^p$, to be of KKT-type by virtue of the regular subdifferentials of the penalty terms. In our proof, the variational description of regular subgradients (Rockafellar and Wets (1998), Proposition 8.5) plays a key role. In particular, when $p = \frac12$, we show that a sufficient condition for the penalty term $S^p$ to be of KKT-type can be equivalently expressed by the gradients and Hessians of the original data of (NLP).

The second aim of this paper is to derive the SON conditions for (NLP) from the exactness of the penalty function $f + \mu\phi$. This is done by assuming a second-order constraint qualification which requires that the second-order linearized tangent set is included in the closed convex hull of the kernel of the parabolic subderivative of the penalty term, and by applying the duality theorem of linear programming. In particular, by assuming that the problem data are three times continuously differentiable and applying third-order Taylor expansions, we obtain specific forms of this second-order constraint qualification for the case of $\phi = S^p$. For example, for (NLP) with inequality constraints only and $p = \frac23$, we require a sum of a second-order derivative and a third-order derivative of the constraint functions to be nonpositive, and for (NLP) with $p \in (\frac23, 1]$, we require no additional condition. It is easy to see that the exactness of $F_1$ implies that of $F_p$ if $p < 1$. Hence our result in the case of $p \in (\frac23, 1]$ extends the earlier ones obtained by Clarke (1983) and Rockafellar (1989) that both the KKT condition and the SON condition hold at $\bar x$ if the penalty function $F_1$ is exact at $\bar x$. It is also worth noting that we retrieve the SGCQ when $p = 0$.

The distinguishing characteristic of our conditions (cf. Proposition 2.1 and Theorem 3.2 below) is that second- and third-order derivative information is involved when the corresponding first- and second-order conditions are established, respectively. This characteristic may play a key role when the penalty functions $F_p$ are applied to degenerate optimization problems (such as MPCC and MPEC). In recent years, many conditions with similar characteristics have been extensively investigated when studying first- and second-order conditions by using the notion of 2-regularity, see Izmailov and Solodov (2001); Arutyunov et al. (2008). It is worth noting that 2-regularity implies the exactness of $F_{1/2}$, see Izmailov and Solodov (2001). However, 2-regularity alone is still not able to derive a first- or second-order necessary condition, but it can attain prime first- and second-order necessary conditions involving second- and third-order derivative information respectively for (NLP), see Arutyunov et al. (2008).

The notation that we employ in this paper is for the most part borrowed from the book by Rockafellar and Wets (1998). Let $\overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ and let $\mathbb{R}_+ := \{t \in \mathbb{R} \mid t \ge 0\}$. For vectors $x, y$ in $\mathbb{R}^n$, we denote by $x^T$ the transpose of $x$, by $\langle x, y \rangle$ the inner product of $x$ and $y$, by $x^\perp := \{v \mid \langle v, x \rangle = 0\}$ the orthogonal complement of $x$, and by $\|x\|$ the Euclidean norm of $x$. For any function $f : \mathbb{R}^n \to \mathbb{R}_+ \cup \{+\infty\}$ and any $p > 0$, let $f^p(x) = (f(x))^p$ for all $x \in \mathbb{R}^n$ with the convention that $(+\infty)^p = +\infty$. For a given subset $A$ of $\mathbb{R}^n$, we denote the closure of $A$, the interior of $A$, the boundary of $A$ and the convex hull of $A$ respectively by $\mathrm{cl}A$, $\mathrm{int}A$, $\mathrm{bd}A$ and $\mathrm{conv}A$. The polar cone of $A$ is defined by $A^* = \{v \in \mathbb{R}^n \mid \langle v, x \rangle \le 0 \ \forall x \in A\}$. The positive hull of $A$ is defined by $\mathrm{pos}A = \{\lambda x \mid x \in A, \lambda \ge 0\}$. The horizon cone of $A$, representing the direction set of $A$, is defined by $A^\infty = \{x \in \mathbb{R}^n \mid \exists x_k \in A, \exists \lambda_k \to 0+ \text{ with } \lambda_k x_k \to x\}$. The distance function to $A$, written as $d_A(\cdot)$, is defined by

\[
d_A(x) := \inf_{y \in A} \|x - y\|.
\]

The indicator function of $A$ is defined by

\[
\delta_A(x) := \begin{cases} 0 & \text{if } x \in A, \\ +\infty & \text{otherwise.} \end{cases}
\]

If $A$ is empty, we set by convention

\[
A^* = \mathbb{R}^n, \quad \mathrm{pos}A = \{0\}, \quad A^\infty = \{0\}, \quad d_A(\cdot) = +\infty, \quad \text{and} \quad \delta_A(\cdot) = +\infty.
\]

The variational geometry of $A$ at some $\bar x \in A$ can be captured by a number of notions that have been investigated in Chapters 6 and 13 of Rockafellar and Wets (1998). Some of them are as follows.

(i) A vector $w \in \mathbb{R}^n$ belongs to the tangent cone $T_A(\bar x)$ to $A$ at $\bar x$, if there are sequences $t_k \to 0+$ and $w_k \to w$ such that $\bar x + t_k w_k \in A$ for all $k$.

(ii) The regular normal cone $\widehat{N}_A(\bar x)$ to $A$ at $\bar x$ is the polar cone of $T_A(\bar x)$.

(iii) A vector $z \in \mathbb{R}^n$ belongs to the second-order tangent set to $A$ at $\bar x$ for a vector $w \in T_A(\bar x)$, written as $z \in T_A^2(\bar x \mid w)$, if there are sequences $t_k \to 0+$ and $z_k \to z$ such that $\bar x + t_k w + \frac12 t_k^2 z_k \in A$ for all $k$. When $w \notin T_A(\bar x)$, we interpret $T_A^2(\bar x \mid w)$ as an empty set.

Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be an extended-real-valued function. The effective domain of $f$ is the set

\[
\mathrm{dom} f := \{x \in \mathbb{R}^n \mid f(x) < +\infty\},
\]

the kernel of $f$ is the set

\[
\mathrm{ker} f := \{x \in \mathbb{R}^n \mid f(x) = 0\},
\]

and the epigraph of $f$ is the set

\[
\mathrm{epi} f := \{(x, \alpha) \in \mathbb{R}^n \times \mathbb{R} \mid \alpha \ge f(x)\}.
\]

The function $f$ is said to be lower semicontinuous if $\mathrm{epi} f$ is closed in $\mathbb{R}^n \times \mathbb{R}$. Moreover, $f$ is said to be positively homogeneous if $0 \in \mathrm{dom} f$ and $f(\lambda x) = \lambda f(x)$ for all $x$ and all $\lambda > 0$, and it is sublinear if in addition $f(x + x') \le f(x) + f(x')$ for all $x$ and $x'$.

Let $\bar x$ be a point with $f(\bar x)$ finite. The notions of subgradients and subderivatives that we need throughout the paper are summarized as follows.

(i) The vector $v \in \mathbb{R}^n$ is a regular subgradient of $f$ at $\bar x$, written $v \in \widehat{\partial} f(\bar x)$, if

\[
f(x) \ge f(\bar x) + \langle v, x - \bar x \rangle + o(\|x - \bar x\|).
\]

(ii) For any $w \in \mathbb{R}^n$, the subderivative of $f$ at $\bar x$ for $w$ is defined by

\[
df(\bar x)(w) := \liminf_{\tau \to 0+,\ w' \to w} \frac{f(\bar x + \tau w') - f(\bar x)}{\tau}.
\]

(iii) For any vector $w$ with $df(\bar x)(w)$ finite and $z \in \mathbb{R}^n$, the parabolic subderivative of $f$ at $\bar x$ for $w$ with respect to $z$ is defined by

\[
d^2 f(\bar x)(w \mid z) := \liminf_{\tau \to 0+,\ z' \to z} \frac{f(\bar x + \tau w + \frac12 \tau^2 z') - f(\bar x) - \tau\, df(\bar x)(w)}{\frac12 \tau^2}.
\]

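For $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ and any point $\bar x$ with $f(\bar x)$ finite, the subderivative function $df(\bar x) : \mathbb{R}^n \to \overline{\mathbb{R}}$ is lower semicontinuous and positively homogeneous, and the regular subdifferential $\widehat{\partial} f(\bar x)$ is closed and convex. Furthermore, we have the following two formulas:

\[
\mathrm{epi}\, df(\bar x) = T_{\mathrm{epi} f}(\bar x, f(\bar x)), \tag{3}
\]

and

\[
\widehat{\partial} f(\bar x) = \{v \in \mathbb{R}^n \mid \langle v, w \rangle \le df(\bar x)(w) \ \ \forall w \in \mathrm{dom}\, df(\bar x)\}. \tag{4}
\]

These notions and results have been studied in Chapters 8 and 13 of Rockafellar and Wets (1998).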

2 Refinements of First-order Necessary Conditions

In this section, we study conditions under which penalty terms are of KKT-type in the following sense.

Definition 2.1 We say that the penalty term $\phi$ is of KKT-type at some feasible point $\bar x$ of (NLP) if the KKT condition holds at $\bar x$ whenever the penalty function $f + \mu\phi$ is exact at $\bar x$.

For any extended-real-valued function having a finite local minimum, it turns out that the kernel and the domain of its subderivative are closely related to its regular subdifferential, as can be seen from the following lemma.

Lemma 2.1 Suppose that the function $\psi : \mathbb{R}^n \to \overline{\mathbb{R}}$ has a local minimum at $\bar x$ with $\psi(\bar x)$ finite. Then we have

\[
[\mathrm{dom}\, d\psi(\bar x)]^* \subset \widehat{\partial}\psi(\bar x) \subset [\mathrm{ker}\, d\psi(\bar x)]^*. \tag{5}
\]

The first inclusion in (5) is an equality if and only if the regular subdifferential $\widehat{\partial}\psi(\bar x)$ is a cone, while the second inclusion in (5) is an equality if and only if $[\mathrm{dom}\, d\psi(\bar x)]^* = [\mathrm{ker}\, d\psi(\bar x)]^*$. Furthermore, if the subderivative $d\psi(\bar x)$ is a sublinear function, then

\[
\mathrm{cl}\,\mathrm{pos}(\widehat{\partial}\psi(\bar x)) = [\mathrm{ker}\, d\psi(\bar x)]^*. \tag{6}
\]

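Proof. Since $\psi$ has a local minimum at $\bar x$ with $\psi(\bar x)$ finite, it follows from Theorem 10.1 of Rockafellar and Wets (1998) that $0 \in \widehat{\partial}\psi(\bar x)$, or equivalently

\[
d\psi(\bar x)(w) \ge 0 \qquad \forall w \in \mathbb{R}^n. \tag{7}
\]

By definition, $v \in [\mathrm{dom}\, d\psi(\bar x)]^*$ if and only if

\[
\langle v, w \rangle \le 0 \qquad \forall w \in \mathrm{dom}\, d\psi(\bar x),
\]

which together with (7) implies that

\[
\langle v, w \rangle \le d\psi(\bar x)(w) \qquad \forall w \in \mathrm{dom}\, d\psi(\bar x),
\]

or equivalently by (4) that $v \in \widehat{\partial}\psi(\bar x)$. That is, the first inclusion in (5) holds. For any $v \in \widehat{\partial}\psi(\bar x)$, it follows from (4) that

\[
\langle v, w \rangle \le 0 \qquad \forall w \in \mathrm{ker}\, d\psi(\bar x),
\]

or equivalently $v \in [\mathrm{ker}\, d\psi(\bar x)]^*$. That is, the second inclusion in (5) is true.

By the definition of the regular normal cone, we have

\[
\widehat{N}_{\mathrm{epi}\psi}(\bar x, \psi(\bar x)) = T_{\mathrm{epi}\psi}(\bar x, \psi(\bar x))^*. \tag{8}
\]

In view of (3) and (8), we can get from Theorem 8.9 of Rockafellar and Wets (1998) that

\[
\widehat{\partial}\psi(\bar x)^\infty = [\mathrm{dom}\, d\psi(\bar x)]^*. \tag{9}
\]

If the first inclusion in (5) is an equality, then $\widehat{\partial}\psi(\bar x)$ is necessarily a cone. Conversely, if $\widehat{\partial}\psi(\bar x)$ is a cone, then by the definition of the horizon cone we have $\widehat{\partial}\psi(\bar x)^\infty = \mathrm{cl}\,\widehat{\partial}\psi(\bar x)$. Since $\widehat{\partial}\psi(\bar x)$ is closed, we thus have

\[
\widehat{\partial}\psi(\bar x)^\infty = \widehat{\partial}\psi(\bar x), \tag{10}
\]

which together with (9) implies that the first inclusion in (5) is an equality. If the second inclusion in (5) is an equality, then $\widehat{\partial}\psi(\bar x)$ is necessarily a cone or equivalently the first inclusion in (5) is an equality, implying that $[\mathrm{dom}\, d\psi(\bar x)]^* = [\mathrm{ker}\, d\psi(\bar x)]^*$. Conversely, if $[\mathrm{dom}\, d\psi(\bar x)]^* = [\mathrm{ker}\, d\psi(\bar x)]^*$, then the second inclusion in (5) is an equality.

Assume now that the subderivative function $d\psi(\bar x)$ is sublinear. In view of (7), we have

\[
\mathrm{ker}\, d\psi(\bar x) = \{w \in \mathbb{R}^n \mid d\psi(\bar x)(w) \le 0\},
\]

which implies by the sublinearity of $d\psi(\bar x)$ that $\mathrm{ker}\, d\psi(\bar x)$ is a closed and convex cone. By Theorem 8.9 of Rockafellar and Wets (1998), we have

\[
\widehat{N}_{\mathrm{epi}\psi}(\bar x, \psi(\bar x)) = \{\lambda(v, -1) \mid v \in \widehat{\partial}\psi(\bar x), \lambda > 0\} \cup \{(v, 0) \mid v \in \widehat{\partial}\psi(\bar x)^\infty\}. \tag{11}
\]

Let $w \in [\widehat{\partial}\psi(\bar x)]^*$ be arbitrarily given. For any $v \in \widehat{\partial}\psi(\bar x)$ and $\lambda > 0$, we have

\[
\langle (w, 0), \lambda(v, -1) \rangle = \lambda \langle w, v \rangle \le 0. \tag{12}
\]

For any $v' \in \widehat{\partial}\psi(\bar x)^\infty$, we obtain from (9) and the first inclusion in (5) that $v' \in \widehat{\partial}\psi(\bar x)$ and hence

\[
\langle (w, 0), (v', 0) \rangle = \langle w, v' \rangle \le 0. \tag{13}
\]

In view of (11), (12) and (13), we have

\[
(w, 0) \in \widehat{N}_{\mathrm{epi}\psi}(\bar x, \psi(\bar x))^*. \tag{14}
\]

In view of (3) and the sublinearity of $d\psi(\bar x)$, $T_{\mathrm{epi}\psi}(\bar x, \psi(\bar x))$ must be a closed and convex cone. This together with (14) and (8) implies that $(w, 0) \in T_{\mathrm{epi}\psi}(\bar x, \psi(\bar x))$. Then from (3) and (7), we have $d\psi(\bar x)(w) = 0$, i.e., $w \in \mathrm{ker}\, d\psi(\bar x)$. Since $w \in [\widehat{\partial}\psi(\bar x)]^*$ is arbitrarily given, we thus have

\[
[\widehat{\partial}\psi(\bar x)]^* \subset \mathrm{ker}\, d\psi(\bar x).
\]

The converse inclusion follows from (5) and the fact that $\mathrm{ker}\, d\psi(\bar x)$ is a closed and convex cone. That is, we have shown

\[
\mathrm{ker}\, d\psi(\bar x) = [\widehat{\partial}\psi(\bar x)]^*,
\]

or equivalently

\[
[\mathrm{ker}\, d\psi(\bar x)]^* = [\widehat{\partial}\psi(\bar x)]^{**} = \mathrm{cl}\,\mathrm{pos}(\widehat{\partial}\psi(\bar x)).
\]

This completes the proof. □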

Remark 2.1 If $\psi$ is regular at $\bar x$ (see Definition 7.25 of Rockafellar and Wets (1998)), as is true in particular when $\psi$ is convex and locally lower semicontinuous at $\bar x$ (see Example 7.27 of Rockafellar and Wets (1998)), then the subderivative $d\psi(\bar x)$ is necessarily a sublinear function, as can be seen from Theorem 8.18 and Corollary 8.19 of Rockafellar and Wets (1998). Example 2.1 below demonstrates that the closure operation cannot be removed from the left-hand side of (6) even if $\psi$ is convex. On the other hand, Example 2.2 below demonstrates that when the subderivative function $d\psi(\bar x)$ is not sublinear, the equality (6) may not hold even if the closure operation is kept on the left-hand side.

Example 2.1 Consider at $\bar x = (0, 0)^T$ the function $\psi(x) = \max_{0 \le t \le 1} g(x, t)$, where $g(x, t) = t x_1 + t^2 x_2$ for all $x \in \mathbb{R}^2$ and $t \in \mathbb{R}$. Since for each fixed $t$ the function $g(\cdot, t)$ is linear, $\psi$ is by definition a sublinear function and hence $\mathrm{epi}\psi$ is a closed and convex cone. In view of the definition of $d\psi(\bar x)$ and $\psi(\bar x) = 0$, we have $d\psi(\bar x) = \psi$. By simple calculations, we have

\[
\psi(x) = \begin{cases}
x_1 + x_2 & \text{if } x \in A_1, \\
0 & \text{if } x \in A_2, \\
-\dfrac{x_1^2}{4 x_2} & \text{if } x \in A_3,
\end{cases}
\]

where $A_1 = \{x \in \mathbb{R}^2 \mid x_1 + x_2 \ge 0,\ x_1 + 2x_2 \ge 0\}$, $A_2 = \{x \in \mathbb{R}^2 \mid x_1 + x_2 \le 0,\ x_1 \le 0\}$ and $A_3 = \{x \in \mathbb{R}^2 \mid x_1 + 2x_2 < 0,\ x_1 > 0\}$ (so that $x_2 < 0$ on $A_3$). From the above formula, it is clear that $\psi(x) \ge \psi(\bar x) = 0$ for all $x \in \mathbb{R}^2$. That is, the function $\psi$ attains its global minimum at $\bar x$. Since $d\psi(\bar x) = \psi$, we have

\[
\mathrm{ker}\, d\psi(\bar x) = \mathrm{ker}\psi = \{x \in \mathbb{R}^2 \mid \psi(x) = 0\} = A_2,
\]

and hence

\[
[\mathrm{ker}\, d\psi(\bar x)]^* = \{x \in \mathbb{R}^2 \mid 0 \le x_2 \le x_1\}.
\]

From the definition of $\psi$ and Theorem 10.31 of Rockafellar and Wets (1998), it follows that

\[
\widehat{\partial}\psi(\bar x) = \mathrm{conv}\{(t, t^2)^T \mid 0 \le t \le 1\} = \{x \in \mathbb{R}^2 \mid x_1^2 \le x_2 \le x_1\},
\]

and hence

\[
\mathrm{pos}(\widehat{\partial}\psi(\bar x)) = \{x \in \mathbb{R}^2 \mid 0 \le x_2 \le x_1\} \setminus \{x \in \mathbb{R}^2 \mid x_1 > 0,\ x_2 = 0\}.
\]

Therefore, the equality

\[
\mathrm{cl}\,\mathrm{pos}(\widehat{\partial}\psi(\bar x)) = [\mathrm{ker}\, d\psi(\bar x)]^*
\]

holds, but when the closure operation is removed from the left-hand side, we merely have

\[
\mathrm{pos}(\widehat{\partial}\psi(\bar x)) \subsetneq [\mathrm{ker}\, d\psi(\bar x)]^*.
\]
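The piecewise formula for $\psi$ in Example 2.1 can be sanity-checked numerically. The sketch below compares a brute-force maximization of $t x_1 + t^2 x_2$ over a grid of $t \in [0, 1]$ with the closed form, taking the third region to be the complement of $A_1 \cup A_2$, on which $x_1 > 0$ and $x_2 < 0$. The function names, grid sizes and sample points are illustrative assumptions.

```python
def psi_bruteforce(x1, x2, n=2001):
    """Approximate max over t in [0, 1] of t*x1 + t^2*x2 on a uniform grid of n points."""
    return max(t * x1 + t * t * x2 for t in (k / (n - 1) for k in range(n)))


def psi_closed_form(x1, x2):
    """Piecewise formula of Example 2.1."""
    if x1 + x2 >= 0 and x1 + 2 * x2 >= 0:  # region A1: maximum attained at t = 1
        return x1 + x2
    if x1 + x2 <= 0 and x1 <= 0:           # region A2: maximum attained at t = 0
        return 0.0
    # Remaining points satisfy x1 > 0 and x2 < 0: interior maximiser t = -x1 / (2 * x2)
    return -x1 * x1 / (4 * x2)


if __name__ == "__main__":
    points = [(-2 + 0.5 * i, -2 + 0.5 * j) for i in range(9) for j in range(9)]
    worst = max(abs(psi_bruteforce(a, b) - psi_closed_form(a, b)) for a, b in points)
    print("largest discrepancy on the sample grid:", worst)
```

The residual discrepancy comes only from the finite grid in $t$ and shrinks quadratically with the grid spacing on the region where the maximizer is interior.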

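Example 2.2 (An example on p. 316 of Rockafellar and Wets (1998)) Consider at $\bar x = (0, 0)^T$ the function

\[
\psi(x) = \begin{cases}
\dfrac{x_2^2}{|x_1|} & \text{if } x_1 \neq 0, \\
0 & \text{if } x = (0, 0)^T, \\
+\infty & \text{otherwise.}
\end{cases}
\]

It can be found in Rockafellar and Wets (1998) that $d\psi(\bar x) = \psi$ and $\widehat{\partial}\psi(\bar x) = \{(0, 0)^T\}$. By letting $y = (1, 1)^T$ and $z = (-1, 1)^T$, we have $\frac12 \psi(y) + \frac12 \psi(z) = 1 < \psi(\frac12 (y + z)) = +\infty$. This shows that $\psi$ is not convex and hence $d\psi(\bar x)$ is not a sublinear function. Moreover, we have

\[
\mathrm{dom}\, d\psi(\bar x) = \{x \in \mathbb{R}^2 \mid x_1 \neq 0\} \cup \{(0, 0)^T\}, \qquad \mathrm{ker}\, d\psi(\bar x) = \{x \in \mathbb{R}^2 \mid x_2 = 0\},
\]

and hence

\[
[\mathrm{dom}\, d\psi(\bar x)]^* = \widehat{\partial}\psi(\bar x) = \mathrm{cl}\,\mathrm{pos}(\widehat{\partial}\psi(\bar x)) \subsetneq [\mathrm{ker}\, d\psi(\bar x)]^*.
\]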

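By applying the variational description of regular subgradients, we now present some conditions for the general penalty term $\phi$ to be of KKT-type. Let $\bar x \in C$. Denote the active inequality index set of (NLP) at $\bar x$ by

\[
I(\bar x) := \{i \in I \mid g_i(\bar x) = 0\},
\]

and the first-order linearized tangent cone to $C$ at $\bar x$ by

\[
L_C(\bar x) := \left\{ w \in \mathbb{R}^n \;\middle|\;
\begin{array}{l}
\langle \nabla g_i(\bar x), w \rangle \le 0 \ \ \forall i \in I(\bar x), \\
\langle \nabla h_j(\bar x), w \rangle = 0 \ \ \forall j \in J
\end{array}
\right\}.
\]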

Theorem 2.1 Let $\bar x$ be a feasible point of (NLP) and let $\phi$ be a general penalty term of (NLP). Consider the following conditions:

(i) $[\mathrm{ker}\, d\phi(\bar x)]^* \subset L_C(\bar x)^*$.

(ii) $\widehat{\partial}\phi(\bar x) \subset L_C(\bar x)^*$.

(iii) $\phi$ is a KKT-type penalty term at $\bar x$.

Then (i) $\Longrightarrow$ (ii) $\Longleftrightarrow$ (iii).

Proof. [(i) $\Longrightarrow$ (ii)]: Note that $\phi$ attains a global minimum at $\bar x$. The implication follows immediately from Lemma 2.1.

[(ii) $\Longrightarrow$ (iii)]: Suppose that there is some $\mu \ge 0$ such that the penalty function $f(x) + \mu\phi(x)$ has a local minimum at $\bar x$. It then follows from Exercise 8.8 (c) and Theorem 10.1 of Rockafellar and Wets (1998) that

\[
0 \in \widehat{\partial}(f + \mu\phi)(\bar x) = \nabla f(\bar x) + \mu\,\widehat{\partial}\phi(\bar x). \tag{15}
\]

Thus, we have $-\nabla f(\bar x) \in \mu\,\widehat{\partial}\phi(\bar x)$. From (ii), it follows that $-\nabla f(\bar x) \in L_C(\bar x)^*$. This implies by Farkas' Lemma that the KKT condition holds at $\bar x$. Therefore, according to Definition 2.1, $\phi$ is a KKT-type penalty term at $\bar x$.

[(iii) $\Longrightarrow$ (ii)]: Let $v \in \widehat{\partial}\phi(\bar x)$. According to the variational description of regular subgradients in Proposition 8.5 of Rockafellar and Wets (1998), there exist a neighborhood $V$ of $\bar x$ and a continuously differentiable function $\psi : \mathbb{R}^n \to \mathbb{R}$ with $\psi(\bar x) = \phi(\bar x) = 0$ and $\nabla\psi(\bar x) = v$ such that

\[
\psi(x) \le \phi(x) \qquad \forall x \in V.
\]

Set $f = -\psi$. Clearly,

\[
f(x) + \phi(x) = -\psi(x) + \phi(x) \ge 0 = f(\bar x) + \phi(\bar x) \qquad \forall x \in V.
\]

That is, the penalty function $f + \phi$ attains a local minimum at $\bar x$. Since $\phi$ is a KKT-type penalty term at $\bar x$, the KKT condition holds at $\bar x$. By Farkas' Lemma again, we have $-\nabla f(\bar x) \in L_C(\bar x)^*$. Since $\nabla f(\bar x) = -\nabla\psi(\bar x) = -v$, we have $v \in L_C(\bar x)^*$. Therefore, we have $\widehat{\partial}\phi(\bar x) \subset L_C(\bar x)^*$. This completes the proof. □

From Theorem 2.1, we find that the subderivative $d\phi(\bar x)$, and in particular its kernel, is crucial for understanding when the penalty term $\phi$ is of KKT-type at $\bar x$. Some basic properties of $d\phi(\bar x)$ are summarized in the following lemma.

Lemma 2.2 Let $\bar x$ be a feasible point of (NLP) and let $\phi$ be a general penalty term of (NLP). The following statements are true:

(i) $d\phi(\bar x)(w) \ge 0$ for all $w \in \mathbb{R}^n$.

(ii) $\mathrm{ker}\, d\phi(\bar x)$ is a nonempty closed cone.

(iii) $w \in \mathrm{ker}\, d\phi(\bar x)$ if and only if there exist $t_k \to 0+$ and $w_k \to w$ such that

\[
\frac{\phi(\bar x + t_k w_k)}{t_k} \to 0. \tag{16}
\]

(iv) $T_C(\bar x) \subset \mathrm{ker}\, d\phi(\bar x)$, and the inclusion is an equality if $\phi = \delta_C$ or $S^0$, or more generally if there exist $\tau > 0$ and $\delta > 0$ such that for all $x \in \mathbb{R}^n$ with $\|x - \bar x\| \le \delta$,

\[
\tau\, d_C(x) \le \phi(x). \tag{17}
\]

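Since $\phi$ includes the penalty terms $S^p$ with $0 \le p \le 1$ as special cases, we can apply Theorem 2.1 to establish some conditions under which the penalty terms $S^p$ are of KKT-type. Let $\bar x$ be a feasible point of (NLP). To begin with, we present two formulas as follows:

\[
dS(\bar x)(w) = \sum_{i \in I(\bar x)} \max\{\langle \nabla g_i(\bar x), w \rangle, 0\} + \sum_{j \in J} |\langle \nabla h_j(\bar x), w \rangle| \qquad \forall w \in \mathbb{R}^n, \tag{18}
\]

\[
\widehat{\partial} S(\bar x) = \{\nabla_x L_0(\bar x, \lambda) \mid \lambda_i = 0 \ \forall i \notin I(\bar x),\ 0 \le \lambda_i \le 1 \ \forall i \in I(\bar x),\ -1 \le \lambda_j \le 1 \ \forall j \in J\}, \tag{19}
\]

where

\[
L_0(x, \lambda) := \sum_{i \in I} \lambda_i g_i(x) + \sum_{j \in J} \lambda_j h_j(x) \qquad \forall x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^{m+q}.
\]

We refer the reader to Example 7.28, Exercise 8.31 and Corollary 10.9 of Rockafellar and Wets (1998) for details on how to get (18) and (19), from which it follows that

\[
\mathrm{ker}\, dS(\bar x) = L_C(\bar x), \tag{20}
\]

and

\[
\mathrm{pos}(\widehat{\partial} S(\bar x)) = L_C(\bar x)^* = \{\nabla_x L_0(\bar x, \lambda) \mid \lambda_i = 0 \ \forall i \notin I(\bar x),\ \lambda_i \ge 0 \ \forall i \in I(\bar x)\}. \tag{21}
\]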
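Formula (18) can be checked on a small instance by comparing it with a one-sided difference quotient. The sketch below uses an assumed toy problem (one inequality $g(x) = x_1 + x_2^2$ and one equality $h(x) = x_1 - x_2$, both active at $\bar x = 0$) that is not taken from the paper. It relies on the assumption that, because $S$ is locally Lipschitz here, the quotient along the fixed direction $w$ approximates the lower limit in the definition of the subderivative.

```python
def S(x):
    """Penalty term S for the assumed toy constraints g(x) = x1 + x2^2 <= 0, h(x) = x1 - x2 = 0."""
    g = x[0] + x[1] ** 2
    h = x[0] - x[1]
    return max(g, 0.0) + abs(h)


def dS_formula(w):
    """Right-hand side of (18) at the origin, where grad g = (1, 0) and grad h = (1, -1)."""
    return max(w[0], 0.0) + abs(w[0] - w[1])


def dS_quotient(w, tau=1e-7):
    """One-sided difference quotient (S(tau * w) - S(0)) / tau along the fixed direction w."""
    return (S((tau * w[0], tau * w[1])) - S((0.0, 0.0))) / tau


if __name__ == "__main__":
    for w in [(1.0, 0.0), (-1.0, -1.0), (0.5, 2.0), (-1.0, 1.0), (0.0, -2.0)]:
        print(w, dS_formula(w), dS_quotient(w))
```

For this instance the formula vanishes exactly on $\{w \mid w_1 \le 0,\ w_1 = w_2\}$, which is the linearized cone $L_C(\bar x)$, in agreement with (20).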

In view of Theorem 2.1 and (20), it is clear that $S$ is a KKT-type penalty term at $\bar x$. Now we present conditions under which the penalty terms $S^p$ with $0 \le p < 1$ are of KKT-type at $\bar x$.

Theorem 2.2 Let $\bar x$ be a feasible point of (NLP) and let $0 \le p < 1$. Consider the following conditions:

(i) $[\mathrm{ker}\, dS^p(\bar x)]^* = L_C(\bar x)^*$.

(ii) $\widehat{\partial} S^p(\bar x) = L_C(\bar x)^*$.

(iii) $S^p$ is a KKT-type penalty term at $\bar x$.

Then (i) $\Longrightarrow$ (ii) $\Longleftrightarrow$ (iii).

Proof. From the definition of the subderivative and (20), we have

\[
\mathrm{dom}\, dS^p(\bar x) \subset L_C(\bar x). \tag{22}
\]

It then follows from (22) and Lemma 2.1 that

\[
L_C(\bar x)^* \subset \widehat{\partial} S^p(\bar x) \subset [\mathrm{ker}\, dS^p(\bar x)]^*. \tag{23}
\]

From (23), it follows immediately that (i) $\Longrightarrow$ (ii), and that (ii) holds if and only if

\[
\widehat{\partial} S^p(\bar x) \subset L_C(\bar x)^*.
\]

According to Theorem 2.1, the latter condition holds if and only if (iii) holds. That is, we have (ii) $\Longleftrightarrow$ (iii). This completes the proof. □

Remark 2.2 (a) Conditions (ii) and (iii) are not equivalent in general when $p = 1$, because $\widehat{\partial} S(\bar x)$ may not be a cone as indicated by (19). However, it follows from (21) and the proof of Theorem 2.2 that condition (iii) holds with $0 \le p \le 1$ if and only if $\mathrm{pos}(\widehat{\partial} S^p(\bar x)) = L_C(\bar x)^*$, though the positive hull operation is redundant when $0 \le p < 1$.

(b) By the definition of the regular subdifferential, we have $\widehat{\partial} S^0(\bar x) = T_C(\bar x)^*$. Thus, condition (ii) holds with $p = 0$ if and only if $T_C(\bar x)^* = L_C(\bar x)^*$, or by definition the GCQ holds at $\bar x$. Note that the penalty function $F_0$ is exact at any local minimum of (NLP). Thus, the equivalence of conditions (ii) and (iii) in the case of $p = 0$ recovers the well-known result that the GCQ holds at a feasible point $\bar x$ of (NLP) if and only if the KKT condition holds at $\bar x$ for any objective function having a local minimum at $\bar x$ subject to the same constraints.

(c) Meng and Yang (2010) showed that $S^p$ is a KKT-type penalty term at $\bar x$ if

\[
[\mathrm{ker}\, DS^p(\bar x)]^* = L_C(\bar x)^*, \tag{24}
\]

where $DS^p(\bar x)$ is the contingent derivative of $S^p$ at $\bar x$, see Aubin and Ekeland (1984) for its definition. However, we can easily verify that $\mathrm{ker}\, dS^p(\bar x) = \mathrm{ker}\, DS^p(\bar x)$. Thus, condition (i) and condition (24) are equivalent.

In order to verify that the penalty term $S^p$ is of KKT-type at some feasible point $\bar x$ of (NLP), it is convenient to show the following stronger condition:

\[
\mathrm{ker}\, dS^p(\bar x) = L_C(\bar x), \tag{25}
\]

for which a number of sufficient conditions in terms of the original data of (NLP) can be found in Yang and Meng (2007). In particular, it has been shown in Yang and Meng (2007) that the equality (25) holds with $p = \frac12$ if for every $w \in L_C(\bar x)$, it follows that

\[
\langle w, \nabla^2 g_i(\bar x) w \rangle \le 0 \ \ \forall i \in I(\bar x, w), \qquad \langle w, \nabla^2 h_j(\bar x) w \rangle = 0 \ \ \forall j \in J, \tag{26}
\]

where

\[
I(\bar x, w) := \{i \in I(\bar x) \mid \langle \nabla g_i(\bar x), w \rangle = 0\} \qquad \forall w \in \mathbb{R}^n.
\]

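Now we present an equivalent condition for (25) in the following proposition.

Proposition 2.1 The equality (25) holds with $p = \frac12$ if and only if for every $w \in L_C(\bar x)$, it follows that

\[
\max_{\lambda \in \mathrm{KKT}_0(\bar x)} \left\{ \sum_{i \in I} \lambda_i \langle w, \nabla^2 g_i(\bar x) w \rangle + \sum_{j \in J} \lambda_j \langle w, \nabla^2 h_j(\bar x) w \rangle \right\} = 0, \tag{27}
\]

where

\[
\mathrm{KKT}_0(\bar x) := \left\{ \lambda \in \mathbb{R}^{m+q} \;\middle|\;
\begin{array}{l}
\sum_{i \in I} \lambda_i \nabla g_i(\bar x) + \sum_{j \in J} \lambda_j \nabla h_j(\bar x) = 0, \\
\lambda_i \ge 0 \ \ \forall i \in I(\bar x), \quad \lambda_i = 0 \ \ \forall i \in I \setminus I(\bar x)
\end{array}
\right\}.
\]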

Proof. To begin with, we need the notion of the second subderivative, which has been studied thoroughly in Chapter 13 of Rockafellar and Wets (1998). For any extended-real-valued function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ with $f(x)$ finite and $v, w \in \mathbb{R}^n$, the second subderivative of $f$ at $x$ for $v$ and $w$ is by definition given by

\[
d^2 f(x \mid v)(w) := \liminf_{\tau \to 0+,\ w' \to w} \frac{f(x + \tau w') - f(x) - \tau \langle v, w' \rangle}{\frac12 \tau^2}.
\]

By definition, it is straightforward to verify that

\[
d^2 S(\bar x \mid 0)(w) = 2\,[dS^{1/2}(\bar x)(w)]^2 \qquad \forall w \in \mathbb{R}^n. \tag{28}
\]

In view of (18) and (19), and by applying Example 13.16 and Proposition 13.19 of Rockafellar and Wets (1998), we have for each $v \in \widehat{\partial} S(\bar x)$ and $w \in \mathrm{ker}\, dS(\bar x) \cap v^\perp$,

\[
d^2 S(\bar x \mid v)(w) = \max \left\{ \langle w, \nabla^2_{xx} L_0(\bar x, \lambda) w \rangle \;\middle|\;
\begin{array}{l}
\nabla_x L_0(\bar x, \lambda) = v, \\
\lambda_i = 0 \ \ \forall i \notin I(\bar x), \quad 0 \le \lambda_i \le 1 \ \ \forall i \in I(\bar x), \\
-1 \le \lambda_j \le 1 \ \ \forall j \in J
\end{array}
\right\}. \tag{29}
\]

In view of (28) and (29), the result follows immediately. This completes the proof. □

Remark 2.3 Let $w \in L_C(\bar x)$. The geometric interpretation of condition (27) is that there exists some $z \in \mathbb{R}^n$ such that the parabolic subderivative $d^2 S(\bar x)(w \mid z)$ of $S$ at $\bar x$ for $w$ with respect to $z$ is zero. Indeed, by the duality theorem of linear programming (see Mangasarian (1969)), condition (27) holds if and only if there exists some $z \in \mathbb{R}^n$ such that

\[
\langle \nabla g_i(\bar x), z \rangle + \langle w, \nabla^2 g_i(\bar x) w \rangle \le 0 \ \ \forall i \in I(\bar x, w), \qquad \langle \nabla h_j(\bar x), z \rangle + \langle w, \nabla^2 h_j(\bar x) w \rangle = 0 \ \ \forall j \in J.
\]

By the second-order Taylor expansion, $z$ satisfies the above system if and only if there exist some sequences $t_k \to 0+$ and $z_k \to z$ such that

\[
\frac{S^{1/2}(\bar x + t_k w + \frac12 t_k^2 z_k)}{t_k} \to 0.
\]

This implies $d^2 S(\bar x)(w \mid z) = 0$.

Clearly, (26) $\Longrightarrow$ (27). The following example illustrates that condition (27) can be strictly weaker than condition (26) even when the GCQ does not hold at $\bar x$.

Example 2.3 In (NLP), let $n = m = 2$, $q = 0$, $g_1(x) = x_1^2 x_2$, $g_2(x) = x_2^2 - x_1$, and let $\bar x = (0, 0)$. Condition (26) is not satisfied, because for any $w = (0, w_2)$ with $w_2 \ne 0$, we have $w \in L_C(\bar x) = \mathbb{R}_+ \times \mathbb{R}$ and $I(\bar x, w) = \{1, 2\}$, but $\langle w, \nabla^2 g_2(\bar x) w\rangle = 2 w_2^2 > 0$. However, we can show that condition (27) is satisfied. By definition, we have

\[
\mathrm{KKT}_0(\bar x) = \{\lambda \in \mathbb{R}^2_+ \mid \lambda_1 \nabla g_1(\bar x) + \lambda_2 \nabla g_2(\bar x) = 0\} = \mathbb{R}_+ \times \{0\}.
\]

Then, for each $w \in L_C(\bar x)$, we have

\[
\max_{\lambda \in \mathrm{KKT}_0(\bar x)} \left[ \lambda_1 \langle w, \nabla^2 g_1(\bar x) w\rangle + \lambda_2 \langle w, \nabla^2 g_2(\bar x) w\rangle \right] = 0.
\]

Therefore, condition (27) is satisfied and hence $\ker dS^{1/2}(\bar x) = L_C(\bar x)$. Moreover, by Lemma 2.2, we can calculate $\ker dS^{p}(\bar x)$ for all $p \in [0, 1]$. In fact, we have

\[
\ker dS^{p}(\bar x) =
\begin{cases}
\mathbb{R}_+ \times (-\mathbb{R}_+) & \text{if } 0 \le p \le \frac{1}{5}, \\
\left[\mathbb{R}_+ \times (-\mathbb{R}_+)\right] \cup \left[\{0\} \times \mathbb{R}_+\right] & \text{if } \frac{1}{5} < p \le \frac{1}{3}, \\
\mathbb{R}_+ \times \mathbb{R} & \text{if } \frac{1}{3} < p \le 1.
\end{cases}
\]

Thus, $\ker dS^{p}(\bar x) = L_C(\bar x)$ if $\frac{1}{3} < p \le 1$, $[\ker dS^{p}(\bar x)]^* = L_C(\bar x)^*$ if $\frac{1}{5} < p \le \frac{1}{3}$, and $[\ker dS^{p}(\bar x)]^* \ne L_C(\bar x)^*$ if $0 \le p \le \frac{1}{5}$.

[...]

This implies that $w \in \operatorname{cl} O$ and hence $\operatorname{cl} O = Q$. Since $L_C(\bar x) = \mathbb{R}^n$ and $Q = \mathbb{R}^n$, we get from (36) that $\ker dS^{0}(\bar x) = L_C(\bar x)$. In view of (20), statement (b) follows readily.

(c): It follows from Lemma 2.4 of Yang and Meng (2007) that (32) holds. Let $\tilde Q = \{y \in \mathbb{R}^n \mid y^T \Lambda y \le 0\}$. In view of (35), we have $Q = P\tilde Q$ and $Q^* = P\tilde Q^*$. Since $\nabla^2 g(\bar x)$ is positive semi-definite with $\nabla^2 g(\bar x) \ne 0$, we can assume without loss of generality that there exists a positive integer $n_1$ such that $\Lambda(i, i) > 0$ for $1 \le i \le n_1$, and $\Lambda(i, i) = 0$ for $n_1 + 1 \le i \le n$. Then, we have $\tilde Q = \{0_{\mathbb{R}^{n_1}}\} \times \mathbb{R}^{n - n_1}$ and hence $\tilde Q^* = \mathbb{R}^{n_1} \times \{0_{\mathbb{R}^{n - n_1}}\}$. Since $Q^* = P\tilde Q^*$ and $P$ is an orthogonal matrix, we have $Q^* \ne \{0\}$. Since $L_C(\bar x)^* = \{0\}$, it follows from (36) that

\[
[\ker dS^{1/2}(\bar x)]^* \ne L_C(\bar x)^*.
\]

Thus, (31) holds.

(d): The result follows directly from (36) and (20).

(e): It follows from Lemma 2.4 of Yang and Meng (2007) that (34) holds. Let $\tilde O = \{y \in \mathbb{R}^n \mid y^T \Lambda y < 0\}$. In view of (35), we have $O = P\tilde O$ and $O^* = P\tilde O^*$. Since $\nabla^2 g(\bar x)$ is indefinite, we can assume without loss of generality that there exist positive integers $n_1$ and $n_2$ with $n_1 + n_2 \le n$ such that $\Lambda(i, i) > 0$ for $1 \le i \le n_1$, $\Lambda(i, i) < 0$ for $n_1 + 1 \le i \le n_1 + n_2$, and $\Lambda(i, i) = 0$ for $n_1 + n_2 + 1 \le i \le n$. Then we have $\tilde O = \tilde O_1 \times \mathbb{R}^{n - n_1 - n_2}$, where

\[
\tilde O_1 = \left\{ y \in \mathbb{R}^{n_1 + n_2} \;\middle|\; \sum_{i=1}^{n_1} \Lambda(i, i) y_i^2 - \sum_{i=n_1+1}^{n_1+n_2} |\Lambda(i, i)| y_i^2 < 0 \right\}.
\]

For every $n_1 + 1 \le i \le n_1 + n_2$, let $y^{(i)} \in \mathbb{R}^{n_1 + n_2}$ be such that $y^{(i)}_i = 1$ and $y^{(i)}_j = 0$ for all $j \ne i$. For every $1 \le i \le n_1$, let $y^{(i)} \in \mathbb{R}^{n_1 + n_2}$ be such that $y^{(i)}_j = 1$ for all $n_1 + 1 \le j \le n_1 + n_2$, $y^{(i)}_i = \sqrt{\Delta / (2\Lambda(i, i))}$, and $y^{(i)}_j = 0$ for all $1 \le j \le n_1$ with $j \ne i$, where $\Delta = \sum_{k=n_1+1}^{n_1+n_2} |\Lambda(k, k)|$. It is straightforward to check that $\pm y^{(i)} \in \tilde O_1$ for all $1 \le i \le n_1 + n_2$, and that the $n_1 + n_2$ vectors $y^{(i)}$ with $1 \le i \le n_1 + n_2$ are linearly independent. Thus, we have $\tilde O_1^* = \{0\}$ and $\tilde O^* = \{0\}$. Since $O^* = P\tilde O^*$, we have $O^* = \{0\}$. Since $L_C(\bar x)^* = \{0\}$, we have $O^* = L_C(\bar x)^* = \{0\}$. In view of (36), (33) follows readily. This completes the proof. □
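The construction of the vectors $y^{(i)}$ in part (e) of the proof above can be checked numerically. The following is a minimal sketch under stated assumptions: the diagonal entries in `LAM` (with $n_1 = n_2 = 2$) are an arbitrary illustrative choice, and the helper names `quad`, `build_vectors` and `rank` are hypothetical, not taken from the paper.

```python
import math

def quad(lam, y):
    """Quadratic form sum_i Lambda(i,i) * y_i^2 for a diagonal matrix."""
    return sum(l * v * v for l, v in zip(lam, y))

def build_vectors(lam, n1, n2):
    """Build the vectors y^(i), 1 <= i <= n1 + n2, as described in part (e)."""
    delta = sum(abs(lam[k]) for k in range(n1, n1 + n2))
    vectors = []
    for i in range(n1):                       # indices with Lambda(i,i) > 0
        y = [0.0] * (n1 + n2)
        y[i] = math.sqrt(delta / (2.0 * lam[i]))
        for j in range(n1, n1 + n2):
            y[j] = 1.0
        vectors.append(y)
    for i in range(n1, n1 + n2):              # indices with Lambda(i,i) < 0
        y = [0.0] * (n1 + n2)
        y[i] = 1.0
        vectors.append(y)
    return vectors

def rank(rows, tol=1e-12):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rk = 0
    for col in range(len(m[0])):
        if rk == len(m):
            break
        pivot = max(range(rk, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

# Illustrative data (an assumption): two positive and two negative entries.
LAM = [3.0, 1.0, -2.0, -5.0]
YS = build_vectors(LAM, 2, 2)

if __name__ == "__main__":
    for y in YS:
        print(y, quad(LAM, y))
    print("rank:", rank(YS))
```

Since the quadratic form is even in $y$, checking $y^{(i)}$ also covers $-y^{(i)}$; full rank then means the only vector having nonpositive inner product with all of $\pm y^{(i)}$ is zero.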

3

Second-order Necessary Conditions

In this section, by applying the duality theorem of linear programming, we derive the SON conditions for (NLP) from the exactness of $f + \mu\phi$ under an inclusion condition between the second-order linearized tangent set of $C$ and the kernel of the parabolic subderivative of $\phi$. Then, by applying the $l_p$ exact penalty functions and third-order Taylor expansions, we also give some sufficient conditions in terms of the original data of (NLP) which guarantee that this inclusion condition holds as an equality.

To begin with, we list some basic properties of the parabolic subderivatives $d^2\phi(\bar x)(w \mid \cdot)$ and $d^2 S^p(\bar x)(w \mid \cdot)$.

Lemma 3.1 Let $\bar x$ be a feasible point of (NLP). The following statements are true:

(i) $d^2\phi(\bar x)(w \mid z) \ge 0$ for all $z \in \mathbb{R}^n$ and $w \in \ker d\phi(\bar x)$.

(ii) Let $w \in \ker d\phi(\bar x)$. The set $\ker d^2\phi(\bar x)(w \mid \cdot)$ is a closed (possibly empty) subset of $\mathbb{R}^n$ with the property that $z \in \ker d^2\phi(\bar x)(w \mid \cdot)$ if and only if there exist $t_k \to 0+$ and $z_k \to z$ such that

\[
\frac{\phi(\bar x + t_k w + \frac{1}{2} t_k^2 z_k)}{t_k^2} \to 0. \tag{37}
\]

(iii) $\ker d^2\phi(\bar x)(w \mid \cdot) = \ker d\phi(\bar x)$ when $w = 0$.

(iv) $\ker d^2\phi(\bar x)(w \mid \cdot) = \emptyset$ when $w \in \ker d\phi(\bar x) \setminus \ker d\phi^{1/2}(\bar x)$.

(v) For any $w \in T_C(\bar x)$, $T_C^2(\bar x \mid w) \subset \ker d^2\phi(\bar x)(w \mid \cdot)$. The equality holds if $\phi = \delta_C$ or $S^0$, or more generally if there exist $\tau > 0$ and $\delta > 0$ such that (17) holds for all $x \in \mathbb{R}^n$ with $\|x - \bar x\| \le \delta$.

(vi) Let $0 \le p < p' \le 1$. Then for any $w \in \ker dS^p(\bar x)$,

\[
\ker d^2 S^{p}(\bar x)(w \mid \cdot) \subset \operatorname{dom} d^2 S^{p}(\bar x)(w \mid \cdot) \subset \ker d^2 S^{p'}(\bar x)(w \mid \cdot).
\]

In what follows, we interpret $\ker d^2\phi(\bar x)(w \mid \cdot)$ as an empty set when $w \notin \ker d\phi(\bar x)$, and adopt a similar interpretation for $\ker d^2 S^p(\bar x)(w \mid \cdot)$.

Now we establish a condition which allows us to derive the SON condition from the exactness of $f + \mu\phi$. For each feasible point $\bar x$ of (NLP), we denote the second-order linearized tangent set to $C$ at $\bar x$ in the direction $w \in L_C(\bar x)$ by

\[
L_C^2(\bar x \mid w) := \left\{ z \in \mathbb{R}^n \;\middle|\;
\begin{array}{ll}
\langle \nabla g_i(\bar x), z\rangle + \langle w, \nabla^2 g_i(\bar x) w\rangle \le 0 & \forall i \in I(\bar x, w) \\
\langle \nabla h_j(\bar x), z\rangle + \langle w, \nabla^2 h_j(\bar x) w\rangle = 0 & \forall j \in J
\end{array}
\right\},
\]

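and interpret $L_C^2(\bar x \mid w)$ as an empty set in the case of $w \notin L_C(\bar x)$.

Theorem 3.1 Let $\bar x$ be a local minimum of (NLP). Suppose that the penalty function $f + \mu\phi$ is exact at $\bar x$. If

\[
L_C^2(\bar x \mid w) \subset \operatorname{cl\,conv}[\ker d^2\phi(\bar x)(w \mid \cdot)] \quad \forall w \in \mathcal{V}(\bar x), \tag{38}
\]

then the SON condition (1) holds, and in particular when $L_C^2(\bar x \mid w) = \emptyset$, the supremum in (1) is $+\infty$.

Proof. Since the penalty function $f + \mu\phi$ is exact at $\bar x$, we have by applying Theorem 13.66 of Rockafellar and Wets (1998) that, for all sufficiently large $\mu > 0$,

\[
d(f + \mu\phi)(\bar x)(w) \ge 0, \tag{39}
\]

and in the case of $w \ne 0$ with $d(f + \mu\phi)(\bar x)(w) = 0$,

\[
\inf_{z \in \mathbb{R}^n} d^2(f + \mu\phi)(\bar x)(w \mid z) \ge 0. \tag{40}
\]

As in the proof of Theorem 2.1, it follows from (39) that, for all sufficiently large $\mu > 0$,

\[
\begin{array}{ll}
d(f + \mu\phi)(\bar x)(w) = \langle \nabla f(\bar x), w\rangle + \mu\, d\phi(\bar x)(w) > 0 & \forall w \notin \ker d\phi(\bar x), \\
d(f + \mu\phi)(\bar x)(w) = \langle \nabla f(\bar x), w\rangle \ge 0 & \forall w \in \ker d\phi(\bar x).
\end{array} \tag{41}
\]

Therefore, for all sufficiently large $\mu > 0$,

\[
d(f + \mu\phi)(\bar x)(w) = 0 \iff w \in \ker d\phi(\bar x) \cap \nabla f(\bar x)^{\perp}. \tag{42}
\]

By the definition of the parabolic subderivative, we can verify that for any $\mu \ge 0$ and $w, z \in \mathbb{R}^n$,

\[
d^2(f + \mu\phi)(\bar x)(w \mid z) = \langle \nabla f(\bar x), z\rangle + \langle w, \nabla^2 f(\bar x) w\rangle + \mu\, d^2\phi(\bar x)(w \mid z). \tag{43}
\]

It follows from Lemma 3.1 (ii) and (40)-(43) that there exists no vector $(w, z) \in \mathbb{R}^n \times \mathbb{R}^n$ such that

\[
\langle \nabla f(\bar x), w\rangle = 0, \quad \langle \nabla f(\bar x), z\rangle + \langle w, \nabla^2 f(\bar x) w\rangle < 0, \quad z \in \ker d^2\phi(\bar x)(w \mid \cdot),
\]

or equivalently there exists no vector $(w, z) \in \mathbb{R}^n \times \mathbb{R}^n$ such that

\[
\langle \nabla f(\bar x), w\rangle = 0, \quad \langle \nabla f(\bar x), z\rangle + \langle w, \nabla^2 f(\bar x) w\rangle < 0, \quad z \in \operatorname{cl\,conv}[\ker d^2\phi(\bar x)(w \mid \cdot)]. \tag{44}
\]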

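Observe that $0 \in \mathcal{V}(\bar x)$. It follows from the definition of $L_C^2(\bar x \mid 0)$ and Lemma 3.1 (iii) that (38) holds with $w = 0$ if and only if $L_C(\bar x) \subset \operatorname{cl\,conv}[\ker d\phi(\bar x)]$, or equivalently $[\ker d\phi(\bar x)]^* \subset L_C(\bar x)^*$. This implies by Theorem 2.1 that the penalty term $\phi$ is of KKT-type and hence $\mathrm{KKT}(\bar x) \ne \emptyset$. By the definition of $\mathcal{V}(\bar x)$, it is easy to verify that for any $w \in \mathcal{V}(\bar x)$,

\[
\mathrm{KKT}(\bar x) = \left\{ \lambda \in \mathbb{R}^{m+q} \;\middle|\;
\begin{array}{l}
\nabla_x L(\bar x, \lambda) = 0, \\
\lambda_i \ge 0 \ \ \forall i \in I(\bar x, w), \\
\lambda_i = 0 \ \ \forall i \in I \setminus I(\bar x, w)
\end{array}
\right\}. \tag{45}
\]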

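Let $w \in \mathcal{V}(\bar x)$. First we assume that $L_C^2(\bar x \mid w) \ne \emptyset$. It follows from (38), the definition of $L_C^2(\bar x \mid w)$, and the inconsistency of the system (44) that the optimal value of the linear program

\[
\begin{array}{lll}
\min\limits_{z \in \mathbb{R}^n} & \langle \nabla f(\bar x), z\rangle + \langle w, \nabla^2 f(\bar x) w\rangle & \\
\text{s.t.} & \langle \nabla g_i(\bar x), z\rangle + \langle w, \nabla^2 g_i(\bar x) w\rangle \le 0 & \forall i \in I(\bar x, w), \\
& \langle \nabla h_j(\bar x), z\rangle + \langle w, \nabla^2 h_j(\bar x) w\rangle = 0 & \forall j \in J,
\end{array} \tag{46}
\]

is nonnegative. Applying the duality theorem of linear programming (see Mangasarian (1969)), we confirm that the optimal value of the linear program

\[
\begin{array}{ll}
\max\limits_{\lambda \in \mathbb{R}^{m+q}} & \langle w, \nabla^2_{xx} L(\bar x, \lambda) w\rangle \\
\text{s.t.} & \nabla_x L(\bar x, \lambda) = 0, \quad \lambda_i \ge 0 \ \ \forall i \in I(\bar x, w), \quad \lambda_i = 0 \ \ \forall i \in I \setminus I(\bar x, w),
\end{array}
\]

is also nonnegative. This together with (45) implies that

\[
\max_{\lambda \in \mathrm{KKT}(\bar x)} \langle w, \nabla^2_{xx} L(\bar x, \lambda) w\rangle \ge 0.
\]

That is, the SON condition (1) holds in the case of $L_C^2(\bar x \mid w) \ne \emptyset$.

Next we assume that $L_C^2(\bar x \mid w) = \emptyset$. Since $\mathcal{V}(\bar x) \subset L_C(\bar x)$, we have $w \in L_C(\bar x)$. It follows from the definition of $L_C^2(\bar x \mid w)$ that there exists no $z \in \mathbb{R}^n$ such that

\[
\begin{array}{ll}
\langle \nabla g_i(\bar x), z\rangle + \langle w, \nabla^2 g_i(\bar x) w\rangle \le 0 & \forall i \in I(\bar x, w), \\
\langle \nabla h_j(\bar x), z\rangle + \langle w, \nabla^2 h_j(\bar x) w\rangle = 0 & \forall j \in J.
\end{array}
\]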

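The duality theorem of linear programming (see Mangasarian (1969)) guarantees the existence of some $\tilde\lambda \in \mathbb{R}^{m+q}$ with $\tilde\lambda_i \ge 0$ for all $i \in I(\bar x, w)$ and $\tilde\lambda_i = 0$ for all $i \in I \setminus I(\bar x, w)$ such that

\[
\sum_{i \in I} \tilde\lambda_i \nabla g_i(\bar x) + \sum_{j \in J} \tilde\lambda_j \nabla h_j(\bar x) = 0, \tag{47}
\]

and

\[
\sum_{i \in I} \tilde\lambda_i \langle w, \nabla^2 g_i(\bar x) w\rangle + \sum_{j \in J} \tilde\lambda_j \langle w, \nabla^2 h_j(\bar x) w\rangle > 0. \tag{48}
\]

Let $\bar\lambda \in \mathrm{KKT}(\bar x)$ and let $\lambda_t = \bar\lambda + t\tilde\lambda$ for all $t \ge 0$. It follows from (45), (47), and (48) that $\lambda_t \in \mathrm{KKT}(\bar x)$ for all $t \ge 0$, and that

\[
\sup_{\lambda \in \mathrm{KKT}(\bar x)} \langle w, \nabla^2_{xx} L(\bar x, \lambda) w\rangle \ge \sup_{t \ge 0} \langle w, \nabla^2_{xx} L(\bar x, \lambda_t) w\rangle = +\infty.
\]

This completes the proof. □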

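Let $\bar x \in C$ and let $\phi = S^p$. In what follows, by assuming that all constraint functions defining (NLP) are three times continuously differentiable, we give sufficient conditions in terms of the original data for the inclusion

\[
L_C^2(\bar x \mid w) \subset \ker d^2 S^p(\bar x)(w \mid \cdot) \quad \forall w \in L_C(\bar x) \tag{49}
\]

to hold, which is slightly stronger than (38) since in general $\ker d^2 S^p(\bar x)(w \mid \cdot)$ is not a closed and convex set and $\mathcal{V}(\bar x) \subsetneq L_C(\bar x)$. The following index set is useful in the sequel:

\[
I(\bar x, w, z) := \{ i \in I(\bar x, w) \mid \langle z, \nabla g_i(\bar x)\rangle + \langle w, \nabla^2 g_i(\bar x) w\rangle = 0 \} \quad \forall w, z \in \mathbb{R}^n.
\]

Assume that $\psi : \mathbb{R}^n \to \mathbb{R}$ is a three times continuously differentiable function. Let $\{t_k\} \subset \mathbb{R}_+$ be a sequence such that $t_k \to 0+$, and let $(x, w) \in \mathbb{R}^n \times \mathbb{R}^n$ be such that $\psi(x) = 0$ and $\langle \nabla\psi(x), w\rangle = 0$. For each $z \in \mathbb{R}^n$, it follows from the third-order Taylor expansion that

\[
\begin{aligned}
\psi(x + t_k w + \tfrac{1}{2} t_k^2 z)
&= \psi(x) + t_k \langle \nabla\psi(x), w\rangle + \tfrac{1}{2} t_k^2 \langle \nabla\psi(x), z\rangle + \tfrac{1}{2} t_k^2 \langle w + \tfrac{1}{2} t_k z, \nabla^2\psi(x)(w + \tfrac{1}{2} t_k z)\rangle \\
&\quad + \tfrac{1}{6} t_k^3 \psi^{(3)}(x)(w + \tfrac{1}{2} t_k z, w + \tfrac{1}{2} t_k z, w + \tfrac{1}{2} t_k z) + o(t_k^3) \\
&= \tfrac{1}{2} t_k^2 \left[ \langle \nabla\psi(x), z\rangle + \langle w, \nabla^2\psi(x) w\rangle \right] + \tfrac{1}{8} t_k^4 \langle z, \nabla^2\psi(x) z\rangle \\
&\quad + \tfrac{1}{2} t_k^3 \left[ \langle w, \nabla^2\psi(x) z\rangle + \tfrac{1}{3} \psi^{(3)}(x)(w + \tfrac{1}{2} t_k z, w + \tfrac{1}{2} t_k z, w + \tfrac{1}{2} t_k z) \right] + o(t_k^3).
\end{aligned} \tag{50}
\]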
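The expansion (50) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the test function `psi`, the base point $x = 0$, and the vectors `W` and `Z` are hypothetical choices satisfying $\psi(x) = 0$ and $\langle \nabla\psi(x), w\rangle = 0$; they do not come from the paper. For this particular `psi` the remainder equals $(t u_1)^4$ with $u = w + \frac{1}{2} t z$, so dividing by $t^3$ gives a quantity that tends to zero.

```python
def psi(x1, x2):
    """Hypothetical smooth test function with psi(0) = 0."""
    return -x2 + x1**2 + x1 * x2 + x1**3 + x1**4

# Derivatives of psi at the base point x = (0, 0), computed by hand.
GRAD = (0.0, -1.0)
HESS = ((2.0, 1.0), (1.0, 0.0))

def third(u):
    """psi^(3)(0)(u, u, u); only the x1^3 term contributes at x = 0."""
    return 6.0 * u[0] ** 3

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def bil(a, b):
    """Bilinear form <a, HESS b>."""
    return sum(a[i] * HESS[i][j] * b[j] for i in range(2) for j in range(2))

def expansion_50(t, w, z):
    """Right-hand side of (50) without the o(t^3) remainder."""
    u = (w[0] + 0.5 * t * z[0], w[1] + 0.5 * t * z[1])
    return (0.5 * t**2 * (dot(GRAD, z) + bil(w, w))
            + t**4 / 8.0 * bil(z, z)
            + 0.5 * t**3 * (bil(w, z) + third(u) / 3.0))

def remainder_over_t3(t, w, z):
    """(psi(x + t w + t^2 z / 2) - expansion) / t^3, expected to vanish as t -> 0+."""
    x = (t * w[0] + 0.5 * t * t * z[0], t * w[1] + 0.5 * t * t * z[1])
    return (psi(*x) - expansion_50(t, w, z)) / t**3

W = (1.0, 0.0)   # satisfies <GRAD, W> = 0
Z = (0.5, 2.0)   # arbitrary second-order direction

if __name__ == "__main__":
    for t in (1e-1, 1e-2, 1e-3):
        print(t, remainder_over_t3(t, W, Z))
```

The printed ratios shrink roughly linearly in $t$, which is consistent with the $o(t_k^3)$ remainder in (50).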

Theorem 3.2 Let $\bar x$ be a local minimum of (NLP). Suppose that the functions $g_i$ with $i \in I$ and $h_j$ with $j \in J$ are three times continuously differentiable, and that the $l_p$ penalty function is exact at $\bar x$. If, in addition, one of the following conditions is satisfied:

(i) $p \in (\frac{2}{3}, 1]$,

(ii) $p = \frac{2}{3}$ and, for every $z \in L_C^2(\bar x \mid w)$, it follows that

\[
\begin{array}{ll}
\langle w, \nabla^2 g_i(\bar x) z\rangle + \frac{1}{3} g_i^{(3)}(\bar x)(w, w, w) \le 0 & \forall i \in I(\bar x, w, z), \\
\langle w, \nabla^2 h_j(\bar x) z\rangle + \frac{1}{3} h_j^{(3)}(\bar x)(w, w, w) = 0 & \forall j \in J,
\end{array} \tag{51}
\]

(iii) $p \in [0, \frac{2}{3})$, $q = 0$ (i.e., there is no equality constraint) and, for every $z \in L_C^2(\bar x \mid w)$ with $(w, z) \ne 0$, it follows that

\[
\langle w, \nabla^2 g_i(\bar x) z\rangle + \frac{1}{3} g_i^{(3)}(\bar x)(w, w, w) < 0 \quad \forall i \in I(\bar x, w, z), \tag{52}
\]

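then the SON condition (1) holds.

Proof. By Theorem 3.1, we only need to show that (49) holds. Let $w \in L_C(\bar x)$ and let $z \in L_C^2(\bar x \mid w)$. Moreover, let $t_k \to 0+$. For any $i \in I \setminus I(\bar x, w)$, we have either $g_i(\bar x) < 0$ or $g_i(\bar x) = 0$ with $\langle \nabla g_i(\bar x), w\rangle < 0$. In each case, it is easy to verify by the first-order Taylor expansion that for all sufficiently large $k$,

\[
g_i(\bar x + t_k w + \tfrac{1}{2} t_k^2 z) \le 0 \quad \forall i \in I \setminus I(\bar x, w). \tag{53}
\]

By the third-order Taylor expansion (50) and the definition of $L_C^2(\bar x \mid w)$, we have, for every $\alpha \in [2, 3)$,

\[
\max\left\{ \frac{g_i(\bar x + t_k w + \frac{1}{2} t_k^2 z)}{t_k^{\alpha}}, 0 \right\} \to 0 \quad \forall i \in I(\bar x, w), \tag{54}
\]

and

\[
\frac{h_j(\bar x + t_k w + \frac{1}{2} t_k^2 z)}{t_k^{\alpha}} \to 0 \quad \forall j \in J. \tag{55}
\]

Combining (53), (54), and (55), we have for every $p \in (\frac{2}{3}, 1]$,

\[
\frac{S^p(\bar x + t_k w + \frac{1}{2} t_k^2 z)}{t_k^2}
= \left( \sum_{i \in I} \max\left\{ \frac{g_i(\bar x + t_k w + \frac{1}{2} t_k^2 z)}{t_k^{2/p}}, 0 \right\} + \sum_{j \in J} \left| \frac{h_j(\bar x + t_k w + \frac{1}{2} t_k^2 z)}{t_k^{2/p}} \right| \right)^{p} \to 0.
\]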

This implies by Lemma 3.1 (ii) that for any $p \in (\frac{2}{3}, 1]$, $z \in \ker d^2 S^p(\bar x)(w \mid \cdot)$ and hence

\[
L_C^2(\bar x \mid w) \subset \ker d^2 S^p(\bar x)(w \mid \cdot).
\]

Thus, statement (i) is true.

Now we show that statement (ii) is true. It follows from the third-order Taylor expansion (50), condition (51), and the definition of $L_C^2(\bar x \mid w)$ that (54) and (55) hold with $\alpha = 3$. Thus, in view of (53), we have

\[
\frac{S^{2/3}(\bar x + t_k w + \frac{1}{2} t_k^2 z)}{t_k^2} \to 0.
\]

This implies by Lemma 3.1 (ii) that

\[
L_C^2(\bar x \mid w) \subset \ker d^2 S^{2/3}(\bar x)(w \mid \cdot).
\]

Thus, statement (ii) is true.

Finally, we show that statement (iii) is true. It follows from the third-order Taylor expansion (50) and condition (52) that, for all sufficiently large $k$,

\[
g_i(\bar x + t_k w + \tfrac{1}{2} t_k^2 z) \le 0 \quad \forall i \in I(\bar x, w). \tag{56}
\]

Combining (53) and (56), we have for all sufficiently large $k$,

\[
\bar x + t_k w + \tfrac{1}{2} t_k^2 z \in C.
\]

This implies by definition that $z \in T_C^2(\bar x \mid w)$. Thus, we have $L_C^2(\bar x \mid w) \subset T_C^2(\bar x \mid w)$. In view of Lemma 3.1 (v), we have for any $p \in [0, \frac{2}{3}]$,

\[
L_C^2(\bar x \mid w) \subset \ker d^2 S^p(\bar x)(w \mid \cdot).
\]

Thus, statement (iii) is true. This completes the proof. □
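The role of the threshold $p = \frac{2}{3}$ in the proof above can be illustrated on a single constraint. The sketch below is an illustration under stated assumptions: it borrows the constraint $g(x) = x_1^3 - x_2$ of Example 3.1 with $\bar x = (0, 0)$, $w = (1, 0)$ and $z = (0, 0)$, and the helper names `S` and `ratio` are hypothetical. Along the fixed path $\bar x + t w + \frac{1}{2} t^2 z$ one gets $S = t^3$, so $S^p / t^2 = t^{3p - 2}$: this vanishes for $p > \frac{2}{3}$ but stays at $1$ for $p = \frac{2}{3}$, which is the case where condition (51) is needed. Since Lemma 3.1 (ii) allows perturbed sequences $z_k \to z$, the choice $z_k = (0, 2 t_k)$ still drives the ratio to zero.

```python
def S(x1, x2):
    """l1 penalty term for the single constraint g(x) = x1^3 - x2 <= 0."""
    return max(x1**3 - x2, 0.0)

def ratio(p, t, w, z):
    """S^p(xbar + t w + t^2 z / 2) / t^2 with xbar = (0, 0)."""
    x1 = t * w[0] + 0.5 * t * t * z[0]
    x2 = t * w[1] + 0.5 * t * t * z[1]
    return S(x1, x2) ** p / t**2

W = (1.0, 0.0)    # direction in L_C(xbar) with w2 = 0
Z0 = (0.0, 0.0)   # element of L_C^2(xbar | w) with z2 = 0

if __name__ == "__main__":
    for t in (1e-1, 1e-2, 1e-3):
        fixed_p08 = ratio(0.8, t, W, Z0)               # behaves like t^0.4
        fixed_p23 = ratio(2.0 / 3.0, t, W, Z0)         # stays near 1
        perturbed = ratio(2.0 / 3.0, t, W, (0.0, 2.0 * t))  # z_k = (0, 2 t)
        print(t, fixed_p08, fixed_p23, perturbed)
```

This matches the text: for this $w$ and $z$ condition (51) fails, yet $z$ still belongs to $\ker d^2 S^{2/3}(\bar x)(w \mid \cdot)$ because the LICQ holds there.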

Remark 3.1 (a) Let $p = 1$. By applying the second-order Taylor expansion we have

\[
\ker d^2 S(\bar x)(w \mid \cdot) = L_C^2(\bar x \mid w) \quad \forall w \in L_C(\bar x), \tag{57}
\]

which implies that condition (38) holds. This recovers the well-known result that the SON condition (1) holds at $\bar x$ when the $l_1$ penalty function is exact at $\bar x$; see Rockafellar (1989).

(b) Let $0 \le p < 1$. In view of Lemma 3.1 (vi) and (57), we have

\[
\ker d^2 S^p(\bar x)(w \mid \cdot) \subset L_C^2(\bar x \mid w) \quad \forall w \in \ker dS^p(\bar x). \tag{58}
\]

Thus, condition (38) holds if and only if

\[
L_C^2(\bar x \mid w) = \operatorname{cl\,conv}[\ker d^2 S^p(\bar x)(w \mid \cdot)] \quad \forall w \in \mathcal{V}(\bar x). \tag{59}
\]

According to Lemma 3.1 (v), condition (59) with $p = 0$ reduces to the so-called SGCQ, originated with Kawasaki (1988), which holds at $\bar x$ if by definition

\[
L_C^2(\bar x \mid w) = \operatorname{cl\,conv}[T_C^2(\bar x \mid w)] \quad \forall w \in \mathcal{V}(\bar x).
\]

It was shown by Kawasaki (1988) that if the linear independence constraint qualification (for short, LICQ) holds at $\bar x$, that is, the vectors $\{\nabla g_i(\bar x), i \in I(\bar x)\} \cup \{\nabla h_j(\bar x), j \in J\}$ are linearly independent, then

\[
L_C^2(\bar x \mid w) = T_C^2(\bar x \mid w) \quad \forall w \in L_C(\bar x),
\]

and hence (49) holds for any $p \in [0, 1]$. In the following example, we demonstrate that condition (51) may not hold even if the LICQ holds at $\bar x$.

Example 3.1 In (NLP), let $n = 2$, $m = 1$, $q = 0$, and $g_1(x) = x_1^3 - x_2$. Consider the feasible point $\bar x = (0, 0)^T$. Since $\nabla g_1(\bar x) = (0, -1)^T$, the LICQ holds at $\bar x$. This implies by Lemma 3.1 (v) that, for any $p \in [0, 1]$ and $w \in L_C(\bar x) = \mathbb{R} \times \mathbb{R}_+$,

\[
T_C^2(\bar x \mid w) = \ker d^2 S^p(\bar x)(w \mid \cdot) = L_C^2(\bar x \mid w) =
\begin{cases}
\mathbb{R} \times \mathbb{R}_+ & \text{if } w_2 = 0, \\
\mathbb{R}^2 & \text{otherwise}.
\end{cases}
\]

Let $w \in L_C(\bar x)$ with $w_2 = 0$ and let $z \in L_C^2(\bar x \mid w)$ with $z_2 = 0$. By definition, we have $I(\bar x, w, z) = \{1\}$. Now it is easy to check that condition (51) is invalid when $w_1 > 0$ because

\[
\langle w, \nabla^2 g_1(\bar x) z\rangle + \tfrac{1}{3} g_1^{(3)}(\bar x)(w, w, w) = 2 w_1^3 > 0.
\]

In the following example, we illustrate that even when neither the GCQ nor the SGCQ holds, Theorem 3.1 can be applied to derive the SON condition (1).

Example 3.2 In (NLP), let $n = 2$, $m = 3$, $q = 0$, $f(x) = -x_1^4 + x_2$, $g_1(x) = -x_2$, $g_2(x) = x_1^6 + x_2^3$, $g_3(x) = -x_1^2 + x_2^2$, and let $\bar x = (0, 0)$. By direct calculation, we have $T_C(\bar x) = \{\bar x\}$,

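$L_C(\bar x) = \{w \in \mathbb{R}^2 \mid w_2 \ge 0\}$, and $\mathcal{V}(\bar x) = \{w \in \mathbb{R}^2 \mid w_2 = 0\}$. Thus, the GCQ does not hold at $\bar x$. Moreover, we have

\[
T_C^2(\bar x \mid w) =
\begin{cases}
\{0\} & \text{if } w = 0, \\
\emptyset & \text{otherwise},
\end{cases}
\]

and

\[
L_C^2(\bar x \mid w) =
\begin{cases}
\mathbb{R} \times \mathbb{R}_+ & \text{if } w_2 = 0, \\
\mathbb{R}^2 & \text{if } w_2 > 0 \text{ and } w_1^2 \ge w_2^2, \\
\emptyset & \text{otherwise}.
\end{cases}
\]

Thus, for any $w \in \mathcal{V}(\bar x)$, we have

\[
L_C^2(\bar x \mid w) \ne \operatorname{cl\,conv}[T_C^2(\bar x \mid w)],
\]

which implies that the SGCQ does not hold at $\bar x$. By checking condition (38), we have

\[
L_C^2(\bar x \mid w) = \ker d^2 S^{2/3}(\bar x)(w \mid \cdot) \quad \forall w \in \mathcal{V}(\bar x).
\]

This indicates that Theorem 3.1 is applicable once the exactness of the penalty function $F_{2/3}$ is verified. In what follows, we will show that the penalty function $F_p$ is exact at $\bar x$ for $p = \frac{2}{3}$ but not for $p > \frac{2}{3}$. Let $\delta \in (0, 1)$ and $\tilde\mu = \dfrac{2}{(1 - \delta^2)^{2/3}}$. Clearly, $\tilde\mu > 2$. Let $\mu \ge \tilde\mu$ and let $x \in \mathbb{R}^2$ be such that $|x_1| \le \delta$ and $|x_2| \le \delta$. We consider two cases for $x$:

Case 1: $x_2 \ge 0$. We have

\[
\begin{aligned}
F_{2/3}(x) &= -x_1^4 + x_2 + \mu \left[ (-x_2)_+ + (x_1^6 + x_2^3)_+ + (-x_1^2 + x_2^2)_+ \right]^{2/3} \qquad (60) \\
&\ge -x_1^4 + \mu (x_1^6)^{2/3} \\
&= (\mu - 1) x_1^4 \ge 0.
\end{aligned}
\]

Case 2: $x_2 < 0$. We have from (60)

\[
\begin{aligned}
F_{2/3}(x) &\ge -x_1^4 + x_2 + \mu \left[ (-x_2 + x_1^6 + x_2^3)_+ \right]^{2/3} \\
&= -x_1^4 + x_2 + \mu \left[ -x_2 (1 - x_2^2) + x_1^6 \right]^{2/3} \\
&\ge -x_1^4 + x_2 + \tfrac{1}{2} \mu \left[ -x_2 (1 - x_2^2) \right]^{2/3} + \tfrac{1}{2} \mu (x_1^6)^{2/3} \\
&\ge \left( \tfrac{1}{2} \mu - 1 \right) x_1^4 + (-x_2) \left[ \tfrac{\mu}{2} (1 - \delta^2)^{2/3} - 1 \right] \\
&\ge 0,
\end{aligned}
\]

where the second inequality follows from Lemma 4.1 in Huang and Yang (2003).

To show that the penalty function $F_p$ is not exact at $\bar x$ when $p > \frac{2}{3}$, we consider a sequence $x_k := (x_{1k}, 0) \in \mathbb{R}^2$ with $x_{1k} \to 0+$. It is easy to check that for any $\mu > 0$ and $p > \frac{2}{3}$, the following condition holds for all sufficiently large $k$:

\[
F_p(x_k) = -x_{1k}^4 + \mu x_{1k}^{6p} < F_p(\bar x) = 0.
\]

Now Theorem 3.1 can be applied to derive the SON condition at $\bar x$. In fact, by direct calculation, we have

\[
\mathrm{KKT}(\bar x) = \{\lambda \in \mathbb{R}^3 \mid \lambda_1 = 1, \ \lambda_2 \ge 0, \ \lambda_3 \ge 0\}.
\]

Thus, for each $w \in \mathcal{V}(\bar x)$, we have

\[
\sup_{\lambda \in \mathrm{KKT}(\bar x)} \langle w, \nabla^2_{xx} L(\bar x, \lambda) w\rangle = \sup_{\lambda_3 \ge 0} (-2 \lambda_3 w_1^2) = 0.
\]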
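The two exactness claims of Example 3.2 can be spot-checked numerically. The following is a rough sketch under stated assumptions: the grid resolution, the choice $\delta = 0.5$, the values $p = 0.8$ and $\mu = 100$, and the helper names `F` and `grid_min` are illustrative and not taken from the paper. A grid check is of course not a proof; it only fails to find a counterexample to $F_{2/3} \ge 0$ on the box, and it exhibits a point near $\bar x$ where $F_p < 0$ for one $p > \frac{2}{3}$.

```python
def F(p, mu, x1, x2):
    """l_p penalty function F_p = f + mu * S^p for the data of Example 3.2."""
    def pos(v):
        return max(v, 0.0)
    s = pos(-x2) + pos(x1**6 + x2**3) + pos(-x1**2 + x2**2)
    return -x1**4 + x2 + mu * s**p

def grid_min(p, mu, delta, steps=50):
    """Minimum of F_p over a uniform grid on the box [-delta, delta]^2."""
    pts = [delta * i / steps for i in range(-steps, steps + 1)]
    return min(F(p, mu, a, b) for a in pts for b in pts)

DELTA = 0.5
MU_TILDE = 2.0 / (1.0 - DELTA**2) ** (2.0 / 3.0)   # threshold used in the text

if __name__ == "__main__":
    # p = 2/3: no negative value found on the box, in line with exactness.
    print("min F_{2/3} on grid:", grid_min(2.0 / 3.0, MU_TILDE, DELTA))
    # p = 0.8 > 2/3: F_p is negative along (x1, 0) for small x1, even with mu = 100.
    print("F_{0.8}(1e-3, 0) with mu = 100:", F(0.8, 100.0, 1e-3, 0.0))
```

Along $(x_1, 0)$ the penalty term equals $\mu x_1^{6p}$, so the sign of $F_p$ is decided by comparing $6p$ with $4$, which is where the value $\frac{2}{3}$ comes from.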

In Example 2.5 of Meng and Yang (2010), we presented a class of parameterized problems and identified exactly when KKT conditions can be obtained. Now, by using a similar class of parameterized problems, we illustrate when the SON condition can be derived by one of the existing methods and when it can be obtained from Theorem 3.2 only.

Example 3.3 Let $\bar x = 0 \in \mathbb{R}^3$ be a local minimum of the following (NLP):

\[
\begin{array}{ll}
\min & f(x) \\
\text{s.t.} & g_1(x) = a^T x + a_4 x_3^4 \le 0, \\
& g_2(x) = b^T x + b_4 x_3^4 \le 0, \\
& g_3(x) = c^T x + c_4 x_3^4 \le 0,
\end{array} \tag{61}
\]

where $a = (a_1, a_2, a_3)^T$, $b = (b_1, b_2, b_3)^T$, $c = (c_1, c_2, c_3)^T \in \mathbb{R}^3$ and $a_4, b_4, c_4 \in \mathbb{R}$. Assume that the vectors $a$ and $b$ are linearly independent, and that the vector $c$ has a unique representation $c = -k_1 a - k_2 b$ with $k_1, k_2 \in \mathbb{R}$. Let $e_3 = (0, 0, 1)^T$. According to the discussions given in Example 2.5 of Meng and Yang (2010), we consider the following seven cases:

(i) $\min\{k_1, k_2\} < 0$.

(ii) $\min\{k_1, k_2\} \ge 0$ and $k_1 a_4 + k_2 b_4 + c_4 \le 0$.

(iii) $\min\{k_1, k_2\} = 0$, $k_1 + k_2 > 0$, $k_1 a_4 + k_2 b_4 + c_4 > 0$ and the vectors $c$ and $e_3$ are linearly dependent.

(iv) $\min\{k_1, k_2\} = 0$, $k_1 + k_2 > 0$, $k_1 a_4 + k_2 b_4 + c_4 > 0$ and the vectors $c$ and $e_3$ are linearly independent.

(v) $\min\{k_1, k_2\} = 0$, $k_1 + k_2 = 0$ and $k_1 a_4 + k_2 b_4 + c_4 > 0$.

(vi) $\min\{k_1, k_2\} > 0$, $k_1 a_4 + k_2 b_4 + c_4 > 0$ and the vectors $a$, $b$, $e_3$ are linearly dependent.

(vii) $\min\{k_1, k_2\} > 0$, $k_1 a_4 + k_2 b_4 + c_4 > 0$ and the vectors $a$, $b$, $e_3$ are linearly independent.

It was shown in Meng and Yang (2010) that the MFCQ holds for case (i) and the GCQ holds for case (ii). As for cases (iii) and (vi), it follows from Theorem 2.5 of Ng and Zheng (2001) that there exist $\tau > 0$ and $\delta > 0$ such that for all $x \in \mathbb{R}^3$ with $\|x - \bar x\| \le \delta$, the inequality $\tau d_C(x) \le S(x)$ holds. This local error bound property implies by Proposition 2.4.3 of Clarke (1983) that the penalty function $F_1$ is exact at $\bar x$. Therefore, for cases (i), (ii), (iii) and (vi), we confirm that the SON condition holds at $\bar x$ for any twice continuously differentiable objective function $f$ having a local minimum at $\bar x$ relative to the feasible set $C$.

For cases (iv), (v) and (vii), we will show that the GCQ does not hold at $\bar x$. For any $x \in C$, we have

\[
k_1 g_1(x) + k_2 g_2(x) + g_3(x) = (k_1 a_4 + k_2 b_4 + c_4) x_3^4 \le 0,
\]

which implies that $x_3 = 0$ because $k_1 a_4 + k_2 b_4 + c_4 > 0$. Thus, we have

\[
C = \{x \in \mathbb{R}^3 \mid a^T x \le 0, \ b^T x \le 0, \ c^T x \le 0, \ x_3 = 0\}.
\]

That is, $C$ is a polyhedral cone. We thus have $C = T_C(\bar x) = \operatorname{cl\,conv} T_C(\bar x)$. By definition, we have

\[
L_C(\bar x) = \{x \in \mathbb{R}^3 \mid a^T x \le 0, \ b^T x \le 0, \ c^T x \le 0\}.
\]

In any of the cases (iv), (v) and (vii), we can find some $x \in \mathbb{R}^3$ with $x_3 \ne 0$ such that $x \in L_C(\bar x)$. That is, $\operatorname{cl\,conv} T_C(\bar x) \subsetneq L_C(\bar x)$, or equivalently the GCQ does not hold at $\bar x$. Thus, for some instances of problem (61), we cannot expect the KKT condition, not to mention the SON condition.

In what follows, by assuming case (vii), we consider an instance of problem (61) with

\[
f(x) = w^T x + w_4 x_3^{8/3}, \tag{62}
\]

where $w_4 < 0$, and $w = (w_1, w_2, w_3)^T = -\rho_1 a - \rho_2 b$ for some $\rho_1, \rho_2 \ge 0$. First, we show that the penalty function $F_p(x)$ cannot be exact at $\bar x$ when $p > \frac{2}{3}$. Since the vectors $a$, $b$, $e_3$ are linearly independent, we can find a sequence $x_k := (x_{1k}, x_{2k}, x_{3k})^T$ such that $x_k \to \bar x$, $x_{3k} \not\equiv 0$, $a^T x_k = 0$, and $b^T x_k = 0$. For such a sequence, we can easily verify that for any $\mu > 0$, the following inequality holds for all sufficiently large $k$:

\[
F_p(x_k) = w_4 x_{3k}^{8/3} + \mu \left[ (a_4 x_{3k}^4)_+ + (b_4 x_{3k}^4)_+ + (c_4 x_{3k}^4)_+ \right]^{p} < 0.
\]

This indicates that the penalty function $F_p(x)$ is not exact at $\bar x$ when $p > \frac{2}{3}$.

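Next, we show that the penalty function $F_{2/3}(x)$ is exact at $\bar x$. By definition, we have

\[
\begin{aligned}
F_{2/3}(x) &= \left[ w_4 + (\rho_1 a_4 + \rho_2 b_4) x_3^{4/3} \right] x_3^{8/3} - \rho_1 (a^T x + a_4 x_3^4) - \rho_2 (b^T x + b_4 x_3^4) \\
&\quad + \mu \left[ (a^T x + a_4 x_3^4)_+ + (b^T x + b_4 x_3^4)_+ + (c^T x + c_4 x_3^4)_+ \right]^{2/3}.
\end{aligned}
\]

Set $\delta = \min\left\{ 1, \dfrac{1}{\|a\| + |a_4|}, \dfrac{1}{\|b\| + |b_4|} \right\}$ and

\[
\tilde\mu = 2 \max\left\{ 2\rho_1, \ 2\rho_2, \ \frac{|\rho_1 a_4 + \rho_2 b_4| - w_4}{\left( \min\{\frac{1}{k_1}, \frac{1}{k_2}, 1\} \right)^{2/3} (k_1 a_4 + k_2 b_4 + c_4)^{2/3}} \right\}.
\]

Let $\mu \ge \tilde\mu$ and $\|x\| \le \delta$. By the definition of $\delta$, we have

\[
(\rho_1 a_4 + \rho_2 b_4) x_3^{4/3} \ge -|\rho_1 a_4 + \rho_2 b_4| \delta^{4/3} \ge -|\rho_1 a_4 + \rho_2 b_4|, \tag{63}
\]

and

\[
|a^T x + a_4 x_3^4| \le \|a\| \|x\| + |a_4| x_3^4 \le \|a\| \delta + |a_4| \delta^4 \le (\|a\| + |a_4|) \delta \le 1, \tag{64}
\]

and similarly,

\[
|b^T x + b_4 x_3^4| \le 1. \tag{65}
\]

Thus, from (64) and (65), we obtain

\[
\begin{aligned}
&\left[ (a^T x + a_4 x_3^4)_+ + (b^T x + b_4 x_3^4)_+ + (c^T x + c_4 x_3^4)_+ \right]^{2/3} \\
&\quad \ge \left[ (a^T x + a_4 x_3^4)_+ + (b^T x + b_4 x_3^4)_+ \right]^{2/3} \\
&\quad \ge \tfrac{1}{2} \left[ (a^T x + a_4 x_3^4)_+ \right]^{2/3} + \tfrac{1}{2} \left[ (b^T x + b_4 x_3^4)_+ \right]^{2/3} \\
&\quad \ge \tfrac{1}{2} (a^T x + a_4 x_3^4)_+ + \tfrac{1}{2} (b^T x + b_4 x_3^4)_+,
\end{aligned} \tag{66}
\]

where the second inequality follows from Lemma 4.1 in Huang and Yang (2003). Since $k_1 a_4 + k_2 b_4 + c_4 > 0$, we have

\[
\begin{aligned}
&\left[ (a^T x + a_4 x_3^4)_+ + (b^T x + b_4 x_3^4)_+ + (c^T x + c_4 x_3^4)_+ \right]^{2/3} \\
&\quad \ge \left[ \min\{\tfrac{1}{k_1}, \tfrac{1}{k_2}, 1\} \right]^{2/3} \left[ k_1 (a^T x + a_4 x_3^4)_+ + k_2 (b^T x + b_4 x_3^4)_+ + (c^T x + c_4 x_3^4)_+ \right]^{2/3} \\
&\quad \ge \left[ \min\{\tfrac{1}{k_1}, \tfrac{1}{k_2}, 1\} \right]^{2/3} \left[ k_1 (a^T x + a_4 x_3^4) + k_2 (b^T x + b_4 x_3^4) + c^T x + c_4 x_3^4 \right]_+^{2/3} \\
&\quad = \left( \min\{\tfrac{1}{k_1}, \tfrac{1}{k_2}, 1\} \right)^{2/3} (k_1 a_4 + k_2 b_4 + c_4)^{2/3} x_3^{8/3}.
\end{aligned} \tag{67}
\]

In view of (63), (66), (67), and the definition of $\tilde\mu$, we have

\[
\begin{aligned}
F_{2/3}(x) &\ge \left[ w_4 + (\rho_1 a_4 + \rho_2 b_4) x_3^{4/3} \right] x_3^{8/3} - \rho_1 (a^T x + a_4 x_3^4) - \rho_2 (b^T x + b_4 x_3^4) \\
&\quad + \tfrac{\mu}{4} (a^T x + a_4 x_3^4)_+ + \tfrac{\mu}{4} (b^T x + b_4 x_3^4)_+ + \tfrac{\mu}{2} \left( \min\{\tfrac{1}{k_1}, \tfrac{1}{k_2}, 1\} \right)^{2/3} (k_1 a_4 + k_2 b_4 + c_4)^{2/3} x_3^{8/3} \\
&\ge 0.
\end{aligned}
\]

This implies that the penalty function $F_{2/3}(x)$ is exact at $\bar x$. Moreover, it is easy to verify that condition (51) holds. Thus, Theorem 3.2 (ii) can be applied to derive the SON condition at $\bar x$.

Acknowledgments

The first author was supported by the Fundamental Research Funds for the Central Universities [SWJTU12CX121] and by the National Science Foundation of China [71090402, 11201383]. The second author was supported by the Research Grants Council of Hong Kong [PolyU 5295/12E].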

References

Andreani, R., Echagüe, C. E., and Schuverdt, M. L. (2010). Constant-rank condition and second-order constraint qualification. Journal of Optimization Theory and Applications, 146:255–266.

Anitescu, M. (2000). Degenerate nonlinear programming with a quadratic growth condition. SIAM Journal on Optimization, 10:1116–1135.

Arutyunov, A. V. (1991). Perturbations of extremal problems with constraints and necessary optimality conditions. Journal of Mathematical Sciences, 54:1342–1400.

Arutyunov, A. V., Avakov, E. R., and Izmailov, A. (2008). Necessary optimality conditions for constrained optimization problems under relaxed constraint qualifications. Mathematical Programming, 114:37–68.

Aubin, J.-P. and Ekeland, I. (1984). Applied Nonlinear Analysis. John Wiley, New York.

Baccari, A. and Trad, A. (2005). On the classical necessary second-order optimality conditions in the presence of equality and inequality constraints. SIAM Journal on Optimization, 15:394–408.

Ben-Tal, A. (1980). Second-order and related extremality conditions in nonlinear programming. Journal of Optimization Theory and Applications, 31(2):143–165.

Ben-Tal, A. and Zowe, J. (1982). A unified theory of first and second order conditions for extremum problems in topological vector spaces. Mathematical Programming Study, 19:39–76.

Bonnans, J. F. and Shapiro, A. (2000). Perturbation Analysis of Optimization Problems. Springer-Verlag, New York.

Burke, J. V. (1991a). Calmness and exact penalization. SIAM Journal on Control and Optimization, 29(2):493–497.

Burke, J. V. (1991b). An exact penalization viewpoint of constrained optimization. SIAM Journal on Control and Optimization, 29(4):968–998.

Burke, J. V. and Poliquin, R. A. (1992). Optimality conditions for non-finite valued convex composite functions. Mathematical Programming, 57:103–120.

Chartrand, R. and Staneva, V. (2008). Restricted isometry properties and nonconvex compressive sensing. Inverse Problems, 24:1–14.

Clarke, F. H. (1983). Optimization and Nonsmooth Analysis. John Wiley, New York.

Eremin, I. I. (1967). The penalty method in convex programming. Cybernetics and Systems Analysis, 3(4):53–56.

Fiacco, A. V. and McCormick, G. P. (1968). Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Wiley, New York.

Gould, F. J. and Tolle, J. W. (1971). A necessary and sufficient qualification for constrained optimization. SIAM Journal on Applied Mathematics, 20(2):164–172.

Guignard, M. (1969). Generalized Kuhn-Tucker conditions for mathematical programming problems in a Banach space. SIAM Journal on Control, 7(2):232–241.

Huang, X. X. and Yang, X. Q. (2003). A unified augmented Lagrangian approach to duality and exact penalization. Mathematics of Operations Research, 28(3):533–552.

Ioffe, A. D. (1979). Necessary and sufficient conditions for a local minimum. 3: Second order conditions and augmented duality. SIAM Journal on Control and Optimization, 17(2):266–288.

Ioffe, A. D. (1989). On some recent developments in the theory of second order optimality conditions. In Dolecki, S., editor, Optimization, volume 1405 of Lecture Notes in Mathematics, pages 55–68. Springer Berlin Heidelberg.

Izmailov, A. F. and Solodov, M. V. (2001). Error bounds for 2-regular mappings with Lipschitzian derivatives and their applications. Mathematical Programming, 89:413–435.

Karush, W. (1939). Minima of functions of several variables with inequalities as side constraints. Master's thesis, Dept. of Mathematics, Univ. of Chicago, Chicago, Illinois.

Kawasaki, H. (1988). Second-order necessary conditions of the Kuhn-Tucker type under new constraint qualifications. Journal of Optimization Theory and Applications, 57(2):253–264.

Kuhn, H. W. and Tucker, A. W. (1951). Nonlinear programming. Proceedings of 2nd Berkeley Symposium. Berkeley: University of California Press, pages 481–492.

Lewis, A. S. and Wright, S. J. (2011). Identifying activity. SIAM Journal on Optimization, 21(2):597–614.

Luo, Z. Q., Pang, J. S., and Ralph, D. (1996). Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge.

Mangasarian, O. L. (1969). Nonlinear Programming. McGraw-Hill, New York.

Mangasarian, O. L. and Fromovitz, S. (1967). The Fritz John necessary optimality conditions in the presence of equality and inequality constraints. Journal of Mathematical Analysis and Applications, 17(1):37–47.

Meng, K. W. and Yang, X. Q. (2010). Optimality conditions via exact penalty functions. SIAM Journal on Optimization, 20(6):3208–3231.

Ng, K. F. and Zheng, X. Y. (2001). Error bounds for lower semicontinuous functions in normed spaces. SIAM Journal on Optimization, 12(1):1–17.

Pietrzykowski, T. (1969). An exact potential method for constrained maxima. SIAM Journal on Numerical Analysis, 6(2):299–304.

Rockafellar, R. T. (1988). First- and second-order epi-differentiability in nonlinear programming. Transactions of the American Mathematical Society, 307(1):75–108.

Rockafellar, R. T. (1989). Second-order optimality conditions in nonlinear programming obtained by way of epi-derivatives. Mathematics of Operations Research, 14:462–484.

Rockafellar, R. T. and Wets, R. J.-B. (1998). Variational Analysis. Springer-Verlag, Berlin.

Rubinov, A. M. and Yang, X. Q. (2003). Lagrange-Type Functions in Constrained Non-Convex Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands.

Wright, S. J., Nowak, R. D., and Figueiredo, M. A. T. (2009). Sparse reconstruction by separable approximation. IEEE Trans. Signal Process, 57(7):2479–2493.

Yang, X. Q. and Meng, Z. Q. (2007). Lagrange multipliers and calmness conditions of order p. Mathematics of Operations Research, 32(1):95–101.

Zangwill, W. I. (1967). Non-linear programming via penalty functions. Management Science, 13(5):344–358.