© 2004 Society for Industrial and Applied Mathematics

SIAM J. OPTIM. Vol. 15, No. 1, pp. 101–128

A COMPARATIVE STUDY OF KERNEL FUNCTIONS FOR PRIMAL-DUAL INTERIOR-POINT ALGORITHMS IN LINEAR OPTIMIZATION∗

Y. Q. BAI†, M. EL GHAMI‡, AND C. ROOS‡

Abstract. Recently, so-called self-regular barrier functions for primal-dual interior-point methods (IPMs) for linear optimization were introduced. Each such barrier function is determined by its (univariate) self-regular kernel function. We introduce a new class of kernel functions. The class is defined by some simple conditions on the kernel function and its derivatives. These properties enable us to derive many new and tight estimates that greatly simplify the analysis of IPMs based on these kernel functions. In both the algorithm and its analysis we use a single neighborhood of the central path; the neighborhood naturally depends on the kernel function. An important conclusion is that inverse functions of suitable restrictions of the kernel function and its first derivative more or less determine the behavior of the corresponding IPMs. Based on the new estimates we present a simple and unified computational scheme for the complexity analysis of kernel functions in the new class. We apply this scheme to seven specific kernel functions. Some of these functions are self-regular, and others are not. One of the functions differs from the others, and from all self-regular functions, in the sense that its growth term is linear. Iteration bounds for both large- and small-update methods are derived. It is shown that small-update methods based on the new kernel functions all have the same complexity as the classical primal-dual IPM, namely, O(√n log(n/ε)). For large-update methods the best obtained bound is O(√n (log n) log(n/ε)), which until now has been the best known bound for such methods.

Key words. linear optimization, interior-point method, primal-dual method, large-update method, polynomial complexity

AMS subject classifications. 90C05, 90C51

DOI. 10.1137/S1052623403423114

1. Introduction. After the groundbreaking paper of Karmarkar [5], linear optimization (LO) became an active area of research. The resulting interior-point methods (IPMs) are now among the most effective methods for solving LO problems. For a survey we refer to recent books on the subject [14, 16, 17]. In this paper we deal with so-called primal-dual IPMs. It is generally agreed that these IPMs are most efficient from a computational point of view (see, e.g., Andersen et al. [1]). Until now primal-dual IPMs all have used the Newton direction as the search direction; this direction is closely related to the well-known primal-dual logarithmic barrier function. For a discussion of this relation we refer the reader to, e.g., [2, 3, 11]. There is a gap between the practical behavior of these algorithms and the theoretical performance results, especially for so-called large-update methods. If n denotes the number of inequalities in the problem, then the theoretical iteration bound is O(n log(n/ε)), where ε represents the desired accuracy of the solution. In practice, large-update methods are much more efficient than the so-called small-update methods, for which the theoretical iteration bound is only O(√n log(n/ε)). So the current

∗Received by the editors February 14, 2003; accepted for publication (in revised form) February 2, 2004; published electronically October 14, 2004. This research was supported by the Dutch Organization for Scientific Research (NWO grant 613.000.110). http://www.siam.org/journals/siopt/15-1/42311.html
†Department of Mathematics, Shanghai University, Shanghai 200436, China ([email protected]).
‡Faculty of Information Technology and Systems, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands ([email protected], [email protected], http://www.isa.ewi.tudelft.nl/~roos).


theoretical bounds differ by a factor of √n, in favor of the small-update methods. This significant gap between theory and practice has been referred to as the irony of IPMs [13, page 51]. Recently a new class of IPMs was introduced and the aforementioned gap could be narrowed up to a factor of log n. These methods do not use the classical Newton direction. Instead they use a direction that can be characterized as a steepest descent direction (in a scaled space) for a so-called self-regular barrier function [11, 12]. Any such barrier function is determined by a simple univariate self-regular function, called its kernel function. The prototype self-regular kernel function in [11, 12] is given by (1.1)

Υ_{p,q}(t) = (t^{p+1} − 1)/(p(p + 1)) + (t^{1−q} − 1)/(q(q − 1)) + ((p − q)/(pq))(t − 1),

where p ≥ 1 and q > 1. The parameter p is called the growth degree, and q the barrier degree of the kernel function. It may be noted that the iteration bounds obtained in [11, 12] depend monotonically on p: the smaller p is, the better is the iteration bound. Hence it is stated in [12, page 63] that in practical implementations small values of p should be used. In this paper we introduce a new class of kernel functions which are not necessarily self-regular; on the other hand, all self-regular kernel functions Υp,q (t) with growth degree p ≤ 1 belong to the new class. We develop some new analytic tools for the analysis of IPMs based on kernel functions from the new class. As a result the analysis in this paper is much simpler than in [11, 12], whereas the iteration bounds are at least as good. In addition, our analysis also applies to (self-regular) functions with growth degree p ≤ 1. In both the algorithm and its analysis we use a single neighborhood of the central path; the neighborhood naturally depends on the kernel function. The paper is organized as follows. In section 2.1 we first briefly recall the notion of the central path, which is a basic concept underlying all primal-dual IPMs. In section 2.2 we describe the idea underlying the approach of the paper. A crucial observation is that any function that is strictly convex on the positive orthant and that attains its minimal value in the all-one vector e determines in a natural way a primal-dual interior-point method, which is formally described in section 2.3. In section 2.4 we define the notion of a (univariate) kernel function and in section 2.5 we introduce the class of kernel functions considered in this paper. The first property shared by kernel functions in this class is exponential convexity. The other properties control the growth and barrier behavior of the kernel function. Section 2.6 introduces seven specific kernel functions belonging to the new class. 
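As a concrete reference point for the prototype kernel (1.1), the following sketch evaluates Υ_{p,q} and its first derivative and checks the kernel-function normalization Υ(1) = Υ'(1) = 0 numerically. It is an illustrative script under our own naming (`upsilon`, `dupsilon`); it is not part of the paper.

```python
import math

def upsilon(t, p, q):
    """Prototype self-regular kernel (1.1); p >= 1 is the growth degree,
    q > 1 the barrier degree. Illustrative helper, not the paper's code."""
    return ((t**(p + 1) - 1) / (p * (p + 1))
            + (t**(1 - q) - 1) / (q * (q - 1))
            + (p - q) / (p * q) * (t - 1))

def dupsilon(t, p, q):
    """First derivative: t^p/p - t^{-q}/q + (p - q)/(pq)."""
    return t**p / p - t**(-q) / q + (p - q) / (p * q)

# normalization at t = 1 and the barrier behavior as t approaches 0
assert abs(upsilon(1.0, 1, 2)) < 1e-12 and abs(dupsilon(1.0, 1, 2)) < 1e-12
assert upsilon(1e-3, 1, 2) > upsilon(0.5, 1, 2) > 0
```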
In section 3 we study the growth behavior of the new barrier functions. Section 4 is devoted to the analysis of the amount of decrease of the barrier function during an inner iteration. This analysis yields a default step size and, in section 5, it yields the iteration complexity of the algorithm in terms of the kernel function and its first and second derivatives. It also turns out that this complexity depends on the inverse function of (a restriction of) the kernel function itself and on the inverse function of (a restriction of) its first derivative. In section 6 we apply our results to seven kernel functions in order to obtain iteration bounds for large-update methods; to make clear that the small-update methods based on the new kernel functions are as good as the classical logarithmic barrier method we also derive iteration bounds for small-update methods. Finally, section 7 contains some concluding remarks and directions for future research. We use the following notational conventions. If x, s ∈ R^n, then xs denotes the coordinatewise (or Hadamard) product of the vectors x and s. Furthermore, e denotes


the all-one vector of length n. The nonnegative orthant and positive orthant are denoted as R^n_+ and R^n_{++}, respectively. Finally, if z ∈ R^n_+ and f : R_+ → R_+, then f(z) denotes the vector in R^n_+ whose ith component is f(z_i), with 1 ≤ i ≤ n. We write f(x) = O(g(x)) if f(x) ≤ c g(x) for some positive constant c, and f(x) = Θ(g(x)) if c_1 g(x) ≤ f(x) ≤ c_2 g(x) for positive constants c_1 and c_2. As in this paragraph, functions like f and g always have a local meaning in this paper.

2. Preliminaries.

2.1. The central path. We deal with the LO problem in standard format

(P)

min{c^T x : Ax = b, x ≥ 0},

where A ∈ R^{m×n}, rank(A) = m, b ∈ R^m, and c ∈ R^n, and its dual problem

(D)

max{b^T y : A^T y + s = c, s ≥ 0}.

It is well known that finding an optimal solution of (P) and (D) is equivalent to solving the following system:

(2.1)    Ax = b,  x ≥ 0,
         A^T y + s = c,  s ≥ 0,
         xs = 0.

The basic idea of primal-dual IPMs is to replace the third equation in (2.1), the so-called complementarity condition for (P) and (D), by the parameterized equation xs = µe, with µ > 0. Thus we consider the system

(2.2)    Ax = b,  x ≥ 0,
         A^T y + s = c,  s ≥ 0,
         xs = µe.

Due to the last equation, any solution (x, y, s) of (2.2) will satisfy x > 0 and s > 0. So a solution exists only if (P) and (D) satisfy the interior-point condition (IPC), i.e., there exists (x^0, s^0, y^0) such that

(2.3)    Ax^0 = b,  x^0 > 0,
         A^T y^0 + s^0 = c,  s^0 > 0.

Surprisingly enough, if the IPC is satisfied, then a solution exists, for each µ > 0, and this solution is unique. It is denoted as (x(µ), y(µ), s(µ)), and we call x(µ) the µ-center of (P) and (y(µ), s(µ)) the µ-center of (D). The set of µ-centers (with µ running through all positive real numbers) gives a homotopy path, which is called the central path of (P) and (D). The relevance of the central path for LO was recognized first by Sonnevend [15] and Megiddo [6]. If µ → 0, then the limit of the central path exists, and since the limit points satisfy the complementarity condition, the limit yields optimal solutions for (P) and (D). From a theoretical point of view the IPC can be assumed without loss of generality. In fact we may, and will, assume that x^0 = s^0 = e. In practice, this can be realized by embedding the given problems (P) and (D) into a homogeneous self-dual problem which has two additional variables and two additional constraints. For this and the other properties mentioned above, see, e.g., [14].
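For a concrete picture of µ-centers, the sketch below applies Newton's method to system (2.2) for a tiny standard-form pair. The data A, b, c and the routine name `mu_center` are illustrative assumptions, not from the paper, and no step-size safeguard is used; this is a sketch, not a robust solver.

```python
import numpy as np

# Toy standard-form pair (P), (D): illustrative data chosen so that
# x0 = s0 = e is primal feasible (rank(A) = m).
A = np.array([[1.0, 1.0]])      # m = 1, n = 2
b = np.array([2.0])
c = np.array([2.0, 2.0])

def mu_center(mu, iters=20):
    """Full Newton steps on (2.2): Ax = b, A^T y + s = c, xs = mu*e."""
    m, n = A.shape
    x, y, s = np.ones(n), np.zeros(m), np.ones(n)
    for _ in range(iters):
        K = np.zeros((2 * n + m, 2 * n + m))
        K[:m, :n] = A                      # A dx = b - A x
        K[m:m + n, n:n + m] = A.T          # A^T dy + ds = c - A^T y - s
        K[m:m + n, n + m:] = np.eye(n)
        K[m + n:, :n] = np.diag(s)         # s dx + x ds = mu e - x s
        K[m + n:, n + m:] = np.diag(x)
        r = np.concatenate([b - A @ x, c - A.T @ y - s, mu - x * s])
        d = np.linalg.solve(K, r)
        x, y, s = x + d[:n], y + d[n:n + m], s + d[n + m:]
    return x, y, s

# For this symmetric example the mu-center is x(mu) = e, s(mu) = mu*e.
x, y, s = mu_center(0.25)
```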


2.2. A wide class of primal-dual path-following methods. IPMs follow the central path approximately. We now describe how this proceeds. We assume that we are given a strictly convex differentiable function Ψ(v), v ∈ R^n_{++}, such that Ψ(v) is minimal at v = e and Ψ(e) = 0. We will argue below that any such function determines in a natural way an IPM. Without loss of generality we assume that a good approximation of (x(µ), y(µ), s(µ)) is known for some positive µ. For example, due to the above assumption we may assume this for µ = 1, with x(1) = s(1) = e. We then decrease µ to µ := (1 − θ)µ, for some fixed θ ∈ (0, 1). At this stage we are given a primal feasible x > 0 and a dual feasible s > 0, and we aim at finding the new µ-centers x(µ), y(µ), and s(µ), the solution of the system (2.2). Defining

(2.4)    v := √(xs/µ),

we observe that (2.2) can be solved by minimizing Ψ(v), subject to the linear constraints in (2.2), since this function has its (unique) minimizer at v = e, which is equivalent to xs = µe. In this paper we will choose Ψ(v) in such a way that its Hessian at e is a positive multiple of the identity matrix: ∇²Ψ(e) = ρI for some positive ρ. This implies that the positive multiple

(2.5)    −(1/ρ)∇Ψ(v)

of the steepest descent direction for minimizing Ψ is a good approximation, at least in the neighborhood of e, of the Newton direction, and hence this is an excellent search direction for minimizing Ψ(v). For (small) displacements ∆x and ∆s, in the x- and s-space, respectively, let ∆v be the resulting change in v. Then we have

∆v = √((x + ∆x)(s + ∆s)/µ) − √(xs/µ) ≈ (x∆s + s∆x)/(2√(µxs)) = (x∆s + s∆x)/(2µv).

Hence, since we want this to be about equal to (2.5), we arrive at the equation

(x∆s + s∆x)/(2µv) = −(1/ρ)∇Ψ(v).

Due to the definition of v and up to the positive factor 2/ρ, this can be rewritten as s∆x + x∆s = −µv∇Ψ(v). Thus, adding the conditions for primal feasibility of x + ∆x and dual feasibility of s + ∆s, neglecting for the moment the nonnegativity constraints, we find the following linear system for the search directions ∆x, ∆s, and ∆y:

(2.6)    A∆x = 0,
         A^T∆y + ∆s = 0,
         s∆x + x∆s = −µv∇Ψ(v).

We proceed by showing that system (2.6) uniquely defines the search directions and that these vanish if and only if Ψ(v) = 0, which holds if and only if v = e, i.e., if and


only if (x, y, s) = (x(µ), y(µ), s(µ)). To this end, using the vector v defined in (2.4), we introduce scaled versions of the displacements ∆x and ∆s as follows:

(2.7)    d_x := v∆x/x,    d_s := v∆s/s.

Now one easily checks that the system (2.6), which defines the search directions, can be rewritten as

(2.8)    Ā d_x = 0,
         (1/µ)Ā^T ∆y + d_s = 0,
         d_x + d_s = −∇Ψ(v),

where Ā = AV^{−1}X, V = diag(v), X = diag(x).

The last equation in (2.8) is called the scaled centering equation. It states that the sum of the scaled search directions d_x and d_s is equal to −∇Ψ(v), the steepest descent direction of Ψ(v). Note that d_x and d_s are orthogonal vectors, since the vector d_x belongs to the null space and d_s to the row space of the matrix Ā. Thus it follows that d_x and d_s form an orthogonal decomposition of the steepest descent direction of the function Ψ(v). Note that since d_x and d_s are orthogonal, we will have

d_x = d_s = 0 ⇔ ∇Ψ(v) = 0 ⇔ v = e ⇔ Ψ(v) = 0,

i.e., if and only if x = x(µ) and s = s(µ). Hence, if (x, y, s) ≠ (x(µ), y(µ), s(µ)), then (∆x, ∆s, ∆y) is nonzero. By taking a step along the search direction, with the step size α defined by some line search rules, one constructs a new triple (x, y, s) according to

(2.9)    x_+ = x + α∆x,    y_+ = y + α∆y,    s_+ = s + α∆s.

If necessary, this procedure is repeated until we find iterates (x, y, s) that are "close" enough to (x(µ), y(µ), s(µ)). Then µ is again reduced by the factor 1 − θ and we apply the above method targeting the new µ-centers, and so on. This process is repeated until µ is small enough, say, until nµ ≤ ε; at this stage we have found an ε-solution of the problems (P) and (D).

2.3. A generic primal-dual algorithm. We can now describe the algorithm in a more formal way. The generic form of the algorithm is shown in Figure 1. It is clear from this description that closeness of (x, y, s) to (x(µ), y(µ), s(µ)) is measured by the value of Ψ(v), with τ as a threshold value: if Ψ(v) ≤ τ, then we start a new outer iteration by performing a µ-update; otherwise we enter an inner iteration by computing the search directions at the current iterates with respect to the current value of µ and apply (2.9) to get new iterates. The parameters τ, θ, and the step size α should be chosen in such a way that the algorithm is "optimized" in the sense that the number of iterations required by the algorithm is as small as possible. The choice of the barrier update parameter θ plays an important role in both theory and practice of IPMs. Usually, if θ is a constant independent of the dimension n of the problem, for instance, θ = 1/2, then we call the algorithm


Generic Primal-Dual Algorithm for LO

Input:
  a threshold parameter τ > 0;
  an accuracy parameter ε > 0;
  a fixed barrier update parameter θ, 0 < θ < 1;
begin
  x := e; s := e; µ := 1;
  while nµ ≥ ε do
  begin
    µ := (1 − θ)µ;
    while Ψ(v) > τ do
    begin
      x := x + α∆x;
      s := s + α∆s;
      y := y + α∆y;
      v := √(xs/µ);
    end
  end
end

Fig. 1. Generic algorithm.

a large-update (or long-step) method. If θ depends on the dimension of the problem, such as θ = 1/√n, then the algorithm is called a small-update (or short-step) method. The choice of the step size α (α > 0) is another crucial issue in the analysis of the algorithm. It has to be made such that the closeness of the iterates to the current µ-center improves by a sufficient amount. In the theoretical analysis the step size α is usually given a value that depends on the closeness of the current iterates to the µ-center.
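To make the generic scheme of Figure 1 concrete, here is a minimal sketch for the log-barrier kernel ψ_1(t) = (t² − 1)/2 − log t on an illustrative toy problem. For this kernel the right-hand side −µvΨ'(v) of (2.6) reduces elementwise to µe − xs, i.e., the classical primal-dual Newton direction. The problem data and the damped step-size rule are our assumptions, not prescribed by the paper.

```python
import numpy as np

# Illustrative data; x0 = s0 = e, y0 = 1 gives a strictly feasible start.
A = np.array([[1.0, 1.0]]); b = np.array([2.0]); c = np.array([2.0, 2.0])
m, n = A.shape

def Psi(v):                                   # barrier function (2.10) for psi_1
    return np.sum((v ** 2 - 1) / 2 - np.log(v))

def search_dir(x, y, s, mu):
    """System (2.6); for psi_1 the centering rhs -mu*v*Psi'(v) is mu*e - x*s."""
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:m, :n] = A
    K[m:m + n, n:n + m] = A.T
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(s)
    K[m + n:, n + m:] = np.diag(x)
    d = np.linalg.solve(K, np.concatenate([np.zeros(m), np.zeros(n), mu - x * s]))
    return d[:n], d[n:n + m], d[n + m:]

x, y, s = np.ones(n), np.ones(m), np.ones(n)
mu, theta, tau, eps = 1.0, 0.5, 1.0, 1e-6     # a large-update choice: theta = 1/2
while n * mu >= eps:
    mu *= 1 - theta                           # outer iteration: mu-update
    while Psi(np.sqrt(x * s / mu)) > tau:     # inner iterations
        dx, dy, ds = search_dir(x, y, s, mu)
        # heuristic damped step keeping (x, s) in the positive orthant
        alpha = min(1.0, 0.9 / max(1e-12, np.max(-dx / x), np.max(-ds / s)))
        x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds
```

On this example the inner loop needs only a single Newton step whenever Ψ exceeds the threshold τ, and the final duality gap x^T s is of order nµ < ε.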

Ψ(v) = Σ_{i=1}^n ψ(v_i),

where ψ(t) : D → R_+, with R_{++} ⊆ D, is strictly convex and minimal at t = 1, with ψ(1) = 0. We call the univariate function ψ(t) the kernel function of the barrier function Ψ(v). Obviously, the resulting iteration bound will depend on the kernel function underlying the algorithm, and our main task becomes to find a kernel function that minimizes the iteration bound. Except for the kernel function considered in [3], all kernel functions considered so far are twice differentiable and go to infinity if either t ↓ 0 or t → ∞. Thus they satisfy

(2.11a)    ψ'(1) = ψ(1) = 0,
(2.11b)    ψ''(t) > 0,
(2.11c)    lim_{t↓0} ψ(t) = lim_{t→∞} ψ(t) = ∞.


Clearly, (2.11a) and (2.11b) say that ψ(t) is a nonnegative strictly convex function such that ψ(1) = 0. Note that this implies that if ψ(t) is twice differentiable, then it is completely determined by its second derivative:

(2.12)    ψ(t) = ∫_1^t ∫_1^ξ ψ''(ζ) dζ dξ.

Moreover, (2.11c) expresses that ψ(t) is coercive and has the barrier property. Having such a kernel function ψ(t), its definition is extended to positive n-dimensional vectors v by (2.10), thus yielding the induced (scaled) barrier function Ψ(v). In what follows we assume in this paper that a kernel function satisfies (2.11a)–(2.11c). As we indicated in the previous section, Ψ(v) not only serves to define a search direction, but also acts as a measure of closeness of the current iterates to the µ-center. In the analysis of the algorithm we also use the norm-based proximity measure δ(v) defined by

(2.13)    δ(v) := (1/2)‖∇Ψ(v)‖ = (1/2)‖d_x + d_s‖.

Note that since Ψ(v) is strictly convex and minimal at v = e, we have

Ψ(v) = 0 ⇔ δ(v) = 0 ⇔ v = e.

Thus the algorithm considered in this paper uses the barrier function Ψ(v) to measure closeness of the iterates to the µ-center; as will become clear below, in the analysis of the algorithm δ(v) serves as a second proximity measure. Both measures are naturally determined by the kernel function.

2.5. Further conditions on the kernel function. In this paper we work with five more conditions on the kernel function, namely,

(2.14a)    tψ''(t) + ψ'(t) > 0,    t < 1,
(2.14b)    tψ''(t) − ψ'(t) > 0,    t > 1,
(2.14c)    ψ'''(t) < 0,    t > 0,
(2.14d)    2ψ''(t)² − ψ'(t)ψ'''(t) > 0,    t < 1,
(2.14e)    ψ''(t)ψ'(βt) − βψ'(t)ψ''(βt) > 0,    t > 1, β > 1.

Note that conditions (2.14c) and (2.14d) require that ψ(t) be three times differentiable. Furthermore, condition (2.14a) is obviously satisfied if t ≥ 1, since then ψ'(t) ≥ 0 and, similarly, condition (2.14b) is satisfied if t ≤ 1, since then ψ'(t) ≤ 0. Also (2.14d) is obviously satisfied if t ≥ 1, since then ψ'(t) ≥ 0, whereas ψ'''(t) < 0. We conclude that conditions (2.14a) and (2.14d) are conditions on the barrier behavior of ψ(t). On the other hand, condition (2.14b) deals only with t ≥ 1 and hence concerns the growth behavior of ψ(t). Condition (2.14e) is technically more involved; we will discuss it later. The next two lemmas make clear that conditions (2.14a) and (2.14b) admit a nice interpretation. The proof of the first lemma can be found in [12]; the proof of the second lemma is of the same spirit and is left to the reader.

Lemma 2.1 (Lemma 2.1.2 in [12]). Let ψ(t) be a twice differentiable function for t > 0. Then the following three properties are equivalent:
(i) ψ(√(t_1 t_2)) ≤ (1/2)(ψ(t_1) + ψ(t_2)) for t_1, t_2 > 0;
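Lemma 2.1 lends itself to a quick numerical sanity check. The script below (illustrative only, not part of the paper's proofs) tests property (i) and criterion (ii) on random points for the log-barrier kernel ψ_1(t) = (t² − 1)/2 − log t, for which tψ''(t) + ψ'(t) = 2t > 0.

```python
import math, random

def psi1(t):   return (t * t - 1) / 2 - math.log(t)
def dpsi1(t):  return t - 1 / t
def ddpsi1(t): return 1 + 1 / t ** 2

random.seed(0)
for _ in range(1000):
    t1, t2 = random.uniform(0.05, 10), random.uniform(0.05, 10)
    # (i) e-convexity inequality of Lemma 2.1
    assert psi1(math.sqrt(t1 * t2)) <= (psi1(t1) + psi1(t2)) / 2 + 1e-12
    # (ii) t*psi'' + psi' >= 0 for t > 0 (here it equals 2t)
    assert t1 * ddpsi1(t1) + dpsi1(t1) >= -1e-12
```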


(ii) ψ'(t) + tψ''(t) ≥ 0, t > 0;
(iii) ψ(e^ξ) is convex.

Lemma 2.2. Let ψ(t) be a twice differentiable function for t > 0. Then the following three properties are equivalent:
(i) ψ(√((t_1² + t_2²)/2)) ≤ (1/2)(ψ(t_1) + ψ(t_2));
(ii) tψ''(t) − ψ'(t) ≥ 0, t > 0;
(iii) ψ(√ξ) is convex.

Following [8], we call the property described in Lemma 2.1 exponential convexity, or e-convexity for short. This property has been proven to be very useful in the analysis of primal-dual algorithms based on kernel functions (cf. [10, 11, 12]). We recall from [12, Lemma 2.1.3] that if ψ(t) is e-convex, then ψ(t^α) is e-convex as well, for every α ∈ R. Also, for any β_0 ∈ R the function

β_0 log t + Σ_{i=1}^m β_i (t^{α_i} − 1)

is e-convex whenever β_i ≥ 0 and α_i ∈ R for i = 1, . . . , m [12, Proposition 2.1.4]. It may be noted that the kernel function ψ(t) is defined to be self-regular in [11, 12] if ψ(t) is e-convex and moreover ψ''(t) = Θ(Υ''_{p,q}(t)), where Υ_{p,q}(t) denotes the prototype self-regular kernel function (1.1). Since

Υ''_{p,q}(t) = t^{p−1} + t^{−q−1},    Υ'''_{p,q}(t) = (p − 1)t^{p−2} − (q + 1)t^{−q−2},

the kernel function Υ_{p,q}(t) (and hence each self-regular kernel function) satisfies (2.14c) only if p ≤ 1.

Remark 2.3. It is worth pointing out that the conditions (2.14a)–(2.14d) are logically independent. The following table shows four kernel functions, and the signs indicate whether a condition is satisfied (+) or not (−).

ψ(t) | (2.14a) | (2.14b) | (2.14c) | (2.14d)
t² − t − 1 + e^{1−t} | − | + | + | +
t − 1 − log t | + | − | + | +
t³ + t^{−3} − 2 | + | + | − | +
8t² − 11t + 1 + 2/√t − 4 log t | + | + | + | −

The last condition, however, is not independent of the other conditions. One has the following lemma.

Lemma 2.4. If ψ(t) satisfies (2.14b) and (2.14c), then ψ(t) satisfies (2.14e).

Proof. For t > 1 we consider the function

f(β) := ψ''(t)ψ'(βt) − βψ'(t)ψ''(βt),    β ≥ 1.

Since f(1) = 0, the lemma will follow if f'(β) > 0 for β > 1. This can be shown as follows:

f'(β) = tψ''(t)ψ''(βt) − ψ'(t)ψ''(βt) − βtψ'(t)ψ'''(βt)
      = ψ''(βt)[tψ''(t) − ψ'(t)] − βtψ'(t)ψ'''(βt) > 0.


The last inequality follows since ψ''(βt) ≥ 0 and tψ''(t) − ψ'(t) ≥ 0, by (2.14b), and −βtψ'(t)ψ'''(βt) > 0, since t > 1, which implies ψ'(t) > 0, and ψ'''(βt) < 0, by (2.14c). This proves the lemma.

The following two lemmas deal with consequences of condition (2.14c).

Lemma 2.5. If the kernel function ψ(t) satisfies (2.14c), then

ψ(t) > (1/2)(t − 1)ψ'(t)  and  ψ'(t) > (t − 1)ψ''(t)  if t > 1,
ψ(t) < (1/2)(t − 1)ψ'(t)  and  ψ'(t) > (t − 1)ψ''(t)  if t < 1.

Proof. Consider the function f(t) = 2ψ(t) − (t − 1)ψ'(t). One has f(1) = 0 and f'(t) = ψ'(t) − (t − 1)ψ''(t). Hence f'(1) = 0 and f''(t) = −(t − 1)ψ'''(t). Using that ψ'''(t) < 0, it follows that if t > 1, then f''(t) > 0, whence f'(t) > 0 and f(t) > 0, and if t < 1, then f''(t) < 0, whence f'(t) > 0 and f(t) < 0. From this the lemma follows.

Lemma 2.6. If the kernel function ψ(t) satisfies (2.14c), then

(1/2)ψ''(t)(t − 1)² < ψ(t) < (1/2)ψ''(1)(t − 1)²  if t > 1,
(1/2)ψ''(1)(t − 1)² < ψ(t) < (1/2)ψ''(t)(t − 1)²  if t < 1.

Proof. By using Taylor's theorem and ψ(1) = ψ'(1) = 0, we obtain

ψ(t) = (1/2)ψ''(1)(t − 1)² + (1/3!)ψ'''(ξ)(t − 1)³,

where 1 < ξ < t if t > 1 and t < ξ < 1 if t < 1. Since ψ'''(ξ) < 0, the second inequality for t > 1 and the first inequality for t < 1 in the lemma follow. The remaining two inequalities are an immediate consequence of Lemma 2.5.

The theory developed in this paper applies to kernel functions that satisfy conditions (2.14a), (2.14c), (2.14d), and (2.14e). In this paper we call any such function an eligible kernel function. Due to Lemma 2.4, any kernel function satisfying the first four conditions, (2.14a)–(2.14d), is eligible.

2.6. Seven kernel functions. By way of example we consider in this paper the seven kernel functions in the second column of Table 2.1. In this section we show that these kernel functions are eligible. For the first six functions this is almost immediate: they satisfy conditions (2.14a)–(2.14d), as we now show. The 5th and 6th columns in Table 2.1 make clear that each of these kernel functions satisfies conditions (2.14a) and (2.14b). From Table 2.2 it follows that they also satisfy conditions (2.14c) and (2.14d). It is obvious from the 3rd column there that (2.14c) is satisfied. That (2.14d) also is satisfied is not always immediately clear from the given expressions in the 4th column in Table 2.2; we leave it as an exercise to the reader to verify that these expressions indeed are nonnegative for t ≤ 1. The seventh kernel function requires special attention. It is clear from the tables that it satisfies (2.14a), (2.14c), and (2.14d), but it fails to satisfy (2.14b). To prove that this function is eligible it suffices to show that it satisfies (2.14e). This, in fact, is easy. Some straightforward computations yield that for ψ = ψ_7 one has

ψ''(t)ψ'(βt) − βψ'(t)ψ''(βt) = (q/t^{q+1})(1 − β^{−q}),
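The identity for ψ_7(t) = t − 1 + (t^{1−q} − 1)/(q − 1) can be confirmed numerically; the verification script below is illustrative and uses only the closed-form derivatives ψ_7'(t) = 1 − t^{−q} and ψ_7''(t) = qt^{−q−1}.

```python
import math, random

def dpsi7(t, q):  return 1 - t ** (-q)          # psi_7'(t)
def ddpsi7(t, q): return q * t ** (-q - 1)      # psi_7''(t)

random.seed(1)
for _ in range(500):
    q = random.uniform(1.1, 6.0)
    t = random.uniform(0.1, 5.0)
    beta = random.uniform(1.0, 4.0)
    lhs = ddpsi7(t, q) * dpsi7(beta * t, q) - beta * dpsi7(t, q) * ddpsi7(beta * t, q)
    rhs = q * (1 - beta ** (-q)) / t ** (q + 1)
    assert math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9)
```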


Table 2.1
The seven kernel functions and the conditions (2.14a) and (2.14b) (q > 1).

i | ψ_i(t) | ψ_i'(t) | ψ_i''(t) | tψ_i''(t) + ψ_i'(t) | tψ_i''(t) − ψ_i'(t)
1 | (t² − 1)/2 − log t | t − 1/t | 1 + 1/t² | 2t | 2/t
2 | (1/2)(t − 1/t)² | t − t^{−3} | 1 + 3t^{−4} | 2t + 2t^{−3} | 4t^{−3}
3 | (t² − 1)/2 + (t^{1−q} − 1)/(q − 1) | t − t^{−q} | 1 + qt^{−q−1} | 2t + (q − 1)t^{−q} | (q + 1)t^{−q}
4 | (t² − 1)/2 + (t^{1−q} − 1)/(q(q − 1)) − ((q − 1)/q)(t − 1) | t − (1/q)t^{−q} − (q − 1)/q | 1 + t^{−q−1} | 2t + ((q − 1)/q)(t^{−q} − 1) | ((q + 1)/q)t^{−q} + (q − 1)/q
5 | (t² − 1)/2 + e^{1/t−1} − 1 | t − t^{−2}e^{1/t−1} | 1 + ((1 + 2t)/t⁴)e^{1/t−1} | 2t + ((1 + t)/t³)e^{1/t−1} | ((1 + 3t)/t³)e^{1/t−1}
6 | (t² − 1)/2 − ∫_1^t e^{1/ξ−1} dξ | t − e^{1/t−1} | 1 + (1/t²)e^{1/t−1} | 2t + ((1 − t)/t)e^{1/t−1} | ((1 + t)/t)e^{1/t−1}
7 | t − 1 + (t^{1−q} − 1)/(q − 1) | 1 − t^{−q} | qt^{−q−1} | 1 + (q − 1)t^{−q} | (q + 1)t^{−q} − 1
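A few entries of Table 2.1 can be spot-checked against finite differences; the sketch below does this for ψ_3 and ψ_5 with the illustrative choice q = 2 (approximate derivatives, so it is a sanity check only).

```python
import math

q = 2.0
psi3  = lambda t: (t * t - 1) / 2 + (t ** (1 - q) - 1) / (q - 1)
dpsi3 = lambda t: t - t ** (-q)
tdd_plus  = lambda t: 2 * t + (q - 1) * t ** (-q)   # t*psi3'' + psi3'
tdd_minus = lambda t: (q + 1) * t ** (-q)           # t*psi3'' - psi3'

psi5   = lambda t: (t * t - 1) / 2 + math.exp(1 / t - 1) - 1
dpsi5  = lambda t: t - t ** (-2) * math.exp(1 / t - 1)
ddpsi5 = lambda t: 1 + (1 + 2 * t) / t ** 4 * math.exp(1 / t - 1)

h = 1e-6
for t in [0.5, 1.0, 2.0, 4.0]:
    fd3 = (psi3(t + h) - psi3(t - h)) / (2 * h)       # psi3' by central diff
    assert abs(fd3 - dpsi3(t)) < 1e-6
    dd3 = (dpsi3(t + h) - dpsi3(t - h)) / (2 * h)     # psi3'' by differencing psi3'
    assert abs(t * dd3 + dpsi3(t) - tdd_plus(t)) < 1e-5
    assert abs(t * dd3 - dpsi3(t) - tdd_minus(t)) < 1e-5
    fd5 = (psi5(t + h) - psi5(t - h)) / (2 * h)
    assert abs(fd5 - dpsi5(t)) < 1e-6
    dd5 = (dpsi5(t + h) - dpsi5(t - h)) / (2 * h)
    assert abs(dd5 - ddpsi5(t)) < 1e-5
```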

Table 2.2
The seven kernel functions and the conditions (2.14c) and (2.14d).

i | ψ_i(t) | ψ_i'''(t) | 2ψ_i''(t)² − ψ_i'(t)ψ_i'''(t)
1 | (t² − 1)/2 − log t | −2/t³ | 2 + 6/t²
2 | (1/2)(t − 1/t)² | −12t^{−5} | 2 + 24/t⁴ + 6/t⁸
3 | (t² − 1)/2 + (t^{1−q} − 1)/(q − 1) | −q(q + 1)t^{−q−2} | 2(1 + qt^{−q−1})² + q(q + 1)(t^{q+1} − 1)/t^{2(q+1)}
4 | (t² − 1)/2 + (t^{1−q} − 1)/(q(q − 1)) − ((q − 1)/q)(t − 1) | −(q + 1)t^{−q−2} | 2(1 + t^{−q−1})² + ((q + 1)/t^{q+2})(t − 1 + (1 − t^{−q})/q)
5 | (t² − 1)/2 + e^{1/t−1} − 1 | −((1 + 6t + 6t²)/t⁶)e^{1/t−1} | 2(t⁴ + (1 + 2t)e^{1/t−1})²/t⁸ + ((1 + 6t + 6t²)/t⁸)(t³ − e^{1/t−1})e^{1/t−1}
6 | (t² − 1)/2 − ∫_1^t e^{1/ξ−1} dξ | −((1 + 2t)/t⁴)e^{1/t−1} | 2(1 + e^{1/t−1}/t²)² + ((1 + 2t)/t⁴)(t − e^{1/t−1})e^{1/t−1}
7 | t − 1 + (t^{1−q} − 1)/(q − 1) | −q(q + 1)t^{−q−2} | q(q − 1 + (q + 1)t^q)t^{−2(q+1)}

which is clearly positive if t > 1 and β > 1. Thus we have shown that each of the seven kernel functions is eligible. We now briefly discuss these functions. The first function, ψ_1(t), is the kernel function of the classical primal-dual logarithmic barrier function. This can be seen as follows. The barrier function induced by ψ(t) = ψ_1(t) is

Ψ(v) = Σ_{i=1}^n ψ(v_i) = Σ_{i=1}^n ((v_i² − 1)/2 − log v_i) = (1/2) Σ_{i=1}^n (v_i² − 1 − log v_i²).

Because of (2.4), by defining Φ(x, s, µ) := Ψ(v) the barrier function can be expressed in x and s:

Φ(x, s, µ) = (1/2) Σ_{i=1}^n (x_i s_i/µ − 1 − log(x_i s_i/µ)) = (n log µ − n)/2 + (1/(2µ)) [x^T s − µ Σ_{i=1}^n log x_i s_i].

The expression between the second pair of brackets is the well-known primal-dual


logarithmic barrier function for (P) and (D). It differs from Ψ(v) only by some additive and multiplicative "constants," depending on n and µ but not on x and s. Note that for any kernel function, the above definition of Φ(x, s, µ) gives rise to a function that goes to infinity if for some i, x_i s_i goes either to infinity or to 0, and hence Φ(x, s, µ) is a barrier function for the feasible domain. It may be worth emphasizing that the search direction in this paper, as defined by (2.6), is obtained neither by solving a barrier problem, nor as a primal-dual Newton direction induced by Φ(x, s, µ). The second kernel function has been studied in [9]; one may easily verify that it is a special case of ψ_3(t), by taking q = 3. The third one has been studied for general q > 1 in [8, 12]. It is in turn a special case of a so-called self-regular kernel function, as studied in [11, 12]. Also note that ψ_1(t) is the limiting value of ψ_3(t) when q approaches 1. The fourth kernel function is the special case of Υ_{p,q}(t) for p = 1. The fifth and sixth kernel functions are new, at least to the knowledge of the authors. In each of the first six cases we can write ψ(t) as

(2.15)    ψ(t) = (t² − 1)/2 + ψ_b(t),

where (t² − 1)/2 is the so-called growth term and ψ_b(t) is the barrier term of the kernel function. The growth term dominates the behavior of ψ(t) when t goes to infinity, whereas the barrier term dominates its behavior when t approaches zero. Note that in all cases the barrier term is monotonically decreasing in t. The last kernel function (ψ_7(t), with q > 1) differs from all the other kernel functions in that its growth term, i.e., t − 1, is linear in t. It was first introduced and analyzed in [4].

3. Growth behavior. Note that at the start of each outer iteration of the algorithm, just before the update of µ with the factor 1 − θ, we have Ψ(v) ≤ τ. Due to the update of µ the vector v is divided by the factor √(1 − θ), with 0 < θ < 1, which in general leads to an increase in the value of Ψ(v). Then, during the subsequent inner iterations, Ψ(v) decreases until it passes the threshold τ again. Hence, during the course of the algorithm the largest values of Ψ(v) occur just after the updates of µ. That is why in this section we derive an estimate for the effect of a µ-update on the value of Ψ(v). In other words, with β = 1/√(1 − θ), we want to find an upper bound for Ψ(βv) in terms of Ψ(v). We start with the following lemma.

Lemma 3.1. Suppose that ψ(t_1) = ψ(t_2), with t_1 ≤ 1 ≤ t_2 and β ≥ 1. Then ψ(βt_1) ≤ ψ(βt_2). Equality holds if and only if β = 1 or t_1 = t_2 = 1.

Proof. Consider f(β) = ψ(βt_2) − ψ(βt_1). One has f(1) = 0 and

f'(β) = t_2 ψ'(βt_2) − t_1 ψ'(βt_1).

Since ψ''(t) ≥ 0 for all t > 0, ψ'(t) is monotonically increasing. Hence ψ'(βt_2) ≥ ψ'(βt_1). Substitution gives

f'(β) = t_2 ψ'(βt_2) − t_1 ψ'(βt_1) ≥ t_2 ψ'(βt_2) − t_1 ψ'(βt_2) = ψ'(βt_2)(t_2 − t_1) ≥ 0.


The last inequality holds since t_2 ≥ t_1 and ψ'(t) ≥ 0 for t ≥ 1. This proves that f(β) ≥ 0 for β ≥ 1, and hence the inequality in the lemma follows. If β = 1, then we obviously have equality. Otherwise, if β > 1 and f(β) = 0, then the mean value theorem implies f'(ξ) = 0 for some ξ ∈ (1, β). But this implies ψ'(ξt_2) = ψ'(ξt_1). Since ψ'(t) is strictly monotonic, this implies ξt_2 = ξt_1, whence t_2 = t_1. Since also t_1 ≤ 1 ≤ t_2, we obtain t_2 = t_1 = 1. This completes the proof.

We have the following result.

Theorem 3.2. Let ϱ : [0, ∞) → [1, ∞) be the inverse function of ψ(t) for t ≥ 1. Then we have for any positive vector v and any β ≥ 1 that

Ψ(βv) ≤ nψ(βϱ(Ψ(v)/n)).

Proof. First we consider the case where β > 1. We consider the following maximization problem:

max_v {Ψ(βv) : Ψ(v) = z},

where z is any nonnegative number. The first order optimality conditions for this problem are

(3.1)    βψ'(βv_i) = λψ'(v_i),    i = 1, . . . , n,

where λ denotes the Lagrange multiplier. Since ψ'(1) = 0 and βψ'(β) > 0, we must have v_i ≠ 1 for all i. We even may assume that v_i > 1 for all i. To see this, let z_i be such that ψ(v_i) = z_i. Given z_i, this equation has two solutions: v_i = v_i^{(1)} < 1 and v_i = v_i^{(2)} > 1. As a consequence of Lemma 3.1 we have ψ(βv_i^{(1)}) ≤ ψ(βv_i^{(2)}). Since we are maximizing Ψ(βv), it follows that we may assume v_i = v_i^{(2)} > 1. Thus we have shown that without loss of generality we may assume that v_i > 1 for all i. Note that then (3.1) implies βψ'(βv_i) > 0 and ψ'(v_i) > 0, whence also λ > 0. Now defining g(t) according to

g(t) := ψ'(t)/ψ'(βt),    t ≥ 1,

we deduce from (3.1) that g(v_i) = β/λ for all i. One has

g'(t) = (ψ''(t)ψ'(βt) − βψ'(t)ψ''(βt)) / (ψ'(βt))².

At this stage we use that ψ(t) satisfies condition (2.14e). Due to this we have g'(t) > 0 for t > 1 and β > 1. So g(t) is strictly monotonically increasing. Hence it follows that all v_i's are mutually equal. Putting v_i = t > 1, for all i, we deduce from Ψ(v) = z that nψ(t) = z. This implies t = ϱ(z/n). Hence the maximal value that Ψ(βv) can attain is given by

Ψ(βte) = nψ(βt) = nψ(βϱ(z/n)) = nψ(βϱ(Ψ(v)/n)).

This proves the theorem if β > 1. For the case β = 1 it suffices to observe that both sides of the inequality in the theorem are continuous in β.

COMPARATIVE STUDY OF KERNEL FUNCTIONS

113

Corollary 3.3. Using the notation of Theorem 3.2, we have

    Ψ(βv) ≤ ( nψ″(1)/2 ) ( βϱ( Ψ(v)/n ) − 1 )².

Proof. Since β ≥ 1 and ϱ(Ψ(v)/n) ≥ 1, the corollary follows from Theorem 3.2 by also using Lemma 2.6.
As a result we have that if Ψ(v) ≤ τ and β = 1/√(1−θ), then

(3.2)    Lψ(n, θ, τ) := nψ( ϱ(τ/n) / √(1−θ) )

is an upper bound for Ψ( v/√(1−θ) ), the value of Ψ(v) after the µ-update.
Remark 3.4. Note that the bound of Theorem 3.2 is sharp: one may easily verify that if v = βe, with β ≥ 1, then the bound holds with equality.

4. Decrease of the barrier function during an inner iteration.
4.1. Computation of the step size and the decrease. After a damped step we have

    x₊ = x + α∆x,    y₊ = y + α∆y,    s₊ = s + α∆s.

Hence, with

    v = √(xs/µ),    dx = v∆x/x,    ds = v∆s/s,

we have

    x₊ = x( e + α∆x/x ) = x( e + αdx/v ) = (x/v)(v + αdx)

and

    s₊ = s( e + α∆s/s ) = s( e + αds/v ) = (s/v)(v + αds).

Thus we obtain

(4.1)    v₊² = x₊s₊/µ = (v + αdx)(v + αds).

Hence, since ψ is e-convex,

    Ψ(v₊) = Ψ( √((v + αdx)(v + αds)) ) ≤ ½( Ψ(v + αdx) + Ψ(v + αds) ).

Defining

    f(α) := Ψ(v₊) − Ψ(v),

we thus have f(α) ≤ f₁(α), where

    f₁(α) := ½( Ψ(v + αdx) + Ψ(v + αds) ) − Ψ(v).

114

Y. Q. BAI, M. EL GHAMI, AND C. ROOS

Obviously, f(0) = f₁(0) = 0. Taking the derivative with respect to α, we get

    f₁′(α) = ½ Σᵢ₌₁ⁿ ( ψ′(vi + αdxi) dxi + ψ′(vi + αdsi) dsi ).

This gives, using the third equation in (2.8) and (2.13),

(4.2)    f₁′(0) = ½ ∇Ψ(v)ᵀ(dx + ds) = −½ ∇Ψ(v)ᵀ∇Ψ(v) = −2δ(v)².

Differentiating once more, we obtain

(4.3)    f₁″(α) = ½ Σᵢ₌₁ⁿ ( ψ″(vi + αdxi) dxi² + ψ″(vi + αdsi) dsi² ).

Below we use the following notation:

    v₁ := min(v),    δ := δ(v),

with δ(v) as defined in (2.13).
Lemma 4.1. One has f₁″(α) ≤ 2δ²ψ″(v₁ − 2αδ).
Proof. Since dx and ds are orthogonal, (2.13) implies that ‖(dx, ds)‖ = 2δ. Hence we have ‖dx‖ ≤ 2δ and ‖ds‖ ≤ 2δ. Therefore,

    vi + αdxi ≥ v₁ − 2αδ,    vi + αdsi ≥ v₁ − 2αδ,    1 ≤ i ≤ n.

Due to (2.14c), ψ″(t) is monotonically decreasing. Therefore, from (4.3) we obtain

    f₁″(α) ≤ ½ ψ″(v₁ − 2αδ) Σᵢ₌₁ⁿ ( dxi² + dsi² ) = 2δ²ψ″(v₁ − 2αδ).

This proves the lemma.
Lemma 4.2. f₁′(α) ≤ 0 certainly holds if α satisfies the inequality

(4.4)    −ψ′(v₁ − 2αδ) + ψ′(v₁) ≤ 2δ.

Proof. We may write, using Lemma 4.1, and also (4.2),

    f₁′(α) = f₁′(0) + ∫₀^α f₁″(ξ) dξ
           ≤ −2δ² + 2δ² ∫₀^α ψ″(v₁ − 2ξδ) dξ
           = −2δ² − δ ∫₀^α ψ″(v₁ − 2ξδ) d(v₁ − 2ξδ)
           = −2δ² − δ( ψ′(v₁ − 2αδ) − ψ′(v₁) ).

Hence, f₁′(α) ≤ 0 will certainly hold if α satisfies

    −ψ′(v₁ − 2αδ) + ψ′(v₁) ≤ 2δ,


which proves the lemma.
Lemma 4.3. Let ρ : [0, ∞) → (0, 1] denote the inverse function of the restriction of −½ψ′(t) to the interval (0, 1]. Then the largest step size α that satisfies (4.4) is given by

(4.5)    ᾱ := ( ρ(δ) − ρ(2δ) ) / (2δ).

Proof. We want α such that (4.4) holds, with α as large as possible. Since ψ″(t) is decreasing, the derivative with respect to v₁ of the expression at the left in (4.4) (i.e., −ψ′(v₁ − 2αδ) + ψ′(v₁)) is negative. Hence, fixing δ, the smaller v₁ is, the smaller α will be. One has

    δ = ½‖∇Ψ(v)‖ ≥ ½|ψ′(v₁)| ≥ −½ψ′(v₁).

Equality holds if and only if v₁ is the only coordinate in v that differs from 1, and v₁ ≤ 1 (in which case ψ′(v₁) ≤ 0). Hence, the worst situation for the step size occurs when v₁ satisfies

(4.6)    −½ψ′(v₁) = δ.

The derivative with respect to α of the left expression in (4.4) equals 2δψ″(v₁ − 2αδ) ≥ 0, and hence the left-hand side is increasing in α. So the largest possible value of α satisfying (4.4) satisfies

(4.7)    −½ψ′(v₁ − 2αδ) = 2δ.

Due to the definition of ρ, (4.6) and (4.7) can be written as

    v₁ = ρ(δ),    v₁ − 2αδ = ρ(2δ),

respectively. This implies

    α = ( v₁ − ρ(2δ) ) / (2δ) = ( ρ(δ) − ρ(2δ) ) / (2δ),

proving the lemma.
Lemma 4.4. Let ρ and ᾱ be as defined in Lemma 4.3. Then

(4.8)    ᾱ ≥ 1 / ψ″(ρ(2δ)).

Proof. By the definition of ρ, −ψ′(ρ(δ)) = 2δ. Taking the derivative with respect to δ, we find −ψ″(ρ(δ)) ρ′(δ) = 2, which gives

(4.9)    ρ′(δ) = −2/ψ″(ρ(δ)) < 0.


Hence ρ is monotonically decreasing. An immediate consequence of (4.5) is

(4.10)    ᾱ = −(1/(2δ)) ∫_δ^{2δ} ρ′(σ) dσ = (1/δ) ∫_δ^{2δ} dσ/ψ″(ρ(σ)),

where we also used (4.9). To obtain a lower bound for ᾱ, we want to replace the argument of the last integral by its minimal value. So we want to know when ψ″(ρ(σ)) is maximal, for σ ∈ [δ, 2δ]. Due to (2.14c), ψ″ is monotonically decreasing. So ψ″(ρ(σ)) is maximal for σ ∈ [δ, 2δ] when ρ(σ) is minimal. Since ρ is monotonically decreasing, this occurs when σ = 2δ. Therefore

    ᾱ = (1/δ) ∫_δ^{2δ} dσ/ψ″(ρ(σ)) ≥ (1/δ) · δ/ψ″(ρ(2δ)) = 1/ψ″(ρ(2δ)),

which proves the lemma.
In what follows we use the notation

(4.11)    α̃ = 1/ψ″(ρ(2δ)),

and we will use α̃ as the default step size. By Lemma 4.4 we have ᾱ ≥ α̃.
Lemma 4.5. If the step size α is such that α ≤ ᾱ, then

(4.12)    f(α) ≤ −αδ².

Proof. Let the univariate function h be such that

    h(0) = f₁(0) = 0,    h′(0) = f₁′(0) = −2δ²,    h″(α) = 2δ²ψ″(v₁ − 2αδ).

Due to Lemma 4.1, f₁″(α) ≤ h″(α). As a consequence, f₁′(α) ≤ h′(α) and f₁(α) ≤ h(α). Taking α ≤ ᾱ, with ᾱ as defined in Lemma 4.3, we have

    h′(α) = −2δ² + 2δ² ∫₀^α ψ″(v₁ − 2ξδ) dξ = −2δ² − δ( ψ′(v₁ − 2αδ) − ψ′(v₁) ) ≤ 0.

Since h″(α) is increasing in α, using Lemma A.3, we may write

    f₁(α) ≤ h(α) ≤ ½ α h′(0) = −αδ².

Since f(α) ≤ f₁(α), the proof is complete.
Combining the results of Lemmas 4.4 and 4.5 we obtain the following theorem.
Theorem 4.6. With α̃ being the default step size, as given by (4.11), one has

(4.13)    f(α̃) ≤ −δ² / ψ″(ρ(2δ)).
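For the classical kernel ψ(t) = (t² − 1)/2 − log t one has −½ψ′(t) = (1/t − t)/2, so ρ has the closed form ρ(s) = 1/(s + √(1 + s²)), and the relation ᾱ ≥ α̃ between (4.5) and (4.11) established in Lemma 4.4 can be checked directly. A small illustrative sketch (not part of the analysis):

```python
import math

def rho(s):
    # closed-form inverse of -psi'(t)/2 on (0, 1] for psi(t) = (t^2 - 1)/2 - log t
    return 1.0 / (s + math.sqrt(1 + s * s))

def ddpsi(t):
    # psi''(t) = 1 + 1/t^2
    return 1 + 1 / (t * t)

for k in range(1, 201):
    delta = 0.05 * k
    a_bar = (rho(delta) - rho(2 * delta)) / (2 * delta)   # largest feasible step, (4.5)
    a_tilde = 1 / ddpsi(rho(2 * delta))                   # default step size, (4.11)
    assert 0 < a_tilde <= a_bar + 1e-12                   # Lemma 4.4
```

For this kernel the two step sizes stay within a small constant factor of each other over the tested range, which is why the default step size loses only a bounded factor in the decrease (4.13).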

Lemma 4.7. The right-hand side expression in (4.13) is monotonically decreasing in δ.
Proof. Put t = ρ(2δ), which implies t ≤ 1 and is equivalent to 4δ = −ψ′(t); thus t is monotonically decreasing if δ increases. Hence the right-hand side expression in (4.13) is monotonically decreasing in δ if and only if the function

    g(t) := (ψ′(t))² / ( 16ψ″(t) )

is monotonically decreasing for t ≤ 1. Note that g(1) = 0 and

    g′(t) = ( 2ψ′(t)ψ″(t)² − ψ′(t)²ψ‴(t) ) / ( 16ψ″(t)² ).

Hence, since ψ′(t) < 0 for t < 1, g(t) is monotonically decreasing for t ≤ 1 if and only if

    2ψ″(t)² − ψ′(t)ψ‴(t) ≥ 0,    t ≤ 1.

The last inequality is satisfied, due to condition (2.14d). Hence the lemma is proved.
We want to express the decrease as a function of Ψ(v). To this end we need a lower bound on δ(v) in terms of Ψ(v). Such a bound is provided in the following section.
4.2. Bound on δ(v) in terms of Ψ(v). We need the following lemma.
Lemma 4.8. Suppose that ψ(t₁) = ψ(t₂), with t₁ ≤ 1 ≤ t₂. Then ψ′(t₁) ≤ 0 and ψ′(t₂) ≥ 0, whereas −ψ′(t₁) ≥ ψ′(t₂).
Proof. The lemma is obvious if t₁ = 1 or t₂ = 1, because then ψ(t₁) = ψ(t₂) = 0 implies t₁ = t₂ = 1. So we may assume that t₁ < 1 < t₂. Since ψ(t₁) = ψ(t₂), Lemma 2.6 implies that

    ½(t₁ − 1)²ψ″(1) < ψ(t₁) = ψ(t₂) < ½(t₂ − 1)²ψ″(1).

Hence, since ψ″(1) > 0, it follows that t₂ − 1 > 1 − t₁. Using this and Lemma 2.5, while assuming −ψ′(t₁) < ψ′(t₂), we may write

    ψ(t₂) > ½(t₂ − 1)ψ′(t₂) > ½(1 − t₁)ψ′(t₂) > −½(1 − t₁)ψ′(t₁) = ½(t₁ − 1)ψ′(t₁) > ψ(t₁).

This contradiction proves the lemma.
Theorem 4.9. One has

    δ(v) ≥ ½ ψ′( ϱ(Ψ(v)) ).

Proof. The statement in the theorem is obvious if v = e, since then δ(v) = Ψ(v) = 0. Otherwise we have δ(v) > 0 and Ψ(v) > 0. To deal with the nontrivial case we consider, for ω > 0, the problem

    z_ω = min_v { δ(v)² = ¼ Σᵢ₌₁ⁿ ψ′(vi)² : Ψ(v) = ω }.

The first-order optimality condition is

    ½ ψ′(vi)ψ″(vi) = λψ′(vi),    i = 1, . . . , n,

where λ ∈ R. From this we conclude that we have either ψ′(vi) = 0 or ψ″(vi) = 2λ for each i. Since ψ″(t) is monotonically decreasing, this implies that all vi's for which ψ″(vi) = 2λ have the same value. Denoting this value as t, and observing that all other coordinates have value 1 (since ψ′(vi) = 0 for these coordinates), we conclude that, after reordering the coordinates, v has the form

    v = (t, . . . , t, 1, . . . , 1),

with t occurring k times and 1 occurring n − k times. Now Ψ(v) = ω implies kψ(t) = ω. Given k, this uniquely determines ψ(t), whence we have

    4δ(v)² = k(ψ′(t))²,    ψ(t) = ω/k.

Note that the equation ψ(t) = ω/k has two solutions, one smaller than 1 and one larger than 1. By Lemma 4.8, the larger value gives the smallest value of (ψ′(t))². Since we are minimizing δ(v)², we conclude that t > 1 (since ω > 0). Hence we may write

    t = ϱ(ω/k),

where, as before, ϱ denotes the inverse function of ψ(t) for t ≥ 1. Thus we obtain that

(4.14)    4δ(v)² = k(ψ′(t))²,    t = ϱ(ω/k).

The question now is which value of k minimizes δ(v)². To investigate this, we take the derivative with respect to k of (4.14), extended to k ∈ R. This gives

(4.15)    d(4δ(v)²)/dk = (ψ′(t))² + 2kψ′(t)ψ″(t) dt/dk.

From ψ(t) = ω/k we derive that

    ψ′(t) dt/dk = −ω/k² = −ψ(t)/k,

which gives

    dt/dk = −ψ(t)/( kψ′(t) ).

Substitution into (4.15) gives

    d(4δ(v)²)/dk = (ψ′(t))² − 2kψ′(t)ψ″(t) ψ(t)/( kψ′(t) ) = (ψ′(t))² − 2ψ(t)ψ″(t).

Defining f(t) := (ψ′(t))² − 2ψ(t)ψ″(t), we have f(1) = 0 and

    f′(t) = 2ψ′(t)ψ″(t) − 2ψ′(t)ψ″(t) − 2ψ(t)ψ‴(t) = −2ψ(t)ψ‴(t) > 0.

We conclude that f(t) > 0 for t > 1. Hence d(δ(v)²)/dk > 0, so δ(v)² increases when k increases. Since we are minimizing δ(v)², at optimality we have k = 1. Also using that ψ′(t) ≥ 0 for t ≥ 1, we obtain from (4.14) that

    min_v { δ(v) : Ψ(v) = ω } = ½ ψ′(t) = ½ ψ′( ϱ(ω) ) = ½ ψ′( ϱ(Ψ(v)) ).


[Figure 2 plots ψ(t), −½ψ′(t), and ψ′(t), and marks the points t = ϱ(p), s = ψ′(ϱ(p)), and ρ(ψ′(ϱ(p))), where p = Ψ(v).]
Fig. 2. Graphical illustration of the entities arising in (4.16).

This completes the proof of the theorem.
Remark 4.10. The bound of Theorem 4.9 is sharp. One may easily verify that if all coordinates of v are equal to 1 except one coordinate, which is greater than or equal to 1, then the bound holds with equality.
Remark 4.11. It is worth noting that the proof of Theorem 4.9 implies that our kernel functions satisfy the inequality (cf. [12, page 37])

    ψ′(t)² − 2ψ(t)ψ″(t) ≥ 0,    t ≥ 1.
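Theorem 4.9 can also be spot-checked numerically. The following sketch (for illustration only) draws random positive vectors, computes δ(v) = ½‖∇Ψ(v)‖ for the classical kernel ψ(t) = (t² − 1)/2 − log t, and compares it with the lower bound ½ψ′(ϱ(Ψ(v))):

```python
import math
import random

def psi(t):   # psi_1(t) = (t^2 - 1)/2 - log t
    return (t * t - 1) / 2 - math.log(t)

def dpsi(t):  # psi_1'(t) = t - 1/t
    return t - 1 / t

def varrho(s):
    # inverse of psi on [1, oo), by bisection
    lo, hi = 1.0, 2.0
    while psi(hi) < s:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if psi(mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

random.seed(1)
for _ in range(100):
    v = [random.uniform(0.4, 2.5) for _ in range(6)]
    delta = 0.5 * math.sqrt(sum(dpsi(t) ** 2 for t in v))   # delta(v) = ||grad Psi(v)|| / 2
    assert delta >= 0.5 * dpsi(varrho(sum(psi(t) for t in v))) - 1e-9
```

Taking v with a single coordinate t ≥ 1 and all other coordinates equal to 1 (Remark 4.10) attains the bound with equality.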

Combining the results of Theorems 4.6 and 4.9 we obtain

(4.16)    f(α̃) ≤ − ( ψ′(ϱ(Ψ(v))) )² / ( 4 ψ″( ρ( ψ′(ϱ(Ψ(v))) ) ) ).

This expresses the decrease in Ψ(v) during an inner iteration completely in terms of ψ, its first and second derivatives, and the inverse functions ρ and ϱ. The entities arising in the right-hand side of (4.16) are depicted in Figure 2.
5. Iteration bounds. After the update of µ to (1−θ)µ we have, by Theorem 3.2 and (3.2),

(5.1)    Ψ(v₊) ≤ Lψ(n, θ, τ) = nψ( ϱ(τ/n) / √(1−θ) ).

We need to count how many inner iterations are required to return to the situation where Ψ(v) ≤ τ. We denote the value of Ψ(v) after the µ-update as Ψ₀, and the subsequent values are denoted as Ψ_k, k = 1, 2, . . . . The decrease in each inner iteration is given by (4.16). In what follows we assume that the right-hand side expression of (4.16) satisfies

(5.2)    ( ψ′(ϱ(Ψ(v))) )² / ( 4 ψ″( ρ( ψ′(ϱ(Ψ(v))) ) ) ) ≥ κ Ψ(v)^{1−γ}

for some positive constants κ and γ, with γ ∈ (0, 1]. It may be worth noting at this point that property (5.2) has to be checked for each case individually. When dealing with the seven specific examples later on, we will show how appropriate values of κ and γ are obtained in each case.
Lemma 5.1. If K denotes the number of inner iterations, we have

    K ≤ Ψ₀^γ / (κγ).

Proof. The definition of K implies Ψ_{K−1} > τ and Ψ_K ≤ τ, and

    Ψ_{k+1} ≤ Ψ_k − κΨ_k^{1−γ},    k = 0, 1, . . . , K − 1.

We now apply Lemma A.2, with t_k = Ψ_k. This yields the desired inequality.
The last lemma provides an estimate for the number of inner iterations in terms of Ψ₀ and the constants κ and γ. Recall that Ψ₀ is bounded above according to (5.1). An upper bound for the total number of iterations is obtained by multiplying (the upper bound for) the number K by the number of barrier parameter updates, which is bounded above by (cf. [14, Lemma II.17, page 116])

    (1/θ) log(n/ε).

Thus we obtain the following upper bound on the total number of iterations:

(5.3)    ( Ψ₀^γ / (θκγ) ) log(n/ε) ≤ ( 1/(θκγ) ) [ nψ( ϱ(τ/n)/√(1−θ) ) ]^γ log(n/ε).

6. Application to the seven kernel functions.
6.1. Introduction. We apply the results of the previous sections to obtain iteration bounds for large- and small-update methods based on the seven kernel functions introduced before. Thus, for each of the kernel functions we will basically do the following.
Step 0: Input a kernel function ψ; an update parameter θ, 0 < θ < 1; a threshold parameter τ; and an accuracy parameter ε.
Step 1: Solve the equation −½ψ′(t) = s to get ρ(s), the inverse function of −½ψ′(t), t ∈ (0, 1]. If the equation is hard to solve, derive a lower bound for ρ(s).
Step 2: Calculate the decrease of Ψ(v) in terms of δ for the default step size α̃ from

    f(α̃) ≤ −δ² / ψ″(ρ(2δ)).

Step 3: Solve the equation ψ(t) = s to get ϱ(s), the inverse function of ψ(t), t ≥ 1. If the equation is hard to solve, derive lower and upper bounds for ϱ(s).
Step 4: Derive a lower bound for δ in terms of Ψ(v) by using

    δ(v) ≥ ½ ψ′( ϱ(Ψ(v)) ).

Step 5: Using the results of Steps 3 and 4 find positive constants κ and γ, with γ ∈ (0, 1], such that

    f(α̃) ≤ −κ Ψ(v)^{1−γ}.


Step 6: Calculate the uniform upper bound Ψ₀ for Ψ(v) from

    Ψ₀ ≤ Lψ(n, θ, τ) = nψ( ϱ(τ/n) / √(1−θ) ).

Step 7: Derive an upper bound for the total number of iterations from

    ( Ψ₀^γ / (θκγ) ) log(n/ε).

Step 8: Set τ = O(n) and θ = Θ(1) so as to calculate an iteration bound for large-update methods, or set τ = O(1) and θ = Θ(1/√n) to get an iteration bound for small-update methods.
Using the above scheme, our aim is to compute iteration bounds for large- and small-update methods based on the seven kernel functions. Large-update methods are characterized by τ = O(n) and θ = Θ(1). It may be noted that we could also take smaller values of τ, e.g., τ = O(1), but one may easily check from the outcome of our analysis that this would not affect the order of magnitude of the bounds. Small-update methods are characterized by τ = O(1) and θ = Θ(1/√n).
One more remark is in order. At the start of each inner iteration we have Ψ(v) ≥ τ. By Theorem 4.9 this implies that δ(v) ≥ ½ψ′(ϱ(τ)). We always assume that τ ≥ 1, and that τ is large enough to ensure that δ(v) ≥ 1 at the start of each inner iteration.
By way of example, we perform in the next section the complete analysis for ψ(t) = ψ₆(t). Later on, we discuss more briefly the analysis for the remaining kernel functions. In the analysis we make use of three more lemmas. The first two lemmas are useful in deriving bounds for the inverse functions ϱ and ρ, in case these functions cannot be computed explicitly. The third lemma sometimes gives a better estimate for Ψ₀ in Step 6. The lemmas apply only to the first six kernel functions.
Lemma 6.1. Let ψ(t) = ψᵢ(t), with 1 ≤ i ≤ 6, and let ρ̄ : [0, ∞) → (0, 1] be the inverse function of the restriction of −ψ_b′(t) to the interval (0, 1], with ψ_b(t) as defined in (2.15). Then one has

    ρ(s) ≥ ρ̄(1 + 2s).

Proof. Let t = ρ(s). Due to the definition of ρ as the inverse function of −½ψ′(t) for t ≤ 1, this means that

    −2s = ψ′(t) = t + ψ_b′(t),    t ≤ 1.

Since t ≤ 1 this implies

    −ψ_b′(t) = t + 2s ≤ 1 + 2s.

Since −ψ_b′(t) is monotonically decreasing in all six cases, it follows from this that

    t = ρ(s) ≥ ρ̄(1 + 2s),

proving the lemma.
To illustrate the use of this lemma consider, for example, the case i = 2. Then

    ψ(t) = ½( t − 1/t )² = (t² − 1)/2 + (t⁻² − 1)/2,


whence ψ_b(t) = (t⁻² − 1)/2. The inverse function of −ψ_b′(t) = 1/t³ is given by ρ̄(s) = 1/s^{1/3}. Hence, by Lemma 6.1,

    ρ(s) ≥ 1/(1 + 2s)^{1/3}.

It follows that for the default step size α̃ we have

    f(α̃) ≤ −δ²/ψ″(ρ(2δ)) = −δ²/( 1 + 3/ρ(2δ)⁴ ) ≤ −δ²/( 1 + 3(1 + 4δ)^{4/3} ) ≤ −δ^{2/3}/27.

For the last inequality we used our assumption that δ ≥ 1 at the start of each inner iteration.
Lemma 6.2. When ψ(t) = ψᵢ(t) and 1 ≤ i ≤ 6, then

    √(1 + 2s) ≤ ϱ(s) ≤ 1 + √(2s).

Proof. The inverse function ϱ of ψ(t) for t ∈ [1, ∞) is obtained by solving t from the equation ψ(t) = s, for t ≥ 1. In almost all cases it is hard to solve this equation explicitly. However, we can easily find a lower and an upper bound for t, and this suffices for our goal. First one has

    s = ψ(t) = (t² − 1)/2 + ψ_b(t) ≤ (t² − 1)/2,

where ψ_b(t) denotes the barrier term. The inequality is due to the fact that ψ_b(1) = 0 and ψ_b(t) is monotonically decreasing. It follows that

    t = ϱ(s) ≥ √(1 + 2s).

For the second inequality we use that ψᵢ″(t) ≥ 1 for 1 ≤ i ≤ 6, as is clear from Table 2.1. Also using (2.12) we may write

    s = ψ(t) = ∫₁ᵗ ∫₁^ξ ψ″(ζ) dζ dξ ≥ ∫₁ᵗ ∫₁^ξ dζ dξ = ½(t − 1)²,

which implies

    t = ϱ(s) ≤ 1 + √(2s).

This completes the proof.
Lemma 6.3. Let 1 ≤ i ≤ 6. Then one has

    Lψ(n, θ, τ) ≤ ( ψ″(1)/2 ) ( √(2τ) + θ√n )² / (1 − θ).

Hence, if τ = O(1) and θ = Θ(1/√n), then Ψ₀ = O(ψ″(1)).
Proof. By Lemma 6.2 we have ϱ(s) ≤ 1 + √(2s). Hence, also using (5.1), we have

    Ψ₀ ≤ Lψ(n, θ, τ) = nψ( ϱ(τ/n)/√(1−θ) ) ≤ nψ( (1 + √(2τ/n))/√(1−θ) ).


Applying Lemma 2.6 we obtain

    Ψ₀ ≤ ( nψ″(1)/2 ) ( (1 + √(2τ/n))/√(1−θ) − 1 )² ≤ ( nψ″(1)/2 ) ( (θ + √(2τ/n))/√(1−θ) )² = ( ψ″(1)/2 ) ( √(2τ) + θ√n )² / (1 − θ),

where we also used

(6.1)    1 − √(1−θ) = θ/( 1 + √(1−θ) ) ≤ θ.

This proves the lemma.
6.2. Example: Analysis of methods based on ψ₆(t). Consider ψ(t) = ψ₆(t):

    ψ(t) = (t² − 1)/2 − ∫₁ᵗ e^{1/ξ − 1} dξ.

The inverse function of −ψ_b′(t) = e^{1/t − 1} is given by ρ̄(s) = 1/(1 + log s). Hence, by Lemma 6.1,

    ρ(s) ≥ 1/( 1 + log(1 + 2s) ).

It follows that

    f(α̃) ≤ −δ²/ψ″(ρ(2δ)) = −δ²/( 1 + e^{1/ρ(2δ) − 1}/ρ(2δ)² ) ≤ −δ²/( 1 + (1 + 4δ)(1 + log(1 + 4δ))² ).

By Lemma 6.2 the inverse function ϱ of ψ(t) for t ∈ [1, ∞) satisfies

    √(1 + 2s) ≤ ϱ(s) ≤ 1 + √(2s).

Thus we have, omitting the argument v,

    ϱ(Ψ) ≥ √(1 + 2Ψ).

Now using that δ(v) ≥ ½ψ′(ϱ(Ψ(v))), we obtain

    δ ≥ ½( √(1 + 2Ψ) − e^{1/√(1+2Ψ) − 1} ) ≥ ½( √(1 + 2Ψ) − 1 ) = Ψ/( 1 + √(1 + 2Ψ) ).

Substitution gives, after some elementary reductions, while using Ψ₀ ≥ Ψ ≥ τ ≥ 1,

    f(α̃) ≤ − Ψ^{1/2} / ( 21( 1 + log(1 + √Ψ) )² ) ≤ − Ψ^{1/2} / ( 21( 1 + log(1 + √Ψ₀) )² ).

Thus it follows that

    Ψ_{k+1} ≤ Ψ_k − κΨ_k^{1−γ},    k = 0, 1, . . . , K − 1,

with κ = 1/( 21( 1 + log(1 + √Ψ₀) )² ) and γ = ½, where K denotes the number of inner iterations. Hence the number K of inner iterations is bounded above by

    K ≤ Ψ₀^γ/(κγ) = 42( 1 + log(1 + √Ψ₀) )² Ψ₀^{1/2}.
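With these values of κ and γ, the bound of Lemma 5.1 can be illustrated by simulating the worst-case inner-iteration recursion Ψ_{k+1} = Ψ_k − κΨ_k^{1−γ}; the values of Ψ₀ and τ below are arbitrary illustrative choices:

```python
import math

Psi0, tau = 400.0, 1.0                 # illustrative choices, not from the paper
kappa = 1 / (21 * (1 + math.log(1 + math.sqrt(Psi0))) ** 2)
gamma = 0.5

K, Psi_k = 0, Psi0
while Psi_k > tau:
    Psi_k -= kappa * Psi_k ** (1 - gamma)   # worst-case decrease per inner iteration
    K += 1

# Lemma 5.1 (via Lemma A.2): K <= Psi0^gamma / (kappa * gamma)
assert K <= Psi0 ** gamma / (kappa * gamma)
assert Psi_k <= tau
```

The simulated count comes out close to, but below, the bound 42(1 + log(1 + √Ψ₀))²√Ψ₀, as the lemma predicts.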


Table 6.1
Results of Steps 1 and 2.

i = 1: ψ₁(t) = (t² − 1)/2 − log t;  ρ(s) = 1/( s + √(1 + s²) );  f(α̃) ≤ −1/19.
i = 2: ψ₂(t) = ½(t − 1/t)²;  ρ(s) ≥ 1/(1 + 2s)^{1/3};  f(α̃) ≤ −δ^{2/3}/27.
i = 3: ψ₃(t) = (t² − 1)/2 + (t^{1−q} − 1)/( q(q − 1) ) − ((q − 1)/q)(t − 1);  ρ(s) ≥ 1/(1 + 2qs)^{1/q};  f(α̃) ≤ −δ²/( 1 + (1 + 4qδ)^{(q+1)/q} ).
i = 4: ψ₄(t) = (t² − 1)/2 + (t^{1−q} − 1)/(q − 1);  ρ(s) ≥ 1/(1 + 2s)^{1/q};  f(α̃) ≤ −δ²/( 1 + q(1 + 4δ)^{(q+1)/q} ).
i = 5: ψ₅(t) = (t² − 1)/2 + e^{1/t} − e;  ρ(s) ≥ 1/( 1 + log(1 + 2s) );  f(α̃) ≤ −δ²/( 1 + 15δ(1 + log(1 + 4δ))² ).
i = 6: ψ₆(t) = (t² − 1)/2 − ∫₁ᵗ e^{1/ξ−1} dξ;  ρ(s) ≥ 1/( 1 + log(1 + 2s) );  f(α̃) ≤ −δ²/( 1 + (1 + 4δ)(1 + log(1 + 4δ))² ).
i = 7: ψ₇(t) = t − 1 + (t^{1−q} − 1)/(q − 1);  ρ(s) = 1/(1 + 2s)^{1/q};  f(α̃) ≤ −δ²/( q(1 + 4δ)^{(q+1)/q} ).

We use Lemma 6.3, with ψ″(1) = 2, to estimate Ψ₀. This gives

    Ψ₀ ≤ ( θ√n + √(2τ) )² / (1 − θ).

Substitution in the expression for K gives

    K ≤ 42( 1 + log( 1 + (θ√n + √(2τ))/√(1−θ) ) )² (θ√n + √(2τ))/√(1−θ).

Thus the total number of iterations is bounded above by

    (K/θ) log(n/ε) ≤ (42/θ) ( 1 + log( 1 + (θ√n + √(2τ))/√(1−θ) ) )² ( (θ√n + √(2τ))/√(1−θ) ) log(n/ε).

For large-update methods (when τ = O(n) and θ = Θ(1)) the right-hand side expression becomes O(√n (log n)² log(n/ε)), and for small-update methods (when τ = O(1) and θ = Θ(1/√n)) the right-hand side expression becomes O(√n log(n/ε)).
6.3. Analysis of all the examples. In this section we survey the analysis results for all the kernel functions ψᵢ(t), 1 ≤ i ≤ 7. The analysis for each of the kernel functions proceeds in the same way as in the previous section for ψ₆(t). To save space, we do not present details of the computations for the other kernel functions. We only give the outcome of each step of our computational scheme for each of the kernel functions. For readers who want to see the details of the computations, we refer to an earlier draft of this paper that can be downloaded from http://www.isa.ewi.tudelft.nl/∼roos/wpapers.html. The outcome of Steps 1–7 is given in Tables 6.1, 6.2, and 6.3.
6.4. Complexity results. To get complexity results we finally have to perform Step 8 in our scheme. Setting τ = O(n) and θ = Θ(1) we obtain the iteration bound for large-update methods, and setting τ = O(1) and θ = Θ(1/√n) we obtain the iteration bound for small-update methods. The resulting iteration bounds are summarized in the 3rd and 5th columns of Table 6.4.
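The effect of Step 8 on the ψ₆ bound can be made concrete with a small script. The constants are taken from the substitution above; the scaling check is a rough, illustrative one, not part of the analysis:

```python
import math

def total_bound(n, theta, tau, eps=1e-6):
    # total iteration bound for psi_6: (42/theta) * (1 + log(1 + r))^2 * r * log(n/eps),
    # where r = (theta*sqrt(n) + sqrt(2*tau)) / sqrt(1 - theta)
    r = (theta * math.sqrt(n) + math.sqrt(2 * tau)) / math.sqrt(1 - theta)
    return (42 / theta) * (1 + math.log(1 + r)) ** 2 * r * math.log(n / eps)

for n in (10 ** 3, 10 ** 4, 10 ** 5):
    large = total_bound(n, 0.5, n)                  # large-update: theta = Theta(1), tau = O(n)
    small = total_bound(n, 1 / math.sqrt(n), 1.0)   # small-update: theta = Theta(1/sqrt(n)), tau = O(1)
    assert small < large
    # the large-update bound grows like sqrt(n) (log n)^2 log(n/eps), up to a constant
    ratio = large / (math.sqrt(n) * math.log(n) ** 2 * math.log(n / 1e-6))
    assert 1 <= ratio <= 10 ** 4
```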


Table 6.2
Results of Steps 3, 4, and 5.

i = 1: ϱ(s) and the lower bound on δ(v) not needed;  γ = 1;  κ = 1/19.
i = 2: ϱ(s) = √(s/2) + √(1 + s/2);  δ(v) ≥ ½Ψ^{1/2};  γ = 2/3;  κ = 0.02333.
i = 3: √(1 + 2s) ≤ ϱ(s) ≤ 1 + √(2s);  δ(v) ≥ ½Ψ^{1/2};  γ = (q + 1)/(2q);  κ = 1/(56q).
i = 4: √(1 + 2s) ≤ ϱ(s) ≤ 1 + √(2s);  δ(v) ≥ Ψ/( 1 + √(1 + 2Ψ) );  γ = (q + 1)/(2q);  κ = 1/(53q).
i = 5: √(1 + 2s) ≤ ϱ(s) ≤ 1 + √(2s);  δ(v) ≥ Ψ/( 1 + √(1 + 2Ψ) );  γ = 1/2;  κ = 1/( 44(1 + log(1 + √(2Ψ₀)))² ).
i = 6: √(1 + 2s) ≤ ϱ(s) ≤ 1 + √(2s);  δ(v) ≥ Ψ/( 1 + √(1 + 2Ψ) );  γ = 1/2;  κ = 1/( 21(1 + log(1 + √Ψ₀))² ).
i = 7: 1 + s ≤ ϱ(s) ≤ 1 + qs/(q − 1);  δ(v) ≥ ½( 1 − 1/(1 + Ψ)^q );  γ = 1;  κ = 1/(64q).

Table 6.3
Results of Steps 6 and 7.

i = 1: Ψ₀ ≤ (θ√n + √(2τ))²/(1 − θ);  K/θ ≤ (19/θ) (θ√n + √(2τ))²/(1 − θ).
i = 2: Ψ₀ ≤ (√τ + θ√(n/2))²/(1 − θ);  K/θ ≤ (65/θ) (√τ + θ√(n/2))^{4/3}/(1 − θ)^{2/3}.
i = 3: Ψ₀ ≤ (θ√n + √(2τ))²/(1 − θ);  K/θ ≤ ( 112q²/((q + 1)θ) ) ( (θ√n + √(2τ))²/(2(1 − θ)) )^{(q+1)/(2q)}.
i = 4: Ψ₀ ≤ ((q + 1)/2) (θ√n + √(2τ))²/(1 − θ);  K/θ ≤ (106q/θ) ( (q + 1)(θ√n + √(2τ))²/(2(1 − θ)) )^{(q+1)/(2q)}.
i = 5: Ψ₀ ≤ 2(θ√n + √(2τ))²/(1 − θ);  K/θ ≤ (88√2/θ) ( 1 + log(1 + 2(θ√n + √(2τ))/√(1−θ)) )² (θ√n + √(2τ))/√(1−θ).
i = 6: Ψ₀ ≤ (θ√n + √(2τ))²/(1 − θ);  K/θ ≤ (42/θ) ( 1 + log(1 + (θ√n + √(2τ))/√(1−θ)) )² (θ√n + √(2τ))/√(1−θ).
i = 7: Ψ₀ ≤ (q/2) (θ√n + √(τ²/n + 2τ))²/(1 − θ);  K/θ ≤ (32q²/θ) (θ√n + √(τ²/n + 2τ))²/(1 − θ).

Small-update methods based on the seven kernel functions all have the same complexity as the small-update method based on the logarithmic barrier function, namely, O(√n log(n/ε)); as is well known, until now this has been the best iteration bound for methods solving LO problems. It must be noted that where appropriate (i.e., for i ∈ {3, 4, 7}) one has to take q = O(1) to achieve this bound; moreover, the derivation of the small-update bound for i = 7 is valid only for q ≥ 2.
Now let us consider the bounds for large-update methods in Table 6.4. It should be mentioned that the large-update bound in this table for i = 3 is based on estimates for Ψ₀ and K/θ that differ from the estimates in Table 6.3; for this case we used that

    Ψ₀ ≤ ( θn + 2τ + 2√(2τn) ) / ( 2(1 − θ) ),    K/θ ≤ (112q/θ) ( ( θn + 2τ + 2√(2τn) ) / ( 2(1 − θ) ) )^{(q+1)/(2q)}.


Table 6.4
Complexity results for large- and small-update methods.

i = 1: ψ₁(t) = (t² − 1)/2 − log t.
       Small-update: O(√n log(n/ε)) (e.g., [14]).  Large-update: O(n log(n/ε)) (e.g., [14]).
i = 2: ψ₂(t) = ½(t − 1/t)².
       Small-update: O(√n log(n/ε)) [7].  Large-update: O(n^{2/3} log(n/ε)) [7].
i = 3: ψ₃(t) = (t² − 1)/2 + (t^{1−q} − 1)/( q(q − 1) ) − ((q − 1)/q)(t − 1).
       Small-update: O(q√n log(n/ε)) [8].  Large-update: O(q n^{(q+1)/(2q)} log(n/ε)) [8, 12].
i = 4: ψ₄(t) = (t² − 1)/2 + (t^{1−q} − 1)/(q − 1).
       Small-update: O(q²√n log(n/ε)) [11].  Large-update: O(q n^{(q+1)/(2q)} log(n/ε)) [11, 12].
i = 5: ψ₅(t) = (t² − 1)/2 + e^{1/t} − e.
       Small-update: O(√n log(n/ε)) (new).  Large-update: O(√n (log n)² log(n/ε)) (new).
i = 6: ψ₆(t) = (t² − 1)/2 − ∫₁ᵗ e^{1/ξ−1} dξ.
       Small-update: O(√n log(n/ε)) (new).  Large-update: O(√n (log n)² log(n/ε)) (new).
i = 7: ψ₇(t) = t − 1 + (t^{1−q} − 1)/(q − 1).
       Small-update: O(q²√n log(n/ε)) (new).  Large-update: O(qn log(n/ε)) [4].

One may easily verify that the best iteration bound for large-update methods is obtained for i ∈ {3, 4} by taking q = ½ log n. This gives the iteration bound O(√n (log n) log(n/ε)), which is currently the best known bound for large-update methods.
7. Concluding remarks. This paper was inspired by recent work on so-called self-regular barrier functions for primal-dual IPMs for LO [11, 12]. Each such barrier function is determined by its (univariate) self-regular kernel function. We introduced a new class of kernel functions which differs from the class of self-regular kernel functions. The class is defined by some simple conditions on the kernel function which concern the growth and the barrier behavior of the kernel function. These properties enable us to derive many new and tight estimates that greatly simplify the analysis of IPMs based on these kernel functions. It has become clear that inverse functions of suitable restrictions of the kernel function and its first derivative play a key role in the behavior of the corresponding IPMs. Moreover, due to the new estimates, the analysis of an IPM based on a given kernel function is greatly simplified and uses tools only from a first-year calculus course.
Future research might focus on finding a kernel function for which the complexity of large-update methods is equal to (or even better than) O(√n log(n/ε)), or on showing that such a kernel function does not exist. In this respect it might be of interest to show that the new class of kernel functions contains many functions not considered so far. For example,

    ψ(t) = (t² − 1)/2 + (6/π) tan( π(1 − t)/(4t + 2) )

is also a member of the new class. Also the extensions to semidefinite optimization and second-order cone optimization deserve to be investigated. At present no computational results exist for the methods presented in this paper. This will be another issue for future research.
Appendix. Three technical lemmas. We need three simple technical results. For the sake of completeness we include their (short) proofs. The first lemma is needed only in the proof of the second lemma, which is interesting in itself.
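The claim in section 7 that the tan-based kernel belongs to the new class can be spot-checked numerically: ψ(1) = 0, ψ′(1) = 0, ψ is strictly convex on a grid of its domain, and ψ(t) blows up as t → 0⁺. An informal check, not a proof of membership:

```python
import math

def psi(t):
    # the tan-based kernel from the concluding remarks
    return (t * t - 1) / 2 + (6 / math.pi) * math.tan(math.pi * (1 - t) / (4 * t + 2))

def dpsi(t, h=1e-6):
    # central finite difference approximating psi'
    return (psi(t + h) - psi(t - h)) / (2 * h)

assert psi(1.0) == 0.0           # psi(1) = 0
assert abs(dpsi(1.0)) < 1e-6     # psi'(1) = 0
h = 1e-4
for k in range(1, 300):          # strict convexity on a grid of (0, 3)
    t = 0.01 * k
    assert psi(t - h) + psi(t + h) - 2 * psi(t) > 0
assert psi(0.01) > psi(0.5) > 0  # barrier behavior as t -> 0+
```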


Lemma A.1 (Lemma 2.1 in [9]). If α ∈ [0, 1], then

(A.1)    (1 + t)^α ≤ 1 + αt    for all t ≥ −1.

Proof. Consider the function f(t) = (1 + t)^α − 1 − αt for t ≥ −1. One has f′(t) = α(1 + t)^{α−1} − α and f″(t) = α(α − 1)(1 + t)^{α−2}. Since f″(t) ≤ 0, f(t) is concave. Since f′(0) = 0, the function f is maximal at t = 0. Finally, since f(0) = 0, the lemma follows.
Lemma A.2 (Proposition 2.2 in [9]). Let t₀, t₁, . . . , t_K be a sequence of positive numbers such that

(A.2)    t_{k+1} ≤ t_k − κt_k^{1−γ},    k = 0, 1, . . . , K − 1,

where κ > 0 and 0 < γ ≤ 1. Then K ≤ t₀^γ/(κγ).
Proof. Using (A.2), we may write

    0 < t_{k+1}^γ ≤ ( t_k − κt_k^{1−γ} )^γ = t_k^γ ( 1 − κt_k^{−γ} )^γ ≤ t_k^γ ( 1 − κγt_k^{−γ} ) = t_k^γ − κγ,

where the second inequality follows from (A.1). Hence, for each k, t_k^γ ≤ t₀^γ − kκγ. Taking k = K we obtain 0 < t₀^γ − Kκγ, which implies the lemma.
Lemma A.3 (Lemma 3.12 in [11]). Let h(t) be a twice differentiable convex function with h(0) = 0, h′(0) < 0, and let h(t) attain its (global) minimum at t* > 0. If h″(t) is increasing for t ∈ [0, t*], then

    h(t) ≤ ½ t h′(0),    0 ≤ t ≤ t*.

Proof. Using the hypothesis of the lemma we may write

    h(t) = ∫₀ᵗ h′(ξ) dξ = h′(0)t + ∫₀ᵗ ∫₀^ξ h″(ζ) dζ dξ ≤ h′(0)t + ∫₀ᵗ ξ h″(ξ) dξ
         = h′(0)t + ∫₀ᵗ ξ dh′(ξ) = h′(0)t + ( ξh′(ξ) )|₀ᵗ − ∫₀ᵗ h′(ξ) dξ
         = h′(0)t + th′(t) − h(t) ≤ h′(0)t − h(t).

This implies the lemma.
Acknowledgments. The authors kindly acknowledge the help of the associate editor and two anonymous referees in improving the readability of the paper. The new and nice motivation of the search direction presented in section 2.2, as a Newton-like step, was provided by one of the referees, and is especially acknowledged.
REFERENCES
[1] E.D. Andersen, J. Gondzio, Cs. Mészáros, and X. Xu, Implementation of interior-point methods for large scale linear programs, in Interior Point Methods of Mathematical Programming, T. Terlaky, ed., Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996, pp. 189–252.
[2] Y.Q. Bai, M. El Ghami, and C. Roos, A primal-dual interior-point algorithm for linear optimization based on a new proximity function, Optim. Methods Softw., 17 (2002), pp. 985–1008.


[3] Y.Q. Bai, M. El Ghami, and C. Roos, A new efficient large-update primal-dual interior-point method based on a finite barrier, SIAM J. Optim., 13 (2003), pp. 766–782.
[4] Y.Q. Bai and C. Roos, A primal-dual interior-point method based on a new kernel function with linear growth rate, in Proceedings of the 9th Australian Optimisation Day, Perth, Australia, 2002, to appear.
[5] N.K. Karmarkar, A new polynomial-time algorithm for linear programming, Combinatorica, 4 (1984), pp. 373–395.
[6] N. Megiddo, Pathways to the optimal set in linear programming, in Progress in Mathematical Programming: Interior Point and Related Methods, N. Megiddo, ed., Springer-Verlag, New York, 1989, pp. 131–158.
[7] J. Peng, C. Roos, and T. Terlaky, New complexity analysis of the primal-dual Newton method for linear optimization, Ann. Oper. Res., 99 (2000), pp. 23–39.
[8] J. Peng, C. Roos, and T. Terlaky, A new and efficient large-update interior-point method for linear optimization, Vychisl. Tekhnol., 6 (2001), pp. 61–80.
[9] J. Peng, C. Roos, and T. Terlaky, A new class of polynomial primal-dual methods for linear and semidefinite optimization, European J. Oper. Res., 143 (2002), pp. 234–256.
[10] J. Peng, C. Roos, and T. Terlaky, Primal-dual interior-point methods for second-order conic optimization based on self-regular proximities, SIAM J. Optim., 13 (2002), pp. 179–203.
[11] J. Peng, C. Roos, and T. Terlaky, Self-regular functions and new search directions for linear and semidefinite optimization, Math. Program., 93 (2002), pp. 129–171.
[12] J. Peng, C. Roos, and T. Terlaky, Self-Regularity: A New Paradigm for Primal-Dual Interior-Point Algorithms, Princeton University Press, Princeton, NJ, 2002.
[13] J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, MPS/SIAM Ser. Optim. 3, SIAM, Philadelphia, 2001.
[14] C. Roos, T. Terlaky, and J.-Ph. Vial, Theory and Algorithms for Linear Optimization. An Interior-Point Approach, John Wiley & Sons, Chichester, UK, 1997.
[15] G. Sonnevend, An "analytic center" for polyhedrons and new classes of global algorithms for linear (smooth, convex) programming, in System Modelling and Optimization: Proceedings of the 12th IFIP Conference, Budapest, Hungary, 1985, Lecture Notes in Control and Inform. Sci. 84, A. Prékopa, J. Szelezsán, and B. Strazicky, eds., Springer-Verlag, Berlin, 1986, pp. 866–876.
[16] S.J. Wright, Primal-Dual Interior-Point Methods, SIAM, Philadelphia, 1997.
[17] Y. Ye, Interior Point Algorithms, Theory and Analysis, John Wiley & Sons, Chichester, UK, 1997.