Derivatives of spectral functions

A. S. Lewis^1
Department of Combinatorics & Optimization
University of Waterloo
Waterloo, Ontario N2L 3G1
[email protected]

March 3, 1994

Key words: matrix functions, spectral functions, eigenvalues, perturbation, unitarily invariant, differentiability, nonsmooth analysis, Clarke derivative

AMS 1991 Subject Classification: Primary 15A18, 26B05; Secondary 49K40, 90C31

^1 Research partially supported by the Natural Sciences and Engineering Research Council of Canada
Abstract

A spectral function of a Hermitian matrix $X$ is a function which depends only on the eigenvalues of $X$,
$$\lambda_1(X) \ge \lambda_2(X) \ge \dots \ge \lambda_n(X),$$
and hence may be written $f(\lambda_1(X), \lambda_2(X), \dots, \lambda_n(X))$ for some symmetric function $f$. Such functions appear in a wide variety of matrix optimization problems. We give a simple proof that this spectral function is differentiable at $X$ if and only if the function $f$ is differentiable at the vector $\lambda(X)$, and we give a concise formula for the derivative. We then apply this formula to deduce an analogous expression for the Clarke generalized gradient of the spectral function. A similar result holds for real symmetric matrices.
1 Introduction and notation

Optimization problems involving a symmetric matrix variable, $X$ say, frequently involve symmetric functions of the eigenvalues of $X$ in the objective or constraints. Examples include the maximum eigenvalue of $X$, or $\log(\det X)$ (for positive definite $X$), or eigenvalue constraints such as positive semidefiniteness. The aim of this paper is to provide a unified, concise and constructive approach to the calculus of such matrix functions. The convex case was covered in [14]: here we use an independent approach to develop the nonconvex case.

Since the seminal paper [5], the study of matrix optimization problems (and in particular eigenvalue optimization) has become extremely prominent. A typical constraint is positive semidefiniteness [7, 22, 25, 26], and with the modern trend towards interior point methods, it has become popular to incorporate this constraint by a barrier penalty function (involving the eigenvalues) [16, 1, 12]. A related objective function is used in [8] to give an elegant variational characterization of certain quasi-Newton formulae (see also [25]). One very common objective function is the maximum eigenvalue [17, 18, 21, 12], or more generally, sums of the largest eigenvalues [19, 10].

A key step in algorithm development is the investigation of sensitivity results, and hence differentiability questions about the eigenvalues. The standard reference on the effect on eigenvalues of perturbations to a matrix is [13], which for the most part deals with matrices parametrized by a scalar. By contrast, what are needed in this context are sensitivity results with respect to matrix perturbations: the two recent papers [19] and [10] undertake detailed constructive studies of this question. More generally, we may wish to construct generalized gradients: [2] is an interesting example. For recent, second-order approaches, see [23] and the references therein. Interesting applications include [20] and [24].

Let $H_n$ denote the real vector space of $n \times n$ Hermitian matrices, endowed with the trace inner product $\langle X, Y \rangle = \mathrm{tr}\, XY$, and let $U_n$ denote the $n \times n$ unitary matrices. A real-valued function $F$ defined on a subset of $H_n$ is unitarily invariant if $F(U^* X U) = F(X)$ for any unitary $U$. Such functions are called spectral functions [9], since clearly $F(X)$ depends only on the set of eigenvalues of $X$, denoted $\lambda_1(X) \ge \lambda_2(X) \ge \dots \ge \lambda_n(X)$. (This notation also permits us to consider the function $\lambda : H_n \to \mathbf{R}^n$.) Associated with any spectral function $F$ is a symmetric, real-valued function $f$ of $n$ real variables (where by symmetric we mean that $f(\mu) = f(P\mu)$ for all $n \times n$ permutation matrices $P$). Specifically, we define $f(\mu) = F(\mathrm{Diag}\,\mu)$, where $\mathrm{Diag}\,\mu$ is the diagonal matrix with diagonal entries $\mu_1, \mu_2, \dots, \mu_n$. Thus we see that spectral functions $F(X)$ are exactly those functions of the form $f(\lambda(X))$ for symmetric functions $f$.

We begin by describing a straightforward approach to answering the question: when is the spectral function $f(\lambda(\cdot)) = (f \circ \lambda)(\cdot)$ differentiable at the Hermitian matrix $X$? We prove the following result. (A set $\Omega$ in $\mathbf{R}^n$ is symmetric if $P\Omega = \Omega$ for all $n \times n$ permutation matrices $P$.)
Theorem 1.1  Let the set $\Omega$ in $\mathbf{R}^n$ be open and symmetric, and suppose that the function $f : \Omega \to \mathbf{R}$ is symmetric. Then the spectral function $f(\lambda(\cdot))$ is differentiable at the matrix $X$ if and only if $f$ is differentiable at the vector $\lambda(X)$. In this case
$$(1.2)\qquad (f \circ \lambda)'(X) = U^* (\mathrm{Diag}(f'(\lambda(X)))) U,$$
for any unitary matrix $U$ satisfying $X = U^* (\mathrm{Diag}(\lambda(X))) U$.
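Although the paper contains no numerics, formula (1.2) is easy to test computationally. The following Python/NumPy sketch (our illustration, not part of the development; the smooth symmetric function $f(\mu) = \sum_i e^{\mu_i}$ is an arbitrary choice) compares $U^* (\mathrm{Diag}(f'(\lambda(X)))) U$ with central finite differences of $f(\lambda(\cdot))$, at a matrix $X$ with a deliberately repeated eigenvalue.

```python
# A minimal numerical check of formula (1.2), not part of the paper: we take
# the symmetric smooth function f(mu) = sum_i exp(mu_i) and a Hermitian X with
# a repeated eigenvalue, and compare U*(Diag f'(lambda(X)))U against finite
# differences of f(lambda(.)).
import numpy as np

def f(mu):
    return np.sum(np.exp(mu))

def f_grad(mu):
    return np.exp(mu)

def spectral(X):                            # the spectral function f(lambda(X))
    return f(np.linalg.eigvalsh(X))

rng = np.random.default_rng(0)
# Random unitary Q, then X with eigenvalues (2, 2, -1): two eigenvalues coalesce.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
X = Q @ np.diag([2.0, 2.0, -1.0]).astype(complex) @ Q.conj().T

w, V = np.linalg.eigh(X)                    # eigh: increasing order, X = V Diag(w) V*
lam = w[::-1]                               # lambda(X), nonincreasing
U = V[:, ::-1].conj().T                     # then X = U* Diag(lam) U
G = U.conj().T @ np.diag(f_grad(lam)) @ U   # the claimed derivative, formula (1.2)

t = 1e-6
for _ in range(3):                          # random Hermitian directions Z
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    Z = (A + A.conj().T) / 2
    fd = (spectral(X + t * Z) - spectral(X - t * Z)) / (2 * t)
    print(fd, np.real(np.trace(G @ Z)))     # the two columns should agree
```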
It is easy to see that $f$ must be differentiable at $\lambda(X)$ whenever $f \circ \lambda$ is differentiable at $X$, since we can write
$$f(\mu) = f(\lambda(U^* (\mathrm{Diag}\,\mu) U))$$
(with $U$ as in the theorem), and apply the chain rule at $\mu = \lambda(X)$. Furthermore, the converse is also straightforward at matrices $X$ with distinct eigenvalues, since then the map $\lambda : H_n \to \mathbf{R}^n$ is differentiable at $X$ and we can easily apply the chain rule to deduce formula (1.2). The interesting case is when some of the eigenvalues of $X$ coalesce: remarkably the spectral function $f(\lambda(\cdot))$ remains differentiable at $X$ even though the map $\lambda$ is not.

The situation where the function $f$ is convex is considered in rather more generality and with an entirely different approach in [14]. The following analogous result is [14, Theorem 3.2]. In this result, $\partial$ denotes the convex subdifferential.
Theorem 1.3  Suppose that the function $f : \mathbf{R}^n \to (-\infty, +\infty]$ is symmetric, convex and lower semicontinuous. Then the Hermitian matrix $Y$ lies in $\partial (f \circ \lambda)(X)$ if and only if $\lambda(Y)$ lies in $\partial f(\lambda(X))$ and there exists a unitary $U$ with $X = U^* (\mathrm{Diag}(\lambda(X))) U$ and $Y = U^* (\mathrm{Diag}(\lambda(Y))) U$.
In fact when $f$ is convex, and differentiable at the vector $\lambda(X)$ (lying in the interior of its domain), Theorem 1.3 reduces to formula (1.2) (see [14]).
A nice example is to take the symmetric (convex) function
$$f(\mu) = -\sum_{i=1}^n \log \mu_i \quad\text{on } \Omega = \{\mu \mid \mu_1, \mu_2, \dots, \mu_n > 0\},$$
which corresponds to the spectral function
$$F(X) = f(\lambda(X)) = -\log(\det X),$$
defined on the set of positive definite matrices $X$. Formula (1.2) then gives $F'(X) = -X^{-1}$ (which may also be obtained directly).
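As a quick computational sanity check (a Python/NumPy sketch, not part of the paper), formula (1.2) does recover $-X^{-1}$ here:

```python
# Check (not from the paper) that formula (1.2) recovers F'(X) = -X^{-1}
# for F(X) = -log det X on positive definite X.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)            # a random positive definite matrix

w, V = np.linalg.eigh(X)
lam = w[::-1]                          # lambda(X), nonincreasing
U = V[:, ::-1].T                       # X = U^T Diag(lam) U (real symmetric case)
grad_f = -1.0 / lam                    # f(mu) = -sum_i log mu_i, so f'(mu) = -1/mu
G = U.T @ np.diag(grad_f) @ U          # formula (1.2)

print(np.allclose(G, -np.linalg.inv(X)))   # expect: True
```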
Our ultimate aim in this paper is to unify Theorems 1.1 and 1.3 by considering the Clarke generalized gradient. For an open set $\Omega$ in $\mathbf{R}^n$ and a function $f : \Omega \to \mathbf{R}$, we say that $f$ is locally Lipschitz around a point $\mu$ in $\Omega$ if there is a real constant $k$ with $|f(\nu) - f(\rho)| \le k \|\nu - \rho\|$ for all points $\nu$ and $\rho$ close to $\mu$. If the set $\Omega$ is symmetric and the function $f$ is symmetric and locally Lipschitz around a point $\mu$ in $\Omega$, then the spectral function $f \circ \lambda$ is locally Lipschitz around the matrix $\mathrm{Diag}\,\mu$, because each component $\lambda_i(\cdot)$ is locally Lipschitz throughout $H_n$ (see the next section). The Clarke directional derivative [3] is defined for a direction $\omega$ in $\mathbf{R}^n$ by
$$f^\circ(\mu; \omega) = \limsup_{\nu \to \mu,\; t \downarrow 0} \frac{f(\nu + t\omega) - f(\nu)}{t},$$
and we say that a vector $\rho$ lies in the Clarke generalized gradient $\partial f(\mu)$ if $\langle \rho, \omega \rangle \le f^\circ(\mu; \omega)$ for all $\omega$ in $\mathbf{R}^n$. We make analogous definitions for locally Lipschitz functions on $H_n$. The set $\partial f(\mu)$ is compact, convex and nonempty. It coincides with the convex subdifferential when $f$ is finite and convex on the open, convex set $\Omega$, and it is exactly $\{f'(\mu)\}$ if $f$ is continuously differentiable (though not necessarily if $f'$ is not continuous at $\mu$): see [3] for these and related results. Our main result, the following theorem, thus goes a long way towards unifying Theorems 1.1 and 1.3. (We will make this more precise at the end of the paper.)
Theorem 1.4  Let the set $\Omega$ in $\mathbf{R}^n$ be open and symmetric, and suppose that the Hermitian matrix $X$ has $\lambda(X) \in \Omega$. Suppose that the function $f : \Omega \to \mathbf{R}$ is symmetric, and is locally Lipschitz around the point $\lambda(X)$. Then
$$\partial (f \circ \lambda)(X) = \{U^* (\mathrm{Diag}\,\rho) U \mid \rho \in \partial f(\lambda(X)),\; U \in U_n,\; U^* (\mathrm{Diag}(\lambda(X))) U = X\}.$$

We conclude by observing that the same approach applies to real symmetric matrices $X$, simply substituting `real orthogonal' for `unitary' wherever appropriate.
2 The differentiable case

For each integer $m = 1, 2, \dots, n$, define a function $\sigma_m : H_n \to \mathbf{R}$ by $\sigma_m(X) = \sum_{i=1}^m \lambda_i(X)$, the sum of the $m$ largest eigenvalues of the matrix $X$. It is a well-known result of Fan's (see [6]) that $\sigma_m$ is convex (see also [11]). Our proof revolves around the following known fact. We denote the standard basis in $\mathbf{R}^n$ by $e_1, e_2, \dots, e_n$.

Theorem 2.1  For real numbers $\mu_1 \ge \mu_2 \ge \dots \ge \mu_n$, if $\mu_m > \mu_{m+1}$ for some $m$ then the function $\sigma_m$ is differentiable at $\mathrm{Diag}\,\mu$ with
$$\sigma_m'(\mathrm{Diag}\,\mu) = \mathrm{Diag} \sum_{i=1}^m e_i.$$
(The condition holds vacuously for $m = n$.)

Proof.  See [10, Corollary 3.10], or the proof of [14, Corollary 3.10], or formula (3.28) in [19], for example. $\Box$
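To illustrate Theorem 2.1 numerically (our own Python/NumPy sketch, not from the paper): the finite-difference derivative of $\sigma_m$ at $\mathrm{Diag}\,\mu$ matches $\mathrm{Diag}\sum_{i=1}^m e_i$ when $\mu_m > \mu_{m+1}$, while one-sided difference quotients along a fixed direction disagree once the gap closes.

```python
# Numerical illustration of Theorem 2.1 (a sketch, not from the paper). With a
# gap mu_m > mu_{m+1}, finite differences of sigma_m at Diag(mu) match
# Diag(e_1 + ... + e_m); when the gap closes, sigma_m is no longer smooth.
import numpy as np

def sigma(X, m):                           # sum of the m largest eigenvalues
    return np.sum(np.sort(np.linalg.eigvalsh(X))[::-1][:m])

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
Z = (A + A.T) / 2                          # a random symmetric direction
m, t = 2, 1e-6

X1 = np.diag([5.0, 3.0, 1.0, 0.0])         # mu_2 > mu_3: Theorem 2.1 applies
G = np.diag([1.0, 1.0, 0.0, 0.0])          # Diag(e_1 + e_2)
fd = (sigma(X1 + t * Z, m) - sigma(X1 - t * Z, m)) / (2 * t)
print(fd, np.trace(G @ Z))                 # should agree

X2 = np.diag([5.0, 3.0, 3.0, 0.0])         # mu_2 = mu_3: sigma_2 not smooth here
right = (sigma(X2 + t * Z, m) - sigma(X2, m)) / t
left = (sigma(X2, m) - sigma(X2 - t * Z, m)) / t
print(right, left)                         # one-sided slopes generally differ
```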
Given two vectors $\mu$ and $\omega$ in $\mathbf{R}^n$, we say that $\mu$ block-refines $\omega$ if $\omega_i = \omega_j$ whenever $\mu_i = \mu_j$.
Lemma 2.2  If $\mu$ block-refines $\omega$ in $\mathbf{R}^n$, and $\mu_1 \ge \mu_2 \ge \dots \ge \mu_n$, then the function $\omega^T \lambda(\cdot)$ is differentiable at $\mathrm{Diag}\,\mu$ with $(\omega^T \lambda)'(\mathrm{Diag}\,\mu) = \mathrm{Diag}\,\omega$.

Proof.  Suppose that
$$\mu_1 = \mu_2 = \dots = \mu_{k_1} > \mu_{k_1+1} = \dots = \mu_{k_2} > \mu_{k_2+1} \dots = \mu_{k_r}.$$
Since $\mu$ block-refines $\omega$, there exist reals $\bar\omega_1, \bar\omega_2, \dots, \bar\omega_r$ with
$$\omega_i = \bar\omega_j \quad\text{whenever } k_{j-1} < i \le k_j,\quad j = 1, 2, \dots, r,$$
where we set $k_0 = 0$. Defining $\sigma_0 \equiv 0$, we obtain
$$\omega^T \lambda(X) = \sum_{j=1}^r \bar\omega_j \sum_{i=k_{j-1}+1}^{k_j} \lambda_i(X) = \sum_{j=1}^r \bar\omega_j \left( \sigma_{k_j}(X) - \sigma_{k_{j-1}}(X) \right).$$
Now applying Theorem 2.1 gives
$$(\omega^T \lambda)'(\mathrm{Diag}\,\mu) = \sum_{j=1}^r \bar\omega_j \left( \mathrm{Diag} \sum_{i=1}^{k_j} e_i - \mathrm{Diag} \sum_{i=1}^{k_{j-1}} e_i \right) = \sum_{j=1}^r \bar\omega_j \,\mathrm{Diag} \sum_{i=k_{j-1}+1}^{k_j} e_i = \mathrm{Diag}\,\omega,$$
as required. $\Box$
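A small numerical check of Lemma 2.2 (our Python/NumPy sketch, not from the paper): with $\mu$ constant on two blocks and $\omega$ block-refined by $\mu$, finite differences of $\omega^T \lambda(\cdot)$ at $\mathrm{Diag}\,\mu$ recover $\mathrm{Diag}\,\omega$, even though the individual eigenvalue maps $\lambda_i$ are not differentiable there.

```python
# Sketch (not from the paper): finite-difference check of Lemma 2.2 with
# mu = (4,4,1,1,1) and omega = (3,3,-2,-2,-2) constant on mu's blocks.
import numpy as np

mu = np.array([4.0, 4.0, 1.0, 1.0, 1.0])
omega = np.array([3.0, 3.0, -2.0, -2.0, -2.0])   # mu block-refines omega

def g(X):                                         # omega^T lambda(X)
    return omega @ np.sort(np.linalg.eigvalsh(X))[::-1]

rng = np.random.default_rng(3)
t = 1e-6
for _ in range(3):                                # random symmetric directions
    A = rng.standard_normal((5, 5))
    Z = (A + A.T) / 2
    fd = (g(np.diag(mu) + t * Z) - g(np.diag(mu) - t * Z)) / (2 * t)
    print(fd, np.trace(np.diag(omega) @ Z))       # should agree
```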
Henceforth we shall assume that the set $\Omega$ is open and symmetric in $\mathbf{R}^n$, and that the function $f : \Omega \to \mathbf{R}$ is symmetric.
Lemma 2.3  If $f$ is differentiable at a point $\mu$ in $\Omega$ satisfying $\mu_1 \ge \mu_2 \ge \dots \ge \mu_n$, then $\mu$ block-refines $f'(\mu)$. Consequently the function $f'(\mu)^T \lambda(\cdot)$ is differentiable at the matrix $\mathrm{Diag}\,\mu$, with $(f'(\mu)^T \lambda)'(\mathrm{Diag}\,\mu) = \mathrm{Diag}(f'(\mu))$.

Proof.  Suppose that $\mu_i = \mu_j$ for some distinct indices $i$ and $j$. Let $P$ be the matrix of the permutation which transposes the $i$th and $j$th components. Since the function $f$ is symmetric, $f(\nu) = f(P\nu)$ for all points $\nu$ in the set $\Omega$, so applying the chain rule at $\nu = \mu$ gives $f'(\mu) = P^T f'(P\mu) = P^T f'(\mu)$ (since $P\mu = \mu$). Thus $P f'(\mu) = f'(\mu)$, so that $(f'(\mu))_i = (f'(\mu))_j$. The last statement follows from the previous lemma. $\Box$
Since each component of the function $\lambda(\cdot)$ can be written as a difference of two finite, convex functions, $\lambda_i(\cdot) = \sigma_i(\cdot) - \sigma_{i-1}(\cdot)$, it follows that $\lambda$ is locally Lipschitz [3]. We now prove the key result.
Theorem 2.4  If the symmetric function $f$ is differentiable at a point $\mu$ in $\mathbf{R}^n$ satisfying $\mu_1 \ge \mu_2 \ge \dots \ge \mu_n$, then the spectral function $f(\lambda(\cdot))$ is differentiable at the matrix $\mathrm{Diag}\,\mu$ with
$$(f \circ \lambda)'(\mathrm{Diag}\,\mu) = \mathrm{Diag}(f'(\mu)).$$

Proof.  Given any real $\epsilon > 0$, since $f$ is differentiable at $\mu$ we have
$$|f(\nu) - f(\mu) - f'(\mu)^T (\nu - \mu)| \le \epsilon \|\nu - \mu\|,$$
for points $\nu$ sufficiently close to $\mu$. Since $\lambda$ is locally Lipschitz around $\mathrm{Diag}\,\mu$, there is a real constant $K$ with
$$\|\lambda(Y + \mathrm{Diag}\,\mu) - \mu\| \le K \|Y\|$$
for all Hermitian $Y$ sufficiently small. Hence
$$|f(\lambda(Y + \mathrm{Diag}\,\mu)) - f(\mu) - f'(\mu)^T (\lambda(Y + \mathrm{Diag}\,\mu) - \mu)| \le \epsilon \|\lambda(Y + \mathrm{Diag}\,\mu) - \mu\| \le \epsilon K \|Y\|,$$
for all small $Y$. We also know from Lemma 2.3 that
$$|f'(\mu)^T \lambda(Y + \mathrm{Diag}\,\mu) - f'(\mu)^T \mu - \mathrm{tr}(Y\, \mathrm{Diag}(f'(\mu)))| \le \epsilon \|Y\|,$$
for all small $Y$. Now adding the two previous inequalities and using the triangle inequality gives
$$|f(\lambda(Y + \mathrm{Diag}\,\mu)) - f(\mu) - \mathrm{tr}(Y\, \mathrm{Diag}(f'(\mu)))| \le \epsilon (K + 1) \|Y\|$$
for all small $Y$, which completes the proof. $\Box$
Proof of Theorem 1.1.  As we observed, one direction is easy, so suppose that $f$ is differentiable at the vector $\lambda(X)$, and choose any unitary matrix $U$ with $X = U^* (\mathrm{Diag}(\lambda(X))) U$. Now clearly for all Hermitian $Z$ close to $X$,
$$(f \circ \lambda)(U Z U^*) = (f \circ \lambda)(Z).$$
Applying Theorem 2.4 and the chain rule at $Z = X$ gives
$$(f \circ \lambda)'(X) = U^* ((f \circ \lambda)'(U X U^*)) U = U^* ((f \circ \lambda)'(\mathrm{Diag}(\lambda(X)))) U = U^* (\mathrm{Diag}(f'(\lambda(X)))) U,$$
since the adjoint of the linear map $X \mapsto U X U^*$ is just $W \mapsto U^* W U$. $\Box$
Corollary 2.5  Theorem 2.4 holds without the assumption that $\mu_1 \ge \mu_2 \ge \dots \ge \mu_n$.

Proof.  Let $\bar\mu$ be the vector obtained by permuting the components of the vector $\mu$ into nonincreasing order, and pick a permutation matrix $P$ with $P\mu = \bar\mu$. Since $f$ is symmetric, we know that $f(P\nu) = f(\nu)$ for all points $\nu$ close to $\mu$, so applying the chain rule at $\nu = \mu$ gives $f'(\mu) = P^T f'(P\mu)$, and hence $f'(\bar\mu) = P f'(\mu)$. Now if we set $X = \mathrm{Diag}\,\mu$ then $\lambda(X) = \bar\mu$. Observe that $P (\mathrm{Diag}\,\mu) P^T = \mathrm{Diag}(P\mu)$, so we can choose $U = P$ in Theorem 1.1, and deduce that
$$(f \circ \lambda)'(\mathrm{Diag}\,\mu) = P^T (\mathrm{Diag}(f'(\bar\mu))) P = \mathrm{Diag}(P^T f'(\bar\mu)) = \mathrm{Diag}(f'(\mu)),$$
as required. As an example, let the symmetric function f : Rn ! R be de ned by
f () = sum of the m largest elements of f1; 2; : : :n g: This function is dierentiable at any point for which
1 2 : : : m > m+1 m+2 : : : n ; with f 0() = Pm1 ei. Now Theorem 1.1 reduces to Theorem 2.1.
3 The Clarke case

Throughout this section we shall suppose that the set $\Omega$ in $\mathbf{R}^n$ is symmetric and open, that the point $\mu$ in $\mathbf{R}^n$ satisfies $\mu_1 \ge \mu_2 \ge \dots \ge \mu_n$, and that the symmetric function $f : \Omega \to \mathbf{R}$ is locally Lipschitz around $\mu$.
For a Hermitian matrix $Z$, the vector $\mathrm{diag}\, Z$ is the diagonal of $Z$. The map $\mathrm{diag} : H_n \to \mathbf{R}^n$ may be thought of as the adjoint of the map $\mathrm{Diag} : \mathbf{R}^n \to H_n$. For square matrices $A_1, A_2, \dots, A_r$, we write the block-diagonal matrix
$$\begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix} = \mathrm{Diag}(A_1, A_2, \dots, A_r).$$
We denote the set of $n \times n$ (real) doubly stochastic matrices by $\mathcal{S}_n$. The following result is elementary.
Lemma 3.1  For any unitary matrix $U$ and any Hermitian matrices $X$ and $Z$ we have $(f \circ \lambda)^\circ(X; Z) = (f \circ \lambda)^\circ(U^* X U; U^* Z U)$.

Proof.
$$\begin{aligned}
(f \circ \lambda)^\circ(X; Z) &= \limsup_{Y \to X,\; t \downarrow 0} \frac{f(\lambda(Y + tZ)) - f(\lambda(Y))}{t} \\
&= \limsup_{Y \to X,\; t \downarrow 0} \frac{f(\lambda(U^* (Y + tZ) U)) - f(\lambda(U^* Y U))}{t} \\
&= \limsup_{W \to U^* X U,\; t \downarrow 0} \frac{f(\lambda(W + t U^* Z U)) - f(\lambda(W))}{t} \\
&= (f \circ \lambda)^\circ(U^* X U; U^* Z U),
\end{aligned}$$
as required. $\Box$

We define a compact set of $n \times n$ matrices,
$$(3.2)\qquad \mathcal{W} = \{U \in U_n \mid U^* (\mathrm{Diag}\,\mu) U = \mathrm{Diag}\,\mu\}.$$
Theorem 3.3  For any Hermitian matrix $Z$ we have
$$(3.4)\qquad (f \circ \lambda)^\circ(\mathrm{Diag}\,\mu; Z) = \max\{f^\circ(\mu; \mathrm{diag}(U Z U^*)) \mid U \in \mathcal{W}\}.$$

Proof.  By [3, page 64], there exists a sequence of Hermitian matrices $X_r \to \mathrm{Diag}\,\mu$, with $f \circ \lambda$ differentiable at each $X_r$, satisfying
$$\langle (f \circ \lambda)'(X_r), Z \rangle \to (f \circ \lambda)^\circ(\mathrm{Diag}\,\mu; Z),$$
and notice that $\lambda(X_r) \to \mu$. For each $r = 1, 2, \dots$, there exists a unitary $U_r$ with $U_r^* (\mathrm{Diag}(\lambda(X_r))) U_r = X_r$, and so by Theorem 1.1,
$$(f \circ \lambda)'(X_r) = U_r^* (\mathrm{Diag}(f'(\lambda(X_r)))) U_r.$$
Since $U_n$ is compact, there is a subsequence for which $U_{r'} \to U \in U_n$. But now,
$$U^* (\mathrm{Diag}\,\mu) U = \lim_{r'} U_{r'}^* (\mathrm{Diag}(\lambda(X_{r'}))) U_{r'} = \lim_{r'} X_{r'} = \mathrm{Diag}\,\mu,$$
so that $U \in \mathcal{W}$. Hence
$$\begin{aligned}
(f \circ \lambda)^\circ(\mathrm{Diag}\,\mu; Z) &= \lim_{r'} \langle U_{r'}^* (\mathrm{Diag}(f'(\lambda(X_{r'})))) U_{r'}, Z \rangle \\
&= \lim_{r'} \langle f'(\lambda(X_{r'})), \mathrm{diag}(U_{r'} Z U_{r'}^*) \rangle \\
&= \lim_{r'} \langle f'(\lambda(X_{r'})), \mathrm{diag}(U Z U^*) \rangle \\
&\le \limsup_{\nu \to \mu} \langle f'(\nu), \mathrm{diag}(U Z U^*) \rangle \le f^\circ(\mu; \mathrm{diag}(U Z U^*)).
\end{aligned}$$
Thus we have proved `$\le$' in formula (3.4).

On the other hand, fix a matrix $U$ in $\mathcal{W}$. Again by [3, page 64], there is a sequence of points $\mu_r \to \mu$, with $f$ differentiable at each $\mu_r$, satisfying
$$\begin{aligned}
f^\circ(\mu; \mathrm{diag}(U Z U^*)) &= \lim_r \langle f'(\mu_r), \mathrm{diag}(U Z U^*) \rangle \\
&= \lim_r \langle \mathrm{Diag}(f'(\mu_r)), U Z U^* \rangle \\
&= \lim_r \langle (f \circ \lambda)'(\mathrm{Diag}\,\mu_r), U Z U^* \rangle \\
&\le \limsup_{Y \to \mathrm{Diag}\,\mu} \langle (f \circ \lambda)'(Y), U Z U^* \rangle \\
&= (f \circ \lambda)^\circ(\mathrm{Diag}\,\mu; U Z U^*) \\
&= (f \circ \lambda)^\circ(\mathrm{Diag}\,\mu; Z),
\end{aligned}$$
using Corollary 2.5 and Lemma 3.1. $\Box$
We next translate Theorem 3.3 into a statement about the Clarke generalized gradient. We define another set of $n \times n$ matrices,
$$(3.5)\qquad \mathcal{D} = \{U^* (\mathrm{Diag}\,\rho) U \mid U \in \mathcal{W},\; \rho \in \partial f(\mu)\}.$$
Notice that since $\partial f(\mu)$ is a compact set in $\mathbf{R}^n$ and $\mathcal{W}$ is a compact set in $\mathbf{C}^{n \times n}$, and since the map $(\rho, U) \mapsto U^* (\mathrm{Diag}\,\rho) U$ is clearly continuous, it follows that $\mathcal{D}$ is compact.
Corollary 3.6  The Clarke generalized gradient $\partial (f \circ \lambda)(\mathrm{Diag}\,\mu)$ is the convex hull of the set $\mathcal{D}$.

Proof.  The convex hull of $\mathcal{D}$ and the generalized gradient are both compact, convex sets, so it will suffice to see that the two corresponding support functions are identical [3, page 28]. The support function of the convex hull of $\mathcal{D}$, evaluated at a Hermitian matrix $Z$, is (using Theorem 3.3)
$$\begin{aligned}
\max\{\langle Z, Y \rangle \mid Y \in \mathcal{D}\} &= \max\{\langle Z, U^* (\mathrm{Diag}\,\rho) U \rangle \mid U \in \mathcal{W},\; \rho \in \partial f(\mu)\} \\
&= \max\{\max\{\langle \mathrm{diag}(U Z U^*), \rho \rangle \mid \rho \in \partial f(\mu)\} \mid U \in \mathcal{W}\} \\
&= \max\{f^\circ(\mu; \mathrm{diag}(U Z U^*)) \mid U \in \mathcal{W}\} \\
&= (f \circ \lambda)^\circ(\mathrm{Diag}\,\mu; Z),
\end{aligned}$$
which is the support function of the required Clarke generalized gradient. $\Box$

To continue the proof of Theorem 1.4 we need to show that the set $\mathcal{D}$ is convex. The first step is an alternative description of the set $\mathcal{W}$. Suppose that
$$(3.7)\qquad \mu_1 = \mu_2 = \dots = \mu_{k_1} > \mu_{k_1+1} = \dots = \mu_{k_2} > \mu_{k_2+1} \dots = \mu_{k_r},$$
and define $k_0 = 0$.
Proposition 3.8  The set $\mathcal{W}$ consists of all block-diagonal matrices of the form $\mathrm{Diag}(U_1, U_2, \dots, U_r)$, with $U_j$ in $U_{k_j - k_{j-1}}$, for $j = 1, 2, \dots, r$.

Proof.  This is standard: matrices in $\mathcal{W}$ have the form $(u^1, u^2, \dots, u^n)$, with columns an orthonormal basis of eigenvectors for $\mathrm{Diag}\,\mu$, and since the standard unit vector $e_i$ is an eigenvector with eigenvalue $\mu_i$, we have $\langle u^j, e_i \rangle = 0$ whenever $\mu_j \ne \mu_i$. The result then follows. $\Box$
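Proposition 3.8 is easy to visualize computationally. The sketch below (our addition; it assumes SciPy's `unitary_group` sampler is available) builds a block-diagonal unitary conforming to the eigenvalue blocks of $\mu$ and confirms it commutes with $\mathrm{Diag}\,\mu$, while a generic unitary does not.

```python
# Sketch (not from the paper) illustrating Proposition 3.8: block-diagonal
# unitaries, with blocks matching the repeated eigenvalues of mu, fix Diag(mu)
# under conjugation, while a generic unitary does not.
import numpy as np
from scipy.stats import unitary_group        # convenient random-unitary sampler

mu = np.array([3.0, 3.0, 1.0, 1.0, 1.0])     # blocks of sizes 2 and 3
D = np.diag(mu).astype(complex)

U1 = unitary_group.rvs(2, random_state=0)    # U_1 in U_2
U2 = unitary_group.rvs(3, random_state=1)    # U_2 in U_3
U = np.block([[U1, np.zeros((2, 3))], [np.zeros((3, 2)), U2]])

print(np.allclose(U.conj().T @ D @ U, D))    # True: U lies in the set W
V = unitary_group.rvs(5, random_state=2)     # a generic unitary
print(np.allclose(V.conj().T @ D @ V, D))    # generically False
```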
Lemma 3.9  Suppose that for vectors $\rho^j$ in $\mathbf{R}^{k_j - k_{j-1}}$ ($j = 1, 2, \dots, r$), the (partitioned) vector $\rho = (\rho^1, \rho^2, \dots, \rho^r)$ lies in $\partial f(\mu)$. Then for any doubly stochastic matrices $S_j$ in $\mathcal{S}_{k_j - k_{j-1}}$ ($j = 1, 2, \dots, r$), the (partitioned) vector $(S_1 \rho^1, S_2 \rho^2, \dots, S_r \rho^r)$ also lies in $\partial f(\mu)$.

Proof.  By Birkhoff's Theorem [11], since the set $\partial f(\mu)$ is convex, it will suffice to prove the result when each $S_j$ is a permutation matrix. In this case, define the permutation matrix $P = \mathrm{Diag}(S_1, S_2, \dots, S_r)$, and note that $P\mu = \mu$. Now since $f$ is symmetric, $f(\nu) = f(P\nu)$ for all points $\nu$ close to $\mu$, so by the chain rule [3, Theorem 2.3.10], $\partial f(\mu) = P^T \partial f(P\mu) = P^T \partial f(\mu)$. Hence $P\rho = (S_1 \rho^1, S_2 \rho^2, \dots, S_r \rho^r) \in \partial f(\mu)$. $\Box$

We can now derive an alternative description of the set $\mathcal{D}$.
Corollary 3.10  The set $\mathcal{D}$ consists of all block-diagonal matrices of the form $\mathrm{Diag}(D_1, D_2, \dots, D_r)$, with $D_j$ in $H_{k_j - k_{j-1}}$, for $j = 1, 2, \dots, r$, and with the vector $(\lambda(D_1), \lambda(D_2), \dots, \lambda(D_r))$ in $\partial f(\mu)$.

Proof.  This follows easily by applying Proposition 3.8 and Lemma 3.9 (with the $S_j$'s chosen as suitable permutation matrices). $\Box$
Lemma 3.11  For any two $m \times m$ Hermitian matrices $D$ and $E$ and any real $\gamma$ in $[0, 1]$, there is an $m \times m$ doubly stochastic matrix $S$ with
$$\lambda(\gamma D + (1 - \gamma) E) = S (\gamma \lambda(D) + (1 - \gamma) \lambda(E)).$$

Proof.  In the Schur partial order, the vector $\gamma \lambda(D) + (1 - \gamma) \lambda(E)$ majorizes the vector $\lambda(\gamma D + (1 - \gamma) E)$: in other words [15],
$$\sum_{i=1}^j \lambda_i(\gamma D + (1 - \gamma) E) \le \sum_{i=1}^j \left( \gamma \lambda_i(D) + (1 - \gamma) \lambda_i(E) \right)$$
for $j = 1, 2, \dots, m$, with equality for $j = m$. This follows from Fan's result that the function $\sum_{i=1}^j \lambda_i(\cdot)$ is convex (c.f. [9]). The result now follows from [15, page 11]. $\Box$
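The majorization in the proof is easy to observe numerically (our Python/NumPy sketch, not from the paper): every partial sum of $\lambda(\gamma D + (1-\gamma)E)$ is dominated by the corresponding partial sum of $\gamma \lambda(D) + (1-\gamma) \lambda(E)$, with equality for the full sum.

```python
# Numerical illustration (not from the paper) of the majorization in
# Lemma 3.11 for random real symmetric D and E.
import numpy as np

rng = np.random.default_rng(4)
A, B = rng.standard_normal((6, 6)), rng.standard_normal((6, 6))
D, E = (A + A.T) / 2, (B + B.T) / 2
gamma = 0.3

def lam(X):                          # eigenvalues in nonincreasing order
    return np.sort(np.linalg.eigvalsh(X))[::-1]

lhs = np.cumsum(lam(gamma * D + (1 - gamma) * E))
rhs = np.cumsum(gamma * lam(D) + (1 - gamma) * lam(E))
print(np.all(lhs <= rhs + 1e-12))    # True: partial sums are dominated
print(np.isclose(lhs[-1], rhs[-1]))  # True: the traces agree
```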
Theorem 3.12  The set $\mathcal{D}$ is convex.

Proof.  Suppose that the matrices $A$ and $B$ belong to $\mathcal{D}$, and fix a real $\gamma$ in $[0, 1]$. By Corollary 3.10 there are matrices $D_j$ and $E_j$ in $H_{k_j - k_{j-1}}$ for $j = 1, 2, \dots, r$ with $A = \mathrm{Diag}(D_1, D_2, \dots, D_r)$, $B = \mathrm{Diag}(E_1, E_2, \dots, E_r)$, and with both the vectors $(\lambda(D_1), \lambda(D_2), \dots, \lambda(D_r))$ and $(\lambda(E_1), \lambda(E_2), \dots, \lambda(E_r))$ in $\partial f(\mu)$. By Lemma 3.11 there exist matrices $S_j$ in $\mathcal{S}_{k_j - k_{j-1}}$ with
$$(3.13)\qquad \lambda(\gamma D_j + (1 - \gamma) E_j) = S_j \left( \gamma \lambda(D_j) + (1 - \gamma) \lambda(E_j) \right),$$
for $j = 1, 2, \dots, r$. Since the set $\partial f(\mu)$ is convex we have that
$$\gamma (\lambda(D_1), \lambda(D_2), \dots, \lambda(D_r)) + (1 - \gamma) (\lambda(E_1), \lambda(E_2), \dots, \lambda(E_r)) \in \partial f(\mu).$$
Hence by Lemma 3.9 and (3.13),
$$(\lambda(\gamma D_1 + (1 - \gamma) E_1), \lambda(\gamma D_2 + (1 - \gamma) E_2), \dots, \lambda(\gamma D_r + (1 - \gamma) E_r)) \in \partial f(\mu),$$
whence $\gamma A + (1 - \gamma) B \in \mathcal{D}$ by Corollary 3.10. $\Box$
We have now proved, by Corollary 3.6, that $\partial (f \circ \lambda)(\mathrm{Diag}\,\mu) = \mathcal{D}$. The general result follows by a simple change of variables.
Proof of Theorem 1.4.  Pick a unitary $V$ with $V^* (\mathrm{Diag}(\lambda(X))) V = X$. Since $(f \circ \lambda)(V Y V^*) = (f \circ \lambda)(Y)$ for all Hermitian $Y$ close to $X$, we can apply the chain rule [3, Theorem 2.3.10] at $Y = X$ (observing that the linear map $Y \mapsto V Y V^*$ is invertible) to deduce that
$$\begin{aligned}
\partial (f \circ \lambda)(X) &= V^* (\partial (f \circ \lambda)(V X V^*)) V = V^* (\partial (f \circ \lambda)(\mathrm{Diag}(\lambda(X)))) V \\
&= \{V^* U^* (\mathrm{Diag}\,\rho) U V \mid \rho \in \partial f(\lambda(X)),\; U \in U_n,\; U^* (\mathrm{Diag}(\lambda(X))) U = \mathrm{Diag}(\lambda(X))\} \\
&= \{W^* (\mathrm{Diag}\,\rho) W \mid \rho \in \partial f(\lambda(X)),\; W \in U_n,\; W^* (\mathrm{Diag}(\lambda(X))) W = X\},
\end{aligned}$$
as required. $\Box$
Example [4].  Let the function $f : \mathbf{R}^n \to \mathbf{R}$ be defined by
$$f(\mu) = i\text{th largest element of } \{\mu_1, \mu_2, \dots, \mu_n\}.$$
Notice that $f$ is symmetric and locally Lipschitz on $\mathbf{R}^n$, and the corresponding spectral function is given by
$$f(\lambda(X)) = i\text{th largest eigenvalue of } X.$$
Suppose that the point $\mu$ in $\mathbf{R}^n$ satisfies (3.7) and that $k_j < i \le k_{j+1}$. Then it is easy to compute (for example using [3, Theorem 2.5.1]) that
$$\partial f(\mu) = \mathrm{conv}\{e_l \mid k_j < l \le k_{j+1}\}.$$
Using this observation, it is a straightforward consequence of Theorem 1.4 that for any Hermitian matrix $X$,
$$\partial (f \circ \lambda)(X) = \mathrm{conv}\{u u^* \mid X u = \lambda_i(X) u,\; \|u\| = 1\},$$
as observed in [4]. $\Box$
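For the convex case $i = 1$ (the largest eigenvalue), where the Clarke generalized gradient coincides with the convex subdifferential, this description can be checked numerically: the directional derivative of $\lambda_1$ at $X$ along $Z$ equals the support function of $\mathrm{conv}\{uu^* \mid Xu = \lambda_1(X)u,\ \|u\| = 1\}$ at $Z$, which works out to the largest eigenvalue of $Q^T Z Q$ for $Q$ an orthonormal basis of the top eigenspace. The sketch below (our addition, Python/NumPy, real symmetric case) confirms this.

```python
# Sketch (not from the paper): for the largest eigenvalue at a matrix X whose
# top eigenvalue is repeated, the directional derivative along Z matches the
# support of conv{u u^T : X u = lambda_1(X) u, |u| = 1}, i.e. the largest
# eigenvalue of Q^T Z Q with Q spanning the top eigenspace.
import numpy as np

rng = np.random.default_rng(5)
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
X = Q0 @ np.diag([2.0, 2.0, -1.0, -3.0]) @ Q0.T   # top eigenvalue doubled

A = rng.standard_normal((4, 4))
Z = (A + A.T) / 2

w, V = np.linalg.eigh(X)
Q = V[:, w > 2.0 - 1e-8]                 # orthonormal basis of the top eigenspace

t = 1e-7
fd = (np.max(np.linalg.eigvalsh(X + t * Z)) - np.max(np.linalg.eigvalsh(X))) / t
dd = np.max(np.linalg.eigvalsh(Q.T @ Z @ Q))      # support of the subgradients
print(fd, dd)                            # should agree to O(t)
```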
To conclude, we observe how to obtain a special case of the convex result, Theorem 1.3, from the locally Lipschitz result, Theorem 1.4. Suppose then in Theorem 1.4 that the set $\Omega$ is convex and that the function $f$ is convex. If for some Hermitian $Y$ with $\lambda(Y)$ in $\partial f(\lambda(X))$ there exists a unitary $U$ with $X = U^* (\mathrm{Diag}(\lambda(X))) U$ and $Y = U^* (\mathrm{Diag}(\lambda(Y))) U$, then by taking $\rho = \lambda(Y)$ and applying Theorem 1.4 we deduce that $Y$ lies in $\partial (f \circ \lambda)(X)$.

Conversely, suppose that $Y$ lies in $\partial (f \circ \lambda)(X)$, so by Theorem 1.4 there is a vector $\rho$ in $\partial f(\lambda(X))$ and a unitary $U$ with $U^* (\mathrm{Diag}(\lambda(X))) U = X$ and $U^* (\mathrm{Diag}\,\rho) U = Y$. Let $\bar\rho$ be the vector with components $\rho_1, \rho_2, \dots, \rho_n$ permuted into nonincreasing order, so that $\lambda(Y) = \bar\rho$. Then, writing $f^*$ for the Fenchel conjugate of $f$ (which is symmetric since $f$ is), and using the fact that $\lambda(X)$ has nonincreasing components,
$$f(\lambda(X)) + f^*(\rho) = \rho^T \lambda(X) \le \bar\rho^T \lambda(X) \le f(\lambda(X)) + f^*(\bar\rho) = f(\lambda(X)) + f^*(\rho),$$
whence $\bar\rho \in \partial f(\lambda(X))$, and there is a permutation matrix $P$ with $P\rho = \bar\rho$ and $P\lambda(X) = \lambda(X)$ [14, Lemma 2.1]. Now $Y = (PU)^* (\mathrm{Diag}(\lambda(Y))) (PU)$ and $X = (PU)^* (\mathrm{Diag}(\lambda(X))) (PU)$, giving the result of Theorem 1.3.
References

[1] F. Alizadeh. Optimization over the positive definite cone: interior point methods and combinatorial applications. In P. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 1-25. North-Holland, Amsterdam, 1992.

[2] J. Burke and M.L. Overton. On the subdifferentiability of functions of a matrix spectrum: I: Mathematical foundations; II: Subdifferential formulas. In F. Giannessi, editor, Nonsmooth Optimization, 1993.

[3] F.H. Clarke. Optimization and Nonsmooth Analysis. Wiley, New York, 1983.

[4] S. Cox and M. Overton, 1994. Private communication.

[5] J. Cullum, W.E. Donath, and P. Wolfe. The minimization of certain nondifferentiable sums of eigenvalues of symmetric matrices. Mathematical Programming Study, 3:35-55, 1975.

[6] K. Fan. On a theorem of Weyl concerning eigenvalues of linear transformations. Proceedings of the National Academy of Sciences of the U.S.A., 35:652-655, 1949.

[7] R. Fletcher. Semi-definite matrix constraints in optimization. SIAM Journal on Control and Optimization, 23:493-513, 1985.

[8] R. Fletcher. A new variational result for quasi-Newton formulae. SIAM Journal on Optimization, 1:18-21, 1991.

[9] S. Friedland. Convex spectral functions. Linear and Multilinear Algebra, 9:299-316, 1981.

[10] J.-B. Hiriart-Urruty and D. Ye. Sensitivity analysis of all eigenvalues of a symmetric matrix. Technical report, Laboratoire d'Analyse Numérique, Université Paul Sabatier, Toulouse, France, 1992.

[11] R.A. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, U.K., 1985.

[12] F. Jarre. An interior-point method for minimizing the maximum eigenvalue of a linear combination of matrices. SIAM Journal on Control and Optimization, 31:1360-1377, 1993.

[13] T. Kato. A Short Introduction to Perturbation Theory for Linear Operators. Springer-Verlag, New York, 1982.

[14] A.S. Lewis. Convex analysis on the Hermitian matrices. Technical Report CORR 93-33, University of Waterloo, 1993. Submitted to SIAM Journal on Optimization.

[15] A.W. Marshall and I. Olkin. Inequalities: Theory of Majorization and Its Applications. Academic Press, New York, 1979.

[16] Y.E. Nesterov and A.S. Nemirovsky. Interior Point Polynomial Methods in Convex Programming. SIAM Publications, Philadelphia, 1993.

[17] M.L. Overton. On minimizing the maximum eigenvalue of a symmetric matrix. SIAM Journal on Matrix Analysis and Applications, 9:256-268, 1988.

[18] M.L. Overton. Large-scale optimization of eigenvalues. SIAM Journal on Optimization, 2:88-120, 1992.

[19] M.L. Overton and R.S. Womersley. Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices. Mathematical Programming, Series B, 62:321-357, 1993.

[20] E. Polak and Y. Wardi. A nondifferentiable optimization algorithm for structural problems with eigenvalue inequality constraints. Journal of Structural Mechanics, 11:561-577, 1983.

[21] F. Rendl and H. Wolkowicz. Applications of parametric programming and eigenvalue maximization to the quadratic assignment problem. Mathematical Programming, 53:63-78, 1992.

[22] A. Shapiro. Extremal problems on the set of nonnegative definite matrices. Linear Algebra and its Applications, 67:7-18, 1985.

[23] A. Shapiro and M.K.H. Fan. On eigenvalue optimization. SIAM Journal on Optimization, 1994. To appear.

[24] G.A. Watson. An algorithm for optimal $\ell_2$ scaling of matrices. IMA Journal of Numerical Analysis, 11:481-492, 1991.

[25] H. Wolkowicz. Explicit solutions for interval semidefinite linear programs. Technical Report CORR 93-29, Department of Combinatorics and Optimization, University of Waterloo, 1993.

[26] B. Yang and R.J. Vanderbei. The simplest semidefinite programs are trivial. Technical report, Program in Statistics and Operations Research, Princeton University, 1993.