An implementation of the steepest descent method using retractions on Riemannian manifolds

Ever F. Cruzado Quispe∗ and Erik A. Papa Quiroz†



Abstract In 2008 Absil et al. [1] published a book on optimization methods on Riemannian manifolds. The authors developed steepest descent, Newton, trust-region and conjugate gradient methods using an approximation of the geodesic called a retraction. In this paper we present implementations of the steepest descent method of Absil et al. [1] using the Matlab software. We show the implementation and numerical results for minimizing the Rayleigh quotient on the unit sphere and the Brockett function on the Stiefel manifold, which are strongly related to eigenvalue problems in computational linear algebra. Moreover, we consider nonconvex functions studied in the paper of Papa Quiroz et al. [7], verifying that for these functions the method also has good performance.

Keywords: Optimization on manifolds, Riemannian manifolds, retractions, steepest descent method.

1 Introduction

We are interested in studying the problem:

    min f(x)   s.to: x ∈ M,                                  (1.1)

where M is a Riemannian manifold and f is a function defined on M. Problem (1.1) belongs to the branch of mathematical optimization on Riemannian manifolds, which arises as a natural extension of the theory and optimization methods of Euclidean space to more general spaces. One of the advantages of using Riemannian geometry tools is that constrained optimization problems can be seen as unconstrained ones by considering the intrinsic properties of the manifold; another advantage is that optimization problems with nonconvex objective functions become convex by choosing an appropriate Riemannian metric. We will study the generalization of the steepest descent method from Euclidean spaces to Riemannian manifolds. This method generates a sequence of points xk given by xk+1 = Rxk(tk ηk), where ηk ∈ Txk M (Txk M is the tangent space to the manifold M at the point xk), tk is a scalar determined by an Armijo search in order to obtain the next point xk+1, and R is a map called a retraction, defined as a smooth map from the tangent bundle onto the manifold, i.e., R : T M → M; furthermore, if we denote by Rx the restriction of R to Tx M, it must satisfy the following conditions:

∗ Callao National University, [email protected]
† Federal University of Rio de Janeiro, [email protected]


• Rx(0x) = x, where 0x ∈ Tx M is the null vector.
• With the canonical identification T0x Tx M ≃ Tx M, Rx satisfies D Rx(0x) = idTx M, where idTx M denotes the identity map on Tx M.

Many key ideas for line search methods were developed by LUENBERGER (1972) [6], who proposed to use a search direction along a geodesic, obtained by projecting the gradient vector in R^n onto the tangent space of the constraint set. Other contributions to optimization on manifolds can be found in GABAY (1982) [5], who generalized quasi-Newton methods obtaining superlinear convergence; there are also contributions of UDRISTE (1994) [8] and of YANG (2007) [9], who proposed a strategy of Armijo line search along geodesics. In this work we study the retraction map from a practical point of view, performing implementations of the steepest descent method on Riemannian manifolds to minimize the Rayleigh quotient on the unit sphere and the Brockett function on the Stiefel manifold. Later, we present some computational experiments showing that the method also works for nonconvex functions, which were taken from the paper of Papa Quiroz et al. [7]. Our main contribution is to implement the respective algorithms using MATLAB.

The paper is organized as follows. In Section 2 we give some basic results on manifolds and retractions that we will use throughout the paper. In Section 3 we introduce the steepest descent method. In Section 4 we give some convergence results for the sequence generated by the method. In Section 5 we show the applications with their respective algorithms. Finally, in Section 6 we perform some computational experiments using the method on nonconvex functions, for which we also implement the corresponding algorithms and show the computational results.

2 Basic results

2.1 Differentiable manifolds

The abstract definition of a manifold relies on the concepts of charts and atlases, so first we define these concepts.

Definition 2.1 (Chart) Let M be a set. A d-dimensional chart of the set M is a bijection ϕ of a subset U of M onto an open subset of R^d, i.e.,
    ϕ : U ⊂ M → S ⊂ R^d,  x ↦ ϕ(x).
A chart is denoted by (U, ϕ); if there is no risk of confusion, we will simply write ϕ.

Definition 2.2 (Atlas) A C∞ atlas of M into R^d is a collection of charts {(Uα, ϕα) : α ∈ I} of the set M such that:
1. ∪α∈I Uα = M,

2. for any pair α, β ∈ I with Uα ∩ Uβ ≠ ∅, the sets ϕα(Uα ∩ Uβ) and ϕβ(Uα ∩ Uβ) are open sets in R^d and the change of coordinates
    ϕβ ∘ ϕα^{-1} : ϕα(Uα ∩ Uβ) ⊂ R^d → ϕβ(Uα ∩ Uβ) ⊂ R^d
is smooth with its inverse also smooth (i.e., it is a diffeomorphism).

Two atlases A1 and A2 are equivalent or compatible if A1 ∪ A2 is an atlas; in other words, for every chart (U, ϕ) in A2, the set of charts A1 ∪ {(U, ϕ)} is still an atlas. Given an atlas A, let A+ be the set of all charts (U, ϕ) such that A ∪ {(U, ϕ)} is also an atlas; the atlas A+ is called the maximal (or complete) atlas generated by the atlas A. A maximal atlas is called a differentiable structure.

Definition 2.3 (Differentiable manifold) A (d-dimensional) differentiable manifold is a pair (M, A+), where M is a set and A+ is a maximal atlas of M into R^d, such that the topology induced by A+ is Hausdorff and second countable.

2.2 Submanifolds

Let (M, A+) and (N, B+) be two differentiable manifolds such that N ⊂ M. The manifold (N, B+) is called an immersed submanifold of (M, A+) if the inclusion map i : N → M defined by i(x) = x is an immersion (see Absil et al. [1], page 24). Let (N, B+) be a submanifold of (M, A+). Since M and N are also topological spaces with the manifold topology, if the manifold topology of N coincides with the subspace topology induced by the topological space M, then N is called an embedded submanifold, a regular submanifold, or simply a submanifold of the manifold M.

2.3 Riemannian manifolds

A Riemannian manifold is a differentiable manifold in which every tangent space Tx M is endowed with an inner product ⟨·, ·⟩x (i.e., a bilinear, symmetric positive-definite form) and, moreover, this inner product varies smoothly. This fact allows us to define several metric notions, such as lengths of curves, gradients of functions, etc. The smoothly varying inner product is called a Riemannian metric. We will use the following notations to denote the inner product of two elements ξx and ζx of Tx M:
    g(ξx, ζx) = gx(ξx, ζx) = ⟨ξx, ζx⟩ = ⟨ξx, ζx⟩x.
Strictly speaking, a Riemannian manifold is a pair (M, g), where M is a differentiable manifold and g is a Riemannian metric on M.

Definition 2.4 Given a smooth scalar function f defined on a Riemannian manifold M, the gradient of f at x, denoted by grad f(x), is defined as the unique element of Tx M that satisfies
    ⟨grad f(x), ξ⟩x = D f(x)[ξ],  ∀ ξ ∈ Tx M.                  (2.2)

The gradient of a function has the following remarkable steepest ascent properties:
• The direction of grad f(x) is the direction of steepest ascent of f at x:
    grad f(x)/‖grad f(x)‖ = arg max { D f(x)[ξ] : ξ ∈ Tx M, ‖ξ‖x = 1 }.
• The norm of grad f(x) gives the steepest slope of f at x:
    ‖grad f(x)‖ = D f(x)[ grad f(x)/‖grad f(x)‖ ].
These two properties are very important in the scope of optimization methods.
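As a small numerical illustration of Definition 2.4 (a sketch with test data of our own choosing, assuming R^n with the Euclidean metric and the function f(x) = x^T A x, whose gradient 2Ax appears later in Section 5), the defining identity ⟨grad f(x), ξ⟩ = D f(x)[ξ] can be checked against a finite difference in MATLAB:

% Numeric check of <grad f(x), xi> = Df(x)[xi] for f(x) = x'*A*x in R^n.
% The matrix A, the point x and the direction xi are arbitrary test data.
n  = 4;
A  = randn(n); A = (A + A')/2;            % symmetric test matrix
x  = randn(n,1);
xi = randn(n,1);                          % arbitrary direction in T_x R^n = R^n
gradf = 2*A*x;                            % Euclidean gradient of f
h  = 1e-6;                                % finite-difference step
Df = ((x+h*xi)'*A*(x+h*xi) - x'*A*x)/h;   % approximate directional derivative
fprintf('inner product <grad f(x), xi> : %g\n', gradf'*xi);
fprintf('finite-difference Df(x)[xi]   : %g\n', Df);

The two printed values agree up to the finite-difference error.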

2.3.1 Riemannian submanifolds

Let M be a regular submanifold of a Riemannian manifold M̄. Since every tangent space Tx M can be considered as a subspace of Tx M̄, the Riemannian metric ḡ of M̄ induces a Riemannian metric g on M according to
    gx(ξ, ζ) = ḡx(ξ, ζ),  ξ, ζ ∈ Tx M,
where the ξ and ζ on the right-hand side of the equality are viewed as elements of Tx M̄. This turns M into a Riemannian manifold. Endowed with this Riemannian metric, M is called a Riemannian submanifold of M̄. The orthogonal complement of Tx M in Tx M̄ is called the normal space to M at x and is denoted by (Tx M)⊥:
    (Tx M)⊥ = {ξ ∈ Tx M̄ : ḡx(ξ, ζ) = 0, ∀ ζ ∈ Tx M}.
Any element ξ ∈ Tx M̄ can be uniquely decomposed into the sum of an element of Tx M and an element of (Tx M)⊥:
    ξ = Px ξ + Px⊥ ξ,
where Px denotes the orthogonal projection onto Tx M and Px⊥ denotes the orthogonal projection onto (Tx M)⊥.

Example 2.1 The unit sphere S^{n-1} is considered a Riemannian submanifold of R^n; the inner product inherited from the standard inner product on R^n is given by
    ⟨ξ, η⟩ := ξ^T η.                                          (2.3)

The normal space is (Tx S^{n-1})⊥ = {αx : α ∈ R}, and the projections are given by Px ξ = (I − x x^T)ξ and Px⊥ ξ = x x^T ξ, for x ∈ S^{n-1}.

Remark 2.1 Let f̄ be a function defined on a Riemannian manifold M̄ and let f denote the restriction of f̄ to a Riemannian submanifold M. The gradient of f is equal to the projection of the gradient of f̄ onto Tx M, that is,
    grad f(x) = Px grad f̄(x).                                 (2.4)
In fact, Px grad f̄(x) belongs to Tx M and equation (2.2) is satisfied since, for all ζ ∈ Tx M, we have:
    ⟨Px grad f̄(x), ζ⟩ = ⟨grad f̄(x) − Px⊥ grad f̄(x), ζ⟩ = ⟨grad f̄(x), ζ⟩ = D f̄(x)[ζ] = D f(x)[ζ].
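For the unit sphere of Example 2.1 and the ambient function f̄(x) = x^T A x (whose Euclidean gradient 2Ax is used later in Section 5), relation (2.4) can be sketched in a few lines of MATLAB with arbitrary test data:

% Riemannian gradient on S^{n-1} obtained by projecting the ambient gradient,
% following (2.4); A and y below are arbitrary test data.
n = 5;
A = randn(n); A = (A + A')/2;       % symmetric test matrix
y = randn(n,1); x = y/norm(y);      % a point on the unit sphere
gradbar = 2*A*x;                    % gradient of fbar(x) = x'*A*x in R^n
gradf   = (eye(n) - x*x')*gradbar;  % P_x gradbar = Riemannian gradient of f
fprintf('tangency check x''*gradf = %g\n', x'*gradf);   % should be ~0

The tangency check confirms that the projected gradient lies in Tx S^{n-1}.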

2.4 Retractions

A retraction on a manifold M is a smooth map R from the tangent bundle T M onto M with the following property: if for any x ∈ M we denote by Rx the restriction of R to Tx M, i.e.,
    Rx : Tx M → M,  ξ ↦ Rx(ξ),
then it must satisfy the following properties:
(i) Rx(0x) = x, where 0x denotes the zero element of Tx M.

(ii) With the canonical identification T0x Tx M ≃ Tx M, Rx satisfies
    D Rx(0x) = idTx M,                                        (2.5)

where idTx M denotes the identity map on Tx M. Concerning condition (2.5), notice that, since Rx is a map from Tx M to M sending 0x to x, it follows that D Rx(0x) is a map from T0x(Tx M) to Tx M. Since Tx M is a vector space, there exists a natural identification T0x(Tx M) ≃ Tx M. The condition D Rx(0x) = idTx M is called the local rigidity condition.

Proposition 2.1 Let M be a regular submanifold of a vector space E and let N be an abstract manifold such that dim(M) + dim(N) = dim(E). Assume that there exists a diffeomorphism
    φ : M × N → E∗,  (F, G) ↦ φ(F, G),
where E∗ is an open subset of E (thus E∗ is an open submanifold of E), with a neutral element I ∈ N satisfying φ(F, I) = F for all F ∈ M. Under the above assumptions on φ, the map
    RX(ξ) := π1(φ^{-1}(X + ξ)),
where π1 : M × N → M defined by π1(F, G) = F is the projection onto the first component, defines a retraction on M.
Proof. See Proposition 4.1.2 of Absil et al. (2008), [1].

Example 2.2 (Retraction on the Stiefel manifold) Consider the Stiefel manifold St(p, n) = {X ∈ R^{n×p} : X^T X = Ip}. A retraction is
    RX(ξ) := qf(X + ξ),
where qf(A) denotes the Q factor of the decomposition of a full-column-rank matrix A ∈ R∗^{n×p} as A = QR, where Q ∈ St(p, n) and R is an upper triangular p × p matrix with strictly positive diagonal elements.
Proof. See Example 4.1.3 of Absil et al. (2008), [1].
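In MATLAB, the qf factor used in this retraction can be sketched with the built-in qr function, adjusting column signs so that the triangular factor has a strictly positive diagonal; the helper below (the name qf and the sign-fixing strategy are ours, one possible realization rather than the paper's code) illustrates the idea:

function Q = qf(B)
% Q factor of the QR decomposition of B (an n x p matrix of full column rank),
% normalized so that the triangular factor R has strictly positive diagonal,
% as required in Example 2.2.
[Q, R] = qr(B, 0);        % economy-size QR: Q is n x p, R is p x p
s = sign(diag(R));
s(s == 0) = 1;            % guard against zero diagonal entries
Q = Q*diag(s);            % flip column signs so that diag(R) > 0
end

With such a helper, RX(ξ) = qf(X + ξ) maps a tangent vector back to a point of St(p, n).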



Example 2.3 (Retraction on the sphere S^{n-1}) Let M = S^{n-1} and let N = {r ∈ R : r > 0}, and consider the map
    φ : M × N → R∗^n,  (x, r) ↦ xr.
It is straightforward to verify that φ is a diffeomorphism; then by Proposition 2.1 we obtain the retraction
    Rx(ξ) = (x + ξ)/‖x + ξ‖,
defined for all ξ ∈ Tx S^{n-1}. The point Rx(ξ) is the point of S^{n-1} that minimizes the distance to x + ξ.
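The two defining properties of a retraction can also be verified numerically for this choice; the short MATLAB sketch below (with test data of our own choosing) checks Rx(0x) = x and the local rigidity condition (2.5) through a finite difference:

% Numeric check of properties (i) and (2.5) for R_x(xi) = (x+xi)/||x+xi||
% on S^{n-1}; the point x and the tangent vector xi are test data.
n  = 4;
y  = randn(n,1); x = y/norm(y);          % a point on the sphere
xi = randn(n,1); xi = xi - x*(x'*xi);    % tangent vector: x'*xi = 0
R  = @(v) (x + v)/norm(x + v);           % retraction of Example 2.3
fprintf('||R_x(0) - x||           = %g\n', norm(R(zeros(n,1)) - x));
t  = 1e-6;                               % finite-difference step
fprintf('||(R_x(t*xi)-x)/t - xi|| = %g\n', norm((R(t*xi) - x)/t - xi));

Both printed quantities are of the order of the step t, as expected from (i) and (2.5).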


3 Steepest descent method

This method on Riemannian manifolds is based on the iteration formula xk+1 = Rxk(tk ηk), where ηk ∈ Txk M, tk is a scalar, and Rxk is a retraction. To obtain global convergence of the method we must impose some conditions on ηk and tk.

Definition 3.1 (Gradient-related sequence) Given a function f on a Riemannian manifold M, a sequence {ηk}, ηk ∈ Txk M, is gradient related if, for any subsequence {xk}k∈K of {xk} that converges to a noncritical point of f, the corresponding subsequence {ηk}k∈K is bounded and satisfies
    lim sup_{k→∞, k∈K} ⟨grad f(xk), ηk⟩ < 0.

Definition 3.2 (Armijo point) Given a function f on a Riemannian manifold M with retraction R, a point x ∈ M, a tangent vector η ∈ Tx M, and scalars α > 0, β, σ ∈ (0, 1), the Armijo point is η^A = t^A η = β^m α η, where m is the smallest nonnegative integer such that
    f(x) − f(Rx(β^m α η)) ≥ −σ ⟨grad f(x), β^m α η⟩x.
The real number t^A is the Armijo step size.

We now present Algorithm 1, which describes an Armijo line search.

Algorithm 1 Armijo line search
Require: A Riemannian manifold M, a continuously differentiable function f on M, a retraction R from T M to M, scalars α > 0, c, β, σ ∈ (0, 1), and an initial point x0 ∈ M.
Ensure: Sequence of iterates {xk}.
1: for k = 0, 1, 2, ... do
2:   Pick ηk ∈ Txk M such that the sequence {ηk}k=0,1,... is gradient related.
3:   Select xk+1 such that
       f(xk) − f(xk+1) ≥ c (f(xk) − f(Rxk(t^A_k ηk))),        (3.6)
     where t^A_k is the Armijo step size for the given α, β, σ and ηk.
4: end for
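To make the structure of Algorithm 1 concrete, the following minimal MATLAB sketch implements it for the common choice ηk = −grad f(xk) and c = 1 (the choice also used in Section 5), with the cost function, the Riemannian gradient and the retraction passed as function handles. The function name sd_armijo and its interface are ours, not part of the paper's implementation, and points are assumed to be column vectors of R^n, as in the sphere examples.

function x = sd_armijo(f, gradf, Retr, x0, alpha, beta, sigma, epsilon, Nmax)
% Sketch of Algorithm 1 with eta_k = -grad f(x_k) and c = 1.
% f, gradf and Retr are function handles supplied by the caller.
x = x0;
for k = 1:Nmax
    eta = -gradf(x);                       % gradient-related search direction
    if norm(eta) < epsilon, break; end     % stopping criterion
    m = 0;
    t = beta^m*alpha;
    % Armijo test: f(x) - f(R_x(t*eta)) >= -sigma*<grad f(x), t*eta> = sigma*t*eta'*eta
    while f(x) - f(Retr(x, t*eta)) < sigma*t*(eta'*eta)
        m = m + 1;
        t = beta^m*alpha;                  % Armijo step size
    end
    x = Retr(x, t*eta);                    % x_{k+1} = R_{x_k}(t_k eta_k)
end
end

For the Rayleigh quotient of Section 5.1, for instance, it could be called with f = @(x) x'*A*x, gradf = @(x) 2*(A*x - x*(x'*A*x)) and Retr = @(x,xi) (x+xi)/norm(x+xi).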

4 Convergence analysis

First we define the notions of convergence and limit points on manifolds; then we give a global convergence result for Algorithm 1.

4.1 Convergence on manifolds

The notion of convergence on manifolds is a generalization of the R^n case.

Definition 4.1 An infinite sequence {xk}k=0,1,... of points of a manifold M is said to be convergent if there exist a chart (U, ψ) of M, a point x∗ ∈ U, and a K > 0 such that xk is in U for all k ≥ K and such that the sequence {ψ(xk)}k=K,K+1,... converges to ψ(x∗).

Definition 4.2 Given a sequence {xk}k=0,1,..., we say that x is an accumulation point or limit point of the sequence if there exists a subsequence {xjk}k=0,1,... that converges to x. The set of accumulation points of a sequence is called the limit set of the sequence.

4.2 Convergence of the method

Theorem 4.1 Let {xk} be an infinite sequence of iterates generated by Algorithm 1. Then every accumulation point of {xk} is a critical point of the cost function f.
Proof. See Theorem 4.3.1 of Absil et al. (2008), [1].



Corollary 4.1 Let {xk} be an infinite sequence of iterates generated by Algorithm 1. Assume that the level set L = {x ∈ M : f(x) ≤ f(x0)} is compact (which holds in particular when M itself is compact). Then
    lim_{k→∞} ‖grad f(xk)‖ = 0.

Proof. See Corollary 4.3.2 of Absil et al. (2008), [1].

4.3 Stability of fixed points

Theorem 4.1 states that only critical points of a cost function f can be accumulation points of sequences {xk} generated by Algorithm 1; but it does not specify whether the accumulation points are local minimizers, local maximizers, or saddle points. In practice, if the initial point x0 is suitably chosen, Algorithm 1 generates sequences whose accumulation points are local minima of the cost function. This observation is supported by the following stability analysis of critical points.

Definition 4.3
1. Let F be a map from M to M. A point x∗ ∈ M is a fixed point of F if F(x∗) = x∗. Let F^(n) denote the result of n applications of F to x, i.e., F^(1)(x) = F(x), F^(i+1)(x) = F(F^(i)(x)), i = 1, 2, ....
2. A fixed point x∗ of F is a stable point of F if, for every neighborhood U of x∗, there exists a neighborhood V of x∗ such that, for all x ∈ V and all positive integers n, it holds that F^(n)(x) ∈ U.
3. The fixed point x∗ is asymptotically stable if it is stable and, moreover, lim_{n→∞} F^(n)(x) = x∗ for all x sufficiently close to x∗.
4. The fixed point x∗ is unstable if it is not stable; in other words, there exists a neighborhood U of x∗ such that, for every neighborhood V of x∗, there is a point x ∈ V such that F^(n)(x) ∉ U for some n.
5. We say that F is a descent mapping for a cost function f if f(F(x)) ≤ f(x), ∀x ∈ M.


Theorem 4.2 (Unstable fixed points) Let F : M → M be a descent mapping for a smooth cost function f and assume that, for every x ∈ M, all the accumulation points of {F^(k)(x)}k=1,2,... are critical points of f. Let x∗ be a fixed point of F (thus x∗ is a critical point of f). Assume that x∗ is not a local minimum of f. Further assume that there is a compact neighborhood U of x∗ where, for every critical point y of f in U, f(y) = f(x∗). Then x∗ is an unstable point for F.
Proof. See Theorem 4.4.1 of Absil et al. (2008), [1].

We now give a stability result.



Theorem 4.3 (Capture theorem) Let F : M → M be a descent mapping for a smooth cost function f and assume that, for every x ∈ M, all the accumulation points of {F^(k)(x)}k=1,2,... are critical points of f. Let x∗ be a local minimizer and an isolated critical point of f. Assume further that dist(F(x), x) goes to zero as x goes to x∗. Then x∗ is an asymptotically stable point of F.
Proof. See Theorem 4.4.2 of Absil et al. (2008), [1].

5 Applications

5.1 Rayleigh quotient minimization on the unit sphere

Consider the function
    f̄ : R^n → R,  x ↦ f̄(x) = x^T A x,
whose restriction to the unit sphere S^{n-1} is the function
    f : S^{n-1} → R,  x ↦ f(x) = x^T A x,                     (5.7)
where the matrix A is symmetric (A = A^T) but not necessarily positive-definite. We view S^{n-1} as a Riemannian submanifold of the Euclidean space R^n endowed with the canonical Riemannian metric g(ξ, ζ) = ξ^T ζ. The formulas that we use are summarized in the following table.

                                 Manifold (S^{n-1})                    Embedding space (R^n)
 function                        f(x) = x^T A x, x ∈ S^{n-1}           f̄(x) = x^T A x, x ∈ R^n
 metric                          induced metric                        g(ξ, ζ) = ξ^T ζ
 tangent space                   {ξ ∈ R^n : x^T ξ = 0}                 R^n
 projection onto tangent space   Px ξ = (I − x x^T) ξ                  identity
 gradient                        grad f(x) = Px grad f̄(x)              grad f̄(x) = 2Ax
 retraction                      Rx(ξ) = qf(x + ξ)                     Rx(ξ) = x + ξ

Hereafter we consider ηk := −grad f(xk); therefore, for the Rayleigh quotient we have ηk = −2(Axk − xk xk^T A xk). It is clear that this choice of search direction is gradient related.

Next we have to pick a retraction; a reasonable choice is
    Rx(ξ) = (x + ξ)/‖x + ξ‖.                                  (5.8)

We now present the algorithm for the problem of minimizing the Rayleigh quotient on the sphere S^{n-1}, which is an adaptation of Algorithm 1 with c = 1.

Algorithm 2 Armijo line search for the Rayleigh quotient on S^{n-1}
Require: Symmetric matrix A, scalars α > 0, β, σ ∈ (0, 1), and an initial point x0 ∈ R^n such that ‖x0‖ = 1.
Ensure: Sequence of iterates {xk}.
1: for k = 0, 1, 2, ... do
2:   Compute ηk = −2(Axk − xk xk^T A xk).
3:   Find the smallest integer m ≥ 0 such that
       f(Rxk(α β^m ηk)) ≤ f(xk) − σ α β^m ηk^T ηk,
     with f defined in (5.7) and R defined in (5.8).
4:   Use the updating formula xk+1 = Rxk(α β^m ηk).
5: end for

5.1.1 Implementation of Algorithm 2 in MATLAB

We now present the Matlab code that solves the problem of minimizing the Rayleigh quotient on the unit sphere.

1-  Y=input('\n Enter a vector Y: ');          % must be a column vector of i components
2-  fprintf('\n The norm N of Y is: ')
3-  N=norm(Y)
4-  fprintf('\n The unit vector X of Y is: ');
5-  X=Y/norm(Y)
6-  A=input('\n Enter a symmetric matrix A: ') % must be a matrix of order i*i
7-  alpha=1;
8-  sigma=0.1;
9-  beta=0.5;
10- epsilon=0.00001;                           % approximation error allowed
11- Nmax=100;                                  % maximum number of iterations to be performed
12- k=1;
13- testArmijo=0;
14- fprintf('\n The initial point X(i) is: %d',X)  % i varies from 1 to the number of components of X
15- while k<=Nmax
16- % NOTE: lines 16-21 were unreadable in the source file; following the
17- % description in Remark 5.1, they are assumed to compute the Riemannian
18- % gradient and its norm Normadf, which is used as the stopping criterion.
19- df=2*A*X;                                  % gradient of f in the embedding space
20- gradf=df-X*(X'*df);                        % projection onto the tangent space
21- Normadf=norm(gradf);                       % gradient norm
22- if Normadf >= epsilon
23- m=0;
24- factual=X'*A*X
25- eta=-2*(A*X-X*X'*A*X);                     % search direction
26- R=(X+eta)/norm(X+eta);                     % this is the taken retraction
27- fnuevo=R'*A*R;
28- dif=factual-fnuevo
29- t=1;
30- test=sigma*t*eta'*eta
31- testArmijo=testArmijo+1;
32- while dif < test,
33- m=m+1
34- testArmijo=testArmijo+1;
35- t=beta^m*alpha;                            % Armijo step size
36- r=(X+t*eta)/norm(X+t*eta)                  % the retraction is updated at each iteration
37- dif=factual-r'*A*r
38- test=sigma*t*eta'*eta
39- end
40- X=(X+t*eta)/norm(X+t*eta)                  % updating formula for each iteration
41- fprintf('\n New point x(i) is =: %d',X)
42- k=k+1;
43- else
44- fprintf('\n Error criterion satisfied Norm-gradient =:%d',Normadf)
45- fprintf('\n The approximate optimum point x(i) is :%d',X)
46- fprintf('\n The approximate optimum value is :%d',factual)
47- fprintf('\n Iteration number is :%d',k-1)
48- fprintf('\n Number of Armijo test :%d',testArmijo)
49- k=1000000;
50- end
51- end
52- if k>999999
53- fprintf('\n Solved the problem : %d')
54- else
55- fprintf('\n :Iterations number exceeded to %d',k-1)
56- fprintf('\n Number of Armijo test : %d',testArmijo)
57- end

Remark 5.1 The numbers before each line of the code were placed for didactic purposes, so to run the algorithm in Matlab one must copy only the Matlab code, without the numbers.

We now make a brief comment on the behaviour of the algorithm. Lines 1 to 9 contain the input data, such as the initial point and the parameters used in the Armijo test. Normadf denotes the gradient norm, which is the stopping criterion: if the gradient norm is greater than the approximation error, the algorithm performs an iteration, where factual denotes the function value at the current point. The Armijo test is then performed, always starting with m = 0; if the test is not satisfied, m is increased by one unit until the test is satisfied. Once the Armijo test is satisfied, the retraction "r" leads to the next point, from which the next iteration is performed, and the process is repeated until the stopping condition is satisfied.

We now show the computational results of applying the algorithm implemented in MATLAB to solve problem (5.7) for the particular case when




    A = [2 5; 5 1].

In Algorithm 2 we consider σ = 0.1, β = 0.5, α = 1; we also consider an approximation error ε = 0.00001, and as initial point we take (0.6, 0.8)^T, which is the unit vector obtained from (3, 4)^T. With these data the computational results are presented in the following table.

 xk     Iteration   Obtained point           Obtained value   ‖grad f(xk)‖
 x0     1           (0.600000, 0.800000)      6.160000        3.760000
 x1     2           (-0.618911, 0.785461)    -3.478254        1.366731
 x2     3           (-0.683516, 0.729935)    -3.522032        0.341732
 x3     4           (-0.667774, 0.744364)    -3.524748        0.087431
 x4     5           (-0.671831, 0.740704)    -3.524925        0.022401
 x5     6           (-0.670794, 0.741644)    -3.524937        0.005740
 x6     7           (-0.671060, 0.741403)    -3.524938        0.001471
 x7     8           (-0.670991, 0.741465)    -3.524938        3.7685e-004
 x8     9           (-0.671009, 0.741449)    -3.524938        9.6562e-005
 x9     10          (-0.671004, 0.741453)    -3.524938        2.4743e-005
 x10    11          (-0.671006, 0.741452)    -3.524938        6.3399e-006

In the table we can see that, taking the initial point x0 = (0.6, 0.8), 11 iterations are needed to reach the optimal point, the optimal point being x∗ = (−0.671006, 0.741452) with optimal value f∗ = f(x∗) = −3.524938.
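As a sanity check (not part of the original experiments), the minimum of the Rayleigh quotient over the unit sphere equals the smallest eigenvalue of A, attained at the corresponding unit eigenvector, so the values above can be compared directly with MATLAB's eig; the eigenvector is determined only up to sign:

% Cross-check: the minimum of x'*A*x over the unit sphere is the smallest
% eigenvalue of A, attained (up to sign) at the associated unit eigenvector.
A = [2 5; 5 1];
[V, D] = eig(A);
[fmin, i] = min(diag(D));
fprintf('smallest eigenvalue            : %.6f\n', fmin);           % about -3.524938
fprintf('corresponding unit eigenvector : (%.6f, %.6f)\n', V(1,i), V(2,i));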

5.2 Brockett function minimization on the Stiefel manifold

We will minimize the following function, which admits a friendlier expression in matrix form:
    f : St(p, n) → R,  X ↦ tr(X^T A X N),                     (5.9)
where N = diag(µ1, ..., µp), with 0 ≤ µ1 ≤ ... ≤ µp, and St(p, n) denotes the orthogonal Stiefel manifold
    St(p, n) = {X ∈ R^{n×p} : X^T X = Ip}.
We view St(p, n) as a regular submanifold of the Euclidean space R^{n×p} endowed with the canonical inner product ⟨Z1, Z2⟩ := tr(Z1^T Z2). The necessary formulas are summarized in the following table.

                                     Manifold (St(p, n))                    Embedding space (R^{n×p})
 cost function                       tr(X^T A X N), X^T X = Ip              tr(X^T A X N), X ∈ R^{n×p}
 metric                              induced metric                         ⟨Z1, Z2⟩ = tr(Z1^T Z2)
 tangent space                       {Z ∈ R^{n×p} : sym(X^T Z) = 0}         R^{n×p}
 projection onto the tangent space   PX Z = Z − X sym(X^T Z)                identity
 gradient                            grad f(X) = PX grad f̄(X)               grad f̄(X) = 2AXN
 retraction                          RX(Z) = qf(X + Z)                      RX(Z) = X + Z

Here sym(M) := (M + M^T)/2 denotes the symmetric part of a square matrix M.

For the implementation we consider
    ηk = −grad f(X) = −(2AXN − X X^T A X N − X N X^T A X)

and we choose the retraction proposed in Example 2.2, i.e., RX(ξ) = qf(X + ξ).
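The gradient formula above can be sketched and checked in a few lines of MATLAB (arbitrary test data A, N and X of our own choosing; the tangency test uses the description of the tangent space given in the table):

% Riemannian gradient of the Brockett function on St(p,n) and a tangency check;
% A, N and X below are arbitrary test data.
n = 6; p = 3;
A = randn(n); A = (A + A')/2;             % symmetric test matrix
N = diag([1 2 3]);                        % N = diag(mu_1,...,mu_p), 0 <= mu_1 <= ... <= mu_p
[X, ~] = qr(randn(n, p), 0);              % a point on St(p,n): X'*X = I_p
G   = 2*A*X*N;                            % ambient gradient of tr(X'*A*X*N)
eta = -(G - X*((X'*G + G'*X)/2));         % eta = -P_X(G) = -grad f(X)
fprintf('max |sym(X''*eta)| = %g\n', max(max(abs((X'*eta + eta'*X)/2))));   % should be ~0

The printed value being (numerically) zero confirms that ηk lies in the tangent space to St(p, n) at X.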

5.2.1 Implementation

The implementation is similar to that of the Rayleigh quotient on the unit sphere; we only make the following changes. Delete line 1 until line 6 and replace by:

M=input('\n Enter a matrix M: ')             % must be a matrix of order n*p
[Q,R]=qr(M)                                  % is the QR decomposition of M
A=input('\n Enter a symmetric matrix A: ')   % must be a matrix of order n*n
X=Q;                                         % this is the orthogonal matrix that we use as initial point
u=input('\n Enter a vector u such that 0