An Improved Newton Iteration for the Generalized Inverse of a Matrix, with Applications

Victor Pan
Robert Schreiber

Research Institute for Advanced Computer Science
NASA Ames Research Center

RIACS Technical Report 90.16, March 1990

To appear: SIAM Journal on Scientific and Statistical Computing

The Research Institute for Advanced Computer Science is operated by Universities Space Research Association, The American City Building, Suite 311, Columbia, MD 21044, (301) 730-2656. Work reported herein was supported by the NAS Systems Division of NASA and DARPA via Cooperative Agreement NCC 2-387 between NASA and the Universities Space Research Association (USRA). Work was performed at the Research Institute for Advanced Computer Science (RIACS), NASA Ames Research Center, Moffett Field, CA 94035.

An Improved Newton Iteration for the Generalized Inverse of a Matrix, with Applications*

Victor Pan
Computer Science Department, SUNY Albany, Albany, NY 12222
and Mathematics Department, Lehman College, CUNY, Bronx, NY 10468
(Supported by NSF Grant CCR-8805782 and PSC-CUNY Award 668541)

Robert Schreiber
Research Institute for Advanced Computer Science, Moffett Field, CA 94035
(Supported by ONR Contract number N00014-86-K-0610 and ARO Grant DAAL03-86-K-0112 to the Rensselaer Polytechnic Institute, and by the NAS Systems Division via Cooperative Agreement NCC 2-387 between NASA and the Universities Space Research Association.)

Abstract

Pan and Reif have shown that Newton iteration may be used to compute the inverse of an n×n, well-conditioned matrix in parallel time O(log²n) and that this computation is processor efficient. Since the algorithm essentially amounts to a sequence of matrix-matrix multiplications, it can be implemented with great efficiency on systolic arrays and parallel computers. Newton's method is expensive, however, in terms of the arithmetic operation count. In this paper we reduce the cost of Newton's method with several new acceleration procedures. We obtain a speedup by a factor of two for arbitrary input matrices; for symmetric positive definite matrices the factor is four. The accelerated procedure is a form of Tchebychev acceleration, while Newton's method uses instead a Neumann series approximation.

In addition, we develop Newton-like procedures for a number of important related problems. We show how to compute the nearest matrices of lower rank to a given matrix A, the generalized inverses of these nearby matrices, their ranks (as a function of their distances from A), and projections onto subspaces spanned by singular vectors; such computations are important in signal processing applications. Furthermore, we show that the numerical instability of Newton's method when applied to a singular matrix is absent from these improved methods. Finally, we show how to use these tools to devise new polylog time parallel algorithms for the singular value decomposition.

* Submitted to the SIAM Journal on Scientific and Statistical Computing.

Key Words: Matrix computation, Moore-Penrose generalized inverse, singular value decomposition, Newton iteration.

Acknowledgements. We would like to thank the referees for many valuable suggestions, especially for helping to simplify and clarify the proofs; Roland Freund and Youcef Saad for suggesting the Tchebychev analysis; and Sally Goodall for expertly typing the manuscript.

1. Introduction

Pan and Reif [11], [12] have shown that Newton iteration may be used to compute the inverse of an n×n, well-conditioned matrix in parallel time proportional to log²n using a number of processors that is within a factor of log n of the optimum. Newton iteration is simple to describe and to analyze, and is strongly numerically stable for nonsingular input matrices; this is not true of earlier polylog time matrix inversion methods. Moreover, since it is rich in matrix-matrix multiplications, it can be implemented with great efficiency on systolic arrays and parallel computers.¹

Unfortunately, the computation can be rather expensive. It has been shown that the number of matrix multiplications required can be as high as 4 log₂ κ(A), where κ(A) := ||A||₂ ||A⁻¹||₂. Here we show how to reduce this cost, in some cases quite dramatically. We extend the usual Newton iteration in several ways. In particular, we

- substantially accelerate the convergence of the iteration;
- insure its stability even when A is singular;
- show how to compute the matrix A(ε) obtained from A by setting to zero any of its singular values that are less than ε; thus A(ε) is a closest lower rank approximation to A (see Theorem 2.1 below);
- compute A⁺(ε), the Moore-Penrose generalized inverse (also called the pseudo-inverse) of A(ε);
- compute the rank of A(ε); and
- compute the matrix P(ε) that projects orthogonally onto the range of A(ε).

¹ On the Connection Machine [7] matrix products may be computed at 5×10⁹ operations per second, but matrix computations done in standard languages rarely can exceed 10⁸ operations per second. Furthermore, computer manufacturers are currently being encouraged to provide the fastest matrix product software possible, since other matrix computations (QR and LU decomposition, in particular) may be computed with algorithms that are rich in matrix multiply [4].

Concerning acceleration, we obtain a speedup by a factor of two through scaling the iterates at each step, and we prove that the scaled iterates are defined by polynomials in AᵀA that are, in the usual minimax sense, optimal. In the case of symmetric positive definite matrices, we get another factor of two through an optimal choice of the initial iterate. We also consider adaptive acceleration methods, for which we prove no such a priori lower bound on speedup, but which promise to be useful in practice.

Our results have applications, in particular, to signal processing computations and to computing the SVD of a matrix. The method discussed here is a parallel alternative to Kogbetliantz procedures, a new means of constructing the full SVD of A. The computation of the SVD in a highly parallel environment is best done using the Kogbetliantz method [3]. (This Jacobi-like method is competitive with the standard methods in terms of operation count, but the standard methods are not well suited to highly parallel architectures.) Good experimental evidence is available to show that from six to ten sweeps of the Kogbetliantz method are needed. For real A, each sweep requires 8n³ multiplications on a square array of n/2 × n/2 processors; the sequential operation count of this method is therefore the same as that of 32 to 52 Newton iterations. Thus, we do not claim that Newton's method is always advantageous; for most computations it is not competitive with a standard Kogbetliantz SVD. But Newton's method is competitive for these problems on highly parallel computers. It is a clear winner if the condition number of A(ε) is not too large, or if the adaptive acceleration we employ is especially successful, so that far fewer than 30 Newton steps are needed, or if we can exploit sparsity or other properties of A to reduce its cost. Furthermore, it is often the case on parallel machines that matrix products, the core of the Newton iteration, can be computed especially fast. (On the Connection Machine, microcoded matrix multiply runs an order of magnitude faster than code written in Fortran, C*, or *LISP.) And when the number of processors is not large, so that O(n²) processors are not available, then more modestly parallel, but less costly, methods for the SVD are better.

1.1 Contents. We organize the paper as follows: Section 2 is for definitions. In Section 3, we recall the customary Newton iteration for matrix inversion; in Section 4 we present a Tchebychev acceleration procedure; and in Section 5 an adaptive acceleration procedure using cubic polynomials rather than the quadratics used in Newton's method. We give a method for finding an improved initial iterate for symmetric positive definite A in Section 6. In Sections 7-9 we give methods for computing A(ε) and A⁺(ε) (see above), prove the stability of these computations, show how to find the subspaces spanned by singular vectors, and show some further applications. Some numerical experiments are presented in Section 10.


2. Notation

We denote most matrices by upper case Roman letters, diagonal matrices by upper case Greek letters, and the elements of a matrix by the corresponding lower case Greek letter. We also use lower case Greek letters for various scalars, and lower case bold Roman letters for vectors and, in particular, columns of matrices. The letters i, j, k, m, n, r, and s are used for integers. We assume that all the quantities are real; extension to the complex case is straightforward.

The basis for our analysis is the singular value decomposition (hereafter referred to as the SVD) of A ∈ R^{m×n},

    A = U Σ Vᵀ,                                    (2.1)

where U = [u₁, ⋯, u_m] and V = [v₁, ⋯, v_n] are square orthogonal matrices and

    Σ = diag(σ₁ ≥ σ₂ ≥ ⋯ ≥ σ_r > σ_{r+1} = ⋯ = σ_n = 0).

Here, r = rank(A).

The generalized inverse of A is

    A⁺ = V Σ⁺ Uᵀ,

where Σ⁺ = diag(σ₁⁻¹, σ₂⁻¹, ⋯, σ_r⁻¹, 0, ⋯, 0). If A is symmetric and positive definite, then m = n and U = V. In this case, the eigenvalues of A are σⱼ and the eigenvectors are vⱼ, 1 ≤ j ≤ n.

We shall use the Euclidean vector norm ||x||₂ ≡ (xᵀx)^{1/2} and the following matrix norms:

    ||A||₂ ≡ max_{x≠0} ||Ax||₂ / ||x||₂ ,
    ||A||₁ ≡ max_j Σ_i |α_ij| ,
    ||A||_∞ ≡ max_i Σ_j |α_ij| ,
    ||A||_F ≡ (Σ_{i,j} α_ij²)^{1/2} .

It is known [6] that if A has the SVD (2.1), then ||A||₂ = σ₁ and ||A||_F = (Σ_i σ_i²)^{1/2}. The following theorem (the Eckart-Young Theorem) can be found in [6], p. 19.

Theorem 2.1. Let A be a matrix of rank r with SVD (2.1), and let s < r. Then

    A_s = Σ_{j=1}^{s} σⱼ uⱼ vⱼᵀ

is a nearest matrix of rank s to A in the 2-norm; that is, it minimizes ||A − B||₂ over all matrices B of rank at most s, and ||A − A_s||₂ = σ_{s+1}.
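As an illustration of Theorem 2.1, a minimal NumPy sketch of the map A ↦ A(ε) by way of a dense SVD follows; the function name is ours, and the point of Sections 7-9 is precisely to compute A(ε) and A⁺(ε) without forming the SVD:

    import numpy as np

    def nearest_lower_rank(A, eps):
        # A(eps): zero every singular value of A below eps; by Theorem 2.1
        # the result is a nearest lower-rank matrix in the 2-norm.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        s[s < eps] = 0.0
        return (U * s) @ Vt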

3. The Newton Iteration

In this section we recall the customary Newton iteration for matrix inversion. It generates the sequence

    X_{k+1} = (2I − X_k A) X_k,   k = 0, 1, ⋯,    (3.1)

starting from an initial iterate of the form

    X₀ = α₀ Aᵀ,   α₀ > 0.                         (3.2)

Every iterate is then a polynomial in AᵀA times Aᵀ, so that, by (2.1),

    T_k ≡ X_k A = V R_k Vᵀ,   R_k = diag(ρ₁^(k), ⋯, ρ_n^(k)),    (3.3)

and a step of (3.1) maps each eigenvalue by ρⱼ^(k+1) = (2 − ρⱼ^(k)) ρⱼ^(k). The eigenvalues ρⱼ^(k) are the measure of convergence: each step squares the residuals 1 − ρⱼ^(k), and the iteration stops when it has brought all the ρⱼ of the nonzero singular values to within ε of 1. Thus, if the floating-point precision is ε, convergence is required only to that level. The reason for our interest in this method is that (3.1) essentially consists of matrix-matrix multiplications, so it can be very efficiently implemented on parallel machines; this is especially so in the case of Toeplitz input matrices [1], [2], [11], [12]. In terms of cost, however, the method is more expensive than the standard sequential ones: even when an estimate of σ₁ is available, as many as 4 log₂ κ(A) multiplications, that is, about 2 log₂ κ(A) steps of (3.1), are required to get to the point where all the ρⱼ are close to 1. In Section 5 we show how to reduce this to a number of steps that is, in practice, far smaller.
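The following NumPy sketch transcribes (3.1)-(3.2); the particular scaling α₀ = 1/(||A||₁ ||A||∞) and the stopping test are common choices that we assume here, not prescriptions recovered from the report:

    import numpy as np

    def newton_pinv(A, tol=1e-12, max_steps=200):
        # Newton iteration (3.1)-(3.2) for the inverse of a well-conditioned A.
        # alpha_0 = 1/(||A||_1 ||A||_inf) is a standard (assumed) choice that
        # guarantees ||I - X_0 A||_2 < 1 when A has full column rank.
        alpha0 = 1.0 / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
        X = alpha0 * A.T                          # X_0 of (3.2)
        I = np.eye(A.shape[1])
        for _ in range(max_steps):
            X = (2 * I - X @ A) @ X               # one step of (3.1)
            if np.linalg.norm(I - X @ A, 'fro') <= tol:  # valid if rank(A) = n
                break
        return X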

4. Tchebychev Acceleration

Schreiber [14] presented the scaled iteration

    X_{k+1} = α_{k+1} (2I − X_k A) X_k,   k = 0, 1, ⋯,    (4.1)

where X₀ is given by (3.2). Here, we employ an acceleration parameter α_{k+1} ∈ [1,2] chosen so as to minimize a bound on the maximum distance of any nonzero singular value of X_{k+1}A from one. Let us assume that we know bounds σ_min and σ_max on the nonzero singular values of A satisfying

    σ_min ≤ σ_r,   σ₁ ≤ σ_max.                    (4.2)

Since σ_r > 0, we let

    α₀ = 2 / (σ_min² + σ_max²)                    (4.3)

and

    ρ̲^(0) = α₀ σ_min² = 2σ_min² / (σ_min² + σ_max²),   ρ̄^(0) = α₀ σ_max²,    (4.4)

these being lower and upper bounds on the singular values of X₀A. Then take

    α_{k+1} = ρ̄^(k+1) = 2 / (1 + (2 − ρ̲^(k)) ρ̲^(k)),    (4.5)

which is both an acceleration parameter and an upper bound on {ρⱼ^(k+1)}, and

    ρ̲^(k+1) = α_{k+1} (2 − ρ̲^(k)) ρ̲^(k),               (4.6)

which is a lower bound on {ρⱼ^(k+1)}.

The definitions (4.3)-(4.6) imply that for all 1 ≤ j ≤ r and k ≥ 1,

    ρ̲^(k) ≤ ρⱼ^(k) ≤ ρ̄^(k)   and   ρ̲^(k) = 2 − ρ̄^(k).    (4.7)

(Note that the identity ρ̲^(k) = 2 − ρ̄^(k) follows immediately from (4.5) and (4.6). The upper bound ρⱼ^(k) ≤ ρ̄^(k) is likewise straightforward. Finally, if ρ̲^(k) ≤ ρⱼ^(k) < 1, then (2 − ρ̲^(k)) ρ̲^(k) ≤ (2 − ρⱼ^(k)) ρⱼ^(k), whence, by (4.6) and the definition of ρⱼ^(k+1), the lower bound ρ̲^(k+1) ≤ ρⱼ^(k+1) follows.)

Except for the last few iterations before convergence, ρ̲^(k) ≪ 1, which implies that α_{k+1} ≈ 2 and thus ρ̲^(k+1) ≈ 4 ρ̲^(k). Therefore,

    −log₄ ρ̲^(0) = log₂[(κ(A)² + 1)^{1/2}] = log₂ κ(A) + O(1/κ(A)²)

steps suffice to bring all the singular values of X_k A up to 1/2, which is half as many as for the unaccelerated version.
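A direct transcription of the accelerated iteration, under our reconstruction of (4.2)-(4.6), reads as follows; sigma_min and sigma_max are the assumed bounds of (4.2):

    import numpy as np

    def accelerated_newton_pinv(A, sigma_min, sigma_max, steps):
        # Scaled iteration (4.1): X_{k+1} = alpha_{k+1} (2I - X_k A) X_k.
        alpha0 = 2.0 / (sigma_min**2 + sigma_max**2)      # (4.3)
        X = alpha0 * A.T                                   # (3.2)
        rho_lo = alpha0 * sigma_min**2                     # (4.4)
        I = np.eye(A.shape[1])
        for _ in range(steps):
            alpha = 2.0 / (1.0 + (2.0 - rho_lo) * rho_lo)  # (4.5)
            X = alpha * ((2 * I - X @ A) @ X)
            rho_lo = alpha * (2.0 - rho_lo) * rho_lo       # (4.6)
        return X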

We shall now derive a theorem concerning the optimality of the acceleration parameters given by (4.5). Let the symmetric residual matrix initially be

    E ≡ I − α₀ AᵀA = I − X₀A.                     (4.8)

It is straightforward to show that Newton's method, starting with X₀ = α₀Aᵀ, produces iterates X_k satisfying

    X_k A = I − E^m,                              (4.9)

where m = 2^k. For a nonsingular matrix A and for the choice (4.3) for α₀, the eigenvalues of E lie in the open interval (−1,1), and we have that E^m → 0 and, therefore, X_k A → I as k → ∞.

Furthermore, (4.8) and (4.9) imply that

    X_k = (I + E + ⋯ + E^{m−1}) α₀ Aᵀ.

Thus, Newton's method is related to the Neumann series expansion

    (I − E)⁻¹ = I + E + E² + ⋯ .

We therefore ask whether the accelerated method (3.2), (4.1)-(4.6) is related to a better polynomial approximation to (I − E)⁻¹. In fact, it is exactly equivalent to approximation of this inverse by a Tchebychev polynomial in E, as we now show.

Let

    T_{2^k}(ξ) ≡ cos(2^k cos⁻¹ ξ)

be the Tchebychev polynomial of degree 2^k on (−1,1). Recall that T₀(ξ) = 1, T₁(ξ) = ξ, and

    T_{2^{k+1}}(ξ) = 2 T_{2^k}(ξ)² − 1.           (4.10)

The scaled Tchebychev polynomials on (σ_min², σ_max²) are defined by

    t_k(ξ) ≡ T_{2^k}((τ − ξ)/δ) / T_{2^k}(τ/δ),

where

    τ = (σ_max² + σ_min²)/2   and   δ = (σ_max² − σ_min²)/2.

Surely, t_k is a polynomial of degree 2^k, and t_k(0) = 1, so that

    t_k(ξ) = 1 − ξ r_k(ξ)

for some polynomial r_k of degree 2^k − 1; furthermore, it is a classical result that among all such polynomials, t_k minimizes the norm

    max_{σ_min² ≤ ξ ≤ σ_max²} |t_k(ξ)| = ||t_k||_{∞,(σ_min²,σ_max²)}.

They also satisfy a recurrence like (4.10). Let β_k ≡ T_{2^k}(τ/δ). Then by (4.10),

    β_k t_k(ξ) = 2 (β_{k−1} t_{k−1}(ξ))² − 1.     (4.11)
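The recurrence (4.11) is easy to verify numerically. In the sketch below, the definitions of τ, δ, and β_k follow the reconstruction above, and the bounds 0.5 and 2.0 are arbitrary example values:

    import numpy as np

    sigma_min, sigma_max = 0.5, 2.0                  # assumed example bounds
    tau = (sigma_max**2 + sigma_min**2) / 2
    delta = (sigma_max**2 - sigma_min**2) / 2

    def T(n, x):
        # Tchebychev polynomial T_n(x), extended to x >= 1 via cosh.
        return np.cosh(n * np.arccosh(x)) if x >= 1 else np.cos(n * np.arccos(x))

    def t(k, xi):
        # Scaled Tchebychev polynomial t_k on (sigma_min^2, sigma_max^2).
        return T(2**k, (tau - xi) / delta) / T(2**k, tau / delta)

    def beta(k):
        return T(2**k, tau / delta)

    xi = 1.3                                         # a point in the interval
    lhs = beta(2) * t(2, xi)
    rhs = 2 * (beta(1) * t(1, xi))**2 - 1            # (4.11) with k = 2
    print(abs(lhs - rhs))                            # ~0, up to rounding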

Theorem 4.1. Let the sequence of matrices X_k, k = 0, 1, ⋯ be generated by (3.2), (4.1)-(4.6). Then

    X_k = p_k(AᵀA) Aᵀ,                            (4.12)

where p_k(ξ) is a polynomial of degree 2^k − 1. The polynomial 1 − ξ p_k(ξ) is the scaled Tchebychev polynomial t_k(ξ) of degree 2^k on (σ_min², σ_max²).

Remark. Before proving the theorem, we point out that (4.12) implies that I − X_kA = I − p_k(AᵀA)AᵀA, and therefore that

    ||I − X_kA||₂ = max_{1≤j≤n} |1 − σⱼ² p_k(σⱼ²)| = max_{1≤j≤n} |t_k(σⱼ²)|,

which shows the relevance of the theorem's conclusion. An analogue of this result also holds for X₀ any matrix of the form r(AᵀA)Aᵀ, with r a polynomial, such that the 2-norm of I − X₀A is less than one.

Proof. The claim (4.12) is clearly true when k = 0, with p₀(ξ) = α₀; then

    1 − ξ p₀(ξ) = 1 − 2ξ/(σ_min² + σ_max²) = t₀(ξ)

is the appropriate scaled Tchebychev polynomial. Now use induction on k: the recurrences (4.5)-(4.6) and (4.11) show that 1 − ξ p_k(ξ) and t_k(ξ) coincide for all k ≥ 1.

5. Adaptive Acceleration

We now give an adaptive acceleration, ALGORITHM CUINV, which uses cubic polynomials rather than the quadratics of Newton's method whenever the computed iterates justify it. The decision is based on bounds for the eigenvalues ρⱼ of T ≡ XA, obtained as follows. Let T = XA and compute T² and θ ≡ ||T − T²||_F. Now, it follows from (3.3) that

    θ² = Σⱼ [ρⱼ(1 − ρⱼ)]²;                        (5.1)

whence ρⱼ(1 − ρⱼ) ≤ θ for all 1 ≤ j ≤ n. Choosing ε > 0 such that we are certain that there are no eigenvalues ρⱼ in an intermediate band, we can then find intervals [0, ρ̲] and [1 − μ, 1 + μ] known to contain all the eigenvalues. The test and acceleration steps of the algorithm are as follows:

STEP 2 [TEST]:
    if ρ̲ > 1/4 then T2VALID := true; goto STEP 1;
    else goto STEP 3;

STEP 3 [USE ACCELERATION]:
    β := ρ̲; T2VALID := false;
    X := (1/β)(T² − (2 + β)T + (1 + 2β)I) X;
    T := XA; goto STEP 1;
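In code, the cubic update of STEP 3 is a single matrix polynomial applied to X. The sketch below follows our reading of the garbled listing, including the assumption β := ρ̲:

    import numpy as np

    def cubic_step(X, A, T, T2, rho_lo):
        # STEP 3 of CUINV (reconstructed): beta := rho_lo (assumed), then
        #   X := (1/beta) (T^2 - (2 + beta) T + (1 + 2*beta) I) X.
        # An eigenvalue rho of T = XA is thereby mapped to
        #   (rho^3 - (2 + beta) rho^2 + (1 + 2*beta) rho) / beta,
        # which fixes rho = 1 and amplifies small eigenvalues by
        # a factor of about 2 + 1/beta.
        beta = rho_lo
        I = np.eye(T.shape[0])
        X = (T2 - (2.0 + beta) * T + (1.0 + 2.0 * beta) * I) @ X / beta
        return X, X @ A              # updated X and T = XA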

COMMENTS on ALGORITHM CUINV.

(1) Stopping criteria are discussed by Söderström and Stewart [17].

(2) First, we have made use of the fact that after a Newton step (3.1),

    T_{k+1} = X_{k+1}A = (2I − X_kA)X_kA = (2I − T_k)T_k.

Thus, if we decide to reject the use of a cubic acceleration step, the computation of T² in STEP 2 is not wasted, because it saves us at least that much work in the following Newton step (STEP 1).

(3) We do not use cubic steps exclusively; we do need at least one Newton step for each cubic step because, without Newton steps, we cannot find intervals [0, ρ̲] and [1 − μ, 1 + μ] known to contain all the eigenvalues. Furthermore, we use only Newton steps, when all the singular values of XA are close to one, to actually finish the job of forcing the eigenvalues near one to converge; this is signaled by the convergence of trace(XA) to within one half of its limiting value.

(4) If A is an m×n matrix, then we assume that the cost of computing the product XA is (1/2)mn² flops (we exploit the symmetry of XA), the cost of computing the product T² is (1/2)n³ flops, and the cost of computing X in STEP 1 or STEP 3 is mn² flops. Then, if we skip STEP 3, an iteration of CUINV costs mn² + (1/2)n³ flops, assuming that T2VALID was true. The cost of an iteration in which we take STEPS 1-3 is 3mn² + (1/2)n³, assuming that T2VALID was false. These assumptions are warranted, since it is the iterations that take STEP 3 that cause T2VALID to be false. Thus, for n = m, we pay a premium of only about 16% compared with two Newton steps.
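The 16% premium quoted in COMMENT (4) can be checked directly from these counts (in units of n³ flops, with m = n):

    newton = 1.0 + 0.5               # skip STEP 3: m n^2 + n^3/2
    full   = 3.0 + 0.5               # STEPS 1-3:  3 m n^2 + n^3/2
    print(full / (2 * newton) - 1)   # premium over two Newton steps: ~0.167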

What is the effect of a full iteration in which STEP 3 is taken? Formally, the eigenvalues ρ of XA are mapped as follows:

    ρ → c((2 − ρ)ρ) = 1 + ((1 − ρ̲)/ρ̲)(1 − ρ)⁴ − (1/ρ̲)(1 − ρ)⁶ ≡ C(ρ; ρ̲).    (5.3)

For small ρ, C(ρ; ρ̲) ≈ (4 + 2/ρ̲)ρ. And because it will often be the case that ρ̲ ≪ 1, the combined iteration greatly amplifies the small eigenvalues. Those near one continue to converge superlinearly; equation (5.3) shows that

    |1 − C(ρ; ρ̲)| = ((1 − ρ̲)/ρ̲)(1 − ρ)⁴ + O((1 − ρ)⁶) ≤ (1 − ρ̲)(1 − ρ)³ + O((1 − ρ)⁶) = O((1 − ρ)³),

since any eigenvalue ρ of XA exceeding ρ̲ must be in [1 − μ, 1 + μ], where μ ≤ ρ̲, so that 1 − ρ ≤ ρ̲.

See Section 10 for some experimental evidence of the efficiency and reliability of ALGORITHM CUINV.

6. An Improved Initial Iterate in the Symmetric Positive Definite Case

Let us consider the case of a symmetric positive definite matrix A, which is important in many applications [6]. Given a matrix A with the SVD

    A = V Σ Vᵀ,   Σ = diag(σ₁ ≥ ⋯ ≥ σ_n > 0)      (6.1)

(which is its eigendecomposition, too) we shall choose

    X₀ = βI + α₀A                                 (6.2)

so as to optimally place the singular values of X₀A. From (6.1) and (6.2),

    X₀A = V(βΣ + α₀Σ²)Vᵀ.

We choose (β, α₀) so that, with p(σ) ≡ βσ + α₀σ², the largest possible initial error

    ||I − X₀A||₂ = max_{σ_n ≤ σ ≤ σ₁} |p(σ) − 1|

is minimized. By standard arguments (see [5], ch. 9), 1 − p is a scaled Tchebychev polynomial on (σ_n, σ₁):

    1 − p(σ) = [2(σ − σ_mid)² − δ²] / [2σ_mid² − δ²],

where σ_mid ≡ (σ_n + σ₁)/2 and δ ≡ σ₁ − σ_mid. We are most concerned with p(σ_n), the image of the smallest eigenvalue of A.
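The report breaks off here, but the coefficients (β, α₀) of (6.2) can be read off by expanding the displayed formula for 1 − p(σ); the closed forms in the following sketch are our derivation from that formula, not expressions recovered from the text:

    import numpy as np

    def spd_initial_iterate(A, sigma_1, sigma_n):
        # X_0 = beta*I + alpha0*A of (6.2) for symmetric positive definite A.
        # Expanding 1 - p(sigma) = (2(sigma - s_mid)^2 - d^2)/(2 s_mid^2 - d^2)
        # with s_mid = (sigma_n + sigma_1)/2 and d = sigma_1 - s_mid gives:
        s_mid = 0.5 * (sigma_n + sigma_1)
        d = sigma_1 - s_mid
        denom = 2.0 * s_mid**2 - d**2
        beta = 4.0 * s_mid / denom       # coefficient of sigma in p (derived)
        alpha0 = -2.0 / denom            # coefficient of sigma^2 in p (derived)
        return beta * np.eye(A.shape[0]) + alpha0 * A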