An Improved Newton Iteration for the Generalized Inverse of a Matrix, with Applications

Victor Pan
Robert Schreiber

Research Institute for Advanced Computer Science
NASA Ames Research Center

RIACS Technical Report 90.16
March, 1990

To appear: SIAM Journal on Scientific and Statistical Computing
The Research Institute for Advanced Computer Science is operated by Universities Space Research Association, The American City Building, Suite 311, Columbia, MD 21044, (301) 730-2656. Work reported herein was supported by the NAS Systems Division of NASA and DARPA via Cooperative Agreement NCC 2-387 between NASA and the Universities Space Research Association (USRA). Work was performed at the Research Institute for Advanced Computer Science (RIACS), NASA Ames Research Center, Moffett Field, CA 94035.
An Improved Newton Iteration for the Generalized Inverse of a Matrix, with Applications*

Victor Pan
Computer Science Department, SUNY Albany, Albany, NY 12222
and Mathematics Department, Lehman College, CUNY, Bronx, NY 10468
(Supported by NSF Grant CCR-8805782 and PSC-CUNY Award 668541)

and

Robert Schreiber
Research Institute for Advanced Computer Science, Moffett Field, CA 94035
(Supported by ONR Contract number N00014-86-K-0610 and ARO Grant DAAL03-86-K-0112 to the Rensselaer Polytechnic Institute, and by the NAS Systems Division via Cooperative Agreement NCC 2-387 between NASA and the Universities Space Research Association.)

* Submitted to the SIAM Journal on Scientific and Statistical Computing.

Abstract
Pan and Reif have shown that Newton iteration may be used to compute the inverse of an n×n, well-conditioned matrix in parallel time O(log²n) and that this computation is processor efficient. Since the algorithm essentially amounts to a sequence of matrix-matrix multiplications, it can be implemented with great efficiency on systolic arrays and parallel computers. Newton's method, however, is expensive in terms of the arithmetic operation count. In this paper we reduce the cost of Newton's method with several new acceleration procedures. We obtain a speedup by a factor of two for arbitrary input matrices; for symmetric positive definite matrices, the factor is four. We also show that the accelerated procedure is a form of Tchebychev acceleration, while Newton's method uses instead a Neumann series approximation.
In addition, we develop Newton-like procedures for a number of important related problems. We show how to compute the nearest matrices of lower rank to a given matrix A, the generalized inverses of these nearby matrices, their ranks (as a function of their distances from A), and projections onto subspaces spanned by singular vectors; such computations are important in signal processing applications. Furthermore, we show that the numerical instability of Newton's method when applied to a singular matrix is absent from these improved methods. Finally, we show how to use these tools to devise new polylog time parallel algorithms for the singular value decomposition.
Key Words: Matrix computation, Moore-Penrose generalized inverse, singular value decomposition, Newton iteration.

Acknowledgements. We would like to thank the referees for many valuable suggestions, especially for helping to simplify and clarify the proofs; Roland Freund and Youcef Saad for suggesting the Tchebychev analysis; and Sally Goodall for expertly typing the manuscript.
1. Introduction
Pan and Reif [11], [12] have shown that Newton iteration may be used to compute the inverse of an n×n, well-conditioned matrix in parallel time proportional to log²n using a number of processors that is within a factor of log n of the optimum. Newton iteration is simple to describe and to analyze, and is strongly numerically stable for nonsingular input matrices; this is not true of earlier polylog time matrix inversion methods. Moreover, since it is rich in matrix-matrix multiplications, it can be implemented with great efficiency on systolic arrays and parallel computers.¹
Unfortunately, the computation can be rather expensive. It has been shown that the number of matrix multiplications required can be as high as 4 log₂ κ(A), where κ(A) ≡ ||A||₂ ||A⁻¹||₂. Here we show how to reduce this cost, in some cases quite dramatically.
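For concreteness, here is a minimal NumPy sketch of the customary Newton iteration recalled in Section 3; the sketch is ours, and the starting scale α₀ = 1/(||A||₁ ||A||_∞), which guarantees ||I − X₀A||₂ < 1 for nonsingular A, is one standard safe choice rather than the tighter scaling used later in this paper.

    import numpy as np

    def newton_inverse(A, tol=1e-12, max_iter=200):
        """Customary Newton iteration for the inverse of A (sketch).

        X0 = A^T / (||A||_1 ||A||_inf) ensures ||I - X0 A||_2 < 1 for
        nonsingular A, so the residual I - X_k A squares at every step.
        """
        n = A.shape[0]
        X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
        for k in range(max_iter):
            R = np.eye(n) - X @ A                # residual I - X_k A
            if np.linalg.norm(R, 2) < tol:
                return X, k
            X = X @ (2 * np.eye(n) - A @ X)      # X_{k+1} = X_k (2I - A X_k)
        return X, max_iter

Running this on matrices of increasing condition number exhibits the logarithmic growth in κ(A) of the multiplication count that the acceleration procedures below attack.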
We extend the usual Newton iteration in several ways. In particular, we

•   substantially accelerate the convergence of the iteration;

•   insure its stability even when A is singular;

•   show how to compute the matrix A(ε) obtained from A by setting to zero any of its singular values that are less than ε (thus A(ε) is a closest lower rank approximation to A; see Theorem 2.1 below);

•   compute A⁺(ε), the Moore-Penrose generalized inverse (also called the pseudo-inverse) of A(ε);

•   compute the rank of A(ε); and

•   compute the matrix P(ε) that projects orthogonally onto the range of A(ε).

¹ On the Connection Machine [7], matrix products may be computed at 5×10⁹ operations per second, but matrix computations done in standard languages rarely can exceed 10⁷ operations per second. Furthermore, computer manufacturers are currently being encouraged to provide the fastest matrix product software possible, since other matrix computations (QR and LU decomposition, in particular) may be computed with algorithms that are rich in matrix multiply [4].
Concerning acceleration, we obtain a twofold speedup by scaling the iterates at each step, and we prove that the scaled iterates are defined by polynomials that are, in the usual minimax sense, optimal in a subspace of polynomials in AᵀA. In the case of symmetric positive definite matrices, we get another speedup by a factor of two through a new means of constructing the initial iterate. We also consider adaptive acceleration procedures by Tchebychev polynomials, for which we prove no such a priori lower bound on speedup, but which are always advantageous and promise to be useful in practice.

Concerning applications, the quantities we compute are important for a number of computations, in particular signal processing applications. Our results also present a new alternative for computing the full SVD of A. The computation of the SVD in a highly parallel environment is best done using Jacobi-like methods due to Kogbetliantz [3]. (This method is not competitive with standard methods in terms of operation count, but the standard methods are not well suited to highly parallel architectures.) For real A, each sweep requires 8n³ multiplications on a square array of n/2 × n/2 processors. Good experimental evidence is available to show that from six to ten sweeps of the Kogbetliantz method are needed. Thus, the sequential operation count of this method is the same as that of 32 to 52 Newton iterations.

Newton's method is therefore competitive for these problems on highly parallel computers. It is a clear winner if the condition number of A(ε) is not too large, or if the adaptive acceleration we employ is especially successful, so that far fewer than 30 Newton steps are needed, or if we can exploit sparsity or other properties of A to reduce its cost. Furthermore, it is often the case on parallel machines that matrix products, the core of the Newton iteration, can be computed especially fast. (On the Connection Machine, microcoded matrix multiply runs an order of magnitude faster than code written in Fortran, C*, or *LISP.) Of course, we do not claim that the method discussed here is competitive with a Kogbetliantz SVD for most computations; it is a parallel alternative for highly parallel environments, and other parallel methods for the SVD may arise. And when the number of processors is not large, so that O(n²) processors are not available, then more modestly parallel, but less costly methods for the SVD are better.
1.1. Contents. We organize the paper as follows: Section 2 is for definitions. In Section 3, we recall the customary Newton iteration for matrix inversion; in Section 4 we present a Tchebychev acceleration procedure; and in Section 5 an adaptive acceleration procedure using cubic polynomials rather than the quadratics used in Newton's method. We give a method for finding an improved initial iterate for symmetric positive definite A in Section 6. In Sections 7-9 we give methods for computing A(ε) and A⁺(ε) (see above), prove the stability of these computations, show how to find subspaces spanned by singular vectors, and show some further applications. Some numerical experiments are presented in Section 10.
2. Notation

We denote most matrices by upper case Roman letters, diagonal matrices by upper case Greek letters, and the elements of a matrix by the corresponding lower case Greek letter. We also use lower case Greek letters for various scalars, and lower case bold Roman letters for vectors and, in particular, columns of matrices. The letters i, j, k, m, n, r, and s are used for integers. We assume that all the quantities are real; extension to the complex case is straightforward.

The basis for our analysis is the singular value decomposition (hereafter referred to as the SVD) of A ∈ R^{m×n},
    A = U Σ Vᵀ ,                                                          (2.1)

where U = [u₁, ⋯, u_m] and V = [v₁, ⋯, v_n] are square orthogonal matrices and Σ = diag(σ₁ ≥ σ₂ ≥ ⋯ ≥ σ_r > σ_{r+1} = ⋯ = σ_n = 0). Here, r = rank(A).
The generalized inverse of A is A⁺ = V Σ⁺ Uᵀ, where Σ⁺ = diag(σ₁⁻¹, σ₂⁻¹, ⋯, σ_r⁻¹, 0, ⋯, 0). If A is symmetric and positive definite, then m = n and U = V. In this case, the eigenvalues of A are σ_j and the eigenvectors are v_j, 1 ≤ j ≤ n.
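As a concrete restatement of these definitions, the sketch below (ours) assembles A⁺ = V Σ⁺ Uᵀ from numpy.linalg.svd; the tolerance used to decide which σ_j count as nonzero is illustrative.

    import numpy as np

    def pinv_from_svd(A, tol=1e-12):
        """Build A+ = V Sigma+ U^T from the SVD (2.1) (sketch)."""
        U, s, Vt = np.linalg.svd(A)                        # A = U diag(s) Vt
        s_plus = np.where(s > tol * s[0], 1.0 / s, 0.0)    # invert only sigma_j > 0
        # Assemble V Sigma+ U^T with the correct (n x m) shape.
        return Vt.T[:, :len(s)] @ np.diag(s_plus) @ U.T[:len(s), :]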
We shall use the Euclidean vector norm ||x||₂ ≡ (xᵀx)^{1/2} and the following matrix norms:

    ||A||₂ ≡ max_x ||Ax||₂ / ||x||₂ ,
    ||A||₁ ≡ max_j Σ_i |α_{ij}| ,
    ||A||_∞ ≡ max_i Σ_j |α_{ij}| ,
    ||A||_F ≡ (Σ_{i,j} α_{ij}²)^{1/2} .

It is known [6] that ||A||_F = (Σ_i σ_i²)^{1/2}, and if A has the SVD (2.1) then ||A||₂ = σ₁. The following theorem (the Eckart-Young Theorem) can be found in [6], p. 19.
Theorem 2.1. Let A be a matrix of rank r with SVD (2.1), and let s < r. Let

    A_s = Σ_{j=1}^{s} σ_j u_j v_jᵀ .

Then A_s is a nearest matrix of rank s to A in the 2-norm; that is, ||A − A_s||₂ = σ_{s+1} = min { ||A − B||₂ : rank(B) ≤ s }.
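Theorem 2.1 is easy to check numerically; in the sketch below (ours), truncating the SVD after s terms produces A_s, and the 2-norm error matches σ_{s+1}.

    import numpy as np

    def nearest_rank_s(A, s):
        """A_s = sum_{j<=s} sigma_j u_j v_j^T, the best rank-s approximation."""
        U, sig, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :s] @ np.diag(sig[:s]) @ Vt[:s, :]

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 6))
    s = 3
    As = nearest_rank_s(A, s)
    sig = np.linalg.svd(A, compute_uv=False)
    assert np.isclose(np.linalg.norm(A - As, 2), sig[s])   # equals sigma_{s+1}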
The reason for our interest in this method is that (3.1) essentially consists of matrix multiplications, so each iteration can be very efficiently implemented; moreover, we do not need to factor A, which is convenient in the case of Toeplitz matrices, for which matrix-vector products are less costly to compute [1], [2], [11], [12]. In terms of cost, the method is more expensive when no estimates of σ₁ are available, since such an estimate is required. The values ρ_j^(k) are the measure of convergence: each iteration roughly doubles the accuracy, bringing all the ρ_j to within ε of 1. Thus, if the floating-point precision is fixed, a predictable number of steps of (3.1) suffices to get to the point where all the ρ_j are within ε of 1; in Section 5 we show how to reduce this cost.
Schreiber [14] presented the scaled iteration

    X_{k+1} = α_{k+1}(2I − X_kA) X_k ,    k = 0, 1, ⋯ ,                    (4.1)

where X₀ is given by (3.2). Here, we employ an acceleration parameter α_{k+1} ∈ [1, 2] chosen so as to minimize a bound on the maximum distance of any nonzero singular value of X_{k+1}A from one.

Let us assume that we know bounds σ_min and σ_max on the nonzero singular values of A satisfying
    σ_min ≤ σ_r   and   σ₁ ≤ σ_max .                                      (4.2)

We let

    α₀ = 2 / (σ_min² + σ_max²)                                            (4.3)

and

    ρ̲^(0) = α₀σ_min² = 2σ_min² / (σ_min² + σ_max²) ,
    ρ̄^(0) = α₀σ_max² = 2σ_max² / (σ_min² + σ_max²) ,                      (4.4)

these being lower and upper bounds on the singular values of X₀A. Then take

    α_{k+1} = ρ̄^(k+1) = 2 / (1 + (2 − ρ̲^(k)) ρ̲^(k)) ,                     (4.5)

which is both an acceleration parameter and an upper bound on {ρ_j^(k+1)}, and

    ρ̲^(k+1) = α_{k+1}(2 − ρ̲^(k)) ρ̲^(k) ,                                  (4.6)

which is a lower bound on {ρ_j^(k+1)}.
The definitions (4.3)-(4.6) imply that, for all 1 ≤ j ≤ r and k ≥ 1,

    ρ̲^(k) ≤ ρ_j^(k) ≤ ρ̄^(k)   and   ρ̄^(k) = 2 − ρ̲^(k) .                   (4.7)

(Note that (4.7) follows immediately from (4.5) and (4.6). The upper bound ρ_j^(k) ≤ ρ̄^(k) is likewise straightforward. Finally, if ρ̲^(k) ≤ ρ_j^(k) < 1, then (2 − ρ̲^(k))ρ̲^(k) ≤ (2 − ρ_j^(k))ρ_j^(k), whence, by (4.6) and the definition of ρ_j^(k+1), the lower bound ρ̲^(k+1) ≤ ρ_j^(k+1) follows.)

Except for the last few iterations before convergence, ρ̲^(k) ≪ 1, which implies that α_{k+1} ≈ 2. Thus, ρ̲^(k+1) ≈ 4ρ̲^(k). Therefore,

    −log₄ ρ̲^(0) ≈ log₂[(κ(A)² + 1)^{1/2}] = log₂ κ(A) + O(1/κ(A)²)

steps suffice to bring all the singular values of X_kA up to ½, which is half as many as for the unaccelerated version.
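The following sketch (ours) implements the scaled iteration (4.1) with the parameters (4.3)-(4.6) exactly as reconstructed above; the bounds smin and smax are assumed given, and the variable names are ours.

    import numpy as np

    def accelerated_newton(A, smin, smax, tol=1e-12, max_iter=100):
        """Scaled Newton iteration (3.2), (4.1)-(4.6) (sketch).

        smin, smax must satisfy smin <= sigma_r and sigma_1 <= smax.
        """
        n = A.shape[1]
        alpha0 = 2.0 / (smin**2 + smax**2)                   # (4.3)
        X = alpha0 * A.T                                     # X0 = alpha0 A^T
        rho_lo = alpha0 * smin**2                            # (4.4)
        for k in range(max_iter):
            if abs(1.0 - rho_lo) < tol:
                break
            alpha = 2.0 / (1.0 + (2.0 - rho_lo) * rho_lo)    # (4.5)
            X = alpha * (2.0 * np.eye(n) - X @ A) @ X        # (4.1)
            rho_lo = alpha * (2.0 - rho_lo) * rho_lo         # (4.6)
        return X

With exact bounds, the lower bound ρ̲^(k) quadruples at each step until it nears one, matching the log₂ κ(A) count derived above.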
We shall now derive a theorem concerning the optimality of the acceleration parameters given by (4.5). Let the symmetric residual matrix initially be

    E ≡ I − α₀AᵀA = I − X₀A .                                             (4.8)

It is straightforward to show that Newton's method, starting with X₀ = α₀Aᵀ, produces iterates X_k satisfying

    X_kA = I − E^m ,                                                      (4.9)

where m = 2^k. For a nonsingular matrix A and for the choice (4.3) for α₀, the eigenvalues of E lie in the open interval (−1, 1), and we have that E^m → 0 and, therefore, X_kA → I as k → ∞.
Furthermore, (4.8) and (4.9) imply that

    X_k = (I + E + ⋯ + E^{m−1}) α₀Aᵀ .

Thus, Newton's method is related to the Neumann series expansion

    (I − E)⁻¹ = I + E + E² + ⋯ .

We therefore ask whether the accelerated method (3.2), (4.1)-(4.6) is related to a better polynomial approximation to (I − E)⁻¹. In fact, it is exactly equivalent to approximation of this inverse by a Tchebychev polynomial in E, as we now show.

Let T_{2^k}(ξ) ≡ cos(2^k cos⁻¹ ξ) be the Tchebychev polynomial of degree 2^k on (−1, 1). Recall that T₀(ξ) = 1, T₁(ξ) = ξ, and

    T_{2^{k+1}}(ξ) = 2 T_{2^k}(ξ)² − 1 .                                  (4.10)
The scaled Tchebychev polynomials on (σ_min², σ_max²) are defined by

    t_k(ξ) ≡ T_{2^k}(γξ + δ) / T_{2^k}(δ) ,

where

    γ = 2 / (σ_max² − σ_min²)   and   δ = −(σ_max² + σ_min²) / (σ_max² − σ_min²) .

Surely, t_k is a polynomial of degree 2^k, and t_k(0) = 1, so that t_k(ξ) = 1 − ξ p̂_k(ξ) for some polynomial p̂_k of degree 2^k − 1; furthermore, it is a classical result that among all such polynomials, t_k minimizes the norm

    max_{σ_min² ≤ ξ ≤ σ_max²} |t_k(ξ)| ≡ ||t_k||_∞(σ_min², σ_max²) .

They also satisfy a recurrence like (4.10). Let μ_k ≡ T_{2^k}(δ). Then by (4.10),

    μ_k t_k(ξ) = 2 (μ_{k−1} t_{k−1}(ξ))² − 1 .                            (4.11)
Theorem 4.1. Let the sequence of matrices X_k, k = 0, 1, ⋯ be generated by (3.2), (4.1)-(4.6). Then

    X_k = p̂_k(AᵀA) Aᵀ ,                                                   (4.12)

where p̂_k(ξ) is a polynomial of degree 2^k − 1. The polynomial 1 − ξ p̂_k(ξ) is the scaled Tchebychev polynomial t_k(ξ) of degree 2^k on (σ_min², σ_max²).

Remark. Before proving the theorem, we point out that (4.12) implies that I − X_kA = I − p_k(AᵀA), where p_k(ξ) ≡ ξ p̂_k(ξ) = 1 − t_k(ξ), and therefore

    ||I − X_kA||₂ = max_{1≤j≤n} |1 − p_k(σ_j²)| ,

which shows the relevance of the theorem's conclusion. An analogue of this result also holds for X₀ any matrix of the form r(AᵀA)Aᵀ, with r a polynomial, such that the 2-norm of I − X₀A is less than one.
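Theorem 4.1 can be verified numerically: evaluating t_k pointwise through the recurrence (4.11) and comparing max_j |t_k(σ_j²)| with ||I − X_kA||₂ from the iteration itself should give agreement to rounding error. A sketch of the evaluator (ours), with lo = σ_min² and hi = σ_max²:

    import numpy as np

    def tk_values(xi, lo, hi, k):
        """Evaluate the scaled Tchebychev polynomial t_k at points xi
        via mu_k t_k = 2 (mu_{k-1} t_{k-1})^2 - 1, cf. (4.10)-(4.11)."""
        gamma = 2.0 / (hi - lo)
        delta = -(hi + lo) / (hi - lo)
        g = gamma * np.asarray(xi) + delta   # mapped argument
        num, den = g, delta                  # T_1(g) and mu_0 = T_1(delta)
        for _ in range(k):
            num = 2.0 * num**2 - 1.0         # T_{2^{j+1}} = 2 T_{2^j}^2 - 1
            den = 2.0 * den**2 - 1.0
        return num / den

    # Compare with the accelerated iterates X_k (see the sketch after (4.6)):
    # ||I - X_k A||_2 should equal max_j |t_k(sigma_j**2)|.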
Proof. The claim (4.12) is clearly true when k = 0, with p̂₀(ξ) = α₀; then

    1 − ξ p̂₀(ξ) = 1 − 2ξ/(σ_min² + σ_max²) = t₀(ξ)

is the appropriate scaled Tchebychev polynomial. Now use induction
for all k ≥ 1.

5. An Adaptive Acceleration Procedure Using Cubic Polynomials

Suppose we have chosen η > 0 such that we are certain that there are no eigenvalues ρ_j in the open interval (ρ̲, 1 − η). Let T = XA and compute T² and

    δ ≡ ||T − T²||_F .

Now, it follows from (3.3) that

    δ² = Σ_j [ρ_j(1 − ρ_j)]² ;                                            (5.1)
whence, for all 1 ≤ j ≤ n, |ρ_j(1 − ρ_j)| ≤ δ.

ALGORITHM CUINV:

    STEP 1 [NEWTON STEP]:
        X := (2I − T) X;
        if T2VALID then T := 2T − T²; T2VALID := false;
        else T := XA;

    STEP 2 [FORM T² AND TEST]:
        T² := T·T;  δ := ||T − T²||_F;
        if ρ̲ > ¼ then T2VALID := true; goto STEP 1;
        else goto STEP 3;

    STEP 3 [USE ACCELERATION]:
        β := ρ̲(2 − ρ̲);
        X := (1/β)(T² − (2 + β)T + (1 + 2β)I) X;
        T := XA; goto STEP 1;
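A sketch (ours) of the measurement behind STEP 2: by (5.1), every |ρ_j(1 − ρ_j)| is bounded by δ, so a small δ certifies that no eigenvalue sits far from both 0 and 1; the function name is ours.

    import numpy as np

    def step2_measure(X, A):
        """Compute T, T^2 and delta = ||T - T^2||_F, cf. (5.1)."""
        T = X @ A
        T2 = T @ T
        delta = np.linalg.norm(T - T2, 'fro')
        return T, T2, delta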
COMMENTS on ALGORITHM CUINV.

(1) Stopping criteria are discussed by Söderström and Stewart [17].

(2) First, we have made use of the fact that after a Newton step (3.1),

    T_{k+1} = X_{k+1}A = (2I − X_kA)X_kA = (2I − T_k)T_k .

Thus, if we decide to reject the use of a cubic acceleration step, the computation of T² in STEP 2 is not wasted, because it saves us at least that much work in the following Newton step (STEP 1).
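Comment (2) in code form (a sketch; names ours): once T² is available, both the Newton updates of X and of T avoid any fresh product with A.

    import numpy as np

    def newton_step_reusing_T2(X, T, T2):
        """One Newton step (3.1) that reuses a precomputed T^2.

        X_{k+1} = (2I - T) X,  T_{k+1} = (2I - T) T = 2T - T^2.
        """
        X_new = 2.0 * X - T @ X      # (2I - T) X without forming 2I - T
        T_new = 2.0 * T - T2         # no product X_{k+1} A is needed
        return X_new, T_new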
(3) We do not use cubic steps exclusively; we do need at least one Newton step for each cubic step because, without Newton steps, we cannot find intervals [0, ρ̲] and [1 − η, 1 + ρ̄] known to contain all the eigenvalues. Furthermore, we use only Newton steps when all the singular values of XA are close to one, to actually finish the job of forcing the eigenvalues near one to converge; this is signaled by the convergence of trace(XA) to within one half of its limiting value of n.

(4) If A is an m×n matrix, then we assume that the cost of computing the product XA is ½mn² flops (we exploit the symmetry of XA), the cost of computing the product T² is ½n³ flops, and the cost of computing X in STEP 1 or STEP 3 is mn² flops. Then, if we skip STEP 3, an iteration of CUINV costs n²m + ½n³ flops, assuming that T2VALID was true. The cost of an iteration in which we take STEPS 1-3 is 3mn² + ½n³, assuming that T2VALID was false. These assumptions are warranted, since it is the iterations that take STEP 3 that cause T2VALID to be false. Thus, for n = m, we pay a premium of only about 16% compared with two Newton steps.
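The 16% figure can be checked by direct arithmetic; the sketch below (ours) simply replays the flop counts of comment (4) with m = n, in units of n³ flops.

    newton_iter = 1.0 + 0.5            # n^2 m + n^3/2   (STEP 3 skipped)
    full_iter = 3.0 + 0.5              # 3 m n^2 + n^3/2 (STEPS 1-3)
    premium = full_iter / (2 * newton_iter) - 1.0
    print(f"{premium:.1%}")            # -> 16.7%, i.e. "about 16%"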
What is the effect of a full iteration, assuming STEP 3 is taken? Formally, the eigenvalues of XA are mapped as follows:

    ρ → c((2 − ρ)ρ) = 1 + ((1 − β)/β)(1 − ρ)⁴ − (1/β)(1 − ρ)⁶ ≡ C(ρ; ρ̲) ,  (5.3)

where c is the cubic applied in STEP 3. For small ρ, C(ρ; ρ̲) ≈ (4 + 2/β)ρ. And because it will often be the case that ρ̲ ≪ 1, the combined iteration greatly amplifies the small eigenvalues. Those near one continue to converge superlinearly; equation (5.3) shows that

    |1 − C(ρ; ρ̲)| = ((1 − β)/β)(1 − ρ)⁴ + O((1 − ρ)⁶) = O((1 − ρ)³) ,

since any eigenvalue ρ of XA exceeding 1 − η must be in [1 − η, 1 + ρ̄].

See Section 10 for some experimental evidence of the efficiency and reliability of ALGORITHM CUINV.
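The scalar map (5.3) can be sanity-checked directly; the sketch below (ours) composes a Newton step with the cubic of STEP 3 as reconstructed above (so the exact coefficients are our reconstruction), exhibiting the (4 + 2/β)-fold amplification of small ρ and the rapid contraction near one.

    def combined_map(rho, beta):
        """Newton step followed by the cubic step of ALGORITHM CUINV."""
        y = (2.0 - rho) * rho                                   # Newton: rho -> (2 - rho) rho
        return y * (y * y - (2.0 + beta) * y + 1.0 + 2.0 * beta) / beta

    beta = 0.05
    print(combined_map(1e-6, beta) / 1e-6)     # ~ 4 + 2/beta = 44
    print(abs(1.0 - combined_map(0.999, beta)))  # ~ 2e-11: near-one contraction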
6. An Improved Initial Iterate in the Symmetric Positive Definite Case

Let us consider the case of a symmetric positive definite matrix A, which is important in many applications [6]. Given a matrix A with the SVD

    A = V Σ Vᵀ ,    Σ = diag(σ₁ ≥ ⋯ ≥ σ_n > 0)                            (6.1)

(which is its eigendecomposition, too) we shall choose

    X₀ = βI + α₀A                                                         (6.2)

so as to optimally place the singular values of X₀A. From (6.1) and (6.2),

    X₀A = V(βΣ + α₀Σ²)Vᵀ .

We choose (β, α₀) so that, with p(σ) ≡ βσ + α₀σ², the largest possible initial error

    ||I − X₀A||₂ = max_{σ_n ≤ σ ≤ σ₁} |p(σ) − 1|

is minimized. By standard arguments (see [5], ch. 9), 1 − p is a scaled Tchebychev quadratic on (σ_n, σ₁):

    1 − p(σ) = T̂₂(σ) = (2(σ − σ_mid)² − γ²) / (2σ_mid² − γ²) ,

where σ_mid ≡ ½(σ_n + σ₁) and γ ≡ σ₁ − σ_mid. We are most concerned with p(σ_n).
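To make (6.2) concrete: expanding the Tchebychev form of 1 − p above gives β = 4σ_mid/(2σ_mid² − γ²) and α₀ = −2/(2σ_mid² − γ²). The sketch below (ours) uses the exact eigenvalue range of a test matrix, where in practice bounds on σ_n and σ₁ would be used instead.

    import numpy as np

    def spd_initial_iterate(A):
        """X0 = beta I + alpha0 A for symmetric positive definite A (sketch)."""
        sig = np.linalg.eigvalsh(A)              # eigenvalues sigma_n <= ... <= sigma_1
        s_n, s_1 = sig[0], sig[-1]
        s_mid = 0.5 * (s_n + s_1)
        gamma = s_1 - s_mid
        D = 2.0 * s_mid**2 - gamma**2
        beta, alpha0 = 4.0 * s_mid / D, -2.0 / D
        return beta * np.eye(A.shape[0]) + alpha0 * A

    rng = np.random.default_rng(1)
    M = rng.standard_normal((6, 6))
    A = M @ M.T + 0.1 * np.eye(6)                # SPD test matrix
    X0 = spd_initial_iterate(A)
    print(np.linalg.norm(np.eye(6) - X0 @ A, 2))  # improved initial residual, < 1

Because 1 − p equioscillates on (σ_n, σ₁), this two-parameter start strictly beats the one-parameter start X₀ = α₀A whenever σ_n > 0, which is the source of the extra factor-of-two speedup claimed in the introduction.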