CS 6210 – Matrix Computations Midterm
Lee Gao (lg342)
October 28, 2013
Contents

Exercise 1 – Cancelling Cancellation
    Tests and Validation
Exercise 2 – Consider Sensitivity
    Tests and Validation
Exercise 3 – Factoring Out Diagonals
    Tests and Validation
Exercise 4 – All Of Your Singular Bases...
    Tests and Validation
Exercise 5 – Triangulation
    Tests and Validation
Exercise 6 – Taming Bidiagonals
    Tests and Validation
Listings

1  More Accurate Computation
2  Algorithm to compute f'(0)
3  Algorithm to compute the LU from the Cholesky
4  Algorithm to minimize the Frobenius norm
5  Shortest code to minimize the Frobenius norm
6  Trilaterating One Point
7  mtrecover
8  Row sum of the inverse of a bidiagonal
9  Computing the Condition Number in O(n) Time
Exercise 1 – Cancelling Cancellation
Consider the function $f(x) = \sqrt{1+x} - \sqrt{1-x}$ defined on $[-1, 1]$. Give an example that illustrates that the following naive Matlab algorithm can have poor relative accuracy in floating point.

function f = mtbadf(x)
  f = sqrt(1+x) - sqrt(1-x)
Write an alternate formulation mtgoodf that has good relative accuracy (relative error of a few machine epsilon) over the entire range $[-1, 1]$.

Let's do a simple $\delta$ error analysis of this method. We can break this down into somewhat more manageable components. Suppose that the square root function introduces at most $\delta = O(\epsilon)$ relative error; then

\[
\begin{aligned}
\mathrm{fl}\left(\sqrt{1+x}\right) &= \sqrt{\mathrm{fl}(1+x)}\,(1+\delta_1) \\
&= \sqrt{(1 + x + x\delta_x)(1+\delta_2)}\,(1+\delta_1) \\
&= \sqrt{1+x}\,\sqrt{1 + \frac{x\delta_x}{1+x}}\,\sqrt{1+\delta_2}\,(1+\delta_1) \\
&\approx \sqrt{1+x}\,\Bigg(1 + \underbrace{\frac{1}{2}\left(\frac{x\delta_x}{1+x} + \delta_2\right) + \delta_1}_{\delta_+}\Bigg)
\end{aligned}
\]

likewise

\[
\mathrm{fl}\left(\sqrt{1-x}\right) \approx \sqrt{1-x}\,\Bigg(1 + \underbrace{\frac{1}{2}\left(\frac{-x\delta_x}{1-x} + \delta_2'\right) + \delta_1'}_{\delta_-}\Bigg)
\]

therefore

\[
\begin{aligned}
\mathrm{fl}\left(\sqrt{1+x} - \sqrt{1-x}\right) &= \left(\mathrm{fl}\left(\sqrt{1+x}\right) - \mathrm{fl}\left(\sqrt{1-x}\right)\right)(1+\delta') \\
&\approx \left(\sqrt{1+x}\,(1+\delta_+) - \sqrt{1-x}\,(1+\delta_-)\right)(1+\delta') \\
&= \left(\sqrt{1+x} - \sqrt{1-x}\right)\left(1 + \frac{\sqrt{1+x}\,\delta_+ - \sqrt{1-x}\,\delta_-}{\sqrt{1+x} - \sqrt{1-x}}\right)(1+\delta') \\
&\approx \left(\sqrt{1+x} - \sqrt{1-x}\right)\Bigg(1 + \underbrace{\frac{\sqrt{1+x}\,\delta_+ - \sqrt{1-x}\,\delta_-}{\sqrt{1+x} - \sqrt{1-x}} + \delta'}_{\hat\delta}\Bigg)
\end{aligned}
\]
Unfortunately, it seems that this method will incur a large amount of relative error whenever the computed value itself is expected to be small: this will occur as $x \to 0$. Now, it seems that

\[ \lim_{x\to 0} \delta_+ = \frac{\delta_2}{2} + \delta_1 \]
and $\lim_{x\to 0} \sqrt{1 \pm x} = 1$; but since

\[ \sqrt{1+x} - \sqrt{1-x} \approx 1 + \frac{x}{2} - \left(1 - \frac{x}{2}\right) = x, \]

then

\[ \lim_{x\to 0} \hat\delta = \frac{\delta_+ - \delta_-}{x} + \delta' = O\!\left(\frac{\epsilon}{x}\right). \]

In fact, recall that the actual computed value is asymptotic to the function $\lambda x.\,x$, so we can plot $\sqrt{1+x} - \sqrt{1-x}$ versus just $x$ to find that the computed value looks very misbehaved around $x = 10^{-15}$.

[Figure 1: $\sqrt{1+x} - \sqrt{1-x}$ misbehaves for $x \to 0$]
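For reference, a plot like figure 1 can be regenerated with a quick sketch along these lines:

% Sketch: near x ~ 1e-15 the naive value follows a staircase
% rather than the asymptote f(x) ~ x.
x = linspace(0, 4e-15, 401);
plot(x, sqrt(1+x) - sqrt(1-x), x, x);
xlabel('x'); legend('naive formula', 'x');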
The problem here is that cancellation is occurring when we subtract out the two terms, which may cause multiplicative accumulation of roundoff. We know that division will only accumulate roundoff additively, so if we can rewrite the entire computation such that the point at which our value gets reduced comes from a division instead, we might be saved! It turns out that we can do just this by a little bit of clever algebra:

\[
\begin{aligned}
\sqrt{1+x} - \sqrt{1-x}
&= \left(\sqrt{1+x} - \sqrt{1-x}\right)\frac{\sqrt{1+x} + \sqrt{1-x}}{\sqrt{1+x} + \sqrt{1-x}} \\
&= \frac{(1+x) - (1-x)}{\sqrt{1+x} + \sqrt{1-x}} \\
&= \frac{2x}{\sqrt{1+x} + \sqrt{1-x}}
\end{aligned}
\]
Now, we know that the numerator will be computed as $2x(1 + \delta_\times + \delta_x)$, while the denominator will be

\[
\begin{aligned}
\mathrm{fl}\left(\sqrt{1+x} + \sqrt{1-x}\right) &= \left(\sqrt{1+x}\,(1+\delta_+) + \sqrt{1-x}\,(1+\delta_-)\right)(1+\delta') \\
&= \left(\sqrt{1+x} + \sqrt{1-x}\right)\left(1 + \frac{\sqrt{1+x}\,\delta_+ + \sqrt{1-x}\,\delta_-}{\sqrt{1+x} + \sqrt{1-x}}\right)(1+\delta') \\
&\approx \left(\sqrt{1+x} + \sqrt{1-x}\right)\left(1 + \frac{\sqrt{1+x}\,\delta_+ + \sqrt{1-x}\,\delta_-}{\sqrt{1+x} + \sqrt{1-x}} + \delta'\right)
\end{aligned}
\]

so the final relative error of the computation ought to be just

\[
\hat\delta = \delta_\times + \delta_x - \frac{\sqrt{1+x}\,\delta_+ + \sqrt{1-x}\,\delta_-}{\sqrt{1+x} + \sqrt{1-x}} - \delta'.
\]

Let's bound this error on the interval between $-1$ and $1$! We know that each $|\delta| \le \epsilon$; furthermore, the denominator $f(x) = \sqrt{1+x} + \sqrt{1-x}$ is even because

\[ f(-x) = \sqrt{1-x} + \sqrt{1+x} = f(x), \]

and it's monotone decreasing on the interval $[0, 1]$, so we can compute its infimum as $\sqrt{2}$. Hence

\[
|\hat\delta| \le 3\epsilon + \frac{\left|\sqrt{1+x}\,\delta_+\right| + \left|\sqrt{1-x}\,\delta_-\right|}{\sqrt{1+x} + \sqrt{1-x}} \le 3\epsilon + \frac{\mathrm{err}_+ + \mathrm{err}_-}{\sqrt{2}}
\]

where the errs denote the absolute errors of the roundoffs. Let's try to bound these as well!

\[
\begin{aligned}
\mathrm{err}_+ &= \left|\mathrm{fl}\left(\sqrt{1+x}\right) - \sqrt{1+x}\right| \\
&\le \sqrt{(1 + x + x\epsilon)(1+\epsilon)}\,(1+\epsilon) - \sqrt{1+x} \\
&\approx \sqrt{(1 + x + x\epsilon)(1+\epsilon)(1+2\epsilon)} - \sqrt{1+x} \\
&\approx \sqrt{1 + x + 4x\epsilon + 3\epsilon} - \sqrt{1+x} \\
&< \sqrt{1 + x + 7\epsilon} - \sqrt{1+x} \\
&= \frac{7\epsilon}{\sqrt{1 + x + 7\epsilon} + \sqrt{1+x}}
\end{aligned}
\]

The denominator $\sqrt{1 + x + 7\epsilon} + \sqrt{1+x}$ is obviously monotone increasing, so the supremum of the fraction must occur at the infimum of the denominator, at $x = -1$:

\[ \mathrm{err}_+ \le \frac{7\epsilon}{\sqrt{7\epsilon}} = \sqrt{7\epsilon} \]

which is quite reasonable! (It seems from plotting sqrt(1+x) around $x \to -1$ that the maximal absolute error is much better, on the order of machine epsilon, than just $\sqrt{\epsilon}$.) Using the same analysis, we can see that the original computation also behaves similarly at the extremal regions of the interval, so we should just use the new method throughout the entire interval.
Listing 1: More Accurate Computation

function f = mtgoodf(x)
  f = 2.*x ./ (sqrt(1+x) + sqrt(1-x));
end
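As a quick point check (a sketch): for tiny $x$ we have $f(x) = x + O(x^3)$, so the result should agree with $x$ in nearly all digits.

% Sketch: compare the two formulas at a tiny argument.
x = 1e-15;
bad  = mtbadf(x)   % cancellation: only the leading digit or so is right
good = mtgoodf(x)  % agrees with x to a few eps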
Tests and Validation

We've already validated that this formula is equivalent to the original formula on $\mathbb{R}$. Let's just now check to see that this form is well-conditioned everywhere (by testing it on a bunch of intervals), which seems to be the case according to figure 2.

[Figure 2: mtgoodf(x) for x around −1, 0, and 1 respectively.]

Exercise 2 – Consider Sensitivity

Suppose $PM = LU$ for a square matrix $M \in \mathbb{R}^{n\times n}$, and let $f(s)$ be the $(1,1)$ entry of $(M + sE)^{-1}$, where $E \in \mathbb{R}^{n\times n}$ is a given fixed matrix. Finish the following routine to compute $f'(0)$. Your code should take $O(n^2)$ time.

function [dfds] = mtderiv(L,U,P,E)

Let $F(s) = (M + sE)^{-1}$; recall that

\[
\frac{d}{ds}\left(F F^{-1}\right) = 0
\implies F' F^{-1} + F\,\frac{dF^{-1}}{ds} = 0
\implies \frac{dF^{-1}}{ds} = -F^{-1} F' F^{-1},
\]

so, applying the same identity with the roles of $F$ and $F^{-1}$ interchanged,

\[
F'(s) = -F(s)\,\frac{d(M + sE)}{ds}\,F(s) = -F(s)\,E\,F(s)
\]

and $F'(0) = -F(0)EF(0) = -M^{-1}EM^{-1}$. Now, the operation of taking the $(1,1)$ entry turns out to be a linear operation in the sense that $f(s) = e_1^T F(s) e_1$; furthermore, the differentiation operation is preserved under linear transformations, so that

\[
f'(s) = \frac{d\left(e_1^T F(s) e_1\right)}{ds} = e_1^T\,\frac{d(F(s) e_1)}{ds} = e_1^T F'(s) e_1
\]

so

\[
f'(0) = -e_1^T M^{-1} E M^{-1} e_1 = -\left(M^{-T} e_1\right)^T E \left(M^{-1} e_1\right).
\]

Now, in order to accomplish this in $O(n^2)$ time, we need a way to solve $Mx = y$ in $O(n^2)$ time (the PLU decomposition!) and a way of ordering the products so that we never do a matrix-matrix product. Suppose that $a = M^{-T} e_1$ and $b = M^{-1} e_1 \in \mathbb{R}^{n\times 1}$; then there are two orderings of the product: $(a^T E)b$ and $a^T (Eb)$, both of which involve one $O(n^2)$ matrix-vector product and another $O(n)$ inner product.
Finally, let's figure out how to compute $a$ and $b$. We know that

\[
PM = LU \implies M = P^T L U \implies M^{-1} = U^{-1} L^{-1} P, \qquad M^{-T} = P^T L^{-T} U^{-T},
\]

so we can use the following algorithm

Listing 2: Algorithm to compute f'(0)

function [dfds] = mtderiv(L, U, P, E)
  e1 = zeros(length(L), 1);    % first column of the identity
  e1(1) = 1;
  a = P' * (L' \ (U' \ e1));   % a = M^{-T} e1
  b = U \ (L \ (P * e1));      % b = M^{-1} e1
  dfds = -a' * E * b;
end
The computation of each of $a$ and $b$ involves two triangular solves and one matrix-vector product (the permutation), so that's three $O(n^2)$ operations each. The computation of dfds requires one vector-matrix product a'*E at $O(n^2)$ time and another inner product at $O(n)$, so the total asymptotic time is just $O(n^2)$.
Tests and Validation

Verifying that the derivative is computed correctly is a little difficult to do. What we can do is a quick finite-difference approximation, testing that the relative difference between the computed derivative and the finite difference is small. Let's approximate

\[ f'(0) \approx \frac{f(\epsilon) - f(-\epsilon)}{2\epsilon}, \]

which should have an error on the order of $\epsilon^2$. We can test this using the script
eps = 1e-6;
for k = 10:300
  M = rand(k); E = rand(k);
  [L, U, P] = lu(M);
  f0 = mtderiv(L, U, P, E);              % analytic derivative
  F = @(s) inv(M + s*E);
  f0_fd = (F(eps) - F(-eps)) / (2*eps);  % finite difference (matrix)
  f0_fd = f0_fd(1,1);
  relerr = abs((f0_fd - f0) / f0);
  if relerr > 1e-4 && cond(M) < 1e3
    relerr
  end
end
which seems to say that the finite difference is close (regardless of the size k) whenever the condition number of M is low.
Exercise 3 – Factoring Out Diagonals

Suppose $A$ is symmetric positive definite and the Cholesky factorization $A = R^T R$ is given. Write a function to compute an LU factorization of $A$ in $O(n^2)$ time.

function [L,U] = mtcholtolu(R)

If $A = R^T R$, the factorization looks something like

\[
A = \begin{pmatrix}
\times & & & \\
\times & \times & & \\
\times & \times & \times & \\
\times & \times & \times & \times
\end{pmatrix}
\begin{pmatrix}
\times & \times & \times & \times \\
 & \times & \times & \times \\
 & & \times & \times \\
 & & & \times
\end{pmatrix}
\]

However, what we really want is to make sure that the lower triangular matrix's diagonal is inhabited by 1 only:

\[
A = \begin{pmatrix}
1 & & & \\
a & 1 & & \\
a & a & 1 & \\
a & a & a & 1
\end{pmatrix}
\begin{pmatrix}
b & b & b & b \\
 & b & b & b \\
 & & b & b \\
 & & & b
\end{pmatrix}
\]

It seems that we can scale each column of $R^T$ by some factor so the diagonal elements are ones: this turns out to just be right-multiplication of $R^T$ by the inverse of the diagonal, compensated by left-multiplying $R$ by the diagonal:

\[
A = \underbrace{R^T D^{-1}}_{L}\ \underbrace{D R}_{U}
\]

So suppose $D = D^T$ is the diagonal of $R$; then

\[ L = R^T D^{-1} = (D^{-1} R)^T, \qquad U = D R. \]

Now, it's easy to verify that $D^{-1}$ is just the element-wise reciprocal of the diagonal of $D$, so it can be computed in $O(n)$ time. Unfortunately, matrix-matrix multiplication is $O(n^3)$, but multiplying by a diagonal matrix equates to just scaling each row (or column) of the matrix by the corresponding diagonal element, which can be done in $O(n^2)$ time. In fact, we just need to represent $D$ as a column vector $d$ of the diagonal elements, which gives the algorithm:

Listing 3: Algorithm to compute the LU from the Cholesky
function [L,U] = mtcholtolu(R)
  d = diag(R);
  L = R'; U = R;
  for k = 1:length(R)
    L(:,k) = R(k,:)' / d(k);
    U(k,:) = R(k,:) * d(k);
  end
end
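As a design note, the same scaling can be written without the loop; here's a sketch using bsxfun so that no diagonal matrix is ever formed, keeping the cost at $O(n^2)$:

% Sketch: loop-free diagonal scaling.
d = diag(R);
L = bsxfun(@rdivide, R', d');  % scale column k of R' by 1/d(k), i.e. R'*inv(D)
U = bsxfun(@times, R, d);      % scale row k of R by d(k), i.e. D*R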
Tests and Validation

This is relatively easy to test. We just need to ensure that $\|LU - R^T R\|$ is tiny, that $L$ and $U$ have the required structures, and that the diagonal of $L$ is all ones. We can use the script

for k = 10:200
  A = k*rand(k); A = A'*A;
  R = chol(A);
  [L, U] = mtcholtolu(R);
  assert(norm(L*U - A)/norm(A) < 1e-10);
  assert(norm(diag(L) - ones(k,1)) < sqrt(k)*1e-10);
  assert(norm(tril(L) - L)/norm(A) < 1e-10);
  assert(norm(triu(U) - U)/norm(A) < 1e-10);
end
which seems to think that we're okay.

Exercise 4 – All Of Your Singular Bases...

Suppose $A \in \mathbb{R}^{n\times n}$, and let $C \in \mathbb{R}^{n\times k}$ with $k < n$. Assuming that $C$ is not rank deficient, find $B \in \mathbb{R}^{k\times n}$ to minimize $\|A - CB\|_F$.

function [B] = mtfindB(A,C)

From the outside, this looks really similar to a least squares problem of the form

\[ \min_x \|y - Cx\|_2^2 \]

and as we will see, this does actually reduce to it. However, there's a bit of a gap between minimizing the Frobenius norm and the least squares problem. Now, let's say that $B = X^T$, where $X \in \mathbb{R}^{n\times k}$; then our problem boils down to

\[ \min_X \|A - CX^T\|_F^2 \]

I will be using the notation $x_k$ and $x_k'$ to denote the $k$th column and row of $X$ respectively, so

\[
X = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} = \begin{pmatrix} x_1' \\ x_2' \\ x_3' \end{pmatrix}
\]

Now, let's begin by finding another expression for the objective:

\[
\begin{aligned}
\|A - CX^T\|_F^2 &= \operatorname{tr}\left(\left(A - CX^T\right)^T \left(A - CX^T\right)\right) \\
&= \operatorname{tr}\left(A^T A - X C^T A - A^T C X^T + X C^T C X^T\right) \\
&= \operatorname{tr}(A^T A) - 2\operatorname{tr}(A^T C X^T) + \operatorname{tr}(X C^T C X^T).
\end{aligned}
\]
Now, let's try to figure out what $\operatorname{tr}(ABC)$ is. Writing $a_i'$ for the rows of $A$, $b_i$ for the columns of $B$, and $c_i$ for the columns of $C$:

\[
\operatorname{tr}(ABC) = \operatorname{tr}((AB)C)
= \operatorname{tr}\left(\underbrace{\begin{pmatrix} a_1' b_1 & a_1' b_2 & a_1' b_3 \\ a_2' b_1 & a_2' b_2 & a_2' b_3 \\ a_3' b_1 & a_3' b_2 & a_3' b_3 \end{pmatrix}}_{=D}\,C\right)
= \sum_i d_i' c_i
\]

where

\[ d_i' = \begin{pmatrix} a_i' b_1 & a_i' b_2 & a_i' b_3 \end{pmatrix} = a_i' B \]

so

\[ \operatorname{tr}(ABC) = \sum_i a_i' B c_i. \]

Using this formula, we can substitute the desired $A$, $B$, $C$ to find that

\[ \operatorname{tr}(A^T C X^T) = \sum_i a_i^T C x_i'^T = \sum_i a_i^T C b_i \]

where $b_i = x_i'^T$ is the $i$th column of $B = X^T$. Next, let's try to figure out what the trace of $A^T A$ is, since the $X C^T C X^T$ term reduces to this anyway:

\[
\operatorname{tr}(A^T A) = \operatorname{tr}\left(\begin{pmatrix} a_1^T \\ a_2^T \\ a_3^T \end{pmatrix}\begin{pmatrix} a_1 & a_2 & a_3 \end{pmatrix}\right) = \sum_i a_i^T a_i.
\]

Now, let's suppose that $E = C X^T$; then $\operatorname{tr}(E^T E) = \sum_i e_i^T e_i$, where the columns of $E$ are $e_i = C b_i$, and we know that

\[ \operatorname{tr}(X C^T C X^T) = \sum_i b_i^T C^T C b_i. \]

Anyways, we can put this together to find that

\[ \|A - CB\|_F^2 = \sum_i \left( a_i^T a_i - 2\,a_i^T C b_i + b_i^T C^T C b_i \right). \]

But wait, that inner expression looks really familiar! Since I was hoping that this connected to the least squares problem, it took little to figure out why. Let's look at the expansion of the expression we're trying to minimize in least squares:

\[ \|y - Cx\|_2^2 = (y - Cx)^T (y - Cx) = y^T y - 2\,y^T C x + x^T C^T C x. \]

Therefore, if we just replace $y = a_i$ and $x = b_i$, we find that

\[ \|A - CB\|_F^2 = \sum_i \|a_i - C b_i\|_2^2. \]
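Before leaning on this identity, here's a quick numerical sanity check of it (a sketch with random data):

% Sketch: ||A - C*B||_F^2 should equal the sum of the squared
% column-wise residuals ||a_i - C*b_i||_2^2.
n = 50; k = 10;
A = rand(n); C = rand(n, k); B = rand(k, n);
lhs = norm(A - C*B, 'fro')^2;
rhs = sum(arrayfun(@(i) norm(A(:,i) - C*B(:,i))^2, 1:n));
assert(abs(lhs - rhs)/lhs < 1e-12);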
One of the first things to notice is that the 2-norm is always nonnegative, so we should theoretically be able to let the min and the sum commute with respect to each other.

Theorem (Distributivity of min).

\[ \min_B \sum_i \|a_i - C b_i\|_2^2 = \sum_i \min_{b_i} \|a_i - C b_i\|_2^2 \]

Proof. Suppose not; then there must be some other $\hat B$ such that

\[ \sum_i \|a_i - C \hat b_i\|_2^2 < \sum_i \|a_i - C b_i\|_2^2 \]

and in particular there must exist some $i$ such that $\|a_i - C \hat b_i\|_2^2 < \|a_i - C b_i\|_2^2$ (because if not, then the sum on the left cannot be less than the sum on the right). But that means $\hat b_i$ is a better minimizer than $b_i$ for the system $\min_x \|a_i - Cx\|_2^2$, which is absurd since $b_i$ is the best minimizer. Therefore, the theorem holds.
This is awesome, because we already have great algorithms to solve the system

\[ \min_{b_i} \|a_i - C b_i\|_2^2; \]

in fact, we can do this in one line in Matlab: B(:,i) = C\A(:,i), or, for all of the columns at once, B = C\A, because the above becomes equivalent to

\[ B = C^\dagger A = (C^T C)^{-1} C^T A. \]

But let's be a little bit more cautious (rather than using C\A) and implement the algorithm

Listing 4: Algorithm to minimize the Frobenius norm.

function [B] = mtfindB(A, C)
  [n, k] = size(C);
  B = zeros(k, n);
  for i = 1:length(A)
    B(:,i) = C \ A(:,i);
  end
end

Listing 5: Shortest code to minimize the Frobenius Norm.

function [B] = mtfindB(A, C)
  B = C \ A;
end
Tests and Validation

This is yet another problem that is tricky to test, mostly because we don't really have a way to know whether we are really minimizing the Frobenius norm. There are a few properties we can count on, however, one of which is the property that

\[ \|A\|_F^2 = \sum_i \sigma_i^2 \]

where the $\sigma_i$ are the singular values of $A$. Now, suppose that $A = U \Sigma V^T$, with the $\sigma_i$ ordered from highest to lowest; then the best rank-$k$ approximation of $A$ (in the Frobenius / sum of squared singular values sense) is $\sum_{i=1}^{k} u_i \sigma_i v_i^T$, leaving behind the tail $\sum_{i=k+1}^{n} u_i \sigma_i v_i^T$. Alternatively,

\[ \min_{\operatorname{rank}(D) = k} \|A - D\|_F^2 = \sum_{i=k+1}^{n} \sigma_i^2. \]

So we would expect that if we gave $C = \hat U \in \mathbb{R}^{n\times k}$, the first $k$ columns of $U\Sigma$, then the best minimizer would make $CB = \sum_{i=1}^{k} \sigma_i u_i v_i^T$, i.e. $B^T = V(:,1\!:\!k)$, so we could write a script that just tests for this property.
for i = 1:100
  kn = sort(randi([1 100], 2, 1));
  k = kn(1); n = kn(2);
  A = rand(n);
  [U, S, V] = svd(A);
  C = U*S; C = C(:,1:k);
  B = mtfindB(A, C);
  assert(norm(B' - V(:,1:k)) < 1e-10);
end
Furthermore, we would also assume that this minimizer is a local minimum (because the Frobenius norm of a linear function is convex), so we would expect that if we were to perturb the components of B (by something relatively small, but not so small that we get into the murky realm of round-off), then the resulting residual Frobenius norm would be larger. This is also an easy property to test, resulting in the following script
for i = 1:100
  kn = sort(randi([1 20], 2, 1));
  k = kn(1); n = kn(2);
  A = rand(n); C = rand(n, k);
  B = mtfindB(A, C);
  min_norm = norm(A - C*B, 'fro');
  % perturb each entry of B and see what direction we move
  for a = 1:n
    for b = 1:k
      eps = 1e-5;
      Bp = B; Bp(b,a) = B(b,a) + eps;
      new_norm = norm(A - C*Bp, 'fro');
      assert(new_norm >= min_norm);
    end
  end
end
Both of these seem to pass, which lends credit to our method.

Exercise 5 – Triangulation¹

Let $(x_i, y_i)$ for $i = 1, \cdots, n$ be the coordinates of $n$ points in the plane. Let $D \in \mathbb{R}^{n\times n}$ be a matrix of squared Euclidean distances, i.e. $d_{ij} = (x_i - x_j)^2 + (y_i - y_j)^2$. Supposing the first three points are not co-linear, write a function to recover all the points from the distance matrix and the coordinates of those points.

function [x,y] = mtrecover(D, x1, y1, x2, y2, x3, y3)

¹ actually, according to Wikipedia, this is called trilateration
Let's start simple and work out how to recover one point given the initial three points. We know that the point of interest $(x, y)$ lies on the intersection of the three circles of radius $\sqrt{d_1}, \sqrt{d_2}, \sqrt{d_3}$ centered on the first three points, which gives the equations:

\[
\begin{aligned}
(x_1 - x)^2 + (y_1 - y)^2 &= d_1 \\
(x_2 - x)^2 + (y_2 - y)^2 &= d_2 \\
(x_3 - x)^2 + (y_3 - y)^2 &= d_3
\end{aligned}
\]

Assuming that the distance matrix isn't poorly formed, this should be a fully-determined system. Now, solving for $x$ and $y$ individually is a little tedious, so let's transform this system. Let $\Delta_x = x_1 - x$ and $\Delta_y = y_1 - y$, then rewrite the above system into

\[
\begin{aligned}
\Delta_x^2 + \Delta_y^2 &= d_1 \\
(\Delta_x + \epsilon_x)^2 + (\Delta_y + \epsilon_y)^2 &= d_2 \\
(\Delta_x + \epsilon_x')^2 + (\Delta_y + \epsilon_y')^2 &= d_3
\end{aligned}
\]

where $\Delta_x + \epsilon_x = x_2 - x \implies \epsilon_x = x_2 - x_1$ and $\epsilon_x' = x_3 - x_1$, and similarly for $\epsilon_y$ and $\epsilon_y'$. Consider the second of the three equations above; if we expand out the left-hand quadratic, we get

\[ (\Delta_x^2 + 2\epsilon_x\Delta_x + \epsilon_x^2) + (\Delta_y^2 + 2\epsilon_y\Delta_y + \epsilon_y^2) \]

from which we can subtract out the squared $\Delta$s:

\[
\begin{aligned}
(\Delta_x^2 + 2\epsilon_x\Delta_x + \epsilon_x^2) + (\Delta_y^2 + 2\epsilon_y\Delta_y + \epsilon_y^2) - (\Delta_x^2 + \Delta_y^2) &= d_2 - d_1 \\
2\epsilon_x\Delta_x + 2\epsilon_y\Delta_y &= d_2 - d_1 - (\epsilon_x^2 + \epsilon_y^2) \\
2\epsilon_x'\Delta_x + 2\epsilon_y'\Delta_y &= d_3 - d_1 - (\epsilon_x'^2 + \epsilon_y'^2)
\end{aligned}
\]

Why, this is just a linear system!

\[
2\begin{pmatrix} \epsilon_x & \epsilon_y \\ \epsilon_x' & \epsilon_y' \end{pmatrix}\Delta
= \begin{pmatrix} d_2 - d_1 - (\epsilon_x^2 + \epsilon_y^2) \\ d_3 - d_1 - (\epsilon_x'^2 + \epsilon_y'^2) \end{pmatrix}
\]

which can be solved really easily, as

\[
\begin{pmatrix} \epsilon_x & \epsilon_y \\ \epsilon_x' & \epsilon_y' \end{pmatrix}^{-1}
= \frac{1}{\epsilon_x \epsilon_y' - \epsilon_x' \epsilon_y}
\begin{pmatrix} \epsilon_y' & -\epsilon_y \\ -\epsilon_x' & \epsilon_x \end{pmatrix}
\]

We can implement this as a sub-function

Listing 6: Trilaterating One Point
function [x, y] = trilaterate(d1, d2, d3, x1, y1, x2, y2, x3, y3)
  ex = [x2 - x1; x3 - x1];   % epsilon_x, epsilon_x'
  ey = [y2 - y1; y3 - y1];   % epsilon_y, epsilon_y'
  b = [d2 - (ex(1)^2 + ey(1)^2); d3 - (ex(2)^2 + ey(2)^2)] - d1;
  D = ([ex ey] \ b) / 2;     % D = [Delta_x; Delta_y]
  x = x1 - D(1); y = y1 - D(2);
end
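As a quick usage check (a hand-worked sketch): the point $(2, 1)$ has squared distances $5$, $2$, and $4$ to the anchors $(0,0)$, $(1,0)$, and $(0,1)$, and trilaterate recovers it.

% Sketch: recover (2, 1) from its squared distances to three anchors.
[x, y] = trilaterate(5, 2, 4, 0, 0, 1, 0, 0, 1);
% expect x == 2 and y == 1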
and finally, we can just find the rest of the points iteratively

Listing 7: mtrecover

function [x, y] = mtrecover(D, x1, y1, x2, y2, x3, y3)
  x = [x1; x2; x3]; y = [y1; y2; y3];
  for i = 4:length(D)
    [xi, yi] = trilaterate(D(1,i), D(2,i), D(3,i));
    x = [x; xi]; y = [y; yi];
  end

  function [x, y] = trilaterate(d_1, d_2, d_3)
    ex = [x2 - x1; x3 - x1];
    ey = [y2 - y1; y3 - y1];
    b = [d_2 - (ex(1)^2 + ey(1)^2); d_3 - (ex(2)^2 + ey(2)^2)] - d_1;
    d = ([ex ey] \ b) / 2;
    x = x1 - d(1); y = y1 - d(2);
  end
end
Tests and Validation

In order to test this method, we can generate a random set of $x$ and $y$ coordinates, compute $D$, and check if the recovered $x$ and $y$ vectors match up with what we sent in.

Digression: how to compute $D$ really inefficiently. Recall that $d_{ij} = (x_i - x_j)^2 + (y_i - y_j)^2$. Now, we can compute the difference matrix $\Delta x_{ij} = x_i - x_j$ via

\[ \Delta x = x e^T - e x^T, \qquad \Delta y = y e^T - e y^T; \]

of course, this doesn't exactly give us the $(x_i - x_j)^2$ that we want. Let's expand this out:

\[ \Delta x_{ij}^2 = (x_i - x_j)^2 = x_i^2 - 2 x_i x_j + x_j^2 \]

so let's define the operator $x \otimes y = \operatorname{diag}(x)\,y$ to be the component-wise product of $x$ and $y$, and furthermore let $\bar x = x \otimes x$; then

\[ \left(\bar x e^T + e \bar x^T\right)_{ij} = x_i^2 + x_j^2, \qquad \left(x x^T\right)_{ij} = x_i x_j \]

so

\[
\begin{aligned}
\Delta x^2 &= \bar x e^T - 2 x x^T + e \bar x^T \\
\Delta y^2 &= \bar y e^T - 2 y y^T + e \bar y^T \\
D &= \Delta x^2 + \Delta y^2
\end{aligned}
\]

which lends itself to the test script
for i = 1:100
  n = randi([4, 1000]);
  x = rand(n,1); y = rand(n,1); e = ones(n,1);
  D = (x.^2)*e' - 2*x*x' + e*(x.^2)' + (y.^2)*e' - 2*y*y' + e*(y.^2)';
  [xnew, ynew] = mtrecover(D, x(1), y(1), x(2), y(2), x(3), y(3));
  assert(norm(xnew - x) < n*1e-10);
  assert(norm(ynew - y) < n*1e-10);
end
Exercise 6 – Taming Bidiagonals

Suppose $L$ is nonsingular and lower bidiagonal (i.e. $l_{ij}$ is only nonzero for $i = j$ or $i = j + 1$). Write a program to compute $\kappa_\infty(L)$ in $O(n)$ time.

function [condL] = mtcondL(L)

Hint: The Matlab cond function is not O(n)!

Recall that

\[ \kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty. \]

Now, if $L$ is bidiagonal, we can pretty easily compute the maximum row sum (2 additions for each of the $n$ rows). The tricky part is trying to compute the inf norm of $L^{-1}$. Now, $L$ being bidiagonal affords us a lot of nice properties. Unfortunately, inverting it does not preserve $L$'s bidiagonal structure. (It will, however, still remain lower triangular.) Therefore, just storing all of the entries of $L^{-1}$ (never mind computing them) will take $O(n^2)$ time. Can we do better?²

As a first thought, it occurred to me that the row-sum operation is a linear operation, so computing the row sums of $L^{-1}$ is equivalent to computing $L^{-1} e$. One of the nice things about $L$ being lower bidiagonal is that it allows us to solve systems of equations really quickly. In fact, we can do this in $O(n)$ time! Here's how: we're essentially solving the system $Lx = e$, which looks like

\[
\begin{pmatrix}
l_1 & & & \\
r_2 & l_2 & & \\
 & r_3 & l_3 & \\
 & & r_4 & l_4
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
\]

Looking at the sets of equations this generates:

\[
\begin{aligned}
l_1 x_1 &= 1 \\
r_2 x_1 + l_2 x_2 &= 1 \\
r_3 x_2 + l_3 x_3 &= 1
\end{aligned}
\]

² Well, the problem statement obviously hints that we can...
\[ \vdots \]

so

\[ x_1 = l_1^{-1}, \qquad x_k = \frac{1 - r_k x_{k-1}}{l_k}. \]

We can implement this as the subfunction

Listing 8: Row sum of the inverse of a bidiagonal
function [x] = invrowsum(L)
  x = [L(1,1)^-1];
  for k = 2:length(L)
    x = [x; (1 - L(k,k-1)*x(k-1)) / L(k,k)];
  end
end
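As a quick check (a sketch): the recurrence should agree with solving $Lx = e$ by backslash.

% Sketch: invrowsum(L) should match the row sums of inv(L), i.e. L\e.
n = 8;
L = diag(2 + rand(n,1)) + diag(rand(n-1,1), -1);  % random lower bidiagonal
assert(norm(invrowsum(L) - L\ones(n,1)) < 1e-12);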
It turns out that this isn't quite what we're looking for: we want the absolute row sum, not just the row sum. :( We're so close!

At this point, I decided to explore around with bidiagonal matrices. In particular, I was interested in the algebraic structure of $L$'s inverse (since we fill in nearly $n^2/2$ of it by inverting $L$). For the sake of puns, let's call $L^{-1} = \Gamma$. Now, we have the system $L\Gamma = I$, which looks like

\[
\begin{pmatrix}
l_1 & & & \\
r_2 & l_2 & & \\
 & r_3 & l_3 & \\
 & & r_4 & l_4
\end{pmatrix}
\begin{pmatrix}
\gamma_{11} & & & \\
\gamma_{21} & \gamma_{22} & & \\
\gamma_{31} & \gamma_{32} & \gamma_{33} & \\
\gamma_{41} & \gamma_{42} & \gamma_{43} & \gamma_{44}
\end{pmatrix}
=
\begin{pmatrix}
1 & & & \\
 & 1 & & \\
 & & 1 & \\
 & & & 1
\end{pmatrix}
\]

now, this seems not at all fun to compute, but let's just look at the sets of equations that it generates:

\[
\begin{aligned}
l_1 \gamma_{11} &= 1 \\
r_2 \gamma_{11} + l_2 \gamma_{21} &= 0 \qquad & l_2 \gamma_{22} &= 1 \\
r_3 \gamma_{21} + l_3 \gamma_{31} &= 0 \qquad r_3 \gamma_{22} + l_3 \gamma_{32} = 0 & l_3 \gamma_{33} &= 1 \\
r_4 \gamma_{31} + l_4 \gamma_{41} &= 0 \qquad r_4 \gamma_{32} + l_4 \gamma_{42} = 0 \qquad r_4 \gamma_{33} + l_4 \gamma_{43} = 0 & l_4 \gamma_{44} &= 1
\end{aligned}
\]

Inductively, we can show that

\[ \gamma_{kk} = \frac{1}{l_k}; \]

furthermore, we also see that

\[ r_k \gamma_{k-1,1} + l_k \gamma_{k1} = 0 \]
and, inductively, for all $j < k$,

\[ r_k \gamma_{k-1,j} + l_k \gamma_{kj} = 0. \]

We can compute every gamma with the recurrence

\[
\gamma_{kj} =
\begin{cases}
0 & j > k \\
l_k^{-1} & j = k \\
-\dfrac{r_k}{l_k}\,\gamma_{k-1,j} & j < k
\end{cases}
\]

and

\[
\Gamma =
\begin{pmatrix}
l_1^{-1} & & & \\
-\dfrac{r_2}{l_1 l_2} & l_2^{-1} & & \\
\dfrac{r_2 r_3}{l_1 l_2 l_3} & -\dfrac{r_3}{l_2 l_3} & l_3^{-1} & \\
\vdots & & & \ddots
\end{pmatrix}
\]

Now, we wanted the absolute row sums, i.e. the row sums of $|\Gamma|$. So construct $\hat L$ from $L$ by taking $\hat l_k = |l_k|$ and $\hat r_k = -|r_k|$, and write $\hat\Gamma = \hat L^{-1}$ with entries $\hat\gamma_{kj}$. The recurrence above only ever multiplies and divides these entries, so $|\hat\gamma_{kj}| = |\gamma_{kj}|$; the lemma below shows that the $\hat\gamma_{kj}$ are furthermore all nonnegative, so the row sums of $\hat\Gamma$ are exactly the absolute row sums of $\Gamma$.

Lemma. For all $k$ and $j$, $\hat\gamma_{kj} \ge 0$.

Proof. We induct on $k$. For the base case $k = 1$, we do a case analysis on $j$:

j > 1 – Here, $\hat\gamma_{kj} = 0$, which shows the case.
j = 1 – Here, $\hat\gamma_{kj} = \hat l_1^{-1}$, which is positive by construction.
j < 1 – Impossible.

The inductive step proceeds as follows. Once again, we do a case analysis on $j$:

j > k + 1 – Here, $\hat\gamma_{k+1,j} = 0$, which shows the case.
j = k + 1 – Here, $\hat\gamma_{k+1,j} = \hat l_{k+1}^{-1} > 0$ by construction, which shows the case.
j < k + 1 – Here, $\hat\gamma_{k+1,j} = -\frac{\hat r_{k+1}}{\hat l_{k+1}}\hat\gamma_{kj}$; now, $\hat r_{k+1} \le 0$ and $\hat l_{k+1} > 0$ by construction, so $-\frac{\hat r_{k+1}}{\hat l_{k+1}} \ge 0$. By the induction hypothesis, we know that $\hat\gamma_{kj} \ge 0$, so $\hat\gamma_{k+1,j} \ge 0$ as well, which shows the case and concludes the proof.

So we can now just transform the matrix $L$ into $\hat L$ (which should take $O(n)$ operations) and then compute $\hat L x = e$, which also takes $O(n)$ operations. But in fact, we can incorporate the transformation $L \to \hat L$ into the solver directly! Recall the recurrence we used to compute $x = \hat L^{-1} e$:

\[ x_1 = \hat l_1^{-1}, \qquad x_k = \frac{1 - \hat r_k x_{k-1}}{\hat l_k}. \]

From the constructive lemma above, we know that $\hat l_k = |l_k|$ and $\hat r_k = -|r_k|$, so we can transform the above directly into

\[ x_1 = |l_1|^{-1}, \qquad x_k = \frac{1 + |r_k| x_{k-1}}{|l_k|}. \]

So we can compute the infinity-norm condition number as
Listing 9: Computing the Condition Number in O(n) Time

function [condL] = mtcondL(L)
  l = abs(diag(L, 0));
  r = [0; abs(diag(L, -1))];
  nL = l(1);        % running row sums of |L|
  nI = l(1)^-1;     % running absolute row sums of inv(L)
  for k = 2:length(l)
    nL(k) = r(k) + l(k);
    nI(k) = (1 + r(k)*nI(k-1)) / l(k);
  end
  condL = max(nL) * max(nI);
end
Tests and Validation

This is pretty easy to test. Construct random lower bidiagonal matrices, then compute the condition number using mtcondL(L) and compare it to the inf-norm condition number computed using cond(L, inf).

for i = 1:100
  n = randi([10, 1000]);
  L = i*rand(n);
  L = diag(diag(L)) + diag(diag(L, -1), -1);
  c1 = mtcondL(L);
  c2 = cond(L, inf);
  assert(abs((c1 - c2) / c2) < 1e-10);
end
this seems to work even when L is ill-conditioned, according to the warnings that Matlab is giving me.