CS 6210 Homework 3

Lee Gao (lg342), October 9, 2013

Exercise 1 Hessenberg LU. A matrix $H \in \mathbb{R}^{n \times n}$ is upper Hessenberg if $H_{ij} = 0$ for $i > j + 1$; that is, anything below the first subdiagonal must be zero. Without using Matlab's lu, write an $O(n^2)$ function to compute $PH = LU$ using Gaussian elimination with partial pivoting. Note that the Schur complements will always remain upper Hessenberg: why? Your function should have the signature

function [P,L,U] = hw3lu(H)

An upper Hessenberg matrix looks like

\[
H = \begin{bmatrix} x & x & x & x \\ x & x & x & x \\ & x & x & x \\ & & x & x \end{bmatrix}.
\]

Now, let's trace through the major steps of transforming this into an upper triangular form; I'll use $\hat{x}$ to mean the entries of $U$:
\[
\begin{bmatrix} x & x & x & x \\ x & x & x & x \\ & x & x & x \\ & & x & x \end{bmatrix}
\to
\begin{bmatrix} x & x & x & x \\ 0 & \hat{x} & \hat{x} & \hat{x} \\ & x & x & x \\ & & x & x \end{bmatrix}
\to
\begin{bmatrix} x & x & x & x \\ 0 & \hat{x} & \hat{x} & \hat{x} \\ & 0 & \hat{x} & \hat{x} \\ & & x & x \end{bmatrix}
\to
\begin{bmatrix} x & x & x & x \\ 0 & \hat{x} & \hat{x} & \hat{x} \\ & 0 & \hat{x} & \hat{x} \\ & & 0 & \hat{x} \end{bmatrix}.
\]

This suggests that each step needs only $O(n)$ operations, which suggests that we can come up with an $O(n^2)$ algorithm here. Let's first consider the pivotless version. The algorithm roughly looks like

Algorithm 1 Pivotless Hessenberg LU

function Hess-LU(A)                          ▷ A is upper Hessenberg
    for k = 1, ..., n − 1 do
        a_{k+1,k} ← a_{k+1,k} / a_{k,k}
        A(k+1, k+1:end) += −a_{k+1,k} × A(k, k+1:end)
    end for
    return L = trilower(A) + I, U = triupper(A)
end function
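Before adding pivoting, here is a minimal Matlab sketch of Algorithm 1 (my own transcription, not part of the original submission; it assumes no zero pivots are encountered, and the name hessLU is made up):

% hessLU.m: pivotless Hessenberg LU sketch; assumes A(k,k) never vanishes.
function [L,U] = hessLU(A)
  n = length(A);
  for k = 1:n-1
    A(k+1,k) = A(k+1,k) / A(k,k);                          % multiplier, stored in place
    A(k+1,k+1:end) = A(k+1,k+1:end) - A(k+1,k)*A(k,k+1:end);
  end
  L = tril(A,-1) + eye(n);
  U = triu(A);
end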


We can do a runtime analysis on this, counting the number of flops performed: we have a for loop over $k = 1, \dots, n-1$, so the runtime will look like
\[
\sum_{k=1}^{n-1} g_k.
\]
Now, the computation of $a_{k+1,k}$ costs a single flop while the update of the $(k+1)$st row costs $2(n-k-1)$ flops, so $g_k = 2n - 2k - 1$, and we get
\[
\sum_{k=1}^{n-1} g_k = (2n-1)(n-1) - n(n-1) = (n-1)^2 = O(n^2) \text{ flops}.
\]
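As a quick arithmetic check of that closed form (my own one-liner, not in the original):

% Check sum_{k=1}^{n-1} (2n - 2k - 1) == (n-1)^2 for a sample n.
n = 10;
sum(2*n - 2*(1:n-1) - 1) == (n-1)^2      % returns logical 1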

Now, we can incorporate partial pivoting by swapping rows at each iteration so that the pivot $a_{kk}$ is always at least as large in magnitude as $a_{k+1,k}$; since everything below the first subdiagonal is already zero, we only need to decide whether to swap the $k$th and $(k+1)$st rows or not. Both of these rows have the same nonzero structure at this point, so swapping will not change the shape of the Schur complement, which will remain upper Hessenberg.

Algorithm 2 Hessenberg LU with Partial Pivoting

function Hess-PLU(A)                         ▷ A is upper Hessenberg
    P ← I
    for k = 1, ..., n − 1 do
        if |a_{k,k}| < |a_{k+1,k}| then      ▷ swap so the pivot is larger
            swap the kth and (k+1)st rows of A and P
        end if
        a_{k+1,k} ← a_{k+1,k} / a_{k,k}
        A(k+1, k+1:end) += −a_{k+1,k} × A(k, k+1:end)
    end for
    return P, L = trilower(A) + I, U = triupper(A)
end function

Here, the only addition is the conditional check for whether to swap or not, so effectively the new cost per step is $\hat{g}_k = g_k + 2 + n$ (the $n$ accounts for the row swap of $A$; we can represent the permutation matrix $P$ as a vector, so swapping its two entries is only $O(1)$), and the final runtime is still $O(n^2)$.

Listing 1: Hessenberg PLU

function [P,L,U] = hw3lu(A)
  n = length(A);
  p = 1:n;
  for k = 1:n-1
    if abs(A(k+1,k)) > abs(A(k,k))
      A([k k+1],:) = A([k+1 k],:);       % swap rows k and k+1
      p([k k+1]) = p([k+1 k]);
    end
    A(k+1,k) = A(k+1,k)/A(k,k);
    A(k+1,k+1:end) = A(k+1,k+1:end) - A(k+1,k)*A(k,k+1:end);
  end
  L = tril(A,-1) + eye(n);
  U = triu(A);
  P = eye(n);
  P = P(p,:);                            % row permutation, so that P*A = L*U
end
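As a quick sanity check (a test script of my own, not part of the original submission), we can verify the factorization on a random upper Hessenberg matrix:

% Build a random upper Hessenberg matrix and check that P*H = L*U.
n = 200;
H = triu(randn(n), -1);                  % keep the first subdiagonal, zero below it
[P,L,U] = hw3lu(H);
fprintf('relative residual: %g\n', norm(P*H - L*U, 'fro')/norm(H, 'fro'));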



Exercise 2 Elementary, upon reflection. Complete the following routine:

function [u] = hw3reflect(v,w)
% Find a reflector H = I - 2*u*u' s.t. w and H*v are parallel.
% Assume v and w are nonzero column vectors of equal length.

Let's boil this down to the 2-D case and see what we would do. Let all hatted vectors $\hat{\cdot}$ be the unit vectors parallel to $\cdot$.

Figure 1: Figuring out what $u$ is. [A 2-D sketch showing $\hat{v}$, $\hat{w}$, the bisecting direction $r = \hat{v} + \hat{w}$, and the normal direction $u$.]

It seems that we can get the reflection axis $r$ by taking the average of $\hat{v}$ and $\hat{w}$. Calling the normalized reflection vector $\hat{r}$, we can find $u$ by finding the component of $\hat{v}$ along $\hat{r}$ and subtracting it off:
\[
u = \hat{v} - \mathrm{proj}_{\hat{r}}\,\hat{v} = \hat{v} - (\hat{r}^T \hat{v})\,\hat{r}
\]
(normalized to unit length), which lends itself to the generalized code

Listing 2: Reflector

function [u] = hw3reflect(v,w)
  v = v/norm(v);
  w = w/norm(w);
  r = v + w;
  r = r/norm(r);
  u = v - (r'*v)*r;
  u = u/norm(u);
end
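To sanity-check this (my own test, not in the original write-up), we can confirm that $Hv$ lines up with $w$ once both are normalized:

% H*v should be (positively) parallel to w.
v = randn(5,1);  w = randn(5,1);
u = hw3reflect(v,w);
Hv = v - 2*u*(u'*v);                     % apply H = I - 2*u*u' without forming it
fprintf('parallel check: %g\n', norm(Hv/norm(Hv) - w/norm(w)));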

Exercise 3 Alternate orthogonalization. Recall that any symmetric positive definite matrix $M$ defines an inner product $\langle x, y \rangle_M = y^T M x$ and an associated norm $\|x\|_M^2 = x^T M x$. We can associate with this an alternate notion of the QR decomposition: (a) Write a Matlab function [W,R] = hw3wr(A,M) that computes the factorization $A = WR$,

$W^T M W = I$.

Hint: Use QR factorization of A and Cholesky factorization of M .



Here's one way of doing this WR factorization. Suppose that $R$ is always invertible, so that $W = AR^{-1}$. Now let $R_M^T R_M = M$ be the Cholesky factorization of $M$, and recall that
\[
\|x\|_M = \|R_M x\|_2 \quad \text{and} \quad \langle WX, WX \rangle_M = X^T X = \langle X, X \rangle_2.
\]
So if $A = WR$, then
\[
\langle A, A \rangle_M = \langle R_M A, R_M A \rangle_2 = \langle R, R \rangle_2.
\]
Let's call this kind of matrix product $[\![A]\!]_\circ = \langle A, A \rangle_\circ$. Since $[\![\cdot]\!]_2$ is invariant under orthogonal transformations, we know that
\[
[\![R_M A]\!]_2 = [\![R]\!]_2 \implies \exists \text{ orthogonal } Q : QR = R_M A.
\]
Fortunately for us, this is also constructive, because the QR decomposition is (essentially) unique: $R$ is exactly the upper triangular factor in the QR decomposition of $R_M A$, where $R_M$ is the Cholesky factor of the SPD matrix $M$. Finally, the equation $[\![W]\!]_M = I$ means that $[\![R_M W]\!]_2 = I$, so $R_M W = Q$ (consistent, since we also have $R_M W R = QR$), and we can find
\[
W = R_M^{-1} Q,
\]
where $R_M$ is guaranteed invertible because $M$ is SPD and hence nonsingular. This then gives the algorithm

Listing 3: WR Decomposition

function [W,R] = hw3wr(A,M)
  Rm = chol(M);
  [Q,R] = qr(Rm*A, 0);
  W = Rm\Q;
end
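As a quick check (my own test script, not in the original), the factors should reproduce $A$ and be $M$-orthonormal:

% Verify A = W*R and W'*M*W = I on a random instance.
n = 8; k = 5;
A = randn(n,k);
B = randn(n);  M = B'*B + n*eye(n);      % a random SPD matrix
[W,R] = hw3wr(A,M);
fprintf('||A - W*R||    = %g\n', norm(A - W*R));
fprintf('||W''*M*W - I|| = %g\n', norm(W'*M*W - eye(k)));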

(b) Write the matrix $M$ such that if $p(x) = \sum_{i=0}^{d} a_i x^i$ and $q(x) = \sum_{j=0}^{d} b_j x^j$ are polynomials, then
\[
a^T M b = \int_{-1}^{1} p(x)\,q(x)\,dx,
\]
where $a = [a_0, \cdots, a_d]$ is the coefficient vector for $p$ and $b$ is the coefficient vector for $q$.

Let $x$ be the vector whose $i$th component is $x^i$; then we know that $p = a^T x$ and $q = x^T b$, so that
\[
\int_{-1}^{1} pq\,dx = \int_{-1}^{1} a^T (x x^T) b\,dx.
\]
Since integration obeys linear superposition and homogeneity, we can push the integral inside the product:
\[
= a^T \left( \int_{-1}^{1} x x^T\,dx \right) b = a^T \left( \int_{-1}^{1} X_d\,dx \right) b,
\]

where $X_d$ is the "shifted Vandermonde matrix" of dimension $d+1$, shifted in the sense that $X_{ij} = x^{i+j-2}$; so in the $3 \times 3$ case
\[
X_2 = \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix} \begin{bmatrix} 1 & x & x^2 \end{bmatrix}
= \begin{bmatrix} 1 & x & x^2 \\ x & x^2 & x^3 \\ x^2 & x^3 & x^4 \end{bmatrix}.
\]
Next, taking the integral componentwise, we find that
\[
M_{ij} = \begin{cases} \frac{2}{i+j-1} & i + j \text{ is even} \\ 0 & \text{otherwise,} \end{cases}
\]
so for example, the $d = 5$ case would be a $6 \times 6$ matrix that looks like
\[
M_6 = \begin{bmatrix}
2 & 0 & \frac{2}{3} & 0 & \frac{2}{5} & 0 \\
0 & \frac{2}{3} & 0 & \frac{2}{5} & 0 & \frac{2}{7} \\
\frac{2}{3} & 0 & \frac{2}{5} & 0 & \frac{2}{7} & 0 \\
0 & \frac{2}{5} & 0 & \frac{2}{7} & 0 & \frac{2}{9} \\
\frac{2}{5} & 0 & \frac{2}{7} & 0 & \frac{2}{9} & 0 \\
0 & \frac{2}{7} & 0 & \frac{2}{9} & 0 & \frac{2}{11}
\end{bmatrix}.
\]
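A small helper to build this $M$ for any degree $d$ (my own sketch; the name legendre_gram is made up):

% legendre_gram.m: Gram matrix of the monomial basis 1, x, ..., x^d on [-1,1].
function M = legendre_gram(d)
  M = zeros(d+1);
  for i = 1:d+1
    for j = 1:d+1
      if mod(i+j, 2) == 0                % integral of x^(i+j-2) is 2/(i+j-1), else 0
        M(i,j) = 2/(i+j-1);
      end
    end
  end
end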

Using the hw3wr command from the first part of the problem, we can compute the normalized Legendre polynomials up to degree 5; these are polynomials $q_k$ such that the degree of $q_k$ is $k$ and
\[
\int_{-1}^{1} q_k q_j\,dx = \begin{cases} 1 & k = j \\ 0 & \text{otherwise.} \end{cases}
\]
Let $q_k = C_k^T x$; then
\[
\int_{-1}^{1} q_k q_j\,dx = \int_{-1}^{1} C_k^T (x x^T) C_j\,dx = C_k^T M C_j.
\]
So letting $C_k$ be the $k$th column of a matrix $C$, this is the same as saying that
\[
C^T M C = I,
\]


where $C$ is upper triangular because $q_k$ has degree $k$. Therefore, we can solve this problem by computing the WR factorization of the identity, that is, by solving the equivalent system
\[
I = CR, \qquad C^T M C = I.
\]
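Concretely (my own invocation sketch, using the hypothetical legendre_gram helper from above; columns may differ from the table below by signs, depending on qr's sign conventions):

% The W factor of the WR factorization of I is exactly C.
M = legendre_gram(5);
[C, R] = hw3wr(eye(6), M);               % C is upper triangular with C'*M*C = I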

I find that
\[
C = \begin{bmatrix}
\frac{\sqrt{2}}{2} & 0 & -\frac{\sqrt{2}\sqrt{5}}{4} & 0 & \frac{9\sqrt{2}}{16} & 0 \\
0 & \frac{\sqrt{2}\sqrt{3}}{2} & 0 & -\frac{3\sqrt{2}\sqrt{7}}{4} & 0 & \frac{15\sqrt{2}\sqrt{11}}{16} \\
0 & 0 & \frac{3\sqrt{2}\sqrt{5}}{4} & 0 & -\frac{45\sqrt{2}}{8} & 0 \\
0 & 0 & 0 & \frac{5\sqrt{2}\sqrt{7}}{4} & 0 & -\frac{35\sqrt{2}\sqrt{11}}{8} \\
0 & 0 & 0 & 0 & \frac{105\sqrt{2}}{16} & 0 \\
0 & 0 & 0 & 0 & 0 & \frac{63\sqrt{2}\sqrt{11}}{16}
\end{bmatrix},
\]

which is the inverse of the Cholesky factor of $M$.

Exercise 4 Refined tastes. Consider the bordered linear system
\[
\begin{bmatrix} A & b \\ b^T & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} f \\ g \end{bmatrix}
\]
where
\[
A = \begin{bmatrix} 1 & 1-h \\ 1-h & 1 \end{bmatrix}, \quad
b = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
f = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad
g = 0.5, \quad c = 1,
\]
and $h = 10^{-14}$. Note that the inverse of the block matrix is just
\[
\begin{bmatrix}
-\frac{1}{(h-1)^2} & -\frac{1}{h-1} & \frac{1}{(h-1)^2} \\
-\frac{1}{h-1} & 0 & \frac{1}{h-1} \\
\frac{1}{(h-1)^2} & \frac{1}{h-1} & -\frac{2h-h^2}{(h-1)^2}
\end{bmatrix}
\]
and the solution ought to be
\[
\begin{bmatrix} x_1 \\ x_2 \\ y \end{bmatrix} =
\begin{bmatrix}
\frac{2h-3}{2(h^2-2h+1)} \\
-\frac{1}{2(h-1)} \\
\frac{(2-h)^2}{2(h-1)^2}
\end{bmatrix}
\approx
\begin{bmatrix} -\frac{3}{2} \\ \frac{1}{2} \\ 2 \end{bmatrix}.
\]

(a) Solve the bordered system both by forming the whole 3-by-3 matrix and solving using Matlab's backslash and by doing 2-by-2 block Gaussian elimination. What are the normwise relative residual errors for the two calculated results?


If we form the three-by-three system directly and solve, we find that the computed solution is approximately $(-1.5, 0.5, 2)$, which is incredibly close to the true solution. Using the Symbolic Toolbox to evaluate the analytical solution of the system, I find that the 2-norm relative error of the solution is approximately $10^{-14}$, which is almost as good as we can hope for. This makes sense: we're solving a slight perturbation of the bordered system
\[
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix},
\]
which is boringly well-behaved, and so our residual is just
\[
\begin{bmatrix} 0 \\ O(10^{-16}) \\ 0 \end{bmatrix}.
\]
Next, let's do the block-level solve, which boils down to the algorithm
\[
y = \frac{g - b^T A^{-1} f}{c - b^T A^{-1} b}, \qquad x = A^{-1}(f - by).
\]
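Here is a sketch of the two solves side by side (my own script, not part of the original write-up; the matrices are as defined above):

% Part (a): whole 3x3 backslash solve vs. block 2x2 elimination.
h = 1e-14;
A = [1 1-h; 1-h 1];  b = [1; 0];  f = [1; -1];  g = 0.5;  c = 1;
K = [A b; b' c];  rhs = [f; g];
z_full = K \ rhs;                        % direct 3x3 solve
y = (g - b'*(A\f)) / (c - b'*(A\b));     % block elimination
x = A \ (f - b*y);
z_block = [x; y];
fprintf('full:  rel residual = %g\n', norm(K*z_full  - rhs)/norm(rhs));
fprintf('block: rel residual = %g\n', norm(K*z_block - rhs)/norm(rhs));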

Computing the residual of the block solve, however, I find that I can get the $f$ part mostly correct, but my residual in $g$ is unwieldy:
\[
\begin{bmatrix} A & b \\ b^T & c \end{bmatrix} \begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix} - \begin{bmatrix} f \\ g \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ -0.0222 \end{bmatrix},
\]
whose 2-norm relative residual is $O(10^{-2})$; this is clearly unacceptable! Let's try to understand what is happening to our problem.

(b) In the block 2-by-2 solve, what is the normwise relative residual for the step involving a solve with A?

The steps involving a solve with $A$ in the computation of $y$ are the computations of $A^{-1}f$ and $A^{-1}b$. Let's first compute the inverse of $A$:
\[
A^{-1} = \frac{1}{2h - h^2} \begin{bmatrix} 1 & h-1 \\ h-1 & 1 \end{bmatrix},
\]
so that
\[
q_1 = A^{-1} b = \frac{1}{2h - h^2} \begin{bmatrix} 1 \\ h-1 \end{bmatrix} \approx \frac{1}{2h} \begin{bmatrix} 1 \\ -1 \end{bmatrix}
\]
and
\[
q_2 = A^{-1} f = A^{-1}(b - e_2)
= \frac{1}{2h - h^2} \left( \begin{bmatrix} 1 \\ h-1 \end{bmatrix} + \begin{bmatrix} 1-h \\ -1 \end{bmatrix} \right)
= \frac{1}{2h - h^2} \begin{bmatrix} 2-h \\ h-2 \end{bmatrix}
\approx \frac{1}{h} \begin{bmatrix} 1 \\ -1 \end{bmatrix}.
\]
So we would expect $q_1 \approx \frac{1}{2} 10^{14} \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $q_2 \approx 10^{14} \begin{bmatrix} 1 \\ -1 \end{bmatrix}$, but Matlab computes them differently:

q1 =
   1.0e+13 *
    5.0040
   -5.0040
q2 =
   1.0e+14 *
    1.0008
   -1.0008

So the error in $q_1$ is $4 \times 10^{10} \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and in $q_2$ is $8 \times 10^{10} \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Therefore, the relative error in both of these computations is



δ = 8 × 10−4 ; this spells trouble for us if we allow cancellation to occur! Fortunately for us, the rest of the computation of y doesn’t alter the value of the numerator nor the denominator much, and so we’re really just computing the relative error of division, which just subtracts out the errors. Since both the numerator and the denominator has 10−4 relative error, the total error gets canceled out (this is really awesome!) and the relative error for the computation of y is just 10−14 . Yay! Next, let’s look at the computation of x. >> x = A\(f-b*y) x = -1.522222222222237 0.522222222222222 so the relative error of x ˆ is approximately

 

−3 1

x

ˆ − 2

1

 

1 −3

2

1 which is significant! 8

2-norm



0.02

Lee Gao

(c) What are the condition numbers of A and of the bordered system respectively? Argue that the relative error in the x computed by block 2-by-2 elimination is consistent with the bounds based on the relative residual and the condition number. The 2-norm condition number of A, according to Matlab at least, is 2 × 1014 , while it is just 6 for the 3-by-3. Now, suppose that the error of the computation of f − yb is within machine precision of /2, then we would expect the error of the final computation to be κA δ ≈ O(10−2 ) which is consistent with the error we saw. (d) For the 2-by-2 block approach, perform one step of iterative refinement on the bordered system. What is the residual error for this new solution? We shall refer to Govaerts and Pryce’s paper, using the refinement process:        ef f A b x ˆ = − eg h bT c yˆ eg − bT A−1 ef c − bT A−1 b ex = A−1 (ef − bey )   ∗     ex x ˆ x + = yˆ ey y∗ ey =

actually computing the errors to compensate: >> ey = (eg - b’*(A\ef))/(c - b’*(A\b)) ey = -2.220446049249299e-16 >> ex = A\(ef - b*ey) ex = 0.022222222222217 -0.022222222222217 and the absolute residual error of the new system is O(10−16 ), which is extremely close!
