CS 6210 Homework 1
Lee Gao (lg342)
September 9, 2013

Exercise 1 A problem of performance. In addition to dense matrices, Matlab supports a sparse matrix data structure, in which only the nonzero elements of the matrix are stored. For a variety of square matrix sizes n and sparsity levels s, compare the speed of dense matrix-vector multiply and sparse matrix-vector multiply. What do you observe about the relative performance of these options?

NOTE: I define s to be the fraction of nonzeros, so s = 1 means fully dense. I vary the sparsity level s linearly between 0.1 and 1 while increasing the size quadratically between 9 and 2500. I used the following code to generate my timing matrices:
ns = 3:1:50;
ns = ns.^2;                              % square matrix sizes, 9 up to 2500
ss = 0.1:0.1:1;                          % sparsity levels (fraction of nonzeros)
time_sparse = zeros(length(ns), length(ss));
time_dense  = zeros(length(ns), length(ss));

for i = 1:length(ns)
    for j = 1:length(ss)
        n = ns(i);
        s = ss(j);
        As = sprand(n, n, s);            % random sparse matrix with density s
        A  = full(As);                   % the same matrix stored densely
        x  = rand(n, 1);
        tic;
        y = A*x;
        time_dense(i, j) = toc;
        tic;
        y = As*x;
        time_sparse(i, j) = toc;
    end
end
I then plotted a 3D scatter plot of the time distribution (figure 1). Unfortunately, the density of the information makes this graph all but useless. From it, we can only glimpse some simple properties:

(a) For the sparse matrices, the operation time seems to grow proportionally to both the matrix size (the 0 → 2500 scale) and the sparsity level;

(b) For the dense matrices, the operation time seems to grow only with the matrix size, and hardly at all with the sparsity level.
Figure 1: 3D scatter plot of the time distribution
We can take a closer look in figure 2. For low sparsity levels, it seems that while both the dense and the sparse operations grow roughly linearly, sparse multiplication fares far better than dense multiplication. However, this advantage tapers off as we increase the sparsity level, as can be seen in the s = 0.9 graph. Furthermore, as figure 3 shows, for matrices of any size the multiplication time for dense matrices is essentially invariant under the sparsity level. The same holds for small sparse matrices, but as we increase the matrix size, the gap between more sparse and less sparse operations becomes drastic: the s = 0.1 operation is about ten times faster than the dense s = 1 operation for 2500 × 2500 matrices!
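A plausible explanation for both trends (this is just a back-of-the-envelope flop count, not a claim about Matlab's internal implementation): a dense matrix-vector product performs about
$$2n^2$$
floating-point operations regardless of how many entries happen to be zero, while a sparse product only touches the stored nonzeros, costing roughly
$$2\,\mathrm{nnz}(A) \approx 2sn^2$$
operations plus some index-handling overhead. This predicts dense times independent of $s$, sparse times growing linearly in $s$, and a sparse advantage that shrinks (and can even reverse because of the indexing overhead) as $s \to 1$, consistent with figures 2 and 3.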
Figure 2: Time vs Matrix Size for various sparsity levels
Figure 3: Time vs Sparsity for various matrix sizes
Exercise 2 Identity plus low-rank. Suppose $u, v \in \mathbb{R}^n$ and $A = I + uv^T$. Write a function to compute $A^k x$ in $O(n)$ time. Your code should have the signature
function [Akx] = p2pow(u, v, k, x)
Note: If c is a scalar, you may assume $c^k$ is computed in $O(1)$ time.

Let's first compute a few of the $A^k$ out for some reasonably sized $k$s (using the binomial theorem, which applies here since $I$ and $uv^T$ commute):
$$A = I + uv^T$$
$$A^2 = I + 2uv^T + uv^T uv^T$$
$$A^3 = I + 3uv^T + 3uv^T uv^T + uv^T uv^T uv^T$$
$$\vdots$$
and inductively
$$A^k = \sum_{i=0}^{k} \binom{k}{i} (uv^T)^i$$
We can write this last equation in a nicer form by exploiting commutativity and grouping together all of the inner products:
$$A^k = I + \sum_{i=1}^{k} \binom{k}{i}\, u (v^T u)^{i-1} v^T$$
Since the inner products are all scalars, we can dissociate them from the outer product. Furthermore, since the outer product is invariant under $i$, we can take it out of the summation:
$$= I + \left( \sum_{i=1}^{k} \binom{k}{i} (v^T u)^{i-1} \right) uv^T$$
Now, to simplify the notation, let $z = u^T v$ and assume that $z \neq 0$ (we will do a separate case analysis for $z = 0$):
$$\sum_{i=1}^{k} \binom{k}{i} z^{i-1} = \frac{\sum_{i=1}^{k} \binom{k}{i} z^{i}}{z} = \frac{\sum_{i=0}^{k} \binom{k}{i} z^{i} - 1}{z}$$
Fortunately for us (how lucky!), the remaining sum is the binomial expansion of $(z+1)^k$, so this equals
$$\frac{(z+1)^k - 1}{z}.$$
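As a quick sanity check (my addition, not part of the original derivation), taking $k = 2$ gives
$$\frac{(z+1)^2 - 1}{z} = \frac{z^2 + 2z}{z} = 2 + z = \binom{2}{1} + \binom{2}{2} z,$$
which agrees with the sum $\sum_{i=1}^{2}\binom{2}{i} z^{i-1}$ term by term.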
Therefore, when $u, v$ are not orthogonal, we have
$$A^k = I + \frac{(u^T v + 1)^k - 1}{u^T v}\, uv^T$$
Otherwise, when $u^T v = 0$, only the $i = 1$ term of the sum survives and we just have
$$A^k = I + \binom{k}{1} uv^T = I + k\, uv^T$$
This gives a really nice algorithm to compute $A^k x$: let $z = u^T v$, then
$$A^k x = \begin{cases} x + \dfrac{(z+1)^k - 1}{z}\, u(v^T x) & \text{if } z \neq 0 \\[1ex] x + k\, u(v^T x) & \text{otherwise} \end{cases} \tag{1}$$
which gives
function [Akx] = p2pow(u, v, k, x)
    z = u'*v;
    coef = k;                        % z == 0 case of equation (1)
    if (z ~= 0)
        coef = ((z + 1)^k - 1) / z;  % z ~= 0 case of equation (1)
    end
    Akx = x + coef * (v'*x) * u;     % only O(n) vector operations
end
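As a quick correctness check (my addition; the brute-force comparison and the particular sizes are illustrative only), we can compare p2pow against explicitly forming $A$ and computing $A^k x$ directly for a small random instance:

% Sanity check: compare p2pow against the explicit dense computation.
n = 50; k = 7;
u = randn(n, 1); v = randn(n, 1); x = randn(n, 1);
A = eye(n) + u*v';                        % form A = I + u v' explicitly
ref = A^k * x;                            % dense reference answer
err = norm(p2pow(u, v, k, x) - ref) / norm(ref);
fprintf('relative error: %g\n', err);     % should be near machine precision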
In p2pow, all inner products and vector operations are $O(n)$, so the algorithm terminates after a constant number of such vector operations and hence must also be $O(n)$.

Exercise 3 Structured perturbations. In this problem, we consider the structure of the groups of orthogonal and symplectic matrices:

(a) Suppose $Q : \mathbb{R} \to \mathbb{R}^{n \times n}$ is a differentiable matrix-valued function and that $Q(t)$ is orthogonal for all $t$. Show that $S(t) = Q(t)^T \dot{Q}(t)$ is a skew-symmetric matrix, i.e. $S(t) = -S(t)^T$.

Proof. Take the assumption that $Q$ is orthogonal, then we get the equation $Q^T Q = I$. Differentiate this equation on both sides to get
$$Q^T \dot{Q} + \dot{Q}^T Q = 0$$
which immediately gives
$$Q^T \dot{Q} + (Q^T \dot{Q})^T = S + S^T = 0,$$
so $S$ is skew-symmetric.

(b) Suppose $S : \mathbb{R} \to \mathbb{R}^{2n \times 2n}$ is a differentiable matrix-valued function and that $S(t)$ is symplectic for all $t$, i.e.
$$S^T J S = J, \qquad J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}.$$
Show that $M = S^{-1}\dot{S}$ is Hamiltonian, i.e. $JM + M^T J = 0$.
Proof. We will proceed by direct algebraic manipulation in two steps: in the first step, I will give an equation for $JM$, and in the second, I will show that $JM$ is symmetric.

We will need some machinery first: $J = -J^T$ and $\dot{(X^{-1})} = -X^{-1}\dot{X}X^{-1}$. The first is obvious; the second comes from differentiating the equation $XX^{-1} = I$.

First, we take the symplecticity equation:
$$S^T J S = J \quad\Longrightarrow\quad S^T J = J S^{-1}.$$
Now, let's take the derivative of both sides, using $\dot{J} = 0$:
$$\dot{S}^T J + S^T \dot{J} = \dot{J} S^{-1} + J\,\dot{(S^{-1})}$$
$$\dot{S}^T J = -J \underbrace{S^{-1}\dot{S}}_{M}\, S^{-1}$$
$$\dot{S}^T J S = -JM$$

Next, let's again start at the symplecticity equation $S^T J S = J$, but let $A = S^T J$, so that $\dot{A} = \dot{S}^T J + S^T \dot{J} = \dot{S}^T J$, to get $AS = J$. Taking the derivative of both sides:
$$\dot{A}S + A\dot{S} = 0$$
$$\dot{S}^T J S + S^T J \dot{S} = 0$$
$$\dot{S}^T J S = -S^T J \dot{S}$$
Recall that $J = -J^T$:
$$\dot{S}^T J S = S^T J^T \dot{S} = (\dot{S}^T J S)^T$$
so $\dot{S}^T J S$ is symmetric. But we've already shown that $\dot{S}^T J S = -JM$, so $-JM$, and hence $JM$, is symmetric:
$$JM = (JM)^T$$
$$JM - M^T J^T = 0$$
$$JM + M^T J = 0$$
which concludes the proof.
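The algebra above can also be sanity-checked numerically (my addition; the particular families $Q(t)$ and $S(t)$ below are just convenient examples, not required by the problem). A rotation gives an orthogonal path for part (a), and $S(t) = \exp(tJH)$ with $H$ symmetric gives a symplectic path for part (b):

t = 0.3; h = 1e-6;                       % evaluation point and finite-difference step

% (a) a rotation is an orthogonal path; Q'*Qdot should be skew-symmetric
Q    = @(s) [cos(s) -sin(s); sin(s) cos(s)];
Qdot = (Q(t+h) - Q(t-h)) / (2*h);        % central-difference approximation of dQ/dt
Sa   = Q(t)'*Qdot;
skew_resid = norm(Sa + Sa');             % ~0 up to differencing error

% (b) S(s) = expm(s*J*H) with symmetric H is symplectic; M = S\Sdot
%     should then satisfy J*M + M'*J = 0
n = 2; J = [zeros(n) eye(n); -eye(n) zeros(n)];
H = randn(2*n); H = (H + H')/2;          % random symmetric H
S    = @(s) expm(s*J*H);
Sdot = (S(t+h) - S(t-h)) / (2*h);
M = S(t) \ Sdot;
ham_resid = norm(J*M + M'*J);            % ~0 up to differencing error

fprintf('skew residual %g, Hamiltonian residual %g\n', skew_resid, ham_resid);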
Exercise 4 Norm! Suppose $W \in \mathbb{R}^{m \times n}$ is nonnegative and row stochastic (each row sums to one), and let $p \ge 1$ be fixed. The function $\alpha \mapsto \alpha^p$ is convex, so if $y = Wx$, then
$$|y_i|^p \le \sum_j w_{ij} |x_j|^p.$$
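To spell out where this inequality comes from (a small added step; the problem statement takes it for granted): the entries of each row of $W$ are nonnegative and sum to one, so they are convex weights. By the triangle inequality and then Jensen's inequality applied to the convex function $\alpha \mapsto \alpha^p$,
$$|y_i|^p = \Big|\sum_j w_{ij} x_j\Big|^p \le \Big(\sum_j w_{ij} |x_j|\Big)^p \le \sum_j w_{ij} |x_j|^p.$$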
Argue that

(a) If $u_j = \sum_i w_{ij}$ and $v_j = |x_j|^p$, then $\|y\|_p^p \le u \cdot v$.

$$\|y\|_p^p = \sum_i |y_i|^p \le \sum_i \sum_j w_{ij} |x_j|^p = \sum_j \sum_i w_{ij} |x_j|^p = \sum_j \underbrace{\Big(\sum_i w_{ij}\Big)}_{u_j} \underbrace{|x_j|^p}_{v_j} = \sum_j u_j v_j = u \cdot v$$

(b) Therefore, $\|y\|_p^p \le \|u\|_\infty \|v\|_1$.

$$\|y\|_p^p \le u \cdot v = \sum_j u_j v_j$$
Recall that $u_j, v_j$ are non-negative, so this is
$$\le \sum_j \Big(\max_k u_k\Big) v_j = \Big(\max_k u_k\Big) \sum_j v_j = \|u\|_\infty \|v\|_1.$$

(c) Therefore, $\|y\|_p^p \le \|W\|_1 \|x\|_p^p$.

Here, it's clear that $\|v\|_1 = \sum_j |x_j|^p = \|x\|_p^p$. Furthermore, $\|u\|_\infty = \max_j \sum_i w_{ij}$, which, along with the fact that $W$ is non-negative, makes this the maximal column sum of $W$, which is exactly $\|W\|_1$! Thus
$$\|y\|_p^p \le \|u\|_\infty \|v\|_1 = \|W\|_1 \|x\|_p^p$$
(d) Therefore, $\|W\|_p \le \|W\|_1^{1/p}$.

Since $y = Wx$, the previous statement gives
$$\|Wx\|_p^p \le \|W\|_1 \|x\|_p^p$$
$$\frac{\|Wx\|_p}{\|x\|_p} \le \|W\|_1^{1/p}$$
Now, if this holds for any arbitrary $x$, then it must also hold for the maximizing $x$, so that
$$\max_x \frac{\|Wx\|_p}{\|x\|_p} \le \|W\|_1^{1/p}$$
$$\|W\|_p \le \|W\|_1^{1/p}$$
which concludes the proof.
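Finally, a quick numerical spot check of the bound (my addition; the random sampling scheme is arbitrary and only probes the inequality, it does not prove it):

% Spot-check ||W*x||_p <= ||W||_1^(1/p) * ||x||_p for a random row-stochastic W.
m = 40; n = 30; p = 3;
W = rand(m, n);
W = diag(1 ./ sum(W, 2)) * W;            % normalize rows so each sums to one
bound = norm(W, 1)^(1/p);                % norm(W,1) is the maximal column sum
worst = 0;
for trial = 1:1000
    x = randn(n, 1);
    worst = max(worst, norm(W*x, p) / norm(x, p));
end
fprintf('max observed ratio %.4f <= bound %.4f\n', worst, bound);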