arXiv:1410.0633v2 [stat.ML] 5 Oct 2014
Deterministic Conditions for Subspace Identifiability from Incomplete Sampling
Daniel L. Pimentel-Alarcón†, Robert D. Nowak†, Nigel Boston†‡
† Electrical and Computer Engineering, ‡ Mathematics
University of Wisconsin–Madison, WI, 53706, USA
October 7, 2014
Abstract
Consider a generic $r$-dimensional subspace of $\mathbb{R}^d$, $r < d$, and suppose that we are only given projections of this subspace onto small subsets of the canonical coordinates. The paper establishes necessary and sufficient deterministic conditions on the subsets for subspace identifiability.
1 Introduction
Consider low-rank matrices of size $d \times N$ with columns from a generic $r$-dimensional subspace $S^\star$ of $\mathbb{R}^d$, $r < d$. Suppose that only a specific subset of the entries in these matrices are observed. This situation arises in the so-called matrix completion problem [1]. This paper establishes deterministic conditions on the sampling pattern of entries that guarantee that $S^\star$ is the only $r$-dimensional subspace consistent with all such incomplete observations.
It is easy to see that an identifiability condition of this sort can only be possible if at least $r + 1$ entries are observed in each column of the matrices, and so we will assume this bare minimum number of observed entries. Let $\Omega$ be a $d \times N$ binary mask with exactly $r + 1$ nonzero entries per column. Since $\ker S^\star$ is $(d - r)$-dimensional, we will see that $N \ge d - r$ is necessary for identifiability. Thus, we will assume $N = d - r$ for the rest of the paper.
Let $\omega_i$ denote the $i$th column of $\Omega$, and $S^\star_{\omega_i} \subset \mathbb{R}^{r+1}$ the restriction of $S^\star$ to the nonzero coordinates in $\omega_i$. Let $\mathrm{Gr}(r, \mathbb{R}^d)$ denote the Grassmannian manifold of $r$-dimensional subspaces in $\mathbb{R}^d$. Define $\mathcal{S}(S^\star, \Omega) \subset \mathrm{Gr}(r, \mathbb{R}^d)$ such that every $S \in \mathcal{S}(S^\star, \Omega)$ satisfies $S_{\omega_i} = S^\star_{\omega_i}$ for all $i$. In words, $\mathcal{S}(S^\star, \Omega)$ is the set of all $r$-dimensional subspaces matching $S^\star$ on $\Omega$.
The main result of this paper is the following theorem, which gives necessary and sufficient conditions on $\Omega$ to guarantee that $\mathcal{S}(S^\star, \Omega)$ contains no subspace other than $S^\star$.
Given a matrix, let n(·) denote its number of columns, and m(·) the number of its nonzero rows.
Theorem 1. For almost every (a.e.) $S^\star$, with respect to the uniform measure over $\mathrm{Gr}(r, \mathbb{R}^d)$, $S^\star$ is the only subspace in $\mathcal{S}(S^\star, \Omega)$ if and only if for every matrix $\Omega'$ formed with a subset of the columns in $\Omega$,
$$m(\Omega') \ge n(\Omega') + r. \tag{1}$$
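Example 1. The following mask, where $\mathbf{1}$ denotes a block of all 1's with $r$ rows, and $\mathbf{I}$ the identity matrix with $d - r$ rows, satisfies the conditions of Theorem 1:
$$\Omega = \begin{bmatrix} \mathbf{1} \\ \mathbf{I} \end{bmatrix} \begin{array}{l} r \\ d - r \end{array}.$$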
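Condition (1) is purely combinatorial, so for small masks it can be checked by enumerating column subsets. The sketch below is illustrative only: the function names (`nonzero_rows`, `satisfies_condition`, `example_mask`) are hypothetical, and the mask is assumed to be stored as a list of rows of 0/1 entries. For the mask of Example 1, any $n$ chosen columns touch the $r$ rows of ones plus $n$ distinct identity rows, so $m = n + r$ and (1) holds with equality.

```python
from itertools import combinations


def nonzero_rows(mask, cols):
    """m(.): number of rows of `mask` with a nonzero entry in the chosen columns."""
    return sum(1 for row in mask if any(row[j] for j in cols))


def satisfies_condition(mask, r):
    """Return True if m(Omega') >= n(Omega') + r for every nonempty column subset.

    Brute force over all 2^N - 1 subsets, so only practical for small N.
    """
    n_cols = len(mask[0])
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            if nonzero_rows(mask, cols) < k + r:
                return False
    return True


def example_mask(d, r):
    """Mask of Example 1: r rows of ones stacked on the (d - r) x (d - r) identity."""
    ones = [[1] * (d - r) for _ in range(r)]
    identity = [[int(i == j) for j in range(d - r)] for i in range(d - r)]
    return ones + identity


print(satisfies_condition(example_mask(5, 2), 2))  # prints True

# Two columns with the same support: that pair has m = r + 1 < 2 + r.
repeated = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 1]]
print(satisfies_condition(repeated, 2))  # prints False
```

The enumeration is exponential in the number of columns; it is meant as a sanity check on toy masks, not as an efficient test.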
2 Proof
For any subspace, matrix or vector that is compatible with a binary vector $\upsilon$, we will use the subscript $\upsilon$ to denote its restriction to the nonzero coordinates/rows in $\upsilon$. For a.e. $S^\star$, $S^\star_{\omega_i}$ is an $r$-dimensional subspace of $\mathbb{R}^{r+1}$, and the kernel of $S^\star_{\omega_i}$ is a 1-dimensional subspace of $\mathbb{R}^{r+1}$.
Lemma 1. Let $a_{\omega_i} \in \mathbb{R}^{r+1}$ be a nonzero element of $\ker S^\star_{\omega_i}$. All entries of $a_{\omega_i}$ are nonzero for a.e. $S^\star$.
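Proof. Suppose $a_{\omega_i}$ has at least one zero entry. Use $\upsilon$ to denote the binary vector of the nonzero entries of $a_{\omega_i}$. Since $a_{\omega_i}$ is orthogonal to $S^\star_{\omega_i}$, for every $u_{\omega_i} \in S^\star_{\omega_i}$ we have that $a_{\omega_i}^T u_{\omega_i} = a_\upsilon^T u_\upsilon = 0$. Then $S^\star_\upsilon$ satisfies
$$\dim S^\star_\upsilon \le \dim \ker a_\upsilon^T = \|\upsilon\|_1 - 1 < \|\upsilon\|_1. \tag{2}$$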
Observe that for every binary vector $\upsilon$ with $\|\upsilon\|_1 \le r$, a.e. $r$-dimensional subspace $S$ satisfies $\dim S_\upsilon = \|\upsilon\|_1$. Thus (2) holds only in a set of measure zero.
Define $a_i$ as the vector in $\mathbb{R}^d$ with the entries of $a_{\omega_i}$ in the nonzero positions of $\omega_i$ and zeros elsewhere. Then $S \subset \ker a_i^T$ for every $S \in \mathcal{S}(S^\star, \Omega)$ and every $i$. Letting $A$ be the $d \times (d - r)$ matrix formed with $\{a_i\}_{i=1}^{d-r}$ as columns, we have that $S \subset \ker A^T$ for every $S \in \mathcal{S}(S^\star, \Omega)$. Note that if $\dim \ker A^T = r$, then $\mathcal{S}(S^\star, \Omega)$ contains just one element, $S^\star$, which is the identifiability condition of interest. Thus, we will establish conditions on $\Omega$ guaranteeing that the columns of $A$ are linearly independent.
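The construction of $A$ can be reproduced numerically on a toy instance. The following is a minimal sketch under several assumptions that are not part of the paper: the subspace is given by a $d \times r$ basis matrix `U`, each kernel vector $a_{\omega_i}$ is computed from the signed maximal minors of the restricted basis (a standard cofactor identity, used here only for convenience), and a Vandermonde basis stands in for a generic subspace. All of its $r \times r$ minors are nonzero, but it is not guaranteed to be generic in the sense required by the a.e. statements. All function names are hypothetical.

```python
from fractions import Fraction


def det(M):
    """Determinant of a square matrix by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result  # a row swap flips the sign
        result *= M[c][c]
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[c])]
    return result


def rank(M):
    """Rank of a matrix by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    rk = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(rk + 1, len(M)):
            factor = M[i][c] / M[rk][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[rk])]
        rk += 1
    return rk


def kernel_vector(U_w):
    """Vector a with a^T U_w = 0 for an (r+1) x r matrix U_w (signed maximal minors)."""
    return [(-1) ** j * det(U_w[:j] + U_w[j + 1:]) for j in range(len(U_w))]


def build_A(U, mask):
    """Embed each kernel vector a_{omega_i} into R^d on the support of column i."""
    d, n_cols = len(U), len(mask[0])
    A = [[Fraction(0)] * n_cols for _ in range(d)]
    for i in range(n_cols):
        support = [j for j in range(d) if mask[j][i]]
        for j, value in zip(support, kernel_vector([U[j] for j in support])):
            A[j][i] = value
    return A


def vandermonde_basis(d, r):
    """Stand-in basis (an assumption): rows (1, t, ..., t^(r-1)) for t = 1..d."""
    return [[t ** k for k in range(r)] for t in range(1, d + 1)]


U = vandermonde_basis(4, 2)                 # d = 4, r = 2
good = [[1, 1], [1, 1], [1, 0], [0, 1]]     # mask of Example 1
bad = [[1, 1], [1, 1], [1, 1], [0, 0]]      # two columns with the same support
print(rank(build_A(U, good)))  # prints 2, so dim ker A^T = d - 2 = r
print(rank(build_A(U, bad)))   # prints 1, so dim ker A^T = 3 > r
```

Exact rational arithmetic is used so that the rank is not affected by floating-point tolerances; for larger instances a numerical linear algebra library would be the more practical choice.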
Recall that for any matrix $A'$ formed with a subset of the columns in $A$, $n(A')$ denotes the number of columns in $A'$, and $m(A')$ denotes the number of nonzero rows in $A'$.
Lemma 2 (Independence). For a.e. $S^\star$, the columns of $A$ are linearly dependent if and only if $m(A') < n(A') + r$ for some matrix $A'$ formed with a subset of the columns in $A$.
In order to prove this statement, we will need Lemmas 3 and 4 below.
Lemma 3. Let $\ell(A')$ be the number of linearly independent columns in $A'$. Then $m(A') \ge \ell(A') + r$ for a.e. $S^\star$.
Proof. Let $\upsilon$ be the binary vector of nonzero rows of $A'$, and $A'_\upsilon$ be the $m(A') \times n(A')$ matrix formed with these nonzero rows. For a.e. $S^\star$, $\dim S^\star_\upsilon = r$. Since $S^\star_\upsilon \subset \ker {A'_\upsilon}^T$, $r = \dim S^\star_\upsilon \le \dim \ker {A'_\upsilon}^T = m(A') - \ell(A')$.
We say $a_i$ is minimally linearly dependent on $A'$ if $a_i$ is linearly dependent on the columns of $A'$, but linearly independent of every proper subset of the columns in $A'$.
Lemma 4. Let $a_i$ be minimally linearly dependent on $A'$. Then $m(A') = n(A') + r$ for a.e. $S^\star$.
Proof. Let $m = m(A')$, $n = n(A')$, and $\ell = \ell(A')$. If $A'$ has only one column, then by Lemma 1, $m = r + 1$ and the claim holds. If $A'$ has more than one column, define $\beta \in \mathbb{R}^n$ such that
$$A' \beta = a_i. \tag{3}$$
Note that because $a_i$ is minimally linearly dependent on $A'$, all entries in $\beta$ are nonzero. Since the columns of $A'$ are linearly independent, $n = \ell$. Thus, by Lemma 3, $m \ge n + r$. We want to show that $m = n + r$, so suppose for contradiction that $m > n + r$.
We can assume without loss of generality that $A'$ has all its zero rows (if any) in the first positions. In that case, since $a_i$ is linearly dependent on the columns of $A'$, it follows that the nonzero entries of $a_i$ cannot be in the corresponding rows. Thus, without loss of generality, assume that $a_i$ has its first $r$ nonzero entries in the first $r$ nonzero rows of $A'$, and that the last nonzero entry of $a_i$ is 1 (i.e., re-scale $a_i$ if needed), and is located in the last row. Let $\hat{a}_i \in \mathbb{R}^r$ denote the vector with the first nonzero entries of $a_i$, such that we can write, with column blocks of widths $n$ and $1$ and the row-block heights listed on the right:
$$
\left[\, A' \mid a_i \,\right] =
\left[\begin{array}{c|c}
0 & 0 \\
C & \hat{a}_i \\
B & \begin{array}{c} 0 \\ 1 \end{array}
\end{array}\right]
\begin{array}{l}
d - m \\
r \\
\begin{array}{l} m - r - 1 \\ 1 \end{array}
\end{array},
\tag{4}
$$
where $C$ and $B$ are submatrices used to denote the blocks of $A'$ corresponding to the partition of $a_i$.
The columns of $B$ are linearly independent. To see this, suppose for contradiction that they are not. This means that there exists some nonzero $\gamma \in \mathbb{R}^n$ such that $B\gamma = 0$. Let $c = A'\gamma$ and note that only the $r$ rows in $c$ corresponding to the block $C$ may be nonzero. Let $\upsilon$ denote the binary vector of these nonzero entries. Since $S^\star$ is orthogonal to every column of $A'$ and $c$ is a linear combination of the columns in $A'$, it follows that $S^\star_\upsilon \subset \ker c_\upsilon^T$. This implies that $\dim S^\star_\upsilon \le \dim \ker c_\upsilon^T = \|\upsilon\|_1 - 1$. As in the proof of Lemma 1, this implies that the columns of $B$ are linearly dependent only in a set of measure zero.
Going back to (4), since the $n$ columns of $B$ are linearly independent and because we are assuming that $m - r > n$, it follows that $B$ has $n$ linearly independent rows. Let $B_1$ denote the $n \times n$ block of $B$ that contains $n$ linearly independent rows, and $B_2$ the $(m - n - r) \times n$ remaining block of $B$. Notice that the row of $B$ corresponding to the 1 in $a_i$ must belong to $B_1$, since otherwise, we have that $B_1 \beta = 0$, with $\beta$ as in (3), which implies that $B_1$ is rank deficient, in contradiction to its construction.
We can further assume without loss of generality that the first nonzero entry of every column of $B$ is 1 (otherwise we may just re-scale each column), and that these nonzero entries are in the first columns (otherwise we may just permute the columns accordingly). We will also let $\tilde{B}_2$ denote all but the first row of $B_2$. Thus, with $B_2$ formed by the row $[\, 1 \mid 0 \,]$ stacked on $\tilde{B}_2$, and the row-block heights listed on the right, our matrix is organized as
$$
\left[\, A' \mid a_i \,\right] =
\left[\begin{array}{c|c}
0 & 0 \\
C & \hat{a}_i \\
{[\, 1 \mid 0 \,]} & 0 \\
\tilde{B}_2 & 0 \\
B_1 & \begin{array}{c} 0 \\ 1 \end{array}
\end{array}\right]
\begin{array}{l}
d - m \\
r \\
1 \\
m - n - r - 1 \ge 0 \\
\begin{array}{l} n - 1 \\ 1 \end{array}
\end{array}.
\tag{5}
$$
Now (3) implies $B_1 \beta = [\, 0 \mid 1 \,]^T$, and since $B_1$ is full-rank, we may write
$$\beta = B_1^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$
i.e., $\beta$ is the last column of the inverse of $B_1$, which is a rational function in the elements of $B_1$.
Next, let us look back at (3). If $m > n + r$, then using the additional row $[\, 1 \mid 0 \,]$ of (5) (which does not appear if $m = n + r$) we obtain $[\, 1 \mid 0 \,] \beta = 0$. Recall that all the entries of $\beta$ are nonzero. Thus, the last equation defines the following nonzero rational function in the elements of $B_1$:
$$[\, 1 \mid 0 \,] \, B_1^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 0. \tag{6}$$
Equivalently, (6) is a polynomial equation in the elements of $B_1$, which we will denote as $f(B_1) = 0$.
Next note that for a.e. $S^\star$, we can write $S^\star = \ker A^{\star T}$ for a unique $A^\star \in \mathbb{R}^{d \times (d - r)}$ in column-echelon form¹:
$$A^\star = \begin{bmatrix} I \\ D^\star \end{bmatrix} \begin{array}{l} d - r \\ r \end{array}. \tag{7}$$
On the other hand every $D^\star \in \mathbb{R}^{r \times (d - r)}$ defines a unique $r$-dimensional subspace of $\mathbb{R}^d$, via (7). Thus, we have a bijection between $\mathbb{R}^{r \times (d - r)}$ and a dense open subset of $\mathrm{Gr}(r, \mathbb{R}^d)$.
¹ Certain $S^\star$ may not admit this representation, e.g., if $S^\star$ is orthogonal to certain canonical coordinates. However, as discussed in Lemma 1, this is not the case for almost every $S^\star$ in $\mathrm{Gr}(r, \mathbb{R}^d)$.
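The correspondence between $D^\star$ and $S^\star$ in (7) can be made concrete. The sketch below assumes the stacking $A^\star = [I; D^\star]$ with the $(d - r) \times (d - r)$ identity block on top; under that assumption, $\ker A^{\star T}$ is spanned by the columns of $[-D^{\star T}; I_r]$. The function names are hypothetical, and the matrix `D` in the usage lines is an arbitrary illustration rather than data from the paper.

```python
from fractions import Fraction


def stacked_A(D):
    """A = [I; D]: the (d-r) x (d-r) identity stacked on the r x (d-r) matrix D."""
    k = len(D[0])  # k = d - r
    identity = [[Fraction(int(i == j)) for j in range(k)] for i in range(k)]
    return identity + [[Fraction(x) for x in row] for row in D]


def subspace_basis_from_D(D):
    """Basis U = [-D^T; I_r] of ker(A^T) for A = [I; D]; U has d rows and r columns."""
    r, k = len(D), len(D[0])
    top = [[-Fraction(D[j][i]) for j in range(r)] for i in range(k)]
    bottom = [[Fraction(int(i == j)) for j in range(r)] for i in range(r)]
    return top + bottom


def transpose_times(A, U):
    """Compute A^T U for matrices stored as lists of rows."""
    return [[sum(A[j][a] * U[j][b] for j in range(len(A)))
             for b in range(len(U[0]))]
            for a in range(len(A[0]))]


D = [[1, 2, 3], [4, 5, 6]]  # r = 2, d - r = 3, so d = 5
U = subspace_basis_from_D(D)
product = transpose_times(stacked_A(D), U)
print(all(entry == 0 for row in product for entry in row))  # prints True
```

Since `U` contains an $r \times r$ identity block, its columns are linearly independent, so each $D^\star$ yields an $r$-dimensional subspace, as stated in the text.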
Since the columns of $A'$ must be linear combinations of the columns of $A^\star$, the elements of $B_1$ are linear functions in the entries of $D^\star$. Therefore, we can express $f(B_1)$ as a nonzero polynomial function $g$ in the entries of $D^\star$ and re-write (6) as $g(D^\star) = 0$. But the zero set of a nonzero polynomial has Lebesgue measure zero, so $g(D^\star) \ne 0$ for almost every $D^\star \in \mathbb{R}^{r \times (d - r)}$, and hence for almost every $S^\star \in \mathrm{Gr}(r, \mathbb{R}^d)$. We conclude that almost every subspace in $\mathrm{Gr}(r, \mathbb{R}^d)$ will not satisfy (6), and thus $m = n + r$.
We are now ready to present the proofs of Lemma 2 and Theorem 1.
Proof. (Lemma 2) (⇒) Suppose that column $a_i$ in $A$ is minimally linearly dependent on the columns in $A''$, a matrix formed with a subset of the columns in $A$. By Lemma 4, $n(A'') = m(A'') - r$. Let $A' = [\, A'' \mid a_i \,]$. It is clear that $m(A') = m(A'')$ and $n(A') = n(A'') + 1$. Thus $m(A') < n(A') + r$, and we have the first implication.
(⇐) Suppose there exists an $A'$ with $m(A') < n(A') + r$. By Lemma 3, $n(A') > \ell(A')$, which implies $A'$, and hence $A$, has a linearly dependent column.
Proof. (Theorem 1) Lemma 1 shows that for a.e. $S^\star$, the $(j, i)$th entry of $A$ is nonzero if and only if the $(j, i)$th entry of $\Omega$ is nonzero.
(⇒) Suppose there exists an $\Omega'$ such that $m(\Omega') < n(\Omega') + r$. Then $m(A') < n(A') + r$ for some $A'$. Lemma 2 implies that the columns of $A'$, and hence $A$, are not linearly independent. This implies $\dim \ker A^T > r$.
(⇐) Suppose every $\Omega'$ satisfies $m(\Omega') \ge n(\Omega') + r$. Then $m(A') \ge n(A') + r$ for every $A'$, including $A$. Therefore, by Lemma 2, $A$ has $d - r$ linearly independent columns, hence $\dim \ker A^T = r$.
References
[1] Emmanuel Candès and Benjamin Recht, “Exact Matrix Completion Via Convex Optimization,” Foundations of Computational Mathematics, vol. 9, pp. 717–772, 2009.
List of symbols
Symbol: Description
$A$: $d \times (d - r)$ matrix with $\{a_i\}_{i=1}^{d-r}$ as its columns.
$A'$: Matrix formed with a subset of the columns in $A$.
$a_{\omega_i}$: Vector in $\mathbb{R}^{r+1}$ orthogonal to $S^\star_{\omega_i}$.
$a_i$: Vector in $\mathbb{R}^d$ with the entries of $a_{\omega_i}$ in the nonzero positions of $\omega_i$.
a.e.: Almost every with respect to the uniform measure over $\mathrm{Gr}(r, \mathbb{R}^d)$.
$d$: Ambient dimension.
$\mathrm{Gr}(r, \mathbb{R}^d)$: Grassmannian manifold of $r$-dimensional subspaces in $\mathbb{R}^d$.
$i$: Used to index vectors. In general, $i \in \{1, \ldots, d - r\}$.
$\ell(\cdot)$: Number of linearly independent columns in $\cdot$.
$m(\cdot)$: Number of nonzero rows in $\cdot$.
$n(\cdot)$: Number of columns in $\cdot$.
$N$: Number of columns in $\Omega$ and in $A$, $N = d - r$.
$\Omega$: $d \times (d - r)$ mask of observed entries with $r + 1$ nonzero entries per column.
$\Omega'$: Mask formed with a subset of the columns in $\Omega$.
$\omega_i$: $i$th column of $\Omega$.
$\upsilon$: Arbitrary binary vector.
$r$: Dimension of $S^\star$.
$S$: $r$-dimensional subspace.
$S_{\omega_i}$: Subspace of $\mathbb{R}^{r+1}$. The restriction of $S$ to $\omega_i$.
$S^\star$: Subspace consistent with the incomplete observations.
$S^\star_{\omega_i}$: Subspace of $\mathbb{R}^{r+1}$. The restriction of $S^\star$ to $\omega_i$.
$\cdot_\upsilon$: The restriction of $\cdot$ to $\upsilon$.
$\mathcal{S}(S^\star, \Omega)$: Set of all $r$-dimensional subspaces that agree with $S^\star$ on $\Omega$.