arXiv:1412.5071v1 [cs.SC] 16 Dec 2014

Probabilistic analysis of Wiedemann's algorithm for minimal polynomial computation∗

Gavin Harrison, Drexel University
Jeremy Johnson, Drexel University
B. David Saunders, University of Delaware

December 17, 2014

Abstract

Blackbox algorithms for linear algebra problems start with projection of the sequence of powers of a matrix to a sequence of vectors (Lanczos), a sequence of scalars (Wiedemann), or a sequence of smaller matrices (block methods). Such algorithms usually depend on the minimal polynomial of the resulting sequence being that of the given matrix. Here exact formulas are given for the probability that this occurs. They are based on the Jordan block structure of the matrix. Sharp bounds follow from this for matrices of unknown Jordan structure. The bounds are valid for all finite field sizes and show that a small blocking factor can give high probability of success for all field cardinalities and matrix dimensions.

1 Introduction

The Wiedemann algorithm (Wiedemann, 1986) projects a matrix sequence $A^i$, for $i = 0, 1, 2, \ldots$, to a scalar sequence $s_i = u^TA^iv$ using random vectors $u, v$. When applied to solving the linear system $Ax = v$, only $u$ is random, and properties of the sequence $v_i = A^iv$, for $i = 0, 1, 2, \ldots$, are also of interest. Block Wiedemann algorithms (Coppersmith, 1995; Eberly et al., 2006; Kaltofen, 1995; Villard, 1997, 1999) fatten $u^T$ to a matrix $U$ having multiple rows and $v$ to a matrix $V$ having multiple columns, so that the projection is to a sequence of small matrices, $B_i = UA^iV$, for $i = 0, 1, 2, \ldots$. All of the algorithms based on these random projections rely on preservation of some properties, including at least the minimal polynomial. When the minimal polynomial is preserved, a block Berlekamp/Massey algorithm can then compute this minimal polynomial (Yuhasz, 2009; Giorgi et al., 2003). Here we develop an analysis giving lower bounds on the probability of success in preserving the minimal polynomial as a function of the dimensions of the given matrix, the dimensions of the small matrix blocks, and the cardinality of the field. Our bounds are less pessimistic than earlier bounds such as (Kaltofen and Saunders, 1991; Kaltofen, 1995), which primarily apply when the field is large. Even for cardinality 2, sufficient blocking assures high probability of preserving the minimal polynomial.

A key idea exploited is that when the cardinality is small, the number of low degree irreducible polynomials is also small. Wiedemann (1986) used this to bound the probability of minimal polynomial preservation in the non-blocked algorithm. Here we use it to compute an exact formula for the probability of success, and we also encompass the blocked case.

Every square matrix over a field $\mathbb{F}$ is similar over $\mathbb{F}$ to its generalized Jordan normal form, a block diagonal direct sum of the Jordan blocks of its elementary divisors (powers of irreducible polynomials in $\mathbb{F}[x]$). We begin by considering the special case of a single Jordan block, then generalized (rational) Jordan normal forms, and finally the case of a general matrix. Our results show how to calculate the probability that the minimal polynomial is preserved under projection for any given matrix. Using this result we then bound the probability for all matrices.

∗Research supported by National Science Foundation Grants CCF-1018063 and CCF-1016728.

2 Algorithm Review

Let $\mathbb{F}$ be a field. We are primarily interested in the case $\mathbb{F} = \mathbb{F}_q$, a finite field of cardinality $q$. Let $S = (s_0, s_1, s_2, \ldots)$, $s_i \in \mathbb{F}$, be a linearly generated sequence defined by a generating polynomial $c(x) = c_0 + c_1x + \ldots + c_{d-1}x^{d-1} + x^d \in \mathbb{F}[x]$, so that
$$s_k = -\sum_{i=0}^{d-1} c_i s_{k-d+i} \quad\text{for all } k \ge d.$$
If no polynomial generating $S$ has degree lower than that of $c(x)$, then $c(x)$ is called the minimal generating polynomial (or simply minimal polynomial) of $S$. For a matrix $A \in \mathbb{F}^{n\times n}$, the minimal polynomial of $A$ is the minimal generating polynomial of $(A^i \mid i = 0, 1, 2, \ldots)$. Given vectors $u, v \in \mathbb{F}^n$, the minimal polynomial of $(A^iv \mid i = 0, 1, 2, \ldots)$ divides the minimal polynomial of $A$, and the minimal polynomial of $(u^TA^iv \mid i = 0, 1, 2, \ldots)$ divides the minimal polynomial of $(A^iv \mid i = 0, 1, 2, \ldots)$.

Wiedemann's algorithm can be used to find the minimal polynomial of a matrix. Given a matrix $A \in \mathbb{F}^{n\times n}$, vectors $u, v \in \mathbb{F}^n$ are selected randomly. The first $2n$ elements of the sequence $(u^TA^iv \mid i = 0, 1, 2, \ldots, 2n-1)$ suffice to compute its minimal polynomial. If the vectors $u, v$ are in general position, the minimal polynomial of the projected sequence is the minimal polynomial of $A$. For large finite fields, Wiedemann's algorithm has a high probability of success.

Coppersmith (1995) gives an extension of Wiedemann's algorithm using "fat vectors", i.e. matrices $U \in \mathbb{F}_q^{b\times n}$, $V \in \mathbb{F}_q^{n\times b}$, to generate a sequence of small matrices, $B = (UA^iV \mid i = 0, 1, 2, \ldots, 2n-1)$. From this sequence a minimal generating polynomial with matrix coefficients is computed. With high probability, the largest invariant factor of the minimal generating polynomial of $B$ is the minimal polynomial of $A$ (Yuhasz (2009), Theorem 2).
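To make the scalar case concrete, the following sketch (ours, in Python; it assumes $q$ is prime and uses dense arithmetic purely for illustration) forms the sequence $s_i = u^TA^iv$ and recovers its minimal generating polynomial with the Berlekamp/Massey algorithm. The paper's monic minimal polynomial is the reversal of the feedback polynomial returned by the algorithm.

    import random

    def berlekamp_massey(s, q):
        # Returns (L, C) where C(x) = 1 + c_1 x + ... + c_L x^L satisfies
        # s[k] + c_1 s[k-1] + ... + c_L s[k-L] = 0 over F_q for L <= k < len(s).
        C, B, L, m, b = [1], [1], 0, 1, 1
        for k in range(len(s)):
            d = sum(C[i] * s[k - i] for i in range(L + 1)) % q  # discrepancy
            if d == 0:
                m += 1
                continue
            T = C[:]
            coef = d * pow(b, q - 2, q) % q                     # d/b, q prime
            C = C + [0] * (len(B) + m - len(C))
            for i, Bi in enumerate(B):
                C[i + m] = (C[i + m] - coef * Bi) % q
            if 2 * L <= k:
                L, B, b, m = k + 1 - L, T, d, 1
            else:
                m += 1
        return L, C[:L + 1]

    q, n = 7, 5
    A = [[random.randrange(q) for _ in range(n)] for _ in range(n)]
    u = [random.randrange(q) for _ in range(n)]
    w = [random.randrange(q) for _ in range(n)]                # w = A^i v
    s = []
    for _ in range(2 * n):                                     # s_i = u^T A^i v
        s.append(sum(ui * wi for ui, wi in zip(u, w)) % q)
        w = [sum(row[j] * w[j] for j in range(n)) % q for row in A]
    L, C = berlekamp_massey(s, q)
    print(L, C[::-1])   # degree and c_0, ..., c_{L-1}, 1 (paper's convention)

With probability quantified in the sections below, the degree $L$ equals the degree of $\operatorname{minpoly}(A)$.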

3 Definitions and Jordan blocks

Let $\mathbb{F}^{m\times n}$ be the vector space of $m \times n$ matrices over $\mathbb{F}$, and $\mathbb{F}^{m\times n}_\infty$ the vector space of sequences of $m \times n$ matrices over $\mathbb{F}$. For a sequence $S = (S_0, S_1, S_2, \ldots) \in \mathbb{F}^{m\times n}_\infty$ and polynomial $f(x) = \sum_{i=0}^{d} f_i x^i \in \mathbb{F}[x]$, define $f(S)$ as the sequence whose $k$-th term is $\sum_{i=0}^{d} f_i S_{i+k}$. This action makes $\mathbb{F}^{m\times n}_\infty$ a module over $\mathbb{F}[x]$, because $(fg)(S) = f(g(S))$ for $f, g \in \mathbb{F}[x]$ and $f(S + \alpha T) = f(S) + \alpha f(T)$ for $S, T \in \mathbb{F}^{m\times n}_\infty$ and $\alpha \in \mathbb{F}$. Further, if $f(S) = 0$ we say $f$ annihilates $S$. In this case, $S$ is completely determined by $f$ and the leading $d$ coefficient matrices $S_0, S_1, \ldots, S_{d-1}$. Then $S$ is said to be linearly generated, and $f(x)$ is also called a generator of $S$. Moreover, for given $S$, the set of polynomials that generate $S$ is an ideal of $\mathbb{F}[x]$. Its unique monic generator is called the minimal polynomial of $S$ and is denoted $\operatorname{minpoly}(S)$. In particular, the whole of $\mathbb{F}[x]$ is the ideal generated by $1$, which, acting on sequences, annihilates only the zero sequence. For a square matrix $A$, the minimal polynomial of the sequence $\bar{A} = (I, A, A^2, \ldots)$ is also called the minimal polynomial of $A$ ($\operatorname{minpoly}(A) = \operatorname{minpoly}(\bar{A})$).

We will consider the natural transforms of sequences by matrix multiplication on either side. For $U \in \mathbb{F}^{b\times m}$, $US = (US_0, US_1, US_2, \ldots)$ over $\mathbb{F}^{b\times n}_\infty$, and for $V \in \mathbb{F}^{n\times b}$, $SV = (S_0V, S_1V, S_2V, \ldots)$ over $\mathbb{F}^{m\times b}_\infty$. For any polynomial $g$, it follows from the definitions that $g(USV) = Ug(S)V$. It is easy to see that the generators of $S$ also generate $US$ and $SV$, so that $\operatorname{minpoly}(US) \mid \operatorname{minpoly}(S)$, and $\operatorname{minpoly}(USV) \mid \operatorname{minpoly}(SV) \mid \operatorname{minpoly}(S)$. More specifically, we are concerned with random projections $U\bar{A}V$ of a square matrix $A$, with uniformly random $U \in \mathbb{F}^{b\times n}$, $V \in \mathbb{F}^{n\times b}$. Note that similar matrices have the same distribution of random projections, since for a nonsingular matrix $W$, the $(U, V)$ projection of $W^{-1}AW$ is the $(UW^{-1}, WV)$ projection of $A$. If $U, V$ are uniformly random variables, then so are $UW^{-1}$ and $WV$. Thus, without loss of generality, in the rest of the paper we will restrict attention to matrices in generalized Jordan normal form. We describe our notation for Jordan forms next.

The companion matrix of a monic polynomial $f(x) = f_0 + f_1x + \ldots + f_{d-1}x^{d-1} + x^d$ of degree $d$ is
$$C_f = \begin{pmatrix} 0 & 0 & \cdots & 0 & -f_0\\ 1 & 0 & \cdots & 0 & -f_1\\ 0 & 1 & \cdots & 0 & -f_2\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & -f_{d-1} \end{pmatrix} \quad\text{and}\quad J_{f^e} = \begin{pmatrix} C_f & 0 & \cdots & 0 & 0\\ I & C_f & \cdots & 0 & 0\\ \vdots & \ddots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & I & C_f \end{pmatrix}$$
is the Jordan block corresponding to $f^e$, a $de \times de$ matrix. It is standard knowledge that the minimal polynomial of $J_{f^e}$ is $f^e$. When $e = 1$, $J_f = C_f$.

Generalized Jordan normal forms are direct sums of primary components, $J = \bigoplus_f J(f)$, where the $f$ are distinct irreducibles and each $J(f)$ is a direct sum of Jordan blocks, $J(f) = \bigoplus_{i=1}^{k_f} J_{f^{e_i}}$. It is the block diagonal matrix whose diagonal blocks are the $J_{f^{e_i}}$. Every matrix is similar to a generalized Jordan normal form, unique up to order of blocks.
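A small sketch (plain Python; the function names are ours) that builds $C_f$ and $J_{f^e}$ in the convention above, with identity blocks below the diagonal:

    def companion(f, q):
        # C_f for monic f(x) = f[0] + f[1] x + ... + f[d-1] x^(d-1) + x^d
        d = len(f)
        C = [[0] * d for _ in range(d)]
        for i in range(d):
            if i > 0:
                C[i][i - 1] = 1          # subdiagonal of ones
            C[i][d - 1] = -f[i] % q      # last column: -f_0, ..., -f_{d-1}
        return C

    def jordan_block(f, e, q):
        # J_{f^e}: e copies of C_f on the diagonal, identity blocks below
        d = len(f)
        C = companion(f, q)
        J = [[0] * (d * e) for _ in range(d * e)]
        for k in range(e):
            for i in range(d):
                for j in range(d):
                    J[k * d + i][k * d + j] = C[i][j]
                if k > 0:
                    J[k * d + i][(k - 1) * d + i] = 1
        return J

    # e.g. J_{f^2} over F_7 for the irreducible f(x) = x^2 + 3x + 6
    for row in jordan_block([6, 3], 2, 7):
        print(row)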

4 Probability Computation

In this section we determine, for a matrix $A$ over $\mathbb{F}_q$, the probability that a block projection of the matrix power sequence $\bar{A}$ preserves the minimal polynomial of $A$. For the results of this paper the characteristic of the field is not important. However, the cardinality $q$ is a key parameter in the results. For simplicity, we restrict to projection to square blocks. It is straightforward to adjust these formulas to the case of rectangular blocking.

Since every square matrix $A$ is similar to its generalized Jordan form, we can assume that $A = J$. The probability for $J$ is reduced to the probability for a primary component in 4.1, and this is further reduced to the probability for a direct sum of companion matrices $C_f$, for an irreducible polynomial $f$, in 4.2.2. The probability for $\bigoplus C_f$ is calculated in 4.2.3 by reducing it to the probability that a sum of rank 1 matrices over the extension field $\mathbb{F}_q[x]/\langle f(x)\rangle$ is zero.

The projections of a block diagonal matrix are sums of independent projections of the blocks. In other words, for the $U, V$ projection of $A = \bigoplus A_i$, let $U_i, V_i$ be the blocks of columns of $U$ and rows of $V$ conformal with the block sizes of the $A_i$. Then $U\bar{A}V = \sum_i U_i\bar{A_i}V_i$. We next sketch the basic properties of polynomial action on such sums of sequences.
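This block decomposition is easy to confirm numerically; a throwaway check (assuming numpy; the specific sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    q, b, n = 7, 2, 5
    blocks = [rng.integers(0, q, (3, 3)), rng.integers(0, q, (2, 2))]
    A = np.zeros((n, n), dtype=np.int64)
    A[:3, :3], A[3:, 3:] = blocks[0], blocks[1]
    U, V = rng.integers(0, q, (b, n)), rng.integers(0, q, (n, b))
    Us, Vs = [U[:, :3], U[:, 3:]], [V[:3, :], V[3:, :]]

    P = np.eye(n, dtype=np.int64)                    # A^k
    Ps = [np.eye(3, dtype=np.int64), np.eye(2, dtype=np.int64)]
    for _ in range(6):   # U A^k V = sum_i U_i A_i^k V_i, term by term
        lhs = U @ P @ V % q
        rhs = (Us[0] @ Ps[0] @ Vs[0] + Us[1] @ Ps[1] @ Vs[1]) % q
        assert (lhs == rhs).all()
        P = P @ A % q
        Ps = [Pi @ Bi % q for Pi, Bi in zip(Ps, blocks)]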

4.1 Reduction to Primary Components

In this section we show that the probability of minimal polynomial preservation by the block projection $U(J(f) \oplus J(g))V$ is the product of the probabilities for the projections of $J(f)$ and $J(g)$ when $f$ and $g$ are relatively prime.

Lemma 1. Let $S$ and $T$ be linearly generated matrix sequences. Then $\operatorname{minpoly}(S + T) \mid \operatorname{lcm}(\operatorname{minpoly}(S), \operatorname{minpoly}(T))$.

Proof. Let $f = \operatorname{minpoly}(S)$, $g = \operatorname{minpoly}(T)$, and $d = \gcd(f, g)$. The lemma follows from the observation that $(fg/d)(S + T) = (fg/d)(S) + (fg/d)(T) = (g/d)(f(S)) + (f/d)(g(T)) = 0$.

As an immediate corollary we get equality when $f$ and $g$ are relatively prime.

Corollary 1. Let $S$ and $T$ be linearly generated matrix sequences with $f = \operatorname{minpoly}(S)$ and $g = \operatorname{minpoly}(T)$ such that $\gcd(f, g) = 1$. Then $\operatorname{minpoly}(S + T) = fg$.


Proof. By the previous lemma, $\operatorname{minpoly}(S + T) = f_1g_1$ with $f_1 \mid f$ and $g_1 \mid g$. We show that $f_1 = f$ and $g_1 = g$. Under our assumptions, $0 = (fg_1)(S + T) = (fg_1)(S) + (fg_1)(T) = (fg_1)(T)$, so that $fg_1$ is a generator of $T$. But if $g_1$ is a proper divisor of $g$, then $fg_1$ is not in the ideal generated by $g$, a contradiction. Similarly $f_1$ must equal $f$.
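As a tiny illustration (reusing the berlekamp_massey sketch from Section 2; the choice of sequences is ours): over $\mathbb{F}_7$, $s_k = 2^k$ has minimal polynomial $x - 2$ and $t_k = k$ has minimal polynomial $(x-1)^2$; since these are coprime, Corollary 1 says the sum must have minimal polynomial $(x-2)(x-1)^2$, of degree 3.

    q = 7
    s = [(pow(2, k, q) + k) % q for k in range(8)]   # (2^k + k) mod 7
    L, C = berlekamp_massey(s, q)
    # expect L = 3 and (x-2)(x-1)^2 = x^3 + 3x^2 + 5x + 5 over F_7
    print(L, C[::-1])                                # 3 [5, 5, 3, 1]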

4.2 Probability for a Primary Component

Next we calculate the probability that, for an irreducible polynomial $f$, a block projection of $J(f) = \bigoplus J_{f^{e_i}}$ preserves the minimal polynomial of $J(f)$, which is $f^e$ for $e = \max(e_i)$.

4.2.1 Probability for a Single Jordan Block

Consider the Jordan block $J$ determined by an irreducible power $f^e$. It turns out that the probability of projection preserving the minimal polynomial is independent of $e$. Thus $U\bar{J}_{f^e}V$ has minpoly $f^e$ with the same likelihood that $U\bar{C_f}V$ has minpoly $f$, even though the $U, V$ are of different sizes in the two cases. This fact and the specific probability for projection of $C_f$ are the subject of the next lemma.

Lemma 2. Given a finite field $\mathbb{F}_q$, an irreducible polynomial $f(x) \in \mathbb{F}_q[x]$ of degree $d$, an exponent $e$, and a block size $b$, let $J = J_{f^e} \in \mathbb{F}_q^{de\times de}$ be the Jordan block of $f^e$ and let $\bar{J}$ be the sequence $(I, J, J^2, \ldots)$. For $U \in \mathbb{F}_q^{b\times de}$ and $V \in \mathbb{F}_q^{de\times b}$ the following properties of minimal polynomials hold.

1. If the entries of $V$ are uniformly random in $\mathbb{F}_q$, then $\operatorname{Prob}(f^e = \operatorname{minpoly}(\bar{J}V)) = 1 - 1/q^{db}$. Note that the probability is independent of $e$.

2. If $V$ is fixed and the entries of $U$ are uniformly random in $\mathbb{F}_q$, then $\operatorname{Prob}(\operatorname{minpoly}(\bar{J}V) = \operatorname{minpoly}(U\bar{J}V)) \ge 1 - 1/q^{db}$, with equality if $V \ne 0$.

3. If $U$ and $V$ are both uniformly random, then $\operatorname{Prob}(f^e = \operatorname{minpoly}(U\bar{J}V)) = (1 - 1/q^{db})^2$.

Proof. For parts 1 and 2, let $M$ be the lower left $d \times d$ block of $f^{e-1}(J)$. $M$ is nonzero and all other parts of $f^{e-1}(J)$ are zero. Note that $\mathbb{F}_q[C_f]$, the set of polynomials in the companion matrix $C_f$, is isomorphic to $\mathbb{F}_q[x]/\langle f\rangle$. Since $M$ is nonzero and a polynomial in $C_f$, it is nonsingular. Since for any polynomial $g$ and matrix $A$ one has $g(\bar{A}) = \bar{A}g(A)$, the lower left blocks of the sequence $f^{e-1}(\bar{J})$ form the sequence $(M, C_fM, C_f^2M, \ldots) = \bar{C_f}M$.

Part 1. $f^{e-1}(\bar{J})V$ is zero except in its lower $d$ rows, which form $\bar{C_f}MV_1$, where $V_1$ is the top $d$ rows of $V$. This sequence is nonzero with minimal polynomial $f$ unless $V_1 = 0$, which has probability $1/q^{db}$.

Part 2. If $V = 0$ the inequality is trivially true. For $V \ne 0$, only the lower left block of $f^{e-1}(J)$ contributes to $Uf^{e-1}(\bar{J})V$, which is the sequence $U_e\bar{C_f}MV_1$, where $V_1$ is the top $d$ rows of $V$ and $U_e$ is the rightmost $d$ columns of $U$. Since $M$ is nonsingular, $MV_1$ is uniformly random and the question is reduced to the case of projecting a companion matrix.

Let $C = C_f$ for irreducible $f$ of degree $d$. For nonzero $V \in \mathbb{F}^{d\times b}$, $\bar{C}V$ is nonzero and has minpoly $f$. We must show that if $U \in \mathbb{F}^{b\times d}$ is nonzero then $U\bar{C}V$ also has minpoly $f$. Let $v$ be a nonzero column of $V$. The Krylov matrix $K_C(v) = (v, Cv, C^2v, \ldots, C^{d-1}v)$ has as its columns the first $d$ vectors of the sequence $\bar{C}v$. A right nullspace vector corresponds to a lower degree generating polynomial, which also must be a factor of $f$. Since $f$ is irreducible, the nullspace vector can only be the first unit vector $e_1 = (1, 0, \ldots, 0)^T$, corresponding to the constant polynomial $1$. If this nullspace vector exists, $v = 0$, contradicting our hypothesis; otherwise the Krylov matrix is nonsingular and has no nonzero left nullspace vector $u$ either. Thus, for any nonzero vector $u$, we have $u^T\bar{C}v \ne 0$, so that, for nonzero $U$, the sequence $U\bar{C_f}V$ is nonzero and has minimal polynomial $f$ as needed. Of the $q^{db}$ possible $U$, only $U = 0$ fails to preserve the minimal polynomial.

Part 3. By parts 1 and 2, we have probability $(1 - 1/q^{db})$ of preservation of the minimum polynomial $f^e$, first at the right reduction by $V$ to the sequence $\bar{J}V$, and then again the same probability at the reduction by $U$ to the block sequence $U\bar{J}V$.
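In code, the statements of Lemma 2 are just closed forms (exact rationals; the function name is ours):

    from fractions import Fraction

    def single_block_probs(q, d, b):
        # Lemma 2 for J_{f^e}, deg f = d, block size b; independent of e
        p = 1 - Fraction(1, q ** (d * b))
        return p, p, p * p       # parts 1, 2 (V != 0), and 3

    print(single_block_probs(7, 2, 1))   # (48/49, 48/49, 2304/2401)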

4.2.2 Reduction to a Direct Sum of Companion Matrices

Consider the primary component $J = \bigoplus J_{f^{e_i}}$, for irreducible $f$, and let $e = \max(e_i)$. We reduce the question of projections preserving the minimal polynomial for $J$ to the corresponding question for direct sums of the companion matrix $C_f$, which is then addressed in the next section.

The minimal polynomial of $J$ is $f^e$ and that of $f^{e-1}(J)$ is $f$. A projection $U\bar{J}V$ preserves the minimal polynomial $f^e$ if and only if $f^{e-1}(U\bar{J}V)$ has minimal polynomial $f$. For all $e_i < e$ we have $f^{e-1}(\bar{J}_{f^{e_i}}) = 0$, so it suffices to consider direct sums of Jordan blocks for a single (highest) power $f^e$.

Let $J_e = J_{f^e}$ be the Jordan block for $f^e$, and let $A = \bigoplus_{i=1}^k J_e$ be the direct sum of $k$ copies of $J_e$. A projection $U\bar{A}V$ is successful if it has the same minimal polynomial as $A$. This is the same as saying the minimal polynomial of $f^{e-1}(U\bar{A}V)$ is $f$. We have
$$f^{e-1}(U\bar{A}V) = Uf^{e-1}(\bar{A})V = \sum_{i=1}^{k} U_i f^{e-1}(\bar{J_e}) V_i = \sum_{i=1}^{k} U_{i,e}\,\bar{C_f}\,\tilde{V}_{i,1}.$$
For the last expression, $U_{i,e}$ is the rightmost block of $U_i$ and $\tilde{V}_{i,1} = MV_{i,1}$, with $V_{i,1}$ the top block of $V_i$. The equality follows from the observation in the proof of Lemma 2 that $f^{e-1}(\bar{J})$ is the sequence that has $\bar{C_f}M$ ($M$ nonsingular) in the lower left block and zero elsewhere. Thus the probability of success in projection is the same as that for projection of direct sums of the companion matrix $C_f$.

4.2.3 Probability for a Direct Sum of Companion Matrices

To determine the probability that a block projection of $A = \bigoplus_{i=1}^t C_f$, $f$ irreducible of degree $d$, preserves the minimal polynomial of $A$, we need to determine the probability that $\sum_{i=1}^{t} U_i\bar{C_f}V_i = 0$. We show that this is equivalent to the probability that a sum of rank one matrices over $K = \mathbb{F}_q[x]/\langle f(x)\rangle$ is zero, and establish a recurrence relation for this probability in Corollary 2.

The connection between sums of sequences $U\bar{C_f}V$ and sums of rank one matrices over the extension field $K$ is obtained through the observation that for column vectors $u, v$ one has $u^T\bar{C_f}v = u^T\rho(v)$, where $\rho$ is the regular matrix representation of $K$, i.e. $\rho(v)u = vu$ in $K$. The vectors $u$ and $v$ can be interpreted as elements of $K$ by associating them with the polynomials $u(x) = \sum_{i=0}^{d-1}u_ix^i$ and $v(x) = \sum_{i=0}^{d-1}v_ix^i$. Moreover, if $\{1, x, x^2, \ldots, x^{d-1}\}$ is chosen as a basis for $K$ over $\mathbb{F}$, then $\rho(x) = C_f$ and $\rho(v) = \sum_{i=0}^{d-1}v_i\rho(x)^i = \sum_{i=0}^{d-1}v_iC_f^i$. Letting $C = C_f$, the sequence $u^T\bar{C_f}v$ is equal to $u^T(v, Cv, C^2v, \ldots, C^{d-1}v)$, which is $u^TK_C(v)$, where $K_C(v)$ is the Krylov matrix whose columns are the $C^iv$. The following lemma shows that $K_C(v) = \rho(v)$ and establishes the connection $u^T\bar{C_f}v = u^T\rho(v)$.

Lemma 3. Let $f$ be an irreducible polynomial and $K = \mathbb{F}[x]/\langle f\rangle$ be the extension field defined by $f$. Let $\rho$ be the regular representation of $K$ and $C = C_f$ the companion matrix of $f$. Then $\rho(v) = \sum_{j=0}^{d-1}v_jC^j = K_C(v)$.

Proof. Let $e_j$ be the vector with a one in the $j$-th location and zeros elsewhere. Then, abusing notation, $\rho(v)e_j = v(x)x^j \pmod f$ and $K_C(v)e_j = C^jv = x^jv(x) \pmod f$. Since this is true for arbitrary $j$, the lemma is proved.

Let $U$ and $V$ be $b \times d$ and $d \times b$ matrices over $\mathbb{F}$. Let $u_i$ be the $i$-th row of $U$ and $v_j$ be the $j$-th column of $V$. The sequence $U\bar{C}V$ of $b \times b$ matrices can be viewed as a $b \times b$ matrix of sequences whose $(i, j)$ element is equal, by the discussion above, to $u_i^T\rho(v_j)$. This matrix can be mapped to the $b \times b$ matrix over $K$ whose $(i, j)$ element is the product $u_iv_j = \rho(v_j)u_i$. This is the outer product, $UV^T$, of $U$ and $V$, viewed as a column vector and a row vector over $K$ respectively. Hence it is a rank one matrix over $K$ provided neither $U$ nor $V$ is zero. Since any rank one matrix is an outer product, this mapping can be reversed and there is a one to one association of sequences $U\bar{C}V$ with rank one matrices over $K$.

To show that this mapping preserves the probability that the block projection $U\bar{A}V$ preserves the minimum polynomial of $A$, we must show that if $\sum_{k=1}^{t} U_k\bar{C_f}V_k = 0$ then there is a sum of $t$ rank one matrices over $K$, equivalently outer products, whose sum is the zero matrix, and vice versa. This will be shown using the fact that the transpose $\rho(v)^T$ is similar to $\rho(v)$, which follows from the next lemma.

Lemma 4. Given an irreducible monic polynomial $f \in \mathbb{F}_q[x]$ of degree $d$, there exists a symmetric non-singular matrix $P$ such that $P^{-1}C_f^kP = (C_f^k)^T$ for all $0 \le k < d$.

Proof. Let $H_n(a_1, a_2, \ldots, a_n, a_{n+1}, \ldots, a_{2n-1})$ denote the Hankel matrix with first row $(a_1, a_2, \ldots, a_n)$ and last row $(a_n, a_{n+1}, \ldots, a_{2n-1})$. For example, $H_2(a, b, c) = \begin{pmatrix} a & b\\ b & c \end{pmatrix}$. Let $P_0 = I_d$ and, for $1 \le k \le d$, let
$$P_k = H_k(0, \ldots, 0, -f_0, -f_1, \ldots, -f_{k-1}) \oplus H_{d-k}(f_{k+1}, \ldots, f_{d-1}, 1, 0, \ldots, 0).$$
Then define $P$ as $P = P_1 = (-f_0) \oplus H_{d-1}(f_2, f_3, \ldots, f_{d-1}, 1, 0, \ldots, 0)$. As can be readily shown by induction, $C_f^kP = C_fP_k = P_{k+1}$ for $0 \le k < d$. Because $C_f^kP$ is symmetric, and $P$ is symmetric, $C_f^kP = P(C_f^k)^T$ and $P^{-1}C_f^kP = (C_f^k)^T$.

Theorem 1. Given an irreducible monic polynomial $f \in \mathbb{F}_q[x]$ and its extension field $K = \mathbb{F}_q[x]/\langle f(x)\rangle$, there exists a one-to-one, onto mapping from the $b \times b$ projections of $\bar{C_f}$ to $K^{b\times b}$ that preserves zero sums.

Proof. The previous discussion shows that the mapping $U\bar{C_f}V \to UV^T$ from $b \times b$ projections of $\bar{C_f}$ onto rank one matrices over $K$ is one-to-one. Let $u_{k,i}$ and $v_{k,j}$ be the $i$-th row of $U_k$ and the $j$-th column of $V_k$, respectively, and let $P$, from Lemma 4, satisfy $P^{-1}\rho(v)P = \rho(v)^T$. Assume $\sum_{k=1}^{t} U_k\bar{C_f}V_k = 0$. Then, for each $i, j$,
$$\sum_{k=1}^{t} u_{k,i}^T\bar{C_f}v_{k,j} = 0
\iff \sum_{k=1}^{t} u_{k,i}^T\rho(v_{k,j}) = 0
\iff \sum_{k=1}^{t} u_{k,i}^TPP^{-1}\rho(v_{k,j})PP^{-1} = 0$$
$$\iff \sum_{k=1}^{t} u_{k,i}^TP\rho(v_{k,j})^TP^{-1} = 0
\iff \sum_{k=1}^{t} (u_{k,i}^TP)\rho(v_{k,j})^T = 0
\iff \sum_{k=1}^{t} \tilde{u}_{k,i} \cdot v_{k,j} = 0,$$
where $\tilde{u}_{k,i} = u_{k,i}^TP$. Let $\tilde{U}_k$ be the matrix whose $i$-th row is $\tilde{u}_{k,i}$; then the corresponding sum of outer products is $\sum_{k=1}^{t} \tilde{U}_kV_k^T = 0$. Because $P$ is invertible, the argument can be done in reverse, and for any zero sum of rank one matrices over $K$ we can construct a sum of projections equal to zero.

Thus the probability that $\sum_{i=1}^{t} U_i\bar{C_f}V_i = 0$ is the probability that $t$ randomly selected outer products over $K$ sum to zero. For the basic result on these probabilities we will use the following lemma on rank one updates.
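Before that lemma, Lemma 3's identity $\rho(v) = K_C(v)$ is easy to verify numerically; a sketch assuming numpy, with $f(x) = x^2 + x + 1$ over $\mathbb{F}_2$:

    import numpy as np

    q, f = 2, [1, 1]                   # f(x) = x^2 + x + 1, irreducible over F_2
    d = len(f)
    C = np.zeros((d, d), dtype=int)
    C[1:, :d-1] = np.eye(d - 1, dtype=int)
    C[:, d-1] = [-fi % q for fi in f]  # companion matrix C_f

    for v in ([0, 1], [1, 1]):
        v = np.array(v)
        rho = sum(int(v[j]) * np.linalg.matrix_power(C, j) for j in range(d)) % q
        kry = np.column_stack([np.linalg.matrix_power(C, j) @ v % q for j in range(d)])
        assert (rho == kry).all()      # Lemma 3: rho(v) = K_C(v)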

Lemma 5. Let $r, s \ge 0$ be given and consider rank one updates to $A = I_r \oplus 0_s$. For conformally blocked column vectors $u = (u_1^T, u_2^T)^T, v = (v_1^T, v_2^T)^T \in \mathbb{F}^r \times \mathbb{F}^s$, we have that $\operatorname{rank}(A + uv^T) = r - 1$ if and only if $u_1^Tv_1 = -1$ and $u_2, v_2$ are both zero, and $\operatorname{rank}(A + uv^T) = r + 1$ if and only if $u_2, v_2$ are both nonzero.

Proof. Without loss of generality (orthogonal change of basis) we may restrict attention to the case that $u_1 = \alpha e_r$ and $u_2 = \beta e_{r+1}$, where $e_i$ is the $i$-th unit vector, $\alpha = 0$ if $u_1 = 0$ and $\alpha = 1$ otherwise, and similarly for $\beta$ vis a vis $u_2$. Suppose that in this basis $v = (w_1, \ldots, w_r, z_{r+1}, \ldots, z_n)^T$. Then
$$(I_r \oplus 0) + uv^T = \begin{pmatrix}
1 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \ddots & \vdots & \vdots & & \vdots\\
\alpha w_1 & \cdots & 1 + \alpha w_r & \alpha z_{r+1} & \cdots & \alpha z_n\\
\beta w_1 & \cdots & \beta w_r & \beta z_{r+1} & \cdots & \beta z_n\\
0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}.$$
The rank of $I_r + u_1v_1^T$ is $r - 1$ just in case $u_1^Tv_1 = -1$ (Meyer, 2000). In our setting this condition is that $\alpha w_r = -1$. We see that, for a rank of $r - 1$, we must have that $\alpha w_r = -1$ and $\beta$, $z$ both zero. For rank $r + 1$ it is clearly necessary that both of $\beta$, $z$ are nonzero. It is also sufficient because for $z_i \ne 0$ the order $r + 1$ minor $I_{r-1} \oplus \begin{pmatrix} 1 + \alpha w_r & \alpha z_i\\ \beta w_r & \beta z_i \end{pmatrix}$ has determinant $\beta z_i \ne 0$. These conditions translate into the statements of the lemma before the change of basis.

Theorem 2. Let $A \in \mathbb{F}_q^{n\times n}$ be of rank $r$, and let $u, v$ be uniformly random in $\mathbb{F}_q^n$. Then,

1. the probability that $\operatorname{rank}(A + uv^T) = r - 1$ is
$$D(r) = \frac{q^{r-1}(q^r - 1)}{q^{2n}},$$

2. the probability that $\operatorname{rank}(A + uv^T) = r + 1$ is
$$U(r) = \frac{(q^{n-r} - 1)^2}{q^{2(n-r)}},$$

3. the probability that $\operatorname{rank}(A + uv^T) = r$ is
$$N(r) = 1 - D(r) - U(r) \ge \frac{2q^n - 1}{q^{2n}},$$
with equality when $r = 0$.

Proof. There exist nonsingular $P, Q$ such that $PAQ = I_r \oplus 0$ and $P(A + uv^T)Q = I_r \oplus 0 + (Pu)(Q^Tv)^T$. Since $Pu$ and $Q^Tv$ are uniformly random when $u, v$ are, we may assume without loss of generality that $A = I_r \oplus 0$.

For part 1, by the preceding lemma, the rank of $I_r \oplus 0 + uv^T$ is only decreased if both $u, v$ are zero in their last $n - r$ rows and $u_1^Tv_1 = -1$. For $u, v \in \mathbb{F}^r$, $u^Tv = -1$ only when $u \ne 0$ and we have, for the first $i$ such that $u_i \ne 0$, that $v_i = -u_i^{-1}(1 + \sum_{j\ne i} u_jv_j)$. Counting, there are $q^r - 1$ possible $u$ and then $q^{r-1}$ $v$'s satisfying the conditions. The stated probability follows.

For part 2, by the preceding lemma, the rank is increased only if the last $n - r$ rows of $u$ and $v$ are both nonzero. The probability of this is $\frac{(q^{n-r}-1)^2}{q^{2(n-r)}}$.

For the part 3 inequality, if the sign is changed and $1$ is added to both sides, the inequality becomes $D(r) + U(r) \le \frac{(q^n - 1)^2}{q^{2n}}$. Note that $U(r) = \frac{(q^n - q^r)^2}{q^{2n}}$ and $D(r) \le \frac{(q^r - 1)^2}{q^{2n}}$. Apply the triangle inequality.
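A quick Monte Carlo check of parts 1 and 2 (a sketch; $q$ prime, and the parameters are arbitrary):

    import random
    from fractions import Fraction

    def rank_mod_q(M, q):
        # row reduce a copy of M over F_q (q prime) and return its rank
        M = [[x % q for x in row] for row in M]
        rank = 0
        for col in range(len(M[0])):
            piv = next((i for i in range(rank, len(M)) if M[i][col]), None)
            if piv is None:
                continue
            M[rank], M[piv] = M[piv], M[rank]
            inv = pow(M[rank][col], q - 2, q)
            M[rank] = [x * inv % q for x in M[rank]]
            for i in range(len(M)):
                if i != rank and M[i][col]:
                    c = M[i][col]
                    M[i] = [(a - c * e) % q for a, e in zip(M[i], M[rank])]
            rank += 1
        return rank

    q, n, r, trials = 3, 3, 2, 100000
    hits = {r - 1: 0, r: 0, r + 1: 0}
    for _ in range(trials):
        u = [random.randrange(q) for _ in range(n)]
        v = [random.randrange(q) for _ in range(n)]
        A = [[(i == j and i < r) + u[i] * v[j] for j in range(n)] for i in range(n)]
        hits[rank_mod_q(A, q)] += 1
    print(hits[r - 1] / trials, float(Fraction(q**(r-1) * (q**r - 1), q**(2*n))))  # ~0.033
    print(hits[r + 1] / trials, float(Fraction((q**(n-r) - 1)**2, q**(2*(n-r)))))  # ~0.444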

Corollary 2. Let $A = \sum_{i=1}^{t} u_iv_i^T$, for uniformly random $u_i, v_i \in \mathbb{F}_q^n$, and let $D(r)$, $U(r)$, and $N(r)$ be as in Theorem 2. Define $P_t(r)$ to be the probability that $\operatorname{rank}(A) = r$. Then $P_t(r)$ satisfies the recurrence relation
$$P_1(0) = \frac{2q^n - 1}{q^{2n}}, \qquad P_1(1) = \frac{(q^n - 1)^2}{q^{2n}},$$
$$P_{t+1}(0) = P_t(0)\frac{2q^n - 1}{q^{2n}} + P_t(1)\frac{q - 1}{q^{2n}},$$
$$P_{t+1}(r) = P_t(r-1)U(r-1) + P_t(r)N(r) + P_t(r+1)D(r+1), \quad\text{for } r > 0.$$

Proof. The general recurrence is evident from the fact that a rank one update can change the rank by at most one. Also $P_1(0) = N(0)$, and $P_1(1) = U(0)$. Finally, the $P_{t+1}(0)$ case is the general case without the impossible $U(-1)$ term and with the explicit formulas for $N(0)$ and $D(1)$.

These probabilities apply as well to the preimage of our mapping (block projections of direct sums of companion matrices), which leads to the next theorem.

Theorem 3. Let $f$ be an irreducible polynomial over $\mathbb{F}_q$, let $e_1 = \ldots = e_s \ge e_{s+1} \ge \ldots \ge e_t$, and let $U, V$ be a projection. Let $A$ be similar to $J_{f^{e_1}} \oplus \cdots \oplus J_{f^{e_t}}$. Then
$$\operatorname{Prob}(\operatorname{minpoly}(A) = \operatorname{minpoly}(U\bar{A}V)) = 1 - P_s(0) \ge 1 - P_1(0).$$

Proof. As discussed in Section 4.2.2, the probability of minimal polynomial preservation is identical to that for $B = \bigoplus_{i=1}^{s} C_f$ ($J_{f^e}$ for $e < e_1$ is irrelevant, and the probability for a $J_{f^{e_1}}$ is that for a $C_f$). By Theorem 1 and Corollary 2, the probability that a projection of the form $B$ fails is precisely $P_s(0)$. For the inequality, in all cases $P_s(1) \le 1 - P_s(0)$; therefore,
$$P_{s+1}(0) = P_s(0)\frac{2q^b - 1}{q^{2b}} + P_s(1)\frac{q - 1}{q^{2b}}
\le P_s(0)\frac{2q^b - 1}{q^{2b}} + (1 - P_s(0))\frac{q - 1}{q^{2b}}
= P_s(0)\frac{2q^b - q}{q^{2b}} + \frac{q - 1}{q^{2b}}.$$
Because $P_s(0)\frac{2q^b - q}{q^{2b}} + \frac{q - 1}{q^{2b}}$ is linear in $P_s(0)$ with positive slope, the maximum value occurs when $P_s(0) = 1$. When $P_s(0) = 1$, $P_{s+1}(0) \le \frac{2q^b - 1}{q^{2b}} = P_1(0)$. Therefore $P_s(0) \le P_1(0)$ for all $s$.

Given Theorem 3 and the recurrence relations of Corollary 2, we can exactly compute the probability that the minimal polynomial is preserved in a $b \times b$ block projection of a matrix with known Jordan form. This is done in the following examples.
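The following sketch (exact rational arithmetic; the function names are ours) implements the recurrence of Corollary 2 and the product over distinct irreducible factors given by Theorem 3 together with Corollary 1:

    from fractions import Fraction

    def P_table(t, Q, b):
        # P_t(r) of Corollary 2 for b x b sums of t random rank one
        # matrices over a field with Q elements (Q = q^(deg f) below)
        def D(r):
            return Fraction(Q**(r-1) * (Q**r - 1), Q**(2*b)) if r > 0 else Fraction(0)
        def U(r):
            return Fraction((Q**(b-r) - 1)**2, Q**(2*(b-r)))
        def N(r):
            return 1 - D(r) - U(r)
        P = [N(0), U(0)] + [Fraction(0)] * (b - 1)          # t = 1
        for _ in range(t - 1):
            P = [P[r] * N(r)
                 + (P[r-1] * U(r-1) if r > 0 else 0)
                 + (P[r+1] * D(r+1) if r < b else 0)
                 for r in range(b + 1)]
        return P

    def preserve_prob(q, b, factors):
        # factors: one pair (d, s) per distinct irreducible f dividing the
        # minimal polynomial: d = deg f, s = number of Jordan blocks of f
        # of maximal exponent
        result = Fraction(1)
        for d, s in factors:
            result *= 1 - P_table(s, q ** d, b)[0]
        return result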

4.3 Examples

These five examples illustrate the effect of varying matrix structure and block size on the probability that $\operatorname{minpoly}(A) = \operatorname{minpoly}(U\bar{A}V)$. Comparing $A_1$ with $A_2$, and $A_2$ with $A_3$, shows how having repeated blocks increases the probability of preserving the minimal polynomial, and how higher multiplicity blocks do not affect the probability. $A_2$ and $A_4$ show how only the highest multiplicity blocks of a particular factor affect the probability of preserving the minimal polynomial. Finally, $A_5$ is a worst case for the matrix order 5 and field $\mathbb{F}_7$. Table 1 shows the probabilities for these examples with varying block size.

$$A_1 = \begin{pmatrix} 0&1&0&0&0\\ 1&4&0&0&0\\ 0&0&3&0&0\\ 0&0&0&3&0\\ 0&0&0&0&3 \end{pmatrix},\quad
A_2 = \begin{pmatrix} 0&1&0&0&0\\ 1&4&0&0&0\\ 0&0&3&1&0\\ 0&0&0&3&0\\ 0&0&0&0&3 \end{pmatrix},\quad
A_3 = \begin{pmatrix} 0&1&0&0&0\\ 1&4&0&0&0\\ 0&0&0&1&0\\ 0&0&1&4&0\\ 0&0&0&0&3 \end{pmatrix},$$
$$A_4 = \begin{pmatrix} 0&1&0&0&0\\ 1&4&0&0&0\\ 0&0&3&1&0\\ 0&0&0&3&1\\ 0&0&0&0&3 \end{pmatrix},\quad
A_5 = \begin{pmatrix} 1&0&0&0&0\\ 0&2&0&0&0\\ 0&0&3&0&0\\ 0&0&0&4&0\\ 0&0&0&0&5 \end{pmatrix}.$$

Table 1: Probability of success for q = 7

          b=1     b=2     b=3       b=4
    A1    0.820   0.998   0.99998   0.9999996
    A2    0.705   0.959   0.994     0.9992
    A3    0.719   0.960   0.994     0.9992
    A4    0.705   0.959   0.994     0.9992
    A5    0.214   0.814   0.971     0.996
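The Table 1 entries can be reproduced with the preserve_prob sketch above, summarizing each matrix by the pairs (deg f, number of maximal blocks of f):

    # A1: irreducible quadratic (s=1), eigenvalue 3 in three 1x1 blocks (s=3)
    # A2, A4: quadratic (s=1) and one largest block for x - 3 (s=1)
    # A3: two copies of the quadratic (s=2) and one block for x - 3 (s=1)
    # A5: five distinct linear factors, one block each
    examples = {
        "A1": [(2, 1), (1, 3)],
        "A2": [(2, 1), (1, 1)],
        "A3": [(2, 2), (1, 1)],
        "A4": [(2, 1), (1, 1)],
        "A5": [(1, 1)] * 5,
    }
    for name, factors in examples.items():
        row = [float(preserve_prob(7, b, factors)) for b in (1, 2, 3, 4)]
        print(name, [round(x, 7) for x in row])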

5 Worst Case

Given the probabilities of minimum polynomial preservation under projection determined in Section 4, it is intuitively clear that the lowest probability of success occurs when there are many elementary divisors and the degrees of the irreducibles are as small as possible. This is true and is precisely stated in Theorem 4 below. First we need several lemmas concerning direct sums of Jordan blocks.

Let $A \in \mathbb{F}_q^{n\times n}$ and let $P_{q,b}(A)$ denote the probability that $\operatorname{minpoly}(A) = \operatorname{minpoly}(U\bar{A}V)$ for uniformly random $U \in \mathbb{F}_q^{b\times n}$ and $V \in \mathbb{F}_q^{n\times b}$.

Lemma 6. Let $f$ be an irreducible polynomial over $\mathbb{F}_q$, let $e_1 = \ldots = e_s \ge e_{s+1} \ge \ldots \ge e_t$, and let $b$ be the projection block size. Then
$$P_{q,b}(J_{f^{e_1+\cdots+e_t}}) \le P_{q,b}(J_{f^{e_1}} \oplus \cdots \oplus J_{f^{e_t}}) = P_{q,b}(J_{f^{e_1}} \oplus \cdots \oplus J_{f^{e_s}}).$$

Proof. Setting $\deg(f) = d$, this follows from Part 3 of Lemma 2 and Theorem 3, since $P_{q,b}(J_{f^{e_1+\cdots+e_t}}) = 1 - P_1(0) \le 1 - P_s(0) = P_{q,b}(J_{f^{e_1}} \oplus \cdots \oplus J_{f^{e_t}})$.

Note that only the number of occurrences of the largest block affects the probability of success.

Lemma 7. Let $f$ be an irreducible polynomial over $\mathbb{F}_q$ of degree $d$, let $f_1, \ldots, f_e$ be distinct irreducible polynomials of degree $d$ over $\mathbb{F}_q$, and let $b$ be the projection block size. Then $P_{q,b}(J_{f_1} \oplus \cdots \oplus J_{f_e}) \le P_{q,b}(J_{f^e})$.

Proof. This follows from Part 3 of Lemma 2, since $P_{q,b}(J_{f^e}) = P_{q,b}(J_{f_i})$ and $(1 - 1/q^{db})^2 < 1$.

Lemma 8. Let $f_1$ and $f_2$ be irreducible polynomials over $\mathbb{F}_q$ of degrees $d_1$ and $d_2$ respectively, and let $b$ be any projection block size. Then, if $d_1 \le d_2$, $P_{q,b}(J_{f_1}) \le P_{q,b}(J_{f_2})$.

Proof. The lemma follows from Part 3 of Lemma 2, since $(1 - 1/q^{d_1b})^2 \le (1 - 1/q^{d_2b})^2$.

For matrix dimension $n$, define $P_{q,b}(n) = \min_{A\in\mathbb{F}_q^{n\times n}} P_{q,b}(A)$. This is the worst case probability that an $n \times n$ matrix has its minimal polynomial preserved by uniformly random projection to a $b \times b$ sequence. We wish to consider matrices with the maximal number of elementary divisors. Define $L_q(m)$ to be the number of monic irreducible polynomials of degree $m$. By the well known formula of Gauss (1981),
$$L_q(m) := \frac{1}{m}\sum_{d \mid m}\mu(m/d)\,q^d,$$
where $\mu$ is the Möbius function. Asymptotically $L_q(m)$ converges to $q^m/m$. By definition, $\mu(a) = (-1)^k$ for squarefree $a$ with $k$ distinct prime factors and $\mu(a) = 0$ otherwise.

The degree of the product of all the monic irreducible polynomials of degree $d$ is then $dL_q(d)$. When we want to have a maximal number of irreducible factors in a product of degree $n$, we will use $L_q(1), L_q(2), \ldots, L_q(m-1)$, etc., until the contribution of $L_q(m)$ no longer fits within the degree $n$. In that case we finish with as many of the degree $m$ irreducibles as will fit. For this purpose we adopt the notation
$$L_q(n, m) := \min\left(L_q(m), \left\lfloor\frac{n - \sum_{d=1}^{m-1}dL_q(d)}{m}\right\rfloor\right).$$

Theorem 4. Let $\mathbb{F} = \mathbb{F}_q$. For $m$ such that $\sum_{d=1}^{m-1}dL_q(d) \le n < \sum_{d=1}^{m}dL_q(d)$,
$$P_{q,b}(n) = \prod_{d=1}^{m}(1 - 1/q^{db})^{2L_q(n,d)}.$$
Moreover, for $r = n - \sum_{d=1}^{m}dL_q(n,d)$, when $r \equiv 0 \pmod m$ the minimum occurs for all matrices whose elementary divisors are irreducible (not powers thereof), distinct, and with degree as small as possible. When $r \not\equiv 0 \pmod m$ the minimum occurs when the elementary divisors involve exactly the same irreducibles as in the $r \equiv 0 \pmod m$ case, but with some elementary divisors being powers so that the total degree is brought to $n$.

Proof. Let $A \in \mathbb{F}_q^{n\times n}$ and let $f_1^{e_1}, \ldots, f_t^{e_t}$ be the irreducible powers that are the elementary divisors of $A$. If $P_{q,b}(A)$ is minimal, then by Lemmas 6, 7, 8 we can assume that the $f_i$ are distinct and have as small degrees as possible. Since $\sum_{d=1}^{m-1}dL_q(d) \le n < \sum_{d=1}^{m}dL_q(d)$, this assumption implies that all irreducibles of degree less than $m$ have been exhausted. If additional polynomials of degree $m$ can be added to obtain an $n \times n$ matrix, this will lead to the minimal probability, since adding any irreducibles of higher degree will, by Lemma 2, reduce the total probability by a lesser amount. In this case all of the exponents $e_i$ will be equal to one. If $r$ is not 0, then an $n \times n$ matrix can be obtained by increasing some of the exponents $e_i$ without changing the probability. This, again by Lemma 2, will lead to a smaller probability than those obtained by removing smaller degree polynomials and adding a polynomial of degree $m$ or higher.
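A sketch of the worst case computation (floating point via logs so that very large $n$ is cheap; exact Fractions work equally well for small $n$; the function names are ours):

    import math

    def mobius(a):
        result, p, m = 1, 2, a
        while p * p <= m:
            if m % p == 0:
                m //= p
                if m % p == 0:
                    return 0
                result = -result
            p += 1
        return -result if m > 1 else result

    def L(q, m):   # Gauss' count of monic irreducibles of degree m over F_q
        return sum(mobius(m // d) * q ** d for d in range(1, m + 1) if m % d == 0) // m

    def worst_case_log(q, b, n):
        # log P_{q,b}(n) per Theorem 4: exhaust irreducibles by increasing degree
        logp, remaining, d = 0.0, n, 1
        while remaining >= d:
            k = min(L(q, d), remaining // d)        # L_q(n, d)
            logp += 2 * k * math.log1p(-float(q) ** (-d * b))
            remaining -= d * k
            d += 1
        return logp   # leftover degree only raises exponents (no change)

    print(math.exp(worst_case_log(7, 1, 5)))        # 0.2141..., matches A5
    # worst case failure bound for n = 10^8, q = 2, b = 22 (cf. the conclusion)
    print(-math.expm1(worst_case_log(2, 22, 10 ** 8)))   # about 9.5e-07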

5.1 Approximations

Theorem 4 can be simplified using the approximations $L_q(m) \approx q^m/m$ and $(1 - 1/a)^a \approx 1/e$.

Corollary 3. For field cardinality $q$, matrix dimension $n$, and projection block dimension $b$,
$$P_{q,b}(n) \approx e^{-\frac{2}{q^{b-1}}H_m},$$
where $H_m$ is the $m$-th harmonic number.

Also, for large primes, the formula of Theorem 4 simplifies quite a bit because there are plenty of small degree irreducibles. In the next corollary we consider (a) the case in which there are $n$ linear irreducibles and (b) a situation in which the worst case probability will be defined by linear and quadratic irreducibles.

Corollary 4. For field cardinality $q$, matrix dimension $n$, and projection block dimension $b$, if $q \ge n$ then

$$P_{q,b}(n) = (1 - 1/q^b)^{2n} \approx e^{-2n/q^b}.$$
If $n > q \ge n^{1/2}$ then
$$P_{q,b}(n) = (1 - 1/q^b)^{2q}(1 - 1/q^{2b})^{n-q} \approx e^{-(2/q^{b-1} + (n-q)/q^{2b})}.$$
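For a sense of the quality of these approximations (reusing worst_case_log from the sketch above; the parameter choices are ours):

    import math
    q, b, n = 101, 1, 97                     # q >= n: only linear factors occur
    exact = math.exp(worst_case_log(q, b, n))
    approx = math.exp(-2 * n / q ** b)
    print(exact, approx)                     # about 0.1451 vs 0.1465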



5.2 Comparison to previous bounds

When $b = 1$ and we are only concerned with projection on one side, the first formula of the corollary simplifies to $(1 - 1/q)^n = 1 - n/q + \ldots$. The bound given by Kaltofen and Pan (Kaltofen and Pan, 1991; Kaltofen and Saunders, 1991) for the probability that $\operatorname{minpoly}(u\bar{A}v) = \operatorname{minpoly}(\bar{A}v)$ is the first two terms of this expansion, though developed with a very different proof.

For small primes, Wiedemann (1986) (Proposition 3) treats the case $b = 1$, and he fixes the projection on one side because he is interested in linear system solving and thus in the sequence $\bar{A}b$. For small $q$, his formula, $1/(6\log_q(n))$, computed with some approximation, is nonetheless quite close to our exact formula. However, as $q$ approaches $n$ the discrepancy with our exact formula increases. At the large/small crossover, $q = n$, Kaltofen/Pan's lower bound is 0, Wiedemann's is 1/6, and ours is $1/e$. The Kaltofen/Pan probability bound improves as $q$ grows larger than $n$. The Wiedemann bound becomes more accurate as $q$ goes down from $n$. But the area $q \approx n$ is of some practical importance. In integer matrix algorithms where the finite field used is a choice of the algorithm, practical considerations of efficient field arithmetic sometimes encourage the use of primes in the vicinity of $n$. For instance, exact arithmetic in double precision using BLAS (Dumas et al., 2008) works well with $q \in [10^6, 10^7]$. Sparse matrices of order $n$ in that range are tractable. Our bound may help justify the use of such primes.

But the primary value we see in our analysis here is the understanding it gives of the value of blocking, $b > 1$. Figure 1 shows the bounds for the worst case probability that a random projection will preserve the minimal polynomial of a matrix $A \in \mathbb{F}_q^{n\times n}$ with $n = 10^8$, for various fields and projection block sizes. It shows that the probability of finding the minimal polynomial correctly under projection converges rapidly to 1 as the projected block size increases.

Figure 1: Probability of Failure to Preserve Minimal Polynomial vs Block Size and Field Cardinality.

6 Conclusion

We have drawn a precise connection between the elementary divisors of a matrix and the probability that a random projection, as done in the (blocked or unblocked) Wiedemann algorithms, preserves the minimal polynomial. We provide sharp formulas both for the case where the elementary divisor structure of the matrix is known and for the worst case. As indicated in Figure 1 for the worst case, a blocking size of 22 assures probability of success greater than $1 - 10^{-6}$ for all finite fields and all matrix dimensions up to $10^8$. The probability decreases very slowly as matrix dimension grows and, in fact, further probability computations show that the one in a million bound on failure applies to blocking size 22 with much larger matrix dimensions as well.

Looking forward, it would be worthwhile to extend the analysis to apply to the determination of additional invariant factors. Blocking is known to be useful for finding and exploiting them. For example, some rank and Frobenius form algorithms are based on block Wiedemann (Eberly, 2000a,b). Also, we have not addressed preconditioners. Preconditioners such as diagonal, Toeplitz, and butterfly (Chen et al., 2002) either apply only to large fields or have only large field analyses. One can generally use an extension field to get the requisite cardinality, but the computational cost is high. Block algorithms hold much promise here, and analysis to support them over small fields will be valuable.

References

Chen, L., Eberly, W., Kaltofen, E., Turner, W., Saunders, B. D., Villard, G., 2002. Efficient matrix preconditioners for black box linear algebra. Linear Algebra and its Applications 343–344, 119–146.

Coppersmith, D., 1995. Solving homogeneous linear equations over GF(2) via block Wiedemann algorithm. Mathematics of Computation 62 (205), 333–350.

Dumas, J.-G., Giorgi, P., Pernet, C., 2008. Dense linear algebra over word-size prime fields: the FFLAS and FFPACK packages. ACM Trans. Math. Softw. 35 (3), 1–42.

Eberly, W., 2000a. Asymptotically efficient algorithms for the Frobenius form. Technical report, Department of Computer Science, University of Calgary.

Eberly, W., 2000b. Black box Frobenius decompositions over small fields. In: Proc. of ISSAC'00. ACM Press, pp. 106–113.

Eberly, W., Giesbrecht, M., Giorgi, P., Storjohann, A., Villard, G., 2006. Solving sparse rational linear systems. In: Proc. of ISSAC'06. ACM Press, pp. 63–70.

Gauss, C. F., 1981. Untersuchungen über höhere Arithmetik, second edition, reprinted. Chelsea.

Giorgi, P., Jeannerod, C.-P., Villard, G., 2003. On the complexity of polynomial matrix computations. In: Proc. of ISSAC'03. pp. 135–142.

Kaltofen, E., 1995. Analysis of Coppersmith's block Wiedemann algorithm for the parallel solution of sparse linear systems. Mathematics of Computation 64 (210), 777–806.

Kaltofen, E., Pan, V., 1991. Processor efficient parallel solution of linear systems over an abstract field. In: Third Annual ACM Symposium on Parallel Algorithms and Architectures. ACM Press, pp. 180–191.

Kaltofen, E., Saunders, B. D., 1991. On Wiedemann's method of solving sparse linear systems. In: Proc. AAECC-9. Vol. 539 of Lect. Notes Comput. Sci. Springer Verlag, pp. 29–38.

Meyer, C. D. (Ed.), 2000. Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.

Villard, G., 1997. Further analysis of Coppersmith's block Wiedemann algorithm for the solution of sparse linear systems. In: International Symposium on Symbolic and Algebraic Computation. ACM Press, pp. 32–39.

Villard, G., 1999. Block solution of sparse linear systems over GF(q): the singular case. Sigsam Bulletin 32 (4), 10–12.

Wiedemann, D., 1986. Solving sparse linear equations over finite fields. IEEE Trans. Inform. Theory 32, 54–62.

Yuhasz, G. L., 2009. Berlekamp/Massey algorithms for linearly generated matrix sequences. Ph.D. thesis, North Carolina State University.
