ON MERGING NETWORKS
A Technical Report by Tamir Levy and Ami Litman
Faculty of Computer Science, Technion, Haifa, Israel
{levyt, litman}@cs.technion.ac.il

Abstract

Many algorithms for oblivious merging have been invented. However, it is a common phenomenon that radically different algorithms produce identical networks. In order to prevent a duplication of effort it is desirable to be able to tell when different algorithms produce identical networks. Our work studies this question, and our main result is a criterion which shows that all published merging networks belong to a family of networks which is a generalization of Batcher's odd/even network. A significant advantage of our criterion is that it does not require a complete understanding of the technique in question; in fact, a very superficial understanding of the algorithm suffices to establish that a merging technique produces generalized Batcher merging networks. Our criterion is as follows. A network has the At Most One Path property, or is AMOP, if it has at most one path from every input to every output. A comparator of a network is degenerate if its incoming edges can be named e' and e'' such that under any valid input the value transmitted on e' is smaller than or equal to the value transmitted on e''. Our main theorem states the following characterization of the Batcher merging networks: a merging network is a Batcher merging network iff it is AMOP, it has no degenerate comparators and its width is a power of two. We survey several published merging techniques and use this criterion to show that all these techniques produce Batcher merging networks. Additional contributions of this work are presented in the introduction.
Table of Contents

1   Introduction                                              1
2   Isomorphism of networks                                   5
3   Keys and Edges                                            9
4   The bypass transformation                                12
    4.1   Bypass charted by a vector                         14
    4.2   Splitting of a network                             16
          4.2.1   Producing half merging networks            17
          4.2.2   Pruning a network                          18
    4.3   Minors of merging networks                         19
5   The Batcher merging networks                             22
6   Congruent functions                                      28
7   The Input Cone                                           31
8   The i.m.f. of a split                                    35
9   Degenerate comparators for half merging                  39
10  Characterization of the Batcher merging networks         42
11  Oblivious algorithms                                     44
12  Variety of Merging Techniques                            49
    12.1  Pre-recursive and post-recursive                   50
    12.2  Powerful building blocks                           53
          12.2.1  Matrix technique                           54
          12.2.2  Modulo Merge                               55
13  Conclusive sets                                          56
    13.1  Conclusiveness by monotonic functions              57
    13.2  Conclusiveness by agreement                        61
14  The unique bitonic sorter                                63
15  Zipper sorters                                           67
    15.1  Tolerance-halvers                                  68
    15.2  The zipper merging technique                       71
    15.3  Unique zipper sorter                               72
16  Merging by tri-section                                   75
    16.1  The asymmetric tri-section method                  76
    16.2  The symmetric tri-section method                   79
Bibliography                                                 81
Chapter 1
Introduction

Many ingenious algorithms for oblivious merging have been invented. However, it is a common phenomenon that radically different algorithms produce identical networks. For many applications the networks themselves are of importance and their properties are investigated. In order to prevent a duplication of effort it is desirable to be able to tell when different algorithms produce identical networks. Our work studies this question and our main result is a criterion which shows that all published merging networks belong to a family of networks which is a natural generalization of Batcher's odd/even network [1]. A significant advantage of our criterion is that it does not require a complete understanding of the technique in question; in fact, a very superficial understanding of the algorithm suffices to establish that a merging technique produces generalized Batcher merging networks.

We do not know of any other work on the general question of isomorphism of arbitrary merging networks. There are some works on specific merging networks. Bilardi [6] has shown that the Bitonic merging network, which was invented by Batcher [1], is isomorphic to the Balanced merging network invented by Dowd, Perl, Rudolph and Saks. Dalpiaz and Rizzi [17] have shown that the Bitonic merging network is isomorphic to a merging network presented by Leighton [13, pp. 623].

Batcher, in his seminal paper [1], introduced the idea of recursive construction of merging networks; moreover, the depth of his networks is much smaller than that of previous ones and in fact is minimal. Batcher actually presented two networks. The first is his odd/even network, described shortly, and the second, which is based on his concept of bitonic sequences, is discussed later in Chapter 14.

Batcher's odd/even merging technique works as follows: Each of the input sequences is partitioned into its even part and its odd part. The even part of one sequence is merged with the even part of the other sequence recursively and, similarly, the odd part of one sequence is recursively merged with the odd part of the other sequence.
As Batcher has shown [1], the two resulting sequences can be combined into a single sorted sequence by a network of depth one. Moreover, there are exactly two such networks of depth one, but one of them has a degenerate comparator (defined shortly) and we insist on using the other network. A slight variant of this method, mentioned by Leighton [13, pp. 623], recursively merges the even part of each input sequence with the odd part of the other sequence. Again, the two resulting sequences can be combined by a network of depth one. In this variant there exists only one such depth-one network. We refer to the family of networks produced by allowing each of the above two variants in each step of the recursive algorithm as the Batcher merging networks. By our terminology, a Batcher merging network merges two sequences of the same length, which is a power of two. All published merging networks we encountered are members of this family.

As said, in this work we provide an easy criterion to establish that a merging network is a Batcher merging network. To this end, a network has the At Most One Path property, or is AMOP, if it has at most one path from every input to every output. A comparator of a network is degenerate if its incoming edges can be named e' and e'' such that under any valid input the value transmitted on e' is smaller than or equal to the value transmitted on e''. A merging network is non-degenerate if it has no degenerate comparators. Our criterion is a conjunction of the following requirements: AMOP, non-degenerate, and a width which is a power of two. Our main theorem is that a merging network is a Batcher merging network iff it satisfies the above criterion. We survey several published merging techniques and use this criterion to show that all these techniques produce Batcher merging networks.

Each of the three ingredients of our criterion is mandatory for the above characterization. Consider the AMOP property. A counterexample is a network that merges two sorted sequences ⟨a0, a1, a2, a3⟩ and ⟨b0, b1, b2, b3⟩ as follows. A comparator c' sorts the pair (a0, b0) and a comparator c'' sorts the pair (a2, b1). The four keys emerging from these comparators, together with the rest of the input keys, are now sorted by an arbitrary sorting network. The resulting network is a merging network which may have degenerate comparators. Clearly, all the degenerate comparators can be omitted without disturbing the functionality of the network (this is discussed in Lemma 4.1.2). The comparators c' and c'' are not degenerate and are not omitted. The resulting merging network has no degenerate comparators. It is not hard to see that no Batcher merging network has comparators c' and c'' as above. Next, consider the property of having no degenerate comparators. A counterexample of an AMOP merging network which is not a Batcher merging network is depicted in Figure 2.1.

A natural question is whether the AMOP property can be replaced by the condition of having minimal depth. This question is answered negatively in Chapter 16, but a similar question regarding the stronger functionality of sorting bitonic sequences (defined in Chapter 14) is answered positively.
In fact, we show that for any n = 2^k, there is a unique bitonic sorter of width n and of minimal depth. This statement does not hold when n is not a power of two. Moreover, for such general n, the minimal depth of a bitonic sorter of width n is not a monotonic function of n.

Since most networks are described indirectly via an oblivious algorithm, this work studies the concept of an oblivious algorithm. We present several oblivious models of computation and emphasize their differences and their relative computational power. We present a model which we feel is more natural than the generally accepted one used in the literature. Our model is stronger; yet, it maintains the famous "0-1 principle" and all its known generalizations.

This work presents and uses some generalizations of the known "0-1 principle" [10, pp. 224]. Instead of addressing specific functionalities such as sorting [10] or merging [11], we consider a more general context. For a given functionality, we search for a small set V (called a conclusive set) of valid input vectors s.t. any network has this functionality w.r.t. V iff it has this functionality w.r.t. all its valid input vectors. Such conclusive sets simplify the design and analysis of oblivious algorithms of all mentioned computation models, among them networks of comparators. For the functionality of merging two sequences of length n each, we present a conclusive set having n + 1 vectors. This set is much smaller than a conclusive set of 0-1 vectors since, as we show, such a set of 0-1 vectors has at least (n + 1)^2 - 2 vectors.

We also introduce two new merging techniques. The first, called "zipper sorting", is based on a certain way to quantify the amount of "unsortedness" of a pair of sorted sequences and on a network of depth one which halves this quantity. This technique produces minimal depth merging networks which, not surprisingly, are Batcher merging networks. The second technique, called "tri-section", is based on partitioning, by a depth-one network, the input of a merging network into three sequences, such that every element in one sequence is less than or equal to every element in the next sequence. This technique enables us to build minimal depth merging networks which produce some of the output keys faster than the depth of the entire network. Namely, for any k significantly smaller than n, we construct a network of minimal depth which produces the k lowest and k highest keys with a delay of log(k) + 1 comparators. This technique also gives rise to minimal depth merging networks which have no recursive structure and are clearly not Batcher merging networks.

The outline of this work is as follows. Chapter 2 sets the preliminary definitions of our work: what a network is, and when two networks are identical. Chapter 3 studies the key values that are transmitted on the edges of a network under certain scenarios. Chapter 4 presents a technique for deriving networks from larger ones. Chapter 5 elaborates on the Batcher merging networks, establishes some of their properties and computes the number of such non-isomorphic networks. Chapter 6 introduces
the notion of the input matching function of a merging network and characterizes the input matching functions of Batcher merging networks. Chapters 7, 8 and 9 study special subnetworks of Batcher merging networks which are needed to establish the above criterion in Chapter 10. Chapter 11 discusses several oblivious models of computation. Chapter 12 surveys several published oblivious merging algorithms and shows that they produce Batcher merging networks. Chapter 13 presents useful generalizations of the "0-1 principle". Chapter 14 establishes the uniqueness of minimal depth bitonic sorters. Chapter 15 presents the "zipper sorting" technique and Chapter 16 presents the "tri-section" technique.
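The recursive odd/even construction described earlier in this chapter can be made concrete by a short program. The following Python sketch is ours and is not part of the report's formal development; it uses the common "wire" encoding in which a comparator is a pair (i, j) of line indices that places the minimum on line i, and the function and variable names are our own.

    def oddeven_merge(a_pos, b_pos):
        """Comparators merging two sorted runs occupying the given line positions.

        Returns (comparators, out_pos): applying the comparators and then reading
        the lines in the order out_pos yields the merged sequence.  Both runs must
        have the same length, a power of two, as in the text."""
        n = len(a_pos)
        assert len(b_pos) == n and n & (n - 1) == 0
        if n == 1:
            return [(a_pos[0], b_pos[0])], [a_pos[0], b_pos[0]]
        comps_e, v = oddeven_merge(a_pos[0::2], b_pos[0::2])   # even parts, recursively
        comps_o, w = oddeven_merge(a_pos[1::2], b_pos[1::2])   # odd parts, recursively
        comps, out = [], [v[0]]
        for i in range(n - 1):                                 # the combining network of depth one
            comps.append((w[i], v[i + 1]))
            out += [w[i], v[i + 1]]
        out.append(w[n - 1])
        return comps_e + comps_o + comps, out

    def apply_net(comps, keys):
        keys = list(keys)
        for i, j in comps:
            if keys[i] > keys[j]:
                keys[i], keys[j] = keys[j], keys[i]
        return keys

    comps, out = oddeven_merge([0, 1, 2, 3], [4, 5, 6, 7])     # width 8
    merged = apply_net(comps, [1, 4, 6, 7, 2, 3, 5, 8])
    assert [merged[p] for p in out] == list(range(1, 9))

Generating the variant mentioned by Leighton amounts to crossing the two recursive calls (the even part of one run with the odd part of the other); allowing either choice at every level of the recursion enumerates the family called the Batcher merging networks above.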
Chapter 2
Isomorphism of networks

This chapter formalizes the concept of a network of comparators and defines when two such networks are identical (isomorphic). It shows that any two networks that perform the same comparisons for a certain input are isomorphic.

A comparator is a combinational device that sorts two keys. Namely, it has two incoming edges and it receives a key from each one of them. It has two outgoing edges of distinct types: a min edge and a max edge. A comparator transmits the minimal key on the min edge and the maximal key on the max edge. A network of comparators is an acyclic network of these devices. These networks are useful for performing operations on keys such as merging or sorting.

Our concept of a network encompasses both the structure of the network and the manner in which the network is used to process keys. The structure of the network is manifested by its underlying graph, which is a directed acyclic graph having three types of vertices and three types of edges. The type of a vertex is determined by its in-degree and its out-degree, as follows:

1. An input vertex has no incoming edges and one outgoing edge.

2. An output vertex has one incoming edge and no outgoing edges.

3. An internal vertex (a comparator) has two incoming edges and two outgoing edges.

The three types of edges concern the functionality of a comparator. Of the two edges which exit a comparator, one is of type min and carries the minimal key and the other is of type max and carries the maximal key. Edges of the third type are those which exit an input vertex rather than a comparator.
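One possible, minimal encoding of this graph model in code is sketched below for the width-two network consisting of a single comparator. The encoding is ours and purely illustrative; the report itself defines networks only mathematically, and the ASCII labels 'a0', 'o0', etc. stand in for the hatted labels introduced next.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Edge:
        src: Optional[int]            # comparator the edge exits, or None for an input vertex
        dst: Optional[int]            # comparator the edge enters, or None for an output vertex
        kind: str                     # 'min' or 'max' if it exits a comparator, 'input' otherwise
        label: Optional[str] = None   # only input and output edges carry labels

    # The width-two network consisting of a single comparator (numbered 0):
    single_comparator_net = [
        Edge(src=None, dst=0, kind='input', label='a0'),
        Edge(src=None, dst=0, kind='input', label='b0'),
        Edge(src=0, dst=None, kind='min', label='o0'),
        Edge(src=0, dst=None, kind='max', label='o1'),
    ]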
As said, our concept of a network encompasses also the manner in which the network is used to process keys. We use labels to specify how to apply a given arrangement of keys to the network and how to assemble the resulting keys. Our labels are symbols of the form α̂i, where α ranges over the lower case Latin letters and i ∈ ℕ. For example, the symbols â0, b̂3, ô7 are labels. We refer to the set of these labels as L. Labels are assigned to the input edges (edges which exit an input vertex) and output edges (edges which enter an output vertex).

Consider for example a merging network. Such a network merges two sorted sequences of the same given length n. In this context, we refer to the two input sequences of such a network by ~a = ⟨a0, a1, ..., a_{n-1}⟩ and ~b = ⟨b0, b1, ..., b_{n-1}⟩ and to the output sequence as ~o = ⟨o0, o1, ..., o_{2n-1}⟩. The input edges are labelled by {âi, b̂i | i ∈ [0, n)} and the output edges are labelled by {ôi | i ∈ [0, 2n)}. The two sequences ~a and ~b enter the network according to those labels; that is, the key ai enters the edge labelled âi and the key bi enters the edge labelled b̂i. Similarly, the output labels denote how to assemble the resulting keys into a single sequence. In all the networks studied in this work the labels on the input edges are non-repeating; that is, distinct input edges have distinct labels; the same holds for the output labels. This enables us to name the input/output edges by their labels and we usually do so.

The width of a network is the number of its input edges. Clearly, this equals the number of its output edges. The depth of an edge or a vertex x is the number of comparators along the longest path from any input vertex to x. The depth of a network is the depth of its deepest edge. Note that a network may have no comparators at all and in this case its depth is zero.

In our drawings of networks the type of an edge is depicted by the form of its arrowhead; namely, a hollow arrowhead depicts a min edge, a solid arrowhead depicts a max edge and an open arrowhead depicts an untyped (input) edge; input and output vertices are omitted and the labels of the corresponding input and output edges are written instead of them. Figure 2.1 is a drawing of two merging networks of width four and of depth two.

The keys that the networks process are members of an infinite ordered set called K. The order type of K and the identity of its members are usually irrelevant to our studies. However, it is sometimes convenient to assume that K is the set of the natural numbers.

Informally, a vector (of keys) is an arrangement of keys having a certain structure. For example, a vector composed of a pair of sequences of the same given length can be an input of a given merging network; a vector composed of a single sequence is the output of this network. Formally, a vector is a function v : D → K where D is a finite subset of L (the set of labels). The width of the vector is the size of D. When D is the set of (labels of) input edges of a network N, we say that v conforms to N or that v is an input vector of N. Only in this case is it meaningful to apply v to N. This is done by transmitting the key v(e) on each input edge e.
Figure 2.1: Two merging networks of width four and of depth two. The network (b) has one degenerate comparator.
(Recall that we name input/output edges by their labels.) Note that the fact that v is an input vector of a certain network implies nothing about the values of the keys in v; it only specifies how these keys are structured. We usually refer to a vector in an indirect manner; for example, when ~a and ~b are two sequences of the same length n, we denote by v ≜ ⟨~a, ~b⟩ the vector composed of these two sequences. Such a vector is called a bisequenced vector. Formally, it is a function as above whose domain is D = {âi, b̂i | i ∈ [0, n)} and is defined by v(âi) = ai and v(b̂i) = bi. When an input vector v is applied to a network N, the network produces an output vector which is formally a function from the set of output labels into K. We denote this output vector by T^N(v) and refer to T^N as the input/output transformation of N.

Following mathematical logic, the concept of an isomorphism of networks encompasses all aspects of a network that are relevant to our studies. Namely, an isomorphism π of a network N1 onto a network N2 is a one-to-one mapping of the vertices and edges of N1 onto the vertices and edges of N2. This mapping is required, of course, to preserve the connectivity of vertices and edges; i.e., if an edge e enters (exits) the vertex x in N1, then π(e) enters (exits) the vertex π(x) in N2; this mapping is also required to preserve the labels of the input/output edges and the min/max type of the edges. Usually, we do not distinguish between isomorphic networks and consider them identical. Clearly, a network can be drawn in many different ways. We do not study drawings of networks and so isomorphism does not need to preserve this aspect of a network.

A vector is non-repeating if it is one-to-one. Let v be a non-repeating input vector of a network N. Under v, a key k of v traverses a path in N. In each comparator along this path, the key k encounters another key. The sequence of keys that k encounters is denoted ~s^N(v, k). The next lemma shows that if those sequences of some input vector are equal in two networks then these networks are isomorphic.
For N, v and k as above, let act(N, v, k) ≜ ⟨k, l0, ~s^N(v, k), l∞⟩ where l0 and l∞ are the labels of the input edge and output edge that k traverses. The syndrome of v in N is the set synd(N, v) ≜ {act(N, v, k) | k appears in v}. A syndrome of N is a set synd(N, v), for some non-repeating input vector v of N. Note that, in the above definitions, there is no requirement on the values of the keys of v except for the non-repeating requirement; for example, in the case of a merging network, the two sequences in question are not required to be sorted.

Theorem 2.0.1. Two networks are isomorphic iff they have a common syndrome.

Proof. The left to right direction is immediate. Consider the other direction and let S be a common syndrome of two networks N1 and N2. Let S = synd(N1, v). Clearly, a syndrome encodes the input vector that generates it; hence, S = synd(N2, v). We show, by induction on the number of comparators in N1, that N1 and N2 are isomorphic. The case where N1 has no comparators is trivial, so assume N1 has at least one comparator. In this case, N1 has a comparator c1 of depth one. (The two incoming edges of c1 are input edges.) Let the edges d'1 and d''1 enter the comparator c1, let l' and l'' be their labels, let k' and k'' be the two keys traversing those edges and assume, without loss of generality, that k'' > k'. Let e'1 and e''1 be the min and max edges exiting c1, respectively.

The fact that synd(N1, v) = synd(N2, v) implies that the same scenario happens in N2. Namely, two input edges, d'2 and d''2, labelled by l' and l'', enter a comparator c2 and (under v) carry the keys k' and k'', respectively; let e'2 and e''2 be the min and max edges exiting c2. Let N̄1 be the network generated from N1 by:

1. Removing c1, d'1 and d''1.

2. Assigning to e'1 and e''1 the labels l' and l'', respectively.

In the same manner, N̄2 is generated from N2. Clearly, v is an input vector of N̄1 and N̄2; furthermore, synd(N̄1, v) is derived from synd(N1, v) by removing the first element of ~s^N1(v, k') and of ~s^N1(v, k''). The same holds for synd(N̄2, v) and so synd(N̄1, v) = synd(N̄2, v). By the induction hypothesis, there is an isomorphism ḡ of N̄1 onto N̄2.

Let g : N1 → N2 be the extension of ḡ defined by g(c1) = c2, g(d'1) = d'2 and g(d''1) = d''2. Clearly, g preserves the edges/vertices connectivity; moreover, g preserves the input/output labels and the types of the edges; that is, g is an isomorphism of N1 onto N2.
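The syndrome is easy to compute by simulation. The sketch below is ours; it uses the simplified wire encoding in which a comparator (i, j) places the minimum on line i and output ôk is read off line k at the end. The two comparator lists describe the same width-four network in two different orders, so their syndromes coincide, as Theorem 2.0.1 predicts for isomorphic networks.

    def run(comps, labels_in, keys):
        """Simulate a network in the wire encoding on a non-repeating input vector.
        labels_in[i] is the label of the input edge of line i.  Returns the final
        line contents (output ok is the key on line k) and the syndrome."""
        lines = list(keys)
        enc = {k: [] for k in keys}            # the encounter sequence of each key
        for i, j in comps:
            x, y = lines[i], lines[j]
            enc[x].append(y)
            enc[y].append(x)
            lines[i], lines[j] = min(x, y), max(x, y)
        synd = {(keys[i], labels_in[i], tuple(enc[keys[i]]), f"o{lines.index(keys[i])}")
                for i in range(len(keys))}
        return lines, synd

    labels = ['a0', 'a1', 'b0', 'b1']
    net1 = [(0, 2), (1, 3), (1, 2)]
    net2 = [(1, 3), (0, 2), (1, 2)]            # same network, comparators listed in another order
    out1, s1 = run(net1, labels, (1, 3, 2, 4))
    out2, s2 = run(net2, labels, (1, 3, 2, 4))
    assert s1 == s2                            # a common syndrome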
Chapter 3
Keys and Edges

This chapter studies the key values that are transmitted on the edges of a network under certain scenarios.

A network N is usually associated with a certain functionality. This functionality specifies a set of input vectors called valid vectors and specifies a condition that the resulting output vector T^N(v) should satisfy whenever v is a valid vector. Consider, for example, the functionality of merging. Recall that an input vector v of a merging network is a bisequenced vector. With respect to the functionality of merging, a vector v is valid if it is bisequenced and both ~a and ~b are sorted. We refer to such a vector as a bisorted vector. The merging functionality requires that, for any bisorted vector v, the outcome T^N(v) is sorted. This functionality, as any other functionality, does not imply anything about T^N(v) when v is a non-valid input vector of N.

Let v be an input vector and e an edge of a network N. We denote by V^N(e, v) the key transmitted on e when v is applied to N. Let v^N be the extension of v over all the edges of N defined by v^N(e) = V^N(e, v).

A key function is a function f : K → K. (Recall that K is the set of possible key values.) A monotonic key function is a key function f such that f(k1) ≤ f(k2) for any k1 ≤ k2. Recall that formally a vector is a mapping from a finite subset of L (the set of labels) into K. For a key function f and a vector v, define f(v) as the vector u that has the same domain as v and satisfies u(l) = f(v(l)). When f is monotonic, we say that u is a monotonic image of v. The following lemma is attributed by Knuth to W. G. Bouricius [10, pp. 224].

Lemma 3.0.2. For any network N and any monotonic key function f, the functions f and T^N commute; that is, f(T^N(v)) = T^N(f(v)) for every input vector v.

This lemma implies the well-known "0-1 principle" [10, pp. 224]. A generalization of this principle is presented in Section 13.1. We now present a lemma which is a more versatile version of Lemma 3.0.2. Let g, g' : D → K where D is an arbitrary set.
We say that g and g' agree on a pair of elements (x, y) of D if x and y can be named z1 and z2 such that g(z1) ≤ g(z2) and g'(z1) ≤ g'(z2). If g and g' agree on every pair of elements of D we say that g and g' agree. Note that the agree relation is symmetric and reflexive but is not transitive. For example, consider the three sequences ⟨0, 1⟩, ⟨1, 1⟩ and ⟨1, 0⟩. The first agrees with the second, which agrees with the third, but the first does not agree with the third. Note that if g' is a monotonic image of g then g and g' agree; however, the opposite direction does not hold. For example, the two sequences ⟨0, 0, 1⟩ and ⟨0, 1, 1⟩ agree but neither of them is a monotonic image of the other.

Lemma 3.0.3. Let v and u agree and be input vectors of a network N. Then v^N and u^N agree; in particular, T^N(v) and T^N(u) agree.

Proof. It suffices to show that for every edge x of N there is an input edge y of N such that v^N(x) = v^N(y) and u^N(x) = u^N(y). We prove this claim by induction on the depth of x, i.e., the number of comparators on the longest path to x. The case where x is an input edge is trivial. Let x exit a comparator c and let e1 and e2 be the incoming edges of c. Let y1 and y2 be the input edges provided by the induction hypothesis for the edges e1 and e2, respectively. The fact that v and u agree on the pair (y1, y2) implies that v^N and u^N agree on e1 and e2. Hence, there is an i ∈ {1, 2} such that v^N(x) = v^N(ei) = v^N(yi) and u^N(x) = u^N(ei) = u^N(yi).

For a set V of input vectors of a network N, define V^N(e, V) ≜ {V^N(e, v) | v ∈ V}. For a comparator c of N whose incoming edges are e1 and e2, define V^N(c, V) ≜ V^N(e1, V) ∪ V^N(e2, V). A vector v is a permutation if it is non-repeating and the range of v is an interval of integers starting at 0. For a network N, let P^N be the set of permutations which are input vectors of N. When the network in question is clear from context we omit it from the above notations and use the shortcuts V(e, v), V(e, V), V(c, V) and P. The following lemma was observed by Knuth [10, pp. 239, 639].

Lemma 3.0.4. For any edge e of a network N, V(e, P^N) is an interval.

We present a variant of this lemma regarding merging networks. A comparator c is degenerate with respect to a set of input vectors V if its incoming edges can be named e1 and e2 such that under any input vector v ∈ V, v^N(e1) ≤ v^N(e2). A comparator c of N is degenerate under a certain functionality if it is degenerate w.r.t. the valid input vectors of this functionality. When the functionality of the network is clear from context we just say that c is degenerate, without specifying this functionality. Let P^bs_2n denote the set of valid input vectors of merging networks of width 2n which are permutations. In other words, P^bs_2n is the set of bisorted permutations of width 2n. For an edge or a vertex x of a merging network M of width 2n define V(x) = V(x, P^bs_2n).
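The agree relation is simple to test mechanically. A small Python sketch (ours), together with the three sequences used above to show that the relation is not transitive:

    def agrees(g, h):
        """g and h (functions over the same finite domain, here dicts) agree if on
        no pair of positions do they impose strictly opposite orders."""
        dom = list(g)
        return all(not ((g[x] < g[y] and h[x] > h[y]) or (g[x] > g[y] and h[x] < h[y]))
                   for x in dom for y in dom)

    # The example from the text: <0,1> agrees with <1,1>, which agrees with <1,0>,
    # but <0,1> does not agree with <1,0>.
    u, v, w = {0: 0, 1: 1}, {0: 1, 1: 1}, {0: 1, 1: 0}
    assert agrees(u, v) and agrees(v, w) and not agrees(u, w)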
Lemma 3.0.5. Let M be a merging network of width 2n. Then:
(a) For any edge e of M, V(e) is an interval.
(b) Let e' and e'' enter a non-degenerate comparator c. Then V(e') ∩ V(e'') ≠ ∅ (hence, V(c) is an interval).

Proof. Consider statement (a). The network M can be extended into a sorting network S by preceding it with two sorting networks which generate the two input sequences of M. When the input to S is a permutation then the input to M is a bisorted permutation, and any bisorted permutation is generated that way; hence, for any edge e of M, V^M(e, P^bs_2n) = V^S(e, P^S). This and Lemma 3.0.4 imply statement (a).

Consider statement (b). Any valid vector is a monotonic image of some valid permutation. This and Lemma 3.0.2 imply that if a comparator is non-degenerate w.r.t. the set of all valid vectors then it is non-degenerate w.r.t. the set of valid permutations. Therefore, statement (b) follows from statement (a).

Two edges e1 and e2 of a network N are called disagreeable iff there are two valid input vectors of N, v and u, such that v^N(e1) > v^N(e2) and u^N(e1) < u^N(e2); that is, u and v do not agree on the pair e1 and e2. By Lemma 3.0.5(a), if e1 and e2 are disagreeable edges of a merging network M then V(e1) and V(e2) are non-disjoint intervals. This implies:

Lemma 3.0.6. Let e1 and e2 be two disagreeable edges of a merging network M. Then there is an output edge which is reachable both from e1 and from e2.

Two keys, k' and k'', are adjacent in a bisorted vector v = ⟨~a, ~b⟩ iff they are distinct, each of them appears exactly once in v, one of them appears in ~a and the other appears in ~b, and v contains no key which is strictly between k' and k''.

Lemma 3.0.7. Let v be an input vector of a merging network M and let k' and k'' be two adjacent keys in v. Then, under v, k' and k'' encounter each other.

Proof. Let u be the input vector of M derived from v by swapping the keys k' and k''. If k' and k'' do not encounter each other then, under u, k'' traverses the same path k' traverses under v. Therefore, M does not sort one of these input vectors. This contradicts the fact that both v and u are valid.
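Statement (a) of Lemma 3.0.5 can be confirmed exhaustively on small networks. The sketch below is ours; it uses the wire encoding (a comparator (i, j) puts the minimum on line i) and identifies an incoming edge by the pair (comparator number, line). It enumerates all bisorted permutations of width 4 and verifies that every V(e) is an interval for a width-four Batcher merging network.

    from itertools import combinations

    def bisorted_perms(n):
        """All bisorted permutations of width 2n: choose which n values form the a-run."""
        for a_vals in combinations(range(2 * n), n):
            yield list(a_vals) + [x for x in range(2 * n) if x not in a_vals]

    def incoming_edge_values(comps, vectors):
        """V(e) for the incoming edges of every comparator; edge (t, i) is the value
        on line i just before comparator number t fires."""
        vals = {}
        for v in vectors:
            lines = list(v)
            for t, (i, j) in enumerate(comps):
                vals.setdefault((t, i), set()).add(lines[i])
                vals.setdefault((t, j), set()).add(lines[j])
                lines[i], lines[j] = min(lines[i], lines[j]), max(lines[i], lines[j])
        return vals

    comps4 = [(0, 2), (1, 3), (1, 2)]          # lines 0,1 carry ~a, lines 2,3 carry ~b
    for values in incoming_edge_values(comps4, bisorted_perms(2)).values():
        assert sorted(values) == list(range(min(values), max(values) + 1))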
Chapter 4
The bypass transformation

This chapter presents several transformations of a network that produce a smaller network by removing some comparators. This proves helpful for producing networks of a certain functionality out of larger ones, for removing degenerate comparators and for analyzing AMOP merging networks.

The most elementary transformation, called the bypass transformation, bypasses a given comparator c as follows. The comparator c is removed and each incoming edge of c is merged with a distinct outgoing edge of c; the min/max type of the resulting edge is that of the incoming edge it replaces. A comparator can be bypassed in two different manners, as depicted in Figure 4.1. Note that the bypass transformation does not change the width of the network or its input and output labels.
Figure 4.1: Networks (b) and (c) are derived from network (a) by bypassing the comparator c.

The concept of a minor network is a generalization of the bypass transformation. Namely, a network N' is a minor of a network N if N', or a network isomorphic to N', is derived from N by a sequence of bypass transformations. Sometimes we need to keep track of the association of the members of N' with those of N. To this end, for
two networks N and N', we say that the network N' is a minor of the network N via the embedding σ if σ is an embedding of N' into N having the following properties:

1. The embedding σ maps the input/output edges of N' onto the input/output edges of N in a one-to-one manner.

2. The congestion of any edge e ∈ N is exactly one: there is a unique edge e' ∈ N' such that the path σ(e') passes through e. For any edge e' ∈ N', let σ(e')1 and σ(e')∞ denote the first and last edges of the path σ(e'). As discussed shortly, σ(e') has at least one edge; hence σ(e')1 and σ(e')∞ are always defined.

3. For any input (output) edge e' ∈ N', the two edges e' and σ(e')1 (e' and σ(e')∞) have the same label.

4. For any non-input edge e' ∈ N', e' and σ(e')1 have the same min/max type.

We refer to the unique edge provided by requirement (2) as σ⁻¹(e). Since N and N' are networks, the incoming degree and the outgoing degree of any internal vertex is exactly two. This fact and requirement (2) imply that the load of any vertex of N is at most one: at most one vertex is mapped to it. By requirement (1), the load of input and output vertices is exactly one. As shown shortly, the comparators of N with zero load correspond to those comparators which were bypassed in the construction of the minor. The above fact that the load is at most one implies that the dilation of any edge e' ∈ N' is at least one: the path σ(e') has at least one edge.

The following lemma shows that the above two definitions of a minor, one via a sequence of bypasses and the other via an embedding, are equivalent.

Lemma 4.0.8. A network N' is a minor of a network N iff N' is a minor of N via some embedding σ.

Proof. Consider the left to right implication. Clearly, if N' is derived from N by a (single) bypass transformation then it is a minor of N via some embedding σ. With the obvious definition of composition of embeddings, if N' is a minor of N via the embedding σ' and N'' is a minor of N' via the embedding σ'', then N'' is a minor of N via the composition of σ' and σ'', denoted σ' ∘ σ''. This implies the left to right direction of the lemma.

The right to left implication is proven by induction on the number k of comparators of N whose load is zero. If k = 0 then σ is an isomorphism. Assume k > 0. There exist a network N* and two embeddings σ1 : N' → N* and σ2 : N* → N such that:
1) σ = σ2 ∘ σ1.
2) N* is derived from N by a (single) bypass transformation.
3) N* has k − 1 comparators of zero load under σ1.
The induction hypothesis implies that a network isomorphic to N' is derived from N* by a sequence of bypass transformations; hence a network isomorphic to N' is derived from N by a sequence of bypass transformations.

The following two lemmas are straightforward.

Lemma 4.0.9. Let N' be a minor of a network N via an embedding σ, let e1 and e2 be two edges of N and let N' have a path from σ⁻¹(e1) to σ⁻¹(e2). Then N has a path from e1 to e2.

Recall that a network is AMOP if it has at most one path from every input edge to every output edge.

Lemma 4.0.10. Any minor of an AMOP network is AMOP.
4.1  Bypass charted by a vector
Recall that a comparator can be bypassed in two different manners. This section shows how to use an input vector v to chart (specify) how to bypass a comparator, or a set of comparators. This bypassing does not disturb the functionality of the network w.r.t. v or w.r.t. any input vector that agrees with the charting vector v.

To this end, an input vector v of a network N is called decisive for a comparator c if two different keys enter c when v is applied to N. Let v be an input vector of a network N which is decisive for a comparator c and let e1 and e2 be the two incoming edges of c such that V(e1, v) < V(e2, v). The v-charted bypassing of c is the bypassing of c in which e1 is merged with the outgoing min edge of c and e2 is merged with the outgoing max edge of c. We refer to the resulting network as ℘(N, c, v).

Let C be a set of comparators in N. If v is decisive for every member of C we say that v is decisive for C. For such N, C and v, let Π(N, C, v) denote the set of the paths p of N having the following properties: p has at least two vertices; the first and last vertices of p are not members of C; all the other vertices of p are members of C; and under v the same key traverses all the edges of p. Note that Π(N, C, v) covers all the edges of N and each edge is covered exactly once; moreover, if c is an internal vertex of a path p ∈ Π(N, C, v) then there is exactly one more path p' ∈ Π(N, C, v) such that c is an internal vertex of p'.

Let v be an input vector of a network N decisive for a set of comparators C. The N minus C v-charted minor, denoted ℘(N, C, v), is the network N' defined as follows:

• The vertices of N' are the vertices of N except those which are members of C.
• The edges of N' are the members of Π(N, C, v). Let p ∈ Π(N, C, v) lead from x to y in N. Then, in the context of N', p is an edge from x to y. The min/max type of p in N' is the min/max type of the first edge of p in N.

• If p is an input (output) edge of N' then its label is the label of the first (last) edge of the path p in N.

It is not hard to establish that N' is a network; to this end, the following items should be checked:

1. N' is acyclic.

2. N' has three types of vertices. The input vertices have in-degree zero and out-degree one; the output vertices have in-degree one and out-degree zero; the internal vertices have in-degree and out-degree two. These internal vertices function as comparators.

3. Of the two edges exiting a comparator of N', one is a min edge and the other is a max edge.

It is not hard to check that N' is a minor of N via the embedding σ(N, C, v) which is the identity function over the vertices and the edges of N'. (Note that an edge p of N' is a path of N.)

We remind the reader that two functions g and g' agree on a pair of elements (x, y) if both g and g' are defined over x and y and if x and y can be named z1 and z2 such that g(z1) ≤ g(z2) and g'(z1) ≤ g'(z2). We henceforth use the term 'agree' in an additional manner, as follows: Let v and u be two input vectors of a network N and let e1 and e2 be the incoming edges of a comparator c of N. We say that v and u agree on c if v^N and u^N agree on the pair (e1, e2). Recall that v^N and u^N denote the natural extensions of the input vectors v and u over all the edges of N. Recall that for an input vector v of a network N, T^N(v) is the output vector generated by applying v to N. The following lemma is straightforward.

Lemma 4.1.1. Let v be an input vector of a network N, decisive for a set of comparators C; let N' = ℘(N, C, v) and σ = σ(N, C, v); let u agree with v on each comparator of C. Then:
a) If u is decisive for C then ℘(N, C, u) = ℘(N, C, v) and σ(N, C, u) = σ(N, C, v).
b) u^N(e) = u^{N'}(σ⁻¹(e)) for any edge e of N.
c) T^N(u) = T^{N'}(u).
Bypassing degenerate comparators. The v-charted bypassing transformation enables us to remove all the degenerate comparators of a network without disturbing its functionality, as follows. Let N be a network having a certain functionality, let V be the set of its valid vectors and let C be the set of the comparators of N that are degenerate under V. Let v ∈ V be a non-repeating input vector.¹ Since v is non-repeating, it is decisive for all of the comparators of N and so ℘(N, C, v) is well defined. Any two input vectors of V agree on any comparator of C; hence, by Lemma 4.1.1, ℘(N, C, v) is independent of v (as long as v is a non-repeating member of V). Define ℘(N, C, V) ≜ ℘(N, C, v). Lemma 4.1.1 implies the following lemma:

Lemma 4.1.2. Let C be the set of comparators of a network N which are degenerate under a set of input vectors V. Then:
a) The input/output transformations of N and ℘(N, C, V) are identical over the members of V; that is, T^N(v) = T^{℘(N,C,V)}(v) for every v ∈ V.
b) The network ℘(N, C, V) has no degenerate comparators w.r.t. V.

Usually, the functionality of a network, and therefore its valid vectors, are clear from the context. In this case, where V is the set of valid vectors and C is the set of comparators of N degenerate under V, we simply denote the network ℘(N, C, V) by undeg(N).
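For merging networks, the degenerate comparators can be found by simulation. The sketch below is ours and again uses the wire encoding. It checks each comparator against all bisorted 0-1 input vectors; this suffices because a strict disorder at a comparator under some bisorted vector survives a monotonic threshold image, by the argument used in the proof of Lemma 3.0.5(b). The example network is hypothetical: a width-four merger padded with one redundant comparator.

    from itertools import product

    def bisorted_01(n):
        """All bisorted 0-1 vectors of width 2n (each run nondecreasing)."""
        for za, zb in product(range(n + 1), repeat=2):       # zeroes in each run
            yield [0] * za + [1] * (n - za) + [0] * zb + [1] * (n - zb)

    def degenerate(comps, n):
        """Indices of comparators that never see two differently ordered inputs
        over all bisorted 0-1 vectors (a proxy for all valid merging inputs)."""
        seen = {t: set() for t in range(len(comps))}
        for v in bisorted_01(n):
            lines = list(v)
            for t, (i, j) in enumerate(comps):
                x, y = lines[i], lines[j]
                if x < y: seen[t].add('<')
                if x > y: seen[t].add('>')
                lines[i], lines[j] = min(x, y), max(x, y)
        return [t for t, s in seen.items() if len(s) < 2]

    def undeg(comps, n):
        """Remove the degenerate comparators; in the wire encoding a charted bypass
        of a comparator that never swaps is simply its deletion."""
        dead = set(degenerate(comps, n))
        return [c for t, c in enumerate(comps) if t not in dead]

    # Hypothetical example: a width-4 merging network plus one redundant comparator.
    net_b = [(0, 2), (1, 3), (1, 2), (2, 3)]
    assert undeg(net_b, 2) == [(0, 2), (1, 3), (1, 2)]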
4.2  Splitting of a network
This section presents a transformation of a given network into a disjoint sum of several smaller networks. To this end, we present a special minor charted by a given input vector v. The number of disjoint components of this minor equals the number of different keys in v. We usually apply this transformation with input vectors having a small number of different keys, either two or three.

A comparator c of a network N is mixed under an input vector v if two different keys enter it under v (i.e., v is decisive for c). Let mix(N, v) denote the set of comparators of N that are mixed under v. The split of N by v, denoted split(N, v), is the network split(N, v) ≜ ℘(N, mix(N, v), v). It is easy to see that split(N, v) is composed of several disjoint components; namely, for each different key k that appears in v there is a disjoint component in which all the edges carry the key k under v.

Next we show two applications of the split transformation that transform any merging network into a smaller network having a meaningful functionality.

¹ We implicitly assume that V has such a vector.
4.2.1  Producing half merging networks
Our first application transforms any merging network into a network of a weaker functionality: a network that merges only a subset of the bisorted sequences. A bisequenced vector ⟨~a, ~b⟩ of width 4n is halved if ai, bi ≤ aj, bj whenever i < n ≤ j. A network is a half-merging network if it sorts all vectors which are both bisorted and halved.

For the next lemma we need the following terminology. A network N is normalized if for every letter α, the two sets {i ∈ ℕ | α̂i is a label of an input edge of N} and {i ∈ ℕ | α̂i is a label of an output edge of N} are initial intervals of ℕ (which could be empty). A network N' is the normalized variant of a network N if N' is normalized and is derived from N by simply replacing the labels of N via some monotonic function over the label indices. In other words, let e be an input (output) edge of N labelled α̂i and assume there are k other input (output) edges of N labelled by α̂j for some j < i. Then the edge e is labelled by α̂k in N'. The following lemma is straightforward.

Lemma 4.2.1. Any non-degenerate half-merging network of width 4n is composed of two disjoint networks, each of width 2n. One of these networks is a non-degenerate merging network and the normalized variant of the other is a non-degenerate merging network.

Let ζ^4n be the unique halved bisequenced 0-1 vector of width 4n having exactly 2n zeroes. (Note that ζ^4n has exactly 2n ones and is bisorted.) When the width of ζ^4n is clear from context, we omit the superscript 4n. We say that a comparator c of a merging network M is mixed if it is mixed under ζ. For a merging network M of width 4n, we define split(M) ≜ split(M, ζ^4n). An example of such a splitting is depicted in Figure 4.2. The following lemma is straightforward.

Lemma 4.2.2. A bisequenced vector of width 4n is halved iff it agrees with ζ^4n.

Lemma 4.2.3. Let c be a non-degenerate comparator of a merging network M of width 2n which is mixed. Then n ∈ V^M(c).

Proof. Let v ∈ P^bs_2n be halved and let h : K → {0, 1} be the monotonic key function defined by h(k) = 0 for k < n and h(k) = 1 otherwise.

... k > 1 and B = B(r, B1, B2, Z). The cleaver r is determined by the even/odd parity of imf^B(0). Clearly, imf^B and r determine imf^B1 and imf^B2. This, by the induction hypothesis, determines B1 and B2. By Lemma 5.0.5, r determines Z.

The following lemma states that if a merging network M is constructed from two networks M' and M'' of arbitrary functionalities and widths in a manner similar to the Batcher construction, then M' and M'' must be merging networks of the same width and the construction is actually a Batcher construction. Two edges e1 and e2 of a network N are called disagreeable iff there are two valid input vectors of N, v and u, such that v^N(e1) > v^N(e2) and u^N(e1) < u^N(e2). A sandwich is a bisorted vector ⟨~a, ~b⟩ that can be made sorted by sandwiching the entire ~a sequence between an initial part of ~b and the rest of ~b; this also includes the case where the initial part of ~b or the rest of ~b is empty.

Lemma 5.0.10. Let n ∈ ℕ and let M be a merging network of width 4n of the following form:
• The network M is a concatenation of a network N followed by a depth-one network Z.
• The network N is a disjoint sum of two networks M' and M''.
• No comparator of Z is degenerate in M.
• The input edge â0 of M enters M1.
Then M = B(r, M1, M2, Z) for some r, M1, M2 and Z, where M1 and M2 are derived from M' and M'' by a relabelling that normalizes the input edges.

Proof.
We first show that the input vector v = ⟨~a, ~b⟩ of M must be partitioned between M' and M'' by one of the two Batcher cleavers; that is, even(~a) and odd(~a) go to different subnetworks, and the same holds for ~b. Assume, for a contradiction, that two consecutive keys ai and ai+1, for some i ∈ [0, 2n − 1), enter the same subnetwork, say M', and let bj enter M''. It is not hard to see that such a bj exists. Clearly, there is a bisorted vector v in which ai and bj are adjacent and bj and ai+1 are adjacent. By Lemma 3.0.7, under v, bj encounters both ai and ai+1. Since M' and M'' are disjoint, these two encounters occur in Z. This contradicts the fact that Z is of depth one. By symmetry, even(~b) and odd(~b) enter different subnetworks. This implies that the input vector of M is partitioned by some Batcher cleaver.
Next, we show that two merging networks M1 and M2 can be derived from M' and M'', respectively, by assigning them output labels and normalizing their input labels. By symmetry, we show it only for M1. We start with a network M̄1 derived from M' by assigning it temporary arbitrary non-repeating output labels and normalizing its input labels. Next, we show that, under the set of bisorted input vectors of M̄1, no two output edges of M̄1 are disagreeable. Assume, for a contradiction, that e' and e'' are two disagreeable output edges of M̄1. Hence, there are two valid input vectors v and u of M̄1 such that v(e') > v(e'') and u(e') < u(e''). Clearly 2v and 2u are valid input vectors of M; furthermore, e' and e'' are disagreeable in M under 2v and 2u. Since M is a merging network, by Lemma 3.0.6, e' and e'' enter the same comparator of Z.

Consider any comparator c of Z. Since c is non-degenerate in M, V^M(c) = [k, k+1] for some k ∈ [0, 4n − 1). It is not hard to see that there is a valid permutation input vector of M (in fact, a sandwich vector) in which k and k + 1 belong to ~a and are consecutive there. Therefore, under any Batcher cleaver used, one key of {k, k + 1} goes to M' and the other goes to M''. This implies that c receives one edge from M' and the other from M''. This contradicts the assumption that M̄1 has disagreeable output edges; hence, a merging network M1 can be derived from M̄1 by relabelling its output edges.
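The term "Batcher cleaver" refers to the two even/odd ways, described in Chapter 1, of routing a bisorted input to the two recursive subnetworks. A small sketch of the two cleavers follows; it is ours, and the names r1 and r2 are not the report's notation.

    def batcher_cleavers(a, b):
        """The two even/odd ways of routing the bisorted input <a, b> to the two
        recursive subnetworks: matching parities (Batcher's odd/even merge) or
        crossed parities (the variant mentioned by Leighton)."""
        r1 = ((a[0::2], b[0::2]), (a[1::2], b[1::2]))   # even with even, odd with odd
        r2 = ((a[0::2], b[1::2]), (a[1::2], b[0::2]))   # even with odd, odd with even
        return r1, r2

    r1, r2 = batcher_cleavers([0, 2, 5, 7], [1, 3, 4, 6])
    assert r1 == (([0, 5], [1, 4]), ([2, 7], [3, 6]))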
Chapter 6
Congruent functions

This chapter characterizes, via certain equivalence relations, the input matching functions of the Batcher merging networks. It shows that for any given width, these functions constitute a group.

Let ℕ denote the set of natural numbers including zero. For X, Y ⊂ ℕ, X is a congruent class of Y if X is an equivalence class of Y under the equivalence relation "p ≡ q (mod 2^j)", for some j ∈ ℕ. A set X is a congruent class if it is a congruent class of the interval [0, 2^k) for some k ∈ ℕ. The following two lemmas are straightforward:

Lemma 6.0.11. The relation "X is a congruent class of Y" is transitive.

Lemma 6.0.12. Let Y ⊂ ℕ and |Y| > 1. Then there is exactly one unordered pair of non-empty sets (X1, X2) such that X1 ∩ X2 = ∅, Y = X1 ∪ X2 and X1 and X2 are congruent classes of Y. Furthermore, if Y is a congruent class then X1 and X2 are congruent classes and |X1| = |X2|.

We refer to the unordered pair (X1, X2), provided by the last lemma, as the congruent partition of Y.

The concept of congruent classes is relevant to the Batcher merging networks as follows. Let X, Y ⊂ ℕ be such that |X| = |Y|. We denote by v^{X,Y} the bisorted vector v^{X,Y} = ⟨~x, ~y⟩ where the images of the sorted sequences ~x and ~y are X and Y, respectively. A vector is congruent if it is of the form v^{X,Y} where X and Y are congruent classes. Batcher merging networks are constructed recursively from smaller Batcher merging networks. By the following lemma, when a Batcher merging network receives a congruent vector, all these smaller Batcher merging networks receive congruent vectors.
Lemma 6.0.13. Let X, Y ⊂ ℕ be congruent classes, let |X| = |Y| > 1, let v = v^{X,Y}, let r be a Batcher cleaver and let r1(v) = v^{X1,Y1} and r2(v) = v^{X2,Y2}. Then (X1, X2) is a congruent partition of X and (Y1, Y2) is a congruent partition of Y.

Lemma 6.0.14. Let (X1, X2) be the congruent partition of a set Y, let X be a congruent class of Y and X ≠ Y. Then X is a congruent class of either X1 or X2.

Let X, Y ⊂ ℕ. A congruent function from X onto Y is a bijection from X onto Y under which the image of any congruent class of X is a congruent class of Y; we use the notation f : X →cong Y to denote that f is a congruent function from X onto Y. A function is a congruent function if it is a congruent function from [0, 2^j) onto [0, 2^j) for some j ∈ ℕ.

Lemma 6.0.15. For any j ∈ ℕ, the set of congruent functions from [0, 2^j) onto [0, 2^j) is a group under the composition operator.

Proof. It follows from the definition that for any two such functions f and g, f ∘ g is a congruent function and f⁻¹ is a congruent function.
The following lemma presents several congruent functions.

Lemma 6.0.16. Let j, k ∈ ℕ. Then:
a) The permutation x ↦ x + k (mod 2^j) of [0, 2^j) is a congruent function.
b) The order reversing¹ permutation of [0, 2^j) is a congruent function.

Proof. Both statements follow from the fact that for every l, the relation "x ≡ y (mod 2^l)" is invariant under the permutations in those statements.

The following lemma provides a recursive characterization of congruent functions.

Lemma 6.0.17. Let X, Y ⊂ ℕ, |X| = |Y| > 1. Then f is a congruent function from X onto Y iff there are a congruent partition (X1, X2) of X, a congruent partition (Y1, Y2) of Y and two congruent functions f1 : X1 →cong Y1 and f2 : X2 →cong Y2 such that f = f1 ∪ f2.
¹ The permutation x ↦ 2^j − 1 − x.
Proof. We first prove the left to right implication. Let (X1, X2) be the congruent partition of X provided by Lemma 6.0.12. Let Y1 and Y2 be the images of X1 and X2 under f, respectively. Since f is a congruent function, (Y1, Y2) is a congruent partition of Y. For i ∈ {1, 2}, let fi be the function f restricted to the set Xi. Clearly, f = f1 ∪ f2. By Lemma 6.0.14, any congruent class of Xi is a congruent class of X; hence, the image of any congruent class of Xi is a congruent class of Yi, establishing that f1 and f2 are congruent functions. The right to left implication follows from Lemmas 6.0.14 and 6.0.11.

Lemma 6.0.17 enables us to compute the number of congruent functions from a congruent class X onto a congruent class Y. Let Π(X, Y) denote the number of X to Y congruent functions. Lemma 6.0.17 implies that Π(X, Y) = Π(X1, Y1) · Π(X2, Y2) + Π(X1, Y2) · Π(X2, Y1). A simple induction shows that if X and Y are congruent classes then Π(X, Y) depends only on |X| and |Y|. If |X| ≠ |Y| then Π(X, Y) = 0. For the other case define Π(n) = Π(X, Y) for |X| = |Y| = n. Clearly Π(1) = 1 and, by Lemma 6.0.17, Π(2n) = 2 · (Π(n))². The solution to this recursive equation is:

Lemma 6.0.18. There are exactly 2^{n−1} congruent functions from [0, n) onto [0, n) when n is a power of two.

Let X, Y ⊂ ℕ and let v^{X,Y} = ⟨~a, ~b⟩ be an input vector of a merging network M. Let f^M(v^{X,Y}) be the function f^M(v^{X,Y}) : X → X ∪ Y defined by f^M(v^{X,Y})(x) = y if y is the first key that the key x of the ~a sequence encounters under the input vector v^{X,Y}.

Lemma 6.0.19. A function is a congruent function iff it is the i.m.f. of some Batcher merging network.

Proof. Consider the right to left direction. We first claim that if X and Y are congruent classes and B is a Batcher merging network then f^B(v^{X,Y}) is a congruent function from X onto Y. This claim follows by induction and Lemmas 6.0.17 and 6.0.13. By Lemma 5.0.8, imf^B is total; and so, for X = Y = [0, n), imf^B = f^B(v^{X,Y}). Therefore, imf^B is a congruent function. The left to right direction follows from Lemmas 5.0.9, 6.0.18 and 5.0.7.
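These closure and counting facts are easy to confirm by brute force for small widths. The Python sketch below is ours; it enumerates the congruent classes of [0, 2^j), tests whether a permutation is a congruent function (in the sense of the recursive characterization of Lemma 6.0.17), checks the two families of Lemma 6.0.16 and recovers the count 2^{n−1} of Lemma 6.0.18 for n = 4.

    from itertools import permutations

    def congruent_classes(j):
        """All congruent classes of [0, 2**j): the residue classes mod 2**l, l = 0..j."""
        return {frozenset(range(r, 2 ** j, 2 ** l))
                for l in range(j + 1) for r in range(2 ** l)}

    def is_congruent(f, j):
        """Is the permutation f of [0, 2**j) a congruent function?  Every congruent
        class of [0, 2**j) must map onto a congruent class of [0, 2**j)."""
        classes = congruent_classes(j)
        return all(frozenset(f[x] for x in c) in classes for c in classes)

    # Lemma 6.0.16: rotations and order reversal are congruent functions.
    j = 3
    assert is_congruent([(x + 3) % 2 ** j for x in range(2 ** j)], j)
    assert is_congruent([2 ** j - 1 - x for x in range(2 ** j)], j)

    # Lemma 6.0.18: exactly 2**(n-1) congruent functions on [0, n) for n = 4.
    assert sum(is_congruent(list(p), 2) for p in permutations(range(4))) == 2 ** (4 - 1)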
Chapter 7
The Input Cone

This chapter studies a certain subnetwork of compact merging networks. The input cone of an edge e of a network N, denoted IC^N(e), is the subgraph of N composed of the vertices and edges having a path to e. (This subgraph includes e.) This chapter takes special interest in the cone entering the edge ôn of a merging network (say M) of width 2n and denotes this cone by IC^M. For an edge e of a network N, the output cone of e, denoted OC^N(e), is the subgraph of N composed of the vertices and edges that can be reached from e.

Let V be a set of input vectors of a network N and let e be an edge of N. A key k is obliged to e under V iff V(e, v) = k for any v ∈ V under which k enters an input edge of IC^N(e). As with other notations, when the network referred to is clear from context we omit it from the above notations and use the shortcuts IC(e), OC(e) and IC. The following lemma follows directly from the definition of the AMOP property.

Lemma 7.0.20. Let e be an edge of an AMOP merging network and let ôi ∈ OC(e). Then i is obliged to e under P^bs.

Let e1 and e2 be the incoming edges of a non-degenerate comparator c of a network N having a certain functionality. Then there are two valid vectors of N that do not agree on the pair (e1, e2); that is, these vectors can be named v' and v'' such that v'^N(e1) > v'^N(e2) and v''^N(e1) < v''^N(e2). In this case we say that the unordered pair (v', v'') establishes that c is non-degenerate.

Lemma 7.0.21. Let M be an AMOP merging network and let c be a comparator of IC^M. Then c is non-degenerate.
Proof. Let the width of M be 2n. Let d1 and d2 be the two incoming edges of c and let e be an outgoing edge of c where e ∈ IC^M. By symmetry, it suffices to consider only the case where e is a min edge. Since the width of M is 2n, for each input edge d there is a valid vector v s.t. v^M(d) = n. Let v1, v2 ∈ P^bs be two input vectors such that under vi the key n enters an input edge of IC^M(di), for i ∈ {1, 2}. By Lemma 7.0.20, n is obliged to d1, d2 and e; therefore n = V(e, vi) = V(di, vi) for i ∈ {1, 2}. Hence, V(d1, v1) < V(d2, v1) and V(d1, v2) > V(d2, v2). That is, the unordered pair (v1, v2) establishes that c is non-degenerate.

For n and m, both powers of two, define an m-class of the interval [0, n) to be an equivalence class of [0, n) under the equivalence relation "p ≡ q (mod m)". By definition (Chapter 6), any m-class of [0, n) is a congruent class. Recall that a sandwich vector is a bisorted permutation in which the range of the ~a sequence is an interval. Clearly, there are exactly n + 1 different sandwich vectors of width 2n; furthermore, for a given width, a sandwich vector is determined by the value of any single element of the ~a sequence. Recall that a merging network is regular iff its width is a power of two and that O(e) ≜ {i | ôi ∈ OC(e)}, Ia(e) ≜ {i | âi ∈ IC(e)} and Ib(e) ≜ {i | b̂i ∈ IC(e)}.

Lemma 7.0.22. Let M be a regular AMOP merging network of width 2n. Then:
a) IC^M is a balanced binary tree that contains all the input edges of M.
b) Let e be a non-input edge of IC^M. Then Ia(e) and Ib(e) are congruent classes of [0, n).
c) Let C be a congruent class of [0, n). Then there is a unique non-input edge e ∈ IC^M with Ia(e) = C.
d) The function imf^M is total and is a congruent function over [0, n).

Proof. Consider statement (b). Due to symmetry, it suffices to prove this statement only for Ia(e). The proof is by induction on the distance from e to ôn. If e = ôn, then clearly Ia(e) = [0, n), which is a 1-class of [0, n). Assume e ≠ ôn. Let e and e' enter the same comparator c and let t' and t'' be the min and max outgoing edges of c, respectively. Either t' ∈ IC^M or t'' ∈ IC^M. These two cases are similar and we consider only the former. By the induction hypothesis, Ia(t') is an m-class of [0, n) for some m a power of two.

Assume, for a contradiction, that Ia(e) is not a 2m-class of [0, n). Then there is an edge e* ∈ {e, e'} and an integer i such that both i and i + m belong to Ia(e*). Let s be the sandwich input vector with V(âi, s) = n. Under this vector, the keys n and n + m enter IC(e*). By Lemma 7.0.20, n is obliged to e* and so V(e*, s) = n. This clearly implies that n + m is not obliged to e* under P^bs; therefore, by Lemma 7.0.20, ôn+m ∉ OC(e*).
Since s is a sandwich and I_a(t′) is an m-class of [0, n), under s no key in the interval (n, n + m) enters IC(t′) = IC(t″). This and V(t′, s) = n imply that V(t″, s) ≥ n + m. By Lemma 7.0.21, c is non-degenerate and so, by Lemma 3.0.5(b), V(c) = V(t′) ∪ V(t″) is an interval; therefore n + m ∈ V(c). This contradicts the above conclusion that ô_{n+m} ∉ OC(e*) and establishes statement (b).
Statements (a) and (c) follow immediately from statement (b) and the properties of congruent classes.
Consider statement (d). Statements (a) and (b) imply that imf^M is total. Any input matching function is one-to-one. It remains to show that, for every congruent class A, the image of A under imf^M is a congruent class. By statement (c), there exists a non-input edge e such that I_a(e) = A. By statement (b), I_b(e) is a congruent class of [0, n). That is, I_b(e) = imf^M(I_a(e)) is a congruent class.

The following lemma concerns isomorphism of the structures IC^{M′} and IC^{M″} of two networks M′ and M″. Note that by our definition (Chapter 2, page 7) these structures are not networks; however, the concept of isomorphism is clearly applicable to them. Namely, such an isomorphism has to preserve the edge/vertex connectivity as well as the input labels and the min/max types of the edges.

Lemma 7.0.23. Let M′ and M″ be two regular AMOP merging networks with imf^{M′} = imf^{M″}. Then IC^{M′} ≅ IC^{M″}.
Proof. We need to show a bijection from IC^{M′} to IC^{M″} which preserves the connectivity, the input labels and the min/max types of the edges. Define a mapping σ from the edges of IC^{M′} onto the edges of IC^{M″} by:
 σ(e′) = e″ if e′ and e″ are input edges having the same label;
 σ(e′) = e″ if e′ and e″ are not input edges and I_a(e′) = I_a(e″).
By Lemma 7.0.22(c), σ is well defined and is a bijection from the edges of IC^{M′} onto the edges of IC^{M″}. We next show that σ preserves the edge connectivity of IC^{M′}; i.e. if ⟨e1, e2⟩ is a path of length two of IC^{M′} then ⟨σ(e1), σ(e2)⟩ is a path of length two of IC^{M″}.
Let y be a non-input edge of IC^{M′}. By Lemma 7.0.22(b), I_a^{M′}(y) is a congruent class of [0, n). First assume that |I_a^{M′}(y)| = 1; that is, I_a^{M′}(y) is a singleton, say {i}. In this case, ⟨â_i, y⟩ and ⟨b̂_j, y⟩ are paths in IC^{M′}, for j = imf^{M′}(i). Since imf^{M′} = imf^{M″}, the same holds for σ(y), σ(â_i) and σ(b̂_j); that is, ⟨σ(â_i), σ(y)⟩ and ⟨σ(b̂_j), σ(y)⟩ are paths of IC^{M″}.
Assume now that |I_a^{M′}(y)| > 1, and let (Y1, Y2) be the unique congruent partition of I_a^{M′}(y) provided by Lemma 6.0.12. By Lemma 7.0.22(c) there are two unique edges
y1 and y2 s.t. I_a^{M′}(y1) = Y1 and I_a^{M′}(y2) = Y2. Clearly, ⟨y1, y⟩ and ⟨y2, y⟩ are paths in IC^{M′}. By definition, σ preserves the fact that (I_a^{M″}(σ(y1)), I_a^{M″}(σ(y2))) is the congruent partition of I_a^{M″}(σ(y)); hence, ⟨σ(y1), σ(y)⟩ and ⟨σ(y2), σ(y)⟩ are paths of IC^{M″}. Therefore, σ preserves the connectivity of IC^{M′}. This implies that σ can be extended over the vertices of IC^{M′} while preserving the edge/vertex connectivity.
By definition, σ preserves the input labels of IC^{M′}. It remains to show that σ preserves the min/max type of any edge e′ of IC^{M′}. This is shown by induction on the depth of IC(e′). The case where e′ is an input edge is trivial. Let e′ emerge from a comparator c and let e″ be the edge emerging from σ(c) and having the same type as e′. We show that σ(e′) = e″.
Let σ̄ be the mapping from IC^{M′}(e′) onto IC^{M″}(e″) such that σ̄ is identical to σ on all the vertices and edges but e′, and σ̄(e′) = e″. By the induction hypothesis, σ̄ is an isomorphism and so V^{M′}(e′) = V^{M″}(e″). The fact that n ∈ V^{M′}(e′) implies that n ∈ V^{M″}(e″) and therefore e″ ∈ IC^{M″}. Since σ preserves connectivity and since, by Lemma 7.0.22, IC^{M′} and IC^{M″} are trees, it follows that σ(e′) = e″.
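Two combinatorial notions used repeatedly in this chapter, the m-classes of [0, n) and the sandwich vectors of width 2n, are easy to enumerate. The small Python helpers below are an illustration only (not part of the report) and also confirm that there are exactly n + 1 sandwich vectors.

```python
def m_classes(n, m):
    """The m-classes of [0, n): residue classes mod m (n, m powers of two, m <= n)."""
    return [list(range(r, n, m)) for r in range(m)]

def sandwich_vectors(n):
    """The n + 1 sandwich vectors of width 2n: bisorted permutations of
    [0, 2n) whose a-half is an interval of keys."""
    out = []
    for k in range(n + 1):
        a = list(range(k, k + n))
        b = sorted(set(range(2 * n)) - set(a))
        out.append((a, b))
    return out

print(m_classes(8, 2))            # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(len(sandwich_vectors(4)))   # 5 == n + 1
```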
Chapter 8
The i.m.f. of a split

This chapter studies the effect of the split transformation on the i.m.f. of a compact merging network and shows that the resulting i.m.f. is uniquely determined by the original i.m.f.

Lemma 8.0.24. Let e be an edge of a non-degenerate merging network M. Then O(e) is an interval.

Proof. Let the width of M be 2n. The proof is by induction on the depth of OC(e). The case where e is an output edge is trivial. For the general case we use the following fact: for two intervals of integers, I′ and I″, I′ ∪ I″ is an interval iff there exist i′ ∈ I′ and i″ ∈ I″ such that |i′ − i″| ≤ 1. Let e be an incoming edge of a comparator c. Let e1 and e2 be the two outgoing edges of c. By the induction hypothesis, O(e1) and O(e2) are intervals. By Lemma 3.0.5(a), V(e1) and V(e2) are intervals. Clearly, V(e_i) ⊂ O(e_i) for i ∈ {1, 2}. By Lemma 3.0.5(b), V(e1) ∪ V(e2) is an interval. By the above fact, O(e1) ∪ O(e2) is an interval.

An input edge â_j or b̂_j of a merging network of width 4n is called small iff j < n; otherwise (n ≤ j < 2n) it is called large. Recall that an input or output edge whose label is α is named α̂. We now name additional edges of a network. Let N be a network with a comparator whose incoming edges are input edges named α̂ and β̂ (i.e. α and β are matched). In this case, m̂in(α, β) denotes the min edge emerging from this comparator and m̂ax(α, β) denotes the max edge emerging from this comparator. If α and β are not matched then m̂in(α, β) and m̂ax(α, β) are undefined.
Lemma 8.0.25. Let M be a merging network of width 2n and let the input edges â_j and b̂_k be matched. Then:
a) V(m̂in(â_j, b̂_k)) = [min(j, k), j + k].
b) V(m̂ax(â_j, b̂_k)) = [j + k + 1, max(j, k) + n].
c) If M is AMOP then min(O(m̂ax(â_j, b̂_k))) = max(O(m̂in(â_j, b̂_k))) + 1 = j + k + 1.

Proof. Statements (a) and (b) are straightforward so we prove only statement (c). By Lemma 8.0.24, O(m̂in(â_j, b̂_k)) and O(m̂ax(â_j, b̂_k)) are intervals. Clearly, V(m̂in(â_j, b̂_k)) ⊂ O(m̂in(â_j, b̂_k)) and V(m̂ax(â_j, b̂_k)) ⊂ O(m̂ax(â_j, b̂_k)). Since M is AMOP, O(m̂in(â_j, b̂_k)) ∩ O(m̂ax(â_j, b̂_k)) = ∅; therefore statement (c) follows from statements (a) and (b).

Lemma 8.0.26. Let M be a regular compact merging network of width 4n having a comparator whose incoming edges are m̂in(â_j, b̂_k) and m̂in(â_{j′}, b̂_{k′}). Then:
a) j′ − j = k − k′.
b) If one (or more) of the four edges â_j, b̂_k, â_{j′} and b̂_{k′} is large then:
 1) |j − j′| ≥ n.
 2) In each one of the four pairs (â_j, â_{j′}), (b̂_k, b̂_{k′}), (â_j, b̂_k) and (â_{j′}, b̂_{k′}), one edge is small and the other is large.

Proof. Consider statement (a). Let e = m̂in(â_j, b̂_k) and e′ = m̂in(â_{j′}, b̂_{k′}). Clearly, O(e) = O(e′). By Lemma 8.0.25(c), j + k = max(O(e)) = max(O(e′)) = j′ + k′, which implies statement (a).
Consider statement (b.1). Due to statement (a), statement (b.1) is symmetric with respect to â and b̂. Due to this symmetry and the symmetry w.r.t. k and k′, we can assume that b̂_k is large and k > k′. Assume, for a contradiction, that |j − j′| < n. In this case the network M can be pruned into a network M̃ so that the following conditions hold: 1) half of the edges of each input sequence ~a and ~b are pruned; 2) the pruning is honest, that is, the input vector that charts this pruning is valid for merging; 3) out of the four input edges â_j, b̂_k, â_{j′} and b̂_{k′}, only b̂_k is pruned, and it is pruned to +∞. This is possible because |j − j′| < n, k > k′ and because k is large.
By Lemma 4.0.10, M̃ is AMOP. In M̃, the two edges m̂in(â_{j′}, b̂_{k′}) and â_j enter the same comparator; therefore imf^{M̃} is not total. This contradicts Lemma 7.0.22(d) since the normalized variant of M̃ is a regular AMOP merging network. This contradiction establishes statement (b.1). Statement (b.2) follows from statements (a) and (b.1).
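Lemma 8.0.25(a)-(b) can be checked by brute force for small widths: the values reaching the two edges that leave the comparator matching â_j with b̂_k depend only on the keys a_j and b_k, so it suffices to range over all bisorted permutations. The Python sketch below is such a check (ours, for illustration only).

```python
from itertools import combinations

def value_sets(n, j, k):
    """Values seen on the min and max edges of the comparator that matches
    the input edges a_j and b_k, over all bisorted permutations of [0, 2n)."""
    lo, hi = set(), set()
    for a_pos in combinations(range(2 * n), n):
        a = sorted(a_pos)
        b = sorted(set(range(2 * n)) - set(a_pos))
        lo.add(min(a[j], b[k]))
        hi.add(max(a[j], b[k]))
    return sorted(lo), sorted(hi)

# n = 4, j = 1, k = 2: min edge carries [1, 3] = [min(j, k), j + k],
# max edge carries [4, 6] = [j + k + 1, max(j, k) + n]
print(value_sets(4, 1, 2))
```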
Lemma 8.0.27. Let M be a regular compact merging network whose width is at least four. Let m̂in(â_j, b̂_k) be an edge of M where â_j is small and b̂_k is large. Then M has an edge m̂in(â_{j′}, b̂_{k′}) where â_{j′} is large and b̂_{k′} is small, and these two min edges enter the same comparator.

Proof. Let M̃ = split(M) and let c be the comparator â_j enters in M̃. By Lemma 4.0.10, M̃ is AMOP. By Lemmas 4.2.4 and 7.0.22(d), imf^{M̃} is total. Let b̂_{k′} be the other edge entering c in M̃. Clearly, b̂_{k′} is a small input edge. Let e and e′ be the two edges entering c in M, such that â_j ∈ IC^M(e) and b̂_{k′} ∈ IC^M(e′). Since M is AMOP, the edges e and e′ are uniquely defined.
Since â_j is small, in the bypass process â_j follows the min edges; therefore e is a min edge. Similarly, e′ is a min edge. It remains to show that each of IC^M(e) and IC^M(e′) has exactly two input edges. In M both â_j and b̂_{k′} enter a comparator whose second incoming edge is a large input edge; hence the situation is symmetric w.r.t. â_j and b̂_{k′}. Due to this symmetry it suffices to consider only IC^M(e).
First we show that â_j is the only small input edge in IC^M(e). Assume, for a contradiction, that d is another small input edge in IC^M(e), and let the width of M be 4n. Recall that split_D(M) is a subnetwork of M̃ and, by Lemma 4.2, split_D(M) is a width 2n merging network. Since c is of depth one in split_D(M), c ∈ IC^{split_D(M)}; this implies that n ∈ O^{M̃}(c); therefore n ∈ O^M(e) and n ∈ O^M(d). Since M is AMOP, any path p from d to ô_n passes through e. Since e has been reduced in M̃ into the input edge â_j, the path p from d to ô_n is disconnected in M̃. This contradicts the fact that M̃ is a half-merging network.
Let â_p and b̂_q be two input edges in IC^M(e) which enter the same comparator and let g = m̂ax(â_p, b̂_q). Since b̂_q is large, p + q ≥ n. By Lemma 8.0.25(c), min(O(g)) = p + q + 1 > n, which implies g ∉ IC^M(e) and therefore m̂in(â_p, b̂_q) ∈ IC^M(e).
Now assume, for a contradiction, that there are more than two input edges in IC^M(e). In this case, IC^M(e) has four input edges which comply with the conditions of Lemma 8.0.26. By the conclusions of this lemma, two of these edges are small. This contradicts the fact that â_j is the only small input edge in IC^M(e).
By Lemmas 4.0.10 and 4.2.4, split(M ) is a disjoint sum of two regular AMOP merging networks. By Lemma 7.0.22 (d), η 0 is a total function over [0, 2n). This establishes statement (a). Statement (b) follows from the fact that these two networks are disjoint. By symmetry, it suffices to prove statement (c) only for j < n. If η(j) < n then a ˆj 0 enters a comparator which is not mixed; therefore η (j) = η(j). Thus, it remains to considers js which are members of I = {i|i < n and η(i) ≥ n}. Define the function f : I → Z by f (j) = η −1 (η 0 (j)) − n. We first show that f is a permutation of I. Clearly, f is one-to-one. Let j ∈ I and let k = η(j). The edges a ˆj , ˆbk satisfy the premise of Lemma 8.0.27 so let a ˆj 0 and ˆbk0 be the two edges provided by this Lemma. 0 0 By this lemma, η (j) = k and η −1 (k 0 ) = j 0 ≥ n; i.e. f (j) = j 0 − n. It remains to show that f (j) = j 0 − n ∈ I. By Lemma 7.0.22(d), η is a congruent function, and so j 0 ≥ n and η(j 0 ) = k 0 < n imply that η(j 0 − n) = k 0 + n ≥ n. This and j 0 − n < n imply that j 0 − n ∈ I. If f is not the identity function, then there exists a j ∈ I such that f (j) < j. Let k = η(j), j 0 = η −1 (η 0 (j)) and let k 0 = η(j 0 ). Since j 0 is large, the edges a ˆj , ˆbk , a ˆj 0 ˆ and bk0 satisfy the premise of Lemma 8.0.26 (b). The fact that f (j) < j imply that η −1 (η 0 (j)) − n < j, and so j 0 − j < n; this contradicts Lemma 8.0.26 (b.1). Hence f is the identity function. That is, for every j ∈ I, j = f (j) = η −1 (η 0 (j)) − n. This implies η 0 (j) = η(j +n). Since η is a congruent function, η(j +n) ≡ η(j) (mod n).
Chapter 9
Degenerate comparators for half merging

This chapter investigates which comparators of a merging network are degenerate with respect to the half-merging functionality. Recall that ζ^{4n} is the unique halved bisorted 0-1 vector of width 4n having exactly 2n zeroes. When the width of such a vector is clear from context, we omit the superscript and use the shortcut ζ. Recall that a comparator is mixed if it is mixed under ζ.

Lemma 9.0.29. Let c be a comparator of a Batcher merging network B of width 4n. Then c is mixed iff c is degenerate under the functionality of half-merging.

Proof. Consider the left to right implication. By Lemmas 4.2.2 and 3.0.3, ζ^{4n} agrees on c with any halved bisorted vector. This implies that c is degenerate under the functionality of half-merging.
To prove the right to left direction we show that if c is not mixed then c is non-degenerate under the functionality of half-merging. This we show by induction on n. There are exactly two Batcher merging networks of width 4 and it is easy to verify that this implication holds for both of them. Let n > 1 and let B = B(r, B1, B2, Z) as defined in Chapter 5, where r is a Batcher cleaver, B1 and B2 are Batcher merging networks and Z is the appropriate depth one network. Let c be a non-mixed comparator of B.
First consider the case where c belongs to B1 or B2, say B1. Clearly, r(ζ^{4n}) = ⟨ζ^{2n}, ζ^{2n}⟩; namely, when B receives ζ^{4n}, each of the networks B1 and B2 receives ζ^{2n}. Since c is not mixed in B under ζ^{4n}, c is not mixed in B1 under ζ^{2n}.
By the induction hypothesis, c is non-degenerate in B1 for the functionality of half-merging; therefore there are two halved bisorted input vectors, v1 and v2, of B1 that establish that c is non-degenerate in B1 under the half-merging functionality. Clearly, for any input vector v of B1, r(2v) = ⟨v, v⟩; this implies that when 2v_i is applied to B then v_i is applied to B1. Therefore, the pair (2v1, 2v2) establishes that c is non-degenerate in B under the half-merging functionality.
Next consider the case where c ∈ Z. The network Z is of depth one and c is non-degenerate under the functionality of merging; hence, by Lemma 8.0.24, its outgoing edges are ô_i and ô_{i+1} for some i. Since c is not mixed under ζ^{4n}, we have i ≠ 2n − 1. Let v ∈ P_bs^{4n} be defined by b_j = a_j + 1 for all j. Let v′ be the permutation derived from v by swapping the keys i and i + 1. Clearly, v and v′ are halved bisorted permutations. The two incoming cones of the edges entering c are disjoint since one is a subgraph of B1 and the other is a subgraph of B2. Since both v and v′ are valid vectors, c receives the keys i and i + 1 under both input vectors but from different edges; that is, v and v′ establish that c is non-degenerate for the half-merging functionality.

Lemma 9.0.30. Let M be a regular compact merging network whose width is greater than two. Then M has a minor M* such that:
a) split(M*) is a non-degenerate AMOP half-merging network.
b) IC^{M*} ≅ IC^M.
c) imf^{split(M*)} = imf^{split(M)}.
Note that M* is not required to be a merging network.

Proof. Let the width of M be 4n and let C be the set of the comparators of M which are degenerate w.r.t. half-merging and are not mixed under ζ^{4n}. Pick any halved bisorted permutation v and let M* = ℘(M, C, v); namely, M* is the product of bypassing all comparators of C in the manner charted by v. Since all halved bisorted permutations agree on all comparators of C, by Lemma 4.1.1, M* is independent of which input vector v is chosen. By Lemma 4.1.1, M* is a half-merging network and all its degenerate comparators (w.r.t. half-merging) are mixed under ζ^{4n}, establishing statement (a).
By Lemma 7.0.22(d), imf^M is a congruent function. By Lemma 6.0.19, there is a Batcher merging network B such that imf^B = imf^M. By Lemma 7.0.23, IC^M ≅ IC^B. By Lemma 9.0.29, all the comparators of B which are degenerate w.r.t. half-merging are mixed. This implies that no comparator of C is in IC^M, which implies statement (b).
To prove statement (c), let μ = imf^M, μ̄ = imf^{split(M)} and μ̄′ = imf^{split(M*)}. By statement (b), imf^{M*} = μ. We have to show that μ̄(j) = μ̄′(j) for any j.
By symmetry, we may assume j < n; let k = μ(j); that is, the two edges â_j and b̂_k are matched in some comparator c of M. Assume first that k < n. In this case c is non-degenerate w.r.t. half-merging and so μ̄(j) = μ̄′(j) = k.
Next assume k ≥ n. In this case â_j and b̂_k comply with the premise of Lemma 8.0.27. Let â_{j′} and b̂_{k′} be the two edges provided by this lemma, let c′ be the comparator where the edges â_{j′} and b̂_{k′} are matched in M and let c̄ be the comparator that the edges m̂in(â_j, b̂_k) and m̂in(â_{j′}, b̂_{k′}) enter. It is not hard to see that c̄ is non-degenerate w.r.t. half-merging; therefore, c̄ ∈ M*. Since c and c′ are not in split(M*) and are not in split(M), and since c̄ is in split(M*) and in split(M), the two edges â_j and b̂_{k′} enter the same comparator both in split(M*) and in split(M). This implies that μ̄(j) = μ̄′(j) = k′, establishing statement (c).
Chapter 10
Characterization of the Batcher merging networks

This chapter combines the results of the previous chapters into the main result of this work:

Theorem 10.0.31. A network is a regular and compact merging network iff it is a Batcher merging network.

Proof. The right to left implication is the easy part of this theorem. By their definition, Batcher merging networks are regular. By Lemmas 5.0.4 and 5.0.2, all Batcher merging networks are non-degenerate and AMOP.
We prove the left to right implication by induction on the width of the network. Let M be a regular compact merging network. The case where M is of width 2 is trivial, so let the width of M be 4n. By Lemma 7.0.22(d), imf^M is a congruent function and so, by Lemma 6.0.19, there exists a Batcher merging network B of the same width and i.m.f. as M. By Lemma 4.3.1, it suffices to show that some minor of M is isomorphic to B.
Let M* be the minor of M provided by Lemma 9.0.30. Without loss of generality, we may assume that M* is a minor of M via an embedding σ which is the identity function over the vertices of M*; this implies that the vertices of M* are vertices of M. By Lemma 9.0.30, split(M*) is a non-degenerate AMOP half-merging network. By Lemma 4.2.1, split(M*) is a disjoint sum of two networks: one is a compact merging network we call split_D(M*), and the normalized variant of the other is a compact merging network called split_U(M*). By Lemma 9.0.29, all comparators of B which are degenerate w.r.t. half-merging are mixed under ζ, and so split(B) is a non-degenerate AMOP half-merging network.
By Lemmas 4.2.1 and 4.0.10, split(B) is a disjoint sum of two networks: one is a compact merging network and the normalized variant of the other is a compact merging network. We name these compact merging networks split_D(B) and split_U(B), respectively. By the induction hypothesis, each of the four networks split_D(M*), split_U(M*), split_D(B) and split_U(B) is isomorphic to some Batcher merging network of width 2n.
By our construction, imf^B = imf^M. By Lemma 8.0.28, imf^{split(B)} = imf^{split(M)}. By Lemma 9.0.30(c), imf^{split(M*)} = imf^{split(M)} = imf^{split(B)}; therefore imf^{split_D(M*)} = imf^{split_D(B)} and imf^{split_U(M*)} = imf^{split_U(B)}. By the induction hypothesis and Lemma 5.0.9, split_D(B) ≅ split_D(M*) and split_U(B) ≅ split_U(M*). This implies that split(B) ≅ split(M*).
Recall that it remains to show that M* ≅ B. By Lemma 2.0.1, it suffices to show that M* and B have a common syndrome¹; i.e. synd(M*, v) = synd(B, v) for some non-repeating input vector v.
Pick a halved input vector v ∈ P_bs^{4n}. We prove that synd(M*, v) = synd(B, v), i.e. that for any key k in v, act(M*, v, k) = act(B, v, k). Clearly, the first, second and last elements of those acts (k, l1 and l2) are identical in act(M*, v, k) and act(B, v, k). The last identity is due to the fact that v is valid.
For a network N ∈ {M*, B, split(M*), split(B)}, let ~s^N be the sequence of keys that our key k encounters on its way through N under the input vector v, and let ~s_1^N be the initial segment of ~s^N composed of the keys k encountered inside IC^N. It remains to show that ~s^{M*} = ~s^B. By symmetry, we may assume k < 2n. It is not hard to see that ~s^{split(M*)} and ~s^{split(B)} are derived from the sequences ~s^{M*} and ~s^B, respectively, by purging all keys greater than 2n − 1; furthermore, all of these purgings concern comparators which are mixed in their networks (which are either M* or B).
Let k̄ > 2n − 1 appear in ~s^{M*} and let this "encounter" occur in a comparator c of M. (Clearly, c is mixed.) Since v is valid for half-merging, ô_k and ô_k̄ are members of OC^{M*}(c). Since M* is a minor of M, ô_k and ô_k̄ are members of OC^M(c). By Lemma 8.0.24, c is a member of IC^M. Since IC^{M*} = IC^M, c is a member of IC^{M*}. Therefore, all the keys purged from ~s^{M*} reside in the sequence ~s_1^{M*}. Similar and simpler arguments imply that all the keys purged from ~s^B reside in the sequence ~s_1^B.
By Lemma 7.0.23, IC^B ≅ IC^{M*} ≅ IC^M; hence ~s_1^{M*} = ~s_1^B. This implies that in the above purgings (both in ~s^{M*} and in ~s^B) the same keys have been removed from the same positions. Since split(B) ≅ split(M*), the derived sequences ~s^{split(M*)} and ~s^{split(B)} are equal; hence, so must be the original sequences; that is, ~s^{M*} = ~s^B.
¹ Recall that for a network N and a non-repeating input vector v, synd(N, v) is the set {act(N, v, k) | the key k appears in v}, and that act(k, v, N) is the four-tuple ⟨k, l1, ~s, l2⟩, in which l1 and l2 stand for the labels of the input and output edges traversed by k and ~s is the sequence of the keys that k encountered on its way through N.
Chapter 11
Oblivious algorithms

In the next chapter we survey a variety of published merging techniques and merging networks. Usually, networks of comparators are presented indirectly via an oblivious algorithm (a.k.a. a non-adaptive algorithm) which gives rise to the desired network. This chapter discusses the concept of an oblivious algorithm and how it relates to networks of comparators.
We prefer to start by presenting a new model of oblivious computation, which we believe to be more natural than the accepted model ([16], [10, pp 220] and [13, pp 623]). Our model is more powerful than the accepted one in two aspects: it speeds up the computation of several functions w.r.t. the accepted model, and it allows the computation of new functions which are not computable in the accepted model. Nevertheless, it still obeys the classical 0-1 principle and its known generalizations, including several new ones presented in Chapter 13.
Our model of computation, let us call it the min/max oblivious model, has only one data type – a key. These keys are stored in variables and there are only two instructions: "z ⇐ min(x, y)" and "z ⇐ max(x, y)", where x, y and z stand for arbitrary (not necessarily distinct) variables. Note that this model does not allow any control operations (such as branching or looping) and, in particular, no conditional instructions of the form "if (x < y) then ..."; therefore, an algorithm in this model, called a min/max oblivious algorithm, is a straight-line code of the above statements.
Clearly, an algorithm is aimed to solve a given problem. To this end, some of the variables are input variables which contain the input; another set of variables, the output variables, contains the output of the algorithm. All other variables are intermediate variables. The names of the input/output variables reflect the functionality of the algorithm. Consider, for example, an algorithm to merge two sorted sequences, each of length 4, into a single sequence of length 8; in this case, the input variables could be A0, A1, A2, A3, B0, B1, B2 and B3 and the output variables could be O0, O1, ..., O7, with the obvious semantics.
To avoid the problem of uninitialized variables we add to this model the restriction that a variable must be written before it is read. Clearly, the input variables are considered to be written at the beginning of the algorithm.
Note that the min/max model differs from the accepted model in two significant manners. Firstly, it uses the weak instructions "min" and "max", while the accepted model combines these instructions into a single one – the sorting of a pair of keys. Secondly, and more significantly, in the min/max model a computed value can be used several times, as opposed to once in the accepted model.
To simplify the comparative study of algorithms it is desirable that algorithms have as few insignificant details as possible. Ideally, two algorithms that essentially "perform the same computation" will be identical. One source of insignificant details is the reuse of variables; namely, using a variable to store several values during the course of the algorithm. To eliminate this source we prefer write-once algorithms; that is, algorithms in which each variable is written only once. Clearly, any min/max oblivious algorithm can be transformed into a write-once algorithm.
Any min/max oblivious algorithm can be translated into a min/max network. Such a network is a directed acyclic graph composed of three types of vertices: input vertices with in-degree 0 and arbitrary out-degree, output vertices with in-degree 1 and out-degree 0, and intermediate vertices with in-degree 2 and arbitrary out-degree. The intermediate vertices are either min elements or max elements; such an element computes the appropriate value and transmits it on all its outgoing edges. The input and output vertices are associated with labels; this association specifies how to apply the input to the network and how to collect the output of the network. There are two differences between min/max networks and networks of comparators, and they parallel the differences between the min/max oblivious model and the accepted one: the use of min/max elements instead of comparators, and the unrestricted fanout.
The translation of a min/max oblivious algorithm into a min/max network is immediate when the algorithm is write-once. In this case input vertices correspond to input variables. All other vertices correspond to intermediate vertices that compute the appropriate value. The output variables correspond not only to intermediate vertices but also to output vertices. Edges are used to propagate the values within the network. Translation from a min/max network into a write-once algorithm is also straightforward (a network may have an edge going from an input vertex to an output vertex; to translate such an edge we add a special instruction of the form "y ⇐ x", where x is an input variable and y is an output one), but unfortunately it is not unique: the resulting algorithms may differ in insignificant details of two types – the names of intermediate variables and the relative order of independent instructions (two instructions are dependent if they write to the same variable or if one of them writes a variable which the other reads).
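As a small illustration of the model (our example, not taken from the report), the following straight-line Python function is a write-once min/max oblivious algorithm that merges two sorted pairs A0 ≤ A1 and B0 ≤ B1; every statement is a single "z ⇐ min(x, y)" or "z ⇐ max(x, y)" instruction and every variable is written exactly once.

```python
def merge2(A0, A1, B0, B1):
    """Write-once min/max oblivious merge of two sorted pairs."""
    O0 = min(A0, B0)      # smallest of the four keys
    t0 = max(A0, B0)
    t1 = min(A1, B1)
    O3 = max(A1, B1)      # largest of the four keys
    O1 = min(t0, t1)      # the two middle keys, in order
    O2 = max(t0, t1)
    return O0, O1, O2, O3

print(merge2(1, 5, 2, 3))   # (1, 2, 3, 5)
```

Grouped into the sections {O0, t0, t1, O3} followed by {O1, O2}, the same code is a parallel oblivious algorithm of two steps, in the sense discussed below.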
So far we have described an oblivious algorithm as a serial program. However, the big advantage of oblivious algorithms is that they can be computed in a parallel manner. In this parallel model, time is divided into steps and a program is divided into a sequence of sections. In each time step all instructions of the current section are performed simultaneously; therefore, all instructions of a section are required to be independent. The run time of a parallel algorithm is the number of these steps. Note that the instructions of the same serial oblivious algorithm can be grouped into a sequence of sections in many different ways, all having the same functionality; this holds even if we insist that the resulting parallel oblivious algorithm is of minimal time.
A partition of a serial algorithm into sections induces a partition of the network's vertices into stages s.t. every edge goes from one stage into a later stage. In addition, all input vertices are in the first stage and all output vertices are in the last stage. Each stage corresponds to a different step, except for the first stage (containing all input vertices) and the last stage (containing all output vertices). Conversely, each partition of a min/max network into stages induces a partition of the serial oblivious algorithm into steps. In both cases, there is no canonical partition of a network into stages or of an oblivious algorithm into steps, even if we require a minimal number of stages or steps.
The two operations, "min" and "max", can be combined into a single operation. The resulting model of computation is very similar to our min/max model except that only one operation exists, namely "⟨z, w⟩ ⇐ sort(x, y)", where x, y, z, w stand for arbitrary variables with the only restriction being that z and w are distinct variables. Clearly, this change is superficial and affects neither the computability of the model nor the run time of a serial or parallel algorithm. Networks which represent such algorithms are called unrestricted networks of comparators. Clearly, the internal vertices of such a network are comparators; however, there is no restriction on the fanout of these comparators; namely, a comparator has two out-ports corresponding to the minimal and maximal keys, and an arbitrary number (including zero) of outgoing edges emerges from each out-port. Such a network was presented by Knuth [10, pp 233]. As said, the difference between unrestricted networks of comparators and min/max networks is superficial, and we believe the latter to be a more natural model.
The restricted networks of comparators, which are the main subject of this work and which outside this chapter are simply referred to as networks, are defined in Chapter 2. As said, they are the subclass of the unrestricted networks of comparators in which the fanout of the inputs and the fanout of the out-ports of intermediate vertices equals one. We refer to the model of computation which corresponds to restricted networks of comparators as the read-once model. Its instructions are of the form "⟨z, w⟩ ⇐ sort(x, y)", where x, y, z, w stand for arbitrary variables with the restriction that z and w are distinct variables. Furthermore, every computed or input value is used exactly once; therefore, the above x and y must be distinct variables.
As said, it is sometimes desirable that the algorithm be write-once. However, the common models of oblivious computation took the opposite approach. In these models, let us call them the in-place models, there is only one set of variables. The same variables are used as input, output and intermediate variables. (This implies that a variable should have two names: one as an input variable and another as an output variable.) The only instruction of the in-place model is the in-place sorting instruction, namely "⟨x, y⟩ ⇐ sort(x, y)", where x and y are distinct variables. This implies that any computed value is used exactly once and keys are neither duplicated nor lost. This model best corresponds to Knuth's non-standard diagram [10, pp 237]. This diagram consists of horizontal lines, representing the variables, and vertical arrows, representing the sorting operations, connecting the two variables in question with the arrowhead pointing to the variable to which the maximal key is assigned. Clearly, any such Knuth diagram can be translated into a unique restricted network of comparators; however, the translation from a restricted network of comparators into a Knuth diagram is not unique. As mentioned by Bilardi [6], every partition of the network into edge-disjoint paths, combined with an ordering of these paths, produces a distinct Knuth diagram.
The accepted model of oblivious computation [16], [13], let us call it the ordered in-place model, is a submodel of the previous one in which the set of variables is ordered. An instruction "⟨x, y⟩ ⇐ sort(x, y)" is allowed only if x < y. This model corresponds to the well-known variant of Knuth's diagram [10, pp 222]. In this diagram, let us call it Knuth's standard diagram, the layout of the horizontal lines follows the order of the variables; the arrowheads are redundant in this diagram and therefore sort operations are represented as vertical lines. As before, any standard diagram can be translated into a unique restricted network of comparators. Translation in the other direction (from a restricted network of comparators into Knuth's standard diagram) is meaningful only when there is a natural order on the output vertices of the restricted network; such an order exists when the restricted network produces a single sequence rather than, for example, two sequences. Clearly, we would like to translate the network into a Knuth's standard diagram in which the layout of the horizontal lines follows the order of the output vertices of the network. Following Knuth [10, pp 239] and Bilardi [6], such a translation can be performed as follows. Let N be such a restricted network and let v be a permutation input vector which N sorts (according to the order of the output vertices mentioned above). Then the path traversed by key i is associated with the (i + 1)'th horizontal line. This induces the vertical lines representing the comparators.
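For concreteness, the ordered in-place model is easy to state in executable form: a program is just a list of comparators (i, j) with i < j, and each instruction in-place sorts the pair of variables it names. The comparator list below is one possible reading of Batcher's odd/even merge of two sorted 4-sequences; it is our own illustration, not a diagram taken from the report.

```python
from typing import List, Tuple

def run_inplace(program: List[Tuple[int, int]], v: List[int]) -> List[int]:
    """Execute an ordered in-place oblivious program: <v[i], v[j]> <- sort(v[i], v[j])."""
    v = list(v)
    for i, j in program:          # i < j, as required by the ordered in-place model
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

# odd/even merge of sorted halves on wires 0-3 and 4-7
oem_4_4 = [(0, 4), (2, 6), (1, 5), (3, 7),
           (2, 4), (3, 5),
           (1, 2), (3, 4), (5, 6)]
print(run_inplace(oem_4_4, [1, 3, 5, 7, 0, 2, 4, 6]))   # [0, 1, 2, 3, 4, 5, 6, 7]
```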
If no such vector v exists, the network cannot be translated into a Knuth's standard diagram. A network usually has many such input permutations and therefore many translations into a Knuth's standard diagram.
This variety of representations has led to confusion. Bilardi [6], for example, has shown that two distinct Knuth's standard diagrams, one of the Balanced network [8] and the other of Batcher's bitonic sorter [1], actually represent networks which are isomorphic up to a relabelling of their inputs. This variety of representations is especially noticeable in the context of merging networks. Let N be a merging network of width 2n. Clearly, this network sorts any bisorted permutation and each of them leads to a different Knuth's standard diagram. Namely, for any partition of the horizontal lines into two sets of size n there is a Knuth standard diagram of N in which the sequence ~a enters the horizontal lines via the first set, the sequence ~b enters via the second set, and within each set the corresponding sequence enters the diagram in the natural order.
As said, the min/max model is more powerful than the accepted model in two aspects. Firstly, as observed by Knuth [10, pp 241], there are some functions which are computable in the former and are not computable in the latter. This is due to the following property of any function f : K^n → K that is computable in the accepted model: f is either a projection or it has two distinct arguments x_i and x_j s.t. f(x_0, ..., x_{n−1}) is invariant under a transposition of the values of x_i and x_j, and this for any x_0, ..., x_{n−1} ∈ K. The min/max model allows functions which violate this property, for example f(x_0, x_1, x_2, x_3) = max(min(x_0, x_1), min(x_1, x_2), min(x_2, x_3)).
The second aspect in which the min/max model is more powerful than the accepted one is the computation time of certain functions. Assume we have several functions, all from K^n to K, each of which can be computed in time t. In the min/max model all of them can be computed simultaneously in the same time t. In the accepted model it is not assured that all these functions can be computed together (ignoring the computation time) or that all of them can be computed in time t. A concrete example, already observed by Batcher and presented in [10, pp 233], has to do with insertion. An inserter of width n merges a sorted sequence of length n − 1 and a single key into a sorted sequence of length n. By a straightforward reachability argument, the depth of a network of comparators having this functionality is at least ⌈log(n)⌉. Now consider the min/max model. Any output key of an inserter depends on either two or three input keys. Not surprisingly, any output key can be computed from the relevant input keys in the min/max model, and this in constant run time. Hence, all output keys can be computed simultaneously in constant time.
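The inserter example is easy to make concrete. In the min/max model each output of an inserter is a constant-size expression of at most three inputs, so all outputs can be evaluated in parallel in constant time; the sketch below (ours) spells this out.

```python
def insert_minmax(x, y):
    """Oblivious insertion of the key x into the ascending sequence y.

    O_0 = min(x, y_0), O_i = max(y_{i-1}, min(x, y_i)) for 0 < i < n-1, and
    O_{n-1} = max(x, y_{n-2}); each output needs at most two min/max
    operations, independently of the others."""
    n = len(y) + 1
    out = [min(x, y[0])]
    out += [max(y[i - 1], min(x, y[i])) for i in range(1, n - 1)]
    out.append(max(x, y[-1]))
    return out

print(insert_minmax(4, [1, 3, 6, 8]))   # [1, 3, 4, 6, 8]
```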
Chapter 12
Variety of Merging Techniques

Many ingenious algorithms for oblivious merging have been invented. However, it is a common phenomenon that radically different algorithms produce identical networks. For many applications the networks themselves are of importance and their properties are investigated. In order to prevent a duplication of effort it is desirable to be able to tell when different algorithms produce identical networks. This chapter shows that all published merging networks are Batcher merging networks¹. As said, this diagnosis does not require a complete understanding of the technique in question; in fact, a very superficial understanding of the algorithm suffices to establish that a merging technique produces a generalized Batcher merging network.
In this chapter we use Theorem 10.0.31 to prove such claims. We survey several published techniques and demonstrate that this tool does not require a full understanding of the oblivious algorithm in question but only a few details that the authors explicitly state. In order to use Theorem 10.0.31 we need to establish that a network is both AMOP and non-degenerate. It is usually easy to show that a network is AMOP but tedious to show that it has no degenerate comparators; this is not critical since, if a network has degenerate comparators, they can be bypassed, by Lemma 4.1.2, without disturbing the AMOP property or the merging functionality of the network; therefore, we do not concern ourselves with showing the surveyed networks to be non-degenerate.
A few words about non-Batcher merging networks are in order. Prior to this work there was no technique to produce a non-Batcher merging network which is non-degenerate and of minimal depth; moreover, it was not known whether such a network exists. Such a technique is presented in Chapter 16.
¹ Most published oblivious algorithms are only partly specified; for example, they say "sort this set of keys" without specifying how to do so; when such missing details are filled in (in a natural manner) the generated networks are Batcher merging networks.
12.1 pre-recursive and post-recursive
As explained above, our main goal in the following discussion is to show that certain oblivious algorithms produce AMOP networks; therefore, during the following discussion, we ignore the functionality of the networks, their input and output labels and the min/max types of edges.
Many oblivious recursive algorithms (not necessarily for merging) fall into one of the following two classes. In the first class, the given input is partitioned, in a data-independent manner, into several smaller problems; each of these problems is solved recursively and, finally, some post-processing combines the solutions of the smaller problems into a solution of the original one. We call such algorithms pre-recursive. In the second class, some pre-processing partitions the given problem into several smaller problems which are then solved recursively, and those solutions are combined into a solution of the original problem without any post-processing. We call algorithms of the second class post-recursive.
An oblivious algorithm of the above types can also be a multi-choice algorithm. Such an algorithm can choose, in a data-independent manner, one of several modes of operation. For example, the generalized Batcher merging technique is a multi-choice pre-recursive algorithm.
A restricted form of multi-choice pre-recursive oblivious algorithms produces networks of the following type. A family (i.e. a set) of networks ℜ is pre-recursive if every member of ℜ is composed of a pre-processing network A followed by a post-processing network B such that:
1. The network A is a disjoint sum of two (non-empty) networks N1, N2 s.t. each Ni is either a single edge or a member of ℜ.

v_{i+1}. This implies that ~v has a 0-1 monotonic image ~u in which u_0 = u_{i+1} = 0 and u_i = u_j = 1; hence, ~u is not bitonic.
It is not hard to see that not all natural sets of vectors are complete; for example:
a) For n > 2, the set of unitonic sequences of width n is not complete.
b) For n > 1, the set of sandwich vectors of width 2n is not complete.
The following lemma generalizes Lemma 13.1.4 to cases where the output of the network is not required to be sorted but to be a member of a complete set of vectors.

Lemma 13.1.8. Let X be a set of input vectors of a network N, let X support Y, let S be complete and let T^N(X) ⊂ S. Then T^N(Y) ⊂ S.

Proof. Assume, for a contradiction, that T^N(Y) ⊄ S. Hence, there is a vector v ∈ Y s.t. T^N(v) ∉ S. Since S is complete, S does not support T^N(v); that is, there is a monotonic key function f s.t. f(T^N(v)) is a 0-1 vector and f(T^N(v)) ∉ S. By Lemma 3.0.2, T^N(f(v)) = f(T^N(v)) and so T^N(f(v)) ∉ S. This implies that f(v) ∉ X and, by Lemma 13.1.2, this contradicts the fact that X supports v.

Lemma 13.1.4 follows from Lemma 13.1.8 when Y = V, X = V^{0-1} and S is the set of the sorted sequences, and from the fact that if a set of vectors V is monotonically-closed then V^{0-1} supports V. We present two applications of Lemma 13.1.8 which enable us to construct conclusive sets much smaller than those restricted to 0-1 vectors. Consider the two straightforward facts:
1. The set of unitonic permutations supports the set of bitonic sequences.
2. The set of sandwich permutations supports the set of bisorted sequences.
Lemma 13.1.8 and the facts above imply that:

Lemma 13.1.9.
a) A network is a merging network iff it is a sandwich-permutation sorter.
b) A network is a bitonic sorter iff it is a unitonic-permutation sorter.

In contrast to unitonic sorters, not all ascending-descending sorters are bitonic sorters; that is, for every n ≥ 3, it is easy to construct an ascending-descending sorter of width n which is not a bitonic sorter.
As said, the above techniques are powerful tools for constructing small conclusive sets – much smaller than those restricted to 0-1 vectors. A natural question in this context is what is the smallest set of 0-1 vectors which is conclusive for sorting, merging, etc. By Lemma 13.1.9, there is a conclusive set for merging two sequences of width n each into a sorted sequence which has n + 1 members; however, by the following lemma, 13.1.10, the smallest conclusive set of 0-1 vectors for the same functionality has (n + 1)² − 2 vectors. For the functionality of sorting bitonic sequences of width n, Lemma 13.1.9 implies that there is a conclusive set of size n while, again by Lemma 13.1.10, the smallest conclusive set of 0-1 vectors is of size n · (n − 1).

Lemma 13.1.10. For any non-constant v ∈ {0, 1}^n there is a function f : {0, 1}^n → {0, 1}^n, computable by a network of comparators, s.t. f(v) is not sorted while f(v′) is sorted for any other v′ ∈ {0, 1}^n.

Proof. Given such a vector v, let f = T^N where N is constructed as follows. First of all, N statically (without any comparisons) partitions its input vector u ∈ {0, 1}^n into two sequences ~u^0 and ~u^1 s.t. ~u^0 contains all the elements u_j where v_j = 0 and ~u^1 contains all the elements u_j where v_j = 1. Since v is non-constant, none of the sequences ~u^0 and ~u^1 is empty. Let n_0 and n_1 be the widths of ~u^0 and ~u^1, respectively. The network N computes x^0 and x^1, which are the maximal element of ~u^0 and the minimal element of ~u^1, respectively. Let ~w be the sequence composed of all elements of u except x^0 and x^1. The network N partitions ~w into ~w^0 and ~w^1 s.t. |~w^0| = n_0 − 1, |~w^1| = n_1 − 1 and ~w^0

Let N be a minimal depth bitonic sorter of width n. By Lemma 14.0.4(4), imf^{N̂} is the order-inverting mapping of [0, n/2); that is, imf^{N̂} = imf^{X^n}. Let N′ be derived from N̂ by bypassing all degenerate comparators of N̂ w.r.t. merging. Clearly, imf^{N′} = imf^{N̂}. By Lemma 14.0.4(2), N is AMOP. Therefore, N′ is a regular compact merging network. By Theorem 10.0.31, N′ is a Batcher merging network.
As said, imf^{N′} = imf^{X^n} and both networks are Batcher merging networks; by Lemma 5.0.9, N′ ≅ X^n. By Lemma 14.0.5(2), each path in N′ from an input edge to an output edge has exactly log(n) comparators. Since the depth of N̂ is log(n), no comparator has been bypassed while deriving N′ from N̂. That is, N̂ ≅ N′ ≅ X^n. Since N is derived from N̂ ≅ X^n by a predefined relabelling, N is unique.
We now show that the requirement "n is a power of two" in Lemma 14.0.6 is mandatory. Consider the case of n = 3. Clearly, any sequence of length 3 is bitonic and so any bitonic sorter of width 3 is a sorting network. It is easy to verify that the depth of a sorting network of width 3 is at least 3. Two non-isomorphic bitonic sorters of width 3 are depicted in Figure 14.1.
These networks are not isomorphic, even after any relabeling of their input and output edges, since the left one has a path of edges of type ⟨max, min, max⟩ and the network on the right does not have such a path.
Figure 14.1: Two non-isomorphic bitonic sorters of width 3

By Lemma 14.0.4(1), the minimal depth of a bitonic sorter of width 4 is 2 while, by the above argument, the minimal depth of a bitonic sorter of width 3 is 3. This yields the following surprising result:
Lemma 14.0.7. The minimal depth of a bitonic sorter of width n is not a monotonic function of n.

Clearly, the functionality of bitonic sorting is stronger than merging. As discussed in Chapter 14, there are 4n natural ways to relabel a bitonic sorter of width 2n into a merging network. However, if the bitonic sorter is of minimal depth, then by Lemma 14.0.6 it is unique. Moreover, this unique network halves the inputs after a depth one network. The only minimal depth merging network with this property is the strictly-cross network. Therefore:

Lemma 14.0.8. Let B be a bitonic sorter of minimal depth whose width is a power of two and let M be a merging network derived from B only by relabelling of its input edges. Then M is the strictly-cross network.
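The claim above that a sorting network (equivalently, a bitonic sorter) of width 3 needs depth 3 is small enough to verify mechanically: with three wires a level holds at most one comparator, so depth equals the number of comparators, and the brute-force check below (our sketch) confirms that no two comparators sort all permutations of three keys while the classic three-comparator network does.

```python
from itertools import permutations, product

def run(net, v):
    v = list(v)
    for i, j in net:                 # comparator (i, j): min to wire i, max to wire j
        v[i], v[j] = min(v[i], v[j]), max(v[i], v[j])
    return v

def sorts_all(net):
    return all(run(net, p) == sorted(p) for p in permutations(range(3)))

pairs = [(0, 1), (0, 2), (1, 2)]
print(any(sorts_all(net) for net in product(pairs, repeat=2)))   # False: two comparators never suffice
print(sorts_all([(0, 1), (1, 2), (0, 1)]))                       # True: the classic depth-3 sorter
```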
Chapter 15
Zipper sorters

This chapter introduces a new merging technique which is based on a certain way to quantify the amount of "unsortedness" of a bisorted vector, and on a network of depth one which halves this quantity. This technique produces minimal depth merging networks which, not surprisingly, are Batcher merging networks.
A bisequenced vector ⟨~x, ~y⟩ of width 2n is a (k ± t/2)-zipper iff all of the following conditions hold:
a) t ∈ ℕ+1, k + t/2 ∈ ℤ, |k| ≤ t/2. The parameters t and k are called the tolerance and the offset of the zipper.
b) ~x and ~y are sorted; hence, a zipper is bisorted.
c) ∀ i, j ∈ [0, n): j = i + k − t/2 ⇒ y_j ≤ x_i.
d) ∀ i, j ∈ [0, n): j = i + k + t/2 ⇒ x_i ≤ y_j.
A pair ⟨k, t⟩ that satisfies condition (a) is called (zipper) admissible. In this discussion, the only interesting case where the tolerance t is odd is t = 1. Such a zipper is essentially sorted; namely, it can be transformed into a sorted sequence by a rearrangement of its elements (without any comparisons). By our definition, any ⟨k, t, 2n⟩ as above is associated with a set Γ_{k,t,2n} of weak inequalities of the form α_i ≤ β_j where α, β ∈ {x, y}. A vector ⟨~x, ~y⟩ is a (k ± t/2)-zipper of width 2n if it satisfies all the inequalities of Γ_{k,t,2n}. In our drawings of a zipper, we use directed edges to denote the inequalities of conditions (c) and (d) above; an edge from α_i to β_j denotes that α_i ≥ β_j. All of the inequalities of Γ_{k,t,2n} are clearly preserved under any monotonic key function; moreover, if a bisequenced vector violates an inequality of Γ_{k,t,2n}, it has a 0-1 monotonic image which violates this inequality. Therefore:
Figure 15.1: A (1 ± 2/2)-zipper of width 6
Lemma 15.0.9. For any n ∈ ℕ+1 and an admissible ⟨k, t⟩, the set of (k ± t/2)-zippers of width 2n is monotonically-closed and complete.

We present a number of straightforward properties of zippers. To this end, we remind the reader of the following notations. For two sequences ~a and ~b, ~a · ~b denotes the concatenation of the sequence ~a followed by the sequence ~b. Let ~a ≪ ~b denote that any element of ~a is smaller than or equal to any element of ~b. For a sequence ~a of width n and for i < j < n, let ~a[i, j) denote the subsequence of ~a containing all elements of ~a from position i until (not including) position j.

Lemma 15.0.10. ⟨~x, ~y⟩ is a (k ± t/2)-zipper iff ⟨~y, ~x⟩ is a (−k ± t/2)-zipper.

Lemma 15.0.11. Let ⟨~x, ~y⟩ be a (k ± t/2)-zipper, let ~a, ~b be two sorted sequences of width j such that ~a ≪ ~x, ~y and ~b ≫ ~x, ~y, and let ⟨k − j, t⟩ be admissible. Then ⟨(~a · ~x), (~y · ~b)⟩ is a ((k − j) ± t/2)-zipper.

Lemma 15.0.12. Let ~a, ~b be two sequences of width j, let ~x and ~y be two sequences such that ⟨(~a · ~x), (~y · ~b)⟩ is a (k ± t/2)-zipper, and let ⟨k + j, t⟩ be admissible. Then ⟨~x, ~y⟩ is a ((k + j) ± t/2)-zipper.
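Conditions (b)-(d) are easy to test directly. The Python check below (an illustration, restricted to even tolerance t) treats a bisequenced vector as a pair of lists and also shows the swap symmetry of Lemma 15.0.10 on a small example.

```python
def is_zipper(x, y, k, t):
    """Is <x, y> a (k +/- t/2)-zipper?  Assumes t is even and len(x) == len(y)."""
    n, h = len(x), t // 2
    if list(x) != sorted(x) or list(y) != sorted(y):      # condition (b)
        return False
    for i in range(n):
        j = i + k - h                                     # condition (c)
        if 0 <= j < n and not y[j] <= x[i]:
            return False
        j = i + k + h                                     # condition (d)
        if 0 <= j < n and not x[i] <= y[j]:
            return False
    return True

print(is_zipper([1, 3, 5], [0, 2, 4], 1, 2))    # True: a (1 +/- 1)-zipper of width 6
print(is_zipper([0, 2, 4], [1, 3, 5], -1, 2))   # True: the swapped vector, cf. Lemma 15.0.10
```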
15.1 Tolerance-halvers
Let ⟨t, k⟩ be admissible, let t be even and let n ∈ ℕ. A tolerance-halver with parameters ⟨t, k, 2n⟩ is a mapping that transforms any (k ± t/2)-zipper of width 2n to a (k̃ ± t/4)-zipper, where k̃ = |k| − t/4. Note that the pair ⟨k̃, t/4⟩ is admissible. A network N
is a tolerance-halver for the parameters ⟨t, k, 2n⟩ iff its input/output transformation T^N is a tolerance-halver with those parameters.
We now present a conclusive set for the functionality of tolerance halving. A bisequenced vector v = ⟨~x, ~y⟩ is called arithmetic iff the sequences ~x and ~y contain only integer numbers, both ~x and ~y are arithmetic progressions with difference one, and x_0 = 0. Arithmetic zippers are easy to study due to the following trivial fact:

Lemma 15.1.1. An arithmetic bisequenced vector ⟨~x, ~y⟩ is a (k ± t/2)-zipper iff y_0 ∈ [−t/2 − k, t/2 − k].

Note that the set of arithmetic (k ± t/2)-zippers supports the set of all (k ± t/2)-zippers. This fact and Lemmas 15.0.9 and 13.1.8 imply the following lemma.

Lemma 15.1.2. For any even t and any k, n ∈ ℕ, a network N is a tolerance-halver with parameters (k, t, 2n) iff it has this functionality when the input vectors are further restricted to be arithmetic.

For any t, k, 2n as above, we construct a tolerance-halver denoted h^{k,n}. Note that the tolerance t is missing from h^{k,n}, since our mapping is independent of t. As said, the argument of h^{k,n} is a bisequenced vector, say ⟨~x, ~y⟩. We divide the definition of h^{k,n} into three cases according to the offset k.
• Case 1, 0 ≤ k ≤ n: let h^{k,n}(⟨~x, ~y⟩) ≜ ⟨~u, ~v⟩ where
 ~u = min(~x[0, n − k), ~y[k, n)) · ~x[n − k, n),
 ~v = ~y[0, k) · max(~x[0, n − k), ~y[k, n)).
• Case 2, n < k: let h^{k,n}(⟨~x, ~y⟩) ≜ h^{n,n}(⟨~x, ~y⟩). (Note that h^{n,n}(⟨~x, ~y⟩) = ⟨~x, ~y⟩.)
• Case 3, k < 0.

Assume k > 0. Let ~a be the sequence of width k composed only of the key −∞ and let ~b be the sequence of width k composed only of the key +∞. By Lemma 15.0.11, ⟨~a · ~x, ~y · ~b⟩ is a (0 ± t/2)-zipper. Let ⟨~p, ~q⟩ = H^{0,k+n}(⟨~a · ~x, ~y · ~b⟩). By the case of k = 0, ⟨~p, ~q⟩ is a ((−t/4) ± t/4)-zipper of width 2n + 2k. It is not hard to see that ~p = ~a · ~u and ~q = ~v · ~b. By Lemma 15.0.12, ⟨~u, ~v⟩ is a (k̃ ± t/4)-zipper.
Let ⟨t, k⟩ be admissible. A network Z is a (k ± t/2)-zipper sorter of width 2n iff it sorts any (k ± t/2)-zipper of width 2n. Recursive applications of Lemma 15.1.3 produce a (k ± t/2)-zipper sorter, as stated in the following lemma:

Lemma 15.1.4. Let ⟨t, k⟩ be admissible and let n ∈ ℕ+1. Then there exists a (k ± t/2)-zipper sorter of width 2n and of depth ⌈log t⌉.

Lemma 15.1.4 for the special case of k = t/2 has been observed by Batcher and Lee [2]. It is not hard to see that, when t ≤ n, the (k ± t/2)-zipper sorter provided by Lemma 15.1.4 is of minimal depth.
15.2 The zipper merging technique
The concept of a zipper offers the following "zipper merging technique" for constructing minimal depth merging networks. The simple variant of this technique is based on the fact that any bisorted vector of width 2n is a (0 ± 2n/2)-zipper; therefore it can be sorted by a (0 ± 2n/2)-zipper sorter. As Batcher has observed [2], the resulting network, when n is a power of 2, is exactly his odd/even network. The general and more interesting variant is based on the fact, shown shortly, that any bisorted vector of width 2n can be transformed, by a depth one network, into a (j ± n/2)-zipper for any given j ∈ [−n/2, n/2). By Lemma 15.1.4, the resulting vector can be sorted by a network of depth log n.
To define the above depth one network, we proceed as follows. Let l ∈ [0, n) and let ⟨~x, ~y⟩ be a bisequenced vector of width 2n. Define
 ~x′ ≜ ~x[0, l), ~x″ ≜ ~x[l, n), ~y′ ≜ ~y[0, n − l), ~y″ ≜ ~y[n − l, n),
 ~u′ ≜ min(~x′, ~y″), ~u″ ≜ max(~x″, ~y′), ~v′ ≜ min(~x″, ~y′), ~v″ ≜ max(~x′, ~y″),
 zip_{l,n}(⟨~x, ~y⟩) ≜ ⟨~u′ · ~u″, ~v′ · ~v″⟩.
Clearly there is a unique network of depth one that implements the zip_{l,n} operator (see Figure 15.3). Let G^{l,n} be that network.
Figure 15.3: The G^{3,5} network
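The operator zip_{l,n} itself is a one-liner per output; the following Python sketch (ours) implements the definition above and, per Lemma 15.2.1 below, turns a bisorted vector of width 2n into an ((n/2 − l) ± n/2)-zipper.

```python
def zip_op(x, y, l):
    """The depth-one operator zip_{l,n}: x', x'' = x[:l], x[l:]; y', y'' = y[:n-l], y[n-l:]."""
    n = len(x)
    x1, x2, y1, y2 = x[:l], x[l:], y[:n - l], y[n - l:]
    u = [min(a, b) for a, b in zip(x1, y2)] + [max(a, b) for a, b in zip(x2, y1)]
    v = [min(a, b) for a, b in zip(x2, y1)] + [max(a, b) for a, b in zip(x1, y2)]
    return u, v

# a bisorted vector of width 8, l = 2: the result is a (0 +/- 2)-zipper
print(zip_op([4, 5, 6, 7], [0, 1, 2, 3], 2))   # ([2, 3, 6, 7], [0, 1, 4, 5])
```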
Lemma 15.2.1. Let 0 ≤ l ≤ n. Then ⟨~u, ~v⟩ is a ((n/2 − l) ± n/2)-zipper of width 2n iff ⟨~u, ~v⟩ = zip_{l,n}(⟨~x, ~y⟩) for some bisorted vector ⟨~x, ~y⟩.

Proof. Consider the right to left implication. Let ⟨~u, ~v⟩ = zip_{l,n}(⟨~x, ~y⟩) and let ~x′, ~x″, ~y′, ~y″, ~u′, ~u″, ~v′, ~v″ be as above. We first show that ⟨~u, ~v⟩ is bisorted. Clearly,
the four sequences ~u′, ~u″, ~v′, ~v″ are ascending. By definition ~x′ ≪ ~x″; this implies that ~u′ = min(~x′, ~y″) ≪ max(~x″, ~y′) = ~u″. By symmetry, ~v′ ≪ ~v″. Hence, ⟨~u, ~v⟩ is bisorted.
Consider condition (c) in the zipper definition for t = n and k = n/2 − l, and let i, j ∈ [0, n) where j = i + k − t/2 = i − l. This implies that u_i is a member of ~u″ and that u_i and v_j are the max and min of the same pair of keys; therefore, u_i ≥ v_j. That is, condition (c) holds. By symmetry, condition (d) holds.
To establish the left to right implication we show that any ((n/2 − l) ± n/2)-zipper ⟨~u, ~v⟩ is a fixed point of zip_{l,n}; that is, ⟨~u, ~v⟩ = zip_{l,n}(⟨~u, ~v⟩). Condition (c) of a zipper implies that ~u″ = max(~u″, ~v′) and that ~v′ = min(~u″, ~v′). Condition (d) of a zipper implies that ~u′ = min(~u′, ~v″) and that ~v″ = max(~u′, ~v″). Altogether, ⟨~u, ~v⟩ is a fixed point of zip_{l,n}.

Our zipper merging technique is based on Lemmas 15.2.1 and 15.1.4. The network G^{l,n} transforms any width 2n bisorted vector into an ((n/2 − l) ± n/2)-zipper, which can then be sorted by a depth log n network provided by Lemma 15.1.4.
15.3 Unique zipper sorter
We now show that minimal depth zipper sorters with no degenerate comparators are unique. To this end, we consider minors of zipper sorters having the following functionality. An inserter of width n is a network I that receives a single key x and an ascending sequence ~y of width n − 1, and sorts any such vector. Its input edges are labeled x̂ and ŷ_i for 0 ≤ i < n − 1. As we show, any zipper sorter contains several inserters which can be generated by the pruning operation introduced in Chapter 4.

Let ⟨k, t⟩ be admissible and let Z be a (k ± t/2)-zipper sorter of width 2n. For any edge x̂_i, Z can be pruned into an inserter whose edge x̂ is the edge x̂_i of Z and all the other x̂_j edges of Z are pruned. This pruning is charted by the input vector v^i of Z defined as follows: v^i = ⟨~x, ~y⟩ is the (k ± t/2)-zipper of width 2n composed only of the keys {−∞, 0, +∞}; x_i = 0; any other element α_j of v^i is determined as follows: it is −∞ when the zipper conditions imply α_j ≤ x_i, it is +∞ when they imply α_j ≥ x_i, and it is 0 when the zipper conditions do not settle the relative order of x_i and α_j. Note that the key 0 appears at least once and at most t times in v^i.

Let I^i be the normalized variant of prun(Z, v^i). Since v^i is a (k ± t/2)-zipper, this pruning is honest and I^i is an inserter of width w where w is the number of zeroes in v^i. (Note that w may equal one.) The following lemma follows from a straightforward reachability argument.

Lemma 15.3.1. Let I be an inserter of width n and let ñ = 2^⌈log n⌉. Then:
a) The depth of I is at least log(ñ).

b) If the depth of I is log(ñ) then there is a j such that x̂ and ŷ_j are matched and j ∈ [n − ñ/2 − 1, ñ/2 − 1]. In particular, if n is a power of two, then ŷ_j is exactly the center member of the ŷ sequence.

We use Lemma 15.3.1 to study the structure of minimal depth zipper sorters. To this end, we generalize the definition of an input matching function to zipper sorters and tolerance-halvers; in this case the sequence ~x replaces the sequence ~a and ~y replaces ~b. The following lemma states that the i.m.f. of a minimal depth (k ± t/2)-zipper sorter is determined by k and its width n.

Lemma 15.3.2. Let ⟨k, t⟩ be admissible, let t be a power of 2 and n ≥ t > 1. Let Z be a non-degenerate (k ± t/2)-zipper sorter of depth log(t) and of width 2n. Then imf^Z = imf^{H^{k,n}}.
Proof. By Lemma 15.0.10, we may assume k ≥ 0. We show that two edges, x̂_i and ŷ_j, are matched iff j = i + k. Let i ∈ [0, n), let v^i be defined as above and let P^i ≜ prun(Z, v^i); let w_i denote the width of P^i. Recall that 1 ≤ w_i ≤ t. Let the input edges of P^i be x̂_i and ŷ_{l_i}, ŷ_{l_i+1}, . . . , ŷ_{m_i}; then, l_i = max(0, i + k − t/2 + 1) and m_i = min(n − 1, i + k + t/2 − 1). We consider two cases according to w_i.

Case 1: w_i > t/2. Lemma 15.3.1, applied on the normalized version of P^i and for ñ = t, implies that P^i and Z are of the same depth, log(t); therefore, the edge x̂_i is matched, both in P^i and in Z, with the same ŷ edge, say ŷ_j. First assume that l_i ≠ 0 and that m_i ≠ n − 1. In this case w_i = t. By Lemma 15.3.1, j is the center member of the interval [l_i, m_i]; i.e., j = i + k. Assume now that either l_i = 0 or m_i = n − 1 (or both). The two cases are similar and we consider only the first. Since n ≥ t, the condition that l_i = 0 and w_i > t/2 holds exactly when i ∈ [0, t/2 − k) ≜ Q. Let f : Q → ℤ be defined by f(q) = imf^Z(q) − k. As said, f is total and clearly, f is one-to-one. Let q ∈ Q. Lemma 15.3.1 and the fact that w_q − 2 = m_q = q + k + t/2 − 1 imply that x̂_q is matched with an edge ŷ_j where q + k = m_q + 2 − t/2 − 1 = w_q − t/2 − 1 ≤ j < t/2; hence, f is a permutation of Q and f(q) ≥ q. This implies that f is the identity function; i.e., imf^Z(i) = i + k.

Case 2: w_i ≤ t/2. In this case i ∈ [n − k, n). By Case 1, all input edges ŷ_j for j ∈ [k, n) are matched to previous input edges. Let ⟨~x, ~y⟩ be a (k ± t/2)-zipper. Condition (c) of a zipper and the facts that ⟨k, t⟩ is admissible and that n ≥ t imply that for j ∈ [0, k), x_i ≥ y_{i+k−t/2} ≥ y_{(n−k)+k−t/2} = y_{n−t/2} ≥ y_{t/2} ≥ y_k ≥ y_j. Since Z has no degenerate comparators, the input edge x̂_i is not matched.
Let ⟨k, t⟩ be admissible and let k̃ = |k| − t/4. Recall that h^{k,n} transforms a (k ± t/2)-zipper into a (k̃ ± t/4)-zipper. The next lemma provides some fixed points of h^{k,n} and is useful to establish the uniqueness of certain zipper sorters.

Lemma 15.3.3. Let t be even, let ⟨k, t⟩ be admissible, let k̃ = |k| − t/4 and let ⟨~x, ~y⟩ be a (k̃ ± t/4)-zipper of width 2n. Then h^{k,n}(⟨~x, ~y⟩) = ⟨~x, ~y⟩.

Proof. By Lemma 15.0.10, we may assume k ≥ 0. Let ⟨~u, ~v⟩ = h^{k,n}(⟨~x, ~y⟩). First consider all i ∈ [n − k, n). By definition of h^{k,n}, u_i = x_i and v_{n−i} = y_{n−i}. Next consider all i ∈ [0, n − k). Since ⟨~x, ~y⟩ is a (k̃ ± t/4)-zipper, x_i ≤ y_{i+k̃+t/4} = y_{i+k−t/4+t/4} = y_{i+k}.
Lemma 15.3.4 (zipper sorters are unique). Let t be a power of 2 and n ≥ t. Then there is a unique non-degenerate (k ± t/2)-zipper sorter of depth log(t).

Proof. The existence of such a sorter is provided by Lemma 15.1.4. We prove the uniqueness by induction on t. Let Z be a non-degenerate (k ± t/2)-zipper sorter. For t = 1, any valid input of Z is essentially sorted; therefore, Z has no comparators and its input/output labels are predefined. Assume t > 1. By Lemma 15.3.2, Z is a concatenation of H^{k,n} and a network Z′. By Lemma 15.3.3, when Z receives a valid input then its subnetwork Z′ receives a (k̃ ± t/4)-zipper; furthermore, any (k̃ ± t/4)-zipper can enter Z′ this way. Therefore, Z′ is a non-degenerate (k̃ ± t/4)-zipper sorter of depth log(t) − 1. By the induction hypothesis, Z′ is unique and therefore Z is unique.

We now show that all merging networks produced by the “zipper merging technique” are Batcher merging networks. By Lemma 15.2.1 and arguments which appeared in the last proof, for every k ∈ [0, n) there is exactly one merging network M such that the following holds:
1. imf^M = imf^{G^{k,n}}.
2. M is non-degenerate.
3. Its depth is log(n).
It is not hard to see that imf^{G^{k,n}} is of the form x ↦ x + k (mod n). By Lemma 6.0.16, it is a congruent function and so is imf^M; hence, there is a Batcher merging network which satisfies the above conditions. Since exactly one network satisfies them, M is that Batcher merging network; that is, every network produced by the zipper merging technique is a Batcher merging network.
Chapter 16

Merging by tri-section

This chapter presents a method for constructing merging networks which is based on partitioning, by a depth one network, the bisorted input vector into three sequences, ~x, ~y, ~z, such that ~x ≪ ~y ≪ ~z. This allows us to sort each of these sequences separately, and in our technique the resulting network is of minimal depth log(2n); furthermore, the sequences ~x and ~z are sorted by networks of depth ⌈log|~x|⌉ and ⌈log|~z|⌉, respectively. There is also a certain freedom concerning the length of these sequences; in particular, the length of ~x or ~z can be any number in [0, n) where 2n is the width of the network.

This technique is of interest by itself and has the following practical advantage. Let the length of ~x or ~z be k, which is much smaller than n; the k smallest or largest keys exit the merging network after a delay of at most 1 + ⌈log k⌉ comparators; i.e., the resulting merging network produces the k smallest keys or the k largest keys faster than the well-known bound of log(2n).

This technique also answers the following question. Most of the published oblivious merging algorithms are recursive (or can be presented as recursive) and produce minimal depth merging networks. A question arises whether these two properties are dependent on each other or whether there exists a minimal depth merging network (without degenerate comparators¹) which has no recursive structure. We show that the latter is correct.

We actually present two tri-section methods. In the symmetric method, |~x| = |~z| and this number is a power of two. This is the only restriction on the length of these sequences; moreover, ~x and ~z are sorted by isomorphic networks. In the asymmetric method, usually |~x| ≠ |~z| and the only restriction regarding the length of the above three sequences is |~y| = n.
¹ Producing such a network with degenerate comparators is trivial.
In both tri-section methods, the lowest |~x| keys and the highest |~z| keys exit the merging network after a delay of at most ⌈log|~x|⌉ + 1 and ⌈log|~z|⌉ + 1 comparators, respectively; i.e., the longest path ending at an output edge ô_i corresponding to one of the |~x| smallest keys or to the |~z| largest keys has at most the above number of comparators. The two best known Batcher merging networks (the strictly-cross and strictly-parallel networks) are rather weak in this regard. In the former, every path from input to output goes through exactly log(2n) comparators. In the latter, the lowest and highest keys come out after only one comparator; for any other output edge, there is a path of length log(2n) comparators which ends at this edge.

In our tri-section techniques the ~x, ~y and ~z sequences can be sorted s.t. the resulting merging network is a Batcher merging network. It is not clear whether these sequences can be sorted s.t. the resulting merging network is not a Batcher merging network, while maintaining the above depth restrictions. However, if we drop the restriction of sorting ~x by a network of depth ⌈log|~x|⌉, we can produce minimal depth merging networks which are not Batcher merging networks, as shown ahead. That is, for any n a power of two greater than 8, there is a minimal depth merging network of width 2n which is not a Batcher merging network; however, we do not know of any such network having useful or interesting properties.
16.1 The asymmetric tri-section method
We now define the tsec operator which partitions a bisorted vector into three sequences ~x, ~y and ~z as above. Let k ∈ [0, n) and let ⟨~a, ~b⟩ be a bisorted vector of width 2n. Define

~a′ ≜ a[0, k), ~a′′ ≜ a[k, n), ~b′ ≜ rev(b[0, k)), ~b′′ ≜ rev(b[k, n)),
~x ≜ rev(min(~a′, ~b′)), ~y ≜ max(~a′, ~b′) · min(~a′′, ~b′′), ~z ≜ rev(max(~a′′, ~b′′)),

and finally

tsec_{k,n}(⟨~a, ~b⟩) ≜ ⟨~x, ~y, ~z⟩.

Clearly there is a unique network of depth one that implements the tsec_{k,n} operator. Let T^{k,n} be that network. See Figure 16.1.

Lemma 16.1.1. Let ⟨~a, ~b⟩ be a bisorted vector of width 2n, let k ∈ [0, n) and let ⟨~x, ~y, ~z⟩ = tsec_{k,n}(⟨~a, ~b⟩). Then

1. ~x ≪ ~y ≪ ~z.
2. |~x| = k, |~y| = n, |~z| = n − k.
3. ~x is ascending-descending and ~z is descending-ascending.
4. ~y is bitonic.
Figure 16.1: The T^{4,10} network
5. If n is a power of 2, then imf^{T^{k,n}} is a congruent function.
Proof. Items (1) and (3) follow from the generalized 0-1 principle (Lemmas 13.1.8 and 13.1.7). Item (2) follows immediately from the definition above. To show item (5), it is not hard to see that imf^{T^{k,n}}(i) = n − 1 − i + k (mod n); hence, imf^{T^{k,n}} is a composition of the order reversing permutation and the cyclic rotation by k permutation. By Lemma 6.0.16, these two permutations are congruent functions and by Lemma 6.0.15, imf^{T^{k,n}} is a congruent function.

It remains to show item (4). The conventional manner to prove such a claim is via the 0-1 principle; however, this leads to many special cases which need to be verified. In contrast, we use the set of permutation sandwich vectors which supports the set of bisorted vectors. We use Lemma 13.1.8 and the fact that the set of bitonic sequences is complete. Note that a permutation sandwich vector ⟨~a, ~b⟩ is determined by the key a_0. There are two (overlapping) cases; either a_0 ≤ k or a_0 ≥ k. The two cases are similar and we consider only the first. Let j = a_0 ≤ k. Since ⟨~a, ~b⟩ is a sandwich, ~b[0, j) ≪ ~a ≪ ~b[j, n); therefore, the following equations hold:

y[0, k − j) = rev(b[j, k)),
y[k − j, k) = max(a[k − j, k), rev(b[0, j))) = a[k − j, k),
y[k, n) = min(a[k, n), rev(b[k, n))) = a[k, n).

Altogether, ~y = rev(~b[j, k)) · ~a[k − j, n), which is a concatenation of a descending and an ascending sequence; i.e., ~y is bitonic. (The sequence ~y is not always descending-ascending; in the other case, where a_0 ≥ k, the sequence ~y comes out ascending-descending.)
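The tsec operator is easy to exercise on concrete keys. The following Python sketch (the name tsec and the self-test are ours) applies the definition at the beginning of this section to key sequences rather than to the network T^{k,n} and checks items (1) and (2) of Lemma 16.1.1 on a random bisorted input.

# Value-level sketch of the tsec_{k,n} operator (not the network T^{k,n}).
# a and b are the two ascending halves of a bisorted vector of width 2n.
def tsec(a, b, k):
    n = len(a)
    assert len(b) == n and 0 <= k < n
    a1, a2 = a[:k], a[k:]                      # a', a''
    b1, b2 = b[:k][::-1], b[k:][::-1]          # b' = rev(b[0,k)), b'' = rev(b[k,n))
    x = [min(p, q) for p, q in zip(a1, b1)][::-1]
    y = [max(p, q) for p, q in zip(a1, b1)] + [min(p, q) for p, q in zip(a2, b2)]
    z = [max(p, q) for p, q in zip(a2, b2)][::-1]
    return x, y, z

if __name__ == "__main__":
    import random
    n, k = 10, 4
    keys = random.sample(range(1000), 2 * n)
    a, b = sorted(keys[:n]), sorted(keys[n:])
    x, y, z = tsec(a, b, k)
    assert (len(x), len(y), len(z)) == (k, n, n - k)
    # Tri-section: every key of x is <= every key of y, which is <= every key of z.
    assert all(p <= q for p in x for q in y)
    assert all(p <= q for p in y for q in z)
    assert sorted(x + y + z) == sorted(keys)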
We henceforth assume that n is a power of two and show how to sort the sequences provided by the previous lemma. The sequence ~y is bitonic and of length n; therefore, by Lemma 14.0.6, it can be sorted by (the unique) bitonic sorter of depth log(n). For the sequence ~x consider the following fact. Any ascending-descending sequence can be expanded into a bitonic sequence of a desired width by adding keys of value −∞ at the end of the sequence. This implies that any wide enough bitonic sorter can be pruned into an ascending-descending sorter of a smaller given width; therefore, ~x can be sorted by a network of depth ⌈log|~x|⌉. Note that the fact that ~x is ascending-descending, rather than merely bitonic, is critical; there is no way to expand a bitonic sequence into a wider bitonic sequence by adding the keys −∞ and +∞ at predefined positions; moreover, when n is not a power of two, the minimal depth of a width n bitonic sorter might exceed ⌈log n⌉ (see Lemma 14.0.7). By symmetry, the sequence ~z can be sorted in a similar manner.

In this merging network the |~x| lowest keys and the |~z| highest keys exit the network after a delay of ⌈log|~x|⌉ + 1 and ⌈log|~z|⌉ + 1 comparators, respectively. This is useful when |~x| or |~z| are small relative to n. It is not hard to see that the resulting merging network is AMOP and has no degenerate comparators w.r.t. merging (and if it has such degenerate comparators then they can be bypassed, as per Lemma 4.1.2); therefore, by Theorem 10.0.31, our asymmetric tri-section merging networks are Batcher merging networks.

If we drop the request that the sequence ~x is sorted by a network of depth ⌈log|~x|⌉, we can construct a non-degenerate, minimal depth merging network M which is not a Batcher merging network. This construction is based on the fact that when |~x| is small w.r.t. n, the sequence ~x can be sorted in an inefficient manner (by a network of excessive depth) while maintaining the minimal depth of the entire merging network. Since we are free to choose how to sort ~x with very few restrictions, we may sort ~x using a non-AMOP network. For |~x| = 3 such a network X is depicted in Figure 16.2. Let M be the merging network constructed by the asymmetric tri-section method as described above except that M sorts the sequence ~x using the network X. When n > 8, M is a minimal depth merging network. Note that none of the comparators of X is degenerate in M (w.r.t. merging); moreover, the network X is not AMOP; therefore, by Theorem 10.0.31, M is not a Batcher merging network.

Assume that |~x| is large and still much smaller than n; then it is possible to sort ~x by a network which clearly has no recursive structure of the “divide and conquer” form; two good examples are Knuth's bubble-sort network and Knuth's odd-even transposition sort [10, pp. 223, 241]. This provides a non-degenerate merging network of minimal depth having an arbitrarily large subnetwork which possesses no recursive structure.
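The padding argument used above for ~x can be illustrated directly on key values: append −∞ keys to bring an ascending-descending sequence up to the next power-of-two width, sort the padded (and still bitonic) sequence with the standard bitonic merge, and discard the −∞ keys, which come out first. The following Python sketch does exactly that; the function names are ours and the network-level pruning itself is not modeled.

def bitonic_merge(v, lo=0, length=None, ascending=True):
    """Sort v[lo:lo+length] in place; assumes it is bitonic and length is a power of two."""
    if length is None:
        length = len(v)
    if length > 1:
        half = length // 2
        for i in range(lo, lo + half):                 # the half-cleaner stage
            if (v[i] > v[i + half]) == ascending:
                v[i], v[i + half] = v[i + half], v[i]
        bitonic_merge(v, lo, half, ascending)          # both halves are again bitonic
        bitonic_merge(v, lo + half, half, ascending)

def sort_ascending_descending(x):
    """Sort an ascending-descending sequence by padding it into a bitonic one."""
    w = len(x)
    padded_w = 1 << (w - 1).bit_length()               # next power of two
    v = list(x) + [float("-inf")] * (padded_w - w)     # still ascending-descending
    bitonic_merge(v)
    return v[padded_w - w:]                            # drop the -inf padding keys

if __name__ == "__main__":
    x = [1, 4, 9, 7, 3, 2]                             # ascending, then descending
    assert sort_ascending_descending(x) == sorted(x)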
Figure 16.2: The network X – an ascending-descending sorter of width 3 and of depth 3 which is not AMOP.
16.2 The symmetric tri-section method
Nakatani et al. [12] introduced an elegant way to construct a bitonic sorter which is based on the following lemma.

Lemma 16.2.1 ([12]). Let ~x be a bitonic sequence of width j · k and let M be the j × k matrix having the ~x sequence in a row major fashion. Then:

1) Every row of M is a bitonic sequence.
2) Every column of M is a bitonic sequence.

Let M′ be derived from M by sorting each column separately. Then:

3) Every row of M′ is a bitonic sequence.
4) Every element of M′ is smaller than or equal to any element of the next row.

Nakatani's matrix technique to sort bitonic sequences of length j × k is based on the above lemma and is composed of three stages:

Stage 1: Arrange the bitonic sequence in a j × k matrix in a row major fashion.
Stage 2: Independently, sort every column by a bitonic sorter.
Stage 3: Independently, sort every row by a bitonic sorter.

Following these stages the resulting matrix is sorted in a row major fashion. Assume henceforth that j and k are powers of two and that the bitonic sorters used in stage (2) and stage (3) are of minimal depth. The depth of the entire network is log(j) + log(k), which is minimal. The technique of Nakatani et al. is new and elegant; however, by Lemma 14.0.6 and the fact that their bitonic sorter is of minimal depth, their sorter is the Batcher bitonic sorter; that is, their technique does not produce a new bitonic sorter but sheds new light on the Batcher bitonic sorter.
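The three stages translate directly into a value-level sketch (Python; the function name is ours). Here the built-in sort stands in for the bitonic sorters of stages (2) and (3): since each column and, afterwards, each row is bitonic, any sorter produces the same key values, and the power-of-two restriction on j and k matters only for the depth of the actual network.

# Value-level sketch of the matrix technique of Nakatani et al.: arrange a
# bitonic sequence row-major in a j x k matrix, sort every column, then sort
# every row; the matrix is then sorted in row-major order.
def matrix_bitonic_sort(x, j, k):
    assert len(x) == j * k
    m = [list(x[r * k:(r + 1) * k]) for r in range(j)]   # stage 1: row-major matrix
    for c in range(k):                                   # stage 2: sort every column
        col = sorted(m[r][c] for r in range(j))
        for r in range(j):
            m[r][c] = col[r]
    for r in range(j):                                   # stage 3: sort every row
        m[r].sort()
    return [key for row in m for key in row]             # read out in row-major order

if __name__ == "__main__":
    x = [2, 5, 11, 17, 23, 20, 14, 8, 6, 3, 1, 0]        # a bitonic sequence of width 3*4
    assert matrix_bitonic_sort(x, 3, 4) == sorted(x)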
Our² symmetric tri-section technique is a specialization of Nakatani's method to the special case where the input is a bitonic sequence of a special form; namely, of the form ~a · rev(~b) where ⟨~a, ~b⟩ is bisorted. Our technique differs from Nakatani's only in stage (2). We use the fact that a column, after stage (1), is not only bitonic but also bisorted; hence it can be sorted by any merging network. We choose a Batcher merging network in which the lowest and highest keys exit after only one comparator (for example, the strictly-parallel network).

Let ~x and ~z be the keys of the first and last row, respectively, after the above stage (2); let N′ be the depth one network which produces ~x and ~z (each comparator of N′ produces a member of these sequences). Clearly, N′ partitions the keys into the sequences ~x, ~z and ~y, where ~y is composed of all the other keys. Since the keys of ~x are compared, from this stage onward, only among themselves, and the same holds for ~z, the keys are tri-sected. As promised, the smallest k and the largest k keys exit the entire merging network after a delay of exactly 1 + log(k) comparators. It is not hard to see that the resulting merging network is both AMOP and non-degenerate; therefore, by Theorem 10.0.31, it is a Batcher merging network.
² Batcher and Liszka [3] presented an oblivious merging algorithm based on different ideas but actually similar to the technique we present now. They did not observe that this is a tri-section technique and that it has the practical advantage of producing the k smallest keys and k largest keys faster than log(2n).
Bibliography

[1] Batcher K.E. Sorting networks and their applications. Proc. AFIPS Spring Joint Computer Conf., Vol. 32, 1968, pp. 307-314.

[2] Batcher K.E. & Lee D.L. A Multiway Merge Sorting Network. IEEE Trans. on Parallel and Distributed Systems, Vol. 6, 1995, pp. 211-215.

[3] Batcher K.E. & Liszka K.J. A Modulo Merge Sorting Network. Fourth Symposium on the Frontiers of Massively Parallel Computation, 1992, pp. 164-169.

[4] Becker R.I. & Nassimi D. & Perl Y. The Generalized Class of g-Chain Periodic Sorting Networks. IEEE, Vol. 32, 1994, pp. 424-432. (Earlier version appears in SPAA 1993.)

[5] Bender E.A. & Williamson S.G. Periodic Sorting Using Minimum Delay Recursively Constructed Merging Networks. Electronic Journal of Combinatorics, Vol. 5, 1998.

[6] Bilardi G. Merging and Sorting Networks with the Topology of the Omega Network. IEEE Trans. on Computers, Vol. 38, 1989, pp. 1396-1403.

[7] Cormen T.H. & Leiserson C.E. & Rivest R.L. & Stein C. Introduction to Algorithms. McGraw-Hill.

[8] Dowd M. & Perl Y. & Rudolph L. & Saks M. The Periodic Balanced Sorting Network. JACM, Vol. 36, 1989, pp. 738-757.

[9] Hong Z. & Sedgewick R. Notes on merging networks. Proceedings of the fourteenth annual ACM symposium on Theory of Computing, 1982, pp. 296-302.

[10] Knuth D.E. The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley, 1973.

[11] Miltersen P.B. & Paterson M. & Tarui J. The Asymptotic Complexity of Merging Networks. Foundations of Computer Science, 1992, pp. 236-246.
[12] Nakatani T. & Huang S.T. & Arden B.W. & Tripathi S.K. k-Way Bitonic Sort. IEEE Trans. on Computers, Vol. 38, 1989, pp. 283-288.

[13] Leighton F.T. Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann, 1991.

[14] Rajasekaran S. & Sandeep S. A generalization of the 0-1 principle for sorting. IPL, Vol. 94, 2005, pp. 43-47.

[15] Rice W.D. Continuous Algorithms. Topology Appl., Vol. 85, 1998, pp. 299-318.

[16] Van Voorhis D.C. An Economical Construction for Sorting Networks. Proc. AFIPS NCC, 1974, pp. 921-927.

[17] Dalpiaz F. & Rizzi R. Two algorithms, One sorting network. (Personal communication. Available from "[email protected]".)