A robust sorting network - CiteSeerX

Carnegie Mellon University

Research Showcase @ CMU Computer Science Department

School of Computer Science

1983

A robust sorting network Larry Rudolph Carnegie Mellon University

Follow this and additional works at: http://repository.cmu.edu/compsci

This Technical Report is brought to you for free and open access by the School of Computer Science at Research Showcase @ CMU. It has been accepted for inclusion in Computer Science Department by an authorized administrator of Research Showcase @ CMU. For more information, please contact [email protected].

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS:

The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying of this document without permission of its author may be prohibited by law.

CMU-CS-84-104

Pittsburgh, PA 15213-3890

A Robust Sorting Network

Larry Rudolph
Department of Computer Science
Carnegie-Mellon University
August 1983

Abstract

Beginning with the recently introduced 'balanced sorting network' that sorts an input vector of size $N = 2^n$ and consists of $n$ identical blocks, where each block is composed of $n$ phases of $N/2$ comparators per phase, we propose a shuffle-exchange type layout consisting of a single block with the output recirculated back as input until sorting is achieved. The main advantage of the proposed design is that no comparator in the network is critical, in the sense that any faulty comparator can be bypassed without disturbing the functionality of the network (just its speed). The novelty of the design is that the robustness is derived from the underlying algorithm. The network will sort in the presence of many faulty comparators. Moreover, of the $(N \log N)/2$ comparators, only $N$ pairs of comparators are critical. That is, the network fails only when both comparators in any of these pairs fail. These results enable one to build large sorting networks on a single wafer so that a high percentage of the fabricated wafers can be used; some of the wafers will sort very quickly (the ones with no faulty components), most will sort at somewhat slower than optimal speeds, but only a few will fail to be useful as sorting networks (due to too many, badly placed faults).


The advent of VLSI technology is impacting almost all aspects of society. Unfortunately, one of the major problems facing VLSI technology, and one that increases manufacturing costs, is the low yield in mass production of chips and wafers. That is, although it is cheap to produce large quantities of a single chip or wafer, only a small fraction of them function correctly. The rest are rendered useless due to random flaws introduced during the fabrication process. The flaws tend to have the most adverse effect on the active elements (e.g., gates, transistors) and less effect on the wires. One way of increasing the yield is by employing designs that function despite fabrication flaws.

We present a layout for a sorting network that can withstand many faulty components. Although a small network can usually fit on a single chip, a much larger network could be made to fit on a single wafer. Previously known sorting networks shared the property that almost all the components (comparators) are crucial, and so large networks produced on a single wafer would be expensive due to the resulting low yield. This is not the case with our layout; most faulty comparators can be bypassed and still allow the network to sort, albeit somewhat slower.

Related work can be classified into two different areas, sorting networks and fault-tolerant systems. There has been much research in sorting networks and the related area of parallel sorting algorithms. Batcher [Batcher 68] introduced the Bitonic network as well as the Odd-Even network (see also [Knuth 68]), both requiring $O((\log N)^2)$ steps to sort input vectors of size $N$ (see also Hong and Sedgewick [Hong and Sedgewick 82] and Perl [Perl 83] for additional insights into such networks). There has also been much research in sorting algorithms for parallel processors, some parts of which are relevant to sorting networks, for example, Valiant [Valiant 75], Borodin and Hopcroft [Borodin and Hopcroft 82], and Kruskal [Kruskal 83]. Reif and Valiant [Reif and Valiant 83] give an algorithm that requires $O(\log N)$ expected time to sort on a particular type of network, whereas Ajtai et al. [Ajtai et al 83] recently showed that there are $O(N \log N)$ sized networks that can sort in $O(\log N)$ steps, although the large constants make their network impractical. Winslow and Chow [Winslow and Chow 83] review and compare various sorting machines that make different assumptions about how the input and the output are connected to the machine.

The structure of the paper is as follows: We first describe the 'balanced sorting network', which has recently been introduced [Dowd et al 83a, Dowd et al 83b]. The 'crucial comparators' are then identified and their effect analyzed. The third section first reviews the layout proposed in the introductory paper and then modifies the layout in two ways: first to reduce the number of critical comparators and then to eliminate all of them. An analysis of the increased yield is also presented.

1. The Balanced Sorting Network

In this section we review the design and layout of the "balanced sorting network" introduced by Dowd et al. [Dowd et al 83a]. The network requires $(\log N)^2$ stages of $N/2$ comparators to sort $N$ items and consists of a sequence of $\log N$ identical merging blocks, where each block possesses a highly regular, recursive design (see Figure 1-1 for a merging network of size 16). A novel aspect of the network is that the blocks are identical, not smaller recursive versions as in those of Batcher [Batcher 68].

Specifically, a basic unit of the network is a two-input, two-output comparator transforming the arbitrary order of the two input elements into nondecreasing order. Each phase of the balanced merging network is composed of $N/2$ of these comparators, with the first phase comparing elements $x(0)$ with $x(N-1)$, $x(1)$ with $x(N-2)$, ..., $x(N/2-1)$ with $x(N/2)$, where $x$ is the input vector. We take the approach of an 'oblivious' algorithm: even though the first phase does not guarantee a partition of the input into two halves, we pretend that it does and so continue to apply the same procedure to both halves of the output of the first phase recursively. Thus, $\log N$ phases comprise a merging network. We number the phases from 1 to $\log N$. Figure 1-1 depicts a balanced merging network for $N = 16$ elements using Knuth's [Knuth 68] comparator-network representation, where horizontal lines represent the input lines $x(i)$, $0 \le i < N$, and vertical lines represent comparisons between the elements on the corresponding input lines.

Figure 1-1: A Balanced Merging Block of Size 16 (input lines 0 to 15, Phases 1 to 4)

Since the output of a merging network (from now on we call such a merging network a block) may not be sorted, we continue to apply these blocks until sorting is obtained. (Figure 1-2 shows the full balanced sorting network for size 8.) Each block, as its name implies, is a merging network; however, it is not easily observed exactly what is being merged. The first phase of the merging network applied to a recursively balanced vector partitions the elements so that the $N/2$ smallest elements are in the first half of the vector and the $N/2$ largest elements are in the second half of the vector. Moreover, each half is recursively balanced so that each subsequent phase acts recursively to sort the input.

The balanced sorting network is very similar to the bitonic and the odd-even sorting networks introduced by Batcher [Batcher 68]. These networks also consist of $O((\log N)^2)$ stages with a stage composed of $N/2$ comparators. Moreover, they are both built upon merging networks; however, despite their similarity, there is no permutation between the input lines of either of Batcher's two networks and the balanced sorting network. The differences between the balanced network and those of Batcher are evident from the following lemma, which is satisfied by neither the odd-even nor bitonic merge networks.

Figure 1-2: A Balanced Sort Network of Size 8 (Input, Block 1, Block 2, Block 3, Sorted Output)

Lemma 1:
i) If no exchange occurs during a block, then the input of the block is sorted.
ii) A network which sorts any input can be constructed by serially composing a finite number of blocks.

Proof: i) A single block performs all comparisons $x(i-1)$ with $x(i)$ for $1 \le i < N$ (among others), and thus if no exchange occurs the input must be sorted. ii) Each exchange decreases the number of inversions (that is, pairs of elements which are out of order). Since a permutation has at most $\binom{N}{2}$ inversions, part (i) implies that $\binom{N}{2}$ blocks suffice to sort. □

In addition to differentiating between the sorting networks, this lemma is important in two respects. It demonstrates that only some of the comparisons are needed to sort. It also suggests a procedure for deciding when the output is sorted. The following implementation strategy, which is assumed throughout the rest of the paper, arises from these observations. Since a succession of identical blocks is required, only one block is actually needed. The output of the block is recirculated back as input (see Figure 1-3). Moreover, by the first part of the lemma, the decision to recirculate can be based on whether any exchanges occurred within a block. Not only does this allow a faster completion time for certain input vectors and the elimination of a $\log N$ counter, but it also enables a more fault-tolerant network, as will be shown later.
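The recirculating strategy can be illustrated in software. The following is a minimal sketch, assuming the input size is a power of two; the function names (`block_phases`, `run_block`, `recirculating_sort`) are hypothetical and chosen for this illustration only. It builds the comparator pairs of one balanced merging block and sends the data through that single block repeatedly until a pass makes no exchange.

```python
def block_phases(n_items):
    """Comparator pairs (i, j), i < j, for each phase of one balanced merging block.

    Assumes n_items is a power of two. Phase 1 pairs x(k) with x(N-1-k);
    each later phase applies the same mirror pairing inside each half.
    """
    phases = []
    size = n_items
    while size >= 2:
        phase = [(start + k, start + size - 1 - k)
                 for start in range(0, n_items, size)
                 for k in range(size // 2)]
        phases.append(phase)
        size //= 2
    return phases


def run_block(x, phases):
    """Send x through one block in place; report whether any exchange occurred."""
    exchanged = False
    for phase in phases:
        for i, j in phase:
            if x[i] > x[j]:
                x[i], x[j] = x[j], x[i]
                exchanged = True
    return exchanged


def recirculating_sort(values):
    """Recirculate through a single block until a pass makes no exchange.

    Returns the sorted list and the number of passes, counting the final
    exchange-free pass that detects termination.
    """
    x = list(values)
    phases = block_phases(len(x))
    passes = 1
    while run_block(x, phases):
        passes += 1
    return x, passes
```

The loop terminates because every exchange removes at least one inversion, and by part (i) of the lemma an exchange-free pass means the data is sorted.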

2. Critical Comparators

In this section we identify the 'critical comparators' of the recirculating network (see Figure 1-3). Since many fabricated chips, or more significantly, wafers, are likely to contain faulty comparators, it is desirable that the fabricated product still sort once the faulty comparators are bypassed. Recall that we are considering a recirculating network consisting of one block of comparators with the output recirculated back whenever there is at least one exchange occurring in the block.

Figure 1-3: A Single Block Balanced Sort Network of Size 8 (Input, Output)

We compare a complete balanced sorting network with one missing some of its comparators. The term iteration is used to indicate the movement of data through one block of the complete network and the term pass refers to the movement of data through one block of the incomplete network. When some comparators are missing, there will usually be more passes than iterations to produce the sorted output for a given input.

Consider a size 8 network with an input vector $x$ consisting of all zeros except for a one in the fourth position, i.e. $x(3) = 1$ or $x = [0,0,0,1,0,0,0,0]$ (the result after the sort should be $[0,0,0,0,0,0,0,1]$). It is easy to see that the $x(3):x(4)$ comparison (a first phase comparison) is crucial. Without it, the 1 will never change its position. On the other hand, the $x(4):x(7)$ comparator (a second phase comparator) is not crucial. With it bypassed, in the first block the first phase exchanges $x(3)$ with $x(4)$, the next phase does nothing, and the third phase exchanges $x(4)$ with $x(5)$. A second pass through the block will produce the sorted output using the $x(5):x(6)$ comparator in the second phase and the $x(6):x(7)$ comparator in the third phase.

Before presenting the main theorem, we introduce some notation and quote a few lemmas proved in [Dowd et al 83a].

• Greek letters represent a string of bits with a superscript to indicate the number of bits. For example, $\alpha^{k-j}\beta^{j-1}1$ indicates a string of $k$ bits, the first (high order) $k-j$ bits of which are denoted by $\alpha^{k-j}$ and the last bit is 1. We omit superscripts of 1.

• Comparators are specified by the indices of the lines they compare; the two indices are separated by a colon (:). For example, the first phase of a size 16 network contains the comparator 0:15. The indices will often be written in a binary template form in which some of the bits are left unspecified in order to represent a set of comparators.
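The size 8 example with a single one in position 3 can be replayed in a few lines. This is a sketch under the assumption that a bypassed comparator simply passes both values through unchanged; the names `block_phases` and `run_with_bypass` are hypothetical helpers for this illustration.

```python
def block_phases(n_items):
    """Comparator pairs (i, j), i < j, per phase of one balanced merging block."""
    phases = []
    size = n_items
    while size >= 2:
        phases.append([(start + k, start + size - 1 - k)
                       for start in range(0, n_items, size)
                       for k in range(size // 2)])
        size //= 2
    return phases


def run_with_bypass(values, bypassed=frozenset()):
    """Recirculate through one block, skipping the comparators in `bypassed`.

    Returns the final vector and the number of passes in which at least one
    exchange occurred (the terminating exchange-free pass is not counted).
    """
    x = list(values)
    phases = block_phases(len(x))
    working_passes = 0
    while True:
        exchanged = False
        for phase in phases:
            for i, j in phase:
                if (i, j) in bypassed:
                    continue  # assumed behaviour: a bypassed comparator does nothing
                if x[i] > x[j]:
                    x[i], x[j] = x[j], x[i]
                    exchanged = True
        if not exchanged:
            return x, working_passes
        working_passes += 1


x = [0, 0, 0, 1, 0, 0, 0, 0]
print(run_with_bypass(x))                      # complete network
print(run_with_bypass(x, bypassed={(4, 7)}))   # noncritical comparator bypassed
print(run_with_bypass(x, bypassed={(3, 4)}))   # critical comparator bypassed
```

With the 4:7 comparator bypassed the vector is still sorted, at the cost of one more working pass; with the 3:4 comparator bypassed no exchange ever occurs and the one stays in position 3.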

We first identify the critical comparators with the following definition and later show that these are indeed the only comparators that are required for sorting.

Definition 2: In a balanced sorting network of size $N = 2^n$, the critical comparators of the $(j+1)$st phase are of the form

$$\alpha^{j}01^{n-j-1} : \alpha^{j}10^{n-j-1}.$$

The other $(N \log N)/2 - N$ comparators are referred to as noncritical.

Proposition 3: In a size $N = 2^n$ balanced sorting network, the $j$th phase compares all pairs of entries whose indices have identical high-order (leftmost) $j-1$ bits and complementary low-order (rightmost) $n-(j-1)$ bits. In the above notation, comparisons in the $j$th phase are between elements whose indices are of the form

$$\alpha^{j-1}\beta^{n-j+1} : \alpha^{j-1}\bar{\beta}^{n-j+1}.$$
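Proposition 3 can be spot-checked numerically. The sketch below assumes the mirror pairing of Section 1 (phase $j$ pairs each line with its mirror image inside a sub-block of size $N/2^{j-1}$); `phase_pairs` and `matches_proposition_3` are hypothetical names introduced for this check.

```python
def phase_pairs(n_bits, phase):
    """Comparator pairs of the given phase (1-based) in a network of size 2**n_bits."""
    total = 1 << n_bits
    size = 1 << (n_bits - phase + 1)   # sub-block size handled by this phase
    return [(start + k, start + size - 1 - k)
            for start in range(0, total, size)
            for k in range(size // 2)]


def matches_proposition_3(i, j, n_bits, phase):
    """True if i and j agree on the leftmost phase-1 bits and are
    complementary on the rightmost n_bits-(phase-1) bits."""
    low = n_bits - (phase - 1)
    low_mask = (1 << low) - 1
    same_high = (i >> low) == (j >> low)
    complementary_low = ((i ^ j) & low_mask) == low_mask
    return same_high and complementary_low


n_bits = 4
for phase in range(1, n_bits + 1):
    assert all(matches_proposition_3(i, j, n_bits, phase)
               for i, j in phase_pairs(n_bits, phase))
```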

Definition 4: The level $i$ chains are all those with the same rightmost $i$ bits. The level $i$ cochains are all those with the same or complementary rightmost $i+1$ bits.

Lemma 5: Applying the $i$th phase ($j < i < n$) to an input whose level $n-(j-1)$ cochains are sorted preserves this property.

Lemma 6: Applying the $i$th phase ($j+1$ [...]

• ( p h a s e r)

a\Lv : a\iv

and

auv :

In order to differentiate the value in the vector at each phase, we superscript the vector with a phase number $i$ for the value before the $i$th phase comparison.

A s a result of the p h a s e k c o m p a r i s o n w e h a v e x * ( a / i ? ) < x (a]iv). By a previous lemma, this is still true at p h a s e r. T h i s , along with t h e fact that after a c o m p a r i s o n , the smaller value is p l a c e d into the position with the smaller index and the larger value into t h e position with the larger index, w e h a v e : x ( a / i » » ) = min { x ^ a / i r ) , x ( a ^ i » ) } + 1

r + 1

k+l

r

S x (afi*0 r

£ x (ujZjr) r


j: B y lemma 6, the p h a s e s after j+1

th

p h a s e a n d c o n s i d e r the

j

th

h a v e no effect.

Case $k < j$: Starting with level $n-(j-1)$ sorted chains, we must show that after another iteration it must be the case that the level $n-j$ chains are sorted. The first $k-1$ phases are the same in both the networks. By the previous lemma (Lemma 7) the effect of phase $k$ is accomplished by the end of the pass. By Lemma 5 it is clear that the first $k$ phases of the subsequent pass do no harm. Therefore by the $(k+1)$st phase of the second pass, the items have the desired property required at the $(k+1)$st phase of the corresponding iteration. At most twice the number of passes are needed to compensate for the removal of one noncritical comparator.

The next question to ask is what happens if two noncritical comparators are bypassed. Let $NC_1$ and $NC_2$ be two noncritical comparators and let $COMP_1$, $COMP_2$, $COMP_3$ be the three comparators used to compensate for $NC_1$. If $NC_2$ is not one of these three then no extra passes are required. On the other hand, additional passes may be required if $NC_2$ is one of these three. Suppose $NC_2$ is $COMP_2$. Then no extra passes are required, since by the end of the block the effect of $COMP_2$ will have been accomplished and thus the same for the effect of $NC_1$. However, if $NC_2$ is $COMP_1$ then the effect of $COMP_1$ may not occur until the end of the block and so an additional pass will be needed for $COMP_2$ and $COMP_3$ to have an effect. Thus in the worst case, three times as many passes will be needed if two noncritical comparators are removed.

(ii) It is clear by inspection that the critical comparators are indeed critical. Consider a phase $j+1$ critical comparator. It has the form $\alpha^{j}01^{n-j-1} : \alpha^{j}10^{n-j-1}$. In later phases, say phase $r > j$, the smaller indexed line will be compared to lines smaller than it (i.e. $\delta^{r-1}0^{n-r+1} : \alpha^{j}01^{n-j-1}$). Thus if the maximum key is on line $\alpha^{j}01^{n-j-1}$, it will never be swapped by any other comparison.

One may wonder why Lemma 7 does not apply in this case. Upon careful examination, it is clear that for a critical comparator $\beta^{n-k}$ cannot be rewritten in the required form. □

In general, when $c$ noncritical comparators are removed, a factor of at most $c$ increase in the number of passes will be required. Consider what happens if all the noncritical comparators from the first phase are removed, leaving only one phase 1 comparator. It is not hard to see that a factor of $\log N$ additional passes will be needed. Moreover, when all noncritical comparators are removed, the network is reduced to bubble sort [Knuth 68].

Corollary 9: With only critical comparators, sorting takes $N \log N$ phases.
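The effect of removing every noncritical comparator can be observed with a small experiment. This is a sketch that assumes the critical comparators are exactly the adjacent-line comparisons $x(i-1):x(i)$ performed by a block, one at the middle of each sub-block in each phase; `critical_phases`, `full_phases` and `working_passes` are hypothetical names for this illustration.

```python
def full_phases(n_items):
    """All comparator pairs of one balanced merging block, phase by phase."""
    phases, size = [], n_items
    while size >= 2:
        phases.append([(s + k, s + size - 1 - k)
                       for s in range(0, n_items, size)
                       for k in range(size // 2)])
        size //= 2
    return phases


def critical_phases(n_items):
    """Only the middle (adjacent-line) comparator of every sub-block in each phase."""
    phases, size = [], n_items
    while size >= 2:
        half = size // 2
        phases.append([(s + half - 1, s + half) for s in range(0, n_items, size)])
        size //= 2
    return phases


def working_passes(values, phases):
    """Recirculate until an exchange-free pass; count the passes that exchanged."""
    x, count = list(values), 0
    while True:
        exchanged = False
        for phase in phases:
            for i, j in phase:
                if x[i] > x[j]:
                    x[i], x[j] = x[j], x[i]
                    exchanged = True
        if not exchanged:
            return x, count
        count += 1


reversed_input = list(range(7, -1, -1))
print(working_passes(reversed_input, full_phases(8)))
print(working_passes(reversed_input, critical_phases(8)))
```

Both runs end with a sorted vector; the critical-only network needs more passes on the reversed input because each pass can remove at most $N-1$ inversions.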

3. The Shuffle-Exchange Layout

Up to this point we described the network in terms of comparisons between the values on "horizontal lines". In this section, we first review the shuffle-exchange layout for the balanced sorting network as presented in [Dowd et al 83a], and then identify the critical comparators in this layout. A slight modification to the layout halves the number of critical comparators. Simple replication can then eliminate the rest of the critical comparators. The section concludes with an analysis of the increased robustness.

A shuffle-exchange layout for $N = 2^k$ 'elements' consists of a series of identical stages. Each stage consists of $N/2$ two-by-two comparators, numbered 0 to $N/2-1$. If we number the lines through the comparators in a stage from 0 to $N-1$ so that the lines through the $i$th comparator are labeled $2i$ and $2i+1$, then output $i$ from stage $t$ is connected to input $\sigma(i)$ in stage $t+1$, where the permutation $\sigma$ is the perfect shuffle permutation (see [Clos 53, Benes 65]): if $i_{k-1} i_{k-2} \cdots i_0$ is the binary expansion for $i$, then $\sigma(i) = i_{k-2} i_{k-3} \cdots i_0 i_{k-1}$ (see Figure 3-1(a)).

Figure 3-1: (a) The shuffle permutation $\sigma$; (b) the permutation $\tau$
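The perfect shuffle is a one-bit left rotation of the $k$-bit line index. A minimal sketch, assuming $k \ge 1$ and using the hypothetical name `perfect_shuffle`:

```python
def perfect_shuffle(i, k):
    """Rotate the k-bit index i left by one bit: i_{k-1} i_{k-2} ... i_0
    becomes i_{k-2} ... i_0 i_{k-1}."""
    mask = (1 << k) - 1
    return ((i << 1) & mask) | (i >> (k - 1))


k = 3
wiring = [perfect_shuffle(i, k) for i in range(1 << k)]
print(wiring)  # output i of one stage feeds input wiring[i] of the next
```

Since a rotation by one bit is undone by $k-1$ further rotations, passing through $k$ consecutive stages returns every line to its original position in the wiring.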

Each comparator comparing input lines $i$ and $j$, $i < j$, can be set into four possible states:

1. State + : $output(i) \le output(j)$ (larger value to the upper line)
2. State - : $output(i) \ge output(j)$ (larger value to the lower line)
3. State 0 : $output(i) = input(i)$, $output(j) = input(j)$ (no exchange)
4. State 1 : $output(i) = input(j)$, $output(j) = input(i)$ (exchange).

The layout realizing the balanced merging network (a single block of the balanced sorting network) consists of $\log N$ shuffle-exchange stages with all comparators set to the "+" state. Each stage corresponds to a phase of the merging network. In order that the layout simulate the merging network, the inputs into the layout must be a certain permutation $\tau$ of the inputs into the network, where $\tau(2i) = 2i$ and $\tau(2i+1) = N-2i-1$; that is, $\tau$ fixes the location of the even inputs and reverses the order of the odd inputs (see Figure 3-1(b)). The balanced sorting network can be realized with $\log N$ successive shuffle-exchange blocks with the output of each block connected to the input of the next block via the $\tau$ permutation (see Figure 3-2).

Our plan is to first show the correspondence between the network and the proposed layout, which requires some additional notation, so that the critical comparators in the layout can be identified.

Definition 10: Let $Line^t(i)$ be the value on the $i$th network line (see Figure 1-1) just before the phase $t$ comparisons ($0 \le i < N$, $1 \le t \le \log N$). In particular, $Line^1(i)$ are the input values.

Definition 11: Let $In^t(i)$ be the value of the $i$th input line of the $t$th stage of the layout. This is the value for the $i \pmod 2$ input into the $\lfloor i/2 \rfloor$ comparator [...]

Figure 4-4: Probability of Network Working versus Size of Network ($N$) for Layouts (i) to (iv); probability of a comparator fault $p = .01$

Figure 4-5: Probability of Network Working versus Size of Network ($N$); probability of a comparator fault $p = .001$

[...] Then the input vector that causes all comparators to swap during the first pass is defined as follows:

$$x(i) = \bar{i}_{n-1}, i_{n-2}, \bar{i}_{n-3}, i_{n-4}, \cdots$$

In other words, every other bit is complemented. Let $y$ be the output vector after one pass through the block. If $y(i) = \cdots, i_k, i_{k+1}, \cdots$, then it is easy to identify the phase $k$ comparator that failed to swap.

When actually constructing a single block recirculating sorting network as outlined above, one need not initially permute the input by $\tau$ or $\sigma$, since the initial input is unordered.³ If this is the case then the test input vector just presented must first be permuted by $\tau$.

Given an input that forces every comparator to execute a swap, it is then possible to design a self-modifying circuit. A single input bit indicates that the circuit is in debug mode. While in this mode each comparator tests to see that it executes a swap. If a comparator does not swap then the comparator can automatically disable itself (i.e. put itself into bypass mode).

6. Conclusion

Although the balanced sorting network has the same time and space requirements as that of the bitonic sorting network, we have shown that its advantages are more than cosmetic. The network can be realized as a highly robust shuffle-exchange layout. It is thus possible to produce a fairly large sorting network on a single wafer so that most of the wafers fabricated can be used.

The major 'trick' used in our design was the alternation of roles played by each comparator. It would be interesting to find other algorithms that can exploit this trick. As the requirements for large-scale parallel processing grow, so does the need to develop 'robust' algorithms that can function in the presence of many failures. Another requirement appears to be a simply computed function that indicates termination, as well as the requirement for progress to occur at each iteration.

An immediate application of the results of this paper may be routing networks. Our sorting network can be considered to be a routing network with each comparator sending an input value to the output port corresponding to a certain bit in the destination address (instead of as the result of a comparison). Although a single shuffle-exchange block can route a single input to any single output, some permutations of $N$ inputs require more than one pass. Perhaps our method can be used to build a robust routing network?

³ This may not be the case if the input is assumed to be 'almost' sorted.

References

[Ajtai et al 83] Ajtai, M., J. Komlos, and E. Szemeredi. An O(n log n) Sorting Network. In 15th Annual ACM Symposium on Theory of Computing, pages 1-9. 1983.

[Batcher 68] Batcher, K. E. Sorting networks and their applications. AFIPS Spring Joint Computer Conference 32:307-314, 1968.

[Benes 65] Benes, V. E. Mathematical theory of connecting networks and telephone traffic. Academic Press, 1965.

[Borodin and Hopcroft 82] Borodin, A., and J. E. Hopcroft. Routing, Merging and Sorting on Parallel Models of Computation. In 14th Annual ACM Symposium on Theory of Computing, pages 338-344. 1982.

[Clos 53] Clos, C. A Study of Nonblocking Switching Networks. Bell System Technical Journal 32:406-424, 1953.

[Dowd et al 83a] Dowd, M., Y. Perl, L. Rudolph, and M. Saks. The Balanced Sort Network. In Proceedings of Principles of Distributed Computing, pages 161-172. ACM, August, 1983.

[Dowd et al 83b] Dowd, M., Y. Perl, L. Rudolph, and M. Saks. The Balanced Sort Network. Technical Report DCS-TR-127, Rutgers University, New Brunswick, NJ, June, 1983.

[Hong and Sedgewick 82] Hong, Z., R. Sedgewick. Notes on Merging Networks. In 14th Annual ACM Symposium on Theory of Computing, pages 296-302. 1982.

[Kleitman et al 81] Kleitman, D., T. Leighton, M. Lepley and G. Miller. A survey of new layouts for the shuffle-exchange graph. In 13th Annual ACM Symposium on Theory of Computing. 1981.

[Knuth 68] Knuth, D. E. The Art of Computer Programming. Volume 3: Sorting and Searching. Addison-Wesley, 1968.

[Kruskal 83] Kruskal, C. P. Searching, Merging, and Sorting in Parallel Computation. IEEE Transactions on Computers C-32(10), 1983.

[Perl 83] Perl, Y. Bitonic and Odd-Even Networks are more than merging. Technical Report, Rutgers University, New Brunswick, NJ, 1983.

[Reif and Valiant 83] Reif, J. H. and L. G. Valiant. A logarithmic time sort for linear size networks. In 15th Annual ACM Symposium on Theory of Computing, pages 10-16. 1983.

[Snir 81] Snir, M. Lower Bounds on VLSI implementations of Communication Networks. Technical Report 32, Courant Institute, NYU, 251 Mercer St, NY, New York, May, 1981.

[Stone 71] Stone, H. S. Parallel Processing with the Perfect Shuffle. IEEE Transactions on Computers C-20:153-161, 1971.

[Valiant 75] Valiant, L. G. Parallelism in comparison problems. SIAM Journal of Computing 4(3):348-355, 1975.

[Winslow and Chow 83] Winslow, L. E., and Y. C. Chow. The analysis and design of some new sorting machines. IEEE Transactions on Computers C-32(7):677-683, 1983.

[Wise 81] Wise, D. S. Compact layout of Banyan/FFT networks. In Kung, Sproull, and Steele (editors), Conference on VLSI Systems and Computations, pages 186-195. Computer Science Press, 1981.