The Turn Model for Adaptive Routing
Christopher J. Glass and Lionel M. Ni
Advanced Computer Systems Laboratory
Department of Computer Science
Michigan State University
East Lansing, MI
{glass
that
a model
are deadlock
free, livelock
free,
maximally adaptive. A unique is not based on adding physical topologies
(though
nels).
Instead,
which
packets
wormhole minimal
the model
or nonminimal,
to networks
is based
with
on analyzing
in a network
can form. Prohibiting cles produces routing
algorithms and
Laboratory
48824-1027 .msu.
edu
extra
that
just enough turns to break algorithms that are deadlock
network,
local
input
devices,
and
network
nique
in
the
output
and
which
by first
into
flow
dividing
control
routing the
digits
message
the flits in the packet to become available.
of the turns three
in routing.
Partially
for 2D meshes, cubes.
algorithms
adaptive
to prevent
for 2D meshes
permit
routing
k-ary
adaptive
are described
n-cubes, show
and highest
The
adaptiveness
that
which
sustainable
each router
a few
for
flits
to store
routing algo-
on the pattern of message traffic. For nonuniform traffic, adaptive routing algorithms perform better than nonadaptive ones.
ture
for
become
constructing
computers
and
networks scalable
offer than
Systems nodes,
have
interconnection
multiprocessors,
shared-memory
massive
based where
large-scale
scalable
other
a popular
parallelism
approaches
on direct each node
such
[1, 2, 3, 4, 5] and to multiprocessor
networks
are organized
has its own processor,
and broadcast
Direct
The
majority
of
as ensembles local
work
are
memory,
of
meshes
in part
by the
Touchstone
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1992 ACM
all nodes
on their have
n-cube
the
0-89791-509-7/92/0005/0278 $1.50
=
(Zj
bors if k which is n-cubes. for O < i
and of
routing
also permits
them
over multiple
communications topologies
and
k-ary
[2], the
for
wormhole
n-cubes,
[9, 1]. routing
particularly
A 2D mesh
Intel
channels,
possible
Paragon,
is used and
the
lowin the Symult
MIT J-machine [11], and is used in the nCUBE2
the [1].
location
in the
the same number differs
from
that
mesh.
In a k-ary
of neighbors.
The
of an n-dimensional
to k and two nodes
n-cube
definition mesh
[8], of a
in that
X and Y are neighbors
if $x_i = y_i$ for all $i$, $0 \le i \le n-1$, except
one, $j$, where
$y_j = (x_j \pm 1) \bmod k$. The change to modular arithmetic in the definition adds wraparound channels to the k-ary n-cube, giving it symmetry. Every node has n neighbors if k = 2 and 2n neigh-
commercial
permission
In
bors if and only if $x_i = y_i$ for all $i$, $0 \le i \le n-1$, except one, $j$, where $y_j = x_j \pm 1$. Thus, nodes have from n to 2n neighbors,
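The two neighbor definitions, for the n-dimensional mesh and for the k-ary n-cube, can be compared with a small sketch. The function names `mesh_neighbors` and `cube_neighbors` are illustrative and not from the paper; nodes are assumed to be tuples of coordinates, and k is assumed to be at least 2.

```python
def mesh_neighbors(node, ks):
    """Neighbors of `node` in an n-dimensional mesh with radices `ks`.

    Two nodes are neighbors when they differ by exactly 1 in one coordinate;
    there is no wraparound, so edge nodes have fewer neighbors.
    """
    result = []
    for j, k in enumerate(ks):
        for step in (-1, 1):
            y = node[j] + step
            if 0 <= y < k:
                result.append(node[:j] + (y,) + node[j + 1:])
    return result


def cube_neighbors(node, k):
    """Neighbors of `node` in a k-ary n-cube (coordinates wrap modulo k).

    A set is used because for k = 2 the +1 and -1 neighbors coincide.
    """
    result = set()
    for j in range(len(node)):
        for step in (-1, 1):
            y = (node[j] + step) % k
            result.add(node[:j] + (y,) + node[j + 1:])
    return sorted(result)


if __name__ == "__main__":
    print(len(mesh_neighbors((0, 0), (4, 4))))   # corner of a 4 x 4 mesh
    print(len(mesh_neighbors((1, 2), (4, 4))))   # interior node
    print(cube_neighbors((0, 0), 4))             # 4-ary 2-cube
    print(len(cube_neighbors((0, 1, 0), 2)))     # binary 3-cube
```

In the mesh the neighbor count ranges from n (corner) to 2n (interior), whereas the modular definition gives every node of the k-ary n-cube the same count.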
if aud only
to
of
are low.
X is identified by n coordinates, $(x_0, x_1, \ldots, x_{n-2}, x_{n-1})$, where $0 \le x_i \le k_i - 1$. Two nodes X and Y are neigh-
$> 1/2$, which indicates a significant degree of adaptiveness. Note that $S_p = 1$ for a source-destination pair does not imply that algorithm $p$ is nonadaptive for that pair. The partially adaptive algorithms all permit nonminimal routing. In the case of the negative-first algorithm, this means that routing can be adaptive even when $d_x < s_x$ and $d_y \ge s_y$ or when $d_x > s_x$ and $d_y < s_y$. For an illustration of this, see the bottom path in Figure 10b.
(a) The six turns allowed (solid lines) by the negative-first algorithm.
For
Partially
Adaptive
n-Dimensional
■
4
Routing
in
Networks
This section describes the partially adaptive routing algorithms resulting from the application of the turn model to n-dimensional meshes and k-ary n-cubes. In general, these algorithms are not practical unless n is small or the k's equal 2.
4.1
n-Dimensional
Of the
many
per
cycle,
analogs
algorithms
are
gorithm
is the
all-but-
first
adaptively
for
north-last,
for 2D meshes.
a packet
formed
noteworthy
of the west-first,
gorithms
are not
practical
Meshes
routing
three
topologies
resnlting meshes
The
by prohibiting
their
one turn
simplicity.
They
and negative-first
analog
of the
one-negative-first
west-first
routing
in the negative
are
routing
al-
routing
al-
algorithm:
directions
route
of all but
one
dimension (n – 1) and then adaptively in the other directions. The analog of the north-last routing algorithm is the all-but-one-positive-last the negative
(b) Examples Figure
3.4 How
of the negative-first
10. The
negative-first
Degree adaptive
of are
sion (0) and then the negative-first
u.
mmm
■
algorithm routing
muting
Theorem 5 n-dimensional
for 2D meshes.
Proof:
partially
adaptive
routing
adaptive
adaptive
algorithms,
algorithm, Ax
(Ax ‘f
=
Ay
partially
a node
$= |d_y - s_y|$. Then,
channel
=
{
AZ!AY!
&J!#
if d% ~ s%
1
otherwise
(.4a&2 ~ ~z!AY!
Snorth_la.t =
when
channels
{ For minimal
routing
if dv ~ Sv
< SY)
or (d~ z so and
dv ~ SU)
1
otherwise the
the
traveling
but
the only
sum
of
the
k,
in a negative
larger
proof
All
for
an
of
we present
algorithm
direction,
K – n – X – 1, which
–1,
channels
which
leaving
the
the negative-first with
strictly
dimensional mesh. following theorem, if (dz < s= anddg
algorithms,
in the negative directions.
foT
n-dimensional
along
a
K – n – X
channels leaving the node If a packet enters a node it enters along a channel
is less than node
$K - n + X$,
the num-
in the
positive
routes
every
algorithm
increasing
it enters
is less than
numbers
directions. packet
and is deadlock
along free.
❑
The negative-first algorithm is deadlock free as a result of prohibiting just one turn per abstract cycle, the turn from a positive direction to a negative direction. Therefore, prohibiting some quarter of the turns is sufficient to prevent deadlock in an n-
w =
be
K-n+X
of the
produces s negattve-fsrst
K
nmnbered
Therefore,
otherwise
{
adaptively positive
and K – n + X, the numbers of the in the negative and positive directions. when traveling in a positive direction,
+ Ay)!
ber -f:rst
first
in the
The negative-first routing algorithm for n-dimensional meshes is deadlock free.
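The channel numbering used in the proof can be checked mechanically on small meshes. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, a channel leaving a node with coordinate sum X is numbered K - n + X in a positive direction and K - n - X in a negative direction (K being the sum of the radices), and the only prohibited turns are those from a positive to a negative direction.

```python
from itertools import product


def channel_number(node, sign, ks):
    """Number of the channel leaving `node` in a positive (+1) or negative (-1) direction."""
    K, n, X = sum(ks), len(ks), sum(node)
    return K - n + X if sign > 0 else K - n - X


def numbering_is_increasing(ks, prohibit_pos_to_neg=True):
    """Check that every permitted (incoming, outgoing) channel pair has increasing numbers."""
    n = len(ks)
    directions = list(product(range(n), (-1, 1)))
    for node in product(*(range(k) for k in ks)):
        for dim_in, sign_in in directions:
            prev = list(node)
            prev[dim_in] -= sign_in          # node the packet arrived from
            if not 0 <= prev[dim_in] < ks[dim_in]:
                continue                     # no such incoming channel at the mesh edge
            incoming = channel_number(tuple(prev), sign_in, ks)
            for dim_out, sign_out in directions:
                if not 0 <= node[dim_out] + sign_out < ks[dim_out]:
                    continue                 # no such outgoing channel at the mesh edge
                if prohibit_pos_to_neg and sign_in > 0 and sign_out < 0:
                    continue                 # turn prohibited by negative-first
                if incoming >= channel_number(node, sign_out, ks):
                    return False
    return True


if __name__ == "__main__":
    print(numbering_is_increasing((4, 4)))
    print(numbering_is_increasing((3, 4, 2)))
    print(numbering_is_increasing((4, 4), prohibit_pos_to_neg=False))
```

With the positive-to-negative turns prohibited, every permitted hop moves to a strictly larger channel number, which is the property the proof relies on; allowing those turns breaks the ordering.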
Let
numbered
s west
a packet
adaptively
X be the sum of the $x_i$ for any node $x_{n-1})$. Number each channel leaving a node in a positive direction $K - n + X$ and each channel leaving a node in a negative direction $K - n - X$. Then, if a packet enters
algorithms?
p be one of the three
$= |d_x - s_x|$, and
route
then
mesh, and let $(x_0, x_1, \ldots, x_{n-2},$
Let $S_{\text{algorithm}}$ be the number of shortest paths the algorithm allows from source node $(s_x, s_y)$ to destination node $(d_x, d_y)$. Also, let $f$ be a fully
and
these algorithms are deadlock free, is for the negative-first algorithm.
in an 8 x 8 mesh.
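The number of shortest paths that each 2D-mesh algorithm allows can be counted by brute force from its prohibited turns. The sketch below is an illustration, not the authors' code. It assumes west and south are the negative directions, that west-first prohibits the two turns into west, north-last the two turns out of north, and negative-first the two positive-to-negative turns. The `closed_form` function encodes the closed-form counts as this sketch understands them, so the comparison is a consistency check rather than a statement of the paper's equations.

```python
from math import comb

# Directions as (dx, dy): west and south are negative, east and north positive.
W, E, S, N = (-1, 0), (1, 0), (0, -1), (0, 1)

PROHIBITED = {
    "west-first": {(N, W), (S, W)},       # no turn into west
    "north-last": {(N, W), (N, E)},       # no turn out of north
    "negative-first": {(N, W), (E, S)},   # no positive-to-negative turn
}


def distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def count_shortest_paths(algorithm, src, dst):
    """Count minimal paths from src to dst that avoid the prohibited turns."""
    banned = PROHIBITED[algorithm]

    def walk(pos, last):
        if pos == dst:
            return 1
        total = 0
        for d in (W, E, S, N):
            nxt = (pos[0] + d[0], pos[1] + d[1])
            if distance(nxt, dst) < distance(pos, dst) and (last, d) not in banned:
                total += walk(nxt, d)
        return total

    return walk(src, None)


def closed_form(algorithm, src, dst):
    """Assumed closed form: fully adaptive count in favorable cases, 1 otherwise."""
    (sx, sy), (dx, dy) = src, dst
    full = comb(abs(dx - sx) + abs(dy - sy), abs(dx - sx))
    if algorithm == "west-first":
        adaptive = dx >= sx
    elif algorithm == "north-last":
        adaptive = dy <= sy
    else:
        adaptive = (dx <= sx and dy <= sy) or (dx >= sx and dy >= sy)
    return full if adaptive else 1


if __name__ == "__main__":
    for name in PROHIBITED:
        print(name, count_shortest_paths(name, (0, 0), (3, 2)),
              count_shortest_paths(name, (3, 0), (0, 2)))
```

The fully adaptive count used for comparison is the binomial coefficient over the two offsets, that is, $(\Delta x + \Delta y)! / (\Delta x!\,\Delta y!)$.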
algorithm
adaptively in the other directions. The analog of routing algorithm is also called the negative-first
algorithm:
directions
Adaptiveness these
routing algorithm: route a packet first adaptively in directions and the positive direction of one dimen-
Theorem n(n — 1)) to prevent
Salgor,t~~
is, the
maximally 6
Combining this with Theorem which supports our claim that adaptive
Prohibiting
routing
some
in an n-dimensional deadlock.
algorithms.
quarter mesh
1, we have the the turn model
of
the
is necessary
turns and
(that sufficient
is,
As
the
tially
number
adaptive
messages k,
of
dimensions
algorithms
adaptively.
are large.
are
SP
But
=
averaged
increases, more
the
likely
to
1 less often, across
all
minimal
be able
especially
Algorithm: Minimal p-cube routing for hypercubes. Input: Current address, C, and destination address, D. Procedure:
par-
to
route
when
source-destination
the pairs,
1. If C = D, exit.
$S_p/S_f > 1/2^{n-1}$, indicating that the degree of adaptiveness relative to fully adaptive algorithms decreases as $n$ increases. Again, adaptiveness 4.2
can be increased
k-ary
The
adaptive
to use the
way is to allow only
on its
lock free, wraparound than
routing
wraparound
a packet
first
of the
are routed
mesh
along
k-ary n-cubes according to
prove
of k-ary
along
that
channels,
channels The
the
n-cubes.
routing
in strictly
decreasing
for meshes. Note that
freedom
all of these
For k-ary
routing
n-cubes
deadlock-free
routing
is a simple
with
that
Linder
and
is deadlock ever,
Harden
free,
enough
minimal,
channels
subnetworks
[16] construct
with
and
fully
to
n + 1 levels
per
k-ary
Overall,
the paths,
Routing
for
hypercubes
n-dimensional algorithms
meshes
in
h = 6,
hO
free
They
add,
how-
into
2n -1
and
cases of the
n-cubes.
Proofs
are corollaries
routing
routing
especially
channel
algorithm
3, and
algorithm when
The
offers
compared
in a di-
for hypercubes.
a choice
of many
to the nonadaptive
following
table
illustrates
of the 36 possible
transmitting
for
sends a example,
shortest
the message,
e-
this
the source node (1011010100) node (0010111001). For this
$h_1 = 3$. One
For each node
kn
address
the last
paths
the number
of
hop
C.
destination
was in a positive
address
direction,
D;
and
p.
nr”uti’gf”r
channels
1. If C = D,
route
meshes routing
2.
Ifp=l,
and al-
3.
Elseif
for
4.
Else R = C.
algorithms that
of the
special
address
case of the negative-first
of the node
the header
address
has two phases.
the
packet
to the
local
processor
(CA
D).
and
increased
of the
proofs
algorithm
flits
destination
adaptiveness the packet
for
5. Route
the
currently node.
occupy, The
and
fault
tolerance,
any dimension
flits
is D.
p depends
on which
C is a unique input
buffer
constant the
Figure
distance
of adaptiveness
bet ween S and D. for
the p-cube
The routing
the
any
packet
i for which
the
first
i for which
flits occupy
measure
available
channel
in a di-
$r_i = 1$.
p-cube
routing
algorithm
for hypercubes.
[Table: for each hop of the example route, the node address, the number of choices, the dimension taken, and a comment on the phase. Recoverable addresses: 0010110001, 0010111001.]
and in the
6
Simulation
Experiments
To compare the partially adaptive routing algorithms all-but-one-negative-first (ABONF), all-but-one-positive-last (ABOPL), and negative-first (NF) with the nonadaptive routing algorithms xy and e-cube, we simulate a 16 x 16 mesh and a binary 8-cube for three different traffic patterns. Each of these topologies contains 256 nodes.
of the degree
Is Sp—ctibe /.$f
12. The minimal
address
a fully adaptive routing $= |S \oplus D|$ is the Ham-
other
along
CV
the
router. Konstantinidou proposes an algorithm similar to p-cube [20], but only for minimal routing. The number of shortest paths from S to D, $S_{p\text{-cube}}$, is $h_1!\,h_0!$, where $|X|$ represents the number of 1's in the binary number $X$, $h_1 = |S \wedge \bar{D}|$, and $h_0 = |\bar{S} \wedge D|$. For algorithm, $f$, $S_f = h!$, where $h = h_1 + h_0$
R=
rout-
routing,
for each router,
header
O,then
and D
p-cube
In the case of minimal
along
D=
D.
has a particu-
and $d_i = 1$. Then, the steps can be computed as shown in Figure 12. In both of these algorithms, the only input transmitted in the header
CA
R=CA
the routing
first phase routes the packet along a dimension $i$ for which $c_i = 1$ and $d_i = 0$. When there is no such dimension, the second phase routes the packet along a dimension $i$ for which $c_i = 0$ and $d_i = 1$. These steps are easily computed using bitwise logic operations as shown in Figure 11. If nonminimal routing is desired, because can also route
then
mension
ing algorithm
ming
p-cube
algorithm.
=
whether
cases.
be the binary
of its
p-cube
Current
that
larly compact expression, the p-cube routing algorithm. Let S be the binary address of the source node for a packet, C be the binary
available
choices based on the p-cube routing is also shown. The number of choices in parentheses indicates the additional choices available with nonminimal routing.
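The two phases of minimal p-cube routing reduce to a few bitwise operations. The following sketch is illustrative only: the function names are hypothetical, addresses are plain non-negative integers, and dimension 0 is assumed to be the least significant bit, which may differ from the numbering used in the paper's figures. It also counts the shortest paths that the algorithm permits for the example source 1011010100 and destination 0010111001.

```python
from math import factorial


def p_cube_choices(c, d):
    """Dimensions a minimal p-cube router may take at node `c` toward `d`."""
    if c == d:
        return []                  # packet is delivered to the local processor
    r = c & ~d                     # phase 1: bits that are 1 in C and 0 in D
    if r == 0:
        r = ~c & d                 # phase 2: bits that are 0 in C and 1 in D
    width = max(c, d).bit_length()
    return [i for i in range(width) if (r >> i) & 1]


def count_p_cube_paths(c, d):
    """Number of distinct shortest paths minimal p-cube routing allows from c to d."""
    if c == d:
        return 1
    return sum(count_p_cube_paths(c ^ (1 << i), d) for i in p_cube_choices(c, d))


if __name__ == "__main__":
    src, dst = 0b1011010100, 0b0010111001
    h1 = bin(src & ~dst).count("1")
    h0 = bin(~src & dst).count("1")
    print(p_cube_choices(src, dst))
    print(count_p_cube_paths(src, dst), factorial(h1) * factorial(h0))
    print(factorial(h1 + h0))      # paths available to a fully adaptive algorithm
```

For this pair $h_1 = h_0 = 3$, so the enumeration agrees with $h_1!\,h_0! = 36$ out of the $h! = 720$ shortest paths a fully adaptive algorithm could use.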
Hypercubes
are special and k-ary
are deadlock
general
The
any
$r_i = 1$.
addhg
n-cube
Hypercubes are a special case of both n-dimensional meshes and k-ary n-cubes. Consequently, the partially adaptive
more
along
exit.
p-cube
gorithms
packet
10-cube where to the destination
is shown.
per level.
5
CAD.
i for which
routing
a binary message
nonmini-
algorithm
subnetwork
and
to construct
a routing
the
the
shortest cube
Again,
without
adaptive.
to partition
R=
11. The minimal
extra channels. This is a result of the many cycles that do not involve turns in the topology. By adding channels to a k-ary n-cube,
processor
or-
of the proof
are strictly
are minimal
O,then
Route
Figure
channel and then
channels.
k > 4, it is impossible
algorithms
local
packets
or increasing
modification
algorithms
to the
CAD.
mension
algorithm. Thus, a node at the east edge will have two channels to the west: a mesh immediately to its west and a wraparound
of deadlock
If R=
packet
dead-
can be extended
at the west edge of the mesh
3. 4.
One
is still
in another way: classify each wraparound the direction in which it routes packets
to a node
R=
channel
on whether
algorithm
2.
the
can be ex-
a wraparound
depending
negative-first
apply the negative-first of the mesh channels channel to the node
maL
To
for meshes
channels
to be routed
hop.
der in the proof.
the proof
algorithms
number the mesh channels as before and assign the channels a number that is either greater than or less
those
channel
routing.
n-cubes
partially
tended
by nonminimal
route
of neighboring
= 1/(:,).
A pair
of unidirectional
routers
and
channels
each router
connects
to its local
each pair
processor.
All
of the
channels
input
channel
The
routers
have into
the
same
a router
operate
bandwidth,
has a buffer
asynchronously
20 flits/μsec. the
and
synchronize
rithms about
Each
size of a single
flit.
(Figure 13). At low throughputs, the algorithms perform the same. For the nonuniform traffic patterns, the partially
adaptive
to simult~
routing
algorithms
throughputs
(Figures
have the lower
high
are blocked the source
mum sustainable throughput of the partially is four times that of the nonadaptive e-cube
contain
header
an input
flits
selection
consumed. waiting
policy
must
local first-come-first-served that
arrived
indefinite
channel
has multiple
selection policy decides in favor
multiple
arbitrate.
first.
This
postponement. output
in favor policy
When channels
channels channel,
of the header
a header
policy along
and
queued
at their the
source
network
processors
fit
an
output
used is called ZY and the lowest dimension.
Average communication latency
and reverse-flip. any of the other the
The uniform processors with
matrix-transpose
cessor at row umn i. In the by
mapping
bors
in the
then
sent
mesh.
nodes
resulting
one d
determined
message
the by
from
hypercube
so that
hypercube.
the
q ). The
0
100
200
Average
300
matrix
transpose
are
in the
reverse-flip
Figure
14. Comparison
traffic
in a 16 x 16 mesh.
pattern
Overall
one at
of routing
in the hypercube,
are for the partially These throughputs
sends each
able throughput algorithm and is not
algorithms
the highest
I
I
I
Average commun-
25
ication
Z.
due to shorter
(
longer
-
path
lengths
for matrix-transpose
than
tratlic 15
tion
-
xv
10
west-first
1
0 0
100
Average
1
I
200
300
network
negative-first I I 400
500
I 600
throughput
for reverse-flip
800
packets.
(flits/Psec)
The
maintained Despite
The and
simulations
hypercube,
latencies
at high
of routing
tratfic
for uniform
happen
tratlic
From
pattern
that,
nonadaptive
throughputs
for uniform routing
than
the
traffic
algorithms
traffic
probably
partially
in the mesh have adaptive
hops)
than
algorithms
routing
to embody
pattern.
trafhc
(11.34
routing
adaptive
result
as well the
is that as when
superior
for uniform provide
better
the
The
av-
for uniform perform
algorithms global, with
for
long-term
a global,
starts
informa-
long-term
message
bet-
uniform point
traffic
of
spread
and the zv and e-cube algoadaptive algorithms, on the
lower
A traffic processes
algo-
applications,
performance traiiic,
the
will
of uniform
tratiic
information
is used.
of the
nonadaptive
partially
perforrmmce
pattern is determined are mapped to the each node
evenness
global
traffic has been used in many we know of no real applications
indicate the
algorithms
tratlic.
traffic is 4.27 hops, versus 4.01 in the mesh, the highest sustain-
across the mesh or hypercube, maintain that evenness. The
algorithms Figure 13. Comparison in a 16 x 16 mesh.
for the e-cube in throughput
other hand, select channels based on local, short-term information. These selections tend to benefit just the routed packet and only for the immediate future and tend to interfere with other
-AI 700
they
this
the uniform
evenly rithms
-Q--
partially
is that about
view,
-Q-
north-last
5;
*
the
traffic. sustain-
throughput in the mesh, which occurs for the uniform traffic. Again, average path length is
traffic (10.61 hops). The reason the nonadaptive ter
throughputs
and reverse-flip the next highest
is for the negative-first algorithm and matrix- This throughput is 30% higher than the second
highest sustainable xy algorithm and
latency (flsec)
800
for matrix-transpose
in the hypercube, which occurs uniform traffic. This improvement
able throughput transpose traffic.
i
35 30
700
(flits/flsec)
sustainable
adaptive algorithms are 50% higher than
erage path length for reverse-flip hops for uniform traffic. Overall I
600
sends each message
at $(x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7)$ to the
I
500
throughput
neigh-
Messages
$(\bar{x}_7, \bar{x}_6, \bar{x}_5, \bar{x}_4, \bar{x}_3, \bar{x}_2, \bar{x}_1, \bar{x}_0)$.
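The nonuniform traffic patterns used in the simulations are simple address permutations. The sketch below is an illustration with hypothetical function names. It assumes mesh nodes are (row, column) pairs, hypercube addresses are 8-tuples of bits written $(x_0, \ldots, x_7)$, the hypercube matrix-transpose pattern swaps the two 4-bit halves of the address, and reverse-flip reverses and complements the bits.

```python
from itertools import product


def matrix_transpose_mesh(node):
    """Mesh: the processor at row i, column j sends to row j, column i."""
    i, j = node
    return (j, i)


def matrix_transpose_cube(bits):
    """Hypercube: (x0..x7) sends to (x4, x5, x6, x7, x0, x1, x2, x3)."""
    return bits[4:] + bits[:4]


def reverse_flip(bits):
    """Hypercube: reverse the address bits and complement each one."""
    return tuple(1 - b for b in reversed(bits))


def mean_hops(pattern, n=8):
    """Average Hamming distance over the nodes that send to a different node."""
    hops = []
    for bits in product((0, 1), repeat=n):
        h = sum(a != b for a, b in zip(bits, pattern(bits)))
        if h:
            hops.append(h)
    return sum(hops) / len(hops)


if __name__ == "__main__":
    print(matrix_transpose_mesh((3, 12)))
    print(matrix_transpose_cube((0, 1, 2, 3, 4, 5, 6, 7)))  # positions shown for clarity
    print(reverse_flip((0, 0, 0, 0, 0, 0, 0, 1)))
    print(round(mean_hops(reverse_flip), 2))
```

Under these assumptions, and excluding the 16 nodes that reverse-flip maps to themselves, the mean reverse-flip distance in the 8-cube rounds to 4.27 hops, which is consistent with the average path length reported for reverse-flip traffic in this section.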
40
400
network
at row j and colpatterm is derived
in the hypercube
the processor
by
the pru-
at $(x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7)$ to the
$(x_4, x_5, x_6, x_7, x_0, x_1, x_2,$ from
each
in the
dictated
pattern
the processor
message
to
are neighbors
F
pattern sends each message to equal probability. In the mesh,
sends
a 16 x 16 mesh mesh
zo
dependent. Three matrix-transpose,
i and column j to the one hypercube, a matrix-transpose
to the
The
from
pattern
25
and bounded.
is largely
the message traffic pattern, which is application network workloads are considered: uniform,
algorithms
in an input
to it,
is small
performance
adaptive algorithm.
flits
routing is minimal. For each simulation, two characteristics of network performance are measured: average communication latency (in μsec) and average sustainable network throughput (in flits delivered per μsec). The throughput is sustainable when the number of Obviously,
at
therefore
All
packets
especially
For matrix-transpose
used is called
is fair
available
must arbitrate. The of the output channel
input output
The policy
and decides
in the router
prevents
When
for the same available
16).
traffic in both the mesh and hypercube, the maximum sustainable throughput of the partially adaptive algorithms is twice that of the nonadaptive algorithms. For reverse-flip traffic, the maxi-
from immediately entering the network are queued at processor. Messages that arrive at a destination processor are immediately
14, 15, and
latencies,
neously transmit the flits in a packet. The processors generate messages at time intervals chosen from a negative exponential distribution. Each message has an equal probability of being one packet of 10 or 200 flits. Messages that
previous that
adaptive
in real
systems.
is not routing
algorithms Uniform
simulation studies, but generate uniform traffic.
by the application and how its nodes of the network. For most
communicate
with
some nodes
much
more
than
algorithms
30
Average25 communication Z. latency
.
ABONF
Q
ABOPL p-cube
= L
presents
illustrate,
is often
7
Conclusions
Our
goal
tions
because
has
been
to
poor
performance.
and
Future
make
the
interconnection
in which
packets
number
they
a problem
for the
are nonadaptive.
Just
maintain the evenness of uniform traffic, the unevenness of nonuniform traffic. The
minimum
use
of
net works.
can turn
of turns
best
break
the
channels
Analyzing
in a network
that
they blindly result, ss the
Work
cycles
faulty ment spot
hardware and decreases and livelock. Nonminimal avoidance
and fault
the
produces
free, minimal freedom and
livelock freedom are essential for routing algorithms. ness increases the chances that packets can avoid hot
15 -
in
the direc-
and prohibiting
all of the
routing algorithms that are deadlock free, livelock or nonminimal, and maximally adaptive. Deadlock
10
Adaptivespots and
the chances of indefinite postponerouting allows, even greater hot-
tolerance.
The
turn
model,
unlike
other
54
approaches to designing adaptive routing algorithms, is applicable to networks with only the channels required by the network
o~
topologies channels).
o
100
200
Average
300
400
network
500
throughput
600
700
without
800
tially
(flits/psec)
(as well Applied extra
15. Comparison in an 8-cube.
of routing algorithms
for matrix-transpose
While the disadvantages.
as to networks with extra to n-dimensional meshes
channels,
the turn
routing
algorithms.
adaptive
adaptive routing than nonadaptive traffic. Figure traffic
traffic
as they maintain
wormhole-routed
35
(psec)
Nonuniform
e-cube
figures
40
others.
ZY and
algorithms algorithms
turn model Adaptive
trol
logic
for route
this
may
increase
the
need
for
produces
Simulations
tion
on more
selection
than
delay.
of the
between
information.
bases
one of the dimensions. a selection
partially
does nonadaptive
Part
to decide
header
typically
new, par-
indicate that they can perform better for nonuniform patterns of message
For
a selection
on the distance
remaining
and
is due
output
to
chan-
Another part of the to base the route selecdimension-order
on the
For adaptive
routing,
complexity multiple
nels, all of which lead to the destination. complexity is due to the need for a router a router
several of these
has many advantages, it also has some routing can require more complex con-
node
a router
model
physical or virtual and k-ary n-cubes
distance
routing, remaining
routing,
a router
in more
than
in
must
base
one, or all, dimensions. Every extra bit of header information that is required for the router to select an output channel increases router storage requirements of store 40
I
35
(flsec)
e-cube
*
on network
~
the
Q -,&
cal channels.
I
tigate
the turn
effects model
to apply octagonal,
15 10
o~
for future input
In [18],
to networks Other
that
models
for
networks.
and
more
like those
work.
In [19],
we inves-
output
selection
we illustrate include designing
Another
mit adaptive routing topologies, the turns
without are not
stract
necessarily
cycles
are not
task
butions,
the
extra
virtual
adaptive
policies
application
of
or physi-
routing
algo-
obvious
extension
do for
of our work
is
is the
so that
the
the addition of channels. necessarily 90-degrees and formed
identification results
by
four
of realistic
of future
simulations
turns. workload can
In such the abA final distribe more
meaningful. 500
network
1000
1500
throughput
2000
2500
(flits/psec)
Acknowledgments The
Figure 16. Comparison fic in an 8-cube.
latencies
the turn model to other topologies, such as hexagonal, and cube-connected cycle networks, all of which per-
important
u]
Average
directions of different
performance.
the enhanced
o
communication
rithms are based on adding extra channels to networks, but not produce routing algorithms that are maximally adaptive
.20
54
are many
I
ABOPL p-cube (NF)
25
There
I
ABONF
30
Average communication latency
and makes
and f&ward.
of routing
algorithms
for reverse-flip
authors
develop
traf-
wish
to thank
Dr.
Philip
K. McKinley
for helping
us
the simulator.
References 1. NCUBE 2. Intel iion,
Company,
Corporation, 1991.
NCUBE A
6400
Touchdone
P70ce88c,r DELTA
Manual, System
1990. De8crip-
3.
S. B. Borkar, Kung,
R. Cohn,
M. Lam,
P. S. Tseng,
J. Sutton,
An integrated
solution
Proceedings 4. D.
Lenoski,
J.
May A.
and J. Webb, parallel
Gharachorloo,
multiprocessorfl on
Agarwal,
B.-H.
Lim,
A processor
tiprocessor,” on
Computer
W.
J. Dally
D.
Kranz,
tmchitecture
in Proc.
of the
Architecture,
Networks,
9.
vol.
vol. 3, no. 4, pp.
nection
“Performance IEEE net works,”
39, pp.
775–785,
J.
J.
Kubiatowicz,
International
torus
May
1990.
routing
1, no. 3, pp.
X. Lh
June
267–286,
chipfl
Journal
187–196,
and L. M. Ni,
“Deadlock-fi-ee networks,”
International
116–125,
C.
L.
Seitz,
A new Computer
1979.
multicast
wormhole
in Proceedings
Symposium
May
1986.
1990.
ing in in multicomputer pp.
mul-
Symposium
analysis of Ic-ary n-cube interconTransactions on Compute.., vol. C-
Dally,
Annual
10.
and
protocol
7. P. Kermani and L. Kleinrock, “Virtual cut-through: computer communication switching technique,”
W.
20.
1988.
multiprocessing
104–114,
“The
Computing,
and
for
17th
pp.
and C. L. Seitz,
of Distributed
8.
Conference
the 17th Internapp. 148–159,
of
Architecture,
on
of
Computer
routthe 18th
Architecture,
1991.
W.
Athas,
C.
C.
M.
Flaig,
A.
J.
Martin,
J. Seizovic, C. S. Steele, and W.-K. Su, “The architecture and programming of the Ametek Series 2010 multicomputer,” in Proceedings of the Third Conference current Computers and Applications, CA), pp. 1988. 11.
12.
13.
W.
J.
33–36,
Association
“The
Dally,
on Hypercube ConVolume I, (Pasadena,
for Computing
J-machine:
System
in Actors:
Knowledge-Based
Concument
and
eds.),
1989.
Agha,
MIT
Press,
Mach] nery,
support
for
Jan.
Actors)
Computing
(Hewitt
W. J. Dally and H. Aoki, “Adaptive routing using virtual channels,” tech. rep., Massachusetts Institute of Technology, Laboratory for Computer Science, Sept. 1990. H. Sullivan fully Annu.
and T. R. Bashkow,
distributed Symp.
parallel
“A large
machine? Architecture,
Comput.
scale, homogeneous,
in Proceedings
oj the lth
vol. 5, pp. 105–124,
Mar.
1977. 14.
W. J. Dally and in multiprocessor tions
15.
on
Computers,
J. T. Yantchev deadlock-free IEE
16.
18.
and
vol.
C-36,
Pt.
routing
for
E, vol.
W. J. Dally,
“Fine-grain
message
ers,”
in PTOC. of the Third
rent
Computers,
vol.
vol.
pp.
1987. low
latency,
of processorsfl 178-186,
May
in 1988.
“An adaptive and fault tolfor k-ary n-cubes ,“ IEEE 40, pp. passing
Conference
1, (Pasadena,
May
“Adaptive,
networks
136(3),
D. H. Linder and J. C. Harden, erant wormhole routing strategy on Computers,
message routing IEEE Tmnsac-
pp. 547–553,
C. R. Jesshope,
packet
Proceedings,
Transactions 17.
C. L. Seitz, “Deadlock-free interconnection networks,”
on CA.),
2-12,
Jan.
concurrent Hypercube pp.
2–12,
1991. computConcurJan.
C. J. Glass and L. M. Ni, “Adaptive routing in mesh-connected networks,” in Proceedings of the 12th International on
Distributed
Computing
System.,
June
1992.
in
Gupta,
coherence
in Proc.
Computer
Nov.
A.
cache
“iWarp:
computing:
’88, pp. 330-339,
K.
19.
H. T.
L. Rankh,
1990.
“APRIL:
6.
J. Urbanski,
directory-baaed
Symposium
T. Gross,
J. Pieper,
to high-speed
Laudon,
“The
for the DASH tional
S. Gleason,
C. Peterson,
oj Svpercompnting
J. Hennessy,
5.
G. Cox,
B. Moore,
1988.
C. J. Glass and L. M. Ni, “Maximally fully adaptive routing in 2D meshes,” Tech. Rep. MSU-CPS-ACS-51, Dept. of Computer Science, Michigan State University, East Lansing, Michigan, Jan. 1992.
S. Konstantinidou, “Adaptive, minimal routing in hypercubes,” in Proc. of the 6th MIT Conference: Advanced Research in VLSI, pp. 139–153, 1990.