A High-Performance ATM Switch with Completely and Fairly Shared Buffers*

Wu-Yuin Hwang, Wen-Tsuen Chen, Yao-Wen Deng

Department of Computer Science, National Tsing Hua University, Hsin-Chu, Taiwan 30043, R.O.C.

Abstract

there is a burst of packets entering the system and going to the same output port, the output buffer of the destination output port is likely to overflow while the output buffers of the other output ports are still empty. If the output ports can share their buffer space, the number of lost packets can be reduced. In the Starlite switch [3] packets are queued at a completely shared input buffer. However, due to head-of-line (HOL) blocking, the maximum throughput of a first-in first-out (FIFO) input queuing switching system is limited to 0.586 [4,5]. In [6] the performance of four buffering strategies (input queuing, input smoothing, output queuing, and complete buffer sharing) is evaluated and compared. It is shown that the complete buffer sharing strategy has the optimal buffer utilization and that it can achieve the optimal throughput-delay performance with less buffer space than the output queuing strategy. However, the implementation of complete buffer sharing requires complicated hardware and software. Moreover, complete buffer sharing may cause buffer hogging. Buffer hogging occurs when some of the switch ports suddenly become overloaded such that all the buffer space is occupied by these overloaded ports. In this situation, every switch port begins to drop packets even though most of the switch ports are lightly loaded. Buffer hogging results in unfair utilization of system resources.

In this paper, a new ATM switch named StarPlus is proposed. In StarPlus, a novel buffer management mechanism is designed. With the StarPlus buffer management mechanism, buffer space can be completely shared by all the switch ports in a fair manner. The StarPlus buffer management mechanism also helps alleviate the influence of HOL blocking so that system throughput can be improved. The proposed buffer management mechanism is based on a simple algorithm and can be easily implemented in hardware.
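As a rough illustration of the HOL-blocking limit of 0.586 quoted above (the asymptotic value 2 − √2 reported in [4]), the following Monte Carlo sketch keeps every input of a FIFO input-queued switch backlogged and counts delivered packets per output per slot. The function name, the random tie-breaking among contending inputs, and the parameter defaults are assumptions made for this illustration; for finite N the estimate lies slightly above the asymptotic value.

```python
import random


def hol_saturation_throughput(n_ports=32, slots=2000, warmup=200, seed=1):
    """Estimate the saturation throughput of a FIFO input-queued switch.

    Hypothetical helper for illustration: every input always has a packet
    waiting, and destinations are uniform over the outputs.
    """
    rng = random.Random(seed)
    # hol[i] is the destination of the head-of-line packet at input i.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for t in range(slots):
        # Group the inputs by the output their HOL packet is asking for.
        contenders = {}
        for inp, dest in enumerate(hol):
            contenders.setdefault(dest, []).append(inp)
        # Each requested output serves exactly one contender per slot.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            # The winner exposes its next packet; losers stay blocked.
            hol[winner] = rng.randrange(n_ports)
            if t >= warmup:
                delivered += 1
    return delivered / ((slots - warmup) * n_ports)


if __name__ == "__main__":
    print(round(hol_saturation_throughput(), 3))
```

Blocked head-of-line packets keep their destination from slot to slot, which is what couples the inputs and pulls the throughput well below 1.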
The performance of the proposed switching system is evaluated by both an analytical model and simulation. The results show that high buffer

Sharing buffer space between switch ports greatly improves the performance of switching systems. However, sharing buffers in a fair manner is not an easy task. In this paper we propose a high-performance ATM switching system with fairly and completely shared buffers. The core of the proposed switching system is a novel buffer management mechanism with which buffer space in the switching system can be completely shared by all the switch ports in a fair manner. The proposed buffer management mechanism is based on a simple algorithm and can be easily implemented in hardware. The performance of the proposed switching system is evaluated by both an analytical model and simulation. The results show that high buffer utilization and a low packet loss ratio can be achieved under both uniform and non-uniform traffic loads.

1 Introduction

Switching systems play an important role in distributed systems. A well-designed switching system provides dedicated communication paths for the endpoints so that network workstations can communicate with each other concurrently. Due to internal conflicts within the switching system or contention over the output ports, packets entering the switching system may be blocked temporarily, so that the system is unable to forward them to their destinations before the contention is resolved. Buffers are required to store these temporarily blocked packets. Underestimating the buffer space requirement at design time causes packet loss when too many packets are blocked, whereas overestimating it wastes resources. Efficient use of buffer space can improve system performance and save cost [1]. In the Knockout switching system [2] each output port uses a dedicated buffer, and the buffer space cannot be shared among the output ports. When

*This work was supported by the National Science Council, Taiwan, Republic of China, under grant NSC-85-0408-E007-093.

0-8186-8227-2/97 $10.00 © 1997 IEEE

utilization and a low packet loss ratio can be achieved under both uniform and non-uniform traffic loads.

The rest of the paper is organized as follows. Section 2 describes the architecture of the StarPlus switch and the buffer management mechanism. Section 3 presents the performance evaluation of the StarPlus switch. Section 4 discusses some implementation issues. Section 5 gives the conclusions.

2 Architecture of the StarPlus Switch

The overall switch architecture of the StarPlus is shown in Fig. 1. The Batcher sorter [7] sorts the input packets according to their destination addresses in non-decreasing order. The buffer manager resolves output contention and shares buffer space among all the switch ports. The routing network forwards the packets to their destination output ports.

[Figure 1 diagram: sorter, buffer manager, and routing network connected in series.]

are merged into one in the BOM. For each output address, we choose at most one packet and mark it. If the number of unmarked packets exceeds N, we choose more packets from the set of packets that originally belonged to $UB_i$ and mark them. Marked packets are sent to $UB_i$ while unmarked packets are sent to $LB_j$. In this way, conflicting packets in $UB_i$ are moved to $LB_j$, except the oldest one. Packets in $LB_j$ that do not conflict with packets in $UB_i$ are moved to $UB_i$ to improve the throughput. If there is not enough room in $LB_j$, some conflicting packets are left in $UB_i$. At $UB_1$, packets with conflicting output addresses are dropped, except the oldest one, before they enter the non-blocking routing network. This avoids buffer hogging. An illustrative example of the CROM is shown in Fig. 3.
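The marking rule of the CROM operation can be sketched in software as follows. This is a minimal illustration and not the hardware design: packets are modelled as hypothetical `(destination, time_stamp)` tuples, both buffers are assumed to hold at most `n` packets with destinations in `0..n-1`, and the choice of which extra upper-buffer packets to mark on overflow (the first ones in sorted order) is an assumption, since the text does not specify it.

```python
from typing import List, Tuple

# (destination address, time stamp); a smaller stamp means an older packet.
Packet = Tuple[int, int]


def crom(ub: List[Packet], lb: List[Packet], n: int):
    """One CROM pass over an upper and a lower buffer of capacity n each."""
    # Merge both lists by (destination, time stamp), remembering the origin.
    merged = sorted(
        [(d, ts, True) for d, ts in ub] + [(d, ts, False) for d, ts in lb],
        key=lambda pkt: (pkt[0], pkt[1]),
    )
    marked = [False] * len(merged)
    seen = set()
    for k, (dest, _, _) in enumerate(merged):
        if dest not in seen:  # first packet per address is the oldest one
            seen.add(dest)
            marked[k] = True
    # If the unmarked packets would overflow the lower buffer, also mark
    # packets that originally came from the upper buffer (assumed order:
    # first come, first marked).
    overflow = marked.count(False) - n
    for k, (_, _, from_ub) in enumerate(merged):
        if overflow <= 0:
            break
        if from_ub and not marked[k]:
            marked[k] = True
            overflow -= 1
    new_ub = [(d, ts) for (d, ts, _), m in zip(merged, marked) if m]
    new_lb = [(d, ts) for (d, ts, _), m in zip(merged, marked) if not m]
    return new_ub, new_lb


if __name__ == "__main__":
    print(crom([(0, 5), (0, 7), (2, 6)], [(1, 3), (2, 2), (3, 4)], 4))
```

In the first call of the usage line, the upper buffer ends up with one packet per destination (the oldest of each), and the two conflicting packets move to the lower buffer.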


Figure 2: Block diagram of the buffer manager.


Figure 1: The architecture of the StarPlus.

It is shown that the Banyan network can be used as a non-blocking routing network if the input packets are concentrated and their destination addresses are in monotonically increasing or decreasing order [8]. At the output end of the buffer manager, packets are concentrated and their destination addresses are kept in ascending order. The block diagram of the buffer manager is illustrated in Fig. 2. The system buffers are arranged into two groups: the upper group and the lower group. Each group contains B buffers, numbered from right to left in ascending order. Each buffer is capable of storing N packets, where N is the number of switch ports. Between the upper buffer group and the lower buffer group are the buffer operator modules (BOMs).

The key operation of the buffer management is called the contention resolution and output maximization (CROM) operation. The CROM operation applied to the i-th buffer in the upper group, $UB_i$, and the j-th buffer in the lower group, $LB_j$, is described as follows. Each packet carries a time stamp from the moment it enters the system. Older packets have smaller time stamp values. Packets in $UB_i$ and $LB_j$ are already sorted according to their destination addresses. Packets having the same destination address are sorted according to their time stamps. These two sorted lists

[Figure 3 panels: Before CROM, After CROM.]

Figure 3: An example of the CROM.

The buffer management procedure can be subdivided into three steps.

Step 1. The i-th BOM, $BOM_i$, performs the shift operation such that the contents of the lower buffer group are shifted left and those of the upper buffer group are shifted right. At the left-most end, the contents of the buffers are wrapped around, i.e., $UB_B \leftarrow LB_B$.

Step 2. $BOM_i$, $1 \le i \le B - 1$, performs the CROM operation on $UB_i$ and $LB_{i+1}$.

Step 3. $BOM_i$, $1 \le i \le B$, performs the CROM operation on $UB_i$ and $LB_i$.

An example with N = 4 and B = 4 is shown in Fig. 4.

[Figure 4 panels: (a) Initial state. (b) Step 1. (c) Step 2. (d) Step 3.]

Figure 4: A buffer management example.

3 Traffic Model and Performance Analysis

The performance of the StarPlus is evaluated by queuing analysis and computer simulation under both a uniform and a non-uniform traffic model. In the StarPlus, packets are lost when more than one cell with the same output address is sent to the non-blocking routing network, which occurs only when the system runs out of buffer space. Therefore, buffers are shared completely in the StarPlus switch.

With the uniform traffic model, the probability that a packet is heading for a particular output port is assumed to be $1/N$. Packet arrivals at each input port form an i.i.d. Bernoulli process. If p is the input line load and B is a random variable representing the number of arrivals destined to a particular output port, then the probability that i packets are destined to an output port is

$$b_i = \Pr\{B = i\} = \binom{N}{i}\left(\frac{p}{N}\right)^{i}\left(1 - \frac{p}{N}\right)^{N-i}, \quad i = 0, 1, \ldots, N, \tag{1}$$

and the generating function of B is

$$B(z) = \sum_{i=0}^{N} z^{i}\,\Pr\{B = i\} = \left(1 - \frac{p}{N} + z\,\frac{p}{N}\right)^{N}. \tag{2}$$

When $N \to \infty$ and p is small, the binomial distribution becomes a Poisson distribution and hence

$$b_i = \Pr\{B = i\} = \frac{e^{-p}\,p^{i}}{i!}. \tag{3}$$

Let $Q_m$ be the number of packets in a particular output queue at the m-th time slot and $B_m$ be the number of packets arriving during the m-th time slot. When $Q_m = 0$ and $B_m = 1$, the newly arriving packet is served immediately. If we have infinite buffer space, then

$$Q_m = \max(0,\; Q_{m-1} + B_{m-1} - 1). \tag{5}$$

The steady-state mean queue length is

[equations (6) and (7) are not legible in the source]

When $N \to \infty$, we have

$$Q(z) = \frac{(1 - p)(1 - z)}{e^{-p(1 - z)} - z}. \tag{8}$$

The N-fold convolution of (8) is

$$Q^{N}(z) = \frac{(1 - p)^{N}(1 - z)^{N}}{\left(e^{-p(1 - z)} - z\right)^{N}}. \tag{9}$$

Expanding (9) into a Maclaurin series yields the following asymptotic queue length probability

[equation (10) is not legible in the source]


Thus, we have a power series in z and can pick off the steady-state probabilities of the N-fold convolution. For Q = 0 we have

$$\Pr\{Q = 0\} = \left((1 - p)\,e^{p}\right)^{N}.$$
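The Q = 0 result can be cross-checked numerically. The sketch below assumes the Poisson limit (3) and takes the single-queue generating function to be the N-th root of (9), namely (1 − p)(1 − z) / (e^{−p(1−z)} − z). It expands that function by power-series division and then forms the N-fold convolution directly. The function names and the truncation length are choices made for this illustration, not part of the analysis in the text.

```python
import math


def queue_length_pmf(p, terms):
    """Coefficients of (1-p)(1-z) / (exp(-p(1-z)) - z) up to z**(terms-1).

    Requires terms >= 2 and 0 < p < 1.
    """
    # Poisson arrival probabilities a_k = e^{-p} p^k / k!, built iteratively.
    a = [math.exp(-p)]
    for k in range(1, terms):
        a.append(a[-1] * p / k)
    den = a[:]           # denominator series: A(z) - z
    den[1] -= 1.0
    num = [0.0] * terms  # numerator series: (1 - p)(1 - z)
    num[0] = 1.0 - p
    num[1] = -(1.0 - p)
    q = [0.0] * terms
    for k in range(terms):
        acc = num[k] - sum(den[j] * q[k - j] for j in range(1, k + 1))
        q[k] = acc / den[0]
    return q


def convolve_power(pmf, n):
    """n-fold convolution of pmf with itself, truncated to len(pmf) terms."""
    result = [1.0] + [0.0] * (len(pmf) - 1)
    for _ in range(n):
        nxt = [0.0] * len(pmf)
        for i, x in enumerate(result):
            if x == 0.0:
                continue
            for j in range(len(pmf) - i):
                nxt[i + j] += x * pmf[j]
        result = nxt
    return result


if __name__ == "__main__":
    q = queue_length_pmf(0.9, 120)
    total = convolve_power(q, 4)
    print(q[0], total[0])
```

The first coefficient equals (1 − p)e^p, and the first coefficient of the N-fold convolution equals its N-th power, in agreement with the expression above.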

For Q = i we have

$$\Pr\{Q = i\} = \left((1 - p)\,e^{p}\right)^{N} \sum_{j=0}^{\min(N,i)} \binom{N}{j}(-1)^{j} \cdots \tag{11}$$

[the remaining factors of (11) are not legible in the source]

[Figure 5 plot. Horizontal axis: number of buffers. Curves: analysis and simulation for N = 32 and N = 128.]

Figure 5: Packet loss probability versus number of buffers. Uniform traffic, p = 0.9.

Although (11) is a closed form of the steady-state probability, it is not numerically stable for computing Pr{Q = i}. Therefore, the packet loss probability for large N is approximated by

[equation (12) is not legible in the source]

where $N_b$ is the total buffer size and $Q_i$ is the steady-state length of the i-th output queue. Packet loss probability versus number of buffers is shown in Fig. 5 and Fig. 6 under uniform traffic with load p = 0.9 and p = 0.85, respectively, obtained by both the analytical approach and computer simulation. The results derived by analysis agree with those derived by simulation. Fig. 7 illustrates the packet loss probability versus the number of BOMs, B. The results show that the StarPlus needs only a small number of BOMs to achieve a low packet loss probability even under heavy load. For example, when the input line load p = 0.9, no more than four BOMs (B ≤ 4) are enough to achieve [value not legible in the source] packet loss probability.

Under non-uniform traffic, packet arrivals at the switch ports are assumed to be i.i.d. Bernoulli processes with parameter p. Let the set of output ports be denoted by S. Assume that a subset of the output ports $A \subset S$ is heavily loaded such that a fraction r of the offered load is heading for A. The rest of the traffic is uniformly distributed among the output ports in B = S − A. The load $p_A$ for each output port in A is given by

[equation (13) is not legible in the source]

[Figure 6 plot. Horizontal axis: number of buffers. Curves: analysis and simulation for N = 32 and N = 128.]

Figure 6: Packet loss probability versus number of buffers. Uniform traffic, p = 0.85.

[Figure 7 plot. Horizontal axis: number of BOMs.]

where |S| and |A| are the sizes of S and A, respectively. The load $p_B$ for each output port in B is

[equation (14) is only partly legible in the source: $p_B = (1 - r)p \ldots,\ 0 \le r \ldots$]

Figure 7: Packet loss probability versus number of BOMs. Uniform traffic, N = 128.

[Figure 8 plot. Horizontal axis: offered load, 0.2 to 1.0. Legend includes r = 0.7.]

Figure 8: Packet loss probability versus offered load p. Non-uniform traffic, N = 128, B = 4.

Fig. 8 shows simulation results of the packet loss probabilities versus offered load. $P_A$ and $P_B$ are the packet loss probabilities for group A and group B, respectively. The results reveal that in StarPlus the packet loss ratio of lightly loaded switch ports is not affected by the traffic from heavily loaded switch ports. This verifies that the StarPlus shares buffer space fairly among switch ports.

4 Implementation Issues

The block diagram of the BOM is shown in Fig. 9. A BOM can be constructed from a 2N × 2N merger, 2N comparators, one trap network, two pseudo output generators, and two concentrators.

Figure 9: The block diagram of the BOM.

Since packets fed from the upper input, $I_U$, and the lower input, $I_L$, are already sorted, they can be merged into a single sorted list by a merger. The merger can be constructed from a Banyan network of 2 × 2 sorting nodes. The comparator at the i-th line compares the destination address of the packet at the (i − 1)-th line with that of the packet at the i-th line. If the destination addresses are different, the T (tag) field of the packet at the i-th line is set to 1. The T field of a packet is set only if the packet should be forwarded to the upper output, $O_U$. So far, the packets tagged and to be forwarded to $O_U$ have no output conflict.

In some cases, the number of packets not yet tagged may exceed N and hence exceed the capacity of the lower output, $O_L$. Therefore, we should trap some of the untagged packets that originally came from $I_U$ and forward them to $O_U$. The trap network is in charge of doing this. The packets that come from $I_U$ are marked by setting the F field in advance. The trap network can be constructed from two running adders and some combinational circuits, as shown in Fig. 10. Let the value of the A field equal the value of the T field, and the value of the B field equal $F \land \overline{T}$. The SA field and the SB field are the running sums of the A field and the B field, respectively. Then the TA field contains the total number of tagged packets. The SB field is the order of the packet that is not tagged but comes from $I_U$. Therefore, we may trap more packets by setting $T = T \lor ((TA + SB)) \land B$.

Figure 10: The trap network and an example.

An example of the CROM operation done by the BOM is shown in Fig. 11. The pseudo output generators are constructed from 2N × 2N running adders. The first pseudo output generator applies only to the tagged packets (T = 1). The second applies only to the untagged packets (T = 0). The concentrator networks are constructed from reverse Banyan networks. It is known that the reverse Banyan network is non-blocking if the destination addresses of the input packets are compacted and in sorted order [9]. The buffer management procedure can be performed by the BOM and some extra circuits that control the multiplexers and demultiplexers, as shown in Fig. 12. The hardware complexities of the running adders and the mergers are all $O(N \log N)$. Therefore, the hardware


through the copy network, the StarPlus may be very suitable for such an application. However, the traffic model at the output end of the copy network needs to be established, and the overall system performance requires further evaluation.

References

[1] A. E. Eckberg and T.-C. Hou, “Effect of output buffer sharing on buffer requirements in an ATDM packet switch,” in Proc. IEEE INFOCOM’88, pp. 459–466, Mar. 1988.

Figure 11: A CROM example.

[2] Y. S. Yeh, M. G. Hluchyj, and A. S. Acampora, “The knockout switch: A simple, modular architecture for high-performance packet switching,” IEEE J. Select. Areas Commun., vol. SAC-5, pp. 1274–1283, Oct. 1987.

[3] A. Huang and S. Knauer, “Starlite: A wideband digital switch,” in Proc. IEEE GLOBECOM’84, pp. 121–125, 1984.

[Figure 12 panels: Step 1. Shift; Step 2. CROM; Step 3. CROM.]

[4] M. J. Karol, M. G. Hluchyj, and S. P. Morgan, “Input versus output queueing on a space-division packet switch,” IEEE Trans. Commun., vol. COM-35, pp. 1347–1356, Dec. 1987.


[5] W.-T. Chen, H.-J. Liu, and Y.-T. Tsay, “High-throughput cell scheduling for broadband switching systems,” IEEE J. Select. Areas Commun., vol. 9, pp. 1510–1523, Dec. 1991.

Figure 12: Performing buffer management with the BOM.

complexity of the BOM is $O(N \log N)$. The hardware complexity of the Batcher sorter is $O(N \log^2 N)$. The hardware complexity of the StarPlus is clearly dominated by the Batcher sorter and hence is $O(N \log^2 N)$.
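The $O(N \log^2 N)$ growth of the sorter can be made concrete by generating a sorting network and counting its comparators. The sketch below assumes the bitonic variant of Batcher's sorter [7] and a power-of-two port count; the text does not say which variant the StarPlus uses, and the function names are introduced here for illustration only.

```python
def bitonic_comparators(n):
    """Comparators (i, j, ascending) of a bitonic sorting network.

    n must be a power of two; i < j in every comparator.
    """
    comps = []
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # distance between the two compared lines
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    comps.append((i, partner, (i & k) == 0))
            j //= 2
        k *= 2
    return comps


def apply_network(values, comps):
    """Run a list of values through the comparator network."""
    out = list(values)
    for i, j, ascending in comps:
        # Ascending comparators put the smaller value on line i,
        # descending comparators put the larger value on line i.
        if (out[i] > out[j]) == ascending:
            out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    for n in (8, 32, 128):
        print(n, len(bitonic_comparators(n)))
```

For N = 2^k lines the network has k(k + 1)/2 stages of N/2 comparators each, so the count is (N/4) log₂N (log₂N + 1), which grows as N log² N.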

[6] M. G. Hluchyj and M. J. Karol, “Queueing in high-performance packet switching,” IEEE J. Select. Areas Commun., vol. 6, pp. 1587–1597, Dec. 1988.

5 Conclusions and Future Works

[7] K. Batcher, “Sorting networks and their applications,” in Proc. AFIPS, pp. 307–314, 1968.

In this paper a high-performance shared-buffer ATM switch, named StarPlus, is proposed. In StarPlus, buffer space is shared completely by all the switch ports to achieve very high buffer utilization. A fair buffer sharing strategy is employed to avoid buffer hogging and unfair allocation under non-uniform traffic patterns. The performance of the StarPlus switch is evaluated by both analysis and simulation. The evaluation results reveal that buffer utilization in StarPlus is fair and high. Although extra hardware is required to implement the buffer management scheme, the packet loss probability and system throughput in StarPlus can both be improved when the offered load is heavy and the traffic is biased. Throughout the simulation study, we find that the proposed buffer management scheme does not cause packets to be delivered out of sequence. However, a formal proof is still required. The StarPlus can be extended to a multicast ATM switch by adding a copy network [8,10] in front of the Batcher sorter. Since output contention is very likely to occur after passing

[8] T. T. Lee, “Nonblocking copy networks for multicast packet switching,” IEEE J. Select. Areas Commun., vol. 6, pp. 1455–1467, Dec. 1988.

[9] H. S. Kim and A. Leon-Garcia, “Nonblocking property of reverse banyan networks,” IEEE Trans. Commun., vol. 40, pp. 472–476, Mar. 1992.

[10] X. Liu and H. T. Mouftah, “Design of a high performance nonblocking copy network for multicast ATM switching,” IEE Proc.-Commun., vol. 141, pp. 317–324, Oct. 1994.
