
Performance Modelling and Analysis of Interconnection Networks with Spatio-Temporal Bursty Traffic

Geyong Min1, Yulei Wu1, Mohamed Ould-Khaoua2, Hao Yin3, and Keqiu Li4

1 School of Computing, Informatics and Media, University of Bradford, Bradford, BD7 1DP, UK
2 Department of Computing Science, University of Glasgow, Glasgow, G12 8RZ, UK
3 Department of Computer Science, Tsinghua University, 100084, Beijing, China
4 Department of Computer Science and Engineering, Dalian University of Technology, P.R. China
Email: {g.min, y.l.wu}@brad.ac.uk, [email protected], [email protected], [email protected]

Abstract- The k-ary n-cube, which has an n-dimensional grid structure with k nodes in each dimension, has been a popular topology for interconnection networks. Analytical models for k-ary n-cubes have been widely reported under the assumptions that the message destinations are uniformly distributed over all network nodes and that the message arrivals follow a non-bursty Poisson process. Recent studies have convincingly demonstrated that the traffic pattern in interconnection networks is bursty in both the spatial domain (i.e., non-uniform distribution of message destinations) and the temporal domain (i.e., bursty message arrival process). With the aim of capturing the characteristics of realistic traffic patterns and obtaining a comprehensive understanding of the performance behaviour of interconnection networks, this paper presents a new analytical model for k-ary n-cubes in the presence of spatio-temporal bursty traffic. The accuracy of the model is validated through extensive simulation experiments of an actual system.

1. INTRODUCTION

The k-ary n-cube, where k is referred to as the radix and n as the dimension, has been a popular topology for interconnection networks owing to its desirable properties, such as ease of implementation, recursive structure, and the ability to exploit communication locality to reduce message latency [5]. A k-ary n-cube network can be obtained by adding wrap-around connections to the edge and corner nodes of an n-dimensional mesh network [2].

The traffic pattern has a significant impact on the performance of interconnection networks. The arrival process and the destination distribution of messages are the most important parameters used to define the traffic pattern [5]. A number of recent measurement studies have convincingly demonstrated that the message arrival process generated by many real-world applications is bursty and that message destinations are non-uniformly distributed [3, 11]. Such traffic patterns have significantly different theoretical properties from the conventional non-bursty message arrival process and the uniform distribution of message destinations [9, 13]. To obtain a comprehensive understanding of network performance under more realistic working conditions, it is important to take into account the bursty nature of traffic in both the temporal and spatial domains, characterised by the bursty message arrival process and the non-uniform distribution of message destinations, respectively.

Many performance models for interconnection networks have been reported [1, 4]. Most of these models are based on the assumptions that the message destinations are uniformly distributed over all network nodes and that the message arrivals follow a non-bursty Poisson process. Several recent studies have developed analytical models that handle either non-uniformly distributed message destinations [14] or bursty message arrivals [9], but only separately. To the best of our knowledge, no study reported in the current literature handles traffic burstiness in the temporal and spatial domains simultaneously.

With the aim of capturing the characteristics of realistic traffic patterns and obtaining a comprehensive understanding of the performance behaviour of interconnection networks, this study presents a new analytical performance model for k-ary n-cube networks in the presence of spatio-temporal bursty traffic. The model adopts the well-known Markov-modulated Poisson process (MMPP) [6, 9] to model traffic burstiness in the temporal domain. Moreover, the hot-spot destination model proposed by Pfister and Norton [11] is used to capture traffic burstiness in the spatial domain. Because traffic is non-uniformly distributed over the network channels, the traffic characteristics on these channels are determined hop by hop with respect to the distance to the hot-spot node. The proposed model employs an MMPP/G/1 queueing system with infinite buffer capacity to compute the average degree of virtual channel multiplexing [1]. Extensive simulation experiments are conducted to validate the accuracy of the model.

The rest of this paper is organised as follows. Section 2 reviews some useful preliminaries. Section 3 presents the derivation of the analytical model. Section 4 validates the accuracy of the model through extensive simulation experiments. Finally, Section 5 concludes this study.

2. PRELIMINARIES

2.1. Interconnection Networks

The k-ary n-cube [5] has $N$ nodes, where $N = k^n$. Each node consists of a processing element (PE) and a router. The router of a node is connected to its neighbouring nodes through n incoming and n outgoing channels. In addition, the router is connected to its local PE through injection and ejection channels. To improve network performance, each physical channel is partitioned into $V$ virtual channels, which share the bandwidth of the physical channel in a time-multiplexed fashion. The incoming and outgoing channels are connected by an $(n+1)V$-way crossbar switch, which can simultaneously connect multiple input channels to multiple output channels in the absence of channel contention.

Pipelined circuit switching (PCS) [13] has been suggested as an efficient message switching method that meets both the performance and the fault-tolerance demands of interconnection networks. In PCS, a message consists of flits for transmission and flow control. The header flit (containing the routing information) needs to reserve the path between its source and destination before the remaining flits of the message start to move along this path in a pipelined fashion. If the header cannot progress upon reaching an intermediate router because the required channel is busy or faulty, it can release the last reserved channel by backtracking to the preceding node and then try alternative paths. Once the header reaches its destination, an acknowledgement flit is sent back via the reserved path to the source.

978-1-4244-4148-8/09/$25.00 ©2009 This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2009 proceedings.

2.2. Modelling Traffic Burstiness in Temporal Domain

The MMPP has been widely adopted to model the message arrival behaviour of bursty traffic in the temporal domain [6, 9], since it captures the time-varying arrival rate and the important correlations between inter-arrival times while remaining analytically tractable. The two-state $MMPP_s$ (the subscript $s$ denoting the traffic generated by source nodes) used to capture traffic burstiness in the temporal domain can be characterised by the infinitesimal generator $Q_s$ of the underlying Markov chain and the rate matrix $\Lambda_s$ as

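$$ Q_s = \begin{bmatrix} -\varphi_{s1} & \varphi_{s1} \\ \varphi_{s2} & -\varphi_{s2} \end{bmatrix} \quad \text{and} \quad \Lambda_s = \begin{bmatrix} \lambda_{s1} & 0 \\ 0 & \lambda_{s2} \end{bmatrix} \tag{1} $$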

where $\varphi_{s1}$ is the transition rate from state 1 to state 2 and $\varphi_{s2}$ is the transition rate from state 2 to state 1. $\lambda_{s1}$ and $\lambda_{s2}$ are the traffic rates when the Markov chain is in state 1 and state 2, respectively.

2.3. Modelling Traffic Burstiness in Spatial Domain

The hot-spot model captures the non-uniform distribution of message destinations that arises when a number of nodes direct a fraction of their messages to a single hot-spot node [10]. This node receives a larger amount of traffic than the other network nodes, which causes higher traffic loads on the network channels located closer to it. There is strong evidence that hot-spot traffic exists and that it has a great effect on network performance. The hot-spot model proposed in [11] is used to generate the non-uniform distribution of message destinations in this study. Specifically, each message has probability $\delta$ of being directed to the hot-spot node (a hot-spot message) and probability $(1 - \delta)$ of being evenly directed to all network nodes (a regular message).

2.4. Assumptions

The model is based on the following assumptions [1, 4, 9-11, 13, 14, 16]:
a) source nodes generate traffic according to an $MMPP_s$ model, which is characterised by the infinitesimal generator $Q_s$ and the rate matrix $\Lambda_s$ given by Eq. (1);
b) message destinations are non-uniformly distributed over the network nodes, as described in Section 2.3;
c) the message length is $m$ flits, where $m$ is a random variable with the Laplace-Stieltjes transform (LST) $F_m^*(s)$, and each flit requires one cycle of transmission time to cross a physical channel;
d) messages are routed adaptively through the network using PCS.
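As an illustration of assumption a), the following minimal Python sketch builds the two matrices of Eq. (1) and computes the stationary vector and mean arrival rate of a two-state MMPP. The class name `TwoStateMMPP`, its method names, and the value chosen for the state-1 arrival rate are assumptions made for this example only; the transition rates are borrowed from the caption of Fig. 1(a).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoStateMMPP:
    """Two-state MMPP in the notation of Eq. (1) (hypothetical helper class)."""
    phi1: float   # transition rate from state 1 to state 2
    phi2: float   # transition rate from state 2 to state 1
    lam1: float   # arrival rate while the chain is in state 1
    lam2: float   # arrival rate while the chain is in state 2

    def generator(self):
        """Infinitesimal generator Q_s; every row sums to zero."""
        return [[-self.phi1, self.phi1],
                [self.phi2, -self.phi2]]

    def rate_matrix(self):
        """Diagonal rate matrix Lambda_s."""
        return [[self.lam1, 0.0],
                [0.0, self.lam2]]

    def stationary(self):
        """Vector pi with pi Q_s = 0 and pi_1 + pi_2 = 1 (two-state closed form)."""
        total = self.phi1 + self.phi2
        return (self.phi2 / total, self.phi1 / total)

    def mean_rate(self):
        """Long-run mean arrival rate pi_1 * lam1 + pi_2 * lam2."""
        p1, p2 = self.stationary()
        return p1 * self.lam1 + p2 * self.lam2


# phi values as in Fig. 1(a); lam1 is an arbitrary illustrative load, lam2 = 0.
source = TwoStateMMPP(phi1=0.7, phi2=0.35, lam1=0.0009, lam2=0.0)
print("stationary vector:", source.stationary())
print("mean arrival rate:", source.mean_rate())
```

With these values the chain spends one third of the time in the active state 1, so the mean rate is one third of `lam1`.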

3. DERIVATION OF THE MODEL

The communication delay is a key performance metric used to evaluate an interconnection network. It consists of: a) the network latency, $T$, which is the mean time for a message to cross the network; and b) the delay experienced by a message at the source node, $W_s$. To model the effects of virtual channel multiplexing, the delay has to be scaled by a factor, $\bar{V}$, representing the average degree of virtual channel multiplexing that takes place at a given physical channel [10]. Thus, the communication delay can be written as

$$ \text{Delay} = (T + W_s)\,\bar{V} \tag{2} $$

In what follows, we derive the expressions for $T$, $W_s$, and the average degree of virtual channel multiplexing, $\bar{V}$, in turn.

3.1. Network Latency for Regular Messages and Hot-Spot Messages

The average number of hops that a regular message makes within one dimension is $k_r = (k - 1)/2$, and within the network is $d_r = n k_r$ [2]. Hot-spot messages, on the other hand, can make from 0 to $(k - 1)$ hops in a given dimension, and $j$ ($1 \le j \le n(k - 1)$) hops within the network. Owing to the symmetry of the k-ary n-cube and the use of adaptive routing, regular messages are destined evenly across the network. Thus, the network latency of a regular message, $t_r$, is the sum of the actual message transmission time, $(m + d_r)$, and the time to set up a path:

$$ t_r = m + d_r + c_r \tag{3} $$

where $c_r$ is a random variable representing the path set-up time for a regular message and will be derived in Section 3.2. Unlike regular messages, hot-spot messages are non-uniformly distributed over the network: a number of nodes direct a fraction of their messages to the hot-spot destination node. As a result, to compute the network latency for a hot-spot message we must consider the different path set-up and transmission times of messages generated at different distances from the hot-spot node. Let $t_{h_j}$ denote a random variable representing the network latency for a j-hop hot-spot message. Based on Eq. (3), $t_{h_j}$ can be given by

$$ t_{h_j} = m + j + c_{h_j}, \qquad 1 \le j \le n(k - 1) \tag{4} $$

where $c_{h_j}$ is a random variable that denotes the path set-up time for a j-hop hot-spot message and will be obtained in Section 3.2. Since the Laplace transform of the sum of independent random variables equals the product of their transforms [8], the LSTs of $t_r$ and $t_{h_j}$ can be expressed as

$$ F_{t_r}^*(s) = F_m^*(s)\, e^{-s d_r}\, F_{c_r}^*(s) \quad \text{and} \quad F_{t_{h_j}}^*(s) = F_m^*(s)\, e^{-s j}\, F_{c_{h_j}}^*(s) \tag{5} $$

where $F_{c_r}^*(s)$ and $F_{c_{h_j}}^*(s)$ denote the LSTs of the path set-up time for regular messages and j-hop hot-spot messages, respectively. These two quantities will be given in Section 3.2.

The probability, $P_j$, that a message needs to make $j$ hops to reach its destination is $P_j = Nodes_j / (N - 1)$, where $Nodes_j$ is the number of nodes located $j$ hops away from a given node. Given that a j-hop message may make from 0 up to $(k - 1)$ hops in each of the $n$ dimensions, $Nodes_j$ can be determined by $Nodes_j = G_0^{k-1}(j, n)$, where $G_r^{r+q-1}(u, c)$ denotes the number of ways to distribute $u$ hops into $c$ dimensions with at least $r$ hops and at most $(r + q - 1)$ hops in each dimension. It is given by the following expression obtained from combinatorial theory [12]:

$$ G_r^{r+q-1}(u, c) = \sum_{A=0}^{c} (-1)^A\, C_c^A\, C_{u - cr - qA + c - 1}^{c-1} \tag{6} $$

where $C_c^A$ denotes the number of ways to choose $A$ objects out of a set of $c$ members. Averaging over all possible hop counts $j$ made by a hot-spot message yields the network latency for hot-spot messages as $t_h = \sum_{j=1}^{n(k-1)} P_j\, t_{h_j}$. The mean network latency is the sum of the network latencies for regular and hot-spot messages weighted by their respective probabilities:

$$ T = (1 - \delta)\, t_r + \delta\, t_h \tag{7} $$
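The counting argument behind Eq. (6) and the hop distribution $P_j$ can be checked numerically. The sketch below is a minimal Python illustration, not the authors' implementation; the function names are hypothetical, binomial coefficients with a negative upper index are taken to be zero, and a brute-force enumeration is included only as a cross-check.

```python
from itertools import product
from math import comb


def bounded_compositions(u, c, r, q):
    """G_r^{r+q-1}(u, c) of Eq. (6): ways to split u hops over c dimensions
    with at least r and at most r + q - 1 hops in each dimension."""
    total = 0
    for a in range(c + 1):
        top = u - c * r - q * a + c - 1
        if top < 0:  # a binomial with a negative upper index contributes 0
            continue
        total += (-1) ** a * comb(c, a) * comb(top, c - 1)
    return total


def nodes_at_distance(j, k, n):
    """Nodes_j = G_0^{k-1}(j, n): nodes exactly j hops from a given node."""
    return bounded_compositions(j, n, r=0, q=k)


def hop_distribution(k, n):
    """P_j = Nodes_j / (N - 1) for j = 1 .. n(k - 1), where N = k**n."""
    n_nodes = k ** n
    return {j: nodes_at_distance(j, k, n) / (n_nodes - 1)
            for j in range(1, n * (k - 1) + 1)}


def brute_force_nodes(j, k, n):
    """Enumerate all per-dimension hop counts; used only as a check."""
    return sum(1 for hops in product(range(k), repeat=n) if sum(hops) == j)


k, n = 4, 2
print([nodes_at_distance(j, k, n) for j in range(1, n * (k - 1) + 1)])
# prints [2, 3, 4, 3, 2, 1]
```

For the 4-ary 2-cube the six counts add up to 15, that is $N - 1$, so the probabilities $P_j$ sum to one.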

3.2. The Laplace-Stieltjes Transform of Path Set-Up Time for Regular Messages and Hot-Spot Messages

Before deriving the LSTs of the path set-up time for regular and hot-spot messages, we need to compute the time they take to establish a path. Let us first consider regular messages. With the use of PCS, the header needs to make, on average, $d_r$ hops to establish a connection in the absence of blocking, and the same number of hops is needed to send an acknowledgement back. The header requires one cycle of transmission time to move from one node to the next and, according to the behaviour of PCS, two extra cycles of transmission time when it encounters blocking. Based on the above analysis, the mean time to establish a path is

$$ c_r = 2\left( d_r + \sum_{i=0}^{d_r - 1} 1 \times P_{b_i^r} \right) \tag{8} $$

where $P_{b_i^r}$ is the probability that the header of a regular message suffers blocking after making $i$ hops. A header flit is blocked at a given network channel when all virtual channels of the remaining dimensions to be visited are busy. Since a regular message makes, on average, $k_r$ hops per dimension, the probability that a message terminates at the current dimension is $1/k_r$. Thus, the probability that a message has finished crossing, say, $A$ out of $n$ dimensions and needs to visit the remaining $(n - A)$ dimensions is $C_n^A (1 - 1/k_r)^{n-A} (1/k_r)^A$. Summing over all cases of $A$ ($0 \le A \le n - 1$) yields the probability, $P_{b_i^r,\,j}$, that the header of a regular message is blocked at its i-th hop channel located $j$ hops away from the hot-spot node:

$$ P_{b_i^r,\,j} = \sum_{A=0}^{n-1} C_n^A \left(1 - \frac{1}{k_r}\right)^{n-A} \left(\frac{1}{k_r}\right)^{A} \Omega_{V,j}^{\,n-A} \tag{9} $$

where $\Omega_{V,j}$ denotes the probability that all $V$ virtual channels at a given physical channel located $j$ ($1 \le j \le n(k - 1)$) hops away from the hot-spot node are busy (the calculation of $\Omega_{V,j}$ is given in Section 3.4). Since there are $N_{c_j}$ channels that are $j$ hops away from the hot-spot node and $nN$ output channels in the network, the probability, $P_{ch_j}$, that a channel is located $j$ hops away from the hot-spot node is $P_{ch_j} = N_{c_j}/(nN)$, where $N_{c_j}$ can be found as $N_{c_j} = \sum_{p=0}^{n-1} (n - p)\, C_n^p\, G_1^{k-1}(j, n - p)$ [14]. Therefore, the probability $P_{b_i^r}$ can be given by

$$ P_{b_i^r} = \sum_{j=1}^{n(k-1)} P_{ch_j}\, P_{b_i^r,\,j} \tag{10} $$

Given the expression for $c_r$ in Eq. (8), it is infeasible to find the exact distribution of the path set-up time for a regular message. For analytical tractability, we model this distribution by a Gamma distribution, since it can capture the first two moments of the distribution and has been adopted in related studies [8, 9]. Therefore, the LST of the path set-up time, $F_{c_r}^*(s)$, can be given by

$$ F_{c_r}^*(s) = \frac{\upsilon^{\theta}}{(\upsilon + s)^{\theta}} \tag{11} $$

where $\upsilon$ and $\theta$ are selected to match the mean and variance of the path set-up time for a regular message, and can be expressed as

$$ \upsilon = \frac{c_r}{(c_r - 2 d_r)^2} \quad \text{and} \quad \theta = c_r\, \upsilon \tag{12} $$

with the variance approximated by the method provided in [4].

When a j-hop hot-spot message is blocked at its i-th hop channel, it is $(i - 1)$ hops away from its source and $(j - i + 1)$ hops away from the hot-spot node. The average number of hops made by a j-hop hot-spot message in each dimension can be approximated as $j/n$ ($\min(1, j/n)$ should be used instead of $j/n$ to account for the case where $j < n$) [14]. Following the calculation of $c_r$, the mean time to establish a path, $c_{h_j}$, for a j-hop hot-spot message can be expressed as

$$ c_{h_j} = 2\left( j + \sum_{i=0}^{j-1} 1 \times P_{b_i^h,\,j-i+1} \right) \tag{13} $$


where $P_{b_i^h,\,j-i+1}$ is the probability that the header of a j-hop hot-spot message is blocked at its i-th hop channel and can be obtained as $P_{b_i^h,\,j-i+1} = \sum_{A=0}^{n-1} C_n^A (1 - n/j)^{n-A} (n/j)^{A}\, \Omega_{V,\,j-i+1}^{\,n-A}$. The LST of $c_{h_j}$, $F_{c_{h_j}}^*(s)$, can be given by $F_{c_{h_j}}^*(s) = \vartheta^{\sigma} / (\vartheta + s)^{\sigma}$, with $\vartheta$ and $\sigma$ matching the mean and variance of the path set-up time for a j-hop hot-spot message.

3.3. Traffic Characteristics on Network Channels

The load at network channels located $j$ hops away from the hot-spot node consists of both regular and hot-spot traffic, formed by regular and hot-spot messages, respectively. With the use of adaptive routing, regular messages are destined evenly across the network nodes, resulting in balanced loads on all channels. The header flit of a message visits, on average, $c_r/2$ channels to establish a connection between its source and destination, and each node has $n$ output channels. Thus, the amount of traffic arriving at each network channel is, on average, $\psi_r$ times that generated by a source node, where $\psi_r$ is given by

$$ \psi_r = \frac{N c_r / 2}{nN}\,(1 - \delta) = \frac{c_r}{2n}\,(1 - \delta) \tag{14} $$

In general, $\psi_r$ is not an integer, since it depends on the network size, the properties of the traffic generated by the source nodes, and the hot-spot fraction. We therefore partition $\psi_r$ into an integral part and a fractional part, denoted by $\psi_{Ir}$ and $\psi_{Fr}$, respectively. Given that the splitting of an MMPP gives rise to a new MMPP [6], let $MMPP_{Fr}$ denote the traffic flow split from $MMPP_s$ with the splitting probability $\psi_{Fr}$. The infinitesimal generator, $Q_{Fr}$, and rate matrix, $\Lambda_{Fr}$, of $MMPP_{Fr}$ can be given by

$$ Q_{Fr} = Q_s \quad \text{and} \quad \Lambda_{Fr} = \psi_{Fr}\, \Lambda_s \tag{15} $$
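The partitioning of $\psi_r$ and the splitting rule of Eq. (15) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the mean path set-up time of 27 cycles is an arbitrary input chosen only to produce a non-integer multiplier.

```python
import math


def regular_traffic_multiplier(mean_setup_time, n, delta):
    """psi_r of Eq. (14): (c_r / (2n)) * (1 - delta)."""
    return mean_setup_time / (2 * n) * (1 - delta)


def split_multiplier(psi):
    """Partition psi into its integral part psi_I and fractional part psi_F."""
    integral = math.floor(psi)
    return integral, psi - integral


def split_mmpp(q_s, lam_s, probability):
    """Eq. (15): splitting keeps the generator and scales the rate matrix."""
    scaled = [[probability * rate for rate in row] for row in lam_s]
    return q_s, scaled


# Hypothetical inputs: c_r = 27 cycles, n = 2 dimensions, delta = 0.2.
psi_r = regular_traffic_multiplier(27.0, n=2, delta=0.2)
psi_ir, psi_fr = split_multiplier(psi_r)

q_s = [[-0.7, 0.7], [0.35, -0.35]]
lam_s = [[0.001, 0.0], [0.0, 0.0]]
q_fr, lam_fr = split_mmpp(q_s, lam_s, psi_fr)
print("psi_r =", psi_r, "integral part =", psi_ir, "fractional part =", psi_fr)
print("Lambda_Fr =", lam_fr)
```

Here the integral part counts the full copies of the source flow that are later superposed, while the fractional part only thins the arrival rates and leaves the modulating chain unchanged.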

Since the superposition of multiple MMPPs is again an MMPP [6], let $MMPP_{Ir}$ denote the traffic determined by the superposition of $\psi_{Ir}$ traffic flows, each an $MMPP_s$. Using the method provided in Refs. [7] and [9], we can obtain the infinitesimal generator, $Q_{Ir}$, and rate matrix, $\Lambda_{Ir}$, of the superposed traffic flow $MMPP_{Ir}$. We calculate the Kronecker sum of the parameter matrices of $MMPP_{Fr}$ and $MMPP_{Ir}$ to parameterise the $MMPP_{cr}$ characterising the regular traffic arriving at a given network channel [6]. Thus, the infinitesimal generator, $Q_{cr}$, and rate matrix, $\Lambda_{cr}$, of $MMPP_{cr}$ can be given by

$$ Q_{cr} = Q_{Fr} \oplus Q_{Ir} \quad \text{and} \quad \Lambda_{cr} = \Lambda_{Fr} \oplus \Lambda_{Ir} \tag{16} $$
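The Kronecker sum in Eq. (16) is defined as $A \oplus B = A \otimes I_B + I_A \otimes B$. The following pure-Python sketch applies it to two small MMPPs; the helper names and the numerical values are assumptions made for illustration, and the second flow simply stands in for $MMPP_{Ir}$ without reproducing the method of Refs. [7] and [9].

```python
def identity(size):
    """Identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def kron_sum(a, b):
    """Kronecker sum: A (+) B = A (x) I_b + I_a (x) B, for square A and B."""
    left = kron(a, identity(len(b)))
    right = kron(identity(len(a)), b)
    return [[l + r for l, r in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]


# Illustrative parameters (assumed): a thinned flow and one full source flow.
q_fr = [[-0.7, 0.7], [0.35, -0.35]]
lam_fr = [[0.0004, 0.0], [0.0, 0.0]]
q_ir = [[-0.7, 0.7], [0.35, -0.35]]
lam_ir = [[0.001, 0.0], [0.0, 0.0]]

q_cr = kron_sum(q_fr, q_ir)        # 4 x 4 generator of the superposed flow
lam_cr = kron_sum(lam_fr, lam_ir)  # 4 x 4 diagonal rate matrix

for row in q_cr:
    print(row)
print([lam_cr[i][i] for i in range(4)])
```

Each state of the superposed process is a pair of states of the two component chains, which is why the rows of the resulting generator still sum to zero and each diagonal rate is the sum of the two component rates.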

The hot-spot traffic is non-uniformly distributed over the network channels. According to the properties of such a traffic pattern, the traffic characteristics on all network channels that are $j$ ($1 \le j \le n(k - 1)$) hops away from the hot-spot node are identical. Since there are $N_{c_j}$ channels located $j$ hops away from the hot-spot node, and the number of nodes that are at least $j$ hops away from a given node can be expressed as $N_{\ge j} = N - \sum_{i=0}^{j-1} Nodes_i$, the hot-spot traffic arriving at these network channels is equal to $\psi_{h_j}$ times the traffic generated by a source node, where $\psi_{h_j}$ is given by

$$ \psi_{h_j} = \frac{N_{\ge j}}{N_{c_j}}\, \delta \tag{17} $$
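To make Eq. (17) concrete, the sketch below evaluates $N_{c_j}$, $N_{\ge j}$ and $\psi_{h_j}$ for a small network. It is a minimal Python illustration with hypothetical function names; it re-implements Eq. (6) so that it is self-contained, treats binomials with a negative upper index as zero, and uses the expression for $N_{c_j}$ quoted from [14] in Section 3.2.

```python
from math import comb


def bounded_compositions(u, c, r, q):
    """G_r^{r+q-1}(u, c) of Eq. (6)."""
    total = 0
    for a in range(c + 1):
        top = u - c * r - q * a + c - 1
        if top < 0:  # a binomial with a negative upper index contributes 0
            continue
        total += (-1) ** a * comb(c, a) * comb(top, c - 1)
    return total


def nodes_at_distance(j, k, n):
    """Nodes_j = G_0^{k-1}(j, n); Nodes_0 = 1 is the node itself."""
    return bounded_compositions(j, n, r=0, q=k)


def channels_at_distance(j, k, n):
    """N_cj = sum_{p=0}^{n-1} (n - p) C(n, p) G_1^{k-1}(j, n - p)."""
    return sum((n - p) * comb(n, p)
               * bounded_compositions(j, n - p, r=1, q=k - 1)
               for p in range(n))


def nodes_at_least(j, k, n):
    """N_{>=j} = N - sum_{i=0}^{j-1} Nodes_i, with N = k**n."""
    return k ** n - sum(nodes_at_distance(i, k, n) for i in range(j))


def hot_spot_multiplier(j, k, n, delta):
    """psi_hj of Eq. (17)."""
    return nodes_at_least(j, k, n) / channels_at_distance(j, k, n) * delta


k, n, delta = 4, 2, 0.2  # illustrative 4-ary 2-cube with 20% hot-spot traffic
for j in range(1, n * (k - 1) + 1):
    print(j, channels_at_distance(j, k, n), nodes_at_least(j, k, n),
          hot_spot_multiplier(j, k, n, delta))
```

In this example the multiplier is largest for the channels one hop from the hot-spot node, in line with the observation in Section 2.3 that channels closer to the hot-spot node carry higher loads.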

The parameter matrices of $MMPP_{ch_j}$, characterising the hot-spot traffic arriving at a network channel located $j$ hops away from the hot-spot node, can be obtained according to Eqs. (14)-(16). The superposition of these two types of traffic yields $MMPP_{c_j}$, representing the superposed traffic load at network channels that are $j$ hops away from the hot-spot node.

Owing to the symmetry of the k-ary n-cube topology, the use of adaptive routing distributes regular traffic evenly across the network channels. Thus, the mean service time experienced by a regular message is identical on each channel and is equal to its network latency, $t_r$. In the presence of hot-spot traffic, the service time experienced by a hot-spot message varies from one network channel to another, depending on the channel's location with respect to the hot-spot node. Taking both the regular and hot-spot traffic into account with their appropriate weights yields the service time at network channels located $j$ hops away from the hot-spot node as

$$ T_{c_j} = \frac{\lambda_{cr}}{\lambda_{c_j}}\, t_r + \frac{\lambda_{ch_j}}{\lambda_{c_j}}\, t_{h_j} \tag{18} $$

where $\lambda_{cr}$, $\lambda_{ch_j}$, and $\lambda_{c_j}$ are the arrival rates of the regular traffic, the hot-spot traffic, and the superposed traffic on network channels located $j$ hops away from the hot-spot node, respectively.

3.4. Average Degree of Virtual Channel Multiplexing

The probability $\Omega_{v,j}$ that $v$ ($0 \le v \le V$) virtual channels at a given physical channel located $j$ ($1 \le j \le n(k - 1)$) hops away from the hot-spot node are busy can be determined using the probability that there are $v$ packets in an MMPP/G/1 queueing system [1]. Specifically, the probability that $v$ virtual channels are busy, $0 \le v \le V - 1$, corresponds to the probability that there are $v$ packets in the queue, while the probability that all virtual channels are busy is the sum of the probabilities that there are $v$, $V \le v \le \infty$, packets in the queue. Therefore, $\Omega_{v,j}$ can be given by [15]

$$ \Omega_{v,j} = \begin{cases} \pi_{c_j} (I_c - R_j)\, R_j^{\,v}\, e_c, & 0 \le v \le V - 1 \\ \pi_{c_j}\, R_j^{\,V}\, e_c, & v = V \end{cases} \tag{19} $$

where the matrix $R_j$ can be computed by solving the quadratic matrix equation

$$ A_j + R_j B_j + R_j^2 C_j = 0 \tag{20} $$

with $A_j = \Lambda_{c_j}$, $B_j = Q_{c_j} - \Lambda_{c_j} - I_c / T_{c_j}$, and $C_j = I_c / T_{c_j}$. The algorithm used to compute the matrix $R_j$ can be found in [15]. $I_c$ is the identity matrix and $e_c$ is the column unit vector (all entries equal to 1), both of the same size as $\Lambda_{c_j}$. $\pi_{c_j}$ is the steady-state vector of $MMPP_{c_j}$ satisfying the following equations:

$$ \pi_{c_j} Q_{c_j} = 0 \quad \text{and} \quad \pi_{c_j} e_c = 1 \tag{21} $$

Solving these equations yields the steady-state vector, $\pi_{c_j}$, as $\pi_{c_j} = \mathrm{A}\,(I_c - \mathrm{B} + e_c \mathrm{A})^{-1}$, where $\mathrm{B} = I_c + Q_{c_j} / \min_i \{Q_{c_j}(i, i)\}$ and $\mathrm{A}$ is an arbitrary row vector of $\mathrm{B}$ [6]. With the probability $\Omega_{v,j}$, we can calculate the average degree of virtual channel multiplexing, $\bar{V}$, based on the method presented in [2] and [13].

3.5. Waiting Time at Source Nodes

Messages generated by each source node follow an $MMPP_s$ and can enter the network through any of the $V$ injection virtual channels with equal probability, $1/V$. Let $MMPP_v$ denote the traffic arriving at an injection virtual channel at the source node. The infinitesimal generator, $Q_v$, and rate matrix, $\Lambda_v$, of $MMPP_v$ can be determined from the splitting of $MMPP_s$ with the splitting probability $1/V$. A regular message injected from a source node located $j$ hops away from the hot-spot node experiences a network latency of $t_r$, whereas a hot-spot message experiences a latency of $t_{h_j}$. Taking both the regular and hot-spot messages into consideration with their appropriate weights yields the service time for a message originating from a source node located $j$ hops away from the hot-spot node as $t_{s_j} = (1 - \delta)\, t_r + \delta\, t_{h_j}$. The LST of the service time, $t_{s_j}$, can be written as

$$ F_{t_{s_j}}^*(s) = (1 - \delta)\, F_{t_r}^*(s) + \delta\, F_{t_{h_j}}^*(s) \tag{22} $$

To determine the waiting time, $W_{s_j}$, experienced by a message at a source node that is located $j$ hops away from the hot-spot node, the local queue is modelled as an MMPP/G/1 queueing system [6]. Thus, $W_{s_j}$ can be expressed as

$$ W_{s_j} = \frac{1}{2 \rho_{s_j} (1 - \rho_{s_j})} \left[ 2 \rho_{s_j} + \lambda_v\, t_{s_j}^{(2)} - 2\, t_{s_j} \left( (1 - \rho_{s_j})\, g_v + t_{s_j}\, \pi_v \Lambda_v \right) (Q_v + e_v \pi_v)^{-1}\, \hat{\lambda} \right] - \frac{\lambda_v\, t_{s_j}^{(2)}}{2 \rho_{s_j}} \tag{23} $$

where $t_{s_j}$ and $t_{s_j}^{(2)}$ represent the first two moments of the service time experienced by a message generated by a source located $j$ hops away from the hot-spot node, and can be computed from $F_{t_{s_j}}^*(s)$, given by Eq. (22). The traffic intensity is $\rho_{s_j} = t_{s_j} \lambda_v$, where $\lambda_v$ is the mean traffic rate arriving at an injection virtual channel and is equal to $\pi_v \hat{\lambda}$. $\pi_v$ is the steady-state vector of $MMPP_v$, $\hat{\lambda} = \Lambda_v e_v$, and $e_v$ is the column unit vector of length 2. The algorithm used to compute $g_v$ can be found in [6]. Averaging over all possible values of $j$ yields the mean waiting time at a given source node as

$$ W_s = \sum_{j=1}^{n(k-1)} P_j\, W_{s_j} \tag{24} $$
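Both Section 3.4 and Section 3.5 rely on the steady-state vector of an MMPP. The pure-Python sketch below evaluates the closed-form expression of Section 3.4, reading the scaling matrix as $\mathrm{B} = I + Q / \min_i\{Q(i, i)\}$ and taking the first row of $\mathrm{B}$ as the arbitrary row $\mathrm{A}$, and then uses it to obtain $\lambda_v = \pi_v \hat{\lambda}$ for an injection virtual channel. The function names and all numerical values are assumptions made for this illustration.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(aug[row][c] * x[c] for c in range(row + 1, size))
        x[row] = (aug[row][size] - tail) / aug[row][row]
    return x


def steady_state(q):
    """pi = A (I - B + e A)^{-1}, with B = I + Q / min_i Q(i, i)."""
    size = len(q)
    scale = min(q[i][i] for i in range(size))  # most negative diagonal entry
    b = [[(1.0 if i == j else 0.0) + q[i][j] / scale for j in range(size)]
         for i in range(size)]
    a = b[0]  # any row of B may be used; the first is chosen here
    # M = I - B + e A, where e is a column of ones, so (e A)[i][j] = a[j].
    m = [[(1.0 if i == j else 0.0) - b[i][j] + a[j] for j in range(size)]
         for i in range(size)]
    # pi M = A is equivalent to M^T pi^T = A^T.
    m_t = [[m[i][j] for i in range(size)] for j in range(size)]
    return solve(m_t, a)


# Injection virtual channel: MMPP_s split with probability 1 / V, V = 10.
q_v = [[-0.7, 0.7], [0.35, -0.35]]
rates_v = [0.0009 / 10, 0.0]  # diagonal of Lambda_v, i.e. the vector lambda-hat
pi_v = steady_state(q_v)
lam_v = sum(p * r for p, r in zip(pi_v, rates_v))
print("pi_v =", pi_v)
print("lambda_v =", lam_v)
```

Multiplying `lam_v` by the mean service time $t_{s_j}$ would then give the traffic intensity $\rho_{s_j}$ that enters Eq. (23).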

4. VALIDATION OF THE MODEL

The above analytical model has been validated by means of a discrete event-driven simulator operating at the flit level. Extensive simulation experiments have been performed to validate the model. For the sake of illustration, Figs. 1 and 2 depict the communication delay predicted by the analytical model plotted against that provided by the simulator as a function of the generated traffic for the following cases: the network size is $N = 16^2$, $12^3$, and $2^8$ nodes; the message length is $m$ = 32, 48, 64, and 96 flits; the number of virtual channels is $V$ = 8 and 10 per physical channel; and the infinitesimal generator, $Q_s$, of $MMPP_s$ and the hot-spot fraction, $\delta$, are given in the figure captions, representing different degrees of traffic burstiness in the temporal and spatial domains. In these figures, the horizontal axis represents the traffic rate, $\lambda_{s1}$, at which a node injects messages into the network when the $MMPP_s$ is in state 1, while the vertical axis denotes the communication delay obtained. For the sake of clarity, we have deliberately set the arrival rate, $\lambda_{s2}$, in state 2 to zero; otherwise three-dimensional graphs would be needed to represent the results. The figures reveal that the communication delay obtained from the derived model closely matches that obtained from the simulation.

5. CONCLUSIONS

The traffic pattern in interconnection networks is bursty in both the temporal and spatial domains. This study has developed a new analytical model to investigate the communication delay in k-ary n-cube networks in the presence of bursty message arrivals with non-uniform destination distributions. The comparison between the analytical results and those obtained from extensive simulation experiments has shown that the derived analytical model possesses a good degree of accuracy for predicting the delay in interconnection networks of various sizes and under different operating conditions.

REFERENCES

[1] N. Alzeidi, A. Khonsari, M. Ould-Khaoua, and L.M. Mackenzie, "A New General Method to Compute Virtual Channels Occupancy Probabilities in Wormhole Networks," Journal of Computer and System Sciences, vol. 74, no. 6, pp. 1033-1042, 2008.
[2] W.J. Dally and B.P. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004.
[3] P.A. Dinda, "Design, Implementation, and Performance of an Extensible Toolkit for Resource Prediction in Distributed Systems," IEEE Trans. on Parallel and Distributed Systems, vol. 17, no. 2, pp. 160-173, 2006.
[4] J.T. Draper and J. Ghosh, "A Comprehensive Analytical Model for Wormhole Routing in Multicomputer Systems," Journal of Parallel & Distributed Computing, vol. 32, no. 2, pp. 202-214, 1994.
[5] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, 2003.
[6] W. Fischer and K. Meier-Hellstern, "The Markov-Modulated Poisson Process (MMPP) Cookbook," Performance Evaluation, vol. 18, no. 2, pp. 149-171, 1993.
[7] H. Heffes, "A Class of Data Traffic Processes-Covariance Function Characterization and Related Queueing Results," Bell System Technical Journal, vol. 59, no. 6, pp. 897-929, 1980.
[8] L. Kleinrock, Queueing Systems, vol. 1, New York, John Wiley, 1975.
[9] G. Min, M. Ould-Khaoua, D.D. Kouvatsos, and I.U. Awan, "A queuing model of dimension-ordered routing under self-similar traffic loads," in Proc. of IEEE International Conference on Parallel and Distributed Processing Symposium (IPDPS'04), CD-ROM, 2004.
[10] M. Ould-Khaoua and H. Sarbazi-Azad, "An Analytical Model of Adaptive Wormhole Routing in Hypercubes in the Presence of Hot Spot Traffic," IEEE Trans. on Parallel and Distributed Systems, vol. 12, no. 3, pp. 283-292, 2001.
[11] G.J. Pfister and V.A. Norton, "Hot-Spot Contention and Combining in Multistage Interconnection Networks," IEEE Trans. on Computers, vol. 34, no. 10, pp. 943-948, 1985.
[12] J. Riordan, An Introduction to Combinatorial Analysis, John Wiley & Sons, 1985.
[13] F. Safaei, A. Khonsari, M. Fathy, and M. Ould-Khaoua, "Pipelined Circuit Switching: Analysis for the Torus with Non-Uniform Traffic," Journal of Systems Architecture, vol. 54, no. 1-2, pp. 97-110, 2008.
[14] H. Sarbazi-Azad, M. Ould-Khaoua, and L.M. Mackenzie, "Analytical Modeling of Wormhole-Routed k-Ary n-Cubes in the Presence of Hot-Spot Traffic," IEEE Trans. on Computers, vol. 50, no. 7, pp. 623-634, 2001.
[15] H.-P. Schwefel, Performance Analysis of Intermediate Systems Serving Aggregated ON/OFF Traffic with Long-Range Dependent Properties, PhD Thesis, Technische Universität München, 2000.
[16] Y. Wu, G. Min, M. Ould-Khaoua, and H. Yin, "Analytical Modelling of Pipelined Circuit Switching with Bursty and Hot-Spot Traffic," in Proc. of the 10th IEEE International Conference on High Performance Computing and Communications (HPCC'08), IEEE Computer Society, Washington, DC, USA, pp. 470-477, 2008.

[Figure 1: plots not reproduced. Axes: Traffic Rate (messages/cycle) against Communication Delay (cycles). Panel (a): Model and Sim curves for m = 64 and m = 96. Panel (b): Model and Sim curves for m = 48 and m = 64.]

Fig. 1 Latency predicted by the model and simulation: (a) 16-ary 2-cubes, V = 10, ϕs1 = 0.7, ϕs2 = 0.35, δ = 0.2 and (b) 12-ary 3-cubes, V = 8, ϕs1 = 0.08, ϕs2 = 0.06, δ = 0.05.

[Figure 2: plots not reproduced. Axes: Traffic Rate (messages/cycle) against Communication Delay (cycles). Panels (a) and (b): Model and Sim curves for m = 32 and m = 64.]

Fig. 2 Latency predicted by the model and simulation in the 2-ary 8-cube network with: (a) V = 10, ϕs1 = 0.9, ϕs2 = 0.45, δ = 0.2 and (b) V = 8, ϕs1 = 0.6, ϕs2 = 0.4, δ = 0.3.
