Power Managed Packet Switching

Aditya Dua∗, Benjamin Yolken† and Nicholas Bambos∗†
∗Department of Electrical Engineering, †Department of Management Science and Engineering,
Stanford University, Stanford CA 94305
Email: {dua,yolken,bambos}@stanford.edu

Abstract— High power dissipation in packet switches is fast turning into a key problem, owing to increasing line speeds and decreasing chip sizes. To address this issue, we introduce and develop the notion of a Power-Managed Input-Queued (PMIQ) switch in this paper. A PMIQ switch is an input-queued switch with an additional hierarchy of control to regulate the power dissipated by the switch. We formulate the joint scheduling and power management problem for a PMIQ switch as a dynamic program (DP). Leveraging intuition gained from provable structural properties of the optimal solution to the DP, we propose the Power-Aware Switch Scheduling (PASS) switch management policy. PASS adaptively selects the rate/speed at which the switch operates, in conjunction with the switch configuration, as a function of the backlogs of the input buffers. PASS is characterized by a single parameter, which can be tuned to trade off high power consumption against larger queuing delays. Simulation results show that PASS yields an attractive power-delay trade-off relative to the benchmark maximum weight matching (MWM) scheduling policy. Further, PASS is robust and amenable to implementation because it has low computational complexity and is agnostic to traffic arrival statistics.

Index Terms— Switching, Packet scheduling, Power management, Dynamic programming.

I. INTRODUCTION

The input-queued (IQ) packet switch architecture has been the subject of much attention in high-speed networking. Most initial work on switching focused on the output-queued (OQ) architecture, owing to its conceptual simplicity. While OQ switches can provide deterministic quality-of-service (QoS) guarantees [1], they are not attractive from an implementation perspective, owing to their high memory bandwidth requirements. The popularity of the IQ architecture stems from its low memory bandwidth and scalability. The low memory bandwidth of IQ switches is compensated for by employing intelligent scheduling/arbitration algorithms. The issues of designing high-throughput ([2], [3], etc.), QoS-aware ([4], etc.), and low-complexity ([5], etc.) scheduling algorithms for IQ switches have been extensively addressed in the literature. However, system-level design of switches often overlooks physical requirements like power consumption, which are typically addressed only at the circuit level. Power consumption is becoming an increasingly serious problem, especially for single-chip network routers, where the switch fabric contributes significantly to the overall power consumption [6], [7]. It was noted in [7] that power consumption increases almost linearly with switch throughput. As line speeds

keep increasing, the problem of high power consumption in switches will continue to gain prominence. Advances in VLSI design are being leveraged to densely pack digital electronic circuitry, resulting in unprecedented power densities (> 10 W/cm²). Consequently, issues like packaging and thermal cooling are fast becoming critical factors in router design. The need for thermally resilient packaging and sophisticated cooling techniques adds significantly to design and operational costs. In addition, excessive power dissipation may result in reduced reliability and, in extreme cases, physical damage to the system. High power density is a problem common to all system-on-chip (SoC) designs. Dynamic power management techniques have been addressed at the circuit level as well as the system level in the literature; see [8], [9] and references therein for an overview. System-level techniques, which are of primary interest to us here, curb power usage by employing dynamic power management (DPM) to adaptively vary the number of active system components at any time, or by employing dynamic voltage scaling (DVS) to adjust the processor clock frequency and supply voltage to meet instantaneous performance requirements. These techniques have been shown to provide a performance-power trade-off for applications as diverse as IEEE 802.11 MAC design and microprocessor design. Wassal et al. [10] studied traffic-aware system-level design of low-power VLSI switching fabrics in a non-linear optimization framework. Wang et al. [11] proposed three power-efficient router micro-architectures for network-on-chip (NoC) implementations and reported significant power savings over baseline architectures based on simulations. Simunic et al. [12] studied DVS and DPM for NoCs, based on renewal theory, in an optimization framework. Bambos et al.
[13] introduced the notion of a power managed packet switch (PMPS), which adjusts its operating characteristics to meet QoS requirements while seeking to minimize power consumption. Motivated by the relevance and severity of the high power consumption and density problem in packet switches, and various ideas propounded in the SoC/NoC paradigm for intelligent power management, we propose the Power-Managed Input-Queued (PMIQ) switch architecture in this paper. Our work is akin to [13] in spirit. The proposed PMIQ switch has two key components: the scheduler and the power manager. While the scheduler performs the task of selecting a switch configuration in every time-slot, the power manager is an added hierarchy of control which adaptively configures the speed mode of the switch based on loading conditions. The speed mode determines the rate at which packets are transported across the switching fabric. The highest speed mode corresponds to normal IQ switch operation. While the switch would typically be operated in high speed modes when heavily loaded to curtail excessive backlogs, there is strong motivation to operate the switch in lower speed modes under light loading conditions in order to reduce the power consumed by the switch. However, reduced power consumption comes at the expense of increased packet delays. Thus, inherent in a PMIQ switch is a fundamental power-delay trade-off. We quantify this trade-off in our paper and study properties of the scheduling and power management policy which sweeps the optimal trade-off curve.

A. Organization of the paper

In Section II, we formally expound the notion of a PMIQ switch and formulate the joint scheduling and power management (SPM) problem for the PMIQ switch within a dynamic programming (DP) framework [14]. In Section III, we focus on the canonical 2 × 2 switch. We transform the SPM problem for a 2 × 2 switch into an equivalent problem for a parallel queue single-server model and present structural properties of the optimal policy for the latter. We leverage intuition gained from these structural properties to propose the Power-Aware Switch Scheduling (PASS) policy for a 2 × 2 switch in Section IV. We also present an extension of PASS to bigger PMIQ switches. We evaluate the performance of PASS via simulations in Section V and contrast it to the benchmark maximum weight matching (MWM) scheduler. We provide concluding remarks in Section VI.

II. PROBLEM DEFINITION

A. The PMIQ switch architecture

In this section, we expound the notion of a Power-Managed Input-Queued (PMIQ) packet switch.
Consider an N × N switch with buffering only at the input ports and virtual output queues (VOQs) to prevent head-of-line (HOL) blocking. There are N VOQs at each input port, one corresponding to each output port. The VOQ containing packets destined from input port i to output port j is denoted Qij. The switch operates in slotted time. In each time-slot, at most one packet can be transferred from each input port to one of the output ports. Also, at most one packet can be delivered to each output port from one of the input ports. We refer to concurrent packet transfers which happen within a time-slot as a scheduling cycle. It is the task of the switch scheduler/arbiter to select a switch configuration (a matching between input and output ports) for every scheduling cycle. With each switch configuration, we associate an N²-length configuration vector. For an N × N switch, there are N! possible configuration vectors, each one corresponding to a perfect matching in the bipartite graph generated by the input

and output ports. We denote the ith configuration vector by vi. If vi(n) = 1, then input port ⌈n/N⌉ is connected to output port (n − 1) mod N + 1 when the ith switch configuration is exercised. Each configuration vector has exactly N unit entries, with the remaining entries being zero. For instance, there are two possible configurations for a 2 × 2 switch, corresponding to configuration vectors v1 = (1, 0, 0, 1) and v2 = (0, 1, 1, 0). Suppose that S consecutive time-slots are grouped together to form a super-slot, where S ∈ {1, 2, . . .}. Now, consider operating the switch in the following fashion: At the beginning of each super-slot, the scheduler selects a switch configuration and a speed mode k ∈ {1, . . . , S}. Speed mode k corresponds to exercising the chosen switch configuration k times within the super-slot (k scheduling cycles in S time-slots). Thus, k packets get transferred from the selected VOQs to the associated output ports during a super-slot. Speed mode S corresponds to "normal" switch operation, where one scheduling cycle occurs per time-slot. However, the switch configuration is reset only every S time-slots. We call a switch operated in this fashion a PMIQ switch. Each scheduling cycle dissipates energy in the switch. A PMIQ switch with S speed modes dissipates the most energy per time-slot (power) when operated in speed mode S and the least energy per time-slot when operated in speed mode 1. However, a scheduling cycle occurs every time-slot when speed mode S is used, as opposed to just one scheduling cycle every S time-slots when speed mode 1 is used. Thus, packets queued at the input ports will experience lower delay on average if the switch is operated in a higher speed mode. The foregoing discussion highlights the fundamental power-delay trade-off inherent in a PMIQ switch. Large packet delays can be traded for higher power dissipation in a PMIQ switch by appropriately selecting the speed mode in which the switch operates. B.
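The super-slot semantics described above can be sketched as follows. This is an illustrative fragment, not code from the paper; the representation of configuration vectors as 0/1 tuples and all names are our own choices:

```python
# Illustrative sketch of one super-slot of a PMIQ switch. `config` is an
# N^2-length 0/1 configuration vector and `k` is the speed mode, i.e. the
# number of scheduling cycles exercised within the super-slot.

def super_slot(backlogs, config, k):
    """Drain up to k packets from each VOQ selected by `config`."""
    assert len(backlogs) == len(config)
    return [max(b - k, 0) if c == 1 else b for b, c in zip(backlogs, config)]

# 2x2 example: v1 = (1, 0, 0, 1) connects input 1 -> output 1, input 2 -> output 2.
v1 = (1, 0, 0, 1)
b = [3, 1, 2, 5]             # backlogs of Q11, Q12, Q21, Q22
print(super_slot(b, v1, 2))  # -> [1, 1, 2, 3]
```

Speed mode k = 2 here removes two packets each from Q11 and Q22, while the unscheduled VOQs Q12 and Q21 are untouched.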
DP formulation for the PMIQ switch

Consider the following buffer draining problem for a PMIQ switch: The initial backlog vector (at time 0, say) is denoted b0 = (b^0_11, . . . , b^0_NN), where b^0_ij is the initial backlog of VOQ Qij. No further packet arrivals occur to any of the VOQs. The switch can be set in one of N! possible configurations, indexed by set I. The switch can be operated in one of S speed modes. A switch configuration and speed mode are selected by a scheduling and power management (SPM) algorithm at the beginning of each super-slot. Suppose that the SPM selects configuration vector vi and speed mode k, given the current backlog vector b. Recall that each speed mode is associated with a different rate of energy dissipation in the switch. A "power cost" P(k) is incurred for choosing speed mode k. A "backlog cost" B(b′) is incurred, where b′ is the new backlog vector at the end of the current super-slot and is given by b′ = (b − k vi)+. Here, (x)+ = max(x, 0), element-wise, and 0 is a vector with all zero entries. We assume that P(·) and B(·) are non-negative, strictly increasing, and convex functions. Thus,


where b′ = (b − k vi)+. The size of the action space for the above dynamic program is N!·S, which is extremely large even for moderately large N. The idea of solving (1) to obtain the optimal SPM decision in each backlog state, while theoretically sound, is impractical from an implementation perspective. It is thus imperative to re-formulate the scheduling and power management problem such that its solution leads to a low-complexity policy, amenable to implementation.

III. THE SINGLE-SERVER PROBLEM

A. Problem transformation

For ease of exposition and analysis, we focus initially on the canonical 2 × 2 IQ switch. Consider the buffer draining problem for a 2 × 2 switch. The switch can be set into one of two possible configurations — C1 (VOQs Q11 and Q22 are scheduled) or C2 (VOQs Q12 and Q21 are scheduled). See Fig. 1 for an illustration. Now, given the initial backlog vector b0, insert an appropriate number of "dummy packets" into the VOQs such that Q11 and Q22 have the same initial backlog (say, b^0_1), and Q12 and Q21 have the same initial backlog (say, b^0_2). In particular, b^0_1 = max(b^0_11, b^0_22) and b^0_2 = max(b^0_12, b^0_21). Next, consider a set of meta-packets generated by "fusing" the packets of Q11 and Q22, and another set by "fusing" the packets of Q12 and Q21. Correspondingly, visualize two meta-queues, Q′1 and Q′2, which contain these two sets of meta-packets, respectively. If the switch is set in configuration C1 and speed mode k, then k meta-packets of meta-queue Q′1 are transferred across the switch fabric in a super-slot. Likewise, if the switch is set in configuration C2 and speed mode k, then k meta-packets of meta-queue Q′2 are transferred. In effect, our construction transforms the buffer draining problem for a 2 × 2 IQ switch into a buffer draining problem for a system with two parallel meta-queues being served by a single time-multiplexed server.
While the optimal policies for the two problems are not identical in general (due to the introduction of dummy packets), we expect the single-server formulation to provide key insights into the fundamental

∗ We restrict our attention to policies which do not idle if at least one of the VOQs is non-empty at the beginning of a super-slot. Since the system dynamics are stationary, and the system reaches terminal state 0 in finite time with finite cost incurred per time-slot, an optimal stationary policy exists.


the total cost of selecting configuration vector vi and speed mode k, given the backlog vector b, is P(k) + B((b − k vi)+). A policy Π is a list of SPM decisions in all possible backlog states. The buffer draining problem can be formulated within a DP framework, where the objective is to compute the optimal stationary policy which drives the system from state b0 to state 0 incurring the minimum possible cost, given the cost structure described above∗. We denote the cost-to-go function in state b by V(b). By definition, V(b) is the cost incurred by the optimal policy in reaching state 0, starting in state b. V(b) and the optimal decision in each state can be computed recursively via the following DP equations:

V(b) = min_{i ∈ I} min_{k=1,...,S} {V(b′) + P(k) + B(b′)},   (1)

Fig. 1. Two possible configurations for a 2 × 2 switch — (a) C1, (b) C2.

scheduling and power management trade-offs for a 2 × 2 switch.

B. DP formulation for the single-server model

We now focus on the buffer draining problem for a single-server system with two meta-queues, constructed in Section III-A. In line with our discussion in Section II-B, we can formulate the problem in a DP framework. The two possible configuration vectors in this case are v1 = (1, 0) and v2 = (0, 1), corresponding to scheduling Q′1 and Q′2, respectively. Once again, we assume that the system can be operated in one of S speed modes in each super-slot. A power cost P(k) and a backlog cost B((b − k vi)+) are incurred if a policy selects the ith configuration and speed mode k when the backlog vector is b = (b1, b2). We seek the optimal stationary policy Π⋆, which drives the system from initial backlog state b0 = (b^0_1, b^0_2) to state 0, incurring the minimum possible cost. Denoting the cost-to-go function in state b by V(b), we have the following recursive DP equations ∀ b1, b2 > 0:

V(b1, b2) = min{ min_{k=1,...,S} {V((b1 − k)+, b2) + P(k) + B((b1 − k)+, b2)}   [schedule Q′1, speed mode k],
                 min_{k=1,...,S} {V(b1, (b2 − k)+) + P(k) + B(b1, (b2 − k)+)} }   [schedule Q′2, speed mode k],   (2)

and the boundary conditions V(0, 0) = 0,

V(b1, 0) = min_{k=1,...,S} {V((b1 − k)+, 0) + P(k) + B((b1 − k)+, 0)},   (3)

V(0, b2) = min_{k=1,...,S} {V(0, (b2 − k)+) + P(k) + B(0, (b2 − k)+)}.   (4)

While our problem formulation is valid for an arbitrary choice of non-negative, non-decreasing, and convex cost functions P(·) and B(·), for the purposes of illustration and explicit computations, we now focus on the special class of quadratic cost functions; that is, P(k) = λk², λ > 0, and B(b1, b2) = b1² + b2². The parameter λ captures the power-delay trade-off in a PMIQ switch. Choosing a large value for λ is equivalent to reducing power dissipation in the switch at the expense of increased packet delays, and vice-versa.
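Under these quadratic costs, the DP (2)-(4) can be solved directly for small backlogs by memoized recursion. The following is an illustrative sketch, not the paper's code; the function names and the tie-breaking order are our own choices:

```python
# Sketch: solve the two meta-queue single-server DP by memoized recursion,
# with quadratic costs P(k) = lam*k^2 and B(b1, b2) = b1^2 + b2^2.
from functools import lru_cache

def solve_dp(lam, S):
    @lru_cache(maxsize=None)
    def V(b1, b2):
        if b1 == 0 and b2 == 0:
            return 0.0
        best = float("inf")
        for q in (1, 2):                      # schedule meta-queue Q'1 or Q'2
            if (b1 if q == 1 else b2) == 0:   # boundary states, cf. (3)-(4)
                continue
            for k in range(1, S + 1):         # speed mode
                n1 = max(b1 - k, 0) if q == 1 else b1
                n2 = max(b2 - k, 0) if q == 2 else b2
                best = min(best, lam * k**2 + n1**2 + n2**2 + V(n1, n2))
        return best

    def decision(b1, b2):
        """Recover the minimizing (queue, speed mode) pair for state (b1, b2)."""
        best, arg = float("inf"), None
        for q in (1, 2):
            if (b1 if q == 1 else b2) == 0:
                continue
            for k in range(1, S + 1):
                n1 = max(b1 - k, 0) if q == 1 else b1
                n2 = max(b2 - k, 0) if q == 2 else b2
                cost = lam * k**2 + n1**2 + n2**2 + V(n1, n2)
                if cost < best:
                    best, arg = cost, (q, k)
        return arg

    return V, decision
```

For example, with `lam=10.0` and `S=4`, `decision(5, 2)` schedules meta-queue 1, consistent with the longest meta-queue first structure established below (Theorem 1).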

Proof: The proof is by contradiction and induction. We omit details due to space constraints. The idea is to establish that as b1 increases for fixed b2 (with b1 > b2), or b2 increases for fixed b1 (with b1 ≤ b2), the optimal speed mode chosen by Π⋆ can only increase.
The quantities {t1,k(b2)} and {t2,k(b1)} have the interpretation of being backlog-dependent speed mode thresholds — the optimal speed mode increases by one every time the backlog crosses one of these thresholds and otherwise remains constant. By symmetry, t1,k(b) = t2,k(b), ∀ b, k. Also, we have:

Lemma 1 (Monotonicity of Thresholds): t1,k(b2) is a non-increasing function of b2, and t2,k(b1) is a non-increasing function of b1, k = 0, . . . , S.

Theorem 1 and Theorem 2 jointly characterize the optimal SPM policy Π⋆ for the two meta-queue single-server buffer draining problem with quadratic cost functions. Π⋆ can be described as follows:
1) Given the meta-queue backlogs b1 and b2, schedule Q′1 if b1 ≥ b2; otherwise, schedule Q′2.
2) If Q′1 is scheduled, choose speed mode k if b1 ∈ (t1,k−1(b2), t1,k(b2)]. If Q′2 is scheduled, choose speed mode k if b2 ∈ (t2,k−1(b1), t2,k(b1)].
The decision regions generated by Π⋆ for a typical choice of parameters (S = 4, λ = 10³) are depicted in Fig. 2.

D. An approximation to Π⋆

The speed mode thresholds which characterize Π⋆ cannot be computed in closed form; they are obtained by solving the DP


C. Structural properties of Π⋆

In this section, we study key structural properties of the optimal SPM policy Π⋆ for quadratic cost functions. Our first result addresses the optimal scheduling policy.

Theorem 1 (Optimality of LmQF): The longest meta-queue first (LmQF) policy is optimal; that is, Π⋆ schedules Q′1 in state b if b1 ≥ b2, and schedules Q′2 else.

Proof: The proof is by induction. We omit details due to space constraints. The idea is to establish that if Π⋆ schedules Q′1 in state (b1, b2), it also schedules Q′1 in state (b1 + 1, b2). Similarly, if Π⋆ schedules Q′2 in state (b1, b2), it also schedules Q′2 in state (b1, b2 + 1). The optimality of LmQF is intuitively appealing from "symmetry" considerations. Our next result concerns the optimal power management policy.

Theorem 2 (Optimality of Threshold Type Policy): The optimal power management policy is of the threshold type; that is, ∃ functions t1,0, t1,1, . . . , t1,S and t2,0, t2,1, . . . , t2,S, the former varying with b2 and the latter with b1, such that 0 = t1,0 < t1,1(b2) < . . . < t1,S = b^0_1, 0 = t2,0 < t2,1(b1) < . . . < t2,S = b^0_2, and
1) For b1 ≥ b2, with b2 fixed, Π⋆ selects speed mode k for b1 ∈ (t1,k−1(b2), t1,k(b2)].
2) For b1 < b2, with b1 fixed, Π⋆ selects speed mode k for b2 ∈ (t2,k−1(b1), t2,k(b1)].


Fig. 2. Decision regions generated by Π⋆ for S = 4, λ = 10³. Each region is depicted in a different color and is labeled by a two-tuple (q, s), where q ∈ {1, 2} is the optimal scheduling decision and s ∈ {1, 2, 3, 4} is the optimal speed mode decision for backlog vectors contained in the region.

equations (2)-(4), thereby making the implementation of Π⋆ quite cumbersome. To alleviate this problem, we now develop a low-complexity approximate implementation of Π⋆, which does not require explicit solution of the DP equations. As our first step, let us consider the boundary conditions for the DP, given by (3) and (4). Π⋆ does not need to make any scheduling decisions in states (b1, 0) and (0, b2); however, it still needs to make a power management decision. To do so, it needs to compute the thresholds t1,k(0) and t2,k(0) for k = 0, . . . , S. Once again, these thresholds are cumbersome to compute analytically. We will invoke a fluid caricature model to approximately compute t1,k(0), t2,k(0) in closed form.

Fluid caricature model: Consider the buffer draining problem for a single-queue single-server continuous-time system. The queue has initial workload w0. No new work arrives to the system. At each time-instant, the server chooses a rate r > 0 to drain the queue, as a function of the remaining workload w. A backlog (workload) cost is incurred at rate W(w), and a power (rate) cost is incurred at rate R(r). To establish an equivalence with the discrete-time model, we set W(w) = w² and R(r) = λr². The goal is to compute the optimal buffer draining rate (as a function of residual workload), such that the total cost incurred in emptying the buffer is minimized. For this model, we can show:

Lemma 2: The optimal buffer draining rate as a function of residual workload is given by r⋆(w) = w/√λ.

Sketch of Proof: Discretize the continuous-time system into infinitesimal time-intervals of size δ > 0. Assume that the buffer draining rate is constant over each δ interval. Denoting the cost-to-go by Ṽ(w) as a function of residual workload w, we have the following DP equation:

Ṽ(w) = inf_{r>0} { Ṽ(w − rδ) + δ(w − rδ)² + δλr² },   (5)

where the three terms inside the infimum are, respectively, the cost-to-go after time δ, the backlog cost, and the power cost. Denoting the optimal draining rate by r⋆(w), using a Taylor approximation for Ṽ(w − r⋆(w)δ) around w, and finally taking

IV. POWER AWARE SWITCH SCHEDULING (PASS)


Fig. 3. Optimal speed mode as a function of backlog computed from the DP boundary conditions, and the fluid caricature model.

the limit δ ↓ 0, we can show:

Ṽ′(w) = w²/r⋆(w) + λ r⋆(w),

where Ṽ′(w) denotes the derivative of Ṽ(w) with respect to w. Since r⋆(w) is the optimal rate in state w, the following must be true: ∂/∂r [Ṽ(w − rδ) + δ(w − rδ)² + δλr²] |_{r=r⋆(w)} = 0. Dividing throughout by δ and taking the limit δ ↓ 0, we get Ṽ′(w) = 2λ r⋆(w). Combining our results, we have r⋆(w) = w/√λ.
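The fluid rule can be checked numerically against the exact one-dimensional boundary DP, in the spirit of Fig. 3. Below is a minimal sketch under quadratic costs with the Fig. 3 parameters (S = 16, λ = 200); the function names and the comparison setup are our own:

```python
# Sketch: compare the exact boundary-state speed modes (from the 1-D DP, cf.
# (3)) with the fluid-model rule r*(w) = w / sqrt(lam). Parameters mirror
# Fig. 3 (S = 16, lam = 200); these are illustrative choices.
import math
from functools import lru_cache

LAM, S = 200.0, 16

@lru_cache(maxsize=None)
def V(b):
    """Cost-to-go for a single queue with quadratic costs."""
    if b == 0:
        return 0.0
    return min(LAM * k**2 + max(b - k, 0)**2 + V(max(b - k, 0))
               for k in range(1, S + 1))

def dp_mode(b):
    """Optimal speed mode in boundary state (b, 0)."""
    return min(range(1, S + 1),
               key=lambda k: LAM * k**2 + max(b - k, 0)**2 + V(max(b - k, 0)))

def fluid_mode(b):
    """Speed mode implied by the fluid thresholds t_k = k * sqrt(lam)."""
    return min(S, max(1, math.ceil(b / math.sqrt(LAM))))

for b in (10, 50, 100, 200):
    print(b, dp_mode(b), fluid_mode(b))
```

The DP-derived speed mode is a nondecreasing staircase in the backlog (the threshold structure of Theorem 2), which the fluid rule tracks closely for large S.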

We will use the optimal buffer draining rate for the continuous-time model to approximate the speed mode thresholds t1,k(0), t2,k(0) in the discrete-time model. In particular, we set t1,k(0) = ⌊wk⌋, k = 1, . . . , S − 1, where wk = inf{w : r⋆(w) > k}. In words, the thresholds are simply approximated by "integer crossings" of r⋆(w). By symmetry arguments, t1,k(0) = t2,k(0) ≜ tk = ⌊k√λ⌋. See Fig. 3 for an illustration with S = 16 and λ = 200. The approximation becomes increasingly accurate as the number of permissible speed modes increases, because the fluid caricature model corresponds to the limiting regime S → ∞.

Radial approximation: Having approximately computed the speed mode thresholds t1,k(0) and t2,k(0) in closed form, we now propose the following radial approximation for the optimal power management policy: Π⋆ chooses speed mode k in state b if the vector b is contained inside a circle of radius tk and outside a circle of radius tk−1; that is, tk−1² < b1² + b2² ≤ tk². This is equivalent to the following approximations for the speed mode thresholds: (a) t1,k(b2) = ⌊√(tk² − b2²)⌋, and (b) t2,k(b1) = ⌊√(tk² − b1²)⌋. Note that this approximation is consistent with the monotonicity property of Lemma 1. With the radial approximation at our disposal, we have an approximate closed-form description of Π⋆, which does not require explicit solution of the DP equations. The approximate policy (say, Π̃⋆) is parameterized by the power-delay trade-off parameter λ. It follows from Lemma 2 and the subsequent discussion that tk (for every k) increases as λ increases. Consequently, the decision regions corresponding to lower speed modes grow bigger as λ increases. This is consistent with intuition and the motivation for introducing λ.
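The radial approximation yields a closed-form policy that is easy to implement. A minimal sketch, with our own naming, and with the speed mode capped at S when the backlog radius exceeds t_S:

```python
# Sketch of the approximate policy (the radial approximation): schedule the
# longer meta-queue (LmQF, cf. Theorem 1), and pick the speed mode from the
# radius of the backlog vector.
import math

def approx_policy(b1, b2, lam, S):
    """Return (meta-queue to schedule, speed mode) under the radial rule."""
    queue = 1 if b1 >= b2 else 2              # LmQF scheduling
    radius = math.hypot(b1, b2)               # sqrt(b1^2 + b2^2)
    # Speed mode k such that t_{k-1} < radius <= t_k, with t_k = k*sqrt(lam).
    k = min(S, max(1, math.ceil(radius / math.sqrt(lam))))
    return queue, k
```

For example, `approx_policy(30, 40, lam=100.0, S=4)` has backlog radius 50 and √λ = 10, so the uncapped mode would be 5; it is capped at S, giving the decision (2, 4).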

We formulated the scheduling and power management problem for a PMIQ switch as a buffer draining problem in Section II and transformed it into an equivalent problem for a parallel-queue single-server system in Section III. We then constructed an approximate SPM policy, Π̃⋆, for the latter problem. We now put these ideas together to propose the Power Aware Switch Scheduling (PASS) policy for a 2 × 2 IQ switch:
1) Given the backlogs b11, b12, b21 and b22 of the four VOQs at the beginning of a super-slot, set b1 = max(b11, b22) and b2 = max(b12, b21).
2) Set the switch in configuration C1 if b1 ≥ b2; else set the switch in configuration C2.
3) Set the switch in speed mode k if √(b1² + b2²) ∈ (tk−1, tk], where tk = k√λ, as computed in Section III-D.
4) Repeat steps 1-3 in every super-slot.

A. PASS for bigger switches

We now discuss an extension of the PASS policy proposed for a 2 × 2 switch to N × N IQ switches for N > 2. The key underlying idea is to construct "orthogonal" and "complete" subsets of switch configurations and then to reformulate the buffer draining problem for a PMIQ switch as a buffer draining problem for a system with a single server and N parallel meta-queues. To see this, consider an arbitrary switch configuration vector, say v = [e1ᵀ . . . eNᵀ]ᵀ, where ei is the standard ith unit vector in R^N, and xᵀ denotes the transpose of vector x. Configuration vector v corresponds to scheduling VOQs Q11, Q22, . . . , QNN. Now, for any N²-length vector w = [w1ᵀ w2ᵀ . . . wNᵀ]ᵀ, define the circular shift operator C as

C(w) = [wNᵀ w1ᵀ w2ᵀ . . . wN−1ᵀ]ᵀ.

Recursively define C^k(w) = C(C^{k−1}(w)) for k = 1, 2, . . .. By convention, C^0(v) = v. Note that C^N(w) = w and C^k(w) = C^{k mod N}(w). Thus, starting with any switch configuration vector v, we can generate a configuration subset Sv = {v, C(v), . . . , C^{N−1}(v)} by applying the circular shift operator N − 1 times in succession. In fact, we can generate (N − 1)! disjoint configuration subsets, each of size N, starting with different configuration vectors. The subset Sv has the following special properties: (a) No VOQ is served by more than one configuration vector in Sv, and (b) Each of the N² VOQs is served by some configuration vector in Sv. As a consequence, we can construct N meta-queues, one corresponding to each configuration in Sv, by "fusing" the packets of the VOQs served by this configuration and introducing dummy packets if necessary. Note that the problem transformation described in Section III-A for a 2 × 2 switch is a special case of the transformation outlined above, with only one subset comprising two configurations. For illustration, the two possible orthogonal subsets of configurations for a 3 × 3 switch are depicted in Fig. 4. We refer the reader to [15] for further details.
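The circular-shift construction can be sketched compactly by representing each configuration as a permutation (the output port assigned to each input port). This is an illustrative fragment under our own representation, not the paper's code:

```python
# Sketch: generate the configuration subset S_v via circular shifts and check
# its "orthogonal"/"complete" properties. A configuration is a permutation
# tuple: cfg[i] is the output port matched to input port i.

def circular_shift(perm):
    # C(.) moves the last block of the N^2-vector to the front; on the
    # permutation representation this is a cyclic shift of the tuple.
    return perm[-1:] + perm[:-1]

def subset(perm, N):
    """Return S_v = {v, C(v), ..., C^{N-1}(v)} starting from `perm`."""
    out, cur = [], perm
    for _ in range(N):
        out.append(cur)
        cur = circular_shift(cur)
    return out

N = 3
Sv = subset(tuple(range(N)), N)   # start from the identity matching
# Property check: every (input, output) pair, i.e. every VOQ, is served by
# exactly one configuration in Sv.
served = [(i, cfg[i]) for cfg in Sv for i in range(N)]
assert len(served) == len(set(served)) == N * N
print(Sv)
```

For N = 3 this produces the three cyclic matchings; starting instead from an odd permutation yields the second, disjoint subset depicted in Fig. 4.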


Fig. 4. Subsets of orthogonal configurations for a 3 × 3 switch. The three leftmost configurations are generated by v1 = [e1ᵀ e2ᵀ e3ᵀ]ᵀ and the three rightmost configurations are generated by v2 = [e2ᵀ e1ᵀ e3ᵀ]ᵀ.

If the N × N PMIQ switch is operated in configurations from a single subset, the buffer draining problem for the switch can be transformed into a buffer draining problem for a single-server system with N parallel meta-queues. It can be shown using a Lyapunov technique that operating the switch using configurations from a single subset is throughput optimal under uniform loading. Further, using a randomized subset selection policy in conjunction with MWM-based selection from the chosen subset is throughput optimal under any non-uniform loading. For ease of illustration, we will focus on the uniform loading scenario here. Without loss of generality, we consider the subset Sv generated by v = [e1ᵀ . . . eNᵀ]ᵀ. We have the following straightforward extension of PASS:
1) Consider the backlogs b1, . . . , bN of the meta-queues corresponding to the N configurations in Sv.
2) Set the switch in configuration C^{k⋆−1}(v) ∈ Sv, such that bk⋆ = max_{k=1,...,N} {bk}.
3) Select speed mode k if √(b1² + . . . + bN²) ∈ (tk−1, tk], where tk = k√λ, as computed in Section III-D.

While PASS is based on the analysis of the buffer draining problem, it can be used as a heuristic policy in a dynamic environment where packets arrive continually to the switch. The advantages of PASS are: (a) It has low implementation complexity of O(N) for an N × N switch, and (b) It is agnostic to assumptions on the packet arrival process, thereby rendering it robust to fluctuations in traffic patterns.

B. Another heuristic: PA-MWM

We propose another heuristic scheduling and power management policy, namely, Power-Aware Maximum Weight Matching (PA-MWM). The PA-MWM policy makes its power management decision based on the total (sum over all VOQs) backlog in the system and its scheduling decision based on the benchmark MWM scheduling policy. In particular, given the VOQ backlogs b11, . . . , bNN and the sum backlog B = Σ_{i=1}^{N} Σ_{j=1}^{N} bij at the beginning of a super-slot, the power management problem for a PMIQ switch based on sum backlog is equivalent to the power management problem for a single-queue single-server system (where the queue has backlog B). Given the total backlog B, PA-MWM selects speed mode k if

B ∈ (tk−1, tk], where tk = ⌊k√λ⌋, as computed in Section III-D. Once again, λ has the interpretation of being a power-delay trade-off parameter. After PA-MWM selects a speed mode at the beginning of a super-slot, it makes its scheduling decision based on MWM. However, the switch configuration is updated only once every super-slot (and not every time-slot, which is the case for MWM).

V. SIMULATION RESULTS

In this section, we evaluate the performance of the proposed PASS and PA-MWM policies via simulations and contrast them to the benchmark MWM scheduler. We consider a uniform loading scenario; that is, all VOQs are subject to the same average load. We assume that all buffers are initially empty. The switch operates in slotted time and can be set into one of four possible speed modes (S = 4). At most one packet arrives to each VOQ at the beginning of every time-slot according to some stochastic arrival process. We consider two different packet arrival processes:

1) Bernoulli i.i.d. Traffic: Under this model, in every time-slot, a packet arrives to a VOQ (independent of other VOQs, and of past/future time-slots) with probability p, and no packet arrives with probability 1 − p. We choose p < 1/N, for an N × N switch, to ensure switch stability. Fig. 5 and Fig. 6 depict the power-backlog (equivalently, power-delay) trade-off for a 2 × 2 switch and a 4 × 4 switch under Bernoulli i.i.d. traffic, respectively†. Each curve for PASS and PA-MWM corresponds to a different choice of trade-off parameter λ. The arrival rate p increases from left to right along each curve. As expected, both average power consumption and average backlog increase with p. For fixed p, MWM has the lowest average backlog and the highest power consumption. PASS and PA-MWM provide a power-delay trade-off, which can be tuned by varying λ.
In many cases, both PASS and PA-MWM yield power savings of 30-40% with only a marginal increase in average backlog, especially under light to moderate loading conditions. For the 4 × 4 switch, the delay performance of PASS can be further improved by operating the switch in multiple configuration subsets, rather than just one subset, as discussed in Section IV-A.

† Each point on every curve depicted in this section corresponds to 10⁵ time-slots worth of simulation time.
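For concreteness, a rough, self-contained simulation sketch in the spirit of this section (our own minimal setup and parameter choices, not the paper's simulator): a 2 × 2 PMIQ switch under Bernoulli i.i.d. arrivals, run with PASS. Power is counted as scheduling cycles per time-slot, so speed mode S corresponds to power 1:

```python
# Sketch: 2x2 PMIQ switch under Bernoulli i.i.d. arrivals, driven by PASS.
# Power metric: scheduling cycles per time-slot (mode S -> power 1).
import math
import random

def simulate_pass(p=0.2, lam=10.0, S=4, super_slots=20000, seed=1):
    rng = random.Random(seed)
    b = [[0, 0], [0, 0]]                     # VOQ backlogs b[i][j]
    cycles = 0                               # total scheduling cycles used
    backlog_sum = 0                          # time-summed total backlog
    for _ in range(super_slots):
        # PASS decision at the beginning of the super-slot
        b1, b2 = max(b[0][0], b[1][1]), max(b[0][1], b[1][0])
        r = math.hypot(b1, b2)
        if r > 0:                            # idle only when all VOQs empty
            k = min(S, max(1, math.ceil(r / math.sqrt(lam))))
            pairs = [(0, 0), (1, 1)] if b1 >= b2 else [(0, 1), (1, 0)]
            for i, j in pairs:               # k scheduling cycles in S slots
                b[i][j] = max(b[i][j] - k, 0)
            cycles += k
        for _ in range(S):                   # Bernoulli arrivals, slot by slot
            for i in range(2):
                for j in range(2):
                    if rng.random() < p:
                        b[i][j] += 1
            backlog_sum += sum(map(sum, b))
    slots = super_slots * S
    return cycles / slots, backlog_sum / slots

power, backlog = simulate_pass()
print(f"avg power/slot = {power:.3f}, avg total backlog = {backlog:.1f}")
```

Sweeping λ in such a loop traces out one power-backlog trade-off curve per arrival rate p, mirroring the structure of Figs. 5-8.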
[Fig. 5. Power-Backlog trade-off for a 2 × 2 switch under Bernoulli traffic. Plot of average power per time-slot vs. average backlog; curves for PASS and PA-MWM at several values of λ, and for MWM; load increases along each curve.]

[Fig. 6. Power-Backlog trade-off for a 4 × 4 switch under Bernoulli traffic. Same axes and policies as Fig. 5.]
2) Markov Modulated Bernoulli (MMB) Traffic: Under this model, packets arrive at each VOQ according to independent Bernoulli processes. However, the Bernoulli parameter characterizing each process varies according to a finite-state Markov chain (FSMC); Bernoulli i.i.d. traffic is thus a degenerate case of MMB traffic with only one state in the FSMC. The MMB traffic model has often been used in the literature to simulate bursty packet arrivals. We assume that the FSMC has two states, LO and HI. Packets arrive with probability pLO and pHI in the two states, respectively, where pLO < pHI. In each time-slot, the FSMC corresponding to each VOQ transitions its state (from LO to HI, or HI to LO), independently of the others, with probability q. For our simulations, we fix pLO = 0.1, q = 0.3, and vary pHI to vary the load on the switch. Fig. 7 and Fig. 8 depict the power-backlog trade-off for a 2 × 2 switch and a 4 × 4 switch, respectively, under MMB traffic. Qualitatively, our observations are similar to those for Bernoulli i.i.d. traffic.

Our simulation results clearly demonstrate the power-delay trade-off achievable with the PASS and PA-MWM policies. The former has computational complexity O(N/S) per time-slot, while the latter has computational complexity O(N³/S) per time-slot. The trade-off point for both policies can be tuned by varying the parameter λ. Both policies yield significant power savings relative to the benchmark MWM policy (with no power management) at the
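The MMB arrival process for a single VOQ can be sketched as follows. This is an illustrative generator under the stated assumptions (two states LO/HI, flip probability q each slot, arrival probabilities pLO and pHI); function and parameter names are our own, not from the paper.

```python
import random

def mmb_arrivals(num_slots: int, p_lo: float = 0.1, p_hi: float = 0.5,
                 q: float = 0.3, seed: int = 0) -> list:
    """Generate 0/1 arrivals for one VOQ under a two-state MMB process.

    Each slot, a packet arrives with probability p_lo (state LO) or
    p_hi (state HI); the modulating chain then flips state with
    probability q, independently across slots.
    """
    rng = random.Random(seed)
    state_hi = False  # start in the LO state
    arrivals = []
    for _ in range(num_slots):
        p = p_hi if state_hi else p_lo
        arrivals.append(1 if rng.random() < p else 0)
        if rng.random() < q:  # Markov state transition
            state_hi = not state_hi
    return arrivals
```

With q = 0.3 the chain changes state every few slots on average, producing the bursty alternation between quiet (LO) and loaded (HI) periods that the model is meant to capture; setting p_lo = p_hi recovers Bernoulli i.i.d. traffic.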
[Fig. 7. Power-Backlog trade-off for a 2 × 2 switch under MMB traffic. Plot of average power per time-slot vs. average backlog; curves for PASS and PA-MWM at several values of λ, and for MWM; load increases along each curve.]

[Fig. 8. Power-Backlog trade-off for a 4 × 4 switch under MMB traffic. Same axes and policies as Fig. 7.]
expense of a moderate increase in average packet delay. We expect this advantage to persist under different kinds of packet arrival processes and to scale with switch size N. While all our results here are for a uniform loading scenario, PASS can be extended to handle arbitrary non-uniform traffic by combining single-configuration-subset operation of the PMIQ switch with an intelligent subset selection criterion.

VI. CONCLUSIONS

To address the increasingly important problem of high power consumption in packet switches, this paper proposed the Power-Managed Input-Queued (PMIQ) switch architecture. A PMIQ switch is an input-queued switch with an additional layer of decision making to regulate the power dissipated by the switch. A PMIQ switch can be adaptively configured to operate in different speed modes depending on its current loading conditions. In this way, the switch can trade off power consumption against the average delays incurred by packets queued at the input buffers. We proposed the Power-Aware Switch Scheduling (PASS) switch management policy, which jointly determines the switch configuration and speed mode based on the backlogs of the input buffers. The computational complexity of PASS is linear in the switch size, making it attractive from an implementation perspective.

We studied the power management problem in the context of an IQ switch. The formulation, however, is quite generally applicable to any resource/server allocation scenario in which allocating an idle resource/server to a waiting job/customer incurs a cost, thereby inducing a natural "push-pull" effect between the number of active resources and the average job/customer waiting time.

REFERENCES

[1] A.K. Parekh and R.G. Gallager, "A generalized processor sharing approach to flow control in integrated services networks: multiple nodes case", IEEE/ACM Trans. Networking, vol. 2, pp. 137-150, Apr. 1994.
[2] N. McKeown, A. Mekkittikul, V. Anantharam and J. Walrand, "Achieving 100% throughput in an input-queued switch", IEEE Trans. Communications, vol. 47, no. 8, pp. 1260-1267, Aug. 1999.
[3] A. Mekkittikul and N. McKeown, "A practical scheduling algorithm to achieve 100% throughput in input-queued switches", IEEE INFOCOM 1998, pp. 792-799, San Francisco, CA, Mar. 1998.
[4] C.S. Chang, W.J. Chen and H.Y. Huang, "Birkhoff-von Neumann input-buffered crossbar switches for guaranteed-rate services", IEEE Trans. Communications, vol. 49, no. 7, pp. 1145-1147, Jul. 2001.
[5] P. Giaccone, B. Prabhakar and D. Shah, "Randomized scheduling algorithms for high-aggregate bandwidth switches", IEEE Journal on Selected Areas in Communications, vol. 21, no. 4, pp. 546-559, May 2003.
[6] C. Minkenberg, R.P. Luijten, F. Abel, W. Denzel and M. Gusat, "Current issues in packet switch design", ACM SIGCOMM Computer Communication Review, vol. 33, no. 1, pp. 119-124, Jan. 2003.
[7] T.T. Ye, L. Benini and G. De Micheli, "Analysis of power consumption on switch fabrics in network routers", ACM/IEEE Design Automation Conference, pp. 524-529, New Orleans, LA, Jun. 2002.
[8] V. Raghunathan, M.B. Srivastava and R.K. Gupta, "A survey of techniques for energy efficient on-chip communication", ACM/IEEE Design Automation Conference, pp. 900-905, Anaheim, CA, Jun. 2003.
[9] L. Benini, A. Bogliolo and G. De Micheli, "A survey of design techniques for system-level dynamic power management", IEEE Transactions on VLSI Systems, vol. 8, no. 3, pp. 299-316, Jun. 2000.
[10] A.G. Wassal and M.A. Hasan, "Low-power system-level design of VLSI packet switching fabrics", IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, vol. 20, no. 6, pp. 723-738, Jun. 2001.
[11] H. Wang, L.S. Peh and S. Malik, "Power-driven design of router microarchitectures in on-chip networks", International Symposium on Microarchitecture, pp. 105-116, San Diego, CA, Dec. 2003.
[12] T. Simunic, S.P. Boyd and P. Glynn, "Managing power consumption in networks on chips", IEEE Transactions on VLSI Systems, vol. 12, no. 1, pp. 96-107, Jan. 2004.
[13] N. Bambos and D. O'Neill, "Power management of packet switch architectures with speed modes", Allerton Conference on Communication, Control and Computing, Allerton, IL, Oct. 2003.
[14] D. Bertsekas, Dynamic Programming and Optimal Control, vol. 1 & 2, 2nd Ed., Athena Scientific, 2000.
[15] A. Dua and N. Bambos, "Scheduling with soft deadlines for input queued switches", Allerton Conference on Communication, Control and Computing, Allerton, IL, Sep. 2006.