Approximation Modeling for the Online Performance Management of Distributed Computing Systems

Dara Kusic†, Nagarajan Kandasamy† and Guofei Jiang‡
† Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104
‡ Robust and Secure System Group, NEC Laboratories America, Princeton, NJ 08540
[email protected], [email protected], [email protected]
Abstract— This paper develops a hierarchical control framework to solve performance management problems in distributed computing systems. To reduce the control overhead, concepts from approximation theory are used in the construction of the dynamical models that predict system behavior, and in the solution of the associated control equations themselves. Using a dynamic resource provisioning problem as a case study, we show that a computing system managed by the proposed control framework using approximation models realizes profit gains that are, in the best case, within 1% of a controller using an exact parametric model of the system.
I. INTRODUCTION
This short paper describes an optimization framework to solve a class of performance management problems in distributed computing systems. We refer the interested reader to [1] for more details. The performance optimization problem is decomposed into a set of simpler sub-problems that are solved cooperatively by multiple controllers arranged in a decentralized hierarchical structure. Concepts from approximation theory are applied in two places: in the construction of the dynamical models that track and predict system behavior over a finite prediction horizon, and in the solution of the associated control equations.
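To make the idea of predicting behavior over a finite prediction horizon concrete, the following is a minimal sketch of a generic lookahead controller, not the authors' formulation. It enumerates every control sequence over the horizon, rolls a model forward on forecast workloads, and applies only the first input of the best sequence. The function names, the toy one-cluster model, the capacity of 100 requests per machine, and the cost of 30 per machine are all hypothetical; the model argument could be an exact parametric model or an approximation of one.

```python
from itertools import product

def lookahead_control(state, forecast, candidate_inputs, model, reward):
    """Return (first input, total reward) of the best input sequence.

    forecast: predicted workloads for steps k+1 .. k+H (H = horizon length).
    model(s, u, lam) -> next state; reward(s, u, lam) -> one-step profit.
    Brute-force enumeration is exponential in H, so H must stay small.
    """
    best_value, best_first = float("-inf"), None
    for seq in product(candidate_inputs, repeat=len(forecast)):
        s, total = state, 0.0
        for u, lam in zip(seq, forecast):
            total += reward(s, u, lam)   # profit earned during this step
            s = model(s, u, lam)         # predicted state for the next step
        if total > best_value:
            best_value, best_first = total, seq[0]
    return best_first, best_value

# Hypothetical toy system: the state is the number of powered-on machines
# and the input adds or removes one machine per step.
CAPACITY = 100      # assumed requests one machine serves per interval
MACHINE_COST = 30   # assumed cost of one powered-on machine per interval

def toy_model(s, u, lam):
    return max(s + u, 0)

def toy_reward(s, u, lam):
    machines = max(s + u, 0)
    served = min(lam, CAPACITY * machines)
    return served - MACHINE_COST * machines
```

With two machines on and a forecast of 250 requests in each of the next two intervals, `lookahead_control(2, [250, 250], [-1, 0, 1], toy_model, toy_reward)` selects `1` (power on a third machine) with a predicted total of 320.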
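Fig. 2. An example workload representing client requests for the three online services hosted by the computing system. [Plot: "1998 World Cup HTTP Requests"; y-axis: arrival rate per 30-second interval; x-axis: time instance; series: Gold, Silver, and Bronze workloads.]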
Simulations using workload traces from the 1998 World Cup Soccer web site (WC’98) show that a computing system managed by a control framework using approximation models realizes profit gains that are, in the best case, within 1% of a controller using a parametric model based on first principles, while incurring low control overhead.

II. SYSTEM MODEL
Fig. 1. The system model comprising the Gold, Silver, and Bronze service clusters and a Sleep cluster that holds machines in a powered-off state. [Diagram: a dispatcher splits the incoming workload λ(k) into λ1(k), λ2(k), and λ3(k); a dispatcher in each cluster routes its stream to machines labeled n11(k), r11(k) … n1m(k), r1m(k) for Gold, and likewise n2j(k), r2j(k) for Silver and n3j(k), r3j(k) for Bronze.]
We assume a distributed computing environment (DCE) hosting three independent online services, labeled “Gold”, “Silver”, and “Bronze” and indexed by i ∈ {1, 2, 3}, as shown in Fig. 1. Requests for the Gold, Silver, and Bronze services arrive with time-varying rates λ1(k), λ2(k), and λ3(k), respectively, and are routed to a computer cluster dedicated to hosting that service. Fig. 2 shows an example workload arrival pattern. Each cluster comprises heterogeneous computers with different processing capacities that work independently to service incoming requests. Computers that contribute excess capacity during periods of low workload are powered down and placed in the Sleep cluster to reduce system power consumption. The Gold, Silver, and Bronze services generate revenue according to a pricing structure in which the response time of a completed request is translated into a dollar amount collected from the client. When the response time violates the service-level agreement (SLA), the service provider pays a penalty to the client.
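The pricing structure is described only qualitatively here. As an illustrative sketch, assuming a simple step function per service (full price when the response time meets the SLA target, a fixed penalty otherwise), revenue for one interval could be computed as below. The SLA targets, prices, and penalties in `PRICING` are invented numbers in integer cents, and the actual pricing function used by the authors may be graded rather than a single step.

```python
# Hypothetical per-service pricing, NOT the paper's values:
# (SLA response-time target in ms, revenue in cents if met, penalty in cents if violated)
PRICING = {
    "Gold": (200, 5, 10),
    "Silver": (300, 3, 6),
    "Bronze": (500, 1, 2),
}

def request_revenue(service: str, response_time_ms: float) -> int:
    """Cents earned (positive) or paid as a penalty (negative) for one request."""
    target, price, penalty = PRICING[service]
    if response_time_ms <= target:
        return price
    return -penalty  # SLA violated: the provider pays the client

def interval_revenue(service: str, response_times_ms) -> int:
    """Net revenue for all requests of one service completed in an interval."""
    return sum(request_revenue(service, r) for r in response_times_ms)
```

For example, three Gold requests answered within 200 ms and one answered in 250 ms net 5 + 5 + 5 - 10 = 5 cents, which shows how a few SLA violations can erase most of an interval's revenue.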
Fourth International Conference on Autonomic Computing (ICAC'07) 0-7695-2779-5/07 $20.00 © 2007
Fig. 3. The controller schematic. [Diagram blocks: predictive filter, system model, optimizer, and system; signals: observed workload λ(k), forecast λ̂(k+1), state s(k), predicted state ŝ(k+1), candidate input û(k+1), and applied control input u(k).]
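The system-model block in the schematic can be replaced by a cheaper approximation, and figure labels in this section mention a regression-tree approximation. Its construction is not detailed here, so the following is a generic sketch of a one-dimensional regression tree (recursive binary splitting that minimizes squared error), offered as one possible stand-in for an expensive model. The names `fit_tree` and `predict`, the depth limit, and the minimum leaf size are illustrative assumptions; a practical model would likely take several inputs (for example, workload and machine count).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: float                        # mean target in this region (leaf prediction)
    threshold: Optional[float] = None   # split point on x; None marks a leaf
    left: Optional["Node"] = None       # region with x <= threshold
    right: Optional["Node"] = None      # region with x > threshold

def _sse(ys):
    """Sum of squared errors around the mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_tree(xs, ys, max_depth=3, min_leaf=2):
    """Fit a piecewise-constant approximation y ~ f(x)."""
    return _grow(sorted(zip(xs, ys)), max_depth, min_leaf)

def _grow(pairs, depth, min_leaf):
    ys = [y for _, y in pairs]
    node = Node(value=sum(ys) / len(ys))
    if depth == 0 or len(pairs) < 2 * min_leaf:
        return node
    best = None
    # Try every split that leaves at least min_leaf points on each side.
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate equal x values
        cost = _sse(ys[:i]) + _sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None or best[0] >= _sse(ys):
        return node  # no split reduces the error
    i = best[1]
    node.threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    node.left = _grow(pairs[:i], depth - 1, min_leaf)
    node.right = _grow(pairs[i:], depth - 1, min_leaf)
    return node

def predict(node, x):
    """Walk from the root to the leaf whose region contains x."""
    while node.threshold is not None:
        node = node.left if x <= node.threshold else node.right
    return node.value
```

For example, fitting eight points whose value jumps from 10 to 20 between x = 4 and x = 5 with `max_depth=1` yields a single split at 4.5. Evaluating such a tree costs only a few comparisons, which is the kind of saving that makes an approximation attractive inside a controller that queries its model many times per step.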
Fig. 5. Number of machines assigned to the Gold cluster by the L2 controller for workload J using an approximation in place of the system model. [Plot: "Gold cluster size, System model approximation, Workload J"; y-axis: number of class 1 machines; x-axis: time in 30-second increments.]
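Fig. 4. The control hierarchy showing L2, L1, and L0 controllers superimposed upon the Gold, Silver, and Bronze services and m performance classes within each service cluster. [Diagram: one L2 controller above the Gold, Silver, Bronze, and Sleep clusters; one L1 controller per service; L0 controllers per performance class; signals γij(k), fij(k), and nij(k).]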
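Fig. 5 indicates that the L2 controller decides how many machines each service cluster receives, while lower-level controllers manage each cluster. The exact formulation is given in [1]; as a hedged illustration of the decomposition only, the sketch below has a hypothetical L2 step enumerate machine splits across the Gold, Silver, and Bronze clusters, score each split with a per-cluster profit estimate that stands in for the lower-level controllers, and send any unassigned machines to Sleep. The profit functions, loads, prices, and per-machine cost are invented for illustration, and brute-force enumeration is our simplification.

```python
from itertools import product
from typing import Callable, Dict, Tuple

def l2_allocate(total_machines: int,
                cluster_profit: Dict[str, Callable[[int], float]]
                ) -> Tuple[Dict[str, int], float]:
    """Hypothetical L2 step: split machines among the service clusters.

    cluster_profit maps each cluster to a function returning the profit its
    lower-level controllers expect with n machines. Unassigned machines go
    to the Sleep cluster and contribute nothing.
    """
    names = list(cluster_profit)
    best_alloc, best_profit = None, float("-inf")
    for counts in product(range(total_machines + 1), repeat=len(names)):
        if sum(counts) > total_machines:
            continue  # infeasible: more machines than exist
        profit = sum(cluster_profit[n](c) for n, c in zip(names, counts))
        if profit > best_profit:
            best_alloc, best_profit = dict(zip(names, counts)), profit
    best_alloc["Sleep"] = total_machines - sum(best_alloc.values())
    return best_alloc, best_profit

def make_profit(load, price, capacity=100, cost=50):
    """Toy lower-level estimate: revenue for served requests minus machine cost."""
    return lambda n: price * min(load, capacity * n) - cost * n

# Invented loads and per-request prices for one control interval.
profits = {
    "Gold": make_profit(load=200, price=3),
    "Silver": make_profit(load=200, price=2),
    "Bronze": make_profit(load=300, price=1),
}
```

With eight machines, `l2_allocate(8, profits)` assigns two machines each to Gold and Silver and three to Bronze, and puts the remaining machine to sleep because adding it to any cluster would cost more than it earns.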
[Plot: percent profit gain over uncontrolled DCE; legend includes "Baseline parametric model" and "System_model".]