Loadstar: Load Shedding in Data Stream Mining

Yun Chi¹, Haixun Wang², Philip S. Yu²
¹Department of Computer Science, UCLA
²IBM Thomas J. Watson Research Center

Introduction

- Data stream systems
  - Data from embedded sensors
  - Financial and retailer data
  - Network traffic data
- Resources are limited
  - CPU cycles
  - Bandwidth
  - Memory


Load Shedding: Which to Drop?

- Load shedding
  - Dropping a certain amount of the load
- Which to drop?
  - Randomly
  - Intelligently
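To make the contrast concrete, here is a minimal sketch of the two options. It assumes each stream carries a per-stream confidence score in [0, 1]; that score, the function names, and the numbers are hypothetical and stand in for whatever decision-quality measure is available. Random shedding spends the observation budget uniformly, while the score-based variant spends it on the streams whose decisions are least certain.

```python
import random

def shed_randomly(stream_ids, budget, rng):
    """Observe `budget` streams chosen uniformly at random; drop the rest."""
    return set(rng.sample(stream_ids, budget))

def shed_by_score(confidence, budget):
    """Observe the `budget` streams with the lowest confidence score.

    `confidence` maps stream id -> score in [0, 1]. The score is a
    hypothetical stand-in for any decision-quality measure.
    """
    ranked = sorted(confidence, key=confidence.get)  # least confident first
    return set(ranked[:budget])

# Hypothetical scores: streams "a" and "c" are the least certain.
confidence = {"a": 0.55, "b": 0.99, "c": 0.60, "d": 0.97}
observed_random = shed_randomly(sorted(confidence), budget=2, rng=random.Random(0))
observed_scored = shed_by_score(confidence, budget=2)
```

With these numbers the score-based selection observes streams "a" and "c", whereas the random selection may spend its budget on streams whose classification is already nearly certain.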

Load Shedding: An Example of Temperature Sensors

[Figure: two cases, Case 1 and Case 2, each showing Sensor A and Sensor B on a temperature axis marked 80, 90, 100.]


Load Shedding in Classifying Multiple Data Streams—Introduction

Our Main Contributions

- A Novel Quality of Decision (QoD) measure
  - Discriminant functions
  - Predicted feature distribution
- A feature prediction model based on
  - Markov-chains
  - Real-time parameters update
- Loadstar


Quality of Decision: Discriminant Functions

[Figure, two panels plotted against Feature Value (-0.5 to 2.5). Top panel, "Discriminant Functions": f1(x) and f2(x). Bottom panel, "Log Ratio of Discriminant Functions", with a "decision boundary" annotation.]

Quality of Decision: Based on Overall Risk

- Feature distribution in the next time unit: $x \sim p(x)$
- At a point $x$, the conditional risk for $c_i$: $R(c_i \mid x) = \sum_{j=1}^{K} \sigma(c_i \mid c_j)\, P(c_j \mid x)$
- The expected risk: $E_x[R(c_i \mid x)] = \int_x R(c_i \mid x)\, p(x)\, dx$
- The decision based on expected risk: $\delta_2:\; k = \arg\min_i E_x[R(c_i \mid x)]$
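The three quantities above can be sketched for a discrete feature space, where the integral becomes a weighted sum. This is an illustration under stated assumptions, not the authors' implementation: the function names, the zero-one loss matrix, and all numbers are made up for the example.

```python
def conditional_risk(loss, posterior_at_x):
    """R(c_i | x) = sum_j loss[i][j] * P(c_j | x), for each class i."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posterior_at_x))
            for row in loss]

def expected_risk(loss, posteriors, p_x):
    """E_x[R(c_i | x)] over a discrete feature distribution.

    posteriors[n] holds P(c_j | x_n) for all j; p_x[n] is p(x_n).
    """
    total = [0.0] * len(loss)
    for post, weight in zip(posteriors, p_x):
        for i, r in enumerate(conditional_risk(loss, post)):
            total[i] += weight * r
    return total

def decide(loss, posteriors, p_x):
    """delta_2: index k of the class with minimum expected risk."""
    risks = expected_risk(loss, posteriors, p_x)
    return min(range(len(risks)), key=risks.__getitem__)

# Illustrative numbers only: two classes, three feature values, zero-one loss.
loss = [[0.0, 1.0],
        [1.0, 0.0]]
posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
p_x = [0.5, 0.3, 0.2]
risks = expected_risk(loss, posteriors, p_x)   # about [0.33, 0.67]
k = decide(loss, posteriors, p_x)              # class 0
```

The decision is made from the predicted distribution $p(x)$ alone, which is what allows a stream to be classified in a time unit where its observation has been shed.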


Quality of Decision: Based on Overall Risk

- The Bayesian risk: $E_x[R(c^* \mid x)] = \int_x R(c^* \mid x)\, p(x)\, dx$
- The Quality of Decision (QoD): $Q_2 = 1 - \bigl(E_x[R(c_k \mid x)] - E_x[R(c^* \mid x)]\bigr) = 1 - \int_x \bigl[P(c^* \mid x) - P(c_k \mid x)\bigr]\, p(x)\, dx$
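The second form of the QoD follows from the first if the loss is zero-one, that is $\sigma(c_i \mid c_j) = 0$ when $i = j$ and $1$ otherwise. The slide does not state the loss function, so zero-one loss is an assumption here; under it the step can be written out as:

```latex
\begin{align*}
R(c_i \mid x) &= \sum_{j \ne i} P(c_j \mid x) = 1 - P(c_i \mid x), \\
E_x[R(c_k \mid x)] - E_x[R(c^* \mid x)]
  &= \int_x \bigl[(1 - P(c_k \mid x)) - (1 - P(c^* \mid x))\bigr]\, p(x)\, dx \\
  &= \int_x \bigl[P(c^* \mid x) - P(c_k \mid x)\bigr]\, p(x)\, dx .
\end{align*}
```

Here $c^*$ is the minimum-risk class at each point $x$, which under zero-one loss is the class with the largest posterior, so the integrand is never negative.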

Quality of Decision: Based on Overall Risk

- $0 \le Q_2 \le 1$; the higher $Q_2$ is, the more confident we are in the decision.
- $Q_2 = 1$ if and only if $c_k$ is the minimum-risk decision in every region of the feature space.
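A small numeric sketch of both properties, assuming zero-one loss and a discrete feature space so that the integral becomes a sum. The function name and the numbers are illustrative only.

```python
def quality_of_decision(posteriors, p_x, k):
    """Q2 = 1 - sum_x [P(c* | x) - P(c_k | x)] p(x), assuming zero-one loss.

    posteriors[n][j] = P(c_j | x_n); p_x[n] = p(x_n); k is the chosen class.
    Under zero-one loss, c* at x_n is the class with the largest posterior.
    """
    gap = sum(w * (max(post) - post[k]) for post, w in zip(posteriors, p_x))
    return 1.0 - gap

posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
p_x = [0.5, 0.3, 0.2]

# Class 0 is not the best decision at the third feature value, so Q2 < 1.
q_mixed = quality_of_decision(posteriors, p_x, k=0)

# Restricted to feature values where class 0 always wins, Q2 is exactly 1.
q_certain = quality_of_decision(posteriors[:2], [0.5, 0.5], k=0)
```

In the first case the only loss of quality comes from the third feature value, where class 1 has posterior 0.8 against 0.2 for the chosen class, weighted by its probability 0.2.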


Feature Prediction

- Feature distribution: $x \sim p(x)$
- Take advantage of temporal locality
  - Stock price data
  - Consecutive snapshots from satellites
- Discrete-time Markov-chains
- Real-time parameter updating
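As a rough illustration of how a discrete-time Markov chain yields a predicted feature distribution, the sketch below propagates a distribution forward from the last observed state. It assumes the feature has been discretized into a finite set of states; the function names and the two-state matrix are hypothetical.

```python
def step(dist, P):
    """Advance a distribution over states by one time unit: dist' = dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def predict(last_state, steps, P):
    """Distribution of the feature state `steps` time units after it was
    last observed in `last_state`, under transition matrix P."""
    dist = [1.0 if s == last_state else 0.0 for s in range(len(P))]
    for _ in range(steps):
        dist = step(dist, P)
    return dist

# Illustrative two-state chain with strong temporal locality (sticky states).
P = [[0.9, 0.1],
     [0.2, 0.8]]
one_ahead = predict(last_state=0, steps=1, P=P)   # [0.9, 0.1]
two_ahead = predict(last_state=0, steps=2, P=P)   # about [0.83, 0.17]
```

The longer a stream goes unobserved, the flatter its predicted distribution becomes, which is how temporal locality translates into more or less certainty about the next value.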

Parameter Updating

[Figure: timeline of time units t_0 through t_5 (labeled t_i).]

- MLE: $\hat{P}_{ij} = n_{ij} / \sum_k n_{ik}$
- EM?
- Approximation: $\hat{P}_{ij} = n_{ij} / \sum_k n_{ik}$
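The counting estimate can be sketched for the simplest case, a fully observed state sequence, where $n_{ij}$ is the number of transitions from state $i$ to state $j$. How the counts are formed when some time units are shed is not recoverable from the slide text and is not attempted here; the function name and the uniform fallback for unseen states are assumptions of this sketch.

```python
def estimate_transitions(states, num_states):
    """MLE of a transition matrix from a fully observed state sequence:
    P_hat[i][j] = n_ij / sum_k n_ik, where n_ij counts i -> j transitions."""
    counts = [[0] * num_states for _ in range(num_states)]
    for prev, cur in zip(states, states[1:]):
        counts[prev][cur] += 1
    P_hat = []
    for row in counts:
        total = sum(row)
        if total:
            P_hat.append([c / total for c in row])
        else:
            # Assumption: a state never left in the data gets a uniform row.
            P_hat.append([1.0 / num_states] * num_states)
    return P_hat

# Transitions in this sequence: 0->0, 0->1, 1->1, 1->1, 1->0.
P_hat = estimate_transitions([0, 0, 1, 1, 1, 0], num_states=2)
```

Because the estimate depends only on running counts, it can be refreshed incrementally as each new observation arrives.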


The Loadstar Algorithm

Demonstration

- Penalty of Load Shedding
- Resource Adaptability
- Data Streams with Concept-drifts


Penalty of Load Shedding: Performance

[Figure: Error Rate (%) versus Percentage of Loads Shed (%), comparing Naive Algorithm, Loadstar*, and Loadstar.]

Penalty of Load Shedding: Resources Assigned to the Volatile Streams

[Figure: Observations given to Volatile Streams (%) versus Percentage of Loads Shed (%), comparing Naive Algorithm and Loadstar.]


Resource Adaptive Load Shedding

Data Streams with Concept-drifts

- Use two Markov-chains:

  $P_A = \begin{pmatrix} .91 & .03 & .03 & .03 \\ .03 & .91 & .03 & .03 \\ .03 & .03 & .91 & .03 \\ .03 & .03 & .03 & .91 \end{pmatrix}$, $P_B = \begin{pmatrix} .25 & .25 & .25 & .25 \\ .25 & .25 & .25 & .25 \\ .25 & .25 & .25 & .25 \\ .25 & .25 & .25 & .25 \end{pmatrix}$

- First use $P_A$; at time 1,000 switch to $P_B$; at time 3,000 switch back to $P_A$
- Report the total Kullback-Leibler divergence: $\sum_i d(P_i, \hat{P}_i) = \sum_i \sum_j P_{ij} \log \frac{P_{ij}}{\hat{P}_{ij}}$
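A minimal sketch of the reported metric, the row-wise Kullback-Leibler divergence summed over all rows of the true and estimated transition matrices. The natural logarithm is an assumption (the slide does not give the base), as is the requirement that every estimated entry be positive; the function name is hypothetical.

```python
import math

def total_kl(P, P_hat):
    """sum_i sum_j P_ij * log(P_ij / P_hat_ij).

    Terms with P_ij == 0 contribute 0. Assumes P_hat_ij > 0 wherever
    P_ij > 0, and uses the natural logarithm.
    """
    return sum(p * math.log(p / q)
               for row, row_hat in zip(P, P_hat)
               for p, q in zip(row, row_hat)
               if p > 0)

n = 4
P_A = [[0.91 if i == j else 0.03 for j in range(n)] for i in range(n)]
P_B = [[0.25] * n for _ in range(n)]

d_same = total_kl(P_A, P_A)    # 0.0: the estimate matches the true chain
d_drift = total_kl(P_A, P_B)   # true chain is P_A, estimate is still P_B
```

The second value is the divergence an estimator would show right after the switch back to $P_A$ if it had fully converged to $P_B$; a model that keeps learning should drive it back toward zero.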


Data Streams with Concept-drifts

[Figure, two panels: K-L Divergence versus Time Unit (0 to 5,000). Labels recovered from the plot: Loadstar, No Learning, Loadstar*, No Shedding, 50% Shedding.]

Future Directions

- Certain time units reserved for learning
- Networks of dependent data streams
- Data mining as an intermediate block
- Control the communication rates
