Sliding - UPC

Report 40 Downloads 76 Views
Kalman Filters and Adaptive Windows for Learning in Data Streams Albert Bifet

Ricard Gavaldà

Universitat Politècnica de Catalunya

DS’06 Barcelona

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

1 / 29

Outline 1

Introduction

2

The Kalman Filter and the CUSUM Test

3

The ADWIN Algorithm

4

General Framework

5

K-ADWIN

6

Experimental Validation of K-ADWIN

7

Conclusions

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

2 / 29

Introduction

Introduction

Data Streams Sequence potentially infinite High amount of data: sublinear space

High Speed of arrival: small constant time per example

Estimation and prediction Distribution and concept drift K-ADWIN : Combination Kalman filter ADWIN : Adaptive window of recently seen data items.

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

3 / 29

Introduction

Introduction

Problem Given an input sequence x1 , x2 , . . . , xt , . . . we want to output at instant t a prediction xbt+1 minimizing prediction error: |xbt+1 − xt+1 | considering distribution changes overtime.

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

4 / 29

Introduction

Time Change Detectors and Predictors: A General Framework Estimation -

xt -

Estimator

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

5 / 29

Introduction

Time Change Detectors and Predictors: A General Framework Estimation -

xt -

Estimator

Alarm -

A. Bifet, R. Gavaldà (UPC)

Change Detect.

-

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

5 / 29

Introduction

Time Change Detectors and Predictors: A General Framework Estimation -

xt -

Estimator

Alarm -

Change Detect.

-

6 6 ? -

A. Bifet, R. Gavaldà (UPC)

Memory

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

5 / 29

Introduction

Introduction

Our generic proposal: Use change detector Use memory

Our particular proposal: K-ADWIN Kalman filter as estimator Use ADWIN as change detector with memory [BG06]

Application Estimate statistics from data streams In Data Mining Algorithms based on counters, replace them for estimators.

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

6 / 29

Introduction

Data Mining Algorithms with Concept Drift No Concept Drift

Concept drift

DM Algorithm output

input -

Counter5

-

input

DM Algorithm

output

-

-

Static Model

Counter4 Counter3

6

Counter2 Counter1

A. Bifet, R. Gavaldà (UPC)

-

Change Detect.



Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

7 / 29

Introduction

Data Mining Algorithms with Concept Drift No Concept Drift

Concept Drift

DM Algorithm output

input -

DM Algorithm

Counter5

-

output

input -

Estimator5

Counter4

Estimator4

Counter3

Estimator3

Counter2

Estimator2

Counter1

Estimator1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

-

7 / 29

The Kalman Filter and the CUSUM Test

The Kalman Filter Optimal recursive algorithm Minimum mean-square error estimator Estimate the state x ∈ h then alarm and gt = 0

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

10 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111 W0 = 1 W1 = 01010110111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111 W0 = 10 W1 = 1010110111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111 W0 = 101 W1 = 010110111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111 W0 = 1010 W1 = 10110111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111 W0 = 10101 W1 = 0110111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111 W0 = 101010 W1 = 110111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111 W0 = 1010101 W1 = 10111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111 W0 = 10101011 W1 = 0111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111 |ˆ µW0 − µ ˆW1 | ≥ c : CHANGE DETECTED! W0 = 101010110 W1 = 111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 101010110111111 Drop elements from the tail of W W0 = 101010110 W1 = 111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Example W = 01010110111111 Drop elements from the tail of W W0 = 101010110 W1 = 111111

ADWIN: A DAPTIVE W INDOWING A LGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |ˆ µW0 − µ ˆW1 | ≥ c holds 6 for every split of W into W = W0 · W1 7 Output µ ˆW

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

11 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 1 01010110111111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 10 1010110111111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 101 010110111111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 1010 10110111111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 10101 0110111111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 101010 110111111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 1010101 10111111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 10101011 0111111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 101010110 111111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 1010101101 11111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 10101011011 1111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 101010110111 111

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 1010101101111 11

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Window Management Models

W = 101010110111111 Equal & fixed size subwindows 1010 1011011 1111 D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. 2004

Total window against subwindow 10101011011 1111 J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. 2004

ADWIN: All Adjacent subwindows 10101011011111 1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

12 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06]

ADWIN has rigorous guarantees On ratio of false positives On ratio of false negatives On the relation of the size of the current window and change rates

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

13 / 29

The ADWIN Algorithm

Algorithm ADWIN [BG06] Theorem At every time step we have: 1

(Few false positives guarantee) If µt remains constant within W , the probability that ADWIN shrinks the window at this step is at most δ.

2

(Few false negatives guarantee) If for any partition W in two parts W0 W1 (where W1 contains the most recent items) we have |µW0 − µW1 | > , and if s ≥4·

3 max{µW0 , µW1 } 4n ln min{n0 , n1 } δ

then with probability 1 − δ ADWIN shrinks W to W1 , or shorter. A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

14 / 29

The ADWIN Algorithm

Data Streams Algorithm ADWIN2 [BG06] ADWIN2 using a Data Stream Sliding Window Model, can provide the exact counts of 1’s in O(1) time per point. tries O(log W ) cutpoints uses O( 1 log W ) memory words the processing time per example is O(log W ) (amortized) and O(log2 W ) (worst-case). Sliding Window Model 1010101 101 11 1 1 Content:

4

2

2

1 1

Capacity:

7

3

2

1 1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

15 / 29

The ADWIN Algorithm

Algorithm ADWIN2 ADWIN2 using a Data Stream Sliding Window Model, can provide the exact counts of 1’s in O(1) time per point. tries O(log W ) cutpoints uses O( 1 log W ) memory words the processing time per example is O(log W ) (amortized) and O(log2 W ) (worst-case). Insert new Item 1010101 101 11 1 1 Content:

4

2

2

1 1

Capacity:

7

3

2

1 1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

16 / 29

The ADWIN Algorithm

Algorithm ADWIN2 ADWIN2 using a Data Stream Sliding Window Model, can provide the exact counts of 1’s in O(1) time per point. tries O(log W ) cutpoints uses O( 1 log W ) memory words the processing time per example is O(log W ) (amortized) and O(log2 W ) (worst-case). Insert new Item 1010101 101 11 1 1

1

Content:

4

2

2

1 1

1

Capacity:

7

3

2

1 1

1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

16 / 29

The ADWIN Algorithm

Algorithm ADWIN2 ADWIN2 using a Data Stream Sliding Window Model, can provide the exact counts of 1’s in O(1) time per point. tries O(log W ) cutpoints uses O( 1 log W ) memory words the processing time per example is O(log W ) (amortized) and O(log2 W ) (worst-case). Compressing Buckets 1010101 101 11 1 1

1

Content:

4

2

2

1 1

1

Capacity:

7

3

2

1 1

1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

16 / 29

The ADWIN Algorithm

Algorithm ADWIN2 ADWIN2 using a Data Stream Sliding Window Model, can provide the exact counts of 1’s in O(1) time per point. tries O(log W ) cutpoints uses O( 1 log W ) memory words the processing time per example is O(log W ) (amortized) and O(log2 W ) (worst-case). Compressing Buckets 1010101 101 11 11 1 Content:

4

2

2

2

1

Capacity:

7

3

2

2

1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

17 / 29

The ADWIN Algorithm

Algorithm ADWIN2 ADWIN2 using a Data Stream Sliding Window Model, can provide the exact counts of 1’s in O(1) time per point. tries O(log W ) cutpoints uses O( 1 log W ) memory words the processing time per example is O(log W ) (amortized) and O(log2 W ) (worst-case). Compressing Buckets 1010101 101 11 11 1 Content:

4

2

2

2

1

Capacity:

7

3

2

2

1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

17 / 29

The ADWIN Algorithm

Algorithm ADWIN2 ADWIN2 using a Data Stream Sliding Window Model, can provide the exact counts of 1’s in O(1) time per point. tries O(log W ) cutpoints uses O( 1 log W ) memory words the processing time per example is O(log W ) (amortized) and O(log2 W ) (worst-case). Compressing Buckets 1010101 10111 11 1 Content:

4

4

2

1

Capacity:

7

5

2

1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

18 / 29

The ADWIN Algorithm

Algorithm ADWIN2 ADWIN2 using a Data Stream Sliding Window Model, can provide the exact counts of 1’s in O(1) time per point. tries O(log W ) cutpoints uses O( 1 log W ) memory words the processing time per example is O(log W ) (amortized) and O(log2 W ) (worst-case). Detecting Change: Delete last Bucket 1010101 10111 11 1 Content:

4

4

2

1

Capacity:

7

5

2

1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

19 / 29

The ADWIN Algorithm

Algorithm ADWIN2 ADWIN2 using a Data Stream Sliding Window Model, can provide the exact counts of 1’s in O(1) time per point. tries O(log W ) cutpoints uses O( 1 log W ) memory words the processing time per example is O(log W ) (amortized) and O(log2 W ) (worst-case). Detecting Change: Delete last Bucket 10111 11 1 Content:

4

2

1

Capacity:

5

2

1

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

19 / 29

General Framework

General Framework Time Change Detectors and Predictors : Type I Example (Kalman Filter) Estimation -

xt -

Estimator

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

20 / 29

General Framework

General Framework Time Change Detectors and Predictors : Type II Example (Kalman Filter + CUSUM) Estimation -

xt -

Estimator

Alarm -

A. Bifet, R. Gavaldà (UPC)

Change Detect.

-

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

20 / 29

General Framework

General Framework Time Change Detectors and Predictors : Type III Example (Adaptive Kalman Filter) Estimation -

xt -

Estimator 6

-

A. Bifet, R. Gavaldà (UPC)

Memory

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

20 / 29

General Framework

General Framework Time Change Detectors and Predictors : Type IV Example (ADWIN, Kalman Filter+ADWIN) Estimation -

xt -

Estimator

Alarm -

Change Detect.

-

6 6 ? -

A. Bifet, R. Gavaldà (UPC)

Memory

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

20 / 29

General Framework

Time Change Detectors and Predictors: A General Framework

No memory

Memory

No Change Detector

Type I Kalman Filter

Type III Adaptive Kalman Filter

Change Detector

Type II Kalman Filter + CUSUM

Type IV ADWIN Kalman Filter + ADWIN

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

21 / 29

General Framework

Time Change Detectors and Predictors: A General Framework

No memory

Memory

No Change Detector

Type I Kalman Filter

Type III Adaptive Kalman Filter Q,R estimated from window

Change Detector

Type II Kalman Filter + CUSUM

Type IV ADWIN Kalman Filter + ADWIN Q,R estimated from window

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

21 / 29

K-ADWIN

K-ADWIN = ADWIN + Kalman Filtering Estimation -

xt -

Kalman

Alarm -

ADWIN

-

6 6 ? -

ADWIN Memory

R = W 2 /50 and Q = 200/W , where W is the length of the window maintained by ADWIN.

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

22 / 29

Experimental Validation of K-ADWIN

Tracking Experiments

KALMAN: R=1000;Q=1 Error= 854.97

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

23 / 29

Experimental Validation of K-ADWIN

Tracking Experiments ADWIN : Error= 674.66

KALMAN: R=1000;Q=1 Error= 854.97

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

23 / 29

Experimental Validation of K-ADWIN

Tracking Experiments K-ADWIN Error= 530.13

ADWIN : Error= 674.66

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

23 / 29

Experimental Validation of K-ADWIN

Naïve Bayes Example Data set that describes the weather conditions for playing some game.

outlook sunny sunny overcast rainy rainy rainy overcast

temp. hot hot hot mild cool cool cool

humidity high high high high normal normal normal

windy false true false false false true true

play no no yes yes yes no yes

Assume we have to classify the following new instance: outlook temp. humidity windy play sunny cool high true ? A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

24 / 29

Experimental Validation of K-ADWIN

Naïve Bayes Assume we have to classify the following new instance: outlook temp. humidity windy play sunny cool high true ? We classify the new instance: νNB = arg

max ν∈{yes,no}

P(νj )P(sunny |νj )P(cool|νj )P(high|νj )P(true|νj )

Conditional probabilities can be estimated directly as frequencies: P(ai |νj ) =

number of instances with attribute ai and class νj total number of training instances with class νj

Create one estimator for each frequence that needs estimation A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

24 / 29

Experimental Validation of K-ADWIN

Experimental Validation of K-ADWIN

We test Naïve Bayes Predictor and k-means clustering Method: replace counters by estimators Synthetic data where change is controllable Naïve Bayes: We compare accuracy of Static model: Training of 1000 samples every instant Dynamic model: replace probabilities counters by estimators

computing the ratio

A. Bifet, R. Gavaldà (UPC)

%Dynamic Static

with tests using 2000 samples.

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

25 / 29

Experimental Validation of K-ADWIN

Naïve Bayes Predictor

ADWIN Kalman Q = 1, R = 1000 Kalman Q = 1, R = 1 Kalman Q = .25, R = .25 Adaptive Kalman CUSUM Kalman K-ADWIN Fixed-sized Window Fixed-sized Window Fixed-sized Window Fixed-sized Window

A. Bifet, R. Gavaldà (UPC)

Width

%Static

%Dynamic

% Dynamic/Static

32 128 512 2048

83,36% 83,22% 83,21% 83,26% 83,24% 83,30% 83,24% 83,28% 83,30% 83,28% 83,24%

80,30% 71,13% 56,91% 56,91% 76,21% 50,65% 81,39% 67,64% 75,40% 80,47% 82,19%

96,33% 85,48% 68,39% 68,35% 91,56% 60,81% 97,77% 81,22% 90,52% 96,62% 98,73%

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

26 / 29

Experimental Validation of K-ADWIN

k -means Clustering

ADWIN Kalman Q = 1, R = 1000 Kalman Q = 1, R = 100 Kalman Q = .25, R = .25 Adaptive Kalman CUSUM Kalman K-ADWIN Fixed-sized Window Fixed-sized Window Fixed-sized Window Fixed-sized Window Fixed-sized Window Fixed-sized Window

A. Bifet, R. Gavaldà (UPC)

Width

σ = 0.15 Static Dynamic

32 128 512 2048 8192 32768

9,72 9,72 9,71 9,71 9,72 9,72 9,72 9,72 9,72 9,72 9,72 9,72 9,72

21,54 19,72 17,60 22,63 18,98 18,29 17,30 25,70 36,42 38,75 39,64 43,39 53,82

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

27 / 29

Experimental Validation of K-ADWIN

Results

No estimator ever does much better than K-ADWIN K-ADWIN does much better than every other estimators in at least one context. Tracking problem K-ADWIN and ADWIN automatically do about as well as the Kalman filter with the best set of fixed covariance parameters. Naïve Bayes and k -means: K-ADWIN does somewhat better than ADWIN and far better than any memoryless Kalman filter.

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

28 / 29

Conclusions

Conclusions and Future Work

K-ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters. Better results than either memoryless Kalman Filtering or sliding windows with linear estimators. Future work : Tests on real-world, not only synthetic data. Other learning algorithms: algorithms for induction of decision trees.

A. Bifet, R. Gavaldà (UPC)

Kalman Filters and Adaptive Windows for Learning in Data Streams DS’06 Barcelona

29 / 29