
Anomaly Detection in a Mobile Communication Network
Alec Pawling, Nitesh V. Chawla, and Greg Madey
University of Notre Dame

Anomaly Detection in a Mobile Communication Network. – p.1

Overview
We present a technique that uses hybrid clustering in conjunction with statistical process control to handle concept drift in a data stream.


Outline
• Motivation
• Background
  ◦ Data streams
  ◦ Concept drift
  ◦ Statistical process control
• Related work
• Hybrid clustering for streams
• Setup
• Results
• Conclusion


Motivation

Application
• Detection and Alert System component of the WIPER Emergency Response System [Schoenharl et al., 2006], [Madey et al., 2006]
  ◦ Detect and report anomalies in network usage
  ◦ Notify Simulation and Prediction System

Difficulties
• Massive volume of data
• Dynamic system


Data Streams
• Data can only be read once (due to volume)
• Order of data cannot be manipulated
• Often, if the underlying process is stationary, anomaly detection is straightforward
• If the underlying process is dynamic, the problem is difficult


Concept Drift
• Change in the process that generates the data stream over time
• May or may not be periodic


Concept Drift

[Figure 1: GPRS usage over 12 days. x-axis: Time (min); y-axis: Number of instances.]

Statistical Process Control
Distinguish between random and assignable variation: threshold is µ ± lω.
• Random variation
  ◦ High probability, little effect on process output
• Assignable variation
  ◦ Low probability, significant effect on process output
  ◦ Change in underlying process
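A minimal sketch of the threshold test described on this slide. It assumes the spread term in the threshold is the sample standard deviation of a reference window and that l is a user-chosen multiplier. The function names `control_limits` and `is_assignable` and the sample values are illustrative and do not come from the slides.

```python
from statistics import mean, stdev

def control_limits(reference, l=3.0):
    """Return (lower, upper) limits: center minus/plus l times the spread."""
    mu = mean(reference)
    s = stdev(reference)  # assumption: spread is the sample standard deviation
    return mu - l * s, mu + l * s

def is_assignable(value, limits):
    """True when a value falls outside the limits (treated as assignable variation)."""
    lower, upper = limits
    return value < lower or value > upper

# Hypothetical per-minute counts used as the reference window
reference = [100, 102, 98, 101, 99, 100, 103, 97]
limits = control_limits(reference, l=3.0)

print(is_assignable(101, limits))  # within the band: random variation
print(is_assignable(130, limits))  # outside the band: assignable variation
```

A larger l widens the band and flags fewer observations, so the choice of l trades missed anomalies against false alarms.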


Statistical Process Control

[Figure 2: Range of random variance. x-axis: Hour of the day; y-axis: Telephone call volume.]

Related Work
Intrusion detection (Portnoy, 2001)
• Identify intrusions in an unlabeled data set using leader clustering.
• The leader algorithm (Hartigan, 1975)
  ◦ Let d be a distance threshold.
  ◦ Let the first instance assigned to cluster Ci be the defining instance, ci.
  ◦ For each instance x
    • Find the closest cluster, Cj
    • If dist(x, cj) < d, add x to Cj
    • Otherwise, create a new cluster with the defining instance x.
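The leader algorithm steps can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' implementation: it uses Euclidean distance, and the name `leader_cluster` and the sample points are hypothetical.

```python
import math

def leader_cluster(instances, d):
    """Single-pass leader clustering; returns (leaders, members)."""
    leaders = []   # defining instance of each cluster
    members = []   # instances assigned to each cluster, in arrival order
    for x in instances:
        # Find the closest existing cluster by distance to its defining instance
        best_j, best_dist = None, math.inf
        for j, c in enumerate(leaders):
            dist = math.dist(x, c)  # assumption: Euclidean distance
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None and best_dist < d:
            members[best_j].append(x)
        else:
            # No cluster is close enough: x defines a new cluster
            leaders.append(x)
            members.append([x])
    return leaders, members

points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.2, 9.9), (0.1, 0.3)]
leaders, members = leader_cluster(points, d=1.0)
print(leaders)
print([len(m) for m in members])
```

Each instance is read once and compared only against the defining instances, which is what makes the method usable on a stream. The result depends on the order in which instances arrive.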

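Related Work
Problem:
• Uses z-score normalization to allow for arbitrary data distribution:

  $v_i' = \frac{v_i - \bar{v}_i}{\sigma_i}$

• This is not possible in one pass, because the mean $\bar{v}_i$ and standard deviation $\sigma_i$ of each feature are computed over the whole data set before any instance can be normalized.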
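To make the one-pass limitation concrete, here is a small sketch of batch z-score normalization. It assumes the population standard deviation is used and that a constant feature maps to 0; the name `zscore_two_pass` and the sample rows are illustrative. The first loop must see every row before the second can normalize any of them.

```python
from statistics import mean, pstdev

def zscore_two_pass(rows):
    """Normalize each feature to zero mean and unit variance (needs all rows)."""
    columns = list(zip(*rows))
    # Pass 1: per-feature statistics over the complete data set
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # assumption: population std
    # Pass 2: normalize every instance with those statistics
    return [
        tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds))
        for row in rows
    ]

rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
normalized = zscore_two_pass(rows)
print(normalized)
```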


Related Work
Hybrid clustering algorithms (Cheu et al., 2004)
1. Cluster to reduce the data set
2. Produce final clusters


Hybrid Algorithm for Streams
1. Establish clusters with some minimum number of instances using a partitional or hierarchical algorithm.
2. Incrementally update cluster centers and standard deviations using a variation on the leader algorithm.
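The slides do not specify how the incremental update in step 2 is computed. One common way to maintain a mean and standard deviation in a single pass is Welford's online algorithm, sketched below for one feature of one cluster. The class name `RunningStats` and the sample values are hypothetical, and this is a swapped-in technique rather than the authors' stated method.

```python
import math

class RunningStats:
    """Running mean and sample standard deviation (Welford's algorithm)."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; defined as 0.0 until two values are seen
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

stats = RunningStats()
for value in [100, 102, 98, 101, 99, 100, 103, 97]:
    stats.update(value)
print(stats.mean, stats.std)
```

In a streaming setting, one such object per feature per cluster would be updated whenever an instance is assigned to that cluster, so no past instances need to be stored.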


Setup

Data set
• Feature vector consists of a timestamp and the number of instances of 5 services
• One example for each minute of a 12 day period (18721 examples)

Clustering Algorithms
• Expectation Maximization: Weka, cross-validation to determine number of clusters
• Leader
• Hybrid for streams: (1) k-means, (2) modified leader


Results

Hybrid algorithm
• Small clusters compared to EM
• Little consistency in detected outliers among different thresholds or values of k

Leader algorithm
• More consistency in anomaly detection


Conclusion
• Algorithms using random values may be a bad idea
• Algorithms requiring only a threshold parameter seem promising

Future work
• Hierarchical clustering to establish clusters
• Examine further how the number of clusters grows over time


Acknowledgments
This work is supported, in part, by the National Science Foundation, the DDDAS Program, under Grant No. CNS_050348.

