Anomaly Detection in a Mobile Communication Network Alec Pawling, Nitesh V. Chawla, and Greg Madey University of Notre Dame
Anomaly Detection in a Mobile Communication Network. – p.1
Overview We present a technique that uses hybrid clustering in conjunction with statistical process control to handle concept drift in a data stream.
Outline • Motivation • Background ◦ Data streams ◦ Concept drift ◦ Statistical process control • Related work • Hybrid clustering for streams • Setup • Results • Conclusion
Motivation Application • Detection and Alert System component of WIPER
Emergency Response System [Schoenharl et al., 2006], [Madey et al., 2006] ◦ Detect and report anomalies in network usage ◦ Notify the Simulation and Prediction System Difficulties • Massive volume of data • Dynamic system
Data Streams • Data can only be read once (due to volume) • Order of data cannot be manipulated • Often, if the underlying process is stationary, anomaly
detection is straightforward • If the underlying process is dynamic, the problem is
difficult
Concept Drift • Change in process that generates the data stream
over time • May or may not be periodic
Concept Drift
Figure 1: GPRS usage over 12 days (x-axis: time in minutes; y-axis: number of instances)
Statistical Process Control Distinguish between random and assignable variation: the threshold is µ ± lσ, where µ is the process mean, σ its standard deviation, and l a chosen multiplier. • Random variation ◦ High probability, little effect on process output • Assignable variation ◦ Low probability, significant effect on process output ◦ Change in underlying process
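The threshold rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `control_limits` and `is_assignable`, the sample standard deviation, the multiplier l = 3, and the baseline values are all assumptions made for the example.

```python
from statistics import mean, stdev

def control_limits(samples, l=3.0):
    """Return (lower, upper) control limits mu -/+ l * sigma.

    Assumption: sigma is the sample standard deviation of a
    baseline window; the slides do not say how it is estimated.
    """
    mu = mean(samples)
    sigma = stdev(samples)
    return mu - l * sigma, mu + l * sigma

def is_assignable(value, limits):
    """True if the value falls outside the control limits."""
    lower, upper = limits
    return value < lower or value > upper

# Hypothetical baseline of per-minute usage counts.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
limits = control_limits(baseline, l=3.0)

inside = is_assignable(101, limits)   # within the limits: random variation
outside = is_assignable(120, limits)  # beyond the limits: assignable variation
```

A larger l widens the band and flags fewer points; a smaller l does the opposite.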
Statistical Process Control
Figure 2: Range of random variance (x-axis: hour of the day; y-axis: telephone call volume)
Related Work Intrusion detection (Portnoy, 2001)
• Identify intrusions in an unlabeled data set using leader clustering.
• The leader algorithm (Hartigan, 1975)
◦ Let d be a distance threshold.
◦ Let the first instance assigned to cluster C_i be its defining instance, c_i.
◦ For each instance x:
  • Find the closest cluster, C_j.
  • If dist(x, c_j) < d, add x to C_j.
  • Otherwise, create a new cluster with x as its defining instance.
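The leader algorithm described on this slide can be written as a single pass over the data. The sketch below assumes Euclidean distance and tuple-valued instances; the name `leader_cluster` and the sample points are illustrative only.

```python
import math

def leader_cluster(instances, d):
    """One-pass leader clustering with distance threshold d.

    leaders[i] is the defining instance of cluster i and
    clusters[i] is the list of instances assigned to it.
    Assumption: dist is Euclidean distance.
    """
    leaders = []
    clusters = []
    for x in instances:
        if leaders:
            # Find the cluster whose defining instance is closest to x.
            j = min(range(len(leaders)),
                    key=lambda i: math.dist(x, leaders[i]))
            if math.dist(x, leaders[j]) < d:
                clusters[j].append(x)
                continue
        # No cluster is close enough: x defines a new cluster.
        leaders.append(x)
        clusters.append([x])
    return leaders, clusters

# Hypothetical 2-D instances forming two well-separated groups.
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.0, 10.4), (0.2, 0.1)]
leaders, clusters = leader_cluster(points, d=1.0)
```

Because each instance is compared only against the defining instances, the result depends on the order in which instances arrive.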
Related Work Problem: • Uses z-score normalization to allow for arbitrary data
distribution: $v_i' = (v_i - \bar{v}_i) / \sigma_i$
• This is not possible in one pass
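To make the one-pass limitation concrete, the following sketch normalizes a single feature the conventional way. It needs the mean and standard deviation of the whole data set before any value can be transformed, so the data is read twice. The function name `z_score_normalize`, the population standard deviation, and the sample values are assumptions for illustration.

```python
from statistics import mean, pstdev

def z_score_normalize(values):
    """Two-pass z-score normalization of one feature.

    Assumption: sigma is the population standard deviation.
    """
    # First pass: statistics over the complete data set.
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    # Second pass: transform every value with the global statistics.
    return [(v - mu) / sigma for v in values]

normalized = z_score_normalize([2, 4, 4, 4, 5, 5, 7, 9])
```

In a stream, the first pass is unavailable because later values have not arrived when earlier ones must be processed.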
Related Work Hybrid clustering algorithms (Cheu et al., 2004) 1. Cluster to reduce the data set 2. Produce final clusters
Hybrid Algorithm for Streams 1. Establish clusters with some minimum number of instances using a partitional or hierarchical algorithm. 2. Incrementally update the cluster centers and standard deviations using a variation on the leader algorithm.
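Step 2 can be illustrated with a small sketch. The slides do not give the exact update rule, so this example assumes Welford's online method for the running mean and variance, Euclidean distance to the cluster center, and a fixed threshold d. The names `RunningCluster` and `assign` are hypothetical, and step 1 (the initial clustering) is omitted: each cluster here is simply seeded from one instance.

```python
import math

class RunningCluster:
    """Cluster summary that is updated one instance at a time."""

    def __init__(self, first):
        self.n = 1
        self.center = [float(v) for v in first]
        # Per-feature sum of squared deviations from the running mean.
        self.m2 = [0.0] * len(self.center)

    def update(self, x):
        """Welford update of the center and squared deviations."""
        self.n += 1
        for k, xk in enumerate(x):
            delta = xk - self.center[k]
            self.center[k] += delta / self.n
            self.m2[k] += delta * (xk - self.center[k])

    def std(self):
        """Per-feature sample standard deviation (zeros if n < 2)."""
        if self.n < 2:
            return [0.0] * len(self.center)
        return [math.sqrt(m / (self.n - 1)) for m in self.m2]

def assign(clusters, x, d):
    """Add x to the closest cluster if within d, else start a new one.

    Returns (cluster, created), where created is True for a new cluster.
    """
    if clusters:
        best = min(clusters, key=lambda c: math.dist(x, c.center))
        if math.dist(x, best.center) < d:
            best.update(x)
            return best, False
    new = RunningCluster(x)
    clusters.append(new)
    return new, True

# Hypothetical one-feature stream.
clusters = []
created_flags = [assign(clusters, x, d=10.0)[1]
                 for x in [(1.0,), (3.0,), (2.0,), (50.0,)]]
```

Each instance is read once and only the count, center, and squared deviations are stored, which fits the single-pass constraint of a data stream.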
Setup Data set • Feature vector consists of timestamp and number of
instances of 5 services • One example for each minute of a 12 day period
(18721 examples) Clustering Algorithms • Expectation Maximization — Weka, cross-validation to
determine number of clusters • Leader • Hybrid for streams: (1) k-means, (2) modified leader
Results Hybrid algorithm • Small clusters compared to EM • Little consistency in detected outliers among different
thresholds or values of k Leader algorithm • More consistency in anomaly detection
Conclusion • Algorithms that rely on random values may be a poor choice • Algorithms requiring only a threshold parameter seem
promising Future work • Hierarchical clustering to establish clusters • Examine further how the number of clusters grows
over time
Acknowledgments This work is supported, in part, by the National Science Foundation, the DDDAS Program, under Grant No. CNS_050348.