Visualizing Time Series State Changes with Prototype Based Clustering
Markus Pylvänen, Sami Äyrämö, Tommi Kärkkäinen
University of Jyväskylä, Department of Mathematical Information Technology, P.O.Box 35 (Agora), FIN-40014 Jyväskylä, Finland
[email protected],
[email protected],
[email protected]

Abstract. Modern process and condition monitoring systems produce a huge amount of data that is hard to analyze manually. Previous analysis techniques disregard time information and concentrate only on the identification of normal and abnormal operational states. We present a new method for visualizing operational states and the overall order of the transitions between them. The method is implemented in a visualization tool that helps the user to see the overall development of operational states, making it possible to find causes for abnormal behaviour. Finally, the visualization tool is tested in practice with real time series data collected from a gear unit.
1
Introduction
Industrial processes and condition monitoring systems nowadays produce a huge amount of time series data. In industrial applications the data are often monitored by defining separate limits for attribute values. This type of monitoring is easy to implement and understand, but it is unable to show when several attributes behave abnormally together without any of them breaking its own limit. Altogether, a process state is characterized and controlled attribute by attribute, without overall utilization of the measurements.

Clustering has been used before for finding states of an industrial process and abnormal behaviour from multivariate data [18][17]. However, this approach loses the time information between the states. We can examine which states the observations represent, but we cannot examine the order in which the states occurred. This information can be meaningful if we are interested in the causes that lead to abnormal behaviour, because these can only be seen by examining the states that occurred before the abnormal states.

This paper presents a new concept for visualizing time series data with cluster prototypes, in which information about the transitions between states is added. An implementation of the method is presented, and it is tested with time series data collected from a gear unit. For this case, we briefly present the whole knowledge mining process [2] and the role of the techniques presented here in it.
2
Related Work
Methods for identifying operational states from industrial data have been presented in several publications. Wang [20] presents adaptive resonance theory (ART), an unsupervised learning algorithm, and the Bayesian automatic classification system (AutoClass) for identifying operational states. Self-organizing maps (SOM) have also been used for identifying process states, for example by Heikkinen et al. [9]. Alhoniemi et al. [1] used SOM for monitoring and modelling industrial processes.

Different clustering algorithms produce different results, and it can be difficult to compare them [11]. Visualization of clusters offers a user-friendly method for comparing the results and presenting their dissimilarities. Data visualization and cluster visualization are overlapping approaches to the analysis of large data sets, because some data visualization techniques, such as self-organizing maps [14] and parallel coordinates [12], can reveal clusters at the same time. A problem with self-organizing maps is that the clustering algorithm is embedded in the method, which prevents selecting the best clustering algorithm for each data set. Huang and Lin [11] have developed a visualization technique for validating clusters. They use FastMap for visualizing high-dimensional data in 2D and a k-prototypes algorithm for clustering. Hoffman and Grinstein [10] also present many visualization techniques in their survey, but these are not meant for time series data, i.e. temporal information is lost when the whole data set is considered.
3
The Approach
This method was originally developed for visualizing time series data collected from sensors attached to a gear unit. The primary use of the collected data is to detect faults before they cause serious damage to the gear unit. In this case, the goal is to form states that represent either normal or abnormal behaviour of the gear unit. This knowledge can be used afterwards for finding patterns that may precede a malfunction and in this way predict faults even sooner. For example, the system may run smoothly if two normal states alternate occasionally, but rapid alternation between these states could indicate a fault. This kind of behaviour cannot be seen by examining only the values of single attributes.

The data used has to be in chronological order and complete. Observations that have missing values can be removed, or the missing values can be estimated from other values of the same attribute. In the future, the clustering algorithm can be replaced with a version that allows missing values. The system from which the data is collected does not have delays between dependent attribute values. If two attributes depend on each other and there is a time delay between their changes, the delay blurs the clusters so that it is hard to find the basic features of each cluster. This kind of problem might arise in industrial processes such as the waste water treatment process that Sànchez et al. [17] have studied.
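The alternation example above (two normal states that take turns occasionally versus rapidly) can be made concrete by counting state changes in a chronologically ordered sequence of state labels. The following is only a minimal sketch: the function names `transition_counts` and `switch_rate` and the two label sequences are hypothetical and are not taken from the tool described in this paper.

```python
from collections import Counter


def transition_counts(labels):
    """Count transitions between consecutive observations whose states differ."""
    counts = Counter()
    for prev, curr in zip(labels, labels[1:]):
        if prev != curr:
            counts[(prev, curr)] += 1
    return counts


def switch_rate(labels):
    """Fraction of consecutive observation pairs in which the state changes."""
    if len(labels) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes / (len(labels) - 1)


# Hypothetical label sequences over two normal states, 0 and 1.
slow = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]  # states alternate occasionally
fast = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # states alternate at every step

print(dict(transition_counts(slow)), switch_rate(slow))
print(dict(transition_counts(fast)), switch_rate(fast))
```

Both sequences visit the same two states, so the difference between them is visible only when the order of the observations is taken into account.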
3.1
Clustering
A base element of the new visualization method is the identification of gear unit states with data clustering. For this prototype construction, the K-means method is chosen. The K-means clustering method is an iterative process that divides a given data set into K disjoint groups [16], [7]. It is one of the most widely used clustering principles, and the best-known partitioning-based clustering method that utilizes prototypes for cluster representation. Due to its straightforward implementation, Gaussian assumptions, and computational efficiency, K-means is a popular choice for many problems. It also has smaller memory requirements than, for instance, hierarchical methods. The K-means algorithm converges to a partition for which the cluster prototypes locally minimize the clustering error, measured as the sum of the within-cluster squared errors.
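The iterative relocation scheme of K-means described in this section can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the implementation used in the visualization tool: the initialization by random sampling of data points, the handling of empty clusters, the `max_iter` safeguard, and the toy data are all assumptions made here. The function `clustering_error` evaluates the sum of within-cluster squared errors, i.e. the objective J of Eq. (1).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(data, k, seed=0, max_iter=100):
    """Basic K-means by iterative relocation; returns (labels, centers)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as centers (an assumption;
    # the text does not fix an initialization rule).
    centers = [list(p) for p in rng.sample(data, k)]
    labels = [None] * len(data)
    for _ in range(max_iter):
        # Recomputation: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(x, centers[j]))
            for x in data
        ]
        # Stopping rule: no data point changed cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def clustering_error(data, labels, centers):
    """Sum of within-cluster squared errors (the objective J)."""
    return sum(squared_distance(x, centers[lab]) for x, lab in zip(data, labels))


# Hypothetical toy data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(data, k=2)
print(labels, centers, clustering_error(data, labels, centers))
```

On less clearly separated data the result depends on the sampled initial centers, so in practice the procedure is typically repeated with several seeds and the partition with the smallest clustering error is kept.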
\[
\min_{c \in \mathbb{N}^n,\; m_k \in \mathbb{R}^p} J\bigl(c, \{m_k\}_{k=1}^{K}\bigr) = \sum_{i=1}^{n} \lVert x_i - m_{(c)_i} \rVert_2^2 \tag{1}
\]
subject to $(c)_i \in \{1, \dots, K\}$ for all $i = 1, \dots, n$, where $c$ is a code vector, which represents the cluster assignments of the objects, and $m_{(c)_i}$ is the mean of the cluster to which the data point $x_i$ is assigned. A general iterative relocation algorithm for solving the K-means problem is given as follows:

Input: The number of clusters K, n × p data set X.
Output: Allocation of each data point to one of K clusters.
Step 1. (Initialization) Compute the initial K cluster centers.
Step 2. (Recomputation) (Re)compute memberships of the data points to the current cluster centers.
Step 3. (Update) Update the cluster centers for the assignments of the data points.
Step 4. (Stopping rule) Repeat from Step 2 until no data point changes cluster.

Note that K-means is very sensitive to the initial partition and to outliers. Since this work presents initial experiments with a new clustering-based visualization method, K-means is a sufficient method.

Another typical option for the clustering step in a state identification problem is provided by the hierarchical clustering methods [6]. The problem with hierarchical clustering is the $O(n^2)$ cost due to the use of the n × n distance matrix. Another problem is missing data values, which the gear unit data can contain. The similarities computed in different sub-spaces are not easy to compare. Sometimes the comparison may be impossible. For instance, let us consider the distance computation for the following three 3-dimensional data vectors:

$x_1 = (1, 0, \mathrm{NaN})^T$, $x_2 = (\mathrm{NaN}, 1, 1)^T$, $x_3 = (1, \mathrm{NaN}, 1)^T$.
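To see the difficulty with these three vectors concretely, one can compute, for every pair, the squared distance over the coordinates that both vectors actually have. Restricting the distance to the shared coordinates is only one possible convention and is an assumption of this sketch, as are the helper names `shared_coordinates` and `partial_squared_distance`.

```python
import math
from itertools import combinations


def shared_coordinates(a, b):
    """Indices at which both vectors have observed (non-NaN) values."""
    return [i for i, (ai, bi) in enumerate(zip(a, b))
            if not math.isnan(ai) and not math.isnan(bi)]


def partial_squared_distance(a, b):
    """Squared Euclidean distance over shared coordinates, plus those indices."""
    idx = shared_coordinates(a, b)
    return sum((a[i] - b[i]) ** 2 for i in idx), idx


nan = float("nan")
vectors = {
    "x1": (1.0, 0.0, nan),
    "x2": (nan, 1.0, 1.0),
    "x3": (1.0, nan, 1.0),
}

# Each pair turns out to share a different single coordinate.
for (name_a, a), (name_b, b) in combinations(vectors.items(), 2):
    dist, idx = partial_squared_distance(a, b)
    print(name_a, name_b, "shared coordinates:", idx, "squared distance:", dist)
```

Every pairwise distance is thus measured along a different axis, so the three numbers do not live in a common space.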
Straightforward comparison of the between-object distances is difficult, since all the points lie in different sub-spaces. Hence, the use of prototype-based methods enables us to represent the recognized states with explicit prototype vectors and, on the other hand, provides more straightforward solutions for the treatment of missing data.
3.2
Dimensionality Reduction
The "curse of dimensionality" is often a problem in data mining applications. Real-life data often have many variables or attributes, which makes visualization difficult. A human being can easily perceive a one-, two-, or three-dimensional space, but with more dimensions than this, visualization is not straightforward. This is one reason why dimensionality reduction techniques have been developed. The easiest way to reduce dimensions is simply to reduce the number of variables. We can select only the most interesting variables based on domain knowledge and visualize them. This technique inevitably loses some information, which is why more advanced techniques are needed. Principal component analysis (PCA) and multidimensional scaling (MDS) are often used for dimension reduction [8]. Linear discriminant analysis (LDA) [4] can also be used for this. They all represent the original n-dimensional data, which has n axes, with fewer axes in such a way that differences in the original data remain visible as far as possible. Principal component analysis aims at finding such linear combinations of the variables of a data set that preserve the maximum amount of information, assuming that the information is measured by variance. Hence, it is natural to use it for explorative data mining. A reduced dimension is obtained when the original high dimensional data is projected from the original