
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 10, NO. 4, OCTOBER 2013

Detection of Correlated Alarms Based on Similarity Coefficients of Binary Data

Zijiang Yang, Jiandong Wang, and Tongwen Chen, Fellow, IEEE

Abstract—This paper studies the statistical analysis of alarm signals in order to detect whether two alarm signals are correlated. First, a similarity measure, namely, the Sorgenfrei coefficient, is selected from 22 similarity coefficients for binary data in the literature. The selection is based on desired properties associated with the special characteristics of alarm signals. Second, the distribution of a so-called correlation delay is shown to be indispensable and effective for the detection of correlated alarms. Finally, a novel method for detecting correlated alarms is proposed based on the Sorgenfrei coefficient and the distribution of the correlation delay. Numerical and industrial examples are provided to illustrate and validate the obtained results.

Note to Practitioners: Alarm systems have been recognized as critical assets of industrial plants for safe and efficient operation. However, operators of industrial plants often receive far more alarms than they can handle promptly. Many of these alarms are correlated alarms, which almost always occur within a short time of each other. Detecting and handling correlated alarms can improve the performance of alarm systems. This paper proposes a novel method to detect whether two industrial alarm signals are statistically correlated. The proposed method is applicable to alarm signals in various industrial sectors, including power and utility, process and manufacturing, and oil and gas, and is one of the fundamental tools in advanced alarm management systems.

Index Terms—Alarm signals, binary data, correlated alarms, similarity coefficients.

Manuscript received August 28, 2012; revised December 03, 2012; accepted February 09, 2013. Date of publication March 22, 2013; date of current version October 02, 2013. This paper was recommended for publication by Associate Editor M. K. Jeong and Editor H. Ding upon evaluation of the reviewers' comments. This work was supported in part by the Shandong Electric Power Research Institute, NSERC, and NSFC under Grant 60704031 and Grant 61061130559.
Z. Yang and J. Wang are with the College of Engineering, Peking University, Beijing 100871, China.
T. Chen is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2V4, Canada.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TASE.2013.2248000

I. INTRODUCTION

ALARM systems are of paramount importance to the safe and efficient operation of modern industrial plants, including power stations, oil refineries, and petrochemical facilities [2], [15]. A well-designed and efficient alarm system should meet performance criteria such as those in the guide from the Engineering Equipment and Materials Users' Association (EEMUA) [7]. However, according to industrial surveys (see, e.g., [2, Table IV] and [15, Table 4.7.3]), operators of industrial plants often receive far more alarms than they can handle promptly. Hence, advanced alarm management systems have received increasing attention from both the industrial and academic communities.

Industrial processes are usually equipped with interacting components; as a result, it is not unusual to see several alarms triggered by the same initiating abnormal event, or several closely related alarms conveying similar information. These alarms are commonly known as consequential/related alarms. Rothenberg [15, p. 123] gave the definitions: “consequential alarms are ones that occur one before the other, with the same one activating first; related alarms occur one after the other, but without a specific order.” Both consequential and related alarms are referred to as correlated alarms [15]. Once correlated alarms are detected, there are several ways to deal with them; e.g., redundant alarms in cause-consequence groups can be suppressed [2], [7], or the most important or informative alarm in a group of related alarms can be highlighted for operators [15]. Doing so could greatly improve the performance of alarm systems and benefit industrial operations, e.g., by reducing the time operators need to locate the sources of primary faults.

Owing to the increasing complexity and varying dynamics of modern industrial processes, it is usually a difficult and time-consuming task for operators to classify correlated alarms solely on the basis of process knowledge and operating experience. Hence, it would be valuable to have a systematic approach that detects correlated alarms automatically, and to design advanced alarm management systems that deal with these correlated alarms.

The detection of correlated alarms has received increasing attention lately. Dahlstrand [6] used multilevel flow models based on process knowledge to perform consequence analysis and find the root cause of alarms. Brooks et al.
[3] exploited parallel coordinates to capture the relationship among multivariate alarm signals and to obtain dynamically varying alarm limits, provided that the historical data contain information on the best operating zones. Yang et al. [18] analyzed the discrepancy between the correlations of process data and of alarm data, and used this information to optimize alarm limits. Kondaveeti et al. [12] provided an alarm similarity color map that groups similar alarms together based on the Jaccard similarity coefficient, after padding alarm sequences with extra ‘1’s around existing alarm occurrences. Noda et al. [14] conducted a correlation analysis between alarms and operation events to identify sequential alarms and unnecessary operations. Yang et al. [19] generated pseudo continuous time series from the original binary alarm data and used a correlation color map of the pseudo data to show clusters of correlated variables. The objective of this paper is to detect correlated alarms via statistical analysis.

1545-5955 © 2013 IEEE

The contribution is threefold. First, two formulations of alarm signals are compared in terms of their suitability for detecting correlated alarms, and a similarity measure, namely, the Sorgenfrei coefficient, is selected from 22 existing similarity coefficients for binary data. The selection is based on desired properties associated with the special characteristics of alarm signals. Second, a so-called correlation delay, which is the time shift at which two alarm signals achieve the maximum Sorgenfrei coefficient, is shown to be indispensable and effective in detecting correlated alarms. Third, a novel method for detecting correlated alarms is proposed based on the Sorgenfrei coefficient and the distribution of the correlation delay. Some preliminary results of this work were presented in [20]; the present paper gives a more complete formulation of the proposed method and uses different numerical and industrial examples.

Before starting the study, we explain why the analysis is performed directly on alarm signals instead of process signals. It is very difficult to find a statistical coefficient that describes a nonlinear relationship between two process signals, owing to the many possible types of nonlinearity. By contrast, similarity coefficients are equally applicable to alarm signals whose corresponding process signals are related in a linear or a nonlinear manner. Nevertheless, performing the analysis directly on alarm signals does not exclude the use of process signals. In fact, the detection results based on alarm signals should be consistent with the actual relationship between the corresponding process signals; that is, with properly set alarm trippoint values, two alarm signals are concluded to be correlated if and only if their process signals are correlated.

The rest of this paper is organized as follows. Section II selects the Sorgenfrei coefficient as the similarity measure suitable for alarm signals.
Section III discusses the necessity of examining the distribution of the correlation delay and proposes a method for detecting correlated alarms. Section IV illustrates the proposed method via an industrial example. Finally, concluding remarks are given in Section V.

II. SIMILARITY COEFFICIENTS FOR ALARM SIGNALS

This section summarizes 22 similarity coefficients for binary sequences, compares two formulations of alarm signals in terms of their suitability for detecting correlated alarms, and recommends the Sorgenfrei coefficient as the similarity measure for alarm signals.

A. Similarity Measurements for Binary Sequences

As alarm signals consist only of ‘1’s and ‘0’s, similarity measures for binary sequences are applicable to them. The literature contains some 76 statistical coefficients used as similarity measures [5]. Here, 22 of them, drawn from [4, Table I] and [8, Table III] and the references therein, are summarized in Table I. The mathematical symbols in Table I are defined as follows (so that $N = a + b + c + d$):
$N$: data length of the two sequences;
$a+b$: number of ‘1’s in the first sequence;
$a+c$: number of ‘1’s in the second sequence;
$d$: number of ‘0’s appearing simultaneously in both sequences;
$a$: number of ‘1’s appearing simultaneously in both sequences;
$b$: number of ‘1’s in the first sequence corresponding to ‘0’s in the second sequence;
$c$: number of ‘1’s in the second sequence corresponding to ‘0’s in the first sequence.

TABLE I
SIMILARITY COEFFICIENTS FOR BINARY SEQUENCES

Basic properties of the coefficients in Table I have already been studied; see, e.g., [4]. In particular, some coefficients focus on the matching of ‘1’ to ‘1’, while others pay more attention to other types of matching, e.g., ‘0’ to ‘0’. However, a similarity coefficient suitable for analyzing correlated alarm signals should have some special properties; e.g., the matching of ‘1’ to ‘1’ is what is desired for alarm signals. Thus, the coefficients in Table I must be analyzed to select those having these special properties. Before conducting such an analysis, we discuss two different formulations of alarm signals in Section II-B.

B. Two Formulations of Alarm Signals

It is common practice in industry that when a process signal enters the alarm state, the corresponding alarm signal changes its value from ‘0’ to ‘1’, and when the process signal returns from the alarm state to the non-alarm state, the alarm signal makes the opposite switch from ‘1’ to ‘0’. However, there are two possible formulations for the alarm signal while the process signal stays in the alarm state. Let $x(t)$ and $x_{tp}$ be the


process signal and its associated high-alarm trippoint value, respectively. The first formulation lets the alarm signal $x_a(t)$ take the value ‘1’ throughout the period in which $x(t)$ is in the alarm state, i.e.,

$$x_a(t) = \begin{cases} 1, & \text{if } x(t) \ge x_{tp} \\ 0, & \text{if } x(t) < x_{tp}. \end{cases} \quad (1)$$

The alarm signal in the second formulation takes the value ‘1’ only at the time instant when $x(t)$ enters the alarm state from the non-alarm state, i.e.,

$$x_a(t) = \begin{cases} 1, & \text{if } x(t-1) < x_{tp} \text{ and } x(t) \ge x_{tp} \\ 0, & \text{otherwise}. \end{cases} \quad (2)$$

Here, the alarm trippoint value $x_{tp}$ is assumed to be known a priori and properly designed; see [17] for a systematic design procedure for $x_{tp}$, and [10], [13], and [16] for other design methods.

Both formulations have been adopted in the literature; see, e.g., [12], [18], and [19]. The second formulation in (2) is more suitable for generating alarm signals for the subsequent detection of correlated alarms. It provides the time instant at which the process signal enters the alarm state from the non-alarm state, which is crucial for telling whether two alarm sequences occur in a correlated manner. By contrast, in the first formulation in (1), this information is very likely to be overwhelmed by the many ‘1’s that arise while the process signal stays in the alarm state. Since the similarity coefficients for alarm signals are based on the matching of ‘1’ to ‘1’, the first formulation overestimates the similarity coefficients and may lead to incorrect conclusions. This fact can be revealed by the following theoretical analysis of the two formulations for independent process signals.

Consider $N$ collected samples of the process signals $x_1(t)$ and $x_2(t)$, and denote their alarm trippoint values as $x_{tp,1}$ and $x_{tp,2}$. Choose the Sorgenfrei coefficient¹ (the eighth row in Table I) as the similarity measure of the alarm signals $x_{a,1}(t)$ and $x_{a,2}(t)$, denoted as

$$S = \frac{a^2}{(a+b)(a+c)}, \quad (3)$$

where $a$ is the number of ‘1’s appearing simultaneously in $x_{a,1}(t)$ and $x_{a,2}(t)$, and $a+b$ and $a+c$ are the numbers of ‘1’s in $x_{a,1}(t)$ and $x_{a,2}(t)$, respectively. If $x_1(t)$ is an independent and identically distributed (IID) process signal, then the probability of $x_1(t)$ reaching its alarm trippoint value is

$$p_1 = P\left(x_1(t) \ge x_{tp,1}\right). \quad (4)$$

Similarly, for another IID process signal $x_2(t)$, we have

$$p_2 = P\left(x_2(t) \ge x_{tp,2}\right). \quad (5)$$

Assume that $x_1(t)$ and $x_2(t)$ are mutually independent. Thus, for the first formulation of alarm signals in (1), $a+b$ and $a+c$ are mutually independent binomial random variables, i.e., $a+b \sim B(N, p_1)$ and $a+c \sim B(N, p_2)$. Here, $B(N, p)$ stands for a binomial random variable with $N$ as the total number of Bernoulli trials and $p$ as the probability of success (taking the value ‘1’) in each trial. The number $a$ is a hypergeometric random variable, standing for the number of ‘1’s in a sample of size $a+c$ drawn from a population of size $N$ consisting of $a+b$ ‘1’s and $N-(a+b)$ ‘0’s. If $p_1$ and $p_2$ are small, then $a$ can be approximated by a binomial random variable independent of $a+b$ and $a+c$, i.e., $a \sim B(N, p_1 p_2)$. The mean value of the Sorgenfrei coefficient for the first formulation is

$$E[S_{(1)}] \approx \frac{E[a^2]}{E[a+b]\,E[a+c]} = \frac{N p_1 p_2 (1 - p_1 p_2) + N^2 p_1^2 p_2^2}{N^2 p_1 p_2} \approx p_1 p_2. \quad (6)$$

For the second formulation in (2), $a+b$ and $a+c$ are another pair of mutually independent binomial random variables, i.e.,

$$a+b \sim B\left(N, p_1(1-p_1)\right), \quad a+c \sim B\left(N, p_2(1-p_2)\right). \quad (7)$$

By the inequality of arithmetic and geometric means, $p_1(1-p_1)$ and $p_2(1-p_2)$ are at most 1/4, e.g., $p_1(1-p_1) \le \left(\frac{p_1 + (1-p_1)}{2}\right)^2 = \frac{1}{4}$. Since these probabilities are small, $a$ can be regarded as a binomial random variable independent of $a+b$ and $a+c$, i.e.,

$$a \sim B\left(N, p_1(1-p_1)\,p_2(1-p_2)\right). \quad (8)$$

The mean value of the Sorgenfrei coefficient for the second formulation is

$$E[S_{(2)}] \approx p_1(1-p_1)\,p_2(1-p_2). \quad (9)$$

Since $p_1 > p_1(1-p_1)$ and $p_2 > p_2(1-p_2)$, $E[S_{(1)}]$ is always larger than $E[S_{(2)}]$. The larger value of $E[S_{(1)}]$ is due to the many ‘1’s in the first formulation, and may lead to a conclusion that contradicts the fact that $x_1(t)$ and $x_2(t)$ are independent; the next example provides a numerical illustration. Therefore, the second formulation in (2) is adopted in the sequel.

¹The Sorgenfrei coefficient is also called the correlation ratio coefficient [5], [8].

1) Example 1: Let the distribution of a process signal $x_1(t)$ be

$$x_1(t) \sim \begin{cases} \mathcal{N}(\mu_0, \sigma_0), & \text{if } s(t) = 0 \\ \mathcal{N}(\mu_1, \sigma_1), & \text{if } s(t) = 1, \end{cases}$$

where $s(t)$ is a Bernoulli random variable indicating whether $x_1(t)$ is in the normal or abnormal condition, with given probabilities $P(s(t)=0)$ and $P(s(t)=1)$. Here, $\mathcal{N}(\mu, \sigma)$ stands for a Gaussian random variable with mean $\mu$ and standard deviation $\sigma$. Another process signal $x_2(t)$, independent of $x_1(t)$, has the same distribution as $x_1(t)$. The trippoint values for $x_1(t)$ and $x_2(t)$ are the same, $x_{tp,1} = x_{tp,2}$, and are chosen so that both $p_1$ in (4) and $p_2$ in (5) equal 0.5. One thousand Monte Carlo simulations are performed. In each simulation, the data length $N$ of the collected samples of $x_1(t)$ and $x_2(t)$ is 1000. The two formulations in (1) and (2) are used to generate the respective alarm signals for $x_1(t)$ and $x_2(t)$. Fig. 1 presents parts of $x_1(t)$ and $x_{a,1}(t)$ from the two formulations in one simulation. The Sorgenfrei coefficients $S_{(1)}$ and $S_{(2)}$ are calculated from the two groups of alarm signals obtained with the first and second formulations, respectively.
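The effect described above can be reproduced with a short Python sketch of the two formulations in (1) and (2) and the Sorgenfrei coefficient in (3), followed by a small Monte Carlo check in the spirit of Example 1. This is an illustration under stated assumptions, not the authors' code: it replaces the two-component distribution of Example 1 with IID standard Gaussian signals and a trippoint of 0 (so that $p_1 = p_2 = 0.5$), uses fewer runs, and the function names are hypothetical.

```python
import random


def alarms_level(x, tp):
    """First formulation (1): '1' at every sample at or above the trippoint."""
    return [1 if v >= tp else 0 for v in x]


def alarms_onset(x, tp):
    """Second formulation (2): '1' only when x moves from below tp to at/above tp."""
    return [0] + [1 if x[t - 1] < tp <= x[t] else 0 for t in range(1, len(x))]


def sorgenfrei(xa1, xa2):
    """Sorgenfrei coefficient (3): a^2 / ((a + b)(a + c)); 0.0 if a sequence has no '1'."""
    a = sum(u & v for u, v in zip(xa1, xa2))  # matched '1's
    n1, n2 = sum(xa1), sum(xa2)               # a + b and a + c
    return a * a / (n1 * n2) if n1 and n2 else 0.0


def mean_sorgenfrei(formulation, n=1000, runs=100, seed=0):
    """Average coefficient for independent IID N(0, 1) signals with trippoint 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += sorgenfrei(formulation(x1, 0.0), formulation(x2, 0.0))
    return total / runs


if __name__ == "__main__":
    # Should be close to p1 * p2 = 0.25 for (1) and to 1/16 = 0.0625 for (2).
    print("formulation (1):", mean_sorgenfrei(alarms_level))
    print("formulation (2):", mean_sorgenfrei(alarms_onset))
```

Although the two signals are independent, the level-based alarms give a mean coefficient about four times larger than the onset-based alarms, which is the overestimation that (6) and (9) predict.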

TABLE III
MMVS OF SEVEN SIMILARITY COEFFICIENTS

Fig. 1. Two formulations of alarm signals: (a) the process signal $x(t)$ (solid) and the alarm trippoint value $x_{tp}$ (dash), (b) the alarm signal $x_a(t)$ from the first formulation in (1), and (c) the alarm signal $x_a(t)$ from the second formulation in (2).

TABLE II
SAMPLE MEAN AND STANDARD DEVIATION OF SORGENFREI COEFFICIENTS FOR TWO FORMULATIONS

Table II lists the sample means and standard deviations of $S_{(1)}$ and $S_{(2)}$ obtained from the 1000 Monte Carlo simulations. The sample means are consistent with the theoretical values $E[S_{(1)}]$ in (6) and $E[S_{(2)}]$ in (9) with $p_1 = p_2 = 0.5$, i.e., 1/4 and 1/16. For the first formulation, the large value of $S_{(1)}$ may lead to the incorrect conclusion that $x_1(t)$ and $x_2(t)$ are correlated.

Remark #1: The second formulation has been selected for analyzing correlated alarms. However, this does not imply that the second formulation should always be selected in other studies of alarm signals. For instance, the first formulation may be more suitable for detecting false alarms.

C. Selection of Similarity Coefficients

With alarm signals generated as in (2), we are ready to select a similarity coefficient from Table I that is suitable for detecting correlated alarms. Considering the special characteristics of alarm signals, a selected similarity coefficient should ideally have the following properties: a) it focuses on the matching of ‘1’ to ‘1’ only; b) it takes values inside the interval $[0, 1]$; c) its value is as small as possible when the two process signals are independent. Property a) is a natural choice for alarm signals generated as in (2), where the matching of ‘1’ to ‘1’ implies that alarms

are tripped in a synchronized manner. Property b) is a standard requirement that makes the similarity coefficient easy to interpret: if the similarity coefficient takes a value close to 1 (0), then the two alarm signals are strongly (weakly) correlated. Property c) is an intuitive requirement. Since a similarity coefficient closer to 0 means a weaker correlation, its value is expected to be as small as possible for two independent process signals. Otherwise, if the similarity coefficient is quite large, e.g., the Simpson coefficient equal to 1/2 as shown later in Table III, then it would be counterintuitive to interpret such a high value as indicating two uncorrelated alarms.

Based on the three properties, we can now select a similarity coefficient from Table I. Property a) rules out the similarity coefficients containing the term $d$: alarm signals contain many more ‘0’s than ‘1’s, so the term $d$ would play a dominant role and overwhelm the matching of ‘1’ to ‘1’. The value ranges of the similarity coefficients in Table I have already been investigated in [4], so the coefficients that do not satisfy Property b) can be readily discarded. In summary, the similarity coefficients satisfying Properties a) and b) are Jaccard, Dice, 2nd Kulczynski, Otsuka, Sorgenfrei, Simpson, and Braun-Blanquet. For Property c), Proposition 1 establishes the maximum mean values (MMVs) of these seven similarity coefficients for two independent process signals. The Sorgenfrei coefficient has the smallest MMV, 1/16, much smaller than the others. Therefore, the Sorgenfrei coefficient is chosen for the subsequent detection of correlated alarms.

Proposition 1: If two process signals $x_1(t)$ and $x_2(t)$ are mutually independent, and each of them is IID, then the MMVs of the Sorgenfrei, Jaccard, Dice, 2nd Kulczynski, Otsuka, Simpson, and Braun-Blanquet coefficients are those in Table III.

Proof of Proposition 1: Here, we prove that the MMV of the Sorgenfrei coefficient in (3) is 1/16. The MMVs of the Jaccard, Dice, 2nd Kulczynski, Otsuka, Simpson, and Braun-Blanquet coefficients can be established in a similar manner. Define $p_1$ and $p_2$ as in (4) and (5), respectively. From (9), the mean value of the Sorgenfrei coefficient is

$$E[S] \approx p_1(1-p_1)\,p_2(1-p_2).$$

By the inequality of arithmetic and geometric means, we have

$$p_1(1-p_1) \le \left(\frac{p_1 + (1-p_1)}{2}\right)^2 = \frac{1}{4}, \qquad p_2(1-p_2) \le \frac{1}{4},$$

so that $E[S] \le 1/16$. The equality holds if and only if $p_1 = 1 - p_1$ and $p_2 = 1 - p_2$, leading to $p_1 = p_2 = 1/2$.
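As a numerical companion to the proof, the following Python sketch evaluates the Sorgenfrei and Jaccard coefficients at the per-sample expected counts implied by (7) and (8), and searches a grid of $p_1, p_2$ values. Evaluating a coefficient at expected counts (a ratio-of-expectations approximation) is our simplification rather than the paper's proof technique, and the helper names are hypothetical. Under this approximation, the Sorgenfrei value peaks at 1/16 when $p_1 = p_2 = 1/2$, and the Jaccard value at that point is 1/7, the two MMVs used in Example 2.

```python
def counts(xa1, xa2):
    """Return (a, b, c, d) for two equal-length binary sequences."""
    a = sum(1 for u, v in zip(xa1, xa2) if u and v)
    b = sum(1 for u, v in zip(xa1, xa2) if u and not v)
    c = sum(1 for u, v in zip(xa1, xa2) if v and not u)
    return a, b, c, len(xa1) - a - b - c


def sorgenfrei(a, b, c):
    return a * a / ((a + b) * (a + c))


def jaccard(a, b, c):
    return a / (a + b + c)


def limiting_value(coeff, p1, p2):
    """Coefficient at expected per-sample counts for independent IID signals, formulation (2)."""
    q1, q2 = p1 * (1 - p1), p2 * (1 - p2)  # onset probabilities, as in (7)
    a = q1 * q2                            # matched-onset probability, as in (8)
    return coeff(a, q1 - a, q2 - a)        # b and c follow from a + b = q1, a + c = q2


grid = [i / 100 for i in range(1, 100)]
best = max((limiting_value(sorgenfrei, p1, p2), p1, p2) for p1 in grid for p2 in grid)

if __name__ == "__main__":
    print("largest Sorgenfrei value and its (p1, p2):", best)
    print("Jaccard at p1 = p2 = 0.5:", limiting_value(jaccard, 0.5, 0.5))
```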


1) Example 2: The MMVs of the Sorgenfrei and Jaccard coefficients are numerically validated here. The process signals $x_1(t)$ and $x_2(t)$ and their alarm trippoint values are the same as those in Example 1. We vary the data length $N$ of the collected samples, and for each data length, 1000 Monte Carlo simulations are performed. The sample means of the Sorgenfrei and Jaccard coefficients are depicted as circles and stars in Fig. 2, together with one-standard-deviation confidence intervals around the sample means. As the data length increases, the two similarity coefficients approach their respective MMVs, 1/16 and 1/7.

Fig. 2. Sample means of the Sorgenfrei/Jaccard coefficients (circle/star) with one-standard-deviation confidence intervals (‘+’/‘x’).

After selecting the Sorgenfrei coefficient as the similarity measure, we propose a hypothesis test based on Proposition 1 to tell whether two alarm signals are correlated. Katz et al. [11] showed that if $X$ and $Y$ are independent binomial random variables, namely, $X \sim B(n_1, \pi_1)$ and $Y \sim B(n_2, \pi_2)$, then $\ln\left(\frac{X/n_1}{Y/n_2}\right)$ is approximately normally distributed with mean $\ln(\pi_1/\pi_2)$ and variance $\frac{1-\pi_1}{n_1\pi_1} + \frac{1-\pi_2}{n_2\pi_2}$. If $x_1(t)$ and $x_2(t)$ are mutually independent and each of them is IID, then $a$, $a+b$, and $a+c$ are mutually independent binomial random variables, as given in (7) and (8). By generalizing the result in [11], it can be shown that the random variable $\ln S = 2\ln a - \ln(a+b) - \ln(a+c)$ is approximately normally distributed with mean $\ln\left(\theta_1\theta_2\right)$ and variance

$$\sigma^2_{\ln S} = \frac{4(1-\theta_1\theta_2)}{N\theta_1\theta_2} + \frac{1-\theta_1}{N\theta_1} + \frac{1-\theta_2}{N\theta_2},$$

where $\theta_1 = p_1(1-p_1)$ and $\theta_2 = p_2(1-p_2)$. Replacing these probabilities by their estimates $\hat{a}/N$, $(\hat{a}+\hat{b})/N$, and $(\hat{a}+\hat{c})/N$, where $\hat{a}$, $\hat{b}$, and $\hat{c}$ are the realizations of $a$, $b$, and $c$, the $100(1-\alpha)\%$ confidence interval of $S$ is given by

$$\left[S e^{-z_{1-\alpha/2}\hat{\sigma}_{\ln S}},\ S e^{z_{1-\alpha/2}\hat{\sigma}_{\ln S}}\right], \quad \hat{\sigma}^2_{\ln S} = 4\left(\frac{1}{\hat{a}} - \frac{1}{N}\right) + \left(\frac{1}{\hat{a}+\hat{b}} - \frac{1}{N}\right) + \left(\frac{1}{\hat{a}+\hat{c}} - \frac{1}{N}\right), \quad (10)$$

where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of a standard Gaussian distribution. A hypothesis test can then be formulated to verify whether $x_{a,1}(t)$ and $x_{a,2}(t)$ are correlated: if the inequality

$$S e^{-z_{1-\alpha/2}\hat{\sigma}_{\ln S}} > \frac{1}{16} \quad (11)$$

holds, i.e., the lower bound of the confidence interval in (10) exceeds the MMV 1/16, then it is claimed that $x_{a,1}(t)$ and $x_{a,2}(t)$ are correlated.

2) Example 3: Two simulation experiments are performed. In the first experiment, the process signals $x_1(t)$ and $x_2(t)$ and their alarm trippoint values are the same as those in Example 1, and $N$ samples of them are collected. After the alarm signals $x_{a,1}(t)$ and $x_{a,2}(t)$ are generated with the second formulation in (2), the Sorgenfrei coefficient and its 95% confidence interval in (10) are calculated. The inequality (11) does not hold, so $x_{a,1}(t)$ and $x_{a,2}(t)$ are concluded to be uncorrelated. In the second experiment, $x_1(t)$ is the same as in Example 1, while $x_2(t)$ is generated from $x_1(t)$ and an IID Gaussian noise $e(t)$ independent of $x_1(t)$. The other settings are the same as in the first experiment. The Sorgenfrei coefficient and its 95% confidence interval are calculated; the inequality (11) holds, so it is concluded that $x_{a,1}(t)$ and $x_{a,2}(t)$ are correlated.

III. DETECTION OF CORRELATED ALARMS

This section discusses the correlation delay and proposes a novel method to detect whether two alarm signals are statistically correlated.

A. Correlation Delay

Because alarm signals may be correlated in a dynamic manner, the Sorgenfrei coefficient is generalized as

$$S = \max_{k} S(k), \quad (12)$$

where $S(k)$, referred to as the Sorgenfrei sequence, takes the value of the Sorgenfrei coefficient in (3) between $x_{a,1}(t)$ and $x_{a,2}^{(k)}(t)$. Here, $x_{a,2}^{(k)}(t)$ is obtained by shifting $x_{a,2}(t)$ forward by $k$ samples if $k > 0$ and backward by $|k|$ samples if $k < 0$. For finite data lengths, a zero-padding strategy is used so that $x_{a,1}(t)$ and $x_{a,2}^{(k)}(t)$ have the same number of samples. The delay achieving $S$ in (12) is called the correlation delay, denoted as $\hat{k}$, i.e.,

$$\hat{k} = \arg\max_{k} S(k). \quad (13)$$

Proposition 1 says that if $S$ is greater than the MMV 1/16, then $x_{a,1}(t)$ and $x_{a,2}(t)$ must be correlated; in practice, this comparison is replaced by the hypothesis test in (11). However, $S$ being smaller than 1/16 does not necessarily mean that $x_{a,1}(t)$ and $x_{a,2}(t)$ are uncorrelated. In other words, $S$ may be less than 1/16 for two correlated alarm signals. This can be due to strong noise effects; in addition, Proposition 1 is based on the assumption that the two process signals are IID, which may not hold in practice.
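The Sorgenfrei sequence, the correlation delay in (13), and a lower-bound test in the form of (11) can be sketched in Python as follows. The sketch rests on assumptions that should be checked against the paper: the interval in (10) is taken to be a log-normal (Katz-type) interval with variance estimate $4(1/a - 1/N) + (1/(a+b) - 1/N) + (1/(a+c) - 1/N)$, and "shifting forward by $k$" is taken to mean comparing $x_{a,1}(t)$ with $x_{a,2}(t+k)$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

MMV_SORGENFREI = 1 / 16  # Proposition 1


def shifted(xa, k):
    """Zero-padded shift: element t of the result is xa[t + k], or 0 outside the record."""
    n = len(xa)
    return [xa[t + k] if 0 <= t + k < n else 0 for t in range(n)]


def sorgenfrei_counts(xa1, xa2):
    """Return (S, a, a + b, a + c) for two equal-length binary sequences."""
    a = sum(u & v for u, v in zip(xa1, xa2))
    n1, n2 = sum(xa1), sum(xa2)
    s = a * a / (n1 * n2) if n1 and n2 else 0.0
    return s, a, n1, n2


def correlation_delay(xa1, xa2, max_lag):
    """Maximize S(k) over k in [-max_lag, max_lag]; ties go to the smallest |k|."""
    best_k, best_s = 0, -1.0
    for k in sorted(range(-max_lag, max_lag + 1), key=abs):
        s = sorgenfrei_counts(xa1, shifted(xa2, k))[0]
        if s > best_s:
            best_k, best_s = k, s
    return best_k, best_s


def is_correlated(a, n1, n2, n, alpha=0.05):
    """Assumed form of test (11): lower end of the log-normal interval exceeds 1/16."""
    if a == 0:
        return False
    s = a * a / (n1 * n2)
    var_log = 4 * (1 / a - 1 / n) + (1 / n1 - 1 / n) + (1 / n2 - 1 / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return s * math.exp(-z * math.sqrt(var_log)) > MMV_SORGENFREI
```

For example, if the ‘1’s of one sequence sit exactly two samples after those of the other, `correlation_delay` returns a delay of 2 with $S = 1$. Because (11) is applied to the maximum over all candidate lags, a wide lag range makes the test somewhat more permissive than a single-lag test, which is one more reason to look at the distribution of $\hat{k}$.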
In this case, we need to examine the distribution of the correlation delay $\hat{k}$. If the distribution is concentrated in a small interval, then $x_1(t)$ and $x_2(t)$ are believed to be correlated; otherwise, they are uncorrelated. The necessity of examining the distribution of $\hat{k}$ is illustrated in the following numerical examples.

1) Example 4a: Let the process signal $x_1(t)$ be generated range by range: the samples of $x_1(t)$ in the $j$th range follow one distribution if $s(j) = 0$ and another if $s(j) = 1$, where $s(j)$ is a Bernoulli random variable indicating whether these samples of $x_1(t)$ are in the normal or abnormal condition, with given probabilities $P(s(j)=0)$ and $P(s(j)=1)$. Another process signal $x_2(t)$, independent of $x_1(t)$, has the same distribution as $x_1(t)$. The trippoint values for $x_1(t)$ and $x_2(t)$ are the same. One hundred Monte Carlo simulations are performed. In each simulation, the total number of realizations of $s(j)$ is 100, so that the data length of the collected samples of $x_1(t)$ and $x_2(t)$ is 1000 (i.e., each range contains ten samples). Fig. 3 presents the collected samples of $x_1(t)$ and $x_2(t)$ in one simulation. The second formulation in (2) is used to generate the alarm signals $x_{a,1}(t)$ and $x_{a,2}(t)$. The Sorgenfrei coefficients calculated in the 100 simulations are presented in Fig. 4(a), together with the confidence intervals in (10). The histogram of $\hat{k}$ is presented in Fig. 4(b). Clearly, the inequality (11) does not hold in most cases, and the distribution of $\hat{k}$ is not concentrated in a small interval.

Fig. 3. Process signals in Example 4a: (a) $x_1(t)$ and (b) $x_2(t)$.

Fig. 4. Results in Example 4a: (a) $S$ (solid) with its confidence interval (plus) in (10) and the critical value 1/16 (dash) and (b) the histogram of $\hat{k}$.

Fig. 5. Process signals in Examples 4b and 4c: (a) $x_1(t)$, (b) $x_2(t)$, and (c) the filtered process signal.

2) Example 4b: The process signal $x_1(t)$ is the same as that in Example 4a. Another process signal $x_2(t)$ is generated from $x_1(t)$ and an IID Gaussian noise $e(t)$ independent of $x_1(t)$. The other settings are the same as in Example 4a. Fig. 5(a) and (b) present the collected samples of $x_1(t)$ and $x_2(t)$, respectively, in one simulation. The Sorgenfrei coefficients calculated in the 100 simulations are presented in

Fig. 6(a). The Sorgenfrei coefficients in Fig. 6(a) are similar to those in Fig. 4(a), but the actual relation between $x_1(t)$ and $x_2(t)$ is completely different. The difference is effectively revealed by the distribution of $\hat{k}$ shown in Fig. 6(b): in the 100 simulations, $\hat{k}$ is concentrated at the true time delay 0.

Remark #1: It is important to detect the correlation relationship even if the current value of $S$ is small and $x_1(t)$ and $x_2(t)$ appear to be weakly correlated. Such a small value may be caused by noise effects. In this case, the noise effects may be removed by filtering the process signals in order to detect the presence of correlation, as illustrated in the next example.

3) Example 4c: The process signals $x_1(t)$ and $x_2(t)$, as well as the simulation settings, are the same as those in Example 4b. A Kalman filter that tracks mean variations in the process signal can be exploited to remove the noise effects (see, e.g., [1, Sec. III]). Fig. 5 presents the collected samples of $x_1(t)$, $x_2(t)$, and the filtered signal in one simulation. The Sorgenfrei coefficients obtained with the filtered signal in Fig. 7(a) are much larger than their counterparts for $x_1(t)$ and $x_2(t)$ in Fig. 6(a), and the histogram of $\hat{k}$ in Fig. 7(b) is concentrated in a small interval. Note that the Kalman filter introduces some extra delay between the filtered signal and the other process signal.

Fig. 6. Results in Example 4b: (a) $S$ (solid) with its confidence interval (plus) in (10) and the critical value 1/16 (dash) and (b) the histogram of $\hat{k}$.

Fig. 7. Results in Example 4c: (a) $S$ (solid) with its confidence interval (plus) in (10) and the critical value 1/16 (dash) and (b) the histogram of $\hat{k}$.

To perform a statistical test of whether the distribution of $\hat{k}$ is concentrated in a small interval, we first find the distribution of $\hat{k}$ for independent process signals.

Proposition 2: If two process signals $x_1(t)$ and $x_2(t)$ are mutually independent, and each of them is IID, then $\hat{k}$ in (13) is uniformly distributed.

Proof of Proposition 2: Consider $N$ collected samples of the alarm signals $x_{a,1}(t)$ and $x_{a,2}(t)$, and let $\mathcal{K}$ denote the range of $k$ over which $S(k)$ can take nonzero values. Define $p_1$ and $p_2$ as in (4) and (5), respectively. For $|k|$ much smaller than $N$, the mean value of $S(k)$ defined in (12) is obtained analogously to (9) as

$$E[S(k)] \approx p_1(1-p_1)\,p_2(1-p_2), \quad k \in \mathcal{K}.$$

The approximate equality holds for all values of $k$. Thus, each value of $k$ has the same probability of being selected as $\hat{k}$, i.e., of being associated with the maximum value of $S(k)$.

Next, we introduce the coefficient of variation (CV), defined as $\mathrm{CV}(X) = \sigma_X/\mu_X$, where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of a random variable $X$, respectively. The CV measures the dispersion of a probability distribution, or more specifically, the variability relative to the mean. To remove the effect of different mean values of $\hat{k}$, we shift the mean of $\hat{k}$ to a common value $c$, i.e.,

$$\tilde{k}_i = \hat{k}_i - \bar{k} + c, \quad i = 1, \ldots, m, \quad (14)$$

where $\hat{k}_1, \ldots, \hat{k}_m$ are the realizations of $\hat{k}$ and $\bar{k}$ is their sample mean. A point estimate of $\mathrm{CV}(X)$ is $\widehat{\mathrm{CV}} = \hat{\sigma}_X/\hat{\mu}_X$, where $\hat{\mu}_X$ and $\hat{\sigma}_X$ are the sample mean and standard deviation of $X$, respectively.

To take the uncertainty of $\widehat{\mathrm{CV}}$ into account, an interval estimate of CV is preferred. Gulhar et al. [9] compared fifteen confidence intervals for CV via simulation studies for various population distributions and sample sizes, and recommended their proposed one. That is, the $100(1-\alpha)\%$ confidence interval for CV is

$$\left[\frac{\sqrt{m-1}\,\widehat{\mathrm{CV}}}{\sqrt{\chi^2_{1-\alpha/2,\,m-1}}},\ \frac{\sqrt{m-1}\,\widehat{\mathrm{CV}}}{\sqrt{\chi^2_{\alpha/2,\,m-1}}}\right], \quad (15)$$

where $\alpha$ is a small positive real number, e.g., $\alpha = 0.05$, and $\chi^2_{\beta,\,m-1}$ is the $100\beta$th percentile of a chi-square distribution with $m-1$ degrees of freedom. Applying (15) to the samples $\tilde{k}_1, \ldots, \tilde{k}_m$ gives the $100(1-\alpha)\%$ confidence interval of $\mathrm{CV}(\tilde{k})$,

$$\left[\mathrm{CV}_{\mathrm{low}}(\tilde{k}),\ \mathrm{CV}_{\mathrm{up}}(\tilde{k})\right]. \quad (16)$$

Owing to (14), Proposition 2 says that $\tilde{k}$ is a uniform random variable over a known range; thus, the theoretical value $\mathrm{CV}_0$ of $\mathrm{CV}(\tilde{k})$ is known if the process signals $x_1(t)$ and $x_2(t)$ are independent. Based on the confidence interval in (16), a hypothesis test is formulated: if the inequality

$$\mathrm{CV}_{\mathrm{up}}(\tilde{k}) < \mathrm{CV}_0 \quad (17)$$

holds, then it is claimed that $\hat{k}$ is not uniformly distributed, so $x_1(t)$ and $x_2(t)$ are correlated.

Fig. 8. Calculated Sorgenfrei coefficients for different values of $n$ under three values of $\rho$, shown in panels (a), (b), and (c).

B. Numbers of Alarms and Correlation Delays

To obtain a reliable estimate of the similarity coefficient $S$, the numbers of alarms in $x_{a,1}(t)$ and $x_{a,2}(t)$ cannot be too small; otherwise, alarms tripped by noise could easily degrade the reliability of the estimate. A natural question is: how many ‘1’s in $x_{a,1}(t)$ and $x_{a,2}(t)$ are required? We propose a rule of thumb as the answer: the numbers of ‘1’s in the alarm signals $x_{a,1}(t)$ and $x_{a,2}(t)$, generated via (2), should be at least 30. This rule of thumb is obtained from the following numerical study. Suppose that $x_{a,1}(t)$ and $x_{a,2}(t)$ are binary alarm signals, and let the number of ‘1’s in $x_{a,1}(t)$ be a positive integer $n$. The position of each ‘1’ in $x_{a,1}(t)$ is uniformly distributed and is denoted as $t_i$, $i = 1, \ldots, n$. Then, the corresponding ‘1’ in $x_{a,2}(t)$ appears either at $t_i$ with probability $\rho$ or at another position with probability $1-\rho$, for a real number $\rho \in [0, 1]$. Since about $\rho n$ of the $n$ ‘1’s match, the mean value of the Sorgenfrei coefficient is approximately $\rho^2$.

The calculated Sorgenfrei coefficient is expected to be quite far from this true value for small values of $n$, owing to the effects of noise, and to approach the true value as $n$ increases. This expectation is numerically validated in Fig. 8, where 100 Monte Carlo simulations are performed for each value of $n$ with a fixed data length $N$. Based on Fig. 8, we choose $n = 30$ as a threshold of trust; that is, the estimated similarity coefficient is expected to be reliable when there are no fewer than 30 ‘1’s in $x_{a,1}(t)$ and $x_{a,2}(t)$.
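The numerical study behind the 30-alarm rule of thumb can be approximated with the Python sketch below. It assumes that a ‘1’ not placed at the same position in $x_{a,2}$ goes to a uniformly random position, and it uses an arbitrary data length; these modeling details, the function name, and the default values are our assumptions rather than the paper's settings. Under them, the mean coefficient stays near $\rho^2$ while its spread shrinks as $n$ grows.

```python
import random
import statistics


def sorgenfrei_spread(n, rho, length=5000, runs=100, seed=1):
    """Mean and standard deviation of the Sorgenfrei coefficient over Monte Carlo runs.

    x_a1 has n '1's at distinct, uniformly random positions; each '1' is copied to the
    same position in x_a2 with probability rho, otherwise to a uniformly random position.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(runs):
        xa1 = [0] * length
        xa2 = [0] * length
        for t in rng.sample(range(length), n):
            xa1[t] = 1
            xa2[t if rng.random() < rho else rng.randrange(length)] = 1
        a = sum(u & v for u, v in zip(xa1, xa2))
        values.append(a * a / (sum(xa1) * sum(xa2)))
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    for n in (5, 10, 30, 100):
        mean, sd = sorgenfrei_spread(n, rho=0.8)
        print(f"n = {n:3d}: mean S = {mean:.3f}, std = {sd:.3f}")
```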

Fig. 9. The curve of

1021

.

There is also a requirement on the sample size of the correlation delay samples . The requirement on can be obtained based on a numerical investigation on the confidence interval in (15). Fig. 9 presents the curve of that is defined as

Fig. 9 shows that this quantity is smaller than 1 when the number of correlation delay samples is larger than 10. Thus, at least ten correlation delay samples are recommended to yield an accurate estimate of the distribution of the correlation delay.

TABLE IV SAMPLE MEANS AND STANDARD DEVIATIONS OF THE SORGENFREI AND JACCARD COEFFICIENTS IN EXAMPLE 5

C. Detection Method for Correlated Alarms

This subsection proposes a novel detection method for correlated alarms, based on the Sorgenfrei sequence, the Sorgenfrei coefficient, and the distribution of the correlation delay. The proposed method consists of the following steps:

1) Generate the two alarm signals from the two process signals with their associated alarm trippoint values, as described in the second formulation (2).
2) If the numbers of '1's in both alarm signals are at least 30, proceed to Step 3; otherwise, wait for more alarms to occur.
3) Calculate the Sorgenfrei sequence of the two alarm signals over a range of time lags bounded by a user-selected positive integer, and take its largest value as the Sorgenfrei coefficient. If the inequality (11) holds, conclude that the two alarm signals are correlated; otherwise, proceed to Step 4.
4) Separate the collected data of the two alarm signals into at least 10 data segments, such that in each segment the smaller of the numbers of '1's in the two alarm signals equals 30. Obtain the correlation delay in (13) for each data segment.
5) If the inequality (17) holds, conclude that the distribution of the correlation delay is concentrated in a small interval and that the two alarm signals are correlated; otherwise, conclude that they are not correlated.

Once the above steps have been implemented, we can conclude whether two alarm signals are statistically correlated. If there are multiple alarm signals, the proposed method is applied to each pair of them to detect the correlated alarms.

Remark #2: Section III-B recommends at least 30 alarms to obtain a reliable estimate of the Sorgenfrei coefficient. If the inequality (11) does not hold, then the distribution of the correlation delay has to be calculated, and Section III-B recommends at least ten correlation delay samples to estimate it reliably. Thus, estimating the distribution of the correlation delay may require a large amount of data, depending on how frequently alarms occur.
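Steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration under assumptions not fixed by the text above: the Sorgenfrei sequence is taken as the coefficient between x1[n] and x2[n + m] over the overlapping samples for lags m from -M to M; the correlation delay of a segment is taken as the lag that attains the Sorgenfrei coefficient; and segments are cut greedily so that the smaller of the two counts of '1's reaches 30. The inequalities (11) and (17) are not implemented because their thresholds are not restated here. All function names are hypothetical.

```python
import numpy as np


def sorgenfrei(x1, x2):
    """Sorgenfrei coefficient a^2 / ((a + b)(a + c)) of two binary sequences."""
    x1 = np.asarray(x1, dtype=bool)
    x2 = np.asarray(x2, dtype=bool)
    a = np.count_nonzero(x1 & x2)
    n1, n2 = np.count_nonzero(x1), np.count_nonzero(x2)
    return float(a * a) / float(n1 * n2) if n1 and n2 else 0.0


def sorgenfrei_sequence(x1, x2, max_lag):
    """Sorgenfrei coefficient between x1[n] and x2[n + m] for m = -max_lag..max_lag,
    computed on the overlapping samples only (requires max_lag < len(x1))."""
    x1 = np.asarray(x1, dtype=bool)
    x2 = np.asarray(x2, dtype=bool)
    n = len(x1)
    lags = np.arange(-max_lag, max_lag + 1)
    values = np.empty(len(lags))
    for i, m in enumerate(lags):
        if m >= 0:  # pair x1[k] with x2[k + m]
            values[i] = sorgenfrei(x1[: n - m], x2[m:])
        else:       # pair x1[k - m] with x2[k]
            values[i] = sorgenfrei(x1[-m:], x2[: n + m])
    return lags, values


def correlation_delay(x1, x2, max_lag):
    """Return (Sorgenfrei coefficient, lag attaining it) for one data segment."""
    lags, values = sorgenfrei_sequence(x1, x2, max_lag)
    best = int(np.argmax(values))
    return float(values[best]), int(lags[best])


def split_segments(x1, x2, min_alarms=30):
    """Greedy split: close a segment as soon as the smaller of the two
    counts of '1's reaches min_alarms; an incomplete tail is discarded."""
    x1 = np.asarray(x1, dtype=bool)
    x2 = np.asarray(x2, dtype=bool)
    segments, start, c1, c2 = [], 0, 0, 0
    for n in range(len(x1)):
        c1 += int(x1[n])
        c2 += int(x2[n])
        if min(c1, c2) >= min_alarms:
            segments.append((start, n + 1))
            start, c1, c2 = n + 1, 0, 0
    return segments


def delays_per_segment(x1, x2, max_lag, min_alarms=30):
    """Correlation delay of every segment produced by split_segments."""
    x1 = np.asarray(x1, dtype=bool)
    x2 = np.asarray(x2, dtype=bool)
    return [correlation_delay(x1[s:e], x2[s:e], max_lag)[1]
            for s, e in split_segments(x1, x2, min_alarms)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.random(20_000) < 0.02   # hypothetical alarm signal
    x2 = np.roll(x1, 2)              # the same alarms, delayed by 2 samples
    print(delays_per_segment(x1, x2, max_lag=5))
```

A histogram of the list returned by `delays_per_segment` gives the kind of delay distribution examined in Step 5; the greedy split is only one way to satisfy the per-segment count requirement.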

D. Comparison With Existing Methods

This subsection compares the proposed method with three other methods, proposed by Kondaveeti et al. [12], Noda et al. [14], and Yang et al. [19], respectively. The proposed method has three major advantages: 1) the Sorgenfrei coefficient is more robust to noise than the Jaccard coefficient used in [12]; 2) the methods in [14] and [19] are based on the first formulation (1) of alarm signals, which may overestimate the similarity coefficients and lead to incorrect conclusions; and 3) the distribution of the correlation delay, which this paper finds indispensable for detecting correlated alarms, has not been exploited in [12], [14], or [19].

First, both the proposed method and the alarm similarity color map in [12] are based on the second formulation (2) of alarm signals. However, the Sorgenfrei coefficient has a smaller MMV than the Jaccard coefficient (Proposition 1), so it is more robust to noise, as illustrated in the next example.

1) Example 5: The first process signal is the same as that in Example 1, except that the distribution of one of its variables is changed. The second process signal is constructed from the first one and an IID Gaussian noise that is independent of it. The other settings, including the alarm trippoint values, are the same as those in Example 1. For three values of the noise level, the sample means and standard deviations of the Sorgenfrei and Jaccard coefficients in 1000 Monte Carlo simulations are listed in Table IV. When the noise level is small, both coefficients correctly indicate that the two alarm signals are correlated. However, at a larger noise level, the Jaccard coefficient is close to its MMV of 1/7, which may lead to the conclusion that the two alarm signals are uncorrelated; by contrast, the Sorgenfrei coefficient remains far from its MMV of 1/16, showing more robustness to noise.

Second, the methods proposed by Noda et al. [14] and Yang et al. [19] are based on the first formulation (1) of alarm signals. As illustrated by the theoretical analysis and Example 1 in Section II-B, the first formulation may overestimate the similarity coefficients and yield an incorrect conclusion. If the second formulation (2) is used instead, the method in [19] may give results similar to those of the proposed method; however, many issues remain unsolved, such as establishing the MMV of the correlation coefficient between pseudo-continuous time series and the associated hypothesis test, as counterparts of the MMV of the Sorgenfrei coefficient in Proposition 1 and the hypothesis test in (17). These issues are beyond the scope of this paper. As for the method in [14], even if the second formulation (2) is used, its similarity measure is less effective than the Sorgenfrei coefficient in (12) in detecting correlated alarms. In the notation of this paper, [14] uses a cross-correlation function between two events (alarm signals) and takes its maximum value over the time lags.

Thus, this maximum value is the number of '1's that appear simultaneously in both alarm signals. Assuming that the two events (alarm signals) are independent, this number approximately follows a Poisson distribution, on which the similarity measure in (18) is based.

Clearly, this count is the same as the numerator of the Sorgenfrei coefficient in (12). However, the measure in (18) is the cumulative probability of a Poisson random variable evaluated at this count, which is very different from the Sorgenfrei coefficient in (12). The next numerical example shows that the measure in (18) may take excessively large values for two uncorrelated alarm signals.

2) Example 6: The configuration is the same as that in Example 2, with a fixed data length, and the similarity measure in (18) is calculated with fixed parameter values. The sample means and standard deviations of the Sorgenfrei coefficient, the Jaccard coefficient, and the measure in (18) are obtained from 1000 Monte Carlo simulations. Since the two alarm signals are actually uncorrelated, the Sorgenfrei and Jaccard coefficients are close to their MMVs of 1/16 and 1/7, respectively. Compared with these two coefficients, the similarity measure in (18) has a much larger sample mean and standard deviation, which is clearly undesirable, as explained in Section II-C and Example 5.
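The qualitative contrast in Example 6 can be explored with the Python sketch below. It is not the measure (18) itself: as a hypothetical stand-in, it evaluates the Poisson cumulative distribution function at the number of simultaneous '1's (at zero lag only), with the Poisson parameter assumed to be n1·n2/N, the expected number of coincidences for two independent alarm signals with n1 and n2 '1's out of N samples. The Bernoulli signal model, the alarm probability 0.02, and all function names are illustrative assumptions.

```python
import math

import numpy as np


def sorgenfrei(x1, x2):
    """Sorgenfrei coefficient a^2 / ((a + b)(a + c))."""
    x1, x2 = np.asarray(x1, dtype=bool), np.asarray(x2, dtype=bool)
    a = np.count_nonzero(x1 & x2)
    n1, n2 = np.count_nonzero(x1), np.count_nonzero(x2)
    return float(a * a) / float(n1 * n2) if n1 and n2 else 0.0


def jaccard(x1, x2):
    """Jaccard coefficient a / (a + b + c)."""
    x1, x2 = np.asarray(x1, dtype=bool), np.asarray(x2, dtype=bool)
    a = np.count_nonzero(x1 & x2)
    union = np.count_nonzero(x1 | x2)   # a + b + c
    return a / union if union else 0.0


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def poisson_measure(x1, x2):
    """Hypothetical stand-in: Poisson CDF at the coincidence count a,
    with lam = n1 * n2 / N (expected count for independent signals)."""
    x1, x2 = np.asarray(x1, dtype=bool), np.asarray(x2, dtype=bool)
    a = int(np.count_nonzero(x1 & x2))
    lam = np.count_nonzero(x1) * np.count_nonzero(x2) / len(x1)
    return poisson_cdf(a, lam)


def compare_uncorrelated(length=10_000, p=0.02, runs=200, seed=0):
    """Mean and standard deviation of each score for independent alarm signals."""
    rng = np.random.default_rng(seed)
    scores = {"sorgenfrei": [], "jaccard": [], "poisson": []}
    for _ in range(runs):
        x1 = rng.random(length) < p   # two independent alarm signals
        x2 = rng.random(length) < p
        scores["sorgenfrei"].append(sorgenfrei(x1, x2))
        scores["jaccard"].append(jaccard(x1, x2))
        scores["poisson"].append(poisson_measure(x1, x2))
    return {k: (float(np.mean(v)), float(np.std(v))) for k, v in scores.items()}


if __name__ == "__main__":
    for name, (mean, std) in compare_uncorrelated().items():
        print(f"{name:10s} mean={mean:.4f} std={std:.4f}")
```

In this setup the Sorgenfrei and Jaccard coefficients stay near zero for independent signals, while the CDF-based score averages roughly one half with a wide spread, because it measures how typical the coincidence count is rather than how large it is. The absolute numbers depend on the assumed model and are not those of Example 6.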


Fig. 10. Process signals in Example 7, panels (a) and (b).

Fig. 11. Process signals in the first data segment in Example 7, panels (a) and (b), each shown as a solid line together with a dashed line.

IV. INDUSTRIAL EXAMPLE

This section provides an industrial example to illustrate the effectiveness of the proposed detection method.

1) Example 7: This example uses real-time measurements of two process variables from a large-scale petrochemical plant affiliated with Sinopec Yangzi Petrochemical Company, Jiangsu Province, China. The data are collected with a sampling period of 1 min and cover an operating period of about 94 days. All the collected data are presented in Fig. 10. Each process signal has its own alarm trippoint value. These are low-alarm trippoints rather than the high-alarm trippoints in (2); that is, each alarm signal equals 1 when its process signal falls below its trippoint, and 0 otherwise (19).
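Step 1 with the low-alarm trippoints of (19), together with the high-alarm case of (2) and the count check of Step 2, can be sketched in Python as follows. Whether a sample exactly at the trippoint raises an alarm is an assumption here (the comparisons are inclusive), and the function names and readings are hypothetical.

```python
import numpy as np


def high_alarm(x, trippoint):
    """Second formulation (2): '1' at every sample where x is at or above
    the high-alarm trippoint (inclusive comparison assumed)."""
    return (np.asarray(x, dtype=float) >= trippoint).astype(np.uint8)


def low_alarm(x, trippoint):
    """Low-alarm counterpart as in (19): '1' at every sample where x is at
    or below the low-alarm trippoint (inclusive comparison assumed)."""
    return (np.asarray(x, dtype=float) <= trippoint).astype(np.uint8)


def enough_alarms(xa1, xa2, threshold=30):
    """Step 2 check: both alarm signals contain at least `threshold` '1's."""
    return int(np.sum(xa1)) >= threshold and int(np.sum(xa2)) >= threshold


if __name__ == "__main__":
    pressure = np.array([5.2, 4.9, 4.1, 3.8, 4.6])   # hypothetical readings
    print(low_alarm(pressure, 4.5))                   # [0 0 1 1 0]
```

Because every sample beyond the trippoint is marked, a single long alarm episode contributes several '1's, which is what distinguishes the second formulation from the first.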

The proposed method in Section III-C is implemented as follows. Step 1 generates the two alarm signals via (19). In Step 2, the first data segment is selected so that the smaller of the numbers of alarms in the two alarm signals equals 30; Fig. 11 presents this first data segment. Step 3 calculates the Sorgenfrei sequence for this segment and obtains the Sorgenfrei coefficient with its confidence interval, for which the inequality (11) does not hold; hence, the distribution of the correlation delay has to be examined. In Steps 4 and 5, 11 data segments are obtained from the collected data samples, and in each segment the smaller of the numbers of alarms in the two alarm signals equals 30. The calculated Sorgenfrei coefficients for the 11 data segments are presented in Fig. 12(a), together with their confidence intervals in (10). The histogram of the correlation delay is presented in Fig. 12(b), and the corresponding confidence interval is computed from (16). The inequality (17) holds, so the distribution of the correlation delay is concentrated in a small interval, and it is concluded that the two alarm signals are correlated.

Fig. 12. Results in Example 7: (a) Sorgenfrei coefficients (plus) and their confidence intervals in (10) (solid); (b) the histogram of the correlation delay.

By looking at the process and alarm signals carefully, we realize that the low values of the Sorgenfrei coefficients in Fig. 12(a) are partly due to small position mismatches between the '1's in the two alarm signals. Thus, a time window is introduced to revise the computation in (3): if a '1' in one alarm signal and a '1' in the other occur within the time window of each other, they are counted as an alarm appearing simultaneously in the two alarm signals. Fig. 13 presents the calculated Sorgenfrei coefficients for the 11 data segments with this time window, together with the histogram of the correlation delay. The Sorgenfrei coefficients in Fig. 13(a) are much larger than their counterparts in Fig. 12(a). The distribution of the correlation delay is again concentrated in a small interval, as confirmed by its confidence interval in (16). The averages of the correlation delay in Figs. 12(b) and 13(b) are 1.1818 and 2.7273, respectively. Hence, it is concluded that the two alarm signals are correlated, with the occurrence of one lagging behind that of the other by 1 or 2 samples. This conclusion is consistent with the available process knowledge: one process signal is the measurement of a fuel gas pressure in a pipe connected to a fuel-gas pretreatment drum, while the other is the measurement of the fuel gas temperature inside the drum. Thus, the two alarm signals are expected to be correlated. The conclusion about the lag is also consistent with the variations of the two process signals, as shown in the magnified time plots in Fig. 14.

Fig. 13. Results in Example 7 with the time window: (a) Sorgenfrei coefficients (plus) and their confidence intervals in (10) (solid); (b) the histogram of the correlation delay.

Fig. 14. Enlarged parts of the process signals in Example 7, panels (a) and (b), each shown as a solid line together with a dashed line.

V. CONCLUSION

This paper studied the problem of detecting correlated alarm signals via statistical analysis. The second formulation of alarm signals in (2) was found to be more suitable for the detection of correlated alarms. Based on the properties desired for alarm signals, the Sorgenfrei coefficient in (3) was chosen to measure the similarity between two alarm signals. The distribution of the correlation delay defined in (13) was shown to be indispensable and effective in detecting correlated alarms. Based on the Sorgenfrei coefficient and the distribution of the correlation delay, a novel detection method for correlated alarm signals was proposed. The effectiveness of the proposed method was illustrated via numerical and industrial examples.

The proposed detection method could be further improved by considering the effects of false and missed alarms on the Sorgenfrei coefficient. One feasible way is to integrate change detection methods that separate the collected process data into isolated segments under normal and abnormal conditions, and then to suppress the corresponding false and missed alarms in each segment. Another direction for future work is a systematic way to handle correlated alarms. As stated in Section I, after correlated alarms have been detected, there are several approaches to dealing with them to improve the performance of alarm systems; however, more research is required to formulate systematic approaches.

ACKNOWLEDGMENT

The authors would like to thank the Associate Editor and the anonymous reviewers for their constructive comments and helpful suggestions, and Sinopec Yangzi Petro-Chemical Company, Nanjing, China, for providing the industrial data used in this study.

REFERENCES

[1] G. Bishop and G. Welch, “An introduction to the Kalman filter,” SIGGRAPH, Course 8, 2001.
[2] M. L. Bransby and J. Jenkinson, The Management of Alarm Systems. Birmingham, U.K.: Health and Safety Executive, 1998.
[3] R. Brooks, R. Thorpe, and J. Wilson, “A new method for defining and managing process alarms and for correcting process operation when an alarm occurs,” J. Hazardous Mater., vol. 115, pp. 169–174, 2004.
[4] A. H. Cheetham and J. E. Hazel, “Binary (presence-absence) similarity coefficients,” J. Paleontology, vol. 43, no. 5, pp. 1330–1336, 1969.
[5] S. S. Choi, S. H. Cha, and C. Tappert, “A survey of binary similarity and distance measures,” J. Systemics, Cybern. Informat., vol. 8, no. 1, pp. 43–48, 2010.
[6] F. Dahlstrand, “Consequence analysis theory for alarm analysis,” Knowledge-Based Syst., vol. 15, pp. 27–36, 2002.
[7] Engineering Equipment and Materials Users’ Association (EEMUA), Alarm Systems: A Guide to Design, Management and Procurement. London, U.K.: EEMUA Publication 191, Version 2, 2007.
[8] D. P. Faith, “Asymmetric binary similarity measures,” Oecologia, vol. 57, pp. 287–290, 1983.
[9] M. Gulhar, B. M. G. Kibria, A. N. Albatineh, and N. U. Ahmed, “A comparison of some confidence intervals for estimating the population coefficient of variation: A simulation study,” Statist. Oper. Res. Trans., vol. 36, pp. 45–68, 2012.
[10] A. Henningsen and J. P. Kemmerer, “Intelligent alarm handling in cement plants,” IEEE Ind. Appl. Mag., pp. 9–15, Sep./Oct. 1995.
[11] D. Katz, J. Baptista, S. P. Azen, and M. C. Pike, “Obtaining confidence intervals for the risk ratio in cohort studies,” Biometrics, vol. 34, pp. 469–474, 1978.
[12] S. R. Kondaveeti, I. Izadi, S. L. Shah, and T. Black, “Graphical representation of industrial alarm data,” in Proc. 11th IFAC Symp. Anal., Design and Evaluation of Human-Machine Systems, Valenciennes, France, 2010.
[13] J. Liu, K. W. Lim, W. K. Ho, K. C. Tan, R. Srinivasan, and A. Tay, “The intelligent alarm management system,” IEEE Software, pp. 66–71, 2003.
[14] M. Noda, F. Higuchi, T. Takai, and H. Nishitani, “Event correlation analysis for alarm system rationalization,” Asia-Pac. J. Chem. Eng., vol. 6, pp. 497–502, 2011.
[15] D. Rothenberg, Alarm Management for Process Control. New York: Momentum Press, 2009.
[16] R. Srinivasan, J. Liu, K. W. Lim, K. C. Tan, and W. K. Ho, “Intelligent alarm management in a petroleum refinery,” Hydrocarbon Process., vol. 83, pp. 47–53, 2004.
[17] J. Xu, J. Wang, I. Izadi, and T. Chen, “Performance assessment and design for univariate alarm systems based on FAR, MAR and AAD,” IEEE Trans. Autom. Sci. Eng., vol. 9, no. 2, pp. 296–307, Apr. 2012.
[18] F. Yang, S. L. Shah, and D. Xiao, “Correlation analysis of alarm data and alarm limit design for industrial processes,” in Proc. 2010 Amer. Control Conf., Baltimore, MD, 2010, pp. 5850–5855.
[19] F. Yang, S. L. Shah, D. Xiao, and T. Chen, “Improved correlation analysis and visualization for industrial alarm data,” in Proc. 18th IFAC World Congr., Milano, Italy, 2011, pp. 12898–12903.
[20] Z. Yang, J. Wang, and T. Chen, “On correlation analysis of bivariate alarm signals,” in Proc. 9th IEEE Int. Conf. Inform. Autom., Shenyang, China, 2012, pp. 530–535.

Zijiang Yang received the B.E. degree in automatic control from Taiyuan University of Technology, Shanxi, China, in 2005, and the M.Sc. degree in control engineering from Peking University, Beijing, China, in 2012. His current research topic is advanced alarm system management.


Jiandong Wang received the B.E. degree in automatic control from the Beijing University of Chemical Technology, Beijing, China, in 1997, and the M.Sc. and Ph.D. degrees in electrical and computer engineering from the University of Alberta, Edmonton, AB, Canada, in 2003 and 2007, respectively. He is presently a Professor with the Department of Industrial Engineering and Management, College of Engineering, Peking University, Beijing, China. From 1997 to 2001, he was a Control Engineer with the Beijing Tsinghua Energy Simulation Company, Beijing. From February 2006 to August 2006, he was a Visiting Scholar at the Department of System Design Engineering, Keio University, Japan. His research interests include system identification, alarm systems, process monitoring and management, and their applications to industrial problems. Dr. Wang is currently an Associate Editor for Systems and Control Letters.

Tongwen Chen (F’06) received the B.Eng. degree in automation and instrumentation from Tsinghua University, Beijing, China, in 1984, and the M.A.Sc. and Ph.D. degrees in electrical engineering from the University of Toronto, Toronto, ON, Canada, in 1988 and 1991, respectively. He is presently a Professor of Electrical and Computer Engineering with the University of Alberta, Edmonton, AB, Canada. His research interests include computer and network-based control systems, process safety and alarm systems, and their applications to the process and power industries. Dr. Chen is a registered Professional Engineer in Alberta, Canada. He has served as an Associate Editor for several international journals, including the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, Automatica, and Systems and Control Letters.