Information Sciences 259 (2014) 369–379
Fault detection and diagnosis of non-linear non-Gaussian dynamic processes using kernel dynamic independent component analysis

Jicong Fan, Youqing Wang*

College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
Article history: Available online 20 June 2013

Keywords: Non-linear non-Gaussian dynamic processes; Independent component analysis; Non-linear contribution plot; TE process
Abstract

This paper proposes a novel approach for fault detection of multivariate processes, referred to as kernel dynamic independent component analysis (KDICA). The main idea of KDICA is to carry out independent component analysis in the kernel space of an augmented measurement matrix so as to extract the dynamic and non-linear characteristics of a non-linear non-Gaussian dynamic process. Furthermore, as a new fault diagnosis method, a non-linear contribution plot is developed for KDICA. A comparative study on the Tennessee Eastman process illustrates the effectiveness of the proposed method, and the experimental results show that it compares favorably with existing methods.

© 2013 Elsevier Inc. All rights reserved.
1. Introduction

With the rapid development of complex industrial systems and sensor technology, data-driven fault detection and diagnosis methods have advanced vigorously in recent years. Data-driven methods avoid complicated system models and can take advantage of large numbers of measurements. Their main idea is to use data processing algorithms, such as multivariate statistical analysis [19,24,26,27], machine learning [3,23,31,34], and other methods [17,21], to extract useful information and latent knowledge. As a representative family of data-driven methods, multivariate statistical process monitoring and fault diagnosis (MSPM&FD) is becoming increasingly important in chemical and biological plants.

As a typical MSPM&FD technique, principal component analysis (PCA) [33] has been developed for decades and has produced a number of excellent achievements [16,35]. PCA is probably the most mature and best-known multivariate analysis method; it is very effective at handling high-dimensional, highly correlated, and noisy data. The basic idea of PCA is to replace a large number of interrelated variables with a smaller number of uncorrelated variables. By maximizing the variances of the principal components, PCA divides the data into two parts: a systematic part containing most of the variation in the data, and a noisy part containing the least. PCA adopts two indices, T² and SPE, to monitor processes [27]: T² is Hotelling's statistic calculated from the dominant components containing the maximal process variation, and SPE is the quadratic sum of the residuals of the PCA model.

Traditional PCA is generally applied to static, linear, and Gaussian processes. However, dynamics, non-linearity, and non-Gaussianity are widespread in practical processes, and these characteristics pose significant challenges for PCA. Many enhanced versions of PCA, e.g., combinations with other techniques, have therefore been proposed. Ku et al. [16] developed dynamic PCA (DPCA) for dynamic processes. DPCA uses an auto-regression (AR) model to augment the measurement matrix
so that it can include process dynamics. To handle non-linearity, many non-linear PCA (NLPCA) methods, such as neural-network-related PCA (NN-PCA) [4,8,15] and kernel PCA (KPCA) [1,7,10,30,36,39], have been developed. NN-PCA is time-consuming in training the network, and the number of principal components has to be determined before training; KPCA was therefore proposed to overcome these disadvantages. Cheng et al. [5] integrated kernel dynamic PCA (KDPCA) with the exponentially weighted moving average (EWMA) scheme to monitor small faults.

PCA utilizes only the mean and variance of the monitored data and ignores higher-order statistics. The mean and variance fully describe a Gaussian distribution but not non-Gaussian distributions. Hence, going beyond PCA, independent component analysis (ICA) [13,14] has been developed. Compared with PCA, ICA relies on higher-order statistics and extracts more information in order to obtain independent components. Lee et al. [19] first applied ICA to monitor non-Gaussian processes. Analogously to PCA, ICA uses two indices, I² and SPE, where I² is the quadratic sum of the dominant independent components and SPE is the quadratic sum of the residuals of the ICA model. Wang et al. [32] proposed a method to select features for ICA. For non-linear non-Gaussian process monitoring, kernel ICA (KICA) [11,18,37,38,40] has been developed; it shows superior performance compared with ICA and KPCA. Furthermore, Lee and Qin [18] proposed a modified KICA that combines a modified ICA with the kernel method. To monitor non-Gaussian dynamic processes, dynamic ICA (DICA) [20,29] has been proposed; it outperforms ICA and DPCA.

In sum, although some MSPM&FD methods deal with dynamics, non-linearity, or non-Gaussianity individually, to the authors' best knowledge no reported method can deal with these three characteristics simultaneously. Yet non-linear non-Gaussian dynamic (NLNGD) processes are widespread in practice, so dynamics, non-linearity, and non-Gaussianity have to be handled together. In this study, a new approach to fault detection and diagnosis, referred to as kernel dynamic independent component analysis (KDICA), is proposed for NLNGD processes. The main idea of KDICA is as follows. First, the measurement matrix is augmented to include the process dynamics. Second, a kernel mapping is used to obtain non-linear scores. Finally, independent components are extracted from the derived scores, yielding the final monitoring model. In this way, KDICA handles dynamics, non-linearity, and non-Gaussianity simultaneously.

A standard fault diagnosis method for PCA and ICA is the contribution plot [2,16,19], which needs no prior knowledge and is easy to apply for online fault diagnosis. Its main idea is that when a monitoring index detects a fault, the variables with the largest contributions to the index are taken to be the fault points. Because the conventional contribution plot is linear, transplanting it to kernel-based methods is difficult. In this study, a novel contribution plot is proposed for diagnosing faults with KDICA. Furthermore, KDICA is compared with DPCA, KPCA, KDPCA, DICA, and KICA on the Tennessee Eastman (TE) process [6,9,22,25]; the experimental results validate the superior monitoring capability of KDICA.

2. KDICA

2.1. Exploiting process dynamics

The original PCA and ICA assume that the process is static and hence that successive observations are independent. In dynamic processes, however, time lags and auto-correlation should not be ignored. Ku et al. [16] proposed DPCA to deal with process dynamics, and Lee et al. [20] proposed DICA to treat the dynamics of non-Gaussian processes. The main idea of these dynamic methods is to introduce an auto-regressive (AR) model to capture the dynamics. An AR model can be described by
$$x_i = f(x_{i-1}, x_{i-2}, \ldots, x_{i-l}) + e_i \qquad (1)$$
where $x_i$ is the $i$th measurement vector, $l$ is the order of the AR process, $e_i$ is the noise vector, and $f$ can be a linear or non-linear function. Here, in order to exploit the process dynamics, the first step of KDICA is to augment each observation vector $x_i = [x_{i,1}\ x_{i,2}\ \cdots\ x_{i,m_0}]$ with the previous $l$ observations:
$$x_i^l = \left[\,x_{i,1}\ x_{i-1,1}\ \cdots\ x_{i-l,1}\quad x_{i,2}\ x_{i-1,2}\ \cdots\ x_{i-l,2}\quad \cdots\quad x_{i,m_0}\ x_{i-1,m_0}\ \cdots\ x_{i-l,m_0}\,\right] \qquad (2)$$

Then the measurement matrix $X^l$, of size $(n_0 - l) \times m_0(l+1)$, is represented as
$$X^l = \begin{bmatrix} x_{l+1}^l \\ x_{l+2}^l \\ \vdots \\ x_{n_0}^l \end{bmatrix} \qquad (3)$$
where $m_0$ is the number of variables, $n_0$ is the number of observations, and $X^l$ is the augmented data matrix. $X^l$ should be normalized to zero mean and unit variance. For convenience, this paper sets $n = n_0 - l$ and $m = m_0(l+1)$. A method for determining $l$ automatically was suggested in [16], and it has also been reported that $l = 1$ or $2$ is good enough to describe the dynamic characteristics of most processes [20]. In this paper, $l$ is determined by identification based on higher-order cumulants, because higher-order cumulants can effectively suppress Gaussian noise [12].
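As a concrete illustration of this first step, the following sketch builds the augmented matrix of Eqs. (2) and (3) and normalizes it (a minimal sketch in Python/NumPy; the function name and the use of the sample standard deviation are our own choices, not taken from the paper):

```python
import numpy as np

def augment_lags(X, l):
    """Build the lag-augmented matrix X^l of Eqs. (2)-(3).

    X : (n0, m0) array whose rows are time-ordered observations.
    Returns an (n0 - l, m0*(l+1)) array whose row for time i stacks,
    for each variable g, the values x_{i,g}, x_{i-1,g}, ..., x_{i-l,g}.
    """
    n0, m0 = X.shape
    blocks = []
    for g in range(m0):
        # columns x_{i,g}, x_{i-1,g}, ..., x_{i-l,g} for i = l, ..., n0-1
        blocks.append(np.column_stack([X[l - k:n0 - k, g] for k in range(l + 1)]))
    Xl = np.hstack(blocks)
    # normalize to zero mean and unit variance before kernel whitening
    return (Xl - Xl.mean(axis=0)) / Xl.std(axis=0, ddof=1)
```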
2.2. Data whitening using KPCA

PCA, DPCA, ICA, and DICA assume that the relationships between the variables are linear, but non-linear relationships are widespread. Hence, KPCA [1,7,10,39], KDPCA [5], and KICA [11,18,37,38,40] have been developed to deal with non-linearities. However, for non-linear non-Gaussian dynamic processes, none of these methods is effective enough: KPCA and KDPCA perform badly on non-Gaussian processes because they only extract Gaussian features, and both KPCA and KICA ignore the process dynamics. The method proposed in this paper, KDICA, handles dynamics, non-linearity, and non-Gaussianity simultaneously.

In order to obtain the linear, orthogonal scores that the ICA algorithm requires, the second step of KDICA is to whiten the augmented measurement matrix in the kernel space. Using a non-linear mapping $\Phi(\cdot)$, the kernel method transforms a measurement vector into a higher-dimensional feature space in which the associations are linear; linear methods can then be applied to reduce the dimension and extract features, and the final dimension of the features is always less than that of the input data. The $i$th augmented measurement vector $x_i^l$ is mapped to $\phi_i = \Phi(x_i^l)$. The covariance matrix in the feature space can be expressed as

$$S^F = \frac{1}{n}\sum_{i=1}^{n} \phi_i \phi_i^T \qquad (4)$$
Putting $\Phi = [\phi_1, \phi_2, \ldots, \phi_n]$, one gets $S^F = \frac{1}{n}\Phi\Phi^T$. Although $\Phi(\cdot)$ is unknown, the Gram matrix $K = \Phi^T\Phi$ can be determined by a given kernel function $k(x, y)$:
$$k_{ij} = \Phi(x_i^l)^T \Phi(x_j^l) = \phi_i^T \phi_j = k(x_i^l, x_j^l) \qquad (5)$$
The polynomial, sigmoid, and radial basis kernels are widely used for computing $k_{ij}$. In this paper, the radial basis kernel is adopted:
$$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{\sigma}\right) \qquad (6)$$
where $\sigma$ is usually determined by trial and error; some papers have suggested calculating it as $\sigma = r\,m$ [1,18], where $m$ is the input dimension and $r$ is a constant to be selected. In order to normalize the mapped data $\phi_i$, $i = 1, 2, \ldots, n$, the Gram matrix $K$ should be centered by

$$\bar{K} = K - I_n K - K I_n + I_n K I_n \qquad (7)$$

where $I_n$ is the $n \times n$ matrix whose entries are all $1/n$. Given $a_1, a_2, \ldots, a_u$, the orthonormal eigenvectors corresponding to the $u$ largest eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_u$ of $\bar{K}$, the $u$ largest eigenvalues of $S^F$ are $\lambda_1/n, \lambda_2/n, \ldots, \lambda_u/n$, and the corresponding orthonormal eigenvectors $b_1, b_2, \ldots, b_u$ can be expressed as
$$b_j = \frac{1}{\sqrt{\lambda_j}}\,\Phi a_j, \qquad j = 1, 2, \ldots, u \qquad (8)$$
Denoting $V = (a_1, a_2, \ldots, a_u)$, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_u)$, and $C = (b_1, b_2, \ldots, b_u)$, a whitening matrix can be obtained as

$$P = C\left(\frac{\Lambda}{n}\right)^{-1/2} = \sqrt{n}\,\Phi V \Lambda^{-1} \qquad (9)$$
Then the $i$th ($i = 1, 2, \ldots, n$) mapped training datum in the feature space can be whitened as follows:

$$z_i = P^T \phi_i = \sqrt{n}\,\Lambda^{-1} V^T \Phi^T \phi_i = \sqrt{n}\,\Lambda^{-1} V^T \left[k(x_1^l, x_i^l),\; k(x_2^l, x_i^l),\; \ldots,\; k(x_n^l, x_i^l)\right]^T = \sqrt{n}\,\Lambda^{-1} V^T \bar{k}_i^T \qquad (10)$$
where $\bar{k}_i$ is the $i$th row of $\bar{K}$. For a new datum $x_{new}^l$ to be monitored, its kernel vector $k_{new}$ should be centered as

$$\bar{k}_{new} = k_{new} - I_{new} K - k_{new} I_n + I_{new} K I_n \qquad (11)$$
where $I_{new} = \frac{1}{n}[1, 1, \ldots, 1] \in \mathbb{R}^{1 \times n}$. The whitened new score is

$$z_{new} = P^T \Phi(x_{new}^l) = P^T \phi_{new} = \sqrt{n}\,\Lambda^{-1} V^T \left[k(x_1^l, x_{new}^l),\; \ldots,\; k(x_n^l, x_{new}^l)\right]^T = \sqrt{n}\,\Lambda^{-1} V^T \bar{k}_{new}^T \qquad (12)$$
It is worth noting that the procedure of Eqs. (4)-(12) is also used in standard KPCA; more details about KPCA can be found in [1,7,10,39]. Here, KPCA is used to whiten the data in the feature space, which can be considered the initial step of KICA. Choosing fewer whitened scores effectively reduces the number of ICs but may lose important information, so the reduction of the whitened scores should be done carefully. In this paper, the criterion $\lambda_i / \sum_j \lambda_j > 0.0001$ [18] is used to select the whitened scores.
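To make the whitening procedure of Eqs. (5)-(12) concrete, the sketch below computes the centered Gram matrix, its eigendecomposition, the whitened training scores, and the whitened score of a new sample (again an illustrative sketch under our own naming; `sigma` stands for the kernel width and `tol` for the selection criterion of [18]):

```python
import numpy as np

def rbf_gram(A, B, sigma):
    """Gram matrix with entries exp(-||a - b||^2 / sigma), Eq. (6)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma)

def kernel_whiten(Xl, sigma, tol=1e-4):
    """Whitened training scores Z of Eq. (10), plus pieces needed online."""
    n = Xl.shape[0]
    K = rbf_gram(Xl, Xl, sigma)
    In = np.ones((n, n)) / n
    Kc = K - In @ K - K @ In + In @ K @ In            # centering, Eq. (7)
    lam, A = np.linalg.eigh(Kc)
    lam, A = lam[::-1], A[:, ::-1]                    # sort eigenvalues descending
    keep = lam / lam.sum() > tol                      # criterion lambda_i/sum > 1e-4
    V, lam = A[:, keep], lam[keep]
    Z = np.sqrt(n) * Kc @ V / lam                     # rows z_i^T, Eq. (10)
    return Z, V, lam, K, Kc

def whiten_new(x_new, Xl, V, lam, K, sigma):
    """Whitened score of a new augmented sample, Eqs. (11)-(12)."""
    n = Xl.shape[0]
    k_new = rbf_gram(x_new[None, :], Xl, sigma)       # 1 x n kernel vector
    one, In = np.ones((1, n)) / n, np.ones((n, n)) / n
    k_c = k_new - one @ K - k_new @ In + one @ K @ In # centering, Eq. (11)
    return np.sqrt(n) * (k_c @ V / lam).ravel()       # Eq. (12)
```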
2.3. Extracting non-Gaussian features using ICA

ICA [13,14] is a multivariate statistical technique for recovering latent independent components. Data whitening yields uncorrelated scores, which improves the computational speed of the ICA algorithm. Suppose the $u$ whitened scores $z_1, z_2, \ldots, z_u$ are linear combinations of $u$ latent independent components (ICs) $s_1, s_2, \ldots, s_u$. Both $z = [z_1, z_2, \ldots, z_u]^T$ and $s = [s_1, s_2, \ldots, s_u]^T$ have zero mean and unit variance, with $E(ss^T) = I$ and $E(zz^T) = I$, so that $B^T B = B B^T = I$. The relationship is

$$z = Bs \qquad (13)$$
The goal of ICA is to find an orthogonal matrix $B$ with which to estimate the independent components $s$, i.e.,

$$s = B^T z \qquad (14)$$
In order to obtain the ICs, the ICA algorithm maximizes the non-Gaussianity of $B^T z$; non-Gaussianity can be measured by kurtosis, approximations of negentropy, and so on [14]. In this paper, the fixed-point algorithm developed by Hyvärinen [13] is used. The third step of KDICA is thus to apply the ICA algorithm to the whitened data $z$ and $z_{new}$ obtained in Section 2.2. The ICs with the largest negentropy are selected as the dominant ones, since larger negentropy means stronger non-Gaussianity [14]. Because both the augmentation and the kernel mapping increase the number of ICs, KDICA has more ICs than DICA and KICA do. Denoting the $d$ ICs with the largest negentropy by $s_d$ and the corresponding $d$ columns of $B$ by $B_d$, one has $s_d = B_d^T z$. The projection of $s_d$ back onto $z$ is
$$\hat{z} = B_d s_d \qquad (15)$$
Hence, combining Eqs. (9) and (10), the projection of the mapped data is

$$\hat{\phi} = n^{-1/2}\,\Phi V \hat{z} \qquad (16)$$
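The fixed-point iteration of this step is compactly summarized below (a condensed, deflation-based rendering of the FastICA scheme of [13] with the log cosh negentropy approximation from [14]; this is our own sketch, not the authors' implementation):

```python
import numpy as np

def fastica(Z, max_iter=200, tol=1e-6):
    """Estimate an orthogonal B (Eq. (13)) from whitened rows Z (n x u)."""
    n, u = Z.shape
    B = np.zeros((u, u))
    for j in range(u):
        b = np.random.randn(u)
        b /= np.linalg.norm(b)
        for _ in range(max_iter):
            y = Z @ b
            g, dg = np.tanh(y), 1.0 - np.tanh(y) ** 2          # g = G', dg = G''
            b_new = (Z * g[:, None]).mean(axis=0) - dg.mean() * b
            b_new -= B[:, :j] @ (B[:, :j].T @ b_new)           # deflation step
            b_new /= np.linalg.norm(b_new)
            converged = abs(abs(b_new @ b) - 1.0) < tol
            b = b_new
            if converged:
                break
        B[:, j] = b
    return B

def dominant_columns(Z, B, d):
    """Keep the d columns of B whose ICs have the largest negentropy proxy."""
    S = Z @ B                                                  # rows s^T = (B^T z)^T
    G = np.log(np.cosh(S)).mean(axis=0)
    G_gauss = np.log(np.cosh(np.random.randn(100000))).mean()  # E[G(nu)], nu ~ N(0,1)
    order = np.argsort(-(G - G_gauss) ** 2)                    # negentropy ~ (E G(s) - E G(nu))^2
    return B[:, order[:d]]                                     # columns B_d
```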
2.4. Online monitoring and fault detection

After the $d$ ICs have been selected, the first monitoring index is given by

$$I^2 = s_d^T s_d \qquad (17)$$
Unlike in the linear methods, the SPE of KDICA cannot be calculated from the residuals of the original measurements because the non-linear mapping $\Phi(\cdot)$ is unknown. However, the residual of the mapped data $\Phi(x^l)$ can be used to calculate SPE indirectly:

$$\mathrm{SPE} = \|\Phi(x^l) - \hat{\Phi}(x^l)\|^2 = \|\phi - \hat{\phi}\|^2 \qquad (18)$$
Combining Eqs. (16) and (18), one gets

$$\mathrm{SPE} = \phi^T\phi - 2\phi^T\hat{\phi} + \hat{\phi}^T\hat{\phi} = \phi^T\phi - 2n^{-1/2}\,\phi^T\Phi V \hat{z} + n^{-1}\,\hat{z}^T V^T \Phi^T \Phi V \hat{z} = k(x^l, x^l) - 2n^{-1/2}\,\bar{k} V \hat{z} + n^{-1}\,\hat{z}^T V^T \bar{K} V \hat{z} \qquad (19)$$

where $\bar{k}$ is the centered kernel (row) vector of $x^l$.
Since the distributions of I² and SPE are unknown, their control limits can be calculated by kernel density estimation [24] on the training set. When a new online measurement is available, $I^2_{new}$ and $\mathrm{SPE}_{new}$ are computed; if either exceeds its control limit, a fault is assumed to have occurred.
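A sketch of the resulting online test follows (continuing the helpers above and assuming the centered kernel vector `k_c` of the sample is available from the centering step; SciPy's `gaussian_kde` and the grid-based quantile search are implementation choices of ours, not prescribed by the paper):

```python
import numpy as np
from scipy.stats import gaussian_kde

def i2(z, Bd):
    """Monitoring index I^2 of Eq. (17)."""
    s = Bd.T @ z
    return float(s @ s)

def spe(k_c, z, Bd, V, Kc, n, k_xx=1.0):
    """SPE of Eq. (19); for the radial basis kernel, k(x, x) = 1."""
    z_hat = Bd @ (Bd.T @ z)                               # Eq. (15)
    term2 = 2.0 / np.sqrt(n) * float(k_c @ (V @ z_hat))
    term3 = float(z_hat @ (V.T @ (Kc @ (V @ z_hat)))) / n
    return k_xx - term2 + term3

def control_limit(train_stats, q=0.99):
    """99% control limit from a kernel density estimate of the training
    statistic [24], read off the numerically integrated density."""
    kde = gaussian_kde(train_stats)
    grid = np.linspace(train_stats.min(), train_stats.max() * 1.5, 2000)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]
    return float(grid[np.searchsorted(cdf, q)])
```

A fault is flagged online when either statistic exceeds its limit computed on the normal training data.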
2.5. Fault diagnosis using non-linear contribution plot

A contribution plot is a typical fault diagnosis method for PCA and ICA [2,16,19]. When a monitoring index detects a fault, the variables with the largest contributions to the index are the dominant factors inducing the fault; hence, these variables are taken to be the fault points. The contribution plot of KDICA has the same meaning as that of PCA and ICA. For KDICA, however, the contribution plot is difficult to calculate because of the kernel mapping. Following the work of Rakotomamonjy [28], the partial derivative of the radial basis kernel with respect to a scaling vector $v$ is

$$\frac{\partial k_{new,j}}{\partial v_i} = \frac{\partial k(v \circ x_j^l,\, v \circ x_{new}^l)}{\partial v_i} = -\frac{2}{\sigma}\, v_i \left(x_{j,i}^l - x_{new,i}^l\right)^2 \exp\!\left(-\frac{\|v \circ x_j^l - v \circ x_{new}^l\|^2}{\sigma}\right) = -\frac{2}{\sigma}\left(x_{j,i}^l - x_{new,i}^l\right)^2 k\!\left(x_j^l, x_{new}^l\right) \qquad (20)$$
where $v = [v_1, v_2, \ldots, v_m]$ is a scaling vector acting element-wise ($\circ$) on the augmented measurement, and the last equality evaluates the derivative at $v = [1, 1, \ldots, 1]$ so as to isolate the sensitivity to the $i$th variable of the augmented measurement. Here $x_j^l$ is the $j$th measurement vector of the training set $X^l$, and $x_{new}^l$ is an online augmented measurement vector. Hence, the partial derivative can be regarded as the contribution of the $i$th variable to the $j$th element of a new kernel vector. Note that the kernel vector has to be centered as in Eq. (11), so the $j$th element of a new centered kernel vector is

$$\bar{k}_{new,j} = k(x_j^l, x_{new}^l) - \frac{1}{n}\sum_{p=1}^{n} k(x_p^l, x_j^l) - \frac{1}{n}\sum_{p=1}^{n} k(x_p^l, x_{new}^l) + \frac{1}{n^2}\sum_{p=1}^{n}\sum_{q=1}^{n} k(x_p^l, x_q^l) \qquad (21)$$
The second and fourth terms of Eq. (21) can be regarded as constants whose derivatives are zero, because they are not influenced by the new measurement. Combining Eqs. (20) and (21), the contribution of the $i$th variable to the $j$th element of the centered kernel vector is
$$C_{\bar{k}_{new,j}}(i) = -\frac{2}{\sigma}\left(x_{j,i}^l - x_{new,i}^l\right)^2 k\!\left(x_j^l, x_{new}^l\right) + \frac{2}{\sigma n}\sum_{p=1}^{n}\left(x_{p,i}^l - x_{new,i}^l\right)^2 k\!\left(x_p^l, x_{new}^l\right) \qquad (22)$$
Combining Eqs. (10) and (17), one gets

$$I^2 = z^T B_d B_d^T z = \left(\sqrt{n}\,\Lambda^{-1} V^T \bar{k}_{new}^T\right)^T B_d B_d^T \left(\sqrt{n}\,\Lambda^{-1} V^T \bar{k}_{new}^T\right) = n\,\bar{k}_{new} V \Lambda^{-1} B_d B_d^T \Lambda^{-1} V^T \bar{k}_{new}^T = n\,\bar{k}_{new} U U^T \bar{k}_{new}^T \qquad (23)$$
where $U = V \Lambda^{-1} B_d$. Using the variable diagonal contribution [2], the contribution of the $j$th element of the centered kernel vector to $I^2$ is

$$C_{I^2}(\bar{k}_{new,j}) = n\,\bar{k}_{new,j}^2\,\|U_j\|^2 \qquad (24)$$

where $U_j$ is the $j$th row of $U$. Then, combining Eqs. (22) and (24), the contribution of the $i$th variable in $x_{new}^l$ to $I^2$ is

$$C_{I^2}(i) = n\sum_{j=1}^{n}\left(C_{\bar{k}_{new,j}}(i)\right)^2 \|U_j\|^2 \qquad (25)$$
Finally, the contribution of the $g$th original variable in $x_{new}$ to $I^2$ is

$$\mathrm{Cont}(I^2, x_{new,g}) = \sum_{i=(l+1)g-l}^{(l+1)g} C_{I^2}(i) \qquad (26)$$
where $g = 1, 2, \ldots, m_0$. Combining Eqs. (19), (15), and (12), one gets
$$\mathrm{SPE} = \phi^T\phi - 2\phi^T\hat{\phi} + \hat{\phi}^T\hat{\phi} = k(x^l, x^l) - 2n^{-1/2}\,\bar{k} V B_d s_d + n^{-1} (B_d s_d)^T V^T \bar{K} V (B_d s_d)$$
$$= k(x^l, x^l) - 2\,\bar{k} V B_d B_d^T \Lambda^{-1} V^T \bar{k}^T + \bar{k} V \Lambda^{-1} B_d B_d^T V^T \bar{K} V B_d B_d^T \Lambda^{-1} V^T \bar{k}^T = k(x^l, x^l) - \bar{k} G \bar{k}^T \qquad (27)$$

where $G = 2 V B_d B_d^T \Lambda^{-1} V^T - V \Lambda^{-1} B_d B_d^T V^T \bar{K} V B_d B_d^T \Lambda^{-1} V^T$.
Analogously, a variable's non-linear contribution to SPE can be obtained. In summary, the algorithm uses four steps to calculate a variable's contributions to a KDICA index. First, the augmented variable's contribution to each element of the kernel vector is computed using the partial derivative, which is non-linear. Second, the contribution of each element of the kernel vector to the KDICA index is given. Third, integrating steps 1 and 2, the augmented variable's contribution to the KDICA index is obtained. Finally, the contribution of the original measurement variable to the KDICA index is obtained. In this paper, only $\mathrm{Cont}(I^2, x_{new,g})$ is applied for fault diagnosis, because SPE is the residual of the KDICA model and contains noise.
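A compact rendering of this diagnosis chain, Eqs. (20)-(26), is sketched below (it reuses `rbf_gram` from the whitening sketch; the factor 2/σ from differentiating the radial basis kernel scales all contributions equally and therefore does not change their ranking):

```python
import numpy as np

def i2_contribution_plot(x_new_l, Xl, V, lam, Bd, sigma, l, m0):
    """Contribution of each original variable to I^2, Eqs. (20)-(26)."""
    n, m = Xl.shape
    k_raw = rbf_gram(x_new_l[None, :], Xl, sigma).ravel()    # k(x_j^l, x_new^l)
    d2 = (Xl - x_new_l) ** 2                                 # (x_{j,i} - x_{new,i})^2
    # Eq. (22): contribution of augmented variable i to the j-th element
    # of the centered kernel vector (rows j, columns i)
    Ck = -(2.0 / sigma) * (d2 * k_raw[:, None]
                           - (d2 * k_raw[:, None]).mean(axis=0))
    U = V @ np.diag(1.0 / lam) @ Bd                          # U = V Lambda^-1 B_d
    w = (U ** 2).sum(axis=1)                                 # ||U_j||^2, Eq. (24)
    C_i = n * (Ck ** 2 * w[:, None]).sum(axis=0)             # Eq. (25), per variable i
    # Eq. (26): fold the l+1 lagged copies back onto the m0 original variables
    return C_i.reshape(m0, l + 1).sum(axis=1)
```

Plotting the returned vector against the variable index reproduces a contribution plot such as Fig. 5; the variable with the largest entry is taken as the most probable fault point.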
3. Case study

In this section, the proposed approach is compared with conventional methods on a benchmark testbed, the Tennessee Eastman (TE) process. The TE process was introduced by Downs and Vogel [9] and has been widely used to develop, study, and evaluate process control and monitoring strategies. McAvoy et al. [25] proposed the base control of the TE process, and Lyman and Georgakis [22] developed four different control structures for it. A number of TE studies have compared and evaluated different monitoring approaches [1,5,10,18,20,29].

In the TE process, there are 12 manipulated variables and 41 measured variables (22 continuous process measurements and 19 composition measurements). The simulation contains 20 faults, as shown in Table 1. In this paper, the same simulation data as generated by Chiang et al. [6] is used. Because the 19 composition measurements are hard to measure in real time and the agitation speed is not manipulated, there are 33 variables in total (22 measurements and 11 manipulated variables) to be used for monitoring, as listed in Table 2.

Table 1. Fault descriptions in the TE process.

Fault no. | Fault description | Fault type
1 | A/C feed ratio, B composition constant (stream 4) | Step
2 | B composition, A/C ratio constant (stream 4) | Step
3 | D feed temperature (stream 2) | Step
4 | Reactor cooling water inlet temperature | Step
5 | Condenser cooling water inlet temperature | Step
6 | A feed loss (stream 1) | Step
7 | C header pressure loss, reduced availability (stream 4) | Step
8 | A, B, C feed composition (stream 4) | Random variation
9 | D feed temperature (stream 2) | Random variation
10 | C feed temperature (stream 4) | Random variation
11 | Reactor cooling water inlet temperature | Random variation
12 | Condenser cooling water inlet temperature | Random variation
13 | Reaction kinetics | Slow drift
14 | Reactor cooling water valve | Sticking
15 | Condenser cooling water valve | Sticking
16-20 | Unknown | Unknown
The training data set includes 480 samples, and the testing data set includes 960 samples; all faults are introduced from sample 160. A normal data set is used to test the false alarm rates of all methods.

Table 2. Monitored variables in the TE process.

1 A feed (stream 1)
2 D feed (stream 2)
3 E feed (stream 3)
4 Total feed (stream 4)
5 Recycle flow (stream 8)
6 Reactor feed rate (stream 6)
7 Reactor pressure
8 Reactor level
9 Reactor temperature
10 Purge rate (stream 9)
11 Product separator temperature
12 Product separator level
13 Product separator pressure
14 Product separator underflow (stream 10)
15 Stripper level
16 Stripper pressure
17 Stripper underflow (stream 11)
18 Stripper temperature
19 Stripper steam flow
20 Compressor work
21 Reactor cooling water outlet temperature
22 Separator cooling water outlet temperature
23 D feed flow valve (stream 2)
24 E feed flow valve (stream 3)
25 A feed flow valve (stream 1)
26 Total feed flow valve (stream 4)
27 Compressor recycle valve
28 Purge valve (stream 9)
29 Separator pot liquid flow valve (stream 10)
30 Stripper liquid product flow valve (stream 11)
31 Stripper steam valve
32 Reactor cooling water flow
33 Condenser cooling water flow

The AR orders of the 33 variables, calculated by cumulant-based identification [12] using the Higher-Order Spectral Analysis Toolbox of Matlab, are shown in Fig. 1. There are 22 variables whose AR order is one; hence, the time lag l = 1 is enough to capture most of the dynamics, and it leads to fewer ICs than l = 2. The parameter of the radial basis kernel is set to σ = 500m to obtain fewer whitened scores, which further improves the speed of the ICA algorithm and reduces the final dimension of the ICs. In this study, 13 PCs and 13 ICs are selected for KPCA and KICA, 17 PCs and 17 ICs for DPCA and DICA, and 22 PCs and 22 ICs for KDPCA and KDICA.

Because the computational complexity of KDICA is mainly related to the dimension of the whitened scores in the kernel space, which in turn depends on the number of observations, a detailed comparison of the time and space complexities of all the methods is difficult. For convenience, the computing times of off-line modeling and online monitoring are studied instead; they are given in Table 3. The modeling time of KDICA is the longest, but a modeling time under 20 s is acceptable and worthwhile considering that the monitoring model is more accurate. The monitoring times of the non-linear methods are more than 100 times those of the linear methods, because the kernel mapping is time-consuming; nevertheless, the monitoring time of KDICA per online sample is less than 0.01 s (8.372/960). In short, the computational complexity of KDICA does not hinder its implementation.

Table 4 shows the false alarm rates of the considered methods, with control limits set at the 99% confidence level; all false alarm rates are acceptable. The detection rates of all considered methods for all faults are shown in Table 5. The results demonstrate that KDICA gives the highest detection rates for almost all faults. For Faults 1, 2, 6, 7, 8, 12, 13, 14, and 18, the detection rates of all methods are close to 100%, because the magnitudes of these faults are large enough. Faults 3, 9, and 15 are so small that they have almost no effect on the whole process. Because every method considered has two indices that will give
Fig. 1. AR orders of the 33 variables.
Table 3. Comparison of computational time (s).

 | DPCA | KPCA | KDPCA | DICA | KICA | KDICA
Modeling time | 2.128 | 3.118 | 3.402 | 9.024 | 8.574 | 13.536
Monitoring time | 0.054 | 5.571 | 6.618 | 0.082 | 6.637 | 8.372
Table 4. False alarm rates of the considered methods (%).

Method | Index | Rate | Index | Rate
DPCA | T² | 0.83 | SPE | 2.79
KPCA | T² | 1.67 | SPE | 2.54
KDPCA | T² | 1.25 | SPE | 3.04
DICA | I² | 2.5 | SPE | 1.4
KICA | I² | 2.08 | SPE | 1.87
KDICA | I² | 2.4 | SPE | 1.56
Table 5. Detection rate comparison (%).

Fault | DPCA T² | DPCA SPE | KPCA T² | KPCA SPE | KDPCA T² | KDPCA SPE | DICA I² | DICA SPE | KICA I² | KICA SPE | KDICA I² | KDICA SPE
1 | 99.5 | 100 | 99.3 | 100 | 99.5 | 100 | 100 | 99.1 | 100 | 99.7 | 100 | 100
2 | 98.4 | 99.0 | 95.3 | 99.0 | 98.3 | 99.1 | 99.1 | 98.3 | 98.3 | 98.4 | 98.8 | 98.5
3 | 6.0 | 9.5 | 9.0 | 6.8 | 4.4 | 9.6 | 6.9 | 15.0 | 14.3 | 18.6 | 19.8 | 19.4
4 | 10.5 | 100 | 9.5 | 100 | 16.9 | 100 | 100 | 100 | 93.1 | 98.3 | 100 | 99.9
5 | 28.4 | 32.4 | 29.9 | 30.3 | 27.6 | 33.5 | 30.3 | 32.5 | 33.6 | 33.3 | 35.8 | 35.3
6 | 99.4 | 100 | 81.0 | 100 | 78.3 | 100 | 100 | 100 | 100 | 100 | 100 | 100
7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
8 | 97.6 | 97.5 | 97.4 | 97.9 | 97.6 | 97.8 | 97.8 | 97.5 | 99.0 | 97.8 | 99.4 | 97.8
9 | 5.0 | 10.0 | 5.4 | 6.5 | 5.9 | 10.0 | 8.5 | 7.1 | 9.6 | 12.8 | 12.9 | 13.4
10 | 47.0 | 55.5 | 48.6 | 52.5 | 42.6 | 63.5 | 87.6 | 85.1 | 87.8 | 78.3 | 92.9 | 80.6
11 | 31.1 | 87.9 | 51.0 | 77.6 | 33.6 | 91.0 | 78.9 | 76.0 | 78.3 | 68.6 | 90.3 | 81.4
12 | 99.3 | 98.0 | 98.9 | 98.5 | 99.1 | 99.1 | 99.3 | 99.1 | 99.5 | 99.8 | 100 | 99.7
13 | 94.1 | 95.3 | 94.3 | 95.2 | 96.3 | 95.4 | 94.6 | 95.3 | 95.1 | 94.9 | 95.9 | 95.9
14 | 98.4 | 100 | 99.6 | 100 | 99.9 | 100 | 100 | 100 | 100 | 100 | 100 | 100
15 | 8.4 | 12.9 | 9.3 | 8.8 | 7.3 | 13.6 | 11.9 | 17.1 | 17.8 | 13.8 | 20.4 | 18.5
16 | 29.6 | 53.9 | 33.6 | 48.0 | 29.1 | 59.6 | 86.2 | 87.5 | 89.0 | 80.8 | 90.3 | 88.0
17 | 78.4 | 97.5 | 81.5 | 95.9 | 80.9 | 97.4 | 91.1 | 93.6 | 95.8 | 93.6 | 98.0 | 95.0
18 | 89.6 | 91.0 | 90.0 | 90.6 | 90.4 | 91.3 | 90.6 | 91.4 | 90.8 | 90.1 | 91.8 | 91.0
19 | 5.9 | 78.0 | 8.8 | 43.1 | 12.3 | 67.6 | 34.8 | 25.5 | 87.0 | 70.4 | 97.5 | 86.9
20 | 54.3 | 63.4 | 49.1 | 59.8 | 51.5 | 66.75 | 60.9 | 56.8 | 78.5 | 69.1 | 83.9 | 72.7
different results, the average detection rate of the two indices of each method is shown in Fig. 2 for a fair comparison. DICA and KICA outperform DPCA and KPCA, respectively, because ICA takes advantage of higher-order statistics to extract independent components, whereas PCA uses only the mean and variance to extract uncorrelated components. It is hard to say whether DPCA or KPCA is more effective, because they have different characteristics and advantages; similarly, DICA and KICA are not directly comparable. KDICA, however, is more effective than any other method because it considers dynamics, non-linearity, and non-Gaussianity simultaneously.
Fig. 2. Average detection rate of the two indices of each method.
Fig. 3. Monitoring charts for Fault 11: (a) KDPCA, (b) DICA, (c) KICA, (d) KDICA.
To give a clearer illustration of the superiority of KDICA, monitoring charts for Faults 11 and 19 are shown in Figs. 3 and 4. The horizontal lines are the control limits of the two indices at the 99% confidence level. In Fig. 3a, most of the points in the T² chart of KDPCA are under the control limit, whereas most of the points in the SPE chart are beyond it; the reason is that the PCs of KDPCA and the other PCA-related methods are selected by variance, but most of the changes introduced by Fault 11 lie in the PCs with smaller variances. As shown in Fig. 3b-d, both I² and SPE of the ICA-related methods give very high detection rates because the ICA algorithm takes higher-order statistical information into account. The
Fig. 4. Monitoring charts for Fault 19: (a) KDPCA, (b) DICA, (c) KICA, (d) KDICA.
Fig. 5. Contribution plots at samples 167, 170, and 200.
same phenomena can also be found for Fault 19. In Fig. 4, the monitoring charts of KDICA have few points under the control limits of I² and SPE, while many points of KDPCA, DICA, and KICA remain under the control limits after the fault has occurred. This demonstrates that KDICA is more effective than the other methods at detecting faults. The underlying reasons can be summarized as follows. First, KDICA takes into account the dynamics and time delays of the TE process. Second, KDICA successfully extracts non-linear and non-Gaussian features. Finally, the other methods fail to obtain enough information about the TE process. The more accurate monitoring model of KDICA is effective at detecting faults of small observable magnitude.

After KDICA detects a fault, the non-linear contribution plot is applied to diagnose it. In the case of Fault 11, the reactor cooling water inlet temperature changes randomly after sample 160. In Fig. 3d, KDICA detects the fault at sample 166. The contribution plots at samples 167, 170, and 200 are shown in Fig. 5. At sample 167, variables 9 and 32 give the largest contributions. As shown in Table 2, variable 9 is the reactor temperature and variable 32 is the reactor cooling water flow; it can therefore be deduced that the reactor cooling water has a fault. At sample 170, besides variables 9 and 32, variables 7, 11, 13, and 16 also make large contributions, because the temperature change in the reactor influences the temperature and pressure in the separator. Furthermore, sample 200 gives the same result as sample 167, which further confirms the diagnosis. In sum, the proposed non-linear contribution plot for KDICA can successfully diagnose faults.

4. Conclusions

In this study, a novel approach, KDICA, was proposed for fault detection and diagnosis of non-linear non-Gaussian dynamic processes. By combining ICA, the kernel method, and the AR model, KDICA can extract auto-correlated, cross-correlated, and non-linear characteristics. Fault diagnosis is implemented using a non-linear contribution plot, which is based on partial derivatives and linear contributions. KDICA was compared with DPCA, KPCA, KDPCA, DICA, and KICA on the Tennessee Eastman process; the experimental results demonstrate that KDICA exploits more information than the conventional methods, achieves the best fault detection performance, and gives accurate fault diagnosis results.

To judge whether KDICA should be used, the degrees of non-linearity and non-Gaussianity of a process should be evaluated. To evaluate the degree of non-linearity, one could compare the correlation coefficient matrix and the mutual information matrix of the monitored variables: a larger difference between these two matrices indicates stronger non-linearity, because the correlation coefficient matrix captures only linear relationships, while the mutual information matrix captures both linear and non-linear relationships. To evaluate the degree of non-Gaussianity, one could use the kurtosis values of the monitored variables: if these values are all close to zero, the process is Gaussian; otherwise, it is non-Gaussian. However, this conclusion holds only for linear processes, because a non-linear combination of several Gaussian distributions may be non-Gaussian; evaluating the degree of non-Gaussianity of a non-linear process remains an open problem. To conclude, deeper explorations should be carried out to further improve the proposed method.
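As a rough sketch of this screening procedure (our own illustration; the mutual-information estimator from scikit-learn and the normalization of the MI matrix are assumptions, since the text does not prescribe a particular estimator):

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.feature_selection import mutual_info_regression

def screen_process(X):
    """Heuristic check of non-linearity and non-Gaussianity of monitored data."""
    n, m = X.shape
    corr = np.abs(np.corrcoef(X, rowvar=False))        # linear dependence only
    mi = np.column_stack([mutual_info_regression(X, X[:, j]) for j in range(m)])
    gap = np.abs(mi / mi.max() - corr)                 # large gap suggests non-linearity
    kurt = kurtosis(X, axis=0)                         # excess kurtosis ~ 0 if Gaussian
    return gap, kurt
```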
Acknowledgement

This work was supported by the National Natural Science Foundation of China (61074081), the Beijing Nova Program (2011025), the Doctoral Fund of the Ministry of Education of China (20100010120011), and the Fok Ying-Tong Education Foundation (131060).

References

[1] C.F. Alcala, S.J. Qin, Reconstruction-based contribution for process monitoring with kernel principal component analysis, Industrial and Engineering Chemistry Research 49 (2010) 7849-7857.
[2] C.F. Alcala, S.J. Qin, Analysis and generalization of fault diagnosis methods for process monitoring, Journal of Process Control 21 (2011) 322-330.
[3] T.P. Banerjee, S. Das, Multi-sensor data fusion using support vector machine for motor fault detection, Information Sciences 217 (2012) 96-107.
[4] J. Chen, C.M. Liao, Dynamic process fault monitoring based on neural network and PCA, Journal of Process Control 12 (2002) 277-289.
[5] C.Y. Cheng, C.C. Hsu, M.C. Chen, Adaptive kernel principal component analysis (KPCA) for monitoring small disturbances of nonlinear processes, Industrial and Engineering Chemistry Research 49 (2010) 2254-2262.
[6] L.H. Chiang, E. Russell, R.D. Braatz, Fault Detection and Diagnosis in Industrial Systems, Springer, London, 2001.
[7] J.H. Cho, J.M. Lee, S.W. Choi, D. Lee, I.B. Lee, Fault identification for process monitoring using kernel principal component analysis, Chemical Engineering Science 60 (2005) 279-288.
[8] D. Dong, T.J. McAvoy, Nonlinear principal component analysis based on principal curves and neural networks, Computers and Chemical Engineering 20 (1996) 65-78.
[9] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Computers and Chemical Engineering 17 (1993) 245-255.
[10] Z. Ge, C. Yang, Z. Song, Improved kernel PCA-based monitoring approach for nonlinear processes, Chemical Engineering Science 64 (2009) 2245-2255.
[11] Z. Ge, C. Yang, Z. Song, H. Wang, Robust online monitoring for multimode processes based on nonlinear external analysis, Industrial and Engineering Chemistry Research 47 (2008) 4775-4783.
[12] G.B. Giannakis, J.M. Mendel, Cumulant-based order determination of non-Gaussian ARMA models, IEEE Transactions on Acoustics, Speech, and Signal Processing 38 (1990) 1411-1423.
[13] A. Hyvärinen, Fast and robust fixed-point algorithms for independent component analysis, IEEE Transactions on Neural Networks 10 (1999) 626-634.
[14] A. Hyvärinen, E. Oja, Independent component analysis: algorithms and applications, Neural Networks 13 (2000) 411-430.
[15] M.A. Kramer, Nonlinear principal component analysis using autoassociative neural networks, AIChE Journal 37 (1991) 233-243.
[16] W. Ku, R.H. Storer, C. Georgakis, Disturbance detection and isolation by dynamic principal component analysis, Chemometrics and Intelligent Laboratory Systems 30 (1995) 179-196.
[17] A. Lemos, W. Caminhas, F. Gomide, Adaptive fault detection and diagnosis using an evolving fuzzy classifier, Information Sciences 220 (2013) 64-85.
[18] J.M. Lee, S.J. Qin, I.B. Lee, Fault detection of non-linear processes using kernel independent component analysis, The Canadian Journal of Chemical Engineering 85 (2007) 526-536.
[19] J.M. Lee, C.K. Yoo, I.B. Lee, Statistical process monitoring with independent component analysis, Journal of Process Control 14 (2004) 467-485.
[20] J.M. Lee, C.K. Yoo, I.B. Lee, Statistical monitoring of dynamic processes based on dynamic independent component analysis, Chemical Engineering Science 59 (2004) 2995-3006.
[21] T.K. Li, C.H. Tsai, H.C. Hsu, A fast fault-identification algorithm for bijective connection graphs using the PMC model, Information Sciences 187 (2012) 291-297.
[22] P.R. Lyman, C. Georgakis, Plant-wide control of the Tennessee Eastman problem, Computers and Chemical Engineering 19 (1995) 321-331.
[23] Z. Man, K. Lee, D. Wang, Z. Cao, C. Miao, A new robust training algorithm for a class of single-hidden layer feedforward neural networks, Neurocomputing 74 (2011) 2491-2501.
[24] E.B. Martin, A.J. Morris, Non-parametric confidence bounds for process performance monitoring charts, Journal of Process Control 6 (1996) 349-358.
[25] T.J. McAvoy, N. Ye, Base control for the Tennessee Eastman problem, Computers and Chemical Engineering 18 (1994) 384-413.
[26] P. Nomikos, J.F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1995) 41-59.
[27] S.J. Qin, Statistical process monitoring: basics and beyond, Journal of Chemometrics 17 (2003) 480-502.
[28] A. Rakotomamonjy, Variable selection using SVM-based criteria, The Journal of Machine Learning Research 3 (2003) 1357-1370.
[29] G. Stefatos, A.B. Hamza, Dynamic independent component analysis approach for fault detection and diagnosis, Expert Systems with Applications 37 (2010) 8606-8617.
[30] J. Sun, X. Li, Y. Yang, J. Luo, Y. Bai, Scaling the kernel function based on the separating boundary in input space: a data-dependent way for improving the performance of kernel methods, Information Sciences 184 (2012) 140-154.
[31] D. Wang, P. Conilione, Machine learning approach for face image retrieval, Neural Computing and Applications 21 (2012) 683-694.
[32] J. Wang, Y. Zhang, H. Cao, W. Zhu, Dimension reduction method of independent component analysis for process monitoring based on minimum mean square error, Journal of Process Control 22 (2012) 477-487.
[33] S. Wold, K. Esbensen, P. Geladi, Principal component analysis, Chemometrics and Intelligent Laboratory Systems 2 (1987) 37-52.
[34] Q. Wu, R. Law, Complex system fault diagnosis based on a fuzzy robust wavelet support vector classifier and an adaptive Gaussian particle swarm optimization, Information Sciences 180 (2010) 4514-4528.
[35] Y. Yao, F. Gao, Batch process monitoring in score space of two-dimensional dynamic principal component analysis (PCA), Industrial and Engineering Chemistry Research 46 (2007) 8033-8043.
[36] L. Zhang, Q. Cao, A novel ant-based clustering algorithm using the kernel method, Information Sciences 181 (2011) 4658-4672.
[37] Y. Zhang, Fault detection and diagnosis of nonlinear processes using improved kernel independent component analysis (KICA) and support vector machine (SVM), Industrial and Engineering Chemistry Research 47 (2008) 6961-6971.
[38] Y. Zhang, Enhanced statistical analysis of nonlinear processes using KPCA, KICA and SVM, Chemical Engineering Science 64 (2009) 801-811.
[39] Y. Zhang, C. Ma, Fault diagnosis of nonlinear processes using multiscale KPCA and multiscale KPLS, Chemical Engineering Science 66 (2011) 64-72.
[40] C. Zhao, F. Gao, F. Wang, Nonlinear batch process monitoring using phase-based kernel-independent component analysis-principal component analysis (KICA-PCA), Industrial and Engineering Chemistry Research 48 (2009) 9163-9174.