Information Sciences 309 (2015) 138–162
Bias-correction fuzzy clustering algorithms

Miin-Shen Yang a,*, Yi-Cheng Tian a,b

a Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li 32023, Taiwan
b Center for General Education, Hsin Sheng College of Medical Care and Management, Longtan, Taiwan
Article history: Received 6 July 2013; Received in revised form 13 February 2015; Accepted 8 March 2015; Available online 14 March 2015.

Keywords: Cluster analysis; Fuzzy clustering; Fuzzy c-means (FCM); Initialization; Bias correction; Probability weight
Abstract. Fuzzy clustering is generally an extension of hard clustering and is based on fuzzy membership partitions. In fuzzy clustering, the fuzzy c-means (FCM) algorithm is the most commonly used clustering method. Numerous studies have presented various generalizations of the FCM algorithm. However, the FCM algorithm and its generalizations are usually affected by initializations. In this paper, we propose a bias-correction term with an updating equation to adjust the effects of initializations on fuzzy clustering algorithms. We first propose the so-called bias-correction fuzzy clustering for the generalized FCM algorithm. We then construct the bias-correction FCM, bias-correction Gustafson and Kessel clustering, and bias-correction inter-cluster separation algorithms. We compare the proposed bias-correction fuzzy clustering algorithms with other fuzzy clustering algorithms using numerical examples, and we also apply the bias-correction fuzzy clustering algorithms to real data sets. The results indicate the superiority and effectiveness of the proposed bias-correction fuzzy clustering methods. © 2015 Elsevier Inc. All rights reserved.
1. Introduction

Clustering is a method for determining the cluster structure of a data set such that objects within the same cluster demonstrate maximum similarity and objects within different clusters demonstrate maximum dissimilarity. Numerous clustering theories and methods have been studied in the literature (see Jain and Dubes [10] and Kaufman and Rousseeuw [11]). In general, the most well-known approaches are partitional clustering methods based on an objective function of similarity or dissimilarity measures. Among partitional clustering methods, the k-means (see MacQueen [14] and Pollard [20]), fuzzy c-means (FCM) (see Bezdek [2] and Yang [23]), and possibilistic c-means (PCM) algorithms (see Krishnapuram and Keller [12], Honda et al. [8], and Yang and Lai [24]) are the most commonly used approaches. Fuzzy clustering has received considerable attention in the clustering literature. In fuzzy clustering, the FCM algorithm is the most well-known clustering algorithm. Previous studies have proposed numerous extensions of FCM clustering (see Gath and Geva [4], Gustafson and Kessel [5], Hathaway et al. [6], Honda and Ichihashi [7], Husseinzadeh Kashan et al. [9], Miyamoto et al. [15], Pedrycz [17], Pedrycz and Bargiela [18], Wu and Yang [22], Yang et al. [25], and Yu and Yang [26]). Regarding the generalization of FCM clustering, Yu and Yang [26] proposed a generalized FCM (GFCM) model to unify numerous variations of FCM. However, initializations affect FCM clustering and its generalizations. In this paper, we introduce a bias-correction approach that uses an updating equation to adjust the effects of initial values, and we then propose bias-correction fuzzy clustering methods.
The rest of this paper is organized as follows. Section 2 presents a brief review of the FCM and GFCM algorithms. Section 3 presents the procedures involved in deriving the bias-correction fuzzy clustering algorithms. In these procedures, a bias-correction term is first introduced through an updating equation. The bias-correction FCM (BFCM), inter-cluster separation (ICS), and Gustafson and Kessel (GK) algorithms are then proposed. The bias-correction term serves as the total information for fuzzy c-partitions, so that the proposed BFCM, BGK, and BICS algorithms can gradually adjust the effects of poor initializations. Section 4 presents comparisons between different clustering algorithms. In these comparisons, the number of optimal clustering results, error rates, and root mean squared errors (RMSEs) are used as performance evaluation criteria. Numerical and real data sets are used to demonstrate the effectiveness and usefulness of the proposed bias-correction algorithms. Finally, conclusions and a discussion are given in Section 5.

2. Fuzzy clustering algorithms

Let $X = \{x_1, \ldots, x_n\}$ be a set of $n$ data points in an $s$-dimensional real Euclidean space, and let $c$ be a positive integer greater than one. The FCM objective function [2,23] is expressed as follows:
$$J_m(\mu, a) = \sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m \|x_k - a_i\|^2 \qquad (1)$$
where $m > 1$ is the weighting exponent, $a = \{a_1, \ldots, a_c\}$ is the set of cluster centers, and the membership $\mu_{ik}$ represents the degree to which the data point $x_k$ belongs to cluster $i$, with

$$\mu = [\mu_{ik}]_{c \times n} \in M_{fcm} = \left\{ \mu = [\mu_{ik}]_{c \times n} \;\middle|\; \sum_{i=1}^{c}\mu_{ik} = 1,\ \mu_{ik} \ge 0,\ 0 < \sum_{k=1}^{n}\mu_{ik} < n \right\}$$

The FCM algorithm is iterated through the necessary conditions for a minimum of $J_m(\mu, a)$, namely the updating equations

$$a_i = \frac{\sum_{k=1}^{n}\mu_{ik}^m x_k}{\sum_{k=1}^{n}\mu_{ik}^m} \qquad (2)$$

$$\mu_{ik} = \frac{\|x_k - a_i\|^{-2/(m-1)}}{\sum_{j=1}^{c}\|x_k - a_j\|^{-2/(m-1)}} \qquad (3)$$

FCM algorithm
Step 1: Fix $2 \le c \le n$ and fix any $\varepsilon > 0$. Give an initial $a^{(0)}$ and let $t = 0$.
Step 2: Compute the membership $\mu^{(t+1)}$ with $a^{(t)}$ using Eq. (3).
Step 3: Update the cluster center $a^{(t+1)}$ with $\mu^{(t+1)}$ using Eq. (2).
Step 4: Compare $a^{(t+1)}$ to $a^{(t)}$ in a convenient matrix norm $\|\cdot\|$. IF $\|a^{(t+1)} - a^{(t)}\| < \varepsilon$, STOP; ELSE let $t = t + 1$ and return to Step 2.
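For readers who prefer a computational view, the following is a minimal sketch of the FCM iteration in Eqs. (2) and (3). The function and variable names are ours, not from the paper, and the random choice of initial centers is only one possible initialization.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, a0=None, rng=None):
    """Minimal FCM sketch: alternate Eq. (3) and Eq. (2) until the centers stop moving."""
    rng = np.random.default_rng(rng)
    n, _ = X.shape
    # Step 1: initial centers a^(0) (random data points if none are supplied).
    a = X[rng.choice(n, c, replace=False)] if a0 is None else np.asarray(a0, float)
    for _ in range(max_iter):
        # Step 2: membership update, mu_ik proportional to ||x_k - a_i||^(-2/(m-1))  (Eq. (3)).
        d2 = ((X[None, :, :] - a[:, None, :]) ** 2).sum(axis=2) + 1e-12      # c x n
        u = d2 ** (-1.0 / (m - 1.0))
        u /= u.sum(axis=0, keepdims=True)
        # Step 3: center update, a_i = sum_k mu_ik^m x_k / sum_k mu_ik^m  (Eq. (2)).
        um = u ** m
        a_new = (um @ X) / um.sum(axis=1, keepdims=True)
        # Step 4: stop when the centers change by less than eps.
        if np.linalg.norm(a_new - a) < eps:
            return a_new, u
        a = a_new
    return a, u
```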
The FCM algorithm is the most commonly used clustering algorithm. Numerous generalizations of the FCM algorithm exist. Yu and Yang [26] proposed a unified model, called the generalized FCM (GFCM). The GFCM objective function is expressed as follows:
$$J_m^h(\mu, a) = \sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m \left[ h_i(d(x_k, a_i)) - \frac{\gamma}{c}\sum_{j=1}^{c} h_0(d(a_i, a_j)) \right] \qquad (4)$$
where $\sum_{i=1}^{c}\mu_{ik} = f_k$, with $f_k \ge 0$ and $\gamma \ge 0$ constant weights; $h_i(x)$, $i = 0, 1, \ldots, c$, are continuous functions of $x \in [0, +\infty)$ whose derivatives satisfy $h_i'(x) > 0$ for all $x \in [0, +\infty)$; and $d(x_k, a_i)$ is the distance between the data point $x_k$ and the cluster center $a_i$. The GFCM framework enables modeling numerous FCM variants. By the Lagrange multiplier method, the necessary conditions for a minimum of $J_m^h(\mu, a)$ are obtained as follows:
$$a_i = \frac{\sum_{k=1}^{n}\mu_{ik}^m h_i'(d(x_k, a_i))\, x_k - \frac{2\gamma}{c}\sum_{j=1}^{c} h_0'(d(a_i, a_j))\, a_j}{\sum_{k=1}^{n}\mu_{ik}^m h_i'(d(x_k, a_i)) - \frac{2\gamma}{c}\sum_{j=1}^{c} h_0'(d(a_i, a_j))} \qquad (5)$$

$$\mu_{ik} = f_k\,\frac{\left(h_i(d(x_k, a_i))\right)^{-1/(m-1)}}{\sum_{j=1}^{c}\left(h_j(d(x_k, a_j))\right)^{-1/(m-1)}} \qquad (6)$$
The iterations with the updating Eqs. (5) and (6) are called the GFCM algorithm. Although the FCM and GFCM algorithms return excellent clustering results when good initial values are provided, these fuzzy clustering algorithms are always affected by initial values. An example illustrating this phenomenon is presented in the next section.

3. Bias-correction fuzzy clustering algorithms

In general, the FCM and GFCM algorithms are affected by initializations; that is, they may return poor clustering results when poor initializations are used. Therefore, to overcome this drawback of the FCM algorithm and its generalizations, such as the GFCM algorithm, we propose a bias-correction term for reducing the effects of poor initializations. First, consider a probability mass $p_i$ for the cluster center $a_i$, $i = 1, \ldots, c$, with $\sum_{i=1}^{c}p_i = 1$; the probability mass $p_i$ can be used to represent the proportion of the cluster center $a_i$ among the $c$ clusters. Theoretically, the term $-\ln(p_i)$ represents the information on the occurrence of the cluster center $a_i$. The total information based on the fuzzy c-partitions $\mu_{ik}$ can thus be expressed as $-\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\ln(p_i)$, which can be regarded as an entropy. An optimal $p_i$ can be determined by minimizing this entropy so as to obtain the most information for $p_i$. We therefore use $-w\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\ln(p_i)$ as a bias-correction term added to the GFCM objective function of Eq. (4) as follows:
$$J_m^h(\mu, a, p) = \sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m \left[ h_i(d(x_k, a_i)) - \frac{\gamma}{c}\sum_{j=1}^{c} h_0(d(a_i, a_j)) \right] - w\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m \ln(p_i) \qquad (7)$$
By the Lagrange multiplier method, we obtain the necessary conditions for a minimum of $J_m^h(\mu, a, p)$ as follows:
$$a_i = \frac{\sum_{k=1}^{n}\mu_{ik}^m h_i'(d(x_k, a_i))\, x_k - \frac{2\gamma}{c}\sum_{j=1}^{c} h_0'(d(a_i, a_j))\, a_j}{\sum_{k=1}^{n}\mu_{ik}^m h_i'(d(x_k, a_i)) - \frac{2\gamma}{c}\sum_{j=1}^{c} h_0'(d(a_i, a_j))} \qquad (8)$$

$$\mu_{ik} = f_k\,\frac{\left(h_i(d(x_k, a_i)) - w\ln(p_i)\right)^{-1/(m-1)}}{\sum_{j=1}^{c}\left(h_j(d(x_k, a_j)) - w\ln(p_j)\right)^{-1/(m-1)}} \qquad (9)$$

$$p_i = \frac{\sum_{k=1}^{n}\mu_{ik}^m}{\sum_{j=1}^{c}\sum_{k=1}^{n}\mu_{jk}^m} \qquad (10)$$
For Eq. (9), we see that if $w \to \infty$, then $\mu_{ik} = f_k\,\frac{(-\ln(p_i))^{-1/(m-1)}}{\sum_{j=1}^{c}(-\ln(p_j))^{-1/(m-1)}}$; if $w \to 0$, then $\mu_{ik} = f_k\,\frac{(h_i(d(x_k, a_i)))^{-1/(m-1)}}{\sum_{j=1}^{c}(h_j(d(x_k, a_j)))^{-1/(m-1)}}$, as in Eq. (6) of the GFCM algorithm. Therefore, an updating equation may be used for the parameter $w$ with

$$w^{(t)} = (0.99)^t \qquad (11)$$
where $t$ is the number of iterations. The iterative algorithm with the updating Eqs. (8)–(10) and the decreasing learning rate of Eq. (11) is called the bias-correction GFCM algorithm. Next, three types of bias-correction GFCM algorithms, namely the BFCM, bias-correction GK (BGK), and bias-correction ICS (BICS) algorithms, are presented.

3.1. Bias-correction FCM algorithm

The bias-correction FCM (BFCM) objective function is expressed as follows:
$$J_m(\mu, a, p) = \sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\|x_k - a_i\|^2 - w\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\ln(p_i) \qquad (12)$$
subject to $\sum_{i=1}^{c}\mu_{ik} = 1$ and $\sum_{i=1}^{c}p_i = 1$. Thus, the BFCM algorithm is iterated under the necessary conditions by using the following updating equations:
$$a_i = \frac{\sum_{k=1}^{n}\mu_{ik}^m x_k}{\sum_{k=1}^{n}\mu_{ik}^m} \qquad (13)$$

$$\mu_{ik} = \frac{\left(\|x_k - a_i\|^2 - w\ln(p_i)\right)^{-1/(m-1)}}{\sum_{j=1}^{c}\left(\|x_k - a_j\|^2 - w\ln(p_j)\right)^{-1/(m-1)}} \qquad (14)$$

$$p_i = \frac{\sum_{k=1}^{n}\mu_{ik}^m}{\sum_{j=1}^{c}\sum_{k=1}^{n}\mu_{jk}^m} \qquad (15)$$
Similarly, we also consider the same decreasing learning rate with the updating Eq. (11) for the parameter $w$. The BFCM algorithm can be summarized as follows:

BFCM algorithm
Step 1: Fix $2 \le c \le n$ and fix any $\varepsilon > 0$. Give an initial $a^{(0)}$, $p^{(0)} = (1/c, 1/c, \ldots, 1/c)$, and let $t = 0$, $w^{(0)} = 1$.
Step 2: Learn the parameter $w^{(t)}$ using Eq. (11).
Step 3: Compute the membership $\mu^{(t+1)}$ with $a^{(t)}$ and $p^{(t)}$ using Eq. (14).
Step 4: Compute the probability weight $p^{(t+1)}$ using Eq. (15).
Step 5: Update the cluster center $a^{(t+1)}$ with $\mu^{(t+1)}$ using Eq. (13).
Step 6: Compare $a^{(t+1)}$ to $a^{(t)}$ in a convenient matrix norm $\|\cdot\|$. IF $\|a^{(t+1)} - a^{(t)}\| < \varepsilon$, STOP; ELSE let $t = t + 1$ and return to Step 2.
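As a concrete illustration, the BFCM steps above can be written as the short routine below. It is a sketch under our own naming conventions, using the schedule $w^{(t)} = (0.99)^t$ of Eq. (11) and the uniform initial weights $p^{(0)} = (1/c, \ldots, 1/c)$; it is not the authors' reference implementation.

```python
import numpy as np

def bfcm(X, c, m=2.0, eps=1e-5, max_iter=1000, a0=None, rng=None):
    """Bias-correction FCM sketch: Eqs. (13)-(15) with w(t) = 0.99**t (Eq. (11))."""
    rng = np.random.default_rng(rng)
    n, _ = X.shape
    a = X[rng.choice(n, c, replace=False)] if a0 is None else np.asarray(a0, float)
    p = np.full(c, 1.0 / c)                       # p^(0) = (1/c, ..., 1/c)
    for t in range(max_iter):
        w = 0.99 ** t                             # Eq. (11): decreasing learning rate
        # Eq. (14): memberships from the penalized distance ||x_k - a_i||^2 - w ln p_i.
        d2 = ((X[None, :, :] - a[:, None, :]) ** 2).sum(axis=2)
        pen = d2 - w * np.log(p)[:, None] + 1e-12
        u = pen ** (-1.0 / (m - 1.0))
        u /= u.sum(axis=0, keepdims=True)
        um = u ** m
        p = um.sum(axis=1) / um.sum()             # Eq. (15): probability weights
        a_new = (um @ X) / um.sum(axis=1, keepdims=True)   # Eq. (13): centers
        if np.linalg.norm(a_new - a) < eps:
            return a_new, u, p
        a = a_new
    return a, u, p
```

Because $p_i \le 1$, the penalty $-w\ln(p_i)$ is nonnegative, so the penalized distances in Eq. (14) stay nonnegative throughout the iterations.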
3.2. Bias-correction GK algorithm

Obviously, using the Euclidean distance as a distance measure leads to good results only when the data set contains spherical clusters. The FCM algorithm is therefore not ideal for analyzing a data set containing clusters with different shapes. To overcome this drawback, Gustafson and Kessel [5] accounted for different cluster shapes by replacing the squared Euclidean distance $d(x_j, a_i) = \|x_j - a_i\|^2$ in the FCM algorithm with the Mahalanobis distance $d(x_j, a_i) = \|x_j - a_i\|_{A_i}^2 = (x_j - a_i)^T A_i (x_j - a_i)$, and proposed the GK algorithm. The GK objective function is expressed as follows:

$$J_m^{GK}(\mu, a, A) = \sum_{j=1}^{n}\sum_{i=1}^{c}\mu_{ij}^m\|x_j - a_i\|_{A_i}^2 \qquad (16)$$
with $\mu \in M_{fcm}$, $a = (a_1, \ldots, a_c) \in R^{cs}$ and $A = \{A_1, \ldots, A_c\}$, in which each $A_i$ is positive definite with $\det(A_i) = \rho_i$. The necessary conditions for minimizing $J_m^{GK}(\mu, a, A)$ are the following updating equations:

$$a_i = \frac{\sum_{k=1}^{n}\mu_{ik}^m x_k}{\sum_{k=1}^{n}\mu_{ik}^m} \qquad (17)$$

$$\mu_{ik} = \frac{\|x_k - a_i\|_{A_i}^{-2/(m-1)}}{\sum_{j=1}^{c}\|x_k - a_j\|_{A_j}^{-2/(m-1)}} \qquad (18)$$

with $A_i = (\rho_i\det(S_i))^{1/s} S_i^{-1}$ and $S_i = \sum_{k=1}^{n}\mu_{ik}^m(x_k - a_i)(x_k - a_i)^T \big/ \sum_{k=1}^{n}\mu_{ik}^m$, $i = 1, \ldots, c$. Thus, the GK algorithm can be constructed from the updating Eqs. (17) and (18).
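The norm-inducing matrices $A_i$ of the GK and BGK algorithms can be computed directly from the fuzzy covariance matrices $S_i$. The helper below is a minimal sketch assuming $\rho_i = 1$ (a common choice in GK clustering, not a value fixed by this paper); the function names are ours.

```python
import numpy as np

def gk_norm_matrices(X, u, a, m=2.0, rho=1.0):
    """A_i = (rho_i det(S_i))^(1/s) S_i^{-1}, with S_i the fuzzy covariance of cluster i."""
    c, _ = u.shape
    s = X.shape[1]
    um = u ** m
    A = np.empty((c, s, s))
    for i in range(c):
        diff = X - a[i]                                              # n x s
        S_i = (um[i][:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(axis=0)
        S_i /= um[i].sum()
        A[i] = (rho * np.linalg.det(S_i)) ** (1.0 / s) * np.linalg.inv(S_i)
    return A

def mahalanobis_sq(X, a_i, A_i):
    """Squared GK distance ||x_k - a_i||_{A_i}^2 = (x_k - a_i)^T A_i (x_k - a_i)."""
    diff = X - a_i
    return np.einsum('ks,st,kt->k', diff, A_i, diff)
```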
By adding the bias-correction term $-w\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\ln(p_i)$, the BGK objective function can be expressed as follows:

$$J_m(\mu, a, p) = \sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\|x_k - a_i\|_{A_i}^2 - w\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\ln(p_i) \qquad (19)$$
subject to $\sum_{i=1}^{c}\mu_{ik} = 1$ and $\sum_{i=1}^{c}p_i = 1$. Thus, the BGK algorithm is iterated under the necessary conditions by using the following updating equations:
$$a_i = \frac{\sum_{j=1}^{n}\mu_{ij}^m x_j}{\sum_{j=1}^{n}\mu_{ij}^m} \qquad (20)$$

$$p_i = \frac{\sum_{k=1}^{n}\mu_{ik}^m}{\sum_{j=1}^{c}\sum_{k=1}^{n}\mu_{jk}^m} \qquad (21)$$

$$\mu_{ik} = \frac{\left(\|x_k - a_i\|_{A_i}^2 - w\ln(p_i)\right)^{-1/(m-1)}}{\sum_{j=1}^{c}\left(\|x_k - a_j\|_{A_j}^2 - w\ln(p_j)\right)^{-1/(m-1)}} \qquad (22)$$

with $A_i = (\rho_i\det(S_i))^{1/s} S_i^{-1}$ and $S_i = \sum_{k=1}^{n}\mu_{ik}^m(x_k - a_i)(x_k - a_i)^T \big/ \sum_{k=1}^{n}\mu_{ik}^m$, $i = 1, \ldots, c$. Similarly, we use the updating Eq. (11) for the parameter $w$. Thus, the BGK algorithm can be summarized as follows:
BGK algorithm
Step 1: Fix $2 \le c \le n$ and fix any $\varepsilon > 0$. Give an initial $a^{(0)}$, $p^{(0)} = (1/c, 1/c, \ldots, 1/c)$, and let $t = 0$, $w^{(0)} = 1$.
Step 2: Learn the parameter $w^{(t)}$ using Eq. (11).
Step 3: Compute the membership $\mu^{(t+1)}$ with $a^{(t)}$ and $p^{(t)}$ using Eq. (22).
Step 4: Compute the probability weight $p^{(t+1)}$ using Eq. (21).
Step 5: Update the cluster center $a^{(t+1)}$ with $\mu^{(t+1)}$ using Eq. (20).
Step 6: Compare $a^{(t+1)}$ to $a^{(t)}$ in a convenient matrix norm $\|\cdot\|$. IF $\|a^{(t+1)} - a^{(t)}\| < \varepsilon$, STOP; ELSE let $t = t + 1$ and return to Step 2.
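For completeness, the BGK membership step of Eq. (22) differs from the BFCM one only in the distance it uses. The sketch below assumes the matrices $A_i$ have already been computed (for instance with a helper such as the one given after Eq. (18)); the names are ours.

```python
import numpy as np

def bgk_memberships(X, a, A, p, w, m=2.0):
    """Eq. (22): mu_ik from the penalized GK distance ||x_k - a_i||_{A_i}^2 - w ln p_i."""
    c = a.shape[0]
    d2 = np.stack([np.einsum('ks,st,kt->k', X - a[i], A[i], X - a[i]) for i in range(c)])
    pen = d2 - w * np.log(p)[:, None] + 1e-12
    u = pen ** (-1.0 / (m - 1.0))
    return u / u.sum(axis=0, keepdims=True)
```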
3.3. Bias-correction ICS algorithm

For the GFCM objective function $J_m^h(\mu, a)$, if we take $h_i(d(x_k, a_i)) = \|x_k - a_i\|^2$ and $h_0(d(a_i, a_j)) = \|a_i - a_j\|^2$, then we obtain the objective function of the inter-cluster separation (ICS) fuzzy clustering proposed by Özdemir and Akarun [16] as follows:

$$J_m^{ICS}(\mu, a) = \frac{1}{n}\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\left(\|x_k - a_i\|^2 - \frac{\gamma}{c}\sum_{j=1}^{c}\|a_i - a_j\|^2\right), \qquad \gamma \ge 0 \qquad (23)$$
The necessary conditions for minimizing $J_m^{ICS}(\mu, a)$ are the following updating equations:

$$a_i = \frac{\frac{1}{n}\sum_{k=1}^{n}\mu_{ik}^m x_k - \frac{2\gamma}{c}\sum_{j=1}^{c}a_j}{\frac{1}{n}\sum_{k=1}^{n}\mu_{ik}^m - 2\gamma} \qquad (24)$$

$$\mu_{ik} = \frac{\|x_k - a_i\|^{-2/(m-1)}}{\sum_{j=1}^{c}\|x_k - a_j\|^{-2/(m-1)}} \qquad (25)$$
By adding the bias-correction term $-w\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\ln(p_i)$, the bias-correction ICS (BICS) objective function can be expressed as follows:

$$J_m^{BICS}(\mu, a, p) = \sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\left(\|x_k - a_i\|^2 - \frac{\gamma}{c}\sum_{j=1}^{c}\|a_i - a_j\|^2\right) - w\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\ln(p_i), \qquad \gamma \ge 0 \qquad (26)$$
The necessary conditions for minimizing $J_m^{BICS}(\mu, a, p)$ are the following updating equations:

$$a_i = \frac{\frac{1}{n}\sum_{k=1}^{n}\mu_{ik}^m x_k - \frac{2\gamma}{c}\sum_{j=1}^{c}a_j}{\frac{1}{n}\sum_{k=1}^{n}\mu_{ik}^m - 2\gamma} \qquad (27)$$

$$\mu_{ik} = \frac{\left(\|x_k - a_i\|^2 - w\ln(p_i)\right)^{-1/(m-1)}}{\sum_{j=1}^{c}\left(\|x_k - a_j\|^2 - w\ln(p_j)\right)^{-1/(m-1)}} \qquad (28)$$

$$p_i = \frac{\sum_{k=1}^{n}\mu_{ik}^m}{\sum_{j=1}^{c}\sum_{k=1}^{n}\mu_{jk}^m} \qquad (29)$$
Thus, the BICS algorithm can be summarized as follows:

BICS algorithm
Step 1: Fix $2 \le c \le n$ and fix any $\varepsilon > 0$. Give an initial $a^{(0)}$, $p^{(0)} = (1/c, 1/c, \ldots, 1/c)$, and let $t = 0$, $w^{(0)} = 1$.
Step 2: Learn the parameter $w^{(t)}$ using Eq. (11).
Step 3: Compute the membership $\mu^{(t+1)}$ with $a^{(t)}$ and $p^{(t)}$ using Eq. (28).
Step 4: Compute the probability weight $p^{(t+1)}$ using Eq. (29).
Step 5: Update the cluster center $a^{(t+1)}$ with $\mu^{(t+1)}$ using Eq. (27).
Step 6: Compare $a^{(t+1)}$ to $a^{(t)}$ in a convenient matrix norm $\|\cdot\|$. IF $\|a^{(t+1)} - a^{(t)}\| < \varepsilon$, STOP; ELSE let $t = t + 1$ and return to Step 2.
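A sketch of one BICS iteration, in the order of the steps above (Eqs. (28), (29), then (27)), follows. Evaluating the separation term of Eq. (27) with the previous centers is our own implementation choice, and the function names are ours.

```python
import numpy as np

def bics_step(X, a_prev, p, w, gamma, m=2.0):
    """One BICS pass: memberships (Eq. (28)), weights (Eq. (29)), centers (Eq. (27))."""
    n, _ = X.shape
    c = a_prev.shape[0]
    # Eq. (28): memberships from the penalized distance ||x_k - a_i||^2 - w ln p_i.
    d2 = ((X[None, :, :] - a_prev[:, None, :]) ** 2).sum(axis=2)
    pen = d2 - w * np.log(p)[:, None] + 1e-12
    u = pen ** (-1.0 / (m - 1.0))
    u /= u.sum(axis=0, keepdims=True)
    um = u ** m
    # Eq. (29): probability weights from the new memberships.
    p_new = um.sum(axis=1) / um.sum()
    # Eq. (27): centers; the separation term sums the previous centers a_j.
    num = (um @ X) / n - (2.0 * gamma / c) * a_prev.sum(axis=0)
    den = um.sum(axis=1) / n - 2.0 * gamma
    a = num / den[:, None]
    return a, u, p_new
```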
The main differences between the fuzzy clustering algorithms, such as FCM, ICS and GK, and the proposed bias-correction fuzzy clustering algorithms, such as BFCM, BICS and BGK, are as follows. In the proposed algorithms, the probability mass $p_i$ of the cluster center $a_i$ is calculated by $p_i = \sum_{k=1}^{n}\mu_{ik}^m \big/ \sum_{j=1}^{c}\sum_{k=1}^{n}\mu_{jk}^m$, and the fuzzy c-partition $\mu_{ik}$ is updated by replacing the distance between the data point $x_k$ and the cluster center $a_i$, such as $\|x_k - a_i\|^2$, with (distance $- w\ln(p_i)$), such as $\|x_k - a_i\|^2 - w\ln(p_i)$. Furthermore, in the proposed bias-correction fuzzy clustering algorithms, an updating equation is used for the parameter $w$ with $w^{(t)} = (0.99)^t$, which ensures that the bias correction decreases as more iteration steps are conducted. In this sense, the proposed bias-correction fuzzy clustering algorithms can gradually adjust the bias caused by the effects of poor initializations. We subsequently conducted experiments to demonstrate the robustness of the proposed bias-correction fuzzy clustering algorithms to initializations.
Fig. 1. Ten-cluster data set, where '*' denotes the true cluster centers.
Fig. 2. Trajectories of initializations for FCM, where the numbers 1–10 denote the 10 initializations and '*' denotes the true cluster centers.
Fig. 3. Trajectories of initializations for BFCM, where the numbers 1–10 denote the 10 initializations and '*' denotes the true cluster centers.
Fig. 4. (a) Nine-cluster data set with 9 initial cluster centers; (b)–(j) cluster centers after 1, 5, 10, 20, 30, 40, 50, 60 and 70 iterations, respectively, from FCM; (k) convergent cluster centers after 74 iterations from FCM; and (l) final clustering results from FCM.
3.4. Analysis of the influence of initializations

We examined the correction behavior of the bias-correction term $-w\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^m\ln(p_i)$ with respect to initializations. Viewed in its mathematical form, the bias-correction term is positive, so the BFCM objective function $J_m(\mu, a, p)$ is greater than the FCM objective function $J_m(\mu, a)$. Moreover, the bias-correction term presents the total information on the occurrence of the cluster centers with weighted fuzzy c-partitions. This is analogous to heating the cost function so that it becomes more flexible for adjusting poor initializations, and then cooling the system down through the decreasing learning rate $w^{(t)} = (0.99)^t$. To analyze the influence of initializations, we present several examples. Because the BFCM, BGK, and BICS algorithms all add the same bias-correction term, we present the initialization analysis only for the BFCM algorithm. We first illustrate the correction behavior of the bias-correction term by tracing the routes of the cluster centers during the iterations in Example 1. We then present further examples that analyze the correction behavior of the bias-correction term by using mean squared errors (MSEs).

Example 1. We used a simulated data set in this example. As illustrated in Fig. 1, the data set comprised 10 clusters, each containing 100 data points; '*' denotes the true cluster centers. We chose the 10 initial cluster centers $a_1^{(0)} = (0, 7.5)$, $a_2^{(0)} = (2, 7.5)$, $a_3^{(0)} = (4, 7.5)$, $a_4^{(0)} = (6, 7.5)$, $a_5^{(0)} = (8, 7.5)$, $a_6^{(0)} = (0, 0)$, $a_7^{(0)} = (3.5, 0)$, $a_8^{(0)} = (7, 0)$, $a_9^{(0)} = (0, 1.5)$, and $a_{10}^{(0)} = (7, 1.5)$ for both the FCM and BFCM algorithms, where the numbers 1–10 denote the 10 initializations, as shown in Figs. 2 and 3. Fig. 2 depicts the initialization trajectories for the FCM algorithm, indicating that the two initializations at $a_1^{(0)} = (0, 7.5)$ and $a_2^{(0)} = (2, 7.5)$ approach the same cluster center; consequently, no initialization approached the true cluster center (3.5, 0.5). Fig. 3 shows the initialization trajectories for the BFCM algorithm, indicating that some routes are affected by the bias correction; consequently, all initializations finally approached the true cluster centers. These trajectories demonstrate the effect of the bias-correction term on the initializations. In summary, the BFCM algorithm is more robust against initializations than the FCM algorithm is.

Example 2. We again used a simulated data set in this example. The data set comprised nine clusters, each containing 100 data points, as shown in Figs. 4(a) and 6(a). We chose the 9 initial cluster centers $a_1^{(0)} = (3.4, 1.6)$, $a_2^{(0)} = (3.3, 0.6)$, $a_3^{(0)} = (2.6, 2.3)$, $a_4^{(0)} = (0.4, 4.3)$, $a_5^{(0)} = (4.6, 2.0)$, $a_6^{(0)} = (3.4, 0.5)$, $a_7^{(0)} = (5.1, 2.1)$, $a_8^{(0)} = (3.4, 0.9)$, and $a_9^{(0)} = (0.1, 2.1)$ for both the FCM and BFCM algorithms, as shown in Figs. 4(a) and 6(a). Fig. 4 illustrates the cluster centers after 1, 10, 30, 60, and 74 iterations of the FCM algorithm; the final clustering results are also shown in this figure. Fig. 5 shows the plot of the MSEs of the FCM algorithm over the iterations. Furthermore, Fig. 6 illustrates the cluster centers after 1, 10, 20, 30, 40, 50, 60, 100, 500 and 507 iterations of the BFCM algorithm as well as the final clustering results. Fig. 7 depicts the plot of the MSEs of the BFCM algorithm over the iterations. The MSE plots of both the FCM and BFCM algorithms for the first 70 iterations are combined in Fig. 8 for comparison. As shown in Figs. 4, 5, and 8, the FCM algorithm adjusts the initializations at an extremely fast rate, which prevents it from reaching the actual cluster centers. The BFCM algorithm gradually adjusts the initializations and then reaches the true cluster centers after iteration t = 60 (Figs. 6–8). This behavior reflects the effect of the bias-correction term on adjusting the initialization. In general, the BFCM algorithm is more robust against initializations than the FCM algorithm is.
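In these examples, clustering quality is summarized by the MSE between the recovered and the true cluster centers. Because the recovered centers come back in an arbitrary order, they must first be paired with the true ones; the helper below is one plausible way to compute such a center MSE (our own sketch using the Hungarian assignment from SciPy, not code from the paper).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def center_mse(true_centers, est_centers):
    """MSE between true and estimated centers after optimally pairing them
    (Hungarian assignment on squared Euclidean distances)."""
    true_centers = np.asarray(true_centers, float)
    est_centers = np.asarray(est_centers, float)
    cost = ((true_centers[:, None, :] - est_centers[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()
```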
Example 3. This example is based on Example 2. However, we used nine additional initial cluster centers, namely $a_1^{(0)} = (1.1, 4.7)$, $a_2^{(0)} = (4.2, 4.1)$, $a_3^{(0)} = (1.1, 2.8)$, $a_4^{(0)} = (4.8, 2.6)$, $a_5^{(0)} = (1.6, 4.6)$, $a_6^{(0)} = (0.8, 1.2)$, $a_7^{(0)} = (3.1, 3.7)$, $a_8^{(0)} = (2.3, 1.7)$, and $a_9^{(0)} = (1.6, 2.7)$, for both the FCM and BFCM algorithms.
Fig. 5. Plot of MSEs from FCM for different iterations.
Fig. 6. (a) Nine-cluster data set with 9 initial cluster centers; (b)–(j) cluster centers after 1, 10, 20, 30, 40, 50, 60, 100 and 500 iterations, respectively, from BFCM; (k) convergent cluster centers after 507 iterations from BFCM; and (l) final clustering results from BFCM.
Fig. 7. Plot of MSEs from BFCM.
Fig. 8. Comparison between the MSEs from FCM and BFCM for the first 70 iterations.
Fig. 9. MSE results from FCM and BFCM with various initializations.
Table 1
MSEs of FCM and BFCM with different initializations of $a_1^{(0)}$.

$a_1^{(0)}$   (1.1, 0.7)  (1.1, 1.7)  (1.1, 2.7)  (1.1, 3.7)  (1.1, 4.7)  (1.1, 5.7)  (1.1, 6.7)  (1.1, 7.7)
FCM           0.3417      0.0007      0.5183      0.5934      0.0007      0.6210      0.6207      0.8709
BFCM          0.0007      0.0007      0.0007      0.0007      0.0007      0.0007      0.0007      0.0007
t
t
(0.9) & Iteration =78
t
(0.99) & Iteration =329
6
6
6
5
5
5
4
4
4
3
3
3
2
2
2
1
1
1
0
0
0
-1 -2 -1 0
1
2
3
4
5
6
-1 -2 -1 0
7
1
(a)
2
3
4
5
6
7
-1 -2 -1 0
5
5
5
4
4
4
3
3
3
2
2
2
1
1
1
0
0
0
4
5
6
-1 -2 -1 0
7
(d)
1
2
4
5
6
7
(0.999999) & Iteration =224 6
3
3
t
(0.99999) & Iteration =224 6
2
2
(c)
t
t
(0.9999) & Iteration =5268
1
1
(b)
6
-1 -2 -1 0
(0.999) & Iteration =1933
3
4
5
6
7
-1 -2 -1 0
(e)
1
2
3
4
5
6
7
(f)
Fig. 10. BFCM convergent cluster centers using different updating equations for the parameter w with their convergent iterations; (a) w = (0.9)t, 78 iterations; (b) w = (0.99)t, 329 iterations; (c) w = (0.999)t, 1933 iterations; (d) w = (0.9999)t, 5268 iterations; (e) w = (0.99999)t, 224 iterations; and (f) w = (0.999999)t, 224 iterations.
We determined that both the FCM and BFCM algorithms returned optimal final clustering results with extremely low MSEs (Fig. 9, fifth point). The final clustering results of the FCM algorithm may change even when the initialization changes only slightly, whereas the BFCM algorithm still produces stable final clustering results with an extremely low MSE. These procedures were conducted by assigning various values to the second component of $a_1^{(0)}$, as shown in Table 1. For example, $a_1^{(0)} = (1.1, 0.7)$ is the first point and $a_1^{(0)} = (1.1, 4.7)$ is the fifth point, with their respective MSEs shown in Table 1 and Fig. 9. As shown in Table 1 and Fig. 9, the FCM algorithm is sensitive to initializations and its final clustering results can change, whereas the BFCM algorithm is not sensitive to initializations and its final clustering results are stable.

Example 4. In this example, we studied different learning behaviors for the parameter w. We executed the BFCM algorithm on the data set employed in Example 2. We first assessed the learning approach with the updating equation $w = (0.9)^t$; the results indicate that the initializations are adjusted at an extremely fast rate, which prevents them from reaching optimal cluster centers, as shown in Fig. 10(a). Similarly, the clustering results obtained using the updating equations $w = (0.99999)^t$ and $w = (0.999999)^t$ (Fig. 10(e) and (f), respectively) are not optimal; in these cases the initializations are adjusted at an extremely slow rate, which prevents them from approaching optimal cluster centers. For the clustering results obtained using the updating equations $w = (0.99)^t$, $w = (0.999)^t$ and $w = (0.9999)^t$ (Fig. 10(b)–(d)), the initializations can be adjusted toward optimal cluster centers. However, the MSE of the updating equation $w = (0.99)^t$ is lower than those of $w = (0.999)^t$ and $w = (0.9999)^t$, as shown in Table 2 and Fig. 11. In general, we recommend using the updating equation $w = (0.99)^t$ as the decreasing learning approach for the parameter w in the bias-correction term.
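The contrast between the schedules in Table 2 can be made concrete by checking how many iterations each base needs before the bias term essentially vanishes; the snippet below computes this for an illustrative threshold of 0.01 (the threshold is our choice, not a value from the paper).

```python
import math

# Iterations needed for w(t) = base**t to fall below a small threshold.
for base in (0.9, 0.99, 0.999, 0.9999, 0.99999, 0.999999):
    t = math.ceil(math.log(0.01) / math.log(base))
    print(f"base={base}: w(t) < 0.01 after about {t} iterations")
```

A base of 0.9 removes the correction within a few dozen iterations, while bases very close to 1 keep it active for thousands of iterations, which matches the "too fast" versus "too slow" behavior reported above.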
Table 2
MSEs of BFCM and iteration numbers for various updating equations of w.

w            (0.9)^t   (0.99)^t  (0.999)^t  (0.9999)^t  (0.99999)^t  (0.999999)^t
MSE          0.5642    0.0007    0.0011     0.0416      0.2245       0.2259
Iterations   78        329       1933       5268        224          224
Fig. 11. MSEs of BFCM for the various updating equations of w.
3.5. Convergence properties of the BFCM clustering algorithm

We establish a convergence theorem for the proposed BFCM clustering algorithm; that is, we can guarantee that any BFCM convergent subsequence tends to optimal solutions. Zangwill's convergence theorem [27] must be applied first. Zangwill [27] originally defined a point-to-set map $T : V \to P(V)$, where $P(V)$ represents the power set of $V$, for which a closed point-to-set map must be defined. However, the algorithm of interest here is a point-to-point map, and the "closed" property is exactly "continuity" in the point-to-point case. Thus, Zangwill's convergence theorem [27] is stated as follows.

Zangwill's convergence theorem [27]. Let the point-to-point map $T : V \to V$ generate a sequence $\{z_k\}_{k=0}^{\infty}$ by $z_{k+1} = T(z_k)$. Let a solution set $\Omega \subset V$ be given, and suppose that: (1) there is a continuous function $Z : V \to R$ such that (a) if $z \notin \Omega$, then $Z(T(z)) < Z(z)$, and (b) if $z \in \Omega$, then $Z(T(z)) \le Z(z)$; (2) the map $T$ is continuous on $V \setminus \Omega$; (3) all points $z_k$ are contained in a compact set $S \subseteq V$. Then the limit of any convergent subsequence lies in the solution set $\Omega$, and $Z(z_k)$ monotonically converges to $Z(z)$ for some $z \in \Omega$.
We set $M_{fcm} = \left\{\mu = [\mu_{ik}]_{c\times n} \mid \sum_{i=1}^{c}\mu_{ik} = 1,\ \mu_{ik} \ge 0,\ 0 < \sum_{k=1}^{n}\mu_{ik} < n\right\}$, $M_p = \left\{p = [p_i]_{c\times 1} \mid \sum_{i=1}^{c}p_i = 1,\ p_i \ge 0\right\}$ and $a = (a_1^T, \ldots, a_c^T)^T$. Let

$$\Omega_F = \left\{(\mu^*, a^*, p^*) \;\middle|\; \begin{array}{l} \forall\,\mu \in M_{fcm},\ \mu \ne \mu^*:\ J_m(\mu^*, a^*, p^*) < J_m(\mu, a^*, p^*) \\ \forall\,a \ne a^*:\ J_m(\mu^*, a^*, p^*) < J_m(\mu^*, a, p^*) \\ \forall\,p \in M_p,\ p \ne p^*:\ J_m(\mu^*, a^*, p^*) < J_m(\mu^*, a^*, p) \end{array}\right\}$$
where $\mu^* = [\mu_{ik}^*]_{c\times n}$ with $\mu_{ik}^* = \dfrac{(\|x_k - a_i^*\|^2 - w\ln(p_i^*))^{-1/(m-1)}}{\sum_{j=1}^{c}(\|x_k - a_j^*\|^2 - w\ln(p_j^*))^{-1/(m-1)}}$, $a^* = (a_1^{*T}, \ldots, a_c^{*T})^T$ with $a_i^* = \dfrac{\sum_{k=1}^{n}\mu_{ik}^{*m}x_k}{\sum_{k=1}^{n}\mu_{ik}^{*m}}$, and $p^* = [p_i^*]_{c\times 1}$ with $p_i^* = \dfrac{\sum_{k=1}^{n}\mu_{ik}^{*m}}{\sum_{j=1}^{c}\sum_{k=1}^{n}\mu_{jk}^{*m}}$. Let $E : (R^s)^c \times M_p \to M_{fcm}$ with $E(a, p) = \mu = [\mu_{ik}]_{c\times n}$, where $\mu_{ik}$ is calculated via $\mu_{ik} = \dfrac{(\|x_k - a_i\|^2 - w\ln(p_i))^{-1/(m-1)}}{\sum_{j=1}^{c}(\|x_k - a_j\|^2 - w\ln(p_j))^{-1/(m-1)}}$.
Let $F : M_{fcm} \to (R^s)^c$, $F(\mu) = a = (a_1^T, \ldots, a_c^T)^T$, where $a_i$ is calculated via $a_i = \dfrac{\sum_{k=1}^{n}\mu_{ik}^m x_k}{\sum_{k=1}^{n}\mu_{ik}^m}$. Let $G : M_{fcm} \to M_p$, $G(\mu) = p = [p_i]_{c\times 1}$, where $p_i$ is calculated by $p_i = \dfrac{\sum_{k=1}^{n}\mu_{ik}^m}{\sum_{j=1}^{c}\sum_{k=1}^{n}\mu_{jk}^m}$. We next define the BFCM operator as follows.
Definition 1. The BFCM operator $T_m : M_{fcm}\times(R^s)^c\times M_p \to M_{fcm}\times(R^s)^c\times M_p$ is defined by $T_m = A_2 \circ A_1$, where $A_1 : M_{fcm}\times(R^s)^c\times M_p \to M_{fcm}$ with $A_1(\mu, a, p) = E(a, p)$ and $A_2 : M_{fcm} \to M_{fcm}\times(R^s)^c\times M_p$ with $A_2(\mu) = (\mu, F(\mu), G(\mu))$. Thus, we have

$$T_m(\mu, a, p) = (A_2 \circ A_1)(\mu, a, p) = A_2(A_1(\mu, a, p)) = A_2(E(a, p)) = \big(E(a, p),\ F(E(a, p)),\ G(E(a, p))\big) = (\mu^*, a^*, p^*)$$

where $\mu^* = E(a, p)$, $a^* = F(E(a, p)) = F(\mu^*)$ and $p^* = G(E(a, p)) = G(\mu^*)$. In general, the sufficient and necessary conditions for a strict minimizer of an objective function are obtained by analyzing the Jacobian and Hessian matrices. When constraints are present, however, Lagrange multipliers together with a bordered Hessian matrix must be considered, as follows.

Theorem 1 (Lagrange's theorem; see Werner and Sotskov [21], pp. 425–426). Let the functions $f : D_f \to R$, $D_f \subseteq R^n$, and $g_i : D_{g_i} \to R$, $D_{g_i} \subseteq R^n$, $i = 1, \ldots, t$, $t < n$, be continuously partially differentiable, and let $x^0 = (x_1^0, x_2^0, \ldots, x_n^0) \in D_f$ be a local extreme point of the function $f$ subject to the constraints $g_i(x_1, x_2, \ldots, x_n) = 0$, $i = 1, 2, \ldots, t$. Let $L(x, \lambda) = f(x_1, x_2, \ldots, x_n) + \sum_{i=1}^{t}\lambda_i g_i(x_1, x_2, \ldots, x_n)$ and
$$|J| = \begin{vmatrix} \frac{\partial g_1(x)}{\partial x_1} & \cdots & \frac{\partial g_1(x)}{\partial x_t} \\ \vdots & & \vdots \\ \frac{\partial g_t(x)}{\partial x_1} & \cdots & \frac{\partial g_t(x)}{\partial x_t} \end{vmatrix} \ne 0$$
at the point $x^0$. Then the gradient of $L(x, \lambda)$ at the point $(x^0, \lambda^0)$ is zero, i.e. $\nabla L(x^0, \lambda^0) = 0$.

Theorem 2 (local sufficient conditions; see Werner and Sotskov [21], pp. 426–427). Let the functions $f : D_f \to R$, $D_f \subseteq R^n$, and $g_i : D_{g_i} \to R$, $D_{g_i} \subseteq R^n$, $i = 1, \ldots, t$, $t < n$, be twice continuously partially differentiable, and let $(x^0, \lambda^0)$ with $x^0 \in D_f$ be a solution of the system $\nabla L(x, \lambda) = 0$. Let
$$H_L(x, \lambda) = \begin{bmatrix} 0 & \cdots & 0 & \frac{\partial^2 L}{\partial\lambda_1\partial x_1} & \cdots & \frac{\partial^2 L}{\partial\lambda_1\partial x_n} \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & \frac{\partial^2 L}{\partial\lambda_t\partial x_1} & \cdots & \frac{\partial^2 L}{\partial\lambda_t\partial x_n} \\ \frac{\partial^2 L}{\partial x_1\partial\lambda_1} & \cdots & \frac{\partial^2 L}{\partial x_1\partial\lambda_t} & \frac{\partial^2 L}{\partial x_1\partial x_1} & \cdots & \frac{\partial^2 L}{\partial x_1\partial x_n} \\ \vdots & & \vdots & \vdots & & \vdots \\ \frac{\partial^2 L}{\partial x_n\partial\lambda_1} & \cdots & \frac{\partial^2 L}{\partial x_n\partial\lambda_t} & \frac{\partial^2 L}{\partial x_n\partial x_1} & \cdots & \frac{\partial^2 L}{\partial x_n\partial x_n} \end{bmatrix}$$
be the bordered Hessian and consider its leading principal minors $H_r(x^0, \lambda^0)$ of order $r = 2t+1, 2t+2, \ldots, n+t$ at the point $(x^0, \lambda^0)$. Then the following statements hold:

(1) If all leading principal minors $H_r(x^0, \lambda^0)$, $2t+1 \le r \le n+t$, have the sign $(-1)^t$, then $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ is a local minimum point of the function $f$ subject to the constraints $g_i(x) = 0$, $i = 1, 2, \ldots, t$.
(2) If the signs of the leading principal minors $H_r(x^0, \lambda^0)$, $2t+1 \le r \le n+t$, alternate, with the sign of $H_{n+t}(x^0, \lambda^0) = |H_L(x^0, \lambda^0)|$ being that of $(-1)^n$, then $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ is a local maximum point of the function $f$ subject to the constraints $g_i(x) = 0$, $i = 1, 2, \ldots, t$.
(3) If neither the conditions of (1) nor those of (2) are satisfied, then $x^0$ is not a local extreme point of the function $f$ subject to the constraints $g_i(x) = 0$, $i = 1, 2, \ldots, t$. Here, the case in which one or several leading principal minors are zero is not considered a violation of condition (1) or (2).
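As a numerical illustration of how Theorem 2 is applied, the sign pattern of the leading principal minors of a bordered Hessian can be checked directly. The toy problem below (minimizing $f(x, y) = x^2 + y^2$ subject to $x + y = 1$) is our own example, not one from the paper; here $t = 1$, $n = 2$, and the single checked minor $H_3$ should carry the sign $(-1)^1$.

```python
import numpy as np

# Bordered Hessian for minimizing f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0.
# L = f + lambda * g; the border rows/columns hold the constraint gradient (1, 1).
H = np.array([[0.0, 1.0, 1.0],
              [1.0, 2.0, 0.0],
              [1.0, 0.0, 2.0]])

t, n = 1, 2                                   # one constraint, two variables
for r in range(2 * t + 1, n + t + 1):         # leading principal minors of order 2t+1, ..., n+t
    minor = np.linalg.det(H[:r, :r])
    print(f"H_{r} = {minor:.1f}")             # here H_3 = -4.0, i.e. the sign of (-1)^t
# All checked minors share the sign (-1)^t, so (1/2, 1/2) is a local minimum (Theorem 2, case (1)).
```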
Remark 1 (see Werner and Sotskov [21], p. 379). The leading principal minors of a matrix $A = (a_{ij})$ of order $n \times n$ are the determinants

$$D_k = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{vmatrix}, \qquad k = 1, 2, \ldots, n,$$
i.e. Dk is obtained from jAj by crossing out the last n k columns and rows. T ^ and p ¼ p ^ ; a; p ^ are fixed, then J m ðl ^Þ is minimized at a ¼ ðaT Lemma 1. If l ¼ l 1 ; . . . ; ac Þ Pn m l x ik k ai ¼ Pk¼1 ; 8i ¼ 1; . . . ; c. n m k¼1
T
if and only if
lik
Proof. Recall that J m ðl; a; pÞ ¼
Pn Pc k¼1
i¼1
Pn Pc
lmik kxk ai k2 w
Pn n lmik xk @J m X ¼ lmik ð2ðxk ai ÞÞ ¼ 0 ) ai ¼ Pk¼1 ; n m @ai k¼1 lik k¼1
k¼1
i¼1
lmik lnðpi Þ. With the gradient of Jm w.r.t. ai , we have
8i ¼ 1; . . . ; c
^ and p ¼ p ^ are fixed, then we Thus, we had proved the ‘‘only if’’ condition. We next prove the ‘‘if’’ condition as follows. If l ¼ l Pn Pn 1; if i ¼ j @J m @ 2 Jm m m have @a ¼ k¼1 lik ð2ðxk ai ÞÞ and @a @a ¼ 2 dij Is k¼1 lik , where dij is Kronecker index with dij ¼ . Thus, i i j 0; if i – j Pn P P n n m m m ^ ; a; p ^Þ w.r.t. a is 2 diag Is k¼1 l1k ; Is k¼1 l2k ; . . . ; Is k¼1 lck , and obviously, the Hessian the Hessian matrix of J m ðl Pn m lik xk T T ^ ; a; p ^Þ is minimized at a ¼ ðaT Pk¼1 ; . . . ; a Þ with a ¼ ; 8i ¼ 1; . . . ; c. h matrix is positive definite. That is, J m ðl n 1 c i m k¼1
Lemma 2. If Pn m l k¼1 ik pi ¼ Pc P n j¼1
k¼1
l ¼ l^ and a ¼ a^ are fixed, then Jm ðl^ ; a^; pÞ subject to lm jk
Pc
i¼1 pi
lik
¼ 1 is minimized at p ¼ ½pi c1 if and only if
; 8i ¼ 1; . . . ; c.
Proof. Let the Lagrangian function be
L1 ¼
n X c X
l
m ik kxk
n X c c X X ai k w lmik lnðpi Þ þ g pi 1
!
2
k¼1 i¼1
k¼1 i¼1
i¼1
where g is a Lagrangian multiplier. With the gradient of L1 w.r.t. pi and g, we have
8 n X > @L1 m 1 > > > @pi ¼ w lik pi þ g ¼ 0 < k¼1
> > @L1 > > : @g ¼
c X
) pi ¼
pi 1 ¼ 0
n wX
c X
g
i¼1
lmik ;
k¼1
pi ¼
c X n wX
g
lmik ¼ 1
i¼1 k¼1
i¼1
Pn m l k¼1 ik Thus, we have pi ¼ Pc P n
P P , g ¼ w ci¼1 nk¼1 lm ik and the ‘‘only if’’ condition is proved. We next prove the ‘‘if’’ condition by j¼1 Pn m Pn m 2 w l w l @ 2 L1 @ 2 L1 L1 k¼1 ik k¼1 ik 1 ^ are fixed, then @L ^ and a ¼ a ¼ þ g, @p ¼ dij , and @p ¼ @@g@p ¼ 1. Thus, the Theorem 2 as follows. If l ¼ l @pi pi p2 i @pj i @g i lm k¼1 jk
i
bordered Hessian matrix w.r.t. p and g is
$$H_{L_1}(p, \eta) = \begin{bmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & \frac{\partial^2 L_1}{\partial p_1\partial p_1} & 0 & \cdots & 0 \\ 1 & 0 & \frac{\partial^2 L_1}{\partial p_2\partial p_2} & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ 1 & 0 & \cdots & 0 & \frac{\partial^2 L_1}{\partial p_c\partial p_c} \end{bmatrix}$$

Note that we have only one constraint, i.e. $t = 1$, and so $(-1)^1 = -1 < 0$. We next check the leading principal minors as follows:
M.-S. Yang, Y.-C. Tian / Information Sciences 309 (2015) 138–162
0 H3 ðp ; g Þ ¼ 1 1
1 Pn
0 Pn m w l k¼1 2k p2 1
m k¼1 1k
w
l
p21
0
2
0 1 H4 ðp ; g Þ ¼ 1 1
1 Pn
p¼p ; g¼g
0 0 Pn m w l k¼1 3k p2
1
m k¼1 1k
w
P Pn w k¼1 lm w nk¼1 lm 1k 2k ¼ þ < 0; 2 2 p1 p2 p¼p ; g¼g
l
1
0
p21
w
0
Pn
m k¼1 2k
l
p22
0
0
3
p¼p ; g¼g
Pn Pn Pn Pn Pn 2 Pn m m m m m m w w2 w2 k¼1 l1k k¼1 l2k k¼1 l2k k¼1 l3k k¼1 l3k k¼1 l1k ¼ þ þ 1 < @lik ¼ mlik kxk ai k mlik w lnðpi Þ þ kk ¼ 0 1 m1 kk m1 c X ) l ¼ kxk ai k2 w lnðpi Þ ik @L2 > m lik 1 ¼ 0 > : @kk ¼ i¼1
and
kk
1 m1
m
¼ Pc
1 1
ðkxk ai k2 w lnðpi ÞÞm1 i¼1
. This implies that
153
M.-S. Yang, Y.-C. Tian / Information Sciences 309 (2015) 138–162
ik
l
1 m1 kxk ai k2 w lnðpi Þ ¼ 1 ; Pc xk aj 2 w lnðp Þ m1 j j¼1
kk ¼ mlm1 kxk ai k2 w lnðpi Þ ik
^ and p ¼ p ^ are fixed, and the ‘‘only if’’ condition is proved. We next prove the ‘‘if’’ condition by following Theorem 2. If a ¼ a 2 @L2 m1 then @ l ¼ mlik kxk ai k w lnðpi Þ þ kk , ik
@ 2 L2 ¼ dij dkr mðm 1Þlm2 kxk ai k2 w lnðpi Þ ; ik @ lik @ ljr Thus, the bordered Hessian matrix w.r.t.
2
0
61 6 6 HL2 ðlk ; kk Þ ¼ 6 . 6 .. 4 1
lk and kk is
1
1
1
@ 2 L2 @ l1k @ l1k
0
0
.. .
..
0
3
.. .
.
2
@ L2 @ lck @ lck
0
@ 2 L2 @ 2 L2 ¼ ¼1 @ lik @kk @kk @ lik
and
7 7 7 7 7 5
Similar as the proof in Lemma 2, we check all leading principle minors as follows:
0 1 1 2 m2 1 mðm 1Þ l x a w lnðp Þ 0 k k 1 k 1 H3 ðl ; k Þ ¼ 1k k k 2 m2 1 0 mðm 1Þl2k kxk a2 k w lnðp2 Þ lk ¼lk ; kk ¼kk 2 2 m2 ¼ mðm 1Þlm2