Bias-correction fuzzy clustering algorithms

Information Sciences 309 (2015) 138–162

Miin-Shen Yang a,*, Yi-Cheng Tian a,b

a Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li 32023, Taiwan
b Center for General Education, Hsin Sheng College of Medical Care and Management, Longtan, Taiwan

* Corresponding author. E-mail address: [email protected] (M.-S. Yang).

Article info

Article history: Received 6 July 2013; Received in revised form 13 February 2015; Accepted 8 March 2015; Available online 14 March 2015

Keywords: Cluster analysis; Fuzzy clustering; Fuzzy c-means (FCM); Initialization; Bias correction; Probability weight

Abstract

Fuzzy clustering is generally an extension of hard clustering and is based on fuzzy membership partitions. In fuzzy clustering, the fuzzy c-means (FCM) algorithm is the most commonly used clustering method, and numerous studies have presented generalizations of it. However, the FCM algorithm and its generalizations are usually affected by initializations. In this paper, we propose a bias-correction term with an updating equation to adjust the effects of initializations on fuzzy clustering algorithms. We first propose the bias-correction version of the generalized FCM algorithm. We then construct the bias-correction FCM, bias-correction Gustafson–Kessel, and bias-correction inter-cluster separation algorithms. We compare the proposed bias-correction fuzzy clustering algorithms with other fuzzy clustering algorithms using numerical examples, and we also apply them to real data sets. The results indicate the superiority and effectiveness of the proposed bias-correction fuzzy clustering methods.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Clustering is a method for determining the cluster structure of a data set such that objects within the same cluster demonstrate maximum similarity and objects within different clusters demonstrate maximum dissimilarity. Numerous clustering theories and methods have been evaluated in the literature (see Jain and Dubes [10] and Kaufman and Rousseeuw [11]). In general, the most well-known approaches are partitional clustering methods based on an objective function of similarity or dissimilarity measures. Among partitional clustering methods, the k-means (see MacQueen [14] and Pollard [20]), fuzzy c-means (FCM) (see Bezdek [2] and Yang [23]), and possibilistic c-means (PCM) algorithms (see Krishnapuram and Keller [12], Honda et al. [8], and Yang and Lai [24]) are the most commonly used approaches. Fuzzy clustering has received considerable attention in the clustering literature. In fuzzy clustering, the FCM algorithm is the most well-known clustering algorithm. Previous studies have proposed numerous extensions of FCM clustering (see Gath and Geva [4], Gustafson and Kessel [5], Hathaway et al. [6], Honda and Ichihashi [7], Husseinzadeh Kashan et al. [9], Miyamoto et al. [15], Pedrycz [17], Pedrycz and Bargiela [18], Wu and Yang [22], Yang et al. [25], and Yu and Yang [26]). Regarding the generalization of FCM clustering, Yu and Yang [26] proposed a generalized FCM (GFCM) model that unifies numerous variations of FCM. However, initializations affect FCM clustering and its generalizations. In this paper, we introduce a bias-correction approach that uses an updating equation to adjust the effects of initial values, and we then propose the corresponding bias-correction fuzzy clustering methods.


The rest of this paper is organized as follows. Section 2 presents a brief review of the FCM and GFCM algorithms. Section 3 presents the procedures involved in deriving the bias-correction fuzzy clustering algorithms. In these procedures, a bias-correction term is first constructed together with an updating equation; the bias-correction FCM (BFCM), bias-correction Gustafson–Kessel (BGK), and bias-correction inter-cluster separation (BICS) algorithms are then proposed. The bias-correction term carries the total information for fuzzy c-partitions so that the proposed algorithms can gradually adjust the effects of poor initializations. Section 4 presents comparisons between different clustering algorithms, in which the number of optimal clustering results, error rates, and root mean squared errors (RMSEs) are used as performance evaluation criteria. Numerical and real data sets are used to demonstrate the effectiveness and usefulness of the proposed bias-correction algorithms. Finally, conclusions and discussion are stated in Section 5.

2. Fuzzy clustering algorithms

Let $X = \{x_1, \ldots, x_n\}$ be a set of $n$ data points in an $s$-dimensional real Euclidean space, and let $c$ be a positive integer greater than one. The FCM objective function [2,23] is expressed as follows:

\[
J_m(\mu, a) = \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \, \|x_k - a_i\|^2 \tag{1}
\]

where $m > 1$ is the weighting exponent, $a = \{a_1, \ldots, a_c\}$ is the set of cluster centers, and the membership $\mu_{ik}$ represents the degree to which the data point $x_k$ belongs to cluster $i$, with

\[
\mu = [\mu_{ik}]_{c \times n} \in M_{fcm} = \left\{ \mu = [\mu_{ik}]_{c \times n} \;\middle|\; \sum_{i=1}^{c} \mu_{ik} = 1,\; \mu_{ik} \ge 0,\; 0 < \sum_{k=1}^{n} \mu_{ik} < n \right\}
\]

The necessary conditions for minimizing $J_m(\mu, a)$ are the following updating equations:

\[
a_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} x_k}{\sum_{k=1}^{n} \mu_{ik}^{m}} \tag{2}
\]

\[
\mu_{ik} = \frac{\|x_k - a_i\|^{-2/(m-1)}}{\sum_{j=1}^{c} \|x_k - a_j\|^{-2/(m-1)}} \tag{3}
\]

FCM algorithm
Step 1: Fix $2 \le c \le n$ and fix any $\varepsilon > 0$. Give an initial $a^{(0)}$ and let $t = 0$.
Step 2: Compute the membership $\mu^{(t+1)}$ with $a^{(t)}$ using Eq. (3).
Step 3: Update the cluster center $a^{(t+1)}$ with $\mu^{(t+1)}$ using Eq. (2).
Step 4: Compare $a^{(t+1)}$ to $a^{(t)}$ in a convenient matrix norm $\|\cdot\|$.
IF $\|a^{(t+1)} - a^{(t)}\| < \varepsilon$, STOP; ELSE let $t = t + 1$ and return to Step 2.
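To make the alternating updates concrete, the following is a minimal NumPy sketch of the FCM iteration described above; the function name fcm and its defaults (m = 2, random center initialization) are illustrative choices rather than specifications from the paper.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, rng=None):
    """Minimal FCM sketch: alternate Eq. (3) (memberships) and Eq. (2) (centers)."""
    rng = np.random.default_rng(rng)
    n, s = X.shape
    # Step 1: initial cluster centers chosen randomly from the data.
    a = X[rng.choice(n, size=c, replace=False)].copy()
    for _ in range(max_iter):
        # Eq. (3): mu_ik proportional to ||x_k - a_i||^(-2/(m-1)).
        d2 = ((X[None, :, :] - a[:, None, :]) ** 2).sum(axis=2)  # (c, n) squared distances
        d2 = np.fmax(d2, 1e-12)                                  # avoid division by zero
        u = d2 ** (-1.0 / (m - 1.0))
        u /= u.sum(axis=0, keepdims=True)                        # normalize over clusters
        # Eq. (2): weighted means with weights mu_ik^m.
        um = u ** m
        a_new = (um @ X) / um.sum(axis=1, keepdims=True)
        if np.linalg.norm(a_new - a) < eps:                      # Step 4 stopping rule
            a = a_new
            break
        a = a_new
    return u, a
```

For example, u, a = fcm(X, c=3) on an (n, s) data array X returns the c-by-n membership matrix and the c cluster centers.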

The FCM algorithm is the most commonly used clustering algorithm, and numerous generalizations of it exist. Yu and Yang [26] proposed a unified model, called the generalized FCM (GFCM). The GFCM objective function is expressed as follows:

\[
J_m^{h}(\mu, a) = \sum_{k=1}^{n} \sum_{i=1}^{c} \left[ \mu_{ik}^{m} \, h_i(d(x_k, a_i)) - \frac{\gamma}{c} \sum_{j=1}^{c} h_0\!\left(d(a_i, a_j)\right) \right] \tag{4}
\]

where $\sum_{i=1}^{c} \mu_{ik} = f_k$ with $f_k \ge 0$ and $\gamma \ge 0$ constant weights; $h_i(x)$, $i = 0, 1, \ldots, c$, are continuous functions of $x \in [0, +\infty)$ whose derivatives satisfy $h_i'(x) > 0$ for all $x \in [0, +\infty)$; and $d(x_k, a_i)$ is the distance between the data point $x_k$ and the cluster center $a_i$. The GFCM framework enables modeling of numerous FCM variants. Using Lagrange multipliers, the necessary conditions for a minimum of $J_m^{h}(\mu, a)$ are obtained as follows:


\[
a_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} \, h_i'(d(x_k, a_i))\, x_k \;-\; \frac{2\gamma}{c} \sum_{j=1}^{c} h_0'\!\left(d(a_i, a_j)\right) a_j}
           {\sum_{k=1}^{n} \mu_{ik}^{m} \, h_i'(d(x_k, a_i)) \;-\; \frac{2\gamma}{c} \sum_{j=1}^{c} h_0'\!\left(d(a_i, a_j)\right)} \tag{5}
\]

\[
\mu_{ik} = f_k \, \frac{\left( h_i(d(x_k, a_i)) \right)^{-\frac{1}{m-1}}}
                      {\sum_{j=1}^{c} \left( h_j(d(x_k, a_j)) \right)^{-\frac{1}{m-1}}} \tag{6}
\]

The iterations with the updating Eqs. (5) and (6) are called the GFCM algorithm. Although the FCM and GFCM algorithms return excellent clustering results when good initial values are provided, these fuzzy clustering algorithms are always affected by initial values. An example illustrating this phenomenon is presented in the next section.

3. Bias-correction fuzzy clustering algorithms

In general, the FCM and GFCM algorithms are affected by initializations; that is, they may return poor clustering results when poor initializations are used. Therefore, to overcome this drawback of the FCM algorithm and its generalizations, such as the GFCM algorithm, we propose a bias-correction term for reducing the effects of poor initializations. First, consider a probability mass $p_i$ for the cluster center $a_i$, $i = 1, \ldots, c$, with $\sum_{i=1}^{c} p_i = 1$; the probability mass $p_i$ can be used to represent the proportion of the cluster center $a_i$ among the $c$ clusters. Theoretically, the term $-\ln(p_i)$ represents the information on the occurrence of the cluster center $a_i$. The total information based on the fuzzy c-partitions $\mu_{ik}$ can thus be expressed as $-\sum_{k=1}^{n}\sum_{i=1}^{c} \mu_{ik}^{m} \ln(p_i)$, which can be regarded as an entropy. An optimal $p_i$ can be determined by minimizing this entropy to obtain the most information for $p_i$. In general, $-w \sum_{k=1}^{n}\sum_{i=1}^{c} \mu_{ik}^{m} \ln(p_i)$ is added as a bias-correction term to the GFCM objective function shown in Eq. (4) as follows:

\[
J_m^{h}(\mu, a, p) = \sum_{k=1}^{n} \sum_{i=1}^{c} \left[ \mu_{ik}^{m} \, h_i(d(x_k, a_i)) - \frac{\gamma}{c} \sum_{j=1}^{c} h_0\!\left(d(a_i, a_j)\right) \right] \;-\; w \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \ln(p_i) \tag{7}
\]

Using Lagrange multipliers, we can obtain the necessary conditions for a minimum of $J_m^{h}(\mu, a, p)$ as follows:

\[
a_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} \, h_i'(d(x_k, a_i))\, x_k \;-\; \frac{2\gamma}{c} \sum_{j=1}^{c} h_0'\!\left(d(a_i, a_j)\right) a_j}
           {\sum_{k=1}^{n} \mu_{ik}^{m} \, h_i'(d(x_k, a_i)) \;-\; \frac{2\gamma}{c} \sum_{j=1}^{c} h_0'\!\left(d(a_i, a_j)\right)} \tag{8}
\]

\[
\mu_{ik} = f_k \, \frac{\left( h_i(d(x_k, a_i)) - w \ln(p_i) \right)^{-\frac{1}{m-1}}}
                      {\sum_{j=1}^{c} \left( h_j(d(x_k, a_j)) - w \ln(p_j) \right)^{-\frac{1}{m-1}}} \tag{9}
\]

\[
p_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m}}{\sum_{j=1}^{c} \sum_{k=1}^{n} \mu_{jk}^{m}} \tag{10}
\]

For Eq. (9), we observe that if $w \to \infty$, then $\mu_{ik} = f_k \, (-\ln(p_i))^{-\frac{1}{m-1}} \big/ \sum_{j=1}^{c} (-\ln(p_j))^{-\frac{1}{m-1}}$; if $w \to 0$, then $\mu_{ik} = f_k \, (h_i(d(x_k, a_i)))^{-\frac{1}{m-1}} \big/ \sum_{j=1}^{c} (h_j(d(x_k, a_j)))^{-\frac{1}{m-1}}$, as shown in Eq. (6) of the GFCM algorithm. Therefore, an updating equation may be used for the parameter $w$ with

\[
w^{(t)} = (0.99)^t \tag{11}
\]

where $t$ is the number of iterations. The iterative algorithm with the updating Eqs. (8)–(10) and the decreasing learning rate of the updating Eq. (11) is called the bias-correction GFCM algorithm. Next, three special cases of the bias-correction GFCM algorithm, namely the BFCM, bias-correction GK (BGK), and bias-correction ICS (BICS) algorithms, are constructed.

3.1. Bias-correction FCM algorithm

The bias-correction FCM (BFCM) objective function is expressed as follows:

\[
J_m(\mu, a, p) = \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \, \|x_k - a_i\|^2 \;-\; w \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \ln(p_i) \tag{12}
\]

subject to $\sum_{i=1}^{c} \mu_{ik} = 1$ and $\sum_{i=1}^{c} p_i = 1$. Thus, the BFCM algorithm is iterated under the necessary conditions by using the following updating equations:

\[
a_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} x_k}{\sum_{k=1}^{n} \mu_{ik}^{m}} \tag{13}
\]

\[
\mu_{ik} = \frac{\left( \|x_k - a_i\|^2 - w \ln(p_i) \right)^{-\frac{1}{m-1}}}
               {\sum_{j=1}^{c} \left( \|x_k - a_j\|^2 - w \ln(p_j) \right)^{-\frac{1}{m-1}}} \tag{14}
\]

\[
p_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m}}{\sum_{j=1}^{c} \sum_{k=1}^{n} \mu_{jk}^{m}} \tag{15}
\]


Similarly, we also consider the same decreasing learning rate of the updating Eq. (11) for the parameter $w$. The BFCM algorithm can be summarized as follows:

BFCM algorithm
Step 1: Fix $2 \le c \le n$ and fix any $\varepsilon > 0$. Give an initial $a^{(0)}$, set $p^{(0)} = \left(\frac{1}{c}, \frac{1}{c}, \ldots, \frac{1}{c}\right)$, and let $t = 0$, $w^{(0)} = 1$.
Step 2: Learn the parameter $w^{(t)}$ using Eq. (11).
Step 3: Compute the membership $\mu^{(t+1)}$ with $a^{(t)}$ and $p^{(t)}$ using Eq. (14).
Step 4: Compute the probability weight $p^{(t+1)}$ using Eq. (15).
Step 5: Update the cluster center $a^{(t+1)}$ with $\mu^{(t+1)}$ using Eq. (13).
Step 6: Compare $a^{(t+1)}$ to $a^{(t)}$ in a convenient matrix norm $\|\cdot\|$.
IF $\|a^{(t+1)} - a^{(t)}\| < \varepsilon$, STOP; ELSE let $t = t + 1$ and return to Step 2.
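The BFCM iteration can be sketched in the same way; this is again a minimal illustration, with names and defaults (bfcm, m = 2, random initialization) chosen here rather than taken from the paper.

```python
import numpy as np

def bfcm(X, c, m=2.0, eps=1e-5, max_iter=1000, rng=None):
    """Minimal BFCM sketch: FCM updates plus the -w*ln(p_i) bias-correction term
    and the decreasing learning rate w^(t) = 0.99^t of Eq. (11)."""
    rng = np.random.default_rng(rng)
    n, s = X.shape
    a = X[rng.choice(n, size=c, replace=False)].copy()   # Step 1: initial centers
    p = np.full(c, 1.0 / c)                               # p^(0) = (1/c, ..., 1/c)
    for t in range(max_iter):
        w = 0.99 ** t                                     # Step 2: Eq. (11)
        # Step 3, Eq. (14): biased "distance" ||x_k - a_i||^2 - w*ln(p_i)
        d2 = ((X[None, :, :] - a[:, None, :]) ** 2).sum(axis=2)
        biased = np.fmax(d2 - w * np.log(p)[:, None], 1e-12)
        u = biased ** (-1.0 / (m - 1.0))
        u /= u.sum(axis=0, keepdims=True)
        um = u ** m
        p = um.sum(axis=1) / um.sum()                     # Step 4: Eq. (15)
        a_new = (um @ X) / um.sum(axis=1, keepdims=True)  # Step 5: Eq. (13)
        if np.linalg.norm(a_new - a) < eps:               # Step 6: stopping rule
            return u, a_new, p
        a = a_new
    return u, a, p
```

The only changes relative to plain FCM are the $-w\ln(p_i)$ shift inside the membership update and the extra update of $p$, which is what lets the algorithm adjust the bias while $w$ is still large.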

3.2. Bias-correction GK algorithm

Using the Euclidean distance as a distance measure can lead to optimal results only when a data set contains spherical clusters; the FCM algorithm is therefore not ideal for analyzing a data set containing clusters with different shapes. To overcome this drawback, Gustafson and Kessel [5] accounted for different cluster shapes by replacing the squared Euclidean distance $d(x_j, a_i) = \|x_j - a_i\|^2$ in the FCM algorithm with the Mahalanobis distance $d(x_j, a_i) = \|x_j - a_i\|_{A_i}^2 = (x_j - a_i)^{T} A_i (x_j - a_i)$ and proposed the GK algorithm. The GK objective function is expressed as follows:

\[
J_m^{GK}(\mu, a, A) = \sum_{j=1}^{n} \sum_{i=1}^{c} \mu_{ij}^{m} \, \|x_j - a_i\|_{A_i}^2 \tag{16}
\]

with $\mu \in M_{fcm}$, $a = (a_1, \ldots, a_c) \in \mathbb{R}^{cs}$, and $A = \{A_1, \ldots, A_c\}$, in which each $A_i$ is positive definite with $\det(A_i) = \rho_i$. The necessary conditions for minimizing $J_m^{GK}(\mu, a, A)$ are the following updating equations:

\[
a_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} x_k}{\sum_{k=1}^{n} \mu_{ik}^{m}} \tag{17}
\]

\[
\mu_{ik} = \frac{\|x_k - a_i\|_{A_i}^{-2/(m-1)}}
               {\sum_{j=1}^{c} \|x_k - a_j\|_{A_j}^{-2/(m-1)}} \tag{18}
\]

with $A_i = (\rho_i \det(S_i))^{1/s} S_i^{-1}$ and $S_i = \sum_{k=1}^{n} \mu_{ik}^{m} (x_k - a_i)(x_k - a_i)^{T} \big/ \sum_{k=1}^{n} \mu_{ik}^{m}$, $i = 1, \ldots, c$. Thus, the GK algorithm is formed by iterating the updating Eqs. (17) and (18).
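As an illustration of how the norm-inducing matrices $A_i$ and the Mahalanobis distances used in Eqs. (16)–(18) can be computed, a minimal NumPy sketch follows; the helper names and the default rho = 1 are assumptions made here for exposition, not part of the paper.

```python
import numpy as np

def gk_norm_matrices(X, u, a, m=2.0, rho=1.0):
    """Sketch of the GK norm-inducing matrices A_i = (rho_i det(S_i))^(1/s) S_i^{-1},
    where S_i is the fuzzy covariance matrix of cluster i."""
    c, n = u.shape
    s = X.shape[1]
    um = u ** m
    A = np.empty((c, s, s))
    for i in range(c):
        diff = X - a[i]                                  # (n, s) deviations from center i
        S_i = (um[i, :, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(axis=0)
        S_i /= um[i].sum()                               # fuzzy covariance S_i
        A[i] = (rho * np.linalg.det(S_i)) ** (1.0 / s) * np.linalg.inv(S_i)
    return A

def mahalanobis_sq(X, a_i, A_i):
    """Squared Mahalanobis distance ||x_k - a_i||^2_{A_i} for every data point."""
    diff = X - a_i
    return np.einsum('ks,st,kt->k', diff, A_i, diff)
```

These matrices would then replace the squared Euclidean distances inside the membership updates of Eq. (18) or Eq. (22).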

By adding the bias-correction term $-w \sum_{k=1}^{n}\sum_{i=1}^{c} \mu_{ik}^{m} \ln(p_i)$, the BGK objective function can be expressed as follows:

\[
J_m(\mu, a, p) = \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \, \|x_k - a_i\|_{A_i}^2 \;-\; w \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \ln(p_i) \tag{19}
\]

subject to $\sum_{i=1}^{c} \mu_{ik} = 1$ and $\sum_{i=1}^{c} p_i = 1$. Thus, the BGK algorithm is iterated under the necessary conditions by using the following updating equations:

\[
a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^{m} x_j}{\sum_{j=1}^{n} \mu_{ij}^{m}} \tag{20}
\]

\[
p_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m}}{\sum_{j=1}^{c} \sum_{k=1}^{n} \mu_{jk}^{m}} \tag{21}
\]

\[
\mu_{ik} = \frac{\left( \|x_k - a_i\|_{A_i}^2 - w \ln(p_i) \right)^{-\frac{1}{m-1}}}
               {\sum_{j=1}^{c} \left( \|x_k - a_j\|_{A_j}^2 - w \ln(p_j) \right)^{-\frac{1}{m-1}}} \tag{22}
\]

with $A_i = (\rho_i \det(S_i))^{1/s} S_i^{-1}$ and $S_i = \sum_{k=1}^{n} \mu_{ik}^{m} (x_k - a_i)(x_k - a_i)^{T} \big/ \sum_{k=1}^{n} \mu_{ik}^{m}$, $i = 1, \ldots, c$. Similarly, we use the updating Eq. (11) for the parameter $w$. Thus, the BGK algorithm can be summarized as follows:


BGK algorithm
Step 1: Fix $2 \le c \le n$ and fix any $\varepsilon > 0$. Give an initial $a^{(0)}$, set $p^{(0)} = \left(\frac{1}{c}, \frac{1}{c}, \ldots, \frac{1}{c}\right)$, and let $t = 0$, $w^{(0)} = 1$.
Step 2: Learn the parameter $w^{(t)}$ using Eq. (11).
Step 3: Compute the membership $\mu^{(t+1)}$ with $a^{(t)}$ and $p^{(t)}$ using Eq. (22).
Step 4: Compute the probability weight $p^{(t+1)}$ using Eq. (21).
Step 5: Update the cluster center $a^{(t+1)}$ with $\mu^{(t+1)}$ using Eq. (20).
Step 6: Compare $a^{(t+1)}$ to $a^{(t)}$ in a convenient matrix norm $\|\cdot\|$.
IF $\|a^{(t+1)} - a^{(t)}\| < \varepsilon$, STOP; ELSE let $t = t + 1$ and return to Step 2.

3.3. Bias-correction ICS algorithm

For the GFCM objective function $J_m^{h}(\mu, a)$, if we take $h_i(d(x_k, a_i)) = \|x_k - a_i\|^2$ and $h_0(d(a_i, a_j)) = \|a_i - a_j\|^2$, then we obtain the objective function of inter-cluster separation (ICS) fuzzy clustering proposed by Özdemir and Akarun [16] as follows:

\[
J_m^{ICS}(\mu, a) = \frac{1}{n} \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \left( \|x_k - a_i\|^2 - \frac{\gamma}{c} \sum_{j=1}^{c} \|a_i - a_j\|^2 \right), \quad \gamma \ge 0 \tag{23}
\]

The necessary conditions for minimizing $J_m^{ICS}(\mu, a)$ are the following updating equations:

\[
a_i = \frac{\frac{1}{n}\sum_{k=1}^{n} \mu_{ik}^{m} x_k \;-\; \frac{2\gamma}{c} \sum_{j=1}^{c} a_j}
           {\frac{1}{n}\sum_{k=1}^{n} \mu_{ik}^{m} \;-\; 2\gamma} \tag{24}
\]

\[
\mu_{ik} = \frac{\|x_k - a_i\|^{-2/(m-1)}}
               {\sum_{j=1}^{c} \|x_k - a_j\|^{-2/(m-1)}} \tag{25}
\]

By adding the bias-correction term $-w \sum_{k=1}^{n}\sum_{i=1}^{c} \mu_{ik}^{m} \ln(p_i)$, the bias-correction ICS (BICS) objective function can be expressed as follows:

\[
J_m^{BICS}(\mu, a, p) = \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \left( \|x_k - a_i\|^2 - \frac{\gamma}{c} \sum_{j=1}^{c} \|a_i - a_j\|^2 \right) \;-\; w \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \ln(p_i), \quad \gamma \ge 0 \tag{26}
\]

The necessary conditions for minimizing $J_m^{BICS}(\mu, a, p)$ are the following updating equations:

\[
a_i = \frac{\frac{1}{n}\sum_{k=1}^{n} \mu_{ik}^{m} x_k \;-\; \frac{2\gamma}{c} \sum_{j=1}^{c} a_j}
           {\frac{1}{n}\sum_{k=1}^{n} \mu_{ik}^{m} \;-\; 2\gamma} \tag{27}
\]

\[
\mu_{ik} = \frac{\left( \|x_k - a_i\|^2 - w \ln(p_i) \right)^{-\frac{1}{m-1}}}
               {\sum_{j=1}^{c} \left( \|x_k - a_j\|^2 - w \ln(p_j) \right)^{-\frac{1}{m-1}}} \tag{28}
\]

\[
p_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m}}{\sum_{j=1}^{c} \sum_{k=1}^{n} \mu_{jk}^{m}} \tag{29}
\]

Thus, the BICS algorithm can be summarized as follows:

BICS algorithm
Step 1: Fix $2 \le c \le n$ and fix any $\varepsilon > 0$. Give an initial $a^{(0)}$, set $p^{(0)} = \left(\frac{1}{c}, \frac{1}{c}, \ldots, \frac{1}{c}\right)$, and let $t = 0$, $w^{(0)} = 1$.
Step 2: Learn the parameter $w^{(t)}$ using Eq. (11).
Step 3: Compute the membership $\mu^{(t+1)}$ with $a^{(t)}$ and $p^{(t)}$ using Eq. (28).
Step 4: Compute the probability weight $p^{(t+1)}$ using Eq. (29).
Step 5: Update the cluster center $a^{(t+1)}$ with $\mu^{(t+1)}$ using Eq. (27).
Step 6: Compare $a^{(t+1)}$ to $a^{(t)}$ in a convenient matrix norm $\|\cdot\|$.
IF $\|a^{(t+1)} - a^{(t)}\| < \varepsilon$, STOP; ELSE let $t = t + 1$ and return to Step 2.
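Compared with BFCM, the BICS iteration differs mainly in the center update of Eq. (27); a minimal sketch of that single step is shown below. The function name and the remark on choosing gamma are illustrative assumptions made here, not statements from the paper.

```python
import numpy as np

def bics_center_update(X, u, a, gamma, m=2.0):
    """Sketch of the BICS center update (Eq. (27)): the inter-cluster separation
    term pushes each center away from the sum of all current centers.
    gamma is typically chosen small enough that the denominator stays positive."""
    n = X.shape[0]
    c = a.shape[0]
    um = u ** m                                               # (c, n) weights mu_ik^m
    num = (um @ X) / n - (2.0 * gamma / c) * a.sum(axis=0)    # per-cluster numerators
    den = um.sum(axis=1) / n - 2.0 * gamma                    # per-cluster denominators
    return num / den[:, None]
```

With gamma = 0 this reduces to the usual weighted-mean center update of Eq. (13).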


The main differences between the fuzzy clustering algorithms, such as FCM, ICS, and GK, and the proposed bias-correction fuzzy clustering algorithms, such as BFCM, BICS, and BGK, are as follows. In the proposed bias-correction fuzzy clustering algorithms, the probability mass $p_i$ for the cluster center $a_i$ is calculated using the equation $p_i = \sum_{k=1}^{n} \mu_{ik}^{m} \big/ \sum_{j=1}^{c}\sum_{k=1}^{n} \mu_{jk}^{m}$, and the fuzzy c-partition $\mu_{ik}$ is updated by replacing the distance between the data point $x_k$ and the cluster center $a_i$, such as $\|x_k - a_i\|^2$, with $(\text{distance} - w \ln(p_i))$, such as $\|x_k - a_i\|^2 - w \ln(p_i)$. Furthermore, in the proposed bias-correction fuzzy clustering algorithms, an updating equation is used for the parameter $w$ with $w^{(t)} = (0.99)^t$ to ensure that the bias correction decreases as more iteration steps are conducted. In this sense, the proposed bias-correction fuzzy clustering algorithms can gradually adjust the bias caused by the effects of poor initializations; a short worked illustration of the correction term is given below. We subsequently conduct experiments to demonstrate the robustness of the proposed bias-correction fuzzy clustering algorithms to initializations.
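As a brief worked illustration with numbers chosen only for exposition (they are not taken from the paper), the quantity $-w\ln(p_i)$ added to each squared distance in Eq. (14) depends on the current probability weight and fades as $w$ decays:

\[
-w\ln(p_i) = -\ln(0.1) \approx 2.303 \quad (w = 1,\ p_i = 0.1), \qquad
-w\ln(p_i) = -\ln(0.5) \approx 0.693 \quad (w = 1,\ p_i = 0.5),
\]
\[
-w\ln(0.1) \approx 0.134 \times 2.303 \approx 0.309 \quad \text{at } t = 200,\ w = (0.99)^{200} \approx 0.134 .
\]

Thus, early in the run the correction differs noticeably across clusters through $p_i$, while it fades toward the plain FCM distances as $t$ grows.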

Fig. 1. Ten-cluster data set; the marked symbols denote the true cluster centers.

Fig. 2. Trajectories of initializations for FCM, where the numbers 1–10 denote 10 initializations and the marked symbols denote the true cluster centers.

Fig. 3. Trajectories of initializations for BFCM, where the numbers 1–10 denote 10 initializations and the marked symbols denote the true cluster centers.


Fig. 4. (a) Nine-cluster data set with 9 initial cluster centers; (b)–(j) cluster centers after 1, 5, 10, 20, 30, 40, 50, 60 and 70 iterations, respectively, from FCM; (k) convergent cluster centers after 74 iterations from FCM; and (l) final clustering results from FCM.


3.4. Analysis of the influence of initializations

We examined the correction behavior of the bias-correction term $-w \sum_{k=1}^{n}\sum_{i=1}^{c} \mu_{ik}^{m} \ln(p_i)$ with respect to initializations. The bias-correction term is positive, so the BFCM objective function $J_m(\mu, a, p)$ is greater than the FCM objective function $J_m(\mu, a)$; the term carries the total information on the occurrence of the cluster centers weighted by the fuzzy c-partitions. This is analogous to heating the cost function so that it becomes more flexible for adjusting poor initializations, and then cooling the system down through the decreasing learning rate $w^{(t)} = (0.99)^t$. To analyze the influence of initializations, we present several examples. Because the BFCM, BGK, and BICS algorithms all involve adding the same bias-correction term, we present the initialization analysis only for the BFCM algorithm. We first illustrate the correction behavior through the moving routes of the cluster centers during iterations in Example 1, and then analyze the correction behavior using mean squared errors (MSEs) in further examples.

Example 1. We used a simulated data set in this example. As illustrated in Fig. 1, the data set comprised 10 clusters and each cluster comprised 100 data points; the marked symbols denote the true cluster centers. We chose the 10 initial cluster centers $a_1^{(0)} = (0, 7.5)$, $a_2^{(0)} = (2, 7.5)$, $a_3^{(0)} = (4, 7.5)$, $a_4^{(0)} = (6, 7.5)$, $a_5^{(0)} = (8, 7.5)$, $a_6^{(0)} = (0, 0)$, $a_7^{(0)} = (3.5, 0)$, $a_8^{(0)} = (7, 0)$, $a_9^{(0)} = (0, 1.5)$, and $a_{10}^{(0)} = (7, 1.5)$ for both the FCM and BFCM algorithms, where the numbers 1–10 denote the 10 initializations, as shown in Figs. 2 and 3. Fig. 2 depicts the initialization trajectories for the FCM algorithm, indicating that the two initializations at $a_1^{(0)} = (0, 7.5)$ and $a_2^{(0)} = (2, 7.5)$ approach the same cluster center; consequently, no initialization approached the true cluster center $(3.5, 0.5)$. Fig. 3 shows the initialization trajectories for the BFCM algorithm, indicating that some routes are adjusted by the bias correction; consequently, all true cluster centers are finally approached by the initializations. These trajectories demonstrate the adjusting effect of the bias-correction term on initializations. In summary, the BFCM algorithm is more robust to initializations than the FCM algorithm.

Example 2. We used a simulated data set in this example as well. The data set comprised nine clusters and each cluster has 100 data points, as shown in Figs. 4(a) and 6(a). We chose the 9 initial cluster centers $a_1^{(0)} = (3.4, 1.6)$, $a_2^{(0)} = (3.3, 0.6)$, $a_3^{(0)} = (2.6, 2.3)$, $a_4^{(0)} = (0.4, 4.3)$, $a_5^{(0)} = (4.6, 2.0)$, $a_6^{(0)} = (3.4, 0.5)$, $a_7^{(0)} = (5.1, 2.1)$, $a_8^{(0)} = (3.4, 0.9)$, and $a_9^{(0)} = (0.1, 2.1)$ for both the FCM and BFCM algorithms, as shown in Figs. 4(a) and 6(a). Fig. 4 illustrates the centers of these clusters after 1, 10, 30, 60, and 74 iterations in the FCM algorithm; the final clustering results are also shown in this figure. Fig. 5 shows the plot of the MSEs of the FCM algorithm for different iterations. Furthermore, Fig. 6 illustrates the centers of these clusters after 1, 10, 20, 30, 40, 50, 60, 100, 500, and 507 iterations in the BFCM algorithm as well as the final clustering results, and Fig. 7 depicts the plot of the MSEs of the BFCM algorithm for different iterations. The MSEs of both the FCM and BFCM algorithms for the first 70 iterations are compared in Fig. 8. As shown in Figs. 4, 5, and 8, the FCM algorithm adjusts the initializations at an extremely fast rate, which prevents it from reaching the actual cluster centers. The BFCM algorithm gradually adjusts the initializations and then reaches the true cluster centers after iteration t = 60 (Figs. 6–8). This behavior indicates the adjusting effect of the bias-correction term on initializations. In general, the BFCM algorithm is more robust to initializations than the FCM algorithm.

Example 3. This example is based on Example 2. However, we use nine additional initial cluster centers with $a_1^{(0)} = (1.1, 4.7)$, $a_2^{(0)} = (4.2, 4.1)$, $a_3^{(0)} = (1.1, 2.8)$, $a_4^{(0)} = (4.8, 2.6)$, $a_5^{(0)} = (1.6, 4.6)$, $a_6^{(0)} = (0.8, 1.2)$, $a_7^{(0)} = (3.1, 3.7)$, $a_8^{(0)} = (2.3, 1.7)$, and $a_9^{(0)} = (1.6, 2.7)$ for both the FCM and BFCM algorithms.

Fig. 5. Plot of MSEs from FCM for different iterations.


Fig. 6. (a) Nine-cluster data set with 9 initial cluster centers; (b)–(j) cluster centers after 1, 10, 20, 30, 40, 50, 60, 100 and 500 iterations, respectively, from BFCM; (k) convergent cluster centers after 507 iterations from BFCM; and (l) final clustering results from BFCM.


Fig. 7. Plot of MSEs from BFCM.

Fig. 8. Comparisons between MSEs from FCM and BFCM for the first 70 iterations.

Fig. 9. MSE results from FCM and BFCM with various initializations.


Table 1
MSEs of FCM and BFCM with different initializations of $a_1^{(0)}$.

$a_1^{(0)}$:  (1.1, 0.7)  (1.1, 1.7)  (1.1, 2.7)  (1.1, 3.7)  (1.1, 4.7)  (1.1, 5.7)  (1.1, 6.7)  (1.1, 7.7)
FCM:          0.3417      0.0007      0.5183      0.5934      0.0007      0.6210      0.6207      0.8709
BFCM:         0.0007      0.0007      0.0007      0.0007      0.0007      0.0007      0.0007      0.0007

Fig. 10. BFCM convergent cluster centers using different updating equations for the parameter w, with their convergent iterations: (a) w = (0.9)^t, 78 iterations; (b) w = (0.99)^t, 329 iterations; (c) w = (0.999)^t, 1933 iterations; (d) w = (0.9999)^t, 5268 iterations; (e) w = (0.99999)^t, 224 iterations; and (f) w = (0.999999)^t, 224 iterations.

We determined that both the FCM and BFCM algorithms returned optimal final clustering results with extremely low MSEs (Fig. 9, fifth point). The final clustering results of the FCM algorithm may change even if the initialization changes only slightly, whereas the BFCM algorithm still yields stable final clustering results with an extremely low MSE. These procedures were conducted by assigning various values to the second component of $a_1^{(0)}$, as shown in Table 1. For example, $a_1^{(0)} = (1.1, 0.7)$ is the first point and $a_1^{(0)} = (1.1, 4.7)$ is the fifth point, with their respective MSEs shown in Table 1 and Fig. 9. As shown in Table 1 and Fig. 9, the FCM algorithm is sensitive to initializations and its final clustering results can change, whereas the BFCM algorithm is not sensitive to initializations and its final clustering results are stable.

Example 4. In this example, we studied different learning behaviors for the parameter $w$. We executed the BFCM algorithm on the data set employed in Example 2. With the updating equation $w = (0.9)^t$, the initializations are adjusted at an extremely fast rate, which prevents them from reaching the optimal cluster centers, as shown in Fig. 10(a). Similarly, the clustering results obtained using the updating equations $w = (0.99999)^t$ and $w = (0.999999)^t$ (Fig. 10(e) and (f)) are not optimal, because the initializations are adjusted at an extremely slow rate and thus fail to approach the optimal cluster centers. With the updating equations $w = (0.99)^t$, $w = (0.999)^t$, and $w = (0.9999)^t$ (Fig. 10(b)–(d)), the initializations can be adjusted toward the optimal cluster centers. However, the MSE obtained with $w = (0.99)^t$ is lower than those obtained with $w = (0.999)^t$ and $w = (0.9999)^t$, as shown in Table 2 and Fig. 11. In general, we recommend the updating equation $w = (0.99)^t$ as the decreasing learning rate for the parameter $w$ in the bias-correction term; the decay behavior behind this recommendation is sketched below.
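The following small computation (iteration counts chosen here only for illustration) shows how quickly $w^{(t)} = \text{base}^t$ decays for the schedules compared in Table 2:

```python
# Illustrative decay of the learning rate w^(t) = base**t for the schedules
# compared in Table 2; the iteration counts are chosen only for exposition.
for base in (0.9, 0.99, 0.999, 0.99999):
    w = {t: base ** t for t in (10, 100, 500)}
    print(f"base={base}: w(10)={w[10]:.3f}, w(100)={w[100]:.3f}, w(500)={w[500]:.3f}")
```

With a base of 0.9 the correction essentially vanishes within a few dozen iterations, whereas bases of 0.999 or larger keep it active for hundreds of iterations; $(0.99)^t$ sits between these extremes, which is consistent with the behavior reported in Table 2 and Fig. 10.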

Table 2
MSEs of BFCM and iteration numbers under various updating equations of w.

w:           (0.9)^t   (0.99)^t   (0.999)^t   (0.9999)^t   (0.99999)^t   (0.999999)^t
MSE:         0.5642    0.0007     0.0011      0.0416       0.2245        0.2259
Iterations:  78        329        1933        5268         224           224

Fig. 11. MSEs of BFCM under various updating equations of w.

3.5. Convergence properties of the BFCM clustering algorithm

We next establish a convergence theorem for the proposed BFCM clustering algorithm; that is, we can guarantee that any convergent subsequence generated by BFCM tends to optimal solutions. Zangwill's convergence theorem [27] is applied first. Zangwill [27] originally considered a point-to-set map $T: V \to \mathcal{P}(V)$, where $\mathcal{P}(V)$ denotes the power set of $V$, and required the map to be closed. However, the algorithm of interest here is a point-to-point map, and the "closed" property is exactly "continuity" in the point-to-point case. Thus, Zangwill's convergence theorem [27] is stated as follows:

Zangwill's convergence theorem [27]. Let the point-to-point map $T: V \to V$ generate a sequence $\{z_k\}_{k=0}^{\infty}$ by $z_{k+1} = T(z_k)$. Let a solution set $\Omega \subset V$ be given, and suppose that:
(1) There is a continuous function $Z: V \to \mathbb{R}$ such that (a) if $z \notin \Omega$, then $Z(T(z)) < Z(z)$, and (b) if $z \in \Omega$, then $Z(T(z)) \le Z(z)$.
(2) The map $T$ is continuous on $V \setminus \Omega$.
(3) All points $z_k$ are contained in a compact set $S \subseteq V$.
Then the limit of any convergent subsequence lies in the solution set $\Omega$, and $Z(z_k)$ monotonically converges to $Z(z)$ for some $z \in \Omega$.

We set $M_{fcm} = \left\{ \mu = [\mu_{ik}]_{c \times n} \,\middle|\, \sum_{i=1}^{c} \mu_{ik} = 1,\; \mu_{ik} \ge 0,\; 0 < \sum_{k=1}^{n} \mu_{ik} < n \right\}$, $M_p = \left\{ p = [p_i]_{c \times 1} \,\middle|\, \sum_{i=1}^{c} p_i = 1,\; p_i \ge 0 \right\}$, and $a = (a_1^{T}, \ldots, a_c^{T})^{T}$. Let

\[
\Omega_F = \left\{ (\mu^*, a^*, p^*) \,\middle|\,
\begin{array}{l}
\forall \mu \in M_{fcm},\ \mu \ne \mu^*:\ J_m(\mu^*, a^*, p^*) < J_m(\mu, a^*, p^*) \\
\forall a \ne a^*:\ J_m(\mu^*, a^*, p^*) < J_m(\mu^*, a, p^*) \\
\forall p \in M_p,\ p \ne p^*:\ J_m(\mu^*, a^*, p^*) < J_m(\mu^*, a^*, p)
\end{array}
\right\}
\]

where $\mu^* = [\mu_{ik}^*]_{c \times n}$ with $\mu_{ik}^* = \dfrac{\left( \|x_k - a_i\|^2 - w \ln(p_i) \right)^{-\frac{1}{m-1}}}{\sum_{j=1}^{c} \left( \|x_k - a_j\|^2 - w \ln(p_j) \right)^{-\frac{1}{m-1}}}$, $a^* = (a_1^{*T}, \ldots, a_c^{*T})^{T}$ with $a_i^* = \dfrac{\sum_{k=1}^{n} \mu_{ik}^{m} x_k}{\sum_{k=1}^{n} \mu_{ik}^{m}}$, and $p^* = [p_i^*]_{c \times 1}$ with $p_i^* = \dfrac{\sum_{k=1}^{n} \mu_{ik}^{m}}{\sum_{j=1}^{c}\sum_{k=1}^{n} \mu_{jk}^{m}}$.

Let $E: (\mathbb{R}^s)^c \times M_p \to M_{fcm}$ with $E(a, p) = \mu = [\mu_{ik}]_{c \times n}$, where $\mu_{ik}$ is calculated via $\mu_{ik} = \dfrac{\left( \|x_k - a_i\|^2 - w \ln(p_i) \right)^{-\frac{1}{m-1}}}{\sum_{j=1}^{c} \left( \|x_k - a_j\|^2 - w \ln(p_j) \right)^{-\frac{1}{m-1}}}$. Let $F: M_{fcm} \to (\mathbb{R}^s)^c$, $F(\mu) = a = (a_1^{T}, \ldots, a_c^{T})^{T}$, where $a_i$ is calculated via $a_i = \dfrac{\sum_{k=1}^{n} \mu_{ik}^{m} x_k}{\sum_{k=1}^{n} \mu_{ik}^{m}}$. Let $G: M_{fcm} \to M_p$, $G(\mu) = p = [p_i]_{c \times 1}$, where $p_i$ is calculated by $p_i = \dfrac{\sum_{k=1}^{n} \mu_{ik}^{m}}{\sum_{j=1}^{c}\sum_{k=1}^{n} \mu_{jk}^{m}}$. We next define the BFCM operator as follows.

Definition 1. The BFCM operator $T_m: M_{fcm} \times (\mathbb{R}^s)^c \times M_p \to M_{fcm} \times (\mathbb{R}^s)^c \times M_p$ is defined by $T_m = A_2 \circ A_1$, where $A_1: M_{fcm} \times (\mathbb{R}^s)^c \times M_p \to M_{fcm}$ with $A_1(\mu, a, p) = E(a, p)$, and $A_2: M_{fcm} \to M_{fcm} \times (\mathbb{R}^s)^c \times M_p$ with $A_2(\mu) = (\mu, F(\mu), G(\mu))$. Thus, we have

\[
T_m(\mu, a, p) = (A_2 \circ A_1)(\mu, a, p) = A_2(A_1(\mu, a, p)) = A_2(E(a, p)) = \big( E(a, p),\, F(E(a, p)),\, G(E(a, p)) \big) = (\mu^*, a^*, p^*)
\]

where $\mu^* = E(a, p)$, $a^* = F(E(a, p)) = F(\mu^*)$, and $p^* = G(E(a, p)) = G(\mu^*)$.

In general, the sufficient and necessary conditions for a strict minimizer of an objective function are analyzed through the Jacobian and Hessian matrices. However, when constraints are present, Lagrange multipliers together with a bordered Hessian matrix must be considered, as follows.

Theorem 1 (Lagrange's theorem; see Werner and Sotskov [21], pp. 425–426). Let the functions $f: D_f \to \mathbb{R}$, $D_f \subseteq \mathbb{R}^n$, and $g_i: D_{g_i} \to \mathbb{R}$, $D_{g_i} \subseteq \mathbb{R}^n$, $i = 1, \ldots, t$, $t < n$, be continuously partially differentiable, and let $x^0 = (x_1^0, x_2^0, \ldots, x_n^0) \in D_f$ be a local extreme point of the function $f$ subject to the constraints $g_i(x_1, x_2, \ldots, x_n) = 0$, $i = 1, 2, \ldots, t$. Let $L(x, \lambda) = f(x_1, x_2, \ldots, x_n) + \sum_{i=1}^{t} \lambda_i g_i(x_1, x_2, \ldots, x_n)$ and

\[
|J| = \begin{vmatrix}
\frac{\partial g_1(x)}{\partial x_1} & \cdots & \frac{\partial g_1(x)}{\partial x_t} \\
\vdots & & \vdots \\
\frac{\partial g_t(x)}{\partial x_1} & \cdots & \frac{\partial g_t(x)}{\partial x_t}
\end{vmatrix} \ne 0
\]

at the point $x^0$. Then the gradient of $L(x, \lambda)$ at the point $(x^0, \lambda^0)$ is $0$, i.e. $\nabla L(x^0, \lambda^0) = 0$.

Theorem 2 (local sufficient conditions; see Werner and Sotskov [21], pp. 426–427). Let the functions $f: D_f \to \mathbb{R}$, $D_f \subseteq \mathbb{R}^n$, and $g_i: D_{g_i} \to \mathbb{R}$, $D_{g_i} \subseteq \mathbb{R}^n$, $i = 1, \ldots, t$, $t < n$, be twice continuously partially differentiable, and let $(x^0, \lambda^0)$ with $x^0 \in D_f$ be a solution of the system $\nabla L(x^0, \lambda^0) = 0$. Let

\[
H_L(x, \lambda) = \begin{pmatrix}
0 & \cdots & 0 & \frac{\partial^2 L}{\partial \lambda_1 \partial x_1} & \cdots & \frac{\partial^2 L}{\partial \lambda_1 \partial x_n} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & \frac{\partial^2 L}{\partial \lambda_t \partial x_1} & \cdots & \frac{\partial^2 L}{\partial \lambda_t \partial x_n} \\
\frac{\partial^2 L}{\partial x_1 \partial \lambda_1} & \cdots & \frac{\partial^2 L}{\partial x_1 \partial \lambda_t} & \frac{\partial^2 L}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 L}{\partial x_1 \partial x_n} \\
\vdots & & \vdots & \vdots & & \vdots \\
\frac{\partial^2 L}{\partial x_n \partial \lambda_1} & \cdots & \frac{\partial^2 L}{\partial x_n \partial \lambda_t} & \frac{\partial^2 L}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 L}{\partial x_n \partial x_n}
\end{pmatrix}
\]

be the bordered Hessian, and consider its leading principal minors $|H_r(x^0, \lambda^0)|$ of order $r = 2t+1, 2t+2, \ldots, n+t$ at the point $(x^0, \lambda^0)$. Then the following holds:
(1) If all leading principal minors $|H_r(x^0, \lambda^0)|$, $2t+1 \le r \le n+t$, have the sign $(-1)^t$, then $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ is a local minimum point of the function $f$ subject to the constraints $g_i(x) = 0$, $i = 1, 2, \ldots, t$.
(2) If the signs of the leading principal minors $|H_r(x^0, \lambda^0)|$, $2t+1 \le r \le n+t$, alternate, with the sign of $|H_{n+t}(x^0, \lambda^0)| = |H_L(x^0, \lambda^0)|$ being that of $(-1)^n$, then $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ is a local maximum point of the function $f$ subject to the constraints $g_i(x) = 0$, $i = 1, 2, \ldots, t$.
(3) If neither the conditions of (1) nor those of (2) are satisfied, then $x^0$ is not a local extreme point of the function $f$ subject to the constraints $g_i(x) = 0$, $i = 1, 2, \ldots, t$. Here, the case in which one or several leading principal minors have a value of zero is not considered a violation of condition (1) or (2).

Remark 1 (see Werner and Sotskov [21], p. 379). The leading principal minors of a matrix $A = (a_{ij})$ of order $n \times n$ are the determinants

\[
D_k = \begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & & \vdots \\
a_{k1} & a_{k2} & \cdots & a_{kk}
\end{vmatrix}, \quad k = 1, 2, \ldots, n,
\]

i.e. $D_k$ is obtained from $|A|$ by crossing out the last $n-k$ columns and rows.

Lemma 1. If $\mu = \hat{\mu}$ and $p = \hat{p}$ are fixed, then $J_m(\hat{\mu}, a, \hat{p})$ is minimized at $a = (a_1^{T}, \ldots, a_c^{T})^{T}$ if and only if $a_i = \dfrac{\sum_{k=1}^{n} \mu_{ik}^{m} x_k}{\sum_{k=1}^{n} \mu_{ik}^{m}}$, $\forall i = 1, \ldots, c$.

Proof. Recall that $J_m(\mu, a, p) = \sum_{k=1}^{n}\sum_{i=1}^{c} \mu_{ik}^{m} \|x_k - a_i\|^2 - w \sum_{k=1}^{n}\sum_{i=1}^{c} \mu_{ik}^{m} \ln(p_i)$. Taking the gradient of $J_m$ with respect to $a_i$, we have

\[
\frac{\partial J_m}{\partial a_i} = \sum_{k=1}^{n} \mu_{ik}^{m} \big( -2 (x_k - a_i) \big) = 0 \;\Rightarrow\; a_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} x_k}{\sum_{k=1}^{n} \mu_{ik}^{m}}, \quad \forall i = 1, \ldots, c.
\]

Thus, the "only if" condition is proved. We next prove the "if" condition as follows. If $\mu = \hat{\mu}$ and $p = \hat{p}$ are fixed, then we have $\frac{\partial J_m}{\partial a_i} = \sum_{k=1}^{n} \mu_{ik}^{m} \big( -2 (x_k - a_i) \big)$ and $\frac{\partial^2 J_m}{\partial a_i \partial a_j} = 2 \, \delta_{ij} \, I_s \sum_{k=1}^{n} \mu_{ik}^{m}$, where $\delta_{ij}$ is the Kronecker index with $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$. Thus, the Hessian matrix of $J_m(\hat{\mu}, a, \hat{p})$ with respect to $a$ is $2 \cdot \mathrm{diag}\big( I_s \sum_{k=1}^{n} \mu_{1k}^{m},\; I_s \sum_{k=1}^{n} \mu_{2k}^{m},\; \ldots,\; I_s \sum_{k=1}^{n} \mu_{ck}^{m} \big)$, which is obviously positive definite. That is, $J_m(\hat{\mu}, a, \hat{p})$ is minimized at $a^* = (a_1^{*T}, \ldots, a_c^{*T})^{T}$ with $a_i^* = \dfrac{\sum_{k=1}^{n} \mu_{ik}^{m} x_k}{\sum_{k=1}^{n} \mu_{ik}^{m}}$, $\forall i = 1, \ldots, c$. □

Lemma 2. If Pn m l k¼1 ik pi ¼ Pc P n j¼1

k¼1

l ¼ l^ and a ¼ a^ are fixed, then Jm ðl^ ; a^; pÞ subject to lm jk

Pc

i¼1 pi

lik

¼ 1 is minimized at p ¼ ½pi c1 if and only if

; 8i ¼ 1; . . . ; c.

Proof. Let the Lagrangian function be

L1 ¼

n X c X

l

m ik kxk

n X c c X X  ai k  w lmik lnðpi Þ þ g pi  1

!

2

k¼1 i¼1

k¼1 i¼1

i¼1

where g is a Lagrangian multiplier. With the gradient of L1 w.r.t. pi and g, we have

8 n X > @L1 m 1 > > > @pi ¼ w lik pi þ g ¼ 0 < k¼1

> > @L1 > > : @g ¼

c X

) pi ¼

pi  1 ¼ 0

n wX

c X

g

i¼1

lmik ;

k¼1

pi ¼

c X n wX

g

lmik ¼ 1

i¼1 k¼1

i¼1

Pn m l k¼1 ik Thus, we have pi ¼ Pc P n

P P , g ¼ w ci¼1 nk¼1 lm ik and the ‘‘only if’’ condition is proved. We next prove the ‘‘if’’ condition by j¼1 Pn m Pn m 2 w l w l @ 2 L1 @ 2 L1 L1 k¼1 ik k¼1 ik 1 ^ are fixed, then @L ^ and a ¼ a ¼ þ g, @p ¼ dij  , and @p ¼ @@g@p ¼ 1. Thus, the Theorem 2 as follows. If l ¼ l @pi pi p2 i @pj i @g i lm k¼1 jk

i

bordered Hessian matrix w.r.t. p and g is

2

0

6 61 6 6 6 6 HL1 ðp; gÞ ¼ 6 1 6 6. 6. 6. 4 1

1

1



@ 2 L1 @p1 @p1

0



0 .. . 0

..

. ..

.



1

3

7 0 7 7 7 .. 7 7 . 7 7 .. 7 7 . 7 5 2

@ L1 @pc @pc

Note that we have only one constraint, i.e. t ¼ 1, and so ð1Þ1 ¼ 1 < 0. We next check all leading principle minors as follows:

152

M.-S. Yang, Y.-C. Tian / Information Sciences 309 (2015) 138–162

 0      H3 ðp ; g Þ ¼  1    1

1 Pn

     0   Pn m  w l  k¼1 2k  p2 1

m k¼1 1k

w

l

p21

0

2

 0    1       H4 ðp ; g Þ ¼   1     1

1 Pn

p¼p ; g¼g

     0       0  Pn m  w l  k¼1 3k  p2

1

m k¼1 1k

w

P  Pn  w k¼1 lm w nk¼1 lm 1k 2k ¼ þ < 0; 2 2 p1 p2 p¼p ; g¼g

l

1

0

p21

w

0

Pn

m k¼1 2k

l

p22

0

0

3

p¼p ; g¼g

Pn  Pn Pn  Pn Pn   2 Pn m m m m m m  w w2 w2 k¼1 l1k k¼1 l2k k¼1 l2k k¼1 l3k k¼1 l3k k¼1 l1k ¼ þ þ   1 < @lik ¼ mlik kxk  ai k  mlik w lnðpi Þ þ kk ¼ 0 1 m1 kk m1  c X ) l ¼ kxk  ai k2  w lnðpi Þ ik @L2 > m lik  1 ¼ 0 > : @kk ¼ i¼1

and

kk 

1 m1

m

¼ Pc

1 1

ðkxk ai k2 w lnðpi ÞÞm1 i¼1

. This implies that

153

M.-S. Yang, Y.-C. Tian / Information Sciences 309 (2015) 138–162

 ik

l

1  m1 kxk  ai k2  w lnðpi Þ ¼  1 ;  Pc  xk  aj 2  w lnðp Þ m1 j j¼1

  kk ¼ mlm1 kxk  ai k2  w lnðpi Þ ik

^ and p ¼ p ^ are fixed, and the ‘‘only if’’ condition is proved. We next prove the ‘‘if’’ condition by following Theorem 2. If a ¼ a   2 @L2 m1 then @ l ¼ mlik kxk  ai k  w lnðpi Þ þ kk , ik

  @ 2 L2 ¼ dij  dkr  mðm  1Þlm2 kxk  ai k2  w lnðpi Þ ; ik @ lik @ ljr Thus, the bordered Hessian matrix w.r.t.

2

0

61 6 6 HL2 ðlk ; kk Þ ¼ 6 . 6 .. 4 1

lk and kk is

1

1 

1

@ 2 L2 @ l1k @ l1k

0 

0

.. .

..

0

3

.. .

.

2

@ L2 @ lck @ lck

0 

@ 2 L2 @ 2 L2 ¼ ¼1 @ lik @kk @kk @ lik

and

7 7 7 7 7 5

Similar as the proof in Lemma 2, we check all leading principle minors as follows:

   0 1  1    2 m2     1 mðm  1Þ l x  a  w lnðp Þ 0 k k   1 k 1  H3 ðl ; k Þ ¼  1k k k       2 m2 1 0 mðm  1Þl2k kxk  a2 k  w lnðp2 Þ  lk ¼lk ; kk ¼kk   2 2 m2 ¼  mðm  1Þlm2