Information Sciences 159 (2004) 69–86 www.elsevier.com/locate/ins

Deriving two-stage learning sequences from knowledge in fuzzy sequential pattern mining

Yi-Chung Hu a, Gwo-Hshiung Tzeng b,*, Chin-Mi Chen c

a Department of Business Administration, Chung Yuan Christian University, Chungli 320, Taiwan, ROC
b Institute of Management of Technology, National Chiao Tung University, 1001 Ta-Hsueh Road, Hsinchu 300, Taiwan, ROC
c School of Nursing, National Defense Medical Center, Taipei 114, Taiwan, ROC

Received 8 May 2001; received in revised form 30 December 2002; accepted 20 February 2003

Abstract

A fuzzy sequential pattern, which consists of several fuzzy sets, represents a frequently occurring time-related behavior and can be discovered from transaction databases. For example, customers may buy large amounts of one product after having bought small amounts of another product. Recently, Hu et al. (2003) proposed a fuzzy data mining method for discovering fuzzy sequential patterns. With this method, both consumers' product preferences and consumers' product buying orders can be found. Since every decision problem has a competence set consisting of the ideas, knowledge, information, and skills needed to solve it, we regard the knowledge found in fuzzy sequential pattern mining as the competence set needed for solving one decision problem. This paper uses a known competence set expansion method, the minimum spanning table method, to find appropriate two-stage learning sequences with which the individual fuzzy knowledge sets found in fuzzy sequential pattern mining can be acquired effectively. A numerical example shows the usefulness of the proposed method.

© 2003 Elsevier Inc. All rights reserved.

Keywords: Competence sets; Data mining; Sequential patterns; Fuzzy sets

* Corresponding author. Tel.: +886-3-5712121x57505; fax: +886-3-5753926. E-mail address: [email protected] (G.-H. Tzeng).

0020-0255/$ - see front matter © 2003 Elsevier Inc. All rights reserved. doi:10.1016/S0020-0255(03)00190-7


Nomenclature

C1: set of fuzzy knowledge related to "consumers' purchase preference of products"
C2: set of fuzzy knowledge related to "consumers' product buying orders"
K: number of partitions in each quantitative attribute
k: length of a fuzzy sequence
d: degree of a given relation, where $d \ge 1$
$A^{x_m}_{K,i_m}$: $i_m$th linguistic value of K fuzzy partitions defined in quantitative attribute $x_m$, $1 \le i_m \le K$
$\mu^{x_m}_{K,i_m}$: membership function of $A^{x_m}_{K,i_m}$
n: total number of customers
$c_r$: rth customer, where $1 \le r \le n$
$a_r$: number of consecutive transactions ordered by transaction-time for $c_r$
$t_p^{(r)}$: pth transaction corresponding to $c_r$, where $t_p^{(r)} = (t_{p_1}^{(r)}, t_{p_2}^{(r)}, \ldots, t_{p_d}^{(r)})$ and $1 \le p \le a_r$
$L_j$: jth frequent fuzzy grid in a fuzzy sequence, where $1 \le j \le b$

1. Introduction

Sequential pattern mining determines frequently occurring patterns related to time or other sequences [1], where a sequence is an ordered list of itemsets [2]. Sequential patterns can help managers determine which items are bought after other items have been bought [1], or analyze the order in which the pages of a Web site are browsed [3]. Recently, Hu et al. [4] proposed fuzzy sequential pattern mining, with which much worthwhile fuzzy knowledge related to consumer purchase behavior may be discovered from transaction databases, such as "large amounts of product A were frequently purchased by each consumer" or "small amounts of one product and large amounts of another product were purchased sequentially". The former fuzzy term may be roughly interpreted as the consumers' purchase preference for product A, and the latter is a fuzzy sequential pattern that roughly represents the consumers' buying order of products. In fact, a fuzzy sequential pattern is derived from consumers' purchase preferences of products and expresses the temporal relation between them. Because fuzzy sequential patterns are described in natural language, they are well suited for use by human subjects and help to increase the flexibility of users in making decisions.


We consider that decision makers can "learn" the above-mentioned fuzzy knowledge as two different types; that is, they can be interested in investigating either consumer behavior or current strategies based on fuzzy sequential patterns. In other words, they can acquire or learn the corresponding knowledge from fuzzy sequential patterns. Eventually, decision makers should become confident in solving certain decision problems, for example, proposing a more competitive marketing strategy. However, the acquisition of knowledge should be carefully planned rather than blindly learned. For example, it seems easier to acquire "algorithms" after both "introduction to computer science" and "data structures" have been acquired than when only "introduction to computer science" has been acquired. That is, how to derive appropriate "learning" sequences for decision makers from the fuzzy knowledge extracted by fuzzy sequential pattern mining is an important problem.

The concept of competence sets was initiated by Yu [5], and its mathematical foundation was provided by Yu and Zhang [6]. For each decision problem, there is a competence set consisting of the ideas, knowledge, information and skills needed for its satisfactory solution [5–7]. From this viewpoint, we can view the knowledge found in fuzzy sequential pattern mining as the competence set needed for solving one decision problem. In order to acquire the needed competence set effectively, it is necessary to find appropriate learning sequences for acquiring those useful patterns, which is the so-called competence set expansion. Since the set related to "consumers' product buying orders" (denoted by C2) is derived from the set related to "consumers' purchase preference of products" (denoted by C1), we assume that it is helpful for decision makers to learn C2 after first learning C1.

That is, by treating C1 as an aggregate skill, we design a two-stage learning sequence that consists of two subsequences: one generated from C1, the other generated from C2. Two-stage learning sequences with minimum costs are derived by the minimum spanning table method (MST) proposed by Feng and Yu [8], since MST is especially powerful for the expansion of a set of single skills or terms. From the experimental results, we can see that it is possible to help decision makers effectively acquire a needed competence set found in fuzzy sequential pattern mining, enabling them to set up strategies for promoting their products or improving their services. It is noted that a compound skill represents a collection of single skills that might be acquired by decision makers [9,10]; however, compound skills are not considered in this paper for simplicity.

The rest of this paper is organized as follows. Since fuzzy sequential pattern mining is built on the simple fuzzy partition method [11,12], this method is introduced in Section 2. Subsequently, the fuzzy data mining technique for discovering fuzzy sequential patterns is briefly introduced in Section 3, where the generation and representations of C1 and C2 are demonstrated in detail. Section 4 introduces the MST. Detailed experimental results of a numerical example are presented in Section 5. We end this paper with discussions and conclusions in Section 6.

2. Simple fuzzy partition method

Fuzzy sets were originally proposed by Zadeh [13], who also proposed the concept of linguistic variables and its applications to approximate reasoning [14]. A linguistic variable is a variable whose values are linguistic words or sentences in a natural language [15]. For example, the values, or linguistic terms, of the linguistic variable "amounts of apple juice that were purchased" may be "close to 3 pounds" or "very close to 5 pounds". In this paper, triangular membership functions are used for the linguistic terms defined in quantitative attributes.

A quantitative attribute $x_m$ can be divided into K partitions ($K = 2, 3, 4, \ldots$). In addition, $A^{x_m}_{K,i_m}$ stands for a candidate 1-dim fuzzy grid, and $\mu^{x_m}_{K,i_m}(x)$ can be defined as follows:

$$\mu^{x_m}_{K,i_m}(x) = \max\{1 - |x - a^K_{i_m}| / b^K,\ 0\} \tag{1}$$

where

$$a^K_{i_m} = \mathit{mi} + (\mathit{ma} - \mathit{mi})(i_m - 1)/(K - 1) \tag{2}$$

$$b^K = (\mathit{ma} - \mathit{mi})/(K - 1) \tag{3}$$

Here $\mathit{ma}$ is the maximum value of the attribute domain and $\mathit{mi}$ is the minimum value. If we divide both $x_1$ and $x_2$ into three fuzzy partitions, then the feature space is divided into 3 × 3 2-dim fuzzy grids, as shown in Fig. 1. We use a linguistic value $A^{x_1}_{3,1} \times A^{x_2}_{3,3}$ (i.e., small AND large) to denote the shaded 2-dim fuzzy grid shown in Fig. 1.

Fig. 1. Both attributes x1 and x2 are partitioned into three partitions.

The next important task is how to use these candidate 1-dim fuzzy grids to generate C1 and C2. The framework of the proposed method for discovering fuzzy sequential patterns is further described in the following section.

Fig. 2. Framework of the proposed method. Phase I: databases; simple fuzzy partition method; partition each attribute; find frequent fuzzy grids (C1) with the user-specified minimal fuzzy support; generate frequent fuzzy 1-sequences. Phase II: generate frequent fuzzy sequences; discover fuzzy sequential patterns (C2).
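Eqs. (1)–(3) translate directly into a few lines of code. The following is a minimal sketch, not the authors' implementation; the function name `triangular_membership` and the parameter names `lo` and `hi` (standing for the minimum and maximum of the attribute domain) are chosen here for illustration.

```python
def triangular_membership(x, i, K, lo, hi):
    """Membership of x in the i-th (1-based) of K triangular fuzzy sets on [lo, hi]."""
    if K < 2 or not 1 <= i <= K:
        raise ValueError("need K >= 2 and 1 <= i <= K")
    b = (hi - lo) / (K - 1)                 # Eq. (3): spread of each triangle
    a = lo + (hi - lo) * (i - 1) / (K - 1)  # Eq. (2): peak of the i-th triangle
    return max(1.0 - abs(x - a) / b, 0.0)   # Eq. (1)


if __name__ == "__main__":
    # Illustrative values only: a quantity of 6 on the domain [0, 20] with K = 3.
    for i, label in enumerate(["small", "medium", "large"], start=1):
        print(label, round(triangular_membership(6, i, 3, 0, 20), 2))
```

With K = 3, neighbouring triangles overlap so that the memberships of any value inside the domain sum to one, which is why a quantity can be partly "small" and partly "medium" at the same time.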

3. Generate C1 and C2

In this section, the concrete meanings of C1 and C2 are described in detail, and the computational steps of the proposed method are briefly introduced.

Fuzzy sequential pattern mining consists of two phases. After candidate 1-dim fuzzy grids have been generated, we must determine how to find frequent fuzzy grids, frequent fuzzy k-sequences ($k \ge 1$) and fuzzy sequential patterns from those candidate 1-dim fuzzy grids. In phase I, frequent fuzzy grids of small dimension, say m, are used to construct candidate (m + 1)-dim fuzzy grids, each accompanied by a fuzzy support. A candidate (m + 1)-dim fuzzy grid is determined to be frequent or not by comparing its fuzzy support with the user-specified minimum fuzzy support (min FS). At the end of phase I, each frequent fuzzy grid, say $L_j$, can be transformed into a frequent fuzzy 1-sequence $\langle L_j \rangle$. C1 is composed of the frequent fuzzy grids.

In phase II, fuzzy sequential patterns are discovered by analyzing the temporal relation between the frequent fuzzy grids found in phase I. However, we must first define fuzzy sequences. A fuzzy sequence is an ordered list of frequent fuzzy grids, and the length of a fuzzy sequence is the number of frequent fuzzy grids it contains. That is, a fuzzy sequence expresses the temporal relation between frequent fuzzy grids. If there are k fuzzy grids ($k \ge 1$) in a fuzzy sequence, we call it a fuzzy k-sequence. Frequent fuzzy sequences of shorter length, say k, are used to construct candidate fuzzy sequences of longer length (i.e., fuzzy (k + 1)-sequences), each accompanied by a fuzzy support. A candidate fuzzy (k + 1)-sequence is likewise determined to be frequent or not by comparing its fuzzy support with the min FS used in phase I. At the end of phase II, all fuzzy sequential patterns are generated from the frequent fuzzy sequences. C2 is composed of the fuzzy sequential patterns.

The resulting framework for discovering fuzzy sequential patterns is illustrated in Fig. 2, and the details of fuzzy sequential pattern mining can be found in [4]. Below, the determination of frequent fuzzy 1-sequences and of fuzzy sequential patterns is described in Sections 3.1 and 3.2, respectively.

3.1. Frequent fuzzy 1-sequences

Given a candidate l-dim ($l \le d$) fuzzy grid $A^{x_1}_{K,i_1} \times A^{x_2}_{K,i_2} \times \cdots \times A^{x_{l-1}}_{K,i_{l-1}} \times A^{x_l}_{K,i_l}$, the degree to which $t_p^{(r)}$ belongs to this fuzzy grid can be computed as follows [16,17]:

$$\mu_{A^{x_1}_{K,i_1} \times A^{x_2}_{K,i_2} \times \cdots \times A^{x_{l-1}}_{K,i_{l-1}} \times A^{x_l}_{K,i_l}}(t_p^{(r)}) = \mu^{x_1}_{K,i_1}(t_{p_1}^{(r)}) \cdot \mu^{x_2}_{K,i_2}(t_{p_2}^{(r)}) \cdots \mu^{x_{l-1}}_{K,i_{l-1}}(t_{p_{l-1}}^{(r)}) \cdot \mu^{x_l}_{K,i_l}(t_{p_l}^{(r)}) \tag{4}$$

where "·" is a fuzzy intersection operator, namely the algebraic product [18]. Its fuzzy support, denoted by $FS(A^{x_1}_{K,i_1} \times A^{x_2}_{K,i_2} \times \cdots \times A^{x_{l-1}}_{K,i_{l-1}} \times A^{x_l}_{K,i_l})$, is defined as follows:

$$FS(A^{x_1}_{K,i_1} \times A^{x_2}_{K,i_2} \times \cdots \times A^{x_{l-1}}_{K,i_{l-1}} \times A^{x_l}_{K,i_l}) = \sum_{r=1}^{n} \mu_{A^{x_1}_{K,i_1} \times A^{x_2}_{K,i_2} \times \cdots \times A^{x_{l-1}}_{K,i_{l-1}} \times A^{x_l}_{K,i_l}}(c_r) \Big/ n = \sum_{r=1}^{n} \left[ \max_{p=1,\ldots,a_r} \mu_{A^{x_1}_{K,i_1} \times A^{x_2}_{K,i_2} \times \cdots \times A^{x_{l-1}}_{K,i_{l-1}} \times A^{x_l}_{K,i_l}}(t_p^{(r)}) \right] \Big/ n \tag{5}$$

where $\mu_{A^{x_1}_{K,i_1} \times A^{x_2}_{K,i_2} \times \cdots \times A^{x_{l-1}}_{K,i_{l-1}} \times A^{x_l}_{K,i_l}}(c_r)$ is the degree to which $c_r$ supports $A^{x_1}_{K,i_1} \times A^{x_2}_{K,i_2} \times \cdots \times A^{x_{l-1}}_{K,i_{l-1}} \times A^{x_l}_{K,i_l}$. Since sequential pattern mining mainly analyzes customer behavior, the fuzzy support is obtained from these customer-level degrees. If $FS(A^{x_1}_{K,i_1} \times A^{x_2}_{K,i_2} \times \cdots \times A^{x_{l-1}}_{K,i_{l-1}} \times A^{x_l}_{K,i_l})$ is larger than or equal to the user-specified minimum fuzzy support (i.e., min FS), then $A^{x_1}_{K,i_1} \times A^{x_2}_{K,i_2} \times \cdots \times A^{x_{l-1}}_{K,i_{l-1}} \times A^{x_l}_{K,i_l}$ is a frequent l-dim fuzzy grid. The fuzzy support indicates the degree of importance of a fuzzy grid.

Example 1. In this example, we demonstrate possible marketing or advertising strategies that may be planned by using frequent fuzzy grids. We assume that $A^{\text{Product 2}}_{2,2} \times A^{\text{Product 1}}_{2,1}$ is a frequent fuzzy grid, which indicates that small purchase amounts of Product 1 and large purchase amounts of Product 2 were frequently purchased together by customers. This information may help managers design store layouts. For example, as in market basket analysis [1], Product 1 and Product 2 can be placed in close proximity in order to encourage the sales of both items. Alternatively, retailers may plan to put Product 1 or Product 2 on sale.

3.2. Fuzzy sequential patterns

As mentioned in the previous section, each frequent fuzzy grid, say $L_j$, can be transformed into a frequent fuzzy 1-sequence denoted by $\langle L_j \rangle$. The fuzzy support of a fuzzy k-sequence is the average degree to which all customers support this sequence. Here, we take a fuzzy k-sequence $\langle L_1, L_2, \ldots, L_k \rangle$, which may represent $L_1, L_2, \ldots, L_k$ being bought sequentially, as an example to compute its fuzzy support. For the rth customer (i.e., $c_r$) with $a_r$ transactions, there are ${}_{a_r}C_k$ ($a_r \ge k$) different combinations $(t_{s_1}^{(r)}, t_{s_2}^{(r)}, \ldots, t_{s_k}^{(r)})$ ($1 \le s_1 < s_2 < \cdots < s_k \le a_r$), ordered by transaction-time. Since each combination $(t_{s_1}^{(r)}, t_{s_2}^{(r)}, \ldots, t_{s_k}^{(r)})$ may support $\langle L_1, L_2, \ldots, L_k \rangle$, the degree $FS(\langle L_1, L_2, \ldots, L_k \rangle_r)$ to which $c_r$ supports $\langle L_1, L_2, \ldots, L_k \rangle$ is described as follows:

$$FS(\langle L_1, L_2, \ldots, L_k \rangle_r) = \max_{(t_{s_1}^{(r)}, t_{s_2}^{(r)}, \ldots, t_{s_k}^{(r)})} \left[ \mu_{L_1}(t_{s_1}^{(r)}) \cdot \mu_{L_2}(t_{s_2}^{(r)}) \cdots \mu_{L_k}(t_{s_k}^{(r)}) \right] \quad \text{for the } {}_{a_r}C_k \text{ different } (t_{s_1}^{(r)}, t_{s_2}^{(r)}, \ldots, t_{s_k}^{(r)}) \tag{6}$$

where $\mu_{L_k}(t_{s_k}^{(r)})$ represents the degree to which $t_{s_k}^{(r)}$ belongs to $L_k$ and can be computed by Eq. (4). Of course, if $a_r < k$, then $FS(\langle L_1, L_2, \ldots, L_k \rangle_r) = 0$.
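The support computations of Eqs. (4)–(6) can be sketched as follows. This is an illustrative sketch rather than the authors' code: representing a fuzzy grid as a dictionary from attribute name to linguistic index, representing a transaction as a dictionary from attribute name to quantity, and scoring a product that was not purchased with degree 0 are all assumptions made here. The last function averages the customer-level degrees, following the statement above that the fuzzy support of a sequence is the average over all customers.

```python
from itertools import combinations
from math import prod


def mu(x, i, K, lo, hi):
    """Triangular membership, Eqs. (1)-(3)."""
    if x is None:
        return 0.0  # assumption: a product that was not purchased gets degree 0
    b = (hi - lo) / (K - 1)
    a = lo + b * (i - 1)
    return max(1.0 - abs(x - a) / b, 0.0)


def grid_degree(grid, transaction, K, lo, hi):
    """Eq. (4): algebraic product over the 1-dim fuzzy sets of the grid."""
    return prod(mu(transaction.get(attr), i, K, lo, hi) for attr, i in grid.items())


def grid_support(grid, customers, K, lo, hi):
    """Eq. (5): each customer contributes the best-matching transaction."""
    return sum(max(grid_degree(grid, t, K, lo, hi) for t in txs)
               for txs in customers) / len(customers)


def sequence_degree(seq, txs, K, lo, hi):
    """Eq. (6): best time-ordered choice of len(seq) transactions of one customer."""
    if len(txs) < len(seq):
        return 0.0
    # combinations() keeps the input order, so txs must be sorted by time.
    return max(prod(grid_degree(g, t, K, lo, hi) for g, t in zip(seq, combo))
               for combo in combinations(txs, len(seq)))


def sequence_support(seq, customers, K, lo, hi):
    """Average of the customer-level degrees over all customers."""
    return sum(sequence_degree(seq, txs, K, lo, hi) for txs in customers) / len(customers)


if __name__ == "__main__":
    # Hypothetical data: two customers, quantities on the domain [0, 20], K = 3.
    customers = [[{"P1": 6}, {"P1": 10, "P2": 15}], [{"P2": 20}]]
    print(round(grid_support({"P1": 2}, customers, 3, 0, 20), 2))
    print(round(sequence_support([{"P1": 2}, {"P2": 3}], customers, 3, 0, 20), 2))
```

Enumerating all combinations is exponential in the sequence length; for the short customer histories of the examples in this paper that is not a concern.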

Example 2. Assume that the number of transactions of $c_2$ is $a_2 = 3$ (i.e., $t_1^{(2)}$, $t_2^{(2)}$, and $t_3^{(2)}$), so that all possible combinations of two transactions ordered by transaction-time are $(t_1^{(2)}, t_2^{(2)})$, $(t_1^{(2)}, t_3^{(2)})$, and $(t_2^{(2)}, t_3^{(2)})$. Let $L_1$ and $L_2$ be $A^{\text{Product 2}}_{3,2}$ (i.e., medium purchase amounts of Product 2) and $A^{\text{Product 3}}_{3,2}$ (i.e., medium purchase amounts of Product 3), respectively. Then $FS(\langle A^{\text{Product 2}}_{3,2}, A^{\text{Product 3}}_{3,2} \rangle_2)$ (i.e., k = 2) is obtained by computing $\max\{\mu_{L_1}(t_1^{(2)}) \cdot \mu_{L_2}(t_2^{(2)}),\ \mu_{L_1}(t_1^{(2)}) \cdot \mu_{L_2}(t_3^{(2)}),\ \mu_{L_1}(t_2^{(2)}) \cdot \mu_{L_2}(t_3^{(2)})\}$, since each combination may support $\langle A^{\text{Product 2}}_{3,2}, A^{\text{Product 3}}_{3,2} \rangle$ and $a_2 \ge k$. However, if the number of transactions of $c_1$ is $a_1 = 1$, then $FS(\langle A^{\text{Product 2}}_{3,2}, A^{\text{Product 3}}_{3,2} \rangle_1)$ is equal to zero since $a_1 < k$. If the total number of customers is 2 (i.e., n = 2), then $FS(\langle A^{\text{Product 2}}_{3,2}, A^{\text{Product 3}}_{3,2} \rangle) = [FS(\langle A^{\text{Product 2}}_{3,2}, A^{\text{Product 3}}_{3,2} \rangle_1) + FS(\langle A^{\text{Product 2}}_{3,2}, A^{\text{Product 3}}_{3,2} \rangle_2)]/2$.

The fuzzy support also indicates the degree of importance of a fuzzy sequence. We define the fuzzy support $FS(\langle L_1, L_2, \ldots, L_k \rangle)$ of $\langle L_1, L_2, \ldots, L_k \rangle$ as follows:

$$FS(\langle L_1, L_2, \ldots, L_k \rangle) = \sum_{r=1}^{n} FS(\langle L_1, L_2, \ldots, L_k \rangle_r) \Big/ n \tag{7}$$

If $FS(\langle L_1, L_2, \ldots, L_k \rangle)$ is larger than or equal to the aforementioned min FS, then $\langle L_1, L_2, \ldots, L_k \rangle$ is a frequent fuzzy k-sequence. Subsequently, we define a fuzzy sequential pattern as a frequent fuzzy sequence that is not contained in any other frequent fuzzy sequence. Formally, a frequent fuzzy $z_1$-sequence, say a, denoted by $\langle L_{a,1}, L_{a,2}, \ldots, L_{a,z_1} \rangle$ is contained in another frequent fuzzy $z_2$-sequence, say b, denoted by $\langle L_{b,1}, L_{b,2}, \ldots, L_{b,z_2} \rangle$ if $z_1 \le z_2$ and there exist integers $1 \le j_1 < j_2 < \cdots < j_{z_1} \le z_2$ such that $L_{b,j_1} \supseteq L_{a,1}, L_{b,j_2} \supseteq L_{a,2}, \ldots, L_{b,j_{z_1}} \supseteq L_{a,z_1}$. In that case, a is not a fuzzy sequential pattern, whereas b is one if it is not contained in any other frequent fuzzy sequence. In comparison with b, a is less valuable for decision makers, because b contains more information than a.

Example 3. Assume that $\langle L_{a,1}, L_{a,2} \rangle = \langle A^{\text{Product 1}}_{2,1}, A^{\text{Product 1}}_{2,2} \rangle$ (i.e., $z_1 = 2$) and $\langle L_{b,1}, L_{b,2}, L_{b,3} \rangle = \langle A^{\text{Product 1}}_{2,1},\ A^{\text{Product 1}}_{2,2} \times A^{\text{Product 2}}_{2,1},\ A^{\text{Product 1}}_{2,1} \times A^{\text{Product 2}}_{2,2} \rangle$ (i.e., $z_2 = 3$) are frequent fuzzy sequences. Then $\langle A^{\text{Product 1}}_{2,1}, A^{\text{Product 1}}_{2,2} \rangle$ is not a fuzzy sequential pattern, since $z_1 \le z_2$ and there exist $j_1 = 1$ and $j_2 = 2$ such that $L_{b,j_1} \supseteq L_{a,1}$ (i.e., $A^{\text{Product 1}}_{2,1} \supseteq A^{\text{Product 1}}_{2,1}$) and $L_{b,j_2} \supseteq L_{a,2}$ (i.e., $A^{\text{Product 1}}_{2,2} \times A^{\text{Product 2}}_{2,1} \supseteq A^{\text{Product 1}}_{2,2}$). That is, $\langle A^{\text{Product 1}}_{2,1}, A^{\text{Product 1}}_{2,2} \rangle$ is contained in $\langle A^{\text{Product 1}}_{2,1},\ A^{\text{Product 1}}_{2,2} \times A^{\text{Product 2}}_{2,1},\ A^{\text{Product 1}}_{2,1} \times A^{\text{Product 2}}_{2,2} \rangle$. Of course, the latter sequence is a fuzzy sequential pattern if it is not contained in any other frequent fuzzy sequence.
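The containment test and the resulting filter for fuzzy sequential patterns can be sketched as below. This is an illustration under an assumed encoding, not the authors' code: a fuzzy grid is represented as a frozenset of (attribute, linguistic index) pairs, so that "grid b covers grid a" becomes a superset test, and a sequence is a tuple of such grids.

```python
def is_contained(a, b):
    """True if sequence a is contained in sequence b (order-preserving match)."""
    j = 0
    for grid in b:
        # Greedily match the next unmatched grid of a against the current grid of b.
        if j < len(a) and grid >= a[j]:
            j += 1
    return j == len(a)


def sequential_patterns(frequent):
    """Keep the frequent sequences that no other frequent sequence contains."""
    return [s for s in frequent
            if not any(s != t and is_contained(s, t) for t in frequent)]


if __name__ == "__main__":
    # The two sequences of Example 3, with K = 2 (index 1 = small, 2 = large).
    a = (frozenset({("Product 1", 1)}), frozenset({("Product 1", 2)}))
    b = (frozenset({("Product 1", 1)}),
         frozenset({("Product 1", 2), ("Product 2", 1)}),
         frozenset({("Product 1", 1), ("Product 2", 2)}))
    print(is_contained(a, b))
    print(sequential_patterns([a, b]) == [b])
```

Greedy left-to-right matching is sufficient here because matching an element of a as early as possible never prevents later elements from being matched.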


Example 4. In this example, we demonstrate a possible application of fuzzy sequential patterns. We assume that $\langle A^{\text{Product 2}}_{2,2}, A^{\text{Product 1}}_{2,1} \rangle$ is a fuzzy sequential pattern, which indicates that small purchase amounts of Product 1 are likely to be bought by customers the next time after they have bought large purchase amounts of Product 2. The result may be used to help decision makers (e.g., retailers) plan marketing strategy. For example, customers who have previously bought large purchase amounts of Product 2 may be attracted to buy more of Product 1 when it is on sale.

Since the set of necessary and worthy fuzzy concepts is viewed as a competence set, we use the MST method introduced in the following section to expand a competence set.

4. Competence set expansion

Competence set expansion means finding a learning sequence for acquiring the needed skills so that the needed competence set is obtained [8]. It can be regarded as a tree construction process if there are no compound skills [19]. Feng and Yu [8] proposed the minimum spanning table method (MST), which employs a directed graph with an expansion table to find a spanning tree with minimum cost. An optimal expansion is then obtained from the minimum spanning tree. This procedure views each fuzzy term in C1 or C2 as a node in a directed graph, and it assigns the learning cost $c(y_i, y_j)$, which may be measured in time or money, to the directed edge from node $y_i$ to node $y_j$. It is noted that $c(y_i, y_j) \ne c(y_j, y_i)$ usually holds. The starting node of the directed path is the fuzzy term that we suggest decision makers learn first. The MST is used to generate learning sequences and is briefly introduced as follows.

Algorithm. The MST method

Input: A directed graph with an expansion table $T_0$. The element $(y_i, y_j)$ of $T_0$ stores the learning cost $c(y_i, y_j)$ directly from node $y_i$ to node $y_j$. Initially, no columns of $T_0$ are crossed out and an integer number, z, is set to zero.

Output: The minimum spanning table $ST_0$ and the corresponding spanning tree.

Method:

Step 1. Selecting and marking procedure. Select the smallest element $c(y_i, y_j)$ ($i \ne j$) among the remaining not-crossed-out columns of expansion table $T_z$, and mark it. Note that an element with undefined cost cannot be selected.

Step 2. Cycle detecting procedure. Determine whether a cycle has been formed among the marked elements; if so, go to Step 5.


Step 3. Crossing out procedure. Cross out the column to which the newly selected marked element belongs.

Step 4. Stopping rule. If only one not-crossed-out column is left, then the minimum spanning table $ST_z$ can be constructed; go to Step 6. Otherwise, go to Step 1.

Step 5. Compressing procedure. Compress the nodes in the detected cycle C into a single node, x. Define the transformation equations as follows:

$$c(x, y_i) = \min\{c(y, y_i) \mid y \in x\} \tag{8}$$

$$c(y_i, x) = \min\{c(y_i, y) + c(y_s, y_t) - c(y_a, y) \mid y \in x\} \tag{9}$$

where $c(y_s, y_t)$ is the largest cost in C, and $(y_a, y)$ is a marked element. Set z to z + 1 so that a new expansion table $T_z$ is constructed, and then go to Step 1.

Step 6. Unfolding procedure. From the minimum spanning table of $T_z$, the minimum spanning tables of $T_z, T_{z-1}, \ldots, T_0$ can be generated. Note that $ST_{z-1}$ produced by the unfolding procedure is the minimum spanning table of $T_{z-1}$.

The MST stops in a finite number of steps, at which point an optimal expansion has been obtained from a spanning tree with a minimal total degree of learning difficulty. By treating C1 as an aggregate skill that has already been acquired by decision makers and that corresponds to a node, say $\alpha$, the learning cost from $\alpha$ to a fuzzy term in C2, corresponding to a node, say $y_i$, is defined as follows:

$$c(\alpha, y_i) = \min\{c(y, y_i) \mid y \in C1\} \tag{10}$$

$$c(y_i, C1) \text{ is not defined} \tag{11}$$

Below, we use a numerical example to demonstrate the usefulness of the proposed method, which combines fuzzy sequential pattern mining and competence set expansion for decision making.
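Steps 1–3 of the algorithm can be sketched as a loop that repeatedly marks the cheapest remaining element and checks whether the marked elements close a cycle. The code below is only an illustration of that selection loop, not the full MST method of Feng and Yu [8]: it stops at the first cycle and leaves the compressing and unfolding procedures out. The function name `select_until_cycle`, the dictionary encoding of the expansion table, and the small cost table in the usage part are hypothetical.

```python
def select_until_cycle(cost):
    """cost[(u, v)] is the learning cost from u to v; absent keys are undefined.

    Returns (marked, cycle). marked maps each crossed-out column v to the row u
    of its marked element; cycle lists the nodes of the first cycle found, or
    is None if the stopping rule is reached without a cycle.
    """
    nodes = {n for edge in cost for n in edge}
    marked = {}
    while len(nodes) - len(marked) > 1:          # Step 4: stop at one column left
        # Step 1: smallest element among the columns not yet crossed out.
        candidates = [(c, u, v) for (u, v), c in cost.items()
                      if v not in marked and u != v]
        if not candidates:
            break
        _, u, v = min(candidates)                # ties are broken arbitrarily
        marked[v] = u                            # Step 3 is recorded here as well
        # Step 2: walk back along marked elements; reaching v again is a cycle.
        path, node = [v], u
        while node in marked and node != v:
            path.append(node)
            node = marked[node]
        if node == v:
            return marked, path                  # Step 5 would compress these nodes
    return marked, None


if __name__ == "__main__":
    # Hypothetical costs on three nodes; the two cheapest elements form a cycle.
    cost = {("a", "b"): 1, ("b", "a"): 2, ("a", "c"): 3,
            ("b", "c"): 4, ("c", "a"): 5, ("c", "b"): 6}
    print(select_until_cycle(cost))
```

Because every column receives exactly one marked element, the marked elements define at most one predecessor per node, which is what makes the backward walk in Step 2 sufficient for detecting a cycle.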

5. Numerical example

A database relation, BOUGHT, with 10 tuples $t_p^{(r)}$ ($1 \le r \le 5$, $a_1 = 2$, $a_2 = 3$, $a_3 = 1$, $a_4 = 3$, $a_5 = 1$) is given in Table 1, where an asterisk denotes that the product was not purchased in that transaction. We first employ the method for fuzzy sequential pattern mining to discover fuzzy sequential patterns from

Record

Transaction time

Product 1

Product 2

ð1Þ

04/10/02

*

*

t2

ð1Þ

05/11/02

*

*

ð2Þ t1 ð2Þ t2 ð2Þ t3 ð3Þ t1 ð4Þ t1 ð4Þ t2 ð4Þ t3 ð5Þ t1

04/12/02

6

10

04/25/02

*

*

06/01/02

*

*

1

05/02/02

*

04/05/02

*

04/29/02

*

06/02/02

*

*

05/20/02

*

*

t1

Product 3 5

* 6

Product 5

Product 6

Product 7

Product 8

Product 9 *

*

*

*

*

*

*

*

*

*

*

9

*

*

*

*

*

*

8

*

*

*

14

*

*

*

12

6

*

*

8

Product 4

9

8

6

7

*

10

*

9

*

*

15

*

*

*

12

*

*

*

10

*

*

10

*

*

*

*

*

*

*

12

*

*

*

*

*

5

4 *

Table 1. Table BOUGHT sorted by transaction time for each customer

BOUGHT. Subsequently, two-stage learning sequences are derived from those fuzzy terms found in the mining results. For simplicity, some columns or rows of the following tables are omitted and indicated by ‘‘. . .’’.

Fuzzy sequential pattern mining

There are nine quantitative attributes, and each quantitative attribute, ranging from 0 to 20, is partitioned into three linguistic values (i.e., K = 3) by the simple fuzzy partition method. Therefore, the fuzzy subsets defined in the individual partitions can be interpreted linguistically. For Product m (m = 1, ..., 9):

$A^{\text{Product } m}_{3,1}$: small
$A^{\text{Product } m}_{3,2}$: medium
$A^{\text{Product } m}_{3,3}$: large

Then, we employ the proposed method to find fuzzy sequential patterns from BOUGHT by specifying min FS to be 0.21. The detailed computation process is omitted for simplicity. Six frequent fuzzy 1-sequences and six fuzzy sequential patterns are found, as shown in Table 2. As in Examples 1 and 4, managers can set up possible strategies for promoting products or improving services for the various fuzzy terms in C1 or C2. For example, $\langle A^{\text{Product 3}}_{3,2}, A^{\text{Product 3}}_{3,1} \rangle$ in C2 indicates that small purchase amounts of Product 3 are likely to be bought by customers the next time after they have bought medium purchase amounts of Product 3. Those customers, who can be found in databases by a query language such as SQL, may be attracted to buy more of Product 3 when it is on sale. Of course, managers can also analyze the possible reasons why small purchase amounts of Product 3 are likely to be bought.

Table 2. Fuzzy terms in C1 or C2 with individual labels in directed graph

Node   Fuzzy term                                                                  Fuzzy support
C1
α11    $A^{\text{Product 3}}_{3,1}$                                                0.46
α12    $A^{\text{Product 7}}_{3,3}$                                                0.28
α13    $A^{\text{Product 9}}_{3,1}$                                                0.28
α14    $A^{\text{Product 9}}_{3,2}$                                                0.48
α15    $A^{\text{Product 2}}_{3,2} \times A^{\text{Product 3}}_{3,2}$              0.29
α16    $A^{\text{Product 3}}_{3,2} \times A^{\text{Product 7}}_{3,2}$              0.30
C2
α21    $\langle A^{\text{Product 7}}_{3,3} \rangle$                                0.28
α22    $\langle A^{\text{Product 9}}_{3,1} \rangle$                                0.28
α23    $\langle A^{\text{Product 9}}_{3,2} \rangle$                                0.48
α24    $\langle A^{\text{Product 2}}_{3,2} \times A^{\text{Product 3}}_{3,2} \rangle$   0.29
α25    $\langle A^{\text{Product 3}}_{3,2} \times A^{\text{Product 7}}_{3,2} \rangle$   0.30
α26    $\langle A^{\text{Product 3}}_{3,2}, A^{\text{Product 3}}_{3,1} \rangle$    0.22

Deriving two-stage learning sequences

Subsequently, we employ the MST to expand the competence set by deriving learning sequences. At first, the fuzzy terms in C1 are expanded by using an expansion table T10 with hypothetical learning costs, as shown in Table 3. By the MST, c(α12, α11) is selected first, and because no cycle is detected, column α11 is crossed out, as shown by a vertical dashed line. Then, for simplicity, we select c(α12, α14), c(α16, α15) and c(α15, α16), and omit other alternatives. However, a cycle α15 → α16 → α15, shown in Table 4, is detected, and these nodes are compressed into a single node α17 = {α15, α16}. Table 4 is rearranged and transformed by (8) and (9) into a new expansion table T11, as shown in Table 5. From Table 6, we can see that c(α12, α11), c(α12, α14), c(α12, α17) and c(α11, α13) are selected and no cycle is detected. At this point, column α12 is the only column that remains and the stopping rule is satisfied. Subsequently, using the unfolding procedure, the minimum spanning table ST10 of T10 is obtained as shown in Table 7, and the corresponding minimum spanning tree is depicted in Fig. 3. Of course, there are other minimum spanning trees, but we omit them for simplicity.

Table 3. Expansion table T10 (rows: starting node; columns: destination node)

        α11    α12    α13    α14    α15    α16
α11     *      0.60   0.60   0.40   0.60   0.60
α12     0.40   *      1.0    0.40   1.0    0.60
α13     0.80   0.80   *      0.80   0.80   0.80
α14     0.50   0.80   0.80   *      0.80   0.80
α15     0.56   0.60   0.60   0.56   *      0.52
α16     0.63   0.63   0.63   0.63   0.44   *

Table 4. Select c(α15, α16), and then detect a cycle α15 → α16 → α15

        α11    α12    α13    α14    α15    α16
α11     *      0.60   0.60   0.40   0.60   0.60
α12     0.40   *      1.0    0.40   1.0    0.60
α13     0.80   0.80   *      0.80   0.80   0.80
α14     0.50   0.80   0.80   *      0.80   0.80
α15     0.56   0.60   0.60   0.56   *      0.52
α16     0.63   0.63   0.63   0.63   0.44   *

Table 5. Construct T11 from Table 4

        α11    α12    α13    α14    α17
α11     *      0.60   0.60   0.40   0.60
α12     0.40   *      1.0    0.40   0.60
α13     0.80   0.80   *      0.80   0.80
α14     0.50   0.80   0.80   *      0.80
α17     0.56   0.60   0.60   0.56   *

Table 6. Minimum spanning table ST11 of T11

        α11    α12    α13    α14    α17
α11     *      0.60   0.60   0.40   0.60
α12     0.40   *      1.0    0.40   0.60
α13     0.80   0.80   *      0.80   0.80
α14     0.50   0.80   0.80   *      0.80
α17     0.56   0.60   0.60   0.56   *

Table 7. Minimum spanning table ST10 of T10

        α11    α12    α13    α14    α15    α16
α11     *      0.60   0.60   0.40   0.60   0.60
α12     0.40   *      1.0    0.40   1.0    0.60
α13     0.80   0.80   *      0.80   0.80   0.80
α14     0.50   0.80   0.80   *      0.80   0.80
α15     0.56   0.60   0.60   0.56   *      0.52
α16     0.63   0.63   0.63   0.63   0.44   *

At the second stage, we obtain the learning sequence from C2 by viewing C1 as an aggregate skill, corresponding to α18, that has been acquired by decision makers. An expansion table T20 with hypothetical learning costs is constructed as Table 8. We stress that the learning cost from α18 to the other nodes in C2 is computed by Eqs. (10) and (11). For simplicity, the complete process for deriving the learning sequences is omitted. The minimum spanning tree is depicted in Fig. 4. The minimum total cost of two-stage learning difficulty is 5.12.
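The compression that turns Table 4 into Table 5 can be retraced with a short sketch of Eqs. (8) and (9). This is an illustration only: the function `compress` and the ASCII node names (`a11` for α11, and so on) are chosen here, and the orientation of the cost table, with each list holding the costs into one destination node, is a reading of the extracted Table 3 that is consistent with the selections described in the text.

```python
NODES = ["a11", "a12", "a13", "a14", "a15", "a16"]

# TO[v][i] = c(NODES[i], v): cost of learning v directly after NODES[i];
# None marks an undefined cost. Values are read from Table 3.
TO = {
    "a11": [None, 0.40, 0.80, 0.50, 0.56, 0.63],
    "a12": [0.60, None, 0.80, 0.80, 0.60, 0.63],
    "a13": [0.60, 1.00, None, 0.80, 0.60, 0.63],
    "a14": [0.40, 0.40, 0.80, None, 0.56, 0.63],
    "a15": [0.60, 1.00, 0.80, 0.80, None, 0.44],
    "a16": [0.60, 0.60, 0.80, 0.80, 0.52, None],
}
cost = {(u, v): c for v, col in TO.items()
        for u, c in zip(NODES, col) if c is not None}


def compress(cost, marked, new_node):
    """Merge a detected cycle into new_node.

    marked maps every node of the cycle to the predecessor of its marked element.
    """
    cycle = set(marked)
    largest = max(cost[(u, y)] for y, u in marked.items())  # c(y_s, y_t)
    out = {}
    for (u, v), c in cost.items():
        if u in cycle and v in cycle:
            continue                                  # elements inside the cycle vanish
        if u in cycle:                                # Eq. (8)
            key, val = (new_node, v), c
        elif v in cycle:                              # Eq. (9)
            key, val = (u, new_node), c + largest - cost[(marked[v], v)]
        else:
            key, val = (u, v), c
        out[key] = min(val, out.get(key, val))
    return out


# Marked elements c(a16, a15) = 0.44 and c(a15, a16) = 0.52 form the cycle.
t11 = compress(cost, {"a15": "a16", "a16": "a15"}, "a17")
others = ["a11", "a12", "a13", "a14"]
print([round(t11[("a17", v)], 2) for v in others])   # costs from a17
print([round(t11[(u, "a17")], 2) for u in others])   # costs into a17
```

The two printed lists correspond to the α17 row and the α17 column of Table 5; all other entries are carried over unchanged from Table 3.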

Fig. 3. Minimum spanning tree in first learning stage.

Table 8. Expansion table T20 (rows: starting node; columns: destination node)

        α18    α21    α22    α23    α24    α25    α26
α18     *      0.40   0.40   0.40   0.44   0.48   0.56
α21     *      *      1.0    0.40   1.0    0.60   0.70
α22     *      0.80   *      0.80   0.80   0.80   0.80
α23     *      0.80   0.80   *      0.80   0.80   0.80
α24     *      0.60   0.60   0.56   *      0.52   0.56
α25     *      0.63   0.63   0.63   0.44   *      0.63
α26     *      0.60   0.60   0.40   0.30   0.52   *

Fig. 4. Minimum spanning tree corresponding to Table 8.

The learning sequence shown in Fig. 3 indicates that $A^{\text{Product 7}}_{3,3}$ (i.e., node α12) is the fuzzy term that we suggest decision makers learn first. Subsequently, $A^{\text{Product 3}}_{3,1}$ (i.e., node α11), $A^{\text{Product 9}}_{3,2}$ (i.e., node α14), and $A^{\text{Product 3}}_{3,2} \times A^{\text{Product 7}}_{3,2}$ (i.e., node α16) are suggested to be learnt simultaneously after $A^{\text{Product 7}}_{3,3}$ has been learnt. In addition, the learning sequence shown in Fig. 4 indicates that $\langle A^{\text{Product 7}}_{3,3} \rangle$ (i.e., node α21), $\langle A^{\text{Product 9}}_{3,1} \rangle$ (i.e., node α22), $\langle A^{\text{Product 9}}_{3,2} \rangle$ (i.e., node α23), and $\langle A^{\text{Product 3}}_{3,2} \times A^{\text{Product 7}}_{3,2} \rangle$ (i.e., node α25) are suggested to be learnt simultaneously after C1 has been learnt. It is noted that learning sequences with minimum costs are not unique. Decision makers should select one of the alternatives for acquiring the competence set according to their past experience or personal preferences. The experimental results demonstrate the usefulness of the proposed method, and also show that fuzzy sequential pattern mining combined with competence set expansion can help decision makers solve decision problems and make better decisions.
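The reported total of 5.12 can be retraced by adding up the costs of the tree edges. The worked sum below rests on two assumptions: the first-stage edges are c(α12, α11), c(α12, α14), c(α12, α16), c(α16, α15) and c(α11, α13), with costs read from Table 3, and the second-stage costs are the six edge labels recoverable from Fig. 4.

```latex
\begin{align*}
\text{stage 1:}\quad & 0.40 + 0.40 + 0.60 + 0.44 + 0.60 = 2.44 \\
\text{stage 2:}\quad & 0.40 + 0.40 + 0.40 + 0.48 + 0.44 + 0.56 = 2.68 \\
\text{total:}\quad   & 2.44 + 2.68 = 5.12
\end{align*}
```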

6. Discussions and further topics Fuzzy knowledge discovered in the fuzzy sequential pattern mining can be viewed as a competence set of one decision problem, and the acquisition of two-stage learning sequences, consisting of C1 and C2, with minimum learning costs is the focus of this paper. In addition to the aforementioned descriptions in previous sections, many further topics should be discussed. First, the meaning of the fuzzy terms of the quantitative attribute xl can be changed by a linguistic hedge [13] such as ‘‘very’’ or ‘‘more or less’’. For example, 0

l l l Þ ¼ very AxK;i ¼ ðAxK;i Þ ðAxK;i l l l

2

ð12Þ

00

l l l ðAxK;i Þ ¼ more or less AxK;i ¼ ðAxK;i Þ l l l

0

1=2

ð13Þ 00

2

1=2

The membership functions of $(A^{x_l}_{K,i_l})'$ and $(A^{x_l}_{K,i_l})''$ are $[\mu^{x_l}_{K,i_l}(x)]^{2}$ and $[\mu^{x_l}_{K,i_l}(x)]^{1/2}$, respectively. Defining linguistic terms with hedges in each quantitative attribute makes the fuzzy terms discovered from databases friendlier and more flexible for decision makers.

Additionally, the number of linguistic values defined for each quantitative attribute need not equal K. For example, $x_1$ and $x_2$ may be divided into three and five partitions, respectively, so that a frequent fuzzy grid such as $A^{x_1}_{3,1} \times A^{x_2}_{5,2}$ may be generated. As a result, many more useful terms may be discovered from databases.

In Table 2, we also find duplicate terms in C1 and C2. For example, $A^{\mathrm{Product\,7}}_{3,3}$ and $\langle A^{\mathrm{Product\,7}}_{3,3} \rangle$ actually express the same meaning. Therefore, an additional procedure that eliminates the terms in C2 that have already appeared in C1 should be considered.

As mentioned in Section 4, learning one skill directly from another incurs a learning cost, which can be measured in time or money. For example, a student may spend one year acquiring "data structures" after having acquired "introduction to computer science". However, it seems impossible to measure learning costs exactly in either money or time, since how much money or time is spent is determined by the decision makers. A method for acquiring learning costs based on the relationships between two single skills has been proposed by Hu et al. [10]. The minimum learning cost can thus be regarded as the minimum degree of difficulty in acquiring all fuzzy terms. On the other hand, Hu et al. [20] also proposed an acquisition method using fuzzy rough sets [21,22]. We consider it feasible to use either of these two methods to compute the learning cost between any two fuzzy terms in C1 or C2.
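As a minimal illustration of the linguistic hedges in Eqs. (12) and (13), the following Python sketch applies "very" and "more or less" to a single membership degree. The evenly spaced triangular membership function, the unit interval [0, 1], and the names `triangular_membership`, `very`, and `more_or_less` are assumptions made for this example only; they are not taken from this section.

```python
import math


def triangular_membership(x, K, i, lo=0.0, hi=1.0):
    """Degree to which x belongs to the i-th of K linguistic values.

    Assumption: the K fuzzy sets are symmetric triangles whose peaks are
    evenly spaced on [lo, hi]; this shape is illustrative only.
    """
    if K < 2 or not 1 <= i <= K:
        raise ValueError("need K >= 2 and 1 <= i <= K")
    peak = lo + (hi - lo) * (i - 1) / (K - 1)   # centre of the i-th triangle
    width = (hi - lo) / (K - 1)                 # half-base of each triangle
    return max(1.0 - abs(x - peak) / width, 0.0)


def very(mu):
    """Hedge of Eq. (12): square the membership degree."""
    return mu ** 2


def more_or_less(mu):
    """Hedge of Eq. (13): take the square root of the membership degree."""
    return math.sqrt(mu)


# Third of three linguistic values, evaluated at x = 0.75.
mu = triangular_membership(0.75, K=3, i=3)
print(mu, very(mu), more_or_less(mu))
```

Because membership degrees lie in [0, 1], "very" can only lower a degree and "more or less" can only raise it, which matches the intuition that "very large" is a stricter term than "large".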

Acknowledgements

We would like to thank the anonymous referees for their valuable comments and constructive suggestions.

References

[1] J.W. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, New York, 2001.
[2] R. Agrawal, R. Srikant, Mining sequential patterns, in: Proceedings of the 11th International Conference on Data Engineering, Taipei, Taiwan, 1995, pp. 3–14.
[3] M. Spiliopoulou, Web usage mining for Web site evaluation, Communications of the ACM 43 (8) (2000) 127–134.
[4] Y.C. Hu, R.S. Chen, G.H. Tzeng, J.H. Shieh, A fuzzy data mining algorithm for finding sequential patterns, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 11 (2) (2003) 173–193.
[5] P.L. Yu, Forming Winning Strategies: An Integrated Theory of Habitual Domains, Springer Verlag, Berlin, 1990.
[6] P.L. Yu, D. Zhang, A foundation for competence set analysis, Mathematical Social Sciences 20 (1990) 251–299.
[7] H.L. Li, P.L. Yu, Optimal competence set expansion using deduction graph, Journal of Optimization Theory and Applications 80 (1) (1994) 75–91.
[8] J.W. Feng, P.L. Yu, Minimum spanning table and optimal expansion of competence set, Journal of Optimization Theory and Applications 99 (3) (1998) 655–679.
[9] H.L. Li, Incorporating competence sets of decision makers by deduction graphs, Operations Research 47 (1999) 209–220.
[10] Y.C. Hu, R.S. Chen, G.H. Tzeng, Y.J. Chiu, Acquisition of compound skills and learning costs for expanding competence sets, Computers and Mathematics with Applications, in press.
[11] H. Ishibuchi, K. Nozaki, N. Yamamoto, H. Tanaka, Selecting fuzzy if-then rules for classification problems using genetic algorithms, IEEE Transactions on Fuzzy Systems 3 (3) (1995) 260–270.
[12] H. Ishibuchi, T. Nakashima, T. Murata, Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems, IEEE Transactions on Systems, Man, and Cybernetics 29 (5) (1999) 601–618.
[13] L.A. Zadeh, Fuzzy sets, Information and Control 8 (3) (1965) 338–353.
[14] L.A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning, Information Sciences (part 1) 8 (3) (1975) 199–249; (part 2) 8 (4) (1975) 301–357; (part 3) 9 (1) (1976) 43–80.
[15] S.M. Chen, W.T. Jong, Fuzzy query translation for relational database systems, IEEE Transactions on Systems, Man, and Cybernetics 27 (4) (1997) 714–721.
[16] H. Ishibuchi, T. Yamamoto, T. Nakashima, Fuzzy data mining: effect of fuzzy discretization, in: Proceedings of the First IEEE International Conference on Data Mining, San Jose, USA, 2001, pp. 241–248.
[17] Y.C. Hu, R.S. Chen, G.H. Tzeng, Finding fuzzy classification rules using data mining techniques, Pattern Recognition Letters 24 (2003) 517–527.
[18] H.-J. Zimmermann, Fuzzy Set Theory and its Applications, Kluwer, Boston, 1996.
[19] D.S. Shi, P.L. Yu, Optimal expansion and design of competence sets with asymmetric acquiring costs, Journal of Optimization Theory and Applications 88 (1) (1996) 643–658.
[20] Y.C. Hu, G.H. Tzeng, R.S. Chen, Discovering fuzzy concepts for expanding competence set, in: Proceedings of the Second International Symposium on Advanced Intelligent Systems, Daejeon, Korea, 2001, pp. 396–401.
[21] S. Bodjanova, Approximation of fuzzy concepts in decision making, Fuzzy Sets and Systems 85 (1) (1997) 23–29.
[22] T.Y. Lin, Topological and fuzzy rough sets, in: R. Slowinski (Ed.), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Kluwer, Boston, 1992, pp. 287–304.