Fast discovery of sequential patterns in large databases using effective time-indexing
Information Sciences (2008) 4228-4245
Ming-Yen Lin, Suh-Yin Lee and Sheng-Shun Wang
National Chiao Tung University, Taiwan

Advisor: Prof. Huang, Jen-Peng
Student: TU, JING-GUO

Outline
Introduction
Related work
Definition
An example
Performance analysis and experimental evaluation
Conclusions

 



Introduction

When no time constraints are specified between the elements of a sequential pattern, uninteresting patterns may appear. For example, without a maximum time gap one may find the pattern < (b, d, e) (a, f) >, which means that an itemset containing a and f occurs after an itemset containing b, d, and e. However, the pattern may be insignificant if the interval between the two itemsets is too long, for example several months.

[Figure: purchase timeline (axis marks 1 to 5 and 100): pc, printer, and, after an unspecified gap, ink and paper.]


Related work

Sequential pattern mining: GSP (Apriori-based), DELISP

Definition

Definition 1 (frequent item). An item x is a frequent item in a sequence database DB if the support of the 1-sequence < (x) > is greater than or equal to minsup.

Definition 2 (type-1, type-2, prefix, stem).

Pattern        Type
< (a) (b) >    Type-1
< (a, b) >     Type-2

In both patterns the prefix is (a) and the stem is b: a type-1 pattern appends the stem as a new element, while a type-2 pattern adds the stem to the last element of the prefix.

Definition 3 (it, lst, let). Each occurrence of a pattern in a data sequence is recorded in the pattern's time-index (TIdx) as an entry [x:y:z] = [it:lst:let], where it is the initial time (start of the first element), lst is the last start-time (start of the last element), and let is the last end-time (end of the last element).

Transaction   Sequence                          Itemset   TIdx
T1            < 1(a) 2(b) 9(d) 15(c) >          (a)       [1:1:1]
T2            < 1(a) 2(b) 9(d) 15(c) 21(a) >    (a)       [1:1:1, 21:21:21]
              < 1(a) 2(b) 9(d) 15(c) >          (a)(b)    [1:2:2]
              < 1(a) 2(b) 9(d) 25(c) 28(a) >    (a)(c)    [1:25:25]
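An entry [it:lst:let] can be derived directly from the times at which one occurrence of a pattern was matched. The sketch below is a minimal illustration, not the paper's implementation. The names `TimeIndex` and `time_index`, as well as the list-of-times input format, are hypothetical. An element may list several times because a sliding window lets one element span several transactions.

```python
from typing import NamedTuple, Sequence


class TimeIndex(NamedTuple):
    """One [it:lst:let] entry of a pattern's time-index (hypothetical helper)."""
    it: int   # initial time: start time of the first element
    lst: int  # last start-time: start time of the last element
    let: int  # last end-time: end time of the last element


def time_index(element_times: Sequence[Sequence[int]]) -> TimeIndex:
    """Build the entry for one occurrence of a pattern.

    element_times[k] lists the transaction times at which the items of the
    k-th pattern element were matched (assumed input format); an element has
    several times only when a sliding window merges nearby transactions.
    """
    if not element_times or any(len(times) == 0 for times in element_times):
        raise ValueError("every element needs at least one matched time")
    first, last = element_times[0], element_times[-1]
    return TimeIndex(it=min(first), lst=min(last), let=max(last))


# (a)(b) in < 1(a) 2(b) 9(d) 15(c) >: a matched at 1, b at 2
print(time_index([[1], [2]]))   # TimeIndex(it=1, lst=2, let=2)
# (a)(c) in < 1(a) 2(b) 9(d) 25(c) 28(a) >: a at 1, c at 25
print(time_index([[1], [25]]))  # TimeIndex(it=1, lst=25, let=25)
```

With the same helper, a single element built from c at 3 and a at 5 (allowed when swin = 2) gives `time_index([[3, 5]])`, that is, [3:3:5].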

Time constraints
swin = sliding time-window
mingap = minimum time gap
maxgap = maximum time gap
duration = time window that the whole pattern must fit in

Lemma 1 (type-1). For the i-th TIdx entry of a pattern, with initial time $it_i$, last start-time $lst_i$ and last end-time $let_i$, the items for type-1 extensions are searched in the valid time period (VTP)

$let_i + mingap \le VTP \le lst_i + maxgap$

Example: < (b) (e) > in C2 = < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >, TIdx [10:17:17], with mingap = 3, maxgap = 15 and duration = 25:
lower bound: let + mingap = 17 + 3 = 20
upper bound: lst + maxgap = 17 + 15 = 32 (the timeline also marks it + duration = 10 + 25 = 35)
so 20 ≤ VTP ≤ 32, and the only transaction in this period is 24(c,d).
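The bounds in Lemma 1 are simple arithmetic on a TIdx entry. The sketch below reproduces the < (b) (e) > example. It follows the lemma exactly as written on the slide, so it does not apply the duration bound that the timeline also marks. The function names and the (time, itemset) representation of a data sequence are hypothetical.

```python
from typing import List, Set, Tuple

# Assumed layout: C2 = < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) > as (time, itemset) pairs
C2: List[Tuple[int, Set[str]]] = [
    (6, {"a", "c"}), (10, {"b"}), (17, {"e"}), (18, {"a"}), (24, {"c", "d"}),
]


def type1_vtp(lst: int, let: int, mingap: int, maxgap: int) -> Tuple[int, int]:
    """Lemma 1 as written on the slide: let + mingap <= VTP <= lst + maxgap."""
    return let + mingap, lst + maxgap


def transactions_in(seq: List[Tuple[int, Set[str]]], lo: int, hi: int) -> List[Tuple[int, List[str]]]:
    """Transactions whose time lies in the closed period [lo, hi], items sorted."""
    return [(t, sorted(items)) for t, items in seq if lo <= t <= hi]


# < (b)(e) > in C2 has TIdx [10:17:17]; mingap = 3, maxgap = 15
lo, hi = type1_vtp(lst=17, let=17, mingap=3, maxgap=15)
print(lo, hi)                        # 20 32
print(transactions_in(C2, lo, hi))   # [(24, ['c', 'd'])]
```

The items of the returned transactions, c and d, are the candidate stems for the type-1 patterns < (b)(e)(c) > and < (b)(e)(d) >.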

Lemma 2 (type-2). For the i-th TIdx entry, the items for type-2 extensions are searched in the valid time period

$let_i - swin \le VTP \le \min\{lst_i + swin,\ it_i + duration\}$

Example: < (b) (e) > in C2 = < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >, TIdx [10:17:17].

[Figure: timeline of C2 marking times 6, 10, 17, 24 and it + duration = 10 + 25 = 35.]
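Lemma 2 can be sketched in the same way. The slide does not state swin for this example, so the code assumes swin = 2 and duration = 25, the values used in the worked example that follows. The function names are hypothetical. The sketch does not impose any ordering on the items inside an element.

```python
from typing import List, Set, Tuple

# Assumed layout: C2 = < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) > as (time, itemset) pairs
C2: List[Tuple[int, Set[str]]] = [
    (6, {"a", "c"}), (10, {"b"}), (17, {"e"}), (18, {"a"}), (24, {"c", "d"}),
]


def type2_vtp(it: int, lst: int, let: int, swin: int, duration: int) -> Tuple[int, int]:
    """Lemma 2: let - swin <= VTP <= min(lst + swin, it + duration)."""
    return let - swin, min(lst + swin, it + duration)


def type2_candidates(seq: List[Tuple[int, Set[str]]], lo: int, hi: int,
                     last_element: Set[str]) -> List[str]:
    """Items inside [lo, hi] that are not already in the prefix's last element."""
    found: Set[str] = set()
    for t, items in seq:
        if lo <= t <= hi:
            found |= items
    return sorted(found - last_element)


# < (b)(e) > in C2, TIdx [10:17:17]; swin = 2 and duration = 25 are assumed
lo, hi = type2_vtp(it=10, lst=17, let=17, swin=2, duration=25)
print(lo, hi)                               # 15 19
print(type2_candidates(C2, lo, hi, {"e"}))  # ['a']
```

Here min{17 + 2, 10 + 25} = 19, so only 17(e) and 18(a) fall in the period, and a is the one new candidate for the type-2 pattern < (b)(e, a) >.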

An example

min_sup = 2

Tran. ID   Sequence
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >

Item   Support
a      3
b      3
c      3
d      3
e      3
f      1
g      1
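The support column counts the data sequences that contain each item at least once. The minimal sketch below recomputes it. It stores each transaction as a (time, items) pair with one character per item; this layout and the helper name are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed layout: (time, items) pairs, one character per item
DB: Dict[str, List[Tuple[int, str]]] = {
    "C1": [(3, "c"), (5, "af"), (18, "b"), (31, "a"), (45, "f")],
    "C2": [(6, "ac"), (10, "b"), (17, "e"), (18, "a"), (24, "cd")],
    "C3": [(1, "b"), (20, "bg"), (27, "e"), (34, "dg"), (35, "g")],
    "C4": [(5, "a"), (10, "d"), (21, "cd"), (26, "e")],
}
MIN_SUP = 2


def item_supports(db: Dict[str, List[Tuple[int, str]]]) -> Counter:
    """Support of an item = number of data sequences containing it at least once."""
    support: Counter = Counter()
    for seq in db.values():
        # A set, so an item repeated inside one sequence is counted once
        support.update({item for _, items in seq for item in items})
    return support


support = item_supports(DB)
print(sorted(support.items()))
# [('a', 3), ('b', 3), ('c', 3), ('d', 3), ('e', 3), ('f', 1), ('g', 1)]
print(sorted(item for item, n in support.items() if n >= MIN_SUP))
# ['a', 'b', 'c', 'd', 'e']
```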

Time constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

Tran. ID   Sequence                                  a-TIdx
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >         [5:5:5, 31:31:31]
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >      [6:6:6, 18:18:18]
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >              [5:5:5]
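For a single item, the first and last element coincide, so every transaction containing the item contributes an entry [t:t:t]. The sketch below rebuilds the a-TIdx column. It uses the same assumed (time, items) layout, and the helper name is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout: (time, items) pairs, one character per item
DB: Dict[str, List[Tuple[int, str]]] = {
    "C1": [(3, "c"), (5, "af"), (18, "b"), (31, "a"), (45, "f")],
    "C2": [(6, "ac"), (10, "b"), (17, "e"), (18, "a"), (24, "cd")],
    "C3": [(1, "b"), (20, "bg"), (27, "e"), (34, "dg"), (35, "g")],
    "C4": [(5, "a"), (10, "d"), (21, "cd"), (26, "e")],
}


def single_item_tidx(db: Dict[str, List[Tuple[int, str]]],
                     item: str) -> Dict[str, List[Tuple[int, int, int]]]:
    """[t:t:t] for every transaction containing `item`; sequences without it are omitted."""
    tidx: Dict[str, List[Tuple[int, int, int]]] = {}
    for sid, seq in db.items():
        entries = [(t, t, t) for t, items in seq if item in items]
        if entries:
            tidx[sid] = entries
    return tidx


print(single_item_tidx(DB, "a"))
# {'C1': [(5, 5, 5), (31, 31, 31)], 'C2': [(6, 6, 6), (18, 18, 18)], 'C4': [(5, 5, 5)]}
```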

Extending item a in C1 = < 3(c) 5(a,f) 18(b) 31(a) 45(f) >, a-TIdx [5:5:5, 31:31:31] (swin = 2, mingap = 3, maxgap = 15, duration = 25)

Entry [5:5:5] (it + duration = 5 + 25 = 30):
1. Type-1, $let_i + mingap \le VTP \le lst_i + maxgap$: 8 ≤ VTP ≤ 20, which contains 18(b): count 1 for (a)(b).
2. Type-2, $let_i - swin \le VTP \le \min\{lst_i + swin,\ it_i + duration\}$: 3 ≤ VTP ≤ min{7, 30} = 7, which contains 3(c) and 5(a,f): count 1 each for (a,c) and (a,f).

Entry [31:31:31] (it + duration = 31 + 25 = 56):
1. Type-1: 34 ≤ VTP ≤ 46, which contains 45(f): count 1 for (a)(f).
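The C1 walkthrough can be reproduced by applying both lemmas to every TIdx entry and collecting the items found. The sketch below does this. The function names and the (time, items) layout are assumptions. Type-1 follows Lemma 1 as stated on the slide. Candidates are kept in sets because support counts data sequences, so a stem counts at most once per sequence.

```python
from typing import List, Set, Tuple

# Assumed layout: C1 = < 3(c) 5(a,f) 18(b) 31(a) 45(f) > as (time, items) pairs
C1: List[Tuple[int, str]] = [(3, "c"), (5, "af"), (18, "b"), (31, "a"), (45, "f")]
SWIN, MINGAP, MAXGAP, DURATION = 2, 3, 15, 25


def items_between(seq: List[Tuple[int, str]], lo: int, hi: int) -> Set[str]:
    """All items in transactions whose time lies in [lo, hi]."""
    return {i for t, items in seq if lo <= t <= hi for i in items}


def extension_items(seq: List[Tuple[int, str]], tidx: List[Tuple[int, int, int]],
                    last_element: Set[str]) -> Tuple[List[str], List[str]]:
    """Type-1 and type-2 candidate stems of one data sequence, over all TIdx entries."""
    type1: Set[str] = set()
    type2: Set[str] = set()
    for it, lst, let in tidx:
        type1 |= items_between(seq, let + MINGAP, lst + MAXGAP)                  # Lemma 1 as stated
        type2 |= items_between(seq, let - SWIN, min(lst + SWIN, it + DURATION))  # Lemma 2
    # Items already in the prefix's last element cannot extend it again
    return sorted(type1), sorted(type2 - last_element)


print(extension_items(C1, [(5, 5, 5), (31, 31, 31)], {"a"}))
# (['b', 'f'], ['c', 'f'])
```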

Extending item a in C2 = < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >, a-TIdx [6:6:6, 18:18:18]

Entry [6:6:6]:
1. Type-1: 9 ≤ VTP ≤ 21, which contains 10(b), 17(e) and 18(a): count 1 each for (a)(b), (a)(e) and (a)(a).
2. Type-2: 4 ≤ VTP ≤ 8, which contains 6(a,c): count 1 for (a,c).

Entry [18:18:18]:
1. Type-1: 21 ≤ VTP ≤ 33, which contains 24(c,d): count 1 each for (a)(c) and (a)(d).
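The per-entry steps for C2 can be traced mechanically by printing each period and the transactions that fall inside it. In this sketch the helper name and data layout are hypothetical, and type-1 again follows Lemma 1 as stated. The last trace line is the type-2 period of [18:18:18], which the slides do not draw. It evaluates to 16 ≤ VTP ≤ 20 and contains 17(e).

```python
from typing import List, Tuple

# Assumed layout: C2 = < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) > as (time, items) pairs
C2: List[Tuple[int, str]] = [(6, "ac"), (10, "b"), (17, "e"), (18, "a"), (24, "cd")]
SWIN, MINGAP, MAXGAP, DURATION = 2, 3, 15, 25


def trace_vtps(seq: List[Tuple[int, str]], tidx: List[Tuple[int, int, int]]) -> List[str]:
    """One line per TIdx entry and lemma: the period and the transactions inside it."""
    lines = []
    for it, lst, let in tidx:
        periods = (
            ("type-1", let + MINGAP, lst + MAXGAP),                  # Lemma 1 as stated
            ("type-2", let - SWIN, min(lst + SWIN, it + DURATION)),  # Lemma 2
        )
        for kind, lo, hi in periods:
            hits = " ".join(f"{t}({','.join(items)})" for t, items in seq if lo <= t <= hi)
            lines.append(f"[{it}:{lst}:{let}] {kind}: {lo} <= VTP <= {hi} -> {hits}")
    return lines


for line in trace_vtps(C2, [(6, 6, 6), (18, 18, 18)]):
    print(line)
# [6:6:6] type-1: 9 <= VTP <= 21 -> 10(b) 17(e) 18(a)
# [6:6:6] type-2: 4 <= VTP <= 8 -> 6(a,c)
# [18:18:18] type-1: 21 <= VTP <= 33 -> 24(c,d)
# [18:18:18] type-2: 16 <= VTP <= 20 -> 17(e) 18(a)
```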

Extending item a in C4 = < 5(a) 10(d) 21(c,d) 26(e) >, a-TIdx [5:5:5]
1. Type-1: 8 ≤ VTP ≤ 20, which contains 10(d): count 1 for (a)(d).

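Summing the per-sequence candidates over C1, C2 and C4 and keeping those with support ≥ min_sup yields the frequent 2-patterns with prefix a. The sketch below is a minimal illustration of that counting step, not the paper's code. It uses hypothetical names and the same assumed data layout, and type-1 follows Lemma 1 as stated.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Assumed layout: (time, items) pairs, one character per item
DB: Dict[str, List[Tuple[int, str]]] = {
    "C1": [(3, "c"), (5, "af"), (18, "b"), (31, "a"), (45, "f")],
    "C2": [(6, "ac"), (10, "b"), (17, "e"), (18, "a"), (24, "cd")],
    "C3": [(1, "b"), (20, "bg"), (27, "e"), (34, "dg"), (35, "g")],
    "C4": [(5, "a"), (10, "d"), (21, "cd"), (26, "e")],
}
SWIN, MINGAP, MAXGAP, DURATION = 2, 3, 15, 25
MIN_SUP = 2
A_TIDX = {"C1": [(5, 5, 5), (31, 31, 31)], "C2": [(6, 6, 6), (18, 18, 18)], "C4": [(5, 5, 5)]}


def items_between(seq: List[Tuple[int, str]], lo: int, hi: int) -> Set[str]:
    """All items in transactions whose time lies in [lo, hi]."""
    return {i for t, items in seq if lo <= t <= hi for i in items}


def count_extensions(db: Dict[str, List[Tuple[int, str]]],
                     tidx: Dict[str, List[Tuple[int, int, int]]],
                     last_element: Set[str]) -> Tuple[Counter, Counter]:
    """Collect candidates per sequence, then add one count per sequence (support)."""
    type1: Counter = Counter()
    type2: Counter = Counter()
    for sid, entries in tidx.items():
        seq = db[sid]
        cand1: Set[str] = set()
        cand2: Set[str] = set()
        for it, lst, let in entries:
            cand1 |= items_between(seq, let + MINGAP, lst + MAXGAP)                  # Lemma 1 as stated
            cand2 |= items_between(seq, let - SWIN, min(lst + SWIN, it + DURATION))  # Lemma 2
        type1.update(cand1)
        type2.update(cand2 - last_element)
    return type1, type2


t1, t2 = count_extensions(DB, A_TIDX, {"a"})
print(sorted(x for x, n in t1.items() if n >= MIN_SUP))  # ['b', 'd']  -> (a)(b), (a)(d)
print(sorted(x for x, n in t2.items() if n >= MIN_SUP))  # ['c']       -> (a,c)
```

These are the three 2-patterns with prefix a in the frequent-pattern list of this example.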

(a,c)-TIdx (min_sup = 2, time constraints as above)

Tran. ID   Sequence                                  (a,c)-TIdx
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >         [3:3:5]
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >      [6:6:6]
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >
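Extending (a,c) shows why lst and let are tracked separately. For the C1 entry [3:3:5], the type-1 period starts after let = 5 but ends at lst + maxgap = 3 + 15 = 18. The sketch below computes only the type-1 supports over the two sequences that contain (a,c). It uses hypothetical names and the same assumed layout, with Lemma 1 as stated.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Assumed layout: only the sequences that contain (a,c), as (time, items) pairs
DB: Dict[str, List[Tuple[int, str]]] = {
    "C1": [(3, "c"), (5, "af"), (18, "b"), (31, "a"), (45, "f")],
    "C2": [(6, "ac"), (10, "b"), (17, "e"), (18, "a"), (24, "cd")],
}
MINGAP, MAXGAP, MIN_SUP = 3, 15, 2
AC_TIDX = {"C1": [(3, 3, 5)], "C2": [(6, 6, 6)]}


def type1_support(db: Dict[str, List[Tuple[int, str]]],
                  tidx: Dict[str, List[Tuple[int, int, int]]]) -> Counter:
    """Per-sequence type-1 candidates (Lemma 1 as stated), counted once per sequence."""
    support: Counter = Counter()
    for sid, entries in tidx.items():
        found: Set[str] = set()
        for _it, lst, let in entries:
            lo, hi = let + MINGAP, lst + MAXGAP  # C1: [8, 18], C2: [9, 21]
            found |= {i for t, items in db[sid] if lo <= t <= hi for i in items}
        support.update(found)
    return support


support = type1_support(DB, AC_TIDX)
print(sorted(support.items()))                                   # [('a', 1), ('b', 2), ('e', 1)]
print([x for x, n in sorted(support.items()) if n >= MIN_SUP])   # ['b']  -> (a,c)(b)
```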

(a,c)(b)-TIdx

Tran. ID   Sequence                                  (a,c)(b)-TIdx
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >         [3:3:18]
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >      [6:6:10]
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >

No more patterns can be formed.

Frequent patterns (min_sup = 2)

Prefix a: (a,c), (a)(b), (a)(d), (a,c)(b)
Prefix b: (b)(a), (b)(d), (b)(e), (b)(e)(d)
Prefix c: (c)(b), (c)(e), (c)(b)(a)
Prefix d: none
Prefix e: (e)(d)

Dealing with extra-large databases

Performance analysis and experimental evaluation

Dataset parameters:
Average number of transactions per data sequence = 10
Average number of items per transaction = 2.5
Average size of potentially sequential patterns = 4
Average size of potentially frequent itemsets = 1.25
Number of data sequences in the database = 100k


Conclusions

This paper has presented METISP, a time-indexing algorithm for mining sequential patterns with various time constraints, including minimum, maximum, and exact gaps, sliding time-windows, and durations. METISP effectively shrinks the search space of potential patterns.