Explass: Exploring Associations between Entities via ... - VideoLectures

Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets

Gong Cheng, Yanan Zhang, Yuzhong Qu
Websoft Research Group
State Key Laboratory for Novel Software Technology
Nanjing University, China

Association search

[Figure: two entities joined by a question mark, e.g. air pollution ? autism, and (image) ? You]

Association search on the Web of documents

associations hidden in text

Association search on an entity-relation graph

[Figure: an entity-relation graph linking Alice and Bob through paper-A, paper-B, paper-C, paper-D, article-A, conf-A, and conf-B, with edges labelled firstAuthor, secondAuthor, inProcOf, cites, extends, reviewer, and chair]

associations exposed as graph

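association = path

Five associations (paths) between Alice and Bob:

1. Alice, secondAuthor, paper-A, inProcOf, conf-A, reviewer, Bob
2. Alice, firstAuthor, paper-B, inProcOf, conf-B, chair, Bob
3. Alice, firstAuthor, paper-B, cites, paper-C, firstAuthor, Bob
4. Alice, secondAuthor, paper-D, cites, paper-C, firstAuthor, Bob
5. Alice, secondAuthor, paper-D, extends, article-A, firstAuthor, Bob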
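As a minimal sketch of "association = path", the following Python enumerates the simple paths between two entities within a hop limit. The edge list is an assumption reconstructed from the example slide, edge directions are ignored, and `find_associations` is a hypothetical helper name rather than anything taken from Explass.

```python
from collections import defaultdict

# Toy entity-relation graph (assumed from the slide; edges treated as undirected).
EDGES = [
    ("Alice", "secondAuthor", "paper-A"),
    ("Alice", "firstAuthor", "paper-B"),
    ("Alice", "secondAuthor", "paper-D"),
    ("paper-A", "inProcOf", "conf-A"),
    ("paper-B", "inProcOf", "conf-B"),
    ("paper-B", "cites", "paper-C"),
    ("paper-D", "cites", "paper-C"),
    ("paper-D", "extends", "article-A"),
    ("conf-A", "reviewer", "Bob"),
    ("conf-B", "chair", "Bob"),
    ("paper-C", "firstAuthor", "Bob"),
    ("article-A", "firstAuthor", "Bob"),
]


def build_adjacency(edges):
    """Index every edge from both of its endpoints."""
    adjacency = defaultdict(list)
    for subject, prop, obj in edges:
        adjacency[subject].append((prop, obj))
        adjacency[obj].append((prop, subject))
    return adjacency


def find_associations(edges, source, target, max_hops=4):
    """Return all simple paths from source to target with at most max_hops edges."""
    adjacency = build_adjacency(edges)
    results = []

    def dfs(node, path, visited):
        if node == target:
            results.append(tuple(path))
            return
        hops_so_far = (len(path) - 1) // 2  # path alternates entity, property, entity, ...
        if hops_so_far >= max_hops:
            return
        for prop, neighbour in adjacency[node]:
            if neighbour not in visited:
                dfs(neighbour, path + [prop, neighbour], visited | {neighbour})

    dfs(source, [source], {source})
    return results


associations = find_associations(EDGES, "Alice", "Bob")
for association in associations:
    print(", ".join(association))
```

Exhaustive depth-first search is only practical on a toy graph like this one; on a graph the size of DBpedia the number of paths grows quickly, which is the challenge the next slide raises.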

Challenge

over 1,000 associations in DBpedia (within 4 hops)

How to explore them?

Exploration methods (1)
• Clustering
• Facets

cluster = pattern

pattern: author, Paper, inProcOf, Conference, role
(author is a common super-property; Paper is a common class)

associations matching the pattern:

Position 1      Position 2   Position 3   Position 4   Position 5
secondAuthor    paper-A      inProcOf     conf-A       reviewer
firstAuthor     paper-B      inProcOf     conf-B       chair
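The matching between a pattern and its associations can be sketched as follows. This assumes a toy ontology in which firstAuthor and secondAuthor are sub-properties of author, reviewer and chair are sub-properties of role, and the papers and conferences are typed Paper and Conference; the dictionaries and the `matches` function are illustrative names, not part of Explass.

```python
# Assumed toy ontology (not from Explass): property -> super-property, entity -> class.
SUPER_PROPERTY = {
    "firstAuthor": "author", "secondAuthor": "author",
    "reviewer": "role", "chair": "role",
}
ENTITY_CLASS = {
    "paper-A": "Paper", "paper-B": "Paper", "paper-C": "Paper",
    "conf-A": "Conference", "conf-B": "Conference",
}


def generalizations(term, parents):
    """Return the term followed by all of its ancestors in `parents`."""
    chain = [term]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


def matches(association, pattern):
    """association = (p1, e2, p3, e4, p5); pattern holds a property or class per position."""
    if len(association) != len(pattern):
        return False
    for position, (term, wanted) in enumerate(zip(association, pattern), start=1):
        if position % 2 == 1:   # odd positions hold properties
            candidates = generalizations(term, SUPER_PROPERTY)
        else:                   # even positions hold entities, compared by class
            candidates = [ENTITY_CLASS.get(term)]
        if wanted not in candidates:
            return False
    return True


PATTERN = ("author", "Paper", "inProcOf", "Conference", "role")
a1 = ("secondAuthor", "paper-A", "inProcOf", "conf-A", "reviewer")
a2 = ("firstAuthor", "paper-B", "inProcOf", "conf-B", "chair")
a3 = ("firstAuthor", "paper-B", "cites", "paper-C", "firstAuthor")
print(matches(a1, PATTERN), matches(a2, PATTERN), matches(a3, PATTERN))
```

The two associations through a conference match the pattern, while the citation path does not because its third position is cites rather than inProcOf.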

Problem: To recommend k patterns

[Figure: the five example associations between Alice and Bob]

Step 1: Mining all significant patterns

pattern: author, Paper, inProcOf, Conference, role
frequency = 2/5 > threshold

[Figure: the pattern matches 2 of the 5 example associations, those through paper-A/conf-A and paper-B/conf-B]
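One way to read the frequency on this slide, assuming it is the fraction of result associations that match the pattern. The symbols $A$ (the set of associations in the results), $p$ (the pattern), and $\tau$ (the threshold) are introduced here for illustration only:

```latex
\mathrm{freq}(p) \;=\; \frac{\lvert \{\, a \in A : a \text{ matches } p \,\} \rvert}{\lvert A \rvert} \;=\; \frac{2}{5} \;>\; \tau
```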

Formulated as frequent itemset mining
1. transaction = association; item = <position, class> or <position, property>
2. Mining frequent itemsets
3. itemset → pattern

Example association, by position:
Position 1: secondAuthor
Position 2: paper-A
Position 3: inProcOf
Position 4: conf-A
Position 5: reviewer


Example (itemset → pattern): secondAuthor, paper-A, inProcOf, conf-A, reviewer → author, Paper, inProcOf, Conference, role
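The formulation above can be sketched with a small level-wise (Apriori-style) miner. This is an illustration under assumptions, not the authors' implementation: the toy ontology, the class assigned to article-A, the count threshold, and all function names are hypothetical.

```python
from itertools import combinations

# Assumed toy ontology; the class of article-A is a guess made for illustration.
SUPER_PROPERTY = {"firstAuthor": "author", "secondAuthor": "author",
                  "reviewer": "role", "chair": "role"}
ENTITY_CLASS = {"paper-A": "Paper", "paper-B": "Paper", "paper-C": "Paper",
                "paper-D": "Paper", "article-A": "Article",
                "conf-A": "Conference", "conf-B": "Conference"}

# The five example associations, without their endpoints Alice and Bob.
ASSOCIATIONS = [
    ("secondAuthor", "paper-A", "inProcOf", "conf-A", "reviewer"),
    ("firstAuthor", "paper-B", "inProcOf", "conf-B", "chair"),
    ("firstAuthor", "paper-B", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "extends", "article-A", "firstAuthor"),
]


def to_transaction(association):
    """Turn an association into a set of <position, property> and <position, class> items."""
    items = set()
    for position, term in enumerate(association, start=1):
        if position % 2 == 1:                      # property position
            items.add((position, term))
            if term in SUPER_PROPERTY:
                items.add((position, SUPER_PROPERTY[term]))
        else:                                      # entity position: keep only its class
            items.add((position, ENTITY_CLASS[term]))
    return frozenset(items)


def frequent_itemsets(transactions, min_count):
    """Level-wise search: grow itemsets one item at a time, keeping the frequent ones."""
    singles = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in singles]
    frequent = {}
    while candidates:
        level = {}
        for candidate in candidates:
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count:
                level[candidate] = count
        frequent.update(level)
        candidates = list({a | b for a, b in combinations(level, 2)
                           if len(a | b) == len(a) + 1})
    return frequent


def is_pattern(itemset, length=5):
    """An itemset maps to a pattern if it has exactly one item per position."""
    return sorted(position for position, _ in itemset) == list(range(1, length + 1))


transactions = [to_transaction(a) for a in ASSOCIATIONS]
frequent = frequent_itemsets(transactions, min_count=2)
patterns = {s: c for s, c in frequent.items() if is_pattern(s)}
for itemset, count in patterns.items():
    print([term for _, term in sorted(itemset)], f"{count}/{len(transactions)}")
```

Requiring one item per position is a simplification chosen here so that each surviving itemset reads directly as a five-position pattern; the slide itself does not say how partial itemsets are handled.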

Step 2: Finding k frequent, informative, and small-overlapping patterns
• Frequency (as previous)
• Informativeness
  • informativeness of a class = self-information of its occurrence (more informative = having fewer instances), e.g. ConfPaper > Paper
  • informativeness of a property = entropy of its values (more informative = having more diverse values), e.g. is-author-of > nationality
• Overlap
  • Ontological overlap: holding subClassOf/subPropertyOf relations
  • Contextual overlap: matched by common associations in the results

[Figure: two example patterns illustrating ontological overlap: (author, ConfPaper, inProcOf, Conference, role) and (firstAuthor, Paper, cites, Paper, author)]
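The two informativeness measures can be sketched with the standard definitions of self-information and Shannon entropy. The instance counts and value lists below are invented for illustration, and how Explass combines these scores into the informativeness of a whole pattern is not stated on the slide, so that step is left out.

```python
import math
from collections import Counter


def class_informativeness(num_instances, num_entities):
    """Self-information of a class: -log2 of the fraction of entities that are instances."""
    return -math.log2(num_instances / num_entities)


def property_informativeness(values):
    """Shannon entropy (in bits) of the observed values of a property."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical counts: ConfPaper has fewer instances than Paper, so it scores higher.
conf_paper = class_informativeness(num_instances=10, num_entities=1000)
paper = class_informativeness(num_instances=200, num_entities=1000)

# Hypothetical values: authorship points to many distinct papers, nationality to few countries.
is_author_of = property_informativeness(["p1", "p2", "p3", "p4"])
nationality = property_informativeness(["CN", "CN", "CN", "US"])

print(conf_paper > paper, is_author_of > nationality)
```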

Formulated as multidimensional 0-1 knapsack
• Find k patterns that maximize frequency × informativeness (goal) and do not share considerably large overlap (constraints)
• Solved by a greedy algorithm
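A greedy selection of this kind might look like the sketch below. It is not the authors' algorithm: the overlap measure (Jaccard similarity of the sets of matched associations, a stand-in for contextual overlap), the `max_overlap` threshold, and the scores in the sample data are all assumptions made for illustration.

```python
def overlap(p, q):
    """Assumed contextual overlap: Jaccard similarity of the matched association sets."""
    union = p["matched"] | q["matched"]
    return len(p["matched"] & q["matched"]) / len(union) if union else 0.0


def greedy_top_k(patterns, k, max_overlap=0.5):
    """Pick up to k patterns by descending frequency * informativeness, skipping overlaps."""
    ranked = sorted(patterns,
                    key=lambda p: p["frequency"] * p["informativeness"],
                    reverse=True)
    chosen = []
    for pattern in ranked:
        if len(chosen) == k:
            break
        if all(overlap(pattern, other) <= max_overlap for other in chosen):
            chosen.append(pattern)
    return chosen


# Hypothetical candidates; informativeness values are made up.
CANDIDATES = [
    {"name": "author, Paper, inProcOf, Conference, role",
     "frequency": 0.4, "informativeness": 3.0, "matched": {1, 2}},
    {"name": "author, ConfPaper, inProcOf, Conference, role",
     "frequency": 0.4, "informativeness": 3.5, "matched": {1, 2}},
    {"name": "author, Paper, cites, Paper, firstAuthor",
     "frequency": 0.4, "informativeness": 2.5, "matched": {3, 4}},
]

selected = greedy_top_k(CANDIDATES, k=2)
for pattern in selected:
    print(pattern["name"])
```

In this toy run the more specific ConfPaper pattern is chosen first, its more general sibling is skipped because both match the same associations, and the citation pattern fills the second slot.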

Exploration methods (2)
• Clustering
• Facets
  • facet values = classes of entities and properties appearing in associations in the results
  • Problem: To recommend k facet values (solved in a similar way)

[Figure: the association (secondAuthor, paper-A, inProcOf, conf-A, reviewer) annotated with the facet values ConfPaper, Paper, and Conference]
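Collecting facet values and using one as a filter can be sketched as follows. The entity-to-class mapping (including the class of article-A) and the function names are assumptions for illustration; ranking the facet values to recommend only k of them is not shown.

```python
from collections import Counter

# Assumed entity classes; the class of article-A is a guess made for illustration.
ENTITY_CLASS = {"paper-A": "Paper", "paper-B": "Paper", "paper-C": "Paper",
                "paper-D": "Paper", "article-A": "Article",
                "conf-A": "Conference", "conf-B": "Conference"}

# The five example associations, without their endpoints Alice and Bob.
ASSOCIATIONS = [
    ("secondAuthor", "paper-A", "inProcOf", "conf-A", "reviewer"),
    ("firstAuthor", "paper-B", "inProcOf", "conf-B", "chair"),
    ("firstAuthor", "paper-B", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "extends", "article-A", "firstAuthor"),
]


def facet_terms(association):
    """Properties sit at even indexes, entities (mapped to their class) at odd indexes."""
    properties = set(association[0::2])
    classes = {ENTITY_CLASS[entity] for entity in association[1::2]}
    return classes, properties


def facet_values(associations):
    """Count, for each class and each property, how many associations contain it."""
    class_counts, property_counts = Counter(), Counter()
    for association in associations:
        classes, properties = facet_terms(association)
        class_counts.update(classes)
        property_counts.update(properties)
    return class_counts, property_counts


def filter_by_facet(associations, value):
    """Keep only the associations that contain the selected class or property."""
    kept = []
    for association in associations:
        classes, properties = facet_terms(association)
        if value in classes or value in properties:
            kept.append(association)
    return kept


class_counts, property_counts = facet_values(ASSOCIATIONS)
print(class_counts.most_common())
print(property_counts.most_common())
print(filter_by_facet(ASSOCIATIONS, "Conference"))
```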

Demo based on DBpedia
ws.nju.edu.cn/explass

[Screenshot callouts: facet values (classes); facet values (properties); a collapsed pattern; an expanded pattern; associations not matching any pattern above]

User study from QALD

• 26 association exploration tasks over DBpedia
  • Derived from QALD queries and “People also search for”
  • Example: Suppose you will write an article about the associations between Abraham Lincoln and George Washington. Use the given system to explore their associations and identify several themes to discuss in the article.
• 20 subjects
• 3 approaches
  • Explass: clustering + facets
  • RelClus: clustering into a hierarchy of patterns
  • RF: facets only (similar to RelFinder)

Post-task questionnaire results

Usability scores (SUS)

User behavior

Conclusion
1. Provide patterns wisely.
   • To avoid deep, complicated hierarchy
   • To avoid very general, almost meaningless concepts
2. Combine patterns and facets wisely.
   • Patterns as meaningful summaries of results
   • Facets as filters for refining the search

Future work
• Performance optimization
  • (online) path finding
  • (online) frequent itemset mining
• Exploring associations between several entities or a data set

Questions?