Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets
Gong Cheng, Yanan Zhang, Yuzhong Qu
Websoft Research Group, State Key Laboratory for Novel Software Technology, Nanjing University, China
Association search
[Figure: example association searches: "air pollution" ? "autism"; "?" ? "You"]
Association search on the Web of documents
associations hidden in text
Association search on an entity-relation graph
[Figure: an example entity-relation graph over Alice, Bob, paper-A, paper-B, paper-C, paper-D, article-A, conf-A, and conf-B, with edges labeled firstAuthor, secondAuthor, inProcOf, cites, extends, reviewer, and chair]
associations exposed as graph
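association = path
[Figure: five paths between Alice and Bob]
1. Alice -[secondAuthor]- paper-A -[inProcOf]- conf-A -[reviewer]- Bob
2. Alice -[firstAuthor]- paper-B -[inProcOf]- conf-B -[chair]- Bob
3. Alice -[firstAuthor]- paper-B -[cites]- paper-C -[firstAuthor]- Bob
4. Alice -[secondAuthor]- paper-D -[cites]- paper-C -[firstAuthor]- Bob
5. Alice -[secondAuthor]- paper-D -[extends]- article-A -[firstAuthor]- Bob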
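A minimal sketch of "association = path", assuming the example graph is read as undirected and that associations are simple paths within a hop limit. The edge list is reconstructed from the slide figure, and the names `EDGES`, `build_adjacency`, and `find_associations` are hypothetical; this is not the search procedure used in Explass.

```python
from collections import defaultdict

# Edges of the example graph, reconstructed from the figure (an assumption),
# and treated as undirected for the purpose of path search.
EDGES = [
    ("Alice", "secondAuthor", "paper-A"),
    ("paper-A", "inProcOf", "conf-A"),
    ("conf-A", "reviewer", "Bob"),
    ("Alice", "firstAuthor", "paper-B"),
    ("paper-B", "inProcOf", "conf-B"),
    ("conf-B", "chair", "Bob"),
    ("paper-B", "cites", "paper-C"),
    ("paper-C", "firstAuthor", "Bob"),
    ("Alice", "secondAuthor", "paper-D"),
    ("paper-D", "cites", "paper-C"),
    ("paper-D", "extends", "article-A"),
    ("article-A", "firstAuthor", "Bob"),
]


def build_adjacency(edges):
    """Map each node to its (property, neighbor) pairs, in both directions."""
    adj = defaultdict(list)
    for subj, prop, obj in edges:
        adj[subj].append((prop, obj))
        adj[obj].append((prop, subj))
    return adj


def find_associations(adj, source, target, max_hops):
    """Enumerate simple paths from source to target with at most max_hops edges."""
    results = []

    def dfs(node, path, visited):
        if node == target:
            results.append(tuple(path))
            return
        # path is [node, prop, node, ...], so it holds (len - 1) // 2 edges.
        if (len(path) - 1) // 2 >= max_hops:
            return
        for prop, nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                path.extend([prop, nxt])
                dfs(nxt, path, visited)
                del path[-2:]
                visited.discard(nxt)

    dfs(source, [source], {source})
    return results


associations = find_associations(build_adjacency(EDGES), "Alice", "Bob", max_hops=3)
for path in associations:
    print(" - ".join(path))
```

On this toy graph the search returns the five associations shown on the slide; on a graph the size of DBpedia the number of paths grows quickly with the hop limit.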
Challenge
over 1,000 associations in DBpedia (within 4 hops)
frequency = 2/5 > threshold
[Figure: the five example associations between Alice and Bob, repeated; a pattern matched by 2 of the 5 associations has frequency 2/5]
Formulated as frequent itemset mining
1. transaction = association; item = <position, class> or <position, property>
2. Mining frequent itemsets
3. itemset → pattern

[Figure: an association and the pattern it matches, aligned by position]
Position   Association    Pattern
1          secondAuthor   author
2          paper-A        Paper
3          inProcOf       inProcOf
4          conf-A         Conference
5          reviewer       role
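The encoding above can be sketched as follows, assuming a small hypothetical schema in which firstAuthor and secondAuthor are sub-properties of author, reviewer and chair are sub-properties of role, and each entity has the class shown. The names `ASSOCIATIONS`, `CLASSES`, `SUPER_PROPERTIES`, `to_transaction`, and `frequency` are illustrative. The sketch covers only the transaction encoding and the frequency of one candidate itemset, not a full mining algorithm.

```python
# Five associations between Alice and Bob, listed without the end entities.
# Positions 1..5 alternate: property, entity, property, entity, property.
ASSOCIATIONS = [
    ("secondAuthor", "paper-A", "inProcOf", "conf-A", "reviewer"),
    ("firstAuthor", "paper-B", "inProcOf", "conf-B", "chair"),
    ("firstAuthor", "paper-B", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "cites", "paper-C", "firstAuthor"),
    ("secondAuthor", "paper-D", "extends", "article-A", "firstAuthor"),
]

# Hypothetical schema (an assumption of this sketch).
CLASSES = {
    "paper-A": {"Paper"}, "paper-B": {"Paper"}, "paper-C": {"Paper"},
    "paper-D": {"Paper"}, "article-A": {"Article"},
    "conf-A": {"Conference"}, "conf-B": {"Conference"},
}
SUPER_PROPERTIES = {
    "firstAuthor": {"author"}, "secondAuthor": {"author"},
    "reviewer": {"role"}, "chair": {"role"},
}


def to_transaction(association):
    """Encode an association as a set of <position, class/property> items."""
    items = set()
    for position, label in enumerate(association, start=1):
        if position % 2 == 1:
            # Odd positions hold properties: add the property and its super-properties.
            for prop in {label} | SUPER_PROPERTIES.get(label, set()):
                items.add((position, prop))
        else:
            # Even positions hold entities: add the classes of the entity.
            for cls in CLASSES.get(label, set()):
                items.add((position, cls))
    return frozenset(items)


def frequency(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    matched = sum(1 for t in transactions if itemset <= t)
    return matched / len(transactions)


transactions = [to_transaction(a) for a in ASSOCIATIONS]
pattern = frozenset({
    (1, "author"), (2, "Paper"), (3, "inProcOf"), (4, "Conference"), (5, "role"),
})
print(frequency(pattern, transactions))  # 0.4, i.e. 2/5
```

Under these assumptions the pattern author, Paper, inProcOf, Conference, role is matched by the first two associations, which gives the frequency 2/5 used in the example.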
Step 2: Finding k frequent, informative, and small-overlapping patterns
• Frequency (as previous)
• Informativeness
  • informativeness of a class = self-information of its occurrence (more informative = having fewer instances), e.g. ConfPaper > Paper
  • informativeness of a property = entropy of its values (more informative = having more diverse values), e.g. is-author-of > nationality
• Overlap
  • Ontological overlap: holding subClassOf/subPropertyOf relations
  • Contextual overlap: matched by common associations in the results
[Figure: example patterns (author, Paper, inProcOf, Conference, role), (author, ConfPaper, inProcOf, Conference, role), and (firstAuthor, Paper, cites, Paper, author), annotated "ontological overlap"]
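The two informativeness measures can be sketched as below, assuming self-information is taken as -log2 of the fraction of entities that are instances of the class, and entropy is Shannon entropy over the observed property values. The function names and all counts are hypothetical; the exact normalization used in Explass is not given on the slides.

```python
import math
from collections import Counter


def class_informativeness(num_instances, num_entities):
    """Self-information, in bits, of an entity being an instance of the class."""
    return -math.log2(num_instances / num_entities)


def property_informativeness(values):
    """Shannon entropy, in bits, of the distribution of a property's values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical data set: 1000 entities, 200 of class Paper, 50 of class ConfPaper.
paper_info = class_informativeness(200, 1000)
conf_paper_info = class_informativeness(50, 1000)
print(conf_paper_info > paper_info)  # True: fewer instances, more informative

# Hypothetical property values: nationality is concentrated, is-author-of is diverse.
nationality = ["CN", "CN", "CN", "US"]
is_author_of = ["paper-A", "paper-B", "paper-C", "paper-D"]
print(property_informativeness(is_author_of) > property_informativeness(nationality))  # True
```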
Formulated as multidimensional 0-1 knapsack
• Find k patterns that maximize frequency × informativeness (goal) and do not share considerably large overlap (constraints)
• Solved by a greedy algorithm
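A simplified greedy selection, for illustration only: candidates are ranked by frequency × informativeness, and a candidate is skipped when it overlaps too much with a pattern already chosen. The Jaccard measure of contextual overlap, the threshold `max_overlap`, and all candidate values are assumptions of this sketch, not the authors' algorithm or their knapsack constraints.

```python
def contextual_overlap(p, q):
    """Jaccard overlap of the sets of associations matched by two patterns."""
    a, b = p["matches"], q["matches"]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_top_k(patterns, k, max_overlap, overlap=contextual_overlap):
    """Pick up to k patterns by descending score, skipping heavy overlaps."""
    ranked = sorted(
        patterns,
        key=lambda p: p["frequency"] * p["informativeness"],
        reverse=True,
    )
    selected = []
    for candidate in ranked:
        if len(selected) == k:
            break
        if all(overlap(candidate, chosen) <= max_overlap for chosen in selected):
            selected.append(candidate)
    return selected


# Hypothetical candidates; "matches" holds indices of matched associations.
candidates = [
    {"name": "author-Paper-inProcOf-Conference-role",
     "frequency": 0.4, "informativeness": 2.0, "matches": {0, 1}},
    {"name": "author-ConfPaper-inProcOf-Conference-role",
     "frequency": 0.4, "informativeness": 2.5, "matches": {0, 1}},
    {"name": "author-Paper-cites-Paper-firstAuthor",
     "frequency": 0.4, "informativeness": 1.5, "matches": {2, 3}},
]
top = greedy_top_k(candidates, k=2, max_overlap=0.5)
print([p["name"] for p in top])
```

With these values the more specific ConfPaper pattern is chosen first, the Paper variant is skipped because it matches the same associations, and the cites pattern fills the second slot.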
Exploration methods (2)
• Clustering
• Facets
  • facet values = classes of entities and properties appearing in associations in the results
  • Problem: To recommend k facet values (solved in a similar way)
[Figure: the association secondAuthor, paper-A, inProcOf, conf-A, reviewer, with paper-A labeled ConfPaper and Paper, and conf-A labeled Conference]
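A minimal sketch of facets as filters, assuming each association in the results has already been reduced to the set of classes and properties it contains. The names `facet_values` and `filter_by_facet` and the sample results are hypothetical, and counting occurrences is only a stand-in for the recommendation step, which the slide says is solved in a way similar to pattern selection.

```python
from collections import Counter


def facet_values(results):
    """Count, for each class or property, how many associations contain it."""
    counts = Counter()
    for items in results:
        counts.update(set(items))
    return counts


def filter_by_facet(results, value):
    """Keep only the associations that contain the chosen facet value."""
    return [items for items in results if value in items]


# Hypothetical results: classes and properties appearing in each association.
results = [
    {"secondAuthor", "ConfPaper", "Paper", "inProcOf", "Conference", "reviewer"},
    {"firstAuthor", "Paper", "cites"},
    {"secondAuthor", "Paper", "extends", "Article", "firstAuthor"},
]
counts = facet_values(results)
print(counts.most_common(3))
print(len(filter_by_facet(results, "Conference")))
```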
Demo based on DBpedia ws.nju.edu.cn/explass
facet values (classes)
facet values (properties)
a collapsed pattern
an expanded pattern
associations not matching any pattern above
User study from QALD
• 26 association exploration tasks over DBpedia
  • Derived from QALD queries and “People also search for”
  • Example: Suppose you will write an article about the associations between Abraham Lincoln and George Washington. Use the given system to explore their associations and identify several themes to discuss in the article.
• 20 subjects
• 3 approaches
  • Explass: clustering + facets
  • RelClus: clustering into a hierarchy of patterns
  • RF: facets only (similar to RelFinder)
Post-task questionnaire results
Usability scores (SUS)
User behavior
Conclusion
1. Provide patterns wisely.
  • To avoid deep, complicated hierarchy
  • To avoid very general, almost meaningless concepts
2. Combine patterns and facets wisely.
  • Patterns as meaningful summaries of results
  • Facets as filters for refining the search