US006038560A

United States Patent [19]
Wical

[11] Patent Number: 6,038,560
[45] Date of Patent: Mar. 14, 2000

[54] CONCEPT KNOWLEDGE BASE SEARCH AND RETRIEVAL SYSTEM
[75] Inventor: Kelly Wical, San Carlos, Calif.
[73] Assignee: Oracle Corporation, Redwood Shores, Calif.
[21] Appl. No.: 08/861,983
[22] Filed: May 21, 1997
[51] Int. Cl.7: G06F 17/30
[52] U.S. Cl.: 707/5; 706/50
[58] Field of Search: 706/50, 61, 934; 707/5

[56] References Cited

U.S. PATENT DOCUMENTS
5,103,498    4/1992   Lanier et al.        706/11
5,159,667   10/1992   Borrey et al.        707/500
5,167,011   11/1992   Priest               706/62
5,226,111    7/1993   Black et al.         706/50
5,257,185   10/1993   Farley et al.        707/100
5,276,616    1/1994   Kuga et al.          704/10
5,325,298    6/1994   Gallant              707/5
5,369,763   11/1994   Biles                707/3
5,442,780    8/1995   Takanashi et al.     707/1
5,555,408    9/1996   Fujisawa et al.      707/5
5,598,557    1/1997   Doner et al.         707/5
5,615,112    3/1997   Liusheng et al.      706/50
5,625,767    4/1997   Bartell et al.       345/440
5,630,117    5/1997   Oren et al.          707/100
5,630,125    5/1997   Zellweger            707/103
5,634,051    5/1997   Thomson              707/5
5,659,724    8/1997   Borgida et al.       706/50
5,720,008    2/1998   McGuinness et al.    706/50

OTHER PUBLICATIONS
Cox, John, “‘Text-Analysis’ Server to Simplify Queries,” Communications Week, Apr. 19, 1993.
D.R. Cutting, et al., “Constant interaction-time scatter/gather browsing of very large document collections,” Proc. Sixteenth annual international ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 126-134, Dec. 1993.
R.B. Allen, “An Interface for Navigating Clustered Document Sets Returned by Queries,” Proc. of the Conf. on Organizational Computing Systems, pp. 166-171, Dec. 1993.
E.D. Liddy, et al., “Text Categorization for Multiple Users Based on Semantic Features from a Machine-Readable Dictionary,” ACM Transactions on Information Systems, vol. 12(3), pp. 278-295, Jul. 1994.
R.B. Allen, “Two Digital Library Interfaces that Exploit Hierarchical Structure,” DAGS95: Electronic Publishing and the Information Superhighway, (10 pages), May 1995.
A. Celentano, et al., “Knowledge-based Document Retrieval in Office Environments: the Kabiria System,” ACM Trans. on Information Systems, vol. 13(3), pp. 237-268, Jul. 1995.
(List continued on next page.)

Primary Examiner: Robert W. Downs
Attorney, Agent, or Firm: Fliesler, Dubb, Meyer & Lovejoy LLP
[57] ABSTRACT

A knowledge base search and retrieval system, which includes factual knowledge base queries and concept knowledge base queries, is disclosed. A knowledge base stores associations among terminology/categories that have a lexical, semantical or usage association. Document theme vectors identify the content of documents through themes as well as through classification of the documents in categories that reflects what the documents are primarily about. The factual knowledge base queries identify, in response to an input query, documents relevant to the input query through expansion of the query terms as well as through expansion of themes. The concept knowledge base query does not identify specific documents in response to a query, but specifies terminology that identifies the potential existence of documents in a particular area.
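
As a rough illustration of the abstract, the knowledge base can be pictured as a set of weighted associations between terms and categories, with query expansion as a lookup over those associations. The sketch below is only that: the `KnowledgeBase` class, the `expand_terms` function, the `min_strength` cutoff, and all of the sample terms and weights are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy store of weighted term/category associations (hypothetical)."""

    def __init__(self):
        self._links = defaultdict(dict)

    def associate(self, a, b, strength):
        # Associations are treated as symmetric in this sketch.
        self._links[a][b] = strength
        self._links[b][a] = strength

    def neighbors(self, term):
        # .get avoids creating empty entries for unknown terms.
        return self._links.get(term, {})


def expand_terms(kb, query_terms, min_strength=0.5):
    """Return the query terms plus directly associated terms above a cutoff."""
    expanded = {term: 1.0 for term in query_terms}  # originals keep full strength
    for term in query_terms:
        for other, strength in kb.neighbors(term).items():
            if strength >= min_strength:
                expanded[other] = max(expanded.get(other, 0.0), strength)
    return expanded


# Made-up associations for illustration only.
kb = KnowledgeBase()
kb.associate("betting", "wagering", 0.9)
kb.associate("betting", "casino", 0.7)
kb.associate("betting", "insects", 0.2)
print(expand_terms(kb, ["betting"]))
```

With these sample weights, the weakly associated term falls below the cutoff and is left out of the expanded query.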
“Verity finds the Topic,” The Seybold Report on Publishing Systems, vol. 19(4), Oct. 1989.
29 Claims, 21 Drawing Sheets
6,038,560
Page 2

OTHER PUBLICATIONS
M. Iwayama and T. Tokunaga, “Cluster-based Text Categorization: a Comparison of Category Search Strategies,” Proc. 18th Annual Int'l. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 273-280, Dec. 1995.
G. Salton, et al., “Automatic Text Decomposition Using Text Segments and Text Themes,” Proc. Seventh ACM Conf. on Hypertext '96, pp. 53-65, Dec. 1996.
P. Pirolli, et al., “Scatter/Gather Browsing Communicates the Topic Structure of a Very Large Text Collection,” Conf. Proc. on Human Factors in Computing Systems, pp. 213-220, Dec. 1996.
U.S. Patent    Mar. 14, 2000    Sheet 2 of 21    6,038,560

FIG. 2
[Block diagram. Labeled elements: Query Processing; User Query; Mode; Query Term Processing; Knowledge Base (155); Document Signatures; Concept Query Processing; Factual Query Processing; Retrieval Information, to Screen Module (230).]
U.S. Patent    Mar. 14, 2000    Sheet 3 of 21    6,038,560

FIG. 3
Query: Legal, Betting, China (610)
  Government, Casino, Asia (620)
    Gaming Industry (2) (625)
  Patents, Slot Machines, Japan (630)
    Patent Law (4) (635)
    Gaming Industry (2) (640)
  Crime, Wagering, China (645)
    Insects (1) (650)
    Conservation - Ecology (2) (655)
U.S. Patent    Mar. 14, 2000    Sheet 5 of 21    6,038,560

FIG. 5
[Flow diagram. Steps in order:]
Generate senses and distinct parts from query
Generate query term strengths
Expand query terms using knowledge base
Select categories in knowledge base identified by expanded query terms
Select documents classified for those categories (420)
Select themes from documents (430)
Sort and compile information by theme (440)
List themes in order of strongest themes (450)
Select top themes from additional documents based on predetermined criteria (460)
Organize themes in groups (465)
Order theme groups (470)
Order documents within groups (475)
Display groups and associated document names (480)
Display categories classified for documents (485)
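
The flow of FIG. 5 can be loosely sketched as a small pipeline: categories are selected from expanded query terms, documents classified for those categories are selected, their themes are compiled, and theme groups and documents are ordered by strength. This is a simplified sketch under stated assumptions, not the patented method: the `DOCUMENTS` and `TERM_TO_CATEGORY` tables, the `factual_query` function, and the choice to order groups by summed theme strength are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each document lists the categories it is classified
# under and its themes with a strength score.
DOCUMENTS = {
    "doc1": {"categories": {"Gaming Industry"},
             "themes": {"casino": 0.9, "government": 0.4}},
    "doc2": {"categories": {"Gaming Industry"},
             "themes": {"casino": 0.6, "slot machines": 0.8}},
    "doc3": {"categories": {"Patent Law"},
             "themes": {"patents": 0.7}},
}

# Hypothetical mapping from expanded query terms to knowledge base categories.
TERM_TO_CATEGORY = {"betting": "Gaming Industry", "wagering": "Gaming Industry"}


def factual_query(expanded_terms):
    # Select categories identified by the expanded query terms.
    categories = {TERM_TO_CATEGORY[t] for t in expanded_terms
                  if t in TERM_TO_CATEGORY}
    # Select documents classified for those categories.
    selected = {name: doc for name, doc in DOCUMENTS.items()
                if doc["categories"] & categories}
    # Select themes from the documents and compile the information by theme.
    by_theme = defaultdict(list)
    for name, doc in selected.items():
        for theme, strength in doc["themes"].items():
            by_theme[theme].append((strength, name))
    # Order theme groups by total strength (assumed criterion), strongest first.
    groups = sorted(by_theme.items(),
                    key=lambda item: (-sum(s for s, _ in item[1]), item[0]))
    # Order documents within each group by theme strength.
    return [(theme, [n for _, n in sorted(docs, key=lambda d: (-d[0], d[1]))])
            for theme, docs in groups]


for theme, docs in factual_query(["betting", "wagering"]):
    print(theme, docs)
```

The display steps of the figure are reduced here to printing each theme group with its document names.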
U.S. Patent    Mar. 14, 2000    Sheet 6 of 21    6,038,560
U.S. Patent    Mar. 14, 2000    Sheet 7 of 21    6,038,560

FIG. 7
[Flow diagram. Steps in order:]
Generate applicable senses and forms for distinctive query terms (500)
Generate strengths for query terms (510)
Map query terms to knowledge base (520)
Expand query terms through knowledge base (530)
Select theme set for expanded query terms (540)
Expand theme set through knowledge base (550)
Select common denominators of expanded themes among expanded query terms to satisfy input query (560)
Relevance rank query terms, expanded query terms, and themes (570)
Display query response (580)
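
One way to read the "common denominators" step of FIG. 7 is as an intersection: a theme qualifies only if it can be reached from every query term. The sketch below illustrates that reading with made-up data. The `ASSOCIATIONS` table, the `concept_query` function, and the weakest-link ranking rule are assumptions for illustration and are not taken from the patent.

```python
# Hypothetical associations: query term -> {related theme: strength}.
ASSOCIATIONS = {
    "festivals": {"Oktoberfest": 0.9, "wine harvest": 0.6, "parades": 0.5},
    "foods": {"Oktoberfest": 0.7, "wine harvest": 0.8, "recipes": 0.9},
}


def concept_query(query_terms):
    """Return themes shared by every query term, strongest first."""
    theme_sets = [ASSOCIATIONS.get(term, {}) for term in query_terms]
    if not theme_sets:
        return []
    # Common denominators: themes reachable from all query terms.
    common = set(theme_sets[0])
    for themes in theme_sets[1:]:
        common &= set(themes)
    # Rank by the weakest association (assumed rule), so a theme must
    # relate reasonably well to every query term to rank highly.
    scored = {theme: min(themes[theme] for themes in theme_sets)
              for theme in common}
    return sorted(scored, key=lambda theme: (-scored[theme], theme))


print(concept_query(["festivals", "foods"]))
```

Note that the result is a ranked list of terminology rather than a list of documents, which matches how the abstract describes the concept knowledge base query.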
U.S. Patent    Mar. 14, 2000    Sheet 8 of 21    6,038,560

FIG. 8A
Social Sciences
  History
    Ancient History
      Ancient Rome
  Anthropology
    Customs and Practices
      Kinship and Marriage
    Peoples
      Races of Peoples
  Linguistics
    Languages
U.S. Patent    Mar. 14, 2000    Sheet 9 of 21    6,038,560

FIG. 8B
Food and Agriculture
Cereal and Grains
Condiments
Dairy Products
Drinking and Dining
Alcoholic Beverages
Beers
Liquors
Liqueurs
Wines
Wineries
Meals and Dishes
Meats
Beef
Lamb
Pate and Sausages
Seafood
Pastas
Prepared Foods
Desserts
Cakes
Cookies
Pastries
Sauces
Soups and Stews
U.S. Patent    Mar. 14, 2000    Sheet 10 of 21    6,038,560

FIG. 8C
Geography
  Political Geography
    Europe
      Western Europe
        Austria
        Germany
        France
        Iberia
          Spain
        Ireland
        Italy
        Sweden
        Netherlands
        United Kingdom
          England
      Eastern Europe
U.S. Patent    Mar. 14, 2000    Sheet 11 of 21    6,038,560
U.S. Patent    Mar. 14, 2000    Sheet 12 of 21    6,038,560
U.S. Patent    Mar. 14, 2000    Sheet 13 of 21    6,038,560

I) Festivals, Foods, Western Europe
   A) Festivals, Drinking and Dining, Germany
      Beer, Knockwurst, Oktoberfest, Stein, Sauerkraut
   B) Festivals, Drinking and Dining, France
II) Festivals, Food
   A) Ancient Rome, Wines
      Grapes, Fermentation
U.S. Patent    Mar. 14, 2000    Sheet 14 of 21    6,038,560

FIG. 10A
[Screen display.]
Internet
VirtualClerk
Concept Search    Knowledge Search    List Topics    Help
Found 15 Documents and 5 Categories
Computer Networking (15)
Internet CreditBureau, Incorporated (0)
Internet Fax Server (0)
Internet Productions, Incorporated (0)
Internet Newbies (0)
U.S. Patent    Mar. 14, 2000    Sheet 15 of 21    6,038,560

FIG. 10B
[Screen display.]
Science and Technology (2380) | Communications (279) | Telecommunications Industry (90)
Computer Networking (15)
  Electronic Mail (1)
  GE Networks (1)
  Internet Technology (2)
  Messaging (1)
  NBC Networks (3)
  Networks (1)
Documents About Computer Networking and Also:
  Colorado
    7/01/88 Business Brief: Noted...
  Mexican
    8/19/88 The Americas: Mexico's...
  NBC Officials
    7/05/88 NBC Talks With European...
  State Agencies
    10/07/88 Three Companies Win $180...
  Television and Radio
    8/09/88 NBC-TV Trying to Beat...
See Also:
  Computer Hardware Industry (56)
  Computer Industry (256)
  Computer Standards (1)
  Information Technology (9)
  Mathematics (4)
U.S. Patent    Mar. 14, 2000    Sheet 16 of 21    6,038,560

[Screen display.]
Internet
VirtualClerk
Concept Search    Knowledge Search    List Topics    Help
Stocks
Found 152 Documents and 64 Categories
Commerce and Trade
Companies
Financial Investments
Investors
Portfolios
Pharmaceutical Industry
Magazines
Automotive Industry
Mineralogy
Computer Software Industry
Stocks and Bonds
Food and Drink Industry
Petroleum Products Industry
Television and Radio
New York Life Insurance Company
McGraw-Hill, Incorporated
Banking Industry
Industrial Goods Manufacturing
Texaco, Incorporated
Insurance Industry
Lawyers
Walt Disney Company
CitiCorp
Diversified Companies
Preferred Stocks
Computer Hardware Industry
U.S. Patent    Mar. 14, 2000    Sheet 17 of 21    6,038,560

FIG. 11A-2
[Screen display, continued.]
Dun & Bradstreet Corporation
Health-care Companies
Brokers
Personal Finance
Lawsuits
Leveraged Buy-outs
Itel Corporation
Computer Industry
Aviation
Plastic and Rubber
Hard Sciences
Rail Transportation
Financial Lending
Chrysler Corporation
Gillette
Brush Wellman, Incorporated
Taxes and Tariffs
Manufacturing
Japanese Companies
Airlines
Cinema
Construction Industry
Automotive Service and Repair
Retail Trade Industry
Dow Chemical Company
Real Estate
Consumer Electronics
Chemical Industry
Convenience Products Businesses
Shares Outstanding
American Brands, Incorporated
Motorola, Incorporated
Package Delivery Industry
Masco Corporation
U.S. Patent    Mar. 14, 2000    Sheet 18 of 21    6,038,560

FIG. 11B
[Screen display.]
Business and Economics (5438) | Business and Industry (2889) | Corporate Practices (263)
Portfolios (4)
Documents About Portfolios and Also:
  Commerce and Trade
    11/16/88 Money Managers With...
  Interest Rates
    8/24/88 Your Money Matters: Many...
  Investors
    10/10/88 These Stocks Are a...
  Securities
    7/14/88 Fannie Mae Net Rose 97%...