
Semantic Inversion In XML Keyword Search with General Conditional Random Fields
Shuhan Wang
Peking University
QLAS, Yahoo!

2013-10-14

Outline

1 Introduction
2 Semantic Inversion
3 General CRF
4 Experiments
5 Future Work and Conclusion

Main Problems in Keyword Search

• In existing IR systems, users can hardly specify their search intent precisely.
• Structural information has not been adequately taken into consideration.
• The input sequence is ambiguous.

An Example

• A user wants to find a paper published at IJCAI. The paper was written by Pineau and is about some technique based on 'point'.
• One possible search input: 'Point based Pineau IJCAI'
• If the system knows that 'Point based' is part of the paper's title (users may not remember the whole title, so they type only part of it), that 'Pineau' is the author, and that 'IJCAI' is the title of a proceedings, the search can be targeted much more precisely.

Definition

• Label: Given a set of XML files and a word, a tag is a label of the word if the content of the tag contains the word. A single word may have many probable labels.
• The label of the word 'Pineau' can be 'author', and the label of the words 'Torran Dubh' (appearing together) can be 'conflict' or 'caption'.
• Label Sequence: Given a search keyword sequence S, the corresponding label sequence consists of one label for each word in S. A keyword sequence can therefore be interpreted as various probable label sequences, which express different semantics.
• The label sequence of 'Point, based, Pineau, IJCAI' can be 'title, title, author, booktitle'.
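The label definition can be made concrete with a short sketch. It assumes the XML files fit in memory and that "the content of the tag contains the word" means a case-insensitive match against the whitespace-separated tokens of an element's own text. The function name `candidate_labels` and the two toy records are hypothetical, chosen only to show that one word (here 'ijcai') can receive several labels.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def candidate_labels(xml_strings):
    """Map each lower-cased word to the set of tags whose content contains it.

    Assumption: an element's own .text is split on whitespace; text nested in
    child elements is credited to those children when they are visited.
    """
    labels = defaultdict(set)
    for xml in xml_strings:
        root = ET.fromstring(xml)
        for elem in root.iter():
            for word in (elem.text or "").lower().split():
                labels[word].add(elem.tag)
    return labels

# Toy records (hypothetical), in the style of a bibliography XML file.
docs = [
    "<inproceedings><title>Point based value iteration</title>"
    "<author>Joelle Pineau</author><booktitle>IJCAI</booktitle></inproceedings>",
    "<article><title>Point sets</title><journal>IJCAI notes</journal></article>",
]

labels = candidate_labels(docs)
print(sorted(labels["pineau"]))  # ['author']
print(sorted(labels["ijcai"]))   # ['booktitle', 'journal']
```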

Semantic Inversion

• Semantic Inversion: Given a set of XML files and a search keyword sequence $S = \{w_1, w_2, \dots, w_k\}$, the problem of semantic inversion is to find the label sequence(s) $L = \{l_1, l_2, \dots, l_k\}$ that maximize $\mathrm{Sim}(S, L)$, where $\mathrm{Sim}(S, L)$ is a function evaluating the fitness or relevance of S and L. The answer sequences (label sequences) are returned in descending order of $\mathrm{Sim}(S, L)$.
• In our algorithm, the CRF model uses the conditional probability $\Pr(y \mid x)$ as the relevance function $\mathrm{Sim}(S, L)$.
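As a rough illustration of the problem statement, the sketch below enumerates every combination of candidate labels and returns the top-N sequences in descending order of Sim(S, L). Sim is passed in as a function; the additive `toy_sim` over a hypothetical `weights` table is only a stand-in for the CRF probability, not the authors' scoring, and brute-force enumeration is exponential in the number of keywords.

```python
from itertools import product

def semantic_inversion(keywords, candidates, sim, top_n=3):
    """Rank label sequences L for the keyword sequence S by sim(S, L), best first.

    Brute force over the Cartesian product of each keyword's candidate labels:
    exponential in len(keywords), so this only makes the definition concrete.
    """
    options = [sorted(candidates[w]) for w in keywords]
    scored = [(sim(keywords, list(L)), list(L)) for L in product(*options)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [L for _, L in scored[:top_n]]

# Hypothetical candidate labels and per-(keyword, label) weights.
candidates = {"point": {"title", "booktitle"}, "based": {"title"},
              "pineau": {"author", "title"}, "ijcai": {"booktitle"}}
weights = {("point", "title"): 2.0, ("point", "booktitle"): 0.5,
           ("based", "title"): 2.0, ("pineau", "author"): 3.0,
           ("pineau", "title"): 1.0, ("ijcai", "booktitle"): 3.0}

def toy_sim(S, L):
    # Stand-in for Sim(S, L); the slides use the CRF probability Pr(y|x) instead.
    return sum(weights[(w, l)] for w, l in zip(S, L))

for L in semantic_inversion(["point", "based", "pineau", "ijcai"], candidates, toy_sim):
    print(L)
```

With these toy weights the first sequence printed is ['title', 'title', 'author', 'booktitle'] (score 10.0), the label sequence given as the example in the Definition slide.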

Conditional Random Fields

• For the keyword sequence x and the semantic (label) sequence y, the global feature vector of the CRF is the sum of all the local feature functions:
$$F(y, x) = \sum_{i=1}^{n} f(y, x, i)$$
• The CRF computes the conditional probability with parameter vector w as
$$\Pr(y \mid x, w) = \frac{e^{w \cdot F(y, x)}}{\sum_{y'} e^{w \cdot F(y', x)}}$$
• Training a CRF means learning w to maximize the log-likelihood of a given training set $T = \{(x_k, y_k)\}_{k=1}^{N}$. With a Gaussian prior of variance $\sigma^2$ on w, the gradient is:
$$\nabla L_w = \sum_k \left[ F(y_k, x_k) - E_{\Pr(y' \mid x_k, w)} F(y', x_k) \right] - \frac{w}{\sigma^2}$$
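The formulas above can be checked numerically on a toy problem. This sketch assumes a hypothetical two-feature local function f (a 'pineau is labelled author' indicator and an 'adjacent labels are equal' indicator), computes Pr(y|x, w) by enumerating every label sequence, and runs plain gradient ascent with the regularized gradient. Exact enumeration stands in for whatever inference the authors use and is only feasible for a handful of keywords; because these local features only link adjacent labels, this is a linear-chain instance of the formulas.

```python
import math
from itertools import product

LABELS = ["title", "author"]  # toy label set (hypothetical)

def local_f(y, x, i):
    """Hypothetical local feature vector f(y, x, i) at position i."""
    is_author_name = 1.0 if x[i] == "pineau" and y[i] == "author" else 0.0
    same_as_prev = 1.0 if i > 0 and y[i] == y[i - 1] else 0.0
    return [is_author_name, same_as_prev]

def F(y, x):
    """Global feature vector F(y, x) = sum_i f(y, x, i)."""
    total = [0.0, 0.0]
    for i in range(len(x)):
        total = [t + v for t, v in zip(total, local_f(y, x, i))]
    return total

def distribution(x, w):
    """All label sequences y' for x with Pr(y'|x, w); brute force, toy sizes only."""
    seqs = [list(y) for y in product(LABELS, repeat=len(x))]
    scores = [sum(wj * fj for wj, fj in zip(w, F(y, x))) for y in seqs]
    m = max(scores)  # log-sum-exp shift for numerical stability
    z = sum(math.exp(s - m) for s in scores)
    return seqs, [math.exp(s - m) / z for s in scores]

def prob(y, x, w):
    seqs, probs = distribution(x, w)
    return probs[seqs.index(list(y))]

def gradient(data, w, sigma2=10.0):
    """sum_k [F(y_k, x_k) - E_{Pr(y'|x_k,w)} F(y', x_k)] - w / sigma^2."""
    grad = [-wj / sigma2 for wj in w]
    for x, y in data:
        seqs, probs = distribution(x, w)
        observed = F(y, x)
        for j in range(len(w)):
            expected = sum(p * F(s, x)[j] for p, s in zip(probs, seqs))
            grad[j] += observed[j] - expected
    return grad

data = [(["point", "based", "pineau"], ["title", "title", "author"]),
        (["point", "based"], ["title", "title"])]
w = [0.0, 0.0]
before = prob(data[0][1], data[0][0], w)  # uniform over 8 sequences at w = 0
for _ in range(50):  # plain gradient ascent, step size 0.5
    w = [wj + 0.5 * gj for wj, gj in zip(w, gradient(data, w))]
after = prob(data[0][1], data[0][0], w)
print(round(before, 3), round(after, 3))
```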

General CRF

• Why general?
• Linear-chain CRFs perform well in sequence labeling problems such as NP chunking, part-of-speech tagging, opinion expression identification, and named entity recognition.
• Linear CRF vs. general CRF: a linear CRF only considers the relevance between adjacent labels, while a general CRF considers the relevance of all pairs $y_i, y_j$.
• A keyword sequence is not strictly sequential: 'Point based Pineau IJCAI' and 'IJCAI Pineau Point based' should receive similar interpretations.
• We need to structure y to describe the relevance of all pairs of labels, not only the adjacent ones, so we choose a general CRF.
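The difference between the two structures can be seen with a hypothetical symmetric label-compatibility table g: a linear-chain score only sums g over adjacent labels, so it changes when the query is reordered, while a fully connected score sums over all pairs and depends only on which labels occur. The reordering 'Pineau Point based IJCAI' is used because, with a symmetric g, a full reversal such as 'IJCAI Pineau Point based' happens to keep the same adjacent pairs. The scores are integers so the comparison is exact.

```python
from itertools import combinations

# Hypothetical label-compatibility scores g(a, b), looked up symmetrically.
COMPAT = {frozenset(["title"]): 4,
          frozenset(["title", "author"]): 2,
          frozenset(["author", "booktitle"]): 3,
          frozenset(["title", "booktitle"]): 1}

def g(a, b):
    return COMPAT.get(frozenset([a, b]), 0)

def chain_score(labels):
    """Linear-chain view: only adjacent pairs (y_i, y_{i+1}) interact."""
    return sum(g(a, b) for a, b in zip(labels, labels[1:]))

def all_pairs_score(labels):
    """General (fully connected) view: every pair (y_i, y_j) with i < j interacts."""
    return sum(g(a, b) for a, b in combinations(labels, 2))

q1 = ["title", "title", "author", "booktitle"]  # 'Point based Pineau IJCAI'
q2 = ["author", "title", "title", "booktitle"]  # 'Pineau Point based IJCAI'
print(chain_score(q1), chain_score(q2))          # 9 7
print(all_pairs_score(q1), all_pairs_score(q2))  # 13 13
```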

Features

Assume $x = x_1, x_2, \dots, x_n$ is the keyword sequence and $y = y_1, y_2, \dots, y_n$ is the label sequence.
• $f(x_i, y_i) = \frac{p(x_i, y_i)}{p(x_i)}$ expresses the dependence between the keyword and the label at position i.
• $g(y_i, y_j) = \frac{p(y_i, y_j)}{p(y_i)\,p(y_j)}$ expresses the relevance of two labels.
• $h(x_i, x_{i+1}, y_i, y_{i+1})$ and $h'(x_i, x_{i+1}, y_i, y_{i+1})$ measure the relevance of adjacent keywords:
$$h(x_i, x_{i+1}, y_i, y_{i+1}) = \begin{cases} \frac{p(x_i, x_{i+1})}{p(x_i)\,p(x_{i+1})}, & y_i = y_{i+1} \\ 0, & y_i \neq y_{i+1} \end{cases}$$
$$h'(x_i, x_{i+1}, y_i, y_{i+1}) = \begin{cases} 0, & y_i = y_{i+1} \\ \frac{p(x_i, x_{i+1})}{p(x_i)\,p(x_{i+1})}, & y_i \neq y_{i+1} \end{cases}$$
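All four features are ratios of empirical probabilities. The sketch below assumes maximum-likelihood estimates from counts over a small hypothetical corpus of labelled queries. The slide does not say where the authors' statistics come from (for example, the XML documents themselves), so the choices of what to count (keyword tokens, ordered label pairs within a query, adjacent keyword bigrams) are assumptions.

```python
from collections import Counter
from itertools import permutations

# Hypothetical labelled queries: each is a list of (keyword, label) pairs.
queries = [
    [("point", "title"), ("based", "title"), ("pineau", "author"), ("ijcai", "booktitle")],
    [("pineau", "author"), ("ijcai", "booktitle")],
    [("point", "title"), ("based", "title")],
]

tokens = [pair for q in queries for pair in q]
n_tok = len(tokens)
count_x = Counter(x for x, _ in tokens)
count_y = Counter(y for _, y in tokens)
count_xy = Counter(tokens)
# Ordered label pairs at two distinct positions of the same query.
count_yy = Counter((a[1], b[1]) for q in queries for a, b in permutations(q, 2))
# Adjacent keyword bigrams (x_i, x_{i+1}).
count_bi = Counter((q[i][0], q[i + 1][0]) for q in queries for i in range(len(q) - 1))
n_yy, n_bi = sum(count_yy.values()), sum(count_bi.values())

def ratio(joint, marginal_product):
    # Returns 0 for unseen events instead of dividing by zero.
    return joint / marginal_product if marginal_product else 0.0

def f(x, y):
    return ratio(count_xy[(x, y)] / n_tok, count_x[x] / n_tok)

def g(a, b):
    return ratio(count_yy[(a, b)] / n_yy, (count_y[a] / n_tok) * (count_y[b] / n_tok))

def adjacency(x1, x2):
    p1, p2 = count_x[x1] / n_tok, count_x[x2] / n_tok
    return ratio(count_bi[(x1, x2)] / n_bi, p1 * p2)

def h(x1, x2, y1, y2):
    return adjacency(x1, x2) if y1 == y2 else 0.0

def h_prime(x1, x2, y1, y2):
    return adjacency(x1, x2) if y1 != y2 else 0.0

print(f("pineau", "author"), g("author", "booktitle"))  # 1.0 2.0
print(h("point", "based", "title", "title"))            # 6.4
print(h_prime("based", "pineau", "title", "author"))    # 3.2
```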

Results

• Dataset: 50,000 Wikipedia documents
• Accuracy:
$$\mathrm{Acc} = \frac{\sum_{x \in S} \mathrm{Match}(\hat{y}, y)}{\sum_{x \in S} \mathrm{Length}(x)}, \quad \text{where } \mathrm{Match}(\hat{y}, y) = \sum_i \mathrm{eq}(\hat{y}, y, i)$$
and $\mathrm{eq}(\hat{y}, y, i)$ is 1 if the predicted label $\hat{y}_i$ equals the correct label $y_i$, and 0 otherwise.
• Our algorithm (73.2%) outperforms the baseline algorithms (38.1% for the Random Select Algorithm and 61.2% for the Greedy Algorithm), which indicates that fully considering the various categories of relevance improves the quality of semantic inversion.
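The accuracy measure counts correct labels per keyword rather than per query. A minimal sketch, assuming eq(ŷ, y, i) is 1 when the predicted and correct labels agree at position i; the two queries and their labels are invented for illustration and are not the paper's data.

```python
def match(y_hat, y):
    """Match(y_hat, y) = sum_i eq(y_hat, y, i): positions labelled correctly."""
    return sum(1 for a, b in zip(y_hat, y) if a == b)

def accuracy(results):
    """Acc = sum_x Match(y_hat, y) / sum_x Length(x) over the test queries S.

    results: list of (keywords x, predicted labels y_hat, correct labels y).
    """
    correct = sum(match(y_hat, y) for _, y_hat, y in results)
    total = sum(len(x) for x, _, _ in results)
    return correct / total

# Hypothetical predictions for two queries.
results = [
    (["point", "based", "pineau", "ijcai"],
     ["title", "title", "author", "booktitle"],
     ["title", "title", "author", "booktitle"]),
    (["torran", "dubh"], ["conflict", "conflict"], ["caption", "caption"]),
]
print(accuracy(results))  # 4 correct labels out of 6 keywords
```

Because the denominator is the total number of keywords, long queries weigh more than short ones in this measure.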

Accuracy of Top-N Answers

[Figure: accuracy of the top-N answers]

The Rank of the Correct Answer

[Figure: rank of the correct answer]

Representation

• For the query 'Torran, Dubh, John, Murray', the top 4 results our algorithm gives are:
  • 'caption, caption, name, name'
  • 'conflict, conflict, commander1, commander1'
  • 'conflict, conflict, name, name'
  • 'caption, caption, commander1, commander1'
• Ambiguity can exist between specialized and generalized concepts.

Future work

• Not XML?
• Expand the definition of 'semantic'
• Real search engine
• QLAS, Yahoo!

Precise Entity Recognition (QLAS, Yahoo!)

• Existing search engines can recognize person names, but some names are naturally ambiguous:
  • Michael Jordan UC Berkeley
  • Aaron Brooks Score
• $\Pr(\text{entity} \mid \text{name}, \text{context words})$

Questions (QLAS, Yahoo!)

• 3% of queries are real questions, e.g., 'Where is the southernmost point of Africa?'
• How to solve it?
  • Match the question with the templates
  • Split out the keywords
  • Semantic Inversion (precise entities, semantic analysis)
  • Get the answer from the knowledge base

Conclusion

• Semantic Inversion precisely recognizes the semantics of keyword search queries.
• Semantic Inversion gives users the opportunity to clarify their search intent.
• Semantic Inversion can be solved well by general conditional random fields.
• Semantic Inversion is quite useful in real search engines.

Thank you!

Publication

• The paper Semantic Inversion In XML Keyword Search with General Conditional Random Fields has been accepted by Web Information Systems Engineering 2013 (a CCF-recommended international conference).
• The paper also presents the related work, the algorithms, and the details of the experiments.
• Access: LNCS 8180, pp. 431-440