Semantic Inversion In XML Keyword Search with General Conditional Random Fields
Shuhan Wang, Peking University (QLAS, Yahoo!)
2013-10-14
Outline
1. Introduction
2. Semantic Inversion
3. general CRF
4. Experiments
5. Future Work and Conclusion
Main Problems in Keyword Search
• In existing IR systems, users can hardly specify their search intent precisely.
• Structural information has not been adequately taken into consideration.
• Ambiguity exists in the input sequence.
An Example
• A user wants to find a paper published at IJCAI. The paper is written by Pineau and is about some point-based technique.
• One possible search input: 'Point based Pineau IJCAI'.
• If the system knows that 'Point based' is part of the paper's title (users may hardly remember the whole title, so they only type part of it), 'Pineau' is the author, and 'IJCAI' is the title of a proceedings, the search will surely be optimized.
Definition
• Label: Given a set of XML files and a word, a tag is a label of the word if the content of the tag contains the word. A single word may have many probable labels.
  • The label of the word 'Pineau' can be 'author', and the label of the words (appearing together) 'Torran Dubh' can be 'conflict' or 'caption'.
• Label Sequence: Given a search keyword sequence S, the corresponding label sequence is composed of the labels corresponding to each word in the keyword sequence. Apparently, a keyword sequence can be recognized as various probable label sequences, which express diverse semantics.
  • The label sequence of 'Point, based, Pineau, IJCAI' can be 'title, title, author, booktitle'.
Semantic Inversion
• Semantic Inversion: Given a set of XML files and a search keyword sequence S = {w1, w2, ..., wk}, the problem of semantic inversion is to find the label sequence(s) L = {l1, l2, ..., lk} that maximize Sim(S, L), where Sim(S, L) is a function evaluating the fitness or relevance of S and L. The answer label sequences are given in descending order of Sim(S, L).
• In our algorithm, the CRF model uses the conditional probability Pr(y|x) as the relevance function Sim(S, L).
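The definition above can be sketched as a brute-force search over all candidate label sequences; a minimal sketch, assuming a toy relevance function (the names `candidate_labels` and the lexicon below are illustrative, not from the paper):

```python
from itertools import product

def semantic_inversion(keywords, candidate_labels, relevance, top_n=3):
    """Enumerate every label sequence L for the keyword sequence S and
    rank the sequences by Sim(S, L), in descending order."""
    # candidate_labels[w] is the set of probable labels for keyword w
    label_choices = [candidate_labels[w] for w in keywords]
    scored = [(relevance(keywords, labels), labels)
              for labels in product(*label_choices)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [labels for _, labels in scored[:top_n]]

# Toy example mirroring the slide: each keyword has a few probable labels.
candidates = {
    "Point": ("title",), "based": ("title",),
    "Pineau": ("author", "title"), "IJCAI": ("booktitle", "title"),
}
# A hypothetical Sim(S, L): prefer label sequences using many distinct tags.
sim = lambda s, l: len(set(l))
best = semantic_inversion(["Point", "based", "Pineau", "IJCAI"], candidates, sim)
print(best[0])  # ('title', 'title', 'author', 'booktitle')
```

Brute-force enumeration is exponential in the query length; it only serves to make the problem statement concrete.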
Conditional Random Fields
• For the keyword sequence x and the semantic sequence y, the global feature vector of the CRF is the sum of all the local feature functions:
  F(y, x) = Σ_{i=1..n} f(y, x, i)
• The CRF computes the conditional probability with parameter vector w as:
  Pr(y|x, w) = exp(w · F(y, x)) / Σ_{y'} exp(w · F(y', x))
• Training a CRF means learning w to maximize the log-likelihood of a given training set T = {(x_k, y_k)}, k = 1..N, so the gradient is:
  ∇L_w = Σ_k [F(y_k, x_k) − E_{Pr(y'|x_k, w)} F(y', x_k)] − w/σ²
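For small label sets, Pr(y|x, w) can be computed directly by enumerating all label sequences y'; a minimal sketch, assuming the global feature vector F is given as a function (real CRF implementations replace this exhaustive sum with dynamic programming or approximate inference, and the lexicon-based features here are invented for illustration):

```python
import math
from itertools import product

def crf_probability(x, y, w, labels, F):
    """Pr(y | x, w) = exp(w.F(y,x)) / sum_{y'} exp(w.F(y',x)),
    computed by brute force over all |labels|^len(x) sequences."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    score = math.exp(dot(w, F(y, x)))
    Z = sum(math.exp(dot(w, F(yp, x)))
            for yp in product(labels, repeat=len(x)))
    return score / Z

# Toy global feature vector: (lexicon hits, adjacent equal-label pairs).
lexicon = {("Pineau", "author"), ("IJCAI", "booktitle")}
def F(y, x):
    hits = sum((xi, yi) in lexicon for xi, yi in zip(x, y))
    pairs = sum(y[i] == y[i + 1] for i in range(len(y) - 1))
    return (hits, pairs)

x = ("Pineau", "IJCAI")
p = crf_probability(x, ("author", "booktitle"), w=(2.0, 0.5),
                    labels=("author", "booktitle"), F=F)
print(round(p, 3))
```

The normalizer Z sums over every candidate label sequence, so the probabilities over all y' sum to one by construction.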
general CRF
• Why general?
  • Linear-chain CRFs perform well in sequential learning problems such as NP chunking, part-of-speech tagging, opinion expression identification, and named entity recognition.
• Linear CRF vs. general CRF:
  • A linear CRF only considers the relevance between adjacent labels, while a general CRF considers the relevance of every pair y_i, y_j.
  • A keyword sequence is not strictly sequential: 'Point based Pineau IJCAI' and 'IJCAI Pineau Point based' should be treated as similar.
• We need to structure y so as to describe the relevance of all pairs of labels, rather than only the adjacent ones, so we choose the general CRF.
Features
Assume x = x1, x2, ..., xn is the keyword sequence and y = y1, y2, ..., yn is the label sequence.
• f(x_i, y_i) = p(x_i, y_i) / p(x_i) expresses the dependence between the keyword and the label at position i.
• g(y_i, y_j) = p(y_i, y_j) / (p(y_i) p(y_j)) expresses the relevance of two labels.
• h(x_i, x_{i+1}, y_i, y_{i+1}) and h'(x_i, x_{i+1}, y_i, y_{i+1}) measure the relevance of the adjacent keywords:
  h(x_i, x_{i+1}, y_i, y_{i+1}) = p(x_i, x_{i+1}) / (p(x_i) p(x_{i+1})) if y_i = y_{i+1}, and 0 otherwise
  h'(x_i, x_{i+1}, y_i, y_{i+1}) = 0 if y_i = y_{i+1}, and p(x_i, x_{i+1}) / (p(x_i) p(x_{i+1})) otherwise
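The pointwise-ratio features above can be evaluated from pre-estimated probabilities; a minimal sketch, assuming the marginal and joint probability tables are given (all numbers below are made up for illustration, not from the paper's corpus):

```python
def make_features(p_x, p_y, p_xy, p_yy, p_xx):
    """Build f, g, h, h' from marginal and joint probability tables."""
    def f(xi, yi):          # dependence between keyword and label
        return p_xy.get((xi, yi), 0.0) / p_x[xi]
    def g(yi, yj):          # relevance of two labels
        return p_yy.get((yi, yj), 0.0) / (p_y[yi] * p_y[yj])
    def h(xi, xj, yi, yj):  # adjacent keywords, same label
        return p_xx.get((xi, xj), 0.0) / (p_x[xi] * p_x[xj]) if yi == yj else 0.0
    def h2(xi, xj, yi, yj): # adjacent keywords, different labels
        return 0.0 if yi == yj else p_xx.get((xi, xj), 0.0) / (p_x[xi] * p_x[xj])
    return f, g, h, h2

# Illustrative probability tables.
f, g, h, h2 = make_features(
    p_x={"Point": 0.2, "based": 0.2},
    p_y={"title": 0.5, "author": 0.3},
    p_xy={("Point", "title"): 0.1},
    p_yy={("title", "author"): 0.3},
    p_xx={("Point", "based"): 0.1},
)
print(f("Point", "title"))                     # 0.1 / 0.2
print(h("Point", "based", "title", "title"))   # 0.1 / (0.2 * 0.2)
print(h2("Point", "based", "title", "title"))  # 0, since the labels agree
```

Note that h and h' are complementary: for any adjacent pair, exactly one of them carries the co-occurrence ratio, depending on whether the two labels agree.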
Results
• Dataset: 50,000 Wikipedia documents.
• Accuracy:
  Acc = Σ_{x∈S} Match(ŷ, y) / Σ_{x∈S} Length(x), where Match(ŷ, y) = Σ_i eq(ŷ, y, i)
• Our algorithm (73.2%) outperforms the baseline algorithms (38.1% for the Random Select algorithm and 61.2% for the Greedy algorithm), which confirms that full consideration of the various categories of relevance can improve the quality of semantic inversion.
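The accuracy measure can be sketched directly from its definition (the predicted and gold label sequences below are invented examples, not results from the paper):

```python
def accuracy(predictions, golds):
    """Acc = sum_x Match(y_hat, y) / sum_x Length(x), where Match counts
    the positions whose predicted label equals the gold label."""
    matched = sum(sum(p == g for p, g in zip(y_hat, y))
                  for y_hat, y in zip(predictions, golds))
    total = sum(len(y) for y in golds)
    return matched / total

preds = [["title", "title", "author", "booktitle"],
         ["caption", "caption", "name", "name"]]
golds = [["title", "title", "author", "booktitle"],
         ["conflict", "conflict", "name", "name"]]
print(accuracy(preds, golds))  # 6 of 8 positions match -> 0.75
```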
Accuracy of Top-N Answers
(figure)
The Rank of the Correct Answer
(figure)
Representation
• For the query 'Torran, Dubh, John, Murray', the top 4 results our algorithm gives are:
  • 'caption, caption, name, name'
  • 'conflict, conflict, commander1, commander1'
  • 'conflict, conflict, name, name'
  • 'caption, caption, commander1, commander1'
• Ambiguity can exist between specialized and generalized conceptions.
Future work
• Not XML?
• Expand the definition of 'semantic'.
• Real search engine (QLAS, Yahoo!).
Entity Precise Recognition (QLAS, Yahoo!)
• Existing search engines can recognize person names, but some names are naturally ambiguous.
  • 'Michael Jordan UC Berkeley'
  • 'Aaron Brooks Score'
• Pr(entity | name, context words)
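The disambiguation probability Pr(entity | name, context words) can be sketched with a naive-Bayes-style score over the context words, assuming per-entity priors and word likelihoods are available (all entity names and numbers here are invented for illustration):

```python
def disambiguate(name, context, priors, word_probs):
    """Score each candidate entity e for `name` by
    Pr(e) * prod_w Pr(w | e), then normalize to Pr(entity | name, context)."""
    scores = {}
    for entity in priors[name]:
        score = priors[name][entity]
        for w in context:
            score *= word_probs[entity].get(w, 1e-6)  # smooth unseen words
        scores[entity] = score
    Z = sum(scores.values())
    return {e: s / Z for e, s in scores.items()}

# Hypothetical model for the ambiguous name on the slide.
priors = {"Michael Jordan": {"basketball_player": 0.7, "ml_professor": 0.3}}
word_probs = {
    "basketball_player": {"NBA": 0.05, "Berkeley": 0.0001},
    "ml_professor": {"NBA": 0.0001, "Berkeley": 0.05},
}
post = disambiguate("Michael Jordan", ["Berkeley"], priors, word_probs)
print(max(post, key=post.get))  # the context word 'Berkeley' flips the prior
```

Even with a strong prior toward the more famous entity, a single discriminative context word shifts the posterior toward the right one.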
Questions (QLAS, Yahoo!)
• 3% of queries are real questions.
  • Where is the southernmost point of Africa?
• How to solve it?
  • Match the question against the templates.
  • Split out the keywords.
  • Semantic Inversion (precise entities, semantic analysis).
  • Get the answer from the knowledge base.
Conclusion
• Semantic Inversion precisely recognizes the semantics of a keyword search.
• Semantic Inversion provides users with the opportunity to clarify their search intent.
• Semantic Inversion can be solved well by general conditional random fields.
• Semantic Inversion is quite useful in real search engines.

Thank you!
Publication
• The paper "Semantic Inversion In XML Keyword Search with General Conditional Random Fields" has been accepted by Web Information Systems Engineering 2013 (a CCF-recommended international conference).
• The paper also introduces the related work, the algorithms, and the details of the experiments.
• Access: LNCS 8180, pp. 431-440.