Scholarly Paper Recommendation via User’s Recent Research Interests
Kazunari Sugiyama, Min-Yen Kan
National University of Singapore
Introduction
Digital Contents: Documents, Images, Movies
“Information Overload”
WING, NUS
Digital Library
• Email alerts
• RSS feeds
Users are required to input their interests explicitly.
Introduction
Our aim
• To provide recommendation of papers by using latent information about each user’s research interests
• Historical and current publication lists
Users are not required to input their interests explicitly.
Related Work
Improving Ranking in Digital Library
• Ranking Search Results
ISI impact factor [Garfield, ‘79]
[Figure: high-impact vs. low-impact papers]
Recent works introduce PageRank to weight and control for the impact of papers [Sun and Giles, ECIR’07], [Krapivin and Marchese, ICADL’08], [Sayyadi and Getoor, SIAM Data Mining, ‘09]
Related Work
Improving Ranking in Digital Library
• Measuring the Importance of Scholarly Papers
ISI impact factor [Garfield, ‘79]
Popularity biased
PageRank also controls the popularity bias [Bollen et al., Scientometrics’06], [Chen et al., Informetrics’07]
Related Work
Recommendation in Scholarly Digital Libraries
• Collaborative Filtering Approach
[McNee et al., CSCW’02]: Focuses on citation network of papers
[Yang et al., JCDL’09]: Ranking-oriented collaborative filtering
• Hybrid Approach of Collaborative Filtering and Content-based Filtering
[Torres et al., JCDL’04]: Many users satisfied with the recommended papers
• PageRank-based Approach
[Gori and Pucci, WI’06]: Focuses on graph structure of papers
Related Work
Robust User Profile Construction in Recommendation Systems
• Web Search Results
[Teevan et al., SIGIR’05]: Visited Web pages and email history
[White et al., SIGIR’09]: A small number of Web pages preceding the current browsing page
• Dynamic Content such as News
[Shen et al., SIGIR’05], [Tan et al., KDD’06]: Kullback-Leibler divergence is used to represent a user’s information need
[Chu and Park, WWW’09]: Use demographics and interaction data
• Abstracts of Scholarly Papers
[Kim et al., ICADL’08]: Frequent patterns from click-history and term weight
Proposed Method
• Junior researchers: Only one recently published paper without citations
• Senior researchers: Multiple published papers with citation papers
User Profile Construction (Junior Researchers)
User Profile Construction (Senior Researchers)
Weighting Scheme (LC): Linear Combination
Citation paper $p_{c_1} \to p_1$:
$f^{p_{c_1} \to p_1}$ = ( …, digital, library, recommendation, …) = (…, 0.25, 0.13, 0.47, …)
Paper $p_1$:
$f^{p_1}$ = ( …, digital, library, recommendation, …) = (…, 0.53, 0.38, 0.62, …)
References, $p_1 \to ref_1$:
$f^{p_1 \to ref_1}$ = ( …, digital, library, recommendation, …) = (…, 0.61, 0.72, 0, …)
$F^{p_1} = f^{p_1} + 1 \times f^{p_{c_1} \to p_1} + \cdots + 1 \times f^{p_1 \to ref_1} + \cdots$
$P_{user} = F^{p_1} + F^{p_2} + \cdots$
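The LC sum can be checked by hand on the three terms shown on the slide. The sketch below is only an illustration: the helper name `lc_feature_vector` is hypothetical, and the vectors are truncated to the three visible components (digital, library, recommendation), whereas the real vectors cover the whole vocabulary.

```python
# Term order follows the slide: (digital, library, recommendation).
f_p1 = [0.53, 0.38, 0.62]       # paper p1 itself
f_c1_p1 = [0.25, 0.13, 0.47]    # citation paper c1, which cites p1
f_p1_ref1 = [0.61, 0.72, 0.0]   # reference ref1, which p1 cites


def lc_feature_vector(f_paper, citing_vectors, reference_vectors):
    """Hypothetical helper: F = f + sum(1 * f_citing) + sum(1 * f_reference)."""
    total = list(f_paper)
    for vec in citing_vectors + reference_vectors:
        # Under LC every neighbouring paper gets the same weight, 1.
        total = [a + 1.0 * b for a, b in zip(total, vec)]
    return total


F_p1 = lc_feature_vector(f_p1, [f_c1_p1], [f_p1_ref1])
print([round(x, 2) for x in F_p1])  # [1.39, 1.23, 1.09]
```

Because every weight is 1, a paper with many citation or reference papers is dominated by its neighbours; the SIM and RPY schemes replace the constant weight with a per-paper value.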
Weighting Scheme (SIM): Cosine Similarity
Citation paper $p_{c_1} \to p_1$ (Similarity: 0.36):
$f^{p_{c_1} \to p_1}$ = ( …, digital, library, recommendation, …) = (…, 0.25, 0.13, 0.47, …)
Paper $p_1$:
$f^{p_1}$ = ( …, digital, library, recommendation, …) = (…, 0.53, 0.38, 0.62, …)
References, $p_1 \to ref_1$ (Similarity: 0.54):
$f^{p_1 \to ref_1}$ = ( …, digital, library, recommendation, …) = (…, 0.61, 0.72, 0, …)
$F^{p_1} = f^{p_1} + 0.36 \times f^{p_{c_1} \to p_1} + \cdots + 0.54 \times f^{p_1 \to ref_1} + \cdots$
$P_{user} = F^{p_1} + F^{p_2} + \cdots$
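A minimal sketch of the SIM scheme follows. The function names `cosine` and `weighted_feature_vector` are hypothetical. The example plugs in the similarities printed on the slide (0.36 and 0.54) rather than recomputing them, because the slide shows only three components of each vector and the cosine over those three alone would not reproduce the slide's values.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_feature_vector(f_paper, weighted_neighbours):
    """F = f + sum(w * f_neighbour) over (w, f_neighbour) pairs."""
    total = list(f_paper)
    for weight, vec in weighted_neighbours:
        total = [a + weight * b for a, b in zip(total, vec)]
    return total


f_p1 = [0.53, 0.38, 0.62]
f_c1_p1 = [0.25, 0.13, 0.47]
f_p1_ref1 = [0.61, 0.72, 0.0]

# Weights taken from the slide; with full vectors they would come from
# cosine(f_p1, f_c1_p1) and cosine(f_p1, f_p1_ref1).
F_p1 = weighted_feature_vector(f_p1, [(0.36, f_c1_p1), (0.54, f_p1_ref1)])
print([round(x, 4) for x in F_p1])  # [0.9494, 0.8156, 0.7892]
```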
Weighting Scheme (RPY): Reciprocal of the Difference Between Published Years
Citation paper $p_{c_1} \to p_1$ (‘07), difference of published years: 2, RPY: 1/2 = 0.50:
$f^{p_{c_1} \to p_1}$ = ( …, digital, library, recommendation, …) = (…, 0.25, 0.13, 0.47, …)
Paper $p_1$ (‘05):
$f^{p_1}$ = ( …, digital, library, recommendation, …) = (…, 0.53, 0.38, 0.62, …)
References, $p_1 \to ref_1$ (‘01), difference of published years: 4, RPY: 1/4 = 0.25:
$f^{p_1 \to ref_1}$ = ( …, digital, library, recommendation, …) = (…, 0.61, 0.72, 0, …)
$F^{p_1} = f^{p_1} + 0.50 \times f^{p_{c_1} \to p_1} + \cdots + 0.25 \times f^{p_1 \to ref_1} + \cdots$
$P_{user} = F^{p_1} + F^{p_2} + \cdots$
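The RPY weights on the slide can be reproduced with a few lines. This is a sketch under assumptions: `rpy_weight` is a hypothetical name, the two-digit years are read as 2007, 2005 and 2001, and the weight of 1.0 for two papers from the same year is a choice made here, since the slide does not cover that case.

```python
def rpy_weight(year_a, year_b):
    """Reciprocal of the difference between two publication years."""
    diff = abs(year_a - year_b)
    if diff == 0:
        return 1.0  # assumption: the slide does not define the same-year case
    return 1.0 / diff


f_p1 = [0.53, 0.38, 0.62]       # p1, published 2005
f_c1_p1 = [0.25, 0.13, 0.47]    # citation paper, published 2007
f_p1_ref1 = [0.61, 0.72, 0.0]   # reference, published 2001

w_cite = rpy_weight(2007, 2005)  # 0.5
w_ref = rpy_weight(2005, 2001)   # 0.25

F_p1 = [
    a + w_cite * b + w_ref * c
    for a, b, c in zip(f_p1, f_c1_p1, f_p1_ref1)
]
print(w_cite, w_ref)
print([round(x, 4) for x in F_p1])
```

Papers published close in time to $p_1$ contribute more, which matches the aim of emphasising recent research interests.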
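Weighting Scheme (FF, senior researchers only): Forgetting Factor
Publication list, old to new: $p_1$ (‘02), $p_2$ (‘03), …, $p_i$ (‘05), …, $p_n$ (‘10)
Difference of published years from $p_n$: $d = 8$ for $p_1$, $d = 7$ for $p_2$, $d = 5$ for $p_i$
$W^{p_n \to z} = e^{-\gamma \times d}$ [ $\gamma$: forgetting coefficient ($0 \le \gamma \le 1$) ] (e.g., $\gamma = 0.2$)
$W^{p_n \to p_i} = e^{-0.2 \times 5}$
$W^{p_n \to p_2} = e^{-0.2 \times 7}$
$W^{p_n \to p_1} = e^{-0.2 \times 8}$
$P_{user} = F^{p_n} + \cdots + e^{-0.2 \times 5} \cdot F^{p_i} + \cdots + e^{-0.2 \times 7} \cdot F^{p_2} + e^{-0.2 \times 8} \cdot F^{p_1}$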
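The forgetting-factor weights for the slide's example ($\gamma = 0.2$) can be evaluated as below. This is a sketch, not the authors' code: `forgetting_weight` and `senior_profile` are hypothetical names, years are read as four-digit values, and the two-component feature vectors at the end are made up purely to exercise the function.

```python
import math


def forgetting_weight(d, gamma=0.2):
    """W = exp(-gamma * d), where d is the gap in years to the newest paper."""
    return math.exp(-gamma * d)


def senior_profile(papers, gamma=0.2):
    """papers: list of (year, F) pairs. Returns the decay-weighted sum of the F vectors."""
    newest = max(year for year, _ in papers)
    profile = [0.0] * len(papers[0][1])
    for year, vec in papers:
        w = forgetting_weight(newest - year, gamma)  # newest paper gets weight 1
        profile = [p + w * x for p, x in zip(profile, vec)]
    return profile


for name, d in [("p_i", 5), ("p_2", 7), ("p_1", 8)]:
    print(name, round(forgetting_weight(d), 3))
# p_i 0.368
# p_2 0.247
# p_1 0.202

# Made-up vectors, only to show the call shape.
print(senior_profile([(2010, [1.0, 0.0]), (2005, [0.0, 1.0])]))
```

With $\gamma = 0.2$ a paper from eight years earlier still contributes about a fifth of its feature vector; a larger $\gamma$ discounts old papers faster.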
(2) Feature Vector Construction for Candidate Papers
• Basically, TF-IDF
• Also use information about citation and reference papers
$F^{p_{rec}} = f^{p_{rec}} + W^{p_{c_1} \to p_{rec}} \cdot f^{p_{c_1} \to p_{rec}} + \cdots + W^{p_{rec} \to ref_1} \cdot f^{p_{rec} \to ref_1} + \cdots$
Weighting scheme $W^{p_1 \to ref_i}$ $(i = 1, \ldots, l)$:
• LC
• SIM
• RPY
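The slide says only "Basically, TF-IDF" and does not state which variant is used. The sketch below therefore assumes one common form, relative term frequency times $\log(N / df)$; the function name `tf_idf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    ["digital", "library"],
    ["digital", "recommendation"],
]
vectors = tf_idf_vectors(docs)
print(vectors[0])
```

A term that occurs in every document, such as "digital" here, gets weight 0 under this variant, so only the distinguishing terms remain in each base vector $f$ before the citation and reference vectors are added.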
(3) Recommendation of Papers
• Compute cosine similarity
$sim(P_{user}, F^{p_{rec}}) = \frac{P_{user} \cdot F^{p_{rec}}}{|P_{user}| \cdot |F^{p_{rec}}|}$
$P_{user}$: User profile
$F^{p_{rec}}$: Feature vector for candidate paper to recommend
• Then, recommend the top n papers to the user (n = 5, 10)
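The ranking step can be sketched as follows. The names `cosine` and `recommend`, the paper ids, and the two-component vectors are all hypothetical and serve only to show the flow: score each candidate against the user profile, sort, and keep the top n.

```python
import math


def cosine(u, v):
    """sim(u, v) = (u . v) / (|u| * |v|); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(p_user, candidates, n=5):
    """candidates: {paper_id: feature vector}. Returns the n most similar paper ids."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(p_user, item[1]),
        reverse=True,
    )
    return [paper_id for paper_id, _ in ranked[:n]]


p_user = [1.0, 0.0]
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(recommend(p_user, candidates, n=2))  # ['a', 'c']
```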
Experiments
Experimental Data
• Researchers
Natural Language Processing, Information Retrieval

                                                    Junior researchers   Senior researchers
Number of subjects                                  15                   13
Average number of DBLP papers                       1.0                  9.5
Average number of relevant papers in ACL’00 – ‘06   28.6                 38.7
Average number of citation papers                   0                    10.5 (max. 199)
Average number of reference papers                  18.7 (max. 29)       19.4 (max. 79)
Experiments
Experimental Data
• Candidate Papers to Recommend
ACL Anthology Reference Corpus [Bird et al., LREC’08]
Information about citation and reference papers
[Figure: target paper $p_{tgt}$ with citation paper $p_{c_1} \to p_{tgt}$ and reference $p_{tgt} \to p_{ref_1}$]