
Scholarly Paper Recommendation via User’s Recent Research Interests

Kazunari Sugiyama, Min-Yen Kan

National University of Singapore

Introduction

Digital contents: documents, images, movies

“Information Overload”

WING, NUS


Digital Library

• Email alerts
• RSS feeds

Users are required to input their interests explicitly.

Introduction

Our aim
• To provide recommendation of papers by using latent information about each user’s research interests
• Historical and current publication lists

Users are not required to input their interests explicitly.

Related Work

Improving Ranking in Digital Library
• Ranking Search Results
  ISI impact factor [Garfield, ’79]

[Figure labels: High impact, Low impact]

Recent work introduces PageRank to weight and control for the impact of papers [Sun and Giles, ECIR’07], [Krapivin and Marchese, ICADL’08], [Sayyadi and Getoor, SIAM Data Mining ’09].

Related Work

Improving Ranking in Digital Library
• Measuring the Importance of Scholarly Papers
  ISI impact factor [Garfield, ’79]
  Popularity biased
  PageRank also controls the popularity bias [Bollen et al., Scientometrics’06], [Chen et al., Informetrics’07]

Related Work

Recommendation in Scholarly Digital Libraries
• Collaborative Filtering Approach
  [McNee et al., CSCW’02]: Focuses on citation network of papers
  [Yang et al., JCDL’09]: Ranking-oriented collaborative filtering
• Hybrid Approach of Collaborative Filtering and Content-based Filtering
  [Torres et al., JCDL’04]: Many users satisfied with the recommended papers
• PageRank-based Approach
  [Gori and Pucci, WI’06]: Focuses on graph structure of papers

Related Work

Robust User Profile Construction in Recommendation Systems
• Web Search Results
  [Teevan et al., SIGIR’05]: Visited Web pages and email history
  [White et al., SIGIR’09]: A small number of Web pages preceding the current browsing page
• Dynamic Content such as News
  [Shen et al., SIGIR’05], [Tan et al., KDD’06]: Kullback-Leibler divergence is used to represent a user’s information need
  [Chu and Park, WWW’09]: Use demographics and interaction data
• Abstracts of Scholarly Papers
  [Kim et al., ICADL’08]: Frequent patterns from click-history and term weight

Proposed Method

• Junior researchers: only one recently published paper, without citations
• Senior researchers: multiple published papers, with citation papers

User Profile Construction (Junior Researchers)


User Profile Construction (Senior Researchers)


Weighting Scheme (LC): Linear Combination

Term weights over the vocabulary (…, digital, library, recommendation, …):

Citation paper: $f^{p_{c1} \to p_1} = (\ldots, 0.25, 0.13, 0.47, \ldots)$
Published paper: $f^{p_1} = (\ldots, 0.53, 0.38, 0.62, \ldots)$
Reference paper: $f^{p_1 \to ref_1} = (\ldots, 0.61, 0.72, 0, \ldots)$

$F^{p_1} = f^{p_1} + 1 \times f^{p_{c1} \to p_1} + \cdots + 1 \times f^{p_1 \to ref_1} + \cdots$

$P_{user} = F^{p_1} + F^{p_2} + \cdots$
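As a minimal sketch of the linear combination above, the following Python snippet adds the citation and reference vectors to the paper's own vector with a weight of 1. The function name `combine`, the dictionary representation, and the restriction to the three terms visible on the slide are assumptions made for illustration; the slide elides the remaining terms.

```python
from collections import defaultdict


def combine(f_paper, citing, references, weight=lambda neighbor: 1.0):
    """Sketch of F^p = f^p + sum W * f^(pc -> p) + sum W * f^(p -> ref).

    With the default weight of 1.0 this corresponds to the LC scheme.
    """
    combined = defaultdict(float)
    for term, value in f_paper.items():
        combined[term] += value
    # Citation papers and reference papers are treated the same way here.
    for neighbor in list(citing) + list(references):
        w = weight(neighbor)
        for term, value in neighbor.items():
            combined[term] += w * value
    return dict(combined)


# Only the three term weights visible on the slide; other terms are elided there.
f_p1 = {"digital": 0.53, "library": 0.38, "recommendation": 0.62}
f_c1 = {"digital": 0.25, "library": 0.13, "recommendation": 0.47}
f_ref1 = {"digital": 0.61, "library": 0.72, "recommendation": 0.0}

F_p1 = combine(f_p1, [f_c1], [f_ref1])
print({term: round(value, 2) for term, value in F_p1.items()})
```

For the three visible terms, the combined weights come to 1.39, 1.23, and 1.09 after rounding.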

Weighting Scheme (SIM): Cosine Similarity

Term weights over the vocabulary (…, digital, library, recommendation, …):

Citation paper: $f^{p_{c1} \to p_1} = (\ldots, 0.25, 0.13, 0.47, \ldots)$, similarity to $p_1$: 0.36
Published paper: $f^{p_1} = (\ldots, 0.53, 0.38, 0.62, \ldots)$
Reference paper: $f^{p_1 \to ref_1} = (\ldots, 0.61, 0.72, 0, \ldots)$, similarity to $p_1$: 0.54

$F^{p_1} = f^{p_1} + 0.36 \times f^{p_{c1} \to p_1} + \cdots + 0.54 \times f^{p_1 \to ref_1} + \cdots$

$P_{user} = F^{p_1} + F^{p_2} + \cdots$
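The SIM scheme can be sketched in Python as follows, assuming sparse vectors stored as dictionaries. The function names are hypothetical. Note that the slide's similarities (0.36 and 0.54) are computed over full term vectors that are elided there, so running this on only the three visible terms gives different weights.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector contributes nothing
    return dot / (norm_u * norm_v)


def sim_weighted(f_paper, neighbors):
    """Add each neighbor vector, weighted by its similarity to the paper."""
    combined = dict(f_paper)
    for f_neighbor in neighbors:
        w = cosine(f_paper, f_neighbor)
        for term, value in f_neighbor.items():
            combined[term] = combined.get(term, 0.0) + w * value
    return combined


# Truncated to the three terms shown on the slide (illustration only).
f_p1 = {"digital": 0.53, "library": 0.38, "recommendation": 0.62}
f_c1 = {"digital": 0.25, "library": 0.13, "recommendation": 0.47}
f_ref1 = {"digital": 0.61, "library": 0.72, "recommendation": 0.0}

w_c1 = cosine(f_p1, f_c1)
F_p1 = sim_weighted(f_p1, [f_c1, f_ref1])
print(round(w_c1, 3), F_p1)
```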

Weighting Scheme (RPY): Reciprocal of the Difference Between Published Years

Term weights over the vocabulary (…, digital, library, recommendation, …):

Citation paper (’07): $f^{p_{c1} \to p_1} = (\ldots, 0.25, 0.13, 0.47, \ldots)$
  Difference of published years: 2, RPY: 1/2 = 0.50
Published paper $p_1$ (’05): $f^{p_1} = (\ldots, 0.53, 0.38, 0.62, \ldots)$
Reference paper (’01): $f^{p_1 \to ref_1} = (\ldots, 0.61, 0.72, 0, \ldots)$
  Difference of published years: 4, RPY: 1/4 = 0.25

$F^{p_1} = f^{p_1} + 0.50 \times f^{p_{c1} \to p_1} + \cdots + 0.25 \times f^{p_1 \to ref_1} + \cdots$

$P_{user} = F^{p_1} + F^{p_2} + \cdots$
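The RPY arithmetic on this slide can be reproduced with a short Python sketch. The function name `rpy_weight` is hypothetical, and the handling of two papers published in the same year is an assumption, since the slide does not cover that case.

```python
def rpy_weight(year_a, year_b):
    """Reciprocal of the difference between two publication years."""
    diff = abs(year_a - year_b)
    # Assumption: the slide does not say what happens when both papers share
    # a publication year; this sketch uses 1.0 to avoid dividing by zero.
    return 1.0 if diff == 0 else 1.0 / diff


# Years and term weights as shown on the slide (other terms are elided there).
f_p1 = {"digital": 0.53, "library": 0.38, "recommendation": 0.62}    # p1, 2005
f_c1 = {"digital": 0.25, "library": 0.13, "recommendation": 0.47}    # citation paper, 2007
f_ref1 = {"digital": 0.61, "library": 0.72, "recommendation": 0.0}   # reference paper, 2001

w_c1 = rpy_weight(2007, 2005)    # difference 2
w_ref1 = rpy_weight(2005, 2001)  # difference 4

F_p1 = {t: f_p1[t] + w_c1 * f_c1[t] + w_ref1 * f_ref1[t] for t in f_p1}
print(w_c1, w_ref1, F_p1)
```

Under this scheme, papers published close in time to the target paper contribute more than older references or much later citations.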

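Weighting Scheme (FF, senior researchers only): Forgetting Factor

Publication list, old to new: $p_1$ (’02), $p_2$ (’03), …, $p_i$ (’05), …, $p_n$ (’10)

Difference in years from $p_n$: d = 8 for $p_1$, d = 7 for $p_2$, d = 5 for $p_i$

$W^{p_n \to z} = e^{-\gamma \times d}$, where $\gamma$ is the forgetting coefficient ($0 \le \gamma \le 1$), e.g., $\gamma = 0.2$

$W^{p_n \to p_i} = e^{-0.2 \times 5}$
$W^{p_n \to p_2} = e^{-0.2 \times 7}$
$W^{p_n \to p_1} = e^{-0.2 \times 8}$

$P_{user} = F^{p_n} + \cdots + e^{-0.2 \times 5} \cdot F^{p_i} + \cdots + e^{-0.2 \times 7} \cdot F^{p_2} + e^{-0.2 \times 8} \cdot F^{p_1}$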
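A small Python sketch of the forgetting factor, assuming each paper is given as a (year, vector) pair and that d is measured from the most recent paper in the list. The function names and the one-term vectors are hypothetical placeholders; only the years and the example value of the forgetting coefficient (0.2) come from the slide.

```python
import math


def forgetting_weight(latest_year, year, gamma=0.2):
    """W = exp(-gamma * d), where d is the gap in years to the latest paper."""
    d = latest_year - year
    return math.exp(-gamma * d)


def user_profile(papers, gamma=0.2):
    """Sum the paper vectors, discounting older papers.

    papers: list of (year, {term: weight}) pairs.
    """
    latest = max(year for year, _ in papers)
    profile = {}
    for year, vector in papers:
        w = forgetting_weight(latest, year, gamma)
        for term, value in vector.items():
            profile[term] = profile.get(term, 0.0) + w * value
    return profile


# Years from the slide; the vectors are made-up placeholders.
papers = [
    (2002, {"retrieval": 1.0}),
    (2003, {"retrieval": 1.0}),
    (2005, {"retrieval": 1.0}),
    (2010, {"retrieval": 1.0}),
]
P_user = user_profile(papers)
print(P_user)
```

The most recent paper keeps a weight of 1, and a larger forgetting coefficient makes the profile depend more heavily on recent work.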

(2) Feature Vector Construction for Candidate Papers

• Basically, TF-IDF
• Also use information about citation and reference papers

$F^{p_{rec}} = f^{p_{rec}} + W^{p_{c1} \to p_{rec}} \cdot f^{p_{c1} \to p_{rec}} + \cdots + W^{p_{rec} \to ref_1} \cdot f^{p_{rec} \to ref_1} + \cdots$

Weighting scheme for $W^{p_1 \to ref_i}$ $(i = 1, \ldots, l)$:
• LC
• SIM
• RPY
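The base vector for each paper is described only as "basically, TF-IDF". One common variant can be sketched as below; the exact formula used by the authors is not given on the slide, so the normalized term frequency and the unsmoothed logarithmic IDF here are assumptions, as are the function name and the toy documents.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Toy documents, invented for illustration.
docs = [
    ["digital", "library"],
    ["digital", "recommendation"],
]
vecs = tfidf_vectors(docs)
print(vecs)
```

A term that occurs in every document, such as "digital" in this toy corpus, receives a weight of zero under this variant.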

(3) Recommendation of Papers

• Compute cosine similarity

$sim(P_{user}, F^{p_{rec}}) = \frac{P_{user} \cdot F^{p_{rec}}}{|P_{user}| \cdot |F^{p_{rec}}|}$

$P_{user}$: user profile
$F^{p_{rec}}$: feature vector for candidate paper to recommend

• Then, recommend the top n papers to the user (n = 5, 10)
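The ranking step can be sketched as follows: score every candidate vector against the user profile with cosine similarity, then keep the n best. The function names, paper identifiers, and vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(p_user, candidates, n=5):
    """Return the ids of the n candidates most similar to the user profile.

    candidates: {paper_id: {term: weight}}
    """
    scored = [(cosine(p_user, vec), paper_id) for paper_id, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper_id for _, paper_id in scored[:n]]


# Made-up profile and candidates.
p_user = {"digital": 1.0, "library": 1.0}
candidates = {
    "A": {"digital": 1.0, "library": 1.0},
    "B": {"digital": 1.0},
    "C": {"recommendation": 1.0},
}
top = recommend(p_user, candidates, n=2)
print(top)
```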

Experiments

Experimental Data
• Researchers (Natural Language Processing, Information Retrieval)

                                                    Junior researchers   Senior researchers
Number of subjects                                  15                   13
Average number of DBLP papers                       1.0                  9.5
Average number of relevant papers in ACL’00–’06     28.6                 38.7
Average number of citation papers                   0                    10.5 (max. 199)
Average number of reference papers                  18.7 (max. 29)       19.4 (max. 79)

Experiments

Experimental Data
• Candidate Papers to Recommend
  ACL Anthology Reference Corpus [Bird et al., LREC’08]
  Information about citation and reference papers

[Figure: target paper p_tgt with a citation paper (p_c1 → p_tgt) and a reference paper (p_tgt → p_ref1)]