Stable Coactive Learning via Perturbation

Karthik Raman (1), Thorsten Joachims (1), Pannaga Shivaswamy (2), Tobias Schnabel (3)

(1) Cornell University, {karthik,tj}@cs.cornell.edu
(2) AT&T Research, [email protected]
(3) Stuttgart University, [email protected]

June 19, 2013
Coactive Learning (e.g.: Search Engine)

Learning model (repeat forever):
  System receives context x_t.            (e.g., user query)
  System makes prediction y_t.            (e.g., ranking)
  Regret = Regret + U(x_t, y_t*) − U(x_t, y_t),   where U is the user's utility.
  System gets feedback:
    Full information: U(x_t, y^(1)), U(x_t, y^(2)), ...
      Unrealistic for users to provide (e.g., implicit feedback).
    Bandit: U(x_t, y_t)
    Coactive: U(x_t, ȳ_t) ≥_α U(x_t, y_t)

The Preference Perceptron has average regret O(1/(α√T)) for linear utility U(x, y) = w*^T φ(x, y).
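To make the protocol concrete, here is a minimal simulation sketch of the coactive learning loop with a linear utility model. The candidate set, feature dimension, and the way the simulated user picks a better object ȳ_t are illustrative assumptions of mine, not part of the talk; the weight update is the Preference Perceptron update from the next slide.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 1000                          # feature dimension and number of rounds (illustrative)
w_star = rng.normal(size=d)             # hidden user utility: U(x, y) = w*^T phi(x, y)
w = np.zeros(d)                         # learner's weight vector
regret = 0.0

for t in range(T):
    # Context x_t: a set of candidate objects, each represented by phi(x_t, y).
    candidates = rng.normal(size=(20, d))
    utilities = candidates @ w_star
    y_t = candidates[np.argmax(candidates @ w)]       # system's prediction
    y_star = candidates[np.argmax(utilities)]         # best object under the true utility
    regret += w_star @ y_star - w_star @ y_t

    # Coactive feedback: the user returns some y_bar with U(x_t, y_bar) >= U(x_t, y_t),
    # rather than full utility values (full information) or only U(x_t, y_t) (bandit).
    better = candidates[utilities >= w_star @ y_t]    # always contains y_t itself
    y_bar = better[rng.integers(len(better))]

    # Preference Perceptron update (next slide): w <- w + phi(x_t, y_bar) - phi(x_t, y_t).
    w = w + y_bar - y_t

print("average regret:", regret / T)
```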
User Study: Learning Rankings using Perceptron

On a live search engine.
Goal: learn a ranking function from user clicks.
Evaluation: interleaved comparison against a hand-tuned baseline.
A win ratio of 1 means no better than the baseline (higher = better).

Preference Perceptron algorithm:
  1. Initialize weight vector w_1 ← 0.
  2. Given context x_t, present y_t ← argmax_y w_t^T φ(x_t, y).
  3. Observe clicks and construct the feedback ranking ȳ_t.
  4. Update: w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
  5. Repeat from step 2.

[Figure: clicks on the presented ranking y are used to construct the feedback ranking ȳ.]
[Figure: "User Study Results": win ratio vs. number of iterations (0 to 30,000) for the Preference Perceptron.]

The Perceptron performs poorly!
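A minimal sketch of the Preference Perceptron for ranking, under two assumptions of mine: the joint feature map φ(x, y) is a rank-discounted sum of per-document feature vectors (so the argmax in step 2 reduces to sorting by score), and the click-based construction of ȳ_t is abstracted into a caller-supplied function. These specifics are not claimed to match the user study's exact setup.

```python
import numpy as np

def phi(doc_features, ranking):
    """Joint feature map phi(x, y): rank-discounted sum of document feature vectors.
    An illustrative choice; with decreasing rank weights, sorting documents by
    w^T (doc features) maximizes w^T phi(x, y), which makes step 2's argmax cheap."""
    return sum(doc_features[doc] / np.log2(rank + 2)
               for rank, doc in enumerate(ranking))

def preference_perceptron(sessions, num_features):
    """sessions yields one (doc_features, feedback_fn) pair per query x_t:
       doc_features: (n_docs, num_features) array of candidate documents,
       feedback_fn:  maps the presented ranking to the feedback ranking y_bar
                     constructed from the observed clicks."""
    w = np.zeros(num_features)                          # step 1: w_1 <- 0
    for doc_features, feedback_fn in sessions:
        scores = doc_features @ w
        y = list(np.argsort(-scores))                   # step 2: present the argmax ranking
        y_bar = feedback_fn(y)                          # step 3: feedback ranking from clicks
        w = w + phi(doc_features, y_bar) - phi(doc_features, y)   # step 4: update
    return w                                            # step 5: loop repeats over sessions
```

For instance, a simple feedback_fn could move clicked documents above skipped ones in the presented ranking, mirroring the "Click!" illustration on the slide; this is only one plausible construction.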
Perturbed Preference Perceptron

  1. Initialize weight vector w_1 ← 0.
  2. Given context x_t, compute ŷ_t ← argmax_y w_t^T φ(x_t, y).
  3. Present y_t ← Perturb(ŷ_t) (randomly swap adjacent pairs).
  4. Observe clicks and construct the feedback ranking ȳ_t.
  5. Update: w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
  6. Repeat from step 2.

[Figure: the predicted ranking ŷ is perturbed before being presented as y.]
[Figure: "User Study Results": win ratio vs. number of iterations (0 to 30,000), comparing the Preference Perceptron and the Perturbed Preference Perceptron.]
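A minimal sketch of the Perturb step (step 3). I assume "randomly swap adjacent pairs" means pairing up positions (0,1), (2,3), ... and swapping each pair independently with probability 1/2; the exact pairing and swap probability used in the paper may differ.

```python
import numpy as np

def perturb(ranking, swap_prob=0.5, rng=None):
    """Randomly swap adjacent pairs of the predicted ranking y_hat before presenting it.
    Assumed scheme: pair positions (0,1), (2,3), ... and swap each pair independently."""
    rng = rng or np.random.default_rng()
    y = list(ranking)
    for i in range(0, len(y) - 1, 2):
        if rng.random() < swap_prob:
            y[i], y[i + 1] = y[i + 1], y[i]
    return y
```

In the learning loop, step 2 computes ŷ_t, step 3 presents y_t = perturb(ŷ_t), and the update in step 5 uses the presented ranking y_t rather than ŷ_t, exactly as written on the slide.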
Please come to our poster!

At the poster, I will tell you:
  Why does the preference perceptron perform poorly?
  Why does perturbation fix the problem?
  What are the regret bounds for the algorithm?
  How do we do this more generally, for non-ranking problems?