Stable Coactive Learning via Perturbation

Karthik Raman¹, Thorsten Joachims¹, Pannaga Shivaswamy², Tobias Schnabel³

¹ Cornell University, {karthik,tj}@cs.cornell.edu
² AT&T Research, [email protected]
³ Stuttgart University, [email protected]

June 19, 2013


Coactive Learning (e.g., a search engine)

Learning model. Repeat forever:
- System receives context x_t (e.g., a user query).
- System makes prediction y_t (e.g., a ranking).
- Regret = Regret + U(x_t, y_t*) − U(x_t, y_t), where U is the user's utility.
- System gets feedback:
  - Full information: U(x_t, y^(1)), U(x_t, y^(2)), ... — unrealistic for users to provide (e.g., when feedback is implicit).
  - Bandit: U(x_t, y_t).
  - Coactive: some ȳ_t with U(x_t, ȳ_t) ≥_α U(x_t, y_t).

The Perceptron has average regret O(1/(α√T)) for linear utility U(x, y) = w*ᵀ φ(x, y).
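To make the learning model concrete, here is a minimal Python sketch of the interaction protocol with a linear utility. The helpers `candidates`, `phi`, `policy`, and `coactive_feedback` are hypothetical stand-ins (the slides do not prescribe them); the point is only how regret is accumulated and that coactive feedback reveals a somewhat better output ȳ_t rather than any utility values.

```python
import numpy as np

def run_protocol(w_star, contexts, candidates, phi, policy, coactive_feedback):
    """Coactive learning protocol with linear utility U(x, y) = w_star . phi(x, y).

    contexts                : list of contexts x_t (e.g., queries)
    candidates(x)           : feasible outputs for x (e.g., rankings)
    policy(x, ys, t)        : the system's prediction y_t
    coactive_feedback(x, y) : stand-in for the user; returns some y_bar that is
                              (alpha-)better than y_t, but never reveals U itself
                              (unlike full-information or bandit feedback).
    """
    U = lambda x, y: float(w_star @ phi(x, y))      # hidden user utility
    regret = 0.0
    for t, x in enumerate(contexts):                # "repeat forever" (finite here)
        ys = candidates(x)
        y = policy(x, ys, t)                        # system makes prediction y_t
        y_star = max(ys, key=lambda c: U(x, c))     # best possible output y_t*
        regret += U(x, y_star) - U(x, y)            # Regret += U(x, y*) - U(x, y)
        y_bar = coactive_feedback(x, y)             # coactive feedback ȳ_t
        # a learner would update its model from (x, y, y_bar); see the
        # Preference Perceptron sketch later in this document
    return regret / max(len(contexts), 1)           # average regret
```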

User Study: Learning Rankings using Perceptron

On a live search engine.
Goal: learn a ranking function from user clicks.
Interleaved comparison against a hand-tuned baseline.
A win ratio of 1 means no better than the baseline (higher is better).

Preference Perceptron algorithm (a code sketch follows this slide):
1. Initialize the weight vector w_1 ← 0.
2. Given context x_t, present y_t ← argmax_y w_tᵀ φ(x_t, y).
3. Observe clicks and construct the feedback ranking ȳ_t.
4. Update w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
5. Repeat from step 2.

[Illustration: the presented ranking y with the user's clicks, and the feedback ranking ȳ constructed from those clicks.]

[Figure: "User Study Results" — win ratio of the Preference Perceptron against the baseline over 30,000 iterations.]

The Perceptron performs poorly!
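A minimal Python sketch of the five steps above, assuming a joint feature map that decomposes over rank positions (so the argmax over rankings reduces to sorting documents by score). The `feedback_fn` argument is a hypothetical stand-in for the click-based construction of ȳ_t; none of these names come from the paper's code.

```python
import numpy as np

def phi(doc_features, ranking, decay=0.9):
    """Joint feature map: position-discounted sum of document feature vectors.

    This decomposable form is an assumption made for the sketch; with it,
    argmax_y w . phi(x, y) is obtained by sorting documents by w . phi(d).
    """
    return sum(decay ** pos * doc_features[d] for pos, d in enumerate(ranking))

def present_ranking(w, doc_features):
    """Step 2: rank documents by descending score w . phi(d)."""
    scores = doc_features @ w
    return list(np.argsort(-scores))

def preference_perceptron(queries, dim, feedback_fn, T=1000):
    """Preference Perceptron sketch.

    queries[t]            : (n_docs x dim) document feature matrix for query t
    feedback_fn(docs, y)  : stand-in for the user's click-derived ranking y_bar
    """
    w = np.zeros(dim)                               # step 1: w_1 = 0
    for t in range(T):
        docs = queries[t % len(queries)]
        y = present_ranking(w, docs)                # step 2: present argmax ranking
        y_bar = feedback_fn(docs, y)                # step 3: feedback ranking from clicks
        w = w + phi(docs, y_bar) - phi(docs, y)     # step 4: perceptron update
    return w                                        # step 5: repeat handled by the loop
```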

Perturbed Preference Perceptron

1. Initialize the weight vector w_1 ← 0.
2. Given context x_t, compute ŷ_t ← argmax_y w_tᵀ φ(x_t, y).
3. Present y_t ← Perturb(ŷ_t) (randomly swap adjacent pairs; see the code sketch below).
4. Observe clicks and construct the feedback ranking ȳ_t.
5. Update w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
6. Repeat from step 2.

[Illustration: the predicted ranking ŷ is passed through PERTURB to give the presented ranking y.]

[Figure: "User Study Results" — win ratio over 30,000 iterations for the Preference Perceptron and the Perturbed Preference Perceptron.]
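A minimal sketch of the perturbation step, reusing the helpers from the Preference Perceptron sketch above. Swapping disjoint adjacent pairs independently with probability 0.5 is one plausible reading of "randomly swap adjacent pairs"; the exact pairing scheme and probability used in the study may differ.

```python
import random

def perturb(ranking, p=0.5):
    """Step 3: randomly swap adjacent pairs of the predicted ranking.

    Walks over the disjoint adjacent pairs (0,1), (2,3), ... and swaps each
    one independently with probability p (an assumption for this sketch).
    """
    y = list(ranking)
    for i in range(0, len(y) - 1, 2):
        if random.random() < p:
            y[i], y[i + 1] = y[i + 1], y[i]
    return y

# The only change to the learning loop is that the perturbed ranking is
# presented to the user and used in the update:
#     y_hat = present_ranking(w, docs)             # step 2: argmax prediction
#     y     = perturb(y_hat)                       # step 3: present perturbed ranking
#     y_bar = feedback_fn(docs, y)                 # step 4: feedback from clicks
#     w     = w + phi(docs, y_bar) - phi(docs, y)  # step 5: update vs. presented y
```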

Please come to our poster

I will tell you:
- Why does the preference perceptron perform poorly?
- Why does perturbation fix the problem?
- What are the regret bounds for the algorithm?
- How do we do this more generally for non-ranking problems?
