Surrogate Regret Bounds for the Area Under the ROC Curve via Strongly Proper Losses
Shivani Agarwal
Department of Computer Science and Automation, Indian Institute of Science

a.k.a. Standard Logistic Regression, AdaBoost, and Least Squares Regression are AUC-Consistent!

Area Under the ROC Curve (AUC)

[Figure: ROC curve of a scoring function; the AUC is the area under this curve.]

Empirical AUC as a Wilcoxon-Mann-Whitney Statistic

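The statistic behind this slide was not reproduced in the extracted text. A sketch of the standard Wilcoxon-Mann-Whitney form of the empirical AUC, in my own notation (m_+ positives, m_- negatives, examples (x_i, y_i)):

\widehat{\mathrm{AUC}}(f) \;=\; \frac{1}{m_+ m_-} \sum_{i\,:\,y_i=+1} \; \sum_{j\,:\,y_j=-1} \Big( \mathbf{1}\big(f(x_i) > f(x_j)\big) + \tfrac{1}{2}\,\mathbf{1}\big(f(x_i) = f(x_j)\big) \Big)

i.e., the fraction of positive-negative pairs ranked correctly by f, with ties counted as half.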

Pairwise Surrogate Risk Minimization Algorithms for Optimizing AUC
• RankSVM [Herbrich et al, 2000; Joachims, 2002; Rakotomamonjy, 2004]
• RankBoost [Freund et al, 2003]
• RankNet [Burges et al, 2005]

Many of these Pairwise Algorithms are AUC-Consistent
• RankBoost [Clemencon et al, 2008; Uematsu & Lee, 2011]
• RankNet [Clemencon et al, 2008; Uematsu & Lee, 2011]
• RankSVM ✗ [Uematsu & Lee, 2011]

However, Standard Logistic Regression, AdaBoost Also Reported to Give Good AUC Performance
• Empirical studies [Cortes & Mohri, 2004]

• Some results specific to boosting [Rudin & Schapire, 2009; Ertekin & Rudin, 2011]

• Balanced losses [Kotlowski et al, 2011]

This Paper: Any Algorithm Minimizing a (Standard, Non-Pairwise) Strongly Proper Loss is AUC-Consistent!

Road Map

Problem Setup and Previous Results

Proper and Strongly Proper Losses
AUC Regret Bounds and Consistency

Problem Setup
• Instance space; probability distribution on instance-label pairs
• AUC of a scoring function
• Optimal AUC
• AUC regret of a scoring function
[See paper for more complete definitions/statements]
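The displayed definitions did not survive extraction. A reconstruction of the standard bipartite-ranking setup assumed here (the symbols X, D, and the exact tie-handling are my notation and may differ from the paper's):

\text{Instance space } \mathcal{X}; \text{ distribution } D \text{ on } \mathcal{X} \times \{\pm 1\}; \quad (X,Y), (X',Y') \sim D \text{ i.i.d.}
\mathrm{AUC}(f) \;=\; \mathbf{P}\big(f(X) > f(X') \mid Y=+1,\, Y'=-1\big) \;+\; \tfrac{1}{2}\,\mathbf{P}\big(f(X) = f(X') \mid Y=+1,\, Y'=-1\big)
\mathrm{AUC}^{*} \;=\; \sup_{f : \mathcal{X} \to \mathbf{R}} \mathrm{AUC}(f), \qquad \mathrm{regret}^{\mathrm{AUC}}_D[f] \;=\; \mathrm{AUC}^{*} - \mathrm{AUC}(f).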

Binary Loss Functions
• Prediction space; binary loss function ℓ
• ℓ-error of a function
• Optimal ℓ-error
• ℓ-regret of a function
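The displayed formulas were lost; a sketch of the standard definitions assumed on this slide (prediction space and notation mine):

\text{Binary loss: } \ell : \{\pm 1\} \times \widehat{\mathcal{Y}} \to \mathbf{R}_+
\text{$\ell$-error of } h : \mathcal{X} \to \widehat{\mathcal{Y}}: \quad \mathrm{er}^{\ell}_D[h] \;=\; \mathbf{E}_{(X,Y)\sim D}\big[\ell(Y, h(X))\big]
\text{Optimal $\ell$-error: } \mathrm{er}^{\ell,*}_D \;=\; \inf_{h} \mathrm{er}^{\ell}_D[h]; \qquad \text{$\ell$-regret: } \mathrm{regret}^{\ell}_D[h] \;=\; \mathrm{er}^{\ell}_D[h] - \mathrm{er}^{\ell,*}_D.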

Example: 0-1 Loss
• Prediction space; 0-1 loss
• 0-1 error of a function
• Optimal 0-1 error (Bayes error)
• 0-1 regret of a function
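For concreteness, the standard 0-1 loss presumably shown on this slide:

\ell_{0\text{-}1}(y, \hat{y}) \;=\; \mathbf{1}(\hat{y} \neq y), \qquad y, \hat{y} \in \{\pm 1\},

so the 0-1 error of h is \mathbf{P}(h(X) \neq Y) and the optimal 0-1 error is the Bayes error.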

Example: Logistic Loss
• Prediction space; logistic loss
• Logistic error of a function
• Optimal logistic error
• Logistic regret of a function
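The logistic loss in its standard margin form, with real-valued predictions (as presumably displayed on the slide):

\ell_{\log}(y, f) \;=\; \ln\big(1 + e^{-y f}\big), \qquad y \in \{\pm 1\},\ f \in \mathbf{R}.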

Example: Exponential Loss
• Prediction space; exponential loss
• Exponential error of a function
• Optimal exponential error
• Exponential regret of a function
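And the exponential loss used by AdaBoost, again in standard margin form:

\ell_{\exp}(y, f) \;=\; e^{-y f}, \qquad y \in \{\pm 1\},\ f \in \mathbf{R}.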

Reduction to Pairwise Binary Classification [Clemencon et al, 2008]
• Define a 'pairwise' distribution as follows: sample two labeled examples iid; if the labels agree, discard and repeat, else keep the pair
• For any scoring function, define an induced pairwise classifier
• Then the AUC regret of the scoring function can be expressed as the classification regret of the induced pairwise classifier (see the sketch below)
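A sketch of the reduction, reconstructed from [Clemencon et al, 2008] in my own notation (details such as tie-handling are in the paper):

\text{Sample } (X,Y), (X',Y') \sim D \text{ i.i.d.; if } Y = Y' \text{ discard and repeat, else set } \tilde{X} = (X, X'),\ \tilde{Y} = \mathrm{sign}(Y - Y'). \text{ Call the resulting distribution } \tilde{D}.
\text{For } f : \mathcal{X} \to \mathbf{R}, \text{ define } \tilde{f}(x, x') \;=\; f(x) - f(x').
\text{Then } \mathrm{regret}^{\mathrm{AUC}}_D[f] \text{ equals the 0-1 regret of the pairwise classifier } \mathrm{sign}(\tilde{f}) \text{ under } \tilde{D} \text{ (up to the treatment of ties).}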

Reduction to Pairwise Binary Classification [Clemencon et al, 2008]
• From [Bartlett et al, 2006]: for any classification-calibrated, margin-based loss, the 0-1 regret is bounded by a transformation of the surrogate regret, for some strictly increasing function with value 0 at 0
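The result being invoked is, as I recall it, the ψ-transform bound of Bartlett, Jordan & McAuliffe (2006); a sketch in generic notation:

\mathrm{regret}^{0\text{-}1}_D[h] \;\le\; \psi^{-1}\big(\mathrm{regret}^{\ell}_D[h]\big)

for some strictly increasing function \psi with \psi(0) = 0, whenever \ell is a classification-calibrated margin-based loss. Applied to the pairwise distribution \tilde{D}, this turns pairwise surrogate regret bounds into AUC regret bounds.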

Regret Bounds via Balanced Losses [Kotlowski et al, 2011]
• Let p = P(Y = +1) (under D). Define balanced losses as follows: reweight the positive and negative terms of a loss inversely to the class probabilities p and 1 − p (see the sketch below)
• Then the AUC regret can be bounded in terms of the balanced-loss regret
• Combining with results of [Clemencon et al, 2008] and [Bartlett et al, 2006]: AUC regret bounds in terms of the balanced exponential and balanced logistic regrets
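The balanced losses themselves were not recoverable from the extraction. Their general shape, as I understand it (the exact normalization constants should be taken from [Kotlowski et al, 2011]), is a class-probability reweighting of a margin-based loss ℓ:

\ell^{\mathrm{bal}}(+1, f) \;\propto\; \frac{\ell(+1, f)}{p}, \qquad \ell^{\mathrm{bal}}(-1, f) \;\propto\; \frac{\ell(-1, f)}{1 - p}, \qquad p = \mathbf{P}(Y = +1),

so that each class contributes equally in expectation regardless of class imbalance.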

Summary So Far
• AUC regret of a scoring function can be upper bounded in terms of its balanced exponential regret and balanced logistic regret.
But:
• Analysis goes via pairwise classification
• Analysis specific to exponential and logistic losses
• Balanced losses contain hidden terms, hard to optimize directly
• Doesn't explain empirical success with standard algorithms

Road Map

Problem Setup and Previous Results

Proper and Strongly Proper Losses
AUC Regret Bounds and Consistency

Proper Losses [Savage, 1971; Schervish, 1989; Buja et al, 2005; Reid & Williamson, 2009, 2010]
• Conditional ℓ-risk
• Conditional Bayes ℓ-risk
• A loss ℓ (on class-probability estimates) is proper if predicting the true class probability minimizes the conditional ℓ-risk, and strictly proper if it is the unique minimizer
• Theorem [Hendrickson & Buehler, 1971; Schervish, 1989]. A proper loss ℓ is strictly proper if and only if its conditional Bayes ℓ-risk is strictly concave.
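A reconstruction of the standard definitions behind these bullets (notation mine; η denotes the true conditional positive-class probability and η̂ a predicted probability):

\text{Conditional $\ell$-risk:}\quad L_\ell(\eta, \hat{\eta}) \;=\; \eta\,\ell(+1, \hat{\eta}) + (1-\eta)\,\ell(-1, \hat{\eta})
\text{Conditional Bayes $\ell$-risk:}\quad H_\ell(\eta) \;=\; \inf_{\hat{\eta} \in [0,1]} L_\ell(\eta, \hat{\eta})
\ell \text{ is proper if } L_\ell(\eta, \eta) = H_\ell(\eta)\ \forall \eta \in [0,1], \text{ and strictly proper if } \eta \text{ is the unique minimizer.}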

Strongly Proper Losses
• Let λ > 0. Say a loss function ℓ is λ-strongly proper if the conditional ℓ-regret of any prediction is at least (λ/2) times the squared distance between the predicted and true class probabilities (see the sketch below)
• Theorem. Let λ > 0. A 'regular' proper loss ℓ is λ-strongly proper if and only if its conditional Bayes ℓ-risk is λ-strongly concave.
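Formally, in the notation above (my reconstruction of the definition in the paper):

\ell \text{ is } \lambda\text{-strongly proper if}\quad L_\ell(\eta, \hat{\eta}) - H_\ell(\eta) \;\ge\; \frac{\lambda}{2}\,(\eta - \hat{\eta})^2 \qquad \forall\, \eta, \hat{\eta} \in [0,1].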

Road Map

Problem Setup and Previous Results

Proper and Strongly Proper Losses
AUC Regret Bounds and Consistency

Regret Bound via Strongly Proper (Composite) Losses
• Theorem. Let λ > 0 and let ℓ be a λ-strongly proper composite loss. Then for any scoring function f, the AUC regret of f is upper bounded by a constant (depending on λ and the class probabilities) times the square root of the ℓ-regret of f (see the sketch below)
• Crux of Proof: the AUC regret is first bounded in terms of the class-probability estimation regret of a suitable probability estimate derived from f, which strong properness in turn bounds by the ℓ-regret of f
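The displayed bound was lost in extraction; as I recall the main theorem of the paper, it has the following form, with p = P(Y = +1) (the exact constant should be checked against the paper):

\mathrm{regret}^{\mathrm{AUC}}_D[f] \;\le\; \frac{1}{p(1-p)}\,\sqrt{\frac{2}{\lambda}}\;\sqrt{\mathrm{regret}^{\ell}_D[f]}

for any λ-strongly proper composite loss ℓ and any scoring function f.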

Examples of Strongly Proper Composite Losses
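The slide's table of examples did not survive extraction. Typical entries, with strong-properness parameters as I compute them from the λ-strong concavity of the corresponding conditional Bayes risks (the paper's table is authoritative):

Loss          ℓ(y, f)            Link η̂(f)            λ
Logistic      ln(1 + e^{-yf})    1/(1 + e^{-f})        4
Exponential   e^{-yf}            1/(1 + e^{-2f})       4
Squared       (1 - yf)^2         (1 + f)/2             8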

Tighter Bound under Low-Noise Conditions [via result of Clemencon & Robbiano, 2011]
• Theorem. Let λ > 0 and let ℓ be a λ-strongly proper composite loss. Suppose the distribution satisfies a suitable low-noise condition. Then for any scoring function, a tighter AUC regret bound holds in terms of the ℓ-regret [see paper for the precise noise condition and bound].

Any algorithm minimizing a standard, non-pairwise, strongly proper composite loss (assuming appropriate function class and regularization) is AUC-consistent!
• Standard Logistic Regression
• Standard AdaBoost
• Standard Least Squares Regression

Strongly proper losses may also be useful in other contexts.
• A.K. Menon, H. Narasimhan, S. Agarwal, S. Chawla. On the statistical consistency of algorithms for binary classification under class imbalance. ICML 2013.