Surrogate Regret Bounds for the Area Under the ROC Curve via Strongly Proper Losses
a.k.a. Standard Logistic Regression, AdaBoost, and Least Squares Regression are AUC-Consistent!
Shivani Agarwal
Department of Computer Science and Automation, Indian Institute of Science
Area Under the ROC Curve (AUC)
• ROC curve of a scoring function $f : \mathcal{X} \to \mathbb{R}$: plot of the true positive rate against the false positive rate as the classification threshold on $f$ is varied
• The AUC of $f$ is the area under this curve
[Figure: ROC curves of scoring functions, with the area under the curve shaded]
Empirical AUC as a Wilcoxon-Mann-Whitney Statistic
• Given positive instances $x_1^+, \ldots, x_m^+$ and negative instances $x_1^-, \ldots, x_n^-$, the empirical AUC of $f$ is the fraction of positive-negative pairs ranked correctly (ties counted as one half):
$$\widehat{\mathrm{AUC}}[f] = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \Big( \mathbf{1}\big(f(x_i^+) > f(x_j^-)\big) + \tfrac{1}{2}\, \mathbf{1}\big(f(x_i^+) = f(x_j^-)\big) \Big)$$
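As a concrete illustration (a minimal sketch of my own, not from the slides; the function name and toy scores are illustrative), the empirical AUC computed directly as a Wilcoxon-Mann-Whitney statistic:

```python
import numpy as np

def empirical_auc(scores_pos, scores_neg):
    """Empirical AUC as the Wilcoxon-Mann-Whitney statistic: the fraction
    of positive-negative pairs ranked correctly, counting ties as 1/2."""
    s_pos = np.asarray(scores_pos, dtype=float)[:, None]   # shape (m, 1)
    s_neg = np.asarray(scores_neg, dtype=float)[None, :]   # shape (1, n)
    return (s_pos > s_neg).mean() + 0.5 * (s_pos == s_neg).mean()

# Toy usage: scores f(x_i^+) for positives, f(x_j^-) for negatives
print(empirical_auc([2.1, 0.7, 1.5], [0.3, 0.7, -1.0]))   # 8.5/9 ~ 0.944
```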
Pairwise Surrogate Risk Minimization Algorithms for Optimizing AUC
These algorithms minimize a convex surrogate of the pairwise 0-1 loss over positive-negative pairs:
• RankSVM [Herbrich et al., 2000; Joachims, 2002; Rakotomamonjy, 2004]
• RankBoost [Freund et al., 2003]
• RankNet [Burges et al., 2005]
Many of these Pairwise Algorithms are AUC-Consistent
• RankBoost: AUC-consistent [Clemencon et al., 2008; Uematsu & Lee, 2011]
• RankNet: AUC-consistent [Clemencon et al., 2008; Uematsu & Lee, 2011]
• RankSVM: not AUC-consistent in general [Uematsu & Lee, 2011]
However, Standard Logistic Regression and AdaBoost Are Also Reported to Give Good AUC Performance
• Empirical studies [Cortes & Mohri, 2004]
• Some results specific to boosting [Rudin & Schapire, 2009; Ertekin & Rudin, 2011]
• Balanced losses [Kotlowski et al., 2011]
This Paper: Any Algorithm Minimizing a (Standard, Non-Pairwise) Strongly Proper Loss is AUC-Consistent!
Road Map
• Problem Setup and Previous Results
• Proper and Strongly Proper Losses
• AUC Regret Bounds and Consistency
Problem Setup
• Instance space $\mathcal{X}$; probability distribution $D$ on $\mathcal{X} \times \{\pm 1\}$
• AUC of a scoring function $f : \mathcal{X} \to \mathbb{R}$:
$$\mathrm{AUC}[f] = \mathbf{P}\big(f(x) > f(x') \mid y = +1, y' = -1\big) + \tfrac{1}{2}\, \mathbf{P}\big(f(x) = f(x') \mid y = +1, y' = -1\big),$$
where $(x, y), (x', y')$ are drawn iid from $D$
• Optimal AUC: $\mathrm{AUC}^* = \sup_f \mathrm{AUC}[f]$
• AUC regret of $f$: $\mathrm{regret}_{\mathrm{AUC}}[f] = \mathrm{AUC}^* - \mathrm{AUC}[f]$
[See paper for more complete definitions/statements]
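To make these definitions concrete, a small sketch (my own illustration; the finite instance space and the values of mu and eta are assumed) computing the population AUC, the optimal AUC, and the AUC regret exactly. Note that ranking by $\eta(x) = \mathbf{P}(y = +1 \mid x)$ attains $\mathrm{AUC}^*$:

```python
import numpy as np

# Assumed toy distribution: finite instance space X = {0, 1, 2} with
# marginal mu(x) and conditional eta(x) = P(y = +1 | x)
mu  = np.array([0.5, 0.3, 0.2])
eta = np.array([0.2, 0.5, 0.9])

def auc(scores):
    """Population AUC of x -> scores[x]:
    P(f(x) > f(x') | y = +1, y' = -1) + (1/2) P(f(x) = f(x') | ...)."""
    num, den = 0.0, 0.0
    for x in range(len(mu)):
        for xp in range(len(mu)):
            w = mu[x] * eta[x] * mu[xp] * (1 - eta[xp])  # P(x pos, x' neg)
            num += w * ((scores[x] > scores[xp]) + 0.5 * (scores[x] == scores[xp]))
            den += w
    return num / den

auc_star = auc(eta)                  # ranking by eta attains the optimal AUC
f = np.array([0.1, 0.8, 0.6])        # a scorer that misorders x = 1 and x = 2
print(auc_star, auc(f), auc_star - auc(f))   # final value: AUC regret of f
```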
Binary Loss Functions
• Prediction space $\hat{\mathcal{Y}}$; binary loss function $\ell : \{\pm 1\} \times \hat{\mathcal{Y}} \to \mathbb{R}_+$
• $\ell$-error of a function $f : \mathcal{X} \to \hat{\mathcal{Y}}$: $\mathrm{er}_\ell[f] = \mathbf{E}_{(x,y) \sim D}\big[\ell(y, f(x))\big]$
• Optimal $\ell$-error: $\mathrm{er}_\ell^* = \inf_f \mathrm{er}_\ell[f]$
• $\ell$-regret of $f$: $\mathrm{regret}_\ell[f] = \mathrm{er}_\ell[f] - \mathrm{er}_\ell^*$
Example: 0-1 Loss
• Prediction space $\hat{\mathcal{Y}} = \{\pm 1\}$; 0-1 loss $\ell_{0\text{-}1}(y, \hat{y}) = \mathbf{1}(\hat{y} \neq y)$
• 0-1 error of a function $f : \mathcal{X} \to \{\pm 1\}$: $\mathrm{er}_{0\text{-}1}[f] = \mathbf{P}\big(f(x) \neq y\big)$
• Optimal 0-1 error (Bayes error): $\mathrm{er}_{0\text{-}1}^* = \inf_f \mathrm{er}_{0\text{-}1}[f]$
• 0-1 regret of $f$: $\mathrm{regret}_{0\text{-}1}[f] = \mathrm{er}_{0\text{-}1}[f] - \mathrm{er}_{0\text{-}1}^*$
Example: Logistic Loss
• Prediction space $\hat{\mathcal{Y}} = \mathbb{R}$; logistic loss $\ell_{\log}(y, \hat{y}) = \ln\big(1 + e^{-y\hat{y}}\big)$
• Logistic error of a function $f : \mathcal{X} \to \mathbb{R}$: $\mathrm{er}_{\log}[f] = \mathbf{E}\big[\ell_{\log}(y, f(x))\big]$
• Optimal logistic error: $\mathrm{er}_{\log}^* = \inf_f \mathrm{er}_{\log}[f]$
• Logistic regret of $f$: $\mathrm{regret}_{\log}[f] = \mathrm{er}_{\log}[f] - \mathrm{er}_{\log}^*$
Example: Exponential Loss
• Prediction space $\hat{\mathcal{Y}} = \mathbb{R}$; exponential loss $\ell_{\exp}(y, \hat{y}) = e^{-y\hat{y}}$
• Exponential error of a function $f : \mathcal{X} \to \mathbb{R}$: $\mathrm{er}_{\exp}[f] = \mathbf{E}\big[\ell_{\exp}(y, f(x))\big]$
• Optimal exponential error: $\mathrm{er}_{\exp}^* = \inf_f \mathrm{er}_{\exp}[f]$
• Exponential regret of $f$: $\mathrm{regret}_{\exp}[f] = \mathrm{er}_{\exp}[f] - \mathrm{er}_{\exp}^*$
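A minimal sketch (my own; names and sample data are illustrative) of the three losses above and their empirical errors; note that the 0-1 loss takes predictions in $\{\pm 1\}$, so real-valued scores are thresholded at 0:

```python
import numpy as np

def zero_one_loss(y, yhat):
    return (np.sign(yhat) != y).astype(float)   # predictions thresholded at 0

def logistic_loss(y, yhat):
    return np.log1p(np.exp(-y * yhat))          # ln(1 + e^{-y * yhat})

def exp_loss(y, yhat):
    return np.exp(-y * yhat)                    # e^{-y * yhat}

# Empirical l-error of f on a sample: the average loss of the scores f(x_i)
y      = np.array([+1, -1, +1, -1])
scores = np.array([1.2, -0.4, -0.1, 0.3])
for loss in (zero_one_loss, logistic_loss, exp_loss):
    print(loss.__name__, loss(y, scores).mean())
```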
Reduction to Pairwise Binary Classification [Clemencon et al., 2008]
• Define a `pairwise' distribution $\tilde{D}$ on $(\mathcal{X} \times \mathcal{X}) \times \{\pm 1\}$ as follows: sample $(x, y), (x', y')$ iid from $D$; if $y = y'$, then discard and repeat, else set $\tilde{x} = (x, x')$ and $\tilde{y} = +1$ if $(y, y') = (+1, -1)$, $\tilde{y} = -1$ otherwise
• For any $f : \mathcal{X} \to \mathbb{R}$, define $\tilde{f} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ as $\tilde{f}(x, x') = f(x) - f(x')$
• Then the AUC regret of $f$ under $D$ coincides (up to tie-handling conventions) with the 0-1 regret of $\tilde{f}$ under $\tilde{D}$: optimizing AUC is a pairwise binary classification problem
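A sketch of the reduction (my own illustration; the sampler for $D$ is an assumed toy distribution): sampling from the pairwise distribution $\tilde{D}$ and forming the pairwise scorer $\tilde{f}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pairwise(sample_D, n_pairs):
    """Sample from the pairwise distribution D~: draw (x, y), (x', y') iid
    from D; discard if y == y', else emit ((x, x'), +1) when (y, y') is
    (+1, -1) and ((x, x'), -1) when (y, y') is (-1, +1)."""
    pairs = []
    while len(pairs) < n_pairs:
        (x, y), (xp, yp) = sample_D(), sample_D()
        if y == yp:
            continue                              # discard and repeat
        pairs.append(((x, xp), +1 if y > yp else -1))
    return pairs

# The induced pairwise scorer f~(x, x') = f(x) - f(x'); classifying a pair
# with sign(f~) asks exactly whether f ranks x above x'.
def f_tilde(f, x, xp):
    return f(x) - f(xp)

# Illustrative D: y uniform on {-1, +1}, x ~ N(y, 1)
def sample_D():
    y = rng.choice([-1, +1])
    return rng.normal(loc=y), y

print(sample_pairwise(sample_D, 3))
```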
Reduction to Pairwise Binary Classification [Clemencon et al., 2008]
• From [Bartlett et al., 2006]: for any classification-calibrated, margin-based loss $\ell$,
$$\psi\big(\mathrm{regret}_{0\text{-}1}[f]\big) \le \mathrm{regret}_\ell[f]$$
for some strictly increasing function $\psi$ with $\psi(0) = 0$
• Applying this under $\tilde{D}$ bounds the AUC regret of $f$ in terms of the pairwise surrogate regret of $\tilde{f}$
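For concreteness, a few standard instances of the $\psi$-transform computed in Bartlett et al., 2006 (these specific formulas are my addition, not shown on the slide):

```latex
% psi-transforms for standard margin losses (Bartlett et al., 2006),
% each giving regret_{0-1}[f] <= psi^{-1}(regret_l[f]):
\begin{align*}
\text{hinge, } \max(0, 1 - y\hat{y}):\quad & \psi(\theta) = |\theta| \\
\text{squared, } (1 - y\hat{y})^2:\quad & \psi(\theta) = \theta^2 \\
\text{exponential, } e^{-y\hat{y}}:\quad & \psi(\theta) = 1 - \sqrt{1 - \theta^2}
\end{align*}
```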
Regret Bounds via Balanced Losses [Kotlowski et al., 2011]
• Let $p = \mathbf{P}(y = +1)$ (under $D$). Define balanced losses as follows, reweighting the two classes inversely to their prior probabilities:
$$\ell^{\mathrm{bal}}(y, \hat{y}) = \frac{\mathbf{1}(y = +1)}{2p}\, \ell(y, \hat{y}) + \frac{\mathbf{1}(y = -1)}{2(1 - p)}\, \ell(y, \hat{y})$$
• Then the pairwise surrogate regret of $\tilde{f}$ can be bounded in terms of the (standard, non-pairwise) balanced $\ell$-regret of $f$, for $\ell$ the exponential or logistic loss
• Combining with results of [Clemencon et al., 2008] and [Bartlett et al., 2006]: the AUC regret of $f$ is upper bounded in terms of the balanced exponential or balanced logistic regret of $f$
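A minimal sketch (my own; it uses the $1/(2p)$, $1/(2(1-p))$ normalization above) of a balanced loss as a reweighting of a base loss; note that it requires the class probability $p$, which in practice must be estimated:

```python
import numpy as np

def balanced_loss(base_loss, p):
    """Reweight a base loss so each class contributes equally:
    positives get weight 1/(2p), negatives 1/(2(1-p))."""
    def loss(y, yhat):
        w = np.where(y == +1, 1.0 / (2 * p), 1.0 / (2 * (1 - p)))
        return w * base_loss(y, yhat)
    return loss

logistic = lambda y, yhat: np.log1p(np.exp(-y * yhat))
bal_logistic = balanced_loss(logistic, p=0.1)   # heavily imbalanced data

y      = np.array([+1, -1, -1, -1])
scores = np.array([0.5, -1.0, 0.2, -0.3])
print(bal_logistic(y, scores).mean())
```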
Summary So Far
• AUC regret of $f$ can be upper bounded in terms of the balanced exponential regret of $f$ and the balanced logistic regret of $f$. But:
– Analysis goes via pairwise classification
– Analysis is specific to the exponential and logistic losses
– Balanced losses contain hidden terms (the unknown class probability $p$) and are hard to optimize directly
– Doesn't explain the empirical success of standard, unbalanced algorithms
Road Map
• Problem Setup and Previous Results
• Proper and Strongly Proper Losses
• AUC Regret Bounds and Consistency
Proper Losses [Savage, 1971; Schervish, 1989; Buja et al., 2005; Reid & Williamson, 2009, 2010]
• Conditional $\ell$-risk of $\hat{y} \in \hat{\mathcal{Y}}$ at $\eta \in [0, 1]$: $L_\ell(\eta, \hat{y}) = \eta\, \ell(+1, \hat{y}) + (1 - \eta)\, \ell(-1, \hat{y})$
• Conditional Bayes $\ell$-risk: $H_\ell(\eta) = \inf_{\hat{y} \in \hat{\mathcal{Y}}} L_\ell(\eta, \hat{y})$
• Let $\hat{\mathcal{Y}} = [0, 1]$. A loss $c : \{\pm 1\} \times [0, 1] \to \mathbb{R}_+$ is proper if $L_c(\eta, \eta) = H_c(\eta)$ for all $\eta \in [0, 1]$, and strictly proper if $\eta$ is the unique minimizer of $L_c(\eta, \cdot)$ for each $\eta$
• Theorem [Hendrickson & Buehler, 1971; Schervish, 1989]. A proper loss $c$ is strictly proper if and only if $H_c$ is strictly concave.
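A small numeric illustration (my own sketch) of properness for the log loss $c(+1, \hat{\eta}) = -\ln \hat{\eta}$, $c(-1, \hat{\eta}) = -\ln(1 - \hat{\eta})$: for each $\eta$, the conditional risk $L_c(\eta, \cdot)$ is minimized at $\hat{\eta} = \eta$:

```python
import numpy as np

def log_loss(y, eta_hat):
    """The binary log loss, a canonical strictly proper loss on [0, 1]."""
    return np.where(y == +1, -np.log(eta_hat), -np.log(1 - eta_hat))

def conditional_risk(c, eta, eta_hat):
    """L_c(eta, eta_hat) = eta * c(+1, eta_hat) + (1 - eta) * c(-1, eta_hat)."""
    return eta * c(+1, eta_hat) + (1 - eta) * c(-1, eta_hat)

# Numerical check of properness: for each eta, L_c(eta, .) is minimized
# at eta_hat = eta (so H_c(eta) = L_c(eta, eta), the binary entropy).
grid = np.linspace(0.01, 0.99, 99)
for eta in (0.2, 0.5, 0.8):
    risks = conditional_risk(log_loss, eta, grid)
    print(eta, grid[np.argmin(risks)])   # argmin is (approximately) eta itself
```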
Strongly Proper Losses
• Let $\lambda > 0$. Say a loss function $c : \{\pm 1\} \times [0, 1] \to \mathbb{R}_+$ is $\lambda$-strongly proper if
$$L_c(\eta, \hat{\eta}) - H_c(\eta) \ge \frac{\lambda}{2}\, (\eta - \hat{\eta})^2 \quad \text{for all } \eta, \hat{\eta} \in [0, 1]$$
• Theorem. Let $\lambda > 0$. A `regular' proper loss $c$ is $\lambda$-strongly proper if and only if $H_c$ is $\lambda$-strongly concave.
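A numeric check (my own sketch) that the log loss is 4-strongly proper: for the log loss, $L_c(\eta, \hat{\eta}) - H_c(\eta)$ is the binary KL divergence, and Pinsker's inequality gives $\mathrm{KL}(\eta \,\|\, \hat{\eta}) \ge 2(\eta - \hat{\eta})^2$, matching $\lambda = 4$:

```python
import numpy as np

def kl(eta, eta_hat):
    """Binary KL divergence = L_c(eta, eta_hat) - H_c(eta) for the log loss."""
    return (eta * np.log(eta / eta_hat)
            + (1 - eta) * np.log((1 - eta) / (1 - eta_hat)))

# Check the lambda-strong properness inequality with lambda = 4:
# L_c(eta, eta_hat) - H_c(eta) >= (lambda / 2) * (eta - eta_hat)^2
grid = np.linspace(0.01, 0.99, 99)
eta, eta_hat = np.meshgrid(grid, grid)
gap = kl(eta, eta_hat) - 2.0 * (eta - eta_hat) ** 2
print(gap.min() >= -1e-12)   # True: the inequality holds on the whole grid
```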
Road Map
• Problem Setup and Previous Results
• Proper and Strongly Proper Losses
• AUC Regret Bounds and Consistency
Regret Bound via Strongly Proper (Composite) Losses
• Theorem. Let $\lambda > 0$. Let $\ell : \{\pm 1\} \times \mathbb{R} \to \mathbb{R}_+$ be a $\lambda$-strongly proper composite loss. Then for any $f : \mathcal{X} \to \mathbb{R}$,
$$\mathrm{regret}_{\mathrm{AUC}}[f] \;\le\; \frac{1}{p(1 - p)} \sqrt{\frac{2}{\lambda}\, \mathrm{regret}_\ell[f]}, \quad \text{where } p = \mathbf{P}(y = +1)$$
• Crux of Proof: $\mathrm{regret}_{\mathrm{AUC}}[f] \le \frac{1}{p(1-p)}\, \mathbf{E}_x\big[|\eta(x) - \hat{\eta}(x)|\big]$, where $\hat{\eta} = \psi^{-1} \circ f$ for a suitable link function $\psi$; strong properness gives $\mathbf{E}_x\big[(\eta(x) - \hat{\eta}(x))^2\big] \le \frac{2}{\lambda}\, \mathrm{regret}_\ell[f]$, and Jensen's inequality combines the two.
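A numeric sanity check of the theorem (my own toy construction, reusing the finite distribution from the earlier sketch), with the logistic loss ($\lambda = 4$) and its canonical logit link:

```python
import numpy as np

mu  = np.array([0.5, 0.3, 0.2])       # assumed marginal over X = {0, 1, 2}
eta = np.array([0.2, 0.5, 0.9])       # eta(x) = P(y = +1 | x)
p   = (mu * eta).sum()                # p = P(y = +1)

def auc(scores):
    w = (mu * eta)[:, None] * (mu * (1 - eta))[None, :]   # P(x pos, x' neg)
    ordered = ((scores[:, None] > scores[None, :])
               + 0.5 * (scores[:, None] == scores[None, :]))
    return (w * ordered).sum() / w.sum()

def logistic_risk(scores):
    # er_log[f] = E_x[ eta(x) ln(1 + e^{-f(x)}) + (1 - eta(x)) ln(1 + e^{f(x)}) ]
    return (mu * (eta * np.log1p(np.exp(-scores))
                  + (1 - eta) * np.log1p(np.exp(scores)))).sum()

f      = np.array([0.1, 0.8, 0.6])    # an arbitrary (suboptimal) scorer
f_star = np.log(eta / (1 - eta))      # logit link: the pointwise optimal scores

auc_regret = auc(eta) - auc(f)        # ranking by eta attains AUC*
log_regret = logistic_risk(f) - logistic_risk(f_star)
bound = np.sqrt((2.0 / 4.0) * log_regret) / (p * (1 - p))  # lambda = 4
print(auc_regret, bound, auc_regret <= bound)              # bound holds (loosely)
```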
Examples of Strongly Proper Composite Losses
• Exponential loss $e^{-y\hat{y}}$: $\lambda = 4$, with link $\psi(\eta) = \frac{1}{2}\ln\frac{\eta}{1-\eta}$
• Logistic loss $\ln(1 + e^{-y\hat{y}})$: $\lambda = 4$, with link $\psi(\eta) = \ln\frac{\eta}{1-\eta}$
• Squared loss $(1 - y\hat{y})^2$: $\lambda = 8$, with link $\psi(\eta) = 2\eta - 1$
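A worked check of these $\lambda$ values (my addition), using the theorem above: $\lambda$ is the largest constant with $H''(\eta) \le -\lambda$ on $(0, 1)$:

```latex
% Conditional Bayes risks H(eta) and their second derivatives:
\begin{align*}
\text{exponential:}\quad & H(\eta) = 2\sqrt{\eta(1-\eta)}, &
  H''(\eta) &= -\tfrac{1}{2}\big(\eta(1-\eta)\big)^{-3/2} \le -4
  \;\Rightarrow\; \lambda = 4, \\
\text{logistic:}\quad & H(\eta) = -\eta\ln\eta - (1-\eta)\ln(1-\eta), &
  H''(\eta) &= -\tfrac{1}{\eta(1-\eta)} \le -4
  \;\Rightarrow\; \lambda = 4, \\
\text{squared:}\quad & H(\eta) = 4\eta(1-\eta), &
  H''(\eta) &= -8 \;\Rightarrow\; \lambda = 8.
\end{align*}
```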
Tighter Bound under Low-Noise Conditions [via result of Clemencon & Robbiano, 2011]
• Theorem. Let $\lambda > 0$. Let $\ell : \{\pm 1\} \times \mathbb{R} \to \mathbb{R}_+$ be a $\lambda$-strongly proper composite loss. Suppose $D$ satisfies a low-noise condition such that pairs $x, x'$ with $\eta(x)$ close to $\eta(x')$ are suitably rare. Then for any $f : \mathcal{X} \to \mathbb{R}$, the AUC regret of $f$ is bounded by a faster-than-square-root power of $\mathrm{regret}_\ell[f]$.
[See paper for more complete definitions/statements]
Any algorithm minimizing a standard, non-pairwise, strongly proper composite loss (assuming an appropriate function class and regularization) is AUC-consistent!
• Standard Logistic Regression
• Standard AdaBoost
• Standard Least Squares Regression
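As an end-to-end illustration of this punchline (my own toy experiment using scikit-learn; the data and settings are arbitrary): plain logistic regression, trained with its standard non-pairwise loss, produces scores with near-optimal AUC here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy data: class-conditional Gaussians in 2D, with imbalanced classes
n_pos, n_neg = 200, 800
X = np.vstack([rng.normal(loc=[1.0, 1.0], size=(n_pos, 2)),
               rng.normal(loc=[-1.0, -1.0], size=(n_neg, 2))])
y = np.array([1] * n_pos + [0] * n_neg)

# Standard (non-pairwise) logistic regression: its loss is a strongly
# proper composite loss, hence AUC-consistent per the theorem above
clf = LogisticRegression().fit(X, y)
scores = clf.decision_function(X)     # the learned scoring function f
print(roc_auc_score(y, scores))       # close to the optimal AUC here
```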
Strongly proper losses may also be useful in other contexts:
• A.K. Menon, H. Narasimhan, S. Agarwal, S. Chawla. On the statistical consistency of algorithms for binary classification under class imbalance. ICML 2013.