Privacy Aware Learning
John C. Duchi, Michael I. Jordan, Martin J. Wainwright
University of California, Berkeley
NIPS 2012
An example
Setting: We want to construct a good image classifier, say, of military personnel
Petraeus in a market
Source: Department of Defense
Petraeus thinking
Source: CBS News
Pictures Petraeus might not want shared
Maybe he shouldn’t share that?
Source: Rolling Stone
But I want a good image classifier
What should we do?
Start to develop theory of learning from private data
Instead of this
Learn from
Outline
I. Problem statement and motivating examples
II. The privacy game
III. Statistical estimation tradeoffs
IV. Conclusions and future work
Tradeoffs
What are the tradeoffs between maintaining privacy and statistical estimation?
Fine-grained tradeoffs between privacy and utility
Setting
- Get samples X1, . . . , Xn
- Have a parameter θ we want to infer
- Measure performance of parameter θ with loss ℓ(θ; X)
Example: breast cancer prediction
- Data in (x, y) pairs (regressor x ∈ {−1, 1}^d, label y ∈ {±1}):

      x = (   1,      −1,       1,        −1,      · · · ,    1    )
            Clump  Uniform  Adhesion  Chromatin    · · ·   Mitoses

      y = +1 if cancerous, −1 if not

- Goal: Find θ so that sign(θ⊤x) = y
- Loss (sketched in code below):

      ℓ(θ; {x, y}) = [1 − y θ⊤x]_+
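A minimal runnable sketch of this decision rule and hinge loss; the toy feature vector and parameter values below are illustrative, not from the talk:

```python
import numpy as np

def predict(theta, x):
    """Predicted label sign(theta^T x), mapped to {-1, +1}."""
    return 1 if theta @ x >= 0 else -1

def hinge_loss(theta, x, y):
    """Hinge loss l(theta; {x, y}) = [1 - y * theta^T x]_+ for labels y in {-1, +1}."""
    return max(0.0, 1.0 - y * float(theta @ x))

# Toy example with d = 5 binary features and an arbitrary parameter vector.
x = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
y = 1
theta = np.array([0.5, -0.3, 0.2, -0.1, 0.4])
print(predict(theta, x), hinge_loss(theta, x, y))   # -> 1 0.0
```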
Setting
[Diagram: samples X1, X2, X3, . . . , Xn are fed to a method M, which outputs the estimate θ̂]
Formal setting
Goal: minimize a risk R measuring performance of a parameter θ:

    minimize R(θ) = E[ℓ(θ; X)] subject to θ ∈ Θ

using samples X1, . . . , Xn.
Question: Can we find θ̂ so that R(θ̂) − R(θ*) is small without learning about X1, . . . , Xn?
From, but not about.
Formalizing privacy
Prior work: Lots (Chaudhuri and collaborators, Dwork et al., Wasserman and Zhou)
Local Privacy: Changing privacy barrier (Evfimievski et al. 2003, Warner 1965)
[Diagram: each sample Xi is passed through a channel Q(Z | X) to produce a privatized Zi; the method M sees only the Zi and outputs θ̂]
How do we get privacy? Example: Classification
- Data pairs (x, y) with x ∈ {−1, 1}^d, label y ∈ {±1}:

      x = (   1,      −1,       1,         1,      · · · ,   −1    )
            Clump  Uniform  Adhesion  Chromatin    · · ·   Mitoses

      y = +1 if cancerous, −1 if not

Idea: Add independent random noise W to the coordinates of x: Zi = Xi + W
Problem: This is highly suboptimal; the dimension dependence blows up
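A minimal sketch of this naive additive-noise masking, assuming i.i.d. Laplace noise; the noise family and scale are illustrative assumptions, used only to show the idea the slide criticizes:

```python
import numpy as np

def naive_mask(x, scale, rng=None):
    """Release Z = X + W with i.i.d. Laplace noise added to every coordinate.
    This is the scheme the slide calls highly suboptimal: hiding a d-dimensional
    x this way forces the total noise to grow with the dimension d."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.laplace(scale=scale, size=np.shape(x))

# Usage: mask a +/-1 feature vector.
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=8)
z = naive_mask(x, scale=2.0, rng=rng)
```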
Communication model
Local Privacy: Communication model to study minimization of R(θ) = E[ℓ(θ; X)]
- Communicate ∇ℓ(θ; Xi)
  - We want to minimize, and ∇ℓ is sufficient
  - Use stochastic optimization techniques with ∇ℓ(θ; Xi)
- Really communicate Zi with the property E_Q[Zi | θ, Xi] = ∇ℓ(θ; Xi) (sketch below)
[Diagram: the gradient ∇ℓ(θ; Xi) is passed through a channel Q(Z | X, θ); the method M sees only Zi]
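A minimal sketch of the resulting learner, assuming the hinge-loss example from earlier and a simple ±M sign channel for the privatized gradients; the channel, step sizes, and names here are illustrative assumptions rather than the exact scheme from the paper:

```python
import numpy as np

def hinge_subgrad(theta, x, y):
    """A subgradient of l(theta; {x, y}) = [1 - y * theta^T x]_+."""
    return -y * x if y * float(theta @ x) < 1.0 else np.zeros_like(theta)

def privatize(g, M, rng):
    """Release Z with independent coordinates in {-M, +M} and E[Z | theta, x] = g,
    valid whenever every |g_i| <= M."""
    p_plus = 0.5 + g / (2.0 * M)
    return np.where(rng.random(g.shape) < p_plus, M, -M).astype(float)

def private_sgd(data, M=2.0, passes=5, seed=0):
    """Stochastic subgradient descent in which the learner only ever sees the
    privatized (but unbiased) gradient reports Z_i, never the raw data."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(data[0][0])
    t = 0
    for _ in range(passes):
        for x, y in data:
            t += 1
            g = hinge_subgrad(theta, x, y)                # computed by the data holder
            z = privatize(g, M, rng)                      # the only quantity communicated
            theta -= z / (M * np.sqrt(len(theta) * t))    # conservative 1/sqrt(t) steps
    return theta
```

With x ∈ {−1, 1}^d and y ∈ {±1}, the hinge subgradient has coordinates in {−1, 0, 1}, so any M ≥ 1 keeps the channel probabilities in [0, 1].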
Main Contributions
Contribution 1: Optimal types of noise to guarantee privacy
Contribution 2: Sharp upper and lower bounds on convergence rates as a function of privacy
Privacy Saddle Points
Optimal Local Privacy: Maximize privacy of Q subject to E_Q[Z | θ, X] = ∇ℓ(θ; X)
[Diagram: Xi → ∇ℓ(θ; Xi) → channel Q(Z | X, θ) → Zi → method M]
Privacy saddle points
Goal: Maximize privacy of Z for X subject to θ̂ being learnable (some constraints on Z)
Privacy metric: mutual information

    sup_{P ∈ P} I(P; Q)

a worst-case information measure
[Diagram: P generates Xi, the channel Q produces Zi]
Strategy: We provide a general solution to

    minimize_Q  sup_{P ∈ P} I(P; Q)

over distributions Q with larger support than P
Mutual information saddle point example
Setting: Data x ∈ {−1, 1}^d; allow z to satisfy ‖z‖∞ ≤ M.
Optimal distribution Q given X:
- Independent coordinates zi ∈ {−M, M}
- Distribution Q*(Zi = M | X) = 1/2 + Xi/(2M)  (sampler sketched below)
[Figure: X sits inside the box of side 2M; each coordinate of Z is pushed to a face of the box, e.g. the second coordinate is +M with probability q = 1/2 + x2/(2M) and −M with probability q = 1/2 − x2/(2M)]
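A quick numerical check of this channel, assuming x ∈ {−1, 1}^d and M ≥ 1; a hedged sketch that just implements the displayed formula and empirically confirms the unbiasedness E[Zi | X] = Xi that the learner relies on:

```python
import numpy as np

def sample_qstar(x, M, rng):
    """Draw Z ~ Q*(. | x): independent coordinates Z_i in {-M, +M} with
    P(Z_i = +M | x) = 1/2 + x_i / (2M), which requires |x_i| <= M."""
    p_plus = 0.5 + x / (2.0 * M)
    return np.where(rng.random(x.shape) < p_plus, M, -M).astype(float)

rng = np.random.default_rng(1)
x = rng.choice([-1.0, 1.0], size=4)
M = 3.0
z_mean = np.mean([sample_qstar(x, M, rng) for _ in range(50_000)], axis=0)
print(x)        # the original coordinates
print(z_mean)   # empirical mean of the releases; should be close to x
```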
Example of optimal perturbation
[Figure sequence: the same image released through the optimal channel at increasingly strict privacy levels]
- 1 bit per bit
- .13 bits per bit (2× slower convergence)
- .033 bits per bit (4× slower convergence)
- .0081 bits per bit (8× slower convergence)
- .002 bits per bit (16× slower convergence)
- .0005 bits per bit (32× slower convergence)
- .00013 bits per bit (64× slower convergence)
Statistical estimation and convergence rates
Exhibiting tradeoffs
Goal: Understand the tradeoff between the mutual information bound

    I* := min_Q sup_P I(X; Z)

and the number of samples n for risk minimization problems.
Reminder: θ̂ is our estimate based on X1, . . . , Xn, and R(θ) := E[ℓ(θ; X)].
Theorem: The effective sample size for a d-dimensional problem is made worse:

    n ↦ nI*/d

- Lower bound holds for all methods
- Upper bound achieved by stochastic approximation
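To see the scale of this degradation, a quick calculation (the numbers are illustrative, not from the talk): with n = 10^6 samples in dimension d = 100 and privacy level I* = 0.1, the effective sample size is nI*/d = 10^6 · 0.1 / 100 = 1000, so the private procedure behaves as if it had only a thousand samples.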
Exhibiting tradeoffs
Have the mutual information I* := min_Q sup_P I(X; Z)
Theorem: Optimality gap for a d-dimensional problem. Without the privacy constraint, the familiar rate is

    Ω(1) · 1/√n ≤ E[R(θ̂)] − R(θ*) ≤ O(1) · 1/√n

while under the privacy constraint it becomes

    Ω(1) · √d/√(nI*) ≤ E[R(θ̂)] − R(θ*) ≤ O(1) · √d/√(nI*)

- Lower bound holds for all methods
- Upper bound achieved by stochastic approximation
Experimental example: breast cancer prediction
- Regressors x are markers for breast cancer, labels y are presence/absence of tumor
- Measure predictive performance: count how often sign(θ⊤xi) = yi
[Plot: error rate (roughly 0.05–0.25) versus maximum bits communicated (log scale, about 10^−1 to 10^1), comparing ℓ∞ masking and ℓ1 masking]
Conclusions and future work
1. Have given sharp rates of convergence when providers of data play the "privacy game." Extensions to differential privacy as well.
2. In a soon-to-appear arXiv paper, we generalize this: no more privacy game, and essentially all statistical estimators.
3. Is it possible to release a perturbed version of the data X1, . . . , Xn?
4. What if all we care about is protecting some function ϕ(Xi)?

Thanks!
Exhibiting tradeoffs
Have the mutual information I* := min_Q sup_P I(X; Z)
Theorem: There are constants a, b with b/a = O(1), dependent only on the learning problem, such that

    a · √d/√(nI*) ≤ E[R(θ̂)] − R(θ*) ≤ b · √d/√(nI*)

- Lower bound holds for all methods
- Upper bound achieved by stochastic approximation
Mutual information saddle points
Goal: A channel Q* (where Z ∼ Q*(· | X)) so that

    min_Q max_{P,ℓ} I(P, Q) ≥ max_{P,ℓ} I(P, Q*)

[Figure: the set C (containing X) sits inside the larger set D (containing Z)]
Theorem: Let ∇ℓ(θ; X) ∈ C and Z ∈ D. If
- P* is uniform on the extreme points of C
- Q* is supported on the extreme points of D and maximizes the entropy of Z given X
then

    min_Q max_{P,ℓ} I(X; Z) = max_{P,ℓ} min_Q I(X; Z) = I(X*; Z*).

Also, Q* is unique.
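As a sanity check of the saddle quantity I(X*; Z*), here is a hedged one-dimensional calculation, assuming C = [−1, 1] and D = [−M, M] so that the extreme points are {−1, +1} and {−M, +M}; the values are illustrative and not meant to reproduce the bits-per-bit figures quoted earlier:

```python
import numpy as np

def binary_entropy(p):
    """Entropy of a Bernoulli(p) variable, in bits."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def saddle_value_1d(M):
    """I(X*; Z*) in bits for d = 1: X* uniform on {-1, +1}, and Z drawn from the
    unbiased channel P(Z = +M | X = x) = 1/2 + x/(2M).  The marginal of Z is then
    uniform on {-M, +M}, so I(X; Z) = H(Z) - H(Z | X) = 1 - H_b(1/2 + 1/(2M))."""
    return 1.0 - binary_entropy(0.5 + 1.0 / (2.0 * M))

for M in [1.0, 2.0, 4.0, 8.0]:
    # Larger M means a noisier release and less information about X per coordinate.
    print(M, round(saddle_value_1d(M), 4))
```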
Privacy intuition
[Figure: a data point x and possible privatized releases Z and Z′]