Privacy Aware Learning
John C. Duchi, Michael I. Jordan, Martin J. Wainwright
University of California, Berkeley

NIPS 2012



An example

Setting: We want to construct a good image classifier, say, of military personnel


Petraeus in a market

Source: Department of Defense


Petraeus thinking

Source: CBS News


Pictures Petraeus might not want shared



Maybe he shouldn’t share that?

Source: Rolling Stone


But I want a good image classifier

What should we do?


Start to develop theory of learning from private data


Instead of this [image: the original data]

Learn from [image: a perturbed version of the data]

Outline

I.   Problem statement and motivating examples
II.  The privacy game
III. Statistical estimation tradeoffs
IV.  Conclusions and future work


Tradeoffs

What are the tradeoffs between maintaining privacy and statistical estimation?


Fine-grained tradeoffs between privacy and utility

Setting

- Get samples X1, . . . , Xn
- Have a parameter θ we want to infer
- Measure performance of the parameter θ with the loss ℓ(θ; X)


Example: breast cancer prediction

- Data in (x, y) pairs: regressor x ∈ {−1, 1}^d, label y ∈ {±1}

      x = (   1        −1         1          −1      · · ·      1    )
            Clump    Uniform   Adhesion   Chromatin  · · ·   Mitoses

      y = +1 if cancerous, −1 if not

- Goal: Find θ so that sign(θᵀx) = y

- Loss: ℓ(θ; {x, y}) = [1 − y θᵀx]₊
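To make the loss concrete, here is a minimal NumPy sketch of the hinge loss above and one of its subgradients; the function names are illustrative, not from the paper.

```python
import numpy as np

def hinge_loss(theta, x, y):
    """Hinge loss from the slide: l(theta; {x, y}) = [1 - y * theta^T x]_+."""
    return max(0.0, 1.0 - y * float(np.dot(theta, x)))

def hinge_subgradient(theta, x, y):
    """A subgradient of the hinge loss at theta: -y*x when the margin is
    violated (y * theta^T x < 1), and 0 otherwise."""
    if y * float(np.dot(theta, x)) < 1.0:
        return -y * x
    return np.zeros_like(x)

# Tiny usage example with d = 3 features in {-1, 1}.
x = np.array([1.0, -1.0, 1.0])
y = 1
theta = np.zeros(3)
print(hinge_loss(theta, x, y))         # 1.0: the zero vector violates the margin
print(hinge_subgradient(theta, x, y))  # [-1.  1. -1.]
```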

Setting

[Diagram: a central learner M collects samples X1, X2, X3, . . . , Xn and outputs an estimate θ̂]

Formal setting

Goal: minimize a risk R measuring performance of a parameter θ:


      minimize  R(θ) = E[ℓ(θ; X)]   subject to  θ ∈ Θ

using samples X1, . . . , Xn.

Question: Can we find θ̂ so that R(θ̂) − R(θ*) is small, without learning about X1, . . . , Xn?

From, but not about.

Formalizing privacy

Prior work: Lots (Chaudhuri and collaborators; Dwork et al.; Wasserman and Zhou)

Local privacy: changing the privacy barrier (Evfimievski et al. 2003; Warner 1965)

[Diagram: the privacy barrier moves from the learner M to the data providers; each Xi passes through a channel Q(Z | X), and only Zi is sent to M, which outputs θ̂]

How do we get privacy? Example: Classification

- Data pairs (x, y) with x ∈ {−1, 1}^d, label y ∈ {±1}

      x = (   1        −1         1          1      · · ·     −1    )
            Clump    Uniform   Adhesion   Chromatin · · ·   Mitoses

      y = +1 if cancerous, −1 if not

Idea: Add independent random noise W to the coordinates of x:  Zi = Xi + W

Problem: This is highly suboptimal; the dimension dependence blows up.
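As an illustration only, a minimal sketch of this naive additive-noise release; the Laplace noise and its scale are assumptions for the example, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_additive(x, scale):
    """Release z = x + w with i.i.d. Laplace(scale) noise on every coordinate.
    The release is unbiased (E[z | x] = x), but the total injected noise grows
    with the dimension d -- the dimension dependence the slide warns about."""
    w = rng.laplace(loc=0.0, scale=scale, size=x.shape)
    return x + w

# One record with d = 5 binary features in {-1, 1}.
x = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
z = perturb_additive(x, scale=2.0)
print(z)  # the privatized record actually released
```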

Communication model

Local privacy: a communication model to study minimization of R(θ) = E[ℓ(θ; X)]

- Communicate ∇ℓ(θ; Xi)
  - We want to minimize, and ∇ℓ is sufficient
  - Use stochastic optimization techniques with ∇ℓ(θ; Xi)
- Really communicate Zi with the property E_Q[Zi | θ, Xi] = ∇ℓ(θ; Xi)

[Diagram: the learner M sends θ to a provider; the provider applies the channel Q(Z | X, θ) to ∇ℓ(θ; Xi) and returns Zi]
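A minimal sketch of this communication pattern, assuming a plain stochastic gradient learner and a toy channel that adds zero-mean Laplace noise to the subgradient (so E_Q[Zi | θ, Xi] = ∇ℓ(θ; Xi) holds); it is not the paper's optimal channel, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def hinge_subgradient(theta, x, y):
    """A subgradient of l(theta; {x, y}) = [1 - y * theta^T x]_+."""
    return -y * x if y * float(np.dot(theta, x)) < 1.0 else np.zeros_like(x)

def channel(grad, noise_scale):
    """Toy channel Q(Z | X, theta): release Z = grad + zero-mean Laplace noise.
    Unbiased, E[Z | theta, X] = grad, as the communication model requires."""
    return grad + rng.laplace(0.0, noise_scale, size=grad.shape)

def private_sgd(data, d, steps=2000, step_size=0.5, noise_scale=1.0):
    """The learner only ever sees the privatized Zi, never the raw Xi."""
    theta = np.zeros(d)
    for t in range(steps):
        x, y = data[rng.integers(len(data))]          # learner queries one provider
        z = channel(hinge_subgradient(theta, x, y), noise_scale)
        theta -= step_size / np.sqrt(t + 1.0) * z     # stochastic approximation update
    return theta

# Toy usage: labels generated by a known theta_star on {-1, 1}^d features.
d = 4
theta_star = np.ones(d)
xs = np.sign(rng.normal(size=(100, d)))
data = [(x, int(np.sign(x @ theta_star)) or 1) for x in xs]
theta_hat = private_sgd(data, d)
print(theta_hat)
```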

Main Contributions

Contribution 1: Optimal types of noise to guarantee privacy


Contribution 2: Sharp upper and lower bounds on convergence rates as a function of privacy


Privacy saddle points

Optimal local privacy: Maximize the privacy of Q subject to E_Q[Z | θ, X] = ∇ℓ(θ; X)

[Diagram: the channel Q(Z | X, θ) maps ∇ℓ(θ; Xi) to the released Zi]

Goal: Maximize the privacy of Z for X subject to θ̂ being learnable (some constraints on Z)

Privacy metric: mutual information, worst case over distributions

      sup_{P ∈ P} I(P; Q)

[Diagram: Xi ~ P passes through the channel Q to produce Zi]

Strategy: We provide a general solution to

      min_Q  sup_{P ∈ P} I(P; Q)

over distributions Q with larger support than P.

Mutual information saddle point example

Setting: Data x ∈ {−1, 1}^d; allow z to satisfy ‖z‖∞ ≤ M.

Optimal distribution Q* given X:

- Independent coordinates Zi ∈ {−M, M}
- Distribution

      Q*(Zi = M | X) = 1/2 + Xi/(2M)

[Figure: X at a corner of the hypercube {−1, 1}^d; Z is released at a corner of the cube of side 2M, with coordinate probabilities q = 1/2 ± x_i/(2M)]
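A minimal sketch of sampling from the channel above and empirically checking the unbiasedness E[Z | X] = X; it assumes M ≥ 1 so the stated probability lies in [0, 1], and the function name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def optimal_channel(x, M):
    """Given x in {-1, 1}^d, release z with independent coordinates z_i in {-M, M}
    and P(z_i = M | x) = 1/2 + x_i / (2M), as on the slide.
    Then E[z_i | x] = M*(1/2 + x_i/(2M)) - M*(1/2 - x_i/(2M)) = x_i."""
    p = 0.5 + x / (2.0 * M)
    signs = np.where(rng.random(size=x.shape) < p, 1.0, -1.0)
    return M * signs

x = np.array([1.0, -1.0, 1.0, 1.0])
M = 5.0
samples = np.stack([optimal_channel(x, M) for _ in range(20000)])
print(samples.mean(axis=0))  # approximately [ 1., -1.,  1.,  1.]: the mean recovers x
```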

Example of optimal perturbation

[Image sequence: the same photo released through the optimal channel at decreasing information levels]

- 1 bit per bit
- .13 bits per bit (2× slower convergence)
- .033 bits per bit (4× slower convergence)
- .0081 bits per bit (8× slower convergence)
- .002 bits per bit (16× slower convergence)
- .0005 bits per bit (32× slower convergence)
- .00013 bits per bit (64× slower convergence)

Statistical estimation and convergence rates


Exhibiting tradeoffs

Goal: Understand the tradeoff between the mutual information bound

      I* := min_Q sup_P I(X; Z)

and the number of samples n for risk minimization problems.

Reminder: θ̂ is our estimate based on X1, . . . , Xn, and R(θ) := E[ℓ(θ; X)].

Theorem: The effective sample size for a d-dimensional problem is made worse:

      n  ↦  n I* / d

- Lower bound holds for all methods
- Upper bound achieved by stochastic approximation
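A short worked step connecting the two statements, assuming (as on the next slide) a baseline rate of order 1/√n: substituting the effective sample size n I*/d into it gives the private rate.

```latex
% Effective sample size: n -> n I^* / d.  Plugging it into a 1/sqrt(n) rate:
\[
  \frac{1}{\sqrt{\,n I^{*} / d\,}}
  \;=\;
  \frac{\sqrt{d}}{\sqrt{n I^{*}}},
\]
% which is the order of the optimality gap
% \mathbb{E}[R(\hat\theta)] - R(\theta^\star) \asymp \sqrt{d}/\sqrt{n I^{*}}.
```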

Exhibiting tradeoffs

Have mutual information I* := min_Q sup_P I(X; Z)

Theorem: Optimality gap for a d-dimensional problem (first stated without the privacy constraint, then with it):

      Ω(1) · 1/√n  ≤  E[R(θ̂)] − R(θ*)  ≤  O(1) · 1/√n

      Ω(1) · √d/√(n I*)  ≤  E[R(θ̂)] − R(θ*)  ≤  O(1) · √d/√(n I*)

- Lower bound holds for all methods
- Upper bound achieved by stochastic approximation

Experimental example: breast cancer prediction

- Regressors x are markers for breast cancer; labels y are presence/absence of tumor
- Measure predictive performance: count how often sign(θᵀxi) = yi

[Plot: error rate versus maximum bits communicated (log scale, roughly 10^−1 to 10^1), for ℓ∞ masking and ℓ1 masking; error rates range from about 0.05 to 0.25]

Conclusions and future work

1. Have given sharp rates of convergence when providers of data play the "privacy game." Extensions to differential privacy as well.

2. In a paper soon to appear on arXiv, we generalize this: no more privacy game, and essentially all statistical estimators.

3. Is it possible to release a perturbed version of the data X1, . . . , Xn?

4. What if all we care about is protecting some function ϕ(Xi)?

Thanks!


Exhibiting tradeoffs

Have mutual information I* := min_Q sup_P I(X; Z)

Theorem: There are constants a, b with b/a = O(1), dependent only on the learning problem, such that

      a · √d/√(n I*)  ≤  E[R(θ̂)] − R(θ*)  ≤  b · √d/√(n I*)

- Lower bound holds for all methods
- Upper bound achieved by stochastic approximation

Mutual information saddle points

Goal: A channel Q* (where Z ~ Q*(· | X)) so that

      min_Q max_{P,ℓ} I(P, Q)  ≥  max_{P,ℓ} I(P, Q*)

[Diagram: the set C of possible gradients X, contained in the larger release set D for Z]

Theorem: Let ∇ℓ(θ; X) ∈ C and Z ∈ D. If

- P* is uniform on the extreme points of C
- Q* is supported on the extreme points of D and maximizes the entropy of Z given X

then Q* is unique and

      min_Q max_{P,ℓ} I(X; Z)  =  max_{P,ℓ} min_Q I(X; Z)  =  I(X*; Z*).

Privacy intuition

[Diagram: a data point x and two possible released values Z and Z′, shown for two different perturbation regions]