Optimal Private Halfspace Counting via Discrepancy

S. Muthukrishnan and Aleksandar Nikolov

Rutgers University


Range Counting

Private Range Counting

Public Input: a ground set $P \subseteq \mathbb{R}^d$; a range space, i.e. a collection of sets $\mathcal{R} \subseteq 2^P$ induced by some natural geometric sets.
Private Input: an integer weight $x_p$ for each $p \in P$.

Goal: for every range $R \in \mathcal{R}$, privately approximate
$$R(x) = \sum_{p \in R} x_p.$$

Accuracy: the mean squared error of an algorithm $M$ is
$$\frac{1}{|\mathcal{R}|} \sum_{R \in \mathcal{R}} \big(R(x) - M(R, x)\big)^2.$$



Halfspace Counting

Each $R \in \mathcal{R}$ is the set of points of $P$ contained in some halfspace in $\mathbb{R}^d$. Query: what is the total weight of the points of $P$ in the halfspace $R$?

Fundamental in computational geometry. Other range queries can be expressed as halfspace queries by "lifting" them to a higher dimension.

[Figure: eight weighted points $x_1, \dots, x_8$ in the plane; a halfspace $R$ contains $x_2, x_3, x_5, x_7$, so $R(x) = x_2 + x_3 + x_5 + x_7$.]

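To make the query concrete, here is a minimal Python sketch (the points, weights, and halfspace below are made-up toy values, not data from the talk): a halfspace is specified by a normal vector a and a threshold t, and the range consists of the points p with <a, p> >= t.

import numpy as np

# Toy instance: 8 points in the plane with private integer weights x_p.
P = np.array([[0.1, 0.9], [0.4, 0.8], [0.6, 0.7], [0.2, 0.3],
              [0.8, 0.6], [0.3, 0.1], [0.9, 0.4], [0.5, 0.2]])
x = np.array([3, 1, 4, 1, 5, 9, 2, 6])

def halfspace_count(P, x, a, t):
    """Total weight of the points p in P with <a, p> >= t."""
    in_range = P @ a >= t
    return x[in_range].sum()

# One halfspace query: normal vector a, threshold t.
print(halfspace_count(P, x, a=np.array([1.0, 1.0]), t=1.0))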


Private Linear Queries

More general algebraic problem:

Public Input: a query matrix $A \in \mathbb{R}^{m \times n}$.

In range counting: each row of $A$ is the indicator of a range.

Private Input: a vector $x \in \mathbb{Z}^n$.

In range counting: the private point weights.

Goal: an algorithm $M$ that approximates $Ax$ and satisfies a privacy guarantee ($(\varepsilon, \delta)$-differential privacy).

Accuracy: the mean squared error is
$$\frac{1}{m} \|Ax - M(A, x)\|_2^2.$$
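As a hedged illustration of this reduction (the helper names below are my own, not from the paper): each row of the query matrix is the 0-1 indicator of one range, every answer is read off the single product Ax, and accuracy is the averaged squared deviation of the mechanism's answers.

import numpy as np

def query_matrix(P, halfspaces):
    """Row r of A is the 0-1 indicator of which points lie in halfspace r."""
    return np.array([(P @ a >= t).astype(float) for a, t in halfspaces])

def mean_squared_error(A, x, answers):
    """(1/m) * ||Ax - M(A, x)||_2^2, where `answers` is the vector M(A, x)."""
    return float(np.mean((A @ x - answers) ** 2))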


Differential Privacy

Definition. An algorithm $M$ with input domain $\mathbb{Z}^n$ and output range $Y$ is $(\varepsilon, \delta)$-differentially private if for every $n$, every $x, x'$ with $\|x - x'\|_1 \le 1$, and every measurable $S \subseteq Y$,
$$\Pr[M(x) \in S] \le e^{\varepsilon} \Pr[M(x') \in S] + \delta.$$



What is Known about Halfspace Counting?

Lower Bounds: $\Omega(n)$ squared error is necessary for arbitrary 0-1 matrices $A$ when $m > n$ [DN03]. This does not apply to halfspace counting! No superconstant lower bound is known.

Upper Bounds: randomized response gives $O(n \log m)$. For halfspaces $m = O(n^d)$, so $O(nd \log n)$ error is sufficient.

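For intuition about the baseline, here is a minimal sketch of randomized response for 0-1 weights (my own toy rendering, with a flipping probability tied to eps; it is not necessarily the exact baseline analyzed in the talk): each weight is flipped independently and the reports are debiased before answering.

import numpy as np

def randomized_response(x, eps, rng=np.random.default_rng(0)):
    """Keep each 0/1 entry w.p. p = e^eps / (1 + e^eps), flip it otherwise,
    then rescale so that each entry's estimate is unbiased."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    keep = rng.random(len(x)) < p
    reports = np.where(keep, x, 1 - x)
    return (reports - (1 - p)) / (2 * p - 1)

def answer_all_ranges(A, x, eps):
    """Answer every range query from the privatized weights alone."""
    return A @ randomized_response(x, eps)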


Our Results

Lower bounds: Private halfspace counting in $\mathbb{R}^d$ requires $\Omega(n^{1-1/d})$ mean squared error. More generally: linear queries $A$ require noise lower bounded by the (hereditary) combinatorial discrepancy of $A$ (up to a log factor).

Upper bounds: Halfspace counting can be approximated privately with $O(n^{1-1/d})$ mean squared error. More generally: range counting for ranges with shatter function exponent $d$ can be approximated with the same error. The bounds also extend to worst-case error (up to polylog factors).

Both results use discrepancy theory.


Lower Bounds

Lower bound: Dinur-Nissim attack

Assume: there exists $M$ such that for any $x$, w.h.p. $\|Ax - M(A, x)\|_2 \le E$.

Adversary's Goal: given the output of $M(A, x)$, compute $x'$ with $\|x - x'\|_1 \ll n$. Then $M$ is not private.

Procedure: output any $x'$ s.t. $\|Ax' - M(A, x)\|_2 \le E$ (succeeds w.p. $1 - \beta$).

By the triangle inequality,
$$\|Ax - Ax'\|_2 \le \|Ax' - M(A, x)\|_2 + \|Ax - M(A, x)\|_2 \le 2E.$$

Needed: $E$ such that $\|Ax' - Ax\|_2 \le 2E \Rightarrow \|x - x'\|_1 \ll n$.

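A hedged sketch of the attack step (I use a least-squares relaxation with rounding purely for illustration; the argument above only needs some $x'$ with small residual):

import numpy as np

def reconstruct(A, noisy_answers):
    """Find an x' whose answers are close to the mechanism's output.
    Any x' with ||Ax' - M(A, x)||_2 <= E works for the argument; here we
    take a least-squares solution and round it to a 0/1 vector, matching
    the binary setting x in {0, 1}^n."""
    x_relaxed, *_ = np.linalg.lstsq(A, noisy_answers, rcond=None)
    return np.clip(np.round(x_relaxed), 0, 1)

# By the triangle inequality ||Ax - Ax'||_2 <= 2E, so if every b with
# ||b||_1 >= alpha * n has ||Ab||_2 much larger than E, then x' must
# agree with x on most entries.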


Discrepancy connection

Discrepancy: the adversary can succeed when $E \ll \mathrm{disc}_\alpha(A)$, where
$$\mathrm{disc}_\alpha(A) = \min_{\substack{b \in \{0, \pm 1\}^n \\ \|b\|_1 \ge \alpha n}} \|Ab\|_2.$$

When $\alpha = 0$, this is trivially 0. When $\alpha = 1$, this is the classical combinatorial $\ell_2$ discrepancy. Can we connect $\mathrm{disc}_\alpha$ to $\mathrm{disc}_1$ when $\alpha \in (0, 1)$?

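For small examples, disc_alpha can be computed by brute force; the sketch below (exponential in n, purely illustrative) enumerates all colorings b in {0, +1, -1}^n.

import itertools
import numpy as np

def disc_alpha(A, alpha):
    """min ||Ab||_2 over b in {0, +1, -1}^n with ||b||_1 >= alpha * n."""
    n = A.shape[1]
    best = np.inf
    for b in itertools.product((-1, 0, 1), repeat=n):
        b = np.asarray(b)
        if np.abs(b).sum() >= alpha * n:
            best = min(best, np.linalg.norm(A @ b))
    return best

# alpha = 1 forces b in {+1, -1}^n, i.e. the classical l2 discrepancy;
# alpha = 0 admits b = 0 and the minimum is trivially 0.
print(disc_alpha(np.array([[1, 1, 0], [0, 1, 1]]), alpha=1.0))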


A More Robust Lower Bound

$$\mathrm{herdisc}_\alpha(A) = \max_{S \subseteq [n]} \mathrm{disc}_\alpha(A|_S)$$

Weaker success condition for the adversary: choose a subset $S$ of $[n]$ (based only on $A$) and then guess most of $x$ restricted to $S$:
this still implies a contradiction with $(\varepsilon, \delta)$-differential privacy
the adversary can succeed when $E \ll \mathrm{herdisc}_\alpha(A)$

$\mathrm{herdisc}_\alpha(A) \ge \mathrm{herdisc}_1(A) / O(\log n)$ (for constant $\alpha$)



Putting it together

Theorem (Main Lower Bound). No algorithm $M$ that satisfies
$$\forall x \in \{0, 1\}^n: \quad \Pr\big[\|Ax - M(A, x)\|_2 = o(\mathrm{herdisc}_1(A) / \log n)\big] \ge 1 - \beta$$
is $(\varepsilon, \delta)$-differentially private for $\varepsilon = O(1)$ and constants $\delta < 1$, $\beta < 1$.

Halfspace counting: the mean squared error for private halfspace queries is $\Omega(n^{1-1/d} / \log n)$. Using the hereditary structure of halfspace range spaces, we can show the mean squared error is $\Omega(n^{1-1/d})$.


Upper Bounds

Two Tools: Input and Output Perturbation

Input perturbation: compute $\tilde{x} = x + \mathrm{Lap}(1/\varepsilon)^n$ and output $A\tilde{x}$.
Output perturbation: output $Ax + \mathrm{Lap}(1/\varepsilon')^m$ for $\varepsilon'$ chosen to satisfy $(\varepsilon, \delta)$-differential privacy.

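A minimal sketch of the two tools, assuming numpy (the calibration of eps' in output perturbation to (eps, delta)-differential privacy is left as an explicit parameter rather than derived here):

import numpy as np

def input_perturbation(A, x, eps, rng=np.random.default_rng(0)):
    """Perturb the private weights once with Lap(1/eps) noise, then answer exactly."""
    x_tilde = x + rng.laplace(scale=1.0 / eps, size=len(x))
    return A @ x_tilde

def output_perturbation(A, x, eps_prime, rng=np.random.default_rng(0)):
    """Answer exactly, then add independent Lap(1/eps') noise to each of the m answers."""
    return A @ x + rng.laplace(scale=1.0 / eps_prime, size=A.shape[0])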


When Do the Tools Work?

For range counting:
input perturbation works well with small ranges (squared error linear in the size of the range)
output perturbation works well when each point belongs to few ranges (squared error linear in the maximum degree)

But for halfspaces, most ranges are large and most points belong to many ranges.

Solution from discrepancy theory: halfspace ranges admit a nice decomposition [Mat95]. (This works for range spaces with VC dimension $d$ and shatter function exponent $d$.)



Decomposition

Decompose $\mathcal{R}$ into a series of new range spaces $\{T_i\}_{i=1}^{\log n}$ such that approximating the counts for each $T_i$ gives the counts for $\mathcal{R}$.

$\mathcal{R}$ is decomposed into:
$T_i$ with many small sets ($i$ large): can use input perturbation
$T_i$ with few large sets ($i$ small): can use output perturbation

Do we achieve the right balance? No!

Values of $i$ for which the noise variance is $O(n^{1-1/d})$: output perturbation achieves this for $i \le i_1 = \frac{\log n}{d} - \frac{\log n}{d^2}$, while input perturbation achieves it for $i \ge i_0 = \frac{\log n}{d}$, leaving a gap $(i_1, i_0)$.


How to Make It Work

For $i \in (i_1, i_0)$: for any $T_i$, there are points $p$ that belong to a lot of sets and incur a large privacy loss, i.e., we need noise with variance $\Omega(n)$ to preserve their privacy.

But we control both the maximum set size and the number of sets in $T_i$!

Idea: use the average privacy loss (the privacy loss averaged over all $p$). The "average" $p$ requires only $O(n^{1-1/d})$ noise to preserve its privacy.

We find a set $X$ s.t.
the privacy of each $p \in X$ can be preserved by noise with variance $O(n^{1-1/d})$
$|X| \ge |P| / 2$.



A partial-coloring-style algorithm:
For $i \ge i_0$, use input perturbation to approximate the counts for $T_i$ w.r.t. $X$.
For $i < i_0$, add Laplace noise with variance $O(n^{1-1/d} \, 2^{(i_0 - i)(1 - d)})$ to approximate the counts for $T_i$ w.r.t. $X$.

This allows us to compute halfspace counts over $X$ with squared error $O(n^{1-1/d})$. Recurse on $P \setminus X$ (still a halfspace range space).


Conclusion

This work:
Optimal upper and lower bounds for private halfspace counting
A connection between discrepancy theory and noise lower bounds for differential privacy

Other results: a lower bound of $\Omega((\log n)^{d-1})$ for orthogonal range counting, tight up to the dependence on $d$.

Open question: does discrepancy always characterize the error needed to preserve privacy of linear queries?

Thank you!



References

[BLR08] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to non-interactive database privacy. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC '08), pages 609-618, 2008.

[DN03] Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '03), pages 202-210, 2003.

[DRV10] Cynthia Dwork, Guy N. Rothblum, and Salil Vadhan. Boosting and differential privacy. In Proc. 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 51-60, 2010.

[HR10] Moritz Hardt and Guy N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In Proc. 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 61-70, 2010.

[Mat95] Jiří Matoušek. Tight upper bounds for the discrepancy of half-spaces. Discrete and Computational Geometry, 13(1):593-601, 1995.
