Two Differentially Private Rating Collection Mechanisms for Recommender Systems

Zheng Wenjie


April 29, 2016

Abstract

We design two mechanisms for recommender systems to collect user ratings. One is a modified Laplace mechanism, and the other is a randomized response mechanism. We prove that both are differentially private and preserve the data utility.

1 Introduction

Recommender Systems (RS) [1] are systems that seek to recommend to users items they are likely to be interested in. Unlike search engines, the users do not need to type any keywords; the RS learns their interests automatically. For instance, if a user has just bought a digital camera, the RS will recommend some SD memory cards; if a user watches a lot of action movies, the RS may suggest other action movies. This is the typical behavior we observe universally in Netflix (movies), YouTube (videos), Google Play (apps), Facebook (friends), Amazon (goods) and other platforms today.

One may wonder how it works. Let us take Netflix as an example. Netflix allows every user to rate the movies they have watched. Based on these ratings, Netflix builds a profile for each user, and there are quite a few methods to predict the user's preference on the movies the user has not yet seen. Readers can learn more in Section 3; this is not the main topic of this article. The issue addressed in this article is whether user privacy is compromised by the rating collection mechanism, and what we can do to prevent it.

In 2006, Netflix published on the Internet 100 480 507 ratings that 480 189 users gave to 17 770 movies, in order to hold the Netflix Prize competition [2]. The data were anonymized. However, in 2007, two researchers from the University of Texas de-anonymized some of the Netflix data by matching the data set against movie ratings on the Internet Movie Database [3]. This raised serious privacy concerns, and in 2009 four Netflix users filed a lawsuit against Netflix. We see that the concern about privacy leaks is real, and that anonymization alone is not sufficient to prevent them.

Let us take a closer look. There were actually two privacy leaks in this procedure. First, the users leaked their ratings to the service provider, Netflix. Then, Netflix leaked the ratings to the public. Users protested loudly against the second leak, but they overlooked the fact that it was they themselves who leaked the ratings to Netflix in the first place. Usually, every legitimate company asks its users to sign a user agreement, which authorizes the company to collect user data and to use them for certain purposes. However, almost no user ever reads it, and in any case, users who do not agree cannot use the service.

Hence, the goal of this article is to minimize the privacy leak while still guaranteeing the functionality of the service provider. We achieve this goal by building differential privacy (DP) into the rating collection mechanism. The concept of DP is explained in detail in Section 2. The main idea is that the user ratings are transformed by the rating collection mechanism, so that from the output (transformed ratings) one cannot know for sure what the input (original ratings) was.

Of course, this kind of transformation must satisfy certain properties. After the transformation, the service provider can do whatever they want with the ratings without worrying about privacy leaks. They can analyze the data themselves, or subcontract the work to a third party by giving them data access. It would even be possible for Netflix to hold a second competition. One trivial transformation maps every rating to zero or to a pure random number. This absolutely prevents any privacy leak, but it also erases all information contained in the data. Therefore, when we build DP into the mechanism, we should be careful to preserve as much information as possible. We design two mechanisms: one is a modified Laplace mechanism, and the other is a randomized response mechanism (Section 2). We show that they preserve the utility of the ratings (Section 3).

Related work. [4] also tries to bring DP to RS, but their method differs from ours. Let $X_i$ be the original rating set of the $i$-th user, $S$ be some aggregate statistic of the ratings, $A$ be some data-analysis algorithm, and $f$ be some transform that guarantees DP. Their method can be summarized as $A(f(S(\otimes_{i=1}^n X_i)))$, while our method, with a slight abuse of notation, can be summarized as $A(S(\otimes_{i=1}^n f(X_i)))$. Note that we changed the position of the transform $f$. This modification has a significant advantage: in their method, $f$ must be adapted to each statistic $S$, and they can only use algorithms $A$ that rely on $S$. In our method, we can generalize to $A(\otimes_{i=1}^n f(X_i))$, which means that we can use more types of algorithms. Furthermore, as long as $f$ is "conjugate" (i.e., $f$ does not change the space in which $X_i$ lives), all previously successful algorithms can be seamlessly "transplanted". We illustrate in Section 3 that this transplantation is also seamless in its theoretical guarantees. On second thought, their method is neither privacy-preserving nor meaningful: at the moment the users transfer their ratings to the service provider so that it can compute the statistic $S$, user privacy has already leaked to the service provider. The service provider then sends a differentially private version of the recommendations back to the user. But why would a user bother to protect his privacy against himself?

2 Mechanisms

In this section, we first introduce the concept of differential privacy. Then we define the modified Laplace mechanism and the randomized response mechanism. Throughout this section, we consider the rating vector of a single user, $x = (x_1, x_2, \ldots, x_n)$, where $n$ is the number of items. Note that some components may be missing.

2.1 Differentially private mechanism

DP is not some entity lurking in the data; rather, it describes a certain type of data-releasing mechanism. The original idea was introduced in [5]. Since then, dozens of formulations have appeared. We use the simplest formulation here.

Definition 1. Let $\epsilon$ be a positive value. A random mapping $M : \mathcal{R} \to \mathcal{S}$ is called an $\epsilon$-differentially private mechanism if
$$\Pr(M(y) \in S) \le \exp(\epsilon) \Pr(M(z) \in S),$$
for any $y, z \in \mathcal{R}$ and any $S \subset \mathcal{S}$.

The idea is that the distributions produced by $y$ and $z$ are absolutely continuous with respect to each other with multiplier $\exp(\epsilon)$. With $\epsilon$ close to 0, these two distributions look similar, and it is quite difficult to infer whether the input was $y$ or $z$. For rating vectors, the condition reads $\Pr(M(x^{(1)}) \in S) \le \exp(\epsilon) \Pr(M(x^{(2)}) \in S)$, where $x^{(1)}$ and $x^{(2)}$ are two different rating vectors, which may represent two different users. Hence, the outputs of all users are mixed up and thus indistinguishable.
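To make Definition 1 concrete, here is a short worked example based on the classical Laplace mechanism of [5], which Section 2.2 builds upon (the sensitivity bound $|y - z| \le \Delta$ is an assumption of the example, not part of the definition). For $M(y) = y + \xi$ with $\xi \sim \mathrm{Laplace}(0, \Delta/\epsilon)$, the output density at any point $s$ satisfies
$$\frac{p_{M(y)}(s)}{p_{M(z)}(s)} = \exp\Big(\frac{\epsilon(|s - z| - |s - y|)}{\Delta}\Big) \le \exp\Big(\frac{\epsilon |y - z|}{\Delta}\Big) \le \exp(\epsilon),$$
by the triangle inequality, and integrating this pointwise bound over any $S$ yields exactly the inequality of Definition 1.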

2.2 Modified Laplace mechanism

In this subsection, we introduce the modified Laplace mechanism. The name comes from the Laplace mechanism [5], which works only on continuous metrizable spaces. In order to handle missing values, we modify it a bit. For convenience, we suppose that the data are normalized into the interval $[-1, 1]$, and we let the question mark $?$ denote a missing value.

Definition 2. For any $\epsilon \ge 0$, let $\xi \sim \mathrm{Laplace}(0, 2/\epsilon)$ be a random variable, and let $\zeta \sim \mathrm{Bernoulli}\big(\frac{\exp(\epsilon/2)}{\exp(\epsilon/2)+1}\big)$ be a random variable independent of $\xi$. A modified Laplace mechanism $M(x) = (M(x_1), M(x_2), \ldots, M(x_n))$ is defined by
$$M(x_i) = \begin{cases} \zeta \cdot (x_i + \xi) + (1 - \zeta) \cdot ? & \text{if } x_i \in [-1, 1], \\ \zeta \cdot ? + (1 - \zeta) \cdot \xi & \text{if } x_i = ?, \end{cases} \tag{1}$$

where, by convention, $1 \cdot ? = ?$, $0 \cdot ? = 0$ and $? + 0 = ?$. The idea is that besides adding Laplace noise, we also randomly remove and create some ratings. This mechanism can be proven to be differentially private.

Theorem 1. The modified Laplace mechanism is $n\epsilon$-differentially private.
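For concreteness, here is a minimal NumPy sketch of mechanism (1). It is our own illustration rather than code from the paper; it assumes the ratings are already normalized into $[-1, 1]$ and that NaN encodes the missing symbol $?$.

```python
import numpy as np

def modified_laplace(x, eps, rng=None):
    """Apply mechanism (1) elementwise to one rating vector x.

    NaN plays the role of the missing symbol '?'; non-missing
    entries are assumed to lie in [-1, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    p = np.exp(eps / 2) / (np.exp(eps / 2) + 1)  # Bernoulli parameter of zeta
    for i, xi in enumerate(x):
        zeta = rng.random() < p                  # zeta = 1 with probability p
        noise = rng.laplace(0.0, 2.0 / eps)      # xi ~ Laplace(0, 2/eps)
        if not np.isnan(xi):
            # observed rating: release it with noise (zeta = 1) or erase it
            z[i] = xi + noise if zeta else np.nan
        else:
            # missing rating: stay missing (zeta = 1) or create a fake rating
            z[i] = np.nan if zeta else noise
    return z

# Example: privatize a vector with two ratings and one missing value
print(modified_laplace([0.5, -1.0, np.nan], eps=1.0))
```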

2.3 Randomized response mechanism

In this subsection, we present the randomized response mechanism, which works on discrete data. Note that [6, 7] also use this term, and their definitions even differ from each other. Our use of the term is closer to [7], and we adapt it to rating data. Recommender systems rarely allow users to give continuous ratings. Instead, they often ask the user to rate an item with one to five stars. These ratings are certainly ordinal, but we simply ignore the order of the set and, along with the missing rating, treat them as cardinal numbers. In the following definition, the number 0 can be seen as the missing rating, and the numbers $1, 2, \ldots, d$ can be seen as the numbers of stars.

Definition 3. Let $W = \{0, 1, 2, \ldots, d\}$ be a set of finite cardinality. For any $\epsilon \ge 0$ and any $i \in W$, let $\xi_i$ be an independent random variable with support on $W$ whose probability mass function is defined by
$$p_i(j) = \frac{\exp(\epsilon \, I(j = i))}{\exp(\epsilon) + d}, \quad \text{for any } j \in W.$$
A randomized response mechanism $M(x) = (M(x_1), M(x_2), \ldots, M(x_n))$ is defined by $M(x_k) = \xi_{x_k}$.

The idea is that a transformed rating (including a missing rating) will most likely remain the same as the original rating, but there is still some probability that it is transformed into another rating (with equal probability for each alternative). One can prove that this mechanism is differentially private.

Theorem 2. The randomized response mechanism is $n\epsilon$-differentially private.
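Again for concreteness, here is a minimal NumPy sketch of Definition 3 (our own illustration; it assumes ratings are encoded as integers in $W = \{0, 1, \ldots, d\}$, with 0 meaning missing).

```python
import numpy as np

def randomized_response(x, eps, d, rng=None):
    """Apply Definition 3 elementwise to an integer rating vector in W = {0, ..., d}."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=int)
    p_keep = np.exp(eps) / (np.exp(eps) + d)  # Pr(report the true value)
    z = x.copy()
    for k, xk in enumerate(x):
        if rng.random() >= p_keep:
            # otherwise, report one of the d other values uniformly at random,
            # so each alternative has probability 1 / (exp(eps) + d)
            others = [w for w in range(d + 1) if w != xk]
            z[k] = rng.choice(others)
    return z

# Example: privatize 1-to-5-star ratings (0 = missing) with eps = 1
print(randomized_response([5, 0, 3, 1], eps=1.0, d=5))
```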

3 Utility

A natural question is whether the transformed ratings are still useful. If the transformed ratings produce nonsense, then the transformation is pointless even though user privacy is protected. This question can be decomposed into two subquestions: what usefulness means, and what could be a possible way to achieve it. For the first subquestion, we use the statistical estimation framework; for the second, we use the low-rank matrix completion method.

We start with the framework. There are $m$ users and $n$ items in the universe. $\Theta_{m \times n}$ is the unknown matrix of the true ratings that each user would give to each item. This is a dense matrix without any missing values. However, since there are so many items, users are not able to test every item, and their ratings are corrupted by noise. What we actually observe is a sparse matrix $X_{m \times n}$, which can be regarded as an approximation of $\Theta$. Then, we apply either of our mechanisms to $X$ to generate the transformed rating matrix $Z_{m \times n}$ (Figure 1). Since our mechanisms are computed elementwise, this can be done locally at each user's computer. After that, each user sends their transformed rating vector to the service provider, who thus observes the matrix $Z$. The service provider's goal is to recover $\Theta$ from $Z$.

[Figure 1: Rating generating process, relating the matrices $\Theta$, $X$ and $Z$.]

Now we show how it is possible to recover $\Theta$ from $Z$ instead of from $X$. As mentioned in Section 1, there are quite a few methods; interested readers can refer to [1, 8]. Here we present only one method, but the analysis generalizes to the others. This method is low-rank matrix completion. We suppose $\Theta$ is a low-rank matrix, i.e. $\mathrm{rank}(\Theta) = r \ll \min(m, n)$. If the true rating matrix is low-rank, then we are able to approximately recover it from a few corrupted ratings under certain conditions such as the restricted isometry property (RIP) [9].

Definition 4. Let $\Omega_Z$ denote the support of the non-missing ratings of $Z$. The projection operator $P_{\Omega_Z}$ satisfies the restricted isometry property if it obeys
$$(1 - \alpha) \|A\|_F^2 \le \frac{1}{p} \|P_{\Omega_Z}(A)\|_F^2 \le (1 + \alpha) \|A\|_F^2, \tag{2}$$
for any matrix $A$ of sufficiently small rank and some $\alpha \in (0, 1)$ sufficiently small, where $p$ is the proportion of non-missing values of $Z$ and $\|\cdot\|_F$ is the Frobenius norm.
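The RIP condition (2) is easy to probe empirically for a uniformly random support. The sketch below (our own illustration, not from the paper) computes the middle quantity of (2) relative to $\|A\|_F^2$ for one random low-rank matrix; for such supports the ratio concentrates near 1, i.e. $\alpha$ is small.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, p = 200, 150, 3, 0.3

# A random rank-r matrix and a uniformly random support of proportion p
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
support = rng.random((m, n)) < p

# (1/p) * ||P_Omega(A)||_F^2 / ||A||_F^2 from (2); close to 1 here
ratio = np.sum((A * support) ** 2) / (p * np.sum(A ** 2))
print(ratio)
```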

The recovery process is described as follows. Suppose that
$$\rho := \|P_{\Omega_Z}(\Theta - Z)\|_F < \infty. \tag{3}$$
Our estimator $\hat{\Theta}$ is obtained from the following optimization problem:
$$\hat{\Theta} = \arg\min_M \|M\|_* \quad \text{s.t.} \quad \|P_{\Omega_Z}(M - Z)\|_F \le \rho, \tag{4}$$
where $\|\cdot\|_*$ is the nuclear norm (a.k.a. trace norm). Under the low-rank hypothesis and RIP, [9] proved
$$\|\hat{\Theta} - \Theta\|_F \le C_0 \, p^{-1/2} \rho \tag{5}$$

for some numerical constant $C_0$. This means that the estimation error on the whole matrix is proportional to the error on the support of the observed matrix; in other words, the recovery method enjoys a kind of stability against the noise quantified by $\rho$. Of course, this noise includes not only the noise intrinsic to the problem (i.e. between $\Theta$ and $X$) but also the noise artificially introduced by the mechanism (i.e. between $X$ and $Z$). We see how easily the traditional analysis techniques can be seamlessly transplanted to the new setting. What remains is to give an upper bound on $\rho$.

Let $\Omega_X$ denote the support of the non-missing ratings of $X$, and let $s := |\Omega_X|$ be the number of non-missing ratings. Suppose that $\|P_{\Omega_X}(\Theta - X)\|_F \le \rho_0 \sqrt{s} < \infty$, for some small constant $\rho_0$. This hypothesis is quite realistic; indeed, it is what we would need if we wanted to recover $\Theta$ from the untransformed ratings $X$. Then we have the following theorem.

Theorem 3. Let $\gamma \in (0, 1)$ be the level of tolerance. With probability at least $1 - \gamma$, the $Z$ generated by the modified Laplace mechanism satisfies
$$\rho \le \rho_0 \sqrt{s} + \frac{4}{\epsilon} \sqrt{\frac{s}{\gamma}} + \sqrt{\frac{2mn}{(e^{\epsilon/2} + 1)\gamma} \Big(1 + \frac{8}{\epsilon^2}\Big)}; \tag{6}$$
with probability at least $1 - \gamma$, the $Z$ generated by the randomized response mechanism satisfies
$$\rho \le \rho_0 \sqrt{s} + 2(d - 1) \sqrt{\frac{2mnd}{(e^\epsilon + d)\gamma}}. \tag{7}$$

When the privacy parameter $\epsilon$ increases toward infinity, the above upper bounds decrease toward $\rho_0 \sqrt{s}$. This means that the lower the level of differential privacy, the more accurate the data and hence the more precise the estimation. This is intuitive, since the larger $\epsilon$ is, the less extra noise we introduce into the data. In practice, it is desirable to choose an $\epsilon$ which makes the entire upper bound match the order of $\rho_0 \sqrt{s}$. Combining these bounds with (5), we can assure the utility of our transformed ratings.
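To illustrate how the pieces fit together, here is an end-to-end sketch: it builds a small synthetic low-rank $\Theta$, observes a sparse $X$, privatizes it with the modified Laplace mechanism of Definition 2, and solves program (4) with the off-the-shelf convex solver cvxpy. It is our own prototype under illustrative choices; the sampling rate, $\epsilon$, and the oracle value of $\rho$ are assumptions, not prescriptions of the paper.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, r, eps = 30, 20, 2, 4.0

# Low-rank ground truth Theta, rescaled into [-1, 1]
Theta = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
Theta /= np.abs(Theta).max()

# X: each true rating observed independently with probability 1/2
X = np.where(rng.random((m, n)) < 0.5, Theta, np.nan)

# Z: modified Laplace mechanism (1), applied elementwise
p_keep = np.exp(eps / 2) / (np.exp(eps / 2) + 1)
keep = rng.random((m, n)) < p_keep
noise = rng.laplace(0.0, 2.0 / eps, size=(m, n))
Z = np.where(np.isnan(X),
             np.where(keep, np.nan, noise),      # missing: stay missing or create
             np.where(keep, X + noise, np.nan))  # observed: add noise or erase

# Solve (4): minimize the nuclear norm subject to fitting Z on its support
W = (~np.isnan(Z)).astype(float)
rho = np.linalg.norm(np.nan_to_num(Theta - Z) * W)  # oracle rho, illustration only
M = cp.Variable((m, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(M)),
                     [cp.norm(cp.multiply(W, M) - np.nan_to_num(Z) * W, "fro") <= rho])
problem.solve()
print("relative error:", np.linalg.norm(M.value - Theta) / np.linalg.norm(Theta))
```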

4 Proofs

4.1 Proof of Theorem 1

Proof. According to the values of $(x, y)$ and $S$, we divide the argument into nine cases. We only consider nonempty $S$, since the empty-set case is trivial. Below, $S \subset \mathbb{R}$ denotes a measurable set of real values not containing $?$, and "$S$ is more than $\{?\}$" means that $S$ contains $?$ together with some real values. We repeatedly use the fact that, for $\xi \sim \mathrm{Laplace}(0, 2/\epsilon)$, the densities of $u + \xi$ and $v + \xi$ differ pointwise by a factor of at most $\exp(\epsilon |u - v| / 2)$.

i) $(x, y) \in [-1, 1]^2$ and $S \subset \mathbb{R}$:
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} = \frac{\Pr(\zeta_1 = 1, x + \xi_1 \in S)}{\Pr(\zeta_2 = 1, y + \xi_2 \in S)} = \frac{\Pr(\zeta_1 = 1) \Pr(x + \xi_1 \in S)}{\Pr(\zeta_2 = 1) \Pr(y + \xi_2 \in S)} = \frac{\Pr(x + \xi_1 \in S)}{\Pr(y + \xi_2 \in S)} \le e^{\epsilon},$$
since $|x - y| \le 2$.

ii) $(x, y) \in [-1, 1]^2$ and $S = \{?\}$:
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} = \frac{\Pr(\zeta_1 = 0)}{\Pr(\zeta_2 = 0)} = 1 \le e^{\epsilon}.$$

iii) $(x, y) \in [-1, 1]^2$ and $S$ is more than $\{?\}$: by the mediant inequality $\frac{a + b}{c + d} \le \max(\frac{a}{c}, \frac{b}{d})$ applied to the two previous cases,
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} = \frac{\Pr(\zeta_1 = 1, x + \xi_1 \in S) + \Pr(\zeta_1 = 0)}{\Pr(\zeta_2 = 1, y + \xi_2 \in S) + \Pr(\zeta_2 = 0)} \le e^{\epsilon}.$$

iv) $x \in [-1, 1]$, $y = ?$, and $S \subset \mathbb{R}$:
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} = \frac{\Pr(\zeta_1 = 1, x + \xi_1 \in S)}{\Pr(\zeta_2 = 0, \xi_2 \in S)} = \frac{\Pr(\zeta_1 = 1)}{\Pr(\zeta_2 = 0)} \cdot \frac{\Pr(x + \xi_1 \in S)}{\Pr(\xi_2 \in S)} \le e^{\epsilon/2} \cdot e^{\epsilon/2} = e^{\epsilon},$$
since $|x| \le 1$.

v) $x \in [-1, 1]$, $y = ?$, and $S = \{?\}$:
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} = \frac{\Pr(\zeta_1 = 0)}{\Pr(\zeta_2 = 1)} = e^{-\epsilon/2} \le e^{\epsilon}.$$

vi) $x \in [-1, 1]$, $y = ?$, and $S$ is more than $\{?\}$: combining cases iv) and v) via the mediant inequality,
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} = \frac{\Pr(\zeta_1 = 1, x + \xi_1 \in S) + \Pr(\zeta_1 = 0)}{\Pr(\zeta_2 = 0, \xi_2 \in S) + \Pr(\zeta_2 = 1)} \le e^{\epsilon}.$$

vii) $y \in [-1, 1]$, $x = ?$, and $S \subset \mathbb{R}$:
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} = \frac{\Pr(\zeta_1 = 0, \xi_1 \in S)}{\Pr(\zeta_2 = 1, y + \xi_2 \in S)} = \frac{\Pr(\zeta_1 = 0)}{\Pr(\zeta_2 = 1)} \cdot \frac{\Pr(\xi_1 \in S)}{\Pr(y + \xi_2 \in S)} \le e^{-\epsilon/2} \cdot e^{\epsilon/2} \le e^{\epsilon}.$$

viii) $y \in [-1, 1]$, $x = ?$, and $S = \{?\}$:
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} = \frac{\Pr(\zeta_1 = 1)}{\Pr(\zeta_2 = 0)} = e^{\epsilon/2} \le e^{\epsilon}.$$

ix) $y \in [-1, 1]$, $x = ?$, and $S$ is more than $\{?\}$: combining cases vii) and viii),
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} = \frac{\Pr(\zeta_1 = 0, \xi_1 \in S) + \Pr(\zeta_1 = 1)}{\Pr(\zeta_2 = 1, y + \xi_2 \in S) + \Pr(\zeta_2 = 0)} \le e^{\epsilon}.$$

Hence each coordinate of the mechanism is $\epsilon$-differentially private, and since the $n$ coordinates are treated independently, composition over the rating vector yields $n\epsilon$-differential privacy.

4.2 Proof of Theorem 2

Proof. For any $(x, y) \in W^2$, we have
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} = \frac{\sum_{s \in S} \Pr(M(x) = s)}{\sum_{s \in S} \Pr(M(y) = s)}. \tag{8}$$
Each $s$ can take three kinds of values: $x$, $y$, or another value. So we divide into three cases.

i) $s = x$:
$$\frac{\Pr(M(x) = x)}{\Pr(M(y) = x)} = \frac{e^{\epsilon} / (e^{\epsilon} + d)}{1 / (e^{\epsilon} + d)} = e^{\epsilon}.$$

ii) $s = y$:
$$\frac{\Pr(M(x) = y)}{\Pr(M(y) = y)} = \frac{1 / (e^{\epsilon} + d)}{e^{\epsilon} / (e^{\epsilon} + d)} = e^{-\epsilon} \le e^{\epsilon}.$$

iii) $s \ne x$ and $s \ne y$:
$$\frac{\Pr(M(x) = s)}{\Pr(M(y) = s)} = \frac{1 / (e^{\epsilon} + d)}{1 / (e^{\epsilon} + d)} = 1 \le e^{\epsilon}.$$

So in every case the ratio is at most $e^{\epsilon}$. Plugging these bounds into (8), we get
$$\frac{\Pr(M(x) \in S)}{\Pr(M(y) \in S)} \le \frac{\sum_{s \in S} e^{\epsilon} \Pr(M(y) = s)}{\sum_{s \in S} \Pr(M(y) = s)} = e^{\epsilon}.$$
Hence each coordinate is $\epsilon$-differentially private, and composing over the $n$ independent coordinates gives $n\epsilon$-differential privacy.
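As a quick sanity check of this case analysis, one can evaluate the worst-case per-coordinate ratio numerically from the pmf of Definition 3 (our own illustration):

```python
import numpy as np

def pmf(i, j, eps, d):
    # p_i(j) from Definition 3
    return np.exp(eps * (j == i)) / (np.exp(eps) + d)

eps, d = 1.0, 5
worst = max(pmf(x, s, eps, d) / pmf(y, s, eps, d)
            for x in range(d + 1) for y in range(d + 1) for s in range(d + 1))
print(worst, np.exp(eps))  # the worst-case ratio equals exp(eps)
```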

4.3 Proof of Theorem 3

Proof. First, we decompose (3) into three terms:
$$\|P_{\Omega_Z}(\Theta - Z)\|_F = \|P_{\Omega_Z \cap \Omega_X}(\Theta - X) + P_{\Omega_Z \cap \Omega_X}(X - Z) + P_{\Omega_Z \setminus \Omega_X}(\Theta - Z)\|_F$$
$$\le \|P_{\Omega_Z \cap \Omega_X}(\Theta - X)\|_F + \|P_{\Omega_Z \cap \Omega_X}(X - Z)\|_F + \|P_{\Omega_Z \setminus \Omega_X}(\Theta - Z)\|_F.$$
The first term satisfies
$$\|P_{\Omega_Z \cap \Omega_X}(\Theta - X)\|_F \le \|P_{\Omega_X}(\Theta - X)\|_F \le \rho_0 \sqrt{s}.$$

Then, we compute the expectation of the square of the second term:
$$\mathbb{E}\|P_{\Omega_Z \cap \Omega_X}(X - Z)\|_F^2 = \mathbb{E}\Big[\sum_{(i,j) \in \Omega_Z \cap \Omega_X} (X_{ij} - Z_{ij})^2\Big] = \mathbb{E}\Big[\sum_{(i,j) \in \Omega_Z \cap \Omega_X} \mathbb{E}\big[(X_{ij} - Z_{ij})^2 \mid \Omega_X, \Omega_Z\big]\Big]. \tag{9}$$
In the same way, for the third term:
$$\mathbb{E}\|P_{\Omega_Z \setminus \Omega_X}(\Theta - Z)\|_F^2 = \mathbb{E}\Big[\sum_{(i,j) \in \Omega_Z \setminus \Omega_X} (\Theta_{ij} - Z_{ij})^2\Big] = \mathbb{E}\Big[\sum_{(i,j) \in \Omega_Z \setminus \Omega_X} \mathbb{E}\big[(\Theta_{ij} - Z_{ij})^2 \mid \Omega_X, \Omega_Z\big]\Big]. \tag{10}$$

For the modified Laplace mechanism,
$$\mathbb{E}\big[(X_{ij} - Z_{ij})^2 \mid \Omega_X, \Omega_Z\big] = 2\Big(\frac{2}{\epsilon}\Big)^2 = \frac{8}{\epsilon^2}, \quad \forall (i,j) \in \Omega_X \cap \Omega_Z,$$
since the variance of a $\mathrm{Laplace}(0, b)$ variable is $2b^2$, and
$$\mathbb{E}\big[(\Theta_{ij} - Z_{ij})^2 \mid \Omega_X, \Omega_Z\big] = \Theta_{ij}^2 + \frac{8}{\epsilon^2} \le 1 + \frac{8}{\epsilon^2}, \quad \forall (i,j) \in \Omega_Z \setminus \Omega_X.$$

Plugging these into (9) and (10), we get
$$\mathbb{E}\|P_{\Omega_Z \cap \Omega_X}(X - Z)\|_F^2 \le \frac{8s}{\epsilon^2}, \qquad \mathbb{E}\|P_{\Omega_Z \setminus \Omega_X}(\Theta - Z)\|_F^2 \le \frac{mn}{e^{\epsilon/2} + 1}\Big(1 + \frac{8}{\epsilon^2}\Big),$$
where $s := |\Omega_X|$ and the second bound uses $\mathbb{E}|\Omega_Z \setminus \Omega_X| \le mn / (e^{\epsilon/2} + 1)$, the probability of creating a rating at a missing entry being $1 / (e^{\epsilon/2} + 1)$. Then, by Markov's inequality,
$$\Pr\bigg(\|P_{\Omega_Z \cap \Omega_X}(X - Z)\|_F > \frac{4}{\epsilon}\sqrt{\frac{s}{\gamma}}\bigg) \le \frac{\mathbb{E}\|P_{\Omega_Z \cap \Omega_X}(X - Z)\|_F^2}{16s / (\epsilon^2 \gamma)} = \frac{\gamma}{2},$$
$$\Pr\Bigg(\|P_{\Omega_Z \setminus \Omega_X}(\Theta - Z)\|_F > \sqrt{\frac{2mn}{(e^{\epsilon/2} + 1)\gamma}\Big(1 + \frac{8}{\epsilon^2}\Big)}\Bigg) \le \frac{\mathbb{E}\|P_{\Omega_Z \setminus \Omega_X}(\Theta - Z)\|_F^2}{\frac{2mn}{(e^{\epsilon/2} + 1)\gamma}\big(1 + \frac{8}{\epsilon^2}\big)} = \frac{\gamma}{2}.$$
Finally, by a union bound,
$$\Pr\Bigg(\|P_{\Omega_Z}(\Theta - Z)\|_F \le \rho_0 \sqrt{s} + \frac{4}{\epsilon}\sqrt{\frac{s}{\gamma}} + \sqrt{\frac{2mn}{(e^{\epsilon/2} + 1)\gamma}\Big(1 + \frac{8}{\epsilon^2}\Big)}\Bigg) \ge 1 - \gamma.$$

For the randomized response mechanism,
$$\mathbb{E}\big[(X_{ij} - Z_{ij})^2 \mid \Omega_X, \Omega_Z\big] \le (d - 1)^2 \frac{d}{e^{\epsilon} + d}, \quad \forall (i,j) \in \Omega_X \cap \Omega_Z,$$
since a rating is altered with probability $d / (e^{\epsilon} + d)$ and the squared alteration is at most $(d - 1)^2$, and
$$\mathbb{E}\big[(\Theta_{ij} - Z_{ij})^2 \mid \Omega_X, \Omega_Z\big] \le (d - 1)^2, \quad \forall (i,j) \in \Omega_Z \setminus \Omega_X.$$
Plugging these into (9) and (10), we get
$$\mathbb{E}\|P_{\Omega_Z \cap \Omega_X}(X - Z)\|_F^2 \le (d - 1)^2 \frac{sd}{e^{\epsilon} + d}, \qquad \mathbb{E}\|P_{\Omega_Z \setminus \Omega_X}(\Theta - Z)\|_F^2 \le (d - 1)^2 \frac{mnd}{e^{\epsilon} + d},$$
where $s := |\Omega_X|$. Then, by Markov's inequality,
$$\Pr\Bigg(\|P_{\Omega_Z \cap \Omega_X}(X - Z)\|_F > \sqrt{\frac{2sd(d - 1)^2}{(e^{\epsilon} + d)\gamma}}\Bigg) \le \frac{\gamma}{2}, \qquad \Pr\Bigg(\|P_{\Omega_Z \setminus \Omega_X}(\Theta - Z)\|_F > \sqrt{\frac{2mnd(d - 1)^2}{(e^{\epsilon} + d)\gamma}}\Bigg) \le \frac{\gamma}{2}.$$
Finally, since $s \le mn$, a union bound gives
$$\Pr\Bigg(\|P_{\Omega_Z}(\Theta - Z)\|_F \le \rho_0 \sqrt{s} + 2(d - 1)\sqrt{\frac{2mnd}{(e^{\epsilon} + d)\gamma}}\Bigg) \ge 1 - \gamma.$$

References

[1] Francesco Ricci, Lior Rokach, and Bracha Shapira. Introduction to Recommender Systems Handbook. Springer, 2011.

[2] James Bennett and Stan Lanning. The Netflix Prize. In Proceedings of KDD Cup and Workshop, volume 2007, page 35, 2007.

[3] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy, pages 111–125. IEEE, 2008.

[4] Frank McSherry and Ilya Mironov. Differentially private recommender systems: Building privacy into the Netflix Prize contenders. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 627–636. ACM, 2009.

[5] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, pages 265–284. Springer, 2006.

[6] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Local privacy, data processing inequalities, and statistical minimax rates. arXiv preprint arXiv:1302.3203, 2013.

[7] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. Extremal mechanisms for local differential privacy. In Advances in Neural Information Processing Systems, pages 2879–2887, 2014.

[8] Charu C. Aggarwal. Recommender Systems: The Textbook. Springer, 2016.

[9] M. Fazel, E. Candès, B. Recht, and P. Parrilo. Compressed sensing and robust recovery of low rank matrices. In 2008 42nd Asilomar Conference on Signals, Systems and Computers, pages 1043–1047. IEEE, 2008.
