
Towards a Theory for Privacy Preserving Distributed OLAP

Alfredo Cuzzocrea, ICAR-CNR and University of Calabria, Cosenza, Italy

Elisa Bertino, CERIAS and Purdue University, West Lafayette, IN, USA

Domenico Saccà, DEIS Dept., University of Calabria, Cosenza, Italy

ABSTRACT

Privacy Preserving Distributed OLAP identifies a collection of models, methodologies and algorithms devoted to ensuring the privacy of multidimensional OLAP data cubes in distributed environments. While there is noticeable research on the practical aspects of Privacy Preserving OLAP, in both centralized and distributed environments, the active literature lacks contributions on the theoretical side of this emerging research topic. In our view, there is a significant need for theoretical results, which may bring benefits to a wide spectrum of aspects, such as privacy preserving knowledge fruition schemes and query optimization. Inspired by these considerations, and starting from our previous research result, where the main privacy preserving distributed OLAP framework was introduced, this paper proposes some theoretical results that extend the capabilities and the potentialities of that framework.

1. INTRODUCTION

The issue of effectively and efficiently computing and managing privacy preserving OLAP data cubes [10,8] has attracted the interest of a large community of Database and Data Warehousing researchers. This problem plays a critical role in both centralized [11,20,26,29,30,31,12,23,3] and distributed [2,28,19,22,4,21] environments. Applications where computing privacy preserving data cubes is relevant cover a large range of cases, spanning from Business Intelligence (BI) systems to Data Mining and Analysis tools, and from sensor network data analysis tools to social network data components.

Despite the “equal dignity” of both target environments (i.e., centralized and distributed), the distributed case is more relevant, as today very large organizations very often make use of a distributed infrastructure for producing, managing and delivering knowledge (e.g., [2]). Following this main consideration, in this paper we focus on the more relevant case of computing and managing privacy preserving OLAP data cubes in distributed settings. Fig. 1 depicts an applicative example of such a scenario. Here, Alice and Bob are two analysts working at different companies federated to the same main organization. Alice and Bob perform OLAP on sale data extracted from their own respective legacy databases, with a focus on computer parts and electronic parts, respectively. To this end, Alice builds and queries the two-dimensional data cube SALESCP, and Bob builds and queries the two-dimensional data cube SALESEP. Data cubes SALESCP and SALESEP have the same multidimensional schema, constituted by the dimensions Time (with granularity Year) and Zone (with granularity Country), and the measure Sale. As regards data, both data cubes store aggregations on sales that occurred in EU countries during the second half of 2009. It should be noted that the respective OLAP tasks performed by Alice and Bob against their own data cube are secure, hence they do not expose possible privacy breaches. By contrast, when Alice and Bob need to aggregate their respective data cubes into one common SUM-based data cube (named SALES in Fig. 1; note that SALES has the same multidimensional schema as SALESCP and SALESEP) for BI purposes, possible privacy breaches arise, as neither Alice nor Bob wants the respective analysis partner to access or infer sensitive data cells stored in their own data cubes, SALESCP and SALESEP, respectively, while, at the same time, both Alice and Bob aim at accessing the aggregated data cube SALES for BI purposes.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PAIS’12, March 30, 2012, Berlin, Germany. Copyright 2012 ACM 978-1-4503-1143-4/12/03…$10.00.

Figure 1. A privacy preserving distributed OLAP scenario.

It should be noted that Alice (Bob, respectively) can easily infer data cells of Bob’s (Alice’s, respectively) data cube SALESEP (SALESCP, respectively) starting from data cells stored in the common data cube SALES and her (his, respectively) own data cube SALESCP (SALESEP, respectively), via simple linear interpolation techniques [11,12].
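The inference risk in the Alice and Bob scenario can be illustrated with a minimal sketch. It covers only the simplest case, in which the common cube SALES is an unprotected cell-wise SUM of the two private cubes, so the inference reduces to a subtraction; the sale values and the function names `sum_cubes` and `infer_partner` are hypothetical and do not come from the framework [9].

```python
# Hypothetical sale values; rows are years, columns are countries.
sales_cp = [[120.0, 80.0], [95.0, 60.0]]   # Alice's private cube SALESCP
sales_ep = [[40.0, 30.0], [55.0, 25.0]]    # Bob's private cube SALESEP


def sum_cubes(a, b):
    """Cell-wise SUM aggregation of two cubes with the same schema."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def infer_partner(shared, own):
    """Recover the partner's cells from the shared cube and one's own cube."""
    return [[s - o for s, o in zip(row_s, row_o)] for row_s, row_o in zip(shared, own)]


sales = sum_cubes(sales_cp, sales_ep)          # unprotected common cube SALES
inferred_ep = infer_partner(sales, sales_cp)   # what Alice can reconstruct
```

In this unprotected setting `inferred_ep` coincides with Bob's private cube, which is the kind of breach a privacy preserving aggregation is meant to prevent.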

Our reference scenario adheres to the so-called Secure Multiparty Computation (SMC) [25] model, which is well-known in the context of Privacy Preserving Distributed Data Mining research [5], with respect to which Privacy Preserving Distributed OLAP can be reasonably considered a major, yet independent, research area. In order to address the research challenges deriving from computing and managing privacy preserving data cubes in distributed environments, in [9] we introduced an innovative privacy preserving distributed OLAP framework that relies on the novel concept of the so-called secure distributed OLAP aggregation task. Basically, this task pursues the idea of performing OLAP across multiple distributed SUM-based two-dimensional OLAP views extracted from data cubes under the SMC requirements [9]. To this end, the Secure Distributed OLAP aggregation protocol (SDO) has been introduced and experimentally assessed in [9].

Briefly, the SDO protocol works as follows [9]. Having fixed a certain node ordering (e.g., DNS-based), the first node $N_0$ in the distributed environment computes a privacy preserving version, denoted by $V_0^{PP}$, of its own OLAP view, denoted by $V_0$, and sends $V_0^{PP}$ to the second node $N_1$ in the distributed environment, which, in turn, (i) combines $V_0^{PP}$ with its own local view, denoted by $V_1$, in order to perform the target OLAP operation, and (ii) sends the local OLAP result, which is again represented by a view, denoted by $V_1^{PP}$, to the “following” node according to the fixed node ordering, and so forth. It should be noted that, since $V_0^{PP}$ is privacy preserving, $V_1^{PP}$ is also privacy preserving. This step is iterated until the local OLAP result computed at node $N_{n-1}$, $V_{n-1}^{PP}$, is returned to node $N_0$, which derives from $V_{n-1}^{PP}$ the exact global OLAP result of the target OLAP task, denoted by $V^{GLOBAL}$, on the basis of the local view $V_0$, which is hidden from the other nodes of the distributed environment, and its privacy preserving version $V_0^{PP}$. The final global OLAP result $V^{GLOBAL}$ is sent to the external application, which finally forwards $V^{GLOBAL}$ to all the other nodes in the distributed environment.

Particularly, in [9] we specifically focus on the class of SUM-based distributed OLAP aggregation tasks, SUM being a popular aggregate operator for OLAP applications (e.g., [19]). Despite this, our framework [9] is general enough to deal with more sophisticated distributed OLAP aggregation tasks that embed complex OLAP aggregations (e.g., [18]) rather than conventional ones (e.g., SUM, COUNT, AVG). Also, while our framework [9] is general enough to deal with OLAP views computed over any arbitrary kind of data sources, in [9] we considered the specialized applicative case represented by data cubes computed on top of distributed collections of XML documents, which are more and more relevant for BI applications. The core of the framework [9] is represented by the CUR matrix decomposition technique [14], which allows us to compute privacy preserving two-dimensional OLAP views effectively and efficiently, at a provable approximation error [9].

Formally, given a large m × n matrix A, a CUR matrix decomposition is a low-rank approximation of A, denoted by A’, that represents A in terms of a small number of columns and rows of A, as follows:

$$A' = C \cdot U \cdot R \qquad (1)$$

where: (i) C is an m × c matrix that stores O(1) columns of A; (ii) R is an r × n matrix that stores O(1) rows of A; (iii) U is a c × r carefully-chosen matrix. In particular, the number c of columns of C and the number r of rows of R are both of order $1/\varepsilon^2$, with ε > 0 arbitrarily small. C and R are built by means of adaptive sampling [27], via c (r, respectively) trials, by picking a column (a row, respectively) of A with probability $p_j$ defined as follows:

$$p_j = \frac{|A[\cdot][j]|^2}{\sum_{j'} |A[\cdot][j']|^2} \qquad (2)$$

whereas the probability $p_i$ for rows is defined as follows:

$$p_i = \frac{|A[i][\cdot]|^2}{\sum_{i'} |A[i'][\cdot]|^2} \qquad (3)$$

respectively.

In line with the research result provided in [9], in this paper we further extend the proposed privacy preserving distributed OLAP framework [9] by providing a number of theoretical results that extend the capabilities and the potentialities of the framework. In our view, these theoretical results may bring benefits to a wide spectrum of aspects, such as privacy preserving knowledge fruition schemes and query optimization. Particularly, we provide theoretical contributions on the following aspects of our framework [9]:

• re-construction capabilities of the CUR decomposition method, where we prove that the privacy preserving two-dimensional OLAP views can be used for re-constructing the original OLAP views in a theoretically-sound manner;

• independence capabilities of the CUR decomposition method, where we prove that the final two-dimensional OLAP view obtained from the target distributed OLAP aggregation task can be obtained from the two-dimensional OLAP views of the first and the last node of the reference environment, respectively, without dependence on the OLAP views of the remaining nodes, still in a theoretically-sound manner.

Together, these theoretical results constitute a first attempt towards a theory for Privacy Preserving Distributed OLAP, which is the main contribution of this paper, and they integrate and complete our previous research results presented in [9].
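The SDO protocol recalled in this section can be sketched as a small ring simulation. This is only an illustration under stated assumptions: views are small integer matrices, the function `perturb` is a hypothetical stand-in that adds bounded noise in place of the CUR-based approximation of [9], and the names `add_views`, `sub_views` and `sdo_round` are introduced here for the example. The last step, in which node N0 adds its exact view and removes its perturbed one, follows the SUM-based combination stated later in Theorem 3.

```python
import random


def add_views(a, b):
    """Cell-wise sum of two views with the same schema."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub_views(a, b):
    """Cell-wise difference of two views with the same schema."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def perturb(view, rng):
    """Hypothetical stand-in for the CUR-based approximation at node N0."""
    return [[x + rng.randint(-5, 5) for x in row] for row in view]


def sdo_round(local_views, rng):
    """Simulate one pass of the ring; local_views[i] is the view V_i of node N_i."""
    v0 = local_views[0]
    v0_pp = perturb(v0, rng)               # V_0^PP, computed and kept by N0
    running = v0_pp
    for v_i in local_views[1:]:            # N1, ..., N_{n-1} in the fixed order
        running = add_views(v_i, running)  # V_i^PP = V_i + V_{i-1}^PP
    # N0 removes its perturbed view and adds its exact one.
    return add_views(v0, sub_views(running, v0_pp))


rng = random.Random(0)
views = [[[rng.randint(0, 100) for _ in range(3)] for _ in range(2)] for _ in range(4)]
v_global = sdo_round(views, rng)
```

In this sketch no node other than N0 ever sees the exact view V0, yet `v_global` equals the cell-wise sum of all local views regardless of the noise that `perturb` adds.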

2. THEORETICAL ANALYSIS AND RESULTS ON THE RE-CONSTRUCTION CAPABILITIES OF THE CUR DECOMPOSITION METHOD

Another critical property that is central to the theoretical aspects of the CUR decomposition method consists in assessing the capabilities of the method in re-constructing the original matrix A from the approximating matrix A’ that is retrieved by the method itself. In fact, beyond playing a central role in the effectiveness of the proposed privacy preserving distributed OLAP framework, the re-construction property also ensures the theoretical convergence of conceptual constructs and theory tools of the framework. In order to prove the re-construction property ensured by the CUR decomposition method, we provide Theorem 2 (see next), whose proof has a structure inspired by a theoretical model proposed in [2], which is the state-of-the-art result in the context of perturbation-based distributed privacy preservation techniques over OLAP data cubes. In more detail, as regards the re-construction property ensured by the proposed Retention Replacement Perturbation algorithm, in [2] the authors provide rigorous probabilistic bounds over aggregates that are reconstructed from a relational table that has been perturbed by means of their algorithm. These aggregates are defined in terms of input range queries over the perturbed relational table, and their values are compared with the values of aggregates retrieved by the same queries over the original relational table. Here, we follow a similar structure, i.e. we study the re-construction property of the CUR decomposition method by considering the aggregate values of range queries over the approximating matrix A’ in comparison with the aggregate values of the same queries over the original matrix A.

Before providing Theorem 2, some definitions are necessary. First, we define a two-dimensional range query Q over the m × n matrix A (A’, respectively) as follows:

$$Q = [l_1:u_1;\ l_2:u_2] \qquad (4)$$

such that: (i) $l_1$ denotes a lower bound on the dimension $d_1$ of A (A’, respectively); (ii) $u_1$ denotes an upper bound on the dimension $d_1$ of A (A’, respectively); (iii) $l_1 < u_1$; (iv) $l_2$ denotes a lower bound on the dimension $d_2$ of A (A’, respectively); (v) $u_2$ denotes an upper bound on the dimension $d_2$ of A (A’, respectively); (vi) $l_2 < u_2$. On the basis of well-understood matrix algebra [17] principles, the evaluation of Q over A (A’, respectively) can be expressed as follows:

$$x^T \cdot A \cdot y = z \qquad (5)$$

such that: (i) x models an m-dimensional vector whose elements x[i], with 0 ≤ i ≤ m – 1, are defined as follows:

$$x[i] = \begin{cases} 1 & \text{if } l_1 \le i \le u_1 \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

(ii) y models an n-dimensional vector whose elements y[j], with 0 ≤ j ≤ n – 1, are defined as follows:

$$y[j] = \begin{cases} 1 & \text{if } l_2 \le j \le u_2 \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

and (iii) z models the answer to Q (z’ models the approximate answer to Q, respectively). For the sake of clarity, Theorem 2 proves that the approximate answer to Q, z’, is probabilistically close to the exact answer to Q, z, or, in other words, it proves the re-construction property of the CUR decomposition method.

Second, we introduce the concept of re-constructible function, still inspired by [2], whose formal definition is provided in Definition 1. Intuitively, a numeric function γ is said to be re-constructible iff it allows us to “invert” the transformation of the original matrix A, or cell partitions of A, into the perturbed matrix A’ (due to the CUR decomposition method, in our case), or cell partitions of A’. In our theoretical analysis, we interpret numeric functions γ as the data distributions associated with elements of the original matrix A (the approximating matrix A’, respectively). A relevant property of a re-constructible function γ is whether it is n,ε,δ-re-constructible by means of the so-called re-constructing function γ’, such that n is the number of items in γ, and ε and δ are arbitrarily small positive real numbers. In other words, this corresponds to verifying whether an unbiased estimator [24] γ’ for γ exists. If this is the case, γ’ gives us theoretically proven probabilistic bounds on the error we commit in reconstructing the function γ (by means of γ’).

Definition 1. Let $\alpha : \mathbb{R}^m \rightarrow \mathbb{R}^n$ be a perturbation function converting a matrix A into the approximating matrix A’; a numeric function γ on A is said to be n,ε,δ-re-constructible by means of a re-constructing function γ’, such that n is the number of items in γ, and ε and δ are arbitrarily small positive real numbers, iff γ’ can be evaluated on A’ and the following condition holds with probability greater than 1 – δ: |γ – γ’| < max{ε, εγ}, such that max{I} denotes the operator max over a given item set I.

Based on these theoretical constructs and concepts, we now focus on re-constructing the answer z to a given range query $Q = [l_1:u_1;\ l_2:u_2]$ over A from the approximating matrix A’ (or, equally, retrieving the approximate answer to Q, z’) and the probabilities $p_i$ (3) and $p_j$ (2) exploited by the CUR decomposition method to obtain A’ from A. For this theoretical setting, the re-constructing function γ’ we adopt, still inspired by [2], is defined as follows:

$$\gamma'(Q = [l_1:u_1;\ l_2:u_2]) = \sum_{i=l_1}^{u_1} \sum_{j=l_2}^{u_2} \left( A'[i][j] - \frac{(1 - p_i) \cdot p_j}{p_i \cdot (1 - p_j)} \cdot b \right) \qquad (8)$$

such that: (i) A’[i][j] denotes an element of A’; (ii) $p_i$ (3) denotes the probability of picking the i-th row of A during the CUR decomposition method; (iii) $p_j$ (2) denotes the probability of picking the j-th column of A during the CUR decomposition method; (iv) b is defined as follows:

$$b = \frac{\max\{A'\} - \min\{A'\}}{\max\{A\} - \min\{A\}} \qquad (9)$$

such that max{B} denotes the operator max over the elements of B, with B in {A, A’}, and min{B} denotes the operator min over the elements of B, with B in {A, A’}, respectively. Theorem 2 states that the re-constructing function γ’ (8) is an unbiased estimator for the function γ determined by the CUR decomposition method, under the following condition:

$$n \ge 4 \cdot \log\left(\frac{2}{\delta}\right) \cdot (p_i \cdot p_j \cdot \varepsilon)^{-2} \qquad (10)$$

such that: (i) n denotes the number of elements of A involved in the evaluation of Q; (ii) ε and δ are arbitrarily small positive real numbers; (iii) $p_i$ and $p_j$ are the probabilities (3) and (2), respectively, exploited by the CUR decomposition method.

Theorem 2. Let the value A[i][j] in [min{A’}, max{A’}] be estimated by the re-constructing function γ’; then γ’ is an n,ε,δ-unbiased-estimator for γ if the following condition holds: $n \ge 4 \cdot \log(2/\delta) \cdot (p_i \cdot p_j \cdot \varepsilon)^{-2}$.
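The definitions used by Theorem 2 can be traced on a toy matrix. The sketch below is a plain transcription of the range query evaluation (5)-(7), the sampling probabilities (2)-(3), the quantity b (9) and the re-constructing function (8); it assumes inclusive query bounds as in (6) and (7), and it uses a hypothetical matrix `A_APPROX` in place of a real CUR approximation, so it illustrates the formulas rather than the accuracy of the method. All function names are introduced here for the example.

```python
def range_sum(matrix, l1, u1, l2, u2):
    """Evaluate Q = [l1:u1; l2:u2] as x^T * M * y with 0/1 indicator vectors (5)-(7)."""
    m, n = len(matrix), len(matrix[0])
    x = [1 if l1 <= i <= u1 else 0 for i in range(m)]
    y = [1 if l2 <= j <= u2 else 0 for j in range(n)]
    return sum(x[i] * matrix[i][j] * y[j] for i in range(m) for j in range(n))


def sampling_probabilities(a):
    """Row probabilities p_i (3) and column probabilities p_j (2) from squared norms."""
    total = sum(v * v for row in a for v in row)
    p_rows = [sum(v * v for v in row) / total for row in a]
    p_cols = [sum(row[j] ** 2 for row in a) / total for j in range(len(a[0]))]
    return p_rows, p_cols


def scale_b(a, a_approx):
    """Quantity b (9): ratio between the value ranges of A' and A."""
    flat_a = [v for row in a for v in row]
    flat_ap = [v for row in a_approx for v in row]
    return (max(flat_ap) - min(flat_ap)) / (max(flat_a) - min(flat_a))


def correction(p_i, p_j, b):
    """Per-cell correction term subtracted from A'[i][j] in (8)."""
    return (1 - p_i) * p_j / (p_i * (1 - p_j)) * b


def gamma_prime(a_approx, p_rows, p_cols, b, l1, u1, l2, u2):
    """Re-constructing function (8), summed cell by cell over the range of Q."""
    return sum(a_approx[i][j] - correction(p_rows[i], p_cols[j], b)
               for i in range(l1, u1 + 1) for j in range(l2, u2 + 1))


A = [[4.0, 1.0, 3.0], [2.0, 5.0, 1.0], [3.0, 2.0, 6.0]]
A_APPROX = [[3.5, 1.5, 3.0], [2.5, 4.5, 1.0], [3.0, 2.0, 5.5]]  # hypothetical A'
P_ROWS, P_COLS = sampling_probabilities(A)
B = scale_b(A, A_APPROX)
Q = (0, 1, 1, 2)                      # rows 0..1, columns 1..2, inclusive
z = range_sum(A, *Q)                  # exact answer over A
z_approx = gamma_prime(A_APPROX, P_ROWS, P_COLS, B, *Q)
```

Keeping `correction` as a separate function makes it easy to check that the cell-by-cell sum in (8) agrees with evaluating the same range query over A’ and over the matrix of correction terms.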



Proof. Let $X_{ij}$ denote a random variable [24] for the event that element A[i][j] of A is perturbed, and the perturbed element A’[i][j] is contained by the interval [min{A’}, max{A’}]. It should be noted that the random variables $X_{ij}$ are i.i.d. [24], and that the probability that element A[i][j] of A is perturbed is given by the following formula:

$$P(X_{ij} = 1) = (1 - p_i) \cdot (1 - p_j) \cdot b \qquad (11)$$

As a consequence, the following formula holds:

$$P(X_{ij} = 0) = 1 - P(X_{ij} = 1) = 1 - (1 - p_i) \cdot (1 - p_j) \cdot b \qquad (12)$$

Likewise, let $Y_{ij}$ denote a random variable for the event that element A[i][j] of A is not perturbed, and it is contained by the interval [min{A’}, max{A’}]. Similarly to the case of random variables $X_{ij}$, the random variables $Y_{ij}$ are i.i.d., and the probability that element A[i][j] of A is not perturbed is given by the following formula:

$$P(Y_{ij} = 1) = p_i \cdot p_j \qquad (13)$$

In turn, the following formula holds:

$$P(Y_{ij} = 0) = 1 - P(Y_{ij} = 1) = 1 - p_i \cdot p_j \qquad (14)$$

Now, let $Z_{ij}$ denote a random variable for the event that, during the CUR decomposition method, element A[i][j] of A falls within the interval [min{A’}, max{A’}]. It follows that $Z_{ij}$ can be defined in terms of the previous random variables $X_{ij}$ and $Y_{ij}$, as follows:

$$Z_{ij} = X_{ij} + Y_{ij} \qquad (15)$$

due to the fact that, during the CUR decomposition method, an arbitrary element A[i][j] of A that falls within the interval [min{A’}, max{A’}] may be perturbed (i.e., $X_{ij}$ = 1 and $Y_{ij}$ = 0) or not (i.e., $X_{ij}$ = 0 and $Y_{ij}$ = 1). From (15), it follows that the random variables $Z_{ij}$ are i.i.d. and that the probability that element A[i][j] of A falls within the interval [min{A’}, max{A’}] is given by the following formula:

$$P(Z_{ij} = 1) = P((X_{ij} + Y_{ij}) = 1) = P(X_{ij} = 1) + P(Y_{ij} = 1) \qquad (16)$$

From (11) and (13), (16) is finally given by the following formula:

$$P(Z_{ij} = 1) = (1 - p_i) \cdot (1 - p_j) \cdot b + p_i \cdot p_j \qquad (17)$$

As a consequence, the following formula holds:

$$P(Z_{ij} = 0) = 1 - P(Z_{ij} = 1) = 1 - (1 - p_i) \cdot (1 - p_j) \cdot b - p_i \cdot p_j \qquad (18)$$

Furthermore, let $\Delta_1$ denote the range of Q on the dimension $d_1$ of A (A’, respectively). From (4), it follows that the cardinality of $\Delta_1$, $||\Delta_1||$, is given by the following formula:

$$||\Delta_1|| = u_1 - l_1 \qquad (19)$$

Similarly, let $\Delta_2$ denote the range of Q on the dimension $d_2$ of A (A’, respectively). From (4), it follows again that the cardinality of $\Delta_2$, $||\Delta_2||$, is given by the following formula:

$$||\Delta_2|| = u_2 - l_2 \qquad (20)$$

Also, let ||Q|| denote the volume (or selectivity [6]) of Q. Based on (4), (19) and (20), ||Q|| is given by the following formula:

$$||Q|| = ||\Delta_1|| \cdot ||\Delta_2|| \qquad (21)$$

such that $||\Delta_1||$ denotes the cardinality of $\Delta_1$, and $||\Delta_2||$ denotes the cardinality of $\Delta_2$, respectively.

Now, let $U_{ij}$ denote a random variable defined as the summation of random variables $Z_{ij}$ over the two-dimensional domain of A (A’, respectively) modeling the range of Q, i.e. $[l_1:u_1;\ l_2:u_2]$, that is defined as follows:

$$U_{ij}(\Delta_1, \Delta_2) = \sum_{h=i}^{i + ||\Delta_1|| - 1} \ \sum_{k=j}^{j + ||\Delta_2|| - 1} Z_{hk} \cdot A'[h][k] \qquad (22)$$

It should be noted that random variables $U_{ij}$ are those associated with the evaluation of the approximate answer to Q, z’, and that they underlie the definition of the re-constructing function γ’ (8). The number of elements of A’ involved in the evaluation of Q, n (or, equally, the number of items of γ’ and γ, respectively), is given by the following formula:

$$n = ||Q|| = ||\Delta_1|| \cdot ||\Delta_2|| \qquad (23)$$

How can the approximate evaluation of Q over A’ be modeled in a probabilistic manner? In order to answer this critical question, first note that each one among the n elements A’[i][j] of A’ may contribute (i.e., $U_{ij}$ = 1) or not (i.e., $U_{ij}$ = 0) to the approximate answer to Q, z’. Our final aim is to find probabilistic bounds for the probability P($U_{ij}$ = 1). Since random variables $Z_{ij}$ are i.i.d. and random variables $U_{ij}$ are defined as the summation of $Z_{ij}$, then $U_{ij}$ are independent Bernoulli random variables [24]. Under the condition (10), by applying the well-known Chernoff bound [24], the following inequality holds:

$$P\left( \left| U_{ij}(\Delta_1, \Delta_2) - n \cdot t \cdot AVG(\Delta_1, \Delta_2) \right| > n \cdot \theta \right) < 2 e^{-\frac{n \theta^2}{4t}} \qquad (24)$$

such that: (i) t = P($Z_{ij}$ = 1) (17); (ii) $AVG(\Delta_1, \Delta_2)$ denotes the average value of elements A’[i][j] of A’ contained by the two-dimensional range of Q, $[l_1:u_1;\ l_2:u_2]$; (iii) θ is defined as follows:

$$\theta = \sum_{i=l_1}^{u_1} \sum_{j=l_2}^{u_2} p_i \cdot p_j \cdot \varepsilon \qquad (25)$$

where $p_i$ and $p_j$ are the probabilities (3) and (2), respectively, exploited by the CUR decomposition method, and ε is an arbitrarily small positive real number; (iv) δ is an arbitrarily small positive real number. From (24), it follows that, with probability greater than 1 – δ, the following inequality holds:

$$\left| \gamma - \sum_{i=l_1}^{u_1} \sum_{j=l_2}^{u_2} \left( A'[i][j] - \frac{(1 - p_i) \cdot p_j}{p_i \cdot (1 - p_j)} \cdot b \right) \right| < \varepsilon \qquad (26)$$

from which it follows that |γ – γ’| < ε with probability 1 – δ, and that the re-constructing function γ’ (8) is an unbiased estimator for the function γ determined by the CUR decomposition method. Finally, for the sake of completeness, from (5) and (8) the approximate answer to Q, z’, can be obtained as follows:

$$z' = (x^T \cdot A' \cdot y) - (x^T \cdot C \cdot y) \qquad (27)$$

such that C is here an m × n matrix (distinct from the column matrix C of (1)) whose elements C[i][j] are defined as follows:

$$C[i][j] = \frac{(1 - p_i) \cdot p_j}{p_i \cdot (1 - p_j)} \cdot b \qquad (28)$$

where $p_i$ and $p_j$ are the probabilities (3) and (2), respectively, exploited by the CUR decomposition method, and b is the quantity (9).
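To give a feeling for the size condition (10) and the tail bound (24), the following sketch computes the smallest n that satisfies (10) and the corresponding right-hand side of (24). It rests on simplifying assumptions that are not stated in the proof: the logarithm is taken as natural, the probabilities are single hypothetical scalars rather than per-cell values, and θ is taken for one cell as the product of the two probabilities and ε instead of the double sum (25). The function names are introduced here for the example.

```python
import math


def min_cells(p_i, p_j, eps, delta):
    """Smallest integer n satisfying condition (10), with a natural logarithm."""
    return math.ceil(4.0 * math.log(2.0 / delta) / (p_i * p_j * eps) ** 2)


def prob_z(p_i, p_j, b):
    """t = P(Z_ij = 1) as in (17)."""
    return (1.0 - p_i) * (1.0 - p_j) * b + p_i * p_j


def chernoff_tail(n, theta, t):
    """Right-hand side of (24): 2 * exp(-n * theta^2 / (4 * t))."""
    return 2.0 * math.exp(-n * theta ** 2 / (4.0 * t))


P_I, P_J, B = 0.5, 0.4, 1.0      # hypothetical probabilities and scale b
EPS, DELTA = 0.1, 0.05           # hypothetical accuracy and confidence
n_min = min_cells(P_I, P_J, EPS, DELTA)
t = prob_z(P_I, P_J, B)
tail = chernoff_tail(n_min, P_I * P_J * EPS, t)   # single-cell theta (assumption)
```

Under these assumptions t does not exceed 1, so any n that satisfies (10) drives the bound in (24) below δ, which is the step the proof relies on when it moves from (24) to (26).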



3. THEORETICAL ANALYSIS AND RESULTS ON THE INDEPENDENCE CAPABILITIES OF THE CUR DECOMPOSITION METHOD

Theorem 3. The final global OLAP view $V^{GLOBAL}$ obtained from any arbitrary SUM-based secure OLAP aggregation task over a distributed environment populated by n nodes can be retrieved by combining the local OLAP view $V_0$ at node $N_0$, the privacy preserving OLAP view $V_0^{PP}$ at node $N_0$ and the privacy preserving OLAP view $V_{n-1}^{PP}$ at node $N_{n-1}$, without dependence on the OLAP views located at other nodes $N_i$, with 1 ≤ i ≤ n – 2, of the reference distributed environment, i.e.:

$$V^{GLOBAL} = V_0 + V_{n-1}^{PP} - V_0^{PP} \qquad (29)$$

Proof. Take as reference a distributed environment populated by n nodes. First, note that, given two consecutive nodes $N_{i-1}$ and $N_i$ in the fixed node ordering, such that 1 ≤ i ≤ n – 1, since we focus on SUM-based OLAP aggregations, the privacy preserving view $V_i^{PP}$ at node $N_i$ is obtained by combining the local view $V_i$ at node $N_i$ with the privacy preserving view $V_{i-1}^{PP}$ returned to node $N_i$ from node $N_{i-1}$, as follows (see Section 1):

$$V_i^{PP} = V_i + V_{i-1}^{PP} \qquad (30)$$

By contrast, for the sole instance represented by the first node $N_0$, the privacy preserving view $V_0^{PP}$ is directly obtained from the local view $V_0$ via the CUR-based approximation method (see Section 1). Hence, with respect to the privacy preserving views located at the nodes of the reference distributed environment, the following equalities hold:

$$\begin{aligned} V_0^{PP} &= CUR(V_0) \\ V_1^{PP} &= V_1 + V_0^{PP} \\ V_2^{PP} &= V_2 + V_1^{PP} \\ &\ \ \vdots \\ V_{n-1}^{PP} &= V_{n-1} + V_{n-2}^{PP} \end{aligned} \qquad (31)$$

Based on (30), by applying simple mathematical substitutions, (31) can be re-written as follows:

$$\begin{aligned} V_0^{PP} &= CUR(V_0) \\ V_1^{PP} &= V_1 + V_0^{PP} \\ V_2^{PP} &= V_2 + V_1^{PP} = V_2 + V_1 + V_0^{PP} \\ &\ \ \vdots \\ V_{n-1}^{PP} &= V_{n-1} + V_{n-2} + \ldots + V_1 + V_0^{PP} \end{aligned} \qquad (32)$$

From [9], due to the fact that SUM-based OLAP aggregation is a non-holistic operator [16], it is easy to demonstrate that the final global result of the target distributed OLAP aggregation task, i.e. the view $V^{GLOBAL}$, can be reconstructed as follows, as formally stated by Theorem 3:

$$V^{GLOBAL} = V_0 + V_{n-1}^{PP} - V_0^{PP}$$

Based on (32), (29) can be expanded as follows:

$$V^{GLOBAL} = V_0 + V_{n-1}^{PP} - V_0^{PP} = V_0 + \left( V_{n-1} + V_{n-2} + \ldots + V_1 + V_0^{PP} \right) - V_0^{PP} \qquad (33)$$

i.e.:

$$V^{GLOBAL} = V_0 + V_1 + V_2 + \ldots + V_{n-1} \qquad (34)$$

which, from Section 1, represents the (exact) final result of the target distributed OLAP aggregation task.

Theorem 3 is another relevant theoretical result of our research. It allows us to obtain the final global result of the target secure distributed OLAP aggregation task, $V^{GLOBAL}$, from the OLAP views stored at the first node $N_0$ of the reference distributed environment, $V_0$ and $V_0^{PP}$, one exact (i.e., $V_0$) and one privacy preserving (i.e., $V_0^{PP}$), and from the privacy preserving OLAP view $V_{n-1}^{PP}$ returned to node $N_0$ from node $N_{n-1}$, without dependence on the OLAP views (local and privacy preserving) of the other nodes $N_i$, with 1 ≤ i ≤ n – 2, of the reference distributed environment. Intuitively, this opens up interesting theoretical as well as query-optimization opportunities to be embedded within the proposed privacy preserving distributed OLAP framework. As a useful corollary deriving from Theorem 3 (Corollary 1), it follows that our proposed framework is orthogonal to the specific method used to obtain the privacy preserving view $V_0^{PP}$ at node $N_0$ (CUR, in our case), hence it maintains its validity and generality with any arbitrary privacy preserving method from the state-of-the-art literature (e.g., [11,20,26,29,30,31,12,23,3]).

Corollary 1. The proposed privacy preserving distributed OLAP framework is orthogonal to the method used to compute privacy preserving two-dimensional OLAP views.

4. CONCLUSIONS AND FUTURE WORK

Starting from our previous research result provided in [9], where a privacy preserving distributed OLAP framework has been presented and experimentally assessed, in this paper we have provided a number of theoretical results that extend the capabilities and the potentialities of that framework. These theoretical results are mainly related to some relevant capabilities of the CUR matrix decomposition method, which is the core tool for computing privacy preserving two-dimensional OLAP views within the framework [9]. Future work is mainly oriented to extending the theoretical results presented here so as to make them more robust, in order to cover two “difficult” privacy preserving distributed OLAP scenarios of the main framework [9], i.e. (i) the need for multi-resolution OLAP analysis across suitable dimensional hierarchies, and (ii) the presence of coalitions of attackers that may share partial knowledge in order to magnify the capabilities of sensitive data cell inference tasks.

5. REFERENCES

[1] S. Agrawal, J.R. Haritsa, B.A. Prakash, “FRAPP: A Framework for High-Accuracy Privacy-Preserving Mining”. Data Mining and Knowledge Discovery 18(1), pp. 101–139, 2009.

[2] R. Agrawal, R. Srikant, D. Thomas, “Privacy-Preserving OLAP”. In Proc. of SIGMOD, pp. 251–262, 2005.

[3] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, K. Talwar, “Privacy, Accuracy, and Consistency Too: A Holistic Solution to Contingency Table Release”. In Proc. of PODS, pp. 273–282, 2007.

[4] A.C.-F. Chan, C. Castelluccia, “A Security Framework for Privacy-Preserving Data Aggregation in Wireless Sensor Networks”. ACM Transactions on Sensor Networks 7(4), art. 29, 2011.

[5] C. Clifton, M. Kantarcioglu, X. Lin, J. Vaidya, M. Zhu, “Tools for Privacy Preserving Distributed Data Mining”. SIGKDD Explorations 4(2), pp. 28–34, 2002.

[6] G. Colliat, “OLAP, Relational, and Multidimensional Database Systems”. SIGMOD Record 25(3), pp. 64–69, 1996.

[7] A. Cuzzocrea, “Accuracy Control in Compressed Multidimensional Data Cubes for Quality of Answer-based OLAP Tools”. In Proc. of SSDBM, pp. 301–310, 2006.

[8] A. Cuzzocrea, “Privacy Preserving OLAP: Models, Issues, Algorithms”. In Proc. of MIPRO, pp. 1538–1543, 2011.

[9] A. Cuzzocrea, E. Bertino, “A Secure Multiparty Computation Privacy Preserving OLAP Framework over Distributed XML Data”. In Proc. of SAC, pp. 1666–1673, 2010.

[10] A. Cuzzocrea, V. Russo, “Privacy Preserving OLAP and OLAP Security”. In J. Wang (ed.), “Encyclopedia of Data Warehousing and Mining”, 2nd edition, IGI Global, pp. 1575–1581, 2009.

[11] A. Cuzzocrea, V. Russo, D. Saccà, “A Robust Sampling-based Framework for Privacy Preserving OLAP”. In Proc. of DaWaK, pp. 97–114, 2008.

[12] A. Cuzzocrea, D. Saccà, “Balancing Accuracy and Privacy of OLAP Aggregations on Data Cubes”. In Proc. of DOLAP, pp. 93–98, 2010.

[13] P. Drineas, R. Kannan, M.W. Mahoney, “Computing Sketches of Matrices Efficiently and Privacy Preserving Data Mining”. In Proc. of DIMACS PPDM, 2004. Available online at: http://dimacs.rutgers.edu/Workshops/Privacy/

[14] P. Drineas, R. Kannan, M.W. Mahoney, “Fast Monte Carlo algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition”. SIAM Journal on Computing 36(1), pp. 184–206, 2006.

[15] C. Dwork, “Differential Privacy: A Survey of Results”. In Proc. of TAMC, pp. 1–19, 2008.

[16] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, H. Pirahesh, “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals”. Data Mining and Knowledge Discovery 1(1), pp. 29–53, 1997.

[17] G.H. Golub, C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, 1989.

[18] J. Han, J. Pei, G. Dong, K. Wang, “Efficient Computation of Iceberg Cubes with Complex Measures”. In Proc. of SIGMOD, pp. 1–12, 2001.

[19] W. He, X. Liu, H. Nguyen, K. Nahrstedt, T. Abdelzaher, “PDA: Privacy-Preserving Data Aggregation for Information Collection”. ACM Transactions on Sensor Networks 8(1), art. 6, 2011.

[20] M. Hua, S. Zhang, W. Wang, H. Zhou, B. Shi, “FMC: An Approach for Privacy Preserving OLAP”. In Proc. of DaWaK, pp. 408–417, 2005.

[21] F. Li, B. Luo, P. Liu, “Secure and Privacy-Preserving Information Aggregation for Smart Grids”. International Journal of Security and Networks 6(1), pp. 28–39, 2011.

[22] X. Lin, R. Lu, X. Shen, “MDPA: Multidimensional Privacy-Preserving Aggregation Scheme for Wireless Sensor Networks”. Wireless Communications and Mobile Computing 10(6), pp. 843–856, 2010.

[23] Y. Liu, S.Y. Sung, H. Xiong, “A Cubic-Wise Balance Approach for Privacy Preservation in Data Cubes”. Information Sciences 176(9), pp. 1215–1240, 2006.

[24] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, 1984.

[25] B. Pinkas, “Cryptographic Techniques for Privacy-Preserving Data Mining”. SIGKDD Explorations 4(2), pp. 12–19, 2002.

[26] S.Y. Sung, Y. Liu, H. Xiong, P.A. Ng, “Privacy Preservation for Data Cubes”. Knowledge and Information Systems 9(1), pp. 38–61, 2006.

[27] S.K. Thompson, G.A.F. Seber, Adaptive Sampling, John Wiley & Sons, 1996.

[28] Y. Tong, G. Sun, P. Zhang, S. Tang, “Privacy-Preserving OLAP based on Output Perturbation Across Multiple Sites”. In Proc. of PST, p. 46, 2006.

[29] L. Wang, S. Jajodia, D. Wijesekera, “Securing OLAP Data Cubes against Privacy Breaches”. In Proc. of SP, pp. 161–175, 2004.

[30] L. Wang, D. Wijesekera, S. Jajodia, “Cardinality-based Inference Control in Data Cubes”. Journal of Computer Security 12(5), pp. 655–692, 2004.

[31] N. Zhang, W. Zhao, J. Chen, “Cardinality-based Inference Control in OLAP Systems: An Information Theoretic Approach”. In Proc. of DOLAP, pp. 59–64, 2004.
