Measuring Secrecy by the Probability of a Successful Guess - arXiv

Report 2 Downloads 28 Views
1

Measuring Secrecy by the Probability of a Successful Guess Ibrahim Issa and Aaron B. Wagner

arXiv:1507.02342v2 [cs.IT] 27 Jul 2015

Abstract The secrecy of a communication system in which both the legitimate receiver and an eavesdropper are allowed some distortion is investigated. The secrecy metric considered is the exponent of the probability that the eavesdropper estimates the source sequence successfully within an acceptable distortion level. When the transmitter and the legitimate receiver do not share any key and the transmitter is not subject to a rate constraint, a single-letter characterization of the highest achievable exponent is provided. Moreover, asymptotically-optimal universal strategies for both the primary user and the eavesdropper are demonstrated, where universality means independence of the source statistics. When the transmitter and the legitimate receiver share a secret key and the transmitter is subject to a rate constraint, upper and lower bounds are derived on the exponent by analyzing the performance of suggested strategies for the primary user and the eavesdropper. The bounds admit a single-letter characterization and they match under certain conditions, which include the case in which the eavesdropper must reconstruct the source exactly.

I. I NTRODUCTION To compromise the security of a communication network, an eavesdropper need not have direct access to the decrypted content of the transmitted packets. In fact, simply monitoring and analyzing the network flow may help an eavesdropper deduce sensitive information. For example, Song et al. [1] show that the Secure Shell (SSH) is vulnerable to what is called timing attacks. In SSH, each keystroke is immediately sent to the remote machine, and an eavesdropper can thus observe the timing of the keystrokes. It is shown that this information can be used to significantly speed up the search for passwords, and it is estimated that each consecutive pair of keystrokes leaks around 1 bit of information. Zhang and Wang [2] enhance the attack proposed in [1], and apply it in the setting of multi-user operating systems, in which a malicious user eavesdrops on other users’ keystrokes. Timing-based attacks appear also in various other settings, including: compromising the anonymity of users in networks [3,4], information leakage in the context of shared schedulers [5] and in the context of on-chip networks [6]. In this paper, we consider a stylized model of such information leakage problems, and call it the information blurring system. The setup, shown in Figure 1, consists of the following. A transmitter observes a sequence X n , which corresponds roughly to

Fig. 1.

Information blurring system: both the legitimate receiver and the eavesdropper are allowed a certain distortion level.

the original timing vector, and maps it to a sequence Y n that is observed by both the legitimate receiver and an eavesdropper. The mapping must almost surely satisfy a distortion constraint, which corresponds to some quality constraints imposed by the network (e.g., delay constraints). We do not require the mapping to be causal as the intent of this work is to provide fundamental limits for a simplified version of the information leakage problem. In broad terms, the transmitter wants to blur the information in X n (hence the name), so that it is no longer useful for the eavesdropper. For example, one approach is to artificially add noise to the input sequence. In that sense, the problem is related to methods for ensuring differential privacy, The authors are with the School of Electrical and Computer Engineering, Cornell University, Ithaca, NY (email: [email protected], [email protected]).

2

in which a curator wants to publicly release statistical information about a given population without compromising the privacy of its individuals [7,8]. Upon observing the output Y n , the eavesdropper, who knows the source statistics and the transmitter’s encoding function, tries to estimate X n . We introduce a distortion function and consider the eavesdropper’s estimate to be successful if the distortion it incurs is below a given level. Hence, we measure the secrecy guaranteed by a given scheme via the probability that the eavesdropper makes a successful guess. The primary user (i.e., the transmitter legitimate-receiver pair) aims then to minimize that probability. Since computing the exact probability is quite difficult, this paper will be mainly concerned with asymptotic analysis: we will derive the rate of decay (i.e., the exponent) of the probability of a successful guess. Other metrics for quantifying secrecy exist in the literature; we discuss the motivations and the shortcomings of the commonly used ones in Section II. For a discrete memoryless source (DMS), we provide a single-letter characterization of the optimal exponent. We show that the problem is related to source coding with side information. Essentially, the eavesdropper first attempts to guess the joint type of X n and Y n . S/he, then, “pretends” that Y n is received through a memoryless channel the probability law of which is the conditional probability P (Y |X) induced by the joint type. The problem can be viewed at this point as compression with side information, so the eavesdropper picks a codeword from an optimal rate-distortion code. The primary user’s objective, therefore, is to supply the “worst” side information. Moreover, we demonstrate asymptotically-optimal universal schemes for both the legitimate receiver and the eavesdropper. The schemes are universal in the sense that they do not depend on the source statistics. Next, we extend the study to the setup in which the transmitter is subject to a rate constraint, and the transmitter and the legitimate receiver have access to a common source of randomness, called the key. The eavesdropper has full knowledge of the encryption system, except for the realization of the key and the realization of X n . The setup is shown in Figure 2, and the special case in which the legitimate receiver and the eavesdropper must reconstruct the source exactly is known as the Shannon cipher system [9]. For the primary user, we analyze the performance of the following scheme: for each type class, an

Fig. 2. The Shannon cipher system with lossy communication: the transmitter and the legitimate receiver have access to a common key K, which consists of nr purely random bits, where r is called the key rate. The transmitter encodes X n using K, and sends a message M through a noiseless public channel of rate R. Both the legitimate receiver and the eavesdropper are allowed a certain level of distortion. The legitimate receiver generates the reconstruction Y n based on M and K, whereas the eavesdropper has access to M only to produce an estimate V n .

optimal rate-distortion code is generated; each sequence xn is mapped to the proper reconstruction in the code corresponding to its type; and the common key is used to randomize the choice of the reconstruction within that code. For the eavesdropper, two schemes are suggested. The first consists of generating a blind guess, i.e., completely ignoring the public message. The second consists of guessing the value of the key to reproduce the reconstruction at the legitimate receiver, and then applying the strategy developed in the first part of the paper. The suggested schemes do not depend on the source statistics, and the implied upper and lower bounds admit a single-letter characterization. Moreover, we show that they match under a given condition, which is satisfied in certain cases of interest. For instance, they match if the eavesdropper must reconstruct the source sequence exactly, or if the source is binary and the distortion functions are the Hamming distance with De ≤ D, where De and D are the allowed distortion levels at the eavesdropper and the legitimate receiver, respectively. They also match if the key rate is sufficiently high. Finally, it should be noted that Weinberger and Merhav studied the Shannon cipher system with lossy communication [10] (i.e, the setup of the second part of this paper), and independently suggested the same secrecy metric we proposed. Furthermore,

3

they allowed a variable key rate. However, they derived the optimal exponent under the assumption that the distortion constraint of the eavesdropper is more lenient than that of the legitimate receiver (which makes the no-key problem degenerate). Hence, the results of this paper and that of [10] are not comparable. II. S ECRECY M ETRIC The information-theoretic study of secrecy systems was initiated by Shannon in [9]. Shannon derived the following negative result: ensuring perfect secrecy, i.e., making the source sequence X n and the public message M (cf. Figure 2) statistically independent, requires that the key rate be at least as large as the message rate. As opposed to perfect secrecy, the notion of “partial” secrecy is more difficult to quantify. However, the impracticality of ensuring perfect secrecy, as implied by Shannon’s result, means that developing such a notion is important from a practical point of view as well as a theoretical one. Shannon used equivocation — the conditional entropy of the source sequence given the public message H(X n |M ) — as a “theoretical secrecy index”. A main motivation for equivocation was the similarity between the deciphering problem for the eavesdropper in the secrecy setting and the decoding problem for the receiver in the standard noisy communication setting [9]. Equivocation has subsequently been used as a secrecy metric in several works [11]– [17]. However, its use is not well motivated operationally. It only provides a lower bound on the exponent of the list size that the eavesdropper must generate to reliably include the source sequence. Moreover, Massey showed in [18] that the expected number of guesses that needs to be made to correctly guess a discrete random variable X may be arbitrarily large for arbitrarily small H(X). Merhav and Arikan [19] proposed a more direct approach: they consider an i.i.d. source and they measure secrecy by the expected number of guesses that the eavesdropper needs to make before finding the correct source sequence, which they denote by E[G(X n |M )], where G(.|m) is a “guessing” function defined for each possible public message m. This is intended to capture the scenario in which the eavesdropper has a testing mechanism to check whether or not his/her guess is correct. Such mechanism exists, for example, if the source message is a password to a computer account. When the source is discrete and memoryless, and the transmitter and the legitimate receiver have access to nr purely random common bits (where r is called the key rate), the optimal exponent of E[G(X n |M )] is found to be [19, Theorem 1]: E(P, r) , lim

n→∞

1 log E[G(X n |M )] = max {min{H(Q), r} − D(Q||P )} , Q n

(1)

where P is the source distribution and D(·||·) is the Kullback-Leibler (KL) divergence. Two issues arise with this metric. First, even if a testing mechanism exists, any practical system would only allow a small number of incorrect inputs. Thus, it is not clear how to interpret an exponentially large number of guesses. Second, and more importantly, it turns out that even highly-insecure systems can appear to be secure under this metric. Indeed, by modifying the asymptotically-optimal scheme proposed in [19], we can construct a scheme for the primary user that allows the eavesdropper to find the source sequence correctly with high probability by the first guess, and yet achieves the optimal exponent in (1). The scheme proposed in [19] operates on the source sequences on a type-by-type basis, and it yields:   E G(X n |M ) X n ∈ TQ ≥ 2n min{r,H(Q)}−o(n) ,

(2)

where o(n)/n → 0 as n → ∞, and TQ is the type class of a given type Q, i.e., the set of sequences with empirical distribution Q. Averaging over the probabilities of {TQ } yields the exponent in (1) (as a lower bound). However, this means that it is enough to apply the proposed scheme to the type class TQ that achieves the maximum of [min{H(Q), r} − D(Q||P )], whereas sequences belonging to other type classes can be sent with no encoding whatsoever with no effect on the exponent. Therefore, only a set with vanishing probability is encoded, whereas sequences outside that set are immediately known by the eavesdropper1 . A different approach, based on rate-distortion theory, was adopted by Yamamoto in [20]. A distortion function is introduced and the secrecy of a given scheme is measured by the minimum attainable expected distortion at the eavesdropper. Also, a 1 Merhav and Arikan actually characterize, for any ρ > 0, the exponent of E[Gρ (X n |M )]. This more general result can still yield large exponents for systems that are highly insecure, although one could potentially address this issue by requiring schemes that yield large exponents simultaneously over a range of ρ values.

4

certain level of distortion, possibly corresponding to a different distortion function, is allowed at the legitimate receiver. An earlier work by Yamamoto [21] considered the special case where no key is available, under the same secrecy metric. A standard example, discussed and generalized in [22], shows why expected distortion is inadequate: Suppose X n is a sequence of independent and identically distributed bits with Xi ∼ Ber(1/2), the transmitter and the legitimate receiver have access to one common bit K ∼ Ber(1/2), and the distortion function is the Hamming distance. The transmitter then sends the sequence X n as is if K = 0, and flips all its bits if K = 1. The induced expected distortion at the eavesdropper is then equal to 1/2, which is also the maximum expected distortion that the eavesdropper can possibly incur, since it is achievable even if the public message is not observed. However, this “optimal” scheme in fact reveals a lot about the true source sequence, namely, it is one of only two possible candidates. To overcome this limitation of expected distortion, Schieler and Cuff [23,24] allow the eavesdropper to generate an exponentially-sized list of estimates and propose the expected minimum distortion over the list as a secrecy metric. It is not clear, however, how to operationally interpret a list of exponential size. Moreover, this metric leads to a degenerate tradeoff between the key rate r, the allowed list exponent RL , and the expected minimum distortion in the list De . For example, if the legitimate receiver must reconstruct X n losslessly, one of two cases must occur (see [24, Theorem 1]): either the public message M is made completely useless to the eavesdropper when r > RL , and De is then given by the distortion-rate function at RL ; or, when r ≤ RL , the eavesdropper can trivially find the exact sequence by listing all the possible keys. Even when RL < r, the eavesdropper can still list 2nRL possible keys and thus recover exactly the correct sequence with probability at least 2n(RL −r) . As RL approaches r, this probability can be made to decay arbitrarily slowly. In this paper, we take a different approach. In many applications, the eavesdropper has no way to verify if his/her estimate is correct. This is particularly true in our main case of interest, i.e., timing of events. Moreover, as mentioned before, most practical systems allow a small number of incorrect guesses even if a testing mechanism exists. Therefore, we allow the eavesdropper to make one guess only. Secrecy is measured then by the probability that the guess is successful, i.e., the distortion incurred is below a given level. For the purposes of the asymptotic analysis in this paper, we will study only the exponent of the probability of a successful guess. A special case of such analysis was considered by Merhav in [25]. In particular, [25] is concerned with necessary and sufficient conditions for achieving the perfect secrecy exponent, which is the exponent attained by the eavesdropper in the absence of any observation. It is also restricted to the case in which both the legitimate receiver and the eavesdropper must reconstruct the source sequence exactly. Finally, a relevant earlier work by Arikan and Merhav [26] considers the problem of blindly guessing a random variable up to a distortion level and characterizes the least achievable exponential growth of the expected number of guesses. III. I NFORMATION B LURRING S YSTEM We consider the following secrecy system. 
Let X , Y, and V be the alphabets associated with the transmitter, the legitimate receiver, and the eavesdropper, respectively. The transmitter wants to provide the legitimate receiver with a quantized version of an n-length message X n = (X1 , X2 , · · · , Xn ). It thus generates a vector Y n through a (possibly randomized) function f : Y n = f (X n ). For a given distortion function d : X × Y → R+ , the quantization is required to satisfy a constraint of Pn the form d(X n , Y n ) = n1 i=1 d(Xi , Yi ) ≤ D for a given distortion level D. The restriction is imposed on each realization of (X n , Y n ). An eavesdropper, with an associated distortion function de : X × V → R+ , also observes Y n and generates a Pn guess V n = g(Y n ), aiming to have de (X n , V n ) = n1 i=1 de (Xi , Vi ) ≤ De for a given distortion level De . It is assumed that the eavesdropper knows the source statistics and the primary user’s encoding function f . The secrecy metric we adopt is the probability that the eavesdropper makes a successful guess, i.e., Pr(de (X n , V n ) ≤ De ). The primary user’s objective is to minimize this probability. So, the problem can be written as:     min max Pr de X n , gn (fn (X n )) ≤ De . fn

gn

We characterize the highest achievable exponent of the probability of a successful guess under the following assumptions: (A1) The source is memoryless. (A2) The alphabets X , Y and V are finite.

5

(A3) The distortion functions d and de are bounded, i.e., there exists Dmax and De,max such that, for all x ∈ X , y ∈ Y, and v ∈ V, d(x, y) ≤ Dmax and de (x, v) ≤ De,max . Moreover, D ≥ Dmin , where Dmin = maxx∈X miny∈Y d(x, y). Similarly, De ≥ De,min , where De,min = maxx∈X minv∈V de (x, v). We denote the optimal exponent by E(P, D, De ), where P is the source distribution, i.e.,     1 E(P, D, De ) = lim max min − log Pr de X n , gn (fn (X n )) ≤ De . n→∞ {fn } {gn } n

(3)

The existence of the limit will be seen later. We will show that the problem is related to source coding with side information, where Y n acts as side information for the eavesdropper. Therefore, the primary user’s job is to provide the “worst” side information subject to a distortion constraint of his/her own. To this end, we denote the conditional rate-distortion function as: R(PXY , De ) =

min

I(X; V |Y ),

(4)

max

R(PXY , De ).

(5)

PV |X,Y : E[de (X,V )]≤De

and define the quantity R(PX , D, De ) as: R(PX , D, De ) =

PY |X : E[d(X,Y )]≤D

Roughly speaking, when the joint type of X n and Y n is PXY , the eavesdropper can restrict the guessing space to 2nR(PXY ,De ) reconstruction sequences, knowing that at least one of them must satisfy the distortion constraint. The maximization in (5) corresponds to the primary user’s goal of maximizing that quantity. We prove the following properties of R(PXY , De ) and R(PX , D, De ) in Appendix A. Proposition 1: In the following statements, the domains of D and De are [Dmin , +∞) and [De,min , +∞), respectively. (P1) For fixed PXY , R(PXY , De ) is a finite-valued, non-increasing convex function of De . Furthermore, R(PXY , De ) is a uniformly continuous function of the pair (PXY , De ). (P2) For fixed PX , R(PX , D, De ) is a finite-valued function of (D, De ). Moreover, for fixed De , R(PX , D, De ) is a uniformly continuous function of the pair (PX , D). (P3) Re (PX , De ) − R(PX , D) ≤ R(PX , D, De ) ≤ Re (PX , De ), where R(PX , D) and Re (PX , De ) are the rate-distortion functions corresponding to the distortion constraints d and de , respectively. Our main result is the characterization of the optimal exponent as follows: Theorem 1: Under assumptions (A1)-(A3), for any DMS P , and distortion functions d and de with associated distortion levels D ≥ Dmin and De ≥ De,min , corresponding respectively to the primary user and the eavesdropper: E(P, D, De ) = min D(Q||P ) + R(Q, D, De ), Q

(6)

where Q ranges over all probability distributions on the source alphabet, and R(Q, D, De ) is as defined in (5). Remark 1: We do not require any -backoff for D or De to characterize the associated exponent. An interesting feature of Theorem 1 is the emergence of mutual information as part of the solution in (6), even though the setup does not include any rate constraints. Moreover, an interesting contrast can be seen between the expression in (1) for the expected number of guesses metric and the expression in (6) for our metric. Indeed, the former evaluates the performance of a given scheme asymptotically by a weighted best-case scenario, whereas the latter evaluates it by a weighted worst-case scenario. As an application of the theorem, we compute the perfect secrecy exponent, which we define as the best achievable exponent when the primary user is not subject to any constraint and denote it by E0 (P, De ). To this end, we introduce a trivial distortion

6

function: d(x, y) = 0, for all x ∈ X and y ∈ Y. Then, R(Q, D) = 0, for all Q and all D ≥ 0. It then follows from (P3) of Proposition 1 that R(Q, D, De ) = Re (Q, De ) for all Q. Therefore, E0 (P, De ) = min D(Q||P ) + Re (Q, De ). Q

(7)

We claim that this is the optimal exponent achievable by the eavesdropper when he/she needs to guess in the absence of any observation. This is easy to see if the eavesdropper must reconstruct the source exactly. In that case, Re (Q, 0) = H(Q) for all Q, and E0 (P, 0) is simply given by the exponent of the most likely sequence according to the prior distribution of the source. To prove the claim for any distortion constraint, let E∅ (P, De ) be the best achievable exponent for the eavesdropper in the absence of any observation. Then, E0 (P, De ) ≤ E∅ (P, De ), since the eavesdropper can always choose to ignore the output Y n . On the other hand, a valid scheme for the primary user, when the trivial constraint is imposed, is to deterministically send a fixed sequence y n for some y ∈ Y. Let Ec (P, De ) be the exponent associated with that scheme. Then, Ec (P, De ) ≤ E0 (P, De ). Finally, note that P (X n |y n ) = P (X n ). Therefore, Ec (P, De ) = E∅ (P, De ), yielding E0 (P, De ) = E∅ (P, De ). The next two subsections are devoted to proving Theorem 1. We first propose a scheme for the primary user and show that the induced exponent is lower-bounded by the right-hand side of (6). From the eavesdropper’s point of view, this is a converse result. Similarly, we propose a scheme for the eavesdropper and show that the induced exponent is upper-bounded by the right-hand side of (6), which establishes the desired result. We set some notation for the remainder of the paper. In the following, Z is an arbitrary discrete set, and Z is a random variable over Z. - The set of probability distributions over Z is denoted by PZ . - For a sequence z n ∈ Z n , Qzn is the empirical PMF of z n , also referred to as its type. - QnZ is the set of types in Z n , i.e., the set of rational PMF’s with denominator n. - For QZ ∈ QnZ , the type class of QZ is TQZ , {z n ∈ Z n : Qzn = QZ }. - EQ [·], HQ (·), and IQ (·; ·) denote respectively expectation, entropy, and mutual information taken with respect to distribution Q. - All logarithms and exponentials are taken to the base 2. A. Achievability for the Primary User (Eavesdropper’s Converse Result) Let E − (P, D, De ) = lim inf max min − n→∞ {fn } {gn }

    1 log Pr de X n , gn (fn (X n )) ≤ De . n

(8)

We will show that E − (P, D, De ) ≥ minQ D(Q||P ) + R(Q, D, De ). The primary user will operate on the source sequences on a type-by-type basis. For each type QX ∈ QnX , we create a rate distortion code CQX to cover each sequence in TQX as follows. We associate with QX a joint type QXY from QnX Y (QX , D)2 : QnX Y (QX , D) = {PXY ∈ QnX Y : PX = QX , EPXY [d(X, Y )] ≤ D}.

(9)

The code is then constructed from TQY as given by the following lemma, which bounds the size of the code. Lemma 2: Given  > 0, there exists n0 (, |X |, |Y|) such that for any n ≥ n0 , for each joint type QXY ∈ QnX Y , there exists n a code (y1n , y2n , · · · , yN ) such that N ≤ 2n(IQXY (X;Y )+) , and for all xn ∈ TQX , there exists i satisfying (xn , yin ) ∈ TQXY . The proof of the lemma is given in Appendix B.

Remark 2: One might be tempted to use an optimal rate-distortion code for each type QX , presuming that this choice is best at preserving secrecy since it achieves optimal compression, i.e., it only sends the necessary information. However, the problem is more subtle since the “redundancy of information” depends on the eavesdropper’s distortion constraint de . The optimal choice of QXY will be revealed when analyzing the eavesdropper’s optimal strategy. 2 Assumption

(A3) guarantees that Qn X Y (QX , D) is nonempty for any QX .

7

n Now, fix  > 0 and let n be at least as large as n0 in Lemma 2. We will denote by CQ the rate distortion code associated X n with type QX . Thus, the function f of the primary user is as follows: each sequence xn is mapped to a sequence y n ∈ CQ xn

satisfying Qxn yn = QXY (where QXY is associated with Qxn ) and subsequently d(xn , y n ) ≤ D. To determine the eavesdropper’s optimal guess, we define BDe (v n ) = {xn ∈ X n : de (xn , v n ) ≤ De }. Then, for each observed y n , the optimal rule is given by X

g(y n ) = argmax v n ∈V n

p(xn |y n ).

xn ∈BDe (v n )

This can be understood as the MAP rule, and we denote in the remainder by3 go (where “o” stands for optimal). To upper-bound the probability of a correct guess, we consider a genie-aided rule that is aware of the type of the transmitted source sequence. That is, the genie-aided MAP rule yields X

go (y n , QX ) = argmax v n ∈V n

p(xn |y n , X n ∈ TQX ).

xn ∈BDe (v n )∩QX

Remark 3: One should not expect the upper bound to be loose since there are only polynomially many types in n, so that the exponent is not affected. −1 For a given y n , let fQ (y n ) = {xn ∈ TQX : f (xn ) = y n } be the set of sequences in TQX that are mapped to it. Then, X −1 the observation of y n implies that X n ∈ fQ (y n ), and the genie-aided MAP rules makes a successful guess if X n ∈ X

BDe (go (y n , QX )). Therefore, we will derive an upper bound on the maximum possible size of the intersection of these two sets. First, note that, xn ∈ TQX and f (xn ) = y n implies that Qxn yn = QXY , where QXY is the joint type associated with −1 QX . So fQ (y n ) ⊆ TQX|Y (y n ) , {xn ∈ TQX : (xn , y n ) ∈ TQXY }. Now, consider any v n ∈ V n , X \ \ −1 n n n (y ) ≤ (v ) T (y ) BDe (v n ) fQ B D Q e X|Y X X X (a) = 1 PXY V ∈Qn xn : X YV : (xn ,y n ,v n )∈TPXY V PXY =QXY EPXY V [de (X,V )]≤De PY V =Qyn vn (b)

X

=

|TPX|V,Y (v n , y n )|

n

PXY V ∈Q (QXY ,De ): PY V =Qyn vn

≤ (n + 1)|X ||Y||V|

max

|TPX|V,Y (v n , y n )|

max

2nHPXY V (X|V,Y ) ,

PXY V ∈Qn (QXY ,De ): PY V =Qyn vn

(b)

≤ (n + 1)|X ||Y||V|

PXY V ∈Qn (QXY ,De ): PY V =Qyn vn

(10)

where (a) follows from the fact that (xn , y n , v n ) ∈ TPXY V ⇒ PY V = Qyn vn . (b) follows from the definition of Qn (QXY , De ) as: Qn (QXY , De ) = {PXY V ∈ QnX YV : PXY = QXY , EPXY V [de (X, V )] ≤ De }.

(11)

(b) follows from Lemma 1.2.5 in [27]. Therefore, for large enough n, we get \ −1 n n max (v ) f (y ) B ≤ D e Q X n n v ∈V

3 The

max

PXY V ∈Qn (QXY ,De )

2n(HPXY V (X|V,Y )+) .

MAP rule depends on f and thus should be denoted by go,f . Since this is obvious, we drop the subscript f for notational convenience.

(12)

8

Let Pn? (QXY ) be the joint type achieving the max in (12), where the dependence on De is suppressed since it is fixed throughout the analysis. We can now upper-bound the probability that the eavesdropper makes a successful guess as follows:         Pr de X n , go (f (X n )) ≤ De ≤ Pr de X n , go (f (X n ), QX n ) ≤ De . X   P (xn )1 xn ∈ BDe go f (xn ), Qxn = xn ∈X n

X

=

X

n n QX ∈Qn X y ∈CQ

(a)

X X

X

X

QX ∈Qn X

n y n ∈CQ X

≤ (b)

xn ∈TQX : f (xn )=y n

2n(−D(QX ||P )−HQX (X)) 2n(HPn? (QXY ) (X|V,Y )+)

2n(IQXY (X;Y )+−D(QX ||P )−HQX (X)+HPn? (QXY ) (X|V,Y )+)

X



  P (xn )1 xn ∈ BDe go y n , QX

QX ∈Qn X

2n(−D(QX ||P )−HQXY (X|Y )+HPn? (QXY ) (X|V,Y )+2)

X

=

QX ∈Qn X

2−n(D(QX ||P )+IPn? (QXY ) (X;V |Y )−2) ,

X

=

(13)

QX ∈Qn X

where (a) follows from (12). (b) follows from Lemma 2. To interpret the exponent in (13), note that Pn? (QXY ) minimizes I(X; V |Y ) over Qn (QXY , De ) (follows readily from (12)). Therefore, IPn? (QXY ) (X; V |Y ) is roughly R(QXY , De ). The eavesdropper’s scheme can then be seen as picking a codeword from an optimal rate-distortion code that uses side information generated according to QY |X . Since QXY is the choice of the primary user, who is interested in maximizing the exponents in (13), we define for each QX ∈ QnX : Q? (QX ) =

argmax QXY ∈Qn X Y (QX ,D)

IPn? (QXY ) (X; V |Y ),

where we have again suppressed the dependence on D and De in the notation. Remark 4: The maximization does not depend on the source statistics, and consequently neither does the proposed encoding function f . With a slight abuse of notation, we rewrite Pn? (Q? (QX )) as Pn? (QX ) to get IPn? (QX ) (X; V |Y ) =

max

QXY ∈ Qn X Y (QX ,D)

We can now rewrite (13) as     Pr de X n , go (f (X n )) ≤ De ≤

IPn? (QXY ) (X; V |Y ) =

X

max

QXY ∈ Qn X Y (QX ,D)

min

QXY V ∈ Qn (QXY ,De )

IQXY V (X; V |Y ).

(14)

2−n(D(QX ||P )+IPn? (QX ) (X;V |Y )−2)

QX ∈Qn X

≤ (n + 1)|X | maxn 2−n(D(QX ||P )+IPn? (QX ) (X;V |Y )−2) QX ∈QX    |X | = (n + 1) exp −n −2 + min n [D(QX ||P ) + IPn? (QX ) (X; V |Y )] . QX ∈QX

Taking the limit as n goes to infinity, and noting that  is arbitrary, we get     1 E − (P, D, De ) = lim inf max min − log Pr de X n , gn (fn (X n )) ≤ De ≥ min D(Q||P ) + R(Q, D, De ), n→∞ {fn } {gn } Q n where the last inequality follows from the following proposition, the proof of which is given in Appendix C. Proposition 3: lim

min [D(QX ||P ) + IPn? (QX ) (X; V |Y )] = min D(Q||P ) + R(Q, D, De ).

n→∞ QX ∈Qn X

Q

(15)

(16)

9

TABLE I S UMMARY OF THE DEFINED SETS . Set Notation Qn X Y (QX , D) Qn X Y (QY , D) Qn (QXY , De )

Description PXY ∈ Qn : P = QX , EPXY [d(X, Y )] ≤ D. X XY PXY ∈ Qn X Y : PY = QY , EPXY [d(X, Y )] ≤ D. PXY V ∈ Qn X YV : PXY = QXY , EPXY V [de (X, V )] ≤ De .

B. Converse for the Primary User (Eavesdropper’s Achievability Result) Let E + (P, D, De ) = lim sup max min − n→∞

{fn } {gn }

    1 log Pr de X n , gn (fn (X n )) ≤ De . n

(17)

We will now show that E + (P, D, De ) ≤ minQ D(Q||P ) + R(Q, D, De ). This means that the eavesdropper can achieve the exponent in (6) for any function f the primary user implements. We propose a two-stage scheme for the eavesdropper. In the first stage, observing y n , s/he tries to guess the joint type of xn and y n by choosing an element uniformly at random from the set QnX Y (Qyn , D), where QnX Y (QY , D) = {PXY ∈ QnX Y : PY = QY , EPXY [d(X, Y )] ≤ D}.

(18)

The correct joint type must fall in this set since the restriction d(X n , Y n ) ≤ D is imposed on each realization of (X n , Y n ). We denote the function corresponding to this stage by g1 : Y n → QnXY . Remark 5: We differentiate between QnX Y (QY , D) and QnX Y (QX , D) by their first argument. A summary of the defined sets is given in Table I. The eavesdropper then proceeds assuming g1 (y n ) is the correct joint type. S/he randomly chooses a sequence from a set that covers TQX|Y (y n ). To this end, we associate with each joint type QXY a joint type QXY V from Qn (QXY , De ) (cf. Table I), and generate a sequence uniformly at random from TQV |Y (y n ), where QV |Y is the conditional probability induced by QXY V . We denote the function corresponding to this stage by g2 : Y n × QnXY → V n . Thus, g(y n ) = g2 (y n , g1 (y n )). Remark 6: The above strategy does not depend on the specifics of the function f implemented by the primary user, i.e., it only uses the fact that d(X n , f (X n )) ≤ D. It is also independent of the source statistics. The following lemma lower-bounds the probability that g2 (y n , QXY ) generates a sequence V n satisfying de (xn , V n ) ≤ De , for a given pair (xn , y n ) ∈ TQXY , i.e., assuming the eavesdropper guesses the joint type correctly. Lemma 4: Given joint type QXY V ∈ QnXY V and (xn , y n ) ∈ TQXY , if V n is chosen uniformly at random from TQV |Y (y n ),  then Pr V n ∈ TQV |X (xn ) ≥ cn 2−nIQXY V (X;V |Y ) , where cn = (n + 1)−|X ||Y||V| . Proof: Pr V

n

  TQV |X,Y (xn , y n ) cn 2nH(V |X,Y ) n n n ∈ TQV |X (x ) ≥ Pr V ∈ TQV |X,Y (x , y ) = ≥ = cn 2−nI(X;V |Y ) . nH(V |Y ) TQ (y n ) 2 V |Y n

where the second inequality follows from Lemma 1.2.5 in [27]. Since the eavesdropper is interested in maximizing this probability, s/he will associate, with each QXY , the joint type achieving the maximum: Pn? (QXY ) =

argmin

I(X; V |Y ).

PXY V ∈Qn (QXY ,De )

Note that this is the same joint type achieving the maximum in (12). We can now lower-bound the probability that xn ∈ BDe (g(y n )), for a given pair (xn , y n ) satisfying d(xn , y n ) ≤ D.

(19)

10

 Lemma 5: Given (xn , y n ) ∈ X n × Y n satisfying d(xn , y n ) ≤ D, Pr xn ∈ BDe g(y n ) ≥ c0n 2−nIPn? (QXY ) (X;V |Y ) , where c0n = (n + 1)−|X ||Y|(|V|+1) , QXY = Qxn yn , and g is as described above. Proof:    p(g1 (y n ) = Q0XY )p xn ∈ BDe g2 (y n , Q0XY )

X

Pr(xn ∈ BDe (g(y n ))) =

n Q0XY ∈Qn XY (Qy ,D)

   ≥ p(g1 (y n ) = Qxn yn )p xn ∈ BDe g2 (y n , Qxn yn )    ≥ (n + 1)−|X ||Y| p xn ∈ BDe g2 (y n , QXY ) ≥ (n + 1)−|X ||Y|(|V|+1) 2−nIPn? (QXY ) (X;V |Y ) , where the last inequality follows from Lemma 4. We now show that the above described scheme indeed achieves the exponent in (6). Consider any possibly random function f implemented by the primary user (and satisfying the distortion constraint), and denote by Pf the induced joint probability on (X n , Y n ). Now, consider the following chain of inequalities.     X X  Pr de X n , g(f (X n )) ≤ De = P (xn )Pf (y n |xn )p xn ∈ BDe g(y n ) xn ∈X n y n ∈Y n (a)

X

≥ c0n

X

P (xn )Pf (y n |xn )2

? (Q n n ) (X;V |Y ) −nIPn x y

xn ∈X n y n ∈Y n



c0n

=

c0n

X

P (xn )

xn ∈X n

X

X y n ∈Y n

X

P (xn )

n QX ∈Qn X x ∈TQX

(b)

X

X

QX ∈Qn X

xn ∈TQX

= c0n

(c)

Pf (y n |xn )

≥ c0n (n + 1)−|X |

min

n QXY ∈Qn X Y (Qx ,D)

min

QXY ∈Qn X Y (QX ,D)

2−nIPn? (QXY ) (X;V |Y )

2−nIPn? (QXY ) (X;V |Y )

2−n(D(QX ||P )+HQX (X)) 2−nIPn? (QX ) (X;V |Y ) X

2−n(D(QX ||P )+IPn? (QX ) (X;V |Y ))

QX ∈Qn X

≥ c0n (n + 1)−|X | maxn 2−n(D(QX ||P )+IPn? (QX ) (X;V |Y )) QX ∈QX    = c0n (n + 1)−|X | exp −n min n [D(QX ||P ) + IPn? (QX ) (X; V |Y )] , QX ∈QX

(20)

where (a) follows from Lemma 5. (b) follows from (19) and (14).  (c) follows from Lemma 1.2.3 in [27] |TQX | ≥ (n + 1)−|X | 2nHQX (X) . Taking the limit as n goes to infinity, we get E + (P, D, De ) = lim sup max min − n→∞

{fn } {gn }

    1 log Pr de X n , gn (fn (X n )) ≤ De ≤ min D(Q||P ) + R(Q, D, De ), Q n

(21)

where the last inequality follows from Proposition 3. Combining (21) and (16) yields that the limit in (3) exists and is equal to the expression given in (6), thus establishing Theorem 1. IV. L OSSY C OMMUNICATION FOR THE S HANNON C IPHER S YSTEM We now consider the setup of the Shannon cipher system with lossy communication. More precisely, the transmitter is subject to a rate constraint, and the transmitter and legitimate receiver share common randomness K ∈ K = {0, 1}nr , where r > 0 denotes the rate of the key. K is uniformly distributed over K and is independent of X n . The transmitter sends a message M = f (X n , K) to the receiver over a noiseless channel at rate R, i.e., M ∈ M = {0, 1}nR . The receiver, then, generates

11

   Y n = h(M, K). Both functions f and h are allowed to be stochastic, but must satisfy d X n , h f (X n , K), K ≤ D for all realizations of (X n , K, Y n ). The message M is overheard by the eavesdropper who knows the statistics of the source and the encoding and decoding functions f and h. However, s/he does not have access to the common randomness K. As before, the relevant secrecy metric is the probability of a successful guess, i.e., a guess V n = g(M ) satisfying de (X n , V n ) ≤ De . The optimal guess is determined, again, by the MAP rule go . We assume (A1)-(A3) (given in Section III) hold throughout this section. We further assume4 (A4) R > maxQ R(Q, D). → − → − → − → − Let D = (D, De ) and R = (R, r). For a given DMS P , distortion vector D, and rate vector R , we denote the optimal → − → − exponent by E(P, D , R ), i.e.,     → − → − 1 E(P, D , R ) = lim max min − log Pr de X n , gn (fn (X n , K)) ≤ De , (22) n→∞ {fn } {gn } n → − → − → − → − whenever the limit exists. Otherwise, we define E(P, D , R ) using the lim inf. Similarly to (22), we define E − (P, D , R ) and → − → − E + (P, D , R ) using the lim inf and lim sup, respectively. The main result of this section is given by the following bounds on E − and E + . Theorem 2: Under assumptions (A1)-(A4), for any DMS P , and distortion functions d and de with associated distortion levels D ≥ Dmin and De ≥ De,min , corresponding respectively to the primary user and the eavesdropper: 1) Lower bound:

 + → − → − E − (P, D , R ) ≥ min D(Q||P ) + Re (Q, De ) − R(Q, D) + min{r, R(Q, D)} , Q

(23)

where [a]+ , max{0, a}. 2) Upper bound: → − → − E + (P, D , R ) ≤ min {r + E(P, D, De ), E0 (P, De )} = min D(Q||P ) + min {r + R(Q, D, De ), Re (Q, De )} . Q

(24) (25)

Remark 7: As opposed to the results in [23,24], both the lower and the upper bound are continuous in the key rate r. Remark 8: Under the leniency assumption of [10], it is easy to check that E(P, D, De ) = 0 so that (24) coincides with Theorem 1 of [10]. The lower bound (23), however, does not necessarily coincide with it. The lower bound is derived by analyzing a simple scheme for the primary user. For each type Q, we associate a good rate-distortion code, and we use the shared key K to randomize the codeword associated with a given source sequence xn . → − → − To interpret the result, let E − (P, D , R , Q) = [Re (Q, De ) − R(Q, D) + min{r, R(Q, D)}]+ . This can be understood as the exponent conditioned on X n ∈ TQ (assuming, for simplicity, Q is a valid type). If r ≥ R(Q, D), the expression is reduced to Re (Q, De ). Indeed, r ≥ R(Q, D) means that the number of available keys is larger than the codebook size needed to cover TQ (in the d-distortion sense). The primary user can therefore completely “hide” the source sequence within the type class. The Re (Q, De ) exponent then follows from the fact that sequences of the same type are equally likely and the effective size of TQ , under distortion function de , is 2nRe (Q,De ) . As for the upper bound, it easy to see how the first exponent in (24) is achieved. Namely, the eavesdropper tries to guess the value of the key and then applies the scheme suggested in the previous section. The second exponent is the perfect secrecy exponent (given in (7)), which we have already shown to be achievable even if the eavesdropper has to guess in the absence of any observation. The following corollary gives sufficient conditions for the lower and upper bounds to match. We will demonstrate two interesting cases in which the corollary holds. The first is the case in which the eavesdropper must reconstruct exactly, and the second is the case in which the source is binary, the distortion functions are both the Hamming distance, and De ≤ D. 4 For

the primary user’s problem to be feasible, it is necessary to have R ≥ maxQ R(Q, D).

12

Corollary 1: Under assumptions (A1)-(A4), for any DMS P , and distortion functions d and de with associated distortion levels D ≥ Dmin and De,min , corresponding respectively to the primary user and the eavesdropper: (a) If r ≥ E0 (P, De ) + maxQ [−D(Q||P ) − Re (Q, De ) + R(Q, D)], then the limit in (22) exists and is equal to → − → − E(P, D , R ) = E0 (P, De ).

(26)

(b) If for all Q, R(Q, D, De ) = Re (Q, De ) − R(Q, D), then the limit in (22) exists and is equal to → − → − E(P, D , R ) = min D(Q||P ) + min{r + Re (Q, De ) − R(Q, D), Re (Q, De )}.

(27)

Q

We first prove the corollary and discuss examples in which it applies in Section IV-A. We prove the lower and the upper bound of Theorem 2 in Sections IV-B and IV-C, respectively. A. Proof and Applications of Corollary 1 (a) If r ≥ E0 (P, De ) + maxQ [−D(Q||P ) − Re (Q, De ) + R(Q, D)], then the bound in (23) can be rewritten as follows → − → − E − (P, D , R ) ≥ min min {r + D(Q||P ) + Re (Q, De ) − R(Q, D), D(Q||P ) + Re (Q, De )} Q

≥ min min {E0 (P, De ), D(Q||P ) + Re (Q, De )} Q

= E0 (P, De ). Combining this with the bound in (24) yields (26). The given condition is similar to the result of [25]. There, both the eavesdropper and the legitimate receiver had to reconstruct the source sequence exactly, and the available key rate was a function of the type of the sequence. It was shown that it is sufficient (and necessary) to have r(Q) ≥ E0 (P, 0) − D(Q||P ), for all Q, to achieve the perfect secrecy exponent E0 (P, 0). Remark 9: One can verify that the given condition implies r + E(P, D, De ) ≥ E0 (P, De ) as follows:     r + E(P, D, De ) ≥ E0 (P, De ) + max[−D(Q||P ) − Re (Q, De ) + R(Q, D)] + min D(Q||P ) + R(Q, D, De ) Q Q     ≥ E0 (P, De ) − min[D(Q||P ) + Re (Q, De ) − R(Q, D)] + min D(Q||P ) + Re (Q, De ) − R(Q, D) Q

Q

= E0 (P, De ), where the second inequality follows from (P3) in Proposition 1. Remark 10: The discussion following the statement of Theorem 2 suggests that r ≥ maxQ R(Q, D) is sufficient to achieve E0 (P, De ). This is, indeed, true as it implies the condition of the corollary: D(Q||P ) + Re (Q, De ) − D(Q||P ) − Re (Q, De ) + R(Q, D) = R(Q, D), ⇒ min [D(Q0 ||P ) + Re (Q0 , De )] − D(Q||P ) − Re (Q, De ) + R(Q, D) ≤ R(Q, D), 0 Q

⇒ E0 (P, De ) + max[−D(Q||P ) − Re (Q, De ) + R(Q, D)] ≤ max R(Q, D). Q

Q

(b) We can relax the bound in (23) and rewrite it as → − → − E − (P, D , R ) ≥ min D(Q||P ) + min{r + Re (Q, De ) − R(Q, D), Re (Q, De )}. Q

Then, in view of (25), (27) clearly holds if R(Q, D, De ) = Re (Q, De ) − R(Q, D) for all Q. We now give two applications of the corollary.

13

1) Perfect Reconstruction at the Eavesdropper: Suppose V = X , and the eavesdropper is required to reconstruct the source sequence perfectly, i.e., the secrecy metric is Pr(V n = X n ). In our formulation, this is equivalent to setting de to be the Hamming distance and De to 0. Then, for each Q, we get R(Q, D, 0) =

max

PY |X : E[d(X,Y )]≤D

R(PXY , 0) =

max

PY |X : E[d(X,Y )]≤D

H(X|Y ) = HQ (X) − R(Q, D).

Since Re (Q, 0) = HQ (X), then R(Q, D, 0) = Re (Q, 0) − R(Q, D), for all Q. Thus, the exponent is determined in this case by Corollary 1:

→ − → − E(P, D , R ) = min D(Q||P ) + min{r + HQ (X) − R(Q, D), HQ (X)}. Q

If, moreover, the legitimate receiver has to reconstruct exactly, we get − → − → E(P, 0 , R ) = min D(Q||P ) + min{r, H(Q)} = min{r, E0 (P, 0)}, Q

where the last equality follows from the fact that r = minQ [D(Q||P ) + r]. If r ≥ E0 (P, 0), the exponent is given by the perfect secrecy exponent E0 (P, 0), which can also be seen from the result of [25]. Otherwise, the exponent is r, which implies that the eavesdropper’s optimal strategy is to simply guess the value of the key. 2) Binary Source with Hamming Distortion and De ≤ D: Suppose X = Y = V = {0, 1}, d and de are both the Hamming distance, and De ≤ D. We will show that, for all Q, R(Q, D, De ) = R(Q, De ) − R(Q, D),

(28)

where we dropped the subscript e from Re (Q, De ), since the distortion functions are the same. That the left-hand side is greater than or equal to the right-hand side has already been established in Proposition 1. Thus, we only need to show the other direction. If D ≥ 1/2, one can readily verify that R(Q, D, De ) = R(Q, De ), which satisfies (28) since R(Q, D) = 0. The more interesting case is when De ≤ D < 1/2. We prove the following lemma in Appendix D. Lemma 6: If De ≤ D < 1/2,    0, H(Q) ≤ H(De ),   R(Q, D, De ) = H(Q) − H(De ), H(De ) ≤ H(Q) ≤ H(D),    H(D) − H(D ), H(Q) ≥ H(D). e It is straightforward to check that R(Q, D, De ) = R(Q, De ) − R(Q, D) in all cases. The exponent is then given by:    min {D(Q||P ) : H(Q) ≤ H(De )} ,   → − → − E(P, D , R ) = min min {D(Q||P ) + H(Q) − H(De ) : H(De ) ≤ H(Q) ≤ H(D)} ,    min {D(Q||P ) + H(D) − H(D ) + min{r, H(Q) − H(D)} : H(Q) ≥ H(D)} . e If X ∼ Ber(1/2), then D(Q||P ) = 1 − H(Q), and the minimums corresponding to the first two cases reduce to 1 − H(De ). The third minimum can be computed as follows: min

Q: H(Q)≥H(D)

1 − H(De ) + min{r + H(D) − H(Q), 0} = 1 − H(De ) + min{r + H(D) − 1, 0}.

Therefore, part (b) of Corollary 1 implies that  1 − H(D ), r ≥ 1 − H(D), → − → − e E(P, D , R ) = r + H(D) − H(D ), r < 1 − H(D). e

14

→ − → − For comparison, part (a) of Corollary 1 implies that E(P, D , R ) = 1 − H(De ), i.e., the perfect secrecy exponent, can be achieved if r ≥ 1 − H(D). This is because maxQ [−D(Q||P ) − Re (Q, De ) + R(Q, D)] = H(De ) − H(D). Note that, in this example, the condition r ≥ 1 − H(D) is also necessary to achieve the perfect secrecy exponent. The resulting expression when r < 1 − H(D) admits a simple geometric explanation, shown in Figure 3 below. Upon

Fig. 3. The dots represent sequences in a type class TQ . Each of the 2nr non-dashed circles represents a Hamming-distortion ball of radius D, corresponding to a possible reconstruction at the legitimate receiver. Thus, dots within the circle (in blue) represent candidate source sequences. The dashed circle represents the distortion ball of radius De around the eavesdropper’s reconstruction, and it fits entirely in a non-dashed circle.

observing the public message, the candidate source sequences are clustered into 2nr balls. Each ball corresponds to a possible value of the key K, and has volume 2nH(D) since it is the pre-image of a possible reconstruction at the legitimate receiver. For the eavesdropper, the maximum volume of the ball that he/she can generate to “engulf” candidate sequences is 2nH(De ) . Due to the structure of Hamming distortion, this maximally-sized ball can fit entirely into any one of the clusters, so that the probability of a successful guess is 2nH(De ) 2−n(r+H(D)) . B. Proof of the Inner Bound → − → − We now prove that E − (P, D , R ) ≥ minQ D(Q||P ) + [Re (Q, De ) − R(Q, D) + min{r, R(Q, D)}]+ by demonstrating an encoding-decoding strategy for the primary user that achieves the given exponent. n that For a given  > 0, let n be large enough such that we can associate, with each QX ∈ QnX , a rate-distortion code CQ X n(R(QX ,D)+) n ≤ 2 . This is feasible by the type covering satisfies the following: each sequence xn ∈ TQX is covered and CQ  Xn nr  n C /2 bins, each of size 2nr , except for possibly the lemma (Lemma 2.4.1 in [27]). We divide the codebook CQ into QX X n n last one. We denote by CQ (i, .) the ith partition of the codebook, and by CQ (i, j) the jth codeword in the ith partition. For X X

each xn ∈ TQX , let ixn and jxn denote, respectively, the index of the partition containing the codeword associated with xn and the index of the codeword within the partition. Finally, let m(QX , i, j) be a message consisting of the following: • • •

dlog |QX |e bits to describe the type QX .   n  nr   n  nr  2 . log CQX 2 bits to describe the index i, where 1 ≤ i ≤ CQ    n  X n log CQX (i, .) bits to describe the index j, where 0 ≤ j ≤ exp log CQX (i, .) − 1.

Remark 11: Let l(QX , i, j) denote the number of bits needed to describe m(QX , i, j). Then, if R(QX , D) +  > r, l(QX , i, j) ≤ log(n + 1)|X | + 1 + log(2n(R(QX ,D)−r+) + 1) + 1 + log 2nr + 1 ≤ |X | log(n + 1) + 4 + nR(QX , D) + n < nR,

15

where the last inequality holds for large enough n and small enough , since R > maxQ R(QX , D) by assumption. Similarly,    if R(QX , D) +  ≤ r, then C n 2nr = 1. So, there is only one bin and C n (1, .) = C n . Then, QX

QX

QX

`(QX , i, j) ≤ log(n + 1)|X | + 1 + 0 + log(2n(R(QX ,D)+) ) + 1 ≤ |X | log(n + 1) + 2 + nR(QX , D) + n < nR. Therefore, the above described encoding is feasible.   n (ixn , .) , and let Ks(xn ) be the first s(xn ) bits of K. The transmitter’s encoding For a sequence xn , let s(xn ) = log CQ X function f is given by:  f (X n , K) = m QX n , iX n , jX n ⊕ Ks(X n ) ,

(29)

where the XOR-operation is performed bitwise. The legitimate receiver can read the type of transmitted sequence and the n index of the bin directly from the message, and can recover jX n using M and K, so that h(M, K) = CQ (iX n , jX n ). Xn

We first describe, on a high level, why the strategy achieves the exponent in (23). The eavesdropper can tell from M the type of the transmitted sequence. For any sequence V n it generates, the maximum number of source sequences in TQX it can cover, in the de -distortion sense, is of the order of 2n(HQX (X)−Re (QX ,De )) . On the other hand, the size of the pre-image of a n codeword in CQ is of the order of 2n(HQX (X)−R(QX ,D)) . Two cases arise: n X nr 1) If CQX < 2 , then there is one partition (the entire codebook). Therefore, the number of source sequences that

could be mapped to M is of the order of 2nHQX (X) . Then, the probability of a successful guess is upper-bounded by 2−nRe (QX ,De ) . n ≥ 2nr , the number of source sequences that could be mapped to M is of the order of 2n(r+HQX (X)−R(QX ,D)) 2) If CQ X since each partition contains 2nr codewords. Then, the probability of a successful guess is upper-bounded by 2−n(Re (QX ,De )−R(QX ,D)+r) . Noting that, for large n, the conditions correspond respectively to R(QX , D) < r and R(QX , D) ≥ r, the upper bounds can be combined as 2−n(Re (QX ,De )−R(QX ,D)+min{r,R(QX ,D)}) . Finally, the exponent is trivially lower-bounded by 0. Summing over all types Q yields the exponent in (23). To make the above argument rigorous, we first study the joint probability distribution Pf of (X n , M ) induced by the encoding function f . Given xn , we get by (29):  n Pf m(Qxn , ixn , j) xn = 2−s(x ) ,

0 ≤ j ≤ 2s(x

n

)

− 1.

(30)

n

For ease of notation, we let S(xn ) = 2s(x ) . Also, note that S(xn ) depends on xn only through its type and the index of its bin. We can therefore equivalently denote it as S(Qxn , ixn ). Analogously to (12), given a type QX , we derive an upper bound on the number of source sequences in TQX that can be covered by any sequence v n ∈ V n . For any v n ∈ V n , \ BDe (v n ) TQX =

X

X

1

PXV ∈Qn xn ∈TQX : XV: PX =QX (xn ,v n )∈TPXV EPXV [de (X,V )]≤De PV =Qvn

(a)

=

X

TP (v n ) X|V

PXV ∈Qn X V (QX ,De ): PV =Qvn

≤ (n + 1)|X ||V| ≤ (n + 1)|X ||V|

(v n )

max n

TP

max

2nHPXV (X|V ) ,

PXV ∈QX V (QX ,De ): PV =Qvn PXV ∈Qn X V (QX ,De ): PV =Qvn

X|V

16

where (a) follows from the definition of QnXV (QX , De ) as: QnXV (QX , De ) = {PXV ∈ QnX V : PX = QX , EPXV [de (X, V )] ≤ De }. Therefore, for large enough n, we get \ n ≤ max (v ) T B D Q e X n n v ∈V

max

PXV ∈Qn X V (QX ,De )

2n(HPXV (X|V )+) .

(31)

Let P ◦ (QX ) be the joint type achieving the maximum in (31). We can now upper-bound the probability that the eavesdropper makes a correct guess as follows:     Pr de X n , go (f (X n , K)) ≤ De S(xn )−1

=

=

X

X

xn ∈X n

j=0

X

   n o P (xn )Pf m(Qxn , ixn , j) xn 1 xn ∈ BDe go m(Qxn , ixn , j)

n |/2nr e S(QX ,i)−1 d|CQ X X X

QX ∈Qn X

i=1

X

n   o P (xn )Pf (m(QX , i, j) xn )1 xn ∈ BDe go m(QX , i, j) .

(32)

xn ∈TQX : ixn =i

j=0

{z

|

}

,`(QX ,i)

Let `(QX , i) be as defined above, and let `(QX ) = S(QX ,i)−1

`(QX , i) =

i

`(QX , i). So we get

   n o P (xn )Pf m(QX , i, j) xn 1 xn ∈ BDe go m(QX , i, j)

X

X

j=0

xn ∈TQX : ixn =i

S(QX ,i)−1 (a)

=

P

n   o 2−n(D(QX ||P )+HQX (X)) S(QX , i)−1 1 xn ∈ BDe go m(QX , i, j)

X

X

j=0

xn ∈TQX : ixn =i S(QX ,i)−1

(b)

≤ S(QX , i)−1

X

2−n(D(QX ||P )+HQX (X)) 2n(HP ◦ (QX ) (X|V )+)

j=0 −n(D(QX ||P )+IP ◦ (QX ) (X;V )−)

=2

,

where (a) follows from (30). (b) follows from (31).  n  n Since |CQ |/2nr ≤ |CQ |/2nr + 1 ≤ 2n(R(QX ,D)+−r) + 1, we get X X   `(QX ) ≤ 2−n(D(QX ||P )+IP ◦ (QX ) (X;V )−) 2n(R(QX ,D)+−r) + 1 ≤ 2−n(D(QX ||P )+IP ◦ (QX ) (X;V )−) 2n(max{R(QX ,D)+−r,0}) × 2 n  o = 2 exp −n D(QX ||P ) + IP ◦ (QX ) (X; V ) + min{r − R(QX , D) − , 0} −  n  o = 2 exp −n D(QX ||P ) + IP ◦ (QX ) (X; V ) − R(QX , D) + min{r − 2, R(QX , D) − } .

(33)

The exponent in (33) might be negative, so we derive another upper bound on `(QX ) by rewriting it as follows: S(xn )−1

X

X

xn ∈TQX

j=0

`(QX ) =

n   o P (xn )Pf (m(QX , i, j) xn )1 xn ∈ BDe go m(QX , i, j)

S(xn )−1

X

X

xn ∈TQX

j=0

≤ ≤2

−nD(QX ||P )

.

P (xn )S(xn )−1 (34)

17

In light of (33) and (34), we get   h i+   `(QX ) ≤ 2 exp − n D(QX ||P ) + IP ◦ (QX ) (X; V ) − R(QX , D) + min{r − 2, R(QX , D) − }

We can now rewrite (32) as:

    Pr de X n , go (f (X n , K)) ≤ De   h i+   |X | ≤ 2(n + 1) max exp − n D(QX ||P ) + IP ◦ (QX ) (X; V ) − R(QX , D) + min{r − 2, R(QX , D) − } QX ∈Qn X   h i+   |X | . ≤ 2(n + 1) maxn exp − n D(QX ||P ) + Re (QX , De ) − R(QX , D) + min{r − 2, R(QX , D) − } QX ∈QX

Taking the limit as n tends to infinity, and noting that  is arbitrary, we get     → − → − 1 E − (P, D , R ) = lim max min − log Pr de X n , gn (fn (X n , K)) ≤ De n→∞ {fn } {gn } n  + ≥ min D(Q||P ) + Re (Q, De ) − R(Q, D) + min{r, R(Q, D)} . Q

C. Proof of Outer Bound → − → − We now prove that E + (P, D , R ) ≤ min{r + E(P, D, De ), E0 (P, De )}. We have already shown, following Theorem 1, that the perfect secrecy exponent E0 (P, De ) is achievable by the eavesdropper even in the absence of any observation. It follows immediately that

→ − → − E + (P, D , R ) ≤ E0 (P, De ).

(35)

So we only need to demonstrate a strategy that achieves the first exponent, i.e., r + E(P, D, De ). The strategy is a simple modification of the one suggested in section III-B. We will add an initial stage in which the eavesdropper tries to guess the ˜ is value of K, by choosing an element uniformly at random from {1, 2, · · · , 2nr }. The eavesdropper’s guess, denoted by K, ˜ which equal to K with probability 2−nr (This will correspond to the r term in (24)). Then, s/he generates Y˜ n = h(M, K), thus satisfies d(X n , Y˜ n ) ≤ D with probability at least 2−nr . Remark 12: If h is stochastic, the eavesdropper can choose any of the possible outputs of h since they all must satisfy the ˜ distortion constraint. We assume, therefore, that the eavesdropper generates Y˜ n as a deterministic function of M and K. Next, the eavesdropper implements the same stages suggested in section III-B, where Y˜ n plays the role of Y n (this stage will yield the E(P, D, De ) term in (24)). We denote the strategy by g 0 . Now, consider any possibly stochastic functions f and h implemented by the primary user (and satisfying the distortion constraint). Let Pf denote the induced joint probability of (X n , M, K), and PK denote the distribution of K. Finally, consider the following chain of inequalities.     Pr de X n , g 0 (f (X n , K)) ≤ nDe    X X X = P (xn )PK (k)Pf (m|xn , k)p xn ∈ BDe g 0 (m) xn ∈X n k∈K m∈M

=

X X X

P (xn )PK (k)Pf (m|xn , k)

xn ∈X n k∈K m∈M

≥ 2−nr

X X X xn ∈X n k∈K m∈M

    X  ˜ ˜ = k˜ p xn ∈ BD g(h(m, k)) p K e ˜ k∈K

   P (xn )PK (k)Pf (m|xn , k)p xn ∈ BDe g(h(m, k))

18

(a)

≥ c0n 2−nr

X X X

P (xn )PK (k)Pf (m|xn , k)2

? (Q n n ) (X;V |Y ) −nIPn x y ˜

xn ∈X n k∈K m∈M



c0n 2−nr

(b)

=

c0n 2−nr

=

c0n 2−nr

X

P (xn )

xn ∈X n

X

X X

PK (k)Pf (m|xn , k)

k∈K m∈M n

P (x )

xn ∈X n

min

n QXY ∈Qn X Y (Qx ,D)

X

X

QX ∈Qn X

xn ∈TQX

min

n QXY ∈Qn X Y (Qx ,D)

2−nIPn? (QXY ) (X;V |Y )

2−nIPn? (QXY ) (X;V |Y )

2−n(D(QX ||P )+HQX (X)) 2−nIPn? (QX ) (X;V |Y )

≥ c0n 2−nr (n + 1)−|X | maxn 2−n(D(QX ||P )+IPn? (QX ) (X;V |Y )) QX ∈QX    = c0n (n + 1)−|X | exp −n r + min n [D(QX ||P ) + IPn? (QX ) (X; V |Y )] , QX ∈QX

where (a) follows from Lemma 5. (b) follows from the fact that the minimum does not depend on m or k. Taking the limit as n goes to infinity, and using Proposition 3, we get     → − → − 1 E + (P, D , R ) = lim sup max min − log Pr de X n , gn (fn (X n , K)) ≤ nDe ≤ r + E(P, D, De ). n n→∞ {fn } {gn }

(36)

APPENDIX A
PROOF OF PROPOSITION 1

A. Proof of Property (P1)

(P1): For fixed PXY, R(PXY, De) is a finite-valued, non-increasing, convex function of De. Furthermore, R(PXY, De) is a uniformly continuous function of the pair (PXY, De).

Fix PXY. The minimization in (4) is over a compact set, which is non-empty due to assumption (A3). Since I(X; V|Y) is a continuous function of PV|X,Y, the minimum is achieved. The monotonicity in De follows directly from the definition. It is easy to check that I(X; V|Y) is convex in PV|X,Y for fixed PXY. Then, the proof of the convexity of R(PXY, De) in De follows similarly to the case of the rate-distortion function with no side information (see Lemma 2.2.2 in [27]).

To show the uniform continuity in the pair (PXY, De), consider the following proposition, the proof of which is given in Appendix E-A.

Proposition 7: Let N1 and N2 be in ℕ, and let S and U be compact subsets of ℝ^{N1} and ℝ^{N2}, respectively. Let ν be a non-negative continuous function defined on S × U, and let ϑ be a real-valued continuous function defined on S × U. Suppose they satisfy the following condition:
(PA) If (s, u1) ∈ S × U satisfies ν(s, u1) = min_{u'∈U} ν(s, u'), then there exists u2 such that ϑ(s, u2) = ϑ(s, u1), and for all s' ∈ S, ν(s', u2) = min_{u'∈U} ν(s', u').
Let t0 = max_{s∈S} min_{u∈U} ν(s, u), and let ϕ be a function on S × [t0, +∞) defined as follows:

$$\varphi(s,t) = \min_{u:\, \nu(s,u) \le t} \vartheta(s,u).$$

If for fixed s ∈ S, ϕ(s, t) is continuous in t, then ϕ(s, t) is continuous in the pair (s, t).

Remark 13: The proposition generalizes Lemma 2.2.2 in [27], which shows the continuity of the regular rate-distortion function, and the proof follows along similar lines.

The proposition immediately yields the continuity of R(PXY, De) by identifying S with P_{X×Y}, U with the set of conditional probability distributions PV|XY, t0 with De,min, and the functions ν, ϑ, and ϕ with E[de(X, V)], I(X; V|Y), and R(PXY, De), respectively. It is easy to check that De,min = max_{PXY} min_{PV|XY} E[de(X, V)], so that we can identify it with t0.

To see why E[de(X, V)] and I(X; V|Y) satisfy (PA), note the following. For notational convenience, we write E[de(X, V)] as de(PXY, PV|XY), and I(X; V|Y) as I(PXY, PV|XY). Suppose de(PXY, PV|XY) = min_{P̂V|XY} de(PXY, P̂V|XY), and let De(x) = min_{v∈V} de(x, v) for x ∈ X. Then, for all (x, v) such that de(x, v) > De(x), PXV(x, v) = 0. Expanding PXV(x, v):

$$P_{XV}(x,v) = \sum_{y} P_{XY}(x,y)\, P_{V|XY}(v|x,y) = 0 \;\Rightarrow\; \text{for all } y \in \mathcal{Y},\ P_{XY}(x,y) = 0 \text{ or } P_{V|XY}(v|x,y) = 0.$$

Then, define P'V|XY as follows:
• If PXY(x, y) > 0, let P'V|(X=x,Y=y) = PV|(X=x,Y=y).
• If PXY(x, y) = 0, let P'V|(X=x,Y=y) satisfy P'V|(X=x,Y=y)(v|x, y) = 0 if de(x, v) > De(x).

Then PXY PV|X,Y = PXY P'V|X,Y, and thus I(PXY, PV|XY) = I(PXY, P'V|XY). Moreover, the definition of P'V|XY guarantees that de(x, v) > De(x) ⇒ P'V|XY(v|x, y) = 0 for all y. Therefore, for any joint distribution P'XY,

$$d_e(P'_{XY}, P'_{V|XY}) = \min_{\hat{P}_{V|XY}} d_e(P'_{XY}, \hat{P}_{V|XY}).$$

Finally, to prove uniform continuity, note that R(PXY, De) = R(PXY, De,max) for all De ≥ De,max. Therefore, R(PXY, De) is uniformly continuous on the set P_{X×Y} × [De,max, ∞). Since it is also uniformly continuous on P_{X×Y} × [De,min, De,max], the result is established.
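As a concrete illustration of the quantity handled in (P1), the following sketch evaluates R(PXY, De) = min I(X; V|Y) over test channels satisfying E[de(X, V)] ≤ De by brute-force grid search. Binary alphabets, Hamming de, the particular PXY, and the grid resolution are our own illustrative assumptions, not part of the proof; the printed values are non-increasing in De, consistent with the monotonicity asserted above.

```python
import itertools
import numpy as np

def cond_mutual_info(p_xyv):
    """I(X;V|Y) in bits for a joint pmf p_xyv indexed as [x, y, v]."""
    p_y = p_xyv.sum(axis=(0, 2))
    p_xy = p_xyv.sum(axis=2)
    p_yv = p_xyv.sum(axis=0)
    total = 0.0
    for x, y, v in itertools.product(range(2), repeat=3):
        p = p_xyv[x, y, v]
        if p > 0:
            total += p * np.log2(p * p_y[y] / (p_xy[x, y] * p_yv[y, v]))
    return total

def R_xy(p_xy, d_e, grid):
    """min I(X;V|Y) over P_{V|XY} with E[d_H(X,V)] <= d_e (grid search)."""
    best = None
    for q in itertools.product(grid, repeat=4):   # q[2x+y] = P(V=1 | X=x, Y=y)
        p_xyv = np.zeros((2, 2, 2))
        for x, y in itertools.product(range(2), repeat=2):
            p_xyv[x, y, 1] = p_xy[x, y] * q[2 * x + y]
            p_xyv[x, y, 0] = p_xy[x, y] * (1 - q[2 * x + y])
        dist = sum(p_xyv[x, y, v]
                   for x, y, v in itertools.product(range(2), repeat=3) if x != v)
        if dist <= d_e + 1e-12:
            val = cond_mutual_info(p_xyv)
            best = val if best is None else min(best, val)
    return best

# Illustrative joint distribution P_XY and a sweep over D_e.
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])
grid = np.linspace(0.0, 1.0, 11)   # resolution of the test-channel search
for d_e in (0.05, 0.1, 0.2, 0.3, 0.4):
    print(f"D_e = {d_e:4.2f}  ->  R(P_XY, D_e) ~ {R_xy(p_xy, d_e, grid):.4f} bits")
# The printed values are non-increasing in D_e, consistent with (P1).
```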

B. Proof of Property (P2)

(P2): For fixed PX, R(PX, D, De) is a finite-valued function of (D, De). Moreover, for fixed De, R(PX, D, De) is a uniformly continuous function of the pair (PX, D).

Fix PX. The maximization in (5) is over a compact set, which is non-empty due to assumption (A3). Since R(PXY, De) is a continuous function of PXY, it is also continuous in PY|X for fixed PX. Therefore, the maximum is achieved.

As for the continuity in (PX, D) for fixed De, we view R(PX, D, De) as a function of (PX, D), and R(PXY, De) as a function of (PX, PY|X). In the terminology of Proposition 7, we identify S with P_X, U with the set of conditional probability distributions PY|X, t0 with Dmin, and the functions ν, ϑ, and ϕ with E[d(X, Y)], −R(PXY, De), and −R(PX, D, De), respectively. Proving that E[d(X, Y)] and R(PXY, De) satisfy (PA) follows along the same lines as proving that E[de(X, V)] and I(X; V|Y) satisfy (PA). Moreover, if continuity holds, uniform continuity follows from the fact that R(PX, D, De) is constant for all D ≥ Dmax. It remains to show that R(PX, D, De) is a continuous function of D for fixed PX and De; the result of Proposition 7 then applies immediately. To this end, consider the following proposition, the proof of which is given in Appendix E-B.

Proposition 8: Let N be in ℕ, and let T be a non-empty compact subset of ℝ^N. Let L be a real-valued continuous function defined on T. Let T1 ⊇ T2 ⊇ · · · be a decreasing sequence of non-empty compact subsets of T, and let T = ∩_{i≥1} Ti. Then,

$$\lim_{k\to\infty} \max_{t \in T_k} L(t) = \max_{t \in T} L(t).$$

Moreover, let S1 ⊆ S2 ⊆ · · · be an increasing sequence of non-empty compact subsets of T, and let S = ∪_{i≥1} Si. If S is compact, then

$$\lim_{k\to\infty} \max_{t \in S_k} L(t) = \max_{t \in S} L(t).$$

Now consider D ≥ Dmin, and let Dk be a decreasing sequence converging to D. Then,

$$\lim_{k\to\infty} R(P_X, D_k, D_e) = \lim_{k\to\infty}\; \max_{P_{Y|X}:\, E[d(X,Y)] \le D_k} R(P_{XY}, D_e) = \max_{P_{Y|X}:\, E[d(X,Y)] \le D} R(P_{XY}, D_e) = R(P_X, D, D_e),$$

where the second equality follows from Proposition 8. Therefore, R(PX, D, De) is right-continuous in D for fixed PX and De.

Finally, consider D > Dmin and let Dk be an increasing sequence converging to D. Then,

$$\lim_{k\to\infty} R(P_X, D_k, D_e) = \lim_{k\to\infty}\; \max_{P_{Y|X}:\, E[d(X,Y)] \le D_k} R(P_{XY}, D_e) = \max_{P_{Y|X}:\, E[d(X,Y)] \le D} R(P_{XY}, D_e) = R(P_X, D, D_e),$$

where the second equality follows from Proposition 8. Therefore, R(PX, D, De) is left-continuous in D for fixed PX and De.
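The outer object of (P2) can be explored the same way. The sketch below is again our own illustration (binary alphabets, Hamming distortions, a coarse grid, and an arbitrary source): it evaluates R(PX, D, De) = max over PY|X with E[d(X, Y)] ≤ D of the inner minimum, and sweeping D gives a sense of the continuity in D just established (the values are non-decreasing in D, since the feasible set grows).

```python
import itertools
import numpy as np

def I_X_V_given_Y(p_xyv):
    """I(X;V|Y) in bits for a joint pmf indexed as [x, y, v]."""
    p_y, p_xy, p_yv = p_xyv.sum((0, 2)), p_xyv.sum(2), p_xyv.sum(0)
    total = 0.0
    for x, y, v in itertools.product(range(2), repeat=3):
        p = p_xyv[x, y, v]
        if p > 0:
            total += p * np.log2(p * p_y[y] / (p_xy[x, y] * p_yv[y, v]))
    return total

def inner_min(p_xy, De, grid):
    """min_{P_{V|XY}} I(X;V|Y) subject to E[d_H(X,V)] <= De (grid search)."""
    best = np.inf
    for q in itertools.product(grid, repeat=4):       # P(V=1 | X=x, Y=y)
        p_xyv = np.zeros((2, 2, 2))
        for x, y in itertools.product(range(2), repeat=2):
            p_xyv[x, y, 1] = p_xy[x, y] * q[2 * x + y]
            p_xyv[x, y, 0] = p_xy[x, y] * (1 - q[2 * x + y])
        dist = sum(p_xyv[x, y, 1 - x] for x, y in itertools.product(range(2), repeat=2))
        if dist <= De + 1e-12:
            best = min(best, I_X_V_given_Y(p_xyv))
    return best                                        # always finite: V = X is feasible

def R_outer(p_x, D, De, grid):
    """max_{P_{Y|X}: E[d_H(X,Y)] <= D} of the inner minimum."""
    best = 0.0                                         # I(X;V|Y) >= 0 and Y = X is feasible
    for w in itertools.product(grid, repeat=2):        # w[x] = P(Y=1 | X=x)
        p_xy = np.array([[p_x[0] * (1 - w[0]), p_x[0] * w[0]],
                         [p_x[1] * (1 - w[1]), p_x[1] * w[1]]])
        if p_x[0] * w[0] + p_x[1] * (1 - w[1]) <= D + 1e-12:   # E[d_H(X,Y)]
            best = max(best, inner_min(p_xy, De, grid))
    return best

p_x = np.array([0.3, 0.7])            # illustrative source
grid = np.linspace(0.0, 1.0, 6)       # coarse grid for both channels
De = 0.05
for D in (0.1, 0.2, 0.3, 0.4):
    print(f"D = {D:.2f}  ->  R(P_X, D, De={De}) ~ {R_outer(p_x, D, De, grid):.4f} bits")
```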



C. Proof of Property (P3)

(P3): Re(PX, De) − R(PX, D) ≤ R(PX, D, De) ≤ Re(PX, De).

The upper bound is straightforward since R(PXY, De), the rate-distortion function with side information, is always upper-bounded by Re(PX, De). The lower bound is derived as follows:

$$\begin{aligned}
R(P_X, D, D_e) &= \max_{P_{Y|X}:\, E[d(X,Y)]\le D}\; \min_{P_{V|X,Y}:\, E[d_e(X,V)]\le D_e} I(X;V|Y) \\
&= \max_{P_{Y|X}:\, E[d(X,Y)]\le D}\; \min_{P_{V|X,Y}:\, E[d_e(X,V)]\le D_e} H(X|Y) - H(X|V,Y) \\
&\ge \max_{P_{Y|X}:\, E[d(X,Y)]\le D}\; \min_{P_{V|X,Y}:\, E[d_e(X,V)]\le D_e} H(X|Y) - H(X|V) \\
&= \max_{P_{Y|X}:\, E[d(X,Y)]\le D} \big(H(X|Y) - H(X)\big) + \min_{P_{V|X,Y}:\, E[d_e(X,V)]\le D_e} \big(H(X) - H(X|V)\big) \\
&= -R(P_X, D) + R_e(P_X, D_e).
\end{aligned}$$

APPENDIX B
PROOF OF LEMMA 2

We prove the existence of such a code by a random coding argument. To that end, we generate C ⊆ T_{QY} by choosing N elements independently and uniformly at random from T_{QY}. Let U(C) = {x^n ∈ T_{QX} : there is no y^n ∈ C satisfying (x^n, y^n) ∈ T_{QXY}}. It is enough to show that E[|U(C)|] < 1, since this implies the existence of a code C such that |U(C)| = 0. Let χ(x^n) be the characteristic function of U(C), i.e.,

$$\chi(x^n) = \begin{cases} 1, & x^n \in U(\mathcal{C}), \\ 0, & x^n \notin U(\mathcal{C}). \end{cases}$$

Then,

$$E[|U(\mathcal{C})|] = E\bigg[\sum_{x^n \in T_{Q_X}} \chi(x^n)\bigg] = \sum_{x^n \in T_{Q_X}} \Pr\big(x^n \in U(\mathcal{C})\big). \qquad (37)$$

Fix x^n ∈ T_{QX} and let Y1^n be the first element generated in C. Then,

$$\Pr\big((x^n, Y_1^n) \notin T_{Q_{XY}}\big) = 1 - \Pr\big((x^n, Y_1^n) \in T_{Q_{XY}}\big) = 1 - \frac{|T_{Q_{Y|X}}(x^n)|}{|T_{Q_Y}|} \le 1 - \frac{2^{n(H(Y|X)-\epsilon/2)}}{2^{nH(Y)}} = 1 - 2^{-n(I(X;Y)+\epsilon/2)},$$

where the inequality follows from Lemma 1.2.5 in [27] and I(X; Y) is computed with respect to QXY. Since the elements of C are generated independently, we get

$$\Pr\big(x^n \in U(\mathcal{C})\big) = \prod_{i=1}^{N} \Pr\big((x^n, Y_i^n) \notin T_{Q_{XY}}\big) \le \big(1 - 2^{-n(I(X;Y)+\epsilon/2)}\big)^N \le \exp\big\{-N\, 2^{-n(I(X;Y)+\epsilon/2)}\big\}, \qquad (38)$$

where the last inequality follows from the fact that (1 − t)^N ≤ exp{−tN}. Now choose N to be an integer satisfying 2^{n(I(X;Y)+2ε/3)} ≤ N ≤ 2^{n(I(X;Y)+ε)}. Substituting into (38), we get Pr(x^n ∈ U(C)) ≤ exp{−2^{nε/6}}. We can now rewrite (37) as

$$E[|U(\mathcal{C})|] \le |T_{Q_X}|\, \exp\{-2^{n\epsilon/6}\} \le \exp\{n \log |\mathcal{X}| - 2^{n\epsilon/6}\} < 1,$$

for large enough n.
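The final bound collapses extremely quickly once 2^{nε/6} overtakes n log|X|. A tiny numerical sketch (the alphabet size and ε are arbitrary illustrative values of ours):

```python
import math

# Evaluate the final bound exp{ n*log|X| - 2^(n*eps/6) } from the proof of Lemma 2.
# |X| = 4 and eps = 0.6 are arbitrary illustrative choices.
alphabet_size = 4
eps = 0.6

for n in (20, 40, 60, 80, 100):
    exponent = n * math.log(alphabet_size) - 2 ** (n * eps / 6)
    bound = math.exp(exponent) if exponent < 700 else float("inf")  # avoid float overflow
    print(f"n = {n:3d}:  n*log|X| - 2^(n*eps/6) = {exponent:10.2f},  bound ~ {bound:.3e}")
# Once 2^(n*eps/6) overtakes n*log|X|, the bound drops below 1 and then decays
# double-exponentially, so E[|U(C)|] < 1 for all large enough n.
```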



APPENDIX C
PROOF OF PROPOSITION 3

First, consider the following proposition.

Proposition 9: For all ε > 0, there exists n1(ε, |X|, |Y|, |V|) such that for all n ≥ n1, for all De ≥ De,min, and for each QXY ∈ Q^n_{XY},

$$\min_{P_{XYV} \in \mathcal{Q}^n_{XYV}(Q_{XY}, D_e)} I_{P_{XYV}}(X; V|Y) \;-\; R(Q_{XY}, D_e) \;\le\; \epsilon.$$

Proof: It follows directly from the definition that

$$\min_{P_{XYV} \in \mathcal{Q}^n_{XYV}(Q_{XY}, D_e)} I_{P_{XYV}}(X; V|Y) \;\ge\; R(Q_{XY}, D_e).$$

So, we only need to show the other direction. To that end, let δ > 0 be small enough such that

$$\|P_{XYV} - P'_{XYV}\| \le \delta \;\Rightarrow\; \big|I_{P_{XYV}}(X; V|Y) - I_{P'_{XYV}}(X; V|Y)\big| \le \epsilon, \qquad (39)$$

where ‖·‖ is used to indicate the L2-norm. Let n ≥ n1 ≥ |V|√(|X||Y||V|)/δ. Fix QXY ∈ Q^n_{XY}, and let P*V|XY be the conditional distribution achieving the minimum in R(QXY, De). We construct a conditional distribution P'V|XY as follows. For each (x, y) ∈ X × Y, we will choose P'V|X=x,Y=y from Q_V^{nQXY(x,y)}, i.e., the set of rational PMFs over V with denominator nQXY(x, y) (if QXY(x, y) = 0, then we can choose P'V|X=x,Y=y to be any distribution). This guarantees that QXY P'V|XY is in Q^n_{XYV}.

Let v(x) = argmin_{v∈V} de(x, v) for x ∈ X (if more than one v achieves the minimum, choose one arbitrarily). We construct P'V|XY by rounding P*V|XY as follows. For each (x, y) ∈ X × Y, for v ≠ v(x), we set P'V|XY(v|x, y) to be the largest integer multiple of 1/(nQXY(x, y)) that is smaller than P*V|XY(v|x, y), i.e., we round down with resolution 1/(nQXY(x, y)) and denote this operation by ⌊·⌋_{nQXY(x,y)}. Finally, we set P'V|XY(v(x)|x, y) appropriately to make P'V|XY(·|x, y) a valid probability distribution. It is easy to see that, for such a choice,

$$\big|P'_{V|XY}(v|x,y) - P^\star_{V|XY}(v|x,y)\big| \le \frac{|\mathcal{V}|}{n\,Q_{XY}(x,y)}.$$

Moreover, this readily implies that

$$\big\|Q_{XY} P'_{V|XY} - Q_{XY} P^\star_{V|XY}\big\| \le \frac{|\mathcal{V}|\sqrt{|\mathcal{X}||\mathcal{Y}||\mathcal{V}|}}{n} \le \delta. \qquad (40)$$

Let P*XYV = QXY P*V|XY and P'XYV = QXY P'V|XY. Now, note that

$$\begin{aligned}
D_e \ge E_{P^\star_{XYV}}[d_e(X,V)] &= \sum_{x,y}\sum_{v} Q_{XY}(x,y)\, P^\star_{V|XY}(v|x,y)\, d_e(x,v) \\
&= \sum_{x,y}\sum_{v} Q_{XY}(x,y)\, \lfloor P^\star_{V|XY}(v|x,y)\rfloor_{nQ_{XY}(x,y)}\, d_e(x,v) \\
&\qquad + \sum_{x,y}\sum_{v} Q_{XY}(x,y)\big(P^\star_{V|XY}(v|x,y) - \lfloor P^\star_{V|XY}(v|x,y)\rfloor_{nQ_{XY}(x,y)}\big)\, d_e(x,v) \\
&\ge \sum_{x,y}\sum_{v} Q_{XY}(x,y)\, \lfloor P^\star_{V|XY}(v|x,y)\rfloor_{nQ_{XY}(x,y)}\, d_e(x,v) \\
&\qquad + \sum_{x,y}\sum_{v} Q_{XY}(x,y)\big(P^\star_{V|XY}(v|x,y) - \lfloor P^\star_{V|XY}(v|x,y)\rfloor_{nQ_{XY}(x,y)}\big)\, d_e(x,v(x)) \\
&= E_{P'_{XYV}}[d_e(X,V)].
\end{aligned}$$

Therefore,

$$\min_{P_{XYV} \in \mathcal{Q}^n_{XYV}(Q_{XY}, D_e)} I_{P_{XYV}}(X; V|Y) \;\le\; I_{P'_{XYV}}(X; V|Y) \;\le\; I_{P^\star_{XYV}}(X; V|Y) + \epsilon \;=\; R(Q_{XY}, D_e) + \epsilon,$$

where the second inequality follows from (39) and (40).
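The quantization step in this proof is constructive and simple to mimic. The sketch below (illustrative alphabet sizes, a random test channel, and helper names of our own choosing) rounds each row of P*V|XY down to multiples of 1/(nQXY(x,y)) except at v(x), places the leftover mass on v(x), and checks the three facts used above: the result is a valid n-type conditional, it is entrywise within |V|/(nQXY(x,y)) of P*V|XY, and the rounding does not increase the expected distortion.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: |X| = |V| = 3, a single (x, y) cell with n*Q_XY(x, y) = denom.
# d_e is a hypothetical distortion matrix; row x of p_star is P*_{V|X=x,Y=y}.
X, V = 3, 3
denom = 40                                  # plays the role of n * Q_XY(x, y)
d_e = rng.uniform(0.0, 1.0, size=(X, V))
for x in range(X):
    d_e[x, x] = 0.0                         # make v(x) = x the minimizer of d_e(x, .)

p_star = rng.dirichlet(np.ones(V), size=X)  # P*_{V|X=x} for each x (fixed y)

def round_to_type(p_star, d_e, denom):
    """Round each row down to multiples of 1/denom except at v(x); fix up v(x)."""
    p_prime = np.zeros_like(p_star)
    for x in range(X):
        vx = int(np.argmin(d_e[x]))                       # v(x)
        for v in range(V):
            if v != vx:
                p_prime[x, v] = math.floor(p_star[x, v] * denom) / denom
        p_prime[x, vx] = 1.0 - p_prime[x].sum()           # remaining mass goes to v(x)
    return p_prime

p_prime = round_to_type(p_star, d_e, denom)

# Checks mirroring the proof of Proposition 9.
assert np.all(p_prime >= -1e-12) and np.allclose(p_prime.sum(axis=1), 1.0)
assert np.allclose(p_prime * denom, np.round(p_prime * denom))   # valid n-type rows
assert np.all(np.abs(p_prime - p_star) <= V / denom + 1e-12)     # entrywise closeness
dist_star = float((p_star * d_e).sum() / X)   # E[d_e] under uniform X, for illustration
dist_prime = float((p_prime * d_e).sum() / X)
assert dist_prime <= dist_star + 1e-12        # rounding cannot increase the distortion
print(f"E[d_e] before rounding: {dist_star:.4f}, after: {dist_prime:.4f}")
```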

22

Similarly, we have the following proposition. Proposition 10: For all  > 0, there exists n2 (, |X |, |Y|, de ), such that for all n ≥ n2 , D ≥ Dmin , De ≥ De,min , and for each QX ∈ QnX ,



max

PXY ∈ Qn XY (QX ,D)

R(QXY , De ) − R(QX , D, De ) ≤ . 

The proof follows along the same lines as that of Proposition 9, and is thus omitted. By the previous two propositions, for any given  > 0, we can set n large enough to satisfy IP ? (Q ) (X; V |Y ) − R(QX , D, De ) ≤ , X n for all QX ∈ QnX . Therefore, min D(QX ||P ) + R(QX , D, De ) −  ≤

QX ∈Qn X

min D(QX ||P ) + IPn? (QX ) (X; V |Y ) ≤

QX ∈Qn X

min D(QX ||P ) + R(QX , D, De ) + 

QX ∈Qn X

By taking the limit as n goes to infinity, and noting that  is arbitrary, the proof is concluded. A PPENDIX D P ROOF OF L EMMA 6 The first two cases correspond to the regime in which R(Q, D) = 0. It then follows directly from (P3) of Proposition 1 that R(Q, D, De ) = R(Q, De ). Suppose R(Q, D) > 0, and consider any PY |X satisfying E[d(X, Y )] = D0 ≤ D. Since R(Q, De ) − R(Q, D) is a lower bound to R(Q, D, De ), to prove equality, we need to find PV |X,Y that satisfies E[d(X, V )] ≤ De , and I(X; V |Y ) ≤ H(D) − H(De ).

(41)

If D0 ≤ De , simply setting V = Y satisfies (41). Suppose now D0 > De . Let py = PX|Y (1|y), y ∈ {0, 1}. py is well defined since R(Q, D) > 0 implies that PY (y) > 0, y ∈ {0, 1}. We consider several cases, depending on the value of De as compared to the py ’s. Before enumerating the cases, we show that p0 ≤ 1 − p0 . Suppose p0 > 1 − p0 , then E[d(X, 1)] = E[E[d(X, 1)|Y ]] = pY (0)(1 − p0 ) + pY (1)(1 − p1 ) < pY (0)p0 + pY (1)(1 − p1 ) = D0 ≤ D, contradicting R(Q, D) > 0. A similar argument yields p1 ≥ 1 − p1 . We are now ready to present the cases. Case 1: De ≤ min{p0 , 1 − p1 }. For each y ∈ {0, 1}, we choose PV |X,Y =y to be the optimal test channel corresponding to input distribution PX|Y =y and distortion level De , i.e., we choose PV |X,Y =y that achieves the joint shown in Figure 4.

Fig. 4.

Desired joint distribution.

One can readily verify that the following choice yields the desired joint: PV |X,Y =y (1|0, y) =

De (py − De ) (1 − De )(py − De ) , and PV |X,Y =y (1|1, y) = . (1 − 2De )(1 − py ) (1 − 2De )py

(42)

23

Then, E[d(X, V )|Y = y] = De , and H(X|V, Y = y) = H(De ). It follows that E[d(X, V )] = De , and I(X; V |Y ) = H(X|Y ) − H(X|V, Y ) ≤ H(D) − H(De ), as desired. Case 2: 1 − p0 ≥ De > p0 . For y = 0, we set PV |X,Y =0 (0|0, 0) = PV |X,Y =0 (0|1, 0) = 1. This choice yields E[d(X, V )|Y = 0] = E[d(X, 0)|Y = 0] = p0 ≤ De , and I(X; V |Y = 0) = 0.

(43)

Now, define De,1 as De,1 =

De − p0 pY (0) D0 − p0 pY (0) < = 1 − p1 , pY (1) pY (1)

(44)

where the inequality follows from the fact that D0 > De and the last equality follows from the fact that D0 = pY (0)p0 + pY (1)(1 − p1 ). Since 1 − p1 ≤ p1 , we get De,1 ≤ min{p1 , 1 − p1 }. Therefore, similarly to (42), we can choose PV |X,Y =1 to satisfy E[d(X, V )|Y = 1] = De,1 , and I(X; V |Y = 1) = H(1 − p1 ) − H(De,1 ).

(45)

Equations (43), (44), and (45) imply E[d(X, V )] = E[E[d(X, V )|Y ]] = pY (0)p0 + pY (1)De,1 = De , and I(X; V |Y ) = pY (1) (H(1 − p1 ) − H(De,1 )) . It remains to show that the latter expression is upper-bounded by H(D) − H(De ). To this end, define ϕ(t) as   t − p0 pY (0) ϕ(t) = H(t) − pY (1)H , De ≤ t ≤ D0 . pY (1) Then, ϕ0 (t) = H 0 (t) − H 0



t − p0 pY (0) pY (1)

 ≥ 0,

where the last inequality follows from the fact that H 0 (t) is decreasing and (t − p0 pY (0))/pY (1) ≥ t for t ≥ p0 . Therefore, ϕ(t) is increasing in t and  H(De ) − pY (1)H

De − p0 pY (0) pY (1)



≤ H(D0 ) − pY (1)H



D0 − p0 pY (0) pY (1)

 .

(46)

Finally, equations (46) and (44) yield pY (1) (H(1 − p1 ) − H(De,1 )) ≤ H(D0 ) − H(De ) ≤ H(D) − H(De ), as desired. Case 3: p1 ≥ De > 1 − p1 . This case is treated analogously to case 2. The above 3 cases cover all possibilities since De ≤ 1/2 ≤ min{1 − p0 , p1 } and De < D0 ≤ max{p0 , 1 − p1 }.



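As a quick numerical check of the Case 1 construction, the sketch below (with arbitrary illustrative values of p_y and De of our own choosing, satisfying De ≤ min{p_y, 1 − p_y}) builds the test channel (42) and verifies that E[d(X, V)|Y = y] = De and H(X|V, Y = y) = H(De), as claimed in the proof of Lemma 6.

```python
import math

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical values for illustration: p = P(X=1 | Y=y) and eavesdropper distortion De.
p, De = 0.3, 0.1

# The test channel of (42).
pv1_given_x0 = De * (p - De) / ((1 - 2 * De) * (1 - p))
pv1_given_x1 = (1 - De) * (p - De) / ((1 - 2 * De) * p)

# Joint of (X, V) conditioned on Y = y.
joint = {
    (0, 0): (1 - p) * (1 - pv1_given_x0),
    (0, 1): (1 - p) * pv1_given_x0,
    (1, 0): p * (1 - pv1_given_x1),
    (1, 1): p * pv1_given_x1,
}

distortion = joint[(0, 1)] + joint[(1, 0)]        # E[d_H(X, V) | Y = y]
pv1 = joint[(0, 1)] + joint[(1, 1)]               # P(V = 1 | Y = y)
# H(X | V, Y = y) = sum_v P(v) * Hb(P(X = 1 | V = v))
h_x_given_v = (1 - pv1) * Hb(joint[(1, 0)] / (1 - pv1)) + pv1 * Hb(joint[(1, 1)] / pv1)

print(f"E[d(X,V)|Y=y] = {distortion:.6f}  (target De   = {De})")
print(f"H(X|V,Y=y)    = {h_x_given_v:.6f}  (target H(De) = {Hb(De):.6f})")
```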

APPENDIX E
PROOFS OF PROPOSITIONS 7 AND 8

A. Proof of Proposition 7

We restate the proposition.

Proposition 7: Let N1 and N2 be in ℕ, and let S and U be compact subsets of ℝ^{N1} and ℝ^{N2}, respectively. Let ν be a non-negative continuous function defined on S × U, and let ϑ be a real-valued continuous function defined on S × U. Suppose they satisfy the following condition:
(PA) If (s, u1) ∈ S × U satisfies ν(s, u1) = min_{u'∈U} ν(s, u'), then there exists u2 such that ϑ(s, u2) = ϑ(s, u1), and for all s' ∈ S, ν(s', u2) = min_{u'∈U} ν(s', u').
Let t0 = max_{s∈S} min_{u∈U} ν(s, u), and let ϕ be a function on S × [t0, +∞) defined as follows:

$$\varphi(s,t) = \min_{u:\, \nu(s,u) \le t} \vartheta(s,u).$$

If for fixed s ∈ S, ϕ(s, t) is continuous in t, then ϕ(s, t) is continuous in the pair (s, t).

First, note that, for all s ∈ S and all t ≥ t0, ν^{-1}(s, [0, t]) ≜ {u : ν(s, u) ≤ t} is closed by continuity of ν, so it is compact since it is also bounded. Moreover, it is non-empty since t ≥ t0. Since ϑ is continuous and the minimization is over a compact set, ϕ is well defined.

Now fix (s, t) ∈ S × [t0, +∞), and consider any sequence (sk, tk) → (s, t). Let ts = min_{u∈U} ν(s, u) and consider any ε > 0.

If t > ts: By continuity of ϕ(s, t) as a function of t for fixed s, there exists δ > 0 such that |t − t'| ≤ δ ⇒ |ϕ(s, t) − ϕ(s, t')| ≤ ε. Let t' = t − min{δ/2, (t − ts)/2}, and let u0 ∈ argmin_{u: ν(s,u)≤t'} ϑ(s, u). Then, ν(s, u0) < t and ϕ(s, t') = ϑ(s, u0) ≤ ϕ(s, t) + ε.

If t = ts: Let u0 be a minimizer for ϕ(s, ts) satisfying ν(s', u0) = t_{s'} for all s' ∈ S. Such a choice is possible by assumption (PA). Note that ϑ(s, u0) = ϕ(s, ts).

We claim that the choice of u0 is feasible for the minimization in ϕ(sk, tk), i.e., ν(sk, u0) ≤ tk for sufficiently large k. Indeed, if t > ts, then ν(sk, u0) → ν(s, u0) ≤ t' < t, so for sufficiently large k, ν(sk, u0) ≤ tk. If t = ts, then ν(sk, u0) = t_{s_k} ≤ t0 ≤ tk. Moreover, by continuity of ϑ, ϑ(sk, u0) → ϑ(s, u0). Then, for sufficiently large k, ϑ(sk, u0) ≤ ϕ(s, t) + ε/2. So, we get

$$\limsup_{k\to\infty} \varphi(s_k, t_k) \le \limsup_{k\to\infty} \vartheta(s_k, u_0) \le \varphi(s, t).$$

On the other hand, let uk be a minimizer for ϕ(sk, tk). Consider a sequence of integers {kj} such that

$$\varphi(s_{k_j}, t_{k_j}) \to \liminf_{k\to\infty} \varphi(s_k, t_k).$$

Let {u_{kj}} be the corresponding subsequence of minimizers. Since U is a bounded set, {u_{kj}} has a convergent subsequence {u_{k_{j_\ell}}}. Let u0 be its limit. By continuity of ν, we have ν(s, u0) = lim_{ℓ→∞} ν(s_{k_{j_\ell}}, u_{k_{j_\ell}}) ≤ lim_{ℓ→∞} t_{k_{j_\ell}} = t. Therefore,

$$\varphi(s,t) \le \vartheta(s, u_0) = \lim_{\ell\to\infty} \vartheta(s_{k_{j_\ell}}, u_{k_{j_\ell}}) = \lim_{\ell\to\infty} \varphi(s_{k_{j_\ell}}, t_{k_{j_\ell}}) = \liminf_{k\to\infty} \varphi(s_k, t_k).$$



B. Proof of Proposition 8

We restate the proposition.

Proposition 8: Let N be in ℕ, and let T be a non-empty compact subset of ℝ^N. Let L be a real-valued continuous function defined on T. Let T1 ⊇ T2 ⊇ · · · be a decreasing sequence of non-empty compact subsets of T, and let T = ∩_{i≥1} Ti. Then,

$$\lim_{k\to\infty} \max_{t \in T_k} L(t) = \max_{t \in T} L(t).$$

Moreover, let S1 ⊆ S2 ⊆ · · · be an increasing sequence of non-empty compact subsets of T, and let S = ∪_{i≥1} Si. If S is compact, then

$$\lim_{k\to\infty} \max_{t \in S_k} L(t) = \max_{t \in S} L(t).$$

First, note that T is non-empty and compact since a countable intersection of non-empty decreasing compact sets is non-empty and compact. Let

$$t_k = \arg\max_{t \in T_k} L(t) \quad\text{and}\quad t^\star = \arg\max_{t \in T} L(t).$$

We need to show that L(tk) → L(t*). Let Bδ(t) = {t' ∈ T : ‖t' − t‖ < δ}, and consider the following claim.

Claim 1: For all δ > 0, there exists k0 such that for all k ≥ k0, Tk ⊆ Bδ(T), where

$$B_\delta(T) = \bigcup_{t \in T} B_\delta(t).$$

We show first how the claim yields our result. Let ε > 0 be given. By the uniform continuity of L (continuity on a compact set), there exists δ > 0 such that ‖t − t'‖ ≤ δ ⇒ |L(t) − L(t')| ≤ ε. Let k be large enough as guaranteed by the claim. Then, for all t ∈ Tk, there exists t' ∈ T such that ‖t − t'‖ ≤ δ, and subsequently |L(t) − L(t')| ≤ ε. In particular, there exists t' ∈ T such that |L(tk) − L(t')| ≤ ε. Then, we get L(tk) ≤ L(t') + ε ≤ L(t*) + ε. Since L(tk) ≥ L(t*), we get |L(tk) − L(t*)| ≤ ε. Therefore, L(tk) → L(t*). It remains to prove the claim to establish the first part of the proposition.

Proof of Claim 1: Fix δ > 0. Bδ(T) is open in T by construction. Therefore, Tk\Bδ(T) is closed in T. Since T is closed in ℝ^N, Tk\Bδ(T) is also closed in ℝ^N. Moreover, it is bounded, so it is compact. Since

$$\bigcap_{i\ge1}\big(T_i \setminus B_\delta(T)\big) = \Big(\bigcap_{i\ge1} T_i\Big) \setminus B_\delta(T) = T \setminus B_\delta(T) = \emptyset,$$

and Tk\Bδ(T) is a decreasing sequence of compact sets, there exists k0 such that for all k ≥ k0, Tk\Bδ(T) is empty.

Similarly, to prove the second part of the proposition, let

$$s_k = \arg\max_{t \in S_k} L(t) \quad\text{and}\quad s^\star = \arg\max_{t \in S} L(t).$$

We need to show that L(sk) → L(s*). To this end, consider the following claim.

Claim 2: For all δ > 0, there exists k1 such that for all k ≥ k1, S ⊆ Bδ(Sk).

We show first how the claim yields our result. Let ε > 0 be given. By the uniform continuity of L, there exists δ > 0 such that ‖t − t'‖ ≤ δ ⇒ |L(t) − L(t')| ≤ ε. Let k be large enough as guaranteed by the claim. Then, for all t ∈ S, there exists t' ∈ Sk such that ‖t − t'‖ ≤ δ, and subsequently |L(t) − L(t')| ≤ ε. In particular, there exists t' ∈ Sk such that |L(s*) − L(t')| ≤ ε. Then, we get L(sk) ≥ L(t') ≥ L(s*) − ε. Since L(sk) ≤ L(s*), we get |L(sk) − L(s*)| ≤ ε. Therefore, L(sk) → L(s*). It remains to prove the claim.

Proof of Claim 2: Fix δ > 0. Bδ(Sk) is open in T by construction. Therefore, S\Bδ(Sk) is closed in T. Then S\Bδ(Sk) is closed in ℝ^N. Moreover, it is bounded, so it is compact. Since

$$\bigcap_{i\ge1}\big(S \setminus B_\delta(S_i)\big) = S \setminus \Big(\bigcup_{i\ge1} B_\delta(S_i)\Big) = S \setminus B_\delta(S) = \emptyset,$$

and S\Bδ (Sk ) is a decreasing sequence of compact sets, there exists k1 such that for all k ≥ k1 , S\Bδ (Sk ) is empty.
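Proposition 8 is easy to visualize numerically. The sketch below is our own illustration (an arbitrary continuous L on [0, 1] and nested intervals standing in for the compact sets Tk and Sk): the maxima over the shrinking sets converge to the maximum over the intersection, and likewise for the growing sets, whose union is kept compact here by including its right endpoint.

```python
import numpy as np

L = lambda t: np.sin(3.0 * t) + 0.5 * t        # an arbitrary continuous function on [0, 1]
grid = lambda a, b: np.linspace(a, b, 20001)   # fine grid standing in for a compact interval

# Decreasing sets T_k = [0, 0.5 + 1/k] shrinking to T = [0, 0.5].
print("decreasing case:")
for k in (1, 2, 5, 10, 100, 1000):
    print(f"  k={k:5d}: max over T_k = {L(grid(0.0, 0.5 + 1.0 / k)).max():.6f}")
print(f"  limit set : max over T   = {L(grid(0.0, 0.5)).max():.6f}")

# Increasing sets S_k = [0, 0.5 - 1/(k+2)] U {0.5}; including 0.5 keeps the union compact.
print("increasing case:")
for k in (1, 2, 5, 10, 100, 1000):
    S_k = np.append(grid(0.0, 0.5 - 1.0 / (k + 2)), 0.5)
    print(f"  k={k:5d}: max over S_k = {L(S_k).max():.6f}")
print(f"  limit set : max over S   = {L(grid(0.0, 0.5)).max():.6f}")
```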



REFERENCES

[1] D. X. Song, D. Wagner, and X. Tian, "Timing analysis of keystrokes and timing attacks on SSH," in Proceedings of the 10th USENIX Security Symposium - Volume 10. Berkeley, CA, USA: USENIX Association, 2001. [Online]. Available: http://dl.acm.org/citation.cfm?id=1251327.1251352
[2] K. Zhang and X. Wang, "Peeping tom in the neighborhood: Keystroke eavesdropping on multi-user systems," in Proceedings of the 18th USENIX Security Symposium (USENIX Security 09). Montreal, Canada: USENIX, 2009. [Online]. Available: https://www.usenix.org/node/
[3] P. Venkitasubramaniam, T. He, and L. Tong, "Anonymous networking amidst eavesdroppers," Information Theory, IEEE Transactions on, vol. 54, no. 6, pp. 2770–2784, June 2008.
[4] P. Venkitasubramaniam and A. Mishra, "Anonymity of memory-limited Chaum mixes under timing analysis: An information theoretic perspective," Information Theory, IEEE Transactions on, vol. 61, no. 2, pp. 996–1009, Feb 2015.
[5] S. Kadloor, N. Kiyavash, and P. Venkitasubramaniam, "Mitigating timing side channel in shared schedulers," Networking, IEEE/ACM Transactions on, vol. PP, no. 99, pp. 1–12, 2015.
[6] Y. Wang and G. E. Suh, "Efficient timing channel protection for on-chip networks," in Networks on Chip (NoCS), 2012 Sixth IEEE/ACM International Symposium on, May 2012, pp. 142–151.
[7] F. McSherry and K. Talwar, "Mechanism design via differential privacy," in Foundations of Computer Science, 2007. FOCS '07. 48th Annual IEEE Symposium on, Oct 2007, pp. 94–103.
[8] C. Dwork, "Differential privacy: A survey of results," in Theory and Applications of Models of Computation, ser. Lecture Notes in Computer Science, M. Agrawal, D. Du, Z. Duan, and A. Li, Eds. Springer Berlin Heidelberg, 2008, vol. 4978, pp. 1–19. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-79228-4_1
[9] C. E. Shannon, "Communication theory of secrecy systems," Bell System Technical Journal, vol. 28, no. 4, pp. 656–715, 1949.
[10] N. Weinberger and N. Merhav, "A large deviations approach to secure lossy compression," arXiv preprint arXiv:1504.05756, 2015.
[11] A. D. Wyner, "The wire-tap channel," Bell System Technical Journal, vol. 54, no. 8, pp. 1355–1387, Oct 1975.
[12] P. K. Gopala, L. Lai, and H. El Gamal, "On the secrecy capacity of fading channels," Information Theory, IEEE Transactions on, vol. 54, no. 10, pp. 4687–4698, Oct 2008.
[13] L. Lai and H. El Gamal, "The relay-eavesdropper channel: cooperation for secrecy," Information Theory, IEEE Transactions on, vol. 54, no. 9, pp. 4005–4019, Sept 2008.
[14] N. Marina, H. Yagi, and H. V. Poor, "Improved rate-equivocation regions for secure cooperative communication," in Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, July 2011, pp. 2871–2875.
[15] I. Csiszár and J. Körner, "Broadcast channels with confidential messages," Information Theory, IEEE Transactions on, vol. 24, no. 3, pp. 339–348, May 1978.
[16] N. Merhav, "Shannon's secrecy system with informed receivers and its application to systematic coding for wiretapped channels," Information Theory, IEEE Transactions on, vol. 54, no. 6, pp. 2723–2734, June 2008.
[17] D. Gunduz, E. Erkip, and H. V. Poor, "Lossless compression with security constraints," in Information Theory, 2008. ISIT 2008. IEEE International Symposium on, July 2008, pp. 111–115.
[18] J. Massey, "Guessing and entropy," in Information Theory, 1994. Proceedings., 1994 IEEE International Symposium on, Jun 1994, p. 204.
[19] N. Merhav and E. Arıkan, "The Shannon cipher system with a guessing wiretapper," IEEE Transactions on Information Theory, vol. 45, no. 6, pp. 1860–1866, 1999.
[20] H. Yamamoto, "Rate-distortion theory for the Shannon cipher system," Information Theory, IEEE Transactions on, vol. 43, no. 3, pp. 827–835, 1997.
[21] ——, "A rate-distortion problem for a communication system with a secondary decoder to be hindered," Information Theory, IEEE Transactions on, vol. 34, no. 4, pp. 835–842, 1988.
[22] C. Schieler and P. Cuff, "Rate-distortion theory for secrecy systems," Information Theory, IEEE Transactions on, vol. 60, no. 12, pp. 7584–7605, Dec 2014.
[23] ——, "The henchman problem: measuring secrecy by the minimum distortion in a list," in Information Theory (ISIT), 2014 IEEE International Symposium on. IEEE, 2014, pp. 596–600.
[24] ——, "The henchman problem: measuring secrecy by the minimum distortion in a list," CoRR, vol. abs/1410.2881, 2014. [Online]. Available: http://arxiv.org/abs/1410.2881
[25] N. Merhav, "A large-deviations notion of perfect secrecy," Information Theory, IEEE Transactions on, vol. 49, no. 2, pp. 506–508, 2003.
[26] E. Arıkan and N. Merhav, "Guessing subject to distortion," Information Theory, IEEE Transactions on, vol. 44, no. 3, pp. 1041–1056, May 1998.
[27] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Budapest: Akadémiai Kiadó, 1997.
12, pp. 7584–7605, Dec 2014. [23] ——, “The henchman problem: measuring secrecy by the minimum distortion in a list,” in Information Theory (ISIT), 2014 IEEE International Symposium on. IEEE, 2014, pp. 596–600. [24] ——, “The henchman problem: measuring secrecy by the minimum distortion in a list,” CoRR, vol. abs/1410.2881, 2014. [Online]. Available: http://arxiv.org/abs/1410.2881 [25] N. Merhav, “A large-deviations notion of perfect secrecy,” Information Theory, IEEE Transactions on, vol. 49, no. 2, pp. 506–508, 2003. [26] E. Arıkan and N. Merhav, “Guessing subject to distortion,” Information Theory, IEEE Transactions on, vol. 44, no. 3, pp. 1041–1056, May 1998. [27] I. Csisz´ar and J. K¨orner, Information theory: coding theorems for discrete memoryless systems. Budapest: Akad´emiai Kiad´o, 1997.