Quantum Rate-Distortion Theory for I.I.D. Sources
arXiv:quant-ph/0011085v1 20 Nov 2000
Igor Devetak∗ and Toby Berger School of Electrical and Computer Engineering and Center for Quantum Information Cornell University, Ithaca, New York 14850 March 18, 2008
Abstract We consider a natural distortion measure based on entanglement fidelity and find the exact rate-distortion function for isotropic sources. An upper bound is found in the case of biased sources which we believe to be exact. We conclude that optimal rate-distortion codes for this measure produce no entropy exchange with the environment of any individual qubit.
1 Introduction
The lossless coding theorem tells us the minimum rate (i.e. the number of output qubits per source qubit) to which we can compress information so that it can be perfectly reproduced from the output. In realistic applications we may be able to tolerate a certain amount of distortion from the original message, or we may require a rate less than the entropy of the source. In either case we would like to make errors as intelligently as possible: to minimize the required rate for a given allowed distortion, or equivalently to minimize the distortion for a given rate. Here the distortion measure is a user-defined function of the input and output, and its precise form depends on the nature of the application. Finding such optimal rate vs. distortion curves is the subject of rate-distortion theory.

Classical rate-distortion theory [1] is an important and fertile area of information theory. It is curious, then, that little effort has been put into developing quantum rate-distortion theory, although the noiseless [2] and noisy [3] quantum channel theorems were discovered over four years ago. The purpose of this paper is to fill this gap.

One's first impulse is to try an approach paralleling the classical theory. There the rate-distortion function has the simple form

$$R(D) = \min_{Y:\, d(X,Y) \le D} I(X;Y) \tag{1}$$
where $X$ and $Y$ are random variables, $X$ describing the source (kept fixed), $d(X,Y)$ is a suitably defined distortion function, and $I(X;Y)$ is the mutual information between $X$ and $Y$. The relevant information-like quantity playing this role in the quantum channel capacity formula is the coherent information $I_c(\rho, \mathcal{E})$ [4], to be defined in the next section. One is tempted to assume the same quantity should appear in the expression for the rate-distortion function. Indeed, Barnum [5] has derived a lower bound based on coherent information. This bound is far from tight, however, as suggested by the fact that the coherent information can be negative for relatively small distortion. This does not cause problems for the channel capacity because the maximization procedure ensures positivity.

Here we instead pursue the rate-distortion function from first principles using a natural distortion measure based on entanglement fidelity. We define the problem in section 2, and give some relevant background on quantum operations, entropies and fidelity measures. In section 3 we find the rate distortion function for a restricted class of coding procedures, which we argue to be exact in section 4. Section 5 presents a simple physical realization of the optimal coding procedure. Speculations are left for the final section.

∗ Electronic address: [email protected]
2 Definitions
Let us recall some basic definitions of quantum information theory [6] [7]. A general quantum information source is described by a density matrix $\rho^Q$ of a quantum system $Q$. This density matrix results from the system being prepared in certain pure states with respective probabilities. Alternatively, we may view our quantum system $Q$ as part of a larger system $RQ$ that includes a reference system $R$, which may always be constructed so that the overall state $|\Psi^{RQ}\rangle$ is pure and $\rho^Q$ is the result of restricting to $Q$:

$$\rho^Q = \mathrm{tr}_R\left(|\Psi^{RQ}\rangle\langle\Psi^{RQ}|\right) \tag{2}$$

Next consider a quantum process acting on the source $\rho^Q$,

$$\rho^Q \to \hat{\mathcal{E}}(\rho^Q) \equiv \frac{\mathcal{E}(\rho^Q)}{\mathrm{tr}(\mathcal{E}(\rho^Q))} \tag{3}$$

with a general quantum operation $\mathcal{E}$ of the form

$$\mathcal{E}(\rho^Q) = \sum_{i=1}^{k} A_i \rho^Q A_i^\dagger \tag{4}$$
Note that the action of $\mathcal{E}$ is completely determined by the set of operation elements $\{A_i\}$. A useful way to think about the quantum process is by embedding $RQ$ into an even larger space $RQE$ by adding an environment $E$, initially in a pure state $|s\rangle$ and hence decoupled from $RQ$. Then a well known representation theorem [6] [7] states that a general quantum process $\mathcal{E}$ may be realized by performing a unitary transformation $U^{QE}$ entangling $Q$ and $E$, followed by projecting via $P^E$ onto the environment alone and tracing out $R$ and $E$:

$$\mathcal{E}(\rho^Q) = c\, \mathrm{tr}_{RE}\left(P^E U^{QE} \left(|\Psi^{RQ}\rangle\langle\Psi^{RQ}| \otimes |s\rangle\langle s|\right) U^{QE\dagger} P^E\right) \tag{5}$$
where $c$ is a positive constant. Although the theorem refers to a mathematical construction, it provides physical insight; for instance, it enables one to define the entropy exchange [6] [3]

$$S_e(\rho^Q, \mathcal{E}) \equiv S(\rho^{E'}) = S(\rho^{RQ'}) \tag{6}$$

Here $S(\sigma) \equiv -\mathrm{tr}(\sigma \log_2 \sigma)$ is the von Neumann entropy, and $\rho^{E'}$ and $\rho^{RQ'}$ denote the states of $E$ and $RQ$, respectively, after the operation. The equality in (6) comes from the fact that the system $RQE$ remains in a pure state after the process. So $S_e(\rho^Q, \mathcal{E})$ measures the amount of noise introduced into the system $RQ$ as a consequence of becoming entangled with $E$, and vice versa. A convenient expression in terms of the original operation elements $\{A_i\}$ is

$$S_e(\rho^Q, \mathcal{E}) = S(W) = -\mathrm{tr}(W \log_2 W) \tag{7}$$

with

$$W_{ij} = \frac{\mathrm{tr}(A_i \rho^Q A_j^\dagger)}{\mathrm{tr}(\mathcal{E}(\rho^Q))} \tag{8}$$
Observe that if there is only one operation element (or, equivalently, if they are all the same) then the entropy exchange is zero. The noise interpretation of $S_e$ is also evident from the formula for coherent information

$$I_c(\rho^Q, \mathcal{E}) = S(\hat{\mathcal{E}}(\rho^Q)) - S_e(\rho^Q, \mathcal{E}) \tag{9}$$

which appears in the channel capacity formula. When compared to its classical counterpart $I(X;Y) = H(Y) - H(Y|X)$ we see that $S_e(\rho^Q, \mathcal{E})$ plays a role analogous to the noise term $H(Y|X)$.

We end this brief review with the definition of entanglement fidelity

$$F_e(\rho^Q, \mathcal{E}) = \langle\Psi^{RQ}|(I^R \otimes \mathcal{E})(|\Psi^{RQ}\rangle\langle\Psi^{RQ}|)|\Psi^{RQ}\rangle \tag{10}$$

which tells us how well the state is preserved, and how well its entanglement with the surroundings $R$, which do not participate directly in the quantum process, is preserved under the operation in question. Like any meaningful quantity it has an expression which is manifestly purification independent:

$$F_e(\rho^Q, \mathcal{E}) = \frac{\sum_i |\mathrm{tr}(A_i \rho^Q)|^2}{\mathrm{tr}(\mathcal{E}(\rho^Q))} \tag{11}$$
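As an illustration of Eqs. (7), (8) and (11), the following minimal Python sketch evaluates the entanglement fidelity and the entropy exchange for a single qubit with ρ = diag(p0, p1). It assumes, purely for simplicity, exactly two operation elements that are real and diagonal in the eigenbasis of ρ; the function names are hypothetical and are not taken from the paper.

```python
import math


def h2(x):
    """Binary Shannon entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def fidelity_and_entropy_exchange(p0, a, b):
    """Return (F_e, S_e) for rho = diag(p0, 1 - p0).

    Assumption of this sketch: the operation has two real diagonal
    elements A_1 = diag(a[0], a[1]) and A_2 = diag(b[0], b[1]).
    """
    p1 = 1.0 - p0
    # Unnormalized W_ij = tr(A_i rho A_j^dagger), cf. Eq. (8).
    w11 = p0 * a[0] ** 2 + p1 * a[1] ** 2
    w22 = p0 * b[0] ** 2 + p1 * b[1] ** 2
    w12 = p0 * a[0] * b[0] + p1 * a[1] * b[1]
    norm = w11 + w22  # tr(E(rho))
    # Eq. (11): sum_i |tr(A_i rho)|^2 / tr(E(rho)).
    fe = ((p0 * a[0] + p1 * a[1]) ** 2 + (p0 * b[0] + p1 * b[1]) ** 2) / norm
    # After normalization W is a 2x2 real symmetric matrix of unit trace,
    # so its eigenvalues are (1 +/- disc) / 2 and S(W) = h2 of either one.
    disc = math.sqrt((w11 - w22) ** 2 + 4.0 * w12 ** 2) / norm
    se = h2(0.5 * (1.0 + min(disc, 1.0)))
    return fe, se
```

For example, a phase-flip process with A_1 = sqrt(1 - q) I and A_2 = sqrt(q) Z acting on the isotropic source p0 = 1/2 gives F_e = 1 - q and S_e = h2(q), while a single operation element gives zero entropy exchange, in line with the remark following Eq. (8).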
We now follow Barnum [5] and restrict attention to i.i.d. sources with density matrix $\rho$, so that $\rho^{(n)} \equiv \rho^{\otimes n}$. An $(n, R)$ rate-distortion code consists of an encoding operation $\mathcal{C}^{(n)}$ from the source space $\rho^{(n)}$ to a block of $\lfloor nR \rfloor$ (henceforth abbreviated to $nR$) qubits, and a decoding operation $\mathcal{D}^{(n)}$ acting in the reverse direction. Here $R < 1$, so in effect we are compressing the $n$ qubit source to $nR$ qubits and then decompressing them back to $n$ qubits, in an attempt to recover the original with the maximum possible fidelity consistent with the value of $R$. Based on entanglement fidelity, Barnum defines a natural distortion measure for the rate-distortion code $(\mathcal{C}^{(n)}, \mathcal{D}^{(n)})$ [5]:

$$d_e(\rho^{(n)}, \mathcal{D}^{(n)} \circ \mathcal{C}^{(n)}) \equiv \frac{1}{n} \sum_{\alpha=1}^{n} \left(1 - F_e(\rho, T^\alpha)\right) \tag{12}$$
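with $T^\alpha$ being the marginal operation on the $\alpha$-th copy of $\rho$ induced by the encoding-decoding operation,

$$T^\alpha(\sigma) \equiv \mathrm{tr}_{1,\ldots,\alpha-1,\alpha+1,\ldots,n}\, \mathcal{D}^{(n)} \circ \mathcal{C}^{(n)}(\rho \otimes \cdots \otimes \rho \otimes \sigma \otimes \rho \otimes \cdots \otimes \rho) \tag{13}$$

A rate distortion pair $(R, d)$ is achievable for a given $\rho$ iff there exists a sequence of $(n, R)$ rate-distortion codes $(\mathcal{C}^{(n)}, \mathcal{D}^{(n)})$ such that

$$\lim_{n \to \infty} d_e(\rho^{(n)}, \mathcal{D}^{(n)} \circ \mathcal{C}^{(n)}) \le d \tag{14}$$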
Then the rate distortion function $R(d)$ is defined as the infimum of all $R$ for which $(R, d)$ is achievable.

We could approach the problem by dividing the encoding procedure into two steps. In the first step we would manipulate blocks of qubits of size $n$ via a quantum operation $\mathcal{E}$ in order to reduce the output von Neumann entropy per qubit to the desired rate $R$ while leaving the average distortion as low as possible. In the second step we take $N$ such blocks and process them in the standard noiseless coding way [2, 8] in order to get a string of $NnR$ qubits in the limit of large $N$. The decoding procedure just reverses the second step, which can be done with perfect fidelity in the large $N$ limit by the noiseless coding theorem.

This scheme is not quite general, however. An important condition on our quantum operation $\mathcal{E}(\rho^{\otimes n}) = \sum_{i=1}^{k} A_i \rho^{\otimes n} A_i^\dagger$ is that it must be trace preserving, $\sum_{i=1}^{k} A_i^\dagger A_i = 1$. Define quantum operations $\mathcal{E}_{A_i}(\rho^{\otimes n}) = A_i \rho^{\otimes n} A_i^\dagger$. A given decomposition $\{A_i\}$ of unity implies that with probability $\lambda_i = \mathrm{tr}(\mathcal{E}_{A_i}(\rho^{\otimes n}))$ the operation $\mathcal{E}_{A_i}$ is performed. Quantum mechanics forbids us to have control over which of the $k$ operations is performed, but we do have information about which operation has been performed. The optimal procedure is to group the output blocks of $n$ qubits according to which operation was carried out, and then perform Schumacher encoding and decoding separately on each group. This way we make use of all the classical information available, the only penalty being in having to store information about which qubit block was coded using which operation, so that the decoder may unscramble them properly. The average rate associated with this scheme is $R_n = \sum_{i=1}^{k} \lambda_i S(\hat{\mathcal{E}}_{A_i}(\rho^{\otimes n}))$. This does better than simply ignoring the classical information and coding everything together, since the distortion in either case is the same, $d_e(\rho^{\otimes n}, \mathcal{E}) = \sum_{i=1}^{k} \lambda_i d_e(\rho^{\otimes n}, \mathcal{E}_{A_i}) \equiv d$ (this is easily shown e.g. by induction on $n$), but the rate in the latter case is greater by the concavity of von Neumann entropy [9]:

$$S(\hat{\mathcal{E}}(\rho^{\otimes n})) = S\!\left(\sum_{i=1}^{k} \lambda_i \hat{\mathcal{E}}_{A_i}(\rho^{\otimes n})\right) \ge \sum_{i=1}^{k} \lambda_i S(\hat{\mathcal{E}}_{A_i}(\rho^{\otimes n})) = R_n \tag{15}$$
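The inequality (15) can be checked numerically in the simplest setting, n = 1. The sketch below assumes a two-element trace-preserving operation with diagonal elements A_1 = diag(cos a, cos b) and A_2 = diag(sin a, sin b); this choice and all function names are illustrative assumptions, not part of the original text.

```python
import math


def h2(x):
    """Binary Shannon entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def mixed_and_grouped_rates(p0, a, b):
    """Compare S(E(rho)) with sum_i lambda_i S(E_hat_{A_i}(rho)) for one qubit.

    Assumed operation elements: A_1 = diag(cos a, cos b) and
    A_2 = diag(sin a, sin b), so that A_1^2 + A_2^2 = I.
    The source is rho = diag(p0, 1 - p0).
    """
    p1 = 1.0 - p0
    grouped = 0.0           # rate when outputs are sorted by outcome
    mixed_population = 0.0  # weight on |0> when the outcome is ignored
    for f in (math.cos, math.sin):
        w0 = p0 * f(a) ** 2  # unnormalized population of |0>
        w1 = p1 * f(b) ** 2  # unnormalized population of |1>
        lam = w0 + w1        # probability lambda_i of this outcome
        if lam > 0.0:
            grouped += lam * h2(w0 / lam)
        mixed_population += w0
    mixed = h2(mixed_population)
    return mixed, grouped
```

Because all states here are diagonal, the von Neumann entropies reduce to binary Shannon entropies. The returned `grouped` value never exceeds `mixed`, and the two coincide when both elements are proportional to the identity (a = b).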
The corresponding intuitive argument is that the operation $\mathcal{E}$ increases the entropy exchange, which we interpreted as noise, whereas the individual $\mathcal{E}_{A_i}$ do not. Finally, the rate distortion function will be achieved in the limit of large $n$ as well as large $N$, $R(d) = \lim_{n \to \infty} R_n(d)$. This limit indeed exists since the $R_n(d)$ are non-increasing and bounded from below by zero. In the next section we analyze the $n = 1$ case. Subsequently we demonstrate the perhaps surprising fact that $n = 1$ already attains the $R(d)$ curve in the i.i.d. case under consideration.
3 The rate-distortion function for n = 1
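Let us temporarily restrict attention to $k = 1$, so that (4) becomes $\mathcal{E}(\sigma) = A \sigma A^\dagger$, and also temporarily ignore the trace-preserving constraint. First a technical lemma:

Lemma 1 Let $\Delta$ and $\Lambda$ be positive diagonal matrices whose diagonal elements are given in non-ascending order. Then for any unitary $U$ and $V$ the inequality $|\mathrm{tr}(U \Delta V \Lambda)| \le \mathrm{tr}(\Delta \Lambda)$ holds.

Proof Consider the Cauchy-Schwarz inequality for the Hilbert-Schmidt inner product [7] $\langle A, B \rangle \equiv \mathrm{tr}(A B^\dagger)$, namely

$$|\mathrm{tr}(A B^\dagger)|^2 \le \mathrm{tr}(A A^\dagger)\, \mathrm{tr}(B B^\dagger) \tag{16}$$

Since $\Delta$ and $\Lambda$ are positive we have $\Delta = \sqrt{\Delta} \sqrt{\Delta}^\dagger$ and $\Lambda = \sqrt{\Lambda} \sqrt{\Lambda}^\dagger$. Setting $A = \sqrt{\Delta}\, V \sqrt{\Lambda}$ and $B = \sqrt{\Delta}\, U^\dagger \sqrt{\Lambda}$ we find that

$$|\mathrm{tr}(U \Delta V \Lambda)|^2 \le \mathrm{tr}(U \Delta U^\dagger \Lambda)\, \mathrm{tr}(V^\dagger \Delta V \Lambda) \tag{17}$$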
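so without loss of generality we may take $V = U^\dagger$. Next, denote the elements of $U$ and the diagonal elements of $\Delta$ and $\Lambda$ by $\{u_{ij}\}$, $\{\delta_i\}$ and $\{\lambda_i\}$ respectively. Defining the matrix $P$ with elements $p_{ij} = |u_{ij}|^2$ we have

$$\mathrm{tr}(U \Delta U^\dagger \Lambda) = \sum_{i,j} u_{ij} \delta_j u_{ij}^* \lambda_i = \sum_{i,j} p_{ij} \delta_j \lambda_i \tag{18}$$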
Since the elements of each row and column of $P$ add up to 1, $P$ is a doubly stochastic matrix, and hence a convex combination of permutation matrices [9]. So the maximum value of $\mathrm{tr}(U \Delta U^\dagger \Lambda)$ is equal to $\sum_i \delta_i' \lambda_i$ with $\delta_i'$ a permutation of the $\delta_i$. It can be shown in general that $P = I$ corresponds to the optimum permutation; this is especially easy to see for $2 \times 2$ matrices, for which the ordering condition implies $(\lambda_1 - \lambda_2)(\delta_1 - \delta_2) \ge 0$, or $\lambda_1 \delta_1 + \lambda_2 \delta_2 \ge \lambda_1 \delta_2 + \lambda_2 \delta_1$. Therefore $U = V = I$ maximizes $|\mathrm{tr}(U \Delta V \Lambda)|$, as claimed in the Lemma.

Theorem 1 For all single qubit quantum operations $\mathcal{E}_A(\rho) = A \rho A^\dagger$, there exists a quantum operation $\mathcal{E}_D(\rho) = D \rho D^\dagger$ with $[D, \rho] = 0$ and $D$ positive, of the same output entropy and smaller or equal distortion.

Proof We work in the basis $\{|0\rangle, |1\rangle\}$ in which $\rho$ is diagonal, so $\rho = p_0 |0\rangle\langle 0| + p_1 |1\rangle\langle 1|$ with $p_0 + p_1 = 1$ and $p_0 \ge p_1$. It is easy to see that any complex matrix $A$ can be expressed as a product $A = U D \rho^{1/2} V \rho^{-1/2}$, where $U$ and $V$ are unitary and $D$ is diagonal positive (and hence commutes with $\rho$). This follows from applying the polar decomposition of any complex matrix $B$, namely $B = U \Delta V$. Here $U$ and $V$ are unitary, $\Delta$ is diagonal positive with non-ascending elements, and we choose $B = A \rho^{1/2}$, $D = \Delta \rho^{-1/2}$. Such a decomposition ensures that $A \rho A^\dagger = U (D \rho D^\dagger) U^\dagger$, so that $\mathrm{tr}(A \rho A^\dagger) = \mathrm{tr}(D \rho D^\dagger)$ and $S(\hat{\mathcal{E}}_A) = S(\hat{\mathcal{E}}_D)$. In addition, Lemma 1 asserts that $|\mathrm{tr}(A \rho)| \le |\mathrm{tr}(D \rho)|$. Combining the above with the single qubit distortion formula

$$d_e(\rho, \mathcal{E}_A) = 1 - \frac{|\mathrm{tr}(A \rho)|^2}{\mathrm{tr}(A \rho A^\dagger)}, \tag{19}$$
we see that the operation $\mathcal{E}_D$ has the same output entropy but a distortion that is less than or equal to that of $\mathcal{E}_A$, thus proving the statement of the Theorem. ⋆

[Figure 1: plot of S1 (vertical axis, 0 to 1) against d (horizontal axis, 0 to 0.5), with one curve for each of p0 = 0.5, 0.6, 0.7, 0.8 and 0.9.]

Fig. 1. The lower bound S1(d) on the single qubit rate distortion function plotted for p0 = 0.5, 0.6, 0.7, 0.8 and 0.9
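Lemma 1, on which Theorem 1 rests, can be spot-checked numerically for 2 × 2 matrices. The sketch below draws unitaries from an explicit parametrization (an assumption of this example; the sampling is not Haar-distributed) and records the smallest value of tr(ΔΛ) − |tr(UΔVΛ)| over many trials. All function names are hypothetical.

```python
import cmath
import math
import random


def random_unitary(rng):
    """A 2x2 unitary from an explicit angle parametrization (not Haar)."""
    t = rng.uniform(0.0, math.pi / 2)
    a = rng.uniform(0.0, 2.0 * math.pi)
    b = rng.uniform(0.0, 2.0 * math.pi)
    g = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(t), math.sin(t)
    ph = cmath.exp(1j * g)
    return [
        [ph * cmath.exp(1j * a) * c, ph * cmath.exp(1j * b) * s],
        [-ph * cmath.exp(-1j * b) * s, ph * cmath.exp(-1j * a) * c],
    ]


def lemma1_gap(delta, lam, u, v):
    """tr(Delta Lambda) - |tr(U Delta V Lambda)| for diagonal Delta, Lambda.

    delta and lam hold the diagonal entries in non-ascending order.
    """
    tr = sum(u[i][j] * delta[j] * v[j][i] * lam[i]
             for i in range(2) for j in range(2))
    return sum(d * l for d, l in zip(delta, lam)) - abs(tr)


def worst_gap(trials=500, seed=0):
    """Smallest gap found over random instances; Lemma 1 says it is >= 0."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        delta = sorted((rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)), reverse=True)
        lam = sorted((rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)), reverse=True)
        gap = lemma1_gap(delta, lam, random_unitary(rng), random_unitary(rng))
        worst = min(worst, gap)
    return worst
```

A non-negative result from `worst_gap()` (up to rounding) is consistent with the Lemma, and the gap vanishes for U = V = I. A random search of this kind is only a sanity check and does not replace the proof.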
Theorem 1 gives a complete parametrization for the unphysical $n = k = 1$ curve $S_1(d)$, since $A$ is defined only up to a multiplicative constant. It is easy to see that in the $\{|0\rangle, |1\rangle\}$ basis

$$A = \begin{pmatrix} \cos\theta & 0 \\ 0 & \sin\theta \end{pmatrix}, \quad \theta \in \left[0, \tfrac{\pi}{4}\right] \tag{20}$$
interpolates between the zero distortion limit $A = I$, where $S = S(\rho)$, and the zero entropy limit $A = |0\rangle\langle 0|$, where we replace the source with the pure "best guess" state $|0\rangle\langle 0|$. This curve is shown in Fig. 1 for several values of $p_0$. It is easily verified to be convex. This serves as a lower bound for the physical $n = 1$ rate-distortion curve (i.e. the one generated by trace preserving operations). Indeed, for any decomposition of unity $\sum_i A_i^\dagger A_i = 1$ and $\lambda_i = \mathrm{tr}(\mathcal{E}_{A_i}(\rho))$ we have

$$\sum_{i=1}^{k} \lambda_i S(\hat{\mathcal{E}}_{A_i}(\rho)) \ge \sum_{i=1}^{k} \lambda_i S_1(d_e(\rho, \mathcal{E}_{A_i})) \ge S_1\!\left(\sum_{i=1}^{k} \lambda_i d_e(\rho, \mathcal{E}_{A_i})\right) \tag{21}$$
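by the convexity of $S_1(d)$. In the case of $p_0 = \tfrac{1}{2}$, due to isotropy, this lower bound is attainable with $k = 2$,

$$A_1 = \begin{pmatrix} \cos\theta & 0 \\ 0 & \sin\theta \end{pmatrix}, \quad A_2 = \begin{pmatrix} \sin\theta & 0 \\ 0 & \cos\theta \end{pmatrix}, \quad \theta \in \left[0, \tfrac{\pi}{4}\right] \tag{22}$$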
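The curve S1(d) and the attainability claim for p0 = 1/2 can be made concrete with a short Python sketch. It evaluates the (distortion, entropy) point generated by Eq. (20) via Eq. (19), and the average rate and distortion of the two-element code (22). The function names are hypothetical, and only diagonal real operators are handled.

```python
import math


def h2(x):
    """Binary Shannon entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def s1_point(p0, theta):
    """(d, S) for the single element A = diag(cos theta, sin theta), Eq. (20)."""
    p1 = 1.0 - p0
    c, s = math.cos(theta), math.sin(theta)
    norm = p0 * c * c + p1 * s * s                   # tr(A rho A^dagger)
    entropy = h2(p0 * c * c / norm)                  # entropy of normalized output
    distortion = 1.0 - (p0 * c + p1 * s) ** 2 / norm  # Eq. (19)
    return distortion, entropy


def isotropic_code_point(theta):
    """(d, rate) of the trace-preserving pair in Eq. (22) for p0 = 1/2."""
    c, s = math.cos(theta), math.sin(theta)
    rate = 0.0
    fidelity = 0.0
    for a0, a1 in ((c, s), (s, c)):
        lam = 0.5 * (a0 * a0 + a1 * a1)        # outcome probability
        rate += lam * h2(0.5 * a0 * a0 / lam)  # weighted output entropy
        fidelity += (0.5 * (a0 + a1)) ** 2     # |tr(A_i rho)|^2
    return 1.0 - fidelity, rate
```

In this sketch θ = π/4 reproduces the zero-distortion end point (d = 0, S = S(ρ)) and θ = 0 the zero-entropy end point, which for the single-element curve sits at d = p1. For p0 = 1/2 the two functions return the same point for every θ, which is the attainability statement above.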
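The case $p_0 > \tfrac{1}{2}$ is not as obvious. First we would like to show that $k = 2$ suffices. We fix $A_1$ and vary $A_i$, $2 \le i \le k$. We use Lagrange multipliers and seek the minimum of

$$\sum_{i=2}^{k} \mathrm{tr}(A_i \rho A_i^\dagger)\, S\!\left(\frac{A_i \rho A_i^\dagger}{\mathrm{tr}(A_i \rho A_i^\dagger)}\right) - \mu \sum_{i=2}^{k} |\mathrm{tr}(A_i \rho)|^2 - \sum_{i=2}^{k} \mathrm{tr}(\Lambda A_i^\dagger A_i) \tag{23}$$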
Differentiating with respect to $A_i$ and $A_i^\dagger$ and setting this to zero we obtain an equation involving only $A_i$, $A_i^\dagger$, $\mu$ and $\Lambda$. So evidently a solution is obtained for $A_2 = \ldots = A_k$. This has the same effect as retaining only $A_2$. So $k = 2$ includes natural solutions to the extremum problem; motivated by the $p_0 = \tfrac{1}{2}$ case, we conjecture that the global minimum is among them. Restricting attention to $k = 2$, we concentrate on the case where $A_1$ and $A_2$ are diagonal and use the following parametrization:

$$A_1 = \begin{pmatrix} \cos\alpha & 0 \\ 0 & \cos(\alpha + \Delta) \end{pmatrix}, \quad A_2 = \begin{pmatrix} \sin\alpha & 0 \\ 0 & \sin(\alpha + \Delta) \end{pmatrix}, \quad \Delta \in \left[0, \tfrac{\pi}{2}\right] \tag{24}$$
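and $d = 2 p_0 p_1 (1 - \cos\Delta)$. Here $\alpha$ is a function of $\Delta$ such that

$$S = \sum_{i=1}^{2} \mathrm{tr}(A_i \rho A_i^\dagger)\, S\!\left(\frac{A_i \rho A_i^\dagger}{\mathrm{tr}(A_i \rho A_i^\dagger)}\right) \tag{25}$$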
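is minimized. Differentiating with respect to $\alpha$ we arrive at

$$
\begin{aligned}
&2 p_0 p_1 \sin\Delta \left[ \frac{\cos\alpha \cos(\alpha+\Delta)}{p_0 \cos^2\alpha + p_1 \cos^2(\alpha+\Delta)} \log_2 \frac{p_1 \cos^2(\alpha+\Delta)}{p_0 \cos^2\alpha} + \frac{\sin\alpha \sin(\alpha+\Delta)}{p_0 \sin^2\alpha + p_1 \sin^2(\alpha+\Delta)} \log_2 \frac{p_1 \sin^2(\alpha+\Delta)}{p_0 \sin^2\alpha} \right] \\
&\quad + \left(p_0 \sin 2\alpha + p_1 \sin 2(\alpha+\Delta)\right) \left[ h_2\!\left(\frac{p_0 \sin^2\alpha}{p_0 \sin^2\alpha + p_1 \sin^2(\alpha+\Delta)}\right) - h_2\!\left(\frac{p_0 \cos^2\alpha}{p_0 \cos^2\alpha + p_1 \cos^2(\alpha+\Delta)}\right) \right] = 0
\end{aligned}
\tag{26}
$$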
which we solve numerically. Here $h_2(\lambda) \equiv -\lambda \log_2(\lambda) - (1 - \lambda) \log_2(1 - \lambda)$ is the Shannon binary entropy function. The function $\alpha(\Delta)$ is plotted in Fig. 2 for several values of $p_0$. We also plot the corresponding rate-distortion curves in Fig. 3. The curves are convex and approach $d_{\max} = 2 p_0 p_1$ with zero slope. Note that the $p_0 = \tfrac{1}{2}$ solution is precisely the one obtained previously.
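For completeness, the expression d = 2 p0 p1 (1 − cos Δ) quoted with Eq. (24) follows directly from the distortion d = 1 − Σ_i |tr(A_i ρ)|² of a trace-preserving operation. The short derivation below is a worked check under that formula, not part of the original argument:

```latex
\begin{aligned}
d &= 1 - \sum_{i=1}^{2} |\mathrm{tr}(A_i \rho)|^2 \\
  &= 1 - \bigl(p_0 \cos\alpha + p_1 \cos(\alpha+\Delta)\bigr)^2
       - \bigl(p_0 \sin\alpha + p_1 \sin(\alpha+\Delta)\bigr)^2 \\
  &= 1 - p_0^2 - p_1^2
       - 2 p_0 p_1 \bigl(\cos\alpha \cos(\alpha+\Delta) + \sin\alpha \sin(\alpha+\Delta)\bigr) \\
  &= 2 p_0 p_1 - 2 p_0 p_1 \cos\Delta
   = 2 p_0 p_1 (1 - \cos\Delta),
\end{aligned}
```

where the last line uses 1 − p0² − p1² = 2 p0 p1, which holds because p0 + p1 = 1. The distortion therefore depends on Δ alone, and Δ = π/2 gives d_max = 2 p0 p1.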
[Figure 2: plot of Alpha (vertical axis, 0 to 0.8) against Delta (horizontal axis, 0 to 1.6), with one curve for each of p0 = 0.5, 0.6, 0.7, 0.8 and 0.9.]

Fig. 2. The function α(∆) that solves (26) plotted for p0 = 0.5, 0.6, 0.7, 0.8 and 0.9
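A curve such as α(Δ) can be approximated without solving the stationarity condition explicitly, by searching directly for the α with the lowest average entropy (25) at fixed Δ. The sketch below is one possible way to do this and makes two assumptions that are not stated in the text: α is restricted to [0, π/2 − Δ] so that every entry of A_1 and A_2 in Eq. (24) is non-negative, and a coarse grid scan followed by a ternary search is adequate. All names are hypothetical.

```python
import math


def h2(x):
    """Binary Shannon entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def distortion(p0, delta):
    """d = 2 p0 p1 (1 - cos Delta) for the parametrization (24)."""
    return 2.0 * p0 * (1.0 - p0) * (1.0 - math.cos(delta))


def avg_entropy(p0, alpha, delta):
    """Average output entropy, Eq. (25), for the diagonal pair (24)."""
    p1 = 1.0 - p0
    total = 0.0
    for f in (math.cos, math.sin):
        w0 = p0 * f(alpha) ** 2
        w1 = p1 * f(alpha + delta) ** 2
        lam = w0 + w1  # probability of this outcome
        if lam > 0.0:
            total += lam * h2(w0 / lam)
    return total


def best_alpha(p0, delta, grid=2000, iters=60):
    """Approximate minimizer of avg_entropy over alpha in [0, pi/2 - delta]."""
    hi = math.pi / 2 - delta
    if hi <= 0.0:
        return 0.0
    step = hi / grid
    # Coarse scan to bracket the lowest grid point.
    k = min(range(grid + 1), key=lambda j: avg_entropy(p0, j * step, delta))
    lo_b = max(0.0, (k - 1) * step)
    hi_b = min(hi, (k + 1) * step)
    # Ternary search inside the bracket (assumes one minimum there).
    for _ in range(iters):
        m1 = lo_b + (hi_b - lo_b) / 3.0
        m2 = hi_b - (hi_b - lo_b) / 3.0
        if avg_entropy(p0, m1, delta) < avg_entropy(p0, m2, delta):
            hi_b = m2
        else:
            lo_b = m1
    return 0.5 * (lo_b + hi_b)
```

For p0 = 1/2 the search recovers the symmetric solution α = π/4 − Δ/2, which corresponds to Eq. (22) with θ = α. For biased sources the routine only returns a numerical candidate, to be compared against the stationarity condition (26).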
Now we show that this diagonal solution is optimal with respect to local perturbations of the $\{A_i\}$. Recall that we wish to find the optimal tradeoff between $S$ defined in (25) and $d = 1 - \sum_i |\mathrm{tr}(A_i \rho)|^2$ such that $\sum_i A_i^\dagger A_i = 1$. Notice that both $S$ and the trace preserving condition are invariant under the transformation $A_i \to U_i A_i$, where the $U_i$ are unitary matrices. Furthermore $|\mathrm{tr}(U_i A_i \rho)| \le |\mathrm{tr}(A_i \rho)|$ when $A_i \rho$ is positive (see Lemma 2 below), and we may always pick $U_i$ to achieve this upper bound. This can be seen from the polar decomposition $A_i \rho = V_i D_i W_i$ and choosing $U_i = (V_i W_i)^{-1}$. Therefore we restrict attention to positive $A_i \rho$ and use a new parametrization:

$$A_1 = f \begin{pmatrix} \dfrac{\lambda \cos\theta}{p_0} & \dfrac{x \sin\theta}{p_1} \\ \dfrac{x^* \sin\theta}{p_0} & \dfrac{(1-\lambda) \cos\theta}{p_1} \end{pmatrix}, \quad A_2 = f \begin{pmatrix} \dfrac{\mu \sin\theta}{p_0} & -\dfrac{x \cos\theta}{p_1} \\ -\dfrac{x^* \cos\theta}{p_0} & \dfrac{(1-\mu) \sin\theta}{p_1} \end{pmatrix} \tag{27}$$

in terms of $\theta$ and complex $x$. Here $\lambda$ and $\mu$ are functions of $|x|$ determined by the conditions

$$\lambda^2 \cos^2\theta + \mu^2 \sin^2\theta = \frac{p_0^2}{f^2} - |x|^2 \tag{28}$$

$$(1-\lambda)^2 \cos^2\theta + (1-\mu)^2 \sin^2\theta = \frac{p_1^2}{f^2} - |x|^2 \tag{29}$$
and $d = 1 - f^2$. We see from the expansion about $x = 0$ that $\lambda$ and $\mu$ are both quadratic in $|x|$. It is also easy to see that the traces and determinants of the $A_i \rho A_i^\dagger$ (and hence the eigenvalues) also have no terms linear in $x$. Expanding to second order about the optimal diagonal solution, we verify that $S$ is indeed at a local minimum with respect to varying $x$. We thus conclude our argument that the $n = 1$ rate-distortion curves $R_1(d)$ are the ones depicted in Fig. 3.
[Figure 3: plot of R1 (vertical axis, 0 to 1) against d (horizontal axis, 0 to 0.5), with one curve for each of p0 = 0.5, 0.6, 0.7, 0.8 and 0.9.]

Fig. 3. The single qubit rate distortion function R1(d) plotted for p0 = 0.5, 0.6, 0.7, 0.8 and 0.9
4 The rate-distortion function for general n
Now we move to the general $n$ case and argue that we cannot do any better than $R_1(d)$. We have $n$ qubits with joint density operator $\rho^{\otimes n}$, and we consider appropriate combinations of quantum operations $\mathcal{E}_A(\rho^{\otimes n}) = A (\rho^{\otimes n}) A^\dagger$. We work in the basis $\mathcal{B}^n = \{|0\rangle, |1\rangle\}^n$ with $|0\rangle$ and $|1\rangle$ defined as before. In this basis the system operator $A$ is given by

$$A = \begin{pmatrix} B & K \\ L & C \end{pmatrix} \tag{30}$$
where $B$, $K$, $L$ and $C$ are $2^{n-1} \times 2^{n-1}$ matrices acting on the last $n - 1$ qubits. It is easy to verify that the restriction $\mathcal{E}^{>}$ of $\mathcal{E}$ to the last $n - 1$ qubits is given by the set $\{\sqrt{p_0} B, \sqrt{p_1} K, \sqrt{p_0} L, \sqrt{p_1} C\}$ of operation elements. We first restrict attention to processes with $A$ diagonal in the $\mathcal{B}^n$ basis.

Theorem 2 General $n$-qubit trace-preserving processes with operation elements $\{A_i\}$ diagonal in the $\mathcal{B}^n$ basis cannot perform below the single qubit rate-distortion curve $R_1(d)$.

Proof We prove the theorem using induction on $n$. It is true for $n = 1$ by the results of the previous section. Let us now assume it holds for $n$, and then show its validity for $n + 1$. We work in the $\mathcal{B}^{n+1}$ basis, where $A_i$ is represented by a $2^{n+1} \times 2^{n+1}$ dimensional matrix

$$A_i = \begin{pmatrix} \frac{1}{\sqrt{p_0}} B_i & 0 \\ 0 & \frac{1}{\sqrt{p_1}} C_i \end{pmatrix} \tag{31}$$
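with $B_i$ and $C_i$ both diagonal $2^n \times 2^n$ matrices acting on the last $n$ qubits. Then the projection of $\mathcal{E}_{A_i}$ onto the last $n$ qubits is $\mathcal{E}^{>}_{A_i}(\rho^{\otimes n}) = B_i \rho^{\otimes n} B_i^\dagger + C_i \rho^{\otimes n} C_i^\dagger$. We also have from (31) that

$$\mathcal{E}_{A_i}(\rho^{\otimes (n+1)}) = \begin{pmatrix} B_i \rho^{\otimes n} B_i^\dagger & 0 \\ 0 & C_i \rho^{\otimes n} C_i^\dagger \end{pmatrix} \tag{32}$$

Then the normalized projection of $\mathcal{E}_{A_i}$ onto the first qubit is

$$\hat{\mathcal{E}}^{1}_{A_i}(\rho) = \begin{pmatrix} \lambda_i & 0 \\ 0 & 1 - \lambda_i \end{pmatrix} \tag{33}$$

where $\lambda_i = \dfrac{\mathrm{tr}(\mathcal{E}_{B_i}(\rho^{\otimes n}))}{\mathrm{tr}(\mathcal{E}_{A_i}(\rho^{\otimes (n+1)}))}$.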
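The average distortion associated with the coding procedure defined by the $\{A_i\}$ is

$$d = \frac{n}{n+1}\, d^{>} + \frac{1}{n+1}\, d^{1} \tag{34}$$

where

$$d^{>} = \sum_i \left[ \mathrm{tr}(\mathcal{E}_{B_i}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}_{B_i}) + \mathrm{tr}(\mathcal{E}_{C_i}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}_{C_i}) \right] \tag{35}$$

and

$$d^{1} = \sum_i d_e(\rho, \mathcal{E}^{1}_{A_i}) \tag{36}$$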
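Using the simple identity

$$S(\lambda \rho_1 \oplus (1 - \lambda) \rho_2) = \lambda S(\rho_1) + (1 - \lambda) S(\rho_2) + h_2(\lambda) \tag{37}$$

we find that

$$S(\hat{\mathcal{E}}_{A_i}(\rho^{\otimes (n+1)})) = \lambda_i S(\hat{\mathcal{E}}_{B_i}(\rho^{\otimes n})) + (1 - \lambda_i) S(\hat{\mathcal{E}}_{C_i}(\rho^{\otimes n})) + h_2(\lambda_i) \tag{38}$$

Hence:

$$
\begin{aligned}
\frac{1}{n+1} \sum_i \mathrm{tr}(\mathcal{E}_{A_i}(\rho^{\otimes (n+1)}))\, S(\hat{\mathcal{E}}_{A_i}(\rho^{\otimes (n+1)}))
&= \frac{n}{n+1} \left[ \frac{1}{n} \sum_i \left( \mathrm{tr}(\mathcal{E}_{B_i}(\rho^{\otimes n}))\, S(\hat{\mathcal{E}}_{B_i}(\rho^{\otimes n})) + \mathrm{tr}(\mathcal{E}_{C_i}(\rho^{\otimes n}))\, S(\hat{\mathcal{E}}_{C_i}(\rho^{\otimes n})) \right) \right] \\
&\quad + \frac{1}{n+1} \sum_i \mathrm{tr}(\mathcal{E}^{1}_{A_i}(\rho))\, S(\hat{\mathcal{E}}^{1}_{A_i}(\rho)) \\
&\ge \frac{n}{n+1} R_1(d^{>}) + \frac{1}{n+1} R_1(d^{1}) \ge R_1(d)
\end{aligned}
\tag{39}
$$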
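The direct-sum identity (37) used in this chain is easy to confirm numerically. Since the von Neumann entropy depends only on eigenvalues, and the spectrum of λρ1 ⊕ (1 − λ)ρ2 is the union of the rescaled spectra of ρ1 and ρ2, the sketch below works with eigenvalue lists; the function names are hypothetical.

```python
import math


def shannon(probs):
    """Entropy in bits of a list of eigenvalues (zero entries contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def direct_sum_entropy(lam, spec1, spec2):
    """Entropy of lam * rho1 (+) (1 - lam) * rho2 from the two spectra."""
    return shannon([lam * p for p in spec1] + [(1.0 - lam) * p for p in spec2])


def identity_37_rhs(lam, spec1, spec2):
    """Right-hand side of Eq. (37); h2(lam) is shannon([lam, 1 - lam])."""
    return (lam * shannon(spec1) + (1.0 - lam) * shannon(spec2)
            + shannon([lam, 1.0 - lam]))
```

The two functions agree to rounding error for any probability lam and any pair of normalized spectra, which is the content of Eq. (37).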
The equality comes from (33), (38) and the fact that $\mathrm{tr}(\mathcal{E}_{A_i}(\rho^{\otimes (n+1)})) = \mathrm{tr}(\mathcal{E}^{1}_{A_i}(\rho))$, the first inequality comes from the inductive hypothesis, and the second inequality is a consequence of the convexity of $R_1(d)$ and (34). So the rate for $\{A_i\}$ is higher than or equal to $R_1(d)$ at the same distortion, as claimed. ⋆

Finally, it remains to show that for general $n$ diagonal processes are optimal. This may be shown exactly in the case of $p_0 = \tfrac{1}{2}$ due to its many simplifying features. We begin with two lemmas.

Lemma 2 Given matrices $\{Y_i\}$ with $\sum_i Y_i^\dagger Y_i = I$ and positive $D$, the inequality $\sum_i |\mathrm{tr}(Y_i D)|^2 \le |\mathrm{tr}(D)|^2$ holds.

Proof We use the fact that $D = \sqrt{D} \sqrt{D}^\dagger$ for $D$ positive and employ the Cauchy-Schwarz inequality (16):

$$\sum_i |\mathrm{tr}(Y_i D)|^2 = \sum_i |\mathrm{tr}((Y_i \sqrt{D}) \sqrt{D}^\dagger)|^2 \le \sum_i \mathrm{tr}(Y_i D Y_i^\dagger)\, \mathrm{tr}(D) = |\mathrm{tr}(D)|^2 \tag{40}$$
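The last equality comes from the cyclicity and linearity of trace. ⋆

Lemma 3 Given operators $\{Y_i\}$ acting on $n$ qubits with $\sum_i Y_i^\dagger Y_i = I$ and positive $D$, diagonal in the $\mathcal{B}^n$ basis, the inequality

$$\sum_i \mathrm{tr}(\mathcal{E}_{Y_i D}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}_{Y_i D}) \ge \mathrm{tr}(\mathcal{E}_D(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}_D)$$

holds.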
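Proof We again use induction. The $n = 1$ case follows from Lemma 2. Assuming the Lemma holds for $n$ we prove it for $n + 1$. Consider $2^{n+1} \times 2^{n+1}$ dimensional matrices $\{Y_i\}$, and let

$$Y_i = \begin{pmatrix} E_i & F_i \\ G_i & H_i \end{pmatrix}, \quad D = \begin{pmatrix} \frac{1}{\sqrt{p_0}} D_0 & 0 \\ 0 & \frac{1}{\sqrt{p_1}} D_1 \end{pmatrix} \tag{41}$$

with $E_i$ etc. of dimension $2^n \times 2^n$. Then $\sum_i Y_i^\dagger Y_i = I$ implies that

$$\sum_i \left( E_i^\dagger E_i + G_i^\dagger G_i \right) = I \tag{42}$$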
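and similarly for $F_i$ and $H_i$. The restriction $\mathcal{E}^{>}_{Y_i D}$ of $\mathcal{E}_{Y_i D}$ onto the last $n$ qubits is described by the set $\{E_i D_0, F_i D_1, G_i D_0, H_i D_1\}$. Then

$$
\begin{aligned}
\sum_i \mathrm{tr}(\mathcal{E}^{>}_{Y_i D}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}^{>}_{Y_i D})
&= \sum_i \Big[ \mathrm{tr}(\mathcal{E}_{E_i D_0}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}_{E_i D_0}) + \mathrm{tr}(\mathcal{E}_{F_i D_1}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}_{F_i D_1}) \\
&\qquad\;\; + \mathrm{tr}(\mathcal{E}_{G_i D_0}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}_{G_i D_0}) + \mathrm{tr}(\mathcal{E}_{H_i D_1}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}_{H_i D_1}) \Big] \\
&\ge \mathrm{tr}(\mathcal{E}_{D_0}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}_{D_0}) + \mathrm{tr}(\mathcal{E}_{D_1}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}_{D_1}) \\
&= \mathrm{tr}(\mathcal{E}^{>}_{D}(\rho^{\otimes n}))\, d_e(\rho^{\otimes n}, \mathcal{E}^{>}_{D})
\end{aligned}
\tag{43}
$$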
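The inequality comes from the inductive hypothesis and (42). Finally, this result is invariant under permutations of the qubits; averaging over all permutations yields

$$\sum_i \mathrm{tr}(\mathcal{E}_{Y_i D}(\rho^{\otimes (n+1)}))\, d_e(\rho^{\otimes (n+1)}, \mathcal{E}_{Y_i D}) \ge \mathrm{tr}(\mathcal{E}_D(\rho^{\otimes (n+1)}))\, d_e(\rho^{\otimes (n+1)}, \mathcal{E}_D) \tag{44}$$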
This proves the Lemma. ⋆

Theorem 3 General $n$-qubit processes cannot perform below the single qubit entropy-distortion curve $S_1(d)$ in the case of isotropic sources ($p_0 = \tfrac{1}{2}$).

Proof This is an immediate consequence of Lemma 3. We ignore the trace preserving condition for the time being and consider $\mathcal{E}_A(\rho^{\otimes n}) = A (\rho^{\otimes n}) A^\dagger$. Then we use the polar decomposition $A = U D V$ with $U$ and $V$ unitary and $D$ diagonal positive. Using the fact that $\rho = \tfrac{1}{2} I$ it is easy to see that $\mathrm{tr}(\mathcal{E}_A(\rho^{\otimes n})) = \mathrm{tr}(\mathcal{E}_D(\rho^{\otimes n}))$, $S(\hat{\mathcal{E}}_A(\rho^{\otimes n})) = S(\hat{\mathcal{E}}_D(\rho^{\otimes n}))$ and $d_e(\rho^{\otimes n}, \mathcal{E}_A) = d_e(\rho^{\otimes n}, \mathcal{E}_{VUD})$. Then from Lemma 3 with only one operator, $Y_1 = VU$, we get $d_e(\rho^{\otimes n}, \mathcal{E}_A) \ge d_e(\rho^{\otimes n}, \mathcal{E}_D)$. So there is a diagonal map that does at least as well as $\mathcal{E}_A$. From a trivial variation on Theorem 2 (note that the trace-preserving condition is not necessary for proving it), this diagonal map cannot do better than the $n = k = 1$ curve $S_1(d)$, which is attainable for $p_0 = \tfrac{1}{2}$. Having established that the optimal $\mathcal{E}_A$ yields the convex $S_1(d)$, using the same argument as in (21) we see that re-introducing the trace-preserving condition does not affect our result. Hence the Theorem is proved. ⋆

We conjecture that the theorem also holds for the case $p_0 > \tfrac{1}{2}$, and we now present some evidence to support this conjecture. It again suffices to show that diagonal processes are optimal for general $n$.

• Consider perturbing a process defined by $2^n \times 2^n$ dimensional diagonal matrices $\{A_i\}$ with $\sum_i A_i^\dagger A_i = I$ by general matrices $\{Q_i\}$ with diagonal elements all equal to zero. It is easy to see that to linear order the trace-preserving condition still holds, and both average entropy and distortion remain unchanged. Hence all diagonal processes are local extrema with respect to off-diagonal perturbations.

• In Theorem 2 we never used the fact that $B_i$ and $C_i$ were diagonal, so a more general class of operations given by (31), in $\mathcal{B}^n$ or any other basis obtained by permutations of the qubits, lies above the $R_1(d)$ curve.
• A straightforward modification of Theorem 3 shows that diagonal processes $D_i$ do better than $U_i D_i$, where $U_i$ is any unitary operator (note that the trace preserving condition still holds).

• By iterating the argument preceding Theorem 2, the restriction of a general $n$-qubit operation onto a single qubit involves $2^{n-1}$ operation elements, which greatly increases the entropy of the environment of that qubit. Essentially, individual qubits act as the environment for each other, and entangling them creates noise. On the other hand, as in classical information theory, the benefit of entangling (correlating) the qubits is a reduction in entropy, since $S(\mathcal{E}(\rho^{\otimes n})) \le \sum_\alpha S(\mathcal{E}^\alpha(\rho))$, where $\mathcal{E}^\alpha$ is the restriction of $\mathcal{E}$ to the $\alpha$-th qubit. There is a competition between these two effects, and the former wins, as we have proven rigorously for $p_0 = \tfrac{1}{2}$. In this sense, however, there is nothing special about $p_0 = \tfrac{1}{2}$. If anything, we would expect the entropy to be even harder to reduce via quantum operations for $p_0 > \tfrac{1}{2}$ than for $p_0 = \tfrac{1}{2}$ because it is lower to start with.

5 Physical realization of the R(d) curve
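We now elaborate on how our coding procedure may be realized physically. For the lossy part of the coding we need to provide an ancilla qubit in a definite state. We first apply a unitary transformation entangling the ancilla with the source qubit, and then measure the ancilla. In the basis $\{|0\rangle_A |0\rangle_Q, |0\rangle_A |1\rangle_Q, |1\rangle_A |0\rangle_Q, |1\rangle_A |1\rangle_Q\}$, the unitary transformation is given by the matrix

$$U = \begin{pmatrix} \cos\alpha & 0 & -\sin\alpha & 0 \\ 0 & \cos(\alpha+\Delta) & 0 & -\sin(\alpha+\Delta) \\ \sin\alpha & 0 & \cos\alpha & 0 \\ 0 & \sin(\alpha+\Delta) & 0 & \cos(\alpha+\Delta) \end{pmatrix} \tag{45}$$

with $\Delta \in [0, \tfrac{\pi}{2}]$ and $\alpha = \alpha(\Delta)$ as defined before. The ancilla is prepared in the $|0\rangle_A$ state, so that the initial density operator for the ancilla-source system is

$$\Xi = \begin{pmatrix} \rho & 0 \\ 0 & 0 \end{pmatrix} \tag{46}$$

Then

$$U \Xi U^\dagger = \begin{pmatrix} A_1 \rho A_1^\dagger & A_1 \rho A_2^\dagger \\ A_2 \rho A_1^\dagger & A_2 \rho A_2^\dagger \end{pmatrix} \tag{47}$$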
where $A_1$ and $A_2$ are the ones defined in (24). We then measure the ancilla qubit. If the outcome is $|0\rangle_A$ we know the map $\rho \to \hat{\mathcal{E}}_{A_1}(\rho)$ has been performed, and we label the qubit as belonging to type 1. Similarly, if the outcome is $|1\rangle_A$, we know the map $\rho \to \hat{\mathcal{E}}_{A_2}(\rho)$ has taken place, and the qubit is of type 2. In the end we perform Schumacher encodings on all the qubits of the first type and separately on all the qubits of the second type. When decoding, we need information about the sequence of operations performed. The rate of classical information required for this is $h_2(\mathrm{tr}(A_1 \rho A_1^\dagger))$. These classical rates are plotted for several values of $p_0$ in Fig. 4.
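The ancilla construction can be sketched in a few lines of Python. The code below builds a real 4 × 4 matrix whose first two columns hold A_1 and A_2 of Eq. (24), applies it to Ξ = diag(p0, p1, 0, 0), and reads off the two measurement probabilities and the classical rate. The last two columns are an assumption of this sketch, chosen so that the matrix is orthogonal; the function names are hypothetical.

```python
import math


def h2(x):
    """Binary Shannon entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def ancilla_unitary(alpha, delta):
    """Real unitary in the ordering |0>A|0>Q, |0>A|1>Q, |1>A|0>Q, |1>A|1>Q."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(alpha + delta), math.sin(alpha + delta)
    return [
        [ca, 0.0, -sa, 0.0],
        [0.0, cb, 0.0, -sb],
        [sa, 0.0, ca, 0.0],
        [0.0, sb, 0.0, cb],
    ]


def matmul(x, y):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(x):
    return [list(row) for row in zip(*x)]


def is_orthogonal(u, tol=1e-12):
    """Check U U^T = I, i.e. unitarity for a real matrix."""
    prod = matmul(u, transpose(u))
    return all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(4) for j in range(4))


def outcome_probabilities(p0, alpha, delta):
    """Probabilities of finding the ancilla in |0> and |1> after U."""
    u = ancilla_unitary(alpha, delta)
    xi = [[0.0] * 4 for _ in range(4)]
    xi[0][0], xi[1][1] = p0, 1.0 - p0  # ancilla in |0>, source in rho
    out = matmul(matmul(u, xi), transpose(u))
    prob0 = out[0][0] + out[1][1]  # trace of the A_1 rho A_1^dagger block
    prob1 = out[2][2] + out[3][3]  # trace of the A_2 rho A_2^dagger block
    return prob0, prob1


def classical_rate(p0, alpha, delta):
    """h2(tr(A_1 rho A_1^dagger)), the side-information rate per qubit."""
    return h2(outcome_probabilities(p0, alpha, delta)[0])
```

For the isotropic source with α = π/4 − Δ/2 both outcomes are equally likely, so the classical rate equals one bit per source qubit for every Δ.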
6 Discussion
We have seen that the optimal quantum rate-distortion codes are separable into a lossy part involving single qubit operations, followed by the standard Schumacher lossless coding involving large blocks of qubits. Our result has the following interpretation: the rate-distortion curve is achieved by quantum operations that produce no entropy exchange with the environment of any individual qubit. We do not expect this to be true for more general distortion measures; since ours cares about preserving the state of RQ, it particularly forbids the increase of the entropy of RQ, which is precisely the entropy exchange.

[Figure 4: plot of the classical rate (vertical axis, 0 to 1) against d (horizontal axis, 0 to 0.5), with one curve for each of p0 = 0.5, 0.6, 0.7, 0.8 and 0.9.]

Fig. 4. The classical information rate needed for the decoding procedure plotted for p0 = 0.5, 0.6, 0.7, 0.8 and 0.9
Let us examine the action of our quantum map on normalized pure states. If we picture $|0\rangle$ and $|1\rangle$ as orthogonal vectors, then depending on which of the two operations has been performed the map rotates our pure state vector towards $|0\rangle$ or towards $|1\rangle$. Originally the source is biased towards $|0\rangle$ since it is produced with a higher probability than $|1\rangle$. The first type of map biases the source even more towards $|0\rangle$, hence causing a decrease in entropy. The second type does the opposite, which may even increase the entropy, and is suboptimal for $p_0 > \tfrac{1}{2}$, but it has to occur a certain fraction of the time in order to obey the trace preserving condition (which says that the total probability of performing some operation must be equal to 1 irrespective of the input state). On average, however, the entropy does decrease. At the same time the discrepancy between the initial and final state increases. The $R(d)$ curve is thus swept out.

Notice an unusual feature of our $R(d)$ curve at $R = 0$: $d_{\max} = 2 p_0 p_1$ instead of the classical $d_{\max} = p_1$, which comes about by replacing the source bit with the best guess state. This is due to our choice of fidelity measure: replacing the original qubit with a fresh one prepared in the state $|0\rangle$ destroys the entanglement with the original reference system. The best we can do is project onto $|0\rangle$ with probability $p_0$ and otherwise project onto $|1\rangle$.

We do not expect a general expression resembling the classical one (1), valid for all distortion measures, to exist for quantum rate-distortion. Our reason for this lies in the richness of distortion measures, which vary in their degree of "quantumness". The one we have used, based on entanglement fidelity, is evidently highly quantum in nature. On the other hand, we could view $\rho$ as being realized by a specific ensemble like $Q = \{(|0\rangle, p_0), (|1\rangle, p_1)\}$, and as our distortion measure use the corresponding average pure state distortion measure $d(Q^n, \mathcal{D}^{(n)} \circ \mathcal{C}^{(n)})$ based on the average pure state fidelity

$$F(Q, \mathcal{E}) = p_0 \langle 0|\mathcal{E}(|0\rangle\langle 0|)|0\rangle + p_1 \langle 1|\mathcal{E}(|1\rangle\langle 1|)|1\rangle \tag{48}$$
Here we are able to attain zero distortion by sending mere classical information: the measurement results in the $\{|0\rangle, |1\rangle\}$ basis. If we do not allow storing classical information then the rate distortion curve is the classical one for the Hamming measure, namely $R(d) = S(\rho) - h_2(d)$. More general ensembles admit formulations with or without storing classical information. One could also investigate distortion measures tied to a specific quantum cryptography protocol. Finally, the work presented here naturally generalizes to systems with more than two degrees of freedom.

Acknowledgement We thank Konrad Banaszek, David Mermin and Ian Walmsley for valuable comments, in particular for pointing out problem formulation inadequacies in earlier versions of the manuscript. This research was supported in part by the ARO-administered MURI Grant No. DAAG-19-99-1-0125 and NSF Grant CCR-9980616.
References
[1] T. Berger, Rate distortion theory, Prentice Hall (1971)
[2] B. Schumacher, “Quantum coding”, Phys. Rev. A 51, 2738 (1995); R. Jozsa and B. Schumacher, “A new proof of the quantum noiseless coding theorem”, J. Mod. Opt. 41, 2343 (1994)
[3] S. Lloyd, “Capacity of the noisy quantum channel”, Phys. Rev. A 55, 1613 (1996)
[4] B. Schumacher and M. A. Nielsen, “Quantum data processing and error correction”, Phys. Rev. A 54, 2629 (1996)
[5] H. Barnum, “Quantum rate-distortion coding”, Phys. Rev. A 62, 42309 (2000)
[6] B. Schumacher, “Sending entanglement through noisy quantum channels”, Phys. Rev. A 54, 2614 (1995)
[7] H. Barnum, M. A. Nielsen and B. Schumacher, “Information transmission through a noisy quantum channel”, Phys. Rev. A 57, 4153 (1998)
[8] I. L. Chuang and D. S. Modha, “Reversible arithmetic coding for quantum data compression”, IEEE Trans. IT 46, 1104 (2000)
[9] A. Wehrl, “General properties of entropy”, Rev. Mod. Phys. 50, 221 (1978)