Inequalities for the quantum Rényi divergences with applications to compound coding problems
arXiv:1310.7525v3 [quant-ph] 20 Apr 2014
Milán Mosonyi Física Teòrica: Informació i Fenomens Quàntics, Universitat Autònoma de Barcelona, ES-08193 Bellaterra (Barcelona), Spain. Mathematical Institute, Budapest University of Technology and Economics Egry József u 1., Budapest, 1111 Hungary
Abstract

We show two-sided bounds between the traditional quantum Rényi divergences and the new notion of Rényi divergences introduced recently in Müller-Lennert, Dupuis, Szehr, Fehr and Tomamichel, J. Math. Phys. 54, 122203 (2013), and Wilde, Winter and Yang, arXiv:1306.1586. The bounds imply that the two versions can be used interchangeably near α = 1, and hence one can benefit from the best properties of both when proving coding theorems in the case of asymptotically vanishing error. We illustrate this by giving short and simple proofs of the quantum Stein's lemma with composite null-hypothesis, universal source compression, and the achievability part of the classical capacity of compound quantum channels. Apart from the above interchangeability, we benefit from a weak quasi-concavity property of the new Rényi divergences that we also establish here.
1 Introduction
Rényi introduced a generalization of the Kullback-Leibler divergence (relative entropy) in [49]. According to his definition, the α-divergence of two probability distributions p and q on a finite set X, for a parameter α ∈ [0, +∞) \ {1}, is given by

D_α(p‖q) := (1/(α−1)) log Σ_{x∈X} p(x)^α q(x)^{1−α}.   (1)
The limit α → 1 yields the standard relative entropy. These quantities turned out to play a central role in information theory and statistics; indeed, the Rényi divergences quantify the trade-off between the exponents of the relevant quantities in many information-theoretic tasks, including hypothesis testing, source coding and noisy channel coding; see, e.g., [14] for an overview of these results. It was also shown in [14] that the Rényi relative entropies, and other related quantities, like the Rényi entropies and the Rényi capacities, have direct operational interpretations as so-called generalized cutoff rates in the corresponding information-theoretic tasks.
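In the classical case, definition (1) and its α → 1 limit are easy to check numerically. The following sketch (plain Python; the helper names are ours, not from the paper) evaluates D_α(p‖q) on a small alphabet and compares it with the Kullback-Leibler divergence as α approaches 1:

```python
import math

def renyi_div(p, q, alpha):
    # classical Renyi alpha-divergence of eq. (1), natural logarithm
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1.0)

def kl_div(p, q):
    # Kullback-Leibler divergence: the alpha -> 1 limit of (1)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
for alpha in (0.5, 0.9, 0.99, 0.999):
    print(alpha, renyi_div(p, q, alpha))   # increases towards the KL divergence
print("KL:", kl_div(p, q))
```

The printed values also illustrate the monotonicity of α ↦ D_α discussed later in Section 3.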
In quantum theory, the state of a system is described by a density operator instead of a probability distribution, and the definition (1) can be extended to pairs of density operators in various inequivalent ways, due to the non-commutativity of operators. The traditional way to define the Rényi divergence of two density operators is

D_α^{(old)}(ρ‖σ) := (1/(α−1)) log Tr ρ^α σ^{1−α}.   (2)
It has been shown in [38] that, similarly to the classical case, the Rényi α-divergences D_α^{(old)} with α ∈ (0, 1) have a direct operational interpretation as generalized cutoff rates in the so-called direct domain of binary state discrimination. This is a consequence of another, indirect, operational interpretation in the setting of the quantum Hoeffding bound [5, 21, 23, 42]. Recently, a new quantum extension of the Rényi α-divergences has been proposed in [40, 56], defined as

D_α^{(new)}(ρ‖σ) := (1/(α−1)) log Tr( σ^{(1−α)/2α} ρ σ^{(1−α)/2α} )^α.   (3)

This definition was introduced in [40] as a parametric family that connects the min- and max-relative entropies [16, 48] and Umegaki's relative entropy [55]. In [56], the corresponding generalized Holevo capacities were used to establish the strong converse property for the classical capacities of entanglement-breaking and Hadamard channels. It was shown in [39] that these new Rényi divergences play the same role in the (strong) converse problem of binary state discrimination as the traditional Rényi divergences play in the direct problem. In particular, the strong converse exponent was expressed as a function of the new Rényi divergences, and from that a direct operational interpretation was derived for them as generalized cutoff rates in the sense of [14]. The above results suggest that, somewhat surprisingly, one should use two different quantum extensions of the classical Rényi divergences: for the direct part, corresponding to α ∈ (0, 1), the "right" definition is the one given in (2), while for the converse part, corresponding to α > 1, the "right" definition is the one in (3). Although coding theorems supporting this separation have only been shown for binary state discrimination so far, it seems reasonable to expect the same separation in the case of other information-theoretic tasks.
We remark that, in line with this expectation, lower bounds on the classical capacity of quantum channels can be obtained in terms of the traditional Rényi divergences [37], while upper bounds were found in terms of the new Rényi divergences in [56]. On the other hand, the above two quantum Rényi divergences have different mathematical properties, which might make them better or worse suited for certain mathematical manipulations, and therefore it might be beneficial to use the new Rényi divergences in the direct parts of coding problems, and the traditional ones in the converse parts, despite the "real" quantities being the opposite. The problem that one faces then is how to arrive back at the natural quantity of the given problem. As it turns out, this is possible, at least if one's aim is to study the case of asymptotically vanishing error, corresponding to α → 1; this is thanks to the well-known Araki-Lieb-Thirring inequality and its complement due to Audenaert [6]. We explain this in detail in Section 3.1.
Convexity properties of these divergences are of particular importance for applications. As was shown in [18, 56], both versions of the Rényi divergences are jointly quasi-convex around α = 1. In Section 3.2 we show a certain converse to this quasi-convexity in the form of a weak partial quasi-concavity (Corollary 3.15 and Proposition 3.17), which is still strong enough to be useful for applications, as we illustrate on various examples in Section 4.

Coding theorems for the problems considered in Section 4 have been established in [9, 44] for Stein's lemma with composite null-hypothesis, in [30] for universal source compression, and in [11, 15] for the classical capacity of compound and averaged channels. Here we provide alternative proofs for these coding theorems, using the following general approach:

(1) We take a single-shot coding theorem that bounds the relevant error probability in terms of a Rényi divergence. In the case of Stein's lemma and source compression, this is Audenaert's inequality [4], while in the case of channel coding, we use the random coding theorem due to Hayashi and Nagaoka [19]. The bounds are given in terms of D_α^{(old)}.

(2) We use Lemma 3.3 to switch from the old to the new Rényi divergences in the upper bound on the error probability, and then we use the weak partial quasi-concavity properties of the Rényi divergences, given in Corollary 3.15 and Proposition 3.17, to decouple the upper bound into a sum of individual Rényi divergences.

(3) If necessary, we use Lemma 3.3 again to return to D_α^{(old)} in the upper bound.

(4) We use the additivity of the relevant Rényi quantities (divergences, entropies, generalized Holevo quantities) to obtain the asymptotics.

The advantage of the above approach is that it only uses very general arguments that are largely independent of the concrete model under consideration. Once the single-shot coding theorems are available, the coding theorems for the composite cases follow with essentially the same amount of effort as for the simple cases (simple null-hypothesis, single source, single channel), using only very general properties of the Rényi divergences. This makes the proofs considerably shorter and simpler than, e.g., those in [9, 11, 15]. Moreover, this approach is very easy to generalize to non-i.i.d. compound problems, unlike the methods of [30, 44], which are based on the method of types.
2 Notations
For a finite-dimensional Hilbert space H, let B(H)_+ denote the set of all non-zero positive semidefinite operators on H, and let S(H) := { ρ ∈ B(H)_+ : Tr ρ = 1 } be the set of all density operators (states) on H. We define the powers of a positive semidefinite operator A only on its support; that is, if λ_1, …, λ_r are the strictly positive eigenvalues of A, with corresponding spectral projections P_1, …, P_r, then we define A^α := Σ_{i=1}^r λ_i^α P_i for all α ∈ R. In particular, A^0 = Σ_{i=1}^r P_i is the projection onto the support of A. We will use the conventions log 0 := −∞ and log(+∞) := +∞.
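The convention of taking powers only on the support can be made concrete in low dimension. The sketch below (plain Python, restricted to 2×2 real symmetric matrices so that the eigendecomposition is available in closed form; all function names are our own) computes A^α = Σ_i λ_i^α P_i over the strictly positive eigenvalues, so that A^0 returns the support projection:

```python
import math

def eig_sym2(a):
    # eigenpairs of a 2x2 real symmetric matrix [[a00, a01], [a01, a11]]
    (a00, a01), (_, a11) = a
    if abs(a01) < 1e-12:
        return [(a00, (1.0, 0.0)), (a11, (0.0, 1.0))]
    tr, det = a00 + a11, a00 * a11 - a01 * a01
    d = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    pairs = []
    for lam in (tr / 2.0 + d, tr / 2.0 - d):
        v = (a01, lam - a00)            # unnormalized eigenvector for lam
        n = math.hypot(*v)
        pairs.append((lam, (v[0] / n, v[1] / n)))
    return pairs

def mat_power(a, alpha):
    # A^alpha on the support: sum of lam^alpha * P_lam over strictly positive lam
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in eig_sym2(a):
        if lam > 1e-12:
            w = lam ** alpha
            for i in range(2):
                for j in range(2):
                    out[i][j] += w * v[i] * v[j]
    return out

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A = [[2.0, 1.0], [1.0, 2.0]]
B = mat_power(A, 0.5)                              # square root of A
print(matmul(B, B))                                # recovers A
print(mat_power([[1.0, 0.0], [0.0, 0.0]], 0.0))    # support projection of a rank-1 operator
```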
3 Rényi divergences

3.1 Two definitions
For non-zero positive semidefinite operators ρ, σ, the Rényi α-divergence [49] of ρ w.r.t. σ with parameter α ∈ (0, +∞) \ {1} is traditionally defined as

D_α^{(old)}(ρ‖σ) := { (1/(α−1)) log Tr ρ^α σ^{1−α} − (1/(α−1)) log Tr ρ,   α ∈ (0, 1) or supp ρ ⊆ supp σ;   +∞,   otherwise. }

For the mathematical properties of D_α^{(old)}, see, e.g., [32, 38, 47]. Recently, a new notion of Rényi divergence has been introduced in [40, 56], defined as

D_α^{(new)}(ρ‖σ) := { (1/(α−1)) log Tr( σ^{(1−α)/2α} ρ σ^{(1−α)/2α} )^α − (1/(α−1)) log Tr ρ,   α ∈ (0, 1) or supp ρ ⊆ supp σ;   +∞,   otherwise. }
For the mathematical properties of D_α^{(new)}, see, e.g., [8, 18, 39, 40, 56].

Remark 3.1. It is easy to see that for non-zero ρ, we have lim_{σ→0} D_α^{(old)}(ρ‖σ) = lim_{σ→0} D_α^{(new)}(ρ‖σ) = +∞, and hence we define D_α^{(old)}(ρ‖0) := D_α^{(new)}(ρ‖0) := +∞ when ρ ≠ 0. On the other hand, for non-zero σ, the limits lim_{ρ→0} D_α^{(old)}(ρ‖σ) and lim_{ρ→0} D_α^{(new)}(ρ‖σ) don't exist, and hence we don't define the values of D_α^{(old)}(0‖σ) and D_α^{(new)}(0‖σ). To see the latter, one can consider ρ_n := (1/n)|0⟩⟨0| + (1/n^β)|1⟩⟨1| and σ := |1⟩⟨1|, where |0⟩⟨0| and |1⟩⟨1| are orthogonal rank-1 projections. It is easy to see that for α < 1,

lim_{n→+∞} D_α^{(old)}(ρ_n‖σ) = lim_{n→+∞} D_α^{(new)}(ρ_n‖σ) = lim_{n→+∞} (1/(α−1)) log [ n^{1−βα} / (1 + n^{1−β}) ]

depends on the value of β. A similar example can be used for α > 1.
Remark 3.2. Note that the definition of D_α^{(old)} makes sense also for α = 0, and we get D_0^{(old)}(ρ‖σ) = −log Tr ρ^0 σ. It is easy to see that if supp ρ ⊆ supp σ then

D_∞^{(old)}(ρ‖σ) := lim_{α→+∞} D_α^{(old)}(ρ‖σ) = log max{ r/s : Tr P_ρ({r}) P_σ({s}) > 0 },

where P_ρ({r}) and P_σ({s}) are the spectral projections of ρ and σ corresponding to the eigenvalues r and s, respectively. If supp ρ ⊄ supp σ then obviously D_∞^{(old)}(ρ‖σ) = +∞. In the case of D_α^{(new)}, it was shown in [40] that

D_∞^{(new)}(ρ‖σ) := lim_{α→+∞} D_α^{(new)}(ρ‖σ) = D_max(ρ‖σ) := log inf{ γ : ρ ≤ γσ },

where D_max is the max-relative entropy [16, 48]. The limit D_0^{(new)}(ρ‖σ) := lim_{α→0} D_α^{(new)}(ρ‖σ) is in general different from D_0^{(old)}(ρ‖σ); see, e.g., [7, 17].
According to the Araki-Lieb-Thirring inequality [3, 33], for any positive semidefinite operators A, B,

Tr A^α B^α A^α ≤ Tr(ABA)^α   (4)

for α ∈ (0, 1), and the inequality holds in the converse direction for α > 1. A converse to the Araki-Lieb-Thirring inequality was given in [6], where it was shown that

Tr(ABA)^α ≤ ( ‖B‖^α Tr A^{2α} )^{1−α} ( Tr A^α B^α A^α )^α   (5)

for α ∈ (0, 1), and the inequality holds in the converse direction for α > 1. Applying (4) and (5) to A := ρ^{1/2} and B := σ^{(1−α)/α}, we get

Tr ρ^α σ^{1−α} ≤ Tr( ρ^{1/2} σ^{(1−α)/α} ρ^{1/2} )^α ≤ ‖σ‖^{(1−α)²} (Tr ρ^α)^{1−α} ( Tr ρ^α σ^{1−α} )^α   (6)

for α ∈ (0, 1), and the inequalities hold in the converse direction for α > 1. This immediately yields the following:

Lemma 3.3. For any ρ, σ ∈ B(H)_+ and α ∈ [0, +∞) \ {1},

D_α^{(old)}(ρ‖σ) ≥ D_α^{(new)}(ρ‖σ) ≥ α D_α^{(old)}(ρ‖σ) + log Tr ρ − log Tr ρ^α + (α−1) log ‖σ‖.   (7)

Remark 3.4. The first inequality in (7) has already been noted in [56] for α > 1.
It is straightforward to verify that D_α^{(old)} yields Umegaki's relative entropy in the limit α → 1; i.e., for any ρ, σ ∈ B(H)_+,

D_1(ρ‖σ) := lim_{α→1} D_α^{(old)}(ρ‖σ) = { (1/Tr ρ) Tr ρ(log ρ − log σ),   supp ρ ⊆ supp σ;   +∞,   otherwise. }   (8)
This, together with Lemma 3.3, immediately yields the following:

Corollary 3.5. For any two non-zero positive semidefinite operators ρ, σ,

lim_{α→1} D_α^{(new)}(ρ‖σ) = D_1(ρ‖σ).   (9)

Taking into account (8)-(9) and Remark 3.2, we finally have the definitions of D_α^{(old)} and D_α^{(new)} for every parameter value α ∈ [0, +∞].

Remark 3.6. The limit relation (9) has been shown in [40], and in [56] for α ↘ 1, by explicitly computing the derivative of α ↦ log Tr( ρ^{1/2} σ^{(1−α)/α} ρ^{1/2} )^α at α = 1.
It is easy to see (by computing its second derivative) that ψ^{(old)}(α) := log Tr ρ^α σ^{1−α} is a convex function of α, which immediately yields that D_α^{(old)}(ρ‖σ) is a monotonically increasing function of α for any fixed ρ and σ. The following Proposition, due to [53] and [54], complements this monotonicity property around α = 1, and at the same time gives a quantitative version of (8):

Proposition 3.7. Let ρ, σ ∈ B(H)_+ be such that supp ρ ⊆ supp σ, let η := 1 + Tr ρ^{3/2} σ^{−1/2} + Tr ρ^{1/2} σ^{1/2}, let c > 0, and let δ := min{ 1/2, c/(2 log η) }. Then

D_1(ρ‖σ) ≥ D_α^{(old)}(ρ‖σ) ≥ D_1(ρ‖σ) − 4(1−α)(log η)² cosh c,   1 − δ < α < 1,
D_1(ρ‖σ) ≤ D_α^{(old)}(ρ‖σ) ≤ D_1(ρ‖σ) − 4(1−α)(log η)² cosh c,   1 < α < 1 + δ.
The new Rényi divergences D_α^{(new)}(ρ‖σ) are also monotonically increasing in α, as was shown in Theorem 6 of [40] (see also [39] for a different proof for the case α > 1). Combining Proposition 3.7 with Lemma 3.3, we obtain the following:

Corollary 3.8. In the setting of Proposition 3.7, we have

D_1(ρ‖σ) ≥ D_α^{(new)}(ρ‖σ) ≥ α D_1(ρ‖σ) − 4α(1−α)(log η)² cosh c + log Tr ρ − log Tr ρ^α + (1−α) log ‖σ‖^{−1},   1 − δ < α < 1,
D_1(ρ‖σ) ≤ D_α^{(new)}(ρ‖σ) ≤ D_1(ρ‖σ) − 4(1−α)(log η)² cosh c,   1 < α < 1 + δ.
Remark 3.9. The inequalities in the second line above have already appeared in [56].

Finally, we consider Lemma 3.3 in some special cases. Note that the monotonicity of the Rényi divergences in α yields that the Rényi entropies

S_α(ρ) := −D_α^{(old)}(ρ‖I) = −D_α^{(new)}(ρ‖I) = (1/(1−α)) log Tr ρ^α − (1/(1−α)) log Tr ρ

are monotonically decreasing in α for any fixed ρ, and hence,

Tr ρ^α ≤ (Tr ρ^0)^{1−α} (Tr ρ)^α   (10)

for every α ∈ (0, 1), and the inequality holds in the converse direction for α > 1.

Assume that α ∈ (0, 1). Plugging (10) into (6), we get that for any ρ, σ ∈ B(H)_+,

Tr ρ^α σ^{1−α} ≤ Tr( ρ^{1/2} σ^{(1−α)/α} ρ^{1/2} )^α ≤ ‖σ‖^{(1−α)²} (Tr ρ^0)^{(1−α)²} (Tr ρ)^{α(1−α)} ( Tr ρ^α σ^{1−α} )^α   (11)

for every α ∈ (0, 1). This in turn yields that for every α ∈ (0, 1),

D_α^{(new)}(ρ‖σ) ≥ α D_α^{(old)}(ρ‖σ) + (1−α)( log Tr ρ − log Tr ρ^0 − log ‖σ‖ ).

In particular, if ‖σ‖ ≤ 1 then

D_α^{(new)}(ρ‖σ) ≥ α D_α^{(old)}(ρ‖σ) + (1−α)( log Tr ρ − log Tr ρ^0 ).   (12)

Assume now that α > 1. Then Tr(ρ/‖ρ‖)^α ≤ Tr(ρ/‖ρ‖), and plugging this into (7) yields

D_α^{(new)}(ρ‖σ) ≥ α D_α^{(old)}(ρ‖σ) + (α−1)( log ‖σ‖ − log ‖ρ‖ ).

In particular, if ‖ρ‖ ≤ 1 then Tr σ ≤ ‖σ‖ Tr σ^0 yields

D_α^{(new)}(ρ‖σ) ≥ α D_α^{(old)}(ρ‖σ) + (α−1)( log Tr σ − log Tr σ^0 ).   (13)
Corollary 3.10. Let ρ, σ ∈ S(H) be density operators. For every α ∈ [0, +∞),

D_α^{(old)}(ρ‖σ) ≥ D_α^{(new)}(ρ‖σ) ≥ α D_α^{(old)}(ρ‖σ) − |α−1| log(dim H).

Proof. Immediate from Lemma 3.3, (12) and (13).

Corollary 3.10 together with Proposition 3.7 yields the following version of Corollary 3.8 when ρ and σ are states:

Corollary 3.11. Let ρ, σ ∈ S(H) be density operators. With the notation of Proposition 3.7, we have

D_1(ρ‖σ) ≥ D_α^{(new)}(ρ‖σ) ≥ α D_1(ρ‖σ) − (1−α)( 4α(log η)² cosh c + log(dim H) )

for every 1 − δ < α < 1.
3.2 Convexity properties
The general concavity result in [26, Theorem 2.1] implies as a special case that the quantity

Q_α^{(new)}(ρ‖σ) := Tr( σ^{(1−α)/2α} ρ σ^{(1−α)/2α} )^α = Tr( ρ^{1/2} σ^{(1−α)/α} ρ^{1/2} )^α   (14)

is jointly concave for α ∈ [1/2, 1). (See also [18] for a different proof of this.) In [40, 56], joint convexity of Q_α^{(new)} was shown for α ∈ [1, 2], which was later extended in [18], using a different proof method, to all α > 1. That is, if ρ_i, σ_i ∈ B(H)_+, i = 1, …, r, and γ_1, …, γ_r is a probability distribution on [r] := {1, …, r}, then

Q_α^{(new)}( Σ_i γ_i ρ_i ‖ Σ_i γ_i σ_i ) ≥ Σ_i γ_i Q_α^{(new)}(ρ_i‖σ_i),   1/2 ≤ α < 1,   (15)
Q_α^{(new)}( Σ_i γ_i ρ_i ‖ Σ_i γ_i σ_i ) ≤ Σ_i γ_i Q_α^{(new)}(ρ_i‖σ_i),   1 < α.   (16)
(For the second inequality one also has to assume that supp ρ_i ⊆ supp σ_i for all i.) This immediately yields that the Rényi divergences D_α^{(new)} are jointly quasi-convex for α > 1 (see [56] for α ∈ (1, 2]), and jointly convex for α ∈ [1/2, 1) when restricted to {ρ ∈ B(H)_+ : Tr ρ = t} × B(H)_+ for any fixed t > 0 [18]. Our goal here is to complement these inequalities to some extent.

The following lemma is a special case of the famous Rotfel'd inequality (see, e.g., Section 4.5 in [25]). Below we provide an elementary proof for α ∈ [0, 2].

Lemma 3.12. The function A ↦ Tr A^α is subadditive on positive semidefinite operators for every α ∈ [0, 1], and superadditive for α ≥ 1. That is, if A, B ∈ B(H)_+ then

Tr(A+B)^α ≤ Tr A^α + Tr B^α,   α ∈ [0, 1],   (17)
Tr(A+B)^α ≥ Tr A^α + Tr B^α,   1 ≤ α.   (18)
Proof. We only prove the case α ∈ [0, 2]. Assume first that A and B are invertible and let α ∈ (0, 1). Then

Tr(A+B)^α − Tr A^α = ∫_0^1 (d/dt) Tr(A+tB)^α dt = α ∫_0^1 Tr B(A+tB)^{α−1} dt ≤ α ∫_0^1 Tr B(tB)^{α−1} dt = Tr B^α ∫_0^1 α t^{α−1} dt = Tr B^α,

where in the first line we used the identity (d/dt) Tr f(A+tB) = Tr B f′(A+tB), and the inequality follows from the fact that x ↦ x^{α−1} is operator monotone decreasing on (0, +∞) for α ∈ (0, 1). This proves (17) for invertible A and B, and the general case follows by continuity. The proof for the case α ∈ (1, 2] goes the same way, using the fact that x ↦ x^{α−1} is operator monotone increasing on (0, +∞) for α ∈ (1, 2]. The case α = 1 is trivial, and the case α = 0 follows by taking the limit α → 0 in (17).
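In the commuting case, Lemma 3.12 reduces to the eigenvalue-wise scalar inequality (a+b)^α ≤ a^α + b^α (reversed for α ≥ 1), which is easy to spot-check. A sketch (plain Python, diagonal operators represented by their eigenvalue vectors; the function name is ours):

```python
import random

def tr_power(evs, alpha):
    # Tr A^alpha for a PSD diagonal matrix with eigenvalue vector evs (power on the support)
    return sum(e ** alpha for e in evs if e > 0)

random.seed(0)
for _ in range(100):
    a = [random.uniform(0.0, 2.0) for _ in range(4)]
    b = [random.uniform(0.0, 2.0) for _ in range(4)]
    s = [x + y for x, y in zip(a, b)]
    for alpha in (0.3, 0.7):    # subadditive range, eq. (17)
        assert tr_power(s, alpha) <= tr_power(a, alpha) + tr_power(b, alpha) + 1e-12
    for alpha in (1.5, 3.0):    # superadditive range, eq. (18)
        assert tr_power(s, alpha) >= tr_power(a, alpha) + tr_power(b, alpha) - 1e-12
print("ok")
```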
Proposition 3.13. Let σ, ρ_1, …, ρ_r ∈ B(H)_+, and let γ_1, …, γ_r be a probability distribution on [r]. We have

Σ_i γ_i Q_α^{(new)}(ρ_i‖σ) ≤ Q_α^{(new)}( Σ_i γ_i ρ_i ‖ σ ) ≤ Σ_i γ_i^α Q_α^{(new)}(ρ_i‖σ),   0 < α < 1,   (19)
Σ_i γ_i Q_α^{(new)}(ρ_i‖σ) ≥ Q_α^{(new)}( Σ_i γ_i ρ_i ‖ σ ) ≥ Σ_i γ_i^α Q_α^{(new)}(ρ_i‖σ),   1 < α.   (20)

Moreover, the second inequalities in (19) and (20) are valid for arbitrary non-negative γ_1, …, γ_r with γ_1 + … + γ_r > 0.

Proof. By Lemma 3.12, we have

Tr( σ^{(1−α)/2α} ( Σ_{i=1}^r γ_i ρ_i ) σ^{(1−α)/2α} )^α = Tr( Σ_{i=1}^r γ_i σ^{(1−α)/2α} ρ_i σ^{(1−α)/2α} )^α ≤ Σ_{i=1}^r γ_i^α Tr( σ^{(1−α)/2α} ρ_i σ^{(1−α)/2α} )^α

for α ∈ (0, 1), and the inequality is reversed for α > 1, which proves the second inequalities in (19) and (20). The first inequalities follow the same way, by noting that A ↦ Tr A^α is concave for α ∈ (0, 1) and convex for α > 1.

Remark 3.14. Note that the first inequality in (20) follows from the joint convexity of Q_α^{(new)}, and the first inequality in (19) can be obtained from the joint concavity of Q_α^{(new)} for 1/2 ≤ α < 1; however, not for the range 0 < α < 1/2, where joint concavity fails [40].

Corollary 3.15. Let σ, ρ_1, …, ρ_r ∈ B(H)_+, and let γ_1, …, γ_r be a probability distribution on [r]. For every α ∈ [0, +∞],

min_i D_α^{(new)}(ρ_i‖σ) + log min_i γ_i ≤ D_α^{(new)}( Σ_{i=1}^r γ_i ρ_i ‖ σ ) ≤ max_i D_α^{(new)}(ρ_i‖σ).
Proof. We prove the inequalities for α ∈ (1, +∞); the proof for α ∈ (0, 1) goes exactly the same way, and the cases α = 0, 1, +∞ follow by taking the corresponding limits in α. By the first inequality in (20), we have

D_α^{(new)}( Σ_{i=1}^r γ_i ρ_i ‖ σ ) = (1/(α−1)) log [ Q_α^{(new)}( Σ_i γ_i ρ_i ‖ σ ) / Σ_i γ_i Tr ρ_i ] ≤ (1/(α−1)) log [ Σ_i γ_i Q_α^{(new)}(ρ_i‖σ) / Σ_i γ_i Tr ρ_i ] ≤ (1/(α−1)) log max_i [ Q_α^{(new)}(ρ_i‖σ) / Tr ρ_i ],

proving the second inequality of the assertion. The second inequality in (20) yields

D_α^{(new)}( Σ_{i=1}^r γ_i ρ_i ‖ σ ) = (1/(α−1)) log [ Q_α^{(new)}( Σ_i γ_i ρ_i ‖ σ ) / Σ_i γ_i Tr ρ_i ] ≥ (1/(α−1)) log [ Σ_i γ_i^α Q_α^{(new)}(ρ_i‖σ) / Σ_i γ_i Tr ρ_i ].

We have

γ_i^α Q_α^{(new)}(ρ_i‖σ) = (γ_i Tr ρ_i) γ_i^{α−1} [ Q_α^{(new)}(ρ_i‖σ) / Tr ρ_i ] ≥ (γ_i Tr ρ_i) ( min_j γ_j^{α−1} ) min_j [ Q_α^{(new)}(ρ_j‖σ) / Tr ρ_j ],

and summing over i yields that

(1/(α−1)) log [ Σ_i γ_i^α Q_α^{(new)}(ρ_i‖σ) / Tr Σ_i γ_i ρ_i ] ≥ (1/(α−1)) log min_j [ Q_α^{(new)}(ρ_j‖σ) / Tr ρ_j ] + log min_j γ_j,

as required.
Remark 3.16. Note that the inequalities in (15) and (16) express joint concavity/convexity, whereas in the complements given in Proposition 3.13 and Corollary 3.15 we only took a convex combination in the first variable and not in the second. It is easy to see that this restriction is in fact necessary. Indeed, let ρ_1 := σ_2 := |x⟩⟨x| and ρ_2 := σ_1 := |y⟩⟨y|, where x and y are orthogonal unit vectors in some Hilbert space. If we choose γ_1 = γ_2 = 1/2 then Σ_i γ_i ρ_i = Σ_i γ_i σ_i, and hence

D_α^{(new)}( Σ_{i=1}^2 γ_i ρ_i ‖ Σ_{i=1}^2 γ_i σ_i ) = 0,   while   D_α^{(new)}(ρ_1‖σ_1) = D_α^{(new)}(ρ_2‖σ_2) = +∞,

and hence no inequality of the form D_α^{(new)}( Σ_i γ_i ρ_i ‖ Σ_i γ_i σ_i ) ≥ c_1 min_i D_α^{(new)}(ρ_i‖σ_i) − c_2 can hold for any positive constants c_1 and c_2.

The quantity

Q_α^{(old)}(ρ‖σ) := Tr ρ^α σ^{1−α}

is jointly concave for α ∈ (0, 1) according to Lieb's concavity theorem [32], and jointly convex for α ∈ (1, 2] according to Ando's convexity theorem [1]; see also [47] for a different proof of both. That is, if ρ_i, σ_i ∈ B(H)_+, i = 1, …, r, and γ_1, …, γ_r is a probability distribution on [r] := {1, …, r}, then

Q_α^{(old)}( Σ_i γ_i ρ_i ‖ Σ_i γ_i σ_i ) ≥ Σ_i γ_i Q_α^{(old)}(ρ_i‖σ_i),   0 ≤ α < 1,
Q_α^{(old)}( Σ_i γ_i ρ_i ‖ Σ_i γ_i σ_i ) ≤ Σ_i γ_i Q_α^{(old)}(ρ_i‖σ_i),   1 < α ≤ 2.

(For the second inequality, one has to assume that supp ρ_i ⊆ supp σ_i for all i.) Note the difference in the ranges of joint convexity/concavity as compared to (15) and (16). This immediately yields that D_α^{(old)} is jointly convex for α ∈ (0, 1) when restricted to {ρ ∈ B(H)_+ : Tr ρ = t} × B(H)_+ for any fixed t > 0, and that it is jointly quasi-convex for α ∈ (1, 2]. Moreover, it is convex in its second argument for α ∈ (1, 2], according to Theorem II.1 in [38]; see also Proposition 1.1 in [2]. It is not clear whether a subadditivity argument can be used to complement the above concavity/convexity properties. However, one can use the bounds for Q_α^{(new)} and D_α^{(new)} together with Lemma 3.3 to obtain the following:
Proposition 3.17. Let σ, ρ_1, …, ρ_r ∈ B(H)_+, and let γ_1, …, γ_r be a probability distribution on [r]. We have

Q_α^{(old)}( Σ_i γ_i ρ_i ‖ σ ) ≤ Σ_i γ_i^α Q_α^{(old)}(ρ_i‖σ)^α ‖σ‖^{(1−α)²} (Tr ρ_i^α)^{1−α}   (21)

for α ∈ (0, 1), and the inequality holds in the converse direction for α > 1. As a consequence,

D_α^{(old)}( Σ_i γ_i ρ_i ‖ σ ) ≥ α min_i D_α^{(old)}(ρ_i‖σ) + (α−1) log ‖σ‖ + log min_i [ γ_i Tr ρ_i / Tr ρ_i^α ]   (22)

for all α ∈ (0, +∞) \ {1}.

Proof. The inequality in (21) is immediate from (6) and Proposition 3.13. The same argument as in the proof of Corollary 3.15 yields (22).

Remark 3.18. For α ∈ (0, 1), we can use (10) to further bound the RHS of (22) from below and get

D_α^{(old)}( Σ_i γ_i ρ_i ‖ σ ) ≥ α min_i D_α^{(old)}(ρ_i‖σ) + (α−1) log ‖σ‖ + log min_i [ γ_i (Tr ρ_i)^{1−α} (Tr ρ_i^0)^{α−1} ].   (23)

3.3 Rényi capacities
By a channel W we mean a map W : X → S(H), where X is some input alphabet (which can be an arbitrary non-empty set) and H is a finite-dimensional Hilbert space. We recover the usual notion of a quantum channel when X = S(K) for some Hilbert space K and W is a completely positive trace-preserving linear map. For an input alphabet X, let {δ_x}_{x∈X} be a set of rank-1 orthogonal projections in some Hilbert space H_X, and for every channel W : X → S(H) define

Ŵ : x ↦ δ_x ⊗ W(x),

which is a channel from X to S(H_X ⊗ H). Let P_f(X) denote the set of finitely supported probability measures on X. The channels W and Ŵ can naturally be extended to convex maps W : P_f(X) → S(H) and Ŵ : P_f(X) → S(H_X ⊗ H), as

W(p) := Σ_{x∈X} p(x) W(x),   Ŵ(p) := Σ_{x∈X} p(x) Ŵ(x) = Σ_{x∈X} p(x) δ_x ⊗ W(x).

Note that Ŵ(p) is a classical-quantum state, and the marginals of Ŵ(p) are given by

Tr_H Ŵ(p) = p̂ := Σ_x p(x) δ_x   and   Tr_{H_X} Ŵ(p) = W(p).
Let D be a function on pairs of positive semidefinite operators. For a channel W : X → S(H), we define its corresponding D-capacity as

χ̂_D(W) := sup_{p∈P_f(X)} χ_D(W, p),   where   χ_D(W, p) := inf_{σ∈S(H)} D( Ŵ(p) ‖ p̂ ⊗ σ ),   p ∈ P_f(X).

For the cases D = D_α^{(old)} and D = D_α^{(new)}, we use the shorthand notations χ_α^{(old)}(W, p), χ̂_α^{(old)}(W) and χ_α^{(new)}(W, p), χ̂_α^{(new)}(W), respectively. Note that these quantities generalize the Holevo quantity

χ(W, p) := χ_1^{(old)}(W, p) = χ_1^{(new)}(W, p) = inf_{σ∈S(H)} D_1( Ŵ(p) ‖ p̂ ⊗ σ ) = D_1( Ŵ(p) ‖ p̂ ⊗ W(p) )   (24)

and the Holevo capacity

χ̂(W) := sup_{p∈P_f(X)} χ(W, p),   (25)
and hence we refer to them as generalized Holevo quantities for a general D, and generalized α-Holevo quantities for the α-divergences. As it was pointed out in [31, 52],

D_α^{(old)}( Ŵ(p) ‖ p̂ ⊗ σ ) = (α/(α−1)) log Tr ω(W, p) + D_α^{(old)}( ω̄(W, p) ‖ σ )   (26)

for any state σ, where

ω̄(W, p) := ω(W, p) / Tr ω(W, p),   ω(W, p) := ( Σ_x p(x) W(x)^α )^{1/α}.   (27)

Since D_α^{(old)} is non-negative on pairs of density operators, we get

χ_α^{(old)}(W, p) = (α/(α−1)) log Tr ω(W, p) = (α/(α−1)) log Tr ( Σ_x p(x) W(x)^α )^{1/α}.   (28)

However, no such explicit formula is known for χ_α^{(new)}(W, p).

Note that max{ Tr Ŵ(p)^0, Tr(p̂ ⊗ σ)^0 } ≤ |supp p| dim H, where |supp p| denotes the cardinality of the support of p, and Lemma 3.3 with (12) and (13) yields that

χ_α^{(old)}(W, p) ≥ χ_α^{(new)}(W, p) ≥ α χ_α^{(old)}(W, p) − |α−1| log( |supp p| dim H )   (29)

for every α ∈ (0, +∞). A more careful application of (12) and (13) yields the following improved bound:
Lemma 3.19. Let W : X → S(H) be a channel, and α ∈ (0, +∞). For any p ∈ P_f(X) and any σ ∈ S(H), we have

D_α^{(new)}( Ŵ(p) ‖ p̂ ⊗ σ ) ≥ α D_α^{(old)}( Ŵ(p) ‖ p̂ ⊗ σ ) − |α−1| log(dim H),

and hence,

χ_α^{(old)}(W, p) ≥ χ_α^{(new)}(W, p) ≥ α χ_α^{(old)}(W, p) − |α−1| log(dim H).

Proof. Assume that α > 1. By Corollary 3.10 we have Tr( W(x)^{1/2} σ^{(1−α)/α} W(x)^{1/2} )^α ≥ (dim H)^{−(α−1)²} ( Tr W(x)^α σ^{1−α} )^α for every x ∈ X, and hence,

D_α^{(new)}( Ŵ(p) ‖ p̂ ⊗ σ ) = (1/(α−1)) log Σ_x p(x) Tr( W(x)^{1/2} σ^{(1−α)/α} W(x)^{1/2} )^α
≥ −(α−1) log(dim H) + (1/(α−1)) log Σ_x p(x) ( Tr W(x)^α σ^{1−α} )^α
≥ −(α−1) log(dim H) + (1/(α−1)) log ( Σ_x p(x) Tr W(x)^α σ^{1−α} )^α
= −(α−1) log(dim H) + α D_α^{(old)}( Ŵ(p) ‖ p̂ ⊗ σ ),

where the second inequality is due to the convexity of x ↦ x^α. The proof for α ∈ (0, 1) goes exactly the same way.
Monotonicity of the Rényi divergences in α yields that the corresponding quantities χ_α^{(old)}(W, p) and χ_α^{(new)}(W, p) are also monotonically increasing in α. A simple minimax argument shows (see, e.g., [38, Lemma B.3]) that

lim_{α→1} χ_α^{(old)}(W, p) = χ(W, p),   (30)

where χ(W, p) is the Holevo quantity. This, together with Lemma 3.19, yields that also

lim_{α→1} χ_α^{(new)}(W, p) = χ(W, p).

Moreover, it was shown in [38, Proposition B.5] that if ran W := {W(x) : x ∈ X} is compact then

lim_{α→1} χ̂_α^{(old)}(W) = χ̂(W).

Applying Lemma 3.19 to this, we obtain

lim_{α→1} χ̂_α^{(new)}(W) = χ̂(W).   (31)

Remark 3.20. Carathéodory's theorem and the explicit formula (28) imply that in the definition χ̂_α^{(old)}(W) := sup_{p∈P_f(X)} χ_α^{(old)}(W, p) it is enough to consider probability distributions with |supp p| ≤ (dim H)² + 1. However, this is not known for χ̂_α^{(new)}(W), and hence (29) is insufficient to derive (31).
Finally, we point out a connection between α-capacities and a special case of a famous convexity result by Carlen and Lieb [12, 13]. For any finite-dimensional Hilbert space H and A_1, …, A_n ∈ B(H)_+, define

Φ_{α,q}(A_1, …, A_n) := ( Tr( Σ_{i=1}^n A_i^α )^{q/α} )^{1/q},   α ≥ 0, q > 0.

Theorem 1.1 in [13] says that for any finite-dimensional Hilbert space H, Φ_{α,q} is concave on (B(H)_+)^n for 0 ≤ α ≤ q ≤ 1, and convex for all 1 ≤ α ≤ 2 and q ≥ 1. Below we give an elementary proof of the following weaker statement: Φ_{α,1}^α is concave for α ∈ (0, 1) and convex for α ∈ (1, 2].

For a set X, a finitely supported non-negative function p : X → R_+, and a finite-dimensional Hilbert space H, let Φ̂_{p,H,α} : (B(H)_+)^X → R_+ be defined as

Φ̂_{p,H,α}(W) := ( Tr( Σ_{x∈X} p(x) W(x)^α )^{1/α} )^α,   W ∈ (B(H)_+)^X.

The following Proposition is equivalent to our assertion:

Proposition 3.21. For any X, p and H, Φ̂_{p,H,α} is concave on (B(H)_+)^X for α ∈ (0, 1) and convex for α ∈ (1, 2].

Proof. Exactly the same way as in (26)-(28), we can see that

(α/(α−1)) log Tr( Σ_x p(x) W(x)^α )^{1/α} = min_{σ∈S(H)} D_α^{(old)}( Ŵ(p) ‖ p̂ ⊗ σ ).   (32)
Assume for the rest that α ∈ (1, 2]; the proof for the case α ∈ (0, 1) goes exactly the same way. Let r ∈ N, W_1, …, W_r ∈ (B(H)_+)^X, and let γ_1, …, γ_r be a probability distribution. Then

Φ̂_{p,H,α}( Σ_i γ_i W_i ) = min_{σ∈S(H)} Q_α^{(old)}( Σ_i γ_i Ŵ_i(p) ‖ p̂ ⊗ σ )
= min_{σ_1,…,σ_r∈S(H)} Q_α^{(old)}( Σ_i γ_i Ŵ_i(p) ‖ p̂ ⊗ Σ_i γ_i σ_i )
≤ min_{σ_1,…,σ_r∈S(H)} Σ_i γ_i Q_α^{(old)}( Ŵ_i(p) ‖ p̂ ⊗ σ_i )
= Σ_i γ_i min_{σ_i∈S(H)} Q_α^{(old)}( Ŵ_i(p) ‖ p̂ ⊗ σ_i )
= Σ_i γ_i Φ̂_{p,H,α}(W_i),

where the first and the last identities are due to (32), and the inequality follows from the joint convexity of Q_α^{(old)} [1, 47]. (In the case α ∈ (0, 1), we have to use joint concavity [32, 47].)
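In the commuting (diagonal) case Φ̂_{p,H,α} acts on the joint eigenvalue vectors, and its midpoint concavity/convexity can be spot-checked numerically. A sketch (plain Python; note that we take the outer power α, matching the definition of Φ̂ above, and the naming is ours):

```python
import random

def phi_hat(p, Ws, alpha):
    # Phi-hat_{p,H,alpha} for commuting (diagonal) W(x), given as eigenvalue vectors
    d = len(Ws[0])
    inner = [sum(px * w[j] ** alpha for px, w in zip(p, Ws)) for j in range(d)]
    return sum(v ** (1.0 / alpha) for v in inner) ** alpha

random.seed(4)
p = [0.3, 0.7]
for _ in range(100):
    V = [[random.uniform(0.1, 1.0) for _ in range(3)] for _ in range(2)]
    W = [[random.uniform(0.1, 1.0) for _ in range(3)] for _ in range(2)]
    M = [[(v + w) / 2.0 for v, w in zip(vx, wx)] for vx, wx in zip(V, W)]
    # concave for alpha in (0,1): the midpoint dominates the average
    assert phi_hat(p, M, 0.5) >= 0.5 * (phi_hat(p, V, 0.5) + phi_hat(p, W, 0.5)) - 1e-12
    # convex for alpha in (1,2]: the midpoint lies below the average
    assert phi_hat(p, M, 1.8) <= 0.5 * (phi_hat(p, V, 1.8) + phi_hat(p, W, 1.8)) + 1e-12
print("ok")
```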
4 Applications to coding theorems

4.1 Preliminaries
For a self-adjoint operator X, we will use the notation {X > 0} to denote the spectral projection of X corresponding to the positive half-line (0, +∞). The spectral projections {X ≥ 0}, {X < 0} and {X ≤ 0} are defined similarly. The positive part X_+ and the negative part X_− are defined as X_+ := X{X > 0} and X_− := −X{X < 0}, respectively, and the absolute value of X is |X| := X_+ + X_−. The trace-norm of X is ‖X‖_1 := Tr |X|.

The following lemma is Theorem 1 from [4]; see also Proposition 1.1 in [28] for a simplified proof.

Lemma 4.1. Let A, B be positive semidefinite operators on the same Hilbert space. For any t ∈ [0, 1],

Tr A( I − {A−B > 0} ) + Tr B {A−B > 0} = (1/2) Tr(A+B) − (1/2) ‖A−B‖_1 ≤ Tr A^t B^{1−t}.
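For commuting A and B, Lemma 4.1 is the eigenvalue-wise statement Σ_i min(a_i, b_i) ≤ Σ_i a_i^t b_i^{1−t}, with the optimal test projecting onto {a_i > b_i}. A quick sketch (plain Python, diagonal operators given by their eigenvalue vectors):

```python
import random

random.seed(2)
for _ in range(200):
    a = [random.uniform(0.0, 2.0) for _ in range(5)]   # eigenvalues of A
    b = [random.uniform(0.0, 2.0) for _ in range(5)]   # eigenvalues of B (same eigenbasis)
    total_err = sum(min(x, y) for x, y in zip(a, b))   # Tr A(I - {A-B>0}) + Tr B{A-B>0}
    half = 0.5 * sum(x + y for x, y in zip(a, b)) - 0.5 * sum(abs(x - y) for x, y in zip(a, b))
    assert abs(total_err - half) < 1e-12               # the identity in Lemma 4.1
    for t in (0.25, 0.5, 0.75):
        assert total_err <= sum(x ** t * y ** (1.0 - t) for x, y in zip(a, b)) + 1e-12
print("ok")
```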
The next lemma is a reformulation of Lemma 2.6 in [34]. We include the proof for the readers' convenience.

Lemma 4.2. Let (V, ‖·‖) be a finite-dimensional normed vector space, and let D denote its real dimension. Let N be a subset of the unit ball of V. For every δ > 0, there exists a finite subset N_δ ⊂ N such that

1. |N_δ| ≤ (1 + 2/δ)^D, and
2. for every v ∈ N there exists a v_δ ∈ N_δ such that ‖v − v_δ‖ < δ.

Proof. For every δ > 0, let N_δ be a maximal subset of N such that ‖v − v′‖ ≥ δ for all distinct v, v′ ∈ N_δ; then N_δ clearly satisfies 2. On the other hand, the open ‖·‖-balls with radius δ/2 around the elements of N_δ are disjoint, and contained in the ‖·‖-ball with radius 1 + δ/2 centered at the origin. Since the volume of a ball scales as the D-th power of its radius, we obtain 1.

The fidelity of positive semidefinite operators A and B is defined as F(A, B) := Tr( A^{1/2} B A^{1/2} )^{1/2}. The entanglement fidelity of a state ρ and a completely positive trace-preserving map Φ is F_e(ρ, Φ) := F( |ψ_ρ⟩⟨ψ_ρ|, (id ⊗ Φ)|ψ_ρ⟩⟨ψ_ρ| ), where ψ_ρ is any purification of the state ρ; see Chapter 9 in [43] for details.
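The proof of Lemma 4.2 is constructive: a maximal δ-separated subset can be grown greedily, and both properties of the lemma can then be checked. A sketch (plain Python, Euclidean norm on R², our own function names):

```python
import math, random

def greedy_net(points, delta):
    # greedily pick a maximal delta-separated subset; every point is then within delta of the net
    net = []
    for v in points:
        if all(math.dist(v, w) >= delta for w in net):
            net.append(v)
    return net

random.seed(3)
# a cloud of points inside the unit ball of R^2 (so D = 2 in Lemma 4.2)
pts = []
while len(pts) < 500:
    v = (random.uniform(-1, 1), random.uniform(-1, 1))
    if math.hypot(*v) <= 1:
        pts.append(v)

delta = 0.25
net = greedy_net(pts, delta)
# property 2: every point has a net element within delta
assert all(any(math.dist(v, w) < delta for w in net) for v in pts)
# property 1: the cardinality bound (1 + 2/delta)^D
assert len(net) <= (1 + 2 / delta) ** 2
print(len(net), "net points; bound:", (1 + 2 / delta) ** 2)
```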
4.2 Quantum Stein's lemma with composite null-hypothesis
Consider the asymptotic hypothesis testing problem with null-hypothesis H_0 : N_n ⊂ S(H_n) and alternative hypothesis H_1 : σ_n ∈ S(H_n), n ∈ N, where H_n is some finite-dimensional Hilbert space. Our goal is to decide between these two hypotheses based on the outcome of a binary POVM (T_n(0), T_n(1)) on H_n, where 0 and 1 indicate the acceptance of H_0 and H_1, respectively. Since T_n(1) = I − T_n(0), the POVM is uniquely determined by T_n := T_n(0), and the only constraint on T_n is that 0 ≤ T_n ≤ I_n. We will
call such operators tests. Given a test T_n, the probability of mistaking H_0 for H_1 (type I error) and the probability of mistaking H_1 for H_0 (type II error) are given by

α_n(T_n) := sup_{ρ_n∈N_n} Tr ρ_n( I − T_n )   (type I),   and   β_n(T_n) := Tr σ_n T_n   (type II).
Definition 4.3. We say that a rate R ≥ 0 is achievable if there exists a sequence of tests T_n, n ∈ N, with

lim_{n→+∞} α_n(T_n) = 0   and   lim sup_{n→+∞} (1/n) log β_n(T_n) ≤ −R.

The largest achievable rate R( {N_n}_{n∈N} ‖ {σ_n}_{n∈N} ) is the direct rate of the hypothesis testing problem.

For what follows, we assume that H_n = H^{⊗n}, n ∈ N, where H = H_1, and that the alternative hypothesis is i.i.d., i.e., σ_n = σ^{⊗n}, n ∈ N, with σ = σ_1. We say that the null-hypothesis is composite i.i.d. if there exists a set N ⊂ S(H) such that for all n ∈ N, N_n = N^{(⊗n)} := { ρ^{⊗n} : ρ ∈ N }. The null-hypothesis is simple i.i.d. if N consists of one single element, i.e., N = {ρ} for some ρ ∈ S(H). According to the quantum Stein's lemma [22, 46], the direct rate in the simple i.i.d. case is given by D_1(ρ‖σ). The case of a general composite null-hypothesis was treated in [9] under the name of quantum Sanov theorem. There it was shown that there exists a sequence of tests {T_n}_{n∈N} such that lim_{n→+∞} Tr ρ^{⊗n}(I − T_n) = 0 for every ρ ∈ N, and lim sup_{n→+∞} (1/n) log β_n(T_n) ≤ −D_1(N‖σ), where D_1(N‖σ) := inf_{ρ∈N} D_1(ρ‖σ). Note that this is somewhat weaker than D_1(N‖σ) being achievable in the sense of Definition 4.3. Achievability in this stronger sense has been shown very recently in [44], using the representation theory of the symmetric group and the method of types. The proofs in both papers followed the approach of [22] of reducing the problem to a classical hypothesis testing problem by projecting all states onto the commutative algebra generated by {σ^{⊗n}}_{n∈N}. Below we use a different proof technique to show that D_1(N‖σ) is achievable in the sense of Definition 4.3. Our proof is based solely on Audenaert's trace inequality [4] and the subadditivity property of Q_α^{(new)} given in Proposition 3.13. We obtain explicit upper bounds on the error probabilities for any finite n ∈ N for a sequence of Neyman-Pearson-type tests. Moreover, if a δ-net can be explicitly constructed for N for every δ > 0 (this is trivially satisfied when N is finite), then the tests can also be constructed explicitly.
In [9], Stein's lemma was stated with a weak converse, while the results of [44] imply a strong converse. Here we use Nagaoka's method to further strengthen the converse part by giving explicit bounds on the exponential rate with which the worst-case type I success probability goes to zero when the type II error decays with a rate larger than the optimal rate D1(N‖σ). Note that our proof technique doesn't actually rely on the i.i.d. assumption, as we demonstrate in Theorem 4.9, where we give achievability bounds in the general correlated scenario. However, in the most general case we have to restrict to a finite null-hypothesis. We show examples in Remark 4.10 where the achievable rate of Theorem 4.9 can be expressed as the regularized relative entropy distance of the null-hypothesis and the alternative hypothesis, giving a direct generalization of the i.i.d. case. These results complement those of [10], where it was shown that if Θ is a set of ergodic states on a spin chain, and Φ is a state on the spin chain such that for every Ψ ∈ Θ, Stein's lemma holds for the simple hypothesis testing problem H0 : Ψ, H1 : Φ, then it also holds for the composite hypothesis testing problem H0 : Θ, H1 : Φ. This was also extended in [10] to the case where Θ consists of translation-invariant states, using ergodic decomposition.

Now let N ⊂ S(H) be a non-empty set of states, and let σ ∈ B(H)+ be a positive semidefinite operator such that
$$\operatorname{supp}\rho \subseteq \operatorname{supp}\sigma,\qquad \rho\in N. \tag{33}$$
Note that in hypothesis testing σ is usually assumed to be a state on H; however, the proof of Stein's lemma works the same way for a general positive semidefinite σ, and considering this more general case is actually useful, e.g., for state compression. Let
$$\psi(t):=\sup_{\rho\in N}\log Q_t^{(\mathrm{new})}(\rho\|\sigma),\qquad t>0, \tag{34}$$
and for every a ∈ R, let
$$\varphi(a):=\sup_{0<t\le 1}\{at-\psi(t)\},\qquad \hat\varphi(a):=\sup_{0<t\le 1}\{a(t-1)-\psi(t)\}=\varphi(a)-a. \tag{35}$$
Theorem 4.4. Fix a ∈ R. For every n ∈ N, let N(n) := Nδn ⊂ N be a δn-net as in lemma 4.2 for some δn > 0, let ρ̄n := Σ_{ρ∈N(n)} ρ⊗n, and let Sn,a := {e^{−na} ρ̄n − σ⊗n > 0} be the corresponding Neyman-Pearson type test. Then
$$\beta_n(S_{n,a})=\operatorname{Tr}\sigma^{\otimes n}S_{n,a}\le|N^{(n)}|e^{-n\varphi(a)}, \tag{36}$$
$$\alpha_n(S_{n,a})=\sup_{\rho\in N}\operatorname{Tr}\rho^{\otimes n}(I-S_{n,a})\le|N^{(n)}|e^{-n\hat\varphi(a)}+n\delta_n. \tag{37}$$
In particular, if δn := e^{−nκ} for some κ > 0, and N(n) := Nδn ⊂ N as in lemma 4.2, then
$$\limsup_{n\to+\infty}\frac1n\log\alpha_n(S_{n,a})\le-\min\{\kappa,\hat\varphi(a)-\kappa D(\mathcal H)\}, \tag{38}$$
$$\limsup_{n\to+\infty}\frac1n\log\beta_n(S_{n,a})\le-(\varphi(a)-\kappa D(\mathcal H)). \tag{39}$$

Proof. For every n ∈ N, let ρ̄n := Σ_{ρ∈N(n)} ρ⊗n, σn := σ⊗n. Applying lemma 4.1 to A := e^{−na} ρ̄n and B := σn for some fixed a ∈ R, we get
$$e_n(a):=e^{-na}\operatorname{Tr}\bar\rho_n(I-S_{n,a})+\operatorname{Tr}\sigma_nS_{n,a}\le e^{-nat}\operatorname{Tr}\bar\rho_n^t\sigma_n^{1-t} \tag{40}$$
for every t ∈ [0, 1]. This we can further upper bound as
$$\begin{aligned}
\operatorname{Tr}\bar\rho_n^t\sigma_n^{1-t}
&\le Q_t^{(\mathrm{new})}(\bar\rho_n\|\sigma_n)
\le \sum_{\rho\in N^{(n)}}Q_t^{(\mathrm{new})}\big(\rho^{\otimes n}\big\|\sigma^{\otimes n}\big)
\le |N^{(n)}|\sup_{\rho\in N}Q_t^{(\mathrm{new})}\big(\rho^{\otimes n}\big\|\sigma^{\otimes n}\big)\\
&= |N^{(n)}|\sup_{\rho\in N}\Big(Q_t^{(\mathrm{new})}(\rho\|\sigma)\Big)^n
= |N^{(n)}|e^{n\psi(t)},
\end{aligned} \tag{41}$$
where the first inequality is due to lemma 3.3, the second inequality is due to (19), the third inequality is obvious, the succeeding identity follows from the definition (14), and the last identity is due to the definition of ψ. Since (40) holds for every t ∈ (0, 1], together with (41) it yields en(a) ≤ |N(n)| e^{−nϕ(a)}. Hence we have Tr σn Sn,a ≤ en(a) ≤ |N(n)| e^{−nϕ(a)}, proving (36). Similarly, Tr ρ̄n(I − Sn,a) ≤ e^{na} en(a) yields
$$\sup_{\rho\in N^{(n)}}\operatorname{Tr}\rho^{\otimes n}(I-S_{n,a})\le\operatorname{Tr}\bar\rho_n(I-S_{n,a})\le e^{na}|N^{(n)}|e^{-n\varphi(a)}=|N^{(n)}|e^{-n\hat\varphi(a)}. \tag{42}$$
The submultiplicativity of the trace-norm on tensor products yields that sup_{ρ∈N} Tr ρ⊗n(I − Sn,a) ≤ sup_{ρ∈N(n)} Tr ρ⊗n(I − Sn,a) + nδn. Combined with (42), this yields (37). The inequalities in (38)–(39) are obvious from the choice of δn.

lemma 4.5. We have ϕ(a) ≥ a, and for every a < D1(N‖σ), we have ϕ̂(a) > 0.
Proof. Note that for any t ∈ (0, 1), a(t − 1) − ψ(t) = (t − 1)[a − inf_{ρ∈N} Dt^(new)(ρ‖σ)]. Moreover, by the assumption in (33), ρ ↦ Dt^(new)(ρ‖σ) is continuous on N, and hence inf_{ρ∈N} Dt^(new)(ρ‖σ) = min_{ρ∈N} Dt^(new)(ρ‖σ) for every t ∈ (0, 1). Note that N is compact, and for every ρ ∈ N, t ↦ Dt^(new)(ρ‖σ) is monotone increasing, due to [40, Theorem 6]. Applying now the minimax theorem from [38, Corollary A.2], we get
$$\sup_{t\in(0,1)}\min_{\rho\in N}D_t^{(\mathrm{new})}(\rho\|\sigma)=\min_{\rho\in N}\sup_{t\in(0,1)}D_t^{(\mathrm{new})}(\rho\|\sigma)=\min_{\rho\in N}D_1(\rho\|\sigma)=D_1(N\|\sigma).$$
Thus, for any a < D1(N‖σ), there exists a ta ∈ (0, 1) such that a − inf_{ρ∈N} D_{ta}^(new)(ρ‖σ) < 0, and hence 0 < (ta − 1)[a − inf_{ρ∈N} D_{ta}^(new)(ρ‖σ)] ≤ ϕ̂(a). Finally, note that assumption (33) yields that ψ(1) = 0, and hence ϕ(a) ≥ a − ψ(1) = a.

Theorem 4.6. The direct rate is lower bounded by D1(N‖σ), i.e.,
$$R\big(\{N^{(\otimes n)}\}_{n\in\mathbb N}\,\big\|\,\{\sigma^{\otimes n}\}_{n\in\mathbb N}\big)\ge D_1(N\|\sigma). \tag{43}$$

Proof. The assertion is trivial when D1(N‖σ) = 0, and hence for the rest we assume D1(N‖σ) > 0. By lemma 4.5, for every 0 < a < D1(N‖σ) we can find 0 < κ < ϕ̂(a)/D(H), so that the right-hand sides of (38)–(39) are strictly negative. Since we can take κ arbitrarily small, and a arbitrarily close to D1(N‖σ), we see that any rate below D1(N‖σ) is achievable, which yields (43).

Since inf_{t>1} inf_{ρ∈N} Dt^(new)(ρ‖σ) = D1(N‖σ), we see that if r > D1(N‖σ) then there exists a t > 1 such that −r + inf_{ρ∈N} Dt^(new)(ρ‖σ) < 0, and hence the RHS of (44) is strictly negative. The rest of the statements follow immediately.

Remark 4.8. Theorem 4.6 shows the existence of a sequence of tests such that the type II error probability decays exponentially fast with rate D1(N‖σ), while the type I error probability goes to zero. Note that for this statement it is enough to choose δn polynomially decaying; e.g., δn := 1/n² does the job, and we get an improved exponent for the type II error, limsup_{n→+∞} (1/n) log βn(Sn,a) ≤ −ϕ(a). Theorem 4.4 yields more detailed information in the sense that it shows that for any rate r below the optimal rate D1(N‖σ), there exists a sequence of tests along which the type II error decays with the given rate r, while the type I error also decays exponentially fast; moreover, (38)–(39) provide a lower bound on the rate of the type I error. Note that if N is finite then the approximation process can be omitted, and we obtain the bounds
$$\limsup_{n\to+\infty}\frac1n\log\alpha_n(S_{n,a})\le-\hat\varphi(a),\qquad \limsup_{n\to+\infty}\frac1n\log\beta_n(S_{n,a})\le-\varphi(a).$$
These bounds are not optimal; indeed, in the simple i.i.d. case the quantum Hoeffding bound theorem [5, 21, 23, 41] shows that the above inequalities become equalities with ϕ and ϕ̂ replaced by their counterparts ϕ^(old) and ϕ̂^(old), defined with Qt^(old) in place of Qt^(new) in (34)–(35). Moreover, ϕ^(old)(a) > ϕ(a) and ϕ̂^(old)(a) > ϕ̂(a) for any 0 < a < D1(ρ‖σ), due to the Araki-Lieb-Thirring inequality [3, 33]. On the other hand, the RHS of (44) is known to give the exact strong converse rate in the simple i.i.d. case [39].
The above arguments can also be used to obtain bounds on the direct rate in the case of states with arbitrary correlations. In this case, however, it may not be possible to find a suitable approximation procedure, and hence we restrict our attention to the case of a finite null-hypothesis. Thus, for every n ∈ N, our alternative hypothesis H1 is given by some state σn ∈ S(Hn), where Hn is some finite-dimensional Hilbert space, and the null-hypothesis H0 is given by Nn = {ρ1,n, …, ρr,n} ⊂ S(Hn), where r ∈ N is some fixed number. We assume that supp ρi,n ⊆ supp σn for every i and n.
Theorem 4.9. In the above setting, we have
$$\limsup_{n\to+\infty}\frac1n\log\alpha_n(S_{n,a})\le-\sup_{0<t<1}\Big\{a(t-1)-\max_{1\le i\le r}\limsup_{n\to+\infty}\frac1n\log Q_t^{(\mathrm{new})}(\rho_{i,n}\|\sigma_n)\Big\}. \tag{45}$$

In the universal source compression problem (Theorem 4.11), the corresponding bounds are the following: for every κ > 0, we have
$$\limsup_{n\to+\infty}\frac1n\log\big(1-F(C_n,D_n)\big)\le-\min\{\kappa,\hat\varphi(a)-\kappa D(\mathcal H)\}, \tag{52}$$
$$\limsup_{n\to+\infty}\frac1n\log\operatorname{Tr}[C_n(N_n)]\le-\varphi(a)+\kappa D(\mathcal H). \tag{53}$$
On the other hand, for any coding scheme (Cn, Dn), n ∈ N, we have
$$\limsup_{n\to+\infty}\frac1n\log\hat F(C_n,D_n)\le\inf_{t>1}\frac{t-1}{t}\Big\{\limsup_{n\to+\infty}\frac1n\log\operatorname{Tr}[C_n(N_n)]-\sup_{\rho\in N}S_t(\rho)\Big\},$$
where $S_t(\rho):=\frac{1}{1-t}\log\operatorname{Tr}\rho^t$ is the Rényi entropy of ρ with parameter t.
Corollary 4.12. The optimal compression rate is equal to the maximum entropy, i.e.,
$$R\big(\{N^{(\otimes n)}\}_{n\in\mathbb N}\big)=\sup_{\rho\in N}S(\rho).$$
Remark 4.13. We recover the result of [30] by choosing N := {ρ ∈ S(H) : S(ρ) ≤ s}. Remark 4.14. Theorem 4.11 and Corollary 4.12 can be extended to correlated states and averaged states the same way as the analogous results for state discrimination in Section 4.2. Since these extensions are trivial, we omit the details.
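In the commuting (diagonal) case, the compression rate of Corollary 4.12 and the Rényi entropies St appearing in the converse bound can be computed directly. A minimal sketch with example states of our own choosing:

```python
import math

def shannon(rho):
    """Von Neumann entropy of a diagonal state = Shannon entropy of its spectrum (nats)."""
    return -sum(r * math.log(r) for r in rho if r > 0)

def renyi(t, rho):
    """Renyi entropy S_t(rho) = (1/(1-t)) log Tr rho^t for t != 1."""
    return math.log(sum(r ** t for r in rho)) / (1 - t)

N = [[0.8, 0.2], [0.6, 0.4], [0.5, 0.5]]
rate = max(shannon(rho) for rho in N)    # optimal rate sup_{rho in N} S(rho)
assert abs(rate - math.log(2)) < 1e-12   # the maximally mixed state attains the sup here
# S_t(rho) is nonincreasing in t and tends to S(rho) as t -> 1 from below
rho = [0.8, 0.2]
assert renyi(0.5, rho) >= renyi(0.99, rho) >= shannon(rho) - 1e-3
```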
Remark 4.15. The simple i.i.d. state compression problem can also be formulated in an ensemble setting, which is in closer resemblance with the usual formulation of classical source coding. In that formulation, a discrete i.i.d. quantum information source is specified by a finite set {ρx}x∈X ⊂ S(H) of states and a probability distribution p on X. Invoking the source n times, we obtain a state ρx := ρx1 ⊗ … ⊗ ρxn with probability px := p(x1)·…·p(xn). The fidelity of a compression-decompression pair (Cn, Dn) is then defined as F(Cn, Dn) := Σ_{x∈X^n} px Fe(ρx, Dn ∘ Cn). In the classical case the signals ρx can be identified with a system of orthogonal rank 1 projections, Cn and Dn are classical stochastic maps, and F(Cn, Dn) as defined above gives back the usual expression for the success probability. It follows from standard properties of the fidelity that the optimal compression rate, under the constraint that F(Cn, Dn) goes to 1 asymptotically, only depends on the average state ρ(p) := Σ_x p(x)ρx, and is equal to S(ρ(p)). Theorem 4.11 and Corollary 4.12 thus also provide the optimal compression rate and exponential bounds on the error and success probabilities in the ensemble formulation, for multiple quantum sources.
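For commuting (diagonal) signal states, the ensemble formulation of Remark 4.15 reduces to elementary classical computations: the optimal rate is the entropy of the average state, which by concavity dominates the average of the signal entropies. The ensemble below is an illustrative example of our own:

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

# Ensemble of commuting (diagonal) signal states with prior on X = {0, 1}
signals = {0: [1.0, 0.0], 1: [0.5, 0.5]}
prior = {0: 0.75, 1: 0.25}
# Average state rho(p) = sum_x p(x) rho_x; the optimal rate is S(rho(p))
avg = [sum(prior[x] * signals[x][i] for x in signals) for i in range(2)]
rate = entropy(avg)
# Concavity of the entropy: S(rho(p)) >= sum_x p(x) S(rho_x)
assert rate >= sum(prior[x] * entropy(signals[x]) for x in signals) - 1e-12
```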
4.4 Classical capacity of compound channels
Recall that by a channel W we mean a map W : X → S(H), where X is some input alphabet (which can be an arbitrary non-empty set) and H is a finite-dimensional Hilbert space. For a channel W : X → S(H), we define its n-th i.i.d. extension W⊗n as the channel W⊗n : X^n → S(H⊗n), defined as
$$W^{\otimes n}(x_1,\dots,x_n):=W(x_1)\otimes\dots\otimes W(x_n),\qquad x_1,\dots,x_n\in\mathcal X. \tag{54}$$
It is obvious from the explicit formula (28) for χα^(old) that
$$\chi_\alpha^{(\mathrm{old})}\big(W^{\otimes n},p^{\otimes n}\big)=n\,\chi_\alpha^{(\mathrm{old})}(W,p),\qquad n\in\mathbb N, \tag{55}$$
where p⊗n ∈ Pf(X^n) is the n-th i.i.d. extension of p, defined as p⊗n(x1, …, xn) := p(x1)·…·p(xn), x1, …, xn ∈ X. It is not known whether the same additivity property holds for χα^(new).

Remark 4.16. Note that in our definition of a channel, we didn't make any assumption on the cardinality of the input alphabet X, nor did we require any further mathematical properties from W, apart from being a function to S(H). The usual notion of a quantum channel is a special case of this definition, where X is the state space of some Hilbert space and W is a completely positive trace-preserving affine map. In this case, however, our definition of the i.i.d. extensions is more restrictive than the usual definition of the tensor powers of a quantum channel. Indeed, our definition corresponds to the notion of quantum channels with product state encoding. Hence, our definition of the classical capacity below corresponds to the classical capacity of quantum channels with product state encoding.

Let Wi : X → S(H), i ∈ I, be a set of channels with the same input alphabet X and the same output Hilbert space H, where I is any index set. A code C = (Ce, Cd) for {Wi}i∈I consists of an encoding Ce : {1, …, M} → X and a decoding Cd : {1, …, M} → B(H)+, where {Cd(1), …, Cd(M)} is a POVM on H, and M ∈ N is the size of the code, which we will denote by |C|. The worst-case average error probability of a code C is
$$p_e\big(\{W_i\}_{i\in I},C\big):=\sup_{i\in I}\frac{1}{|C|}\sum_{k=1}^{|C|}\operatorname{Tr}W_i(C_e(k))\big(I-C_d(k)\big).$$

Consider now a sequence W := {Wn}n∈N, where each Wn is a set of channels with input alphabet X^n and output space H⊗n. The classical capacity C(W) of W is the largest number R such that there exists a sequence of codes Cn, n ∈ N, with
$$\lim_{n\to+\infty}p_e(\mathcal W_n,\mathcal C_n)=0\qquad\text{and}\qquad\liminf_{n\to+\infty}\frac1n\log|\mathcal C_n|\ge R.$$
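In the commuting (classical) case, the Holevo quantity χ(W, p) appearing in the capacity formulas below reduces to the mutual information I(X; Y) between input and output, and the compound value is its minimum over the channel family. A minimal sketch, with the two binary symmetric channels below chosen by us for illustration:

```python
import math

def holevo_classical(W, p):
    """For a classical channel (rows W[x] are output distributions), the Holevo
    quantity chi(W, p) reduces to the mutual information I(X;Y), in nats."""
    out = [sum(p[x] * W[x][y] for x in range(len(W))) for y in range(len(W[0]))]
    H_out = -sum(q * math.log(q) for q in out if q > 0)
    H_cond = -sum(p[x] * W[x][y] * math.log(W[x][y])
                  for x in range(len(W)) for y in range(len(W[0])) if W[x][y] > 0)
    return H_out - H_cond

# Compound channel: two binary symmetric channels with different flip probabilities
W1 = [[0.9, 0.1], [0.1, 0.9]]
W2 = [[0.8, 0.2], [0.2, 0.8]]
p = [0.5, 0.5]  # uniform input distribution
compound = min(holevo_classical(W, p) for W in (W1, W2))
assert abs(compound - holevo_classical(W2, p)) < 1e-12  # the noisier channel is the bottleneck
```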
We say that W is simple i.i.d. if Wn consists of the single element W⊗n for every n ∈ N, with some fixed channel W. In this case we denote the capacity by C(W). The Holevo-Schumacher-Westmoreland theorem [27, 51] tells us that in this case
$$C(W)\ \ge\ \hat\chi(W)=\sup_{p\in\mathcal P_f(\mathcal X)}\chi(W,p), \tag{56}$$
where χ(W, p) is the Holevo quantity (24), and χ̂(W) is the Holevo capacity (25) of the channel. It is easy to see that (56) actually holds as an equality, i.e., no sequence of codes with a rate above sup_{p∈Pf(X)} χ(W, p) can have an asymptotic error equal to zero; this is called the weak converse to the channel coding theorem, while the strong converse theorem [45, 57] says that such sequences of codes always have an asymptotic error equal to 1.

Here we will consider two generalizations of the simple i.i.d. case. In the compound i.i.d. case, Wn = {Wi⊗n}i∈I for some fixed channels Wi : X → S(H). In the averaged i.i.d. case, Wn consists of the single element W̄n := Σ_{i∈I} γi Wi⊗n, where I is finite, and γ is a probability distribution on I. The capacity of finite averaged channels has been shown to be equal to sup_{p∈Pf(X)} min_i χ(Wi, p) in [15], and the same formula for the capacity of a finite compound channel follows from it in a straightforward way. The protocol used in [15] to show the achievability was to use a certain fraction of the communication rounds to guess which channel the parties are actually using, and then code for that channel in the remaining rounds. These results were generalized to arbitrary index sets I in [11], using a different approach. The starting point in [11] was the following random coding theorem from [19] (for the exact form below, see [37]).

Theorem 4.17. Let W : X → S(H) be a channel. For any M ∈ N and any p ∈ Pf(X), there exists a code C such that |C| = M and
$$p_e(W,C)\le\kappa(c,\alpha)\,M^{1-\alpha}\operatorname{Tr}\hat W(p)^\alpha\big(\hat p\otimes W(p)\big)^{1-\alpha}$$
for every α ∈ (0, 1) and every c > 0, where κ(c, α) := (1 + c)^α (2 + c + 1/c)^{1−α}.

Applying the general properties of the Rényi divergences established in Section 3, together with the single-shot coding theorem of Theorem 4.17, we get a very simple
proof of the achievability part of the coding theorems in [15] and [11]. Since our primary interest is the applicability of the inequalities of Section 3, we only consider the achievability part and not the converse. The key step of our approach is the following extension of Theorem 4.17 to multiple channels.

Theorem 4.18. Let Wi : X → S(H), i ∈ I, be a set of channels, where I is a finite index set. For every R ≥ 0, every n ∈ N and every p ∈ Pf(X), there exists a code Cn such that |Cn| ≥ exp(nR) and, for every α ∈ (0, 1),
$$p_e\big(\{W_i^{\otimes n}\}_{i\in I},\mathcal C_n\big)\le 8|I|^2\exp\Big(n(\alpha-1)\Big[\alpha\min_{i}\chi_\alpha^{(\mathrm{old})}(W_i,p)-R-(\alpha-1)\log\dim\mathcal H\Big]\Big). \tag{57}$$
Proof. Let Mn := ⌈exp(nR)⌉, n ∈ N, and γi := 1/|I|, i ∈ I. Applying Theorem 4.17 to W̄n = Σ_{i∈I} γi Wi⊗n, Mn and p⊗n, we get the existence of a code Cn with |Cn| = Mn and
$$p_e(\bar W_n,\mathcal C_n)\le 8M_n^{1-\alpha}\,Q_\alpha^{(\mathrm{old})}\Big(\sum_{i\in I}\gamma_i\hat W_i^{\otimes n}(p^{\otimes n})\,\Big\|\,\hat p^{\otimes n}\otimes\bar W_n(p^{\otimes n})\Big) \tag{58}$$
for every α ∈ (0, 1). Here we chose c = 1, and used the upper bound κ(1, α) ≤ 8. We can further upper bound the RHS above as
$$\begin{aligned}
Q_\alpha^{(\mathrm{old})}&\Big(\sum_{i\in I}\gamma_i\hat W_i^{\otimes n}(p^{\otimes n})\,\Big\|\,\hat p^{\otimes n}\otimes\bar W_n(p^{\otimes n})\Big)\\
&\le Q_\alpha^{(\mathrm{new})}\Big(\sum_{i\in I}\gamma_i\hat W_i^{\otimes n}(p^{\otimes n})\,\Big\|\,\hat p^{\otimes n}\otimes\bar W_n(p^{\otimes n})\Big) &&(59)\\
&\le \sum_{i\in I}\gamma_i^\alpha\,Q_\alpha^{(\mathrm{new})}\Big(\hat W_i^{\otimes n}(p^{\otimes n})\,\Big\|\,\hat p^{\otimes n}\otimes\bar W_n(p^{\otimes n})\Big) &&(60)\\
&\le \sum_{i\in I}\gamma_i^\alpha\sup_{\sigma\in\mathcal S(\mathcal H^{\otimes n})}Q_\alpha^{(\mathrm{new})}\big(\hat W_i^{\otimes n}(p^{\otimes n})\,\big\|\,\hat p^{\otimes n}\otimes\sigma\big) &&(61)\\
&\le \sum_{i\in I}\gamma_i^\alpha\sup_{\sigma\in\mathcal S(\mathcal H^{\otimes n})}Q_\alpha^{(\mathrm{old})}\big(\hat W_i^{\otimes n}(p^{\otimes n})\,\big\|\,\hat p^{\otimes n}\otimes\sigma\big)\big(\dim\mathcal H^{\otimes n}\big)^{(\alpha-1)^2} &&(62)\\
&= \sum_{i\in I}\gamma_i^\alpha\exp\big(\alpha(\alpha-1)\chi_\alpha^{(\mathrm{old})}(W_i^{\otimes n},p^{\otimes n})\big)\big(\dim\mathcal H\big)^{n(\alpha-1)^2} &&(63)\\
&= \sum_{i\in I}\gamma_i^\alpha\exp\big(n\alpha(\alpha-1)\chi_\alpha^{(\mathrm{old})}(W_i,p)\big)\big(\dim\mathcal H\big)^{n(\alpha-1)^2} &&(64)\\
&\le |I|\exp\big(n\alpha(\alpha-1)\min_{i\in I}\chi_\alpha^{(\mathrm{old})}(W_i,p)\big)\big(\dim\mathcal H\big)^{n(\alpha-1)^2}, &&(65)
\end{aligned}$$
where (59) is due to (7), (60) is due to (19), (61) is trivial, (62) follows from Corollary 3.10, and (64) is due to (55). Note that
$$p_e(\bar W_n,\mathcal C_n)=\frac1{|I|}\sum_{i\in I}p_e\big(W_i^{\otimes n},\mathcal C_n\big)\ge\frac1{|I|}\sup_{i\in I}p_e\big(W_i^{\otimes n},\mathcal C_n\big). \tag{66}$$
Combining (58), (65), and (66), we get (57).
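The behavior of the exponent in (57) near α = 1 can be sketched numerically in the classical case. Since the explicit formula (28) for χα^(old) is not reproduced here, the sketch below uses Sibson's mutual information of order α as a classical stand-in; this substitution, together with the channel examples, is an assumption of ours, chosen because it has the right α → 1 limit (the mutual information χ(W, p)):

```python
import math

def sibson(alpha, W, p):
    """Sibson's mutual information of order alpha for a classical channel
    (rows W[x] are output distributions); tends to I(X;Y) as alpha -> 1.
    Used here as a stand-in for chi_alpha^(old), not the paper's formula (28)."""
    s = sum(sum(p[x] * W[x][y] ** alpha for x in range(len(W))) ** (1 / alpha)
            for y in range(len(W[0])))
    return alpha / (alpha - 1) * math.log(s)

def exponent(alpha, channels, p, R, dim):
    """(alpha-1)[alpha * min_i chi_alpha(W_i, p) - R - (alpha-1) log dim],
    the shape of the exponent appearing in (57)."""
    bracket = (alpha * min(sibson(alpha, W, p) for W in channels)
               - R - (alpha - 1) * math.log(dim))
    return (alpha - 1) * bracket

W1 = [[0.9, 0.1], [0.1, 0.9]]
W2 = [[0.8, 0.2], [0.2, 0.8]]
p = [0.5, 0.5]
chi_min = min(sibson(0.9999, W, p) for W in (W1, W2))  # approximates min_i chi(W_i, p)
R = 0.5 * chi_min                                      # any rate below min_i chi(W_i, p)
# near alpha = 1 the exponent is negative, so the bound in (57) decays exponentially
assert exponent(0.99, (W1, W2), p, R, dim=2) < 0
```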
Corollary 4.19. Let Wi : X → S(H), i ∈ I := {1, …, r}, be a set of channels, and let γ1, …, γr be a probability distribution on I with strictly positive weights. Then
$$C\Big(\Big\{\sum_i\gamma_iW_i^{\otimes n}\Big\}_{n\in\mathbb N}\Big)=C\big(\{W_i^{\otimes n}:i\in I\}_{n\in\mathbb N}\big)\ge\sup_{p\in\mathcal P_f(\mathcal X)}\min_i\chi(W_i,p). \tag{67}$$

Proof. Let R < min_i χ(Wi, p), and for every n ∈ N, let Cn be a code as in Theorem 4.18. Then liminf_{n→∞} (1/n) log |Cn| ≥ R, and
$$\limsup_{n\to\infty}\frac1n\log p_e\big(\{W_i^{\otimes n}\}_{i\in I},\mathcal C_n\big)\le(\alpha-1)\Big[\alpha\min_i\chi_\alpha^{(\mathrm{old})}(W_i,p)-R-(\alpha-1)\log\dim\mathcal H\Big].$$
Note that
$$\lim_{\alpha\nearrow1}\Big[\alpha\min_i\chi_\alpha^{(\mathrm{old})}(W_i,p)-R-(\alpha-1)\log\dim\mathcal H\Big]=\min_i\chi(W_i,p)-R>0,$$
due to (30), and hence there exists an α0 ∈ (0, 1) such that the upper bound in (57) goes to zero exponentially fast for every α ∈ (α0, 1). This proves the inequality in (67), and the equality of the two capacities is trivial.

When the channels are completely positive trace-preserving affine maps on the state space of a Hilbert space, the above results can be extended to the case of infinitely many channels by a simple approximation argument. It is easy to see that the same argument doesn't work when the channels can be arbitrary maps on an input alphabet. Note that the classical capacity considered in the theorem below is the product-state capacity.

Theorem 4.20. Let Hin and H be finite-dimensional Hilbert spaces, and Wi : S(Hin) → S(H), i ∈ I, be completely positive trace-preserving affine maps, where I is an arbitrary index set. Then
$$C\big(\{W_i^{\otimes n}:i\in I\}_{n\in\mathbb N}\big)\ge\sup_{p\in\mathcal P_f(\mathcal X)}\inf_i\chi(W_i,p). \tag{68}$$
Proof. We assume that sup_{p∈Pf(X)} inf_i χ(Wi, p) > 0, since otherwise the assertion is trivial. Let V be the vector space of linear maps from B(Hin) to B(H), equipped with the norm ‖Φ‖ := sup{‖Φ(X)‖1 : ‖X‖1 ≤ 1}, and let D denote the real dimension of V. Let κ > 0, and for every n ∈ N, let I(n) be a finite index set such that |I(n)| ≤ (1 + 2e^{nκ})^D and δn := sup_{i∈I} inf_{j∈I(n)} ‖Wi − Wj‖ ≤ e^{−nκ}. The existence of such index sets is guaranteed by lemma 4.2. Let p ∈ Pf(S(Hin)) be such that inf_i χ(Wi, p) > 0, and for every n ∈ N, let Cn be a code as in Theorem 4.18, with I(n) in place of I. It is easy to see that pe({Wi⊗n}i∈I(n), Cn) ≥ pe({Wi⊗n}i∈I, Cn) − nδn, and hence we have
$$p_e\big(\{W_i^{\otimes n}\}_{i\in I},\mathcal C_n\big)\le 8|I(n)|^2\exp\Big(n(\alpha-1)\Big[\alpha\inf_{i\in I}\chi_\alpha^{(\mathrm{old})}(W_i,p)-R-(\alpha-1)\log\dim\mathcal H\Big]\Big)+ne^{-n\kappa}.$$
Let 0 < R < inf_{i∈I} χα^(old)(Wi, p). By the same argument as in the proof of Corollary 4.19, there exists an α ∈ (0, 1) such that ϕ := α inf_{i∈I} χα^(old)(Wi, p) − R − (α − 1) log dim(H) > 0. Choosing then κ such that 2κD/(1 − α) < ϕ, we see that the error probability goes to zero exponentially fast, while the rate is at least R. This shows that C({Wi⊗n : i ∈ I}n∈N) ≥ inf_i χ(Wi, p), and taking the supremum over p yields the assertion.
Acknowledgment The author is grateful to Professor Fumio Hiai and Nilanjana Datta for discussions. This research was supported by a Marie Curie International Incoming Fellowship within the 7th European Community Framework Programme. The author also acknowledges support by the European Research Council (Advanced Grant “IRQUAT”). Part of this work was done when the author was a Marie Curie research fellow at the School of Mathematics, University of Bristol.
References

[1] T. Ando: Concavity of certain maps on positive definite matrices and applications to Hadamard products; Linear Algebra Appl. 26, 203–241, (1979)

[2] T. Ando, F. Hiai: Operator log-convex functions and operator means; Math. Annalen

[3] H. Araki: On an inequality of Lieb and Thirring; Letters in Mathematical Physics, Volume 19, Issue 2, pp. 167–170, (1990)

[4] K.M.R. Audenaert, J. Calsamiglia, Ll. Masanes, R. Munoz-Tapia, A. Acin, E. Bagan, F. Verstraete: Discriminating states: the quantum Chernoff bound; Phys. Rev. Lett. 98, 160501, (2007)

[5] K.M.R. Audenaert, M. Nussbaum, A. Szkoła, F. Verstraete: Asymptotic error rates in quantum hypothesis testing; Commun. Math. Phys. 279, 251–283, (2008)

[6] K.M.R. Audenaert: On the Araki-Lieb-Thirring inequality; Int. J. of Information and Systems Sciences 4, pp. 78–83, (2008)

[7] Koenraad M.R. Audenaert, Nilanjana Datta: α-z-relative Rényi entropies; arXiv:1310.7178, (2013)

[8] Salman Beigi: Quantum Rényi divergence satisfies data processing inequality; J. Math. Phys. 54, 122202, (2013)

[9] Igor Bjelakovic, Jean-Dominique Deuschel, Tyll Krüger, Ruedi Seiler, Rainer Siegmund-Schultze, Arleta Szkoła: A quantum version of Sanov's theorem; Communications in Mathematical Physics, Vol. 260, Issue 3, pp. 659–671, (2005)
[10] Igor Bjelakovic, Jean-Dominique Deuschel, Tyll Krüger, Ruedi Seiler, Rainer Siegmund-Schultze, Arleta Szkoła: Typical support and Sanov large deviations of correlated states; Communications in Mathematical Physics, Vol. 279, pp. 559–584, (2008)

[11] I. Bjelakovic, H. Boche: Classical capacities of compound and averaged quantum channels; IEEE Trans. Inform. Theory 55, 3360–3374, (2009)

[12] E.A. Carlen, E.H. Lieb: A Minkowski type trace inequality and strong subadditivity of entropy; Amer. Math. Soc. Transl. Ser. 2, 189, 59–68, (1999)

[13] Eric A. Carlen, Elliot H. Lieb: A Minkowski type inequality and strong subadditivity of quantum entropy II: convexity and concavity; Lett. Math. Phys. 83, pp. 107–126, (2008)

[14] I. Csiszár: Generalized cutoff rates and Rényi's information measures; IEEE Trans. Inf. Theory 41, 26–34, (1995)

[15] N. Datta, T.C. Dorlas: The Coding Theorem for a Class of Quantum Channels with Long-Term Memory; Journal of Physics A: Mathematical and Theoretical, vol. 40, p. 8147, (2007)

[16] N. Datta: Min- and Max-Relative Entropies and a New Entanglement Monotone; IEEE Transactions on Information Theory, vol. 55, no. 6, pp. 2816–2826, (2009)

[17] Nilanjana Datta, Felix Leditzky: A limit of the quantum Rényi divergence; J. Phys. A: Math. Theor. 47, 045304, (2014)

[18] Rupert L. Frank, Elliott H. Lieb: Monotonicity of a relative Rényi entropy; arXiv:1306.5358, (2013)

[19] M. Hayashi, H. Nagaoka: General Formulas for Capacity of Classical-Quantum Channels; IEEE Trans. Inf. Theory 49, (2003)

[20] M. Hayashi: Quantum Information Theory: An Introduction; Springer, (2006)

[21] M. Hayashi: Error exponent in asymmetric quantum hypothesis testing and its application to classical-quantum channel coding; Phys. Rev. A 76, 062301, (2007)

[22] F. Hiai, D. Petz: The proper formula for relative entropy and its asymptotics in quantum probability; Comm. Math. Phys. 143, 99–114, (1991)

[23] F. Hiai, M. Mosonyi, T. Ogawa: Error exponents in hypothesis testing for correlated states on a spin chain; J. Math. Phys. 49, 032112, (2008)

[24] F. Hiai, M. Mosonyi, M. Hayashi: Quantum hypothesis testing with group symmetry; J. Math. Phys. 50, 103304, (2009)

[25] F. Hiai: Matrix Analysis: Matrix Monotone Functions, Matrix Means, and Majorization (GSIS selected lectures); Interdisciplinary Information Sciences 16, 139–248, (2010)
[26] F. Hiai: Concavity of certain matrix trace and norm functions; Linear Algebra and Appl. 439, 1568–1589, (2013)

[27] A.S. Holevo: The capacity of the quantum channel with general signal states; IEEE Transactions on Information Theory, vol. 44, no. 1, pp. 269–273, (1998)

[28] V. Jaksic, Y. Ogata, C.-A. Pillet, R. Seiringer: Quantum hypothesis testing and non-equilibrium statistical mechanics; Rev. Math. Phys. 24, no. 6, 1230002, (2012)

[29] R. Jozsa, B. Schumacher: A new proof of the quantum noiseless coding theorem; Journal of Modern Optics, Volume 41, Issue 12, (1994)

[30] R. Jozsa, M. Horodecki, P. Horodecki, R. Horodecki: Universal quantum information compression; Phys. Rev. Lett. 81, 1714–1717, (1998)

[31] R. König, S. Wehner: A strong converse for classical channel coding using entangled inputs; Physical Review Letters, vol. 103, no. 7, 070504, (2009)

[32] E.H. Lieb: Convex trace functions and the Wigner-Yanase-Dyson conjecture; Adv. Math. 11, 267–288, (1973)

[33] E.H. Lieb, W. Thirring: Studies in Mathematical Physics; pp. 269–297, Princeton University Press, Princeton, (1976)

[34] Vitali D. Milman, Gideon Schechtman: Asymptotic Theory of Finite Dimensional Normed Spaces; Lecture Notes in Mathematics, Springer-Verlag Berlin Heidelberg, (1986)

[35] M. Mosonyi, F. Hiai, T. Ogawa, M. Fannes: Asymptotic distinguishability measures for shift-invariant quasi-free states of fermionic lattice systems; J. Math. Phys. 49, 072104, (2008)

[36] M. Mosonyi: Hypothesis testing for Gaussian states on bosonic lattices; J. Math. Phys. 50, 032104, (2009)

[37] M. Mosonyi, N. Datta: Generalized relative entropies and the capacity of classical-quantum channels; J. Math. Phys. 50, 072104, (2009)

[38] M. Mosonyi, F. Hiai: On the quantum Rényi relative entropies and related capacity formulas; IEEE Trans. Inf. Theory 57, 2474–2487, (2011)

[39] M. Mosonyi, T. Ogawa: Quantum hypothesis testing and the operational interpretation of the quantum Rényi relative entropies; arXiv:1308.3228, (2013)

[40] Martin Müller-Lennert, Frédéric Dupuis, Oleg Szehr, Serge Fehr, Marco Tomamichel: On quantum Rényi entropies: a new definition and some properties; J. Math. Phys. 54, 122203, (2013)

[41] H. Nagaoka: Strong converse theorems in quantum information theory; in "Asymptotic Theory of Quantum Statistical Inference", edited by M. Hayashi, World Scientific, (2005)
[42] H. Nagaoka: The converse part of the theorem for quantum Hoeffding bound; quant-ph/0611289

[43] M.A. Nielsen, I.L. Chuang: Quantum Computation and Quantum Information; Cambridge University Press, Cambridge, UK, (2000)

[44] J. Nötzel: Hypothesis testing on invariant subspaces of the symmetric group, part I: quantum Sanov's theorem and arbitrarily varying sources; arXiv:1310.5553, (2013)

[45] T. Ogawa, H. Nagaoka: Strong converse to the quantum channel coding theorem; IEEE Transactions on Information Theory, vol. 45, no. 7, pp. 2486–2489, (1999)

[46] T. Ogawa, H. Nagaoka: Strong converse and Stein's lemma in quantum hypothesis testing; IEEE Trans. Inform. Theory 47, 2428–2433, (2000)

[47] D. Petz: Quasi-entropies for finite quantum systems; Rep. Math. Phys. 23, 57–65, (1986)

[48] R. Renner: Security of Quantum Key Distribution; PhD dissertation, Swiss Federal Institute of Technology Zurich, Diss. ETH No. 16242, (2005)

[49] A. Rényi: On measures of entropy and information; Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I, pp. 547–561, Univ. California Press, Berkeley, California, (1961)

[50] B. Schumacher: Quantum coding; Phys. Rev. A 51, 2738, (1995)

[51] B. Schumacher, M. Westmoreland: Sending classical information via noisy quantum channels; Physical Review A, vol. 56, no. 1, pp. 131–138, (1997)

[52] R. Sibson: Information radius; Z. Wahrscheinlichkeitsth. Verw. Gebiete 14, 149–161, (1969)

[53] M. Tomamichel, R. Colbeck, R. Renner: A fully quantum asymptotic equipartition property; IEEE Trans. Inform. Theory 55, 5840–5847, (2009)

[54] M. Tomamichel: A framework for non-asymptotic quantum information theory; PhD thesis, ETH Zürich, (2012)

[55] H. Umegaki: Conditional expectation in an operator algebra; Kodai Math. Sem. Rep. 14, 59–85, (1962)

[56] Mark M. Wilde, Andreas Winter, Dong Yang: Strong converse for the classical capacity of entanglement-breaking and Hadamard channels; arXiv:1306.1586, (2013)

[57] A. Winter: Coding theorem and strong converse for quantum channels; IEEE Transactions on Information Theory, vol. 45, no. 7, pp. 2481–2485, (1999)