Efficient Approximation of Channel Capacities
David Sutter (Quantum Information Theory Group, ETH Zürich, www.qit.ethz.ch)
Tobias Sutter, Peyman Mohajerin Esfahani, John Lygeros (Automatic Control Laboratory, ETH Zürich, www.control.ethz.ch)
Channel Capacity

[Block diagram: input X passes through the channel W, with transition law $P_{Y|X}$, to the output Y.]

Discrete memoryless channel (DMC): $W_{x,y} := W(y|x) = P[Y = y \mid X = x]$, where $\mathcal{X} = \{1, 2, \ldots, N\}$ and $\mathcal{Y} = \{1, 2, \ldots, M\}$.

Definition ([Shannon'48]). The capacity of a DMC is defined as $C := \max_{p \in \Delta_N} I(p, W)$.

Operational meaning: C is the maximum number of bits per channel use that can be transmitted reliably over the channel W.

Mutual information: $I(p, W) := \sum_{i=1}^{N} p_i \sum_{k=1}^{M} W_{i,k} \log \frac{W_{i,k}}{\sum_{\ell=1}^{N} p_\ell \, W_{\ell,k}}$
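The mutual information formula above is straightforward to evaluate numerically. A minimal Python/NumPy sketch (the function name and the binary-symmetric-channel example are illustrative, not from the talk); logarithms are base 2, so values are in bits:

```python
import numpy as np

def mutual_information(p, W):
    """I(p, W) = sum_i p_i sum_k W_ik log2( W_ik / (W^T p)_k )."""
    q = W.T @ p                                   # induced output distribution
    terms = np.where(W > 0, W * np.log2(np.where(W > 0, W / q, 1.0)), 0.0)
    return float(p @ terms.sum(axis=1))

# Binary symmetric channel with crossover probability 0.1 and uniform input:
W = np.array([[0.9, 0.1], [0.1, 0.9]])
p = np.array([0.5, 0.5])
h2 = lambda x: -x * np.log2(x) - (1 - x) * np.log2(1 - x)
print(mutual_information(p, W))   # equals 1 - h2(0.1), about 0.531 bits
```

For a symmetric channel the uniform input is optimal, so this value is already the capacity C.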
Channel Capacity (cont'd)

[Block diagram: input X passes through the channel W, with transition law $P_{Y|X}$, to the output Y.]

Capacity of W under an additional input cost constraint $E[s(X)] \le S$:

$P: \quad C_S = \max_p \; I(p, W) \quad \text{s.t.} \quad s^T p \le S, \; p \in \Delta_N$

- Mutual information $I(p, W) = \sum_{i=1}^{N} p_i \sum_{k=1}^{M} W_{i,k} \log \frac{W_{i,k}}{\sum_{\ell=1}^{N} p_\ell \, W_{\ell,k}}$
- finite-dimensional convex optimization problem
- non-smooth objective function
State of the Art

Blahut-Arimoto algorithm: [Blahut'72] and [Arimoto'72]
- iterative method based on the primal problem
- specifically tuned for this problem
- difficulties for large input alphabet sizes N

Geometric programming and duality: [Chiang & Boyd'04]
- showed that the dual program can be recast as a geometric program
- generates analytical upper and lower bounds on the capacity
- points out a connection between the dual programs of the capacity and the rate-distortion problem

Cutting-plane algorithm: [Huang & Meyn'05]
- iteratively approximates the mutual information by linear functionals
- solves an LP in each iteration step

Many more: Lapidoth, Ben-Tal, Lasserre, ...
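For reference, the Blahut-Arimoto iteration is only a few lines. A minimal sketch (no input-cost constraint; natural logs are used internally; the channel matrix and iteration count are illustrative):

```python
import numpy as np

def blahut_arimoto(W, iters=500):
    """Alternating maximization for C = max_p I(p, W), no cost constraint.
    Update: q = W^T p, then p_i is rescaled by exp( D(W_i || q) ),
    the KL divergence of channel row i from the current output distribution."""
    N, M = W.shape
    p = np.full(N, 1.0 / N)
    for _ in range(iters):
        q = W.T @ p
        D = np.sum(W * np.log(W / q), axis=1)   # assumes W > 0 entrywise
        p *= np.exp(D)
        p /= p.sum()
    return p

W = np.array([[0.9, 0.1], [0.1, 0.9]])
p_star = blahut_arimoto(W)
print(p_star)   # for this symmetric channel: the uniform distribution
```

Each iteration costs a couple of matrix-vector products, which is why the method struggles when N is very large.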
Outline

Analytical results:
- reformulate the problem (introduce an additional decision variable)
- dualize the problem (closed-form Lagrange dual function)
- smoothing (entropy maximization)
- fast gradient method (a priori and a posteriori errors)

Numerical results
Equivalent Primal Reformulation

Idea: introduce the output distribution $q \in \Delta_M$ as an additional decision variable.

Assumption (w.l.o.g.): $S \le S_{\max}$

Lemma. The channel capacity problem P is equivalent to

$C_S = \max_{p,q} \; -r^T p + H(q) \quad \text{s.t.} \quad W^T p = q, \; s^T p = S, \; p \in \Delta_N, \; q \in \Delta_M,$

where $r_i := -\sum_{j=1}^{M} W_{ij} \log(W_{ij})$.
Dual Capacity Problem

Its dual program (with strong duality) is

$D: \quad C_S = \min_\lambda \; \{ G(\lambda) + F(\lambda) : \lambda \in \mathbb{R}^M \}$   (feasible set non-compact)

with the Lagrange dual function given by

$G(\lambda) = \max_p \; \{ -r^T p + \lambda^T W^T p : s^T p = S, \; p \in \Delta_N \}$   (non-smooth)

$F(\lambda) = \max_q \; \{ H(q) - \lambda^T q : q \in \Delta_M \}$   (smooth entropy maximization, with closed form $F(\lambda) = \log \sum_{i=1}^{M} 2^{-\lambda_i}$)
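The closed form of F can be checked directly: the maximizer of $H(q) - \lambda^T q$ over the simplex is the Gibbs distribution $q_i \propto 2^{-\lambda_i}$, and substituting it back gives $\log_2 \sum_i 2^{-\lambda_i}$. A small numerical sanity check (Python/NumPy, arbitrary $\lambda$; base-2 logs throughout):

```python
import numpy as np

def F_closed_form(lam):
    # F(lam) = log2( sum_i 2^{-lam_i} )
    return float(np.log2(np.sum(2.0 ** (-lam))))

def F_by_maximization(lam):
    # the entropy maximizer is q_i proportional to 2^{-lam_i};
    # plugging it into H(q) - lam^T q recovers the closed form
    q = 2.0 ** (-lam)
    q /= q.sum()
    return float(-np.sum(q * np.log2(q)) - lam @ q)

lam = np.array([0.3, -1.2, 0.7, 2.0])
print(F_closed_form(lam), F_by_maximization(lam))   # the two values agree
```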
Bounding Dual Variables

Assumption (Non-singular channel matrix W). $\gamma := \min_{i,j} W_{ij} > 0$

Since the mutual information is continuous in W, a singular entry can be handled by perturbing it.

Proposition. Under the above assumption, the dual program D is equivalent to

$C_S = \min_\lambda \; \{ G(\lambda) + F(\lambda) : \lambda \in Q \},$

where $Q := \{ \lambda \in \mathbb{R}^M : \|\lambda\|_2 \le M \log(\gamma^{-1}) \}$.
Smoothing Step

Consider a smoothing parameter $\nu > 0$ and the program

$G_\nu(\lambda) = \max_p \; \{ \lambda^T W^T p - r^T p + \nu H(p) - \nu \log(N) : s^T p = S, \; p \in \Delta_N \},$

which is a modified entropy maximization and has the solution [Cover'06]

$p_\nu(\lambda)_i = 2^{\mu_1 + \frac{1}{\nu}(W\lambda - r)_i + \mu_2 s_i},$

where $\mu_1, \mu_2 \in \mathbb{R}$ are such that $s^T p_\nu(\lambda) = S$ and $p_\nu(\lambda) \in \Delta_N$.

Uniform approximation: $G_\nu(\lambda) \le G(\lambda) \le G_\nu(\lambda) + \nu \log(N)$

Lemma ([Nesterov'05]). The gradient $\nabla G_\nu(\lambda) = W^T p_\nu(\lambda)$ is Lipschitz continuous with constant $L_\nu = \frac{1}{\nu}$.
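Without the cost constraint, G is a maximum over the vertices of the simplex and $G_\nu$ is its entropy-smoothed (soft-max) version, which makes the sandwich $G_\nu \le G \le G_\nu + \nu \log N$ easy to verify numerically. A sketch under that simplification (Python/NumPy; the random channel and the value of $\nu$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, nu = 50, 20, 0.1
W = rng.random((N, M))
W /= W.sum(axis=1, keepdims=True)                 # make W row-stochastic
r = -np.sum(W * np.log2(W), axis=1)
lam = rng.normal(size=M)

c = W @ lam - r
G = c.max()                                       # G(lam): max of a linear function over the simplex
# G_nu(lam): entropy-regularized max, shifted by -nu*log2(N) so that G_nu <= G
G_nu = nu * np.log2(np.sum(2.0 ** (c / nu))) - nu * np.log2(N)

print(G_nu <= G <= G_nu + nu * np.log2(N))        # True: the uniform approximation holds
```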
Smoothing Step (cont'd)

Consider the smooth convex optimization problem over a compact set

$D_\nu: \quad \min_\lambda \; F(\lambda) + G_\nu(\lambda) \quad \text{s.t.} \quad \lambda \in Q$

Algorithm (optimal scheme for smooth optimization [Nesterov'05]), where $\pi_Q$ denotes the projection operator onto Q.

For $k \ge 0$ do
- Step 1: compute $\nabla F(\lambda_k) + \nabla G_\nu(\lambda_k)$
- Step 2: $y_k = \pi_Q\!\left( \lambda_k - \frac{1}{L_\nu}\big(\nabla F(\lambda_k) + \nabla G_\nu(\lambda_k)\big) \right)$
- Step 3: $z_k = \pi_Q\!\left( -\frac{1}{L_\nu} \sum_{i=0}^{k} \frac{i+1}{2}\big(\nabla F(\lambda_i) + \nabla G_\nu(\lambda_i)\big) \right)$
- Step 4: $\lambda_{k+1} = \frac{2}{k+3}\, z_k + \frac{k+1}{k+3}\, y_k$
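Putting the pieces together, the scheme above can be sketched end to end in Python/NumPy for the simplified case without an input-cost constraint, so that the maximizers of F and $G_\nu$ are explicit soft-max distributions (function names and the test channel are illustrative; the smoothing level follows the error-bound slide, and the return value is the lower bound $C_{LB} = I(\hat{p}, W)$):

```python
import numpy as np

def capacity_fast_gradient(W, n=3000):
    """Smoothed fast-gradient sketch for C = max_p I(p, W), no input-cost
    constraint. Assumes W is row-stochastic with strictly positive entries."""
    N, M = W.shape
    r = -np.sum(W * np.log2(W), axis=1)
    R = M * np.log2(1.0 / W.min())            # radius of the ball Q
    D1, D2 = 0.5 * R ** 2, np.log2(N)
    nu = 2.0 / (n + 1) * np.sqrt(D1 / D2)     # smoothing level from the error bound
    L = 1.0 / nu + 1.0                        # Lipschitz const. of grad G_nu, with a margin for grad F

    def proj(v):                              # Euclidean projection onto Q
        s = np.linalg.norm(v)
        return v if s <= R else v * (R / s)

    def softmax2(c):                          # base-2 soft-max, numerically stabilized
        e = 2.0 ** (c - c.max())
        return e / e.sum()

    lam, g_acc, p_hat = np.zeros(M), np.zeros(M), np.zeros(N)
    for k in range(n + 1):
        q = softmax2(-lam)                    # maximizer of F(lam) = log2 sum_i 2^{-lam_i}
        p = softmax2((W @ lam - r) / nu)      # p_nu(lam), maximizer of G_nu
        g = W.T @ p - q                       # grad (G_nu + F)(lam)
        p_hat += 2.0 * (k + 1) / ((n + 1) * (n + 2)) * p   # averaged primal iterate
        y = proj(lam - g / L)                                # Step 2
        g_acc += (k + 1) / 2.0 * g
        z = proj(-g_acc / L)                                 # Step 3
        lam = 2.0 / (k + 3) * z + (k + 1.0) / (k + 3) * y    # Step 4
    q = W.T @ p_hat
    return float(np.sum(p_hat[:, None] * W * np.log2(W / q)))  # C_LB = I(p_hat, W)

W = np.array([[0.95, 0.05], [0.3, 0.7]])
print(capacity_fast_gradient(W))   # a lower bound on C; for this channel C is about 0.373 bits
```

Since $\hat{p}$ is always feasible, the returned mutual information never exceeds the true capacity; the a priori bound of the next slide controls how far below it can be.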
Error Bound

Theorem ([Nesterov'05]). Consider the parameters

$D_1 := \tfrac{1}{2}\big(M \log(\gamma^{-1})\big)^2, \qquad D_2 := \log(N), \qquad \nu = \frac{2}{n+1}\sqrt{\frac{D_1}{D_2}}.$

Then after n iterations of the algorithm we can generate

$\hat{\lambda} = y_n \in Q, \qquad \hat{p} = \sum_{i=0}^{n} \frac{2(i+1)}{(n+1)(n+2)}\, p_\nu(\lambda_i) \in \Delta_N,$

such that

$0 \le F(\hat{\lambda}) + G(\hat{\lambda}) - I(\hat{p}, W) \le \frac{4}{n+1}\sqrt{D_1 D_2} + \frac{4 D_1}{(n+1)^2}.$

The right-hand side is the a priori error; $C_{UB} := F(\hat{\lambda}) + G(\hat{\lambda})$ and $C_{LB} := I(\hat{p}, W)$ are computable upper and lower bounds on $C_S$, and their difference is the a posteriori error.
Computational Complexity

N = input alphabet size, M = output alphabet size

                                     Blahut-Arimoto                      New method
Complexity of one iteration step     $O(M N^2)$                          $O(M N)$
Complexity for an ε-solution         $O(M N^2 \log(N)/\varepsilon)$      $O(M^2 N \sqrt{\log N}/\varepsilon)$
Simulation Results: Example 1

Consider a DMC having a channel matrix $W \in \mathbb{R}^{N \times M}$ with N = 10000 and M = 100, whose entries are chosen i.i.d. uniformly distributed in [0, 1].

[Figure: log-log plot of running time [s] against approximation error, comparing the Blahut-Arimoto algorithm with the a priori and a posteriori error bounds of the new method.]
Simulation Results: Example 2

Consider a DMC having a channel matrix $W \in \mathbb{R}^{N \times M}$ with N = 10^5 and M = 10, whose entries are chosen i.i.d. uniformly distributed in [0, 1].

Table: Example 2

A priori error        1        0.1       0.01        0.001
C_UB(W)               1.0938   1.0543    1.0516      1.0514
C_LB(W)               1.0101   1.0512    1.0514      1.0514
A posteriori error    0.0837   0.0031    2.5·10^-4   2.9·10^-5
Time [s]              23       227       1833        17 529
Iterations            1249     12 329    123 132     1 231 160

Blahut-Arimoto algorithm: single iteration
Continuous Input, Countable Output

Input alphabet $\mathcal{X} \subseteq \mathbb{R}$, output alphabet $\mathcal{Y} = \mathbb{N}_0$ (e.g. the discrete-time Poisson channel).

Peak-power constraint: $A \subseteq \mathcal{X}$ compact, $P[X \in A] = 1$
Average-power constraint: s a continuous function

Channel capacity:

$C_{A,S}(W) = \sup_p \; I(p, W) \quad \text{s.t.} \quad E[s(X)] \le S, \; p \in \mathcal{D}(A),$

where $\mathcal{D}(A)$ is the space of probability densities on A.
Truncation

$W_M(i|x) := \begin{cases} W(i|x) + \frac{1}{M}\sum_{j \ge M} W(j|x), & i \in \{0, 1, \ldots, M-1\} \\ 0, & i \ge M \end{cases}$

The channel $W_M$ has input alphabet $\mathcal{X}$ and output alphabet $\{0, 1, \ldots, M-1\}$.

[Figure: pictorial representation of the M-truncated channel counterpart: the tail mass $\sum_{i \ge M} W(i|x)$ of $W(\cdot|x)$ on $\mathbb{N}_0$ is spread uniformly over the M outputs of $W_M(\cdot|x)$.]
Truncation (cont'd)

$W_M(i|x) := \begin{cases} W(i|x) + \frac{1}{M}\sum_{j \ge M} W(j|x), & i \in \{0, 1, \ldots, M-1\} \\ 0, & i \ge M \end{cases}$

The channel $W_M$ has input alphabet $\mathcal{X}$ and output alphabet $\{0, 1, \ldots, M-1\}$.

Assumption (Tail decay). For $M \in \mathbb{N}_0$ and $k \in (0, 1)$,

$R_k(M) := \sum_{i \ge M} \Big( \sup_{x \in \mathcal{X}} W(i|x) \Big)^{k} \longrightarrow 0 \quad \text{as } M \to \infty.$
Bounding Dual Variables

Assumption (Non-singular channel $W_M$). $\gamma_M := \min_{y \in \{0,1,\ldots,M-1\}} \; \min_{x \in A} W_M(y|x) > 0$

In case $\sum_{j \ge M} W(j|x) > 0$ for all x, a lower bound is given by $\gamma_M \ge \frac{1}{M} \min_x \sum_{j \ge M} W(j|x)$.

Proposition. Under the above assumption, the dual program D is equivalent to

$C_{A,S}(W_M) = \min_\lambda \; \{ G(\lambda) + F(\lambda) : \lambda \in Q \},$

where $Q := \{ \lambda \in \mathbb{R}^M : \|\lambda\|_2 \le M \log(\gamma_M^{-1}) \}$.
Smoothing Step

For $\nu > 0$, consider

$G_\nu(\lambda) = \sup_p \; \{ \langle p, W\lambda \rangle - \langle p, r \rangle + \nu h(p) - \nu \log(\rho) : \langle p, s \rangle = S, \; p \in \mathcal{D}(A) \},$

which is a modified entropy maximization and has the solution [Cover'06]

$p_\nu^\lambda(x) = 2^{\mu_1 + \frac{1}{\nu}(W\lambda(x) - r(x)) + \mu_2 s(x)}, \quad x \in A,$

where $\mu_1, \mu_2 \in \mathbb{R}$ are such that $\langle p_\nu^\lambda, s \rangle = S$ and $p_\nu^\lambda \in \mathcal{D}(A)$.

Uniform approximation: $G_\nu(\lambda) \le G(\lambda) \le G_\nu(\lambda) + \iota(\nu)$, where $\lim_{\nu \to 0} \iota(\nu) = 0$.

Lemma. The gradient $\nabla G_\nu(\lambda) = W^* p_\nu^\lambda$ is Lipschitz continuous with constant $L_\nu = \frac{1}{\nu}$.
Error Bound

Theorem. Consider the parameters $\alpha = \dagger$, $\nu = \frac{\varepsilon/\alpha}{\log(\alpha/\varepsilon)}$ and $n \ge \frac{1}{\varepsilon}\sqrt{8 D_1 \alpha \big( \log(\varepsilon^{-1}) + \log(\alpha) \big)} + \frac{1}{4}$. Then after n iterations of the algorithm we can generate

$\hat{\lambda} = y_n \in Q \quad \text{and} \quad \hat{p} = \sum_{k=0}^{n} \frac{2(k+1)}{(n+1)(n+2)}\, p_\nu^{\lambda_k} \in \mathcal{D}(A)$

such that

$0 \le F(\hat{\lambda}) + G(\hat{\lambda}) - I(\hat{p}, W) \le \varepsilon.$

As before, $C_{UB} := F(\hat{\lambda}) + G(\hat{\lambda})$ and $C_{LB} := I(\hat{p}, W)$ are the a posteriori upper and lower bounds.
Discrete-Time Poisson Channel

$W(y|x) = e^{-(x+\eta)}\, \frac{(x+\eta)^y}{y!}, \qquad y \in \mathbb{N}_0, \; x \in \mathbb{R}_{\ge 0}$

Important example to model optical communication systems [Shamai'90].

Peak-power constraint: $P[X \in [0, A]] = 1$

No analytic expression is known for the capacity of the Poisson channel with a peak-power constraint.
- an analytical lower bound is available

The channel has a fast-decaying tail, so output-alphabet truncation is possible.
Discrete-Time Poisson Channel (cont'd)

[Figure: capacity [bits per channel use] versus peak power A [dB], from 0 to 12 dB: the computed upper and lower bounds, together with the analytical lower bound of [Lapidoth'09].]
Outlook

Technical report: on arXiv soon

Capacity of a classical-quantum channel
- more precise modelling of optical channels should include quantum-mechanical effects

Capacity of a random channel

Approximate dynamic programming
- topic of my next coffee talk? :)
Acknowledgements and Questions

Special thanks to Stefan Richter and Prof. Renato Renner.

Questions?
Entropy Maximization

Consider the optimization problem

$\sup_p \; h(p) + \langle p, c \rangle \quad \text{s.t.} \quad \langle p, s \rangle = S, \; p \in \mathcal{D}(A), \qquad (1)$

with $c, s \in L^\infty(A)$.

Lemma ([Boltzmann, 1877]). Let $p^\star_\mu(x) = 2^{\mu_1 + c(x) + \mu_2 s(x)}$, where $\mu_1, \mu_2 \in \mathbb{R}$ are chosen such that $p^\star_\mu$ satisfies the constraints in (1). Then $p^\star_\mu$ uniquely solves (1).

- straightforward extension to multiple moment constraints
- find the $\mu_i$ using semidefinite programming [Lasserre, 2009]
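A discrete analogue of the lemma is easy to play with: on a finite alphabet the solution is a Gibbs distribution $p_i \propto 2^{c_i + \mu s_i}$ (normalization playing the role of $\mu_1$), and the single moment multiplier $\mu$ can be found by bisection, since $\langle p, s \rangle$ is monotone in $\mu$. A sketch (Python; the vectors c, s and the target S are illustrative):

```python
import numpy as np

def maxent(c, s, S, lo=-100.0, hi=100.0):
    """Maximize H(p) + c^T p subject to s^T p = S on the simplex.
    Solution: p_i proportional to 2^{c_i + mu*s_i}; mu found by bisection,
    since the moment s^T p increases monotonically in mu."""
    def p_of(mu):
        e = c + mu * s
        w = 2.0 ** (e - e.max())    # stabilized Gibbs weights
        return w / w.sum()
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p_of(mid) @ s < S:
            lo = mid
        else:
            hi = mid
    return p_of(0.5 * (lo + hi))

c = np.zeros(4)
s = np.array([0.0, 1.0, 2.0, 3.0])
p = maxent(c, s, S=1.0)
print(p @ s)   # approximately 1.0: the moment constraint is met
```

With several moment constraints this one-dimensional search no longer suffices, which is where the semidefinite-programming approach of [Lasserre, 2009] comes in.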
Capacity of a Classical-Quantum (CQ) Channel

$\mathcal{D}(\mathcal{H}) := \{\rho \in \mathcal{H} \mid \mathrm{tr}[\rho] = 1, \; \rho \ge 0\}$ is the set of density operators on a Hilbert space $\mathcal{H}$.

CQ channel: $W : \mathcal{X} \to \mathcal{D}(\mathcal{H}), \; x \mapsto \rho_x$

von Neumann entropy: $H(\rho_x) := -\mathrm{tr}[\rho_x \log \rho_x]$

Capacity of a CQ channel:

$C_{cq} = \max_{P_X} \; H\!\Big(\sum_{x \in \mathcal{X}} P_X(x)\rho_x\Big) - \sum_{x \in \mathcal{X}} P_X(x) H(\rho_x) \quad \text{s.t.} \quad \sum_{x \in \mathcal{X}} s(x) P_X(x) \le S, \; P_X \in \Delta_N.$