Efficient Approximation of Channel Capacities

David Sutter Quantum Information Theory Group, ETH Zürich www.qit.ethz.ch

Tobias Sutter, Peyman Mohajerin Esfahani, John Lygeros

Automatic Control Laboratory, ETH Zürich www.control.ethz.ch


Channel Capacity

X → Channel W → Y, with transition law $P_{Y|X}$

Discrete memoryless channel (DMC):
$W_{x,y} := W(y|x) = \Pr[Y = y \mid X = x]$, where $\mathcal{X} = \{1, 2, \ldots, N\}$ and $\mathcal{Y} = \{1, 2, \ldots, M\}$.

Definition ([Shannon'48]). The capacity of a DMC is
$$C := \max_{p \in \Delta_N} I(p, W).$$

Operational meaning: the capacity $C$ is the maximum number of bits we can transmit reliably over the channel $W$.

Mutual information:
$$I(p, W) := \sum_{i=1}^{N} p_i \sum_{k=1}^{M} W_{i,k} \log \frac{W_{i,k}}{\sum_{\ell=1}^{N} p_\ell W_{\ell,k}}$$
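As a sanity check on the definitions above, the mutual information can be evaluated directly with NumPy; a minimal sketch (base-2 logarithms, with the $0 \cdot \log 0 := 0$ convention, and `W` rows holding the conditional distributions $W(\cdot|x)$):

```python
import numpy as np

def mutual_information(p, W):
    """I(p, W) in bits for an input distribution p (shape (N,))
    and a channel matrix W (shape (N, M), rows summing to 1)."""
    q = p @ W  # induced output distribution, shape (M,)
    # guard the 0 * log(0) terms
    log_ratio = np.where(W > 0, np.log2(np.where(W > 0, W / q, 1.0)), 0.0)
    return float(np.sum(p[:, None] * W * log_ratio))
```

For the binary symmetric channel with crossover 0.1 and uniform input this evaluates to $1 - H_2(0.1) \approx 0.531$ bits, the BSC capacity.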

Channel Capacity (cont'd)

X → Channel W → Y, with transition law $P_{Y|X}$

Capacity of $W$ under an additional input cost constraint $\mathbb{E}[s(X)] \le S$:
$$P: \quad C_S = \max_{p} \left\{ I(p, W) : s^T p \le S,\; p \in \Delta_N \right\}$$

- Mutual information $I(p, W) = \sum_{i=1}^{N} p_i \sum_{k=1}^{M} W_{i,k} \log \frac{W_{i,k}}{\sum_{\ell=1}^{N} p_\ell W_{\ell,k}}$
- finite-dimensional convex optimization problem
- non-smooth objective function


State of the Art

Blahut-Arimoto algorithm: [Blahut'72] and [Arimoto'72]
- Iterative method based on the primal problem
- Specifically tuned for this problem
- Difficulties for large input alphabet sizes N
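For reference, the Blahut-Arimoto iteration is compact enough to sketch in full; this version assumes no input-cost constraint (the update below is the classical multiplicative one, not the new method presented later):

```python
import numpy as np

def blahut_arimoto(W, iters=1000):
    """Classical Blahut-Arimoto iteration for C = max_p I(p, W),
    with W of shape (N, M) and rows summing to 1 (no cost constraint)."""
    N, M = W.shape
    p = np.full(N, 1.0 / N)

    def divergences(p):
        # D_i = D( W(.|i) || q ) in bits, with q the induced output law
        q = p @ W
        log_ratio = np.where(W > 0, np.log2(np.where(W > 0, W / q, 1.0)), 0.0)
        return np.sum(W * log_ratio, axis=1)

    for _ in range(iters):
        D = divergences(p)
        p = p * np.exp2(D)   # multiplicative update p_i <- p_i 2^{D_i} / Z
        p /= p.sum()
    return p, float(p @ divergences(p))  # (optimizer, I(p, W))
```

Since $I(p, W) = \sum_i p_i D_i$, the returned value is the capacity estimate; the classical guarantee is $C - I_n \le \log(N)/n$, which is the $O(1/n)$ behaviour that becomes painful for large $N$.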

Geometric programming and duality: [Chiang & Boyd'04]
- Showed that the dual program can be recast as a geometric program
- Generates analytical upper and lower bounds on the capacity
- Points out a connection between the dual programs of the capacity and rate-distortion problems

Cutting-plane algorithm: [Huang & Meyn'05]
- Iteratively approximates the mutual information by linear functionals
- Solves an LP in each iteration step

Many more: Lapidoth, Ben-Tal, Lasserre, ...

Outline

Analytical part:
1. Reformulate the problem: introduce an additional decision variable
2. Dualize the problem: closed-form Lagrange dual function
3. Smoothing: entropy maximization
4. Fast gradient method: a priori & a posteriori errors

Numerical part: simulation results

Equivalent Primal Reformulation

Idea: introduce the output distribution $q \in \Delta_M$ as an additional decision variable.

Assumption (w.l.o.g.): $S \le S_{\max}$

Lemma. The channel capacity problem $P$ is equivalent to
$$C_S = \max_{p, q} \left\{ -r^T p + H(q) : W^T p = q,\; s^T p = S,\; p \in \Delta_N,\; q \in \Delta_M \right\},$$
where $r_i := -\sum_{j=1}^{M} W_{ij} \log(W_{ij})$.


Dual Capacity Problem

The dual program (with strong duality) is
$$D: \quad C_S = \min_{\lambda} \left\{ G(\lambda) + F(\lambda) : \lambda \in \mathbb{R}^M \right\},$$
where the feasible set is non-compact, and the Lagrange dual function splits into
$$G(\lambda) = \max_{p} \left\{ -r^T p + \lambda^T W^T p : s^T p = S,\; p \in \Delta_N \right\} \quad \text{(non-smooth)}$$
$$F(\lambda) = \max_{q} \left\{ H(q) - \lambda^T q : q \in \Delta_M \right\} \quad \text{(smooth entropy maximization)}$$
The second term has the closed form $F(\lambda) = \log\left( \sum_{i=1}^{M} 2^{-\lambda_i} \right)$.
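The closed form of $F$ can be checked numerically: the maximizer of $H(q) - \lambda^T q$ over the simplex is $q^*_i \propto 2^{-\lambda_i}$, and plugging it in gives exactly $\log_2\left(\sum_i 2^{-\lambda_i}\right)$. A small sketch:

```python
import numpy as np

def F(lam):
    """Closed-form dual term: F(lambda) = log2( sum_i 2^{-lambda_i} ),
    the optimal value of max_q { H(q) - lambda^T q : q in Delta_M }."""
    return float(np.log2(np.sum(np.exp2(-lam))))

def entropy2(q):
    """Shannon entropy in bits, with the 0 * log(0) = 0 convention."""
    return float(-np.sum(np.where(q > 0, q * np.log2(np.where(q > 0, q, 1.0)), 0.0)))

def argmax_q(lam):
    """The maximizing output distribution q* with q*_i proportional to 2^{-lambda_i}."""
    q = np.exp2(-lam - np.max(-lam))  # shift for numerical stability
    return q / q.sum()
```

Any other feasible $q$ attains a strictly smaller objective value, which is what makes this subproblem "free" inside the dual.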


Bounding Dual Variables

Assumption (Non-singular channel matrix W): $\gamma := \min_{i,j} W_{ij} > 0$

Since the mutual information is continuous in $W$, a singular entry can be handled by perturbing it slightly.

Proposition. Under the above assumption, the dual program $D$ is equivalent to
$$C_S = \min_{\lambda} \left\{ G(\lambda) + F(\lambda) : \lambda \in Q \right\},$$
where $Q := \left\{ \lambda \in \mathbb{R}^M : \|\lambda\|_2 \le M \log(\gamma^{-1}) \right\}$.


Smoothing Step

Consider a smoothing parameter $\nu > 0$ and the program
$$G_\nu(\lambda) = \max_{p} \left\{ \lambda^T W^T p - r^T p + \nu H(p) - \nu \log(N) : s^T p = S,\; p \in \Delta_N \right\},$$
which is a modified entropy maximization and has the solution [Cover'06]
$$p_\nu(\lambda)_i = 2^{\mu_1 + \frac{1}{\nu}\left( \lambda^T W - r^T \right)_i + \mu_2 s_i},$$
where $\mu_1, \mu_2 \in \mathbb{R}$ are such that $s^T p_\nu(\lambda) = S$ and $p_\nu(\lambda) \in \Delta_N$.

Uniform approximation: $G_\nu(\lambda) \le G(\lambda) \le G_\nu(\lambda) + \nu \log(N)$

Lemma ([Nesterov'05]). The gradient $\nabla G_\nu(\lambda) = W^T p_\nu(\lambda)$ is Lipschitz continuous with constant $L_\nu = \frac{1}{\nu}$.
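If the input-cost constraint is dropped (so $\mu_2 = 0$; a simplifying assumption, not the slides' full setting), the smoothed maximizer is a plain Gibbs distribution and $G_\nu$ can be sketched directly; the uniform-approximation sandwich $G_\nu \le G \le G_\nu + \nu \log N$ is then easy to verify numerically:

```python
import numpy as np

def G_nu(lam, W, nu):
    """Smoothed dual term G_nu(lambda) for the capacity dual, sketched
    WITHOUT the input-cost constraint (mu2 = 0 is an assumption here);
    the maximizer then reduces to a Gibbs distribution in base 2."""
    N, _ = W.shape
    # r_i = -sum_j W_ij log2(W_ij), the per-input output entropies
    r = -np.sum(np.where(W > 0, W * np.log2(np.where(W > 0, W, 1.0)), 0.0), axis=1)
    score = (W @ lam - r) / nu
    p = np.exp2(score - score.max())      # p_nu(lambda)_i ∝ 2^{(W lam - r)_i / nu}
    p /= p.sum()
    H = -np.sum(np.where(p > 0, p * np.log2(np.where(p > 0, p, 1.0)), 0.0))
    val = float((W @ lam - r) @ p + nu * H - nu * np.log2(N))
    return val, p
```

Here $G(\lambda)$ without the cost constraint is just $\max_i (W\lambda - r)_i$, the maximum of a linear function over the simplex, which makes the sandwich directly checkable.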


Smoothing Step (cont'd)

Consider the smooth, convex optimization problem over a compact set
$$D_\nu: \quad \min_{\lambda} \left\{ F(\lambda) + G_\nu(\lambda) : \lambda \in Q \right\}.$$

Let $\pi_Q$ denote the projection operator onto $Q$.

Algorithm (optimal scheme for smooth optimization [Nesterov'05]). For $k \ge 0$ do
- Step 1: compute $\nabla F(\lambda_k) + \nabla G_\nu(\lambda_k)$
- Step 2: $y_k = \pi_Q\!\left( \lambda_k - \frac{1}{L_\nu}\left( \nabla F(\lambda_k) + \nabla G_\nu(\lambda_k) \right) \right)$
- Step 3: $z_k = \pi_Q\!\left( -\frac{1}{L_\nu} \sum_{i=0}^{k} \frac{i+1}{2}\left( \nabla F(\lambda_i) + \nabla G_\nu(\lambda_i) \right) \right)$
- Step 4: $\lambda_{k+1} = \frac{2}{k+3} z_k + \frac{k+1}{k+3} y_k$

Error Bound

Theorem ([Nesterov'05]). Consider the parameters
$$D_1 := \tfrac{1}{2}\left( M \log(\gamma^{-1}) \right)^2, \quad D_2 := \log(N), \quad \nu = \frac{2}{n+1}\sqrt{\frac{D_1}{D_2}}.$$
Then after $n$ iterations of the algorithm we can generate
$$\hat{\lambda} = y_n \in Q, \qquad \hat{p} = \sum_{i=0}^{n} \frac{2(i+1)}{(n+1)(n+2)}\, p_\nu(\lambda_i) \in \Delta_N,$$
which satisfy
$$0 \le F(\hat{\lambda}) + G(\hat{\lambda}) - I(\hat{p}, W) \le \frac{4}{n+1}\sqrt{D_1 D_2} + \frac{4 D_1}{(n+1)^2}.$$

The right-hand side is the a priori error bound; the computable gap $C_{UB} - C_{LB}$, with $C_{UB} := F(\hat{\lambda}) + G(\hat{\lambda})$ and $C_{LB} := I(\hat{p}, W)$, is the a posteriori error.
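The a priori bound is a closed-form function of $(N, M, \gamma, n)$, so the iteration count needed for a target accuracy can be read off directly; a small helper (the parameter values in the test call are illustrative, not from the slides):

```python
import numpy as np

def a_priori_error(N, M, gamma, n):
    """Evaluate the a priori bound 4*sqrt(D1*D2)/(n+1) + 4*D1/(n+1)^2,
    with D1 = (M * log2(1/gamma))^2 / 2 and D2 = log2(N)."""
    D1 = 0.5 * (M * np.log2(1.0 / gamma)) ** 2
    D2 = np.log2(N)
    return 4.0 * np.sqrt(D1 * D2) / (n + 1) + 4.0 * D1 / (n + 1) ** 2
```

For large $n$ the first term dominates, so the bound decays like $O(1/n)$: an $\varepsilon$-solution needs $O(1/\varepsilon)$ iterations, which is where the complexity figures on the next slide come from.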

Computational Complexity

N = input alphabet size, M = output alphabet size

                                     Blahut-Arimoto           New method
Complexity of one iteration step     O(M N^2)                 O(M N)
Complexity for an ε-solution         O(M N^2 log(N) / ε)      O(M^2 N sqrt(log(N)) / ε)

Simulation Results — Example 1

Consider a DMC having a channel matrix $W \in \mathbb{R}^{N \times M}$ with $N = 10000$ and $M = 100$, whose entries are chosen i.i.d. uniformly distributed in $[0, 1]$.

[Figure: log-log plot of running time [s] ($10^2$ to $10^4$) against approximation error ($10^{-1}$ to $10^{-3}$), comparing Blahut-Arimoto with the a priori and a posteriori errors of the new method.]

Simulation Results — Example 2

Consider a DMC having a channel matrix $W \in \mathbb{R}^{N \times M}$ with $N = 10^5$ and $M = 10$, whose entries are chosen i.i.d. uniformly distributed in $[0, 1]$.

Table: Example 2

A priori error        1        0.1      0.01       0.001
C_UB(W)               1.0938   1.0543   1.0516     1.0514
C_LB(W)               1.0101   1.0512   1.0514     1.0514
A posteriori error    0.0837   0.0031   2.5e-4     2.9e-5
Time [s]              23       227      1833       17 529
Iterations            1249     12 329   123 132    1 231 160

Blahut-Arimoto algorithm: single iteration

Continuous Input — Countable Output

Input alphabet $\mathcal{X} \subseteq \mathbb{R}$, output alphabet $\mathcal{Y} = \mathbb{N}_0$ (e.g., the discrete-time Poisson channel)

- Peak-power constraint: $A \subseteq \mathcal{X}$ compact, $\Pr[X \in A] = 1$
- Average-power constraint: $s$ a continuous cost function

Channel capacity:
$$C_{A,S}(W) = \sup_{p} \left\{ I(p, W) : \mathbb{E}[s(X)] \le S,\; p \in \mathcal{D}(A) \right\},$$
where $\mathcal{D}(A)$ denotes the space of probability densities on $A$.

Truncation

$$W_M(i|x) := \begin{cases} W(i|x) + \frac{1}{M} \sum_{j \ge M} W(j|x), & i \in \{0, 1, \ldots, M-1\} \\ 0, & i \ge M. \end{cases}$$

Channel $W_M$ has input alphabet $\mathcal{X}$ and output alphabet $\{0, 1, \ldots, M-1\}$.

[Figure: pictorial representation of the $M$-truncated channel counterpart: the tail mass $\sum_{i \ge M} W(i|x)$ of $W(\cdot|x)$ on $\mathbb{N}_0$ is spread uniformly over the kept outputs $\{0, \ldots, M-1\}$ of $W_M(\cdot|x)$.]


Assumption (Tail decay). For $M \in \mathbb{N}_0$ and $k \in (0, 1)$,
$$R_k(M) := \sum_{i \ge M} \sup_{x \in \mathcal{X}} W(i|x)^k \longrightarrow 0 \quad \text{as } M \to \infty.$$


Bounding Dual Variables

Assumption (Non-singular channel $W_M$):
$$\gamma_M := \min_{y \in \{0, 1, \ldots, M-1\}} \min_{x \in A} W_M(y|x) > 0$$

In case $\sum_{j \ge M} W(j|x) > 0$ for all $x$, a lower bound is given by $\gamma_M \ge \frac{1}{M} \min_x \sum_{j \ge M} W(j|x)$.

Proposition. Under the above assumption, the dual program $D$ is equivalent to
$$C_{A,S}(W_M) = \min_{\lambda} \left\{ G(\lambda) + F(\lambda) : \lambda \in Q \right\},$$
where $Q := \left\{ \lambda \in \mathbb{R}^M : \|\lambda\|_2 \le M \log(\gamma_M^{-1}) \right\}$.


Smoothing Step (continuous case)

For $\nu > 0$, consider
$$G_\nu(\lambda) = \sup_{p} \left\{ \langle p, W\lambda \rangle - \langle p, r \rangle + \nu h(p) - \nu \log(\rho) : \langle p, s \rangle = S,\; p \in \mathcal{D}(A) \right\},$$
which is a modified entropy maximization and has the solution [Cover'06]
$$p_\nu^\lambda(x) = 2^{\mu_1 + \frac{1}{\nu}\left( W\lambda(x) - r(x) \right) + \mu_2 s(x)}, \quad x \in A,$$
where $\mu_1, \mu_2 \in \mathbb{R}$ are such that $\langle p_\nu^\lambda, s \rangle = S$ and $p_\nu^\lambda \in \mathcal{D}(A)$.

Uniform approximation: $G_\nu(\lambda) \le G(\lambda) \le G_\nu(\lambda) + \iota(\nu)$, where $\iota(\nu) \to 0$ as $\nu \to 0$.

Lemma. The gradient $\nabla G_\nu(\lambda) = W^\star p_\nu^\lambda$ is Lipschitz continuous with constant $L_\nu = \frac{1}{\nu}$.

Error Bound

Theorem. Consider the parameters $\alpha = \dagger$, $\nu = \frac{\varepsilon/\alpha}{\log(\alpha/\varepsilon)}$, and
$$n \ge \frac{1}{\varepsilon} \sqrt{8 D_1 \alpha \left( \log(\varepsilon^{-1}) + \log(\alpha) \right)} + \frac{1}{4}.$$
Then after $n$ iterations of the algorithm we can generate
$$\hat{\lambda} = y_n \in Q, \qquad \hat{p} = \sum_{i=0}^{n} \frac{2(i+1)}{(n+1)(n+2)}\, p_\nu^{\lambda_i} \in \mathcal{D}(A),$$
which satisfy
$$0 \le F(\hat{\lambda}) + G(\hat{\lambda}) - I(\hat{p}, W) \le \varepsilon.$$

As before, $C_{UB} := F(\hat{\lambda}) + G(\hat{\lambda})$ and $C_{LB} := I(\hat{p}, W)$ give the a posteriori error $C_{UB} - C_{LB}$, while $\varepsilon$ is the a priori guarantee.

Discrete-Time Poisson Channel

$$W(y|x) = e^{-(x+\eta)} \frac{(x+\eta)^y}{y!}, \quad y \in \mathbb{N}_0,\; x \in \mathbb{R}_{\ge 0}$$

Important example to model optical communication systems [Shamai'90]

- Peak-power constraint: $\Pr[X \in [0, A]] = 1$
- No analytic expression is known for the capacity of the Poisson channel with a peak-power constraint; an analytical lower bound is available
- The channel has a fast-decaying tail ⇒ output alphabet truncation is possible
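Concretely, the truncation from earlier applies row by row: build $W(\cdot|x)$ up to output $M-1$ and spread the (tiny) tail mass uniformly, per the definition of $W_M$. A sketch (the input grid and parameter values are illustrative, not from the slides):

```python
import numpy as np
from math import lgamma

def truncated_poisson_channel(xs, eta, M):
    """Rows W_M(.|x) of the M-truncated discrete-time Poisson channel:
    W(y|x) = exp(-(x+eta)) (x+eta)^y / y! for y = 0..M-1, with the tail
    mass sum_{j >= M} W(j|x) spread uniformly over the kept outputs."""
    y = np.arange(M)
    logfact = np.array([lgamma(k + 1.0) for k in y])
    rows = []
    for x in xs:
        lam = x + eta
        w = np.exp(-lam + y * np.log(lam) - logfact)  # W(y|x) for y < M
        tail = max(0.0, 1.0 - w.sum())  # clamp tiny negative float error
        rows.append(w + tail / M)       # W_M(i|x) = W(i|x) + tail / M
    return np.vstack(rows)
```

Since every row of $W_M$ sums to one and all entries are strictly positive, the resulting finite channel matrix satisfies the non-singularity assumption ($\gamma_M > 0$) needed for the dual-ball bound.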


Discrete-Time Poisson Channel (cont'd)

[Figure: capacity [bits per channel use] (0 to 1) versus peak power A [dB] (0 to 12), comparing the computed upper and lower bounds with the analytical lower bound of [Lapidoth'09].]

Outlook

Technical report: arXiv, soon

- Capacity of a classical-quantum channel
  (more precise modelling of optical channels should include quantum-mechanical effects)
- Capacity of a random channel
- Approximate dynamic programming
  (topic of my next coffee talk? :))

Acknowledgements and Questions

Special thanks to Stefan Richter and Prof. Renato Renner.

Questions?

Entropy Maximization

Consider the optimization problem
$$\sup_{p} \left\{ h(p) + \langle p, c \rangle : \langle p, s \rangle = S,\; p \in \mathcal{D}(A) \right\} \tag{1}$$
with $c, s \in L^\infty(A)$.

Lemma ([Boltzmann, 1877]). Let $p_\mu^\star(x) = 2^{\mu_1 + c(x) + \mu_2 s(x)}$, where $\mu_1, \mu_2 \in \mathbb{R}$ are chosen such that $p_\mu^\star$ satisfies the constraints in (1). Then $p_\mu^\star$ uniquely solves (1).

- straightforward extension to multiple moment constraints
- find the $\mu_i$ using semidefinite programming [Lasserre, 2009]

Capacity of a Classical-Quantum (CQ) Channel

$\mathcal{D}(\mathcal{H}) := \{ \rho : \operatorname{tr}[\rho] = 1,\; \rho \ge 0 \}$ is the set of density operators on a Hilbert space $\mathcal{H}$.

CQ channel: $W : \mathcal{X} \to \mathcal{D}(\mathcal{H}),\; x \mapsto \rho_x$

von Neumann entropy: $H(\rho_x) := -\operatorname{tr}[\rho_x \log \rho_x]$

Capacity of a CQ channel:
$$C_{cq} = \max_{P_X} \left\{ H\!\left( \sum_{x \in \mathcal{X}} P_X(x) \rho_x \right) - \sum_{x \in \mathcal{X}} P_X(x) H(\rho_x) : \sum_{x \in \mathcal{X}} s(x) P_X(x) \le S,\; P_X \in \Delta_N \right\}$$

D(H) := {ρ ∈ H| tr[ρ] = 1, ρ > 0} is the set of density operators on a Hilbert space H CQ-channel W : X → D(H), x 7−→ ρx von Neumann entropy H(ρx ) := − tr[ρx log ρx ] Capacity of a CQ-channel   P P H  x∈X PX (x)ρx − x∈X PX (x)H(ρx )  max PX P Ccq = s.t. x∈X s(x)PX (x) 6 S   PX ∈ ∆N . 27 / 25