The Information Bottleneck Method
Naftali Tishby, Fernando C. Pereira, William Bialek

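What is the information bottleneck? It is a technique for finding the best tradeoff between accuracy (the information preserved about a relevant variable) and complexity (the size of the compressed representation).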

Example

Speech compression: a transcript of spoken words has much lower entropy than the acoustic waveform, so the speech signal can be compressed much further without losing the information about the words.

Problem Definition

Input: signals $x \in X$ and $y \in Y$, a mapping function $f: X \to Y$, and the distributions $P(X = x)$ and $P(Y = y, X = x)$.

Output: $\tilde{X}$, with $X \to \tilde{X} \to Y$.

Example

1. X = speech signal, Y = transcription signal

2. X = speech signal, Y = speaker's identity

Relevant quantization

Mapping $X \to \tilde{X}$

Soft partitioning $p(\tilde{x}|x)$ ← hard partitioning

$p(\tilde{x}) = \sum_x p(x)\, p(\tilde{x}|x)$
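The codeword marginal above is a single weighted sum, which a short sketch can make concrete. The function name `codeword_marginal` and the toy distributions are hypothetical, chosen only for illustration; a hard partition is simply the special case where every row of $p(\tilde{x}|x)$ is 0 or 1.

```python
def codeword_marginal(p_x, p_xt_given_x):
    """Return p(x~) = sum_x p(x) p(x~|x).

    p_x: list of probabilities p(x), one per element of X.
    p_xt_given_x: list of rows; row i is the distribution p(x~ | x_i).
    """
    n_codewords = len(p_xt_given_x[0])
    return [
        sum(p_x[i] * p_xt_given_x[i][t] for i in range(len(p_x)))
        for t in range(n_codewords)
    ]


# Hypothetical toy source: four equiprobable signals, two codewords.
p_x = [0.25, 0.25, 0.25, 0.25]
soft = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.0, 1.0]]  # soft partition
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # hard partition

p_xt_soft = codeword_marginal(p_x, soft)
p_xt_hard = codeword_marginal(p_x, hard)
```

Both partitions here happen to give each codeword probability 0.5; they differ only in how sharply each signal is assigned.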

What is a good quantization?

The first factor is the rate, or the average number of bits per message needed to specify an element in the codebook without confusion. This number per element of X is bounded from below by the mutual information

$$I(X;\tilde{X}) = \sum_{x} \sum_{\tilde{x}} p(x,\tilde{x}) \log \frac{p(\tilde{x}|x)}{p(\tilde{x})}$$
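As a rough illustration of the rate bound, the mutual information between the signal and its codeword can be evaluated directly from $p(x)$ and $p(\tilde{x}|x)$. This is a minimal sketch in base-2 logarithms (bits); the function name and the three toy assignments are assumptions made for the example.

```python
import math


def mutual_information(p_x, p_xt_given_x):
    """I(X; X~) in bits for a source p(x) and an assignment p(x~|x)."""
    n_x, n_t = len(p_x), len(p_xt_given_x[0])
    # Codeword marginal p(x~) = sum_x p(x) p(x~|x).
    p_xt = [
        sum(p_x[i] * p_xt_given_x[i][t] for i in range(n_x))
        for t in range(n_t)
    ]
    total = 0.0
    for i in range(n_x):
        for t in range(n_t):
            joint = p_x[i] * p_xt_given_x[i][t]
            if joint > 0.0:  # zero-probability pairs contribute nothing
                total += joint * math.log2(p_xt_given_x[i][t] / p_xt[t])
    return total


p_x = [0.25, 0.25, 0.25, 0.25]
identity = [[1.0 if t == i else 0.0 for t in range(4)] for i in range(4)]
halves = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
single = [[1.0], [1.0], [1.0], [1.0]]

rate_identity = mutual_information(p_x, identity)  # no compression
rate_halves = mutual_information(p_x, halves)      # two groups of two
rate_single = mutual_information(p_x, single)      # everything merged
```

For the uniform four-element source, the three assignments give 2, 1 and 0 bits respectively: the coarser the partition, the lower the rate.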

$H(X)$, $I(X;\tilde{X})$, $H(X|\tilde{X})$

The average volume of the elements of X that are mapped to the same codeword is $2^{H(X|\tilde{X})}$.
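A small worked case may help here. Assume, purely for illustration, a uniform source with 8 elements that is hard-partitioned into 2 codewords of 4 elements each. Because the mapping is deterministic, $I(X;\tilde{X}) = H(\tilde{X})$, and the volume formula recovers the group size:

```latex
\begin{align*}
H(X) &= \log_2 8 = 3 \text{ bits}, \\
I(X;\tilde{X}) &= H(\tilde{X}) = \log_2 2 = 1 \text{ bit}, \\
H(X|\tilde{X}) &= H(X) - I(X;\tilde{X}) = 2 \text{ bits}, \\
2^{H(X|\tilde{X})} &= 4 \text{ elements per codeword.}
\end{align*}
```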

Information rate alone is not enough to characterize a good quantization, since the rate can always be reduced by throwing away details of the original signal x. We therefore need some additional constraints.

The information bottleneck
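For reference, the variational principle in the original paper by Tishby, Pereira and Bialek can be written as the functional below, minimized over the assignments $p(\tilde{x}|x)$. Here $\beta$ is the Lagrange multiplier attached to the relevant information: small $\beta$ favors compression, large $\beta$ favors preserving information about $Y$.

```latex
\begin{equation*}
\mathcal{L}\left[p(\tilde{x}|x)\right] = I(\tilde{X};X) - \beta\, I(\tilde{X};Y)
\end{equation*}
```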

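The optimal assignment, which minimizes the information bottleneck functional, satisfies the equation

$$p(\tilde{x}|x) = \frac{p(\tilde{x})}{Z(x,\beta)} \exp\left[-\beta \sum_y p(y|x) \log \frac{p(y|x)}{p(y|\tilde{x})}\right]$$

where $Z(x,\beta)$ is a normalization function. $p(y|\tilde{x})$ can be computed by Bayes' rule and the Markov chain condition $\tilde{X} \leftarrow X \leftarrow Y$:

$$p(y|\tilde{x}) = \frac{1}{p(\tilde{x})} \sum_x p(y|x)\, p(\tilde{x}|x)\, p(x)$$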

The information bottleneck iterative algorithm
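The iteration alternates between three updates: the codeword marginal $p(\tilde{x})$, the codeword-conditional $p(y|\tilde{x})$, and the assignment $p(\tilde{x}|x) \propto p(\tilde{x}) \exp(-\beta D_{KL}[p(y|x)\,\|\,p(y|\tilde{x})])$. The following is a minimal pure-Python sketch under several assumptions: $\beta > 0$, every $p(x) > 0$, a random soft initialization, and a fixed iteration count instead of a convergence test. The function names, the seed, and the toy data are hypothetical, and the result is in general only a local solution.

```python
import math
import random


def kl_divergence(p, q):
    """D_KL[p || q] in nats; returns infinity if q is zero where p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def ib_iterate(p_x, p_y_given_x, n_codewords, beta, n_iter=200, seed=0):
    """Return the assignment p(x~|x) after n_iter alternating updates."""
    rng = random.Random(seed)
    n_x, n_y = len(p_x), len(p_y_given_x[0])
    # Random soft initialization; each row of q is a distribution over x~.
    q = []
    for _ in range(n_x):
        row = [rng.random() + 1e-3 for _ in range(n_codewords)]
        norm = sum(row)
        q.append([v / norm for v in row])
    for _ in range(n_iter):
        # p(x~) = sum_x p(x) p(x~|x)
        p_t = [sum(p_x[i] * q[i][t] for i in range(n_x))
               for t in range(n_codewords)]
        # p(y|x~) = (1 / p(x~)) sum_x p(y|x) p(x~|x) p(x)
        p_y_t = []
        for t in range(n_codewords):
            if p_t[t] > 0.0:
                p_y_t.append([
                    sum(p_y_given_x[i][y] * q[i][t] * p_x[i]
                        for i in range(n_x)) / p_t[t]
                    for y in range(n_y)
                ])
            else:
                p_y_t.append(None)  # codeword currently unused
        # p(x~|x) proportional to p(x~) exp(-beta * KL), in the log domain.
        for i in range(n_x):
            log_w = []
            for t in range(n_codewords):
                if p_y_t[t] is None:
                    log_w.append(-math.inf)
                else:
                    kl = kl_divergence(p_y_given_x[i], p_y_t[t])
                    log_w.append(math.log(p_t[t]) - beta * kl)
            top = max(log_w)
            w = [math.exp(v - top) for v in log_w]
            norm = sum(w)  # plays the role of Z(x, beta)
            q[i] = [v / norm for v in w]
    return q


# Hypothetical toy problem: four signals, two values of y, two codewords.
p_x = [0.25, 0.25, 0.25, 0.25]
p_y_given_x = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
q = ib_iterate(p_x, p_y_given_x, n_codewords=2, beta=10.0)
```

Working in the log domain and subtracting the largest exponent before exponentiating avoids underflow when $\beta$ is large. Signals with identical $p(y|x)$ always receive identical assignments, since the update depends on $x$ only through $p(y|x)$.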

The structure of the solutions

The formal solution of the self-consistent equations, described above, still requires a specification of the structure and cardinality of $\tilde{X}$, as in rate distortion theory.

A novel implementation of the information bottleneck method for unsupervised document clustering.

Input: X = documents, Y = words; $P(X)$ and $P(X, Y)$
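One plausible way to obtain these inputs is to normalize raw word counts. The sketch below assumes whitespace tokenization and the estimate $P(x, y) = n(x, y) / N$, so that $P(x)$ is proportional to document length; other choices, such as a uniform prior over documents, are equally possible. The function name and the three toy documents are hypothetical.

```python
from collections import Counter


def joint_from_documents(docs):
    """Estimate P(X, Y) and P(X) from word counts: P(x, y) = n(x, y) / N."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    counts = [Counter(doc.split()) for doc in docs]
    total = sum(sum(c.values()) for c in counts)  # N: all word occurrences
    # Row i is document x_i; column j is word vocab[j].
    p_xy = [[c[word] / total for word in vocab] for c in counts]
    p_x = [sum(row) for row in p_xy]  # marginal over words
    return vocab, p_x, p_xy


docs = ["bit rate code", "code rate rate", "gene cell"]
vocab, p_x, p_xy = joint_from_documents(docs)

# Conditional p(y|x), the quantity the clustering compares between documents.
p_y_given_x = [[v / p_x[i] for v in row] for i, row in enumerate(p_xy)]
```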

Hard Clustering

$\beta \to \infty$
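In this limit the exponential weighting of the soft assignment concentrates on the codeword with the smallest Kullback-Leibler divergence, so each $x$ is assigned to exactly one $\tilde{x}$. A minimal sketch of that assignment step follows, assuming the codeword distributions $p(y|\tilde{x})$ (called `centroids` here, a hypothetical name) are already given and strictly positive.

```python
import math


def kl_divergence(p, q):
    """D_KL[p || q] in nats; returns infinity if q is zero where p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def hard_assign(p_y_given_x, centroids):
    """For each x, return the index of the x~ minimizing D_KL[p(y|x) || p(y|x~)]."""
    return [
        min(range(len(centroids)),
            key=lambda t: kl_divergence(row, centroids[t]))
        for row in p_y_given_x
    ]


# Hypothetical toy data: four signals and two codeword distributions.
p_y_given_x = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
centroids = [[0.85, 0.15], [0.15, 0.85]]
labels = hard_assign(p_y_given_x, centroids)  # [0, 0, 1, 1]
```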
