The Information Bottleneck Method Naftali Tishby, Fernando C. Pereira, William Bialek
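What is the information bottleneck? It is a technique for finding the best tradeoff between accuracy (how much information about a relevant variable Y is preserved) and complexity (how compactly the input X is represented).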
Example
Speech compression: a transcript of spoken words has low entropy ⇒ it can be compressed without losing the information about the words.
Problem Definition
Input: signals $x \in X$ and $y \in Y$, a mapping function $f: X \to Y$, and the distributions $P(X = x)$ and $P(Y = y, X = x)$.
Output: $\tilde{X}$, with $X \to \tilde{X} \to Y$.
Example
1. X = speech signal, Y = transcription signal
2. X = speech signal, Y = speaker's identity
Relevant quantization
Mapping $X \to \tilde{X}$
Soft partitioning $p(\tilde{x} \mid x)$ $\leftarrow$ hard partitioning
$p(\tilde{x}) = \sum_x p(x)\, p(\tilde{x} \mid x)$
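A small sketch of the difference between a soft and a hard partitioning, and of the induced codeword marginal $p(\tilde{x}) = \sum_x p(x)\, p(\tilde{x} \mid x)$. The distributions and the names `soft`, `hard`, and `marginal` are hypothetical, chosen only for illustration.

```python
# Hypothetical toy source with 4 symbols and 2 codewords.
p_x = [0.4, 0.3, 0.2, 0.1]

# p(x~|x): one row per x, one column per codeword x~. Each row sums to 1.
soft = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.0, 1.0]]  # probabilistic assignment
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # each x goes to exactly one x~


def marginal(p_x, p_t_given_x):
    """Return p(x~) = sum_x p(x) p(x~|x) as a list over codewords."""
    n_t = len(p_t_given_x[0])
    return [
        sum(p_x[i] * p_t_given_x[i][t] for i in range(len(p_x)))
        for t in range(n_t)
    ]


p_t_soft = marginal(p_x, soft)
p_t_hard = marginal(p_x, hard)
print(p_t_soft, p_t_hard)
```

A hard partitioning is the special case in which every row of $p(\tilde{x} \mid x)$ is one-hot.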
What is a good quantization?
The first factor is the rate, or the average number of bits per message needed to specify an element in the codebook without confusion. This number per element of X is bounded from below by the mutual information $I(X; \tilde{X})$.
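For reference, the mutual information between X and its quantization $\tilde{X}$ has the standard form below; the notation is assumed to match the rest of the slides.

```latex
I(X;\tilde{X}) = \sum_{x \in X} \sum_{\tilde{x} \in \tilde{X}}
  p(x, \tilde{x}) \log \frac{p(\tilde{x} \mid x)}{p(\tilde{x})}
```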
$H(X)$, $I(X; \tilde{X})$, $H(X \mid \tilde{X})$
The average volume of the elements of X that are mapped to the same codeword is $2^{H(X \mid \tilde{X})}$.
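As a rough numerical illustration of these quantities, the sketch below computes $H(X)$, $I(X; \tilde{X})$, $H(X \mid \tilde{X})$ and the volume $2^{H(X \mid \tilde{X})}$ for a hypothetical uniform source of four symbols that is hard-partitioned into two codewords. The helper names `entropy` and `rate_terms` are illustrative, not from the slides.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def rate_terms(p_x, p_t_given_x):
    """Return (H(X), I(X;X~), H(X|X~)) in bits for a given p(x) and p(x~|x)."""
    n_x, n_t = len(p_x), len(p_t_given_x[0])
    p_t = [sum(p_x[i] * p_t_given_x[i][t] for i in range(n_x)) for t in range(n_t)]
    h_x = entropy(p_x)
    # I(X;X~) = H(X~) - H(X~|X)
    h_t_given_x = sum(p_x[i] * entropy(p_t_given_x[i]) for i in range(n_x))
    mi = entropy(p_t) - h_t_given_x
    # H(X|X~) = H(X) - I(X;X~)
    return h_x, mi, h_x - mi


# Hypothetical example: uniform source, two symbols per codeword.
p_x = [0.25, 0.25, 0.25, 0.25]
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

h_x, mi, h_x_given_t = rate_terms(p_x, hard)
volume = 2 ** h_x_given_t
print(h_x, mi, h_x_given_t, volume)  # 2.0 1.0 1.0 2.0
```

Here the rate is 1 bit and each codeword covers, on average, 2 elements of X, which matches the partition by construction.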
Information rate alone is not enough to characterize a good quantization, since the rate can always be reduced by throwing away details of the original signal x. We therefore need some additional constraints.
The information bottleneck
The optimal assignment, which minimizes the previous equation, satisfies the equation
$p(y \mid \tilde{x})$ can be computed by Bayes' rule and the Markov chain condition $\tilde{X} \leftarrow X \leftarrow Y$.
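For reference, in the standard formulation of the information bottleneck the functional being minimized, the self-consistent form of its minimizer, and the Bayes' rule expression for $p(y \mid \tilde{x})$ are usually written as follows. The notation is assumed to match the slides; $\beta$ is the Lagrange multiplier trading compression against preserved information, and $Z(x, \beta)$ is a normalization constant.

```latex
\mathcal{L}[p(\tilde{x} \mid x)] = I(\tilde{X}; X) - \beta\, I(\tilde{X}; Y)

p(\tilde{x} \mid x) = \frac{p(\tilde{x})}{Z(x, \beta)}
  \exp\!\left[ -\beta \sum_{y} p(y \mid x) \log \frac{p(y \mid x)}{p(y \mid \tilde{x})} \right]

p(y \mid \tilde{x}) = \frac{1}{p(\tilde{x})} \sum_{x} p(y \mid x)\, p(\tilde{x} \mid x)\, p(x)
```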
The information bottleneck iterative algorithm
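A minimal sketch of one common way to write the iteration, assuming the usual three alternating updates: $p(\tilde{x})$ from $p(\tilde{x} \mid x)$, then $p(y \mid \tilde{x})$ by Bayes' rule, then $p(\tilde{x} \mid x) \propto p(\tilde{x}) \exp(-\beta\, D_{KL}[p(y \mid x) \,\|\, p(y \mid \tilde{x})])$. The function names, the toy distributions, the initialization, and the fixed iteration count are all illustrative assumptions.

```python
import math


def kl(p, q):
    """KL divergence D(p||q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def ib_iterate(p_x, p_y_given_x, p_t_given_x, beta, n_iter=200):
    """Alternate the three self-consistent updates for a fixed number of steps.

    Assumes every codeword keeps nonzero probability throughout.
    Returns the final p(x~|x) as a list of rows, one per x.
    """
    n_x, n_t, n_y = len(p_x), len(p_t_given_x[0]), len(p_y_given_x[0])
    for _ in range(n_iter):
        # p(x~) = sum_x p(x) p(x~|x)
        p_t = [sum(p_x[i] * p_t_given_x[i][t] for i in range(n_x)) for t in range(n_t)]
        # p(y|x~) = (1/p(x~)) sum_x p(y|x) p(x~|x) p(x)
        p_y_given_t = [
            [sum(p_y_given_x[i][y] * p_t_given_x[i][t] * p_x[i] for i in range(n_x)) / p_t[t]
             for y in range(n_y)]
            for t in range(n_t)
        ]
        # p(x~|x) proportional to p(x~) exp(-beta * KL(p(y|x) || p(y|x~)))
        updated = []
        for i in range(n_x):
            w = [p_t[t] * math.exp(-beta * kl(p_y_given_x[i], p_y_given_t[t]))
                 for t in range(n_t)]
            z = sum(w)
            updated.append([v / z for v in w])
        p_t_given_x = updated
    return p_t_given_x


# Hypothetical toy problem: x0, x1 predict y=0; x2, x3 predict y=1.
p_x = [0.25, 0.25, 0.25, 0.25]
p_y_given_x = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
init = [[0.6, 0.4], [0.55, 0.45], [0.45, 0.55], [0.4, 0.6]]  # slightly asymmetric start

result = ib_iterate(p_x, p_y_given_x, init, beta=10.0)
for row in result:
    print([round(v, 3) for v in row])
```

With this starting point, x0 and x1 end up assigned mostly to the first codeword and x2 and x3 mostly to the second. The iteration only finds a local solution, so the outcome depends on the initialization.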
The structure of the solutions
The formal solution of the self-consistent equations described above still requires a specification of the structure and cardinality of $\tilde{X}$, as in rate distortion theory.
A novel implementation of the information bottleneck method for unsupervised document clustering.
Input: X = documents, Y = words; $P(X)$ and $P(X, Y)$
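One way these inputs could be estimated from raw text is sketched below. The tiny corpus is hypothetical, and normalizing word counts over the whole corpus is an assumption; other choices (for example a uniform prior over documents) are equally possible.

```python
from collections import Counter

# Hypothetical corpus: each string is one document.
docs = ["the cat sat", "the cat ran", "stocks fell sharply", "stocks rose"]

vocab = sorted({w for d in docs for w in d.split()})
counts = [Counter(d.split()) for d in docs]
total = sum(sum(c.values()) for c in counts)  # total word occurrences in the corpus

# Joint P(x, y): fraction of all word occurrences that are word y in document x.
p_xy = [[c[w] / total for w in vocab] for c in counts]

# Marginal P(x): share of word occurrences contributed by document x.
p_x = [sum(row) for row in p_xy]

# Conditional P(y|x): word distribution of each document.
p_y_given_x = [[v / p_x[i] for v in row] for i, row in enumerate(p_xy)]

print(vocab)
print(p_x)
```

The rows of `p_y_given_x` are the per-document word distributions that the clustering then compares.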
Hard Clustering
$\beta \to \infty$
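In this limit the exponential factor $\exp(-\beta\, D_{KL})$ concentrates all mass on the codeword with the smallest divergence, so each x is assigned to a single $\tilde{x}$. A sketch of that assignment rule follows, assuming the cluster distributions $p(y \mid \tilde{x})$ are already given and ties are broken by lowest index; the numbers and the name `hard_assign` are hypothetical.

```python
import math


def kl(p, q):
    """KL divergence D(p||q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def hard_assign(p_y_given_x, p_y_given_t):
    """Assign each x to the codeword whose p(y|x~) is closest in KL divergence."""
    return [
        min(range(len(p_y_given_t)), key=lambda t: kl(p, p_y_given_t[t]))
        for p in p_y_given_x
    ]


# Hypothetical conditional distributions for 4 inputs and 2 clusters.
p_y_given_x = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
p_y_given_t = [[0.85, 0.15], [0.15, 0.85]]

labels = hard_assign(p_y_given_x, p_y_given_t)
print(labels)  # [0, 0, 1, 1]
```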