Unsupervised Learning
Lecture 6: Hierarchical and Nonlinear Models
Zoubin Ghahramani
[email protected]
Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science, University College London
Autumn 2003
Why we need nonlinearities

Linear systems have limited modelling capability.
[Figure: graphical model of a state-space model, with a chain of hidden states X1 → X2 → X3 → ... → XT, each emitting an observation Y1, Y2, Y3, ..., YT.]
Consider linear-Gaussian state-space models: the state evolves by a fixed linear map plus Gaussian noise, so only a restricted family of dynamics (combinations of exponentials and damped or growing oscillations) can be modelled.
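As a minimal sketch of why the dynamics are restricted (the dimensions and parameter values below are illustrative assumptions, not from the lecture), consider sampling from a linear-Gaussian state-space model in Python:

import numpy as np

rng = np.random.default_rng(0)

# Linear-Gaussian state-space model:
#   x_t = A x_{t-1} + w_t,  w_t ~ N(0, Q)
#   y_t = C x_t     + v_t,  v_t ~ N(0, R)
A = np.array([[0.99, -0.1], [0.1, 0.99]])  # rotation-like linear dynamics
C = np.array([[1.0, 0.0]])                 # observe the first state coordinate
Q = 0.01 * np.eye(2)                       # state noise covariance
R = np.array([[0.1]])                      # observation noise covariance

T = 100
x = np.zeros(2)
ys = []
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
    ys.append(C @ x + rng.multivariate_normal(np.zeros(1), R))

# Whatever A we choose, the mean trajectory is x0, A x0, A^2 x0, ...,
# i.e. exponentials and (damped) oscillations set by the eigenvalues of A;
# the model cannot express, e.g., limit cycles or switching dynamics.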
Why we need hierarchical models

Many generative processes can be naturally described at different levels of detail.
[Figure: a hierarchy of levels of description, from top to bottom: (e.g. objects, illumination, pose) → (e.g. object parts, surfaces) → (e.g. edges) → (retinal image, i.e. pixels).]
Biology seems to have developed hierarchical representations.
Why we need distributed representations
[Figure: graphical model of a hidden Markov model, with a chain of discrete states S1 → S2 → S3 → ... → ST, each emitting an observation Y1, Y2, Y3, ..., YT.]
Consider a hidden Markov model. To capture N bits of information about the history of the sequence, an HMM requires K = 2^N states!
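As a back-of-the-envelope check (the value N = 20 and the variable names are illustrative, not from the lecture), compare the parameter counts in Python:

N = 20                        # bits of history to remember
K = 2 ** N                    # states a single-chain HMM needs: 1,048,576
hmm_params = K * K            # entries of the K x K transition matrix: ~10^12
# A distributed alternative: N binary state variables, each with its own
# 2 x 2 transition matrix, evolving independently (a factorial HMM, next slide).
factorial_params = N * 2 * 2  # 80 transition parameters in total
print(K, hmm_params, factorial_params)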
Factorial Hidden Markov Models and Dynamic Bayesian Networks
[Figure: a factorial HMM with three independent state chains S_t^(1), S_t^(2), S_t^(3) (shown for times t−1, t, t+1), all feeding into the observations Y_{t−1}, Y_t, Y_{t+1}.]

[Figure: a more general dynamic Bayesian network with coupled variables A_t, B_t, C_t, D_t unrolled over time steps t, t+1, t+2, ...]
These are hidden Markov models with many state variables (i.e. a distributed representation of the state).
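A minimal generative sketch of a factorial HMM in Python; the sizes, transition matrices, and output weights below are illustrative assumptions, not values from the lecture:

import numpy as np

rng = np.random.default_rng(1)

M, K, D, T = 3, 2, 5, 50  # chains, states per chain, observation dim, length

# Each chain m has its own K x K transition matrix (a priori independent chains).
trans = np.stack([np.array([[0.9, 0.1], [0.2, 0.8]]) for _ in range(M)])
# Each (chain, state) pair contributes a vector to the observation mean.
W = rng.standard_normal((M, K, D))

s = np.zeros(M, dtype=int)            # initial state of each chain
Y = np.zeros((T, D))
for t in range(T):
    for m in range(M):                # each chain transitions independently
        s[m] = rng.choice(K, p=trans[m, s[m]])
    mean = sum(W[m, s[m]] for m in range(M))
    Y[t] = mean + 0.1 * rng.standard_normal(D)

# The joint state space has K**M = 8 configurations, yet the model uses only
# M * K * K = 12 transition parameters instead of (K**M)**2 = 64.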
Blind Source Separation and Independent Components Analysis

[Figure: ICA graphical model with sources X1, ..., XK, mixing matrix Λ, and observations Y1, ..., YD.]
• P(x_k) is non-Gaussian.
• Equivalently, P(x_k) is Gaussian, with a nonlinearity g(·):

  $y_d = \sum_{k=1}^{K} \Lambda_{dk} \, g(x_k) + \epsilon_d$
• For K = D, and observation noise assumed to be zero, inference and learning are easy (standard ICA). Many extensions are possible (e.g. with noise ⇒ IFA).
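As a hedged illustration of the zero-noise, K = D case, here is a sketch that mixes two non-Gaussian sources and unmixes them with scikit-learn's FastICA (one possible ICA algorithm; the lecture does not prescribe this implementation, and the mixing matrix below is made up):

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
T = 5000

# Two non-Gaussian (Laplacian) sources, linearly mixed with a square Lambda.
X = rng.laplace(size=(T, 2))
Lambda = np.array([[1.0, 0.5], [0.3, 1.0]])
Y = X @ Lambda.T                    # y = Lambda x, no observation noise

ica = FastICA(n_components=2, random_state=0)
X_hat = ica.fit_transform(Y)        # recovered sources
# ica.mixing_ approximates Lambda up to column permutation and scaling,
# the usual unidentifiabilities of ICA.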
ICA Nonlinearity

Generative model:

  $x = g(w), \qquad y = \Lambda x + v$

where w and v are zero-mean Gaussian noises with covariances I and R respectively. The density of x can be written in terms of g(·):

  $p_x(x) = \frac{\mathcal{N}(0,1)\big|_{g^{-1}(x)}}{|g'(g^{-1}(x))|}$

For example, if

  $p_x(x) = \frac{1}{\pi \cosh(x)}$

we find that setting

  $g(w) = \ln \tan\!\left(\frac{\pi}{4}\left(1 + \mathrm{erf}(w/\sqrt{2})\right)\right)$

generates vectors x in which each component is distributed according to 1/(π cosh(x)).

[Figure: plot of the nonlinearity g(w) for w in [−6, 6].]
So, ICA can be seen either as a linear generative model with non-Gaussian priors for the hidden variables, or as a nonlinear generative model with Gaussian priors for the hidden variables.
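This equivalence can be checked numerically. A small sketch (my own sanity check, not part of the notes) pushes standard-normal samples through the g(·) above and compares their histogram with 1/(π cosh(x)):

import numpy as np
from scipy.special import erf

rng = np.random.default_rng(3)
w = rng.standard_normal(100_000)      # w ~ N(0, 1)
x = np.log(np.tan(np.pi / 4 * (1 + erf(w / np.sqrt(2)))))

hist, edges = np.histogram(x, bins=200, range=(-4, 4), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
target = 1 / (np.pi * np.cosh(centers))   # claimed density of x
print(np.max(np.abs(hist - target)))      # small, e.g. ~0.01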
Natural Scenes and Sounds

[Figure: histogram of filter responses to natural scenes (x-axis: filter response; y-axis: probability on a log scale), with a Gaussian density overlaid for comparison.]
Natural Scenes

[Figure: panels a and b.]

Natural Scenes

[Figure.]

Natural Movies

[Figure.]