Gaussian Process Latent Variable Models for Human Pose Estimation
Carl Henrik Ek, Philip H.S. Torr and Neil D. Lawrence
Oxford Brookes University and University of Manchester
⊲ Pose data is dynamic.
⊲ Encourage the latent structure to respect the data's dynamics.
⊲ Auto-regressive Gaussian Process in time over the latent space [WFH05].
Abstract
We describe a generative approach to recovering 3D human pose from image silhouettes. Our method is based on learning a shared low-dimensional latent representation capable of generating both human pose and image observations through the GP-LVM [Law05]. We learn a dynamical model over the latent space, which allows us to disambiguate ambiguous silhouettes through temporal consistency. The model has only two free parameters and requires no manual initialization.
Introduction
Figure 1: Graphical representation of the shared latent space GP-LVM model for the data. Round shaded circles are observed data; square black boxes are variables optimized using maximum likelihood. The "arched" arrow shows the back-constraint of the probabilistic model.
We want to learn a latent representation X to represent both Y and Z. This can be done by maximizing the joint marginal likelihood of two separate GP-LVM models sharing the same latent coordinates X [SGHR05].
Conclusion
Inference
1. Discrete Optimization: find the most likely sequence X̂ through the training data (see the sketch below)
   ⊲ Observations: p(y*_t | x_i^train)
   ⊲ Transitions: p(x_j^train | x_i^train)
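A minimal Viterbi-style sketch of this discrete search (illustrative Python, not the authors' implementation; it assumes the emission and transition log-probabilities have already been evaluated on the training latent points):

```python
import numpy as np

def most_likely_path(log_obs, log_trans):
    # Viterbi recursion over the training latent points treated as HMM states:
    #   log_obs[t, i]   = log p(y*_t | x_i^train)      (emission)
    #   log_trans[i, j] = log p(x_j^train | x_i^train) (transition)
    T, N = log_obs.shape
    delta = np.empty((T, N))            # best log-score of a path ending in state i at time t
    back = np.zeros((T, N), dtype=int)
    delta[0] = log_obs[0]               # uniform prior over the initial state
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans     # N x N: from-state x to-state
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(N)] + log_obs[t]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path                         # indices into the training set
```

The returned indices define a latent trajectory through the training points that initializes the continuous optimization of step 2.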
x_i = g(z_i, W)
We presented a fully automatic method for human pose estimation using the GP-LVM and showed results on synthetic data. By learning a low-dimensional latent space shared between silhouette observations and human pose, we make exact inference in a generative model computationally feasible for human pose estimation. Re-representing the data in a shared space lets us encourage this representation to respect the data's dynamics, and we exploit this structure to automatically initialize our inference.
References
P(Y, Z | X, ΦY, ΦZ) = P(Y | X, ΦY) P(Z | X, ΦZ)
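In code, this factorization is a sum of two GP-LVM negative log-likelihoods over the same latent matrix. The sketch below reuses the (hypothetical) gplvm_neg_log_lik helper from the Gaussian Process Latent Variable Models sketch and assumes separate kernel hyperparameters ΦY and ΦZ for the two observation spaces:

```python
def shared_neg_log_lik(X, Y, Z, phi_Y, phi_Z):
    # -log P(Y, Z | X, Phi_Y, Phi_Z) = -log P(Y | X, Phi_Y) - log P(Z | X, Phi_Z):
    # two GP-LVMs, one over silhouette features Y and one over poses Z,
    # coupled only through the shared latent coordinates X.
    return (gplvm_neg_log_lik(X, Y, **phi_Y) +
            gplvm_neg_log_lik(X, Z, **phi_Z))
```

Minimizing this with respect to X, ΦY and ΦZ yields the shared latent coordinates [SGHR05].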
Figure 3: Every 20th frame from a circular walk sequence. Top row: input silhouette; middle row: model pose estimate; bottom row: ground truth. The box indicates bad estimates by our model.
⇒ Two free parameters:
1. Dimensionality of the latent space
2. Width of the RBF kernel of the back-constraint
⊲ Multi-modal relationship between silhouette and pose.
⊲ Encourage the multi-modality to be contained in the relation between silhouette and latent representation.
⇒ Represent the latent positions as a smooth function of pose [LC06].
{Ŵ, Φ̂Y, Φ̂Z} = argmax_{W,ΦY,ΦZ} P(Y, Z | W, ΦY, ΦZ)
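The back-constraint uses an RBF kernel on the poses (its width is one of the model's two free parameters); the sketch below is a minimal illustrative form of such a kernel-regression mapping, not necessarily the exact one used:

```python
import numpy as np

def back_constraint(Z, W, width=1.0):
    # Smooth mapping x_i = g(z_i, W): each latent point is an RBF-weighted
    # combination of basis functions centred on the training poses, so similar
    # poses are constrained to lie at nearby latent positions [LC06].
    # width is the RBF-kernel width, one of the model's two free parameters.
    sq = np.sum(Z**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * Z @ Z.T
    K = np.exp(-0.5 * sq / width**2)    # N x N RBF kernel evaluated on poses
    return K @ W                        # latent coordinates X (N x q); W is N x q
```

During learning, X is replaced everywhere by back_constraint(Z, W), and the objective is maximized with respect to W instead of the latent coordinates directly.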
3. Map X̂ to pose through the GP mean prediction
⊲ Data y_i generated by a latent variable x_i: y_i = f(x_i)
⊲ GP-LVM:
   ⊲ Gaussian Process (GP) prior over f
   ⊲ Marginal likelihood of the data Y obtained by integrating over f
   ⊲ Find the latent coordinates X and GP hyperparameters Φ that maximize the marginal likelihood
{X̂, Φ̂} = argmax_{X,Φ} p(Y | X, Φ) = argmax_{X,Φ} ∏_{j=1}^{D} (2π)^{-N/2} |K|^{-1/2} exp(−½ y_{:,j}^T K^{-1} y_{:,j})
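The sketch below is a minimal numpy version of this objective (illustrative, not the authors' code); in practice X and the kernel hyperparameters Φ are optimized jointly with a gradient-based routine such as scipy.optimize.minimize:

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0, noise=1e-2):
    # RBF covariance K over the latent points, plus observation noise.
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return variance * np.exp(-0.5 * sq / lengthscale**2) + noise * np.eye(len(X))

def gplvm_neg_log_lik(X, Y, **phi):
    # -log p(Y | X, Phi): D independent GPs over the columns of Y, sharing one kernel.
    N, D = Y.shape
    K = rbf_kernel(X, **phi)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))   # K^{-1} Y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (D * N * np.log(2 * np.pi) + D * log_det + np.sum(Y * alpha))

# Example of optimizing X with the hyperparameters held fixed:
#   from scipy.optimize import minimize
#   N, q = Y.shape[0], 2
#   res = minimize(lambda x: gplvm_neg_log_lik(x.reshape(N, q), Y), X0.ravel())
#   X_hat = res.x.reshape(N, q)
```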
Gaussian Process Latent Variable Models
X̂* = argmax_{X*} p(Y*, X* | Y, X, ΦY, Φdyn)
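A simplified single-frame sketch of this refinement (hypothetical; it optimizes only the silhouette likelihood term and omits the dynamics term Φdyn for brevity):

```python
import numpy as np
from scipy.optimize import minimize

def rbf(A, B, lengthscale=1.0, variance=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def predictive_nll(x_star, y_star, X, Y, noise=1e-2):
    # -log p(y* | x*, X, Y): Gaussian GP predictive density of the silhouette
    # features y* at a candidate latent point x*, independent over dimensions.
    x_star = x_star.reshape(1, -1)
    K = rbf(X, X) + noise * np.eye(len(X))
    k_star = rbf(X, x_star)                        # N x 1
    L = np.linalg.cholesky(K)
    A = np.linalg.solve(L, k_star)
    mean = A.T @ np.linalg.solve(L, Y)             # 1 x D predictive mean
    var = rbf(x_star, x_star) + noise - A.T @ A    # 1 x 1 predictive variance
    D = Y.shape[1]
    return float(0.5 * (D * np.log(2 * np.pi * var)
                        + np.sum((y_star - mean) ** 2) / var))

# Refine the HMM initialization x0 of one frame by gradient-based optimization:
#   x_hat = minimize(predictive_nll, x0, args=(y_star, X, Y), method='L-BFGS-B').x
```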
⊲ Two main lines of work:
1. Regression: model pose = f(silhouette)
   ⊲ Cannot handle multi-modality
2. Generative: model p(silhouette | pose)
   ⊲ High-dimensional ⇒ approximate inference
{Ŵ, Φ̂Y, Φ̂Z, Φ̂dyn} = argmax_{W,ΦY,ΦZ,Φdyn} P(Y, Z | W, ΦY, ΦZ) P(W | Φdyn)
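As an illustration of the dynamics term, the sketch below evaluates a first-order auto-regressive GP prior over a latent trajectory in the spirit of [WFH05] (a minimal sketch assuming an RBF kernel; the parameterization through the back-constraint weights W is not reproduced here):

```python
import numpy as np

def ar_gp_dynamics_nll(X, lengthscale=1.0, variance=1.0, noise=1e-2):
    # First-order auto-regressive GP prior over the latent trajectory X (T x q):
    # x_t is modelled as a GP function of x_{t-1}. Returns -log p(x_2..T | x_1..T-1).
    Xin, Xout = X[:-1], X[1:]
    sq = np.sum(Xin**2, 1)[:, None] + np.sum(Xin**2, 1)[None, :] - 2 * Xin @ Xin.T
    K = variance * np.exp(-0.5 * sq / lengthscale**2) + noise * np.eye(len(Xin))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Xout))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    T1, q = Xout.shape
    return 0.5 * (q * T1 * np.log(2 * np.pi) + q * log_det + np.sum(Xout * alpha))
```

Adding this term to the shared GP-LVM objective encourages latent trajectories that are consistent with the temporal ordering of the training data.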
2. Continuous Optimization
[Law05] Neil D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783–1816, 2005.
[LC06] Neil D. Lawrence and Joaquin Quiñonero Candela. Local distance preservation in the GP-LVM through back constraints. In ICML, pages 513–520, 2006.
Figure 2: Hidden Markov model used to initialize inference with the most likely path through the training data.
[SGHR05] Aaron P. Shon, Keith Grochow, Aaron Hertzmann, and Rajesh P. N. Rao. Learning shared latent structure for image synthesis and robotic imitation. In NIPS, 2005.
[WFH05] Jack Wang, David J. Fleet, and Aaron Hertzmann. Gaussian process dynamical models. In NIPS, 2005.
Acknowledgment: We would like to thank Ankur Agarwal for sharing his code and data, and Guido Sanguinetti and Nathaniel King for useful discussions. This work was supported by the PASCAL Network, EPSRC and Sharp Laboratories Europe.