Computer Vision
CS 188: Artificial Intelligence
Advanced Applications: Computer Vision and Robotics*
Pieter Abbeel, Dan Klein
University of California, Berkeley

Object Detection

Object Detection Approach 1: HOG + SVM
Features and Generalization
[Figure: an image and its HOG (histogram of oriented gradients) representation; Dalal and Triggs, 2005]
Training
§ Round 1
  § Training set:
    § Positive examples: from labeling
    § Negative examples: random patches
  → preliminary SVM
§ Round 2 ("bootstrapping" or "mining hard negatives"; see the sketch below)
  § Training set:
    § Positive examples: from labeling
    § Negative examples: patches that score >= -1 under the preliminary SVM
  → final SVM
[Girshick, Felzenszwalb, McAllester]
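A minimal sketch of this two-round training loop, not the exact pipeline from the slide: it assumes scikit-learn's LinearSVC as the linear classifier, and the helpers extract_hog and sample_random_patches are hypothetical placeholders; the score threshold of -1 follows the slide.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_with_hard_negatives(pos_patches, background_images,
                              extract_hog, sample_random_patches,
                              n_random_neg=5000, hard_neg_threshold=-1.0):
    """Two-round HOG + SVM training: random negatives, then mined hard negatives."""
    X_pos = np.array([extract_hog(p) for p in pos_patches])

    # Round 1: positives from labeling, negatives = random background patches.
    neg_patches = sample_random_patches(background_images, n_random_neg)
    X_neg = np.array([extract_hog(p) for p in neg_patches])
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
    svm = LinearSVC(C=0.01).fit(X, y)            # preliminary SVM

    # Round 2 ("bootstrapping"): keep only negatives the preliminary SVM
    # does not confidently reject (decision score >= -1), i.e. hard negatives.
    scores = svm.decision_function(X_neg)
    X_hard = X_neg[scores >= hard_neg_threshold]
    X = np.vstack([X_pos, X_hard])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_hard))])
    return LinearSVC(C=0.01).fit(X, y)           # final SVM
```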
State-of-the-art Results
[Figure: example detections: person, car, horse, sofa, bottle, cat; Girshick, Felzenszwalb, McAllester]

Object Detection Approach 2: Deep Learning
How Many Computers to Identify a Cat?
"Google Brain" [Le, Ng, Dean, et al., 2012]

Perceptron
[Figure: a single perceptron unit: inputs f1, f2, f3 are weighted by w1, w2, w3, summed (Σ), and passed through a >0? threshold]
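As a toy illustration of the diagram (the feature values and weights below are made up), a single perceptron unit computes a weighted sum of its inputs and thresholds it at zero:

```python
import numpy as np

def perceptron_unit(f, w):
    """One perceptron: weighted sum of the inputs, then a hard >0 threshold."""
    return float(np.dot(w, f) > 0)

# Example: three features f1, f2, f3 and weights w1, w2, w3.
f = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.5, 0.25])
print(perceptron_unit(f, w))   # 1.0, since 0.5 - 0.5 + 0.5 = 0.5 > 0
```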
Two-Layer Neural Network

N-Layer Neural Network
[Figures: a two-layer network (inputs f1, f2, f3 connected through weights w11 ... w33 to a layer of Σ / >0? units, then through w1, w2, w3 to an output unit) and an N-layer network with several such layers in sequence]
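As a hedged sketch of what these diagrams compute (layer sizes and weights below are made up, not an actual trained network): each layer applies a weight matrix followed by a hard threshold, and layers are chained so that each layer's output is the next layer's input.

```python
import numpy as np

def layer(x, W):
    """One layer of Σ / >0? units: linear combination, then hard threshold."""
    return (W @ x > 0).astype(float)

def n_layer_network(x, weight_matrices):
    """Stack layers; the output of each layer is the input to the next."""
    for W in weight_matrices:
        x = layer(x, W)
    return x

# Two-layer example: 3 inputs -> 3 hidden units -> 1 output.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # plays the role of w11 ... w33 in the diagram
W2 = rng.normal(size=(1, 3))
print(n_layer_network(np.array([1.0, -0.5, 0.3]), [W1, W2]))
```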
Hill Climbing
§ Simple, general idea:
  § Start wherever
  § Repeat: move to the best neighboring state
  § If no neighbors better than current, quit
  § Neighbors = small perturbations of w
§ Many local optima
  → How to find a good local optimum? (a code sketch of the basic hill-climbing loop appears below)

Auto-Encoder (Crude Idea Sketch)
§ Property: the network is trained so that its output reproduces its input (f1, f2, f3 in; f1, f2, f3 out)
[Figure: auto-encoder network: inputs f1, f2, f3, a hidden layer of Σ / >0? units, outputs f1, f2, f3]
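A minimal sketch of the hill-climbing weight search described two slides above, assuming some score function over weight vectors w; the neighbor count, step size, and toy objective are placeholders.

```python
import numpy as np

def hill_climb(score, w, n_neighbors=20, step=0.1, max_iters=1000, seed=0):
    """Start wherever; repeatedly move to the best small perturbation of w;
    stop when no neighbor scores better than the current point (local optimum)."""
    rng = np.random.default_rng(seed)
    best = score(w)
    for _ in range(max_iters):
        neighbors = [w + step * rng.normal(size=w.shape) for _ in range(n_neighbors)]
        scores = [score(v) for v in neighbors]
        i = int(np.argmax(scores))
        if scores[i] <= best:        # no better neighbor: quit
            break
        w, best = neighbors[i], scores[i]
    return w, best

# Toy example: maximize -||w - 3||^2 (optimum near w = [3, 3]).
w_opt, s = hill_climb(lambda w: -np.sum((w - 3.0) ** 2), np.zeros(2))
print(w_opt, s)
```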
Training Procedure: Stacked Auto-Encoder
§ Auto-encoder
  § Layer 1 = "compressed" version of input layer
§ Stacked auto-encoder
  § For every image, make a compressed image (= layer 1 response to the image)
  § Learn Layer 2 by using the compressed images as input, and also as the output to be predicted
  § Repeat similarly for Layer 3, 4, etc. (a greedy layer-wise sketch follows below)
§ Some details left out
  § Typically, in between layers, responses get agglomerated from several neurons ("pooling" / "complex cells")
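A hedged sketch of the greedy layer-wise procedure above, using scikit-learn's MLPRegressor as a stand-in single-layer auto-encoder; the pooling between layers mentioned above is omitted, and the data and layer sizes are made up.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_stacked_autoencoder(X, layer_sizes):
    """Greedy layer-wise training: each layer is an auto-encoder trained to
    reproduce its own input; its hidden ("compressed") response then becomes
    the input (and target) for the next layer."""
    encoders = []
    for size in layer_sizes:
        ae = MLPRegressor(hidden_layer_sizes=(size,), activation='relu',
                          max_iter=2000, random_state=0)
        ae.fit(X, X)                       # input is also the output to be predicted
        # Compressed version of the input = hidden-layer response (ReLU units).
        X = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])
        encoders.append(ae)
    return encoders

# Toy data: 200 "images" with 16 pixels, compressed to 8 units, then 4.
X = np.random.default_rng(0).normal(size=(200, 16))
encoders = train_stacked_autoencoder(X, layer_sizes=[8, 4])
```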
Final Result: Trained Neural Network
[Figure: the trained deep network: inputs f1, f2, ..., fN feed several successive layers of Σ / >0? units]

Robotics
Robotic Helicopters

Motivating Example
Autonomous Helicopter Flight
How do we execute a task like this?
Autonomous Helicopter Setup
[Diagram: helicopter with an on-board inertial measurement unit (IMU); position sensed externally; control commands sent out to the helicopter]
§ Key challenges:
  § Track helicopter position and orientation during flight
  § Decide on control inputs to send to helicopter
HMM for Tracking the Helicopter
§ State: s = (x, y, z, φ, θ, ψ, ẋ, ẏ, ż, φ̇, θ̇, ψ̇) (position, orientation, and their velocities)
§ Measurements: [observation update]
  § 3-D coordinates from vision, 3-axis magnetometer, 3-axis gyro, 3-axis accelerometer
§ Transitions (dynamics): [time elapse update]
  § s_{t+1} = f(s_t, a_t) + w_t
  § f: encodes helicopter dynamics, w: noise

Helicopter MDP
§ State: s = (x, y, z, φ, θ, ψ, ẋ, ẏ, ż, φ̇, θ̇, ψ̇)
§ Actions (control inputs):
  § a_lon: main rotor longitudinal cyclic pitch control (affects pitch rate)
  § a_lat: main rotor latitudinal cyclic pitch control (affects roll rate)
  § a_coll: main rotor collective pitch (affects main rotor thrust)
  § a_rud: tail rotor collective pitch (affects tail rotor thrust)
§ Transitions (dynamics): s_{t+1} = f(s_t, a_t) + w_t (a simulation sketch follows below)
  [f encodes helicopter dynamics]
  [w is a probabilistic noise model]
§ Can we solve the MDP yet?
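A toy simulation sketch of the transition model s_{t+1} = f(s_t, a_t) + w_t. The dynamics function f_placeholder below is an invented stand-in (the real f is nonlinear and fit to flight data), and the Gaussian noise covariance is illustrative.

```python
import numpy as np

# State s = (x, y, z, phi, theta, psi, plus their time derivatives): 12 numbers.
# Action a = (a_lon, a_lat, a_coll, a_rud): 4 control inputs.

def step(s, a, f, noise_cov, rng):
    """One MDP transition: s_{t+1} = f(s_t, a_t) + w_t, with w_t ~ N(0, noise_cov)."""
    w = rng.multivariate_normal(np.zeros(len(s)), noise_cov)
    return f(s, a) + w

def f_placeholder(s, a, dt=0.1):
    """Placeholder dynamics: positions/angles advance by their velocities;
    velocities respond linearly to the controls. Real helicopter dynamics
    are nonlinear and learned from data."""
    s_next = s.copy()
    s_next[:6] += dt * s[6:]                               # integrate velocities
    s_next[6:] += dt * (np.tile(a, 3)[:6] - 0.1 * s[6:])   # toy control response
    return s_next

rng = np.random.default_rng(0)
s = np.zeros(12)
a = np.array([0.1, 0.0, 0.5, 0.05])        # (a_lon, a_lat, a_coll, a_rud)
s = step(s, a, f_placeholder, 1e-4 * np.eye(12), rng)
```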
Problem: What’s the Reward?
Hover
§ Reward for hovering:
[Ng et al., 2004]
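The actual hover reward is given as a formula on the slide; as a hedged illustration only, in the spirit of Ng et al. (2004), a quadratic penalty on deviation from the target hover state might look like this (the per-coordinate weights are made up):

```python
import numpy as np

def hover_reward(s, s_target, weights):
    """Quadratic penalty on deviation from the desired hover state
    (position, orientation, velocities); weights are illustrative."""
    return -float(np.dot(weights, (s - s_target) ** 2))

s_target = np.zeros(12)                 # hover in place at the origin
weights = np.ones(12)                   # placeholder per-coordinate penalties
print(hover_reward(np.full(12, 0.1), s_target, weights))  # about -0.12
```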
Problem: What’s the Reward?
Flips (?)
§ Rewards for "Flip"?
§ Problem: what's the target trajectory?
§ Just write it down by hand?
Helicopter Apprenticeship?

Demonstrations
Learning a Trajectory
Probabilistic Alignment using a Bayes' Net
[Figure: Bayes' net with a hidden trajectory; Demo 1 and Demo 2 are observed sequences generated from it]
• HMM-like generative model
  – Dynamics model used as HMM transition model
  – Demos are observations of hidden trajectory
• Problem: how do we align observations to hidden trajectory?
  – Dynamic Time Warping (Needleman & Wunsch, 1970; Sakoe & Chiba, 1978) [a sketch follows below]
  – Extended Kalman filter / smoother
Abbeel, Coates, Ng, IJRR 2010
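A minimal sketch of dynamic time warping for aligning one demonstration to a reference trajectory; the distance measure and toy 1-D trajectories are illustrative, and the full system instead embeds alignment in the probabilistic model above (with the EKF smoother).

```python
import numpy as np

def dtw_align(ref, demo):
    """Classic dynamic time warping: fill a cost matrix, then backtrack the path."""
    n, m = len(ref), len(demo)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - demo[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack the minimum-cost warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return D[n, m], path[::-1]

# Toy 1-D trajectories sampled at different speeds.
ref = np.array([[0.], [1.], [2.], [3.], [4.]])
demo = np.array([[0.], [0.], [1.], [2.], [2.], [3.], [4.]])
cost, path = dtw_align(ref, demo)
print(cost, path)
```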
Aligned Demonstrations
Abbeel, Coates, Ng, IJRR 2010
Alignment of Samples
§ Result: inferred sequence is much cleaner!
Legged Locomotion
Final Behavior
[Abbeel, Coates, Quigley, Ng, 2010]
Quadruped
Experimental setup
§ Demonstrate path across the "training terrain"
§ Run apprenticeship to learn the reward function
§ Receive "testing terrain" height map
§ Low-level control problem: moving a foot into a new location → search with successor function ~ moving the motors
§ High-level control problem: where should we place the feet?
§ Reward function R(s) = w · f(s) [25 features] (a toy sketch follows below)
[Kolter, Abbeel & Ng, 2008]
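A toy sketch of the linear reward R(s) = w · f(s): the feature function and weight values below are invented placeholders (not the 25 features of Kolter, Abbeel & Ng, 2008); the high-level controller scores candidate footholds with it.

```python
import numpy as np

def reward(s, w, features):
    """Linear reward R(s) = w . f(s) over terrain/foothold features."""
    return float(np.dot(w, features(s)))

def features(s):
    """Placeholder feature function: e.g., local height, slope, roughness."""
    height, slope, roughness = s
    return np.array([height, slope, roughness, slope * roughness])

w = np.array([-0.2, -1.0, -0.5, -0.3])   # learned via apprenticeship (placeholder values)

# High-level control: pick the candidate foothold with the highest reward.
candidates = [np.array([0.1, 0.05, 0.02]), np.array([0.4, 0.30, 0.20])]
best = max(candidates, key=lambda s: reward(s, w, features))
print(best)
```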
Without learning
§ Find the optimal policy with respect to the learned reward function for crossing the testing terrain. [Kolter, Abbeel & Ng, 2008]
With learned reward function
Personal Robotics
PR1 (tele-op)
PR2 (autonomous)
PR2 (autonomous)
DARPA Robotics Challenge
§ Disaster response (e.g., Fukushima)
  § E.g., get into car, drive it, get out, open door, enter building, climb ladder, traverse industrial walkway, use tool to break a panel, locate and close a valve, replace a cooling pump
§ Competition / Prizes
  § Simulation competition (June 2013)
    § Prize: Petman
  § Real robot (Petman) competition (November 2014)
    § Prize: $2M

Next Time
§ AI for games
§ Final Contest results
§ Where to go next to learn more about AI