CS 188: Artificial Intelligence
Advanced Applications: Computer Vision and Robotics *
Instructor: Pieter Abbeel, University of California, Berkeley
Slides by Dan Klein and Pieter Abbeel
Computer Vision
Object Detection
Object Detection Approach 1: HOG + SVM
Features and Generalization
[Dalal and Triggs, 2005]
Features and Generalization
[Figure: input image and its HOG representation]
Training
§ Round 1
  § Training set =
    § Positive examples: from labeling
    § Negative examples: random patches
  → preliminary SVM
§ Round 2 ("bootstrapping" or "mining hard negatives")
  § Training set =
    § Positive examples: from labeling
    § Negative examples: patches that have score >= -1
  → final SVM
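A minimal code sketch of this two-round procedure, assuming labeled positive crops, a pool of object-free background images, and scikit-image / scikit-learn for the HOG descriptor and linear SVM; all helper names and parameter values here are illustrative, not the published pipeline.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def features(patch):
    # HOG descriptor of a fixed-size grayscale patch
    return hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def random_patches(images, n, size=(128, 64), seed=0):
    # Random background windows (round-1 negatives and the mining pool)
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n):
        img = images[rng.integers(len(images))]
        y = rng.integers(img.shape[0] - size[0])
        x = rng.integers(img.shape[1] - size[1])
        out.append(img[y:y + size[0], x:x + size[1]])
    return out

def train_detector(positives, backgrounds):
    # Round 1: positives from labeling, negatives = random patches
    pos = [features(p) for p in positives]
    neg = [features(p) for p in random_patches(backgrounds, 10 * len(pos))]
    X = np.array(pos + neg)
    y = np.array([1] * len(pos) + [0] * len(neg))
    svm = LinearSVC(C=0.01).fit(X, y)                      # preliminary SVM

    # Round 2 ("mining hard negatives"): keep background patches scoring >= -1
    pool = [features(p) for p in random_patches(backgrounds, 10000, seed=1)]
    hard = [f for f in pool if svm.decision_function([f])[0] >= -1]
    if hard:
        X = np.vstack([X, np.array(hard)])
        y = np.concatenate([y, np.zeros(len(hard))])
    return LinearSVC(C=0.01).fit(X, y)                     # final SVM
```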
State-of-the-art Results: sofa, bottle, cat [Girshick, Felzenszwalb, McAllester]
State-of-the-art Results: person, car, horse [Girshick, Felzenszwalb, McAllester]
Object Detection Approach 2: Deep Learning
How Many Computers to Identify a Cat?
“Google Brain” [Le, Ng, Dean, et al, 2012]
Perceptron
[Diagram: inputs f1, f2, f3 weighted by w1, w2, w3, summed (Σ), then thresholded (>0?).]
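In code, the perceptron above is just a weighted sum followed by a hard threshold; a minimal sketch (names illustrative):

```python
import numpy as np

def perceptron_output(f, w):
    # f = features (f1, f2, f3, ...), w = weights (w1, w2, w3, ...)
    return float(np.dot(w, f) > 0)   # Σ w_i f_i, then the >0? threshold
```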
Two-Layer Neural Network
[Diagram: inputs f1, f2, f3 feed three hidden units through weights w11…w33; each hidden unit sums (Σ) and thresholds (>0?), and the hidden outputs feed a final Σ / >0? unit through weights w1, w2, w3.]
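A sketch of the forward pass that diagram depicts, keeping the hard >0 threshold from the slides (real networks use a smooth nonlinearity so gradients exist); shapes and names are illustrative:

```python
import numpy as np

def two_layer_output(f, W1, w2):
    # f: inputs (3,), W1: hidden weights w11..w33 (3, 3), w2: output weights (3,)
    hidden = (W1 @ f > 0).astype(float)    # each hidden unit: Σ then >0?
    return float(np.dot(w2, hidden) > 0)   # output unit: Σ then >0?
```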
N-Layer Neural Network
[Diagram: inputs f1, f2, f3 feed many stacked layers of Σ / >0? units.]
Hill Climbing
§ Simple, general idea:
  § Start wherever
  § Repeat: move to the best neighboring state
  § If no neighbors better than current, quit
  § Neighbors = small perturbations of w
§ Property
  § Many local optima
--> How to find a good local optimum?
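A minimal sketch of hill climbing over the weight vector w, with neighbors generated as small random perturbations; the `score` function stands in for whatever training objective is being maximized, and all parameter values are illustrative:

```python
import numpy as np

def hill_climb(score, w, n_neighbors=20, step=0.01, max_iters=1000, seed=0):
    # Start wherever; repeatedly move to the best neighboring weight vector.
    rng = np.random.default_rng(seed)
    for _ in range(max_iters):
        neighbors = [w + step * rng.standard_normal(w.shape)
                     for _ in range(n_neighbors)]
        best = max(neighbors, key=score)
        if score(best) <= score(w):
            return w              # no neighbor is better: a local optimum
        w = best
    return w
```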
Auto-Encoder (Crude Idea Sketch)
[Diagram: inputs f1, f2, f3 feed a hidden layer of Σ / >0? units whose outputs are trained to reconstruct f1, f2, f3.]
Training Procedure: Stacked Auto-Encoder
§ Auto-encoder
  § Layer 1 = "compressed" version of input layer
§ Stacked Auto-encoder
  § For every image, make a compressed image (= layer 1 response to image)
  § Learn Layer 2 by using compressed images as input, and as output to be predicted
  § Repeat similarly for Layer 3, 4, etc.
§ Some details left out
  § Typically in between layers responses get agglomerated from several neurons ("pooling" / "complex cells")
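A crude numpy sketch of this greedy layer-wise procedure, assuming a simple sigmoid auto-encoder trained by gradient descent (the real systems are far larger and include pooling between layers); every name and hyperparameter is illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200, seed=0):
    # One layer learns to reconstruct its own input through a bottleneck.
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((X.shape[1], n_hidden))   # encoder
    V = 0.01 * rng.standard_normal((n_hidden, X.shape[1]))   # decoder
    for _ in range(epochs):
        H = sigmoid(X @ W)                 # "compressed" version of the input
        err = H @ V - X                    # reconstruction error
        grad_V = H.T @ err / len(X)
        grad_W = X.T @ ((err @ V.T) * H * (1 - H)) / len(X)
        V -= lr * grad_V
        W -= lr * grad_W
    return W

def stacked_autoencoder(X, layer_sizes):
    # Greedy stacking: each layer compresses the previous layer's responses.
    weights = []
    for n_hidden in layer_sizes:
        W = train_autoencoder(X, n_hidden)
        weights.append(W)
        X = sigmoid(X @ W)                 # compressed data feeds the next layer
    return weights
```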
Final Result: Trained Neural Network
[Diagram: inputs f1, f2, …, fN feed stacked layers of Σ / >0? units that produce the final outputs.]
Final Result: Trained Neural Network
[Diagram repeated from the previous slide.]
Robotics
Robotic Helicopters
Motivating Example
§ How do we execute a task like this?
Autonomous Helicopter Flight
§ Key challenges:
  § Track helicopter position and orientation during flight
  § Decide on control inputs to send to helicopter
Autonomous Helicopter Setup
§ On-board inertial measurement unit (IMU)
§ Position
§ Send out controls to helicopter
HMM for Tracking the Helicopter
§ State: s = (x, y, z, φ, θ, ψ, ẋ, ẏ, ż, φ̇, θ̇, ψ̇)
§ Measurements: [observation update]
  § 3-D coordinates from vision, 3-axis magnetometer, 3-axis gyro, 3-axis accelerometer
§ Transitions (dynamics): [time elapse update]
  § st+1 = f(st, at) + wt   [f encodes helicopter dynamics, w is noise]
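One simple way to realize these two updates for a continuous state is a particle filter, sketched below. The actual system uses Kalman-style filtering; the dynamics `f`, noise sampler, and measurement likelihood are passed in as assumed helpers rather than taken from the papers.

```python
import numpy as np

def elapse_time(particles, a, f, sample_noise):
    # Time elapse update: push each particle (state vector) through s' = f(s, a) + w
    return np.array([f(s, a) + sample_noise() for s in particles])

def observe(particles, z, measurement_likelihood, seed=0):
    # Observation update: reweight particles by P(z | s), then resample
    weights = np.array([measurement_likelihood(z, s) for s in particles])
    weights = weights / weights.sum()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```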
Helicopter MDP
§ State: s = (x, y, z, φ, θ, ψ, ẋ, ẏ, ż, φ̇, θ̇, ψ̇)
§ Actions (control inputs):
  § alon: Main rotor longitudinal cyclic pitch control (affects pitch rate)
  § alat: Main rotor latitudinal cyclic pitch control (affects roll rate)
  § acoll: Main rotor collective pitch (affects main rotor thrust)
  § arud: Tail rotor collective pitch (affects tail rotor thrust)
§ Transitions (dynamics):
  § st+1 = f(st, at) + wt   [f encodes helicopter dynamics] [w is a probabilistic noise model]
§ Can we solve the MDP yet?
Problem: What’s the Reward?
§ Reward for hovering:
Hover
[Ng et al, 2004]
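A common form for a hover reward, and roughly what this line of work penalizes, is a negative weighted quadratic deviation from the target hover state. The sketch below is an assumption about the shape of the reward, not the exact formula from [Ng et al, 2004], and the weights are illustrative:

```python
import numpy as np

def hover_reward(s, s_target, weights):
    # Penalize squared deviation of position, orientation, and velocities
    # from the desired hover state; larger weights = more important dimensions.
    d = np.asarray(s) - np.asarray(s_target)
    return -float(np.dot(weights, d ** 2))
```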
Problem: What’s the Reward?
§ Rewards for “Flip”?
  § Problem: what’s the target trajectory?
  § Just write it down by hand?
Flips (?)
Helicopter Apprenticeship?
Demonstrations
Learning a Trajectory
[Diagram: hidden trajectory with Demo 1 and Demo 2 as observed sequences.]
• HMM-like generative model
  – Dynamics model used as HMM transition model
  – Demos are observations of hidden trajectory
• Problem: how do we align observations to hidden trajectory?
Abbeel, Coates, Ng, IJRR 2010
Probabilistic Alignment using a Bayes’ Net
[Diagram: Bayes’ net relating the hidden trajectory to the Demo 1 and Demo 2 observations.]
§ Dynamic Time Warping (Needleman & Wunsch, 1970; Sakoe & Chiba, 1978)
§ Extended Kalman filter / smoother
Abbeel, Coates, Ng, IJRR 2010
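For reference, the classic dynamic time warping recurrence mentioned above looks like the sketch below; the helicopter work embeds alignment in an EKF-based probabilistic model rather than using plain DTW, and the distance function here is just an illustrative default.

```python
import numpy as np

def dtw_cost(demo_a, demo_b, dist=lambda a, b: np.linalg.norm(a - b)):
    # Cost of the best monotone alignment of two demonstration trajectories
    # (sequences of state vectors); backtracking through D recovers the alignment.
    n, m = len(demo_a), len(demo_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(demo_a[i - 1], demo_b[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```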
Aligned Demonstrations
Alignment of Samples
§ Result: inferred sequence is much cleaner!
Final Behavior
[Abbeel, Coates, Quigley, Ng, 2010]
Legged Locomotion
Quadruped
§ Low-level control problem: moving a foot into a new location → search with successor function ~ moving the motors
§ High-level control problem: where should we place the feet?
§ Reward function R(s) = w · f(s)   [25 features]
[Kolter, Abbeel & Ng, 2008]
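The learned reward is just a linear function of hand-designed features; a minimal sketch, where `features` is a hypothetical function mapping a state (e.g. a candidate foot placement plus local terrain) to its 25-dimensional feature vector:

```python
import numpy as np

def reward(s, w, features):
    # R(s) = w · f(s): w is learned by apprenticeship learning,
    # f(s) is a vector of (here, 25) hand-designed terrain/footstep features.
    return float(np.dot(w, features(s)))
```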
Experimental setup
§ Demonstrate path across the “training terrain”
§ Run apprenticeship to learn the reward function
§ Receive “testing terrain” (height map)
§ Find the optimal policy with respect to the learned reward function for crossing the testing terrain. [Kolter, Abbeel & Ng, 2008]
Without learning
With learned reward function
Next Time
§ Final Contest results
§ Robot butlers
§ Where to go next to learn more about AI