CS 188: Artificial Intelligence Computer Vision


CS 188: Artificial Intelligence

Advanced Applications: Computer Vision and Robotics*

Pieter Abbeel, Dan Klein University of California, Berkeley

Computer Vision

Object Detection

Object Detection Approach 1: HOG + SVM

Features and Generalization

[Dalal and Triggs, 2005]

Features and Generalization

Image

HoG

Training

§  Round 1
   §  Training set =
      §  Positive examples: from labeling
      §  Negative examples: random patches
   →  preliminary SVM

§  Round 2 (“bootstrapping” or “mining hard negatives”)
   §  Training set =
      §  Positive examples: from labeling
      §  Negative examples: patches that the preliminary SVM scores >= -1
   →  final SVM
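The two training rounds above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Dalal and Triggs implementation: the "SVM" is a linear model trained by plain subgradient descent on the hinge loss, the features are toy 2-D points with a constant bias feature rather than HOG descriptors, and the names `score`, `train_linear_svm`, and `mine_hard_negatives` are hypothetical.

```python
import random

def score(w, x):
    """Linear score w . x (the last feature is a constant 1 acting as a bias)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(pos, neg, epochs=200, lr=0.01, reg=0.01, seed=0):
    """Subgradient descent on the L2-regularized hinge loss (stand-in SVM solver)."""
    rng = random.Random(seed)
    data = [(x, 1) for x in pos] + [(x, -1) for x in neg]
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            violated = y * score(w, x) < 1  # example is inside the margin
            for i in range(len(w)):
                grad = reg * w[i] - (y * x[i] if violated else 0.0)
                w[i] -= lr * grad
    return w

def mine_hard_negatives(w, candidates, threshold=-1.0):
    """Keep the negative patches that the current model scores >= threshold."""
    return [x for x in candidates if score(w, x) >= threshold]

# Toy data (assumption): positives cluster near (2, 2), most negatives near
# (-2, -2), and a few "hard" negatives sit near (0.5, 0.5).
rng = random.Random(1)

def patch(cx, cy):
    return [cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3), 1.0]

positives = [patch(2.0, 2.0) for _ in range(30)]
all_negatives = ([patch(-2.0, -2.0) for _ in range(200)]
                 + [patch(0.5, 0.5) for _ in range(20)])

# Round 1: random negative patches give a preliminary SVM.
random_negatives = rng.sample(all_negatives, 30)
w_prelim = train_linear_svm(positives, random_negatives)

# Round 2: retrain on the negatives the preliminary SVM scores >= -1.
hard_negatives = mine_hard_negatives(w_prelim, all_negatives)
w_final = train_linear_svm(positives, hard_negatives)
print(len(hard_negatives), "hard negatives mined")
```

The second round spends its capacity on the negatives that the first model finds confusing, which is the point of bootstrapping.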

State-of-the-art Results

[Figure: example detections for sofa, bottle, cat]

[Girshick, Felzenszwalb, McAllester]

State-of-the-art Results

[Figure: example detections for person, car, horse]

[Girshick, Felzenszwalb, McAllester]

Object Detection Approach 2: Deep Learning

How Many Computers to Identify a Cat?

“Google Brain” [Le, Ng, Dean, et al, 2012]

Perceptron

[Figure: perceptron with inputs f1, f2, f3 weighted by w1, w2, w3, summed (Σ), and thresholded (>0?)]
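The perceptron computation (weighted sum, then a ">0?" test) can be written in a few lines. This is a minimal sketch; the function name `perceptron_output` and the numeric weights and features are illustrative assumptions.

```python
def perceptron_output(weights, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    activation = sum(w * f for w, f in zip(weights, features))
    return 1 if activation > 0 else 0

weights = [0.5, -1.0, 2.0]   # w1, w2, w3 (illustrative values)
features = [1.0, 1.0, 0.5]   # f1, f2, f3 (illustrative values)

# 0.5*1.0 - 1.0*1.0 + 2.0*0.5 = 0.5 > 0, so this prints 1
print(perceptron_output(weights, features))
```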

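Two-Layer Neural Network

[Figure: two-layer network: inputs f1, f2, f3 connect through weights w11 ... w33 to hidden Σ / >0? units, whose outputs are combined through weights w1, w2, w3 by an output Σ unit]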

N-Layer Neural Network

[Figure: inputs f1, f2, f3 pass through N layers of Σ / >0? units to an output Σ unit]
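Stacking layers of Σ / >0? units amounts to feeding each layer's outputs into the next. The sketch below is an illustration, not the network from the slides: it assumes every unit (including the output) applies the ">0?" threshold, uses no bias terms, and picks hand-set weights that compute XOR, a function no single perceptron can represent. The names `threshold_unit` and `forward` are hypothetical.

```python
def threshold_unit(weights, inputs):
    """One unit: weighted sum followed by the >0? test."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 if total > 0 else 0.0

def forward(layers, features):
    """Run features through a list of layers.

    Each layer is a list of weight rows; each row holds the incoming
    weights of one unit in that layer.
    """
    activations = list(features)
    for weight_rows in layers:
        activations = [threshold_unit(row, activations) for row in weight_rows]
    return activations

# Hand-set two-layer network computing XOR of two binary inputs:
# hidden unit 1 fires for (1, 0), hidden unit 2 fires for (0, 1),
# and the output unit fires if either hidden unit fired.
xor_layers = [
    [[1.0, -1.0], [-1.0, 1.0]],  # hidden layer
    [[1.0, 1.0]],                # output layer
]

for f in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(f, forward(xor_layers, f))
```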

Hill Climbing

§  Simple, general idea:
   §  Start wherever
   §  Repeat: move to the best neighboring state
   §  If no neighbors better than current, quit
   §  Neighbors = small perturbations of w

§  Property
   §  Many local optima

--> How to find a good local optimum?
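The hill-climbing loop over weights can be sketched as below. The objective is a made-up 1-D function with two local optima, chosen only to show that the starting point determines which optimum is reached; in the lecture's setting the objective would instead measure how well the network with weights w performs. The names `neighbors`, `hill_climb`, and `objective` are hypothetical.

```python
def neighbors(w, step=0.1):
    """All weight vectors that differ from w by +/- step in one coordinate."""
    result = []
    for i in range(len(w)):
        for delta in (-step, step):
            candidate = list(w)
            candidate[i] += delta
            result.append(candidate)
    return result

def hill_climb(objective, w, step=0.1, max_iters=10000):
    """Move to the best neighbor until no neighbor improves on the current w."""
    current, current_value = list(w), objective(w)
    for _ in range(max_iters):
        best = max(neighbors(current, step), key=objective)
        best_value = objective(best)
        if best_value <= current_value:
            break  # no neighbor is better than current: quit
        current, current_value = best, best_value
    return current, current_value

def objective(w):
    """Toy objective with a local optimum near w = -0.9 and a better one near w = 1.1."""
    return -(w[0] ** 2 - 1) ** 2 + 0.5 * w[0]

w_left, value_left = hill_climb(objective, [-2.0])
w_right, value_right = hill_climb(objective, [2.0])
print(w_left, value_left)
print(w_right, value_right)
```

Starting from -2.0 the search stops at the worse of the two optima, which is why the choice of starting point (or restarts) matters.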

Auto-Encoder (Crude Idea Sketch)

[Figure: auto-encoder diagram with inputs f1, f2, f3, Σ / >0? units, and outputs f1, f2, f3]

Training Procedure: Stacked Auto-Encoder

§  Auto-encoder
   §  Layer 1 = “compressed” version of input layer

§  Stacked auto-encoder
   §  For every image, make a compressed image (= layer 1 response to image)
   §  Learn Layer 2 by using compressed images as input, and as output to be predicted
   §  Repeat similarly for Layer 3, 4, etc.

§  Some details left out
   §  Typically, in between layers, responses get agglomerated from several neurons (“pooling” / “complex cells”)

Final Result: Trained Neural Network

[Figure: trained multi-layer network of Σ / >0? units over inputs f1, f2, ..., fN]

Robotics

Robotic Helicopters

Motivating Example

§  How do we execute a task like this?

Autonomous Helicopter Flight

§  Key challenges:
   §  Track helicopter position and orientation during flight
   §  Decide on control inputs to send to helicopter


Autonomous Helicopter Setup

§  On-board inertial measurement unit (IMU)
§  Position
§  Send out controls to helicopter

HMM for Tracking the Helicopter

§  State: $s = (x, y, z, \phi, \theta, \psi, \dot{x}, \dot{y}, \dot{z}, \dot{\phi}, \dot{\theta}, \dot{\psi})$

§  Measurements: [observation update]
   §  3-D coordinates from vision, 3-axis magnetometer, 3-axis gyro, 3-axis accelerometer

§  Transitions (dynamics): [time elapse update]
   §  $s_{t+1} = f(s_t, a_t) + w_t$
   §  f: encodes helicopter dynamics, w: noise
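The alternation of time elapse and observation updates can be illustrated with a 1-D Kalman filter. This is a sketch under strong simplifying assumptions, not the helicopter tracker: the state is a single number, the dynamics are the linear stand-in f(s, a) = s + a, and both the transition noise w and the measurement noise are Gaussian with known variances. The function names are hypothetical.

```python
def time_elapse_update(mean, var, action, process_var):
    """Predict through s_{t+1} = s_t + a_t + w_t, with w_t ~ N(0, process_var)."""
    return mean + action, var + process_var

def observation_update(mean, var, measurement, measurement_var):
    """Condition the Gaussian belief on a noisy direct measurement of the state."""
    gain = var / (var + measurement_var)
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var

# Belief over a 1-D position: mean 0.0, variance 1.0 (illustrative values).
mean, var = 0.0, 1.0

# Time elapse: apply control a = 1.0; uncertainty grows by the process noise.
mean, var = time_elapse_update(mean, var, action=1.0, process_var=0.5)

# Observation: a sensor reads 1.2; uncertainty shrinks.
mean, var = observation_update(mean, var, measurement=1.2, measurement_var=0.5)
print(mean, var)
```

The prediction step always increases the variance and the measurement step always decreases it, which mirrors the two HMM updates named on the slide.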

Helicopter MDP

§  State: $s = (x, y, z, \phi, \theta, \psi, \dot{x}, \dot{y}, \dot{z}, \dot{\phi}, \dot{\theta}, \dot{\psi})$

§  Actions (control inputs):
   §  a_lon: Main rotor longitudinal cyclic pitch control (affects pitch rate)
   §  a_lat: Main rotor latitudinal cyclic pitch control (affects roll rate)
   §  a_coll: Main rotor collective pitch (affects main rotor thrust)
   §  a_rud: Tail rotor collective pitch (affects tail rotor thrust)

§  Transitions (dynamics):
   §  $s_{t+1} = f(s_t, a_t) + w_t$ [f encodes helicopter dynamics] [w is a probabilistic noise model]

§  Can we solve the MDP yet?

Problem: What’s the Reward?

§  Reward for hovering:

Hover

[Ng et al, 2004]

Problem: What’s the Reward?

§  Rewards for “Flip”?
   §  Problem: what’s the target trajectory?
   §  Just write it down by hand?

Flips (?)


Helicopter Apprenticeship?


Demonstrations

Learning a Trajectory

[Figure: a hidden trajectory with two observed demonstrations, Demo 1 and Demo 2]

•  HMM-like generative model

–  Dynamics model used as HMM transition model
–  Demos are observations of hidden trajectory

•  Problem: how do we align observations to hidden trajectory?

Abbeel, Coates, Ng, IJRR 2010

Probabilistic Alignment using a Bayes’ Net

[Figure: Bayes’ net linking the hidden trajectory to Demo 1 and Demo 2]

§  Dynamic Time Warping (Needleman & Wunsch, 1970; Sakoe & Chiba, 1978)

§  Extended Kalman filter / smoother

Abbeel, Coates, Ng, IJRR 2010
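Dynamic time warping, one of the two ingredients listed above, finds the cheapest monotone alignment between two sequences that may run at different speeds. The sketch below is the textbook dynamic program on 1-D sequences with an absolute-difference cost; it is not the probabilistic alignment from the cited paper, and the demo values are made up.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Demo 2 is Demo 1 performed with pauses, so warping time aligns them exactly.
demo_1 = [0, 1, 2, 3, 2, 1, 0]
demo_2 = [0, 0, 1, 2, 3, 3, 2, 1, 0]
print(dtw_distance(demo_1, demo_2))  # 0.0
```

A pointwise comparison of the two demos would report a large mismatch, while the warped alignment shows they trace the same path.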

Aligned Demonstrations

Alignment of Samples

§  Result: inferred sequence is much cleaner!

Final Behavior

[Abbeel, Coates, Quigley, Ng, 2010]

Legged Locomotion

Quadruped

§  Low-level control problem: moving a foot into a new location → search with successor function ~ moving the motors
§  High-level control problem: where should we place the feet?
   §  Reward function R(s) = w · f(s) [25 features]

[Kolter, Abbeel & Ng, 2008]
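A linear reward of the form R(s) = w · f(s) makes foot placement a matter of scoring candidate states and taking the best one. The sketch below uses three made-up features instead of the 25 mentioned on the slide; the feature meanings, the weights, and the names `reward` and `best_foot_placement` are all assumptions for illustration.

```python
def reward(weights, features):
    """Linear reward R(s) = w . f(s)."""
    return sum(w * f for w, f in zip(weights, features))

def best_foot_placement(weights, candidates):
    """Return the name of the candidate placement with the highest reward.

    candidates maps a placement name to its feature vector f(s).
    """
    return max(candidates, key=lambda name: reward(weights, candidates[name]))

# Hypothetical features: [terrain slope, distance from an edge, forward progress]
weights = [-2.0, 1.0, 0.5]
candidates = {
    "flat_rock": [0.1, 0.8, 0.5],
    "steep_ledge": [0.9, 0.1, 0.7],
}

print(best_foot_placement(weights, candidates))
```

In the apprenticeship setting the weights are not hand-set as they are here; they are learned from demonstrations.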

Experimental setup

§  Demonstrate path across the “training terrain”
§  Run apprenticeship to learn the reward function
§  Receive “testing terrain”: a height map
§  Find the optimal policy with respect to the learned reward function for crossing the testing terrain

[Kolter, Abbeel & Ng, 2008]

Without learning

With learned reward function

Personal Robotics

PR1 (tele-op)

PR2 (autonomous)

PR2 (autonomous)

DARPA Robotics Challenge

§  Disaster response (e.g., Fukushima)
   §  E.g., get into car, drive it, get out, open door, enter building, climb ladder, traverse industrial walkway, use tool to break a panel, locate and close a valve, replace a cooling pump

§  Competition / Prizes
   §  Simulation competition (June 2013)
      §  Prize: Petman
   §  Real robot (Petman) competition (November 2014)
      §  Prize: $2M

Next Time

§  AI for games
§  Final Contest results
§  Where to go next to learn more about AI