CS 188: Artificial Intelligence

Advanced Applications: Computer Vision and Robotics*

Pieter Abbeel, Dan Klein
University of California, Berkeley

Computer Vision

Object Detection

Object Detection Approach 1: HOG + SVM

Features and Generalization

[Figure: an image and its HOG representation] [Dalal and Triggs, 2005]

Training
§  Round 1
   §  Training set =
      §  Positive examples: from labeling
      §  Negative examples: random patches
   →  preliminary SVM
§  Round 2 (“bootstrapping” or “mining hard negatives”)
   §  Training set =
      §  Positive examples: from labeling
      §  Negative examples: patches that have score >= -1
   →  final SVM

State-of-the-art Results
[Figure: detection results for sofa, bottle, cat]
[Girshick, Felzenszwalb, McAllester]
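The two training rounds on the Training slide can be sketched roughly as follows. This is a toy illustration, not the detector from the cited work: the linear SVM is fit with plain stochastic subgradient descent, and the 2-D "patch features", the function names, and all hyperparameters are assumptions made up for the example. Only the round structure (random negatives first, then negatives with score >= -1) comes from the slide.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(examples, epochs=100, lr=0.01, reg=0.01, seed=0):
    """Fit a linear SVM (hinge loss + L2 penalty) by stochastic subgradient descent.

    examples: list of (feature_vector, label) pairs with label in {+1, -1}.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(examples[0][0]), 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            violated = y * (dot(w, x) + b) < 1  # inside the margin or misclassified
            w = [wi * (1.0 - lr * reg) for wi in w]  # shrink: gradient of the L2 penalty
            if violated:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def score(model, x):
    w, b = model
    return dot(w, x) + b


def train_with_hard_negatives(positives, negative_pool, n_random, seed=0):
    rng = random.Random(seed)
    # Round 1: labeled positives vs. randomly sampled negative patches.
    random_negatives = rng.sample(negative_pool, n_random)
    round1 = [(x, +1) for x in positives] + [(x, -1) for x in random_negatives]
    preliminary = train_linear_svm(round1)
    # Round 2: keep only the negatives that the preliminary SVM scores >= -1.
    hard_negatives = [x for x in negative_pool if score(preliminary, x) >= -1]
    round2 = [(x, +1) for x in positives] + [(x, -1) for x in hard_negatives]
    final = train_linear_svm(round2)
    return preliminary, final, hard_negatives


# Toy 2-D "patch features" (made up for illustration).
positives = [[2.0 + 0.1 * i, 2.0 - 0.05 * i] for i in range(10)]
negative_pool = [[-0.2 - 0.1 * i, -0.3 - 0.07 * i] for i in range(40)]
preliminary, final, hard_negatives = train_with_hard_negatives(
    positives, negative_pool, n_random=10
)
```

In a real detector the feature vectors would be HOG descriptors of image windows, and the negative pool would be every window in the training images that does not overlap a labeled object.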

State-of-the-art Results
[Figure: detection results for person, car, horse]
[Girshick, Felzenszwalb, McAllester]

Object Detection Approach 2: Deep Learning

How Many Computers to Identify a Cat?
“Google Brain” [Le, Ng, Dean, et al, 2012]

Perceptron
[Figure: inputs f1, f2, f3 are multiplied by weights w1, w2, w3, summed (Σ), and thresholded (>0?)]
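The perceptron diagram computes a weighted sum of the features and checks whether it is positive. A minimal sketch, with the function name and the example numbers chosen only for illustration:

```python
def perceptron_output(weights, features):
    """Return 1 if the weighted sum of the features is > 0, else 0."""
    total = sum(w * f for w, f in zip(weights, features))
    return 1 if total > 0 else 0


weights = [1.0, -2.0, 0.5]                        # w1, w2, w3
fires = perceptron_output(weights, [1, 0, 1])     # 1.0 + 0.5 = 1.5
silent = perceptron_output(weights, [1, 1, 1])    # 1.0 - 2.0 + 0.5 = -0.5
```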

Two-Layer Neural Network
[Figure: inputs f1, f2, f3 feed three hidden Σ / >0? units through weights w11 ... w33; the hidden outputs feed one output Σ / >0? unit through weights w1, w2, w3]

N-Layer Neural Network
[Figure: inputs f1, f2, f3 feed a stack of layers of Σ / >0? units]
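A layered network of this kind just applies the perceptron computation repeatedly, feeding each layer's 0/1 outputs into the next. The sketch below assumes one list of weights per unit (a layout chosen for the example, not necessarily the slide's indexing) and uses a constant third input of 1 so that the units can have an offset. The hand-picked weights make a two-layer network compute XOR, which a single perceptron cannot.

```python
def threshold_layer(weight_rows, inputs):
    """Apply one layer of sum-and-threshold units; one weight row per unit."""
    return [
        1 if sum(w * x for w, x in zip(row, inputs)) > 0 else 0
        for row in weight_rows
    ]


def forward(layers, features):
    """Run the features through every layer in order."""
    activations = list(features)
    for weight_rows in layers:
        activations = threshold_layer(weight_rows, activations)
    return activations


# Inputs are (x1, x2, 1); the trailing 1 acts as a bias feature.
hidden = [
    [1, 1, -0.5],    # fires when x1 OR x2
    [-1, -1, 1.5],   # fires unless x1 AND x2
    [0, 0, 1],       # always fires: passes the bias on to the output unit
]
output = [[1, 1, -1.5]]  # fires when both of the first two hidden units fire
xor_net = [hidden, output]

xor_table = {(a, b): forward(xor_net, [a, b, 1])[0] for a in (0, 1) for b in (0, 1)}
```

Adding more entries to the `layers` list gives the N-layer case without changing `forward`.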

Hill Climbing
§  Simple, general idea:
   §  Start wherever
   §  Repeat: move to the best neighboring state
   §  If no neighbors better than current, quit
   §  Neighbors = small perturbations of w
§  Property
   §  Many local optima
--> How to find a good local optimum?

Auto-Encoder (Crude Idea Sketch)
[Figure: inputs f1, f2, f3 feed a layer of Σ / >0? units, whose outputs feed a second layer of Σ / >0? units that reproduces f1, f2, f3]
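The hill-climbing recipe on the Hill Climbing slide can be sketched as below. The slide only says that neighbors are small perturbations of w; sampling a fixed number of random perturbations per step, and the names `step` and `n_neighbors`, are assumptions of this example. The objective here is a made-up smooth function with a single peak, so the local-optimum problem the slide warns about does not show up.

```python
import random


def hill_climb(objective, w0, step=0.1, n_neighbors=20, max_iters=1000, seed=0):
    """Maximize objective(w) by repeatedly moving to the best sampled neighbor."""
    rng = random.Random(seed)
    current = list(w0)                      # start wherever
    current_value = objective(current)
    for _ in range(max_iters):
        # Neighbors = small random perturbations of the current weights.
        neighbors = [
            [wi + rng.uniform(-step, step) for wi in current]
            for _ in range(n_neighbors)
        ]
        best = max(neighbors, key=objective)
        best_value = objective(best)
        if best_value <= current_value:     # no neighbor is better: quit
            break
        current, current_value = best, best_value
    return current, current_value


def toy_objective(w):
    # Single peak at w = (1, -2); chosen only for illustration.
    return -((w[0] - 1.0) ** 2) - ((w[1] + 2.0) ** 2)


start = [0.0, 0.0]
w_final, value_final = hill_climb(toy_objective, start)
```

On an objective with many local optima, the result depends heavily on the starting point, which is the motivation for the layer-by-layer initialization discussed next.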

Training Procedure: Stacked Auto-Encoder
§  Auto-encoder
   §  Layer 1 = “compressed” version of input layer
§  Stacked Auto-encoder
   §  For every image, make a compressed image (= layer 1 response to image)
   §  Learn Layer 2 by using compressed images as input, and as output to be predicted
   §  Repeat similarly for Layer 3, 4, etc.
§  Some details left out
   §  Typically in between layers responses get agglomerated from several neurons (“pooling” / “complex cells”)
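The layer-by-layer procedure above can be sketched as follows. Several things here are assumptions of the example rather than anything stated on the slides: the hidden units are sigmoids instead of hard thresholds, each auto-encoder is trained by stochastic gradient descent on squared reconstruction error, the decoder is linear, and pooling is omitted. The point is only the stacking loop in `train_stacked`: train one auto-encoder, replace the data by its compressed codes, and repeat.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(enc, x):
    """Hidden response to x; each row holds a unit's weights followed by its bias."""
    return [sigmoid(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1]) for row in enc]


def decode(dec, h):
    """Linear reconstruction of the input from the hidden code."""
    return [sum(w * hi for w, hi in zip(row[:-1], h)) + row[-1] for row in dec]


def train_autoencoder(data, n_hidden, epochs=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_in)]
    for _ in range(epochs):
        for x in data:
            h = encode(enc, x)
            y = decode(dec, h)
            err = [yi - xi for yi, xi in zip(y, x)]  # gradient of 0.5 * ||y - x||^2 w.r.t. y
            # Backpropagate through the decoder and the sigmoid (using the old decoder).
            grad_h = [
                sum(err[i] * dec[i][j] for i in range(n_in)) * h[j] * (1.0 - h[j])
                for j in range(n_hidden)
            ]
            for i in range(n_in):
                for j in range(n_hidden):
                    dec[i][j] -= lr * err[i] * h[j]
                dec[i][-1] -= lr * err[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    enc[j][k] -= lr * grad_h[j] * x[k]
                enc[j][-1] -= lr * grad_h[j]
    return enc, dec


def train_stacked(data, hidden_sizes):
    """Greedy layer-wise training: each layer learns to compress the previous codes."""
    encoders, codes = [], data
    for n_hidden in hidden_sizes:
        enc, _ = train_autoencoder(codes, n_hidden)
        encoders.append(enc)
        codes = [encode(enc, x) for x in codes]  # "compressed images" for the next layer
    return encoders, codes


# Tiny made-up dataset of 4-pixel "images".
images = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
encoders, top_codes = train_stacked(images, hidden_sizes=[3, 2])
```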

Final Result: Trained Neural Network
[Figure: inputs f1, f2, …, fN feed a stack of layers of Σ / >0? units]

Robotics

Robotic Helicopters

Motivating Example
§  How do we execute a task like this?

Autonomous Helicopter Flight
§  Key challenges:
   §  Track helicopter position and orientation during flight
   §  Decide on control inputs to send to helicopter

Autonomous Helicopter Setup
[Figure labels: On-board inertial measurement unit (IMU); Position; Send out controls to helicopter]

HMM for Tracking the Helicopter
§  State: $s = (x, y, z, \phi, \theta, \psi, \dot{x}, \dot{y}, \dot{z}, \dot{\phi}, \dot{\theta}, \dot{\psi})$
§  Measurements: [observation update]
   §  3-D coordinates from vision, 3-axis magnetometer, 3-axis gyro, 3-axis accelerometer
§  Transitions (dynamics): [time elapse update]
   §  $s_{t+1} = f(s_t, a_t) + w_t$
   §  f: encodes helicopter dynamics, w: noise

Helicopter MDP
§  State: $s = (x, y, z, \phi, \theta, \psi, \dot{x}, \dot{y}, \dot{z}, \dot{\phi}, \dot{\theta}, \dot{\psi})$
§  Actions (control inputs):
   §  $a_{lon}$: Main rotor longitudinal cyclic pitch control (affects pitch rate)
   §  $a_{lat}$: Main rotor latitudinal cyclic pitch control (affects roll rate)
   §  $a_{coll}$: Main rotor collective pitch (affects main rotor thrust)
   §  $a_{rud}$: Tail rotor collective pitch (affects tail rotor thrust)
§  Transitions (dynamics):
   §  $s_{t+1} = f(s_t, a_t) + w_t$ [f encodes helicopter dynamics] [w is a probabilistic noise model]
§  Can we solve the MDP yet?
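The [time elapse update] and [observation update] steps on the tracking slide can be illustrated with a scalar Kalman filter. This is a deliberately simplified sketch: the state is a single number (say, altitude), the dynamics are assumed to be f(s, a) = s + a·dt, and both noise terms are assumed Gaussian. The real helicopter state has twelve components and nonlinear dynamics, so none of the numbers or function names below come from the slides.

```python
def time_elapse(mean, var, action, dt, process_var):
    """Predict step for s_{t+1} = f(s_t, a_t) + w_t with toy f(s, a) = s + a * dt."""
    return mean + action * dt, var + process_var


def observe(mean, var, measurement, meas_var):
    """Correct the belief with a noisy direct measurement of the state."""
    gain = var / (var + meas_var)            # how much to trust the measurement
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


# Belief: altitude ~ N(0, 1). Command a climb rate of 2 for half a time unit.
predicted_mean, predicted_var = time_elapse(0.0, 1.0, action=2.0, dt=0.5, process_var=0.1)

# A sensor with the same variance as the belief pulls the estimate halfway.
updated_mean, updated_var = observe(1.0, 1.0, measurement=3.0, meas_var=1.0)
```

The time elapse step always increases the variance, while the observation step always reduces it.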

Problem: What’s the Reward?

Hover

§  Reward for hovering:

[Ng et al, 2004]

Problem: What’s the Reward?

Flips (?)

§  Rewards for “Flip”?
   §  Problem: what’s the target trajectory?
   §  Just write it down by hand?

Helicopter Apprenticeship?

Demonstrations

Learning a Trajectory
[Figure: a hidden trajectory with Demo 1 and Demo 2 as observations]
•  HMM-like generative model
   –  Dynamics model used as HMM transition model
   –  Demos are observations of hidden trajectory
•  Problem: how do we align observations to hidden trajectory?

Probabilistic Alignment using a Bayes’ Net
[Figure: a hidden trajectory with Demo 1 and Demo 2 as observations]
§  Dynamic Time Warping (Needleman & Wunsch 1970, Sakoe & Chiba, 1978)
§  Extended Kalman filter / smoother

Abbeel, Coates, Ng, IJRR 2010
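Dynamic time warping, one of the two ingredients listed above, finds the cheapest monotone alignment between two sequences that may run at different speeds. The sketch below is the textbook dynamic program on 1-D sequences with absolute difference as the local cost; it is not the full procedure of the cited paper, which also estimates the hidden trajectory, and the function name and example data are made up.

```python
def dtw(a, b):
    """Return (total_cost, path) aligning sequences a and b.

    path is a list of (i, j) index pairs, in order, matching a[i] with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance a only, advance b only, advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Walk back from the end, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# The second "demo" lingers on the value 1 for an extra time step.
demo1 = [0, 1, 2]
demo2 = [0, 1, 1, 2]
total_cost, alignment = dtw(demo1, demo2)
```

Here the alignment matches `demo1[1]` to both `demo2[1]` and `demo2[2]`, absorbing the difference in timing at zero cost.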

Aligned Demonstra)ons

Abbeel, Coates, Ng, IJRR 2010

Alignment of Samples

§  Result: inferred sequence is much cleaner!

Final Behavior

[Abbeel, Coates, Quigley, Ng, 2010]

Legged Locomotion

Quadruped
§  Low-level control problem: moving a foot into a new location → search with successor function ~ moving the motors
§  High-level control problem: where should we place the feet?
   §  Reward function $R(s) = w \cdot f(s)$ [25 features]
[Kolter, Abbeel & Ng, 2008]

Experimental setup
§  Demonstrate path across the “training terrain”
§  Run apprenticeship to learn the reward function
§  Receive “testing terrain”: a height map
§  Find the optimal policy with respect to the learned reward function for crossing the testing terrain
[Kolter, Abbeel & Ng, 2008]

Without learning

With learned reward function
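The linear reward on the Quadruped slide scores a state by a weighted sum of its features. As a rough illustration of how such a reward can drive foot placement, the sketch below scores a few candidate footholds and picks the best one. The two features (slope and height change), the weights, and the candidate data are all hypothetical; the slide mentions 25 features but does not list them, and the weights would come from apprenticeship learning rather than being set by hand.

```python
def linear_reward(weights, features):
    """R = w . f: weighted sum of the feature values."""
    return sum(w * f for w, f in zip(weights, features))


def foothold_features(foothold):
    # Hypothetical features; the actual feature set is not given in the slides.
    return [foothold["slope"], foothold["height_change"]]


def best_foothold(weights, candidates):
    """Pick the candidate foothold with the highest linear reward."""
    return max(candidates, key=lambda c: linear_reward(weights, foothold_features(c)))


weights = [-1.0, -0.5]  # made-up weights: penalize steep and uneven footholds
candidates = [
    {"name": "rock edge", "slope": 0.8, "height_change": 0.3},
    {"name": "flat patch", "slope": 0.1, "height_change": 0.1},
    {"name": "ledge", "slope": 0.2, "height_change": 0.6},
]
choice = best_foothold(weights, candidates)
```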

Personal Robotics

PR1 (tele-op)

PR2 (autonomous)

DARPA Robotics Challenge
§  Disaster response (e.g., Fukushima)
   §  E.g., get into car, drive it, get out, open door, enter building, climb ladder, traverse industrial walkway, use tool to break a panel, locate and close a valve, replace a cooling pump
§  Competition / Prizes
   §  Simulation competition (June 2013)
      §  Prize: Petman
   §  Real robot (Petman) competition (November 2014)
      §  Prize: $2M

Next Time
§  AI for games
§  Final Contest results
§  Where to go next to learn more about AI