Reinforcement Learning - in a Nutshell

REINFORCEMENT LEARNING in a Nutshell

Abraham Nunes
Supervisor: T. Trappenberg

LEARNING OBJECTIVES

# Build a simple RL algorithm

# Identify the difference between learning and control

# Describe temporal-difference and actor-critic models

# Describe the difference between model-free and model-based RL

# Describe some findings from RL studies of depression

BUILDING A REINFORCEMENT LEARNING MODEL

CREATE A WORLD AND AN AGENT

[Diagram: the World and the Agent]

PROPERTIES OF THE WORLD

# States, s ∈ S

# Available actions, a ∈ A

# State transition dynamics, P(s_{t+1} | s_t, a_t)
  ◦ Stochastic vs. deterministic
  ◦ May or may not be action dependent

# Reward function, r : S → ℝ
  ◦ May also take the action a_t into account
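The properties above can be sketched as a small Python class. This is only an illustration under stated assumptions: the `ChainWorld` name, the 1-D chain layout, the `slip` probability, and the choice to reward only the rightmost state are all hypothetical and do not come from the slides.

```python
import random

class ChainWorld:
    """Hypothetical 1-D chain world: states 0..n-1, actions -1 (left) / +1 (right)."""

    def __init__(self, n_states=5, slip=0.1, seed=0):
        self.states = list(range(n_states))   # state set S
        self.actions = [-1, +1]               # action set A
        self.slip = slip                      # assumed chance that a move is reversed
        self.rng = random.Random(seed)

    def transition(self, s, a):
        """Sample s_{t+1} from P(s_{t+1} | s_t, a_t)."""
        if self.rng.random() < self.slip:     # stochastic dynamics when slip > 0
            a = -a
        # Clamp to the ends of the chain.
        return min(max(s + a, 0), len(self.states) - 1)

    def reward(self, s):
        """Reward function r : S -> R (assumed: 1.0 in the rightmost state, else 0.0)."""
        return 1.0 if s == self.states[-1] else 0.0
```

Setting `slip=0.0` gives the deterministic case; any value between 0 and 1 makes the transitions stochastic while keeping them action dependent.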

PROPERTIES OF THE AGENT

# State value, V(s), or state-action value, Q(s, a), "table"

# Learning rule for V(s) or Q(s, a)

# Control policy, π : S → A

# May or may not have an internal "model" of the world

# Goal: to find the policy π(s) that maximizes the total future reward
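A minimal sketch of an agent with these three ingredients (value table, learning rule, control policy) might look as follows. The `TabularAgent` name, the epsilon-greedy policy, and the specific learning rule are assumptions made for illustration. In particular, the rule here only tracks the immediate reward, so on its own it does not maximize total future reward.

```python
import random

class TabularAgent:
    """Hypothetical agent holding a Q(s, a) table."""

    def __init__(self, states, actions, alpha=0.1, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha          # learning rate
        self.epsilon = epsilon      # assumed exploration probability
        self.rng = random.Random(seed)
        # The "table": one value per (state, action) pair, initialised to 0.
        self.Q = {(s, a): 0.0 for s in states for a in self.actions}

    def policy(self, s):
        """Control policy pi : S -> A (epsilon-greedy, an assumed choice)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Greedy action; max() returns the first action on ties.
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def learn(self, s, a, r):
        """Assumed learning rule: move Q(s, a) a fraction alpha toward the reward r."""
        delta = r - self.Q[(s, a)]
        self.Q[(s, a)] += self.alpha * delta
```

This agent is model-free: it stores values only and keeps no internal model of the world's transitions or rewards.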

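AT EACH TIME STEP

[Diagram: at time step t, the World sends the current state s_t to the Agent]

AT EACH TIME STEP...

[Diagram: the Agent applies its policy, π(s_t) → a_t, and sends the action a_t to the World]

AT EACH TIME STEP...

[Diagram: the World returns the reward r and the next state s_{t+1} to the Agent]

Then the agent updates its values for V(s) (i.e. "learns")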

FOR YOUR FUTURE REFERENCE

procedure RLSimulate(· · ·)
    Instantiate world Ω, then agent Ψ
    Sample first state s ∈ S from Ω
    Sample first action a ∈ A from Ψ
    Ψ acts on Ω
    Ω produces reward r ∈ ℝ and next state s' for observation
    Ψ selects next action a'
    for t = 1 : T do
        Ψ learns/updates model Q(s, a), R(s, a, s'), T(s, a, s')
        s ← s', a ← a'
        Ψ acts on Ω
        Ω produces reward r and next state s' for observation
        Ψ selects next action a'
    end for
end procedure
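The procedure above can be turned into a short runnable sketch in Python. Several pieces are assumptions, since the slides leave them open: the world is a hypothetical deterministic five-state chain with reward 1 for arriving at the rightmost state, the agent is epsilon-greedy with random tie-breaking, and the learning step is a SARSA-style update of Q(s, a) only (no R or T model is learned).

```python
import random

def rl_simulate(n_states=5, T=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run the agent-world loop for T steps and return the learned Q table."""
    rng = random.Random(seed)
    actions = [-1, +1]                    # hypothetical action set: left / right
    goal = n_states - 1

    # World (assumed): deterministic chain, reward 1 when the next state is the goal.
    def world_step(s, a):
        s_next = min(max(s + a, 0), goal)
        r = 1.0 if s_next == goal else 0.0
        return r, s_next

    # Agent (assumed): tabular Q, epsilon-greedy policy, ties broken at random.
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    def select_action(s):
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(Q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if Q[(s, a)] == best])

    s = rng.randrange(n_states)           # sample first state s
    a = select_action(s)                  # sample first action a
    r, s_next = world_step(s, a)          # agent acts; world produces r and s'
    a_next = select_action(s_next)        # agent selects next action a'

    for _ in range(T):
        # Agent learns: SARSA-style update (an assumed rule, not from the slides).
        delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
        Q[(s, a)] += alpha * delta
        s, a = s_next, a_next             # s <- s', a <- a'
        r, s_next = world_step(s, a)      # agent acts; world produces r and s'
        a_next = select_action(s_next)    # agent selects next action a'
    return Q

Q = rl_simulate()
```

The loop body keeps the same order as the pseudocode: learn from the stored (s, a, r, s', a'), shift to the new state and action, act, observe, then choose the next action.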

LEARNING RULES

LEARNING RULE FORM

v = w · u

δ = r − v

then, w ← w + α δ u

Basically gradient descent:

1. Take the difference between reality and my prediction (δ)
2. Change weights by a small fraction (α) of the error
   ◦ α = "learning rate" or "step size" (usually
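The learning rule form above can be sketched in a few lines of plain Python. The function name `delta_rule_step` and the example inputs are hypothetical, chosen only to show the prediction, the error, and the weight change in order.

```python
def delta_rule_step(w, u, r, alpha=0.1):
    """One update of the rule: v = w . u, delta = r - v, w <- w + alpha * delta * u."""
    v = sum(wi * ui for wi, ui in zip(w, u))                    # prediction
    delta = r - v                                               # reality minus prediction
    w_new = [wi + alpha * delta * ui for wi, ui in zip(w, u)]   # small step on the weights
    return w_new, delta

# Repeatedly present the same input u with reward r = 1.
w = [0.0, 0.0]
u = [1.0, 0.0]
for _ in range(100):
    w, delta = delta_rule_step(w, u, r=1.0, alpha=0.1)
```

The link to gradient descent is that the squared error ½(r − w · u)² has gradient −δ u with respect to w, so stepping against the gradient with step size α gives exactly w ← w + α δ u. In the loop above only the first weight changes, because the second input component is zero, and the prediction moves toward r.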