With special structure, by analyzing one policy, obtain the performance of its neighboring policies
Performance gradient
[Figure: hill climbing along the gradient, from θ to θ + Δθ]
Queuing networks, Markov processes
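A minimal sketch of the idea, not taken from the slides: a hypothetical two-state Markov chain with transition matrix P(θ) = [[1-θ, θ], [b, 1-b]] and rewards f(θ) = (-cθ², 1). The names `b`, `c`, `performance`, `pa_gradient` and `hill_climb` are assumptions made for illustration. The gradient at θ is computed only from quantities of the current policy (its stationary distribution π and the potential g solving (I - P)g + ηe = f), via dη/dθ = π[(dP/dθ)g + df/dθ], and is then used for hill climbing.

```python
def stationary(theta, b):
    """Stationary distribution of P(theta) = [[1-theta, theta], [b, 1-b]]."""
    return (b / (theta + b), theta / (theta + b))


def rewards(theta, c):
    """Hypothetical rewards: an effort cost in state 0, unit reward in state 1."""
    return (-c * theta ** 2, 1.0)


def performance(theta, b=0.5, c=4.0):
    """Long-run average reward eta(theta) = pi(theta) . f(theta)."""
    pi = stationary(theta, b)
    f = rewards(theta, c)
    return pi[0] * f[0] + pi[1] * f[1]


def pa_gradient(theta, b=0.5, c=4.0):
    """d eta / d theta, using only pi and g of the current policy theta."""
    pi = stationary(theta, b)
    f = rewards(theta, c)
    # Poisson equation (I - P) g + eta e = f gives g1 - g0 for two states.
    dg = (f[1] - f[0]) / (theta + b)
    # dP/dtheta = [[-1, 1], [0, 0]] and df/dtheta = (-2 c theta, 0),
    # so only state 0 contributes to pi [(dP/dtheta) g + df/dtheta].
    return pi[0] * dg + pi[0] * (-2.0 * c * theta)


def hill_climb(theta=0.8, step=0.1, iters=200):
    """Gradient ascent on theta, kept inside (0, 1)."""
    for _ in range(iters):
        theta = min(0.99, max(0.01, theta + step * pa_gradient(theta)))
    return theta


if __name__ == "__main__":
    best = hill_climb()
    print(best, performance(best))
```

In this toy model the gradient can be checked against a finite difference of `performance` at two neighboring values of θ; the step size and the clipping interval are arbitrary choices for the sketch.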
Policies in Distance?
With special structure, by analyzing one policy, find a better policy in the distance
Policy Iteration (PI): Discrete version of PA
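As an illustration of how analyzing one policy can point to a better one, here is a minimal policy iteration sketch. It assumes a discounted-reward criterion for simplicity (the slides do not fix a criterion), and the tiny two-state, two-action model `P`, `R` is hypothetical. Each round evaluates the current policy only, then switches every state to the action that looks best under that evaluation.

```python
def evaluate(policy, P, R, gamma, tol=1e-10):
    """Iteratively compute the discounted value of one fixed policy."""
    v = [0.0] * len(policy)
    while True:
        new_v = [
            R[s][policy[s]]
            + gamma * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(len(policy))
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def improve(v, P, R, gamma):
    """Pick, in every state, the action that is best given the values v."""
    def q(s, a):
        return R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
    return [max(range(len(P[s])), key=lambda a: q(s, a)) for s in range(len(P))]


def policy_iteration(P, R, gamma=0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(P)
    while True:
        v = evaluate(policy, P, R, gamma)
        new_policy = improve(v, P, R, gamma)
        if new_policy == policy:
            return policy, v
        policy = new_policy


# Hypothetical model: P[s][a] is the next-state distribution, R[s][a] the reward.
P = [
    [[1.0, 0.0], [0.2, 0.8]],  # state 0: stay, or try to move to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: stay, or return to state 0
]
R = [
    [0.0, -1.0],
    [2.0, 0.0],
]

if __name__ == "__main__":
    print(policy_iteration(P, R))
```

Unlike a gradient step, the improved policy may differ from the current one in every state at once, so the new policy need not be a neighbor of the old one.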
Continuous ➢ Performance derivatives d/dθ
???
Discrete ➢ Performance difference
???
(PDF)
➢ Find the best direction ➢ Hill climbing ➢ Gradient