From Perturbation Analysis to a New Paradigm of Optimization

Xi-Ren Cao, Shanghai Jiao Tong University

The Problem in Optimization Policy Space:

Best Policy?

Policy space too large for exhaustive search

(100 states, 2 actions -> $2^{100} \approx 10^{30}$ policies; at 10 GHz -> $10^{12}$ years to count)

State space too large, we cannot analyze every policy
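
The figures in parentheses can be checked with a few lines of arithmetic. This is a rough sketch that assumes one policy is counted per clock cycle at 10 GHz and that "policies" means deterministic stationary policies (one action per state); neither assumption is spelled out on the slide.

```python
# Back-of-the-envelope check of the policy-count figures quoted above.
n_states = 100
n_actions = 2

# One action chosen independently in each state (assumed policy class).
n_policies = n_actions ** n_states

# Assumed counting speed: one policy per cycle at 10 GHz.
rate_per_second = 10e9
seconds_per_year = 365.25 * 24 * 3600

years_to_count = n_policies / rate_per_second / seconds_per_year

print(f"number of policies: {n_policies:.3e}")
print(f"years to count    : {years_to_count:.3e}")
```

Under these assumptions the count is on the order of $10^{30}$ and the time on the order of $10^{12}$ years, in line with the slide.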

Perturbation Analysis (PA) - gradient-based approach

➢ With special structure, by analyzing one policy, obtain the performance of its neighboring policies
➢ Performance gradient

[Figure: hill climbing along the gradient, from θ to θ + Δθ]

➢ Queueing networks, Markov processes
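
As an illustration of "analyze one policy, get the gradient", here is a minimal sketch for a finite ergodic Markov chain. It relies on the standard derivative formula dη/dθ = π (dP/dθ) g, where π is the stationary distribution and g the performance potential solving (I - P + eπ) g = f. The 3-state chain, the matrices `P0` and `Q`, the reward vector `f`, and the step size are all hypothetical and chosen only for the demonstration.

```python
import numpy as np

def stationary(P):
    """Stationary distribution: pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Hypothetical example: P(theta) = P0 + theta * Q for theta in [0, 1].
P0 = np.array([[0.6, 0.3, 0.1],
               [0.2, 0.5, 0.3],
               [0.3, 0.3, 0.4]])
Q = np.array([[-0.2,  0.1,  0.1],   # rows sum to zero, so P(theta)
              [ 0.1, -0.1,  0.0],   # remains a stochastic matrix
              [ 0.0,  0.2, -0.2]])
f = np.array([1.0, 3.0, 2.0])       # reward per state (assumed)

def eta(theta):
    """Long-run average reward for parameter theta."""
    return stationary(P0 + theta * Q) @ f

def pa_gradient(theta):
    """d(eta)/d(theta) = pi Q g, using only quantities of the chain at theta."""
    P = P0 + theta * Q
    pi = stationary(P)
    g = np.linalg.solve(np.eye(3) - P + np.outer(np.ones(3), pi), f)
    return pi @ Q @ g

# Compare the PA gradient with a central finite difference at theta = 0.5.
h = 1e-5
grad_pa = pa_gradient(0.5)
grad_fd = (eta(0.5 + h) - eta(0.5 - h)) / (2 * h)

# Hill climbing: step along the gradient, staying inside [0, 1].
theta = 0.0
eta_start = eta(theta)
for _ in range(100):
    theta = float(np.clip(theta + 1.0 * pa_gradient(theta), 0.0, 1.0))
eta_end = eta(theta)

print("PA gradient:", grad_pa, " finite difference:", grad_fd)
print("eta before:", eta_start, " eta after hill climbing:", eta_end)
```

The finite difference needs two extra evaluations at perturbed parameters, whereas `pa_gradient` uses only the chain at the current θ, which is the point of the slide.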

Policies in Distance?

➢ With special structure, by analyzing one policy, find a better policy in the distance
➢ Policy Iteration (PI): discrete version of PA
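
A minimal sketch of policy iteration for a small average-reward Markov decision process is given below. It evaluates only the current policy (its average reward η and potentials g) and uses them to jump to a policy that may differ in every state. The random MDP, the array layout `P[a, i, j]` and `f[i, a]`, and the brute-force comparison are assumptions made for illustration; all transition probabilities are kept strictly positive so that every policy is ergodic.

```python
import itertools
import numpy as np

def evaluate(P_d, f_d):
    """Average reward eta and potentials g of one fixed policy."""
    n = P_d.shape[0]
    A = np.vstack([P_d.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]       # stationary distribution
    g = np.linalg.solve(np.eye(n) - P_d + np.outer(np.ones(n), pi), f_d)
    return pi @ f_d, g

def policy_iteration(P, f):
    """P[a, i, j]: transition probabilities; f[i, a]: rewards."""
    n_actions, n_states, _ = P.shape
    idx = np.arange(n_states)
    d = np.zeros(n_states, dtype=int)               # start with action 0 everywhere
    while True:
        eta, g = evaluate(P[d, idx], f[idx, d])
        # q[i, a] = f(i, a) + sum_j P^a(i, j) g(j), from the current policy only
        q = f + np.einsum("aij,j->ia", P, g)
        best = q.argmax(axis=1)
        improve = q[idx, best] > q[idx, d] + 1e-12  # keep the old action on ties
        if not improve.any():
            return d, eta
        d = np.where(improve, best, d)

def brute_force(P, f):
    """Best average reward by enumerating every deterministic policy."""
    n_actions, n_states, _ = P.shape
    idx = np.arange(n_states)
    return max(
        evaluate(P[np.array(d), idx], f[idx, np.array(d)])[0]
        for d in itertools.product(range(n_actions), repeat=n_states)
    )

# Hypothetical random MDP: 4 states, 2 actions, strictly positive transitions.
rng = np.random.default_rng(0)
P = rng.random((2, 4, 4)) + 0.1
P /= P.sum(axis=2, keepdims=True)
f = rng.random((4, 2))

d_opt, eta_pi = policy_iteration(P, f)
eta_best = brute_force(P, f)
print("policy:", d_opt, " eta (PI):", eta_pi, " eta (enumeration):", eta_best)
```

Enumeration is feasible here only because there are 16 policies; policy iteration reaches the same average reward without visiting them all.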

Continuous
➢ Performance derivatives: $\frac{d\eta}{d\theta}$

???

Discrete
➢ Performance difference: $\eta' - \eta$

???

➢ Find the best direction
➢ Hill climbing
➢ Gradient
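
For a finite ergodic Markov chain, the continuous and discrete quantities contrasted above have standard closed forms, stated here as a reference sketch. The notation is assumed and does not appear on the slides: $P$, $f$, $\pi$, $\eta = \pi f$ are the transition matrix, reward vector, stationary distribution and average reward of the current policy; primed symbols belong to another policy; $e$ is the all-ones column vector; and $g$ is the performance potential.

\[
(I - P)\,g + \eta\,e = f
\]

\[
\eta' - \eta \;=\; \pi'\big[(P' - P)\,g + (f' - f)\big]
\]

\[
\frac{d\eta_\delta}{d\delta}\bigg|_{\delta = 0} \;=\; \pi\big[(P' - P)\,g + (f' - f)\big],
\qquad P_\delta = P + \delta\,(P' - P),\quad f_\delta = f + \delta\,(f' - f)
\]

The difference formula follows by multiplying the first equation on the left by $\pi'$ and using $\pi' P' = \pi'$ and $\pi' e = 1$. Both expressions share the factor $(P' - P)\,g + (f' - f)$, which depends only on the current policy; the difference weights it by $\pi'$ and the derivative by $\pi$.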