
In Proc. of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, TN, USA

Basis Function Adaptation Methods for Cost Approximation in MDP

Huizhen Yu
Department of Computer Science and HIIT
University of Helsinki
Helsinki 00014, Finland
Email: [email protected]

Dimitri P. Bertsekas
Laboratory for Information and Decision Systems (LIDS)
Massachusetts Institute of Technology
Cambridge, MA 02139, USA
Email: [email protected]

Abstract—We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal difference (TD) or other method is used. The adaptation scheme involves only low-order calculations and can be implemented in a way analogous to policy gradient methods. Within the generalized basis adaptation framework, we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximation methods beyond those based on TD.

I. OVERVIEW

We consider a parameter optimization context consisting of a parameter vector θ ∈ Θ, where Θ is an open subset of