Markov Decision Processes (MDPs) are a powerful and general framework for addressing problems involving sequential decision-making under uncertainty. Such problems occur frequently in a number of fields, including engineering, finance, and operations research. Unfortunately, the curse of dimensionality prevents exact solution of most problems of practical interest.
We have developed new approximate policy iteration algorithms that exploit flexible, kernel-based cost approximation architectures to compute an approximation to the cost-to-go function by minimizing the error incurred in solving Bellman's equation over a set of sample states. Unlike other such Bellman residual methods, our approach is guaranteed to find a solution for which the Bellman residuals at the sample states are identically zero. As a result, convergence to the optimal policy in the limit of sampling the entire state space can be proved. Experimental results indicate that the method produces near-optimal policies for several applications, including a multi-UAV coordination and planning problem, in significantly less time than would be required to compute the exact optimal policy.