Regularizing policy iteration for recursive feasibility and stability
We present a new algorithm called policy iteration plus (PI+) for the optimal control of nonlinear deterministic discrete-time plants with general cost functions. PI+ builds upon classical policy iteration and has the distinctive feature of enforcing recursive feasibility under mild conditions, in the sense that the minimization problems solved at each iteration are guaranteed to admit a solution. While recursive feasibility is a desirable property, existing results on the policy iteration algorithm appear unable to ensure it in general, in contrast to PI+. We also establish the recursive stability of PI+: the policies generated at each iteration ensure a stability property for the closed-loop system. We prove our results under more general conditions than those currently available for policy iteration, notably by covering set stability. Finally, we characterize near-optimality bounds for PI+ and prove the uniform convergence of the value functions generated by PI+ to the optimal value function. We believe these results will benefit the burgeoning literature on approximate dynamic programming and reinforcement learning, where recursive feasibility is typically assumed without a clear method for verifying it and where recursive stability is essential for the safe operation of the system.
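For background, the sketch below illustrates the classical policy iteration scheme that PI+ builds upon: alternating policy evaluation and greedy policy improvement on a finite deterministic problem with a discounted cost. The toy dynamics, discount factor, and all names are illustrative assumptions; the abstract does not specify PI+'s modifications, so none are reproduced here.

```python
import numpy as np

# Classical policy iteration on a toy finite deterministic problem
# (background illustration only; not the paper's PI+ algorithm).
n_states, n_actions = 5, 3
rng = np.random.default_rng(0)
f = rng.integers(0, n_states, size=(n_states, n_actions))  # next state f(x, u) (assumed toy dynamics)
l = rng.random((n_states, n_actions))                       # stage cost l(x, u) (assumed)
gamma = 0.9                                                 # discount factor (assumed)

policy = np.zeros(n_states, dtype=int)
for _ in range(100):
    # Policy evaluation: solve V(x) = l(x, pi(x)) + gamma * V(f(x, pi(x))).
    idx = np.arange(n_states)
    P = np.zeros((n_states, n_states))
    P[idx, f[idx, policy]] = 1.0
    V = np.linalg.solve(np.eye(n_states) - gamma * P, l[idx, policy])
    # Policy improvement: greedy minimization of the one-step cost-to-go.
    Q = l + gamma * V[f]
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
```

In this finite setting the greedy minimization always admits a solution; the recursive feasibility question addressed by PI+ arises in the general nonlinear setting, where the minimization at each iteration need not be well posed.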