Policy Optimization via Optimal Policy Evaluation
Authors: Alberto Maria Metelli, Samuele Meta, Marcello Restelli

Abstract: Off-policy methods are the basis of a large number of effective Policy Optimization (PO) algorithms. In this setting, Importance Sampling (IS) is typically employed as a what-if analysis tool, with the goal of estimating the performance of a target policy, given samples collected with a different […]
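To make the "what-if" role of IS concrete, here is a minimal sketch of the standard trajectory-wise IS estimator of a target policy's expected return, computed from trajectories collected with a different (behavioral) policy. This is the textbook estimator the abstract alludes to, not the authors' proposed method. The function name `is_estimate`, the `(state, action, reward)` trajectory layout, and the `target_prob`/`behavior_prob` callables are hypothetical choices for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical data layout: a trajectory is a sequence of (state, action, reward).
Trajectory = Sequence[Tuple[int, int, float]]


def is_estimate(
    trajectories: Sequence[Trajectory],
    target_prob: Callable[[int, int], float],    # pi(a | s) of the target policy
    behavior_prob: Callable[[int, int], float],  # b(a | s) of the policy that collected the data
    gamma: float = 1.0,
) -> float:
    """Trajectory-wise IS estimate of the target policy's expected discounted return."""
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # product of per-step likelihood ratios pi/b
        ret = 0.0     # discounted return actually observed under the behavioral policy
        for t, (s, a, r) in enumerate(traj):
            weight *= target_prob(s, a) / behavior_prob(s, a)
            ret += (gamma ** t) * r
        total += weight * ret  # reweight the observed return
    return total / len(trajectories)


# Usage: one-step, two-action example; data collected uniformly at random,
# target policy picks action 1 with probability 0.9 (reward 1 only for action 1).
data = [[(0, 0, 0.0)], [(0, 1, 1.0)]]
estimate = is_estimate(data, lambda s, a: 0.9 if a == 1 else 0.1, lambda s, a: 0.5)
print(estimate)  # close to 0.9, the true expected reward of the target policy
```

The estimator is unbiased whenever the behavioral policy assigns nonzero probability to every action the target policy may take, but the product of ratios can make its variance large, which is why the choice of the behavioral (sampling) policy matters.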