Policy Optimization as Online Learning with Mediator Feedback
Authors: Alberto Maria Metelli, Matteo Papini, Pierluca D’Oro, Marcello Restelli
Conference: AAAI 2021
Abstract: Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the […]