Research

04 Mar 22

Learning to Explore a Class of Multiple Reward-Free Environments

Authors: Mirco Mutti, Mattia Mancassola, Marcello Restelli Abstract: Several recent works have been dedicated to the pure exploration of a single reward-free environment. Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement […]

04 Mar 22

A Policy Gradient Method for Task-Agnostic Exploration

Authors: Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli Abstract: In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy of the state distribution induced by limited-horizon trajectories is a sensible target. Especially, we […]

04 Mar 22

Inferring Functional Properties from Fluid Dynamics Features

Authors: Andrea Schillaci, Maurizio Quadrio, Carlotta Pipolo, Marcello Restelli, Giacomo Boracchi Abstract: In a wide range of applied problems involving fluid flows, Computational Fluid Dynamics (CFD) provides detailed quantitative information on the flow field, at variable levels of fidelity and computational cost. However, CFD alone cannot predict high-level functional properties that are not easily obtained […]

04 Mar 22

Gaussian Approximation for Bias Reduction in Q-Learning

Authors: Carlo D’Eramo, Andrea Cini, Alessandro Nuara, Matteo Pirotta, Cesare Alippi, Jan Peters, Marcello Restelli Abstract: Temporal-Difference off-policy algorithms are among the building blocks of reinforcement learning (RL). Within this family, Q-Learning is arguably the most famous one, which has been widely studied and extended. The update rule of Q-learning involves the use of the […]

05 Mar 21

Policy Optimization as Online Learning with Mediator Feedback

Authors: Alberto Maria Metelli, Matteo Papini, Pierluca D’Oro, Marcello Restelli Conference: AAAI 2021 Abstract: Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the […]

05 Mar 21

Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate

Authors: Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli Conference: AAAI 2021 Abstract: In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy […]

05 Mar 21

Newton Optimization on Helmholtz Decomposition for Continuous Games

Authors: Giorgia Ramponi, Marcello Restelli Conference: AAAI 2021 Abstract: Many learning problems involve multiple agents optimizing different interactive functions. In these problems, the standard policy gradient algorithms fail due to the non-stationarity of the setting and the different interests of each agent. In fact, algorithms must take […]

15 Feb 21

Learning Probably Approximately Correct Maximin Strategies in Games with Infinite Strategy Spaces

Authors: Alberto Marchesi, Francesco Trovò, Nicola Gatti Conference: AAAI 2021 Abstract: We tackle the problem of learning equilibria in simulation-based games. In such games, the players’ utility functions cannot be described analytically, as they are given through a black-box simulator that can be […]

15 Feb 21

Online Learning in Non-Cooperative Configurable Markov Decision Process

Authors: Giorgia Ramponi, Alberto Maria Metelli, Alessandro Concetti, Marcello Restelli Conference: AAAI 2021 Abstract: In the Configurable Markov Decision Processes there are two entities, a Reinforcement Learning agent and a configurator which can modify some parameters of the environment to improve the performance of the agent. What […]

15 Feb 21

Inverse Reinforcement Learning from a Gradient-based Learner

Authors: Giorgia Ramponi, Gianluca Drappo, Marcello Restelli Conference: NeurIPS 2020 Abstract: Inverse Reinforcement Learning addresses the problem of inferring an expert’s reward function from demonstrations. However, in many applications, we not only have access to the expert’s near-optimal behaviour, but we also observe part of her learning process. […]