March, 2022 - ML cube

04 Mar 22

Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning

Authors: Alberto Maria Metelli, Alessio Russo, Marcello Restelli

Abstract: Importance Sampling (IS) is a widely used building block for a large variety of off-policy estimation and learning algorithms. However, empirical and theoretical studies have progressively shown that vanilla IS leads to poor estimations whenever the behavioral and target policies are too dissimilar. In this paper, […]

04 Mar 22

Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection

Authors: Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

Abstract: We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure. We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant […]

04 Mar 22

Time-variant variational transfer for value functions

Authors: Giuseppe Canonaco, Andrea Soprani, Matteo Giuliani, Andrea Castelletti, Manuel Roveri, Marcello Restelli

Abstract: In most of the transfer learning approaches to reinforcement learning (RL) the distribution over the tasks is assumed to be stationary. Therefore, the target and source tasks are i.i.d. samples of the same distribution. Unfortunately, this assumption rarely holds in real-world […]

04 Mar 22

Policy Optimization via Optimal Policy Evaluation

Authors: Alberto Maria Metelli, Samuele Meta, Marcello Restelli

Abstract: Off-policy methods are the basis of a large number of effective Policy Optimization (PO) algorithms. In this setting, Importance Sampling (IS) is typically employed as a what-if analysis tool, with the goal of estimating the performance of a target policy, given samples collected with a different […]

04 Mar 22

Goal-Directed Planning via Hindsight Experience Replay

Authors: Lorenzo Moro, Amarildo Likmeta, Enrico Prati, Marcello Restelli

Abstract: We consider the problem of goal-directed planning under a deterministic transition model. Monte Carlo Tree Search has shown remarkable performance in solving deterministic control problems. It has been extended from complex continuous domains through function approximators to bias the search of the planning tree in […]

04 Mar 22

Exploiting Minimum-Variance Policy Evaluation for Policy Optimization

Machine Learning and Knowledge Discovery in Databases

04 Mar 22

Conservative Online Convex Optimization

Authors: Martino Bernasconi de Luca, Edoardo Vittori, Francesco Trovò, Marcello Restelli

Abstract: Online learning algorithms often have the issue of exhibiting poor performance during the initial stages of the optimization procedure, which in practical applications might dissuade potential users from deploying such solutions. In this paper, we study a novel setting, namely conservative online convex […]

04 Mar 22

Exploiting History Data for Nonstationary Multi-armed Bandit

Authors: Gerlando Re, Fabio Chiusano, Francesco Trovò, Diego Carrera, Giacomo Boracchi, Marcello Restelli

Abstract: The Multi-armed Bandit (MAB) framework has been applied successfully in many application fields. In the last years, the use of active approaches to tackle the nonstationary MAB setting, i.e., algorithms capable of detecting changes in the environment and re-configuring automatically to […]

04 Mar 22

Policy space identification in configurable environments

Authors: Alberto Maria Metelli, Guglielmo Manneschi, Marcello Restelli

Abstract: We study the problem of identifying the policy space available to an agent in a learning process, having access to a set of demonstrations generated by the agent playing the optimal policy in the considered space. We introduce an approach based on frequentist statistical testing to […]

04 Mar 22

Data-driven indicators for the detection and prediction of stuck-pipe events in oil&gas drilling operations

Abstract: Stuck-pipe phenomena can have disastrous effects on drilling performance, with outcomes that can range from time delays to loss of expensive machinery. In this work, we develop three indicators based on mudlog data, which aim to detect three different physical phenomena associated with the insurgence of a sticking. In particular, two indices target respectively the detection […]