Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning

3 March 2022
kmlcube

Authors

Alberto Maria Metelli, Alessio Russo, Marcello Restelli

Abstract

Importance Sampling (IS) is a widely used building block for a large variety of off-policy estimation and learning algorithms. However, empirical and theoretical studies have progressively shown that vanilla IS leads to poor estimates whenever the behavioral and target policies are too dissimilar. In this paper, we analyze the theoretical properties of the IS estimator by deriving a novel anticoncentration bound that formalizes the intuition behind its undesired behavior. Then, we propose a new class of IS transformations based on the notion of power mean. To the best of our knowledge, the resulting estimator is the first to achieve, under certain conditions, two key properties: (i) it displays a subgaussian concentration rate; (ii) it preserves differentiability with respect to the target distribution. Finally, we provide numerical simulations on both synthetic examples and contextual bandits, comparing against off-policy evaluation and learning baselines.
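The abstract does not spell out the transformation, so the following is only a minimal sketch of one way a power-mean correction of importance weights could look. It assumes the corrected weight is the weighted power mean of exponent `s` between the vanilla weight `w` and 1, with mixing coefficient `lam`; the function names, the parameters `lam` and `s`, and the Gaussian toy problem are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def power_mean_weights(w, lam, s=-1.0):
    """Weighted power mean of exponent s between each weight w and 1.

    Assumed form (not stated in the abstract):
        ((1 - lam) * w**s + lam) ** (1 / s)
    lam = 0 gives back the vanilla weights, lam = 1 gives all ones.
    Weights are assumed to be strictly positive.
    """
    w = np.asarray(w, dtype=float)
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if s == 0.0:
        # Limit case s -> 0: weighted geometric mean of w and 1.
        return w ** (1.0 - lam)
    return ((1.0 - lam) * w**s + lam) ** (1.0 / s)


def is_estimate(f_values, weights):
    """Sample mean of weighted function values (plain IS form)."""
    return float(np.mean(np.asarray(weights) * np.asarray(f_values)))


def gaussian_weights(x, target_mean):
    """Density ratio N(target_mean, 1) / N(0, 1) evaluated at x."""
    return np.exp(target_mean * x - 0.5 * target_mean**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=1000)      # samples from the behavioral distribution
    w = gaussian_weights(x, target_mean=2.0)  # heavy-tailed vanilla weights
    w_corr = power_mean_weights(w, lam=0.1)   # harmonic correction (s = -1)
    print("vanilla IS estimate:  ", is_estimate(x, w))
    print("corrected IS estimate:", is_estimate(x, w_corr))
    print("largest vanilla / corrected weight:", w.max(), w_corr.max())
```

With `s = -1` this assumed form reduces to `w / (1 - lam + lam * w)`, which is smooth in `w` and never exceeds `1 / lam`. That illustrates how such a correction could tame large weights while staying differentiable, at the cost of some bias.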

I3Lung: Personalized Medical Care Based on Artificial Intelligence

Machine Learning Models Life Cycle

Configurable Environments in Reinforcement Learning: An Overview

Bayesian Persuasion in Online Settings

Multi-Receiver Online Bayesian Persuasion

Connecting Optimal Ex-Ante Collusion in Teams to Extensive-Form Correlation: Faster Algorithms and Positive Complexity Results

Bayesian Agency: Linear versus Tractable Contracts

Election Manipulation on Social Networks: Seeding, Edge Removal, Edge Addition