Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning
Authors: Alberto Maria Metelli, Alessio Russo, Marcello Restelli

Abstract: Importance Sampling (IS) is a widely used building block for a large variety of off-policy estimation and learning algorithms. However, empirical and theoretical studies have progressively shown that vanilla IS leads to poor estimates whenever the behavioral and target policies are too dissimilar. In this paper, […]
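The failure mode of vanilla IS can be made concrete with a minimal sketch. It assumes a stateless bandit setting for illustration and is not the estimator proposed in the paper. Each reward observed under the behavioral policy mu is reweighted by w(a) = pi(a) / mu(a), where pi is the target policy. The function names, policies, and reward values below are hypothetical. The exact per-sample variance, E_mu[(w r)^2] - V(pi)^2, grows sharply when mu puts little mass on actions that pi favors.

```python
import random


def is_estimate(target, behavior, rewards, n, seed=0):
    """Vanilla IS estimate of the target policy's expected reward,
    using n actions sampled from the behavioral policy (hypothetical bandit setup)."""
    rng = random.Random(seed)
    actions = rng.choices(range(len(behavior)), weights=behavior, k=n)
    # Each sample is reweighted by w(a) = pi(a) / mu(a).
    total = sum(target[a] / behavior[a] * rewards[a] for a in actions)
    return total / n


def true_value(target, rewards):
    """Exact expected reward V(pi) of the target policy."""
    return sum(p * r for p, r in zip(target, rewards))


def per_sample_variance(target, behavior, rewards):
    """Exact variance of a single IS term w(a) * r(a) with a ~ mu."""
    second_moment = sum(p * p / q * r * r for p, q, r in zip(target, behavior, rewards))
    return second_moment - true_value(target, rewards) ** 2


# Hypothetical example: 3 actions with deterministic rewards.
rewards = [1.0, 0.0, 0.5]
target = [0.8, 0.1, 0.1]   # pi
close = [0.7, 0.2, 0.1]    # mu similar to pi
far = [0.02, 0.49, 0.49]   # mu dissimilar to pi

print(true_value(target, rewards))                           # 0.85
print(round(per_sample_variance(target, close, rewards), 3))  # 0.217
print(round(per_sample_variance(target, far, rewards), 2))    # 31.28
print(is_estimate(target, close, rewards, n=100_000))
```

Both behavioral policies yield an unbiased estimator, but under the dissimilar one the rarely sampled action carries a weight of 0.8 / 0.02 = 40, which inflates the per-sample variance by more than two orders of magnitude.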