Daily TMLR digest for Jul 18, 2022


TMLR

Jul 17, 2022, 8:00:06 PM
to tmlr-anno...@googlegroups.com


New submissions
===============


Title: Bridging Offline and Online Experimentation: Constraint Active Search for Deployed Performance Optimization

Abstract: A common challenge in machine learning model development is that models perform differently between the offline development phase and the eventual deployment phase. Fundamentally, the goal of such a model is to maximize performance during deployment, but such performance cannot be measured offline. We therefore propose to augment standard offline sample-efficient hyperparameter optimization to instead search offline for a diverse set of models with potentially superior online performance. To this end, we use Constraint Active Search to identify such a diverse set of models, and we study their online performance using a variant of Best Arm Identification to select the best model for deployment. The key contribution of this article is the theoretical analysis of this development phase, covering both the probability of improvement over the baseline and the number of viable treatments for online testing. We demonstrate the viability of this strategy on synthetic examples, as well as on a recommendation system benchmark.

URL: https://openreview.net/forum?id=XX8CEN815d
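The online selection step the abstract describes can be illustrated with a generic best-arm-identification routine. The sketch below is not the paper's algorithm: the success rates, round budget, and successive-elimination rule are all illustrative assumptions, standing in for live deployment metrics and for whatever Best Arm Identification variant the authors actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical online success rates for a diverse set of candidate models
# (in practice these would come from live deployment traffic).
true_rates = [0.50, 0.55, 0.70, 0.58]

def pull(arm):
    """Simulate one online observation (e.g., a conversion) for a candidate model."""
    return rng.random() < true_rates[arm]

def successive_elimination(n_arms, rounds=2000, delta=0.05):
    """Generic successive-elimination best-arm identification."""
    active = list(range(n_arms))
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    for _ in range(rounds):
        if len(active) == 1:
            break
        # Pull every surviving candidate once per round.
        for arm in active:
            sums[arm] += pull(arm)
            counts[arm] += 1
        means = sums[active] / counts[active]
        # Hoeffding-style confidence radius (all active arms share a count).
        n = counts[active[0]]
        radius = np.sqrt(np.log(2 * n_arms * n / delta) / (2 * n))
        best_mean = means.max()
        # Drop arms whose upper bound falls below the leader's lower bound.
        active = [a for a, m in zip(active, means) if m + radius >= best_mean - radius]
    return max(active, key=lambda a: sums[a] / counts[a])

best = successive_elimination(len(true_rates))
```

The diverse candidate set produced by the offline search is what makes this online phase worthwhile: with near-duplicate candidates, every arm would have nearly the same mean and the identification budget would be wasted.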

---

Title: Modified Threshold Method for Ordinal Regression

Abstract: Ordinal regression (OR, also called ordinal classification) is the classification of ordinal data, in which the underlying target variable is discrete and has a natural ordinal relation. For OR problems, threshold methods are often employed since they are considered to capture the ordinal relation of the data well: they learn a one-dimensional transformation (1DT) of the explanatory variable and classify the data by assigning to each learned 1DT value the rank of the interval into which it falls, with one interval per class. In existing methods, the threshold parameters separating the intervals are determined independently of the learning result of the 1DT and of the task under consideration, which lacks theoretical justification. Such conventional settings may degrade classification performance. We therefore propose a novel, computationally efficient method for determining the threshold parameters: it learns each threshold parameter independently by solving a relaxation of the minimization of the empirical task risk for the learned 1DT. In experiments, the proposed labeling procedure achieved superior classification performance, with a feasible amount of additional computation, compared to four related existing labeling procedures.

URL: https://openreview.net/forum?id=PInXz6Gasv
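The labeling step of a threshold method, as described above, is simple to state in code. The sketch below uses hypothetical 1DT values and threshold parameters; in the paper the thresholds are learned (via the relaxed empirical-task-risk problem), whereas here they are fixed constants purely to show the interval-rank labeling.

```python
import numpy as np

# Learned 1-D transformation (1DT) values for a batch of inputs (hypothetical).
one_dt = np.array([-1.3, -0.2, 0.1, 0.9, 2.4])

# K - 1 threshold parameters separating K = 4 ordered classes
# (hypothetical fixed values standing in for learned thresholds).
thresholds = np.array([-0.5, 0.5, 1.5])

# Label = rank of the interval the 1DT value falls into:
# (-inf, -0.5] -> 0, (-0.5, 0.5] -> 1, (0.5, 1.5] -> 2, (1.5, inf) -> 3.
labels = np.searchsorted(thresholds, one_dt)
```

Because classification reduces to locating each 1DT value among the sorted thresholds, any improvement to where the thresholds sit (the paper's contribution) changes predictions without touching the learned 1DT itself.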

---

Title: An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task

Abstract: Off-policy prediction, learning the value function for one policy from data generated while following another policy, is one of the most challenging subproblems in reinforcement learning. This paper presents empirical results with eleven prominent off-policy learning algorithms that use linear function approximation: five Gradient-TD methods, two Emphatic-TD methods, Off-policy TD, Vtrace, and variants of Tree Backup and ABQ that are derived in this paper such that they are applicable to the prediction setting. Our experiments used the Collision task, a small off-policy problem analogous to that of an autonomous car trying to predict whether it will collide with an obstacle. We assessed the performance of the algorithms according to their learning rate, asymptotic error level, and sensitivity to step-size and bootstrapping parameters. By these measures, the eleven algorithms can be partially ordered on the Collision task. In the top tier, the two Emphatic-TD algorithms learned the fastest, reached the lowest errors, and were robust to parameter settings. In the middle tier, the five Gradient-TD algorithms and Off-policy TD were more sensitive to the bootstrapping parameter. The bottom tier comprised Vtrace, Tree Backup, and ABQ; these algorithms were no faster and had higher asymptotic error than the others. Our results are definitive for this task, though of course experiments with more tasks are needed before an overall assessment of the algorithms' merits can be made.

URL: https://openreview.net/forum?id=4w3Pya9OxC
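Of the eleven algorithms compared, plain Off-policy TD is the simplest to sketch: a semi-gradient TD(0) update weighted by the per-step importance-sampling ratio, with linear (here one-hot, i.e. tabular) features. The toy chain, policies, and step size below are illustrative assumptions, not the Collision task from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy chain: states 0, 1, 2; "right" moves +1, "left" moves -1 (clipped at 0).
# Reaching state 2 is terminal and yields reward +1.
n_states, gamma, alpha = 3, 0.9, 0.05

# Behavior policy goes right w.p. 0.5; target policy goes right w.p. 0.9.
b_right, pi_right = 0.5, 0.9

def features(s):
    x = np.zeros(n_states)  # one-hot features: tabular case of linear FA
    x[s] = 1.0
    return x

w = np.zeros(n_states)  # linear weights, one per feature
for _ in range(5000):
    s = 0
    while s != 2:
        right = rng.random() < b_right
        # Per-step importance-sampling ratio pi(a|s) / b(a|s).
        rho = (pi_right / b_right) if right else ((1 - pi_right) / (1 - b_right))
        s2 = min(s + 1, 2) if right else max(s - 1, 0)
        r = 1.0 if s2 == 2 else 0.0
        x = features(s)
        v2 = 0.0 if s2 == 2 else w @ features(s2)
        # Off-policy TD(0): importance-weighted semi-gradient update.
        w += alpha * rho * (r + gamma * v2 - w @ x) * x
        s = s2
```

In this tabular special case the update converges near the target policy's values (roughly 0.87 and 0.98 for states 0 and 1 under the assumed dynamics); the paper's harder setting, where features generalize across states and off-policy TD can be unstable, is exactly what motivates the Gradient-TD and Emphatic-TD families it benchmarks.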

---