Daily TMLR digest for Mar 24, 2023


TMLR

Mar 23, 2023, 8:00:09 PM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: FLUID: A Unified Evaluation Framework for Flexible Sequential Data

Authors: Matthew Wallingford, Aditya Kusupati, Keivan Alizadeh-Vahid, Aaron Walsman, Aniruddha Kembhavi, Ali Farhadi

Abstract: Modern machine learning methods excel when training data is IID, large-scale, and well labeled. Learning in less ideal conditions remains an open challenge. The sub-fields of few-shot, continual, transfer, and representation learning have made substantial strides in learning under adverse conditions, each affording distinct advantages through methods and insights. These methods address different challenges, such as data arriving sequentially or scarce training examples; however, the difficult conditions an ML system will face over its lifetime often cannot be anticipated prior to deployment. Therefore, general ML systems that can handle the many challenges of learning in practical settings are needed. To foster research towards the goal of general ML methods, we introduce a new unified evaluation framework – FLUID (Flexible Sequential Data). FLUID integrates the objectives of few-shot, continual, transfer, and representation learning while enabling comparison and integration of techniques across these subfields. In FLUID, a learner faces a stream of data and must make sequential predictions while choosing how to update itself, adapt quickly to novel classes, and deal with changing data distributions, all while accounting for the total amount of compute. We conduct experiments on a broad set of methods, which shed new insight into the advantages and limitations of current techniques and point to new research problems. As a starting point towards more general methods, we present two new baselines which outperform the other evaluated methods on FLUID.

URL: https://openreview.net/forum?id=UvJBKWaSSH
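The stream protocol the abstract describes (sequential prediction, with the learner choosing how to update itself, and compute accounted for) might be sketched as follows. The `MajorityLearner` baseline, the `maybe_update` interface, and the update counter are illustrative assumptions, not the framework's actual API:

```python
# Minimal sketch of a FLUID-style sequential evaluation loop.
# Class names and interface are illustrative, not the authors' API.

class MajorityLearner:
    """Trivial baseline: predicts the most frequently seen label."""
    def __init__(self):
        self.counts = {}
        self.updates = 0  # crude proxy for compute spent on updates

    def predict(self, x):
        if not self.counts:
            return None  # no information seen yet
        return max(self.counts, key=self.counts.get)

    def maybe_update(self, x, y):
        # The learner chooses how (and whether) to update itself.
        self.counts[y] = self.counts.get(y, 0) + 1
        self.updates += 1

def run_stream(learner, stream):
    """Feed (x, y) pairs sequentially; score each prediction made
    before the label is revealed."""
    correct = total = 0
    for x, y in stream:
        pred = learner.predict(x)
        correct += int(pred == y)
        total += 1
        learner.maybe_update(x, y)
    return correct / total

# Toy stream: two classes, imbalanced 2:1 in favor of "cat".
stream = [(i, "cat" if i % 3 else "dog") for i in range(30)]
learner = MajorityLearner()
acc = run_stream(learner, stream)
```

The point of the loop is that evaluation and learning are interleaved: accuracy is measured online, so a learner that adapts slowly (or spends too much compute updating) pays for it in the same metric.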

---


New submissions
===============


Title: Test-Time Adaptation for Visual Document Understanding

Abstract: For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet effective adaptation of such representations to distribution shifts at test time remains an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents that performs source-free domain adaptation using unlabeled target document data. DocTTA leverages cross-modality self-supervised learning via masked visual language modeling, as well as pseudo labeling, to adapt models learned on a source domain to an unlabeled target domain at test time. We introduce new benchmarks using existing public datasets for various VDU tasks, including entity recognition, key-value extraction, and document visual question answering. On these benchmarks, DocTTA improves significantly over the source model: by up to 1.89% (F1 score), 3.43% (F1 score), and 17.68% (ANLS score), respectively.

URL: https://openreview.net/forum?id=zshemTAa6U
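The pseudo-labeling ingredient mentioned in the abstract can be illustrated abstractly. This sketch shows only the standard confidence-thresholded selection step, with made-up probabilities standing in for a source model's softmax outputs; it is not DocTTA's actual masked visual language modeling objective:

```python
# Sketch of confidence-thresholded pseudo labeling, one ingredient of
# test-time adaptation. The probabilities below are invented stand-ins
# for a source model's per-class softmax outputs on target documents.

def select_pseudo_labels(probs, threshold=0.9):
    """Keep only predictions the source model is confident about;
    the surviving (index, label) pairs become self-training targets."""
    selected = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            selected.append((i, p.index(conf)))
    return selected

# Unlabeled target "documents": per-example class probabilities.
target_probs = [
    [0.95, 0.03, 0.02],  # confident -> pseudo-labeled as class 0
    [0.40, 0.35, 0.25],  # uncertain -> discarded
    [0.05, 0.92, 0.03],  # confident -> pseudo-labeled as class 1
]
pseudo = select_pseudo_labels(target_probs)
```

The threshold trades label coverage against label noise; in a full test-time adaptation loop the retained pairs would be used to fine-tune the model on the target domain without any source data.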

---

Title: Using Confounded Data in Reinforcement Learning

Abstract: In the presence of confounding, naively using off-the-shelf offline reinforcement learning (RL) algorithms leads to sub-optimal behaviour. In this work, we propose a safe method to exploit confounded offline data in model-based RL, which improves the sample-efficiency of an interactive agent that also collects online, unconfounded data. First, we import ideas from the well-established framework of do-calculus to express model-based RL as a causal inference problem, thus bridging the gap between the fields of RL and causality. Then, we propose a generic method for learning a causal transition model from offline and online data, which captures and corrects the confounding effect using a hidden latent variable. We prove that our method is correct and efficient, in the sense that it attains better generalization guarantees thanks to the confounded offline data (in the asymptotic case), regardless of the confounding effect (the offline expert's behaviour). We showcase our method on a series of synthetic experiments, which demonstrate that a) using confounded offline data naively degrades the sample-efficiency of an RL agent; b) using confounded offline data correctly improves sample-efficiency.

URL: https://openreview.net/forum?id=nFWRuJXPkU
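The confounding problem the abstract starts from can be made concrete with a toy simulation. This is not the paper's method; it is a minimal invented example showing why naive conditioning on confounded offline data gives a biased transition estimate, while intervening (the do-calculus view) does not:

```python
import random

# Toy illustration of confounding in offline RL data: a hidden
# variable u drives both the offline expert's action and the outcome,
# so the naive conditional estimate P(success | a=1) differs from the
# interventional quantity P(success | do(a=1)). All numbers invented.
random.seed(0)

def step(u):
    # Outcome depends only on the hidden confounder u, not the action.
    return 1 if random.random() < (0.8 if u else 0.2) else 0

# Offline (confounded) data: the expert's action copies u.
offline = []
for _ in range(20000):
    u = random.randint(0, 1)
    a = u                      # action confounded with hidden state
    offline.append((a, step(u)))

succ_a1 = [r for a, r in offline if a == 1]
naive = sum(succ_a1) / len(succ_a1)          # close to 0.8: biased

# Online (unconfounded) data: the action is imposed regardless of u,
# i.e. an intervention do(a=1).
online = [step(random.randint(0, 1)) for _ in range(20000)]
interventional = sum(online) / len(online)   # close to 0.5: unbiased
```

The gap between `naive` and `interventional` is exactly the failure mode the abstract's point (a) refers to: an agent that trusts the naive estimate overrates action 1.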

---

Title: Semantic Self-adaptation: Enhancing Generalization with a Single Sample

Abstract: The lack of out-of-domain generalization is a critical weakness of deep networks for semantic segmentation. Previous studies relied on the assumption of a static model, i.e., once the training process is complete, model parameters remain fixed at test time. In this work, we challenge this premise with a self-adaptive approach for semantic segmentation that adjusts the inference process to each input sample. Self-adaptation operates on two levels. First, it fine-tunes the parameters of convolutional layers to the input image using consistency regularization. Second, in Batch Normalization layers, it interpolates between the training distribution and a reference distribution derived from a single test sample. Although both techniques are well known in the literature, we find, surprisingly, that their combination sets a new state of the art in accuracy on synthetic-to-real generalization benchmarks. Our empirical study suggests that self-adaptation may complement the established practice of model regularization at training time for improving deep network generalization to out-of-domain data.

URL: https://openreview.net/forum?id=ILNqQhGbLx
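The second level of self-adaptation, interpolating Batch Normalization statistics, is simple enough to sketch directly. The function below blends stored training-time statistics with statistics estimated from a single test sample; the mixing weight `alpha` and all numbers are illustrative, not the paper's settings:

```python
# Sketch of Batch Normalization statistics interpolation at test time:
# blend the stored training statistics with statistics computed from
# the single test sample. `alpha` is an illustrative hyperparameter.

def interpolate_bn_stats(train_mean, train_var, x, alpha=0.9):
    """Blend training-time and test-sample statistics for one channel."""
    n = len(x)
    test_mean = sum(x) / n
    test_var = sum((v - test_mean) ** 2 for v in x) / n
    mean = alpha * train_mean + (1 - alpha) * test_mean
    var = alpha * train_var + (1 - alpha) * test_var
    return mean, var

def normalize(x, mean, var, eps=1e-5):
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

# One channel of activations from a single, distribution-shifted
# test sample (invented values):
x = [2.0, 2.5, 3.0, 3.5]
mean, var = interpolate_bn_stats(train_mean=0.0, train_var=1.0, x=x)
y = normalize(x, mean, var)
```

With `alpha=1` this reduces to standard inference (frozen training statistics); with `alpha=0` it normalizes purely with the test sample's own statistics. The interpolation hedges between the two when the test distribution has shifted.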

---

Title: JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

Abstract: In this paper we present an empirical study of non-transitivity in perfect-information games, focusing on Xiangqi, a traditional Chinese board game with game-tree complexity similar to that of chess and shogi. After analyzing over 10,000 human Xiangqi game records, we demonstrate that the game’s strategic structure contains both transitive and non-transitive components. To address non-transitivity, we propose the JiangJun algorithm, which combines Monte-Carlo Tree Search (MCTS) with Policy Space Response Oracles (PSRO) to find an approximate Nash equilibrium. We evaluate the algorithm empirically through a WeChat mini program, achieving Master level with a 99.39% win rate against human players. The algorithm’s effectiveness in overcoming non-transitivity is confirmed by relative population performance and visualization results.

URL: https://openreview.net/forum?id=MMsyqXIJuk
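Non-transitivity, the structural property the paper tackles, is easy to illustrate with a toy win-rate matrix. The matrix below is invented (it is rock-paper-scissors in disguise, not Xiangqi data): each strategy beats one other and loses to another, so no single Elo-style ranking can order them, and no strategy dominates:

```python
# Toy illustration of non-transitivity. The win rates are invented:
# A beats B, B beats C, yet C beats A, so the strategies form a cycle
# and no scalar rating can rank them consistently.

strategies = ["A", "B", "C"]
win_rate = {  # win_rate[(i, j)] = probability that i beats j
    ("A", "B"): 0.7, ("B", "A"): 0.3,
    ("B", "C"): 0.7, ("C", "B"): 0.3,
    ("C", "A"): 0.7, ("A", "C"): 0.3,
}

def beats(i, j):
    return win_rate[(i, j)] > 0.5

# In a purely transitive game, some strategy beats every other one.
has_dominant = any(
    all(beats(i, j) for j in strategies if j != i) for i in strategies
)
cycle = beats("A", "B") and beats("B", "C") and beats("C", "A")
```

Cycles like this are why the paper pairs MCTS with population-based PSRO: improving against a single opponent is not enough when strategic strength is not a total order.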

---
