Daily TMLR digest for Jun 13, 2024


TMLR

Jun 13, 2024, 12:00:08 AM
to tmlr-anno...@googlegroups.com


New certifications
==================

Featured Certification: Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments

Ali Devran Kara, Serdar Yuksel

https://openreview.net/forum?id=1Yp6xpTV55

---


Accepted papers
===============


Title: Solving the Tree Containment Problem Using Graph Neural Networks

Authors: Arkadiy Dushatskiy, Esther Julien, Leen Stougie, Leo van Iersel

Abstract: \textsc{Tree containment} is a fundamental problem in phylogenetics, useful for verifying a proposed phylogenetic network that represents the evolutionary history of certain species. \textsc{Tree containment} asks whether a given phylogenetic tree (for instance, constructed from a DNA fragment showing tree-like evolution) is contained in a given phylogenetic network. In the general case, this is an NP-complete problem. We propose to solve it approximately using Graph Neural Networks. In particular, we propose to combine the given network and the tree and apply a Graph Neural Network to this network-tree graph. This way, we can solve tree containment instances representing a larger number of species than the instances contained in the training dataset (i.e., our algorithm has inductive learning ability). Our algorithm demonstrates an accuracy of over $95\%$ in solving the tree containment problem on instances with up to 100 leaves.

URL: https://openreview.net/forum?id=nK5MazeIpn
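
A minimal sketch of the idea, not the authors' code: merge the network and the candidate tree into one "network-tree" graph, mark each node with which structure it came from, and let a small message-passing GNN score whether the tree is contained. The feature layout, the toy graph, and the TinyGNN module below are assumptions made purely for illustration.

import torch
import torch.nn as nn

class TinyGNN(nn.Module):
    """A few rounds of mean-aggregation message passing, then a graph-level readout."""
    def __init__(self, in_dim=4, hidden=32, rounds=3):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.update = nn.ModuleList(nn.Linear(2 * hidden, hidden) for _ in range(rounds))
        self.readout = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        # x: [num_nodes, in_dim] node features; adj: [num_nodes, num_nodes] 0/1 adjacency
        h = torch.relu(self.embed(x))
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        for layer in self.update:
            msg = adj @ h / deg  # mean over neighbours
            h = torch.relu(layer(torch.cat([h, msg], dim=-1)))
        return torch.sigmoid(self.readout(h.mean(dim=0)))  # containment probability

# Toy "network-tree" graph (hypothetical): nodes from both structures, with feature
# flags marking network vs. tree membership, internal vs. leaf, and shared leaf labels.
x = torch.tensor([[1., 0., 0., 1.],   # network leaf "A"
                  [1., 0., 1., 0.],   # network internal node
                  [0., 1., 0., 1.],   # tree leaf "A", linked to its network counterpart
                  [0., 1., 1., 0.]])  # tree internal node
adj = torch.tensor([[0., 1., 1., 0.],
                    [1., 0., 0., 0.],
                    [1., 0., 0., 1.],
                    [0., 0., 1., 0.]])
print(TinyGNN()(x, adj))  # a single probability that the tree is contained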

---

Title: A Simple Video Segmenter by Tracking Objects Along Axial Trajectories

Authors: Ju He, Qihang Yu, Inkyu Shin, Xueqing Deng, Alan Yuille, Xiaohui Shen, Liang-Chieh Chen

Abstract: Video segmentation requires consistently segmenting and tracking objects over time. Due to the quadratic dependency on input size, directly applying self-attention to video segmentation with high-resolution input features poses significant challenges, often leading to GPU Out-Of-Memory errors. Consequently, modern video segmenters either extend an image segmenter without incorporating any temporal attention or resort to window space-time attention in a naive manner. In this work, we present Axial-VS, a general and simple framework that enhances video segmenters by tracking objects along axial trajectories. The framework tackles video segmentation through two sub-tasks: short-term within-clip segmentation and long-term cross-clip tracking. In the first step, Axial-VS augments an off-the-shelf clip-level video segmenter with the proposed axial-trajectory attention, sequentially tracking objects along the height- and width-trajectories within a clip, thereby enhancing temporal consistency by capturing motion trajectories. The axial decomposition significantly reduces the computational complexity for dense features, and outperforms the window space-time attention in segmentation quality. In the second step, we further apply axial-trajectory attention to the object queries in clip-level segmenters, which are learned to encode object information, thereby aiding object tracking across different clips and achieving consistent segmentation throughout the video. Without bells and whistles, Axial-VS showcases state-of-the-art results on video segmentation benchmarks, emphasizing its effectiveness in addressing the limitations of modern clip-level video segmenters. Code will be made available.

URL: https://openreview.net/forum?id=Sy6ZOStz5v
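
A hedged sketch of the axial decomposition described above, not the released Axial-VS code: full space-time attention is split into one pass along height trajectories and one along width trajectories, each implemented with standard multi-head attention. The tensor layout and module names are assumptions.

import torch
import torch.nn as nn

class AxialTrajectoryAttention(nn.Module):
    """Two cheap attention passes instead of one full space-time attention."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn_h = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_w = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: [B, T, H, W, C] clip features
        B, T, H, W, C = x.shape
        # Height-trajectory pass: attend over the (T*H) axis for every width column.
        h_seq = x.permute(0, 3, 1, 2, 4).reshape(B * W, T * H, C)
        h_seq = h_seq + self.attn_h(h_seq, h_seq, h_seq, need_weights=False)[0]
        x = h_seq.reshape(B, W, T, H, C).permute(0, 2, 3, 1, 4)
        # Width-trajectory pass: attend over the (T*W) axis for every height row.
        w_seq = x.permute(0, 2, 1, 3, 4).reshape(B * H, T * W, C)
        w_seq = w_seq + self.attn_w(w_seq, w_seq, w_seq, need_weights=False)[0]
        return w_seq.reshape(B, H, T, W, C).permute(0, 2, 1, 3, 4)

# Each pass attends over sequences of length T*H or T*W rather than T*H*W,
# avoiding the quadratic blow-up of full space-time attention.
feats = torch.randn(1, 4, 16, 16, 64)  # a tiny clip: 4 frames of 16x16 feature maps
print(AxialTrajectoryAttention()(feats).shape)  # torch.Size([1, 4, 16, 16, 64])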

---

Title: Targeted Active Learning for Bayesian Decision-Making

Authors: Louis Filstroff, Iiris Sundin, Petrus Mikkola, Aleksei Tiulpin, Juuso Kylmäoja, Samuel Kaski

Abstract: Active learning is usually applied to acquire labels of informative data points in supervised learning, to maximize accuracy in a sample-efficient way. However, maximizing the supervised learning accuracy is not the end goal when the results are used for decision-making, for example in personalized medicine or economics. We argue that when acquiring samples sequentially, the common practice of separating learning and decision-making is sub-optimal, and we introduce an active learning strategy that takes the down-the-line decision problem into account. Specifically, we adopt a Bayesian experimental design approach, in which the proposed acquisition criterion maximizes the expected information gain on the posterior distribution of the optimal decision. We compare our targeted active learning strategy to existing alternatives on both simulated and real data and show improved performance in decision-making accuracy.

URL: https://openreview.net/forum?id=KxPjuiMgmm
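
A toy sketch of the acquisition idea under strong simplifying assumptions of my own (1-D Bayesian linear regression, a binary downstream decision, a sample-based posterior), not the paper's code: the next query point is chosen to maximize the expected reduction in entropy of the posterior over the optimal decision, estimated by Monte Carlo over hypothetical outcomes.

import numpy as np

rng = np.random.default_rng(0)
theta_samples = rng.normal(0.2, 1.0, size=2000)  # current posterior over the model parameter
noise_std = 0.5

def decision_entropy(weights):
    """Entropy (nats) of the induced distribution over the binary optimal decision."""
    p_treat = np.sum(weights * (theta_samples > 0)) / np.sum(weights)
    p = np.clip(np.array([p_treat, 1.0 - p_treat]), 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def expected_info_gain(x, n_y=64):
    """Monte Carlo estimate of H(decision) - E_y[H(decision | y observed at x)]."""
    prior_h = decision_entropy(np.ones_like(theta_samples))
    # Sample hypothetical outcomes y from the posterior predictive at x.
    ys = rng.choice(theta_samples, size=n_y) * x + rng.normal(0.0, noise_std, size=n_y)
    post_h = 0.0
    for y in ys:
        # Reweight posterior samples by the likelihood of the hypothetical observation.
        w = np.exp(-0.5 * ((y - theta_samples * x) / noise_std) ** 2)
        post_h += decision_entropy(w) / n_y
    return prior_h - post_h

candidates = np.linspace(-2.0, 2.0, 9)
scores = [expected_info_gain(x) for x in candidates]
print("next query:", candidates[int(np.argmax(scores))])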

---

Title: Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments

Authors: Ali Devran Kara, Serdar Yuksel

Abstract: As a primary contribution, we present a convergence theorem for stochastic iterations, and in particular, Q-learning iterates, under a general, possibly non-Markovian, stochastic environment. Our conditions for convergence involve an ergodicity and a positivity criterion. We provide a precise characterization of the limit of the iterates and conditions on the environment and initializations for convergence. As our second contribution, we discuss the implications and applications of this theorem to a variety of stochastic control problems with non-Markovian environments involving (i) quantized approximations of fully observed Markov Decision Processes (MDPs) with continuous spaces (where quantization breaks down the Markovian structure), (ii) quantized approximations of belief-MDP reduced partially observable MDPs (POMDPs) with weak Feller continuity and a mild version of filter stability (which requires the knowledge of the model by the controller), (iii) finite window approximations of POMDPs under a uniform controlled filter stability (which does not require the knowledge of the model), and (iv) multi-agent models, where convergence of learning dynamics to a new class of equilibria, subjective Q-learning equilibria, is studied. In addition to the convergence theorem itself, some of these implications are new to the literature, and others are interpreted as applications of the convergence theorem. Some open problems are noted.

URL: https://openreview.net/forum?id=1Yp6xpTV55
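
As a loose illustration of Q-learning iterates running on a non-Markovian information state (this toy is mine, not one of the paper's constructions): tabular Q-learning in which the agent conditions on a finite window of recent observations from a small partially observed chain. The environment, window length, and hyper-parameters below are assumptions.

import random
from collections import defaultdict

random.seed(0)
k, n_obs, n_act = 2, 3, 2            # window length, observation / action alphabet sizes
alpha, gamma, eps = 0.1, 0.9, 0.1
Q = defaultdict(float)               # Q[(window, action)]

def step(hidden, action):
    """A tiny partially observed chain: the hidden state moves, the agent sees a noisy label."""
    hidden = (hidden + (1 if action == 1 else -1)) % n_obs
    obs = hidden if random.random() < 0.8 else random.randrange(n_obs)
    reward = 1.0 if hidden == 0 else 0.0
    return hidden, obs, reward

hidden, window = 0, (0,) * k
for t in range(50_000):
    # epsilon-greedy on the windowed (non-Markovian) information state
    if random.random() < eps:
        a = random.randrange(n_act)
    else:
        a = max(range(n_act), key=lambda b: Q[(window, b)])
    hidden, obs, r = step(hidden, a)
    next_window = window[1:] + (obs,)
    # standard Q-learning update, run on the window state
    best_next = max(Q[(next_window, b)] for b in range(n_act))
    Q[(window, a)] += alpha * (r + gamma * best_next - Q[(window, a)])
    window = next_window

print({w: round(max(Q[(w, b)] for b in range(n_act)), 2) for w in {key[0] for key in Q}})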

---


New submissions
===============


Title: Highway Graph to Accelerate Reinforcement Learning

Abstract: Reinforcement Learning (RL) algorithms often suffer from low training efficiency. A strategy to mitigate this issue is to incorporate a model-based planning algorithm, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), into the environment model. The major limitation of VI is the need to iterate over a large tensor of shape $|\mathcal{S}|\times |\mathcal{A}| \times |\mathcal{S}|$, where $\mathcal{S}$ and $\mathcal{A}$ denote the state and action spaces. This process iteratively updates the value of the preceding state $s_{t-1}$ based on that of the state $s_t$, one step at a time, via value propagation, which still leads to intensive computation. We focus on improving the training efficiency of RL algorithms by improving the efficiency of the value learning process. For deterministic environments with discrete state and action spaces, a non-branching sequence of transitions in the sampled empirical state-transition graph can bring the agent directly from $s_0$ to $s_T$ without deviating at intermediate states; we call such a sequence a \textit{highway}. On such non-branching highways, the value update can be merged into a single step instead of iterating the value step by step. Based on this observation, we propose a novel graph structure, named the \textit{highway graph}, to model the state transitions. Our highway graph compresses the transition model into a concise graph, where edges can represent multiple state transitions and thus support value propagation across multiple time steps in each iteration. We can thus obtain a more efficient value learning approach by running the VI algorithm on highway graphs. By integrating the highway graph into RL (as a model-based off-policy RL method), RL training can be remarkably accelerated in the early stages (within 1 million frames). Moreover, a deep neural network-based agent is trained using the highway graph, resulting in better generalization and lower storage costs. Comparison against various baselines on four categories of environments reveals that our method outperforms both representative and novel model-free and model-based RL algorithms, achieving a 10- to more-than-150-fold improvement in efficiency while maintaining an equal or superior expected return, as confirmed by carefully conducted analyses.

URL: https://openreview.net/forum?id=3mJZfL77WM
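
A hedged sketch of the core idea, not the authors' implementation: collapse non-branching chains of deterministic transitions into single "highway" edges that carry the accumulated discounted reward and the remaining discount factor, then run plain value iteration on the compressed graph. The toy transition table and helper names are assumptions.

gamma = 0.95
# transitions[s][a] = (next_state, reward); a small deterministic toy graph (assumed).
transitions = {
    0: {0: (1, 0.0)}, 1: {0: (2, 0.0)}, 2: {0: (3, 0.0)},  # a non-branching corridor
    3: {0: (4, 1.0), 1: (0, 0.0)},                          # a branching state
    4: {0: (3, 0.0)},
}

def build_highways(transitions):
    """Merge chains of single-action states into one multi-step 'highway' edge."""
    highways = {}
    for s, acts in transitions.items():
        for a, (nxt, r) in acts.items():
            total_r, disc, steps = r, gamma, 1
            # Follow the chain while the next state is non-branching (step guard stops loops).
            while len(transitions[nxt]) == 1 and steps < len(transitions):
                nxt, r2 = next(iter(transitions[nxt].values()))
                total_r += disc * r2
                disc *= gamma
                steps += 1
            highways.setdefault(s, {})[a] = (nxt, total_r, disc)
    return highways

def value_iteration(edges, n_iter=200):
    """Plain VI, but each edge may stand for many environment steps."""
    V = {s: 0.0 for s in edges}
    for _ in range(n_iter):
        V = {s: max(r + d * V.get(nxt, 0.0) for (nxt, r, d) in acts.values())
             for s, acts in edges.items()}
    return V

print(value_iteration(build_highways(transitions)))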

---
