Weekly TMLR digest for Sep 01, 2024

TMLR

Sep 1, 2024, 12:00:10 AM

to tmlr-annou...@googlegroups.com

New certifications
==================

Featured Certification: Equivariant Graph Network Approximations of High-Degree Polynomials for Force Field Prediction

Zhao Xu, Haiyang Yu, Montgomery Bohde, Shuiwang Ji

https://openreview.net/forum?id=7DAFwp0Vne

---

Survey Certification: Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective

Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

https://openreview.net/forum?id=HduK51xNtS

---

Reproducibility Certification: Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds

Kamalika Chaudhuri, Chuan Guo, Laurens van der Maaten, Saeed Mahloujifar, Mark Tygert

https://openreview.net/forum?id=DOWSP7y2cu

---

Expert Certification: Deconfounding Imitation Learning with Variational Inference

Risto Vuorio, Pim De Haan, Johann Brehmer, Hanno Ackermann, Daniel Dijkman, Taco Cohen

https://openreview.net/forum?id=3FsVtsISHW

---

Accepted papers
===============

Title: Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis

Authors: Chiu-Chou Lin, Yu-Wei Shih, Kuei-Ting Kuo, Yu-Cheng Chen, Chien-Hua Chen, Wei-Chen Chiu, I-Chen Wu

Abstract: \textbf{How can balance be quantified in game settings?} This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions—such as hero combinations in multiplayer online battle arena (MOBA) games or decks in card games—is essential for enhancing gameplay and achieving balance. We have developed two advanced measures that extend beyond the simplistic win rate to quantify balance in zero-sum competitive scenarios. These measures are derived from win value estimations, which employ strength rating approximations via the Bradley-Terry model and counter relationship approximations via vector quantization, significantly reducing the computational complexity associated with traditional win value estimations. Throughout the learning process of these models, we identify useful categories of compositions and pinpoint their counter relationships, aligning with the experiences of human players without requiring specific game knowledge. Our methodology hinges on a simple technique to enhance codebook utilization in discrete representation with a deterministic vector quantization process for an extremely small state space. Our framework has been validated in popular online games, including \textit{Age of Empires II}, \textit{Hearthstone}, \textit{Brawl Stars}, and \textit{League of Legends}. The accuracy of the observed strength relations in these games is comparable to traditional pairwise win value predictions, while also offering a more manageable complexity for analysis. Ultimately, our findings contribute to a deeper understanding of PvP game dynamics and present a methodology that significantly improves game balance evaluation and design.

URL: https://openreview.net/forum?id=2D36otXvBE
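
The abstract above names the Bradley-Terry model as the basis for its strength ratings. As a rough illustration only (not the authors' code), the standard Bradley-Terry win probability can be sketched as below. The function name and the example ratings are hypothetical, and the vector-quantized counter-relationship component is not covered.

```python
import math


def bradley_terry_win_prob(rating_a: float, rating_b: float) -> float:
    """Probability that composition A beats composition B.

    Ratings are log-strengths, so this equals
    exp(rating_a) / (exp(rating_a) + exp(rating_b)).
    """
    return 1.0 / (1.0 + math.exp(rating_b - rating_a))


# Hypothetical ratings for two team compositions.
p_ab = bradley_terry_win_prob(1.2, 0.4)
p_ba = bradley_terry_win_prob(0.4, 1.2)
print(f"P(A beats B) = {p_ab:.3f}, P(B beats A) = {p_ba:.3f}")
```

Because a single scalar rating per composition implies a transitive ordering, a model of this form cannot by itself express rock-paper-scissors style counters, which is presumably why the abstract pairs it with a separate counter-relationship approximation.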

---

Title: Perceptual Similarity for Measuring Decision-Making Style and Policy Diversity in Games

Authors: Chiu-Chou Lin, Wei-Chen Chiu, I-Chen Wu

Abstract: Defining and measuring decision-making styles, also known as playstyles, is crucial in gaming, where these styles reflect a broad spectrum of individuality and diversity. However, finding a universally applicable measure for these styles poses a challenge. Building on $\textit{Playstyle Distance}$, the first unsupervised metric to measure playstyle similarity based on game screens and raw actions by identifying comparable states with discrete representations for computing policy distance, we introduce three enhancements to increase accuracy: multiscale analysis with varied state granularity, a perceptual kernel rooted in psychology, and the utilization of the intersection-over-union method for efficient evaluation. These innovations not only advance measurement precision but also offer insights into human cognition of similarity. Across two racing games and seven Atari games, our techniques significantly improve the precision of zero-shot playstyle classification, achieving an accuracy exceeding 90\% with fewer than 512 observation-action pairs—less than half an episode of these games. Furthermore, our experiments with $\textit{2048}$ and $\textit{Go}$ demonstrate the potential of discrete playstyle measures in puzzle and board games. We also develop an algorithm for assessing decision-making diversity using these measures. Our findings improve the measurement of end-to-end game analysis and the evolution of artificial intelligence for diverse playstyles.

URL: https://openreview.net/forum?id=30C9AWBW49

---

Title: Equivariant Graph Network Approximations of High-Degree Polynomials for Force Field Prediction

Authors: Zhao Xu, Haiyang Yu, Montgomery Bohde, Shuiwang Ji

Abstract: Recent advancements in equivariant deep models have shown promise in accurately predicting atomic potentials and force fields in molecular dynamics simulations. Using spherical harmonics (SH) and tensor products (TP), these equivariant networks gain enhanced physical understanding, like symmetries and many-body interactions. Beyond encoding physical insights, SH and TP are also crucial to represent equivariant polynomial functions. In this work, we analyze the equivariant polynomial functions for the equivariant architecture, and introduce a novel equivariant network, named PACE. The proposed PACE utilizes edge booster and the Atomic Cluster Expansion (ACE) technique to approximate a greater number of $SE(3) \times S_n$ equivariant polynomial functions with enhanced degrees. As experimented in commonly used benchmarks, PACE demonstrates state-of-the-art performance in predicting atomic energy and force fields, with robust generalization capability across various geometric distributions under molecular dynamics (MD) across different temperature conditions. Our code is publicly available as part of the AIRS library \url{https://github.com/divelab/AIRS/}.

URL: https://openreview.net/forum?id=7DAFwp0Vne

---

Title: AdaFlood: Adaptive Flood Regularization

Authors: Wonho Bae, Yi Ren, Mohamed Osama Ahmed, Frederick Tung, Danica J. Sutherland, Gabriel L. Oliveira

Abstract: Although neural networks are conventionally optimized towards zero training loss, it has been recently learned that targeting a non-zero training loss threshold, referred to as a flood level, often enables better test time generalization. Current approaches, however, apply the same constant flood level to all training samples, which inherently assumes all the samples have the same difficulty. We present AdaFlood, a novel flood regularization method that adapts the flood level of each training sample according to the difficulty of the sample. Intuitively, since training samples are not equal in difficulty, the target training loss should be conditioned on the instance. Experiments on datasets covering four diverse input modalities -- text, images, asynchronous event sequences, and tabular -- demonstrate the versatility of AdaFlood across data domains and noise levels.

URL: https://openreview.net/forum?id=2s5YU6CSEz
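
To make the idea of a per-sample flood level concrete, here is a minimal sketch. It assumes the flooding objective has the form |loss - b| + b, as in earlier flooding work, applied per sample with an individual level b_i. How AdaFlood actually estimates each sample's level is not described in the abstract, so the levels below are hypothetical placeholders.

```python
def flooded_loss(per_sample_losses, flood_levels):
    """Average of |loss_i - b_i| + b_i over the batch.

    When loss_i is above its flood level b_i the term equals loss_i
    (ordinary descent); when it is below, the term equals 2 * b_i - loss_i,
    so minimizing it pushes loss_i back up towards b_i.
    """
    if len(per_sample_losses) != len(flood_levels):
        raise ValueError("need one flood level per sample")
    terms = [abs(loss - b) + b for loss, b in zip(per_sample_losses, flood_levels)]
    return sum(terms) / len(terms)


# Hypothetical batch: an easy sample gets a low flood level, a hard one a higher level.
losses = [0.05, 0.90]
levels = [0.10, 0.60]
print(flooded_loss(losses, levels))
```

Setting every entry of `flood_levels` to the same constant gives a per-sample form of the constant-level scheme that the abstract contrasts against.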

---

Title: Mitigating Group Bias in Federated Learning: Beyond Local Fairness

Authors: Ganghua Wang, Ali Payani, Myungjin Lee, Ramana Rao Kompella

Abstract: The issue of group fairness in machine learning models, where certain sub-populations or groups are favored over others, has been recognized for some time. While many mitigation strategies have been proposed in centralized learning, many of these methods are not directly applicable in federated learning, where data is privately stored on multiple clients. To address this, many proposals try to mitigate bias at the level of clients before aggregation, which we call locally fair training. However, the effectiveness of these approaches is not well understood. In this work, we investigate the theoretical foundation of locally fair training by studying the relationship between global model fairness and local model fairness. Additionally, we prove that for a broad class of fairness metrics, the global model's fairness can be obtained using only summary statistics from local clients. Based on that, we propose a globally fair training algorithm that optimizes the fairness-regularized empirical loss. Real-data experiments demonstrate the promising performance of our proposed approach for enhancing fairness while retaining high accuracy compared to locally fair training methods.

URL: https://openreview.net/forum?id=ANXoddnzct

---

Title: A Lennard-Jones Layer for Distribution Normalization

Authors: Mulun Na, Jonathan Klein, Biao Zhang, Wojtek Palubicki, Soren Pirk, Dominik Michels

Abstract: We introduce the Lennard-Jones layer (LJL) for the equalization of the density of 2D and 3D point clouds through systematically rearranging points without destroying their overall structure (distribution normalization). LJL simulates a dissipative process of repulsive and weakly attractive interactions between individual points by considering the nearest neighbor of each point at a given moment in time. This pushes the particles into a potential valley, reaching a well-defined stable configuration that approximates an equidistant sampling after the stabilization process. We apply LJLs to redistribute randomly generated point clouds into a randomized uniform distribution. Moreover, LJLs are embedded in the generation process of point cloud networks by adding them at later stages of the inference process. The improvements in 3D point cloud generation utilizing LJLs are evaluated qualitatively and quantitatively. Finally, we apply LJLs to improve the point distribution of a score-based 3D point cloud denoising network. In general, we demonstrate that LJLs are effective for distribution normalization which can be applied at negligible cost without retraining the given neural network.

URL: https://openreview.net/forum?id=imGl7xItqQ

---

Title: Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective

Authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract: Graphs are a natural representation for systems based on relations between connected entities. Combinatorial optimization problems, which arise when considering an objective function related to a process of interest on discrete structures, are often challenging due to the rapid growth of the solution space. The trial-and-error paradigm of Reinforcement Learning has recently emerged as a promising alternative to traditional methods, such as exact algorithms and (meta)heuristics, for discovering better decision-making strategies in a variety of disciplines including chemistry, computer science, and statistics. Despite the fact that they arose in markedly different fields, these techniques share significant commonalities. Therefore, we set out to synthesize this work in a unifying perspective that we term Graph Reinforcement Learning, interpreting it as a constructive decision-making method for graph problems. After covering the relevant technical background, we review works along the dividing line of whether the goal is to optimize graph structure given a process of interest, or to optimize the outcome of the process itself under fixed graph structure. Finally, we discuss the common challenges facing the field and open research questions. In contrast with other surveys, the present work focuses on non-canonical graph problems for which performant algorithms are typically not known and Reinforcement Learning is able to provide efficient and effective solutions.

URL: https://openreview.net/forum?id=HduK51xNtS

---

Title: Sequential Best-Arm Identification with Application to P300 Speller

Authors: Xin Zhou, Botao Hao, Tor Lattimore, Jian Kang, Lexin Li

Abstract: A brain-computer interface (BCI) is an advanced technology that facilitates direct communication between the human brain and a computer system, by enabling individuals to interact with devices using only their thoughts. The P300 speller is a primary type of BCI system, which allows users to spell words without using a physical keyboard, but instead by capturing and interpreting brain electroencephalogram (EEG) signals under different stimulus presentation paradigms. Traditional non-adaptive presentation paradigms, however, treat each word selection as an isolated event, resulting in a lengthy learning process. To enhance efficiency, we cast the problem as a sequence of best-arm identification tasks within the context of multi-armed bandits, where each task corresponds to the interaction between the user and the system for a single character or word. Leveraging large language models, we utilize the prior knowledge learned from previous tasks to inform and facilitate subsequent tasks. We propose a sequential top-two Thompson sampling algorithm under two scenarios: the fixed-confidence setting and the fixed-budget setting. We study the theoretical property of the proposed algorithm, and demonstrate its substantial empirical improvement through both simulations as well as the data generated from a P300 speller simulator that was built upon the real BCI experiments.

URL: https://openreview.net/forum?id=QweNIIqvZf

---

Title: EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval

Authors: Ramnath Kumar, Anshul Mittal, Nilesh Gupta, Aditya Kusupati, Inderjit S Dhillon, Prateek Jain

Abstract: Dense embedding-based retrieval is widely used for semantic search and ranking. However, conventional two-stage approaches, involving contrastive embedding learning followed by approximate nearest neighbor search (ANNS), can suffer from misalignment between these stages. This mismatch degrades retrieval performance. We propose End-to-end Hierarchical Indexing (EHI), a novel method that directly addresses this issue by jointly optimizing embedding generation and ANNS structure. EHI leverages a dual encoder for embedding queries and documents while simultaneously learning an inverted file index (IVF)-style tree structure. To facilitate the effective learning of this discrete structure, EHI introduces dense path embeddings that encode the path traversed by queries and documents within the tree. Extensive evaluations on standard benchmarks, including MS MARCO (Dev set) and TREC DL19, demonstrate EHI's superiority over traditional ANNS indices. Under the same computational constraints, EHI outperforms existing state-of-the-art methods by +1.45% in MRR@10 on MS MARCO (Dev) and +8.2% in nDCG@10 on TREC DL19, highlighting the benefits of our end-to-end approach.

URL: https://openreview.net/forum?id=GeLLOGsHV9

---

Title: On the Data Heterogeneity in Adaptive Federated Learning

Authors: Yujia Wang, Jinghui Chen

Abstract: Adaptive federated learning, which benefits from the characteristics of both adaptive optimizers and the federated training paradigm, has recently gained a lot of attention. Despite achieving outstanding performance on tasks with heavy-tail stochastic gradient noise distributions, adaptive federated learning also suffers from the same data heterogeneity issue as standard federated learning: heterogeneous data distribution across the clients can largely deteriorate the convergence of adaptive federated learning. In this paper, we propose a novel adaptive federated learning framework with local gossip averaging to address this issue. Particularly, we introduce a client re-sampling mechanism and peer-to-peer gossip communications between local clients to mitigate the data heterogeneity without requiring additional gradient computation costs. We theoretically prove the fast convergence of our proposed method under non-convex stochastic settings and empirically demonstrate its superior performance over vanilla adaptive federated learning with client sampling. Moreover, we extend our framework to a communication-efficient variant, in which clients are divided into disjoint clusters determined by their connectivity or communication capabilities. We exclusively perform local gossip averaging within these clusters, leading to an enhancement in network communication efficiency for our proposed method.

URL: https://openreview.net/forum?id=hv7iXsiBZE

---

Title: Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Authors: Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Reddy Akula, Pradyumna Narayana, S Basu, William Yang Wang, Xin Eric Wang

Abstract: Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To answer this question, we propose a novel approach, Discriminative Stable Diffusion (Discffusion), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners. Our approach uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information and fine-tune the model via a new attention-based prompt learning to perform image-text matching. By comparing Discffusion with state-of-the-art methods on several benchmark datasets, we demonstrate the potential of using pre-trained diffusion models for discriminative tasks with superior results on few-shot image-text matching.

URL: https://openreview.net/forum?id=GtnipgAomT

---

Title: “Studying How to Efficiently and Effectively Guide Models with Explanations” - A Reproducibility Study

Authors: Adrian Sauter, Milan Miletić, Ryan Ott, Rohith Saai Pemmasani Prabakaran

Abstract: Model guidance describes the approach of regularizing the explanations of a deep neural network model towards highlighting the correct features to ensure that the model is “right for the right reasons”. Rao et al. (2023) conducted an in-depth evaluation of effective and efficient model guidance for object classification across various loss functions, attribution methods, models, and ‘guidance depths’ to study the effectiveness of different methods. Our work aims to (1) reproduce the main results obtained by Rao et al. (2023), and (2) propose several extensions to their research. We conclude that the major part of the original work is reproducible, with certain minor exceptions, which we discuss in this paper. In our extended work, we point to an issue with the Energy Pointing Game (EPG) metric used for evaluation and propose an extension for increasing its robustness. In addition, we observe the EPG metric’s predisposition towards favoring larger bounding boxes, a bias we address by incorporating a corrective penalty term into the original Energy loss function. Furthermore, we revisit the feasibility of using segmentation masks in light of the original study’s finding that minimal annotated data can significantly boost model performance. Our findings suggest that Energy loss inherently guides models to on-object features without the requirement for segmentation masks. Finally, we explore the role of contextual information in object detection and, contrary to the assumption that focusing solely on object-specific features suffices for accurate classification, our findings suggest the importance of contextual cues in certain scenarios. Code available at: https://anonymous.4open.science/r/model_guidance_repro_study.

URL: https://openreview.net/forum?id=9ZzASCVhDF

---

Title: PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans

Authors: Giang Nguyen, Valerie Chen, Mohammad Reza Taesiri, Anh Nguyen

Abstract: Nearest neighbors (NN) are traditionally used to compute final decisions, e.g., in Support Vector Machines or k-NN classifiers, and to provide users with explanations for the model's decision. In this paper, we show a novel utility of nearest neighbors: to improve predictions of a frozen, pretrained image classifier C. We leverage an image comparator S that (1) compares the input image with NN images from the top-K most probable classes given by C; and (2) uses scores from S to weight the confidence scores of C to refine predictions. Our method consistently improves fine-grained image classification accuracy on CUB-200, Cars-196, and Dogs-120. Also, a human study finds that showing users our probable-class nearest neighbors (PCNN) reduces over-reliance on AI, thus improving their decision accuracy over prior work which shows only the most-probable (top-1) class examples.

URL: https://openreview.net/forum?id=OcFjqiJ98b
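
The re-ranking step described in the abstract (scores from the comparator S weight the confidences of the classifier C over the top-K classes) might look roughly like the sketch below. Using a plain product as the weighting is an assumption, since the abstract does not give the exact rule, and the class names and scores are made up for illustration.

```python
def rerank_top_k(classifier_confidences, comparator_scores):
    """Re-rank the top-K classes by confidence from C times score from S.

    Both arguments map class name -> score in [0, 1] for the same K classes.
    The product weighting is an illustrative choice, not the paper's
    confirmed formula.
    """
    if classifier_confidences.keys() != comparator_scores.keys():
        raise ValueError("both score sets must cover the same classes")
    weighted = {
        label: classifier_confidences[label] * comparator_scores[label]
        for label in classifier_confidences
    }
    best = max(weighted, key=weighted.get)
    return best, weighted


# Hypothetical top-3 output of a frozen classifier and comparator scores
# for the nearest-neighbor image of each candidate class.
confidences = {"indigo_bunting": 0.48, "blue_grosbeak": 0.41, "lazuli_bunting": 0.11}
similarities = {"indigo_bunting": 0.35, "blue_grosbeak": 0.92, "lazuli_bunting": 0.20}
label, scores = rerank_top_k(confidences, similarities)
print(label, scores)
```

In this toy input the classifier's top-1 class is overturned because the comparator finds the input much closer to the neighbors of the second class, which is the kind of refinement the abstract describes.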

---

Title: Leveraging Task Structures for Improved Identifiability in Neural Network Representations

Authors: Wenlin Chen, Julien Horwood, Juyeon Heo, José Miguel Hernández-Lobato

Abstract: This work extends the theory of identifiability in supervised learning by considering the consequences of having access to a distribution of tasks. In such cases, we show that linear identifiability is achievable in the general multi-task regression setting. Furthermore, we show that the existence of a task distribution which defines a conditional prior over latent factors reduces the equivalence class for identifiability to permutations and scaling of the true latent factors, a stronger and more useful result than linear identifiability. Crucially, when we further assume a causal structure over these tasks, our approach enables simple maximum marginal likelihood optimization, and suggests potential downstream applications to causal representation learning. Empirically, we find that this straightforward optimization procedure enables our model to outperform more general unsupervised models in recovering canonical representations for both synthetic data and real-world molecular data.

URL: https://openreview.net/forum?id=WLcPrq6pu0

---

Title: Re-Thinking Inverse Graphics With Large Language Models

Authors: Peter Kulits, Haiwen Feng, Weiyang Liu, Victoria Fernandez Abrevaya, Michael J. Black

Abstract: Inverse graphics -- the task of inverting an image into physical variables that, when rendered, enable reproduction of the observed scene -- is a fundamental challenge in computer vision and graphics. Successfully disentangling an image into its constituent elements, such as the shape, color, and material properties of the objects of the 3D scene that produced it, requires a comprehensive understanding of the environment. This complexity limits the ability of existing carefully engineered approaches to generalize across domains. Inspired by the zero-shot ability of large language models (LLMs) to generalize to novel contexts, we investigate the possibility of leveraging the broad world knowledge encoded in such models to solve inverse-graphics problems. To this end, we propose the Inverse-Graphics Large Language Model (IG-LLM), an inverse-graphics framework centered around an LLM, that autoregressively decodes a visual embedding into a structured, compositional 3D-scene representation. We incorporate a frozen pre-trained visual encoder and a continuous numeric head to enable end-to-end training. Through our investigation, we demonstrate the potential of LLMs to facilitate inverse graphics through next-token prediction, without the application of image-space supervision. Our analysis enables new possibilities for precise spatial reasoning about images that exploit the visual knowledge of LLMs. We release our code and data at https://ig-llm.is.tue.mpg.de/ to ensure the reproducibility of our investigation and to facilitate future research.

URL: https://openreview.net/forum?id=u0eiu1MTS7

---

Title: Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds

Authors: Kamalika Chaudhuri, Chuan Guo, Laurens van der Maaten, Saeed Mahloujifar, Mark Tygert

Abstract: Protecting privacy during inference with deep neural networks is possible by adding Gaussian noise to the activations in the last layers prior to the final classifiers or other task-specific layers. The activations in such layers are known as "features" (or, less commonly, as "embeddings" or "feature embeddings"). The added noise helps prevent reconstruction of the inputs from the noisy features. Lower bounding the variance of every possible unbiased estimator of the inputs quantifies the confidentiality arising from such added noise. Convenient, computationally tractable bounds are available from classic inequalities of Hammersley and of Chapman and Robbins -- the HCR bounds. Numerical experiments indicate that the HCR bounds are on the precipice of being effectual for small neural nets with the data sets, "MNIST" and "CIFAR-10," which contain 10 classes each for image classification. The HCR bounds appear to be insufficient on their own to guarantee confidentiality of the inputs to inference with standard deep neural nets, "ResNet-18" and "Swin-T," pre-trained on the data set, "ImageNet-1000," which contains 1000 classes. Supplementing the addition of Gaussian noise to features with other methods for providing confidentiality may be warranted in the case of ImageNet. In all cases, the results reported here limit consideration to amounts of added noise that incur little degradation in the accuracy of classification from the noisy features. Thus, the added noise enhances confidentiality without much reduction in the accuracy on the task of image classification.

URL: https://openreview.net/forum?id=DOWSP7y2cu

---

Title: Path Development Network with Finite-dimensional Lie Group

Authors: Hang Lou, Siran Li, Hao Ni

Abstract: Signature, lying at the heart of rough path theory, is a central tool for analysing controlled differential equations driven by irregular paths. Recently it has also found extensive applications in machine learning and data science as a mathematically principled, universal feature that boosts the performance of deep learning-based models in sequential data tasks. It, nevertheless, suffers from the curse of dimensionality when paths are high-dimensional.

We propose a novel, trainable path development layer, which exploits representations of sequential data through finite-dimensional Lie groups, thus resulting in dimension reduction. Its backpropagation algorithm is designed via optimization on manifolds. Our proposed layer, analogous to recurrent neural networks (RNN), possesses an explicit, simple recurrent unit that alleviates the gradient issues.

Our layer demonstrates its strength in irregular time series modelling. Empirical results on a range of datasets show that the development layer consistently and significantly outperforms signature features on accuracy and dimensionality. The compact hybrid model (stacking one-layer LSTM with the development layer) achieves state-of-the-art against various RNN and continuous time series models. Our layer also enhances the performance of modelling dynamics constrained to Lie groups. Code is available at https://github.com/PDevNet/DevNet.git.

URL: https://openreview.net/forum?id=YoWBLu74TL

---

Title: Biased Dueling Bandits with Stochastic Delayed Feedback

Authors: Bongsoo Yi, Yue Kang, Yao Li

Abstract: The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information retrieval, and more. However, in many real-world applications, the feedback for actions is often subject to unavoidable delays and is not immediately available to the agent. This partially observable issue poses a significant challenge to existing dueling bandit literature, as it significantly affects how quickly and accurately the agent can update their policy on the fly. In this paper, we introduce and examine the biased dueling bandit problem with stochastic delayed feedback, a new practical problem that captures a more realistic and intriguing scenario involving a preference bias between the selections. We present two algorithms designed to handle situations involving delay. Our first algorithm, requiring complete delay distribution information, achieves the optimal regret bound for the dueling bandit problem when there is no delay. The second algorithm is tailored for situations where the distribution is unknown, but only the expected value of delay is available. We provide a comprehensive regret analysis for the two proposed algorithms and then evaluate their empirical performance on both synthetic and real datasets.

URL: https://openreview.net/forum?id=HwAZDVxkLX

---

Title: AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents

Authors: Timothée Mathieu, Matheus Medeiros Centa, Riccardo Della Vecchia, Hector Kohler, Alena Shilova, Odalric-Ambrym Maillard, Philippe Preux

Abstract: Recently, the scientific community has questioned the statistical reproducibility of many empirical results, especially in the field of machine learning. To contribute to the resolution of this reproducibility crisis, we propose a theoretically sound methodology for comparing the performance of a set of algorithms. We exemplify our methodology in Deep Reinforcement Learning (Deep RL). The performance of one execution of a Deep RL algorithm is a random variable. Therefore, several independent executions are needed to evaluate its performance. When comparing algorithms with random performance, a major question concerns the number of executions to perform to ensure that the result of the comparison is theoretically sound. Researchers in Deep RL often use fewer than 5 independent executions to compare algorithms: we claim that this is not enough in general. Moreover, when comparing more than 2 algorithms at once, we have to use a multiple tests procedure to preserve low error guarantees. We introduce AdaStop, a new statistical test based on multiple group sequential tests. When used to compare algorithms, AdaStop adapts the number of executions to stop as early as possible while ensuring that enough information has been collected to distinguish algorithms that have different score distributions. We prove theoretically that AdaStop has a low probability of making a (family-wise) error. We illustrate the effectiveness of AdaStop in various use-cases, including toy examples and Deep RL algorithms on challenging MuJoCo environments. AdaStop is the first statistical test fitted to this sort of comparison: it is both a significant contribution to statistics, and an important contribution to computational studies performed in reinforcement learning and in other domains.

URL: https://openreview.net/forum?id=lXyZr9TLEU

---

Title: Deconfounding Imitation Learning with Variational Inference

Authors: Risto Vuorio, Pim De Haan, Johann Brehmer, Hanno Ackermann, Daniel Dijkman, Taco Cohen

Abstract: Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent. This is because partial observability gives rise to hidden confounders in the causal graph. In previous work, to work around the confounding problem, policies have been trained using query access to the expert’s policy or inverse reinforcement learning (IRL). However, both approaches have drawbacks as the expert’s policy may not be available and IRL can be unstable in practice. Instead, we propose to train a variational inference model to infer the expert’s latent information and use it to train a latent-conditional policy. We prove that using this method, under strong assumptions, the identification of the correct imitation learning policy is theoretically possible from expert demonstrations alone. In practice, we focus on a setting with less strong assumptions where we use exploration data for learning the inference model. We show in theory and practice that this algorithm converges to the correct interventional policy, solves the confounding issue, and can under certain assumptions achieve an asymptotically optimal imitation performance.

URL: https://openreview.net/forum?id=3FsVtsISHW

---

New submissions
===============

Title: Variational Stochastic Gradient Descent for Deep Neural Networks

Abstract: Optimizing deep neural networks is one of the main tasks in successful deep learning. Current state-of-the-art optimizers are adaptive gradient-based optimization methods such as Adam. Recently, there has been an increasing interest in formulating gradient-based optimizers in a probabilistic framework for better estimation of gradients and modeling uncertainties. Here, we propose to combine both approaches, resulting in the Variational Stochastic Gradient Descent (VSGD) optimizer. We model gradient updates as a probabilistic model and utilize stochastic variational inference (SVI) to derive an efficient and effective update rule. Further, we show how our VSGD method relates to other adaptive gradient-based optimizers like Adam. Lastly, we carry out experiments on two image classification datasets and four deep neural network architectures, where we show that VSGD outperforms Adam and SGD.

URL: https://openreview.net/forum?id=xu4ATNjcdy

---

Title: Regularized Proportional Fairness Mechanism for Resource Allocation Without Money

Abstract: Mechanism design in resource allocation studies dividing limited resources among self-interested agents whose satisfaction with the allocation depends on privately held utilities. We consider the problem in a payment-free setting, with the aim of maximizing social welfare while enforcing incentive compatibility (IC), meaning that agents cannot inflate allocations by misreporting their utilities. The well-known proportional fairness (PF) mechanism achieves the maximum possible social welfare but incurs an undesirably high exploitability (the maximum unilateral inflation in utility from misreporting, a measure of deviation from IC). In fact, it is known that no mechanism can achieve the maximum social welfare and exact IC simultaneously without the use of monetary incentives (Cole et al., 2013). Motivated by this fact, we propose learning an approximate mechanism that desirably trades off the competing objectives. The mechanism is parameterized by an innovative neural network architecture tailored to the resource allocation problem, which we name Regularized Proportional Fairness Network (RPF-Net). RPF-Net is designed to regularize the output of the PF mechanism by a learned function approximator of the worst-case misreported utilities, with the aim of reducing the incentive for any agent to misreport. We derive generalization bounds that guarantee the mechanism performance when trained under finite and out-of-distribution samples and experimentally demonstrate the merits of the proposed mechanism compared to the state-of-the-art.

The PF mechanism acts as an important benchmark for comparing the social welfare of any mechanism. However, there exists no established way of computing its exploitability. The challenge here is that we need to find the maximizer of an optimization problem in which the gradient is only implicitly defined. For the first time, we provide a systematic method for finding such (sub)gradients, which enables the exploitability evaluation of the PF mechanism.

URL: https://openreview.net/forum?id=K85nJ60lDR

---

Title: Meta-Learning to Teach Semantic Prompts for Open Domain Generalization in Vision-Language Models

Abstract: Open Domain Generalization (ODG) addresses the challenges posed by domain and category shifts between labeled training sources and unlabeled target domains. Current state-of-the-art methods struggle with the limitations of traditional CNN backbones, leading to reduced generalization and increased error rates in detecting target open samples without prior knowledge. Additionally, recent CLIP-based prompt learning approaches fail to distinguish between known and unknown classes effectively, resulting in suboptimal performance. To address these challenges, we propose MetaPrompt, which leverages the semantic strengths of the vision-language model CLIP and the ''learning-to-learn'' capabilities of Meta-Learning to achieve robust generalization across domain and category shifts. Our framework introduces three key innovations: First, we approach ODG as a multi-class classification problem that includes both known and novel categories, designing novel prompts capable of detecting unknown class samples across multiple domains. These prompts are trained using Meta-Learning with momentum updates, enabling smooth and accurate differentiation between known and unknown classes. Second, we introduce a novel domain-agnostic semantic attention-based prompt alongside domain-focused prompts to enhance robustness in classifying unknown classes across various domains. Finally, we incorporate an unsupervised contrastive loss during episodic Meta-Training, which reinforces the boundaries in the metric space between known and unknown classes, thereby enhancing ''unknown'' class awareness in the prompts. MetaPrompt has demonstrated its superiority through extensive testing on diverse datasets, excelling in both closed and open-set DG scenarios and consistently outperforming existing solutions.

URL: https://openreview.net/forum?id=uJELgNGiMW

---

Title: Gaussian Processes with Bayesian Inference of Covariate Couplings

Abstract: Gaussian processes are powerful probabilistic models that are often coupled with Automatic Relevance Determination (ARD), which can uncover the importance of individual covariates. We develop covariances characterized by affine transformations of the inputs, formalized via a precision matrix between covariates, which can uncover covariate couplings for enhanced interpretability. We study a range of coupling priors from Wishart to Horseshoe and present fully Bayesian inference of such precision matrices within sparse Gaussian processes. We demonstrate empirically the efficacy and interpretability of this approach.

URL: https://openreview.net/forum?id=jbg6H7n6Um

---

Title: Robust and Explainable Deep Hedging with Linearized Neural Network

Abstract: Deep hedging is promising for risk management for financial derivatives through deep learning, yet it remains hindered by complex, resource-intensive training and the challenge of effectively integrating deep neural networks with hedging optimization. To overcome these issues, we introduce a robust and efficient linearized neural network architecture, seamlessly integrated with Black-Scholes' Delta, to streamline deep learning-based hedging optimization (DHLNN). Our approach enhances both the efficiency and interpretability of hedging strategies in derivative markets. The proposed model shows strong resilience to market fluctuations, effectively addresses action-dependence challenges, and achieves faster convergence compared to existing methods. Extensive simulations confirm the superior performance and cost-effectiveness of our method, under varying market conditions, when compared to state-of-the-art deep hedging models. These findings underscore the potential of DHLNN to significantly improve both convergence and hedging performance in derivative markets.

URL: https://openreview.net/forum?id=pLMqtzjujS

---

Title: CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering

Abstract: Large vision-language models (VLMs) have shown significant performance boosts in various application domains. However, adopting them to deal with several sequentially encountered tasks has been challenging because finetuning a VLM on a task normally reduces its generalization power and its capacity to learn new tasks, and causes catastrophic forgetting of previously learned tasks. Enabling the use of VLMs in multimodal continual learning (CL) settings can help to address such scenarios. To improve generalization capacity and prevent catastrophic forgetting, we propose a novel prompt-based CL method for VLMs, namely Cluster-based Modality Fusion Prompt (CluMo). We design a novel Key-Key-Prompt pair, where each prompt is associated with a visual prompt key and a textual prompt key. We adopt a two-stage training strategy. During the first stage, the single-modal keys are trained via the K-means clustering algorithm to help select the best semantically matched prompt. During the second stage, the prompt keys are frozen and the selected prompt is attached to the input for training the VLM in the CL scenario. Experiments on two benchmarks demonstrate that our method achieves SOTA performance.

URL: https://openreview.net/forum?id=GYrkZ3TSXZ

---

Title: Out-of-Distribution Learning with Human Feedback

Abstract: Out-of-distribution (OOD) learning often relies heavily on statistical approaches or predefined assumptions about OOD data distributions, hindering their efficacy in addressing multifaceted challenges of OOD generalization and OOD detection in real-world deployment environments. This paper presents a novel framework for OOD learning with human feedback, which can provide invaluable insights into the nature of OOD shifts and guide effective model adaptation. Our framework capitalizes on the freely available unlabeled data in the wild that captures the environmental test-time OOD distributions under both covariate and semantic shifts. To harness such data, our key idea is to selectively provide human feedback and label a small number of informative samples from the wild data distribution, which are then used to train a multi-class classifier and an OOD detector. By exploiting human feedback, we enhance the robustness and reliability of machine learning models, equipping them with the capability to handle OOD scenarios with greater precision. We provide theoretical insights on the generalization error bounds to justify our algorithm. Extensive experiments show the superiority of our method, outperforming the current state-of-the-art by a significant margin.

URL: https://openreview.net/forum?id=5qo8MF3QU1

---

Title: QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

Abstract: Prompt engineering has demonstrated remarkable success in enhancing the performance of large language models (LLMs) across diverse tasks. However, most existing prompt optimization methods focus only on task-level performance, overlooking the importance of query-preferred prompts, which leads to suboptimal performance. Additionally, these methods rely heavily on frequent interactions with LLMs to obtain feedback for guiding the optimization process, incurring substantial redundant interaction costs. In this paper, we introduce Query-dependent Prompt Optimization ($\textbf{QPO}$), which leverages multi-loop offline reinforcement learning to iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries, thus significantly improving the prompting effect on the large target LLM. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks, thereby circumventing the expenses of online interactions. Furthermore, we continuously augment the offline dataset with the generated prompts in each loop, as the prompts from the fine-tuned model are supposed to outperform the source prompts in the original dataset. These iterative loops bootstrap the model towards generating optimal prompts. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.

URL: https://openreview.net/forum?id=bqMJToTkvT

---

Title: Provable Quantum Algorithm Advantage for Gaussian Process Quadrature

Abstract: The aim of this paper is to develop novel quantum algorithms for Gaussian process quadrature methods. Gaussian process quadratures are numerical integration methods where Gaussian processes are used as functional priors for the integrands to capture the uncertainty arising from the sparse function evaluations. Quantum computers have emerged as potential replacements for classical computers, offering exponential reductions in the computational complexity for machine learning tasks. In this paper, we combine Gaussian process quadratures and quantum computing by proposing a quantum low-rank Gaussian process quadrature method based on a Hilbert space approximation of the Gaussian process kernel and enhancing the quadrature using a quantum circuit. The method combines the quantum phase estimation algorithm with the quantum principal component analysis technique to extract information up to a desired rank. Then, Hadamard and SWAP tests are implemented to find the expected value and variance that determines the quadrature. We use numerical simulations of a quantum computer to demonstrate the effectiveness of the method. Furthermore, we provide a theoretical complexity analysis that shows a polynomial advantage over classical Gaussian process quadrature methods. The code is available at \url{https://anonymous.4open.science/r/Quantum_HS_GP_Quadrature/}

URL: https://openreview.net/forum?id=K6CvWPtF62

---

Title: Thompson Sampling for Non-Stationary Bandit Problems

Abstract: Non-stationary multi-armed bandit (MAB) problems have recently attracted extensive attention. We focus on the abruptly changing scenario where reward distributions remain constant for a certain period and change at unknown time steps. Although Thompson Sampling (TS) has shown empirical success in non-stationary settings, there is currently no regret bound analysis for TS with Gaussian priors. To address this, we propose two algorithms, discounted TS and sliding-window TS, designed for sub-Gaussian reward distributions. For these algorithms, we establish an upper bound for the expected regret by bounding the expected number of times a suboptimal arm is played. We show that the regret order of both algorithms is $\tilde{O}(\sqrt{TB_T})$, where $T$ is the time horizon and $B_T$ is the number of breakpoints. This upper bound matches the lower bound for abruptly changing problems up to a logarithmic factor. Empirical comparisons with other non-stationary bandit algorithms highlight the competitive performance of our proposed methods.

URL: https://openreview.net/forum?id=ni2aOhI48P
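
As a rough illustration of the sliding-window idea named in the abstract above, here is a minimal sketch and not the authors' algorithm. It assumes a Gaussian posterior with mean equal to the windowed sample mean and standard deviation 1/sqrt(n), forces a pull of any arm with no observation left in the window, and uses the hypothetical names `sliding_window_ts` and `make_switching_arm`, none of which come from the paper.

```python
import random
from collections import deque


def make_switching_arm(mean_before, mean_after, breakpoint, noise=0.1):
    """Hypothetical test arm: its mean changes abruptly at `breakpoint`."""
    def pull(t, rng):
        mean = mean_before if t < breakpoint else mean_after
        return mean + rng.gauss(0.0, noise)
    return pull


def sliding_window_ts(reward_fns, horizon, window, seed=0):
    """Play `horizon` rounds and return the list of chosen arm indices.

    reward_fns[k](t, rng) returns a noisy reward for arm k at round t.
    Only the last `window` (arm, reward) pairs inform the posterior.
    """
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    recent = deque(maxlen=window)  # old observations fall out automatically
    choices = []
    for t in range(horizon):
        samples = []
        for k in range(n_arms):
            rewards = [r for (a, r) in recent if a == k]
            if not rewards:
                # Assumption: an arm with no data in the window is pulled next.
                samples.append(float("inf"))
            else:
                mean = sum(rewards) / len(rewards)
                std = 1.0 / len(rewards) ** 0.5  # assumed posterior width
                samples.append(rng.gauss(mean, std))
        arm = max(range(n_arms), key=lambda k: samples[k])
        recent.append((arm, reward_fns[arm](t, rng)))
        choices.append(arm)
    return choices
```

With two arms whose means swap at a single breakpoint, the window lets pre-change observations expire, so the sampler can shift to the newly better arm; the window length trades off tracking speed against estimation noise.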

---

Title: Pretraining a Neural Operator in Lower Dimensions

Abstract: There has recently been increasing attention towards developing foundational neural Partial Differential Equation (PDE) solvers and neural operators through large-scale pretraining. However, unlike vision and language models that make use of abundant and inexpensive (unlabeled) data for pretraining, these neural solvers usually rely on simulated PDE data, which can be costly to obtain, especially for high dimensional PDEs. In this work, we aim to Pretrain neural PDE solvers on Lower Dimensional PDEs (PreLowD) where data collection is the least expensive. We evaluate the effectiveness of this pretraining strategy on similar PDEs in higher dimensions. We use the Factorized Fourier Neural Operator (FFNO) because it has the flexibility to be applied to PDE data of arbitrary spatial dimensions and to reuse parameters trained in lower dimensions. In addition, our work sheds light on the effect of the fine-tuning configuration to make the most of this pretraining strategy.

URL: https://openreview.net/forum?id=ZewaRoZehI

---

Title: Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

Abstract: We consider gradient-based optimisation of wide, shallow neural networks, where the output of each hidden node is scaled by a positive parameter. The scaling parameters are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that for large such neural networks, with high probability, gradient flow and gradient descent converge to a global minimum and can learn features in some sense, unlike in the NTK parameterisation. We perform experiments illustrating our theoretical results and discuss the benefits of such scaling in terms of prunability and transfer learning.

URL: https://openreview.net/forum?id=Sx1khIIi95

---

Title: From Promise to Practice: A Study of Common Pitfalls Behind the Generalization Gap in Machine Learning

Abstract: The world of Machine Learning (ML) offers great promise, but often there is a noticeable gap between claims made in research papers and the model's practical performance in real-life applications. This gap can often be attributed to systematic errors and pitfalls that occur during the development phase of ML models. This study aims to systematically identify these errors. For this, we break down the ML process into four main stages: data handling, model design, model evaluation, and reporting. Across these stages, we have identified fourteen common pitfalls based on a comprehensive review of around 60 papers discussing either broad challenges or specific pitfalls within the ML pipeline. Moreover, using the Brain Tumor Segmentation (BraTS) dataset, we perform three experiments to illustrate the impacts of these pitfalls, providing examples of how they can skew results and affect outcomes. In addition, we perform a review to study the frequency of unclear reporting regarding these pitfalls in ML research. The goal of this review is to assess whether authors have adequately addressed these pitfalls in their reports. For this, we review 126 randomly chosen papers on image segmentation from the ICCV (2013-2021) and MICCAI (2013-2022) conferences from the last ten years. The results from this review show a notable oversight of these issues, with many of the papers lacking clarity on how the pitfalls are handled. This highlights an important gap in current reporting practices within the ML community. The code for the experiments will be published upon acceptance.

URL: https://openreview.net/forum?id=DqWvxSQ1TK

---

Title: LLMs can learn self-restraint through iterative self-reflection

Abstract: In order to be deployed safely, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a utility function that can encourage the model to produce responses only when its level of confidence is above a user-specified target accuracy $\rho^*$. This utility function can be used to score generations of different lengths as well as abstention. To optimize this function, we introduce ReSearch, a process of ``self-reflection'' consisting of iterative self-prompting and self-evaluation. We use the ReSearch algorithm to generate synthetic data on which we finetune our models. ReSearch elegantly incorporates the ability to abstain by augmenting the samples generated by the model during the search procedure with an answer expressing abstention. Compared to their original versions, our resulting models generate fewer hallucinations overall at no additional inference cost, for both known and unknown topics, as the model learns to selectively restrain itself. In addition, we show that our iterative search is more efficient as a function of tokens than naive search. Finally, we show that by modifying the target accuracy $\rho^*$, our trained models exhibit different behaviors.

URL: https://openreview.net/forum?id=SvKPfchVKX

---

Title: PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements

Abstract: In the field of psychology, traditional assessment methods, such as standardized scales, are frequently critiqued for their static nature, lack of personalization, and reduced participant engagement, while comprehensive counseling evaluations are often inaccessible. The complexity of quantifying psychological traits further limits these methods. Despite advances with large language models (LLMs), many still depend on single-round Question-and-Answer interactions. To bridge this gap, we introduce PsyDI, a personalized and progressively in-depth chatbot designed for psychological measurements, exemplified by its application in the Myers-Briggs Type Indicator (MBTI) framework. PsyDI leverages user-related multi-modal information and engages in customized, multi-turn interactions to provide personalized, easily accessible measurements, while ensuring precise MBTI type determination. To address the challenge of unquantifiable psychological traits, we introduce a novel training paradigm that involves learning the ranking of proxy variables associated with these traits, culminating in a robust score model for MBTI measurements. The score model enables PsyDI to conduct comprehensive and precise measurements through multi-turn interactions within a unified estimation context. Through various experiments, we validate the efficacy of both the score model and the PsyDI pipeline, demonstrating its potential to serve as a general framework for psychological measurements. Furthermore, the online deployment of PsyDI has garnered substantial user engagement, with over 3,000 visits, resulting in the collection of numerous multi-turn dialogues annotated with MBTI types, which facilitates further research.

URL: https://openreview.net/forum?id=eqVFj1l0oW

---

Title: Your Classifier Can Be Secretly a Likelihood-Based OOD Detector

Abstract: The ability to detect out-of-distribution (OOD) inputs is critical to guarantee the reliability of classification models deployed in an open environment. A fundamental challenge in OOD detection is that a discriminative classifier is typically trained to estimate the posterior probability $p(y|\mathbf{z})$ for class $y$ given an input $\mathbf{z}$, but lacks the explicit likelihood estimation of $p(\mathbf{z})$ ideally needed for OOD detection. While numerous OOD scoring functions have been proposed for classification models, these estimate scores are often heuristic-driven and cannot be rigorously interpreted as likelihood. To bridge the gap, we propose Intrinsic Likelihood (INK), which offers rigorous likelihood interpretation to modern discriminative-based classifiers. Specifically, our proposed INK score operates on the constrained latent embeddings of a discriminative classifier, which are modeled as a mixture of hyperspherical embeddings with constant norm. We draw a novel connection between the hyperspherical distribution and the intrinsic likelihood, which can be effectively optimized in modern neural networks. Extensive experiments on the OpenOOD benchmark empirically demonstrate that INK establishes a new state-of-the-art in a variety of OOD detection setups, including both far-OOD and near-OOD.

URL: https://openreview.net/forum?id=FmA1JPWBM8

---

Title: Exercising Good Practices of Machine Learning Research: A Case Study of Environment Image Classification

Abstract: It is of the utmost importance that, in both research and industry applications, results in the field of Machine Learning are performed and presented in a fair, explainable, and reproducible fashion. This paper uses the framework of an image classification case study to explore the practical application of a range of fundamental approaches that can guide such effective practices. We discuss and implement ideas of data collection and analysis, fairness, evaluation metrics, statistical interpretation, model implementation, repeatability, as well as the encouragement and provision of necessary resources for future research and cross-checking.

URL: https://openreview.net/forum?id=nhPbsmZK0U

---

Title: Blending Two Styles: Generating Inter-domain Images with MiddleGAN

Abstract: From celebrity faces to cats and dogs, humans enjoy pushing the boundaries of art by blending existing concepts together in new ways. With the rise of generative artificial intelligence, machines are increasingly capable of creating new images. Generative Adversarial Networks (GANs) generate images similar to their training data but struggle to blend images from distinct datasets. This paper introduces MiddleGAN, a novel GAN variant that blends inter-domain images from two distinct input sets. By incorporating a second discriminator, MiddleGAN forces the generator to create images that fool both discriminators, thus capturing the qualities of both input sets. We also introduce a blend ratio hyperparameter to control the weighting of the input sets and compensate for datasets of different complexities. Evaluating MiddleGAN on the CelebA dataset, we demonstrate that it successfully generates images that lie between the distributions of the input sets, both mathematically and visually. An additional experiment verifies the viability of MiddleGAN on handwritten digit datasets (DIDA and MNIST). We provide a proof of optimal convergence for the neural networks in our architecture and show that MiddleGAN functions across various resolutions and blend ratios. We conclude with potential future research directions for MiddleGAN.

URL: https://openreview.net/forum?id=t7vWCHmwbG
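
To make the two-discriminator idea in the abstract above concrete, the following is a minimal sketch under stated assumptions, not the paper's objective. It assumes the generator loss is a blend-ratio-weighted sum of standard non-saturating GAN terms, one per discriminator; the function name `generator_loss` and its parameters are hypothetical.

```python
import math


def generator_loss(d1_prob, d2_prob, blend_ratio=0.5, eps=1e-12):
    """Weighted generator loss against two discriminators (assumed form).

    d1_prob, d2_prob: each discriminator's probability that a generated
    image is real. blend_ratio in [0, 1] weights the first discriminator;
    the second gets the remaining weight.
    """
    if not 0.0 <= blend_ratio <= 1.0:
        raise ValueError("blend_ratio must lie in [0, 1]")
    # Non-saturating loss: small when the discriminator is fooled.
    loss_1 = -math.log(max(d1_prob, eps))
    loss_2 = -math.log(max(d2_prob, eps))
    return blend_ratio * loss_1 + (1.0 - blend_ratio) * loss_2
```

Under this assumed form, a blend ratio of 1.0 reduces to an ordinary single-discriminator generator loss on the first input set, and intermediate values pull generated images toward both sets at once.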

---
