Weekly TMLR digest for Jun 09, 2024

TMLR

Jun 9, 2024, 12:00:10 AM
to tmlr-annou...@googlegroups.com


New certifications
==================

Featured Certification: Linear Bandits with Memory

Giulia Clerici, Pierre Laforgue, Nicolò Cesa-Bianchi

https://openreview.net/forum?id=CrpDwMFgxr

---


Reproducibility Certification: Reproducibility study of "LICO: Explainable Models with Language-Image Consistency"

Luan Fletcher, Robert van der Klis, Martin Sedláček, Stefan Vasilev, Christos Athanasiadis

https://openreview.net/forum?id=Mf1H8X5DVb

---


Accepted papers
===============


Title: [Re] GNNInterpreter: A probabilistic generative model-level explanation for Graph Neural Networks

Authors: Ana Vasilcoiu, T.H.F. Stessen, Thies Kersten, Batu Helvacioglu

Abstract: Graph Neural Networks have recently gained recognition for their performance on graph machine learning tasks. The increasing attention on these models' trustworthiness and decision-making mechanisms has instilled interest in the exploration of explainability techniques, including the model proposed in "GNNInterpreter: A probabilistic generative model-level explanation for Graph Neural Networks" (Wang & Shen, 2022). This work aims to reproduce the findings of the original paper by investigating the main claims made by its authors, namely that GNNInterpreter (i) generates faithful and realistic explanations without requiring domain-specific knowledge, (ii) has the ability to work with various node and edge features, (iii) produces explanations that are representative of the target class, and (iv) has a much lower training time compared to XGNN, the current state-of-the-art model-level GNN explanation technique. To reproduce the results, we make use of the open-source implementation and we test the interpreter on the same datasets and GNN models as in the original paper. We conduct an enhanced quantitative and qualitative evaluation, and additionally we extend the original experiments to include another real-world dataset. Our results show that we are not able to validate the first claim, due to significant hyperparameter and seed variation, as well as due to training instability. Furthermore, we partially validate the second claim by testing on datasets with different node and edge features, but we reject the third claim due to GNNInterpreter's failure to outperform XGNN in producing dataset-aligned explanations. Lastly, we are able to confirm the last claim.

URL: https://openreview.net/forum?id=8cYcR23WUo

---

Title: VisionAD, a software package of performant anomaly detection algorithms, and Proportion Localised, an interpretable metric

Authors: Alexander D. J. Taylor, Phillip Tregidgo, Jonathan James Morrison, Neill D. F. Campbell

Abstract: We release VisionAD, an anomaly detection library in the domain of images. The library forms the largest and most performant collection of such algorithms to date. Each algorithm is written against a standardised API for ease of use. The library has a focus on fair benchmarking, intended to mitigate the issue of cherry-picked results. It enables rapid experimentation and straightforward integration of new algorithms. In addition, we propose a new metric, Proportion Localised (PL). It reports the proportion of anomalies that are sufficiently localised by classifying each discrete anomaly as localised or not. The metric is far more intuitive, as it has a direct physical interpretation, making it attractive to industry-based professionals. We also release the VisionADIndustrial (VADI) benchmark, a thorough benchmarking of the top anomaly detection algorithms. This benchmark calculates the mean across the pooled classes of the MVTec and VisA datasets. We are committed to hosting an updated version of this leaderboard online, and encourage researchers to add, tweak and improve algorithms to climb this leaderboard. VisionAD code is found at https://github.com/alext1995/VisionAD, and Proportion Localised code is found at https://github.com/alext1995/proportion_localised.
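
As a sketch of what a "standardised API" for anomaly detection implies in practice, the Python fragment below shows a fit/predict interface of the kind such a library would expose; the class and method names here are illustrative assumptions, not VisionAD's actual API.

    import numpy as np

    class AnomalyDetector:
        """Hypothetical common interface each algorithm would implement."""

        def fit(self, train_images: np.ndarray) -> None:
            """Learn a model of normality from anomaly-free images."""
            raise NotImplementedError

        def predict(self, test_images: np.ndarray):
            """Return (image_scores, pixel_maps) for a test batch."""
            raise NotImplementedError

    class MeanFeatureBaseline(AnomalyDetector):
        """Toy baseline: score pixels by distance from the mean training image."""

        def fit(self, train_images):
            self.mean_image = train_images.mean(axis=0)

        def predict(self, test_images):
            # channel-last images assumed: (N, H, W, C)
            pixel_maps = np.abs(test_images - self.mean_image).mean(axis=-1)
            image_scores = pixel_maps.reshape(len(test_images), -1).max(axis=1)
            return image_scores, pixel_maps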

URL: https://openreview.net/forum?id=o5kYH7bNe3

---

Title: Holistic Molecular Representation Learning via Multi-view Fragmentation

Authors: Seojin Kim, Jaehyun Nam, Junsu Kim, Hankook Lee, Sungsoo Ahn, Jinwoo Shin

Abstract: Learning chemically meaningful representations from unlabeled molecules plays a vital role in AI-based drug design and discovery. In response to this, several self-supervised learning methods have been developed, focusing either on global (e.g., graph-level) or local (e.g., motif-level) information of molecular graphs. However, it is still unclear which approach is more effective for learning better molecular representations. In this paper, we propose a novel holistic self-supervised molecular representation learning framework that effectively learns both global and local molecular information. Our key idea is to utilize fragmentation, which decomposes a molecule into a set of chemically meaningful fragments (e.g., functional groups), to associate a global graph structure with a set of local substructures, thereby preserving chemical properties, and to learn both types of information via contrastive learning between them. Additionally, we consider the 3D geometry of molecules as another view for contrastive learning. We demonstrate that our framework outperforms prior molecular representation learning methods across various molecular property prediction tasks.

URL: https://openreview.net/forum?id=ufDh55J1ML

---

Title: Recent Link Classification on Temporal Graphs Using Graph Profiler

Authors: Muberra Ozmen, Thomas Markovich

Abstract: The performance of Temporal Graph Learning (TGL) methods is typically evaluated on the future link prediction task, i.e., whether two nodes will get connected, and the dynamic node classification task, i.e., whether a node's class will change. Comparatively, recent link classification, i.e., to which class an emerging edge belongs, is investigated much less, even though it arises in many industrial settings. In this work, we first formalize recent link classification on temporal graphs as a benchmark downstream task and introduce corresponding benchmark datasets. Secondly, we evaluate the performance of state-of-the-art methods with a statistically meaningful metric, the Matthews Correlation Coefficient, which is more robust to imbalanced datasets, in addition to the commonly used average precision and area under the curve. We propose several design principles for tailoring models to the specific requirements of the task and the dataset, including modifications to the message aggregation schema, readout layer and time encoding strategy, which obtain significant improvements on benchmark datasets. Finally, we propose an architecture that we call Graph Profiler, which is capable of encoding previous events' class information on source and destination nodes. The experiments show that our proposed model achieves an improved Matthews Correlation Coefficient in most cases of interest. We believe the introduction of recent link classification as a benchmark task for temporal graph learning will be useful for the evaluation of prospective methods within the field.
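
For reference, the Matthews Correlation Coefficient the authors advocate reduces, in the binary case, to a single expression over confusion-matrix counts; a minimal NumPy version is sketched below (sklearn.metrics.matthews_corrcoef covers the multi-class case used in practice).

    import numpy as np

    def mcc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """Binary Matthews Correlation Coefficient; robust to class imbalance."""
        tp = np.sum((y_true == 1) & (y_pred == 1))
        tn = np.sum((y_true == 0) & (y_pred == 0))
        fp = np.sum((y_true == 0) & (y_pred == 1))
        fn = np.sum((y_true == 1) & (y_pred == 0))
        denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
        return 0.0 if denom == 0.0 else (tp * tn - fp * fn) / denom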

URL: https://openreview.net/forum?id=BTgHh0gSSc

---

Title: Hyperbolic Random Forests

Authors: Lars Doorenbos, Pablo Márquez Neila, Raphael Sznitman, Pascal Mettes

Abstract: Hyperbolic space is becoming a popular choice for representing data due to the hierarchical structure - whether implicit or explicit - of many real-world datasets. Along with it comes a need for algorithms capable of solving fundamental tasks, such as classification, in hyperbolic space.
Recently, multiple papers have investigated hyperbolic alternatives to hyperplane-based classifiers, such as logistic regression and SVMs. While effective, these approaches struggle with more complex hierarchical data. We, therefore, propose to generalize the well-known random forests to hyperbolic space.
We do this by redefining the notion of a split using horospheres. Since finding the globally optimal split is computationally intractable, we find candidate horospheres through a large-margin classifier. To make hyperbolic random forests work on multi-class data and imbalanced experiments, we furthermore outline new methods for combining classes based on the lowest common ancestor and class-balanced large-margin losses. Experiments on standard and new benchmarks show that our approach outperforms both conventional random forest algorithms and recent hyperbolic classifiers.
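
To make the horosphere split concrete: in the Poincaré ball, horospheres are level sets of the Busemann function of an ideal boundary point, so a split can be written as thresholding that function. The sketch below assumes this standard formulation; how the paper actually searches for candidate splits (via its large-margin classifier) is not reproduced here.

    import numpy as np

    def busemann(x: np.ndarray, p: np.ndarray) -> float:
        """Busemann function of ideal point p (||p|| = 1) at x in the Poincare ball."""
        return np.log(np.sum((p - x) ** 2) / (1.0 - np.sum(x ** 2)))

    def horosphere_split(x: np.ndarray, p: np.ndarray, b: float) -> int:
        """Route x to the left (0) or right (1) child of a tree node."""
        return int(busemann(x, p) > b)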

URL: https://openreview.net/forum?id=pjKcIzvXWR

---

Title: Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits

Authors: Yihong Guo, Hao Liu, Yisong Yue, Anqi Liu

Abstract: We introduce a distributionally robust approach that enhances the reliability of offline policy evaluation in contextual bandits under general covariate shifts. Our method aims to deliver robust policy evaluation results in the presence of discrepancies in both context and policy distribution between logging and target data. Central to our methodology is the application of robust regression — a distributionally robust technique tailored here to improve the estimation of the conditional reward distribution from logging data. Utilizing the reward model obtained from robust regression, we develop a comprehensive suite of policy value estimators, by integrating our reward model into established evaluation frameworks, namely direct methods and doubly robust methods. Through theoretical analysis, we further establish that the proposed policy value estimators offer a finite sample upper bound for the bias, providing a clear advantage over traditional methods, especially when the shift is large. Finally, we design an extensive range of policy evaluation scenarios, covering diverse magnitudes of shifts and a spectrum of logging and target policies. Our empirical results indicate that our approach significantly outperforms baseline methods, most notably in 90% of the cases under the policy shift-only settings and 72% of the scenarios under the general covariate shift settings.
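
The estimator families the robust reward model plugs into are standard; as a hedged sketch, the direct method averages the modelled reward under the target policy, and the doubly robust estimator adds an importance-weighted correction. reward_model below stands in for the paper's robust-regression model.

    import numpy as np

    def dm_value(contexts, target_probs, reward_model):
        """Direct method: expected modelled reward under the target policy."""
        # target_probs[i, a] = pi_target(a | contexts[i])
        return np.mean(np.sum(target_probs * reward_model(contexts), axis=1))

    def dr_value(contexts, actions, rewards, logging_probs, target_probs, reward_model):
        """Doubly robust: DM baseline plus an importance-weighted correction."""
        n = len(contexts)
        q_hat = reward_model(contexts)                # shape (n, num_actions)
        baseline = np.sum(target_probs * q_hat, axis=1)
        weights = target_probs[np.arange(n), actions] / logging_probs
        correction = weights * (rewards - q_hat[np.arange(n), actions])
        return np.mean(baseline + correction)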

URL: https://openreview.net/forum?id=R7PReNELww

---

Title: Misspecification-robust Sequential Neural Likelihood for Simulation-based Inference

Authors: Ryan P. Kelly, David J Nott, David Tyler Frazier, David J Warne, Christopher Drovandi

Abstract: Simulation-based inference techniques are indispensable for parameter estimation of mechanistic and simulable models with intractable likelihoods. While traditional statistical approaches like approximate Bayesian computation and Bayesian synthetic likelihood have been studied under well-specified and misspecified settings, they often suffer from inefficiencies due to wasted model simulations. Neural approaches, such as sequential neural likelihood (SNL), avoid this wastage by utilising all model simulations to train a neural surrogate for the likelihood function. However, the performance of SNL under model misspecification is unreliable and can result in overconfident posteriors centred around an inaccurate parameter estimate. In this paper, we propose a novel SNL method, which through the incorporation of additional adjustment parameters, is robust to model misspecification and capable of identifying features of the data that the model is not able to recover. We demonstrate the efficacy of our approach through several illustrative examples, where our method gives more accurate point estimates and uncertainty quantification than SNL.

URL: https://openreview.net/forum?id=tbOYJwXhcY

---

Title: Rotate the ReLU to Sparsify Deep Networks Implicitly

Authors: Nancy Nayak, Sheetal Kalyani

Abstract: Compact and energy-efficient models have become essential in this era when deep learning-based solutions are widely used for various real-life tasks. In this paper, we propose rotating the ReLU activation to give an additional degree of freedom in conjunction with the appropriate initialization of the rotation. This combination leads to implicit sparsification without the use of a regularizer. We show that this rotated ReLU (RReLU) activation improves the representation capability of the parameters/filters in the network and eliminates those parameters/filters that are not crucial for the task, giving rise to significant savings in memory and computation. While the state-of-the-art regularization-based Network-Slimming method achieves $32.33\%$ saving in memory and $26.38\%$ saving in computation with ResNet-$164$, RReLU achieves a saving of $35.92\%$ in memory and $25.97\%$ in computation with better accuracy. The savings in memory and computation further increase by $64.67\%$ and $52.96\%$, respectively, with the introduction of $L_1$ regularization to the RReLU slopes. We note that the slopes of the rotated ReLU activations act as coarse feature extractors and can eliminate unnecessary features before retraining. Our studies indicate that features always choose to pass through a smaller number of filters. We demonstrate the results with popular datasets such as MNIST, CIFAR-10, CIFAR-100, SVHN, and ImageNet with different architectures, including Vision Transformers and EfficientNet. We also briefly study the impact of adversarial attacks on RReLU-based ResNets and observe that we get better adversarial accuracy for the architectures with RReLU than with ReLU. We also demonstrate how this concept of rotation can be applied to the GELU and SiLU activation functions, commonly utilized in Transformer and EfficientNet architectures, respectively. The proposed method can be combined with other structural pruning methods, resulting in better sparsity. For the GELU-based multi-layer perceptron (MLP) part of the Transformer, we obtain a $2.6\%$ improvement in accuracy with $6.32\%$ saving in both memory and computation.
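
One plausible reading of "rotating the ReLU" is to rotate the graph of y = max(0, x) by a learnable per-filter angle, which turns the two arms into lines of slope tan(theta) and tan(theta + pi/4) and recovers the standard ReLU at theta = 0. This PyTorch sketch is an illustrative guess at the parameterization, valid for small angles where the rotated graph remains a function; it is not taken from the paper.

    import torch
    import torch.nn as nn

    class RotatedReLU(nn.Module):
        def __init__(self, num_features: int, init_theta: float = 0.0):
            super().__init__()
            # one learnable rotation angle per feature/filter
            self.theta = nn.Parameter(torch.full((num_features,), init_theta))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, num_features); theta = 0 reproduces ReLU exactly
            neg_slope = torch.tan(self.theta)
            pos_slope = torch.tan(self.theta + torch.pi / 4)
            return torch.where(x >= 0, pos_slope * x, neg_slope * x)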

URL: https://openreview.net/forum?id=Nzy0XmCPuZ

---

Title: Bayesian Quantification with Black-Box Estimators

Authors: Albert Ziegler, Paweł Czyż

Abstract: Understanding how different classes are distributed in an unlabeled data set is important for the calibration of probabilistic classifiers and uncertainty quantification. Methods like adjusted classify and count, black-box shift estimators, and invariant ratio estimators use an auxiliary and potentially biased black-box classifier trained on a different data set to estimate the class distribution on the current data set and yield asymptotic guarantees under weak assumptions. We demonstrate that these algorithms are closely related to inference in a particular probabilistic graphical model approximating the assumed ground-truth generative process, and we propose a Bayesian estimator. Then, we discuss an efficient Markov chain Monte Carlo sampling scheme for the introduced model and show an asymptotic consistency guarantee in the large-data limit. We compare the introduced model against the established point estimators in a variety of scenarios and show that it is competitive with, and in some cases superior to, the non-Bayesian alternatives.

URL: https://openreview.net/forum?id=Ft4kHrOawZ

---

Title: Linear Bandits with Memory

Authors: Giulia Clerici, Pierre Laforgue, Nicolò Cesa-Bianchi

Abstract: Nonstationary phenomena, such as satiation effects in recommendations, have mostly been modeled using bandits with finitely many arms. However, the richer action space provided by linear bandits is often preferred in practice. In this work, we introduce a novel nonstationary linear bandit model, where current rewards are influenced by the learner's past actions in a fixed-size window. Our model, which recovers stationary linear bandits as a special case, leverages two parameters: the window size $m \ge 0$, and an exponent $\gamma$ that captures the rotting ($\gamma < 0$) or rising ($\gamma > 0$) nature of the phenomenon. When both $m$ and $\gamma$ are known, we propose and analyze a variant of OFUL which minimizes regret against cyclic policies. By choosing the cycle length so as to trade off approximation and estimation errors, we then prove a bound of order $\sqrt{d}\,(m+1)^{\frac{1}{2}+\max\{\gamma,0\}}\,T^{3/4}$ (ignoring log factors) on the regret against the optimal sequence of actions, where $T$ is the horizon and $d$ is the dimension of the linear action space. Through a bandit model selection approach, our results are then extended to the case where both $m$ and $\gamma$ are unknown. Finally, we complement our theoretical results with experiments comparing our approach to natural baselines.

URL: https://openreview.net/forum?id=CrpDwMFgxr

---

Title: Simple Imputation Rules for Prediction with Missing Data: Theoretical Guarantees vs. Empirical Performance

Authors: Dimitris Bertsimas, Arthur Delarue, Jean Pauphilet

Abstract: Missing data is a common issue in real-world datasets. This paper studies the performance of impute-then-regress pipelines by contrasting theoretical and empirical evidence. We establish the asymptotic consistency of such pipelines for a broad family of imputation methods. While common sense suggests that a 'good' imputation method produces datasets that are plausible, we show, on the contrary, that, as far as prediction is concerned, crude can be good. Among others, we find that mode-impute is asymptotically sub-optimal, while mean-impute is asymptotically optimal. We then exhaustively assess the validity of these theoretical conclusions on a large corpus of synthetic, semi-real, and real datasets. While the empirical evidence we collect mostly supports our theoretical findings, it also highlights gaps between theory and practice and opportunities for future research, regarding the relevance of the MAR assumption, the complex interdependency between the imputation and regression tasks, and the need for realistic synthetic data generation models.
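
The impute-then-regress pipelines the paper analyzes are straightforward to assemble; the sketch below uses scikit-learn with the mean imputation that the theory finds asymptotically optimal for prediction (the data here is synthetic and purely illustrative).

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
    X[rng.random(X.shape) < 0.2] = np.nan          # knock out 20% of entries

    model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
    model.fit(X, y)                                 # impute, then regress
    print(model.predict(X[:5]))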

URL: https://openreview.net/forum?id=IKH5ziX9dk

---

Title: DIGNet: Learning Decomposed Patterns in Representation Balancing for Treatment Effect Estimation

Authors: Yiyan HUANG, WANG Siyi, Cheuk Hang LEUNG, Qi WU, Dongdong WANG, Zhixiang Huang

Abstract: Estimating treatment effects from observational data is often subject to a covariate shift problem incurred by selection bias. Recent research has sought to mitigate this problem by leveraging representation balancing methods that aim to extract balancing patterns from observational data and utilize them for outcome prediction. The underlying theoretical rationale is that minimizing the unobserved counterfactual error can be achieved through two principles: (I) reducing the risk associated with predicting factual outcomes and (II) mitigating the distributional discrepancy between the treated and controlled samples. However, an inherent trade-off between the two principles can lead to a potential loss of information useful for factual outcome predictions and, consequently, deteriorating treatment effect estimations. In this paper, we propose a novel representation balancing model, DIGNet, for treatment effect estimation. DIGNet incorporates two key components, PDIG and PPBR, which effectively mitigate the trade-off problem by improving one of the aforementioned principles without sacrificing the other. Specifically, PDIG captures more effective balancing patterns (Principle II) without affecting factual outcome predictions (Principle I), while PPBR enhances factual outcome prediction (Principle I) without affecting the learning of balancing patterns (Principle II). The ablation studies verify the effectiveness of PDIG and PPBR in improving treatment effect estimation, and experimental results on benchmark datasets demonstrate the superior performance of our DIGNet model compared to baseline models.

URL: https://openreview.net/forum?id=Z20FInfWlm

---

Title: Mildly Overparameterized ReLU Networks Have a Favorable Loss Landscape

Authors: Kedar Karhadkar, Michael Murray, Hanna Tseran, Guido Montufar

Abstract: We study the loss landscape of both shallow and deep, mildly overparameterized ReLU neural networks on a generic finite input dataset for the squared error loss. We show both by count and volume that most activation patterns correspond to parameter regions with no bad local minima. Furthermore, for one-dimensional input data, we show most activation regions realizable by the network contain a high dimensional set of global minima and no bad local minima. We experimentally confirm these results by finding a phase transition from most regions having full rank Jacobian to many regions having deficient rank depending on the amount of overparameterization.

URL: https://openreview.net/forum?id=10WARaIwFn

---

Title: Augmenting Ad-Hoc IR Dataset for Interactive Conversational Search

Authors: Pierre ERBACHER, Jian-Yun Nie, Philippe Preux, Laure Soulier

Abstract: A peculiarity of conversational search systems is that they involve mixed initiatives, such as system-generated query clarifying questions. Evaluating those systems at a large scale on the end task of IR is very challenging, requiring adequate datasets containing such interactions. However, current datasets only focus on either traditional ad-hoc IR tasks or query clarification tasks, the latter being usually seen as a reformulation task from the initial query. Only a few datasets are known to contain both document relevance judgments and the associated clarification interactions, such as Qulac and ClariQ. Both are based on the TREC Web Track 2009-12 collection, but cover a very limited number of topics (237 topics), far from being enough for training and testing conversational IR models. To fill the gap, we propose a methodology to automatically build large-scale conversational IR datasets from ad-hoc IR datasets in order to facilitate explorations on conversational IR. Our methodology is based on two processes: 1) generating query clarification interactions through query clarification and answer generators, and 2) augmenting ad-hoc IR datasets with simulated interactions. In this paper, we focus on MsMarco and augment it with query clarification and answer simulations. We perform a thorough evaluation showing the quality and the relevance of the generated interactions for each initial query. This paper shows the feasibility and utility of augmenting ad-hoc IR datasets for conversational IR.

URL: https://openreview.net/forum?id=z8d7nT1HWw

---

Title: Text Descriptions are Compressive and Invariant Representations for Visual Learning

Authors: Zhili Feng, Anna Bair, J Zico Kolter

Abstract: Modern image classification is based on directly predicting classes via large discriminative networks, which do not directly contain information about the intuitive visual features that may constitute a classification decision. Recently, work in vision-language models (VLM) such as CLIP has provided ways to specify natural language descriptions of image classes, but typically focuses on providing single descriptions for each class. In this work, we demonstrate that an alternative approach, in line with humans' understanding of multiple visual features per class, can also provide compelling performance in the robust few-shot learning setting. In particular, we introduce a novel method, \textit{SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors)}. This method first automatically generates multiple visual descriptions of each class via a large language model (LLM), then uses a VLM to translate these descriptions to a set of visual feature embeddings of each image, and finally uses sparse logistic regression to select a relevant subset of these features to classify each image. Core to our approach is the fact that, information-theoretically, these descriptive features are more invariant to domain shift than traditional image embeddings, even though the VLM training process is not explicitly designed for invariant representation learning. These invariant descriptive features also compose a better input compression scheme. When combined with finetuning, we show that SLR-AVD is able to outperform existing state-of-the-art finetuning approaches in both in-distribution and out-of-distribution tasks.

URL: https://openreview.net/forum?id=spo705Fyv0

---

Title: Semi-Supervised Semantic Segmentation via Marginal Contextual Information

Authors: Moshe Kimhi, Shai Kimhi, Evgenii Zheltonozhskii, Or Litany, Chaim Baskin

Abstract: We present a novel confidence refinement scheme that enhances pseudo-labels in semi-supervised semantic segmentation. Unlike existing methods, which filter pixels with low-confidence predictions in isolation, our approach leverages the spatial correlation of labels in segmentation maps by grouping neighboring pixels and considering their pseudo-labels collectively. With this contextual information, our method, named S4MC, increases the amount of unlabeled data used during training while maintaining the quality of the pseudo-labels, all with negligible computational overhead. Through extensive experiments on standard benchmarks, we demonstrate that S4MC outperforms existing state-of-the-art semi-supervised learning approaches, offering a promising solution for reducing the cost of acquiring dense annotations. For example, S4MC achieves a 1.39 mIoU improvement over the prior art on PASCAL VOC 12 with 366 annotated images. The code to reproduce our experiments is available at https://s4mcontext.github.io/
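
As a hedged sketch of the core idea, the fragment below refines per-pixel confidence with evidence pooled from a 3x3 neighbourhood before thresholding pseudo-labels; the exact refinement rule used by S4MC may differ.

    import torch
    import torch.nn.functional as F

    def refine_confidence(probs: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
        """probs: (B, C, H, W) softmax maps; blend each pixel with its 3x3 mean."""
        neighbour_mean = F.avg_pool2d(probs, kernel_size=3, stride=1, padding=1)
        return (1 - alpha) * probs + alpha * neighbour_mean

    def pseudo_labels(probs: torch.Tensor, tau: float = 0.95) -> torch.Tensor:
        """Keep labels whose refined confidence clears tau; mark the rest ignored."""
        conf, labels = refine_confidence(probs).max(dim=1)
        labels[conf < tau] = -1                    # -1 = ignored during training
        return labels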

URL: https://openreview.net/forum?id=i5yKW1pmjW

---

Title: Multimodal Chain-of-Thought Reasoning in Language Models

Authors: Zhuosheng Zhang, Aston Zhang, Mu Li, hai zhao, George Karypis, Alex Smola

Abstract: Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have primarily focused on the language modality. We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. In this way, answer inference can leverage better generated rationales that are based on multimodal information. Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach. With Multimodal-CoT, our model under 1 billion parameters achieves state-of-the-art performance on the ScienceQA benchmark. Our analysis indicates that Multimodal-CoT offers the advantages of mitigating hallucination and enhancing convergence speed. Code is publicly available at https://github.com/amazon-science/mm-cot.

URL: https://openreview.net/forum?id=y1pPWFVfvR

---

Title: Contrastive Graph Autoencoder for Shape-based Polygon Retrieval from Large Geometry Datasets

Authors: Zexian Huang, Kourosh Khoshelham, Martin Tomko

Abstract: Retrieval of polygon geometries with similar shapes from maps is a challenging geographic information task. Existing approaches cannot process polygon geometries with complex shapes or (multiple) holes, and are sensitive to geometric transformations (e.g., rotations). We propose Contrastive Graph Autoencoder (CGAE), a robust and effective graph representation autoencoder for extracting polygon geometries of similar shapes from real-world building maps based on template queries. By leveraging graph message-passing layers, graph feature augmentation and contrastive learning, the proposed CGAE learns highly discriminative latent embeddings by reconstructing graph features w.r.t. the graph representations of input polygons, outperforming existing graph-based autoencoders (GAEs) in geometry retrieval of similar polygons. Experimentally, we demonstrate this capability based on template query shapes on real-world datasets and show its high robustness to geometric transformations in contrast to existing GAEs, indicating the strong generalizability and versatility of CGAE, including on complex real-world building footprints.

URL: https://openreview.net/forum?id=9fcZNAmnyh

---

Title: Prototypical Self-Explainable Models Without Re-training

Authors: Srishti Gautam, Ahcene Boubekki, Marina MC Höhne, Michael Kampffmeyer

Abstract: Explainable AI (XAI) has unfolded in two distinct research directions with, on the one hand, post-hoc methods that explain the predictions of a pre-trained black-box model and, on the other hand, self-explainable models (SEMs) which are trained directly to provide explanations alongside their predictions. While the latter is preferred in safety-critical scenarios, post-hoc approaches have received the majority of attention until now, owing to their simplicity and ability to explain base models without retraining. Current SEMs, instead, require complex architectures and heavily regularized loss functions, thus necessitating specific and costly training. To address this shortcoming and facilitate wider use of SEMs, we propose a simple yet efficient universal method called KMEx (K-Means Explainer), which can convert any existing pre-trained model into a prototypical SEM. The motivation behind KMEx is to enhance transparency in deep learning-based decision-making via class-prototype-based explanations that are diverse and trustworthy without retraining the base model. We compare models obtained from KMEx to state-of-the-art SEMs using an extensive qualitative evaluation to highlight the strengths and weaknesses of each model, further paving the way toward a more reliable and objective evaluation of SEMs\footnote{The code is available at https://github.com/SrishtiGautam/KMEx}.

URL: https://openreview.net/forum?id=HU5DOUp6Sa

---

Title: Conservative Prediction via Data-Driven Confidence Minimization

Authors: Caroline Choi, Fahim Tajwar, Yoonho Lee, Huaxiu Yao, Ananya Kumar, Chelsea Finn

Abstract: In safety-critical applications of machine learning, it is often desirable for a model to be \textit{conservative}, abstaining from making predictions on ``unknown'' inputs which are not well-represented in the training data. However, detecting unknown examples is challenging, as it is impossible to anticipate all potential inputs at test time. To address this, prior work minimizes model confidence on an auxiliary outlier dataset carefully curated to be disjoint from the training distribution. We theoretically analyze the choice of auxiliary dataset for confidence minimization, revealing two actionable insights: (1) if the auxiliary set contains unknown examples similar to those seen at test time, confidence minimization leads to provable detection of unknown test examples, and (2) if the first condition is satisfied, it is unnecessary to filter out known examples for out-of-distribution (OOD) detection. Motivated by these guidelines, we propose the Data-Driven Confidence Minimization (DCM) framework, which minimizes confidence on an \textit{uncertainty dataset}. We apply DCM to two problem settings in which conservative prediction is paramount -- selective classification and OOD detection -- and provide a realistic way to gather uncertainty data for each setting. In our experiments, DCM consistently outperforms existing selective classification approaches on 4 datasets when tested on unseen distributions and outperforms state-of-the-art OOD detection methods on 12 ID-OOD dataset pairs, reducing FPR (at TPR $95\%$) by $6.3\%$ and $58.1\%$ on CIFAR-10 and CIFAR-100 compared to Outlier Exposure.
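
A minimal sketch of a DCM-style objective: cross-entropy on labelled data plus a term that pushes predictions on the uncertainty dataset toward the uniform distribution. The weighting and the exact confidence penalty used in the paper are assumptions here.

    import torch
    import torch.nn.functional as F

    def dcm_loss(model, x_train, y_train, x_uncertain, lam: float = 0.5):
        ce = F.cross_entropy(model(x_train), y_train)
        # cross-entropy against the uniform label minimizes confidence
        log_probs = F.log_softmax(model(x_uncertain), dim=1)
        conf_term = -log_probs.mean()
        return ce + lam * conf_term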

URL: https://openreview.net/forum?id=QPuxjsjKCP

---

Title: Physical Reasoning and Object Planning for Household Embodied Agents

Authors: Ayush Agrawal, Raghav Prabhakar, Anirudh Goyal, Dianbo Liu

Abstract: In this study, we explore the sophisticated domain of task planning for robust household embodied agents, with a particular emphasis on the intricate task of selecting substitute objects. We introduce the \textbf{C}ommonSense \textbf{O}bject \textbf{A}ffordance \textbf{T}ask \textbf{(COAT)}, a novel framework designed to analyze reasoning capabilities in commonsense scenarios. This approach is centered on understanding how these agents can effectively identify and utilize alternative objects when executing household tasks, thereby offering insights into the complexities of practical decision-making in real-world environments. Drawing inspiration from factors affecting human decision-making, we explore how large language models tackle this challenge through four meticulously crafted commonsense question-and-answer datasets featuring refined rules and human annotations. Our evaluation of state-of-the-art language models on these datasets sheds light on three pivotal considerations: 1) aligning an object's inherent utility with the task at hand, 2) navigating contextual dependencies (societal norms, safety, appropriateness, and efficiency), and 3) accounting for the current physical state of the object. To maintain accessibility, we introduce five abstract variables reflecting an object's physical condition, modulated by human insights, to simulate diverse household scenarios. Our contributions include insightful human preference mappings for all three factors and four extensive QA datasets (2K, 15K, 60K, 70K questions) probing the intricacies of utility dependencies, contextual dependencies and object physical states. The datasets, along with our findings, are accessible at: \url{https://github.com/com-phy-affordance/COAT}. This research not only advances our understanding of physical commonsense reasoning in language models but also paves the way for future improvements in household agent intelligence.

URL: https://openreview.net/forum?id=xYkdmEGhIM

---

Title: Reproducibility study of "LICO: Explainable Models with Language-Image Consistency"

Authors: Luan Fletcher, Robert van der Klis, Martin Sedláček, Stefan Vasilev, Christos Athanasiadis

Abstract: The growing reproducibility crisis in machine learning has brought forward a need for careful examination of research findings. This paper investigates the claims made by Lei et al. (2023) regarding their proposed method, LICO, for enhancing post-hoc interpretability techniques and improving image classification performance. LICO leverages natural language supervision from a vision-language model to enrich feature representations and guide the learning process. We conduct a comprehensive reproducibility study, employing (Wide) ResNets and established interpretability methods like Grad-CAM and RISE. We were mostly unable to reproduce the authors' results. In particular, we did not find that LICO consistently led to improved classification performance or improvements in quantitative and qualitative measures of interpretability. Thus, our findings highlight the importance of rigorous evaluation and transparent reporting in interpretability research.

URL: https://openreview.net/forum?id=Mf1H8X5DVb

---

Title: Robust Distortion-free Watermarks for Language Models

Authors: Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang

Abstract: We propose a methodology for planting watermarks in text from an autoregressive language model that are robust to perturbations without changing the distribution over text up to a certain maximum generation budget. We generate watermarked text by mapping a sequence of random numbers—which we compute using a randomized watermark key—to a sample from the language model. To detect watermarked text, any party who knows the key can align the text to the random number sequence. We instantiate our watermark methodology with two sampling schemes: inverse transform sampling and exponential minimum sampling. We apply these watermarks to three language models—OPT-1.3B, LLaMA-7B and Alpaca-7B—to experimentally validate their statistical power and robustness to various paraphrasing attacks. Notably, for both the OPT-1.3B and LLaMA-7B models, we find we can reliably detect watermarked text ($p \leq 0.01$) from $35$ tokens even after corrupting between $40$-$50$\% of the tokens via random edits (i.e., substitutions, insertions or deletions). For the Alpaca-7B model, we conduct a case study on the feasibility of watermarking responses to typical user instructions. Due to the lower entropy of the responses, detection is more difficult: around $25\%$ of the responses—whose median length is around $100$ tokens—are detectable with $p \leq 0.01$, and the watermark is also less robust to certain automated paraphrasing attacks we implement.
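
A stripped-down sketch of the inverse-transform variant: a keyed pseudorandom number drives token sampling, so anyone holding the key can re-derive the sequence and test its alignment with a candidate text. Key cycling and the paper's alignment test statistic are omitted, and the keyed-PRNG construction below is a placeholder.

    import numpy as np

    def watermarked_sample(probs: np.ndarray, key: int, step: int) -> int:
        """Inverse-transform sampling driven by a key-derived uniform variate."""
        u = np.random.default_rng(hash((key, step)) & 0xFFFFFFFF).random()
        return int(np.searchsorted(np.cumsum(probs), u))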

URL: https://openreview.net/forum?id=FpaCL1MO2C

---


New submissions
===============


Title: ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text

Abstract: The application of deep learning to electrocardiogram (ECG) analysis has advanced the accuracy and efficiency of cardiac healthcare diagnostics. In this work, we address a critical challenge in the field of ECG analysis with deep learning: learning robust representations without large-scale labeled datasets. We propose the ECG Semantic Integrator (ESI), a novel multimodal contrastive pretraining framework that jointly learns from ECG signals and associated textual descriptions. ESI employs a dual objective function that comprises a contrastive loss and a captioning loss to develop representations of ECG data. To create a sufficiently large and diverse training dataset, we develop a retrieval-augmented generation (RAG)-based Large Language Model (LLM) pipeline, called Cardio Query Assistant (CQA). This pipeline is designed to generate detailed textual descriptions for ECGs from diverse databases. The generated text includes information about demographics and waveform patterns. This approach enables us to compile a large-scale multimodal dataset with over 660,000 ECG-text pairs for pretraining ESI, which then learns robust and generalizable representations of 12-lead ECG. We validate our approach through various downstream tasks, including arrhythmia detection and ECG-based subject identification. Our experimental results demonstrate substantial improvements over strong baselines in these tasks. These baselines encompass supervised and self-supervised learning methods, as well as prior multimodal pretraining approaches. Our work shows the potential of multimodal pretraining to improve the analysis of ECG signals.
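
The dual objective described above pairs a CLIP-style contrastive term with a captioning term; a hedged sketch follows, with the weighting and temperature as assumptions rather than values from the paper.

    import torch
    import torch.nn.functional as F

    def esi_loss(ecg_emb, text_emb, caption_logits, caption_tokens,
                 temperature: float = 0.07, beta: float = 1.0):
        # symmetric InfoNCE between ECG and text embeddings
        ecg_emb = F.normalize(ecg_emb, dim=1)
        text_emb = F.normalize(text_emb, dim=1)
        logits = ecg_emb @ text_emb.t() / temperature
        targets = torch.arange(len(ecg_emb), device=logits.device)
        contrastive = 0.5 * (F.cross_entropy(logits, targets)
                             + F.cross_entropy(logits.t(), targets))
        # next-token cross-entropy for the captioning head
        captioning = F.cross_entropy(
            caption_logits.reshape(-1, caption_logits.size(-1)),
            caption_tokens.reshape(-1))
        return contrastive + beta * captioning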

URL: https://openreview.net/forum?id=giEbq8Khcf

---

Title: FLR: Label-Mixture Regularization for Federated Learning with Noisy Labels

Abstract: Label noise in federated learning (FL) has garnered increasing attention due to the decentralized nature of FL, where data is collected from multiple clients with potentially different levels of label noise. This study introduces two pivotal contributions to this domain. First, we anatomize the memorization phenomenon in FL into server-side and client-side components, marking the first investigation into how these distinct forms of memorization impact learning. Second, to mitigate the memorization in FL, we present the Federated Label-mixture Regularization (FLR) strategy, a straightforward yet effective approach that employs regularization through pseudo labels generated by merging local and global model predictions. This method not only improves the accuracy of the global model in both i.i.d. and non-i.i.d. settings but also effectively counters the memorization of noisy labels. We empirically find that FLR aligns with and advances existing FL and noisy label mitigation methods over multiple datasets under various levels of data heterogeneity and label noise.

URL: https://openreview.net/forum?id=Z8A3HDgS0E

---

Title: On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization

Abstract: We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL). In order to solve such an optimization problem, we apply and analyze the classical stochastic proximal gradient method. In particular, the method is shown to admit an $O(\epsilon^{-4})$ sample complexity to an $\epsilon$-stationary point, under standard conditions. Since the variance of the classical stochastic gradient estimator is typically large, which slows down the convergence, we also apply an efficient stochastic variance-reduced proximal gradient method with an importance sampling based ProbAbilistic Gradient Estimator (PAGE). Our analysis shows that the sample complexity can be improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional conditions. Our results on the stochastic (variance-reduced) proximal gradient method match the sample complexity of their most competitive counterparts for discounted Markov decision processes under similar settings. To the best of our knowledge, the proposed methods represent a novel approach in addressing the general regularized reward optimization problem.

URL: https://openreview.net/forum?id=Ve4Puj2LVT

---

Title: Attention Normalization Impacts Cardinality Generalization in Slot Attention

Abstract: Object-centric scene decompositions are important representations for downstream tasks in fields such as computer vision and robotics. The recently proposed Slot Attention module, already leveraged by several derivative works for image segmentation and object tracking in videos, is a deep learning component which performs unsupervised object-centric scene decomposition on input images. It is based on an attention architecture, in which latent slot vectors, which hold compressed information on objects, attend to localized perceptual features from the input image. In this paper, we show that design decisions on normalizing the aggregated values in the attention architecture have considerable impact on the capabilities of Slot Attention to generalize to a higher number of slots and objects as seen during training. We argue that the original Slot Attention normalization scheme discards information on the objects' sizes, which impairs its generalization capabilities. Based on these findings, we propose and investigate alternative normalization approaches which increase the generalization capabilities of Slot Attention to varying slot and object counts, resulting in performance gains on the tasks of unsupervised image segmentation.
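
The normalization at issue can be shown in a few lines: Slot Attention softmaxes attention over the slot axis and then takes a weighted mean over inputs, which normalizes away how many inputs attend to each slot (a proxy for object size); a weighted sum is one alternative that keeps that signal. This sketch shows only the aggregation step, not the full iterative slot update, and the "sum" variant is an illustration rather than the paper's proposal.

    import torch

    def aggregate(attn: torch.Tensor, values: torch.Tensor, mode: str = "mean"):
        # attn: (batch, n_inputs, n_slots), already softmaxed over the slot axis
        # values: (batch, n_inputs, d)
        if mode == "mean":   # original Slot Attention: per-slot weighted mean
            weights = attn / attn.sum(dim=1, keepdim=True)
        else:                # weighted sum keeps total attention mass per slot
            weights = attn
        return torch.einsum("bis,bid->bsd", weights, values)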

URL: https://openreview.net/forum?id=llQXLfbGOq

---

Title: Non-ergodicity in reinforcement learning: robustness via ergodicity transformations

Abstract: Envisioned application areas for reinforcement learning (RL) include autonomous driving, precision agriculture, and finance, which all require RL agents to make decisions in the real world. A significant challenge hindering the adoption of RL methods in these domains is the non-robustness of conventional algorithms. In this paper, we argue that a fundamental issue contributing to this lack of robustness lies in the focus on the expected value of the return as the sole ``correct'' optimization objective. The expected value is the average over the statistical ensemble of infinitely many trajectories. For non-ergodic returns, this average differs from the average over a single but infinitely long trajectory. Consequently, optimizing the expected value can lead to policies that yield exceptionally high returns with probability zero but almost surely result in catastrophic outcomes. This problem can be circumvented by transforming the time series of collected returns into one with ergodic increments. This transformation enables learning robust policies by optimizing the long-term return for individual agents rather than the average across infinitely many trajectories. We propose an algorithm for learning ergodicity transformations from data and demonstrate its effectiveness in an instructive, non-ergodic environment and on standard RL benchmarks.

URL: https://openreview.net/forum?id=N2Sbp2biU7

---

Title: IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

Abstract: In-context learning allows adapting a model to new tasks given a task description at test time. In this paper, we present IMProv - a generative model that is able to in-context learn visual tasks from multimodal prompts. Given a textual description of a visual task (e.g. “Left: input image, Right: foreground segmentation”), a few input-output visual examples, or both, the model in-context learns to solve it for a new test input. We train a masked generative transformer on a new dataset of figures from computer vision papers and their associated captions, together with a captioned large-scale image-text dataset. During inference time, we prompt the model with text and/or image task example(s) and have the model inpaint the corresponding output. We show that training our model with text conditioning and scaling the dataset size improves in-context learning for computer vision tasks by over $+10\%$ AP for Foreground Segmentation, over $+5\%$ gains in AP for Single Object Detection, and almost $20\%$ lower LPIPS in Colorization. Our emperical results suggest that vision and language prompts are complementary and it is advantageous to use both to achieve better in-context learning performance.

URL: https://openreview.net/forum?id=qBTgnk2HAf

---

Title: Variational Inference on the Final-Layer Output of Neural Networks

Abstract: Traditional neural networks are simple to train but they typically produce overconfident predictions. In contrast, Bayesian neural networks provide good uncertainty quantification but optimizing them is time-consuming due to the large parameter space. This paper proposes to combine the advantages of both approaches by performing Variational Inference in the Final layer Output space (VIFO), because the output space is much smaller than the parameter space. We use neural networks to learn the mean and the variance of the probabilistic output. Using the Bayesian formulation we incorporate collapsed variational inference with VIFO, which significantly improves the performance in practice. On the other hand, like standard, non-Bayesian models, VIFO enjoys simple training, and one can use Rademacher complexity to provide risk bounds for the model. Experiments show that VIFO provides a good tradeoff in terms of run time and uncertainty quantification, especially for out-of-distribution data.
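
A minimal sketch of the VIFO idea: two heads output the mean and (log-)variance of the final-layer logits, and prediction averages softmax samples drawn via the reparameterization trick. The collapsed variational refinement from the paper is not shown, and the head design here is an assumption.

    import torch
    import torch.nn as nn

    class VIFOHead(nn.Module):
        def __init__(self, in_dim: int, num_classes: int):
            super().__init__()
            self.mean = nn.Linear(in_dim, num_classes)
            self.log_var = nn.Linear(in_dim, num_classes)

        def forward(self, features: torch.Tensor, n_samples: int = 8):
            mu, log_var = self.mean(features), self.log_var(features)
            std = (0.5 * log_var).exp()
            eps = torch.randn(n_samples, *mu.shape, device=mu.device)
            logits = mu + std * eps                    # (n_samples, batch, classes)
            return logits.softmax(dim=-1).mean(dim=0)  # predictive probabilities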

URL: https://openreview.net/forum?id=mTOzXLmLKr

---

Title: Permutation invariant functions: statistical tests, density estimation, and computationally efficient embedding

Abstract: Permutation invariance is among the most common symmetries that can be exploited to simplify complex problems in machine learning (ML). There has been a tremendous surge of research activity in building permutation invariant ML architectures. However, less attention is given to: (1) how to statistically test for permutation invariance of coordinates in a random vector where the dimension is allowed to grow with the sample size; (2) how to leverage permutation invariance in estimation problems and how it helps reduce dimensions. In this paper, we take a step back and examine these questions in several fundamental problems: (i) testing the assumption of permutation invariance of multivariate distributions; (ii) estimating permutation invariant densities; (iii) analyzing the metric entropy of permutation invariant function classes and comparing them with their counterparts without imposing permutation invariance; (iv) deriving an embedding of permutation invariant reproducing kernel Hilbert spaces for efficient computation. In particular, our methods for (i) and (iv) are based on a sorting trick and (ii) is based on an averaging trick. These tricks substantially simplify the exploitation of permutation invariance.
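
The sorting trick mentioned for (i) and (iv) rests on a one-line observation: sorting maps every permutation of a vector to the same canonical representative, so any function applied after sorting is permutation invariant. One natural instance, sketched here as an assumption rather than the paper's exact construction, is a kernel evaluated on sorted inputs.

    import numpy as np

    def sort_embed(x: np.ndarray) -> np.ndarray:
        """Canonical permutation-invariant representative of x."""
        return np.sort(x)

    def invariant_rbf_kernel(x: np.ndarray, y: np.ndarray, gamma: float = 1.0) -> float:
        """RBF kernel on sorted inputs; invariant to permuting x or y."""
        d = sort_embed(x) - sort_embed(y)
        return float(np.exp(-gamma * np.dot(d, d)))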

URL: https://openreview.net/forum?id=UbGwHVsFsQ

---

Title: Preconditioned Neural Posterior Estimation for Likelihood-free Inference

Abstract: Simulation-based inference (SBI) methods enable the estimation of posterior distributions when the likelihood function is intractable but model simulation is feasible. Popular neural approaches to SBI are the neural posterior estimator (NPE) and its sequential version (SNPE). These methods can outperform statistical SBI approaches such as approximate Bayesian computation (ABC), particularly for relatively small numbers of model simulations. However, we show in this paper that the NPE methods are not guaranteed to be highly accurate, even on problems with low dimension. In such settings the posterior cannot be accurately trained over the prior predictive space, and even the sequential extension remains sub-optimal. To overcome this, we propose preconditioned NPE (PNPE) and its sequential version (PSNPE), which uses a short run of ABC to effectively eliminate regions of parameter space that produce large discrepancy between simulations and data, allowing the posterior emulator to be more accurately trained. We present comprehensive empirical evidence that this melding of neural and statistical SBI methods improves performance over a range of examples, including a motivating example involving a complex agent-based model applied to real tumour growth data.
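
As a hedged sketch of the preconditioning step: a short rejection-ABC run keeps only the parameters whose simulations land close to the observed data, and the survivors define the truncated region on which the neural posterior estimator is then trained. The distance and the kept fraction below are placeholder choices.

    import numpy as np

    def abc_precondition(prior_sample, simulate, observed, n=10_000, keep=0.05):
        """Return prior draws whose simulations fall closest to the data."""
        thetas = np.array([prior_sample() for _ in range(n)])
        dists = np.array([np.linalg.norm(simulate(t) - observed) for t in thetas])
        cutoff = np.quantile(dists, keep)      # keep the closest 5%
        return thetas[dists <= cutoff]         # training pool for (S)NPE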

URL: https://openreview.net/forum?id=vgIBAOkIhY

---

Title: InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions

Abstract: Recent works have explored text-guided image editing using diffusion models and generated edited images based on text prompts. However, the models struggle to accurately locate the regions to be edited and faithfully perform precise edits. In this work, we propose a framework termed InstructEdit that can do fine-grained editing based on user instructions. Our proposed framework has three components: language processor, segmenter, and image editor. The first component, the language processor, processes the user instruction using a large language model. The goal of this processing is to parse the user instruction and output prompts for the segmenter and captions for the image editor. We adopt ChatGPT and optionally BLIP2 for this step. The second component, the segmenter, uses the segmentation prompt provided by the language processor. We employ the state-of-the-art segmentation framework Grounded Segment Anything to automatically generate a high-quality mask based on the segmentation prompt. The third component, the image editor, uses the captions from the language processor and the masks from the segmenter to compute the edited image. We adopt Stable Diffusion and the mask-guided generation from DiffEdit for this purpose. Experiments show that our method outperforms previous editing methods in fine-grained editing applications where the input image contains a complex object or multiple objects. We improve the mask quality over DiffEdit and thus improve the quality of edited images. We also show that our framework can be combined with NeRF or video editing pipelines to achieve fine-grained NeRF or video editing applications.

URL: https://openreview.net/forum?id=O25Tahy6Ax

---

Title: PLUM: Improving Inference Efficiency By Leveraging Repetition-Sparsity Trade-Off

Abstract: Efficient inference of Deep Neural Networks (DNNs) on resource-constrained edge devices is essential. Quantization and sparsity are key techniques that translate to repetition and sparsity within tensors at the hardware-software interface. This paper introduces the concept of the repetition-sparsity trade-off that helps explain computational efficiency during inference. We propose PLUM, a unified co-design framework that integrates DNN inference systems and quantization (forward and backward pass) to leverage the repetition-sparsity trade-off to improve inference efficiency. Our results demonstrate that PLUM’s quantization method is more accurate than binary quantization with the same number of non-zero weights. Detailed analysis indicates that signed binarization generates a smaller distribution of effectual (non-zero) parameters nested within a larger distribution of total parameters of latent full-precision weights for a DNN block. Finally, the proposed PLUM framework achieves a 26% speedup on real hardware, doubles energy efficiency, and reduces density by 2.8× compared to binary methods while retaining top-1 accuracy when compared to prior-art methods for ResNets on ImageNet (achieving 66.2% top-1 accuracy), presenting an alternative solution for deploying efficient models in resource-limited environments.

URL: https://openreview.net/forum?id=IEKtMMSblm

---

Title: Object-Centric Learning of Neural Policies for Zero-shot Transfer over Domains with Varying Quantities of Interest

Abstract: Our goal is to learn policies that generalize across variations in quantities of interest in the domain (e.g., number of objects, motion dynamics, distance to the goal) in a zero-shot manner. Recent work on object-centric approaches for image and video processing has shown significant promise in building models that generalize well to unseen settings. In this work, we present {\em Object Centric Reinforcement Learning Agent (ORLA)}, an object-centric approach for model-free RL in perceptual domains. ORLA works in three phases: first, it learns to extract a variable number of object masks via an expert trained using encoder-decoder architecture, which in turn generates data for fine-tuning a YOLO-based model for extracting bounding boxes in unseen settings. Second, bounding boxes are used to construct a symbolic state consisting of object positions across a sequence of frames. Finally, a Graph Attention Network (GAT) based architecture is employed over the extracted object positions to learn a dense state embedding, which is then decoded to get the final policy that generalizes to unseen environments. Our experiments over a number of domains show that ORLA can learn significantly better policies that transfer across variations in different quantities of interest compared to existing baselines, which often fail to do any meaningful transfer.

URL: https://openreview.net/forum?id=OymQgY792o

---

Title: Contrastive Learning with Consistent Representations

Abstract: Contrastive learning demonstrates great promise for representation learning. Data augmentations play a critical role in contrastive learning by providing informative views of the data without necessitating explicit labels. Nonetheless, the efficacy of current methodologies heavily hinges on the quality of employed data augmentation (DA) functions, often chosen manually from a limited set of options. While exploiting diverse data augmentations is appealing, the complexities inherent in both DAs and representation learning can lead to performance deterioration. Addressing this challenge and facilitating the systematic incorporation of diverse data augmentations, this paper proposes Contrastive Learning with Consistent Representations (CoCor). At the heart of CoCor is a novel consistency metric termed DA consistency. This metric governs the mapping of augmented input data to the representation space, ensuring that these instances are positioned optimally in a manner consistent with the applied intensity of the DA. Moreover, we propose to learn the optimal mapping locations as a function of DA, all while preserving a desired monotonic property relative to DA intensity. Experimental results demonstrate that CoCor notably enhances the generalizability and transferability of learned representations in comparison to baseline methods.

URL: https://openreview.net/forum?id=gKeSI8w63Z

---

Title: Approximation, Estimation and Optimization Errors for a Deep Neural Network

Abstract: The error of supervised learning is typically split into three components: approximation, estimation and optimization errors. While all three have been extensively studied in the literature, a unified treatment is less frequent, in part because of conflicting assumptions: approximation results typically rely on carefully hand-crafted weights, which are difficult to achieve by gradient descent. Optimization theory is best understood in over-parametrized regimes with more weights than samples, while classical estimation errors typically require the opposite regime with more samples than weights.

This paper contains two results which bound all three error components simultaneously for deep fully connected networks. The first uses a regular least squares loss and shows convergence in the under-parametrized regime. The second uses a kernel based loss function and shows convergence in both under and over-parametrized regimes.

URL: https://openreview.net/forum?id=nnTKcGNrbV

---

Title: Probabilistic Guarantees for Abductive Inference

Abstract: Abductive reasoning is ubiquitous in artificial intelligence and everyday thinking. However, formal theories that provide probabilistic guarantees for abductive inference are lacking. We present a quantitative formalization of abductive logic that combines Bayesian probability with the interpretation of abduction as a search process within the Algorithmic Search Framework (ASF). By incorporating uncertainty in background knowledge, we establish two novel sets of probabilistic bounds on the success of abduction when (1) selecting the single most likely cause while assuming noiseless observations, and (2) selecting any cause above some probability threshold while accounting for noisy observations. To our knowledge, no existing abductive or general inference bounds account for noisy observations. Furthermore, while most existing abductive frameworks assume exact underlying prior and likelihood distributions, we assume only percentile-based confidence intervals for such values. These milder assumptions result in greater flexibility and applicability of our framework. We also explore additional information-theoretic results from the ASF and provide mathematical justifications for everyday abductive intuitions.

URL: https://openreview.net/forum?id=DtJen9ML0g

---

Title: Sample-efficient decoding of visual stimuli from fMRI through inter-individual functional alignment

Abstract: Deep learning is leading to major advances in the realm of brain decoding from functional Magnetic Resonance Imaging (fMRI). However, the large inter-individual variability in brain characteristics has constrained most studies to train models on one participant at a time. This limitation hampers the training of deep learning models, which typically require very large datasets. Here, we propose to boost brain decoding of videos and static images across participants by aligning brain responses of training and left-out participants. Evaluated on a retrieval task, our method halves the median rank in out-of-subject setups compared to the anatomically-aligned baseline. It also outperforms classical within-subject approaches when fewer than 100 minutes of data are available for the tested participant. Furthermore, we show that our alignment framework handles multiple subjects, which improves accuracy over classical single-subject approaches. Finally, we show that this method aligns neural representations in accordance with brain anatomy. Overall, this study lays the foundations for leveraging extensive neuroimaging datasets and enhancing the decoding of individual brains when a limited amount of brain-imaging data is available.
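
For intuition, one common functional-alignment baseline fits a linear map from a left-out participant's responses to a reference participant's responses on shared stimuli; the sketch below (a simplification with synthetic data, not necessarily the paper's exact estimator) illustrates the idea:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic stand-ins for real recordings: (timepoints, voxels) matrices
# of two participants watching the same stimuli.
rng = np.random.default_rng(0)
X_new = rng.standard_normal((300, 500))   # left-out participant
X_ref = rng.standard_normal((300, 400))   # reference participant

# Learn a linear map from the left-out participant's voxel space into the
# reference participant's voxel space.
align = Ridge(alpha=10.0).fit(X_new, X_ref)

# At test time, map unseen responses into the reference space, where a
# decoder trained on the reference participant can be reused.
X_test_aligned = align.predict(rng.standard_normal((50, 500)))
```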

URL: https://openreview.net/forum?id=qvJraN50DT

---

Title: Publicly-Detectable Watermarking for Language Models

Abstract: We present a highly detectable, trustless watermarking scheme for LLMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LLM output using rejection sampling. We prove that our scheme is cryptographically correct, sound, and distortion-free. We make novel uses of error-correction techniques to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and make empirical measurements over open models in the 2.7B to 70B parameter range. Our experiments suggest that our formal claims are met in practice.
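
The rejection-sampling idea can be caricatured as follows (a toy sketch: `sample_block` is a hypothetical callable drawing a block of token strings from the LM, `bits` is the next chunk of signature bits as a '0'/'1' string, and the real scheme additionally uses error-correcting codes for low-entropy stretches):

```python
import hashlib

def embed_bits(sample_block, bits, max_tries=100):
    """Toy rejection-sampling loop: resample token blocks until a public
    hash of the block encodes the target signature bits."""
    for _ in range(max_tries):
        block = sample_block()
        digest = hashlib.sha256(" ".join(block).encode()).digest()
        # Compare the leading len(bits) bits of the hash to the target.
        got = "".join(f"{b:08b}" for b in digest)[: len(bits)]
        if got == bits:
            return block  # block now publicly encodes `bits`
    # Low-entropy fallback; the paper handles this case with error correction.
    return block
```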

URL: https://openreview.net/forum?id=KUcPucDTSl

---

Title: PerSEval: Assessing Personalization in Text Summarizers

Abstract: Personalized summarization models cater to individuals' subjective understanding of saliency, as represented by their reading history and current topics of attention. Existing personalized text summarizers are primarily evaluated based on accuracy measures such as BLEU, ROUGE, and METEOR. However, a recent study argued that accuracy measures are inadequate for evaluating the $\textit{degree of personalization}$ of these models and proposed EGISES, the first metric to evaluate personalized text summaries. It was suggested that accuracy is a separate aspect and should be evaluated standalone. In this paper, we challenge the necessity of an accuracy leaderboard, suggesting that relying on accuracy-based aggregated results might lead to misleading conclusions. To support this, we delve deeper into EGISES, demonstrating both theoretically and empirically that it measures the $\textit{degree of responsiveness}$, a necessary but not sufficient condition for degree-of-personalization. We subsequently propose PerSEval, a novel measure that satisfies the required sufficiency condition. Based on the benchmarking of ten SOTA summarization models on the PENS dataset, we empirically establish that -- (i) PerSEval is reliable w.r.t human-judgment correlation (Pearson's $r$ = 0.73; Spearman's $\rho$ = 0.62; Kendall's $\tau$ = 0.42), (ii) PerSEval has high rank-stability, (iii) PerSEval as a rank-measure is not entailed by EGISES-based ranking, and (iv) PerSEval can be a standalone rank-measure without the need of any aggregated ranking.
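
The human-judgment validation behind point (i) reduces to standard correlation tests; a minimal harness (with made-up placeholder scores, purely for illustration) could look like:

```python
from scipy.stats import pearsonr, spearmanr, kendalltau

# Placeholder per-model scores from the metric and from human judges.
metric_scores = [0.61, 0.48, 0.75, 0.52, 0.69]
human_scores  = [0.58, 0.44, 0.80, 0.47, 0.71]

r,   _ = pearsonr(metric_scores, human_scores)
rho, _ = spearmanr(metric_scores, human_scores)
tau, _ = kendalltau(metric_scores, human_scores)
print(f"Pearson r={r:.2f}, Spearman rho={rho:.2f}, Kendall tau={tau:.2f}")
```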

URL: https://openreview.net/forum?id=yqT7eBz1VJ

---

Title: Selective Classification Under Distribution Shifts

Abstract: In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong in order to avoid excessive errors. To deploy imperfect classifiers---imperfect due to the intrinsic statistical noise of the data, robustness issues of the classifier, or beyond---in high-stakes scenarios, SC appears to be an attractive and necessary path to follow. Despite decades of research in SC, most previous SC methods still focus only on the ideal statistical setting, i.e., where the data distribution at deployment is the same as that of training, although practical data can come from the wild. To bridge this gap, we propose an SC framework that accounts for distribution shifts, termed generalized selective classification, covering label-shifted (or out-of-distribution) and covariate-shifted samples in addition to typical in-distribution samples, the first of its kind in the SC literature. We focus on non-training-based confidence-score functions for generalized SC on deep learning (DL) classifiers and propose two novel margin-based score functions. Through extensive analysis and experiments, we show that our proposed score functions are more effective and reliable than the existing ones for generalized SC on a variety of classification tasks and DL classifiers.
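
For context, the classical margin-based confidence score that such methods build on (shown below for illustration; the paper proposes two new margin-based scores, which this sketch does not reproduce) thresholds the gap between the top two logits:

```python
import torch

def margin_score(logits):
    """Classical top-two margin confidence: the difference between the
    largest and second-largest logits for each sample. logits: (B, C)."""
    top2 = logits.topk(2, dim=-1).values
    return top2[:, 0] - top2[:, 1]

def selective_predict(logits, threshold):
    """Predict the argmax class where the margin clears `threshold`;
    abstain (label -1) elsewhere."""
    preds = logits.argmax(dim=-1)
    preds[margin_score(logits) < threshold] = -1
    return preds
```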

URL: https://openreview.net/forum?id=dmxMGW6J7N

---

Title: Incremental Spatial and Spectral Learning of Neural Operators for Solving Large-Scale PDEs

Abstract: Fourier Neural Operators (FNO) offer a principled approach to solving challenging partial differential equations (PDEs) such as turbulent flows. At the core of FNO is a spectral layer that leverages a discretization-convergent representation in the Fourier domain and learns weights over a fixed set of frequencies. However, training FNO presents two significant challenges, particularly in large-scale, high-resolution applications: (i) computing the Fourier transform on high-resolution inputs is computationally intensive but necessary, since fine-scale details are needed for solving many PDEs such as fluid flows; (ii) selecting the relevant set of frequencies in the spectral layers is challenging, and too many modes can lead to overfitting, while too few can lead to underfitting. To address these issues, we introduce the Incremental Fourier Neural Operator (iFNO), which progressively increases both the number of frequency modes used by the model and the resolution of the training data. We empirically show that iFNO reduces total training time while maintaining or improving generalization performance across various datasets. Our method demonstrates a 38% lower testing error, using 20% fewer frequency modes compared to the existing FNO, while also achieving up to 46% faster training and a 2.8x reduction in model size.
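
To illustrate the incremental-modes idea, here is a minimal 1-D spectral layer whose frequency budget can grow during training (our simplification: real FNO layers mix channels with a full weight matrix per mode, and iFNO's actual growth schedule may differ):

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Minimal 1-D spectral layer: learns complex weights over the lowest
    `n_modes` frequencies, with per-mode scaling instead of channel mixing."""
    def __init__(self, channels, n_modes):
        super().__init__()
        self.n_modes = n_modes
        self.weight = nn.Parameter(
            torch.randn(channels, n_modes, dtype=torch.cfloat) * 0.02)

    def expand_modes(self, new_modes):
        # Incremental step: keep trained low-frequency weights and append
        # freshly initialized high-frequency ones.
        extra = torch.randn(self.weight.shape[0], new_modes - self.n_modes,
                            dtype=torch.cfloat) * 0.02
        self.weight = nn.Parameter(torch.cat([self.weight.data, extra], dim=1))
        self.n_modes = new_modes

    def forward(self, x):  # x: (batch, channels, length)
        x_ft = torch.fft.rfft(x)
        out_ft = torch.zeros_like(x_ft)
        m = min(self.n_modes, x_ft.shape[-1])
        out_ft[..., :m] = x_ft[..., :m] * self.weight[..., :m]
        return torch.fft.irfft(out_ft, n=x.shape[-1])
```

Pairing `expand_modes` with a schedule that also doubles the training-data resolution at each stage gives the general shape of the incremental recipe.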

URL: https://openreview.net/forum?id=xI6cPQObp0

---

Title: A Survey of Reinforcement Learning from Human Feedback

Abstract: Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of large language models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in directing the model's capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between RL agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.
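
As one concrete building block common to most RLHF pipelines (a generic sketch, not specific to this survey), reward models are typically fit to pairwise human preferences with a Bradley-Terry-style loss:

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry preference loss for fitting a reward model from
    pairwise human feedback: maximize the log-probability that the
    preferred response scores higher. Inputs are (B,) reward outputs."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```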

URL: https://openreview.net/forum?id=f7OkIurx4b

---

Title: FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

Abstract: Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps. Nevertheless, current controllable T2I methods commonly face challenges related to efficiency and faithfulness, especially when conditioning on multiple inputs from either the same or diverse modalities. In this paper, we propose a novel Flexible and Efficient method, FlexEControl, for controllable T2I generation. At the core of FlexEControl is a unique weight decomposition strategy, which allows for streamlined integration of various input types. This approach not only enhances the faithfulness of the generated image to the control, but also significantly reduces the computational overhead typically associated with multimodal conditioning. Our approach achieves a reduction of 41% in trainable parameters and 30% in memory usage compared with Uni-ControlNet. Moreover, it doubles data efficiency and can flexibly generate images under the guidance of multiple input conditions of various modalities.
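
As a rough guess at the flavor of such a decomposition (our illustration only; the paper's actual scheme may differ), one could share a dense weight across modalities and add a cheap low-rank term per modality:

```python
import torch
import torch.nn as nn

class SharedLowRankLinear(nn.Module):
    """Illustrative weight decomposition: one dense weight shared across
    all conditioning modalities plus a small low-rank correction per
    modality, keeping the per-modality parameter count low."""
    def __init__(self, d_in, d_out, n_modalities, rank=4):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)
        self.down = nn.Parameter(torch.randn(n_modalities, d_in, rank) * 0.02)
        self.up = nn.Parameter(torch.zeros(n_modalities, rank, d_out))

    def forward(self, x, modality):  # x: (B, d_in), modality: int index
        delta = x @ self.down[modality] @ self.up[modality]
        return self.shared(x) + delta
```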

URL: https://openreview.net/forum?id=y8DSGN5nuN

---
