Weekly TMLR digest for Oct 29, 2023


TMLR

Oct 28, 2023, 8:00:11 PM
to tmlr-annou...@googlegroups.com

Accepted papers
===============


Title: Homomorphic Self-Supervised Learning

Authors: T. Anderson Keller, Xavier Suau, Luca Zappella

Abstract: Many state-of-the-art self-supervised learning approaches fundamentally rely on transformations applied to the input in order to selectively extract task-relevant information. Recently, the field of equivariant deep learning has developed to introduce structure into the feature space of deep neural networks by designing them as homomorphisms with respect to input transformations. In this work, we observe that many existing self-supervised learning algorithms can be both unified and generalized when seen through the lens of equivariant representations. Specifically, we introduce a general framework we call Homomorphic Self-Supervised Learning, and theoretically show how it may subsume the use of input augmentations given an augmentation-homomorphic feature extractor. We validate this theory experimentally for simple augmentations, demonstrate the necessity of representational structure for feature-space SSL, and further empirically explore how the parameters of this framework relate to those of traditional augmentation-based self-supervised learning. We conclude with a discussion of the potential benefits afforded by this new perspective on self-supervised learning.

URL: https://openreview.net/forum?id=tEKqQgbwbf

---

Title: Multimodal Language Learning for Object Retrieval in Low Data Regimes in the Face of Missing Modalities

Authors: Kasra Darvish, Edward Raff, Francis Ferraro, Cynthia Matuszek

Abstract: Our study is motivated by robotics, where, when dealing with robots or other physical systems, we often need to balance the competing concerns of relying on complex, multimodal data from a variety of sensors against a general lack of large, representative datasets. Despite the complexity of modern robotic platforms and the need for multimodal interaction, there has been little research on integrating more than two modalities in a low data regime with the real-world constraint that sensors fail due to obstructions or adverse conditions. In this work, we consider a case in which natural language is used as a retrieval query against objects, represented across multiple modalities, in a physical environment. We introduce extended multimodal alignment (EMMA), a method that learns to select the appropriate object while jointly refining modality-specific embeddings through a geometric (distance-based) loss. In contrast to prior work, our approach is able to incorporate an arbitrary number of views (modalities) of a particular piece of data. We demonstrate the efficacy of our model on a grounded language object retrieval scenario. We show that our model outperforms state-of-the-art baselines when little training data is available. Our code is available at https://github.com/kasraprime/EMMA.

URL: https://openreview.net/forum?id=cXa6Xdm0v7
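
As a rough illustration of the distance-based multimodal alignment idea described above (not the paper's implementation), here is a minimal PyTorch sketch; the pairing scheme and the margin are illustrative assumptions:

```python
# Hedged sketch: a generic distance-based multimodal alignment loss over an arbitrary
# number of modality views. Pairing scheme and margin are assumptions, not EMMA's exact loss.
import torch
import torch.nn.functional as F

def alignment_loss(language, views, negatives, margin=1.0):
    """language: (B, D) query embeddings; views: list of (B, D) modality embeddings of the
    matching object; negatives: list of (B, D) embeddings of mismatched objects."""
    loss = 0.0
    for v in views:                       # pull every modality of the positive object close
        loss = loss + F.pairwise_distance(language, v).mean()
    for n in negatives:                   # push mismatched objects at least `margin` away
        loss = loss + F.relu(margin - F.pairwise_distance(language, n)).mean()
    return loss / (len(views) + len(negatives))

# toy usage
B, D = 4, 16
lang = torch.randn(B, D)
pos_views = [torch.randn(B, D) for _ in range(3)]   # e.g. RGB, depth, audio embeddings
neg_views = [torch.randn(B, D) for _ in range(2)]
print(alignment_loss(lang, pos_views, neg_views))
```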

---

Title: Worst-case Feature Risk Minimization for Data-Efficient Learning

Authors: Jingshi Lei, Da Li, Chengming Xu, Liming Fang, Timothy Hospedales, Yanwei Fu

Abstract: Deep learning models typically require massive amounts of annotated data to train a strong model for a task of interest. However, data annotation is time-consuming and costly. How to use labeled data from a related but distinct domain, or just a few samples, to train a satisfactory model are thus important questions. To achieve this goal, models should resist overfitting to the specifics of the training data in order to generalize well to new data. This paper proposes a novel Worst-case Feature Risk Minimization (WFRM) method that helps improve model generalization. Specifically, we tackle a minimax optimization problem in feature space at each training iteration: given the input features, we first find the feature perturbation that maximizes the current training loss and then minimize the training loss on the resulting worst-case features. By incorporating WFRM during training, we significantly improve model generalization under distributional shift (Domain Generalization, DG) and in the low-data regime (Few-shot Learning, FSL). We theoretically analyze WFRM and identify the key reason why it works better than ERM: it induces an empirical risk-based semi-adaptive $L_{2}$ regularization of the classifier weights, enabling a better risk-complexity trade-off. We evaluate WFRM on two data-efficient learning tasks, including three standard DG benchmarks (PACS, VLCS, and OfficeHome) and the most challenging FSL benchmark, Meta-Dataset. Despite its simplicity, our method consistently improves various DG and FSL methods, leading to new state-of-the-art performance in all settings. Code and models will be released at https://github.com/jslei/WFRM.

URL: https://openreview.net/forum?id=czev0exHXT
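
A minimal PyTorch sketch of the minimax idea described above, using a single gradient-ascent step on the features as the inner maximization; the step rule and perturbation radius are illustrative assumptions, not the authors' exact procedure:

```python
# Hedged sketch of a worst-case feature perturbation step: one inner ascent step on the
# features, then an outer descent step on the classifier trained on the perturbed features.
import torch
import torch.nn.functional as F

def wfrm_step(features, labels, classifier, optimizer, eps=0.1):
    # inner max: find a feature perturbation that increases the loss
    delta = torch.zeros_like(features, requires_grad=True)
    loss_inner = F.cross_entropy(classifier(features + delta), labels)
    grad, = torch.autograd.grad(loss_inner, delta)
    delta = eps * grad / (grad.norm() + 1e-12)        # one ascent step, scaled to radius eps
    # outer min: train the classifier on the worst-case features
    optimizer.zero_grad()
    loss_outer = F.cross_entropy(classifier(features + delta.detach()), labels)
    loss_outer.backward()
    optimizer.step()
    return loss_outer.item()

# toy usage
clf = torch.nn.Linear(32, 5)
opt = torch.optim.SGD(clf.parameters(), lr=0.1)
x, y = torch.randn(8, 32), torch.randint(0, 5, (8,))
print(wfrm_step(x, y, clf, opt))
```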

---

Title: Conformal prediction under ambiguous ground truth

Authors: David Stutz, Abhijit Guha Roy, Tatiana Matejovicova, Patricia Strachan, Ali Taylan Cemgil, Arnaud Doucet

Abstract: Conformal Prediction (CP) allows one to perform rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X))\geq 1-\alpha$ for a user-chosen $\alpha \in [0,1]$ by relying on calibration data $(X_1,Y_1),...,(X_n,Y_n)$ from $\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that $\mathbb{P}^{Y|X}$ is the ``true'' posterior label distribution. However, in many real-world scenarios, the labels $Y_1,...,Y_n$ are obtained by aggregating expert opinions using a voting procedure, resulting in a one-hot distribution $\mathbb{P}_{\textup{vote}}^{Y|X}$. This is the case for most datasets, even well-known ones like ImageNet. For such ``voted'' labels, CP guarantees are thus w.r.t. $\mathbb{P}_{\textup{vote}}=\mathbb{P}^X \otimes \mathbb{P}_{\textup{vote}}^{Y|X}$ rather than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth labels, the distinction between $\mathbb{P}_{\textup{vote}}$ and $\mathbb{P}$ is irrelevant. However, when experts do not agree because of ambiguous labels, approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution $\mathbb{P}_{\textup{vote}}^{Y|X}$ ignores this uncertainty. In this paper, we propose to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a non-degenerate distribution $\mathbb{P}_{\textup{agg}}^{Y|X}$. We then develop \emph{Monte Carlo CP} procedures which provide guarantees w.r.t. $\mathbb{P}_{\textup{agg}}=\mathbb{P}^X \otimes \mathbb{P}_{\textup{agg}}^{Y|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{\textup{agg}}^{Y|X}$ for each calibration example $X_1,...,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{\textup{vote}}$ under-covers expert annotations: calibrated for $72\%$ coverage, it falls short by $10\%$ on average; our Monte Carlo CP closes this gap both empirically and theoretically. We also extend Monte Carlo CP to multi-label classification and CP with calibration examples enriched through data augmentation.

URL: https://openreview.net/forum?id=CAd6V2qXxc
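
A rough NumPy sketch of the Monte Carlo idea described above (sampling pseudo-labels from an aggregated, non one-hot label distribution before computing the conformal quantile); the conformity score and the procedure details are simplified assumptions, not the paper's exact method or guarantees:

```python
# Hedged sketch of split conformal prediction with pseudo-labels sampled from an aggregated
# expert label distribution; stand-in model probabilities are used for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_classes, alpha, m = 500, 5, 0.1, 10            # m pseudo-labels per calibration point

probs_cal = rng.dirichlet(np.ones(n_classes), n_cal)    # model probabilities (stand-in)
p_agg = rng.dirichlet(np.ones(n_classes), n_cal)        # aggregated expert label distributions

# conformity score: 1 - model probability of the (pseudo-)label
scores = []
for i in range(n_cal):
    pseudo = rng.choice(n_classes, size=m, p=p_agg[i])  # sample synthetic pseudo-labels
    scores.extend(1.0 - probs_cal[i, pseudo])
scores = np.asarray(scores)

# conformal quantile over the enlarged calibration set
q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

# prediction set for a new example: keep classes whose score stays below the threshold
probs_test = rng.dirichlet(np.ones(n_classes))
prediction_set = np.where(1.0 - probs_test <= q)[0]
print(prediction_set)
```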

---

Title: Towards Stability of Autoregressive Neural Operators

Authors: Michael McCabe, Peter Harrington, Shashank Subramanian, Jed Brown

Abstract: Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging as they incur significant computational and memory expense---these systems are often forced to rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effective in managing costs, it can lead to uncontrolled error growth over time and eventual instability. We analyze the sources of this autoregressive error growth using prototypical neural operator models for physical systems and explore ways to mitigate it. We introduce architectural and application-specific improvements that allow for careful control of instability-inducing operations within these models without inflating the compute/memory expense. We present results on several scientific systems that include Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. We demonstrate that applying our design principles to neural operators leads to significantly lower errors for long-term forecasts as well as longer time horizons without qualitative signs of divergence compared to the original models for these systems. We open-source our code for reproducibility.

URL: https://openreview.net/forum?id=RFfUUtKYOG

---

Title: $f$-MICL: Understanding and Generalizing InfoNCE-based Contrastive Learning

Authors: Yiwei Lu, Guojun Zhang, Sun Sun, Hongyu Guo, Yaoliang Yu

Abstract: In self-supervised contrastive learning, a widely adopted objective function is InfoNCE, which uses a heuristic cosine similarity for representation comparison and is closely related to maximizing the Kullback-Leibler (KL)-based mutual information. In this paper, we aim to answer two intriguing questions: (1) Can we go beyond the KL-based objective? (2) Besides the popular cosine similarity, can we design a better similarity function? We provide answers to both questions by generalizing the KL-based mutual information to the $f$-Mutual Information in Contrastive Learning ($f$-MICL) using the $f$-divergences. To answer the first question, we provide a wide range of $f$-MICL objectives which share the nice properties of InfoNCE (e.g., alignment and uniformity), and meanwhile result in similar or even superior performance. For the second question, assuming that the joint feature distribution is proportional to the Gaussian kernel, we derive an $f$-Gaussian similarity with better interpretability and empirical performance. Finally, we identify close relationships between the $f$-MICL objective and several popular InfoNCE-based objectives. Using benchmark tasks from both vision and natural language, we empirically evaluate $f$-MICL with different $f$-divergences on various architectures (SimCLR, MoCo, and MoCo v3) and datasets. We observe that $f$-MICL generally outperforms the benchmarks and the best-performing $f$-divergence is task- and dataset-dependent.

URL: https://openreview.net/forum?id=ZD03VUZmRx
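
For reference, a minimal PyTorch sketch of the standard InfoNCE objective with cosine similarity, i.e., the KL-based special case that $f$-MICL generalizes; the $f$-divergence variants and the $f$-Gaussian similarity from the paper are not reproduced here:

```python
# Standard InfoNCE with cosine similarity: positives lie on the diagonal of the
# batch similarity matrix, all other entries act as negatives.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (B, D) embeddings of two augmented views of the same batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature           # cosine similarities between all pairs
    targets = torch.arange(z1.size(0))           # positives are the matching indices
    return F.cross_entropy(logits, targets)

print(info_nce(torch.randn(8, 128), torch.randn(8, 128)))
```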

---

Title: Non-Stationary Contextual Pricing with Safety Constraints

Authors: Dheeraj Baby, Jianyu Xu, Yu-Xiang Wang

Abstract: In a contextual pricing problem, a seller aims to maximize revenue over a sequence of sales sessions (described by feature vectors) using binary-censored feedback of "sold" or "not sold". Existing methods often overlook two practical challenges: (1) the best pricing strategy could change over time; (2) the prices and pricing policies must conform to hard constraints due to safety, ethical or legal restrictions. We address both challenges by solving a more general problem of "universal dynamic regret" minimization in proper online learning with exp-concave losses --- an open problem posed by Baby & Wang (2021) that we partially resolve in this paper, with attention restricted to loss functions coming from a generalized linear model. Here "dynamic regret" measures the performance relative to a non-stationary sequence of policies, and "proper" means that the learner must choose feasible strategies within a pre-defined convex set, which we use to model the safety constraints. In this work, we consider a linear noisy valuation model for the customers. In the case of a known strictly log-concave market noise, our algorithm achieves $\tilde{O}(d^3T^{1/3}C_T^{2/3} \vee d^3)$ dynamic regret in comparison with the optimal policy series, where $T$, $d$ and $C_T$ stand for the time horizon, the feature dimension and the total variation (characterizing non-stationarity) respectively. This regret is near-optimal with respect to $T$ (within $O(\log T)$ gaps) and $C_T$, and our algorithm is adaptable to unknown $C_T$ and remains feasible throughout. However, the dependence on $d$ is suboptimal and the minimax rate is still open.

URL: https://openreview.net/forum?id=fWIQ9Oaao0

---

Title: VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

Authors: Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik J Shah, Yann LeCun, Rama Chellappa

Abstract: Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text-box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accurate bounding box annotations are expensive to collect and use for supervision at scale. In this work, we propose VoLTA (Vision Language Transformer with weakly-supervised local-feature Alignment), a new VLP paradigm that only utilizes image-caption data but achieves fine-grained region-level image understanding, eliminating the need for expensive box annotations. VoLTA adopts graph optimal transport-based weakly-supervised alignment on local image patches and text tokens to germinate an explicit, self-normalized, and interpretable low-level matching criterion. In addition, VoLTA pushes multi-modal fusion deep into the uni-modal backbones during pre-training and removes fusion-specific transformer layers, further reducing memory requirements. Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream performance, often outperforming methods using significantly more caption and box annotations.

URL: https://openreview.net/forum?id=Kt2VJrCKo4

---

Title: Benefits of Max Pooling in Neural Networks: Theoretical and Experimental Evidence

Authors: Kyle Matoba, Nikolaos Dimitriadis, François Fleuret

Abstract: When deep neural networks became state-of-the-art image classifiers, numerous max pooling operations were an important component of the architecture. However, modern computer vision networks typically have few, if any, max pooling operations. To understand whether this trend is justified, we develop a mathematical framework analyzing ReLU-based approximations of max pooling, and prove a sense in which max pooling cannot be replicated. We formulate and analyze a novel class of optimal approximations, and find that the residual can be made exponentially small in the kernel size, but only with an exponentially wide approximation.

This work gives a theoretical basis for understanding the reduced use of max pooling in newer architectures. It also enables us to establish an empirical observation about natural images: since max pooling does not seem necessary, the inputs on which max pooling is distinct – those with a large difference between the max and other values – are not prevalent.

URL: https://openreview.net/forum?id=YgeXqrH7gA
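
A small sketch of the exact two-input identity max(a, b) = b + relu(a - b), which underlies ReLU-based constructions of max pooling; the paper's analysis concerns how well, and at what width, such constructions can approximate max pooling over larger kernels:

```python
# The two-element max is exactly expressible with a single ReLU; larger kernels require
# composing such gadgets, which is where the width/approximation trade-off studied above arises.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max2_via_relu(a, b):
    return b + relu(a - b)

a, b = np.random.randn(5), np.random.randn(5)
assert np.allclose(max2_via_relu(a, b), np.maximum(a, b))
print(max2_via_relu(a, b))
```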

---

Title: Local Advantage Networks for Multi-Agent Reinforcement Learning in Dec-POMDPs

Authors: Raphaël Avalos, Mathieu Reymond, Ann Nowe, Diederik M Roijers

Abstract: Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures. Building on the structure of independent Q-learners, our LAN algorithm takes a radically different approach, leveraging a dueling architecture to learn, for each agent, a decentralized best-response policy via individual advantage functions. The learning is stabilized by a centralized critic whose primary objective is to reduce the moving-target problem of the individual advantages. The critic, whose network size is independent of the number of agents, is cast aside after learning. Evaluation on the StarCraft II multi-agent challenge benchmark shows that LAN reaches state-of-the-art performance and is highly scalable with respect to the number of agents, opening up a promising alternative direction for MARL research.

URL: https://openreview.net/forum?id=adpKzWQunW

---

Title: Beyond Distribution Shift: Spurious Features Through the Lens of Training Dynamics

Authors: Nihal Murali, Aahlad Manas Puli, Ke Yu, Rajesh Ranganath, Kayhan Batmanghelich

Abstract: Deep Neural Networks (DNNs) are prone to learning spurious features that correlate with the label during training but are irrelevant to the learning problem. This hurts model generalization and poses problems when deploying them in safety-critical applications. This paper aims to better understand the effects of spurious features through the lens of the learning dynamics of the internal neurons during the training process. We make the following observations: (1) While previous works highlight the harmful effects of spurious features on the generalization ability of DNNs, we emphasize that not all spurious features are harmful. Spurious features can be "benign" or "harmful" depending on whether they are "harder" or "easier" to learn than the core features for a given model. This definition is model and dataset dependent. (2) We build upon this premise and use instance difficulty methods (like Prediction Depth) to quantify "easiness" for a given model and to identify this behavior during the training phase. (3) We empirically show that the harmful spurious features can be detected by observing the learning dynamics of the DNN's early layers. In other words, easy features learned by the initial layers of a DNN early during the training can (potentially) hurt model generalization. We verify our claims on medical and vision datasets, both simulated and real, and justify the empirical success of our hypothesis by showing the theoretical connections between Prediction Depth and information-theoretic concepts like $\mathcal{V}$-usable information. Lastly, our experiments show that monitoring only accuracy during training (as is common in machine learning pipelines) is insufficient to detect spurious features. We, therefore, highlight the need for monitoring early training dynamics using suitable instance difficulty metrics.

URL: https://openreview.net/forum?id=Tkvmt9nDmB

---

Title: Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning

Authors: Filippos Christianos, Georgios Papoudakis, Stefano V Albrecht

Abstract: This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal Nash equilibrium among several existing equilibria.
It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), which is an actor-critic algorithm that utilises a simple property of no-conflict games (a superset of cooperative games): the Pareto-optimal equilibrium in a no-conflict game maximises the returns of all agents and, therefore, is the preferred outcome for all agents.
We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to seven state-of-the-art MARL algorithms and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose PACDCG, a graph neural network extension of Pareto-AC, which is shown to efficiently scale in games with a large number of agents.

URL: https://openreview.net/forum?id=3AzqYa18ah

---

Title: Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

Authors: Botao Hao, Rahul Jain, Dengwang Tang, Zheng Wen

Abstract: In this paper, we address the following problem: given an offline demonstration dataset from an imperfect expert, what is the best way to leverage it to bootstrap online learning performance in MDPs? We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses the offline dataset and information about the expert's behavioral policy used to generate it. Its cumulative Bayesian regret goes down to zero exponentially fast in $N$, the offline dataset size, if the expert is competent enough. Since this algorithm is computationally impractical, we then propose the iRLSVI algorithm, which can be seen as a combination of the RLSVI algorithm for online RL and imitation learning. Our empirical results show that the proposed iRLSVI algorithm achieves a significant reduction in regret compared to two baselines: no offline data, and an offline dataset used without suitably modeling the generative policy.
Our algorithm can be seen as bridging online RL and imitation learning.

URL: https://openreview.net/forum?id=lanGfX0M6C

---

Title: Gradient Masked Averaging for Federated Learning

Authors: Irene Tenison, Sai Aravind Sreeramadas, Vaikkunth Mugunthan, Edouard Oyallon, Irina Rish, Eugene Belilovsky

Abstract: Federated learning (FL) is an emerging paradigm that permits a large number of clients with heterogeneous data to coordinate learning of a unified global model without the need to share data amongst each other. A major challenge in federated learning is the heterogeneity of data across clients, which can degrade the performance of standard FL algorithms. Standard FL algorithms involve averaging of model parameters or gradient updates to approximate the global model at the server. However, we argue that in heterogeneous settings, averaging can result in information loss and lead to poor generalization due to the bias induced by dominant client gradients. We hypothesize that to generalize better across non-i.i.d datasets, the algorithms should focus on learning the invariant mechanism that is constant while ignoring spurious mechanisms that differ across clients. Inspired by recent work on Out-of-Distribution generalization, we propose a gradient masked averaging approach for FL as an alternative to the standard averaging of client updates. This aggregation technique for client updates can be adapted as a drop-in replacement in most existing federated algorithms. We perform extensive experiments on multiple FL algorithms with in-distribution, real-world, feature-skewed out-of-distribution, and quantity imbalanced datasets and show that it provides consistent improvements, particularly in the case of heterogeneous clients.

URL: https://openreview.net/forum?id=REAyrhRYAo
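
One plausible reading of the masking idea, sketched in NumPy: parameters on which most clients agree in update sign keep the averaged update, and the rest are zeroed. The agreement threshold and the hard mask are assumptions, not necessarily the paper's exact aggregation rule:

```python
# Hedged sketch of a sign-agreement mask over client updates before server-side averaging.
import numpy as np

def masked_average(client_updates, agreement_threshold=0.8):
    """client_updates: (n_clients, n_params) array of per-client model updates."""
    mean_update = client_updates.mean(axis=0)
    sign_agreement = np.abs(np.sign(client_updates).mean(axis=0))   # 1.0 = all clients agree
    mask = (sign_agreement >= agreement_threshold).astype(float)
    return mask * mean_update

updates = np.random.randn(10, 6)
print(masked_average(updates))
```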

---

Title: Training Vision-Language Transformers from Captions

Authors: Liangke Gui, Yingshan Chang, Qiuyuan Huang, Subhojit Som, Alexander G Hauptmann, Jianfeng Gao, Yonatan Bisk

Abstract: Vision-Language Transformers can be learned without low-level human labels (e.g. class labels, bounding boxes, etc). Existing work, whether explicitly utilizing bounding boxes (Chen et al., 2020b; Tan & Bansal, 2019; Lu et al., 2019) or patches (Kim et al., 2021), assumes that the visual backbone must first be trained on ImageNet (Russakovsky et al., 2015) class prediction before being integrated into a multimodal linguistic pipeline. We show that this is not necessary and introduce a new model Vision-Language from Captions (VLC) built on top of Masked Auto-Encoders (He et al., 2022) that does not require this supervision. We seek to provide general advice on multimodal pretraining by examining the roles of (a) unimodal initialization, (b) unimodal architectural components and (c) data annotation in the pretraining corpus. Our extensive and carefully controlled studies suggest that none of the above factors is absolutely important in achieving versatile vision-language representations. We conclude our analysis with suggestions on the choices of initialization, architectural components, and annotation formats targeting a better balance between data efficiency and representation quality.

URL: https://openreview.net/forum?id=xLnbSpozWS

---


New submissions
===============


Title: Incorporating Prior Knowledge into Neural Networks through an Implicit Composite Kernel

Abstract: It is challenging to guide neural network (NN) learning with prior knowledge. In contrast, many known properties, such as spatial smoothness or seasonality, are straightforward to model by choosing an appropriate kernel in a Gaussian process (GP). Many deep learning applications could be enhanced by modeling such known properties. For example, convolutional neural networks (CNNs) are frequently used in remote sensing, which is subject to strong seasonal effects. We propose to blend the strengths of NNs and the clear modeling capabilities of GPs by using a composite kernel that combines a kernel implicitly defined by a neural network with a second kernel function chosen to model known properties (e.g., seasonality). We implement this idea by combining a deep network and an efficient mapping function based on either Nyström approximation or random Fourier features, which we call Implicit Composite Kernel (ICK). We then adopt a sample-then-optimize approach to approximate the full GP posterior distribution. We demonstrate that ICK has superior performance and flexibility on both synthetic and real-world datasets including a remote sensing dataset. The ICK framework can be used to incorporate prior information into neural networks in many applications.

URL: https://openreview.net/forum?id=HhjSalvWVe
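
A minimal NumPy sketch of a composite kernel that adds a network-implied kernel to a random-Fourier-feature approximation of an RBF kernel; the additive combination and the toy feature map are illustrative assumptions, not the exact ICK construction:

```python
# Hedged sketch: composite kernel = (NN feature map inner product) + (RFF approximation of RBF).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat, d_rff, lengthscale = 3, 16, 64, 1.0

# a small random "network" feature map standing in for a learned one
W1 = rng.normal(size=(d_in, d_feat))
def nn_features(x):
    return np.tanh(x @ W1)

# random Fourier features approximating an RBF kernel (e.g. to encode smoothness)
W_rff = rng.normal(scale=1.0 / lengthscale, size=(d_in, d_rff))
b_rff = rng.uniform(0, 2 * np.pi, size=d_rff)
def rff_features(x):
    return np.sqrt(2.0 / d_rff) * np.cos(x @ W_rff + b_rff)

def composite_kernel(x1, x2):
    return nn_features(x1) @ nn_features(x2).T + rff_features(x1) @ rff_features(x2).T

X = rng.normal(size=(5, d_in))
print(composite_kernel(X, X).shape)   # (5, 5) Gram matrix
```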

---

Title: Functional Linear Regression of Cumulative Distribution Functions

Abstract: The estimation of cumulative distribution functions (CDF) is an important learning task with a great variety of downstream applications, such as risk assessments in predictions and decision making. In this paper, we study functional regression of contextual CDFs where each data point is sampled from a linear combination of context dependent CDF basis functions. We propose functional ridge-regression-based estimation methods that estimate CDFs accurately everywhere. In particular, given $n$ samples with $d$ basis functions, we show estimation error upper bounds of $\widetilde O(\sqrt{d/n})$ for fixed design, random design, and adversarial context cases. We also derive matching information theoretic lower bounds, establishing minimax optimality for CDF functional regression.
Furthermore, we remove the burn-in time in the random design setting using an alternative penalized estimator. Then, we consider agnostic settings where there is a mismatch in the data generation process. We characterize the error of the proposed estimators in terms of the mismatched error, and show that the estimators are well-behaved under model mismatch.
Moreover, to complete our study, we formalize infinite dimensional models where the parameter space is an infinite dimensional Hilbert space, and establish a self-normalized estimation error upper bound for this setting. Notably, the upper bound reduces to the $\widetilde O(\sqrt{d/n})$ bound when the parameter space is constrained to be $d$-dimensional.
Our comprehensive numerical experiments validate the efficacy of our estimation methods in both synthetic and practical settings.

URL: https://openreview.net/forum?id=ZOqJCP4eMk

---

Title: An optimal control perspective on diffusion-based generative modeling

Abstract: We establish a connection between stochastic optimal control and generative models based on stochastic differential equations (SDEs), such as recently developed diffusion probabilistic models. In particular, we derive a Hamilton--Jacobi--Bellman equation that governs the evolution of the log-densities of the underlying SDE marginals. This perspective allows us to transfer methods from optimal control theory to generative modeling. First, we show that the evidence lower bound is a direct consequence of the well-known verification theorem from control theory. Further, we can formulate diffusion-based generative modeling as a minimization of the Kullback--Leibler divergence between suitable measures in path space. Finally, we develop a novel diffusion-based method for sampling from unnormalized densities -- a problem frequently occurring in statistics and computational sciences. We demonstrate that our time-reversed diffusion sampler (DIS) can outperform other diffusion-based sampling approaches on multiple numerical examples.

URL: https://openreview.net/forum?id=oYIjw37pTP

---

Title: Automated Design of Metaheuristic Algorithms: A Survey

Abstract: Metaheuristics have gained great success in academia and practice because their search logic can be applied to any problem with available solution representation, solution quality evaluation, and certain notions of locality. Manually designing metaheuristic algorithms for solving a target problem is criticized for being laborious, error-prone, and requiring intensive specialized knowledge. This gives rise to increasing interest in automated design of metaheuristic algorithms. With computing power to fully explore potential design choices, the automated design could reach and even surpass human-level design and could make high-performance algorithms accessible to a much wider range of researchers and practitioners. This paper presents a broad picture of automated design of metaheuristic algorithms, by conducting a survey on the common grounds and representative techniques in terms of design space, design strategies, performance evaluation strategies, and target problems in this field.

URL: https://openreview.net/forum?id=qhtHsvF5zj

---

Title: On the Adversarial Robustness of Camera-based 3D Object Detection

Abstract: In recent years, camera-based 3D object detection has gained widespread attention for its ability to achieve high performance with low computational cost. However, the robustness of these methods to adversarial attacks has not been thoroughly examined, especially when considering their deployment in safety-critical domains like autonomous driving. In this study, we conduct the first comprehensive investigation of the robustness of leading camera-based 3D object detection approaches under various adversarial conditions. We systematically analyze the resilience of these models under two attack settings (white-box and black-box), focusing on two primary objectives (classification and localization). Additionally, we delve into two types of adversarial attack techniques: pixel-based and patch-based. Our experiments yield four interesting findings: (a) bird's-eye-view-based representations exhibit stronger robustness against localization attacks; (b) depth-estimation-free approaches have the potential to show stronger robustness; (c) accurate depth estimation effectively improves robustness for depth-estimation-based methods; (d) incorporating multi-frame benign inputs can effectively mitigate adversarial attacks. We hope our findings can steer the development of future camera-based object detection models with enhanced adversarial robustness.

URL: https://openreview.net/forum?id=6SofFlwhEv

---

Title: Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

Abstract: Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses. However, despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs such as spreading misinformation, generating fake news, plagiarism in academia, and contaminating the web. To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text. The basic idea is that whenever we can tell if the given text is either written by a human or an AI, we can utilize this information to address the above-mentioned concerns. To that end, a plethora of detection frameworks have been proposed, highlighting the possibilities of AI-generated text detection. But in parallel to the development of detection frameworks, researchers have also concentrated on designing strategies to elude detection, i.e., focusing on the impossibilities of AI-generated text detection. This is a crucial step in order to make sure the detection frameworks are robust enough and it is not too easy to fool a detector. Despite the huge interest and the flurry of research in this domain, the community currently lacks a comprehensive analysis of recent developments. In this survey, we aim to provide a concise categorization and overview of current work encompassing both the prospects and the limitations of AI-generated text detection. To enrich the collective knowledge, we engage in an exhaustive discussion on critical and challenging open questions related to ongoing research on AI-generated text detection.

URL: https://openreview.net/forum?id=AXtFeYjboj

---

Title: Correlation Clustering with Active Learning of Pairwise Similarities

Abstract: Correlation clustering is a well-known unsupervised learning setting that deals with positive and negative pairwise similarities. In this paper, we study the case where the pairwise similarities are not given in advance and must be queried in a cost-efficient way. Thereby, we develop a generic active learning framework for this task that benefits from several advantages, e.g., flexibility in the type of feedback that a user/annotator can provide, adaptation to any correlation clustering algorithm and query strategy, and robustness to noise. In addition, we propose and analyze a number of novel query strategies suited to this setting. We demonstrate the effectiveness of our framework and the proposed query strategies via several experimental studies.

URL: https://openreview.net/forum?id=Ryf1TVCjBz

---

Title: Variance-aware decision making with linear function approximation under heavy-tailed rewards

Abstract: This paper studies how to achieve variance-aware regrets for online decision-making in the presence of heavy-tailed rewards with only finite variances. For linear stochastic bandits, we address the issue of heavy-tailed rewards by modifying the adaptive Huber regression and proposing AdaOFUL. AdaOFUL achieves a state-of-the-art regret bound of $\widetilde{\mathcal{O}}\big(d\big(\sum_{t=1}^T \nu_{t}^2\big)^{1/2}+d\big)$ as if the rewards were uniformly bounded, where $\nu_{t}^2$ is the conditional variance of the reward at round $t$ and $d$ is the feature dimension. Building upon AdaOFUL, we propose VARA for linear MDPs, which achieves a tighter variance-aware regret bound of $\widetilde{\mathcal{O}}(d\sqrt{H\mathcal{G}^*K})$. Here, $H$ is the length of episodes, $K$ is the number of episodes, and $\mathcal{G}^*$ is a smaller instance-dependent quantity that can be bounded by other instance-dependent quantities when additional structural conditions on the MDP are satisfied. Overall, our modified adaptive Huber regression algorithm may serve as a useful building block in the design of algorithms for online problems with heavy-tailed rewards.

URL: https://openreview.net/forum?id=8bnsoL2IyJ
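
For background, a minimal NumPy/SciPy sketch of plain Huber-loss linear regression, the robustness primitive behind adaptive Huber regression for heavy-tailed rewards; the adaptive threshold choice and the bandit/MDP machinery of AdaOFUL and VARA are not shown:

```python
# Hedged sketch: linear regression with a fixed Huber threshold, robust to heavy-tailed noise.
import numpy as np
from scipy.optimize import minimize

def huber(r, delta):
    return np.where(np.abs(r) <= delta, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

def huber_regression(X, y, delta=1.0):
    def objective(theta):
        return huber(y - X @ theta, delta).sum()
    return minimize(objective, np.zeros(X.shape[1]), method="L-BFGS-B").x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
theta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ theta_true + rng.standard_t(df=2, size=200)    # heavy-tailed noise
print(huber_regression(X, y))
```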

---

Title: Exploit CAM by itself: Complementary Learning System for Weakly Supervised Semantic Segmentation

Abstract: Weakly Supervised Semantic Segmentation (WSSS) with image-level labels has long suffered from the fragmentary object regions produced by Class Activation Maps (CAM), which are incapable of generating fine-grained masks for semantic segmentation. To guide CAM to find more non-discriminating object patterns, this paper turns to an interesting working mechanism in agent learning named the Complementary Learning System (CLS). CLS holds that the neocortex builds a sensation of general knowledge, while the hippocampus specially learns specific details, completing the learned patterns. Motivated by this simple but effective learning pattern, we propose a General-Specific Learning Mechanism (GSLM) to explicitly drive a coarse-grained CAM to a fine-grained pseudo mask. Specifically, GSLM develops a General Learning Module (GLM) and a Specific Learning Module (SLM). The GLM is trained with image-level supervision to extract coarse and general localization representations from CAM. Based on the general knowledge in the GLM, the SLM progressively exploits the specific spatial knowledge from the localization representations, expanding the CAM in an explicit way. To this end, we propose Seed Reactivation to help the SLM reactivate non-discriminating regions by setting a boundary for activation values, which successively identifies more regions of CAM. Without extra refinement processes, our method is able to achieve breakthrough improvements for CAM of over 20.0% mIoU on the PASCAL VOC 2012 and 10.0% mIoU on the MS COCO 2014 datasets, representing a new state-of-the-art among existing WSSS methods.

URL: https://openreview.net/forum?id=KutEe24Yai

---

Title: Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

Abstract: In some settings neural networks exhibit a phenomenon known as \textit{grokking}, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression and linear regression. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures provides evidence that grokking is not specific to SGD or weight norm regularisation. Instead, grokking may be possible in any setting where solution search is guided by complexity and error. Based on this insight and further trends we see in the training trajectories of a Bayesian neural network (BNN) and GP regression model, we make progress towards a more general theory of grokking. Specifically, we hypothesise that the phenomenon is governed by the accessibility of certain regions in the error and complexity landscapes.

URL: https://openreview.net/forum?id=ux9BrxPCl8

---

Title: Hierarchical Neural Simulation-Based Inference Over Event Ensembles

Abstract: When analyzing real-world data it is common to work with event ensembles, which comprise sets of observations that collectively constrain the parameters of an underlying model of interest. Such models often have a hierarchical structure, where ``local'' parameters impact individual events and ``global'' parameters influence the entire dataset. We introduce practical approaches for optimal frequentist and Bayesian dataset-wide probabilistic inference in cases where the likelihood is intractable, but simulations can be realized via forward modeling. We construct neural estimators for the likelihood(-ratio) or posterior and show that explicitly accounting for the model's hierarchical structure can lead to significantly tighter parameter constraints. We ground our discussion using case studies from the physical sciences, focusing on examples from particle physics and cosmology.

URL: https://openreview.net/forum?id=Jy2IgzjoFH

---

Title: Boosting Visual-Language Models by Exploiting Hard Pairs

Abstract: Large vision and language models, such as Contrastive Language-Image Pre-training (CLIP), have emerged as the industry standard for aligning images with their corresponding textual descriptions. However, to enhance zero-shot recognition, current methods often demand additional data collection and retraining with newly introduced loss functions, which hinders their application to an already well-trained CLIP model. In this work, we present Helip, a low-cost strategy tailored to enhance the performance of pre-trained CLIP models. This is achieved by further training them with challenging text-image pairs selected from their training dataset. Our proposed Hard Pair Mining (HPM) method treats a text-image pair as a single point in the joint Vision-Language space and identifies those in close proximity to a given pair as its hard pairs. By incorporating these challenging data, we refine pretrained CLIP models using both the traditional contrastive alignment loss and the newly introduced Hard Negative Margin Loss (HNML). This approach ensures the optimal harnessing of insights from challenging data. Notably, Helip is designed to be seamlessly integrated with existing models, providing an enhancement without the need for training a model from scratch or collecting additional data. On a comprehensive zero-shot and retrieval benchmark, Helip consistently boosts existing models to achieve leading performance. In particular, for ImageNet zero-shot accuracy, Helip boosts CC3M and CC12M pretrained SLIP by 3.05 and 4.47 respectively. In addition, the systematic evaluations of zero-shot and linear probing experiments across fine-grained classification datasets demonstrate a consistent performance improvement and validate the efficacy of Helip. Specifically, Helip boosts the zero-shot performance of pretrained CLIP and SLIP by an average of 8.4% and 18.6%, respectively, and improves their linear probe performance by an average of 9.5% and 3.0%.

URL: https://openreview.net/forum?id=WWwEvGkJL9
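
A rough NumPy sketch of mining hard text-image pairs by nearest-neighbour search in a joint embedding space; representing each pair by the concatenation of its normalized image and text embeddings is an assumption, not necessarily the paper's exact HPM definition:

```python
# Hedged sketch: each training pair becomes one point; its k most similar pairs are "hard pairs".
import numpy as np

def mine_hard_pairs(img_emb, txt_emb, k=5):
    """img_emb, txt_emb: (N, D) embeddings of the N training pairs; returns (N, k) indices."""
    img_emb = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt_emb = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    joint = np.concatenate([img_emb, txt_emb], axis=1)       # one point per text-image pair
    sim = joint @ joint.T
    np.fill_diagonal(sim, -np.inf)                            # exclude the pair itself
    return np.argsort(-sim, axis=1)[:, :k]                    # k most similar = hard pairs

rng = np.random.default_rng(0)
print(mine_hard_pairs(rng.normal(size=(100, 32)), rng.normal(size=(100, 32)))[0])
```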

---

Title: Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding

Abstract: This paper presents Predictive Pipelined Decoding (PPD), an approach that speeds up greedy decoding in Large Language Models (LLMs) while maintaining the exact same output as the original decoding. Unlike conventional strategies, PPD employs additional compute resources to parallelize the initiation of subsequent token decoding during the current token decoding. This method reduces decoding latency and reshapes the understanding of trade-offs in LLM decoding strategies. We have developed a theoretical framework that allows us to analyze the trade-off between computation and latency. Using this framework, we can analytically estimate the potential reduction in latency associated with our proposed method, achieved through the assessment of the match rate, represented as $p_\text{correct}$. The results demonstrate that the use of extra computational resources has the potential to accelerate LLM greedy decoding.

URL: https://openreview.net/forum?id=yUmJ483OB0

---

Title: Exploring Simple, High Quality Out-of-Distribution Detection with L2 Normalization

Abstract: We demonstrate that L2 normalization over feature space--an extremely simple method requiring no additional training strategies, hyperparameters, specialized loss functions or image augmentation--can produce competitive results for Out-of-Distribution (OoD) detection. It requires only a fraction of the training time (60 epochs with ResNet18, 100 epochs with ResNet50) of more sophisticated methods. We show theoretically and empirically that our simple method decouples feature norms from the Neural Collapse (NC) constraints imposed by CE loss minimization. This decoupling preserves more feature-level information than a standard CE loss training regime, and allows greater separability between ID norms and near-OoD or far-OoD norms. Our goal is to provide insight toward fundamental, model-based approaches to OoD detection, with less reliance on external factors such as hyperparameter tuning or specialized training regimes. We suggest that L2 normalization provides a collection of benefits large enough to warrant consideration as a standard architecture choice.

URL: https://openreview.net/forum?id=daX2UkLMS0
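
A minimal PyTorch sketch of one reading of the approach: train the classification head on L2-normalized features and use the raw feature norm as an out-of-distribution score; the toy architecture and the direction of the score (larger norm treated as more in-distribution) are illustrative assumptions:

```python
# Hedged sketch: feature normalization before the head, feature norm reused as an OoD score.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedClassifier(nn.Module):
    def __init__(self, d_in=32, d_feat=16, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(d_in, d_feat), nn.ReLU(), nn.Linear(d_feat, d_feat))
        self.head = nn.Linear(d_feat, n_classes)

    def forward(self, x):
        f = self.backbone(x)
        logits = self.head(F.normalize(f, dim=1))   # train on L2-normalized features
        return logits, f.norm(dim=1)                # raw feature norm doubles as an OoD score

model = NormalizedClassifier()
logits, ood_score = model(torch.randn(4, 32))
print(ood_score)   # under this reading, lower norms would be flagged as OoD
```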

---

Title: SDGym: Low-Code Reinforcement Learning Environments using System Dynamics Models

Abstract: Understanding the long-term impact of algorithmic interventions on society is vital to achieving responsible AI. Traditional evaluation strategies often fall short due to the complex, adaptive and dynamic nature of society. While reinforcement learning (RL) can be a powerful approach for optimizing decisions in dynamic settings, the difficulty of realistic environment design remains a barrier to building robust agents that perform well in practical settings. To address this issue we tap into the field of system dynamics (SD) as a complementary method that incorporates collaborative simulation model specification practices. We introduce SDGym, a low-code library built on the OpenAI Gym framework which enables the generation of custom RL environments based on SD simulation models. Through a feasibility study we validate that well specified, rich RL environments can be generated from preexisting SD models and a few lines of configuration code. We demonstrate the capabilities of the SDGym environment using an SD model of the electric vehicle adoption problem. We compare two SD simulators, PySD and BPTK-Py for parity, and train a D4PG agent using the Acme framework to showcase learning and environment interaction. Our preliminary findings underscore the dual potential of SD to improve RL environment design and for RL to improve dynamic policy discovery within SD models. By open-sourcing SDGym, the intent is to galvanize further research and promote adoption across the SD and RL communities, thereby catalyzing collaboration in this emerging interdisciplinary space.

URL: https://openreview.net/forum?id=r8HATGatwy

---

Title: Personalized Algorithmic Recourse with Preference Elicitation

Abstract: Algorithmic Recourse (AR) is the problem of computing a sequence of actions that -- once performed by a user -- overturns an undesirable machine decision. It is paramount that the sequence of actions does not require too much effort for users to implement. Yet, most approaches to AR assume that actions cost the same for all users, and thus may recommend unfairly expensive recourse plans to certain users. Prompted by this observation, we introduce PEAR, the first human-in-the-loop approach capable of providing personalized algorithmic recourse tailored to the needs of any end-user. PEAR builds on insights from Bayesian Preference Elicitation to iteratively refine an estimate of the costs of actions by asking choice set queries to the target user. The queries themselves are computed by maximizing the Expected Utility of Selection, a principled measure of information gain accounting for uncertainty on both the cost estimate and the user's responses. PEAR integrates elicitation into a Reinforcement Learning agent coupled with Monte Carlo Tree Search to quickly identify promising recourse plans. Our empirical evaluation on real-world datasets highlights how PEAR produces high-quality personalized recourse in only a handful of iterations.

URL: https://openreview.net/forum?id=8sg2I9zXgO

---

Title: Rotate the ReLU to Sparsify Deep Networks Implicitly

Abstract: Compact and energy-efficient models have become essential in this era when deep learning-based solutions are widely used for various real-life tasks. In this paper, we propose rotating the ReLU activation to give an additional degree of freedom in conjunction with the appropriate initialization of the rotation. This combination leads to implicit sparsification without the use of a regularizer. We show that this rotated ReLU (RReLU) activation improves the representation capability of the parameters/filters in the network and eliminates those parameters/filters that are not crucial for the task, giving rise to significant savings in memory and computation. While the state-of-the-art regularization-based Network-Slimming method achieves $28.65\%$ saving in memory and $38.47\%$ saving in computation with ResNet-$164$, RReLU achieves a saving of $46.2\%$ in memory and $47.3\%$ in computation without any loss in accuracy. We note that the slopes of the rotated ReLU activations act as coarse feature extractors and can eliminate unnecessary features before retraining. Our studies indicate that features always choose to pass through a smaller number of filters. We demonstrate the results with popular datasets such as MNIST, CIFAR-10, CIFAR-100, SVHN, and ImageNet with different architectures, including Vision Transformers. We also briefly study the impact of adversarial attacks on RReLU-based ResNets and observe that we get better adversarial accuracy for the architectures with RReLU than ReLU. We also demonstrate how this concept of rotation can be applied to the GELU activation function, commonly utilized in Transformer architectures. For the GELU-based multi-layer perceptron (MLP) part of the Transformer, we obtain $2.6\%$ improvement in accuracy with $6.32\%$ saving in both memory and computation.

URL: https://openreview.net/forum?id=Nzy0XmCPuZ

---

Title: DDLP: Unsupervised Object-centric Video Prediction with Deep Dynamic Latent Particles

Abstract: We propose a new object-centric video prediction algorithm based on the deep latent particle (DLP) representation of Daniel and Tamar (2022). In comparison to existing slot- or patch-based representations, DLPs model the scene using a set of keypoints with learned parameters for properties such as position and size, and are both efficient and interpretable. Our method, Deep Dynamic Latent Particles (DDLP), yields state-of-the-art object-centric video prediction results on several challenging datasets. The interpretable nature of DDLP allows us to perform ``what-if'' generation -- predicting the consequence of changing properties of objects in the initial frames -- and DLP's compact structure enables efficient diffusion-based unconditional video generation. Code and pre-trained models will be publicly available. Videos are available at https://sites.google.com/view/ddlp/.

URL: https://openreview.net/forum?id=Wqn8zirthg

---

Title: Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data

Abstract: Lossy compression has become an important technique to reduce data size in many domains. This type of compression is especially valuable for large-scale scientific data, whose size ranges up to several petabytes. Although Autoencoder-based models have been successfully leveraged to compress images and videos, such neural networks have not gained wide attention in the scientific data domain. Our work presents a neural network that not only significantly compresses large-scale scientific data but also maintains high reconstruction quality. The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set. Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality. 2D simulation data from the High-Resolution Community Earth System Model (CESM) Version 1.3 over 500 years are also compressed at a compression ratio of 200, while the reconstruction error remains negligible for scientific analysis.

URL: https://openreview.net/forum?id=PhRAqebTTD

---

Title: Balancing Privacy and Performance for Private Federated Learning Algorithms

Abstract: Federated learning (FL) is a distributed machine learning (ML) framework where multiple clients collaborate to train a model without exposing their private data. FL involves cycles of local computations and bi-directional communications between the clients and server. To bolster data security during this process, FL algorithms frequently employ a differential privacy (DP) mechanism that introduces noise into each client's model updates before sharing. However, while enhancing privacy, the DP mechanism often hampers convergence performance. In this paper, we posit that an optimal balance exists between the number of local steps and communication rounds, one that maximizes the convergence performance within a given privacy budget. Specifically, we derive the optimal number of local steps and communication rounds that enhance the convergence bounds of the DP version of the ScaffNew algorithm. Our findings reveal a direct correlation between the optimal number of local steps, communication rounds, and a set of variables, e.g., the DP privacy budget and other problem parameters, specifically in the context of strongly convex optimization. We furthermore provide empirical evidence to validate our theoretical findings.

URL: https://openreview.net/forum?id=40iZW6Utg6
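
For context, a minimal NumPy sketch of the standard Gaussian mechanism used in private FL (clip each client update, add calibrated noise, average at the server); the clip norm and noise multiplier are illustrative, and the paper's trade-off between local steps and communication rounds is not shown:

```python
# Hedged sketch: per-client clipping plus Gaussian noise before server-side averaging.
import numpy as np

rng = np.random.default_rng(0)

def privatize(update, rng, clip_norm=1.0, noise_multiplier=1.0):
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))          # bound each client's influence
    return clipped + rng.normal(scale=noise_multiplier * clip_norm, size=update.shape)

client_updates = [rng.normal(size=8) for _ in range(5)]
global_update = np.mean([privatize(u, rng) for u in client_updates], axis=0)
print(global_update)
```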

---

Title: What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Abstract: Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies.

URL: https://openreview.net/forum?id=HyqSwNhM3x

---

Title: Why do autoencoders work?

Abstract: Deep neural network autoencoders are routinely used computationally for model reduction. They allow recognizing the intrinsic dimension of data that lie in a $k$-dimensional subset $K$ of an input Euclidean space $\mathbb{R}^n$. The underlying idea is to obtain both an encoding layer that maps $\mathbb{R}^n$ into $\mathbb{R}^k$ (called the bottleneck layer or the space of latent variables) and a decoding layer that maps $\mathbb{R}^k$ back into $\mathbb{R}^n$, in such a way that the input data from the set $K$ is recovered when composing the two maps. This is achieved by adjusting parameters (weights) in the network to minimize the discrepancy between the input and the reconstructed output. Since neural networks (with continuous activation functions) compute continuous maps, the existence of a network that achieves perfect reconstruction would imply that $K$ is homeomorphic to a $k$-dimensional subset of $\mathbb{R}^k$, so clearly there are topological obstructions to finding such a network. On the other hand, in practice the technique is found to ``work'' well, which leads one to ask if there is a way to explain this effectiveness. We show that, up to small errors, indeed the method is guaranteed to work. This is done by appealing to certain facts from differential geometry. A computational example is also included to illustrate the ideas.

URL: https://openreview.net/forum?id=uGVFtjvI3v

---

Title: Evaluating Spatial Understanding of Large Language Models

Abstract: Large language models (LLMs) show remarkable capabilities across a variety of tasks. Despite the models only seeing text in training, several recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts. Here, we explore LLM representations of a particularly salient kind of grounded knowledge --- spatial relationships. We design natural-language navigation tasks and evaluate the ability of LLMs, in particular GPT-3.5-turbo, GPT-4, and Llama2 series models, to represent and reason about spatial structures. We also compare these abilities to human performance on the same tasks. These tasks reveal substantial variability in LLM performance across different spatial structures, including square, hexagonal, and triangular grids, rings, and trees. We also discover that, similar to humans, LLMs utilize object names as landmarks for maintaining spatial maps. Finally, in extensive error analysis, we find that LLMs' mistakes reflect both spatial and non-spatial factors. These findings suggest that LLMs appear to capture certain aspects of spatial structure implicitly, but room for improvement remains.

URL: https://openreview.net/forum?id=xkiflfKCw3

---

Title: Synthesizing Libraries of Programs with Auxiliary Functions

Abstract: A common approach to program synthesis is to use a learned function to guide the search for a program that satisfies the user's intent. In this paper, we propose a method that offers search guidance, through a domain-dependent auxiliary function, that can be orthogonal to the guidance previous functions provide. Our method, which we call Auxiliary-Based Library Learning (Aulile), searches for a solution in the program space using a base algorithm. If this search does not produce a solution, Aulile enhances the language with a library of programs discovered in the search that optimizes for the auxiliary function. Then, it repeats the search with this library-augmented language. This process is repeated until a solution is found or the system reaches a timeout. We evaluate Aulile in string manipulation tasks. Aulile improved, in some cases by a large margin, the performance of several base algorithms that use different search and learning strategies: Bus, Bustle, Crossbeam, and Bee Search. Our results suggest that Aulile offers an effective method of injecting domain knowledge into existing systems through a library learning scheme that optimizes for an auxiliary function.

URL: https://openreview.net/forum?id=tP1PBrMUlX

---

Title: Speeding Up Speech Synthesis in Diffusion Models by Reducing Data Distribution Recovery Steps via Content Transfer.

Abstract: Diffusion-based vocoders have been criticised for being slow due to the many steps required during sampling. Moreover, the commonly implemented loss function is designed such that the target is the original input $x_0$ or the error $\epsilon_0$. For early time steps of the reverse process, this results in large prediction errors, which can lead to speech distortions and increase the learning time. We propose a setup where the targets are the outputs of different forward-process time steps, with the goal of reducing the magnitude of prediction errors and the training time. We use the different layers of a neural network (NN) to perform denoising by training them to generate representations similar to the noised outputs in the forward process of the diffusion. The NN layers progressively denoise the input in the reverse process until the final layer estimates the clean speech. To avoid a 1:1 mapping between layers of the neural network and the forward process steps, we define a skip parameter $\tau > 1$ such that an NN layer is trained to cumulatively remove the noise injected over $\tau$ steps of the forward process. This significantly reduces the number of data distribution recovery steps and, consequently, the time to generate speech. We show through extensive evaluation that the proposed technique generates high-fidelity speech in competitive time and outperforms current state-of-the-art tools. The proposed technique also generalizes well to unseen speech.

URL: https://openreview.net/forum?id=3KavBfu7pM

---

Title: Smoothed Robustness Analysis: Bridging worst- and average-case robustness analyses via smoothed analysis

Abstract: Understanding the robustness of neural networks has attracted significant attention, as their sensitivity to adversarial and noise attacks remains a major drawback. At one extreme, the worst-case approach gives a region in the input space robust to any perturbation, i.e., a worst-case region. At the other, the average-case approach describes robustness against random perturbations. Several studies have attempted to bridge these two extremes. Among them, randomized smoothing became a prominent approach by certifying a worst-case region of a classifier subject to random noise. Here, inspired by smoothed analysis of algorithmic complexity, which bridges the worst- and average-case analyses of algorithms, we propose a new framework of robustness analysis of classifiers, which contains randomized smoothing as a special case. Then, starting from the framework's requirements, we propose a margin loss-based robustness analysis. Unlike randomized smoothing, this analysis, when the classifier's Lipschitz constant is available, gives a certified radius that does not scale with the input noise variance, making it suitable even when the noise variance is small. To validate our approach, we evaluated the robustness of 1-Lipschitz neural networks with the margin-based certified radius as the objective function for the MNIST classification task. We found that with the margin-based loss, both adversarial and noise robustness were improved in comparison to the randomized smoothing-based one. Attempting to capture empirical robustness, we also compare the trained neural networks to previously reported human-level robustness. The code used in all experiments, as well as the data and code for plotting the images from the results section, are available in the supplementary material.

URL: https://openreview.net/forum?id=BogwFMz5tU
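
For background, a minimal NumPy sketch of a standard Lipschitz margin certificate of the kind such margin-based analyses build on: for an $L$-Lipschitz network, the prediction is stable within an $\ell_2$ radius proportional to the logit margin. The $\sqrt{2}$ factor follows the usual Lipschitz-margin argument and is background material, not the paper's smoothed-analysis framework itself:

```python
# Hedged sketch: certified l2 radius from the gap between the top two logits of an L-Lipschitz net.
import numpy as np

def certified_radius(logits, lipschitz_constant):
    top2 = np.sort(logits)[-2:]                 # [runner-up, top] logits
    margin = top2[1] - top2[0]
    return margin / (np.sqrt(2.0) * lipschitz_constant)

print(certified_radius(np.array([2.3, 0.1, -1.0, 1.7]), lipschitz_constant=1.0))
```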

---

Title: Cognitive Architectures for Language Agents

Abstract: Recent efforts have augmented large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning, leading to a new class of language agents. While these agents have achieved substantial empirical success, we lack a systematic framework to organize existing agents and plan future developments. In this paper, we draw on the rich history of cognitive science and symbolic artificial intelligence to propose Cognitive Architectures for Language Agents (CoALA). CoALA describes a language agent with modular memory components, a structured action space to interact with internal memory and external environments, and a generalized decision-making process to choose actions. We use CoALA to retrospectively survey and organize a large body of recent work, and prospectively identify actionable directions towards more capable agents. Taken together, CoALA contextualizes today's language agents within the broader history of AI and outlines a path towards language-based general intelligence.

URL: https://openreview.net/forum?id=1i6ZCvflQJ

---
