Weekly TMLR digest for Aug 27, 2023


TMLR

Aug 26, 2023, 8:00:10 PM
to tmlr-annou...@googlegroups.com


New certifications
==================

Featured Certification, Expert Certification: Holistic Evaluation of Language Models

Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Alexander Cosgrove, Christopher D Manning, Christopher Re, Diana Acosta-Navas, Drew Arad Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue WANG, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri S. Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Andrew Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda

https://openreview.net/forum?id=iO4LZibEqW

---


Accepted papers
===============


Title: Nonconvex-nonconcave min-max optimization on Riemannian manifolds

Authors: Andi Han, Bamdev Mishra, Pratik Jawanpuria, Junbin Gao

Abstract: This work studies nonconvex-nonconcave min-max problems on Riemannian manifolds. We first characterize the local optimality of nonconvex-nonconcave problems on manifolds with a generalized notion of local minimax points. We then define stability and convergence criteria for dynamical systems on manifolds and provide necessary and sufficient conditions for strictly stable equilibrium points under both continuous and discrete dynamics. Additionally, we propose several novel second-order methods on manifolds that provably converge to local minimax points asymptotically. We validate the empirical benefits of the proposed methods with extensive experiments.
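
The paper's methods are second-order and beyond the scope of a short snippet, but the manifold machinery they build on is easy to illustrate. Below is a minimal, hypothetical sketch of first-order Riemannian gradient descent-ascent on unit spheres; the function names are placeholders and the bilinear objective is a toy example, not the paper's setting.

```python
# Hypothetical sketch: first-order Riemannian GDA on the unit sphere,
# showing the primitives (tangent projection, retraction) that min-max
# methods on manifolds build on. Not the paper's second-order methods.
import numpy as np

def proj_tangent(x, g):
    """Project an ambient gradient g onto the tangent space at x (unit sphere)."""
    return g - np.dot(x, g) * x

def retract(x, v):
    """Retraction: map a tangent step back onto the sphere by normalization."""
    y = x + v
    return y / np.linalg.norm(y)

def riemannian_gda(grad_x, grad_y, x, y, steps=1000, eta=1e-2):
    """Simultaneous GDA for min_x max_y f(x, y), with x and y on unit spheres."""
    for _ in range(steps):
        gx = proj_tangent(x, grad_x(x, y))   # Riemannian gradient in x
        gy = proj_tangent(y, grad_y(x, y))   # Riemannian gradient in y
        x = retract(x, -eta * gx)            # descend in x
        y = retract(y, +eta * gy)            # ascend in y
    return x, y

# Toy bilinear objective f(x, y) = x^T A y on S^2 x S^2.
A = np.diag([1.0, 0.5, -0.5])
x0, y0 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
x_star, y_star = riemannian_gda(lambda x, y: A @ y, lambda x, y: A.T @ x, x0, y0)
```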

URL: https://openreview.net/forum?id=EDVIHPZhFo

---

Title: Learning to Boost Resilience of Complex Networks via Neural Edge Rewiring

Authors: Shanchao Yang, MA KAILI, Baoxiang Wang, Tianshu Yu, Hongyuan Zha

Abstract: The resilience of complex networks refers to their ability to maintain functionality in the face of structural attacks. This ability can be improved by performing minimal modifications to the network structure via degree-preserving edge rewiring. Existing learning-free edge rewiring methods, although effective, are limited in their ability to generalize to different graphs. This limitation cannot be trivially addressed by existing graph neural network (GNN)-based learning approaches, since there are no rich initial node features from which GNNs can learn meaningful representations. In this work, inspired by persistent homology, we specifically design a variant of GNN called FireGNN to learn meaningful node representations solely from graph structures. We then develop an end-to-end inductive method called ResiNet, which aims to discover resilient network topologies while balancing network utility. ResiNet reformulates the optimization of network resilience as a Markov decision process equipped with an edge rewiring action space. It learns to sequentially select the appropriate edges to rewire to maximize resilience. Extensive experiments demonstrate that ResiNet outperforms existing approaches and achieves near-optimal resilience gains on various graphs while balancing network utility.
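
A minimal sketch of the degree-preserving rewiring primitive that ResiNet's action space is built on, here driven by a greedy accept/reject rule rather than the paper's learned GNN policy; the resilience proxy is an illustrative stand-in:

```python
# Hypothetical sketch of degree-preserving edge rewiring for resilience.
# The paper learns which edges to rewire; this greedy loop only
# illustrates the action space and a toy resilience score.
import networkx as nx

def resilience_proxy(G):
    """Toy resilience score: relative size of the largest connected
    component after removing the highest-degree node (targeted attack)."""
    H = G.copy()
    target = max(H.degree, key=lambda kv: kv[1])[0]
    H.remove_node(target)
    if H.number_of_nodes() == 0:
        return 0.0
    return max(len(c) for c in nx.connected_components(H)) / G.number_of_nodes()

def rewire_step(G, seed=None):
    """Try one degree-preserving double edge swap; keep it only if the
    resilience proxy improves."""
    H = G.copy()
    nx.double_edge_swap(H, nswap=1, max_tries=100, seed=seed)  # preserves degrees
    return H if resilience_proxy(H) > resilience_proxy(G) else G

G = nx.barabasi_albert_graph(100, 2, seed=0)
for step in range(200):
    G = rewire_step(G, seed=step)
```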

URL: https://openreview.net/forum?id=moZvOx5cxe

---

Title: Simulate Time-integrated Coarse-grained Molecular Dynamics with Multi-scale Graph Networks

Authors: Xiang Fu, Tian Xie, Nathan J. Rebello, Bradley Olsen, Tommi S. Jaakkola

Abstract: Molecular dynamics (MD) simulation is essential for various scientific domains but computationally expensive. Learning-based force fields have made significant progress in accelerating ab-initio MD simulation but are not fast enough for many real-world applications, due to slow inference on large systems and the small (femtosecond-level) time steps required. We aim to address these challenges by learning a multi-scale graph neural network that directly simulates coarse-grained MD with a very large (nanosecond-level) time step, together with a novel refinement module based on diffusion models to mitigate simulation instability. The effectiveness of our method is demonstrated in two complex systems: single-chain coarse-grained polymers and multi-component Li-ion polymer electrolytes. For evaluation, we simulate trajectories much longer than the training trajectories, for systems with chemical compositions that the model was not trained on. Structural and dynamical properties can be accurately recovered at several orders of magnitude higher speed than classical force fields, since the model steps far beyond the femtosecond regime.

URL: https://openreview.net/forum?id=y8RZoPjEUl

---

Title: Meta-Calibration: Learning of Model Calibration Using Differentiable Expected Calibration Error

Authors: Ondrej Bohdal, Yongxin Yang, Timothy Hospedales

Abstract: Calibration of neural networks is a topical problem that is becoming increasingly important as neural networks come to underpin more real-world applications. The problem is especially noticeable with modern neural networks, for which there is a significant gap between model confidence and the probability of correct prediction. Various strategies have been proposed to improve calibration, yet accurate calibration remains challenging. We propose a novel framework with two contributions: a new differentiable surrogate for expected calibration error (DECE) that allows calibration quality to be directly optimised, and a meta-learning framework that uses DECE to optimise validation set calibration with respect to model hyper-parameters. The results show that we achieve competitive performance with existing calibration approaches. Our framework opens up a new avenue and toolset for tackling calibration, which we believe will inspire further work on this important challenge.
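
A hypothetical sketch of a differentiable ECE surrogate in the spirit of DECE: hard histogram binning (non-differentiable) is replaced with soft bin-membership weights so the calibration error admits gradients. The bin placement and temperature are illustrative choices, not the paper's exact formulation, and the meta-learning loop that tunes hyper-parameters against this quantity is omitted.

```python
# Hypothetical soft-binned ECE: differentiable w.r.t. the logits.
import torch

def soft_ece(logits, labels, n_bins=15, temperature=100.0):
    probs = torch.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)                    # confidence, prediction
    correct = (pred == labels).float()
    centers = (torch.arange(n_bins) + 0.5) / n_bins  # bin centers in (0, 1)
    # Soft membership: each sample contributes to every bin, weighted by
    # how close its confidence is to the bin center.
    w = torch.softmax(-temperature * (conf.unsqueeze(1) - centers) ** 2, dim=1)
    bin_mass = w.sum(dim=0)                                   # soft bin counts
    bin_conf = (w * conf.unsqueeze(1)).sum(dim=0) / bin_mass.clamp(min=1e-8)
    bin_acc = (w * correct.unsqueeze(1)).sum(dim=0) / bin_mass.clamp(min=1e-8)
    return ((bin_mass / conf.numel()) * (bin_conf - bin_acc).abs()).sum()
```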

URL: https://openreview.net/forum?id=R2hUure38l

---

Title: Holistic Evaluation of Language Models

Authors: Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Alexander Cosgrove, Christopher D Manning, Christopher Re, Diana Acosta-Navas, Drew Arad Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue WANG, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri S. Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Andrew Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda

Abstract: Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest for LMs. Then we select a broad subset based on coverage and feasibility, noting what’s missing or underrepresented (e.g. question answering for neglected English dialects, metrics for trustworthiness). Second, we adopt a multi-metric approach: We measure 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency) for each of 16 core scenarios to the extent possible (87.5% of the time), ensuring that metrics beyond accuracy don’t fall by the wayside, and that trade-offs across models and metrics are clearly exposed. We also perform 7 targeted evaluations, based on 26 targeted scenarios, to more deeply analyze specific aspects (e.g. knowledge, reasoning, memorization/copyright, disinformation). Third, we conduct a large-scale evaluation of 30 prominent language models (spanning open, limited-access, and closed models) on all 42 scenarios, including 21 scenarios that were not previously used in mainstream LM evaluation. Prior to HELM, models on average were evaluated on just 17.9% of the core HELM scenarios, with some prominent models not sharing a single scenario in common. We improve this to 96.0%: now all 30 models have been densely benchmarked on a set of core scenarios and metrics under standardized conditions. Our evaluation surfaces 25 top-level findings concerning the interplay between different scenarios, metrics, and models. For full transparency, we release all raw model prompts and completions publicly for further analysis, as well as a general modular toolkit for easily adding new scenarios, models, metrics, and prompting strategies. We intend for HELM to be a living benchmark for the community, continuously updated with new scenarios, metrics, and models.

URL: https://openreview.net/forum?id=iO4LZibEqW

---

Title: Understanding convolution on graphs via energies

Authors: Francesco Di Giovanni, James Rowbottom, Benjamin Paul Chamberlain, Thomas Markovich, Michael M. Bronstein

Abstract: Graph Neural Networks (GNNs) typically operate by message-passing, where the state of a node is updated based on the information received from its neighbours. Most message-passing models act as graph convolutions, where features are mixed by a shared, linear transformation before being propagated over the edges. On node-classification tasks, graph convolutions have been shown to suffer from two limitations: poor performance on heterophilic graphs, and over-smoothing. It is a common belief that both phenomena occur because such models behave as low-pass filters, meaning that the Dirichlet energy of the features decreases along the layers, incurring a smoothing effect that ultimately makes features no longer distinguishable. In this work, we rigorously prove that simple graph-convolutional models can actually enhance high frequencies and even lead to an asymptotic behaviour we refer to as over-sharpening, opposite to over-smoothing. We do so by showing that linear graph convolutions with symmetric weights minimize a multi-particle energy that generalizes the Dirichlet energy; in this setting, the weight matrices induce edge-wise attraction (repulsion) through their positive (negative) eigenvalues, thereby controlling whether the features are being smoothed or sharpened. We also extend the analysis to non-linear GNNs, and demonstrate that some existing time-continuous GNNs are instead always dominated by the low frequencies. Finally, we validate our theoretical findings through ablations and real-world experiments.
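
A small numeric illustration of the quantity at the center of this analysis: the Dirichlet energy, which measures how much node features vary across edges. Whether a graph convolution drives this energy down (smoothing) or up (sharpening) is what the paper characterizes via the eigenvalues of the weight matrices. This sketch only computes the energy itself.

```python
# Dirichlet energy E(X) = trace(X^T L X) = 1/2 * sum_ij A_ij ||x_i - x_j||^2.
import numpy as np

def dirichlet_energy(X, A):
    L = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian
    return np.trace(X.T @ L @ X)

# Two-node toy graph: identical features have zero energy,
# opposite features have maximal energy.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
print(dirichlet_energy(np.array([[1.0], [1.0]]), A))   # 0.0 (smooth)
print(dirichlet_energy(np.array([[1.0], [-1.0]]), A))  # 4.0 (sharp)
```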

URL: https://openreview.net/forum?id=v5ew3FPTgb

---


New submissions
===============


Title: Temporally Rich Deep Learning Models for Magnetoencephalography

Abstract: Deep learning has been used in a wide range of applications, but it has only very recently been applied to Magnetoencephalography (MEG). MEG is a neurophysiological technique used to investigate a variety of cognitive processes such as language and learning, and an emerging technology in the quest to identify neural correlates of cognitive impairments such as those occurring in dementia.
Recent work has shown that it is possible to apply deep learning to MEG to categorise induced responses to stimuli across subjects.
While novel in the application of deep learning, such work has generally used relatively simple neural network (NN) models compared to those being used in domains such as computer vision and natural language processing.
In these other domains, there is a long history of developing complex NN models that combine spatial and temporal information.
We propose more complex NN models that focus on modelling temporal relationships in the data, and apply them to the challenges of MEG data.
We apply these models to an extended range of MEG-based tasks and find that they substantially outperform existing work on a range of tasks, particularly but not exclusively temporally-oriented ones. We also show that an autoencoder-based preprocessing component that focuses on the temporal aspect of the data can improve the performance of existing models.

URL: https://openreview.net/forum?id=zSeoG5dRHK

---

Title: Improving Native CNN Robustness with Filter Frequency Regularization

Abstract: Neural networks tend to overfit the training distribution and perform poorly on out-of-distribution data. A conceptually simple solution lies in adversarial training, which introduces worst-case perturbations into the training data and thus improves model generalization to some extent. However, it is only one ingredient towards generally more robust models and requires knowledge about potential attacks or inference-time data corruptions during model training. This paper focuses on the native robustness of models that can learn robust behavior directly from conventional training data without out-of-distribution examples. To this end, we investigate the frequencies present in learned convolution filters. Clean-trained models often prioritize high-frequency information, whereas adversarial training forces models to shift their focus to low-frequency details during training. By mimicking this behavior through frequency regularization in learned convolution weights, we achieve improved native robustness to adversarial attacks, common corruptions, and other out-of-distribution tests. Additionally, this method leads to more favorable shifts in decision-making towards low-frequency information, such as shapes, which inherently aligns more closely with human vision.
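
A hypothetical sketch of what frequency regularization on convolution filters could look like: transform each kernel to the frequency domain and penalize energy at high spatial frequencies, nudging filters toward the low-frequency behavior the paper observes in adversarially trained models. The penalty form and weighting are illustrative, not the paper's formulation.

```python
# Hypothetical high-frequency penalty on conv kernels via a 2D FFT.
import torch

def high_freq_penalty(weight):
    """weight: (out_ch, in_ch, k, k) convolution kernels."""
    k = weight.shape[-1]
    spec = torch.fft.fft2(weight).abs()   # per-kernel 2D spectrum magnitude
    f = torch.fft.fftfreq(k).abs()        # |frequency| for each index
    mask = f[:, None] + f[None, :]        # grows toward higher frequencies
    return (spec * mask).mean()

# Usage inside a training loop (illustrative):
# loss = task_loss + 0.01 * sum(high_freq_penalty(m.weight)
#                               for m in model.modules()
#                               if isinstance(m, torch.nn.Conv2d))
```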

URL: https://openreview.net/forum?id=2wecNCpZ7Y

---

Title: A Review of the Applications of Deep Learning-Based Emergent Language

Abstract: Emergent language, or emergent communication, is the field of research which studies how human language-like communication systems emerge de novo in deep multi-agent reinforcement learning environments. The possibilities of replicating the emergence of a complex behavior like language have strong intuitive appeal, yet it is necessary to complement this with clear notions of how such research can be applicable to other fields of science, technology, and engineering. This paper comprehensively reviews the applications of emergent language research across machine learning, natural language processing, linguistics, and cognitive science. Each application is illustrated with a description of its scope, an explication of emergent language's unique role in addressing it, a summary of the extant literature working towards the application, and brief recommendations for near-term research directions.

URL: https://openreview.net/forum?id=jesKcQxQ7j

---

Title: Learning Search Space Boundaries Improves Supernet Training

Abstract: Neural architecture search (NAS) seeks to automate neural network design to optimize performance criteria, but designing a search space for NAS largely remains a manual effort. When available, strong prior knowledge can be used to construct small search spaces, but using such spaces inevitably limits the flexibility of NAS, and prior information is not always available on novel tasks and/or architectures.
On the other hand, many NAS methods have been shown to be sensitive to the choice of search space and struggle when the search space is not sufficiently refined. To address this problem, we propose a differentiable technique that learns a policy to refine a broad initial search space during supernet training. Our proposed solution is orthogonal to almost all existing improvements to NAS pipelines, is largely search space-agnostic, and incurs little additional overhead beyond standard supernet training. Despite its simplicity, we show that on tasks without strong priors, our solution consistently discovers performant subspaces within an initially large, complex search space (where even the state-of-the-art methods underperform), significantly robustifies the resultant supernet, and improves performance across a wide range of model sizes. We argue that our work takes a step toward full automation of the network design pipeline.

URL: https://openreview.net/forum?id=LO9mmBNmie

---

Title: Latent State Models of Training Dynamics

Abstract: The impact of randomness on model training is poorly understood. How do differences in data order and initialization actually manifest in the model, such that some training runs outperform others or converge faster? Furthermore, how can we interpret the resulting training dynamics and the phase transitions that characterize different trajectories? To understand the effect of randomness on the dynamics and outcomes of neural network training, we train models multiple times with different random seeds and compute a variety of metrics throughout training, such as the $L_2$ norm, mean, and variance of the neural network's weights. We then fit a hidden Markov model (HMM) over the resulting sequences of metrics. The HMM represents training as a stochastic process of transitions between latent states, providing an intuitive overview of significant changes during training. Using our method, we produce a low-dimensional, discrete representation of training dynamics on grokking tasks, image classification, and masked language modeling. We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.
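
One possible realization of this pipeline using the off-the-shelf hmmlearn package: collect per-step weight statistics from several runs, then fit a Gaussian HMM whose latent states summarize phases of training. The metric choices, state count, and the random-walk stand-in for real training are all illustrative assumptions.

```python
# Hypothetical sketch: fit an HMM over sequences of training metrics.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def run_metrics(weights_per_step):
    """Per-step features: L2 norm, mean, and variance of the weights."""
    return np.array([[np.linalg.norm(w), w.mean(), w.var()]
                     for w in weights_per_step])

# Stand-in for real training runs: random walks over a weight vector.
rng = np.random.default_rng(0)
runs = []
for seed in range(5):
    w, traj = rng.normal(size=1000), []
    for step in range(200):
        w = w + 0.01 * rng.normal(size=1000)
        traj.append(w.copy())
    runs.append(run_metrics(traj))

# hmmlearn fits multiple sequences via concatenation plus per-run lengths.
X = np.concatenate(runs)
lengths = [len(r) for r in runs]
hmm = GaussianHMM(n_components=4, covariance_type="diag", random_state=0)
hmm.fit(X, lengths)
states = hmm.predict(runs[0])   # latent-state trajectory of one run
```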

URL: https://openreview.net/forum?id=NE2xXWo0LF

---

Title: Benchmarks for Physical Reasoning AI

Abstract: Physical reasoning is a crucial aspect in the development of general AI systems, given that human learning starts with interacting with the physical world before progressing to more complex concepts. Although researchers have studied and assessed the physical reasoning of AI approaches through various specific benchmarks, there is no comprehensive approach to evaluating and measuring progress. Therefore, we aim to offer an overview of existing benchmarks and their solution approaches and propose a unified perspective for measuring the physical reasoning capacity of AI systems. We select benchmarks that are designed to test algorithmic performance in physical reasoning tasks. While each of the selected benchmarks poses a unique challenge, their ensemble provides a comprehensive proving ground for an AI generalist agent with a measurable skill level for various physical reasoning concepts. This gives such an ensemble of benchmarks an advantage over holistic benchmarks that aim to simulate the real world with all of its intertwined complexity and concepts. We group the presented set of physical reasoning benchmarks into subcategories so that narrower generalist AI agents can be tested first on these clusters.

URL: https://openreview.net/forum?id=cHroS8VIyN

---

Title: Conditional Sampling of Variational Autoencoders via Iterated Approximate Ancestral Sampling

Abstract: Conditional sampling of variational autoencoders (VAEs) is needed in various applications, such as missing data imputation, but is computationally intractable. A principled choice for asymptotically exact conditional sampling is Metropolis-within-Gibbs (MWG). However, we observe that the tendency of VAEs to learn a structured latent space, a commonly desired property, can cause the MWG sampler to get “stuck” far from the target distribution. This paper mitigates the limitations of MWG: we systematically outline the pitfalls in the context of VAEs, propose two original methods that address these pitfalls, and demonstrate an improved performance of the proposed methods on a set of sampling tasks.
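
A hypothetical sketch of the Metropolis-within-Gibbs baseline the paper starts from, for imputing missing pixels given observed ones under a trained VAE. Here `decoder` and `log_lik` are assumed interfaces (the decoder maps z to per-dimension parameters; `log_lik` scores x under them), and the Gibbs step uses the decoder mean as a simplification rather than sampling from p(x_mis | z). The paper's contribution is diagnosing and fixing where this sampler gets stuck, not this loop itself.

```python
# Hypothetical MWG sketch for VAE conditional sampling (imputation).
import torch

def mwg_impute(decoder, log_lik, x_obs, obs_mask, z_dim, steps=1000, sigma=0.1):
    z = torch.randn(z_dim)
    for _ in range(steps):
        # Gibbs step: fill missing dims from the decoder given z
        # (simplification: decoder mean instead of a draw from p(x_mis | z)).
        x = torch.where(obs_mask, x_obs, decoder(z))
        # Metropolis step: random-walk proposal on z, target p(z) p(x | z).
        z_prop = z + sigma * torch.randn(z_dim)
        log_a = (log_lik(decoder(z_prop), x) - log_lik(decoder(z), x)
                 + 0.5 * (z @ z - z_prop @ z_prop))  # standard-normal prior
        if torch.log(torch.rand(())) < log_a:
            z = z_prop
    return x, z
```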

URL: https://openreview.net/forum?id=I5sJ6PU6JN

---

Title: Targeted Active Learning for Bayesian Decision-Making

Abstract: Active learning is usually applied to acquire labels of informative data points in supervised learning, to maximize accuracy in a sample-efficient way. However, maximizing the accuracy is not the end goal when the results are used for decision-making, for example in personalized medicine or economics. We argue that when acquiring samples sequentially, separating learning and decision-making is sub-optimal, and we introduce an active learning strategy which takes the down-the-line decision problem into account. Specifically, we adopt a Bayesian experimental design approach, and the proposed criterion maximizes the expected information gain on the posterior distribution of the optimal decision. We compare our targeted active learning strategy to existing alternatives on both simulated and real data, and show improved performance in decision-making accuracy.

URL: https://openreview.net/forum?id=VldyVuH0eX

---

Title: Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel’s Spectrum

Abstract: Wide neural networks are biased towards learning certain functions, influencing both the rate of convergence of gradient descent (GD) and the functions that are reachable with GD in finite training time. As such, there is a great need for methods that can modify this bias according to the task at hand. To that end, we introduce Modified Spectrum Kernels (MSKs), a novel family of constructed kernels that can be used to approximate kernels with desired eigenvalues for which no closed form is known. We leverage the duality between wide neural networks and Neural Tangent Kernels and propose a preconditioned gradient descent method, which alters the trajectory of GD. As a result, this allows for a polynomial and, in some cases, exponential training speedup without changing the final solution. Our method is both computationally efficient and simple to implement.
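
A hypothetical numpy sketch of the core idea: take the Gram matrix of a base kernel, eigendecompose it, and remap the eigenvalues with a chosen function to obtain a new positive semi-definite Gram matrix with the desired spectrum. The eigenvalue map used here is illustrative, not the paper's construction, and the preconditioned-GD connection is omitted.

```python
# Hypothetical sketch of a modified-spectrum Gram matrix.
import numpy as np

def modified_spectrum_gram(K, spectrum_map):
    """Replace the eigenvalues of a symmetric PSD Gram matrix K."""
    evals, evecs = np.linalg.eigh(K)
    new_evals = spectrum_map(np.clip(evals, 0.0, None))
    return (evecs * new_evals) @ evecs.T   # V diag(new_evals) V^T

# Base RBF kernel on random inputs.
X = np.random.default_rng(0).normal(size=(50, 3))
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)

# Illustrative map: flatten the spectrum's decay, boosting the weight of
# high-frequency eigendirections relative to the base kernel.
K_mod = modified_spectrum_gram(K, np.sqrt)
```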

URL: https://openreview.net/forum?id=aD0ExytnEK

---

Title: A Combinatorial Semi-Bandit Approach to Charging Station Selection for Electric Vehicles

Abstract: In this work, we address the problem of long-distance navigation for battery electric vehicles (BEVs), where one or more charging sessions are required to reach the intended destination. We consider the availability and performance of the charging stations to be unknown and stochastic, and develop a combinatorial semi-bandit framework for exploring the road network to learn the parameters of the queue time and charging power distributions. Within this framework, we first outline a method for transforming the road network graph into a graph of feasible paths between charging stations to handle the constrained combinatorial optimization problem in an efficient way. Then, for the feasibility graph, we use a Bayesian approach to model the stochastic edge weights, utilizing conjugate priors for the one-parameter exponential and two-parameter gamma distributions, the latter of which is novel to the multi-armed bandit literature. Finally, we apply combinatorial versions of Thompson Sampling, BayesUCB and Epsilon-greedy to the problem. We demonstrate the performance of our framework on long-distance navigation problem instances in large-scale country-sized road networks, with simulation experiments in Norway, Sweden and Finland.
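
A hypothetical sketch of the conjugate machinery at the core of this framework: exponential queue times with a Gamma prior on the rate admit closed-form posterior updates, and Thompson Sampling draws one rate per station to rank candidates. The station-level bandit below omits the path-level combinatorial step and the gamma-distributed charging power model.

```python
# Hypothetical Gamma-exponential Thompson Sampling over charging stations.
import numpy as np

rng = np.random.default_rng(0)

class StationModel:
    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha, self.beta = alpha, beta   # Gamma prior on rate lambda

    def update(self, queue_time):
        # Exponential likelihood: posterior is Gamma(alpha + 1, beta + t).
        self.alpha += 1.0
        self.beta += queue_time

    def sample_mean_queue_time(self):
        lam = rng.gamma(self.alpha, 1.0 / self.beta)   # posterior draw
        return 1.0 / lam                               # expected queue time

stations = [StationModel() for _ in range(10)]
true_rates = rng.uniform(0.1, 2.0, size=10)
for t in range(1000):
    # Thompson Sampling: pick the station with the lowest sampled queue time.
    choice = int(np.argmin([s.sample_mean_queue_time() for s in stations]))
    observed = rng.exponential(1.0 / true_rates[choice])
    stations[choice].update(observed)
```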

URL: https://openreview.net/forum?id=ndw90pkNM9

---

Title: DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity

Abstract: The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases. In this work, we introduce three indicators to evaluate the realism, diversity and prompt-generation consistency of text-to-image generative systems when prompted to generate objects from across the world. Our indicators complement qualitative analysis of the broader impact of such systems by enabling automatic and efficient benchmarking of geographic disparities, an important step towards building responsible visual content creation systems. We use our proposed indicators to analyze potential geographic biases in state-of-the-art visual content creation systems and find that: (1) models have less realism and diversity of generations when prompting for Africa and West Asia than Europe, (2) prompting with geographic information comes at a cost to prompt-consistency and diversity of generated images, and (3) models exhibit more region-level disparities for some objects than others. Perhaps most interestingly, our indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation. Our comprehensive evaluation constitutes a crucial step towards ensuring a positive experience of visual content creation for everyone.

URL: https://openreview.net/forum?id=FDt2UGM1Nz

---

Title: Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback

Abstract: This paper investigates the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous.
In other words, the delayed feedback is composed of components of rewards from past actions, with unknown division among the sub-components. Three models of delayed feedback are studied: bounded adversarial, stochastic independent, and stochastic conditionally independent, and regret bounds are derived for each of the delay models. Ignoring problem-dependent parameters, we show that the regret bound for all the delay models is $\tilde{O}(T^{2/3} + T^{1/3} \nu)$ for time horizon $T$, where $\nu$ is a delay parameter defined differently in the three cases, thus demonstrating an additive delay term in the regret for all three delay models. The considered algorithm is demonstrated to outperform other full-bandit approaches with delayed composite anonymous feedback. We also demonstrate that our analysis of delayed composite anonymous feedback generalizes to other combinatorial bandit settings, as long as there exists an algorithm for the offline problem satisfying a certain robustness condition.
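
A small hypothetical simulation of the feedback model itself: the reward of each action is split across the next few rounds with an unknown division, and the learner only observes the per-round sum, with no attribution to past actions. This illustrates the setting, not the paper's algorithm or regret analysis.

```python
# Hypothetical simulation of composite anonymous delayed feedback.
import numpy as np

rng = np.random.default_rng(0)
T, max_delay = 20, 3
pending = np.zeros(T + max_delay)   # reward mass scheduled per round

for t in range(T):
    reward = rng.uniform()                       # reward of the action at t
    split = rng.dirichlet(np.ones(max_delay))    # unknown division over delays
    pending[t:t + max_delay] += reward * split   # spread into future rounds
    observed = pending[t]   # the learner sees only this anonymous sum
```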

URL: https://openreview.net/forum?id=w5q7ZJdXvq

---

Title: Norm-count Hypothesis: On the Relationship Between Norm and Object Count in Visual Representations

Abstract: We present a novel hypothesis on norms of representations produced by convolutional neural networks (CNNs). In particular, we propose the norm-count hypothesis (NCH), which states that there is a monotonically increasing relationship between the number of certain objects in the image and the norm of the corresponding representation. We formalize and prove our hypothesis in a controlled setting, showing that the NCH is true for linear CNNs followed by global average pooling when they are applied to a certain class of images. Further, we present experimental evidence that corroborates our hypothesis for CNN-based representations. Our experiments are conducted on several real-world image datasets, in supervised, self-supervised, and few-shot learning settings -- providing new insight into the relationship between object counts and representation norms.
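
A tiny numeric check of the flavor of claim the NCH formalizes in the controlled setting: for a linear convolution followed by global average pooling, each additional identical object (away from the image border) adds the same contribution to the pooled feature, so the representation norm grows linearly with object count. Entirely illustrative, not the paper's experiments.

```python
# Hypothetical demonstration: representation norm vs. object count
# for a linear conv + global average pooling.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
kernel = torch.randn(4, 1, 3, 3)   # linear conv: no bias, no nonlinearity

def rep_norm(img):
    feat = F.conv2d(img, kernel, padding=1)
    return feat.mean(dim=(2, 3)).norm().item()   # global average pooling

def image_with_objects(n):
    """n identical small patches placed away from the image border."""
    img = torch.zeros(1, 1, 32, 32)
    for i in range(n):
        img[0, 0, 4 * i + 2:4 * i + 4, 4:6] = 1.0
    return img

for n in range(1, 6):
    print(n, round(rep_norm(image_with_objects(n)), 4))   # grows linearly in n
```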

URL: https://openreview.net/forum?id=0hJGrRuhEA

---
