Weekly TMLR digest for Dec 25, 2022


TMLR

Dec 24, 2022, 7:00:27 PM
to tmlr-annou...@googlegroups.com


New certifications
==================

Featured Certification: On Characterizing the Trade-off in Invariant Representation Learning

Bashir Sadeghi, Sepehr Dehdashtian, Vishnu Boddeti

https://openreview.net/forum?id=3gfpBR1ncr

---


Survey Certification: A Snapshot of the Frontiers of Client Selection in Federated Learning

Gergely Dániel Németh, Miguel Angel Lozano, Novi Quadrianto, Nuria M Oliver

https://openreview.net/forum?id=vwOKBldzFu

---


Reproducibility Certification: Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter

Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden

https://openreview.net/forum?id=GFK1FheE7F

---


Accepted papers
===============


Title: Collaborative Algorithms for Online Personalized Mean Estimation

Authors: Mahsa Asadi, Aurélien Bellet, Odalric-Ambrym Maillard, Marc Tommasi

Abstract: We consider an online estimation problem involving a set of agents. Each agent has access to a (personal) process that generates samples from a real-valued distribution and seeks to estimate its mean. We study the case where some of the distributions have the same mean, and the agents are allowed to actively query information from other agents. The goal is to design an algorithm that enables each agent to improve its mean estimate thanks to communication with other agents. The means, as well as the number of distributions with the same mean, are unknown, which makes the task nontrivial. We introduce a novel collaborative strategy to solve this online personalized mean estimation problem. We analyze its time complexity and introduce variants that enjoy good performance in numerical experiments. We also extend our approach to the setting where clusters of agents with similar means seek to estimate the mean of their cluster.
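
A minimal sketch of the collaborative idea described above, assuming each agent keeps a running mean with a Hoeffding-style confidence radius and pools only with agents whose confidence intervals overlap its own; the radius and the pooling rule are illustrative assumptions, not the paper's exact algorithm:

import math

class Agent:
    def __init__(self):
        self.n, self.mean = 0, 0.0

    def observe(self, x):
        # Running mean update.
        self.n += 1
        self.mean += (x - self.mean) / self.n

    def radius(self, delta=0.05):
        # Hoeffding-style confidence radius (illustrative assumption).
        return math.sqrt(math.log(2 / delta) / (2 * max(self.n, 1)))

def collaborative_estimate(agent, others, delta=0.05):
    """Assumes `agent` has observed at least one sample."""
    # Pool estimates only from agents whose confidence intervals overlap ours,
    # i.e. agents that plausibly share the same mean.
    total_n, total_sum = agent.n, agent.mean * agent.n
    for other in others:
        if abs(agent.mean - other.mean) <= agent.radius(delta) + other.radius(delta):
            total_n += other.n
            total_sum += other.mean * other.n
    return total_sum / total_n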

URL: https://openreview.net/forum?id=VipljNfZSZ

---

Title: Indiscriminate Data Poisoning Attacks on Neural Networks

Authors: Yiwei Lu, Gautam Kamath, Yaoliang Yu

Abstract: Data poisoning attacks, in which a malicious adversary aims to influence a model by injecting "poisoned" data into the training process, have attracted significant recent attention. In this work, we take a closer look at existing poisoning attacks and connect them with old and new algorithms for solving sequential Stackelberg games. By choosing an appropriate loss function for the attacker and optimizing with algorithms that exploit second-order information, we design poisoning attacks that are effective on neural networks. We present efficient implementations by parameterizing the attacker and allowing simultaneous and coordinated generation of tens of thousands of poisoned points, in contrast to most existing methods that generate poisoned points one by one. We further perform extensive experiments that empirically explore the effect of data poisoning attacks on deep neural networks. Our paper sets a new benchmark on the possibility of performing indiscriminate data poisoning attacks on modern neural networks.

URL: https://openreview.net/forum?id=x4hmIsWu7e

---

Title: An empirical study of implicit regularization in deep offline RL

Authors: Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matthew Hoffman, Razvan Pascanu, Arnaud Doucet

Abstract: Deep neural networks are the most commonly used function approximators in offline reinforcement learning. Prior works have shown that neural nets trained with TD-learning and gradient descent can exhibit implicit regularization that can be characterized by under-parameterization of these networks. Specifically, the rank of the penultimate feature layer, also called the effective rank, has been observed to collapse drastically during training. In turn, this collapse has been argued to reduce the model's ability to further adapt in later stages of learning, leading to diminished final performance. Such an association between the effective rank and performance makes effective rank compelling for offline RL, primarily for offline policy evaluation. In this work, we conduct a careful empirical study on the relation between effective rank and performance on three offline RL datasets: bsuite, Atari, and DeepMind Lab. We observe that a direct association exists only in restricted settings and disappears in more extensive hyperparameter sweeps. Also, we empirically identify three phases of learning that explain the impact of implicit regularization on the learning dynamics and find that bootstrapping alone is insufficient to explain the collapse of the effective rank. Further, we show that several other factors could confound the relationship between effective rank and performance and conclude that studying this association under simplistic assumptions could be highly misleading.
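
For concreteness, one common way to compute the effective rank of penultimate-layer features is the smallest number of leading singular values that capture most of the spectral mass; the threshold value below is an illustrative choice, not necessarily the one used in the paper:

import numpy as np

def effective_rank(features: np.ndarray, delta: float = 0.01) -> int:
    """features: (num_samples, feature_dim) penultimate-layer activations."""
    # Singular values of the feature matrix, largest first.
    singular_values = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(singular_values) / singular_values.sum()
    # Smallest k whose top-k singular values capture a (1 - delta) fraction of the mass.
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

# Example: an almost rank-1 feature matrix yields an effective rank close to 1.
rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 1)) @ rng.normal(size=(1, 64)) + 1e-3 * rng.normal(size=(256, 64))
print(effective_rank(feats))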


URL: https://openreview.net/forum?id=HFfJWx60IT

---

Title: On Characterizing the Trade-off in Invariant Representation Learning

Authors: Bashir Sadeghi, Sepehr Dehdashtian, Vishnu Boddeti

Abstract: Many applications of representation learning, such as privacy preservation, algorithmic fairness, and domain adaptation, desire explicit control over semantic information being discarded. This goal is formulated as satisfying two objectives: maximizing utility for predicting a target attribute while simultaneously being invariant (independent) to a known semantic attribute. Solutions to invariant representation learning (IRepL) problems lead to a trade-off between utility and invariance when they are competing. While existing works study bounds on this trade-off, two questions remain outstanding: 1) What is the exact trade-off between utility and invariance? and 2) What are the encoders (mapping the data to a representation) that achieve the trade-off, and how can we estimate it from training data? This paper addresses these questions for IRepLs in reproducing kernel Hilbert spaces (RKHSs). Under the assumption that the distribution of a low-dimensional projection of high-dimensional data is approximately normal, we derive a closed-form solution for the global optima of the underlying optimization problem for encoders in RKHSs. This yields closed formulae for a near-optimal trade-off, corresponding optimal representation dimensionality, and the corresponding encoder(s). We also numerically quantify the trade-off on representative problems and compare them to those achieved by baseline IRepL algorithms.

URL: https://openreview.net/forum?id=3gfpBR1ncr

---

Title: Unsupervised Network Embedding Beyond Homophily

Authors: Zhiqiang Zhong, Guadalupe Gonzalez, Daniele Grattarola, Jun Pang

Abstract: Network embedding (NE) approaches have emerged as a predominant technique to represent complex networks and have benefited numerous tasks. However, most NE approaches rely on a homophily assumption to learn embeddings with the guidance of supervisory signals, leaving the unsupervised heterophilous scenario relatively unexplored. This problem becomes especially relevant in fields where a scarcity of labels exists. Here, we formulate the unsupervised NE task as an r-ego network discrimination problem and develop the SELENE framework for learning on networks with homophily and heterophily. Specifically, we design a dual-channel feature embedding pipeline to discriminate r-ego networks using node attributes and structural information separately. We employ heterophily adapted self-supervised learning objective functions to optimise the framework to learn intrinsic node embeddings. We show that SELENE's components improve the quality of node embeddings, facilitating the discrimination of connected heterophilous nodes. Comprehensive empirical evaluations on both synthetic and real-world datasets with varying homophily ratios validate the effectiveness of SELENE in homophilous and heterophilous settings, showing a clustering accuracy gain of up to 12.52%.

URL: https://openreview.net/forum?id=sRgvmXjrmg

---

Title: Unsupervised Learning of Neurosymbolic Encoders

Authors: Eric Zhan, Jennifer J. Sun, Ann Kennedy, Yisong Yue, Swarat Chaudhuri

Abstract: We present a framework for the unsupervised learning of neurosymbolic encoders, which are encoders obtained by composing neural networks with symbolic programs from a domain-specific language. Our framework naturally incorporates symbolic expert knowledge into the learning process, which leads to more interpretable and factorized latent representations compared to fully neural encoders. We integrate modern program synthesis techniques with the variational autoencoding (VAE) framework, in order to learn a neurosymbolic encoder in conjunction with a standard decoder. The programmatic descriptions from our encoders can benefit many analysis workflows, such as in behavior modeling where interpreting agent actions and movements is important. We evaluate our method on learning latent representations for real-world trajectory data from animal biology and sports analytics. We show that our approach offers significantly better separation of meaningful categories than standard VAEs and leads to practical gains on downstream analysis tasks, such as for behavior classification.


URL: https://openreview.net/forum?id=eWvBEMTlRq

---

Title: Sequentially learning the topological ordering of directed acyclic graphs with likelihood ratio scores

Authors: Gabriel Ruiz, OSCAR HERNAN MADRID PADILLA, Qing Zhou

Abstract: Causal discovery, the learning of causality in a data mining scenario, has been of strong scientific and theoretical interest as a starting point to identify "what causes what?'' Contingent on assumptions and a proper learning algorithm, it is sometimes possible to identify and accurately estimate an underlying directed acyclic graph (DAG), as opposed to a Markov equivalence class of graphs that gives ambiguity of causal directions. The focus of this paper is on highlighting the identifiability and estimation of DAGs through a sequential sorting procedure that orders variables one at a time, starting at root nodes, followed by children of the root nodes, and so on until completion. We demonstrate a novel application of this general sequential approach to estimate the topological ordering of the DAG corresponding to a linear structural equation model with a non-Gaussian error distribution family. At each step of the procedure, only simple likelihood ratio scores are calculated on regression residuals to decide the next node to append to the current partial ordering. The computational complexity of our algorithm on a $p$-node problem is $\mathcal{O}(pd)$, where $d$ is the maximum neighborhood size. Under mild assumptions, the population version of our procedure provably identifies a true ordering of the underlying DAG. We provide extensive numerical evidence to demonstrate that this sequential procedure scales to possibly thousands of nodes and works well for high-dimensional data. We accompany these numerical experiments with an application to a single-cell gene expression dataset. Our $\texttt{R}$ package with examples and installation instructions can be found at https://gabriel-ruiz.github.io/scorelingam/.
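
A hedged sketch of the sequential sorting loop described above: regress each remaining variable on the nodes already ordered, score its residuals, and append the best-scoring node. The excess-kurtosis score below is only a stand-in for the paper's likelihood ratio scores, used here to keep the illustration self-contained:

import numpy as np
from scipy.stats import kurtosis

def residuals(y, X):
    # Least-squares residuals of y after regressing on the columns of X.
    if X.shape[1] == 0:
        return y - y.mean()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def sequential_ordering(data):
    """data: (n_samples, p) matrix; returns a candidate topological ordering."""
    p = data.shape[1]
    ordering, remaining = [], list(range(p))
    while remaining:
        X = data[:, ordering]
        # Score each remaining node by a non-Gaussianity proxy of its residuals
        # (a stand-in for the paper's likelihood ratio scores).
        scores = {j: abs(kurtosis(residuals(data[:, j], X))) for j in remaining}
        nxt = max(scores, key=scores.get)
        ordering.append(nxt)
        remaining.remove(nxt)
    return ordering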

URL: https://openreview.net/forum?id=4pCjIGIjrt

---

Title: A Snapshot of the Frontiers of Client Selection in Federated Learning

Authors: Gergely Dániel Németh, Miguel Angel Lozano, Novi Quadrianto, Nuria M Oliver

Abstract: Federated learning (FL) has been proposed as a privacy-preserving approach in distributed machine learning. A federated learning architecture consists of a central server and a number of clients that have access to private, potentially sensitive data. Clients are able to keep their data in their local machines and only share their locally trained model's parameters with a central server that manages the collaborative learning process. FL has delivered promising results in real-life scenarios, such as healthcare, energy and finance. However, when the number of participating clients is large, the overhead of managing the clients slows down the learning. Thus, client selection has been introduced as a strategy to limit the number of communicating parties at every step of the process. Since the early naïve random selection of clients, several client selection methods have been proposed in the literature. Unfortunately, given that this is an emergent field, there is a lack of a taxonomy of client selection methods, making it hard to compare approaches. In this paper, we propose a taxonomy of client selection in Federated Learning that enables us to shed light on current progress in the field and identify potential areas of future research in this promising area of machine learning.

URL: https://openreview.net/forum?id=vwOKBldzFu

---

Title: Object-aware Cropping for Self-Supervised Learning

Authors: Shlok Kumar Mishra, Anshul Shah, Ankan Bansal, Janit K Anjaria, Abhyuday Narayan Jagannatha, Abhishek Sharma, David Jacobs, Dilip Krishnan

Abstract: A core component of the recent success of self-supervised learning is cropping data augmentation, which selects sub-regions of an image to be used as positive views in the self-supervised loss. The underlying assumption is that randomly cropped and resized regions of a given image share information about the objects of interest, which is captured by the learned representation. This assumption is mostly satisfied in datasets such as ImageNet where there is a large, centered object, which is highly likely to be present in random crops of the full image. However, in other datasets such as OpenImages or COCO, which are more representative of real world uncurated data, there are typically multiple small objects in an image. In this work, we show that self-supervised learning based on the usual random cropping performs poorly on such datasets (measured by the difference from fully-supervised learning). Instead of using pairs of random crops, we propose to leverage an unsupervised object proposal technique; the first view is a crop obtained from this algorithm, and the second view is a dilated version of the first view. This encourages the self-supervised model to learn both object and scene level semantic representations. Using this approach, which we call object-aware cropping, results in significant improvements over random scene cropping on classification and object detection benchmarks. For example, for pre-training on OpenImages, our approach achieves an improvement of 8.8% mAP over random scene cropping (both methods using MoCo-v2). We also show significant improvements on COCO and PASCAL-VOC object detection and segmentation tasks over the state-of-the-art self-supervised learning approaches. Our approach is efficient, simple and general, and can be used in most existing contrastive and non-contrastive self-supervised learning frameworks.
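
A hedged sketch of the two-view construction described above, where the second view dilates the proposal box before cropping; the dilation factor and the source of the proposal box are assumptions made for illustration:

from PIL import Image

def dilate_box(box, scale, width, height):
    """box = (left, top, right, bottom); enlarge it by `scale` about its centre."""
    left, top, right, bottom = box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w, half_h = (right - left) / 2 * scale, (bottom - top) / 2 * scale
    return (int(max(0, cx - half_w)), int(max(0, cy - half_h)),
            int(min(width, cx + half_w)), int(min(height, cy + half_h)))

def object_aware_views(image, proposal_box, dilation=1.5):
    # First view: object-centric crop from an (unsupervised) object proposal.
    view1 = image.crop(proposal_box)
    # Second view: the same box dilated to include surrounding scene context.
    view2 = image.crop(dilate_box(proposal_box, dilation, image.width, image.height))
    return view1, view2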

URL: https://openreview.net/forum?id=WXgJN7A69g

---

Title: Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

Authors: Thomas George, Guillaume Lajoie, Aristide Baratin

Abstract: Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called 'lazy' training regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in feature learning mode, resulting in faster training compared to more difficult ones. In other words, the non-linear dynamics tends to sequentialize the learning of examples of increasing difficulty. We illustrate this phenomenon across different ways to quantify example difficulty, including c-score, label noise, and in the presence of easy-to-learn spurious correlations. Our results reveal a new understanding of how deep networks prioritize resources across example difficulty.

URL: https://openreview.net/forum?id=lukVf4VrfP

---

Title: Fourier Sensitivity and Regularization of Computer Vision Models

Authors: Kiran Krishnamachari, See-Kiong Ng, Chuan-Sheng Foo

Abstract: Recent work has empirically shown that deep neural networks latch on to the Fourier statistics of training data and show increased sensitivity to Fourier-basis directions in the input. Understanding and modifying this Fourier-sensitivity of computer vision models may help improve their robustness; hence, in this paper we study the frequency sensitivity characteristics of deep neural networks using a principled approach. We first propose a $\textbf{\textit{basis trick}}$, proving that unitary transformations of the input-gradient of a function can be used to compute its gradient in the basis induced by the transformation. Using this result, we propose a general measure of any differentiable computer vision model's $\textit{\textbf{Fourier-sensitivity}}$ using the unitary Fourier-transform of its input-gradient. When applied to deep neural networks, we find that computer vision models are consistently sensitive to particular frequencies dependent on the dataset, training method and architecture. Based on this measure, we further propose a $\textit{\textbf{Fourier-regularization}}$ framework to modify the Fourier-sensitivities and frequency bias of models. Using our proposed regularizer-family, we demonstrate that deep neural networks obtain improved classification accuracy on robustness evaluations.
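
A hedged sketch of a Fourier-sensitivity measurement in the spirit of the abstract: take the input-gradient of the loss and look at its magnitude under a unitary 2D Fourier transform. Averaging over a batch and using the cross-entropy loss are illustrative assumptions, not necessarily the paper's exact protocol:

import torch
import torch.nn.functional as F

def fourier_sensitivity(model, images, labels):
    """images: (B, C, H, W); returns an (H, W) map of mean |FFT(input-gradient)|."""
    images = images.clone().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    (grad,) = torch.autograd.grad(loss, images)
    # Unitary ("ortho") 2D FFT of the input-gradient, shifted so low frequencies are centred.
    spectrum = torch.fft.fftshift(torch.fft.fft2(grad, norm="ortho"), dim=(-2, -1))
    return spectrum.abs().mean(dim=(0, 1))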

URL: https://openreview.net/forum?id=VmTYgjYloM

---

Title: Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter

Authors: Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden

Abstract: This paper presents an open and comprehensive framework to systematically evaluate state-of-the-art contributions to self-supervised monocular depth estimation. This includes pretraining, backbone, architectural design choices and loss functions. Many papers in this field claim novelty in either architecture design or loss formulation. However, simply updating the backbone of historical systems results in relative improvements of 25%, allowing them to outperform most modern systems. A systematic evaluation of papers in this field was not straightforward. The need to compare like-with-like in previous papers means that longstanding errors in the evaluation protocol are ubiquitous in the field. It is likely that many papers were not only optimized for particular datasets, but also for errors in the data and evaluation criteria. To aid future research in this area, we release a modular codebase (https://github.com/jspenmar/monodepth_benchmark), allowing for easy evaluation of alternate design decisions against corrected data and evaluation criteria. We re-implement, validate and re-evaluate 16 state-of-the-art contributions and introduce a new dataset (SYNS-Patches) containing dense outdoor depth maps in a variety of both natural and urban scenes. This allows for the computation of informative metrics in complex regions such as depth boundaries.

URL: https://openreview.net/forum?id=GFK1FheE7F

---

Title: MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth

Authors: Chenjie Cao, Xinlin Ren, Yanwei Fu

Abstract: Feature representation learning is the key recipe for learning-based Multi-View Stereo (MVS). As the common feature extractor of learning-based MVS, vanilla Feature Pyramid Networks (FPNs) produce poor feature representations for reflective and texture-less areas, which limits the generalization of MVS. Even FPNs paired with pre-trained Convolutional Neural Networks (CNNs) fail to tackle these issues. On the other hand, Vision Transformers (ViTs) have achieved prominent success in many 2D vision tasks. Thus, we ask: can ViTs facilitate feature learning in MVS? In this paper, we propose a pre-trained ViT enhanced MVS network called MVSFormer, which learns more reliable feature representations by benefiting from the informative priors of ViTs. The finetuned MVSFormer with hierarchical ViTs and efficient attention mechanisms achieves prominent improvements on top of FPNs. We further propose an alternative MVSFormer with frozen ViT weights, which largely reduces the training cost while retaining competitive performance, strengthened by the attention maps from self-distillation pre-training. MVSFormer can be generalized to various input resolutions with efficient multi-scale training strengthened by gradient accumulation. Moreover, we discuss the merits and drawbacks of classification and regression-based MVS methods, and further propose to unify them with a temperature-based strategy. MVSFormer achieves state-of-the-art performance on the DTU dataset. In particular, MVSFormer ranks Top-1 on both the intermediate and advanced sets of the highly competitive Tanks-and-Temples leaderboard. Code and models are released at https://github.com/ewrfcas/MVSFormer.

URL: https://openreview.net/forum?id=2VWR6JfwNo

---

Title: Controllable Generative Modeling via Causal Reasoning

Authors: Joey Bose, Ricardo Pio Monti, Aditya Grover

Abstract: Deep latent variable generative models excel at generating complex, high-dimensional data, often exhibiting impressive generalization beyond the training distribution. However, many such models in use today are black-boxes trained on large unlabelled datasets with statistical objectives and lack an interpretable understanding of the latent space required for controlling the generative process.
We propose CAGE, a framework for controllable generation in latent variable models based on causal reasoning.
Given a pair of attributes, CAGE infers the implicit cause-effect relationships between these attributes as induced by a deep generative model. This is achieved by defining and estimating a novel notion of unit-level causal effects in the latent space of the generative model.
Thereafter, we use the inferred cause-effect relationships to design a novel strategy for controllable generation based on counterfactual sampling. Through a series of large-scale synthetic and human evaluations, we demonstrate that generating counterfactual samples which respect the underlying causal relationships inferred via CAGE leads to subjectively more realistic images.

URL: https://openreview.net/forum?id=Z44YAcLaGw

---


New submissions
===============


Title: Enhancing image captioning with depth information using a Transformer-based framework

Abstract: Captioning images is a challenging scene-understanding task that connects computer vision and natural language processing. Although image captioning models have been capable of producing excellent descriptions, a significant advancement in this area focuses primarily on generating a single sentence for 2D images. In this paper, we investigate whether combining depth information with RGB images for the captioning task can help to generate better captions. For this purpose, we propose a Transformer-based encoder-decoder framework for generating a multi-sentence description of a 3D scene. The RGB image and its corresponding depth map are provided as inputs to our framework, which combines them to produce a better understanding of the input scene. We explore different fusion approaches to fuse RGB and depth images. We first study the NYU-v2 dataset and find inconsistent labeling that prevents the depth information from benefiting the captioning task; the results were even worse than when using RGB images only. As a result, we propose a cleaned version of the NYU-v2 dataset that is more consistent and informative. Extensive experiments on the cleaned dataset show that the proposed framework can effectively benefit from depth information and generate better captions. Code, pre-trained models, and the cleaned version of the NYU-v2 dataset will be made publicly available.

URL: https://openreview.net/forum?id=PtrK8Aoe2M

---

Title: Trip-ROMA: Self-Supervised Learning with Triplets and Random Mappings

Abstract: Contrastive self-supervised learning (SSL) methods, such as MoCo and SimCLR, have achieved great success in unsupervised visual representation learning. They rely on a large number of negative pairs and thus require either large memory banks or large batches. Some recent non-contrastive SSL methods, such as BYOL and SimSiam, attempt to discard negative pairs and have also shown remarkable performance. To avoid collapsed solutions caused by not using negative pairs, these methods require sophisticated asymmetry designs. However, in small data regimes, we cannot obtain a sufficient number of negative pairs or effectively avoid the over-fitting problem when negatives are not used at all. To address this situation, we argue that negative pairs are still important but one is generally sufficient for each positive pair. We show that a simple Triplet-based loss (Trip) can achieve surprisingly good performance without requiring large batches or asymmetry designs. Moreover, to alleviate the over-fitting problem in small data regimes and further enhance the effect of Trip, we propose a simple plug-and-play RandOm MApping (ROMA) strategy by randomly mapping samples into other spaces and requiring these randomly projected samples to satisfy the same relationship indicated by the triplets. Integrating the triplet-based loss with random mapping, we obtain the proposed method Trip-ROMA. Extensive experiments, including unsupervised representation learning and unsupervised few-shot learning, have been conducted on ImageNet-1K and seven small datasets. They successfully demonstrate the effectiveness of Trip-ROMA and consistently show that ROMA can further effectively boost other SSL methods.
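
A hedged sketch of the two ingredients described above: a triplet loss with a single negative per positive pair, plus a random mapping that re-projects the embeddings and asks the same triplet relation to hold in the projected space. The margin, projection dimension, and Gaussian projection are assumptions made for illustration:

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

def trip_roma_loss(anchor, positive, negative, proj_dim=128, margin=0.5):
    loss = triplet_loss(anchor, positive, negative, margin)
    # RandOm MApping: project all three embeddings with the same random matrix
    # and require the triplet relation to hold in the projected space as well.
    R = torch.randn(anchor.shape[1], proj_dim, device=anchor.device) / proj_dim ** 0.5
    return loss + triplet_loss(anchor @ R, positive @ R, negative @ R, margin)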

URL: https://openreview.net/forum?id=MR4glug5GU

---

Title: How Robust is Your Fairness? Evaluating and Sustaining Fairness under Unseen Distribution Shifts

Abstract: Increasing concerns have been raised on deep learning fairness in recent years. Existing fairness-aware machine learning methods mainly focus on the fairness of in-distribution data. However, in real-world applications, it is common to have distribution shift between the training and test data. In this paper, we first show that the fairness achieved by existing methods can be easily broken by slight distribution shifts. To solve this problem, we propose a novel fairness learning method termed CUrvature MAtching (CUMA), which can achieve robust fairness generalizable to unseen domains with unknown distributional shifts. Specifically, CUMA enforces the model to have similar generalization ability on the majority and minority groups, by matching the loss curvature distributions of the two groups. We evaluate our method on three popular fairness datasets. Compared with existing methods, CUMA achieves superior fairness under unseen distribution shifts, without sacrificing either the overall accuracy or the in-distribution fairness.

URL: https://openreview.net/forum?id=11pGlecTz2

---

Title: A Separation Law of Membership Privacy between One- and Two-Layer Networks

Abstract: We study the problem of identifying whether a target sample is included in the training procedure of neural networks (i.e. member vs. non-member). This problem is known as membership inference, and it raises concerns about the security and privacy of machine learning. In this work, we prove a separation law of membership privacy between one- and two-layer networks: the latter provably preserves less membership privacy against confidence-based attacks than the former. We also prove the phenomenon of confidence collapse in two-layer networks, which refers to the phenomenon that the samples of the same class have exactly the same confidence score. Our results are two-fold: a) gradient methods on two-layer ReLU networks converge to a confidence-collapsed solution, such that the attacker can classify members and non-members with perfect precision and recall; b) under the same assumptions as in a), there exists a training dataset such that the confidence collapse phenomenon does not occur and the attacker fails to classify all members and non-members correctly.
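
For reference, a minimal confidence-based membership inference attack of the kind analyzed above predicts "member" whenever the model's confidence on the true label exceeds a threshold; the threshold value is an assumption for illustration:

import torch
import torch.nn.functional as F

@torch.no_grad()
def confidence_attack(model, x, y, threshold=0.9):
    # Predicted probability assigned to the true label of each sample.
    probs = F.softmax(model(x), dim=1)
    confidence = probs.gather(1, y.unsqueeze(1)).squeeze(1)
    return confidence >= threshold  # True -> predicted "member"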

URL: https://openreview.net/forum?id=JnyraW4kBt

---

Title: Group Fairness in Reinforcement Learning

Abstract: We pose and study the problem of satisfying fairness in the online Reinforcement Learning (RL) setting. We focus on the group notions of fairness, according to which agents belonging to different groups should have similar performance based on some given measure. We consider the setting of maximizing return in an unknown environment (unknown transition and reward function) and show that it is possible to have RL algorithms that learn the best fair policies without violating the fairness requirements at any point in time during the learning process. In the tabular finite-horizon episodic setting, we provide an algorithm that combines the principle of optimism and pessimism under uncertainty to achieve zero fairness violation with arbitrarily high probability while also maintaining sub-linear regret guarantees. For the high-dimensional Deep-RL setting, we present algorithms based on the performance-difference style approximate policy improvement update step and we report encouraging empirical results on various traditional RL-inspired benchmarks showing that our algorithms display the desired behavior of learning the optimal policy while performing a fair learning process.

URL: https://openreview.net/forum?id=JkIH4MeOc3

---

Title: Graph Neural Networks Designed for Different Graph Types: A Survey

Abstract: Graphs are ubiquitous in nature and can therefore serve as models for many practical but also theoretical problems. For this purpose, they can be defined as many different types that suitably reflect the individual contexts of the represented problem. To address cutting-edge problems based on graph data, the research field of Graph Neural Networks (GNNs) has emerged. Despite the field’s youth and the speed at which new models are developed, many recent surveys have been published to keep track of them. Nevertheless, no survey has yet gathered which GNNs can process which kinds of graph types. In this survey, we give a detailed overview of already existing GNNs and, unlike previous surveys, categorize them according to their ability to handle different graph types and properties. We consider GNNs operating on static and dynamic graphs of different structural constitutions, with or without node or edge attributes. Moreover, we distinguish between GNN models for discrete-time or continuous-time dynamic graphs and group the models according to their architecture. We find that there are still graph types that are not or only rarely covered by existing GNN models. We point out where models are missing and give potential reasons for their absence.

URL: https://openreview.net/forum?id=h4BYtZ79uy

---

Title: A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

Abstract: We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We define a new metric under this lens that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective enables us to provide new theoretical results, including value-function bounds and low-distortion finite-dimensional Euclidean embeddings, which are crucial when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.

URL: https://openreview.net/forum?id=nHfPXl1ly7

---

Title: HighMMT: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning

Abstract: Many real-world problems are inherently multimodal, from the communicative modalities humans use to express social and emotional states such as spoken language, gestures, and paralinguistics to the force, proprioception, and visual sensors ubiquitous on robots. While there has been an explosion of interest in multimodal representation learning, these methods are still largely focused on a small set of modalities, primarily in the language, vision, and audio space. In order to accelerate generalization towards diverse and understudied modalities, this paper studies efficient representation learning for high-modality scenarios involving a large set of diverse modalities. Since adding new models for every new modality or task becomes prohibitively expensive, a critical technical challenge is heterogeneity quantification: how can we measure which modalities encode similar information and interactions in order to permit parameter sharing with previous modalities? This paper proposes two new information theoretic metrics for heterogeneity quantification: (1) modality heterogeneity studies how similar $2$ modalities $\{X_1,X_2\}$ are by measuring how much information can be transferred from $X_1$ to $X_2$, while (2) interaction heterogeneity studies how similarly pairs of modalities $\{X_1,X_2\}, \{X_3,X_4\}$ interact by measuring how much interaction information can be transferred from $\{X_1,X_2\}$ to $\{X_3,X_4\}$. We show the importance of these $2$ proposed metrics in high-modality scenarios as a way to automatically prioritize the fusion of modalities that contain unique information or unique interactions. The result is a single model, HighMMT, that scales up to $10$ modalities (text, image, audio, video, sensors, proprioception, speech, time-series, sets, and tables) and $15$ tasks from $5$ different research areas. Not only does HighMMT outperform prior methods on the tradeoff between performance and efficiency, it also demonstrates a crucial scaling behavior: performance continues to improve with each modality added, and it transfers to entirely new modalities and tasks during fine-tuning. We release our code and benchmarks, which we hope will present a unified platform for subsequent theoretical and empirical analysis.

URL: https://openreview.net/forum?id=ttzypy3kT7

---

Title: Attentional-Biased Stochastic Gradient Descent

Abstract: In this paper, we present a simple yet effective systematic method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning. Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch. The individual-level weight of a sampled data point is proportional to the exponential of a scaled loss value of that point, where the scaling factor is interpreted as the regularization parameter in the framework of distributionally robust optimization (DRO). Depending on whether the scaling factor is positive or negative, ABSGD is guaranteed to converge to a stationary point of an information-regularized min-max or min-min DRO problem, respectively. Compared with existing class-level weighting schemes, our method can capture the diversity between individual examples within each class. Compared with existing individual-level weighting methods using meta-learning that require three backward propagations for computing mini-batch stochastic gradients, our method is more efficient with only one backward propagation at each iteration as in standard deep learning methods. ABSGD is flexible enough to combine with other robust losses without any additional cost. Our empirical studies on several benchmark datasets demonstrate the effectiveness of the proposed method.
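
A hedged sketch of the weighting rule described above: each sample in the mini-batch is weighted proportionally to the exponential of its scaled loss, and the weighted loss is minimized with ordinary momentum SGD. The value of the scaling factor and the batch-level normalization are illustrative choices:

import torch
import torch.nn.functional as F

def absgd_step(model, optimizer, images, labels, lam=1.0):
    per_sample_loss = F.cross_entropy(model(images), labels, reduction="none")
    # Importance weights: exponential of the scaled per-sample loss,
    # normalized over the mini-batch (softmax of loss / lam).
    weights = torch.softmax(per_sample_loss.detach() / lam, dim=0)
    loss = (weights * per_sample_loss).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example optimizer, as in standard momentum SGD:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)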

URL: https://openreview.net/forum?id=B0WYWvVA2r

---

Title: Image Compression with Product Quantized Masked Image Modeling

Abstract: Recent neural compression methods have been based on the popular hyperprior framework. It relies on Scalar Quantization and offers very strong compression performance. This contrasts with recent advances in image generation and representation learning, where Vector Quantization is more commonly employed.
In this work, we attempt to bring these lines of research closer by revisiting vector quantization for image compression. We build upon the VQ-VAE framework and introduce several modifications. First, we replace the vanilla vector quantizer with a product quantizer. This intermediate solution between vector and scalar quantization allows for a much wider set of rate-distortion points: it implicitly defines high-quality quantizers that would otherwise require intractably large codebooks. Second, inspired by the success of Masked Image Modeling (MIM) in the context of self-supervised learning and generative image models, we propose a novel conditional entropy model which improves entropy coding by modelling the co-dependencies of the quantized latent codes. The resulting PQ-MIM model is surprisingly effective: its compression performance is on par with recent hyperprior methods. It also outperforms HiFiC in terms of FID and KID metrics when optimized with perceptual losses (e.g. adversarial). Finally, since PQ-MIM is compatible with image generation frameworks, we show qualitatively that it can operate under a hybrid mode between compression and generation, with no further training or finetuning. As a result, we explore the extreme compression regime where an image is compressed into 200 bytes, i.e., less than a tweet.
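
A hedged sketch of the product quantization step mentioned above: split each latent vector into sub-vectors and quantize each one against its own small codebook. The nearest-neighbour assignment and codebook shapes are generic product-quantization choices, not the paper's exact configuration:

import torch

def product_quantize(latents, codebooks):
    """latents: (B, D); codebooks: list of m tensors of shape (K, D // m)."""
    chunks = latents.chunk(len(codebooks), dim=1)
    quantized, codes = [], []
    for chunk, codebook in zip(chunks, codebooks):
        # Nearest codeword for each sub-vector.
        idx = torch.cdist(chunk, codebook).argmin(dim=1)
        quantized.append(codebook[idx])
        codes.append(idx)
    # Reassembled quantized latent and the (B, m) integer codes to be entropy-coded.
    return torch.cat(quantized, dim=1), torch.stack(codes, dim=1)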

URL: https://openreview.net/forum?id=Z2L5d9ay4B

---

Title: Synthesizing a Progression of Subtasks for Block-Based Visual Programming Tasks

Abstract: Block-based visual programming environments play an increasingly important role in introducing computing concepts to K-12 students. In recent years, they have also gained popularity in neuro-symbolic AI, serving as a benchmark to evaluate general problem-solving and logical reasoning skills. The open-ended and conceptual nature of these visual programming tasks makes them challenging, both for state-of-the-art AI agents as well as for novice programmers. A natural approach to providing assistance for problem-solving is breaking down a complex task into a progression of simpler subtasks; however, this is not trivial given that the solution codes are typically nested and have non-linear execution behavior. In this paper, we formalize the problem of synthesizing such a progression for a given reference block-based visual programming task. We propose a novel synthesis algorithm that generates a progression of subtasks that are high-quality, well-spaced in terms of their complexity, and solving this progression leads to solving the reference task. We show the utility of our synthesis algorithm in improving the efficacy of AI agents (in this case, neural program synthesizers and search-based agents) for solving tasks in the Karel programming environment. Then, we conduct a user study to demonstrate that our synthesized progression of subtasks can assist a novice programmer in solving tasks in the Hour of Code: Maze Challenge by Code.org.

URL: https://openreview.net/forum?id=PekuQXzSo0

---

Title: Label Noise-Robust Learning using a Confidence-Based Sieving Strategy

Abstract: In learning tasks with label noise, boosting model robustness against overfitting is a pivotal challenge because the model eventually memorizes labels including the noisy ones. Identifying the samples with corrupted labels and preventing the model from learning them is a promising approach to address this challenge. Per-sample training loss is a previously studied metric that considers samples with small loss as clean samples on which the model should be trained. In this work, we first demonstrate the ineffectiveness of this small-loss trick. Then, we propose a novel discriminator metric called confidence error and a sieving strategy called CONFES to effectively differentiate between the clean and noisy samples. We experimentally illustrate the superior performance of our proposed approach compared to recent studies on various settings such as synthetic and real-world label noise. Moreover, we show CONFES can be combined with other approaches such as Co-teaching and DivideMix to further improve the model performance.

URL: https://openreview.net/forum?id=QptGQOnRwl

---

Title: Multivariable Causal Discovery for General Nonlinear Functions

Abstract: Today's methods for uncovering causal relationships from observational data either constrain functional assignments (linearity/additive noise assumptions) or the data generating process (e.g., non-i.i.d. assumptions). We assume non-i.i.d. data to develop a framework for causal discovery that works for general non-linear dependencies. We use nonlinear Independent Component Analysis (ICA) to infer the underlying sources from the observed variables. Unlike previous works, which use conditional independence tests, we rely on the Jacobian of the inference function to determine the causal relationships. In particular, we prove that, under strong identifiability, the inference function's Jacobian captures the sparsity structure of the causal graph; thus generalizing the classic LiNGAM method to the nonlinear case. Our approach avoids the cost of exponentially many independence tests and makes our method end-to-end differentiable. We demonstrate that the proposed method can infer the causal graph on multiple synthetic data sets, and in most scenarios outperforms previous work.
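
A hedged sketch of the Jacobian-based step described above: given a learned inference function mapping observations to estimated sources, compute its Jacobian on a batch of points and read a candidate support structure from the average entry magnitudes. The thresholding rule is an assumption made for illustration:

import torch

def jacobian_support(g, x_batch, threshold=1e-2):
    """g: callable mapping a (d,) observation to (d,) estimated sources."""
    d = x_batch.shape[1]
    mags = torch.zeros(d, d)
    for x in x_batch:
        J = torch.autograd.functional.jacobian(g, x)
        mags += J.abs()
    mags /= len(x_batch)
    # Entries with consistently large magnitude suggest an edge in the causal graph.
    return mags > threshold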

URL: https://openreview.net/forum?id=2Yo9xqR6Ab

---

Title: Deep Double Descent via Smooth Interpolation

Abstract: Overparameterized deep networks can interpolate noisy data while at the same time showing good generalization performance. Common intuition from polynomial regression suggests that large networks are able to sharply interpolate noisy data without considerably deviating from the ground-truth signal. At present, a precise characterization of this phenomenon for deep networks is missing. In this work, we present an empirical study of input-space smoothness of the loss landscape of deep networks over volumes around cleanly- and noisily-labeled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape, where noisy targets are predicted over large volumes around training data points, in contrast to existing intuition.
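
A hedged sketch of the kind of input-space smoothness probe described above: sample random perturbations in a small ball around a training image and record how much the loss varies over that volume. The radius, sample count, and use of the loss standard deviation are illustrative choices, not the paper's exact measure:

import torch
import torch.nn.functional as F

@torch.no_grad()
def input_space_sharpness(model, image, label, radius=0.05, n_samples=32):
    """image: (C, H, W), label: scalar tensor; returns loss variability around the image."""
    noise = torch.randn(n_samples, *image.shape) * radius
    batch = image.unsqueeze(0) + noise
    losses = F.cross_entropy(model(batch), label.repeat(n_samples), reduction="none")
    return losses.std().item()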


URL: https://openreview.net/forum?id=fempQstMbV

---
