Daily TMLR digest for Jul 21, 2022

TMLR

Jul 20, 2022, 8:00:09 PM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Deformation Robust Roto-Scale-Translation Equivariant CNNs

Authors: Liyao Gao, Guang Lin, Wei Zhu

Abstract: Incorporating group symmetry directly into the learning process has proved to be an effective guideline for model design. By producing features that are guaranteed to transform covariantly to the group actions on the inputs, group-equivariant convolutional neural networks (G-CNNs) achieve significantly improved generalization performance in learning tasks with intrinsic symmetry. General theory and practical implementation of G-CNNs have been studied for planar images under either rotation or scaling transformation, but only individually. We present, in this paper, a roto-scale-translation equivariant CNN ($\mathcal{RST}$-CNN) that is guaranteed to achieve equivariance jointly over these three groups via coupled group convolutions. Moreover, as symmetry transformations in reality are rarely perfect and typically subject to input deformation, we provide a stability analysis of the equivariance of representation to input distortion, which motivates the truncated expansion of the convolutional filters under (pre-fixed) low-frequency spatial modes. The resulting model provably achieves deformation-robust $\mathcal{RST}$ equivariance, i.e., the $\mathcal{RST}$ symmetry is still "approximately" preserved when the transformation is "contaminated" by a nuisance data deformation, a property that is especially important for out-of-distribution generalization. Numerical experiments on MNIST, Fashion-MNIST, and STL-10 demonstrate that the proposed model yields remarkable gains over prior art, especially in the small data regime where both rotation and scaling variations are present within the data.

URL: https://openreview.net/forum?id=yVkpxs77cD
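
The truncated low-frequency filter expansion mentioned in the abstract can be illustrated with a short sketch: each convolutional kernel is built as a learnable combination of a small, pre-fixed set of low-frequency spatial modes (here a truncated 2D Fourier-style cosine/sine basis), so only the expansion coefficients are trained. The basis choice, layer name, and sizes below are assumptions for illustration, not the authors' exact construction.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def low_freq_basis(kernel_size, max_freq=2):
    """Fixed basis of low-frequency cosine/sine modes on a kernel_size x kernel_size grid."""
    coords = torch.linspace(-1.0, 1.0, kernel_size)
    ys, xs = coords.view(-1, 1), coords.view(1, -1)
    modes = []
    for kx in range(max_freq + 1):
        for ky in range(max_freq + 1):
            phase = math.pi * (kx * xs + ky * ys)
            modes.append(torch.cos(phase))
            if kx > 0 or ky > 0:
                modes.append(torch.sin(phase))
    return torch.stack(modes)  # (num_modes, k, k), fixed low-frequency spatial modes

class LowFreqConv2d(nn.Module):
    """Convolution whose kernels are constrained to the span of the fixed low-frequency modes."""
    def __init__(self, in_ch, out_ch, kernel_size=7, max_freq=2):
        super().__init__()
        self.register_buffer("basis", low_freq_basis(kernel_size, max_freq))   # fixed, not trained
        self.coeff = nn.Parameter(0.1 * torch.randn(out_ch, in_ch, self.basis.shape[0]))  # trained coefficients

    def forward(self, x):
        # Synthesize kernels from the expansion coefficients, then run a standard convolution.
        weight = torch.einsum("oim,mhw->oihw", self.coeff, self.basis)
        return F.conv2d(x, weight, padding=weight.shape[-1] // 2)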

---

Title: On the link between conscious function and general intelligence in humans and machines

Authors: Arthur Juliani, Kai Arulkumaran, Shuntaro Sasai, Ryota Kanai

Abstract: In popular media, there is often a connection drawn between the advent of awareness in artificial agents and those same agents simultaneously achieving human or superhuman level intelligence. In this work, we explore the validity and potential application of this seemingly intuitive link between consciousness and intelligence. We do so by examining the cognitive abilities associated with three contemporary theories of conscious function: Global Workspace Theory (GWT), Information Generation Theory (IGT), and Attention Schema Theory (AST). We find that all three theories specifically relate conscious function to some aspect of domain-general intelligence in humans. With this insight, we turn to the field of Artificial Intelligence (AI) and find that, while still far from demonstrating general intelligence, many state-of-the-art deep learning methods have begun to incorporate key aspects of each of the three functional theories. Having identified this trend, we use the motivating example of mental time travel in humans to propose ways in which insights from each of the three theories may be combined into a single unified and implementable model. Given that it is made possible by cognitive abilities underlying each of the three functional theories, artificial agents capable of mental time travel would not only possess greater general intelligence than current approaches, but also be more consistent with our current understanding of the functional role of consciousness in humans, thus making it a promising near-term goal for AI research.

URL: https://openreview.net/forum?id=LTyqvLEv5b

---


New submissions
===============


Title: Efficient Gradient Flows in Sliced-Wasserstein Space

Abstract: Minimizing functionals in the space of probability distributions can be done with Wasserstein gradient flows. To solve them numerically, a possible approach is to rely on the Jordan–Kinderlehrer–Otto (JKO) scheme, which is analogous to the proximal scheme in Euclidean spaces. However, it requires solving a nested optimization problem at each iteration, and is known for its computational challenges, especially in high dimension. To alleviate it, very recent works propose to approximate the JKO scheme leveraging Brenier's theorem, and using gradients of Input Convex Neural Networks to parameterize the density (JKO-ICNN). However, this method comes with a high computational cost and stability issues. Instead, this work proposes to use gradient flows in the space of probability measures endowed with the sliced-Wasserstein (SW) distance. We argue that this method is more flexible than JKO-ICNN, since SW enjoys a closed-form differentiable approximation. Thus, the density at each step can be parameterized by any generative model, which alleviates the computational burden and makes it tractable in higher dimensions.

URL: https://openreview.net/forum?id=Au1LNKmRvh
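
As context for the abstract's closed-form differentiable SW approximation, the sketch below runs a minimal particle-based flow: random projections reduce the problem to 1D optimal transport, which is solved in closed form by sorting, and the particles are updated by plain gradient steps on the resulting differentiable loss. The particle parameterization, function names, and step sizes are illustrative assumptions, not the submission's algorithm.

import torch

def sliced_wasserstein2(x, y, n_proj=64):
    """Monte Carlo estimate of the squared sliced-Wasserstein-2 distance between two point clouds."""
    theta = torch.randn(n_proj, x.shape[1], device=x.device)
    theta = theta / theta.norm(dim=1, keepdim=True)      # random unit projection directions
    px = torch.sort(x @ theta.T, dim=0).values           # 1D optimal transport in closed form:
    py = torch.sort(y @ theta.T, dim=0).values           # project, sort, and match quantiles
    return ((px - py) ** 2).mean()

def sw_gradient_flow(target, n_particles=500, n_steps=200, lr=0.1):
    """Explicit Euler steps on particles minimizing SW^2 to samples from the target measure."""
    # Assumes the target sample has at least n_particles points.
    x = torch.randn(n_particles, target.shape[1], requires_grad=True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        batch = target[torch.randperm(target.shape[0])[:n_particles]]  # match point-cloud sizes
        loss = sliced_wasserstein2(x, batch)
        loss.backward()
        opt.step()
    return x.detach()

Replacing the particle cloud with samples from a generative model, and updating that model's parameters rather than the particles, gives the more flexible parameterization the abstract refers to.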

---

Title: Bounding generalization error with input compression: An empirical study with infinite-width networks

Abstract: Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on the availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles to reduce reliance on trial-and-error, along with other performance assessment advantages.
In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only a few training samples. These results are promising given that input compression is broadly applicable where MI can be estimated with confidence.

URL: https://openreview.net/forum?id=jbZEUtULft
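
For a rough sense of how an input-compression-style bound is evaluated once an MI estimate is available, the sketch below uses the common heuristic in which the effective hypothesis count of a finite-class bound is replaced by 2^I(X;T); the exact form and constants used in the submission may differ, so this is an assumption-laden illustration rather than the paper's bound.

import math

def input_compression_bound(mi_bits, n_train, delta=0.05):
    """Heuristic upper bound on the generalization gap from an estimate of I(X;T) in bits."""
    # 2**I(X;T) plays the role of the hypothesis count in a finite-class bound (assumed form).
    return math.sqrt((2 ** mi_bits + math.log(1.0 / delta)) / (2 * n_train))

# Example: with I(X;T) around 10 bits and 50,000 training samples, the predicted gap is ~0.10.
print(input_compression_bound(mi_bits=10.0, n_train=50_000))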

---

Title: Unimodal Likelihood Models for Ordinal Data

Abstract: Ordinal regression (OR) is the classification of ordinal data, in which the underlying target variable is categorical and considered to have a natural ordinal relation with respect to the explanatory variable. In this study, we suppose the unimodality of the conditional probability distributions as a natural ordinal relation of the ordinal data. Under this supposition, unimodal likelihood models are expected to be promising for improving generalization performance in OR tasks. We demonstrate that previous unimodal likelihood models have weak representation ability, and we therefore develop more representable unimodal models, including the most representable one. Our OR experiments show that the developed, more representable unimodal models yield better generalization performance on real-world ordinal data than previous unimodal models and popular statistical OR models with no unimodality guarantee.

URL: https://openreview.net/forum?id=1l0sClLiPc
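
For readers unfamiliar with unimodal likelihood models, the sketch below shows one well-known way to enforce unimodality over K ordered labels: place a Binomial(K-1, p(x)) distribution over the label index, with p(x) a sigmoid of a learned score. This is an illustrative parameterization with limited expressiveness, not the more representable models developed in the submission.

import torch
from torch.distributions import Binomial

def unimodal_ordinal_probs(score, num_classes):
    """Map a real-valued score per example to a unimodal distribution over the ordered classes."""
    p = torch.sigmoid(score)                               # (batch,) success probability
    k = torch.arange(num_classes, dtype=score.dtype)       # class indices 0..K-1
    dist = Binomial(total_count=num_classes - 1, probs=p.unsqueeze(-1))
    return dist.log_prob(k).exp()                          # (batch, K); each row is unimodal and sums to 1

# Example: a larger score shifts the single mode toward the higher ordinal categories.
print(unimodal_ordinal_probs(torch.tensor([-2.0, 0.0, 2.0]), num_classes=5))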

---
