# Weekly TMLR digest for Apr 17, 2022


### TMLR

Apr 16, 2022, 8:00:06 PM

New submissions
===============

Title: High Fidelity Visualization of What Your Self-Supervised Representation Knows About

Abstract: Discovering what is learned by neural networks remains a challenge. In self-supervised learning, classification is the most common task used to evaluate how good a representation is. However, relying only on such a downstream task can limit our understanding of how much information is retained in the representation of a given input. In this work, we showcase the use of a conditional diffusion-based generative model (RCDM) to visualize representations learned with self-supervised models. We further demonstrate that this model's generation quality is on par with state-of-the-art generative models while being faithful to the representation used as conditioning. By using this new tool to analyze self-supervised models, we show visually that (i) SSL (backbone) representations are not truly invariant to many of the data augmentations they were trained on; (ii) SSL projector embeddings appear too invariant for tasks like classification; (iii) SSL representations are more robust to small adversarial perturbations of their inputs; and (iv) there is an inherent structure learned by SSL models that can be used for image manipulation.

---

Title: The Impact of Reinitialization on Generalization in Convolutional Neural Networks

Abstract: We study the impact of different reinitialization methods in several convolutional architectures for small-size image classification datasets. We analyze the potential gains of reinitialization and highlight its limitations. We also propose a new layerwise reinitialization algorithm that outperforms previous methods and suggest explanations for the observed improvement in generalization. First, we show that layerwise reinitialization increases the margin on the training examples without increasing the norm of the weights, hence leading to an improvement in margin-based generalization bounds for neural networks. Second, we demonstrate that it settles in flatter local minima of the loss surface. Third, it encourages learning general rules and discourages memorization by placing emphasis on the lower layers of the neural network.
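
The layerwise idea can be sketched in a few lines: keep the lower layers' learned weights and freshly reinitialize the upper ones, so that emphasis falls on what the lower layers have learned. This is a minimal NumPy illustration of that general recipe, not the authors' exact algorithm; the layer sizes and the `keep` split are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(fan_in, fan_out):
    # He-style initialization, a common default for ReLU networks.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# A toy 3-layer MLP represented as a list of weight matrices.
trained = [init_layer(8, 16), init_layer(16, 16), init_layer(16, 4)]

def reinitialize_upper_layers(layers, keep):
    # Keep the first `keep` (lower) layers as-is and freshly
    # reinitialize the remaining (upper) layers.
    return [w if i < keep else init_layer(*w.shape)
            for i, w in enumerate(layers)]

# ...train, then restart the upper layers and train again...
restarted = reinitialize_upper_layers(trained, keep=1)
```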

---

Title: Estimating Unbiased Averages of Sensitive Attributes without Handshakes among Agents

Abstract: We consider the problem of distributed averaging of sensitive attributes in a network of agents without central coordinators, where the graph of the network has an arbitrary degree sequence (degrees refer to the numbers of neighbors of vertices). Existing works usually solve this problem by assuming that either (i) the agents reveal their degrees to their neighbors or (ii) every two neighboring agents can perform handshakes (requests that rely on replies) in every exchange of information. However, the degrees suggest the profiles of the agents, and handshakes are impractical when agents are inactive. We propose an approach that solves the problem with privatized degrees and without handshakes, at the cost of stronger self-organization. In particular, we propose a simple gossip algorithm that computes averages biased by the variance of the degrees, together with a mechanism that corrects this bias. We suggest a use case of the proposed approach that allows for fitting a linear regression model in a distributed manner while privatizing the target values, the features, and the degrees. We provide theoretical guarantees that the mean squared error between an estimated regression parameter and the true regression parameter is $\mathcal{O}(\frac{1}{n})$, where $n$ is the number of agents. We show on synthetic graph datasets that the theoretical error is close to its empirical counterpart. We also show on synthetic and real graph datasets that the regression model fitted by our approach is close to the solution obtained when locally privatized values are averaged by central coordinators.
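
For context, the classical randomized gossip baseline this work builds on, which does require handshake-style pairwise exchanges, can be sketched as follows. The paper's contribution is to remove the handshakes and correct the degree-induced bias, which this toy version does not do; the ring topology and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Private attribute held by each of n agents; the goal is for every
# agent to learn the global mean without a central coordinator.
n = 8
values = rng.normal(5.0, 2.0, size=n)
true_mean = values.mean()

# Ring topology: agent i can only talk to neighbors (i - 1) and (i + 1).
x = values.copy()
for _ in range(50_000):
    i = int(rng.integers(n))
    j = (i + 1) % n if rng.random() < 0.5 else (i - 1) % n
    # Pairwise averaging preserves the global sum, so the fixed point
    # of repeated exchanges is the true mean at every agent.
    x[i] = x[j] = 0.5 * (x[i] + x[j])
```

Each update is exactly the kind of handshake the paper avoids: agents i and j must both participate and agree on the exchanged value.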

---

Title: Unsupervised Network Representation Learning and the Illusion of Progress

Abstract: A number of methods have been developed for network representation learning -- ranging from classical methods based on the graph spectra to recent random walk based methods, and from deep learning based methods to matrix factorization based methods. Each new study inevitably seeks to establish the relative superiority of the proposed method over others. The lack of a standard assessment protocol and benchmark suite often leaves practitioners wondering whether a new idea represents a significant scientific advance. In this work, we articulate a clear and pressing need to systematically and rigorously benchmark such methods. Our overall assessment -- the result of a careful benchmarking of 13 methods for unsupervised network representation learning on 16 datasets (several with different characteristics) -- is that many recently proposed improvements are somewhat of an illusion. Specifically, we find that several recent improvements are marginal at best and that aspects of many of these datasets often render such small differences insignificant, especially when viewed through a rigorous statistical lens. A more detailed analysis of our results identifies several new insights: first, classical methods, often dismissed or not considered by recent efforts, can compete on certain types of datasets if they are tuned appropriately; second, from a qualitative standpoint, a couple of recent methods based on matrix factorization offer a small but not always consistent advantage over alternative methods; third, no single method completely outperforms other embedding methods on both node classification and link prediction tasks. Finally, we present several drill-down analyses that reveal settings under which certain algorithms perform well (e.g., the role of neighborhood context and dataset properties that impact performance).
An important outcome of this study is the benchmark and evaluation protocol, which practitioners may find useful for future research in this area.

---

Title: Deformation Robust Roto-Scale-Translation Equivariant CNNs

Abstract: Incorporating group symmetry directly into the learning process has proved to be an effective guideline for model design. By producing features that are guaranteed to transform covariantly to the group actions on the inputs, group-equivariant convolutional neural networks (G-CNNs) achieve significantly improved generalization performance in learning tasks with intrinsic symmetry. General theory and practical implementation of G-CNNs have been studied for planar images under either rotation or scaling transformation, but only individually. We present, in this paper, a roto-scale-translation equivariant CNN ($\mathcal{RST}$-CNN), that is guaranteed to achieve equivariance jointly over these three groups via coupled group convolutions. Moreover, as symmetry transformations in reality are rarely perfect and typically subject to input deformation, we provide a stability analysis of the equivariance of representation to input distortion, which motivates the truncated expansion of the convolutional filters under (pre-fixed) low-frequency spatial modes. The resulting model provably achieves deformation-robust $\mathcal{RST}$ equivariance, i.e., the $\mathcal{RST}$ symmetry is still "approximately" preserved when the transformation is "contaminated" by a nuisance data deformation, a property that is especially important for out-of-distribution generalization. Numerical experiments on MNIST, Fashion-MNIST, and STL-10 demonstrate that the proposed model yields remarkable gains over prior art, especially in the small data regime where both rotation and scaling variations are present within the data.
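
The underlying group-equivariance idea can be illustrated for the simplest case, 90-degree rotations: correlating with all rotated copies of a filter and pooling over the group gives a response that is stable under rotating the input. This is a toy NumPy sketch of that principle only, not the paper's joint roto-scale-translation construction.

```python
import numpy as np

def correlate2d_valid(img, k):
    # Naive 'valid'-mode cross-correlation.
    H, W = img.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def group_response(img, k):
    # Max filter response over all positions and all four 90-degree
    # rotations of the filter. Because the set of rotated filters is
    # closed under rotation, this scalar is invariant to rotating img.
    return max(correlate2d_valid(img, np.rot90(k, r)).max() for r in range(4))

rng = np.random.default_rng(2)
img = rng.normal(size=(9, 9))
k = rng.normal(size=(3, 3))
```

Rotating the image merely permutes which (position, rotation) pair attains the maximum, so the pooled response is unchanged.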

---

Title: NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes

Abstract: We present NeSF, a method for producing 3D semantic fields from posed RGB images alone. In place of classical 3D representations, our method builds on recent work in neural fields, wherein 3D structure is captured by point-wise functions. We leverage this methodology to recover 3D density fields, upon which we then train a 3D semantic segmentation model supervised by posed 2D semantic maps. Despite being trained on 2D signals alone, our method is able to generate 3D-consistent semantic maps from novel camera poses and can be queried at arbitrary 3D points. Notably, NeSF is compatible with any method that produces a density field. Our empirical analysis demonstrates comparable quality to competitive 2D and 3D semantic segmentation baselines on complex, realistically rendered scenes and significantly outperforms a comparable neural radiance field-based method on a series of tasks requiring 3D reasoning. Our method is the first to learn semantics from the geometry stored within a 3D neural field representation. NeSF is trained using purely 2D signals, can be trained with as few as one labeled image per scene, and requires no RGB input for inference on novel scenes.

---

Title: Parameter Sharing For Heterogeneous Agents in Multi-Agent Reinforcement Learning

Abstract: Parameter sharing, where each agent independently learns a policy with fully shared parameters between all policies, is a popular baseline method for multi-agent deep reinforcement learning. Unfortunately, since all agents share the same policy network, they cannot learn different policies or tasks. This issue has been circumvented experimentally by adding an agent-specific indicator signal to observations, which we term "agent indication." Agent indication is limited, however, in that without modification it does not allow parameter sharing to be applied to environments where the action spaces and/or observation spaces are heterogeneous. This work formalizes the notion of agent indication and proves for the first time that it enables convergence to optimal policies. Next, we formally introduce methods to extend parameter sharing to learning in heterogeneous observation and action spaces, and prove that these methods allow for convergence to optimal policies. Finally, we experimentally confirm that the methods we introduce function empirically, and conduct a wide array of experiments studying the empirical efficacy of many different agent indication schemes for graphical observation spaces.
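
The agent indication trick itself is simple to sketch: concatenate a one-hot agent ID to the observation so that a single shared network can still express agent-specific behavior. A minimal NumPy illustration with arbitrary sizes and an untrained random weight matrix (a linear "policy" standing in for a full network):

```python
import numpy as np

rng = np.random.default_rng(3)

n_agents, obs_dim, n_actions = 3, 4, 2
# A single, fully shared set of policy parameters for all agents.
W = rng.normal(size=(obs_dim + n_agents, n_actions))

def policy_logits(obs, agent_id):
    # "Agent indication": append a one-hot agent ID to the observation
    # so that one shared network can express per-agent behavior.
    indicator = np.eye(n_agents)[agent_id]
    return np.concatenate([obs, indicator]) @ W

obs = rng.normal(size=obs_dim)
logits = [policy_logits(obs, a) for a in range(n_agents)]
```

Given the same observation, different agents produce different logits purely because of the indicator dimensions.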

---

Title: Generalization in Deep RL for TSP Problems via Equivariance and Local Search

Abstract: Deep reinforcement learning (RL) has proved to be a competitive heuristic for solving small-sized instances of traveling salesman problems (TSP), but its performance on larger-sized instances is insufficient. Since training on large instances is impractical, we design a novel deep RL approach with a focus on generalizability. Our proposition, consisting of a simple deep learning architecture trained with novel RL techniques, exploits two main ideas. First, we exploit equivariance to facilitate training. Second, we interleave efficient local search heuristics with the usual RL training to smooth the value landscape. To validate the whole approach, we empirically evaluate our proposition on random and realistic TSP problems against relevant state-of-the-art deep RL methods. Moreover, we present an ablation study to understand the contribution of each of its components.
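
2-opt is a standard example of the kind of efficient local search heuristic that can be interleaved with RL training for TSP; whether the paper uses exactly this move is not stated in the abstract. A minimal NumPy sketch on a random Euclidean instance:

```python
import numpy as np

rng = np.random.default_rng(4)
pts = rng.random((12, 2))          # random Euclidean TSP instance

def tour_length(tour):
    # Total length of the closed tour visiting pts[tour] in order.
    d = pts[tour] - pts[np.roll(tour, -1)]
    return np.hypot(d[:, 0], d[:, 1]).sum()

def two_opt(tour):
    # Classic 2-opt: reverse a segment whenever that shortens the
    # tour, until no improving move remains (a local optimum).
    tour = tour.copy()
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = np.concatenate([tour[:i], tour[i:j][::-1], tour[j:]])
                if tour_length(cand) < tour_length(tour) - 1e-12:
                    tour, improved = cand, True
    return tour

start = np.arange(len(pts))
better = two_opt(start)
```

The local optimum found this way can refine (or, during training, smooth the value estimates of) tours proposed by a learned policy.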

---

Title: Towards Backwards-Compatible Data with Confounded Domain Adaptation

Abstract: Most current domain adaptation methods address either covariate shift or label shift, but are not applicable where they occur simultaneously and are confounded with each other. Domain adaptation approaches which do account for such confounding are designed to adapt covariates to optimally predict a particular label whose shift is confounded with covariate shift. In this paper, we instead seek to achieve general-purpose data backwards compatibility. This would allow the adapted covariates to be used for a variety of downstream problems, including on pre-existing prediction models and on data analytics tasks. To do this we consider a special case of generalized label shift (GLS), which we call confounded shift. We present a novel framework for this problem, based on minimizing the expected divergence between the source and target conditional distributions, conditioning on possible confounders. Within this framework, we propose using the Gaussian reverse Kullback-Leibler divergence, demonstrating the use of parametric and nonparametric Gaussian estimators of the conditional distribution. We also propose using the Maximum Mean Discrepancy (MMD), introducing a dynamic strategy for choosing the kernel bandwidth, which is applicable even outside the confounded shift setting. Finally, we demonstrate our approach on synthetic and real datasets.
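The MMD between two samples under a Gaussian kernel can be sketched as below, using the common median-heuristic bandwidth; note this is a static choice, not the dynamic strategy proposed in the paper.

```python
import numpy as np

def mmd2(x, y, bandwidth=None):
    # Biased (V-statistic) estimate of squared MMD with a Gaussian kernel.
    z = np.concatenate([x, y])
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    if bandwidth is None:
        # Median heuristic: a common static bandwidth choice.
        bandwidth = np.sqrt(np.median(d2[d2 > 0]) / 2)
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    n = len(x)
    kxx, kyy, kxy = k[:n, :n], k[n:, n:], k[:n, n:]
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()

rng = np.random.default_rng(5)
a = rng.normal(0.0, 1.0, size=(200, 2))
b = rng.normal(0.0, 1.0, size=(200, 2))
c = rng.normal(3.0, 1.0, size=(200, 2))
```

Samples from the same distribution yield a near-zero MMD, while a mean shift produces a clearly positive one, which is what makes MMD usable as a divergence to minimize between source and target conditionals.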

---

Title: Why Emergent Communication is Repulsive

Abstract: With the success of deep reinforcement learning, there has been a resurgence of interest in situated emergent communication research. Properties of successful emergent communication have been identified, typically involving auxiliary losses that trade off diversity of message-action pairs, conditioned on observations, against consistency when the acquired reward is significant. In this work, we draw theoretical connections between these auxiliary losses and the probabilistic framework of repulsive point processes. We show that these auxiliary losses in fact promote repulsive point processes, and outline ways in which practitioners could utilise repulsive point processes directly. We hope this newfound connection between language and repulsive point processes offers new avenues of research for the situated language researcher or probabilistic modeller.
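
The repulsion idea can be illustrated with an L-ensemble determinantal point process (DPP), a canonical repulsive point process: the determinant of a similarity kernel scores spread-out configurations higher than clumped ones. A toy sketch with an illustrative bandwidth and hand-picked point sets:

```python
import numpy as np

def dpp_score(points, bandwidth=1.0):
    # In an L-ensemble DPP, a point set's probability is proportional
    # to det(L) for a similarity kernel L; the determinant shrinks as
    # points become similar, which is exactly "repulsive" behaviour.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    L = np.exp(-d2 / (2 * bandwidth ** 2))
    return np.linalg.det(L)

spread = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
clumped = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
```

A diversity-promoting auxiliary loss acts analogously: message-action pairs that collapse onto each other are penalised, just as near-duplicate points drive det(L) toward zero.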

---