Daily TMLR digest for Feb 21, 2023


TMLR

Feb 20, 2023, 7:00:10 PM
to tmlr-anno...@googlegroups.com


Accepted papers
===============


Title: Layerwise Bregman Representation Learning of Neural Networks with Applications to Knowledge Distillation

Authors: Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K Warmuth

Abstract: We propose a new method for layerwise representation learning of a trained neural network that conforms to the non-linearity of the layer's transfer function. In particular, we form a Bregman divergence based on the convex function induced by the layer's transfer function and construct an extension of the original Bregman PCA formulation by incorporating a mean vector and revising the normalization constraint on the principal directions. These modifications allow exporting the learned representation as a fixed layer with a non-linearity. As an application to knowledge distillation, we cast the learning problem for the student network as predicting the compression coefficients of the teacher's representations, which are then passed as input to the imported layer. Our empirical findings indicate that our approach is substantially more effective for transferring information between networks than typical teacher-student training that uses the teacher's soft labels.

URL: https://openreview.net/forum?id=6dsvH7pQHH
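
For illustration only, a minimal sketch of the kind of layerwise Bregman divergence the method builds on, assuming a sigmoid transfer function so that the inducing convex potential is the softplus; this instantiates the standard definition D_F(x, y) = F(x) - F(y) - <grad F(y), x - y> and is not the authors' implementation:

    import numpy as np

    def softplus(x):
        # Convex potential F with grad F = sigmoid; log(1 + exp(x)), numerically stable
        return np.logaddexp(0.0, x)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def bregman_divergence(x, y):
        # D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>, with F = sum of softplus
        return softplus(x).sum() - softplus(y).sum() - sigmoid(y) @ (x - y)

    x, y = np.random.randn(8), np.random.randn(8)
    print(bregman_divergence(x, y) >= 0)  # Bregman divergences are non-negative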

---

Title: Learn, Unlearn and Relearn: An Online Learning Paradigm for Deep Neural Networks

Authors: Vijaya Raghavan T Ramkumar, Elahe Arani, Bahram Zonooz

Abstract: Deep neural networks (DNNs) are often trained on the premise that the complete training data set is provided ahead of time. However, in real-world scenarios, data often arrive in chunks over time. This leads to important considerations about the optimal strategy for training DNNs, such as whether to fine-tune them with each chunk of incoming data (warm-start) or to retrain them from scratch with the entire corpus of data whenever a new chunk is available. While employing the latter for training can be resource-intensive, recent work has pointed out the lack of generalization in warm-start models. Therefore, to strike a balance between efficiency and generalization, we introduce "Learn, Unlearn, and Relearn (LURE)", an online learning paradigm for DNNs. LURE alternates between the unlearning phase, which selectively forgets the undesirable information in the model through weight reinitialization in a data-dependent manner, and the relearning phase, which emphasizes learning on generalizable features. We show that our training paradigm provides consistent performance gains across datasets in both classification and few-shot settings. We further show that it leads to more robust and well-calibrated models.

URL: https://openreview.net/forum?id=WN1O2MJDST
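
As a rough sketch of the unlearning phase (a stand-in only: the paper selects weights in a data-dependent manner, whereas this toy version simply resets the lowest-magnitude entries of one weight matrix):

    import numpy as np

    rng = np.random.default_rng(0)

    def unlearn(W, frac=0.2, scale=0.01):
        # Reinitialize the `frac` smallest-magnitude entries of W; a relearning
        # phase would then continue ordinary training on the next data chunk.
        k = int(frac * W.size)
        idx = np.argsort(np.abs(W), axis=None)[:k]
        W_new = W.copy().ravel()
        W_new[idx] = scale * rng.standard_normal(k)
        return W_new.reshape(W.shape)

    W = rng.standard_normal((64, 32))
    W = unlearn(W)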

---


New submissions
===============


Title: Black-Box Batch Active Learning for Regression

Abstract: Batch active learning is a popular approach for efficiently training machine learning models on large, initially unlabelled datasets by repeatedly acquiring labels for batches of data points. However, many recent batch active learning methods are white-box approaches limited to differentiable parametric models: they score unlabeled points using acquisition functions based on model embeddings or first- and second-order derivatives. In this paper, we propose black-box batch active learning for regression tasks as an extension of white-box approaches. This approach is compatible with a wide range of machine learning models, including regular and Bayesian deep learning models and non-differentiable models such as random forests. It is rooted in Bayesian principles and utilizes recent kernel-based approaches. Importantly, our method only relies on model predictions. This allows us to extend a wide range of existing state-of-the-art white-box batch active learning methods (BADGE, BAIT, LCMD) to black-box models. We demonstrate the effectiveness of our approach through extensive experimental evaluations on regression datasets, achieving surprisingly strong performance compared to white-box approaches for deep learning models.


URL: https://openreview.net/forum?id=fvEvDlKko6
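
A minimal sketch of the prediction-only idea, under assumptions of our own (an ensemble of black-box regressors, a covariance kernel over their pool predictions, and greedy variance-reduction selection); the acquisition rules in the paper differ:

    import numpy as np

    def prediction_kernel(preds):
        # preds: (n_models, n_pool) ensemble predictions; kernel = covariance over models
        centered = preds - preds.mean(axis=0, keepdims=True)
        return centered.T @ centered / preds.shape[0]

    def greedy_batch(K, batch_size, jitter=1e-6):
        # Greedily pick the pool point with the largest remaining predictive
        # variance, then condition the kernel on it (rank-one downdate).
        K = K.copy()
        chosen = []
        for _ in range(batch_size):
            i = int(np.argmax(np.diag(K)))
            chosen.append(i)
            k_i = K[:, i:i + 1]
            K -= k_i @ k_i.T / (K[i, i] + jitter)
        return chosen

    preds = np.random.randn(10, 500)  # e.g. 10 members of a random-forest ensemble
    print(greedy_batch(prediction_kernel(preds), batch_size=5))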

---

Title: Evaluation of Causal Inference Models to Assess Heterogeneous Treatment Effect

Abstract: Causal inference has gained popularity over the last years due to its ability to see through correlation and find causal relationships between covariates. A number of methods have been created to this end, but there is no systematic benchmark comparing them, including the benefits and drawbacks of using each one. This research compares several of those methods on how well they assess the heterogeneous treatment effect, using a variety of synthetically created data sets divided between low-dimensional and high-dimensional covariates and with increasing complexity of the relationship between the covariates and the target. We compare the error of those methods and discuss in which settings and under which premises each method is better suited.

URL: https://openreview.net/forum?id=75NtszAm76
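
As a hedged illustration of this kind of evaluation setup (synthetic data with a known heterogeneous effect, scored with the standard PEHE error); the estimator below is a plain T-learner with random forests, chosen only as an example and not taken from the paper:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n, d = 2000, 5
    X = rng.normal(size=(n, d))
    tau = 1.0 + 0.5 * X[:, 0]                      # true heterogeneous effect
    T = rng.binomial(1, 0.5, size=n)
    y = X[:, 1] + T * tau + rng.normal(scale=0.1, size=n)

    # T-learner: one outcome model per treatment arm, CATE = difference.
    m1 = RandomForestRegressor().fit(X[T == 1], y[T == 1])
    m0 = RandomForestRegressor().fit(X[T == 0], y[T == 0])
    tau_hat = m1.predict(X) - m0.predict(X)

    pehe = np.sqrt(np.mean((tau_hat - tau) ** 2))  # precision in estimating
    print(f"PEHE: {pehe:.3f}")                     # heterogeneous effects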

---

Title: Learning Online Data Association

Abstract: When an agent interacts with a complex environment, it receives a stream of percepts in which it may detect entities, such as objects or people. To build up a coherent, low-variance estimate of the underlying state, it is necessary to fuse information from multiple detections over time. To do this fusion, the agent must decide which detections to associate with one another. We address this data-association problem in the setting of an online filter, in which each observation is processed by aggregating it into an existing object hypothesis. Classic methods with strong probabilistic foundations exist, but they are computationally expensive and require models that can be difficult to acquire. In this work, we use the deep-learning tools of sparse attention and representation learning to learn a machine that processes a stream of detections and outputs a set of hypotheses about objects in the world. We evaluate this approach on simple clustering problems, problems with dynamics, and complex image-based domains. We find that it generalizes well from short to long observation sequences and from a few to many hypotheses, outperforming other learning approaches and classical non-learning methods.

URL: https://openreview.net/forum?id=iW9lN8EGLw
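
For intuition, a minimal nearest-neighbour stand-in for the online filtering loop (the paper instead learns the association step with sparse attention and representation learning):

    import numpy as np

    class OnlineAssociator:
        def __init__(self, new_hypothesis_dist=2.0):
            self.means, self.counts = [], []
            self.threshold = new_hypothesis_dist

        def update(self, detection):
            detection = np.asarray(detection, dtype=float)
            if self.means:
                dists = [np.linalg.norm(detection - m) for m in self.means]
                j = int(np.argmin(dists))
                if dists[j] < self.threshold:          # associate and fuse
                    self.counts[j] += 1
                    self.means[j] += (detection - self.means[j]) / self.counts[j]
                    return j
            self.means.append(detection)               # start a new hypothesis
            self.counts.append(1)
            return len(self.means) - 1

    tracker = OnlineAssociator()
    for det in np.random.randn(20, 2):
        tracker.update(det)
    print(len(tracker.means), "hypotheses")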

---

Title: Subgraph Permutation Equivariant Networks

Abstract: In this work we develop a new method, named Sub-graph Permutation Equivariant Networks (SPEN), which provides a framework for building graph neural networks that operate on sub-graphs with a permutation-equivariant base update function and that are equivariant to a novel choice of automorphism group. Message passing neural networks have been shown to be limited in their expressive power, and recent approaches to overcome this either lack scalability or require structural information to be encoded into the feature space. The general framework presented here overcomes the scalability issues associated with global permutation equivariance by operating more locally on sub-graphs. In addition, operating on sub-graphs improves expressive power beyond that of higher-dimensional global permutation equivariant networks; this is due to the fact that two non-distinguishable graphs often contain distinguishable sub-graphs. Furthermore, the proposed framework only requires a choice of $k$-hops for creating ego-network sub-graphs and a choice of representation space to be used for each layer, which makes the method easily applicable across a range of graph-based domains. We experimentally validate the method on a range of graph benchmark classification tasks, demonstrating statistically indistinguishable results from the state-of-the-art on six out of seven benchmarks. Further, we demonstrate that the use of local update functions offers a significant improvement in GPU memory over global methods.

URL: https://openreview.net/forum?id=3agxS3aDUs
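
A small sketch, assuming networkx for graph handling, of the k-hop ego-network sub-graph extraction the framework starts from; the permutation-equivariant layers that process these sub-graphs are not shown:

    import networkx as nx

    def ego_subgraphs(G, k=2):
        # One k-hop ego network per node; each is a sub-graph of G
        return {v: nx.ego_graph(G, v, radius=k) for v in G.nodes}

    G = nx.karate_club_graph()
    subgraphs = ego_subgraphs(G, k=2)
    print({v: sg.number_of_nodes() for v, sg in list(subgraphs.items())[:5]})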

---