Daily TMLR digest for Jun 02, 2024

TMLR

Jun 2, 2024, 12:00:07 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Enhancing Low-Precision Sampling via Stochastic Gradient Hamiltonian Monte Carlo

Authors: Ziyi Wang, Yujie Chen, Qifan Song, Ruqi Zhang

Abstract: Low-precision training has emerged as a promising low-cost technique to enhance the training efficiency of deep neural networks without sacrificing much accuracy. Its Bayesian counterpart can further provide uncertainty quantification and improved generalization accuracy.
This paper investigates low-precision sampling via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and full-precision gradient accumulators, for both strongly log-concave and non-log-concave distributions. Theoretically, our results show that, to achieve $\epsilon$-error in the 2-Wasserstein distance for non-log-concave distributions, low-precision SGHMC achieves a quadratic improvement ($\tilde{\mathcal{O}}\left({\epsilon^{-2}{\mu^*}^{-2}\log^2\left({\epsilon^{-1}}\right)}\right)$) over the state-of-the-art low-precision sampler, Stochastic Gradient Langevin Dynamics (SGLD) ($\tilde{\mathcal{O}}\left({{\epsilon}^{-4}{\lambda^{*}}^{-1}\log^5\left({\epsilon^{-1}}\right)}\right)$). Moreover, we prove that low-precision SGHMC is more robust to quantization error than low-precision SGLD, owing to the robustness of the momentum-based update with respect to gradient noise.
Empirically, we conduct experiments on synthetic data and on the MNIST, CIFAR-10, and CIFAR-100 datasets, which validate our theoretical findings. Our study highlights the potential of low-precision SGHMC as an efficient and accurate sampling method for large-scale and resource-limited machine learning.
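
As a rough illustration of the sampler studied above, here is a minimal Python sketch of SGHMC with a low-precision (stochastically rounded) weight accumulator and a full-precision momentum buffer, on a toy Gaussian target. The quantizer, hyperparameters, and function names are illustrative assumptions, not the authors' implementation:

import numpy as np

def quantize(x, bits=8, scale=4.0):
    # Assumed quantizer: stochastic rounding onto a fixed-point grid.
    step = 2 * scale / (2 ** bits - 1)
    lower = np.floor(x / step) * step
    p_up = (x - lower) / step          # probability of rounding up
    x_q = lower + step * (np.random.rand(*x.shape) < p_up)
    return np.clip(x_q, -scale, scale)

def sghmc_low_precision(grad_log_p, theta0, n_steps=5000,
                        lr=1e-3, friction=0.1, bits=8):
    # SGHMC: momentum kept in full precision, parameters quantized.
    theta = quantize(theta0, bits)
    v = np.zeros_like(theta)
    samples = []
    for _ in range(n_steps):
        noise = np.sqrt(2 * friction * lr) * np.random.randn(*theta.shape)
        v = (1 - friction) * v + lr * grad_log_p(theta) + noise
        theta = quantize(theta + v, bits)   # low-precision accumulator
        samples.append(theta.copy())
    return np.array(samples)

# Toy target: standard Gaussian, so grad log p(x) = -x.
samples = sghmc_low_precision(lambda x: -x, np.zeros(2))
print(samples.mean(axis=0), samples.std(axis=0))

The momentum buffer v averages gradients (and their quantization noise) over many steps, which is the intuition behind the robustness claim in the abstract.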

URL: https://openreview.net/forum?id=uSLNzzuiDJ

---

Title: Improved Convergence of Score-Based Diffusion Models via Prediction-Correction

Authors: Francesco Pedrotti, Jan Maas, Marco Mondelli

Abstract: Score-based generative models (SGMs) are powerful tools to sample from complex data distributions. Their underlying idea is to \emph{(i)} run a forward process for time $T_1$ by adding noise to the data, \emph{(ii)} estimate its score function, and \emph{(iii)} use such estimate to run a reverse process. As the reverse process is initialized with the stationary distribution of the forward one, the existing analysis paradigm requires $T_1\to\infty$. This is however problematic: from a theoretical viewpoint, for a given precision of the score approximation, the convergence guarantee fails as $T_1$ diverges; from a practical viewpoint, a large $T_1$ increases computational costs and leads to error propagation.
This paper addresses the issue by considering a version of the popular \emph{predictor-corrector} scheme: after running the forward process, we first estimate the final distribution via an inexact Langevin dynamics and then revert the process. Our key technical contribution is to provide convergence guarantees that require running the forward process \emph{only for a fixed finite time} $T_1$.
Our bounds exhibit a mild logarithmic dependence on the input dimension and the subgaussian norm of the target distribution, make minimal assumptions on the data, and require only control of the $L^2$ loss on the score approximation, which is the quantity minimized in practice.
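
The predictor-corrector idea can be sketched in a few lines: rather than running the forward process until it mixes, one runs it only up to a fixed $T_1$, applies a few inexact Langevin steps to pull the initialization toward the time-$T_1$ marginal, and then reverts the process. The toy Python sketch below uses a standard-Gaussian data distribution, whose VP-SDE score is exactly $-x$; the discretization and step sizes are assumptions for illustration:

import numpy as np

def corrected_reverse_sampling(score, x_T, T1=1.0, n_steps=200,
                               n_corrector=50, corrector_lr=1e-2):
    dt = T1 / n_steps
    x = x_T.copy()
    # Corrector: inexact Langevin dynamics targeting the time-T1 marginal,
    # so the reverse process need not start from the T1 -> infinity limit.
    for _ in range(n_corrector):
        x += (corrector_lr * score(x, T1)
              + np.sqrt(2 * corrector_lr) * np.random.randn(*x.shape))
    # Predictor: Euler-Maruyama on the reverse VP-SDE,
    # dx = [x/2 + score(x, t)] dt + dW.
    for i in range(n_steps):
        t = T1 - i * dt
        x += ((0.5 * x + score(x, t)) * dt
              + np.sqrt(dt) * np.random.randn(*x.shape))
    return x

# Toy data distribution N(0, I): the exact score is score(x, t) = -x.
x0 = corrected_reverse_sampling(lambda x, t: -x, np.random.randn(1000, 2))
print(x0.mean(axis=0), x0.std(axis=0))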

URL: https://openreview.net/forum?id=0zKvH7YiAq

---


New submissions
===============


Title: Learning a Decision Tree Algorithm with Transformers

Abstract: Decision trees are renowned for their ability to achieve high predictive performance while remaining interpretable, especially on tabular data. Traditionally, they are constructed through recursive algorithms that partition the data at every node of the tree. However, identifying a good partition is challenging, as decision trees optimized for local segments may not generalize globally. To address this, we introduce MetaTree, which trains a Transformer-based model on the outputs of classical algorithms to directly produce strong decision trees in a meta-learning manner. Specifically, we fit both greedy decision trees and optimized decision trees on a large number of datasets, then train MetaTree to produce the trees that achieve strong generalization performance. This training enables MetaTree to emulate these algorithms and to adapt its strategy to the context, thereby achieving superior generalization performance.
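
To make the meta-learning setup concrete, the following hedged Python sketch shows how one might assemble (dataset, expert-tree split) training pairs from a classical greedy algorithm; the paper also uses globally optimized trees, and all names here are illustrative rather than the authors' code:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def build_meta_training_pairs(n_tasks=1000, n_samples=128, depth=2, seed=0):
    # Pair each small tabular dataset with the splits chosen by greedy CART;
    # a Transformer would then be trained to map the raw table to the splits.
    rng = np.random.RandomState(seed)
    pairs = []
    for _ in range(n_tasks):
        X, y = make_classification(n_samples=n_samples, n_features=8,
                                   random_state=rng.randint(2**31))
        expert = DecisionTreeClassifier(max_depth=depth).fit(X, y)
        t = expert.tree_
        # Targets: (feature index, threshold) at each internal node
        # (leaves are marked with feature = -2 in scikit-learn).
        splits = [(int(f), float(th))
                  for f, th in zip(t.feature, t.threshold) if f >= 0]
        pairs.append(((X, y), splits))
    return pairs

pairs = build_meta_training_pairs(n_tasks=10)
print(len(pairs), pairs[0][1])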

URL: https://openreview.net/forum?id=1Kzzm22usl

---

Title: M$^3$PL: Identifying and Exploiting View Bias of Prompt Learning

Abstract: Prompt learning is an effective means of fine-tuning multi-modal foundation models such as CLIP. Despite this success, the inner mechanism of multi-modal prompt learning is not well understood. In this work, we identify an inductive bias of multi-modal prompt learning, which we refer to as view bias: the learned prompts may extract only a subset of the useful features (views) and ignore others. This bias can undermine the model's generalization ability, particularly under distribution shifts. We further observe that independently trained prompts have distinct view biases, contrary to the existing belief that they should converge to similar local optima because they share the same cross-modal representation-matching objective. Based on these observations, we propose Multi-modal Matching Multi-Prompt Learning (M$^3$PL), which incorporates multiple paired prompts and a cross-modal contrastive regularizer that encourages the prompt pairs to capture a broader spectrum of views. Extensive experiments show that M$^3$PL effectively boosts the model's generalization capability, achieving state-of-the-art performance under various distribution shifts.
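
One plausible form of such a cross-modal contrastive regularizer (a guess at the mechanism, not the paper's exact loss) treats each of the $K$ prompt pairs as its own positive and every other pair as a negative, discouraging the pairs from collapsing onto the same view:

import torch
import torch.nn.functional as F

def cross_prompt_contrastive_reg(img_feats, txt_feats, tau=0.07):
    # img_feats, txt_feats: (K, D) features from K independent prompt pairs
    # for the same input. Hypothetical regularizer: pair k's image feature
    # should match its own text feature and differ from the other pairs'.
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / tau            # (K, K) cross-modal similarities
    targets = torch.arange(img.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

reg = cross_prompt_contrastive_reg(torch.randn(4, 512), torch.randn(4, 512))
print(reg.item())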

URL: https://openreview.net/forum?id=2rnTIBm19V

---

Title: Variational Autoencoding of Dental Point Clouds

Abstract: Digital dentistry has made significant advancements, yet numerous challenges remain. This paper introduces the FDI 16 dataset, an extensive collection of tooth meshes and point clouds. Additionally, we present a novel approach: Variational FoldingNet (VF-Net), a fully probabilistic variational autoencoder for point clouds. Notably, prior latent variable models for point clouds lack a one-to-one correspondence between input and output points; instead, they rely on optimizing Chamfer distances, a metric that lacks a normalized distributional counterpart and is therefore unsuitable for probabilistic modeling. We replace the explicit minimization of Chamfer distances with a suitable encoder, increasing computational efficiency while simplifying the probabilistic extension. This allows for straightforward application to various tasks, including mesh generation, shape completion, and representation learning. Empirically, we provide evidence of lower reconstruction error in dental reconstruction and interpolation, showcasing state-of-the-art performance in dental sample generation while identifying valuable latent representations.
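
For readers unfamiliar with the metric being replaced, here is the standard Chamfer distance in a few lines of Python. Each point is matched to its nearest neighbor in the other cloud, so there is no one-to-one input/output correspondence; that is the obstacle to a normalized per-point likelihood that VF-Net is designed to avoid:

import torch

def chamfer_distance(x, y):
    # x: (N, 3) and y: (M, 3) point clouds.
    d = torch.cdist(x, y)                      # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

print(chamfer_distance(torch.randn(100, 3), torch.randn(120, 3)).item())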

URL: https://openreview.net/forum?id=nH416rLxtI

---

Title: Training Wasserstein GANs without Gradient Penalties via Duality

Abstract: We propose a new method for training Wasserstein generative adversarial networks (WGANs) without using gradient penalties. We develop a novel approach to accurately estimate the Wasserstein distance between datasets, specifically tailored to the generative setting. Instead of employing the commonly used gradient penalty term in the WGAN training procedure, we introduce two objective functions that utilize the $c$-transform based on Kantorovich duality, a fundamental property in optimal transport theory. Through our experiments, we observe that this algorithm effectively enforces the Lipschitz constraint on the discriminator, paving the way for understanding the optimal transport problem via a deep learning approach. As a result, our method provides an accurate estimate not only of the optimal discriminator but also of the Wasserstein distance between the true and generated distributions. Notably, our method eliminates the need for gradient penalties and the corresponding hyperparameter tuning. Moreover, we demonstrate its effectiveness by generating competitive synthetic images on various datasets, including MNIST, Fashion-MNIST, CIFAR-10, and CelebA-HQ.
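
The $c$-transform mechanism can be sketched as follows. With cost $c(x,y)=\|x-y\|$, Kantorovich duality gives $W(\mu,\nu)=\sup_\psi \mathbb{E}_\mu[\psi^c]+\mathbb{E}_\nu[\psi]$ with $\psi^c(x)=\inf_y\, c(x,y)-\psi(y)$; approximating the infimum over the generated minibatch enforces the Lipschitz constraint without a gradient penalty. This is a generic sketch of the idea, not the paper's two specific objectives:

import torch

def c_transform_objective(critic, real, fake):
    # Empirical c-transform of the critic psi over the fake batch:
    # psi_c(x) = min_y [ c(x, y) - psi(y) ].
    psi_fake = critic(fake).squeeze(-1)                    # (M,)
    cost = torch.cdist(real.flatten(1), fake.flatten(1))   # (N, M) costs
    psi_c_real = (cost - psi_fake.unsqueeze(0)).min(dim=1).values
    wasserstein = psi_c_real.mean() + psi_fake.mean()      # dual estimate
    return -wasserstein   # minimize the negative to maximize the dual

critic = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                             torch.nn.Linear(64, 1))
loss = c_transform_objective(critic, torch.randn(256, 2), torch.randn(256, 2))
print(loss.item())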

URL: https://openreview.net/forum?id=vN3eAGHhpC

---

Title: Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis

Abstract: $\textbf{How can balance be quantified in game settings?}$ This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions—such as hero combinations in MOBA games or decks in card games—is essential for enhancing gameplay and achieving balance. We have developed two advanced measures that extend beyond the simplistic win rate to quantify balance in competitive scenarios. These measures are derived from win value estimations, which employ strength rating approximations via the Bradley-Terry model and counter relationship approximations via vector quantization, significantly reducing the computational complexity associated with traditional win value estimations.
Throughout the learning process of these models, we identify useful categories of compositions and pinpoint their counter relationships, aligning with the experiences of human players without requiring specific game knowledge. Our methodology, underpinned by a novel discrete representation learning technique, enhances codebook utilization in a deterministic vector quantization process with an extremely small state space.
Our framework has been validated in popular online games, including $\textit{Age of Empires II}$, $\textit{Hearthstone}$, $\textit{Brawl Stars}$, and $\textit{League of Legends}$. The accuracy of the observed strength relations in these games is comparable to that of traditional pairwise win value predictions, while offering far more manageable complexity for analysis. Ultimately, our findings contribute to a deeper understanding of PvP game dynamics and present a methodology that significantly improves game balance evaluation and design.
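
As a concrete reference point for the strength-rating component, here is a minimal Bradley-Terry fit from a pairwise win-count matrix (gradient ascent on the log-likelihood; the vector-quantized counter-relationship clustering is omitted, and all names are illustrative):

import numpy as np

def fit_bradley_terry(wins, n_iter=500, lr=0.1):
    # wins[i, j] = number of times composition i beat composition j.
    # Model: P(i beats j) = sigmoid(s_i - s_j); fit s by gradient ascent.
    n = wins.shape[0]
    s = np.zeros(n)
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-(s[:, None] - s[None, :])))   # P(i beats j)
        grad = (wins * (1 - p) - wins.T * p).sum(axis=1)
        s += lr * grad / max(wins.sum(), 1)
        s -= s.mean()                                      # fix the gauge
    return s

wins = np.array([[0, 8, 6], [2, 0, 7], [4, 3, 0]])
print(fit_bradley_terry(wins))    # estimated strength ratings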

URL: https://openreview.net/forum?id=2D36otXvBE

---

Title: Model Recycling Framework for Multi-Source Data-Free Supervised Transfer Learning

Abstract: Increasing concerns about data privacy, together with other difficulties in retrieving source data for model training, have created the need for source-free transfer learning, in which one has access only to pre-trained models rather than to data from the original source domains. This setting introduces many challenges, as most existing transfer learning methods require source data and thus cannot be directly adapted to the source-free scenario. Practical concerns add further difficulty: for instance, efficiently selecting models for transfer without information about the source data, and transferring without full access to the source models. So motivated, we propose a model recycling framework for parameter-efficient training that identifies subsets of related source models to reuse, in both white-box and black-box settings. Consequently, our framework makes it possible for Model as a Service (MaaS) providers to build libraries of efficient pre-trained models, creating an opportunity for multi-source data-free supervised transfer learning.
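
One way such source-free model selection could look in code (a hedged sketch of a plausible instantiation, not the paper's method): treat each pre-trained model as a black-box feature extractor and rank models by the cross-validated accuracy of a linear probe on the target data:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def select_source_models(models, X_target, y_target, k=3):
    # models: name -> feature-extraction callable (black-box access only).
    # Score each model's transferability with a linear probe on the target.
    scores = {}
    for name, extract in models.items():
        feats = extract(X_target)                          # (N, D) embeddings
        probe = LogisticRegression(max_iter=1000)
        scores[name] = cross_val_score(probe, feats, y_target, cv=3).mean()
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy usage: random linear projections stand in for pre-trained encoders.
rng = np.random.RandomState(0)
X, y = rng.randn(200, 16), rng.randint(0, 2, 200)
models = {f"m{i}": (lambda W: (lambda Z: Z @ W))(rng.randn(16, 8))
          for i in range(5)}
print(select_source_models(models, X, y, k=2))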

URL: https://openreview.net/forum?id=CuPQhnRJFs

---
