Daily TMLR digest for Jun 24, 2024


TMLR

Jun 24, 2024, 12:00:23 AM
to tmlr-anno...@googlegroups.com


Accepted papers
===============


Title: Learning Tree-Structured Composition of Data Augmentation

Authors: Dongyue Li, Kailai Chen, Predrag Radivojac, Hongyang R. Zhang

Abstract: Data augmentation is widely used in scenarios where a neural network must be trained with little labeled data. A common practice in augmentation training is to apply a composition of multiple transformations sequentially to the data. Existing augmentation methods such as RandAugment rely on domain expertise to select a list of transformations, while other methods such as AutoAugment formulate an optimization problem over a search space of size $k^d$, the number of length-$d$ sequences drawn from a list of $k$ transformation functions.

In this paper, we focus on designing efficient algorithms whose running time is provably much faster than the worst-case complexity of $O(k^d)$. We propose a new algorithm to search for a binary tree-structured composition of $k$ transformations, where each tree node corresponds to one transformation. The binary tree generalizes sequential augmentations, such as the one constructed by SimCLR. Using a top-down, recursive search procedure, our algorithm achieves a runtime complexity of $O(2^d k)$, which is much faster than $O(k^d)$ once $k$ exceeds $2$. We apply the algorithm to tackle data distributions with heterogeneous subpopulations by searching for one tree in each subpopulation and then learning a weighted combination, leading to a forest of trees.

We validate the proposed algorithms on numerous graph and image data sets, including a multi-label graph classification data set we collected. The data set exhibits significant variations in the sizes of graphs and their average degrees, making it ideal for studying data augmentation. We show that our approach can reduce the computation cost (measured by GPU hours) by 43% over existing augmentation search methods while improving performance by 4.3%. Extensive experiments on contrastive learning also validate the benefit of our approach. The tree structures can be used to interpret the relative importance of each transformation, such as identifying the important transformations on small vs. large graphs.

URL: https://openreview.net/forum?id=lmgf03HeqV
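
A rough sketch of the search idea (our illustration, not the authors' exact algorithm: the scoring function, the data-splitting rule, and the greedy node-wise choice are placeholder assumptions) is a top-down recursion that, at each of the roughly $2^d$ nodes of a depth-$d$ binary tree, evaluates each of the $k$ candidate transformations once, giving the stated $O(2^d k)$ behavior:

    import random

    def search_tree(transforms, score_fn, data, depth):
        """Greedy top-down search for a binary tree of transformations.

        Each node picks the transformation (out of k candidates) that maximizes
        a validation score on the data routed to it, then recurses into two
        children. A depth-d tree has O(2^d) nodes, so the total number of
        candidate evaluations is O(2^d * k).
        """
        if depth == 0 or not data:
            return None
        best_t = max(transforms, key=lambda t: score_fn(t, data))  # k evaluations
        random.shuffle(data)                  # placeholder split of the data
        mid = len(data) // 2
        return {"transform": best_t,
                "left": search_tree(transforms, score_fn, data[:mid], depth - 1),
                "right": search_tree(transforms, score_fn, data[mid:], depth - 1)}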

---

Title: Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding - A Survey

Authors: Xi Fang, Weijie Xu, Fiona Anting Tan, Ziqing Hu, Jiani Zhang, Yanjun Qi, Srinivasan H. Sengamedu, Christos Faloutsos

Abstract: Recent breakthroughs in large language models (LLMs) have facilitated rigorous exploration of their application to diverse tasks related to tabular data modeling, such as prediction, tabular data synthesis, question answering, and table understanding. Each task presents unique challenges and opportunities. However, there is currently no comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain. This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized. It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field. It also provides references to relevant code and datasets. Through this comprehensive review, we hope to provide interested readers with pertinent references and insightful perspectives, empowering them with the necessary tools and knowledge to effectively navigate and address the prevailing challenges in the field.

URL: https://openreview.net/forum?id=IZnrCGF9WI

---


New submissions
===============


Title: Multi-intention Inverse Q-learning for Interpretable Behavior Representation

Abstract: In advancing the understanding of natural decision-making processes, inverse reinforcement learning (IRL) methods have proven instrumental in reconstructing the intentions underlying animals' complex behaviors. Following the recent development of a continuous-time multi-intention IRL framework, there has been persistent inquiry into inferring discrete time-varying rewards with IRL. To address this challenge, we introduce the class of hierarchical inverse Q-learning (HIQL) algorithms. Through an unsupervised learning process, HIQL divides expert trajectories into multiple intention segments and solves the IRL problem independently for each. Applied to simulated experiments and several real animal behavior datasets, our approach outperforms current benchmarks in behavior prediction and produces interpretable reward functions. Our results suggest that the intention transition dynamics underlying complex decision-making behavior are better modeled by a step function than by a smoothly varying function. This advancement holds promise for neuroscience and cognitive science, contributing to a deeper understanding of decision-making and uncovering underlying brain mechanisms.

URL: https://openreview.net/forum?id=hrKHkmLUFk
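
As a loose illustration of the segment-then-solve structure described above (not the HIQL algorithm itself: the equal-length segmentation and the frequency-based reward estimate below are stand-ins for the unsupervised segmentation and the per-segment inverse Q-learning step), one can picture the pipeline as:

    import numpy as np

    def segment_trajectory(states, actions, n_segments):
        """Placeholder segmentation: cut a trajectory into contiguous chunks.
        HIQL infers intention boundaries in an unsupervised way; equal-length
        chunks are used here purely for illustration."""
        idx = np.array_split(np.arange(len(states)), n_segments)
        return [(states[i], actions[i]) for i in idx]

    def fit_segment_reward(states, actions, n_states, n_actions):
        """Stand-in for the per-segment IRL step: a reward table estimated
        from empirical state-action visitation counts."""
        reward = np.zeros((n_states, n_actions))
        for s, a in zip(states, actions):
            reward[s, a] += 1.0
        return reward / max(len(states), 1)

    def hiql_style_pipeline(states, actions, n_segments, n_states, n_actions):
        """One small IRL problem per intention segment; the result is a
        step-function (piecewise-constant) reward over time."""
        segments = segment_trajectory(np.asarray(states), np.asarray(actions),
                                      n_segments)
        return [fit_segment_reward(s, a, n_states, n_actions) for s, a in segments]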

---

Title: Nonlinear Behaviour of Critical Points for a Simple Neural Network

Abstract: In severely over-parametrized regimes, neural network optimization can be analyzed by linearization techniques such as the neural tangent kernel, which show that gradient descent converges to zero training error, and by landscape analysis, which shows that all local minima are global minima.

Practical networks are often much less over-parametrized, and training behavior becomes more nuanced and nonlinear. This paper contains a fine-grained analysis of the nonlinearity for a simple shallow network in one dimension. We show that the networks have unfavorable critical points, which can be mitigated by sufficiently high local resolution. Given this resolution, all critical points satisfy $L_2$ loss bounds of optimal adaptive approximation in Sobolev and Besov spaces on convex and concave subdomains of the target function. These bounds cannot be matched by linear approximation methods, and they reveal nonlinear and global behavior of the critical points' inner weights.

URL: https://openreview.net/forum?id=wfdG2PEOHS
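
For concreteness, a shallow one-dimensional network of the kind analyzed here can be written, under our assumption of a ReLU-type parametrization (the paper's exact setup may differ), as $f_\theta(x) = \sum_{i=1}^{n} a_i\,\sigma(w_i x + b_i)$ with $\sigma(t) = \max(t, 0)$ and parameters $\theta = (a_i, w_i, b_i)_{i=1}^{n}$, trained by minimizing the squared loss $\mathcal{L}(\theta) = \|f_\theta - g\|_{L_2}^2$ against a target function $g$; the critical points discussed above are the parameter configurations where $\nabla_\theta \mathcal{L}(\theta) = 0$.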

---

Title: Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition

Abstract: Deep Neural Networks (DNNs) have achieved remarkable success in addressing many previously unsolvable tasks. However, the storage and computational requirements associated with DNNs pose a challenge for deploying these trained models on resource-limited devices. Consequently, a plethora of compression and pruning techniques have been proposed in recent years. Low-rank decomposition techniques are among the approaches most utilized to address this problem. Compared to post-training compression, compression-promoting training is still under-explored. In this paper, we present a theoretically justified novel approach, termed Low-Rank Induced Training (LoRITa), that promotes low-rankness through the composition of linear layers and compresses by singular value truncation. This is achieved without changing the structure at inference time and without constrained or additional optimization beyond standard weight decay regularization. Moreover, LoRITa eliminates the need to (i) initialize with pre-trained models, (ii) specify rank selection prior to training, and (iii) compute SVD in each iteration. Our experimental results (i) demonstrate the effectiveness of our approach on Fully Connected Networks with MNIST, Vision Transformers with CIFAR10, and Convolutional Neural Networks with CIFAR10/100, and (ii) show that we achieve competitive or state-of-the-art results compared to leading structured pruning and low-rank training methods in terms of FLOPs and parameter reduction.

URL: https://openreview.net/forum?id=1KCrVMJoJ9
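
A minimal sketch of the train-then-truncate idea, assuming a PyTorch-style setup (the factorization depth, the weight-decay strength, and the truncation rank below are placeholder choices, not the paper's):

    import torch
    import torch.nn as nn

    def factored_linear(in_dim, out_dim, depth=2):
        """Replace one Linear(in_dim, out_dim) with a composition of `depth`
        linear layers (no nonlinearity between them, so the composition is
        still a single linear map)."""
        dims = [in_dim] + [out_dim] * depth
        return nn.Sequential(*[nn.Linear(dims[i], dims[i + 1],
                                         bias=(i == depth - 1))
                               for i in range(depth)])

    def collapse_and_truncate(factored, rank):
        """Multiply the factors back into one weight matrix, then keep only
        the top `rank` singular values (singular value truncation)."""
        weight, bias = None, None
        for layer in factored:
            w = layer.weight.detach()
            weight = w if weight is None else w @ weight
            if layer.bias is not None:
                bias = layer.bias.detach()
        u, s, vh = torch.linalg.svd(weight, full_matrices=False)
        w_low = u[:, :rank] @ torch.diag(s[:rank]) @ vh[:rank, :]
        compressed = nn.Linear(w_low.shape[1], w_low.shape[0])
        with torch.no_grad():
            compressed.weight.copy_(w_low)
            if bias is not None:
                compressed.bias.copy_(bias)
        return compressed

During training one would simply use a standard optimizer with weight decay, e.g. torch.optim.AdamW(model.parameters(), weight_decay=1e-2), and call collapse_and_truncate once after training.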

---

Title: TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification

Abstract: Multivariate time series classification (TSC) is a critical problem. We propose a novel multi-view approach that integrates frequency-domain and time-domain features to provide complementary contexts for TSC. Our method fuses continuous wavelet transform spectral features with temporal convolutional or multilayer perceptron features, and leverages the Mamba state space model for efficient and scalable sequence modeling. We also introduce a novel tango scanning scheme to better model sequence relationships. Experiments on 10 standard benchmark datasets demonstrate that our approach achieves an average accuracy improvement of 6.45% over state-of-the-art TSC models.

URL: https://openreview.net/forum?id=cpHGwrkbbb
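
A rough sketch of the multi-view fusion described above (an illustration only: the wavelet settings, the convolutional feature extractors, and the classifier head are our placeholders, and a generic GRU stands in for the Mamba block and the tango scanning scheme, which are specific to the paper):

    import numpy as np
    import pywt
    import torch
    import torch.nn as nn

    def wavelet_view(x, scales=np.arange(1, 33), wavelet="morl"):
        """Frequency-domain view: continuous wavelet transform per channel.
        x: (channels, time) array -> (channels * n_scales, time) features."""
        views = [np.abs(pywt.cwt(ch, scales, wavelet)[0]) for ch in x]
        return np.concatenate(views, axis=0)

    class MultiViewClassifier(nn.Module):
        """Fuse spectral (CWT) features with temporal-convolution features and
        feed the fused sequence to a sequence model (GRU here, as a stand-in
        for the Mamba state space model used in the paper)."""
        def __init__(self, n_channels, n_scales, hidden, n_classes):
            super().__init__()
            self.temporal = nn.Conv1d(n_channels, hidden, kernel_size=5, padding=2)
            self.spectral = nn.Conv1d(n_channels * n_scales, hidden, kernel_size=1)
            self.seq = nn.GRU(2 * hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x_time, x_cwt):
            # x_time: (batch, channels, time); x_cwt: (batch, channels*scales, time)
            t = self.temporal(x_time)                          # (batch, hidden, time)
            s = self.spectral(x_cwt)                           # (batch, hidden, time)
            fused = torch.cat([t, s], dim=1).transpose(1, 2)   # (batch, time, 2*hidden)
            out, _ = self.seq(fused)
            return self.head(out[:, -1])                       # classify from last step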

---