Daily TMLR digest for Aug 04, 2022


TMLR

Aug 3, 2022, 8:00:09 PM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Exploring Generative Neural Temporal Point Process

Authors: Haitao Lin, Lirong Wu, Guojiang Zhao, Liu Pai, Stan Z. Li

Abstract: Temporal point processes (TPPs) are commonly used to model asynchronous event sequences characterized by occurrence timestamps, and are described by probabilistic models conditioned on historical impacts.
While many previous works have focused on the `goodness-of-fit' of TPP models by maximizing the likelihood, their predictive performance is unsatisfactory: the timestamps generated by the models are far from the true observations.
Recently, deep generative models such as denoising diffusion and score matching models have made great progress in image generation tasks, demonstrating their ability to generate samples of high quality.
However, there has been no detailed and unified work exploring and studying the potential of generative models in the context of event prediction for TPPs.
In this work, we try to fill this gap by designing a unified generative framework for neural temporal point processes (GNTPP) to explore their feasibility and effectiveness, and to further improve models' predictive performance.
Besides, to measure historical impacts, we revise the attentive models that summarize the influence of historical events with an adaptive reweighting term accounting for event-type relations and time intervals.
Extensive experiments illustrate the improved predictive capability of GNTPP with a line of generative probabilistic decoders, and the performance gain from the revised attention.
To the best of our knowledge, this is the first work that adapts generative models into a complete unified framework and studies their effectiveness in the context of TPPs.


URL: https://openreview.net/forum?id=NPfS5N3jbL
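
For readers curious what the adaptive reweighting described above might look like in code, here is a minimal sketch of an attention layer whose scores are modulated by a learned event-type affinity and a time-interval decay. The additive form, parameter names, and causal masking are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn

class ReweightedHistoryAttention(nn.Module):
    # Attention over past events whose scores are modulated by a learned
    # event-type affinity and a time-interval decay (illustrative sketch only).
    def __init__(self, d_model, num_types):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.type_affinity = nn.Parameter(torch.zeros(num_types, num_types))
        self.decay = nn.Parameter(torch.tensor(1.0))

    def forward(self, h, types, times):
        # h: (B, L, d) event embeddings; types: (B, L) int64; times: (B, L) timestamps
        q, k, v = self.q(h), self.k(h), self.v(h)
        scores = q @ k.transpose(-1, -2) / h.size(-1) ** 0.5              # (B, L, L)
        affinity = self.type_affinity[types.unsqueeze(-1), types.unsqueeze(-2)]
        dt = (times.unsqueeze(-1) - times.unsqueeze(-2)).clamp(min=0.0)   # elapsed time
        scores = scores + affinity - nn.functional.softplus(self.decay) * dt
        causal = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        attn = scores.masked_fill(causal, float("-inf")).softmax(dim=-1)
        return attn @ v                                                   # (B, L, d)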

---

Title: Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization

Authors: David Peer, Bart Keulen, Sebastian Stabinger, Justus Piater, Antonio Rodriguez-sanchez

Abstract: Training deep neural networks is a very demanding task, and it is especially challenging to adapt architectures to improve the performance of trained models. Sometimes shallow networks generalize better than deep networks, and the addition of more layers results in higher training and test errors. The deep residual learning framework addresses this degradation problem by adding skip connections to several neural network layers. It at first seems counter-intuitive that such skip connections are needed to train deep networks successfully, as the expressivity of a network would grow exponentially with depth. In this paper, we first analyze the flow of information through neural networks. We introduce and evaluate the batch-entropy, which quantifies the flow of information through each layer of a neural network. We show empirically and theoretically that a positive batch-entropy is required for gradient descent-based training approaches to optimize a given loss function successfully. Based on these insights, we introduce batch-entropy regularization to enable gradient descent-based training algorithms to optimize the flow of information through each hidden layer individually. With batch-entropy regularization, gradient descent optimizers can transform untrainable networks into trainable networks. We show empirically that we can therefore train a "vanilla" fully connected network and convolutional neural network---no skip connections, batch normalization, dropout, or any other architectural tweak---with 500 layers by simply adding the batch-entropy regularization term to the loss function. The effect of batch-entropy regularization is evaluated not only on vanilla neural networks, but also on residual networks, autoencoders, and transformer models, over a wide range of computer vision and natural language processing tasks.

URL: https://openreview.net/forum?id=LJohl5DnZf
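
As a rough sketch of the regularization idea (the paper's precise definition of the batch-entropy differs; the Gaussian differential-entropy proxy, target value, and weight below are assumptions for illustration):

import math
import torch

def batch_entropy(a, eps=1e-6):
    # a: (batch, features) pre-activations of one layer. Estimate a simple
    # per-unit Gaussian differential entropy, 0.5 * log(2*pi*e*var), averaged
    # over units; this proxy is an assumption, not the paper's exact formula.
    var = a.var(dim=0, unbiased=False) + eps
    return 0.5 * torch.log(2 * math.pi * math.e * var).mean()

def regularized_loss(task_loss, layer_activations, target_h=1.0, weight=0.1):
    # Push each layer's batch-entropy toward a positive target so that
    # information keeps flowing through every hidden layer.
    penalty = sum((batch_entropy(a) - target_h) ** 2 for a in layer_activations)
    return task_loss + weight * penalty / len(layer_activations)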

---


New submissions
===============


Title: Neural Fixed-Point Acceleration for Second-order Cone Optimization Problems

Abstract: Continuous fixed-point problems are a computational primitive in numerical computing, optimization, machine learning, and the natural and social sciences, and have recently been incorporated into deep learning models as optimization layers. Acceleration of fixed-point computations has traditionally been explored in optimization research without the use of learning. In this work, we introduce neural fixed-point acceleration, a framework to automatically learn to accelerate fixed-point problems that are drawn from a distribution; a key question motivating our work is to better understand the characteristics that make neural acceleration more beneficial for some problems than others. We apply the framework to solve second-order cone programs with the Splitting Conic Solver (SCS), and evaluate on distributions of Lasso problems and Kalman filtering problems. Our main results show that we obtain a 10× improvement in accuracy on the Kalman filtering distribution, while the gains on Lasso are much more modest. We then isolate a few factors that make neural acceleration much more useful on the Kalman filtering distribution than on the Lasso distribution: we apply a number of problem and distribution modifications to a scaled-down version of the Lasso problem, adding properties that make it structurally closer to Kalman filtering, and show when the problem benefits from neural acceleration.

URL: https://openreview.net/forum?id=9iRpRNB994
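
A minimal sketch of the general idea of learning to accelerate a fixed-point iteration x_{k+1} = f(x_k): a small recurrent model proposes corrections and is trained to shrink the fixed-point residual across problems drawn from a distribution. The architecture, loss, and names below are assumptions; the paper's integration with SCS is not shown.

import torch
import torch.nn as nn

class NeuralAccelerator(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.cell = nn.GRUCell(2 * dim, hidden)   # summarizes the iterate history
        self.out = nn.Linear(hidden, dim)         # proposes a learned correction

    def forward(self, f, x0, steps=20):
        # f: one step of the base fixed-point map; x0: (B, dim) initial iterates
        h = x0.new_zeros(x0.size(0), self.cell.hidden_size)
        x, residuals = x0, []
        for _ in range(steps):
            fx = f(x)                                        # base solver step
            h = self.cell(torch.cat([x, fx - x], dim=-1), h)
            x = fx + self.out(h)                             # accelerated iterate
            residuals.append((f(x) - x).pow(2).sum(-1).mean())
        return x, torch.stack(residuals).mean()              # train on the residuals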

---

Title: Systematically and efficiently improving existing $k$-means initialization algorithms by pairwise-nearest-neighbor smoothing

Abstract: We present a meta-method, called PNN-smoothing, for initializing (seeding) the $k$-means clustering algorithm. It consists of splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that any seeding algorithm can be used when clustering the individual subsets. If the computational complexity of that seeding algorithm is linear in the size of the data $N$ and the number of clusters $k$, PNN-smoothing is also almost linear with an appropriate choice of $J$, and quite competitive in practice. We show empirically, using several existing seeding methods and testing on several synthetic and real datasets, that this procedure results in systematically better costs. Our implementation is publicly available at [ANONYMIZED - ATTACHED AS AUXILIARY MATERIAL FOR THE REVIEW].

URL: https://openreview.net/forum?id=FTtFAg3pek
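
The procedure in the abstract is concrete enough to sketch. Below is a minimal implementation of the idea, using k-means++ as the base seeder and Ward's merge cost for the pairwise-nearest-neighbor step; these choices and the defaults are assumptions, not necessarily the paper's.

import numpy as np
from sklearn.cluster import KMeans

def pnn_smoothing_seeds(X, k, J=10, rng=None):
    # Cluster J random subsets with a base seeder (k-means++ here), then merge
    # the resulting J*k weighted centroids down to k with greedy PNN merging.
    rng = np.random.default_rng(rng)
    parts = np.array_split(rng.permutation(len(X)), J)
    centers, weights = [], []
    for idx in parts:
        km = KMeans(n_clusters=k, init="k-means++", n_init=1).fit(X[idx])
        centers.append(km.cluster_centers_)
        weights.append(np.bincount(km.labels_, minlength=k).astype(float))
    C, w = np.vstack(centers), np.concatenate(weights)
    while len(C) > k:
        # Merge the pair with the smallest Ward-style cost
        # (w_i * w_j / (w_i + w_j)) * ||c_i - c_j||^2.
        d2 = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        cost = (w[:, None] * w[None, :] / (w[:, None] + w[None, :])) * d2
        np.fill_diagonal(cost, np.inf)
        i, j = np.unravel_index(np.argmin(cost), cost.shape)
        C[i] = (w[i] * C[i] + w[j] * C[j]) / (w[i] + w[j])
        w[i] += w[j]
        C, w = np.delete(C, j, axis=0), np.delete(w, j)
    return C

The returned centroids would then seed a final run on the full dataset, e.g. KMeans(n_clusters=k, init=pnn_smoothing_seeds(X, k), n_init=1).fit(X).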

---

Title: On Noise Abduction for Answering Counterfactual Queries: A Practical Outlook

Abstract: A crucial step in counterfactual inference is abduction, the inference of the exogenous noise variables. Deep learning approaches model an exogenous noise variable as a latent variable, and our ability to infer a latent variable comes at both a computational and a statistical cost. In this paper, we show that it may not be necessary to abduct all the noise variables in a structural causal model (SCM) to answer a counterfactual query. In a fully specified causal model with no unobserved confounding, we identify the exogenous noise variables that must be abducted for a given counterfactual query. We introduce a graphical condition for noise identification under an action consisting of an arbitrary combination of hard and soft interventions. We report experimental results on both synthetic data and the real-world German Credit dataset, showcasing the promise and usefulness of the proposed exogenous noise identification.

URL: https://openreview.net/forum?id=4FU8Jz1Oyj
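
To make the main point concrete, here is a toy abduction-action-prediction example on a made-up additive-noise SCM X -> Y -> Z: for a counterfactual query about Z under an intervention on Y, only the noise of Z needs to be abducted, while the noise of X never enters the computation. The SCM and numbers are purely illustrative.

# Toy SCM: X = U_x, Y = 2X + U_y, Z = -Y + U_z (made up for illustration)
def scm(u_x, u_y, u_z, do_y=None):
    x = u_x
    y = 2 * x + u_y if do_y is None else do_y
    z = -y + u_z
    return x, y, z

# Factual world: an observation generated by some (to us unknown) noises
x, y, z = scm(u_x=1.0, u_y=0.5, u_z=1.0)   # observe (x, y, z) = (1.0, 2.5, -1.5)

# Query: what would Z have been had Y been set to 0?
# Abduction: only U_z is needed, recovered from z = -y + u_z.
u_z_hat = z + y                            # = 1.0
# Action + prediction: U_x and U_y never enter the counterfactual for Z.
_, _, z_cf = scm(u_x=float("nan"), u_y=float("nan"), u_z=u_z_hat, do_y=0.0)
print(z_cf)                                # 1.0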

---

Title: An Efficient One-Class SVM for Novelty Detection in IoT

Abstract: One-Class Support Vector Machines (OCSVM) are a state-of-the-art approach for novelty detection, due to their flexibility in fitting complex nonlinear boundaries between normal and novel data. Novelty detection is important in the Internet of Things ("IoT") due to the threats these devices can present, and OCSVM often performs well in these environments due to the variety of devices, traffic patterns, and anomalies that they exhibit. Unfortunately, conventional OCSVMs can introduce prohibitive memory and computational overhead at detection time. This work designs, implements and evaluates an efficient OCSVM for such practical settings. We extend Nyström and (Gaussian) sketching approaches to OCSVM, combining these methods with clustering and Gaussian mixture models to achieve a 15-30x speedup in prediction time and a 30-40x reduction in memory requirements, without sacrificing detection accuracy. Here, the very nature of IoT devices is crucial: they tend to admit few modes of normal operation, allowing for efficient pattern compression.

URL: https://openreview.net/forum?id=LFkRUCalFt
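
For a sense of the Nyström route to a cheaper OCSVM, here is a minimal scikit-learn sketch (the paper additionally combines sketching, clustering, and Gaussian mixture models, which are not shown; the kernel parameters and data are placeholders):

import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDOneClassSVM
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 20))          # stand-in for benign IoT traffic features

model = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.1, n_components=128),  # low-rank kernel feature map
    SGDOneClassSVM(nu=0.05),                              # linear one-class SVM in that space
)
model.fit(X_train)
scores = model.decision_function(X_train)      # lower scores indicate more novel samples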

---

Title: Revisiting the Noise Model of Stochastic Gradient Descent

Abstract: Stochastic gradient noise (SGN) is known to be a significant factor in the success of stochastic gradient descent (SGD).
Following the central limit theorem, SGN was initially modeled as Gaussian; more recently, it has been suggested that stochastic gradient noise is better characterized by the $S\alpha S$ Lévy distribution.
This claim was purportedly refuted, and the community reverted to the previously suggested Gaussian noise model.
This paper presents solid and detailed empirical evidence that SGN is heavy-tailed and better described by the $S\alpha S$ distribution. Furthermore, we argue that different parameters in a deep neural network (DNN) exhibit distinct SGN characteristics throughout training. To more accurately approximate the dynamics of SGD near a local minimum, we construct a novel framework in $\mathbb{R}^N$ based on a Lévy-driven stochastic differential equation (SDE), where one-dimensional Lévy processes model each parameter in the DNN. Next, we study the effect of learning rate decay (LRdecay) on the training process. We demonstrate theoretically and empirically that its main optimization advantage stems from the reduction of the SGN. Based on our analysis, we examine the mean escape time, trapping probability, and further properties of DNNs near local minima. Finally, we prove that the training process is more likely to exit a basin in the direction of parameters with heavier-tailed SGN. We will share our code for reproducibility.

URL: https://openreview.net/forum?id=uNl5MLPvTz
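
As a toy illustration of the modeling choice in the abstract, the snippet below simulates a one-dimensional Lévy-driven Euler discretization near a quadratic minimum and records how long it takes to leave a basin. The potential, step size, and thresholds are made up; the only point is that heavier-tailed $S\alpha S$ noise (smaller alpha) tends to escape sooner.

import numpy as np
from scipy.stats import levy_stable

def escape_time(alpha, eta=0.01, sigma=0.1, barrier=1.0, max_steps=100_000, seed=0):
    # Euler steps of dx = -x dt + sigma dL_t with symmetric alpha-stable noise;
    # the eta**(1/alpha) scaling is the standard discretization for Levy drivers.
    rng = np.random.default_rng(seed)
    noise = levy_stable.rvs(alpha, beta=0.0, size=max_steps, random_state=rng)
    x = 0.0
    for t in range(max_steps):
        x = x - eta * x + sigma * eta ** (1 / alpha) * noise[t]
        if abs(x) > barrier:                   # left the basin around the minimum
            return t
    return max_steps

print(escape_time(alpha=1.5), escape_time(alpha=2.0))  # heavier tails typically escape sooner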

---
