Daily TMLR digest for Mar 16, 2023


TMLR

Mar 15, 2023, 8:00:09 PM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance

Authors: Bahjat Kawar, Roy Ganz, Michael Elad

Abstract: Denoising diffusion probabilistic models (DDPMs) are a recent family of generative models that achieve state-of-the-art results. In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier. While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks. Therefore, while traditional classifiers may achieve good accuracy scores, their gradients are possibly unreliable and might hinder the improvement of the generation results. Recent work discovered that adversarially robust classifiers exhibit gradients that are aligned with human perception, and these could better guide a generative process towards semantically meaningful images. We utilize this observation by defining and training a time-dependent adversarially robust classifier and using it as guidance for a generative diffusion model. In experiments on the highly challenging and diverse ImageNet dataset, our scheme introduces significantly more intelligible intermediate gradients, better alignment with theoretical findings, and improved generation results under several evaluation metrics. Furthermore, we conduct an opinion survey whose findings indicate that human raters prefer our method's results.

URL: https://openreview.net/forum?id=tEVpz2xJWX
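
For context, the guidance mechanism the abstract builds on shifts each reverse-diffusion step along the gradient of a time-dependent classifier's log-probability; here that classifier is trained to be adversarially robust. A minimal sketch of such a guided sampling step, assuming hypothetical diffusion and robust_classifier objects (names and interfaces are illustrative, not the authors' code):

    import torch

    def guided_mean(diffusion, robust_classifier, x_t, t, y, scale=1.0):
        # Reverse-process mean/variance from the unconditional diffusion model
        # (placeholder interface, not a real library call).
        mean, variance = diffusion.p_mean_variance(x_t, t)

        # Gradient of the classifier's log-probability of the target class y
        # with respect to the noisy input x_t.
        with torch.enable_grad():
            x_in = x_t.detach().requires_grad_(True)
            log_probs = torch.log_softmax(robust_classifier(x_in, t), dim=-1)
            selected = log_probs[range(len(y)), y].sum()
            grad = torch.autograd.grad(selected, x_in)[0]

        # Shift the predicted mean along the classifier gradient.
        return mean + scale * variance * grad, variance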

---

Title: Improved Overparametrization Bounds for Global Convergence of SGD for Shallow Neural Networks

Authors: Bartłomiej Polaczyk, Jacek Cyranka

Abstract: We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks equipped with the ReLU activation function. We improve the existing state-of-the-art results in terms of the required hidden layer width. We introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network.

URL: https://openreview.net/forum?id=RjZq6W6FoE
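
As a concrete reference point for the setting studied, here is a minimal sketch of a one-hidden-layer ReLU network of width m trained with plain SGD. Dimensions, data, and step size are arbitrary placeholders; the paper's contribution is a bound on how large m must be for global convergence, not this training loop:

    import torch

    d, m, n = 32, 4096, 256                 # input dim, hidden width, sample count
    X, y = torch.randn(n, d), torch.randn(n, 1)

    model = torch.nn.Sequential(
        torch.nn.Linear(d, m),              # hidden layer of width m
        torch.nn.ReLU(),
        torch.nn.Linear(m, 1),
    )
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    for step in range(1000):
        idx = torch.randint(0, n, (32,))    # stochastic mini-batch
        loss = ((model(X[idx]) - y[idx]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()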

---


New submissions
===============


Title: Designing Injective and Low-Entropic Transformer for Short-Long Range Encoding

Abstract: Multi-headed self-attention-based Transformers have shown promise in different learning tasks. Although these models exhibit significant improvements in understanding short-term and long-term contexts from sequences, encoders of Transformers and their variants fail to preserve layer-wise contextual information. Transformers usually project tokens onto a sparse manifold and fail to preserve injectivity among the token representations. In this work, we propose TransJect, an encoder model that guarantees a theoretical bound for layer-wise distance preservation between a pair of tokens. We propose a simple alternative to dot-product attention to ensure Lipschitz continuity. This allows TransJect to learn injective mappings that transform token representations to different manifolds with similar topology and preserve the Euclidean distance between every pair of tokens in subsequent layers. Evaluations across multiple benchmark short- and long-sequence classification tasks show maximum improvements of $6.8\%$ and $5.9\%$, respectively, over variants of Transformers. TransJect achieves the best average accuracy on the Long Range Arena benchmark, showcasing its superiority in capturing temporal and spatial hierarchical relationships from long sequences. We further highlight the shortcomings of multi-headed self-attention from a statistical physics viewpoint. Although multi-headed self-attention was introduced to learn different levels of abstraction within the network, our empirical analyses suggest that different attention heads learn in a random, unordered fashion. In contrast, TransJect adopts a mixture of experts for regularization; these experts are found to be more orderly and balanced, and they learn different sparse representations from the input sequences. TransJect exhibits very low entropy and can therefore be efficiently scaled to greater depths.


URL: https://openreview.net/forum?id=MOvh472UNH
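
A rough way to probe the property TransJect targets is to compare pairwise Euclidean distances between token representations before and after a layer. The diagnostic sketch below is illustrative only; the layer object and tensor shapes are assumptions, not the paper's implementation:

    import torch

    def distance_distortion(layer, tokens):
        # tokens: (seq_len, d) representations entering the layer.
        out = layer(tokens)
        d_in = torch.cdist(tokens, tokens)    # pairwise distances before the layer
        d_out = torch.cdist(out, out)         # pairwise distances after the layer
        mask = ~torch.eye(len(tokens), dtype=torch.bool)
        ratios = d_out[mask] / d_in[mask].clamp_min(1e-8)
        # Worst-case expansion and contraction of pairwise distances.
        return ratios.max().item(), ratios.min().item()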

---

Title: Elementwise Language Representation

Abstract: We propose a new technique for computational language representation called elementwise embedding, in which a material (semantic unit) is abstracted into a horizontal concatenation of lower-dimensional element (character) embeddings. While elements are always characters, materials can be arbitrary levels of semantic units, so the technique generalizes to any type of tokenization. To focus only on the important letters, the $n^{th}$ spellings of each semantic unit are aligned in the $n^{th}$ attention heads, then concatenated back into their original forms, creating unique embedding representations; they are jointly projected, thereby determining their own contextual importance. Technically, this framework is achieved by passing a sequence of materials, each consisting of $v$ elements, to a transformer having $h=v$ attention heads. As a pure embedding technique, elementwise embedding replaces the $w$-dimensional embedding table of a transformer model with $256$ $c$-dimensional elements (each corresponding to one of the UTF-8 bytes), where $c=w/v$. Using this novel approach, we show that the standard transformer architecture can be reused for all levels of language representation and can process much longer sequences at the same time complexity, without any architectural modification or additional overhead. BERT trained with elementwise embedding outperforms its subword equivalent (the original implementation) on multilabel patent document classification, exhibiting superior robustness to domain specificity and data imbalance, despite using $0.005\%$ of the embedding parameters. Experiments demonstrate the generalizability of the proposed method by successfully transferring these enhancements to the differently architected transformers CANINE and ALBERT.

URL: https://openreview.net/forum?id=J5RDV32Yu9
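
Reading the abstract literally, each token (material) of v UTF-8 bytes is embedded by concatenating v byte-level (element) embeddings of size c = w / v, so the whole embedding table has only 256 rows. A rough sketch under those assumptions (variable names and the padding scheme are guesses, not the paper's implementation):

    import torch

    w, v = 768, 16                          # model width, bytes per material
    c = w // v                              # element (byte) embedding size
    byte_emb = torch.nn.Embedding(256, c)   # one embedding per UTF-8 byte value

    def embed_material(token: str) -> torch.Tensor:
        data = token.encode("utf-8")[:v]
        data = data + b"\x00" * (v - len(data))   # pad to v bytes (assumed scheme)
        ids = torch.tensor(list(data))
        return byte_emb(ids).reshape(-1)          # concatenation -> w dimensions

    x = torch.stack([embed_material(t) for t in ["element", "wise", "embedding"]])
    # x can now feed a standard transformer encoder with h = v attention heads.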

---
