Accepted papers
===============
Title: OADAT: Experimental and Synthetic Clinical Optoacoustic Data for Standardized Image Processing
Authors: Firat Ozdemir, Berkan Lafci, Xose Luis Dean-Ben, Daniel Razansky, Fernando Perez-Cruz
Abstract: Optoacoustic (OA) imaging is based on excitation of biological tissues with nanosecond-duration laser pulses followed by detection of ultrasound waves generated via light-absorption-mediated thermoelastic expansion. OA imaging features a powerful combination of rich optical contrast and high resolution in deep tissues. This has enabled the exploration of a number of attractive new applications in both clinical and laboratory settings. However, no standardized datasets generated with different types of experimental set-up and associated processing methods are available to facilitate advances in broader applications of OA in clinical settings. This complicates an objective comparison between new and established data processing methods, often leading to qualitative results and arbitrary interpretations of the data. In this paper, we provide both experimental and synthetic OA raw signals and reconstructed image domain datasets rendered with different experimental parameters and tomographic acquisition geometries. We further provide trained neural networks to tackle three important challenges related to OA image processing, namely accurate reconstruction under limited view tomographic conditions, removal of spatial undersampling artifacts and anatomical segmentation for improved image reconstruction. Specifically, we define 44 experiments corresponding to the aforementioned challenges as benchmarks to be used as a reference for the development of more advanced processing methods.
URL: https://openreview.net/forum?id=BVi6MhKO0G
---
Title: Stacking Diverse Architectures to Improve Machine Translation
Authors: Andrea Schioppa, Nal Kalchbrenner
Abstract: Repeated applications of the same neural block primarily based on self-attention characterize the current state-of-the-art in neural architectures for machine translation. In such architectures the decoder adopts a masked version of the same encoding block. Although simple, this strategy does not encode the various inductive biases, such as locality, that arise from alternative architectures and that are central to the modelling of translation. We propose Lasagna, an encoder-decoder model that aims to combine the inductive benefits of different architectures by layering multiple instances of different blocks. Lasagna’s encoder first grows the representation from local to mid-sized using convolutional blocks and only then applies a pair of final self-attention blocks. Lasagna’s decoder uses only convolutional blocks that attend to the encoder representation. On a large suite of machine translation tasks, we find that Lasagna not only matches or outperforms the Transformer baseline, but does so more efficiently thanks to widespread use of the efficient convolutional blocks. These findings suggest that the widespread use of uniform architectures may be suboptimal in certain scenarios and that exploiting the diversity of inductive architectural biases can lead to substantial gains.
URL: https://openreview.net/forum?id=mNEqiC924B
---
Title: Contrastive Search Is What You Need For Neural Text Generation
Authors: Yixuan Su, Nigel Collier
Abstract: Generating text with autoregressive language models (LMs) is of great importance to many natural language processing (NLP) applications. Previous solutions for this task often produce text that contains degenerative expressions (Welleck et al., 2020) or lacks semantic consistency (Basu et al., 2021). Recently, Su et al. (2022b) introduced a new decoding method, contrastive search, based on the isotropic representation space of the language model and obtained new state-of-the-art results on various benchmarks. Additionally, Su et al. (2022b) argued that the representations of autoregressive LMs (e.g. GPT-2) are intrinsically anisotropic, a view also shared by previous studies (Ethayarajh, 2019). Therefore, to ensure the language model follows an isotropic distribution, Su et al. (2022b) proposed a contrastive learning scheme, SimCTG, which calibrates the language model’s representations through additional training.
In this study, we first answer the question: “Are autoregressive LMs really anisotropic?” To this end, we extensively evaluate the isotropy of LMs across 16 major languages. Surprisingly, we find that the anisotropy problem exists only in the two specific English GPT-2-small/medium models. All other evaluated LMs, on the other hand, are naturally isotropic, which contrasts with the conclusion drawn by previous studies (Ethayarajh, 2019; Su et al., 2022b). Based on our findings, we further assess the contrastive search decoding method using off-the-shelf LMs on four generation tasks across 16 languages. Our experimental results demonstrate that contrastive search significantly outperforms previous decoding methods without any additional training. More notably, on 12 out of the 16 evaluated languages, contrastive search performs comparably to human-level performance as judged by human evaluations.
URL: https://openreview.net/forum?id=GbkWw3jwL9
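Illustration (not from the abstract): the abstract does not spell out the decoding rule, so the following is only a minimal sketch of one contrastive search step as it is commonly described for Su et al. (2022b). Each top-k candidate is scored by its model probability minus a penalty equal to its maximum cosine similarity to the hidden states of the preceding tokens. The function names, the toy 2-D hidden states, and the weight alpha=0.6 are assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def contrastive_step(candidates, context_states, alpha=0.6):
    """Pick the next token from the top-k candidates.

    candidates: list of (token, probability, hidden_state) tuples.
    context_states: hidden states of the tokens generated so far.
    alpha: assumed trade-off between model confidence and the
           degeneration penalty (alpha=0 reduces to greedy choice).
    """
    best_token, best_score = None, -math.inf
    for token, prob, hidden in candidates:
        # Penalty: how similar this candidate is to anything already said.
        penalty = max(cosine(hidden, h) for h in context_states)
        score = (1 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token

# Toy example: "the" is more probable but repeats the context direction.
context_states = [[1.0, 0.0]]
candidates = [("the", 0.6, [1.0, 0.0]), ("cat", 0.3, [0.0, 1.0])]
chosen = contrastive_step(candidates, context_states, alpha=0.6)
greedy = contrastive_step(candidates, context_states, alpha=0.0)
```

In this toy setting the penalty steers the choice away from the candidate whose representation duplicates the context, while alpha=0 falls back to picking the most probable candidate.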
---
Title: Separable Self-attention for Mobile Vision Transformers
Authors: Sachin Mehta, Mohammad Rastegari
Abstract: Mobile vision transformers (MobileViT) can achieve state-of-the-art performance across several mobile vision tasks, including classification and detection. Though these models have fewer parameters, they have high latency compared to convolutional neural network-based models. The main efficiency bottleneck in MobileViT is the multi-headed self-attention (MHA) in transformers, which has $O(k^2)$ time complexity with respect to the number of tokens (or patches) $k$. Moreover, MHA requires costly operations (e.g., batch-wise matrix multiplication) for computing self-attention, impacting latency on resource-constrained devices. This paper introduces a separable self-attention method with linear complexity, i.e. $O(k)$. A simple yet effective characteristic of the proposed method is that it uses element-wise operations for computing self-attention, making it a good choice for resource-constrained devices. The improved model, MobileViTv2, is state-of-the-art on several mobile vision tasks, including ImageNet object classification and MS-COCO object detection. With about three million parameters, MobileViTv2 achieves a top-1 accuracy of 75.6% on the ImageNet dataset, outperforming MobileViT by about 1% while running $3.2\times$ faster on a mobile device. Our source code is available at: https://github.com/apple/ml-cvnets
URL: https://openreview.net/forum?id=tBl4yBEjKi
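Illustration (not from the abstract): the abstract states only that the method is linear in the number of tokens $k$ and relies on element-wise operations. The sketch below shows one way such an attention layer can be built: a scalar score per token, a single pooled context vector, and an element-wise gate on each token. The ReLU on the value branch, the output projection, and all names (`w_i`, `W_k`, `W_v`, `W_o`) are assumptions for this example, not the released implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def project(X, W):
    """Apply a linear map W (d_in x d_out) to every row of X (k x d_in)."""
    return [[sum(x[i] * W[i][j] for i in range(len(W)))
             for j in range(len(W[0]))] for x in X]

def separable_attention(X, w_i, W_k, W_v, W_o):
    """Attention over k tokens without forming a k x k matrix.

    X: k x d token features; w_i: length-d vector giving one score per token;
    W_k, W_v, W_o: d x d projections (all hypothetical parameters).
    """
    # 1. One scalar context score per token, normalized over the k tokens.
    scores = softmax([sum(a * b for a, b in zip(x, w_i)) for x in X])
    # 2. Pool the projected tokens into a single d-dimensional context vector.
    keys = project(X, W_k)
    d = len(keys[0])
    context = [sum(s * key[j] for s, key in zip(scores, keys)) for j in range(d)]
    # 3. Gate every token's (ReLU-activated) value element-wise by the context.
    values = [[max(0.0, v) for v in row] for row in project(X, W_v)]
    gated = [[c * v for c, v in zip(context, row)] for row in values]
    # 4. Final linear projection; every step touches each token once.
    return project(gated, W_o)

# Toy example: 3 tokens, 2 features, identity projections, uniform scores.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = separable_attention(tokens, [0.0, 0.0], identity, identity, identity)
```

Each of the four steps visits every token a constant number of times, so the cost grows linearly with $k$ for a fixed feature size, in contrast to the pairwise token-to-token scores of standard MHA.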
---
New submissions
===============
Title: Positive Difference Distribution based Image Outlier Detection using Normalizing Flows and Contrastive Data
Abstract: Detecting test data deviating from training data is a central problem for safe and robust machine learning. Likelihoods learned by a generative model, e.g., a normalizing flow via standard log-likelihood training, perform poorly as an outlier score. We propose to use an unlabelled auxiliary dataset and a probabilistic outlier score for outlier detection. We use a self-supervised feature extractor trained on the auxiliary dataset and train a normalizing flow on the extracted features by maximizing the likelihood on in-distribution data and minimizing the likelihood on the contrastive dataset. We show that this is equivalent to learning the normalized positive difference between the in-distribution and the contrastive feature density. We conduct experiments on benchmark datasets and compare to the likelihood, the likelihood ratio and state-of-the-art anomaly detection methods.
URL: https://openreview.net/forum?id=B4J40x7NjA
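Illustration (not from the abstract): a toy, discrete reading of the "normalized positive difference" between an in-distribution density and a contrastive density. The abstract concerns continuous feature densities learned with a normalizing flow; here both densities are replaced by hypothetical three-bin histograms, and the function name is an assumption made for this example.

```python
def normalized_positive_difference(p_in, p_con):
    """Clip (p_in - p_con) at zero per bin, then renormalize to sum to 1."""
    diff = [max(a - b, 0.0) for a, b in zip(p_in, p_con)]
    total = sum(diff)
    if total == 0.0:
        raise ValueError("in-distribution mass never exceeds contrastive mass")
    return [d / total for d in diff]

# Hypothetical histograms over three feature bins.
p_in = [0.5, 0.4, 0.1]    # in-distribution data
p_con = [0.2, 0.2, 0.6]   # contrastive (auxiliary) data
p_pos = normalized_positive_difference(p_in, p_con)
```

Bins where the contrastive data dominates receive zero mass under the resulting distribution, so samples falling there would score as outliers, while bins dominated by in-distribution data keep positive mass.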
---