Daily TMLR digest for Jun 30, 2024


TMLR

12:00 AM
to tmlr-anno...@googlegroups.com


Accepted papers
===============


Title: Bytes Are All You Need: Transformers Operating Directly On File Bytes

Authors: Maxwell Horton, Sachin Mehta, Ali Farhadi, Mohammad Rastegari

Abstract: Modern deep learning approaches usually utilize modality-specific processing. For example, the most common deep learning approach to image classification involves decoding image file bytes into an RGB tensor which is passed into a neural network. Instead, we investigate modality-independent representation learning by performing classification directly on file bytes, without the need for decoding files at inference time. This enables models to operate on various modalities without any hand-designed, modality-specific processing. Our model, ByteFormer, improves ImageNet Top-1 classification accuracy by $5\%$ (from $72.2\%$ to $77.33\%$) relative to DeiT models of similar size. Compared to Perceiver IO, our model requires absolutely no modality-specific processing at inference time, and uses an order of magnitude fewer parameters at equivalent accuracy on ImageNet. We demonstrate that the same ByteFormer architecture can perform audio classification without modifications or modality-specific preprocessing. We achieve $95.42\%$ classification accuracy on the Speech Commands V2 dataset (comparable to the state-of-the-art accuracy of $98.7\%$). Additionally, we demonstrate that ByteFormer can operate jointly on images and audio, handling joint classification without explicit knowledge of the input modality. We release our code at https://github.com/apple/corenet/tree/main/projects/byteformer.

URL: https://openreview.net/forum?id=RkaqxxAOfN
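Illustrative sketch (not the authors' released ByteFormer code, which is linked above): the core idea of classifying undecoded file bytes can be approximated by embedding each byte value and passing the sequence through a standard Transformer encoder. The class name, hyperparameters, and file path below are placeholder assumptions.

    import torch
    import torch.nn as nn

    class ByteClassifier(nn.Module):
        """Toy byte-level classifier: embed raw bytes, encode, mean-pool, classify."""
        def __init__(self, num_classes, dim=256, depth=4, heads=4, max_len=4096):
            super().__init__()
            # 256 possible byte values plus one padding index
            self.byte_embed = nn.Embedding(257, dim, padding_idx=256)
            self.pos_embed = nn.Parameter(torch.zeros(1, max_len, dim))
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            self.head = nn.Linear(dim, num_classes)

        def forward(self, byte_ids):
            # byte_ids: (batch, seq_len) integers in [0, 255], 256 reserved for padding
            x = self.byte_embed(byte_ids) + self.pos_embed[:, :byte_ids.size(1)]
            return self.head(self.encoder(x).mean(dim=1))

    # Usage: feed the undecoded bytes of an image (or audio) file directly.
    with open("example.jpg", "rb") as f:  # hypothetical file path
        raw = torch.tensor(list(f.read()[:4096])).unsqueeze(0)
    logits = ByteClassifier(num_classes=1000)(raw)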

---


New submissions
===============


Title: Conditional Idempotent Generative Networks

Abstract: We propose Conditional Idempotent Generative Networks (CIGN), a new approach that expands upon Idempotent Generative Networks (IGN) to enable conditional generation.
While IGNs offer efficient single-pass generation, they lack the ability to control the content of the generated data.
CIGNs address this limitation by incorporating conditioning mechanisms, allowing users to steer the generation process towards specific types of data.

We establish the theoretical foundations for CIGNs, outlining their scope, loss function and evaluation metrics.
We then present two potential architectures for implementing CIGNs, which we call channel conditioning and filter conditioning.
We discuss experimental results obtained on the MNIST dataset, demonstrating the effectiveness of both conditioning approaches.
Our findings pave the way for further exploration of CIGNs on larger datasets and more complex use cases.

URL: https://openreview.net/forum?id=VOKmQLsl6C
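A minimal sketch of one plausible reading of "channel conditioning", assuming it means broadcasting a class embedding over the spatial grid and concatenating it to the input as extra channels before the idempotent mapping f. The actual CIGN architectures, the filter-conditioning variant, and the full IGN loss (the tightness term is omitted here) may differ; all names and hyperparameters below are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ChannelConditionedF(nn.Module):
        """Toy conditional network: label embedding concatenated as extra input channels."""
        def __init__(self, num_classes, in_ch=1, cond_ch=8):
            super().__init__()
            self.cond_embed = nn.Embedding(num_classes, cond_ch)
            self.net = nn.Sequential(  # stand-in for the idempotent mapping f
                nn.Conv2d(in_ch + cond_ch, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, in_ch, 3, padding=1),
            )

        def forward(self, x, y):
            # Broadcast the label embedding over the spatial grid, then concat as channels.
            b, _, h, w = x.shape
            cond = self.cond_embed(y).view(b, -1, 1, 1).expand(-1, -1, h, w)
            return self.net(torch.cat([x, cond], dim=1))

    # Schematic IGN-style objectives under the same condition y:
    # reconstruct real data and stay idempotent on generated samples.
    f = ChannelConditionedF(num_classes=10)
    x = torch.randn(4, 1, 28, 28)        # MNIST-sized batch (random stand-in data)
    z = torch.randn_like(x)              # latent noise in image shape
    y = torch.randint(0, 10, (4,))       # class labels used for conditioning

    fz = f(z, y)
    loss_rec = F.mse_loss(f(x, y), x)                       # f(x, y) should reconstruct x
    loss_idem = F.mse_loss(f(fz.detach(), y), fz.detach())  # f(f(z), y) should equal f(z, y)
    loss = loss_rec + loss_idem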

---

Title: Deep Neural Networks Can Learn Generalizable Same-Different Visual Relations

Abstract: Although deep neural networks can achieve human-level performance on many object recognition benchmarks, prior work suggests that these same models fail to learn simple abstract relations, such as determining whether two objects are the same or different. Much of this prior work focuses on training convolutional neural networks to classify images of two same or two different abstract shapes, testing generalization on within-distribution stimuli. In this article, we comprehensively study whether deep neural networks can acquire and generalize same-different relations both within and out-of-distribution using a variety of architectures, forms of pretraining, and fine-tuning datasets. We find that certain pretrained transformers can learn a same-different relation that generalizes with near perfect accuracy to out-of-distribution stimuli. Furthermore, we find that fine-tuning on abstract shapes that lack texture or color provides the strongest out-of-distribution generalization. Our results suggest that, with the right approach, deep neural networks can learn generalizable same-different visual relations.

URL: https://openreview.net/forum?id=fYIO7nQrTZ
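A hedged sketch of one setup the abstract points to: fine-tuning a pretrained Vision Transformer as a binary same/different classifier. This is not the authors' exact protocol (they compare multiple architectures, forms of pretraining, and fine-tuning datasets); the pretrained weights, learning rate, and data layout below are assumptions.

    import torch
    import torch.nn as nn
    from torchvision.models import vit_b_16, ViT_B_16_Weights

    model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
    # Swap the 1000-way ImageNet head for a 2-way same/different head.
    model.heads.head = nn.Linear(model.heads.head.in_features, 2)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    criterion = nn.CrossEntropyLoss()

    def train_step(images, labels):
        # images: (batch, 3, 224, 224) renders containing two shapes each;
        # labels: 1 if the two shapes are the same, 0 if different.
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()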

---