Daily TMLR digest for Jul 15, 2024

TMLR

Jul 15, 2024, 12:00:07 AM

to tmlr-anno...@googlegroups.com

New certifications
==================

Survey Certification: Vision-Language Instruction Tuning: A Review and Analysis

Chen Li, Yixiao Ge, Dian Li, Ying Shan

https://openreview.net/forum?id=ul2tbUPtIQ

---

Accepted papers
===============

Title: Vision-Language Instruction Tuning: A Review and Analysis

Authors: Chen Li, Yixiao Ge, Dian Li, Ying Shan

Abstract: Instruction tuning is a crucial supervised training phase in Large Language Models (LLMs), aiming to enhance the LLM's ability to generalize instruction execution and adapt to user preferences. With the increasing integration of multi-modal data into LLMs, there is growing interest in Vision-Language Instruction Tuning (VLIT), which presents more complex characteristics compared to pure text instruction tuning. In this paper, we systematically review the latest VLIT settings and corresponding datasets in multi-modal LLMs and provide insights into the intrinsic motivations behind their design. For the first time, we offer a detailed multi-perspective categorization for existing VLIT datasets and identify the characteristics that high-quality VLIT data should possess. By incorporating these characteristics as guiding principles into the existing VLIT data construction process, we conduct extensive experiments and verify their positive impact on the performance of tuned multi-modal LLMs. Furthermore, we discuss the current challenges and future research directions of VLIT, providing insights for the continuous development of this field. The code and dataset related to this paper have been open-sourced at https://github.com/palchenli/VL-Instruction-Tuning.

URL: https://openreview.net/forum?id=ul2tbUPtIQ

---

Title: Harnessing the Power of Federated Learning in Federated Contextual Bandits

Authors: Chengshuai Shi, Ruida Zhou, Kun Yang, Cong Shen

Abstract: Federated learning (FL) has demonstrated great potential in revolutionizing distributed machine learning, and tremendous efforts have been made to extend it beyond the original focus on supervised learning. Among many directions, federated contextual bandits (FCB), a pivotal integration of FL and sequential decision-making, has garnered significant attention in recent years. Despite substantial progress, existing FCB approaches have largely employed their tailored FL components, often deviating from the canonical FL framework. Consequently, even renowned algorithms like FedAvg remain under-utilized in FCB, let alone other FL advancements. Motivated by this disconnection, this work takes one step towards building a tighter relationship between the canonical FL study and the investigations on FCB. In particular, a novel FCB design, termed FedIGW, is proposed to leverage a regression-based CB algorithm, i.e., inverse gap weighting. Compared with existing FCB approaches, the proposed FedIGW design can better harness the entire spectrum of FL innovations, which is concretely reflected as (1) flexible incorporation of (both existing and forthcoming) FL protocols; (2) modularized plug-in of FL analyses in performance guarantees; (3) seamless integration of FL appendages (such as personalization, robustness, and privacy). We substantiate these claims through rigorous theoretical analyses and empirical evaluations.

URL: https://openreview.net/forum?id=Z8wcREe9qV

---

Title: Diversity-Preserving $K$-Armed Bandits, Revisited

Authors: Hedi Hadiji, Sébastien Gerchinovitz, Jean-Michel Loubes, Gilles Stoltz

Abstract: We consider the bandit-based framework for diversity-preserving recommendations introduced by Celis et al. (2019), who approached it in the case of a polytope mainly by a reduction to the setting of linear bandits. We design a UCB algorithm using the specific structure of the setting and show that it enjoys a bounded distribution-dependent regret in the natural cases when the optimal mixed actions put some probability mass on all actions (i.e., when diversity is desirable). The regret lower bounds provided show that otherwise, at least when the model is mean-unbounded, a $\ln T$ regret is suffered. We also discuss an example beyond the special case of polytopes.

URL: https://openreview.net/forum?id=Viz7KBqO4A

---

Title: Masked multi-prediction for multi-aspect anomaly detection

Authors: Yassine Naji, Romaric Audigier, Aleksandr Setkov, Angelique Loesch, Michèle Gouiffès

Abstract: In this paper, we address the anomaly detection problem in the context of heterogeneous normal observations and propose an approach that accounts for this heterogeneity. Although prediction-based methods are common to learn normality, the vast majority of previous work predicts a single outcome, which is generally not sufficient to account for the multiplicity of possible normal observations. To address this issue, we introduce a new masked multi-prediction (MMP) approach that produces multiple likely normal outcomes, and show both theoretically and experimentally that it improves normality learning and leads to a better anomaly detection performance. In addition, we observed that normality can be characterized from multiple aspects, depending on the types of anomalies to be detected. Therefore, we propose an adaptation (MMP-AMS) of our approach to cover multiple aspects of normality such as appearance, motion, semantics and location. Since we model each aspect separately, our approach has the advantage of being interpretable and modular, as we can select only a subset of normality aspects. The experiments conducted on several benchmarks show the effectiveness of the proposed approach.

URL: https://openreview.net/forum?id=7wybYcK1pw

---

New submissions
===============

Title: Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)

Abstract: Can we obtain insights about the brain using AI models? How is the information in deep learning models related to brain recordings? Can we improve AI models with the help of brain recordings? Such questions can be tackled by studying brain recordings like functional magnetic resonance imaging (fMRI). As a first step, the neuroscience community has contributed several large cognitive neuroscience datasets related to passive reading/listening/viewing of concept words, narratives, pictures, and movies. Encoding and decoding models using these datasets have also been proposed in the past two decades. These models serve as additional tools for basic cognitive science and neuroscience research. Encoding models aim at generating fMRI brain representations given a stimulus automatically. They have several practical applications in evaluating and diagnosing neurological conditions and thus may also help design therapies for brain damage. Decoding models solve the inverse problem of reconstructing the stimuli given the fMRI. They are useful for designing brain-machine or brain-computer interfaces. Inspired by the effectiveness of deep learning models for natural language processing, computer vision, and speech, several neural encoding and decoding models have been recently proposed. In this survey, we will first discuss popular representations of language, vision and speech stimuli, and present a summary of neuroscience datasets. Further, we will review popular deep learning based encoding and decoding architectures and note their benefits and limitations. Finally, we will conclude with a summary and discussion about future trends. Given the large amount of recently published work in the computational cognitive neuroscience (CCN) community, we believe that this survey enables an entry point for DNN researchers to diversify into CCN research.

URL: https://openreview.net/forum?id=YxKJihRcby

---

Title: Multiple-Resolution Tokenization for Time Series Forecasting with an Application to Pricing

Abstract: We propose a transformer architecture for time series forecasting with a focus on time series tokenisation and apply it to a real-world prediction problem from the pricing domain. Our architecture aims to learn effective representations at many scales across all available data simultaneously. The model contains a number of novel modules: a differentiated form of time series patching which employs multiple resolutions, a multiple-resolution module for time-varying known variables, a mixer-based module for capturing cross-series information, and a novel output head with favourable scaling to account for the increased number of tokens. We present an application of this model to a real-world prediction problem faced by the markdown team at a very large retailer. In the experiments conducted, our model outperforms the in-house models and the selected existing deep learning architectures.

URL: https://openreview.net/forum?id=dknvQtQNja

---
