Call for participation - The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models

Wenwu Wang

Jan 5, 2026, 12:52:13 PM

to 'George Angelos Papadopoulos' via Machine Learning News

The Interspeech 2026 Audio Encoder Capability Challenge, hosted by Xiaomi, the University of Surrey, Tsinghua University, and Dataocean AI, evaluates pre-trained audio encoders as front-end modules for large audio language models (LALMs), focusing on their ability to understand and represent audio semantics in complex scenarios.

The challenge adopts a unified, end-to-end generative evaluation framework. Participants need only submit a pre-trained encoder; the organizers carry out all downstream-task training and evaluation. For this purpose, the organizers provide XARES-LLM, an open-source evaluation benchmark. Given a participant's audio encoder, XARES-LLM automatically downloads the training data, trains a typical LALM on top of the encoder, and then evaluates it on a range of downstream tasks, reporting a score for each.
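To make the division of labour concrete, here is a minimal, hypothetical Python sketch of the contract between an audio encoder and a typical LALM front end: the encoder maps a mono waveform to a sequence of frame embeddings, and a linear projector maps each embedding to the language model's input dimension. `ToyFrameEncoder`, `project`, the 16 kHz sampling rate, the 40 ms hop, and all dimensions are illustrative assumptions. This is not the XARES-LLM interface, whose actual requirements are documented on the challenge website, and the two hand-crafted features merely stand in for what would be a trained neural network in a real submission.

```python
import math
from typing import List

Matrix = List[List[float]]


class ToyFrameEncoder:
    """Hypothetical stand-in for a pre-trained audio encoder (not XARES-LLM's API)."""

    def __init__(self, sampling_rate: int = 16000, hop_ms: int = 40, output_dim: int = 4):
        self.sampling_rate = sampling_rate
        self.hop = sampling_rate * hop_ms // 1000  # samples per output frame
        self.output_dim = output_dim

    def __call__(self, waveform: List[float]) -> Matrix:
        """Return one output_dim-sized embedding per non-overlapping frame."""
        embeddings = []
        for start in range(0, len(waveform) - self.hop + 1, self.hop):
            frame = waveform[start:start + self.hop]
            log_energy = math.log(sum(x * x for x in frame) / self.hop + 1e-8)
            zero_crossings = sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))
            # Two toy features, zero-padded to output_dim (a real encoder is learned)
            features = [log_energy, zero_crossings / self.hop]
            embeddings.append(features + [0.0] * (self.output_dim - len(features)))
        return embeddings


def project(embeddings: Matrix, weight: Matrix) -> Matrix:
    """Linear projector: (T x D) embeddings times (D x H) weight gives (T x H)."""
    return [[sum(e * w for e, w in zip(row, col)) for col in zip(*weight)] for row in embeddings]


# One second of a 440 Hz tone at 16 kHz yields 25 non-overlapping 40 ms frames
sr = 16000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
encoder = ToyFrameEncoder(sampling_rate=sr)
emb = encoder(tone)
llm_inputs = project(emb, [[0.1] * 8 for _ in range(encoder.output_dim)])  # hypothetical H = 8
print(len(emb), len(emb[0]), len(llm_inputs[0]))  # 25 4 8
```

In the challenge setting, only the encoder half of this picture is supplied by the participant; the projector and language model are trained by the organizers' pipeline.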

More details can be found at: https://dataoceanai.github.io/Interspeech2026-Audio-Encoder-Challenge/

Timeline:

December 15, 2025: Challenge announcement
February 12, 2026, 23:59 AoE: Submission deadline
February 20, 2026: Final ranking announced
February 25, 2026, 23:59 AoE: Paper submission deadline
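Both deadlines are given in Anywhere on Earth (AoE) time, which is a fixed UTC-12 offset with no daylight saving: a deadline has passed only once it has passed everywhere on the planet. The short sketch below, using only Python's standard `datetime` module, converts the two AoE deadlines to UTC; the dictionary labels are illustrative.

```python
from datetime import datetime, timedelta, timezone

# AoE is a fixed UTC-12 offset
AOE = timezone(timedelta(hours=-12), "AoE")

deadlines = {
    "Submission deadline": datetime(2026, 2, 12, 23, 59, tzinfo=AOE),
    "Paper submission deadline": datetime(2026, 2, 25, 23, 59, tzinfo=AOE),
}

for name, when in deadlines.items():
    utc = when.astimezone(timezone.utc)
    print(f"{name}: {when:%Y-%m-%d %H:%M} AoE = {utc:%Y-%m-%d %H:%M} UTC")
# Submission deadline: 2026-02-12 23:59 AoE = 2026-02-13 11:59 UTC
# Paper submission deadline: 2026-02-25 23:59 AoE = 2026-02-26 11:59 UTC
```

In February the UK is on GMT, so these UTC times are also UK local times.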

Apologies for cross-posting.

Best wishes,

Wenwu

--
Wenwu Wang

Professor of Signal Processing and Machine Learning,

Centre for Vision Speech and Signal Processing (CVSSP)

Associate Head of External Engagement,

School of Computer Science and Electronic Engineering

AI Fellow,

Surrey Institute for People Centred AI

University of Surrey

Guildford, GU2 7XH
United Kingdom
Phone: +44 (0) 1483 686039
Fax: +44 (0) 1483 686031
Email: w.w...@surrey.ac.uk