Call for participation
The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models
The Interspeech 2026 Audio Encoder Capability Challenge, hosted by Xiaomi, the University of Surrey, Tsinghua University, and Dataocean AI, evaluates pre-trained audio encoders as front-end modules for large audio language models (LALMs), focusing on their ability to understand and represent audio semantics in complex scenarios.
The challenge adopts a unified end-to-end generative evaluation framework. Participants only need to submit a pre-trained encoder model; the downstream training and evaluation are carried out by the organizers using the XARES-LLM benchmark, an open-source evaluation system. XARES-LLM builds a typical LALM on top of the submitted audio encoder, automatically downloads the training data, trains the LALM, and then evaluates it on a range of downstream tasks, reporting a score for each.
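To make the workflow concrete, below is a minimal sketch of the kind of encoder module a participant might submit: a model that maps raw waveforms to a sequence of frame-level embeddings that the LALM can attend over. The class name, method signature, and dimensions here are illustrative assumptions for this sketch, not the official XARES-LLM submission interface; please consult the challenge materials for the actual specification.

    # Illustrative only: a toy audio encoder wrapper. Names and shapes are
    # assumptions for this sketch, NOT the official XARES-LLM interface.
    import torch
    import torch.nn as nn

    class ToyAudioEncoder(nn.Module):
        """Maps a batch of 16 kHz waveforms to a sequence of frame embeddings."""

        def __init__(self, embed_dim: int = 512, hop: int = 320):
            super().__init__()
            # A single strided convolution stands in for a real pre-trained encoder.
            self.frontend = nn.Conv1d(1, embed_dim, kernel_size=2 * hop, stride=hop)
            self.embed_dim = embed_dim  # a downstream LALM adapter would read this

        def forward(self, waveform: torch.Tensor) -> torch.Tensor:
            # waveform: (batch, samples) -> embeddings: (batch, frames, embed_dim)
            x = self.frontend(waveform.unsqueeze(1))
            return x.transpose(1, 2)

    if __name__ == "__main__":
        encoder = ToyAudioEncoder()
        audio = torch.randn(2, 16000)  # two 1-second clips at 16 kHz
        print(encoder(audio).shape)    # torch.Size([2, 49, 512])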
Timeline:
- December 15, 2025: Challenge announcement
- February 12, 2026, 23:59 AoE: Submission deadline
- February 20, 2026: Final ranking announced
- February 25, 2026, 23:59 AoE: Paper submission deadline
Apologies for cross-posting.
Best wishes,
Wenwu
--
Wenwu Wang
Professor of Signal Processing and Machine Learning,
Centre for Vision Speech and Signal Processing (CVSSP)
Associate Head of External Engagement,
School of Computer Science and Electronic Engineering
AI Fellow,
Surrey Institute for People Centred AI
University of Surrey