Call for participation - The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models

Wenwu Wang

Jan 5, 2026, 12:52:13 PM
to 'George Angelos Papadopoulos' via Machine Learning News
Call for participation 
The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models
The Interspeech 2026 Audio Encoder Capability Challenge, hosted by Xiaomi, the University of Surrey, Tsinghua University and Dataocean AI, evaluates pre-trained audio encoders as front-end modules for large audio language models (LALMs), focusing on their ability to understand and represent audio semantics in complex scenarios.
The challenge adopts a unified end-to-end generative evaluation framework. Participants only need to submit a pre-trained encoder model; downstream task training and evaluation are carried out by the organizers. The organizers provide the XARES-LLM benchmark, an open-source evaluation system that trains a typical LALM using the audio encoder supplied by the user. The system automatically downloads the training data, trains the LALM, and then tests it on various downstream tasks, reporting a score for each.
Timeline:
  • December 15, 2025: Challenge announcement
  • February 12, 2026, 23:59 AoE: Submission deadline
  • February 20, 2026: Final ranking announced
  • February 25, 2026, 23:59 AoE: Paper submission deadline
Apologies for cross-posting.
Best wishes,
 
Wenwu
 
 
--
Wenwu Wang

Professor of Signal Processing and Machine Learning,
Centre for Vision Speech and Signal Processing (CVSSP)

Associate Head of External Engagement, 
School of Computer Science and Electronic Engineering

AI Fellow,
Surrey Institute for People Centred AI

University of Surrey
Guildford, GU2 7XH
United Kingdom
Phone: +44 (0) 1483 686039
Fax: +44 (0) 1483 686031
Email: w.w...@surrey.ac.uk
