Dear AIMC Community,
We are happy to announce the ICME 2026 Grand Challenge on Academic Text-to-Music Generation! The ATTM challenge focuses on training text-to-music models from scratch using a standardized, CC-licensed dataset. The goal is to shift attention away from data scale and pre-trained black-box models, and toward algorithmic efficiency, model design, and musical intelligence.
The ATTM challenge will launch on February 10, and the final audio submission will be due on April 23. Please see below for more information.
We look forward to contributions from the AIMC community! If you have any questions, please feel free to contact us.
Best regards,
Hao-Wen Dong, on behalf of the ATTM Challenge Organizers
---
Challenge Website
Key Principles
Core generative models must be trained from scratch
- No pre-trained weights are allowed for the main music generation model.
Auxiliary components may use public checkpoints, including:
- Audio tokenizers / autoencoders
- Audio Language Models (ALMs) for captioning
- Vocoders or audio enhancement models
Proprietary or non-reproducible models are strictly prohibited.
Fully automatic generation only
- No human-in-the-loop annotation, manual editing, or cherry-picking of samples is allowed.
Instrumental music only
- All training data is processed to remove vocals, and generated outputs must be purely instrumental.
Standardized text prompts
- Organizers will provide official caption sets (generated by Music Flamingo or Qwen2-Audio) to ensure consistent evaluation across teams, though teams may create their own captions using public ALMs.
Tracks
Efficiency Track
- Maximum of 500M parameters for the core generative model (see the parameter-counting sketch after the track descriptions)
- Designed to encourage innovation in efficient architectures
- Suitable for student teams and resource-constrained labs
Performance Track
- No parameter limit
- Focuses on pushing the upper bound of performance under academic data constraints
- Suitable for teams exploring large or complex architectures
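For reference, here is a minimal sketch of one common way to check a model against the 500M-parameter cap, assuming the core generative model is a PyTorch nn.Module; the organizers' official verification procedure may differ.

    # Illustrative only: count the parameters of a PyTorch model.
    # The organizers' official verification procedure may differ.
    import torch.nn as nn

    def count_parameters(model: nn.Module) -> int:
        return sum(p.numel() for p in model.parameters())

    # Example usage (core_generative_model is your model instance):
    # n = count_parameters(core_generative_model)
    # print(f"{n / 1e6:.1f}M parameters (Efficiency Track limit: 500M)")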
Awards
We are proud to partner with Moises to offer cash awards to the best-performing teams in this challenge.
Efficiency Track
- First Prize: $1,000 USD
- Second Prize: $500 USD
Performance Track
- First Prize: $1,000 USD
- Second Prize: $500 USD
Evaluation Criteria
All teams will generate 100 audio samples based on a hidden set of test prompts provided by the organizers. Evaluation is performed on these submitted samples.
Phase 1: Objective Evaluation (Scorecard)
All submissions are first ranked using a composite score based on the following metrics:
- Audio Quality — Fréchet Audio Distance (FAD)
- Measures distributional similarity between generated audio and a hidden reference set.
- Semantic Alignment — CLAP Score
- Evaluates how well the generated audio matches the input text prompt.
- Concept Coverage Score (CCS / K–M Metric)
- Each prompt contains M musical concepts (e.g., tempo, instrumentation, style).
- Audio Language Models act as blind judges to detect whether each concept is present.
- A score of K / M is assigned per prompt, where K is the number of concepts judged present, and scores are averaged across the evaluation set (see the sketch after this list).
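To make the K/M aggregation concrete, here is a minimal sketch; it is not the official evaluation code, and judge_concept is a hypothetical stand-in for the blind ALM judge.

    # Illustrative sketch (not the official evaluation code) of aggregating
    # a Concept Coverage Score over an evaluation set.

    def judge_concept(audio_path: str, concept: str) -> bool:
        """Hypothetical placeholder: ask an audio language model whether the
        given concept (e.g., tempo, instrumentation, style) is present."""
        raise NotImplementedError

    def concept_coverage_score(samples: list[tuple[str, list[str]]]) -> float:
        """Average K/M over prompts: each prompt lists M concepts,
        and K of them are judged present in the generated audio."""
        per_prompt = []
        for audio_path, concepts in samples:
            m = len(concepts)                                        # M concepts
            k = sum(judge_concept(audio_path, c) for c in concepts)  # K detected
            per_prompt.append(k / m)
        return sum(per_prompt) / len(per_prompt)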
Phase 2: Human Evaluation (MOS)
Based on Phase 1 rankings, the Top N teams per track advance to a formal Mean Opinion Score (MOS) study conducted by expert listeners. (N will be determined after the registration deadline based on the number of participants in each track.)
Evaluation dimensions include:
- Audio Quality
- Musicality (rhythmic stability, harmonic progressions, phrasing)
- Prompt Adherence
How to Participate
Registration
Teams must register before March 20 to indicate their intent to participate. Registration helps organizers prepare evaluation resources and does not require a completed system. Registration instructions and links will be released at the official launch.
Submission
Teams must submit their final entries before April 23. Final submissions must include:
- Generated audio for 100 hidden test prompts
- Format: WAV or MP3
- Sample rate: 44.1 kHz
- Duration: exactly 10 seconds, which is the length used for evaluation (see the preparation sketch below)
- Model code for parameter verification and reproducibility
- A short Grand Challenge paper (up to 4 pages): Grand Challenge papers are required only from finalist teams (announced April 30) and must be submitted by May 15. Papers from non-finalist teams, though not required, are still encouraged and welcome.
Detailed submission instructions will be released at the official launch.
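Before submitting, it may help to normalize your files to the required format. Below is a minimal sketch, assuming the librosa and soundfile packages are available and that mono output is acceptable; it resamples a clip to 44.1 kHz and trims or zero-pads it to exactly 10 seconds. Follow the official submission instructions, once released, for the authoritative requirements.

    # Preparation sketch (assumptions: librosa and soundfile installed, mono OK).
    # Resamples to 44.1 kHz and trims/zero-pads to exactly 10 seconds.
    import librosa
    import numpy as np
    import soundfile as sf

    TARGET_SR = 44100
    TARGET_LEN = 10 * TARGET_SR  # exactly 10 seconds of samples

    def prepare_clip(in_path: str, out_path: str) -> None:
        audio, _ = librosa.load(in_path, sr=TARGET_SR, mono=True)   # resample on load
        if len(audio) >= TARGET_LEN:
            audio = audio[:TARGET_LEN]                               # trim to 10 s
        else:
            audio = np.pad(audio, (0, TARGET_LEN - len(audio)))      # zero-pad to 10 s
        sf.write(out_path, audio, TARGET_SR)                         # write 44.1 kHz WAV

    # Example usage:
    # prepare_clip("raw/sample_001.wav", "submission/sample_001.wav")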
Top teams will be invited to present their work at the ICME 2026 Grand Challenge session.
Important Dates
Feb 10: Official launch
Mar 20: Registration deadline
Mar 30: Dry-run submission deadline (pipeline verification)
Apr 20: Final test prompts released
Apr 23: Final audio submission deadline (72-hour window)
Apr 30: Finalists announcement
May 15: Grand Challenge paper submission deadline
May 22: Final MOS results, announcement of winners, and paper acceptance notification
May 30: Camera-ready and author registration deadline
Organizers
Yi-Hsuan (Eric) Yang
Hao-Wen (Herman) Dong
Hung-Yi Lee
Fang-Chih (Andrew) Hsieh
Wei-Jaw (Lonian) Lee