[CfP] BeTraC: Beyond Transcription Challenge @ IEEE SLT 2026 — Data live, registration open
Thomas Schaaf
10:12 AM
to ml-...@googlegroups.com
(Apologies for cross-posting)
Dear colleagues,
We are pleased to announce BeTraC, the Beyond Transcription Challenge, an IEEE SLT 2026 shared task addressing a foundational question in audio AI: Can a model reason over speech without first converting it to text?
== Motivation ==
Current speech models still struggle to extract meaning directly from audio, particularly when the signal includes overlapping speakers, ambient sounds, and room acoustics. Clinical note generation from doctor–patient conversations is an ideal stress test: the model must attend to who said what, filter environmental noise, and produce faithful structured output.
On the Synth-DoPaCo dataset, end-to-end models hallucinate at alarming rates — 99–100% of clinical claims are unsupported by the source audio, compared to 21–23% for traditional transcribe-then-summarize pipelines. BeTraC aims to close this gap.
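To make the percentages above concrete, here is a minimal sketch of how an unsupported-claim rate could be computed. It is not the official BeTraC metric: this announcement does not say how clinical claims are extracted or judged, so the `Claim` type and its `supported` flag are hypothetical stand-ins for whatever the evaluation actually uses.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One clinical claim taken from a generated SOAP note (hypothetical type)."""
    text: str
    supported: bool  # assumed label: True if the claim is grounded in the source audio


def unsupported_rate(claims):
    """Return the fraction of claims that are not supported by the source audio."""
    if not claims:
        raise ValueError("no claims to score")
    unsupported = sum(1 for claim in claims if not claim.supported)
    return unsupported / len(claims)


# Toy example: 3 of 4 claims are unsupported.
example = [
    Claim("Patient reports chest pain", True),
    Claim("Patient takes metformin", False),
    Claim("Blood pressure 120/80", False),
    Claim("Follow-up in two weeks", False),
]
rate = unsupported_rate(example)
```

Under this reading, a rate of 0.99 to 1.0 would mean that almost every claim in a generated note has no support in the audio.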
== Tracks (open-weight models only; no intermediate transcription) ==
* Lightweight (≤ 6B parameters): direct end-to-end audio-to-SOAP. No tool use, no agentic pipelines.
* Heavyweight (≤ 36B parameters): tool use and agentic architectures permitted; only the final model generates text from audio.
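The track limits above can be sketched as a simple eligibility check. This is an illustration under stated assumptions, not an official rule checker: it reads "B" as 10^9 parameters, it assumes a small tool-free system could also enter the Heavyweight track, and the function name and arguments are hypothetical. How parameters are counted (for example, total versus active) is not specified in this announcement.

```python
# Parameter budgets from the track descriptions, assuming "B" means 10**9.
LIGHTWEIGHT_MAX_PARAMS = 6_000_000_000
HEAVYWEIGHT_MAX_PARAMS = 36_000_000_000


def eligible_tracks(num_params, uses_tools=False):
    """Return the tracks a system could enter, given its size and tool use.

    num_params: parameter count of the system (counting convention is an assumption).
    uses_tools: True if the system relies on tool use or an agentic pipeline.
    """
    tracks = []
    # Lightweight: size limit plus no tool use or agentic pipelines.
    if num_params <= LIGHTWEIGHT_MAX_PARAMS and not uses_tools:
        tracks.append("lightweight")
    # Heavyweight: larger size limit; tool use is permitted but not required (assumed).
    if num_params <= HEAVYWEIGHT_MAX_PARAMS:
        tracks.append("heavyweight")
    return tracks


small_direct = eligible_tracks(4_000_000_000)
small_agentic = eligible_tracks(4_000_000_000, uses_tools=True)
too_large = eligible_tracks(40_000_000_000)
```

Both tracks additionally require open-weight models and no intermediate transcription, which a check on size alone cannot verify.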
== Synth-DoPaCo dataset ==
* 8,800 synthetic doctor–patient conversations (~1,329 hours)
* 66 ambient sound classes, room reverberation, Opus compression
* Available on Hugging Face: https://huggingface.co/datasets/betrac
== Key dates ==
* Apr 2, 2026: Training + dev data release (live now)
* May 4, 2026: Open-source inclusion proposals deadline
* Jun 24, 2026: System description submission deadline
* ~Jul 1, 2026: Test SOAP notes due (~1 week after test audio release)
* Jul 8, 2026: Challenge paper submission
Baselines are posted; team registration is open. If you work on speech, audio understanding, or multimodal AI, we would love to have you compete.
The BeTraC organizers:
Andrew Perrault (The Ohio State University)
Jiyun (Amy) Chun (The Ohio State University)
Samuele Cornell (Carnegie Mellon University)
Siddhant Arora (Carnegie Mellon University)
Syed-Amad Hussain (The Ohio State University / Nationwide Children's Hospital)
Thomas Schaaf (Solventum / CMU LTI)
Markus Müller (Amazon)
Leibny Paola Garcia (Johns Hopkins University, CLSP)
Ahmed Hassoon (Johns Hopkins Bloomberg School of Public Health)