[CfP] BeTraC: Beyond Transcription Challenge @ IEEE SLT 2026 — Data live, registration open

Thomas Schaaf

to ml-...@googlegroups.com
(Apologies for cross-posting)

Dear colleagues,

We are pleased to announce BeTraC, the Beyond Transcription Challenge, an
IEEE SLT 2026 shared task addressing a foundational question in audio AI:
Can a model reason over speech without first converting it to text?

== Motivation ==

Current speech models still struggle to extract meaning directly from
audio, particularly when the signal includes overlapping speakers,
ambient sounds, and room acoustics. Clinical note generation from
doctor–patient conversations is an ideal stress test: the model must
attend to who said what, filter environmental noise, and produce
faithful structured output.

On the Synth-DoPaCo dataset, end-to-end models hallucinate at alarming
rates — 99–100% of clinical claims are unsupported by the source audio,
compared to 21–23% for traditional transcribe-then-summarize pipelines.
BeTraC aims to close this gap.
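The unsupported-claim percentages above imply claim-level verification of generated notes against the source. The announcement does not specify the evaluation protocol, so the data and the support labels below are purely hypothetical; this is only a toy sketch of what such a rate measures:

```python
# Toy illustration of a claim-level "unsupported claim" rate.
# The claims, labels, and function name here are invented for
# illustration; they are NOT the official BeTraC metric.

def unsupported_claim_rate(claims, supported):
    """Fraction of extracted clinical claims not grounded in the source.

    claims    -- list of claim strings extracted from a generated note
    supported -- parallel list of booleans (True = claim is supported
                 by the source audio)
    """
    if not claims:
        return 0.0
    return sum(1 for ok in supported if not ok) / len(claims)

# Hypothetical example: 3 of 4 generated claims have no support in the audio.
claims = [
    "Patient reports chest pain for two days",
    "Blood pressure 180/110",          # never stated in the audio
    "Prescribed amoxicillin 500 mg",   # never stated in the audio
    "Patient denies smoking",          # never stated in the audio
]
supported = [True, False, False, False]
print(unsupported_claim_rate(claims, supported))  # 0.75
```

A pipeline that hallucinates at the 99-100% rates quoted above would score near 1.0 on a measure of this shape.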

== Tracks (open-weight models only; no intermediate transcription) ==

  * Lightweight (≤ 6B parameters): direct end-to-end audio-to-SOAP.
    No tool use, no agentic pipelines.
  * Heavyweight (≤ 36B parameters): tool use and agentic architectures
    permitted; only the final model generates text from audio.
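Both tracks target SOAP output, the standard clinical-note format with Subjective, Objective, Assessment, and Plan sections. As a rough sketch of what a well-formed submission might look like, here is a minimal structural check; the field names and the example note are illustrative assumptions, not the official challenge schema (see the baseline repository for that):

```python
# Minimal sketch of a SOAP-note structure check. The lowercase section
# keys and the sample note are hypothetical, not the official BeTraC
# output format.

SOAP_SECTIONS = ("subjective", "objective", "assessment", "plan")

def is_valid_soap(note: dict) -> bool:
    """True if the note has all four SOAP sections, each a non-empty string."""
    return all(
        isinstance(note.get(s), str) and note[s].strip()
        for s in SOAP_SECTIONS
    )

note = {
    "subjective": "Patient reports a persistent dry cough for one week.",
    "objective": "Temp 37.2 C; lungs clear to auscultation.",
    "assessment": "Likely post-viral cough.",
    "plan": "Supportive care; follow up in two weeks if not resolved.",
}
print(is_valid_soap(note))  # True
```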

== Synth-DoPaCo dataset ==

  * 8,800 synthetic doctor–patient conversations (~1,329 hours)
  * 66 ambient sound classes, room reverberation, Opus compression
  * Available on Hugging Face: https://huggingface.co/datasets/betrac

== Key dates ==

  * Apr 2, 2026   — Training + dev data release (live now)
  * May 4, 2026   — Open-source inclusion proposals deadline
  * Jun 24, 2026  — System description submission deadline
  * ~Jul 1, 2026  — Test SOAP notes due (~1 week after test audio release)
  * Jul 8, 2026   — Challenge paper submission

Baselines are posted; team registration is open. If you work on speech,
audio understanding, or multimodal AI, we would love to have you
compete.

Website:    https://betrac.github.io
Baselines:  https://github.com/betrac/betrac-2026-baseline
Contact:    bet...@googlegroups.com

The BeTraC organizers:
  Andrew Perrault      (The Ohio State University)
  Jiyun (Amy) Chun     (The Ohio State University)
  Samuele Cornell      (Carnegie Mellon University)
  Siddhant Arora       (Carnegie Mellon University)
  Syed-Amad Hussain    (The Ohio State University / Nationwide Children's Hospital)
  Thomas Schaaf        (Solventum / CMU LTI)
  Markus Müller        (Amazon)
  Leibny Paola Garcia  (Johns Hopkins University, CLSP)
  Ahmed Hassoon        (Johns Hopkins Bloomberg School of Public Health)
