Dear colleagues,
📢 Call for Participation: CAR-bench Challenge @ IJCAI-ECAI 2026, Bremen
The first competition on LLM agent reliability. Build an agent that completes multi-turn tasks across 58 tools and 19 domain policies in an automotive voice assistant setting and, critically, also knows when to refuse, ask for clarification, or admit it cannot help. Baseline frontier LLMs reach only 58% consistency. Can your agent harness, planning, self-verification, or reliability design do better?
Website: https://car-bench.github.io/car-bench/
Co-organized by Elisabeth André (Univ. Augsburg), Lukas Stappen (BMW), Patrick Dreisch (Anthropic), Natalia Vassilieva (Cerebras), Raj Tumuluri (OpenStream.ai), Erik Cambria (NTU Singapore), Iryna Gurevych (TU Darmstadt), Varin Sikka (Stanford), and Johannes Kirmayr (BMW / Univ. Augsburg).
Prizes:
Two tracks:
Winners receive the opportunity to present at IJCAI-ECAI 2026 and co-author the competition report paper.
Key dates (AoE):
The competition runs online.
Accepted at ACL 2026 Main · HF Paper of the Day · 1st at UC Berkeley AgentX
Please share with colleagues and students who might be interested.
Best regards,
CAR-bench organizing team