Hi All,
We invite submissions of 4-page short papers for our upcoming workshop on evaluating AI agents. This workshop focuses on rigorous evaluation methods, RL environment design principles, benchmarks, and real-world case studies, with a particular interest in systematic measurement of agent behavior.
We welcome work-in-progress research; submissions need not be complete, polished papers.
Topics of Interest
- Evaluation Techniques: Interventional evaluation, causal/counterfactual methods, and automated graders (verifiers, rubrics, LLM-as-a-judge).
- Data and Benchmarks: New benchmarks, analyses of existing ones, and discussions on design validity and limitations.
- RL Environments for LLM-Based Agents: Effective environment design, synthetic data, tool design, and infrastructure.
- Enterprise Case Studies: Evaluation in production settings and lessons from successful deployments.
- Specific Capabilities: Code execution, NL2SQL, computer use, multimodal I/O, MCP tools, memory, and web search.
Submission Information
- Review Process: Single-blind (non-anonymized).
- Presentation: Accepted papers will be presented in an interactive poster session. High-scoring papers will be selected for Contributed Talks and will be eligible for the Best Paper/Poster Award.
- Requirements: Submissions currently under review elsewhere are permitted, but previously published papers are not. At least one author of each accepted paper must register for and attend the workshop.
Key Dates
- Submission Deadline: May 11, 2026
- Notification: May 18, 2026
- Camera-ready: May 22, 2026
- Workshop Date: May 26, 2026
We look forward to your submissions and to discussing the future of AI agent evaluation. For questions, please contact the organizers at rl-...@googlegroups.com.
On behalf of the organizers,
Rasool