NeurIPS 2025 Competition (Call for Participation): Advancing Theory-of-Mind in LLM Agents

Jianzhu Yao

Aug 25, 2025, 1:07:55 PM
to Machine Learning News

🧠 Join the MindGames Challenge at NeurIPS 2025: Advancing Theory-of-Mind in LLM Agents!

Ready to push the boundaries of AI social intelligence through belief modeling, deception detection, and strategic cooperation?

🌐 Competition Website: https://www.mindgamesarena.com/
📝 Register Now: https://docs.google.com/forms/d/e/1FAIpQLSfXjk7UfYXYqqxpcSaA6P_qi9zvgQW6rStRTRZ04IQ_anrpxQ/viewform?usp=preview

🎮 The Arena: Where LLM Agents Face Theory-of-Mind Challenges

While LLMs have revolutionized NLP, critical questions remain about their abilities to model beliefs, detect deception, coordinate under uncertainty, and plan strategically. MindGames provides the competitive arena where these theory-of-mind capabilities are put to the test through head-to-head gameplay.

Your agents will compete against other teams' agents in games that require:

  • 🧠 Belief modeling and reasoning about others' mental states
  • 🕵️ Deception and detection in adversarial settings
  • 🤝 Strategic coordination under uncertainty
  • 📈 Dynamic planning in multi-round interactions
🚀 What Makes MindGames Unique?
Live Competition, Not Static Evaluation

This isn't a static benchmark - it's a live competitive arena where your agents face other teams' agents in real time. Performance is measured through head-to-head matches and ranked with the TrueSkill rating system.
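
For readers new to TrueSkill, here is a minimal sketch of the textbook two-player, no-draw update. It is only an illustration: the announcement does not say how the arena configures TrueSkill or how it rates games with more than two players (such as Mafia), so the parameter values below (mu = 25, sigma = 25/3, beta = 25/6, the commonly cited defaults) and the names `Rating` and `update_ratings` are assumptions for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """Skill belief: mean `mu` and uncertainty `sigma` (assumed defaults)."""
    mu: float = 25.0
    sigma: float = 25.0 / 3.0


BETA = 25.0 / 6.0  # assumed performance noise; not taken from the competition


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_ratings(winner: Rating, loser: Rating, beta: float = BETA):
    """Two-player TrueSkill update with no draws and no skill drift."""
    c = math.sqrt(2.0 * beta**2 + winner.sigma**2 + loser.sigma**2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)   # size of the mean shift (large for upsets)
    w = v * (v + t)         # fraction of variance removed, in (0, 1)

    def shifted(r: Rating, sign: float) -> Rating:
        mu = r.mu + sign * (r.sigma**2 / c) * v
        sigma = r.sigma * math.sqrt(1.0 - (r.sigma**2 / c**2) * w)
        return Rating(mu, sigma)

    return shifted(winner, +1.0), shifted(loser, -1.0)


new_winner, new_loser = update_ratings(Rating(), Rating())
print(new_winner, new_loser)
```

The practical takeaway: an unexpected win moves the mean more than an expected one, and every match shrinks the uncertainty, so ratings settle as an agent plays more games.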

Natural Language Gameplay

All agent communication happens through natural language. Your agents must navigate complex social dynamics using text alone.

Battle Weekends

Every weekend (Saturday-Sunday, 12 PM ET), the arena comes alive with enhanced support and increased participation - perfect for testing and iterating on your strategies!

Four Strategic Games to Master

  1. 🎭 Mafia - Social deduction with hidden roles and persuasion
  2. 🦌 Three-Player IPD (Iterated Prisoner's Dilemma) - Mixed-motive cooperation dynamics
  3. ⚔️ Colonel Blotto - Resource allocation under competition
  4. 🗣️ Codenames - Communication and team coordination
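
To give a feel for one of the games above, here is a minimal sketch of how a single round of classic Colonel Blotto can be scored. The arena's exact rules (number of battlefields, troop budget, tie handling) are not given in this announcement, so the function name `score_blotto_round` and the rule that each player must allocate exactly `budget` units are assumptions for illustration only.

```python
from typing import Sequence


def score_blotto_round(a: Sequence[int], b: Sequence[int], budget: int) -> int:
    """Return +1 if player A wins more battlefields, -1 if B does, 0 on a tie.

    Assumed rules: both players split exactly `budget` units across the same
    battlefields, and the larger allocation wins each battlefield.
    """
    if len(a) != len(b):
        raise ValueError("both players must allocate to the same battlefields")
    for name, alloc in (("A", a), ("B", b)):
        if any(units < 0 for units in alloc):
            raise ValueError(f"player {name} has a negative allocation")
        if sum(alloc) != budget:
            raise ValueError(f"player {name} must allocate exactly {budget} units")

    a_fields = sum(x > y for x, y in zip(a, b))  # battlefields A wins outright
    b_fields = sum(y > x for x, y in zip(a, b))  # battlefields B wins outright
    return (a_fields > b_fields) - (a_fields < b_fields)


# A concentrates on two fields and concedes the third; B spreads evenly.
result = score_blotto_round([10, 10, 0], [7, 7, 6], budget=20)
print(result)
```

Even this toy version shows why the game tests opponent modeling: no allocation is best against every opponent, so a good agent has to predict where the other side will commit.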
💡 Two Divisions, One Arena
Open Division
  • Bring your most powerful models
  • No size restrictions
  • Test any approach - closed or open source
Efficient Agent Division
  • Max 8B parameters
  • Open-source models only
  • Fair competition for resource-constrained research
🏆 How It Works
  1. Build your agent using any LLM or approach you prefer
  2. Deploy to our game framework using the provided starter kit
  3. Compete against other teams in automated matches
  4. Climb the leaderboard through strategic gameplay
  5. Iterate and improve based on match results
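
Since all gameplay is natural language, step 1 can be pictured as wrapping a model behind a text-in, text-out interface. The sketch below is not the starter kit's API: `PromptedAgent`, its `act` method, and `stub_model` are hypothetical names used only to show the shape of such an agent. Consult the competition documentation for the real interface.

```python
from typing import Callable


class PromptedAgent:
    """Hypothetical agent: turns game observations into text actions.

    `generate` is any callable that maps a prompt string to a reply string,
    e.g. a call to a hosted LLM or a local fine-tuned model.
    """

    def __init__(self, generate: Callable[[str], str], system_prompt: str) -> None:
        self.generate = generate
        self.system_prompt = system_prompt
        self.history: list[str] = []  # running transcript of the match

    def act(self, observation: str) -> str:
        """Record the observation, query the model, and return its action."""
        self.history.append(observation)
        prompt = "\n\n".join([self.system_prompt, *self.history, "Your move:"])
        action = self.generate(prompt).strip()
        self.history.append(f"(you) {action}")
        return action


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call so the example runs offline."""
    return " cooperate \n"


agent = PromptedAgent(stub_model, "You are playing a repeated cooperation game.")
first_action = agent.act("Round 1: choose cooperate or defect.")
print(first_action)
```

Keeping the model call behind a single callable makes it easy to swap a stub, a prompted API model, or a fine-tuned checkpoint while iterating on strategy between matches.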
💰 Rewards for Excellence

🎁 $500 Modal Labs GPU Credits for EVERY team with a valid submission!

💵 $10,000+ Prize Pool:

  • $9,000 for top leaderboard positions
  • $1,000 for research impact
  • Growing prize pool with additional sponsors

🏅 NeurIPS 2025 Recognition: Top teams present at the conference

🛠️ Everything You Need to Compete
  • Game playing framework - Just plug in your agent
  • Starter kit with baseline agents to build upon
  • Game engines for local testing
  • Active Discord for strategy discussions
  • Documentation for all game rules and APIs
Competition Timeline
  • Arena Open: July 7 - October 7, 2025
  • 📅 Final Submissions: October 7, 2025
  • 📅 Winners Announced: October 15, 2025
  • 📅 NeurIPS Presentation: December 2025
🔬 Why Compete?

MindGames offers a unique opportunity to:

  • Test your agents against the current best in the field
  • Explore how LLMs handle theory-of-mind challenges
  • Learn from diverse strategies employed by other teams
  • Contribute to understanding AI's social intelligence capabilities
  • Network with top researchers in multi-agent AI
👥 World-Class Organization

Competition organized by researchers from UT Austin, Princeton, TextArena, Sentient Foundation, Radboud University, NYU Shanghai, King's College London, and Meta.

Supported by Modal Labs, Sentient Foundation, Mithril, TextArena, and Intersection Research.

🚦 Ready to Enter the Arena?

No game theory expertise required - if you can prompt or fine-tune an LLM, you can compete! The arena is waiting to test what your agents can do when facing other minds.

The arena is live. The games have begun. Will your agents rise to the challenge?
