NeurIPS 2025 Competition (Call for Participation): Advancing Theory-of-Mind in LLM Agents


Jianzhu Yao

Aug 25, 2025, 1:07:55 PM

to Machine Learning News

🧠 Join the MindGames Challenge at NeurIPS 2025: Advancing Theory-of-Mind in LLM Agents!

Ready to push the boundaries of AI social intelligence through belief modeling, deception detection, and strategic cooperation?

🌐 Competition Website: https://www.mindgamesarena.com/
📝 Register Now: https://docs.google.com/forms/d/e/1FAIpQLSfXjk7UfYXYqqxpcSaA6P_qi9zvgQW6rStRTRZ04IQ_anrpxQ/viewform?usp=preview

🎮 The Arena: Where LLM Agents Face Theory-of-Mind Challenges

While LLMs have revolutionized NLP, critical questions remain about their abilities to model beliefs, detect deception, coordinate under uncertainty, and plan strategically. MindGames provides the competitive arena where these theory-of-mind capabilities are put to the test through head-to-head gameplay.

Your agents will compete against other teams' agents in games that require:

🧠 Belief modeling and reasoning about others' mental states
🕵️ Deception and detection in adversarial settings
🤝 Strategic coordination under uncertainty
📈 Dynamic planning in multi-round interactions

🚀 What Makes MindGames Unique?

Live Competition, Not Static Evaluation

This isn't a static benchmark; it's a live arena where your agents face off against other teams' agents in real time. Performance is measured through head-to-head matches and ranked with the TrueSkill rating system.
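For readers new to TrueSkill, here is a minimal sketch of the standard two-player, no-draw TrueSkill update, showing how one head-to-head result moves each agent's skill estimate (mean μ) and uncertainty (σ). It is not the arena's rating code: the constants below are the commonly used TrueSkill defaults and are assumed here, draws are ignored, and multi-player games such as Mafia or Codenames would need the full team version of the model. The names `Rating`, `update_1v1`, and `conservative` are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumed defaults (the usual TrueSkill constants, not confirmed for MindGames).
MU0 = 25.0          # initial mean skill
SIGMA0 = MU0 / 3    # initial uncertainty
BETA = SIGMA0 / 2   # performance noise per match
TAU = SIGMA0 / 100  # dynamics noise added before each update


@dataclass
class Rating:
    mu: float = MU0
    sigma: float = SIGMA0


def _pdf(x: float) -> float:
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    # Standard normal cumulative distribution.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update_1v1(winner: Rating, loser: Rating,
               beta: float = BETA, tau: float = TAU) -> tuple[Rating, Rating]:
    """Return updated (winner, loser) ratings after a decisive 1v1 match."""
    var_w = winner.sigma ** 2 + tau ** 2
    var_l = loser.sigma ** 2 + tau ** 2
    c = math.sqrt(2 * beta ** 2 + var_w + var_l)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)   # size of the mean shift (large for upsets)
    w = v * (v + t)         # fraction of variance removed
    new_w = Rating(winner.mu + var_w / c * v,
                   math.sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_l = Rating(loser.mu - var_l / c * v,
                   math.sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_w, new_l


def conservative(r: Rating, k: float = 3.0) -> float:
    # A common leaderboard score: penalize agents whose skill is still uncertain.
    return r.mu - k * r.sigma


a, b = update_1v1(Rating(), Rating())
print(f"winner mu={a.mu:.2f} sigma={a.sigma:.2f}; loser mu={b.mu:.2f}")
```

With two fresh agents, the winner's mean rises by about the same amount the loser's falls, and both uncertainties shrink. This is why consistent results across many matches, not a single win, are what move an agent up a TrueSkill-ranked leaderboard.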

Natural Language Gameplay

All agent communication happens through natural language. Your agents must navigate complex social dynamics using text alone.

Battle Weekends

Every Saturday and Sunday (12 PM ET), the arena comes alive with enhanced support and increased participation, making weekends ideal for testing and iterating on your strategies!

Four Strategic Games to Master

🎭 Mafia - Social deduction with hidden roles and persuasion
🦌 Three Player IPD - Mixed-motive cooperation dynamics
⚔️ Colonel Blotto - Resource allocation under competition
🗣️ Codenames - Communication and team coordination
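Colonel Blotto is the easiest of the four games to state precisely. Below is a minimal sketch of the classic scoring rule, assuming a two-player variant with a fixed troop budget, non-negative integer allocations, each battlefield won by whoever commits strictly more troops, and tied battlefields awarded to no one. The arena's actual number of battlefields, budget, round structure, and tie rules are not given in this announcement, so `blotto_outcome` and the example numbers are hypothetical.

```python
from typing import Sequence


def blotto_outcome(a: Sequence[int], b: Sequence[int], budget: int) -> int:
    """Score one Colonel Blotto round: +1 if A wins, -1 if B wins, 0 if tied."""
    if len(a) != len(b):
        raise ValueError("both players must cover the same battlefields")
    for alloc in (a, b):
        # Assumed rule: allocations are non-negative and spend the full budget.
        if any(x < 0 for x in alloc) or sum(alloc) != budget:
            raise ValueError(f"invalid allocation {list(alloc)}")
    fields_a = sum(x > y for x, y in zip(a, b))  # battlefields A wins outright
    fields_b = sum(y > x for x, y in zip(a, b))  # battlefields B wins outright
    return (fields_a > fields_b) - (fields_a < fields_b)


# Hypothetical setup: 3 battlefields, 20 troops each.
print(blotto_outcome([10, 6, 4], [7, 7, 6], budget=20))
print(blotto_outcome([8, 8, 4], [7, 7, 6], budget=20))
```

The first call returns -1: A over-commits to one battlefield and loses the other two. The second returns 1: narrowly beating B on two battlefields is enough. That trade-off between concentrating and spreading resources against an opponent whose allocation you must predict is what makes the game a test of modeling the other player.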

💡 Two Divisions, One Arena

Open Division

Bring your most powerful models
No size restrictions
Test any approach - closed or open source

Efficient Agent Division

Max 8B parameters
Open-source models only
Fair competition for resource-constrained research

🏆 How It Works

Build your agent using any LLM or approach you prefer
Deploy to our game framework using the provided starter kit
Compete against other teams in automated matches
Climb the leaderboard through strategic gameplay
Iterate and improve based on match results

💰 Rewards for Excellence

🎁 $500 Modal Labs GPU Credits for EVERY team with a valid submission!

💵 $10,000+ Prize Pool:

$9,000 for top leaderboard positions
$1,000 for research impact
Growing prize pool with additional sponsors

🏅 NeurIPS 2025 Recognition: Top teams present at the conference

🛠️ Everything You Need to Compete

Game playing framework - Just plug in your agent
Starter kit with baseline agents to build upon
Game engines for local testing
Active Discord for strategy discussions
Documentation for all game rules and APIs

⏰ Competition Timeline

✅ Arena Open: July 7 - October 7, 2025
📅 Final Submissions: October 7, 2025
📅 Winners Announced: October 15, 2025
📅 NeurIPS Presentation: December 2025

🔬 Why Compete?

MindGames offers a unique opportunity to:

Test your agents against the current best in the field
Explore how LLMs handle theory-of-mind challenges
Learn from diverse strategies employed by other teams
Contribute to understanding AI's social intelligence capabilities
Network with top researchers in multi-agent AI

👥 World-Class Organization

Competition organized by researchers from UT Austin, Princeton, TextArena, Sentient Foundation, Radboud University, NYU Shanghai, King's College London, and Meta.

Supported by Modal Labs, Sentient Foundation, Mithril, TextArena, and Intersection Research.

🚦 Ready to Enter the Arena?

No game-theory expertise required: if you can prompt or fine-tune an LLM, you can compete! The arena is waiting to test what your agents can do when facing other minds.

The arena is live. The games have begun. Will your agents rise to the challenge?
