Just published in Nat.Comm.: "Mapping global dynamics of benchmark creation and saturation in artificial intelligence"


Matthias Samwald

Nov 21, 2022, 4:10:29 AM
to AI Evaluation
Hi everyone,

Hopefully of interest to some: we just published a large-scale analysis of AI benchmark data in the journal Nature Communications:

Mapping global dynamics of benchmark creation and saturation in artificial intelligence
Abstract: Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curate data for 3765 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trends towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks are prone to unforeseen bursts. We analyze attributes associated with benchmark popularity, and conclude that future benchmarks should emphasize versatility, breadth and real-world utility.


Short-form summaries:




Jose Hernandez-Orallo

Mar 7, 2023, 3:42:26 AM
to ai-...@googlegroups.com
Dear all,

We're happy to inform you that the event

"Predictable AI: Evaluation, Anticipation and Control"

https://www.predictable-ai.org/march2023event

will be broadcast tomorrow. If you want to follow some of the sessions,
this is the YouTube link:

https://www.youtube.com/watch?v=oG6mPc7Q4Xg

Best wishes,

Jose.

