Starkly Speaking: BiomniBench: Evaluating AI Agents in Biology


Hannes Stärk

Apr 12, 2026, 4:53:52 PM
to stark...@googlegroups.com
Hi all,

Reading group session tomorrow: not a paper this time, but it should be fun!

Speakers:
Kexin Huang and Yunhao Qu from Phylo.

Blog post:
BiomniBench: Evaluating AI Agents in Biology https://phylo.bio/blog/evaluating-ai-agents-in-biology
As AI agents become central to biological research, evaluation must keep pace. We examine why existing benchmarks fall short for biology, share lessons from our experience with BixBench, including a verified subset, and introduce BiomniBench, a trace-based evaluation framework that scores agents on their analytical process, not just the final answer. Biomni Lab achieves state-of-the-art performance on both benchmarks, compared against both general-purpose and domain-specific agents.

Meeting Details:
Every Monday at 9:00 PT / 12:00 ET / 18:00 CE(S)T
https://zoom.us/j/5775722530?pwd=ZzlGTXlDNThhUDZOdU4vN2JRMm5pQT09

Slack workspace for discussion and paper voting:
https://join.slack.com/t/logag/shared_invite/zt-2zuxi7gd1-rLUgxg6gnCkhO7WlRsyElg

All information: Schedule of upcoming papers, recordings, mailing list:
https://hannes-stark.com/starkly-speaking

Hannes Stärk
PhD student at MIT