Starkly Speaking: BiomniBench: Evaluating AI Agents in Biology

Hannes Stärk

Apr 12, 2026, 4:53:52 PM

to stark...@googlegroups.com

Hi everyone,

Reading group session tomorrow - not a paper, but should be fun!

Speakers:
Kexin Huang and Yunhao Qu from Phylo.

Paper:
BiomniBench: Evaluating AI Agents in Biology https://phylo.bio/blog/evaluating-ai-agents-in-biology
As AI agents become central to biological research, evaluation must keep pace. We examine why existing benchmarks fall short for biology, share lessons from our experience with BixBench including a verified subset, and introduce BiomniBench, a trace-based evaluation framework that scores agents on their analytical process, not just the final answer. Biomni Lab achieves state-of-the-art performance across both general-purpose and domain-specific agents on both benchmarks.

Meeting Details:
Every Monday at 9:00 PT / 12:00 ET / 18:00 CE(S)T
https://zoom.us/j/5775722530?pwd=ZzlGTXlDNThhUDZOdU4vN2JRMm5pQT09

Slack Workspace for discussion and paper voting:
https://join.slack.com/t/logag/shared_invite/zt-2zuxi7gd1-rLUgxg6gnCkhO7WlRsyElg

All information: Schedule of upcoming papers, recordings, mailing list:
https://hannes-stark.com/starkly-speaking

Hannes Stärk

Website: https://hannes-stark.com

PhD student at MIT
