Wolfram LLM Benchmarking Project


Alex Shkotin

Oct 6, 2024, 5:10:34 AM
to ontolog-forum

Hi All,


In our meeting today, Mike Peters showed, on his second slide, a page from his blog about the Wolfram team's great benchmarking project: 114 LLMs evaluated on coding, with the highest score for "semantics" just 52.2% 🐓


"All LLMs are wrong, but some are useful"


Alex

