Hi All,
In our meeting today, Mike Peters showed on his second slide a page from his blog about the Wolfram team's great benchmarking project: 114 LLMs were evaluated on coding, and the highest score for "semantics" was just 52.2% 🐓
"All LLMs are wrong, but some are useful"
Alex