Hi, and thank you for the great work!
I was wondering whether there are any TLA+ datasets (coding problems with problem statements and their solutions). These could be useful in several ways, for language models or for benchmarks. For instance, Aider's benchmark uses Exercism exercises, but Exercism does not support TLA+. If such a dataset existed, I could run Aider against TLA+ exercises with different models, collect the results, and pick, based on those benchmarks, which models perform best for TLA+. Of course, I understand that benchmarks do not objectively measure how good an LLM is at TLA+, but they at least give an idea when choosing a tool/model. TLA+ could even have its own independent benchmark run against different LLMs; it could be shared with the community, and people could run it against all sorts of models and publish the results. I think the CommunityModules are already a good start for a dataset in that regard.
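Just to make the idea concrete, one entry in such a dataset might pair a short natural-language problem statement with a reference TLA+ solution that TLC can check, roughly in the spirit of an Exercism exercise. The exercise, module name, and property below are purely made up for illustration:

    Problem: model a counter that starts at 0 and may only increment by 1;
    show that it never leaves the natural numbers.

    ---- MODULE Counter ----
    EXTENDS Naturals

    VARIABLE count

    Init == count = 0
    Next == count' = count + 1
    Spec == Init /\ [][Next]_count

    \* Invariant the benchmark harness would ask TLC to verify
    TypeOK == count \in Nat
    ====

A harness could then feed the problem statement to a model and score the generated module by running TLC on the stated property, much like Aider scores Exercism solutions with the exercises' unit tests.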
Also, I was thinking this great repo (https://github.com/tlaplus/awesome-tlaplus) could get a dedicated LLM/AI section, eventually pointing to the MCP server PR (I personally didn't know about that work until I saw it on LinkedIn).
Y.