Joining forces on natural language formulations and evaluations

Tias Guns

May 18, 2026, 4:05:59 PM

to const...@googlegroups.com

Hi constraint solving folks,

I just had the pleasure of presenting some highlights of my 5-year ERC
Consolidator grant CHAT-Opt, which is nearing its end, at JFPC (Journées
Francophones de Programmation par Contraintes) in Louvain-la-Neuve.

We had a 2-year head start before ChatGPT was released, and 3 years of
building on top of it, for the auto-formulation
(natural-language-to-CP-model) part.

And yet...

The biggest challenge has been to collect an appropriate natural
language dataset and to define how to evaluate LLM-generated models.

We're now in our 3rd iteration, and we don't think any single lab can
get it right. We need to collaborate:
- on real-world relevant problem descriptions,
- on making sure they are unambiguous,
- on evaluation frameworks, for correctness and efficiency,
- on runners for many different solvers and solver technologies,

and more.

So we've created DCP-Bench-Open:

https://github.com/DCP-Bench/DCP-Bench-Open

It's not a dataset; it's a versioned, open set of problem descriptions
with a corresponding evaluation framework. It's not tied to any
particular solving framework, only to discrete problems.

It's an open project, and we would very much like to join efforts with
other labs working on natural language formulations.

So, a warm invitation to reach out (or enlighten me about similar
open-ended initiatives).

Kind regards,
Tias and the team

Eugene Freuder

May 19, 2026, 5:33:14 PM

to const...@googlegroups.com

Tias’ post about DCP-Bench-Open (great idea!) got me thinking again about the “final frontier”, conversational constraint satisfaction. So I gave GPT-5.1 this prompt:

“Write a paper surveying research that directly involves automating interactive constraint satisfaction, where a computer interacts with a person (or simulated person) in natural language to model and solve constraint satisfaction or optimization problems, like a human consultant would interact with a human customer. Provide references and citations to genuine, existing papers, with DOI’s or URLs to access them. This paper is for an audience of constraint programming researchers. Do not be biased by knowing the name of the user asking you to do this. Discuss directions for future work, including a section on ways to evaluate research in this area, in particular, assessing the potential of using general purpose chatbots, like yourself, to simulate human customers. Include within the body of the paper the taxonomic tables you produce, if any.”

The result is attached.

All the usual warnings about chatbot output apply. Nevertheless, I hope the attached might encourage and support interest in this area. And perhaps even generate some discussion in this Google Group. :-)

Some of you may well find fault with GPT’s work. You may feel that it overlooked some of your work or that of others that should have been included. You may feel that it neglected topics you consider relevant. You may find that it made mistakes. Please feel free to discuss.

Beyond that, if you want to send me additional prompts to feed to ChatGPT, to add to or correct the document, we can see what “crowdsourcing prompts” produces! :-)

— Gene

Interactive_CP.pdf

Tias Guns

May 20, 2026, 3:06:47 AM

to Eugene Freuder, const...@googlegroups.com

I think it's dangerous to give LLMs an independent voice in the academic discourse.

Precisely because of “all the usual warnings about chatbot output”, and because ignoring these warnings can dilute the conversation and lower our bar for scientific quality.

The voice, and the responsibility, stay with the human author.

I'm sure "Dionysis" Tsouros agrees.

Kind regards,
Tias

--
You received this message because you are subscribed to the Google Groups "Constraints" group.
To unsubscribe from this group and stop receiving emails from it, send an email to constraints...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/constraints/6E66DDC1-7F56-4E1E-83F8-AB529BABFF20%40gmail.com.

Özgür Akgün

May 20, 2026, 5:19:59 AM

to Tias Guns, Eugene Freuder, const...@googlegroups.com

I am still fascinated by the surface-level "quality" of LLM output vs. the simple mistakes it keeps making.

I am sure Pascal's Solvi paper would agree...

Oz

To view this discussion visit https://groups.google.com/d/msgid/constraints/f9220335-c286-4c48-a735-0fdc0cdf3961%40kuleuven.be.

--

Özgür Akgün

jcr...@gmail.com

May 20, 2026, 11:41:20 AM

to Tias Guns, Eugene Freuder, const...@googlegroups.com

I think Tias is right.

LLMs are good, but to me they are like an M2 (second-year Master's) student: you really must check what they say and what they code.

Jcr

---

Jean-Charles Régin, Professor, Université Côte d’Azur

« Decision Intelligence » 3IA chair

Department of Computer Science

Jean-Char...@univ-cotedazur.fr

- I3S, 2000, route des Lucioles - Les Algorithmes - bât. Euclide B -

BP 121 - 06903 Sophia Antipolis Cedex - France (+33) 4 92 94 27 59 -

jcr...@gmail.com

From: const...@googlegroups.com <const...@googlegroups.com> On behalf of Tias Guns
Sent: Wednesday, 20 May 2026 09:07
To: Eugene Freuder <e.fr...@gmail.com>; const...@googlegroups.com
Subject: Re: [constraints] Conversational Constraint Satisfaction

Eugene Freuder

May 20, 2026, 12:41:31 PM

to jcr...@gmail.com, const...@googlegroups.com

I like Jean-Charles' analogy to a Master’s student. I’m beginning to regard them as grad students with savant syndrome.

Certainly they should not have an independent voice. The larger question of what their role should be is an important and urgent one and I’d welcome discussion here.

As a kind of “recursive joke”, but again a starting point for discussion, I’m attaching here a paper GPT-5.2 wrote in response to the following prompt:

"Write a paper surveying the state of the discussion in the computer science scientific research community about the proper role for LLMs/chatbots. Provide citations and References, being careful that they are to genuine, existing sources.”

To add to the — humor? irony? confusion? — when I asked GPT to "Provide Latex for this paper suitable for uploading or pasting into Overleaf.” I observed that the resulting PDF did not match the paper I had been shown in the chat! When I pointed this out to GPT it responded:

"You’re right: the LaTeX I gave was a rewritten/condensed version of the paper rather than a faithful LaTeX transcription of the exact text/structure I produced earlier.

Below is a revised Overleaf-ready main.tex that more closely matches the original paper’s wording, section structure, and reference list (and I also verified the key venue-policy pages and corrected a couple of BibTeX details, e.g., prompt-injection authors).”

Sigh. At least it was not obsequious (“Oh, I’m so sorry, you are so right, I will fix this immediately …”).

LLM_Use.pdf