A chat about the "Holy Grail"


Eugene Freuder

Mar 18, 2026, 4:24:32 PM
to Constraints
I just had an interesting chat with ChatGPT on progress towards the "Holy Grail": 


I hope this will generate some discussion in this Group! 

(Note I had a prepared list of inputs for the chat, so didn't follow up at the time on the directions ChatGPT proposed, though that could be interesting as well.)

Please note that a panel is planned for CP 2026 on "Thirty Years of Progress Towards the Holy Grail". Maybe we should have ChatGPT (or Claude or ...) occupy a chair. :-) 

Eugene Freuder

Mar 18, 2026, 8:52:58 PM
to Constraints
And here is what Claude had to say:


(Incidentally, I'm just using the free ChatGPT and Claude websites.)

What do YOU have to say?

-- Gene

Luis Quesada

Mar 19, 2026, 12:54:35 PM
to Eugene Freuder, Constraints
Dear Gene,

Thanks for sharing the chat—very interesting indeed.

It seems to me that there is a great opportunity for symbolic reasoning in the verification of code that has been automatically generated. I would be very interested in learning more about efforts that are already underway in this area within our community.

On a separate note, the idea of having ChatGPT as a member of the panel does not sound too crazy to me. In fact, I have to say that ChatGPT understands my Spanish accent much better than Zoom. Indeed, sometimes Zoom asks me to confirm my language (i.e., Zoom’s polite way of saying that it is struggling with my accent:-)). ChatGPT understands me perfectly. So much so that I now find myself interacting more frequently with ChatGPT via voice, often using prompts that are more than 100 words long. I think this is a very good way of practising (for non-native speakers), since ChatGPT’s pronunciation is quite good.

Cheers,  
Luis

--
You received this message because you are subscribed to the Google Groups "Constraints" group.
To unsubscribe from this group and stop receiving emails from it, send an email to constraints...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/constraints/3852fe6d-8688-4bbc-9e70-3f2951d8603dn%40googlegroups.com.

Eugene Freuder

Mar 25, 2026, 1:37:07 PM
to Constraints
There are 349 members of this Group. I've only received one (1) response (thanks, Luis!) to the posts about my conversations with ChatGPT and Claude relating to the Holy Grail. In general, I'd like to see this Group used more as a discussion forum, in addition to the excellent service it performs as a bulletin board. And I think this particular discussion could be a very good one. So I'm going to try bribing -- er, encouraging :-) participation, as follows: If we can involve at least 10 additional participants (approximately 3% of the Group membership) in this discussion within the next week (by April 1, and I'm not fooling), one of the discussion participants will be offered the opportunity to be a member of the panel at CP'26 on Thirty Years of Progress Towards the Holy Grail! (Of course, you are welcome to participate in the discussion even if you cannot serve on the panel.) 


Deepak Mehta

Mar 25, 2026, 6:37:40 PM
to Eugene Freuder, Constraints

Dear Gene, Luis, and colleagues,

Thank you for sharing the conversations.

One observation that emerges strongly from both the ChatGPT and Claude exchanges is that the Holy Grail implicitly assumes that the decision problem can be clearly defined. Much of the discussion understandably focuses on the ability of systems to translate natural language descriptions into formal models and solver code. Recent progress in generative AI is indeed impressive in this respect. LLMs are clearly accelerating the journey from well-articulated problem descriptions to candidate mathematical models, significantly reducing the cost of exploring alternative formulations.

However, my experience building decision optimisation systems is that the main difficulty often arises earlier in the pipeline.

Real-world optimisation problems rarely arrive as clean natural-language inputs. Instead, they typically emerge through an iterative process involving conflicting stakeholder perspectives, partially defined objectives, evolving data definitions, hidden constraints, and shifting priorities. A substantial portion of the effort is devoted to clarifying what the decision problem actually is: what is controllable, which trade-offs are acceptable, and which constraints are genuine requirements versus temporary artefacts of the current process.

In this sense, the main difficulty is not purely mathematical; it is epistemic.

Recent research on LLM-based modelling often assumes a pipeline of the form:

clear natural language description → mathematical model → solver

In practice, the process often looks more like:

vague problem context (business) → understood decision problem (technical, and possibly imperfect) → solution approach → mathematical model → solver

This process is deeply iterative and continues after deployment, as processes evolve, systems change, new constraints arrive, policies are adjusted, and data changes accumulate.

Luis’ point about symbolic reasoning for verification is also relevant here. The combination of generative methods with formal verification, explanation, and iterative refinement loops may bridge the gap.

In my experience, explanation itself remains a challenge. I often encounter questions such as “Why is the solver recommending x = 5?”. When conflicts arise, explaining them purely in terms of technical constraint interactions is often insufficient. Explanations need to be lifted to a higher level of abstraction so that business stakeholders can understand, trust, and gain confidence in the system's behaviour.

LLMs are clearly accelerating model construction, which is a significant step forward toward the Holy Grail. However, a stronger interpretation of the Holy Grail also requires progress in accelerating problem understanding.

I look forward to the discussion.

Best regards,
Deepak

Steven Kelk

Mar 26, 2026, 5:00:26 AM
to Constraints
Hi Gene, all,

I'm something of a lurker in the CP world but I use it occasionally (in particular, MiniZinc) together with a blend of ILP, SAT and branching/data-reduction techniques from parameterized complexity.

I read the transcripts from ChatGPT and Claude with interest. ChatGPT's response in particular seems very mature, although as Deepak mentions the process of coming to a model in the first place is a complex iterative social process.

The question about the Holy Grail is always at the back of my mind. I teach various courses about (I)LP and CP, and I am always explaining that solver-based technologies are important because they allow the separation of specification and implementation. Although the truth is always more nuanced -- what happens if the first formulation runs too slowly? -- I find it pedagogically clearer, at least when teaching students who have not seen formulate-then-solve technologies before, to focus on the cases where the one-shot approach does work.
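The separation of specification and implementation can be shown with a toy sketch: the problem is stated declaratively (variables, domains, constraints) and a generic solver does the rest. This is illustrative only -- the region names and the naive backtracking solver are made up for the example, and real systems such as MiniZinc add propagation, global constraints, and search heuristics on top of the same idea.

```python
def solve(domains, constraints, assignment=None):
    """Generic backtracking search over a declarative specification.

    `domains` maps each variable to its list of candidate values;
    `constraints` is a list of predicates over a partial assignment.
    The solver knows nothing about the problem being solved.
    """
    if assignment is None:
        assignment = {}
    if len(assignment) == len(domains):
        return dict(assignment)  # all variables assigned: a solution
    var = next(v for v in domains if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        # Keep this value only if no constraint is violated so far.
        if all(check(assignment) for check in constraints):
            result = solve(domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]  # backtrack and try the next value
    return None

# Specification: 3-colour a small (hypothetical) map so that
# adjacent regions get different colours.
domains = {r: ["red", "green", "blue"] for r in ["WA", "NT", "SA", "Q"]}
adjacent = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"), ("SA", "Q")]

def different(a, b):
    # Constraint holds vacuously until both variables are assigned.
    return lambda asg: a not in asg or b not in asg or asg[a] != asg[b]

constraints = [different(a, b) for a, b in adjacent]
solution = solve(domains, constraints)
```

The point of the exercise is that changing the problem means editing only the specification (`domains`, `constraints`); the solver is untouched -- which is exactly the property that makes the one-shot formulate-then-solve story pedagogically clean.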

Now of course the LLMs are getting very good at modelling. When the one-shot formulation runs too slowly, I notice that, at least for fairly clean models, the LLMs are also pretty good at suggesting possible speed-ups. I suspect the next phase is one where the AI is more actively and autonomously involved in the phase after the initial formulation has been tried (and found to be too slow). I don't currently use Claude Code, simply because my job doesn't require much coding and experimentation at the moment, but it may be that this is already happening as a result of the growing power (and ever-larger context windows) of such tools. I would be curious to hear from anyone using tools like Claude Code in this way (in relation to CP, ILP, etc.).

Cheers,

Steven







Steven Prestwich

Mar 27, 2026, 7:21:51 AM
to Constraints
Hi, Gene et al.

Sorry for the slow response: this is an interesting discussion but
I've been swamped with end-of-semester duties!

I like Claude's point that LLMs don't solve the Holy Grail because
they need humans in the loop.  ChatGPT's point that LLMs have only
recently contributed, and haven't yet done much, seems obvious as they
only recently became available.  But they're already transforming
software engineering.  Programmers have said that they spent their
lives writing code because they enjoy it, but that their job has been
reduced to prompt engineering.

I think we should expect constraint modelling to follow a similar path
to software engineering: needing human interaction to check
correctness, and to cope with the many other considerations such as
compactness, efficiency with respect to different solvers, etc.  I
like ChatGPT's idea of "LLM-assisted modeling pipelines" and its
vision for the future is persuasive (I wonder whose writing that's
based on?).

Maybe our role as constraint modellers will move toward prompt
engineering, and our role as researchers will be to publish new
techniques that LLMs can then recommend to others.  This wouldn't be a
full solution to the Holy Grail but it would be great progress.

Steve

Gerhard Friedrich

Mar 27, 2026, 11:17:56 AM
to Constraints, Eugene Freuder
Dear Gene,

Thank you very much for your work on the “Holy Grail” and for insisting on pushing forward. I consider this critical for the future of the field.

I am referring to “large-scale” combinatorial decision problems (actually generating an acceptable solution) and optimization problems in automated engineering (configuration) and manufacturing scheduling, which we encountered at Siemens and voestalpine (see the problems formulated in the papers cited below). To some extent this also applies to hard problem instances that are not large.

At most 10% of the problems at Siemens can be solved in acceptable runtime just by modeling and then using a general problem solver (CP/ASP/MIP/SAT ....). Usually heavy engineering of human experts is required. So currently we are very far off the holy grail in these application areas regarding solving performance. 

BUT, we have signs that LLMs are probably providing a significant step (leap?) forward towards the holy grail of solving performance. Unfortunately, it is too early to say more, but I'm very optimistic regarding the progress in the near future. 

Best regards,
Gerhard Friedrich

============  

@article{FleischanderlFHSS98,
  author       = {Gerhard Fleischanderl and
                  Gerhard Friedrich and
                  Alois Haselb{\"{o}}ck and
                  Herwig Schreiner and
                  Markus Stumptner},
  title        = {Configuring Large Systems Using Generative Constraint Satisfaction},
  journal      = {{IEEE} Intell. Syst.},
  volume       = {13},
  number       = {4},
  pages        = {59--68},
  year         = {1998},
  url          = {https://doi.org/10.1109/5254.708434},
  doi          = {10.1109/5254.708434},
  timestamp    = {Sun, 25 Jul 2021 11:43:03 +0200},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

@article{FalknerFHSS16,
  author       = {Andreas A. Falkner and
                  Gerhard Friedrich and
                  Alois Haselb{\"{o}}ck and
                  Gottfried Schenner and
                  Herwig Schreiner},
  title        = {Twenty-Five Years of Successful Application of Constraint Technologies
                  at {S}iemens},
  journal      = {{AI} Mag.},
  volume       = {37},
  number       = {4},
  year         = {2016},
  doi          = {10.1609/AIMAG.V37I4.2688},
  timestamp    = {Sun, 25 Jul 2021 11:38:38 +0200},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
@article{zuccato2025energy,
  title={Energy-Aware Double-Flexible Job Shop Scheduling with Machine Modes and Setup Times: A Real-World Industrial Case Study using Constraint Programming},
  author={Zuccato, Francesco and Rodler, Patrick and Friedrich, Gerhard and Schekotihin, Konstantin and Comploi-Taupe, Richard},
  journal={Proceedings of the ECAI Workshop on AI-based Planning for Complex Real-World Applications (CAIPI 2025), co-located with ECAI 2025},
  volume={4103},
  pages={84--99},
  year={2025},
  publisher={CEUR-WS.org}
}





Alessio Pellegrino

Mar 30, 2026, 5:19:22 AM
to Gerhard Friedrich, Constraints, Eugene Freuder
Hi all,
Forgive me, but I am going to go a little against the emergent sentiment: I usually avoid using LLMs to get any opinion: they tend to have a very plain and “average” opinion (because, of course, they are trained to do exactly that). From my (very) limited experience doing research, LLMs can be extremely useful to “rephrase” something, and I find them particularly helpful when I need some help understanding papers, as they have a broader view of the whole field (much more in-depth than mine) and can also help translate math-heavy passages into something more intuitive.
On the other hand, when I need more “creative” help, they tend to be very disappointing: I work a lot with machine learning topics, but I still lack a lot of practical experience, which is very important when training ML models. Recently, I needed some help applying feature reduction techniques, and I asked Claude Sonnet 4.6. My rationale was that, having ingested all the literature in the field, and given a very specific question with a big enough context of the problem, it could guide me in the right direction. Instead, I got the most generic answer possible, which did not help me at all.
I think this generic style of answers has also emerged from the shared chats: the average between AI Boomers and Doomers (look here for an in-depth view of the two sides of the conversation https://karendhao.com/), which results in a non-interesting view of the topic (at least in my opinion).
As for me, I don’t think AI will replace the modellers' jobs, not only due to the lack of accountability and general lack of context, but also because these tools require a lot of data to be trained on, and we do not have a lot of different problems to throw at these models. The very few we have are (most probably) already over-represented in the training data. My fear is that they will perform very well on benchmarks, as they can just repeat what they know from the training data, but fail even with small changes and adjustments. This is something we can already see in a lot of other contexts (https://claude.ai/share/46cf5be1-bb37-48d4-a2cf-5cfd59757b5c here’s an example with a simple child puzzle).
Of course, I am not saying they won’t be useful at all, but not too much more than a good IDE with a language server and integrated documentation. I think they could also be useful as semantic search engines: A searchable database with models and common modelling ideas for problems, paired with an LLM, can be extremely helpful to see what the common practice is for what you need.
As a final remark, while I think that it is useful and interesting to see how close we can get to the “holy grail”, I don’t see the appeal of it: the first thing my logic professor told me on my first day in university is that we need programming and mathematical languages to avoid ambiguity, paradoxes, and general misunderstandings. The only way we have to get a correct model from a natural language description is to describe it in a precise and unambiguous way... Which is not too different from just writing it in your favourite formal modelling language.
Again, sorry for being a bit destructive.
Best,
Alessio


Eugene Freuder

Apr 2, 2026, 3:58:11 PM
to Constraints
Andras has given me permission to forward this to the Group. 
Begin forwarded message:

From: Andras Salamon <Andras....@st-andrews.ac.uk>
Subject: the Holy Grail discussion
Date: April 1, 2026 at 6:13:55 PM EDT

Dear Gene,

I'm responding via a side channel because Google is classifying my personal email addresses as spam (I'm on the constraints list via a personal address).

You asked us to respond to the ChatGPT and Claude transcripts.

Modelling via a chatbot interface is useful progress in the direction of the holy grail, because it provides access to an LLM that can translate English specifications into those in a CP language. However, doing CP via an agentic tool (which gives the LLM access to locally-running tools and the ability to call other LLM instances) like Claude Code, OpenAI Codex, Google Antigravity, or a system like Goose, is a major advance. The LLM is a crucial part but only one part; the agent harness allows the various components to work together and also guide the LLM in its work. The whole is greater than the sum of its parts and seems to get us closer to the holy grail than just LLMs by themselves.

Stefan Szeider, an early successful adopter of Claude Code for significant CP work, nicely explained some of these issues in his talks at CP 2025 and SAT 2025.

Claude Opus 4.6 knows Essence (in contrast, 4.5 had a sketchy grasp of Essence syntax) and does an excellent job of translating English specifications into idiomatic Essence. In fact, it also does an excellent job of providing multiple different Essence specifications in different styles when prompted to do so (and run in Claude Code with access to CP tools to verify its progress). As several commenters noted, this doesn't magically solve the semantics issue. Just as with a human modeller, it is easy to miss constraints, to slightly modify constraints into different ones, or to produce a correct-looking specification that is subtly wrong. Rigorous test cases help (as they always have) but don't solve the entire problem.

The ChatGPT transcript seemed glib and what I would expect of slightly older, smaller models: eager to generate output without checking whether the ambiguities in the request would be better resolved first. I think paid GPT-5.4 with a high level of thinking would have produced fewer unsubstantiated estimates, especially if prompted to look at recent literature.

The Claude transcript seemed quite generic, like Sonnet rather than Opus, and again lacked grounding in recent papers (the free version might not be able to do web searches).

With an agent harness, it becomes feasible to do the kinds of iterative development mentioned by several responders. It can in some cases even be done interactively, with user feedback used to update the model (and validate against the tests, or suggest new tests) during discussion. My laptop has never been as busy as when it's running multiple CP solver pipelines at the behest of a coding agent exploring different options or trying to validate its proposed model.

However, whether coding agents really are getting us significantly closer to the holy grail, or just help us to be more ambitious in what we attempt on our way there, is still to be resolved.

Best,

András Salamon
--
The University of St Andrews is a charity registered in Scotland, No.SC013532.


Eugene Freuder

Apr 7, 2026, 1:06:20 PM
to Constraints
Sadly we did not generate enough discussion to trigger the bribe -- er reward. :-) This is disappointing, but on the bright side the discussion we did generate was very interesting! Thanks to all who participated. Feel free everyone to continue the discussion here, and looking forward to continuing it in Lisbon. :-)