"Bridging the Chasm: Programming the Next Generation of Arti cial Intelligence"

Bruno Deferrari

Jun 13, 2025, 11:09:12 AM
to qil...@googlegroups.com
Was reading this PDF https://drive.google.com/file/d/1KbwH2zfye7-_8Utxa7h0VhDBBA2mbJr8/view

and reached a few paragraphs in page 8 that made me think of Shen, so thought about sharing here (emphasis not mine):

Verifiable & Safe Agentic Compute: As AI systems become more autonomous
("agentic") and potentially take over coding tasks, ensuring their safety and
correctness becomes paramount. This requires moving beyond statistical
testing towards formal verification methods adapted for the AI context.
Research projects like AgentSpec, a domain-specific language for specifying
and enforcing runtime safety constraints on LLM agents, demonstrate the
feasibility of applying formal methods to AI-generated artifacts.

The lab will focus on developing practical techniques for verifying AI-generated
code and the behavior of agentic systems. This aligns with Meijer's vision of
using AI to assist in formal methods, potentially generating proofs alongside
code ("proof-carrying code") based on specifications expressed in formalisms
like Horn clauses.

The emphasis is on enabling human developers to easily check and
mathematically guarantee the safety and correctness properties of AI
systems, much like verifying the authenticity of currency through built-in
security features.

This shifts the human role from writing and debugging potentially flawed code
to verifying high-level specifications and AI-generated proofs. This involves
building safety guarantees directly into the programming frameworks and
runtime environments for AI agents, drawing on principles from AI agent
framework development but with a much stronger emphasis on formal
specification and enforcement.
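To make the Horn-clause remark concrete: a specification in that style is just a set of facts and implications. Below is the classic relational specification of list append, written in Lean 4 syntax purely for illustration (a sketch, not taken from the PDF):

-- Each constructor is one Horn clause: a conclusion derived from
-- zero or more premises.
inductive Append {a : Type} : List a → List a → List a → Prop where
  | nil (ys : List a) :
      Append [] ys ys
  | cons (x : a) (xs ys zs : List a) :
      Append xs ys zs → Append (x :: xs) ys (x :: zs)

A verifier's job is then to show that whatever code the AI produced actually satisfies relations like this one.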


I have been playing with coding agents quite a bit lately, and they are quite good, much better than I would have imagined last year. Something I have noticed is that I get much better results when the agent can get feedback from the typechecker, lints, running tests, etc., because that lets it iterate by itself and self-correct without going down the wrong path. If you start thinking about languages like Shen, it gets even more interesting.
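A rough sketch of that feedback loop, just to make its shape concrete. Everything here is hypothetical: propose_patch stands in for whatever call your agent framework makes to the model, and the checker is simply the project's own typecheck/lint/test command run as a subprocess:

import subprocess

def propose_patch(task: str, feedback: str) -> None:
    """Placeholder for the LLM/agent call that edits files given the task
    and the latest checker output. A real setup would call into your
    agent framework here."""
    raise NotImplementedError

def run_checks(cmd: list[str]) -> tuple[bool, str]:
    """Run the typechecker/lints/tests and return (ok, combined output)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(task: str, check_cmd: list[str], max_rounds: int = 5) -> bool:
    """Let the agent iterate against checker feedback instead of guessing blind."""
    feedback = ""
    for _ in range(max_rounds):
        propose_patch(task, feedback)
        ok, feedback = run_checks(check_cmd)
        if ok:
            return True   # typechecker, lints and tests all pass
    return False          # give up and hand back to the human

The richer the checker (a Shen-style type system, say), the more the loop can correct on its own.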

Some other related links in this tweet: https://x.com/headinthebox/status/1933219697404854434

--
BD

dr.mt...@gmail.com

Jun 15, 2025, 3:31:25 AM
to Shen
That's a very interesting document and it echoes things I've been saying about the proper
commercial application of Shen.  Shen's forte is as a metalanguage for formally specifying
and reasoning over other languages.  I've been saying this now for some years; the
future of Shen is as a tool for constructing DSLs and checking code in other languages.

Vide my comments in this group from 2012.

Editors can contribute to solving this but you need a different model
of editor. For example, right now, you have colour coding so that
strings are purple, say, lists are green, etc. You get this on NotePad2
and also on Qi/Emacs. That's too low-level.

Have you seen Predator II? Remember the part where they are hunting
the alien in a meat packing factory and the alien gets suspicious and
switches his vision from IR to UV? Then he sees the humans who are
masking under IR and of course they are toast after that.

What we need wrt Babel is something like the power that the alien had
to switch filters, because syntax is very superficial. For example, I
may say 'Babel, switch to the code weight filter' and it colour codes
my source in terms of the weight of the tail showing (input) up in
blinding crimson. Or I might say 'Babel, switch to the termination
filter' and it highlights the code that might lead to an infinite
loop. Or 'Switch to the type filter' and I see my datatypes in
different colours with a manky colour for the parts which are not type
secure.

To build something like that requires an integration of the best of
GUI with a total mastery of logic engines and type theories and how
to build them together with a metalanguage that is so powerful that it
can put all these things together for any arbitrary language. Shen is
about the only thing I know that is powerful enough to do that.

That is 2012.  It did not happen then, but retrospectively Shen was not 
mature then either, so in a way that was good.  Things have a way of 
happening within their own time.  
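For what it's worth, the 'switchable filter' idea can be sketched in a few lines. The analysis below is a stub standing in for a real termination or type analysis, and none of these names come from Shen or any existing editor:

from typing import Callable

# A filter maps source text to (line number, colour) highlights.
Highlight = tuple[int, str]
Filter = Callable[[str], list[Highlight]]

FILTERS: dict[str, Filter] = {}

def register(name: str) -> Callable[[Filter], Filter]:
    """Register an analysis under a name the editor can switch to."""
    def wrap(f: Filter) -> Filter:
        FILTERS[name] = f
        return f
    return wrap

@register("termination")
def termination_filter(source: str) -> list[Highlight]:
    # Stub: a real implementation would run a termination analysis and
    # flag the calls it cannot prove terminating.
    return [(n, "crimson")
            for n, line in enumerate(source.splitlines(), 1)
            if "loop" in line]

def switch_filter(name: str, source: str) -> list[Highlight]:
    """'Babel, switch to the <name> filter' - recolour the buffer."""
    return FILTERS[name](source)

The hard part, of course, is not the dispatch but the analyses behind it - which is where the logic engine and the metalanguage come in.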

Mark  

dr.mt...@gmail.com

Jun 16, 2025, 2:38:18 AM
to Shen
I'm reading the reference more slowly.

If AI will possess superhuman coding abilities, we must build the systems
for AI to write and execute code, with humans shifting to the crucial roles of
specification and verification.

The problem with this approach has been the complexity of formal methods: the
actual execution of the verification proofs, which has so far required human beings.
And it's very hard work!  A proof is typically much longer than the program it verifies.
This has been true for 50 years.  So I guess we require an AI revolution in verification too
for this to work.

The proofs tend to be rather unreadable.  Here is a step in the proof of the associativity of
append.  THORN is failing here - whether because the problem has a bug in it or THORN is
not bright enough, I don't know yet.  I will test it against Prover9, a more mature ATP.

Step 10 [1]

?- [[append [append [cons a b] c] d] = [append [cons a b] [append c d]]]

1. [d : [list t]]
2. [c : [list t]]
3. [all A : type [all x140091 : [list A] [all x140092 : [list A] [[append x140091 x140092] : [list A]]]]]
4. [all t135565 : type [all X : [list t135565] [[append [] X] = X]]]
5. [all t140090 : type [all X : t140090 [all Y : [list t140090] [all Z : [list t140090] [[append [cons X Y] Z] = [cons X [append Y Z]]]]]]]
6. [a : t]
7. [b : [list t]]
8. [all y : [list t] [all z : [list t] [[append [append a y] z] = [append a [append y z]]]]]
9. [t : type]
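For comparison, here is the same associativity statement discharged in Lean 4. This is not THORN output; it is only to show how short the interactive-prover version of the proof becomes once induction and the cons lemma are available:

theorem append_assoc_demo {a : Type} (xs ys zs : List a) :
    (xs ++ ys) ++ zs = xs ++ (ys ++ zs) := by
  induction xs with
  | nil => rfl                          -- [] ++ l reduces definitionally
  | cons x xt ih =>
      simp only [List.cons_append, ih]  -- push the cons through, then use the IH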

So the revolution the article refers to is going to require a revolution in the way these proofs
are conducted: AI is going to be needed to conduct the proofs!

Which then raises the question - if AI is verifying AI, have we moved forward?  Yes, because if
AI can conduct these proofs autonomously, that is a great step in automated reasoning; and
though producing proofs is laborious, checking them is not, and in a type-secure
system you can enforce the security of the proof (re Edinburgh LCF and also the software
driving proofs in LPC, which requires type-secure i.e. valid steps).  But note that AFAIK this
revolution in automated reasoning has not happened - yet.
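The Edinburgh LCF point is what keeps 'AI verifying AI' from being circular: the prover can be as clever and as untrusted as you like, so long as every theorem has to be constructed through a small trusted kernel. A toy sketch of that discipline (pure illustration; nothing here is Shen, THORN or LCF itself):

_KERNEL = object()   # capability token held only by the kernel rules below

class Theorem:
    """Only kernel rules are supposed to construct these."""
    def __init__(self, prop: str, _token: object):
        assert _token is _KERNEL, "theorems can only be built by kernel rules"
        self.prop = prop

# The trusted kernel: small, auditable, and the only source of Theorems.
def axiom(prop: str) -> Theorem:
    return Theorem(prop, _KERNEL)

def modus_ponens(ab: Theorem, a: Theorem) -> Theorem:
    premise, _, conclusion = ab.prop.partition(" -> ")
    assert a.prop == premise, "modus ponens applied to the wrong premise"
    return Theorem(conclusion, _KERNEL)

# An untrusted prover (human, ATP or LLM) only emits a proof script;
# checking the proof just means replaying it through the kernel.
def check(script):
    env = {}
    for name, rule, args in script:
        env[name] = rule(*[env.get(arg, arg) for arg in args])
    return env

proof = [("h1", axiom, ["p -> q"]),
         ("h2", axiom, ["p"]),
         ("goal", modus_ponens, ["h1", "h2"])]
print(check(proof)["goal"].prop)   # prints: q

However long or unsurveyable the machine-generated proof is, the thing you actually have to trust stays this small.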

In effect that means that the  effort will move to specification.

Crucially, this paradigm shifts the focus from AI assisting humans in writing code using 
traditional tools, to AI generating code within an AI-native framework, where humans 
primarily focus on specifying requirements and verifying the correctness and safety of 
the AI's output.

That differs from my vision of years back, because I envisaged using Shen to help human programmers,
not AI.  But time moves on.

I confess to a slightly queasy feeling about programs we don't understand, verified by proofs which
are machine-generated and unsurveyable.  But then KL is pretty unreadable, so maybe I'm just
fussing.
 
Mark

dr.mt...@gmail.com

Jun 17, 2025, 5:32:15 PM
to Shen
So the revolution the article refers to is going to require a revolution in the way these proofs
are conducted

That's my conclusion.   If the future of programming is about verifying AI-generated code,
then we need to look at computer-supported verification proofs, and for that we need the same AI
to function as an ATP.  I'm very interested to see if this can be done.

Right now the best ATPs are still founded on AI technology that is 30 years old or more.
Vampire, which is the world's best, is 250,000 lines of OCaml and C++.  The reason
it is so huge is that the big ATPs have a lot of domain-specific optimisations for
solving problems.  If general AI beats Vampire at its own game, that would be like
AlphaZero beating Stockfish.   Stockfish, like Vampire, is GOFAI, the trad stuff I learnt,
based on alpha-beta pruning etc.  But AlphaZero did beat Stockfish.

So that's the target - beat Vampire, and it's game, set and match to the new technology.
The way is then clear to automatic verification.

M.