When I read this post, my first thought was: "This is talking about me."
About three months ago, with the help of Codex, I wrote a proof assistant that I called Base2 (because it is based on classical second-order logic), and it does exactly this: axioms, definitions, theorems, and proofs represented simply as lists of formulas.
Codex not only helped me write the proof assistant itself, but it also generated a few proofs. It did so in such a straightforward way that I ended up thinking: "If it can do it with my system, it can probably do it with any system."
But maybe it's not quite that simple. Reading your post, and seeing that progress on Wiedijk's list of 100 theorems doesn't seem to have accelerated much despite the rise of AI agents (I'm speaking generally, not just about Metamath), makes me think the situation may be more complicated.
I had actually started a discussion about my system on Stack Exchange, but it didn't attract much interest.
In the end, I even got into an argument with someone there and decided to delete my account altogether.
Anyway, I think I'll continue working on my proof assistant this summer. It was partially inspired by Metamath, although it takes a somewhat different approach.
I added some automation to mm-lamp to support LLM driven proofs. The idea is based on the fact that LLMs can generate structured responses. I added some generic JavaScript API functions to mm-lamp. Those functions accept JSON objects as input and run some common actions like add/modify/delete steps, invoke the bottom-up prover for a specific step, put current editor state to the clipboard. Then I instructed an LLM what it should do and how it can use those API functions. Mainly I was doing these few actions in a loop:
Send the current editor state to the LLM. This is as simple as clicking a button in mm-lamp, which puts current state to a clipboard. Then open LLM UI interface and paste the clipboard content.
Get the response from LLM which includes a JSON object in the expected format (from my instruction).
Copy-paste the JSON object to mm-lamp. This made mm-lamp to execute some action.
Go to step 1.
Here are files which I developed:
API functions - https://github.com/expln/metamath-lamp/blob/d6ea38a41fe2439bb956fe9c8a52f6ac15b99eb7/src/metamath/api/macros/llm_api_set_mm.js
Instructions for an LLM - https://github.com/expln/metamath-lamp/blob/d6ea38a41fe2439bb956fe9c8a52f6ac15b99eb7/src/metamath/api/macros/llm_prompt.md
And here is my dialog with an LLM where I asked it to find and prove the solution for a simple equation - https://claude.ai/share/d46eb564-cac5-45fd-a949-02332e5e1dbc
(Some of my replicas in the chat were big, so they are hidden in the dialog. The very first my replica is one of them. But it is just the content of the md file with the instruction mentioned above.)
The dialog was very long. I think I could prove the same myself much faster. I had to do some manual corrections in the mm-lamp editor. However, some of them were caused by a bug in mm-lamp. But eventually the LLM succeeded to develop a proof:
${
hyp1 $e |- X e. CC $.
hyp2 $e |- ( ( ; 4 5 x. X ) + 8 ) = 3 $.
goal
$p |- X = ( -u 1 / 9 )
$= ( c3 c8 cmin co c4 c5 cdc cdiv c1 cneg c9 4nn0 5nn decnncl nncni nnne0i cmul mulcli 8cn mvlraddi mvllmuli wceq 3cn negsubdi2i eqcomi 5cn gbpart8 mvrladdi negeqi eqtri oveq1i w3a cc wcel a1i cc0 wne divnegd eqeq2d 4animp1 eel0000 9t5e45 oveq2i wtru 9cn 9ne0 5ne0 divcan8d mptru wn 1cnd pm2.18i ) ADZEZFZGZHZIZJZKZGZLZMNZWCGZWBAVSWBVTWAOPQZRZBWBWH SZWBATZGVQVPWBAWIBUAUBZCUCUDWDWEWFWCGZMZWGWDWAWBWC GZMZWNWDWAMZWBWCGZUEZWSWSWSWDWPUEZVSWQWBWCVSVQVPVR GZMZWQXBVSVQVPWLUFZUGUHXAWAVQVPWAXCUIZUJUKULUMUNZX EXEXEWSWSWSWSWTWSWSWSUOZWPWRWDXFWAWBWAUPZUQZXFXDUR WBXGUQXFWIURWBUSZUTXFWJURVAVBVCVDWOWMWOWAWFWAWKGZW CGZWMXKWOXJWBWAWCVEVFUHXKWMUEVGZWFWAWFXGUQZXLVHZUR XHXLXDURWFXIUTZXLVIZURWAXIUTXLVJURVKVLUMULUMWNWGUE ZXQVMZWEWFXRVNXMXRXNURXOXRXPURVAVOUMUM $.
$}
Currently I don’t provide an instruction how to reproduce this due to lack of time. But this is not difficult. I will provide an instruction in a few days as soon as I have free time to write it.
-
Igor