Some questions about AIs which can generate proofs. I’d welcome any and all answers and feedback :)

Jon P

Jun 9, 2021, 11:21:56 AM
to Metamath
There have been a couple of AI systems, gpt-f and Holophrasm, built on metamath whose aim is to generate proofs. Are there more I’m not aware of?

If I have understood correctly I think how the search process works is as follows:

You have some set of hypotheses and a proposition you want to prove.

You apply a series of already proven theorems, which are substitution rules, to the proposition until the resulting statements either match the hypotheses or match theorems already proven.

The question “Is there a proof of any length which proves this proposition?” is, I think, in the same complexity class as the halting problem, whereas “Is there a proof of fewer than L steps which proves the proposition?” is NP-complete, and so easier to tackle.

Trying to find such a chain of substitution rules is essentially a tree search problem. If there are N theorems in the database then answering the latter question comes down to searching a tree of size N^L. Is that somewhat right?
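
To make that concrete, here is a rough Python sketch of the kind of depth-limited backward search I have in mind; applicable_theorems and apply_theorem are made-up helper names standing in for a real Metamath unifier, not functions from any existing tool.

    def prove(goal, hypotheses, applicable_theorems, apply_theorem, depth):
        """Return a list of theorem labels proving `goal` within `depth` nested
        applications, or None.  Worst case explores roughly N^depth nodes."""
        if goal in hypotheses:
            return []                              # goal is already a hypothesis
        if depth == 0:
            return None
        for label in applicable_theorems(goal):    # theorems whose conclusion unifies
            subgoals = apply_theorem(label, goal)  # that theorem's substituted hypotheses
            proof = []
            for sub in subgoals:
                sub_proof = prove(sub, hypotheses, applicable_theorems,
                                  apply_theorem, depth - 1)
                if sub_proof is None:
                    break                          # this branch fails, try the next theorem
                proof.extend(sub_proof)
            else:
                return proof + [label]
        return None

With N theorems and a depth limit of L, this is the N^L blow-up I mean.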

So does this mean, in the abstract, given enough hardware and time you could find all proofs of length less than L just by brute force search? 

I am also interested in what the true branching factor is. For instance, not all N theorems in the database can be applied to a given statement; only a subset can be. How big is the reduction? Can most theorems be applied to most statements, or can only a very small subset? Moreover, I guess there is a further reduction because it is sufficient to find only one proof, whereas there will be many for any provable statement.

Is it true that any AI system built on metamath will need these same foundations? For instance, it will need a parser for the database, a verifier, and a function which, for any statement, returns the possible substitutions/theorems which could be applied.

Is there any value in us (I don’t know if anyone would be interested in working on this) gathering some programs which already do this, or writing some new code in a nicely optimised way? For example in MineRL, a Minecraft AI challenge, they provide an environment which offers the agent the state of the game and a list of options available to it. Maybe some sort of environment like this for metamath would help encourage people to tackle the AI side of these problems?
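
To illustrate, such an environment might expose an interface along these lines. This is purely a hypothetical sketch of what the API could look like (the database.theorems, database.unifies and database.apply members are invented for the example), not an existing package.

    class MetamathEnv:
        """Hypothetical gym-style environment: the state is the list of open
        goals, and an action is applying a theorem that unifies with the
        first open goal."""

        def __init__(self, database):
            self.database = database          # a parsed .mm database, e.g. set.mm
            self.goals = []
            self.hypotheses = []

        def reset(self, target, hypotheses):
            self.goals = [target]
            self.hypotheses = list(hypotheses)
            return list(self.goals)

        def legal_actions(self):
            # theorems whose conclusion unifies with the first open goal
            return [t for t in self.database.theorems
                    if self.database.unifies(t, self.goals[0])]

        def step(self, action):
            # replace the first goal by the applied theorem's substituted hypotheses
            subgoals = self.database.apply(action, self.goals[0])
            self.goals = [g for g in subgoals + self.goals[1:]
                          if g not in self.hypotheses]
            done = not self.goals             # no open goals left: proof complete
            reward = 1.0 if done else 0.0
            return list(self.goals), reward, done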

Another question I have is how much can be done with optimised code, multithreading, or GPU parallelisation? For instance, the unify command in MMJ2 is presumably searching for all possible proofs of length 1 to fill in a gap? Could something like this be optimised to run in the background on other CPU threads, so it could fill in 2, 3, or more steps for you? Would this be a powerful thing to add to a proof assistant, a brute force solver which is always trying to help? If a GPU-optimised parallel search could be performed on a large amount of hardware, might it be able to actually generate some reasonable-length proofs?

Finally, for the general AI problem, presumably if any kind of mild heuristic could be found then it might be possible to generate proofs much more efficiently? If I have understood correctly, that is what the above systems are doing: given a list of possible substitutions, they prefer some over others as more likely to lead in the right direction. The hope is that the neural networks can develop some sort of intuition.

In terms of finding a heuristic there is some interesting work here which might apply. The basic idea, as applied to metamath, would be to use some variant of word2vec to embed all the statements in the database into a high-dimensional (say 100-dimensional) vector space. Then, when searching the tree of possible proofs, statements could be evaluated based on whether or not they were leading closer to the hypotheses. Might this approach have any value here? Is anyone interested in it? Are there other ways of generating a heuristic to help with search?
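
As a toy version of what I have in mind, assuming a gensim-style word2vec model (so that tok in model, model[tok] and model.vector_size all work), the heuristic could be as simple as the distance from a goal to its nearest hypothesis:

    import numpy as np

    def embed(statement, model):
        """Average the model's vectors over the statement's tokens (a crude choice)."""
        vecs = [model[tok] for tok in statement.split() if tok in model]
        return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

    def heuristic(goal, hypotheses, model):
        """Smaller is better: cosine distance from the goal to its nearest hypothesis."""
        def cos_dist(a, b):
            denom = (np.linalg.norm(a) * np.linalg.norm(b)) or 1.0
            return 1.0 - float(a @ b) / denom
        g = embed(goal, model)
        return min(cos_dist(g, embed(h, model)) for h in hypotheses)

Whether distance in such a space actually correlates with “leading closer to the hypotheses” is of course exactly the question.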

Thank you for taking the time to read this. I’d really value any feedback or input. I apologise in advance for the many misunderstandings I have and would love to be corrected on anything. 

Jon

Mario Carneiro

Jun 9, 2021, 4:45:08 PM
to metamath
On Wed, Jun 9, 2021 at 11:21 AM Jon P <drjonp...@gmail.com> wrote:
There have been a couple of AI systems, gpt-f and Holophrasm, built on metamath whose aim is to generate proofs. Are there more I’m not aware of?

Not that I know of, at least for metamath. There are other systems in other languages: GPT-f also has a Lean version, HOList works for HOL Light, Tactician is a Coq-based ML architecture, and there are several others that don't immediately come to mind.

The question “Is there a proof of any length which proves this proposition?” is, I think, in the same complexity class as the halting problem, whereas “Is there a proof of fewer than L steps which proves the proposition?” is NP-complete, and so easier to tackle.

Technically, L has to be written in unary for it to be NP-complete (the certificate is a proof of up to L steps, which is exponentially long relative to the binary encoding of L). If you write it normally then it is in NEXPTIME.
Trying to find such a chain of substitution rules is essentially a tree search problem. If there are N theorems in the database then answering the latter question comes down to searching a tree of size N^L. Is that somewhat right?

This is a slightly overly restricted setting. You probably want to be able to add new theorems to the database and then use them to compress future proofs. That means that the branching factor can grow over time, roughly (N+L)^L if we say that adding a theorem counts as a step. This is actually how DreamCoder works (https://arxiv.org/abs/2006.08381), you should take a look.
 
So does this mean, in the abstract, given enough hardware and time you could find all proofs of length less than L just by brute force search?

Yes. A proof of length L is a string, and there are only finitely many strings of that length. Just verify them all and see if one checks.

I am also interested in what the true branching factor is. For instance, not all N theorems in the database can be applied to a given statement; only a subset can be. How big is the reduction? Can most theorems be applied to most statements, or can only a very small subset? Moreover, I guess there is a further reduction because it is sufficient to find only one proof, whereas there will be many for any provable statement.

There are quite a few theorems, like ax-mp, that apply to any goal state, so the branching factor is always at least the number of theorems that conclude with |- ph. But it's true that most lemmas are not like this. Worse, one of ax-mp's hypotheses is |- ps where ps is unrelated to ph, which means that you can apply literally any theorem T in set.mm to the current goal in two steps, by using ax-mp and then T in the minor premise. So the average branching factor is at least sqrt(N), where N is the number of theorems in the database.
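
To put a rough number on that lower bound (taking N on the order of 40,000 theorems for set.mm as a ballpark):

    import math
    N = 40_000                  # rough size of set.mm, give or take
    # any of the N theorems is reachable in two steps via ax-mp, so the
    # geometric-mean branching factor per step is at least sqrt(N)
    print(round(math.sqrt(N)))  # ~200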

Is it true that any AI system built on metamath will need these same foundations? For instance, it will need a parser for the database, a verifier, and a function which, for any statement, returns the possible substitutions/theorems which could be applied.

No. The brute force search method just described is not practical above about 15 steps or so. It has been used for parts of http://us.metamath.org/mmsolitaire/pmproofs.txt but it runs out of steam even before the end of that list.

Most theorem provers are ultimately doing tree search, but the tree being searched may not literally be the tree of substitutions mentioned. For example, they might instead use tactics (proof-producing programs) as building blocks and use tree search to fit them together. This method may not be complete, and may not produce the shortest proof, but it can eliminate a lot of the excess branching factor of the naive method.
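
A crude sketch of what searching over tactics rather than raw substitutions could look like; each tactic here is just some function that maps a goal to a list of remaining subgoals (an empty list means the goal is closed) or returns None when it does not apply:

    def tactic_search(goal, tactics, depth):
        """Depth-first search over sequences of tactic applications.  Incomplete
        by design: it only finds proofs expressible with the given tactics."""
        if depth == 0:
            return None
        for tactic in tactics:
            subgoals = tactic(goal)
            if subgoals is None:
                continue                       # tactic does not apply here
            script = [tactic.__name__]
            for sub in subgoals:
                sub_script = tactic_search(sub, tactics, depth - 1)
                if sub_script is None:
                    break
                script.extend(sub_script)
            else:
                return script
        return None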
 
Is there any value in us (I don’t know if anyone would be interested in working on this) gathering some programs which already do this, or writing some new code in a nicely optimised way? For example in MineRL, a Minecraft AI challenge, they provide an environment which offers the agent the state of the game and a list of options available to it. Maybe some sort of environment like this for metamath would help encourage people to tackle the AI side of these problems?

set.mm is already a fairly good source for hand-optimized proofs, since this has been a running goal for a while. You should talk to the gpt-f folks about a metamath-based gym environment; I think they already have something like that.

Another question I have is how much can be done with optimised code, multithreading, or GPU parallelisation? For instance, the unify command in MMJ2 is presumably searching for all possible proofs of length 1 to fill in a gap? Could something like this be optimised to run in the background on other CPU threads, so it could fill in 2, 3, or more steps for you? Would this be a powerful thing to add to a proof assistant, a brute force solver which is always trying to help? If a GPU-optimised parallel search could be performed on a large amount of hardware, might it be able to actually generate some reasonable-length proofs?

Certainly scaling up small-scale brute force search is useful for proof authors. Smarter search is also useful. Unfortunately, most proof activities are not well suited to GPU parallelism, because proving is a very discrete and nonlinear problem (as you already mentioned, not unlike the halting problem) and it's hard to find a piece of it that looks like "do the same thing to these 1000 data elements". Multithreading is fine though; you can have this stuff running in the background on other CPUs.
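
A minimal sketch of the background-solver idea on a worker thread; the search(goal, depth) callable is a placeholder for whatever brute force or smarter search you have, not anything taken from MMJ2:

    import queue
    import threading

    def start_background_prover(search, max_depth=3):
        """Consume goals pushed by the proof assistant and report any proofs
        found, without blocking the interactive session."""
        goals, results = queue.Queue(), queue.Queue()

        def worker():
            while True:
                goal = goals.get()
                if goal is None:                        # sentinel: shut down
                    return
                for depth in range(1, max_depth + 1):   # iterative deepening
                    proof = search(goal, depth)
                    if proof is not None:
                        results.put((goal, proof))
                        break

        threading.Thread(target=worker, daemon=True).start()
        return goals, results     # the assistant pushes goals and polls results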

Finally, for the general AI problem, presumably if any kind of mild heuristic could be found then it might be possible to generate proofs much more efficiently? If I have understood correctly, that is what the above systems are doing: given a list of possible substitutions, they prefer some over others as more likely to lead in the right direction. The hope is that the neural networks can develop some sort of intuition.

Yes, neural network based tree search builds heuristics for how promising the goal state looks, so that silly things like applying ax-mp many times in a row are rejected for generating too many goals. You can still improve on this by giving the neural network human-level steps to work with: for example, not requiring that the NN fill in substitutions, and instead having it give only theorem names, with the substitutions handled by unification.
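
Schematically, that kind of guided search looks something like the sketch below; policy(goal), returning theorem labels scored by the network with the best first, and unify(label, goal) are stand-ins rather than any particular implementation.

    import heapq
    import itertools

    def guided_search(root_goal, policy, unify, beam=8, max_nodes=10_000):
        """Best-first search: always expand the cheapest frontier node, and at
        each expansion try only the `beam` theorem labels the network ranks
        highest.  Unification, not the network, fills in the substitutions."""
        tie = itertools.count()                       # tie-breaker for the heap
        frontier = [(0.0, next(tie), [root_goal], [])]
        expanded = 0
        while frontier and expanded < max_nodes:
            cost, _, goals, proof = heapq.heappop(frontier)
            expanded += 1
            if not goals:
                return proof                          # every goal has been closed
            goal, rest = goals[0], goals[1:]
            for score, label in policy(goal)[:beam]:
                subgoals = unify(label, goal)         # None if the label doesn't apply
                if subgoals is None:
                    continue
                heapq.heappush(frontier, (cost - score, next(tie),
                                          subgoals + rest, proof + [label]))
        return None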

In terms of finding a heuristic there is some interesting work here which might apply. The basic idea, as applied to metamath, would be to use some variant of word2vec to embed all the statements in the database into a high-dimensional (say 100-dimensional) vector space. Then, when searching the tree of possible proofs, statements could be evaluated based on whether or not they were leading closer to the hypotheses. Might this approach have any value here? Is anyone interested in it? Are there other ways of generating a heuristic to help with search?

GPT-f is a transformer, but previous methods like Holophrasm and HOList are based on a tree embedding of the formulas. You kind of have to do this because NNs work on a finite dimensional latent space.

Mario

Norman Megill

Jun 9, 2021, 7:03:11 PM
to Metamath
On Wednesday, June 9, 2021 at 4:45:08 PM UTC-4 di.gama wrote:
On Wed, Jun 9, 2021 at 11:21 AM Jon P wrote:
...
Is it true that any AI system built on metamath will need these same foundations? For instance, it will need a parser for the database, a verifier, and a function which, for any statement, returns the possible substitutions/theorems which could be applied.
 
No. The brute force search method just described is not practical above about 15 steps or so. It has been used for parts of http://us.metamath.org/mmsolitaire/pmproofs.txt but it runs out of steam even before the end of that list.
...

Actually the proofs there with up to 21 steps are the shortest possible.  All possible proofs up to this length were exhaustively generated and their final theorems compared to the list.  This was about 30 years ago, using a BASIC+ program running on a VAX for several days.  Today's computers might be able to go a little further.

The number of possible proofs to generate and test is the Catalan number of the number of D's (modus ponens applications), multiplied by 3^([number of D's]+1) since there are 3 possible axioms to try at each non-D step.  Many of these will abort early because unification will not be possible, e.g. D23 is a valid proof, D32 is not.   The program drule.c mentioned there will let you play with this:  "./drule D23" will return ">>>~P~QQ>>~P~QP" (Polish notation) and "./drule D32" will return "?Error: Proof is invalid".
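
In Python the count works out as follows, where d is the number of D's (so a proof has d+1 axiom steps and 2d+1 steps in total):

    from math import comb

    def candidate_proofs(d):
        """Upper bound on condensed-detachment proofs with d D-steps:
        Catalan(d) tree shapes times 3**(d+1) axiom choices for the leaves."""
        catalan = comb(2 * d, d) // (d + 1)
        return catalan * 3 ** (d + 1)

    for d in range(11):                # 21 proof steps corresponds to d = 10
        print(2 * d + 1, candidate_proofs(d))

At d = 10 (21 steps) that is on the order of 3 x 10^9 candidates before unification prunes anything.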

Proofs with more than 21 steps there were originally found by hand (me), with shorter ones found for many of them over the years by various clever people.

I've thought that pmproofs.txt might be interesting for some kind of AI experiment due to its simplicity, but AFAIK nothing has been tried.  There was a "genetic algorithm" experiment by George Szpiro in 2018 that found 3 shorter proofs automatically.  I think all other shorter proofs were basically found by hand, sometimes assisted by a custom program.

The proof for *2.16 was mentioned here:  https://codegolf.stackexchange.com/questions/161172/ and was the shortest (59 steps) of the ones that people there found.

Norm

Stanislas Polu

Jun 10, 2021, 3:17:04 AM
to meta...@googlegroups.com
Adding some comments on top of Mario's:

>> Trying to find such a chain of substitution rules is essentially a tree search problem. If there are N theorems in the database then answering the latter question comes down to searching a tree of size N^L. Is that somewhat right?
>
> This is a slightly overly restricted setting. You probably want to be able to add new theorems to the database and then use them to compress future proofs. That means that the branching factor can grow over time, roughly (N+L)^L if we say that adding a theorem counts as a step. This is actually how DreamCoder works (https://arxiv.org/abs/2006.08381), you should take a look.

There are also the substitutions. Not all substitutions are bound by the requirement to unify with the conclusion, e.g. modus ponens. Then either you discard these proof steps (and I presume we have theoretical results showing that it's still possible to prove things that way, but your proof size explodes?) or you keep them, and your search space becomes enumerable at each step (assuming no restriction on the length of the proof step, which in practice you would have; but it's still prohibitively huge even for one step). Hence the motivation to use a language model, so that we can sample possible next steps from it and use that list as the branching for each step in our search tree. I call these unbound substitutions, or terms required in proofs, "exogenous terms" (I haven't found a standard name for them); being capable of generating those is again a core motivation of GPT-f.

The consequence is that the search tree is not "complete" and you can't just get to a proof by searching forever. But the hope is that, as the language model gets better, it becomes powerful enough to cover the search space properly while keeping it small enough to be practical.
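
Concretely, one expansion step in that scheme looks roughly like the sketch below; lm_sample and verifier_check are placeholder names for the language model and the interactive verifier, not our actual interface.

    def expand(goal, lm_sample, verifier_check, k=16):
        """Sample k candidate proof steps from the language model, keep the ones
        the verifier accepts, and return them as the branches for this node.
        Exogenous terms come along for free: the model generates them as part
        of the sampled step text."""
        branches = []
        for step in lm_sample(goal, num_samples=k):   # text: label plus substitutions
            subgoals = verifier_check(goal, step)     # None if the step is invalid
            if subgoals is not None:
                branches.append((step, subgoals))
        return branches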

>> Is it true that any AI system built on metamath will need these same foundations? For instance, it will need a parser for the database, a verifier, and a function which, for any statement, returns the possible substitutions/theorems which could be applied.

Generally speaking you definitely need an "interactive" verifier (which includes, for Metamath, a parser for the grammar defined by the database). I don't think you necessarily need a system that mechanically proposes possible substitutions; for example, with GPT-f we let the model do that work. For other approaches based on ranking possible actions, having something that proposes actions is definitely useful.

Hope it's useful. Happy to answer any specific questions about GPT-f
and how we interface to Metamath.

-stan