the anatomy of a high-performance theorem-prover - thoughts and insights


Mark Tarver

Sep 22, 2016, 5:58:09 AM
to Shen
The new high-performance ATP will go into SP very soon.  Coding these things is a real challenge because the architecture is about as dense as the Shen kernel.  The main engine E is based on Stickel's Prolog Technology Theorem Prover of the 80s, but adapted for sequents, and I've added a preprocessor P that chops up problems before feeding them into E.  This preprocessor is actually a little theorem prover itself and can solve problems independently of E.  I've called P a planner, but really that is IMO too grand a title for this device.

The top-level function is crunchit! of type ((list fol.wff) * fol.wff) --> (list fol.wff) --> boolean.  The last argument is optional and defaults to [] if no argument is supplied.  The function takes

1.  A sequent: a pair of a list of fol.wffs L and a fol.wff W, which is the conclusion.
2.  A background theory T (a list of wffs), which is compiled to code and which may be empty.

The ATP attempts to prove

L |-T  W    i.e. L entails W under T.
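
For example, a minimal call looks like this (the predicate p and the constant a are purely illustrative, and the background theory is left empty); it should return true:

(crunchit! (@p [[p a]] [exists x [p x]]) [])    \\ [p a] |- [exists x [p x]]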

The logic is first-order logic without equality.

1.  The logical constants are v, ~, &, =>, <=>, exists, all and =df.
2.  A recognised symbol is a symbol other than a logical constant.
3.  A recognised symbol is a term.
4.  A list composed of a recognised symbol followed by terms is a term.
5.  A predicate is a recognised symbol.
6.  An atom is a list composed of a predicate followed by terms, or a recognised symbol (the propositional case).
7.  An atom is a fol.wff.
8.  If P and Q are fol.wffs, so are [~ P], [P v Q], [P & Q], [P => Q], [P <=> Q], [P =df Q], [all V P] and [exists V P], where V is a recognised symbol.

This is standard except for =df, which means 'is defined as' and is used in T.  If T contains [P =df Q] then all instances of the pattern P are demodulated (normalised) to Q, and crunchit! does that before attempting to solve.
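
For concreteness, here is a sketch of a recogniser for this grammar.  These helper functions are hypothetical - my own illustration, not part of the ATP - and untyped for brevity:

(define logical-constant?
  X -> (element? X [v ~ & => <=> exists all =df]))

(define recognised-symbol?
  X -> (and (symbol? X) (not (logical-constant? X))))

(define term?
  \\ rules 3 and 4: a recognised symbol, or one applied to terms
  [F | Ts] -> (and (recognised-symbol? F) (terms? Ts))
  X -> (recognised-symbol? X))

(define terms?
  [] -> true
  [T | Ts] -> (and (term? T) (terms? Ts)))

(define wff?
  \\ rule 8: the connectives and quantifiers
  [~ P] -> (wff? P)
  [P v Q] -> (and (wff? P) (wff? Q))
  [P & Q] -> (and (wff? P) (wff? Q))
  [P => Q] -> (and (wff? P) (wff? Q))
  [P <=> Q] -> (and (wff? P) (wff? Q))
  [P =df Q] -> (and (wff? P) (wff? Q))
  [all V P] -> (and (recognised-symbol? V) (wff? P))
  [exists V P] -> (and (recognised-symbol? V) (wff? P))
  \\ rules 6 and 7: predicate atoms and propositional atoms
  [P | Ts] -> (and (recognised-symbol? P) (terms? Ts))
  P -> (recognised-symbol? P))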

Performance is about 750,000 inferences per second per GHz.  This system will probably end up as part of something bigger.  More anon.

Mark

Mark Tarver

Sep 23, 2016, 10:10:07 AM
to Shen
To continue: the architecture is that of a preprover P that feeds into the main engine E.  Suitably programmed, the preprover can actually solve certain problems faster than E.  E performs incrementally bounded depth-first search up to a bound provided by the user.  So the pattern is:

1.  The preprover applies demodulation rules and tries to solve the problem by basic reasoning; if it succeeds, then true is returned.
2.  Otherwise it hands the problem over to the main engine for solution (a sketch follows).
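
Here is a sketch of that control flow.  The names demodulate-problem, preprover-solves? and engine-solve are hypothetical, for illustration only - they are not FTP's actual internals:

(define solve
  \\ try the preprover first; failing that, deepen the bound
  \\ incrementally up to the user-supplied maximum
  Problem Max -> (let Demod (demodulate-problem Problem)
                   (if (preprover-solves? Demod)
                       true
                       (deepen Demod 1 Max))))

(define deepen
  Problem Depth Max -> false where (> Depth Max)
  Problem Depth Max -> (or (engine-solve Problem Depth)
                           (deepen Problem (+ Depth 1) Max)))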

The 'basic reasoning' uses a modified tableau approach, although indirect proof is not used; i.e. the search proceeds both by trying to derive a contradiction in the assumptions and by trying to show that the conclusion follows directly from the assumptions given.

Demodulation, or rewriting to a normal form, gives significant advantages in dealing with equivalences that are simply definitions.  Let's see this in action.

Here is a function giving some axioms of naive set theory; m means 'is a member of', sub 'is a subset of', etc.  You can probably work out the rest.
 
(define st
  {--> (list fol.wff)}
  -> [[all x [~ [m x e]]]
      [all x [all y [[[sub x y] & [sub y x]] <=> [eq x y]]]]
      [all x [all y [[sub x y] <=> [all z [[m z x] => [m z y]]]]]]
      [all x [all y [[pow x y] <=> [all z [[m z x] <=> [sub z y]]]]]]
      [all x [all y [[com x y] <=> [all z [[m z x] <=> [~ [m z y]]]]]]]
      [all x [all y [all z [[prod x y z] <=> [all w [[m w x] <=> [[pair w] & [[m [fst w] y] & [m [snd w] z]]]]]]]]]
      [all x [all y [all z [[int x y z] <=> [all w [[m w x] <=> [[m w y] & [m w z]]]]]]]]
      [all x [all y [all z [[un x y z] <=> [all w [[m w x] <=> [[m w y] v [m w z]]]]]]]]])

Let's compile this down; the timings throughout are from my 1.3GHz laptop.

(14-) (compile-fol (st))
107 Horn clauses generated
true

(15-) (crunchit! (@p [] [all x [~ [pow e x]]]) [(@p verbose 1)])
   \\ prove the empty set is not a powerset of any set and print off stats.

run time: 0.7800025939941406 secs
815594 inferences
true

(16-) (crunchit! (@p [] [un a a a]) [(@p verbose 1)])  \\ A U A = A

run time: 0.3899993896484375 secs
352518 inferences
true

(19-) (crunchit! (@p []
               [all x [all y [all z [[un x y z] <=> [un x z y]]]]])
               [(@p verbose 1)])

run time: 0.7490081787109375 secs
811613 inferences
true

(20-) (crunchit! (@p [] [all x [all y [[pow x y] => [m y x]]]]) [(@p verbose 1)])

run time: 5.989990234375 secs
6592766 inferences
true

(21-) (crunchit! (@p [] [all x [all y [[prod x y e] => [eq x e]]]]) [(@p verbose 1)])

run time: 0.0 secs
2832 inferences
true

(22-) (crunchit! (@p [] [all x [all y [[sub x y] => [un y x y]]]]) [(@p verbose 1)])

run time: 0.25 secs
279306 inferences
true

(23-) (crunchit! (@p []
               [all x [all y [all z [[sub z y] => [[un x y z] => [un x y y]]]]]])
               [(@p verbose 1)])

run time: 0.0159912109375 secs
12992 inferences
true

(24-) (crunchit! (@p [] [all x [~ [pow e x]]]) [(@p verbose 1)])

run time: 0.764007568359375 secs
815594 inferences
true

(25-) (crunchit! (@p [] [int a a a]) [(@p verbose 1)]) \\ A intersection A = A

run time: 208.3079948425293 secs
203763756 inferences
true

That's really huge - over 200 million inferences - and a tribute to the inferencing speed of E.  However, we can choose to set up these problems differently.  All the axioms in st apart from the first are equivalences that are really implicit definitions.  Therefore we can demodulate all of these problems and use the first axiom as the explicit theory.  When we do that, the preprover actually solves the problems itself at great speed; treating equivalences clausally is inefficient.  So here I enter the equivalences as definitions; I generate a fresh variable V to avoid variable capture.

(define fol.demodulate
  {fol.wff --> fol.wff}
   [eq X Y] -> [[sub X Y] & [sub Y X]] 
   [sub X Y] -> (let V (gensym v) [all V [[m V X] => [m V Y]]])
   [pow X Y] -> (let V (gensym v) [all V [[m V X] <=> [sub V Y]]])
   [com X Y] -> (let V (gensym v) [all V [[m V X] <=> [~ [m V Y]]]])
   [prod X Y Z] -> (let V (gensym v) 
                        [all V [[m V X] 
                                 <=> [[pair V] & [[m [fst V] Y] & [m [snd V] Z]]]]]) 
   [int X Y Z] -> (let V (gensym v) [all V [[m V X] <=> [[m V Y] & [m V Z]]]])
   [un X Y Z] -> (let V (gensym v) [all V [[m V X] <=> [[m V Y] v [m V Z]]]])
   P -> P)
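
For example, applying this at the top level gives something like the following (the gensym'd variable name will vary):

(fol.demodulate [un a a a])
[all v17 [[m v17 a] <=> [[m v17 a] v [m v17 a]]]]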
  
(define e-axiom
  {--> fol.wff}
  -> [all x [~ [m x e]]])

Off we go.
    
(54+) (crunchit! (@p [(e-axiom)] [un a a a]) [(@p verbose 1)])

run time: 0.0159912109375 secs
410 inferences
true : boolean

(55+) (crunchit! (@p [(e-axiom)] [int a a a]) [(@p verbose 1)])

run time: 0.0 secs
407 inferences
true : boolean

(56+) (crunchit! (@p [(e-axiom)]
               [all x [all y [all z [[un x y z] <=> [un x z y]]]]])
               [(@p verbose 1)])

run time: 0.0 secs
4332 inferences
true : boolean

(57+) (crunchit! (@p [(e-axiom)] [all x [all y [[pow x y] => [m y x]]]]) [(@p verbose 1)])

run time: 0.0 secs
1188 inferences
true : boolean

(58+) (crunchit! (@p [(e-axiom)] [all x [all y [[prod x y e] => [eq x e]]]]) [(@p verbose 1)])

run time: 0.014984130859375 secs
6665 inferences
true : boolean

(59+) (crunchit! (@p [(e-axiom)] [all x [all y [[sub x y] => [un y x y]]]]) [(@p verbose 1)])

run time: 0.0 secs
848 inferences
true : boolean

(60+) (crunchit! (@p [(e-axiom)]
               [all x [all y [all z [[sub z y] => [[un x y z] => [un x y y]]]]]])
               [(@p verbose 1)])

run time: 0.0159912109375 secs
4114 inferences
true : boolean

(61+) (crunchit! (@p [(e-axiom)] [all x [~ [pow e x]]]) [(@p verbose 1)])

run time: 0.0 secs
551 inferences
true : boolean

Mark

Mark Tarver

Sep 23, 2016, 3:06:31 PM
to Shen

In detail, this is how it works:

1.  The formulae in the assumptions and the conclusion are demodulated and put in prenex form.
2.  The existentially quantified variables in the assumptions and the universally quantified ones in the conclusion are skolemised away.  The latter requires recognising that universally quantified variables in a conclusion behave much as existentially quantified variables in the assumptions do (see the sketch after this list).
3.  Remaining bound variables are replaced by Prolog variables.
4.  The preprover P performs some basic inferencing in which the sequent is reduced to a series S of sequents whose components are literals.
5.  Anything in S that cannot be solved by P is sent to the engine E to be crunched.
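
Step 2 deserves a concrete sketch.  Here is a hypothetical version for the conclusion, assuming the formula is already in prenex form and simplifying by using Skolem constants rather than full Skolem functions:

(define skolemise-conclusion
  \\ a universal in the conclusion is replaced by a fresh Skolem
  \\ constant, just as an existential in the assumptions would be;
  \\ existentials in the conclusion are left for step 3
  [all V P] -> (skolemise-conclusion (subst (gensym sk) V P))
  [exists V P] -> [exists V (skolemise-conclusion P)]
  P -> P)

And to illustrate step 4: proving [[p & q]] |- [q & p] reduces to the literal sequents [p q] |- q and [p q] |- p, both of which P can dispose of directly.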

Mark

Mark Tarver

Sep 26, 2016, 8:26:39 AM
to Shen
I've finally got the system working reasonably well and I'll send it out this week.  The program will be part of something called Logic Lab, which will be a bunch of logic tools for SP people to experiment with.

Debugging the program is a challenge: the alien nature of the resolution process, the complexity of the code and the fact that an error may show up only after thousands of inferences all mean that you have to be painstaking.  I've worked through it carefully.  The system is sound but logically incomplete.  The code is about 460 lines.

As regards the name, FTP - fast theorem prover - will do.  Thanks to all for your suggestions, which are more imaginative than mine!

Having eliminated the glitches, I now have a clearer picture of the performance profile.  The properly working system is generally much faster.  These problems are solved at depth 5.  You can see that (int a a a) is now solved much more quickly - about 10^4 times faster!

(21+) (ftp [] [un a a a] -verbose 1)

run time: 0.0149993896484375 secs
17654 inferences
true : boolean

(22+) (ftp [] [int a a a] -verbose 1)

run time: 0.016002655029296875 secs
18807 inferences
true : boolean

(23+) (ftp [] [all x [all y [[pow x y] => [m y x]]]] -verbose 1)

run time: 0.03099822998046875 secs
42741 inferences
true : boolean

(24+) (ftp [] [all x [all y [[prod x y e] => [eq x e]]]] -verbose 1)

run time: 0.0 secs
1574 inferences
true : boolean

(25+) (ftp [] [all x [all y [[sub x y] => [un y x y]]]] -verbose 1)

run time: 0.046001434326171875 secs
41113 inferences
true : boolean

(26+) (ftp [] [all x [~ [pow e x]]] -verbose 1)

run time: 0.0 secs
1935 inferences
true : boolean

Two problems from the original set were not solved at depth 5; by experiment I found that these call for a higher depth:

(ftp [] [all x [all y [all z [[un x y z] <=> [un x z y]]]]] -verbose 1 -depth 8) 

(ftp [] [all x [all y [all z [[sub z y] => [[un x y z] => [un x y y]]]]]] -verbose 1 -depth 7) 

However, sadly, SBCL died on these two.  Only by cutting the axiom set down to just the axioms needed for each problem was FTP able to solve both.

These figures were all obtained without demodulation.  With demodulation, all the problems are solved instantaneously.

Thoughts and reflections?  Speed is no substitute for brains.  FTP would probably be quite useful as a knowledge engine attached to a DB, giving first-order querying capability.

Mark  

Mark Tarver

Sep 26, 2016, 9:36:28 AM
to Shen
(39+) (ftp [] [all x [all y [all z [[un x y z] <=> [un x z y]]]]] -verbose 1 -depth 5)

run time: 0.0630035400390625 secs
73484 inferences
false : boolean

(40+) (ftp [] [all x [all y [all z [[un x y z] <=> [un x z y]]]]] -verbose 1 -depth 6)

run time: 0.858001708984375 secs
887737 inferences
false : boolean

(41+) (ftp [] [all x [all y [all z [[un x y z] <=> [un x z y]]]]] -verbose 1 -depth 7)

run time: 80.8709945678711 secs
72921041 inferences
false : boolean

The per-level growth has jumped from one order of magnitude to two in a single bound increase, indicating that the search tree here is very bushy.  Increasing the bound to 8 would cause a memory failure - the solution is out in the billions, it seems.  Decreasing the size of our axiom set would enable that bound to be reached.  But demodulation is the best answer.

(42+) (define fol.demodulate
  {fol.wff --> fol.wff}
   [eq X Y] -> [[sub X Y] & [sub Y X]]
   [sub X Y] -> (let V (gensym v) [all V [[m V X] => [m V Y]]])
   [pow X Y] -> (let V (gensym v) [all V [[m V X] <=> [sub V Y]]])
   [com X Y] -> (let V (gensym v) [all V [[m V X] <=> [~ [m V Y]]]])
   [prod X Y Z] -> (let V (gensym v)
                        [all V [[m V X]
                                 <=> [[pair V] & [[m [fst V] Y] & [m [snd V] Z]]]]])
   [int X Y Z] -> (let V (gensym v) [all V [[m V X] <=> [[m V Y] & [m V Z]]]])
   [un X Y Z] -> (let V (gensym v) [all V [[m V X] <=> [[m V Y] v [m V Z]]]])
   P -> P)
fol.demodulate : (fol.wff --> fol.wff)

(43+) (ftp [] [all x [all y [all z [[un x y z] <=> [un x z y]]]]] -verbose 1 -depth 5)

run time: 0.0 secs
4734 inferences
true : boolean

Mark

Antti Ylikoski

Sep 27, 2016, 8:35:02 AM
to Shen

I have attached to this group entry the Compleat Guide for the MRS.

The Stanford MRS, the Metalevel Reasoning System, is a general-purpose
reasoning tool from 1985, written by Professor Michael Genesereth's
group at Stanford University.

The file is rather large for a 'net group attachment - over 130 pages
and 3.3MB - but I feel that it makes sense to send it to the group
for a comparison of Mark's FTP work with the Stanford MRS.

My own comments:

The MRS is "handy" for building rather simpleminded reasoning systems,
for instance using the MRS as an expert system shell.

But if one delves into the Guide for the MRS and compares it with
Mark's work, it is easy to see that from the mathematical, theoretical
standpoint, Mark's work is significantly stronger.

However, one should note that the two systems were originally
intended, to a certain extent, for different applications.

yours, AJY
CS-TR-85-1080.pdf

Mark Tarver

Sep 27, 2016, 10:16:49 AM
to Shen
Well, yes; they are different, and the guys who built the MRS probably spent more time on it.  I wouldn't compare the FTP and the MRS.

But for those Shenturians interested there is a help page for the FTP which explains how it works - http://www.shenlanguage.org/professional/ftp.htm.  This system will be available over the cloud for SP users early next month.

Mark

Mark Tarver

Sep 27, 2016, 10:32:24 AM
to Shen
As you can see, I've added identity to the FTP.  I've just started on the first Pelletier problem with identity.

(18-) (ftp [[[a = b] v [c = d]] [[a = c] v [b = d]]] [[a = d] v [b = c]] -verbose 1 -id 1 -amplify 2)

run time: 0.0 secs
918 inferences
true

Mark

Mark Tarver

Sep 27, 2016, 2:10:21 PM
to Shen
To put things in perspective: Vampire is, AFAIK, the top ATP - 250,000 lines of code.  FTP is under 500, though in almost any other language apart from Prolog or Shen it would probably be much larger.

FTP is not logically complete, nor designed to be; it's a 'fire-and-forget' ATP, designed to terminate fairly quickly.

Mark