I've spent some time communing with the Lean documentation and have had more time to think about your questions. I'll answer in two parts, first about lambda-Pi, and second where I think Ghilbert has the opportunity to do things Lean misses.
The question of how lambda-Pi relates to Ghilbert breaks down to me into five sub-questions. I'll start with the one I consider a clear win.
1. Should Ghilbert proofs translate cleanly into lambda-Pi?
As I suggested, I'm sold on this one. I think it makes sense to even define valid Ghilbert proofs in terms of their translation to a lambda-Pi term. If a proof is accepted but it doesn't translate, that's a bug. If the resulting lambda-Pi term doesn't type-check, that's a bug. Actually producing the translation is a great way to keep the implementation honest.
There are other real benefits. I'd like to think that the Ghilbert proof library grow to the point where it's useful to translate these and import them into other systems. I'll probably start with the translation into Lean, but the star of dependently typed languages is rising, so Idris could be interesting, and possibly future versions of Haskell, etc.
I think there are other benefits in terms of reasoning. I've especially struggled keeping definitions sound. The idea that a sound definition just translates to a closed term that type-checks is extremely appealing.
2. Should hypotheses and conclusions in Ghilbert proofs be closed terms wrt binding variables?
This is where I struggle, and am leaning "no". My experience with the MM->Lean sketch a couple weeks ago was that manually adding lambdas was really tedious and hard to reason about. I can see it driving potential new users away. My gut feeling here is that adding lambdas to close terms is something that can and should be automated, and that the sweet spot for this automation is to start with something that looks like the source language for the ε translation presented in section 4.1 of the LF paper.
I'm open to the possibility there's a good way to do this, but would need to see something. So probably not.
3. Should Ghilbert be "adequate" wrt lambda-Pi?
In other words, for every lambda-Pi term should it be possible to express it in Ghilbert so that the original term results from translation, obviously modulo the usual lambda-calculus equivalences?
The answer here is, I'd like to if it's cleanly possible. Such a feature would obviously be pretty compelling. But I don't know enough yet to say either way, so it feels worth exploring. Probably one good way to explore it would be to write a unification-based lambda-Pi checker and see how it stacks up compared to the checker I just wrote. [Btw, that one feels pretty good. I have a patch locally that fixes most of the problems in the non-binding fragment, and can see how to add simple lambda binding without too much difficulty]
4. Should natural deduction be expressible?
This would seem to follow fairly directly from (3), as the LF paper and subsequent experience clearly demonstrate that natural deduction is expressible in lambda-Pi. But I'm absolutely not sold on its importance, as it seems to me that the experience of Metamath demonstrates equally clearly that natural deduction is not required for concise proofs.
This is probably one of the most subjective questions in the list, as it's probably the biggest single thing that would increase the rift with Metamath, but on the other hand people who have used natural deduction systems might well look at pure Hilbert deduction as primitive and barbaric.
Plus, if I did this, I'd probably have to change the name to "Gnatural."
5. Should Ghilbert grow types?
My gut feeling here is yes, it needs to. Metamath gets away without them, but even within Peano I can see how we need to express a number hierarchy (certainly up to rationals), finite lists, etc. In Z2 it's even stronger, as you'll at a minimum have two different ranges for quantifiers. I don't think my original plan of namespaces and kinds is going to cut it.
Types are obviously hard. We'll need subtypes in some form or other, to express the number hierarchy, but subtypes don't play well with other features.
If we accept (1), then it's very natural that the types should be based on lambda-Pi, because they'll need to translate into that. Also, we have the example of Lean, showing that we can express subtyping-style relations using typeclasses.
So my feeling here is that I'd like to do them, but I want to do so carefully. A major goal for my retreat this fall is to understand type theory well enough to make good decisions about how to incorporate them into Ghilbert.
Ok, onto the second set of thoughts, areas where I think Ghilbert can go farther than Lean.
1. A major goal of Ghilbert is publishing.
I now see Norm's "mmpegif" as visionary, using a simple static tool to produce a good Web page for every theorem in the library. I have ideas (some based on experience from Ghilbert 1, some new) to generate really good Web pages. I have in mind high-quality typesetting (starting with MathJax but then refining it further), folding/unfolding to hide subproofs (thanks Paul for that in the Gh1 prototype!), and better JS interactivity. I also think my new proof format (with lines and result terms) will lend itself to better presentation of proofs. But all this is a fairly conservative extension of what's already been done.
An equally important part of the original Ghilbert vision is inviting contributions. I planned on making a web-based IDE, but was too ambitious. Now I plan to outsource the collaboration part to Github, and they can take care of the really hard parts of managing accounts, having a flow for commenting on and approving proposed changes, etc. Also, we can just use stock CI for checking the validity of changes. I think this is a low enough barrier to entry I don't see coming back to the hand-rolled Git server any time soon. And I think a plugin for a local editor is going to be a better experience than a web-based IDE for a while. [That said, I am not ruling out web-based tools in the long run, and am looking at wasm as the way to get there]
You obviously could build a publishing tool on top of Lean (and what they've done with their books is impressive, especially live editing), but it seems a pretty different direction.
2. Ghilbert should be very good at "trivial proofs".
All theorem provers have some relationship with computation, and some way of automating trivial proofs that the result of evaluating some expression is some reduced term. In Lean, the main such feature is "eval", but it seems to be quite limited in terms of large computations, even "eval (2227 + 9999) * 33" (an example from the book) causes the prover to blow up. Thus, Lean has "vm_eval" which is a reasonably efficient bytecode interpreter for its language, which overlaps greatly with the Calculus of Constructions that underlies its logic.
I think "vm_eval" makes sense for what Lean is trying to do, but there's a lot I find unsatisfying about it. The main thing is that there isn't a pure lambda-Pi term representing the result, so among other things you can't port it to other lambda-Pi based systems.
I'd like to develop a very robust evaluator for Ghilbert, the output of which is optimized traces. The proof of the above math expression should be about a couple hundred bytes; just the addition is "add1 (add1 (add1 2adc9 2adc9) 2adc9) 7add9" where I've defined add1 as (with a;b = 10*a+b):
axiom add1 (a+b+1 = c) (d+e = 1;f): (a;d + b;e = c;f);
In the usual paradigm of theorem provers, many people would think of optimized traces as silly. Why bother producing an artifact at all, when you can either have a proved-correct interpreter, or an interpreter that generates a trace to be consumed immediately (as in the LCF tradition)? I think the big advantage is interoperability, a direct consequence of the looser coupling between the module that generates the trace and the module that consumes it. I can take my trace and translate it into Lean or Idris or whatnot, and it will just work. And I think the cost of storing the trace explicitly is manageable; we routinely deal with media in the gigabytes.
I think optimized traces are intellectually interesting as well. The underlying "machine" is nondeterministic at heart, for example a witness that a number is composite is just the factors, and the trace needn't capture any of the grinding search. More interesting, tools such as Primo generate "primality certificates". I'd really love to prove the justifying theorems for ECPP (Elliptic Curve Primality Proving) and import those into Ghilbert.
The new unification-based checker should work even better than the Gh1 prototype for this, as it will facilitate incremental state updates by doing sparse manipulations in a tree, much like persistent data structures in a purely functional language.
3. Translation in general.
Not just trivial theorems, but carefully handcrafted proofs, should be portable. There is progress on this in the outside world (
OpenTheory in the HOL space and
Dedukti from Inria), but
4. Other logics.
This is basically the "little theories" project. I keep seeing evidence that logics beyond first-order arithmetic or set theory are useful and relevant. The "
Chalk" logic programming fragment from the Rust world uses a type of modal logic. Separation logic is increasingly used for verifying programs that mutate a store.
LF has some problems with these other logics, as it assumes the underlying consequence relation must satisfy weakening and contraction. There are ways to deal with this, but I think it boils down to using Hilbert-style deduction. Ghilbert should hopefully be pretty good at this.
5. Rust.
The choice of language does matter, and for a variety of reasons I think Rust is ideal for my goals. In fact, I think one of the reasons I felt stuck earlier is not being able to decide on the right language.
I don't want to blame my tools too much (poor craftsman and all that), but looking back I see that Python was something of a mixed bag. It was excellent for my early work, mm_verify.py demonstrating that a Metamath checker could be done in just a few hundred lines of code. But I think it's also led me astray a bit, in particular representing terms in the "natural" s-expression encoding of lists and strings. It's now extremely clear to me that verification requires a graph representation and unification as a core algorithm, and Python doesn't particularly help with those. By contrast, in Rust I had a choice of three union-find algorithms in
crates.io. I'm also really happy about Rust's performance.
As I build more stuff (typesetting, evaluator, interactive editor plug-in), Rust feels like a good fit for all of these.
As I touched on above, I expect Rust to compile nicely into wasm, so that the tool can be used in a web-based environment.
Lastly, the Rust community is amazing, and I think there's special value in being one of the leading X projects written in Rust, for pretty much any value of X. I'm happy to apply the specialization [Theorem Checker/X] here.
Hope you found all this interesting,
Raph