New proof verifier by Dan Getz

91 views
Skip to first unread message

Norman Megill

unread,
Jun 12, 2016, 11:41:01 AM6/12/16
to Metamath, get...@gmail.com
Dan Getz has written a Metamath proof verifier in the Julia language:

https://github.com/getzdan/Metamath.jl

The code was inspired by checkmm.cpp by Eric Schmidt and includes verification of compressed proofs.  Dan conducted the following benchmark test for verifying set.mm:

Time (s)  Verifier        Computer language
108.1     mmverify.py     Python (3.4.3)
25.2      MM Tool         Javascript (V8 5.0.71.49)
15.5      checkmm.cpp     C++ (g++ 4.8.4)
10.1      Metamath.jl     Julia (0.4.6pre)

Dan mentioned the following subtlety in the spec, which was noted in checkmm.cpp:

"According to the spec, file inclusion commands should not include a file that has already been included. Unfortunately, determing whether two different strings refer to the same file is not easy, and, worse, is system-dependant. This program ignores the issue entirely and assumes that distinct strings name different files. This should be adequate for the present, at least."

I think we should add to the spec that the verifier should assume that different strings refer to different files for the purpose of determining whether a file has already been included.  This is probably already the case in all verifiers written so far, although I haven't checked.  This would be #8 in the list of spec clarifications at:

https://groups.google.com/d/msg/metamath/ZlRle52FVO0/7TqSetEtCQAJ

While there may be no OS-independent way to verify this, in practice in a file included twice would generate other errors that would detect it if it is harmful.  Since we now require included files to be at the outermost scope, I can't think of a situation where a file included twice will not either cause an error message (e.g. because of a duplicate $a or $p label) or be effectively vacuous (e.g. a file with nothing but comments or empty scopes).

- Norm

Stefan O'Rear

unread,
Jun 12, 2016, 6:12:45 PM6/12/16
to metamath list
On Sun, Jun 12, 2016 at 8:41 AM, Norman Megill <n...@alum.mit.edu> wrote:
> Dan Getz has written a Metamath proof verifier in the Julia language:
>
> https://github.com/getzdan/Metamath.jl
>
> The code was inspired by checkmm.cpp by Eric Schmidt and includes
> verification of compressed proofs. Dan conducted the following benchmark
> test for verifying set.mm:
>
> Time (s) Verifier Computer language
> 108.1 mmverify.py Python (3.4.3)
> 25.2 MM Tool Javascript (V8 5.0.71.49)
> 15.5 checkmm.cpp C++ (g++ 4.8.4)
> 10.1 Metamath.jl Julia (0.4.6pre)

Neat. Unfortunately my informal benchmarks only cover metamath.exe,
mmj2, smm2, and smm3 (will be announced soon when I get incremental
mode and the grammar check properly working), so I don't have an
anchor point. Anyone want to try setting up a more comprehensive run?

> Dan mentioned the following subtlety in the spec, which was noted in
> checkmm.cpp:
>
> "According to the spec, file inclusion commands should not include a file
> that has already been included. Unfortunately, determing whether two
> different strings refer to the same file is not easy, and, worse, is
> system-dependant. This program ignores the issue entirely and assumes that
> distinct strings name different files. This should be adequate for the
> present, at least."
>
> I think we should add to the spec that the verifier should assume that
> different strings refer to different files for the purpose of determining
> whether a file has already been included. This is probably already the case
> in all verifiers written so far, although I haven't checked. This would be
> #8 in the list of spec clarifications at:
>
> https://groups.google.com/d/msg/metamath/ZlRle52FVO0/7TqSetEtCQAJ
>
> While there may be no OS-independent way to verify this, in practice in a
> file included twice would generate other errors that would detect it if it
> is harmful. Since we now require included files to be at the outermost
> scope, I can't think of a situation where a file included twice will not
> either cause an error message (e.g. because of a duplicate $a or $p label)
> or be effectively vacuous (e.g. a file with nothing but comments or empty
> scopes).

There's an interaction here with the recent decision to use relative
paths. Suppose you have:

A.mm:

$[ B/C.mm $]
$[ D.mm $]

B/C.mm:

$[ ../D.mm $]

Is D included once or twice? (SMM3 picks "once", but it could be
persuaded to include a file twice using hard links or a
case-insensitive filesystem. Symbolic links do get followed before
the reinclude check, but that's more an accident than anything.)

-sorear

Scott Fenton

unread,
Jun 12, 2016, 6:17:43 PM6/12/16
to meta...@googlegroups.com

mmverify.py actually uses a realpath routine, so strings get put to canonical form first. Per the Python spec, it also handles hard and soft links correctly.

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.
To post to this group, send email to meta...@googlegroups.com.
Visit this group at https://groups.google.com/group/metamath.
For more options, visit https://groups.google.com/d/optout.

fl

unread,
Jun 13, 2016, 4:14:36 AM6/13/16
to Metamath, get...@gmail.com

While there may be no OS-independent way to verify this,


The C way to solve the problem is well-known. There is an identifier at the beginning of the file that is stored
by the compiler when it opens the file for the first time. If the file is opened a second time and the identifier
met again, the content of the file is not read.

So no more needs to consider the problem of the canonicity of the file names.

--
FL

Marnix Klooster

unread,
Jun 13, 2016, 6:07:05 AM6/13/16
to Metamath
Hi Norm,

Alternatively, you could choose to refine the specification, so that "the same file" means "a file with exactly the same contents".  That would allow any implementation to first check whether some platform-dependent canonical name is the same, and if not, just compute some hash of the file contents and check whether that is matches the hash of some file that was read earlier.

However, this forces verifiers to compute this hash for every file read, which may or may not be nice.

Groetjes,
 <><
Marnix

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.
To post to this group, send email to meta...@googlegroups.com.
Visit this group at https://groups.google.com/group/metamath.
For more options, visit https://groups.google.com/d/optout.



--
Marnix Klooster
marnix....@gmail.com

rzeno

unread,
Jun 13, 2016, 6:55:10 AM6/13/16
to meta...@googlegroups.com
On Mon, Jun 13, 2016 at 12:06:45PM +0200, Marnix Klooster wrote:
> Hi Norm,
>
> Alternatively, you could choose to refine the specification, so that "the
> same file" means "a file with exactly the same contents". That would allow
> any implementation to first check whether some platform-dependent canonical
> name is the same, and if not, just compute some hash of the file contents
> and check whether that is matches the hash of some file that was read
> earlier.
>
> However, this forces verifiers to compute this hash for every file read,
> which may or may not be nice.
>
NOT NICE! hashes, between same file with a single space difference, are very different.
"same content", ignoring the grammar of the file, is a very hard task!
I don't have a solution, so is just my opinion.

best regards

David A. Wheeler

unread,
Jun 13, 2016, 8:33:38 AM6/13/16
to meta...@googlegroups.com, Marnix Klooster
All of this seems excessively complex. It seems to me that all you need to do is avoid rereading a file when you've already seen exactly the same filename. It's not worth trying to counter someone who creates symbolic links or other trickery.
--- David A.Wheeler

Norman Megill

unread,
Jun 13, 2016, 10:50:49 AM6/13/16
to meta...@googlegroups.com
On 6/12/16 6:12 PM, Stefan O'Rear wrote:
...
> There's an interaction here with the recent decision to use relative
> paths.

Can you refresh my memory about this decision?

> Suppose you have:
>
> A.mm:
>
> $[ B/C.mm $]
> $[ D.mm $]
>
> B/C.mm:
>
> $[ ../D.mm $]
>
> Is D included once or twice? (SMM3 picks "once", but it could be
> persuaded to include a file twice using hard links or a
> case-insensitive filesystem. Symbolic links do get followed before
> the reinclude check, but that's more an accident than anything.)

Well, metamath.exe tries to open an included file with respect to the
current directory regardless of what directory the file with the
inclusion came from. It simply puts the string between $[ and $] into
the C language's fopen() function. So the ../D.mm would cause it to
give a file-not-found error (unless there is a different D.mm one level
up). In other words, it takes whatever string is in the inclusion
statement and gives it to the programming language's file-open command.

Perhaps you think this is ugly, but my intent was to make the verifier
as simple as possible by opening as a file whatever string is given,
without having to parse and keep track of directory paths with different
syntax for different OSs. Indeed, just specifying a file outside of the
current directory already risks making the .mm file non-portable.
In C, the pointers to the FILE structure for a file opened twice will
necessarily be different since the program has to know what position it
is reading from for each instance of the file.

The identifier for the file may be one of the FILE structure members,
but as far as I can tell, these members are defined by the compiler and
the OS, and there is no portable way for a program to access them. I
don't think the ANSI C spec says anything about the members of a FILE
structure. If someone knows otherwise, let me know.
I agree. Comparing the string for the filename is the only way I know
to guarantee the same behavior independent of the programming language
and operating system.

Norm

Norman Megill

unread,
Jun 14, 2016, 1:56:48 AM6/14/16
to meta...@googlegroups.com
On 6/13/16 10:50 AM, Norman Megill wrote:
> On 6/12/16 6:12 PM, Stefan O'Rear wrote:
> ...
>> There's an interaction here with the recent decision to use relative
>> paths.
>
> Can you refresh my memory about this decision?

I located this Dec. 2014 thread:

https://groups.google.com/d/msg/metamath/hrtD9_wAcHM/YjC4FaQi8A4J

which has a discussion but didn't seem to come to a definite conclusion.
Was there anything more recent?

>
>> Suppose you have:
>>
>> A.mm:
>>
>> $[ B/C.mm $]
>> $[ D.mm $]
>>
>> B/C.mm:
>>
>> $[ ../D.mm $]
>>
>> Is D included once or twice? (SMM3 picks "once", but it could be
>> persuaded to include a file twice using hard links or a
>> case-insensitive filesystem. Symbolic links do get followed before
>> the reinclude check, but that's more an accident than anything.)
>
> Well, metamath.exe tries to open an included file with respect to the
> current directory regardless of what directory the file with the
> inclusion came from. It simply puts the string between $[ and $] into
> the C language's fopen() function. So the ../D.mm would cause it to
> give a file-not-found error (unless there is a different D.mm one level
> up). In other words, it takes whatever string is in the inclusion
> statement and gives it to the programming language's file-open command.
>
> Perhaps you think this is ugly, but my intent was to make the verifier
> as simple as possible by opening as a file whatever string is given,
> without having to parse and keep track of directory paths with different
> syntax for different OSs. Indeed, just specifying a file outside of the
> current directory already risks making the .mm file non-portable.

Thinking this over, it does seem to be rather inflexible, although it is
how metamath.exe works now. The spec is silent on the issue.

Perhaps we should recognize "/" and ".." in the spec as the standard for
specifying the path. These are honored in HTML URLs, even on Windows,
so it has become more or less a universal and OS-independent de facto
standard if not an actual one.

In a metamath.exe implementation, the path would be set with the "read"
command in metamath.exe and tracked through includes. For example,
"read '../abc/A.mm'" in your example would set the path to "../abc".
"$[ B/C.mm $]" would set the path to "../abc/B" inside of C.mm, so "$[
../D.mm $]" inside of C.mm would become "../abc/B/../D.mm" which
collapses to "../abc/D.mm". The collapsed version is saved for
comparison for the "file already included" condition. When the
inclusion of C.mm is complete, the path would pop back to "../abc", and
when D.mm is included inside of A.mm, its path would be "../abc/D.mm"
which the program then skips since it was already included.

This wouldn't be a terribly difficult to change in metamath.exe,
although I always hate making the spec more complex. It would mean (1)
keeping track of the path through inclusions, with ".." and "/" as the
universal path specifiers, and (2) collapsing relative paths so that
".." does not occur in the middle of a file name, before doing a string
comparison to determine if a file has already been included.

A hard or soft symbolic link to D.mm called Dlink.mm, or a reference to
d.mm in a case-insensitive OS, would cause the above algorithm to
include D.mm twice, since its collapsed path doesn't equal "../abc/D.mm"
using a string comparison in the verifier. We would need to do this to
ensure that all verifiers behave the same way, and not use any
language-specific feature that determines that a symbolic link is the
same file. In particular, I don't think that determining that a file
and a symbolic link to it are the same can be done in a portable way in C.

This may be what Mario intended in the 2014 post, but there didn't seem
to be a consensus that I could find, at least not in that thread.

Norm

Mario Carneiro

unread,
Jun 14, 2016, 2:58:09 AM6/14/16
to metamath
Regarding my original intentions in the 2014 discussion:

I was interested in making the include file semantics more C-like, and in particular to make the path resolution not dependent on external factors like the current directory of the metamath.exe program. Frankly, I also don't use the $[ $] file inclusion practically at all, and like Stefan I have a general dislike of the feature's implementation. Luckily set.mm does not need file inclusions, but on the flip side this means that the feature does not get exercised in our biggest metamath project, which prevents it from evolving. I have an expectation that with the eventual breakup of set.mm will come a better or at least more principled usage of file inclusion, which will guide the future behavior of this command.

Currently I am in favor of a "project file" approach similar to what is described at https://groups.google.com/d/msg/metamath/ZlRle52FVO0/4TE_htrbBwAJ for set.mm breakup. Whether the $[ $] include command is used to achieve this, or if there are other use cases where other features of includes are important, is still up for debate.

Then again, I think it's still all up for debate - I don't recall there being any final decisions in this arena.

Mario



Norm

fl

unread,
Jun 14, 2016, 4:49:45 AM6/14/16
to Metamath
 
Then again, I think it's still all up for debate - I don't recall there being any final decisions in this arena.


The problem with splitting the file is always the same: it breaks the current possibility to search for the theorems thanks 
to a simple buffer in a good editor. This feature is terribly efficient. It would be unfortunate to remove  it. 

What is more, looking at mathematics as a simple linear thread of theorems -- what is currently involved by the set.mm
structure -- is pertinent. It helps to understand mathematics. Any other structures clouds them.

-- 
FL

Stefan O'Rear

unread,
Jun 18, 2016, 5:29:01 AM6/18/16
to metamath list
On Mon, Jun 13, 2016 at 10:56 PM, Norman Megill <n...@alum.mit.edu> wrote:
> On 6/13/16 10:50 AM, Norman Megill wrote:
>> On 6/12/16 6:12 PM, Stefan O'Rear wrote:
>>> There's an interaction here with the recent decision to use relative
>>> paths.
>>
>>
>> Can you refresh my memory about this decision?
>
> I located this Dec. 2014 thread:
>
> https://groups.google.com/d/msg/metamath/hrtD9_wAcHM/YjC4FaQi8A4J
>
> which has a discussion but didn't seem to come to a definite conclusion.
> Was there anything more recent?

FYI, since I seem to have only been imagining the consensus, I've
changed SMM3 back to use the metamath.exe behavior.

-sorear
Reply all
Reply to author
Forward
0 new messages