checkmm ported to TypeScript

31 views
Skip to first unread message

Antony Bartlett

unread,
May 16, 2022, 9:02:08 AMMay 16
to meta...@googlegroups.com
Hi,

I'm writing to announce I have completed my port of the checkmm verifier from C++ to TypeScript.  The test suite passes.

If you have nodejs installed (and tens of millions of developers worldwide do) you can verify an mm file in your current working directory with a command, e.g.

npx checkmm set.mm

Though naturally it's more efficient to install my checkmm package first.

Link to package and source code repositories here


(that's the announcement done, just a few remarks to follow).

The O(n^2) problem I mentioned previously turned out to be due to the JavaScript Array.shift operation (which removes an item from the front of an array) being an O(n) operation which was used n times.  Glauco reports previously having a similarly unexpected problem with Array.slice.  The takeaway from this is to make sure you properly understand the capabilities of whatever containers you decide to use, and in particular to be wary of a JavaScript array that might be a million items or more long.

I just reversed the array, by the way - which is also an O(n) operation, but I only had to do it once, then my tokens could be removed in the desired order with Array.pop which is an O(1) operation.

What enabled me to track down this problem in about three quarters of an hour was knowing my way around the Visual Studio Code debugger - that's what I didn't have in 2019 when I first encountered this difficulty.  Debuggers have their detractors (Linus Torvalds is probably the best known), and in the context investigating performance issues they're sometimes called the "poor man's profiler", but personally I find a debugger to be an indispensable general purpose tool

I did this port because I like Eric Schmidt's C++ source code for checkmm.  I alway respect a job well done by source code, but actually it's quite rare for me to like the source code itself.  As such my port should be pretty faithful to the original.  Hopefully you should almost be able to step through the two source files line-by-line and treat it like a Rosetta Stone between C++ and TypeScript.


It should be said though that reading code written with C++'s Standard Template Library containers is a bit of a specialised skill.  It's never let me down with unexpectedly slow operations not immediately obvious from the documentation like JavaScript has above, but with that caveat aside, I anticipate my TypeScript port being found to be slightly easier to read.

    Best regards,

        Antony

Thierry Arnoux

unread,
May 16, 2022, 10:45:40 AMMay 16
to meta...@googlegroups.com, Antony Bartlett

Congrats, a new verifier is always a good news!

And Typescript is a very relevant language.

Interestingly enough, Raph Levien (& al)'s Python verifier has the exact same "trick" with reversing the token buffer, then popping from it. This had me wonder.

What's the current running time?

In any case I would suggest adding it to our set.mm CI!

_
Thierry

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/metamath/CAJ48g%2BD5uC0tesStx%2Bj-bJe1Zmw4bkx1pqG-puoHpAWosX7k0Q%40mail.gmail.com.

Mario Carneiro

unread,
May 16, 2022, 2:41:37 PMMay 16
to metamath, Antony Bartlett
The pop/reverse does seem unnecessary. I would just keep track of the position in the array, possibly using a little structure containing the array and the current position with some functions to get the current token and move to the next. In fact, you can use this interface to parse the next token on demand, so that you don't need to precalculate the array of tokens (which involves millions of tiny allocations); you just hold on to the input string and pull off tokens as you need.

Antony Bartlett

unread,
May 17, 2022, 4:16:33 AMMay 17
to meta...@googlegroups.com

Thierry Arnoux writes:

> Congrats, a new verifier is always a good news!

Thanks! :-)

> What's the current running time?

It should be respectable.  It's verifying set.mm for me in about eight seconds on the MacBook Pro I'm using.  Unfortunately XCode is currently broken on said device so I'm unable to build other verifiers in order to provide a fair comparison, other than to make the obvious statement that this is not, and never will be, a competitor for metamath-knife.

Mario's suggestion could potentially shave off maybe a couple of seconds from that, but I won't know until I try.

    Best regards,

        Antony


Thierry Arnoux

unread,
May 17, 2022, 5:30:01 AMMay 17
to meta...@googlegroups.com, Antony Bartlett

Eight seconds is respectable, we should try to add this new verifier to our set.mm CI.

Mario's suggestion is actually the way metamath-knife works: this avoids both to allocate and to copy each token. However, Typescript seems to create a new String every time even a substring or a slice is taken, so if you want to get this kind of speedup in TS, you'd probably need to use shared array buffers, which `slice` method, I believe, does not copy. That structure shall also allow use in multi-threading, which is another way to get speedup.

_
Thierry

Mario Carneiro

unread,
May 17, 2022, 12:47:23 PMMay 17
to metamath
On Tue, May 17, 2022 at 5:30 AM Thierry Arnoux <thierry...@gmx.net> wrote:

Eight seconds is respectable, we should try to add this new verifier to our set.mm CI.

Mario's suggestion is actually the way metamath-knife works: this avoids both to allocate and to copy each token. However, Typescript seems to create a new String every time even a substring or a slice is taken, so if you want to get this kind of speedup in TS, you'd probably need to use shared array buffers, which `slice` method, I believe, does not copy. That structure shall also allow use in multi-threading, which is another way to get speedup.


Note that even if subslicing creates a new string, it would still be worthwhile to do the slicing on the fly since these strings only exist for a short time: many of them are discarded (like comment tokens or structural tokens like $c or $.) and only math tokens and label tokens are retained for longer. GC generally does better if you have lots of short lived allocations than if you have them all in flight simultaneously (which is what happens if you first tokenize the entire file and then go through the array of tokens).
 
Reply all
Reply to author
Forward
0 new messages