Update: website generation rewrite

157 views
Skip to first unread message

Mario Carneiro

unread,
Sep 30, 2023, 12:52:56 AM9/30/23
to metamath
For those who aren't aware, I've been working on a rewrite of the website generation code. The main PR is https://github.com/metamath/metamath-website-scripts/pull/16 , with links to other parts of the process. I wanted to share another milestone: I'm also writing https://github.com/digama0/mm-web-rs/ as a rewrite-it-in-rust alternative to the metamath.exe theorem generation code. The most expensive part of the website generation is the theorem files (GIF and unicode) for set.mm, in my test these are 4893.94 s and 5020.87 s respectively (that is, 2h45m total). I just got the new version working, and it takes 10.8 s and 11.1 s respectively, a phenomenal 452x improvement over the original. This is multithreaded, but it is still only 18s + 17.3s single-threaded, so this still makes a huge difference even if the machine is single-threaded (I think it is).

Jim Kingdon

unread,
Sep 30, 2023, 3:04:10 AM9/30/23
to meta...@googlegroups.com
On 9/29/23 21:52, Mario Carneiro wrote:

> in my test these are 4893.94 s and 5020.87 s respectively (that is,
> 2h45m total). I just got the new version working, and it takes 10.8 s
> and 11.1 s respectively

Exciting stuff!

This has me pondering various possibilities, such as generating the
whole site on my machine (because I'm often proving where I don't have
good internet) rather than just one page at a time as I have been doing.


Mario Carneiro

unread,
Sep 30, 2023, 5:02:40 AM9/30/23
to meta...@googlegroups.com
Note that mm-web-rs also supports a "server" mode, where you can give it a file and it will serve a live site without much precomputation. All it needs is a live reload feature (i.e. it detects when you save the .mm file and reloads) and you will get instant feedback (loading individual pages takes milliseconds). I've also been contemplating just running this on the server directly, so that we don't need to spend gigabytes of space on all the precomputed HTML files if it is comparably fast to just compute them on the spot.

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/metamath/a1613087-36e4-a936-0641-e935315cedda%40panix.com.

Benoit

unread,
Sep 30, 2023, 7:12:49 AM9/30/23
to Metamath
Thanks for the website generation work, and the much needed simplification. I think further simplification, on many levels, would be achieved by dropping GIF support (as for the HTML generation, we could have the default be the current HTML/unicode, and the backup being HTML/plaintext, which simply displays the text in the source, with possibly some coloring as in the unicode case).

As for the repos, this would allow to delete https://github.com/metamath/symbols. I also think that the two repos metamath-website-script and metamath-website-seed be a single repo "website" (which I already proposed when I was informed of their creation). I don't see the need for this extra complexity. (Also, why prepend repositories in the project "metamath" with the prefix "metamath-" ?)

Benoît

Mario Carneiro

unread,
Sep 30, 2023, 7:18:42 AM9/30/23
to meta...@googlegroups.com
I don't think we should delete the symbols repo. Those GIFs are the result of Norm's hard work, and they stand alone as a project. See https://us.metamath.org/symbols/symbols.html for more context.

Mario Carneiro

unread,
Sep 30, 2023, 7:27:16 AM9/30/23
to meta...@googlegroups.com
I agree that the -scripts and -seed repos could be merged into one repo. Moreover, I am of the opinion that we should also move all the .html and .raw.html files from set.mm repo to the website repo, even though those files are more heavily trafficked than most other pages. When you want to add a new page, you need to modify the website repo anyway because otherwise it won't get copied correctly, and if it's living next to the scripts then it will be easy to maintain .raw.html and the necessary processing along with it. By comparison, there is very little "tandem update" work involved in an .html file versus the .mm file - they can generally be updated in any order, with the main issue being possible broken links if the mm file links to the html file or vice versa before the other has landed.

Benoit

unread,
Sep 30, 2023, 7:48:55 AM9/30/23
to Metamath
On Saturday, September 30, 2023 at 1:18:42 PM UTC+2 di....@gmail.com wrote:
I don't think we should delete the symbols repo. Those GIFs are the result of Norm's hard work, and they stand alone as a project. See https://us.metamath.org/symbols/symbols.html for more context.

Oh, I agree. I misread the purpose of that repo.

Benoît

Jim Kingdon

unread,
Sep 30, 2023, 1:04:08 PM9/30/23
to meta...@googlegroups.com

On 9/30/23 04:27, Mario Carneiro wrote:

Moreover, I am of the opinion that we should also move all the .html and .raw.html files from set.mm repo to the website repo, even though those files are more heavily trafficked than most other pages. When you want to add a new page, you need to modify the website repo anyway because otherwise it won't get copied correctly, and if it's living next to the scripts then it will be easy to maintain .raw.html and the necessary processing along with it. By comparison, there is very little "tandem update" work involved in an .html file versus the .mm file - they can generally be updated in any order, with the main issue being possible broken links if the mm file links to the html file or vice versa before the other has landed.

I see the big issue as being the links from the html files to specific theorem names or math syntax. If the web pages now in the set.mm repo are moved, there would appear to be some kind of catch-22 in terms of which repository would have to be updated first and how we could build automated checks that the websites refer to valid syntax/theorems.

In particular, mmil.raw.html changes very frequently and it is very much tied to updates to iset.mm and set.mm.

Adding a new page is rarer, so I'm not sure that's a huge consideration (especially if we make the scripts better organized so we better understand where we need to add it).

I'm not saying I'm implacably opposed to any change but I'd tread cautiously. On the whole I think the current organization has served us well (I mean, in terms of which web pages are maintained where, not all the details of how the scripts are written which hopefully we are already on the path to improving quite aside from moving web page files around).


Mario Carneiro

unread,
Sep 30, 2023, 5:08:17 PM9/30/23
to meta...@googlegroups.com
There is very little code involved in keeping a file somewhere else, there is just one cp line in the script. So I'm fine if we want to make an exception for certain files. But currently it's not clear that this is what is happening, and yet there is a lot of inconsistency, for example mmbiblio.raw.html is hosted on set.mm repo but IL's mmbiblio.raw.html is in the seed repo. Some kind of general rule for where to look for files would be appreciated, the simpler the better.

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.

Mario Carneiro

unread,
Sep 30, 2023, 5:12:48 PM9/30/23
to meta...@googlegroups.com
the other thing is that dealing with files hosted on set.mm repo is a bit of a mess because it is so jumbled and flat. We have to cherry pick which files go in mpegif, which are for ilegif and which are for nfegif. Putting things in folders will make the script a lot simpler, and would make me more supportive of just moving all the database-specific files into set.mm repo if that's what we want to do.

Benoit

unread,
Oct 2, 2023, 7:42:42 PM10/2/23
to Metamath
On Saturday, September 30, 2023 at 11:12:48 PM UTC+2 di....@gmail.com wrote:
the other thing is that dealing with files hosted on set.mm repo is a bit of a mess because it is so jumbled and flat. We have to cherry pick which files go in mpegif, which are for ilegif and which are for nfegif. Putting things in folders will make the script a lot simpler, and would make me more supportive of just moving all the database-specific files into set.mm repo if that's what we want to do.

I lean toward a rule like:
* files associated with a given database go the metamath/set.mm repository (which could be renamed metamath/databases -- github manages repository aliases);
* files which are more general and "website-wide" go to the metamath/website repository (which is the merging of the metamath/metamath-website-seed and metamath/metamath-website-scripts repositories).

As for organizing the metamath/databases repository, I'm of course all for it, having opened https://github.com/metamath/set.mm/issues/1887 and https://github.com/metamath/set.mm/pull/3515.

Benoît

Mario Carneiro

unread,
Nov 10, 2023, 3:19:38 PM11/10/23
to meta...@googlegroups.com
Another update: mm-web-rs now supports mmtheoremN.html generation. This was not really a bottleneck but it is the next largest cost, taking around 10 seconds in metamath-exe, which is a small fraction of the overall build cost but is still much slower than mm-web-rs can do. It takes about 600ms now.

Besides speed improvements, the main other thing I am focusing on in this rewrite is updating the HTML to use modern standards and pass the validators (for some reason we have a link to the validator in the footer of every theorem page but it hasn't passed in years). So far I'm just using inline and <head> CSS but once metamath-exe is off the critical path for the website generation it will be easier to move more of the CSS into a separate file (which will result in big space savings since it won't have to be duplicated 40000 times).

Both kinds of autogenerated file are also supported in mm-web-rs's server mode, so you can also have a live version of the website without filling your disk. It supports multiple databases now with the same URL layout as us.metamath.org, although it doesn't know how to serve static files yet, so this could in principle be called by the metamath.org web server for files that are not precompiled.

Reply all
Reply to author
Forward
0 new messages