Metamath site us.metamath.org status and plans

123 views
Skip to first unread message

David A. Wheeler

unread,
Jul 24, 2022, 6:59:24 PM7/24/22
to Metamath Mailing List
All:

I've made some improvements to the us.metamath.org website infrastructure.
Details, including upcoming plans, are below. Comments welcome!

--- David A. Wheeler

====================

The most obvious change is that while you can continue to use the old "http:"
URLs, all such requests are immediately upgraded to secure "https:" requests.
Once you're using http requests, all relative requests continue to use https.
The TLS (security) certificates used by https are provided
by Let's Encrypt, and they are automatically updated.

I've made some other improvements (mostly to security); all should be invisible:
* We now use Debian 11 not Debian 10 (this upgrade improved security & performance somewhat)
* The system now *automatically* downloads & installs security updates, so any vulnerabilities
publicly found will be quickly addressed without waiting for me or anyone else.
* We now have a simple intrusion prevention system enabled (fail2ban - it's primarily there
to counter simplistic mass attacks).
* I've enabled a basic firewall to make life slightly harder on attackers by ensuring that
only specific identified services are visible.
* I've slightly hardened the kernel configuration against attack
(e.g., by enabling source address verification & ignoring source routing)
* I've slightly hardened the web server (nginx) against attack
(e.g., by only allowing the HTTP requests GET, HEAD, and OPTIONS).

There's always some additional security hardening you can do, but I'm hoping that
these steps will be adequate to keep the site relatively secure.

I've made a larger but subtler change by switching to an "Infrastructure as Code" approach.
That is, the *entire* server is defined by a set of scripts here:
https://github.com/metamath/metamath-website-scripts
We can at any time destroy the current virtual server & recreate it automatically
with those scripts. This eliminates the "I wonder how this server is configured" mystery,
and more importantly, it means we can always rebuild the server from scratch whenever we want
(making the server like cattle instead of like a pet). Anyone can review those scripts and
propose improvements, which if accepted will improve the system.

Currently us.metamath.org is just the web server. I plan to eventually *also* make it
the system that regenerates the website, 1/day. That is taking longer to implement.
For one, I got COVID-19 this week, so I've been asleep most of this week instead of
being useful :-(. Another is that Norm set up us2.metamath.org the way he wanted over
a long period of time, so it's taking me quite some time to figure out how to redo it.
He assumed that storage space was unlimited, but while we *can* get lots of storage, it
costs more money; I would rather not keep unused extra copies that would cost more money.
The cheapest linode plan has enough space for the website & 1 copy being generated, not several.
The scripts were also just very complicated, which I think made sense to him
because he built it over time. However, since we're transitioning to a different
system anyway, I want to have a much less complicated system that we can maintain it into the future.
I think the result will be much simpler & cleaner, but it's taking me a while to figure out
what Norm's scripts did so I can extract just the parts we need.

Once the us.metamath.org site can regenerate the webpages, we can have the other mirrors sync
from *that* system instead (us instead of us2).
I think it'd be good to continue to support mirrors, we've
been doing it for a long time & I see no reason to stop.
However, I think the mirrors should be copying from a site that is *not*
depending on anyone's basement :-).
I'm not sure how that sync'ing should take place. Currently it uses rsync, which isn't secured;
we *could* continue to support that but I think it's a lousy idea today.
The easy answer would be rsync+ssh, which would be fine and works basically
identically to rsync except it adds encryption. If someone wants an alternative
approach, we could probably support alternatives. That said, first we need
to generate data worth synchronizing :-).

Jon P

unread,
Jul 25, 2022, 7:19:38 AM7/25/22
to Metamath
Sounds like really nice progress David, very impressive, hope the covid is light and passes off quickly for you :)

As a tiny thing I noticed on the front page it it says "Constructs mathematics from scratch, starting from ZFC set theory axioms. Over 23,000 proofs." 

Whereas maybe there are about 43k proofs now? 

David A. Wheeler

unread,
Jul 25, 2022, 10:03:27 AM7/25/22
to Metamath Mailing List


> On Jul 25, 2022, at 7:19 AM, Jon P <drjonp...@gmail.com> wrote:
>
> Sounds like really nice progress David, very impressive, hope the covid is light and passes off quickly for you :)
>
> As a tiny thing I noticed on the front page it it says "Constructs mathematics from scratch, starting from ZFC set theory axioms. Over 23,000 proofs."
>
> Whereas maybe there are about 43k proofs now?

I believe that number is *only* for the main part (not, e.g., mathboxes and deprecated sections). We do say "over" so it's not wrong :-).

That said, yes, it'd be great to update it.

--- David A. Wheeler

savask

unread,
Jul 25, 2022, 10:07:41 AM7/25/22
to Metamath
Maybe it's a bit unrelated, but now that the site is under control, would it be possible to update "Recent news items" in the mmrecent.html page and add a note about Norman passing away? I think mmrecent.html isn't in the set.mm repo, so it can't be updated there.

Jim Kingdon

unread,
Jul 25, 2022, 11:30:14 AM7/25/22
to meta...@googlegroups.com

This part of the web page is in git at https://github.com/metamath/set.mm/blob/develop/mmset.raw.html#L468 so you can update it yourself via the normal pull request mechanism (can't remember whether you have been submitting pull requests, but if not there are lots of details at https://github.com/metamath/set.mm/blob/develop/CONTRIBUTING.md and the pages linked from there).

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/metamath/fc16b33d-e226-4e6e-99ef-db668f4db491n%40googlegroups.com.

Glauco

unread,
Jul 25, 2022, 3:21:12 PM7/25/22
to Metamath

Impressive job, David.

I hope I will find the time to look at some details, it looks like there's a lot to learn from your solution.

Thanks, as always, for all the time and effort you put into this framework.

BR
Glauco

David A. Wheeler

unread,
Jul 30, 2022, 6:20:29 PM7/30/22
to Metamath Mailing List

> On Jul 25, 2022, at 3:21 PM, Glauco <glaform...@gmail.com> wrote:
> Impressive job, David.

Thanks! Here are a few additional notes.

I'm slowly working on getting the "rebuild metamath website scripts" working
on a new system. One problem I've found is that the current scripts assume that
the website is already working & downloads from it. So if anything fails, there's
no way to recover things. That sounds like a bad idea.
I intend to change things so that *all* bits used are:
1. From a public source (like Debian),
2. Version controlled publicly (e.g., GitHub), or
3. Generated from scripts that are publicly version controlled & only use data meeting criteria 1-3.
That way, the website is *only* generated from publicly-available information
(other than private keys).

It may be important to make a few more repos & split things off, e.g.,
maybe symbols should be in their own repo. That would make it
easier to do that, and the results should be easier to manage.

There are some old files from Mel O'Cat, including a whole bunch in
<./ocat/mmj2/$RECYCLE.BIN/>. I doubt we need them any more (if we ever did).

I'd like to eventually put us.metamath.org behind a content delivery network (CDN),
probably CloudFlare. This means that requestors will request pages through a CDN,
and if they get a page, the CDN will keep a *local* copy near the requestor.
This makes requests faster, and also protects us from distributed denial-of-service
attacks. These could hurt my wallet, since I get charged for requests more than a certain amount.

--- David A. Wheeler

Cris Perdue

unread,
Jul 30, 2022, 8:32:15 PM7/30/22
to meta...@googlegroups.com
Another advantage that a CDN could have is extra robustness through multiple "origins".
A CDN origin is a web server such as us.metamath.org, that the CDN refers to when it does
not have the data already cached. Many CDNs these days seem to support multiple origins, so if the main
origin, such as us.metamath.org, has problems, the CDN can automatically fail over to another origin
server in some manner. The details seem to vary significantly from one CDN to another.

David, I'd be glad to run down more detailed information on pricing or options if you desire.

-Cris
 

--- David A. Wheeler


--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.

David A. Wheeler

unread,
Jul 30, 2022, 9:31:36 PM7/30/22
to Metamath Mailing List


> On Jul 30, 2022, at 8:31 PM, Cris Perdue <cr...@perdues.com> wrote:
> Another advantage that a CDN could have is extra robustness through multiple "origins".
> A CDN origin is a web server such as us.metamath.org, that the CDN refers to when it does
> not have the data already cached. Many CDNs these days seem to support multiple origins, so if the main
> origin, such as us.metamath.org, has problems, the CDN can automatically fail over to another origin
> server in some manner. The details seem to vary significantly from one CDN to another.
>
> David, I'd be glad to run down more detailed information on pricing or options if you desire.

Sure, more info would be great! I've been looking at Cloudflare, which has a free tier that I think would be
adequate and it seems to be widely used. But I'm not married to that
particular CDN. Any advice you have would be welcome.

I like the idea of eventually using multiple origins. We could integrate our existing mirrors so that
people can *use* a specific URL that is extremely fast & robust through the
use of a CDN and our existing mirrors. I think it needs to be inexpensive, and
once set up it should just "keep working", but that all seems possible to me.

I've used Fastly, which is a good CDN service. However, I don't think we need its main advantages
and I don't see a useful free tier. I think we should limit costs where that makes sense.

I think it would be good to eventually create a non-profit "Metamath Association' whose
purpose would be to support & encourage the formalization of mathematics. It'd be paid by
dues, and it would do stuff like pay for DNS registration & website fees. The idea would be to
enable this to outlast any of us. However, I doubt that it would be flush, so I think we need to
limit our costs even if that's the eventual plan. Hopefully some people here would be willing to join :-).

--- David A. Wheeeler

Mázsa Péter

unread,
Aug 1, 2022, 9:40:19 AM8/1/22
to meta...@googlegroups.com
David, thank you very much for your efforts. I like the way you solved
these problems.

On Sat, Jul 30, 2022 at 11:20 PM David A. Wheeler <dwhe...@dwheeler.com> wrote:
> These could hurt my wallet, since I get charged for requests more than a certain amount.
On Sun, Jul 31, 2022 at 2:31 AM David A. Wheeler <dwhe...@dwheeler.com> wrote:
> I think it would be good to eventually create a non-profit "Metamath Association' whose
purpose would be to support & encourage the formalization of
mathematics. It'd be paid by
dues, and it would do stuff like pay for DNS registration & website
fees. The idea would be to
enable this to outlast any of us. However, I doubt that it would be
flush, so I think we need to
limit our costs even if that's the eventual plan. Hopefully some
people here would be willing to join :-).

I intended to suggest that you should register some crypto wallet
shorthand names
https://unstoppabledomains.com/search?searchTerm=metamath.wallet&searchRef=home&tab=all
but somebody already registered them all.

P.

Cris Perdue

unread,
Sep 1, 2022, 8:07:50 PM9/1/22
to meta...@googlegroups.com
David and All,

I wrote that I would look at some options for CDNs a few weeks ago. I have been busier with other things than I expected, but I have indeed learned some things.

CloudFlare
========

Overall CloudFlare looks great. 

Their web control panel is quite pleasant to use, and as David says, they also have a free tier. Beyond that, they offer their "Pro" level of service at no charge for open source projects that they deem suitable, instead of the usual $20/month. (https://blog.cloudflare.com/cloudflare-open-source-your-upgrade-is-on-the-house/) Metamath is not a perfect fit for this program, but I could easily imagine it might be accepted. Even the free level might turn out to be very satisfactory, though "Pro" might be of some additional use.

I set up a small test site using their free service and studied the options available. Cloudfront would probably be more than adequate just "out of the box" using their free tier.  That alone would automatically give Metamath support for HTTPS without any need for anyone associated with Metamath (David!) to manage SSL certificates. In that setup the origin server would serve plain old HTTP, but users would see HTTPS through CloudFlare.

Beyond that -- given that the Metamath site is entirely(?) static content (served entirely from text files of HTML)  their option to "cache everything" looks like it might be very suitable for Metamath as long as the Cloudflare distributed cache is purged whenever a new copy of the Metamath site is released. Alternatively, their caching can be set to be responsive to "Cache-control" headers sent by the origin web server, but I see that us.metamath.org does not currently send these response headers. Notably, Cloudflare by default does not cache HTML unless this option is turned on, though it will by default cache css, js, images, etc.. Some relevant documentation of all of this is at https://developers.cloudflare.com/cache/best-practices/customize-cache, and https://developers.cloudflare.com/cache/about/cache-control. Netlify has an article at https://www.netlify.com/blog/2017/02/23/better-living-through-caching/ explaining the Cache-control headers they send in responses and why, should anyone be interested. But there may be little or no advantage to this for Metamath if using Cloudflare.

I also started a subscription to their load balancing and tried it out. This service costs an additional $5/month regardless of whether using the "Free" or "Pro" plan. It was not overly hard to set up, and allows users on the "Free" plan to set up two origin web servers, with failover if a server goes down. Traffic to the servers can be "steered" by geographic region as well. Health checks can be set up every 60 seconds, and Cloudflare can be set to stop sending traffic to a down server after a single health check failure.  Experiment showed that this worked properly, leaving only a window of up to 60 seconds where a bad server might be visible to users. They can also send you email when server health status changes.

I did not notice any potential gotchas along the way.

AWS CloudFront
=============

I investigated a similar setup with Amazon's CloudFront, which also has a "free tier" (https://aws.amazon.com/cloudfront/pricing/?loc=ft#AWS_Free_Usage_Tier). Cloudfront has plenty of capability, and I think a configuration would not be terribly hard to manage once it is set up; however I found their documentation and instructions for setup distinctly unhelpful overall. Some attempts at friendliness are made in places, but mostly you have to figure it out on your own. Diagnostics were also woeful.

CloudFront can support load balancing with the free tier at very low prices. My guess is that it would be less than Cloudflare's $5 per month, but that would not come close to making up for the extra hair-pulling involved. (Their charges are based on usage.)

Anything else?
===========

I didn't notice any other similar service with a genuine free tier, and it is not clear to me why Cloudflare would not be just fine.

David, it might be interesting if you could give a ballpark estimate of the Web service bandwidth per month typically used by us.metamath.org
Any thoughts on this?

Best,
Cris

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages