My github repository is
https://github.com/mbstoner53 .
I had to make my own branch of metamath.exe and edit the function
called getTextLongMath in the file mmwtex.c to get the spacing to look right for an extra symbol I defined in my own database. I notice even the comment in the file itself calls it a "kludge".
I think if more and more symbols are added to the
set.mm all of the extra spacing rules to make html look prettier should be stored in the database itself somehow rather than in the source file mmwtex.c. I don't know if anyone thinks this would be worth it. I have an idea on how to do it, but I'm not familiar enough with how the source code works to do it all alone.
Anyways my idea would be to place a list of regex expressions like...
regex quantifier as "^A\.$|^E\.$|^F/$|^A*$|E*|^A!$|^E$!";
regex spaceafter as "^[a-z]$";
regex spacebefore as "^[a-z]|[A-Z]|[0-9]$";
regex neg as "^\-\.$"
regex parenth as "^\($|^\)$"
and then a list of rules like...
rule spcafterqvar as "-0 spacebefore; -1 spaceafter; -2 quantifier;" " ";
rule spcbeforeneg as "-0 neg; -1 -parenth;" " ";
The minus is the matching token's offset from the token being parsed. The first rule says if you have a quantifier, then a set variable, then a set variable, class, or numeric constant, then put a " " between the latter two. A minus sign before the regex name means "not matching", so the second rule says if you have a logical negation symbol immediately after something that is *not* a parenthesis, then put a " " between the two.
To make it efficient, the each regex would be tested on each element in the array g_MathToken then stored ahead of time. Then the rules are checked in the function getTexLongMath by accessing pre-calculated binary arrays linked directly to token type.
I wouldn't know how to write a regex engine myself for c so I would hope there's one to borrow.