New markup checks

115 views
Skip to first unread message

Mario Carneiro

unread,
Feb 2, 2022, 9:39:54 PM2/2/22
to metamath
Hi All,

I've been working on a reimplementation of metamath.exe's VERIFY MARKUP feature for metamath-knife, you can see the highlights at https://github.com/david-a-wheeler/metamath-knife/pull/70 . It's already started catching some bugs that will be PR'd to set.mm once the checker stabilizes, but I wanted to bring specific attention to two rules which have produced hundreds of new errors on set.mm:

  • 23. The ~ and ` characters, when used as delimiters in markup, should have whitespace around them
  • 24. Characters [, ~ and ` must be doubled if they could not be interpreted as special in markup

  • Some of the violators have been highlighted at https://github.com/david-a-wheeler/metamath-knife/pull/70#issuecomment-1027601499 .

  • The question is: Should these rules be kept?

    * The use of space after ~ was used to make it easier to find labels in comments for find/replace activities, and I think there are only a small handful of violators.
  • * Space *before* ~ seems to have a lot of violators, so unless we like the look of it I'm okay with relaxing this restriction.
  • * Space after the initial ` and before the final ` in a math string has no violators at a glance, I think VERIFY MARKUP was already checking this.
    * Space before the initial ` and after the final ` in a math string has lots of violators for things like (` e. `). Metamath will strip the surrounding spaces if it is a closing punctuation, but there are also examples like "I` /\ `2" in the description of natural deduction rules where the added spaces would be preserved, which is undesirable in this case (we want it to read I∧2). Using unicode html entities would be an alternative way to handle this.
  • * Non-doubled ~ inside a URL was already being warned against in metamath.exe, although arguably it's fine since it doesn't do anything and is interpreted literally.
  • * Non-doubled [ has lots of violators in set.mm. There are examples that seem to suggest that people are relying on [ getting printed literally as long as the ] comes after a space. Something like [foo] has to be written as [[foo] because otherwise it would be interpreted as a bibliography tag, but [foo bar] can't be a tag and is interpreted literally; this is the case that triggers the new warning.
  • * In label/url mode, non-doubled ~ is already warned by metamath.exe and there are no violators, but non-doubled [ interacts weirdly with bibliography tags, and I think rejecting these is the right thing to do (there are no violators).
  • * In bibliography references, there are no escapes but ~ and ` break metamath.exe, so there are no violators.

  • Some more fun things:
    * ~[ab] gets interpreted as *both* a bibliography tag and a label reference, resulting in a broken link to the statement [ and a link to the bibliography tag "ab". Currently mm-knife does the next best thing which is to interpret it as an empty label reference followed by a bibliography tag.
  • * ` alone in a comment starts math mode and doesn't end it, so you get a warning about "</TD></TR></TABLE></CENTER>" not having an html definition. Naturally, you can therefore add a token called </TD></TR></TABLE></CENTER> and give it htmldefs and it will pass VERIFY MARKUP, and the resulting HTML file will look just a bit off.

Mario Carneiro

unread,
Feb 2, 2022, 9:44:11 PM2/2/22
to metamath
... I suppose it's apropos that a post about bugs in text->html autoformatting would itself have buggy formatting. Please ignore the non-ascii list item formatting.

Mázsa Péter

unread,
Feb 3, 2022, 2:45:05 AM2/3/22
to meta...@googlegroups.com
Hi Mario,


I have 3 questions:

1. You see
( ~ elv , ~ el2v and ~ el3v theorems)
here: http://us2.metamath.org/mpeuni/elv.html
In /develop, you find
( ~ elv , ~ el2v* and ~ el3v* theorems)
Is it possible to refer to a set of theorems with one string? I mean,
you understand that ~ el3v* means all theorems beginning with el3v
(i.e. ~ el3v ~ el3v1 ~ el3v2 ~ el3v3 ~ el3v12 ~ el3v13 and ~ el3v23 ),
but can you express this in a way that shows ( ~ elv , ~ el2v* and ~
el3v* theorems) here http://us2.metamath.org/mpeuni/elv.html as well?
Or should I write this: ( ~ elv , and the theorems beginning with `
el2v ` or ` el3v ` ) ? (Or is it solved already just didn't show up at
us2.metamat.org yet?)

2. You see
Inference forms (with ) of the general theorems (with ) may be superfluous.
here: http://us2.metamath.org/mpeuni/elv.html
In /develop, you find
Inference forms (with ` $e |- A e. _V $. ` ) of the general theorems
(with ` $p |- ( A e. V -> ` ) may be superfluous.
How should I express what I want?

3. Reference to the ` $e |- A e. _V $. ` stings in set.mm (cf. above)
are verified by the verify markup feature. Sometimes, in the notes
parts of other theorems, I want to refer to longer strings in set.mm ,
but verify markup doesn't allow it. Is it a way to do it?


Thank you:
Peter

On 2/3/22, Mario Carneiro <di....@gmail.com> wrote:
> ... I suppose it's apropos that a post about bugs in text->html
> autoformatting would itself have buggy formatting. Please ignore the
> non-ascii list item formatting.
>
> On Wed, Feb 2, 2022 at 9:39 PM Mario Carneiro <di....@gmail.com> wrote:
>
>> Hi All,
>>
>> I've been working on a reimplementation of metamath.exe's VERIFY MARKUP
>> feature for metamath-knife, you can see the highlights at
>> https://github.com/david-a-wheeler/metamath-knife/pull/70 . It's already
>> started catching some bugs that will be PR'd to set.mm once the checker
>> stabilizes, but I wanted to bring specific attention to two rules which
>> have produced hundreds of new errors on set.mm:
>>
>>
>> - 23. The ~ and ` characters, when used as delimiters in markup,
>> should have whitespace around them
>> - 24. Characters [, ~ and ` must be doubled if they could not be
>> interpreted as special in markup
>>
>> - Some of the violators have been highlighted at
>>
>> https://github.com/david-a-wheeler/metamath-knife/pull/70#issuecomment-1027601499
>> .
>> -
>> - The question is: Should these rules be kept?
>>
>> * The use of space after ~ was used to make it easier to find labels
>> in comments for find/replace activities, and I think there are only a
>> small
>> handful of violators.
>> - * Space *before* ~ seems to have a lot of violators, so unless we
>> like the look of it I'm okay with relaxing this restriction.
>> - * Space after the initial ` and before the final ` in a math string
>> has no violators at a glance, I think VERIFY MARKUP was already
>> checking
>> this.
>> * Space before the initial ` and after the final ` in a math string
>> has lots of violators for things like (` e. `). Metamath will strip
>> the
>> surrounding spaces if it is a closing punctuation, but there are also
>> examples like "I` /\ `2" in the description of natural deduction rules
>> where the added spaces would be preserved, which is undesirable in
>> this
>> case (we want it to read I∧2). Using unicode html entities would be an
>> alternative way to handle this.
>> - * Non-doubled ~ inside a URL was already being warned against in
>> metamath.exe, although arguably it's fine since it doesn't do anything
>> and
>> is interpreted literally.
>> - * Non-doubled [ has lots of violators in set.mm. There are examples
>> that seem to suggest that people are relying on [ getting printed
>> literally
>> as long as the ] comes after a space. Something like [foo] has to be
>> written as [[foo] because otherwise it would be interpreted as a
>> bibliography tag, but [foo bar] can't be a tag and is interpreted
>> literally; this is the case that triggers the new warning.
>> - * In label/url mode, non-doubled ~ is already warned by metamath.exe
>> and there are no violators, but non-doubled [ interacts weirdly with
>> bibliography tags, and I think rejecting these is the right thing to
>> do
>> (there are no violators).
>> - * In bibliography references, there are no escapes but ~ and ` break
>> metamath.exe, so there are no violators.
>>
>> - Some more fun things:
>> * ~[ab] gets interpreted as *both* a bibliography tag and a label
>> reference, resulting in a broken link to the statement [ and a link to
>> the
>> bibliography tag "ab". Currently mm-knife does the next best thing
>> which is
>> to interpret it as an empty label reference followed by a bibliography
>> tag.
>> - * ` alone in a comment starts math mode and doesn't end it, so you
>> get a warning about "</TD></TR></TABLE></CENTER>" not having an html
>> definition. Naturally, you can therefore add a token called
>> </TD></TR></TABLE></CENTER> and give it htmldefs and it will pass
>> VERIFY
>> MARKUP, and the resulting HTML file will look just a bit off.
>>
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Metamath" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to metamath+u...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/metamath/CAFXXJStewJedDRcyp_GnCH2NO4zSX4A__eKMF620a%3DvdAA56kg%40mail.gmail.com.
>

Mario Carneiro

unread,
Feb 3, 2022, 3:07:54 AM2/3/22
to metamath
On Thu, Feb 3, 2022 at 2:45 AM Mázsa Péter <pe...@mazsa.com> wrote:
Hi Mario,


I have 3 questions:

1. You see
( ~ elv , ~ el2v and ~ el3v theorems)
here: http://us2.metamath.org/mpeuni/elv.html
In /develop, you find
( ~ elv , ~ el2v* and ~ el3v* theorems)

I see this in set.mm:

  ${
    elv.1 $e |- ( x e. _V -> ph ) $.
    $( New way ( ~ elv , ~ el2v theorems and ~ el3v theorems) to shorten some
       proofs.  Inference forms (with ` $e |- A e. _V $. ` ) of the general
       theorems (with ` $p |- ( A e. V -> ` ) may be superfluous.  (Contributed
       by Peter Mazsa, 13-Oct-2018.) $)
    elv $p |- ph $=
      ( cv cvv wcel vex ax-mp ) BDEFABGCH $.
  $}

I didn't find this in my local copy, I guess it's a new addition. If I had it would have shown up in my markup check because $e and $p are not tokens.

Is it possible to refer to a set of theorems with one string? I mean,
you understand that ~ el3v* means all theorems beginning with el3v
(i.e. ~ el3v ~ el3v1 ~ el3v2 ~ el3v3 ~ el3v12 ~ el3v13 and ~ el3v23 ),
but can you express this in a way that shows ( ~ elv , ~ el2v* and ~
el3v* theorems) here http://us2.metamath.org/mpeuni/elv.html as well?
Or should I write this: ( ~ elv , and the theorems beginning with `
el2v ` or ` el3v ` ) ? (Or is it solved already just didn't show up at
us2.metamat.org yet?)

I would not recommend using stars after or within labels, at least if you use ~ to auto-link them. What do you expect to happen? It's an HTML link, it can't link to a whole collection of things even if it could determine what collection you are talking about. Alternatives include linking to all of them, linking to one or two "representative" examples with some text to indicate that there is more, or just using the glob notation and linking nothing.

2. You see
Inference forms (with ) of the general theorems (with ) may be superfluous.
here: http://us2.metamath.org/mpeuni/elv.html
In /develop, you find
Inference forms (with ` $e |- A e. _V $. ` ) of the general theorems
(with ` $p |- ( A e. V -> ` ) may be superfluous.
How should I express what I want?

What *do* you want? You can just leave off the ticks and they will be printed in regular text, or you can use quotes (and they will still be printed in regular text), or you can use <HTML><PRE></PRE></HTML> to render them in monospace font.
 
3. Reference to the ` $e |- A e. _V $. ` stings in set.mm (cf. above)
are verified by the verify markup feature. Sometimes, in the notes
parts of other theorems, I want to refer to longer strings in set.mm ,
but verify markup doesn't allow it. Is it a way to do it?

 I don't understand the question. Math strings allow you to render metamath token strings the same way as they would appear normally in proofs. They don't do any cross referencing. Label references can be used for cross referencing, and they only link to theorems, not subsections or other things like that, although you can use <HTML> to write arbitrary markup if you really want (although I think this was a bad idea and I would like to decrease use of <HTML> in favor of more structured markup).

Mario

Mario Carneiro

unread,
Feb 3, 2022, 3:16:43 AM2/3/22
to metamath
This actually made me curious because this should have been caught by the existing VERIFY MARKUP check already, yet http://us.metamath.org/mpeuni/elv.html has already made it to the develop branch and the web site. It looks like VERIFY MARKUP has been disabled in CI? @David what's going on here? Looks like it's been like this since december: https://github.com/metamath/set.mm/commit/b6d91985163285cb27ecfcc46c31e4e4791dc2ab .

Mario

Mázsa Péter

unread,
Feb 3, 2022, 5:28:10 AM2/3/22
to meta...@googlegroups.com
On 2/3/22, Mario Carneiro <di....@gmail.com> wrote:
The function of it is cross referencing by searchability. If you are
able to put searchable strings into the notes of theorems, they may be
not cross-references in html-way, but they are cross-references in a
way that you can search them in set.mm locally.
2 examples:
>> Inference forms (with ` $e |- A e. _V $. ` ) of the general theorems
>> (with ` $p |- ( A e. V -> ` ) may be superfluous.
You search for $e |- A e. _V $. locally and you see that these el*v*
theorems refer to 395 theorems in set.mm containing $e |- A e. _V $. .
Or:
$( Two ways of saying that two classes are disjoint (when ` C = (/) ` :
` ( ( A i^i B ) = (/) <-> ( B i^i A ) = (/) ) ` ). (Contributed by Peter
Mazsa, 22-Mar-2017.) $)
ineqcom $p |- ( ( A i^i B ) = C <-> ( B i^i A ) = C ) $=
( cin incom eqeq1i ) ABDBADCABEF $.
means that when you search for ( A i^i B ) = (/) in set.mm you will
find the more general ineqcom (rendered perfectly az
http://us.metamath.org/mpeuni/ineqcom.html ). The problem is that if
you put too much characters between ` and ` , the /rewrap feature is
not able to rewrapping it: how should I quote longer strings? And: if
you write e.g. ` u ( `' _E |` A ) x ` in the notes of a theorem, it is
not allowed by verify markup, because the second ` is interpreded as
the closing ` instead of the third one.
So what I ideally want is:
*a general method
*for quoting
*arbitrary strings
*arbitrary lenght
*for searchability in set.mm
*showing in html version
*verifying markup

Jim Kingdon

unread,
Feb 3, 2022, 10:48:13 AM2/3/22
to meta...@googlegroups.com

VERIFY MARKUP is being run on iset.mm and set.mm and is passing.

Go to the latest entry in the Actions tab ( https://github.com/metamath/set.mm/runs/5038215989?check_suite_focus=true ), follow the link from the duplication check (takes you to https://github.com/metamath/set.mm/runs/5032713820?check_suite_focus=true ) and open the "Run scripts/verify --top_date_skip --extra 'write bibliography mmbiblio.html' set.mm" entry to see:

MM> !VERIFY MARKUP * / TOP_DATE_SKIP

Checking statement label conventions...
Checking latexdef, htmldef, althtmldef...
Checking statement comments...
Checking section header comments...
That just affects nf.mm and the other ones other than set.mm and iset.mm.
--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.

Alexander van der Vekens

unread,
Feb 3, 2022, 12:20:50 PM2/3/22
to Metamath
Here my opinion on the topics discussed by Péter and Mario:
* I agree with Mario that stars should not be allowed after or within labels, at least if they are used with ~ to auto-link them. By the way, I also cannot find  ~ el2v* and ~ el3v* in the latest version of set.mm.
* ` $p |- ( A e. V -> `  should not be allowed because $e and $p are not tokens. Wouldn't $p ` |- ( A e. V -> ` be sufficient?
* quoted strings longer than a line can be split up. See, for example, ~ numclwwlk7. I think this is a sufficiently good solution.


Alexander van der Vekens

unread,
Feb 3, 2022, 12:53:25 PM2/3/22
to Metamath
And here my remarks on Mario's rules/checks:


  • * The use of space after ~ was used to make it easier to find labels in comments for find/replace activities, and I think there are only a small handful of violators.
This is OK as strict rule and fits to the conventions we have already documented in set.mm (space-separated tilde)

  • * Space *before* ~ seems to have a lot of violators, so unless we like the look of it I'm okay with relaxing this restriction.
This would also fit to the conventions we have already  documented in set.mm (space-separated tilde). I would propose to clean up set.mm and make it a strict rule.
  • * Space after the initial ` and before the final ` in a math string has no violators at a glance, I think VERIFY MARKUP was already checking this.
This rule should be kept (and checked).
  • * Space before the initial ` and after the final ` in a math string has lots of violators for things like (` e. `). Metamath will strip the surrounding spaces if it is a closing punctuation, but there are also examples like "I` /\ `2" in the description of natural deduction rules where the added spaces would be preserved, which is undesirable in this case (we want it to read I∧2). Using unicode html entities would be an alternative way to handle this.
I think such a rule would be too restrictive, and we should not enforce this.
  • * Non-doubled ~ inside a URL was already being warned against in metamath.exe, although arguably it's fine since it doesn't do anything and is interpreted literally.
There ARE real URLs which contain ~. In such URLs, ~~ must be used, see mmset.raw.html "http://www.math.miami.edu/~~ec/book". What exactly happens if this is written as "http://www.math.miami.edu/~ec/book"?
  • * Non-doubled [ has lots of violators in set.mm. There are examples that seem to suggest that people are relying on [ getting printed literally as long as the ] comes after a space. Something like [foo] has to be written as [[foo] because otherwise it would be interpreted as a bibliography tag, but [foo bar] can't be a tag and is interpreted literally; this is the case that triggers the new warning.
As far as I can remember there are strict rules for bibliographic references (no whitespaces, etc.) , and all other cases are  interpreted literally. We should stick with this and should not enforce the usage of [[ in these cases.
  • * In label/url mode, non-doubled ~ is already warned by metamath.exe and there are no violators, but non-doubled [ interacts weirdly with bibliography tags, and I think rejecting these is the right thing to do (there are no violators).
I cannot really imagine such a case, but to reject such constructs sounds fine.
  • * In bibliography references, there are no escapes but ~ and ` break metamath.exe, so there are no violators.
I do not really understand this case. Could you give an example?
  • Some more fun things:
    * ~[ab] gets interpreted as *both* a bibliography tag and a label reference, resulting in a broken link to the statement [ and a link to the bibliography tag "ab". Currently mm-knife does the next best thing which is to interpret it as an empty label reference followed by a bibliography tag.
Really funny, but this won't be possible in the future (if the strict space-separated tilde rule is enforced): it will have to be either "~ [ab]", then [ab] must be a label, or "~~[ab]", then [ab] must be a bibliography tag.
  • * ` alone in a comment starts math mode and doesn't end it, so you get a warning about "</TD></TR></TABLE></CENTER>" not having an html definition. Naturally, you can therefore add a token called </TD></TR></TABLE></CENTER> and give it htmldefs and it will pass VERIFY MARKUP, and the resulting HTML file will look just a bit off.
If this can be checked, it should not be allowed in the future.

David A. Wheeler

unread,
Feb 3, 2022, 2:14:12 PM2/3/22
to Metamath Mailing List
> It looks like VERIFY MARKUP has been disabled in CI? @David what's going on here? Looks like it's been like this since december: https://github.com/metamath/set.mm/commit/b6d91985163285cb27ecfcc46c31e4e4791dc2ab .

Markup checking happens for set.mm and iset.mm, but *not* for the other databases. At least, that's what the highlighted line is supposed to do :-).

--- David A. Wheeler

Mario Carneiro

unread,
Feb 3, 2022, 2:26:15 PM2/3/22
to metamath
On Thu, Feb 3, 2022 at 5:28 AM Mázsa Péter <pe...@mazsa.com> wrote:
>> 3. Reference to the ` $e |- A e. _V $. ` stings in set.mm (cf. above)
>> are verified by the verify markup feature. Sometimes, in the notes
>> parts of other theorems, I want to refer to longer strings in set.mm ,
>> but verify markup doesn't allow it. Is it a way to do it?
>>
>
>  I don't understand the question. Math strings allow you to render metamath
> token strings the same way as they would appear normally in proofs. They
> don't do any cross referencing. Label references can be used for cross
> referencing, and they only link to theorems, not subsections or other
> things like that, although you can use <HTML> to write arbitrary markup if
> you really want (although I think this was a bad idea and I would like to
> decrease use of <HTML> in favor of more structured markup).

The function of it is cross referencing by searchability. If you are
able to put searchable strings into the notes of theorems, they may be
not cross-references in html-way, but they are cross-references in a
way that you can search them in set.mm locally.
2 examples:
>> Inference forms (with ` $e |- A e. _V $. ` ) of the general theorems
>> (with ` $p |- ( A e. V -> ` ) may be superfluous.
You search for $e |- A e. _V $. locally and you see that these el*v*
theorems refer to 395 theorems in set.mm containing $e |- A e. _V $. .

Actually, I usually use searches for $e and $p when I want to *not* get any comment hits. For example searching for " label $p" is a very reliable way to jump to the definition of "label" via text search. For the purpose of this particular comment I think it suffices to just have ` |- A e. _V ` in the description.
 
Or:
  $( Two ways of saying that two classes are disjoint (when ` C = (/) ` :
     ` ( ( A i^i B ) = (/) <-> ( B i^i A ) = (/) ) ` ).  (Contributed by Peter
     Mazsa, 22-Mar-2017.) $)
  ineqcom $p |- ( ( A i^i B ) = C <-> ( B i^i A ) = C ) $=
    ( cin incom eqeq1i ) ABDBADCABEF $.
means that when you search for ( A i^i B ) = (/) in set.mm you will
find the more general ineqcom (rendered perfectly az
http://us.metamath.org/mpeuni/ineqcom.html ). The problem is that if
you put too much characters between ` and ` , the /rewrap feature is
not able to rewrapping it: how should I quote longer strings? And: if
you write e.g. ` u ( `' _E |` A ) x ` in the notes of a theorem, it is
not allowed by verify markup, because the second ` is interpreded as
the closing ` instead of the third one.

As mentioned, the fix is to split the math string by closing it on the first line and reopening it on the second line. The renderer will combine the strings.


On Thu, Feb 3, 2022 at 10:48 AM Jim Kingdon <kin...@panix.com> wrote:

VERIFY MARKUP is being run on iset.mm and set.mm and is passing.

On further investigation, it seems like VERIFY MARKUP has a bug that is preventing this error from being caught: the following test file

$c wff $. $v ph $.
wph $f wff ph $.
$( ` foo ` $)
a $p wff ph $= wph $.
$( $t $)

gives the error

?Warning: In the comment for statement "a", math symbol token "foo" does not
have a LaTeX and/or an HTML definition.

but replacing "foo" with "$e" yields no such error. In fact, this should probably be a separate error, since tokens can never contain $, although you will get the runaround if you try to quell this error: putting the htmldef "$e" as "..."; will silence this error but produce another error saying $e isn't a declared token, and $c $e $. is a syntax error.



On Thu, Feb 3, 2022 at 12:53 PM 'Alexander van der Vekens' via Metamath <meta...@googlegroups.com> wrote:
And here my remarks on Mario's rules/checks:
  • * Non-doubled ~ inside a URL was already being warned against in metamath.exe, although arguably it's fine since it doesn't do anything and is interpreted literally.
There ARE real URLs which contain ~. In such URLs, ~~ must be used, see mmset.raw.html "http://www.math.miami.edu/~~ec/book". What exactly happens if this is written as "http://www.math.miami.edu/~ec/book"?

It works, but metamath complains at you.
  • * Non-doubled [ has lots of violators in set.mm. There are examples that seem to suggest that people are relying on [ getting printed literally as long as the ] comes after a space. Something like [foo] has to be written as [[foo] because otherwise it would be interpreted as a bibliography tag, but [foo bar] can't be a tag and is interpreted literally; this is the case that triggers the new warning.
As far as I can remember there are strict rules for bibliographic references (no whitespaces, etc.) , and all other cases are  interpreted literally. We should stick with this and should not enforce the usage of [[ in these cases.

I'm inclined towards this as well. The rule is fairly simple: [<no whitespaces>] is a bib tag, anything else is not.
  • * In bibliography references, there are no escapes but ~ and ` break metamath.exe, so there are no violators.
I do not really understand this case. Could you give an example?

The question is what happens if you have a bib tag like [foo~bar`baz]. As I said, the rule is quite simple - no whitespace - and this permits a whole lot of escape characters, that somehow make their way through the nested regex substitutions that form metamath's markup renderer. In this case, it will complain about "baz]" not being a math token, meaning that it started math mode even though it's in a bib reference. If you close the string it even seems to work correctly: [foo`bar] ` is plaintext "[foo" followed by a math string consisting of the "bar]" token. (I guess I need to update the comment parser again...)

If you use [foo``bar], then you get an error saying

?Error: The bibliographic reference "[foo``bar]" in statement "a" was not found
as an <A NAME="foo``bar"></A> anchor in the file "mmset.html".

which indicates that it treated the foo``bar literally (without unescaping), but if you look at the generated HTML there will be [<A HREF="mmset.html#foo`bar">foo`bar</A>], so it does remove the escapes when actually printing it (so it's not possible to make a tag like this work correctly because it will either fail the VERIFY MARKUP check or it will be a broken link).

Similar things apply with the ~ escape character: using [foo~~bar] results in a search for the "foo~~bar" bib tag but the generated HTML uses "foo~bar" instead. If you use the ~ unescaped, you get some extra-strange errors:

$( [foo~bar] $)
a $p wff ph $= wph $.

?Error: The bibliographic reference "[foo~bar]" in statement "a" was not found
as an <A NAME="foo~bar"></A> anchor in the file "mmset.html".
?Warning: There is a "~" inside of a label in the comment of statement "a".
Use "~~" to escape "~" in an http reference.
?Warning: The label token "bar&quot;&gt;foo" (referenced in comment of
statement "a") is not a $a or $p statement label.

The first error looks fine, what you would expect for a nonexistent bibref, and indicates that "foo~bar" is the reference. The second error is weird because there is only one ~ in the text, so the part after it, "bar]", doesn't have any ~ in it. The generated HTML is also very broken:

[<A HREF="mmset.html#foo<FONT COLOR=blue >bar&quot;&gt;foo</FONT>&nbsp;<SPAN CLASS=r STYLE="color:#000000">(future)</SPAN>bar</A>]
 Notice that the <A> tag does not end before the <FONT> tag starts - that stuff is all inside the string literal. We can see something like an attempt to reference the nonexistent "bar&quot;&gt;foo" label in the middle of generating a bibref to "foo~bar".

This is why I think they should be banned entirely - metamath.exe does nothing sensible here.

  • Some more fun things:
    * ~[ab] gets interpreted as *both* a bibliography tag and a label reference, resulting in a broken link to the statement [ and a link to the bibliography tag "ab". Currently mm-knife does the next best thing which is to interpret it as an empty label reference followed by a bibliography tag.
Really funny, but this won't be possible in the future (if the strict space-separated tilde rule is enforced): it will have to be either "~ [ab]", then [ab] must be a label, or "~~[ab]", then [ab] must be a bibliography tag.

No, the space doesn't help. ~ [ab] is still a label and a bibref. The reason is that bibrefs have really high "priority" In the markup parser, as a result of their being handled in the first "regex pass" before looking for other things. (The fact that it is a bunch of passes is why you see various bits of bleeding implementation details like a search for "</TD></TR></TABLE></CENTER>" token.)
  • * ` alone in a comment starts math mode and doesn't end it, so you get a warning about "</TD></TR></TABLE></CENTER>" not having an html definition. Naturally, you can therefore add a token called </TD></TR></TABLE></CENTER> and give it htmldefs and it will pass VERIFY MARKUP, and the resulting HTML file will look just a bit off.
If this can be checked, it should not be allowed in the future.

It's certainly not allowed by the new parser, which is a proper parser. it sees an unclosed math string and warns on that; in HTML generation it auto-closes the string. No leakage.

Mázsa Péter

unread,
Feb 4, 2022, 4:34:52 AM2/4/22
to meta...@googlegroups.com
On 2/3/22, 'Alexander van der Vekens' via Metamath wrote:
> * I agree with Mario that stars should not be allowed after or within
> labels, at least if they are used with ~ to auto-link them.

Of course I agree as well. 4 options remained:
( ~ elv , and the theorems beginning with ` el2v ` or ` el3v ` )
( ~ elv , and the theorems beginning with el2v or el3v )
( ~ elv , ` el2v* ` and ` el3v* ` theorems)
( ~ elv , el2v* and el3v* theorems)
Perhaps the first is the best, I'll make experiments.

On 2/3/22, 'Alexander van der Vekens' via Metamath wrote:
> Wouldn't $p ` |- ( A e. V -> ` be sufficient?
On 2/3/22, Mario Carneiro <di....@gmail.com> wrote:
> For the purpose of
> this particular comment I think it suffices to just have ` |- A e. _V ` in
> the description.

I agree here as well, I will try this:
Inference forms (with ` |- A e. _V ` ) of the general theorems (with `
|- ( A e. V -> ` ) may be superfluous.

On 2/3/22, 'Alexander van der Vekens' via Metamath wrote:
> * quoted strings longer than a line can be split up. See, for example, ~
> numclwwlk7. I think this is a sufficiently good solution.
On 2/3/22, Mario Carneiro <di....@gmail.com> wrote:
> As mentioned, the fix is to split the math string by closing it on the
> first line and reopening it on the second line. The renderer will combine
> the strings.

This is perfect! I didn't know about this option. Thank you.


Peter

Mario Carneiro

unread,
Feb 4, 2022, 5:56:46 AM2/4/22
to metamath
FYI, I made some minor stylistic modifications to the el*v theorems in https://github.com/metamath/set.mm/pull/2472 ; you should use that as a base for modification to avoid conflicts.

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.

Mázsa Péter

unread,
Feb 4, 2022, 6:15:20 AM2/4/22
to meta...@googlegroups.com

Mario Carneiro

unread,
Feb 7, 2022, 4:06:44 AM2/7/22
to metamath
There is another matter that I can't seem to find any discussion about: Math markup strings in set.mm are exclusively single-line, even going to the extent of closing the string and reopening it on the next line if needed to wrap a line. But I can't find any evidence in metamath.exe of this behavior: VERIFY MARKUP doesn't care if you have newlines in a math string, and it displays just as you would expect. Maybe this is /REWRAP 's doing? Should the new markup checker enforce that math strings have no newlines, or allow them? Personally I don't see any reason other than historical momentum to disallow newlines in math strings.

Mario Carneiro

unread,
Feb 7, 2022, 4:50:51 AM2/7/22
to metamath
After playing with rewrap a bit, I think the reason for the restriction is that "write source /rewrap" will not attempt to break math strings, but it will remove newlines from math strings, meaning that if you have a long math string containing newlines it will be rewrapped into a long line, which then fails to pass verify markup.

This leaves the question of what to do, however. The rewrap command could reflow math tokens, but this almost always looks terrible, which is why it doesn't do this for regular statements unless they go over the line limit. A more reasonable approach would be to preserve any internal whitespace in a math string, to allow for manual indentation.

Alexander van der Vekens

unread,
Feb 7, 2022, 1:39:33 PM2/7/22
to Metamath
On Monday, February 7, 2022 at 10:50:51 AM UTC+1 di....@gmail.com wrote:
After playing with rewrap a bit, I think the reason for the restriction is that "write source /rewrap" will not attempt to break math strings, but it will remove newlines from math strings, meaning that if you have a long math string containing newlines it will be rewrapped into a long line, which then fails to pass verify markup.

This leaves the question of what to do, however. The rewrap command could reflow math tokens, but this almost always looks terrible, which is why it doesn't do this for regular statements unless they go over the line limit. A more reasonable approach would be to preserve any internal whitespace in a math string, to allow for manual indentation.
 
Do you want to modify the  "write source /rewrap" command? I am happy with its current behaviour, and to split math strings manually if they are too long seems to be sufficient.

Mario Carneiro

unread,
Feb 7, 2022, 9:45:30 PM2/7/22
to metamath
I was considering it. Or at least, deprecating it once we have an adequate replacement, which can then have a different behavior on long math strings.

If we're keeping the current behavior, I suppose that means that there should be a warning for newlines in a math string?

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.

Alexander van der Vekens

unread,
Feb 7, 2022, 11:57:39 PM2/7/22
to Metamath
On Tuesday, February 8, 2022 at 3:45:30 AM UTC+1 di....@gmail.com wrote:
I was considering it. Or at least, deprecating it once we have an adequate replacement, which can then have a different behavior on long math strings.

If we're keeping the current behavior, I suppose that means that there should be a warning for newlines in a math string?
 
To get a warning in this case would be great!

Benoit

unread,
Feb 9, 2022, 7:44:31 AM2/9/22
to Metamath
I just noticed that apparently, rewraps are not done within double quotes either ?  For instance, set.mm contains:
$( Major premise for the Aristotelian syllogism "Barbara", e.g.,
"All men are mortal". By convention, the major premise is first. $)

Note that the first line could be longer (rewrap could be after "men"). Also, the single space after 'mortal".' should be a double space. Or is it because this comment is not right before an assertion ($a or $p statement), and only these comments are reformatted/rewrapped ? If so, then shoud reformat/rewrap be extend to all comments, for more consistency ?

Benoît

Mario Carneiro

unread,
Feb 9, 2022, 8:15:03 AM2/9/22
to metamath
It's almost certainly the latter. Markup validation also only works on comments before a $a/$p statement, so I imagine using the markup-aware rewrapper on a comment that could contain arbitrary unvalidated text could do significant formatting damage.

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.

Mario Carneiro

unread,
Feb 9, 2022, 8:16:41 AM2/9/22
to metamath
In particular, the head comment (you know, the one that takes up the first 1/5 of the file) is one such comment, and rewrapping this comment would destroy the changelog.

Benoit

unread,
Feb 9, 2022, 8:27:59 AM2/9/22
to Metamath
Thanks. Agreed.

Mario Carneiro

unread,
Jun 9, 2022, 2:06:13 PM6/9/22
to metamath
I just noticed when rereading the metamath code for https://github.com/metamath/metamath-exe/pull/90 another behavior I haven't captured yet in the metamath-knife parser: In typesetting comments, you can use "foo" or 'bar' quoted strings as tokens, and these do not accept any escapes. *However*, it turns out that "a""b" does count as a single string (it must be two quotes in a row - "a" "a" is two tokens but "a""a" or 'a''a' is one string token). But counter to intuition, "a""a" is actually a string which contains two double quotes, i.e. <a""a>, and similarly 'a''a' encodes <a''a>. At least, this is the behavior I observed when trying this out in althtmldef, I didn't try all the commands to see if they act the same way.

This seems like such an obviously useless and incorrect behavior that I am inclined not to replicate it in metamath-knife. What do people think? The current metamath-knife behavior is to not allow any internal quotes in the string, so "a""a" is two adjacent string tokens.

Mario Carneiro

unread,
Jun 9, 2022, 2:14:16 PM6/9/22
to metamath
Oops, this is actually a bug, the very same bug that #90 is fixing. The doubled quote is supposed to be undoubled when interpreted, so "a""a" means <a"a> and 'a''a' means <a'a>, which seems like a lot more reasonable behavior. It still seems unnecessary though; is it worth supporting?

Benoit

unread,
Jun 9, 2022, 8:54:57 PM6/9/22
to Metamath
If you do not support this, then is there another way to obtain the strings <a'a> and <a"a> ?

Mario Carneiro

unread,
Jun 10, 2022, 1:02:46 PM6/10/22
to metamath
I added support for it in https://github.com/david-a-wheeler/metamath-knife/pull/91 . You can already obtain strings like <a'a> by using the other quote character - "a'a" - and if you need both ' and " you can sometimes use + to concatenate strings, e.g.

htmldef "foo" as "[']" + '["]';

would render foo as the string <[']["]>. However this + concatenation trick is not general purpose in $t comments, it is tokenized as a bunch of string literals and + keywords and it is up to the command to support this or not. Most of the $t commands support it but only in the second argument (after the "as"), and none of the $j commands support it. There isn't much demand for it in the other places though; it's not needed if you are only listing statement labels since these are not long enough to need line breaks and don't use quote characters anyway.

Reply all
Reply to author
Forward
0 new messages