id collisions in html output

Jeremy Sylvestre

Sep 19, 2025, 1:35:33 AM
to PreTeXt development
This may get chalked up to "author error" but I feel like it's too easy for an author to unwittingly cause id collisions in their html output.

See live example here:

The source for this page of the live example starts off with

<example xml:id="test-example">
  <p>...</p>
  <p>...</p>
</example>

<example xml:id="test-example-2">
  <p>...</p>
</example>

In the HTML output, each of the two <p> in the first example gets a PreTeXt-generated id that is simply the xml:id of the parent <example> with a hyphen and a counter appended. In particular, the second <p> of the first example is transformed into

<div class="para" id="test-example-2">

But the second example gets transformed into

<article class="example example-like" id="test-example-2">

and now we have clashing ids. You can see the effect of this clash if you scroll all the way to the bottom of the linked page in the live example and click the xref you find there.

Of course, if only the author had been more logical in choosing their xml:id for the first example, say

xml:id="test-example-1"

then this clash would have been avoided, because then the two <p> in that example would have been transformed into

<div class="para" id="test-example-1-1">
<div class="para" id="test-example-1-2">

But probably best not to bet on authors being perfectly logical at all times...
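
The clash can be reproduced in a few lines. This is only a minimal sketch, assuming the rule described above (a child's id is the parent's xml:id, a hyphen, and a 1-based counter); `auto_child_ids`, `html_ids`, and `duplicates` are hypothetical helpers for illustration, not PreTeXt's actual code.

```python
from collections import Counter


def auto_child_ids(parent_id, n_children):
    """Hypothetical model of the scheme: parent id + '-' + 1-based counter."""
    return [f"{parent_id}-{i}" for i in range(1, n_children + 1)]


def html_ids(examples):
    """examples: list of (authored xml:id, number of <p> children)."""
    ids = []
    for xml_id, n_paras in examples:
        ids.append(xml_id)                            # id of the <article>
        ids.extend(auto_child_ids(xml_id, n_paras))   # ids of its paragraphs
    return ids


def duplicates(ids):
    """Return every id that occurs more than once, sorted."""
    return sorted(i for i, count in Counter(ids).items() if count > 1)


clashing = duplicates(html_ids([("test-example", 2), ("test-example-2", 1)]))
safe = duplicates(html_ids([("test-example-1", 2), ("test-example-2", 1)]))
print(clashing)  # ['test-example-2']
print(safe)      # []
```

The first call models the source from the live example; the second models the "more logical" choice of xml:id="test-example-1".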

Cheers,
Jeremy S

Rob Beezer

Sep 19, 2025, 4:25:43 AM
to prete...@googlegroups.com
Thanks, Jeremy. Definitely not author error. BDFL error. But not unwittingly.

I was aware of this potential when I last redesigned the automatic IDs. I never did concoct a proof of the sort of failure you show here; I only assumed it would be rare. Generally I would suggest that using numbers in an xml:id or label is not a good practice.

What can we do about it?

Ban numbers in authored identifiers? (No.)

Make automatic numbers into letters instead (a,b,c)? (Hmmm.)

Use a new character to separate automatic numbers, since we currently only allow hyphen and underscore (iirc)? Asterisk or pipe or ...? (Trivial to implement, but too weird?)

Rob

Jeremy Sylvestre

Sep 19, 2025, 10:12:20 AM
to prete...@googlegroups.com
On Fri, 19 Sept 2025 at 02:25, 'Rob Beezer' via PreTeXt development <prete...@googlegroups.com> wrote:

> What can we do about it?

Hmm, this probably requires more careful thought than just off-the-top-of-the-head ideas, so hopefully some people who are wiser than me can chime in. But here are a few off-the-top-of-my-head ideas anyway.

> Use a new character to separate automatic numbers, since we currently only allow hyphen and underscore (iirc)? Asterisk or pipe or ...? (Trivial to implement, but too weird?)

1. Separate the stem from the counter suffix with something less likely than a single hyphen? Like multiple hyphens?

id="parent-element---1-2-3"

or maybe

id="parent-element-+-1-2-3"

2. Surround the added-on-counter suffix with delimiters?

Maybe

id="parent-element[1-2-3]"

or

id="parent-element__1-2-3__"


or make it a little more "rare" with an extra non-hyphen stem-counter separator

id="parent-element+[1-2-3]"

or

id="parent-element__[1-2-3]"

3. I think this idea is almost guaranteed to avoid collisions, but I'm not sure if having typeable URLs is something desirable. But if you throw a non-keyboard unicode character in there somewhere, it's almost certain an author won't use that in their source for an id or label. How about separating the suffix with a single em-dash?

id="parent-element—1-2-3"
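
The proposals above differ in whether an authored id could ever reproduce the generated one. The following is a rough sketch, assuming authored ids are limited to letters, digits, hyphen, and underscore; the `AUTHORED_ID` pattern and the helper name are illustrative assumptions, not part of PreTeXt. It says nothing about whether a given character is otherwise convenient in HTML ids or URLs.

```python
import re

# Assumption: authored ids may contain only letters, digits, hyphen, underscore.
AUTHORED_ID = re.compile(r"[A-Za-z0-9_-]+")


def authored_id_could_match(generated_id):
    """True if an author could legally type this exact string as an xml:id."""
    return AUTHORED_ID.fullmatch(generated_id) is not None


stem, counter = "parent-element", "1-2-3"
proposals = {
    "single hyphen": f"{stem}-{counter}",
    "triple hyphen": f"{stem}---{counter}",
    "hyphen-plus-hyphen": f"{stem}-+-{counter}",
    "square brackets": f"{stem}[{counter}]",
    "double underscores": f"{stem}__{counter}__",
    "plus and brackets": f"{stem}+[{counter}]",
    "em dash": f"{stem}\u2014{counter}",
}

results = {name: authored_id_could_match(gid) for name, gid in proposals.items()}
for name, gid in proposals.items():
    verdict = "could still collide" if results[name] else "cannot collide"
    print(f"{name:20} {gid:28} {verdict}")
```

Under that assumption, the hyphen-only and underscore-only variants merely make a collision less likely, while any variant containing a character outside the authored alphabet rules it out.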


Charilaos Skiadas

Sep 19, 2025, 10:23:04 AM
to prete...@googlegroups.com
Personally I think I would be happy with just a warning during build time that such a collision occurs, so I can change my xml:ids, rather than making the auto-numbering system more complex. Can we somehow keep track of the generated ids, and issue a warning when two different elements end up with the same id? I suspect it’s not as simple as what we do now with avoiding duplicate xml:ids.
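
One way such a check could look, as a post-build pass over a generated page rather than anything inside the stylesheets: the sketch below uses Python's standard `html.parser` to collect every id attribute and report the ones used more than once. `IdCollector` and `find_id_collisions` are hypothetical names, and the sample HTML is a hand-written approximation of the output described earlier in the thread, not real PreTeXt output.

```python
from collections import defaultdict
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record every id attribute together with the tag that carries it."""

    def __init__(self):
        super().__init__()
        self.seen = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.seen[value].append(tag)


def find_id_collisions(html_text):
    """Map each id used more than once to the tags that use it."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    return {i: tags for i, tags in collector.seen.items() if len(tags) > 1}


# Hand-written approximation of the page from the live example.
sample = """
<article class="example example-like" id="test-example">
  <div class="para" id="test-example-1"></div>
  <div class="para" id="test-example-2"></div>
</article>
<article class="example example-like" id="test-example-2">
  <div class="para" id="test-example-2-1"></div>
</article>
"""

collisions = find_id_collisions(sample)
for dup_id, tags in collisions.items():
    print(f"warning: id '{dup_id}' is used by {len(tags)} elements: {', '.join(tags)}")
```

A check like this only sees one page at a time; catching the collision during the build itself would need the same bookkeeping wherever the ids are generated.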


Charilaos Skiadas
Department of Mathematics
Hanover College



David W. Farmer

Sep 19, 2025, 10:24:27 AM
to prete...@googlegroups.com

I don't say we should ban authors from putting numbers in their ids,
but I think it is a bad and non-PreTeXtY idea which should be sharply
discouraged.

Especially in the case (sorry Jeremy!) where it is an ordinal number
indicating the place that item occurs in a sequence of other items.
That is just asking for trouble, since it becomes anti-semantic if
you add to or rearrange those items.

And please don't fix this "problem" by adding more characters to
the auto-generated id. That practically demands banning (advising)
authors from using some other strings in their ids. If you are going
to do that, then do the right thing and advise against numbers.

Regards,

David

Alex Jordan

Sep 19, 2025, 10:32:11 AM
to prete...@googlegroups.com
According to the docs for xml:ids, you are only supposed to use alphanumeric characters, underscores, and hyphens in them. (This came up recently somewhere else, I forget where, and that's when I first became aware of this.)

If that were enforced, lots of keyboard characters would be available for PTX to use. However, some are problematic. A colon might be nice, for example, except then we would have to put ./ before every relative URL in a link.
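
Enforcing that rule on the source could be as simple as the sketch below, which walks an XML document with the standard `xml.etree.ElementTree` module and lists every xml:id that uses a character outside the allowed set. The `ALLOWED` pattern is an assumed reading of the documented rule, and `invalid_xml_ids` and the sample source are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the fixed XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed reading of the rule: letters, digits, underscore, hyphen only.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def invalid_xml_ids(source_text):
    """Return authored xml:id values that break the assumed character rule."""
    root = ET.fromstring(source_text)
    bad = []
    for element in root.iter():
        value = element.get(XML_ID)
        if value is not None and not ALLOWED.fullmatch(value):
            bad.append(value)
    return bad


# Made-up source fragment; the third xml:id deliberately breaks the rule.
source = """
<section xml:id="sec-test">
  <example xml:id="test-example"><p>First.</p></example>
  <example xml:id="test-example-2"><p>Second.</p></example>
  <example xml:id="test.example+3"><p>Third.</p></example>
</section>
"""

bad = invalid_xml_ids(source)
print(bad)  # ['test.example+3']
```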

Jeremy Sylvestre

Sep 19, 2025, 10:55:44 AM
to prete...@googlegroups.com
On Fri, 19 Sept 2025 at 08:24, David W. Farmer <far...@aimath.org> wrote:

> I don't say we should ban authors from putting numbers in their ids,
> but I think it is a bad and non-PreTeXtY idea which should be sharply
> discouraged.
>
> Especially in the case (sorry Jeremy!) where it is an ordinal number
> indicating the place that item occurs in a sequence of other items.
> That is just asking for trouble, since it becomes anti-semantic if
> you add to or rearrange those items.


Unfortunately authors aren't always going to follow "best practice" and this particular "problem" leads to a mysterious bug (an xref that points to the wrong place) that is difficult for an author that is not familiar with pretext development to debug on their own. Even more likely, the bug probably won't be noticed by the author themselves. (Does anyone go through and click every single xref in their output to make sure it's functioning correctly? This bug has apparently been present in the output of one of my projects for years.) But the bug will cause issues and confusion for students.

But also:

1. Numbers *can* be semantic (fundamental-theorem-of-calculus-part-2).

2. Even the order of items *can* be semantic. Perhaps a second example is a continuation/modification of the first example, and the author is certain at the time of authoring that the order of those two items will *never* be rearranged.

3. My live example was contrived to demonstrate the issue, but is also representative of the type of thing authors might do when they unfortunately haven't stumbled across that one sentence in the docs where they are "sharply discouraged" from doing whatever it is you don't want them to do because you are against slightly modifying how ids are generated for some reason? Expecting authors to perfectly follow all best practices all the time is probably unrealistic.

4. Allowing something but then also "sharply discouraging" it doesn't really make sense to me.

kcri...@gmail.com

Sep 20, 2025, 8:48:07 AM
to PreTeXt development

> Unfortunately authors aren't always going to follow "best practice" and this particular "problem" leads to a mysterious bug (an xref that points to the wrong place) that is difficult for an author that is not familiar with pretext development to debug on their own. Even more likely, the bug probably won't be noticed by the author themselves. (Does anyone go through and click every single xref in their output to make sure it's functioning correctly? This bug has apparently been present in the output of one of my projects for years.) But the bug will cause issues and confusion for students.


I hope I may be allowed to thank you for these excellent examples, and add two small comments in support.

> But also:
>
> 1. Numbers *can* be semantic (fundamental-theorem-of-calculus-part-2).


Or "hensel-lemma-exercise-prime-5" and "hensel-lemma-exercise-prime-7" where the number here could refer to a modulus, and where requiring the author to type "seven" would actually hinder readability.

> 2. Even the order of items *can* be semantic. Perhaps a second example is a continuation/modification of the first example, and the author is certain at the time of authoring that the order of those two items will *never* be rearranged.

This is particularly true if it were to be a conversion of a pre-existing text where some items are "fixed in the tradition", as it were.  And unfortunately "exercise2" is less readable than "exercise-2".  

I also hope it is acceptable to suggest, as a possible direction, building something into the id that indicates the semantics, namely that it is autogenerated. E.g. "test-example-p-2" or "test-example-autoid-p-2" or something similar. (The former could run into similar problems as the initial example, of course.) That might even make it easier for authors to discover from their generated files exactly how things work when creating the html, though I'm sure there are other potential issues.