
Bug#993638: libxml2: XHTML 1.0 validation is broken


Vincent Lefevre

Sep 3, 2021, 9:50:03 PM
Package: libxml2
Version: 2.9.12+dfsg-3
Severity: grave
Justification: renders package unusable

After the upgrade to 2.9.12+dfsg-3, XHTML 1.0 validation is broken.
There was no such issue with 2.9.10+dfsg-6.7.

Testcase:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>title</title></head>
<body><p>text</p></body>
</html>

$ xmllint --noout --loaddtd --valid test.html
error : xmlAddEntity: invalid redeclaration of predefined entity
error : xmlAddEntity: invalid redeclaration of predefined entity

-- System Information:
Debian Release: bookworm/sid
APT prefers unstable-debug
APT policy: (500, 'unstable-debug'), (500, 'stable-security'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-8-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=POSIX, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libxml2 depends on:
ii libc6 2.31-17
ii libicu67 67.1-7
ii liblzma5 5.2.5-2
ii zlib1g 1:1.2.11.dfsg-2

libxml2 recommends no packages.

libxml2 suggests no packages.

-- no debconf information

--
Vincent Lefèvre <vin...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Vincent Lefevre

Sep 12, 2021, 9:50:03 PM
Control: found -1 2.9.12+dfsg-4

On 2021-09-04 03:40:17 +0200, Vincent Lefevre wrote:
> After the upgrade to 2.9.12+dfsg-3, XHTML 1.0 validation is broken.
> There was no such issue with 2.9.10+dfsg-6.7.

Still broken in 2.9.12+dfsg-4.

Mattia Rizzolo

Sep 19, 2021, 1:20:05 PM
Control: tag -1 unreproducible

On Sat, Sep 04, 2021 at 03:40:17AM +0200, Vincent Lefevre wrote:
> After the upgrade to 2.9.12+dfsg-3, XHTML 1.0 validation is broken.
> There was no such issue with 2.9.10+dfsg-6.7.

Actually, I can't reproduce it.
And, honestly, I think that if it really didn't work I would have heard
quite a lot of noise by now.

> $ xmllint --noout --loaddtd --valid test.html
> error : xmlAddEntity: invalid redeclaration of predefined entity
> error : xmlAddEntity: invalid redeclaration of predefined entity

I can never manage to download DTDs from w3.org (how could you?!), so,
taking your testcase and a copy of the same DTD:

mattia@warren /tmp/tmp/xml % l
total 68
-rw-r--r-- 1 mattia mattia 260 Sep 19 19:02 test.html
-rw-r--r-- 1 mattia mattia 26450 Sep 6 2014 xhtml1-strict.dtd
-rw-r--r-- 1 mattia mattia 12055 Sep 6 2014 xhtml-lat1.ent
-rw-r--r-- 1 mattia mattia 4293 Sep 6 2014 xhtml-special.ent
-rw-r--r-- 1 mattia mattia 14167 Sep 6 2014 xhtml-symbol.ent
mattia@warren /tmp/tmp/xml % cat test.html
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>title</title></head>
<body><p>text</p></body>
</html>
mattia@warren /tmp/tmp/xml % xmllint --dtdvalid xhtml1-strict.dtd --nonet --noout test.html
I/O error : Attempt to load network entity http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
test.html:2: warning: failed to load external entity "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
^
mattia@warren /tmp/tmp/xml %

which looks good to me.


This is with the current 2.9.12+dfsg-4.

--
regards,
Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18 4D18 4B04 3FCD B944 4540
More about me: https://mapreri.org
Launchpad user: https://launchpad.net/~mapreri
Debian QA page: https://qa.debian.org/developer.php?login=mattia

Thorsten Glaser

Sep 19, 2021, 4:50:03 PM
On Sun, 19 Sep 2021, Vincent Lefevre wrote:

> I can see that xhtml1-strict.dtd is provided by the w3c-dtd-xhtml
> package.

Not quite.

https://packages.qa.debian.org/w/w3c-dtd-xhtml/news/20160107T183823Z.html

------------------- Reason -------------------
RoQA; superseded by w3c-sgml-lib
----------------------------------------------

That’s not entirely true, though:

* #826217 [n| | ] [w3c-sgml-lib] w3c-sgml-lib: XHTML 1.1 files missing
  Reported by: Thorsten Glaser <t...@mirbsd.de>; Date: Fri, 3 Jun 2016 11:21:02 UTC;
  Severity: normal; Filed 5 years and 109 days ago; Modified 5 years and 109 days ago;

It probably contains the ones for 1.0, but I found w3c-sgml-lib to
not be sufficient in many ways and now use local files only… which
means validating involves copying the file, changing the http link
in the DOCTYPE with a local file:// link, then validating… working
but suboptimal.
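
That copy-and-rewrite step could be scripted roughly as follows. This is only a sketch: point_doctype_at_local_dtd is a made-up name, the regular expression assumes a DOCTYPE with a double-quoted PUBLIC identifier followed by a double-quoted system identifier, and the file:// URI in the example is a placeholder.

```python
import re

# Captures everything up to the opening quote of the system identifier,
# and the closing quote, so that only the identifier itself is replaced.
DOCTYPE_SYSTEM_ID = re.compile(
    r'(<!DOCTYPE\s+\S+\s+PUBLIC\s+"[^"]*"\s+")[^"]*(")'
)


def point_doctype_at_local_dtd(document, local_uri):
    """Return document with the DOCTYPE system identifier set to local_uri."""
    return DOCTYPE_SYSTEM_ID.sub(
        lambda m: m.group(1) + local_uri + m.group(2), document, count=1
    )


original = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)
# Placeholder location of a local copy of the DTD.
rewritten = point_doctype_at_local_dtd(
    original, "file:///tmp/tmp/xml/xhtml1-strict.dtd"
)
print(rewritten)
```

The public identifier is left untouched, so a catalog-based setup would still resolve the same way; only the fallback location changes.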

bye,
//mirabilos
--
«MyISAM tables -will- get corrupted eventually. This is a fact of life. »
“mysql is about as much database as ms access” – “MSSQL at least descends
from a database” “it's a rebranded SyBase” “MySQL however was born from a
flatfile and went downhill from there” – “at least jetDB doesn’t claim to
be a database” (#nosec) ‣‣‣ Please let MySQL and MariaDB finally die!

Vincent Lefevre

Sep 19, 2021, 8:30:03 PM
On 2021-09-19 22:33:09 +0200, Thorsten Glaser wrote:
> It probably contains the ones for 1.0, but I found w3c-sgml-lib to
> not be sufficient in many ways and now use local files only…

which has always been the case, AFAIK. And the XHTML 1.0 related files
seem to be identical to the w3c-dtd-xhtml ones, except for comments
and spacing. For instance, there's the following change in the comment
of xhtml-lat1.ent:

Typical invocation:

<!ENTITY % xhtml-lat1
PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
- "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent" >
+ "xhtml-lat1.ent" >
%xhtml-lat1;

but /usr/share/xml/xhtml/schema/dtd/1.0/xhtml1-strict.dtd from
w3c-dtd-xhtml is using:

<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"xhtml-special.ent">
%HTMLspecial;

(which has never had any issue). So, this was probably an old
documentation bug (but it doesn't matter when one uses only
public identifiers and catalogs).

> which means validating involves copying the file, changing the http
> link in the DOCTYPE with a local file:// link, then validating…
> working but suboptimal.

Everything should be available with the public identifiers via
catalogs. Perhaps w3c-sgml-lib doesn't set the catalogs correctly.
For instance, with w3c-dtd-xhtml, /etc/xml/w3c-dtd-xhtml.xml
contains:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.0//EN"
"file:///usr/share/xml/schema/xml-core/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<delegatePublic publicIdStartString="-//W3C//DTD XHTML//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0 Transitional//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML Basic 1.0//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/basic/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.1//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/1.1/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0 Strict//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0 Frameset//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//ENTITIES Symbols for XHTML//EN" catalog="file:///usr/share/xml/entities/xhtml/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//ENTITIES Special for XHTML//EN" catalog="file:///usr/share/xml/entities/xhtml/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//ENTITIES Latin 1 for XHTML//EN" catalog="file:///usr/share/xml/entities/xhtml/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML Basic//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD HTML//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/catalog.xml"/>
</catalog>

and /usr/share/xml/entities/xhtml/catalog.xml contains:

[...]
<group prefer="public">
<!-- ISO latin 1 entity set for Extensible HTML (XML 1.0 format) -->
<public publicId="-//W3C//ENTITIES Latin 1 for XHTML//EN" uri="xhtml-lat1.ent"/>
<public publicId="-//W3C//ENTITIES Symbols for XHTML//EN" uri="xhtml-symbol.ent"/>
<public publicId="-//W3C//ENTITIES Special for XHTML//EN" uri="xhtml-special.ent"/>
</group>
[...]

so that libxml2 gets the right files only by using public identifiers.

Vincent Lefevre

Sep 19, 2021, 9:30:03 PM
On 2021-09-19 22:59:31 +0200, Mattia Rizzolo wrote:
> On Sun, Sep 19, 2021 at 09:45:19PM +0200, Vincent Lefevre wrote:
> > On 2021-09-19 19:15:54 +0200, Mattia Rizzolo wrote:
> > > I can never manage to download DTDs from w3.org (how could you?!), so,
> > > taking your testcase and a copy of the same DTD:
> >
> > The DTD is provided by Debian, no need to download it.
>
> But you need to instruct xmllint to use said DTD; it won't decide on its
> own to pick a random DTD from the filesystem.

No, this is not necessary with a correctly configured system.
This is not a random DTD, but the DTD mentioned in the HTML file,
which has the standard public identifier

"-//W3C//DTD XHTML 1.0 Strict//EN"

Then libxml2 can find the right file on the local file system via
catalogs. In my case (which is the *default* setup with Debian
packages on my system, i.e. I haven't changed anything about that
in /etc):

/etc/xml/catalog contains

<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0" catalog="file:///etc/xml/w3c-dtd-xhtml.xml"/>

so that libxml2 then uses /etc/xml/w3c-dtd-xhtml.xml, which contains

<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0 Strict//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml"/>

so that libxml2 then uses
/usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml, which contains

<public publicId="-//W3C//DTD XHTML 1.0 Strict//EN" uri="xhtml1-strict.dtd"/>

so that libxml2 gets the file

/usr/share/xml/xhtml/schema/dtd/1.0/xhtml1-strict.dtd

There is the same mechanism for the .ent files referenced
by xhtml1-strict.dtd, i.e. via public identifiers.
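
As a rough illustration of the delegation chain described above, here is a minimal Python sketch. It is not how libxml2 implements catalog resolution; it only models the delegatePublic and public entries quoted in this message, held in an in-memory dictionary instead of being read from /etc/xml. The names CATALOGS and resolve_public are made up for this example, and trying the longest matching prefix first is an assumption based on the OASIS XML Catalogs rules.

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

# Hypothetical in-memory copies of the three catalog entries quoted above,
# keyed by the URI of the catalog file they come from.
CATALOGS = {
    "file:///etc/xml/catalog": """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///etc/xml/w3c-dtd-xhtml.xml"/>
</catalog>""",
    "file:///etc/xml/w3c-dtd-xhtml.xml": """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0 Strict//EN"
                  catalog="file:///usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml"/>
</catalog>""",
    "file:///usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml": """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN" uri="xhtml1-strict.dtd"/>
</catalog>""",
}


def resolve_public(public_id, catalog_uri, catalogs=CATALOGS):
    """Follow public/delegatePublic entries; return a file URI or None."""
    text = catalogs.get(catalog_uri)
    if text is None:
        return None
    root = ET.fromstring(text)

    # An exact <public> match wins; its uri is assumed to be relative
    # to the directory of the catalog that contains it.
    for entry in root.iter(NS + "public"):
        if entry.get("publicId") == public_id:
            base = catalog_uri.rsplit("/", 1)[0]
            return base + "/" + entry.get("uri")

    # Otherwise try the delegated catalogs, longest matching prefix first.
    delegates = [
        e for e in root.iter(NS + "delegatePublic")
        if public_id.startswith(e.get("publicIdStartString"))
    ]
    delegates.sort(key=lambda e: len(e.get("publicIdStartString")), reverse=True)
    for entry in delegates:
        found = resolve_public(public_id, entry.get("catalog"), catalogs)
        if found is not None:
            return found
    return None


print(resolve_public("-//W3C//DTD XHTML 1.0 Strict//EN", "file:///etc/xml/catalog"))
```

With the three entries above, the lookup ends at the same xhtml1-strict.dtd path as in the walk-through, without the system identifier (the http:// URL) ever being consulted.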

> I also know how to
> use apt-file myself:
> | % apt-file search xhtml1-strict.dtd
> | dita-ot: /usr/share/dita-ot/demo/h2d/dtd/xhtml1-strict.dtd
> | erlang-erl-docgen: /usr/lib/erlang/lib/erl_docgen-1.1.1/priv/dtd/xhtml1-strict.dtd
> | kate5-data: /usr/share/katexmltools/xhtml1-strict.dtd.xml
> | libpxp-ocaml-dev: /usr/share/doc/libpxp-ocaml-dev/examples/namespaces/xhtml1-strict.dtd.gz
> | librdf-rdfa-parser-perl: /usr/share/perl5/auto/share/dist/RDF-RDFa-Parser/catalogue/www.w3.org/MarkUp/DTD/xhtml1-strict.dtd
> | w3-recs: /usr/share/doc/w3-recs/html/www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd.gz
> | w3c-sgml-lib: /usr/share/xml/w3c-sgml-lib/schema/dtd/REC-xhtml1-20020801/xhtml1-strict.dtd
> | xemacs21-basesupport: /usr/share/xemacs21/xemacs-packages/etc/psgml-dtds/xhtml1-strict.dtd
> | xmlcopyeditor: /usr/share/xmlcopyeditor/dtd/xhtml1-strict.dtd
> | %
>
> indeed the one I used is the one from xmlcopyeditor (I picked a random
> package, trusting that said .dtd is actually the same as all of the
> above).

The one I'm using is from w3c-dtd-xhtml, apparently no longer
available in Debian (my machine is a Debian/unstable one installed
about 5 years ago, and Debian won't replace the package by
w3c-sgml-lib automatically). In any case, the concerned files
from w3c-sgml-lib seem to be the same with minor differences.

> My system is fine. That error message is only a red herring due to
> --nonet,

Everything is on the local filesystem. There is no reason to do
any network access! If libxml2 tries to do a network access, this
means that something on your system is broken... perhaps catalogs
that are not set up correctly.

> and indeed the return code of xmllint is 0.

Don't look at the return code of xmllint; it is not reliable.
Even in case of bad usage, it will sometimes return 0:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=727075

Validation issues are reported on stderr, e.g. with a working libxml2:

$ xmllint --loaddtd --nonet --noout test.html
test.html:6: parser error : EndTag: '</' not found

^
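
Since the exit status alone cannot be trusted, a wrapper script can also treat any output on stderr as a failure. A minimal Python sketch; run_quiet_check is a hypothetical helper name, and the commented example call assumes that xmllint is installed and uses the options from this thread:

```python
import subprocess


def run_quiet_check(cmd):
    """Run cmd and report it as clean only if it exits with status 0
    and writes nothing (apart from whitespace) to stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    clean = proc.returncode == 0 and proc.stderr.strip() == ""
    return clean, proc.stderr


# Example call (not executed here, since it needs xmllint and a test file):
#   clean, errors = run_quiet_check(
#       ["xmllint", "--loaddtd", "--nonet", "--noout", "test.html"])
#   if not clean:
#       print(errors)
```

Checking both conditions means that a zero exit status accompanied by "error :" lines is still reported as a failure.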

> If you prefer, I can modify the DOCTYPE and do this instead, so there
> won't be "I/O error"s and the return code is clear:
>
> mattia@warren /tmp/tmp/xml % cat test.html
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "file:///tmp/tmp/xml/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head><title>title</title></head>
> <body><p>text</p></body>
> </html>
> mattia@warren /tmp/tmp/xml % xmllint --noout --nonet test.html ; echo $?
> 0

Wrong test. You forgot to load the DTD!

Please try:

xmllint --loaddtd --noout --nonet test.html

Note: you may also need to copy the 3 .ent files referenced by
the DTD in the same directory:

<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"xhtml-special.ent">
%HTMLspecial;

I have tried that:

$ ls -l /tmp/tmp/xml
total 68
-rw-r--r-- 1 vinc17 vinc17 13484 2012-04-24 22:49:16 xhtml-lat1.ent
-rw-r--r-- 1 vinc17 vinc17 4486 2012-04-24 22:49:16 xhtml-special.ent
-rw-r--r-- 1 vinc17 vinc17 13748 2012-04-24 22:49:16 xhtml-symbol.ent
-rw-r--r-- 1 vinc17 vinc17 25473 2012-04-24 22:49:15 xhtml1-strict.dtd

With libxml2 2.9.10+dfsg-6.7, strace shows that every file is loaded
from this directory, and I get no output, as expected.

But with libxml2 2.9.12+dfsg-4, I get:

$ xmllint --loaddtd --noout --nonet test.html
error : xmlAddEntity: invalid redeclaration of predefined entity
error : xmlAddEntity: invalid redeclaration of predefined entity

and strace still shows that every file is loaded from this directory.

Something interesting:

openat(AT_FDCWD, "/tmp/tmp/xml/xhtml-lat1.ent", O_RDONLY) = 5
lseek(5, 0, SEEK_CUR) = 0
read(5, "<!-- ..........................."..., 8192) = 8192
read(5, " \"&#215;\" ><!-- multiplication "..., 16384) = 5292
read(5, "", 11092) = 0
brk(0x559087649000) = 0x559087649000
close(5) = 0
[...]
openat(AT_FDCWD, "/tmp/tmp/xml/xhtml-symbol.ent", O_RDONLY) = 5
lseek(5, 0, SEEK_CUR) = 0
read(5, "<!-- ..........................."..., 8192) = 8192
read(5, " rArr can be used for 'impli"..., 16384) = 5556
read(5, "", 10828) = 0
close(5) = 0
[...]
openat(AT_FDCWD, "/tmp/tmp/xml/xhtml-special.ent", O_RDONLY) = 5
lseek(5, 0, SEEK_CUR) = 0
read(5, "<!-- ..........................."..., 8192) = 4486
read(5, "", 3706) = 0
write(2, "error : ", 8) = 8
write(2, "xmlAddEntity: invalid redeclarat"..., 57) = 57
write(2, "error : ", 8) = 8
write(2, "xmlAddEntity: invalid redeclarat"..., 57) = 57
close(5) = 0

So the issue seems to occur when reading xhtml-special.ent.

Hmm... there seems to be a subtle difference in xhtml-special.ent:

With the file from w3c-dtd-xhtml:

<!ENTITY quot "&#34;" ><!-- quotation mark = APL quote, U+0022 ISOnum -->
<!ENTITY amp "&#38;" ><!-- ampersand, U+0026 ISOnum -->
<!ENTITY lt "&#60;" ><!-- less-than sign, U+003C ISOnum -->
<!ENTITY gt "&#62;" ><!-- greater-than sign, U+003E ISOnum -->

But with the file from w3c-sgml-lib:

<!ENTITY lt "&#38;#60;" ><!-- less-than sign, U+003C ISOnum -->
<!ENTITY gt "&#62;" ><!-- greater-than sign, U+003E ISOnum -->
<!ENTITY amp "&#38;#38;" ><!-- ampersand, U+0026 ISOnum -->
<!ENTITY apos "&#39;" ><!-- The Apostrophe (Apostrophe Quote, APL Quote), U+0027 ISOnum -->
<!ENTITY quot "&#34;" ><!-- quotation mark (Quote Double), U+0022 ISOnum -->

The errors correspond to amp and lt.

Now, I don't know whether the new libxml2 version is too picky,
or there was a real issue with the old entity files (ignored
by all parsers until now?). In the latter case, I think that
there should be a Breaks against w3c-dtd-xhtml.
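
For reference, section 4.6 of the XML 1.0 specification says that if lt or amp are declared, the double escaping is required, while for gt, apos and quot it is optional. The following Python sketch applies that rule to the two sets of declarations quoted above. It is only an illustration and not libxml2's actual check; the helper names are invented here, and the regular expression only handles double-quoted literals like the ones shown.

```python
import re

PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}
MUST_DOUBLE_ESCAPE = {"lt", "amp"}

ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)"\s*>')
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def expand_char_refs(text):
    """Expand character references once, as done when a literal is read."""
    def repl(match):
        ref = match.group(1)
        return chr(int(ref[1:], 16) if ref.startswith("x") else int(ref))
    return CHAR_REF.sub(repl, text)


def invalid_predefined(declarations):
    """Return names of predefined entities declared against section 4.6."""
    bad = []
    for name, literal in ENTITY_DECL.findall(declarations):
        if name not in PREDEFINED:
            continue
        char = PREDEFINED[name]
        replacement = expand_char_refs(literal)
        # Double-escaped form: the replacement text is still a character
        # reference, and that reference denotes the escaped character.
        is_ref = (CHAR_REF.fullmatch(replacement) is not None
                  and expand_char_refs(replacement) == char)
        # Single-escaped form: only allowed for gt, apos and quot.
        is_bare = replacement == char and name not in MUST_DOUBLE_ESCAPE
        if not (is_ref or is_bare):
            bad.append(name)
    return bad


W3C_DTD_XHTML = '''
<!ENTITY quot "&#34;" >
<!ENTITY amp "&#38;" >
<!ENTITY lt "&#60;" >
<!ENTITY gt "&#62;" >
'''

W3C_SGML_LIB = '''
<!ENTITY lt "&#38;#60;" >
<!ENTITY gt "&#62;" >
<!ENTITY amp "&#38;#38;" >
<!ENTITY apos "&#39;" >
<!ENTITY quot "&#34;" >
'''

print(invalid_predefined(W3C_DTD_XHTML))  # ['amp', 'lt']
print(invalid_predefined(W3C_SGML_LIB))   # []
```

Under that reading of the specification, the sketch flags exactly amp and lt in the w3c-dtd-xhtml declarations and nothing in the w3c-sgml-lib ones, which matches the two errors reported by xmllint.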

One more thing: I've just checked on my Debian/stable machine,
which just has w3c-sgml-lib installed:
"xmllint --loaddtd --nonet --noout" works without any error.
Thus there should be no issue by switching w3c-dtd-xhtml to
w3c-sgml-lib.

Vincent Lefevre

Sep 19, 2021, 10:00:03 PM
Control: retitle -1 libxml2: XHTML 1.0 validation is broken with w3c-dtd-xhtml's xhtml-special.ent file
Control: tags -1 - unreproducible

This should be reproducible with w3c-dtd-xhtml's xhtml-special.ent file.
The summary of the actual issue is below.

On 2021-09-20 03:18:46 +0200, Vincent Lefevre wrote:
[...]
> So the issue seems to occur when reading xhtml-special.ent.
>
> Hmm... there seems to be a subtle difference in xhtml-special.ent:
>
> With the file from w3c-dtd-xhtml:
>
> <!ENTITY quot "&#34;" ><!-- quotation mark = APL quote, U+0022 ISOnum -->
> <!ENTITY amp "&#38;" ><!-- ampersand, U+0026 ISOnum -->
> <!ENTITY lt "&#60;" ><!-- less-than sign, U+003C ISOnum -->
> <!ENTITY gt "&#62;" ><!-- greater-than sign, U+003E ISOnum -->
>
> But with the file from w3c-sgml-lib:
>
> <!ENTITY lt "&#38;#60;" ><!-- less-than sign, U+003C ISOnum -->
> <!ENTITY gt "&#62;" ><!-- greater-than sign, U+003E ISOnum -->
> <!ENTITY amp "&#38;#38;" ><!-- ampersand, U+0026 ISOnum -->
> <!ENTITY apos "&#39;" ><!-- The Apostrophe (Apostrophe Quote, APL Quote), U+0027 ISOnum -->
> <!ENTITY quot "&#34;" ><!-- quotation mark (Quote Double), U+0022 ISOnum -->
>
> The errors correspond to amp and lt.
>
> Now, I don't know whether the new libxml2 version is too picky,
> or there was a real issue with the old entity files (ignored
> by all parsers until now?). In the latter case, I think that
> there should be a Breaks against w3c-dtd-xhtml.
>
> One more thing: I've just checked on my Debian/stable machine,
> which just has w3c-sgml-lib installed:
> "xmllint --loaddtd --nonet --noout" works without any error.
> Thus there should be no issue by switching w3c-dtd-xhtml to
> w3c-sgml-lib.

FYI, the change of xhtml-special.ent upstream seems to be in

https://github.com/w3c/markup-validator/commit/fa78ea2526fe20a89c90c4734f704fb0126186fd

(the diff output by git seems incorrect: one needs to browse the
files from the parent d1431fc to see the old version).

Vincent Lefevre

Sep 20, 2021, 5:50:03 AM
Concerning the change in the libxml2 code, I found this:

https://gitlab.gnome.org/GNOME/libxml2/-/commit/01411e7c5ea0fff181271e092f46a2138c3720ec
"Check for invalid redeclarations of predefined entities"

with the example of the incorrect

<!ENTITY lt "<">

which was in the old libxml2 testcases, BTW.

Thus this is intentional. But such a major change (since this breaks
official DTDs released in the past, something which should normally
*never* happen) should have at least been announced somewhere.
Otherwise one doesn't know what's going on (even a web search for the
error message led to nothing -- now, there's only my bug report...).

Now, I understand why there's nothing mentioned in the NEWS file,
which is a symlink to the changelog file: this file stops at
"v2.9.9: Jan 03 2019", while this version is 2.9.12.

The upstream release notes of libxml2 2.9.11

https://mail.gnome.org/archives/xml/2021-May/msg00000.html

contain:

- Check for invalid redeclarations of predefined entities (Nick Wellnhofer)

Note that this change is recent, so that most users (Debian or not)
have not upgraded yet. Whether the issue would be more visible once
most users have upgraded (in particular if the old DTDs have been
archived locally with the XML data), I don't know.

Mattia Rizzolo

Sep 20, 2021, 5:50:05 AM
On Mon, Sep 20, 2021 at 03:55:39AM +0200, Vincent Lefevre wrote:
> Control: retitle -1 libxml2: XHTML 1.0 validation is broken with w3c-dtd-xhtml's xhtml-special.ent file
>
> This should be reproducible with w3c-dtd-xhtml's xhtml-special.ent file.
> The summary of the actual issue is below.

Yes, indeed it is.

> > The errors correspond to amp and lt.
> >
> > Now, I don't know whether the new libxml2 version is too picky,
> > or there was a real issue with the old entity files (ignored
> > by all parsers until now?).

I bisected libxml2:

01411e7c5ea0fff181271e092f46a2138c3720ec is the first bad commit
commit 01411e7c5ea0fff181271e092f46a2138c3720ec
Author: Nick Wellnhofer <welln...@aevum.de>
Date: Mon Feb 8 20:58:32 2021 +0100

Check for invalid redeclarations of predefined entities

https://gitlab.gnome.org/GNOME/libxml2/-/commit/01411e7c5ea0fff181271e092f46a2138c3720ec

So libxml2 is clearly being pickier on purpose now, and it flags this
issue in the old DTD.

> > In the latter case, I think that
> > there should be a Breaks against w3c-dtd-xhtml.

On its way.



Thanks for your help in debugging this issue.

Mattia Rizzolo

Sep 20, 2021, 6:20:03 AM
On Mon, Sep 20, 2021 at 11:41:38AM +0200, Vincent Lefevre wrote:
> Please also make sure that the NEWS file is up-to-date; see my other
> message. This is also useful for the user when getting regressions
> in general (possibly from bug fixes like here).

I'm not sure I'd like to add such an item to Debian's NEWS. It would
stop updates for too many users who most likely are not affected. For
now, you are really the only one who has brought up this issue.

> BTW, the error message should be more detailed, e.g. saying which
> entity and which URI. This would have made debugging so much easier.
> But that's a separate issue; I'll report a bug upstream if this has
> not already been done.

It hasn't been done, so you should raise a bug with them if you think
they should.

> I'm wondering whether this check for invalid redeclarations of
> predefined entities should also go to Debian/stable since it fixes
> an integer overflow at the same time:
>
> https://gitlab.gnome.org/GNOME/libxml2/-/issues/217
>
> Any security issue related to that?

AFAIK not yet at least.

Vincent Lefevre

Sep 20, 2021, 8:40:04 AM
On 2021-09-20 11:58:00 +0000, Torrance, Douglas wrote:
> Control: affects -1 src:macaulay2
>
> I believe this bug is also affecting the build of the Macaulay2 package. From
> [1,2]:
>
> /usr/bin/make -C M2 validate-html
> make[2]: Entering directory '/<<BUILDDIR>>/macaulay2-1.18.0.1+git202109031258/M2'
> -- validating all html and xhtml files in /<<BUILDDIR>>/macaulay2-1.18.0.1+git202109031258/M2/usr-dist/common/share/doc/Macaulay2
> validating: BGG/html/_direct__Image__Complex.html
> *** invalid HTML: /<<BUILDDIR>>/macaulay2-1.18.0.1+git202109031258/M2/usr-dist/common/share/doc/Macaulay2/BGG/html/_direct__Image__Complex.html
> error: line 338: xmlParseEntityDecl: entity xhtml-qname-extra.mod not terminated

The error message is different. I'd say that this is a different issue.

Thorsten Glaser

Sep 20, 2021, 11:20:03 AM
On Mon, 20 Sep 2021, Vincent Lefevre wrote:

> Then libxml2 can find the right file on the local file system via
> catalogs. In my case (which is the *default* setup with Debian

I never understood this catalogue thing. When I tried it, it didn’t
work for me (that may admittedly have been multiple releases ago),
the documentation was as good as Chinese to me, and… meh.

> Hmm... there seems to be a subtle difference in xhtml-special.ent:

Interesting.

I’m working with an XHTML 1.1 DTD, which has the entities inline
(not sure if that was my doing or if I got it like this) and it
too has:

<!-- C0 Controls and Basic Latin -->
<!ENTITY quot "&#34;"> <!-- quotation mark, U+0022 ISOnum -->
<!ENTITY amp "&#38;#38;"> <!-- ampersand, U+0026 ISOnum -->
<!ENTITY lt "&#38;#60;"> <!-- less-than sign, U+003C ISOnum -->
<!ENTITY gt "&#62;"> <!-- greater-than sign, U+003E ISOnum -->
<!-- note: not specified in HTML 4 -->
<!ENTITY apos "&#39;"> <!-- apostrophe = APL quote, U+0027 ISOnum -->

But if this upstream change affects DTDs that were once released, maybe
it should accept, but ignore, this specific wrong redeclaration. Though
you said the bug was introduced in a Debian package only… where did the
package get the wrong .ent files from? If this is truly Debian-local, I
agree that nothing other than the conflict is probably needed.

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

Vincent Lefevre

Sep 20, 2021, 11:50:03 AM
On 2021-09-20 15:00:00 +0000, Torrance, Douglas wrote:
> On Mon 20 Sep 2021 08:17:11 AM EDT, Vincent Lefevre <vin...@vinc17.net> wrote:
> > The error message is different. I'd say that this is a different issue.
>
> Fair enough. I think it's related -- the latest release is more strict about
> DTD files --

Yes, but there seem to be many changes, thus potentially several kinds
of regressions for different reasons. It is better to report one bug
per kind of regression.

> but involving a different DTD file, xhtml-math-svg.dtd from the
> w3c-sgml-lib package. Here's the output of xmllint, which gives a
> bit more info:
>
> $ xmllint --noout --loaddtd /usr/share/doc/Macaulay2/Macaulay2Doc/html/_ideal.html
> file:///usr/share/xml/w3c-sgml-lib/schema/dtd/WD-XHTMLplusMathMLplusSVG-20020809/xhtml-math-svg.dtd:338: parser error : xmlParseEntityDecl: entity xhtml-qname-extra.mod not terminated
> %xhtml-qname-extra.decl;
> ^
> Entity: line 2:
> "http://www.w3.org/Math/DTD/mathml2/mathml2-qname-1.mod"
> ^
> Perhaps this should be filed against w3c-sgml-lib?

I don't know about this one. It could be a bug either in libxml2
(e.g. an unexpected regression in a corner case due to some fix
of a more general case) or in the DTD from w3c-sgml-lib. Some
investigation would be needed.

Vincent Lefevre

Sep 23, 2021, 5:10:03 AM
On 2021-09-20 12:11:17 +0200, Mattia Rizzolo wrote:
> On Mon, Sep 20, 2021 at 11:41:38AM +0200, Vincent Lefevre wrote:
> > BTW, the error message should be more detailed, e.g. saying which
> > entity and which URI. This would have made debugging so much easier.
> > But that's a separate issue; I'll report a bug upstream if this has
> > not already been done.
>
> It hasn't been done, so you should raise a bug with them if you think
> they should.

I've now reported the bug about the error message:

https://gitlab.gnome.org/GNOME/libxml2/-/issues/308

Of course, dropping the error as I suggested in

https://gitlab.gnome.org/GNOME/libxml2/-/issues/307

would also solve the issue.