Notes on the non-BMP character problem

Jamie Norrish

Sep 5, 2022, 2:50:11 AM
to efes-...@googlegroups.com
Hi,

With the aid of someone else experiencing the problem of non-BMP
Unicode characters breaking EFES/Kiln, I have managed to gather a
little more information.

The main finding is that the problem occurs only during serialisation,
and only when serialising as XML.

A map:match containing no map:transform step and serialised as XML
triggers the problem.
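
As a rough illustration, a match of that shape might look like the
sketch below. The pattern and the source path are hypothetical, made up
for this example, and the source file is assumed to contain at least
one non-BMP character; only the structure (generate, no transform,
serialise as XML) comes from the observations above.

```xml
<!-- Hypothetical pattern and source path; not taken from EFES/Kiln. -->
<map:match pattern="test/non-bmp.xml">
  <!-- Assumed to be an XML file containing a non-BMP character. -->
  <map:generate src="test/non-bmp.xml"/>
  <!-- Deliberately no map:transform step. -->
  <map:serialize type="xml"/>
</map:match>
```

Swapping the final step for a text serialiser in the same match would
give the contrasting case that does not trigger the problem.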

Setting the encoding and charset (both to UTF-8) on the serialiser
makes no difference to whether the problem occurs.

Serialising as text does not trigger the problem, whether there are
map:transforms or not.

It might be worth testing whether serialising as UTF-16 (changing the
encoding on the "xml" serialiser defined in config.xmap) causes the
issue.
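
A sketch of what that change might look like follows. The attributes
shown are assumptions based on the usual Cocoon 2.1 serialiser
declaration, and the actual definition in config.xmap may differ; the
only intended change is the value of the encoding element.

```xml
<!-- Assumed form of the declaration; keep whatever attributes the
     existing definition in config.xmap already has. -->
<map:serializer name="xml" mime-type="text/xml"
                src="org.apache.cocoon.serialization.XMLSerializer">
  <!-- Changed from UTF-8 for the purposes of the test. -->
  <encoding>UTF-16</encoding>
</map:serializer>
```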

We know of course that there is no direct bug in the Cocoon code (or
else it would always trigger), so a developer with a machine that
experiences the issue is likely needed to perform thorough debugging of
what is going on.

Jamie