Correctly escape Euro symbol in XML files on write (Issue #22967)


MortenMacFly

Nov 13, 2022, 8:49:53 AM
to wx-...@googlegroups.com

Description

Use case: the user enters something in a text box, and this content is written to an XML file using wxXmlDocument.
Issue: When the user enters the Euro symbol (€), "garbage" is written to the XML file in UTF-8, while it should be written as &#x20AC; to avoid encoding conflicts (unfortunately, the "garbage" does cause encoding conflicts with external XML tools).

Possible solution / enhancement

The Euro symbol should be escaped on write, similar to <, >, quotes, etc.
I believe this could be handled in the method OutputEscapedString in [GIT]\src\xml\xml.cpp, but I don't know if this is the right place. Or maybe you have a better solution?
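A minimal sketch of the proposed idea, written in standard C++ rather than wxWidgets code: the helper `EscapeXmlTextAscii` below is hypothetical (it is not `OutputEscapedString` or any other wxWidgets API) and assumes the text is already available as UTF-32 code points. It escapes the markup characters and, going one step beyond the Euro symbol alone, writes every non-ASCII code point as a hexadecimal character reference.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of wxWidgets): escapes XML markup characters
// and writes every non-ASCII code point as a hex character reference such as
// "&#x20AC;", so the result contains only ASCII characters.
std::string EscapeXmlTextAscii(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
    {
        switch (cp)
        {
            case U'<':  out += "&lt;";   break;
            case U'>':  out += "&gt;";   break;
            case U'&':  out += "&amp;";  break;
            case U'"':  out += "&quot;"; break;
            case U'\'': out += "&apos;"; break;
            default:
                if (cp < 0x80)
                {
                    // Plain ASCII is copied unchanged.
                    out += static_cast<char>(cp);
                }
                else
                {
                    // Everything else, e.g. U+20AC (Euro), becomes "&#x20AC;".
                    char buf[16];
                    std::snprintf(buf, sizeof(buf), "&#x%X;",
                                  static_cast<unsigned>(cp));
                    out += buf;
                }
                break;
        }
    }
    return out;
}
```

For example, `EscapeXmlTextAscii(U"5 € < 6 €")` yields `5 &#x20AC; &lt; 6 &#x20AC;`. The output is then pure ASCII and survives any ASCII-compatible file encoding, at the cost of larger and less readable files.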

Please note that reading &#x20AC; from an XML file works properly; only writing it back to the file does not.


Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/issues/22967@github.com>

MortenMacFly

Nov 13, 2022, 9:37:23 AM
to wx-...@googlegroups.com

Answering myself: I see that this "solution" may become an issue when saving XML files as UTF-8, as this is a 16-bit notation?
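A note on the "16-bit" concern, based on the XML 1.0 specification rather than on anything stated in this thread: a character reference such as `&#x20AC;` names a Unicode code point, not a UTF-16 code unit, so it denotes the same character whether the document is saved as UTF-8 or UTF-16, and the decimal form `&#8364;` is equivalent. The hypothetical `ParseCharRef` helper below (standard C++17, not a wxWidgets function) sketches how a parser resolves such a reference to a code point.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: returns the code point named by a numeric character
// reference such as "&#x20AC;" (hex) or "&#8364;" (decimal), or nothing if
// the input is not a well-formed reference to a valid code point.
std::optional<char32_t> ParseCharRef(const std::string& ref)
{
    if (ref.size() < 4 || ref.compare(0, 2, "&#") != 0 || ref.back() != ';')
        return std::nullopt;

    std::size_t pos = 2;
    unsigned long base = 10;
    if (ref[pos] == 'x')  // XML only allows a lowercase 'x' here
    {
        base = 16;
        ++pos;
    }

    const std::string digits = ref.substr(pos, ref.size() - 1 - pos);
    if (digits.empty())
        return std::nullopt;

    unsigned long value = 0;
    for (char c : digits)
    {
        unsigned long d;
        if (c >= '0' && c <= '9')                   d = c - '0';
        else if (base == 16 && c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (base == 16 && c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return std::nullopt;

        value = value * base + d;
        if (value > 0x10FFFF)  // beyond the Unicode code space
            return std::nullopt;
    }
    return static_cast<char32_t>(value);
}
```

Both `ParseCharRef("&#x20AC;")` and `ParseCharRef("&#8364;")` return U+20AC; the file encoding only matters once that code point is written out as bytes.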



MortenMacFly

Nov 13, 2022, 9:55:24 AM
to wx-...@googlegroups.com

Closed #22967 as completed.



MortenMacFly

Nov 13, 2022, 9:55:26 AM
to wx-...@googlegroups.com

Oh dear, this is crazy... I found out that wxWidgets might actually be correct here. According to the standard, the Euro symbol is encoded as:

UTF-8-Encoding: 0xE2 0x82 0xAC
UTF-16-Encoding: 0x20AC
UTF-32-Encoding: 0x000020AC

My file is UTF-8, and the first one is the "garbage" I was referring to. I wondered why it is 3 bytes and thought that was an error, but it seems to be correct. What a pity. So I have to find another way to cope with the fact that this is misinterpreted by several 3rd-party XML tools I use. :-(
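The three byte values are expected: UTF-8 uses three bytes for every code point from U+0800 to U+FFFF, and U+20AC falls in that range, while UTF-16 stores it as the single unit 0x20AC and UTF-32 as the code point itself. The sketch below (standard C++; `EncodeUtf8` and `EncodeUtf16` are illustrative helpers, not wxWidgets functions) reproduces the UTF-8 and UTF-16 values listed above. As an aside, a tool that ignores the declared encoding and assumes Windows-1252 would display the bytes E2 82 AC as "â‚¬"; whether that is what happens with the tools mentioned here is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encodes one Unicode scalar value as UTF-8 (assumes cp <= 0x10FFFF and
// that cp is not a surrogate code point).
std::vector<std::uint8_t> EncodeUtf8(char32_t cp)
{
    std::vector<std::uint8_t> out;
    auto put = [&out](char32_t byte) { out.push_back(static_cast<std::uint8_t>(byte)); };

    if (cp < 0x80) {
        put(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        put(0xC0 | (cp >> 6));            // 110xxxxx
        put(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        put(0xE0 | (cp >> 12));           // 1110xxxx: U+20AC -> 0xE2
        put(0x80 | ((cp >> 6) & 0x3F));   //           U+20AC -> 0x82
        put(0x80 | (cp & 0x3F));          //           U+20AC -> 0xAC
    } else {
        put(0xF0 | (cp >> 18));           // 11110xxx
        put(0x80 | ((cp >> 12) & 0x3F));
        put(0x80 | ((cp >> 6) & 0x3F));
        put(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encodes one Unicode scalar value as UTF-16: a single unit inside the
// Basic Multilingual Plane, a surrogate pair above it.
std::u16string EncodeUtf16(char32_t cp)
{
    if (cp < 0x10000)
        return std::u16string(1, static_cast<char16_t>(cp));
    cp -= 0x10000;
    return {static_cast<char16_t>(0xD800 + (cp >> 10)),
            static_cast<char16_t>(0xDC00 + (cp & 0x3FF))};
}
```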

Sorry for the noise...


