sur-behoffski
Jul 20, 2025, 6:01:12 AM
to lua-l
G'day,
During the recent Unicode-in-Lua discussions, I started thinking
about the string syntax "\u{h...h}": PiL, 4th Ed, Section 4.1
(p. 31):
Since Lua 5.3, we can also specify UTF-8 characters with
the escape sequence \u{h...h}; we can write any number of
hexadecimal digits inside the brackets.
The library "utf8" was also introduced in 5.3.
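As a quick illustration of the escape, here is a sketch that assumes Lua 5.3 or later. It checks that "\u{...}" yields the UTF-8 bytes of the code point, that leading zeros in the hex digits are accepted, and that utf8.char and utf8.codepoint convert in both directions:

```lua
-- Requires Lua 5.3+: both the \u{...} escape and the utf8 library.
local e_acute = "\u{E9}"                 -- U+00E9, LATIN SMALL LETTER E WITH ACUTE
print(#e_acute)                          --> 2  (two UTF-8 bytes)
print(e_acute == "\xC3\xA9")             --> true
print("\u{0000E9}" == e_acute)           --> true (any number of hex digits)
print(utf8.char(0xE9) == e_acute)        --> true
print(string.format("%X", utf8.codepoint(e_acute)))  --> E9
```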
I looked at string.format's "%q" specifier, which is designed to
serialize values so that every character with special meaning to
the Lua parser is escaped, e.g.:
a = 'a "problematic" \\string'
print(string.format("%q", a)) --> "a \"problematic\" \\string"
I noticed that quoting was only applied to the ASCII characters
with special meaning in Lua programs (double quotes, backslashes,
newlines and other control characters). Every byte of a multi-byte
UTF-8 sequence has a value above 0x7f (127), outside the ASCII
range, so those bytes were simply passed through verbatim.
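A small check of this behaviour, assuming Lua 5.3+ running in the default "C" locale (the control-character test inside %q relies on the C library's iscntrl). An ASCII TAB gets escaped, while the two bytes encoding 'é' come through unchanged:

```lua
-- Assumes Lua 5.3+ in the default "C" locale.
print(string.format("%q", "tab\there"))     --> "tab\9here"
local q = string.format("%q", "caf\u{E9}")
-- Bytes 5 and 6 of the result are the raw UTF-8 encoding of U+00E9.
print(q:byte(5) == 0xC3 and q:byte(6) == 0xA9)  --> true
print(q == '"caf\xC3\xA9"')                  --> true
```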
----
My thought was that, since the Lua parser accepts the "\u{...}"
syntax, some users might want a "%q On Steroids" that emits
UTF-8 sequences using the "\u{...}" syntax. This would have the
nice side-effect of making the serialized text 7-bit clean
(although invalid UTF-8 sequences would still need to be
written as a series of one or more "\ddd" escapes). The result
would also be easier to read, as each code point would be shown
directly, instead of through its UTF-8 encoding.
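Purely as a sketch of what such a function might look like: the name quote_u is hypothetical, nothing like it exists in the standard library, and it assumes Lua 5.3+. It walks the string byte by byte. ASCII is escaped much as %q would do it, each valid UTF-8 character becomes "\u{X}", and any byte that does not start a valid sequence becomes a fixed-width "\ddd".

```lua
-- Hypothetical "%q On Steroids" (not a standard Lua function).
-- Requires Lua 5.3+ for the utf8 library and \u{...} escapes.
local function quote_u(s)
  local out = { '"' }
  local i, n = 1, #s
  while i <= n do
    local b = s:byte(i)
    if b < 0x80 then
      -- ASCII: escape only what the Lua lexer treats specially.
      local c = string.char(b)
      if c == '"' or c == "\\" then
        out[#out + 1] = "\\" .. c
      elseif c == "\n" then
        out[#out + 1] = "\\n"
      elseif b < 0x20 or b == 0x7F then
        -- Other control characters: fixed-width decimal escape, so a
        -- following digit can never be absorbed into the escape.
        out[#out + 1] = string.format("\\%03d", b)
      else
        out[#out + 1] = c
      end
      i = i + 1
    else
      -- Try to decode one UTF-8 character starting at byte i;
      -- utf8.codepoint raises an error on invalid or truncated input.
      local ok, cp = pcall(utf8.codepoint, s, i)
      if ok then
        out[#out + 1] = string.format("\\u{%X}", cp)
        -- Decoding rejects overlong forms, so re-encoding gives the
        -- same byte length as the sequence just consumed.
        i = i + #utf8.char(cp)
      else
        -- Not valid UTF-8: emit just this byte as \ddd.
        out[#out + 1] = string.format("\\%03d", b)
        i = i + 1
      end
    end
  end
  out[#out + 1] = '"'
  return table.concat(out)
end
```

For example, print(quote_u("caf\u{E9}")) prints "caf\u{E9}", and print(quote_u("a\xFFb")) prints "a\255b"; feeding the result back through load("return " .. result) should reproduce the original string. Wrapping utf8.codepoint in pcall is a shortcut for validation; a hand-written decoder would avoid raising an error for every invalid byte.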
I toyed with the idea of a "%Q" format specifier, but I suspect
that this would be a breaking/incompatible change that's more
trouble than it's worth.
So, perhaps adding a function to the utf8 library to perform this
transformation could be worthwhile?
----
Just a thought bubble; comments for/against welcome.
cheers, s-b etc