%q doesn't format invalid utf8

Rett Berg

Jun 7, 2024, 11:26:06 AM
to lu...@googlegroups.com
On Lua 5.3, %q doesn't seem to format invalid UTF-8:

> ('%q'):format'\xF3'
""
> ('%q'):format'\x80'
""

It looks like there IS a byte inside the quotes; it just doesn't display.
> s = ('%q'):format'\x80'
> #s
3

Shouldn't the result of ('%q'):format'\x80' be "\x80" instead?

Best,
Rett

Francisco Olarte

Jun 7, 2024, 11:59:28 AM
to lu...@googlegroups.com
On Fri, 7 Jun 2024 at 17:26, Rett Berg <goog...@gmail.com> wrote:
> On lua 5.3 strings don't format invalid utf8
> > ('%q'):format'\xF3'
> ""
> > ('%q'):format'\x80'
> ""

Maybe the problem is in how your terminal displays stdout.
Using "terminator" on Debian Linux:
$ lua5.3
Lua 5.3.6 Copyright (C) 1994-2020 Lua.org, PUC-Rio
> string.byte("abc",1,-1)
97 98 99
> string.byte(('%q'):format'\x80',1,-1)
34 128 34
> ('%q'):format'\x80'
"�"

I do not know how it will come across on your end. On my side I get
mojibake: the generic replacement character (a question mark on a
square tilted by pi/4) between the quotes, >>"?"<<

> It looks like there IS a byte inside the quotes, it just doesn't display.
> > s = ('%q'):format'\x80'
> > #s
> 3

When in doubt, use string.byte to dump the real contents.
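
As a minimal sketch of that advice, the helper below prints every byte of a string in hex, so the terminal's encoding never gets a say. The name hex_bytes is hypothetical (not part of Lua), and the '\x80' escape assumes Lua 5.2 or later.

```lua
-- hex_bytes is a hypothetical helper name, not a standard function.
-- It renders each byte of s as two hex digits, separated by spaces.
local function hex_bytes(s)
  local out = {}
  for i = 1, #s do
    out[i] = ('%02X'):format(s:byte(i))
  end
  return table.concat(out, ' ')
end

-- The quoted form of a lone 0x80 byte: quote, the byte itself, quote.
print(hex_bytes(('%q'):format('\x80')))  --> 22 80 22
```

This is the same information as string.byte(s, 1, -1), just in hex and safe to paste into a mail.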

> Shouldn't the result of ('%q'):format'\x80' be "\x80" instead?

IIRC Lua is "8-bit clean", so it can read that back (although my
terminal cannot display it properly; it probably could if I set it to
Latin-1 or some other full 8-bit encoding).

The manual states: "Lua is 8-bit clean: strings can contain any 8-bit
value, including embedded zeros ('\0'). Lua is also encoding-agnostic;
it makes no assumptions about the contents of a string." Since
compilation (load) is done via intermediate strings, it should work:

> s = ('%q'):format'\x80'
> f = load("return "..s)
> t = f()
> string.byte(t,1,-1)
128

Works for me at least.
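
The same round trip can be checked for every byte value at once. This is only a sketch, assuming Lua 5.2 or later (where load accepts a string); the variable names are illustrative.

```lua
-- Build a 256-byte string containing every possible byte value once.
local bytes = {}
for b = 0, 255 do
  bytes[#bytes + 1] = string.char(b)
end
local original = table.concat(bytes)

-- Quote it with %q, compile the quoted literal, and read it back.
local quoted = ('%q'):format(original)
local chunk = assert(load('return ' .. quoted))
local restored = chunk()

-- If %q and load agree, nothing was lost, including the bytes >= 0x80.
assert(restored == original, 'round trip through %q changed the string')
print(#original, #restored)
```

If the assertion passes, the raw high bytes that %q leaves unescaped are harmless as far as load is concerned.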

Francisco Olarte.

Rett Berg

Jun 8, 2024, 3:05:06 PM
to lua-l
> > ('%q'):format'\x80'
> "�"
Interesting, so the character is there. And yes, it did "transmit across the line"


> IIRC lua is "8 bit clean" somehow, it can read that back ( although my
> terminal cannot display it properly, it probably can if I set it to
> latin 1 or some other full 8 bit code.

Interesting, so even though it's not valid UTF-8, Lua can still read it back correctly from a text file. Therefore there is no REQUIREMENT that %q escape bytes that are invalid UTF-8.

Huh, I suppose that all makes sense! Thanks!
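
If ASCII-only output is wanted anyway (say, for logs or mail), one option is a small quoting function of one's own. The sketch below is not how %q works; quote_ascii is a hypothetical name, and the \xHH escapes it emits assume the result is read back by Lua 5.2 or later.

```lua
-- quote_ascii is a hypothetical helper, not part of the string library.
-- It returns a Lua string literal for s that uses printable ASCII only.
local function quote_ascii(s)
  local body = s:gsub('.', function(c)
    local b = c:byte()
    if c == '"' or c == '\\' then
      return '\\' .. c                 -- escape the quote and the backslash
    end
    if b < 32 or b > 126 then
      return ('\\x%02X'):format(b)     -- control and non-ASCII bytes as \xHH
    end
    -- returning nothing keeps the original character
  end)
  return '"' .. body .. '"'
end

print(quote_ascii('\x80'))  --> "\x80"
```

The result still loads back to the original bytes, so the choice between this and %q is about readability of the literal, not correctness.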

Francisco Olarte

Jun 8, 2024, 3:26:25 PM
to lu...@googlegroups.com
On Sat, 8 Jun 2024 at 21:05, Rett Berg <goog...@gmail.com> wrote:
> Interesting, so even though it's not valid UTF-8, Lua can still read it back correctly from a text file. Therefore there is no REQUIREMENT that %q escape bytes that are invalid UTF-8.

Lua strings and source code are NOT UTF-8. They are just byte strings,
more or less. If you think of them as always using Latin-1, or another
similar encoding (where every character encodes into a single byte), you
will probably find it easier to reason about, and setting your terminal
to some single-byte encoding will make it easier too.

UTF-8 is more or less a system to encode sequences of code points (which
are numbers, but you can represent them in any way you like internally:
as 32-bit ints, 128-bit floats, whatever suits you) into byte
sequences. The problem is that in Lua, as in not-too-recent C, you do
not have separate types for bytes and characters, so they get mixed up
in people's heads.
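
The distinction can be made concrete with the standard utf8 library. This is a small sketch assuming Lua 5.3 or later (where utf8 exists); the variable names are illustrative.

```lua
-- One code point, U+00E9, encoded into UTF-8 bytes.
local e_acute = utf8.char(0xE9)
print(#e_acute)                     --> 2  (length in bytes)
print(utf8.len(e_acute))            --> 1  (length in code points)
print(string.byte(e_acute, 1, -1))  --> 195 169

-- A lone continuation byte is a perfectly good Lua string of length 1,
-- but it is not valid UTF-8: utf8.len reports failure plus the
-- position of the first invalid byte.
local lone = '\x80'
local count, bad_pos = utf8.len(lone)
print(#lone, count, bad_pos)
```

The # operator and %q only ever see bytes; code points appear only when a function such as utf8.len is asked to decode them.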

Francisco Olarte

Rett Berg

Jun 8, 2024, 6:01:08 PM
to lu...@googlegroups.com
Being the author of stfu8 (for Rust), I know waaay too much about how Unicode works lol


I kind of like that Lua code works directly on bytes. Thanks!

Best,
Rett
