On Thu Jul 17, 2025 at 7:45 PM CEST, 'Scott Morgan' via lua-l wrote:
> If you know the data is UTF-8, then use Lua's utf8 module. But you may
> find you'll need to supplement that with some extra functions/modules to
> comfortably handle UTF-8 text.
To be more precise, there is no support for Unicode in Lua
whatsoever. That is not a criticism of Lua as a language: it was
intended as a small embedded language, and proper Unicode-aware
string operations are way more complicated than that.
In terms of Unicode support, compared to Python (which is the
language I know best), it is somewhere below version 2.*,
i.e., “strings are a bunch of bytes and we don’t do anything with
them”. I am a supporter and user of the vis editor [1], which has
almost all of its functionality in Lua scripts, and when we got to
just primitive string operations like “upper-case” or “lower-case”
a string [2], we ended up in a situation where the only
OS-independent solution (which wouldn’t require writing it in C)
was to pipe the string through awk, and even that doesn’t work
well (i.e., consistently across Linux, Mac OS X and *BSD).
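To make the upper-case problem concrete, here is a minimal sketch.
It assumes Lua 5.3 or newer (for the utf8 library and the \u{...}
escapes) and the default "C" locale; the variable names are only
for illustration and nothing here is taken from the vis patch.

```lua
-- Sketch only: assumes Lua 5.3+ and the default "C" locale.
-- "Dědeček" written with escapes so the bytes are unambiguous.
local s = "D\u{11B}de\u{10D}ek"

-- What a Unicode-aware upper-casing would produce: "DĚDEČEK".
local expected = "D\u{11A}DE\u{10C}EK"

-- string.upper works byte by byte via the C library, so in the
-- "C" locale only the ASCII letters are converted.
local upper = string.upper(s)

print(#s, utf8.len(s))    -- byte length vs. number of code points
print(upper == expected)  -- false: "ě" and "č" were left untouched
```

The utf8 library can count and iterate over code points, but it
carries no case-mapping data, so it does not help with this part.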
As I said, it is not a problem that the language doesn’t support
this, but it is sad that I don’t know of any good Lua library
which does this; unfortunately, not even my preferred Penlight.
Best,
Matěj
[1]
https://sr.ht/~martanne/vis/
[2]
https://lists.sr.ht/~martanne/devel/patches/49212 ; and just
before you try to argue otherwise, string.upper("Да Нет
Dědeček") doesn’t give the correct answer.
--
http://matej.ceplovi.cz/blog/, @mc...@en.osm.town
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8
Get up, stand up, don't give up the fight!
-- Bob Marley