Comparing strings, strcoll, and undefined behavior


Halalaluyafail3

Oct 14, 2025, 6:01:05 PM
to lua-l
According to the reference manual, strings are relationally compared using the current locale:

> Otherwise, if both arguments are strings, then their values are compared according to the current locale.

Judging from the implementation of this in l_strcmp, this means invoking strcoll:

> int temp = strcoll(s1, s2);

l_strcmp invokes strcoll repeatedly, once per zero-terminated chunk, to handle strings that contain embedded zero bytes. Furthermore, there seems to be no check on the contents of the strings beforehand, so any string is accepted. However, according to the response to DR 484:

> ... the behavior of strcoll in the face of invalid input is already clearly undefined.

The behavior is undefined if strcoll is given "invalid input", which in a UTF-8 locale, for example, I assume means invalid UTF-8 strings. I don't know of any platforms that do anything bad on invalid input, but I think this undefined behavior is worth considering. I also don't know of any practical way to validate strings for strcoll. I would expect the following to work: copy the LC_COLLATE locale to LC_CTYPE, validate the strings with mbsrtowcs (requires C94), then restore LC_CTYPE. But that seems like an excessive amount of extra work just to make string comparison work.
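A rough sketch of that validation idea, under several assumptions: that the LC_COLLATE locale name is also acceptable for LC_CTYPE, that "valid input" for strcoll means "decodes as a multibyte string in that locale's encoding" (the standard does not say so explicitly), and that no other thread touches the locale meanwhile. The names dup_locale and valid_for_strcoll are hypothetical.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Copy the current locale name for a category; the string returned by
   setlocale may be overwritten by later setlocale calls. */
static char *dup_locale(int category) {
    const char *name = setlocale(category, NULL);
    char *copy;
    if (name == NULL)
        return NULL;
    copy = malloc(strlen(name) + 1);
    if (copy != NULL)
        strcpy(copy, name);
    return copy;
}

/* Hypothetical helper: returns 1 if every zero-terminated chunk of s[0..len)
   decodes as a valid multibyte string under the LC_COLLATE locale, and 0
   otherwise (including on allocation or setlocale failure).
   Assumes s[len] == '\0'. Not thread-safe. */
static int valid_for_strcoll(const char *s, size_t len) {
    char *saved = dup_locale(LC_CTYPE);
    char *coll = dup_locale(LC_COLLATE);
    const char *end = s + len;
    int ok = 0;
    if (saved != NULL && coll != NULL && setlocale(LC_CTYPE, coll) != NULL) {
        ok = 1;
        for (;;) {
            mbstate_t st;
            const char *p = s;
            memset(&st, 0, sizeof st);  /* initial conversion state */
            /* With a null destination, mbsrtowcs only counts and validates. */
            if (mbsrtowcs(NULL, &p, 0, &st) == (size_t)-1) {
                ok = 0;
                break;
            }
            s += strlen(s) + 1;         /* skip the chunk and its zero byte */
            if (s > end)
                break;                  /* the last chunk has been checked */
        }
        setlocale(LC_CTYPE, saved);     /* restore the original LC_CTYPE */
    }
    free(saved);
    free(coll);
    return ok;
}
```

A caller would run valid_for_strcoll on both operands before comparing them. Two allocations and two setlocale calls per check give a sense of the overhead.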

bil til

Oct 15, 2025, 1:55:13 AM
to lu...@googlegroups.com
The comparison in Lua is based on the string compare functions in the C compiler, I think.

The "current locale string comparison" issue is typically handled very differently by different C compilers.

Can you specify which machine, operating system, and C compiler you use?

On Wed, Oct 15, 2025 at 00:01, Halalaluyafail3 <luigi...@gmail.com> wrote:

Halalaluyafail3

Oct 15, 2025, 1:27:32 PM
to lua-l
> the comparison in Lua is based on string compare functions in C compiler I think.

Yes, as I linked, it uses the C function strcoll.


> The handling of "current locale string comparison issue" typically is handled very differently for different C compilers.
>
> Can you specify, which machine you use / which operating system / which C compiler?

That shouldn't be needed here. Whether something is undefined behavior, or can cause undefined behavior, is determined by the C standard and can be known without reference to any particular implementation. For example, INT_MAX+1 is undefined behavior (signed integer overflow), and that can be known without trying any particular implementation.
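To illustrate with the INT_MAX+1 case: the overflow can be ruled out by a check written purely against the standard's limits, with no knowledge of the platform. A minimal sketch, where checked_add is a made-up name:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true if the sum is
   representable as an int; returns false, without performing the addition,
   if it would overflow. */
static bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) ||   /* sum would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b))     /* sum would fall below INT_MIN */
        return false;
    *out = a + b;                       /* safe: cannot overflow here */
    return true;
}
```

checked_add(INT_MAX, 1, &r) returns false on every conforming implementation, whatever a raw INT_MAX+1 happens to do on a given machine.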