[PATCH v3] Fix variadic type mismatch in pushutfchar -- passes long but %U expects unsigned long


Weixie Cui

Feb 25, 2026, 11:12:01 AM
to lu...@googlegroups.com, Weixie Cui
From: Weixie Cui <cuiw...@gmail.com>

---
lutf8lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lutf8lib.c b/lutf8lib.c
index b7f3fe1e..732f51b0 100644
--- a/lutf8lib.c
+++ b/lutf8lib.c
@@ -146,7 +146,7 @@ static int codepoint (lua_State *L) {
 static void pushutfchar (lua_State *L, int arg) {
   lua_Unsigned code = (lua_Unsigned)luaL_checkinteger(L, arg);
   luaL_argcheck(L, code <= MAXUTF, arg, "value out of range");
-  lua_pushfstring(L, "%U", (long)code);
+  lua_pushfstring(L, "%U", (unsigned long)code);
 }


--
2.39.5 (Apple Git-154)

Halalaluyafail3

Feb 25, 2026, 12:22:24 PM
to lua-l
I think the code is fine as is, because va_arg allows mixing corresponding signed and unsigned types as long as the value is representable in both types.
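
As a minimal sketch of that rule (the helper name read_as_unsigned is hypothetical and is not part of Lua), the function below reads its variadic argument as unsigned long even though the caller may pass a long. The C standard's description of va_arg permits this when one type is the corresponding unsigned type of the other and the value is representable in both.

```c
#include <stdarg.h>

/* Hypothetical helper for illustration: 'count' is only the named
   parameter that va_start requires; exactly one variadic argument
   is read, as unsigned long. */
static unsigned long read_as_unsigned(int count, ...) {
  va_list ap;
  unsigned long v;
  va_start(ap, count);
  v = va_arg(ap, unsigned long);  /* caller may have passed a long */
  va_end(ap);
  return v;
}
```

Calling read_as_unsigned(1, (long)0x20AC) returns 0x20AC, because that value fits in both long and unsigned long. A negative long would not be representable in unsigned long, so the rule would not cover it.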

Denis Dos Santos Silva

Feb 26, 2026, 8:17:14 AM
to lua-l
c89/c90 ANSI:
lua_pushfstring(L, "%lu", (unsigned long)code);

Luiz Henrique de Figueiredo

Feb 26, 2026, 8:22:50 AM
to lu...@googlegroups.com
%U means UTF-8. See lobject.c:

case 'U': { /* an 'unsigned long' as a UTF-8 sequence */
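
To illustrate what turning an unsigned long into a UTF-8 sequence involves, here is a rough sketch. It is not the code in lobject.c; the name utf8_encode and its buffer convention are assumptions made for this example, and it accepts the extended range of up to 31 bits (six bytes).

```c
#include <stddef.h>

/* Hypothetical helper, not Lua's implementation: encodes 'code' into
   'out' (which must hold at least 6 bytes) and returns the number of
   bytes written, or 0 if the value needs more than 31 bits. */
static size_t utf8_encode(unsigned long code, unsigned char *out) {
  size_t n, i;
  if (code < 0x80UL) {  /* ASCII: a single byte, stored as is */
    out[0] = (unsigned char)code;
    return 1;
  }
  else if (code < 0x800UL) n = 2;
  else if (code < 0x10000UL) n = 3;
  else if (code < 0x200000UL) n = 4;
  else if (code < 0x4000000UL) n = 5;
  else if (code <= 0x7FFFFFFFUL) n = 6;
  else return 0;
  /* Continuation bytes, filled from the end: 10xxxxxx, 6 bits each. */
  for (i = n - 1; i > 0; i--) {
    out[i] = (unsigned char)(0x80 | (code & 0x3F));
    code >>= 6;
  }
  /* Lead byte: n one-bits, a zero bit, then the remaining payload. */
  out[0] = (unsigned char)((0xFF << (8 - n)) | code);
  return n;
}
```

For U+20AC the sketch produces the three bytes E2 82 AC.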

Philippe Verdy

Feb 26, 2026, 9:18:49 AM
to lu...@googlegroups.com
Also, if "%U" means UTF-8, the only acceptable values for a standard result are non-negative and fit in at most 21 bits (the range 0 to 0x1FFFFFL), so passing a long or an unsigned long changes nothing for such valid values. Even with the non-standard UTF-8 from the obsolete first RFC that described it, only values of up to 31 bits could be encoded. So long versus unsigned long does not change the result for any integer type, signed or unsigned, that is at least 32 bits wide, which holds for both "long" and "unsigned long" in C, even on 8-bit platforms where both "int" and "short" were stored in 16 bits.

I don't know exactly how "lobject.c" in the Lua sources is supposed to handle a value that does not fit in the supported range (either the range of the UCS standard, or the wider range of the obsolete RFC version, which was proposed informally in early ISO 10646 when it was not yet a standardized encoding form but just a recommendation, and which was finally not adopted by the joint efforts of ISO, the UTC, and the RFC Editor for the UCS standard). None of these standards has allowed negative values for code points. In fact, many system-level I/O APIs use negative values not for code points but only -1, as a special "end of file" condition indicating that there is no "character" to retrieve from an input stream; other negative values are not used in any API standard I know of for common language bindings in C, C++, Java, JavaScript, etc.
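
The range argument can be sketched in C as follows. The bound SKETCH_MAXUTF (31 bits) and the helper same_in_both are assumptions made for this example and are not taken from the Lua sources. The sketch relies only on the C guarantee that LONG_MAX is at least 2147483647.

```c
#include <limits.h>

/* Assumed upper bound for this sketch: the 31-bit extended range. */
#define SKETCH_MAXUTF 0x7FFFFFFFUL

/* Hypothetical helper: returns 1 when 'code' passes the range check
   and has the same value as a long and as an unsigned long,
   0 otherwise. */
static int same_in_both(unsigned long code) {
  long as_signed;
  if (code > SKETCH_MAXUTF) return 0;  /* rejected by the range check */
  as_signed = (long)code;  /* fits: LONG_MAX >= 2147483647 */
  return as_signed >= 0 && (unsigned long)as_signed == code;
}
```

Every value that passes such a check is non-negative and no larger than LONG_MAX, so the cast to long and the cast to unsigned long yield the same value.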
