regression on zOS between 5.4.8 and 5.5.0


Eric Covener

Jan 16, 2026, 10:23:06 AM
to lu...@googlegroups.com
Hi, apologies in advance as I am only a casual Lua user and only
moonlight on zOS (IBM's big-endian, EBCDIC mainframe OS), where this
is a problem.

I maintain a distribution of Apache HTTP Server on this platform,
which has lua extensions. It seems something between 5.4.8 and 5.5.0
has regressed the basics of the integration between the two.

We don't customize luaconf.h at all and this also works on 5.2.4.

When Apache invokes a very simple lua script, after just a very few
calls through our metatable, calling a C function will error out on:
> attempt to index a number value (local 'r')

Unfortunately I have not been able to reproduce it in a standalone
testcase. Additionally, seemingly innocuous things added to the lua
script make it work again -- these were originally side effects of
longer debug snippets. The smallest Apache-based testcase has some
embedded comments that illustrate this aspect:

function handle(r)
    r.content_type = "text/plain"
    -- fixes "attempt to index a number value..."
    -- local d = debug.getregistry()
    -- also works
    -- local c = collectgarbage()
    -- but not just:
    -- collectgarbage()

    if r.method == 'GET' then
        r:puts("Hello Lua World!\n")
        -- this small example fails on the second call
        r:puts("Hello Lua World!\n")
    end
end


The essence of what we do is here:

https://github.com/apache/httpd/blob/5bf7c9c34e5091894993b43a42639a3fd10d1418/modules/lua/lua_request.c#L2683
https://github.com/apache/httpd/blob/5bf7c9c34e5091894993b43a42639a3fd10d1418/modules/lua/lua_request.c#L2987
https://github.com/apache/httpd/blob/5bf7c9c34e5091894993b43a42639a3fd10d1418/modules/lua/lua_request.c#L1895

I also maintain this stack on AIX, another big endian platform, and
see no problem with 5.5.0.

During some AI driven debugging, whose fixes to
packing/padding/masking never panned out, the theory was that the
macros that isolate parts of each Instruction were suffering from some
s390x-specific problem resulting in very large values being seen for
"ARG A" such as 0x088DD000 instead of very small values.
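
For reference on the "ARG A" theory, here is a minimal sketch of how a shift-and-mask field extraction behaves. It assumes a 32-bit instruction with a 7-bit opcode in the low bits followed by an 8-bit A field, which is the iABC layout documented in lopcodes.h for Lua 5.4; that 5.5 keeps the same positions for this format is an assumption. All `demo_` names are hypothetical and are not Lua's own macros.

```c
#include <stdint.h>

/* Hypothetical names; field widths modeled on the iABC layout
   described in Lua 5.4's lopcodes.h (an assumption for 5.5). */
typedef uint32_t demo_instr;

enum { DEMO_SIZE_OP = 7, DEMO_SIZE_A = 8, DEMO_POS_A = DEMO_SIZE_OP };

/* Build a mask of n one-bits (1 <= n <= 31) starting at bit position p. */
static uint32_t demo_mask1(unsigned n, unsigned p) {
    return ((~(uint32_t)0) >> (32u - n)) << p;
}

/* Extract the opcode: the low DEMO_SIZE_OP bits. */
static unsigned demo_get_op(demo_instr i) {
    return (unsigned)(i & demo_mask1(DEMO_SIZE_OP, 0));
}

/* Extract field A: shift it down to bit 0, then mask to 8 bits. */
static unsigned demo_get_a(demo_instr i) {
    return (unsigned)((i >> DEMO_POS_A) & demo_mask1(DEMO_SIZE_A, 0));
}

/* Store a new value into field A, leaving all other bits untouched. */
static demo_instr demo_set_a(demo_instr i, unsigned a) {
    uint32_t field = demo_mask1(DEMO_SIZE_A, DEMO_POS_A);
    return (i & ~field) | (((demo_instr)a << DEMO_POS_A) & field);
}
```

Because of the final mask, an extraction of this shape can only yield values from 0 to 255. A reported value such as 0x088DD000 would therefore suggest that the debug output printed something other than the masked field, or that the expression was miscompiled, rather than that the encoding itself differs.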

Please let me know if there is any useful thing to try. Thanks!

--
Eric Covener
cov...@gmail.com

Denis Dos Santos Silva

Jan 17, 2026, 12:02:46 AM
to lua-l
Hi.

It appears that you are using Apache2 with mod_lua. mod_lua supports Lua 5.1, 5.2, 5.3, or LuaJIT 2.x. In any case, I believe the issue is most likely in the script itself, since it is returning an error message but does not provide any line information. If possible, could you please share a complete example?

Eric Covener

Jan 17, 2026, 8:43:29 AM
to lu...@googlegroups.com
On Sat, Jan 17, 2026 at 12:02 AM Denis Dos Santos Silva
<de...@roo.com.br> wrote:
>
> Hi.
>
> It appears that you are using Apache2 with mod_lua. mod_lua supports Lua 5.1, 5.2, 5.3, or LuaJIT 2.x.

Yes. I have some local trivial changes to support later lua releases,
and they are working on 5.4.8 (including zOS)

> In any case, I believe the issue is most likely in the script itself, since it is returning an error message but does not provide any line information. If possible, could you please share a complete example?

Coming back to my 5.5-based sandbox, it seems the error from the
interpreter is a little different (maybe different noise within the
stack)

https://gist.github.com/covener/75f3b68af2cc7fe8c684792d354e49c9 is
the complete simplified lua script from a regression test (with the
comment about how some weird debugging steps seem to avoid the error)

The full Apache error.log entry:

[Sat Jan 17 08:34:00.627574 2026] [lua:warn] [pid 67174462:tid
1550839160745492483] AH01471: Lua error:
/u/WASTST1/SRC/IHSJTest/data/lua/example.lua:18: attempt to index a
function value (local 'r')

Where everything after "Lua error:" comes from the interpreter:
https://github.com/apache/httpd/blob/954d76b9029e06296ec425c4d2e0c6a14600becd/modules/lua/mod_lua.c#L100

Denis Dos Santos Silva

Jan 22, 2026, 9:50:25 AM
to lua-l

Thank you for the detailed explanation.

Following up on your comments, I set up a clean sandbox environment to validate the behavior locally using the same simplified script.

The setup completed successfully, and the tests ran as expected in this environment.

I exercised the setup with Lua 5.3.6 and 5.4.8 (both officially supported), as well as Lua 5.5.0 (not officially supported). In all cases, the behavior was consistent during these tests. The same CFLAGS were used for both mod_lua and the Linux build (-DUSE_LINUX).

For reference, below are the validation steps and results observed:



$ cd /tmp/apache2/bin
$ ./apachectl -L | grep LuaRoot
LuaRoot (mod_lua.c)

$ strings httpd | grep LuaVersion
$LuaVersion: Lua 5.5.0 Copyright (C) 1994-2025 Lua.org, PUC-Rio $$LuaAuthors: R. Ierusalimschy, L. H. de Figueiredo, W. Celes $

$ cd /tmp/apache2/htdocs

$ cat test.lua
function handle(r)
    r:puts(_VERSION)
    r:puts("\n\n")
end

Hello Lua World!
Hello Lua World!

Lua 5.5  


Test environment details:

  • Distribution: Ubuntu 22.04

  • gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04.2)

Libraries and versions:

  • apr-1.7.6
  • apr-iconv-1.2.2
  • apr-util-1.6.3
  • httpd-2.4.66
  • lua-5.4.8
  • lua-5.5.0

I am sharing these results in case they are useful as an additional data point.


Kind regards,

PS:  I will organize the build script and make it available on a gist or GitLab.

  



Eric Covener

Jan 29, 2026, 9:00:40 PM
to lu...@googlegroups.com
On Thu, Jan 22, 2026 at 9:50 AM Denis Dos Santos Silva <de...@roo.com.br> wrote:
>
> Thank you for the detailed explanation.
>
> Following up on your comments, I set up a clean sandbox environment to validate the behavior locally using the same simplified script.
>
> The setup completed successfully, and the tests ran as expected in this environment.

Hi, Thanks for this, but maybe I didn't stress this aspect properly:
For me it's only an issue on the combination of zOS AND 5.5.0.

The same SW stack and automated test works for me on AIX (which is
also Big Endian), Windows, Linux/x86_64, Linux/s390x, and
Linux/ppc64le
The previous stack, with 5.4.8 or even 5.2.2. worked OK on the above + zOS

My suspicion is that something about the structs, unions, and
shifting/masking around the representation of the stack is not as
portable as 5.4.8.
But I didn't get very far trying various de-optimizations in lobject.h
and lopcodes.h

Roberto Ierusalimschy

Feb 2, 2026, 2:19:14 PM
to lu...@googlegroups.com
> The same SW stack and automated test works for me on AIX (which is
> also Big Endian), Windows, Linux/x86_64, Linux/s390x, and
> Linux/ppc64le
> The previous stack, with 5.4.8 or even 5.2.2. worked OK on the above + zOS
>
> My suspicion is that something about the structs, unions, and
> shifting/masking around the representation of the stack is not as
> portable as 5.4.8.
> But I didn't get very far trying various de-optimizations in lobject.h
> and lopcodes.h

That is possible. Does zOS have something unusual related to alignment,
words, sizes for basic objects, or other memory issues?

In particular, can you try the following?

-------------------------------------------------------------------------
--- a/lstate.c
+++ b/lstate.c
@@ -345,2 +345,3 @@ LUA_API lua_State *lua_newstate (lua_Alloc f, void *ud, unsigned seed) {
(*f)(ud, NULL, LUA_TTHREAD, sizeof(global_State)));
+printf("%ld %ld\n", sizeof(StackValue), sizeof(TValue));
if (g == NULL) return NULL;
-------------------------------------------------------------------------

StackValue is a union that contains a TValue, and there are a lot of
casts from one to the other. If we want to get the next element, we
must add 1 to a pointer to the first type, before the cast.

In most machines, these two objects have the same size, so a bug of
adding 1 after the cast won't be detected.

-- Roberto
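
The "add 1 before the cast" point above can be sketched in isolation. This is a minimal illustration, not Lua's code: `DemoValue` stands in for TValue and `DemoSlot` for StackValue, and the union is made deliberately larger than its first member (an assumption chosen so that the difference shows up; in the thread both sizes later turn out to be 16 bytes on zOS).

```c
#include <stddef.h>

/* Hypothetical stand-ins: DemoValue plays the role of TValue and
   DemoSlot the role of StackValue. These are not Lua's definitions. */
typedef struct DemoValue {
    double payload;
    unsigned char tag;
} DemoValue;

typedef union DemoSlot {
    DemoValue val;
    struct {
        double payload;
        unsigned char tag;
        unsigned char extra[24];  /* makes the union larger on purpose */
    } wide;
} DemoSlot;

/* A two-slot "stack" used only to have valid memory to point into. */
static DemoSlot demo_stack[2];

/* Correct: advance in units of the slot type, then cast. */
static DemoValue *demo_next_ok(DemoSlot *s) {
    return (DemoValue *)(s + 1);
}

/* Buggy: cast first, then advance in units of the smaller type. */
static DemoValue *demo_next_bad(DemoSlot *s) {
    return (DemoValue *)s + 1;
}

/* Byte distance between the two results; zero when the sizes match. */
static size_t demo_mismatch(DemoSlot *s) {
    return (size_t)((char *)demo_next_ok(s) - (char *)demo_next_bad(s));
}
```

When `sizeof(DemoSlot) == sizeof(DemoValue)`, both functions return the same address, which is why a bug of this kind would stay hidden on most machines and why printing the two sizes is a cheap first check.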

Eric Covener

Feb 4, 2026, 6:11:10 AM
to lu...@googlegroups.com
On Mon, Feb 2, 2026 at 2:19 PM Roberto Ierusalimschy
<rob...@inf.puc-rio.br> wrote:
>
> > The same SW stack and automated test works for me on AIX (which is
> > also Big Endian), Windows, Linux/x86_64, Linux/s390x, and
> > Linux/ppc64le
> > The previous stack, with 5.4.8 or even 5.2.2. worked OK on the above + zOS
> >
> > My suspicion is that something about the structs, unions, and
> > shifting/masking around the representation of the stack is not as
> > portable as 5.4.8.
> > But I didn't get very far trying various de-optimizations in lobject.h
> > and lopcodes.h
>
> That is possible. Does zOS have something unusual related to alignment,
> words, sizes for basic objects, or other memory issues?

Big endian is the obvious one but it shares that with AIX which is unaffected.
I have also seen warnings that "strict aliasing" is rather
strict/aggressive on zOS / xlc.


> In particular, can you try the following?
>
> -------------------------------------------------------------------------
> --- a/lstate.c
> +++ b/lstate.c
> @@ -345,2 +345,3 @@ LUA_API lua_State *lua_newstate (lua_Alloc f, void *ud, unsigned seed) {
> (*f)(ud, NULL, LUA_TTHREAD, sizeof(global_State)));
> +printf("%ld %ld\n", sizeof(StackValue), sizeof(TValue));
> if (g == NULL) return NULL;
> -------------------------------------------------------------------------
>
> StackValue is a union that contains a TValue, and there are a lot of
> casts from one to the other. If we want to get the next element, we
> must add 1 to a pointer to the first type, before the cast.
>
> In most machines, these two objects have the same size, so a bug of
> adding 1 after the cast won't be detected.

Both 16 bytes. I have gone a few rounds of unsuccessful AI-assisted
debugging but I do have some more verbose debug related to the various
offsets and macros, in lvm.c and lstate.c

I have collected it here:

https://gist.github.com/covener/4412c8be7762916335bb05e1d8171511

gottfried leibniz

Feb 6, 2026, 3:26:19 PM
to lu...@googlegroups.com
On 2/4/2026 7:10 AM, Eric Covener wrote:
> Big endian is the obvious one but it shares that with AIX which is unaffected.
> I have also seen warnings that "strict aliasing" is rather
> strict/aggressive on zOS / xlc.
>

What z/OS compiler is being used here? In your debug output, one of the
OP_GETTABUP instructions [1] looks different when compiling the included
example on other machines (RA_Raw *should* be 4).

If you are building Lua from the release page, would it be possible to
generate and provide the compiler listing, e.g., ./luac -l -l -p $FILE
(It may also be helpful to include both 5.5 and 5.4.8).

> I also maintain this stack on AIX, another big endian platform, and
> see no problem with 5.5.0.

While I presently do not have access to a z/OS box (or one modern
enough with the requisite LLVM PTFs for Open XL C/C++), two related
issues I have hit in the past:

1. Older XL C/C++ 2.4.1 compilers (xlc/xlclang) will incorrectly
optimize the table.unpack loop[2] even with the strict options set
(i.e., -qstrict + -qstrict_induction), leading to a segfault around one
of the function's edge cases[3].

2. Older Nvidia C/C++ compilers (e.g., nvc 22.11-0) struggled with
overlapping assignments in lcode.c (even with [4] applied). For example,
compiling your case with this buggy compiler will output:

5 [18] SELF 0 0 255
6 [18] LOADK 2 5 ; "Hello Lua World: "
7 [18] GETTABUP 3 0 6 ; _ENV "type"

Which is also wrong and has an identical RA_Raw for OP_GETTABUP to your
case.

[1]
https://gist.github.com/covener/4412c8be7762916335bb05e1d8171511#file-lua-txt-L36
[2]
https://github.com/lua/lua/blob/2a7cf4f319fc276f4554a8f6364e6b1ba4eb2ded/ltablib.c#L214-L216
[3]
https://github.com/lua/lua/blob/2a7cf4f319fc276f4554a8f6364e6b1ba4eb2ded/testes/sort.lua#L98
[4]
https://github.com/lua/lua/commit/d51022bf9e496ae4a7276b600d2755becc7d4323

Eric Covener

Feb 6, 2026, 4:14:11 PM
to lu...@googlegroups.com
On Fri, Feb 6, 2026 at 3:26 PM gottfried leibniz
<gottfried.le...@gmail.com> wrote:
>
> On 2/4/2026 7:10 AM, Eric Covener wrote:
> > Big endian is the obvious one but it shares that with AIX which is unaffected.
> > I have also seen warnings that "strict aliasing" is rather
> > strict/aggressive on zOS / xlc.
> >
>
> What z/OS compiler is being used here? In your debug output, one of the
> OP_GETTABUP instructions [1] looks different when compiling the included
> example on other machines (RA_Raw *should* be 4).

Thanks so much Gottfried!

We are on the old/traditional native xlc. Our HTTPD (+prereqs) port is
somewhat old (~2005), so we are not using any of the more modern
porting-friendly compilers or options.

> If you are building Lua from the release page, would it be possible to
> generate and provide the compiler listing, e.g., ./luac -l -l -p $FILE
> (It may also be helpful to include both 5.5 and 5.4.8).

I've uploaded the two listings to the gist
https://gist.github.com/covener/4412c8be7762916335bb05e1d8171511

Thanks again!

Roberto Ierusalimschy

Feb 7, 2026, 1:08:08 PM
to lu...@googlegroups.com
> 2. Older Nvidia C/C++ compilers (e.g., nvc 22.11-0) struggled with
> overlapping assignments in lcode.c (even with [4] applied).

There is another "overlapping assignment" in lcode.c. In function
'luaK_indexed', there is this assignment:

t->u.ind.t = cast_byte(t->u.var.ridx);

Although the field u.ind.t does not overlap with u.var.ridx, the field
u.ind (the enclosing structure) overlaps with u.var. Technically that
should be correct, but maybe it causes problems for a buggy compiler.
(Just in case, commit 8164d093 added an intermediate variable into that
assignment.)

Maybe there are other undetected overlapping assignments in lcode.c;
it uses the structure 'expdesc' a lot, and it does several assignments
among the fields of that structure.

-- Roberto
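
The distinction drawn above, where two fields do not overlap while their enclosing structures do, can be checked mechanically with `offsetof`. The types below are a hypothetical, simplified model built from the description in this message; the real 'expdesc' in lparser.h has more members and may be laid out differently.

```c
#include <stddef.h>

/* Hypothetical, simplified model of two overlapping union members.
   Not the real 'expdesc' definition. */
typedef struct DemoInd { short idx; unsigned char t; } DemoInd;
typedef struct DemoVar { unsigned char ridx; unsigned short vidx; } DemoVar;

typedef union DemoU {
    DemoInd ind;
    DemoVar var;
} DemoU;

/* Do byte ranges [off1, off1+len1) and [off2, off2+len2) intersect? */
static int demo_ranges_overlap(size_t off1, size_t len1,
                               size_t off2, size_t len2) {
    return off1 < off2 + len2 && off2 < off1 + len1;
}

/* Field level: u.ind.t versus u.var.ridx. */
static int demo_t_overlaps_ridx(void) {
    size_t t_off = offsetof(DemoU, ind) + offsetof(DemoInd, t);
    size_t ridx_off = offsetof(DemoU, var) + offsetof(DemoVar, ridx);
    return demo_ranges_overlap(t_off, sizeof(unsigned char),
                               ridx_off, sizeof(unsigned char));
}

/* Structure level: u.ind versus u.var. */
static int demo_ind_overlaps_var(void) {
    return demo_ranges_overlap(offsetof(DemoU, ind), sizeof(DemoInd),
                               offsetof(DemoU, var), sizeof(DemoVar));
}
```

In this model `ridx` sits at offset 0 and `t` sits after `idx`, so the field-level check reports no overlap, while the structure-level check always reports one because both members of a union start at the same address.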

Roberto Ierusalimschy

Feb 7, 2026, 1:25:07 PM
to lu...@googlegroups.com
> Maybe there are other undetected overlapping assignments in lcode.c;
> it uses the structure 'expdesc' a lot, and it does several assignments
> among the fields of that structure.

Another suspect is this one:

t->u.ind.t = cast_byte((t->k == VLOCAL) ? t->u.var.ridx: t->u.info);

In this case, the fields u.ind.t and t->u.info do indeed overlap. The
code does not assign one to the other---it assigns the result of a
conditional operator to the field u.ind.t---but it is easy to imagine
that a compiler can get this wrong. Whether the bug is in the code or
in the compiler seems a question for language lawyers. (Again, we can
avoid this code, just in case.)

-- Roberto
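
One way to "avoid this code" is to read the source field into a local variable before writing the destination, so that the overlapping read and write never share an expression. The sketch below shows the idea on a hypothetical, simplified structure; `DemoExp`, `DemoKind`, and every `demo_` function are invented for illustration and are not the actual change made in Lua.

```c
/* Hypothetical, simplified types modeled on the thread's description. */
typedef enum { DEMO_VLOCAL, DEMO_VOTHER } DemoKind;

typedef struct DemoExp {
    DemoKind k;
    union {
        int info;
        struct { short idx; unsigned char t; } ind;
        struct { unsigned char ridx; unsigned short vidx; } var;
    } u;
} DemoExp;

/* Read the source into a local first, then write the destination,
   instead of one assignment whose two sides share storage. */
static void demo_set_table(DemoExp *e) {
    int src;
    if (e->k == DEMO_VLOCAL)
        src = e->u.var.ridx;
    else
        src = e->u.info;
    e->u.ind.t = (unsigned char)src;
}

/* Helper: build an expression that describes a local in register ridx. */
static DemoExp demo_make_local(unsigned char ridx) {
    DemoExp e;
    e.k = DEMO_VLOCAL;
    e.u.info = 0;          /* clear the union storage first */
    e.u.var.ridx = ridx;
    return e;
}

/* Helper: build an expression whose value lives in u.info. */
static DemoExp demo_make_other(int info) {
    DemoExp e;
    e.k = DEMO_VOTHER;
    e.u.info = info;
    return e;
}

/* Helper: apply the conversion to a copy and return the resulting u.ind.t. */
static int demo_table_after(DemoExp e) {
    demo_set_table(&e);
    return e.u.ind.t;
}
```

The result should be the same as with the single-expression form on a conforming compiler; the rewrite only removes the pattern that a buggy optimizer might mishandle.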

gottfried leibniz

Feb 8, 2026, 8:02:43 PM
to lu...@googlegroups.com
On 2/6/2026 5:13 PM, Eric Covener wrote:
> We are on the old/traditional native xlc. Our HTTPD (+prereqs) port is
> somewhat old (~2005), so we are not using any of the more modern
> porting-friendly compilers or options.

Interesting.

I spent time testing 5.5.0 on older z/OS images w/ 'V1.10 XL C/C++',
'V2.4 XL C/C++', and 'C/C++ for Open Enterprise Languages on z/OS 2.0.0,
clang version 14.0.0' and could not replicate this issue (note: all
older z/OS compilers).

Given that your Lua compiler listing output resembles what happens when
using older nvidia C/C++ compilers[1], I would experiment with making
the temp variable in dischargevars[2] volatile to see if things improve.
Although, given that prior releases of Lua work for you, I would be
surprised if it did.

In nvc's case, everything in testes/ passes after applying that
universal band-aid/hack.

[1] For reference, compiling lcode.c (w/ HEAD at c6b484823) using nvc
22.11-0 will emit for [2]:

199f: ff 24 c5 00 00 00 00    jmp *0x0(,%rax,8)   // VLOCAL to 19AD
....:
19ad: 41 c7 06 08 00 00 00    movl $0x8,(%r14)    // VNONRELOC
19b4: e9 d9 02 00 00          jmp 1c92 <luaK_dischargevars+0x312>
....:
1c92: 48 83 c4 08             add $0x8,%rsp       // Function epilogue

Meanwhile, with nvc 26.1-0:

1a1f: ff 24 c5 00 00 00 00    jmp *0x0(,%rax,8)   // VLOCAL to 1A2D
....:
1a2d: 0f b6 43 08             movzbl 0x8(%rbx),%eax
1a31: 89 43 08                mov %eax,0x8(%rbx)
1a34: c7 03 08 00 00 00       movl $0x8,(%rbx)    // VNONRELOC
1a3a: e9 ce 02 00 00          jmp 1d0d <luaK_dischargevars+0x30d>
....:
1d0d: 48 83 c4 08             add $0x8,%rsp       // Function epilogue

AFAIK this was fixed in nvhpc-23-1.

[2]
https://github.com/lua/lua/blob/c6b484823806e08e1756b1a6066a3ace6f080fae/lcode.c#L829

Denis Dos Santos Silva

Feb 9, 2026, 6:20:14 PM
to lua-l
Hi all.

Eric Covener, can you check/show the Lua stack (elements, size) between versions?

Eric Covener

Feb 9, 2026, 9:38:23 PM
to lu...@googlegroups.com
On Mon, Feb 9, 2026 at 6:20 PM Denis Dos Santos Silva <de...@roo.com.br> wrote:
>
> Hi all.
>
> Eric Covener, can you check/show the Lua stack (elements, size) between versions?

Any pointer on where/how to do that?