table.push(t, ...) method?

204 views
Skip to first unread message

Rett Berg

unread,
Aug 19, 2025, 3:41:29 PMAug 19
to lu...@googlegroups.com
Can we add a table.push(t, ...) where it sets t[#t+1] to t[#t + #{...}] to the values in {...}?

Theoretically table.insert can be called for this, however:
* It has a rather odd API where the second value is an index or value depending on the number of arguments. It would be nice to have a clearer-named API for this.
* There is at least some performance cost of repeatedly querying the length of the table when you want to insert multiple elements.

The above API would perhaps behave surprisingly in cases where there were nil values in the table, but lots of APIs behave surprisingly in that case so I don't see it as a problem.

There are multiple times where I want to push 2 or three known values to a table, or want to create an API that accepts ... and pushes all the values (i.e. if you wanted to make a table implement file:write(...), which you then concat to get the result)

The amount of code would be near-zero. I'm planning on adding to my own libraries but wanted to know if perhaps we could add it to the standard table library.

- Rett

Spar

unread,
Aug 19, 2025, 3:49:06 PMAug 19
to lu...@googlegroups.com
Does this code solve your problem? Like table.insert_many?
 
```lua
local len = #tbl
tbl[len + 1] = val1
tbl[len + 2] = val2
tbl[len + 3] = val3
tbl[len + 4] = val4
```
--
You received this message because you are subscribed to the Google Groups "lua-l" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lua-l+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lua-l/CALW7ahxHmpTS71amv50a%3DwcA-_nL4TR%3DufQn42FBvM5f%3DMJNJA%40mail.gmail.com.
 

Sainan

unread,
Aug 19, 2025, 4:00:03 PMAug 19
to lu...@googlegroups.com
Given the use of ..., I'm assuming you mean that your table.push would accept a variable number of arguments, which has the inherent limitation that all arguments need to be on the stack, limiting you to ~250 arguments, maybe even less depending on the execution context.

I think it seems far more sane (and befitting of Lua) to simply force you to understand your own requirements and make you write your own loop that either iterates over a table or varargs within 3 lines of code.

-- Sainan

Augusto Goulart

unread,
Aug 19, 2025, 4:17:31 PMAug 19
to lu...@googlegroups.com
As I understood it, I believe Berg meant something akin of a merge method, instead of varags, you would have table.merge(t1, t2). Although if you don’t know how many items you need for t2, a loop is the correct way. Also, if you can concat strings with .. couldn’t you do the same with tables? I’m new here btw :)

Augusto

--
You received this message because you are subscribed to the Google Groups "lua-l" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lua-l+un...@googlegroups.com.

Wbertro

unread,
Aug 19, 2025, 4:33:41 PMAug 19
to lu...@googlegroups.com

Wouldn't table.move do exactly that right now?

--
You received this message because you are subscribed to the Google Groups "lua-l" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lua-l+un...@googlegroups.com.

Sainan

unread,
Aug 19, 2025, 4:35:10 PMAug 19
to lu...@googlegroups.com
Table concatenation sounds interesting, but only really makes sense for a very trivial array-like table. Otherwise defining sane semantics becomes a lot trickier.

-- Sainan

mjmouse9999

unread,
Aug 19, 2025, 6:44:34 PMAug 19
to lua-l
On Wednesday, August 20, 2025 at 5:41:29 AM UTC+10 Rett Berg wrote:
> Can we add a table.push(t, ...) where it sets t[#t+1] to t[#t + #{...}] to the
> values in {...}?

The following should work as a minimal implementation of what you want.

    -- this should be able to be manually inlined where used
    function table_push(t, ...) -- returns t
        return table.move({...}, 1, select('#', ...), #t + 1, t)
    end

    -- replace the first line with this for Teal
    function table_push<A>(t: {A}, ...: A): {A}


> Theoretically table.insert can be called for this, however:
> * It has a rather odd API where the second value is an index or value
>   depending on the number of arguments. It would be nice to have a
>   clearer-named API for this.
> * There is at least some performance cost of repeatedly querying the length of
>   the table when you want to insert multiple elements.

This should (only) be an additional log(n) for the binary search.

> The above API would perhaps behave surprisingly in cases where there were nil
> values in the table, but lots of APIs behave surprisingly in that case so I
> don't see it as a problem.

My function above just copies those nils into the resulting table, so calling
table_push({}, 'a', 'b', nil, 'c') is equivalent to {[1]='a', [2]='b', [4]='c'}
Having nils in the original `t` table will have the usual oddities with #t

Rett Berg

unread,
Aug 19, 2025, 8:56:13 PMAug 19
to lua-l
Yes, the table.move works, but requires allocating another table and moving the whole stack into it -- aka not performant.

Probably the fastest way to do it in pure lua with no memory allocation would be:

function push(t, ...)
  local len=#t
  for i=1,select('#', ...) do t[len+i] = select(i, ...) end
end

But I'm (1) a bit worried about the performance cost of all those select calls and (2) I KNOW it would be very performant in C.

- Rett

Wbertro

unread,
Aug 19, 2025, 10:18:55 PMAug 19
to lu...@googlegroups.com

It looked like you were referring to a 2nd table when you wrote "to the values in {...}"

- Robert

--
You received this message because you are subscribed to the Google Groups "lua-l" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lua-l+un...@googlegroups.com.

Philippe Verdy

unread,
Aug 21, 2025, 1:39:07 PMAug 21
to lu...@googlegroups.com
You can still use select by group of (say) 16 items; and divide the number of loops and of select calls by 16, you just then need the last loop with increment 1 to terminate the count. No need then to allocate a large array (with a variable size at each call), a single small array of 16 items can be reused in the first loop (and kept in a cache preserved across calls).This way you nearly reach what you could do by using a "native" C implementation (but with a binary extension and the need to load it from a potentially unsafe external source, which may not be possible in restricted environments where extensions are blacklisted); this would effectively run in "pure Lua" without external dependency.

Lars Müller

unread,
Aug 21, 2025, 4:28:50 PMAug 21
to lu...@googlegroups.com

Just write it yourself. I would suggest the following implementation:

function table.push(t, ...)
    for i = 1, select("#", ...) do
        local v = select(i, ...)
        table.insert(t, v)
    end
end


in a good Lua implementation (includes PUC and LuaJIT), this should run in amortized linear time in the number of items you push.
It is not linearithmic time (and even that would often be acceptable). This is because a length hint is cached, and each insertion only extends table length by 1;
when computing table length, both LuaJIT and PUC Lua look for a boundary in the vicinity of the hint (say, K) so they can handle this case in constant time AFAICT.

On LuaJIT, I would expect this implementation to be practically efficient. On PUC, you might want to replace the table.insert with t[#t + 1].
(You might also want to actually pack up the vararg in a table. For some reason, in my testing a while ago, this makes iterating varargs faster in PUC, whereas using select is much faster on JIT. Try it and see.)

I would not do the premature optimization of caching a length yourself and then adding a bunch of values based on that.
For one, it creates nasal demons with non-sequences unnecessarily. Just behaving like table.insert would ("filling random holes") is reasonable IMO.

Overwriting entries is not. (An error would also be reasonable, however.)

I have not tested this, but judging by the code this might very well turn out to be a pessimization in some cases:
If you add more values than K, you will trigger a new binary search - which uses the last length hint as a lower bound, but the "too large" array capacity as an upper bound.
I believe you might very well end up in linearithmic territory theoretically, missing the fast path in length.
(Again, measure to see what this practically translates to. It's very well possible that the multiple length operator applications end up more expensive.
Vary the number of elements you push in one go; for PUC you'll want to test numbers > 4, for LuaJIT > 2.)

To wrap it up, I would prefer such a feature not to be added to Lua for a multitude of reasons:
It is not sufficiently orthogonal to table.insert and table.move. If you want to add a single element, you use table.insert; if you want to append multiple elements from another table, you use table.move.
There are also nitty-gritty details to sort out. Do we prefer to take the elements to append as a table, or as a vararg; if the latter, what do we do with nils?
Do we allow overwriting values in a non-sequence, inconsistent with what table.insert would do? Do we perhaps throw an error when this happens?

Why should the language authors prescribe this when you can produce a simple - and in a well-optimized implementation like LuaJIT, hopefully sufficiently performant - implementation yourself?
(If you're embedding Lua, this includes the possibility of a C implementation. Though given LuaJIT, the assertion that a C implementation will be significantly faster than a Lua implementation need not be correct.)

To me this seems like it would violate Lua's goal of minimalism.

If you think that performance considerations necessitate this feature, I would recommend benchmarking to demonstrate this, preferably on both PUC and LuaJIT.

- Lars


p.s. "It has a rather odd API where the second value is an index or value depending on the number of arguments. It would be nice to have a clearer-named API for this."

I can second this. I think what happens when you do the seemingly harmless 'table.insert(t, ("1"):gsub("a", "b"))' is quite a bit of a Lua pitfall ;)

--
You received this message because you are subscribed to the Google Groups "lua-l" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lua-l+un...@googlegroups.com.

Roberto Ierusalimschy

unread,
Aug 21, 2025, 4:53:50 PMAug 21
to 'Lars Müller' via lua-l
> (You might also want to actually pack up the vararg in a table. For some
> reason, in my testing a while ago, this makes iterating varargs faster in
> PUC, whereas using select is much faster on JIT. Try it and see.)

select("#", ...) can be deceptive. In Lua, 'select' is a regular
function. So, in each call, *all* vararg arguments are passed along.
If there are N vararg arguments, each of these calls to 'select'
has to pass these N arguments. If you loop that N times, the total cost
becomes N^2.

-- Roberto

Renato Maia

unread,
Aug 21, 2025, 5:25:25 PMAug 21
to lu...@googlegroups.com
On Tue, Aug 19, 2025 at 9:56 PM Rett Berg <goog...@gmail.com> wrote:
> Yes, the table.move works, but requires allocating another table and moving the whole stack into it -- aka not performant.

Other option is to use `vararg` module (see
https://github.com/renatomaia/luavararg):

```lua
-- luarocks install --server=https://luarocks.org/manifests/maia vararg
local vararg = require "vararg"

function table.push(t, ...)
local len = #t
vararg.map(function (v)
len = len + 1
t[len] = v
end, ...)
end
```

Doing a quick micro-benchmark using luabench:

```sh
luarocks install --server=https://luarocks.org/manifests/maia vararg
luarocks install loop

git clone https://github.com/renatomaia/luabench.git
cd luabench/src

mkdir table_push_varargs
echo '
packvalues = [[
local vararg = require "vararg"
local generator = setmetatable({}, {__index = function (self, key)
return key end})
local unpackvalues = vararg.pack(table.unpack(generator, 1, count))
]]

return {
id = "push",
name = "Table Push Varargs",
repeats = 1e5,
nocollect = true,
variables = {
count = { min = 32, max = 128, name = "Elements inserted" },
},
test = [[
t = {}
push(t, unpackvalues())
]],
teardown = [[
assert(#t == count)
for i, v in ipairs(t) do
assert(i == v)
end
]],
cases = {
baseline = {
setup = packvalues..[[
local function push(t, ...) end
]],
teardown = "",
},
["table.move"] = {
setup = packvalues..[[
local function push(t, ...)
table.move({...}, 1, select("#", ...), #t+1, t)
end
]],
},
["for-select"] = {
setup = packvalues..[[
local function push(t, ...)
local len<const> = #t
for i = 1, select("#", ...) do
t[len + i] = select(i, ...)
end
end
]],
},
["vararg.map"] = {
setup = packvalues..[[
local function push(t, ...)
local len = #t
vararg.map(function (v)
len = len + 1
t[len] = v
end, ...)
end
]],
},
},
}
' > table_push_varargs/benchmark.lua

for ((i=0; i<30; i++)); do
lua run.lua table_push_varargs
done
lua plot.lua tabplot table_push_varargs/push.csv
less table_push_varargs/push.txt
```

I get the following results on my machine, but please don't take these
values too seriously. Always run your own benchmarks, and as close as
possible to your actual use case.

```
#
# Table Push Varargs
# Date: Thu Aug 21 18:12:19 2025
#
____________________________
Memory allocated (kilobytes)

Elements i | baseline | for-select | table.move
| vararg.map |
32 | 5493.8 (0.0e+00) | 55496 (0.0e+00) | 1.1096e+05
(0.0e+00) | 67996 (1.5e-11) |
64 | 5495.5 (0.0e+00) | 1.055e+05 (0.0e+00) | 2.1097e+05
(0.0e+00) | 1.18e+05 (4.4e-11) |
96 | 5496 (0.0e+00) | 2.055e+05 (0.0e+00) | 3.6097e+05
(0.0e+00) | 2.18e+05 (2.9e-11) |
128 | 5499 (0.0e+00) | 2.055e+05 (2.9e-11) | 4.1097e+05
(1.2e-10) | 2.18e+05 (5.8e-11) |
__________________________
GC CPU time used (seconds)

Elements i | baseline | for-select | table.move
| vararg.map |
32 | 0.0023266 (2.9e-04) | 0.0099682 (3.6e-04) | 0.034136
(7.7e-04) | 0.016761 (3.9e-04) |
64 | 0.0023032 (2.1e-04) | 0.011461 (2.4e-04) | 0.037912
(1.5e-03) | 0.027271 (6.4e-04) |
96 | 0.0022767 (1.5e-04) | 0.011919 (4.6e-04) | 0.038582
(7.5e-04) | 0.038949 (6.0e-04) |
128 | 0.0023312 (1.9e-04) | 0.01182 (3.1e-04) | 0.038703
(7.3e-04) | 0.038527 (5.0e-04) |
_______________________
CPU time used (seconds)

Elements i | baseline | for-select | table.move
| vararg.map |
32 | 0.019587 (1.0e-03) | 0.23275 (1.1e-02) | 0.14103
(8.3e-03) | 0.19419 (9.8e-03) |
64 | 0.031145 (1.6e-03) | 0.53878 (1.8e-02) | 0.22334
(9.7e-03) | 0.32668 (2.3e-02) |
96 | 0.041905 (4.7e-03) | 0.94453 (4.0e-02) | 0.33251
(1.9e-02) | 0.48665 (3.2e-02) |
128 | 0.053225 (5.8e-03) | 1.4093 (8.6e-02) | 0.39763
(2.0e-02) | 0.58945 (3.0e-02) |
```

--
Renato Maia

Spar

unread,
Aug 21, 2025, 5:29:24 PMAug 21
to lu...@googlegroups.com
We need C++ "template for" but for Lua varargs :-)
--
You received this message because you are subscribed to the Google Groups "lua-l" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lua-l+un...@googlegroups.com.

Sainan

unread,
Aug 21, 2025, 5:47:53 PMAug 21
to lu...@googlegroups.com
> If you loop that N times, the total cost becomes N^2.

I don't think this is correct, because the for bounds are only computed at the start of the loop:

function n(...)
print("n called")
return select("#", ...)
end
function iterva(...)
for i = 1, n(...) do
print(i)
end
end
iterva("a", "b", "c")
--> n called
--> 1
--> 2
--> 3

-- Sainan

Rett Berg

unread,
Aug 21, 2025, 5:48:57 PMAug 21
to lu...@googlegroups.com
select("#", ...) can be deceptive. In Lua, 'select' is a regular
function.  So, in each call, *all* vararg arguments are passed along.
If there are N vararg arguments, each of these calls to 'select'
has to pass these N arguments. If you loop that N times, the total cost
becomes N^2.
I'm VERY surprised by this, thank you for enlightening me.

I'm curious why this is the case? My (minimal) understanding of lua internals from the C API pov is that there is a "stack" that gets passed around between function calls -- why wouldn't `...` just "pass along the stack starting at index" of the current lua function? In that case it should bhe basically a pointer dereference.

varargs.map looks nifty

I guess I should have been more clear: the main reason I'm personally asking for this is because I dislike table.insert's API - I feel that insert shouldn't have been "overloaded" with pushing. I believed the performance win for multiple known values would help sell it though.

- Rett


You received this message because you are subscribed to a topic in the Google Groups "lua-l" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lua-l/lIJTvRYQ9xM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lua-l+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lua-l/0b37e9ad-eb86-4e36-b078-6cd388dc395d%40Spark.

Rett Berg

unread,
Aug 21, 2025, 5:50:36 PMAug 21
to lu...@googlegroups.com
the for bounds are only computed at the start of the loop:

I think Robert was referencing my code that called select(i, ...) within each loop.


--
You received this message because you are subscribed to a topic in the Google Groups "lua-l" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lua-l/lIJTvRYQ9xM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lua-l+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lua-l/MG7eqc8Rws59hfneyFVRAofv9NhdjJmX4JYANz7E0ML-Mt59TDd2lj3I4ggfqCU7c0hHX8Q8YhxIbGVWg8I-9BrUiRQ-y21GW0ZHm9vk2kY%3D%40calamity.inc.

mjmouse9999

unread,
Aug 21, 2025, 8:46:55 PMAug 21
to lua-l
On Friday, August 22, 2025 at 7:48:57AM UTC+10 Rett Berg wrote:
> > select("#", ...) can be deceptive. In Lua, 'select' is a regular
> > function.  So, in each call, *all* vararg arguments are passed along.
> > If there are N vararg arguments, each of these calls to 'select'
> > has to pass these N arguments. If you loop that N times, the total cost
> > becomes N^2.
> I'm VERY surprised by this, thank you for enlightening me.
>
> I'm curious why this is the case? My (minimal) understanding of lua internals
> from the C API pov is that there is a "stack" that gets passed around between
> function calls -- why wouldn't `...` just "pass along the stack starting at
> index" of the current lua function? In that case it should bhe basically a
> pointer dereference.

The trouble is that we need to have the function and the initial number below
that `...`, which necessitates a copy of all the items in the `...` to shift them
to the right place.

In Lua functions, functions with a ... in the parameter list already get all
their named arguments copied along the stack, since the named arguments get a
variable slot.

So then calling a function with a vararg in a Lua function requires:

1. Copy the function to a slot
2. Copy any given arguments to the next slots
3. Use the OP_VARARG instruction to copy the elements of the ... into place
4. Use OP_CALL with a sentinel value so that the arguments go to the top of the stack

Roberto Ierusalimschy

unread,
Aug 24, 2025, 10:37:12 AM (14 days ago) Aug 24
to 'Sainan' via lua-l
> > If you loop that N times, the total cost becomes N^2.
>
> I don't think this is correct, because the for bounds are only computed at the start of the loop:

Run the following script with an argument in the order of 1e6 and wait:

function add (...)
local sum = 0
for i = 1, select("#", ...) do
sum = sum + select(i, ...)
end
return sum
end

local a = {}
local lim = tonumber(arg[1])
for i = 1, lim do a[i] = 1 end
print(add(table.unpack(a)))

-- Roberto

Roberto Ierusalimschy

unread,
Aug 24, 2025, 10:50:03 AM (14 days ago) Aug 24
to lu...@googlegroups.com
> I'm curious why this is the case? My (minimal) understanding of lua
> internals from the C API pov is that there is a "stack" that gets passed
> around between function calls -- why wouldn't `...` just "pass along the
> stack starting at index" of the current lua function? In that case it
> should bhe basically a pointer dereference.

Compare foo(...) with foo(10,20,30,40). The code doing the call doesn't
know whether it will call a vararg function, and the function being
called doesn't know whether its argument is '...'. (One of the hidden
costs of a dynamic language.) So, we must have a uniform protocol
that works for all combinations. That protocol, in Lua, is to push
the function itself plus all its arguments to the stack before the call.

-- Roberto

Roberto Ierusalimschy

unread,
Aug 24, 2025, 11:15:54 AM (14 days ago) Aug 24
to 'Sainan' via lua-l
Sorry, 1e6 overflows the stack. The maximum is a little less than half
the stack limit. 490000 should do.

-- Roberto

Rett Berg

unread,
Aug 24, 2025, 11:21:10 AM (14 days ago) Aug 24
to lu...@googlegroups.com
I didn't realize the performance costs were quite so severe. I now feel even more strongly that there should be a table.push(t, ...) implemented in C stdlib, since it would be O(varargs) and would have a more straightforward API than table.insert.

- Rett



--
You received this message because you are subscribed to a topic in the Google Groups "lua-l" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lua-l/lIJTvRYQ9xM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lua-l+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lua-l/20250824151546.GB805571%40arraial.inf.puc-rio.br.

David Favro

unread,
Aug 24, 2025, 1:09:47 PM (14 days ago) Aug 24
to lu...@googlegroups.com, Rett Berg
On August 24, 2025 3:20:53 PM UTC, Rett Berg <goog...@gmail.com> wrote:
>I didn't realize the performance costs were quite so severe. I now feel
>
>even more strongly that there should be a table.push(t, ...)
>implemented
>in C stdlib, since it would be O(varargs) and would have a more
>straightforward API than table.insert.

Your initial request stated two use-cases: one is "two or three arguments", for which the performance costs [of `select()`] are not "so severe"; the other is a table-based implementation of `file:write()`. I find it a little difficult to imagine a non-contrived use-case of file:write() with much more than say 50 elements, which itself would be very rare, more typically 1-3 items. So, typically not a big performance problem.

Examine Roberto's example code: he had to call `table.unpack()` just so that he could deliberately manufacture the large vararg to then be iterated over using `select()`. In the real world, he could have just passed the table itself and directly iterated over its elements, or used `table.move()` to implement your `table.push()`.

Keep in mind also that the `select()` "penalty" exists only in Lua functions, not in C functions, which can directly iterate over the vararg. So your "table-based file:write()" could be implemented in C without this problem (which again, I don't think is really a problem -- and I say this as someone who has implemented this table-based file:write() in *Lua* using `select()`, and used it in production code for years without any issue). The C version would also have options other than tables for storing partial results, e.g. `luaL_buffer`.

Your `table.push()` is the same thing as `table.move()` but for varargs rather than tables (and less general). No question that varargs in Lua are very awkward, but that's a different and more general problem.

If this is really seen as a big problem (of which I'm skeptical), I think that a more general solution is not a specialized function for this one case, but rather to implement `select()` as a "special form" or "macro" or better yet an operator, probably with its own opcode, rather than an actual function-call. This would solve the entire class of problems, rather than one specific example.

I've been considering this idea [a non-function-call alternative to `select()`] for a long time [^1] but whenever I thought about doing the work[^2], I always realize that, while it *would* be a performance gain, it's not a huge issue. I do think a non-function-call `select()` would be a nice performance gain and I would love to see it, but in the real world it's not a super high priority to me because truly performance-critical functions can always be written in C, which will typically run far faster than the Lua version would, even with a more efficient `select()`.

-- David

[^1]: also for `rawget()`, but I think the case for it is even weaker.

[^2]: in my own fork of Lua.

Rett Berg

unread,
Aug 24, 2025, 2:25:03 PM (14 days ago) Aug 24
to David Favro, lu...@googlegroups.com
Hey David, these are fair and well made points. I agree an optimized select would be more welcome than a slightly more optimized push.

Vaughan McAlley

unread,
Aug 25, 2025, 9:36:03 AM (13 days ago) Aug 25
to lu...@googlegroups.com
You received this message because you are subscribed to the Google Groups "lua-l" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lua-l+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lua-l/CALW7ahxDokg99XnmLokR94_5nShz9Tg2fhrF5aWPh72vK%2BezbA%40mail.gmail.com.

Maybe something like this, creating a table for larger values:

function table.push(t, ...)
    local n = select("#", ...)
    local tn = #t
    if n > 3 then
        local t1 = {...}
        for i = 1, n do
            t[tn+i] = t1[i]
        end
    else
        for i = 1, n do
            t[tn+i] = select(i, ...)
        end
    end
end

-- Vaughan

Philippe Verdy

unread,
Aug 27, 2025, 12:11:14 PM (11 days ago) Aug 27
to lu...@googlegroups.com
Le jeu. 21 août 2025 à 22:28, 'Lars Müller' via lua-l <lu...@googlegroups.com> a écrit :

Just write it yourself. I would suggest the following implementation:

function table.push(t, ...)
    for i = 1, select("#", ...) do
        local v = select(i, ...)
        table.insert(t, v)
    end
end
The problem here is the slow loop by using table.insert() for each individual item.

Using a table constructor with varargs is much faster, but limited by the call stack.

My idea was just to use select() to return a bounded number N of item (e.g. 16 or 256) using a small reusable array to perform much faster table.move(), and iterate on groups of N items, and terminate with the remaining items if there are more than 0 (still less than N) which can also be performed in a single table.move() call.

You can experiment with the value to use for N, but for me N=16 or 256 should work in pure Lua without needing any native C implementation (which will not improve significantly the overall performance), so no need to change the Lua library just for that, you can already do that in existing all supported Lua versions (and without needing to support a local implementation fork, or the need to use a custom extension library, both are often not allowed in secure environments that support only pure Lua, plus a very restricted set of native extension libraries that have a good official support were and tested in many environments).

It's strange that the replies to this thread have ignored this suggestion which is easy to implement without falling into Lua internal limits.

Rett Berg

unread,
Aug 27, 2025, 12:24:09 PM (11 days ago) Aug 27
to lu...@googlegroups.com
> My idea was just to use select() to return a bounded number N of item (e.g. 16 or 256)

What API can do an upper bound on select? Even if such an API existed, it would still require copying the entire varargs stack for each call - which is the slow part (table.insert is NOT slow)

--
You received this message because you are subscribed to a topic in the Google Groups "lua-l" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lua-l/lIJTvRYQ9xM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lua-l+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lua-l/CAGa7JC22F6mXe7zAV5MCKcVuR6PNa8pcRHKXw4LZ6V%2B0JBDMwg%40mail.gmail.com.

Philippe Verdy

unread,
Aug 27, 2025, 12:35:52 PM (11 days ago) Aug 27
to lu...@googlegroups.com
No need of an extra API. You just need "select('#', ...)" once to return the size of the varargs. The cost of the "table.move" solution exposed above is limtied by the fact that it uses a large "{...}" table constructor. Which is clearly not needed. You can safely use a single static/reusable table constructor with a small size N (I initially suggested N=16 which is viable across all Lua versions). Then the mail loop is just iterating with table.move() by group of N items, which is very fast (faster than the linear loop item by item in Lua). Only 1 small table constructor (of size N) is needed, all the rest is performed very fast by table.move() operating on groups on N items instead of 1 by 1 with the naive implementation using "for i = 1, select("#", ...) do table.insert()".
My idea just drops all the many iterated calls to "table.insert()", it just iterates with "table.move()" with a sufficient size N. And you never reach any Lua limit and avoid the costly large table constructor.

You received this message because you are subscribed to the Google Groups "lua-l" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lua-l+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lua-l/CALW7ahzxQYZB8pp1qZdsAxsk44ehsOKmh-LJEw-%2BRsanycC1LQ%40mail.gmail.com.

Philippe Verdy

unread,
Aug 27, 2025, 2:02:40 PM (11 days ago) Aug 27
to lu...@googlegroups.com
Le ven. 22 août 2025 à 02:46, mjmouse9999 <mjmou...@gmail.com> a écrit :
On Friday, August 22, 2025 at 7:48:57AM UTC+10 Rett Berg wrote:
> > select("#", ...) can be deceptive. In Lua, 'select' is a regular
> > function.  So, in each call, *all* vararg arguments are passed along.
> > If there are N vararg arguments, each of these calls to 'select'
> > has to pass these N arguments. If you loop that N times, the total cost
> > becomes N^2.
> I'm VERY surprised by this, thank you for enlightening me.

That's the only case where there should exist a much faster (and safe) way to get the effective number of function parameters. So here I agree that select('#', ...) is deceptive. And this is a place for improvement in Lua, possibly by allowing this syntax "#..." in the parser, that would require a new bytecode to avoid the extra cost of pushing everything into a new call stack frame, or may be the compiler could autodetect "select('#', ...)" and use that bytecode specifically. Maybe we could restructure the call stack to allow part of frames to be shared across function calls without performing any costly allocations and copies.

This is to be done separately from the other needed optimization for table constructors (which should also avoid using costly stack frames). There is ample improvement to do there because this is a frequent use case where we need to load fast a lot of static data (without being forced to use some internal file I/O instead).

There are other ample independant improvements to allow tables to work efficiently not item by item but in a "vectorized" way: this requires changes in how tables are internally stored, For now there's only a naive implementation supporting a single sequence (with indexes from 1 to N inclusively) and then a costly hashmap for all the rest. Adding support for subsequences (with indexes starting at other positions than just 1) would be helpful to accelerate Lua a lot and start getting the performances we have in Java or C#. Already much has been done so that Lua is already more efficient than Python, but Lua has progressed significantly. A major effort on the internal implementation of tables would boost Lua to rocket speeds, as Lua tables are the major bottleneck with their  very naive implementation.

Adding internal support for subsequences in the internal representation of tables, and a better implementation of the internal hashmap (to allow efficient storage for tables with many "holes", even if indexes are just plain 32-bit or 64-bit integers, and not doubles, strings, or object references) would mean that implementations could finally be able to use accelerated hardware vector instructions (SSE, GPU...), and would also remove the internal stress on the garbage collector, and on the system-level memory management... even if these more complicated (platform-specific) implementations are not used in the core distribution.

We could dramatically be more efficient than Python (probably as fast as Java, and maybe for so far from Rust) without the huge complexity overhead of these complex languages that use lot of internal tricks, and are hard to learn and masterize for developers, and complicate to maintain on a mission-critical systems. Table implementations with better internal representation merit such effort (but this probably does not require lot of code as long as we don't go to platform-specific features, that independant implementations could experiment independantly, and that could possibly be integrated much later in core Lua, for major platforms like x64/AMD64 and ARM64, and possibly x86 for smaller integrated systems, the small 16-bit platforms for embedded systems could still use the existing portable core version).


Reply all
Reply to author
Forward
0 new messages