RFC: string.sub() address limits

Martin Eden

Sep 2, 2025, 12:13:20 PM
to lu...@googlegroups.com
Hello list,

```
https://lua.org/manual/5.4/manual.html#pdf-string.sub

string.sub (s, i [, j])

Returns the substring of s that starts at i and continues until j;
i and j can be negative. If j is absent, then it is assumed to be equal
to -1 (which is the same as the string length). In particular, the call
string.sub(s,1,j) returns a prefix of s with length j, and string.sub
(s, -i) (for a positive i) returns a suffix of s with length i.
```

In simple terms, string.sub() takes two positions and uses
1-based indices (with the negative-index featurism from Icon).

Suppose an 8-bit microprocessor that has 0x100 bytes of memory and
can do arithmetic and address up to 0xFF.

I can reach all of that memory using machine code, but not in Lua,
where the maximum possible index would be 0x7F.

I suppose that on today's 64-bit processors, a string that Lua can
address can be up to (2^63 - 1) bytes long. Am I correct?
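
A minimal sketch of the arithmetic behind these figures, assuming the index is held in a signed n-bit integer so that one bit is spent on the sign. The helper names max_unsigned and max_signed are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Largest value of an unsigned integer with the given width (0..64 bits). */
static uint64_t max_unsigned(unsigned bits) {
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Largest positive value of a signed integer with the given width
   (1..64 bits): one bit holds the sign, the rest hold the magnitude. */
static uint64_t max_signed(unsigned bits) {
    return max_unsigned(bits - 1);
}
```

For 8 bits this gives 0xFF unsigned against 0x7F signed, and for 64 bits the signed maximum is 2^63 - 1.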

-- Martin

Sainan

Sep 2, 2025, 12:22:48 PM
to lu...@googlegroups.com
I'm not sure how you lost the MSB, but from what I can tell, 254 should be the limit for an 8-bit size_t/lua_Integer. 2^n-2 instead of 2^n-1 due to Lua counting from 1.

-- Sainan

Andrey Dobrovolsky

Sep 2, 2025, 2:47:39 PM
to lu...@googlegroups.com
Hi Martin Eden,

> Suppose 8-bit microprocessor that have 0x100 memory and can do arithmetic and address up to 0xFF

Unlike with modern CPUs, in the 8-bit era the bitness referred to the
ALU, not to the RAM bus width. So if you mean an 8-bit microprocessor
(not a microcontroller), it usually has a 16-bit address bus. With an
8-bit ALU you can process multibyte numbers at a cost in time. So I
guess that Lua for your 8-bit microprocessor should have at least
32-bit numbers. Creating the Lua state in 0x100 bytes may turn out to
be a little bit complicated )) I never thought of running Lua on an
8-bit CPU; I guess 64K would not be enough for Lua, considering that
you will need soft floating-point routines.

If you have the 8088 in mind, it was a 16-bit CPU with an 8-bit-wide data bus.

Sainan wrote:

> I'm not sure how you lost the MSB

The MSB stores the sign for signed integers.

Regards,
Andrew

> --
> You received this message because you are subscribed to the Google Groups "lua-l" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to lua-l+un...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/lua-l/BkssfMuTgjDa3hdBPwVd59cpvc11VVZFsD3Qlm4JyOMRXtudM9L0zWESM9PDilM9XZDGsVBNXwzXstiBhnqPgKLprTaGKL-6MZU4eU0juh4%3D%40calamity.inc.

Sainan

Sep 2, 2025, 11:00:15 PM
to lu...@googlegroups.com
> MSB stores sign for signed integers

Yeah, but addresses aren't signed.

-- Sainan

Andrey Dobrovolsky

Sep 3, 2025, 3:54:11 AM
to lu...@googlegroups.com
Sainan wrote:

> Yeah, but addresses aren't signed.

Sure, but Martin Eden was talking about indices:

> I suppose for today's 64-bit processors, string which Lua can
> address can be up to (2^63 - 1) bytes. Am I correct?

In that sentence, the word "address" is a verb, not a noun. So he is correct.

-- Andrew

Sainan

Sep 3, 2025, 4:07:25 AM
to lu...@googlegroups.com
I get what you mean, but I don't think it prevents access to those higher addresses. Tho I guess it does complicate certain operations.

-- Sainan

Andrey Dobrovolsky

Sep 3, 2025, 5:27:48 AM
to lu...@googlegroups.com
Sainan wrote:

> I get what you mean, but I don't think it prevents access to those higher addresses.

Haha, testing showed that you are right! But those higher addresses
are probably not accessed in the desired manner.
If the string length exceeds the maximum lua_Integer value (INT32_MAX
in this test), then string.sub returns the whole string from the start
position to its end, ignoring the third argument.
Testing was performed on these functions, isolated from lstrlib.c:

typedef int32_t lua_Integer;

static size_t posrelatI (lua_Integer pos, size_t len);
static size_t getendpos (/*lua_State *L, int arg, lua_Integer def,*/
                         lua_Integer pos, size_t len);
static int str_sub (/*lua_State *L*/ size_t l, lua_Integer from, lua_Integer to);
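
The effect on the third argument can be reproduced with a stand-alone model of the end-position logic. The sketch below is adapted from memory of getendpos in Lua 5.4's lstrlib.c with the Lua state stripped out, so it is an approximation rather than the library's exact code. It also assumes a compiler such as GCC or Clang, where converting an out-of-range length to int32_t wraps around. The sketch_ names are hypothetical.

```c
#include <stdint.h>

typedef int32_t sketch_Integer;   /* stands in for a 32-bit lua_Integer */

/* Model of the end-position computation: clip 'pos' into [0, len].
   'len' is 64-bit so that lengths above INT32_MAX can be represented. */
static uint64_t sketch_getendpos(sketch_Integer pos, uint64_t len) {
    /* Assumption: this conversion wraps, so a length above INT32_MAX
       turns into a negative number. */
    sketch_Integer slen = (sketch_Integer)len;
    if (pos > slen)
        return len;                        /* past the end: clip to len */
    else if (pos >= 0)
        return (uint64_t)pos;              /* ordinary 1-based position */
    else if ((int64_t)pos < -(int64_t)slen)
        return 0;                          /* before the start: empty */
    else
        return len + (uint64_t)pos + 1;    /* negative: count from the end */
}
```

For a 100-byte string, an end position of 10 stays 10. For a 3000000000-byte string the wrapped length is negative, so every positive end position compares as "past the end" and the result is the full length.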

For 64-bit integers I have absolutely no desire to perform full
testing. But for 32-bit Lua integers such testing looks possible.
Will Lua prevent creating a string whose size is greater than the
maximum lua_Integer?

Sainan

Sep 3, 2025, 6:12:55 AM
to lu...@googlegroups.com
> But for 32-bit Lua integers such testing looks possible. Will Lua prevent creating the string with the size greater than lua_Integer?

Just built Lua in 32-bit mode and it appears so:

> str = ("a"):rep(0x7fffffff)
not enough memory
> str = ("a"):rep(0x6fffffff)
> #str
1879048191

Also worth noting that 32-bit builds typically still have 64-bit lua_Integer:

> 0x1000000000
68719476736

-- Sainan

Luiz Henrique de Figueiredo

Sep 3, 2025, 6:34:40 AM
to lu...@googlegroups.com
> Also worth noting that 32-bit builds typically still have 64-bit lua_Integer:

Did you set LUA_32BITS?
luaconf.h:#define LUA_32BITS 1

Andrey Dobrovolsky

Sep 3, 2025, 6:34:46 AM
to lu...@googlegroups.com
Although lstrlib.c is equipped with MAXSIZE to prevent the creation of
overly large strings, I was able to create a 3G file and read it as
one string with io.read().

-- Andrew

Andrey Dobrovolsky

Sep 3, 2025, 7:01:13 AM
to lu...@googlegroups.com
Everything looks fine. I've built Lua on a 64-bit box with
luaconf.h: #define LUA_INT_DEFAULT LUA_INT_INT
and Lua refused to read the 3G file, with an error message about the
memory block being too big. So Lua prevents creating strings of a size
that cannot be addressed correctly.

Regards,
Andrew

bil til

Sep 4, 2025, 8:13:46 AM
to lu...@googlegroups.com
On Tue, Sep 2, 2025 at 18:13, 'Martin Eden' via lua-l
<lu...@googlegroups.com> wrote:
>
> I suppose for today's 64-bit processors, string which Lua can
> address can be up to (2^63 - 1) bytes. Am I correct?
>

This sounds REALLY huge :)

... do you have a rough estimate of how large 2^63 bytes of memory would be?

(2^10 is about 10^3, so 2^60 is about 10^18 and 2^63 is about 9 x 10^18,
which is roughly 9000 PB or 9 million TB :) ... a computer with 9000 PB
of free RAM for your Lua system would be REALLY big :) )
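
The order of magnitude can be checked with exact integer arithmetic: 2^60 is the power of two closest to 10^18, and 2^63 is eight times that. A minimal sketch, assuming decimal petabytes (10^15 bytes); the helper name pow2_bytes_in_pb is hypothetical.

```c
#include <stdint.h>

/* Whole decimal petabytes (10^15 bytes) contained in 2^exp bytes.
   Valid for exp < 64, so the shift stays within 64 bits. */
static uint64_t pow2_bytes_in_pb(unsigned exp) {
    const uint64_t petabyte = UINT64_C(1000000000000000);
    return (UINT64_C(1) << exp) / petabyte;
}
```

Here pow2_bytes_in_pb(60) yields 1152 and pow2_bytes_in_pb(63) yields 9223, that is, a little over 9 million TB.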

Martin Eden

Sep 4, 2025, 1:01:20 PM
to lu...@googlegroups.com
The point is that half of the memory is inaccessible because of the function interface.

But this is a common disease of C.

-- Martin


bil til

Sep 5, 2025, 12:09:40 AM
to lu...@googlegroups.com
On Thu, Sep 4, 2025 at 19:01, 'Martin Eden' via lua-l
<lu...@googlegroups.com> wrote:
>
>
> But this is common disease for C.
>
> -- Martin
>

Are you sure about "for C"? From microcontroller programming I know
that C is a very compact and "close-to-assembler" language... .

The "RAM-wasting problem" of typical computers comes from the "von
Neumann architecture", where the program code resides in RAM. Thanks to
"Moore's law" of "cheap RAM doubling every year", this has worked
impressively well for the last 50 years. However, Gordon Moore died in
2023, his law currently seems to be hitting physical limits, and his
company Intel is struggling; still, the "von Neumann architecture"
dominates large computer systems.

On the market for one-chip computers / microcontrollers, the typical
architecture is the Harvard architecture, with program code residing in
ROM; in "RAM-efficient" code, RAM is used exclusively for data... .

Fortunately, in Lua this "keeping bin code in ROM" should now also be
possible since V5.5, as all code in the bin files should be 32-bit
aligned (this is the requirement for fast int handling of ROM data in
the very common Arm Cortex-M architecture, e.g. STM32). I am
looking forward to testing this soon, but I am currently still busy on
other "construction sites".

... so I think the "problem" is not C, but the von Neumann
architecture and the "typical assumption" of OS architectures that
RAM is abundantly available in "any sensible size" (which I think is a
very valid assumption for most "large computer" applications, just NOT
for 1-chip microcontrollers commonly used for IoT applications or
similar things).

Andrey Dobrovolsky

Sep 5, 2025, 11:12:10 AM
to lu...@googlegroups.com
I'm curious why string.rep() and string.pack() apply additional limits
to the size of their output strings. The limit is the MAXSIZE constant
defined in lstrlib.c. Running on a 64-bit machine, Lua is able to
process strings of any size that is addressable with 64-bit integers
and that fits in practice, depending on the RAM and swap sizes. But
MAXSIZE is 2G, and while a string longer than 2G can be created by
reading a file or with the help of table.concat(), string.rep() and
string.pack() are under additional restrictions. Why?
This topic was probably discussed earlier or described in some
article; I would be grateful for links.
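
A minimal sketch of the kind of guard in question. It assumes that MAXSIZE is the smaller of INT_MAX and the largest size_t, and that string.rep rejects a result when l + lsep exceeds MAXSIZE / n. Both points are recalled from the Lua 5.4 sources and should be checked against lstrlib.c. The names SKETCH_MAXSIZE and rep_fits are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Assumed to mirror MAXSIZE: sizes are limited to fit in 'int',
   but never beyond what 'size_t' can hold. */
#define SKETCH_MAXSIZE \
    (sizeof(size_t) < sizeof(int) ? (size_t)-1 : (size_t)INT_MAX)

/* Hypothetical helper: returns 1 if repeating a string of length l
   n times, with a separator of length lsep, stays within the limit. */
static int rep_fits(size_t l, size_t lsep, size_t n) {
    if (n == 0)
        return 1;                 /* an empty result always fits */
    if (l + lsep < l)
        return 0;                 /* l + lsep overflowed size_t */
    return l + lsep <= SKETCH_MAXSIZE / n;
}
```

Under these assumptions a one-byte string repeated INT_MAX times passes the check, so any failure at that size comes from the allocator rather than from MAXSIZE, while one more repetition, or a two-byte string, is rejected up front.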

Regards!
Andrew
