mmap() as an external string and zero-termination

Andrey Dobrovolsky

Nov 7, 2025, 5:57:39 PM
to lu...@googlegroups.com
Greetings,

The introduction of external strings in Lua opens the way to using mmap()
on large files. The reduction in memory footprint is dramatic for read("a")
cases; the performance gain depends on the usage scenario.
lua_pushexternalstring() requires a zero-terminated buffer and enforces
this with api_check(), but is this trailing zero byte really necessary for
the underlying luaS_newextlstr()?
The most attractive way to use mmap() is with PROT_READ and MAP_PRIVATE,
which neither allocates a buffer for the whole file nor copies it. If we
need to append an extra zero byte and therefore add PROT_WRITE, then a
buffer for the whole file is allocated and the entire file content is
copied into it. All the benefits are lost.
In my experiments with PROT_READ, lua_pushexternalstring() accepted the
buffers passed to it, probably because mmap() returned zero-filled
buffers.

So my question is: does lua_pushexternalstring() really require
strictly zero-terminated strings?
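For reference, the PROT_READ/MAP_PRIVATE mapping described above might look like the minimal sketch below. It assumes a POSIX system; map_file_ro is a hypothetical helper name, and empty files are rejected because POSIX mmap() fails for a zero length. Nothing in this sketch guarantees a NUL at p[len]: that byte is only readable, and zero, when the file size is not an exact multiple of the page size.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and private.
 * Returns the mapping (release with munmap(p, *len_out)) or NULL on error.
 * No buffer for the whole file is allocated or filled up front. */
static const char *map_file_ro(const char *path, size_t *len_out) {
    assert(len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    /* mmap() rejects a zero length, so empty files are treated as errors */
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = len;
    return p;
}
```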

Best regards,
-- Andrew

Sean Conner

Nov 7, 2025, 7:25:15 PM
to lu...@googlegroups.com
It was thus said that the Great Andrey Dobrovolsky once stated:
I've checked the man page for mmap() across a few POSIX systems [1] and
even the POSIX standard [2], and I think your concern is a bit overblown. The
man pages have always stated that the offset has to be a multiple of the
page size, and mappings are made in whole pages. POSIX states:

The system shall always zero-fill any partial page at the end of an
object.

This is also mentioned on the Mac OS-X man page. A random file has a
1/pagesize chance of being an exact multiple of the page size, so in the
common case the mapping will have a 0 byte at the end. In the rare case
of a file whose length is an exact multiple of the page size [3], one can
always use mmap() to anonymously map a page at the end of the file mapping.
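The page-multiple case can be handled as sketched below. This is one possible approach, assumed here rather than taken from the thread: reserve enough whole pages for len + 1 bytes with an anonymous mapping, then map the file over the start of that reservation with MAP_FIXED, so the extra zero-filled page (when one is needed) sits directly after the file data. map_file_nul and round_up are hypothetical helper names; MAP_ANONYMOUS is available on Linux and macOS but may need a feature-test macro, as shown.

```c
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Round n up to a multiple of page. */
static size_t round_up(size_t n, size_t page) {
    return (n + page - 1) / page * page;
}

/* Hypothetical helper: map `path` read-only and private so that p[len] == '\0'.
 * Release with munmap(p, *maplen_out), not with the file length. */
static const char *map_file_nul(const char *path, size_t *len_out,
                                size_t *maplen_out) {
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    size_t page = (size_t)ps;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;

    /* Whole pages for len + 1 bytes: equal to round_up(len) when len is not
     * a page multiple, one extra page when it is. */
    size_t maplen = round_up(len + 1, page);
    char *base = mmap(NULL, maplen, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    /* Overlay the file on the start of the reservation. The tail of a partial
     * last page is zero-filled (POSIX); an extra anonymous page is zero too. */
    if (mmap(base, len, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0) == MAP_FAILED) {
        munmap(base, maplen);
        close(fd);
        return NULL;
    }
    close(fd);
    *len_out = len;
    *maplen_out = maplen;
    return base;
}
```

The pair (p, len) is what one would hand to lua_pushexternalstring(); whatever eventually releases the string would need maplen rather than len for munmap().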

As for lua_pushexternalstring() requiring a terminating NUL byte, Lua
guarantees that any string will always have a NUL byte, thus easing C
interoperability (think of a non-NUL-terminated string being passed to functions like
printf() or strlen()).

-spc

[1] Linux, Mac OS-X

[2] https://pubs.opengroup.org/onlinepubs/9799919799/functions/mmap.html

[3] Obtainable via sysconf() (POSIX) or getpagesize()

Andrey Dobrovolsky

Nov 7, 2025, 10:28:10 PM
to lu...@googlegroups.com
Hi Sean,

On Sat, 8 Nov 2025 at 02:25, Sean Conner <se...@conman.org> wrote:
>
> POSIX states:
>
> The system shall always zero-fill any partial page at the end of an
> object.
>

That's good news for me, thanks a lot!

> As for lua_pushexternalstring() requiring a terminating NUL byte, Lua
> guarantees that any string will always have a NUL byte, thus easing C
> interoperability (think of a non-NUL-terminated string being passed to functions like
> printf() or strlen()).

Got it, thanks once again.

-- Andrew

Roberto Ierusalimschy

Nov 8, 2025, 9:39:24 AM
to lu...@googlegroups.com
> As for lua_pushexternalstring() requiring a terminating NUL byte, Lua
> guarantees that any string will always have a NUL byte, thus easing C
> interoperability (think of a non-NUL-terminated string being passed to functions like
> printf() or strlen()).

As a concrete example, if you use any string as a file name in 'io.open',
Lua will just pass the char* to 'fopen'.
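A small illustration of why the terminator matters, independent of Lua: a length-delimited view into a larger buffer is not what C APIs see. The struct view type and c_sees() are hypothetical names used only for this sketch; strlen() stands in for fopen() or printf("%s"), which likewise stop at the first NUL rather than at a stored length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative (pointer, length) view with no guarantee that ptr[len] == '\0'. */
struct view {
    const char *ptr;
    size_t len;
};

/* What a NUL-scanning C API "sees": it ignores v.len entirely and
 * reads until the first NUL byte. */
static size_t c_sees(struct view v) {
    return strlen(v.ptr);
}
```

With a buffer holding "data.txt.bak" and a view of its first 8 bytes, c_sees() returns 12: a C function would be handed "data.txt.bak" rather than "data.txt". Because every Lua string has a NUL right after its last byte, io.open can pass its char* to fopen() directly.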

-- Roberto