LAP: Lua Asynchronous Protocol (yielding/ready protocol)

Rett Berg

Apr 24, 2024, 4:37:40 PM

to lu...@googlegroups.com

Hey Lua users,

First of all, let me say thank you for the awesome language. I really don't think it would be possible to better design Lua's approach to async. Without the simplicity it would be impossible for me to make such a simple protocol!

I've released a new library and protocol that I've been working on for a month or so, which I call LAP[1]. I'd love feedback!

The "protocol" intends to be zero-dependency and trivial for library authors to integrate. They simply need to write the blocking parts of their APIs to `yield` specific values when they would otherwise block. My hope is that multiple library authors can implement the protocol and Lua users can write code in a "blocking style" that can also be run by a coroutine executor (i.e. concurrently).

Best,

Rett

[1]: https://github.com/civboot/civlua/blob/main/lib/lap/README.md

Sean Conner

Apr 24, 2024, 6:23:16 PM

to lu...@googlegroups.com

It was thus said that the Great Rett Berg once stated:
> Hey Lua users,

Hello!

> First of all, let me say thank you for the awesome language. I really don't
> think it would be possible to better design Lua's approach to async.
> Without the simplicity it would be impossible for me to make such a simple
> protocol!
>
> I've released a new library and protocol I've been working on for a month
> or so I call LAP
> <https://github.com/civboot/civlua/blob/main/lib/lap/README.md>[1]. I'd
> love feedback!
>
> The "protocol" intends to be zero-dependency and trivial for library
> authors to integrate.

What do you mean by "zero-dependency"? Because I see it depends upon your
custom package manager 'pkg' to run, along with a few other dependencies. I
can't tell easily, but it appears that 'pkg' will download the modules at
run time if they aren't cached, which is something I do not like at all.
That aside ...

> They simply need to write the blocking parts of their
> APIs to `yield` specific values when they would otherwise block.

But why this:

> > yield(nil) or yield(false): forget the coroutine, the executor will not
> > run it.

For my own coroutine scheduler [2], I use yield(false/nil) to return
errors/timeouts to scheduled coroutines. A typical usecase for me would be:

    -- send our request

    socket:send(data)

    -- set up a timeout in the future. If not cancelled,
    -- in five seconds, we receive nil and an error code of
    -- TIMEDOUT. This of course means that if after five seconds,
    -- a reply comes in to our request, we have to ignore it

    nfl.timeout(5)

    -- yield our thread to await a reply. This is automagically
    -- handled by the framework and will return the packet of
    -- data when it's resumed.

    local info,err = coroutine.yield()

    -- cancel any pending timeout. If we haven't timedout, then
    -- the timeout return is cancelled. If we have timedout,
    -- this is a NOP

    nfl.timeout(0)

    if not info then
      -- we timed out ...
    end

Another issue I see (which is one I had to deal with) is if a coroutine
dies during execution. I see you don't check the status of the coroutine
before running it---coroutine.status() should return 'suspended' if you can
run it, otherwise there's some issue (if it's dead, then somehow a dead
coroutine was scheduled to run). And after calling coroutine.resume(), I
check to see if the coroutine status is dead---that means it's done running
and can be removed from the system.
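
The status checks described above can be sketched as follows. This is only a minimal illustration using the standard coroutine library; `step` is a hypothetical helper name and is not taken from nfl.lua or from LAP.

```lua
-- Hypothetical helper (not from nfl.lua or lap): resume one scheduled
-- coroutine defensively and report whether it should stay scheduled.
local function step(co, ...)
  if coroutine.status(co) ~= 'suspended' then
    -- a dead (or already running) coroutine was scheduled; refuse to resume it
    return false, 'cannot resume ' .. coroutine.status(co) .. ' coroutine'
  end
  local ok, err = coroutine.resume(co, ...)
  if not ok then return false, err end   -- the coroutine body raised an error
  if coroutine.status(co) == 'dead' then
    return false                         -- finished normally; drop it
  end
  return true                            -- it yielded; keep it scheduled
end

local co = coroutine.create(function() coroutine.yield() end)
local first  = step(co)   -- true: the coroutine yielded
local second = step(co)   -- false: the coroutine ran to completion
local third, why = step(co) -- false, 'cannot resume dead coroutine'
```

The two extra `coroutine.status` calls per resume are the cost being weighed against silently resuming a dead coroutine.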

> My hope is
> that multiple library authors can implement the protocol and Lua users can
> write code in a "blocking style" that can also be run by a coroutine
> executor (i.e. concurrently).

I'm not sure why you think this is necessary.

-spc

> [1]: https://github.com/civboot/civlua/blob/main/lib/lap/README.md

[2] https://github.com/spc476/lua-conmanorg/blob/master/lua/nfl.lua

Rett Berg

Apr 25, 2024, 6:16:16 PM

to lua-l

Hey Sean, thanks for the great questions!

> What do you mean by "zero-dependency"?

The LAP protocol itself has zero dependencies. You simply yield the values specified in the protocol, such as yield("sleep", 0.5) or simply yield(true). You can schedule coroutines/etc by simply interacting with the globals (e.g. LAP_READY[coroutine.create(function() ... end)] = "debug name").
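
To make the yielded values concrete, here is a small sketch of what an executor would observe when a coroutine yields the values mentioned in this thread. It uses only the standard coroutine library. The LAP_READY table is created locally so the snippet runs standalone (in a real program the executor would be expected to own it), and the reading of yield(true) as "resume me as soon as possible" is an assumption drawn from this thread rather than a quote from the README.

```lua
-- Assumption: LAP_READY is a global table mapping coroutine -> debug name.
LAP_READY = LAP_READY or {}

local co = coroutine.create(function()
  coroutine.yield('sleep', 0.5) -- ask to be resumed in about half a second
  coroutine.yield(true)         -- ask to be resumed as soon as possible (assumed meaning)
  coroutine.yield(false)        -- ask to be forgotten until someone reschedules us
end)
LAP_READY[co] = 'demo'          -- schedule it by touching the global

-- Resume by hand and record what an executor would see on each step.
local requests = {}
for _ = 1, 3 do
  local ok, kind, detail = coroutine.resume(co)
  assert(ok, kind)
  requests[#requests + 1] = tostring(kind) .. (detail and (' ' .. detail) or '')
end
print(table.concat(requests, ', ')) -- sleep 0.5, true, false
```

Nothing here depends on a library: the coroutine only calls coroutine.yield, which is the sense in which the protocol has no dependencies.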

> it appears that 'pkg' will download the modules at
> run time if they aren't cached, which is something I do not like at all.

It absolutely will not do this -- quite the opposite. pkg is a library which enables local development of Lua modules while falling back on require when necessary.

Second of all, the lap module isn't required to use the LAP protocol -- it just has some helpful data structures and also an example implementation of an executor (the Lap object).

> But why this:
> > yield(nil) or yield(false): forget the coroutine, the executor will not
> > run it.
> For my own coroutine scheduler [2], I use yield(false/nil) to return
> errors/timeouts to scheduled coroutines. A typical usecase for me would be:

Because LAP uses `error` to signal errors. There are many cases where you want the coroutine forgotten so that another coroutine can schedule it again later with LAP_READY (see Send/Recv for one example).

> Another issue I see (which is one I had to deal with) is if a coroutine
> dies during execution.

I should have made it more explicit, but with LAP it's invalid for the scheduled coroutines to return anything. As you said, it will cause a future error. Doing everything you said is a performance cost for little gain IMO :D

> I'm not sure why you think this is necessary.

Because I'd like to be able to write Lua applications that operate concurrently but the code itself is written in a blocking style. That's only possible if they all speak the same protocol.

Best,

Rett

Sean Conner

Apr 25, 2024, 8:05:13 PM

to lu...@googlegroups.com

It was thus said that the Great Rett Berg once stated:

> > I'm not sure why you think [explicitly selecting async/sync] is necessary.
>
> Because I'd like to be able to write Lua applications that operate
> concurrently but the code itself is written in a blocking style. That's
> only possible if they all speak the same protocol.

I handled that differently. Take this function to obtain a file via
gopher:

    function get_gopher(host,port,selector)
      local input = {}
      local conn  = tcp.connect(host,port)
      if conn then
        conn:write(selector,"\r\n")
        for line in conn:lines() do
          table.insert(input,line)
        end
        conn:close()
      end
      return input
    end

If I want for this to run with blocking I/O (synchronously), then

local tcp = require "org.conman.net.tcp" -- [1]

gives me the TCP module with blocking behavior. If I want this to run
asynchronously (say, I want to handle multiple connections, one per
coroutine), then:

local tcp = require "org.conman.nfl.tcp" -- [2]

Gives me that---no other change in the above function is required. The
two modules present the same API, but handle the connections very
differently beneath the scenes. Also, both use a module I wrote to present
an object [3] that has the same API as a Lua file object [4] where the only
two routines that need to be defined for it to work are obj:__refill() (to
read data) and obj:__drain() (to write data).

Obviously, how the synchronous and asynchronous functions are started is
different (synchronous can be called directly; asynchronous needs to be
spawned in a coroutine), but in my experience, it's rare for me to want to
switch a program between the two modes.

-spc

[1] https://github.com/spc476/lua-conmanorg/blob/master/lua/net/tcp.lua

[2] https://github.com/spc476/lua-conmanorg/blob/master/lua/nfl/tcp.lua

[3] https://github.com/spc476/lua-conmanorg/blob/master/lua/net/ios.lua

This only relies upon the string, table and math libraries from Lua.

[4] https://www.lua.org/manual/5.4/manual.html#pdf-file:close

Rett Berg

Apr 25, 2024, 8:17:43 PM

to lu...@googlegroups.com

> no other change in the above function is required. The
> two modules present the same API, but handle the connections very
> differently beneath the scenes.

We have the same goals! Nice to see another person who would like to write in a blocking style but execute asynchronously.

The whole purpose of LAP_FNS_ASYNC (and *_SYNC) is so that the application (or test suite) can easily switch the Lua globals between sync/async. The biggest offender here is the io module (technically also print but I ain't touching that).

To switch your application to async you just need to do

    LAP_ASYNC = true
    for _, fn in ipairs(LAP_FNS_ASYNC) do fn() end
    require'something'.ioAsync() -- to override default "io" module

Note: the lap module (not protocol) provides lap.async() -- but all it does is exactly what is above.

Modules that support LAP will register their switching functions with the respective LAP_FNS list (LAP_FNS_ASYNC or the matching *_SYNC one).
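
As a rough sketch of what such a registration could look like from the module side: the module `M`, its `read` function and the `src` objects below are all hypothetical, and the snippet assumes the LAP_FNS lists are plain global arrays of functions (which is how the ipairs loop above treats LAP_FNS_ASYNC) and that the sync list is named LAP_FNS_SYNC. The globals are created locally only so the snippet runs standalone.

```lua
-- Assumption: both lists are plain global arrays of zero-argument functions.
LAP_FNS_ASYNC = LAP_FNS_ASYNC or {}
LAP_FNS_SYNC  = LAP_FNS_SYNC  or {}

local M = {} -- hypothetical module that supports both modes

-- blocking flavour: just return the data
local function readSync(src) return src.data end

-- yielding flavour: give control back until the source reports ready
local function readAsync(src)
  while not src.ready do coroutine.yield(true) end
  return src.data
end

M.read = readSync -- default to blocking behaviour
table.insert(LAP_FNS_SYNC,  function() M.read = readSync  end)
table.insert(LAP_FNS_ASYNC, function() M.read = readAsync end)

-- Application side: flip every registered module to async in one go.
LAP_ASYNC = true
for _, fn in ipairs(LAP_FNS_ASYNC) do fn() end
```

Code that calls M.read(src) does not change; only the application decides which flavour is installed.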

The difference between our libraries (at first glance) is that your implementation is closely tied to a specific library (i.e. nfl.SOCKETS) whereas mine is a more generic protocol -- it is "bring your own executor". You could make your library LAP compliant by, instead of doing something like

    nfl.SOCKETS:update(self.__socket,'w')
    coroutine.yield()

do something like

    while(self:someCheck()) do coroutine.yield('poll', self.fileno, POLLOUT) end

Note that the someCheck() is for executors that don't know how to handle poll.

Best,

Rett

--
You received this message because you are subscribed to a topic in the Google Groups "lua-l" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lua-l/QWXul3NUY1M/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lua-l+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lua-l/20240426000511.GG21073%40brevard.conman.org.

Sean Conner

Apr 25, 2024, 9:11:32 PM

to lu...@googlegroups.com

It was thus said that the Great Rett Berg once stated:

> > no other change in the above function is required. The
> > two modules present the same API, but handle the connections very
> > differently beneath the scenes.
>
> We have the same goals! Nice to see another person who would like to write
> in a blocking style but execute asynchronously.
>
> The whole purpose of LAP_FNS_ASYNC (and *_SYNC) are so that the
> *application* (or test suite) can easily switch the lua globals between
> sync/async. The biggest offender here is the *io* module (technically also
> print but I ain't touching that).

Well, print() isn't defined by liblua, but instead is added by the Lua
executable itself, so it's not of any real concern to me.

> To switch your application to async you just need to do
>
> LAP_ASYNC = true
> for _, fn in ipairs(LAP_FNS_ASYNC) do fn() end
> require'something'.ioAsync() -- to override default "io" module

It seems that this is monkeypatching various Lua functions to yield at
appropriate spots (mainly I/O).

> Modules that support LAP will register their switching function to the
> respective LAP_FNS

I think I need to see a real example here, not the test code. I'm having
a hard time following how this is used.

> The difference between our libraries (at first glance) is your
> implementation is closely tied to a specific library (i.e. nfl.SOCKETS)
> whereas mine is a more generic protocol -- it is "bring your own executor".
> You could make your library LAP compliant by instead of doing something like

In a sense, yes, it does depend upon nfl.SOCKETS, which is an object that
implements epoll() (Linux), kqueue() (Mac OS-X), poll() (Solaris) or
select() (if all else fails) with a consistent API. That's because on most
Unix systems these days (especially Linux), the various select() functions
don't work with files, as they're aggressively cached in memory! select()
(et al.---I'm using "select()" as the catch-all function for polling for
I/O readiness on POSIX) really only work with sockets and TTYs, which is why
my implementation is geared around it.

> nfl.SOCKETS:update(self.__socket,'w')
> coroutine.yield()

This particular sequence exists when writing data to a socket/TTY. I
change the select() trigger to 'write ready', then yield, letting the main
event loop handle the resulting triggers for I/O. Once the data is written,
the socket/TTY trigger is flipped back to 'read ready'.

> do something like
>
> while(self:someCheck()) do coroutine.yield('poll', self.fileno, POLLOUT)
> end
>
> Note that the someCheck() is for executors that don't know how to handle
> poll.

I don't see the above as the same as what I wrote. But then when I write
code with coroutines, it's mostly to handle network I/O, usually in a server
context. I would need to see an actual implementation of an "executor" that
doesn't know how to handle poll to fully understand your implementation.

-spc

Rett Berg

Apr 25, 2024, 9:54:26 PM

to lu...@googlegroups.com

> It seems that this is monkeypatching various Lua functions to yield at
> appropriate spots (mainly I/O).

Yup, but it's only the application (or tests) that actually call the monkey patch -- effectively switching the "mode" of the entire language with very little code difference.

> I think I need to see a real example here, not the test code. I'm having
> a hard time following how this is used.

TL;DR: I enjoy writing small (and eventually well-tested) libraries which make otherwise complex problems easier to implement. I want these libraries to work in a multitude of use cases, especially both blocking and non-blocking.

Let's say you wanted to write a text editor or similar. In your editor you want to depend on a nifty library like LinesFile because the parser (for syntax highlighting) depends on it

https://github.com/civboot/civlua/blob/main/lib/ds/ds/file.lua#L32

The actual file handle depends on whatever comes out of io.open, so it will block unless monkey patched. If you implement something like FDT (fd thread) then you can swap them out (see fd.c and fd.lua for full implementation. Basically: execute reads/etc in another thread and yield until complete)

https://github.com/civboot/civlua/blob/a951f12395c2bf7c0197eb92a2efe13314d3dc72/lib/fd/fd.h#L26

Obviously we could also do dependency injection, but at that point this becomes more like a job and less like a hobby for me.

> the various select() functions don't work with files ...etc

I'm curious to hear more. I use straight file descriptors (regular files in a separate pthread) -- do they have the same problem?

> I would need to see an actual implementation of an "executor" that
> doesn't know how to handle poll to fully understand your implementation.

Well, the simplest possible executor is to instantly schedule everything that yields a truthy value as its first yield result. That obviously will burn CPU, but it will result in concurrent execution!
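
A minimal sketch of such a busy-loop executor, using only the standard coroutine library. The names `runAll` and `worker` are hypothetical and this is not the Lap object from the lap module. The `ready` table maps coroutine -> debug name in the same shape as LAP_READY, and every truthy first yield value (including 'sleep' or 'poll') is treated the same way: run it again on the next pass.

```lua
-- Hypothetical busy-loop executor: resume everything that is ready, keep
-- whatever yields a truthy first value, forget whatever yields nil/false.
local function runAll(ready)
  while next(ready) do
    -- snapshot the current entries so the table can be modified while we run
    local batch = {}
    for co, name in pairs(ready) do batch[#batch + 1] = {co, name} end
    for _, item in ipairs(batch) do
      local co, name = item[1], item[2]
      ready[co] = nil
      local ok, first = coroutine.resume(co)
      if not ok then error(name .. ': ' .. tostring(first)) end
      if first and coroutine.status(co) ~= 'dead' then
        ready[co] = name -- truthy first yield value: schedule it again
      end
      -- nil/false (or a finished coroutine): forget it
    end
  end
end

-- Two workers written in "blocking style" that interleave under the executor.
local log = {}
local function worker(tag, n)
  return coroutine.create(function()
    for i = 1, n do
      log[#log + 1] = tag .. i
      coroutine.yield(true)
    end
  end)
end

local ready = {}
ready[worker('a', 2)] = 'a'
ready[worker('b', 2)] = 'b'
runAll(ready)
-- log now holds four entries; both workers take their first step before
-- either takes its second (the order within a pass is unspecified).
```

Because the executor never inspects 'poll' or 'sleep' requests, a library has to re-check its own condition after each resume, which is what the while(self:someCheck()) loop is for.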


Sean Conner

Apr 25, 2024, 10:38:56 PM

to lu...@googlegroups.com

It was thus said that the Great Rett Berg once stated:
> > I think I need to see a real example here, not the test code. I'm
> > having a hard time following how this is used.
>
> Let's say you wanted to write a text editor or similar. In your editor you
> want to depend on a nifty library like LinesFile because the parser (for
> syntax highlighting) depends on it
>
> https://github.com/civboot/civlua/blob/main/lib/ds/ds/file.lua#L32
>
> The actual file handle depends on whatever comes out of io.open, so it will
> block

Maybe not, see below.

> > the various select() functions don't work with files ...etc
>
> I'm curious to hear more. I use straight file descriptors (regular files
> in a separate pthread) -- do they have the same problem?

Yes. If you open a file (not a device, FIFO, socket or directory) and
use that file descriptor in select() (poll(), epoll(), kqueue(), etc) it
will ALWAYS register as ready. Most of the time, the file will be cached in
RAM and a read will just be a memcpy() out of cache. Yes, sometimes it
won't be cached and thus, the caller is blocked, but you won't be able to
work around that with select().

Your method, of calling into another thread to do the I/O does work, but
to me, it's only a win if the file in question isn't cached in RAM, and once
it is, then the overhead of coordinating two threads seems like it will kill
any performance gains, but as always in these cases, measurements are
required (and difficult to do as it's hard to eject files from
cache---there's no API that I know of to do it under POSIX).

> > I would need to see an actual implementation of an "executor" that
> > doesn't know how to handle poll to fully understand your implementation.
>
> Well, the simplest possible executor is to instantly schedule everything
> that yields a truthy value as it's first yield result. That obviously will
> burn CPU, but it will result in concurrent execution!

I guess I'm failing to see the use of this. I'm sorry, I just need to see
an actual program with an "executor" other than select() ...

-spc

Paul Eipper

Apr 26, 2024, 10:30:41 AM

to lu...@googlegroups.com

On Thu, Apr 25, 2024 at 11:38 PM Sean Conner <se...@conman.org> wrote:

> Your method, of calling into another thread to do the I/O does work, but
> to me, it's only a win if the file in question isn't cached in RAM, and once
> it is, then the overhead of coordinating two threads seems like it will kill
> any performance gains, but as always in these cases, measurements are
> required (and difficult to do as it's hard to eject files from
> cache---there's no API that I know of to do it under POSIX).

Tangential to the discussion here, just wanted to mention the ones I am aware of on Linux to try and get reproducible benchmarks:

https://man7.org/linux/man-pages/man2/posix_fadvise.2.html

https://man7.org/linux/man-pages/man5/proc.5.html (see `drop_caches`)

Indeed not necessarily as deterministic and fine-grained as one would like, but can help in getting a more comparable situation between benchmark runs.

There may be other tricks as well that I am not aware of.

att,

--

Paul Eipper

Rett Berg

Apr 26, 2024, 11:02:02 AM

to lu...@googlegroups.com

> Yes. If you open a file (not a device, FIFO, socket or directory) and
> use that file descriptor in select() (poll(), epoll(), kqueue(), etc) it
> will ALWAYS register as ready.

I thought surely you were incorrect. Surely if there is already a thread writing to a file then it wouldn't be POLLIN or POLLOUT. Oh how mistaken I was :(

https://github.com/vitiral/notes/blob/main/c/pollthread.c

So... for the thread-backed solution I need to go back to my original design of eventfd for unix/BSD. I've only now realized that for Mac I can use a socketpair where I write 1 byte to indicate readiness and the other side reads one byte -- reducing waste with F_SETPIPE_SZ... though it's still wasteful (grumble). However, both of those solutions should work correctly with poll.

I haven't looked at Windows, hopefully there is some kind of similar solution there.

> then the overhead of coordinating two threads seems like it will kill any performance gains

I very much doubt this because of latency-numbers-every-programmer-should-know[1]. For small file reads it probably matters little either way since, as you say, the file descriptor itself is caching it. But for large file reads (larger than the buffer, which seems to be about 64KiB typically) it absolutely matters. Also, there is the time to open the file, which I'm also moving to a separate thread.

It's easy enough to get a rough idea by having one coroutine (asynchronously) open -> read or write somesize -> close -> run=false and have another coroutine which is just doing while run do count = count + 1; yield(true) end. Then see the count after the first completes.
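
A rough sketch of that measurement harness, under loud assumptions: there is no asynchronous file library here, so the I/O coroutine is a stand-in that simply yields a fixed number of times (`IO_STEPS` is a made-up constant) where a real test would yield until the open/read/close had completed in the other thread. Only the shape of the experiment is shown, not real numbers.

```lua
local IO_STEPS = 1000 -- hypothetical: how many times the fake I/O yields
local run, count = true, 0

-- Stand-in for: (asynchronously) open -> read or write somesize -> close
local ioCo = coroutine.create(function()
  for _ = 1, IO_STEPS do coroutine.yield(true) end
  run = false
end)

-- The counter from the description above: counts how often it gets to run.
local counter = coroutine.create(function()
  while run do count = count + 1; coroutine.yield(true) end
end)

-- Trivial executor: alternate between the two until the I/O side is done.
while coroutine.status(ioCo) ~= 'dead' do
  assert(coroutine.resume(ioCo))
  if coroutine.status(counter) ~= 'dead' then
    assert(coroutine.resume(counter))
  end
end
print(count) -- how many turns the counter got while the "I/O" was in flight
```

With real asynchronous I/O in place of the stand-in, a higher count for the same amount of data would suggest the rest of the program stayed more responsive during the transfer.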

Oh, and also benchmark against an entirely blocking implementation in terms of pure time taken.

I'll be sure to do this once I get something implemented.

[1]: https://static.googleusercontent.com/media/sre.google/en//static/pdf/rule-of-thumb-latency-numbers-letter.pdf


Rett Berg

Apr 26, 2024, 5:21:00 PM

to lu...@googlegroups.com

polling eventfd will work quite nicely

https://github.com/vitiral/notes/blob/main/c/pollsemfd.c

I also did a version using socketpair and I made the diagnostics better

https://github.com/vitiral/notes/blob/main/c/pollsocket.c

This prints lots of lines like:

    thread unlocked c=1
    poll 1: 0 0
    poll 20855: 0 1 -- the above status (neither readable) existed for 20.8k loops of poll()
    poll 61: 0 0 -- for 61 loops the thread socket was readable
    poll 55: 1 0 -- for 55 loops neither was readable
    poll 170: 0 0 -- for 170 loops the main socket was readable
    thread unlocked c=1
    poll 18003: 0 1 -- continued

This is using a 16MiB file that is being written() -> seek(0) -> read() -> seek(0). You get about 18k "loops" while the file is being read+written to.

It will be interesting to see how Lua performs with similar benchmarks.

Rett Berg

May 16, 2024, 1:22:23 PM

to lu...@googlegroups.com

Just to close the loop on this, I've written my first library that uses the LAP protocol. It handles the file descriptors associated with a unix shell running under a child process:

https://github.com/civboot/civlua/tree/main/lib/civix

In the code below, the line using LAP_ASYNC shows how the functions are all executed concurrently on the non-blocking file descriptors.

    -- sh:finish{ShFin} -> out, err
    -- finish files (in sh or other) by writing other.input to stdin and reading
    -- stdout/stderr. All processes are done asynchronously
    M.Sh.finish = function(sh, other)
      other = M.ShFin(other or {})
      local inpf = other.stdin or sh.stdin
      local outf = other.stdout or sh.stdout
      local errf = other.stderr or sh.stderr
      if not (other.input or outf or errf) then return end
      local fns, out, err = {}
      if other.input then assert(inpf, 'provided input without stdin')
        push(fns, function()
          inpf:write(other.input); inpf:close()
        end)
      end
      if outf then push(fns, function() out = outf:read() end) end
      if errf then push(fns, function() err = errf:read() end) end
      if LAP_ASYNC then lap.all(fns) else M.Lap():run(fns) end
      return out, err
    end
