style experiments


Javier Guerra Giraldez

unread,
May 2, 2015, 4:28:02 AM5/2/15
to snabb...@googlegroups.com
Hi all,

I have a few "style proposals", or rather "experiments", for
further discussion:

1.- sandboxed apps [1][2]

- to use a new-style app, you give its 'dotted.path.name' to the
`config.app()` function instead of its class. The config code simply
checks whether its argument is a string, and if so uses it to import
the file and execute the app's initialization (see the sketch after
this list).

- I converted the basic apps (source, sink, repeater, etc). That
means a Lua file for each one. It also means there's no inheritance, so
each one had to repeat the `relink()` method originally defined in the
abstract app `Basic`.

- Although there are very few examples of inheritance in apps, it is easy
to imagine a few related apps that should be able to reuse some common
code. So I added in [2] an `include()` function to be used as a
composition mechanism. It simply runs another Lua file in the same
context as the current one. That means any 'global' function or
variable defined in the included file is accessible from the calling
file.
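
To make that concrete, here's a rough stand-alone sketch of the loading
mechanism -- not the actual code in [1], just the general idea, with made-up
helper names:

----- sketch -----
-- turn a 'dotted.path.name' into a sandboxed app instance (an environment table)
local function load_sandboxed_app (dotted_name, conf)
   local path = dotted_name:gsub('%.', '/')..'.lua'   -- e.g. 'apps/basic/sink.lua'
   local chunk = assert(loadfile(path))
   local env = setmetatable({ conf = conf }, { __index = _G })
   setfenv(chunk, env)     -- LuaJIT is Lua 5.1, so setfenv() is available
   chunk()                 -- running the file initializes the environment...
   return env              -- ...which becomes the app instance
end

-- config.app() then only has to check type() of its argument: a string goes
-- through something like load_sandboxed_app(), a table is used as an app
-- class exactly as before.
-----------------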

I think the main pros of this style are:

- cleaner and simpler code in apps

- any 'local' defined in the app is local to the instance, not to the
class; that means some important values can be accessed slightly faster
than through fields of the instance table.

and the cons:

- the code is compiled and evaluated for each instance, which could
result in more memory overhead. Still, that's nowhere near as much as the
goal of a Lua state for each app, so I guess it's tolerable. We could
cache the compiled code, just before the first execution that
initializes the environment table which becomes the app instance; but
that sounds like a performance advantage only for initialization and
not for core running time.

2.- OOP syntax for core libraries.

- the minimal implementation for `link` and `packet` objects is in
[3]. Yes, it's just a single line at the end of each file.

In this case it was so simple because the objects already were FFI
ctypes and all the interesting functions (except allocation) were already
"method-like": plain functions taking the object as the first argument.
After setting the metatable the original API works just the same, but
now you can also use OOP-like method calls, just as you can write either
"string.find(str, pattern)" or "(str):find(pattern)".

- in [4] are the changes to the basic apps to use OOP syntax. It
obviously shortens some lines, but the short names of some
variables become more evident.

- in [5] I basically rewrote the basic apps to improve readability.

There are a couple of separate commits to show some extra changes when
defining an app:

- in [6] the call to `ffi.metatype()` uses a different table holding only
the specific methods. This makes the 'object' more focused, instead
of exposing the whole environment table. It could also be a point at which
to add some context to the names, i.e. using `packet_length = length` instead
of `length = length` would make a line like "p:packet_length()...."
more explicit than "p:length()....", while the classic
"packet.length(p)...." would still work without a name change.

- in [7] the link.lua and packet.lua files adopt OOP syntax for method
definition. In the simplest case (as presented here), that means
that those methods are listed only on the `__index` field of the
metatype, not on the module table.
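
A tiny illustration of the combined effect of [6] and [7] (again a made-up
example, not the real link code):

----- sketch -----
local ffi = require("ffi")

ffi.cdef[[ struct lnk { int read, write; }; ]]

local methods = {}
function methods:empty () return self.read == self.write end

local lnk_t = ffi.metatype("struct lnk", { __index = methods })

local l = lnk_t()
assert(l:empty())   -- reachable as a method, but not on any module table
-----------------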

3.- others

In [8] the IPv6 tunnel app is transformed, both to the sandboxed style
and to use OOP syntax for link and packet. There seems to be a barely
noticeable speedup of 6%, probably because it uses a few local
variables instead of table fields.



[1]:https://github.com/javierguerragiraldez/snabbswitch/commit/986791790e7706f3ab943e238ef9b3d903e8b4f6

[2]:https://github.com/javierguerragiraldez/snabbswitch/commit/749100176c43ccd444026548736d0be84451c512

[3]:https://github.com/javierguerragiraldez/snabbswitch/commit/221c260fdbe99d76ff30ae36e981e731a0eb95f6

[4]:https://github.com/javierguerragiraldez/snabbswitch/commit/0eca8b6dd96658796e7ec4a9347d2b3d1eb79b00

[5]:https://github.com/javierguerragiraldez/snabbswitch/commit/221c260fdbe99d76ff30ae36e981e731a0eb95f6

[6]:https://github.com/javierguerragiraldez/snabbswitch/commit/7972512f8a69b9ccfcf8959d623173a357251253

[7]:https://github.com/javierguerragiraldez/snabbswitch/commit/98c8a601df1cb4fd793c23ae45bfdbd034609e9c

[8]:https://github.com/javierguerragiraldez/snabbswitch/commit/f73e3d48680fb860ca9db10d22076ddbf5333aa8

--
Javier

Luke Gorrie

unread,
May 3, 2015, 6:52:52 AM5/3/15
to snabb...@googlegroups.com
On 2 May 2015 at 10:28, Javier Guerra Giraldez <jav...@snabb.co> wrote:
> I have a few "style proposals", or rather "experiments", for
> further discussion:

This is very interesting and valuable work!

Very well presented too :).

> 1.- sandboxed apps [1][2]
>
> - to use a new-style app, you give its 'dotted.path.name' to the
> `config.app()` function instead of its class. The config code simply
> checks whether its argument is a string, and if so uses it to import
> the file and execute the app's initialization.

I like this for many reasons.

So, whereas for old-style apps we would write (or have written):

local basic = require("apps.basic.basic")
config.app('sink', basic.Sink)

for new-style apps we would write:

config.app('sink', 'apps.basic.sink')

and there is much to like here:

- The new version is simpler: app name matches the filename in the source.

- The new version is shorter: needn't bother require()ing modules to get apps.

- The new and old versions could easily be supported in parallel during a transition period: the engine could do one thing when the app is given as a table (class) and another when given as a string (module name).

Cool :).

I wonder if there is also a good way that we could shorten the names of apps (e.g. from 'apps.basic.sink' to 'sink') without introducing problems with ambiguity.

I have not dug into the implementation yet but I like the direction the interface is going! I suspect that the implementation will grow more constraints if it wants to allow separate Lua states and parallel execution of apps.

> - I converted the basic apps (source, sink, repeater, etc). That
> means a Lua file for each one. It also means there's no inheritance, so
> each one had to repeat the `relink()` method originally defined in the
> abstract app `Basic`.

Having a separate file for each app does seem pretty okay to me, even for this extreme case of the app only being around 5 lines of code.

The relink() method should perhaps be rethought in general...

These apps simply want a quick way to iterate over their input ports, without caring what they are called. This would "just work" if the link tables ('input' and 'output') indexed the links both by name (input.x and input.y) and by number (input[1] and input[2]). This could be taken care of in the engine and then the basic apps wouldn't need to relink().
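
Roughly (just the shape of the idea, not engine code):

----- sketch -----
-- when the engine wires up an app it could insert each link twice:
local function add_link (links, name, l)
   links[name] = l          -- named access:    input.rx, input.tx
   links[#links + 1] = l    -- numbered access: input[1], input[2], ...
end

-- then an app like Sink can just do
--    for _, l in ipairs(self.input) do ... end
-- without a relink() step and without caring what the links are called.
-----------------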

Having said that, we currently have too little verification of the links in the app network. We should print good warning/error messages if you forget to connect a required link of an app or if you connect the same link more than once. This should be fixed in some nice way, perhaps partly in the engine and partly in the apps. (Likewise for invalid app configurations.)
 
> - Although there are very few examples of inheritance in apps, it is easy
> to imagine a few related apps that should be able to reuse some common
> code. So I added in [2] an `include()` function to be used as a
> composition mechanism. It simply runs another Lua file in the same
> context as the current one. That means any 'global' function or
> variable defined in the included file is accessible from the calling
> file.

I am not so keen on this.

I would prefer to see an old-fashioned require() than a new-fangled include() for sharing common code. Otherwise the reader has to digress and read what include() does before they can understand the file.

Can we solve this in a nice way without adding the new programming construct?

If the problem is that we are requiring apps to implement too many callbacks then perhaps we could try to reduce those instead e.g. remove the relink() callback entirely in this instance so that there is no pressure on reusing callback/method implementations.

> 2.- OOP syntax for core libraries.
>
> - the minimal implementation for `link` and `packet` objects is in
> [3]. Yes, it's just a single line at the end of each file.

I have to say that I am impressed that you can do this so easily with Lua.

I agree that the extra work you did to create a specific method table for the objects is an improvement over automatically forwarding everything to the module (which, as you say, is bloated by the use of package.seeall).

I also reflect that our core APIs can continue to return FFI objects without risking heap allocation provided that they do it carefully:

Currently we have a problem that the 'struct packet *' returned by link.receive() may need to be allocated and garbage collected. That is because we are creating a new 'struct packet *' Lua object when we load the value out of the FFI array and this creates a boxed value allocation that may or may not be sunk by the JIT. This could be avoided if for example we would use a table to "intern" the 'struct packet *' pointers: lookup the existing Lua FFI pointer object in a table (based on memory address) instead of creating a new one.

That potentially removes one obstacle to sticking with our current style of using FFI pointers directly in our main APIs e.g. packet and link.
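
To sketch the interning idea (just the shape of it; the assumed helper glosses
over details like the temporary uintptr_t box and when entries should be
dropped):

----- sketch -----
local ffi = require("ffi")

local interned = {}   -- numeric address -> existing 'struct packet *' cdata

local function intern (p)
   local addr = tonumber(ffi.cast("uintptr_t", p))
   local obj = interned[addr]
   if obj == nil then
      interned[addr] = p   -- first sighting: remember this boxed pointer
      obj = p
   end
   return obj
end
-----------------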

> In [8] the IPv6 tunnel app is transformed, both to the sandboxed style
> and to use OOP syntax for link and packet. There seems to be a barely
> noticeable speedup of 6%, probably because it uses a few local
> variables instead of table fields.

This is very interesting and, I think, significant.

If we can get ~5% more throughput in a non-trivial app network simply by adding a few 'local' keywords in the apps then it seems inevitable that this will happen. Even if we don't do this now it will be low-hanging fruit for somebody who needs it to reach a performance target one day. Then it would be hard to resist merging it onto master.

I compared master with your oop-minimal branch on the L2TPv3 app's selftest. That is, a Source app spamming 60-byte packets to a tunnel app to add encapsulation and then a sink. Here is what I see on Interlaken (2.4 GHz) when running for 10 minutes and pinned to one CPU core:

old: 19,323,115 packets per second
new: 20,113,735 packets per second

That is a 4.2% speedup with the code on your branch under this environment. That strikes me as a LOT for this non-trivial benchmark.

On first inspection this seems like a strong case to stop ignoring Mike Pall's advice (I sense your surprise from here...) and to consistently use low-overhead function/method calls.

If we stick with 'packet.length(p)' syntax then we would have to be prepared for a lot of boilerplate local caching ('local packet_length = require("packet").length') in the future. This will become a natural part of everybody's optimization campaigns.

The alternative, which seems to become more attractive to me all the time, would be to use the p:packet_length() syntax with efficient FFI metatable dispatch. This is concise, clear, and efficient.

Question: where, really, does this difference come from? Is it from the FFI metatables on link/packet objects, or from the app instance having a private 'local' environment instead of using "instance variables", or from both of those things, or from neither but instead something else that we are overlooking? (This is not to say that we need to dissect it in massive detail but just to be confident that it is indeed strongly related to the basic programming style.)

Intriguing.

To be honest I have been anticipating that the performance difference would be 1% or less for this kind of benchmark. In that case I would argue that it is not enough to influence the design of our APIs or to encourage practices like routine local caching of functions like packet.length. That is a hard argument to make with the numbers we are looking at here though.

Cheers!
-Luke


Javier Guerra Giraldez

unread,
May 4, 2015, 5:05:43 AM5/4/15
to snabb...@googlegroups.com
On 3 May 2015 at 05:52, Luke Gorrie <lu...@snabb.co> wrote:
> Cool :).
>
> I wonder if there is also a good way that we could shorten the names of apps
> (e.g. from 'apps.basic.sink' to 'sink') without introducing problems with
> ambiguity.

I'm using `package.path` to search for the app file. We could add
some app-specific directory entry, like `package.path =
package.path..';apps/?.lua'`.


> I have not dug into the implementation yet but I like the direction the
> interface is going! I suspect that the implementation will grow more
> constraints if it wants to allow separate Lua states and parallel execution
> of apps.

I did stumble into a few sandbox "leaks", i.e. differences between the
current implementation and what could be assumed in a many-state
world. For example, `package.loaded` is the same table for everybody,
so `require()`d modules are loaded once, instead of once per app.

I think a many-state implementation would be interesting to see, even
if there's no parallel execution yet.


> Having a separate file for each app does seem pretty okay to me, even for
> this extreme case of the app only being around 5 lines of code.

I like that too. I mentioned it because it becomes a requirement, not
just nice code layout.


> The relink() method should perhaps be rethought in general...

I think it can be very useful to have a callback at this point. Maybe
the `relink()` name is too specific; something like `post_setup()`
could be more readable.


> These apps simply want a quick way to iterate over their input ports,
> without caring what they are called. This would "just work" if the link
> tables ('input' and 'output') indexed the links both by name (input.x and
> input.y) and by number (input[1] and input[2]). This could be taken care of
> in the engine and then the basic apps wouldn't need to relink().
>
> Having said that, we currently have too little verification of the links in
> the app network. We should print good warning/error messages if you forget
> to connect a required link of an app or if you connect the same link more
> than once. This should be fixed in some nice way, perhaps partly in the
> engine and partly in the apps. (Likewise for invalid app configurations.)

Something like this?

----- implementation -----
local function fields_present (t, f, ...)
   if f == nil then return true end
   if t[f] == nil then return false, f end
   return fields_present(t, ...)
end

function required_inputs (...)
   local ret, missing = fields_present(input, ...)
   assert(ret, 'missing required input '..missing)
end

function required_outputs (...)
   local ret, missing = fields_present(output, ...)
   assert(ret, 'missing required output '..missing)
end
-----------------

--- usage (in tunnel.lua) -----
required_inputs('encapsulated', 'decapsulated')
required_outputs('encapsulated', 'decapsulated')
------


> I would prefer to see an old-fashioned require() than a new-fangled
> include() for sharing common code. Otherwise the reader has to digress and
> read what include() does before they can understand the file.
>
> Can we solve this in a nice way without adding the new programming
> construct?

It's not so easy; the semantics are just too different.

First I tried a new option to be used instead of `package.seeall` to
make the submodule environment the same as the calling app's. It quickly
became a hideous mess of `getfenv()` and metatables. In the end the
shared `package.loaded` meant the submodule was executed only once, so
it couldn't add itself to each calling app.

Giving each app its own copy of `package.loaded` was complex and
fragile. I think this might have to wait until we have multiple Lua
states.

Then I tried a "spill()" function, to copy submodule-defined variables
into the app environment. That seemed to work for a while, but the
pulled-in functions didn't have access to the app's "global" variables,
so I had to add those as arguments to the submodule, making it a
not-so-standard `require()` with so much boilerplate that it negated
most of the benefits of the sandbox.


> If the problem is that we are requiring apps to implement too many callbacks
> then perhaps we could try to reduce those instead e.g. remove the relink()
> callback entirely in this instance so that there is no pressure on reusing
> callback/method implementations.

I think there's currently very little pressure on reusing implementations;
it's just that it might be a good tool to have, and having `include()`
available (or a clever use of `require()`) would reduce requests for
inheritance.


> Question: where, really, does this difference come from? Is it from the FFI
> metatables on link/packet objects, or from the app instance having a private
> 'local' environment instead of using "instance variables", or from both of
> those things, or from neither but instead something else that we are
> overlooking? (This is not to say that we need to dissect it in massive
> detail but just to confident that it is indeed strongly related to the basic
> programming style.)

I did a few tests switching some lines of tunnel.lua back and forth
between the original and the modified one, checking performance at
each step.

Tonight, it seems using locals vs. globals for the instance variables
has very little, if any, impact.

The most significant change seems to be `p:prepend(header,
HEADER_SIZE)` vs. `packet.prepend(p, header, HEADER_SIZE)`, in the
encapsulation path.

The `link.receive(...)` lines would have a similar impact, but they
were already cached because of the `local receive, transmit =
link.receive, link.transmit` common at the top of the file.

So the conclusion seems to be not that OOP syntax is "better" than
manual caching, but that with it you are less likely to miss one or two
significant points.


Now it gets interesting to see whether converting freelist.lua to this
style would make it better too...

--
Javier

Luke Gorrie

unread,
May 6, 2015, 1:30:12 AM5/6/15
to snabb...@googlegroups.com
On Monday, 4 May 2015 11:05:43 UTC+2, Javier Guerra Giraldez wrote:
> I'm using `package.path` to search for the app file. We could add
> some app-specific directory entry, like `package.path =
> package.path..';apps/?.lua'`.

Sounds interesting.

Note: we want to load code from the C symbols embedded inside the snabb binary and not from the file system. That is so that we are self-contained and can copy the snabb binary around.

> I think a many-state implementation would be interesting to see, even
> if there's no parallel execution yet.

Agree.

Could be that we hate it for some reason and it would be good to find out sooner rather than later :).

>> I would prefer to see an old-fashioned require() than a new-fangled
>> include() for sharing common code. Otherwise the reader has to digress and
>> read what include() does before they can understand the file.
>>
>> Can we solve this in a nice way without adding the new programming
>> construct?
>
> It's not so easy; the semantics are just too different.

I am really cautious of the ideas with include(), spill(), getfenv(), etc. I see a big risk of making the code much harder to explain.

Can we keep it simple with require() and subroutines and then consider fancy notions later if verbosity is a persistent problem?

> So the conclusion seems to be not that OOP syntax is "better" than
> manual caching, but that with it you are less likely to miss one or two
> significant points.

OK.

I really dislike locally caching individual functions. Calling a function in another module is the most natural thing in the world and it would be sad to always have to worry about whether you should cache it with a 'local' before calling it. 

So if the FFI metatables (OOP syntax) really do contribute a ~5% speedup over either no caching or only module caching then that is a strong point in my opinion. I don't think it has to beat local caching of functions on performance -- it is enough to beat it on being easier to use.

I am still surprised that the impact is so high. Constant-key hashtable lookups require very few instructions in LuaJIT. On the other hand, if we are doing them a hundred million times per second or so then I suppose you have to expect it to add up.


Javier Guerra Giraldez

unread,
May 6, 2015, 4:10:36 AM5/6/15
to snabb...@googlegroups.com
On 6 May 2015 at 00:30, Luke Gorrie <luk...@gmail.com> wrote:
> Note: we want to load code from the C symbols embedded inside the snabb
> binary and not from the file system. That is so that we are self-contained
> and can copy the snabb binary around.

Uuuuuhhh.... I had totally forgotten about that! And reading the LuaJIT
source, I see that the symbol loader is in the same function as the
'package.preload' one, which is precisely what we don't want from
require(). Grmbl, mpf.

Well, the second and third loaders seem to do similar things... I'll
do some tests. If not, I'll see if I can reproduce the 'load Lua from
a linked symbol' functionality with only FFI functions; in the worst
case, I'll just copy the 3-5 lines from the preload loader.


>> I think a many-state implementation would be interesting to see, even
>> if there's no parallel execution yet.
>
>
> Agree.
>
> Could be that we hate it for some reason and it would be good to find out
> sooner rather than later :).

At the very least, it should give an idea of the kind of memory
overhead we would see. The LuaState* structs by themselves shouldn't
be a problem, but there would be some Lua values and code repeated for
each one.


>> it's not so easy, the semantics are just too different.
>
> I am really cautious of the ideas with include(), spill(), getfenv(), etc. I
> see a big risk of making the code much harder to explain.

The (currently) bad thing about `require()` is that it loads the
requested module only once, and returns the same result again and
again. My simplistic attempts to provide a different `package.loaded`
table to each sandboxed app didn't work (yet); I consider this a
sandbox leak, so I'm still trying to plug it. If it becomes too
complex, I'll start with the "one LuaState* per app" approach.


--
Javier

Luke Gorrie

unread,
May 6, 2015, 4:57:34 AM5/6/15
to snabb...@googlegroups.com
On 6 May 2015 at 10:10, Javier Guerra Giraldez <jav...@snabb.co> wrote:
> The (currently) bad thing about `require()` is that it loads the
> requested module only once, and returns the same result again and
> again. My simplistic attempts to provide a different `package.loaded`
> table to each sandboxed app didn't work (yet); I consider this a
> sandbox leak, so I'm still trying to plug it. If it becomes too
> complex, I'll start with the "one LuaState* per app" approach.

Interesting thought.

So if we are speculatively taking steps down this path:

1. Sandboxes in plain Lua.
2. Sandboxes as Lua states.
3. Sandboxes as Lua states that run in parallel.

Then a sensible question is how much proof-of-concept work to do on each step before moving on to the next.

To me it starts to sound like we are ready to start moving on from #1 to #2. The sandboxes that we have now are enough to see basically what the programming style will be like (more opinions on this would be welcome) and I don't know if we gain so much by working more on the mechanism if the next step is to switch over to Lua states as our sandboxes.

I also think it makes sense to focus on making sure the Lua sandbox model is okay sequentially before we move on to the parallel step. It is possible that separate states will be a dead-end for some reason and that might make us want to rethink our whole approach.

So the next PR that I would be keen to dig into is with apps running in private Lua states and still with everything running single-threaded :).

I wonder what the potential surprises will be... for example, how will multiple Lua states interact with running the LuaJIT profiler? Is there any potential for conflict between e.g. FFI definitions in different states? Etc. I don't have a strong mental model of how this stuff works in LuaJIT and for now I am assuming that other people have already been down these roads before us...



Justin Cormack

unread,
May 6, 2015, 6:14:50 AM5/6/15
to snabb...@googlegroups.com
On 6 May 2015 at 09:57, Luke Gorrie <lu...@snabb.co> wrote:
> I wonder what the potential surprises will be... for example how will
> multiple Lua states interact with running the LuaJIT profiler? is there any
> potential for conflict between e.g. FFI definitions between states? etc. I
> don't have a strong mental model of how this stuff works in LuaJIT and for
> now I am assuming that other people have already been down these roads
> before us...

FFI definitions should be independent. Profiles will be too I think.
Everything is attached to the state basically.

Fun fact: you can create new Lua states via ffi...
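
For example (assuming the Lua/LuaJIT C API symbols are visible to ffi.C, as
they are when running under the stock luajit binary):

----- sketch -----
local ffi = require("ffi")

ffi.cdef[[
typedef struct lua_State lua_State;
lua_State *luaL_newstate(void);
void luaL_openlibs(lua_State *L);
int luaL_loadstring(lua_State *L, const char *s);
int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc);
void lua_close(lua_State *L);
]]

local C = ffi.C
local L = C.luaL_newstate()   -- a second state: its own globals, traces, FFI types
C.luaL_openlibs(L)
assert(C.luaL_loadstring(L, "print('hello from another Lua state')") == 0)
assert(C.lua_pcall(L, 0, 0, 0) == 0)
C.lua_close(L)
-----------------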

Justin

Javier Guerra Giraldez

unread,
May 6, 2015, 9:42:57 AM5/6/15
to snabb...@googlegroups.com
On 6 May 2015 at 03:57, Luke Gorrie <lu...@snabb.co> wrote:
> I wonder what the potential surprises will be... for example how will
> multiple Lua states interact with running the LuaJIT profiler? is there any
> potential for conflict between e.g. FFI definitions between states? etc. I
> don't have a strong mental model of how this stuff works in LuaJIT and for
> now I am assuming that other people have already been down these roads
> before us...


Yes, I would be surprised if we don't get plenty of surprises... ;-)

What I expect is that since each LuaState keeps its own JIT traces and
optimizer state, there would be a lot less crosstalk between far-away pieces
of code, so a non-optimized app shouldn't bog things down so heavily,
because other parts of the code might still be running 100% compiled.

The plan that Mike has advised is to keep a running loop on each
state, likely with a blocking point; but definitely not an "expose
functions and expect them to be called" model, because that makes the
traces start anew each time and there's no opportunity

--
Javier

Javier Guerra Giraldez

unread,
May 6, 2015, 9:44:24 AM5/6/15
to snabb...@googlegroups.com
(this is not my keyboard... didn't know gmail could send on a single keypress)

On 6 May 2015 at 08:42, Javier Guerra Giraldez <jav...@snabb.co> wrote:
> The plan that Mike has advised is to keep a running loop on each
> state, likely with a blocking point; but definitely not an "expose
> functions and expect them to be called" model, because that makes the
> traces start anew each time and there's no opportunity

... for codepaths to become hot and get compiled.



--
Javier

Javier Guerra Giraldez

unread,
May 6, 2015, 9:47:18 AM5/6/15
to snabb...@googlegroups.com
On 6 May 2015 at 05:14, Justin Cormack <jus...@specialbusservice.com> wrote:
> FFI definitions should be independent.

Yes, the core ones should be require()d by the framework so you get
fully typed FFI objects, which just happen to be the same as those used
by the other LuaStates.


> Fun fact: you can create new Lua states via ffi...

Yep, that's what I'm currently prototyping! Still very ugly, but it
looks like it could be cleaned up significantly once I get the basic
functionality in place.


--
Javier

Luke Gorrie

unread,
May 7, 2015, 12:12:41 AM5/7/15
to snabb...@googlegroups.com
On 6 May 2015 at 12:14, Justin Cormack <jus...@specialbusservice.com> wrote:
> FFI definitions should be independent. Profiles will be too I think.
> Everything is attached to the state basically.

This all sounds good to me.

I suppose that we will need to change our profiling habits somehow if the current '-jp' command will only control profiling in the scheduler and not the apps. That is probably fine though. The solution might be to always count the number of cycles used by each app (manually in the scheduler) and then use the profiler/jitdump/jittrace on individual apps that we want to optimize.

Generally speaking it seems positive to have strong separation between apps including being sure that their JIT traces don't interfere with each other and the scheduler.

