Request: ZSUM

81 views
Skip to first unread message

camcaine

unread,
Apr 27, 2011, 7:13:44 AM4/27/11
to Redis DB
I could really use a function for a sorted set that gives back a SUM
of all the scores,
a bit like ZCOUNT but possibly with the ability to have a start and
stop option to limit the sum based on the range.

Any chance this might be implemented one day?

thanks!

Salvatore Sanfilippo

unread,
Apr 27, 2011, 7:17:51 AM4/27/11
to redi...@googlegroups.com
Hello, yes it is possible that we add options to ZCOUNT in order to do this.

like ZCOUNT key min max [OPTION] where option can be for instance SUMSCORES.

We need to know the exact use case to understand if it is something
that can be considered general enough.
Also it will be of great help to get feedbacks from the list about
this, useful to somebody else?
Btw after a proposal like this we wait at least a few months and then
reconsider the issue, in order to avoid adding things in an impulsive
way.

Salvatore

> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>
>

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele

camcaine

unread,
Apr 27, 2011, 7:53:31 AM4/27/11
to Redis DB
My use case is for example a football team.
imagine a sorted set:

ZRANGE top_goal_scorers:2011 0 -1 WITHSCORES
# => Ronaldo(45), Messi(30), Rooney(25)

ZCOUNT top_goal_scorers:2011 0 -1 SUM
#=> 100

Maybe even have MIN/MAX/AVG?

Thanks
Cameron

On Apr 27, 12:17 pm, Salvatore Sanfilippo <anti...@gmail.com> wrote:
> Hello, yes it is possible that we add options to ZCOUNT in order to do this.
>
> like ZCOUNT key min max [OPTION] where option can be for instance SUMSCORES.
>
> We need to know the exact use case to understand if it is something
> that can be considered general enough.
> Also it will be of great help to get feedbacks from the list about
> this, useful to somebody else?
> Btw after a proposal like this we wait at least a few months and then
> reconsider the issue, in order to avoid adding things in an impulsive
> way.
>
> Salvatore
>
>
>
>
>
>
>
>
>
> On Wed, Apr 27, 2011 at 1:13 PM, camcaine <camcai...@googlemail.com> wrote:
> > I could really use a function for a sorted set that gives back a SUM
> > of all the scores,
> > a bit like ZCOUNT but possibly with the ability to have a start and
> > stop option to limit the sum based on the range.
>
> > Any chance this might be implemented one day?
>
> > thanks!
>
> > --
> > You received this message because you are subscribed to the Google Groups "Redis DB" group.
> > To post to this group, send email to redi...@googlegroups.com.
> > To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> > For more options, visit this group athttp://groups.google.com/group/redis-db?hl=en.

Pierre Chapuis

unread,
Apr 27, 2011, 10:10:09 AM4/27/11
to redi...@googlegroups.com
On Wed, 27 Apr 2011 04:53:31 -0700 (PDT), camcaine wrote:
> My use case is for example a football team.
> imagine a sorted set:
>
> ZRANGE top_goal_scorers:2011 0 -1 WITHSCORES
> # => Ronaldo(45), Messi(30), Rooney(25)
>
> ZCOUNT top_goal_scorers:2011 0 -1 SUM
> #=> 100
>
> Maybe even have MIN/MAX/AVG?

Also, the sum of squares and so on...

The only thing that will be able to meet those needs once
and for all is embedding a scripting language in the Redis
server to run user-defined procedures. It has drawbacks
(could hang the server if used incorrectly, adds a couple of
kilobytes to the process) but the advantages are worth it.

--
Pierre Chapuis

Javier Guerra Giraldez

unread,
Apr 27, 2011, 10:20:06 AM4/27/11
to redi...@googlegroups.com
On Wed, Apr 27, 2011 at 9:10 AM, Pierre Chapuis <cat...@archlinux.us> wrote:
> Also, the sum of squares and so on...
>
> The only thing that will be able to meet those needs once
> and for all is embedding a scripting language in the Redis
> server to run user-defined procedures. It has drawbacks
> (could hang the server if used incorrectly, adds a couple of
> kilobytes to the process) but the advantages are worth it.

Alchemy database (Redis plus SQL-like tables) already includes Lua for this.

--
Javier

Pierre Chapuis

unread,
Apr 27, 2011, 10:44:02 AM4/27/11
to redi...@googlegroups.com

Yes I know, I am pondering using it instead of Redis. But I don't like
the
fact that it adds a lot of SQL-like commands. Moreover it's not
necessarily
up to date with the Redis trunk, and afaik it's not primarily
maintained in
Git.

Last thing: I don't like the fact that you have to send a Lua chunk
every
time you run a command, there should be a way to preload them and then
run
them with arguments.

So maybe I'll turn to Alchemy someday, but maybe I'll have a try at
embedding
Lua in Redis myself instead.

camcaine

unread,
Apr 27, 2011, 11:01:41 AM4/27/11
to Redis DB
But this would fit nicely with other functions currently in redis I
feel without adding too much of a query language.
Min and max could be achieved with rank easily enough. But I think
SUM (and possibly AVG) would be good additions.

Javier Guerra Giraldez

unread,
Apr 27, 2011, 11:13:21 AM4/27/11
to redi...@googlegroups.com
On Wed, Apr 27, 2011 at 9:44 AM, Pierre Chapuis <cat...@archlinux.us> wrote:
>> Alchemy database (Redis plus SQL-like tables) already includes Lua for
>> this.
>
> Yes I know, I am pondering using it instead of Redis. But I don't like the
> fact that it adds a lot of SQL-like commands. Moreover it's not necessarily
> up to date with the Redis trunk, and afaik it's not primarily maintained in
> Git.

right. i was pointing it as food for tought; a possible enhancement
to Redis, maybe as an out-of-tree patch that tries to keep applicable
to current versions of Redis.


> Last thing: I don't like the fact that you have to send a Lua chunk every
> time you run a command, there should be a way to preload them and then run
> them with arguments.

there's a way, but it's somewhat awkward. also the parameter format
isn't very efficient nor appropriate from a Lua point of view. I
modified it a bit and got slightly better results. still not enough
to prompt me to switch from Redis to Alchemy.

Another issue is that the 'thousands of short Lua calls' works great
with standard Lua, but is not optimal for LuaJIT2, so the speedup is
limited. What does work great is a loop in Lua that communicates
with the C core, which means either some more work understanding the
Redis core, or (better IMHO) add some 'side port' to communicate with
a 'helper' thread.... i'd like to use ZMQ for that!

> So maybe I'll turn to Alchemy someday, but maybe I'll have a try at
> embedding
> Lua in Redis myself instead.

that's pretty straightforward in the simple cases, and given the
efficiency of even the standard Lua VM, the speedup over shuffling all
the relevant data to another process would be significant.

--
Javier

Salvatore Sanfilippo

unread,
Apr 27, 2011, 11:49:58 AM4/27/11
to redi...@googlegroups.com
On Wed, Apr 27, 2011 at 4:10 PM, Pierre Chapuis <cat...@archlinux.us> wrote:
> The only thing that will be able to meet those needs once
> and for all is embedding a scripting language in the Redis
> server to run user-defined procedures. It has drawbacks
> (could hang the server if used incorrectly, adds a couple of
> kilobytes to the process) but the advantages are worth it.

Hello Pierre,

It think it's about time to try a branch with Lua scripting added.
Just I want to make sure to do it the right way.
This is what I propose as an API:

EVAL <body> <arg> <arg> <arg> <arg> ...

basically you never define new commands, you pass instead full scripts.
Redis will try to be smart enough to cache Lua interpreters with
defined functions so that it is fast, but I like a lot the idea of
having to pass the script again and again, so everything is in the
application logic.

Another thing I want to understand is how to make this play well with
cluster. This is why I delayed this feature. I want to have cluster
more or less working (it's just a matter of two months at this point).
The only thing required to work in a cluster environment is to use
just a single key in the script (only if you are using Redis Cluster),
at least for now (as multi keys commands will not be supported), and
to specify in some way the positions of the keys arguments, so that we
can handle redirection in the right way, loading in the case of
diskstore (assuming we'll ship this in a stable release at some
point), and so forth.

The problem is, for performance reasons we want to know what arguments
are keys before the script is executed at all...
So we need to modify the script command into something like that:

EVAL <script> <num_keys> <arg> <arg> <arg> <arg> ...

this way we have the second argument of EVAL that is the number of
arguments that are keys (followed by all the other arguments).

Other changes are needed to get all this in the Redis internals, but
it si definitely something not impossible to achieve in short time...
:) I'll have updates asap.

Ciao,
Salvatore

Pierre Chapuis

unread,
Apr 27, 2011, 12:24:39 PM4/27/11
to redi...@googlegroups.com
On Wed, 27 Apr 2011 17:49:58 +0200, Salvatore Sanfilippo wrote:

> Hello Pierre,
>
> It think it's about time to try a branch with Lua scripting added.
> Just I want to make sure to do it the right way.

This is extremely good news, hopefully it will allow me to stop
implementing custom commands in C and rely on Lua snippets instead!

> This is what I propose as an API:
>
> EVAL <body> <arg> <arg> <arg> <arg> ...
>
> basically you never define new commands, you pass instead full
> scripts.
> Redis will try to be smart enough to cache Lua interpreters with
> defined functions so that it is fast, but I like a lot the idea of
> having to pass the script again and again, so everything is in the
> application logic.

I was more for a two-step process (DEFINE / RUN) but it would be more
difficult with cluster so I'm OK with your proposal.

> The problem is, for performance reasons we want to know what
> arguments
> are keys before the script is executed at all...
> So we need to modify the script command into something like that:
>
> EVAL <script> <num_keys> <arg> <arg> <arg> <arg> ...
>
> this way we have the second argument of EVAL that is the number of
> arguments that are keys (followed by all the other arguments).

As long as the script can still access keys, that's a good idea. But
it should still be possible to do things like (stupid example):

total_number_of_posts = function(users_set)
n = 0
for _,user_id in ipairs(redis.smembers(users_set)) do
n = n + redis.hget("user:" .. user_id).number_of_posts
end
return n
end

--
Pierre Chapuis

Salvatore Sanfilippo

unread,
Apr 27, 2011, 12:39:19 PM4/27/11
to redi...@googlegroups.com
Hola, much more info here:

http://antirez.com/post/redis-and-scripting.html

Cheers,
Salvatore

> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to
> redis-db+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/redis-db?hl=en.
>
>

--

Jak Sprats

unread,
Apr 27, 2011, 2:13:35 PM4/27/11
to Redis DB
Hi All,

I only have about 5 minutes to respond right now, so I am going to
dump any helpful info I have.

Alchemy's first integration of LUA took a string and passed it to the
lua interpreter. The next version I did more or less the EVAL
Salvatore described and used lua_pushnumber, lua_pushstring, ....
lua_call and it was about 3X quicker, it was awesome. This has been
sponsored work, so I can not yet open source it, which is very
frustrating, but it is life.

Additionally using Luajit2, which is simple to include, the speeds of
the scripts approach java post Jit'ed speed.

The key to the lua integration though is defining a lua function that
can access redis' data functions from within lua scripts. My first
attempt at this was also awkward, but someone (forget name now) showed
me how to do this interface better, and basically I put all responses
in a lua table (the options would be returning a string, number, or
table, but putting everything in a table worked better for nested
calls, which get very tricky).

that said, embedding lua/luajit2 is dangerous because it opens up all
sorts possible abuses by the developer (still I think its awesome).

ok, when I have more time, I will dump any helpful hints I have

and another thing I researched is embedding V8, which is doable, but
not at an interpretive level like lua, and would make the node.js guys
slobber :)

- Jak

Jak Sprats

unread,
Apr 27, 2011, 2:32:28 PM4/27/11
to Redis DB
Hi All,

got 5 more minutes, I need to find more time and read this thread and
the post thoroughly, but wont have that until later.

Another important thing I came across when embedding Lua, was it
turned the database into something I developed on, so there has to be
straightforward ways to upload edited code.

Alchemy has:
1.) CONFIG SET luafilename startup_lua_funcs.lua
2.) CONFIG ADD luafile xyz.lua

#1 resets lua's state and passes the contents of startup_lua_funcs.lua
to the lua interpreter
#2 just passes the contents of xyz.lua to the lua interpreter

In practice I mostly use #2 while developing, but #1 is nice for
helper functions that you ALWAYS need.

Once these command upload processes are in place, you can actually
iteratively develop in lua in/on redis, it is wild, you make changes
to your lua file, upload them, run the LUA(EVAL) command and new
things happen, and the interpretation/jit time on upload is nothing.

5 minutes up

- Jak

p.s. Damon was the guy who helped me w/ correctly defining the C/lua
client()'s return format, he gets this stuff really well

Jak Sprats

unread,
Apr 27, 2011, 10:27:35 PM4/27/11
to Redis DB
Hi All,

ok, I read through this now and I dont agree w/ one aspect (if I
understand it correctly).

EVAL <script> <num_keys> <arg> <arg> <arg> <arg>

Would this pass the contents of <script> to the lua interpreter, if
so, this is a sub-optimal solution.

IMHO, it is easy to add functions to luaState by calling
luaL_loadfile(), or even read them in over the wire and call
luaL_dostring() to store functions in lua itself. There is no need to
store lua functions in redis, they remain in lua until you restart the
server.

Then the command is something like: ("luafunc" already defined by
luaL_loadfile() or luaL_dostring())
LUAFUNC luafunction numkeys arg1 arg2 arg3
and do the following:
lua_getfield(Lua, LUA_GLOBALSINDEX, luafunc); /* function to call
*/
lua_pushinteger(Lua, arg1);
lua_pushlstring(Lua, arg2, arg2.len);
lua_pushinteger(Lua, arg3);
lua_call(Lua, numkeys, 0);
This approach avoids the lua interpreter, uses the lua stack like it
should be used, and for short requests is 3X+ faster (I benchmarked
this last month).

Just my 2 cents, I have struggled w/ my own lua integration into
redis, and avoiding the lua intepreter is a good idea, getting it to
work like I wanted took forever (5 iterations maybe).

- Jak

Alexander Gladysh

unread,
Apr 27, 2011, 10:56:05 PM4/27/11
to redi...@googlegroups.com

There is no "source code interpreter" in Lua as you seem to imply here.

There is a compiler and a VM that interprets bytecode instructions,
generated by compiler.

That being said, suggested approach, when Lua source code is
transferred for each command, is in my humble opinion, indeed
suboptimal.

You pay for bandwidth and you pay for source code compilation.

And if you decide to use LuaJIT2 instead of classic Lua VM (and
believe me, you will), you will pay even more — if code would be
compiled and executed only once it will not ever be compiled into
machine code.

DEFINE/RUN separation makes much more sense to me.

And you may keep RUN to be the same as current EXEC — i.e. allow
arbitrary code as the first argument, just let user to DEFINE code
that it want to be cached between invocations.

You may also want to let users to DEFINE some code offline, to be
loaded at start. This will be requested by some users, I guess.

I would suggest following protocol:

DEFINE <name> <string> <arg_type1> <arg_name1> ... <arg_typeN> <arg_nameN>

RUN <string> <arg1> ... <argN>

Example:

DEFINE "lua_sum" "return redis.string(lhs) + redis.string(rhs)"
"string" "lhs" "string" "rhs"

Translates to Lua code like

redis.register("lua_sum", "string", "string", function(lhs, rhs)
return redis.string(lhs) + redis.string(rhs)
end)

RUN "lua_sum" "foo" "bar"

Translates to

redis.invoke("lua_sum", "foo", "bar")

Here redis.string() fetches a string value from DB (as opposed to, say
hash or set).

redis.register() wraps user's function into a function that does
argument validation and places the result into a table, available to
redis.invoke().

redis.invoke() simply finds a handler in the table by name and does pcall().

If someone is interested, I can write "reference implementations" for
all that stuff (except for actual interaction with Redis code though —
I'm not familiar with that).

There is one problem that will have to be addressed at some point,
regardless of how actual API would look like — Lua strings are
constant, and interning them into VM has a penalty (for hashing). For
small strings that is OK, for really huge strings this will be
noticeable.

This should be addressed by providing a separate API for non-interned
strings (simplest solution — treat them as binary blobs in Lua).

However, I would not recommend to spend a lot of time at this point
addressing this particular issue. I suggest to treat it as an
optimization, for special cases — it has good chances to uglify the
API otherwise.

My 2c,
Alexander.

Javier Guerra Giraldez

unread,
Apr 27, 2011, 11:00:53 PM4/27/11
to redi...@googlegroups.com
On Wed, Apr 27, 2011 at 9:27 PM, Jak Sprats <jaks...@gmail.com> wrote:
> Would this pass the contents of <script> to the lua interpreter, if
> so, this is a sub-optimal solution.

it's optimizable if the Lua state doesn't start empty, but with a
small 'framework', something like this:

------ lua code --------
local funcs = {}

local function exec(code, ...)
local f = funcs[code] or loadstring(code)
funcs[code] = f

return exec(yield(f(...)))
end

return coroutine.wrap (exec)
---------------------------

this is very rough; but the idea is that the Lua code spins in an
infinite loop (because of the unguarded recursion in 'exec'), but
being a coroutine, it's called by the C core with lua_resume().

the advantages are:

- the included code could use loadstring() or even require()... or
anything to get more code without having to transfer on every command
invocation.

- it's easy to keep compiled code in the Lua state. (but it should be
a weak table to allow garbage collection of old functions, or maybe
some manual expiration)

- since the Lua code is long running and not 'short called' (as
Alexander points), LuaJIT optimizations kick in, boosting performance
even more.

--
Javier

Javier Guerra Giraldez

unread,
Apr 27, 2011, 11:03:32 PM4/27/11
to redi...@googlegroups.com
On Wed, Apr 27, 2011 at 9:56 PM, Alexander Gladysh <agla...@gmail.com> wrote:
> There is one problem that will have to be addressed at some point,
> regardless of how actual API would look like — Lua strings are
> constant, and interning them into VM has a penalty (for hashing). For
> small strings that is OK, for really huge strings this will be
> noticeable.

Lua's string hashing is n-limited. for long strings it doesn't use
every byte to calculate the hash, so it converges to constant time

--
Javier

Alexander Gladysh

unread,
Apr 27, 2011, 11:05:55 PM4/27/11
to redi...@googlegroups.com
On Thu, Apr 28, 2011 at 07:00, Javier Guerra Giraldez
<jav...@guerrag.com> wrote:
> On Wed, Apr 27, 2011 at 9:27 PM, Jak Sprats <jaks...@gmail.com> wrote:

> it's optimizable if the Lua state doesn't start empty, but with a
> small 'framework', something like this:

<...>

> this is very rough; but the idea is that the Lua code spins in an
> infinite loop (because of the unguarded recursion in 'exec'), but
> being a coroutine, it's called by the C core with lua_resume().

> the advantages are:

> - the included code could use loadstring() or even require()... or
> anything to get more code without having to transfer on every command
> invocation.

> - it's easy to keep compiled code in the Lua state.  (but it should be
> a weak table to allow garbage collection of old functions, or maybe
> some manual expiration)

> - since the Lua code is long running and not 'short called' (as
> Alexander points), LuaJIT optimizations kick in, boosting performance
> even more.

You still pay for string interning in this case. It may be as fast as
hashing code and storing bytecode in Redis and may be not. Needs
profiling.

And you still have to pay for bandwidth.

And you do not provide good means for functional decomposition and
code reuse in this way. (At least not on client-side.)

I'm not convinced. :-)

Alexander.

Alexander Gladysh

unread,
Apr 27, 2011, 11:10:47 PM4/27/11
to redi...@googlegroups.com

I admit that I never hit this issue personally (despite implementing a
lot of server-side stuff in Lua). But I remember a few posts on Lua ML
from the people who had to counter string interning costs.

Anyway, as I said, it is an optimization and it should be dealt with
when it is time to think about optimizations.

Alexander.

Alexander Gladysh

unread,
Apr 27, 2011, 11:16:17 PM4/27/11
to redi...@googlegroups.com
On Thu, Apr 28, 2011 at 07:10, Alexander Gladysh <agla...@gmail.com> wrote:
> On Thu, Apr 28, 2011 at 07:03, Javier Guerra Giraldez
> <jav...@guerrag.com> wrote:
>> On Wed, Apr 27, 2011 at 9:56 PM, Alexander Gladysh <agla...@gmail.com> wrote:
>>> There is one problem that will have to be addressed at some point,
>>> regardless of how actual API would look like — Lua strings are
>>> constant, and interning them into VM has a penalty (for hashing). For
>>> small strings that is OK, for really huge strings this will be
>>> noticeable.

>> Lua's string hashing is n-limited.  for long strings it doesn't use
>> every byte to calculate the hash, so it converges to constant time

> I admit that I never hit this issue personally (despite implementing a
> lot of server-side stuff in Lua). But I remember a few posts on Lua ML
> from the people who had to counter string interning costs.

...Maybe I misremember and they were bothered not by hashing (or not
only by it), but by copying of memory. OK, let's leave this topic
until it will become relevant.

After all, for huge data sets user will still have the classic Redis
API — it is not that all commands will be rewritten to Lua anytime
soon.

Alexander.

Jak Sprats

unread,
Apr 27, 2011, 11:45:19 PM4/27/11
to Redis DB
>> There is no "source code interpreter" in Lua as you seem to imply here.
There is a compiler and a VM that interprets bytecode instructions,
generated by compiler.

I am bad at names (e.g. "source code interpreter").

What I am trying to say is calling the following in C:
A.) luaL_dostring(Lua, "lua_func(a,b,c);");
is MUCH slower than calling
B.) lua_getfield(Lua, LUA_GLOBALSINDEX, luafunc);
lua_pushinteger(Lua, a);
lua_pushlstring(Lua, b, b.len);
lua_pushinteger(Lua, c);
lua_call(Lua, 3, 0);

A.) requires runtime compilation into bytecode (thats the right
name :)
B.) puts some stuff on the Lua stack and executes it

combining B w/ LuaJit2 & pre-defined functions is superior in terms of
performance (avoids [redundant] compilation every EVAL command).

The correct API is another issue.

On Apr 27, 8:56 pm, Alexander Gladysh <aglad...@gmail.com> wrote:

Javier Guerra Giraldez

unread,
Apr 28, 2011, 12:07:35 AM4/28/11
to redi...@googlegroups.com
On Wed, Apr 27, 2011 at 10:45 PM, Jak Sprats <jaks...@gmail.com> wrote:
> A.) requires runtime compilation into bytecode (thats the right
> name :)
> B.) puts some stuff on the Lua stack and executes it

that's understood, and what i understood when Salvatore writes "Redis
will try to be smart enough to reuse an interpreter with the command
defined".

also note that it's what my 'framework' does, but in Lua instead of
doing it in C.

> combining B w/ LuaJit2 & pre-defined functions is superior in terms of
> performance (avoids [redundant] compilation every EVAL command).

Alexander point is that LuaJIT2 has three parts:

- the bytecode compiler. (similar to plain Lua's, but a different bytecode).

- the bytecode interpreter (usually faster than plain Lua's but not
the best possible).

- the JIT itself, which translates to machine language (where the magic lies).

all of us agree that it's wasteful to avoid repeatedly pass through
the first part; but the last part, being a JIT, will only kick in when
it detects a 'hot path'. typically when you have C code that calls
small Lua functions, the JIT is never 'hot' enough, and the Lua code
stays interpreted.

The workaround recommended by Mike Pall (LuaJIT author) is to either
turn the problem inside out (embed the C code in Lua instead of
embedding Lua in the C application); or do the coroutine control
inversion trick, where both sides (Lua and C) think they're the one
driving the execution.

--
Javier

Jak Sprats

unread,
Apr 28, 2011, 12:41:49 AM4/28/11
to Redis DB
>>but the last part, being a JIT, will only kick in when it detects a 'hot path'. typically when you have C code that calls small Lua functions, the JIT is never 'hot' enough, and the Lua code stays interpreted.

My understanding was, if you have a lua function named "lfunc",
luajit2 will turn it into machine-code just before it runs it(JIT-
compile), and then the next time you call "lfunc", luajit2 is smart
enough to reuse the machine-code, and this is how luajit2 gets its
fastest interpreted language in the world moniker.

If this is right, then predefined lua functions will be turned into
machine-code, it's a wonderful state-of-affairs.

This is how Java's JIT works (AFAIK) ... maybe I am just naive here
(any links educating me would be appreciated).

On Apr 27, 10:07 pm, Javier Guerra Giraldez <jav...@guerrag.com>
wrote:

Alexander Gladysh

unread,
Apr 28, 2011, 1:16:05 AM4/28/11
to redi...@googlegroups.com
On Thu, Apr 28, 2011 at 08:41, Jak Sprats <jaks...@gmail.com> wrote:
>>>but the last part, being a JIT, will only kick in when it detects a 'hot path'.  typically when you have C code that calls small Lua functions, the JIT is never 'hot' enough, and the Lua code stays interpreted.

> My understanding was, if you have a lua function named "lfunc",
> luajit2 will turn it into machine-code just before it runs it(JIT-
> compile), and then the next time you call "lfunc", luajit2 is smart
> enough to reuse the machine-code, and this is how luajit2 gets its
> fastest interpreted language in the world moniker.

LuaJIT compiles only hot paths. The bytecode must run several times to
be compiled to machine code.

Alexander.

Jak Sprats

unread,
Apr 28, 2011, 1:43:03 AM4/28/11
to Redis DB
>> LuaJIT compiles only hot paths. The bytecode must run several times to be compiled to machine code.

In practice, because redis is and will remain single threaded, meaning
EVERY LUA CALL WILL BLOCK THE SERVER DURING ITS ENTIRE EXECUTION, lua
functions should be short running and mostly simple functions (which
implies a pretty small code tree), which also means people are going
to be calling relatively few (10s-100s, not 10000s) of functions, and
calling them often.

Isn't this pattern of function calls, perfect for luajit2?

On Apr 27, 11:16 pm, Alexander Gladysh <aglad...@gmail.com> wrote:

Didier Spezia

unread,
Apr 28, 2011, 7:11:58 AM4/28/11
to Redis DB
Hi,

actually, luajit is a trace compiler, so it tends to optimize
(i.e. compile) hot loops rather than functions, as explained
by Javier and Alexander. Here is a little experiment:

function foo(a)
return (a+10)*2
end
x = 0
for i = 1,60 do
x = x + foo(1)
end
print (x)

By default, luajit wait for the 56th iteration of this loop
before deciding it has to optimize it. You can see this
by running this with luajit -jdump:

> ./luajit -jdump ex1.lua
---- TRACE 1 start ex1.lua:2
0001 ADDVN 1 0 0 ; 10
0002 MULVN 1 1 1 ; 2
0003 RET1 1 2
---- TRACE 1 abort ex1.lua:3 -- leaving loop in root trace

---- TRACE 1 start ex1.lua:7
0009 GGET 4 2 ; "x"
0010 GGET 5 1 ; "foo"
0011 KSHORT 6 1
0012 CALL 5 2 2
0000 . FUNCF 2 ; ex1.lua:2
0001 . ADDVN 1 0 0 ; 10
0002 . MULVN 1 1 1 ; 2
0003 . RET1 1 2
0013 ADDVV 4 4 5
0014 GSET 4 2 ; "x"
0015 FORL 0 => 0009
---- TRACE 1 IR
0001 int SLOAD #1 CI
0002 fun SLOAD #0 R
0003 tab FLOAD 0002 func.env
0004 int FLOAD 0003 tab.hmask
0005 > int EQ 0004 +63
0006 p32 FLOAD 0003 tab.node
0007 > p32 HREFK 0006 "x" @33
0008 > num HLOAD 0007
0009 > p32 HREFK 0006 "foo" @34
0010 > fun HLOAD 0009
0011 > fun EQ 0010 ex1.lua:2
0012 + num ADD 0008 +22
0013 num HSTORE 0007 0012
0014 + int ADD 0001 +1
0015 > int LE 0014 +60
0016 ------ LOOP ------------
0017 + num ADD 0012 +22
0018 num HSTORE 0007 0017
0019 + int ADD 0014 +1
0020 > int LE 0019 +60
0021 int PHI 0014 0019
0022 num PHI 0012 0017
---- TRACE 1 mcode 182
394cff3f mov dword [0x40eb14a0], 0x1
394cff4a movlpd xmm0, [0x40ecef88]
394cff53 cvtsd2si ebp, [rdx]
394cff57 mov ebx, [rdx-0x8]
394cff5a mov edx, [rbx+0x8]
394cff5d cmp dword [rdx+0x1c], +0x3f
394cff61 jnz 0x394c0010 ->0
394cff67 mov ecx, [rdx+0x14]
394cff6a mov rdi, 0xfffffffb40eba958
394cff74 cmp rdi, [rcx+0x320]
394cff7b jnz 0x394c0010 ->0
394cff81 lea eax, [rcx+0x318]
394cff87 cmp dword [rax+0x4], 0xfffeffff
394cff8e jnb 0x394c0010 ->0
394cff94 movlpd xmm7, [rax]
394cff98 mov rdi, 0xfffffffb40eb2810
394cffa2 cmp rdi, [rcx+0x338]
394cffa9 jnz 0x394c0010 ->0
394cffaf cmp dword [rcx+0x334], -0x09
394cffb6 jnz 0x394c0010 ->0
394cffbc cmp dword [rcx+0x330], 0x40ebd948
394cffc6 jnz 0x394c0010 ->0
394cffcc addsd xmm7, xmm0
394cffd0 movsd [rax], xmm7
394cffd4 add ebp, +0x01
394cffd7 cmp ebp, +0x3c
394cffda jg 0x394c0014 ->1
->LOOP:
394cffe0 addsd xmm7, xmm0
394cffe4 movsd [rax], xmm7
394cffe8 add ebp, +0x01
394cffeb cmp ebp, +0x3c
394cffee jle 0x394cffe0 ->LOOP
394cfff0 jmp 0x394c001c ->3
---- TRACE 1 stop -> loop

1320

Now, let's replace the loop by some repeated calls:

function foo(a)
return (a+10)*2
end
x = 0
x = x + foo(1)
... repeated 60 times ...
x = x + foo(1)
print (x)

Now the result is:

dspezia@stormbringer:~/dev/luajit/test> ./luajit -jdump ex2.lua
---- TRACE 1 start ex2.lua:2
0001 ADDVN 1 0 0 ; 10
0002 MULVN 1 1 1 ; 2
0003 RET1 1 2
---- TRACE 1 abort ex2.lua:3 -- leaving loop in root trace

1320

So luajit has detected that function foo is called multiple
times, but since the call is not in a loop, the JIT is not
triggered.

AFAIK, there is no way to enforce the compilation of a
specific function with luajit. The code must be in a loop
to be compiled. That's why Javier proposed the coroutine
control inversion trick.

That said, my understanding is the regular lua
implementation will be used rather than luajit.

The interpreter will be called from Redis. Most of the
time, it will call C functions to access Redis data. So
the time spent in the lua interpreter itself will be
negligible.

IMO, there is not much benefit in embedding luajit here
(and it is bad for portability).

Regards,
Didier.

Salvatore Sanfilippo

unread,
Apr 28, 2011, 7:17:39 AM4/28/11
to redi...@googlegroups.com
On Thu, Apr 28, 2011 at 1:11 PM, Didier Spezia <didi...@gmail.com> wrote:
> the time spent in the lua interpreter itself will be
> negligible.

Half of this thread can be summarized on the above statement ;)
Basically Lua speed is more than appropriate, especially at this stage
it does not matter at all.

What matters is the API and how to bind Redis to Lua, but I think I've
some good solution for that... see you soon with a topic branch
implementing this! :)

Salvatore

Jak Sprats

unread,
Apr 28, 2011, 2:48:47 PM4/28/11
to Redis DB
Hi Didier,

thanks for the example ... My understanding of when something got
jitted was based largely on assumption .... your example helped me get
what Alexander & Javier were saying ... it just seems wrong to me that
luajit could detect a function that keeps getting called and not JIT
it, but I am sure there are very good reasons for this.

that said, anyone who does want to try out luajit, it will be VERY
simple to tweak the makefile and git clone the repo.

in all my experience switching back and forth from lua and luajit w/
embedded lua in alchemy, sometimes there has been no speed up and
sometimes, something like a 8-20% speedup (I still dont understand why
8-20%). This was always for pretty short/simple functions.

- Jak

Jak Sprats

unread,
Apr 28, 2011, 2:50:16 PM4/28/11
to Redis DB
>> What matters is the API and how to bind Redis to Lua

yeah, getting that right is tricky ... I hope you nail it :)

On Apr 28, 5:17 am, Salvatore Sanfilippo <anti...@gmail.com> wrote:

Pierre Chapuis

unread,
Apr 29, 2011, 4:58:13 AM4/29/11
to redi...@googlegroups.com
On Thu, 28 Apr 2011 13:17:39 +0200, Salvatore Sanfilippo wrote:
> On Thu, Apr 28, 2011 at 1:11 PM, Didier Spezia <didi...@gmail.com>
> wrote:
>> the time spent in the lua interpreter itself will be
>> negligible.
>
> Half of this thread can be summarized on the above statement ;)
> Basically Lua speed is more than appropriate, especially at this
> stage
> it does not matter at all.
>
> What matters is the API and how to bind Redis to Lua, but I think
> I've
> some good solution for that... see you soon with a topic branch
> implementing this! :)

+1, we shouldn't care about the speed of the Lua interpreter here.
Plain
old Lua (not JIT) will be fast enough, and even if it isn't it will
still
be possible to replace it with LuaJIT later on.

What we do care about for now is:

* The API
* The overhead of calling a Lua function

Because of the second point, an implementation that creates a new Lua
state at each call will probably not work, and it will also probably be
necessary to cache functions one way or another (several ways to do
this have been given in previous posts).

The way I see it, a typical use case will be that a small number of
functions will be called a large number of times. Writing a
user-defined procedure for a one-shot call is usually not worth it.
Because of this, I was in favor of DEFINE / RUN, but EVAL with caching
is almost the same thing so it works for me too.

--
Pierre Chapuis

Didier Spezia

unread,
Apr 29, 2011, 8:34:21 AM4/29/11
to Redis DB

Redis can create a VM (i.e. a state in Lua terminology) per query,
per client session, or only one. Per query VMs would be probably
too expensive. Most data servers (SQL or NOSQL) implement
VMs per client session (RDBMS with stored procedures,
mongodb, etc ...) to provide client isolation.

Tokyo Tyrant and Kyoto Tycoon define one Lua state per thread
(and queries are dispatched on a pool of threads).

For Redis, we could imagine just one VM sharing its state
between all clients. It means Lua functions, modules,
global objects and variables, etc ... would be shared.
It is lighter, has some benefits but some drawbacks as
well.

It has consequence on the API: DEFINE/RUN would not be really
possible with this solution, except if Redis generates Lua
function identifiers, which are later used by the client
to reference them. RUN with a shared pool of Lua functions
indexed by the hash of the Lua code is still possible.
(Oracle does this for the SQL code).

IMO, the choice of the isolation level should be one of the
primary design decision.

Regarding the API, there are plenty of things to consider ...

- the client API (i.e. how to call Lua code from the client
to be executed on the server). How to cache/store Lua code?
Are stored procedures considered? Are parameter placeholders
considered (?, $1, :foo, whatever ...) in the Lua code?
Will a Lua initialization file be loaded at startup time
to define functions, global objects, etc ...?
Can some Lua code be called in existing commands (for instance
SORT or ZUNIONSTORE) to provide further mutation/aggregation
capabilities? What about a Lua based map/reduce mechanism
(a la mongodb/Tokyo Tyrant) to process results of sort
or range based queries? Does it make sense for Redis?
(I'm not convinced).

- the server API (i.e. how to access Redis data or run Redis
command from the Lua code?). How to stream results built by
the Lua code to the client? Should the results of Redis
commands run from the Lua code be converted systematically
to Lua objects? Lua objects are garbage collected, and
the Lua garbage collector will block the Redis event loop,
so it is perhaps not a good idea ... But in that case,
how can the Lua code access large query results?

It would be wise to study options that have been chosen
in other projects (Tyrant, Tycoon, Alchemy, PostgreSQL
for Lua, mongodb for Javascript, etc ...) before making
decisions ...

Regards,
Didier.


On Apr 29, 10:58 am, Pierre Chapuis <catw...@archlinux.us> wrote:
>  On Thu, 28 Apr 2011 13:17:39 +0200, Salvatore Sanfilippo wrote:
>
> > On Thu, Apr 28, 2011 at 1:11 PM, Didier Spezia <didier...@gmail.com>

Alexander Gladysh

unread,
Apr 29, 2011, 10:03:18 AM4/29/11
to redi...@googlegroups.com
On Fri, Apr 29, 2011 at 16:34, Didier Spezia <didi...@gmail.com> wrote:

> Redis can create a VM (i.e. a state in Lua terminology) per query,
> per client session, or only one. Per query VMs would be probably
> too expensive. Most data servers (SQL or NOSQL) implement
> VMs per client session (RDBMS with stored procedures,
> mongodb, etc ...) to provide client isolation.

Lua states are really cheap as long as you do not have to bind a ton
of dependencies to the state) (at least in classic Lua, in LJ you need
to take hot traces in account, as discussed).

> For Redis, we could imagine just one VM sharing its state
> between all clients. It means Lua functions, modules,
> global objects and variables, etc ... would be shared.
> It is lighter, has some benefits but some drawbacks as
> well.

It is a bad idea to have global objects and variables.

Also, you can set up reasonably isolated sandboxes within a single Lua state.

> - the client API (i.e. how to call Lua code from the client
> to be executed on the server). How to cache/store Lua code?
> Are stored procedures considered? Are parameter placeholders
> considered (?, $1, :foo, whatever ...) in the Lua code?

Why would one ever need parameter placeholders?

> Will a Lua initialization file be loaded at startup time
> to define functions, global objects, etc ...?

Global objects are a bad idea.

> - the server API (i.e. how to access Redis data or run Redis
> command from the Lua code?). How to stream results built by
> the Lua code to the client? Should the results of Redis
> commands run from the Lua code be converted systematically
> to Lua objects? Lua objects are garbage collected, and
> the Lua garbage collector will block the Redis event loop,
> so it is perhaps not a good idea ...

Lua has incremental GC, as long as it is properly configured, it does
not block much. (GC configuration probably should be exposed to user,
optimal values vary, depending on the usage patterns.)

> But in that case,
> how can the Lua code access large query results?

Via userdata objects with iterators and accessor functions.

My 2c,
Alexander.

Javier Guerra Giraldez

unread,
Apr 29, 2011, 10:29:26 AM4/29/11
to redi...@googlegroups.com
On Fri, Apr 29, 2011 at 9:03 AM, Alexander Gladysh <agla...@gmail.com> wrote:
>> But in that case,
>> how can the Lua code access large query results?
>
> Via userdata objects with iterators and accessor functions.

+1

in most of my Lua bindings, the first iteration is done the naïve way,
creating tables and populating with full results; but when the
datasize is potentially large, i typically switch soon to userdata
objects and accessor functions. sometimes then i add a metatable
layer to make those objects behave like lazy tables.

since in most cases the Lua code doesn't use all the data, the savings
are considerable

--
Javier

Reply all
Reply to author
Forward
0 new messages