like ZCOUNT key min max [OPTION], where the option could be, for
instance, SUMSCORES.
We need to know the exact use case to understand if it is something
that can be considered general enough.
It would also be of great help to get feedback from the list about
this: is it useful to somebody else?
Btw, after a proposal like this we wait at least a few months and then
reconsider the issue, in order to avoid adding things in an impulsive
way.
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele
Also, the sum of squares and so on...
The only thing that will be able to meet those needs once
and for all is embedding a scripting language in the Redis
server to run user-defined procedures. It has drawbacks
(it could hang the server if used incorrectly, and it adds a couple of
kilobytes to the process), but the advantages are worth it.
--
Pierre Chapuis
Alchemy database (Redis plus SQL-like tables) already includes Lua for this.
--
Javier
Yes I know, I am pondering using it instead of Redis. But I don't like
the fact that it adds a lot of SQL-like commands. Moreover it's not
necessarily up to date with the Redis trunk, and afaik it's not
primarily maintained in Git.
Last thing: I don't like the fact that you have to send a Lua chunk
every time you run a command; there should be a way to preload them
and then run them with arguments.
So maybe I'll turn to Alchemy someday, but maybe I'll have a try at
embedding Lua in Redis myself instead.
right. i was pointing to it as food for thought; a possible enhancement
to Redis, maybe as an out-of-tree patch that tries to stay applicable
to current versions of Redis.
> Last thing: I don't like the fact that you have to send a Lua chunk every
> time you run a command, there should be a way to preload them and then run
> them with arguments.
there's a way, but it's somewhat awkward. also the parameter format
isn't very efficient or appropriate from a Lua point of view. I
modified it a bit and got slightly better results. still not enough
to prompt me to switch from Redis to Alchemy.
Another issue is that the 'thousands of short Lua calls' pattern works
great with standard Lua, but is not optimal for LuaJIT2, so the speedup
is limited. What does work great is a loop in Lua that communicates
with the C core, which means either some more work understanding the
Redis core, or (better IMHO) adding some 'side port' to communicate
with a 'helper' thread... i'd like to use ZMQ for that!
> So maybe I'll turn to Alchemy someday, but maybe I'll have a try at
> embedding
> Lua in Redis myself instead.
that's pretty straightforward in the simple cases, and given the
efficiency of even the standard Lua VM, the speedup over shuffling all
the relevant data to another process would be significant.
--
Javier
Hello Pierre,
I think it's about time to try a branch with Lua scripting added.
I just want to make sure to do it the right way.
This is what I propose as an API:
EVAL <body> <arg> <arg> <arg> <arg> ...
basically you never define new commands; you pass full scripts instead.
Redis will try to be smart enough to cache Lua interpreters with the
defined functions so that it is fast, but I really like the idea of
passing the script again and again, so that everything stays in the
application logic.
Another thing I want to understand is how to make this play well with
cluster. This is why I delayed this feature: I want to have cluster
more or less working first (it's just a matter of two months at this point).
The only thing required to work in a cluster environment, at least for
now, is to use just a single key in the script (only if you are using
Redis Cluster, as multi-key commands will not be supported), and to
specify in some way the positions of the key arguments, so that we can
handle redirection in the right way, loading in the case of diskstore
(assuming we'll ship this in a stable release at some point), and so
forth.
The problem is, for performance reasons we want to know which arguments
are keys before the script is executed at all...
So we need to modify the script command into something like this:
EVAL <script> <num_keys> <arg> <arg> <arg> <arg> ...
This way the second argument of EVAL is the number of arguments that
are keys (followed by all the other arguments).
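For example, a call could look like this (purely illustrative: the
redis.call() binding and the KEYS table used inside the script body are
just assumptions here, the script-side API is not defined yet):
EVAL "return redis.call('GET', KEYS[1])" 1 mykey
With <num_keys> set to 1, Redis knows that mykey is a key (and can
route it correctly in a cluster, or load it under diskstore) before the
script runs, while any further arguments would be treated as plain values.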
Other changes are needed to get all this into the Redis internals, but
it is definitely something that can be achieved in a short time...
:) I'll have updates asap.
Ciao,
Salvatore
> Hello Pierre,
>
> I think it's about time to try a branch with Lua scripting added.
> I just want to make sure to do it the right way.
This is extremely good news, hopefully it will allow me to stop
implementing custom commands in C and rely on Lua snippets instead!
> This is what I propose as an API:
>
> EVAL <body> <arg> <arg> <arg> <arg> ...
>
> basically you never define new commands, you pass instead full
> scripts.
> Redis will try to be smart enough to cache Lua interpreters with
> defined functions so that it is fast, but I like a lot the idea of
> having to pass the script again and again, so everything is in the
> application logic.
I was more for a two-step process (DEFINE / RUN) but it would be more
difficult with cluster so I'm OK with your proposal.
> The problem is, for performance reasons we want to know what
> arguments
> are keys before the script is executed at all...
> So we need to modify the script command into something like that:
>
> EVAL <script> <num_keys> <arg> <arg> <arg> <arg> ...
>
> this way we have the second argument of EVAL that is the number of
> arguments that are keys (followed by all the other arguments).
As long as the script can still access keys, that's a good idea. But
it should still be possible to do things like (stupid example):
total_number_of_posts = function(users_set)
  local n = 0
  for _, user_id in ipairs(redis.smembers(users_set)) do
    n = n + redis.hget("user:" .. user_id, "number_of_posts")
  end
  return n
end
--
Pierre Chapuis
http://antirez.com/post/redis-and-scripting.html
Cheers,
Salvatore
There is no "source code interpreter" in Lua as you seem to imply here.
There is a compiler and a VM that interprets the bytecode instructions
generated by the compiler.
That being said, the suggested approach, where Lua source code is
transferred for each command, is in my humble opinion indeed
suboptimal.
You pay for bandwidth and you pay for source code compilation.
And if you decide to use LuaJIT2 instead of the classic Lua VM (and
believe me, you will), you will pay even more: if the code is compiled
and executed only once, it will never be compiled into machine code.
The DEFINE/RUN separation makes much more sense to me.
And you may keep RUN the same as the currently proposed EVAL, i.e. allow
arbitrary code as the first argument; just let the user DEFINE the code
they want cached between invocations.
You may also want to let users DEFINE some code offline, to be loaded
at startup. Some users will ask for this, I guess.
I would suggest the following protocol:
DEFINE <name> <string> <arg_type1> <arg_name1> ... <arg_typeN> <arg_nameN>
RUN <name> <arg1> ... <argN>
Example:
DEFINE "lua_sum" "return redis.string(lhs) + redis.string(rhs)"
"string" "lhs" "string" "rhs"
Translates to Lua code like
redis.register("lua_sum", "string", "string", function(lhs, rhs)
return redis.string(lhs) + redis.string(rhs)
end)
RUN "lua_sum" "foo" "bar"
Translates to
redis.invoke("lua_sum", "foo", "bar")
Here redis.string() fetches a string value from the DB (as opposed to,
say, a hash or a set).
redis.register() wraps the user's function into a function that does
argument validation, and places the result into a table available to
redis.invoke().
redis.invoke() simply finds a handler in that table by name and calls it with pcall().
If someone is interested, I can write "reference implementations" for
all that stuff (except for the actual interaction with the Redis code;
I'm not familiar with that).
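For illustration, such a reference implementation might look roughly
like this (a minimal sketch: the redis table is just a stand-in for the
real binding, and the validation only checks the argument count):
------ lua code --------
-- minimal sketch of the DEFINE/RUN registry (illustrative names, not a real API)
redis = redis or {}                      -- stand-in for the real binding table

local handlers = {}

function redis.register(name, ...)
  local sig = { ... }
  local fn = table.remove(sig)           -- the last vararg is the function itself
  handlers[name] = function(...)
    assert(select('#', ...) == #sig,
           "wrong number of arguments for " .. name)
    return fn(...)
  end
end

function redis.invoke(name, ...)
  local handler = handlers[name]
  if not handler then
    return false, "unknown procedure: " .. name
  end
  return pcall(handler, ...)             -- isolate script errors from the server
end
---------------------------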
There is one problem that will have to be addressed at some point,
regardless of what the actual API looks like: Lua strings are
immutable, and interning them into the VM has a penalty (for hashing).
For small strings that is OK; for really huge strings it will be
noticeable.
This should be addressed by providing a separate API for non-interned
strings (the simplest solution is to treat them as binary blobs in Lua).
However, I would not recommend spending a lot of time on this
particular issue at this point. I suggest treating it as an
optimization for special cases; otherwise it has good chances of
uglifying the API.
My 2c,
Alexander.
it's optimizable if the Lua state doesn't start empty, but with a
small 'framework', something like this:
------ lua code --------
-- cache of compiled chunks; weak values let old functions be collected
local funcs = setmetatable({}, { __mode = "v" })

local function exec(code, ...)
  local f = funcs[code] or assert(loadstring(code))
  funcs[code] = f
  -- yield the result back to the C core; the next resume hands us (code, args...)
  return exec(coroutine.yield(f(...)))
end

return coroutine.wrap(exec)
---------------------------
this is very rough; but the idea is that the Lua code spins in an
infinite loop (because of the unguarded tail recursion in 'exec', which
Lua eliminates, so the stack doesn't grow), and, being a coroutine,
it's called by the C core with lua_resume().
the advantages are:
- the included code could use loadstring() or even require()... or
anything to get more code, without having to transfer it on every
command invocation.
- it's easy to keep compiled code in the Lua state (but it should be
a weak table to allow garbage collection of old functions, or maybe
some manual expiration).
- since the Lua code is long running and not 'short called' (as
Alexander points out), LuaJIT optimizations kick in, boosting
performance even more.
--
Javier
Lua's string hashing is n-limited: for long strings it doesn't use
every byte to calculate the hash, so it converges to constant time.
--
Javier
> it's optimizable if the Lua state doesn't start empty, but with a
> small 'framework', something like this:
<...>
> this is very rough; but the idea is that the Lua code spins in an
> infinite loop (because of the unguarded recursion in 'exec'), but
> being a coroutine, it's called by the C core with lua_resume().
> the advantages are:
> - the included code could use loadstring() or even require()... or
> anything to get more code without having to transfer on every command
> invocation.
> - it's easy to keep compiled code in the Lua state. (but it should be
> a weak table to allow garbage collection of old functions, or maybe
> some manual expiration)
> - since the Lua code is long running and not 'short called' (as
> Alexander points), LuaJIT optimizations kick in, boosting performance
> even more.
You still pay for string interning in this case. It may be as fast as
hashing the code and storing bytecode in Redis, or it may not be. Needs
profiling.
And you still have to pay for bandwidth.
And you do not provide good means for functional decomposition and
code reuse this way. (At least not on the client side.)
I'm not convinced. :-)
Alexander.
I admit that I never hit this issue personally (despite implementing a
lot of server-side stuff in Lua). But I remember a few posts on Lua ML
from the people who had to counter string interning costs.
Anyway, as I said, it is an optimization and it should be dealt with
when it is time to think about optimizations.
Alexander.
>> Lua's string hashing is n-limited. for long strings it doesn't use
>> every byte to calculate the hash, so it converges to constant time
> I admit that I never hit this issue personally (despite implementing a
> lot of server-side stuff in Lua). But I remember a few posts on Lua ML
> from the people who had to counter string interning costs.
...Maybe I misremember and they were bothered not by hashing (or not
only by it), but by the copying of memory. OK, let's leave this topic
until it becomes relevant.
After all, for huge data sets the user will still have the classic
Redis API; it is not as if all commands will be rewritten in Lua
anytime soon.
Alexander.
that's understood, and it's how i read Salvatore's "Redis will try to
be smart enough to reuse an interpreter with the command defined".
also note that it's what my 'framework' does, but in Lua instead of
doing it in C.
> combining B w/ LuaJit2 & pre-defined functions is superior in terms of
> performance (avoids [redundant] compilation every EVAL command).
Alexander's point is that LuaJIT2 has three parts:
- the bytecode compiler (similar to plain Lua's, but a different bytecode).
- the bytecode interpreter (usually faster than plain Lua's, but not
the best possible).
- the JIT itself, which translates to machine language (where the magic lies).
all of us agree that repeatedly passing through the first part is
wasteful; but the last part, being a JIT, will only kick in when it
detects a 'hot path'. typically when you have C code that calls small
Lua functions, the JIT is never 'hot' enough, and the Lua code stays
interpreted.
The workaround recommended by Mike Pall (LuaJIT author) is to either
turn the problem inside out (embed the C code in Lua instead of
embedding Lua in the C application); or do the coroutine control
inversion trick, where both sides (Lua and C) think they're the one
driving the execution.
--
Javier
> My understanding was, if you have a lua function named "lfunc",
> luajit2 will turn it into machine-code just before it runs it(JIT-
> compile), and then the next time you call "lfunc", luajit2 is smart
> enough to reuse the machine-code, and this is how luajit2 gets its
> fastest interpreted language in the world moniker.
LuaJIT compiles only hot paths. The bytecode must run several times to
be compiled to machine code.
Alexander.
Half of this thread can be summarized by the above statement ;)
Basically Lua's speed is more than appropriate; especially at this
stage it does not matter at all.
What matters is the API and how to bind Redis to Lua, but I think I
have a good solution for that... see you soon with a topic branch
implementing this! :)
Salvatore
+1, we shouldn't care about the speed of the Lua interpreter here.
Plain old Lua (not JIT) will be fast enough, and even if it isn't, it
will still be possible to replace it with LuaJIT later on.
What we do care about for now is:
* The API
* The overhead of calling a Lua function
Because of the second point, an implementation that creates a new Lua
state at each call will probably not work, and it will also probably be
necessary to cache functions one way or another (several ways to do
this have been given in previous posts).
The way I see it, a typical use case will be that a small number of
functions will be called a large number of times. Writing a
user-defined procedure for a one-shot call is usually not worth it.
Because of this, I was in favor of DEFINE / RUN, but EVAL with caching
is almost the same thing so it works for me too.
--
Pierre Chapuis
> Redis can create a VM (i.e. a state in Lua terminology) per query,
> per client session, or only one. Per query VMs would be probably
> too expensive. Most data servers (SQL or NOSQL) implement
> VMs per client session (RDBMS with stored procedures,
> mongodb, etc ...) to provide client isolation.
Lua states are really cheap as long as you do not have to bind a ton
of dependencies to the state (at least in classic Lua; in LuaJIT you
need to take hot traces into account, as discussed).
> For Redis, we could imagine just one VM sharing its state
> between all clients. It means Lua functions, modules,
> global objects and variables, etc ... would be shared.
> It is lighter, has some benefits but some drawbacks as
> well.
It is a bad idea to have global objects and variables.
Also, you can set up reasonably isolated sandboxes within a single Lua state.
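A minimal sketch of what such a sandbox could look like in Lua 5.1 /
LuaJIT (the whitelist below is only an example, not a recommendation):
------ lua code --------
-- each client chunk runs with its own environment table holding only safe globals
local function make_sandbox()
  return {
    pairs = pairs, ipairs = ipairs, tonumber = tonumber, tostring = tostring,
    string = string, table = table, math = math,
  }
end

local function run_in_sandbox(code, env, ...)
  local chunk = assert(loadstring(code))  -- compile the client's script
  setfenv(chunk, env)                     -- the chunk can only see what is in 'env'
  return chunk(...)
end

-- usage: local env = make_sandbox(); print(run_in_sandbox("return 1 + 1", env))
---------------------------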
> - the client API (i.e. how to call Lua code from the client
> to be executed on the server). How to cache/store Lua code?
> Are stored procedures considered? Are parameter placeholders
> considered (?, $1, :foo, whatever ...) in the Lua code?
Why would one ever need parameter placeholders?
> Will a Lua initialization file be loaded at startup time
> to define functions, global objects, etc ...?
Global objects are a bad idea.
> - the server API (i.e. how to access Redis data or run Redis
> command from the Lua code?). How to stream results built by
> the Lua code to the client? Should the results of Redis
> commands run from the Lua code be converted systematically
> to Lua objects? Lua objects are garbage collected, and
> the Lua garbage collector will block the Redis event loop,
> so it is perhaps not a good idea ...
Lua has an incremental GC; as long as it is properly configured, it
does not block much. (GC configuration should probably be exposed to
the user; optimal values vary depending on the usage patterns.)
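For example, exposing the standard collectgarbage() knobs would be
enough to tune it (the numbers below are placeholders, not
recommendations):
------ lua code --------
collectgarbage("setpause", 110)    -- start a new cycle soon after the previous one finishes
collectgarbage("setstepmul", 300)  -- make the collector run faster relative to allocation
collectgarbage("step")             -- steps can also be driven explicitly from the event loop
---------------------------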
> But in that case,
> how can the Lua code access large query results?
Via userdata objects with iterators and accessor functions.
My 2c,
Alexander.
+1
in most of my Lua bindings, the first iteration is done the naïve way,
creating tables and populating them with the full results; but when the
data size is potentially large, i typically switch soon to userdata
objects and accessor functions. sometimes i then add a metatable layer
to make those objects behave like lazy tables.
since in most cases the Lua code doesn't use all the data, the savings
are considerable.
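A minimal sketch of that lazy-table layer (fetch_field() here stands in
for a C accessor over a userdata result and is purely hypothetical):
------ lua code --------
local function lazy_result(handle, fetch_field)
  return setmetatable({}, {
    __index = function(t, k)
      local v = fetch_field(handle, k)  -- only the fields actually read cross the C boundary
      rawset(t, k, v)                   -- cache the value so repeated reads are cheap
      return v
    end,
  })
end
---------------------------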
--
Javier