About two different designs of the Redis hashes API

36 views
Skip to first unread message

Salvatore Sanfilippo

unread,
Sep 16, 2009, 5:49:45 AM9/16/09
to Redis DB
Hello Redis list,

because I need this feature for a paywork thing I'm going to implement
hashes ad Redis data type. Basically every key can contain a
dictionary itself, and this makes possible to do a lot of exciting
things that Redis currently does not do very well, for instance,
counting items, or creating maps that associate one string to another.

For instance imagine to have a social network where every user can
pick an alias for his friends. I want to call Wozniak as "wozzy", but
every user can select a different alias for the same user, in theory.

So I want that the key userdata:<myuserid>:aliases contains a map
between the real nicks and my aliases. Currently there is no easy, and
efficient way to model this on Redis. So I'm going to implement this
feature in few days starting from today. There is a design problem, as
usually. If I implement an API like this:

HSET mykey key value
KGET mykey key (will return "value")

This is the obvious thing to do, but there is a problem. All the
commands that we already have, operating on keys, can't be used
against keys inside our new hashes. Instead the following API will
allow this:

SET mykey>key value
GET mykey>key
INCR mykey>key
<almost every command> mykey>key

This is powerful, right? But, at a cost. Hash keys aren't binary safe
this way, can't contain spaces or newlines. Or we should resort to
quoting, that's not a good idea at all because the slowdown is huge.
This is why I'm going to use the first interface... this will not
allow initially to have all the goodies like lists inside hash values,
but in the future we may escape this limitations using something like
this:

HOP <mykey> INCR <key>

and so on. HOP is a special command that executes the following
command against an hash. This will not be available for some time
since the on-disk dumping engine is not smart enough to save hashes of
hashes of lists of values or other complex data types for now.

Another issue is... the following

HSET mykey <key> <value>

in order for key and value to be both binary safe strings... we need
to switch the Redis protocol. At the same time I want to guaranteed
backward compatibility with client libraries. So this is my solution.
The Redis server will accept multi-bulk as input. for every kind of
command. So instead to write:

INCR mykey

I can send the following to the server:

*2
$4
INCR
$4
mykey

Client libraries that will switch to the new protocol will expoit the
full power of hashes. Other client libraries will continue to work,
with the limitation that hash keys will continue to be simple strings
without spaces inside.

Please: if you think this design is wrong, this is the right time to tell me.

This is the initial set of commands of hashes.

HSET key value
HGET key
HDEL key
HEXISTS key
HINCR key
HDECR key
HKEYS key (return all the keys as a list)
HVALUES key (return all the values as a list)
HMEMBERS key (return all the key/value pairs, as a list)

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay

Jaime Medrano

unread,
Sep 16, 2009, 6:43:36 AM9/16/09
to redi...@googlegroups.com
Salvatore Sanfilippo wrote:
> Hello Redis list,
>
> because I need this feature for a paywork thing I'm going to implement
> hashes ad Redis data type. Basically every key can contain a
> dictionary itself, and this makes possible to do a lot of exciting
> things that Redis currently does not do very well, for instance,
> counting items, or creating maps that associate one string to another.
>
>

I find it very useful!

> This is the initial set of commands of hashes.
>
> HSET key value
> HGET key
> HDEL key
> HEXISTS key
> HINCR key
> HDECR key
> HKEYS key (return all the keys as a list)
> HVALUES key (return all the values as a list)
> HMEMBERS key (return all the key/value pairs, as a list)
>
>

I'd love to have a HMGET command that behaves like MGET, returning
several values.

I miss also an HLEN command for counting the number of values.

Regards,
Jaime.

Michel Martens

unread,
Sep 16, 2009, 8:49:30 AM9/16/09
to redi...@googlegroups.com
Hey,

On Wed, Sep 16, 2009 at 6:49 AM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
>
> Hello Redis list,
>
> because I need this feature for a paywork thing I'm going to implement
> hashes ad Redis data type. Basically every key can contain a
> dictionary itself, and this makes possible to do a lot of exciting
> things that Redis currently does not do very well, for instance,
> counting items, or creating maps that associate one string to another.
>
> For instance imagine to have a social network where every user can
> pick an alias for his friends. I want to call Wozniak as "wozzy", but
> every user can select a different alias for the same user, in theory.
>
> So I want that the key userdata:<myuserid>:aliases contains a map
> between the real nicks and my aliases. Currently there is no easy, and
> efficient way to model this on Redis. So I'm going to implement this
> feature in few days starting from today. There is a design problem, as
> usually. If I implement an API like this:
>

Can we elaborate a bit about other approaches? What solution did you
find with the current feature set?

I don't like the feature per se, but if it's the only way to model a
problem then let's go for it. What I'm thinking right now
is that unless I face the same limitations you faced, I won't be able
to grasp the problem entirely. I will start thinking on my own
about how to best model it, but if you can share your thoughts that
would be great.

--
Michel

Joe Edelman

unread,
Sep 16, 2009, 9:41:49 AM9/16/09
to redi...@googlegroups.com
I'm currently doing something similar by building a compound key,
MD5ing the inner key and using something like <object-type>.<object-
id>.<field_name>.<md5(inner_key)> as the redis key. Then I use a set
called <object-type>.<object-id>.<field_name>.KEYS to hold all the
keys, so that it's possible to enumerate and delete them.

I got the idea from Ohm (http://ohm.keyvalue.org/).

So I'd be interested to know why that kind of approach doesn't work
for your problem domain.

That said, I think at least a two-level hashtable would serve Redis in
the long run. But I'm not sure if the HOP thing is the way to do it.
It seems to me that both the SELECT, multi-database feature, and the
HOP thing you are talking about, are hack-ish ways of doing something
that could be a little more elegant.

--
J.E. // nxhx.org // 413.570.0001

Salvatore Sanfilippo

unread,
Sep 17, 2009, 6:27:06 AM9/17/09
to redi...@googlegroups.com
On Wed, Sep 16, 2009 at 2:49 PM, Michel Martens <sov...@gmail.com> wrote:

> Can we elaborate a bit about other approaches? What solution did you
> find with the current feature set?

If you really need a map, with the current solution it's possible to
use different tricks. After all Redis exports an hash table, so it's
possible to use top-level keys:

SET mymap:userid:<key> <value>
SET mymap:userid:<otherkey> <value>

and so on. There are different problems with this approach:

1) No support for data containing spaces, and in general for binary
data. It is still possible to use an hash function like SHA1 or MD5 in
order to avoid this limitation, at the cost of computation speed.
2) No way to retrieve only "maymap:userid:*" keys or values, nor to count them.
3) No way to implement a sort opeation on top of this.

> I don't like the feature per se, but if it's the only way to model a
> problem then let's go for it. What I'm thinking right now
> is that unless I face the same limitations you faced, I won't be able
> to grasp the problem entirely. I will start thinking on my own
> about how to best model it, but if you can share your thoughts that
> would be great.

There are tons of trivial problems that can't be addressed well
without this feature. Note that I delayed the addition of hashes as a
data type for months, I like to be sure a feature is not an overkill
because I'm a big fan of simplicity, but it looks like we really need
it.

Even counting word occurrences in different files by different clients
require hashes...
and more importantly hashes where keys are binary safe strings, so the
protocol switch is also very crucial.

Cheers,
Salvatore

> --
> Michel

Salvatore Sanfilippo

unread,
Sep 17, 2009, 7:08:06 AM9/17/09
to redi...@googlegroups.com
On Wed, Sep 16, 2009 at 3:41 PM, Joe Edelman <joe.e...@gmail.com> wrote:
>
> I'm currently doing something similar by building a compound key,
> MD5ing the inner key and using something like <object-type>.<object-
> id>.<field_name>.<md5(inner_key)> as the redis key.  Then I use a set
> called <object-type>.<object-id>.<field_name>.KEYS to hold all the
> keys, so that it's possible to enumerate and delete them.

Yes, I used this approach a lot of times, it works and for large texts
it's the best (memory-wise).

> So I'd be interested to know why that kind of approach doesn't work
> for your problem domain.

You have to perform the MD5 sum of everything. Even if your objects
are 3/4 bytes (numbers mapped to strings) it's required to hash them
into a lot larger hex converted hashes. It's not possible to retrieve
keys at all directly. Otherwise you need another map, from the hash to
the actual value. *A lot* of wasted memory.
There is no way to apply sorting operations. Also locking is hard and
slow, for instance I used this technique some month ago to build a
redis example implementing an inverted index
(http://gist.github.com/120067), in order to avoid using hashes that
are too long I created an unique ID for every different word. How do
you guarantee that there are no race conditions when you get an unique
ID for a given string?

Something like this is required:

def get_unique_id(object,token)
md5 = Digest::MD5.hexdigest(token)
id = R.get("#{object}:#{md5}:id")
return id.to_i if id
id = R.incr("#{object}:next.id")
R.set("#{object}:#{id}:string",token)
if !R.setnx("#{object}:#{md5}:id",id)
# Someone added the new token faster than us.
R.del("#{object}:#{id}:string")
get_token_id(object,token)
else
id.to_i
end
end

Basically, given that Redis is an hash-table per se, everything can be
build in theory, including for instance a BTREE. But... using a lot of
memory and many more operations.

> That said, I think at least a two-level hashtable would serve Redis in
> the long run.  But I'm not sure if the HOP thing is the way to do it.
> It seems to me that both the SELECT, multi-database feature, and the
> HOP thing you are talking about, are hack-ish ways of doing something
> that could be a little more elegant.

The HOP thing is the bit I don't like too, and this is why I'm not
going to implement it for now. I hope a better solution will emerge.

Cheers,
Salvatore

Salvatore Sanfilippo

unread,
Sep 17, 2009, 7:08:44 AM9/17/09
to redi...@googlegroups.com
On Wed, Sep 16, 2009 at 12:43 PM, Jaime Medrano <jmed...@tuenti.com> wrote:

> I'd love to have a HMGET command that behaves like MGET, returning
> several values.
>
> I miss also an HLEN command for counting the number of values.

Indeed all useful additions! Thanks,

Cheers,
Salvatore

> Regards,
> Jaime.
Reply all
Reply to author
Forward
0 new messages