Storing a Map in Redis


debasish

Dec 15, 2009, 11:46:32 AM
to Redis DB
Hi -

I have just started looking into Redis, and I am wondering how to store a
Java/Scala Map in Redis. Storing the entries as individual key/value pairs
will not work well, since that takes away a lot of capabilities, like:

- keys containing spaces
- group operations over all keys
- counting all the keys

... etc. Basically this approach disintegrates the Map collection.

Another approach I thought of was to store the map as a list of tuples
[(k1, v1), (k2, v2) ...]. But this involves too much overhead in some of
the operations I want to do, e.g. removing an entry from the map by its
key.

What is the suggested approach for this in Redis?

Thanks for the help.
- Debasish

Salvatore Sanfilippo

Dec 15, 2009, 12:07:37 PM
to redi...@googlegroups.com
On Tue, Dec 15, 2009 at 5:46 PM, debasish <ghosh.d...@gmail.com> wrote:

Hello Debasish,

before I reply, could you please comment a bit more on this?

> - group operations over all keys

Thanks.

P.S. Something that may interest you: the hash type is currently the
nearest thing in the Redis TODO list. It's going to be ready by the end
of January as 1.4-rc1 (yep, the next releases of Redis will be much more
incremental, one or two new things per release; 1.4 is hashes + BLPOP).

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay

debasish

Dec 15, 2009, 12:15:16 PM
to Redis DB
Hi Salvatore -

Maybe 'group operations' was too much of an ask; you can take that to
mean sorting on keys. Initially, when I was thinking about grouping, I
was referring to 'group by' kinds of operations. I will have a look at
the upcoming hashes in 1.4. Just wondering what would be the best way
to do maps in the current version. Would love to hear your thoughts.

Thanks.
- Debasish


Salvatore Sanfilippo

Dec 15, 2009, 12:40:27 PM
to redi...@googlegroups.com
On Tue, Dec 15, 2009 at 6:15 PM, debasish <ghosh.d...@gmail.com> wrote:
> Hi Salvatore -
>
> Maybe 'group operations' was too much of an ask; you can take that to
> mean sorting on keys. Initially, when I was thinking about grouping, I
> was referring to 'group by' kinds of operations. I will have a look at
> the upcoming hashes in 1.4. Just wondering what would be the best way
> to do maps in the current version. Would love to hear your thoughts.

Ok, got it.

For storing a map in Redis, the best approach currently (leaving aside
for a moment the problem of spaces in keys) is to store every map entry
as a top-level key with the given value.

You can count the keys with the DBSIZE command.

To perform range queries against keys, it's better to store the keys
*also* in a sorted set.
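
Something like this (a minimal sketch using the redis-rb client that also
appears later in this thread; the "mymap" naming and the constant score 0
are illustrative assumptions):

require 'rubygems'
require 'redis'

r = Redis.new

# every map entry becomes a top-level key
r.set("mymap:name", "Salvatore")
r.set("mymap:age", "32")

# keep the member keys *also* in a sorted set: with a constant score,
# ZRANGE returns members in lexicographical order, i.e. sorted keys
r.zadd("mymap:keys", 0, "name")
r.zadd("mymap:keys", 0, "age")

r.zcard("mymap:keys")          # => 2, the entries in this one map
r.zrange("mymap:keys", 0, -1)  # => ["age", "name"]
r.dbsize                       # => total keys in the whole database

Note that DBSIZE counts every key in the database, so it counts a single
map only if the map lives alone in its DB; the ZCARD of the index gives
the per-map count.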

But indeed there is the problem of spaces in keys. Actually this is, in
theory, not a problem, since Redis >= 1.1.x supports a new protocol that
can set binary-safe keys, and internally the server never had this limit.
But currently we are not using this feature, since all the clients would
have to be upgraded, so it's officially not supported.

So what to do? Using md5 for key names is an option, but this is
documented much better here:

http://code.google.com/p/redis/wiki/IntroductionToRedisDataTypes

But actually I'm not sure what you really need. From your message it
looks like you have one huge hash that you want to save as a whole in
Redis. Or maybe you have many objects that are represented as Java
hashes, and you want to store every object in Redis?

If it's the latter, I can't see how spaces in keys are a problem; it's
just a matter of avoiding spaces in field names :)
So it's something like this:

object:100:field => value

and so forth (yes, you need some kind of unique ID to use this approach).

This data model has interesting features, including the ability to use
SORT, sorted sets as indexes, and so forth.
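
For example, SORT with BY and GET patterns works naturally against this
layout (a sketch reusing the r connection from above; the "guys" index
set and the field values are illustrative assumptions):

# keep a set with the IDs of all the objects
r.set("guy:1:name", "Salvatore"); r.set("guy:1:age", "32")
r.set("guy:2:name", "Michel");    r.set("guy:2:age", "27")
r.sadd("guys", "1")
r.sadd("guys", "2")

# sort the IDs by each object's age field, returning each object's name
r.sort("guys", by: "guy:*:age", get: "guy:*:name")  # => ["Michel", "Salvatore"]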

Another approach is to use JSON-encoded objects:

object:100 => the object as a JSON string

This will save a lot of memory, but currently there is no support for
accessing individual fields (planned, see the JSON command in the TODO
list).

Also, you can't use SORT with JSON-encoded objects. But you can *still*
use sorted sets as indexes.
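
A sketch of this model and its main limitation (the objects_by_age index
is an illustrative assumption; note the whole-value round trip):

require 'json'

r.set("object:100", { "name" => "Salvatore", "age" => 32 }.to_json)

# no server-side field access: touching one field means reading,
# decoding, modifying and writing back the entire value
obj = JSON.parse(r.get("object:100"))
obj["age"] += 1
r.set("object:100", obj.to_json)

# a sorted set can still index the JSON blobs, e.g. by age
r.zadd("objects_by_age", obj["age"], "object:100")
r.zrangebyscore("objects_by_age", 30, 40)  # => ["object:100"]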

OK, sorry if the reply is a bit of a mess, but I wanted to give some
general background on the matter. If you can describe your needs in more
detail, I'll try to reply in more depth.

While we are at it, I'd like to ask the list: how do you encode your
objects in Redis? JSON? Using Ohm? As object:id:fieldname keys?
Thanks!

debasish

Dec 15, 2009, 1:02:21 PM
to Redis DB
Thanks a lot for the detailed response. From the response, I think
storing the map as top-level key/values makes sense. But I need to
keep the following in mind:

1. md5 of the keys, since I don't have control over the space issue

2. I need to carefully construct the key structure, since keys may
clash across maps. There will be multiple maps, but I can scope them
properly if I use a :-delimited key structure as you suggested

3. Regarding the values, I am using a binary serialization of the
object. I can use JSON as well, and I have the machinery to access
individual JSON fields, so no problems there. Using JSON will also
save space, I guess, as you mentioned. But, yeah, sorting will be a
problem.

4. In my use case there will be multiple maps and lists. I am thinking
of implementing Redis as one of the persistent backing stores for Akka
transactors. We already have Cassandra and MongoDB working there. So
all structures stored in Redis will have an overall transaction scope,
and the key namespace can be controlled by a transaction id. But I
really don't have control over the space issue in keys.

Is there anything above that looks like a Redis anti-pattern?

Cheers.
- Debasish


Michel Martens

Dec 15, 2009, 5:04:01 PM
to redi...@googlegroups.com
Hey,

On Tue, Dec 15, 2009 at 3:02 PM, debasish <ghosh.d...@gmail.com> wrote:
> Thanks a lot for the detailed response. From the response, I think
> storing the map as top-level key/values makes sense. But I need to
> keep the following in mind:
>
> 1. md5 of the keys, since I don't have control over the space issue
>
> 2. I need to carefully construct the key structure, since keys may
> clash across maps. There will be multiple maps, but I can scope them
> properly if I use a :-delimited key structure as you suggested

You can probably automate this process.

> 3. Regarding the values, I am using a binary serialization of the
> object. I can use JSON as well, and I have the machinery to access
> individual JSON fields, so no problems there. Using JSON will also
> save space, I guess, as you mentioned. But, yeah, sorting will be a
> problem.

I'd say storing serialized objects as values takes you away from the
key/value paradigm.

It would be better, if you can, to translate the object to a hash
representation and store each key and value pair.

> 4. In my use case there will be multiple maps and lists. I am thinking
> of implementing Redis as one of the persistent backing stores for Akka
> transactors. We already have Cassandra and MongoDB working there. So
> all structures stored in Redis will have an overall transaction scope,
> and the key namespace can be controlled by a transaction id. But I
> really don't have control over the space issue in keys.

You can check Ohm (http://ohm.keyvalue.org) for inspiration on
serializing an object to a hash representation while avoiding problems
with keys (it runs base64 on user-generated strings, so there are no
problems with spaces).

--
Michel

Salvatore Sanfilippo

Dec 15, 2009, 6:06:09 PM
to redi...@googlegroups.com
On Tue, Dec 15, 2009 at 7:02 PM, debasish <ghosh.d...@gmail.com> wrote:
> Thanks a lot for the detailed response. From the response, I think
> storing the map as top-level key/values makes sense. But I need to
> keep the following in mind:
>
> 1. md5 of the keys, since I don't have control over the space issue
>
> 2. I need to carefully construct the key structure, since keys may
> clash across maps. There will be multiple maps, but I can scope them
> properly if I use a :-delimited key structure as you suggested

As Michel already said, using base64 (or even md5 or sha1) you can pick
a character that can't appear in the base64/md5/sha1 charset and use it
to split the different parts of the key without overlapping problems.
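
A sketch of the escaping idea (the map_key_* helpers and the "map:"
prefix are invented for illustration; Base64.strict_encode64 assumes
Ruby >= 1.9):

require 'digest/md5'
require 'base64'

# md5 hex digests use only [0-9a-f], so ':' can never appear in them
# and is a safe delimiter (but the original key is not recoverable)
def map_key_md5(map_id, user_key)
  "map:#{map_id}:#{Digest::MD5.hexdigest(user_key)}"
end

# base64 uses only [A-Za-z0-9+/=], so ':' is safe here too, and the
# original key can be decoded back out of the Redis key
def map_key_b64(map_id, user_key)
  "map:#{map_id}:#{Base64.strict_encode64(user_key)}"
end

map_key_b64(42, "a key with spaces")  # => "map:42:YSBrZXkgd2l0aCBzcGFjZXM="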

> 3. Regarding the values, I am using a binary serialization of the
> object. I can use JSON as well, and I have the machinery to access
> individual JSON fields, so no problems there. Using JSON will also
> save space, I guess, as you mentioned. But, yeah, sorting will be a
> problem.

For sorting you still have the sorted-set strategy, which after all is
the way to go when possible.
About JSON, it's all about your dataset: if memory is a *very* important
concern, JSON can save an impressive amount of space, even if
every-key-a-field is a more polite way to store data in theory.

Btw, there is something I want to implement in Redis hashes: special
encoding. The idea was born with sets: there is a plan to add specially
encoded sets that are very space efficient. For instance, when we have
sets of integers, what prevents us from storing these sets in a special,
very compact form? The same goes for hashes: if we have a hash with 20
small fields (both keys and values), as in most practical usages, what
prevents us from storing these objects in a special space-efficient way,
paying just a small CPU hit? And all this fully transparent from the
user's point of view.

So soon Redis will get hashes and sets that are space efficient while
the encoding stays fully opaque. The user will deal with a clean
environment of binary-safe keys and strings, while under the hood Redis
will encode the whole hash in a space-efficient way. Given that we'll do
this only for small numbers of keys and short keys/values, it will still
be an O(1) business asymptotically.
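
(This is what eventually shipped as the transparent ziplist/intset
encodings. A sketch of how it looks from the client side, using the HSET
and OBJECT ENCODING commands of later Redis versions; their availability
on a 2009-era server is an assumption:)

require 'rubygems'
require 'redis'
r = Redis.new

r.hset("guy:1", "name", "Salvatore")
r.hset("guy:1", "age", "32")

# the hash behaves like any other hash...
r.hget("guy:1", "name")         # => "Salvatore"

# ...but, being small, it is stored in the compact representation
r.object("encoding", "guy:1")   # => "ziplist" ("listpack" on recent servers)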

OK, let's get back to the crude fact that we still lack this stuff :)

For now the best advice is: use JSON if you have a lot of objects, the
objects are small, and you can live without SORT but can use sorted
sets. Otherwise use a key for every field of every object.

> 4. In my use case there will be multiple maps and lists. I am thinking
> of implementing Redis as one of the persistent backing stores for Akka
> transactors. We already have Cassandra and MongoDB working there. So
> all structures stored in Redis will have an overall transaction scope,
> and the key namespace can be controlled by a transaction id. But I
> really don't have control over the space issue in keys.
>
> Is there anything above that looks like a Redis anti-pattern?

Not really. It's a bit of a strange use case, as usually you can control
the keys, but given that in this specific context you can't, I think
there isn't any anti-pattern here.

For any help, I'm here.

Salvatore Sanfilippo

Dec 15, 2009, 6:27:32 PM
to redi...@googlegroups.com
On Wed, Dec 16, 2009 at 12:06 AM, Salvatore Sanfilippo
<ant...@gmail.com> wrote:

Hello Again,

I was not happy with just telling you that JSON will use less memory. I
wanted to show how much less memory, and at the same time how much
memory we can save with specially encoded hashes and sets. So I wrote
the following simple Ruby program:

require 'rubygems'
require 'redis'
require 'json'

def main(opts={})
  obj = {
    "name"    => "Salvatore",
    "surname" => "Sanfilippo",
    "tel"     => "123456789",
    "age"     => "32",
    "aka"     => "antirez"
  }
  r = Redis.new(opts)
  (1..10000).each{|id|
    if false  # flip to true for the every-field-a-key layout
      obj.each{|k,v|
        r["guy:#{id}:#{k}"] = v
      }
    else
      r["guy:#{id}"] = obj.to_json
    end
  }
end

main

With "if true" it will build 10000 objects where every field is a key:

guy:1:name => Salvatore
guy:1:surname => Sanfilippo
and so forth

With "if false" it will instead encode each object as JSON:

guy:1 => the JSON-encoded fields

How much memory do the first and the second layout take?

every-field-a-key: 9841136 (984 bytes per object)
json-encoded: 3671744 (367 bytes per object)

The raw object length is 88 bytes encoded as JSON, not counting some
space for the key.

So the difference is pretty impressive indeed. If you wonder why there
is so much overhead between the raw data and even the JSON object: there
are two Redis objects (16 bytes * 2), one for the key and one for the
value, plus two sds objects for the dynamic strings, an overhead of
another 16*2 bytes (64 already, and 88+64 = 152); add to this the hash
table buckets (with 10000 keys maybe 1/3 of them are empty) plus the
hash table metadata, and you get the picture.

Sergey Shepelev

Dec 15, 2009, 6:59:32 PM
to redi...@googlegroups.com
These are totally incomparable things. That's like comparing how much
more space a raw text file takes than a gzipped one.

You can atomically change individual fields with the type:id:key scheme.
You can atomically replace a whole dict (map/hash) with the type:id
scheme.

They are meant to be used in different scenarios. If someone keeps a
large library of uncompressed books, it's not that he gains some
advantage, he is just doing it wrong. Conversely, if someone stores a
blog post like {"number-of-comments": 50, "comments": [ {..}, ... ]} and
does some external locking to atomically update the list of comments and
the counter, he doesn't gain the advantage of saving space, he is just
doing it wrong.

What is there to compare here?


Salvatore Sanfilippo

Dec 16, 2009, 5:08:24 AM
to redi...@googlegroups.com
On Wed, Dec 16, 2009 at 12:59 AM, Sergey Shepelev <tem...@gmail.com> wrote:

> They are meant to be used in different scenarios. If someone keeps a
> large library of uncompressed books, it's not that he gains some
> advantage, he is just doing it wrong. Conversely, if someone stores a
> blog post like {"number-of-comments": 50, "comments": [ {..}, ... ]}
> and does some external locking to atomically update the list of
> comments and the counter, he doesn't gain the advantage of saving
> space, he is just doing it wrong.
>
> What is there to compare here?

Hello Sergey,

indeed you are right; it's like saying that the string "a,b,c" is a
linked list, when actually it's not. The every-field-one-key approach
also allows you to (see the sketch after this list):

- have expires on single fields
- use lists, sets, and sorted sets as fields
- use atomic operations like INCR on fields
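
A quick sketch of those three points (key names invented for
illustration):

require 'rubygems'
require 'redis'
r = Redis.new

r.set("guy:1:pageviews", 0)
r.incr("guy:1:pageviews")        # atomic increment, no read-modify-write
r.set("guy:1:session", "abc123")
r.expire("guy:1:session", 60)    # a single field can expire on its own
r.rpush("guy:1:comments", "hi")  # a field can be a whole list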

It's another world. But my goal was not to compare two solutions that
are meant to be the same.
The Redis wizard must know about these tradeoffs and do the best thing.
Maybe the objects of a given dataset are very numerous and small, there
is no need to access them by field or perform atomic operations on
fields, and there are memory constraints. When this happens, encoding
the object as a single value (be it JSON, Ruby Marshal output, and so
forth) is a good option.

I also wanted to compare for another reason. Storing the fields of
objects in hashes is not as powerful as storing fields in keys, but it's
a compromise, and this time Redis will be able to *transparently*
perform the special encoding. I wanted to get an idea of what the memory
saving will look like.

I think the first Redis pattern is: there are no fixed rules :)
You have a few abstract data types, and you can use your creativity and
design skills to get the most out of them for the problem you are trying
to solve.

Sergey Shepelev

Dec 16, 2009, 5:53:12 AM
to redi...@googlegroups.com
I see.

> I think the first Redis pattern is: there are no fixed rules :)
> You have a few abstract data types, and you can use your creativity and
> design skills to get the most out of them for the problem you are trying
> to solve.
>

Care to write that down in the wiki or somewhere?

>