Memory saving ID generation patterns (base 36)


Aníbal Rojas

Oct 29, 2009, 3:08:23 PM
to Redis DB
Hello,

Following the pattern described in http://code.google.com/p/redis/wiki/TwitterAlikeExample

INCR global:nextUserId => 1000
SET uid:1000:username antirez
SET uid:1000:password p1pp0

This avoids using a potentially long identifier as the ID and
lowers the memory required to store the data.

We started generating IDs as consecutive integers. We also handle
a lot of legacy IDs that are large integers.

Something we realized is that we were wasting a lot of space using
base 10 integers, when we could easily use a TinyURL-like approach to
represent the integer IDs.

Ruby examples:

$ ruby -e "puts 1000.to_s(36)"
rs
$ ruby -e "puts 1000000.to_s(36)"
lfls
$ ruby -e "puts 9999999.to_s(36)"
5yc1r

This is very simple to accomplish, and the memory saving was
_noticeable_ (you can do the math) as soon as we modified the data to
follow this pattern.
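
As a minimal sketch of that math, the snippet below builds the same key with a base 10 ID and a base 36 ID and counts the bytes saved per key. The `key_for` helper is hypothetical (it is not part of Redis or of any client library), and only the characters in the key are counted, not any per-key overhead inside Redis.

```ruby
# Hypothetical helper: builds a key such as "uid:rs:username".
def key_for(id, field, base = 10)
  "uid:#{id.to_s(base)}:#{field}"
end

[1000, 1_000_000, 9_999_999].each do |id|
  decimal = key_for(id, "username")      # e.g. "uid:1000:username"
  compact = key_for(id, "username", 36)  # e.g. "uid:rs:username"
  saved   = decimal.bytesize - compact.bytesize
  puts "#{decimal} -> #{compact} (#{saved} bytes saved)"
end

# The encoding is reversible: String#to_i(36) recovers the integer.
puts "rs".to_i(36)
```

The saving per key is only a few bytes, so it matters mostly when the same ID is repeated across many keys, sets and lists.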

As I understand it, Redis 1.1 will include an integer optimization
feature, but I suppose it will only be able to handle plain integers,
not integers surrounded by other letters.

Is there a better approach for this?


--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas

Salvatore Sanfilippo

Oct 29, 2009, 3:26:16 PM
to redi...@googlegroups.com
2009/10/29 Aníbal Rojas <aniba...@gmail.com>:

>    Something we realized is that we were wasting a lot of space using
> base 10 integers, when we could easily use a TinyURL-like approach to
> represent the integer IDs.

Yes, using radix-36 integers can be a nice idea, but if you run some
tests you'll discover that most of the memory is actually used by
pointers and Redis object structures, not by the actual data, so the
difference will be very small, but still better than nothing :)

>    As I understand it, Redis 1.1 will include an integer optimization
> feature, but I suppose it will only be able to handle plain integers,
> not integers surrounded by other letters.

Indeed, this will only work for integers without other letters in the
middle. It can save a lot of space, since it's not only a matter of
strings vs. integers: some metadata is also saved (at least 16
bytes per object).
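
A rough way to picture that rule is sketched below, assuming a value qualifies only when the whole string is a decimal integer. The `plain_integer?` predicate is hypothetical and is not the check Redis actually performs; it only shows why a base-36 ID or a prefixed string would stay a plain string.

```ruby
# Hypothetical predicate: true only when the whole string is a decimal integer.
# This models the rule described above; it is not Redis' real implementation.
def plain_integer?(value)
  value.match?(/\A-?\d+\z/)
end

["1000", "rs", "uid:1000", "1000:username"].each do |value|
  kind = plain_integer?(value) ? "integer candidate" : "plain string"
  puts "#{value.inspect}: #{kind}"
end
```

Under this assumption there is a trade-off: encoding an ID in base 36 shortens the string but gives up the integer encoding for that value.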

A better approach requires the Hash data type, currently not
available. Instead of using a key such as:

user:1000:username

it will be possible to keep a hash called "Users" where the field
1000 maps to "antirez". This avoids repeating "user" and allows the
integer encoding to be used. A big win.

Probably we'll need to wait for 1.2 in order to use this.
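
To illustrate the difference between the two layouts, here is a small sketch that models both with plain Ruby Hash objects and counts the bytes spent on key names. It does not talk to Redis and says nothing about the final Hash API; the second user (1001 -> "anibal") is made-up sample data.

```ruby
# Flat layout: one top-level key per user, with the prefix and the
# attribute name repeated every time.
flat = {
  "user:1000:username" => "antirez",
  "user:1001:username" => "anibal"   # made-up sample entry
}

# Hash layout, modeled with a nested Ruby Hash: a single "Users" key
# whose fields are plain integer IDs.
nested = {
  "Users" => { "1000" => "antirez", "1001" => "anibal" }
}

flat_bytes   = flat.keys.sum(&:bytesize)
nested_bytes = nested.keys.sum(&:bytesize) + nested["Users"].keys.sum(&:bytesize)

puts "flat layout key bytes:   #{flat_bytes}"
puts "nested layout key bytes: #{nested_bytes}"
```

The gap grows with every user added, because the flat layout repeats the prefix and attribute name while the nested layout only adds the ID.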

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay

Aníbal Rojas

Oct 29, 2009, 5:28:52 PM
to redi...@googlegroups.com
Salvatore,

> Yes, using radix-36 integers can be a nice idea, but if you run some
> tests you'll discover that most of the memory is actually used by
> pointers and Redis object structures, not by the actual data, so the
> difference will be very small, but still better than nothing :)

I made the mistake of erasing the previous DB, so I can't compare now :(

I am almost sure the difference was not that small, but let me check.

>>    As I understand it, Redis 1.1 will include an integer optimization
>> feature, but I suppose it will only be able to handle plain integers,
>> not integers surrounded by other letters.

> Indeed, this will only work for integers without other letters in the
> middle. It can save a lot of space, since it's not only a matter of
> strings vs. integers: some metadata is also saved (at least 16
> bytes per object).

I will find time to build from HEAD and compare both solutions (not only
with raw keys, but also with sets and lists).

> A better approach requires the Hash data type, currently not
> available. Instead of using a key such as:
>
> user:1000:username
>
> it will be possible to keep a hash called "Users" where the field
> 1000 maps to "antirez". This avoids repeating "user" and allows the
> integer encoding to be used. A big win.

1.2 is fine, ZSET is way more important ;-)

Best regards,

--
Aníbal

Brian Hammond

Oct 30, 2009, 4:52:17 PM
to Redis DB
Why base 36?

On Oct 29, 3:08 pm, Aníbal Rojas <anibalro...@gmail.com> wrote:
> Hello,
>
>     Following the pattern described in http://code.google.com/p/redis/wiki/TwitterAlikeExample