Object sharing is fun!

241 views
Skip to first unread message

Salvatore Sanfilippo

unread,
Mar 25, 2009, 6:59:22 PM3/25/09
to redi...@googlegroups.com
Hello,

some hour ago I implemented a new feature in Redis, object sharing, in
order to save memory when there are duplicated objects in the
database. The following is a short report of how it works since I
think it's pretty fun, involves minimal coding, but the results are
good.

The problem to solve is pretty simple. Redis is fast because the
dataset is (also) taken in memory and only synced un disk
asynchronously. Even if we have more memory, servers with 16GB of RAM
are now common, and it gets cheaper and cheaper, still for Redis
memory is a precious resource.

It is very common to have a lot of objects with the same value.
Imagine to have a large database that maps people's names with cities
where they live. "New York" or "Paris" will be much more common than
my home town "Campobello di Licata" in Sicily. But we are wasting a
lot of memory having a separated object for "New York" as value of
everybody living in New York inside our database.

Redis objects are more or less C structures like this:

struct object {
int type;
void *ptr;
int refcount;
};

When an object is created its refcount is zero. If I reference this
object I call the function incrRefCount() that will increment the
refcount by one. The reverse function, decrRefCount() is called every
time I remove a reference, if the refcount drops to zero the object is
destroyed. So we can have a single object like this with "New York" as
value referenced multiple times! That's exactly what we need to save a
lot of RAM.

But... there is a problem. How to find duplicated objects inside the
data set? This duplicated objects can be everywhere: keys, values,
elements of lists and sets. Do we need a background thread trying to
find objects with the same value in order to merge them into an unique
object? But this requires a lot of work, to handle lockings, CPU time,
and so forth. What we may want instead is to have some kind of pool of
objects that are likely to be duplicated, so every time we create a
new object we can check if we already have this object inside the
dataset, and use it instead of the new one.

Stream algorithms can help us :)

Basically you can see all the objects created as a stream of objects
you can observe. What about if we have a fixed-length dictionary where
we can store this objects? If I see "Paris" I'll add this to my
dictionary of possibly duplicated objects. If "Paris" is created again
I'll check against the dictionary and re-use the object. But we only
have a fixed length pool, how to make sure this pool will not get
bigger and bigger and at the same time to have inside the pool objects
that are probably common?

This is what Redis uses, a modification of a famous stream algorithm:
a) We got an object "foo"
b) Check if it's already in the pool. If yes, reuse the object and
increment a "popularity" counter associated with this object in the
pool
c) If the object is not found, get a random object form the pool and
decrement its popularity. If the popularity reaches zero delete this
object and add our new one instead ("foo" in the example)

In the long run our shared objects pool will be filled with common objects!

Ok that's great, but isn't it a mess to check if an object is already
in the pool every time we create an object? Redis creates a lot of
objects and there are many createObject() calls around in the source
code. Should we pay the big price of the complexity in order to
implement this feature? Not at all!

Actually in Redis objectst that are likely to be duplicated can only
come from two sources: the database file we load, or the command the
user sends to us. The first is just a special case of the latter,
actually, let's focus just on the latter.

When a user connects to the server and issues a command, like "SET foo
bar", in the client structure we create three redis objects, in the
c->argv structure. This objects are passed by reference in order to
reach the SET command implementation, that will add the key "foo" with
value "bar" eventually, with something like dictAdd(argv[1],argv[2]);

Basically every kind of object that is likely to be duplicated has as
source this arguments vector, an array of objects. So the only real
change that was needed is this:

/* Let's try to share objects on the command arguments vector */
if (server.shareobjects) {
int j;
for(j = 1; j < c->argc; j++)
c->argv[j] = tryObjectSharing(c->argv[j]);
}

Trivial!
Even the tryObjectSharing implementation is pretty simple, check it
yourself in the Redis source code if you are interested.

With this little changes there are huge memory savings. 50% in the
first tests I did with duplicated objects, like when you have a lot of
fields that can be "0" or "1". But duplication is everywhere in a
key-val DB. User IDs, timestamps, numbers, and so on.

Currently you can enalbe or disable object sharing in Redis-git but
can't specify the shared objects pool, this will be added soon.
Wanted to share this here since it's one of this features that need a
discussion on the mailing list to link in the documentation ;) This is
really one thing that can be explained to people not very "inside" to
low level programming much better in a tutorial form, and not in
standard documentation.

Regards,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://antirez.com

Organizations which design systems are constrained to produce designs
which are copies of the communication structures of these
organizations.

Conway's Law

Brian Hammond

unread,
Mar 25, 2009, 11:29:49 PM3/25/09
to Redis DB
This is a great idea and the implementation is nice and simple.

Are there new tests too? I didn't see those.

Thanks!
Brian
> Salvatore 'antirez' Sanfilippohttp://antirez.com

Chris Lamb

unread,
Mar 26, 2009, 4:29:00 PM3/26/09
to redi...@googlegroups.com
Salvatore Sanfilippo wrote:

> Redis objects are more or less C structures like this:
>
> struct object {
> int type;
> void *ptr;
> int refcount;
> };

You really want this to be:

struct foo {
int type;
int refcount;
void *ptr;
};

..otherwise your compiler will insert padding to align the pointer correctly
on x86_64 (and most other 64-bit archs).

Your structure uses 24 bytes whilst mine uses 16, a rather significant
saving.


Regards,

--
Chris Lamb, UK ch...@chris-lamb.co.uk
GPG: 0x634F9A20

signature.asc

Chris Lamb

unread,
Apr 16, 2009, 1:18:58 PM4/16/09
to redi...@googlegroups.com
Chris Lamb wrote:

> You really want this to be:

[..]


> ..otherwise your compiler will insert padding to align the pointer
> correctly on x86_64 (and most other 64-bit archs).

Did this get overlooked?

signature.asc

Salvatore Sanfilippo

unread,
Apr 16, 2009, 2:08:09 PM4/16/09
to redi...@googlegroups.com
On Thu, Apr 16, 2009 at 7:18 PM, Chris Lamb <ch...@chris-lamb.co.uk> wrote:
> Chris Lamb wrote:
>
>> You really want this to be:
> [..]
>> ..otherwise your compiler will insert padding to align the pointer
>> correctly on x86_64 (and most other 64-bit archs).
>
> Did this get overlooked?

Hello Chris!
actually in the article the structure was just written for example.
Actually it was broken but fixed later thanks to a patch received.

Thank you very much for the hint that turned out to be useful :)

Cheers,
Salvatore

>
> Regards,
>
> --
> Chris Lamb, UK                                     ch...@chris-lamb.co.uk
>                                                          GPG: 0x634F9A20
>

--
Salvatore 'antirez' Sanfilippo
http://invece.org

Akarin Akaza

unread,
Nov 6, 2022, 9:07:22 AM11/6/22
to Redis DB
I cannot find any functions called tryObjectSharing in Redis today, is it called makeObjectShared now?
Reply all
Reply to author
Forward
0 new messages