Redis Bitmaps

Gvozden

unread,

Nov 21, 2011, 5:05:50 PM11/21/11

to Redis DB

Hi All,

I wanted to use Redis bit operations with fairly large bitmaps
(~128mb) but relatively
sparse, or with bits grouped together in "hotspots" within a bitmap. I
noticed that, currently,
there is no memory optimization for uses like this. Also, there seems
to be no usual bitmap
operations commands, like get position of first set bit, get number of
set bits, get last set bit,
AND, OR, XOR, NOT of bitmaps, ...

One simple approach could be to implement page-wise set-bit counting.
To optimize the
memory consumption, madvise(MADV_DONTNEED) could be used when the "one
counter"
of a page hit zero.

Any thoughts?

Chears

Josiah Carlson

unread,

Nov 21, 2011, 6:46:28 PM11/21/11

to redi...@googlegroups.com

While it would be awesome if Redis automatically handled zero pages,
I'm not sure that the situation is generally useful enough, even if it
would definitely reduce memory utilization. I'm sure others will
disagree.

Incidentally, the point at which that kind of thing really saves you
memory, is just about the same time that just using a hash is probably
more efficient than un-optimized pages of 0s.

Personally, I like to stick with 2**20-element bitmaps (using 2-24
bits/field), sharding out as necessary.

The other operations (AND, OR, XOR, NOT, ...) are the kind of things
that should really be done with scripting, which you can try out in
unstable right now.

Regards,
- Josiah

> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>
>

Gvozden

unread,

Nov 23, 2011, 5:18:21 AM11/23/11

to Redis DB

I see your points, I just hoped bitmaps will be promoted into the
first class citizen types in Redis.
As for memory issue, one highly advanced, and of course, disabled by
default, option could be to use madvise(MADV_MERGEABLE) flag on all
string allocations in Redis. Simple hack, that could help 1% of users.

Regards,

Joran Greef

unread,

Nov 23, 2011, 9:21:49 AM11/23/11

to redi...@googlegroups.com

Bitmaps can be very space efficient when run-length encoded.

Salvatore Sanfilippo

unread,

Nov 23, 2011, 9:32:46 AM11/23/11

to redi...@googlegroups.com

On Tue, Nov 22, 2011 at 12:46 AM, Josiah Carlson
<josiah....@gmail.com> wrote:

> Personally, I like to stick with 2**20-element bitmaps (using 2-24
> bits/field), sharding out as necessary.

Yes sharding to save memory on sparse bitmaps is probably the way to go.

Have a 128MB bitmap that is not very populate? Just handle that at
application level, splitting it for instance into 4096 keys holding
32k pages each.
So for instance you have bitmap:<count> so that bitmap:0 is from 0 to
4095 bit, bitmap:1 is from 4096 to ... and so forth.

Since for Redis semantics an empty key is an empty bitmap, GETBIT
calls will work as expected, but non populated zones of the bitmap
will not use memory.
It is a tradeoff of course, but could work well, and what is
interesting is that it is a "cluster friendly" solution.

Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele

Josiah Carlson

unread,

Nov 23, 2011, 11:56:13 AM11/23/11

to redi...@googlegroups.com

On Wed, Nov 23, 2011 at 6:32 AM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> On Tue, Nov 22, 2011 at 12:46 AM, Josiah Carlson
> <josiah....@gmail.com> wrote:
>
>> Personally, I like to stick with 2**20-element bitmaps (using 2-24
>> bits/field), sharding out as necessary.
>
> Yes sharding to save memory on sparse bitmaps is probably the way to go.

Incidentally, I did it to improve upon memory fragmentation and
overall performance. I had been populating a 50-70% full bitmap with
400 million entries in an effectively random fashion (I ended up
having something like 6 bitmaps, ranging in total size from 50 megs
for the small ones, to 1.2 gigs for the big one), and needing to
reallocate such a huge chunk of memory is a recipe for disaster.

Regards,
- Josiah

> Have a 128MB bitmap that is not very populate? Just handle that at
> application level, splitting it for instance into 4096 keys holding
> 32k pages each.
> So for instance you have bitmap:<count> so that bitmap:0 is from 0 to
> 4095 bit, bitmap:1 is from 4096 to ... and so forth.
>
> Since for Redis semantics an empty key is an empty bitmap, GETBIT
> calls will work as expected, but non populated zones of the bitmap
> will not use memory.
> It is a tradeoff of course, but could work well, and what is
> interesting is that it is a "cluster friendly" solution.
>
> Salvatore
>
> --
> Salvatore 'antirez' Sanfilippo
> open source developer - VMware
>
> http://invece.org
> "We are what we repeatedly do. Excellence, therefore, is not an act,
> but a habit." -- Aristotele
>

Guy

unread,

Mar 28, 2012, 1:05:11 PM3/28/12

to redi...@googlegroups.com

I see a great need to have a better memory wise implementation of BitMap in case we want to use Redis to store bitmap index for the data.

I think that in that case it will be best to compress the bit maps both for memory and scan time CPU with some Run-Length algorithm. For example EWAH.

Josiah Carlson

unread,

Mar 28, 2012, 4:50:56 PM3/28/12

to redi...@googlegroups.com

The problem with compression of any kind is that bit-level access goes
from O(1) to O(something else greater than 1), and could potentially
induce various kinds of memory thrashing when the bitset needs to be
expanded on a change. The current flat memory model offers the least
surprises.

Also, based on what was described in the link you provided, it would
seem that the EWAH method is only applicable for a sort of named
entity, where ordering doesn't matter. On the other hand, Redis
bitsets are basically a mapping of indexes (or ids) to individual
bits, and so lacks the limited item size requirement to make EWAH
viable. That is to say, EWAH looks to be trying to take intersections
of the form women -> {'jane242', 'jessie7', ...} ^ twitter ->
{'jessie7', ...}, compressing the sets into bitsets, then intersecting
the bitsets. Where Redis strings accessed via bits is fundamentally
different (primarily in the fact that the strings can't be
intersected/unioned/differenced natively).

Regards,
- Josiah

> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.

> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/-sVMhHC0BrgJ.

Salvatore Sanfilippo

unread,

Mar 28, 2012, 5:10:47 PM3/28/12

to redi...@googlegroups.com, redi...@googlegroups.com

+1 on predictable behavior.

The (very) sparse bitmap is partially solvable sharding among keys holding N bits each.

Cheers,
Salvatore

Sent by phone: sorry if this message is short or badly formatted.

Guy

unread,

Mar 28, 2012, 5:28:15 PM3/28/12

to redi...@googlegroups.com

Not having intersect/union... is certainly another limitation of the BitMap implementation.

For the implementation of Bit Map index of a large table, it is best to have a good compression as the BitMaps are rather sparse.

If the table has n (small) distinct value in a column, we can create n bit maps for each of the values, when most of them are 0, and a good Run-Length compression can be very efficient.

The issue of updating the index is relevant, but if the (BitMap) index is used for mostly read of a OLAP database, it is less problematic.

The addition of OR/AND/XOR together with a good compression will make Redis much more suitable for such In-Memory-OLAP applications.

JDelmerico

unread,

May 10, 2012, 8:17:40 AM5/10/12

to Redis DB

+1 for native server-supported bitmap operations.

While it's easy enough to do them outside of Redis you have to incur
the data transfer to the client in order to perform the operations.
If you're trying to do something like an intersection across many 100M
+ bitmaps you end up shipping quite a bit of data to the client.

Scripting might be a reasonable short-term solution, but it would be
nice to have first-class support for bit maps in the product. Would
this also incur a copy, albeit on the server, to perform the
operation?

Thanks for building a very interesting product!

Reply all

Reply to author

Forward