I wanted to use Redis bit operations with fairly large bitmaps
(~128mb) but relatively
sparse, or with bits grouped together in "hotspots" within a bitmap. I
noticed that, currently,
there is no memory optimization for uses like this. Also, there seems
to be no usual bitmap
operations commands, like get position of first set bit, get number of
set bits, get last set bit,
AND, OR, XOR, NOT of bitmaps, ...
One simple approach could be to implement page-wise set-bit counting.
To optimize the
memory consumption, madvise(MADV_DONTNEED) could be used when the "one
counter"
of a page hit zero.
Any thoughts?
Chears
Incidentally, the point at which that kind of thing really saves you
memory, is just about the same time that just using a hash is probably
more efficient than un-optimized pages of 0s.
Personally, I like to stick with 2**20-element bitmaps (using 2-24
bits/field), sharding out as necessary.
The other operations (AND, OR, XOR, NOT, ...) are the kind of things
that should really be done with scripting, which you can try out in
unstable right now.
Regards,
- Josiah
> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>
>
Regards,
> Personally, I like to stick with 2**20-element bitmaps (using 2-24
> bits/field), sharding out as necessary.
Yes sharding to save memory on sparse bitmaps is probably the way to go.
Have a 128MB bitmap that is not very populate? Just handle that at
application level, splitting it for instance into 4096 keys holding
32k pages each.
So for instance you have bitmap:<count> so that bitmap:0 is from 0 to
4095 bit, bitmap:1 is from 4096 to ... and so forth.
Since for Redis semantics an empty key is an empty bitmap, GETBIT
calls will work as expected, but non populated zones of the bitmap
will not use memory.
It is a tradeoff of course, but could work well, and what is
interesting is that it is a "cluster friendly" solution.
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele
Incidentally, I did it to improve upon memory fragmentation and
overall performance. I had been populating a 50-70% full bitmap with
400 million entries in an effectively random fashion (I ended up
having something like 6 bitmaps, ranging in total size from 50 megs
for the small ones, to 1.2 gigs for the big one), and needing to
reallocate such a huge chunk of memory is a recipe for disaster.
Regards,
- Josiah
> Have a 128MB bitmap that is not very populate? Just handle that at
> application level, splitting it for instance into 4096 keys holding
> 32k pages each.
> So for instance you have bitmap:<count> so that bitmap:0 is from 0 to
> 4095 bit, bitmap:1 is from 4096 to ... and so forth.
>
> Since for Redis semantics an empty key is an empty bitmap, GETBIT
> calls will work as expected, but non populated zones of the bitmap
> will not use memory.
> It is a tradeoff of course, but could work well, and what is
> interesting is that it is a "cluster friendly" solution.
>
> Salvatore
>
> --
> Salvatore 'antirez' Sanfilippo
> open source developer - VMware
>
> http://invece.org
> "We are what we repeatedly do. Excellence, therefore, is not an act,
> but a habit." -- Aristotele
>
I see a great need to have a better memory wise implementation of BitMap in case we want to use Redis to store bitmap index for the data.
Also, based on what was described in the link you provided, it would
seem that the EWAH method is only applicable for a sort of named
entity, where ordering doesn't matter. On the other hand, Redis
bitsets are basically a mapping of indexes (or ids) to individual
bits, and so lacks the limited item size requirement to make EWAH
viable. That is to say, EWAH looks to be trying to take intersections
of the form women -> {'jane242', 'jessie7', ...} ^ twitter ->
{'jessie7', ...}, compressing the sets into bitsets, then intersecting
the bitsets. Where Redis strings accessed via bits is fundamentally
different (primarily in the fact that the strings can't be
intersected/unioned/differenced natively).
Regards,
- Josiah
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/-sVMhHC0BrgJ.
The (very) sparse bitmap is partially solvable sharding among keys holding N bits each.
Cheers,
Salvatore
Sent by phone: sorry if this message is short or badly formatted.
Not having intersect/union... is certainly another limitation of the BitMap implementation.