Merge Roaring Bitmap into Druid


Roman Leventov

Dec 2, 2016, 2:02:50 PM
to druid-de...@googlegroups.com, lem...@gmail.com
Since extendedset is going to be merged into Druid, it seems reasonable to me to merge Roaring Bitmap too.

Why:
 - Once all bitmap implementations are in the repo, their interfaces could be unified and the "bitmap" part of bytebuffer-collections could be thrown away.
 - Buffer management could be deeply integrated with the Druid historical runtime. Currently all bitmap implementations allocate fresh on-heap buffers for bitmap intersections and unions, which are performed before the majority of queries. It makes sense to pool those buffers, or at least reuse them within a query (for different segments). But changes that would make this possible are unlikely to be merged into the upstream Roaring Bitmap library.
 - It allows experimenting with intersection/union modes. I suspect that for Druid it might be better to store populated bitmaps (e.g. > 10% of bits set) as simple uncompressed bitmaps as a whole, and when a union/intersection involves at least one such bitmap, merge the roaring bitmaps into that uncompressed bitmap. (This especially makes sense for unions.) Again, it's hard to commit and support that in the upstream library.
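
A rough sketch of what the last two points could look like, not actual Druid or Roaring code. The class and method names (DenseUnionSketch, acquire, unionCardinality, isDense) are made up for illustration, sparse bitmaps are stood in for by plain arrays of row ids, and the scratch buffer is a single long[] reused across calls instead of a real pool:

```java
import java.util.Arrays;

/** Hypothetical sketch: none of these names exist in Druid or RoaringBitmap. */
final class DenseUnionSketch {
    /** Scratch buffer of 64-bit words (one bit per row), reused across calls. */
    private long[] scratch = new long[0];

    /** Returns a zeroed buffer large enough for numRows bits, reusing the old array when it fits. */
    private long[] acquire(int numRows) {
        int words = (numRows + 63) >>> 6;
        if (scratch.length < words) {
            scratch = new long[words];           // grow once, then keep reusing
        } else {
            Arrays.fill(scratch, 0, words, 0L);  // clear only the part we will use
        }
        return scratch;
    }

    /**
     * Unions sparse bitmaps (arrays of set row ids, each id < numRows) into the
     * uncompressed scratch buffer and returns the cardinality of the union.
     */
    int unionCardinality(int numRows, int[][] sparseBitmaps) {
        long[] words = acquire(numRows);
        for (int[] bitmap : sparseBitmaps) {
            for (int row : bitmap) {
                words[row >>> 6] |= 1L << (row & 63);
            }
        }
        int wordCount = (numRows + 63) >>> 6;
        int cardinality = 0;
        for (int i = 0; i < wordCount; i++) {
            cardinality += Long.bitCount(words[i]);
        }
        return cardinality;
    }

    /** The "> 10% of bits are set" rule of thumb from above; the threshold is a guess to be tuned. */
    static boolean isDense(long cardinality, long numRows) {
        return cardinality * 10 > numRows;
    }
}
```

One instance per query (or per processing thread) would let different segments share the same scratch array, so a union over many segments allocates at most once. Whether that actually beats Roaring's own union is exactly the thing that would have to be measured.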

Downsides:
 - We would have to keep an eye on the upstream library and port bug fixes/improvements.

Daniel Lemire

Dec 2, 2016, 2:21:18 PM
to Druid Development, lem...@gmail.com
Speaking for myself, that would be fine. There is certainly nothing that prevents you from doing it.



- Daniel

Gian Merlino

Dec 7, 2016, 10:24:51 PM
to druid-de...@googlegroups.com
Unlike extendedset, Roaring is used by other projects and is under active development, so the downside of moving into Druid is a bit higher. But it could still be worth it. I would vote +1 to merge it into Druid if there were a specific thing we wanted to change, with solid demonstrated benefit, that wasn't going to be accepted upstream. Otherwise I think we should keep using the external Roaring lib.

Gian

--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CAB5L%3Dweb%3DGXOoOELy%3DtAKxa7BXEAhDRN0UV7y6PtXZEnx09g5g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Xavier Léauté

Dec 8, 2016, 12:52:12 PM
to druid-de...@googlegroups.com
Agree with Gian on this one.


Fangjin Yang

Dec 13, 2016, 6:54:56 PM
to Druid Development
Agree with Gian.

