Better compressed bitmaps for Druid.io with the Roaring Bitmap library?


Samy Chambi

unread,
May 28, 2014, 10:47:59 PM5/28/14
to druid-de...@googlegroups.com
Druid makes extensive use of compressed bitmaps. Specifically, Druid uses the state-of-the-art Concise format (Colantonio and Di Pietro, 2010). We recently proposed and benchmarked an alternative format, Roaring bitmaps. We believe Roaring bitmaps to be often superior to Concise bitmaps, and would like to collaborate with the Druid community to see if Druid can benefit from our work.

Roaring is a bitmap compression scheme that stores 32-bit integers in a compact and efficient hybrid data structure, composed of short arrays and bitmaps. The scheme was coded in Java. The source code and the white paper can be accessed from our main site  http://roaringbitmap.org/ and from GitHub https://github.com/lemire/RoaringBitmap. Our Java library is part of the Maven repository. 

The version of Concise used by Druid is designed to work on memory-mapped files. We have reproduced this good idea in our Roaring library. We have classes such as ImmutableRoaringBitmap and MutableRoaringBitmap.
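As a minimal sketch of that workflow (the class and file names here are illustrative only, not Druid code): build a bitmap on the heap with MutableRoaringBitmap, serialize it to a file, then map the file and read it back through ImmutableRoaringBitmap without copying the data onto the Java heap.

import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
import org.roaringbitmap.buffer.MutableRoaringBitmap;

public class MappedRoaringSketch {
    public static void main(String[] args) throws IOException {
        // Build a bitmap on the heap and serialize it to disk.
        MutableRoaringBitmap mutable = MutableRoaringBitmap.bitmapOf(1, 2, 3, 1000, 1 << 20);
        File file = File.createTempFile("roaring", ".bin");
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            mutable.serialize(out);
        }
        // Memory-map the file and wrap it, without copying the data onto the Java heap.
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            MappedByteBuffer buffer = raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, file.length());
            ImmutableRoaringBitmap mapped = new ImmutableRoaringBitmap(buffer);
            System.out.println("cardinality = " + mapped.getCardinality());
        }
    }
}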

We wanted to benchmark Concise vs. Roaring on memory-mapped file. The benchmark's code and the real data sets are on GitHub: https://githubd.com/samytto/MemoryMappedBitmaps. (You can easily reproduce them by following the README.md file).

The benchmark uses data from three real data sets. For each data set, we create 200 bitmaps with each of the two compression schemes. These bitmaps are then serialized to disk and memory-mapped. After warming the cache, we repeat each experiment 100 times and report average results. Each experiment compares the Java on-heap memory usage of the two schemes, the disk space consumed by their respective files, and their average time to perform scans, unions and intersections over the 200 bitmaps.
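(The measurement pattern itself is nothing exotic; a simplified sketch of it, with hypothetical names and not the actual benchmark code from the repository, looks like this:)

public class TimingSketch {
    // Average the running time of an experiment after a warm-up phase.
    static long averageNanos(Runnable experiment, int warmups, int repetitions) {
        for (int i = 0; i < warmups; i++) {
            experiment.run();            // warm up the cache and the JIT
        }
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            experiment.run();            // e.g., intersect or union the 200 bitmaps
        }
        return (System.nanoTime() - start) / repetitions;
    }
}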

Results show that the current version of Roaring uses a little more on-heap memory than Concise but performs scans, unions and intersections much faster. Of course, the results will vary depending on the data and the benchmark, but in our published benchmark, we get the following results:

- Roaring is 20 to 100 times faster than Concise on unions
- Roaring is 40 to 200 times faster than Concise on intersections
- Roaring can scan the set bits up to 3 times faster than Concise
- Roaring can offer better compression (up to 1.6x better than Concise)
- Roaring never uses more than 300 bytes per bitmap 

These results complement the results we already had  with on-heap bitmaps (see http://arxiv.org/abs/1402.6407). In some of these benchmarks, we found that Roaring could be 1000x faster than Concise.

It is important to realize that a roaring data structure supports fast random access and fast updates, unlike Concise. 

We think that Roaring could be interesting for Druid. If needed, we are willing to work with Druid so that our Java library fits the needs of Druid. Naturally, we are inviting collaboration on our own library as well (https://github.com/lemire/RoaringBitmap).

This code is licensed under the Apache License, Version 2.0.

Best regards !

Samy for the Roaring team.
Universite du Quebec, Montreal, Canada



========= Appendix : benchmark output on an AMD Bulldozer Linux machine ========

Results interpretation ::
RAM Size = the required RAM space, in KB and bytes/bitmap, to store the 200 bitmaps
Disk Size = the required disk space, in MB and KB/bitmap, to store the 200 serialized bitmaps
Unions time = average time in ms to compute the union of 200 bitmaps
Horizontal unions time = average time in ms to compute the horizontal union of 200 bitmaps
Intersections time = average time in ms to compute the intersection of 200 bitmaps
Scans time = average time in ms to scan the 200 bitmaps


***************************
Roaring bitmap on census1881.csv dataset
***************************
RAM Size = 45,63 KB (233,60 bytes/bitmap)
Disk Size = 1,91 MB (9,76  KB/bitmap)
Horizontal unions time = 4 ms
Intersections time = 0.07 ms
Scans time = 13 ms
.ignore = 464741562
***************************
ConciseSet on census1881.csv dataset
***************************
RAM Size = 15,63 KB (80,00 bytes/bitmap)
Disk Size = 3,06 MB (15,66  KB/bitmap)
Unions time = 299 ms
Intersections time = 8 ms
Scans time = 48 ms
.ignore = 1574010228



***************************
Roaring bitmap on census-income.csv dataset
***************************
RAM Size = 38,88 KB (199,04 bytes/bitmap)
Disk Size = 2,26 MB (11,57  KB/bitmap)
Horizontal unions time = 4 ms
Intersections time = 0.05 ms
Scans time = 106 ms
.ignore = 1784600922
***************************
ConciseSet on census-income.csv dataset
***************************
RAM Size = 15,63 KB (80,00 bytes/bitmap)
Disk Size = 2,42 MB (12,40  KB/bitmap)
Unions time = 90 ms
Intersections time = 2 ms
Scans time = 122 ms
.ignore = -1673566912


***************************
Roaring bitmap on weather_sept_85.csv dataset
***************************
RAM Size = 55,29 KB (283,08 bytes/bitmap)
Disk Size = 8,33 MB (42,67  KB/bitmap)
Horizontal unions time = 6 ms
Intersections time = 0.1 ms
Scans time = 196 ms
.ignore = 1632713838
***************************
ConciseSet on weather_sept_85.csv dataset
***************************
RAM Size = 15,63 KB (80,00 bytes/bitmap)
Disk Size = 9,02 MB (46,20  KB/bitmap)
Unions time = 655 ms
Intersections time = 18 ms
Scans time = 303 ms
.ignore = -2042879588

Samy Chambi

unread,
May 29, 2014, 1:14:16 AM5/29/14
to druid-de...@googlegroups.com
Hi guys,

I'm sorry, there was a typo in the text. Here is the link to the memory-mapped files benchmarks on github: https://github.com/samytto/MemoryMappedBitmaps.

Best regards !
Samy.

Fangjin Yang

unread,
Jun 2, 2014, 1:39:25 PM6/2/14
to druid-de...@googlegroups.com
Hi Samy, thanks for the info! This is def interesting and I hope we can find some time to collaborate to try out roaring bitmaps in Druid.

Samy Chambi

unread,
Jun 3, 2014, 2:15:31 PM6/3/14
to druid-de...@googlegroups.com
Hi Fangjin,

Thanks for the feedback!

We are very interested in collaborating on trying out Roaring bitmaps in Druid. We believe it will improve query performance in most cases. If there are any instructions or suggestions, we'll be ready to work on them!

Best regards !
Samy.

Fangjin Yang

unread,
Jun 3, 2014, 3:54:23 PM6/3/14
to druid-de...@googlegroups.com
Hi Samy, in the next few days I'll try to write up something more comprehensive about experimenting with swapping out the bitmap index structures in Druid.

-- FJ

Samy Chambi

unread,
Jun 9, 2014, 2:11:05 PM6/9/14
to druid-de...@googlegroups.com
Hi Fangjin,

Ok, excellent! I'm looking forward to your news.

- Samy.

Fangjin Yang

unread,
Jun 10, 2014, 2:14:09 PM6/10/14
to druid-de...@googlegroups.com
Hi Samy, on the read side of things, most of the concise set code is in druid.processing. Within there you'll find ConciseOffset, which is the concise implementation of an 'Offset' object in Druid (pretty much an index into an array). The other class to look at is ConciseCompressedIndexedInts which provides serde classes for Immutable Concise sets and ways to iterate through them.

On writing and generating concise sets, most of the code is in IndexMerger. If you were to replace the pieces of code in these classes to work with Roaring bitmaps, things should work for newly created segments. We'd have to talk about making things backwards compatible for old segments.
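(Roughly, a Roaring-backed Offset would just walk the set bits of a memory-mapped ImmutableRoaringBitmap. A hypothetical sketch, where only the RoaringBitmap calls are taken from the library and the Offset-like method names are assumed:)

import org.roaringbitmap.IntIterator;
import org.roaringbitmap.buffer.ImmutableRoaringBitmap;

// Hypothetical sketch: a cursor over the set bits of a Roaring bitmap,
// mirroring what ConciseOffset does for Concise (current row id + advance).
class RoaringOffsetSketch {
    private final IntIterator iterator;
    private int current;
    private boolean valid;

    RoaringOffsetSketch(ImmutableRoaringBitmap bitmap) {
        this.iterator = bitmap.getIntIterator();  // set bits in increasing order
        increment();
    }

    void increment() {                            // advance to the next set bit
        if (iterator.hasNext()) {
            current = iterator.next();
            valid = true;
        } else {
            valid = false;
        }
    }

    boolean withinBounds() { return valid; }      // any rows left?

    int getOffset() { return current; }           // current row id
}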

Let me know if this helps.

FJ



Samy Chambi

unread,
Jun 10, 2014, 6:18:56 PM6/10/14
to druid-de...@googlegroups.com
Hi Fangjin,

Thank you for these pointers. I will try to replace the recommended pieces of code to use Roaring bitmaps. I'll keep you posted.

Best regards!
- Samy.  

Fangjin Yang

unread,
Jun 11, 2014, 1:07:50 AM6/11/14
to druid-de...@googlegroups.com
Cool, let us know if you have more questions.

Samy Chambi

unread,
Jun 13, 2014, 11:29:47 PM6/13/14
to druid-de...@googlegroups.com
Hi Fangjin,

We are talking about the method ImmutableConciseSet.compareTo(), which is used in this class and defined for ImmutableConciseSet objects.

We would have to implement such a method for Roaring, but we need to know what ordering it relies on and for what purpose this method is used.

Thanks.

- Samy.

Daniel Lemire

unread,
Jun 16, 2014, 10:22:24 AM6/16/14
to druid-de...@googlegroups.com
Indeed, what is the purpose of these bitmap sorting methods within druid?

Fangjin Yang

unread,
Jun 16, 2014, 1:26:06 PM6/16/14
to druid-de...@googlegroups.com
Hey guys, I believe you don't need to actually implement this. If you have a Roaring indexed ints, it only needs to extend IndexedInts and not Comparable.

- FJ

Daniel Lemire

unread,
Jun 16, 2014, 1:32:08 PM6/16/14
to druid-de...@googlegroups.com
Excellent. 

Charles Allen

unread,
Sep 3, 2014, 4:08:02 PM9/3/14
to druid-de...@googlegroups.com
Ran the benchmark on my work laptop. FYI:

$ sysctl -a | grep machdep.cpu

 

machdep.cpu.max_basic: 13
machdep.cpu.max_ext: 2147483656
machdep.cpu.vendor: GenuineIntel
machdep.cpu.brand_string: Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz
machdep.cpu.family: 6
machdep.cpu.model: 58
machdep.cpu.extmodel: 3
machdep.cpu.extfamily: 0
machdep.cpu.stepping: 9
machdep.cpu.feature_bits: 3219913727 2142954495
machdep.cpu.leaf7_feature_bits: 641
machdep.cpu.extfeature_bits: 672139520 1
machdep.cpu.signature: 198313
machdep.cpu.brand: 0
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC POPCNT AES PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: SMEP ENFSTRG RDWRFSGS
machdep.cpu.extfeatures: SYSCALL XD EM64T LAHF RDTSCP TSCI
machdep.cpu.logical_per_package: 16
machdep.cpu.cores_per_package: 8
machdep.cpu.microcode_version: 21
machdep.cpu.processor_flag: 4
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.linesize_max: 64
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.sub_Cstates: 135456
machdep.cpu.thermal.sensor: 1
machdep.cpu.thermal.dynamic_acceleration: 1
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.thresholds: 2
machdep.cpu.thermal.ACNT_MCNT: 1
machdep.cpu.thermal.core_power_limits: 1
machdep.cpu.thermal.fine_grain_clock_mod: 1
machdep.cpu.thermal.package_thermal_intr: 1
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.energy_policy: 0
machdep.cpu.xsave.extended_state: 7 832 832 0
machdep.cpu.arch_perf.version: 3
machdep.cpu.arch_perf.number: 4
machdep.cpu.arch_perf.width: 48
machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.events: 0
machdep.cpu.arch_perf.fixed_number: 3
machdep.cpu.arch_perf.fixed_width: 48
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.L2_associativity: 8
machdep.cpu.cache.size: 256
machdep.cpu.tlb.inst.small: 64
machdep.cpu.tlb.inst.large: 8
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.data.large: 32
machdep.cpu.tlb.shared: 512
machdep.cpu.address_bits.physical: 36
machdep.cpu.address_bits.virtual: 48
machdep.cpu.core_count: 4
machdep.cpu.thread_count: 8


$ java -cp MemoryMappedFiles-0.0.1-SNAPSHOT-jar-with-dependencies.jar:lib/*  -XX:+UseG1GC -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:AllocatePrefetchLines=5 -XX:AllocatePrefetchStyle=3 -server  -javaagent:./lib/SizeOf-1.0-SNAPSHOT.jar Benchmark ../real-roaring-datasets


Java runs single-threaded

Results:


objc[78140]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/bin/java and /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be used. Which one is undefined.

JAVAGENT: call premain instrumentation for class SizeOf

Results interpretation ::
RAM Size = the required RAM space, in KB and bytes/bitmap, to store the 200 bitmaps
Disk Size = the required disk space, in MB and KB/bitmap, to store the 200 serialized bitmaps
Horizontal unions time = average time in ms to compute the horizontal union of 200 bitmaps
Intersections time = average time in ms to compute the intersection of 200 bitmaps
Scans time = average time in ms to scan the 200 bitmaps

***************************
Roaring bitmap on census1881.csv dataset
***************************
RAM Size = 29.77 KB (152.40 bytes/bitmap)
Disk Size = 1.91 MB (9.76  KB/bitmap)
Horizontal unions time = 4 ms
Intersections time = 0.06 ms
Scans time = 12 ms
.ignore = 464741562
***************************
ConciseSet on census1881.csv dataset
***************************
RAM Size = 15.63 KB (80.00 bytes/bitmap)
Disk Size = 3.06 MB (15.66  KB/bitmap)
Unions time = 269 ms
Intersections time = 6 ms
Scans time = 41 ms
.ignore = 1574010228
***************************
Roaring bitmap on census-income.csv dataset
***************************
RAM Size = 26.50 KB (135.68 bytes/bitmap)
Disk Size = 2.26 MB (11.57  KB/bitmap)
Horizontal unions time = 4 ms
Intersections time = 0.06 ms
Scans time = 100 ms
.ignore = 1784600922
***************************
ConciseSet on census-income.csv dataset
***************************
RAM Size = 15.63 KB (80.00 bytes/bitmap)
Disk Size = 2.42 MB (12.40  KB/bitmap)
Unions time = 81 ms
Intersections time = 1 ms
Scans time = 119 ms
.ignore = -1673566912
***************************
Roaring bitmap on weather_sept_85.csv dataset
***************************
RAM Size = 34.66 KB (177.48 bytes/bitmap)
Disk Size = 8.33 MB (42.67  KB/bitmap)
Horizontal unions time = 4 ms
Intersections time = 0.09 ms
Scans time = 181 ms
.ignore = 544521326


Charles Allen

unread,
Sep 3, 2014, 4:23:54 PM9/3/14
to druid-de...@googlegroups.com








***************************
ConciseSet on weather_sept_85.csv dataset
***************************
RAM Size = 15.63 KB (80.00 bytes/bitmap)
Disk Size = 9.02 MB (46.20  KB/bitmap)
Unions time = 593 ms
Intersections time = 13 ms
Scans time = 280 ms
.ignore = 1163895196

Steve Harris

unread,
Sep 3, 2014, 4:46:52 PM9/3/14
to druid-de...@googlegroups.com
Very cool.

If I read the results correctly the RAM footprint of Roaring is twice the size of concise. Is there anything we can do to improve that aspect?

Cheers,
Steve



Daniel Lemire

unread,
Sep 3, 2014, 4:48:22 PM9/3/14
to druid-de...@googlegroups.com
Thanks Charles for sharing your benchmark.

Daniel Lemire

unread,
Sep 3, 2014, 5:02:23 PM9/3/14
to druid-de...@googlegroups.com
You are correct. Both use a tiny amount of Java RAM, but Roaring has a larger overhead.  We made some reasonable choices that could be revised. I expect that there is a performance-memory trade-off. In this sense, I would not conclude hastily that we made the wrong choices. (We are looking for help with benchmarking for this precise reason so that we can tune in the right direction with respect to Druid.)

Note however that a common problem, memory-wise, might come when you are aggregating bitmaps. For example, if you need to compute the union of many bitmaps, you need to materialize intermediate results. My impression is that roaring might fare much better in such cases. For example, our horizontal_or function can compute the union of many bitmaps directly, with hardly any intermediate results. The Concise implementation might need to do many more allocations and de-allocations. These can put some real stress on the system.
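(On the caller's side this is a one-liner; a sketch, assuming the varargs form of horizontal_or in BufferFastAggregation:)

import org.roaringbitmap.buffer.BufferFastAggregation;
import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
import org.roaringbitmap.buffer.MutableRoaringBitmap;

class HorizontalOrSketch {
    // Union many (possibly memory-mapped) bitmaps in one pass,
    // without materializing a long chain of intermediate results.
    static MutableRoaringBitmap unionAll(ImmutableRoaringBitmap... bitmaps) {
        return BufferFastAggregation.horizontal_or(bitmaps);
    }
}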

That is, my claim is that roaring might be much nicer memory-wise in practice... e.g., put less stress on the garbage collector. I hope we can get hard data in this respect.

Since we can now run Druid with Roaring, it should be possible to quantify everything.


Steve Harris

unread,
Sep 3, 2014, 5:05:13 PM9/3/14
to druid-de...@googlegroups.com
Nice! Yeah, I agree that it's not necessarily even a problem. Possibly something to look at when we benchmark with our prod data. Too many variables to really guess what the impact would be otherwise.

Cheers,
Steve



Eric Tschetter

unread,
Sep 3, 2014, 5:41:41 PM9/3/14
to druid-de...@googlegroups.com
2x memory for 50~100x increase in performance seems very worthwhile to me.

The primary concern with increased memory footprint is how it affects
the amount of data that can fit in memory. So, I think the most
interesting thing to benchmark is a performance comparison when the
roaringBitmap/conciseSet is being paged in. It can be hard to
fabricate that situation, but that's probably the most interesting
thing to verify. If that is even comparable to conciseSet, it should
be a win to switch to roaringBitmap.

--Eric

Steven Harris

unread,
Sep 3, 2014, 5:46:36 PM9/3/14
to druid-de...@googlegroups.com

Daniel Lemire

unread,
Sep 23, 2014, 1:50:08 PM9/23/14
to druid-de...@googlegroups.com
We have just released Roaring version 0.4.0 with changes made by Samy that reduce the Java on-heap memory usage for memory-mapped bitmaps to something that will always be within a few bytes of ImmutableConciseSet (e.g., 102 bytes vs. 78 bytes per mapped bitmap). This small memory overhead is unavoidable: it is the cost of storing Java objects. The nice thing with the new version is that the memory usage per bitmap is fixed (i.e., data independent).

The new version is available right now from the Maven repository (http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.4.0/); see http://roaringbitmap.org/ for details.

About performance:

1- In our tests, it is faster to retrieve the content from a Roaring object (whether it is memory-mapped or not) than from a Concise object. That is, Roaring can enumerate the set bits up to 2x faster than Concise in some tests.

2- Whether you use Concise or Roaring, the time required for creating the bitmap object is tiny, as the data is not actually read (just the header). As I already stated, scanning the content is faster with Roaring than with Concise in our tests.

3- If you incur a memory-map fault when accessing the underlying data, then it needs to be recovered from disk. This will incur an IO cost, but it is not related to the format you use. Of course, since Roaring sometimes compresses better than Concise, you can assume that Roaring will generate fewer such faults.

4- Both Concise and Roaring can be serialized very fast if you just measure CPU time. However, serialization is slightly more expensive, processor-wise, with Roaring (say 2x) than with Concise. That is because Concise just writes out the content of its memory verbatim, whereas Roaring needs to do some cheap operations. However, we are unconcerned, because when serializing content in a real application, the bulk of the running time is spent in IO. That is, you serialize to disk or to the network. In a scenario where Roaring compresses better (which is the case in our tests), the net result might be that overall serialization time (in clock time) is going to be less with Roaring.

Steve Harris

unread,
Sep 23, 2014, 2:03:03 PM9/23/14
to druid-de...@googlegroups.com
Very cool improvement!



Arnon Moscona

unread,
Sep 23, 2014, 2:27:26 PM9/23/14
to druid-de...@googlegroups.com
Just a testimonial on Lemire's work on bitmaps. I've been using RoaringBitmap and its predecessor for the last four years in an implementation that is very, very similar to Druid. It has worked flawlessly. Very stable. I never had any memory issues with it. The bitmaps take a lot less space in memory than the associated data segments, so they are never a problem.

I don't know the internals of Druid, but in my case I have a bitmap per vector (a.k.a. column), rather than a bitmap per segment. For me this was not a problem. However, if limitations are your main concern, I would say that since the bitmap is int-based rather than long-based, my vectors are limited to about 2B elements, which for me is just fine. If one needs a larger data set, then you need a more elaborate solution: either use a bitmap per segment, or wrap the bitmap API to adapt it to longs, maintaining a set of bitmap shards that each max out at MAXINT size (or whatever limit) and presenting a long-based API. In other words, the limits I expect are not really the in-memory footprint of the bitmaps but their total storage capacity. At least that would be true in my architecture, but as I perceive it to be very close to the Druid design choices, I suspect that the same will hold for Druid.

In terms of stability, correctness, and speed - I've been very satisfied with Lemire's work.

Daniel Lemire

unread,
Sep 23, 2014, 11:59:38 PM9/23/14
to druid-de...@googlegroups.com

I was wrong in my analysis below. I assumed that when an ImmutableConciseSet was created, it just read the header and left it at that. But I now realize that the entire bitmap is scanned to determine the cardinality. This is not needed with Roaring. This means that creating an ImmutableRoaringBitmap is cheap in comparison with creating an ImmutableConciseSet. 


I provide updated numbers below (can be reproduced with the code at https://github.com/lemire/RoaringVSConciseBenchmark)... 

***************************
Roaring bitmap on census1881.csv dataset
***************************
Deserialization time: 1 ms
RAM Size = 20.3125 Kb (104 bytes/bitmap)
Disk Size = 1957.5 Kb (10022 bytes/bitmap)
Horizontal unions time = 3.43 ms
Intersections time = 0.05 ms
Scans time = 8.08 ms
.ignore = 421166178
***************************
ConciseSet on census1881.csv dataset
***************************
Deserialization time: 5 ms
RAM Size = 15.625 Kb (80 bytes/bitmap)
Disk Size = 3131.15625 Kb (16032 bytes/bitmap)
Unions time = 236.84 ms
Intersections time = 6.26 ms
Scans time = 38.75 ms
.ignore = 522008784
***************************
Roaring bitmap on census-income.csv dataset
***************************
Deserialization time: 1 ms
RAM Size = 20.3125 Kb (104 bytes/bitmap)
Disk Size = 2316.23828125 Kb (11859 bytes/bitmap)
Horizontal unions time = 2.97 ms
Intersections time = 0.05 ms
Scans time = 61.71 ms
.ignore = 567100982
***************************
ConciseSet on census-income.csv dataset
***************************
Deserialization time: 3 ms
RAM Size = 15.625 Kb (80 bytes/bitmap)
Disk Size = 2480.23828125 Kb (12699 bytes/bitmap)
Unions time = 73.79 ms
Intersections time = 2.12 ms
Scans time = 110.5 ms
.ignore = 587452328
***************************
Roaring bitmap on weather_sept_85.csv dataset
***************************
Deserialization time: 1 ms
RAM Size = 20.3125 Kb (104 bytes/bitmap)
Disk Size = 8546.015625 Kb (43756 bytes/bitmap)
Horizontal unions time = 4.88 ms
Intersections time = 0.07 ms
Scans time = 117.58 ms
.ignore = 817385326
***************************
ConciseSet on weather_sept_85.csv dataset
***************************
Deserialization time: 10 ms
RAM Size = 15.625 Kb (80 bytes/bitmap)
Disk Size = 9240.66796875 Kb (47312 bytes/bitmap)
Unions time = 508.37 ms
Intersections time = 15.5 ms
Scans time = 261.77 ms
.ignore = 920952760

Fangjin Yang

unread,
Sep 24, 2014, 1:58:14 PM9/24/14
to druid-de...@googlegroups.com
Hey guys, awesome work so far! We've left some comments on the PR for bytebuffer collections. Do you guys have a PR for Druid as well?

-- FJ

Daniel Lemire

unread,
Sep 24, 2014, 2:33:03 PM9/24/14
to druid-de...@googlegroups.com
We can probably issue a Roaring PR within a few days.

But our Druid fork  *replaces* ConciseSet with Roaring. 

We are missing "something" that allows you to switch back and forth between Roaring and Concise. Ensuring that Druid can work with different bitmap formats would really be done best by someone who is a Druid expert. (I suspect that it is a tiny amount of code, but it needs to be done right.)

Do you want us to go ahead with a PR? As it stands, it would require someone else to add the missing "something" so that both Concise and Roaring are supported.

Fangjin Yang

unread,
Sep 24, 2014, 2:34:03 PM9/24/14
to druid-de...@googlegroups.com
Yeah, I think we can take a look at being able to support both.

Daniel Lemire

unread,
Sep 24, 2014, 2:41:38 PM9/24/14
to druid-de...@googlegroups.com
So you want us to issue a PR in the near future, with the limitation I have described?

Fangjin Yang

unread,
Sep 24, 2014, 2:44:57 PM9/24/14
to druid-de...@googlegroups.com
Yes, that should be fine, we may have some general comments about the PR but we can take a look at supporting both versions. Can you submit the PR for the druid-0.7.x branch?

Thanks,
FJ

Daniel Lemire

unread,
Sep 24, 2014, 2:58:41 PM9/24/14
to druid-de...@googlegroups.com, Samy Chambi
Good. We will work on it.

We have a Druid fork with Roaring in it, but I think it is off the main branch from several weeks ago. So an unknown amount of surgery is required to sync with the branch you specify. 

However, even if we were ready to issue the PR right now, we would have to resolve the bytebuffer-collections dependency first.

That is, before issuing a PR directly on Druid, we probably want to resolve the bytebuffer-collections PR, because it is a dependency. See my comment on that PR.

Daniel Lemire

unread,
Oct 16, 2014, 1:35:25 AM10/16/14
to druid-de...@googlegroups.com, chamb...@gmail.com
We are still trying to figure out how to get Roaring accepted into bytebuffer-collections as a first step. I have issued a new PR on bytebuffer-collections:

https://github.com/metamx/bytebuffer-collections/pull/3

Please note that it is simply not possible to somehow fully wrap Roaring into something that behaves like ConciseSet. There are API differences to account for and Java polymorphism only buys you so much. But at least as far as bytebuffer-collections, I was able to fully wrap both ConciseSet and RoaringBitmap while having just one implementation of the R-Tree logic.
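(To give the flavor of it, a very rough sketch with hypothetical names; the actual interfaces are the ones in the bytebuffer-collections PR:)

import java.nio.ByteBuffer;

// Hypothetical sketch of the wrapping idea: one bitmap-agnostic interface, with
// one implementation delegating to ConciseSet and one to RoaringBitmap, so the
// R-Tree logic is written only once against the interface.
interface GenericImmutableBitmap {
    boolean get(int rowId);                               // is this row id set?
    int size();                                           // number of set bits
    GenericImmutableBitmap union(GenericImmutableBitmap other);
    GenericImmutableBitmap intersection(GenericImmutableBitmap other);
}

interface GenericBitmapFactory {
    GenericImmutableBitmap mapImmutableBitmap(ByteBuffer buffer);   // wrap serialized bytes
}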

Please note also that we *know* that integrating Roaring is not very difficult. We did it in a relatively short time, only using code duplication. It is mostly a matter of getting a pleasing design at this point.

Please let us know how we can help push this further...

Cheers!

Fangjin Yang

unread,
Oct 16, 2014, 9:59:33 AM10/16/14
to druid-de...@googlegroups.com, chamb...@gmail.com
Hey guys, sorry about the delay, I've been absolutely swamped. We'll take a closer look at the abstractions too and see if we can do some things on our side.

Daniel Lemire

unread,
Oct 20, 2014, 2:01:25 PM10/20/14
to druid-de...@googlegroups.com
Great. We have updated the latest PR with a more complete wrapping using interfaces instead of abstract classes.


The wrappers now support a wide range of operations.

Anyhow, it is a design possibility.

Fangjin Yang

unread,
Oct 20, 2014, 4:55:33 PM10/20/14
to druid-de...@googlegroups.com
Thanks guys, apologies again for being so slow to review. Getting Roaring integrated is my next todo.

Daniel Lemire

unread,
Oct 20, 2014, 5:05:19 PM10/20/14
to druid-de...@googlegroups.com
Great.

Samy pushed an update to Roaring this week-end (Roaring version 0.4.1) solving a design issue that made bugs more likely within Druid.

Daniel Lemire

unread,
Nov 17, 2014, 10:11:13 AM11/17/14
to druid-de...@googlegroups.com
Lucene 5.0 has adopted the roaring format for their bitmaps...

https://issues.apache.org/jira/browse/LUCENE-5983

(They have their own implementation however.)

Daniel Lemire

unread,
Nov 17, 2014, 10:31:43 AM11/17/14
to druid-de...@googlegroups.com
The Apache Lucene implementation differs in two interesting ways...

1. They have a special "reversed" format for compressing very dense lists well.

2. They can skip over values quickly while iterating.

Xavier Léauté

unread,
Nov 17, 2014, 3:50:44 PM11/17/14
to druid-de...@googlegroups.com
Interesting, do you think it would make sense to incorporate those enhancements?
Can you elaborate on what it means to skip over values quickly? 

Fangjin Yang

unread,
Nov 17, 2014, 4:11:18 PM11/17/14
to druid-de...@googlegroups.com
In case people have been following, Roaring is not integrated with Druid as a part of master.

Benchmarks here:

Would love to hear more feedback.

Steven Harris

unread,
Nov 17, 2014, 4:12:11 PM11/17/14
to druid-de...@googlegroups.com
Did you mean now?

Cheers,
Steve

Daniel Lemire

unread,
Nov 17, 2014, 4:12:32 PM11/17/14
to druid-de...@googlegroups.com

The format change is intriguing... Effectively, they store either the original data or its negation, whichever is most economical. This can be cool in the sense that you might be able to implement super fast negation (complement). I do not have a very good idea right now as to whether it could improve compression ratios and/or performance in the context of Druid. Obviously, anything that changes the format has to be done with care (e.g., it is an opportunity to create new bugs).

They have a function called "advance".... you can find it by searching through their (relatively short) code:
https://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/util/RoaringDocIdSet.java?view=markup&pathrev=1629606
It would not be hard to extend the RoaringBitmap API to include something of the sort... Note that RoaringBitmap already offers things like "rank" and so on... So, unlike the format change, we could throw this in safely as a minor revision.
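(A rough sketch of what such an "advance" could look like if layered on the existing rank/select calls; this is not the Lucene implementation, just an illustration:)

import org.roaringbitmap.RoaringBitmap;

class AdvanceSketch {
    // Jump to the first set value >= target, using rank/select.
    static int advance(RoaringBitmap bitmap, int target) {
        if (bitmap.contains(target)) {
            return target;                       // target itself is present
        }
        int below = bitmap.rank(target);         // set values <= target (target is absent, so strictly below)
        if (below >= bitmap.getCardinality()) {
            return -1;                           // no set value at or after target
        }
        return bitmap.select(below);             // first set value above target
    }
}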

Fangjin Yang

unread,
Nov 17, 2014, 4:14:38 PM11/17/14
to druid-de...@googlegroups.com
Yes I meant now*

:P


Xavier

unread,
Jan 6, 2015, 6:53:46 PM1/6/15
to druid-de...@googlegroups.com
Daniel, Samy,

With Druid 0.7 approaching stable, I was able to index real-world data into Druid using Roaring.
Overall, indices appear to be on average 20% larger after LZ4 compression compared to Concise.
However, the distribution is very uneven, and the difference in size I have seen ranges from -5% to 60% depending on the dimension.

I wonder if implementing the format change to store the complement for very dense bitmaps would help with storage size.

Below is the size breakdown for one of our shards (in bytes).

   index_part lz4_roaring lz4_concise difference pct_difference
 dimension_30    17578086    18443080    -864994     -4.6900735
    index.drd        2996        3026        -30     -0.9914078
         time      133513      133513          0      0.0000000
     metric_1     4235418     4235418          0      0.0000000
     metric_2     4198553     4198553          0      0.0000000
     metric_3     5055205     5055205          0      0.0000000
     metric_4     4748182     4748182          0      0.0000000
     metric_5     5324002     5324002          0      0.0000000
     metric_6     8668828     8668828          0      0.0000000
     metric_7     1830836     1830836          0      0.0000000
     metric_8      221463      221463          0      0.0000000
     metric_9     1532420     1532420          0      0.0000000
    metric_10     3905608     3905608          0      0.0000000
    metric_11     3599839     3599839          0      0.0000000
    metric_12     3833695     3833695          0      0.0000000
    metric_13     3599839     3599839          0      0.0000000
    metric_14     6567728     6567728          0      0.0000000
    metric_15     1501424     1501424          0      0.0000000
    metric_16     2774797     2774797          0      0.0000000
    metric_17     2598475     2598475          0      0.0000000
    metric_18      286536      286536          0      0.0000000
    metric_19      227818      227818          0      0.0000000
    metric_20     2304851     2304851          0      0.0000000
    uniques_1    89662134    89662134          0      0.0000000
 dimension_38     4835158     4707990     127168      2.7011102
 dimension_57     5170816     4946612     224204      4.5324760
 dimension_58     4923536     4696244     227292      4.8398678
 dimension_15     4445384     4046522     398862      9.8569092
 dimension_18     4351307     3867079     484228     12.5218026
 dimension_55     4371696     3875162     496534     12.8132450
 dimension_54     4373791     3877233     496558     12.8070199
 dimension_31     4366643     3868919     497724     12.8646787
 dimension_40     4801468     4297178     504290     11.7353761
 dimension_37     4840761     4326591     514170     11.8839521
 dimension_56     4449383     3933505     515878     13.1149700
 dimension_59     4450117     3933763     516354     13.1262102
 dimension_27     4468347     3912953     555394     14.1937304
 dimension_29    11234477    10664343     570134      5.3461709
  dimension_1     4440348     3867860     572488     14.8011562
 dimension_60     4516536     3925414     591122     15.0588447
 dimension_25     4759323     4122831     636492     15.4382268
 dimension_45     9610128     8967340     642788      7.1681011
 dimension_39     4543012     3895138     647874     16.6328895
 dimension_41     4573721     3924473     649248     16.5435716
 dimension_42     4551205     3899801     651404     16.7035190
 dimension_61     4558020     3896826     661194     16.9675012
 dimension_34     4684917     4018371     666546     16.5874679
 dimension_63     4564120     3896964     667156     17.1198913
 dimension_62     4647595     3899299     748296     19.1905263
  dimension_7     4755298     3976020     779278     19.5994487
 dimension_52     5722255     4899999     822256     16.7807381
 dimension_44     4823553     3979909     843644     21.1975701
 dimension_46     5290284     4429112     861172     19.4434460
 dimension_26     5130066     4255626     874440     20.5478583
 dimension_28     5150488     4232952     917536     21.6760313
  dimension_2     4876022     3868792    1007230     26.0347416
 dimension_64     4907695     3883005    1024690     26.3890981
 dimension_53     5341963     3929317    1412646     35.9514389
 dimension_20    13265096    11500568    1764528     15.3429639
 dimension_19    10395068     8586310    1808758     21.0656033
 dimension_23     8950827     7074965    1875862     26.5140817
  dimension_4    10855199     7868925    2986274     37.9502156
 dimension_32     7590911     4550467    3040444     66.8160872
 dimension_50    11269244     7906126    3363118     42.5381280
 dimension_47    11300983     7936743    3364240     42.3881685
 dimension_49    11305696     7941456    3364240     42.3630125
 dimension_51    11280636     7907130    3373506     42.6641019
 dimension_48    11311957     7937329    3374628     42.5159144
 dimension_17     8745099     5352821    3392278     63.3736491
 dimension_22    14555290    10963334    3591956     32.7633547
 dimension_33     7876668     3998928    3877740     96.9694878
 dimension_24    22423276    18345506    4077770     22.2276235
  dimension_5    13665199     9587149    4078050     42.5366290
 dimension_43    33679672    29355782    4323890     14.7292619
 dimension_36    12777170     8273038    4504132     54.4435067
 dimension_14    13499745     8992729    4507016     50.1184457
 dimension_35    12807961     8278753    4529208     54.7088191
  dimension_9    12960121     8365743    4594378     54.9189474
 dimension_12    12976504     8376512    4599992     54.9153633
 dimension_10    13253760     8642098    4611662     53.3627598
  dimension_8    13055775     8443585    4612190     54.6235989
 dimension_13    13276098     8658854    4617244     53.3239618
 dimension_11    13070171     8452443    4617728     54.6318739
  dimension_6    14138330     8867010    5271320     59.4486755
  dimension_3    14017092     8724044    5293048     60.6719544
 dimension_21    37224218    31354082    5870136     18.7220790
 dimension_16    51451022    45544490    5906532     12.9687082



Eric Tschetter

unread,
Jan 6, 2015, 7:46:51 PM1/6/15
to druid-de...@googlegroups.com
Xavier,

What's the effective change in size of the segment as a whole, not
just the index portion of it? A 20% change in the bitmap indexes
could actually be only a 2~5% change in segment size, in which case it
might not be too bad?

--Eric

Daniel Lemire

unread,
Jan 6, 2015, 7:49:21 PM1/6/15
to druid-de...@googlegroups.com
Good day Xavier,

This indicates to me that you have long ranges of consecutive values. It might very well be the case that flipping (negating) the sets in such cases could improve the compression with Roaring. If done correctly, this might also improve performance.

It is hard, in the abstract (without actual data), to know how a given strategy would fare. Maybe some kind of public benchmark could be set up so that this problem can be explored.

I am available to help.

Thanks for the feedback.

- Daniel

Daniel Lemire

unread,
Jan 6, 2015, 7:57:44 PM1/6/15
to druid-de...@googlegroups.com
Good day Xavier,

To be clearer, if you were to publish dumps of representative bitmaps, one could try alternative Lucene-like formats and see whether it is worth pursuing, as far as compression ratios are concerned. It is entirely likely that the Lucene people genuinely and practically improved on the Roaring format... 

Publishing some representative bitmaps could be useful more generally, as many people could have a go at the problem without having to hack Druid itself.



Cheers!

- Daniel

Daniel Lemire

unread,
Jan 6, 2015, 8:02:30 PM1/6/15
to druid-de...@googlegroups.com
Good day Eric,

Of course, a 2~5% change in storage accompanied by a noticeable performance boost would be interesting. ;-)


Cheers!

- Daniel


Eric Tschetter

unread,
Jan 6, 2015, 8:18:28 PM1/6/15
to druid-de...@googlegroups.com
Daniel,

"Long ranges of consecutive values" means that there are ranges where
the same value is repeated over and over and over again, correct?
I.e. something that would form a densely packed array?

If that's the case, then yeah, we do expect to have some dimensions
with lots of contiguous chunks of the same value and other dimensions
with less contiguous chunks. This is because the data is sorted by
the dimensions as it is stored, so the dimensions "closer" to the
timestamp will end up being densely packed, "further" from the
timestamp will be spread out more.

--Eric

Xavier Léauté

unread,
Jan 6, 2015, 11:36:18 PM1/6/15
to druid-de...@googlegroups.com
Apologies if that wasn't clear.

The overall index size increased by 20% compared to Concise, including the size of metrics and time columns. If we consider only dimension columns, the increase is 27%.

Compared to Druid 0.6, the segment is only about 15% bigger, with some of the increase from roaring compensated by LZ4 compressing better than LZF.

Daniel, I'll publish dumps of the underlying bitmaps later this week, if someone wants to have a look.

Daniel Lemire

unread,
Jan 7, 2015, 8:33:46 AM1/7/15
to druid-de...@googlegroups.com
Indeed Eric, but it is surprising to me that this effect propagates through 50 dimensions. You cannot have a table with 50 attributes, with all columns exhibiting long runs of identical values when using the lexicographical order.


So maybe there is something else going on here. I cannot tell in the abstract without access to the data.

Daniel Lemire

unread,
Jan 22, 2015, 4:11:09 PM1/22/15
to druid-de...@googlegroups.com
Has there been any update on this front?

- Daniel

Xavier Léauté

unread,
Jan 22, 2015, 4:21:31 PM1/22/15
to druid-de...@googlegroups.com
Hi Daniel, I apologize I haven't had time to extract the bitmaps yet, but I hope to get them to you soon.

Xavier Léauté

unread,
Jan 22, 2015, 5:36:07 PM1/22/15
to druid-de...@googlegroups.com
Hi Daniel, I put together a tarball of some of the bitmaps in the index I analyzed earlier.

I included the roaring bitmap indices for all the values of the following dimensions: dimension_33, dimension_8, and dimension_3, which are 97%, 55%, and 61% bigger, respectively, than the corresponding concise dimensions.

All the files were created by simply calling
ImmutableRoaringBitmap.serialize(new RandomAccessFile(...));
for each of the bitmaps involved.
Best,
Xavier

Daniel Lemire

unread,
Jan 22, 2015, 6:40:02 PM1/22/15
to druid-de...@googlegroups.com
Excellent thanks. This should prove very useful.


Daniel Lemire

unread,
Jan 26, 2015, 3:04:07 PM1/26/15
to druid-de...@googlegroups.com
Thank you Xavier for the data set.

So I understand that on this particular test set, Concise compresses better than Roaring (the difference ranges from 50% to 100%).

Here are some basic statistics about the provided bitmaps:

1. The median cardinality is 9. This means that a typical bitmap contains less than 10 values. Many bitmaps contain only 1 or 2 entries.
2. The median size in bytes of the roaring bitmaps is 36 bytes. (That is, we use roughly 4 bytes per set value in a typical case.)

Why is the Roaring format somewhat less storage efficient in such cases? The gist of it is that the Roaring format assumes that you will have more than a couple of handfuls of entries. There is a storage overhead of about 18 bytes *per* bitmap due to the format itself. An assumption that went into the design was that 18 bytes was "negligible". For example, Roaring bitmaps have a header; the first 4 bytes are a cookie that helps ensure that we are decoding something in the right format. (This was added so we are more likely to detect corrupted data.)

To put it differently, in Roaring, we assume that a typical bitmap will contain a reasonable number of entries (many thousands). The Roaring format will still work in cases where there are very few entries, but it is not optimized for this case because we do not believe that this is a case where bitmaps are best anyhow.

Quoting from our paper:

"We assume that there are far fewer containers than integers. (...). When applications encounter integer sets with lower density (less than 0.1 %), a bitmap is unlikely to be the proper data structure."

If you are going to create so many tiny sets of integers, a bitmap-like data structure is maybe not the best choice. I suspect that a B-tree might do well.
Moreover, I am concerned that memory-file mapping thousands of tiny objects is likely not worth it.

My feeling, therefore, is that this example does not illustrate an instance where Roaring does poorly. It is an instance where bitmaps are probably not best (whatever the format you use). 

Though I am working from very partial information, I believe that it might be worthwhile to consider alternate forms of indexing for such dimensions. 

Of course, I am eager to help, if I can.

Xavier Léauté

unread,
Jan 30, 2015, 5:59:03 PM1/30/15
to druid-de...@googlegroups.com
Thank you Daniel for the insight. The optimal type of index will very much be application-specific, and there might be cases where Roaring would still be a better choice. From Druid's perspective our main concern is speed. For queries combining large numbers of indices (i.e., several thousands) across multiple columns, Roaring has been shown to be faster than Concise, so I believe there is still value in supporting them, even if it comes at some cost in memory.

Mixing different types of indices within a single Druid segment would probably not make sense at this point because of the overhead associated with doing operations on different bitmap formats for queries filtering on multiple columns simultaneously.

Regarding your concern mapping thousands of objects, that's a non-issue in Druid, since the entire set of bitmaps is memory mapped at once.

Daniel Lemire

unread,
Jan 30, 2015, 10:05:53 PM1/30/15
to druid-de...@googlegroups.com

Thanks for the feedback.

I should stress that I remain eager to collaborate on improving matters. Please do not hesitate to bring your concerns regarding bitmap or integer compression to me.

Regarding your concern mapping thousands of objects, that's a non-issue in Druid, since the entire set of bitmaps is memory mapped at once.

If you create bitmaps on-the-fly from a MappedByteBuffer and then eagerly garbage-collect them, then I agree.

However, if you hold many memory-mapped bitmaps in memory, then my concern might be warranted. Even with memory mapping, there is a small but non-zero Java memory usage. E.g., all objects, even empty ones, use memory... and a ByteBuffer reference also uses memory, and so on. Now, this overhead can easily match the actual size on disk when the bitmaps are tiny... and this has the potential of making memory-file mapping counterproductive.
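(For clarity, the on-the-fly pattern I have in mind looks roughly like this; the offset bookkeeping is hypothetical, and only the ImmutableRoaringBitmap constructor comes from the library:)

import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import org.roaringbitmap.buffer.ImmutableRoaringBitmap;

class OnTheFlySketch {
    // Map the whole column of serialized bitmaps once, and only wrap the slice for a
    // given bitmap when it is actually needed, so the short-lived ImmutableRoaringBitmap
    // object is the only per-bitmap heap cost.
    static ImmutableRoaringBitmap bitmapAt(MappedByteBuffer wholeColumn, int byteOffset) {
        ByteBuffer slice = wholeColumn.duplicate();   // independent position/limit, same underlying bytes
        slice.position(byteOffset);                   // hypothetical offset of this bitmap within the column
        return new ImmutableRoaringBitmap(slice);     // reads lazily from the mapped buffer
    }
}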
 




Daniel Lemire

unread,
Feb 2, 2015, 11:29:19 AM2/2/15
to druid-de...@googlegroups.com
As a follow up:

1. I checked that the Lucene-Roaring format would not help. Though the data has long runs that are compressible, the density is not quite high enough for the Lucene variant to be beneficial.

2. Though Roaring uses more storage in the cases you provided, I checked that it could still be quite a bit faster. On dimension 33, the union of all bitmaps can be computed 4 times faster with Roaring than with Concise. I expect similar good numbers on other dimensions.

Follow-up work for the people working on Roaring would be to find a way to improve the compression in such specific cases (where there are many moderately long runs) without losing the higher speed. 

Eric Tschetter

unread,
Feb 10, 2015, 10:41:56 PM2/10/15
to druid-de...@googlegroups.com
Daniel,

On the objects, we are intelligent about how things are handled in
that we only have a single long-lived ByteBuffer reference and then
only materialize Java objects for the individual bitmaps that are
there. So there is no GC pressure from having lots of little bitmaps.

As Xavier indicated, the choice to use bitmaps and only bitmaps is one
of efficiency. The big benefit that we get from using bitmaps is that
we can index each column individually and then combine those
on-the-fly at query time to generate any "composite" index we want.
(Granted, the "composite" index I'm talking about is not sorted, so
there are trade-offs with what you would get with a composite b-tree
index, for example). If we start introducing b-tree indexes and other
things, that leads to very complex and intricate logic to try to
combine the indexes together at query time, which will likely affect
performance negatively.

Fwiw, 18 bytes of header is quite a bit. Is there any documentation
that you can point us to which describes what each value in the header
represents? One strategy that is specifically enabled by how we store
data is that any header information which provides insight into like
serialization version and things of that nature can be stored once in
the column's metadata rather than stored on each individual bitmap.
If it's true that the memory usage is largely from these headers, it
should be possible to remove the redundant bits and store them just
once as a "column header" instead.

--Eric

Daniel Lemire

unread,
Feb 10, 2015, 11:27:47 PM2/10/15
to druid-de...@googlegroups.com
I understand. Thanks for the feedback.

Fwiw, 18 bytes of header is quite a bit.  Is there any documentation
that you can point us to which describes what each value in the header
represents?

The header is 4 bytes per se.

The serialization process is coded there...


It follows what is described in our manuscript (http://arxiv.org/abs/1402.6407) closely except for one component (https://github.com/lemire/RoaringBitmap/blob/master/src/main/java/org/roaringbitmap/buffer/MutableRoaringArray.java#L99-L103) we added specifically for low-memory memory mapping. It is not very complicated, but I would be happy to describe it more fully. 
 
 One strategy that is specifically enabled by how we store
data is that any header information which provides insight into like
serialization version and things of that nature can be stored once in
the column's metadata rather than stored on each individual bitmap.
If it's true that the memory usage is largely from these headers, it
should be possible to remove the redundant bits and store them just
once as a "column header" instead.

We could indeed save 4 bytes per bitmap in this manner. I'd be happy to update the API to allow it, but I am not sure it is worth it.

I think that a longer term solution is for us to update the Roaring format more substantially now that we have a use case for Druid. This was a key input we were missing until a few weeks ago. Due to the need to think through the design, implement and test properly, this will take some time, though it should be done this year (2015). 

Eric Tschetter

unread,
Feb 11, 2015, 11:37:30 AM2/11/15
to druid-de...@googlegroups.com
Thanks for the pointers.

In general, the storage of the SERIAL_COOKIE header is, I believe, the
functional equivalent of storing a serialization version. One trick
to shrinking down storage and still maintaining the flexibility of
multiple serialization versions is to just use a single byte and have
v0, v1, v2, v3, v4, etc. If/when you run out of versions from that
single byte, you take the final v255 and have it indicate that you
need to look at the next byte to figure out the actual version. That
should shave off 3 bytes or 10% of the memory usage if the average is
36 bytes.
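(A sketch of that escape-byte idea, as one possible encoding rather than anything in the current RoaringBitmap or Druid formats:)

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

class VersionByteSketch {
    // Versions 0..254 fit in a single byte; a byte of 255 means
    // "add 255 and keep reading", so the scheme never runs out of versions.
    static void writeVersion(DataOutput out, int version) throws IOException {
        while (version >= 255) {
            out.writeByte(255);
            version -= 255;
        }
        out.writeByte(version);
    }

    static int readVersion(DataInput in) throws IOException {
        int version = 0;
        int b;
        while ((b = in.readUnsignedByte()) == 255) {
            version += 255;
        }
        return version + b;
    }
}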

But anyway, if you think there is more space savings to be gained from
rethinking the format a bit, that sounds great to me as well.

--Eric

Daniel Lemire

unread,
Feb 11, 2015, 11:59:21 AM2/11/15
to druid-de...@googlegroups.com
That is correct.

Roaring is still quite a bit faster than Concise, even when it uses more space. So the existing Roaring format is a solid thing. The fact that the Apache Lucene people replaced their WAH/Concise-like bitmap compression routines with a Roaring-like format speaks to it. But I think that we can have both the high speed, and better compression in the sorted Druid setup by a non-trivial format update.

When we assessed Roaring, we did not target applications where the data is maintained in sorted form. I should point out that it is not out of ignorance... (see references below). We were just not motivated by sorted data.

Now, the experience with Druid, at least as far as the data that was provided to me here, has motivated me to further refine the Roaring format this year.

Well, that's one of my New Year's resolutions...

Cheers!

- Daniel  


References:

Daniel Lemire, Owen Kaser, Kamel Aouiche, Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering 69 (1), pages 3-28, 2010.

Daniel Lemire, Owen Kaser, Eduardo Gutarra, Reordering Rows for Better Compression: Beyond the Lexicographic Order, ACM Transactions on Database Systems 37 (3), 2012.

Daniel Lemire

unread,
Apr 29, 2015, 1:34:18 PM4/29/15
to druid-de...@googlegroups.com
BTW, the work I alluded to is ongoing and will definitely come out this year.

- Daniel

Xavier Léauté

unread,
Apr 29, 2015, 1:37:46 PM4/29/15
to druid-de...@googlegroups.com
Awesome Daniel! Looking forward to it!


Daniel Lemire

unread,
Aug 6, 2015, 11:27:15 AM8/6/15
to Druid Development
Update: we have working code right now that resolves the issues encountered with the data sets provided by Xavier.

A release should follow... soon...

Xavier Léauté

unread,
Aug 6, 2015, 12:35:29 PM8/6/15
to druid-de...@googlegroups.com
Very exciting, we'd be more than happy to test it out!

Daniel Lemire

unread,
Aug 18, 2015, 11:00:20 AM8/18/15
to Druid Development
Good day Xavier,

I have just issued a PR regarding the new Roaring.


Please keep on providing us your valuable feedback.

- Daniel

Daniel Lemire

unread,
Aug 20, 2015, 3:01:46 PM8/20/15
to Druid Development
As part of the PR, I have included numbers... but here is the gist of it...

1. With version 0.5, Roaring now compresses as well as Concise in the cases where, before, Concise was far better than Roaring.

2. Serializing has generally gotten faster than before.

3. Computationally, Roaring is typically faster than Concise, sometimes by a lot (e.g., 4 times faster). I think this has been verified in actual Druid applications, and it remains true with version 0.5.

Daniel Lemire

unread,
Sep 1, 2015, 10:09:52 AM9/1/15
to Druid Development
Since this initial announcement, we have published two performance-enhancing point releases (latest: 0.5.2). They improve performance in some cases by up to 40%.

Daniel Lemire

unread,
Jun 15, 2016, 3:33:27 PM6/15/16
to Druid Development

On this topic, the following paper will be presented next month...

Samy Chambi, Daniel Lemire, Robert Godin, Kamel Boukhalfa, Charles Allen, Fangjin Yang, Optimizing Druid with Roaring bitmaps, IDEAS 2016, 2016.
http://r-libre.teluq.ca/950/

Fangjin

unread,
Jun 15, 2016, 4:04:20 PM6/15/16
to druid-de...@googlegroups.com

sascha...@smaato.com

unread,
Jul 1, 2016, 4:31:18 AM7/1/16
to Druid Development
great paper!!

The results for Roaring bitmaps speak for themselves. We have two datasources that used V9 segments with Concise encoding and LZ4 compression on a Druid 0.9.0 cluster. Both datasources have roughly 50 GB of segment volume per hour, 30 dimensions, and 10 metrics. We switched both over to Roaring encoding.
The data volume of one of the datasources increased by 20%, and the volume of the other increased by 60%!
In the tests we've done so far, Concise outperforms Roaring slightly.
I was surprised to see such high cardinalities in the paper. We have a couple of dimensions with cardinality 2-100, some with cardinalities 1,000-15,000, and three dimensions that have 500,000 distinct entries per day.
So, relative to the paper, our cardinalities are on the low end, which might explain why we haven't benefited from Roaring so far. I'm a bit sad, though, that we aren't seeing Roaring be the cool kid on the block.

Daniel Lemire

unread,
Jul 1, 2016, 10:02:43 AM7/1/16
to Druid Development
Is there any chance you could provide us with a sample of your bitmap data for analysis?

Given some data, we could probably tell what is going on quickly.

Daniel Lemire

unread,
Jul 1, 2016, 11:11:16 AM7/1/16
to Druid Development
For the record... here is a repository where we run extensive experimental performance reviews... 


Of course, our benchmarks are only as good as the data and use cases that we can get. So we really do encourage people to provide us with some of their data...

Charles Allen

unread,
Jul 1, 2016, 7:39:28 PM7/1/16
to Druid Development
Try regex or JavaScript filters. There are a few cases where Roaring's boolean operations absolutely kick the pants off of Concise.

sascha...@smaato.com

unread,
Jul 6, 2016, 8:25:43 AM7/6/16
to Druid Development
@Daniel
 wrt "Is there any chance you could provide us with a sample of your bitmap data for analysis?"
Sorry for the late reply. I guess we cannot pass over raw data or whole segments, as they would contain business data, but when you say "bitmap data", do you mean an extract of the segments? That is, is there a way for us to cut out just the bitmap portion from the segment files and send that to you? I take it this would be fine, as it would not reveal any business data, but how could we segregate that from the full segments?
How large a sample would you need?

thanks
Sascha

Daniel Lemire

unread,
Jul 6, 2016, 9:06:00 AM7/6/16
to Druid Development
Yes. So we do not want any business data. We just want the "bitmap portion" as you call it. It is just zeros and ones. Moreover, we do not need a whole lot. A sample of 200 bitmaps from a dataset you care about would probably be good enough.

I don't know, technically, what is the best way to extract this data from a deployed Druid system. Maybe someone who knows Druid very well could share some kind of quick HOWTO?

As for the performance results you are getting, have you done some profiling? What percentage of the running time is spent on bitmap operations? As you know, by a very simple "theorem", if bitmap operations account for x% of the running time, then this x% bounds any benefit you might see from upgrading your bitmap indexes. So if x is small... there cannot be any interesting gain. So you want to be sure to test with queries where bitmaps are in the hot spot.
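To make the arithmetic concrete (with a made-up number): the best overall speed-up you can get from a faster bitmap library is bounded by speedup <= 1 / (1 - x/100). So if bitmap operations take x = 10% of the query time, even an infinitely fast bitmap format cannot make the whole query more than about 1.11 times faster; with x = 80%, the ceiling is 5 times.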

Cheers and thanks for the feedback!

Gian Merlino

unread,
Jul 6, 2016, 12:34:36 PM7/6/16
to druid-de...@googlegroups.com
If you run this patch you can get the dump-segment tool to dump either base64-encoded bitmaps or the decompressed integer arrays: https://github.com/druid-io/druid/pull/3221

Usage is something like this to get base64 encoded bitmaps: java io.druid.cli.Main tools dump-segment -d /var/druid/segment-cache/wikiticker/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2016-06-08T01:52:07.897Z/0/ --dump bitmaps --column isRobot --out /tmp/dump

Or add --decompress-bitmaps to dump the integer arrays instead.

Gian


Daniel Lemire

unread,
Jul 6, 2016, 12:56:32 PM7/6/16
to Druid Development
Excellent. Sascha, could this work for you?

Sascha Coenen

unread,
Jul 6, 2016, 1:15:06 PM7/6/16
to druid-de...@googlegroups.com
yep. Sounds perfect. These Druid guys have a go-go-gadgeto-here-you-go for anything it seems. Nice!
Will try to get the green light from our product manager to work on this and to pass the data over to you.





Gian Merlino

unread,
Jul 7, 2016, 9:34:09 AM7/7/16
to druid-de...@googlegroups.com
Hey Daniel,

When writing the bitmap dumping tool I noticed that for single element bitmaps (common for high cardinality columns) the Concise bitmaps are 8 bytes and the Roaring bitmaps are 18 bytes. I'm not sure if Sascha is running into the same thing, but that difference adds up over the millions of unique values that could be present in a 5M row segment. Any thoughts on whether this can be improved?
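For reference, the measurement boils down to something like this (illustrative only; I actually went through the dump tool and the buffer classes, but as far as I can tell the plain class gives the same figure):

    import org.roaringbitmap.RoaringBitmap;

    public class SingleElementSize {
        public static void main(String[] args) {
            RoaringBitmap rb = RoaringBitmap.bitmapOf(42);    // a bitmap with one set bit
            System.out.println(rb.serializedSizeInBytes());   // 18 in the versions we use
        }
    }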

I also noticed that Druid does not have Roaring run compression turned on by default, and actually it's not even configurable (you have to modify the source and recompile). Raised this PR for that: https://github.com/druid-io/druid/pull/3228

Gian

Daniel Lemire

unread,
Jul 7, 2016, 11:47:54 AM7/7/16
to Druid Development

 I also noticed that Druid does not have Roaring run compression turned on by default, and actually it's not even configurable (you have to modify the source and recompile). Raised this PR for that: https://github.com/druid-io/druid/pull/3228
I wrongly assumed until now that Druid had Roaring's run compression turned on by default. I think that your PR is very important.
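For context, at the library level run compression is opt-in: you call runOptimize() before serializing and the bitmap switches to run-length-encoded containers wherever that is smaller. A quick illustrative sketch (the class name is made up):

    import org.roaringbitmap.RoaringBitmap;

    public class RunCompressionExample {
        public static void main(String[] args) {
            RoaringBitmap rb = new RoaringBitmap();
            for (int i = 0; i < 1000000; i++) {
                rb.add(i);                                    // one long run of consecutive values
            }
            System.out.println(rb.serializedSizeInBytes());   // array/bitmap containers
            boolean changed = rb.runOptimize();               // convert to runs where smaller
            System.out.println(changed);                      // true: runs pay off here
            System.out.println(rb.serializedSizeInBytes());   // much smaller afterwards
        }
    }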

Reference:

Consistently faster and smaller compressed bitmaps with Roaring


When writing the bitmap dumping tool I noticed that for single element bitmaps (common for high cardinality columns) the Concise bitmaps are 8 bytes and the Roaring bitmaps are 18 bytes. I'm not sure if Sascha is running into the same thing, but that difference adds up over the millions of unique values that could be present in a 5M row segment. Any thoughts on whether this can be improved?

"Premature optimization is the root of all evil." (Knuth)

I am not keen on micro-optimizations that are not guided by actual use cases and data. You often end up with more complex software that does not fare better (and can fare worse) in real life. Sometimes, when you see things in their full context, you can find really simple solutions that make troublesome cases go away entirely... or you can realize that what you perceived as a problem, is not at all a problem. I am sure you'll agree.

So here is what I recommend...

1. I would first repeat the tests regarding Roaring with run compression enabled. (So  your PR is important!)

2. If results are still not satisfactory, then I would grab a realistic workload and see what we can learn from it. I am available to work on this. I am sure we can solve any problem that arises... 

3. Here is what the Roaring FAQ has to say regarding space utilization:

"Given N integers in [0,x), then the serialized size in bytes of a Roaring bitmap should never exceed this bound:

8 + 9 * ((long)x+65535)/65536 + 2 * N

That is, given a fixed overhead for the universe size (x), Roaring bitmaps never use more than 2 bytes per integer. You can call RoaringBitmap.maximumSerializedSize for a more precise estimate."
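For instance, with numbers picked purely for illustration -- a universe of x = 5,000,000 rows (a typical Druid segment) and N = 1,000 set bits:

    import org.roaringbitmap.RoaringBitmap;

    public class SizeBound {
        public static void main(String[] args) {
            long n = 1000;
            long x = 5000000;
            // the bound quoted above: 8 + 9*ceil(x/65536) + 2*N, here 8 + 9*77 + 2000 = 2701 bytes
            long bound = 8 + 9 * ((x + 65535) / 65536) + 2 * n;
            System.out.println(bound);
            // the library can give a more precise estimate
            System.out.println(RoaringBitmap.maximumSerializedSize(n, x));
        }
    }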

Gian Merlino

unread,
Jul 7, 2016, 12:24:35 PM7/7/16
to druid-de...@googlegroups.com
Hey Daniel,

I respect (and agree with) your belief that it is better to optimize things based on real problems than imagined problems. I asked about the large-number-of-low-cardinality-bitmaps case hoping to learn if you had some wisdom or tricks related to that case. From your response I guess that it hasn't come up yet.

There are definitely real Druid users out there that have datasets with this property (usually for one or two columns out of many) although I'm not sure if any of those users have looked at Roaring vs Concise yet. If I run into another one in the future I'll try to get some data.

Gian


Daniel Lemire

unread,
Jul 7, 2016, 12:45:54 PM7/7/16
to Druid Development
Good day Gian,

It is not that I believe the use case is "imagined", but rather that we should look at the whole context before optimizing, to make sure we optimize the right "thing". 

If you go to a screw factory and tell them that their screws make terrible nails... they'll probably ask why you aren't using nails in the first place.

If you give me a bitmap index where the overwhelming majority of the bitmaps contain just one entry... I will question whether a bitmap index is the right tool.  It is possible that you might be able to make tremendous gains by forgoing the bitmap index approach entirely, independently of the chosen bitmap format.  By tremendous gains, I mean 10x or more in performance.

So that's what I mean... maybe there are giant performance gains lurking that we can only achieve if we step back and look at the problem with concrete data... while micro-optimizing can, at best, bring tiny gains.

Gian Merlino

unread,
Jul 7, 2016, 1:37:53 PM7/7/16
to druid-de...@googlegroups.com
Yeah that point is well taken.

The way I think about things is, let's say a real Druid user has a column with a lot of values that only occur in one row each, and that's causing either performance or storage-footprint problems. We have a few options.

1) Do nothing.
2) Make some tweak to the way we deal with bitmaps; it seems like we have the opportunity to save at least 10 bytes per value on storage footprint, though it's unclear whether there would be any performance gain.
3) Make a larger change in Druid such that it is fundamentally better at dealing with that kind of data; here we probably have larger opportunity for improvement, but it would also take more effort.

I could see any one of those 3 options winning a cost/benefit analysis. Right now #1 is winning since nobody has come forth to complain yet, so the benefit seems low :)

I think we're on the same page…

Gian

Gian Merlino

unread,
Jul 7, 2016, 1:38:46 PM7/7/16
to druid-de...@googlegroups.com
Hey Sascha,

If you do get a chance to dump the bitmaps, could you please also try reindexing with compressRunOnSerialization enabled? You'll need to build with the patch in https://github.com/druid-io/druid/pull/3228 if it's not in master or a release yet.

Gian

Gian Merlino

unread,
Jul 7, 2016, 1:39:53 PM7/7/16
to druid-de...@googlegroups.com
Actually, I guess you don't need to do that, since Daniel could always recompress them himself. So scratch that. But you might want to do it anyway just to see for yourself if there's an improvement there.

Gian

sascha...@smaato.com

unread,
Jul 8, 2016, 6:01:56 AM7/8/16
to Druid Development
I just received the green light to go ahead with providing the bitmaps after I explained to my PM that it is anonymous data.
It might take me a little while until I have applied the patch and succeeded in extracting the bitmaps, as I'm doing this for the first time. In the meantime, perhaps I can throw in some intermediate data that may or may not be useful to you as well: I attached two documents that show, for each of our datasources, how many distinct values there are in one day's worth of data, along with the distinct-value counts for all pairs of dimensions and how correlated two given dimensions are.
I have anonymized the dimension names, so they are called A, B, C, ... The dark green cells are the cardinalities for each dimension. The first page shows the 1-gram and bi-gram cardinalities. The second page shows the correlations between dimensions, with the cardinalities on the marginal rows/columns, and the third page shows rollup tests: I ingested the same hour of data several times, leaving out different combinations of dimensions and noting down the resulting data volume and rollup ratios.

"datasource 1" is the datasource for which roaring resulted in a 20% increase in data volume
"datasource 2" is the datasource for which we saw a 60% increase.

The rollup tests show that for datasource1, leaving dimensions out doesn't improve rollup as much as one would hope. We also rolled up datasource1 from hourly to daily once, which yielded ONLY a 60% reduction in data volume. We are very sad that it isn't more, but we attribute it to the kind of data we have in that datasource.
For datasource2, the third page with rollup tests is missing because I haven't gotten around to performing them yet, but we accidentally left out 3 dimensions once while ingesting and noticed a considerable improvement in rollup. We plan to try out a daily rollup for this datasource as well and hope that it will roll up better than datasource1.



Datasource2 - Cardinalities - Anonymized.pdf
Datasource1 - Cardinalities -Anonymized.pdf

sascha...@smaato.com

unread,
Jul 8, 2016, 6:04:50 AM7/8/16
to Druid Development
One further note: the order of dimensions in the document is also the order in which we specify them inside the ingestion spec.
I tried different dimension orders in the ingestion spec but did not see any considerable impact on the resulting data volume.

Gian Merlino

unread,
Jul 8, 2016, 1:41:16 PM7/8/16
to druid-de...@googlegroups.com
Hey Sascha,

When you analyze the segments, it'd be helpful to see which columns increased most in size when you switched to Roaring. You can use the meta.smoosh file inside the segment to see how big each column is (the start/end offsets are in there) and compare that for the Concise segment vs the Roaring one.

And unrelated to the bitmap stuff: if you're doing a bunch of rollup tests, one trick for that is to use the "cardinality" aggregator with "byRow": true and "fieldNames" set to the dimensions you propose keeping. It will return the estimated number of rows that you'd have if you rolled up on those dimensions, which you can use to compute an estimated rollup ratio.

Gian


Daniel Lemire

unread,
Jul 8, 2016, 1:45:36 PM7/8/16
to Druid Development
It goes without saying that for this comparison analysis to be useful, we need to use Roaring with run compression enabled... which you get automatically with this recently merged PR:
