A wired behavior using RoaringBitmap (Java) with serialize/de-serialize

81 views
Skip to first unread message

Jing Lv

unread,
Feb 24, 2019, 9:45:46 PM2/24/19
to Roaring Bitmaps
Hello,

      I am building up a bitmap service for calculating the DAUs/retention rate etc. It works very well (thanks to the RoaringBitmap project!). However, sometimes it appears to have some uncertain behavior in AND, OR and XOR operations between two bitmap - e.g, I have two bitmap which has 629422 and 631075 bits set (I serialized them, attached as example1.btm and example2.btm). Actually 629422 elements of each are the same, but when I do AND operation, I got a result of 30k+ elements. It seem OR/XOR operation did the wrong math as well.
      I big a bit further, figure out it may be an issue with serialization/de-serialization. In my service, I make it to save to file using serialization API of the RoaringBitmap. Maybe there is an issue here or I am doing something wrong, sometimes I got a file like example1 - if I de-serialize and print the elements using RoaringBitmap.forEach(), I get 629422 elements in total - which means it may have some duplicated elements? 
      This issue does break my service, which I use it to calculate retention rate (e.g, it keeps 50%+ and if the issue occurs it drop to 30%- which is not correct).  As I observe, it usually happen while I shutdown the service, do serialization; and restart with de-serialization - does it mean RoaringBitmap is not thread-safe or something?  I tried 0.7.9/0.7.38, but the same issue occurs.

      I haven't read the source code of RoaringBitmap yet but turn to mailing list. May some of the experts take a look? Thanks a lot.
example1.btm
example2.btm

Daniel Lemire

unread,
Feb 25, 2019, 6:54:00 PM2/25/19
to Roaring Bitmaps
Thank you for you report. File example1.btm is not a valid RoaringBitmap file. Any operation on a RoaringBitmap that you would build from this file is ill-defined.

Is your application multithreaded? In my opinion, that is the most likely cause of such corruption. RoaringBitmap is not thread safe: you are responsible as the programmer for properly locking the bitmaps.

It is possible that there could be bug in the library. If so we would need a reproducible case. The easiest thing would be to provide us with the input data (the list of integers) that are in both sets. Then describe the operations you did from these sets of integers. Note that RoaringBitmap is widely used in major projects and we have had no similar bug report (ever). Thus we should consider a library bug as possible but not likely.

We simply cannot help further without more information.

Jing Lv

unread,
Feb 25, 2019, 8:36:25 PM2/25/19
to Roaring Bitmaps
Hi Daniel,

Thank you for the reply!
I am not able to offer the full data sequence as it's in real mess environment, but I suggest you are right, it was doing the serialization while accepting new data. I made it thread safe now.
Will report if I observe something new.

Thanks a lot!

在 2019年2月26日星期二 UTC+8上午7:54:00,Daniel Lemire写道:

Daniel Lemire

unread,
Feb 25, 2019, 8:37:48 PM2/25/19
to Roaring Bitmaps
Great. Typo: "Thank you for your report."
Reply all
Reply to author
Forward
0 new messages