Hello,
I am building a bitmap service for calculating DAUs, retention rates, etc. It works very well (thanks to the RoaringBitmap project!). However, the AND, OR and XOR operations between two bitmaps sometimes give unexpected results. For example, I have two bitmaps with 629422 and 631075 bits set (I serialized them and attached them as example1.btm and example2.btm). The two should have 629422 elements in common, but when I AND them I get a result of only 30k+ elements. The OR and XOR operations seem to give wrong results as well.
I dug a bit further and figured out that it may be an issue with serialization/de-serialization. My service saves each bitmap to a file using the RoaringBitmap serialization API. Maybe there is a bug there, or I am doing something wrong, but sometimes I get a file like example1: if I de-serialize it and print the elements using RoaringBitmap.forEach(), I get 629422 elements in total. Does that mean it may contain duplicated elements?
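To make the kind of check described above concrete, here is a minimal sketch of a round-trip test. It is not my service code: it uses java.util.BitSet as a stand-in so that it runs without any dependency, and the class name, method names and sample values are hypothetical. With RoaringBitmap I would expect the same steps to go through serialize(DataOutput) and deserialize(DataInput), with getCardinality() in place of cardinality().

```java
import java.util.BitSet;

// Stand-in sketch: java.util.BitSet replaces RoaringBitmap so this runs with the JDK alone.
class RoundTripCheck {

    // Serialize to bytes and read back, mimicking a save-to-file / load-from-file cycle.
    static BitSet roundTrip(BitSet original) {
        byte[] bytes = original.toByteArray();
        return BitSet.valueOf(bytes);
    }

    // Cardinality of the intersection, computed on a copy so the inputs stay untouched.
    static int andCardinality(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    // Count elements by iterating, independently of cardinality().
    static int countByIteration(BitSet bits) {
        int n = 0;
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        BitSet b = new BitSet();
        for (int i = 0; i < 1000; i++) a.set(i * 3); // hypothetical user ids
        for (int i = 0; i < 1200; i++) b.set(i * 3); // superset of a

        BitSet a2 = roundTrip(a);
        BitSet b2 = roundTrip(b);

        System.out.println("equal after round trip: " + (a.equals(a2) && b.equals(b2)));
        System.out.println("AND before round trip: " + andCardinality(a, b));
        System.out.println("AND after round trip:  " + andCardinality(a2, b2));
        System.out.println("iterated vs cardinality: "
                + countByIteration(a2) + " / " + a2.cardinality());
    }
}
```

If the bitmaps compare equal before and after the round trip, and the iterated count matches the reported cardinality, then the serialization step itself is probably not where the elements get lost.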
This issue breaks my service, which I use to calculate retention rates (e.g., the rate normally stays at 50%+, and when the issue occurs it drops to 30% or less, which is not correct). As far as I can observe, it usually happens when I shut down the service, which serializes the bitmaps, and then restart it, which de-serializes them. Does that mean RoaringBitmap is not thread-safe, or something similar? I tried versions 0.7.9 and 0.7.38, and the same issue occurs with both.
I haven't read the RoaringBitmap source code yet, so I am turning to the mailing list first. Could one of the experts take a look? Thanks a lot.