How to efectively intersect two RoaringBitmaps and get RoaringBitmap of matching positions

22 views
Skip to first unread message

Jan Novotný

unread,
May 7, 2021, 8:02:06 AM5/7/21
to Roaring Bitmaps
Hi  everyone,

   I tried to ask StackOverflow - but I'm also trying my luck here:

What is the best way performance-wise to compute bitmap of matching positions from two RoaringBitmaps?! Let's have an example:

First bitmap contains: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Second bitmap contains: [ 3, 4, 7, 9]

The result I want to achieve is to perform AND operation upon these two bitmaps but instead of AND output I need the set of positions in the first bitmap that match the values in the second bitmap. So the wanted result in our example is:

Wanted result: [ 2, 3, 6, 8] (i.e. indexes from bitmap 1 that match items from bitmap 2)

It's always true that bitmap 2 is a subset of bitmap 1 (ie. every number in bitmap 2 is also present in bitmap 1).

My current implementation is:

final RoaringBitmap bitmap1 = generatedSets.getAllPrimaryKeys();
final RoaringBitmap bitmap2 = generatedSets.getFilteredKeys();
final RoaringBitmapWriter<RoaringBitmap> resultWriter = RoaringBitmapWriter.writer().constantMemory().runCompress(false).get();
final IntIterator allIt = bitmap1.getBatchIterator().asIntIterator(new int[8192]);
bitmap2.forEach(new IntConsumer() {
   private int position = -1;

   @Override
   public void accept(int filteredId) {
      while (allIt.hasNext()) {
         ++position;
         final int allId = allIt.next();
         if (filteredId == allId) {
               resultWriter.add(++position);
               break;
         }
      }
   }
});
final int[] result = resultWriter.get().toArray();

But it consists of two full scans using iterator / batch iterator and I feel that using some RoaringBitmap magic it could be much more efficient.

The reason behind this is that I have non sorted array that is connected with bitmap 1 and on the same indexes in this non sorted array there is information connected with particular number/id.

Results of above mentioned method aren't so bad:

RoaringBitmapGroup.roaringBitmapGroupBenchmark                              (filteredKeysShare)  (valueCount)   Mode  Cnt      Score   Error  Units

RoaringBitmapGroup.roaringBitmapGroup                   10         10000  thrpt       21372.660          ops/s

RoaringBitmapGroup.roaringBitmapGroup                   10         50000  thrpt        5009.545          ops/s

RoaringBitmapGroup.roaringBitmapGroup                   10        200000  thrpt        1406.591          ops/s

RoaringBitmapGroup.roaringBitmapGroup                    5         10000  thrpt       18790.768          ops/s

RoaringBitmapGroup.roaringBitmapGroup                    5         50000  thrpt        4512.753          ops/s

RoaringBitmapGroup.roaringBitmapGroup                    5        200000  thrpt        1354.632          ops/s

RoaringBitmapGroup.roaringBitmapGroup                    3         10000  thrpt       18759.471          ops/s

RoaringBitmapGroup.roaringBitmapGroup                    3         50000  thrpt        4924.395          ops/s

RoaringBitmapGroup.roaringBitmapGroup                    3        200000  thrpt        1340.569          ops/s

RoaringBitmapGroup.roaringBitmapGroup                  1.5         10000  thrpt       16759.455          ops/s

RoaringBitmapGroup.roaringBitmapGroup                  1.5         50000  thrpt        6326.682          ops/s

RoaringBitmapGroup.roaringBitmapGroup                  1.5        200000  thrpt        1674.980          ops/s

valueCount = size of bitmap 1
filteredKeysShare = ratio of size of bitmap 2 (the less the more sparse bitmap)

Thanks for any idea or feedback,
Best regards, Jan

Reply all
Reply to author
Forward
0 new messages