Implementing nextSetBit alternative

209 views
Skip to first unread message

Zoran Jeremic

unread,
Jun 2, 2018, 1:53:23 AM6/2/18
to Roaring Bitmaps
Hi guys,

I'm trying to replace Java BitSet with Roaring Bitsets in quite complex analytical system, but I need to keep compatibility with Java BitSet, so I can switch between implementations. So far, I made a good progress and Roaring BitSet shows quite promising results, especially related to the Memory allocation, but also in terms of speed Roaring Bitsets could outperform or at least be as good as Java BitSet. The only outstanding component that I need to redesign rely on the nextSetBit to search for a large number of random values from very large BitSet. Number of bits could be really high, and this is actually a big problem if I can't make it works at least as good as with Java BitSet.

I saw that few years ago there was discussion about this, and it was dismissed as something that could not be implemented with better performance. One of the solutions I could eventually use is to use Roaring Bitmaps until I need to search over it. When I need to search it, I would convert to Java BitSet and use nextSetBit. This would be simplest solution, but I thought that you might suggest something better that relies on Roaring Bitmaps or something that you came up with in the meantime.

Thanks,
Zoran

Daniel Lemire

unread,
Jun 3, 2018, 12:14:29 AM6/3/18
to Roaring Bitmaps

> The only outstanding component that I need to redesign rely on the nextSetBit to search for a large number of random values from very large BitSet.

Can you elaborate? Use code if needed.

Zoran Jeremic

unread,
Jun 3, 2018, 2:50:02 PM6/3/18
to Daniel Lemire, Roaring Bitmaps
Hi Daniel,

I'm not sure if these code examples would be helpful for you, but I copy/pasted some of the code which uses nextSetBit to jump to certain key, and I have several implementations of this method, for different tables. It's used across the application, and keys are accessed in random order from iterators. Because of this random access, iterator would not be very efficient, since bitset can be large, and number of keys I need to access could also be large. For example, iterator calling this method might have values (456789, 17, 453, 2, 56900....), and bitset could have several milions of values.
I've created a sample test which should demonstrate simplified system behavior and compare performance of both implementations

I'm not able to change the system logic, since this affect a number of components, so I have to go with low level changes, and I still need to support Java BitSet implementation and Roaring Bitmap implementation. Ideally, I would just provide nextSetBit implementation for RoaringBitmap, but if that's not possible, I would try other solution you suggest.

Thanks,
Zoran

On Sat, Jun 2, 2018 at 9:14 PM, Daniel Lemire <lem...@gmail.com> wrote:

> The only outstanding component that I need to redesign rely on the nextSetBit to search for a large number of random values from very large BitSet.

Can you elaborate? Use code if needed.

--
You received this message because you are subscribed to the Google Groups "Roaring Bitmaps" group.
To unsubscribe from this group and stop receiving emails from it, send an email to roaring-bitmaps+unsubscribe@googlegroups.com.
To post to this group, send email to roaring-bitmaps@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/roaring-bitmaps/2aeda4e8-88be-422e-8bf4-46d980a6c60a%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

code_examples.rtf
code_examples_test.rtf

Daniel Lemire

unread,
Jun 3, 2018, 9:05:40 PM6/3/18
to Roaring Bitmaps
I am sure we can help but I don’t understand what the code does. Why would you randomly call nextBitSet?

What is the purpose of these calls?

Can you please explain the motivation behind your code?

We need to know the motivation and "my code works like this" is not a motivation. Why does your code do such a thing?

Zoran Jeremic

unread,
Jun 4, 2018, 12:56:22 PM6/4/18
to Daniel Lemire, Roaring Bitmaps
Hi Daniel,

Sorry for not being clear enough. This is OLAP analytics application, and data used for underlying tables are stored in bitmaps. Every time user request some diagram to be displayed, data has to be aggregated from the bitmaps based on filters selected. In some cases we are taking data from one bitset in a sequence starting from certain position, and we get certain number of following bits, e.g. we read 50 set bits in sequence. That is in a cases when we create fact tables, and that is straightforward, because we could just use iterator for that and iterate through the values. However, more often we are reading foreign keys from fact table, and pull data from other bitsets based on these values. In this scenario, we are accessing data form one bitset in arbitrary order as per these foreign keys values, so we can look for index 185, then 56, then 144, and so on, so we can't use iterator for this. We need to implement method that will perform well in both scenarios, and be at least as fast as Java BitSet.

Since this is directly affecting user experience, and we might have really large bitsets, performance is very important

I hope this makes sense.

Thanks,
Zoran


--
You received this message because you are subscribed to the Google Groups "Roaring Bitmaps" group.
To unsubscribe from this group and stop receiving emails from it, send an email to roaring-bitmaps+unsubscribe@googlegroups.com.
To post to this group, send email to roaring-bitmaps@googlegroups.com.

Daniel Lemire

unread,
Jun 4, 2018, 2:19:53 PM6/4/18
to Roaring Bitmaps
So you have a list of values, in random order (not sorted) such as 185, 56, 144... and you have a set B (maybe implemented as a Java BitSet), and you want to output the list of values out 185, 56, 144...  that are included in the set B. Is that correct?

So you want to do something like this?

int[] array = {185, 56, 144,...};
RoaringBitmap b = ...


int[] partofintersection = RoaringBitmap.intersect(array,b);



That is it, right?

Zoran Jeremic

unread,
Jun 4, 2018, 2:36:50 PM6/4/18
to Daniel Lemire, Roaring Bitmaps
Hi Daniel,

Yes. That looks like my case, but I need one value at the time. I can't use intersection in this case, since this is used from interface which is implemented on several places and used across different components of the system. Using intersection would require some architectural changes in all these components, which I can't do, so I need to look for the solution which will return individual bit rather than intersection.

Zoran

Daniel Lemire

unread,
Jun 4, 2018, 2:52:44 PM6/4/18
to Roaring Bitmaps
So what is wrong with

RoaringBitmap b = ...


b
.contains(185); // returns a boolean


b
.contains(56); // returns a boolean


b
.contains(144); // returns a boolean


and so forth?

Zoran Jeremic

unread,
Jun 4, 2018, 3:34:03 PM6/4/18
to Daniel Lemire, Roaring Bitmaps
The problem is in case when bit is not set, and I need to return nextSetBit. Is it possible to implement it at least equally efficient as in Java BitSet?

Daniel Lemire

unread,
Jun 4, 2018, 5:33:07 PM6/4/18
to Roaring Bitmaps




The problem is in case when bit is not set, and I need to return nextSetBit.


Why?

We have agreed that you have a list of values, you want to know which ones belong to the set...  We also agreed that you are not going to start iterating over the values. Thus you do not need nextSetBit.

Zoran Jeremic

unread,
Jun 4, 2018, 6:02:43 PM6/4/18
to Daniel Lemire, Roaring Bitmaps
I've try to implement it in this way:

override def nextSetBit(fromIndex: Int): Int = {
var currentIndex = fromIndex
var setBit = -1
while (currentIndex < nBits && setBit == -1) {
if (underlying.contains(currentIndex)) {
setBit = currentIndex
} else {
currentIndex += 1
}
}
setBit
}
It should return bit I'm looking for, or it would search for next bit which is set. This doesn't have good performance. 
I've also tried contains(index) in performance test where it always returns a value, so it doesn't have to search for next bit. Even in those cases, it performs worse than java implementation when bitmap density is higher than 50%. So, as long as I have less than 50% of bits set, it performs faster than Java implementation. when density is more than 50%, it performs worse. 
If I have 50% density, and make a query with 50% of matching and 25% of non-existing bits, performance are hundreds of times worse.

These are results of tests I'm running on the 10 milions of bits. Time is in milliseconds. Density 10% means I have 1 million bits set. Looking for 50% of bits means I'm looking for 5 milions of bits, so half of the bits are not found. 

BitSet density: 10% TIME:74 

BitSet density: 25% TIME:357 

BitSet density: 50% TIME:84 

BitSet density: 75% TIME:69 

BitSet density: 100% TIME:88 

BitSet density: 25% looking for 50% of bits TIME:73 

BitSet density: 50% looking for 75% of bits TIME:390 

[info] JavaBitSetImplementationsNextSetBitBenchmark:

 

BitSet density: 10% TIME:24 

BitSet density: 25% TIME:81 

BitSet density: 50% TIME:133 

BitSet density: 75% TIME:273 

BitSet density: 100% TIME:460 

BitSet density: 25% looking for 50% of bits TIME:120 

BitSet density: 50% looking for 75% of bits TIME:445 

[info] RoaringBitSetImplementationsNextSetBitBenchmark:



Zoran Jeremic

unread,
Jun 4, 2018, 6:06:07 PM6/4/18
to Daniel Lemire, Roaring Bitmaps
I've try to implement it in this way:

override def nextSetBit(fromIndex: Int): Int = {
var currentIndex = fromIndex
var setBit = -1
while (currentIndex < nBits && setBit == -1) {
if (underlying.contains(currentIndex)) {
setBit = currentIndex
} else {
currentIndex += 1
}
}
setBit
}
It should return bit I'm looking for, or it would search for next bit which is set. This has really bad performance. 
I've also tried contains(index) in performance test where it always returns a value, so it doesn't have to search for next bit. Even in those cases, it performs really bad when bitmap density is higher than 50%. So, as long as I have less than 50% of bits set, it performs faster than Java implementation. when density is more than 50%, it performs worse. 
If I have 50% density, and make a query with 50% of matching and 25% of non-existing bits, performance are hundreds of times worse.

These are results of tests I'm running on the 10 milions of bits. Time is in milliseconds. Density 10% means I have 1 million bits set. 

BitSet density: 10% TIME:48 

BitSet density: 25% TIME:394 

BitSet density: 50% TIME:149 

BitSet density: 75% TIME:300 

BitSet density: 100% TIME:264 

BitSet density: 25% looking for 50% of bits TIME:70 

BitSet density: 50% looking for 75% of bits TIME:64 

[info] JavaBitSetImplementationsNextSetBitBenchmark:



BitSet density: 10% TIME:20 

BitSet density: 25% TIME:44 

BitSet density: 50% TIME:130 

BitSet density: 75% TIME:271 

BitSet density: 100% TIME:452 

BitSet density: 25% looking for 50% of bits TIME:117 

BitSet density: 50% looking for 75% of bits TIME:467 

[info] RoaringBitSetImplementationsNextSetBitBenchmark:



Daniel Lemire

unread,
Jun 4, 2018, 10:21:09 PM6/4/18
to Roaring Bitmaps
We can do better than what you have done there. Richard Startin implemented something much better which available in RoaringBitmap 0.7.12 called nextValue. It returns a long since RoaringBitmaps cover the full 32-bit range.

However, the code that you seem to be using appears to follow a bad design. So I would not, in general, expect good performance. Yet, if you are going to insist on using Roaring the way you describe, please use Richard's new function in 0.7.12.

- Daniel

Zoran Jeremic

unread,
Jun 5, 2018, 4:58:11 PM6/5/18
to Daniel Lemire, Roaring Bitmaps
Vau. Great job. This looks amazing now :)
The results I'm getting now on 10 million bits are given bellow.

The only case when it fails is when bitmap density is 25% and I'm searching for values that are all set, so it calls contains(index) only and skips nextValue(index). I'm not sure why, but it's always 3-4 times longer.

One thing that confuses me is that if my underlying bitmap is initialized as RoaringBitmap, it performs much slower, but if I initialize it as MutableRoarningBitmap and convert later to RoaringBitmap, I get these good results. Any particular reason why this would happen? I would expect these two to have same performance, and since some of the APIs are not available under MutableRoaringBitmap, I wanted to use only one underlying bitmap.

Also, I have performance improvement if I perform if contains validation before calling nextValue:

if(bitmap.contains(fromIndex)){
fromIndex
} else {
TypeConverter.longToInt(bitmap.nextValue(fromIndex))
}
Shouldn't nextValue check if value is there and skip further code execution?


itSet density: 10% TIME:33 

BitSet density: 25% TIME:113 

BitSet density: 50% TIME:52 

BitSet density: 75% TIME:37 

BitSet density: 100% TIME:49 

BitSet density: 25% looking for 50% of bits TIME:25 

BitSet density: 50% looking for 75% of bits TIME:42 

BitSet density: 75% looking for 100% of bits TIME:48 

BitSet density: 25% looking for 75% of bits TIME:52 

[info] RoaringBitSetImplementationsNextSetBitBenchmark:


BitSet density: 10% TIME:32 

BitSet density: 25% TIME:23 

BitSet density: 50% TIME:43 

BitSet density: 75% TIME:60 

BitSet density: 100% TIME:113 

BitSet density: 25% looking for 50% of bits TIME:48 

BitSet density: 50% looking for 75% of bits TIME:70 

BitSet density: 75% looking for 100% of bits TIME:106 

BitSet density: 25% looking for 75% of bits TIME:97 

[info] InMemoryBitSetImplementationsNextSetBitBenchmark:


Thanks,
Zoran

Zoran Jeremic

unread,
Jun 5, 2018, 6:19:53 PM6/5/18
to Daniel Lemire, Roaring Bitmaps
Actually, it turned out that these tests were not correct :(
I was converting MutableRoaringBitmap to RoaringBitmap too early, so it was empty during the test. Performance were better with my previous code than after using this nextValue implementation.
Reply all
Reply to author
Forward
0 new messages