Dear Vincent, thanks for your reply.
To sum up my reply: Should I try to replace bitsets completely by RaoringBitmap and see how it performs? (I need some help with the benchmarking, as I don't know good benchmarks for other use cases.)
to 1. Okay. I wasn't aware of that. I just copied the behaviour of bitset.pxi/pyx/pxd. I didn't know that this kind of usage is deprecated.
They claim it's a simple pip install. So it might be suitable as a standard package in sage.
It only supports values up to 2**32. But seriously, our current implementation requests 512 MB continous memory in this case. So this performs awful anyway.
I was reluctant to use it, because I thought, you cannot avoid allocating memory over and over.
Only after you and Travis insisted to give up our private implemtation, I looked up, if they have all the features I want.
(And we can always include this definition ourselves from the header.)
With this to perform `dest = A & B`, I need to copy `A` first and to do the operation inplace then. This is a bit slower than optimal, but avoids allocation, if possible.
So after a couple of initial reallocs there shouldn't be any more memory allocations.
The optimized subset check is much more important to me, which appears to be at least as good as my implementation.
Depending on what you and other people think of, I can try to make use of RoaringBitMap or any other library that likely performs well.
I could try to create a branch that completely rips out bitset.pxi and replaces it by a different implementation (with same names as to not change anything else). Then many people could try to come up with benchmarks in different parts of sage.
E.g. this might make dense graphs much better in comparison.
to 3. I didn't try many things yet. For my use case the only thing I need (at the moment) is fast intersection and fast subset check. Dense bitsets perform pretty good with that as it is hard to beat 256 bits per CPU-step (whatever this means). I could perform a bit better however,
by keeping an array of the significant chunks. I wanted to try different things. This is the main reason for
https://trac.sagemath.org/ticket/30549, because my design choice is awful, if you want to change one implemtation detail. With this ticket I can pretty much tell the face-structure that it consists of a roaring bitmap of atoms and coatoms and adapt a few basic functions to the name scheme of that (I also need to take care of freeing the memory, but hey, that's just writing a couple of __dealloc__ methods).