Crunching bit fields with bcolz


Ken Seehart

May 15, 2019, 12:50:00 AM
to bcolz
My use case involves lots of bit field expressions. The primary ingredient of bit fields is the expression (a>>b)&c.

Unfortunately, numexpr doesn't support bit field use cases. As the name suggests, the assumption is that expressions are about numbers, not bits. This is surprising to me because I would expect there to be plenty of use cases for bit field crunching.

To be more specific, there is no datatype that supports both >> and & in numexpr. You need both of those to do anything with bit fields.

There has been some old discussion about the absence of support for unsigned int, which is what one would naturally choose for bit field tasks. That would be nice, but it's beating a ten-year-old dead horse: it's not going to happen, and it wouldn't solve the problem anyway. The argument against implementing unsigned int is that it is technically challenging to implement all the operators over all combinations of types such that everything makes sense mathematically. Fortunately, signed int would be adequate for bit field processing if it supported bitwise &. It doesn't. I'm fine with signed int, since two's complement logic is not a problem for bit fields (the only caveat is that >> fills in the left with the sign bit instead of 0, but that difference is erased by the &). The only problem is that numexpr doesn't currently support & for signed int.
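To make the sign-extension caveat concrete, here is a quick NumPy sketch (the values are arbitrary, just for illustration):

```python
import numpy as np

# Signed int64 is adequate for bit-field extraction: arithmetic >>
# fills the high bits with copies of the sign bit, but the & mask
# throws those bits away, so the result matches the unsigned case.
a = np.array([-1, 0x0F30, 0x12345678], dtype=np.int64)
fields = (a >> 4) & 0xFF  # extract an 8-bit field starting at bit 4

print(fields.tolist())  # [255, 243, 103]; -1 is all ones, so its field is 0xFF
```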

The easiest solution to all this would be to implement & for signed integers. I can't imagine any conflict with other use cases, since & doesn't have many mathematical uses. I'm just talking about adding and_iii and and_lll. We don't have to worry about new users being surprised by unexpected results from bitwise & between 64-bit floating point and 32-bit int; we can safely assume that anyone performing bitwise & is working with bits. Currently, & is only supported for bytes (and_bbb), but there is no >> for bytes (rshift_bbb), and in any case I would not want to process one byte at a time.

There's a secondary problem in my use case: numexpr doesn't support conditional expressions such as "a if b else c if d else e". But I'll cross that bridge when I get to it. 
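For what it's worth, a chained conditional can usually be rewritten with where(); numexpr exposes the same where(cond, x, y) primitive, so the expression string would be "where(b, a, where(d, c, e))". The NumPy version of the translation (arrays are arbitrary examples):

```python
import numpy as np

# "a if b else c if d else e" rewritten as nested where() calls.
a = np.array([1, 1, 1])
c = np.array([2, 2, 2])
e = np.array([3, 3, 3])
b = np.array([True, False, False])
d = np.array([True, True, False])

result = np.where(b, a, np.where(d, c, e))
print(result.tolist())  # [1, 2, 3]
```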

Until then, this means that I have to fall back on 'python' vm to evaluate my expressions. This works fine, but it's sad to miss out on the performance boost.

So I'd like comments on alternative solutions to the general problem of implementing bit field operations, i.e. "(a>>b)&c", using a column store.

Options I've considered:
  1. Add support for and_iii, and_lll to numexpr
    • Any downside to this?
    • It doesn't seem to have the can of worms that adding unsigned int would have.
  2. Use numba to implement the inner loop, to be invoked per block 
    • I like numba because it does exactly what I want it to do with my expressions
    • I can't imagine anything faster than compiled code running on each block
    • Unfortunately iterblocks() seems to have a lot of overhead that completely kills any performance gains
    • Also, iterblocks loads all columns. Is there a way to pull just the columns I need?
    • So I probably need something lower level than iterblocks to access the columns.
  3. Find a different column store
    • Any ideas?
  4. Or maybe just go settle for uncompressed flat binary data and roll my own solution with numba and numpy....
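Option 4 can be sketched without bcolz at all: memory-map a flat binary column and process fixed-size blocks with vectorized ops. A minimal sketch, assuming a raw on-disk array of a known dtype (path, dtype, and block size are placeholders, and a numba @njit kernel could replace the vectorized line):

```python
import numpy as np

def iter_field(path, dtype, shift, mask, blocklen=1 << 20):
    """Yield (block >> shift) & mask over a flat binary column on disk."""
    col = np.memmap(path, dtype=dtype, mode="r")
    for start in range(0, len(col), blocklen):
        block = col[start:start + blocklen]
        yield (block >> shift) & mask
```

This touches only the one column, one block at a time, so memory stays bounded; the trade-off versus bcolz is that the data is uncompressed.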

 

Francesc Alted

May 15, 2019, 3:58:32 AM
to Bcolz
Hi Ken,

I see that you are running into the limitations of numexpr. What about using dask as the computing engine inside bcolz instead of numexpr? See https://bcolz.readthedocs.io/en/latest/reference.html#bcolz.eval. My experience is that numexpr is typically faster than dask, but on the other hand dask has more features, so perhaps it covers your use case well.

Regarding specifying columns for iterblocks, perhaps whereblocks would be a good fit for you: https://bcolz.readthedocs.io/en/latest/reference.html?highlight=whereblocks#bcolz.ctable.ctable.whereblocks
Note that if you want to process all the blocks, then you just have to pass a trivial condition like 'True' or similar.

Also, in case dask is not a good fit for you either, you may want to use numba to process the blocks coming out of the whereblocks iterator. After pondering this, perhaps that would be the best option for you.

Hope it helps!

 

Message from Ken Seehart <kenseehart@gmail.com> on Wed, May 15, 2019 at 6:50:


--
Francesc Alted

Ken Seehart

May 15, 2019, 6:51:29 PM
to bcolz
Thank you, that helps a lot!

Yes, vm='dask' gives me a 68% performance increase over vm='python', and handles my expressions correctly using unsigned int.

Given that dask seems to work for me, whereblocks doesn't seem to add anything. My primary performance issue is applying my filter expression. Although whereblocks gives me the ability to specify columns, it has to touch every record before my code does anything, so it kills performance.

From the bcolz docs: "bcolz can use numexpr or dask internally (numexpr is used by default if installed, then dask and if these are not found, then the pure Python interpreter) so as to accelerate many internal vector and query operations (although it can use pure NumPy for doing so too). numexpr can optimize memory (cache) usage and uses multithreading for doing the computations, so it is blazing fast. This, in combination with carray/ctable disk-based, compressed containers, can be used for performing out-of-core computations efficiently, but most importantly transparently."

So vm='dask' doesn't invoke dask's parallel computing capability?

I'd love to throw more cores at this. How do I get dask parallel computation in a bcolz application?

Francesc Alted

May 16, 2019, 8:00:58 AM
to Bcolz


Message from Ken Seehart <kense...@gmail.com> on Thu, May 16, 2019 at 0:51:
Thank you, helps a lot!

Yes, vm='dask' gives me a 68% performance increase over vm='python', and handles my expressions correctly using unsigned int.

Given that dask seems to work for me, whereblocks doesn't seem to add anything. My primary performance issue is applying my filter expression. Although whereblocks gives me the ability to specify columns, it has to touch every record before my code does anything, so it kills performance.

Not exactly. whereblocks only touches the rows in the *condition* column (or array, whatever you prefer). Moreover, if you specify the columns for the output, *only* those columns are touched. In general, there can be a big difference in terms of performance. See the example that I prepared about this:

 


bcolz can use numexpr or dask internally (numexpr is used by default if installed, then dask and if these are not found, then the pure Python interpreter) so as to accelerate many internal vector and query operations (although it can use pure NumPy for doing so too). numexpr can optimize memory (cache) usage and uses multithreading for doing the computations, so it is blazing fast. This, in combination with carray/ctable disk-based, compressed containers, can be used for performing out-of-core computations efficiently, but most importantly transparently.

So vm='dask' doesn't invoke dask's parallel computing capability?

I'd love to throw more cores at this. How do I get dask parallel computation in a bcolz application?

I am not sure how to use all the cores in dask; my perception was that parallelism is enabled by default, but I might be wrong. But, as I said, I think whereblocks + numba can provide a much better service to you (see the example above).
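Roughly, the whereblocks + numba pattern has this shape; a plain-NumPy kernel stands in here for the @njit-compiled one, and a list of structured arrays stands in for something like ct.whereblocks('True', outcols=['a']) (the names and field layout are illustrative):

```python
import numpy as np

def field_kernel(block, shift, mask):
    # With numba, this is the function you would decorate with @njit.
    return (block["a"] >> shift) & mask

def process(blocks, shift, mask):
    # `blocks` stands in for the whereblocks iterator over a ctable.
    return np.concatenate([field_kernel(b, shift, mask) for b in blocks])
```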

Francesc
