Controlling jitclass memory allocation


Charles Cloud

Nov 29, 2016, 12:28:16 AM
to Numba Public Discussion - Public
I'm working on a way to create user-defined aggregates in SQLite from Python using numba. I've implemented scalar functions using some dirty, evil metaprogramming hacks (https://github.com/cpcloud/slumba) and I thought doing it for aggregations would be even more fun!

The way SQLite does aggregations is straightforward:

1. You define a step function that's called for each row. In each call to the step function, you update the state of the aggregation as you see fit.
2. You define a finalize function that is called once. This is where you do any final computation from the intermediate results accumulated across all of the calls to the step function.

For example, the avg function would be implemented something like this (ignoring both NULL checks and the case of a table with 0 rows or all NULL rows):

class MyAvg(object):

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def step(self, value):
        self.total += value
        self.count += 1

    def finalize(self):
        return self.total / self.count

I thought this would be a natural candidate for numba jitclass-ization.

The thing that holds the state is a blob of bytes controlled by SQLite. There's a C function you use to get that context, and its signature is:

void *sqlite3_aggregate_context(sqlite3_context *ctx, int nbytes)

This function is called inside of step to allocate a single struct per aggregate call, which is typically cast to whatever type the user wants to use to store the aggregation state. The first time the step function is called, memory is allocated, and every subsequent call to step returns the pointer to the memory allocated in the first call. I'm not sure when this memory is freed; the deallocation is handled completely by SQLite.

The problem is twofold. 1) The memory allocated by this function is controlled entirely by SQLite. 2) There doesn't appear to be a way to directly manipulate the struct that is backing the jitclass, other than indirectly through attribute manipulation.

Is there a way to control how the struct that's backing jitclasses is allocated (and freed)? I'd like to cast the result of the call to sqlite3_aggregate_context to the type of the struct defined by numba and use that memory for the numba struct. Is this doable in numba? I've spent an hour or two fooling around with numba.extending, but I'm not sure that's what I want, or, if it is, it's not 100% clear to me how to manipulate heap-allocated memory in numba.

Thanks!

Antoine Pitrou

Nov 29, 2016, 3:59:14 AM
to numba...@continuum.io
On Mon, 28 Nov 2016 21:28:16 -0800 (PST)
Charles Cloud <cpc...@gmail.com> wrote:
>
> The problem is two fold. 1) The memory allocated by this function is
> controlled entirely by SQLite. 2) There doesn't appear to be a way to
> directly manipulate the struct that is backing the jitclass, other than
> indirectly through attribute manipulation.
>
> Is there a way to control how the struct that's backing jitclasses is
> allocated (and freed)? I'd like to cast the result of the call to
> sqlite3_aggregate_context to the type of the struct defined by numba and
> use that memory for the numba struct. Is this doable in numba?

No. I'd recommend you use a cfunc and cast the context argument to a
structured array (with a dtype of your choice) using the carray()
function. That will still give you structured R/W access to the
context's memory.

Regards

Antoine.


Siu Kwan Lam

Nov 29, 2016, 5:20:08 AM
to numba...@continuum.io
I have thought about having an unsafe_cast() function that is similar to a C++ reinterpret_cast but just for internal use. Such a function would allow casting an arbitrary pointer into a jitclass instance. The recently added `numba.extending.intrinsic` allows us to do that as an extension. I have implemented such a function in https://gist.github.com/sklam/ab2948068f76b6b206459fa4e2b4aafc. The script demonstrates the use of such an unsafe_cast() function to cast a numpy record into a jitclass instance and manipulate it.
The jitclass is designed to allow interop with C. The storage model for the fields matches that of a C structure (http://numba.pydata.org/numba-doc/latest/proposals/jit-classes.html#storage-model). The instance itself is just two pointers. The first is a pointer to the "memory information" structure (meminfo) used for memory management. The second is a pointer to the "C struct" data, which holds the fields of the instance. If the meminfo is NULL, the instance will not participate in memory management (reference counting).

There are safety concerns with such unsafe casting. Without memory management, the user must be clear about the lifetime of the underlying resource. There is no protection against using a freed resource. The carray() function has the same safety problem, but using it within the restrictions of a cfunc makes it safer.

So, are unsafe features such as unsafe_cast something valuable to users that we should include in numba's standard library? (Safety-conscious languages like Rust and Haskell have unsafe features that bypass memory safety and type safety.)



--
Siu Kwan Lam
Software Engineer
Continuum Analytics

Charles Cloud

Nov 30, 2016, 2:22:01 PM
to Numba Public Discussion - Public, soli...@pitrou.net
Interesting, this looks like a viable option. I'll investigate. Thanks!

Charles Cloud

Nov 30, 2016, 2:32:44 PM
to Numba Public Discussion - Public
This is the exact thing I was looking for. I'd be really curious whether this use case is common or whether I'm outside the design space of numba. I actually think that better documentation of the codegen step would go a long way, and since there's already an API for doing this, it would require no additional effort from the numba team except perhaps this documentation. For example, it's not at all clear what's possible in the codegen stage from a user's perspective, nor is it clear how to do it correctly. Maybe I simply haven't explored the depths of the documentation for both llvmlite and numba, in which case I'll stop opining :)

In any case, thanks very much for your help.

Siu Kwan Lam

Dec 2, 2016, 12:46:53 AM
to Numba Public Discussion - Public
After discussing with the team, we agree that unsafe_cast and similar low-level features are useful both internally and for users who want to interface with another language (e.g. C). We plan to introduce a "numba.unsafe" module to contain all of the "unsafe" features.

> it's not at all clear what's possible in the codegen stage from a user's perspective nor is it clear how to do it correctly. 

The @intrinsic API is equivalent to C inline asm and should be used with the same care. The codegen stage is essentially writing assembly code through metaprogramming using llvmlite's IRBuilder (http://llvmlite.readthedocs.io/en/latest/ir/builder.html). At this level, you can do anything. Users should keep the low-level code in @intrinsic to a minimum. It should just do one simple operation.

The @intrinsic API is relatively new and we want to use it more internally before we fully document it (we have an issue tracking this: https://github.com/numba/numba/issues/2106). For now, those feeling adventurous can use the docstring (https://github.com/numba/numba/blob/master/numba/extending.py#L286-L320).


Charles Cloud

Dec 2, 2016, 8:22:23 PM
to Numba Public Discussion - Public
Awesome, excited to see what's next. FWIW, with some small additions (like an intrinsic "sizeof" operator so I could tell sqlite how much memory to allocate) I was able to get about a 10x speedup using numba + unsafe casting vs the Python stdlib Connection.create_aggregate method (using the builtin avg function as my point of reference). The numba version has the same API (modulo a decorator). Compared to the builtin avg, the numba version was only 2x slower.
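
For reference, the stdlib baseline mentioned above might look like the following sketch. The table name t, column x, and SQL function name my_avg are made up for illustration, and NULL handling is ignored as in the original post. It says nothing about the timings quoted above; it only shows the API being compared against.

```python
import sqlite3


class MyAvg:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def step(self, value):
        # Called by sqlite3 once per row.
        self.total += value
        self.count += 1

    def finalize(self):
        # Called by sqlite3 once, after the last row.
        return self.total / self.count


con = sqlite3.connect(":memory:")
# Register the class as a one-argument aggregate named my_avg (hypothetical name).
con.create_aggregate("my_avg", 1, MyAvg)

con.execute("CREATE TABLE t (x REAL)")
con.executemany("INSERT INTO t (x) VALUES (?)", [(1.0,), (2.0,), (3.0,)])

(result,) = con.execute("SELECT my_avg(x) FROM t").fetchone()
(builtin,) = con.execute("SELECT avg(x) FROM t").fetchone()
print(result, builtin)
```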