I'm working on a way to create user-defined aggregates in SQLite from Python using numba. I've implemented scalar functions using some dirty, evil metaprogramming hacks (https://github.com/cpcloud/slumba), and I thought doing it for aggregations would be even more fun!
The way SQLite does aggregations is straightforward:
1. You define a step function that's called for each row. In each call to the step function, you update the state of the aggregation as you see fit.
2. You define a finalize function that's called once. This is where you do any final computation using the intermediate results accumulated across all of the calls to the step function.
For example, the avg function would be implemented something like this (ignoring both NULL checks and the case of a table with 0 rows or all NULL rows):
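```python
class MyAvg(object):
    def __init__(self):
        self.total = 0.0  # running sum of the values seen so far
        self.count = 0    # number of rows seen so far

    def step(self, value):
        # called once per row
        self.total += value
        self.count += 1

    def finalize(self):
        # called once at the end; divides by zero if step was never called
        return self.total / self.count
```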
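To see the step/finalize protocol in action before any numba is involved, a class like this can be registered with Python's standard-library `sqlite3` module through `Connection.create_aggregate`, which creates a fresh instance per aggregate evaluation, calls `step` once per row and `finalize` once at the end. This is a minimal sketch; the table `t` and column `x` are made up for illustration:

```python
import sqlite3


class MyAvg(object):
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def step(self, value):
        self.total += value
        self.count += 1

    def finalize(self):
        return self.total / self.count


con = sqlite3.connect(":memory:")
# Register MyAvg as a one-argument SQL aggregate named "myavg".
con.create_aggregate("myavg", 1, MyAvg)
con.execute("CREATE TABLE t (x REAL)")
con.executemany("INSERT INTO t VALUES (?)", [(1.0,), (2.0,), (6.0,)])
(result,) = con.execute("SELECT myavg(x) FROM t").fetchone()
print(result)  # 3.0
```

Here the state lives on a Python object that the `sqlite3` module manages, which is exactly what a jitclass-based version would want to avoid.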
I thought this would be a natural candidate for numba jitclass-ization.
The state itself lives in a blob of bytes controlled by SQLite. There's a C function you call to get a pointer to that aggregate context, and its signature is:
```c
void *sqlite3_aggregate_context(sqlite3_context *ctx, int nbytes)
```
This function is called inside step to allocate a single struct per invocation of the aggregate, and the returned pointer is typically cast to whatever type the user wants to use to store the aggregation state. The first time it's called, SQLite allocates (and zeroes) the memory; every subsequent call returns a pointer to the memory allocated in the first call. I'm not sure exactly when this memory is freed; deallocation is handled entirely by SQLite.
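The allocation rule above can be modelled in pure Python with `ctypes`: a zero-filled block of bytes owned by someone else, reinterpreted as a C struct on every call. This is only a sketch under assumptions: `FakeContext` and its `aggregate_context` method are hypothetical stand-ins for `sqlite3_context` and `sqlite3_aggregate_context`, not SQLite's API, and they do not model when SQLite frees the memory.

```python
import ctypes


class AvgState(ctypes.Structure):
    # Plain C layout for the aggregation state: what the raw bytes get cast to.
    _fields_ = [("total", ctypes.c_double), ("count", ctypes.c_int64)]


class FakeContext(object):
    """Hypothetical stand-in for sqlite3_context; models only the allocation rule."""

    def __init__(self):
        self._buffer = None  # owned by the "engine", not by the aggregate code

    def aggregate_context(self, nbytes):
        # First call allocates nbytes of zeroed memory; later calls return
        # the address of the same block.
        if self._buffer is None:
            self._buffer = ctypes.create_string_buffer(nbytes)
        return ctypes.addressof(self._buffer)


def step(ctx, value):
    addr = ctx.aggregate_context(ctypes.sizeof(AvgState))
    state = AvgState.from_address(addr)  # typed view over existing memory, no copy
    state.total += value
    state.count += 1


def finalize(ctx):
    state = AvgState.from_address(ctx.aggregate_context(ctypes.sizeof(AvgState)))
    return state.total / state.count


ctx = FakeContext()
for v in (1.0, 2.0, 6.0):
    step(ctx, v)
print(finalize(ctx))  # 3.0
```

The point of the design is that `AvgState.from_address` never allocates; it only overlays a typed view on memory someone else owns, which is the kind of control being asked of numba's jitclass structs here.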
The problem is twofold: (1) the memory allocated by this function is controlled entirely by SQLite, and (2) there doesn't appear to be a way to directly manipulate the struct backing a jitclass, other than indirectly through attribute access.
Is there a way to control how the struct backing a jitclass is allocated (and freed)? I'd like to cast the result of the call to sqlite3_aggregate_context to the type of the struct defined by numba and use that memory for the numba struct. Is this doable in numba? I've spent an hour or two fooling around with numba.extending, but I'm not sure that's what I want, or, if it is, it's not 100% clear to me how to manipulate heap-allocated memory in numba.
Thanks!