Extending __getitem__


Jim Pivarski

Nov 6, 2017, 8:47:20 PM
to Numba Public Discussion - Public
Hi,

I'd like to use the Numba extension mechanism to override __getitem__ of Numpy subclasses. I've found examples that suggest I should be writing a numba.extending.lower_builtin function with "getitem" (a quoted string) as the first argument, but I must be missing some simple "register" mechanism or something, because it's not recognizing my override:

import numba
import numpy

class Wrapped(numpy.ndarray):
    def __new__(cls, original):
        return original.view(cls)

    def __array_finalize__(self, obj):
        pass

class WrappedType(numba.types.Type):
    def __init__(self):
        super(WrappedType, self).__init__(name="Wrapped")

wrappedtype = WrappedType()

@numba.extending.typeof_impl.register(Wrapped)
def typeof_index(val, c):
    return wrappedtype

@numba.extending.lower_builtin("getitem", wrappedtype, numba.types.Integer)
def wrapped_getitem(context, builder, sig, args):
    return 999

@numba.njit
def testie(x):
    return x[2]

a = Wrapped(numpy.array([1, 2, 3, 4, 5]))

print(testie(a))

This results in the same compilation error as if I hadn't written a "wrapped_getitem" function:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 11, in <module>
  File "/home/pivarski/.local/lib/python2.7/site-packages/numba/dispatcher.py", line 330, in _compile_for_args
    raise e
numba.errors.TypingError: Caused By:
Traceback (most recent call last):
  File "/home/pivarski/.local/lib/python2.7/site-packages/numba/compiler.py", line 238, in run
    stage()
  File "/home/pivarski/.local/lib/python2.7/site-packages/numba/compiler.py", line 452, in stage_nopython_frontend
    self.locals)
  File "/home/pivarski/.local/lib/python2.7/site-packages/numba/compiler.py", line 865, in type_inference_stage
    infer.propagate()
  File "/home/pivarski/.local/lib/python2.7/site-packages/numba/typeinfer.py", line 844, in propagate
    raise errors[0]
TypingError: Invalid usage of getitem with parameters (Wrapped, int64)
 * parameterized
File "<string>", line 9
[1] During: typing of intrinsic-call at <string> (9)
[2] During: typing of static-get-item at <string> (9)

Failed at nopython (nopython frontend)
Invalid usage of getitem with parameters (Wrapped, int64)
 * parameterized
File "<string>", line 9
[1] During: typing of intrinsic-call at <string> (9)
[2] During: typing of static-get-item at <string> (9)

Does anyone know what I'm missing?
Thanks!
-- Jim


Jim Pivarski

Nov 7, 2017, 10:40:04 AM
to Numba Public Discussion - Public
Here's a somewhat hacky way to do it; I hope someone can provide some insight into how these things are supposed to work so this can be improved.

First of all, I had registered my Numpy array subclass ("Wrapped"), but didn't create a model or boxing/unboxing rules. I can't subclass just any model: the StructModel example from the documentation, for instance, doesn't provide the hooks needed to override getitem. Since "Wrapped" is an array subclass, I can subclass the ArrayModel and pass through its boxing and unboxing rules.

import numba
import numpy

class Wrapped(numpy.ndarray):
    def __new__(cls, original):
        return original.view(cls)

    def __array_finalize__(self, obj):
        pass

class WrappedType(numba.types.Array):
    pass

wrappedtype = WrappedType(numba.types.int64, 1, "C", readonly=True, name="Wrapped", aligned=True)

@numba.extending.typeof_impl.register(Wrapped)
def typeof_index(val, c):
    return wrappedtype

@numba.extending.register_model(WrappedType)
class WrappedModel(numba.datamodel.models.ArrayModel):
    pass

@numba.extending.unbox(WrappedType)
def unbox_wrapped(typ, obj, c):
    return numba.targets.boxing.unbox_array(typ, obj, c)

@numba.extending.box(WrappedType)
def box_wrapped(typ, val, c):
    return numba.targets.boxing.box_array(typ, val, c)

I had also been confused about what I should put in the lowered function implementation: apparently, I'm supposed to emit LLVM code there, which is too low-level for my state of knowledge. Fortunately, there's a high-level interface where I can use Numba's type inference and compilation and just write my function as I would with an ordinary @njit.

As far as I know, however, I can't do this for getitem directly, so I make a new function with a new name. (Hacky workaround #1.)

def something(array, index):
    return array[index] + 100

@numba.extending.overload(something)
def wrapped_something(array, index):
    if isinstance(array, WrappedType) and isinstance(index, numba.types.Integer):
        return something

This function doesn't exist until we use it in an @njit, so I have to evaluate it with dummy arguments. (Hacky workaround #2.)

@numba.njit
def test1(x):
    return something(x, 2)

a = Wrapped(numpy.array([0, 10, 20, 30]))
result = test1(a)
print(result)

Now I can write my getitem lowering by searching through the function definitions in the context, finding the one I've just instantiated into existence, and then passing the arguments to it. (Very hacky workaround #3.)

@numba.extending.lower_builtin("getitem", WrappedType, numba.types.Integer)
def wrapped_getitem(context, builder, sig, args):
    for key, value in context._defns.items():
        if getattr(key, "__name__", None) == "something":
            break
    imp = value.versions[0][1]
    return imp(context, builder, sig, args)

@numba.njit
def test2(x):
    return x[3]

result = test2(a)
print(result)

It works. Note that I can't use my function "something" as a key for "context._defns" (or use the high-level "context.get_function") because the key I want is actually a numba.targets.base.OverloadSelector. It has an imp (LLVM implementation) of each function signature.

So this is a start. If I were writing a direct-use application, this would be sufficient. But I'm writing a library that will reinterpret array contents on demand, and the above is too hacky for a library. (It clutters the namespace with functions like "something" and relies on a fairly awkward way to compile it and get at the compiled version.)

Any ideas?
Thanks!
-- Jim

Jim Pivarski

Nov 7, 2017, 11:20:43 AM
to Numba Public Discussion - Public
This is less terrible: it doesn't use a function call on dummy data to invoke compilation, and although it pokes a hole in the namespace, I can reuse a canonical name (based on the name of my project, for instance). The following replaces the three steps of the hack above:

@numba.njit([numba.types.int64(wrappedtype, numba.types.int64)])
def something(array, index):
    return array[index] + 100

cres = something.overloads.values()[0]
something_imp = cres.target_context.get_function(cres.entry_point, cres.signature)._imp
del cres.target_context._defns[cres.entry_point]

@numba.extending.lower_builtin("getitem", WrappedType, numba.types.Integer)
def wrapped_getitem(context, builder, sig, args):
    return something_imp(context, builder, sig, args)

Now I skip the @numba.extending.overload (I didn't want an overload, anyway) and replace it with a straight @numba.njit declaration. I give it an explicit signature because I'll know what that signature should be. The CompilationResult ("cres") has an "entry_point" which is the key I was looking for, though "get_function" gives me a wrapped version of the LLVM implementation. I only have to invoke two underscored attributes: "_imp" to unwrap the LLVM implementation and "_defns" to clean up the namespace. The latter may be optional.

-- Jim

Jim Pivarski

Nov 7, 2017, 4:23:19 PM
to Numba Public Discussion - Public
One last thing: I had been making my custom type a subclass of numpy.ndarray (and later, tuple) because these have a __getitem__ defined in the type inference. But that also restricted the possible types to those that make sense for arrays (or tuples), which was too constraining. Besides, I should be able to control this. Here's how: define a subclass of AbstractTemplate with the "infer" decorator.

@numba.typing.templates.infer
class GetItemInterval(numba.typing.templates.AbstractTemplate):
    key = "getitem"

    def generic(self, args, kwds):
        tpe, idx = args
        if isinstance(tpe, WrappedType):
            idx = numba.typing.builtins.normalize_1d_index(idx)
            if isinstance(idx, numba.types.Integer):
                return numba.typing.templates.signature(numba.types.float64, tpe, idx)

Naturally, the constraints and corresponding signature can have any form, and there's a small set of keys defined as strings.
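
For example, a second template with the same "getitem" key could type slice indexing and return the wrapped type itself. This is just a sketch to show the shape of the pattern (I haven't tested it, and I'm assuming numba.types.SliceType is the right type to check for a slice index); a corresponding lower_builtin implementation would still be needed for the slice case, since this only covers the typing side:

@numba.typing.templates.infer
class GetItemIntervalSlice(numba.typing.templates.AbstractTemplate):
    key = "getitem"

    def generic(self, args, kwds):
        tpe, idx = args
        # hypothetical variant: a slice index returns the wrapped array type unchanged
        if isinstance(tpe, WrappedType) and isinstance(idx, numba.types.SliceType):
            return numba.typing.templates.signature(tpe, tpe, idx)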

Numba's extension capability is awesome, but underdocumented.

Jim Pivarski

Nov 8, 2017, 1:37:37 PM
to Numba Public Discussion - Public
In case anyone is following this discussion I'm having with myself and wants to do a non-trivial Numba extension, here's a complete example:


(That link should remain valid after I merge this branch into the main codebase.) This is an implementation of jagged (ragged) arrays: arrays of arrays in which the inner arrays may have any length. The implementation maintains two contiguous arrays: the "contents", which is a concatenation of all inner arrays, and the "stops", which are the indices where each inner array stops (exclusive upper bound). If you're creating this from a list of inner array lengths, the "stops" is the numpy.cumsum of those lengths. The start of inner array i is either 0 (for i == 0) or stops[i - 1]. Whenever the user wants inner array i, we produce it on the fly as a view of the larger array.
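
To make the layout concrete, here's a minimal pure-Python sketch of the same idea (an illustration only; the actual class in the linked code is more complete):

import numpy

class JaggedArray(object):
    def __init__(self, contents, stops):
        self.contents = contents   # concatenation of all inner arrays
        self.stops = stops         # exclusive stop index of each inner array

    @classmethod
    def fromlists(cls, *lists):
        stops = numpy.cumsum([len(x) for x in lists])
        contents = numpy.concatenate([numpy.asarray(x, dtype=numpy.float64) for x in lists])
        return cls(contents, stops)

    def __len__(self):
        return len(self.stops)

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("jagged array index out of range")
        start = 0 if i == 0 else self.stops[i - 1]
        return self.contents[start:self.stops[i]]   # a view, produced on the fly

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]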

This is easy to code up in Python by overloading __getitem__, __len__, and __iter__. However, none of that would be available inside a Numba JIT-compiled function. The linked implementation adds JaggedArray's custom __getitem__, __len__, and __iter__ functionality inside JIT-compiled functions as well. With it, you can do stuff like

a = JaggedArray.fromlists([1.1, 1.1, 1.1], [], [3.3, 3.3])
# a = JaggedArray(numpy.array([1.1, 1.1, 1.1, 3.3, 3.3]), numpy.array([3, 3, 5]))

@numba.njit
def test1(a, i):
    return a[i]

print(test1(a, 0), a[0])
print(test1(a, 1), a[1])
print(test1(a, 2), a[2])
print(test1(a, -1), a[-1])
print(test1(a, -2), a[-2])
print(test1(a, -3), a[-3])

try:
    test1(a, 3), a[3]
except IndexError:
    print("IndexError")

try:
    test1(a, -4), a[-4]
except IndexError:
    print("IndexError")

@numba.njit
def test2(a):
    out = 0.0
    for ai in a:
        out += ai.sum()
    return out

print(test2(a))
print(test2(a))
print(test2(a))

@numba.njit
def test3(a):
    return len(a)

print(test3(a))

and get output like

[ 1.1  1.1  1.1] [ 1.1  1.1  1.1]
[] []
[ 3.3  3.3] [ 3.3  3.3]
[ 3.3  3.3] [ 3.3  3.3]
[] []
[ 1.1  1.1  1.1] [ 1.1  1.1  1.1]
IndexError
IndexError
9.9
9.9
9.9
3

(The in-Python and in-Numba results agree.)

In the code, you'll find implementations of a JaggedArrayType with "contents" and "stops" members, getitem with an arbitrary integer index type (no slices), a "len" implementation (which was by far the easiest), and iteration with a custom iterator. JaggedArray objects can also be unboxed (passed into a JIT-compiled function) and boxed (passed out of one), but not constructed within a JIT-compiled function. I didn't implement the constructor because I wasn't sure how to handle reference counting for the contents and stops arrays when those arrays are already unboxed. Also, I won't be needing that functionality.
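
For a flavor of the type-declaration side, here's a stripped-down sketch along the lines of the linked code (the names and the exact member layout here are illustrative, not copied from it):

import numba

class JaggedArrayType(numba.types.Type):
    def __init__(self, contentstype, stopstype):
        self.contentstype = contentstype   # e.g. numba.types.float64[::1]
        self.stopstype = stopstype         # e.g. numba.types.int64[::1]
        super(JaggedArrayType, self).__init__(
            name="JaggedArray({0}, {1})".format(contentstype, stopstype))

@numba.extending.register_model(JaggedArrayType)
class JaggedArrayModel(numba.datamodel.models.StructModel):
    def __init__(self, dmm, fe_type):
        members = [("contents", fe_type.contentstype), ("stops", fe_type.stopstype)]
        super(JaggedArrayModel, self).__init__(dmm, fe_type, members)

# expose the two members as attributes inside nopython code
numba.extending.make_attribute_wrapper(JaggedArrayType, "contents", "contents")
numba.extending.make_attribute_wrapper(JaggedArrayType, "stops", "stops")

The getitem typing and lowering, the iterator, and the boxing/unboxing build on top of that; see the linked code for those pieces.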

I hope this helps anyone else who's struggling with the extension mechanism!

Cheers,
-- Jim

Naveen Michaud-Agrawal

Nov 8, 2017, 3:21:11 PM
to numba...@continuum.io
Thanks, this looks great! Could be a useful approach to dictionary-encode complicated columnar data.

Naveen


Jim Pivarski

Nov 8, 2017, 4:08:26 PM
to Numba Public Discussion - Public
I suppose what I'm trying to do is rather transparent: this is the ROOT I/O arm of a larger project to perform calculations over Object-Array-Maps (OAM in analogy with ORM). Up to now, I've been computing functions on complicated columnar data by transforming the Python AST and then passing the results to Numba:


My AST-to-AST transformation is essentially a compiler that has to do type inference. It would be more robust if I could just express my types and lowering rules for an existing compiler, such as Numba. I had some trouble with this half a year ago, but I'm starting to understand Numba's internals well enough that I think I'll be able to rewrite OAMap as a suite of Numba extensions. Then we'd be able to do things like write functions that take Apache Arrow buffers as input, treat them as nested objects in the source code, but have that compile to direct memory access (compact and usually sequential). If all works well and the Numba team is interested, it could become a submodule in Numba. (That's why I'm keeping the OAMap code separate from ROOT I/O.)

-- Jim

Stanley Seibert

Nov 8, 2017, 4:15:52 PM
to Numba Public Discussion - Public
As a former ROOT user myself, I'm excited to see this!  I'd like to figure out how to add a more tutorial-oriented section to our docs that captures what you've described here for other extension authors.


Stanley Seibert

Nov 8, 2017, 4:18:47 PM
to Numba Public Discussion - Public
And yes, we are interested in facilitating projects that build upon the Numba core (but don't necessarily have to become part of the Numba code base).  Before Numba 1.0, we'll be listing out exactly which things are considered stable for the 1.x series, and suggesting that all projects that depend on Numba fix their dependency to Numba >= 1 and Numba < 2.  Until 1.0 comes out next year, we don't expect any major breakage but can't guarantee it.

marss...@gmail.com

Nov 27, 2017, 9:06:01 AM
to Numba Public Discussion - Public
What are the general 1.0 and 2.0 visions? Will messing with fast new abstractions be supported directly in the type system?

