PEP 576 and 580: faster call protocol


Jeroen Demeyer

Jul 10, 2018, 10:16:25 AM
to Numba Public Discussion - Public
Dear Numba developers,

I want to draw your attention to PEP 576 [1] and 580 [2]. I'm the author of PEP 580. The basic problem that these try to solve is that CPython has various optimizations for calling built-in functions/methods but that user-defined classes cannot benefit from these. Those PEPs define a protocol which every class can use. Since numba is all about creating fast functions, I think that it could benefit from a faster calling protocol. The two PEPs in question are two different solutions to this problem. I wrote a comparison in [3] between the two.
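For readers less familiar with the problem, here is a minimal pure-Python sketch (my own illustration, not from the PEPs) of the two kinds of callables involved:

```python
# A built-in function: CPython dispatches calls to it through dedicated
# fast paths (e.g. METH_FASTCALL) in the bytecode interpreter.
print(type(abs).__name__)        # builtin_function_or_method

# A user-defined callable class: calls go through the generic tp_call
# slot, so none of those fast paths apply. PEP 576/580 aim to let such
# classes opt in to the fast calling protocol.
class Scale:
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, x):
        return x * self.factor

double = Scale(2)
assert double(21) == 42
assert type(double).__name__ == "Scale"   # not a builtin_function_or_method
```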

So my question to you is: does this look useful for Numba? If so, do you prefer PEP 576 or PEP 580?


Jeroen.


[1] https://www.python.org/dev/peps/pep-0576/
[2] https://www.python.org/dev/peps/pep-0580/
[3] https://mail.python.org/pipermail/python-dev/2018-July/154238.html

Jim Pivarski

Jul 10, 2018, 10:19:30 AM
to numba...@continuum.io
If you haven't tried it yet, the numba/numba-dev Gitter separates messages for developers from those for users (numba/numba). It could be worth repeating this message there.




--
You received this message because you are subscribed to the Google Groups "Numba Public Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to numba-users...@continuum.io.
To post to this group, send email to numba...@continuum.io.
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/numba-users/563f34f1-07b4-493d-95ff-fb703e1b8676%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.

Jim Pivarski

Jul 10, 2018, 10:20:57 AM
to numba...@continuum.io
Sorry for the broadcast! That was intended for Jeroen.

Siu Kwan Lam

Jul 11, 2018, 12:26:15 PM
to numba...@continuum.io
Hi Jeroen, 

Thanks for pointing out these two important PEPs.  After reading them, I am still unsure what their impact on Numba would be, as I am rather weak in this part of the CPython internals.  On the other hand, I can describe what Numba is doing.  Jitted functions in Numba are wrapped into a builtin_function_or_method type by using PyCFunction_NewEx (see https://github.com/numba/numba/blob/52ff5bff6199e0cdfff28a6cd4632e1646212ff3/numba/_dynfunc.c#L295).  The function object is then bound to an instance of ClosureObject, which holds context info for the function.  This leads to my first question: Is the builtin_function_or_method object already getting the benefit from the fastcall?
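For illustration (my sketch, not Numba code): builtin_function_or_method is the same type used by built-in functions and bound built-in methods, with the bound object stored in __self__ — which, by analogy, is the role the ClosureObject plays in Numba's PyCFunction_NewEx wrapping:

```python
import math

# A built-in function is a builtin_function_or_method.
assert type(math.sqrt).__name__ == "builtin_function_or_method"

# A bound built-in method is the same type, carrying its instance
# in __self__ -- analogous to binding a C function pointer to a
# context-holding object.
lst = [1, 2]
append = lst.append
assert type(append).__name__ == "builtin_function_or_method"
assert append.__self__ is lst
```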

A @jit-decorated function can be overloaded to emulate duck typing: it is turned into a collection of builtin_function_or_method objects, each representing a different type signature.  The dispatcher finds the appropriate overload and calls the corresponding builtin_function_or_method object.  This leads to my second question: With these PEPs, can I speed up the Dispatcher type by using the fastcall convention?
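The dispatch scheme above can be sketched in plain Python (all names here are hypothetical; the real Dispatcher is a C extension type):

```python
class ToyDispatcher:
    """Hypothetical sketch of type-based overload dispatch; the real
    Numba Dispatcher is implemented in C and matches Numba type
    signatures, not Python types."""

    def __init__(self):
        self._overloads = {}  # signature (tuple of types) -> callable

    def add_overload(self, sig, func):
        self._overloads[sig] = func

    def __call__(self, *args):
        # Find the overload matching the argument types and call it.
        sig = tuple(type(a) for a in args)
        try:
            func = self._overloads[sig]
        except KeyError:
            raise TypeError(f"no overload for signature {sig}")
        return func(*args)

add = ToyDispatcher()
add.add_overload((int, int), lambda a, b: a + b)
add.add_overload((str, str), lambda a, b: a + b)
assert add(1, 2) == 3
assert add("a", "b") == "ab"
```

Every call of `add` pays the generic tp_call overhead at the CPython boundary; that entry cost is what a faster calling protocol would reduce.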

Thanks,
Siu

--
Siu Kwan Lam
Software Engineer
Anaconda, Inc

Jeroen Demeyer

Jul 11, 2018, 3:22:15 PM
to numba...@continuum.io
On 2018-07-11 18:26, Siu Kwan Lam wrote:
> This leads
> to my first question: Is the builtin_function_or_method object already
> getting the benefit from the fastcall?

It depends on how it is called. There is currently no public API for
calling a function using the FASTCALL convention; there are only private
functions like _PyCFunction_FastCallKeywords(). Internally, those are
used by the CPython bytecode interpreter.

> The dispatcher finds the appropriate overload version
> and calls the corresponding builtin_function_or_method objects.

Are those builtin_function_or_method objects exposed to the outside? I'm
just wondering why you use that instead of bare C function pointers.

> This
> leads to my second question: With these PEPs, can I speed up the
> Dispatcher type by using the fastcall convention?

Yes, that's exactly the point of those PEPs.

Does numba internally optimize those calls, say from one @jit function
to another @jit function?


Jeroen.

Siu Kwan Lam

Jul 12, 2018, 9:56:51 AM
to Numba Public Discussion - Public


On Wednesday, July 11, 2018 at 2:22:15 PM UTC-5, Jeroen Demeyer wrote:
On 2018-07-11 18:26, Siu Kwan Lam wrote:
> This leads
> to my first question: Is the builtin_function_or_method object already
> getting the benefit from the fastcall?

It depends on how it is called. There is currently no public API for
calling a function using the FASTCALL convention; there are only private
functions like _PyCFunction_FastCallKeywords(). Internally, those are
used by the CPython bytecode interpreter.

> The dispatcher finds the appropriate overload version
> and calls the corresponding builtin_function_or_method objects.

Are those builtin_function_or_method objects exposed to the outside? I'm
just wondering why you use that instead of bare C function pointers.

 
builtin_function_or_method objects are exposed to the interpreter.  Inside compiled code, Numba knows the exact function pointer.  It sounds like we are already getting some benefit by wrapping the function pointer as a builtin_function_or_method.

 
> This
> leads to my second question: With these PEPs, can I speed up the
> Dispatcher type by using the fastcall convention?

Yes, that's exactly the point of those PEPs.

Great! We have quite a bit of overhead at the transition from the CPython interpreter into Numba-compiled code.  It sounds like these PEPs can help.  However, I am not sure how the two PEPs differ.  To make the question concrete: if I were to rewrite the Dispatcher as a new callable extension type leveraging the fastcall, what is the main difference between the two PEPs?


Does numba internally optimize those calls, say from one @jit function
to another @jit function?

As mentioned above, a @jit function calling another @jit function does not involve builtin_function_or_method; in fact, it doesn't involve CPython at all.  There is one exception: object mode, which is basically just unrolling the interpreter loop.  With these PEPs, our object mode will likely get relatively slower unless we also apply the same changes.  But with PEP 523, we may just drop object mode in the future.
 


Jeroen.

Jeroen Demeyer

Jul 12, 2018, 11:23:06 AM
to numba...@continuum.io
On 2018-07-12 15:56, Siu Kwan Lam wrote:
> However, I am not sure how these PEPs are different.

I'm the author of PEP 580, so my answer is probably biased.

PEP 576 is simpler and less ambitious. IMHO, it might actually cause a
performance regression in some cases (but there is no reference
implementation yet, so I cannot say that for sure).

PEP 580, on the other hand, was designed for performance. Calls will
become as fast as calls of builtin_function_or_method. This is
especially important if you want to support methods (i.e. __get__) and
not only functions. It is also extensible, allowing future
optimizations. One thing that we (the Cython developers and I) are
thinking about is allowing calls with native C types, for example a C
long instead of a Python int wrapping that C long.
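As a rough illustration of the boxing overhead that idea targets (my own sketch, unrelated to any PEP reference implementation): ctypes already lets you declare a native C signature, but every call still converts Python objects to C values and back at the boundary:

```python
import ctypes
import ctypes.util

# Load the C math library; fall back to the process-wide namespace if
# find_library fails (platform-dependent lookup).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the native signature: double sqrt(double).
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

# The C function receives a raw double, but each Python-level call
# still boxes the argument and result as Python float objects --
# exactly the conversion a native-typed calling protocol could skip.
assert libm.sqrt(9.0) == 3.0
```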


Jeroen.

Siu Kwan Lam

Jul 12, 2018, 11:53:27 AM
to Numba Public Discussion - Public
We care about methods (for @jitclass).  Calling with native C types is also interesting: Numba has to do a lot of that manually, since Numba-compiled code always deals with native machine types.

In a broader sense, it sounds like PEP 580 should also apply to ctypes and cffi.  I would prefer PEP 580 since it gives more extensibility for the future.

Btw, can you clarify:

Calls will become as fast as calls of builtin_function_or_method.

Do you mean that any function call will become as fast ...?

Jeroen Demeyer

Jul 13, 2018, 5:32:57 AM
to numba...@continuum.io
On 2018-07-12 17:53, Siu Kwan Lam wrote:
> Calling with native C types is
> also interesting.

That's not part of the PEP, just an idea for future work. But it would
need PEP 580 as a base.

> Btw, can you clarify:
>
> Calls will become as fast as calls of builtin_function_or_method.
>
>
> Do you mean any function calls will become as fast .... ?

I'll try to be very precise:

Say C is a custom callable class which implements PEP 580 and f = C(...)
is an instance of it. Then the Python overhead of the function call
f(...) is reduced to the Python overhead of instances of
builtin_function_or_method. This also applies to bound and unbound
method calls.

To say it in a different way: all the existing optimizations for
built-in functions and methods become available to custom classes
implemented in C.
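A small illustration of the bound-method case (my sketch, not from the PEP): built-in methods are already bound through the descriptor protocol (__get__), which is what PEP 580 would open up to custom C-implemented callables:

```python
# An unbound built-in method is a method_descriptor; its __get__
# produces a builtin_function_or_method bound to the instance.
assert type(str.upper).__name__ == "method_descriptor"

s = "hello"
bound = str.upper.__get__(s)    # bind via the descriptor protocol
assert type(bound).__name__ == "builtin_function_or_method"
assert bound() == "HELLO"
```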

Siu Kwan Lam

Jul 13, 2018, 9:30:51 AM
to numba...@continuum.io
Thanks for the clarification.  
