Do you know why Cython does not support vectorcall in this case? If the reason is simply that no one has written the code, I may give it a try.
The one complication with vectorcall is that the function pointer is stored per-instance (at the offset given by `tp_vectorcall_offset`) rather than in a type slot. That's a bit different to Cython's normal code-generation for class slots, so it may be complicated to implement. The `tp_call` slot also has a fixed non-vectorcall signature, so you'll need to account for that (although `PyVectorcall_Call` should handle it fairly simply, I think).
None of this is impossible, so you're welcome to give it a go. As you've seen, it can provide a reasonable amount of optimization, so it'd be a useful addition. But I do think it may be harder than it sounds.