I'd have liked to use structs, but Cython structs cannot hold Python objects. C arrays would be beyond the reach of the runtime system and would require manual memory management, so there is no point in going that route. The way I envisioned the series was to write a Python tutorial, and then demo the Spiral code, which has a similarly high level of expressiveness but greatly improved performance. Since the performance is not there without writing C-style code, there is no point in doing that.
Actually, I am surprised that inlining all the primitives did not improve the performance much. Even with the heap allocation that is already being done, I'd have expected it to be far faster than Python. There is nothing I can do about the Cython-generated code, but I might rewrite it in C++ while keeping the expected allocation profile, just to see how much the unexpected factors impact performance before starting work on that backend.