To help frame the current discussion and pull requests relating to (mostly) PP loading performance, I'd like to get an idea of the size of performance improvement possible. Establishing upper bounds on performance, within certain assumptions, is a useful piece in that puzzle.
Along those lines I've written some simple code to emulate the process of creating a 2D Cube once you know which PP rules are relevant. I've then timed this code with the current master vs. a "maximum-speed" branch. The
maximum-speed branch removes things like validity checking that offer no benefit in this controlled environment. Using the "master" branch, 2D Cubes are created at a rate of 1200 per second, whereas the "maximum-speed" branch achieves 11000 per second.
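For reference, the kind of measurement I'm doing looks roughly like the sketch below. Note that `build_cube` here is a hypothetical stand-in (the real code constructs a 2D Cube from pre-resolved PP rules); only the timing approach is the point.

```python
import timeit

# Hypothetical stand-in for the real cube construction - in the actual
# benchmark this builds a 2D Cube once the relevant PP rules are known.
class Cube:
    def __init__(self, data, coords):
        self.data = data
        self.coords = coords

def build_cube():
    return Cube(data=None, coords=[("latitude", 0), ("longitude", 1)])

n = 10000
elapsed = timeit.timeit(build_cube, number=n)
print("%.0f cubes per second" % (n / elapsed))
```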
For the "maximum-speed" branch simple statistical profiling suggests that almost all the elapsed time is spent just creating objects. So further speed-ups are expected to require a reduction in the number of objects created. This might be possible by simplifying any overly complex private data structures and/or identifying where instances can be shared.
It remains to be seen how much of the validation code, etc. can be bypassed in a real implementation without compromising the robustness of the file conversion process and the design of the public API.
And while some of this is still fresh in my mind, a few observations:
- Creating Unit instances is slow.
- Using ABCMeta makes instance creation slow.
- Making a NumPy array is slow. And `p = np.array([v])` is slower than `p = np.empty(1); p[0] = v`
- _OrderedHashable is slow.
- `__slots__` doesn't make that much difference to speed (although I've not looked at the improvement in memory usage).
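A couple of these points are easy to check with a quick micro-benchmark. The sketch below (classes are made up for timing purposes, not taken from the real code base; absolute numbers will vary by machine and library version) compares plain vs. ABCMeta instance creation and the two array-creation patterns:

```python
import abc
import timeit

import numpy as np

# Hypothetical minimal classes, just for timing instance creation.
class Plain:
    def __init__(self, v):
        self.v = v

class WithABC(metaclass=abc.ABCMeta):
    def __init__(self, v):
        self.v = v

v = 1.5

def with_array():
    # Wrapping a scalar in a list and calling np.array incurs
    # sequence-inspection overhead.
    return np.array([v])

def with_empty():
    # Allocate first, then assign - avoids inspecting a Python list.
    p = np.empty(1)
    p[0] = v
    return p

n = 100000
print("plain class:  %.3fs" % timeit.timeit(lambda: Plain(v), number=n))
print("ABCMeta:      %.3fs" % timeit.timeit(lambda: WithABC(v), number=n))
print("np.array:     %.3fs" % timeit.timeit(with_array, number=n))
print("empty+assign: %.3fs" % timeit.timeit(with_empty, number=n))
```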