I spent several hours over the last couple of days thinking about performance and decided to make the leap and accept some extra complexity in exchange for speed. Along those lines, last night I implemented more optimised routines for reading bytes, shorts, ints, chars, and coordinates.
As a result we've gained a 20% speed increase (on both CPython and PyPy), as witnessed by the before and after. Of course your mileage may vary, but I am now reading 59 MLG Dallas winner's bracket replays in 12.5 seconds on CPython and 6.5 seconds on PyPy. A note on PyPy: it takes a few replays to get warmed up, so for long-running processes the performance gain is even higher.
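To give a rough idea of what a specialised read routine can look like, here is a minimal sketch. It is not the actual code from the branch: the `ByteReader` class, its method names, and the little-endian byte order are all assumptions made for illustration. The idea shown is simply to precompile the `struct` formats once and to read single bytes by direct indexing, rather than going through a generic read path on every call.

```python
import struct


class ByteReader:
    """Hypothetical reader; names are illustrative, not the library's API."""

    # Precompiled formats: the format string is parsed once, not per read.
    # Little-endian is an assumption made for this sketch.
    _SHORT = struct.Struct("<H")
    _INT = struct.Struct("<I")

    def __init__(self, data):
        self.data = data  # a bytes object
        self.pos = 0      # current read offset

    def read_byte(self):
        # Indexing a bytes object yields an int directly; no struct call needed.
        value = self.data[self.pos]
        self.pos += 1
        return value

    def read_short(self):
        # unpack_from reads at an offset without slicing a temporary copy.
        (value,) = self._SHORT.unpack_from(self.data, self.pos)
        self.pos += 2
        return value

    def read_int(self):
        (value,) = self._INT.unpack_from(self.data, self.pos)
        self.pos += 4
        return value
```

Each method does one narrow job, which costs some duplication but keeps the hot path short. Whether this particular shape helps on a given workload is something to measure rather than assume.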
If you are currently working with the new_data branch, please try it out and file any bugs that you find. I would be surprised if I managed the conversion without missing an edge case or two.
~Graylin