An update on this: I've narrowed down the problem. In pyaudio (which is a wrapper around portaudio), a C callback is called on a separate thread. This callback needs raw audio data from Python, so it calls into a registered Python callback. In the C wrapper, the GIL must be acquired before making any Py*-style calls. So in the C code it looks like this:
PyGILState_STATE gstate = PyGILState_Ensure();
// do stuff using Py* API, including calling the PyObject* callback
PyGILState_Release(gstate);
So far so good. I put timers around PyGILState_Ensure(). The difference between Kivy 1.8 and Kivy 1.9 is dramatic.
In 1.8, acquiring the GIL takes maybe 10 to 100 microseconds.
In 1.9, it ALWAYS takes at least 2000 microseconds, and often takes 13,000 or more when the graphics on screen are updating.
Obviously, this is a huge problem for an audio callback that is supposed to get audio data quickly.
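For anyone who wants to reproduce the measurement, here is a minimal sketch of one way to time the call. It assumes a POSIX system where clock_gettime(CLOCK_MONOTONIC, ...) is available. The helper name elapsed_us is hypothetical and is not part of pyaudio, portaudio, or CPython. The PyGILState_Ensure() call appears only in a comment so that the helper compiles without Python.h.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not from pyaudio or CPython): returns the number of
 * whole microseconds between two clock readings, 'start' and 'end'. */
static int64_t elapsed_us(const struct timespec *start, const struct timespec *end)
{
    /* Work in nanoseconds first so a negative tv_nsec difference is handled
     * correctly when 'end' crosses a second boundary. */
    int64_t sec  = (int64_t)end->tv_sec  - (int64_t)start->tv_sec;
    int64_t nsec = (int64_t)end->tv_nsec - (int64_t)start->tv_nsec;
    int64_t total_ns = sec * 1000000000LL + nsec;
    return total_ns / 1000;
}

/* Assumed usage inside the C audio callback (requires Python.h):
 *
 *     struct timespec t0, t1;
 *     clock_gettime(CLOCK_MONOTONIC, &t0);
 *     PyGILState_STATE gstate = PyGILState_Ensure();
 *     clock_gettime(CLOCK_MONOTONIC, &t1);
 *     int64_t wait_us = elapsed_us(&t0, &t1);   // time spent waiting for the GIL
 *     // ... Py* calls, then PyGILState_Release(gstate);
 */
```

A monotonic clock is used here rather than wall-clock time so that system clock adjustments cannot distort the interval. Printing from inside the audio callback would itself add latency, so it is probably better to store wait_us somewhere and report it later.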
So my question is: why does acquiring the GIL take so long in Kivy 1.9 vs 1.8 when updating the graphics?