In the end I narrowed the problem down to something else in the program. Technically my code looked OK as is; the crash was probably caused by memory use, with a lot of C++ strings being passed by value.
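For anyone hitting something similar, here is a minimal sketch of the difference (the function names are hypothetical, not from my program). Taking a std::string by value copies it on every call, which can mean a heap allocation each time. Taking it by const reference lets the function read the caller's string without copying it:

```cpp
#include <cstddef>
#include <string>

// By value: the caller's string is copied on every call
// (a heap allocation if it is longer than the small-string buffer).
std::size_t count_spaces_by_value(std::string s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ' ') ++n;
    }
    return n;
}

// By const reference: no copy; the function reads the caller's string directly.
std::size_t count_spaces_by_ref(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ' ') ++n;
    }
    return n;
}
```

Both return the same result; the only difference is the copy. In a hot path that runs across many threads, switching read-only string parameters to const std::string& is usually the first thing to try.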
The lock only seemed to help because it slowed down the rate at which a few other components could run (as I had suspected). The program still crashed eventually; it just lasted many times longer before doing so, which long-term testing eventually ran into.
As for the buffers used by in-flight requests, I believe those were stored locally on the stack, so as far as I can tell the code actually is fully thread safe after all.
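To illustrate why stack-local buffers are safe here, a rough sketch (handle_request and run_concurrently are hypothetical stand-ins, assuming each request only touches its own local buffer and no shared state). Every call gets its own buffer on the calling thread's stack, so concurrent calls never write to the same memory:

```cpp
#include <array>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Hypothetical request handler: buffer is a local variable, so each call
// (and therefore each thread) gets its own copy on its own stack.
std::string handle_request(int request_id) {
    std::array<char, 64> buffer{};  // stack-local, never shared between threads
    std::snprintf(buffer.data(), buffer.size(), "request %d", request_id);
    return std::string(buffer.data());
}

// Runs several requests concurrently. results is sized up front and each
// thread writes only its own slot, so no lock is needed in this sketch.
std::vector<std::string> run_concurrently(int count) {
    std::vector<std::string> results(count);
    std::vector<std::thread> workers;
    for (int i = 0; i < count; ++i) {
        workers.emplace_back([&results, i] { results[i] = handle_request(i); });
    }
    for (auto& t : workers) t.join();
    return results;
}
```

The caveat is that this only holds while no pointer or reference to a stack buffer escapes to another thread; once one does, the buffer is shared and needs the same care as any other shared data.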