sync.Pool wrapper with stats

Diego Augusto Molina

Oct 28, 2024, 3:46:43 AM
to golang-nuts
Hi, everyone! Thank you for reading. I wanted to share a pet project I started a few weeks ago in my free time: a wrapper around sync.Pool that keeps a limited set of online statistics about the memory cost of items (arbitrarily defined by the user), for two reasons: (1) to decide whether an item belongs in the pool in the first place; (2) to preallocate items with a memory cost that is less likely to need further allocation.
sync.Pool is amazing, and one thing that is just as important to know is how to use it, since keeping a few big allocations in the pool might prove counterproductive. In the fmt package there's a canonical example of proper usage, where small buffers are put into a sync.Pool for (potential) reuse if they are up to a certain length, and otherwise dropped for the GC.
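
For reference, that pattern looks roughly like this (a minimal sketch with an arbitrary 64 KiB cutoff; fmt applies the same idea to its own print buffers with its own limit):

package sketch

import (
	"bytes"
	"sync"
)

// An arbitrary cutoff for this sketch; fmt uses its own internal limit.
const maxPooledCap = 64 << 10 // 64 KiB

var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// putBuffer returns a buffer to the pool only if it stayed small;
// oversized buffers are dropped for the GC so the pool doesn't pin
// a few huge allocations forever.
func putBuffer(b *bytes.Buffer) {
	if b.Cap() > maxPooledCap {
		return // too big: let the GC reclaim it
	}
	b.Reset()
	bufPool.Put(b)
}
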
When writing client/server code I found it wasn't easy to pick the right cutoff, and partitioning the problem by endpoint and a few other dimensions was a good start. But I still needed to know those dimensions beforehand, and sometimes I got MiB-sized payloads for some endpoints and <1 KiB for others, so I had to probe all of them, write hardcoded constants for each, etc.
So I thought it would be useful to gather some statistics on the buffers and structures I was using, to see where the sizes actually landed, and it turned out that in most cases I got... a Normal Distribution-like set of sizes.
So I wrapped all that, tried to improve the abstraction, and packaged it under a fancy name that is probably too much for it: AdaptivePool. See the code here: https://github.com/diegommm/adaptivepool
I found it too much fun to work on and didn't bother to check first whether something like this already existed, so please let me know if you know of something out there that is proven (still worth the fun, though).

The principle is that when you Put an item into the pool, it feeds an online stats algorithm that computes the mean and standard deviation of item costs. It then checks whether the size is within Mean +/- Threshold * StdDev and puts the item in the pool if so, otherwise it just drops it on the floor. And when you Get something from the pool and the pool is empty, it creates an item of size Mean + Threshold * StdDev for you.
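
To make that concrete, here is a minimal sketch of the idea, using Welford's online algorithm for the running mean and variance (the names and the threshold value here are illustrative, not the actual code in the repo):

package sketch

import "math"

// normalStats keeps a running mean and standard deviation using
// Welford's online algorithm: O(1) per observation, no history kept.
type normalStats struct {
	n    float64
	mean float64
	m2   float64 // running sum of squared deviations from the mean
}

func (s *normalStats) observe(cost float64) {
	s.n++
	delta := cost - s.mean
	s.mean += delta / s.n
	s.m2 += delta * (cost - s.mean)
}

func (s *normalStats) stdDev() float64 {
	if s.n < 2 {
		return 0
	}
	return math.Sqrt(s.m2 / (s.n - 1)) // sample standard deviation
}

const threshold = 2.0 // e.g. accept items within two standard deviations

// accept reports whether an item of this cost should go back in the pool.
func (s *normalStats) accept(cost float64) bool {
	dev := threshold * s.stdDev()
	return cost >= s.mean-dev && cost <= s.mean+dev
}

// suggestedCost is the cost to preallocate when the pool is empty:
// mean + threshold*stddev, so new items rarely need to grow further.
func (s *normalStats) suggestedCost() float64 {
	return s.mean + threshold*s.stdDev()
}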

I then added a lot of other things: I decoupled the Normal Distribution logic and put it behind an Estimator interface (so you don't depend on the Normal Distribution and can write your own). This interface decides whether an item should be accepted, and suggests the size of new items. I then decoupled the concept of "byte size" into a more general "cost", and created an ItemProvider interface that creates items of a given type, measures their cost, and clears them before reuse.
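
Roughly, those two abstractions have this shape (a simplified sketch; the names and signatures here are illustrative and the definitions in the repo may differ):

package sketch

// Estimator consumes observed item costs and decides pool membership.
type Estimator interface {
	// Observe feeds the cost of an item being put back into the pool.
	Observe(cost float64)
	// Accept reports whether an item of this cost should be pooled.
	Accept(cost float64) bool
	// SuggestedCost is the cost to target when creating a new item.
	SuggestedCost() float64
}

// ItemProvider creates, measures, and clears items of type T.
type ItemProvider[T any] interface {
	// New creates an item sized for roughly the given cost.
	New(cost float64) T
	// Cost measures an item, e.g. a buffer's capacity in bytes.
	Cost(item T) float64
	// Reset clears an item before it is handed out for reuse.
	Reset(item T)
}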

Finally, I added something for my HTTP server to make it easier to buffer the bodies of some endpoints: a ReaderBufferer (which can also buffer ReadClosers and call their Close method). I would not use that for large or otherwise streaming payloads, of course.
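
The idea is roughly this (a self-contained sketch where a plain sync.Pool stands in for the adaptive pool; bufferBody and the pool here are illustrative helpers, not the actual ReaderBufferer API):

package sketch

import (
	"bytes"
	"io"
	"sync"
)

// A plain sync.Pool stands in for the adaptive pool in this sketch.
var bodyBufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// bufferBody drains r into a pooled buffer and, if r is also an
// io.Closer (like an http.Response.Body), closes it. The caller is
// expected to return the buffer to the pool when done with it.
func bufferBody(r io.Reader) (*bytes.Buffer, error) {
	buf := bodyBufPool.Get().(*bytes.Buffer)
	buf.Reset()
	if _, err := io.Copy(buf, r); err != nil {
		bodyBufPool.Put(buf)
		return nil, err
	}
	if c, ok := r.(io.Closer); ok {
		if err := c.Close(); err != nil {
			bodyBufPool.Put(buf)
			return nil, err
		}
	}
	return buf, nil
}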

So well, that was the pet project. I would love some feedback; I know I could be making many mistakes and wrong assumptions, so I'm open to learning. Thank you!

P.S.: BTW, I added some benchmarks and put the results in the commit messages.