Potential Python memory leak

Viktor Vorobev

Nov 3, 2025, 7:59:05 AM
to Protocol Buffers
Hello! I think I've stumbled across an issue similar to this one:
https://github.com/protocolbuffers/protobuf/issues/10088

But I'm not entirely sure, so seeking help.
I've forked and hacked together a little demo repo: https://github.com/viktorvorobev/proto_leak

But basically the situation is as follows.
If you create a proto object and then call `ParseFromString` on it many times, the object's size keeps growing without bound until the object is deleted:

obj = schema_pb2.value_test_topic()
for _ in range(10_000_000):
    # here the object will grow
    obj.ParseFromString(b"...")
del obj

But if you recreate an object every time, then everything seems to be fine, so this code doesn't leak:

for _ in range(10_000_000):
    obj = schema_pb2.value_test_topic()
    obj.ParseFromString(b"...")

I suspect the same behaviour for Unpack, but didn't test it myself.

Is this known or intended?

Thanks a lot!

Em Rauch

Nov 3, 2025, 8:21:46 AM
to Viktor Vorobev, Protocol Buffers
The behavior described is working as intended when Python Protobuf is backed by upb.

The reason is that upb manages memory with an arena model (as described at https://en.wikipedia.org/wiki/Region-based_memory_management). Under this model there is no bookkeeping for individual allocations. Instead there is one pool that you can only append to, and memory is freed only when the entire pool is released, because nothing tracks which fine-grained allocations are still live. This lowers allocation and deallocation overhead and reduces the memory spent on bookkeeping, since everything lives in a single blob of memory that is much cheaper to drop.

In the upb model, each new top-level message holds such a pool, so anything added to it has to stay live until that message is released.
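To make the append-only behavior concrete, here is a toy model in plain Python. It is only an illustration of the idea, not upb's actual implementation: `ToyArena`, `ToyMessage`, and both helper functions are hypothetical names invented for this sketch.

```python
class ToyArena:
    """Append-only pool: individual allocations are never freed."""

    def __init__(self):
        self._pool = bytearray()

    def alloc(self, data):
        # Append the bytes and hand back an (offset, length) reference.
        offset = len(self._pool)
        self._pool += data
        return offset, len(data)

    def read(self, ref):
        offset, length = ref
        return bytes(self._pool[offset:offset + length])

    def size(self):
        return len(self._pool)


class ToyMessage:
    """Hypothetical stand-in for a top-level message that owns one arena."""

    def __init__(self):
        self._arena = ToyArena()
        self._payload = None

    def parse(self, data):
        # The previous payload becomes unreachable, but its bytes stay in
        # the pool until the whole message (and its arena) is dropped.
        self._payload = self._arena.alloc(data)

    def payload(self):
        return self._arena.read(self._payload)

    def arena_size(self):
        return self._arena.size()


def reuse_one_message(n, data):
    # One long-lived message: every parse appends to the same arena.
    msg = ToyMessage()
    for _ in range(n):
        msg.parse(data)
    return msg.arena_size()


def fresh_message_each_time(n, data):
    # A new message per parse: each old arena is released as a whole.
    msg = None
    for _ in range(n):
        msg = ToyMessage()
        msg.parse(data)
    return msg.arena_size()


reused = reuse_one_message(1000, b"x" * 16)
fresh = fresh_message_each_time(1000, b"x" * 16)
print(reused, fresh)
```

In this toy the reused message ends up holding 16000 bytes while the fresh one holds 16, even though only 16 bytes are reachable in both cases.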

This is a known tradeoff: for the expected use cases of Protobuf, where you have a lot of request-scoped messages (and some number of permanent immutable constants), it is faster and uses less memory. The unavoidable downside is that if you do have a long-lived mutable object that keeps making allocating modifications, the memory won't be released until you finally release that one object.

In a pinch, if you really need a long-lived, constantly allocating message, you can use CopyFrom() into a 'fresh' parent to basically reset it, so that what you have in the arena is exactly the data that is actually reachable at that moment, at the cost of one deep copy of whatever that state is.
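A minimal sketch of that workaround might look like the following. It assumes `msg` is a generated Python message whose class can be constructed with no arguments; `compact`, `parse_many`, and the `compact_every` threshold are hypothetical names chosen for illustration, and only `CopyFrom()` and `ParseFromString()` come from the message API.

```python
def compact(msg):
    """Return a copy of msg held by a fresh top-level message."""
    fresh = type(msg)()   # new top-level message of the same type
    fresh.CopyFrom(msg)   # one deep copy of the currently reachable state
    return fresh


def parse_many(msg, payloads, compact_every=1000):
    """Parse each payload into msg, swapping in a compacted copy periodically.

    compact_every is an arbitrary illustrative value, not a recommendation.
    """
    for i, payload in enumerate(payloads, start=1):
        msg.ParseFromString(payload)
        if i % compact_every == 0:
            # Drop the reference to the old message so its memory can go.
            msg = compact(msg)
    return msg
```

The caller has to keep using the returned object and drop every reference to the old one; otherwise the old message stays alive. If nothing from the previous parse needs to survive, simply creating a new message per parse, as in the second snippet of the original post, avoids the copy entirely.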

Viktor Vorobev

Nov 3, 2025, 8:30:23 AM
to Em Rauch, Protocol Buffers
Great, now I see, thank you very much!
It would be good to have this in the docs/examples as well, since it feels counterintuitive that ParseFromString allocates more and more memory, while its name suggests that it shouldn't.

Thanks again!

Em Rauch

Nov 3, 2025, 8:35:02 AM
to Viktor Vorobev, Protocol Buffers
Would you mind opening a GitHub issue about whether we could make cases like ParseFromString() avoid this behavior?

It might be that this was already ruled out for some reason, but it does seem to me that operations which effectively clear out the entire message (including ParseFromString(), CopyFrom(), etc.) should be able to reset the arena to a fresh one automatically. That would reduce the surface area in which the unbounded-growth case you identified is reachable.

Viktor Vorobev

Nov 3, 2025, 8:44:31 AM
to Em Rauch, Protocol Buffers
Sure, will do, and I will link this discussion there, just in case.

Viktor Vorobev

Nov 3, 2025, 9:55:27 AM
to Protocol Buffers
Here is the link to the created issue, for future readers: https://github.com/protocolbuffers/protobuf/issues/24257

(I just hate to leave such discussions without a trace)
