Discrepancy in generated code of unsafe_arena_release_foo

50 views
Skip to first unread message

King Kong

unread,
May 5, 2020, 3:44:32 PM5/5/20
to Protocol Buffers

I am using unsafe_arena_set_allocated_* and unsafe_arena_release_* APIs to avoid
unnecessary allocations when the sub-objects are only used in a function scope,
unsafe_arena_release_* is always called before leaving the scope. At the beginning,

I chose to always use the arena versions of unsafe APIs because the non-arena version 

unsafe_release does extra allocation when the object is allocated on arena, and the arena

version does not check if the object is allocated in arena, based on generated code like 

the following:


inline foo* outter1::unsafe_arena_release_foo() {
foo* temp = foo_;
foo_ = nullptr;
return temp;
}


However, I later also saw the following the generated code with arena check, it means it is not okay to call for heap-allocated objects.

inline void outter2::unsafe_arena_set_allocated_bar(
std::string* bar) {
GOOGLE_DCHECK(GetArenaNoVirtual() != nullptr);
... ...
}


So, why the discrepancy?


The official documentation at https://developers.google.com/protocol-buffers/docs/reference/arenas says the correct set of APIs should be used

according to how to the object is allocated, but having two different sets of unsafe APIs makes it harder to use, and it makes the code less robust to future changes. I am wondering if protobuf owners also realize such an issue and start to remove arena checking gradually? It that the case? I would like to hear how others think.


Thanks.

Josh Haberman

unread,
May 5, 2020, 5:25:05 PM5/5/20
to Protocol Buffers
The documentation for unsafe_arena_release_foo() says:

> This method should only be used when the parent message is on the arena and the submessage is on the same arena, or an arena with equivalent lifetime.

The DCHECK() you highlighted is checking the first part of this requirement.

Stack/heap-allocated messages must always uniquely own their sub-messages, which must always be on the heap. Protobuf doesn't support the use case you are describing, where a message temporarily points to a message it doesn't own. If we allowed that, protobuf itself would crash if a message in this state was ever destroyed.

If you are trying to avoid heap allocation overhead, the recommended solution is arenas.

King Kong

unread,
May 5, 2020, 7:20:33 PM5/5/20
to Protocol Buffers
Thanks for your reply.

Yes, I have been using arena, but I wanted to do better than batching heap allocation, when possible. The "scoped leasing" use case I described above achieve 0 heap allocation all the time. I guess my code could use GetArena() at runtime to decide how the "leasing" object is created and which set of set&release APIs to use.

King Kong

unread,
May 5, 2020, 8:09:01 PM5/5/20
to Protocol Buffers
unsafe_arena_set_allcoated* and unsafe_arena_release* API are "unsafe" APIs, callers should know what they are doing.
Why not remove arena checking in all generated unsafe_arena_release*  code? This will make unsafe_arena* APIs work for both arena-allocated and heap allocated objects. It seems a straightforward change to me, arena checking in unsafe_arena_release*  does not always happen anyway.

King Kong

unread,
May 5, 2020, 9:36:39 PM5/5/20
to Protocol Buffers
I could be wrong, but it seems to arena checking is only done for std::string in generated unsafe_arena_* code. I wonder if it is because google internal version of protobuf has arena support for std::string, thus the special treatment.

Josh Haberman

unread,
May 5, 2020, 10:02:39 PM5/5/20
to King Kong, Protocol Buffers
On Tue, May 5, 2020 at 4:20 PM King Kong <lei....@gmail.com> wrote:
Thanks for your reply.

Yes, I have been using arena, but I wanted to do better than batching heap allocation, when possible. The "scoped leasing" use case I described above achieve 0 heap allocation all the time. I guess my code could use GetArena() at runtime to decide how the "leasing" object is created and which set of set&release APIs to use.

If you want to avoid any heap allocation, you could give the arena an initial block of memory from the stack.

If you are trying to optimize, keep in mind that most of the savings from arenas come at deallocation time, where arena-allocated protos can simply skip the destructor. If you are using heap-allocated protos, the destructor has to check every message field to see if it needs to be deleted.

Arenas have given such performance benefits that we are focused on that use case. If we loosen the checks in unsafe_arena functions, it could force us to compromise later to keep supporting the heap-allocated case. I don't think we want to loosen these restrictions.
 
--
You received this message because you are subscribed to a topic in the Google Groups "Protocol Buffers" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/protobuf/C26jc_jrCJQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to protobuf+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/protobuf/bfa10586-05af-430c-b809-5e4c65d43b5c%40googlegroups.com.

King Kong

unread,
May 5, 2020, 11:33:21 PM5/5/20
to Protocol Buffers
As a related issue, I remember reading some comments in protobuf source code that says google internal protobuf supports arena allocation that's not portable (probably assumes certain std::string implementation), thus it is not open sourced, what's the plan for string arena allocation support? If I don't care about portability, is it possible to get the non-portable solution somehow?

On Tuesday, May 5, 2020 at 7:02:39 PM UTC-7, Josh Haberman wrote:
To unsubscribe from this group and all its topics, send an email to prot...@googlegroups.com.

King Kong

unread,
May 7, 2020, 2:10:40 PM5/7/20
to Protocol Buffers
I guess my last post wasn't clear enough, so here is the clarification.

With open source version of protobuf, std::string cannot be allocated in arena, so set allocated and release APIs are necessary for temporarily sharing std::string pointers, according to official documentation and generated code, different APIs are needed (unsafe_arena_* vs non-arena), depending on how the "borrowing" object is allocated, this leads to fragile code. Allowing using one set of APIs in both cases seems a better alternative.

If the recommendation is not to use std::string, what's the replacement?
Thanks.

Josh Haberman

unread,
May 7, 2020, 2:18:49 PM5/7/20
to King Kong, Protocol Buffers
The unsafe_arena_* functions for strings are deprecated:

> When arenas are enabled, string and bytes fields generate unsafe_arena_release_field() and unsafe_arena_set_allocated_field() methods. Note that these methods are deprecated and will be removed in a future release. Those methods were added by mistake and offer no performance advantage over their safe equivalents.


We do not have any supported interface for borrowing a std::string and releasing it later. Our goal in this area is to reduce (and eventually remove) the use of std::string in generated APIs in favor of string_view, which could potentially support an accessor like:

// The "foo" field will point to the given string data, which must outlive the message.
void FooMessage::set_alias_foo(string_view str) {}

To unsubscribe from this group and all its topics, send an email to protobuf+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/protobuf/64726032-e17f-4c21-aa49-9273940d901b%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages