Status of string_view, repeated Cord or string_view fields?


Jeffrey Baker

Sep 4, 2023, 8:16:50 PM
to Protocol Buffers
I am aware that Cord support landed, which is very welcome. But it is quite limited since only singular bytes fields can be Cord. It is also limited in that the Cord fields will not alias the input, unless that input is also a Cord.
As an example of why this is limiting, consider a message that minimally frames other messages, where it is desirable to parse the inner messages one at a time, with aliasing/without copying, and where it is undesirable or impossible to copy the input. This would be done like:
message outer {
  repeated bytes inner = 1 [ctype=CORD];
}
... but this is impossible today, since Cord fields cannot be repeated, despite the tantalizing fact that RepeatedField is specialized for Cord. Ideally, what users really want here is repeated string_view, not Cord, because Cord has so much baggage (cordz etc) and often the inputs are string_view anyway, and constructing the initial Cord is just a waste of time in that case.
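To make the goal concrete, here is a minimal hand-rolled sketch of what an aliasing parse of `outer` would yield. It is not protobuf's API. The names `ReadVarint`, `ParseOuterAliased`, and `ExampleUsage` are hypothetical, and the sketch assumes the input contains only field 1 with the length-delimited wire type, rejecting anything else instead of skipping it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Reads a base-128 varint from the front of `in` and advances past it.
// Returns std::nullopt if the input is truncated or longer than 10 bytes.
std::optional<uint64_t> ReadVarint(std::string_view& in) {
  uint64_t value = 0;
  for (int shift = 0; shift < 64 && !in.empty(); shift += 7) {
    const uint8_t byte = static_cast<uint8_t>(in.front());
    in.remove_prefix(1);
    value |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) return value;
  }
  return std::nullopt;
}

// Hypothetical helper: collects every occurrence of field 1 (wire type 2,
// length-delimited) in a serialized `outer` message. Each returned view
// points into `input`; no payload bytes are copied. Returns std::nullopt on
// malformed input or on any other field (a real parser would skip those).
std::optional<std::vector<std::string_view>> ParseOuterAliased(
    std::string_view input) {
  constexpr uint64_t kInnerTag = (1u << 3) | 2u;  // field 1, wire type 2
  std::vector<std::string_view> inners;
  while (!input.empty()) {
    const auto tag = ReadVarint(input);
    if (!tag || *tag != kInnerTag) return std::nullopt;
    const auto len = ReadVarint(input);
    if (!len || *len > input.size()) return std::nullopt;
    const size_t n = static_cast<size_t>(*len);
    inners.push_back(input.substr(0, n));  // a view, not a copy
    input.remove_prefix(n);
  }
  return inners;
}

// Usage: two frames, "ab" and "xyz", each encoded as tag 0x0a, length, payload.
void ExampleUsage() {
  const std::string wire("\x0a\x02" "ab" "\x0a\x03" "xyz", 9);
  const auto inners = ParseOuterAliased(wire);
  assert(inners.has_value() && inners->size() == 2);
  assert((*inners)[0] == "ab" && (*inners)[1] == "xyz");
  // The first view starts inside `wire`'s own buffer, just past tag and length.
  assert((*inners)[0].data() == wire.data() + 2);
}
```

The catch, and presumably the reason this is hard to expose safely, is lifetime: every view is valid only while the input buffer is alive and unmodified.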
Today I support a codebase that refuses to use upstream protobuf because some grouchy, unreformed C programmers claim that it copies too much and is therefore too slow. The irritating part of the situation is: they are right. So I maintain a complete, private protobuf codec internally that is more efficient overall, but which never gets the benefit of upstream performance and feature work like tables, epsinput, etc.
Other than mentioning that I want it, I am not sure if there is anything I can do to bring string_view support to the open-source side of the project. If there is anything I can do, please mention it.

-jwb

Matthew Fowles Kulukundis

Sep 7, 2023, 12:04:24 PM
to Jeffrey Baker, Protocol Buffers
Jeffrey~

We omitted `repeated` Cord simply out of an abundance of caution. If we don't see issues come up with the existing Cord support, we can expand it quite easily. Internally at Google, repeated Cord fields are already supported.

string_view aliasing is a bit of a longer-term thing. We have some pretty wonky support for it internally with ctype=STRING_PIECE, but it is hacky in ways we don't want to expose. The current plan around that hinges upon protobuf editions. Actually, Mike and I are presenting at gRPC Conf on the 20th and this specific bit will be covered in the presentation.[1] Once we start using editions to control the API surface for string_view accessors, we can enable aliasing parses (although they will definitely not be the default).

Hope that makes sense!

Cheers,
Matt

[1]: https://fowles.github.io/unleashing-protobuf-evolution/index.html are the slides, but they may change a bit  

Jeffrey Baker

Sep 7, 2023, 4:28:34 PM
to Matthew Fowles Kulukundis, Protocol Buffers
Yes, that all makes good sense. The v23 release notes state that extension of Cord to repeated fields and string fields is contingent on some expression of interest. Despite the fact that my project is proprietary and not open source, I hope you consider my expression of interest to be on the public record.

I also understand that you might not want to unleash hard-to-use aliasing models on unsuspecting users, but I hope there is some way to offer the easy way and an optional hard-but-efficient way at the same time.

Best,
jwb