Migrating away from github.com/gogo/protobuf

Austin Cawley-Edwards

Apr 27, 2021, 4:51:35 PM
to Prometheus Developers

Hi all,

The protobuf library used in Prometheus, gogo/protobuf, is no longer actively maintained (activity largely stopped pre-2020, and the project is looking for new ownership). Along with that, many of the performance boosts that are provided by gogo have been addressed in the official Go lib, golang/protobuf, as of v1.4.0.

Many projects that used gogo/protobuf have since switched to the official lib (e.g. Istio, Envoyproxy), largely for ecosystem-compatibility reasons now that performance is not a factor. Gogo-compiled protobufs are not compatible with golang-compiled protobufs, and vice versa. This makes consuming external protobuf APIs impossible unless they also maintain a gogo version, which is not common.

I'm wondering if anyone has done work in this area recently, and whether the community agrees that switching to the official golang/protobuf implementation would be a net benefit.

What this would mean for Prometheus

Looking at the history of protobuf in Prometheus, it seems like both golang/protobuf and gogo/protobuf were in use until the end of 2017, when it made sense to consolidate onto gogo (#3346, #3394).

As noted in the above issues, the official golang/protobuf package is still present in the vendored files, so it is just the Prometheus protos that would need to be updated. The build procedures (mainly scripts/genproto.sh) have not changed much since the 2017 shift, so the work would mainly be adjusting that script back to use golang/protobuf and recompiling the `prompb` package.

Does anyone know of other necessary changes or complications that I'm missing?

Thanks all for your time,
Austin

Frederic Branczyk

Apr 28, 2021, 5:07:19 AM
to Austin Cawley-Edwards, Prometheus Developers
I'd be very happy with this. One consideration we should keep in mind (certainly not blocking this) is how this would affect immediate downstream users. Off the top of my head, I would think checking in with the Cortex and Thanos projects is probably a good idea; I know both have a good number of allocation optimizations, so it would be good to check that these would still be possible.

Tom Wilkie

Apr 28, 2021, 8:05:28 AM
to Frederic Branczyk, Austin Cawley-Edwards, Prometheus Developers
> Along with that, many of the performance boosts that are provided by gogo have been addressed in the official Go lib, golang/protobuf, as of v1.4.0.

This surprised me a little, and sounds awesome - but I can't find any more information about it.  Does v1.4.0 generate code for the serialisation / deserialisation function or still rely on Golang reflection?  Does it allow for the neat tricks to get rid of pointers and all the allocations?  Anything I can read about this would be super useful.

> Off the top of my head, I would think checking in with the Cortex and Thanos projects is probably a good idea; I know both have a good number of allocation optimizations, so it would be good to check that these would still be possible.

From a Cortex PoV, we have copies of these protos so I don't think this would be too much of a problem, but I'd defer to people who know better than me.

Cheers

Tom

Austin Cawley-Edwards

Apr 28, 2021, 12:39:59 PM
to Tom Wilkie, Frederic Branczyk, Prometheus Developers
> This surprised me a little, and sounds awesome - but I can't find any more information about it. Does v1.4.0 generate code for the serialisation / deserialisation function or still rely on Golang reflection? Does it allow for the neat tricks to get rid of pointers and all the allocations? Anything I can read about this would be super useful.

Hmm, I'm not sure on all the specifics, but here's what I've found so far:
* Golang reflection has largely been replaced by the protoreflect package (which does not use Golang reflection under the hood)
* SerDe uses the protoreflect package via dedicated modules for different formats (json, text, wire)
* Not sure about the specific tricks or whether the heavy use of pointers and allocations is still present, but there is now a first-class library for creating compiler plugins that might be what you're looking for. Looks like there are many plugins that use it already, judging by the imports
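
Just to give a rough feel for the new API surface, here's an illustrative sketch (using the well-known Timestamp type rather than anything Prometheus-specific):

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/known/timestamppb"
)

func main() {
	// Any message generated by the new protoc-gen-go satisfies proto.Message
	// via protoreflect; timestamppb.Now() is just a convenient example.
	ts := timestamppb.Now()

	// Wire (binary) format lives in the core proto package.
	wire, err := proto.Marshal(ts)
	if err != nil {
		panic(err)
	}

	// JSON is handled by a dedicated module rather than the core package.
	jsonBytes, err := protojson.Marshal(ts)
	if err != nil {
		panic(err)
	}

	fmt.Printf("wire: %d bytes, json: %s\n", len(wire), jsonBytes)
}
```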

> Off the top of my head, I would think checking in with the Cortex and Thanos projects is probably a good idea ...

+1, sounds good! Are there specific people I should ping on this list? Or open up issues in their repos? You mentioned there are others who know Cortex better, @tom.w...@gmail.com?

Austin Cawley-Edwards

Apr 28, 2021, 1:19:42 PM
to Tom Wilkie, Frederic Branczyk, Prometheus Developers
Here's also some nice history on how Protobuf in Go has changed: https://github.com/protocolbuffers/protobuf-go#historical-legacy

And just for clarification, I'm recommending we switch to google.golang.org/protobuf (which is now the official implementation), sorry for the confusion/typo with golang/protobuf.

Austin Cawley-Edwards

Apr 28, 2021, 4:21:59 PM
to Tom Wilkie, Frederic Branczyk, Prometheus Developers
Looks like Thanos is having a similar discussion internally: https://github.com/thanos-io/thanos/pull/4086#discussion_r617274400

Tom Wilkie

Apr 29, 2021, 5:00:08 AM
to Austin Cawley-Edwards, Frederic Branczyk, Prometheus Developers
Sounds good! As this is pretty performance-sensitive, I think we'd want to benchmark any changes to this code before we merge it. Let me know if I can help there.
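
For concreteness, something along these lines against the current gogo-generated prompb types, then re-run against the regenerated package (a sketch only, not committing to exact shapes):

```go
package remote

import (
	"testing"

	"github.com/prometheus/prometheus/prompb"
)

// Sketch of a before/after micro-benchmark: today's prompb types have a
// gogo-generated Marshal method, the regenerated ones would go through
// proto.Marshal instead.
func BenchmarkTimeSeriesMarshal(b *testing.B) {
	ts := prompb.TimeSeries{
		Labels: []prompb.Label{
			{Name: "__name__", Value: "http_requests_total"},
			{Name: "job", Value: "api-server"},
		},
		Samples: []prompb.Sample{{Value: 42, Timestamp: 1619600000000}},
	}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := ts.Marshal(); err != nil {
			b.Fatal(err)
		}
	}
}
```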

> You mentioned there are others for Cortex @tom.w...@gmail.com?

Bringing this up at the next Cortex community meeting is probably the best way: https://github.com/cortexproject/cortex#community-meetings

Cheers!

Tom

Austin Cawley-Edwards

Apr 29, 2021, 6:01:12 PM
to Tom Wilkie, Frederic Branczyk, Prometheus Developers
Totally agreed on the benchmarking, that would be great!

I've got a work-in-progress branch here, but it looks like it will be rather intensive, unfortunately. The main issues seem to be:
* The use of pointers instead of structs, as @Tom Wilkie mentioned before. It looks like the library likely won't change this
* Finding a replacement for the `gogoproto.nullable` annotation (perhaps Envoy's validation plugin might be just that?)
* Size() methods are no longer generated
* Equals() no longer works out of the box, as the generated messages are no longer directly comparable (and I'm struggling to use the recommended tools); see the sketch below
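
For those last two points, the new runtime does expose generic helpers, though I haven't verified they fit our hot paths. A sketch of what the call sites could look like, assuming a prompb package regenerated with the new protoc-gen-go:

```go
package remote

import (
	"google.golang.org/protobuf/proto"

	"github.com/prometheus/prometheus/prompb"
)

// sameSeries replaces struct comparison / generated Equal methods;
// proto.Equal works through protoreflect rather than generated code.
func sameSeries(a, b *prompb.TimeSeries) bool {
	return proto.Equal(a, b)
}

// wireSize replaces the gogo-generated Size() method.
func wireSize(ts *prompb.TimeSeries) int {
	return proto.Size(ts)
}
```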

If anyone has pointers or ideas here, much appreciated!

> Bringing this up at the next Cortex community meeting is probably the best way: https://github.com/cortexproject/cortex#community-meetings

Cool, see you there! 

Frederic Branczyk

Jun 3, 2021, 3:46:12 PM
to Austin Cawley-Edwards, Prometheus Developers, Tom Wilkie
This looks promising for us to potentially move off of gogo proto without a performance hit (maybe even an improvement): 

Austin Cawley-Edwards

Jun 17, 2021, 12:26:52 PM
to Frederic Branczyk, Prometheus Developers, Tom Wilkie
Just saw this on the CNCF blog as well, seems like a promising library.

Tangentially, have you heard of https://github.com/bufbuild/buf? It seems much nicer than compiling with shell scripts and protoc.


Frederic Branczyk

Jun 17, 2021, 12:56:28 PM
to Austin Cawley-Edwards, Prometheus Developers, Tom Wilkie
I have heard great things, but haven't used it. I wrongly thought that they were mutually exclusive, but it turns out they are actually complementary: 

We should probably do an investigation of the combination.

Austin Cawley-Edwards

Jun 21, 2021, 1:53:37 PM
to Frederic Branczyk, Prometheus Developers, Tom Wilkie
I've updated my branch (https://github.com/austince/prometheus/tree/feat/drop-gogo) to use both the vitess plugin and the buf tool, which indeed fit very nicely together.

I've only updated the code enough for it to compile and have not investigated the semantic differences. This is likely the furthest I'll be able to take this for a bit, so feedback and playing around are welcome and appreciated if this is where we'd like protobuf in Prometheus to go :)

Best,
Austin

Frederic Branczyk

Jul 8, 2021, 5:42:41 AM
to Austin Cawley-Edwards, Prometheus Developers, Tom Wilkie
I think it would be most useful to rebase and create a PR from this; then we can see whether tests pass and we can run prombench (although I don't think there are any perf tests that involve the proto parts). Then we can discuss it there and figure out where to take this.

Thank you so much for the work you have already put into this!

callu...@gmail.com

Nov 28, 2023, 9:38:14 PM
to Prometheus Developers
As part of all the remote-write proto changes I've been working on, I tried out moving us off of gogoproto, cherry-picking Austin's original changes into a new branch off the current main branch.

As Tom mentioned, the main reason for using gogoproto is that `repeated SomeMessageType some_field = n;` fields within messages (with the `gogoproto.nullable` option) are generated as slices of concrete types rather than slices of pointers, which makes it much easier to write code that avoids extra memory allocations. From what I've hacked together, we can get similar (or potentially better) performance using vtproto and its pooling feature, but it's going to be a big refactoring effort.
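
To spell the difference out (an illustration only, with names simplified, not the exact generated code):

```go
package prompbsketch // not real code, illustration only

// With gogo and (gogoproto.nullable) = false, repeated message fields come
// out as slices of concrete structs, so a whole TimeSeries costs only a
// handful of allocations:
type TimeSeriesGogo struct {
	Labels  []Label
	Samples []Sample
}

// With the standard protoc-gen-go, the same message becomes slices of
// pointers, i.e. one heap object per label and per sample (plus internal
// bookkeeping fields omitted here), which is what hurts on the hot path:
type TimeSeriesStd struct {
	Labels  []*Label
	Samples []*Sample
}

type Label struct{ Name, Value string }

type Sample struct {
	Value     float64
	Timestamp int64
}
```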

It might, however, be worth it. It looks to me like even with slightly more allocations the proto marshalling is faster and the marshalled message is smaller. I'll push what I have later this week when I'm more confident it's working correctly.

Bartłomiej Płotka

Feb 3, 2024, 7:56:09 AM
to Prometheus Developers
We did a bit of testing for the remote write 2.0 work (e.g. here), comparing gogo against different plugins, and vtproto is the most promising, even with more pointers.

We would have to get rid of nullables, yes (more pointers, more separate objects on the heap, generally more allocations), but even for our current remote write (especially with interning) there are not many slices (with many elements) that use custom types. And even where there are (e.g. []*TimeSeries), those objects might be worth keeping separate on the heap. This is also the direction protobuf is heading, given the vision of the "opaque API" (the ability to load/allocate parts of a proto message in a lazy way).

Furthermore, we hit a bit of a roadblock, as the proto3 "optional" keyword apparently does not work with gogo. This makes the migration maybe even more worth doing (e.g. a PRW 2.0 optional int64 timestamp would otherwise not be able to have a valid value of 0, etc.).
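
To illustrate the optional point (hypothetical field names, not the actual PRW 2.0 schema): with proto3 `optional`, the new protoc-gen-go emits a pointer field, so a present value of 0 can be told apart from an absent field.

```go
package prwsketch // hypothetical names, illustration only

// Roughly what `optional int64 timestamp = 1;` becomes with the new
// protoc-gen-go: a pointer field, so presence is tracked separately from
// the value.
type ExemplarSketch struct {
	Timestamp *int64
}

// timestamp returns the value and whether the field was set at all;
// a set value of 0 is valid and distinguishable from "absent".
func timestamp(e *ExemplarSketch) (int64, bool) {
	if e.Timestamp == nil {
		return 0, false
	}
	return *e.Timestamp, true
}
```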

I think I would consider doing this work this summer, perhaps as a GSoC mentorship. Would anyone like to mentor or co-mentor it with me or Callum? (: 

Kind Regards,
Bartek Plotka

Bartłomiej Płotka

Feb 5, 2024, 4:58:17 AM
to Prometheus Developers
Issue for reference: https://github.com/prometheus/prometheus/issues/11908

Kind Regards,
Bartek Płotka

Mirco

Feb 14, 2024, 3:08:37 AM
to Prometheus Developers
Hi all, sorry to interrupt this discussion.

Before stumbling upon this thread, I had worked on a separate fork to deprecate gogo in favor of csproto, as compiling with enableunsafedecode=true seems to give even better performance than vtproto. (Note: I have only compared the performance of csproto and vtproto against the official proto generator, not against gogo.)
As of now the branch compiles and passes all tests, but I haven't gone through the code to check for possible optimizations that could arise from migrating away from gogo.
Would you be interested in a pull request? As mentioned above, this would also be a good opportunity to clean up the proto generation code using buf.

P.S.: This would depend on a change in prometheus/client_model, but would allow removing the duplicate proto definition in the repository.

Kind Regards,
Mirco De Zorzi.

Bryan Boreham

Feb 14, 2024, 4:18:30 AM
to Prometheus Developers
No need to apologise!

Do you have benchmarks using the Prometheus remote-read and remote-write protobufs?

Mirco

Feb 14, 2024, 1:32:28 PM
to Prometheus Developers
Would the benchmarks already present in the code be sufficient?

If so, here are the remote read, remote write, and a cheeky remote read with pooling for snappy decompression comparisons (run with -tags stringlabels), alongside the raw results in the same directory.
The remote read tests don't look great, but I might have missed something inside the code.

Note: I took the liberty of using 10 labels for both tests.

Bjoern Rabenstein

Feb 15, 2024, 10:12:51 AM
to Mirco, Prometheus Developers
Thanks for doing this.

Beyond benchmarks, two general concerns:

- How unsafe is `enableunsafedecode=true`? I spot-checked the csproto code, and the risk seems to be on the side of the user code, i.e. luckily there isn't any unsafe input, but I'm wondering how easily we'll introduce bugs in that way. What's the gain by using the flag vs. not using it? (FTR, the Prometheus code is using the same trick with `yoloString` itself, but that has also been frowned upon...)

- How confident can we be that csproto will be consistently maintained? It seems to be mostly the work of a single person, and it is sponsored by a single company (which has already been around for at least 13 years and has 7k employees, so probably not disappearing tomorrow).

Not saying these are blockers, just trying to come to an informed
decision.

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

Mirco

Feb 15, 2024, 4:47:40 PM
to Prometheus Developers
> I'm wondering how easily we'll introduce bugs in that way
I've just realized that using enableunsafedecode=true would prevent us from pooling the buffers: when using unsafe.Pointer to convert a []byte to a string, one must not modify the original []byte array afterwards.
I'll update the pooling benchmarks to reflect this.

> What's the gain by using the flag vs. not using it?
In general, using the safe method seems to degrade performance by ~10% while still saving ~10% of memory. The unsafe method would keep the same performance and save roughly ~30% of memory.

Using safe unmarshaling and pooling both the compressed and uncompressed buffers seems to be the best way forward, although I would like someone to double-check the benchmark results.
Q: is there any reason why Prometheus doesn't already do this? It seems like the only place where buffers are pooled is for marshaling remote-write responses.
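
Roughly what I have in mind for the receive path (a sketch only, not the actual handler code; the release step is exactly the part that clashes with enableunsafedecode, since unmarshalled strings would still alias the pooled buffer):

```go
package remote // hypothetical placement, sketch only

import (
	"sync"

	"github.com/golang/snappy"
)

// decodeBufPool reuses uncompressed buffers between requests.
var decodeBufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 64*1024)
		return &b
	},
}

// decodeSnappy decompresses into a pooled buffer. The caller must call
// release only once it is completely done with the returned bytes.
func decodeSnappy(compressed []byte) (decoded []byte, release func(), err error) {
	bufp := decodeBufPool.Get().(*[]byte)
	decoded, err = snappy.Decode(*bufp, compressed)
	if err != nil {
		decodeBufPool.Put(bufp)
		return nil, nil, err
	}
	*bufp = decoded[:0] // keep whichever buffer snappy ended up using
	return decoded, func() { decodeBufPool.Put(bufp) }, nil
}
```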

> How confident can we be that csproto will be consistently maintained?
The nice thing about csproto and vtproto is that they both leverage the structs generated by the default Google implementation and only override the Marshal and Unmarshal methods. This means the bulk of the work is in migrating from gogo to either of the two implementations, while switching between csproto and vtproto would be trivial, as they implement virtually the same interface. Even if csproto and vtproto were discontinued in the future, I think most alternative protobuf implementations would continue down this path.

Bryan Boreham

Feb 16, 2024, 4:03:57 AM
to Prometheus Developers
Do you have one for sending remote-write data? The one you linked is RemoteWriteHandler, which is for receiving.

From what I've seen elsewhere, proto3-generated structs have extra data members, which I expect will show up as significantly increased memory allocation.
Unfortunately the remote-write structure, with a struct for every name-value pair and all labels repeated, is maximally bad for this.

Bryan

Mirco

Feb 16, 2024, 7:31:31 AM
to Prometheus Developers
I tried benchmarking BenchmarkSampleSend and the results seem awful: from ~150000 B/op and ~250 allocs/op to ~1400000 B/op and ~170000 allocs/op. Profiling the code, it seems like labelsToLabelsProto is the offender, as it allocates a *prompb.Label per label. I experimented with using a map[string]string instead, but it seems like the results aren't that much different.

One potential solution to this, which would require a relatively big refactor, would be to replace `repeated Label labels` with the raw byte representation that is held inside labels_stringlabels, and to switch from an array of Sample to two arrays of Timestamp and Value. Just doing the former seems to speed up the benchmarks by 30% (while allocations are still huge because of the *prompb.Sample values).
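
A purely hypothetical sketch of the shape the generated type could take under that refactor (names invented for illustration):

```go
package prwsketch // hypothetical names, illustration only

// TimeSeriesAlt sketches what the refactored message could generate: labels
// as the raw stringlabels byte representation, and samples as two parallel
// packed arrays (varint-encoded timestamps, fixed64 values) instead of a
// repeated Sample message.
type TimeSeriesAlt struct {
	LabelsData []byte
	Timestamps []int64
	Values     []float64
}
```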

Mirco

Mar 2, 2024, 11:00:02 AM
to Prometheus Developers
Given the benchmarks, it seems like moving away from gogo is not feasible with the current protobuf schema.
I've only now looked at the remote write 2.0 branch, and that might fix the issues with label allocations.
Any reason samples are still encoded as a list of pairs? Using two arrays would both reduce the number of objects and allow using varint encoding for timestamps.

Bjoern Rabenstein

Mar 6, 2024, 6:26:50 AM
to Mirco, Prometheus Developers
On 02.03.24 08:00, Mirco wrote:
> Any reason samples are still encoded as a list of pairs? Using two arrays
> would both reduce the number of objects and allow using varint encoding for
> timestamps.

I assume you are referring to the protobuf messages for TimeSeries and Sample?

I'm not an expert on remote write (and even less so on remote read), but I think it's safe to say that a change in the protobuf layout would mean a new major version of the protocols. v2 is just underway, so this would require the next major version bump, to v3, which would be a big deal given how widely used the protocol is.

Having said that, at least for remote write there are usually not a lot of samples in a TimeSeries message. The most common number is, AFAIK, one. Mileage may vary for remote read, but that is also far less used.

WRT varint: in my understanding of protobuf, varint will always be used for an int64, even if it is a field in another message.

Bryan Boreham

May 16, 2024, 10:58:58 AM
to Prometheus Developers
> Any reason samples are still encoded as a list of pairs? Using two arrays would both reduce the number of objects and allow using varint encoding for timestamps.

Just to note that Prometheus is coded to send just one sample per series in a remote-write message. In normal usage we want to send data straight away after it is scraped, so that fits.

Other clients, or Prometheus in some different mode, could benefit as you say.

Bryan
