Evolving remote APIs


Fabian Reinartz

Nov 18, 2021, 10:37:09 AM
to prometheus...@googlegroups.com, Xi Chen, Danny Clark, Lee Yanco

Hi developers,

We recently launched Google Cloud’s Prometheus metric backend based on Monarch. We encountered some obstacles regarding the remote APIs, which we believe to be common for backends that were not built for Prometheus bottom-up.

A central issue is that the remote APIs expose the Prometheus storage data model. It is notably different from the Prometheus/OpenMetrics instrumentation model and discards most of the structure known at scrape time.
Structured data is critical to store and query data more effectively and translate it to different underlying storage data models. With the current API however the structure is very challenging and sometimes impossible to restore.

We're also interested in potential new features, like first-class support for HA deduplication and write atomicity.


We’d like to explore evolving the remote APIs so that interoperability and compliance become more practically attainable for independently developed backends.

But there should be substantial opportunities for backends that largely reuse Prometheus code as well.


If I recall correctly from years ago, the current remote API was always meant as a starting point, rather than the final solution. Is now a good time to revisit its fundamentals?


Are there any recent discussions in this area to read up on and participate in?



Thanks,
Fabian

Julien Pivotto

Nov 18, 2021, 12:31:40 PM
to Fabian Reinartz, prometheus...@googlegroups.com, Xi Chen, Danny Clark, Lee Yanco
Hello Fabian,

I am not a remote write expert but we are about to mark the remote write
specification as stable. Here is the proposal:
https://docs.google.com/document/d/1LPhVRSFkGNSuU1fBd81ulhsCPR4hkSZyyBj1SZ8fWOM/edit#heading=h.3p42p5s8n0ui

We are also working on transactional remote write:
https://docs.google.com/document/d/1UgSNnQYB1TJKVHkrUybEZPDsDl6MYCSiABm71v73Uvs/edit#heading=h.ih79uqrsv2dl

Regards,



--
Julien Pivotto
@roidelapluie

Bartłomiej Płotka

Nov 18, 2021, 1:33:01 PM
to Prometheus Developers
Hi Fabian!

As Julien said, it sounds like we then have to talk about Remote Write v2. We can totally start designing one. Looking forward to proposals in this space! (:

Kind Regards,
Bartek

Tom Wilkie

Nov 18, 2021, 3:14:55 PM
to Bartłomiej Płotka, Prometheus Developers
Sounds good! Should we start a regular working group to focus on this?  I know many people have been interested.

In the meantime I'll try and get that v1 spec published on the prometheus site so we can officially call it "done".

One question I'm interested in discussing is to what extent the existing protocol can be evolved towards something that is e.g. transactional, more structured, and lightweight, vs. throwing it away and building a completely new one.  There is a ton of support for remote write baked into existing software and it'd be a shame to throw that away.

Cheers

Tom

Bjoern Rabenstein

Nov 22, 2021, 7:16:50 AM
to Fabian Reinartz, prometheus...@googlegroups.com, Xi Chen, Danny Clark, Lee Yanco
On 18.11.21 16:36, 'Fabian Reinartz' via Prometheus Developers wrote:
>
> A central issue is that the remote APIs expose the Prometheus storage data
> model. It is notably different from the Prometheus/OpenMetrics
> instrumentation model and discards most of the structure known at scrape
> time.
> Structured data is critical to store and query data more effectively and
> translate it to different underlying storage data models. With the current
> API however the structure is very challenging and sometimes impossible to
> restore.

Thanks for picking this up. These were precisely the concerns when
remote write was sketched out in 2016 – and one of the reasons to mark
it explicitly as experimental. “Sadly” (and also unsurprisingly),
everyone jumped on the experimental specification, and a whole
industry has evolved around it, so that we are essentially required to
go for a v2 to address the concerns now.

I can add from the Prometheus side that things are finally moving
towards storing structured data natively in the TSDB, namely with the
work on the new histograms. I expect that the same work will open up
possibilities for more structured data and also for richer and better
integrated meta-data. The implications for remote-write are twofold:
for one, those changes motivate changing remote-write along with
them. On the other hand, they also enable Prometheus to support a more
structured remote-write protocol in the first place.

(Interestingly, before remote-write, Prometheus had federation, and it
deliberately uses the same format as for scraping. The plan back then
was to “soon” enable the Prometheus TSDB to support all the structure
and meta-data in the exposition format. But that hasn't happened yet,
and federation still exposes all metrics as flat "untyped" metrics
without any meta-data.)

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

Fabian Reinartz

Nov 25, 2021, 1:35:20 PM
to Prometheus Developers
Thanks for the responses everyone.
Sounds like the existing remote-write will become stable as is, which is certainly a logical consequence after all this time.

I can certainly imagine some features being added to it, as the transactional RW design shows.
Another semi-low-hanging fruit may be to send the metric type enum on regular requests, not just out of band, to support conversion on the fly.
But I think at some point it becomes possible but not necessarily sensible to extend the current proto.
If client and server implementations need extensive conditional logic to handle a single proto, e.g. negotiating how structured the data should be, a separate endpoint ultimately seems simpler.
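To illustrate the on-the-fly conversion idea, here is a minimal Python sketch assuming a hypothetical inline type field on each series; the enum values, field names, and stream names are illustrative, not the actual remote-write proto:

```python
# Hypothetical sketch: a metric type enum sent inline on each write request
# lets a backend route flat series to its own typed model without guessing
# from metric names or querying a separate metadata endpoint.
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple

class MetricType(Enum):
    UNTYPED = 0
    COUNTER = 1
    GAUGE = 2
    HISTOGRAM = 3

@dataclass
class TimeSeries:
    labels: Dict[str, str]
    samples: List[Tuple[int, float]]           # (timestamp_ms, value)
    type: MetricType = MetricType.UNTYPED      # sent inline, not out of band

def route_series(ts: TimeSeries) -> str:
    """Branch on the inline type instead of heuristics like a "_total"
    suffix on the metric name."""
    if ts.type is MetricType.COUNTER:
        return "cumulative-stream"
    if ts.type is MetricType.HISTOGRAM:
        return "distribution-stream"
    return "gauge-stream"
```

Without the inline type, the `route_series` step is exactly where a backend today has to fall back on name heuristics or out-of-band metadata lookups.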

The point on TSDB becoming more structured is interesting – how firm are these plans at this point? Any rough timelines?
My first hunch would've been to explore integrating directly at the scraping layer to directly stream OpenMetrics (or a proto-equivalent) from there, backed by a separate, per-write-target WAL.
This wouldn't constrain it by the currently supported storage data model and generally decouple the two aspects, which also seems more aligned with recent developments like the agent mode.
Any thoughts on that general direction?

Julien Pivotto

Nov 25, 2021, 4:25:10 PM
to Fabian Reinartz, Prometheus Developers
On 25 Nov 10:35, Fabian Reinartz wrote:
> The point on TSDB becoming more structured is interesting – how firm are
> these plans at this point? Any rough timelines?
> My first hunch would've been to explore integrating directly at the
> scraping layer to directly stream OpenMetrics (or a proto-equivalent) from
> there, backed by a separate, per-write-target WAL.
> This wouldn't constrain it by the currently supported storage data model
> and generally decouple the two aspects, which also seems more aligned with
> recent developments like the agent mode.
> Any thoughts on that general direction?

As maintainer of Prometheus server, in general, I am worried that
getting a wal that'd be more "able" than the actual Prometheus TSDB
would weaken the Prometheus server use case in favor of SaaS platforms.

It does not sound great for the users who rely on Prometheus
alone, which I think will continue to represent a large part of our
community in the future.

Additionally, the Query Engine should take advantage of those new
properties as well: as long as we do not support that in the Prometheus
TSDB, it's harder to take advantage of the OpenMetrics types in the
language.

--
Julien Pivotto
@roidelapluie

Bartłomiej Płotka

Nov 26, 2021, 6:17:28 AM
to Fabian Reinartz, Prometheus Developers
>  directly stream OpenMetrics (or a proto-equivalent) from there,

On another front, from an efficiency standpoint, don't we want to batch samples from the exact same ts in many cases (e.g. network partition)?

Kind Regards,
Bartek Płotka (@bwplotka)



Fabian Reinartz

Nov 26, 2021, 6:23:55 AM
to Fabian Reinartz, Prometheus Developers
> As maintainer of Prometheus server, in general, I am worried that
> getting a wal that'd be more "able" than the actual Prometheus TSDB
> would weaken the Prometheus server use case in favor of SaaS platforms.
>
> It does not sound great for the users who rely on Prometheus
> alone, which I think will continue to represent a large part of our
> community in the future.

Where do you see the downside for these users? It doesn't seem that a structured remote-write API would take anything away from
users using the Prometheus server with local storage.
 
> Additionally, the Query Engine should take advantage of those new
> properties as well: as long as we do not support that in the Prometheus
> TSDB, it's harder to take advantage of the OpenMetrics types in the
> language.

True, though I don't understand why this is an argument against the remote-write protocol supporting the instrumentation data model.

Tailing whatever structure the TSDB currently supports, which will probably be a moving target for some time, seems like it would cause unnecessarily
frequent changes to the API or require waiting a few years before making any changes at all.
Or is the goal to not give service offerings access to more structure than Prometheus itself can make use of?


I should say that I'm primarily speaking from technical curiosity here. Our own offering doesn't need such fundamental changes, though
they would make some things a bit simpler of course.


> On another front, from an efficiency standpoint, don't we want to batch samples from the exact same ts in many cases (e.g. network partition)?

Could you elaborate with an example?

Rob Skillington

Nov 27, 2021, 6:34:55 AM
to Fabian Reinartz, Prometheus Developers
There's a now out-of-date but working proof-of-concept PR from August last year that added TYPE, HELP, and UNIT to the WAL and also to Prometheus Remote Write payloads (on a per-TimeSeries-samples basis):
https://github.com/prometheus/prometheus/pull/7771

Once it's added to the WAL, there's no reason it can't be put into both (A) any new Remote API and (B) the existing Remote Write API v1 as a minor release (e.g. Remote Write 1.1).

There was a 20%-30% increase in network traffic when sending it with every single remote write request (on every series):
https://github.com/prometheus/prometheus/pull/7771#issuecomment-675956119

We arrived at another solution over the course of discussion on the PR, which would be to "send type and unit every time (since so negligible) but help only every 5 minutes", with perhaps some way to tweak this behavior via config or some other means.
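That 5-minute heuristic can be sketched as a small per-series throttle. This is a sketch only; the class, field, and parameter names are hypothetical, not from the PR:

```python
import time

HELP_RESEND_INTERVAL = 300  # seconds; the "every 5 minutes" heuristic

class MetadataThrottle:
    """Sketch of the compromise discussed on the PR: type and unit ride on
    every request (cheap), while the larger help string is resent at most
    once per interval per series."""

    def __init__(self, interval=HELP_RESEND_INTERVAL, clock=time.monotonic):
        self.interval = interval
        self.clock = clock            # injectable for testing
        self._last_help = {}          # series key -> time help was last sent

    def metadata_for(self, series_key, mtype, unit, help_text):
        """Return the metadata to attach to this series on this request."""
        md = {"type": mtype, "unit": unit}
        now = self.clock()
        last = self._last_help.get(series_key)
        if last is None or now - last >= self.interval:
            md["help"] = help_text
            self._last_help[series_key] = now
        return md
```

The interval (and whether help is throttled at all) would be the knob exposed "via config or some other means".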

Rob



Bjoern Rabenstein

Dec 1, 2021, 7:24:28 AM
to Fabian Reinartz, Prometheus Developers
On 25.11.21 10:35, Fabian Reinartz wrote:
>
> The point on TSDB becoming more structured is interesting – how firm are
> these plans at this point? Any rough timelines?

I hope there will be a PoC for histograms in two or three months. It's
hard to estimate how long it will take after that to get to a mature
implementation that can be part of a proper Prometheus release.

But that's only histograms, i.e. changing the hardcoded "every sample
is a timestamped float" to a hardcoded "every sample is either a
timestamped float or a timestamped histogram". My hope is that this
change will teach us how we can go one step further in the future and
generalize handling of structured sample data.

So yeah, it's at least three steps away, and timelines are hard to
predict.
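The float-or-histogram generalization described above could be sketched as a closed union of value kinds; the type names here are illustrative, not the actual Prometheus TSDB types:

```python
# Illustrative sketch: the sample value goes from a hardcoded float to a
# closed union of value kinds, which later steps could generalize further.
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class FloatValue:
    value: float

@dataclass
class HistogramValue:
    count: int                  # total number of observations
    total: float                # sum over observations
    buckets: Dict[float, int]   # upper bound -> cumulative count

@dataclass
class Sample:
    timestamp_ms: int
    value: Union[FloatValue, HistogramValue]

def is_histogram(s: Sample) -> bool:
    """Code handling samples now branches on the value kind."""
    return isinstance(s.value, HistogramValue)
```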

> My first hunch would've been to explore integrating directly at the
> scraping layer to directly stream OpenMetrics (or a proto-equivalent) from
> there, backed by a separate, per-write-target WAL.
> This wouldn't constrain it by the currently supported storage data model
> and generally decouple the two aspects, which also seems more aligned with
> recent developments like the agent mode.
> Any thoughts on that general direction?

Yes, this would be more in line with an "agent" or "collector"
model. However, it would kick in earlier in the ingestion pipeline
than the current Prometheus agent (or Grafana agent, FWIW) and
therefore would need to reimplement certain parts (while the
Prometheus agent, broadly simplified, just takes things away, but
doesn't really change or add anything fundamental): it would obviously
need a completely new WAL and the ingestion into it. It even affects
the parser, because the Prometheus 2.x parser shortcuts directly into
the flat internal TSDB data model.

Ironically, the idea is similar to the very early attempt of remote
write (pre 1.x), which was closer to the scraping layer. Also, prior
to Prometheus 2.x, parsing was separate from flattening the data
model, with the intention of enabling an easy migration to a TSDB
supporting a structured data model.

Back then, one reason to not go further down that path was the
requirement of also remote-writing the results of recording
rules. Recording rules act on data in the TSDB and write data to the
TSDB, so they are closely linked to the data model of the
TSDB. In the spirit of "one day we will just enable the TSDB to handle
structured data", I would have preferred to go the extra mile and
convert the output of recording rules back into the data model of the
exposition format (similar to how we did it (imperfectly) for
federation), but the general consensus was to move remote-write away
from the scraping layer and closer to the TSDB layer (which might have
been a key to the success of remote-write).

That same reasoning is still relevant today, and this might touch the
concerns Julien has expressed: If users use Prometheus (or a
Prometheus-like agent) just to collect metrics into the metrics
solution of a vendor, things work out just fine. But if recording (or
alerting) rules come into the game, things get a bit awkward. Even if
we funneled the result of recording rules back into the future
scrape-layer-centric remote-write somehow, it will still feel a bit
like a misfit, and users might think it's better to not do rule
evaluation in Prometheus anymore but move this kind of processing into
the scope of the metrics vendor (which could be one that is
Prometheus-compatible, which would at least keep the rules portable,
but in many cases, it would be a very different system). From a
pessimistic perspective, one might say this whole approach reduces
Prometheus to service discovery and scraping. Everything from the
parser on will be new or different.

As a Prometheus developer, I would prefer that users utilize a much
larger part of what Prometheus offers today. I also see (and always
have seen) the need for structured data (and metadata, in case that
isn't implied). That's why I want to evolve the internal Prometheus
data model including the one used in the TSDB, and to evolve the
remote write/read protocols with it.

That's an idealistic perspective, of course, and similar to the
remote-write protocol as we know it, a more pragmatic approach might
be necessary to yield working results in time. But perhaps this time,
designs could take into account the vision above so that later, all
the pieces of the puzzle can fall into place rather than moving the
vision even farther out of reach.

Pengfei Zhang

Apr 13, 2022, 2:54:25 PM
to Prometheus Developers
Hi, 

I made a short proposal for structured remote write at https://github.com/prometheus/prometheus/issues/10539. There must be lots of use cases for, and concerns about, not moving to a structured protocol, but after weighing the pros and cons, I still think having such a proposal up for discussion will help Prometheus 3.x set the TSDB data model, scaling, etc.

That proposal is still in draft status; please take a look and leave any comments that come to mind.