PromCache - Aggressively caching the Query endpoint for multiple dashboards


Dominik Schulz

Aug 7, 2017, 10:42:41 AM
to Prometheus Developers
Hi,

I'd like to share a small tool we built because our Prometheus servers were getting overloaded by our Grafana Dashboards.

PromCache (https://github.com/dominikschulz/promcache) rewrites all requests to the /api/v1/query endpoint to make them cacheable,
and caches the responses in memory for a minute. This is especially helpful if lots of dashboards access the same time series frequently.
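
One way to picture the approach described above is the following minimal sketch. It is not taken from the PromCache source: the names `cache_key` and `TTLCache`, and the choice to round the `time` parameter down to the start of a 60-second window, are assumptions made purely for illustration.

```python
from urllib.parse import urlencode

CACHE_TTL_SECONDS = 60  # the post says responses are cached for a minute


def cache_key(params, now):
    """Build a cache key for a /api/v1/query request.

    Assumption: the 'time' parameter is rounded down to the start of its
    60-second window, so requests that arrive close together share a key.
    """
    ts = float(params.get("time", now))
    rounded = int(ts // CACHE_TTL_SECONDS) * CACHE_TTL_SECONDS
    normalized = dict(params, time=str(rounded))
    return urlencode(sorted(normalized.items()))


class TTLCache:
    """Tiny in-memory cache whose entries expire after a fixed lifetime."""

    def __init__(self, ttl=CACHE_TTL_SECONDS):
        self.ttl = ttl
        self._entries = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch, now):
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: serve from memory
        value = fetch()  # expired or missing: ask the backend
        self._entries[key] = (now, value)
        return value
```

With a scheme like this, two dashboards asking for the same expression a few seconds apart map to the same key, so only the first request reaches Prometheus.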

It's more of a hack than a proper project, but maybe it's helpful for someone. If I did reinvent the wheel, please let me know.

Best Regards,
Dominik

Ben Kochie

Aug 10, 2017, 8:12:04 AM
to Dominik Schulz, Prometheus Developers
Thanks, this would be very useful for public internet facing systems.

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsub...@googlegroups.com.
To post to this group, send email to prometheus-developers@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/992bec42-7a63-4eef-8aa4-aa73b1470d4e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Richard Hartmann

Aug 10, 2017, 8:54:19 AM
to Dominik Schulz, Prometheus Developers
FYI: we have been using this on our systems for the last three days
now; it works and is nice when half a team accesses a dashboard at
once. And yes, this has implications when closely following something,
as you introduce a non-deterministic delay.

Richard Hartmann

Aug 10, 2017, 8:56:04 AM
to Ben Kochie, Dominik Schulz, Prometheus Developers
On Thu, Aug 10, 2017 at 2:12 PM, Ben Kochie <sup...@gmail.com> wrote:
> Thanks, this would be very useful for public internet facing systems.

You still have the issue of many expensive queries, especially with
diverse/malicious users.

But we should play with this for 34c3 and FOSDEM.

Johannes Ziemke

Aug 11, 2017, 12:59:08 PM
to Richard Hartmann, Ben Kochie, Dominik Schulz, Prometheus Developers
Ha, I built something very similar for the public-internet-facing use case. It's a bit stricter (whitelisting the endpoints and arguments needed by Grafana), though it allows all queries for now. I didn't know about httpcache; I should have used that. I'll share the link in the next few days. Happy to merge with PromCache.
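
The stricter filtering mentioned here could look roughly like the sketch below. The `ALLOWED` table and the `is_allowed` helper are hypothetical; the actual list of endpoints and arguments used by the tool is not given in this thread, so the entries are only plausible examples from the Prometheus HTTP API.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical allowlist: request path -> query arguments that may appear.
ALLOWED = {
    "/api/v1/query": {"query", "time", "timeout"},
    "/api/v1/query_range": {"query", "start", "end", "step", "timeout"},
    "/api/v1/series": {"match[]", "start", "end"},
}


def is_allowed(url):
    """Return True only for known paths that carry only permitted arguments."""
    parts = urlsplit(url)
    permitted = ALLOWED.get(parts.path)
    if permitted is None:
        return False  # unknown endpoint: reject
    names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return names <= permitted  # every argument must be on the list
```

Note that a filter like this restricts which endpoints are reachable, but it does not limit how expensive an individual query can be.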

Message has been deleted

ja...@ranson.org

Mar 30, 2018, 3:45:23 PM
to Prometheus Developers
Hi all - I noticed that the message I sent a bit ago to this thread was deleted. Usually it is good practice to let someone know that you have taken such an action and why. No one did that in this case. Could someone please reply to me with this information, so that we know whether there is a more appropriate place to send this out, or whether there is something else we can do differently? Thank you.

Ben Kochie

Mar 30, 2018, 4:57:28 PM
to ja...@ranson.org, Prometheus Developers
This is more than likely Google's spam filters. We have only some control over them.

We've been getting reports that they have been more aggressive recently.


Julius Volz

Mar 31, 2018, 2:01:34 AM
to ja...@ranson.org, Prometheus Developers
Hi James,

That looks pretty cool! Btw., maybe it'd make sense to mention it in a separate [ANNOUNCE] thread on prometheus-users@ as well, for people not reading this list or not following this particular thread.

Cheers,
Julius

On Fri, Mar 30, 2018 at 9:02 PM, <ja...@ranson.org> wrote:
Dominik and all,

Happy Good Friday! Comcast is very pleased to announce that we have open-sourced a Dashboard Accelerator for Prometheus called Trickster.

Like PromCache, Trickster acts as a reverse proxy/cache that sits between your dashboard and your Prometheus instance(s). That's about where the similarities end. Rather than being an HTTP object cache (where full objects are periodically evicted and replaced with newer ones), Trickster is an incremental delta cache that understands and marshals Prometheus datasets as part of its core functionality. This means that Trickster, knowing exactly what data you want and exactly what data it already has, will fetch from Prometheus only the datapoints it needs to complete the cached dataset and service the client request. In most cases, the origin queries to Prometheus request about 1% of the data that would be needed without Trickster, and dashboards are accelerated anywhere from 2x to 40x (no exaggeration), depending upon the number/depth of cardinal labels and the complexity of queries. We have found that Trickster really levels the playing field: the more complex and slow the query, the greater the acceleration.
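
The delta idea in the paragraph above can be illustrated with a small sketch. This is not Trickster's code: `missing_ranges` is a hypothetical helper, time ranges are plain (start, end) pairs in seconds, and the cache is assumed to hold one contiguous extent per query.

```python
def missing_ranges(cached, start, end):
    """Return the parts of [start, end] that the cached extent does not cover.

    cached is None or a (cached_start, cached_end) tuple.
    """
    if cached is None:
        return [(start, end)]  # nothing cached yet: fetch everything
    c_start, c_end = cached
    if end < c_start or start > c_end:
        return [(start, end)]  # no overlap: simplification, fetch the whole range
    gaps = []
    if start < c_start:
        gaps.append((start, c_start))  # older data missing at the front
    if end > c_end:
        gaps.append((c_end, end))  # newer data missing at the back
    return gaps
```

For example, if a one-hour panel was cached as (0, 3600) and is refreshed 15 seconds later as (15, 3615), `missing_ranges((0, 3600), 15, 3615)` returns only `[(3600, 3615)]`, so the origin query covers 15 seconds of data instead of the full hour.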

We have incorporated quite a few tricks (no pun intended) to ensure that, even though Trickster is doing all of this caching and acceleration, the recency of dashboards is not impaired at all. You still have the absolute latest data on your real-time dashboards, only faster. Trickster supports multiple caching fabrics (in-memory, filesystem, and Redis), multi-origin setups, and a host of other features out of the box. The GitHub readme file and docs directory include extensive documentation on how to install and customize Trickster for your own needs.

We hope you will check it out, try it out, and contribute back to the project. We are looking to build a good community base around Trickster so that we can submit it for adoption by the Cloud Native Computing Foundation once we have a measurable following of users and contributors, so your help would be most appreciated. I have put 10 issues up in our GitHub project, in case anything looks interesting to this audience.

You can find us on Twitter at @TricksterIO, on GitHub at https://github.com/comcast/trickster, on the Gophers Slack at #trickster, and on Docker Hub at tricksterio/trickster.


Thank you all!

James Ranson
Architect @ Comcast and creator of Trickster





Dominik Schulz

Apr 4, 2018, 2:41:49 AM
to ja...@ranson.org, Prometheus Developers
Dear James,

This sounds pretty amazing. We only built PromCache as a very simple hack to allow us to scale our Grafana setup, and we're very well aware of its limitations.
Your project sounds much more sophisticated and we'll definitely check it out!


Richard Hartmann

Apr 4, 2018, 4:07:56 AM
to ja...@ranson.org, Prometheus Developers
Hi James,

This looks really interesting. We were forced to disable PromCache
after experiencing issues, so we look forward to testing Trickster.

Prometheus 2.x has a limited capability to insert samples slightly
into the past. Will you catch any such new data and refresh your
cache in that case?


Richard