How about adding some anomaly detection functions in the promql/functions.go file?

董善东

Jun 24, 2021, 3:05:36 AM
to Prometheus Developers
Hi all,
In current Prometheus versions, anomaly detection still relies entirely on rule settings. We find these inconvenient to set up and hard to maintain in practical use.
So I propose adding some statistical analysis functions to provide better and stronger anomaly detection (AD) capabilities.

I have tried adding some classic AD functions to Prometheus as a test. See the demo:
Step 1: Prometheus query
  • original time series: node_memory_free_bytes
1-1.png

  • we can apply the n-sigma method to the original series to help find the anomalous points.
  • a normal point yields 0, while an anomalous point yields 1
1-2.png
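
As a rough illustration of the n-sigma step, here is a minimal Go sketch. It is not the code behind the demo: the function name nSigmaFlags, the choice of the whole window's mean and population standard deviation, and the sample data are all assumptions made for illustration. It returns the 0/1 flags together with the lower and upper bounds that were used.

```go
package main

import (
	"fmt"
	"math"
)

// nSigmaFlags is a hypothetical helper (not part of Prometheus).
// It marks a sample with 1 if it lies more than n standard deviations
// away from the mean of the whole window, and with 0 otherwise.
// It also returns the lower and upper bounds it applied.
func nSigmaFlags(samples []float64, n float64) (flags []int, lower, upper float64) {
	if len(samples) == 0 {
		return nil, 0, 0
	}

	// Mean of the window.
	var sum float64
	for _, v := range samples {
		sum += v
	}
	mean := sum / float64(len(samples))

	// Population standard deviation of the window.
	var sq float64
	for _, v := range samples {
		d := v - mean
		sq += d * d
	}
	std := math.Sqrt(sq / float64(len(samples)))

	lower, upper = mean-n*std, mean+n*std

	flags = make([]int, len(samples))
	for i, v := range samples {
		if v < lower || v > upper {
			flags[i] = 1
		}
	}
	return flags, lower, upper
}

func main() {
	// Made-up data: a flat series with one spike at the end.
	samples := []float64{10, 10, 10, 10, 10, 10, 10, 10, 10, 100}
	flags, lower, upper := nSigmaFlags(samples, 2)
	fmt.Println("flags:", flags)
	fmt.Println("bounds:", lower, upper)
}
```

Note that a single large outlier inflates the standard deviation of its own window, so on short windows a sketch like this can miss anomalies that a more robust estimator would catch.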

Step 2: use in Grafana
1. direct use

2-1.png
2. with upper and lower bounds, to give better insight into the detection result.
2-2.png

I look forward to your replies and discussion.

Best,
Dr. Dong Shandong

Bjoern Rabenstein

Jun 24, 2021, 5:14:51 PM
to 董善东, Prometheus Developers
On 24.06.21 00:05, 董善东 wrote:
> hi,all
> In the existing prometheus version, the anomaly detection still relies
> fully on the rules setting. We find that it is inconvenient to set and hard
> to maintain in practical use.
> So I propose to add some statistical analysis functions to provide better
> and stronger AD ability.

Yeah, that's a frequent request. Unfortunately, there are so many
statistical analysis functions that we can hardly just add them all.

So far, the usual recommendation is to extract data from Prometheus
via the HTTP API and feed it to a fully-fledged statistics tool.

Obviously, that doesn't help you with alerts (which you probably want
to keep within Prometheus).

At the second-to-last dev summit (2021-05-27), we discussed the use
case.

The outcome was the following:
* We want to explore supporting analytics use cases within PromQL behind
a feature flag
* We are open to wrapping other languages, e.g. R, Fortran, SciPython,
given an accepted design doc

See also the notes here:
https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit?ts=6036b8e0&pli=1#heading=h.sa2f6aem9wdt

So I guess you could just implement the functions you like and put
them into a PR, locked behind a feature flag.

Personally, I'm still not sure if that's a sustainable
approach. Perhaps integrating some scripting engine to allow
user-defined functions might be better. But we'll see…

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

Levi Harrison

Jun 24, 2021, 5:53:40 PM
to Prometheus Developers
Maybe we could utilize Golang's plugin library to provide a way for users to write their own functions in compliance with the defined format and then dynamically load them into PromQL.
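
A minimal sketch of what such dynamic loading could look like with the standard plugin package. The SeriesFunc signature is a simplified, hypothetical stand-in (the real PromQL function type in promql/functions.go is different), and the file name nsigma.so and symbol name NSigma are placeholders.

```go
package main

import (
	"fmt"
	"plugin"
)

// SeriesFunc is a hypothetical, simplified signature for a user-defined
// function. It is NOT the function type Prometheus uses internally.
type SeriesFunc = func(samples []float64) []float64

// loadFunc opens a Go plugin (a file built with
// `go build -buildmode=plugin`) and looks up an exported function.
func loadFunc(path, symbol string) (SeriesFunc, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open plugin %s: %w", path, err)
	}
	sym, err := p.Lookup(symbol)
	if err != nil {
		return nil, fmt.Errorf("lookup %s: %w", symbol, err)
	}
	// The symbol must have exactly the agreed-upon signature.
	f, ok := sym.(func([]float64) []float64)
	if !ok {
		return nil, fmt.Errorf("symbol %s has unexpected type %T", symbol, sym)
	}
	return f, nil
}

func main() {
	// Placeholder path and symbol name; no such plugin ships with this sketch.
	f, err := loadFunc("./nsigma.so", "NSigma")
	if err != nil {
		fmt.Println("plugin not loaded:", err)
		return
	}
	fmt.Println(f([]float64{1, 2, 3}))
}
```

The plugin package documentation lists Linux, FreeBSD, and macOS as the only supported platforms, and the plugin has to be built with a toolchain and dependency versions matching the host binary.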

Shandong Dong

Jun 25, 2021, 4:27:45 AM
to Prometheus Developers
OK, I will try a PR first. May I ask what the concern is behind "Personally, I'm still not sure if that's a sustainable approach."?

Thanks,
Dong Shandong

Julien Pivotto

Jun 25, 2021, 5:30:10 AM
to Levi Harrison, Prometheus Developers
On 24 Jun 14:53, Levi Harrison wrote:
> Maybe we could utilize Golang's plugin library <https://pkg.go.dev/plugin> to
> provide a way for users to write their own functions in compliance with the defined
> format
> <https://github.com/prometheus/prometheus/blob/7cb55d57328c60e4a69e741c4953b97e41bf0be3/promql/functions.go#L46>
> and then dynamically load them into PromQL.

That approach would exclude Windows users. I am not sure if that is an
acceptable trade-off.

--
Julien Pivotto
@roidelapluie

Levi Harrison

Jun 25, 2021, 7:02:51 AM
to Levi Harrison, Prometheus Developers
Yeah, no Windows support is less than desirable. Also, after more research, the library seems to be really fraught with issues. HashiCorp has its own plugin library, but I don't think we can afford to take the performance hit of going over RPC. I did find this library though, which I believe supports Windows and looks really promising. We would have to subject it to further testing, though, because I don't think it's broadly adopted.

A scripting engine, as Björn suggested, would probably be the cleanest way to allow users to define their own functions, but that approach brings a lot more complexity. So I think this is a good solution for now, with the added benefit of using a language that's already widely known.

Levi Harrison

Jun 25, 2021, 3:16:44 PM
to Levi Harrison, Prometheus Developers
We could even use something like Yaegi, an interpreter, which would be slower than using compiled Go code, but makes the creation process a lot easier and more stable. This is even closer to having our own scripting language, but still is distinguished by the ability to simply drop this in with little extra work, and the adoption of Go as a language (Yaegi completely supports the Go specification).

Ben Kochie

Jun 26, 2021, 9:03:35 AM
to Levi Harrison, Prometheus Developers
Is anyone using this kind of thing in production? I've seen lots of demos and promises over the years. Every anomaly detection project seems to be abandoned after 6 months.

I'd like to avoid adding functionality that is "cool" but has no practical application.


Shandong Dong

Jun 28, 2021, 3:09:57 AM
to Prometheus Developers
Actually, many companies have already put intelligent anomaly detection algorithms into production use, for example Datadog, Azure, Tencent Cloud, and Ali Cloud.
As for "Every anomaly detection project seems to be abandoned after 6 months": that may be because an AD project needs to be integrated with a practical monitoring architecture, and its algorithms need to be iterated on continuously. It's difficult for personally maintained projects to be continuously updated.

Best,
Dong Shandong

Bjoern Rabenstein

Jun 30, 2021, 5:24:42 PM
to Shandong Dong, Prometheus Developers
On 25.06.21 01:27, Shandong Dong wrote:
> Ok, I will try the PR first. Can I know what‘s the concern of "Personally,
> I'm still not sure if that's a sustainable approach. "?

We had a handful of requests in the past to add specific advanced
statistics functions. In one case, a function was actually added, see
https://prometheus.io/docs/prometheus/latest/querying/functions/#holt_winters

The problem with the latter is that it was actually not the variety of
Holt-Winters that most people wanted. A lot of misunderstanding
happened because of that. My impression (but I might be proven
wrong) is that this is a rarely used PromQL function. But now we have
to support it at least until the next major release.

That latter problem will be avoided by feature flags. But if each of
the five to ten people who requested new functions now adds two to
three new functions on average, we end up with about 20 new
functions, all with the same potential of being misunderstood. Many
might be overlapping, so any new function needs to be reviewed for
overlap with existing ones. Even if they are all behind feature flags,
they will require a lot of code with potential interaction with
existing code and with each other, so there is some maintenance
overhead.

Eventually, reviewing and acceptance of even more functions behind
feature flags will slow down. So we are back at square one. And the
multitude of experimental functions will make it harder for users to
find the right one to try out. Which in turn will make it harder to
identify the actually generally useful functions and "graduate"
them. Realistically, there will be small groups of users liking
subsets of functions, but rarely functions that either everyone needs
and likes or nobody.

It feels a bit like the attempt to create a Python interpreter for
data science that doesn't understand modules and instead tries to have
all required functions built-in. That's hardly a reasonable
approach. And that's why my personal idea is that Prometheus either
has to keep stating that it is only meant for basic mathematical
operations on metrics, or it has to provide some kind of "scripting
interface" to allow custom mathematical "libraries" for users with
special requirements.

Tristan Colgate

Jul 1, 2021, 1:58:45 AM
to Bjoern Rabenstein, Shandong Dong, Prometheus Developers
Just as a point of personal experience on use of AD in production. Having tried in a few places:

Seasonality of human activity rarely lines up well with the data slicing needed for AD. You get *a lot* of false positives. It's almost always basically unusable for pageable alerting.

You need a lot of traffic. Doing stats on infrequent requests, or on systems that generate their own traffic spikes (like batch jobs that cause traffic of any size close to typical usage), just doesn't work well.

A lot of simpler AD models have a loopback problem where your normal traffic looks anomalous some time after an actual event. This is another source of annoying false positives that result in people ignoring alerts.

There are AD models now that avoid some of the false positive issues, but I think it remains true that they would be useless without sufficient traffic volumes (the last paper I read was from Twitter, I think).

Personally I think any features along these lines will be good fodder for blog posts, but unlikely to end up useful for real world monitoring. (I speak as an author of related posts).

https://medium.com/qubit-engineering/using-seasonality-in-prometheus-alerting-d90e68337a4c

Darshan Chaudhary

Jul 13, 2021, 12:50:53 AM
to Prometheus Developers
In a similar vein to what Levi is proposing, how about exposing a plugin interface in Prometheus that external projects can implement?
We could provide a UI where users select the plugins they want baked into the Prometheus binary before downloading it.

Ben Kochie

Jul 13, 2021, 3:45:57 AM
to Darshan Chaudhary, Prometheus Developers
We've talked about making build-time plugins possible, specifically for service discovery methods, because some of them are very large/bloated due to vendored code.

A build-time interface like Caddy or CoreDNS for other functions would be interesting.
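
To make the build-time idea concrete, here is a minimal Go sketch of the general registration pattern: each plugin registers itself from an init function, and the set of compiled-in plugins is decided by which packages the main binary imports. Everything here (SeriesFunc, Register, Names, the above_mean function) is hypothetical and kept in one file for brevity. This is not an existing Prometheus interface, nor a copy of how Caddy or CoreDNS implement it.

```go
package main

import (
	"fmt"
	"sort"
)

// SeriesFunc is a hypothetical, simplified signature for a pluggable function.
type SeriesFunc func(samples []float64) []float64

// registry holds every function that was compiled into the binary.
var registry = map[string]SeriesFunc{}

// Register adds a function under a unique name. It is meant to be
// called from init functions, so a duplicate name is a programming error.
func Register(name string, f SeriesFunc) {
	if _, dup := registry[name]; dup {
		panic("duplicate function registered: " + name)
	}
	registry[name] = f
}

// Names lists the registered functions in sorted order.
func Names() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// In a real build this init would live in its own package, and the main
// binary would opt in with a blank import of that package.
func init() {
	Register("above_mean", func(samples []float64) []float64 {
		out := make([]float64, len(samples))
		if len(samples) == 0 {
			return out
		}
		var sum float64
		for _, v := range samples {
			sum += v
		}
		mean := sum / float64(len(samples))
		for i, v := range samples {
			if v > mean {
				out[i] = 1 // flag samples above the window mean
			}
		}
		return out
	})
}

func main() {
	fmt.Println("compiled-in functions:", Names())
	fmt.Println(registry["above_mean"]([]float64{1, 2, 9}))
}
```

Because everything is linked statically, this pattern avoids the platform restrictions of runtime loading, at the cost of requiring a custom build for each plugin selection.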
