Metric type for basic web analytics

Nick

Sep 10, 2020, 11:02:31 PM
to Prometheus Users
Hi,

I'm using a Prometheus client library to gather some basic website analytics, but I'm confused as to which metric type is best suited for this use case.

I can capture a few required data points (as label values, I suppose) for each website visit, such as the domain, the page, and the browser. But which metric type should I use for this, so that I can display: 1) a chart of website visits by domain, 2) a list of the most visited pages per domain, and 3) a list of the most used browsers per domain?

Please suggest an ideal approach.

Thank you!

Brian Candler

Sep 11, 2020, 7:29:34 AM
to Prometheus Users
The metric type you want is a counter, which is incremented on each hit.

However, you need to be careful here, as you may end up with a cardinality explosion if you label your metrics with {domain="X", page="Y", browser="Z"}. There is an effectively unlimited number of pages that can be requested, and you'll see requests for them from bots, attacker scripts, etc.

If you are going to use Prometheus for this, you need to sanitise the input data, e.g. select from a small list of pre-existing domains, pages, and browsers, and ignore all unknown values. Otherwise you may be better off with ad-hoc log analysis tools.
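The allow-list idea can be sketched in a few lines of Python. Everything below is illustrative (the lists and names are made up), and a plain dict stands in for what would be a prometheus_client Counter in a real exporter:

```python
from collections import Counter

# Hypothetical allow-lists; anything outside them collapses to "other",
# so bots and attacker scripts cannot inflate label cardinality.
KNOWN_DOMAINS = {"example.com", "blog.example.com"}
KNOWN_PAGES = {"/", "/about", "/contact"}
KNOWN_BROWSERS = {"Firefox", "Chrome", "Safari"}

hits = Counter()  # stands in for a prometheus_client Counter with three labels

def record_hit(domain, page, browser):
    # Sanitise each label value before it ever reaches the metric.
    domain = domain if domain in KNOWN_DOMAINS else "other"
    page = page if page in KNOWN_PAGES else "other"
    browser = browser if browser in KNOWN_BROWSERS else "other"
    hits[(domain, page, browser)] += 1

record_hit("example.com", "/about", "Chrome")
record_hit("example.com", "/wp-admin.php", "sqlmap/1.4")  # attack probe, collapsed
```

The key point is that the set of possible label combinations is bounded up front, no matter what the traffic looks like.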

Nick

Sep 12, 2020, 3:22:18 AM
to Prometheus Users
Thanks, that makes sense.

Nick

Sep 15, 2020, 1:11:48 AM
to Prometheus Users
Keeping cardinality explosion in mind, what's a decent maximum number of exported metrics that can be considered performant for scraping and time-series processing?

As I mainly need the counter total, I can split the web analytics to reduce the number of possible label combinations, for example:

{ domain, page }
{ domain, browser }
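To see why the split helps, compare worst-case series counts using some illustrative (made-up) label sizes:

```python
# Hypothetical numbers of distinct values per label.
domains, pages, browsers = 10, 50, 5

# One counter carrying all three labels: the full cross product.
combined = domains * pages * browsers

# Two counters, split as proposed above.
split = domains * pages + domains * browsers

print(combined, split)  # prints: 2500 550
```

The split turns a product of three factors into a sum of two smaller products, which grows much more slowly as label values are added.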

Stuart Clark

Sep 15, 2020, 2:37:04 AM
to Nick, Prometheus Users
For both of those, some level of pre-processing would be advised to reduce cardinality.

For page, remove query strings, etc. (especially as they can contain unique tracking IDs). For the browser, convert the User-Agent header into a general string such as Firefox, Chrome, Opera, etc.
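Both normalisations can be sketched with Python's standard library. The browser matching below is deliberately crude and illustrative, not a real user-agent parser:

```python
from urllib.parse import urlsplit

def normalize_page(url):
    # Drop the query string and fragment so unique tracking IDs
    # never become label values.
    return urlsplit(url).path or "/"

def normalize_browser(user_agent):
    # Collapse the raw User-Agent header to a small fixed set.
    # Order matters: Chrome's UA string also contains "Safari".
    for name in ("Firefox", "Edg", "Chrome", "Safari"):
        if name in user_agent:
            return "Edge" if name == "Edg" else name
    return "Other"

print(normalize_page("/blog/post?utm_source=abc123"))  # prints: /blog/post
print(normalize_browser("Mozilla/5.0 Chrome/85.0 Safari/537.36"))  # prints: Chrome
```

A production setup would likely use a proper user-agent parsing library, but the principle is the same: map arbitrary input onto a small, fixed set of label values.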


Brian Candler

Sep 15, 2020, 3:46:49 AM
to Prometheus Users
On Tuesday, 15 September 2020 06:11:48 UTC+1, Nick wrote:
Keeping cardinality explosion in mind, what's a decent maximum number of exported metrics that can be considered performant for scraping and time-series processing?

It depends on how much resource you're prepared to throw at it. If you build a powerful Prometheus server then a total of 2 million timeseries is doable; beyond that you ought to look at sharding across multiple servers.

As I mainly need the counter total, I can split the web analytics to reduce the number of possible label combinations, for example:

{ domain, page }
{ domain, browser }

Yes, that's fine, but you still want to limit the number of values for each label. As Stuart said: for the browser, don't use the raw User-Agent header; pick from a small pre-determined set of values like "Firefox", "Chrome", "IE", "Other". For the page, strip out any query string, and ideally also limit it to known pages or "Other".

If you also want to record the raw User-Agent value for every request, do that in a separate logging system (e.g. Loki, Elasticsearch, a SQL database, or even just plain text files).

Tim Schwenke

Sep 15, 2020, 3:23:10 PM
to Prometheus Users
"If you build a powerful prometheus server then a total of 2 million timeseries is doable; beyond that you ought to look at sharding across multiple servers."

I think you've forgotten a zero.

Nick

Sep 16, 2020, 1:59:53 AM
to Prometheus Users
Thanks, guys! I'll try to ensure that all label values are sanitized and limited to the bare minimum, which may still end up generating around 2,000-4,000 counter metrics per scrape (retained in Prometheus for up to a year). I hope that's not going to be a problem?

Tim Schwenke

Sep 16, 2020, 4:46:08 AM
to Prometheus Users
Prometheus can easily handle hundreds of thousands, even millions, of time series. I recommend ignoring the number of metrics and focusing on the approximate number of time series in your system. Technically, even the name of a metric is nothing more than a label called __name__.

>>> services = 100
>>> replicas = 3
>>> endpoints = 20
>>> metric_request_counter = 5
>>> metric_latency = 20
>>> services * replicas * endpoints * metric_request_counter * metric_latency
600000
>>> services * replicas * endpoints * metric_request_counter * 10
300000

Ben Kochie

Sep 16, 2020, 4:54:14 AM
to Nick, Prometheus Users
Single-digit thousands per target is fine. I usually recommend keeping things under 10k, with 50k being an absolute upper limit for sanity.

I have some rails apps that produce around 25-35k per target due to the number of Controller#Action pairs on some metrics.

On Wed, Sep 16, 2020 at 7:59 AM Nick <vicn...@gmail.com> wrote:
Thanks, guys! I'll try to ensure that all label values are sanitized and limited to the bare minimum, which may still end up generating around 2,000-4,000 counter metrics per scrape (retained in Prometheus for up to a year). I hope that's not going to be a problem?
