Is Prometheus a good choice for a web analytics database?


xilarra...@qdqmedia.com

Jan 26, 2016, 7:04:40 AM
to Prometheus Developers
Hi!

This isn't a typical question I think :)

First of all I'll explain the background.

We are an online marketing company from Madrid (Spain). We've been using Prometheus to store application metrics and monitor our infrastructure for about two months. We haven't finished setting everything up yet, but we are very happy with Prometheus and the ecosystem around it.

A big chunk of our work consists of recording web analytics events, like page views and other types of events that occur on a web page. Once we do this, our customers can see this data in their dashboard.

So we have started with phase 0: selecting our web analytics metric store. This is a big decision because it will be forever (nothing is forever, but it will be for a long, long time). We have tested InfluxDB, but we aren't sure about its performance in the tests that we have done.

Prometheus has a nice API and an awesome query language and toolkit, is developed in Go, is easy to deploy and maintain, and is growing quickly as an application and as a community. In general it has almost everything that we need.

A rough estimate of the data that will be stored:

* 3000000 metrics/month
* Almost everything will be counters
* 6-8 labels for each metric
* one of the labels will be the client ID (60000 clients)
* one of the labels will be the campaign ID (180000 campaigns), but each client only has 3 at most
* the other labels have between 1 and 5 values each


Some of the doubts that we have:

* Scalability stuff like sharding with federation... (does anyone have real numbers?)
* Benchmarks about reads and writes
* Is it feasible to not delete any historical data?
* Hot Backups
* Can the last datapoint of a metric be retrieved without knowing its timestamp?
* How do you see Prometheus for this kind of metrics (page views, visits, conversions...)?
* Need to import old data (we've seen that timestamped data is not a feature right now)
* Store timestamped data (same as above)

We see that Prometheus is more oriented to other types of metrics, but at first sight it looks like a valid candidate. So before making a big decision and choosing Prometheus as our data store, it would be awesome to get some advice from the people who know Prometheus (and its limits) best.

Thank you!

Brian Brazil

Jan 26, 2016, 9:10:05 AM
to xilarra...@qdqmedia.com, Prometheus Developers
On 26 January 2016 at 12:04, <xilarra...@qdqmedia.com> wrote:
Hi!

This isn't a typical question I think :)

First of all I'll explain the background.

We are an online marketing company from Madrid (Spain). We've been using Prometheus to store application metrics and monitor our infrastructure for about two months. We haven't finished setting everything up yet, but we are very happy with Prometheus and the ecosystem around it.

Great to hear!
 
A big chunk of our work consists of recording web analytics events, like page views and other types of events that occur on a web page. Once we do this, our customers can see this data in their dashboard.

So we have started with phase 0: selecting our web analytics metric store. This is a big decision because it will be forever (nothing is forever, but it will be for a long, long time). We have tested InfluxDB, but we aren't sure about its performance in the tests that we have done.

Prometheus has a nice API and an awesome query language and toolkit, is developed in Go, is easy to deploy and maintain, and is growing quickly as an application and as a community. In general it has almost everything that we need.

A rough estimate of the data that will be stored:

* 3000000 metrics/month
* Almost everything will be counters
* 6-8 labels for each metric
    * one of the labels will be the client ID (60000 clients)
    * one of the labels will be the campaign ID (180000 campaigns), but each client only has 3 at most
    * the other labels have between 1 and 5 values each

With that sort of cardinality, while you can probably get it working, Prometheus is unlikely to be the best option. You'd be best off looking at a log-based approach such as the ELK stack for this data. 6-8 fields is tiny for logs, but a 30M cardinality (before allowing for fanout due to instances) is a significant problem for a metrics-based approach.
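To make the cardinality concern concrete, here is a rough sketch of the arithmetic. It assumes that every campaign belongs to exactly one client (so client ID and campaign ID together contribute 180000 combinations) and that the remaining 4 to 6 labels multiply independently. The label sizes below are hypothetical, picked only for illustration; the thread does not say how the 30M figure was derived.

```python
from math import prod

CAMPAIGNS = 180_000  # from the post; assumed to map one-to-one onto (client, campaign) pairs


def series_count(other_label_sizes):
    """Upper bound on time series for one metric name on one instance."""
    return CAMPAIGNS * prod(other_label_sizes)


# Hypothetical label sizes, not taken from the thread.
lower = series_count([1, 1, 1, 1])      # 4 other labels, 1 value each
middle = series_count([5, 5, 3, 2, 1])  # an arbitrary in-between mix
upper = series_count([5] * 6)           # 6 other labels, 5 values each

print(lower, middle, upper)
```

Under these assumptions an in-between mix already lands in the tens of millions of series per metric name, which is the same order of magnitude as the 30M figure, and the worst case is far higher.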


 

Some of the doubts that we have:

* Scalability stuff like sharding with federation... (does anyone have real numbers?)

I don't think sharding helps with your scenario, as each partition still has the same cardinality to deal with. Sharding helps with cardinality due to a high number of instances, not with cardinality due to labels coming from instrumentation.

* Benchmarks about reads and writes

This would be just about doable with a 1 minute scrape interval for the writes. Reads would depend on what you're doing; aggregating 180k timeseries across however many instances is unlikely to be instant.
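As a back-of-the-envelope illustration of the write load, the sketch below divides the series count by the scrape interval. It assumes the 30M cardinality figure mentioned earlier in this reply and that every series is scraped once per interval; the function name and the `instances` parameter are hypothetical, added only to show the fanout effect.

```python
def samples_per_second(series, scrape_interval_s, instances=1):
    """Average ingestion rate if every series is scraped once per interval."""
    return series * instances / scrape_interval_s


# 30M series, 1 minute scrape interval, a single instance (no fanout).
rate = samples_per_second(30_000_000, 60)
print(f"{rate:,.0f} samples/s")
```

Any fanout across application instances multiplies this rate, since each instance exports its own copy of the series.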
 
* Is it feasible to not delete any historical data?

Not if you want to keep it forever. A few months is fine usually, but I'd expect you to have a good bit of churn which will limit you.
 
* Hot Backups

These are not currently supported. We consider the storage of Prometheus itself to be ephemeral, and plan that remote long term storage will cover high durability requirements.
 
* Can the last datapoint of a metric be retrieved without knowing its timestamp?

No, but you should always be exporting all time series.
 
* How do you see Prometheus for this kind of metrics (page views, visits, conversions...)?

System level statistics are fine, but you've too many clients by at least two orders of magnitude to consider breaking it out that far.
 
* Need to import old data (we've seen that timestamped data is not a feature right now)

That's not currently supported.
 
* Store timestamped data (same as above)

You can do it, but it's not advised as it gets messy.


Brian
 

We see that Prometheus is more oriented to other types of metrics, but at first sight it looks like a valid candidate. So before making a big decision and choosing Prometheus as our data store, it would be awesome to get some advice from the people who know Prometheus (and its limits) best.

Thank you!


xilarra...@qdqmedia.com

Jan 27, 2016, 3:07:29 AM
to Prometheus Developers, xilarra...@qdqmedia.com
Thank you for the response.

We will continue searching!

Regards.