Prometheus queue manager config needs to be configurable


Colstuwjx

Jul 7, 2017, 3:36:16 AM
to Prometheus Users
Hi team,

Currently, the queue manager is not configurable: the config is hardcoded here (https://github.com/prometheus/prometheus/blob/master/storage/remote/queue_manager.go#L128) and is not part of the startup parameters.
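For reference, the hardcoded defaults behind that link can be sketched as a plain mapping like this (the 100k capacity and the 5 s batch deadline are mentioned elsewhere in this thread; the other names and values are assumptions for illustration, not quoted from the source):

```python
# Sketch of the hardcoded queue manager defaults (illustration only;
# values marked "assumed" are not taken from the source).
DEFAULT_QUEUE_MANAGER_CONFIG = {
    "queue_capacity": 100_000,      # samples buffered per shard queue
    "max_shards": 1000,             # upper bound on parallel senders (assumed)
    "max_samples_per_send": 100,    # batch size per send (assumed)
    "batch_send_deadline_s": 5,     # flush a partial batch after this long
}
print(DEFAULT_QUEUE_MANAGER_CONFIG["queue_capacity"])  # 100000
```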

We should make it configurable. 
BTW, since the queue manager is also part of the Prometheus server, it seems that too many samples could trouble the server itself, for example by slowing rule evaluation, since Prometheus is monolithic. Is there a better way, or any plan for this? For example, the Cortex approach, with fully separated components working together (https://docs.google.com/document/d/1C7yhMnb1x2sfeoe45f4mnnKConvroWhJ8KQZwIHJOuw/edit#).

Thanks.

Tom Wilkie

Jul 7, 2017, 5:28:41 AM
to Colstuwjx, Prometheus Users
Hi Colstuwjx,

> We should make it configurable.

We decided not to make it configurable to begin with, as the aim of the code is to dynamically adapt to the given situation, adding and removing shards to try to flush samples at the current ingest rate with a maximum delay of 5s.  I fully expect there are situations where it does the wrong thing; do you have an example of one?
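The adaptive behaviour described here can be sketched roughly like this (an illustration of the idea only, with made-up function and parameter names, not the actual Prometheus implementation):

```python
import math

# Sketch: pick enough parallel senders (shards) to keep up with the
# incoming sample rate, given the observed time to send one sample.
def desired_shards(samples_in_per_sec: float, sec_per_sample: float) -> int:
    # Seconds of send work generated per wall-clock second; each shard
    # can do one second of work per second, so round up.
    work_per_sec = samples_in_per_sec * sec_per_sample
    return max(1, math.ceil(work_per_sec))

print(desired_shards(40_000, 0.0001))  # 40k samples/s at 100 µs/sample -> 4
```

Prometheus recomputes an estimate like this periodically, so a sustained rise in send latency shows up as a growing shard count.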

> Is there any better way or just plan about this?

The queueing code is designed to cap the amount of memory it uses, so as not to bother the Prometheus server.  If it can't flush samples quickly enough, it will drop them.
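That cap-and-drop behaviour can be sketched like this (a Python illustration of the idea, not the actual Go code):

```python
import queue

# Sketch of "cap memory, drop when full": a non-blocking put on a
# bounded queue that drops (and counts) samples once the queue is at
# capacity, instead of blocking the server.
def enqueue(q: queue.Queue, sample, stats: dict) -> None:
    try:
        q.put_nowait(sample)
    except queue.Full:
        stats["dropped"] += 1  # drop rather than grow memory without bound

q = queue.Queue(maxsize=2)  # tiny capacity for illustration
stats = {"dropped": 0}
for s in range(5):
    enqueue(q, s, stats)
print(q.qsize(), stats["dropped"])  # 2 3
```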

When using Cortex we tend to run Prometheus without any local storage, relying on Cortex for queries, rule evaluation and alerting.  In this mode, Prometheus acts as the agents collecting and forwarding samples to Cortex.  We haven't found it to be a bottleneck.  Have you?

Thanks

Tom




Colstuwjx

Jul 7, 2017, 6:49:05 AM
to Prometheus Users, cols...@gmail.com
I just set up a Prometheus server and configured it with remote storage. Because there are too many targets to scrape, the remote storage queue fills up quickly, so I'd like to configure the maximum number of samples the queue manager holds.

As Cortex has not been merged into Prometheus upstream, what is your suggestion about this?

Tom Wilkie

Jul 7, 2017, 7:32:14 AM
to Colstuwjx, Prometheus Users
Do you have any logs from Prometheus?  It sounds like your remote storage is too slow.  Can you take a screenshot of the following queries:

- 90th percentile send batch latency: `histogram_quantile(0.9, sum(rate(prometheus_remote_storage_sent_batch_duration_seconds_bucket[5m])) by (le,queue))`  
- Rate of dropped samples: `sum(rate(prometheus_remote_storage_failed_samples_total[1m])) by (queue)`

Thanks

Tom

Colstuwjx

Jul 7, 2017, 9:02:34 PM
to Prometheus Users, cols...@gmail.com
The queue capacity is 100k, and the number of shards grew from 2 to 4. For one queue, 12.5M samples were sent successfully in 5 minutes. There is no failed_samples_total metric; instead, I found that dropped_samples_total grew by 140k samples in 5 minutes. Graphs are shown below:

(figure 1: 90th percentile send batch latency)

(figure 2: succeeded samples rate)

(figure 3: dropped samples rate)

Any suggestions about this? My local storage is doing fine, and it shows 480k in-memory series stored.
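For context, a back-of-the-envelope conversion of those numbers into per-second rates (5 minutes = 300 s):

```python
# Rates implied by the figures above (5 min = 300 s).
succeeded_per_sec = 12_500_000 / 300   # ~41,667 samples/s sent
dropped_per_sec = 140_000 / 300        # ~467 samples/s dropped
drop_fraction = 140_000 / (12_500_000 + 140_000)
print(round(succeeded_per_sec), round(dropped_per_sec),
      f"{drop_fraction:.1%}")  # 41667 467 1.1%
```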

Tom Wilkie

Jul 11, 2017, 11:41:23 AM
to Colstuwjx, Prometheus Users
Hi Colstuwjx,

Sorry for the delay.  Those graphs look fine to me; some initially dropped samples during the period where Prometheus ramps up the number of shards are to be expected.  What were you expecting?

Thanks

Tom
