Hi Richard,

Reading between the lines, it sounds like we're potentially talking about a broader "State of Clojure"-style survey for Prometheus. Is that accurate?

We certainly don't mind the results being public. My only real concern is timelines: we were hoping to use some of the raw data to help inform some load testing on our end, and things are already looking pretty aggressive. If it's going to take weeks or more before results start rolling in, we probably won't get the data we were hoping for in time. From a purely selfish perspective we'd be pretty disappointed to go forward without data from "the source", so to speak. Of course, I totally understand the team's position here; I'm just whining to myself.

Timelines aside, we'd be excited to see something "official" in the longer term. It would be useful for engineers like myself, and I know there are product managers and research folks lurking our virtual halls who would love to have such data readily available for future efforts.

The questions from our survey:
- Roughly how many Prometheus *servers* are you operationally responsible for?
- Of all the Prometheus servers that you are responsible for, which version would you say is the most widely deployed?
- How many unique metrics are reporting across all of your Prometheus servers?
- How many unique *timeseries* are reporting across all of your Prometheus servers?
- If you use Grafana to visualize your Prometheus data, what version of Grafana do you typically use?
- What value do you typically use for the "scrape_interval" config setting in your Prometheus servers?
- Is there anything else you would like to tell us about your Prometheus deployment(s)? For example, interesting challenges, pain points, or quirks of your configuration?
Thanks for the link to the other survey. That's pretty good.

On Fri, May 15, 2020 at 7:27 PM 'Tom Lee' via Prometheus Users <promethe...@googlegroups.com> wrote:
- Roughly how many Prometheus *servers* are you operationally responsible for?
- Of all the Prometheus servers that you are responsible for, which version would you say is the most widely deployed?
The first two questions are good. I might modify the first one to clarify with/without HA. For example, we have 21 Prometheus servers, but 7 of those are duplicates for HA.
- How many unique metrics are reporting across all of your Prometheus servers?
- How many unique *timeseries* are reporting across all of your Prometheus servers?
These two need to be clarified for Prometheus. We tend to use the terms metrics and time-series interchangeably. Are you asking about unique metric names?
- If you use Grafana to visualize your Prometheus data, what version of Grafana do you typically use?
- What value do you typically use for the "scrape_interval" config setting in your Prometheus servers?
- Is there anything else you would like to tell us about your Prometheus deployment(s)? For example, interesting challenges, pain points, or quirks of your configuration?
I have a few additional questions that could be added to the list (rough query sketches for a couple of these follow below):
* How many unique exporter/target types do you have?
* What is your samples/second ingestion rate across all Prometheus servers?
* What is your general metric retention time?
* Do you use external storage (federation/remote_write/etc.)?
* If yes, which external storage system(s)?
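
For the ingestion-rate and exporter/target questions, the answers are reasonably easy to self-serve with PromQL. A rough sketch, run per server (metric names assume a reasonably recent Prometheus 2.x; adjust as needed):

  rate(prometheus_tsdb_head_samples_appended_total[5m])

for the samples/second ingestion rate, and

  count(count by (job) (up))

as an approximation of the number of distinct scrape jobs, which is usually close to the number of exporter/target types.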
On Sat, May 16, 2020 at 10:25 AM Ben Kochie <sup...@gmail.com> wrote:
These two need to be clarified for Prometheus. We tend to use the terms metrics and time-series interchangeably. Are you asking about unique metric names?

I guess the first question is about unique metric names. The problem is that there's no easy way to get the number of unique metric names across multiple servers, as there might be anywhere between 0% and 100% overlap of metric names between Prometheus servers, and getting users to calculate a set union might be too much work. Also, time series are more relevant than the number of metrics in Prometheus, so maybe we should only keep the second question?
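
Per server, though, both numbers are easy enough to gather. A sketch of the single-server queries (the metric-name one can be slow on very large servers):

  prometheus_tsdb_head_series

for the number of currently active time series, and

  count(count by (__name__) ({__name__=~".+"}))

for the number of distinct metric names. The awkward part, as noted, is deduplicating metric names across servers.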
On Sat, May 16, 2020 at 11:13 AM Julius Volz <juliu...@gmail.com> wrote:
Also, time series are more relevant than number of metrics in Prometheus, so maybe we should only keep the second question?

Yes, I'm interested in what Tom's intent is behind the question. From a Prometheus perspective, the total time-series load is most important. But it might be different for his use case.
We should probably include some specific PromQL queries to make the results easy to gather for survey participants.
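
For instance (a sketch only; the exact queries would want a sanity check before going into the survey), the version question could point at

  count by (version) (prometheus_build_info)

and the scrape-interval question at something like

  prometheus_target_interval_length_seconds{quantile="0.99"}

which carries the configured interval as a label and reports the observed interval per target.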
I think it would be really useful to give details of how to find the answers to all the questions - simple command-line commands, PromQL queries, python -V style one-liners, etc. It should be as easy as possible to answer, in my opinion.
Would we be interested in the usage of the wider ecosystem? E.g. usage of different SD methods, Alertmanager integrations, remote read/write systems, Thanos/Cortex/VictoriaMetrics?
-- Stuart Clark
On 16/05/2020 10:18, Ben Kochie wrote:
Yes, I'm interested in what Tom's intent is behind the question. From a Prometheus perspective, the total time-series load is most important. But it might be different for his use case.

Ah yep, really great question. I'm going to absolutely butcher the terminology here, but the idea is that we're trying to differentiate between the "number of unique metric names" and the "label/dimensional cardinality within those metrics". The reason for differentiating is something of an implementation detail of our own systems, but I think it applies somewhat to Prometheus and/or Grafana too: when you run a non-aggregating query for a metric x, you might expect to see one time series charted -- or you might see hundreds or even thousands.

In our own test setup we have JMX metrics from 15 Kafka servers reporting in. Executing a "query" like kafka_cluster_Partition_Value (a metric reported by the JMX exporter on behalf of Kafka) yields something like 20,000-30,000 distinct time series charted by Prometheus, and it takes a surprising amount of time to execute that simple little query as a result. This sort of cardinality "explosion" has big implications for system architecture and scalability in our own systems, too.
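
If it helps to make that concrete for the survey, per-metric cardinality can be read straight out of Prometheus with something like the following (a sketch; the regex-on-__name__ form can be expensive on large servers):

  topk(10, count by (__name__) ({__name__=~".+"}))

to list the ten metric names with the most series, or, for a single metric such as the Kafka one above:

  count(kafka_cluster_Partition_Value)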