How does api/v1/series get populated/updated?


Kevin Vasko

Jun 26, 2020, 4:56:41 PM
to Prometheus Users
Grafana is running the query below against the Prometheus server’s series endpoint to capture the “job” values.

I have a “gpu-node” job and a “cpu-node” job configured in prometheus.yml.

I don’t know why the following query (which Grafana is running) to pull both lists of nodes isn’t returning all systems reporting in both “job”s unless I change the date range to >24 hours.

For example:

This start time is <24 hours in the past from my current time:

api/v1/series?match%5B%5D=node_uname_info&start=2020-06-26T5:23:00Z&end=2020-06-27T23:23:00Z – this only returns 1 “job” value (cpu-node).

This start time is >24 hours in the past:

api/v1/series?match%5B%5D=node_uname_info&start=2020-06-25T5:23:00Z&end=2020-06-27T23:23:00Z – this returns all of the “job” values (both cpu-node and gpu-node).

I have checked and validated that the systems reporting into gpu-node do have data within the last 5 min. Why isn’t the endpoint showing that there is data from these systems?

When/how does this endpoint get updated so that it pulls _all_ “job” values?

Any help would be appreciated.

Brian Brazil

Jun 26, 2020, 5:19:33 PM
to Kevin Vasko, Prometheus Users
Things should be added instantly; it's only removal that's a little more complicated. What version of Prometheus is this? Can you confirm that data comes back from a query over the same time ranges for both jobs?


Kevin Vasko

Jun 26, 2020, 8:07:36 PM
to Prometheus Users
I am new to Prometheus, so I’m sure it’s something I’m overlooking.

Prometheus version: 2.18.1

There is data as recently as 1 minute ago in Prometheus if I run this query (which makes sense, because it’s scraping every 15s):

nvidia_gpu_power_usage{job="gpu_nodes"}

If I run a query against that API endpoint within the last few hours, I get nothing back for machines in “gpu_nodes”.

Brian Brazil

Jun 27, 2020, 5:37:49 AM
to Kevin Vasko, Prometheus Users
http://demo.robustperception.io:9090/api/v1/series?match%5B%5D=up&start=2020-06-27T9:23:00Z&end=2020-06-28T23:23:00Z is working for me (the start is ~15 minutes before I'm writing this), so this doesn't seem to be broken. I'd suggest checking that node_uname_info is indeed coming from the GPU job.


Kevin Vasko

Jun 27, 2020, 9:22:16 AM
to Brian Brazil, Prometheus Users
Sorry, how do I check that?

-Kevin

Brian Candler

Jun 27, 2020, 10:32:06 AM
to Prometheus Users
Instead of

     nvidia_gpu_power_usage{job="gpu_nodes"}

check

    node_uname_info{job="gpu_nodes"}

(since it's a query on "node_uname_info" which is apparently missing information)
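More generally, you can list every metric name a job actually exposes by aggregating over `__name__`. A sketch in PromQL, assuming the same job label value as above:

```
count by(__name__) ({job="gpu_nodes"})
```

If node_uname_info doesn't appear in the result, the series endpoint has nothing to return for that job in the queried window.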

Kevin Vasko

Jun 29, 2020, 1:27:07 PM
to Brian Candler, Prometheus Users
ahhhh,

Okay, that’s it. I was using the GPU node exporter for those machines, and apparently it doesn’t expose the node_uname_info metric.

Just curious, what are the best practices for when to use different “job”s?

I broke it down into “cpu-nodes” and “gpu-nodes” based on the exporter. I’m getting the feeling this isn’t the best idea.

-Kevin


Brian Candler

Jun 29, 2020, 5:14:12 PM
to Prometheus Users
On Monday, 29 June 2020 18:27:07 UTC+1, Kevin Vasko wrote:
Just curious, what are the best practices for when to use different “job”s? I broke it down into “cpu-nodes” and “gpu-nodes” based on the exporter. I’m getting the feeling this isn’t the best idea.


No, that's absolutely right and proper - different jobs to scrape different exporters.

What you probably want to do is control the instance labels, so that the metrics from both jobs use instance="machine", rather than instance="machine:9100" and instance="machine:9200" (say).

This can be done with a little bit of label rewriting in your scrape configs. Then you can write queries that join node_uname_info and nvidia_gpu_power_usage on the common instance label.
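For reference, the relabelling described above might look something like this in prometheus.yml (a sketch; the job names, target hostnames, and ports are made up, and the default replace action with replacement $1 is relied on):

```yaml
scrape_configs:
  - job_name: 'cpu-nodes'
    static_configs:
      - targets: ['machine1:9100', 'machine2:9100']
    relabel_configs:
      # Strip the port from the default instance label,
      # so instance="machine1" rather than "machine1:9100".
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: instance
  - job_name: 'gpu-nodes'
    static_configs:
      - targets: ['machine1:9200', 'machine2:9200']
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: instance
```

With matching instance labels, a many-to-one join such as `nvidia_gpu_power_usage * on(instance) group_left(nodename) node_uname_info` can then attach the node exporter's nodename label to the GPU metrics.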