How does api/v1/series get populated/updated?


Kevin Vasko

Jun 26, 2020, 4:56:41 PM
to Prometheus Users
Grafana is running the query below against the Prometheus server’s series endpoint to capture the “job” values.

I have a “gpu-node” job and a “cpu-node” job configured in prometheus.yml.

I don’t know why the following query (which Grafana is running) to pull both lists of nodes isn’t returning all systems reporting in both “job”s unless I change the date range to >24 hours.

For example:

This start time is <24 hours in the past from my current time:

api/v1/series?match%5B%5D=node_uname_info&start=2020-06-26T5:23:00Z&end=2020-06-27T23:23:00Z – this only returns 1 “job” value (cpu-node).

This start time is >24 hours in the past:

api/v1/series?match%5B%5D=node_uname_info&start=2020-06-25T5:23:00Z&end=2020-06-27T23:23:00Z – this returns all of the “job” values (both cpu-node and gpu-node).

I have checked and validated that the systems reporting into gpu-node do have data within the last 5 min. Why isn’t the endpoint showing that there is data from these systems?

When/how does this endpoint get updated so that it pulls _all_ “job” values?

Any help would be appreciated.

Brian Brazil

Jun 26, 2020, 5:19:33 PM
to Kevin Vasko, Prometheus Users
Things should be added instantly; it's only removal that's a little more complicated. What version of Prometheus is this? Can you confirm that data comes back from a query over the same time ranges for both jobs?


Kevin Vasko

Jun 26, 2020, 8:07:36 PM
to Prometheus Users
I am new to Prometheus, so I’m sure it’s something I’m overlooking.

Prometheus version: 2.18.1

There is data as recently as 1 minute ago in Prometheus if I run this query (which makes sense, because it’s scraping every 15s):

nvidia_gpu_power_usage{job="gpu_nodes"}

If I run a query against that API endpoint within the last few hours, I get nothing back for machines in “gpu_nodes”.

Brian Brazil

Jun 27, 2020, 5:37:49 AM
to Kevin Vasko, Prometheus Users
http://demo.robustperception.io:9090/api/v1/series?match%5B%5D=up&start=2020-06-27T9:23:00Z&end=2020-06-28T23:23:00Z is working for me (the start is ~15 minutes before I'm writing this), so this doesn't seem to be broken. I'd suggest checking that node_uname_info is indeed coming from the GPU job.


Kevin Vasko

Jun 27, 2020, 9:22:16 AM
to Brian Brazil, Prometheus Users
Sorry, how do I check that?

-Kevin

Brian Candler

Jun 27, 2020, 10:32:06 AM
to Prometheus Users
Instead of

     nvidia_gpu_power_usage{job="gpu_nodes"}

check

    node_uname_info{job="gpu_nodes"}

(since it's a query on "node_uname_info" which is apparently missing information)
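More generally, you can list every metric name a job actually exposes by aggregating over `__name__`. A sketch in PromQL, assuming the same job label value as above:

```
count by(__name__) ({job="gpu_nodes"})
```

If node_uname_info doesn't appear in the result, the series endpoint has nothing to return for that job in the queried window.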

Kevin Vasko

Jun 29, 2020, 1:27:07 PM
to Brian Candler, Prometheus Users
ahhhh,

Okay, that’s it. I was using the GPU node exporter for those machines, and apparently it doesn’t expose the node_uname_info metric.

Just curious, what are the best practices for when to use different “job”s?

I broke it down into “cpu-nodes” and “gpu-nodes” based on the exporter. I’m getting the feeling this isn’t the best idea.

-Kevin


Brian Candler

Jun 29, 2020, 5:14:12 PM
to Prometheus Users
On Monday, 29 June 2020 18:27:07 UTC+1, Kevin Vasko wrote:
Just curious, what are the best practices for when to use different “job”s? I broke it down into “cpu-nodes” and “gpu-nodes” based on the exporter. I’m getting the feeling this isn’t the best idea.


No, that's absolutely right and proper - different jobs to scrape different exporters.

What you probably want to do is control the instance labels, so that the metrics from both jobs use instance="machine", rather than instance="machine:9100" and instance="machine:9200" (say).

This can be done with a little bit of label rewriting in your scrape configs. Then you can write queries that join node_uname_info and nvidia_gpu_power_usage on the common instance label.
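For reference, the relabelling described above might look something like this in prometheus.yml (a sketch; the job names, target hostnames, and ports are made up, and the default replace action with replacement $1 is relied on):

```yaml
scrape_configs:
  - job_name: 'cpu-nodes'
    static_configs:
      - targets: ['machine1:9100', 'machine2:9100']
    relabel_configs:
      # Strip the port from the default instance label,
      # so instance="machine1" rather than "machine1:9100".
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: instance
  - job_name: 'gpu-nodes'
    static_configs:
      - targets: ['machine1:9200', 'machine2:9200']
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: instance
```

With matching instance labels, a many-to-one join such as `nvidia_gpu_power_usage * on(instance) group_left(nodename) node_uname_info` can then attach the node exporter's nodename label to the GPU metrics.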