Prometheus recording rules

Nikhil Goenka

Jul 17, 2017, 6:28:43 AM
to Prometheus Users
Hi,
I have the following setup:

I have one Prometheus server installed inside a network. There are several machines inside that network, and I intend the Prometheus server to act as an aggregator for them. Ideally, it would periodically do an SNMP walk (through the snmp_exporter) on every machine in the network and aggregate the values received.

For example, say there are 4 machines of type "A", each with an SNMP counter "xyz". The aggregator would compute the total for "xyz" as
xyz(Machine1) + xyz(Machine2) + xyz(Machine3) + xyz(Machine4).

I have written a recording rule to apply the aggregation logic as follows:

job:snmp:sum=sum(alef_pp) by snmp

and my prometheus.yml (w.r.t snmp configuration) is as follows:

  - job_name: 'snmp'
    metrics_path : /snmp
    static_configs:
      - targets: ['192.168.0.191']
    params:
      module: [alef_pp]



However, I am not able to see any aggregation of the values. Am I missing something? Any help will be highly appreciated.


- Nikhil

Brian Brazil

Jul 17, 2017, 6:40:06 AM
to Nikhil Goenka, Prometheus Users
On 17 July 2017 at 11:28, Nikhil Goenka <nikhil...@alefmobitech.com> wrote:
job:snmp:sum=sum(alef_pp) by snmp

This is an invalid expression. What you want is:

job:xyz:sum = sum by (job) (xyz)


You'd have one such rule per metric of interest.

Brian

 


--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CAKH5-WFE9RZ0iMx8YO0o69fYPa57BUzQBRQtG8FCE3sTqqJHew%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.




Nikhil Goenka

Jul 17, 2017, 6:42:33 AM
to Brian Brazil, Prometheus Users
So in the example shared by you - "job:xyz:sum = sum by (job) (xyz)", I assume xyz is to be substituted by module name whereas job by the actual job name.

Is my assumption correct?


Brian Brazil

Jul 17, 2017, 6:45:59 AM
to Nikhil Goenka, Prometheus Users
On 17 July 2017 at 11:42, Nikhil Goenka <nikhil...@alefmobitech.com> wrote:
So in the example shared by you - "job:xyz:sum = sum by (job) (xyz)",

 
I assume xyz is to be substituted by module name

No, xyz is the metric name. A single snmp_exporter module will usually produce many, many metrics such as ifInOctets and ifOutOctets.

The module parameter selects what the snmp_exporter requests from the device; it does not appear in the resulting metrics.
 
whereas job by the actual job name.

Yes.

Brian
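
For readers following along: since `module` is only a URL parameter passed to the snmp_exporter, a scrape config along the lines of the one in the snmp_exporter README may make the roles clearer. This is a sketch, not the poster's actual setup: the second device address and the exporter address `127.0.0.1:9116` are assumptions made for illustration.

```yaml
scrape_configs:
  - job_name: 'snmp'
    metrics_path: /snmp
    params:
      module: [alef_pp]   # selects what snmp_exporter walks; not a metric label
    static_configs:
      - targets:          # SNMP devices to walk
          - 192.168.0.191
          - 192.168.0.192 # hypothetical second device
    relabel_configs:
      # Pass each device address to the exporter as the ?target= URL parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the device address as the instance label on the scraped series.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the exporter itself (assumed host:port).
      - target_label: __address__
        replacement: 127.0.0.1:9116
```

With a config of this shape, every series scraped through the job carries `job="snmp"` and a per-device `instance` label, which is what `sum by (job) (...)` then collapses.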
 


Brian Brazil

Jul 18, 2017, 3:33:15 AM
to Nikhil Goenka, Prometheus Users
On 18 July 2017 at 06:02, Nikhil Goenka <nikhil...@alefmobitech.com> wrote:
Thanks Brian.

Is there a way I can log in to the Prometheus database to retrieve these values (generated by the recording rule)? I have a utility that will use the aggregated values to set certain MIBs.



Brian




Nikhil Goenka

Jul 18, 2017, 5:30:55 AM
to Brian Brazil, Prometheus Users
I do not see any entry for the recording in the generated metrics.

I have added the following rule:



Nikhil Goenka

Jul 18, 2017, 5:34:02 AM
to Brian Brazil, Prometheus Users
I do not see any entry for the recording in the generated metrics after waiting for the evaluation_interval.

I have added the following rule:
snmp:alefPPUserSessionCount:sum = sum by (snmp)(alefPPUserSessionCount)

where:
   snmp is my job_name
   alefPPUserSessionCount is the metric name

Once the rule was added, I configured the rule file in prometheus.yml as:
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
    - "first.rules"  # This contains the recording rule.
    - "alert.rules"

Am I missing something? Under what name will the recorded series appear in the generated metrics?




Nikhil Goenka

Jul 18, 2017, 12:21:14 PM
to Brian Brazil, Prometheus Users
Any inputs please?


Tobias Schmidt

Jul 18, 2017, 2:53:41 PM
to Nikhil Goenka, Brian Brazil, Prometheus Users
You can try any rule expression in Prometheus' graph interface. If it doesn't return any results, the rule won't return anything either.

Aggregation groups by label name, not label value: here `job` is the label name and `snmp` is its value. Try `job:alefPPUserSessionCount:sum=sum by (job)(alefPPUserSessionCount)` instead.

It should be helpful to read https://prometheus.io/docs/practices/rules/.


Nikhil Goenka

Jul 19, 2017, 5:00:42 AM
to Tobias Schmidt, Brian Brazil, Prometheus Users
I am getting the following error for my rules:
cat first.rules
snmp:alefPPUserSessionCount:sum = sum by (snmp)(alefPPUserSessionCount)

Here snmp is my job name and alefPPUserSessionCount is the metric I wish to monitor.

The error I am getting is as follows:

root@amu-opr01:/home/alef/shubhada/prometheus/prometheus-2.0.0-beta.0.linux-amd64# ./promtool check rules  first.rules
Checking first.rules
  FAILED:
yaml: unmarshal errors:
  line 1: cannot unmarshal !!str `snmp:al...` into rulefmt.RuleGroups



Tobias Schmidt

Jul 19, 2017, 9:31:44 AM
to Nikhil Goenka, Brian Brazil, Prometheus Users
You're using the beta of Prometheus 2.0 but apparently the old rules format. Read https://prometheus.io/blog/2017/06/21/prometheus-20-alpha3-new-rule-format/ or stick to the 1.x series of Prometheus for now.
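
For reference, a minimal sketch of how the rule from this thread might look in the YAML rule format that Prometheus 2.0 expects. The group name `snmp_aggregation` is made up for illustration, and the rule uses the `job` label and the `job:<metric>:sum` naming suggested earlier in the thread.

```yaml
# first.rules, rewritten for the Prometheus 2.0 YAML rule format.
# The group name is a hypothetical placeholder; any name unique within the file works.
groups:
  - name: snmp_aggregation
    rules:
      # Sum the per-device session counts into one series per job.
      - record: job:alefPPUserSessionCount:sum
        expr: sum by (job) (alefPPUserSessionCount)
```

A file of this shape can be validated with `promtool check rules first.rules`, and once loaded, the aggregated series should be queryable under the `record` name, `job:alefPPUserSessionCount:sum`.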
