Python prometheus client - metrics duplicated in multiprocessing mode, what to do?


Nikolay Vasiliev

Feb 20, 2020, 6:05:20 AM
to Prometheus Developers
Hello,

I would like to ask your opinion about a problem I recently ran into with how the Python prometheus client behaves in our system.

Description of the problem

The problem is best described by a test (you can copy-paste it into test_multiprocessing.py/TestMultiProcess):

def test_gauge_all_across_forks(self):
    pid = 0
    values.ValueClass = MultiProcessValue(lambda: pid)
    g1 = Gauge('g1', 'help', registry=None)
    g1.set(1)
    pid = 1
    g2 = Gauge('g2', 'help', registry=None)
    g2.set(2)

    # this works fine
    self.assertEqual(1, self.registry.get_sample_value('g1', {'pid': '0'}))
    self.assertEqual(2, self.registry.get_sample_value('g2', {'pid': '1'}))

    # this metric has never been reported from pid 1, thus should not be present
    self.assertIsNone(self.registry.get_sample_value('g1', {'pid': '1'}))

Or, in more detail, the steps to reproduce:

- multiprocessing environment
- report a metric X from the parent process (pid 0)
- fork
- continue reporting metric X from the parent process (pid 0)
- report a metric Y from the child process (pid 1)
- collect metrics via the normal multiprocessing collector

Expectation:
1. metric X is reported with label "pid: 0", non-zero value
2. metric Y is reported with label "pid: 1", non-zero value
3. metric X is NOT reported with label "pid: 1" (i.e. it is never reported from pid 1, so it should not be present)

Actual:
1) and 2) hold, 3) does not.
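
For completeness, here is a rough standalone sketch of the same scenario with a real multiprocessing.Process. This is only an illustration: the env var name (prometheus_multiproc_dir) and the multiprocess_mode argument are what the client version we use expects, and may differ in other versions:

import multiprocessing
import os
import tempfile

# multiprocess mode has to be enabled before prometheus_client is imported
os.environ.setdefault('prometheus_multiproc_dir', tempfile.mkdtemp())

from prometheus_client import CollectorRegistry, Gauge, generate_latest, multiprocess


def child():
    # the child only ever touches g2...
    g2 = Gauge('g2', 'help', multiprocess_mode='all')
    g2.set(2)


if __name__ == '__main__':
    g1 = Gauge('g1', 'help', multiprocess_mode='all')
    g1.set(1)

    p = multiprocessing.Process(target=child)
    p.start()
    p.join()

    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    # ...yet g1 is exposed for both the parent pid and the child pid,
    # with value 0.0 for the child
    print(generate_latest(registry).decode())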

I was wondering whether I could somehow fix the way we report metrics on our side, but I discovered that this is not possible.

Results of my investigation

As of current master, the `values` list inside MultiProcessValue is used to store all of the metric values that are synced via a memory-mapped dict.

When there is a fork, this list gets copied over to the child process. Then, when the client detects that the pid has changed, it "resets" the memory-mapped file, goes over all metric values and tries to read them back from the file.

But for the parent's metrics there is nothing in the brand-new memory-mapped file, so `self._file.read_value(self._key)` initializes them... with 0!
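
In other words, the pid-change handling amounts to something like this (a paraphrased sketch based on my reading of the code, with simplified names, not the library's literal implementation):

# paraphrased sketch, not the actual prometheus_client code
def handle_pid_change(values, open_mmap_file, new_pid):
    new_file = open_mmap_file(new_pid)   # brand-new, empty mmap file for this pid
    for value in values:                 # 'values' was copied into the child by fork()
        value._file = new_file
        # the key does not exist in the new file yet, so read_value()
        # creates the entry and returns 0.0 -- the "ghost" series is born here
        value._value = new_file.read_value(value._key)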

Why this is a problem

Although it is not blocking us from using the client (yet), it creates N (number of parent metrics) times M (number of child processes) extra time series that just occupy Prometheus. It will eventually become a problem, since the number of metrics keeps growing and can reach a critical mass.

TL;DR What do you guys think? I cannot think of a decent solution just yet.

Thank you!



Brian Brazil

Feb 20, 2020, 6:19:04 AM
to Nikolay Vasiliev, Prometheus Developers
On Thu, 20 Feb 2020 at 11:05, Nikolay Vasiliev <vasi...@spaziodati.eu> wrote:
Hello,

I would like to ask your opinion about a problem I recently ran into with how the Python prometheus client behaves in our system.

Description of the problem

The problem is best described by a test (you can copy-paste it into test_multiprocessing.py/TestMultiProcess):

def test_gauge_all_across_forks(self):
    pid = 0
    values.ValueClass = MultiProcessValue(lambda: pid)
    g1 = Gauge('g1', 'help', registry=None)
    g1.set(1)
    pid = 1
    g2 = Gauge('g2', 'help', registry=None)
    g2.set(2)

    # this works fine
    self.assertEqual(1, self.registry.get_sample_value('g1', {'pid': '0'}))
    self.assertEqual(2, self.registry.get_sample_value('g2', {'pid': '1'}))

    # this metric has never been reported from pid 1, thus should not be present

This is incorrect. It was there before the fork, so it's still there after the fork. It never stopped "reporting" at any point.
Have you considered fork+exec?

Brian

 
TL;DR What do you guys think? I cannot think of a decent solution just yet.

Thank you!




Nikolay Vasiliev

Feb 20, 2020, 8:12:14 AM
to Brian Brazil, Prometheus Developers
Let me respond inline.

On Thu, 20 Feb 2020 at 12:19, Brian Brazil <brian....@robustperception.io> wrote:
On Thu, 20 Feb 2020 at 11:05, Nikolay Vasiliev <vasi...@spaziodati.eu> wrote:
Hello,

I would like to ask your opinion about a problem I recently ran into with how the Python prometheus client behaves in our system.

Description of the problem

The problem is best described by a test (you can copy-paste it into test_multiprocessing.py/TestMultiProcess):

def test_gauge_all_across_forks(self):
    pid = 0
    values.ValueClass = MultiProcessValue(lambda: pid)
    g1 = Gauge('g1', 'help', registry=None)
    g1.set(1)
    pid = 1
    g2 = Gauge('g2', 'help', registry=None)
    g2.set(2)

    # this works fine
    self.assertEqual(1, self.registry.get_sample_value('g1', {'pid': '0'}))
    self.assertEqual(2, self.registry.get_sample_value('g2', {'pid': '1'}))

    # this metric has never been reported from pid 1, thus should not be present

This is incorrect. It was there before the fork, so it's still there after the fork. It never stopped "reporting" at any point.


The problem for me is that this metric now appears in the Prometheus output twice, with two different sets of labels:
g1{pid="0"} 1.0
g1{pid="1"} 0.0

I would like a way not to have that g1{pid="1"} series in the Prometheus output.

Do you think that is possible? The "ghost" metric for pid="1" looks wrong.
We are using multiprocessing.Process; I am honestly not sure exactly how it does its job, I only know that it does a fork.
 
Brian

 
--
Best regards,
Nikolay Vasiliev.

Brian Brazil

Feb 20, 2020, 8:33:43 AM
to Nikolay Vasiliev, Prometheus Developers
As I said, you'd need to fork&exec; you can't remove metrics in multiprocessing mode.
multiprocessing.Process would just be a fork.

Brian
 
 

Nikolay Vasiliev

Feb 20, 2020, 9:39:35 AM
to Brian Brazil, Prometheus Developers
Thanks for the response.

Now that I have looked up what exec means, I see what you mean: if the fork were followed by an exec(), the whole process, and hence all of the internal state of prometheus_client, would be initialized from scratch. This unfortunately means that the prometheus library and multiprocessing.Process are not entirely compatible.
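
For what it's worth, the closest thing I can see in the standard library is the "spawn" start method, which (as I understand it) starts the child via a fresh interpreter rather than a plain fork, so none of the parent's prometheus_client state (including that copied `values` list) is inherited. A rough sketch, assuming the worker function lives in an importable module as spawn requires:

import multiprocessing


def worker():
    # runs in a freshly started interpreter: nothing is inherited from the
    # parent's memory, so no "ghost" metric values are carried over
    pass


if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')  # instead of the default 'fork' on Linux
    p = ctx.Process(target=worker)
    p.start()
    p.join()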

> You can't remove metrics in multiprocessing mode.

Just to be sure: does this also mean that there's no chance our use case gets supported by the library?
I can imagine a fix (explicitly clearing that `values` list after fork(), at the request of the user of the client library), but I don't know how well it fits in with the rest of the library's code.
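
Purely as an illustration of what I mean, something along these lines (reset_after_fork and the module-level list are made-up names, not the client's real API):

import os

# hypothetical sketch of the fix I have in mind; these names are made up
# and this is NOT part of the prometheus_client API
_live_values = []   # stand-in for the client's internal list of metric values


def reset_after_fork():
    """Opt-in hook: forget metric values inherited from the parent process."""
    def _clear_in_child():
        # runs in the child right after fork(), before the inherited values
        # get re-initialized to 0 in the child's mmap file
        _live_values.clear()
    os.register_at_fork(after_in_child=_clear_in_child)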

Brian Brazil

Feb 20, 2020, 12:15:29 PM
to Nikolay Vasiliev, Prometheus Developers
On Thu, 20 Feb 2020 at 14:39, Nikolay Vasiliev <vasi...@spaziodati.eu> wrote:
Thanks for the response.

Now that I have looked up what exec means, I see what you mean: if the fork were followed by an exec(), the whole process, and hence all of the internal state of prometheus_client, would be initialized from scratch. This unfortunately means that the prometheus library and multiprocessing.Process are not entirely compatible.

> You can't remove metrics in multiprocessing mode.

Just to be sure: does this also mean that there's no chance our use case gets supported by the library?
I can imagine a fix (explicitly clearing that `values` list after fork(), at the request of the user of the client library), but I don't know how well it fits in with the rest of the library's code.

It'd seem like a lot of complexity for something that you're not meant to do (unregister metrics) outside of certain unit tests.

Brian

Nikolay Vasiliev

Feb 21, 2020, 3:30:15 AM
to Brian Brazil, Prometheus Developers
Thanks! I see :( We'll try to get by with some hack then.