Proposal to speed up jmx_exporter

Andrey Falko

Jan 22, 2018, 11:29:38 PM

to Prometheus Developers

Hi everyone,

While profiling some applications, I've noticed that the following regex shows up very high in my top functions list: https://github.com/prometheus/jmx_exporter/blob/master/collector/src/main/java/io/prometheus/jmx/JmxScraper.java#L177

To make matters worse, when my applications get highly loaded, the scrape time grows linearly. That regex is invoked on every scrape, slows down my systems, and causes delayed metrics feedback. The crux of the problem is that getKeyPropertyList is recomputed on every scrape: https://github.com/prometheus/jmx_exporter/blob/master/collector/src/main/java/io/prometheus/jmx/JmxScraper.java#L161 . I'd like that to happen a lot less frequently.

I propose one of the following solutions:
1) Change the code to only run that function in the JmxCollector constructor and reloadConfig function (I happen to be sitting on this change locally)
2) Memoize getKeyPropertyList
3) Change yaml configuration so that users have the option to auto-discover and load all jmx mbeans and use reflection to find their properties (i.e. no regexes). That discovery would happen once at runtime. We'd still keep the current regex method and users can keep their old configs.

I'm more than happy to contribute one of these as a pull request and am open to other ideas as well.

Please advise on how I should proceed. Thank you!

Best regards,
Andrey Falko

David Karlsen

Jan 23, 2018, 4:20:35 AM

to Andrey Falko, Prometheus Developers

A performance increase in this library would be great. We struggle a bit with the scrape time the way it is now. One thing to be aware of during the design is that some JMX beans may not have been registered the first time the scraper runs, so it should allow for more beans appearing after some time.

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsub...@googlegroups.com.
To post to this group, send email to prometheus-developers@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/989bd868-29e3-4a4d-b746-5508ab0323bf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
David J. M. Karlsen - http://www.linkedin.com/in/davidkarlsen

Ben Kochie

Jan 23, 2018, 4:52:15 AM

to David Karlsen, Andrey Falko, Prometheus Developers

+1. I suggest you start by filing this as a proposal on the jmx_exporter issue tracker.

Brian Brazil

Jan 23, 2018, 5:51:03 AM

to Andrey Falko, Prometheus Developers

On 23 January 2018 at 04:29, Andrey Falko <ma3o...@gmail.com> wrote:

> I propose one of the following solutions:
> 1) Change the code to only run that function in the JmxCollector constructor and reloadConfig function (I happen to be sitting on this change locally)

mBeans can be (and often are) added after the exporter starts.

> 2) Memoize getKeyPropertyList

Some applications can have a lot of churn. Do you have an idea for preventing this from becoming a memory leak?

> 3) Change yaml configuration so that users have the option to auto-discover and load all jmx mbeans and use reflection to find their properties (i.e. no regexes). That discovery would happen once at runtime. We'd still keep the current regex method and users can keep their old configs.

Same issues as for 1), and it would probably break something.

Brian

--

Brian Brazil

www.robustperception.io

Andrey F.

Jan 23, 2018, 2:04:08 PM

to Brian Brazil, Prometheus Developers

On Tue, Jan 23, 2018 at 2:51 AM, Brian Brazil
<brian....@robustperception.io> wrote:
>>
>> I propose one of the following solutions:
>> 1) Change the code to only run that function in the JmxCollector
>> constructor and reloadConfig function (I happen to be sitting on this change
>> locally)
>
>
> mBeans can be (and often are) added after the exporter starts.
>

Good point, I didn't consider this. I can think of two ways to solve that:
(a) Add an option to yaml that allows users to specify a delay at
agent start up. The default would be something reasonable.
(b) Have an option where the mBeans are re-polled at regular
intervals. There is already a "reloadConfig" feature; perhaps a subset
of that can be run at a user-specified cadence w/ a reasonable
default.

I'm guessing that (b) would be better because it would cover the
dynamic mBean use-cases. However (a) could be used by non-dynamic use
cases to get maximum efficiency. Perhaps both (a) and (b) could be
implemented too.

>>
>> 2) Memoize getKeyPropertyList
>
>
> Some applications can have a lot of churn, do you have an idea to prevent
> this becoming a memory leak?
>

My initial assumption was that mBeans only need to be polled once, so
there would be logic to only run regexes if the keyPropertyList was
missing; so effectively, after the initial load, it would ignore any churn
and only have the initial mBeans loaded. Given that my assumption doesn't
hold, the only idea I have to avoid a leak is to split getKeyPropertyList
off to run in another thread at its own cadence. However, that would
essentially look like what I described as solution 1)(b) above.

>>
>> 3) Change yaml configuration so that users have the option to
>> auto-discover and load all jmx mbeans and use reflection to find their
>> properties (i.e. no regexes). That discovery would happen once at runtime.
>> We'd still keep the current regex method and users can keep their old
>> configs.
>
>
> Same issues as for 1), and would probably break something.
>

Yes, this would have the same issues as 1). (a) and (b) would apply to
this idea as well. In terms of breaking something, wouldn't that be
minimized by this being a required opt-in via config? All the previous
code would mostly remain untouched with this idea.

> Brian
>

Andrey F.

Jan 23, 2018, 2:07:00 PM

to Ben Kochie, David Karlsen, Prometheus Developers

Thanks Ben, I've created the issue:
https://github.com/prometheus/jmx_exporter/issues/233

Brian Brazil

Jan 23, 2018, 3:04:04 PM

to Andrey F., Prometheus Developers

On 23 January 2018 at 19:03, Andrey F. <ma3o...@gmail.com> wrote:

> > mBeans can be (and often are) added after the exporter starts.
>
> Good point, I didn't consider this. I can think of two ways to solve that:
> (a) Add an option to yaml that allows users to specify a delay at
> agent start up. The default would be something reasonable.

We have that for other reasons (broken JMX implementations); it's not sufficient here.

> (b) Have an option where the mBeans are re-polled at regular
> intervals. There is already a "reloadConfig" feature; perhaps a subset
> of that can be run at a user-specified cadence w/ a reasonable
> default.

That could produce weird output if a new mBean was needed for context.

> > 2) Memoize getKeyPropertyList
> >
> > Some applications can have a lot of churn, do you have an idea to prevent
> > this becoming a memory leak?
>
> My initial assumption was that mBeans only need to be polled once, so
> there would be logic to only run regexes if the keyPropertyList was
> missing; so effectively after initial load, it would ignore any churn
> and only have the initial mBeans loaded. In lieu of my assumption, the
> only idea I have to avoid a leak is to split getKeyPropertyList to run
> in another thread at its own cadence. However, that would essentially
> look like what I described solution 1)(b) above.

I don't see how another thread helps; we need up-to-date results synchronously with a scrape. This is a cache size question.
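
One way to read "a cache size question" is to cap the cache and evict old entries. A minimal sketch, assuming Java's `LinkedHashMap` in access order with `removeEldestEntry`; the class name `BoundedCache` and the limit of 2 are hypothetical, and nothing here is taken from jmx_exporter:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-bounded LRU cache; not part of jmx_exporter. */
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // "a" becomes the most recently used entry
        cache.put("c", "3"); // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

This class is not thread-safe on its own; if scrapes can overlap, it would need external synchronization (for example `Collections.synchronizedMap`).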

--

Brian Brazil

www.robustperception.io

Andrey F.

Jan 23, 2018, 4:43:05 PM

to Brian Brazil, Prometheus Developers

On Tue, Jan 23, 2018 at 12:04 PM, Brian Brazil
<brian....@robustperception.io> wrote:
>>
>> >>
>> >> 2) Memoize getKeyPropertyList
>> >
>> >
>> > Some applications can have a lot of churn, do you have an idea to
>> > prevent
>> > this becoming a memory leak?
>> >
>>
>> My initial assumption was that mBeans only need to be polled once, so
>> there would be logic to only run regexes if the keyPropertyList was
>> missing; so effectively after initial load, it would ignore any churn
>> and only have the initial mBeans loaded. In lieu of my assumption, the
>> only idea I have to avoid a leak is to split getKeyPropertyList to run
>> in another thread at its own cadence. However, that would essentially
>> look like what I described solution 1)(b) above.
>
>
> I don't see how another thread helps, we need up to date results
> synchronously with a scrape. This is a cache size question.
>

What if we keep track of the result of `beanConn.queryMBeans(name, null)`
as a hash value? When it changes, we run getKeyPropertyList
similar to how it is done today; otherwise we proceed with the cache.
whitelistObjectNames is of bounded size, so we'd have a 1-to-1 mapping
of hashes to track. As for the cache of the results from
getKeyPropertyList, we'd have the following map:
HashMap<ObjectName, LinkedHashMap<String, String>>
If an ObjectName is no longer present in the latest queryMBeans result, we
delete it from the map. This way, I think the amount of memory used by
the key property list map would follow the memory bounds of the mBeans
the application maintains.
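
A minimal sketch of the map-with-pruning idea above. It skips the hash check and simply prunes against the latest set of names on each pass. The class `KeyPropertyCache`, its methods, and the `objectName` helper are hypothetical, and the parser is a simplified stand-in for the regex-based one (it splits on `,` and `=` and does not handle quoted values):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical cache of parsed key properties, pruned to the live MBean set. */
final class KeyPropertyCache {
    private final Map<ObjectName, LinkedHashMap<String, String>> cache = new HashMap<>();
    private int parseCount = 0; // counts cache misses, for illustration only

    /** Returns the cached properties for name, parsing only on a miss. */
    LinkedHashMap<String, String> get(ObjectName name) {
        LinkedHashMap<String, String> props = cache.get(name);
        if (props == null) {
            props = parse(name);
            cache.put(name, props);
        }
        return props;
    }

    /** Drops every entry whose ObjectName is absent from the latest query result. */
    void retainOnly(Set<ObjectName> liveNames) {
        cache.keySet().retainAll(liveNames);
    }

    int size() { return cache.size(); }

    int parseCount() { return parseCount; }

    // Simplified stand-in parser: keeps the order in which the properties were given.
    private LinkedHashMap<String, String> parse(ObjectName name) {
        parseCount++;
        LinkedHashMap<String, String> out = new LinkedHashMap<>();
        for (String pair : name.getKeyPropertyListString().split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                out.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return out;
    }

    /** Helper that wraps the checked exception thrown by the ObjectName constructor. */
    static ObjectName objectName(String s) {
        try {
            return new ObjectName(s);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

In a scrape loop, one would call `get` for each name returned by the query and then `retainOnly` with the set of names seen in that scrape, so the cache never holds more entries than the application currently exposes.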

Brian Brazil

Jan 23, 2018, 6:13:41 PM

to Andrey F., Prometheus Developers

On 23 January 2018 at 21:42, Andrey F. <ma3o...@gmail.com> wrote:

> What if we keep track of the result of `beanConn.queryMBeans(name,
> null)` as a hash value, when it changes, we run getKeyPropertyList
> similar to how it is done today, else proceed with cache?
> whitelistObjectNames is of a bound size, so we'd have a 1-to-1 mapping
> of hashes to track.

I don't see how whitelistObjectNames comes into it.

> As far as the cache of the results from
> getKeyPropertyList, we'd have the following map:
> HashMap<ObjectName, LinkedHashMap<String, String>>
> If ObjectName is no longer present in the latest queryMBeans, then we
> delete it from the map. This way, I think the amount of memory used by
> the KeyProperty list map would follow the memory bounds of the mBeans
> the application maintains.

That should work, though locking needs to be considered.

Brian

--

Brian Brazil

www.robustperception.io

Andrey F.

Jan 23, 2018, 9:08:06 PM

to Brian Brazil, Prometheus Developers

On Tue, Jan 23, 2018 at 3:13 PM, Brian Brazil
<brian....@robustperception.io> wrote:
> On 23 January 2018 at 21:42, Andrey F. <ma3o...@gmail.com> wrote:
>> What if we keep track of the result of `beanConn.queryMBeans(name,
>> null)` as a hash value, when it changes, we run getKeyPropertyList
>> similar to how it is done today, else proceed with cache?
>> whitelistObjectNames is of a bound size, so we'd have a 1-to-1 mapping
>> of hashes to track.
>
>
> I don't see how whitelistObjectNames comes into it.
>

I'd be tracking whether anything different pops up here:
https://github.com/prometheus/jmx_exporter/blob/master/collector/src/main/java/io/prometheus/jmx/JmxScraper.java#L113
If nothing has changed, then we don't update the cache.

>>
>> As far as the cache of the results from
>> getKeyPropertyList, we'd have the following map:
>> HashMap<ObjectName, LinkedHashMap<String, String>>
>> If ObjectName is no longer present in the latest queryMBeans, then we
>> delete it from the map. This way, I think the amount of memory used by
>> the KeyProperty list map would follow the memory bounds of the mBeans
>> the application maintains.
>
>
> That should work, though locking needs to be considered.
>
> Brian
>

I'll throw together a draft PR and see how it goes. I suspect that
locking could be handled by using a ConcurrentHashMap, although that
still leaves the LinkedHashMap unsafe.
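
A minimal sketch of one way to address the unsafe LinkedHashMap, assuming Java 8+: the outer map is a `ConcurrentHashMap`, and each value is a `LinkedHashMap` wrapped with `Collections.unmodifiableMap` so it is never mutated after being published. The class `ConcurrentKeyPropertyCache`, its methods, and the simplified parser are hypothetical stand-ins, not the draft PR:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical thread-safe variant: concurrent outer map, read-only values. */
final class ConcurrentKeyPropertyCache {
    private final ConcurrentHashMap<ObjectName, Map<String, String>> cache =
            new ConcurrentHashMap<>();

    /** Parses at most once per present key; concurrent callers see the same value. */
    Map<String, String> get(ObjectName name) {
        return cache.computeIfAbsent(name, ConcurrentKeyPropertyCache::parse);
    }

    /** Drops entries for names absent from the latest query result. */
    void retainOnly(Set<ObjectName> liveNames) {
        cache.keySet().retainAll(liveNames);
    }

    // Simplified stand-in parser (no quoted-value handling). The result is
    // wrapped so readers on other threads can never modify it.
    private static Map<String, String> parse(ObjectName name) {
        LinkedHashMap<String, String> out = new LinkedHashMap<>();
        for (String pair : name.getKeyPropertyListString().split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                out.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return Collections.unmodifiableMap(out);
    }

    /** Helper that wraps the checked exception thrown by the ObjectName constructor. */
    static ObjectName objectName(String s) {
        try {
            return new ObjectName(s);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Because the values are read-only and handed out through the concurrent map, no lock is needed around reads of the inner map; the trade-off is that callers cannot modify the returned properties in place.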

bur...@scalefastr.io

Jan 24, 2018, 10:45:38 AM

to Prometheus Developers

The main problem with the JMX approach is that each bean call is a reflective JMX call, which makes things impossibly slow. It's an RPC, and there is Ethernet latency on each call.

Each call is usually pretty fast, but most JMX setups make thousands and thousands of calls, so performance falls off quickly.

There are components like Jolokia:

https://jolokia.org/

which use REST for accessing the data. They're very fast, but they require an agent, which means you can't really use a sidecar approach. You have to add the agent to the JVM via the command line, and then you get a REST interface for pulling out the data.

bur...@scalefastr.io

Jan 24, 2018, 10:47:03 AM

to Prometheus Developers

There's also FastJMX, which is used with collectd and which we had a good experience with.

https://github.com/egineering-llc/collectd-fast-jmx

We migrated to FastJMX at some point with our collectd/kairosdb setup and were pretty happy with the results.

Brian Brazil

Jan 24, 2018, 10:52:09 AM

to bur...@scalefastr.io, Prometheus Developers

On 24 January 2018 at 15:45, <bur...@scalefastr.io> wrote:

> The main problem with the JMX approach is that each bean call is reflected JMX which makes things impossibly slow. It's an RPC and there is ethernet latency on each call.

There's no network latency when using a JMX agent; it's all inside a JVM.

Brian

--

Brian Brazil

www.robustperception.io

Kevin Burton

Jan 24, 2018, 11:23:22 AM

to Brian Brazil, Prometheus Developers

If you're inside the JVM as an agent, you're right. It's still an RMI stub, but it's fairly fast.

A remote process uses an RMI stub which is slow.

I've spent a ton of time optimizing JMX with ActiveMX and the collectd version.

Unless they changed something on Java 9...

There isn't a multi-get version of each stub so you're stuck making each call.
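
On the multi-get point: the standard `javax.management.MBeanServerConnection.getAttributes` call reads several attributes of a single MBean at once, although it does not batch across MBeans, so one call per MBean is still needed. A minimal sketch against the in-process platform MBean server (so no remote connection is involved); the class and method names `BatchAttributeRead` and `readAll` are hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper: one getAttributes call per MBean instead of one per attribute. */
final class BatchAttributeRead {
    static Map<String, Object> readAll(MBeanServerConnection conn,
                                       String objectName,
                                       String... attributeNames) {
        try {
            Map<String, Object> out = new LinkedHashMap<>();
            // Attributes that cannot be read are simply left out of the returned list.
            for (Attribute a : conn.getAttributes(new ObjectName(objectName), attributeNames).asList()) {
                out.put(a.getName(), a.getValue());
            }
            return out;
        } catch (Exception e) {
            // Covers malformed names, missing MBeans, and connection errors.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The platform MBean server lives inside this JVM.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        System.out.println(readAll(conn, "java.lang:type=Runtime", "VmName", "Uptime"));
    }
}
```

The same `readAll` call works with a remote `MBeanServerConnection`, where it would save one round trip per extra attribute on each MBean.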

--

Affordable, scalable, and managed Elasticsearch and Cassandra clusters deployed on bare metal.

Kevin Burton

Founder/CEO Scalefastr

Location: San Francisco, CA
