Rate limiting prometheus jmx-exporter


kbraj...@gmail.com

Dec 10, 2020, 10:42:00 AM
to Prometheus Developers

Background:
When the JMX exporter is overloaded (say 20-30 qps), we have observed that some requests take more than 20 seconds to serve, which is longer than the client-side request timeout. As a result, the agent tries to send responses on connections that have already been closed, which has two consequences: (a) it results in errors, and (b) it sometimes leaves the socket channel blocked indefinitely on the write syscall. Eventually, all threads of the HTTP server in the Prometheus agent get stuck and no more requests can be accepted. However, the thread accepting connections is still active, so new connections are created but never actually used; and since all request threads of the HTTP server are stuck, those connections are never closed by the server, resulting in a long backlog of CLOSE_WAIT sockets waiting to be closed.

Proposed Solution
1. Limit connections
We want to limit the number of connections to the exporter agents. There is no native way for the JMX exporter to impose such a restriction (to work around this, we are adding iptables rules for the JMX port).
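As a stopgap, the connection cap can be enforced outside the JVM. A hedged sketch using the iptables connlimit module; the port (5556, a common default for the standalone exporter) and the limit of 10 are assumptions to adjust for your deployment:

```shell
# Reject new TCP connections to the exporter port once 10 are already open.
# --connlimit-mask 0 counts connections from all sources together.
iptables -A INPUT -p tcp --syn --dport 5556 \
  -m connlimit --connlimit-above 10 --connlimit-mask 0 \
  -j REJECT --reject-with tcp-reset
```

Rejecting with a TCP reset (rather than dropping) lets scrapers fail fast instead of waiting for a timeout.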

2. Adding timeouts to requests
This can easily be achieved with JVM settings, but it would be good to add them to the jmx-exporter documentation:
-Dsun.net.httpserver.maxReqTime=20 -Dsun.net.httpserver.maxRspTime=20
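For completeness, a hedged sketch of how these properties might be passed when attaching the exporter as a javaagent; the jar name, listen port, config path, and application jar are all placeholders:

```shell
# Cap request and response handling at 20 seconds in the built-in
# com.sun.net.httpserver used by the exporter agent.
java -Dsun.net.httpserver.maxReqTime=20 \
     -Dsun.net.httpserver.maxRspTime=20 \
     -javaagent:jmx_prometheus_javaagent.jar=8080:config.yaml \
     -jar your_app.jar
```

These properties must be set at JVM startup; they cannot be changed at runtime.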

Please let me know if this idea makes sense to the community. I can work on a design.


Thanks
Brajesh Kumar

Brian Brazil

Dec 10, 2020, 11:04:56 AM
to kbraj...@gmail.com, Prometheus Developers
On Thu, 10 Dec 2020 at 15:42, kbraj...@gmail.com <kbraj...@gmail.com> wrote:

Background:
When the JMX exporter is overloaded (say 20-30 qps), we have observed that some requests take more than 20 seconds to serve, which is longer than the client-side request timeout. As a result, the agent tries to send responses on connections that have already been closed, which has two consequences: (a) it results in errors, and (b) it sometimes leaves the socket channel blocked indefinitely on the write syscall. Eventually, all threads of the HTTP server in the Prometheus agent get stuck and no more requests can be accepted. However, the thread accepting connections is still active, so new connections are created but never actually used; and since all request threads of the HTTP server are stuck, those connections are never closed by the server, resulting in a long backlog of CLOSE_WAIT sockets waiting to be closed.

Proposed Solution
1. Limit connections
We want to limit the number of connections to the exporter agents. There is no native way for the JMX exporter to impose such a restriction (to work around this, we are adding iptables rules for the JMX port).

There is already a hardcoded limit of 5 threads in the exporter.

It sounds like the real problem here is excessive load on the exporter; 20-30 qps is far from typical usage for any exporter, before even considering the inefficiencies of JMX. I'd suggest reconsidering how you are using the exporter.

Brian
 

2. Adding timeouts to requests
This can easily be achieved with JVM settings, but it would be good to add them to the jmx-exporter documentation:
-Dsun.net.httpserver.maxReqTime=20 -Dsun.net.httpserver.maxRspTime=20

Please let me know if this idea makes sense to the community. I can work on a design.


Thanks
Brajesh Kumar


kbraj...@gmail.com

Dec 10, 2020, 11:19:54 AM
to Prometheus Developers
Hello Brian,
Thanks for replying.

Yes, I understand that this is unexpected load and that 20-30 qps is beyond the limits of the JMX exporter. I was proposing to add configuration to limit the connection rate from the exporter itself (when running the exporter as a standalone server).

To answer your question about reconsidering the use of the exporter: my team maintains the exporter, and other teams read data from it via the exposed port. In some cases (say, a badly configured Prometheus scrape interval), this can cause unexpected load. To avoid such issues, I wanted to have rate limiting at the exporter layer itself.
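Until such a feature exists in the exporter, one way to approximate connection-rate limiting in front of it is the iptables hashlimit module. A hedged sketch; the port, rate, and burst values are assumptions for illustration:

```shell
# Accept up to 5 new connections/second (burst of 10) per source IP
# to the exporter port, and drop the rest.
iptables -A INPUT -p tcp --syn --dport 5556 \
  -m hashlimit --hashlimit-name jmx --hashlimit-upto 5/second \
  --hashlimit-burst 10 --hashlimit-mode srcip \
  -j ACCEPT
iptables -A INPUT -p tcp --syn --dport 5556 -j DROP
```

Limiting per source IP (srcip) keeps one misconfigured scraper from starving the others; use --hashlimit-mode with no grouping if you want a single global rate instead.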

Can you please also point me to the hard-coded thread count limit in the jmx-exporter code?

Brian Brazil

Dec 10, 2020, 12:04:24 PM
to kbraj...@gmail.com, Prometheus Developers
On Thu, 10 Dec 2020 at 16:19, kbraj...@gmail.com <kbraj...@gmail.com> wrote:
Hello Brian,
Thanks for replying.

Yes, I understand that this is unexpected load and that 20-30 qps is beyond the limits of the JMX exporter. I was proposing to add configuration to limit the connection rate from the exporter itself (when running the exporter as a standalone server).

To answer your question about reconsidering the use of the exporter: my team maintains the exporter, and other teams read data from it via the exposed port. In some cases (say, a badly configured Prometheus scrape interval), this can cause unexpected load. To avoid such issues, I wanted to have rate limiting at the exporter layer itself.

For something like JMX, that may not be the best approach, especially if you're using the non-recommended standalone mode, as it's even less efficient and trickier to get working.

I'd suggest having other teams who want to know about the service you're running look at your dashboards, rather than independently monitoring someone else's service.
 
Can you please also point me to the hard-coded thread count limit in the jmx-exporter code?
