Open file descriptors explode for stream queues


Stephan Schuler

Jun 20, 2022, 12:31:11 PM
to rabbitmq-users

Hey there.

tl;dr:

When distributing messages on a stream queue, our RabbitMQ server eats up every open file descriptor it has, then starts crashing.

It seems to be related to my clients: while I keep my 46 clients up, the number of open file handles rises to 32k. Once I kill my clients, the number of open file handles drops back to almost 0.


About the server

We're running RabbitMQ 3.10.5 on Erlang 23.2.3, together with the corresponding stream plugin.

We have 25 queues.
One is a stream queue, 24 are non-stream queues.

We have 46 connections.
23 target the stream queue, 23 target a single non-stream queue each.

This stream queue currently holds 600,000 messages retained for up to seven days, which fit in a single segment file.

It's a single server. No cluster, no quorum.


About the clients:

My client is PHP; its code is based on php-amqplib/discussions.

I do not put the last seen message ID into the queue. The documentation tells me this wouldn't be a good idea, and I do have local storage available.
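Since the documentation recommends tracking the last seen offset client-side, here is a minimal sketch of such a local store. Python is used purely for illustration (our actual clients are PHP), and the file name and helper names are made up:

```python
import os

OFFSET_FILE = "last-offset"  # hypothetical location; anything durable works


def store_offset(offset: int, path: str = OFFSET_FILE) -> None:
    """Persist the offset of the last fully handled message."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, path)  # atomic rename, so a crash can't leave a torn file


def load_offset(path: str = OFFSET_FILE, default: int = 0) -> int:
    """Return the stored offset, or `default` when nothing was stored yet."""
    try:
        with open(path) as f:
            return int(f.read())
    except FileNotFoundError:
        return default
```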


What's happening?

It seems like whenever a connection to the stream queue handles a message, the RabbitMQ server opens at least one additional file descriptor on the single segment file.

After some time (depending on the load, obviously), the RabbitMQ server has 32k open file handles on the same segment file. Then it starts crashing.

When I close/shutdown all my clients, the number of open file handles goes back to about 50.


Observations:

My clients usually have a single loop in PHP connecting to the RabbitMQ server, which can be seen via lsof.

The PHP client takes a message from Rabbit and runs another PHP script via exec(). Because of how the framework I'm using works, this spawns a sh process which in turn spawns a php process.

I can see (via lsof) that my TCP connection is present in the parent process (the looping script connected to the RabbitMQ server) and in the inner php process, but not in the sh process in between. The inner php process just takes the message as a command line argument and handles it without needing to interact with Rabbit at all.

lsof tells me it is the very same connection, showing the exact same host:port->host:port line for both the outer and the inner php process.


So what's going on here?

I have no clue what's going on here. It's not like my clients open new connections per message; that's just not the case. The one TCP connection my main loop holds stays the same until I manually close it, and lsof confirms it's the very same. The number of connections to RabbitMQ doesn't increase at all (according to the web interface), yet my open file descriptors keep climbing.

Any help would be appreciated. My gut tells me increasing the number of allowed open file handles isn't going to solve my problem.

Regards,
Stephan.

Karl Nilsson

Jun 20, 2022, 2:31:04 PM
to rabbitmq-users
I understand you don't create new connections, but do you create new channels and/or consumers per message?



--
Karl Nilsson

Stephan Schuler

Jun 21, 2022, 10:34:45 AM
to rabbitmq-users
Hey Karl,

thank you for your response.

No, we don't create connections, nor channels, nor consumers in response to messages. We have exactly the numbers above and only consume messages in order to do something entirely unrelated with them.

But I guess we solved it. The lib we were using isn't really a good fit for streams and handles consecutive messages rather clumsily, I would say.

For consuming a stream, the lib (before we patched it) went something like this:
  1. Connect to Rabbit
  2. Fetch last seen message ID from local store
  3. Register a consumer on the stream at the given offset, using that offset as the consumer tag
  4. Wait for one message
  5. Cancel registration from step 3 by sending a cancellation for that consumer tag
  6. Handle message
  7. Store current message ID to local store
  8. Repeat from 2
I guess the author of the lib we were using was under the impression his cancelling of the registration in step 5 would be the thing to do.
Looking at how he implemented it and how the methods are called, I would have thought the same.
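Translated into a runnable sketch (Python/pika purely for illustration; our real client is PHP/php-amqplib, and `load_offset`, `store_offset`, `handle_message` and the queue name are placeholders). The per-message register-then-cancel in step 5 is the part that apparently leaked a server-side file descriptor each time:

```python
try:
    import pika  # real AMQP client; only needed to actually talk to RabbitMQ
except ImportError:  # keep the sketch importable without the dependency
    pika = None


def stream_consume_args(offset: int) -> dict:
    """Consumer arguments asking the stream to start at the given offset."""
    return {"x-stream-offset": offset}


def buggy_consume_loop(channel, queue="my-stream"):
    channel.basic_qos(prefetch_count=1)  # streams require a prefetch limit
    while True:                                    # 8. repeat
        offset = load_offset()                     # 2. last seen ID, local store

        def on_message(ch, method, properties, body):
            ch.basic_cancel(method.consumer_tag)   # 5. cancel after ONE message;
                                                   #    this leaked a server fd
            handle_message(body)                   # 6.
            store_offset(offset + 1)               # 7.
            ch.basic_ack(method.delivery_tag)
            ch.stop_consuming()

        channel.basic_consume(                     # 3. a NEW consumer per message
            queue=queue,
            on_message_callback=on_message,
            arguments=stream_consume_args(offset),
        )
        channel.start_consuming()                  # 4. wait for one message
```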

We changed the lib so that it no longer cancels the consumer tag and creates another one with an incremented offset, but instead keeps a single consumer and lets messages keep coming in. That solved the problem: the number of open file handles on the RabbitMQ server is no longer constantly scratching 32k but sits right around 150.
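The fixed flow in the same sketch style: register once at the stored offset and keep the single consumer alive, persisting the offset after every message instead of cancelling and re-registering (again Python/pika for illustration only; the helper names are placeholders):

```python
try:
    import pika  # only needed to actually talk to RabbitMQ
except ImportError:
    pika = None


def resume_offset(last_seen: int) -> int:
    """Offset to resume from: the entry after the last fully handled one."""
    return last_seen + 1


def fixed_consume_loop(channel, queue="my-stream"):
    channel.basic_qos(prefetch_count=100)  # streams require prefetch + manual ack

    def on_message(ch, method, properties, body):
        handle_message(body)
        # Each stream delivery carries its own offset in the headers;
        # persist that rather than counting deliveries ourselves.
        store_offset(properties.headers["x-stream-offset"])
        ch.basic_ack(method.delivery_tag)

    channel.basic_consume(  # ONE consumer for the connection's whole lifetime
        queue=queue,
        on_message_callback=on_message,
        arguments={"x-stream-offset": resume_offset(load_offset())},
    )
    channel.start_consuming()  # no per-message cancel, no fd churn
```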

I don't know whether this constitutes a bug, though.
Maybe someone with deep knowledge of the protocol level could clarify whether sending a cancellation for a given consumer tag is supposed to free file handles on the server side (as we thought), or to keep them around for further use (as seems to be the case).

Fun fact:
We had the exact same lib behavior running for quite some time with no visible problem at all, back when we didn't use streams. Consuming non-stream messages works in exactly the same way but didn't cause problems.
Maybe it's either directly related to streams, or it's the fact that without streams, fetching a message always uses the very same registration call, whereas in the clumsy message-and-offset situation every expected message registers with a slightly different arguments payload, because the requested offset is +1 for every new message.

Anyway. It would be nice to know whether our way of doing things should be expected to nearly tear down a RabbitMQ server node, or whether this is actually faulty Rabbit behavior. (Granted, that applies to our approach of fetching one message at a time with increasing consumer tags, and of course only if you can make sense of what I'm describing here.) Is this something I should raise as a bug on GitHub (because it's something a client should simply not be allowed to do), or is the resource situation as-is the expected behavior and actually a non-issue for you RabbitMQ server people?

As to my problem with constantly increasing open file descriptors: that's no longer an issue for me.

Regards,
Stephan.

Karl Nilsson

Jun 21, 2022, 11:18:13 AM
to rabbitmq-users
Right, ok, thanks. Yes, in this case the lib _was_ creating a new consumer per message, and you have indeed found a bug: the stream is not closed correctly when an AMQP consumer is cancelled. I will raise a ticket for this. It is much better to do what you are doing now anyway, btw.

Cheers
Karl



--
Karl Nilsson

kjnilsson

Jun 21, 2022, 11:53:31 AM
to rabbitmq-users