RabbitMQ java client - locked at SocketFrameHandler#writeFrame


Gowri Sankar Suryanarayana

Mar 9, 2022, 9:40:40 AM
to rabbitmq-users
Hello

We've been using amqp-client-3.6.0 and spring-rabbit-1.6.0.RELEASE to publish messages to and consume messages from our RabbitMQ (v3.8.3) cluster.

We noticed for the first time that some of the publishing threads were blocked in writeFrame, a client-internal method of the class SocketFrameHandler, waiting for the monitor on the DataOutputStream, which was held by another thread that was publishing a message at the time. The blockage lasted about 15 minutes. A Java thread dump for the relevant threads is attached. Thread http-nio-8080-exec-123 is waiting on the lock held by thread http-nio-8080-exec-110.

We haven't yet drilled down into why the thread holding the lock took so long. It might be a resource issue on the host, a connection issue with the RabbitMQ broker, or an internal issue with the broker. We'll investigate later, as this is not the prime concern.

The concern here is the locking mechanism in the client library, which doesn't allow concurrent writes to the same connection even when different threads use different channels. It looks like both reads and writes on the connection are synchronized, no matter how many channels are in use.

Can someone shed light on whether this behaviour is expected? We have always encouraged developers to reuse the connection and parallelize their reads/writes on different channels, but we're not sure now.
threads-in-sockethandler-land

Luke Bakken

Mar 9, 2022, 12:01:01 PM
to rabbitmq-users
Hello,

The current release of amqp-client is version 5.14.x. RabbitMQ is version 3.9.13.

Please upgrade! More than likely this is an old bug in amqp-client.

Thanks -
Luke

Gowri Sankar Suryanarayana

Mar 9, 2022, 12:04:15 PM
to rabbitm...@googlegroups.com
Hello, Luke

I thought so, too, but the latest version has the same code as the version we use. I think this is more of a design feature than an implementation bug.

image.png

--
A.S.Gowri Sankar


--
You received this message because you are subscribed to a topic in the Google Groups "rabbitmq-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rabbitmq-users/XJdiTTq8SDs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/1a50a09a-6065-4069-987b-789b4e7c8a46n%40googlegroups.com.

Arnaud Cogoluègnes

Mar 9, 2022, 12:11:48 PM
to rabbitmq-users
If we don't synchronize this call, frame content will interleave, and the server will receive garbage. What do you suggest to fix this?
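
To illustrate the interleaving concern, here is a minimal, self-contained sketch. It is not the amqp-client source: `FrameWriteDemo` and its methods are hypothetical names, and an in-memory stream stands in for the socket. It relies on the AMQP 0-9-1 general frame layout (a type octet, a two-byte channel number, a four-byte payload size, the payload, and a 0xCE end octet). Each frame is tagged with its channel, so frames from different channels can share a socket, but the bytes of one frame must stay contiguous, which is what the monitor on the stream guarantees.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of amqp-client.
class FrameWriteDemo {
    static final int FRAME_END = 0xCE; // AMQP 0-9-1 frame-end octet
    static final int BODY_FRAME = 3;   // AMQP 0-9-1 content-body frame type

    // Writes one frame: type, channel, size, payload, frame-end.
    // Holding the stream's monitor keeps the whole frame contiguous.
    static void writeFrame(DataOutputStream out, int type, int channel, byte[] payload)
            throws IOException {
        synchronized (out) {
            out.writeByte(type);
            out.writeShort(channel);
            out.writeInt(payload.length);
            out.write(payload);
            out.writeByte(FRAME_END);
        }
    }

    // Starts one writer thread per channel on a shared stream, then re-parses
    // the bytes and returns the number of well-formed frames found.
    static int run(int channels, int framesPerChannel) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        List<Thread> threads = new ArrayList<>();
        try {
            for (int c = 1; c <= channels; c++) {
                final int channel = c;
                Thread t = new Thread(() -> {
                    byte[] payload = new byte[64];
                    Arrays.fill(payload, (byte) channel); // payload marks its channel
                    try {
                        for (int i = 0; i < framesPerChannel; i++) {
                            writeFrame(out, BODY_FRAME, channel, payload);
                        }
                    } catch (IOException e) {
                        throw new IllegalStateException(e);
                    }
                });
                threads.add(t);
                t.start();
            }
            for (Thread t : threads) {
                t.join();
            }
            out.flush();

            DataInputStream in = new DataInputStream(new ByteArrayInputStream(sink.toByteArray()));
            int frames = 0;
            while (in.available() > 0) {
                in.readUnsignedByte(); // frame type
                int channel = in.readUnsignedShort();
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload);
                for (byte b : payload) {
                    if (b != (byte) channel) throw new IllegalStateException("interleaved payload");
                }
                if (in.readUnsignedByte() != FRAME_END) throw new IllegalStateException("bad frame end");
                frames++;
            }
            return frames;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("intact frames: " + run(8, 1000));
    }
}
```

With the `synchronized` block in place, every frame read back is intact. Removing it lets writers on different channels interleave bytes within a frame, and the parser can then fail, much as a broker would on a real connection.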

Arnaud Cogoluègnes

Mar 9, 2022, 12:16:28 PM
to rabbitmq-users
And you should still upgrade: the "15 minutes hanging" problem pops up every once in a while on this mailing list, and the latest versions of the client have ways to mitigate it. We never managed to find the real cause; it's likely to be at the OS or infrastructure level.

Gowri Sankar Suryanarayana

Mar 9, 2022, 12:51:22 PM
to rabbitm...@googlegroups.com
Hello, Arnaud

I'm in no position to make suggestions, as I'm more of a user than a framework/library designer/implementer. However, a few thoughts below.

1. I assumed channel communication is independent of the underlying connection, so channels can all work without stepping on each other. But if they share the same socket and there's no way to differentiate the frames or packets by the channel they are meant for, I'm not sure how we can achieve true concurrency among different actors/clients.
2. Can we replace the sync blocks with Lock.tryLock with a timeout set? This is only to prevent infinite blocking, though it causes data loss once the timeout is reached.

In the end, it looks like the takeaway is that we can't avoid locking, but we can reduce the time taken to perform the action within the sync blocks.

--
A.S.Gowri Sankar


Luke Bakken

Mar 9, 2022, 1:03:51 PM
to rabbitmq-users
> 1. I assumed channel communication is independent of the underlying connection, so channels can all work without stepping on each other. But if they share the same socket and there's no way to differentiate the frames or packets by the channel they are meant for, I'm not sure how we can achieve true concurrency among different actors/clients.

Use different connections. Channels are in the AMQP protocol to provide a means to multiplex operations on the same TCP socket, but there's a limit to the performance you can get from one connection so we recommend running your own benchmarks to find the right combination of Channels and Connections for your application.
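
As a rough sketch of spreading publishers over several connections, the hypothetical helper below hands out pooled objects round-robin. `RoundRobinPool` is not part of amqp-client, and the type parameter `T` only stands in for a connection type such as `com.rabbitmq.client.Connection`. The pool size is an assumption that would come from your own benchmarks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper; T stands in for a connection type.
class RoundRobinPool<T> {
    private final List<T> items = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    // Eagerly creates `size` items using the supplied factory.
    RoundRobinPool(int size, Supplier<T> factory) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        for (int i = 0; i < size; i++) {
            items.add(factory.get());
        }
    }

    // Returns the next item in rotation; safe to call from many threads.
    T pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        return items.get(Math.floorMod(next.getAndIncrement(), items.size()));
    }

    public static void main(String[] args) {
        AtomicInteger ids = new AtomicInteger();
        // Strings stand in for real connections in this demo.
        RoundRobinPool<String> pool = new RoundRobinPool<>(3, () -> "conn-" + ids.getAndIncrement());
        for (int i = 0; i < 5; i++) {
            System.out.println(pool.pick());
        }
    }
}
```

In a real application the factory would wrap `ConnectionFactory.newConnection()`, which throws checked exceptions and so needs a try/catch inside the supplier. Each publishing thread would then open its own channel on the connection it picked, so a stalled write on one socket blocks only the threads sharing that connection.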
 
If you can find a way to reliably reproduce the issue you report using the latest version of amqp-client that would be great. If you do so, please share a complete code sample we can clone, compile and run.

Thanks,
Luke

Gary Russell

Mar 9, 2022, 1:46:12 PM
to rabbitmq-users
Also spring-amqp 1.6.x has been out of support for a very long time; the last release was in 2017; 1.6.0 is more than a year older than that.

You can see the currently supported versions here [1].




Arnaud Cogoluègnes

Mar 10, 2022, 2:07:43 AM
to rabbitmq-users

> 2. Can we replace the sync blocks with Lock.tryLock with a timeout set? This is only to prevent infinite blocking, though it causes data loss once the timeout is reached.
>
> In the end, it looks like the takeaway is that we can't avoid locking, but we can reduce the time taken to perform the action within the sync blocks.


We can indeed use a Lock instead of a synchronized block. I'm not sure the timeout version would change much, though: if a write operation takes a long time because the socket is likely dead, the lock timeout does let us react faster. But what do we do then? It's unlikely we can recover from the exception, so we must try to close the connection cleanly, which is likely to time out as well.

I don't understand your statement that "we can improve the time taken to perform the action within sync blocks". Again, a Lock will just provide a way to react within a bounded amount of time; it won't make anything faster, especially writing to the socket.

You have a point with the lock suggestion, but I'd like to be able to reproduce with the latest library version first, analyze what's going on, and then come up with actionable suggestions to see if we can make things better.
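
The Lock.tryLock idea discussed in this thread can be sketched as follows. This is a hypothetical illustration, not the amqp-client implementation: `TimedFrameWriter`, its constructor parameters, and the 100 ms timeout are assumptions made for the example, and an `OutputStream` that blocks on a latch stands in for a stalled socket.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical writer; names are illustrative, not amqp-client API.
class TimedFrameWriter {
    private final ReentrantLock lock = new ReentrantLock();
    private final OutputStream out;
    private final long timeoutMillis;

    TimedFrameWriter(OutputStream out, long timeoutMillis) {
        this.out = out;
        this.timeoutMillis = timeoutMillis;
    }

    // Writes the frame bytes, or fails if the lock is not acquired in time.
    void writeFrame(byte[] frame) throws IOException {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for the write lock", e);
        }
        if (!acquired) {
            throw new IOException("write lock not acquired within " + timeoutMillis + " ms");
        }
        try {
            out.write(frame);
        } finally {
            lock.unlock();
        }
    }

    // Simulates a stalled socket: the first writer blocks inside write(),
    // so the second writer gives up after the timeout. Returns true if it did.
    static boolean secondWriterTimesOut() {
        CountDownLatch inWrite = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        OutputStream stalled = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                inWrite.countDown();
                try {
                    release.await(); // block like a write to a dead socket
                } catch (InterruptedException e) {
                    throw new IOException(e);
                }
            }
        };
        TimedFrameWriter writer = new TimedFrameWriter(stalled, 100);
        Thread first = new Thread(() -> {
            try {
                writer.writeFrame(new byte[] {1});
            } catch (IOException ignored) {
                // not expected in this demo
            }
        });
        first.start();
        boolean timedOut = false;
        try {
            inWrite.await(); // the first writer now holds the lock
            try {
                writer.writeFrame(new byte[] {2});
            } catch (IOException e) {
                timedOut = true; // the frame was not written; the caller must decide what to do
            }
            release.countDown();
            first.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return timedOut;
    }

    public static void main(String[] args) {
        System.out.println("second writer timed out: " + secondWriterTimesOut());
    }
}
```

The timeout only bounds how long the waiting thread blocks. The thread stuck inside `write()` stays stuck until the socket call returns, and the frame that timed out is never sent, so the caller still has to retry or close the connection.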
 

Gowri Sankar Suryanarayana

Mar 10, 2022, 6:48:32 AM
to rabbitm...@googlegroups.com
> I don't understand your statement that "we can improve the time taken to perform the action within sync blocks"

What I meant is that we can only make the code guarded by the lock faster, since timeouts have the side effect of losing data unless we find a way to retry the operation or recreate the connection.
--
A.S.Gowri Sankar

